I tried writing a disassembler for the CM4, and those Thumb-2 instruction encodings are just AWFUL, in ways I thought RISC intentionally avoided. Plain Thumb (CM0) isn't too bad, but it has a lot of non-orthogonality and special casing that, again, I thought would be foreign to RISC "principles."
Right.
T16 has a much more complex encoding than A32, with 19 different instruction formats.
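To make the 19-format point concrete, here is a rough Python sketch that sorts an ARMv4T Thumb halfword into those formats, numbered as I remember them from the ARM7TDMI data sheet. The function name `t16_format` is made up, and later additions (ARMv5T/ARMv6-M opcodes such as BKPT, SXTH or CPS) are not handled and return None. Notice that the decoder has to test prefixes from 3 to 8 bits long, in a particular order, because several formats are carved out of others.

```python
from typing import Optional


def t16_format(hw: int) -> Optional[int]:
    """Classify an ARMv4T Thumb halfword into ARM7TDMI-style formats 1..19.

    Sketch only: later architecture additions return None, and
    undefined encodings are not rejected. Order of the tests matters.
    """
    if (hw >> 13) == 0b000:
        # 000 op offset5, but op == 11 is carved out for add/subtract
        return 2 if ((hw >> 11) & 0b11) == 0b11 else 1
    if (hw >> 13) == 0b001:
        return 3   # move/compare/add/subtract immediate
    if (hw >> 10) == 0b010000:
        return 4   # ALU operations
    if (hw >> 10) == 0b010001:
        return 5   # hi register operations / BX
    if (hw >> 11) == 0b01001:
        return 6   # PC-relative load
    if (hw >> 12) == 0b0101:
        # bit 9 separates register-offset load/store from sign-extended byte/halfword
        return 8 if (hw >> 9) & 1 else 7
    if (hw >> 13) == 0b011:
        return 9   # load/store with immediate offset
    if (hw >> 12) == 0b1000:
        return 10  # load/store halfword
    if (hw >> 12) == 0b1001:
        return 11  # SP-relative load/store
    if (hw >> 12) == 0b1010:
        return 12  # load address
    if (hw >> 8) == 0b10110000:
        return 13  # add offset to SP
    if (hw >> 12) == 0b1011 and ((hw >> 9) & 0b11) == 0b10:
        return 14  # push/pop
    if (hw >> 12) == 0b1100:
        return 15  # multiple load/store
    if (hw >> 8) == 0b11011111:
        return 17  # SWI: condition 1111 of the conditional branch space
    if (hw >> 12) == 0b1101:
        return 16  # conditional branch (cond 1110 is undefined; not checked here)
    if (hw >> 11) == 0b11100:
        return 18  # unconditional branch
    if (hw >> 12) == 0b1111:
        return 19  # one half of the long branch with link pair
    return None


for op in (0x0040, 0x1840, 0x2001, 0x4770, 0xB500, 0xDF00, 0xF7FF):
    print(hex(op), t16_format(op))
```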
The RISC-V C extension also has a more complex encoding than the base ISA, with 8 instruction formats vs 4. At least it preserves two properties of the base ISA: the bottom three bits of rs1 and rs2 are always in the same place (when those fields exist), and so is the MSB (sign bit) of any immediate/offset. Which of the two possible register fields is rd (if it exists) does vary, though.
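A small Python sketch of what that buys a decoder, assuming the field positions from the RVC chapter of the RISC-V unprivileged spec: the register and sign-bit fields can be pulled out before you know the format, and only their interpretation depends on the opcode. In c.lw, for example, rd' lands in the slot where other formats put rs2'. The helper names `rvc_fields` and `ci_imm` are just illustrative.

```python
def rvc_fields(hw: int) -> dict:
    """Extract the RVC fields that sit at fixed bit positions (sketch)."""
    return {
        "op": hw & 0b11,                  # quadrant, bits [1:0]
        "funct3": (hw >> 13) & 0b111,     # bits [15:13]
        "bit12": (hw >> 12) & 1,          # sign bit of signed immediates (CI, CB, CJ)
        "rd_rs1": (hw >> 7) & 0x1F,       # full 5-bit rd/rs1, bits [11:7] (CR, CI)
        "rs2": (hw >> 2) & 0x1F,          # full 5-bit rs2, bits [6:2] (CR, CSS)
        "rs1_low3": (hw >> 7) & 0b111,    # bits [9:7]: rs1' (or rd'/rs1' in CA, CB)
        "rs2_low3": (hw >> 2) & 0b111,    # bits [4:2]: rs2', but rd' in CL/CIW
    }


def ci_imm(hw: int) -> int:
    """Sign-extended 6-bit immediate of a CI-format instruction such as c.addi."""
    raw = (((hw >> 12) & 1) << 5) | ((hw >> 2) & 0x1F)
    return raw - 64 if raw & 0x20 else raw


addi = rvc_fields(0x0505)  # c.addi a0, 1: full 5-bit field holds x10
lw = rvc_fields(0x4188)    # c.lw a0, 0(a1): rs1' = a1, rd' = a0 in the rs2' slot
print(addi["rd_rs1"], 8 + lw["rs1_low3"], 8 + lw["rs2_low3"], ci_imm(0x157D))
# prints: 10 11 10 -1
```

The compressed 3-bit register numbers map to x8..x15, which is why the low three bits of a full 5-bit field line up with them.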
T32 encoding seems just random and ugly. Its excuse is that it had to fit in around the two genuinely independent 16-bit instructions that make up each T16 BL/BLX. In T16 you can separate the two halves and they still work (though assemblers and compilers never do that), but in T32 they form an actual 32-bit instruction.
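A sketch of that "fit in around" constraint, assuming the BL encoding T1 layout from the ARMv7-M Architecture Reference Manual. Decoding F7FF FFFE (a `bl` to itself) as one T32 instruction and as the old pair of independent T16 halves gives the same offset, because Thumb-2 stores the two new high bits inverted and XORed with the sign (J1, J2), so they read as 1s, matching the old 11111 suffix, whenever the target is within the old ±4 MB range. Function names are hypothetical.

```python
def t32_bl_offset(hw1: int, hw2: int) -> int:
    """Offset of a Thumb-2 BL (encoding T1), relative to the BL's address + 4."""
    s = (hw1 >> 10) & 1
    imm10 = hw1 & 0x3FF
    j1 = (hw2 >> 13) & 1
    j2 = (hw2 >> 11) & 1
    imm11 = hw2 & 0x7FF
    # I1 = NOT(J1 XOR S), I2 = NOT(J2 XOR S): J1 = J2 = 1 for short offsets
    i1 = 1 ^ (j1 ^ s)
    i2 = 1 ^ (j2 ^ s)
    imm = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1)
    return imm - (1 << 25) if s else imm  # sign-extend the 25-bit value


def legacy_bl_pair_offset(hw1: int, hw2: int) -> int:
    """Same offset, computed the ARMv4T way as two separate 16-bit instructions."""
    if (hw1 >> 11) != 0b11110 or (hw2 >> 11) != 0b11111:
        raise ValueError("not a BL prefix/suffix pair")
    hi = hw1 & 0x7FF
    if hi & 0x400:  # sign-extend the 11-bit high part
        hi -= 0x800
    lo = hw2 & 0x7FF
    # first half: LR = PC + (hi << 12); second half: PC = LR + (lo << 1)
    return (hi << 12) + (lo << 1)


print(t32_bl_offset(0xF7FF, 0xFFFE), legacy_bl_pair_offset(0xF7FF, 0xFFFE))
print(t32_bl_offset(0xF000, 0xD800))  # J1 = 0: beyond the old range
```

The second case has no legacy equivalent: 0xD800 on its own would decode as a T16 conditional branch, which is exactly the space T32 had to squeeze around.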
A64 encoding seems equally ugly, for no reason apparent to me.
I guess the gain in code density is considered worthwhile (when most of a microcontroller's die is occupied by code memory), but it's pretty ugly.
T16 was also constrained by having to be close to a complete and efficient ISA in its own right, at least for the code a C compiler would generate. The original CPUs could always switch back to A32 mode if you needed something unusual (such as the high half of a multiply, or some system function), but you couldn't just drop a 32-bit opcode into the middle of 16-bit code whenever you liked. The CM0 has a handful of T32 instructions for those purposes, and you can intermix them.
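The intermixing works because, in a Thumb-2 decoder (CM0 included), the first halfword alone tells you the length: top five bits 0b11101, 0b11110 or 0b11111 mean a 32-bit instruction, and anything else is a complete 16-bit one. A minimal Python sketch; `thumb_lengths` is a made-up name, and the sample stream is movs r0, #1 / bl (offset 0) / dmb sy / bx lr.

```python
def thumb_lengths(halfwords: list[int]) -> list[int]:
    """Instruction sizes in bytes for a stream of Thumb-2 halfwords (sketch)."""
    sizes = []
    i = 0
    while i < len(halfwords):
        # 32-bit instructions are flagged by the top five bits of the first halfword
        if (halfwords[i] >> 11) in (0b11101, 0b11110, 0b11111):
            if i + 1 >= len(halfwords):
                raise ValueError("truncated 32-bit instruction")
            sizes.append(4)
            i += 2
        else:
            sizes.append(2)
            i += 1
    return sizes


# movs r0, #1 ; bl <offset 0> ; dmb sy ; bx lr
print(thumb_lengths([0x2001, 0xF000, 0xF800, 0xF3BF, 0x8F5F, 0x4770]))
```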
RVC was designed with the knowledge that it didn't have to be complete, because using a full size instruction instead is always possible at any point.
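For comparison, the RISC-V length rule, as a sketch assuming the standard length encoding: bits [1:0] != 11 means a 16-bit RVC instruction; otherwise it is 32-bit unless bits [4:2] are 111, which is reserved for longer formats that this sketch simply rejects. Parcels are little-endian 16-bit halfwords, `rv_lengths` is a hypothetical name, and the stream is c.addi a0, 1 / addi a0, a0, 1 (0x00150513) / c.jr ra.

```python
def rv_lengths(parcels: list[int]) -> list[int]:
    """Instruction sizes in bytes for a stream of RISC-V 16-bit parcels (sketch)."""
    sizes = []
    i = 0
    while i < len(parcels):
        p = parcels[i]
        if (p & 0b11) != 0b11:
            sizes.append(2)  # compressed (RVC) instruction
            i += 1
        elif ((p >> 2) & 0b111) != 0b111:
            if i + 1 >= len(parcels):
                raise ValueError("truncated 32-bit instruction")
            sizes.append(4)  # full-size instruction, low parcel first
            i += 2
        else:
            raise ValueError("48-bit or longer encoding, not handled in this sketch")
    return sizes


print(rv_lengths([0x0505, 0x0513, 0x0015, 0x8082]))
```

Nothing here depends on what came before: a full-size instruction can follow a compressed one at any parcel boundary.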
Similarly, the NVIC is neat, but ... it removes choices from the programmer. I prefer the "has vectors, but how much context to save is entirely up to you" approach of some of the simpler architectures.
The NVIC is easy, but it's one size fits all: exception entry always stacks at least eight registers (r0-r3, r12, lr, pc, xPSR). Short, simple handlers that need only one or two working registers can get lower latency with a simpler mechanism.