I have no idea about how to implement [autoincrement/etc] in RISC terms
Presumably, since an ALU operation happens in a single clock cycle and a memory access takes more than one, you just do the increment/etc. while you're waiting for the memory. That's where all those weird modes of RISC operands come from, right? "We have an ALU, a barrel shifter, and ...; they should all be able to do something each clock cycle!"
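To make the "increment while you wait for memory" idea concrete, here's a rough C model of a post-increment load (something like ARM's `LDR Rd, [Rn], #4`). The function name, the word-array memory, and the step parameter are all made up for illustration; real hardware overlaps these steps in time, which plain C can only hint at through the ordering.

```c
#include <stdint.h>

/* Hypothetical model of a post-increment load.
 * memory   : word-addressed backing store (an assumption of this sketch)
 * base_reg : the register holding a byte address, updated in place
 * step     : how far to advance the base register, in bytes */
uint32_t load_post_increment(const uint32_t *memory,
                             uint32_t *base_reg,
                             uint32_t step)
{
    uint32_t address = *base_reg;   /* old base value is the address sent to memory */
    *base_reg = address + step;     /* ALU bumps the base while the access is pending */
    return memory[address / sizeof(uint32_t)];  /* loaded word arrives afterwards */
}
```

The point is only that the add uses nothing the load produces, so the ALU is otherwise idle during the access and the increment costs no extra cycle.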
The less standard thing that I wish were more common is probably bitfield instructions (combining barrel shift and AND).
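For anyone who hasn't used them, this is roughly what such instructions compute, written out in C. The function names are mine, not any particular ISA's mnemonics, and the sketch assumes `lsb` is less than 32 and the field fits in the word.

```c
#include <stdint.h>

/* Build a mask of 'width' low bits; width >= 32 is handled separately
 * because shifting a 32-bit value by 32 is undefined in C. */
static uint32_t low_mask(unsigned width)
{
    return (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

/* Unsigned bitfield extract: barrel shift right, then AND with the mask. */
uint32_t bitfield_extract(uint32_t value, unsigned lsb, unsigned width)
{
    return (value >> lsb) & low_mask(width);
}

/* Bitfield insert: clear the field in dest, then OR in the low bits of src. */
uint32_t bitfield_insert(uint32_t dest, uint32_t src,
                         unsigned lsb, unsigned width)
{
    uint32_t mask = low_mask(width);
    return (dest & ~(mask << lsb)) | ((src & mask) << lsb);
}
```

So `bitfield_extract(0xABCD1234, 8, 8)` gives `0x12`. Without a dedicated instruction that's a shift plus an AND (and often a constant load for the mask), which is exactly the sort of thing a shifter and ALU sitting side by side could do in one go.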
I wonder how the PDP11 or PDP10 instruction sets (both of which are nominally register/memory architectures) would work if you RISCified them and made it so only the load/store operations could access memory?
I want all the instructions from a modern ARM architecture
The 32-bit ARM instructions, I hope? The Cortex (thumbX only) encodings have lost any elegance, and the CM0 is almost painful in its weird limitations...