I think it's pretty amazing that despite being fairly minimalist with only 37 instructions a compiler will generate from C code (so leaving out fence, system call, debugger call, the CSR instructions), RV32I has everything necessary to efficiently support a modern software stack.
You could even make it a bit more minimalist without any great harm. I'd suggest, for example, leaving out all the "immediate" instructions except addi. Boom! You're now down to 29 instructions. And you've freed up about 3% of the opcode space at a stroke. The cost? One extra instruction to load the desired immediate value into a register, after which you use the register-to-register version of the instruction instead.
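To make that cost concrete, here is a minimal Python sketch of the rewrite an assembler or compiler would have to do. The function name `expand_immediate` and the use of `t0` as a scratch register are my assumptions for illustration only; a real compiler would pick a free register itself and would often reuse a constant it had already loaded.

```python
# Map each dropped immediate instruction to its register-register form.
REG_FORM = {
    "andi": "and", "ori": "or", "xori": "xor",
    "slti": "slt", "sltiu": "sltu",
    "slli": "sll", "srli": "srl", "srai": "sra",
}

def expand_immediate(line, scratch="t0"):
    """Rewrite e.g. 'andi a0, a1, 255' as an addi plus a register-register op.

    Anything not in REG_FORM (including addi itself) passes through unchanged.
    The scratch register is a hypothetical choice: it is assumed to be free
    here and different from the source register.
    """
    text = line.strip()
    mnemonic, _, operands = text.partition(" ")
    if mnemonic not in REG_FORM:
        return [text]
    rd, rs1, imm = [op.strip() for op in operands.split(",")]
    return [
        f"addi {scratch}, zero, {imm}",                  # load the constant
        f"{REG_FORM[mnemonic]} {rd}, {rs1}, {scratch}",  # register-register op
    ]

for out in expand_immediate("andi a0, a1, 255"):
    print(out)
# addi t0, zero, 255
# and a0, a1, t0
```

This always works because every one of these immediates is at most 12 bits, which is exactly what addi can load in one go.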
Here are some instruction frequency stats I gathered from the RISC-V Debian distro with the standard packages and an assortment of extras. Format: percentage of total instructions, mnemonic, raw instruction count. I've listed the top 16 instructions in full, but only the register-immediate ones after that.
16.224593 addi 2528047
15.237536 jal 2374248
11.123998 auipc 1733294
9.981167 ld 1555223
6.658275 beq 1037464
4.305509 bne 670866
3.687067 sd 574503
3.418121 lbu 532597
3.376591 jalr 526126
2.435197 lw 379442
2.357368 lui 367315
1.800274 sb 280511
1.768576 addiw 275572
1.472592 sw 229453
1.430902 slli 222957
1.314809 andi 204868
:
0.433172 srli 67495
0.296973 ori 46273
0.244295 xori 38065
0.192047 sltiu 29924
0.131027 srai 20416
0.011071 slti 1725
addi is *the* most popular instruction. This happens on other code bases I've looked at as well. On Fedora addi comes in slightly behind jal. Part of this is that addi does triple duty: as well as adding constants, it serves as both the "move register" instruction and the "load immediate" instruction (both of which could be done by other instructions such as ori instead). But incrementing and decrementing loop variables and the stack pointer is so common anyway that addi would always be among the top instructions. This is 64 bit code, so addiw also makes a showing. If you want to think about RV32I then probably just lump addi and addiw together and call it 18%.
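The triple duty is easy to see from the encoding. Here is a small Python sketch that builds the 32 bit I-type word for addi (opcode 0x13, funct3 000); the helper name `encode_addi` is mine, not anything from a real toolchain. The "mv" and "li" pseudo-instructions come out as plain addi with a zero immediate or the zero register as source.

```python
def encode_addi(rd, rs1, imm):
    """Encode 'addi rd, rs1, imm' as a 32-bit I-type instruction word."""
    assert -2048 <= imm <= 2047, "immediate must fit in 12 signed bits"
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (0b000 << 12) | (rd << 7) | 0x13

ZERO, A0, A1 = 0, 10, 11  # x0, x10, x11 under their ABI names

print(hex(encode_addi(A0, A1, 0)))    # mv a0, a1       -> 0x58513
print(hex(encode_addi(A0, ZERO, 1)))  # li a0, 1        -> 0x100513
print(hex(encode_addi(A0, A0, 1)))    # addi a0, a0, 1  -> 0x150513
```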
What about the others? slli+andi+srli+ori+xori+sltiu+srai+slti together come to 4.05% of all instructions. That's more than the 3.125% of the opcode space they take up (along with addi), but not a lot more. If you left them all out then RISC-V programs would get at most 4% bigger (less, because the same constant could often be loaded once and left in a register, of which there are usually plenty, to be used several times), and probably no more than 1% slower (because the loading of the constant could often be done outside of a loop).
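If you want to check that sum yourself, here is a quick Python sketch using the percentages from the table. The 1/32 figure for the opcode space is the same assumption as the 3.125% in the text, i.e. that the immediate instructions share one of the 32 major opcodes.

```python
# Percentages copied from the table (share of all instructions).
immediate_share = {
    "slli": 1.430902, "andi": 1.314809, "srli": 0.433172, "ori": 0.296973,
    "xori": 0.244295, "sltiu": 0.192047, "srai": 0.131027, "slti": 0.011071,
}

total = sum(immediate_share.values())
opcode_space = 100 / 32  # one major opcode out of 32, as a percentage

print(f"{total:.2f}% of instructions vs {opcode_space:.3f}% of opcode space")
# 4.05% of instructions vs 3.125% of opcode space
```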
Do I seriously suggest ripping those immediate instructions out of the standard? No, of course not. The standard is ratified :-) And they are carrying their weight, collectively, even if ori, xori, sltiu, srai, slti individually are not. It would also make the hardware *more* complex to disable them, given that the ALU supports those operations, and the data path for immediates from the instruction decoder to the ALU has to exist anyway.
It is however a simple mathematical fact that an immediate instruction takes up 128x more encoding space than the corresponding instruction with two register sources. We can add hundreds and hundreds of R-type instructions in future without problems, but it's going to need a very strong justification to add more immediate instructions -- at least within the 32 bit opcode space. Future 48 bit, 64 bit or longer instructions are a different matter.
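The 128x comes straight from the field widths. A tiny Python sketch of the arithmetic, with constant names of my own choosing:

```python
IMM_BITS = 12  # width of the I-type immediate field
RS2_BITS = 5   # width of the R-type second source register field

# With the opcode, funct3, rd and rs1 fields fixed, one immediate instruction
# uses up every value of the 12-bit immediate. A register-register instruction
# uses only the 5-bit rs2 values; the other 7 bits (funct7) pick the operation.
i_type_patterns = 2 ** IMM_BITS
r_type_patterns = 2 ** RS2_BITS

print(i_type_patterns // r_type_patterns)  # 128
```

Put another way, the space taken by one immediate instruction could hold 128 different register-to-register instructions.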
I make an exception for the shift instructions. slli, srli, srai don't use the entire 12 bit immediate field, but only enough bits to encode a shift amount smaller than the register size -- 5 bits for RV32, 6 for RV64. There is room to add more than 100 "shift-like" instructions in the unused all-zero bits of the slli and srli encodings (srai already uses one of these). The proposal for the BitManip extension adds a number of "shift-like" instructions with immediate versions, e.g. sloi, sroi, rori, grevi, gorci.
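A rough Python sketch of where "more than 100" comes from, counting the bit patterns left over in the 12 bit immediate field once the shift amount is set aside. The function name `spare_shift_slots` is mine, and I'm assuming the two funct3 values used by slli and srli/srai are the only ones in play.

```python
def spare_shift_slots(xlen):
    """Patterns of the upper immediate bits available per funct3 value."""
    shamt_bits = xlen.bit_length() - 1  # 5 for RV32, 6 for RV64
    spare_bits = 12 - shamt_bits        # immediate bits not needed for shamt
    return 2 ** spare_bits

USED = 3  # slli, srli and srai each already occupy one pattern

for xlen in (32, 64):
    total = 2 * spare_shift_slots(xlen)  # two funct3 values: slli and srli/srai
    print(f"RV{xlen}: {total} slots, {total - USED} still free")
```

Even on RV64, where the shift amount takes an extra bit, that leaves well over 100 free slots.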
All this does, I think, demonstrate that while RV32I is fairly minimal, it could be made significantly more minimal without huge harm to code size or speed.