To give a concrete example:
I compiled an existing CH32V003 codebase of mine with both 13.3.0-1 and 13.3.0-2, and the size of the output binary increased from 13,136 bytes to 13,264 bytes (128 bytes more)! Note that this codebase makes only minor use of standard C library functions. According to the map file, only the following library functions are linked: save/restore stubs, some multiplication and division routines, and strcat, memset, and memcpy:
C:/[...]/xpack-riscv-none-elf-gcc-13.3.0-2/bin/../lib/gcc/riscv-none-elf/13.3.0/rv32e/ilp32e\libgcc.a(save-restore.o)
obj\Release\flash.o (__riscv_save_2)
C:/[...]/xpack-riscv-none-elf-gcc-13.3.0-2/bin/../lib/gcc/riscv-none-elf/13.3.0/rv32e/ilp32e\libgcc.a(muldi3.o)
obj\Release\main.o (__mulsi3)
C:/[...]/xpack-riscv-none-elf-gcc-13.3.0-2/bin/../lib/gcc/riscv-none-elf/13.3.0/rv32e/ilp32e\libgcc.a(div.o)
obj\Release\uart.o (__divsi3)
C:/[...]/xpack-riscv-none-elf-gcc-13.3.0-2/bin/../lib/gcc/riscv-none-elf/13.3.0/../../../../riscv-none-elf/lib/rv32e/ilp32e\libc_nano.a(libc_a-strcat.o)
obj\Release\main.o (strcat)
C:/[...]/xpack-riscv-none-elf-gcc-13.3.0-2/bin/../lib/gcc/riscv-none-elf/13.3.0/../../../../riscv-none-elf/lib/rv32e/ilp32e\libc_nano.a(libc_a-memset.o)
obj\Release\main.o (memset)
C:/[...]/xpack-riscv-none-elf-gcc-13.3.0-2/bin/../lib/gcc/riscv-none-elf/13.3.0/../../../../riscv-none-elf/lib/rv32e/ilp32e\libc_nano.a(libc_a-memcpy-asm.o)
obj\Release\i2c.o (memcpy)
To pick out a specific example from the above, __divsi3 was formerly 126 bytes but is now 180.
I could imagine that for a codebase that makes much heavier use of stdlib functions, the size difference could be much greater. Maybe in a particularly egregious case you suddenly go from just squeezing into the available flash to no longer fitting at all! :o And you'd have no clue why until you go digging in the map file, assuming you are outputting one at all...
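To put those numbers in proportion, here is a rough sketch of the arithmetic in C. The helper names (growth_permille, print_growth) are made up for illustration and are not part of any toolchain; the sketch assumes the new size is not smaller than the old one.

```c
#include <stdio.h>

/* Hypothetical helper: growth from old_size to new_size, in tenths of a
 * percent, using truncating integer math. Assumes new_size >= old_size
 * and old_size > 0. */
long growth_permille(long old_size, long new_size) {
    return (new_size - old_size) * 1000 / old_size;
}

/* Hypothetical helper: print one before/after comparison line. */
void print_growth(const char *label, long old_size, long new_size) {
    long pm = growth_permille(old_size, new_size);
    printf("%-10s %6ld -> %6ld bytes (+%ld, %ld.%ld%%)\n",
           label, old_size, new_size, new_size - old_size,
           pm / 10, pm % 10);
}
```

Calling print_growth("binary", 13136, 13264) and print_growth("__divsi3", 126, 180) from a main() works out to roughly 0.9% growth for the whole binary, but about 42.8% for __divsi3 on its own.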
He must've had a reason.
The obvious reason would be that there are 9999999 possible combinations of extensions, and each combination takes time to build and space in the download / on the disk, so you don't want to do combinations that no one uses.
But rv32ec for CH32V003 and rv32imac_zicsr_zifencei_zba_zbb_zbkb_zbc_zbs_zcb_zcmp for RP2350 are two for which there is clearly going to be demand!
I've been thinking about making a post for this ... I don't know if people are familiar with Zcmp. I was involved with the committee designing it. I'm not entirely convinced that it's a good use of silicon compared to alternatives -- and it's DEFINITELY only for simple in-order CPUs (like original 1985 Arm...) -- but as Luke implemented it in Hazard3 it would be silly not to use it, as it does save quite a lot of code size.
Example:
void move(char from, char to);

void hanoi(char from, char to, char spare, int n) {
    if (n != 0) {
        hanoi(from, spare, to, n-1);
        move(from, to);
        hanoi(spare, to, from, n-1);
    }
}
With Zcmp, 34 bytes of code:
hanoi:
    cm.push {ra, s0-s3}, -32
    cm.mvsa01 s0,s3
    mv s2,a2
    mv s1,a3
.L3:
    beq s1,zero,.L1
    addi s1,s1,-1
    cm.mva01s s0,s2
    mv a3,s1
    mv a2,s3
    call hanoi
    cm.mva01s s0,s3
    call move
    mv a5,s0
    mv s0,s2
    mv s2,a5
    j .L3
.L1:
    cm.popret {ra, s0-s3}, 32
Without Zcmp, 62 bytes of code:
hanoi:
    addi sp,sp,-32
    sw s0,24(sp)
    sw s1,20(sp)
    sw s2,16(sp)
    sw s3,12(sp)
    sw ra,28(sp)
    mv s0,a0
    mv s3,a1
    mv s2,a2
    mv s1,a3
.L3:
    beq s1,zero,.L1
    addi s1,s1,-1
    mv a1,s2
    mv a0,s0
    mv a3,s1
    mv a2,s3
    call hanoi
    mv a0,s0
    mv a1,s3
    call move
    mv a5,s0
    mv s0,s2
    mv s2,a5
    j .L3
.L1:
    lw ra,28(sp)
    lw s0,24(sp)
    lw s1,20(sp)
    lw s2,16(sp)
    lw s3,12(sp)
    addi sp,sp,32
    jr ra
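If anyone wants to sanity-check the hanoi example on a host machine before looking at the generated code, here is a minimal sketch. The post only declares move(), so the body below is an assumption that just counts calls; move_count and count_moves() are likewise made up for illustration.

```c
#include <stdio.h>

/* Assumed for illustration: the original example only declares move().
 * This stand-in simply counts how many times it is called. */
static unsigned long move_count;

void move(char from, char to) {
    (void)from;
    (void)to;
    move_count++;
}

/* Same function as in the example above. */
void hanoi(char from, char to, char spare, int n) {
    if (n != 0) {
        hanoi(from, spare, to, n - 1);
        move(from, to);
        hanoi(spare, to, from, n - 1);
    }
}

/* Hypothetical wrapper: run hanoi for n discs and return the move count. */
unsigned long count_moves(int n) {
    move_count = 0;
    hanoi('A', 'C', 'B', n);
    return move_count;
}
```

Called from a main(), count_moves(n) should come out to 2^n - 1, so 7 moves for 3 discs. That is only a check that the C behaves as expected; it says nothing about code size, which depends on the target and the -march string.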