Author Topic: GCC ARM32 compiler too clever, or not clever enough? (Read 13921 times)

eutectique · « **Reply #150 on:** November 05, 2022, 11:41:06 pm »

Quote from: peter-h on November 05, 2022, 08:21:45 am

So, yeah, -O0 does not remove unreachable code!

Yes, it does. The record in the map file

 .text.HAL_I2C_DisableListen_IT
                0x0000000000000000       0x3a ./dev/libsdk.a(stm32l4xx_hal_i2c.o)
 .text.HAL_I2C_Master_Abort_IT
                0x0000000000000000       0x84 ./dev/libsdk.a(stm32l4xx_hal_i2c.o)
 .text.HAL_I2C_EV_IRQHandler
                0x0000000000000000       0x10 ./dev/libsdk.a(stm32l4xx_hal_i2c.o)

merely tells you that the linker saw the functions, that their sizes are 0x3a, 0x84, and 0x10, and that it did not link them into the final ELF file: the address is 0x0000000000000000.

Conversely, if a function does make it into the ELF, the record in the map file looks like this:

Code: [Select]

 .text.HAL_I2C_Init
                0x000000000800a4fc       0xbc ./dev/libsdk.a(stm32l4xx_hal_i2c.o)
                0x000000000800a4fc                HAL_I2C_Init
 .text.HAL_I2CEx_ConfigAnalogFilter
                0x000000000800a5b8       0x5c ./dev/libsdk.a(stm32l4xx_hal_i2c_ex.o)
                0x000000000800a5b8                HAL_I2CEx_ConfigAnalogFilter
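
If you want to check this mechanically, here is a minimal sketch in C that applies the same rule to the address/size line of a map record. The function name map_entry_state is made up for this example, and the "address zero means not linked" test is only the heuristic described above; it would be wrong on a target where code can legitimately live at address 0.

```c
#include <stdio.h>

/* Hypothetical helper, not part of any toolchain.
 * Classifies the address/size line of a GNU ld map record, e.g.
 *   "    0x000000000800a4fc       0xbc ./dev/libsdk.a(stm32l4xx_hal_i2c.o)"
 * Returns  1 if the section has a non-zero address (linked),
 *          0 if the address is zero (treated as discarded),
 *         -1 if the line is not an address/size line at all. */
int map_entry_state(const char *line)
{
    unsigned long long addr = 0, size = 0;

    /* Both fields must carry an explicit 0x prefix, so a symbol line such as
     * "0x000000000800a4fc   HAL_I2C_Init" is rejected rather than misparsed. */
    if (sscanf(line, " 0x%llx 0x%llx", &addr, &size) != 2)
        return -1;

    return addr != 0 ? 1 : 0;
}
```

Feeding it the second line of each record from the listings above should give 0 for HAL_I2C_Master_Abort_IT and 1 for HAL_I2C_Init.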

You can always check which symbols are in the ELF, and their sizes, with the nm utility:

Code: [Select]

> arm-none-eabi-nm --print-size --size-sort censored/censored.elf | grep HAL_I2C
0800a614 00000058 T HAL_I2CEx_ConfigDigitalFilter
0800a5b8 0000005c T HAL_I2CEx_ConfigAnalogFilter
08008afc 0000006c T HAL_I2C_MspInit
0800a4fc 000000bc T HAL_I2C_Init

or to display it in decimal:

Code: [Select]

> arm-none-eabi-nm --print-size --size-sort --radix=d censored/censored.elf | grep HAL_I2C
134260244 00000088 T HAL_I2CEx_ConfigDigitalFilter
134260152 00000092 T HAL_I2CEx_ConfigAnalogFilter
134253308 00000108 T HAL_I2C_MspInit
134259964 00000188 T HAL_I2C_Init

You cannot have addresses in hex and sizes in decimal.

As was already noted, LTO gives even more space savings. But beware: the vector table is likely to vanish from the ELF, because nothing references it. To avoid this, add __attribute__((used)) to it.
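
A minimal sketch of what that can look like, assuming GCC-style attributes. The handler names and the three-entry table are placeholders for illustration; a real Cortex-M table starts with the initial stack pointer and is normally placed in a dedicated section by the linker script, both of which are left out here.

```c
/* Placeholder handlers, for illustration only. */
void Reset_Handler(void)   { }
void Default_Handler(void) { }

typedef void (*isr_t)(void);

/* "used" asks the compiler to emit the object even though nothing in the
 * program refers to it, so LTO does not discard it as dead data. */
__attribute__((used))
const isr_t vector_table[] = {
    Reset_Handler,    /* simplified: a real table has the stack pointer first */
    Default_Handler,
    Default_Handler,
};

enum { VECTOR_COUNT = sizeof vector_table / sizeof vector_table[0] };
```

Note that the attribute only concerns the compiler. If the linker is also garbage-collecting sections, the linker script usually needs to wrap the vector section in KEEP() as well.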

brucehoult · « **Reply #151 on:** November 06, 2022, 12:58:49 am »

Quote from: peter-h on November 05, 2022, 04:58:06 pm

My conclusion is that -O0 includes all sources loaded into the Cube editor structure.

NEVER use -O0 (or, equivalently, no -O option at all). It is just awful. It is not just lack of optimisation, it is active pessimisation on modern ISAs.

C code compiled with -O0 runs slower than JavaScript!

SiliconWizard · « **Reply #152 on:** November 06, 2022, 01:00:06 am »

Quote from: brucehoult on November 06, 2022, 12:58:49 am

C code compiled with -O0 runs slower than JavaScript!

Maybe not quite, but it's really bad indeed!

brucehoult · « **Reply #153 on:** November 06, 2022, 01:10:33 am »

Quote from: SiliconWizard on November 06, 2022, 01:00:06 am

Quote from: brucehoult on November 06, 2022, 12:58:49 am
C code compiled with -O0 runs slower than JavaScript!

Maybe not quite, but it's really bad indeed!

No, seriously, it does, except on very short-running programs [1]. v8 and Nitro are amazing with their multiple levels of JIT and runtime code profiling.

[1] and even then v8 is always going to beat combined gcc/llvm compile time plus run time.

peter-h · « **Reply #154 on:** November 06, 2022, 08:27:43 am »

If -O0 does not remove unreachable code, that means the level of bloat is just incredible. However, very relevantly, my product does still run then, perfectly, which just shows that this doesn't matter much in my case, except that the larger code would actually cause problems for unrelated reasons.

Yes I should have spotted that in

Code: [Select]

.text.HAL_I2C_Master_Abort_IT
                0x0000000000000000       0x84 ./dev/libsdk.a(stm32l4xx_hal_i2c.o)

the zero load address means it is not there. I thought the size of 0x84 meant it was present.

Learn something every day.

For JavaScript, I use freelancer.com, with wildly varying results.

SiliconWizard · « **Reply #155 on:** November 06, 2022, 07:44:41 pm »

Seriously, I too really, really advise against using -O0 for any production code. If, for any reason, your code doesn't "work" with any optimization level using industry-standard tools such as GCC with official versions, then there's definitely something wrong with it and letting it pass with bandaids (like, oh, it appears to work with -O0) doesn't sound good.

Now, the potential "bugs" that seem mitigated with -O0 may be in 3rd-party libraries that you seem to be very reliant on, and I don't blame you for not being willing (or not having the time) to debug other people's code. But sometimes you just don't have a choice.

Now if it's just for debugging purposes, you'd do as many do, build for debugging with appropriate options, and optimized builds for releases.
But if something badly breaks once you use any kind of optimization, this is definitely not a good sign as to the robustness of the code.

peter-h · « **Reply #156 on:** November 06, 2022, 09:14:16 pm »

Quote

letting it pass with bandaids (like, oh, it appears to work with -O0) doesn't sound good.

As my project stands right now, the only -O0 bits are specific functions which are in a "boot block" and have no access to stdlib, and have loops which were getting replaced with memcpy() etc. Now, I know this can be blocked (and has been with -fno-tree-loop-distribute-patterns), but I left these so I don't have to re-test them, and in case that compiler option got dropped off one day by accident.
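
For reference, a sketch of how that can be tied to the function itself instead of the build flags, assuming GCC. The name boot_copy and the macro are invented for this example. GCC's optimize attribute takes the option name without the leading -f; the GCC manual describes this attribute as intended for debugging rather than production code, so treat it as one option to evaluate, not a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Only GCC understands this attribute, so other compilers get an empty macro. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_TO_MEMCPY \
    __attribute__((optimize("no-tree-loop-distribute-patterns")))
#else
#define NO_LOOP_TO_MEMCPY
#endif

/* Hypothetical boot-block copy routine: a plain byte loop that must stay a
 * loop, because memcpy() is not available where this code runs. */
NO_LOOP_TO_MEMCPY
void boot_copy(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

Checking the disassembly of the boot block for calls to memcpy or memset is still worth doing after any compiler upgrade.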

Quote

may be in 3rd-party libraries that you seem to be very reliant on, and I don't blame you for not willing (/having time) to debug other people's code. But sometimes you just don't have a choice.

I now use very few of the bloated ST HAL functions. I've been busy either removing these or replacing them with local versions, stripped down to only the bare minimum required. The SPI code has been eliminated and replaced mostly with a single generic DMA function.

Quote

and optimized builds for releases.

That is commonly done but the amount of testing (of what is now a complex product, with 2 years of my code, plus ETH, LWIP, MBEDTLS, USB CDC & MSC, FATFS, http server, https client, etc) is more than I want to do, and -Og is so damn close to anything else, that it is hardly worth the time.

And since, as they say, 99% of CPU time is spent in 1% of the code, one can always put a -O3 attribute on a specific function. But, as with assembler, the biggest speedups come not from making code faster but from doing it differently altogether. As I posted before, I once speeded up an IAR Z180 float sscanf about 1000x by specialising it for the actual input format which was always xx.yyyy (it was an HPGL to Postscript converter). I also wrote it in assembler, but that was secondary. Biggest speedups come from cunning use of hardware e.g. I have got a tracking waveform generator which runs almost no software (just DMA, timers, etc).
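
To illustrate the "do it differently" point, here is a rough C sketch of a parser specialised for a fixed xx.yyyy field. It is not the original Z180 code (which was assembler and produced a float); the function name and the choice to return an integer count of 1/10000 units are assumptions made for this example. It also shows a per-function -O3 request via GCC's optimize attribute.

```c
#include <stdint.h>

/* Per-function optimisation level, GCC only; other compilers get nothing. */
#if defined(__GNUC__) && !defined(__clang__)
#define HOT_O3 __attribute__((optimize("O3")))
#else
#define HOT_O3
#endif

/* Hypothetical specialised parser: accepts exactly two digits, a '.', and
 * four digits, and returns the value in units of 1/10000 (so "12.3456"
 * becomes 123456). Returns -1 if the first seven characters do not match.
 * Anything after the seventh character is not examined. */
HOT_O3
int32_t parse_fixed_2_4(const char *s)
{
    int32_t value = 0;

    for (int i = 0; i < 7; i++) {
        if (i == 2) {
            if (s[i] != '.')
                return -1;
            continue;
        }
        if (s[i] < '0' || s[i] > '9')
            return -1;          /* also stops safely at an early '\0' */
        value = value * 10 + (s[i] - '0');
    }
    return value;
}
```

No sign handling, no exponent, no locale, no floating point: most of the saving comes from everything a general sscanf has to support and this does not.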

Some HAL code does not work with -O3, but I don't think I have any of that now. A lot of it was in the "min CS=1 time" department where if you called 2 functions in rapid succession, the time between them was too short. There is also an awful lot of stuff on github which was somebody's "work in progress on some 16MHz AVR, before he got bored" which runs at 168MHz only by luck. Especially if it involves driving chips like SPI FLASH chips. Basically there is a lot of software out there which has to be used with great caution. I spent much of yesterday digging into some 3 year old code driving a STLED316 display driver, which falls over unless you have a ~10us gap between setting the digit cursor and sending the digit data. So I made it 20us and used a dedicated timing function (written in asm) to do it. It previously worked by accident.
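
As a sketch of making such a gap explicit instead of accidental, the C below converts a required delay into CPU cycles and busy-waits on a free-running counter. This is not the asm routine mentioned above. The function names are invented, and the counter is passed in as a callback because how you read a cycle counter depends on the target.

```c
#include <stdint.h>

/* Convert microseconds to CPU cycles, rounding up so the delay is never
 * shorter than requested. 64-bit intermediate avoids overflow. */
static inline uint32_t us_to_cycles(uint32_t us, uint32_t cpu_hz)
{
    return (uint32_t)(((uint64_t)us * cpu_hz + 999999u) / 1000000u);
}

/* Hypothetical hook: returns a free-running 32-bit cycle count. */
typedef uint32_t (*cycle_counter_fn)(void);

/* Busy-wait until at least `cycles` counter ticks have elapsed.
 * Unsigned subtraction keeps this correct across counter wrap-around. */
void delay_cycles(cycle_counter_fn now, uint32_t cycles)
{
    uint32_t start = now();
    while ((uint32_t)(now() - start) < cycles) {
        /* spin */
    }
}
```

With the figures from the post, us_to_cycles(20, 168000000u) gives 3360 cycles, and the delay then stays the same regardless of optimisation level.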

newbrain · « **Reply #157 on:** November 10, 2022, 08:51:53 pm »

Quote from: peter-h on November 06, 2022, 09:14:16 pm

As my project stands right now, the only -O0 bits are specific functions which are in a "boot block" and have no access to stdlib, and have loops which were getting replaced with memcpy()

Instead of renouncing optimization, did you evaluate using -fno-builtin instead of -O0 for the specific function?
Or even extracting the loops as a static inline function, with appropriate attributes.

