Topic: STM 32F4 FPU registers and main() gotcha

peter-h (Topic starter)

  • Super Contributor
  • ***
  • Posts: 3992
  • Country: gb
  • Doing electronics since the 1960s...
STM 32F4 FPU registers and main() gotcha
« on: July 22, 2024, 01:50:47 pm »
I wonder why this
http://www.efton.sk/STM32/gotcha/g203.html
does not cause loads of trouble all over the place.

AIUI, it relates to C compilers treating the function main() differently when it comes to FPU stack operations. This is pretty weird, to be generating different code for a function, based on its name!

Maybe it is because I am using GCC (v11), and this version of GCC just happens to work, i.e. it does not emit the extra stack pushes/pops.

To work around this, the FPU enable code would need to go into the startup.s code i.e. before main() is entered. I am doing that but purely by accident; my startupxxx.s code called b_main() and that starts the FPU with

Code: [Select]
// ========== This was in SystemInit() ============

#if (__FPU_PRESENT == 1) && (__FPU_USED == 1)
SCB->CPACR |= ((3UL << 10*2)|(3UL << 11*2));  /* set CP10 and CP11 Full Access */
#endif

That code is commonly used Cube MX ("HAL") stuff which you find all over the internet...

FreeRTOS seems to do it again when it starts up (inside main() this time):

Code: [Select]
/* Ensure the VFP is enabled - it should be anyway. */
vPortEnableVFP();

/* Lazy save always. */
*( portFPCCR ) |= portASPEN_AND_LSPEN_BITS;

and vPortEnableVFP() contains

Code: [Select]
/* This is a naked function. */
static void vPortEnableVFP( void )
{
__asm volatile
(
" ldr.w r0, =0xE000ED88 \n" /* The FPU enable bits are in the CPACR. */
" ldr r1, [r0] \n"
" \n"
" orr r1, r1, #( 0xf << 20 ) \n" /* Enable CP10 and CP11 coprocessors, then save back. */
" str r1, [r0] \n"
" bx r14 "
);
}

Does this make sense to anyone? It seems to be working by accident, but it is a really weird thing as it is C compiler dependent, and to be sure you want to enable the FPU in the startup.s code.
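
For reference, the C statement from SystemInit() and the FreeRTOS assembly set the same four CPACR bits. Below is a minimal host-side sketch of that arithmetic; the macro and function names are made up for illustration, and only the shift amounts come from the two snippets above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the values are copied from the two snippets above. */
#define CPACR_MASK_FROM_C    ((3UL << 10 * 2) | (3UL << 11 * 2)) /* SystemInit() form */
#define CPACR_MASK_FROM_ASM  (0xFUL << 20)                       /* vPortEnableVFP() form */

/* Compile-time check (C11) that both forms select bits 20..23, i.e. CP10 and CP11. */
static_assert(CPACR_MASK_FROM_C == CPACR_MASK_FROM_ASM, "masks differ");

/* Returns the CPACR value after a read-modify-write that enables the FPU. */
static uint32_t cpacr_with_fpu_enabled(uint32_t cpacr_before)
{
    return cpacr_before | (uint32_t)CPACR_MASK_FROM_ASM;
}
```

So running both SystemInit() and vPortEnableVFP() is redundant rather than conflicting: the second OR leaves the register unchanged.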
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline wek

  • Frequent Contributor
  • **
  • Posts: 525
  • Country: sk
Re: STM 32F4 FPU registers and main() gotcha
« Reply #1 on: July 22, 2024, 02:06:10 pm »
Quote
I wonder why this
http://www.efton.sk/STM32/gotcha/g203.html
does not cause loads of trouble all over the place.
Because main() usually does not contain FP operations, so usually the compiler does not need to stack FP registers.

Usually, main() consists only of a bunch of function calls. And, usually, those functions - especially if they handle FP - are located in separate files, and thus are not subject to inlining.

Even with moderate FP usage within a function there's probably no stacking. I don't remember the details of the ABI, but there are many FP registers, so the callee probably doesn't need to preserve some of them.

The problem happened to me because I don't write programs in the usual way, so quite a significant portion of my programs tend to be either explicitly, or inlined, in main() (I love spaghetti, and have and use a spaghetti-making machine).

Quote
my startupxxx.s code called b_main() and that starts the FPU
b_main() is a C function, and as such, it is vulnerable to the same problem, potential stacking of FP registers - and it does not happen for the same reason: you most probably have no FP operations in that function.

Quote
Code: [Select]
/* This is a naked function. */
static void vPortEnableVFP( void )

If it's naked indeed (i.e. there is somewhere a prototype with __attribute__((naked))), then there's no C prologue thus no registers stacking and no vulnerability of the kind described. However, the functions leading to calling that vPortEnableVFP() *are* vulnerable - but, again, FreeRTOS functions most probably have no FP operations in them.

JW
« Last Edit: July 22, 2024, 02:13:20 pm by wek »
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4392
  • Country: nz
Re: STM 32F4 FPU registers and main() gotcha
« Reply #2 on: July 22, 2024, 02:50:52 pm »
Quote
I wonder why this
http://www.efton.sk/STM32/gotcha/g203.html
does not cause loads of trouble all over the place.

Nothing STM or even Arm-specific in that.

If you're going to use an FPU (or vector unit, on ISAs / cores that have them) then you need to enable them before running a function that uses them, where "using" could involve arithmetic or, yes, storing or loading FPU registers.

You can perfectly well do that in main(), just as long as main() is running in privileged mode and doesn't itself use the FPU (etc) before initialising it -- including using it by saving registers in the prologue.

This will apply to anything that has an initially-disabled functional unit: It's certainly true on RISC-V (both FPU and Vector units, if present and used, need to be changed from "Off" to "Initial" or "Clean" in the mstatus.FS and mstatus.VS fields) and I'd imagine it is similar on x86, MIPS, PowerPC, ... too.
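
To make the RISC-V case concrete, here is a small host-side sketch of the bit manipulation involved. It assumes the field positions from the RISC-V privileged and vector specifications (FS in mstatus[14:13], VS in mstatus[10:9]); the helper and enum names are hypothetical, and on a real core the result would still have to be written back with a CSR instruction, which is not shown.

```c
#include <stdint.h>

/* Assumed field positions: FS = mstatus[14:13], VS = mstatus[10:9]. */
#define MSTATUS_FS_SHIFT 13u
#define MSTATUS_VS_SHIFT 9u

/* The four states a 2-bit extension-status field can take. */
enum ext_state { EXT_OFF = 0, EXT_INITIAL = 1, EXT_CLEAN = 2, EXT_DIRTY = 3 };

/* Hypothetical helper: returns mstatus with the 2-bit field at 'shift' set to 'state'. */
static uint64_t mstatus_set_field(uint64_t mstatus, unsigned shift, enum ext_state state)
{
    mstatus &= ~((uint64_t)3 << shift);   /* clear the old state */
    mstatus |= (uint64_t)state << shift;  /* insert the new state */
    return mstatus;
}
```

Moving FS from Off to Initial is the RISC-V counterpart of setting the CP10/CP11 bits in CPACR on the Cortex-M4.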
 

peter-h (Topic starter)
Re: STM 32F4 FPU registers and main() gotcha
« Reply #3 on: July 22, 2024, 03:13:36 pm »
Thank you both.

I am certainly not using floats before enabling the FPU (which is done in b_main() which then does a long jump to main() which never returns) and I would hope that if I was, it would comprehensively not work :)

It is probably by accident that main() does not use floats currently. I do have some printf() debug calls in there (printf() being mapped to come out on the SWV ITM debug port) which output longs but not floats. If they were floats, would that matter? I am confused.
 

Offline dietert1

  • Super Contributor
  • ***
  • Posts: 2326
  • Country: br
    • CADT Homepage
Re: STM 32F4 FPU registers and main() gotcha
« Reply #4 on: July 22, 2024, 03:14:43 pm »
Today something similar happened when i worked on a small Win32 test app (network client).
There were no FPU operations in main(), but some in a thread started with CreateThread(). The app failed with "FPU not initialized" error. I solved the problem using _beginthreadex() instead and it worked. I learned that _beginthreadex() includes necessary CRT initializations.

Regards, Dieter
 

peter-h (Topic starter)
Re: STM 32F4 FPU registers and main() gotcha
« Reply #5 on: July 22, 2024, 03:41:14 pm »
That however is not the same thing. It is obvious that float ops with an uninitialised FPU are not going to work.
 

dietert1
Re: STM 32F4 FPU registers and main() gotcha
« Reply #6 on: July 22, 2024, 04:03:13 pm »
I reported an actual incident and how the FPU remained uninitialized. Not on a STM32, but on Win32. Probably one can do something similar on a STM32.
 

wek
Re: STM 32F4 FPU registers and main() gotcha
« Reply #7 on: July 22, 2024, 04:19:24 pm »
Quote
I am certainly not using floats before enabling the FPU

The gotcha is that even if you don't use floats in a function before enabling the FPU, the compiler can do so.

If there are FP operations in a function - anywhere in that function - and those operations are so extensive that the compiler can't perform them using only the "callee-modifiable (*)" FP-registers, it then stacks the "callee-saves" FP-registers, and does so in the function's prologue, ie. before any C line is executed. Normally, main() is no exception in this regard (there is/are command-line flag/s which can make it an exception, though; but that might be topic for a different discussion).

Now if you jump to main() after FP has been enabled, no part of main() executes before enabling FP, thus your main() is safe.  If you enable FP in a different C function, that function is not safe; but again, you are not likely to do any FP operations (enabling FP itself does not count, as it does not use FP registers and FP instructions) in that function.

And no worry: should you be caught by this gotcha, it's an immediate 100% fault (I'm not sure which one, but unless handled individually they normally all escalate to HardFault anyway).

(*)
Quote from: ARM ABI ("Procedure Call Standard for the Arm® Architecture", chapter 6 The Base Procedure Call Standard, subchapter 6.1.2.1 VFP register usage conventions)
Registers s16-s31 (d8-d15, q4-q7) must be preserved across subroutine calls; registers s0-s15 (d0-d7, q0-q3) do not need to be preserved
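
A minimal sketch of the safe layout, under stated assumptions: fake_cpacr is a host stand-in for SCB->CPACR so the code builds anywhere, all function names are hypothetical, and noinline is a GCC/Clang attribute. The idea is that the enabling function contains no FP, while the FP-heavy code lives in its own function, so any stacking of s16-s31 happens in that function's prologue, after the enable.

```c
#include <stdint.h>

/* Host stand-in for SCB->CPACR; on the STM32 this would be the real
   register at 0xE000ED88. */
static volatile uint32_t fake_cpacr;

/* No FP in here, so the compiler has no reason to stack s16-s31 in the prologue. */
static void enable_fpu(void)
{
    fake_cpacr |= (0xFUL << 20);   /* CP10 and CP11 full access */
}

/* noinline keeps the FP code, and any push of callee-saved FP registers,
   inside this function's own prologue instead of the caller's. */
__attribute__((noinline)) static float fp_heavy(float x)
{
    return x * x + 1.0f;
}

/* Hypothetical entry function: FP code is only reached after the enable. */
static float entry(void)
{
    enable_fpu();
    return fp_heavy(2.0f);
}
```

Whether a given build would actually stack anything depends on the compiler, the flags and how much FP state the function needs, so the disassembly of the real entry function is the final word.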
JW
« Last Edit: July 22, 2024, 04:26:10 pm by wek »
 

peter-h (Topic starter)
Re: STM 32F4 FPU registers and main() gotcha
« Reply #8 on: July 22, 2024, 04:32:01 pm »
This FPU stuff is above my pay grade :) but is the problem that the stacking of the FPU registers fails if the FPU is not enabled? Then I can understand it. Those registers are unlikely to be accessible if the FPU is not enabled (same with SPI etc etc).

So you will be stacking garbage, and then when this is popped, the FPU is loaded with garbage. Or will the CPU get a permanent "wait state" from the non-enabled FPU?
« Last Edit: July 22, 2024, 04:35:21 pm by peter-h »
 

Offline harerod

  • Frequent Contributor
  • **
  • Posts: 469
  • Country: de
  • ee - digital & analog
    • My services:
Re: STM 32F4 FPU registers and main() gotcha
« Reply #9 on: July 22, 2024, 04:54:18 pm »
Quote
... Those registers are unlikely to be accessible if the FPU is not enabled (same with SPI etc etc). ...

Which is implied in the footnotes of the article you linked in your initial post: http://www.efton.sk/STM32/gotcha/g203.html
The FPU seems to be no different from any other peripheral on the STM32 - enable before first access. This may require a combination of power and clock.
I haven't used the STM32F4 FPU in such a long time, although I designed heaps of devices based on that MCU. During the first tests I wrote setup routines based on the datasheet.
This may have been before CooCox and Atollic became available. Where have the last ten years gone?
 

dietert1
Re: STM 32F4 FPU registers and main() gotcha
« Reply #10 on: July 22, 2024, 05:04:06 pm »
Other people experienced hard faults with STM32 FPU while using FreeRTOS. Apparently initialization of FPU isn't 100 % automatic. In my Win32 case i have to include some FPU usage in main() in order to make it work in the thread.
 

wek
Re: STM 32F4 FPU registers and main() gotcha
« Reply #11 on: July 22, 2024, 06:03:29 pm »
The FPU is part of the processor core, so it's not like other peripherals. This is ARM's rules, not ST's.

So, if you don't enable it, and attempt to access its registers, the processor throws UsageFault (ARM® v7-M Architecture Reference Manual, B1.6.3 Pseudocode details of FP operation). If you don't have UsageFault enabled - which is the default - then it escalates to HardFault.
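
A minimal sketch of how a fault handler could tell this case apart from other faults, assuming the ARMv7-M register layout: CFSR sits at 0xE000ED28 with UFSR in its upper halfword, NOCP is UFSR bit 3 (CFSR bit 19), and SHCSR bit 18 (USGFAULTENA) enables a separate UsageFault handler. The function name is hypothetical, and the register value is passed in as a plain integer so the logic can be tried on a host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed ARMv7-M bit positions, see the lead-in. */
#define CFSR_NOCP          (1UL << 19)  /* UFSR.NOCP: coprocessor access while disabled */
#define SHCSR_USGFAULTENA  (1UL << 18)  /* set in SHCSR to stop escalation to HardFault */

/* True if the CFSR value read inside the fault handler reports a "no coprocessor"
   UsageFault, which is what an FP instruction with the FPU disabled raises. */
static bool fault_is_nocp(uint32_t cfsr)
{
    return (cfsr & CFSR_NOCP) != 0;
}
```

On the target, the argument would be the value read from CFSR at the start of the HardFault or UsageFault handler.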

JW
 

wek
Re: STM 32F4 FPU registers and main() gotcha
« Reply #12 on: July 22, 2024, 06:09:57 pm »
Quote
Other people experienced hard faults with STM32 FPU while using FreeRTOS.

That is not necessarily a consequence of *late* enabling of the FPU (i.e. accessing FP registers or executing FP instructions before enabling the FPU), as discussed in this thread.

For example, if the FPU is enabled, upon interrupt/exception, the processor stacks (or reserves stack for, if lazy stacking is enabled, which is the default) half of the FPU registers, plus one status word (plus alignment if set so). That's an extra 17-20 words, or up to an extra 80 bytes, and that may be the difference between stack overflow or not.
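
The frame arithmetic can be sketched as follows. The word counts are an assumption based on the ARMv7-M exception frame layout (8 words for the basic frame; s0-s15, FPSCR and one reserved word for the FP extension), and the function name is hypothetical.

```c
#include <stdint.h>

#define BASIC_FRAME_WORDS   8u   /* r0-r3, r12, lr, pc, xPSR */
#define FP_EXTENSION_WORDS  18u  /* s0-s15, FPSCR, one reserved word */

/* Bytes pushed (or reserved, with lazy stacking) on exception entry.
   'padding' is nonzero if the core inserts one word to align the stack. */
static uint32_t exception_frame_bytes(int fp_context_active, int padding)
{
    uint32_t words = BASIC_FRAME_WORDS;
    if (fp_context_active) {
        words += FP_EXTENSION_WORDS;
    }
    if (padding) {
        words += 1u;
    }
    return words * 4u;
}
```

Under these assumptions each nested exception level costs 72 to 76 bytes more once an FP context is active, which adds up quickly on a small task stack.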

JW
 

dietert1
Re: STM 32F4 FPU registers and main() gotcha
« Reply #13 on: July 22, 2024, 06:59:32 pm »
No, the person had enough stack space and fixed the problem by "manually" setting the FPU control register.
 

wek
Re: STM 32F4 FPU registers and main() gotcha
« Reply #14 on: July 22, 2024, 07:26:12 pm »
Quote
No, the person had enough stack space and fixed the problem by "manually" setting the FPU control register.
Interesting.

Can you please give some links?

Thanks,

JW
 

dietert1
Re: STM 32F4 FPU registers and main() gotcha
« Reply #15 on: July 22, 2024, 08:04:59 pm »
https://forums.freertos.org/t/cortex-m4-hard-fault-when-using-floating-point-unit/10180
Now that i read it once more, it isn't that clear whether increasing stack helped or using the CPACR register or both.
« Last Edit: July 22, 2024, 08:10:26 pm by dietert1 »
 

peter-h (Topic starter)
Re: STM 32F4 FPU registers and main() gotcha
« Reply #16 on: July 22, 2024, 08:50:56 pm »
Now I see why FreeRTOS (or at least my port of it, which was done by the guy who started off my Cube IDE project) enables the FPU when it starts up...
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 15156
  • Country: fr
Re: STM 32F4 FPU registers and main() gotcha
« Reply #17 on: July 22, 2024, 09:53:40 pm »
Quote from: brucehoult
Quote
I wonder why this
http://www.efton.sk/STM32/gotcha/g203.html
does not cause loads of trouble all over the place.

Nothing STM or even Arm-specific in that.

If you're going to use an FPU (or vector unit, on ISAs / cores that have them) then you need to enable them before running a function that uses them, where "using" could involve arithmetic or, yes, storing or loading FPU registers.

You can perfectly well do that in main(), just as long as main() is running in privileged mode and doesn't itself use the FPU (etc) before initialising it -- including using it by saving registers in the prologue.

This will apply to anything that has an initially-disabled functional unit: It's certainly true on RISC-V (both FPU and Vector units, if present and used, need to be changed from "Off" to "Initial" or "Clean" in the mstatus.FS and mstatus.VS fields) and I'd imagine it is similar on x86, MIPS, PowerPC, ... too.

I still recommend doing that kind of initialization in the startup code rather than in the main() function, unless your firmware does enable/disable the FPU (or other similar functionalities) on the fly during normal execution.
One reason is separation of concerns. Another is what (from what I get) the OP describes: if the FPU (or vector, etc) is not enabled before entering main, then indeed any FP operation done inside main could cause a problem because the compiler may reorder some FP instructions before you actually enable the FPU.

Similarly for the vector extension on RISC-V, if you enable the extension in the compiler flags, the compiler may use vector instructions even just for copying data (tested!), so that would raise an exception if the vector unit hasn't been enabled prior.
 

peter-h (Topic starter)
Re: STM 32F4 FPU registers and main() gotcha
« Reply #18 on: July 22, 2024, 09:58:55 pm »
Quote
One reason is separation of concerns. Another is what (from what I get) the OP describes: if the FPU (or vector, etc) is not enabled before entering main, then indeed any FP operation done inside main could cause a problem because the compiler may reorder some FP instructions before you actually enable the FPU.

If I get you right, when you say main() you actually mean any C function which enables the FPU for the first time and which might perform a floating operation later on.

What I actually have is

startupxx.s -> b_main.c (enables the FPU but definitely does not use it) -> main.c (does not currently use floats, and starts FreeRTOS which enables the FPU again)

So this should be OK but I need to comment in b_main.c to never do any floats there. Funny thing is one could do doubles ;)

Or is there a specific compiler treatment of the function name "main()" ? Example: GCC requires main() to be a type int - it cannot be a void.
« Last Edit: July 22, 2024, 10:07:51 pm by peter-h »
 

SiliconWizard
Re: STM 32F4 FPU registers and main() gotcha
« Reply #19 on: July 22, 2024, 10:12:53 pm »
Quote
One reason is separation of concerns. Another is what (from what I get) the OP describes: if the FPU (or vector, etc) is not enabled before entering main, then indeed any FP operation done inside main could cause a problem because the compiler may reorder some FP instructions before you actually enable the FPU.

If I get you right, when you say main() you actually mean any C function which enables the FPU for the first time and which might perform a floating operation later on.

Yes.

Is there a reason you don't enable the FPU in the startup assembly code? That would certainly make your life easier.

Now if you have b_main.c (which as I get it contains a C function that enables the FPU among other things) and the main() function in another source file (another compilation unit, so don't include one source file into the other), then you don't run the risk of having functions inlined into one another and thus of some instructions being reordered. So, you should be fine.

In a somewhat related matter, in a preemptive multitasker that I wrote (that I won't call "RTOS" quite yet) - at this point for RISC-V targets - in order to optimize context saving depending on the task, each task has a set of options, among which whether the FPU and/or vector unit is used in said task. If not, the corresponding registers are not saved/restored when switching from/to this task. Tasks are independent functions that are called by address (like ISRs) by the scheduler, so there is no risk of unintended code being run within one task as long as we don't explicitly do it.
 

brucehoult
Re: STM 32F4 FPU registers and main() gotcha
« Reply #20 on: July 23, 2024, 12:33:09 am »
Quote
I still recommend doing that kind of initialization in the startup code rather than in the main() function, unless your

Sure, of course that is preferable, if it is under your control.
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4277
  • Country: us
Re: STM 32F4 FPU registers and main() gotcha
« Reply #21 on: July 23, 2024, 05:34:32 am »
Quote
that kind of initialization in the startup code
I've seen that a lot of the project-creation vendor packages (STM Cube, Atmel Start, etc) are set up to call assorted startup code before main.  Clocks, memory wait states, floating point, RAM vector tables, privilege stuff, etc.
I've always found this a bit annoying, since it means the code I most often want to look at (to figure out how to do it myself) is "hidden" and can be hard to find.  (OK, trace upwards from reset_handler to systemInit to boardInit to blahEvalInit, etc, all in some "library" or "system" directory not considered part of your project, sort of.)

I guess this is an example of why that is a good idea...

I'll also point out that gcc has an __attribute__((main)) to label the top-level function that is never expected to return, and is therefore OK to leave out the initial context saving...
 

peter-h (Topic starter)
Re: STM 32F4 FPU registers and main() gotcha
« Reply #22 on: July 23, 2024, 06:16:19 am »
Quote
I've seen that a lot of the project-creation vendor packages (STM Cube, Atmel Start, etc) are set up to call assorted startup code before main.  Clocks, memory wait states, floating point, RAM vector tables, privilege stuff, etc.

Yes; exactly. The startupxxx.s file called a C function called SystemInit(). That, among a load of other stuff, contained the above C code for FPU startup. It resided somewhere in ST's "HAL" function library. I had no idea why they did it that way (well, there was a different version for every CPU). It may have been random "luck" or they knew about this issue. However I found that a lot of stuff which was in SystemInit() was later duplicated in main() so I rationalised it and put the content of SystemInit() in b_main.c which is my "32k boot block" (discussed in other threads on overlays etc; my main.c is linked to be at base+32k). The end of startupxxx.s jumps to b_main().

I am also not that good with arm32 asm (having done Z80 asm for 30+ years :) ) so would need to learn how to put the FPU startup in startupxxx.s. But FreeRTOS gives me the code, I think??

Code: [Select]
static void vPortEnableVFP( void )
{
__asm volatile
(
" ldr.w r0, =0xE000ED88 \n" /* The FPU enable bits are in the CPACR. */
" ldr r1, [r0] \n"
" \n"
" orr r1, r1, #( 0xf << 20 ) \n" /* Enable CP10 and CP11 coprocessors, then save back. */
" str r1, [r0] \n"
" bx r14 "
);
}

And yes I know the .s file can also call a C function (the way it used to call SystemInit()) but that is more risk because you now have a C function which could one day contain float ops, so you have to document that... which is exactly what I have done for b_main.c.

Would this be the right code to put in the startup file?

Code: [Select]
ldr.w r0, =0xE000ED88       /* The FPU enable bits are in the CPACR. */
ldr r1, [r0]
orr r1, r1, #( 0xf << 20 )   /* Enable CP10 and CP11 coprocessors, then save back. */
str r1, [r0]

The C version was

Code: [Select]
#if (__FPU_PRESENT == 1) && (__FPU_USED == 1)
SCB->CPACR |= ((3UL << 10*2)|(3UL << 11*2));  /* set CP10 and CP11 Full Access */
#endif

EDIT: above asm seems to work fine. This is the end of my startup.s file

Code: [Select]

/* Initialise the 8k stack at the top of the 128k RAM. This is 32F417 only */
/* For the 32F437, extra 64k, the fill is done just before jumping to user code */

ldr r2, = 0x2001e000  /*   = _estack - _Stack_Size  */
b LoopFillStack
FillStack:
movs r3, #0x73737373   /* fill with 's' */
str  r3, [r2]
adds r2, r2, #4
LoopFillStack:
ldr r3, = _estack     /* = 0x20020000 */
cmp r2, r3
bcc FillStack

/* Call the clock system initialization function. Moved to b_main.c */
/*  bl  B_SystemInit */

/* Start FPU, to avoid this problem */
/* https://www.eevblog.com/forum/microcontrollers/stm-32f4-fpu-registers-and-main()-gotcha/ */

ldr.w r0, =0xE000ED88        /* The FPU enable bits are in the CPACR. */
ldr r1, [r0]
orr r1, r1, #( 0xf << 20 )    /* Enable CP10 and CP11 coprocessors, then save back. */
str r1, [r0]

/* Call the application's entry point */
  bl  B_main

However some online stuff suggests using DSB after the FPU is enabled.
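
For comparison, ARM's own FPU-enable example for the Cortex-M4 (as far as I recall from the generic user guide) follows the CPACR store with DSB and then ISB, so that the write has completed and the pipeline is refetched before the first FP instruction. In the startup file that would mean adding dsb and isb after the str. Below is a C sketch of the same sequence; TARGET_CORTEX_M is a hypothetical build flag, and without it a plain variable stands in for the register so the code also builds and runs on a host.

```c
#include <stdint.h>

#if defined(TARGET_CORTEX_M)   /* hypothetical flag, define it only for the real target */
  #define CPACR_REG       (*(volatile uint32_t *)0xE000ED88UL)
  #define FPU_BARRIERS()  __asm volatile ("dsb\n\tisb" ::: "memory")
#else
  static volatile uint32_t host_cpacr;      /* host stand-in for CPACR */
  #define CPACR_REG       host_cpacr
  #define FPU_BARRIERS()  ((void)0)         /* nothing to synchronise on a host */
#endif

/* Enables CP10/CP11, then issues the barriers; returns the resulting CPACR value. */
static uint32_t fpu_enable_with_barriers(void)
{
    CPACR_REG |= (0xFUL << 20);
    FPU_BARRIERS();
    return CPACR_REG;
}
```

Being a C function, this is subject to the same prologue caveat discussed earlier in the thread, so it should stay free of FP code; the assembly version in the startup file avoids the question entirely.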
« Last Edit: July 23, 2024, 10:15:00 am by peter-h »
 

SiliconWizard
Re: STM 32F4 FPU registers and main() gotcha
« Reply #23 on: July 23, 2024, 10:28:01 pm »
Quote
I've seen that a lot of the project-creation vendor packages (STM Cube, Atmel Start, etc) are set up to call assorted startup code before main.  Clocks, memory wait states, floating point, RAM vector tables, privilege stuff, etc.

Yes; exactly. The startupxxx.s file called a C function called SystemInit().

Ah, true. Nothing prevents you from writing your own startup assembly file though.

But if you want to keep that as is, then it should be fine. If the startup assembly calls separate external functions for initializing various things, it should not cause any issue. Obviously just don't use any FP operations in any called function before the FPU has been enabled, and you'll be fine. Keep your code as simple as possible in those external functions and make sure they don't themselves call other external functions that you may not have full control over.

If all you do in those init functions, as you showed here, is access some registers in C via their C definitions, there shouldn't be anything odd happening behind your back even with full optimization on.

To be fair, ARM assembly is not the easiest to learn. I'm a bit biased, but I think RISC-V assembly is much easier to learn.
 

westfw
Re: STM 32F4 FPU registers and main() gotcha
« Reply #24 on: July 24, 2024, 12:06:28 am »
Quote
Nothing prevents you from writing your own startup assembly file though.
I believe that there is nothing stopping you from writing the startup file in C, too.
Although starting from scratch and throwing out the vendor-provided code seems ... not worth it.

 

