Topic: Why do my STM32 flash writes occasionally fail?

mck1117 (topic starter)

  • Contributor
  • Posts: 36
  • Country: us
Why do my STM32 flash writes occasionally fail?
« on: February 25, 2020, 11:00:28 am »
I'm having trouble writing to flash on an STM32F767ZI. Our firmware (rusEfi open source ECU - github.com/rusefi/rusefi) writes two copies of the configuration to flash, in different sectors, so that if power is lost while one copy is being written, at least the old one survives. Each copy carries a CRC to check that the whole thing got written before power was lost. The problem is that occasionally (maybe 1 in 4 writes), some of the words in one or both of the copies don't get written and stay stuck in their erased state (0xFFFFFFFF).
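A minimal sketch of the read side of a two-copy scheme like this: pick the first copy whose stored CRC matches its payload. The `config_copy_t` layout, the 64-byte payload, the use of standard CRC-32 and the names `crc32`, `copy_is_valid` and `select_config` are all hypothetical; rusEfi's real structures and CRC may differ.

```c
#include <stddef.h>
#include <stdint.h>

#define CONFIG_SIZE 64u  /* hypothetical payload size */

/* Hypothetical on-flash layout: payload followed by its CRC. Words left in
   the erased state (0xFFFFFFFF) by an interrupted write make the check fail. */
typedef struct {
    uint8_t  data[CONFIG_SIZE];
    uint32_t crc;
} config_copy_t;

/* Standard reflected CRC-32 (polynomial 0xEDB88320), bit by bit for brevity. */
static uint32_t crc32(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

static int copy_is_valid(const config_copy_t *c)
{
    return crc32(c->data, sizeof c->data) == c->crc;
}

/* Prefer copy A, fall back to copy B; NULL means both are damaged. */
static const config_copy_t *select_config(const config_copy_t *a,
                                          const config_copy_t *b)
{
    if (copy_is_valid(a))
        return a;
    if (copy_is_valid(b))
        return b;
    return NULL;
}
```

The scheme only protects against power loss if the two copies are written one after the other, so that at most one of them is ever half-written at a time.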

Here's an example hex dump, pulled off the chip with an stlink.

[Image: hex dump of the two configuration copies, not preserved]

The intact sections are intact, and the busted sections are busted.

Observations:
When this happens, the PGPERR (programming parallelism error - trying to write the wrong width to flash) flag gets set in the flash status register.  That's a flag I'd expect to go off every time if there was in fact a width mismatch, but it only happens occasionally!

The chip is running at 3.3 V, and PSIZE is set to 2 (32-bit programming), which is the correct setting for that voltage according to the reference manual. Interestingly, if I set PSIZE to 0 (8-bit programming), the problem seems to go away (though I haven't tried it a statistically significant number of times).
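The PSIZE arithmetic behind this, as a small host-side sketch. The bit positions and voltage thresholds are quoted from memory of the STM32F4/F7 reference manuals and the helper names are made up, so check them against the RM before relying on them. At 3.3 V the widest setting without an external VPP supply comes out as x32 (PSIZE = 2), which agrees with the setting used here.

```c
#include <stdint.h>

/* FLASH_CR.PSIZE field (bits 9:8) as described in the STM32F4/F7 reference
   manuals; verify against the RM for your exact part. */
#define FLASH_CR_PSIZE_Pos 8u
#define FLASH_CR_PSIZE_Msk (3u << FLASH_CR_PSIZE_Pos)

enum psize { PSIZE_X8 = 0, PSIZE_X16 = 1, PSIZE_X32 = 2, PSIZE_X64 = 3 };

/* Widest parallelism allowed without external VPP, per the RM's voltage
   table: x8 from 1.7 V, x16 from 2.1 V, x32 from 2.7 V.
   Returns -1 below the programming range. */
static int widest_psize_for_vdd(uint32_t vdd_mv)
{
    if (vdd_mv >= 2700u) return PSIZE_X32;
    if (vdd_mv >= 2100u) return PSIZE_X16;
    if (vdd_mv >= 1700u) return PSIZE_X8;
    return -1;
}

/* Replace the PSIZE field in a FLASH_CR value, leaving other bits alone. */
static uint32_t cr_with_psize(uint32_t cr, enum psize p)
{
    return (cr & ~FLASH_CR_PSIZE_Msk) | ((uint32_t)p << FLASH_CR_PSIZE_Pos);
}

/* Store width (bytes) that matches each PSIZE; per the RM, a program access
   of a different width is what raises PGPERR. */
static unsigned store_bytes(enum psize p)
{
    return 1u << (unsigned)p;
}
```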

I've tried it on two different boards, and both have the same (intermittent) behavior.

edit: I have no problem reading/writing/erasing flash using an stlink.  Only have problems when doing it from the firmware itself.

Whatever is going on, it's pretty intermittent.  While debugging it I saw error free writes, writes with a single word wrong (usually in the first few words), and writes where EVERY word failed to write.

A suspicion:
In the dump above, the failures come in blocks of 32 bytes, aligned on 32 byte boundaries.  On the Cortex-M7, that's the size of a cache line.  However, it doesn't always happen in 32-byte chunks.
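One way to test the cache-line suspicion against a dump is to check whether every erased-looking word sits in a 32-byte-aligned line that is erased in full. This is a hypothetical host-side helper (`failures_are_whole_lines` is an invented name); it assumes the dump starts on a 32-byte boundary and that the real configuration data never contains 0xFFFFFFFF, which may not hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32u                   /* Cortex-M7 L1 cache line */
#define WORDS_PER_LINE   (CACHE_LINE_BYTES / 4u)

/* Returns 1 if every 0xFFFFFFFF word in the dump lies in a 32-byte line that
   is erased in full, 0 if some line is only partly erased. `base` is the
   flash address of dump[0]. */
static int failures_are_whole_lines(const uint32_t *dump, size_t nwords,
                                    uint32_t base)
{
    assert(base % CACHE_LINE_BYTES == 0u);     /* dump must start on a line */
    (void)base;                                /* only used by the assert */
    for (size_t first = 0; first < nwords; first += WORDS_PER_LINE) {
        size_t end = first + WORDS_PER_LINE;
        if (end > nwords)
            end = nwords;                      /* trailing partial line */
        size_t erased = 0;
        for (size_t i = first; i < end; i++)
            erased += (dump[i] == 0xFFFFFFFFu);
        if (erased != 0 && erased != end - first)
            return 0;                          /* line only partly erased */
    }
    return 1;
}
```

A result of 0 on a failing dump would mean the damage is not confined to whole cache lines, which the post notes is sometimes the case.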

Anybody have a clue what's going on here?
« Last Edit: February 25, 2020, 11:05:02 am by mck1117 »
 

Ground_Loop

  • Frequent Contributor
  • **
  • Posts: 644
  • Country: us
Re: Why do my STM32 flash writes occasionally fail?
« Reply #1 on: February 25, 2020, 03:31:43 pm »
Do you have any interrupts messing with it?
 

AndyC_772

  • Super Contributor
  • ***
  • Posts: 4227
  • Country: gb
  • Professional design engineer
    • Cawte Engineering | Reliable Electronics
Re: Why do my STM32 flash writes occasionally fail?
« Reply #2 on: February 25, 2020, 04:01:37 pm »
What's the source of the data you're writing to Flash? Is it in DTCM RAM?

mck1117 (topic starter)
Re: Why do my STM32 flash writes occasionally fail?
« Reply #3 on: February 25, 2020, 06:27:26 pm »
Quote from: Ground_Loop
Do you have any interrupts messing with it?

Nope.  One of my (unsuccessful) attempts to simplify the problem was to disable interrupts during the whole flash routine, and it still happens.

Quote from: AndyC_772
What's the source of the data you're writing to Flash? Is it in DTCM RAM?

Yes, it's coming from DTCM.
 

Cerebus

  • Super Contributor
  • ***
  • Posts: 10576
  • Country: gb
Re: Why do my STM32 flash writes occasionally fail?
« Reply #4 on: February 25, 2020, 06:40:48 pm »
Dumb question - you're not relying on any code in flash during programming that you're overwriting are you? Including implicit calls to HAL/firmware/library code?
 

mck1117 (topic starter)
Re: Why do my STM32 flash writes occasionally fail?
« Reply #5 on: February 25, 2020, 06:58:58 pm »
Quote from: Cerebus
Dumb question - you're not relying on any code in flash during programming that you're overwriting are you? Including implicit calls to HAL/firmware/library code?

I don't think so, no.  Interrupts are disabled, and while writing flash all that happens is waiting for the BSY bit to be cleared.

Here's the source for writing flash: https://github.com/rusefi/rusefi/blob/836aca5426bc205f7ec518ae0f08e9c7bc20909e/firmware/hw_layer/ports/stm32/flash.c#L190

That file hasn't really been changed since we moved the project to GitHub around 4 years ago.  It was intended for use with the F4, but the F7 has the same flash controller, and up until recently it was working totally fine.

Edit: Oops, misread your question slightly. No, I'm not writing a region that has code in it. Code is in the first ~400k, and the config copies are at 1MB and 1.25MB.
« Last Edit: February 25, 2020, 08:12:05 pm by mck1117 »
 

GromBeestje

  • Frequent Contributor
  • **
  • Posts: 279
  • Country: nl
Re: Why do my STM32 flash writes occasionally fail?
« Reply #6 on: February 25, 2020, 09:00:22 pm »
Moving from STM32F4 to STM32F7: I believe the F4 is a Cortex-M4 and the F7 is a Cortex-M7. I have no experience with Cortex-M7 hardware, but I believe it has caches that the Cortex-M4 lacks. Could these be the cause of the trouble?
 

ajb

  • Super Contributor
  • ***
  • Posts: 2601
  • Country: us
Re: Why do my STM32 flash writes occasionally fail?
« Reply #7 on: February 25, 2020, 10:23:57 pm »
Two important things to know about the STM32F7:

- ONLY accesses by the CPU are cached, so if you have memory that is accessed by peripherals, including ALL of the DMA controllers (yes, even the Eth MAC and USB peripherals), you MUST deal with cache coherency. You could wrangle your buffers to allow selective cache flushing, or you can just use the MPU to prevent caching of areas shared between the CPU and peripherals (you will need to configure the linkage of the relevant data objects to make this work).
- By default, memory accesses to and from general memory are NOT guaranteed to happen in program order.  Regions can be tagged in the MPU as 'Device' or 'Strongly Ordered' to ensure accesses happen in program order and are not improperly cached.
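For the selective-flushing route: by-address D-cache maintenance (for example CMSIS's `SCB_CleanDCache_by_Addr()` / `SCB_InvalidateDCache_by_Addr()`) acts on whole 32-byte lines, so the bytes actually affected are the buffer widened out to line boundaries. A host-side sketch of that rounding; `cache_range_t` and `cache_line_range` are hypothetical names.

```c
#include <stdint.h>

#define DCACHE_LINE 32u  /* Cortex-M7 L1 D-cache line size in bytes */

typedef struct {
    uintptr_t start;  /* first byte of the first affected line */
    uint32_t  size;   /* whole lines covered, in bytes */
} cache_range_t;

/* Widen [addr, addr + len) outward to whole cache lines. */
static cache_range_t cache_line_range(uintptr_t addr, uint32_t len)
{
    uintptr_t mask  = (uintptr_t)(DCACHE_LINE - 1u);
    uintptr_t start = addr & ~mask;
    uintptr_t end   = (addr + len + mask) & ~mask;
    cache_range_t r = { start, (uint32_t)(end - start) };
    return r;
}
```

Invalidating a widened range also throws away anything else sharing those lines, which is why buffers shared with DMA are usually aligned to, and padded out to, 32 bytes.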

Either of those factors may or may not be related to your current issue, but they would be good things to rule out. If you have the cache enabled, it should be easy to just turn it off and see if the problem persists.

MPU configuration is described here, and is well worth familiarizing yourself with: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0646a/BIHJJABA.html
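As a rough illustration of the MPU side, here is how the memory types mentioned above (Strongly Ordered, Device, and the non-cacheable Normal memory you would use to stop caching of shared buffers) map onto the ARMv7-M MPU_RASR TEX/C/B bits, computed on the host. Field positions follow the ARMv7-M architecture; the helper names are hypothetical, and writing MPU->RBAR/RASR, enabling the MPU and the barriers that should follow are deliberately left out.

```c
#include <stdint.h>

/* ARMv7-M MPU_RASR field positions. */
#define RASR_XN         (1u << 28)
#define RASR_AP(x)      ((uint32_t)(x) << 24)
#define RASR_TEX(x)     ((uint32_t)(x) << 19)
#define RASR_C          (1u << 17)
#define RASR_B          (1u << 16)
#define RASR_SIZE(x)    ((uint32_t)(x) << 1)
#define RASR_ENABLE     1u

enum mem_type { MEM_STRONGLY_ORDERED, MEM_DEVICE_SHARED, MEM_NORMAL_NONCACHEABLE };

/* SIZE encodes a region of 2^(SIZE+1) bytes; the minimum region is 32 bytes.
   Returns -1 for sizes that are not a power of two >= 32. */
static int rasr_size_field(uint32_t bytes)
{
    if (bytes < 32u || (bytes & (bytes - 1u)) != 0u)
        return -1;
    int log2 = 0;
    while ((1u << log2) < bytes)
        log2++;
    return log2 - 1;
}

/* Data region, full access (AP = 0b011), execute-never. Returns 0 on error. */
static uint32_t rasr_for(enum mem_type t, uint32_t bytes)
{
    uint32_t attr;
    switch (t) {
    case MEM_STRONGLY_ORDERED:    attr = RASR_TEX(0);          break; /* TEX=000 C=0 B=0 */
    case MEM_DEVICE_SHARED:       attr = RASR_TEX(0) | RASR_B; break; /* TEX=000 C=0 B=1 */
    case MEM_NORMAL_NONCACHEABLE: attr = RASR_TEX(1);          break; /* TEX=001 C=0 B=0 */
    default:                      return 0u;
    }
    int size = rasr_size_field(bytes);
    if (size < 0)
        return 0u;
    return RASR_XN | RASR_AP(3) | attr | RASR_SIZE((uint32_t)size) | RASR_ENABLE;
}
```

The region base written to RBAR must be aligned to the region size, which is one reason the linker placement ajb mentions matters.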
 

mck1117 (topic starter)
Re: Why do my STM32 flash writes occasionally fail?
« Reply #8 on: February 25, 2020, 10:42:29 pm »
Quote from: ajb
- ONLY accesses by the CPU are cached, so if you have memory that is accessed by peripherals, including ALL of the DMA controllers (yes, even the Eth MAC and USB peripherals), you MUST deal with cache coherency. You could wrangle your buffers to allow selective cache flushing, or you can just use the MPU to prevent caching of areas shared between the CPU and peripherals (you will need to configure the linkage of the relevant data objects to make this work).

Yes, I'm aware that there's no CCI (cache-coherent interconnect) on the Cortex-M line. I think we've already squished all (most?) of those bugs. None of the data in question here is ever written or read by anyone other than the CPU, so I don't thiiiink that's it.

Quote from: ajb
- By default, memory accesses to and from general memory are NOT guaranteed to happen in program order.  Regions can be tagged in the MPU as 'Device' or 'Strongly Ordered' to ensure accesses happen in program order and are not improperly cached.

Ooh, that's a good one.

Now I'm curious if the operation:

- write flash
- wait for BSY flag to clear
- clear PG bit

is sometimes happening in the order

- wait for BSY flag to clear
- clear PG bit
- write flash (ends up being a no-op, sets the error flag)

(or something), due to things happening out of order.  It's probably worth seeing what happens if I add a DSB after the flash write, before waiting for the BSY bit to clear, just to force the write to get flushed before we disable programming mode.  Writing flash is something we do relatively infrequently (only while configuring a board, not during normal driving), so the performance hit of a full flush every word isn't a big deal.
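A sketch of the per-word sequence with the proposed DSB placed between the flash store and the BSY poll. This is not rusEfi's flash.c: the `flash_regs_t` view, the bit macros (taken from the STM32F4/F7 reference manuals as I remember them) and `flash_write_words` are hypothetical. On a real F7 the FLASH registers sit at 0x40023C00 and you would normally use the CMSIS `FLASH` struct and `__DSB()`; the non-ARM branch of `DSB()` exists only so the logic can be exercised on a PC against fake registers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the first STM32F4/F7 FLASH registers (offsets per RM). */
typedef struct {
    volatile uint32_t ACR;      /* 0x00 */
    volatile uint32_t KEYR;     /* 0x04 */
    volatile uint32_t OPTKEYR;  /* 0x08 */
    volatile uint32_t SR;       /* 0x0C */
    volatile uint32_t CR;       /* 0x10 */
} flash_regs_t;

#define FLASH_SR_BSY       (1u << 16)
#define FLASH_SR_ERRORS    ((1u << 4) | (1u << 5) | (1u << 6)) /* WRPERR, PGAERR, PGPERR */
#define FLASH_CR_PG        (1u << 0)
#define FLASH_CR_PSIZE_Msk (3u << 8)
#define FLASH_CR_PSIZE_X32 (2u << 8)

#if defined(__arm__)
#define DSB() __asm volatile("dsb 0xF" ::: "memory")  /* what CMSIS __DSB() emits */
#else
#define DSB() __sync_synchronize()                    /* host stand-in only */
#endif

/* Programs nwords 32-bit words; returns the FLASH_SR error bits seen
   (0 on success). Assumes the flash is unlocked and the sector erased. */
static uint32_t flash_write_words(flash_regs_t *f, volatile uint32_t *dst,
                                  const uint32_t *src, size_t nwords)
{
    f->CR = (f->CR & ~FLASH_CR_PSIZE_Msk) | FLASH_CR_PSIZE_X32;
    for (size_t i = 0; i < nwords; i++) {
        f->CR |= FLASH_CR_PG;            /* enter programming mode */
        dst[i] = src[i];                 /* store into the flash array */
        DSB();                           /* store must complete before... */
        while (f->SR & FLASH_SR_BSY)     /* ...we start polling BSY */
            ;
        f->CR &= ~FLASH_CR_PG;           /* leave programming mode */
        uint32_t err = f->SR & FLASH_SR_ERRORS;
        if (err)
            return err;
    }
    return 0u;
}
```

The `"memory"` clobber stops the compiler from moving the store past the barrier; the DSB instruction itself makes the core wait until the store has actually completed.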
 

mck1117 (topic starter)
Re: Why do my STM32 flash writes occasionally fail?
« Reply #9 on: February 25, 2020, 10:51:21 pm »
Quote from: ajb
- By default, memory accesses to and from general memory are NOT guaranteed to happen in program order.  Regions can be tagged in the MPU as 'Device' or 'Strongly Ordered' to ensure accesses happen in program order and are not improperly cached.

Thinking about it a little more... this probably explains why I couldn't get it to do it while stepping through in the debugger.  I suspect a breakpoint/single step does a full flush every cycle, so that the debugger shows the correct thing...
 

AndyC_772
Re: Why do my STM32 flash writes occasionally fail?
« Reply #10 on: February 26, 2020, 07:27:18 am »
I've just checked a project of my own using an M7 CPU. Try putting a DSB instruction right after writing each flash word.

mck1117 (topic starter)
Re: Why do my STM32 flash writes occasionally fail?
« Reply #11 on: February 26, 2020, 08:35:44 am »
Ok, did some testing.

First test: Disable I/D Caches

The pattern has changed! It's now 100% repro, writing a pattern like this:

[Image: dump with caches disabled, showing the repeating pattern, not preserved]

This pattern does indeed suggest the bus isn't getting flushed properly every time: it settles into 2 words written, 4 missing, and repeats. Both written copies of the flash show the same pattern, but starting at a different offset (possibly because the pipeline state was different when the flash write started).
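To check whether a dump really follows a repeating written/erased rhythm like this, one option is to compare it with the data that was meant to be written and print run lengths. A hypothetical host-side helper; the names and the output format are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* W = word matches what was written, E = still erased, X = other mismatch. */
static char classify(uint32_t want, uint32_t got)
{
    if (got == want)
        return 'W';
    if (got == 0xFFFFFFFFu)
        return 'E';
    return 'X';
}

/* Formats runs such as "2W 4E 2W 4E" into out and returns the run count. */
static size_t describe_runs(const uint32_t *want, const uint32_t *got,
                            size_t nwords, char *out, size_t outlen)
{
    size_t runs = 0, used = 0;
    if (outlen > 0)
        out[0] = '\0';
    for (size_t i = 0; i < nwords;) {
        char kind = classify(want[i], got[i]);
        size_t len = 1;
        while (i + len < nwords && classify(want[i + len], got[i + len]) == kind)
            len++;
        if (used < outlen) {             /* stop appending once the buffer is full */
            int n = snprintf(out + used, outlen - used, "%s%zu%c",
                             runs ? " " : "", len, kind);
            if (n > 0)
                used += (size_t)n;
        }
        runs++;
        i += len;
    }
    return runs;
}
```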

Second test: Cache still disabled, add DSB call after writing each word

[Image: dump after adding the DSB, not preserved]

Success!

I re-enabled the cache, and things look like they work perfectly now. Good call everybody on the DSB!
 

AndyC_772
Re: Why do my STM32 flash writes occasionally fail?
« Reply #12 on: February 26, 2020, 08:51:14 am »
[thumbs up]

Siwastaja

  • Super Contributor
  • ***
  • Posts: 8168
  • Country: fi
Re: Why do my STM32 flash writes occasionally fail?
« Reply #13 on: February 26, 2020, 01:02:41 pm »
I quickly learned to sprinkle DSBs around whenever I access any peripheral registers which might be sensitive to register write/read timing and/or order, and only remove some of those DSBs if I absolutely need the performance (which usually isn't the case when accessing peripheral registers). All the mysterious issues went away.

A typical case is instructing a peripheral to do something and then polling for completion. A DSB may be needed in between, so it's better to use one.

The first time I ran into this on the M7, I initially thought it was a cache coherency issue, because calling the cache flush functions helped, but later I noticed the caches are disabled by default. Those functions just happen to use DSB instructions internally, which is what actually solved the problem.
 

aix

  • Regular Contributor
  • *
  • Posts: 147
  • Country: gb
Re: Why do my STM32 flash writes occasionally fail?
« Reply #14 on: February 26, 2020, 02:25:15 pm »
If, like me, you're wondering what the DSB instruction is, this will save you having to google:

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0497a/CHDDGICF.html

(It's a type of memory barrier that's further discussed here:
 http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/CIHGHHIE.html)
 

ajb
Re: Why do my STM32 flash writes occasionally fail?
« Reply #15 on: February 26, 2020, 04:18:54 pm »
Quote from: Siwastaja
I quickly learned to sprinkle DSBs around whenever I access any peripheral registers which might be sensitive to register write/read timing and/or order, and only remove some of those DSBs if I absolutely need the performance (which usually isn't the case when accessing peripheral registers). All the mysterious issues went away.

Hmmm, this shouldn't be necessary, because the peripheral address space is configured as "Device" memory in the default memory map, and accesses to Device memory should always happen in program order relative to one another. The only more restrictive memory type is "Strongly-Ordered", which in addition to requiring accesses to happen in program order also disallows write buffering (the naming could be a bit more intuitive between the two types).

If you have code that mixes accesses to peripheral space with accesses to general memory, and the ordering of those accesses relative to each other matters, then you again have potential problems. Obviously many cases would have you reading data from memory and writing it to a peripheral or vice versa, but where there's a direct dependency, hopefully the access order within that read-write sequence is maintained, otherwise nothing would ever fucking work (and if it weren't, hopefully people integrating the M7 core, like ST, would be smart enough to make their peripheral address space Strongly-Ordered instead of Device). Personally, I've never found DSBs to be necessary in my STM32F7 peripheral drivers, but I have had to use the MPU to ensure certain regions of general memory are handled correctly when DMA or external memory-mapped devices are used.
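The mixed-access case is the one the flash routine falls into: under the ARMv7-M default memory map the flash array (0x08000000 on the F7's AXIM interface) lies in the Code region, which is Normal memory, while FLASH_SR and FLASH_CR (around 0x40023C00) lie in the Peripheral region, which is Device. The sketch below classifies addresses by the default map and applies a simplified ordering rule: no guarantee if either access targets Normal memory, with Device shareability ignored. Names are hypothetical, and an enabled MPU can override all of this.

```c
#include <stdbool.h>
#include <stdint.h>

enum mem_kind { KIND_NORMAL, KIND_DEVICE, KIND_STRONGLY_ORDERED };

/* ARMv7-M default memory map, used when the MPU is off (or as background). */
static enum mem_kind default_mem_kind(uint32_t addr)
{
    if (addr < 0x40000000u) return KIND_NORMAL;           /* Code + SRAM */
    if (addr < 0x60000000u) return KIND_DEVICE;           /* Peripheral */
    if (addr < 0xA0000000u) return KIND_NORMAL;           /* External RAM */
    if (addr < 0xE0000000u) return KIND_DEVICE;           /* External device */
    if (addr < 0xE0100000u) return KIND_STRONGLY_ORDERED; /* PPB: SCB, NVIC, MPU */
    return KIND_DEVICE;                                   /* Vendor system */
}

/* Simplified rule: two accesses keep program order by default only if
   neither targets Normal memory (the full rules also consider Device
   shareability). */
static bool ordered_by_default(uint32_t a, uint32_t b)
{
    return default_mem_kind(a) != KIND_NORMAL &&
           default_mem_kind(b) != KIND_NORMAL;
}
```

Under this model, a store to 0x08100000 followed by a read of FLASH_SR has no ordering guarantee, which is consistent with the DSB fixing the problem.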

Unfortunately, directly testing this behavior would be time-consuming at best, and some aspects may be outright impossible (or at least impractical) to test on the physical devices because there's a LOT of internal state that is not accessible.

Section 1.3.1 of ST AN4667 provides the default memory attributes for the STM32F7 series.

The Cortex M7 MPU documentation gives a good overview of memory attributes: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0646a/BIHJJABA.html
The M7 Memory Model documentation describes what the different attributes mean: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0646a/BIHJJABA.html
 

