Author Topic: what could cause bootloader erase on samd51  (Read 635 times)

Offline snarkysparky

  • Regular Contributor
  • *
  • Posts: 243
  • Country: us
what could cause bootloader erase on samd51
« on: April 20, 2021, 03:37:09 pm »
My bootloader resides in the region 0x0000 - 0x1fff.

The main app starts at 0x2000.

At a customer location the board went unresponsive.

I took a readout of the whole memory and the bootloader section was all 0xff.

However, the main program was still intact.

Now I admit I forgot to set the boot protect fuses. But what could erase the flash like that?

The board does need better ESD protection. Could that do it?

Thanks
 

Offline thm_w

  • Super Contributor
  • ***
  • Posts: 2566
  • Country: ca
Re: what could cause bootloader erase on samd51
« Reply #1 on: April 21, 2021, 11:41:09 pm »
I've seen bootloaders wipe themselves, but if it just wiped the bootloader area to 0xFF and stopped that is a bit odd.

What voltage supply are you running at?
Do you have BOD enabled and set at a reasonable level?
 

Online westfw

  • Super Contributor
  • ***
  • Posts: 3416
  • Country: us
Re: what could cause bootloader erase on samd51
« Reply #2 on: April 22, 2021, 03:16:07 am »
Quote
if it just wiped the bootloader area to 0xFF and stopped that is a bit odd.
Well, if it had been trying to erase "everything", it presumably would have gotten pretty confused after it erased the code it was using to erase everything...
mega0/tiny0/1/2 code can issue a "chip erase" command that works great, and erases itself as well.  Even from the Application Section...
I tried to claim this was a bug, but Microchip says it's as designed.
On AVR-DA/DB, the chip erase command can only be initiated via the debug system (so the datasheet says, anyway.)
 

Offline snarkysparky

  • Regular Contributor
  • *
  • Posts: 243
  • Country: us
Re: what could cause bootloader erase on samd51
« Reply #3 on: April 22, 2021, 11:51:57 am »
Running at 3.3V

BOD is at the default state.  I haven't set it.

My firmware update process is to first erase all blocks needed for the new app.  It should never try to erase block 0.

I enabled Boot protection for block 0.   I hope that fixes my issue.
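
A software guard in the update code is one more layer on top of boot protection. The following is only a minimal sketch of the "never erase block 0" rule described above, not the actual updater. The layout (app at 0x2000) comes from the first post; the 8 KB block size, the 512 KB flash size, and the names `erase_range_allowed` and `guarded_erase` are assumptions made for illustration. The real block-erase routine is passed in as a callback so the check can be exercised on a host PC.

```cpp
#include <cassert>
#include <cstdint>

// Layout from the first post: bootloader at 0x0000-0x1FFF, app from 0x2000.
constexpr std::uint32_t kAppStart  = 0x2000;
// Assumed erase granularity: one block = 16 pages of 512 bytes.
constexpr std::uint32_t kBlockSize = 16 * 512;
// Assumed total flash size (512 KB part), for illustration only.
constexpr std::uint32_t kFlashSize = 512 * 1024;

static_assert(kAppStart % kBlockSize == 0, "app must start on a block boundary");

// True only if [addr, addr + len) is non-empty, block aligned, inside flash,
// and entirely at or above the application start.
constexpr bool erase_range_allowed(std::uint32_t addr, std::uint32_t len) {
    if (len == 0) return false;
    if (addr % kBlockSize != 0 || len % kBlockSize != 0) return false;
    if (addr < kAppStart) return false;          // would touch the bootloader
    if (addr >= kFlashSize) return false;
    if (len > kFlashSize - addr) return false;   // also rejects wrap-around
    return true;
}

// Hypothetical wrapper: calls erase_block(address) once per block,
// but only after the whole range has passed the check above.
template <typename EraseFn>
bool guarded_erase(std::uint32_t addr, std::uint32_t len, EraseFn erase_block) {
    if (!erase_range_allowed(addr, len)) return false;
    assert(addr >= kAppStart);                   // invariant after validation
    for (std::uint32_t a = addr; a < addr + len; a += kBlockSize) {
        erase_block(a);
    }
    return true;
}
```

With this in place, a corrupted start address or length coming from the update image is rejected before any erase command is issued, rather than relying on the fuse setting alone.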


 

Offline DavidAlfa

  • Frequent Contributor
  • **
  • Posts: 539
  • Country: es
Re: what could cause bootloader erase on samd51
« Reply #4 on: April 22, 2021, 05:16:37 pm »
I guess it's some kind of bug.
Does the bootloader erase a hardcoded address, or is it defined in the update?
 

Offline thm_w

  • Super Contributor
  • ***
  • Posts: 2566
  • Country: ca
Re: what could cause bootloader erase on samd51
« Reply #5 on: April 22, 2021, 09:07:08 pm »
Quote
Well, if it had been trying to erase "everything", it presumably would have gotten pretty confused after it erased the code it was using to erase everything...
mega0/tiny0/1/2 code can issue a "chip erase" command that works great, and erases itself as well.  Even from the Application Section...
I tried to claim this was a bug, but Microchip says it's as designed.
On AVR-DA/DB, the chip erase command can only be initiated via the debug system (so the datasheet says, anyway.)

Yeah, possible. Maybe the erase size is large enough that it ran until the last block.

Quote
Running at 3.3V

BOD is at the default state.  I haven't set it.

My firmware update process is to first erase all blocks needed for the new app.  It should never try to erase block 0.

I enabled Boot protection for block 0.   I hope that fixes my issue.

If you don't have BOD set up, then expect that any section of your code can run with any random variable set.
 

Offline L1L1

  • Contributor
  • Posts: 45
  • Country: gr
Re: what could cause bootloader erase on samd51
« Reply #6 on: April 23, 2021, 07:32:09 pm »
This blog post should answer exactly your question:

https://blog.thea.codes/sam-d21-brown-out-detector/
 

Offline cv007

  • Frequent Contributor
  • **
  • Posts: 575
Re: what could cause bootloader erase on samd51
« Reply #7 on: April 23, 2021, 11:46:51 pm »
Quote
This blog post should answer exactly your question:
So, after all the nvm command protection they put in place, all you need to do is run the flash at a wait state too low for the speed used and you can get a row erase? (it has to be an erase as something like an automatic write would not be able to set flash bits).

More than half of me says it can't be; less than half says maybe it is so. I can see getting bad reads from flash when the wait state is wrong, and I can maybe see inadvertently getting to existing code that does a row erase, although the odds seem quite low. But the nvm controller doing a row erase on its own because it did not read flash correctly seems improbable to me. Wouldn't exceptions then also become common from the bad flash reads? It seems odd that the only symptom would be a rogue row erase.

But if this is so (and applies to samD), then I think my plan would be to set nvm addr to point to non-existent flash and also turn on manual write in startup code. I would assume since the default nvm addr value is 0, whatever may be happening could probably be avoided by providing another (harmless) address.

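NVMCTRL_REGS->NVMCTRL_CTRLB = 0x80;   //MANW=1 (manual write)
//PARAM: page size code in [18:16] (0=8 bytes, 1=16, etc.), number of pages in [15:0]
auto v = NVMCTRL_REGS->NVMCTRL_PARAM;
//end of flash+1; the size code is masked to its 3 bits so higher PARAM bits are ignored
NVMCTRL_REGS->NVMCTRL_ADDR = (8<<((v>>16) bitand 7)) * (v bitand 0xFFFF);
//or, easier, instead of the line above: step ADDR back from its reset value of 0
NVMCTRL_REGS->NVMCTRL_ADDR--;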


 
Back to the samd51: that is a big step up from the d10/d21, so it is not so simple anymore. You are getting 4 rows erased (512*4*4), so it is no longer a simple single rogue row erase, and is now something more organized, it would seem (why exactly 4, and the same size as the bootloader section; why not 1-3 rows, or more than 4?).

Edit: erase is in blocks for the d51, and a block is 16 pages, so 16*512=8192 and only 1/2 block is being erased. So I guess it still makes no sense, as there is nothing in place that erases half a block (region lock is at minimum 8k, so it cannot be an explanation as to why the erase happens to stop at the start of the app).

Quote
I tried to claim this was a bug, but Microchip says it's as designed.
I think they made a mistake, and turned it into a feature (now too late to start changing datasheets, etc.). The clue that it is a mistake is that you can do this chip erase from anywhere, whereas any other flash write command can only take place from the bootloader section to another section.

I doubt their intention was to let you create a bootloader, protect it from being read and written, and then allow any user of it to simply wipe the whole flash with the following 2 lines-
    CCP = 0x9D;
    NVMCTRL.CTRLA = 5;

Maybe it's such a well thought out feature that I am missing it, but I'll go with the simple explanation that someone, somewhere made a mistake and allowed this command to escape from the UPDI dungeon.
« Last Edit: April 25, 2021, 10:17:13 pm by cv007 »
 

Online westfw

  • Super Contributor
  • ***
  • Posts: 3416
  • Country: us
Re: what could cause bootloader erase on samd51
« Reply #8 on: April 24, 2021, 10:27:27 pm »
Quote
all you need to do is run the flash at a wait state too low for the speed used and you can get a row erase?
It would be interesting to check whether the problem is running without enough wait states, or SETTING the wait states to the wrong value.  The latter involves writing to the NVMCTRL registers - in fact, with a value (2) that means "Erase Row", but in a slightly different register...
 

Offline cv007

  • Frequent Contributor
  • **
  • Posts: 575
Re: what could cause bootloader erase on samd51
« Reply #9 on: April 25, 2021, 06:24:57 am »
Quote
The latter involves writing to the NVMCTRL registers - in fact, with a value (2) that means "Erase Row", but in a slightly different register...
Except you also need to have a specific cmdex key value with odds of 256:1 against, in addition to any other odds in place to get to that point.
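
The key odds can be checked by brute force. This is a rough host-side sketch that enumerates every 16-bit value a stray write could put into CTRLA. It assumes the SAM D21-style layout (command in bits [6:0], CMDEX key in bits [15:8], valid key 0xA5, Erase Row = 2, as mentioned above) and treats bit 7 as a don't-care, which is an assumption. The function and struct names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed CTRLA layout: CMD in bits [6:0], CMDEX key in bits [15:8].
constexpr std::uint16_t kCmdexKey    = 0xA5;
constexpr std::uint16_t kCmdEraseRow = 0x02;

constexpr bool has_valid_key(std::uint16_t ctrla) {
    return (ctrla >> 8) == kCmdexKey;
}

constexpr bool is_keyed_erase_row(std::uint16_t ctrla) {
    return has_valid_key(ctrla) && (ctrla & 0x7F) == kCmdEraseRow;
}

struct Counts {
    unsigned keyed;      // values that carry the correct key
    unsigned erase_row;  // values that carry the key AND select Erase Row
};

// Enumerate all 65536 possible 16-bit writes to CTRLA.
inline Counts count_random_writes() {
    Counts c{0, 0};
    for (std::uint32_t v = 0; v <= 0xFFFF; ++v) {
        const auto w = static_cast<std::uint16_t>(v);
        if (has_valid_key(w)) ++c.keyed;
        if (is_keyed_erase_row(w)) ++c.erase_row;
    }
    assert(c.erase_row <= c.keyed);  // every keyed erase is also a keyed write
    return c;
}
```

Under these assumptions 256 of the 65536 values carry the key (1 in 256), and only 2 of them also select Erase Row, before counting the odds of the write landing on that register at all.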

That blog post link is talking about a d21, yet this common (I assume) uf2 bootloader appears to think the d51 is the one with the problem (and not the d21):
https://github.com/adafruit/uf2-samdx1/blob/master/src/main.c

Why would it be a problem on the d21 in one instance (the blog post), but in another often-used bootloader it's only a problem on the d51?

I'm inclined to believe the bootloader code is involved somehow in any case (a common theme in both), and the fixes described are simply preventing any corrupt flash reads with the resulting possible inadvertent branching into bootloader erase code. The bootloader is using the correct cmdex value, so it should be the only place an erase can take place. Without any code in place to erase, getting the nvmctrl to do an erase at random seems unlikely when you need a specific 16-bit value to be written to one (aligned) address in a 32-bit address range. This is all looking at the nvmctrl from the 'outside' controls, and maybe whatever goes on 'inside' can be a problem under certain conditions.

The bootloader in the link above uses a 'double tap' entry, but does not distinguish the cause of the second 'tap', so it's not hard to imagine that the second tap can come from a brownout reset (even before the brownout value was raised, although the odds decreased when they raised the brownout level in early code). Although you are now in the bootloader unintentionally, that still does not get you to a page/block erase.
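
To illustrate the point about the second tap, here is a small sketch of two entry rules. It is not the code from the linked bootloader: the `ResetCause` enum, the magic value, and the function names are all hypothetical, and a real part would read the cause from its reset-cause register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reset causes, for illustration only.
enum class ResetCause { PowerOn, BrownOut, External, Watchdog, System };

// Hypothetical marker left in RAM by the first tap (arbitrary value).
constexpr std::uint32_t kDoubleTapMagic = 0xD0B1E7A9;

// Rule that ignores the reset cause: any reset that finds the marker
// in RAM is treated as the second tap.
constexpr bool enter_bootloader_naive(std::uint32_t marker, ResetCause) {
    return marker == kDoubleTapMagic;
}

// Stricter rule: only an external (reset pin) reset counts as a tap.
constexpr bool enter_bootloader_strict(std::uint32_t marker, ResetCause cause) {
    return marker == kDoubleTapMagic && cause == ResetCause::External;
}

// Host-side check of the difference between the two rules.
inline void check_entry_rules() {
    // A brownout reset shortly after the first tap enters the bootloader...
    assert(enter_bootloader_naive(kDoubleTapMagic, ResetCause::BrownOut));
    // ...but not when the reset cause is taken into account.
    assert(!enter_bootloader_strict(kDoubleTapMagic, ResetCause::BrownOut));
}
```

The stricter rule only keeps a brownout from landing in the bootloader; as noted above, being in the bootloader unintentionally is still not the same as issuing an erase.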

Remove the bootloader, and maybe the problem also disappears. Those claiming the spurious erase problem/fix do not seem to have tried to test their brownout theory without a bootloader in place, which would be better proof since then no code exists to do any flash writing. I only have samD10's, and its errata suggests enabling manual write (manw) in startup to prevent spurious writes, which sounds like a good idea (and maybe setting addr to some harmless address as described in previous post), but a write to non-erased flash is not an erase (although just as bad, assuming page buffer is not in erased state).


The half-block erase in the original post doesn't make much sense in any case, since nothing is in place that can erase only 1/2 a block. If I were interested in finding out whether this brownout theory is correct, I would take my stock d51 with bootloader and app in place and try to hammer the power as that blog post did. If you can reproduce the problem, then you can get to the next step: remove the bootloader, run the app only (recompiled), and do the same power testing. At least you can then see if the bootloader is somehow involved. If this problem has nothing to do with the bootloader, then testing can continue (wait states, brownout values, etc.). In any case, if this problem can be reproduced I'm not sure I would be happy with only boot protection in place, and would look into region locking also (erasing part of the app is also not good).

Although I don't see anyone claiming D10's act this way, maybe because of the small flash size, few are putting bootloaders in them and avoiding the problem. Maybe I should see how a D10 acts- create an empty main loop, first lighting up an led then setting speed to 48MHz with no wait states, fill up the rest of flash with a row erase function (simulating hundreds of bootloaders in place to maybe increase odds). Power cycle until led quits lighting up, or until satisfied nothing bad happens.

 

Offline thm_w

  • Super Contributor
  • ***
  • Posts: 2566
  • Country: ca
Re: what could cause bootloader erase on samd51
« Reply #10 on: April 27, 2021, 10:03:37 pm »
Quote
In any case, if this problem can be reproduced I'm not sure I would be happy with only boot protection in place, and would look into region locking also (erasing part of the app is also not good).

Yes, which is why the proper solution is BOD.
There are very specific scenarios where you don't want to enable it; otherwise, intentionally not having it on is just facepalm-worthy.
 

