Topic: Are there any microcontrollers with dedicated OTA or OTW reprogramming hardware?


Offline e100Topic starter

  • Frequent Contributor
  • **
  • Posts: 567
As far as I can tell, deploying firmware updates is still a risky business, as you can brick the device by accidentally uploading the wrong firmware etc.

At the moment, if you want a fail-safe system then you have to roll your own solution by co-locating another micro which has the sole purpose of receiving firmware and reprogramming the target chip. The system I'm currently using uses a STM32F042 to manage a SAMD21 (https://omzlo.com/articles/canzero). It works, but it requires a bunch of code to make it work, as there don't appear to be any standards for this kind of thing (I could be wrong).
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 11265
  • Country: us
    • Personal site
It is absolutely possible to design reliable systems using current hardware without a need for the external MCU. You need to clearly define what you are protecting against.

If you are uploading a correctly signed and checksummed image, then how do you define a "wrong" image? There may be a subtle error that only shows up a month after deployment. If you have a way to detect this occurrence, then you can go into the firmware update mode automatically. If you don't have a reasonable way to detect that there is an issue, then how would the dedicated HW know about it?

One commonly used way to protect against a completely broken image is to have some sort of flag that only the running firmware can set after it verifies itself (for example, by contacting a server). If the FW can't verify itself or is just really broken, then the WDT will reset the device. The bootloader should count the number of such resets and revert to the old firmware if this counter reaches some threshold.

All this assumes the existence of a safe, immutable bootloader and a place to store the backup image.
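A minimal sketch of the reset-counter scheme described above, written as a host-side Python model rather than real bootloader code. The names (`BootState`, `bootloader_select`, `firmware_self_check_passed`), the slot labels and the threshold of 3 are all assumptions made for illustration; on a real part the state would live in non-volatile storage that only the bootloader and the self-check path may write.

```python
from dataclasses import dataclass

MAX_FAILED_BOOTS = 3  # assumed threshold; the post only says "some threshold"


@dataclass
class BootState:
    """Record the bootloader keeps across resets (hypothetical layout)."""
    active_slot: str = "new"   # image the bootloader tries first
    backup_slot: str = "old"   # known-good image kept as a fallback
    failed_boots: int = 0      # boots since the image last confirmed itself
    confirmed: bool = False    # set only by running firmware after its self-check


def bootloader_select(state: BootState) -> str:
    """Runs on every reset: pick the image to start, reverting if the counter trips."""
    if not state.confirmed:
        if state.failed_boots >= MAX_FAILED_BOOTS:
            # Too many unconfirmed boots: swap back to the backup image.
            state.active_slot, state.backup_slot = state.backup_slot, state.active_slot
            state.failed_boots = 0
            state.confirmed = True  # the old image confirmed itself in the past
        else:
            state.failed_boots += 1
    return state.active_slot


def firmware_self_check_passed(state: BootState) -> None:
    """Called by the application once it has verified itself."""
    state.confirmed = True
    state.failed_boots = 0


def simulate(image_is_healthy: bool, max_resets: int = 10) -> str:
    """Model repeated watchdog resets and return the slot that ends up running."""
    state = BootState()
    slot = bootloader_select(state)
    for _ in range(max_resets):
        if slot == "old" or image_is_healthy:
            firmware_self_check_passed(state)
            return slot
        # A broken image never confirms itself, so the watchdog resets the device.
        slot = bootloader_select(state)
    return slot
```

In this model `simulate(True)` stays on the new image, while `simulate(False)` ends up back on the old one after three unconfirmed boots. It does not cover the case raised later in the thread, where an image confirms itself but can no longer fetch updates.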
« Last Edit: February 28, 2022, 05:15:39 am by ataradov »
Alex
 

Offline e100Topic starter

  • Frequent Contributor
  • **
  • Posts: 567
You'll never know if the new firmware is capable of updating itself until it tries to do it. If it fails then you are forever stuck on that version.
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 11265
  • Country: us
    • Personal site
OK, sure, but how would your dedicated HW work in that case?

If your firmware detects that the current image can't update itself, it can signal the bootloader to switch back to the old version. No need for dedicated hardware here.

Of course, this implies that OTA is based on the device polling for the image, not the image being pushed to the device. With a push model, the device has no way to detect a broken update path other than not seeing an update in a while.
« Last Edit: February 28, 2022, 05:37:55 am by ataradov »
Alex
 

Offline e100Topic starter

  • Frequent Contributor
  • **
  • Posts: 567
If the part of the update process that is supposed to recognize a failed update didn't work then it'll never know it failed to update itself.

It could end up in a perpetual loop of firmware updates but never get beyond the current version.
« Last Edit: February 28, 2022, 06:08:35 am by e100 »
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 11265
  • Country: us
    • Personal site
But again, how would a hardware solution help here?
Alex
 

Offline e100Topic starter

  • Frequent Contributor
  • **
  • Posts: 567
Quote
But again, how would a hardware solution help here?

At the factory, the external hardware is used to program the target (to prove that the process works). Once deployed, the external hardware never gets updated or changed in any way, so it is always able to download new firmware and reprogram the target regardless of what firmware the target is currently running.
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 11265
  • Country: us
    • Personal site
I don't understand. Do you mean that there will be two completely independent radios connected to the same network? What stops you from doing that with just the two devices you already use? It won't be any cheaper if the manufacturer integrates two radios in one package.

It seems to me that it is easier to implement a bootloader that never changes and contacts the server for the update on each reboot (or on a timer if there is an RTC). The main firmware must reset the device periodically when an update is required. If you are afraid that the firmware will lock up without a reset, a simple external supervisor could be used.
Alex
 

Offline e100Topic starter

  • Frequent Contributor
  • **
  • Posts: 567
Quote
I don't understand. Do you mean that there will be two completely independent radios connected to the same network? What stops you from doing that with just the two devices you already use? It won't be any cheaper if the manufacturer integrates two radios in one package.

In the case of the STM32F042 and SAMD21 in my original post, the supervisory STM32F042 is connected to a CAN bus and channels messages to and from the SAMD21 via SPI. There is no need for duplicate transceiver hardware.

Quote
It seems to me that it is easier to implement a bootloader that never changes and contacts the server for the update on each reboot (or on a timer if there is an RTC). The main firmware must reset the device periodically when an update is required. If you are afraid that the firmware will lock up without a reset, a simple external supervisor could be used.

Regularly rebooting the system and waiting (perhaps minutes, dead in the water) for a firmware download to complete seems like a poor solution.
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 11265
  • Country: us
    • Personal site
You are not waiting for the FW to download all the time, only when there is new firmware.

What if the part of the new firmware that talks over the SPI bus breaks? You will always be able to find some weak point like this when you consider the most unlikely scenarios. But in practice this is not an issue at all. There is a certain level of risk that is to be accepted.
Alex
 

Offline 2N3055

  • Super Contributor
  • ***
  • Posts: 6663
  • Country: hr
Some really unusual statements here:

"As far as I can tell deploying firmware updates is still a risky business as there is a risk of bricking the device by accidentally uploading the wrong firmware etc."

- Wrong. It is trivial to check that the firmware is right for the target, both by version (is it the right type for the hardware?) and by checking the integrity of the downloaded image.
 

"You'll never know if the new firmware is capable of updating itself until it tries to do it. If it fails then you are forever stuck on that version."

"If the part of the update process that is supposed to recognize a failed update didn't work then it'll never know it failed to update itself.
"


- What do you mean? Are you writing random code and then testing over the air whether it can update something out there in the world? Don't you test your code at all? I understand firmware having hidden errors, but not being sure whether it will upload at all should not be possible. That part must be tested before you even consider it firmware to upload to production.

Like Alex says (and he knows about this), you need a single processor, a trusted bootloader, additional storage (to hold the interim image and backups), and a watchdog-like procedure, as part of the trusted bootloader, that triggers a restore of the previous image if the boot after an update doesn't go as planned.

You are obviously being deliberately vague about what you are planning to do. What kind of system is this? Why are you planning to do en masse unattended firmware updates of devices all the time? What is the communications channel for firmware pushes? Is the channel prone to errors, so that you get corrupted files?

When designing a system (any system, really), the most important thing is to solve only the problems that are necessary, not those we impose on ourselves. Put another way, the point is not to solve a problem at any cost, but to take a step back, look at the big picture, and realize it is easier to remove the source of the problems altogether than to create solutions to complications you invented yourself.
Keep it simple.

For instance: is the channel prone to errors, so that you get corrupted files? In that case, make sure you cannot get errors. Use a protocol that has built-in error correction. Do checksums/CRC. Don't just download a broken file, blindly flash it into the controller, and then invent failsafe recovery procedures.

You need to step back and rethink.
 

Offline e100Topic starter

  • Frequent Contributor
  • **
  • Posts: 567
Quote
What if the part of the new firmware that talks over the SPI bus breaks?

The factory test at the beginning proves that the SPI works. If the new firmware you download has buggy comms then you use the supervisor to download a new version. Remember, the target can be running any firmware or none at all.

Quote
You will always be able to find some weak point like this when you consider the most unlikely scenarios. But in practice this is not an issue at all. There is a certain level of risk that is to be accepted.

What if the device is on another continent and you have to pay for your own time, travel and accommodation to go and fix it because you saved $5 by going for the "mostly works" instead of the "always works" option?
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 11265
  • Country: us
    • Personal site
Quote
The factory test at the beginning proves that the SPI works. If the new firmware you download has buggy comms then you use the supervisor to download a new version. Remember, the target can be running any firmware or none at all.

But the target still runs some form of bootloader that talks over SPI? Or do you also have an SWD connection? If it is SPI only, then what stops your bad firmware from erasing the bootloader and bricking the device?
Alex
 

Offline e100Topic starter

  • Frequent Contributor
  • **
  • Posts: 567
Quote
The factory test at the beginning proves that the SPI works. If the new firmware you download has buggy comms then you use the supervisor to download a new version. Remember, the target can be running any firmware or none at all.
But the target still runs some form of bootloader that talks over SPI? Or do you also have an SWD connection? If it is SPI only, then what stops your bad firmware from erasing the bootloader and bricking the device?

The URL I provided in the first post has links to the architecture documentation, source code and schematics. Hopefully that will be able to answer some of your questions. Perhaps there are flaws in it, I don't know; it's a complex system and I don't pretend to understand half of it.
From a software perspective it's not a finished thing. Chipageddon means they cannot source parts to make new boards to fund development, so it is what it is, at least for the time being.

I didn't come here to promote (or defend) a specific solution created by someone else. I merely used it as an example of a system that in my experience has been un-brickable and therefore worthy of attention. I'm pretty good at breaking things and so was pleasantly surprised that it has survived my stress testing over many months and thousands of firmware updates. It definitely has issues, but none that have rendered hardware unusable.

The unanswered question still remains, are there any microcontrollers with dedicated OTA or OTW reprogramming hardware?

 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 11265
  • Country: us
    • Personal site
Quote
The URL I provided in the first post has links to the architecture documentation, source code and schematics. Hopefully that will be able to answer some of your questions. Perhaps there are flaws in it, I don't know; it's a complex system and I don't pretend to understand half of it.

The schematic shows no SWD connection from the CAN MCU, only SPI. So the wrong firmware can brick that device.

Quote
The unanswered question still remains, are there any microcontrollers with dedicated OTA or OTW reprogramming hardware?

I don't think there are, as it makes no commercial sense. There are well-proven solutions that are considered good enough by everyone, including the largest corporations shipping millions of devices. There is no need to over-complicate things.

Also, all this is really a problem only if your device has no user interface at all and is not accessible for maintenance. This is very rare. For other devices you can always have a recovery procedure that is triggered by pressing a button or doing some other action with the device. You will likely need that UI for initial commissioning anyway.
« Last Edit: February 28, 2022, 09:33:36 am by ataradov »
Alex
 

Offline TomS_

  • Frequent Contributor
  • **
  • Posts: 834
  • Country: gb
Quote
As far as I can tell, deploying firmware updates is still a risky business, as you can brick the device by accidentally uploading the wrong firmware etc.

You can "package" your firmware along with some "headers", including a checksum of the application code. When you upload the firmware package to the device, it checks the headers to ensure that the package is intended for this device and that the checksum is correct. At this point you're in a good place and only need to program the newly received code into the device flash.

If you're not doing that at a minimum, then you're just opening yourself up to programming the wrong firmware in. It certainly seems to be a fairly common practice in my experience.
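As a rough illustration of the "headers plus checksum" idea, here is a sketch in Python of a host-side packer and the matching check. The header layout, the `FWPK` magic value and the function names are assumptions made up for this example, not any standard format. CRC-32 only guards against corruption and mix-ups; it is not a signature.

```python
import struct
import zlib

# Hypothetical little-endian header: magic, hardware ID, version, payload length, CRC-32.
HEADER = struct.Struct("<4sHHII")
MAGIC = b"FWPK"  # assumed marker, not taken from any real format


def pack_firmware(hw_id: int, version: int, payload: bytes) -> bytes:
    """Prefix the application code with a header describing it."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return HEADER.pack(MAGIC, hw_id, version, len(payload), crc) + payload


def check_firmware(package: bytes, expected_hw_id: int) -> bytes:
    """Return the payload if the package is meant for this device and intact."""
    if len(package) < HEADER.size:
        raise ValueError("package shorter than header")
    magic, hw_id, _version, length, crc = HEADER.unpack_from(package)
    payload = package[HEADER.size:]
    if magic != MAGIC:
        raise ValueError("not a firmware package")
    if hw_id != expected_hw_id:
        raise ValueError("firmware is for different hardware")
    if length != len(payload):
        raise ValueError("payload length mismatch")
    if zlib.crc32(payload) & 0xFFFFFFFF != crc:
        raise ValueError("checksum mismatch")
    return payload
```

On the device the same checks would run before anything is written to flash, so a package built for other hardware or damaged in transit is rejected up front.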

Quote
At the moment if you want a fail safe system then you have to roll your own solution by co-locating another micro which has the sole purpose of receiving firmware and reprogramming the target chip. The system I'm currently using uses a STM32F042 to manage a SAMD21 (https://omzlo.com/articles/canzero). It works but it's requires a bunch of code to make it work as there don't appear to be any standards for kind of thing (I could be wrong).

This can easily be done with a single micro. The method I have been using is to "partition" the micro's internal flash into two: one partition (just several KB in size) holds the bootloader code, and the rest holds the application code. The partitions are sized so that neither of them crosses a flash page erase boundary, so in theory they are both safe from each other as long as nothing catastrophic goes wrong during erase operations.
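To make the page-boundary point concrete, here is a small Python sketch that checks a two-partition layout. The page size, flash size and bootloader size in the usage note are made-up numbers; the real values come from the datasheet of whatever part is used.

```python
def partition_layout(flash_size: int, boot_size: int, page_size: int):
    """Return ((boot_start, boot_end), (app_start, app_end)) or raise if misaligned."""
    if not 0 < boot_size < flash_size:
        raise ValueError("bootloader must leave room for the application")
    if boot_size % page_size or flash_size % page_size:
        raise ValueError("partition boundary is not on an erase-page boundary")
    return (0, boot_size), (boot_size, flash_size)


def pages_in(region, page_size: int):
    """List the erase-page indices that a (start, end) region covers."""
    start, end = region
    return list(range(start // page_size, end // page_size))
```

With assumed values of 32 KiB of flash, 1 KiB pages and an 8 KiB bootloader, `partition_layout(0x8000, 0x2000, 0x400)` gives two regions whose page lists do not overlap, so erasing every application page never touches a bootloader page.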

I use an external SPI EEPROM to hold new application code, because flash-based SPI EEPROMs are dirt cheap.

On boot, the bootloader compares the version number of the loaded application code with that of the externally stored application code. If they are different, and if a checksum of the stored application code checks out OK, it erases the application area of the internal flash and programs the externally stored version in.

Before jumping into the application code, whether an upgrade was just performed or not, the bootloader checksums the loaded application code to make sure it isn't corrupt. If it's OK, it jumps into it; if it's not OK, then it can attempt to load in the externally stored version as above. If both the internal and external application code are corrupt, well, you're kind of screwed at that point, so I just flash "SOS" on a status LED.
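The boot sequence described above can be modelled roughly like this in Python. It is only a sketch of the decision logic: `Image`, `make_image` and `boot` are hypothetical names, CRC-32 stands in for whatever checksum is really used, and copying the object stands in for the erase and program steps.

```python
import zlib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Image:
    version: int
    code: bytes
    crc: int  # checksum stored alongside the code

    def is_valid(self) -> bool:
        return zlib.crc32(self.code) & 0xFFFFFFFF == self.crc


def make_image(version: int, code: bytes) -> Image:
    return Image(version, code, zlib.crc32(code) & 0xFFFFFFFF)


def boot(internal: Optional[Image], external: Optional[Image]) -> Tuple[str, Optional[int]]:
    """Return ("run", version) for the image to jump into, or ("sos", None)."""
    external_ok = external is not None and external.is_valid()

    # Upgrade step: versions differ and the externally stored copy checks out.
    if external_ok and (internal is None or internal.version != external.version):
        internal = external  # stands in for erase + program of internal flash

    # Always checksum the loaded application before jumping into it.
    if internal is not None and internal.is_valid():
        return ("run", internal.version)

    # Internal copy is corrupt: fall back to the external copy if it is good.
    if external_ok:
        return ("run", external.version)

    # Both copies are bad: nothing left to do but signal it.
    return ("sos", None)
```

Note that the version comparison here is "different", not "newer", which matches the description above and also allows a downgrade by storing an older image externally.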
 

Offline PCB.Wiz

  • Super Contributor
  • ***
  • Posts: 1548
  • Country: au
Quote
The unanswered question still remains, are there any microcontrollers with dedicated OTA or OTW reprogramming hardware?

Short answer: Yes.
You are effectively asking for a ROM loader, and many MCUs have that feature for OTW reprogramming.

I've not dug into wireless MCUs but many of those are multi-core and I'm sure they have secure bootloaders in their toolkits.

Google finds this in seconds

https://docs.silabs.com/bluetooth/latest/general/firmware-upgrade/secure-ota-dfu
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 11265
  • Country: us
    • Personal site
A ROM loader won't help if your firmware is toast and can't invoke it. There are no ROMs with sophisticated recovery logic.
Alex
 

Offline PlainName

  • Super Contributor
  • ***
  • Posts: 6847
  • Country: va
ESP32 almost gets there and the process can be used elsewhere (it's baked into the ESP-IDF SDK but there's nothing hardware-specific to it). The flash is partitioned for the default (factory) code and 1 or more OTA partitions. The bootloader checks the last set OTA partition and runs the code in it (you can enable all kinds of version checking in the OTA downloading code, but let's assume you ignore all that and just program in any old thing). The new code has to verify that it's good at some point before a reset - if it doesn't then on reset the bootloader reloads the previously good partition. With a watchdog (preferably hardware) it's pretty robust since anything not the right stuff will cause a reboot and hence reversion to good code.

But... suppose your new code tells the bootloader it is perfectly fine, but you've screwed up the OTA downloading stuff. You're toast. The only way to guard against this kind of thing is with another processor to do the updating. But then you have the problem of how do you update that one...
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 11265
  • Country: us
    • Personal site
Quote
The new code has to verify that it's good at some point before a reset - if it doesn't then on reset the bootloader reloads the previously good partition.

This is exactly the algorithm I described. This is an industry-standard practice that has been used for ages.

And yes, this does not address the issue raised above, where the code is generally OK but is no longer able to perform the OTA function. You are stuck on the last verified firmware with no way to recover.

There is no general way to tell that you did not intend to upload a dummy blinky firmware. If it was signed correctly, it is a correct firmware.
Alex
 

