Author Topic: Is it realistic to boot Linux on STM32MP1 (Cortex-A7) under 2 seconds? (Read 1927 times)

Miyuki · « **on:** March 16, 2024, 03:19:26 pm »

Hi folks,
For one project I am deciding between running lightweight Linux or some RTOS.
Linux will probably make some parts easier, like managing filesystems.
But on the other hand, I need to boot it in less than 2 seconds to perform as a USB device.
I probably will have relatively slow RAM access as it will be encrypted, but I can have XIP from NOR flash (also encrypted)

It needs to implement just a few features, namely a USB Host, Device and understand a few basic filesystems, and then run the main application (which can easily run even on bare metal)
There will be no graphics or terminal.

nctnico · « **Reply #1 on:** March 16, 2024, 03:39:23 pm »

What typically slows down booting Linux is a slow medium to boot from and waiting for drivers to initialise. I have no hands-on experience myself, but initramfs could be a solution for you to at least speed up reading from the filesystem. initramfs is a Linux filesystem image which sits in memory (where it is loaded through the bootloader) and has a minimal Linux setup inside.

globoy · « **Reply #2 on:** March 16, 2024, 03:54:46 pm »

I'm sorry I can't find it but years ago I read an article about how a company building a head-end unit for a car got their device - complete with GUI - to boot in about a second. A google search does turn up some articles about fast booting linux. Although I can't vouch for any of them I listed a couple below. I imagine that the more stripped down your build the better.

https://www.e-consystems.com/articles/Product-Design/Linux-Boot-Time-Optimization-Techniques.asp

https://www.elinux.org/images/9/97/Boot_one_second_altenberg.pdf

https://elinux.org/images/b/b3/Elce11_koen.pdf

https://embetrix.com/2017/05/16/embedded-linux-fast-boot-techniques/

RoGeorge · « **Reply #3 on:** March 16, 2024, 04:44:30 pm »

In an embedded system the hardware is usually well known and never changes, so there is no need to probe for anything. Instead of a normal Linux install + normal boot, one can put a snapshot of a running Linux into the flash memory (somewhat similar to what hibernate or suspend to RAM does on a PC).

nctnico · « **Reply #4 on:** March 16, 2024, 05:05:59 pm »

Quote from: RoGeorge on March 16, 2024, 04:44:30 pm

In an embedded system the hardware is usually well known and never changes, so there is no need to probe for anything. Instead of a normal Linux install + normal boot, one can put a snapshot of a running Linux into the flash memory (somewhat similar to what hibernate or suspend to RAM does on a PC).

For embedded systems the Linux kernels are typically compiled lean; only the drivers which are actually used are compiled in or, as the case may be, enabled in the device tree. So in the end there is not much probing needed. Typically this 'probing' process doesn't take long but busses like PCI express may wait quite long before finding devices.

Nominal Animal · « **Reply #5 on:** March 17, 2024, 07:06:59 am »

The two limiting factors are the storage speed and the time taken by the bootloader before it hands control over to the Linux kernel.

If you intend to run everything from RAM, you can use an initramfs containing the kernel and all the software you need. This will be read by the bootloader into memory, and if compressed, decompressed by the Linux kernel when it gets control. Whether compression decreases or increases the boot time depends on the storage speed, the size of the image, and the rate at which the processor can decompress the image. In all cases, the rate at which the bootloader can read the entire image is absolutely crucial, because only after it has been done, can the Linux kernel start its work; so nothing you can do to/in Linux can shorten that time.
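To make the compression trade-off concrete, here is a minimal Python sketch. It is a simplified model, not a measurement: it assumes loading and decompression happen one after the other, ignores the time the kernel needs to unpack an uncompressed image, and all function and parameter names are made up for illustration.

```python
def initramfs_ready_time(image_mb, storage_mb_s, ratio=1.0, decompress_mb_s=None):
    """Seconds until the initramfs contents are usable in RAM (simplified model).

    image_mb         uncompressed image size in MB
    storage_mb_s     sequential read rate of the boot medium in MB/s
    ratio            compressed size / uncompressed size (1.0 means uncompressed)
    decompress_mb_s  rate at which the CPU produces decompressed output in MB/s,
                     or None when the image is not compressed
    """
    load = image_mb * ratio / storage_mb_s          # bootloader reads the image
    unpack = 0.0 if decompress_mb_s is None else image_mb / decompress_mb_s
    return load + unpack


def compression_helps(image_mb, storage_mb_s, ratio, decompress_mb_s):
    """True when the compressed image is ready sooner than the uncompressed one."""
    plain = initramfs_ready_time(image_mb, storage_mb_s)
    packed = initramfs_ready_time(image_mb, storage_mb_s, ratio, decompress_mb_s)
    return packed < plain


if __name__ == "__main__":
    # Hypothetical numbers: 16 MB image, 2:1 compression, 40 MB/s decompressor.
    for storage in (10, 100):
        print(storage, "MB/s storage:", compression_helps(16, storage, 0.5, 40))
```

With these made-up numbers, compression pays off on the slow medium and costs time on the fast one, which is the dependency described above.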

If you intend to run with minimal stuff in RAM, your bootloader will load only the Linux kernel into RAM, then hand control to it. In this case the bootloader has little to load, but the kernel ends up doing lots of small (4k random-access) reads during its bootup procedure.
In most cases (when the bootloader isn't idiotic or slow as molasses), the single initramfs image approach is faster.

I don't have any SBCs based on STM32MP1, nor can I find any reports of the typical boot time. Having a record of how long each stage (TF-A, uboot, Linux kernel boot) takes on a typical STM32MP1 would tell whether two seconds is feasible and under what conditions, but since I cannot find any of that online, I cannot say. I do believe that if you asked about it at the ST Community forums, someone might answer with the approximate boot stage timings. If TF-A + uboot (including initramfs loading) can be done in under a second, I say booting to active services within two seconds is doable. If TF-A + uboot (including initramfs loading) takes two seconds or longer, it cannot be done, because by the time control is handed over to the Linux kernel, your time budget has already been spent.
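The rule of thumb above can be written down as a small Python sketch. The one-second and two-second thresholds come from the paragraph above; the "undecided" case for anything in between is my own addition, since the post does not cover it, and the function names are hypothetical.

```python
def linux_two_second_verdict(bootloader_s, budget_s=2.0, comfortable_s=1.0):
    """Classify a measured TF-A + uboot time (seconds, including initramfs load)."""
    if bootloader_s >= budget_s:
        # The budget is spent before the kernel even gets control.
        return "not possible"
    if bootloader_s < comfortable_s:
        return "doable"
    # Between the two thresholds the outcome depends on the kernel boot time.
    return "undecided"


def kernel_time_left(bootloader_s, budget_s=2.0):
    """Seconds remaining for the kernel once the bootloader hands over."""
    return max(0.0, budget_s - bootloader_s)


if __name__ == "__main__":
    for measured in (0.8, 1.5, 2.0):
        print(measured, linux_two_second_verdict(measured), kernel_time_left(measured))
```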

Making Linux appliances boot in less than a second is not too difficult, as long as each hardware combination can have their own firmware (so no slow boot-time probing), and you have fast enough Flash storage for the initramfs image, and fast enough processor to do the decryption and decompression.

Personally, I would not bother with the encryption, unless it is required from a legal/contractual/statute viewpoint. Those who want to steal your firmware can contract a specialist to do it (and regardless of the processor, there are ways to do it, it only depends on how much money they're willing to spend) or bribe someone with access to the sources to provide them a copy. You'd be surprised how cheap and easy the latter approach is, if one does a background search on your employees: it is an exceedingly rare company that is not easily breached thus. Plus, you do need to abide by the GPLv2 for the kernel (provide the sources to the kernel you use in the firmware) and any other copyleft-licensed software you use on the device anyway. It doesn't sound like you have much your own IP to protect. Learn from the mistakes of others like ~~Saleae~~ DreamSourceLab/DSLogic, who forked Sigrok Pulseview to use as their own proprietary software with their logic analyzers, but after a fight had to relent and abide by the Sigrok Pulseview license (GPLv3+) anyway. Now ~~Saleae~~ DSLogic is a good FOSS citizen, though; they just didn't grok the licensing at first.

If you want to combine LGPL'd libraries with your own proprietary software (closed-source userspace application), you'll have to provide a way for the end users to replace those libraries and use dynamic linking (as of 2024, based on related past court cases and recommendations from Free Software Foundation, the authors of the GPL); see e.g. the related Wikipedia article on LGPL. This means that if you use encryption, you need to use e.g. device-specific public key encryption, with one half of the key in the device, and provide the other half of the key to the user so that they can reconstruct the encrypted firmware for the device, substituting any GPL'd or LGPL'd libraries or files with new ones. Providing just the sources, but not a way to replace LGPL'd files, is not sufficient to allow you to mix LGPL'd libraries with your proprietary application code.

If you need to be able to do that, you need to use some other kernel than Linux; perhaps FreeBSD or Windriver VxWorks.

If you intend to ignore GPL and LGPL licensing requirements, I condemn you and your company to Malebolge as described by Dante Alighieri.
Otherwise, although I cannot offer legal advice, I could tell you about the various approaches license-abiding companies have taken to do what you intend to do (especially wrt. mixing closed-source/proprietary and open-source software as permitted by the various licenses). It's not hard. First, you have to let go of the core idea that Linux is zero-cost, because it is not: its "price" is developers and vendors following the rules of the license, while end users get to do whatever they want (except distributing copies or derivatives, which again fall under the rules of the license). For some use cases, like extremely restricted encrypted devices, using BSD/MIT/Apache-licensed kernel and software, or proprietary licensed stuff like VxWorks, may be necessary.

SiliconWizard · « **Reply #6 on:** March 17, 2024, 07:46:50 am »

Good points above. Under 2 seconds from reset to fully operational looks not impossible, but pretty challenging. Good points about the licensing as well.

While I'm not a big proponent of it, ThreadX would probably fit the bill here, it supports USBX, a full USB stack: https://github.com/eclipse-threadx/threadx , and will likely start in insignificant time compared to Linux.
It's been released with a MIT License, so much easier to deal with if you're developing a commercial product. Your call.

Miyuki · « **Reply #7 on:** March 17, 2024, 08:55:39 am »

To Nominal Animal: The main purpose of the encryption here is to protect against (or at least make a little more inconvenient) theoretical attacks where someone changes the data after secure boot has verified the original image. I agree that those are more or less theoretical possibilities, but we are in the legal/contractual/statute viewpoint area.

I agree that ThreadX might be a good fit and could probably be able to run without external DRAM in some use cases.

Nominal Animal · « **Reply #8 on:** March 17, 2024, 10:01:49 am »

Quote from: Miyuki on March 17, 2024, 08:55:39 am

To Nominal Animal: The main purpose of the encryption here is to protect against (or at least make a little more inconvenient) theoretical attacks where someone changes the data after secure boot has verified the original image. I agree that those are more or less theoretical possibilities, but we are in the legal/contractual/statute viewpoint area.

No problem; it is indeed a fully valid use case. I can imagine myself doing that in several circumstances, in automotive and medical for example.

Using open source software is not an issue, either; nor is using proprietary applications with the Linux kernel. (Using the Linux kernel userspace interfaces does not create a derivative in the sense of copyright, according to Linux kernel developers – the copyright owners.)

The licenses apply when you modify the kernel code, or wish to mix LGPL-licensed code with proprietary code. Kernel modification is simple: you need to provide those changes in source code form to your customers. Mixing LGPL-licensed code with proprietary code is more complicated.

To avoid the LGPL complications, you need to use BSD/MIT/Apache-licensed ("permissive licenses") libraries only. That excludes GNU libc (it's LGPL), so you'd need to use newlib, BSD libc, musl, picolibc, or Bionic as the standard C library when compiling the proprietary code and any libraries it uses. So, this can definitely be done, but it has to be done carefully. When done carefully and documented in detail (I recommend up to kernel and library configuration details and build machinery, excluding encryption step), FOSS developers will be happy with you and you won't have any trouble from them.
(I've talked elsewhere about a similar situation wrt. Qt libraries, when relying on LGPL and having proprietary closed source code. It too can be done, in a way that both developers and the Qt Company is happy, with no license- or copyright-related problems to worry about, and easily defended if spurious copyright-related demands were to occur.)

I'm happy to talk about those from both the perspective of FOSS developer, and from the perspective of someone wishing to combine proprietary closed source code with open source components in a commercial product. I've done both, and am not a "zealot" except in the sense that I want to abide by both the letter and the intent of international and local copyright laws, even if there were a loophole I might be able to exploit. One of my favourite mixes is using Python and Qt for the UI, and a dynamically linked library for the proprietary closed-source stuff, allowing end users to modify the UI.

If I sound confrontational, it is only because I see vendors so often relying on FOSS copyright owners to not sue them, rather than actually abiding by the licenses they use in their products, because they do not take "non-commercial" copyrights seriously at all. Makes me angry, that.

Quote from: Miyuki on March 17, 2024, 08:55:39 am

I agree that ThreadX might be a good fit and could probably be able to run without external DRAM in some use cases.

ThreadX is MIT-licensed, which only requires the license text (and listing the software parts that are used under the MIT license) somewhere in the code and the documentation. There are a few different variants of the BSD license, but it is very similar.
It also contains support for exFAT (FileX) which covers the vast majority of removable devices' filesystems (FAT or exFAT), and can even provide wear leveling (via LevelX).

Another RTOS you might consider is Apache-licensed Zephyr OS. It supports STM32MP157, and seems to have TF-A support also. I'm not sure if its existing filesystem support suffices for your needs, but I suggest you check it out.

DiTBho · « **Reply #9 on:** March 17, 2024, 12:19:28 pm »

Quote from: Miyuki on March 16, 2024, 03:19:26 pm

But on the other hand, I need to boot it in less than 2 seconds to perform as a USB device.

short answer: no

DiTBho · « **Reply #10 on:** March 17, 2024, 12:30:28 pm »

Quote from: nctnico on March 16, 2024, 05:05:59 pm

Typically this 'probing' process doesn't take long but busses like PCI express may wait quite long before finding devices.

Both PCI and PCIe have long wait-cycles, before initializing BAR, understanding the configuration space, and then finding devices.

HwAoRrDk · « **Reply #11 on:** March 17, 2024, 07:47:19 pm »

Quote from: Nominal Animal on March 17, 2024, 07:06:59 am

Learn from the mistakes of others like Saleae, who forked Sigrok Pulseview to use as their own proprietary software with their logic analyzers, but after a fight had to relent and abide by the Sigrok Pulseview license (GPLv3+) anyway.

Wasn't that DSLogic, not Saleae? I'm pretty sure Saleae's Logic software is not a fork of anything.

Nominal Animal · « **Reply #12 on:** March 17, 2024, 08:35:39 pm »

Quote from: HwAoRrDk on March 17, 2024, 07:47:19 pm

Quote from: Nominal Animal on March 17, 2024, 07:06:59 am
Learn from the mistakes of others like Saleae, who forked Sigrok Pulseview to use as their own proprietary software with their logic analyzers, but after a fight had to relent and abide by the Sigrok Pulseview license (GPLv3+) anyway.

Wasn't that DSLogic, not Saleae? I'm pretty sure Saleae's Logic software is not a fork of anything.

Dammit, you're right. I should've double-checked. Edited now.

It was DreamSourceLab that did the DSLogic Kickstarter using Sigrok/Pulseview, and changed the copyright texts and all. Even their initial firmware was a copy of fx2lafw.

Saleae Logic was the 8-channel logic analyser using Cypress Ez-USB FX2 based on the application note logic analyser suggestion, with no hardware processing at all, as it simply samples the 8 channels and sends them to the host. There was even a thread about it here some time back. The "Saleae Logic" clones (more properly called 24 MHz 8-channel USB logic analyzers) use the same application note schematic, and in Sigrok/PulseView the open source fx2lafw firmware, so as long as you pick one that does not try to pass itself as a Saleae product, they're legal, and work quite well in Sigrok/PulseView.

Saleae Logic 16 has no native Sigrok/Pulseview support, and even has a Xilinx Spartan-3A FPGA in it (but still uses FX2 for USB); it is properly their own intellectual property.

This was a pretty embarrassing error on my part.

Thanks for catching it, HwAoRrDk!

ddrown · « **Reply #13 on:** March 18, 2024, 08:10:35 pm »

I have a STM32MP157A-DK1 board here, and here's the bootup sequence, with timestamps relative to the first message from uboot:

* uboot start (0s)
* uboot message "Boot over mmc0!" (5s)
* uImage & dtb loaded into memory from sdcard (9s)
* kernel finished, first userland/systemd message (13s)
* systemd runlevel transition finished (32s)

There's probably speedup tweaks that can be made in every stage here but even uboot is going to be a problem in your goal of 2 seconds. To the concerns about PCI enumeration, the STM32MP157A doesn't have any PCI devices or PCI/PCIe connectors.
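The per-stage durations implied by the timestamps above can be computed with a short Python sketch. The milestone names and times are copied from the list; the `stage_durations` helper is just an illustration.

```python
# Cumulative timestamps in seconds, relative to the first uboot message.
MILESTONES = [
    ("uboot start", 0),
    ("uboot message 'Boot over mmc0!'", 5),
    ("uImage & dtb loaded from sdcard", 9),
    ("first userland/systemd message", 13),
    ("systemd runlevel transition finished", 32),
]


def stage_durations(milestones):
    """Turn cumulative timestamps into (stage description, seconds) pairs."""
    return [
        (f"{start} -> {end}", t_end - t_start)
        for (start, t_start), (end, t_end) in zip(milestones, milestones[1:])
    ]


if __name__ == "__main__":
    for stage, seconds in stage_durations(MILESTONES):
        print(f"{seconds:3d} s  {stage}")
```

This gives 5 s in uboot before the load starts, 4 s loading the kernel and dtb, 4 s of kernel boot, and 19 s of systemd userland.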

nctnico · « **Reply #14 on:** March 18, 2024, 08:19:44 pm »

Quote from: DiTBho on March 17, 2024, 12:30:28 pm

Quote from: nctnico on March 16, 2024, 05:05:59 pm
Typically this 'probing' process doesn't take long but busses like PCI express may wait quite long before finding devices.

Both PCI and PCIe have long wait-cycles, before initializing BAR, understanding the configuration space, and then finding devices.

Most of the delay is for allowing PCI(e) devices to configure themselves (for example: loading an FPGA configuration which implements the PCI(e) bus) after a reset. In the early 2000s I had Dell fix a BIOS version because on one of their (new) systems they forgot about the PCI delay.

lunar · « **Reply #15 on:** March 18, 2024, 08:35:20 pm »

The fastest Linux boot I've ever seen was from stali on x86-64 hardware. It booted in seconds, although I don't remember exactly how long. Perhaps 5 seconds. Maybe less. I would recommend investigating what they do (static binaries, musl libc). For those who don't want to bother with self compiling everything from scratch, but are interested in a minimal Linux OS, Alpine is similar in design. Buildroot / Openwrt are also worth looking into.

https://sta.li/index.html

Consider that Alpine is used in docker, specifically because it is lean, and quite fast.

nctnico · « **Reply #16 on:** March 18, 2024, 08:51:54 pm »

Quote from: lunar on March 18, 2024, 08:35:20 pm

The fastest Linux boot I've ever seen was from stali on x86-64 hardware. It booted in seconds, although I don't remember exactly how long. Perhaps 5 seconds. Maybe less. I would recommend investigating what they do (static binaries, musl libc). For those who don't want to bother with self compiling everything from scratch, but are interested in a minimal Linux OS, Alpine is similar in design. Buildroot / Openwrt are also worth looking into.

I would strongly recommend against going the buildroot / openwrt (Yocto) route nowadays as these cross compilation environments are a pain to use and have a steep learning curve (after which they are still a pain to use). A better approach is to use a stripped down existing Linux distribution and go from there.

asmi · « **Reply #17 on:** March 19, 2024, 01:56:19 am »

Just training the DDRx memory interface can take several seconds all by itself, depending on the interface and DDR revision. The only realistic way of achieving boot times anywhere close to the requested one is suspend-to-RAM, but that requires an uninterrupted power supply to preserve its contents.

Nominal Animal · « **Reply #18 on:** March 19, 2024, 09:47:02 am »

Quote from: ddrown on March 18, 2024, 08:10:35 pm

I have a STM32MP157A-DK1 board here, and here's the bootup sequence, with timestamps relative to the first message from uboot:

* uboot start (0s)
* uboot message "Boot over mmc0!" (5s)
* uImage & dtb loaded into memory from sdcard (9s)
* kernel finished, first userland/systemd message (13s)
* systemd runlevel transition finished (32s)

There's probably speedup tweaks that can be made in every stage here but even uboot is going to be a problem in your goal of 2 seconds. To the concerns about PCI enumeration, the STM32MP157A doesn't have any PCI devices or PCI/PCIe connectors.

Thank you!

This means that uboot takes 5 seconds, initramfs loading 4 seconds, kernel boot 4 seconds, and systemd-based userland boot 19 seconds.
The last takes so long because userland bootup uses lots of random-access 4k reads, and those are the slowest kind; the I/O throughput using random-access 4k reads can be as low as just a couple of megabytes per second, even on devices that can read at 100 MB/s in contiguous chunks of 1 MB and larger.
SD card is just not good enough for minimum boot times.
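A back-of-the-envelope Python sketch shows how much the access pattern matters. The two throughput figures are the ones mentioned above; the 40 MB of data read during userland startup is a purely hypothetical figure chosen for illustration, and the helper names are mine.

```python
def read_time_s(total_mb, throughput_mb_s):
    """Seconds needed to read total_mb at a sustained throughput."""
    return total_mb / throughput_mb_s


def reads_per_second(throughput_mb_s, block_kib=4):
    """Number of block_kib-sized reads per second that a throughput corresponds to."""
    return throughput_mb_s * 1024 / block_kib


if __name__ == "__main__":
    total_mb = 40            # hypothetical amount read during userland boot
    random_4k_mb_s = 2       # "a couple of megabytes per second"
    sequential_mb_s = 100    # large contiguous reads

    print("random 4k :", read_time_s(total_mb, random_4k_mb_s), "s")
    print("sequential:", read_time_s(total_mb, sequential_mb_s), "s")
    print("4k reads/s at 2 MB/s:", reads_per_second(random_4k_mb_s))
```

Under these assumptions the same data takes 20 s as random 4k reads but 0.4 s as one sequential read, which is why a single initramfs image loaded in one go tends to win.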

Userland can easily be stripped down, but there isn't much one can do to uboot and initramfs loading times. The kernel boot time can probably be reduced by using built-in drivers and baked-in dtb. The end result, say 11 to 13 seconds, is what you can expect a wakeup from hibernation (suspend-to-disk) to take, too.

Getting below a 12 second boot time from powerup to active services in Linux seems unlikely or impossible on this particular hardware combination.
Note that by the time the Linux kernel boot completes, it can immediately enumerate itself as a USB device; the userspace does not need to be fully booted up for that.

A faster storage medium for an uncompressed initramfs image could help quite a bit.
Note, however, that the STM32MP157A is the no-crypto variant; so the above timings do not include any overhead from TF-A or encryption.

Let's review the STM32MP1 boot sequence.

The 128k Flash on the STM32MP157C or STM32MP157F contains the TF-A (Trusted Firmware-A) first-stage bootloader. (The other variants do not support secure boot or TF-A.) This will set up clocks, RAM, et cetera, and load and verify the second-stage bootloader, uboot, from the mass storage, and hand off control to uboot.

uboot will do what it is configured to do, and load the Linux initramfs or kernel+dts from the mass storage, and hand off control to the Linux kernel.

The Linux kernel will configure all peripherals according to dts and probing, then mount the root filesystem (or initramfs), and start userspace init. At this point, the kernel (USB gadget driver) will be active, and can do USB device enumeration; so for OP, this point in time is the target two seconds from powerup.

If an initramfs is used, and there is a separate root partition, a pivot is done in the userspace initramfs image to swap the initramfs and the newly mounted real root filesystem, then the init continues on the new root filesystem. (This is also why /etc should always be part of the root filesystem, and not a separate filesystem.)

Looking at the STM32MP157C datasheet, there are quite a few boot media options. eMMC (4.0 - 4.51) on SDMMC2 seems an obvious choice, and the datasheet claims up to 208 Mbytes/second in 8-bit mode. Practical data rates will be less than that, but will depend heavily on the particular eMMC module chosen. (Note that this applies even if an RTOS is used instead of uboot and/or Linux kernel.)

The question is, what kind of boot times should this yield?

This post at ST Community describes practical OpenSSL AES-128-CBC at 256 byte blocks at around 10 Mbytes/second, which worries me: compare that to the almost 200 Mbytes/second one could expect from eMMC. Even an SD card should yield better than 10 Mbytes/second transfer rates, which indicates a secure boot might be constrained by the encryption rate the STM32MP157C/F is capable of.
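To see why the decryption rate worries me, here is a rough Python sketch of the bottleneck. The 10 and 200 Mbytes/second figures are the ones quoted above; the 16 MB image size is hypothetical, and whether the boot chain can overlap reading with decryption is an assumption I cannot verify, so both cases are modelled.

```python
def effective_rate(storage_mb_s, crypto_mb_s, pipelined=True):
    """Combined throughput of reading and then decrypting an image, in MB/s.

    pipelined=True  reading and decryption overlap, so the slower stage dominates
    pipelined=False each chunk is read first and decrypted afterwards
    """
    if pipelined:
        return min(storage_mb_s, crypto_mb_s)
    return 1.0 / (1.0 / storage_mb_s + 1.0 / crypto_mb_s)


def load_time_s(image_mb, storage_mb_s, crypto_mb_s, pipelined=True):
    """Seconds to get a decrypted image of image_mb into RAM."""
    return image_mb / effective_rate(storage_mb_s, crypto_mb_s, pipelined)


if __name__ == "__main__":
    image_mb = 16  # hypothetical encrypted image size
    for pipelined in (True, False):
        seconds = load_time_s(image_mb, 200, 10, pipelined)
        print(f"pipelined={pipelined}: {seconds:.2f} s")
```

Either way the result stays close to what a 10 Mbytes/second medium would give, so a faster eMMC alone would not help if decryption really runs at that rate.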

I suspect that regardless of the storage media speed, with secure boot and TF-A, the boot times reported by ddrown using an SD card with no crypto would be comparable; i.e. somewhere around ten second boot time, even with a fully optimized Linux userspace or wakeup-from-hibernation. With eMMC and no crypto, I suspect the boot time could be shrunk to somewhere around four seconds or so. These are guesstimates based on playing with a number of Linux SBCs supporting different boot media, however, so basically pulled out of my hat.

With an RTOS, one would replace everything from the second-stage bootloader onwards. The two second time budget includes both TF-A (remember, ddrown's timings use a variant with no crypto, so we do not actually know how long TF-A takes) and the RTOS boot-up timing, but I believe an RTOS (without uboot) should be able to enumerate itself as a USB device within the two second time limit from powerup, and be fully functional soon enough after that.

guenthert · « **Reply #19 on:** March 25, 2024, 07:51:06 am »

I'd think 2s to boot Linux is quite a challenge, particularly on a not-so-fast MCU. I'm a bit surprised to see the question of boot time determining the viability of Linux in this application. More typically, I would think, factors like familiarity of the developers with the system, availability of 3rd party software (including libraries), performance and soundness of networking stack, need for updateability in the field, cost of hardware (particularly storage) etc. would determine the OS (if any) to use.

If I were determined to make booting Linux on a MCU fast, I'd have a 2nd look at XIP: https://elinux.org/Kernel_XIP

Alternatively, if acting as a USB device is mandated and functionality of Linux is desired (and cost and complexity a lesser concern), a dual CPU configuration would come to my mind, i.e. use a MCU supporting USB device mode with minimal software connected to a Linux system booting leisurely on convenient hardware.

