[solved] Linux/Arm, catastrophic crashes

Nominal Animal:

With my patches, both u-boot and the kernel set the operating CPU clock and the DRAM clock to the minimum. In this configuration I have just demonstrated that the SoM doesn't overheat with a passive heat-sink, so in theory you don't even need a cooling fan.

--- End quote ---
True.  I personally might still put a very low RPM (say, 500) 12V PC chassis fan in a pull configuration in the enclosure, drawing air through a tight mesh or a filter.  The filter helps with dust and the fan noise, too.  A suitable 120×120×25mm fan, say a Nidec Gentle Typhoon, if you can find one, is cheap, basically silent when undervolted for lower RPMs, and is well suited for the task.  (I haven't used 5V fans, so I have no opinion or guidance on those at all.)

For some of my SBCs, I've been looking at ways to mill a new lid for a Hammond die-cast aluminium enclosure, to act as both the lid and the primary heatsink.  I can get the enclosures cheap locally, and a suitable chunk of aluminium is not that expensive, although most of it will be turned into swarf.  The idea is to have sparse ribs on the outside, and pedestals on the inside to meet the SBC chips that heat up, so it has to be much thicker than just a lid.  Wires are pulled through via grommets at the lid-box seam, and the enclosure itself is basically airtight.

In the meantime, the first thing to do is to remove the on-board connector for the cooling fan; this will prevent my colleagues from installing one.
--- End quote ---
Very good idea.  If one does not want to modify the board itself, I'd recommend finding a suitable connector without wires, and pushing that in.  Or even 3D-printing a tiny dummy connector, maybe with a "BAD" or something in raised lettering where the wires usually come out.

(The plastic shroud of the fan connector usually comes off just by pulling, but it could be glued to the board so that pulling it might damage the board.  These boards are multilayer and so tight I wouldn't want to try to desolder a 2/3-pin TH connector off.)

The system is now up and running with an up-time of more than a week  :D :D :D

A few days ago I developed a monitoring program that runs continuously in the background at low priority and checks the integrity of each rootfs file(1).

typedef struct
{
   char_t filename[filename_size];   /* path of the monitored file */
   md5sum_t md5sum;                  /* reference MD5 checksum for that file */
} mon_entry_t;

--- End code ---

There is a huge database in RAM holding the MD5 checksum of each file of the rootfs. The monitoring program calculates a fresh MD5 checksum on the fly and compares it with the value stored in the database.

It's also useful for finding files corrupted by a filesystem crash: the mismatch points straight at the corrupted file, which can then be selectively restored from a backup.
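
The compare step described above could look roughly like the sketch below. It is not the actual monitoring program: checksum_t, file_checksum() and file_is_intact() are hypothetical names, and a 64-bit FNV-1a hash stands in for MD5 only to keep the example free of external libraries (it is not a cryptographic hash).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for md5sum_t: a 64-bit FNV-1a hash, NOT MD5. */
typedef uint64_t checksum_t;

/* Hash a whole file. Returns 0 on success, -1 if the file cannot be read. */
static int file_checksum(const char *path, checksum_t *out)
{
    assert(path != NULL && out != NULL);

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    checksum_t h = 0xcbf29ce484222325ULL;    /* FNV-1a 64-bit offset basis */
    unsigned char buf[4096];
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        for (size_t i = 0; i < n; i++) {
            h ^= buf[i];
            h *= 0x100000001b3ULL;           /* FNV-1a 64-bit prime */
        }
    }

    int failed = ferror(f);
    fclose(f);
    if (failed)
        return -1;

    *out = h;
    return 0;
}

/* Compare a fresh checksum with the stored one.
   Returns 1 if the file matches, 0 if it differs, -1 if it is unreadable. */
static int file_is_intact(const char *path, checksum_t expected)
{
    checksum_t fresh;

    if (file_checksum(path, &fresh) != 0)
        return -1;
    return fresh == expected;
}
```

A background loop would call file_is_intact() for every database entry and log any path that returns 0 or -1 as a candidate for restoring from backup.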

(1) By "rootfs" I mean the files in these folders:

/bin

--- End code ---

/root, /home, /local, etc. Service folders such as /var, /dev, /sys, /proc and /tmp are not considered.
The database consumes ~90 MB of RAM.
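
Skipping the service folders while walking the filesystem might be done with a simple prefix test, as in this sketch. The excluded[] table and is_excluded() are hypothetical names; the list contains only the folders named above, and the real program may select files differently.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical exclusion list: the service folders named in the post. */
static const char *const excluded[] = { "/var", "/dev", "/sys", "/proc", "/tmp" };

/* Returns 1 if path is an excluded folder or lies below one, 0 otherwise. */
static int is_excluded(const char *path)
{
    assert(path != NULL);

    for (size_t i = 0; i < sizeof excluded / sizeof excluded[0]; i++) {
        size_t len = strlen(excluded[i]);

        /* Match whole path components only, so "/variable" is not
           mistaken for something under "/var". */
        if (strncmp(path, excluded[i], len) == 0 &&
            (path[len] == '\0' || path[len] == '/'))
            return 1;
    }
    return 0;
}
```

Only paths for which is_excluded() returns 0 would get an entry in the checksum database.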


