Topic: [solved] Linux/Arm, catastrophic crashes


DiTBho (topic starter)

  • Super Contributor
  • Posts: 3796
  • Country: gb
Re: Linux/Arm, catastrophic crashes
« Reply #25 on: October 11, 2021, 09:38:17 pm »
In the meantime, two new boards have arrived with the cooling fan installed, so I tested them all in parallel with the same configuration, hard-drive brand/model, kernel version and rootfs.

results:
  • two new boards with the cooling fan: both crashed
  • two old boards without the cooling fan: still running (uptime is now 6 days)

I think it's the damn cooling fan: its BLDC motor probably draws current in spikes, and the LDO is unable to filter them out.
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

DiTBho (topic starter)

  • Super Contributor
  • Posts: 3796
  • Country: gb
Re: Linux/Arm, catastrophic crashes
« Reply #26 on: October 11, 2021, 09:39:24 pm »
Tomorrow I will instrument the LDO and take measurements with a DSO.
 

Nominal Animal

  • Super Contributor
  • Posts: 6172
  • Country: fi
Re: Linux/Arm, catastrophic crashes
« Reply #27 on: October 12, 2021, 06:39:47 am »
Depending on the BLDC controller, it could even produce voltage spikes.  Sharing the SoC supply with it is sheer idiocy!  Even I – a bumbleduck hobbyist – would know enough to have a good filter separating the motor (any motor!) from the SoC supply.

Depending on the supply voltage you use, perhaps you could use one of those small buck/boost converters to power a suitable PC case fan (a 3-pin, voltage-controlled one, 8-12 VDC) separately?  I prefer larger ones over smaller ones, and undervolt them to 9-10 V (depending on the fan), for quiet operation but sufficient airflow.  You might consider 3D-printing a baffle and holder for the fan, to direct the airflow.
 

DiTBho (topic starter)

  • Super Contributor
  • Posts: 3796
  • Country: gb
Re: Linux/Arm, catastrophic crashes
« Reply #28 on: October 12, 2021, 08:15:28 am »
Quote from: Nominal Animal on October 12, 2021, 06:39:47 am
Depending on the BLDC controller, it could even produce voltage spikes.  Sharing the SoC supply with it is sheer idiocy!

Yup. I haven't touched the hardware yet, but this will certainly be the first thing to hack and modify.

Quote from: Nominal Animal on October 12, 2021, 06:39:47 am
Even I – a bumbleduck hobbyist – would know enough to have a good filter separating the motor (any motor!) from the SoC supply.

Frankly, I'm still shocked by this discovery. I mean, I don't have the board schematic, and the board itself is so small that you need a magnifying glass to inspect it.

I thought like a user: if I buy a product, it should be well designed... shouldn't it? Umm... no, wrong.

In this case, a problem has "emerged", and it seems to be related to the cooling fan. If this is pure idiocy... well, it's just the *latest* idiocy with this board. Remember the as-shipped choice to overclock the DRAM, worse still on a 1.2 GHz SoC sold without a heat-sink? All incorrect defaults that you (the customer, the end user) need to correct.

Quote from: Nominal Animal on October 12, 2021, 06:39:47 am
Depending on the supply voltage you use, perhaps you could use one of those small buck/boost converters to power a suitable PC case fan (a 3-pin, voltage-controlled one, 8-12 VDC) separately?  I prefer larger ones over smaller ones, and undervolt them to 9-10 V (depending on the fan), for quiet operation but sufficient airflow.  You might consider 3D-printing a baffle and holder for the fan, to direct the airflow.

With my patches, both U-Boot and the kernel set the CPU clock and the DRAM clock to the minimum, and in this configuration I have just demonstrated that the SoM doesn't overheat with a passive heat-sink, so in theory you don't even need a cooling fan.
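The patches themselves do this inside U-Boot and the kernel and aren't shown here. As a different, userspace-only technique (not the author's method), Linux's cpufreq sysfs interface can cap the CPU at its lowest frequency at runtime. A minimal sketch, assuming the usual layout where a directory such as /sys/devices/system/cpu/cpu0/cpufreq contains cpuinfo_min_freq and scaling_max_freq (both in kHz); read_khz() and pin_cpu_to_min() are hypothetical helper names. The DRAM clock has no comparable generic knob, so it is not covered.

```c
#include <stdio.h>

/* Read one unsigned value (kHz) from a cpufreq attribute file in dir. */
static int read_khz(const char *dir, const char *attr, unsigned long *khz)
{
    char path[512];
    snprintf(path, sizeof path, "%s/%s", dir, attr);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%lu", khz) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Cap the policy at its lowest supported frequency by copying
   cpuinfo_min_freq into scaling_max_freq. Needs root on a real system. */
static int pin_cpu_to_min(const char *cpufreq_dir)
{
    unsigned long min_khz;
    if (read_khz(cpufreq_dir, "cpuinfo_min_freq", &min_khz) != 0)
        return -1;
    char path[512];
    snprintf(path, sizeof path, "%s/scaling_max_freq", cpufreq_dir);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = (fprintf(f, "%lu\n", min_khz) > 0);
    /* stdio buffers the write, so a value the kernel rejects shows up as an fclose() error */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

On many multi-core ARM SoCs all cores share one cpufreq policy, so writing cpu0's directory may be enough, but that depends on the board; a setting made this way is also lost at reboot, which is presumably why doing it in U-Boot and the kernel is the more robust choice.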

However, airflow is better, and constant airflow is more than enough, so you certainly don't need the extra noise introduced by a BLDC speed controller. And the motor itself should draw its current from the 12 V power rail, i.e. before, not after, the LDO that supplies the SoC.

That was a very unusual choice: not clever, and rather stupid. I regret not having noticed it before, but I tend to *trust* a product's hardware; in fact, I spent two months thinking it was all software problems (and there really are problems with the software, too).

All boards arrived with a thermal-zone bug in the factory U-Boot, which, worse still, sets the DRAM clock to the highest possible value. The module is sold without any heat-sink, and they say "it's ok, trust us". I tried it myself: with this configuration the module fries like chips, and if it doesn't fry immediately (which, at 1.2 GHz, is just a matter of time)... well... you experience all sorts of weird behavior and continuous crashes.
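To watch for the overheating described above, Linux exposes each thermal zone's temperature as a plain integer in millidegrees Celsius, e.g. /sys/class/thermal/thermal_zone0/temp. A minimal sketch that reads such a file and compares it with a limit; read_temp_mc() and zone_over_limit() are hypothetical names, and which zone corresponds to the SoC (and what limit is sensible) varies by board.

```c
#include <stdio.h>

/* Read a thermal-zone temperature; the kernel reports millidegrees Celsius. */
static int read_temp_mc(const char *path, long *mc)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%ld", mc) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* 1 if the zone is at or above limit_mc, 0 if below, -1 if unreadable. */
static int zone_over_limit(const char *path, long limit_mc)
{
    long mc;
    if (read_temp_mc(path, &mc) != 0)
        return -1;
    return mc >= limit_mc;
}
```

Polling this every few seconds and logging the result would be one way to confirm that a given clock/heat-sink combination stays in a safe range over a multi-day uptime.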

I too prefer a medium-sized cooling fan for quiet operation and sufficient airflow. I have just ordered a dozen 5 V ultra-silent cooling fans, basically a 5 V version of the ones used years ago on 486 CPUs.

The parcel is on its way. In the meantime, the first thing to do is to remove the on-board connector for the cooling fan; this will prevent my colleagues from installing one.
 

Nominal Animal

  • Super Contributor
  • Posts: 6172
  • Country: fi
Re: Linux/Arm, catastrophic crashes
« Reply #29 on: October 12, 2021, 10:42:56 am »
Quote from: DiTBho on October 12, 2021, 08:15:28 am
With my patches, both U-Boot and the kernel set the CPU clock and the DRAM clock to the minimum, and in this configuration I have just demonstrated that the SoM doesn't overheat with a passive heat-sink, so in theory you don't even need a cooling fan.
True.  I personally might still put a very low-RPM (say, 500 RPM) 12 V PC chassis fan in a pull configuration in the enclosure, through a tight mesh or a filter.  The filter helps with dust and fan noise, too.  A suitable 120×120×25 mm fan, say a Nidec Gentle Typhoon, if you can find one, is cheap, basically silent when undervolted for lower RPMs, and well suited to the task.  (I haven't used 5 V fans, so I have no opinion or guidance on those at all.)

For some of my SBCs, I've been looking at ways to mill a new lid for a Hammond die-cast aluminium enclosure, to act as both the lid and the primary heatsink.  I can get the enclosures cheap locally, and a suitable chunk of aluminium is not that expensive, although most of it will be turned into swarf.  The idea is to have sparse ribs on the outside, and pedestals on the inside to meet the SBC chips that heat up, so it has to be much thicker than just a lid.  Wires are pulled through grommets at the lid-box seam, and the enclosure itself is basically airtight.

Quote from: DiTBho on October 12, 2021, 08:15:28 am
In the meantime, the first thing to do is to remove the on-board connector for the cooling fan; this will prevent my colleagues from installing one.
Very good idea.  If one does not want to modify the board itself, I'd recommend finding a suitable connector without wires and pushing that in.  Or even 3D-printing a tiny dummy connector, maybe with "BAD" or something in raised lettering where the wires usually come out.

(The plastic shroud of the fan connector usually comes off just by pulling, but it could be glued to the board, in which case pulling it might damage the board.  These boards are multilayer and so dense that I wouldn't want to try desoldering a 2/3-pin TH connector.)
 

DiTBho (topic starter)

  • Super Contributor
  • Posts: 3796
  • Country: gb
Re: Linux/Arm, catastrophic crashes
« Reply #30 on: October 25, 2021, 11:37:00 am »
The system is now up and running with an up-time of more than a week  :D :D :D

A few days ago I wrote a monitoring program that runs continuously in the background at low priority and checks the integrity of each rootfs file (1).

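Code:
#include <stdint.h>
#define filename_size 256        /* assumed value; the real size is not given */
typedef char    char_t;          /* assumed: plain char */
typedef uint8_t md5sum_t[16];    /* assumed: raw 128-bit MD5 digest */

typedef struct
{
   char_t   filename[filename_size];   /* path of the monitored file */
   md5sum_t md5sum;                    /* reference checksum */
} mon_entry_t;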

There is a huge database in RAM holding the MD5 checksum of each rootfs file; the monitoring program computes a fresh MD5 checksum on the fly and compares it with the value stored in the database.


It's also useful for finding files corrupted by a filesystem crash: this way it's easy to identify the corrupted file and selectively restore it from a backup.
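The per-file check described above can be sketched as follows. This is a minimal, dependency-free sketch, not the author's code: md5_file(), mon_check() and mon_status_t are hypothetical names, and MD5 is implemented inline from RFC 1321 only so the example compiles without extra libraries (a real build would more likely link a library implementation).

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint8_t md5sum_t[16];   /* raw 128-bit digest, as assumed for mon_entry_t */

/* MD5 (RFC 1321): per-step additive constants and rotate amounts. */
static const uint32_t md5_K[64] = {
    0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee, 0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501,
    0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be, 0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821,
    0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa, 0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8,
    0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed, 0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a,
    0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c, 0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70,
    0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05, 0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665,
    0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039, 0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1,
    0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1, 0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391,
};
static const int md5_S[16] = { 7, 12, 17, 22, 5, 9, 14, 20, 4, 11, 16, 23, 6, 10, 15, 21 };

static uint32_t rol(uint32_t x, int c) { return (x << c) | (x >> (32 - c)); }

/* Fold one 64-byte block into the state h[0..3]. */
static void md5_block(uint32_t h[4], const uint8_t *p)
{
    uint32_t M[16], a = h[0], b = h[1], c = h[2], d = h[3];
    for (int i = 0; i < 16; i++)            /* message words are little-endian */
        M[i] = (uint32_t)p[4*i] | (uint32_t)p[4*i+1] << 8 | (uint32_t)p[4*i+2] << 16 | (uint32_t)p[4*i+3] << 24;
    for (int i = 0; i < 64; i++) {
        uint32_t f; int g;
        if (i < 16)      { f = (b & c) | (~b & d); g = i; }
        else if (i < 32) { f = (d & b) | (~d & c); g = (5 * i + 1) % 16; }
        else if (i < 48) { f = b ^ c ^ d;          g = (3 * i + 5) % 16; }
        else             { f = c ^ (b | ~d);       g = (7 * i) % 16; }
        uint32_t t = d;
        d = c; c = b;
        b += rol(a + f + md5_K[i] + M[g], md5_S[(i / 16) * 4 + i % 4]);
        a = t;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d;
}

/* Hash a whole file; returns 0 on success, -1 if it cannot be read. */
static int md5_file(const char *path, md5sum_t out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    uint32_t h[4] = { 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476 };
    uint8_t tail[128];
    uint64_t total = 0;
    size_t n;
    while ((n = fread(tail, 1, 64, f)) == 64) {
        md5_block(h, tail);
        total += 64;
    }
    int err = ferror(f);
    fclose(f);
    if (err)
        return -1;
    total += n;
    /* Padding: 0x80, zeros, then the message length in bits as 64-bit little-endian. */
    memset(tail + n, 0, sizeof tail - n);
    tail[n] = 0x80;
    size_t blocks = (n < 56) ? 1 : 2;
    uint64_t bits = total * 8;
    for (int i = 0; i < 8; i++)
        tail[blocks * 64 - 8 + i] = (uint8_t)(bits >> (8 * i));
    for (size_t i = 0; i < blocks; i++)
        md5_block(h, tail + 64 * i);
    for (int i = 0; i < 16; i++)
        out[i] = (uint8_t)(h[i / 4] >> (8 * (i % 4)));
    return 0;
}

typedef enum { MON_OK, MON_CHANGED, MON_UNREADABLE } mon_status_t;

/* Recompute the file's digest and compare it with the stored reference. */
static mon_status_t mon_check(const char *path, const md5sum_t expected)
{
    md5sum_t fresh;
    if (md5_file(path, fresh) != 0)
        return MON_UNREADABLE;
    return memcmp(fresh, expected, sizeof fresh) == 0 ? MON_OK : MON_CHANGED;
}
```

A background loop would then walk the table of mon_entry_t records and call mon_check() on each one; starting the process under nice is one simple way to get the low priority mentioned above. MON_CHANGED points at the file to restore from backup, while MON_UNREADABLE flags files that vanished or can no longer be read.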


(1) By "rootfs" I mean the files in these folders:
Code:
/bin
/etc
/lib
/opt
/sbin
/usr/bin
/usr/include
/usr/lib
/usr/libexec
/usr/sbin
/usr/share

/root, /home, /local, etc., as well as service folders such as /var, /dev, /sys, /proc and /tmp, are not considered.
The database consumes about 90 MB of RAM.
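With one fixed-size record per file, the table's size is roughly (number of files under the folders listed above) × sizeof(mon_entry_t). A minimal sketch for counting those files with POSIX opendir()/readdir()/lstat(); count_files_in() and count_rootfs_files() are hypothetical helpers, not part of the original program.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Recursively count regular files below dir. lstat() is used so symlinks
   are never followed (and never counted). Unreadable entries are skipped. */
static size_t count_files_in(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return 0;
    size_t count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        char path[4096];
        int len = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (len < 0 || (size_t)len >= sizeof path)
            continue;                       /* path too long: skip it */
        struct stat sb;
        if (lstat(path, &sb) != 0)
            continue;
        if (S_ISREG(sb.st_mode))
            count++;
        else if (S_ISDIR(sb.st_mode))
            count += count_files_in(path);
    }
    closedir(d);
    return count;
}

/* Sum over the monitored roots. A root that is itself a symlink (e.g. /bin -> usr/bin
   on merged-/usr systems) is skipped, so the same files are not counted twice. */
static size_t count_rootfs_files(const char *const roots[], size_t nroots)
{
    size_t total = 0;
    for (size_t i = 0; i < nroots; i++) {
        struct stat sb;
        if (lstat(roots[i], &sb) == 0 && S_ISDIR(sb.st_mode))
            total += count_files_in(roots[i]);
    }
    return total;
}
```

Called with the eleven folders listed above, multiplying the result by sizeof(mon_entry_t) gives a rough estimate of the table's footprint. Since the digest is only 16 bytes, the fixed filename field most likely dominates each entry, so filename_size is the main lever for shrinking the database.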
 

