Topic: [solved] modern gcc pressure


DiTBho (Topic starter)

  • Super Contributor
  • ***
  • Posts: 3915
  • Country: gb
[solved] modern gcc pressure
« on: August 15, 2021, 12:02:50 am »
I am experiencing weird behavior with a SoM.

For some silly reason I found the DDR clock set to 900 MHz while it should be below 700 MHz. I cannot fix it at the moment because I don't have access to the firmware file (u-boot); I am going to fix it next week.

Even with the wrong DDR clock speed, the SoM appears *somehow* stable as long as you do not stress the RAM; otherwise the kernel crashes, and this is really something you wouldn't like to see.

It seems that some issues happen only under heavy load, which is difficult to reproduce in u-boot, so to expose the issue I am using cmake with -j8, running eight c++ processes in parallel to compile Clang.

Compiling Clang is a really heavy job for the SoM, and I am doing it only for testing reasons; usually I cross-compile this heavy stuff on an x86 machine.

It seems recent gcc versions (>= v10) take 1 GB to 1.5 GB of RAM per job. If the system has 8 logical CPUs but only 512 MB of RAM (which is insane, but that's the SoM I have here), the MAKEOPTS value should be lowered, so that the system has enough RAM to run the basics and to compile without constantly hitting swap and slowing everything down.
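
That trade-off can be sketched as a rough calculation. This is only an estimate, not a rule: the 1.5 GB per job is the upper figure quoted above, the 128 MB reserved for the base system is an assumption, and `safe_jobs` is a hypothetical helper, not part of any build tool.

```python
def safe_jobs(ram_mb, per_job_mb, cpus, reserve_mb=0):
    """Estimate how many compile jobs fit in RAM without swapping.

    Always returns at least 1, because a build needs one job even if
    that single job already has to spill into swap.
    """
    usable_mb = max(ram_mb - reserve_mb, 0)
    return max(1, min(cpus, usable_mb // per_job_mb))


# The SoM from this thread: 512 MB RAM, 8 logical CPUs, ~1.5 GB per job.
print(safe_jobs(ram_mb=512, per_job_mb=1536, cpus=8, reserve_mb=128))

# A hypothetical 16 GB build host with the same compiler and CPU count.
print(safe_jobs(ram_mb=16384, per_job_mb=1536, cpus=8, reserve_mb=1024))
```

On the 512 MB board the estimate bottoms out at a single job, which itself does not fit in RAM, so any -j8 build there is running mostly from swap.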

At the moment the kernel has 2 GB of swap allocated on an attached USB HDD, and the weird thing is:
  • with Gcc-v10, the kernel crashes ... claiming the DRAM has some issues (1)
  • with Gcc-v8, the kernel doesn't crash
  • with Gcc-v7, the kernel doesn't crash
And I am perplexed about this  :-//


(1) which is correct: the DRAM clock is wrong, even if it passes all the memtests (2) in u-boot
(2) the u-boot tests do not stress the memory as much as compiling clang with gcc v10 under Linux does
« Last Edit: August 19, 2021, 05:33:45 pm by DiTBho »
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

DiTBho (Topic starter)
Re: modern gcc pressure
« Reply #1 on: August 15, 2021, 12:04:40 am »
kernel v5.12.0-rc2
 

Offline chickenHeadKnob

  • Super Contributor
  • ***
  • Posts: 1055
  • Country: ca
Re: modern gcc pressure
« Reply #2 on: August 15, 2021, 01:45:12 am »
It might be as simple as the SOM running hotter (thus slower timing) under one load versus the other. If you are borderline with the timing then any little thing could push you over the edge. I remember a case a long time ago where we had intermittent and fairly rare crashes on a prototype system. Everyone blamed my software. I couldn't reproduce it until I had the idea of heating the board with a heat gun. I then narrowed it down to the DMA controller timing PAL by accident, when I touched certain PAL pins on the warm board, adding some capacitance. That allowed the board to function correctly even when warm. I got lucky there.

What I am getting at is if you are borderline then I wouldn't place great significance on the C compiler.
 

DiTBho (Topic starter)
Re: modern gcc pressure
« Reply #3 on: August 15, 2021, 11:05:08 am »
Quote from: chickenHeadKnob
"It might be as simple as the SOM running hotter"

There is a heat-sink on the top of the SoC chip in the SoM. But the DDR3 ram chip has no heat-sink.

Quote from: chickenHeadKnob
"What I am getting at is if you are borderline then I wouldn't place great significance on the C compiler."

I think it's somehow connected. Gcc v10 tends to consume much more ram, and move more things to the cache.

I need a way to test the reliability of all the cpufreq voltage/frequency settings.
I am not the author of the hardware and mechanical design, I am the software guy, but I need hardware reliability tests

First, I have to tell my colleagues the importance of the hardware diagnostic tools  :-//
 

DiTBho (Topic starter)
Re: modern gcc pressure
« Reply #4 on: August 15, 2021, 01:07:20 pm »
Anyway, it's frustrating that with common workloads you don't notice the problem. For instance, I have just compiled ruby with Gcc v10; ruby has several parts written in C++, which tend to stress the RAM more than parts written in C, and frankly I was expecting a crash, but there was none.

I mean, I would never have noticed the flaw if I hadn't tried to compile llvm with gcc v10 as a reliability test.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14476
  • Country: fr
Re: modern gcc pressure
« Reply #5 on: August 15, 2021, 06:05:21 pm »
There are a number of dedicated programs for RAM testing. You should probably use them for this purpose rather than relying on running programs you know are taxing for memory, but without knowing in what way. I don't know u-boot much, and thus don't know what the "memtests" you mention are. Can you elaborate?

memtester on Linux is a classic one if you want to try something directly from a Linux session. It's available as a package on many distributions. If you're using a custom Linux, you can build it from sources:
http://pyropus.ca/software/memtester/
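
To give an idea of what pattern-based RAM tests do, here is a toy sketch in Python. It is not memtester's implementation, and because it only touches a small buffer through the interpreter it is no substitute for the real tool; `walking_ones` and `pattern_test` are hypothetical names chosen for illustration.

```python
def walking_ones(width=8):
    """Return the patterns 0b1, 0b10, ... each with a single bit set."""
    return [1 << bit for bit in range(width)]


def pattern_test(size_bytes, patterns):
    """Fill a buffer with each byte pattern, read it back, and return
    a list of (offset, expected, actual) mismatches."""
    buf = bytearray(size_bytes)
    mismatches = []
    for pattern in patterns:
        for i in range(size_bytes):      # write pass
            buf[i] = pattern
        for i in range(size_bytes):      # read-back pass
            if buf[i] != pattern:
                mismatches.append((i, pattern, buf[i]))
    return mismatches


# Solid bits, checkerboard and walking-ones style patterns on a 64 KiB buffer.
patterns = [0x00, 0xFF, 0x55, 0xAA] + walking_ones()
errors = pattern_test(64 * 1024, patterns)
print("ok" if not errors else f"{len(errors)} mismatches")
```

A real tester locks a large region of physical memory and runs many more patterns for many loops, which is why a dedicated tool is preferable to an improvised workload.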

 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: modern gcc pressure
« Reply #6 on: August 15, 2021, 09:52:55 pm »
Fix the memory clock first. Likely the problem isn't the load but the access patterns. A long time ago I had a defective memory module in a computer which caused large compilation runs to fail every now and then. Without the right memory clock the hardware is useless because you can't rely on it so every minute you spend on this is a waste of time. Even if the compiler doesn't crash, it may produce garbage anyway.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

DiTBho (Topic starter)
Re: modern gcc pressure
« Reply #7 on: August 15, 2021, 10:23:04 pm »
Quote from: nctnico
"Even if the compiler doesn't crash, it may produce garbage anyway."

This is exactly the point of frustration. Imagine if I hadn't tested under heavy load: as this is a commercial product you assume it should work fine, so, shallowly, I would have thought everything was fine.

With gcc-v7 and gcc-v8 you don't even see any problems. Everything passes internal tests: ruby passes its tests, gcc passes its tests, etc. Only with gcc-v10 and only under heavy load do you see the kernel crashing for strange reasons, and only then do you start investigating why.

This is how I found that the clock setting of the DRAM in u-boot is completely wrong  :-[
« Last Edit: August 15, 2021, 10:31:09 pm by DiTBho »
 

DiTBho (Topic starter)
Re: modern gcc pressure
« Reply #8 on: August 15, 2021, 11:01:57 pm »
The SoC has 4 cores, each core has 2 threads -> 8 logical CPUs.

Quote from: SiliconWizard
"memtests"

Nothing special: since I don't have any tools on hand, I tried cmake with -j8, running eight Gcc-v10/c++ processes in parallel to compile Clang-v11; the process takes 8 hours and stresses the cache, the RAM and the swap.
 

DiTBho (Topic starter)
Re: modern gcc pressure
« Reply #9 on: August 15, 2021, 11:49:31 pm »
it's DDR3-1333
 

SiliconWizard
Re: modern gcc pressure
« Reply #10 on: August 16, 2021, 12:09:02 am »
Quote from: DiTBho
"The SoC has 4 cores, each core has 2 threads -> 8 logical CPUs.

memtests

Nothing special: since I don't have any tools on hand, I tried cmake with -j8, running eight Gcc-v10/c++ processes in parallel to compile Clang-v11; the process takes 8 hours and stresses the cache, the RAM and the swap."

I'd be curious to see what you get with the memtester program I suggested. It tests memory thoroughly - I've used it in the past to check bad RAM sticks and it spotted the faults with no problem - but it's light on CPU load. So if you let it run for a while and all tests pass, then it might just be a matter of temperature, as chickenHeadKnob suggested. Which wouldn't be surprising, as timings will degrade as temperature rises.
 

DiTBho (Topic starter)
Re: modern gcc pressure
« Reply #11 on: August 19, 2021, 05:27:34 pm »
During the last three days I got sources, hacked u-boot and changed a lot of things.

Now the EHCI works, I can save the env to the SDcard, a lot of improvements.

Anyway, in the binary I got last week, the DRAM was set to a wrong frequency and Gcc v10 froze as described in previous posts. I set it to different, lower frequencies and tested the RAM with memtester v4.5.1

Code:
  Stuck Address       : ok
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok
  Block Sequential    : ok
  Checkerboard        : ok
  Bit Spread          : ok
  Bit Flip            : ok
  Walking Ones        : ok
  Walking Zeroes      : ok
  8-bit Writes        : ok
  16-bit Writes       : ok


The SoM has a heat-sink and a cooling fan; the max monitored temperature was 31 °C.
  • 2x360 -> 720 MHz: crashed after 45 loops
  • 2x312 -> 624 MHz: stable after 99 loops
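
A quick sanity check of these numbers against the rated speed of the part (DDR3-1333, as mentioned earlier in the thread). This is a sketch under two assumptions: the nominal JEDEC transfer rate of 1333.33 MT/s, and that the frequencies quoted in this thread are I/O clock values comparable to the rated I/O clock. The helper names are hypothetical.

```python
def ddr_io_clock_mhz(transfer_rate_mt_s):
    """DDR moves data on both clock edges, so the I/O clock is half
    the transfer rate."""
    return transfer_rate_mt_s / 2.0


def margin_percent(configured_mhz, rated_mhz):
    """Positive result: the configured clock is above the rated clock."""
    return (configured_mhz / rated_mhz - 1.0) * 100.0


# Assumption: nominal DDR3-1333 transfer rate of 4000/3 = 1333.33 MT/s.
RATED_MHZ = ddr_io_clock_mhz(4000.0 / 3.0)

for mhz in (900.0, 720.0, 624.0):
    print(f"{mhz:.0f} MHz: {margin_percent(mhz, RATED_MHZ):+.1f}% "
          f"vs rated {RATED_MHZ:.1f} MHz")
```

Under those assumptions 900 MHz and 720 MHz are about 35% and 8% above the rated clock, while 624 MHz is about 6% below it, which would be consistent with the two results listed above.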


« Last Edit: August 20, 2021, 08:23:28 pm by DiTBho »
 

DiTBho (Topic starter)
Re: [solved] modern gcc pressure
« Reply #12 on: August 23, 2021, 10:53:59 am »
With an updated version of Ninja I compiled llvm again and collected the max stack usage during the process:

Gcc-v8: 250M of 2.00G
Gcc-v10: 1.04G of 2.00G
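
For anyone wanting to collect a comparable figure, here is a minimal sketch of one way to measure the peak memory of a command on Linux. It assumes peak resident set size (`ru_maxrss`) is an acceptable metric; it is not how the numbers above were collected, and `peak_rss_mb` is a hypothetical helper.

```python
import resource
import subprocess
import sys


def peak_rss_mb(cmd):
    """Run cmd to completion and return the peak resident set size, in MB,
    of the largest child process waited for so far.

    Note: on Linux ru_maxrss is reported in kilobytes, and RUSAGE_CHILDREN
    gives the maximum over all terminated children, not a per-command value.
    """
    subprocess.run(cmd, check=True)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return usage.ru_maxrss / 1024.0


# Example with a trivial command; replace it with a single compiler
# invocation to compare compiler versions on the same source file.
print(f"{peak_rss_mb([sys.executable, '-c', 'pass']):.1f} MB")
```

Measuring a single compiler invocation on the same source file with each gcc version gives a per-job figure, which is easier to compare than a total for a parallel build.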
 

