EEVblog Electronics Community Forum

Products => Computers => Programming => Topic started by: DiTBho on January 20, 2021, 11:39:39 am

Title: Two Swap partitions, but out of memory?
Post by: DiTBho on January 20, 2021, 11:39:39 am
I have some doubts

I am trying to compile a relatively big C++ project on a machine that has only 256Mbyte of RAM (don't question why; it's a crazy idea, I know). So I have two swap partitions:

Code: [Select]
# swapon -s
Filename                                Type            Size    Used    Priority
/dev/hda1                               partition       249944  7648    -1
/dev/hdc1                               partition       7543084 0       -2

250Mbyte on a device, and 7.5Gbyte on a second device.

So, isn't 250M+7.5G more than enough if g++ needs 1Gbyte of RAM to compile stuff?

Well ...

Code: [Select]
cc1plus:: internal compiler error: Segmentation fault

g++ didn't think so, and it crashed, complaining that it did not have enough RAM  :wtf:

I disabled the two swap partitions with swapoff, and re-enabled them with swapon in reverse order

Code: [Select]
# swapon -s
Filename                                Type            Size    Used    Priority
/dev/hdc1                               partition       7543084 0       -1
/dev/hda1                               partition       249944  7648    -2

And this time it worked  :D

Yes, but ..... why?  :-//
Title: Re: Two Swap partitions, but out of memory?
Post by: DiTBho on January 20, 2021, 12:11:16 pm
Code: [Select]
/dev/hdc1                          partition       7543084 1155648  -1

I monitored "swapon -s | grep hdc1" and noticed this during g++ execution.
The maximum swap usage was ~1Gbyte
Title: Re: Two Swap partitions, but out of memory?
Post by: golden_labels on January 20, 2021, 01:06:28 pm
Without debugging the whole situation, your question will be very hard to answer in a way other than educated guesses. Unless someone was (un?)lucky to come across that in the past and already knows the explanation.

Does it work with a valid configuration? -2 is not a legal value for swap priority.(1) The minimum is -1.

Another option is that, since gcc is executed on a system that is no longer in the normal mode of operation, some internal bug depending on a race condition within gcc is triggered.

____
(1) In the configuration. Operating system may use any value internally, so if that’s priority set by the system, everything’s ok.
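
For reference, when swapon is run without an explicit priority, Linux hands out progressively lower negative priorities in activation order, and areas with a higher priority are filled first. A minimal Python sketch of reading that from the listing, assuming the `swapon -s` column layout shown in the first post (`parse_swaps` and `SAMPLE` are names made up for illustration):

```python
def parse_swaps(text):
    """Parse `swapon -s` (or /proc/swaps) output into a list of dicts."""
    areas = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        name, kind, size, used, prio = line.split()
        areas.append({
            "name": name,
            "type": kind,
            "size_kib": int(size),
            "used_kib": int(used),
            "priority": int(prio),
        })
    # Higher priority is used first, so sort in descending order.
    return sorted(areas, key=lambda a: a["priority"], reverse=True)


# The listing from the first post in this thread.
SAMPLE = """\
Filename                                Type            Size    Used    Priority
/dev/hda1                               partition       249944  7648    -1
/dev/hdc1                               partition       7543084 0       -2
"""

if __name__ == "__main__":
    areas = parse_swaps(SAMPLE)
    for a in areas:
        print(a["name"], a["size_kib"], "KiB, priority", a["priority"])
    print("total swap:", sum(a["size_kib"] for a in areas), "KiB")
```

With the first listing, the small /dev/hda1 comes first and the large /dev/hdc1 second; the reversed listing simply swaps the two.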
Title: Re: Two Swap partitions, but out of memory?
Post by: DiTBho on January 20, 2021, 01:13:57 pm
Without debugging the whole situation

What would I have to "debug"?
Kernel? It's an old 2.6.39 (I can't update for several reasons), but on this platform it has an uptime of 3 years.
Gcc? It's v4.3.4, and it has proven to work with more than 1994 packages compiled.
G++? It's v4.3.4, and it has proven to work with this large project in the second configuration shown above (and it's able, for example, to recompile cmake).

Does it work with a valid configuration? -2 is not a legal value for swap priority.(1) The minimum is -1.

That's interesting since I simply did
Code: [Select]
swapoff /dev/hda1
swapoff /dev/hdc1

swapon /dev/hdc1
swapon /dev/hda1

I have no idea where the "-2" comes from, but I am surprised that the kernel did not use the second swap partition and instead ran into "not enough memory", which caused g++ to crash.

Another option is that, since gcc is executed on a system that is no longer in the normal mode of operation, some internal bug depending on a race condition within gcc is triggered.


What do you mean by "no longer in the normal mode of operation"?
Title: Re: Two Swap partitions, but out of memory?
Post by: golden_labels on January 20, 2021, 02:33:40 pm
What would I have to "debug"?
Kernel? It's an old 2.6.39 (I can't update for several reasons), but on this platform it has an uptime of 3 years.
Gcc? It's v4.3.4, and it has proven to work with more than 1994 packages compiled.
G++? It's v4.3.4, and it has proven to work with this large project in the second configuration shown above (and it's able, for example, to recompile cmake).
Unfortunately all of the above, which makes it unfeasible. Unless you are extremely bored and have A LOT of free time. ;)

That's interesting since I simply did
Code: [Select]
swapoff /dev/hda1
swapoff /dev/hdc1

swapon /dev/hdc1
swapon /dev/hda1
So the system probably assigned it, and it’s not a problem (as stated in the note).

What do you mean by "no longer in the normal mode of operation"?
The system is already operating in an out-of-memory condition. Abusing swap to pretend there is more memory does not change that.


I can imagine one more possibility: on what devices are the swapfiles located? Aren’t they by any chance some small flash media like an SD card? Perhaps they are damaged and gcc received a mangled page. Such media are not suitable for heavy I/O.
Title: Re: Two Swap partitions, but out of memory?
Post by: DiTBho on January 20, 2021, 03:10:00 pm
on what devices are the swapfiles located?

hda and hdc are hard-drive units. Checked with badblocks, no badblock found.
Title: Re: Two Swap partitions, but out of memory?
Post by: Nominal Animal on January 20, 2021, 05:15:02 pm
You need to tell the kernel how much it is allowed to overcommit; see vm/overcommit-accounting.html (https://www.kernel.org/doc/html/latest/vm/overcommit-accounting.html) in the Linux kernel documentation.

Simply put, you'll want to do
    sudo sysctl vm.overcommit_memory=1
to tell the kernel to always pretend there is enough memory.  This does mean that it can get real nasty swapping (and essentially grind to a halt), but hey, if you need it for this one compile, go for it; the worst thing it can do is crash the system.

(The default, 0, is a heuristic; and I think you are hitting this heuristic.)

Before running the gcc/g++ command, it may be useful to clear caches,
    sudo sh -c 'sync ; echo 3 > /proc/sys/vm/drop_caches; sync ; echo 3 > /proc/sys/vm/drop_caches'
in the hopes that the kernel picks a better working set to keep in RAM then.
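
To watch the accounting while experimenting, /proc/meminfo exposes CommitLimit and Committed_AS (both in kB); note that CommitLimit is only enforced with vm.overcommit_memory=2. A minimal Python sketch, assuming Python is available on the box; `read_commit_info` is a hypothetical helper name and the sample text is made up to show the format:

```python
def read_commit_info(meminfo_text):
    """Return CommitLimit and Committed_AS (in kB) from /proc/meminfo text."""
    wanted = {"CommitLimit", "Committed_AS"}
    info = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in wanted:
            info[key] = int(rest.split()[0])  # the value is followed by "kB"
    return info


# Made-up sample, only to show the format; real values will differ.
SAMPLE = """\
MemTotal:          61932 kB
SwapTotal:       7793028 kB
CommitLimit:     7823994 kB
Committed_AS:      41200 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # not on Linux: fall back to the sample
    print(read_commit_info(text))
```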
Title: Re: Two Swap partitions, but out of memory?
Post by: DiTBho on January 20, 2021, 07:31:07 pm
sysctl vm.overcommit_memory=1

I tried this with the previous setup, and this time it worked perfectly, thank you so much for the tips  :D
Title: Re: Two Swap partitions, but out of memory?
Post by: golden_labels on January 20, 2021, 07:44:59 pm
Nominal Animal: but overcommitting should not prevent gcc from using all available memory depending on the order of swap spaces. Caches would also only affect performance (if anything), as Linux will always free them and replace them with anonymous pages or more suitable caches if needed. What’s your theory on the cause of the crash?
Title: Re: Two Swap partitions, but out of memory?
Post by: Nominal Animal on January 20, 2021, 09:57:29 pm
Nominal Animal: but overcommitting should not prevent gcc from using all available memory depending on the order of swap spaces. Caches would also only affect performance (if anything), as Linux will always free them and replace them with anonymous pages or more suitable caches if needed. What’s your theory on the cause of the crash?
Not a theory, I've seen this before.  You're looking at the wrong place for the bug: it's not in the kernel, but in GCC.

The default heuristic on OP's machine would allow the g++ process to allocate only a limited amount of memory, that is, only a fraction of the available 1G/2G/3G address space (depending on CONFIG_VMSPLIT*, assuming a 32-bit architecture), after which malloc()/mmap() etc. start failing, returning NULL/MAP_FAILED with errno==ENOMEM.

A vast majority of programmers assume this never happens, or that when it happens the process is buggered anyway, so they don't bother to check for it, and instead let the process fail when it dereferences such a NULL pointer.  A variant of this happened to OP.  I bet that if the g++ command were straced, we'd see that the final syscall before the crash is a brk() (from sbrk()) that fails, with the code just assuming that it cannot fail and dereferencing the virtual addresses it would have obtained (but did not), leading to a segmentation fault.



If we look at the original report, we see that gcc crashed due to segmentation fault.  If the underlying reason was true OOM (kernel unable to page stuff in/out efficiently enough), then OOM killer would have triggered, and the process died from KILL signal.  So, we know it was not a true OOM situation: instead, the process dereferenced an invalid pointer.  (I do not believe it was a NULL pointer, and mmap() return value is checked; so the one left is brk()/sbrk() call whose return value was not properly checked.)
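
The discipline being asked for is small: check the result of every allocation and report the failure instead of using the pointer. A minimal sketch of that pattern, written in Python via ctypes so it can be run as-is; it assumes a Unix-like system where `ctypes.CDLL(None)` exposes the C library, and `checked_malloc` is a made-up name, not anything from GCC:

```python
import ctypes
import errno
import os

# Assumption: a Unix-like system where CDLL(None) exposes the C library.
libc = ctypes.CDLL(None, use_errno=True)
libc.malloc.argtypes = [ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def checked_malloc(size):
    """Call malloc() and raise MemoryError instead of returning NULL."""
    ptr = libc.malloc(size)
    if ptr is None:  # ctypes maps a NULL void* to None
        err = ctypes.get_errno()
        raise MemoryError(
            f"malloc({size}) failed: {os.strerror(err or errno.ENOMEM)}")
    return ptr


if __name__ == "__main__":
    p = checked_malloc(4096)
    libc.free(p)
    try:
        # SIZE_MAX bytes can never be satisfied, so this takes the error path.
        checked_malloc(ctypes.c_size_t(-1).value)
    except MemoryError as exc:
        print("reported cleanly:", exc)
```

A process written this way exits with a readable "out of memory" message when the kernel refuses a request, rather than a segmentation fault.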

Most current programmers are utter shit at what they do, really.  They do not consider it worth the effort to have their processes behave correctly when such resource requests are denied.  In the past, GCC developers were semi-fanatical about this, claiming that if the C standard allows something, no matter how stupid, then GCC should do that; and since the C standard does not say anything about "out of memory", that cannot happen in practice either.  (Even now, they refuse to require malloc() to return sufficiently aligned memory to use with SSE/AVX vectors, "since those are not standard types", and we end up needing intrinsic _mm_malloc() or posix_memalign() to allocate memory for them.)
Perhaps they are becoming saner now that GCC switched to C++, but this error message indicates otherwise.

Because this is such a common programming error (not handling an error reported by the kernel is an error, no matter whether a standard says such errors may or may not occur), users like OP do not recognize the difference between resource exhaustion and process crashing because of additional resources being denied by the kernel.
Title: Re: Two Swap partitions, but out of memory?
Post by: magic on January 21, 2021, 09:49:06 am
It's not a GCC bug that changing the order of swap partitions enables it to run :horse:

I'm surprised by that thing, myself. I too would expect to be able to allocate at least as much virtual memory as the sum of RAM and all swap spaces.
Title: Re: Two Swap partitions, but out of memory?
Post by: DiTBho on January 21, 2021, 02:23:20 pm
Code: [Select]
echo 2 > /proc/sys/vm/overcommit_memory

Code: [Select]
# swapoff /dev/hdc1
# ./dymem
trying to allocate 1.0  of memory .. success
trying to allocate 3.0  of memory .. success
trying to allocate 7.0  of memory .. success
trying to allocate 15.0  of memory .. success
trying to allocate 31.0  of memory .. success
trying to allocate 63.0  of memory .. success
trying to allocate 127.0  of memory .. success
trying to allocate 255.0  of memory .. success
trying to allocate 511.0  of memory .. success
trying to allocate 1023.0  of memory .. success
trying to allocate 1.1023k of memory .. success
trying to allocate 3.1023k of memory .. success
trying to allocate 7.1023k of memory .. success
trying to allocate 15.1023k of memory .. success
trying to allocate 31.1023k of memory .. success
trying to allocate 63.1023k of memory .. success
trying to allocate 127.1023k of memory .. success
trying to allocate 255.1023k of memory .. success
trying to allocate 511.1023k of memory .. success
trying to allocate 1023.1023k of memory .. success
trying to allocate 1.1048575M of memory .. success
trying to allocate 3.1048575M of memory .. success
trying to allocate 7.1048575M of memory .. success
trying to allocate 15.1048575M of memory .. failure

Code: [Select]
# swapon /dev/hdc1
# ./dymem
trying to allocate 1.0  of memory .. success
trying to allocate 3.0  of memory .. success
trying to allocate 7.0  of memory .. success
trying to allocate 15.0  of memory .. success
trying to allocate 31.0  of memory .. success
trying to allocate 63.0  of memory .. success
trying to allocate 127.0  of memory .. success
trying to allocate 255.0  of memory .. success
trying to allocate 511.0  of memory .. success
trying to allocate 1023.0  of memory .. success
trying to allocate 1.1023k of memory .. success
trying to allocate 3.1023k of memory .. success
trying to allocate 7.1023k of memory .. success
trying to allocate 15.1023k of memory .. success
trying to allocate 31.1023k of memory .. success
trying to allocate 63.1023k of memory .. success
trying to allocate 127.1023k of memory .. success
trying to allocate 255.1023k of memory .. success
trying to allocate 511.1023k of memory .. success
trying to allocate 1023.1023k of memory .. success
trying to allocate 1.1048575M of memory .. success
trying to allocate 3.1048575M of memory .. success
trying to allocate 7.1048575M of memory .. success
trying to allocate 15.1048575M of memory .. success
trying to allocate 31.1048575M of memory .. success
trying to allocate 63.1048575M of memory .. success
trying to allocate 127.1048575M of memory .. success
trying to allocate 255.1048575M of memory .. success
trying to allocate 511.1048575M of memory .. success
trying to allocate 1023.1048575M of memory .. success
trying to allocate 1.1073741823G of memory .. failure

Code: [Select]
Mem:     61932k total,    57060k used,     8564k free,    14084k buffers
Title: Re: Two Swap partitions, but out of memory?
Post by: DiTBho on January 21, 2021, 02:26:42 pm
request=1, loop(i=0..31): { try_malloc(request); request+=(1<<i) }
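
Judging from the log above, the sizes actually tried are 1, 3, 7, 15, ..., that is 2^(i+1) - 1 bytes, and each line prints the size as "whole units.remainder" plus a unit suffix. A Python sketch that reproduces only the sequence and the label format, without allocating anything (this is a reconstruction from the output, not the dymem source; `request_sizes` and `label` are made-up names):

```python
def request_sizes(steps=32):
    """Yield 1, 3, 7, 15, ...: each request is twice the previous plus one."""
    request = 1
    for _ in range(steps):
        yield request
        request = request * 2 + 1


def label(size):
    """Format a size the way the log does: '<units>.<remainder><suffix>'."""
    for shift, suffix in ((30, "G"), (20, "M"), (10, "k")):
        if size >> shift:
            return f"{size >> shift}.{size & ((1 << shift) - 1)}{suffix}"
    return f"{size}.0 "  # plain bytes: no suffix, remainder printed as 0


if __name__ == "__main__":
    for size in request_sizes(31):
        print(f"trying to allocate {label(size)} of memory ({size} bytes)")
```

Read this way, the first run fails at 2^24 - 1 bytes (just under 16 MiB) and the second at 2^31 - 1 bytes (just under 2 GiB).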
Title: Re: Two Swap partitions, but out of memory?
Post by: Nominal Animal on January 21, 2021, 04:58:34 pm
It's not a GCC bug that changing the order of swap partitions enables it to run :horse:
Changing the order of swap partitions changes the heuristic used by vm.overcommit_memory=0, changing the limit where the kernel stops granting more memory to the process.

The bug in GCC is to crash, instead of reporting that a memory allocation failed.  No matter how much you :horse: the facts here do not change.

I'm surprised by that thing, myself. I too would expect to be able to allocate at least as much virtual memory as the sum of RAM and all swap spaces.
Nope, not with the default vm.overcommit_memory=0 (heuristic).

The vm.overcommit_memory=1 (allow) grants all memory requests.

With vm.overcommit_memory=2 (limit), the vm.overcommit_ratio/vm.overcommit_kbytes comes into play.  To implement swap+RAM limit, do
    sudo sysctl vm.overcommit_memory=2
    sudo sysctl vm.overcommit_ratio=100
which tells the kernel to allow allocations up to (swap + 100% of RAM).  The default vm.overcommit_ratio is 50, which means that just setting vm.overcommit_memory=2 makes the kernel allow allocations up to (swap + 50% of RAM).  This (50) is actually preferred over 100, because the memory needed by the kernel is not included in the overcommit calculations.
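
As a rough worked example of that limit with the numbers posted in this thread (a sketch only: it ignores huge pages and whatever is already committed, and `commit_limit_kib` is a made-up name):

```python
def commit_limit_kib(ram_kib, swap_kib, overcommit_ratio=50):
    """Approximate limit for vm.overcommit_memory=2: swap + ratio% of RAM."""
    return swap_kib + ram_kib * overcommit_ratio // 100


if __name__ == "__main__":
    ram = 61932                  # "Mem: 61932k total" from the output above
    swap = 249944 + 7543084      # hda1 + hdc1 from `swapon -s`, in KiB
    for ratio in (50, 100):
        print(f"ratio={ratio}: {commit_limit_kib(ram, swap, ratio)} KiB")
    # With only hda1 enabled, as in the first dymem run:
    print(f"hda1 only: {commit_limit_kib(ram, 249944)} KiB")
```

The limit applies to the total committed by all processes, so a single process will hit it somewhat earlier than these figures suggest.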
Title: Re: Two Swap partitions, but out of memory?
Post by: golden_labels on January 22, 2021, 03:24:44 am
DiTBho
After Nominal Animal provided the answer and I got some sleep, my brain sees this from a different perspective. Perhaps I should’ve asked this question before attempting to guess the cause of the problem: why are you building that project on such resource-restricted hardware? Wouldn’t it be simpler to cross-compile on a larger machine instead of pushing some poor device beyond its limits? :D
Title: Re: Two Swap partitions, but out of memory?
Post by: DiTBho on January 22, 2021, 10:05:47 am
why are you building that project on such resource-restricted hardware?

It's done mostly for two reasons

Wouldn’t it be simpler to cross-compile on a larger machine instead of pushing some poor device beyond its limits?

I don't like cross-compiling, I sometimes use it with OpenWRT, but I prefer Qemu/$Arch with a native builder. It's a bit slower than cross-compiling, but less suffering.
Title: Re: Two Swap partitions, but out of memory?
Post by: golden_labels on January 22, 2021, 02:40:51 pm
If the device has 256MB of RAM and you want to allocate 1GB of memory, you are clearly exhausting available RAM and the system will be pushed outside its safe operating zone.(1) Keep in mind that swap is not “free memory”. You can’t solve the problem of insufficient RAM by simply throwing more swap space at it. This is not its purpose.

Of course it’s your project; test it thoroughly and if you believe it works well enough, deploy. But be aware that you are in a dangerous area. I am stressing this so much because I’ve seen too many people misusing swap as “free, slower memory”, and the failures associated with that.
____
(1) There are specific workloads with large memory allocated to a process, but tiny actively used set. Those are in the gray area, but they are also rare.
Title: Re: Two Swap partitions, but out of memory?
Post by: DiTBho on January 22, 2021, 03:08:11 pm
If the device has 256MB of RAM

the device has 64Mbyte of soldered ram

Keep in mind that swap is not “free memory”. You can’t solve the problem of insufficient RAM by simply throwing more swap space at it. This is not its purpose.

What's the purpose of swap? Isn't it to give the system more memory than is physically available? It's OK, since I don't have any problem with latency.

Title: Re: Two Swap partitions, but out of memory?
Post by: golden_labels on January 24, 2021, 07:42:21 pm
What's the purpose of swap? Isn't it to give the system more memory than is physically available?
In the 80s and, if someone used Windows from 9x line, in the 90s. ;)

Swap space acts as the backing store for pages that do not have one “naturally”.

Modern operating systems decide which pages should be in RAM to maximize performance. A large portion of them is rarely accessed or not used at all. Keeping them in RAM is a waste of that precious resource. So the OS kicks such pages out of RAM(1) and uses that space to put something more useful there. But here comes the problem: while e.g. .text sections and resources embedded in a binary are only copies of data stored in some persistent storage (HDD, SSD, …) and are duplicated in RAM for speed, anonymous pages are not. Their content can’t simply be erased. Providing swap lets the system create copies elsewhere, if needed, and makes anonymous pages the same class of citizens as other pages. If they have a copy, they can also be removed.

That does not necessarily happen only when anonymous pages use so much RAM that they “overflow into swap”. No, in a perfectly healthy system that has only a small portion of its RAM used for program data, you will still see swap usage growing. That’s because there are other things that go into RAM, and usually they are much more useful than the initialization code of a process one started 2 weeks earlier. As an example, free -h from my machine:
Code: [Select]
              total        used        free      shared  buff/cache   available
Mem:          7.8Gi       1.6Gi       152Mi       124Mi       6.0Gi       5.8Gi
Swap:         3.8Gi       214Mi       3.6Gi
Swap is more important for a system with little RAM usage than if RAM is exhausted.

If programs start actively using more memory than there is RAM(2), the system is no longer able to work normally. It will remain stable, but performance may drop by a few orders of magnitude, latency and latency variance grow considerably, some operations start to time out and, in the worst case, many processes may become completely unresponsive for a long time. If it’s a one-time situation on your personal computer or the nature of the workload allows that(3), you may try waiting through the problems. But if that’s going to be a repetitive task or a situation that occurs outside a personal context, be aware that you are outside of the specs.

There is a bit more technical discussion of the topic in Chris Down’s “In defence of swap” (https://chrisdown.name/2018/01/02/in-defence-of-swap.html). The article is primarily his input into the discussion on whether swap should even be used nowadays, but it also contains some technical background.
____
(1) Details are a bit more complex.
(2) In reality even less than the whole RAM.
(3) For example a few times I was stitching a large panorama: the locality of reference was very high, so no problems were observed.
Title: Re: Two Swap partitions, but out of memory?
Post by: DiTBho on January 26, 2021, 04:49:26 pm
As a test, I compiled gcc-4.4.3 with language=c,c++.
It took 1 day 5 hours 31 min 4 sec, and (probably for the C++ stuff) it consumed up to 400Mbyte of stack.
Performance? Well ... I added a ramdisk to this machine, and moved the swap there.
It took 10 hours 10 min 2 sec


harddrive swap: 1 day 5 hours 31 min 4 sec
ramdisk swap: 9 hours 33 min 2 sec
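
Putting those timings into numbers: the post gives two ramdisk figures (10 h 10 min 2 s and 9 h 33 min 2 s), so this small Python sketch computes the speedup for both; `seconds` is just a helper written for this note:

```python
def seconds(days=0, hours=0, minutes=0, secs=0):
    """Convert a days/hours/minutes/seconds duration to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + secs


hdd = seconds(days=1, hours=5, minutes=31, secs=4)   # hard-drive swap
ram_a = seconds(hours=9, minutes=33, secs=2)         # ramdisk swap, summary
ram_b = seconds(hours=10, minutes=10, secs=2)        # ramdisk swap, first figure

if __name__ == "__main__":
    print(f"hard-drive swap: {hdd} s")
    print(f"speedup vs 9h33m:  {hdd / ram_a:.1f}x")
    print(f"speedup vs 10h10m: {hdd / ram_b:.1f}x")
```

Either way, moving the swap to the ramdisk made the build roughly three times faster.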