I wonder ... why the kernel doesn't just use the swap?
It is a transitory problem, so by the time gcc is killed, the kernel memory statistics no longer reflect the exact situation when the problem happened.
I do believe it does use swap, all of it. The reason it is not deterministic is that each command invocation creates a set of processes (four or five or something like that), and their order and time-wise overlap vary.
The other possibility is that there are still some kernel-userspace-API/ABI issues left that cause GCC to occasionally keel over, but I don't think so at all. Newer versions of GCC are horrible memory hogs.
To be sure, I would run
strace -o log -ff gcc "$CFLAGS" -o obj/hallo2.o -c hallo2.c
and examine the trace, and in particular compare it to a trace obtained on x86-64; then repeat that until it triggers the SIGBUS, to find out exactly where it happens. Because of -ff, the trace is written as a set of log.PID files, one per process. Running
strace -o log -e trace=%memory -ff gcc "$CFLAGS" -o obj/hallo2.o -c hallo2.c
limits the logs to memory-mapping system calls, which will tell you how much memory gcc and its subprocesses actually allocate. (Before running either command, do remember to run rm -f log.* to remove old log files. Since the process IDs change from run to run, each run creates new files rather than overwriting the old ones.)
It would be very simple to implement a tiny library (loaded using LD_PRELOAD) interposing sigaction() and chaining SIGBUS so that a custom handler would be called before the set handler (GCC does set a SIGBUS handler). When SIGBUS is delivered, the handler would dump the address and /proc/self/maps and/or /proc/self/smaps to standard error or a dedicated file, describing the exact state of the process memory mappings at the point of error, remove itself from the chain, then call the set handler. The only tricky bit is that it has to be async-signal safe, too, so it would have to use <unistd.h> low-level I/O and a small (4k) static/global array as the I/O buffer.
Basically something akin to catchsegv, except for SIGBUS. Let me know if you'd like one.
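In the meantime, here is a rough, untested-on-your-hardware sketch of what such an interposer could look like, assuming Linux with glibc. The names real_sigaction, app_action, bus_handler, format_hex and dump_file are made up for this illustration. It only wraps sigaction(); a program that installs its handler through signal() would probably need that wrapped the same way.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

typedef int (*sigaction_fn)(int, const struct sigaction *, struct sigaction *);

static sigaction_fn     real_sigaction; /* libc's sigaction(), resolved lazily */
static struct sigaction app_action;     /* SIGBUS action the program asked for */
static char             buffer[4096];   /* static I/O buffer, no malloc() in a handler */

/* Write all n bytes to fd, retrying on EINTR and short writes. */
static void write_all(int fd, const char *data, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, data, n);
        if (w > 0) { data += w; n -= (size_t)w; }
        else if (w != -1 || errno != EINTR) return;
    }
}

/* Format value as "0x..." into out (needs 2 + 2*sizeof value bytes); returns the length. */
static size_t format_hex(char *out, uintptr_t value)
{
    char   digits[2 * sizeof value];
    size_t n = 0, len = 0;
    do {
        digits[n++] = "0123456789abcdef"[value & 15];
        value >>= 4;
    } while (value);
    out[len++] = '0';
    out[len++] = 'x';
    while (n > 0)
        out[len++] = digits[--n];
    return len;
}

/* Copy a file such as /proc/self/maps to fd using only open/read/write/close. */
static void dump_file(const char *path, int fd)
{
    int src = open(path, O_RDONLY);
    if (src == -1) return;
    for (;;) {
        ssize_t r = read(src, buffer, sizeof buffer);
        if (r > 0) write_all(fd, buffer, (size_t)r);
        else if (r != -1 || errno != EINTR) break;
    }
    close(src);
}

/* Called first on SIGBUS: dump the address and mappings, then hand over. */
static void bus_handler(int sig, siginfo_t *info, void *context)
{
    static const char prefix[] = "SIGBUS at address ";
    int    saved_errno = errno;
    char   line[64];
    size_t len = format_hex(line, (uintptr_t)info->si_addr);

    line[len++] = '\n';
    write_all(STDERR_FILENO, prefix, sizeof prefix - 1);
    write_all(STDERR_FILENO, line, len);
    dump_file("/proc/self/maps", STDERR_FILENO);

    /* Remove ourselves from the chain, then call the program's own handler. */
    real_sigaction(SIGBUS, &app_action, NULL);
    errno = saved_errno;
    if (app_action.sa_flags & SA_SIGINFO)
        app_action.sa_sigaction(sig, info, context);
    else if (app_action.sa_handler != SIG_DFL && app_action.sa_handler != SIG_IGN)
        app_action.sa_handler(sig);
    /* With SIG_DFL, returning from a real fault repeats the access and the default action applies. */
}

/* Interposed sigaction(): everything except SIGBUS goes straight to libc. */
int sigaction(int signum, const struct sigaction *act, struct sigaction *oldact)
{
    if (!real_sigaction)
        real_sigaction = (sigaction_fn)dlsym(RTLD_NEXT, "sigaction");
    if (!real_sigaction) { errno = ENOSYS; return -1; }
    if (signum != SIGBUS)
        return real_sigaction(signum, act, oldact);

    struct sigaction previous = app_action;
    if (act) {
        struct sigaction ours = *act;   /* keep the program's mask and flags */
        ours.sa_sigaction = bus_handler;
        ours.sa_flags |= SA_SIGINFO;
        if (real_sigaction(SIGBUS, &ours, NULL) == -1)
            return -1;
        app_action = *act;
    }
    if (oldact)
        *oldact = previous;             /* report what the program itself last set */
    return 0;
}
```

Saved as, say, busdump.c (a hypothetical name), it should build with something like gcc -Wall -O2 -fPIC -shared busdump.c -ldl -o libbusdump.so, and be used by prefixing the gcc command with LD_PRELOAD=$PWD/libbusdump.so so that the subprocesses inherit it. Swapping "/proc/self/maps" for "/proc/self/smaps" gives the more detailed dump.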