EEVblog Electronics Community Forum
Products => Computers => Programming => Topic started by: DiTBho on February 14, 2023, 08:21:19 pm
-
BUG: scheduling while atomic
critical panic
reset
(kernel Linux-4.* on a MIPS32LE SoC made by IDT)
Now, that "Scheduling while atomic" indicates that you've tried to sleep somewhere that you shouldn't - like within a spinlock-protected critical section or an interrupt handler.
But the error message doesn't come from a kernel driver; it comes from the kernel core, recompiled with
Preemption Model: Preemptible Kernel, Low-Latency
Worse still, the same kernel, compiled with the same toolchain, works perfectly on the same hardware when built with "Voluntary Preemption", which is where the kernel only checks at explicit points "while doing kernel things" to see if it should reschedule processes, so it puts less stress on the kernel's concurrency mechanisms.
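To make the difference between the two preemption models concrete, here is a toy user-space model in Python; it is not kernel code. It assumes (this is not confirmed anywhere in this thread) that a fully preemptible 4.x build tracks a per-CPU preempt count that `spin_lock()` increments, while a voluntary-preemption build without lock debugging does not track it, so the same sleep-under-lock bug can go unreported there. `ToyCpu` and `buggy_path` are made-up names for illustration only.

```python
class ToyCpu:
    """Toy model of one CPU's preempt_count bookkeeping (not kernel code)."""

    def __init__(self, preempt_count_tracked=True):
        # Assumption: True models a fully preemptible build, False models a
        # voluntary-preemption build without lock debugging enabled.
        self.preempt_count_tracked = preempt_count_tracked
        self.preempt_count = 0
        self.messages = []

    def spin_lock(self):
        # Taking a spinlock enters an atomic section.
        if self.preempt_count_tracked:
            self.preempt_count += 1

    def spin_unlock(self):
        if self.preempt_count_tracked:
            self.preempt_count -= 1

    def schedule(self):
        # The scheduler refuses to run while the CPU is marked atomic.
        if self.preempt_count != 0:
            self.messages.append("BUG: scheduling while atomic")
            return False
        return True


def buggy_path(cpu):
    """Hypothetical code path that sleeps while holding a spinlock."""
    cpu.spin_lock()
    ok = cpu.schedule()  # sleeping with the lock held: this is the bug
    cpu.spin_unlock()
    return ok


preemptible = ToyCpu(preempt_count_tracked=True)
voluntary = ToyCpu(preempt_count_tracked=False)
print(buggy_path(preemptible), preemptible.messages)
print(buggy_path(voluntary), voluntary.messages)
```

If that assumption holds for this kernel, a build that "works" with Voluntary Preemption would not prove the code path is correct, only that nothing is counting.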
A kernel that panics with "scheduling while atomic" looks no good, something like a walking-dead kernel, and I am out of ideas at the moment, but umm, it seems like the right job for a serious kernel bug-hunter.
Suggestions are welcome ;D
-
Where's the stack trace? We need the stack trace related to the error message (and kernel version so we can look at the code at say bootlin Elixir cross referencer (https://elixir.bootlin.com/linux/latest/source)).
My first suggestion is to enable and run lockdep (https://www.kernel.org/doc/html/latest/locking/lockdep-design.html), to verify the kernel locking is correct. This is to make sure the underlying cause is not a code path where a lock that was supposed to be released, isn't.
While that runs (it makes the kernel slow as molasses), I'd look at each component along the stack trace, and check the git log (at the GitHub kernel mirror (https://github.com/torvalds/linux/)) to see if any locking-related changes have been applied since that kernel version; lockdep problems and sleeping while atomic are quite common bugs that do get fixed by shuffling the locking code. (For whatever reason, it seems to be quite hard to get locking and concurrency right.)
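The git log check can be scripted per file along the stack trace. A minimal sketch in Python, assuming a local clone of the kernel tree; the helper names, the `v4.14` tag and the `kernel/sched/core.c` path are only example values, not taken from any actual stack trace in this thread. Note that `--grep` matches commit messages, not the diffs themselves, so it is a first filter rather than a complete search.

```python
import subprocess


def locking_log_cmd(since_tag, path, pattern="lock"):
    """Build a git command listing commits after since_tag that touch path
    and mention pattern (case-insensitive) in their commit message."""
    return [
        "git", "log", "--oneline", "-i", f"--grep={pattern}",
        f"{since_tag}..HEAD", "--", path,
    ]


def locking_commits(repo_dir, since_tag, path, pattern="lock"):
    """Run the command inside repo_dir and return one line per commit."""
    done = subprocess.run(
        locking_log_cmd(since_tag, path, pattern),
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return done.stdout.splitlines()


# Example values only: print the command that would be run.
print(" ".join(locking_log_cmd("v4.14", "kernel/sched/core.c")))
```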
Yes, it is drudge work, looking at details.. but that's what bugfixing is, isn't it?
-
Thanks.
I will try to provide more info.
kernel version
it's the whole 4.* class, from kernel-4.04.197 ... to kernel-4.14.287.
kernel-{ 4.14.288 .. 4.14.290 } do not even boot.
I have to fix a serious problem with CrossDev at the moment because I cannot compile any cross-compiler gcc < v6, and I cannot compile kernel v4.0.*, v4.1.*, v4.2.*, v4.3.* with gcc { v6.5, v7, ...}.
I have an old rootfs around with old compilers, I will try first with them.
You need gcc-v4.* to compile kernel v2.6.*
You need gcc-v5.* to compile kernel v3.* and early kernel v4
Also, I'd like to try kernels v5 and v6 to see if things have been resolved in the meantime.
(I doubt it; the first v5 kernel I tried does not even boot.)
Anyway, I have managed to make an automatic builder and results checker
- apply patches (if any)
- configure (according to a given profile)
- build kernel
- tftp-upload to target
- reset target and make it boot
- log and check results
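
The steps above can be sketched as a small driver that runs each step in order, stops at the first failure, and scans the captured boot log for known failure strings. This is only an illustrative Python sketch, not the actual builder: `run_pipeline`, `check_boot_log` and the step actions are hypothetical, and the "Kernel panic" pattern is an assumption about what the console prints.

```python
import re

# Strings that mark a failed boot. The first comes from the error reported
# above; "Kernel panic" is an assumed console message.
FAILURE_PATTERNS = [
    re.compile(r"BUG: scheduling while atomic"),
    re.compile(r"Kernel panic"),
]


def check_boot_log(text):
    """Return the failure patterns found in the captured console log."""
    return [p.pattern for p in FAILURE_PATTERNS if p.search(text)]


def run_pipeline(steps):
    """Run (name, action) pairs in order and stop at the first failure."""
    results = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            results.append((name, f"failed: {exc}"))
            break
        results.append((name, "ok"))
    return results


def fail_build():
    # Stand-in for a real build step that goes wrong.
    raise RuntimeError("compiler error")


# Placeholder actions; a real builder would call patch, make, tftp and so on.
demo_steps = [
    ("apply patches", lambda: None),
    ("configure", lambda: None),
    ("build kernel", fail_build),
    ("tftp-upload to target", lambda: None),
]
print(run_pipeline(demo_steps))
print(check_boot_log("booting...\nBUG: scheduling while atomic: swapper/0\n"))
```

Stopping at the first failed step keeps a broken build from being uploaded to the target, and keeping the failure strings in one list makes it easy to add new ones as they show up.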