EEVblog Electronics Community Forum
EEVblog => EEVblog Specific => Topic started by: EEVblog on December 09, 2018, 11:48:08 pm
-
David found a linker bug in the GCC 7.2 ARM-EABI (embedded) while working on the µSupply firmware. An example of how your project development can completely stop when you encounter errors like this.
https://www.youtube.com/watch?v=BfSZTr3eBHE
-
Reduce the resource load on the linker by reducing the number of symbols it has to deal with.
Go into your code and make as many functions and variables as you can local in scope. Maybe that'll fix it.
-
Where does this version of GCC come from? Even the official launchpad version is newer, so it is definitely worth trying. There is no real point in reporting bugs against old versions.
-
Right, first try the latest GCC version, maybe the bug is already fixed.
If not, good luck finding the bug in GCC. If you have lots of time, you can compile GCC with debug information and then debug the compiler and linker. If you don't have time, compile it with Keil µVision, they use a different compiler, written by ARM. It is even free, if you need less than 32 kB.
If you want to stick with GCC, and the latest version has still the bug, just comment out everything step by step until the linker error doesn't happen anymore. Or rollback to older versions of your code that works (I assume it is all version controlled and you commit often) and apply the changes to the current version step by step. You can then see exactly what causes the problem and probably find a workaround for it, like different link order etc., or the GCC guys can help then.
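That roll-back-and-reapply approach is exactly what git bisect automates. A toy sketch (throwaway repository; the grep stands in for a hypothetical "does it still link?" check):

```shell
# Toy repo: three commits, the last one introduces the "breakage".
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name dev
echo good > main.c
git add main.c && git commit -qm "works"
git commit -q --allow-empty -m "still works"
echo broken > main.c
git commit -qam "breaks the link"

# Bisect between known-bad (HEAD) and known-good (HEAD~2); bisect run
# treats exit 0 as "good" and nonzero as "bad".
git bisect start HEAD HEAD~2
git bisect run grep -q good main.c
git bisect log | tail -n 2
```

Once bisect names the first bad commit, diffing it against its parent shows exactly which change made the linker blow up.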
PS: GCC code quality sucks. By fuzzing the compiler, someone found 100 bugs (http://www.vegardno.net/2018/06/compiler-fuzzing.html). The same fuzzing found 8 bugs in the Rust compiler and 9 bugs in LLVM/clang. Maybe it is possible to use LLVM/clang as the compiler and linker, but that might need some work writing new linker scripts, Makefiles etc.
-
Is it possible to dump several source files into one translation unit to decrease the number of object files and avoid the need to link so many of them? You may be able to pinpoint where the shunt hits the fan! :P
-
(Posted on YouTube, but realised this is probably the better place to have a discussion about this.)
There is an open bug in arm-none-eabi-gcc to do with link-time optimisation (-flto). Not sure if this is what you are using here, however.
If it is:
From the reports I have seen, it only seems to pop up with STM32 cores.
It has to do with the order of the files being linked, and specifically the weakly defined functions in the start.s assembly file being discarded.
This happens if you pass the assembly file to the linker after all the C and C++ files.
The temporary fix for this bug is to pass the assembly files to the linker first.
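In a hand-written link rule, that workaround is just argument order. A sketch only: the object names and linker script are placeholders, and it only matters if you are hitting that particular -flto bug:

```shell
# Workaround sketch: put the assembled startup object before the C/C++
# objects so its weakly defined symbols aren't discarded under -flto.
arm-none-eabi-gcc -flto -T linker.ld \
    startup_stm32f0xx.o \
    main.o adc.o display.o \
    -o firmware.elf
```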
Otherwise it might be worth checking whether this is a Windows-specific issue with the compiler.
If the code is in a VCS somewhere, maybe ask someone on the forums who has an STM32F0 dev environment set up to see if it is some random setting?
As some have mentioned, you could try compiling on Linux. If you haven't got any of that set up, or have little interest in setting it up, perhaps ask one of the forum members to give it a crack and see if it can be linked there?
Compiling quite large codebases for STM32 products at work, we have only run into the aforementioned actual bug in the compiler/linker.
There have been instances where, using an Eclipse-based IDE, the Java VM gets messed up and starts giving super weird compiler/linker errors. But those issues never followed the code when someone else tried to compile.
-
Where does this version of GCC come from? Even the official launchpad version is newer, so it is definitely worth trying. There is no real point in reporting bugs against old versions.
Using an older, more stable version is probably a better choice. I never use the latest GCC version; always an older one (previous major version number) with a lot of bugs fixed. When it comes to software, remember: 'newer is always worse'.
-
Here's just a thought. It was hard to tell from the screen captures, but it looks like you might be passing lots of object files to the linker in the link command?
Have you tried putting all your object files into a library first, and then passing the library to the linker so it can resolve symbols that way? It might be more efficient for the linker than forcing it to consume all the object files directly.
-
I think ataradov is more questioning that the current official release of the arm-none-eabi toolchain is the Q2 release, which on linux at least is 7.3.1.
-
You should at least try with the latest 7-2018-q2 release from https://developer.arm.com/open-source/gnu-toolchain/gnu-rm/downloads (even only the ld.exe if you have issues with the other changes).
As their binary releases aren't built with debug information, you won't get a call stack unless you rebuild the toolchain, which should be done on Ubuntu if you follow the official instructions (add the "--build_type=native,debug" argument to their build-toolchain.sh, otherwise they strip the binaries).
So I think it would be easier if you could reproduce the problem on Linux (you don't need to build the whole project, you can reuse the object files and the custom linker script if there is one).
Also, you are using ARM's toolchain, not official GCC/binutils releases, so the bugs should be reported at https://bugs.launchpad.net/gcc-arm-embedded/ (maybe the bug doesn't even exist on binutils master branch, only on ARM ones).
-
Compile on Linux; if that fails, run the linker under 'strace' in the build script on Linux.
-
May not solve this particular issue, but as people pointed out on YouTube, NEVER build code in a OneDrive/Dropbox folder. The build process creates/modifies/deletes hundreds or thousands of small files, and when it runs in such a shared folder, things frequently break because the sync program locks files while it tries to transfer them online.
If you need to put the work online, use a proper repo system made for code, not an automated sync system.
The issue is very unlikely to be the linker itself, but something in your code causing it to barf. As someone also suggested on YT, I imagine there was a point where it still built. I'd look very closely at, and maybe post, what you changed when it started failing.
-
It is not gcc that needs to be compiled with debug-information, but binutils (binutils includes all the low-level utilities).
-
David, do you have tar-ball / .zip of all the files you are trying to link, and the ld command that is failing?
-
May not solve this particular issue, but as people pointed out on Youtube NEVER build code in a OneDrive/Dropbox folder. The build process creates/modifies/deletes hundreds/thousands of small files during build and when in such a shared folder it frequently causes things to break due to the sync program locking files while it tries to transfer them online.
If you need to put the work online use a proper repo system made for code but not an automated sync system.
The issue is very unlikely to be the linker itself, but something in your code causing it to barf. As someone also suggested on YT, I imagine that there was a point where it still built. Should look very closely, maybe post what you changed when it started failing.
Yep, development will only "completely stop" if you decide to let it block you. Go back to the last working build, and either figure out what change broke it, or work on other functionality.
-
May not solve this particular issue, but as people pointed out on Youtube NEVER build code in a OneDrive/Dropbox folder. The build process creates/modifies/deletes hundreds/thousands of small files during build and when in such a shared folder it frequently causes things to break due to the sync program locking files while it tries to transfer them online.
If you need to put the work online use a proper repo system made for code but not an automated sync system.
The issue is very unlikely to be the linker itself, but something in your code causing it to barf. As someone also suggested on YT, I imagine that there was a point where it still built. Should look very closely, maybe post what you changed when it started failing.
Using OneDrive/Dropbox can definitely mess up your build process. Make uses timestamps to decide whether a file has been modified since the last time it was compiled. Even a network mount can upset make.
An easy way to debug this is to copy the source code and compile it with make -B.
Maintaining your source code with git can help you get back to a working version.
-
Made a forum account since most people might not realize this.
GCC was designed around large stack sizes. The easiest way to get this to work is to modify the binary's stack size to be the same as on Linux.
Default stack sizes:
Windows: 1MB
Linux: 8MiB
editbin /STACK:reserve[,commit]
So I would probably try playing around with this. You will need to be in the "Developer Command Prompt for VS", which should be in the Visual Studio menu.
For example:
editbin /STACK:8388608 ld.exe
You can be more conservative with 2097152 (2MB) but since this is a developer tool, I don't see a problem with using 8MB.
Note: I probably have the MiB/MB wrong since I'm finding conflicting material. I apologize if it's confusing.
-
Would be great if David replied, assuming he has a forum account, if nothing else just to confirm he reads this thread.
-
I've had plenty of issues like this. It's usually easiest to undo what you did that made it break and carefully analyze that code to see if anything is odd. If that doesn't work, you can start commenting out blocks. If you haven't been using any version control, even if just your memory, then you sort of just need to go through everything. Do you compile with -Wall? Without seeing the code I can only suggest the basic diagnostic method. :-//
-
..but it's Open Source, so you can fix issues like this yourself...
Seems you get what you pay for.
BTW there are code-limited free versions of IAR and other professional tools, which ought to be enough for something as simple as a power supply...
I'll get my coat.
-
When I was working on a big project (ZigBee stack, ~150K binary), we have reported numerous bugs to IAR, it is no better. The first few were painful, since we had to go through the support people. At some point we've just got direct contact with the engineers and things got better.
-
When I was working on a big project (ZigBee stack, ~150K binary), we have reported numerous bugs to IAR, it is no better. The first few were painful, since we had to go through the support people. At some point we've just got direct contact with the engineers and things got better.
Not just that but ARM GCC is maintained by ARM themselves so saying it is free isn't exactly right.
-
From the video it seems you are using Eclipse/CDT + gnu arm toolchain.
Have you tried building the same project using Atollic True Studio?
It ships with modified versions of gcc/gdb.
Since it's based on Eclipse and officially supported by ST it could work.
https://atollic.com/truestudio/
-
When I was working on a big project (ZigBee stack, ~150K binary), we have reported numerous bugs to IAR, it is no better. The first few were painful, since we had to go through the support people. At some point we've just got direct contact with the engineers and things got better.
This ^^^
Same with Keil. Despite having a fully paid license, we once had to download a pirated copy of their RTX RTOS to fix a bug they weren't so inclined to fix...
Last week I spent one day debugging code I'd have sworn on my life was correct but didn't work as expected (a simple FIFO queue).
Know what? Changed compiler revision and it magically worked. Spent 3 hours digging through change logs and found this:
[SDCOMP-48383] In certain circumstances, when compiling at -O3 -Otime or with --vectorize, the compiler could generate incorrect code for a do while loop with a controlling expression that contains a postfix decrement operator with an operand of unsigned type.
This was the (pseudo-)code that triggered the bug (not kidding...)
do {
    if (queue->buf[cnt] == MAX)
        do_something();
} while (cnt--);
So paid does not always mean better... ;)
-
When I was working on a big project (ZigBee stack, ~150K binary), we have reported numerous bugs to IAR, it is no better. The first few were painful, since we had to go through the support people. At some point we've just got direct contact with the engineers and things got better.
Not just that but ARM GCC is maintained by ARM themselves so saying it is free isn't exactly right.
In what sense does it make it "not free"? That's like saying that since RedHat maintains a lot of GCC/employs many maintainers "saying GCC is free isn't exactly right".
"Free-ness" of software doesn't depend on who maintains the code but on the license (when talking free as in freedom) and on price (when talking free as in beer). In GCC case the license is still GPL and it is still a free download, both the "normal" and the ARM versions.
-
When I was working on a big project (ZigBee stack, ~150K binary), we have reported numerous bugs to IAR, it is no better. The first few were painful, since we had to go through the support people. At some point we've just got direct contact with the engineers and things got better.
Not just that but ARM GCC is maintained by ARM themselves so saying it is free isn't exactly right.
In what sense does it make it "not free"? That's like saying that since RedHat maintains a lot of GCC/employs many maintainers "saying GCC is free isn't exactly right".
"Free-ness" of software doesn't depend on who maintains the code but on the license (when talking free as in freedom) and on price (when talking free as in beer). In GCC case the license is still GPL and it is still a free download, both the "normal" and the ARM versions.
Not free as in: with every ARM device you buy you pay for the development of gcc for the ARM platform.
It is kind of like (commercial) TV and radio aren't free even though you don't have to pay anything to the TV station. With every product you buy you pay for TV and radio.
-
He's probably twisting it in the sense that "we all pay a little something for it when we buy something with an ARM core".
-
..but it's Open Source, so you can fix issues like this yourself...
Seems you get what you pay for.
BTW there are code-limited free versions of IAR and other professional tools, which ought to be enough for something a simple as a power supply...
I'll get my coat.
Hey Mike, why don't you go PIC your nose... ;D
(...No, I don't have anything to contribute to this thread. I use avr-gcc and have found zero problems with it.)
Tim
-
I think the point of the video was not to solve the problem, but to show just how horrible tools are, since the most obvious solution or at least path to a workaround was mentioned here multiple times.
The AVR-GCC toolchain is infamous for its "relocation truncated to fit: R_AVR_13_PCREL" linker errors, which pop up for multiple reasons, so they get fixed and pop up again. This happens from time to time; not a huge deal, just rearrange some stuff in the code and it goes away.
-
I think the point of the video was not to solve the problem, but to show just how horrible tools are, since the most obvious solution or at least path to a workaround was mentioned here multiple times.
The AVR-GCC toolchain is infamous for its "relocation truncated to fit: R_AVR_13_PCREL" linker errors, which pop up for multiple reasons, so they get fixed and pop up again. This happens from time to time; not a huge deal, just rearrange some stuff in the code and it goes away.
Hmm, so a discrepancy between where the compiler expected the data to be, and where the linker wanted/had to put it? I can see that being a problem with the >128k AVRs, yes. Or more generally, any system with segmented memory, or even more generally, a platform with different sizes of pointers.
And yeah, everything is horrible. Anything only ever gets fixed just enough to be usable, for some arbitrary (market-driven?*) degree of "usable". Any sufficiently large project, is too large to ever find all the bugs, or unexpected quirks, before release.
*Hey, you guys all like capitalism, right? Right?... :P
Tim
-
Hmm, so a discrepancy between where the compiler expected the data to be, and where the linker wanted/had to put it?
No, the linker is just not able to place the code. LD does not "shuffle" functions around; it places things in a linear order, and once placed, a function can no longer be taken out and placed somewhere else. And two or more functions may end up located so that the distance between them is bigger than a 13-bit relative offset allows.
This can happen on smaller memory devices too. 13 bits is +/- 8K words, so any device with a memory size over that may be affected.
And that just describes the legitimate case. This particular linker error has also happened in a few cases because of compiler bugs and had nothing to do with the linker itself.
-
Hmm, so a discrepancy between where the compiler expected the data to be, and where the linker wanted/had to put it?
No, the linker is just not able to place the code. LD does not "shuffle" functions around; it places things in a linear order, and once placed, a function can no longer be taken out and placed somewhere else. And two or more functions may end up located so that the distance between them is bigger than a 13-bit relative offset allows.
This can happen on smaller memory devices too. 13 bits is +/- 8K words, so any device with a memory size over that may be affected.
And that just describes the legitimate case. This particular linker error has also happened in a few cases because of compiler bugs and had nothing to do with the linker itself.
That error ought to be sanely reported, and fixed by selecting an appropriate memory model.
-
That error ought to be sanely reported, and fixed by selecting an appropriate memory model.
I'm not really sure how to report it any better, short of writing a complete explanation in the error message. Given the error message, it is not all that hard to find documentation explaining the error in detail.
You can say that the compiler could try harder to avoid the issue, but it is still possible to write code that can't be properly placed in the device no matter what.
Also IAR shows very similar message in this situation as well.
-
Not free as in: with every ARM device you buy you pay for the development of gcc for the ARM platform.
It is kind of like (commercial) TV and radio aren't free even though you don't have to pay anything to the TV station. With every product you buy you pay for TV and radio.
That's a rather oddball definition, but okay. Let's not pollute the thread with more philosophical disputes.
-
I have seen similar things happen with STM32 code as well - sometimes simply changing the optimization level for the compiler will make this go away because the compiler will inline some functions, remove unused code and suddenly the linker won't blow up anymore. My impression is that the ARM port of binutils could use a bit more love.
And re Mike and the suggestion to use IAR: given that you are mostly a PIC developer, you are probably intimately familiar with the shitshow that the various PIC compilers are, especially for the smaller devices (PIC18 and such). And they were (still are?) paid. So much for "you get what you pay for" in this case. I will take GCC or Clang (which supports ARM as well, AFAIK) any day over those things.
-
https://www.mail-archive.com/bug-binutils@gnu.org/msg29920.html
It seems that Nick was right that this is hitting the stack limit.
The reason it's doing so is that you have quite a lot of templates in your C++ code.
The linker seems to segfault when it's trying to demangle this symbol
_ZNSt11_Tuple_implILj0EJN7General6Parser4NodeINS1_7KeywordILj4ELj2EEENS1_6StatesIJNS2_INS3_ILj5ELj3EEENS5_IJNS1_4SCPI3EndINS7_15CommandInternalIRKS4_JNS1_5ParamIfEENS7_5BlankILj0EEZNS7_7CommandISB
which is done in libiberty
https://github.com/gcc-mirror/gcc/blob/master/libiberty/cp-demangle.c#L4315
This hits two VLAs expanding two structs of 16 bytes. However, you have 1485065 entries in dpi.num_copy_templates, causing it to push the stack down by (dpi.num_saved_scopes + dpi.num_copy_templates) * 16 bytes, which is 190129824 bytes, or ~181 MB, and so way over your stack limit.
I tried with GCC 9 which seems to do a better job with the templates and it
works there. But I guess the real fix is to not use those VLAs in libiberty.
But I believe that's maintained by GCC if I'm not mistaken.
For now, you can work around it by increasing your ulimit.
So, not really a bug in the linker, aside from it not reporting the stack overflow correctly?
I did see that you're running your working directory in OneDrive. I must discourage that; running working directories inside cloud-synced folders has caused me strange issues before.
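For reference, the ulimit workaround on Linux is just a shell built-in; the new soft limit is inherited by every process the shell starts afterwards, including ld (the value you can set depends on your system's hard limit):

```shell
ulimit -s                        # current soft stack limit in KB, often 8192
ulimit -S -s "$(ulimit -H -s)"   # raise the soft limit up to the hard limit
ulimit -s                        # verify; child processes such as ld inherit it
```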
-
Would be great if David replied, assuming he has a forum account, if nothing else just to confirm he reads this thread.
Yes he does. His forum name is Seppy.
-
https://www.youtube.com/watch?v=iwUqE6ZJqgA
-
David,
At 2:14 in the follow-up, you say std::vector has massive disadvantages - what are you basing that statement on? Any particular context? Perhaps it was just a slip of the tongue, but only the vector object itself is allocated on the stack; its memory buffer is allocated from the heap, so not all of it is kept on the stack.
-
Looks like the right fix for GNU would be to undefine CP_DYNAMIC_ARRAYS (if they are still using it in the latest version) so that the heap-allocation path is used instead.