Author Topic: Writing a compiler backend  (Read 2675 times)


Offline SiliconWizard (Topic starter)

  • Super Contributor
  • ***
  • Posts: 14481
  • Country: fr
Writing a compiler backend
« on: October 26, 2023, 09:20:08 pm »
Does anyone here have any experience writing a compiler backend (GCC or LLVM) to support a new target? How hard is it, which of the two would you suggest?
In particular, I'd be interested in knowing which one makes it easier to implement rules for optimized instruction scheduling (and where the doc for that would be).
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8652
  • Country: gb
Re: Writing a compiler backend
« Reply #1 on: October 26, 2023, 09:31:20 pm »
I was involved in developing a back end for GCC. If you are familiar with compilers it's not too hard to look at an existing back end for a comparable ISA and work from there. If compilers are a mystery to you, it's probably a steep learning curve. Once you get, say, C working, it's pretty easy to get other languages to compile for your new target. You may need to add some extra run time library routines, but that's about it.

The process is basically to get assembly language to work with binutils, get a language to work with gcc, and get gdb to work. When we did our implementation, a big problem was that we took time to refine it after initially getting it working, and as we did that gdb kept changing a lot. I don't know if it's more stable now. At that time it made getting our implementation aligned with the very latest core code, so we could submit our work and get it integrated with the main line, quite hard. I seemed to be endlessly restructuring for the latest gdb core APIs.

People seem to recommend starting with LLVM for a completely new implementation. I have no experience with that, and can't comment.
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 3915
  • Country: gb
Re: Writing a compiler backend
« Reply #2 on: October 27, 2023, 09:23:08 am »
a new target?

which new target? a variant of Risc-V? or an experimental new architecture being tested? RISC-ish?
I mean, is there at least an assembler already?
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 3915
  • Country: gb
Re: Writing a compiler backend
« Reply #3 on: October 27, 2023, 06:52:04 pm »
I was involved in developing a back end for GCC

imagine you want to resurrect a target (e.g. HC11, 8bit, CISC) that has been removed as "deprecated" in gcc v3.4.6: how difficult will it be?
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8652
  • Country: gb
Re: Writing a compiler backend
« Reply #4 on: October 27, 2023, 07:11:27 pm »
I was involved in developing a back end for GCC
imagine you want to resurrect a target (e.g. HC11, 8bit, CISC) that has been removed as "deprecated" in gcc v3.4.6: how difficult will it be?
I think the big issue will be GDB. It has changed so much since the days of GCC3.x. You will need to do a lot to the old code to make it compatible with the latest code. gcc and binutils shouldn't be too tough. I'm not sure you'll need to change anything in the run time library. I suspect the biggest incentive for them to remove an architecture is the amount of work needed to get the ISA specific GDB code up to date.

Why do you want to get an old ISA back into the tools? Simple architectures, like HC11, or AVR or MSP430 (two small MCU cores which are still in there) don't gain much from the enhancements in newer versions of GCC. In fact, the improvements made to improve code efficiency with complex cores have often made the code worse - bigger or slower - for the simpler cores.


Remember that you need to know an instruction set inside out to make a compiler for it. Otherwise you'll let things like a non-obvious flag change mess up the generated code.
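
To make the flag point concrete, here is a minimal sketch of the kind of check a backend needs before it reuses condition flags set by an earlier instruction. The instruction set, the `CLOBBERS_FLAGS` table and the `flags_survive` helper are all made up for illustration; a real backend would take this information from the target's instruction descriptions.

```python
# Hypothetical ISA: mnemonics that overwrite the condition flags.
# Assumption for this sketch: loads ("ld") update the flags too,
# which is exactly the kind of non-obvious side effect that is easy to miss.
CLOBBERS_FLAGS = {"add", "sub", "cmp", "ld"}


def flags_survive(program, setter, user):
    """Return True if the flags written by program[setter] are intact at program[user].

    `program` is a list of tuples whose first element is the mnemonic.
    """
    between = program[setter + 1:user]
    return not any(ins[0] in CLOBBERS_FLAGS for ins in between)


# sub sets the flags, mov leaves them alone, so beq can test the result of sub.
safe = [("sub", "a", "b"), ("mov", "c", "a"), ("beq", "L1")]
# Here a load sits between sub and beq and silently replaces the flags.
unsafe = [("sub", "a", "b"), ("ld", "c", "[p]"), ("beq", "L1")]

print(flags_survive(safe, 0, 2))
print(flags_survive(unsafe, 0, 2))
```

If the table is wrong for even one instruction, a compare can be dropped or a branch can read stale flags, and the generated code then fails only on some inputs.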
 
The following users thanked this post: DiTBho

Offline SiliconWizard (Topic starter)

  • Super Contributor
  • ***
  • Posts: 14481
  • Country: fr
Re: Writing a compiler backend
« Reply #5 on: October 27, 2023, 07:34:27 pm »
a new target?

which new target? a variant of Risc-V? or an experimental new architecture being tested? RISC-ish?
I mean, is there at least an assembler already?

That would be something completely new. But of course with some similarities with existing targets and yes, RISC-ish. The assembler part doesn't bother me at all, I've looked at binutils and writing support for a new target is pretty straightforward. I could also write my own assembler tools, that's a simple endeavor.

As I mentioned, one thing that's still a bit fuzzy to me is how either GCC or LLVM deals with instruction scheduling rules - that's the part that looks the most opaque to me so far. But yes surely having a look at the code for a supported target should help.
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 3915
  • Country: gb
Re: Writing a compiler backend
« Reply #6 on: October 27, 2023, 07:44:50 pm »
I've looked at binutils and writing support for a new target is pretty straightforward. I could also write my own assembler tools, that's a simple endeavor.

simple? simple undertaking? hah, from my previous experiences I'm not sure about these two points, but there's no point in writing it. Good luck.
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8652
  • Country: gb
Re: Writing a compiler backend
« Reply #7 on: October 27, 2023, 08:39:10 pm »
The assembler part doesn't bother me at all, I've looked at binutils and writing support for a new target is pretty straightforward. I could also write my own assembler tools, that's a simple endeavor.
A basic assembler is easy to write from scratch, but if you want a compiler too you will need the assembler to deal with standard object files. Things can get long winded and messy like that. Add to binutils. It's pretty easy. Disassembly and other utils almost come for free when you make the assembler work in binutils.
 

Offline SiliconWizard (Topic starter)

  • Super Contributor
  • ***
  • Posts: 14481
  • Country: fr
Re: Writing a compiler backend
« Reply #8 on: October 27, 2023, 09:07:34 pm »
Ok, I'll have to admit that LLVM is a bit cleaner and better documented than GCC:

https://www.llvm.org/docs/WritingAnLLVMBackend.html
 

Offline JPortici

  • Super Contributor
  • ***
  • Posts: 3461
  • Country: it
Re: Writing a compiler backend
« Reply #9 on: October 28, 2023, 05:48:28 am »
Why do you want to get an old ISA back into the tools?

for example: warning system / code coverage / static analysis, all very useful things that have become usable in much later versions of GCC, though clang/llvm is still much better in this regard (i think this is one of the reasons why "People seem to recommend starting with LLVM for a completely new implementation")
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 3915
  • Country: gb
Re: Writing a compiler backend
« Reply #10 on: October 28, 2023, 07:24:34 am »
Why do you want to get an old ISA back into the tools?

as time goes by, gcc-v3.4.6 will become ever more difficult to emerge on modern Gentoo builders, as it requires
- a dedicated overlay with dedicated rules and includes, which can benefit from neither portage nor the system
- a lot of patches

Then you also have to consider that many extensions have been added, so... either you compile strictly C/89, C/90 projects, or... you can't compile them on old compilers.

This is why everything I write is C/89, but it's limiting.

Simple architectures, like HC11, or AVR or MSP430 (two small MCU cores which are still in there) don't gain much from the enhancements in newer versions of GCC. In fact, the improvements made to improve code efficiency with complex cores have often made the code worse - bigger or slower - for the simpler cores.

Yup, some time ago, I also experienced a collateral problem with gcc-v1*: it eats much more RAM on an embedded GNU/Linux system, to the point that a router like the rb532a (target=host=mips32r2/le) simply cannot manage anything and crashes with an "out of memory"(1) problem, while up to gcc-v6 and v7 this is not a problem.

- eats more RAM
- produces slower code

I know, so ... I need a compromise: gcc-v6-7 for HC11 would be great!


(1) it only has 32Mbytes of soldered RAM, which was sufficient with kernel 2.6, which required 5Mbytes for the kernel (full static), while it is not enough with kernels >4 as they require up to 10-12Mbytes of RAM; furthermore the GNU-libc-based userland has also inflated its run-time RAM needs, so it (especially gcc >= 1*) tends to consume 30% more RAM.
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 3915
  • Country: gb
Re: Writing a compiler backend
« Reply #11 on: October 28, 2023, 08:59:35 am »
A basic assembler is easy to write from scratch, but if you want a compiler too you will need the assembler to deal with standard object files. Things can get long winded and messy like that.

I wouldn't say "easy to write", but this is precisely what I did: a basic assembly compiler (then extended with macros) and a basic linker (then extended with linker scripts), both made from scratch on top of a lexing library I had written many years before. As a first step I had to re-invent the .obj format to preserve my sanity, because the stuff that exists is overcomplicated for what I need.


Pros?
The C source code of the project looks very friendly and - modesty aside - well written, without things you can never understand, and is therefore also easy to maintain as single projects and Gentoo Overlays for non-x86 systems.

Which means all-included, all in one project without any external dependency, easily portable, easily testable, and easily fixable if something goes wrong.

ok, maybe I say this because I am the author, and therefore I know well how to do it and where to put my hands... but I really spent a lot of time organizing things well, and immediately, instead of arranging them later, once they have already been implemented ...

... this is a defect I see with both gcc and binutils and usually with the GNU stuff... which is always crazy, and every time I have to spend several weeks solving the various problems in cross-compiling gcc, binutils, gdb, and all their dependencies.

Also, the build time of my toolchain, from configure to the final binary, is shorter than anything built with the GNU philosophy, which is a great point for me!

Cons?
Some objtools are missing, and gdb "as is" is not compatible. You don't have an ELF file, so firmwares and simulators relying on that format simply won't work, and even gdb (and debuggers based on it) won't work unless you prepare a "bridge" (I did it, I like it ... meh ... not so much, but it's there). You have a disassembler and a mapper, which are the two most important tools to have, but you cannot play all the tricks you usually do with GNU-ld linker scripts and objtools (some of these are passed by gcc when you invoke it with particular flags; again ... this stuff won't work). The exception is a dedicated binary-to-srec and brec converter that I implemented to handle those two Motorola formats for EPROM programmers.
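
For readers who have not met the format, the binary-to-srec step can be sketched roughly as follows. This is not the converter described in this post: the names `srec_line` and `bin_to_srec`, the 16-byte record size and the default entry address are assumptions for illustration, and only records with a 16-bit address field (S1 data, S9 termination) are covered.

```python
def srec_line(rec_type, address, data=b""):
    """Build one Motorola S-record with a 16-bit address field (e.g. S1 or S9)."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address does not fit in 16 bits")
    # The byte count covers the 2 address bytes, the data and the checksum byte.
    count = 2 + len(data) + 1
    if count > 0xFF:
        raise ValueError("too much data for one record")
    body = bytes([count, address >> 8, address & 0xFF]) + bytes(data)
    # Checksum: ones' complement of the low byte of the sum of count, address and data.
    checksum = ~sum(body) & 0xFF
    return f"{rec_type}{body.hex().upper()}{checksum:02X}"


def bin_to_srec(image, base=0, chunk=16, entry=0):
    """Split a raw binary image into S1 data records followed by an S9 record."""
    lines = [
        srec_line("S1", base + off, image[off:off + chunk])
        for off in range(0, len(image), chunk)
    ]
    lines.append(srec_line("S9", entry))
    return lines


for line in bin_to_srec(bytes(range(20)), base=0x8000):
    print(line)
```

An image larger than 64 KB would need the S2/S3 record types with wider address fields, which this sketch deliberately leaves out.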


So, it depends on what you look at  :-//
  • if I look at my sources, things keep getting better
  • if I look at my Overlay, things have got better than I could have imagined
  • if I look at the final ecosystem ... things have got long winded (it took 4 years to get to the point I described above) and messy, as there are a lot of incompatibilities with other existing tools and stuff.
« Last Edit: October 28, 2023, 09:02:29 am by DiTBho »
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8652
  • Country: gb
Re: Writing a compiler backend
« Reply #12 on: October 28, 2023, 12:35:53 pm »
Simple architectures, like HC11, or AVR or MSP430 (two small MCU cores which are still in there) don't gain much from the enhancements in newer versions of GCC. In fact, the improvements made to improve code efficiency with complex cores have often made the code worse - bigger or slower - for the simpler cores.

Yup, some time ago, I also experienced a collateral problem with gcc-v1*: it eats much more RAM on an embedded GNU/Linux system, to the point that a router like the rb532a (target=host=mips32r2/le) simply cannot manage anything and crashes with an "out of memory"(1) problem, while up to gcc-v6 and v7 this is not a problem.

- eats more RAM
- produces slower code

I know, so ... I need a compromise: gcc-v6-7 for HC11 would be great!
We found that for a small core there was a significant code performance hit moving from version 3.x.x of GCC to 4.x.x. It got worse and worse after that, but there was a significant hit even going from 3.x.x to 4.x.x.
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 3915
  • Country: gb
Re: Writing a compiler backend
« Reply #13 on: October 28, 2023, 12:45:04 pm »
We found that for a small core there was a significant code performance hit moving from version 3.x.x of GCC to 4.x.x. It got worse and worse after that, but there was a significant hit even going from 3.x.x to 4.x.x.

so, I'd best start a new machine layer with lcc or sdcc.
OK. Thanks!
 

Offline abeyer

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: us
Re: Writing a compiler backend
« Reply #14 on: October 28, 2023, 08:56:51 pm »
as time goes by, gcc-v3.4.6 will become ever more difficult to emerge on modern Gentoo builders, as it requires
- a dedicated overlay with dedicated rules and includes, which can benefit from neither portage nor the system
- a lot of patches

If this is your main issue and you aren't as concerned w/ moving to newer features, wouldn't it be easier to just build a vm or container image with an older version and work from that?
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 3915
  • Country: gb
Re: Writing a compiler backend
« Reply #15 on: October 29, 2023, 09:59:36 am »
If this is your main issue and you aren't as concerned w/ moving to newer features, wouldn't it be easier to just build a vm or container image with an older version and work from that?

What I did in the past is create a completely self-contained miniroot. On many non-x86 architectures virtualization is not supported and/or does not work. In short, forget VMs and forget containers.


The point, however, is to have a compiler that can be compiled in stage4, starting from an Overlay.

As "crossdev" exists on Gentoo to emerge a crosscompiler, I created a similar tool, and all its centralized functions, to emerge old cross-compilers.

I wanted to understand for a moment how much sense it makes to stay on gcc-v3.4.6 rather than resurrecting the target on a modern gcc (so as to exploit crossdev to emerge the cross-compiler), and he just explained to me that it makes a lot of sense for hc11 to stay with v3.4.6.
 

Offline tellurium

  • Regular Contributor
  • *
  • Posts: 231
  • Country: ua
Re: Writing a compiler backend
« Reply #16 on: December 29, 2023, 12:43:01 am »
Does anyone here have any experience writing a compiler backend (GCC or LLVM) to support a new target? How hard is it, which of the two would you suggest?

How about TCC instead? IMO making a backend for it is easier - I remember making some linker changes for the RISCV target, and it was easy.

Also, I was mesmerised by https://github.com/rswier/c4. With some extra effort, it could be extended to a quite usable subset of C, and making a backend for it would be the easiest.
Open source embedded network library https://mongoose.ws
TCP/IP stack + TLS1.3 + HTTP/WebSocket/MQTT in a single file
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Writing a compiler backend
« Reply #17 on: December 29, 2023, 02:10:50 am »
Does anyone here have any experience writing a compiler backend (GCC or LLVM) to support a new target? How hard is it, which of the two would you suggest?

I've never implemented a new ISA back end from scratch, but I've worked on incremental features and bug fixing on both. The LLVM infrastructure is by far the easier to get started with, and for that matter easier and faster for experts to work with too.

I've never had to do much with instruction scheduling. Single-issue machines don't care much (only loads are usually significant, mul/div are too rare to matter much), and OoO eats anything, so it's only really important on strict in-order superscalar. Recent dual-issue cores such as Arm A55 and SiFive U74 give you a bit of wriggle room by letting you issue dependent instructions together and duplicating the ALU in two different pipe stages so they're much less sensitive to instruction scheduling than older cores such as A53.
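
As an illustration of the single-issue case, where load latency is what mostly matters, here is a minimal list-scheduling sketch. It follows neither GCC's nor LLVM's scheduler; the `Instr` model, the 3-cycle load latency and the "prefer the longest latency" priority are assumptions. It also assumes each register is written exactly once in the block and never read by the instruction that writes it, so only true dependencies exist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str
    srcs: tuple
    latency: int = 1  # cycles after issue before the result can be consumed


def cycles_in_order(program):
    """Cycles a single-issue in-order pipeline needs to issue `program` as written."""
    ready_at = {}  # register -> first cycle its value is available
    cycle = 0
    for ins in program:
        # Stall until every source operand has been produced (live-ins are ready at 0).
        issue = max([cycle] + [ready_at.get(r, 0) for r in ins.srcs])
        ready_at[ins.dest] = issue + ins.latency
        cycle = issue + 1
    return cycle


def list_schedule(program):
    """Greedy list scheduling: each cycle, issue one ready instruction if there is one."""
    produced_here = {ins.dest for ins in program}
    ready_at = {}
    pending = list(program)
    order = []
    cycle = 0
    while pending:
        # Ready means every source is a live-in or has already been produced.
        candidates = [
            ins for ins in pending
            if all(r not in produced_here or ready_at.get(r, float("inf")) <= cycle
                   for r in ins.srcs)
        ]
        if candidates:
            # Prefer long-latency instructions; max() keeps program order on ties.
            pick = max(candidates, key=lambda ins: ins.latency)
            pending.remove(pick)
            order.append(pick)
            ready_at[pick.dest] = cycle + pick.latency
        cycle += 1  # one instruction issued, or one stall cycle
    return order


example = [
    Instr("lw a", "a", ("p",), latency=3),
    Instr("add b", "b", ("a", "x")),
    Instr("lw c", "c", ("q",), latency=3),
    Instr("add d", "d", ("c", "y")),
]

print(cycles_in_order(example))                 # as written
print(cycles_in_order(list_schedule(example)))  # after scheduling
print([ins.name for ins in list_schedule(example)])
```

In this toy block the scheduler hoists the second load above the first add, so the two load latencies overlap instead of each causing its own stall.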
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Writing a compiler backend
« Reply #18 on: December 29, 2023, 02:27:02 am »
Why do you want to get an old ISA back into the tools? Simple architectures, like HC11, or AVR or MSP430 (two small MCU cores which are still in there) don't gain much from the enhancements in newer versions of GCC. In fact, the improvements made to improve code efficiency with complex cores have often made the code worse - bigger or slower - for the simpler cores.

Oh, I have to disagree with that!  AVR and MSP430 are perfectly good targets for any modern compiler that knows how to cope with both i386 and RISC. They will benefit from newer compilers pretty much exactly as much as x86, Arm, RISC-V, MIPS, POWER etc will.

HC11 ... that's basically a 6800 with an extra index register?  Pretty impoverished, but I think LLVM and modern gcc have a better chance with it than older versions. You just need to model it correctly. The 6502 is even weirder, but there is a current project that has made a very nice LLVM compiler for it:

https://llvm.org/devmtg/2022-05/slides/2022EuroLLVM-LLVM-MOS-6502Backend.pdf

Quote
Remember that you need to know an instruction set inside out to make a compiler for it. Otherwise you'll let things like a non-obvious flag change mess up the generated code.

Very true. I'm currently working on a project porting an x86/Arm JIT to RISC-V. Others have already got it "working" -- if you ignore the mucked up edge-cases and resulting bugs. It usually works. I have better RISC-V knowledge than the others working on it and am currently auditing their code and un-mucking it up.
 

Online magic

  • Super Contributor
  • ***
  • Posts: 6779
  • Country: pl
Re: Writing a compiler backend
« Reply #19 on: December 29, 2023, 04:20:42 am »
Box86 or some proprietary stuff?

Realizing that I can now run LTspice or old ECU diagnostics software at least half-decently on an ARM Chromebook made me wonder if x86 is destined to become the universal bytecode of the future that Java wanted to be :D
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Writing a compiler backend
« Reply #20 on: December 29, 2023, 05:00:54 am »
Box86 or some proprietary stuff?

Uh .. proprietary but open? MSIL/CIL.

Quote
x86 is destined to become the universal bytecode of the future that Java wanted to be :D

Oh gawd, please nooooo...
 

Online magic

  • Super Contributor
  • ***
  • Posts: 6779
  • Country: pl
Re: Writing a compiler backend
« Reply #21 on: December 29, 2023, 05:42:03 am »
OK, I see. I thought that you meant porting an x86 to ARM JIT to RISCV.

Box86 is a project of this kind. With Box86, WINE and Apple Rosetta, Wintel really is becoming "compile once, run everywhere".
 

Online magic

  • Super Contributor
  • ***
  • Posts: 6779
  • Country: pl
Re: Writing a compiler backend
« Reply #22 on: December 29, 2023, 05:59:43 am »
I've never had to do much with instruction scheduling. Single-issue machines don't care much (only loads are usually significant, mul/div are too rare to matter much), and OoO eats anything, so it's only really important on strict in-order superscalar.
My last contact with compilers was at a university course, so take this with a grain of salt. But I remember that some reordering of instructions was able to reduce register moves and register pressure, which is applicable to all targets. That being said, maybe you don't need to worry about it in GCC/LLVM because the SSA IR is already optimized when it reaches the backend?
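
The register-pressure effect is easy to show on a toy straight-line block. The sketch below is only an illustration, not how GCC or LLVM measure pressure: `max_live` and the `(dest, srcs)` instruction encoding are made up, and each value is assumed to be defined exactly once.

```python
def max_live(program, live_out=()):
    """Peak number of simultaneously live values in a straight-line block.

    `program` is a list of (dest, srcs) pairs in execution order.
    """
    live = set(live_out)
    peak = len(live)
    # Walk backwards: a value is live from its definition down to its last use.
    for dest, srcs in reversed(program):
        live.discard(dest)
        live.update(srcs)
        peak = max(peak, len(live))
    return peak


# r = (a + b) + (c + d), where a..d come from loads with no register inputs.
loads_first = [
    ("a", ()), ("b", ()), ("c", ()), ("d", ()),
    ("s1", ("a", "b")), ("s2", ("c", "d")),
    ("r", ("s1", "s2")),
]
interleaved = [
    ("a", ()), ("b", ()), ("s1", ("a", "b")),
    ("c", ()), ("d", ()), ("s2", ("c", "d")),
    ("r", ("s1", "s2")),
]

print(max_live(loads_first, live_out=("r",)))
print(max_live(interleaved, live_out=("r",)))
```

Both orders compute the same result, but issuing all loads first keeps four values alive at once, while consuming each pair right away needs three. This is the tension with latency hiding, which prefers the loads as early as possible.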
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8652
  • Country: gb
Re: Writing a compiler backend
« Reply #23 on: December 29, 2023, 03:40:41 pm »
Why do you want to get an old ISA back into the tools? Simple architectures, like HC11, or AVR or MSP430 (two small MCU cores which are still in there) don't gain much from the enhancements in newer versions of GCC. In fact, the improvements made to improve code efficiency with complex cores have often made the code worse - bigger or slower - for the simpler cores.

Oh, I have to disagree with that!  AVR and MSP430 are perfectly good targets for any modern compiler that knows how to cope with both i386 and RISC. They will benefit from newer compilers pretty much exactly as much as x86, Arm, RISC-V, MIPS, POWER etc will.
Up to GCC 3.x it was a great platform for the MSP430 and AVR backends. After that it all went pear shaped. We could never get as tight code for either of those architectures from GCC 4.x, no matter how much massaging we did. The stuff they put in to improve scheduling in complex cores just degraded what it could achieve for simple cores. I think they just didn't care, and made no allowance for maintaining what GCC had previously done well. I never worked on versions after 4, but from what I have seen the single minded focus on performance with complex cores has just increased.
« Last Edit: December 29, 2023, 03:42:30 pm by coppice »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Writing a compiler backend
« Reply #24 on: December 29, 2023, 09:13:07 pm »
Up to GCC 3.x it was a great platform for the MSP430 and AVR backends. After that it all went pear shaped. We could never get as tight code for either of those architectures from GCC 4.x, no matter how much massaging we did. The stuff they put in to improve scheduling in complex cores just degraded what it could achieve for simple cores. I think they just didn't care

Who is "they"?  GCC is open source. The people who maintain and improve performance or code size or whatever on any particular ISA are the people who care about that ISA -- the users of it. If no one cares about an ISA enough to maintain the back end for it (past minimally working) as the rest of the compiler changes and improves, then the only people to blame for that are the users of that ISA. Or the lack of users.

You do know that GCC 4 is twenty years old? There have been a LOT of versions since then. The main ISAs that I use (arm64 and riscv) didn't even exist -- hadn't even been contemplated -- when GCC 4 came out. RISC-V RV32I and RV64I exist in very simple cores (as well as more complex ones) and modern GCC handles them well.
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8652
  • Country: gb
Re: Writing a compiler backend
« Reply #25 on: December 29, 2023, 09:39:13 pm »
Up to GCC 3.x it was a great platform for the MSP430 and AVR backends. After that it all went pear shaped. We could never get as tight code for either of those architectures from GCC 4.x, no matter how much massaging we did. The stuff they put in to improve scheduling in complex cores just degraded what it could achieve for simple cores. I think they just didn't care

Who is "they"?  GCC is open source. The people who maintain and improve performance or code size or whatever on any particular ISA are the people who care about that ISA -- the users of it. If no one cares about an ISA enough to maintain the back end for it (past minimally working) as the rest of the compiler changes and improves, then the only people to blame for that are the users of that ISA. Or the lack of users.
What on Earth are you bladdering on about? They is obviously the developers of the main line GCC code. We, as developers of ISA back ends trying to eventually get them into the main line were playing constant catch up, and had no say. Conceptually the core of GCC is supposed to be ISA neutral, but of course most of the development of the core is driven by people who have specific ISAs they are trying to optimise for.
You do know that GCC 4 is twenty years old? There have been a LOT of versions since then.
GCC 5.1 is less than 9 years old (22nd April 2015, I just looked it up). I think they skipped 5.0. So, the GCC 4 series was very long lived. After that they suddenly started jumping the version numbers rather quickly.
The main ISAs that I use (arm64 and riscv) didn't even exist -- hadn't even been contemplated -- when GCC 4 came out. RISC-V RV32I and RV64I exist in very simple cores (as well as more complex ones) and modern GCC handles them well.
ARM64 was in the later revisions of GCC 4. I think RISC-V must have been too.
 

