Author Topic: Very small linux capable core  (Read 4190 times)


Offline ali_asadzadeh

  • Super Contributor
  • ***
  • Posts: 1563
  • Country: ca
Very small linux capable core
« on: July 25, 2021, 01:48:57 pm »
Hi,
This will be my first time running Linux on a soft core. Are there any open-source cores capable of running a minimal Linux? I have about 10-12K LUTs left on my device, which is a Gowin part.

Can you recommend anything? Is it possible in 10K LUTs?
ASiDesigner, Stands for Application specific intelligent devices
I'm a Digital Expert from 8-bits to 64-bits
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 2355
  • Country: nz
  • Formerly SiFive, Samsung R&D
Re: Very small linux capable core
« Reply #1 on: July 25, 2021, 01:56:04 pm »
Sure.

Quote
VexRiscv linux balanced (RV32IMA, 1.21 DMIPS/Mhz 2.27 Coremark/Mhz, with cache trashing, 4KB-I$, 4KB-D$, single cycle barrel shifter, catch exceptions, static branch, MMU, Supervisor, Compatible with mainstream linux) ->
    Artix 7     -> 180 Mhz 2883 LUT 2130 FF
    Cyclone V   -> 131 Mhz 1,764 ALMs
    Cyclone IV  -> 121 Mhz 3,608 LUT 2,082 FF

https://github.com/SpinalHDL/VexRiscv
 
The following users thanked this post: ali_asadzadeh

Offline ali_asadzadeh

  • Super Contributor
  • ***
  • Posts: 1563
  • Country: ca
Re: Very small linux capable core
« Reply #2 on: July 25, 2021, 01:59:27 pm »
Thanks for the info, :-+
It's in Scala :'( Can you recommend anything in Verilog, or at least SystemVerilog? Or do you know of a good Scala tutorial?
How long does it take to learn Scala, given that I can do a moderate job in Verilog and SystemVerilog?
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 7298
  • Country: fr
Re: Very small linux capable core
« Reply #3 on: July 25, 2021, 04:10:12 pm »
Quote
Thanks for the info, :-+
It's in scala :'( do you recommend anything Verilog or system verilog at least! or do you recommend any good tutorial about scala?

It uses SpinalHDL. It's based on Scala so sure, you'll need to learn Scala. And then you can go there: https://spinalhdl.github.io/SpinalDoc-RTD/

But SpinalHDL generates Verilog, so you can read the generated code, although I would expect it not to be extremely readable compared to human-written Verilog.
Also, although it would probably be more comfortable, you don't need to understand VexRiscv fully to be able to use it. Many people have used it without knowing SpinalHDL. You just need to figure out how to configure it for your particular use and generate the HDL output.

Now, it's unclear to me whether VexRiscv is really ready to be directly used for running Linux at the moment outside of pure simulation. Quoting VexRiscv's Readme:
Quote
There is currently no SoC to run it on hardware, it is WIP. But the CPU simulation can already boot linux and run user space applications (even python).

Maybe brucehoult can confirm it is possible already, but what the maintainers say sounds confusing to me.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 2355
  • Country: nz
  • Formerly SiFive, Samsung R&D
Re: Very small linux capable core
« Reply #4 on: July 26, 2021, 12:40:48 am »
Quote
Maybe brucehoult can confirm it is possible already, but what the maintainers say sounds confusing to me.

I don't have direct experience with it.

The safe choice for an SoC is RocketChip. I don't know the minimum size if you cut out FP and so forth. A full single-core system will fit on an Arty. Several real chips are based on the Rocket components, including the SiFive FE310 and FU540, and the Kendryte K210.

Again, it's a generator (Chisel this time) written in Scala, producing Verilog.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 7298
  • Country: fr
Re: Very small linux capable core
« Reply #5 on: July 26, 2021, 02:01:28 am »
But then I would doubt a minimal RocketChip SoC able to run Linux would fit within 10k LUTs?
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 2355
  • Country: nz
  • Formerly SiFive, Samsung R&D
Re: Very small linux capable core
« Reply #6 on: July 26, 2021, 03:16:29 am »
Quote
But then I would doubt a minimal RocketChip SoC able to run Linux would fit within 10k LUTs?

The core itself, very easily.

I just found this:

https://groups.google.com/a/groups.riscv.org/g/hw-dev/c/zZxy0iFzrvI/m/LVeFiK2vAQAJ

RocketTile (which includes L1 caches) 4413 LUTs.

11000 LUTs in total for a system, but that's mostly in various bus interfaces. 11000 is not *that* far off 10000.
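
As a rough sanity check on those numbers, here is a minimal Python sketch. The LUT counts are the ones quoted in this post and the 10,000 budget is the OP's lower estimate; the function name `uncore_trim_needed` is only illustrative.

```python
# LUT counts quoted above for a Rocket-based system.
ROCKET_TILE_LUTS = 4_413      # core plus L1 caches
ROCKET_SYSTEM_LUTS = 11_000   # whole system, including bus interfaces
LUT_BUDGET = 10_000           # the OP's lower estimate of free LUTs


def uncore_trim_needed(system: int, tile: int, budget: int) -> tuple:
    """Return (LUTs over budget, fraction of the non-core logic to remove)."""
    over = max(system - budget, 0)
    uncore = system - tile     # everything that is not the RocketTile
    return over, over / uncore


over, fraction = uncore_trim_needed(ROCKET_SYSTEM_LUTS, ROCKET_TILE_LUTS, LUT_BUDGET)
print(f"uncore logic: {ROCKET_SYSTEM_LUTS - ROCKET_TILE_LUTS} LUTs")
print(f"over budget by {over} LUTs ({fraction:.0%} of the uncore)")
```

These counts were presumably not measured on a Gowin part, so treat the result as an order-of-magnitude guide rather than a fit report.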
 

Online asmi

  • Super Contributor
  • ***
  • Posts: 1948
  • Country: ca
Re: Very small linux capable core
« Reply #7 on: July 26, 2021, 03:52:40 am »
I think this is a bit of an academic discussion, because Linux requires a lot of memory (relatively speaking), which in turn requires external memory in all but the largest FPGAs (for which there is no point in saving LUTs in the first place because you can fit dozens of even the largest cores). So your SoC will have to have some sort of bus interface and external memory controller. To give one data point, PetaLinux (an embedded Linux build which runs on the MicroBlaze CPU) requires at least 32 MBytes of RAM. When I experimented with PetaLinux on my Spartan-7 board, the OS ended up consuming about 20 MBytes of RAM for itself, so with 32 MB of total RAM that doesn't leave very much for user applications. I don't know what a RISC-V build of Linux requires (and I'm really curious to find out, so if someone knows - please let us know), but something tells me the requirement is of the same order of magnitude.

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 2355
  • Country: nz
  • Formerly SiFive, Samsung R&D
Re: Very small linux capable core
« Reply #8 on: July 26, 2021, 04:27:11 am »
Quote
I think this is a bit of an academic discussion, because Linux requires a lot of memory (relatively speaking), which in turn requires external memory in all but the largest FPGAs (for which there is no point in saving LUTs in the first place because you can fit dozens of even the largest cores). So your SoC will have to have some sort of bus interface and external memory controller. To give one data point, Petalinux (embedded Linux build which runs on Microblaze CPU) requires at least 32 MBytes of RAM. When I experimented with Petalinux on my Spartan-7 board, OS ended up consuming about 20 MBytes of RAM for itself, so for 32 MB of total RAM that doesn't leave very much for user applications. I don't know what RISC-V build of Linux requires (and I'm really curious to find out, so if someone knows - please let us know), but something tells me the requirement is in the same order of magnitude.

Sure. The assumption here is always going to be that you have external RAM of some kind -- DRAM with the FPGA vendor's DDR IP (not counted in the 10k LUT budget), or you can get 32Mx16 PSRAM for $7 that you can interface yourself very easily (i.e. 2 chips, 32Mx32, 128 MB).

RISC-V mainline Linux is very similar. I've played around with Fedora in a JavaScript emulator here:

https://bellard.org/jslinux/vm.html?cpu=riscv64&url=fedora33-riscv.cfg&mem=32

32 MB lets you boot into console with 15 MB free (19 available if freeing disk cache). You can use gcc to compile a small program, vi to edit it etc. It still works with 24 MB or even 20 MB.

Buildroot with 16 MB gives you 5 MB free, and is enough to let you compile and run hello.c. It even works in 15 MB, but that's the limit if you want to run gcc. If you just want to run existing undemanding binaries then 12 MB might be enough.

https://bellard.org/jslinux/vm.html?cpu=riscv64&url=buildroot-riscv64.cfg&mem=16

You might be able to trim Buildroot to run in 8 MB, after a fashion.

Of course Fedora requires RV64GC, so a double-precision FPU etc., which is big. Buildroot or Yocto/OpenEmbedded you could configure for RV32 with no FPU -- not even integer multiply/divide if you wanted to.
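
To put the figures above side by side, here is a small Python sketch. The measured pairs are the ones reported in this post; the helper names and the assumption that the system footprint does not grow with the amount of RAM are mine, so the "free on a 32 MB board" column is only an estimate.

```python
# (total RAM in MB, free RAM in MB at the console), as reported above.
MEASURED = {
    "Fedora 33, RV64GC": (32, 15),
    "Buildroot, RV64": (16, 5),
}


def footprint_mb(total_mb: int, free_mb: int) -> int:
    """RAM consumed by the booted system itself."""
    return total_mb - free_mb


def estimated_free_mb(board_ram_mb: int, footprint: int) -> int:
    """Free RAM on a board, assuming the footprint does not grow with RAM size."""
    return max(board_ram_mb - footprint, 0)


for name, (total, free) in MEASURED.items():
    used = footprint_mb(total, free)
    print(f"{name}: ~{used} MB used, "
          f"~{estimated_free_mb(32, used)} MB free on a 32 MB board")
```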
 

Online asmi

  • Super Contributor
  • ***
  • Posts: 1948
  • Country: ca
Re: Very small linux capable core
« Reply #9 on: July 26, 2021, 04:45:10 am »
Quote
Sure. The assumption here is always going to be that you have external RAM of some kind -- DRAM with the FPGA vendor's DDR IP (not counted in the 10k LUT budget), or you can get 32Mx16 PSRAM for $7 that can be interfaced yourself very easily (i.e. 2 chips, 32Mx32, 128 MB).

PSRAMs tend to have high access latencies, and thus require relatively large caches, which adds to the overall resource demands. For example, HyperRAM devices have latencies on the order of tens of cycles, as they are optimized for streaming data (and so sequential access), so cache lines had better be rather large to amortize such high latency.
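
The amortization argument can be sketched in a few lines of Python. The 20-cycle latency and 2 bytes per clock below are hypothetical illustration values, not figures from any datasheet, and the model ignores refresh and command overhead.

```python
def effective_bytes_per_cycle(line_bytes: int, latency_cycles: int,
                              bytes_per_cycle: int) -> float:
    """Average throughput of one cache-line refill, including the initial latency."""
    transfer_cycles = line_bytes / bytes_per_cycle
    return line_bytes / (latency_cycles + transfer_cycles)


# Hypothetical figures for illustration only: 20 cycles of initial latency
# and 2 bytes transferred per clock once the burst is running.
LATENCY = 20
BYTES_PER_CYCLE = 2

for line in (16, 32, 64, 128):
    eff = effective_bytes_per_cycle(line, LATENCY, BYTES_PER_CYCLE)
    print(f"{line:4d}-byte line: {eff:.2f} bytes/cycle "
          f"({eff / BYTES_PER_CYCLE:.0%} of peak)")
```

Longer lines recover more of the peak rate, at the cost of more block RAM per cache and more wasted traffic on non-sequential access.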

Also, I don't know much about Gowin, but from what I do know they are kind of similar to the iCE40 Ultra series of FPGAs, and so are VERY pin- and resource-constrained. And memory interfaces typically consume a lot of pins. While doing such a project could be an interesting challenge, I question the practicality of such an approach. Especially since the point of using Linux is typically to use some of its software stacks like TCP/IP or USB Host, both of which require controllers which further raise the resource requirements (especially the latter). If you don't need any of those, you are much better off going for a bare-metal application.
« Last Edit: July 26, 2021, 04:59:24 am by asmi »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 2355
  • Country: nz
  • Formerly SiFive, Samsung R&D
Re: Very small linux capable core
« Reply #10 on: July 26, 2021, 05:24:31 am »
Quote
Sure. The assumption here is always going to be that you have external RAM of some kind -- DRAM with the FPGA vendor's DDR IP (not counted in the 10k LUT budget), or you can get 32Mx16 PSRAM for $7 that can be interfaced yourself very easily (i.e. 2 chips, 32Mx32, 128 MB).
PSRAMs tend to have big access latencies, and thus require relatively large caches, contributing to the overall resource demands. For example, HyperRAM devices have latencies in the order of tens of cycles, as they are optimized for streaming data (and so sequential access), so cache lines better be rather large to amortize such big latency.

FPGA soft cores aren't all that fast either -- probably not more than 30 or 50 MIPS on a Gowin.

The PSRAM I referred to...

https://www.mouser.com/datasheet/2/1127/APM_PSRAM_OPI_Xccela_APS256XXN_OBRx_v1_0_PKG-1954780.pdf

... has oct SPI at 200 MHz for 800 MBps burst of 16, 32, or 64 bytes (or 2K). With SPI it's not a lot of pins either.

That seems like way more bandwidth than a soft core CPU could use.
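
A back-of-the-envelope version of that comparison, as a Python sketch. The 800 MB/s and 50 MIPS figures come from this post; the per-instruction traffic model (4-byte fetches, 30% of instructions doing a 4-byte data access, no cache hits at all) is a deliberately pessimistic assumption of mine.

```python
PSRAM_PEAK_MBPS = 800          # burst figure quoted above
CORE_MIPS = 50                 # upper estimate quoted above for a Gowin soft core

# Hypothetical worst case: no cache hits at all.
BYTES_PER_INSTRUCTION_FETCH = 4
LOAD_STORE_FRACTION = 0.3      # assumed share of instructions touching data
BYTES_PER_DATA_ACCESS = 4


def worst_case_demand_mbps(mips: float) -> float:
    """Memory traffic in MB/s if every fetch and data access went to PSRAM."""
    per_instruction = (BYTES_PER_INSTRUCTION_FETCH
                       + LOAD_STORE_FRACTION * BYTES_PER_DATA_ACCESS)
    return mips * per_instruction


demand = worst_case_demand_mbps(CORE_MIPS)
print(f"worst-case demand: {demand:.0f} MB/s, "
      f"{demand / PSRAM_PEAK_MBPS:.0%} of the PSRAM burst rate")
```

This only compares against the burst rate; it says nothing about the access latency raised in the quoted reply.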

Quote
Also, I don't know much about Gowin, but from what I do know they are kind of similar to ice40Ultra series of FPGA, and so are VERY pin- and resource constrained. And memory interfaces typically consume a lot of pins. While doing such a project could be an interesting challenge, I question practicality of such approach. Especially since the point of using Linux is typically to use some of its software stacks like TCP/IP or USB Host, both of which require controllers which further raise the resource requirements (especially the latter one). If you don't need any of those, you are much better off going for bare-metal application.

I don't disagree with that, I'm just trying to answer the OP's question.
 

Offline ali_asadzadeh

  • Super Contributor
  • ***
  • Posts: 1563
  • Country: ca
Re: Very small linux capable core
« Reply #11 on: July 26, 2021, 06:35:34 am »
Thanks for sharing your ideas.
On the Gowin part I have about 12K LUTs, plus 32 MB of internal SDRAM. I want to compile and use this https://libiec61850.com/libiec61850/ on a minimal Linux; I hope I can do it. ^-^
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 2355
  • Country: nz
  • Formerly SiFive, Samsung R&D
Re: Very small linux capable core
« Reply #12 on: July 26, 2021, 07:17:12 am »
Quote
Thanks for sharing your Ideas.
In the gowin I have about 12K of LUT's, also I have a 32MB internal SDRAM, I want to compile and use this https://libiec61850.com/libiec61850/ on minimal linux, I hope I can do it. ^-^

So now I don't know why you need Linux.

It should be easy to create a HAL for FreeRTOS or Zephyr for that project.
 

Offline ali_asadzadeh

  • Super Contributor
  • ***
  • Posts: 1563
  • Country: ca
Re: Very small linux capable core
« Reply #13 on: July 26, 2021, 07:44:33 am »
Quote
So now I don't know why you need Linux.

It should be easy to create a HAL for FreeRTOS or Zephyr for that project.

I think this project requires some libraries that need a minimal Linux.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 2355
  • Country: nz
  • Formerly SiFive, Samsung R&D
Re: Very small linux capable core
« Reply #14 on: July 26, 2021, 07:55:53 am »
Quote
So now I don't know why you need Linux.

It should be easy to create a HAL for FreeRTOS or Zephyr for that project.
I think this project requires some libraries that needs a minimal linux.

According to the link you sent, it needs only time, threads, TCP/IP sockets, ethernet, and a filesystem. The mentioned RTOSes have all those.
 

Offline ali_asadzadeh

  • Super Contributor
  • ***
  • Posts: 1563
  • Country: ca
Re: Very small linux capable core
« Reply #15 on: July 26, 2021, 08:22:25 am »
So, what soft processor do you suggest for that?
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 2355
  • Country: nz
  • Formerly SiFive, Samsung R&D
Re: Very small linux capable core
« Reply #16 on: July 26, 2021, 08:36:31 am »
Quote
So, what soft processor do you suggest for that?

The same as already mentioned, but you could use a smaller configuration without MMU etc. Does that library use floating point? I didn't look.
 

Offline ali_asadzadeh

  • Super Contributor
  • ***
  • Posts: 1563
  • Country: ca
Re: Very small linux capable core
« Reply #17 on: July 26, 2021, 08:49:03 am »
It can use floating point.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 2355
  • Country: nz
  • Formerly SiFive, Samsung R&D
Re: Very small linux capable core
« Reply #18 on: July 26, 2021, 09:18:10 am »
Quote
It can use floating point.

A decent single precision FPU will use 2000 LUT6s all by itself, or around 4000 for double precision. More with LUT4s, obviously.
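
For a feel of what that leaves on the OP's part, here is a rough Python sketch. The core size is the VexRiscv "linux balanced" Artix 7 figure from Reply #1, the FPU sizes are the LUT6 estimates above, and 12,000 is the OP's figure. The `lut4_factor` parameter is a hypothetical multiplier for moving from LUT6 to LUT4 fabric; the 1.5 used in the loop is an arbitrary illustration, not a measured ratio.

```python
CORE_LUTS = 2_883                      # VexRiscv "linux balanced" on Artix 7
FPU_LUTS = {"none": 0, "single": 2_000, "double": 4_000}   # LUT6 estimates
AVAILABLE_LUTS = 12_000                # the OP's figure


def remaining_luts(fpu: str, lut4_factor: float = 1.0) -> int:
    """LUTs left for buses, memory controller and peripherals.

    lut4_factor is a hypothetical multiplier for mapping LUT6 counts onto a
    LUT4 architecture; 1.0 means no penalty is applied.
    """
    used = (CORE_LUTS + FPU_LUTS[fpu]) * lut4_factor
    return AVAILABLE_LUTS - round(used)


for fpu in FPU_LUTS:
    print(f"FPU {fpu:6s}: {remaining_luts(fpu)} LUTs left "
          f"({remaining_luts(fpu, lut4_factor=1.5)} with an assumed 1.5x LUT4 penalty)")
```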
 
The following users thanked this post: ali_asadzadeh

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 7298
  • Country: fr
Re: Very small linux capable core
« Reply #19 on: July 26, 2021, 04:35:52 pm »
Quote
I think this is a bit of an academic discussion, because Linux requires a lot of memory (relatively speaking), which in turn requires external memory in all but the largest FPGAs (for which there is no point in saving LUTs in the first place because you can fit dozens of even the largest cores). So your SoC will have to have some sort of bus interface and external memory controller. To give one data point, Petalinux (embedded Linux build which runs on Microblaze CPU) requires at least 32 MBytes of RAM. When I experimented with Petalinux on my Spartan-7 board, OS ended up consuming about 20 MBytes of RAM for itself, so for 32 MB of total RAM that doesn't leave very much for user applications. I don't know what RISC-V build of Linux requires (and I'm really curious to find out, so if someone knows - please let us know), but something tells me the requirement is in the same order of magnitude.

Agreed. This is why I talked about a "real" SoC able to run Linux (with the idea of doing anything useful with it).
And if you want to support DDR RAM, then the controller will itself take several thousand more LUTs. If SDRAM is OK (but then you'll probably have very limited memory), then it becomes doable.

What I'm guessing here is that the OP would like an alternative to a typical COTS SoC supporting Linux, to lower cost/part count while having access to FPGA fabric for some custom logic.
But they would also need to tell us what they intend to run on this.

The small RISC-V cores, such as VexRiscv or PicoRV32, usually have relatively poor performance, with an average CPI on the order of 4 or so, and no FPU. So if you are lucky enough to get them to clock at around 150 MHz (or even 200 MHz) on a modest FPGA, then they would probably have the equivalent processing power of an old ARM CPU @ 40 MHz or something (or even slower). Are you sure you really want to run Linux on this? What would that be for exactly?
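
The arithmetic behind that estimate, as a minimal Python sketch. The CPI of 4 is the rough figure from this post and not a measurement; Reply #1 quotes 1.21 DMIPS/MHz for the VexRiscv "linux balanced" configuration, so the CPI is the parameter to revisit for a specific core.

```python
def native_mips(clock_mhz: float, cpi: float) -> float:
    """Millions of instructions per second for a given clock and average CPI."""
    return clock_mhz / cpi


ASSUMED_CPI = 4   # the rough figure given above, not a measurement

for clock in (150, 200):
    print(f"{clock} MHz at CPI {ASSUMED_CPI}: "
          f"{native_mips(clock, ASSUMED_CPI):.1f} MIPS")
```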
 

Online asmi

  • Super Contributor
  • ***
  • Posts: 1948
  • Country: ca
Re: Very small linux capable core
« Reply #20 on: July 26, 2021, 05:25:36 pm »
Quote
Agreed. This is why I talked about a "real" SoC able to run Linux (with the idea of doing anything useful with it.)
And if you want to support DDR RAM, then the controller will itself take several more thousands of LUTs. If SDRAM is OK (but then you'll probably have very limited memory), then it becomes doable.

SDRAM is good enough if only used for the CPU, as there won't be much bandwidth left to do anything useful.

Quote
What I'm guessing here is the OP would like to get an alternative to using a typical COTS SoC supporting Linux to lower cost/part count, while having access to FPGA fabric for some custom logic.
But they would need to also tell us what it is they would intend on running on this.

OP doesn't have a problem with using shady Chinese parts, so I'm sure if that were the case, he'd be able to source some cheap-ass ripoff parts. I've seen some Zynq-7010s on Aliexpress for pocket change, and it's got more resources than Gowin parts + two Cortex-A9 cores.

Quote
The small RISCV cores, such as VexRiscv or PicoRV32, usually have relatively poor performance, with average CPI in the order of 4 or so. With no FPU. So if you are lucky to get them to clock at up to around 150 MHz (or even 200 MHz) on a modest FPGA, then it would probably have the equivalent processing power of an old ARM CPU @ 40 MHz or something (or even slower). Are you sure you really want to run Linux on this? What would that be for exactly?

Just for fun, I'm currently working on an out-of-order core (Tomasulo with ROB) - curious to see what kind of performance it can reach with that approach. Especially curious about the kind of frequency it can achieve on a K325 SG2 on a Genesys 2 board.

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 7298
  • Country: fr
Re: Very small linux capable core
« Reply #21 on: July 26, 2021, 06:13:29 pm »
Quote
Just for fun, I'm currently working on an out of order core (Tomasulo with ROB) - curious to see what kind of performance it can reach with that approach. Especially curious at the kind of frequency it can achieve on a K325 SG2 on a Genesys 2 board.

I'm curious too. Getting it right and 100% bug-free is no small endeavour. Will that be a fully-pipelined core?

 

Online asmi

  • Super Contributor
  • ***
  • Posts: 1948
  • Country: ca
Re: Very small linux capable core
« Reply #22 on: July 26, 2021, 06:39:16 pm »
Quote
I'm curious too. Getting it right and 100% bug-free is no small endeavour. Will that be a fully-pipelined core?

Of course - that's the whole point of OoO execution. There are a couple of major difficulties with implementing such an approach on an FPGA, mostly with multi-ported memories for the reservation stations, ROB and register file. That's why it's an interesting challenge. RSes can probably be implemented in logic (as there are usually just a handful of them), but the ROB and regfile are going to be interesting. Especially whether multiple execution units will be able to "commit" results into the regfile and ROB in a single cycle (think about the case when a load and an ALU operation complete in the same cycle, or even both of those + an INT MUL unit), or whether these things will need to be serialized (which will decrease performance). We will see.

The two big advantages of a Tomasulo-based approach are that you get register renaming effectively for free, and that different operations can have different latencies, which eliminates the need for bypassing.
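
To illustrate what "renaming for free" means, here is a toy Python sketch of the renaming step only. It is not the design described in this post: there is no issue logic, no execution and no retirement, and the `rob<N>` tag naming is purely hypothetical. Each destination gets a fresh tag, and each source is pointed at the tag of its newest producer.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    op: str
    dst: str
    src1: str
    src2: str


def rename(program):
    """Give each destination a fresh tag and point sources at the newest producer."""
    alias = {}                      # architectural register -> tag of its latest writer
    renamed = []
    for index, ins in enumerate(program):
        # Sources are looked up before the destination is remapped.
        src1 = alias.get(ins.src1, ins.src1)
        src2 = alias.get(ins.src2, ins.src2)
        tag = f"rob{index}"         # hypothetical naming: one ROB entry per instruction
        alias[ins.dst] = tag
        renamed.append((ins.op, tag, src1, src2))
    return renamed


program = [
    Instr("mul", "x1", "x2", "x3"),
    Instr("add", "x4", "x1", "x5"),   # reads the mul result (true dependency)
    Instr("add", "x1", "x6", "x7"),   # reuses x1: WAW with the mul, WAR with the first add
    Instr("sub", "x8", "x1", "x4"),   # must see the second x1, not the first
]

for entry in rename(program):
    print(entry)
```

After renaming, the second `add` no longer conflicts with the `mul` or the first `add`, because it writes a different tag; only the true dependencies remain.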
« Last Edit: July 26, 2021, 06:41:22 pm by asmi »
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 7298
  • Country: fr
Re: Very small linux capable core
« Reply #23 on: July 26, 2021, 07:38:31 pm »
Good luck with that. =)
I'm curious to see what kind of performance you get with this. I'll be curious about the Coremark score you're able to achieve, for instance, or the results you get with Bruce's countPrimes().
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 2355
  • Country: nz
  • Formerly SiFive, Samsung R&D
Re: Very small linux capable core
« Reply #24 on: July 27, 2021, 12:52:45 am »
Quote
Just for fun, I'm currently working on an out of order core (Tomasulo with ROB) - curious to see what kind of performance it can reach with that approach. Especially curious at the kind of frequency it can achieve on a K325 SG2 on a Genesys 2 board.

Very cool!

Did you know that CPU designer Mitch Alsup [1] has recently come up with an enhanced CDC 6600 scoreboard algorithm that he says is equivalent to Tomasulo but easier to implement? His own description of this is not easy to access, but there is a detailed write-up of it by someone else here:

https://libre-soc.org/3d_gpu/architecture/6600scoreboard/

Please note: I endorse Alsup, I do not endorse the author of this page who may well have introduced some distortions or misunderstandings.

[1] co-author of the important 1991 paper "Single instruction stream parallelism is greater than two" that prompted modern OoO design. "When all constraints are removed except those required by the semantics of the program, we have found degrees of parallelism [in SPEC] in excess of 17 instructions per cycle. Finally, and perhaps most important for exploiting single instruction stream parallelism now, we show that if the hardware is properly balanced, one can sustain from 2.0 to 5.8 instructions per cycle on a processor that is reasonable to design today". Mitch designed Motorola CPUs in the 80s including the 88k. He was Chief Architect at AMD in the period when Athlon64/Opteron was being developed.
 

