Author Topic: Libre RISC-V M-Class with Open-Source GPU and Kazan Vulkan driver  (Read 4484 times)



Offline Mr. Scram

  • Super Contributor
  • ***
  • Posts: 9810
  • Country: 00
  • Display aficionado
 

Offline donotdespisethesnake

  • Super Contributor
  • ***
  • Posts: 1093
  • Country: gb
  • Embedded stuff
Re: Libre RISC-V M-Class with Open-Source GPU and Kazan Vulkan driver
« Reply #2 on: December 25, 2018, 01:36:57 pm »
Interesting project, but I don't think anyone is going to raise USD $6 million for a SOC. It's still a bit early for Libre Hardware at the silicon level.

A lot of people like the idea of Libre designs, but even most of those don't really want to pay a premium for it. Everyone is fine with OSS because it generally also means free as in beer.
Bob
"All you said is just a bunch of opinions."
 

Offline tsman

  • Frequent Contributor
  • **
  • Posts: 599
  • Country: gb
Re: Libre RISC-V M-Class with Open-Source GPU and Kazan Vulkan driver
« Reply #3 on: December 25, 2018, 01:44:08 pm »
Are you affiliated or just sharing?
Looks like they're part of this. programmerjake is the Jacob listed at the end of this update article.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 3996
  • Country: nz
Re: Libre RISC-V M-Class with Open-Source GPU and Kazan Vulkan driver
« Reply #4 on: December 26, 2018, 01:13:24 am »
Sadly, Luke (lkcl) has a string of unfinished and undelivered projects behind him, including most recently the EOMA68 computer with $231880 raised and two years late and counting. For very good reasons of course.  https://www.crowdsupply.com/eoma68/micro-desktop

Lots of people are going to be making RISC-V processors with the *standard* Vector processing extension for things like GPUs (and even more so to run OpenCL and the like) in about a year from now.
 

Offline lkcl

  • Newbie
  • Posts: 2
  • Country: gb
Re: Libre RISC-V M-Class with Open-Source GPU and Kazan Vulkan driver
« Reply #5 on: January 31, 2019, 04:23:20 am »
bruce, i'm sorry to have to point this out publicly: what you are posting here is bordering on a smear campaign.  the purpose of an open transparent crowdfunding campaign is to keep people informed of progress.  to say "for very good reasons of course" is disingenuous and insulting, not just to me, it's insulting to the backers, sponsors, to crowdsupply and to every single person who has helped the project over the past six years.

the owner of the factory in shenzhen is named mike.  his father developed cancer two years ago, his uncle died in december and his mother spent january in hospital recovering from surgery.  do you think, under the circumstances, that i should pressure him and demand that he deliver on the schedule that he promised at the end of november?

the design of the Cards is extremely complex from a component sourcing and available space perspective.  it uses *mid-mount* Micro-HDMI (Type D) connectors and mid-mount USB-OTG.  those are extraordinarily difficult to get hold of, and over the six years of the project's life *three* sourced mid-mount Micro-HDMI connectors went end-of-life, each one taking up to a YEAR to find, let alone design the PCB around it.

each redesign required USD $2500 and approximately four months to complete, from the first time that the PCB CAD software was opened until the time that the samples arrived from the factory.

in the case of the last Micro-HDMI redesign, an expert in EMF and RF **VOLUNTEERED** their time to help do the track layouts.  as he was helping part-time it quite literally took over six months of email and exchanging screen-shots on the mailing list, to get the layout to the point where both of us were happy.

the fact remains that the funding is nothing remotely close to what a "normal" VC-backed Corporation would have access to.  a normal VC-backed Corporation would have access to millions of dollars.  USD $250,000 would be spent just on casework of the laptop housing alone!

you have *no idea* how challenging things have been, to keep the promises that i've made to backers.

> Lots of people are going to be making RISC-V processors with the *standard*
> Vector processing extension for things like GPUs

they will fail, plain and simple.

> (and even more so to run OpenCL and the like)

just like Nyuzi and MIAOW, they will be much more successful.

RVV is simply not designed to cope with the workloads of GPUs.

i spent several months talking to Jeff Bush of Nyuzi, reading his work.  we exchanged several messages to establish what the power-performance ratio of Nyuzi was, compared to say MALI400.  we calculated that Nyuzi would be a QUARTER of the capability of MALI400, meaning that to achieve the same level of performance of MALI400, it would be necessary to lay down FOUR times the amount of silicon as a MALI400 GPU, meaning that it would also consume four times as much power.
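
as a back-of-envelope sketch of that scaling argument: it assumes, as the estimate above does, that throughput, silicon area and power all scale linearly with the number of replicated units. the function name is purely illustrative.

```python
def replication_factor(relative_capability):
    """How many copies of a design are needed to match a reference design.

    relative_capability is the design's performance as a fraction of the
    reference (0.25 means a quarter of the reference performance).
    Linear scaling of performance with area is an assumption.
    """
    if relative_capability <= 0:
        raise ValueError("relative capability must be positive")
    return 1.0 / relative_capability


# a design with a quarter of the capability needs four times the silicon,
# and (under the same linear assumption) four times the power
factor = replication_factor(0.25)
print(factor)
```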

as we are finding by analysing the Vulkan API and using it to guide (and constrain) the project, GPUs are extremely complex.  MALI400 has a whopping 128 floating-point registers!  it's an 8-year-old embedded GPU, what in seven hells is a low-end GPU doing with *128* registers, for goodness sake?

it turns out that this is to ensure that computations are done entirely in the register file, and that any temporary / interim calculations do *not* need to be pushed back through the L1/L2 cache barrier.  analysing jeff's work shows how expensive getting data through the L1/L2 CAMs really is.

however FP computations are nothing like the full story.  converting from the quad 32-bit FP pixel values (ARGB) to 8/8/8/8 32-bit ARGB pixels realistically requires dedicated hardware, otherwise, as jeff found, you end up budgeting something like 4 instructions per pixel just to do the conversion.  a dedicated quad-FP32 to INT32 hardware instruction will, instead, do *four* pixels *per clock*, and push it to a special memory area, independent of the main memory.

this is close to what the Broadcom VideoCoreIV does (which we also studied closely).
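
to give a feel for the per-pixel work that the software path involves, here is a minimal sketch of the FP32-to-ARGB8888 conversion described above. the clamp range, the round-to-nearest step and the A-R-G-B byte order are assumptions for illustration, not taken from Nyuzi or VideoCoreIV.

```python
def clamp01(x):
    """Clamp a float to the range [0.0, 1.0]."""
    return max(0.0, min(1.0, x))


def pack_argb8888(a, r, g, b):
    """Convert four float channels in [0, 1] to one packed 32-bit ARGB pixel.

    Byte order (A in the top byte) and rounding are assumptions.
    """
    # per channel: clamp, scale to 0..255, round to nearest, convert to int
    a8, r8, g8, b8 = (int(clamp01(c) * 255.0 + 0.5) for c in (a, r, g, b))
    # shift each 8-bit channel into place and merge
    return (a8 << 24) | (r8 << 16) | (g8 << 8) | b8


pixel = pack_argb8888(1.0, 0.0, 0.5, 1.0)
print(hex(pixel))
```

each channel costs a clamp, a scale, a float-to-int convert and a shift/or, which is the work that a dedicated conversion instruction would fold into a single operation.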

then also it is typically necessary to have a dedicated Z-Buffer, because to use "standard vector MAX" operations turns out to be far too computationally expensive, in a similar fashion to the ARGB conversion.
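
a rough sketch of what the depth test looks like when it is expressed with generic vector operations. assumptions: smaller z means nearer (so the vector op here is MIN; with the opposite convention it would be MAX), and plain python lists stand in for vector registers.

```python
def depth_test_vector_style(zbuf, cbuf, frag_z, frag_c):
    """Depth-test one span of fragments against the stored depth and colour.

    Returns the updated (depth, colour) lists. Smaller z is nearer (assumption).
    """
    # pass 1: vector compare produces a per-pixel mask
    mask = [fz < z for fz, z in zip(frag_z, zbuf)]
    # pass 2: vector MIN keeps the nearer depth for every pixel
    new_z = [min(fz, z) for fz, z in zip(frag_z, zbuf)]
    # pass 3: masked select writes the colour only where the fragment won
    new_c = [fc if m else c for m, fc, c in zip(mask, frag_c, cbuf)]
    return new_z, new_c


z, c = depth_test_vector_style([1.0, 0.5, 0.2], [0, 0, 0],
                               [0.7, 0.6, 0.1], [9, 9, 9])
print(z, c)
```

the compare, the MIN and the select each pass over the whole span, on top of loading and storing the depth values, which is the overhead that a dedicated Z-Buffer avoids.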

some GPUs also have a dedicated hardware operation to detect the point where two lines, specified by XY coordinates, will cross, as, again, it is simply *too expensive* to do those operations with a Vector Processor.
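
for reference, a minimal sketch of the arithmetic behind finding where two lines cross, using the standard determinant form. the function name and the tuple-based point representation are illustrative only; nothing here reflects how any particular GPU implements it.

```python
def line_intersection(p1, p2, p3, p4):
    """Intersection of the infinite line through p1, p2 with the one through p3, p4.

    Points are (x, y) tuples. Returns None when the lines are parallel.
    """
    x1, y1 = p1
    x2, y2 = p2
    x3, y3 = p3
    x4, y4 = p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denom == 0:
        return None  # parallel or coincident lines
    d1 = x1 * y2 - y1 * x2
    d2 = x3 * y4 - y3 * x4
    px = (d1 * (x3 - x4) - (x1 - x2) * d2) / denom
    py = (d1 * (y3 - y4) - (y1 - y2) * d2) / denom
    return (px, py)


print(line_intersection((0, 0), (2, 2), (0, 2), (2, 0)))
```

even in this compact form each crossing needs a string of multiplies, subtractions and two divides, which gives some idea of why it is a candidate for a dedicated hardware operation.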

so all of these things have to be taken into account, as part of the design of the instruction set.  a Vector Processor DOES NOT EQUATE to a GPU.  the role of a GPU is far, far more complex, as the Larrabee team discovered, and the MIAOW team (who specifically excluded GPU functionality, only going after OpenCL interoperability), *and* Jeff discovered when developing Nyuzi.


here's the thing: we cannot even *talk* to the people on the RVV Working Group to communicate these facts to them, because we are EXCLUDED by virtue of the RISC-V Foundation operating as a closed-doors Cartel.

Libre projects - those that are seeking funding based on complete 100% transparency - CANNOT join a closed-doors ITU-style Foundation: entering into a Membership Agreement with the RISC-V Foundation that permanently prohibited me from talking publicly about the direction that the Libre RISC-V SoC is taking would fundamentally violate the basis of the Grant Application.

so will you please *stop* with the crusade, ok? you are badly misinterpreting the facts, and misinforming people as a result.
 

Offline lkcl

  • Newbie
  • Posts: 2
  • Country: gb
Re: Libre RISC-V M-Class with Open-Source GPU and Kazan Vulkan driver
« Reply #6 on: January 31, 2019, 04:49:00 am »
Quote
Interesting project, but I don't think anyone is going to raise USD $6 million for a SOC.

actually i have a client that has access to VC funding.  if an FPGA demonstration is achieved, it unlocks the funding.

we need to do this in stages, basically.  you're absolutely right in that nobody is going to just throw money at us, yet more than that, even if they did i would *refuse* to go direct to production without going through some extremely rigorous testing and milestones, first.

you may be interested to know that i've also received an offer by a sponsor to pay for a MVP run.  to properly take advantage of that, as it's almost certainly going to be a one-off, i absolutely absolutely have to be sure that the design is viable.

in addition i've applied for a (small, EUR 50,000) Grant with a "Privacy and Trust" focus.  the sole reason why i was able to apply is the commitment to full transparency.

by using the resources of crowdsupply and the libre-riscv mailing list, i will be able to demonstrate to the Non-Profit that yes, the Grant is being used for the purposes that it was intended, and that yes, milestones are being reached.

once we have a fully working FPGA technology demonstrator, and the software side completed (Vulkan), and associated custom instructions needed for a GPU done, *then* we will be able to apply to the same Non-Profit for a Grant of up to EUR 5 million.

so there are ways (several).  promising to be 100% transparent, in the days where people are actually starting to get the message that technology is messing up their lives, has opened up a lot of unexpected doors.

Quote
It's still a bit early for Libre Hardware at the silicon level.

yeah this is very much a chicken-and-egg situation.  if you look at OR1200, they "knew" in advance that there was no way anyone would put it into high-end (fast clock rate) production... consequently they didn't bother to make the design decisions that would *allow* it to go into a high-end ASIC.

so mainly it is about belief and commitment.  decide on the goal, *stick* to it, and go from there.  most of the Libre / Open Hardware community simply do not *believe* that they can gain access to 40nm or above, so they do not try.

Quote
A lot of people like the idea of Libre designs, but even most of those don't really want to pay a premium for it. Everyone is fine with OSS because it generally also means free as in beer.

well, that's why the processor has been designed as a semi-copy of other commercially-successful SoCs.  it targets six separate and distinct markets, which is a strategy deployed by every successful Fabless Semi Company.  ST for example do one design, and when certain parts of the ASIC fail to pass testing, they don't chuck it out, they go, "whoops, that bit didn't work, let's chuck it in a smaller package with less pins, and sell it for a bit less money".  consequently you have the STM32F030 which is identical in every respect to the STM32F070 except that the STM32F030 doesn't have a USB interface.

... you see how that works? :)

so, the product at the end of the day is intended to be sold in mass-volume markets... oh and it happens to be fully libre and 100% transparent right to the bedrock.  you'll be familiar with the commercial benefits of full source code availability, due to the cost savings.

 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 3996
  • Country: nz
Re: Libre RISC-V M-Class with Open-Source GPU and Kazan Vulkan driver
« Reply #7 on: January 31, 2019, 10:34:11 am »
Quote
bruce, i'm sorry to have to point this out publicly: what you are posting here is bordering on a smear campaign.

No, it's an attempt to allow people to do due diligence before they risk their hard-earned cash on another of your projects.

Quote
the fact remains that the funding is nothing remotely close to what a "normal" VC-backed Corporation would have access to.  a normal VC-backed Corporation would have access to millions of dollars.  USD $250,000 would be spent just on casework of the laptop housing alone!

you have *no idea* how challenging things have been, to keep the promises that i've made to backers.

No. These things are exactly what I'd expect. And the budget is as you say, tiny.

It was YOU who had no idea how challenging the project was, thereby seriously misleading the backers and making it probable they will lose their money with nothing delivered.

Quote
> Lots of people are going to be making RISC-V processors with the *standard*
> Vector processing extension for things like GPUs

they will fail, plain and simple.

We shall see. I was involved with the compiler for a GPU at Samsung and know its architecture well. I sit a couple of desks down from a guy with a Norwegian name who designed GPUs with Norwegian codenames at ARM. I'm working on the RISC-V Vector compiler and he's working on the hardware. We're starting from a position of experience, not a position of ignorance.

Quote
RVV is simply not designed to cope with the workloads of GPUs.

RVV is an instruction set. It can be implemented in *many* ways. Certainly you need something in addition to standard RVV (texture unit, for example), but it's an excellent base and there is a simple and efficient mapping from SIMT GPU code to RVV: thread ids, divergence, convergence, the lot. Read Yunsup's thesis.
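
As a generic illustration of that kind of mapping (not the specific scheme from the thesis), a per-thread if/else can be run on a vector unit by treating the element index as the thread id and executing both branches under complementary masks. Python lists stand in for vector registers here, and the kernel itself is made up for the example.

```python
def simt_if_else(xs):
    """Run the per-thread kernel
           if tid % 2 == 0: out = x * 2
           else:            out = x + 1
    over all lanes at once using predication (masking).
    """
    tids = list(range(len(xs)))          # thread id = vector element index
    mask = [t % 2 == 0 for t in tids]    # divergence: lanes taking the 'then' path
    out = [0] * len(xs)
    # 'then' branch executes only on lanes where the mask is set
    out = [x * 2 if m else o for x, m, o in zip(xs, mask, out)]
    # 'else' branch executes on the complementary lanes
    out = [x + 1 if not m else o for x, m, o in zip(xs, mask, out)]
    # convergence: past this point every lane is active again
    return out


print(simt_if_else([10, 20, 30, 40]))
```

Both branches are issued for the whole vector, and the mask decides which lanes actually commit a result.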

Quote
as we are finding by analysing the Vulkan API and using it to guide (and constrain) the project, GPUs are extremely complex.  MALI400 has a whopping 128 floating-point registers!  it's an 8-year-old embedded GPU, what in seven hells is a low-end GPU doing with *128* registers, for goodness sake?

it turns out that this is to ensure that computations are done entirely in the register file, and that any temporary / interim calculations do *not* need to be pushed back through the L1/L2 cache barrier.  analysing jeff's work shows how expensive getting data through the L1/L2 CAMs really is.

"it turns out". This is well known to people who work on GPUs. I was on the team implementing the OpenCL and Vulkan compiler, and giving feedback and suggesting modifications to the instruction set and RTL. Our design had even more than 128 registers. If you're doing the typical 32-wide warp/wavefront/whatever you want to call it then 128 is nowhere near enough.

Quote
however FP computations are nothing like the full story.  converting from the quad 32-bit FP pixel values (ARGB) to 8/8/8/8 32-bit ARGB pixels realistically requires dedicated hardware, otherwise, as jeff found, you end up budgeting something like 4 instructions per pixel just to do the conversion.  a dedicated quad-FP32 to INT32 hardware instruction will, instead, do *four* pixels *per clock*, and push it to a special memory area, independent of the main memory.

Luke, this is standard stuff, known to anyone in the field. "Output Buffer", sometimes known as "O#" in shader code.

Quote
some GPUs also have a dedicated hardware operation to detect the point where two lines, specified by XY coordinates, will cross, as, again, it is simply *too expensive* to do those operations with a Vector Processor.

Yes, scan conversion/triangle unit, whatever. Standard stuff.

Quote
so all of these things have to be taken into account, as part of the design of the instruction set.  a Vector Processor DOES NOT EQUATE to a GPU.  the role of a GPU is far, far more complex, as the Larrabee team discovered, and the MIAOW team (who specifically excluded GPU functionality, only going after OpenCL interoperability), *and* Jeff discovered when developing Nyuzi.

Of *course* a vector unit is not all you need for a GPU. This is obvious. However a standard RVV vector unit, appropriately implemented, is an excellent base to add a few special-purpose GPU instructions/functional units to.

Quote
here's the thing: we cannot even *talk* to the people on the RVV Working Group to communicate these facts to them, because we are EXCLUDED by virtue of the RISC-V Foundation operating as a closed-doors Cartel.

I find it extremely arrogant that you imagine people don't know this and need it to be communicated to them. There are EXTREMELY EXPERIENCED people from a number of big name companies that actually ship working products on the Working Group. People who have done everything from DSPs to GPUs to SuperComputers before, so we are getting a wide range of input from people with vast and vastly different experience, and synthesising it into something that -- actually -- everyone is now happy with.

Just to name *one* person on the Working Group: Steve Wallach. Look him up. "Wallach was awarded the 2008 Seymour Cray Computer Science and Engineering Award for his 'contribution to high-performance computing through design of innovative vector and parallel computing systems, notably the Convex mini-supercomputer series, a distinguished industrial career and acts of public service.'" For a bit of fun, go and read _The Soul of a New Machine_ (which I did in 1982).
 

