Author Topic: FPGAs with embedded ARM cores, who makes them? (Read 12644 times)

Bassman59 · « **Reply #25 on:** August 01, 2018, 12:48:15 am »

Quote from: rhb on July 31, 2018, 04:38:11 pm

If there are only two, they will just say theirs is right and the other's is wrong.

I discovered the benefit of testing simultaneously on multiple platforms during a 500,000 line port from VMS to Unix. That code had fewer than a dozen bugs reported in the first release and it went down from there. Because we built test cases and ran them on multiple platforms as we wrote the code we quickly learned what things to avoid. That code was in service for 12-16 years and completely unsupported for 4-6 years. They only pulled the plug when it simply became obsolete.

I wish it was possible to write generic VHDL (or Verilog, I prefer and use the former daily) that would let me be vendor-agnostic. But the reality is that you can't. And believe me, I've tried, even going so far as to use VHDL configurations and generates to swap out vendor-specific blocks. You end up with spaghetti, the kind that's been sitting in the drainer for a few hours because your wife made dinner and you were still at work, and now it's a blob of paste in the sink.

The good news is that inferring standard things like RAMs and ROMs is portable.
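As a minimal sketch of what such an inferred memory looks like (the entity name, generics, and port names are illustrative, not taken from any vendor template), a single-port RAM with a registered read is the pattern the mainstream synthesizers generally recognize as block RAM. The exact mapping and the read-during-write behaviour should still be checked in each tool's synthesis report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Illustrative single-port RAM written for inference, with no vendor primitives.
entity inferred_ram is
  generic (
    ADDR_WIDTH : positive := 8;   -- assumed depth: 2**8 words
    DATA_WIDTH : positive := 16   -- assumed word width
  );
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  std_logic_vector(ADDR_WIDTH - 1 downto 0);
    din  : in  std_logic_vector(DATA_WIDTH - 1 downto 0);
    dout : out std_logic_vector(DATA_WIDTH - 1 downto 0)
  );
end entity inferred_ram;

architecture rtl of inferred_ram is
  type ram_t is array (0 to 2**ADDR_WIDTH - 1)
    of std_logic_vector(DATA_WIDTH - 1 downto 0);
  signal ram : ram_t;  -- deliberately no initializer
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= din;
      end if;
      -- Registered read; on a simultaneous write this returns the old word.
      dout <= ram(to_integer(unsigned(addr)));
    end if;
  end process;
end architecture rtl;
```

There is no reset on the memory array itself, since a reset on the storage usually prevents block RAM inference.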

Things as simple as input DDR blocks aren't the same from vendor to vendor. Some families have input and output serializers, and there are all sorts of specific clocking requirements that make porting difficult. Clock resources are all over the place: some have PLLs, some have DLLs, some have both, some have just delay elements. Some families have input delays on all pins, some only on clock pins.

Hard blocks which require instantiation are not portable. Altera's gigabit serializers don't work the same way as Xilinx's. Lattice has user-accessible flash in the MachXO parts, Xilinx has such in Spartan 3AN, completely different access mechanisms, so don't pretend you can abstract that. Memory interfaces (DDR3 and such) are all different, with wizards for configuration and setting the zillion parameters each one seems to have. And then there is the interface to the interface. What is provided? Wishbone? AXI? PLB? Something else?

Even the simple stuff isn't portable. Here's an example.

I spent years doing Xilinx designs, and in the Xilinx world, you can initialize your flip-flops as such:

```vhdl
signal foo : std_logic_vector(7 downto 0) := X"AB";
```

Immediately after configuration completes, the eight flip-flops that form the vector foo are preset to the value AB. This means, among other things, that an explicit reset is not necessary, as that's done as part of the configuration process (which happens at power-up or whenever otherwise forced). Certainly, a logic reset can be used as necessary, which leads to ...

A second thing about Xilinx is that they tell you that if you really need a (global) reset, you should always use synchronous resets, never asynchronous resets. The reason? The reset net's prop delay is "excessive" and to make sure that all flip-flops come out of reset at the same time, you should use the synchronous reset. The sync reset is synchronous to the clock and the timing analyzer knows how to properly determine whether the routing for it meets timing. (It's basically flip-flop to flip-flop like any other synchronous path.) And the good news is that the flip-flops can be configured so their reset inputs are synchronous or asynchronous. The synthesis tool does this automatically, and it doesn't use any extra resources. That is, the flop's D and CE inputs aren't involved at all with reset.
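A sketch of that coding style, assuming an active-high synchronous reset called srst (the entity and port names are illustrative). On Xilinx parts the initializer sets the value after configuration, and the reset and clock enable are expected to map to the flip-flop's dedicated inputs rather than to extra LUT logic.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative 8-bit register with a power-up value and a synchronous reset.
entity sync_reset_reg is
  port (
    clk  : in  std_logic;
    srst : in  std_logic;  -- synchronous reset, active high (assumed polarity)
    ce   : in  std_logic;  -- clock enable
    d    : in  std_logic_vector(7 downto 0);
    q    : out std_logic_vector(7 downto 0)
  );
end entity sync_reset_reg;

architecture rtl of sync_reset_reg is
  -- Initializer: honoured by tools for SRAM-based parts that load flip-flop
  -- state from the bitstream.
  signal foo : std_logic_vector(7 downto 0) := X"AB";
begin
  process (clk)  -- only clk in the sensitivity list: the reset is synchronous
  begin
    if rising_edge(clk) then
      if srst = '1' then
        foo <= X"AB";      -- same value as the initializer
      elsif ce = '1' then
        foo <= d;
      end if;
    end if;
  end process;

  q <= foo;
end architecture rtl;
```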

We started to use Microsemi FPGAs, the ProASIC-3E parts in particular.

ProASIC-3 Lesson 1. The VHDL initializers (to set or reset flip-flops at startup) are ignored; the fabric has no way to implement them. So you must reset all flip-flops.

Lesson 2. An external power-on or other explicit reset is required, as the states of each flip-flop at power-up are unknown, because there is no initialization from configuration memory and there is no GSR.

Lesson 3. The flip-flops support an asynchronous reset or preset only. They do not support a synchronous reset. To implement the synchronous reset, the synthesizer builds a mux with one input at the reset (or preset) value, selected by the reset signal, and that's combined with all of the other logic that drives the D and CE inputs. This makes your resource use explode. Yes, a lot of logic uses what appear to be synchronous resets, say, counter clears and suchlike, but that's not global to every flip-flop in the design, and that's usually coded as in addition to the global sync reset.

Lesson 4. The fabric doesn't require you to use a special reset input pin. Pick any pin that is convenient. But it is smart enough to recognize a reset as a large fan-out signal and it will put it on a low-skew global net. These nets are commonly used for clocks, but (very much unlike modern Xilinx parts) are accessible from the fabric, so any signal can drive them and they can connect to any logic-block input, not just clocks on flip-flops and RAMs. Because the reset is now on a low-skew net that can be driven by logic, you can easily synchronize it to your clock and then distribute it in a low-skew fashion to the entire design.
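One common way to do that synchronization is a two-flop reset synchronizer: assert asynchronously, release synchronously. The sketch below assumes an active-low reset pin and uses illustrative names. It relies only on an asynchronous reset and has no initializers, so it stays within the ProASIC-3 restrictions from Lessons 1 to 3.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative reset synchronizer: asynchronous assertion, synchronous release.
entity reset_sync is
  port (
    clk    : in  std_logic;
    arst_n : in  std_logic;  -- raw reset from any convenient pin, active low
    rst_n  : out std_logic   -- synchronized reset for the rest of the design
  );
end entity reset_sync;

architecture rtl of reset_sync is
  signal sync : std_logic_vector(1 downto 0);  -- no initializer on purpose
begin
  process (clk, arst_n)
  begin
    if arst_n = '0' then
      -- Asserting the pin forces the design into reset immediately.
      sync <= (others => '0');
    elsif rising_edge(clk) then
      -- Shift in '1' so release happens two clock edges after the pin goes high.
      sync <= sync(0) & '1';
    end if;
  end process;

  rst_n <= sync(1);
end architecture rtl;
```

The rst_n output is the high fan-out signal you would expect the tool to promote to a global net.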

Lesson 5. Because the low-skew high-fanout global nets are available for general logic use and not just for clocks, the synthesis tool may detect that some signal or other has high fan-out and would benefit from being on a global net. That would seem to be a good thing, yes? For example, a design I'm finishing up now has a large mux that takes sixteen 16-bit data buses (from block RAMs) and muxes them into one 16-bit bus. The synthesis tool detected that the upper bit of the mux select had a high fan-out and put it on a global net. And it failed to meet timing, and by a ridiculous margin (something like 2.5 ns on a 100 MHz clock). (Wide muxes in the ProASIC-3 fabric are particularly ugly.) I looked at the timing analyzer to see why, and it showed the path from the counter that generated the mux select to one of the mux-output registers, and there was an oddball 6 ns (!) delay on one particular part of the path. It turns out that it put a mux-select line on the global net, and to get to the global buffer (which is on the edge of the chip) required a long route. Once the signal was on the global net the delay was short, but it was the route to the buffer that killed it. I had to greatly increase the fan-out limit in the synthesis tool so that it wouldn't do that.

So yeah, it would be great if it was reasonable to "write once, synthesize everywhere," but in practice, that isn't possible.

asmi · « **Reply #26 on:** August 01, 2018, 01:00:42 am »

Quote from: Bassman59 on August 01, 2018, 12:48:15 am

So yeah, it would be great if it was reasonable to "write once, synthesize everywhere," but in practice, that isn't possible.

I agree with everything above, but in addition to that there is an elephant in the room: a DSO application will most certainly require using DSP tiles, and they are some of the least portable blocks even across different FPGA families of a single vendor, let alone between vendors.

rhb · « **Reply #27 on:** August 01, 2018, 01:02:57 am »

This has wandered a *long* way from my original question. My question has been answered, but I'd like to make a few comments.

The fact that thousands of projects have been completed using Vivado or Quartus in no way demonstrates that either is compliant with the Verilog standard or that in the cases where the standard is "implementation defined" they do the same thing. Those are my concerns. It is only a matter of time before hard ARM cores appear in other vendors' lines. It's the obvious thing to do. It reduces latencies in the PS-PL interface.

Most of the time performance is not an issue, Amdahl's law, but when it is, you have to understand what the hardware is doing at the wire and gate level to get it right. And the rules change over time as the technology changes.

I did a port of 500,000 lines of VAX FORTRAN code from VMS to 6 flavors of Unix (Sun, IBM, HP, SGI, DEC and Intergraph) in the early 90's. It had conditionals for byte sex and FORTRAN record length. That project taught me the value of using multiple systems during initial development.

I had a problem with a piece of code on the Sun. I contacted Sun and got an, "It must be a problem with your code." But when I said, "It works just fine on the IBM and the HP," I got an, "Oh, I see what you mean." I had a solution the next day in the form of an obscure compiler flag.

Anyone who thinks that all compilers produce the same result hasn't used more than one compiler. Or has not looked at the program results closely. I don't do UIs. I do numerical codes and it is far more complex than most imagine when you have a few terabytes of data and a few million petaflops to perform. You *really* get intimate with whatever hardware and development software you are using.

If you are aware of what is not portable you can avoid doing it if it's not performance critical and if it is, you can isolate it with a #ifdef. But you need to know that after you've written 20 lines of code, not after writing 20,000 lines. Portability is the product of the attitude, discipline and skill of the programmer.

Siwastaja · « **Reply #28 on:** August 01, 2018, 07:13:54 am »

As a relevant side note, FPGA development is horrible. Especially for someone with a pedantic software background, especially in well-maintained (open source or not) projects, you'll actually feel dirty and want to puke.

This may be a slight exaggeration, but it's basically a duopoly of two giants, with price fixing limiting the rate of technological advancement. You work by their rules, using their black boxes, and the boxes are much blacker than anything you see in the software world. After paying $ $ $ for the devices, you pay $ $ $ $ $ $ for a license to use their bloated piece of shit compilers, which are nonetheless too advanced inside to be easily replaced. Because there is no real competition, you need to accept what you get, and as a small player, it's hard to get support. When you accept this reality, you can do quite well. After all, these design flows do work. I have worked with them no problem; they are just highly suboptimal and feel dirty to anyone used to a more scientific or engineering way of thinking. But accepting this, and having other aspects of the project done in a more sustainable way and controlled by you, you can cope with it.

What they basically tell you between the lines is: our FPGAs are a replacement for your 1-year, $10M ASIC development cycle. It doesn't matter if it costs you $100000 and 1 month of design time to do something utterly trivial - it's still 10x better than the alternative!

They are not interested in making FPGAs a more widespread thing - something that the world was expecting. I remember everybody was saying that "FPGAs are coming everywhere" a decade ago. Now that talk has all but stopped. FPGA vendors run a high-profit niche business that's clearly large enough, and when run as a price-fixed duopoly, works well for them. The niche is large enough to offer a very steady flow of profit. Trying to step outside of this realm would be a huge risk.

edit: trying to work around broken forum software

rhb · « **Reply #29 on:** August 01, 2018, 12:10:52 pm »

I've maintained several million lines of software, most of it written by scientists who never bothered to learn how to program. I've also supported software for which the company paid annual maintenance fees in the $100k range. I was *very* thankful I did not have to use the software.

In one case for which the company paid $80k a year for support, after a week or so of back and forth, the support person said, "Well, if you get it working please send me the fixes so I can give them to the other customers." They had to scrape me off the ceiling with a putty knife.

So I'm pretty familiar with the general problems I'm facing. The nice thing about having 3 platforms is if it works on two and fails on one, it's their fault. If it fails on two and works on one, it's your fault. Time to read the language standard more closely.

nctnico · « **Reply #30 on:** August 01, 2018, 12:16:25 pm »

Quote from: rhb on August 01, 2018, 12:10:52 pm

If it fails on two and works on one, it's your fault. Time to read the language standard more closely.

Trust me: they won't care at all. If you are going to develop on 3 platforms in parallel then you are wasting your time. FPGA software is a balancing act between vendor lock-in and allowing customers to use their existing code without a major rewrite.

ehughes · « **Reply #31 on:** August 01, 2018, 12:57:31 pm »

Quote

I've maintained several million lines of software, most of it written by scientists who never bothered to learn how to program. I've also supported software for which the company paid annual maintenance fees in the $100k range. I was *very* thankful I did not have to use the software.

A large ego isn't going to help you. FPGA (and ASIC) workflows are quite a bit different from *NIX software workflows. I think what the other people here are trying to communicate is that you are approaching this problem without ever having written a line of HDL for either synthesis or simulation.

You can use the verification tools in a manner closer to how you would approach a generic software problem. Synthesis targets are completely different.

Here is the one piece I think you are missing:

Both of the major languages were developed with simulation and documentation in mind, NOT synthesis. The synthesis constructs were added later. There are some notes in the current versions of the standard regarding synthesis, but starting with the mindset that writing in pure Verilog is going to give you this ultra portable code base that will work equally well across every FPGA is naive at best. There are people on this forum who do FPGA for a living for mission critical systems. You can ignore their advice but you are going to be very frustrated when the rubber hits the road in your project.

Here is another piece that you are missing: Altera, Xilinx, et al. have no intention of perfectly implementing the language standards. Abiding by the language standard is meaningless for synthesis, as they were never intended to be generic synthesis tools.

You are also missing a huge piece of the flow: constraints management. This is something you will not find anywhere in the language specs. Large projects almost always require significant time in the constraints planning to guide place&route, control clock routes, etc. This component of the flow is 100% vendor specific and can change significantly even within the same vendor from family to family. In many cases, it is the *only* way to get specific behaviors.

Unlike writing C, the *majority* of code and support files for an FPGA is vendor specific. By the time you come up with a build system that can handle every corner case, you will have 95% spaghetti and 5% sauce. There is literally no valid use case for doing this other than to burn time.

There have been some EDA companies (Altium) that have attempted to do what you are trying to do. They all spent millions and failed because of one simple fact: most users of FPGAs *don't care* about supporting every chip. The hardware only has to work for a specific use case. No sane design team with a set of requirements shifts between vendors because they feel like it. Very few teams go half way through a project and decide to use Altera instead of Xilinx. This happens so rarely that you would be taken out back and shot for considering it.

Both of the major vendors *still sell* products from 25 years ago. All of this work may stroke an ego but you may find users don't really care.

nctnico · « **Reply #32 on:** August 01, 2018, 01:47:47 pm »

Quote from: ehughes on August 01, 2018, 12:57:31 pm

Unlike writing C, the *majority* of code and support files for an FPGA is vendor specific. By the time you come up with a build system that can handle every corner case, you will have 95% spaghetti and 5% sauce. There is literally no valid use case for doing this other than to burn time.

A while ago I used a large open hardware project which uses HDLmake to generate a Makefile to run the synthesis and P&R process. It can target several vendors. It is not perfect but it does help to make the open hardware project synthesize for Xilinx and Altera without needing to mess around with project files.

Still it doesn't solve the timing constraints which are a very important part of any FPGA design indeed.

Siwastaja · « **Reply #33 on:** August 01, 2018, 02:31:11 pm »

A medium-complexity FPGA project can be 10000 lines of VHDL and another 10000 lines of proprietary Quartus configuration files for all the constraints. Then you get 1000 warnings every time you compile. And you compile for days. They don't care. They know developing an ASIC takes a year, so a full compile in a day is 365 times faster. That's what FPGAs still get compared to.

rhb · « **Reply #34 on:** August 01, 2018, 08:54:04 pm »

Well, my DE10-Nano finally arrived. So I shall see for myself. While it may be well meant, "Don't try it" seems not very useful advice in the context of a hobby project by someone with my peculiar background. It's not as if failure matters. The set of design tasks for a time sampling based T&M instrument is not very large or complex. It's a minuscule subset of FPGA applications.

From the comments it seems that there is a need for a common constraint language.

Developing and testing simultaneously on two systems may not be any benefit. But it certainly doesn't hurt to try it. I think generally people have missed the point, I want to know when the vendor is not adhering to the language standard. Comparing the result of synthesizing the same Verilog on two systems is the best way to find where that is happening.

As noted previously, my question was answered. So I'll leave others to argue about the wisdom of testing on multiple targets. I'd rather see what actually happens.

nctnico · « **Reply #35 on:** August 01, 2018, 10:01:04 pm »

Quote from: rhb on August 01, 2018, 08:54:04 pm

Well, my DE10-Nano finally arrived. So I shall see for myself. While it may be well meant, "Don't try it" seems not very useful advice in the context of a hobby project by someone with my peculiar background. It's not as if failure matters. The set of design tasks for a time sampling based T&M instrument is not very large or complex. It's a minuscule subset of FPGA applications.

I wouldn't underestimate the amount of work. Sampling is the easy part but reconstruction and overlaying multiple acquisitions (trigger point interpolation) on top of each other isn't. Not by a long shot.

About developing on two systems: your time is better spent using a simulator as a reference instead of a different FPGA. Using the simulator you can verify your design and then check it against what the FPGA does. One of the problems you'll encounter with an FPGA is that it is very hard to debug the internal signals. I usually implement a debug bus (16 lines or so) which allows me to bring various internal signals to the outside which then go into a logic analyser.
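A minimal sketch of such a debug bus, assuming four hypothetical 16-bit signal groups and a 2-bit select (all names are illustrative; in practice the select would come from a control register or a few spare pins). Registering the output means the logic analyser sees values that change only on the clock edge.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative debug bus: picks one of four internal signal groups and
-- drives it onto 16 pins for an external logic analyser.
entity debug_bus is
  port (
    clk       : in  std_logic;
    sel       : in  std_logic_vector(1 downto 0);   -- which group to observe
    grp0      : in  std_logic_vector(15 downto 0);  -- hypothetical signal groups
    grp1      : in  std_logic_vector(15 downto 0);
    grp2      : in  std_logic_vector(15 downto 0);
    grp3      : in  std_logic_vector(15 downto 0);
    debug_out : out std_logic_vector(15 downto 0)   -- to the debug header
  );
end entity debug_bus;

architecture rtl of debug_bus is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      case sel is
        when "00"   => debug_out <= grp0;
        when "01"   => debug_out <= grp1;
        when "10"   => debug_out <= grp2;
        when others => debug_out <= grp3;
      end case;
    end if;
  end process;
end architecture rtl;
```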

Daixiwen · « **Reply #36 on:** August 02, 2018, 08:10:23 am »

For a hobby project I wouldn't consider it a waste of time to try the same code on different platforms. You will learn a lot about the tools, and indeed you may run into different problems on each platform which will teach you different fixes that you would need to make to your code.
For the HDL part itself, the synthesizers have become better and better at recognizing HDL code that describes specific hardware modules and implementing them in hardware (multipliers, memory, even dual-port with two different clocks). You can write a good part of your code to be vendor independent. There are still parts that have to use vendor specific IPs (PLLs for example, or I/O interface blocks) and when you are using an FPGA with a hard CPU core, the interface between the two will also be specific for each vendor, and sometimes for each FPGA family.
For timing constraints there is a kind of industry standard called Synopsys Design Constraints (SDC), but each implementation is different, especially with signal and clock naming, and you can't just take the constraints file from one platform and use it on another. It *might* be possible to try and make the files more portable by putting all the vendor specific stuff at the beginning and using variables, and then put the actual constraints at the end, but I've never tried something like that.

For a professional project this is totally a waste of time. Just pick one platform and use it, you usually don't even have enough time to finish the project on one platform. If you ever need to change FPGA vendors you need to redo the whole PCB anyway and it's easier to consider it as a new project instead.

rhb · « **Reply #37 on:** August 02, 2018, 12:28:46 pm »

Quote from: nctnico on August 01, 2018, 10:01:04 pm

I wouldn't underestimate the amount of work. Sampling is the easy part but reconstruction and overlaying multiple acquisitions (trigger point interpolation) on top of each other isn't. Not by a long shot.

I'm assuming 1000+ hrs to completion. The rest I entirely agree with. My goal is portable IP blocks for implementing the functions of a DSO/MSO/MDO/AWG.

Quote from: nctnico on August 01, 2018, 10:01:04 pm

One of the problems you'll encounter with an FPGA is that it is very hard to debug the internal signals. I usually implement a debug bus (16 lines or so) which allows me to bring various internal signals to the outside which then go into a logic analyser.

That sounds like an excellent approach.

Quote from: Daixiwen on August 02, 2018, 08:10:23 am

You will learn a lot about the tools, and indeed you may run into different problems on each platform which will teach you different fixes that you would need to make to your code.

That is the point of doing it. I expect to find lots of "features" in the development tool chain.

Quote from: Daixiwen on August 02, 2018, 08:10:23 am

For a professional project this is totally a waste of time. Just pick one platform and use it, you usually don't even have enough time to finish the project on one platform. If you ever need to change FPGA vendors you need to redo the whole PCB anyway and it's easier to consider it as a new project instead.

Yes, and then it goes on the market for $20K and the users get to do the testing. And after the warranty has run out they finally have a usable scope.

My experience with the Unix port was the initial port to two systems (Sun was BSD and Intergraph was Sys V) took 9 months, the 3rd took 4 months as there were a lot of constructs the IBM FORTRAN compiler would not accept (branches into conditional blocks) which had to be corrected. The HP took 4 weeks. I did the DEC and SGI ports in an idle afternoon. Because we tested on multiple systems at every compile, the code we wrote did not require changes going to the IBM, HP, etc. Just the VAX FORTRAN code. I attribute the very low bug rate on that project to the multiplatform testing. It was a major lesson for me. It taught me to never rely on the man pages for a system. I always check the language and POSIX standards first and code to that. If and only if there is a problem do I read the system man pages.

There are at least two Zynq based DSOs on the market. I don't know of any Cyclone V based products. But I expect there will be eventually for the simple reason that a company which has been using Altera devices is *not* going to switch vendors for all the reasons put forth.

Siwastaja · « **Reply #38 on:** August 02, 2018, 04:00:12 pm »

Quote from: rhb on August 01, 2018, 08:54:04 pm

Developing and testing simultaneously on two systems may not be any benefit. But it certainly doesn't hurt to try it. I think generally people have missed the point, I want to know when the vendor is not adhering to the language standard.

You need to understand the difference here:

In C or C++, or Java, or any similar language, there is a standardization committee, the standard is written for the purpose it's used for from the start (computer programming), and thus there is a good chance that the compilers at least try to follow the standard. Or when they don't, they often have a reason not to (like that the standard totally sucks in some part. Like the aliasing assumption rules in the C standard.).

In VHDL, I think the standardization body is weak. The language was originally built for a completely different purpose - describing behavioral simulation models; not even register transfer level, and even less for logic synthesis. The language is fairly simple, but the actual practical synthesizable constructs are not defined in the standard at all. For example, there are no keywords for defining a register (D flipflop). There are no keywords for defining an asynchronous reset, or a synchronous reset.

You do it by describing how the reset or clock works. You always actually write a behavioral simulation model for a freakin' flipflop! And there are multiple ways to do this syntactically.
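To make that concrete, here is a sketch of three common ways to describe the same rising-edge D flipflop (the entity and port names are illustrative). All three are widely accepted by synthesis tools, though support for the wait form has historically varied between tools, so it is worth checking the documentation for the one you use.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Three behavioural descriptions that are all meant to become one D flipflop each.
entity dff_styles is
  port (
    clk : in  std_logic;
    d   : in  std_logic;
    q_a : out std_logic;
    q_b : out std_logic;
    q_c : out std_logic
  );
end entity dff_styles;

architecture rtl of dff_styles is
begin
  -- Style A: rising_edge() from std_logic_1164.
  style_a : process (clk)
  begin
    if rising_edge(clk) then
      q_a <= d;
    end if;
  end process style_a;

  -- Style B: the older 'event attribute idiom.
  style_b : process (clk)
  begin
    if clk'event and clk = '1' then
      q_b <= d;
    end if;
  end process style_b;

  -- Style C: no sensitivity list, a wait statement instead.
  style_c : process
  begin
    wait until rising_edge(clk);
    q_c <= d;
  end process style_c;
end architecture rtl;
```

None of these says "register" anywhere; the tool has to recognize the pattern.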

This is super dumb. It's as if C didn't have an assignment operator available.

This is why editors such as Emacs offer code autogeneration, so that they generate the boilerplate required to simulate - and synthesize - a D flipflop!

With this little standardization around synthesis, you should think about it in this way: the synthesis toolmakers have just figured out: "should we invent a synthesizable hardware description language? What the heck, let's just use this language, trying to interpret the intention of the writer". Now, they (Altera and Xilinx) play with very similar rules, so most of the constructs are very well interchangeable, but there is no strict "official standard" you would refer to, to say who's right and who's wrong.

This is a highly practical situation, instead of ideal.

rhb · « **Reply #39 on:** August 02, 2018, 05:41:06 pm »

I've already downloaded the IEEE Verilog standard.

In brief, an FPGA is a collection of hard silicon blocks, an N layer interconnect fabric and a bunch of FET switches controlled by a bit map. Would you consider that an accurate description? Have I left anything significant out? I am not aware of a technology that would allow any other realization and this has profound implications for synthesis.

Routing the interconnect to satisfy constraints is NP hard. Finding satisfactory solutions is difficult. And finding optimal solutions is impossible except in special cases such as discussed by David Donoho in some papers he wrote in 2004. I do not know whether those apply in the case of FPGA synthesis nor do I know what Vivado and Quartus do. They might be very sophisticated or they might be very lame. It entirely depends upon the character of the person who wrote the code. I have seen everything from brilliant to idiotic.

Computing is rarely, if ever, ideal; it's mostly a matter of compromise. There's nothing stupid about the C aliasing rules. It's the price tag for pointers. If you want to avoid that, use FORTRAN instead. FORTRAN has been as successful as it has in scientific programming precisely because of what constructs John Backus and his team allowed in the language. Not allowing aliasing lets a FORTRAN compiler do things a C compiler cannot. TANSTAAFL.

NorthGuy · « **Reply #40 on:** August 02, 2018, 05:54:16 pm »

Quote from: rhb on August 02, 2018, 05:41:06 pm

In brief, an FPGA is a collection of hard silicon blocks, an N layer interconnect fabric and a bunch of FET switches controlled by a bit map. Would you consider that an accurate description? Have I left anything significant out?

In brief, a PCB is a collection of ICs and discrete elements and a bunch of traces connecting the elements together. Would you consider that an accurate description? Have I left anything significant out?

rhb · « **Reply #41 on:** August 02, 2018, 07:17:25 pm »

Quote from: NorthGuy on August 02, 2018, 05:54:16 pm

Quote from: rhb on August 02, 2018, 05:41:06 pm
In brief, an FPGA is a collection of hard silicon blocks, an N layer interconnect fabric and a bunch of FET switches controlled by a bit map. Would you consider that an accurate description? Have I left anything significant out?

In brief, a PCB is a collection of ICs and discrete elements and a bunch of traces connecting the elements together. Would you consider that an accurate description? Have I left anything significant out?

You appear not to understand what it means when a problem is NP hard. Which was the point of that description. What I wrote implies that synthesis is NP hard. A PCB is not NP hard.

Siwastaja · « **Reply #42 on:** August 02, 2018, 07:38:10 pm »

Oh! You can download files! This is a great start!

NorthGuy · « **Reply #43 on:** August 02, 2018, 08:25:17 pm »

Quote from: rhb on August 02, 2018, 07:17:25 pm

You appear not to understand what it means when a problem is NP hard. Which was the point of that description. What I wrote implies that synthesis is NP hard. A PCB is not NP hard.

FPGA is like PCB. Except instead of traces you get switches controlled by the configuration bits. The routing tools make connections and as soon as your constraints are met they're done. There's no searching for optimum.

However, my point was different. Electronic design is not all about laying PCB traces, and similarly the FPGA design is not about routing.

rhb · « **Reply #44 on:** August 03, 2018, 12:45:05 am »

ROFL!

It's *all* about optimization. It's a classic problem in computer science. It's *why* FPGAs are hard. And why the design tools are so large and slow.

Satisfying the constraints is an optimization problem in mathematics and computer science. It's classic operational research.

NorthGuy · « **Reply #45 on:** August 03, 2018, 01:32:19 am »

Quote from: rhb on August 03, 2018, 12:45:05 am

It's *all* about optimization. It's a classic problem in computer science. It's *why* FPGAs are hard. And why the design tools are so large and slow.

FPGAs are not hard. Design tools are slow because they're overbloated.

Quote from: rhb on August 03, 2018, 12:45:05 am

Satisfying the constraints is an optimization problem in mathematics and computer science. It's classic operational research.

Optimization is when you try to optimize something - that is find a solution which produces the maximum (or minimum) value of something while satisfying given conditions and constraints. What do you think the FPGA routing optimizes?

BrianHG · « **Reply #46 on:** August 03, 2018, 02:15:56 am »

Quote from: asmi on August 01, 2018, 01:00:42 am

Quote from: Bassman59 on August 01, 2018, 12:48:15 am
So yeah, it would be great if it was reasonable to "write once, synthesize everywhere," but in practice, that isn't possible.
I agree with everything above, but in addition to that there is an elephant in the room - DSO application will most certainly require using DSP tiles, and they are some of the most non-portable even across different FPGA families of a single vendor, much less so between vendors.

What do you mean?
I wrote a complete image scaler and video processor in SystemVerilog in Altera's Quartus 3 years ago. All math was written out as simple adds, multiplies, divides in Verilog. I did not use any DSP tiles, yet, once compiled, Quartus placed all the arithmetic into the DSP blocks all on its own.
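A minimal sketch of that style, written in VHDL to match the snippet earlier in the thread rather than the SystemVerilog of the project described above. The 18-bit operand width and the entity and port names are assumptions for illustration. Whether a given tool packs this into a DSP block depends on the device family and settings, so the synthesis report is the place to check.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Illustrative pipelined multiply-add, p = a * b + c, using plain operators
-- only. No vendor DSP primitive is instantiated.
entity mult_add is
  port (
    clk : in  std_logic;
    a   : in  signed(17 downto 0);  -- assumed operand width
    b   : in  signed(17 downto 0);
    c   : in  signed(35 downto 0);
    p   : out signed(35 downto 0)
  );
end entity mult_add;

architecture rtl of mult_add is
  signal a_r, b_r  : signed(17 downto 0);
  signal c_r, c_rr : signed(35 downto 0);
  signal m_r       : signed(35 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: input registers.
      a_r  <= a;
      b_r  <= b;
      c_r  <= c;
      -- Stage 2: multiply; delay c so it stays aligned with its product.
      m_r  <= a_r * b_r;   -- 18 x 18 signed gives a 36-bit result
      c_rr <= c_r;
      -- Stage 3: add (wraps on overflow, as with any fixed-width sum).
      p    <= m_r + c_rr;
    end if;
  end process;
end architecture rtl;
```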

The one issue I had was with the slower Cyclone implementation, a set of multiply-adds where I needed the Altera megafunction for the multi-cycle-clock feature to get an improved FMAX, since at the time I did not know how to properly implement this in SystemVerilog.

From what I know of the time, I can agree on the floating point where you have more power calling Altera's IP function, however this was 5 years ago when I started and things should have improved since then.

rhb · « **Reply #47 on:** August 03, 2018, 03:10:13 am »

In the 1940's the Air Force realized that they had serious logistical problems for which they urgently needed a better method of handling. They engaged a mathematician, George Dantzig, to study the problem. Dantzig solved the problem by developing the simplex method and his work spawned an entire field called "operations research". It is used for scheduling aircraft crews, factory production, shipping and many other things.

As an example consider that there are planes which go from A to B and A to C each day and a plane that goes from C to B. So suppose there is more cargo from A to B than there is capacity on the plane to B. If you schedule it properly you can ship that cargo from A to C and then from C to B. But to do this in a timely manner you have to schedule the planes so that the cargo from A arrives at C before the plane leaves for B. The constraint in this case is the capacity of each plane and the solution is the day's flight schedule.

Dantzig provided a solution with the simplex method. It is not guaranteed to be optimal, but it's pretty close. It's good enough.

Later computer scientists started looking at such problems and developed a classification. I don't recall all the details and have no interest in looking them up. Suffice it to say, if a problem is NP hard and large enough, the sun will burn out before you find the optimal solution even if you use all the computers on the planet. If a strange problem walked in my office, the first thing I considered once I understood what was wanted was, "Is it NP hard?" If it was, I needed to negotiate what was "good enough".

In computer science, FPGA synthesis is what is generally called a "Traveling Salesman" problem. Given a set of cities, find the shortest route which visits each city only once.

FPGA synthesis is called a convex optimization problem in mathematics. Minimize x subject to constraints y. Anyone working with such problems generally will randomly mix terminology from mathematics and computer science.

At low clock rates, latencies don't matter. At high clock rates they are critical. Even the clock distribution is difficult. Where you place each element of an IP alters the latencies. So the optimization is where to put the elements of the IP such that the constraints are met.

Marco · « **Reply #48 on:** August 03, 2018, 03:39:35 am »

Unless you want to do a high update rate wideband spectrum analyzer I don't see what you need to do much signal processing in the FPGA part for. Just make life easy on yourself and do it in the ARM part.

NorthGuy · « **Reply #49 on:** August 03, 2018, 04:02:04 am »

Quote from: rhb on August 03, 2018, 03:10:13 am

... Minimize x subject to constraints y ... So the optimization is where to put the elements of the IP such that the constraints are met.

What is "x"? What do you want the routing tools to minimize?

