EEVblog Electronics Community Forum

Electronics => Microcontrollers => Topic started by: rhb on July 30, 2018, 01:00:37 pm

Title: FPGAs with embedded ARM cores, who makes them?
Post by: rhb on July 30, 2018, 01:00:37 pm
Xilinx has the Zynq and Intel/Altera has the Cyclone V. 

Are there any others?  Google was not much help.  Lots of spurious hits.

I'm starting to develop FOSS  Verilog for DSOs and it would be *very* helpful to have a 3rd platform as it is much easier to get the vendor to take ownership of a bug in their development tools if you can tell them that the code works on two other systems.

To start with I'm using a MicroZed and a DE10-Nano for my initial development and will switch to a Zybo Z7-20 when I start working on the display portion and need an HDMI output.  At that time I'll get boards with HDMI output for the other platforms.

I started this some time ago, but got stalled buying T&M gear and other stuff.  I modified the thread name and moved it to Projects.  This will probably take 12-18 months to complete working on a retiree schedule and having to deal with multiple computers to satisfy software system requirement conflicts.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: andersm on July 30, 2018, 01:44:28 pm
Do you need application processors, or microcontrollers?

In addition to the Cyclone V, Intel/Altera have the Arria V and Arria 10 with dual Cortex-A9 cores, and Stratix 10 with a quad-core Cortex-A53 CPU. Xilinx' Zynq UltraScale+ MPSoC and RFSoC also have Cortex-A53 CPUs.
Microchip/Microsemi's SmartFusion 2 has a Cortex-M3 MCU.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: Fsck on July 30, 2018, 01:57:48 pm
everyone (important) makes fpgas with arm processor cores now.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: rhb on July 30, 2018, 09:06:41 pm
I'm looking for something from a 3rd vendor with their own tool chain.  I didn't find anything from Lattice which surprised me.  I need something on a par with the Zynq 7010/20.

As stated in the initial post, I have a MicroZed and a Zybo Z7-20.  My DE10-Nano did not arrive today as I had hoped.  It *should* have been in today's mail.  But I need something comparable to those from a major vendor with a free tool chain such as the Quartus Lite edition.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: Daixiwen on July 31, 2018, 08:17:07 am
You need a DS-5 license to develop on ARM with the Altera/Intel platforms. I think you can get away without one if you use embedded Linux on the SoC, but I'm not sure.
The SmartFusion2 uses a much less powerful ARM core (Cortex M3, as previously stated) so it's not really a replacement for the Zynq. Also the license is only free for very small FPGAs (25K LEs).
AFAIK there aren't other producers of FPGAs with hard ARM cores.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: AndyC_772 on July 31, 2018, 08:26:46 am
it is much easier to get the vendor to take ownership of a bug in their development tools if you can tell them that the code works on two other systems.

I'm not sure I'd agree with that assumption. Far more likely, IMHO, is that a vendor will take ownership of a bug in their tools if the organisation reporting the bug places orders with a value in excess of $1M / year.

If you're developing some FOSS then that's great, but please, don't invest the time and effort developing three FPGAs. If you have three FPGAs' worth of resources, then spend them instead developing one FPGA, documenting it thoroughly, and providing responsive and capable support to anyone who wants to make use of it.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: rachaelp on July 31, 2018, 08:41:04 am
I'm looking for something from a 3rd vendor with their own tool chain.  I didn't find anything from Lattice which surprised me.  I need something on a par with the Zynq 7010/20.

How about Microsemi (formerly Actel) FPGAs? Their SmartFusion / SmartFusion2 SoC devices have ARM Cortex-M3s and various other peripherals. Their Libero toolchain is available in a free version; I think the main difference between the free and paid versions relates to the ModelSim version and the available Synopsys synthesis tools.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: andersm on July 31, 2018, 10:29:19 am
You need a DS-5 license to develop on ARM with the Altera/Intel platforms.
You need a DS-5 license to use DS-5. It is not a requirement. User-level Linux application debugging is free with DS-5. I use the Cyclone V SoCFPGA at work, and we develop the Linux software using Yocto and its SDK. Those who like using IDEs use the SDK with either vanilla Eclipse, or CLion from JetBrains.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: andersm on July 31, 2018, 10:33:50 am
I'm looking for something from a 3rd vendor with their own tool chain.
There are not many FPGA vendors. There are even fewer in the high-end market segment that would embed an applications processor.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: rhb on July 31, 2018, 01:32:31 pm
it is much easier to get the vendor to take ownership of a bug in their development tools if you can tell them that the code works on two other systems.

I'm not sure I'd agree with that assumption. Far more likely, IMHO, is that a vendor will take ownership of a bug in their tools if the organisation reporting the bug places orders with a value in excess of $1M / year.

If you're developing some FOSS then that's great, but please, don't invest the time and effort developing three FPGAs. If you have three FPGAs' worth of resources, then spend them instead developing one FPGA, documenting it thoroughly, and providing responsive and capable support to anyone who wants to make use of it.

I'm going to be writing Verilog code.  I want it to be as portable as possible.  I know from personal experience that testing code on multiple platforms *as you write it*  identifies issues with the compilers and the language standards.  Sequential ports *are a lot more work*.  If Verilog is not portable across vendors I want to know it at the start, not a year later when an OEM chooses a different chip for their new design. 

In this case, I also need to be aware of hardware variations in the same way you have to if you're working on the CPU intensive portion of a seismic processing algorithm where *everything* matters down to the order and stride of your array accesses.

My goal is to develop FOSS IP for T&M gear so that I'm not dependent upon the OEM to fix things or add features.  I want it to be usable on anything put on the market.  There have been numerous rather pointless threads speculating about FOSS FW for DSOs.  Eventually I came to realize the problem was most people did not understand complex DSP flows, so it looked much harder than it is.

I spent many years writing and maintaining seismic processing codes.   For me *everything* a DSO can do is trivial DSP 101 stuff.  I've worked on 2 commercial seismic processing systems and a pair of closely related academic systems.  There are really only two ways to go about it, either in single trace or multiple trace chunks.  Each has to do the same things to the data so each has some fiddle required to handle the other option.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: xaxaxa on July 31, 2018, 01:55:37 pm

I'm going to be writing Verilog code.  I want it to be as portable as possible.  I know from personal experience that testing code on multiple platforms *as you write it*  identifies issues with the compilers and the language standards.  Sequential ports *are a lot more work*.  If Verilog is not portable across vendors I want to know it at the start, not a year later when an OEM chooses a different chip for their new design. 


Not sure about verilog but VHDL is completely portable between vendors; I started out using altera FPGAs, and had written a large library of vhdl modules targeting altera only with no consideration for portability. Later when I switched to xilinx I found that all my old code just worked.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: nctnico on July 31, 2018, 02:32:42 pm

I'm going to be writing Verilog code.  I want it to be as portable as possible.  I know from personal experience that testing code on multiple platforms *as you write it*  identifies issues with the compilers and the language standards.  Sequential ports *are a lot more work*.  If Verilog is not portable across vendors I want to know it at the start, not a year later when an OEM chooses a different chip for their new design. 
Not sure about verilog but VHDL is completely portable between vendors; I started out using altera FPGAs, and had written a large library of vhdl modules targeting altera only with no consideration for portability. Later when I switched to xilinx I found that all my old code just worked.
I agree. Just start with Xilinx and go from there. Don't instantiate basic building blocks directly; infer them instead (for example use an array as memory, with the added bonus that the synthesizer will choose the best type of memory). IMHO VHDL also offers more flexibility to make designs scalable/portable compared to Verilog.
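For example, a block RAM can usually be inferred from nothing more than an array and a clocked process. A minimal VHDL sketch (entity name, widths and depth are just placeholders):

Code: [Select]
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity inferred_ram is
      port (
        clk  : in  std_logic;
        we   : in  std_logic;
        addr : in  unsigned(9 downto 0);
        din  : in  std_logic_vector(15 downto 0);
        dout : out std_logic_vector(15 downto 0));
    end entity;

    architecture rtl of inferred_ram is
      -- The synthesizer recognises the array plus clocked process and picks
      -- block RAM or distributed RAM on its own; nothing vendor specific here.
      type ram_t is array (0 to 1023) of std_logic_vector(15 downto 0);
      signal ram : ram_t;
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          if we = '1' then
            ram(to_integer(addr)) <= din;
          end if;
          dout <= ram(to_integer(addr));  -- synchronous read
        end if;
      end process;
    end architecture;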
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: ehughes on July 31, 2018, 03:42:40 pm
Quote
If Verilog is not portable across vendors I want to know it at the start, not a year later when an OEM chooses a different chip for their new design. 

Given that the ARM cores themselves are written, tested and implemented in Verilog (and SystemVerilog),   it can be very portable.

You *really* should focus on one FPGA and supporting it well.   If this is your first go at a high end FPGA, you really need to think about your development strategy.    Trying to target several FPGA platforms is going to lead to disappointment.    Just look at other FOSS FPGA platforms (Red Pitaya): they have a hard enough time supporting *one* platform.

While it is true you can write Verilog that can synthesize under different toolchains, there are lots of features in the different vendors' parts that cannot be inferred with generic HDL.

My experience over many FPGA projects (especially on the high end) is that purely generic HDL can be portable but is also not optimal.     I completed an FPGA project doing image processing for a space application.     We were able to get a 5x improvement in processing efficiency by *thinking* about how the algorithm would map to the available resources and directly instantiating hardware in the FPGA.  Generic HDL can certainly get things to work but falls apart quickly when you are trying to push the speeds and density.
 
Quote
I'm starting to develop FOSS  Verilog for DSOs and it would be *very* helpful to have a 3rd platform as it is much easier to get the vendor to take ownership of a bug in their development tools if you can tell them that the code works on two other systems.

There are only 2 players in the high end space: Xilinx and Altera.     They are not going to care about your FOSS project or about fixing bugs for it.    Until you get to 6 or 7 figures in a purchase order, you have no leverage.   That is just a reality.


Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: nctnico on July 31, 2018, 04:15:40 pm
Quote
If Verilog is not portable across vendors I want to know it at the start, not a year later when an OEM chooses a different chip for their new design. 
Given that the ARM cores themselves are written, tested and implemented in Verilog (and SystemVerilog),   it can be very portable.
You are forgetting that ARM cores are implemented in silicon (chips) and that is a whole different ball game compared to dealing with FPGAs. Apples and oranges.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: rhb on July 31, 2018, 04:38:11 pm
If there are only two, they will just say theirs is right and the other's is wrong. 

I discovered the benefit of testing simultaneously on multiple platforms during a 500,000 line port from VMS to Unix.  That code had fewer than a dozen bugs reported in the first release and it went down from there.  Because we built test cases and ran them on multiple platforms as we wrote the code, we quickly learned what things to avoid.  That code was in service for 12-16 years and completely unsupported for 4-6 years.  They only pulled the plug when it simply became obsolete.

In seismic processing a major operation involves summing 10**5 to 10**6 samples into *each* of 10**13 to 10**15 samples for the *simplest and cheapest* method.  Doing this takes 10**4 or more cores running for 7-10 days.  The current state of the art algorithms are an order of magnitude more CPU intensive.

A friend of mine spent over $250K porting a state of the art code to both FPGAs and GPUs.  There was not sufficient performance improvement to justify the cost of completing either port and deploying it.  That's how well tuned to the Intel architecture the existing code is.

When I wrote code 20 years ago for the simplest algorithm I chose a DEC Alpha for the floating point performance.  The inner loop ran 10% faster if I used explicit temporary variables and used a stride of 2 for the loop.  I read Alpha documentation for weeks for that project. That code was not used by the Intel version.

When developing such code you take into consideration instruction issue, pipeline latencies, cache organization, etc.  I have 4 editions of "Computer Architecture: A Quantitative Approach" by Hennessy and Patterson.  I've read the first 3 cover to cover.  I've not read the 4th because I've not had an HPC code to work on since I bought it.  So all I did was a quick skim to look for anything new.

I would not even consider developing FPGA code that did not take into account the particulars of the target hardware.  That will get isolated by #ifdef and the code run through the C preprocessor.  But a lot of it will be generic.  This is where profiling execution comes into play.  You only optimize the stuff that matters.

I looked at both Verilog and VHDL.  I like the syntax of Verilog much better.

Anyway, thanks for confirming that I hadn't missed another chip vendor.  I'll just have to live with two instead of three.  And hopefully my DE10-Nano will arrive today.

Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: nctnico on July 31, 2018, 04:47:24 pm
If there are only two, they will just say theirs is right and the other's is wrong. 
I don't get why you are worried about this at all. These tools have been on the market for decades with hundreds of thousands of users. If there is a problem you are not the first to run into it and the answer can always be found through Google.
Quote
I would not even consider developing FPGA code that did not take into account the particulars of the target hardware.
That is the wrong way of going about it. Just as a C compiler optimises for the particulars of the platform, a synthesizer does exactly the same. It doesn't hurt to read the synthesis manual to know how certain low-level building blocks are instantiated, but try to avoid instantiating the low-level building blocks directly. You can declare an array in VHDL in one line. Instantiating a low-level memory block takes much more work AND it may not even be as efficient as you might think.
Quote
I looked at both Verilog and VHDL.  I like the syntax of Verilog much better.
VHDL has all the good stuff like records (structs) and strong typing. Using records alone to concatenate related signals into one saves a huge amount of typing.
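A sketch of what I mean (names are made up, not from a real design):

Code: [Select]
    library ieee;
    use ieee.std_logic_1164.all;

    package bus_pkg is
      -- One record bundles the related signals of a (hypothetical) write bus.
      type wr_bus_t is record
        valid : std_logic;
        addr  : std_logic_vector(15 downto 0);
        data  : std_logic_vector(31 downto 0);
      end record;
    end package;

    -- An entity then needs a single port of the record type:
    --   port ( clk : in std_logic; wr : in wr_bus_t );
    -- and routing the bus through the hierarchy is one signal instead of three.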

Looking at an FPGA design like a functional problem works better in a higher level language. Now people will come and chime in saying that they can get maximum speed / minimum size from yadda yadda yadda but the fact is that just like C/C++ programs 99% of an FPGA design isn't speed or resource sensitive. The most precious thing is development time. You don't want to write an application like a modern full featured web browser (like Firefox) in assembler.

By the way: I'm missing simulation in your requirements.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: Siwastaja on July 31, 2018, 06:07:11 pm
AFAIK VHDL also provides better functionality to write test benches (signal stimulus / verification) much more easily compared to Verilog. I only have experience in VHDL, but at least there it's almost too easy to create a behavioral test bench using the non-synthesizable constructs.

VHDL has its own stupid verbose things like the requirement to write super_long_(type_casts(everywhere))), and for example, the inability to index such a type cast, leading to a three-liner including a temporary variable to do a trivial thing like comparing a single bit.

... but all languages have some nuisance features like this.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: nctnico on July 31, 2018, 06:22:31 pm
VHDL has its own stupid verbose things like the requirement to write super_long_(type_casts(everywhere))), and for example, the inability to index such a type cast, leading to a three-liner including a temporary variable to do a trivial thing like comparing a single bit.
That is only the case if you declare every multi-bit vector which is actually a number as std_logic_vector. Use the numeric library and use a numeric type for every signal which represents a number. That saves a whole lot of typing.
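A small sketch of the difference (names arbitrary):

Code: [Select]
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity counter is
      port (
        clk  : in  std_logic;
        tick : out std_logic);
    end entity;

    architecture rtl of counter is
      -- Declared as unsigned, so the arithmetic and the bit indexing below
      -- need no casts; with std_logic_vector you would be writing
      -- std_logic_vector(unsigned(count) + 1) instead.
      signal count : unsigned(7 downto 0) := (others => '0');
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          count <= count + 1;
        end if;
      end process;
      tick <= count(7);  -- a single bit, directly indexable
    end architecture;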
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: ehughes on July 31, 2018, 06:22:42 pm
Quote
You are forgetting that ARM cores are implemented in silicon (chips) and that is a whole different ball game compared to dealing with FPGAs. Apples and oranges.

ARM uses several different FPGA platforms for validation.     Several semiconductor vendors (e.g. NXP, TI) do the first verification of their chips on an FPGA.    The code is very portable.  Even new RISC-V cores are available in Verilog for FPGA implementation.

Both HDL's can achieve the same goal.  Much of it comes down to personal preference.


Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: asmi on July 31, 2018, 06:31:38 pm
AFAIK VHDL also provides better functionality to write test benches (signal stimulus / verification) much more easily compared to Verilog. I only have experience in VHDL, but at least there it's almost too easy to create a behavioral test bench using the non-synthesizable constructs.
That is only true if you still live in the past century. Because in this century SystemVerilog is light years ahead when it comes to verification. VHDL is a stone age tool in comparison.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: nctnico on July 31, 2018, 06:41:58 pm
AFAIK VHDL also provides better functionality to write test benches (signal stimulus / verification) much more easily compared to Verilog. I only have experience in VHDL, but at least there it's almost too easy to create a behavioral test bench using the non-synthesizable constructs.
That is only true if you still live in the past century. Because in this century SystemVerilog is light years ahead when it comes to verification. VHDL is a stone age tool in comparison.
Still.. how well is SystemVerilog supported while Xilinx Vivado has trouble with some VHDL constructs? To me it also seems like Verilog with a whole bunch of stuff bolted onto it. Keeping the old problems AND adding new ones.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: NorthGuy on July 31, 2018, 06:46:55 pm
Not sure about verilog but VHDL is completely portable between vendors; I started out using altera FPGAs, and had written a large library of vhdl modules targeting altera only with no consideration for portability. Later when I switched to xilinx I found that all my old code just worked.

It depends on what you're doing. If you just write general HDL (either VHDL or Verilog), it is very portable. However, if you work on something very fast, or IO related (involving clocking schemes, SERDES, calibration etc.), or use something vendor-specific (such as interfacing PC through built-in JTAG), I am not sure it is ever possible to make the code portable.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: Siwastaja on July 31, 2018, 07:14:27 pm
VHDL has its own stupid verbose things like the requirement to write super_long_(type_casts(everywhere))), and for example, the inability to index such a type cast, leading to a three-liner including a temporary variable to do a trivial thing like comparing a single bit.
That is only the case if you declare every multi-bit vector which is actually a number as std_logic_vector. Use the numeric library and use a numeric type for every signal which represents a number. That saves a whole lot of typing.

That's exactly what I did. The problem I faced was a style policy forbidding the use of anything other than std_logic and std_logic_vector in entity ports.

My solution to that, after some nagging, was finally to simply disregard the policy. Others started to do the same so it worked well. Fight the power  :box:

You still can't completely avoid these casts, and the totally illogical naming of these library features - some have a to_ prefix, others do not, for example - is daunting for beginners, as I saw while giving classes.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: Siwastaja on July 31, 2018, 07:21:27 pm
AFAIK VHDL also provides better functionality to write test benches (signal stimulus / verification) much more easily compared to Verilog. I only have experience in VHDL, but at least there it's almost too easy to create a behavioral test bench using the non-synthesizable constructs.
That is only true if you still live in the past century. Because in this century SystemVerilog is light years ahead when it comes to verification. VHDL is a stone age tool in comparison.

Regarding FPGA, digital ASIC and design capture/verification, I've been under a rock for... about 6 years? ... but talk of the "past century" is definitely not true. When I last looked at this, around 2010 IIRC, SystemVerilog wasn't very widely used (if at all) in the real world yet, but it was touted as the next big thing. We did a lot of academic work (mostly useless papers and such) around SystemVerilog and SystemC anyway, but saw little practical use. I guess the game has changed now?

Still, I find the simplicity of building a 10-line VHDL testbench using the single unified design language appealing, as long as the unit size is small. For complex systems, this of course won't scale up well for complete system-level simulation.

But VHDL is a surprisingly capable language with surprisingly little "feature bloat". It's a reasonable task to learn all the VHDL features available, and most will be very useful (and offer more than Verilog). Syntax is a bit verbose though, especially if you come from C background.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: asmi on July 31, 2018, 07:56:55 pm
Still.. how well is SystemVerilog supported while Xilinx Vivado has trouble with some VHDL constructs?
Pretty good. See the latest UG900 and UG901 for details - most features are supported. It greatly improved in the last few releases once Xilinx started shipping their own IPs entirely developed in SV.
To me it also seems like Verilog with a whole bunch of stuff bolted onto it. Keeping the old problems AND adding new ones.
That just tells me that you didn't bother learning anything about it, while still having "an opinion" :palm:
SV fixed all the major annoyances of "classic" Verilog (like that reg/wire business and the lack of support for enums), plus it added a whole bunch of new features, some of which are real game-changers (like support for interfaces and ports in synthesizable code; this makes developing modules with AXI links so much easier, as you no longer have to copy-paste the million signals that belong to an AXI bus). The only real bummer is the lack of IP integrator support, but it only requires that the top level module is written in Verilog/VHDL, while all internal modules can be in SV.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: Bassman59 on August 01, 2018, 12:48:15 am
If there are only two, they will just say theirs is right and the other's is wrong. 

I discovered the benefit of testing simultaneously on multiple platforms during a 500,000 line port from VMS to Unix.  That code had fewer than a dozen bugs reported in the first release and it went down from there.  Because we built test cases and ran them on multiple platforms as we wrote the code, we quickly learned what things to avoid.  That code was in service for 12-16 years and completely unsupported for 4-6 years.  They only pulled the plug when it simply became obsolete.

I wish it was possible to write generic VHDL (or Verilog, I prefer and use the former daily) that would let me be vendor-agnostic. But the reality is that you can't.  And believe me, I've tried, even going so far as to use VHDL configurations and generates to swap out vendor-specific blocks. You end up with spaghetti, the kind that's been sitting in the drainer for a few hours because your wife made dinner and you were still at work, and now it's a blob of paste in the sink.

The good news is that inferring standard things like RAMs and ROMs is portable.

Things as simple as input DDR blocks aren't the same from vendor to vendor. Some families have input and output serializers, and there are all sorts of specific clocking requirements that make porting difficult. Clock resources are all over the place: some have PLLs, some have DLLs, some have both, some have just delay elements. Some families have input delays on all pins, some only on clock pins.

Hard blocks which require instantiation are not portable. Altera's gigabit serializers don't work the same way as Xilinx's. Lattice has user-accessible flash in the MachXO parts, Xilinx has such in the Spartan-3AN, with completely different access mechanisms, so don't pretend you can abstract that. Memory interfaces (DDR3 and such) are all different, with wizards for configuration and setting the zillion parameters each one seems to have. And then there is the interface to the interface. What is provided? Wishbone? AXI? PLB? Something else?

Even the simple stuff isn't portable. Here's an example.

I spent years doing Xilinx designs, and in the Xilinx world, you can initialize your flip-flops as such:

Code: [Select]
    signal foo : std_logic_vector(7 downto 0) := X"AB";
What this does is immediately after configuration completes, the eight flip-flops that form the vector foo are preset with the value AB. This means, among other things, that an explicit reset is not necessary, as that's done as part of the configuration process (which happens at power-up or whenever otherwise forced). Certainly, a logic reset can be used as necessary, which leads to ...

A second thing about Xilinx is that they tell you that if you really need a (global) reset, you should always use synchronous resets, never asynchronous resets. The reason? The reset net's prop delay is "excessive" and to make sure that all flip-flops come out of reset at the same time, you should use the synchronous reset. The sync reset is synchronous to the clock and the timing analyzer knows how to properly determine whether the routing for it meets timing. (It's basically flip-flop to flip-flop like any other synchronous path.) And the good news is that the flip-flops can be configured so their reset inputs are synchronous or asynchronous. The synthesis tool does this automatically, and it doesn't use any extra resources. That is, the flop's D and CE inputs aren't involved at all with reset.
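In code that's something like the following (a sketch reusing the foo example above; rst and next_foo are made-up names):

Code: [Select]
    process (clk)
    begin
      if rising_edge(clk) then
        if rst = '1' then          -- synchronous reset, per the Xilinx advice
          foo <= X"AB";
        else
          foo <= next_foo;
        end if;
      end if;
    end process;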

We started to use Microsemi FPGAs, the ProASIC-3E parts in particular.

ProASIC-3 Lesson 1. The VHDL initializers (to set or reset flip-flops at startup) are ignored; the fabric has no way to implement them. So you must reset all flip-flops.

Lesson 2. An external power-on or other explicit reset is required, as the states of each flip-flop at power-up are unknown, because there is no initialization from configuration memory and there is no GSR.

Lesson 3. The flip-flops support an asynchronous reset or preset only. They do not support a synchronous reset. To implement the synchronous reset, the synthesizer builds a mux with one input at the reset (or preset) value, selected by the reset signal, and that's combined with all of the other logic that drives the D and CE inputs.  This makes your resource use explode. Yes, a lot of logic uses what appear to be synchronous resets, say, counter clears and suchlike, but that's not global to every flip-flop in the design, and that's usually coded in addition to the global sync reset.

Lesson 4. The fabric doesn't require you to use a special reset input pin. Pick any pin that is convenient. But it is smart enough to recognize a reset as a large fan-out signal and it will put it on a low-skew global net. These nets are commonly used for clocks, but (very much unlike modern Xilinx parts) are accessible from the fabric, so any signal can drive them and they can connect to any logic-block input, not just clocks on flip-flops and RAMs. Because the reset is now on a low-skew net that can be driven by logic, you can easily synchronize it to your clock and then distribute it in a low-skew fashion to the entire design.
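A minimal sketch of that kind of synchronized reset (assuming an active-low external reset pin; names are arbitrary). It asserts asynchronously and releases synchronously, two clocks after the pin does:

Code: [Select]
    library ieee;
    use ieee.std_logic_1164.all;

    entity reset_sync is
      port (
        clk       : in  std_logic;
        ext_rst_n : in  std_logic;   -- async reset from whatever pin is convenient
        rst       : out std_logic);  -- asserted asynchronously, released on clk
    end entity;

    architecture rtl of reset_sync is
      signal sync : std_logic_vector(1 downto 0);
    begin
      process (clk, ext_rst_n)
      begin
        if ext_rst_n = '0' then
          sync <= (others => '1');
        elsif rising_edge(clk) then
          sync <= sync(0) & '0';
        end if;
      end process;
      rst <= sync(1);  -- this is the signal that ends up on the low-skew global net
    end architecture;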

Lesson 5. Because the low-skew high-fanout global nets are available for general logic use and not just for clocks, the synthesis tool may detect that some signal or other has high fan-out and would benefit from being on a global net. That would seem to be a good thing, yes? For example, a design I'm finishing up now has a large mux that takes sixteen 16-bit data buses (from block RAMs) and muxes them into one 16-bit bus. The synthesis tool detected that the upper bit of the mux select had a high fan-out and put it on a global net. And it failed to meet timing, and by a ridiculous margin (something like 2.5 ns on a 100 MHz clock). (Wide muxes in the ProASIC-3 fabric are particularly ugly.) I looked at the timing analyzer to see why, and it showed the path from the counter that generated the mux select to one of the mux-output registers, and there was an oddball 6 ns (!) delay on one particular part of the path. It turns out that it put a mux-select line on the global net, and to get to the global buffer (which is on the edge of the chip) required a long route. Once the signal was on the global net the delay was short, but it was the route to the buffer that killed it. I had to greatly increase the fan-out limit in the synthesis tool so that it wouldn't do that.



So yeah, it would be great if it was reasonable to "write once, synthesize everywhere," but in practice, that isn't possible.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: asmi on August 01, 2018, 01:00:42 am
So yeah, it would be great if it was reasonable to "write once, synthesize everywhere," but in practice, that isn't possible.
I agree with everything above, but in addition to that there is an elephant in the room - a DSO application will most certainly require using DSP tiles, and those are among the least portable blocks even across different FPGA families from a single vendor, let alone between vendors.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: rhb on August 01, 2018, 01:02:57 am
This has wandered a *long* way from my original question.  My question has been answered, but I'd like to make a few comments.

The fact that thousands of projects have been completed using Vivado or Quartus in no way demonstrates that either is compliant with the Verilog standard or that in the cases where the standard is "implementation defined" they do the same thing.  Those are my concerns.  It is only a matter of time before hard ARM cores appear in other vendors' lines.  It's the obvious thing to do.  It reduces latencies in the PS-PL interface.

Most of the time performance is not an issue (Amdahl's law), but when it is, you have to understand what the hardware is doing at the wire and gate level to get it right.  And the rules change over time as the technology changes.

I did a port of 500,000 lines of VAX FORTRAN code from VMS to 6 flavors of  Unix (Sun, IBM, HP, SGI, DEC and Intergraph) in the early 90's.  It had conditionals for byte sex and FORTRAN record length.  That project taught me the value of using multiple systems during initial development.

I had a problem with a piece of code on the Sun.  I contacted Sun and got an, "It must be a problem with your code."   But when I said, "It works just fine on the IBM and the HP," I got an, "Oh, I see what you mean."  I had a solution the next day in the form of an obscure compiler flag.

Anyone who thinks that all compilers produce the same result hasn't used more than one compiler. Or has not looked at the program results closely.  I don't do UIs.  I do numerical codes and it is far more complex than most imagine when you have a few terabytes of data and a few million petaflops to perform. You *really* get intimate with whatever hardware and development software you are using.

If you are aware of what is not portable you can avoid doing it if it's not performance critical and if it is, you can isolate it with a #ifdef.  But you need to know that after you've written 20 lines of code, not after writing 20,000 lines.  Portability is the product of the attitude, discipline and skill of the programmer.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: Siwastaja on August 01, 2018, 07:13:54 am
As a relevant side note, FPGA development is horrible. Especially for someone with pedantic software background, especially in well-maintained (open source or not) projects, you'll actually feel dirty and want to puke.

This may be a slight exaggeration, but it's basically a duopoly of two giants, with price fixing limiting the rate of technological advancement. You work by their rules, using their black boxes, and the boxes are much blacker than anything you see in the software world. After paying $$$ for the devices, you pay $$$$$$ for a license to use their bloated piece of shit compilers, which are, admittedly, advanced enough inside that they can't easily be replaced. Because there is no real competition, you need to accept what you get, and as a small player it's hard to get support. When you accept this reality, you can do quite well. After all, these design flows do work. I have worked with them no problem; they are just highly suboptimal and feel dirty to anyone used to a more scientific or engineering way of thinking. But accepting this, and having the other aspects of the project done in a more sustainable way and controlled by you, you can cope with it.

What they basically tell you between the lines is: our FPGAs are a replacement for your 1-year, $10M ASIC development cycle. It doesn't matter if it costs you $100,000 and 1 month of design time to do something utterly trivial - it's still 10x better than the alternative!

They are not interested in making FPGAs a more widespread thing - something that the world was expecting. I remember everybody saying "FPGAs are coming everywhere" a decade ago. Now that talk has all but stopped. FPGA vendors run a high-profit niche business that's clearly large enough, and when run as a price-fixed duopoly it works well for them. The niche is large enough to offer a very steady flow of profit. Trying to step outside of this realm would be a huge risk.

Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: rhb on August 01, 2018, 12:10:52 pm
I've maintained several million lines of software, most of it written by scientists who never bothered to learn how to program.   I've also supported software for which the company paid annual maintenance fees in the $100k range.  I was *very* thankful I did not have to use the software. 

In one case for which the company paid $80k a year for support, after a week or so of back and forth, the support person said, "Well, if you get it working please send me the fixes so I can give them to the other customers."  They had to scrape me off the ceiling with a putty knife.

So I'm pretty familiar with the general problems I'm facing.  The nice thing about having 3 platforms is if it works on two and fails on one, it's their fault.  If it fails on two and works on one, it's your fault.  Time to read the language standard more closely.

Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: nctnico on August 01, 2018, 12:16:25 pm
If it fails on two and works on one, it's your fault.  Time to read the language standard more closely.
Trust me: they won't care at all. If you are going to develop on 3 platforms in parallel then you are wasting your time. FPGA software is a balancing act between vendor lock-in and allowing customers to use their existing code without a major rewrite.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: ehughes on August 01, 2018, 12:57:31 pm
Quote
I've maintained several million lines of software, most of it written by scientists who never bothered to learn how to program.   I've also supported software for which the company paid annual maintenance fees in the $100k range.  I was *very* thankful I did not have to use the software.

A large ego isn't going to help you.     FPGA (and ASIC) workflows are quite a bit different from *NIX software workflows.    I think what the other people here are trying to communicate is that you are approaching this problem without ever having written a line of HDL for either synthesis or simulation.

You can use the verification tools in a manner closer to how you would approach a generic software problem.      Synthesis targets are completely different.

Here is the one piece I think you are missing:

Both of the major languages were developed with simulation and documentation in mind, NOT synthesis.    The synthesis constructs were added later.     There are some notes in the current versions of the standards regarding synthesis, but starting with the mindset that writing in pure Verilog is going to give you an ultra portable code base that will work equally well across every FPGA is naive at best.        There are people on this forum who do FPGA development for mission-critical systems for a living.    You can ignore their advice but you are going to be very frustrated when the rubber hits the road in your project.

Here is another piece that you are missing:   Altera, Xilinx, et al. have no intention of perfectly implementing the language standards.   Abiding by the language standard means little for synthesis, as their tools were never intended to be generic synthesis tools.

You are also missing a huge piece of the flow: constraints management.  This is something you will not find anywhere in the language specs.  Large projects almost always require significant time in constraints planning to guide place & route, control clock routes, etc.    This component of the flow is 100% vendor specific and can change significantly even within the same vendor from family to family.   In many cases, it is the *only* way to get specific behaviors.

Unlike writing C,  The *majority* of code and support files for an FPGA is vendor specific.      By the time you come up with a build system that can handle every corner case,    you will have 95% spaghetti and 5% sauce.  There is literally no valid use case for doing this other than to burn time.

There have been some EDA companies (Altium) that have attempted to do what you are trying to do.    They all spent millions and failed because of one simple fact:    most users of FPGAs *don't care* about supporting every chip.   The hardware only has to work for a specific use case.   No sane design team with a set of requirements shifts between vendors because they feel like it.   Very few teams go halfway through a project and decide to use Altera instead of Xilinx.  This happens so rarely that you would be taken out back and shot for considering it.

Both of the major vendors *still sell* products from 25 years ago.    All of this work may stroke an ego but you may find users don't really care.

Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: nctnico on August 01, 2018, 01:47:47 pm
Unlike writing C,  The *majority* of code and support files for an FPGA is vendor specific.      By the time you come up with a build system that can handle every corner case,    you will have 95% spaghetti and 5% sauce.  There is literally no valid use case for doing this other than to burn time.
A while ago I used a large open hardware project which uses HDLmake to generate a Makefile to run the synthesis and P&R process. It can target several vendors. It is not perfect but it does help to make the open hardware project synthesize for Xilinx and Altera without needing to mess around with project files.

Still it doesn't solve the timing constraints which are a very important part of any FPGA design indeed.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: Siwastaja on August 01, 2018, 02:31:11 pm
A medium-complexity FPGA project can be 10000 lines of VHDL and another 10000 lines of proprietary Quartus configuration files for all the constraints. Then you get 1000 warnings every time you compile. And you compile for days. They don't care. They know developing an ASIC takes a year, so a full compile in a day is 365 times faster. That's what FPGAs still get compared to.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: rhb on August 01, 2018, 08:54:04 pm
Well, my DE10-Nano finally arrived.  So I shall see for myself.  While it may be well meant, "Don't try it" seems not very useful advice in the context of a hobby project by someone with my peculiar background.  It's not as if failure matters.  The set of design tasks for a time sampling based T&M instrument is not very large or complex.  It's a minuscule subset of FPGA applications.

From the comments it seems that there is a need for a common  constraint language.

Developing and testing simultaneously on two systems may not be any benefit.  But it certainly doesn't hurt to try it.  I think generally people have missed the point, I want to know when the vendor is not adhering to the language standard.  Comparing the result of synthesizing the same Verilog on two systems is the best way to find where that is happening.

As noted previously,  my question was answered.  So I'll leave others to argue about the wisdom of testing on multiple targets.  I'd rather see what actually happens.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: nctnico on August 01, 2018, 10:01:04 pm
Well, my DE10-Nano finally arrived.  So I shall see for myself.  While it may be well meant, "Don't try it" seems not very useful advice in the context of a hobby project by someone with my peculiar background.  It's not as if failure matters.  The set of design tasks for a time sampling based T&M instrument is not very large or complex.  It's a minuscule subset of FPGA applications.
I wouldn't underestimate the amount of work. Sampling is the easy part but reconstruction and overlaying multiple acquisitions (trigger point interpolation) on top of each other isn't. Not by a long shot.

About developing on two systems: your time is better spent using a simulator as a reference instead of a different FPGA. Using the simulator you can verify your design and then check against what the FPGA does. One of the problems you'll encounter with an FPGA is that it is very hard to debug the internal signals. I usually implement a debug bus (16 lines or so) which allows me to bring various internal signals to the outside, which then go into a logic analyser.
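In its simplest form the debug bus is just a mux in front of the spare pins, something like this sketch (signal names are made up):

Code: [Select]
    library ieee;
    use ieee.std_logic_1164.all;

    entity debug_mux is
      port (
        debug_sel : in  std_logic_vector(1 downto 0);   -- set from a register
        dbg_a     : in  std_logic_vector(15 downto 0);  -- e.g. trigger FSM state
        dbg_b     : in  std_logic_vector(15 downto 0);  -- e.g. FIFO fill level
        dbg_c     : in  std_logic_vector(15 downto 0);  -- e.g. decimator status
        debug_out : out std_logic_vector(15 downto 0)); -- to 16 spare pins / LA
    end entity;

    architecture rtl of debug_mux is
    begin
      with debug_sel select
        debug_out <= dbg_a           when "00",
                     dbg_b           when "01",
                     dbg_c           when "10",
                     (others => '0') when others;
    end architecture;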
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: Daixiwen on August 02, 2018, 08:10:23 am
For a hobby project I wouldn't consider it a waste of time to try the same code on different platforms. You will learn a lot about the tools, and indeed you may run into different problems on each platform, which will teach you different fixes that you would need to make to your code.
For the HDL part itself, the synthesizers have been getting better and better at recognizing HDL code that describes specific hardware modules and implementing them in hardware (multipliers, memory, even dual-port memory with two different clocks). You can write a good part of your code to be vendor independent. There are still parts that have to use vendor specific IPs (PLLs for example, or I/O interface blocks), and when you are using an FPGA with a hard CPU core, the interface between the two will also be specific to each vendor, and sometimes to each FPGA family.
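For example, even a dual-clock memory (the kind you'd put under an asynchronous FIFO) can usually be inferred from plain code along these lines (a sketch; sizes and names are arbitrary):

Code: [Select]
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity dc_ram is
      port (
        wr_clk  : in  std_logic;
        wr_en   : in  std_logic;
        wr_addr : in  unsigned(8 downto 0);
        wr_data : in  std_logic_vector(11 downto 0);
        rd_clk  : in  std_logic;
        rd_addr : in  unsigned(8 downto 0);
        rd_data : out std_logic_vector(11 downto 0));
    end entity;

    architecture rtl of dc_ram is
      type ram_t is array (0 to 511) of std_logic_vector(11 downto 0);
      signal ram : ram_t;
    begin
      process (wr_clk)   -- write port on one clock
      begin
        if rising_edge(wr_clk) then
          if wr_en = '1' then
            ram(to_integer(wr_addr)) <= wr_data;
          end if;
        end if;
      end process;

      process (rd_clk)   -- read port on another clock
      begin
        if rising_edge(rd_clk) then
          rd_data <= ram(to_integer(rd_addr));
        end if;
      end process;
    end architecture;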
For timing constraints there is a kind of industry standard, the Synopsys Design Constraints (SDC) format, but each implementation is different, especially with signal and clock naming, and you can't just take the constraints file from one platform and use it on another. It *might* be possible to try and make the files more portable by putting all the vendor specific stuff at the beginning and using variables, and then putting the actual constraints at the end, but I've never tried something like that.

For a professional project this is totally a waste of time. Just pick one platform and use it, you usually don't even have enough time to finish the project on one platform. If you ever need to change FPGA vendors you need to redo the whole PCB anyway and it's easier to consider it as a new project instead.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: rhb on August 02, 2018, 12:28:46 pm
I wouldn't underestimate the amount of work. Sampling is the easy part but reconstruction and overlaying multiple acquisitions (trigger point interpolation) on top of each other isn't. Not by a long shot.

I'm assuming 1000+ hrs to completion.  The rest I entirely agree with. My goal is portable IP blocks for implementing the functions of a DSO/MSO/MDO/AWG.

One of the problems you'll encounter with an FPGA is that it is very hard to debug the internal signals. I usually implement a debug bus (16 lines or so) which allows me to bring various internal signals to the outside which then go into a logic analyser.

That sounds like an excellent approach.


 You will learn a lot about the tools, and indeed you may run into different problems on each platform, which will teach you different fixes that you would need to make to your code.

That is the point of doing it. I expect to find lots of "features" in the development tool chain.

For a professional project this is totally a waste of time. Just pick one platform and use it, you usually don't even have enough time to finish the project on one platform. If you ever need to change FPGA vendors you need to redo the whole PCB anyway and it's easier to consider it as a new project instead.

Yes, and then it goes on the market for $20K and the users get to do the testing.  And after the warranty has run out they finally have a usable scope.

My experience with the Unix port was the initial port to two systems (Sun was  BSD and Intergraph was Sys V) took 9 months, the 3rd took 4 months as there were a lot of constructs the IBM FORTRAN compiler would not accept (branches into conditional blocks) which had to be corrected.  The HP took 4 weeks. I did the DEC and SGI ports in an idle afternoon.  Because we tested on multiple systems at every compile, the code we wrote did not require changes going to the IBM, HP, etc.  Just the VAX FORTRAN code.  I attribute the very low bug rate on that project to the multiplatform testing.  It was a major lesson for me. It taught me to never rely on the man pages for a system.  I always check the language and POSIX standards first and code to that.  If and only if there is a problem do I read the system man pages.

There are at least two Zynq based DSOs on the market.  I don't know of any Cyclone V based products.  But I expect there will be eventually for the simple reason that a company which has been using Altera devices is *not* going to switch vendors for all the reasons put forth.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: Siwastaja on August 02, 2018, 04:00:12 pm
Developing and testing simultaneously on two systems may not be any benefit.  But it certainly doesn't hurt to try it.  I think generally people have missed the point, I want to know when the vendor is not adhering to the language standard.

You need to understand the difference here:

In C or C++, or Java, or any similar language, there is a standardization committee, the standard is written from the start for the purpose it's used for (computer programming), and thus there are good chances that the compilers at least try to follow the standard. Or when they don't, they often have a reason not to (like the standard totally sucking in some part - like the aliasing rules in the C standard).

In VHDL, I think the standardization body is weak. The language was originally built for a completely different purpose - describing behavioral simulation models; not even register transfer level, and even less logic synthesis. The language is fairly simple, but the actual practical synthesizable constructs are not defined in the standard at all. For example, there are no keywords for defining a register (D flip-flop). There are no keywords for defining an asynchronous reset, or a synchronous reset.

You do it by describing how the reset or clock works. You always actually write a behavioral simulation model for a freakin' flipflop! And there are multiple ways to do this syntactically.
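For example, the canonical boilerplate for one D flip-flop with an asynchronous reset looks roughly like this (a sketch; names are arbitrary):

Code: [Select]
    library ieee;
    use ieee.std_logic_1164.all;

    entity dff_ar is
      port (clk, rst_n, d : in std_logic; q : out std_logic);
    end entity;

    architecture rtl of dff_ar is
    begin
      -- There is no "register" keyword: you describe the behaviour of a D
      -- flip-flop with asynchronous reset, and the synthesizer recognises
      -- the pattern.
      process (clk, rst_n)
      begin
        if rst_n = '0' then
          q <= '0';                 -- the asynchronous reset branch
        elsif rising_edge(clk) then
          q <= d;                   -- the register itself
        end if;
      end process;
    end architecture;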

This is super dumb. It's like C not having an assignment operator.

This is why editors such as Emacs offer code autogeneration, so that they generate the boilerplate required to simulate - and synthesize -  a D flipflop!

With this little standardization around synthesis, you should think about it in this way: the synthesis toolmakers have just figured out: "should we invent a synthesizable hardware description language? What the heck, let's just use this language, trying to interpret the intention of the writer". Now, they (Altera and Xilinx) play with very similar rules, so most of the constructs are very well interchangeable, but there is no strict "official standard" you would refer to, to say who's right and who's wrong.

This is a highly pragmatic situation, rather than an ideal one.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: rhb on August 02, 2018, 05:41:06 pm
I've already downloaded the  IEEE Verilog standard.

In brief, an FPGA is a collection of hard silicon blocks,  an N layer interconnect fabric and a bunch of FET switches controlled by a bit map.  Would you consider that an accurate description?  Have I left anything significant out?  I am not aware of a technology that would allow any other realization and this has profound implications for synthesis.

Routing the interconnect to satisfy constraints is NP hard.  Finding satisfactory solutions is difficult.  And finding optimal solutions is impossible except in special cases such as discussed by David Donoho in some papers he wrote in 2004.  I do not know whether those apply in the case of FPGA synthesis nor do I know what Vivado and Quartus do.  They might be very sophisticated or they might be very lame.  It entirely depends upon the character of the person who wrote the code.  I have seen everything from brilliant to idiotic.

Computing is rarely, if ever, ideal; it's mostly a matter of compromise.  There's nothing stupid about the C aliasing rules.  It's the price tag for pointers.  If you want to avoid that, use FORTRAN instead.  FORTRAN has been as successful as it has in scientific programming precisely because of what constructs John Backus and his team allowed in the language.  Not allowing aliasing lets a FORTRAN compiler do things a C compiler cannot.  TANSTAAFL.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: NorthGuy on August 02, 2018, 05:54:16 pm
In brief, an FPGA is a collection of hard silicon blocks,  an N layer interconnect fabric and a bunch of FET switches controlled by a bit map.  Would you consider that an accurate description?  Have I left anything significant out?

In brief, a PCB is a collection of ICs and discrete elements and a bunch of traces connecting the elements together. Would you consider that an accurate description?  Have I left anything significant out?
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: rhb on August 02, 2018, 07:17:25 pm
In brief, an FPGA is a collection of hard silicon blocks,  an N layer interconnect fabric and a bunch of FET switches controlled by a bit map.  Would you consider that an accurate description?  Have I left anything significant out?

In brief, a PCB is a collection of ICs and discrete elements and a bunch of traces connecting the elements together. Would you consider that an accurate description?  Have I left anything significant out?

You appear not to understand what it means when a problem is NP hard.  Which was the point of that description.  What I wrote implies that synthesis is NP hard.  A PCB is not NP hard.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: Siwastaja on August 02, 2018, 07:38:10 pm
Oh! You can download files! This is a great start!
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: NorthGuy on August 02, 2018, 08:25:17 pm
You appear not to understand what it means when a problem is NP hard.  Which was the point of that description.  What I wrote implies that synthesis is NP hard.  A PCB is not NP hard.

FPGA is like PCB. Except instead of traces you get switches controlled by the configuration bits. The routing tools make connections, and as soon as your constraints are met they're done. There's no searching for an optimum.

However, my point was different. Electronic design is not all about laying PCB traces, and similarly the FPGA design is not about routing.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: rhb on August 03, 2018, 12:45:05 am
ROFL! 

It's *all* about optimization.  It's a classic problem in computer science.  It's *why* FPGAs are hard.  And why the design tools are so large and slow.

Satisfying the constraints is an optimization problem in mathematics and computer science.  It's classic operational research.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: NorthGuy on August 03, 2018, 01:32:19 am
It's *all* about optimization.  It's a classic problem in computer science.  It's *why* FPGAs are hard.  And why the design tools are so large and slow.

FPGAs are not hard. Design tools are slow because they're overbloated.

Satisfying the constraints is an optimization problem in mathematics and computer science.  It's classic operational research.

Optimization is when you try to optimize something - that is find a solution which produces the maximum (or minimum) value of something while satisfying given conditions and constraints. What do you think the FPGA routing optimizes?

Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: BrianHG on August 03, 2018, 02:15:56 am
So yeah, it would be great if it was reasonable to "write once, synthesize everywhere," but in practice, that isn't possible.
I agree with everything above, but in addition to that there is an elephant in the room - a DSO application will most certainly require using DSP tiles, and those are among the least portable blocks even across different FPGA families from a single vendor, let alone between vendors.
:-// What do you mean?
I wrote a complete image scaler and video processor in SystemVerilog in Altera's Quartus 3 years ago.  All the math was written out as simple adds, multiplies and divides in Verilog.  I did not use any DSP tiles, yet, once compiled, Quartus placed all the arithmetic into the DSP blocks on its own.

The one issue I had was with the slower Cyclone implementation: a set of multiply-adds where I needed the Altera megafunction for the multi-cycle-clock feature to get an improved Fmax, since at the time I did not know how to properly implement this in SystemVerilog.

From what I knew at the time, I can agree on floating point, where you have more power calling Altera's IP functions; however, that was 5 years ago when I started and things should have improved since then.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: rhb on August 03, 2018, 03:10:13 am
In the 1940s the Air Force realized that they had serious logistical problems that they urgently needed a better way of handling.  They engaged a mathematician, George Dantzig, to study the problem. Dantzig solved it by developing the simplex method, and his work spawned an entire field called "operations research".  It is used for scheduling aircraft crews, factory production, shipping and many other things.

As an example consider that there are planes which go from A to B and A to C each day and a plane that goes from C to B.  So suppose there is more cargo from A to B than there is capacity on the plane to B.  If you schedule it properly you can ship that cargo from A to C and then from C to B.  But to do this in a timely manner you have to schedule the planes so that the cargo from A arrives at C before the plane leaves for B.  The constraint in this case is the capacity of each plane and the solution is the day's flight schedule.

Dantzig provided a solution with the simplex method.  It is not guaranteed to be optimal, but it's pretty close.  It's good enough.
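
For concreteness, the flight example can be written as a tiny linear program (the symbols are purely illustrative): let x_AB be the cargo flown direct and x_ACB the cargo routed via C, with plane capacities c and total cargo d waiting at A for B.

\[
\begin{aligned}
\max\quad & x_{AB} + x_{ACB} \\
\text{s.t.}\quad & x_{AB} \le c_{AB}, \qquad x_{ACB} \le c_{AC}, \qquad x_{ACB} \le c_{CB}, \\
& x_{AB} + x_{ACB} \le d, \qquad x_{AB},\, x_{ACB} \ge 0
\end{aligned}
\]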

Later computer scientists started looking at such problems and developed a classification.  I don't recall all the details and have no interest in looking them up.  Suffice it to say, if a problem is NP hard and large enough, the sun will burn out before you find the optimal solution even if you use all the computers on the planet.  If a strange problem walked into my office, the first thing I considered once I understood what was wanted was, "Is it NP hard?"  If it was, I needed to negotiate what was "good enough".

In computer science,  FPGA synthesis is what is generally called a "Traveling Salesman" problem.  Given a set of cities, find the shortest route which visits each city only once.

 FPGA synthesis is called a convex optimization problem in mathematics.  Minimize x subject to constraints y.  Anyone working with such problems generally will randomly mix terminology from mathematics and computer science.

At low clock rates, latencies don't matter.  At high clock rates they are critical.  Even the clock distribution is difficult.  Where you place each element of an IP alters the latencies.  So the optimization is where to put the elements of the IP such that the constraints are met.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: Marco on August 03, 2018, 03:39:35 am
Unless you want to do a high update rate wideband spectrum analyzer, I don't see why you need to do much signal processing in the FPGA part. Just make life easy on yourself and do it in the ARM part.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: NorthGuy on August 03, 2018, 04:02:04 am
... Minimize x subject to constraints y ...  So the optimization is where to put the elements of the IP such that the constraints are met.

What is "x"? What do you want the routing tools to minimize?
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: daveshah on August 03, 2018, 07:16:41 am
Critical path delay, among other things
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: Siwastaja on August 03, 2018, 09:07:09 am
Optimization is when you try to optimize something - that is find a solution which produces the maximum (or minimum) value of something while satisfying given conditions and constraints. What do you think the FPGA routing optimizes?

1) The longest delay from any flip-flop output, through the logic and routing matrix, to another flip-flop input, called the critical path delay. This sets the maximum clock frequency, so you obviously want to minimize it.

OTOH, the longest delay on the clock domain defines the clock speed. Once you have optimized the longest delay to be as short as possible, the rest do not matter. So you don't want to over-optimize them, because that would limit the other things to optimize for:

2) Number of logic resources used. You see, by duplicating some logic, you'll be able to shorten the critical path.

3) Placement and routing resource usage. Placement of the LUTs is a highly critical optimization process. If placed suboptimally, you'll soon run out of routing resources.

Actually, if you look at the settings of your synthesis tool, you'll find a shitload of options to adjust this optimization process, for example a slider so you can balance between speed and area (#1 and #2).

So, many metrics to optimize for. Some work against each other, so you need to balance them.

An FPGA is like a super-complex PCB with a very limited number of layers, and tens of thousands of components. It's impossible to route by a human; hence "autorouting" is necessary.

Yes, even though part of the reason for the slow tools is bloat, they still are complex inside, which is part of the reason no open source synthesis tools exist.

All of the complexity is hidden from the designer. The synthesis tools feel sucky, and yes we all hate them, as we always love/hate the EDA tools we use, but actually they are quite some cool shit.

The reason it can take 10 hours for the place & route algorithm to meet timing requirements and fit the design into the available device is not just bloat.

If you had ever used an FPGA, you would know most of this - pretty basic stuff. The compilers tend to be quite verbose as well, and the GUI shows you the optimization results, such as the critical path and design resource usage, in a very explicit way. Your comments clearly show you have no idea about FPGAs whatsoever, so why bother commenting like that?

I agree that FPGA place & route is probably hard enough that finding the truly optimal solution would possibly take years of synthesis time even for a fairly simple design. So it's all about getting close enough in manageable time and software complexity.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: daveshah on August 03, 2018, 09:38:14 am

Yes, even though part of the reason for the slow tools is bloat, they still are complex inside. Which is part of the reason no open source synthesis tools exist.


https://github.com/YosysHQ/yosys (https://github.com/YosysHQ/yosys)
https://github.com/YosysHQ/nextpnr (https://github.com/YosysHQ/nextpnr)
https://github.com/verilog-to-routing/vtr-verilog-to-routing (https://github.com/verilog-to-routing/vtr-verilog-to-routing)

While I accept they are not at the level of complexity of the commercial tools, there are certainly open source FPGA flows out there - and they win on startup time compared to the vendor tools if nothing else. You can go from Verilog to a programmed iCE40 with a small design in about 2 seconds.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: rhb on August 03, 2018, 01:23:35 pm
@Siwastaja gave a good overview of the optimization problem in terms of FPGA synthesis.  My math background lets me look at the physical hardware and recognize that it constitutes a Traveling Salesman variant, at which point I actually know quite a lot without knowing much about the details.  For example, at a certain density of utilization it is very desirable to move to the next larger part in the line.  That's obvious from the mathematics.  But when that point is reached is much harder to determine.

NP hard problems are intractable.  To find the desired answer you may  have to test all of the possible answers.  For large problems this is physically impossible.  However, for smaller problems you might find a near optimal solution after a week or two of computer time.  I would guess that a lot of the "bloat" is a large collection of  such solutions to common customer synthesis blocks which are then used as the starting point for the synthesis of the entire FPGA.  You'll meet the constraints much faster if you start out close to a solution.

The software engineering of Vivado is poor.  That's obvious from the way it is packaged.  Quartus lets you download the pieces you want.  Still large, but less likely to fail.  I had a 17 GB Vivado download fail after some 10-12 hours.  I've got a 3 Mb/s link.  Packaging Vivado for all platforms in a single file is crazy.  That person should be fired for gross incompetence.

Writing code to solve NP hard problems is the most difficult class of programming.  Doing a good job requires a person who spends a good bit of their personal time buying books, reading journal papers and experimenting on their own systems.  I spent at least 4-8 hours each week doing that and several thousand dollars each year.  If the person tasked with doing the work is a 9-5 type, the results will be very poor.  The best programmers are more than 10x better than the average programmer, for the simple reason that they care about the subject and their work.

I'll have to take a look at the FOSS synthesizer.  There was some very important work by Emmanuel Candes and David Donoho in 2004-2006 which proved that optimal solutions to certain L0 (aka NP hard) problems could be found in L1 (simplex method and similar) time.  That's very different from finding a near optimal solution in L1 time, which is current practice.  There are some serious restrictions, but it has also led to some interesting results on regular polytopes in N dimensional space which run much faster than simplex or interior point methods.

Observing Amdahl's law is critical to making good use of an FPGA with embedded hard cores.  You write the code on the ARM, profile it and move the slow parts into the FPGA.  Of course, for a DSO the acquisition has to start in the FPGA.  But once it hits memory there are more options.  I want split screen waterfall and time domain displays among many other things that are not available in typical DSOs.
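
For reference, Amdahl's law in its usual form, where p is the fraction of runtime you manage to move into the fabric and s is the speedup of that fraction:

\[
S_{\text{overall}} = \frac{1}{(1 - p) + \dfrac{p}{s}}
\]

Even an infinitely fast FPGA block only buys you 1/(1-p), which is why the profiling step matters.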
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: NorthGuy on August 03, 2018, 02:30:44 pm
1) The longest delay from any certain flip-flop output, through the logic and routing matrix, to another flip-flop input, called critical path delay. This sets the maximum clock frequency. So you obviously want to minimize it.

This is a constraint, not something to optimize for. If the delay meets the setup/hold requirements of the receiving flip-flop, it works. If the setup/hold requirements are not met, it doesn't work. It doesn't make any sense to try to make the delay shorter if the setup is already met. Worse yet, if you make it too short you risk failing the hold.

2) Number of logic resources used.

This is also a constraint, imposed either by the amount of available logic or by floorplanning. It is to be met, not optimized.

You see, by duplicating some logic, you'll be able to shorten the critical path.

BTW: On a number of occasions, I came across a situation where I had duplicated registers to meet the timing, but the tool "optimized" my design and replaced duplicated registers with one, and then failed timing. Of course, this has nothing to do with mathematical optimization.

3) Placement and routing resource usage. Placement of the LUTs is highly critical optimization process. If unoptimally placed, you'll soon run out of routing resources.

Again, there's no reason to optimize anything. The placement you create either allows satisfactory routing which meets your timing constraints, or it doesn't.

FPGA is like a super-complex PCB with very limited number of layers, and tens of thousands of components. It's impossible to route by a human; hence "autorouting" is necessary.

I'd say an FPGA has much better routing capabilities than a PCB. The tools let you route manually if you wish, but it is just as tedious as routing PCBs. You wouldn't want to do this except in some limited cases.

Yes, even though part of the reason for the slow tools is bloat, they still are complex inside. Which is part of the reason no open source synthesis tools exist.

All of the complexity is hidden from the designer.

We don't actually know what is hidden from the designer. I suspect reverse-engineering the FPGA bitstreams and creating your own tools for place and route would make things much faster. But reverse-engineering is a slow and boring job and no one wants to do it. BTW: There's an open source effort under way:

https://github.com/SymbiFlow/prjxray

but it doesn't seem to move very quickly.

I agree that FPGA place&route is probably hard enough that finding the most optimal solution would possibly take years of synthesis time even for a fairly simple design. So it's all about getting close enough in manageable time and software complexity.

Again, there's no optimal solution. Any solution that meets all of your constraints is just as good as any other. And there are billions of them (many more, actually) for any given design. But you only need one.

Wait a minute. I'll take that back. There is one thing you can optimize for, and this is the compile time, but it doesn't seem to be on vendors' radars.

Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: asmi on August 03, 2018, 03:29:27 pm
Again, there's no optimal solution. Any solution that meets all of your constrains is just as good as any other. And there are billions of them (much more actually) for any given design. But you only need one.
This is not correct. "Optimize area" vs "Optimize performance" is an obvious candidate, and there are reasons to reduce the amount of resources used even if they are available in the chip of your choice - more resources used => more power consumed => more heat generated. This can drive much more than many suspect - like using a physically bigger package as it lets you get away with less heat management, which ironically can make the entire solution smaller and cheaper. Less consumed power can mean longer battery life (or the ability to get away with a smaller capacity battery while still meeting your requirements on battery life), and/or cheaper DC-DC converters with smaller inductors, less powerful PSUs and so on. FPGAs do not exist in isolation, and only the parameters of the entire system are important. Power is actually a big topic - for example an Artix-100T can consume up to about 7 amps of current on its Vccint rail alone! Not many "general purpose" DC-DC converters can deal with that kind of load and maintain good efficiency (less efficiency => more heat => heat management becomes more problematic), while specialized ICs designed to deal with that kind of load generally cost quite a bit more.
Another factor is that achieving timing closure becomes progressively harder as resource utilization goes over some magical number (about 70% in my experience). There were times where even 166 MHz was almost too much to ask for because components were too far apart and net delays were too high, and that in turn was caused by some bad decisions made during PCB development, while a board re-spin was not an option.
There is a reason there are a million settings and parameters for both synthesis and P&R tools, as developing a system with an FPGA is very often about compromises.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: NorthGuy on August 03, 2018, 04:12:26 pm
Again, there's no optimal solution. Any solution that meets all of your constrains is just as good as any other. And there are billions of them (much more actually) for any given design. But you only need one.
This is not correct. "Optimize area" vs "Optimize performance" is an obvious candidate, there are reasons to reduce the amount of resource used even if they are available in the chip of your choice - more resources used => more power consumed => more heat generated.

Power depends on the number of switching events per unit of time. It is not clear whether a design using more area will consume more power compared to a smaller design. An aggressive design on an A15T can draw more power than a lazy design on an A100T. Static analysis cannot estimate power with any reasonable accuracy, so you cannot optimize for power unless you run special simulations. Vivado estimates power consumption on every implementation run. Compare this to real power measurements. The two have nothing in common.

The "Optimize area" and "Optimize performance" settings do not imply that the tools run optimization process trying to minimize area, or maximize performance (whatever this means). It's much simpler. Often, the same thing may be done in a number of ways. Say, if you have a simple 32-bit counter, it may be possible to implemented it using carry logic, or using DSP, or, if you want it really fast, you could do it in general fabric consuming more LUTs. If a similar decision is to be made, the tools may use your settings to select one option or another.

Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: rhb on August 03, 2018, 05:35:34 pm
In mathematics, satisfying constraints is called an optimization problem.  In the case of an NP hard problem, finding the best solution is generally not practical, so you settle for something close to optimal.  The optimal solution might only be a ps less variation in the latency of the individual bits of an adder output.  The tighter your constraints, the more difficult the optimization problem is.

You have a set of constraint equations.  What is commonly minimized is the summed absolute deviation of the solution from the constraints.  This is generally referred to as an L1 solution.  But the mathematics are quite general.  Normal notation is : min <some expression> s.t. <some set of constraint equations>. In the case of FPGA synthesis, one would most likely want to apply weights to the error terms so that the bounds on the high speed portions are tighter than on the low speed portions.  So different weights would be applied to different errors in the minimization expression.
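
In that notation, the weighted summed-absolute-deviation problem would look something like this (a generic sketch, not any vendor's actual cost function):

\[
\min_{x}\ \sum_{i=1}^{m} w_i\, | e_i |
\quad\text{s.t.}\quad a_i^{\mathsf{T}} x + e_i = b_i,\quad i = 1,\dots,m
\]

with larger weights w_i on the timing-critical paths.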

Mathematicians, scientists and engineers have collectively spent millions of hours studying the problem looking for practical solutions.  And continue to do so as NP hard problems are very important in many applications.  The literature on the subject is so vast that no person could  ever read all of it. 

If anyone wants to learn more about the topic, I suggest:

Linear Programming: Foundations and Extensions
Robert J. Vanderbei
3rd ed., Springer, 2008

Vanderbei is a professor at Princeton and writes beautifully.  And the GNU Linear Programming Kit, GLPK, is excellent even if it is not as fast as the $100K/seat commercial packages.  In some 8 years using it and following the mailing list, I cannot think of a single failure that was not due to user error.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: nctnico on August 03, 2018, 07:05:45 pm
In mathematics, satisfying constraints is called an optimization problem. In the case of an NP hard problem, finding the best solution is generally not practical.  So you settle for something close to optimal.  The optimal solution might only be a pS less variation in the latency of the individual  bits of an adder output.  The tighter your constraints the more difficult the optimization problem is.
In an FPGA this doesn't matter at all. In an FPGA logic is typically synchronous. Sure there are clock domain crossings (not to forget the inputs and outputs are a clock domain crossing!) but these can all be caught by timing constraints. The place & route just needs to make sure the delay doesn't exceed the time needed for the clock to arrive in a worst case scenario. Actually the software isn't that sophisticated. It needs a lot of steering from the user to get timing closure in many cases.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: NorthGuy on August 03, 2018, 07:06:57 pm
In mathematics, satisfying constraints is called an optimization problem.

We won't get far if we start twisting basic mathematical terms.

Functions may have minimums and maximums. Since the same methods can deal with both minimums and maximums, they are often called optimums. An optimum can be either a maximum or a minimum.

The task of finding minimums is called minimization. The task of finding maximums is called maximization. The task of finding optimums is called optimization.

The optimization may be complicated by constraints - in which case you find the optimum in the sub-space of the inputs defined by the constraints.

Often the optimization cannot be performed computationally within a reasonable time (what you call NP-hard). In this situation, finding a point which is close enough to the optimum gives you an approximate solution.

Finding an arbitrary point in the constrained sub-space is not optimization. Unlike optimization, this task does not have any approximate solution. The point either lies within the constraints, or it doesn't. It cannot be more within the constraints, or less within the constraints. It is either in or not. Makes sense so far?

This is what happens in FPGA. There are constraints, predominantly timing constraints. If these constraints are met the design will work across specified conditions. If not, the design may fail. Very simple.

In the case of FPGA synthesis, one would most likely want to apply weights to the error terms so that the bounds on the high speed portions are tighter than on the low speed portions.  So different weights would be applied to different errors in the minimization expression.

An FPGA uses the RTL model (RTL stands for Register Transfer Level), which includes sequential elements (often called registers or flip-flops) and combinatorial logic between them (various gates, LUTs, muxes, interconnect etc.).

When a clock hits a flip-flop, the input of the flip-flop gets registered, transferred to the output, and starts to propagate through the combinatorial logic to the next sequential element, where it is supposed to be registered on the next (for simplicity) clock edge.

The delay through combinatorial logic is generally unpredictable - it varies from FET to FET and depends on the process, voltage and temperature. But the vendor performs characterization work - they measure the delays for thousands of FPGAs across all the various conditions and come up with two numbers - minimum delay and maximum delay. The vendor does this for each and every element within the FPGA.

Once these numbers are known, you can sum them up across the combinatorial path and use the numbers to determine whether the design is acceptable or not. This is done with two comparisons.

1. The combined minimum delay must be big enough to make sure that by the time the signal propagates to the next sequential element, this next sequential element has already finished working with its previous input. This is defined by the "Hold" characteristic - the time starting from the clock edge and ending at the point in time where the register doesn't need the input any more.

2. The combined maximum delay must be small enough to make sure that the signal gets to the next sequential element before the sequential element starts registering the signal. This is defined by the "Setup" characteristic - the time starting from the point when the register must have valid input to the clock edge.

Thus the sequence of events should be such:

clock edge - hold expires - new data arrives - setup point - next clock edge

Note that the uncertainty of the delays doesn't propagate to the next clock cycle and doesn't accumulate. Each clock cycle starts anew, error free (not counting clock jitter). This feature lets the RTL system work for very long periods of time without errors.

However, for the RTL system to work, the events must happen in exact order (clock-hold-data-setup-clock) regardless of the conditions - voltage, temperature etc. If data arrives before the hold point or after the setup point for even one single flip-flop in your FPGA, the whole design may be doomed. Worse yet, a flip-flop clocked while its input is unstable may become metastable.

Therefore, the design must meet the timing constraints. The solution cannot be approached or done approximately. The constraints must be met. And vice versa, once the constraints are met, there's no reason to tweak things any further - the design is guaranteed to work anyway. Makes sense?
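
Written as inequalities with generic symbols (and ignoring clock skew and jitter), the two checks are simply:

\[
t_{co} + t_{logic,\max} + t_{route,\max} + t_{su} \le T_{clk}
\qquad\text{and}\qquad
t_{co,\min} + t_{logic,\min} + t_{route,\min} \ge t_{hold}
\]

Both are pass/fail tests against the clock period and the hold time; there is no score to improve once they hold.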

Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: rhb on August 03, 2018, 10:34:41 pm
On the first part:

This looks to be a tolerably decent summary.

https://en.wikipedia.org/wiki/Convex_optimization

The 2nd part is obvious by inspection from my summary description of an FPGA as a collection of hard blocks and an N layer interconnect controlled by FET switches which are set by a bit pattern in memory.  Solving for the connections and choice of hard blocks which meet the timing constraints is a convex optimization problem.  It's actually equivalent to a regular polytope in N dimensional space, which as N gets large is overwhelmingly likely to be convex - which is nice, as it makes things easier.

I had hoped my air transport example would have made the concepts clear.  But you may believe whatever you wish.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: Bassman59 on August 03, 2018, 11:51:42 pm
So yeah, it would be great if it was reasonable to "write once, synthesize everywhere," but in practice, that isn't possible.
I agree with everything above, but in addition to that there is an elephant in the room - DSO application will most certainly require using DSP tiles, and they are some of the most non-portable even across different FPGA families of a single vendor, much less so between vendors.
:-// What do you mean?
I wrote a complete image scaler and video processor in system Verilog in Altera's Quartus 3 years ago.  All math was written out as simple adds, multiplies, divides in Verilog.  I did not use any DSP tiles, yet, once compiled, Quartus placed all the arithmetic into the DSP blocks all on it's own.

The DSP block in the Xilinx Spartan 3E/3A/3AN and Spartan 6 is quite clever, as it can be configured to do MAC, various clear/set operations, with programmable pipeline stages, etc. It has an opcode input that can be used to configure it on-the-fly on a per-clock basis. We used it to build a dual-slope integrator.

The problem was that ISE wasn't clever enough, and it would use the DSP block for the multiplier only. In order to have it do what we wanted, we had to instantiate the block and write a state machine that controlled the opcode and kept track of the data as it moved through the pipeline. Annoying and non-portable? Yes. Did it work? Yes. Whatever. The product shipped and I didn't particularly care that it was "inelegant."
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: carl0s on August 04, 2018, 12:31:33 am
Bet this site's had quite a few visits lately: https://en.wikipedia.org/wiki/NP-hardness (https://en.wikipedia.org/wiki/NP-hardness). Still makes no frickin' sense to me.


If you can do a 'scope equivalent of OpenTx (https://www.google.com/search?q=opentx) (https://www.google.com/search?q=horus+radio) then that would be super cool.

What we really want is for Chinese 'scope manufacturers to be pushing (and competing on) their hardware manufacturing limits, to be utilised by a standard operating system.

It's Android, for oscilloscopes.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: NorthGuy on August 04, 2018, 01:04:56 am
But you may believe whatever you wish.

Thank you for the permission.

However, mathematics (like any science) is not based on beliefs, but rather on proofs.

If you believe that FPGA design is an optimization of untold convex functions, I don't think I can say anything useful that would help. Please accept my apologies for disturbing your thread.

Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: rhb on August 04, 2018, 02:49:09 am
The traveling salesman problem (TSP) asks the following question: "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city and returns to the origin city?" It is an NP-hard problem in combinatorial optimization, important in operations research and theoretical computer science.

 In 2006, Cook and others computed an optimal tour through an 85,900-city instance given by a microchip layout problem, currently the largest solved TSPLIB instance.

From:  https://en.wikipedia.org/wiki/Travelling_salesman_problem

See also:

sarielhp.org/teach/2004/b/27_lp_2d.pdf

If you believe that FPGA design is an optimization of untold convex functions, I don't think I can say anything useful that would help. Please accept my apologies for disturbing your thread.

FPGA synthesis is a convex optimization, not design.

In a purely mathematical description, the constraints form planes in an N dimensional space.  The possible solutions lie within the convex polytope called the feasible region which is all the possible solutions which satisfy the constraints.  If one wishes to clock the system at the highest possible rate, then one seeks the vertex of the polytope with the highest possible clock rate.  One may choose a wide variety of traits such as minimum latency to optimize.  This consists of reorienting the polytope in N dimensional space and seeking the minimum.  Every vertex of the polytope is optimal for some constraint.

I'm a scientist, not a mathematician.  I only learned this a few years ago when I solved some problems and then realized I'd been taught they could not be solved.  So I set out to find out why what I'd been taught was wrong. Or more precisely, when what I'd been taught was wrong. The general case was true, but there were exceptions which I'd not been shown.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: hamster_nz on August 04, 2018, 04:25:15 am
Waffling on about the mathematical purity of the FPGA P+R process makes as much sense as lamenting that you can't see the animals at the zoo because deciding where to walk is an NP-hard problem.

Treating it as an optimization problem was the "gen 1" approach in FPGA tools. It had to be abandoned and replaced with a set of heuristics that give reasonable results in a reasonable amount of time.

Take this vague observation: the increase in P+R runtime vs FPGA and design size is not consistent with the standard 'NP hard' scaling.  Compare the hardness of travelling salesman for 1,000 cities vs 10,000 cities against the place and route of a design of 1,000 LUTs vs one for 10,000 LUTs.

Why is an 85% full design on a small FPGA harder to route successfully than the same design on an FPGA that is twice the size? The bigger part should be harder, as it has far more solutions that need to be explored.

I am sure somebody will find this interesting to watch:

"Visualization of nextpnr placing & routing two PicoRV32s on an iCE40 HX8K (10x speed)"

https://twitter.com/i/status/1024623710165237760
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: Bassman59 on August 04, 2018, 06:40:57 am

FPGA synthesis is a convex optimization, not design.

You are confusing synthesis and fitting (place and route). They are two pretty much entirely separate processes.



The goal of synthesis is to translate your hardware description into a netlist that implements the design in the primitives available in the target architecture and shows the connections between those primitives. One part of synthesis is pattern matching: follow the guidelines in the manual. If you want a flip-flop, write code this way. If you want a large RAM block, write code that way.

The other part of synthesis is logic optimization (and not necessarily minimization), and this optimization is highly dependent on what works best for a given target architecture. Consider a simple shift register. Xilinx CLBs can be configured as shift registers, so say a four-bit shift register fits into one CLB. MicroSemi logic "tiles" are too fine-grained for that, so a shift register synthesizes into four flip-flops. The Xilinx CLB can also be configured as a small RAM (called LUT RAM or distributed RAM), and that feature doesn't exist in MicroSemi's parts, either, so you get an array of flip-flops to implement that memory.
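
A sketch of that shift-register point in SystemVerilog (parameter and module names are mine): the same behavioural description lands on different primitives depending on the target family.

module shift_reg #(
    parameter int DEPTH = 4,
    parameter int WIDTH = 8
) (
    input  logic             clk,
    input  logic [WIDTH-1:0] din,
    output logic [WIDTH-1:0] dout
);
    // With no reset, Xilinx tools will typically collapse this into SRL-type
    // LUTs; on a fine-grained architecture without LUT shift registers it
    // becomes DEPTH levels of flip-flops.  Same source, different primitives.
    logic [WIDTH-1:0] taps [DEPTH];

    always_ff @(posedge clk) begin
        taps[0] <= din;
        for (int i = 1; i < DEPTH; i++)
            taps[i] <= taps[i-1];
    end

    assign dout = taps[DEPTH-1];
endmodule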

The obvious point here is the ultimate implementation of the logic in specific target-device primitives actually doesn't matter -- as long as the result is functionally correct, then, really, who cares what the synthesis tool creates?

But there are FPGA features that the synthesis tool can't, or won't, infer. In some cases, inferring a primitive like a PLL or a gigabit serializer from a purely behavioral description is just too complicated. It's better to instantiate a black box, which the synthesis tool just puts into the netlist and passes along to the place and route. (There are some things that synthesis should infer but doesn't, like DDR flops in the I/O.)

Since the synthesis tool has no real way of knowing what the fitter will do, it cannot do a proper timing analysis. But synthesis understands loading and fan-out and will replicate and buffer nets for performance reasons.



The role of the fitter is to take the netlist of primitives and their connections and fit them into the fabric. There is more to the fitting process than just the netlist, though - more than just the traveling-salesman problem of "optimal" placement and routing.

That's where the constraints come in. They are in two broad categories: timing and placement constraints.

Timing constraints are usually straightforward. FPGAs are synchronous digital designs, so you need a period constraint for each clock to ensure your flip-flop to flip-flop logic (both primitives and routing) has no failures. Managing chip input and output timing is often straightforward, too; there are not that many ways of connecting things.

You set the timing constraints for your actual design requirements. If you have a 50 MHz clock, you don't set a 75 MHz period constraint, as it makes the tools work harder than they have to, and it might not close timing. Then the tools run and you see whether you win or not. If you don't, then you need to reconsider things. Set the tools for extra effort. Look to where you can minimize logic. Look to see if the synthesis built something wacky, so you have to re-code.

Placement constraints are a lot more complicated, and this is because each FPGA family has different rules. You have to choose pins for each signal that goes to or comes from off-chip. Sounds easy, right? Well, the layout person has ideas about routing to make that job easier. But you have to mind specialist pins for your clock inputs. You have to abide by I/O standards and I/O supply voltages, and whether a 3.3 V LVCMOS signal can go on a bank with LVDS signals. Oh, and LVDS requires choosing the pair of pins. And you have to specify output drive and slew rate, and on inputs you specify termination, input delay, pull-up/pull-down/keeper. And on and on.

Only once your placement constraints are set should you let the traveling salesman run.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: rhb on August 04, 2018, 01:52:48 pm
Thank you!  Finally, some sensible comments. I'd meant for this to end long ago, but tried to provide NorthGuy with an explanation of the terminology mathematicians use for these problems.

Two brief comments:

An almost full chip is harder to synthesize because  many of the heuristics that succeed when it is not so full don't succeed.  So you are forced to try more possibilities.  No one ever attempts a full optimization because it simply cannot be done for anything other than uselessly small problems.  It's really just a search for any point inside the feasible region.  But the mathematical community have a lexicon and language for this. So I follow their rules.

I only referenced the TSP as a simple explanation of what NP hard meant and because the synthesis and placement problem is by inspection  at least as hard as a TSP.  No one previously bothered with the distinction between synthesis and placement which as noted is significant, as they involve very different problems.

I asked what I thought was a simple question.  Are there any other FPGAs with hard ARM cores similar to the Zynq and Cyclone V?  Boy, was I ever wrong!  It turned into quite an odyssey.

Have Fun!
Reg

Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: Bassman59 on August 04, 2018, 09:48:55 pm
Thank you!  Finally, some sensible comments. I'd meant for this to end long ago, but tried to provide NorthGuy with an explanation of the terminology mathematicians use for these problems.

Two brief comments:

An almost full chip is harder to synthesize because  many of the heuristics that succeed when it is not so full don't succeed.

Again -- that's not a synthesis problem, that's a fitter problem.  But, yes, you are correct, because as the device fills up, routing resource availability might become strained. That was a problem on ancient devices (ugh, XC3000?) but the newer stuff has a lot of routing so the problem really becomes, "can we place the related logic close enough to each other so that routing between them meets our timing constraints?"

Quote
So you are forced to try more possibilities.  No one ever attempts a full optimization because it simply cannot be done for anything other than uselessly small problems.  It's really just a search for any point inside the feasible region.  But the mathematical community have a lexicon and language for this. So I follow their rules.

But optimization is dependent on your goal. I mean, this is engineering, right? We don't need to strive for perfection, because we can't define that anyway. But we can say, "the design has 5,000 flip-flops and has to run at 50 MHz." Meeting the former constraint requires choosing a device with at least 5,000 flip-flops. If there are more, well, great. Meeting the latter constraint means the design will work. We don't care if it can run faster.


Quote
I asked what I thought was a simple question.  Are there any other FPGAs with hard ARM cores similar to the Zynq and Cyclone V?  Boy, was I ever wrong!  It turned into quite an odyssey,

Well, this is the Internet, where veering off-topic is a given.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: rhb on August 04, 2018, 11:00:04 pm
Sorry, yes.  It's just that no one made the distinction between synthesis and fitting before.  So I'm still tending to describe it in that fashion.  But it's obvious that they are quite different problems.  The synthesis step is entirely governed by the available hard blocks.

In this case, "optimization" is complete once the constraints are met unless actual performance fails to meet expectations from the simulations.  But the mathematicians still call it "optimization" even if it's really just finding the feasible region.

As a consequence of stumbling across the work of Candes and Donoho, I spent some 3 years reading over 3000 pages of complex mathematics on optimization.   It's really cool stuff.  Google "single pixel camera" if you'd like to really blow your mind. TI is using it in a near IR spectrometer product although they call it "Hadamard sensing".  I still need to get serious on the general subject of convex optimization, but that sort of stuff is a lot of work to read.  I also have no one to talk to about it, so it's not as much fun as if I did. I'd rather play with hardware right now.  And in particular compare the Zynq world to the Cyclone V world.

Again, thank you for writing a clean crisp description of the process.  You have to know the topic well to do that, and even then it's work to do well.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: BrianHG on August 05, 2018, 03:15:34 am
So yeah, it would be great if it was reasonable to "write once, synthesize everywhere," but in practice, that isn't possible.
I agree with everything above, but in addition to that there is an elephant in the room - DSO application will most certainly require using DSP tiles, and they are some of the most non-portable even across different FPGA families of a single vendor, much less so between vendors.
:-// What do you mean?
I wrote a complete image scaler and video processor in system Verilog in Altera's Quartus 3 years ago.  All math was written out as simple adds, multiplies, divides in Verilog.  I did not use any DSP tiles, yet, once compiled, Quartus placed all the arithmetic into the DSP blocks all on it's own.

The DSP block in the Xilinx Spartan 3E/3A/3AN and Spartan 6 is quite clever, as it can be configured to do MAC, various clear/set, with programmable pipeline stages and etc etc. It has an opcode input that can be used to configure it on-the-fly on a per-clock basis. We used it to build a dual-slope integrator.

The problem was that ISE wasn't clever enough, and it would use the DSP block for the multiplier only. In order to have it do what we wanted, we had to instantiate the block and write a state machine that controlled the opcode and kept track of the data as it moved through the pipeline. Annoying and non-portable? Yes. Did it work? Yes. Whatever. The product shipped and I didn't particularly care that it was "inelegant."

Yes, keeping track of those damn pipeline stages and where and when data is valid.  Even in Quartus, it is a handful.  However, since I also created my own full multiport read and write intelligent-cache DDR2 DRAM controller, and I have done a lot of on-the-fly video processing, the trick I use whenever reading, writing, or doing math anywhere else is to include 2 things in every Verilog function I have created to date:

1.  An enable input and an enable output, where the enable is a DFF chain with the same pipe depth as the function, allowing the data flow to go on and off at any point, with an embedded parameter to configure the size.
2.  A wide set of DFF in and out destination address bits.  Basically the same as the enable in-to-out, but with an n-bit address.  So, if I read from my RAM controller or process color in my color enhancement processor, I also provide a destination address input along with the enable input, which all follows the delay pipe through the Verilog function, and, at the output, the in-sync enable out, destination address out and the function's generated data all come out in parallel.  Whichever number of bits I set for these in the parameters, or if I just don't wire the port, it is no problem, as the Verilog compiler will only include the wired assets when building the firmware, not wasting time and space on anything unwired.

* The address may also be any parallel unprocessed data which needs to be kept in step with the Verilog function's processed data output.

Following this convention as a design practice for every function I make has exponentially increased my ability to change pipeline lengths on the fly for any function when needed for optimization, with zero debugging effort or additional external fixes.  If I were to instruct someone in the art of doing designs which will need variable sized pipelines, this would be the first thing I would teach.
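
A minimal sketch of that convention as I read it (module, port and parameter names are invented, and the "processing" is just a registered add standing in for real stages):

module pipelined_func #(
    parameter int PIPE_DEPTH = 3,
    parameter int DATA_WIDTH = 16,
    parameter int ADDR_WIDTH = 8
) (
    input  logic                  clk,
    input  logic                  ena_in,
    input  logic [ADDR_WIDTH-1:0] addr_in,
    input  logic [DATA_WIDTH-1:0] a_in,
    input  logic [DATA_WIDTH-1:0] b_in,
    output logic                  ena_out,
    output logic [ADDR_WIDTH-1:0] addr_out,
    output logic [DATA_WIDTH-1:0] dout
);
    // The arithmetic pipeline proper.
    logic [DATA_WIDTH-1:0] data_pipe [PIPE_DEPTH];
    // Enable and destination address ride along in DFF chains of exactly the
    // same depth, so they pop out aligned with the processed data.
    logic                  ena_pipe  [PIPE_DEPTH];
    logic [ADDR_WIDTH-1:0] addr_pipe [PIPE_DEPTH];

    always_ff @(posedge clk) begin
        data_pipe[0] <= a_in + b_in;
        ena_pipe[0]  <= ena_in;
        addr_pipe[0] <= addr_in;
        for (int i = 1; i < PIPE_DEPTH; i++) begin
            data_pipe[i] <= data_pipe[i-1];
            ena_pipe[i]  <= ena_pipe[i-1];
            addr_pipe[i] <= addr_pipe[i-1];
        end
    end

    assign dout     = data_pipe[PIPE_DEPTH-1];
    assign ena_out  = ena_pipe[PIPE_DEPTH-1];
    assign addr_out = addr_pipe[PIPE_DEPTH-1];
endmodule

Changing PIPE_DEPTH then re-tunes everything at once, which is the point of the convention.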
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: rhb on August 05, 2018, 04:53:29 am
I'd very much like to learn more.  You can buy a 2.6/5.2 GSa/s 14 bit ADC eval board from AD for $1900, or a 2/4 GSa/s one for $1200, so one of my goals is to make what I write accommodate variable widths so that a person could assemble a bespoke instrument as a one-off using connectorized modules and eval boards.  Not sure it's possible, but it would be really useful if I can figure out how to build a framework that could do that.

I'm sure it will not be easy, but this is  a 12-18 month project.  There's no hope of work in the oil patch at current prices, so I might as well give something else a serious effort.  Whatever the outcome, I'm sure I'll learn a lot.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: BrianHG on August 05, 2018, 05:03:59 am
I'd very much like to learn more.  You can buy a 2.6/5.2 GSa/S 14 bit ADC eval board  from AD for $1900 or a 2/4 GSa/S for $1200  so one of my goals is to make what I write accommodate variable widths so that a person could assemble a bespoke instrument as a one-off using connectorized modules and eval boards..  Not sure it's possible, but it would be really useful if I can figure out how to build a framework that could do that.

I'm sure it will not be easy, but this is  a 12-18 month project.  There's no hope of work in the oil patch at current prices, so I might as well give something else a serious effort.  Whatever the outcome, I'm sure I'll learn a lot.
Which ADC boards, so we can see how they interface on the digital side?  At these speeds you will usually need to use LVDS transceivers.  This complicates board-to-board linking, and on the FPGA side this will be as expensive as the ADC eval boards, if not double or triple.

As for the feasibility of doing it, yes, with money, anything at the speeds you listed can be done, but the connectors and board layout for the FPGA will need careful attention, with a dedicated bank and dedicated PLL clock domain to acquire at said speeds.

Hint: at these speeds, with real-time mathematical signal processing at full sample rate, you will be using a lot of pipelined functions to keep that maximum clock rate up there, as well as multiple parallel channels.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: rhb on August 05, 2018, 01:15:40 pm
AD9689.  I had assumed that the FPGA side would be about double the ADC side. I have no intentions of building such a beast.  I just mentioned it because  the capabilities it offers are quite intriguing.   I discovered it via an AD ad in my email.  That and  a good article in QEX not long ago about a 100 MHz BW 16 bit SDR built with a $500 ADC board and a $500 ZedBoard prompted me to factor variable width ADCs into my DSO FOSS FW  project.

Something like the AD9689 is *seriously* difficult to deal with, and the JESD204B interface IP might make a one-off uneconomic/impractical.  The AD9689 eval board is transformer coupled, so it would not make a general purpose DSO, but for a bespoke lab instrument it's cheaper and more capable than buying a $20K DSO.  A one-off like that is really PhD or post-doc project material, though.

The stuff is quite pricey, but I've discovered that there has been a lot of standardization of interconnects between FPGA boards and ADC and DAC boards.  I presume the market developed because of the amount of time it takes to design a board to operate at these speeds.

This thread got started because I know from experience that good design methodology makes a huge difference both in productivity and bug rates.  I got lots of flack for suggesting developing on both the Zynq and Cyclone V at the same time.  But the best way to understand the strengths and weaknesses of each is to do a series of small tasks on both at the same time.  I'd hoped there was a 3rd line with embedded hard ARM cores from a different vendor, but it appears not.

I'd really appreciate a general outline of your methodology for dealing with pipelines of variable bit width.  I'd like to generalize what I do even though it's not needed and is more work.

For the FOSS DSO FW project I've assumed a pipeline from the ADC consisting of an AFE correction filter and a user selected BW filter with a choice of step responses (e.g. best rise time, least overshoot), followed by fanout to a pipelined set of math functions, a main data stream and the trigger functions.  Observation of period measurements using a couple of $20K 1 GHz DSOs, a Keysight 33622A (< 1 ps jitter), a Leo Bodnar GPSDO and one of his 40 ps rise time pulsers suggests to me that interpolating the trigger point is a significant problem.  This was confirmed by a comment by @nctnico.

A back of the envelope estimate suggests that a 10 point sinc interpolator lookup table would provide 1 ps resolution of the trigger set point at the cost of 10 multiply-add operations.  But I've not done any numerical experiments yet.  I'm still trying to get a basic discussion of minimum phase anti-alias filtering completed for the DSP 101 FOSS DSO FW thread but keep being forced to do other things like spraying weed killer.
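
To make the back-of-the-envelope idea concrete, a rough and untested sketch of the interpolation MAC; the tap count, phase resolution, widths and the coefficient file are all placeholders, not numbers I have verified:

module trig_interp #(
    parameter int TAPS   = 10,
    parameter int PHASES = 64,
    parameter int DATA_W = 14,
    parameter int COEF_W = 16
) (
    input  logic                             clk,
    input  logic signed [DATA_W-1:0]         samples [TAPS],   // window around the crossing
    input  logic        [$clog2(PHASES)-1:0] phase,            // fractional trigger position
    output logic signed [DATA_W+COEF_W+3:0]  y
);
    // Coefficient table: PHASES sets of TAPS windowed-sinc coefficients,
    // loaded here from a placeholder hex file generated offline.
    logic signed [COEF_W-1:0] coef [PHASES*TAPS];
    initial $readmemh("sinc_coefs.hex", coef);

    always_ff @(posedge clk) begin
        automatic logic signed [DATA_W+COEF_W+3:0] acc = '0;
        for (int i = 0; i < TAPS; i++)
            acc += samples[i] * coef[phase*TAPS + i];
        y <= acc;
    end
endmodule

In a real design the ten multiplies would be pipelined rather than done in one clock, but the arithmetic cost is the same ten multiply-adds mentioned above.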

If you can suggest some graduate level texts on FPGAs I'd be grateful.  I've bought some books, but they're pretty basic undergraduate level texts.  I posted asking for recommendations, but no one suggested anything at the sort of level I was looking for.  I'd *really* like to find a graduate  level monograph on the IC design aspect of FPGAs so I have a better understanding of the actual hardware at the register and interconnect level.  My search attempts produced longer lists of introductory material than I wanted to wade through.



Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: BrianHG on August 05, 2018, 07:10:25 pm
Reading everything so far, I personally would choose a Cyclone V or Xilinx-equivalent low end dev board with an embedded ARM and hopefully 2 banks of DRAM, 1 dedicated to the ARM software and another for high speed sampling, plus an HDMI/VGA output.  Start out with a cheap home made 500 MSPS converter, as these dev boards will struggle to interface any faster; they might even limit you to a bit slower.

I know this will have an embarrassingly low cost, below $800 total, even if you need to make your own custom ADC daughter board, but the code you test and develop will be identical to your super-fast high-end final product.  All the tricks you will need to speed up the Cyclone V to deal with a 500 MSPS ADC with its 400-600 MHz DDR3 RAM and real-time processing will be equivalent to when you upgrade to an Altera Arria FPGA to handle the speed of the 1.3 GHz DDR3/4 RAM and a 2-3 GSPS ADC.  Your development and learning curve will be the same, and with a compiled project you will be better placed to pre-compile and select whether you will need an Arria or Stratix FPGA from Altera (or Xilinx equivalent) to make the jump into multi-GHz sampling.

I am not sure of the level of capability in Xilinx's IDE, but I would personally use SystemVerilog instead of Verilog since, at the time when I started my video scaler, SystemVerilog automatically tracked and handled mixed unsigned and signed registers for the component color processing math I required when feeding Altera's DSP blocks, with less headache than regular old Verilog.  Things may have changed since this was 5 years ago.
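
A rough sketch of the sort of mixed signed/unsigned arithmetic involved (names and widths invented); the unsigned pixel is widened by one bit so the whole expression stays signed and can still land in a DSP block:

module color_gain (
    input  logic               clk,
    input  logic        [7:0]  pixel,   // unsigned component, 0..255
    input  logic signed [9:0]  gain,    // signed fixed-point gain
    output logic signed [18:0] result
);
    always_ff @(posedge clk)
        result <= $signed({1'b0, pixel}) * gain;  // zero-extend keeps pixel non-negative
endmodule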
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: rhb on August 05, 2018, 08:02:34 pm
I currently have an Avnet MicroZed, Digilent Zybo Z7-20, BeagleBoard X15, Terasic DE10-Nano and a Zynq 7010 based Instek GDS-2072E DSO with a 500/1000 MSa/s dual channel ADC.  So I think I'm pretty well equipped for the initial work.  I had the good fortune to get the Instek for $244 delivered from Amazon.

My current plan is to work through the MicroZed and DE10-Nano tutorials, doing each one on both platforms.  That will give me a good idea of the quality of the tools and a basic sense of how much portability is possible.

While a full feature synthesis and fitting package is clearly beyond the ability of one person, I have used lex and yacc for professional work, am familiar with optimization codes and device level physics.  So for the very narrow range of tasks I'm pursuing, writing my own is not impossible.  Just far more work than I should like.

This started as an attempt to escape from bad Chinese FW.  Then I tried to buy my way out and bought a $20K DSO at 1/2 price which I returned and then got a week or so of demo time with another A list DSO.  Much to my dismay and horror, even $20K would not buy me a DSO which functioned properly. After what I've seen, I don't think any sum would get me a sensible and bug free instrument.

Having written large, complex pieces of software which were bug free, I don't think buggy products are inevitable.  As I wrote in a 4 page memo I sent to the 2nd A list OEM.  "Bugs in software may be inevitable, but shipping them to customers is unprofessional."

I shudder at the thought of your having read this whole thread.

If you know of good discussions of software engineering in the context of HDLs I'd be very grateful if you would post or send me  links.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: Bassman59 on August 05, 2018, 08:56:43 pm
1.  Enable input and enable output, where the enable is a DFF with the same pipe size as the function allowing the data flow to go on and off at any point, with an embedded parameter size configuration control.

We have apparently invented the same wheel.

-a
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: Bassman59 on August 05, 2018, 09:17:50 pm
Having written large, complex pieces of software which were bug free, I don't think buggy products are inevitable.  As I wrote in a 4 page memo I sent to the 2nd A list OEM.  "Bugs in software may be inevitable, but shipping them to customers is unprofessional."

That's true for any product, I think.

But let's step back even further. The previous discussion was all about synthesizing logic and fitting it into the resources available in the target device. That process, which can seem unwieldy because of the tools, is actually straightforward.

There is a significant assumption about it, though. The assumption is that the logic you wish to implement in your chip is functionally correct. That is, if you take a design which is functionally correct and you synthesize it according to the rules and you constrain it properly and you meet the constraints, it will work.

And the truth is that verifying that your logic design is functionally correct is hard. Simulation is part of the verification process. Writing a comprehensive test bench is not trivial, especially when your FPGA talks to a peripheral as a data source or sink. (This means you need to obtain, or more likely write, bus-functional models of the things you connect to your FPGA, and then you have to verify that the models are correct!) The test bench should do more than generate a clock and reset, and staring at waveform displays isn't verification. It just shows you what is happening for a given set of conditions, and doesn't tell you what it should be doing for that given set of conditions.

A test bench should apply test vectors and verify that the output is what is expected. Yes, it should be obvious, but it must be stated -- you have to know what you should get out of the circuit for a given input!
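
In the same spirit, a toy self-checking bench might look like this (everything here is invented for illustration; the "DUT" is just an inline registered adder):

module adder_tb;
    logic clk = 0;
    logic [7:0] a, b;
    logic [8:0] sum;

    // stand-in DUT: a trivial registered adder
    always_ff @(posedge clk) sum <= a + b;

    always #5 clk = ~clk;

    initial begin
        for (int i = 0; i < 1000; i++) begin
            a = $urandom();
            b = $urandom();
            @(posedge clk);   // result is registered on this edge
            @(negedge clk);   // sample after the edge has settled
            if (sum !== a + b)
                $error("mismatch: %0d + %0d gave %0d", a, b, sum);
        end
        $display("done");
        $finish;
    end
endmodule

The point is simply that the bench computes the expected answer itself and complains on a mismatch, instead of leaving that judgement to someone staring at waveforms.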

Now there are a bunch of people on this forum who are manly men and refuse to do any kind of simulation. Their design skills are clearly impeccable and they don't make even the smallest of typos. Or, they say, "I'll use ChipScope, it's faster than writing a test bench." Except when a place-and-route cycle takes an hour, and if you forget to include a signal to monitor, or make some other omission, then it's another hour wasted. Or, sometimes the case might be that the debugger core makes the design not meet timing, and then you can't trust its output.

You can't do serious FPGA design and development without understanding how to functionally verify a design.

Quote
If you know of good discussions of software engineering in the context of HDLs I'd be very grateful if you would post or send me  links.

I have not seen any. Hell, given that FPGA design is all text-based design entry these days, you would think that every professional FPGA designer has embraced source-code control for their designs. But you would be wrong.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: BrianHG on August 05, 2018, 10:00:51 pm
Now there are a bunch of people on this forum who are manly men and refuse to do any kind of simulation. Their design skills are clearly impeccable and they don't make even the smallest of typos. Or, they say, "I'll use ChipScope, it's faster than writing a test bench." Except when a place-and-route cycle takes an hour, and if you forget to include a signal to monitor, or make some other omission, then it's another hour wasted. Or, sometimes the case might be that the debugger core makes the design not meet timing, and then you can't trust its output.
For major projects, like my video scaler, I have each function in its own project with its own test bench simulation.  I simulate and exhaust all the possibilities for each individual function, or for combinations of a few small functions together.

One level up, I have my main project which has each sub-function project wired together, basically a top hierarchy, and I simulate that one as far as I can.  I am bound by certain limitations at this level since it can take hundreds of millions of clock cycles just to boot everything up to a synced functional state.  The PC would take forever to simulate this - or at least 5-6 years ago, with a 2-core CPU and the free web edition of Quartus, that was the case.

Without at least these 2 to 3 levels of simulation, what I have achieved would be impossible.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: rhb on August 05, 2018, 10:09:16 pm
ROFL!!!

Bassman59 and BrianHG, you make my heart glad.  I'm not the only one who understands the problem and how to address it.

The test bench has to include every case you can think of that might arise.  And writing good test cases is far more art than science.  When I was a summer intern at a major oil company, one of the scientists was having problems with a 3D FFT on a parallel machine (an Intel i386 Hypercube).  He mentioned it to me, and I told him to run two test cases: an input which was a central spike, and an input which was all ones.  He found the bug in 15 minutes.  I was quite astounded he didn't know to do this.
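For the record, those two inputs are so effective because their transforms can be written down by inspection. For an N-point DFT:

\[ x[n] = \delta[n - n_0] \;\Rightarrow\; X[k] = e^{-j 2\pi k n_0 / N}, \qquad |X[k]| = 1 \ \text{for every } k \]
\[ x[n] = 1 \ \text{for all } n \;\Rightarrow\; X[k] = N\,\delta[k] \]

A spike in gives a flat magnitude spectrum out, and a constant in collapses to a single bin at DC; the same holds along each axis of a 3D transform, so indexing, scaling, and transpose mistakes stand out immediately against those two references.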

But, no, I would not be surprised if the majority of FPGA designers behave like the majority of programmers and don't use version control. In fact, that is precisely why I think I can do better than the norm.

If it's not a "kick the tires" exercise, the first thing I do on a new system is install RCS to manage the administration files.  I have a system with swappable drives and over two dozen disks which I use to test things.  Twenty years ago my ISP started bouncing all my outgoing mail back.  I called support (a friend owned the company).  The support guy said it must be a mistake in my sendmail.cf, to which I responded, "My sendmail.cf has not been changed since "date", and the RCS log says that change was made because of a change of yours on "date"."  I never mentioned that the owner was someone I knew, but the support guy got the message: I could prove that it was not a mistake on my part.  I forget what they had munged, but they did find and fix it.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: hamster_nz on August 05, 2018, 10:23:30 pm
ROFL!!! The test bench has to include every case you can think of that might arise.  And writing good test cases is far more art than science. 

Have you tried any formal verification?

You define how your component has to perform and the toolset tells you if your design meets those requirements.

See Clifford Wolf's Slides at http://www.clifford.at/papers/2016/yosys-synth-formal/slides.pdf (http://www.clifford.at/papers/2016/yosys-synth-formal/slides.pdf)

Quote
Formal verification uses modern techniques (SAT/SMT solvers, BDDs, etc.) to prove correctness by essentially doing an exhaustive search through the entire possible input space.

It is pretty good at actually proving that RTL works as required... I've dabbled in it only a bit.
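As a taste of the style (the module and properties below are invented for illustration, and a real SymbiYosys run also needs a small .sby configuration file alongside), the properties sit right next to the RTL and only the formal tools ever see them:

Code:
// Flavour of the approach: a counter that must never leave its legal range.
// Module, parameter, and properties are invented for illustration; normal
// synthesis never sees the `ifdef FORMAL block -- yosys reads the file with
// -formal, which is what defines FORMAL.
module wrap_counter #(parameter MAX = 99) (
    input  wire       clk,
    input  wire       rst,
    input  wire       en,
    output reg  [6:0] count
);
    initial count = 0;   // defined power-up / formal initial state

    always @(posedge clk) begin
        if (rst)
            count <= 0;
        else if (en)
            count <= (count == MAX) ? 7'd0 : count + 7'd1;
    end

`ifdef FORMAL
    // Safety property: the counter can never exceed MAX in any reachable state.
    always @(*)
        assert(count <= MAX);

    // Cover property: ask the solver to produce a trace where the wrap happens.
    always @(posedge clk)
        cover(en && count == MAX);
`endif
endmodule

The solver then either proves the assertion (up to the depth or induction scheme you configure) or hands you a counterexample trace, which is the bit no amount of directed simulation gives you.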
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: RoGeorge on August 05, 2018, 10:57:25 pm
I would not be surprised if the majority of FPGA designers behave like the majority of programmers and don't use version control. In fact, that is precisely why I think I can do better than the norm.

Prepare to be surprised.
Professional developers, as far as I've seen, do use version control, test benches, simulation, boilerplates, automated and manual testing, continuous integration, and more.

It's obvious that an open-source digital oscilloscope will cost more, both in hardware and in effort, compared to off-the-shelf scopes, but it might come with great advantages in the long run. That is why I would like to see such a project too, but there might be a reason no such project exists yet. IMO, trying to make it compatible with many FPGA manufacturers is unrealistic. I would stick with one vendor, at least at the beginning.

Anyway, before throwing big $$$ at high-speed dev boards, I would suggest starting small, with a modest ADC + RAM + Zynq, just to identify where the major effort sinks are and to see whether people are willing to gather around the project or not.

From there it may fly or it may not; I wish the project all the best.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: rhb on August 06, 2018, 12:21:52 am
If you know of good discussions of software engineering in the context of HDLs I'd be very grateful if you would post or send me  links.

I have not seen any. Hell, given that FPGA design is all text-based design entry these days, you would think that every professional FPGA designer has embraced source-code control for their designs. But you would be wrong.


Bassman59 has experience with FPGA developers.  I just have experience with software developers, most of whom I have found severely underwhelming.

I am not considering developing an open source DSO.  Never have.  Except for bespoke high performance instruments made up from eval boards it's not economic. 

My goal is FOSS FW for COTS gear.  But I know from experience that taking a more general set of requirements into account typically pays off in the end, even if you don't implement all of it.  Understanding the issues can keep you from making serious design mistakes.

How will I learn what the differences between vendors are except by doing exactly the same thing on multiple systems?  I'm really tired of the "pick one and just do that" BS.  In this case I'd have a 50-50 chance of picking the worse one.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: daveshah on August 06, 2018, 06:19:04 am
If you really want to make sure your design is good, you could have a look at Yosys and SymbiYosys, a FOSS formal verification suite for Verilog, which can prove your design is correct for all cases.

I would recommend the ZipCPU blog (http://zipcpu.com) as an introduction to using Yosys for formal verification.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: carl0s on August 06, 2018, 07:52:34 pm
FPGA Hell. Jesus. I'll pray you stay out of there.

I'm in RGB TFT Hell. It should be easy, but I'm exhausted.
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: legacy on August 06, 2018, 09:16:39 pm
Yup, a lot of my colleagues use git for C/C++, Ada, and VHDL  :popcorn:
Title: Re: FPGAs with embedded ARM cores, who makes them?
Post by: legacy on August 06, 2018, 09:27:52 pm
Quote
gave me a strong confidence when making the change that the result would still work.

want new features?

open a branch (on git)
commit everything
test it on your workbench with test cases built on stubbed modules, to see whether what you have modified has changed or compromised the behavior of some other module, and don't forget to simulate the whole design for system integration to check that everything is globally OK

(all of this can be automatically checked, even by Makefile)

does the result still work?
yes: comment the commit as step-milestone, and go ahead
no: revert to the previous commit

it has become common practice nowadays, especially for teams working over the internet :-//