Author Topic: Learning FPGAs: wrong approach?  (Read 55293 times)


Offline westfwTopic starter

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Learning FPGAs: wrong approach?
« on: June 15, 2017, 01:02:51 am »
Off in another thread:
Quote
  Do you know if there's any document which would describe the structure of the UDB in details?
(UDBs are the little FPGA-like blocks in a Cypress PSoC microcontroller.  But this question is generic to all FPGAs, CPLDs, and similar devices.)

So I (a software engineer with an EE degree from pre-FPGA times) have looked at various FPGAs at various times, and there always seems to be a hump that I have trouble getting over.  And I'm wondering if that's because of where I start - with the datasheet that describes the internal structure of the device.  Usually they go on about product terms and LUTs and output macrocells and so on.  And I know what each of those is (more or less) and how they work as individual pieces, but I lose the thread when I try to figure out how they might be combined to build larger structures.  (I mean, in principle I can build a UART out of shift registers, and I know how to build a shift register from a PAL, but...)

But that's completely the wrong approach, isn't it?
For the most part, if you're designing with an FPGA or CPLD, you should be designing at a MUCH higher level, with Verilog or VHDL or some schematic-entry tool.  And the tools know about the resources available on a given chip, and will combine them appropriately or tell you to get a bigger chip with more macrocells (or something.)  Yeah, I can probably eliminate the 512-byte FIFO from the chip that only has 320 bytes of embedded RAM, and at some point knowing a bit about the internals can help me optimize my thinking ("there's no penalty for adding extra terms to THAT equation.")  But otherwise ... it's like when you think about designing a SW algorithm, reading up on the multiple internal buses of your microcontroller is not the best starting point...

Am I getting closer?
 

Offline daybyter

  • Frequent Contributor
  • **
  • Posts: 397
  • Country: de
Re: Learning FPGAs: wrong approach?
« Reply #1 on: June 15, 2017, 01:35:46 am »
I would just read a verilog tutorial.
 

Online BrianHG

  • Super Contributor
  • ***
  • Posts: 7738
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #2 on: June 15, 2017, 01:43:32 am »
To start, both Altera and Xilinx are the top dogs here.  They both can be programmed in VHDL and Verilog.  I personally prefer Verilog since it is a simpler language and you can still do all you want in Altera's Quartus.

Altera's Quartus allows you to enter gates, flipflops, RAM, FIFOs - anything you like - graphically, and wire them together as if it were a digital schematic.  This includes Verilog code you have written, whose inputs and outputs would be represented in Quartus as a block device with ins and outs.  On your schematic in Quartus, you then wire these function blocks to IO pins, select a chip for the project & compile.  Then your chosen FPGA configuration will be created and programmed to do what you laid out.  The compiler will also create a report telling you how much of the FPGA's gates, memory & IO pins you used, and how fast the clock can run.

Now, as for the USART: a simple set of DFFs can shift serial data in to make a serial decoder, but if you want there are pre-made blocks to do this, or even public-domain Verilog/VHDL blocks which already conform to the RS-232 standard, and which you can add anywhere in your Quartus schematic layout which eventually becomes your chip.  Don't worry about the size of what you are doing for these smaller functions like gates, serial decoders and even small dual-port RAMs or FIFOs; I doubt you will fill even a few percent of the smallest FPGA.
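For concreteness, a minimal sketch of that kind of DFF shift register (written in VHDL here, with made-up entity and signal names; start/stop bits, framing and baud timing are left out):
Code: [Select]
-- Hypothetical example: shift one serial bit in per 'sample' pulse.
library ieee;
use ieee.std_logic_1164.all;

entity serial_shift_in is
    port (
        clk    : in  std_logic;
        sample : in  std_logic;                       -- '1' when rx_bit should be captured
        rx_bit : in  std_logic;                       -- serial data in
        data   : out std_logic_vector(7 downto 0)     -- parallel data out
    );
end entity;

architecture rtl of serial_shift_in is
    signal sr : std_logic_vector(7 downto 0) := (others => '0');
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if sample = '1' then
                sr <= rx_bit & sr(7 downto 1);        -- shift right, new bit into MSB
            end if;
        end if;
    end process;
    data <= sr;
end architecture;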

Learning to read the datasheets on these devices helps, so you understand what it means when they say the IC has a total of 256 kbit of RAM plus 50k logic gates, and how many IOs at what speed and voltage.

Also, look on YouTube for Quartus tutorial videos so you can see some examples of a user creating a device.  I can't speak for the quality or complexity of such videos, so search for beginner ones, try watching a few at 2x speed, and if something looks interesting, restart the video at 1x speed.

Best of luck.
 

Online BrianHG

  • Super Contributor
  • ***
  • Posts: 7738
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #3 on: June 15, 2017, 01:59:38 am »
Oh, one additional thing: Quartus II Web Edition is free and fully functional, and you can play with it without any programmer or IC.

When searching YouTube for Quartus tutorials, look for schematic entry & how to create things like RAM.
Don't worry about Quartus version numbers; it's basically the same thing from version 9 and up, with only minor visual improvements.

https://www.youtube.com/results?search_query=quartus+tutorial

You will find basic schematic entry and how to add Verilog code, as well as plenty of other stuff like simulation.  (Note that simulation has changed slightly across Quartus versions over the years; for that, look at how it's done in the latest Quartus version.)
« Last Edit: June 15, 2017, 02:03:27 am by BrianHG »
 

Offline hans

  • Super Contributor
  • ***
  • Posts: 1640
  • Country: nl
Re: Learning FPGAs: wrong approach?
« Reply #4 on: June 15, 2017, 06:44:22 am »
I wouldn't study the cells of an FPGA too long if you want to get started. The cell is the least common denominator, and they just put a lot of them on one chip so it is flexible to use.

What is most important to remember about a cell:

- Each cell has a LUT. Most entry-level devices have 4-input LUTs. This is what actually encodes the logic you programmed.
- Each cell has one flip-flop, i.e. one bit of high-speed 'memory'. This is rarely used for memory as such, but more for state information.

Then there are a ton of switches to route signals. Most FPGAs also contain an adder (carry) block in each cell, since adders tend to be used so often.
FPGAs also have hard logic these days - functions you could implement with cells, but because they are used so often the vendors have baked them onto the chip. The most common options are hardware multipliers and embedded SRAM blocks. More advanced FPGAs even integrate complete ARM Cortex CPUs or DDR controllers on-chip at fixed locations.
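As an aside, you usually don't instantiate those SRAM blocks by hand: a simple synchronous RAM written in plain HDL is normally recognized by the tools and mapped onto an embedded block. A minimal VHDL sketch (entity and signal names are made up):
Code: [Select]
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: 256 x 8 synchronous RAM, usually inferred as block RAM.
entity small_ram is
    port (
        clk  : in  std_logic;
        we   : in  std_logic;
        addr : in  unsigned(7 downto 0);
        din  : in  std_logic_vector(7 downto 0);
        dout : out std_logic_vector(7 downto 0)
    );
end entity;

architecture rtl of small_ram is
    type ram_t is array (0 to 255) of std_logic_vector(7 downto 0);
    signal ram : ram_t;
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if we = '1' then
                ram(to_integer(addr)) <= din;
            end if;
            dout <= ram(to_integer(addr));   -- registered read: block-RAM friendly
        end if;
    end process;
end architecture;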

Programming is best done in a high-level language. Although you could do it graphically, I wouldn't recommend it - it gets too tedious after a short while.

I started out with VHDL and am still doing that today. It's much more strongly typed, as opposed to Verilog which is loosely typed. Pick your poison.

In HDL you should describe how signals should behave and change at (clock) events. In VHDL processes you can actually write rather high level code, complete with functions and other abstractions. VHDL is actually a typical programming language in that respect: you can also write non-synthesizable code (useful for test benches for example).

I think one important trick to understand is how statements are synthesized to hardware. The RTL viewer is your friend. Usually if you write more complex statements, more hardware is added. If you try to do more computation in one go, combinational paths will be longer and thus the maximum clock frequency of the design will go down.
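A minimal sketch of that trade-off (a made-up multiply-accumulate example in VHDL): registering the intermediate result splits one long combinational path into two shorter ones, raising the achievable clock frequency at the cost of a cycle of latency.
Code: [Select]
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: compute (a * b) + c.
entity mac_demo is
    port (
        clk    : in  std_logic;
        a, b   : in  unsigned(7 downto 0);
        c      : in  unsigned(15 downto 0);
        result : out unsigned(15 downto 0)
    );
end entity;

architecture rtl of mac_demo is
    signal prod : unsigned(15 downto 0) := (others => '0');
begin
    -- Pipelined version: each clocked stage only has to settle a*b OR prod+c,
    -- not both, so the combinational paths are shorter and fmax is higher.
    process(clk)
    begin
        if rising_edge(clk) then
            prod   <= a * b;       -- stage 1
            result <= prod + c;    -- stage 2 (result arrives one cycle later)
        end if;
    end process;

    -- Doing it all in one go, e.g.  result <= (a * b) + c;  in a single clocked
    -- assignment, creates one long multiply-then-add path that must settle
    -- within a single clock period.
end architecture;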

This is actually no different than MCU design. You'll also look at the assembly. You'll also worry about execution times and sizes.
« Last Edit: June 15, 2017, 08:56:14 am by hans »
 

Offline AndyC_772

  • Super Contributor
  • ***
  • Posts: 4228
  • Country: gb
  • Professional design engineer
    • Cawte Engineering | Reliable Electronics
Re: Learning FPGAs: wrong approach?
« Reply #5 on: June 15, 2017, 07:55:14 am »
I agree that looking in too much detail at the underlying hardware structure of the FPGA isn't useful, beyond the high level parameters that tell you how much logic resource you have available.

For example, if an FPGA is sold with 10,000 logic cells, that means roughly 10,000 bits that can be registered and stored. The look-up tables associated with them mean you can have, to a good first approximation, any logical relationship between one set of bits and another that you like.

FPGAs also include 'hard' logic blocks, like dual-port RAM, multipliers and PLLs. These are useful, and you should definitely learn how to instantiate, configure and use them, but there are other things that are worth learning first and getting to grips with.

Don't use schematic entry to design your logic. Seriously. It's fine for academic purposes as a way to introduce the basic concept of a configurable system, but it's not portable, doesn't scale, takes much longer to do non-trivial designs, and is difficult to maintain. Walk away before you even start, and instead, learn VHDL (my preference) or Verilog.

By far the most important things to get your head around are:

- writing 'code' for an FPGA is not like writing code for a microprocessor. It's NOT a sequence of instructions to be executed one after the other, even if it occasionally looks as though it might be. Each process you create describes a piece of hardware which exists in parallel with all the others you've defined, and is always carrying out its prescribed function independently. Nothing at all inherently happens sequentially. If you want different things to happen on consecutive clock edges, you need to make sure something changes on one edge which can then be read and taken into account on the next edge.

- always, always, always be aware of when signals get updated, and when they are required to be valid with respect to other signals. Get to grips on day one with the concept of a clock domain. After lunch (but still on day one), read up on metastability, how it happens, and how to stop it becoming a problem in your design. Be in no doubt whatsoever that FPGA vendors and their tool chains do NOT solve this problem for you, but they do give you everything you need to solve it yourself.

I make a big deal of this because it's a ridiculously easy way to mess up a design, in a way which is not obvious to look at, and which causes occasional (or perhaps frequent) errors that make a board unreliable. Avoid a world of hurt later on by taking clock relationships seriously right from the very start.

Here's an example to spoil your day. Suppose you have an FPGA design which is a simple square wave generator, and the period of the square wave needs to be programmable.

Generating the wave is easy. You create a counter, clocked from some master reference clock (call it FCLK), which counts up from 0 to some programmed value CLK_PERIOD, and when it matches, you toggle the output and zero the counter.

CLK_PERIOD needs to be set, let's say via an SPI interface. So, you create a simple SPI slave, which can be written by an external microcontroller. That SPI slave is clocked by an external pin (SCK). The new value of CLK_PERIOD is updated when the last active edge of SCK is received.

Now consider what happens just at that moment. Let's suppose CLK_PERIOD is changed from 0x7F to 0x80.

On every edge of FCLK, the value of CLK_PERIOD is being read. That's fine if all the bits in CLK_PERIOD are actually valid at that instant. But what if SCK and FCLK have just the wrong phase relationship, so at the time the counter is being compared with it, some bits have their new values and some have the old value? And what if a bit is just on the point of changing, which makes the comparator metastable?

At best, you get a single cycle with the wrong period, ie. neither the old value nor the new one. At worst, your counter runs off into the weeds and it takes 2^n clocks before the output starts toggling again (where 'n' is the number of bits in your counter).

The logic of your code might be completely fine. Functional simulation will never show a problem. But once every few seconds, minutes or hours, your real hardware will malfunction.

Ways to solve this problem include:

- make FCLK fast enough that SCK can be sampled as though it's an asynchronous signal, ie. don't use it as a clock at all, but instead, look for changes of state in SCK in a process that's driven from FCLK (sketched below).

- use a dual-port RAM as a FIFO. Push setting changes into the FIFO from the SCK side, and read them out in the FCLK domain. The vendor's FIFO logic includes robust features for crossing clock domains. (Typically, this involves converting addresses to and from Gray codes, but you never see this in your own code as it's done for you when you instantiate a FIFO).
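To sketch the first of those fixes (a minimal, hypothetical VHDL fragment; port and signal names are invented): the SPI shift register lives entirely in the FCLK domain, SCK is re-timed through two flip-flops, and its rising edge is detected synchronously, so CLK_PERIOD is only ever updated on an FCLK edge. This assumes FCLK is several times faster than SCK; chip-select and framing are left out.
Code: [Select]
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch: sample SCK/MOSI as asynchronous inputs in the FCLK domain.
entity spi_period_capture is
    port (
        fclk       : in  std_logic;                    -- master reference clock
        sck        : in  std_logic;                    -- external SPI clock (not used as a clock here)
        mosi       : in  std_logic;
        clk_period : out unsigned(7 downto 0)          -- safe to compare against the period counter
    );
end entity;

architecture rtl of spi_period_capture is
    signal sck_meta, sck_q1, sck_q2 : std_logic := '0';
    signal shift_reg                : unsigned(7 downto 0) := (others => '0');
    signal bit_cnt                  : unsigned(2 downto 0) := (others => '0');
    signal clk_period_r             : unsigned(7 downto 0) := (others => '0');
begin
    process(fclk)
    begin
        if rising_edge(fclk) then
            -- two-flop synchronizer: sck_meta may go metastable, sck_q1/sck_q2 are clean
            sck_meta <= sck;
            sck_q1   <= sck_meta;
            sck_q2   <= sck_q1;

            -- rising edge of SCK as seen from the FCLK domain
            if sck_q1 = '1' and sck_q2 = '0' then
                shift_reg <= shift_reg(6 downto 0) & mosi;
                if bit_cnt = 7 then                    -- last bit of the byte
                    clk_period_r <= shift_reg(6 downto 0) & mosi;
                    bit_cnt      <= (others => '0');
                else
                    bit_cnt <= bit_cnt + 1;
                end if;
            end if;
        end if;
    end process;

    clk_period <= clk_period_r;                        -- only ever changes on an FCLK edge
end architecture;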
 
The following users thanked this post: hans, marshallh, Yansi, Joeri_VH

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Learning FPGAs: wrong approach?
« Reply #6 on: June 15, 2017, 09:34:34 am »
There are a few things that are worth thinking about. Or maybe it is just rambling.

FPGAs are not CPUs
CPUs hold their state in RAM and registers, and change this state slowly, at best only a few words at a time. FPGAs can change a lot of state all the time if you let them. If you code for FPGAs as though they are CPUs, then you will miss the point of FPGAs.

A design for an FPGA is static - for the most part you can't add more state information at 'runtime' - you do not have the FPGA equivalent of "malloc()" or "new" to add more logic. Instead try to think of your data flowing through your designs, much like how signals flow through a circuit.

Overthinking and Overdesigning
For the most part, FPGAs are just chips with inputs, outputs and clocks. Your job is to get the output pins to change as required by the inputs and the passage of time (as measured by the clocks). Your design does this by keeping track of information in a hidden, internal state vector, which evolves from cycle to cycle.

The simpler and more concise your description of how this happens, the better the end result will be.

It is easy to overthink the problem, esp for a newbie. Ask yourself often "is this the simplest way to do this?".

As a general rule deeply nested HDL code with lots of IF statements is bad. If you do this, you are missing something about the problem and should see if you can decompose the problem more.

A simple design is easier to debug than a complex one.

Structural or behavioral code
Learn the distinction between structural and behavioral design. Your design is made up of little bits that behave in a specific way, connected together to make your design. Try to keep these aspects of your designs as separate as possible, at least while you are starting out.

Structural is like designing a schematic or PCB - how things are connected. Graphical tools work well at this, and it works well when using IP blocks. But HDL code isn't too good at this - it gets too verbose.

Behavioral is like writing the 'model' that describes how an op-amp or other complex component works. This is somewhat hard to do graphically, but works well in HDL code.

If you try to describe how things are connected and how they behave at the same time, it doesn't work well in either code or in graphical tools. Avoid this ugly middle ground!

Different FPGAs from different vendors
If you ignore the more magical parts of an FPGA (e.g. PLLs), at the bottom of the heap are the FPGA's primitives. LUTs and FlipFlops, ALMs, Slices, whatever - these change between vendors and devices, but they are very simple and pretty generic.

You could build anything you want with enough four-input LUTs and D flip-flops. It might not be the most efficient, but you can do it. Likewise you can also build anything you like with an (impractically) wide enough RAM and a single address register (e.g. a 256x8 bit RAM can act as an 8-bit counter). For the most part, the different FPGA architectures have just decided to put a different stake in the ground along this continuum.

A design that is optimal for one FPGA architecture is usually pretty close to optimal for another - changing vendor or the part is not usually going to make your design much easier.

Loops
If you have written software for a while you have the power of unbounded loops fixed in your head. You need to retrain yourself to achieve your design goal without them.

This is like learning to write non-blocking code, but taking it to the next level. Don't try to be cunning and fight it or work around it - you can't win. :)

Time
Time in FPGA designs is very different from time in software or electronics. Everything happens all at once (in parallel), and yet things happen slowly (you can only do so much in one clock cycle).

Much like when you shift from DC to AC design in electronics, or start working with maths in the frequency domain, or move to writing real-time software, you didn't know how much you didn't know about what time really means until you finally get to grips with it. Once you do understand it, you can't see how it could really be any other way.
« Last Edit: June 15, 2017, 09:36:18 am by hamster_nz »
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 
The following users thanked this post: soFPG

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Learning FPGAs: wrong approach?
« Reply #7 on: June 15, 2017, 09:56:31 am »
PLLs

PLLs are bad beasts. They are only present in recent devices and they are usually vendor specific. E.g. Spartan2 (obsolete, but still good as it's 5V tolerant and therefore very useful for designing PCI boards) and Spartan3 don't have PLLs, whereas Spartan6 devices include a few of them, but in order to use them you need to pass through the IP wizards which automatically instantiate resources according to the user's requirements. Nothing wrong with this, it simply means the approach is specific to Xilinx (or Altera, or others...).

I say this because, from my point of view, everything related to HDL at the RTL level is just 'HDL' - I mean a couple of VHDL files plus constraints for a simulator.

I don't play with vendor tools until I have a working set of sources. Instead I spend 70% of the time in the simulator, where a few constraints are different from those required by the final target. This is the first thing one should learn as it's the main rule of the approach, and it also means I can't simulate PLLs (nor does it make sense to) until I move to the synthesizer, where the block is physically defined and properly instantiated.

In the simulator I usually describe the PLL block as a black-box entity, with its behavior idealized by a function (I can write it in C or Matlab and pass it to the simulator through wrappers).

In other words, playing with HDL is a mashup of pure logic behavior and PSpice. This is enough for a preliminary working set; then (the last 30% of your development time) you need to experiment and verify on the physical target whether the timing constraints are really all satisfied.
 

Offline chris_leyson

  • Super Contributor
  • ***
  • Posts: 1541
  • Country: wales
Re: Learning FPGAs: wrong approach?
« Reply #8 on: June 15, 2017, 10:08:26 am »
Quote
writing 'code' for an FPGA is not like writing code for a microprocessor. It's NOT a sequence of instructions to be executed one after the other
Good advice from AndyC_772 - it's an easy trap that new players can fall into. Always clock logic from a single clock source if you can, and never ever use asynchronous logic. Use clock enable inputs to slow down a master clock if your logic needs a slower clock source. Always register or latch input signals to avoid metastability issues, and if you need to cross clock boundaries then instantiate an asynchronous FIFO.
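To illustrate the clock-enable point, a minimal VHDL sketch (made-up names): the "slow" logic still runs from the single master clock and only advances when the enable is high, instead of being fed a separately generated slow clock.
Code: [Select]
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: a 1-in-100 clock enable derived from the master clock.
entity clk_enable_demo is
    port (
        clk      : in  std_logic;                     -- single master clock
        slow_out : out std_logic
    );
end entity;

architecture rtl of clk_enable_demo is
    signal div_cnt : unsigned(7 downto 0) := (others => '0');
    signal ce      : std_logic := '0';                -- clock enable, one clk period wide
    signal toggle  : std_logic := '0';
begin
    -- generate the enable: asserted for one clk cycle every 100 cycles
    process(clk)
    begin
        if rising_edge(clk) then
            if div_cnt = 99 then
                div_cnt <= (others => '0');
                ce      <= '1';
            else
                div_cnt <= div_cnt + 1;
                ce      <= '0';
            end if;
        end if;
    end process;

    -- the "slow" logic is still clocked by the master clock, but only advances when ce = '1'
    process(clk)
    begin
        if rising_edge(clk) then
            if ce = '1' then
                toggle <= not toggle;
            end if;
        end if;
    end process;

    slow_out <= toggle;
end architecture;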
These days you don't need to do 'bare metal' design and you don't need to focus on the internal architecture of a particular FPGA family; it's all done with core generators now, whereas back in the day, with older-generation silicon and limited resources, you might have had to hand-craft logic to save on them.
« Last Edit: June 15, 2017, 10:35:10 am by chris_leyson »
 

Offline mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 13748
  • Country: gb
    • Mike's Electric Stuff
Re: Learning FPGAs: wrong approach?
« Reply #9 on: June 15, 2017, 10:11:35 am »
Always remember that you are not writing code, you are building hardware. Think about how you would use logic ICs to do the job.
Even things that look like sequential code, specifically VHDL process blocks, express priority, not sequence.
You don't need to know anything about LUTs, Slices etc. until you're getting into advanced optimisation.
 
Start with a devboard that has a fairly big device. Even for simple designs, place & route will be faster - you can just ignore the stuff you won't be using.
You will be using state machines a lot, so you need to understand them.
Youtube channel:Taking wierd stuff apart. Very apart.
Mike's Electric Stuff: High voltage, vintage electronics etc.
Day Job: Mostly LEDs
 

Offline AndyC_772

  • Super Contributor
  • ***
  • Posts: 4228
  • Country: gb
  • Professional design engineer
    • Cawte Engineering | Reliable Electronics
Re: Learning FPGAs: wrong approach?
« Reply #10 on: June 15, 2017, 10:19:41 am »
legacy, it sounds as though you're making life difficult for yourself. Do you really not know which device - or at least, family of devices - a given design will target? And what do you have against PLLs? They're essential tools that have been present in every major device family for the last decade.

My preferred family is Altera Cyclone IV E. Older parts are functionally similar but less good in every quantifiable way. Cyclone V parts are bigger and more costly, and Cyclone 10 isn't yet readily available. I probably could switch to another vendor, but in the absence of a compelling reason to do so, it would be a lot of work for little or no benefit.

With that in mind, I usually start a new design along the following lines...

- how many I/O's do I need? Design the rest of the schematic, then see how many pins end up on the empty FPGA page. Add a few more for test points, and for the feature I'll find I need by the time the design is at rev C.

- create a dummy FPGA project in Quartus, with all the I/O pins defined with their correct direction and I/O standard (LVDS, 3.3V CMOS, 1.8V CMOS and so on).

- allocate the pin-out of the device, ensuring all the rules about which pins can go where are met. For example, on Cyclone IV E, LVDS pins can only go in a 2.5V bank, differential inputs must be at least a certain number of pads away from single-ended outputs, and so on.

- estimate the logic capacity requirement of the design. This is hard. Often I'll actually write a complete first draft of the code at this point, and hold off completing the PCB until it's done. It's amazing how many bugs get spotted and fixed at this stage.

- finish the PCB and send off for manufacturing

- simulate the VHDL in ModelSim. The Altera free version of this includes complete behavioural simulation models for all the hard IP blocks (memory, PLLs, DSP and so on), so I can simulate the entire system without having to worry about these. Yes, it's vendor specific, but so is my board, so I really don't care.

- debug the VHDL. This is probably the most coffee intensive part of the whole process, up to this point.

- write the SDC file. This takes over as the most coffee intensive part of the process.

- start writing test code for the main processor. Keep doing this until real hardware arrives.

- when real hardware arrives, plug it in and begin testing. By this point, I should already know that my FPGA will, to a good first approximation, work as intended. Minor changes and enhancements can be tested functionally on real hardware. More significant changes require a return to ModelSim.
« Last Edit: June 15, 2017, 10:23:44 am by AndyC_772 »
 

Offline mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 13748
  • Country: gb
    • Mike's Electric Stuff
Re: Learning FPGAs: wrong approach?
« Reply #11 on: June 15, 2017, 10:29:20 am »
The first hardware you should use when starting out with FPGAs should ALWAYS be a devboard. Any devboard. Even better if it has an on-board programmer. Most manufacturers ( and some third parties) do very cheap boards for most FPGA families.

There are so many other things that can make it not work, that you really don't want to be wasting time messing around trying to figure out if it's a hardware, programming  or code  problem.
Youtube channel:Taking wierd stuff apart. Very apart.
Mike's Electric Stuff: High voltage, vintage electronics etc.
Day Job: Mostly LEDs
 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Learning FPGAs: wrong approach?
« Reply #12 on: June 15, 2017, 11:26:41 am »
They're essential tools that have been present in every major device family for the last decade.

Does Spartan2 have pll ? No!
Does Spartan3 have pll ? No!
Does Spartan6 have pll ? Yes!

Is Spartan2 5V tolerant? Yes!
Is Spartan3 5V tolerant? No!
Is Spartan6 5V tolerant? No!

Before babysitting people, understand what people need.

 

Offline mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 13748
  • Country: gb
    • Mike's Electric Stuff
Re: Learning FPGAs: wrong approach?
« Reply #13 on: June 15, 2017, 11:33:10 am »
They're essential tools that have been present in every major device family for the last decade.

Does Spartan2 have pll ? No!
Does Spartan3 have pll ? No!

Spartan 2 & 3 have DLLs, which serve the same purpose.
Youtube channel:Taking wierd stuff apart. Very apart.
Mike's Electric Stuff: High voltage, vintage electronics etc.
Day Job: Mostly LEDs
 
The following users thanked this post: hans, Someone

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #14 on: June 15, 2017, 02:37:47 pm »
I have used a simulator only once and it was a simple 4 bit counter.  I just had to try it...

If my primary state machine has, say, 100 states (which uses a 100-bit 1-HOT state vector) and controls a few dozen outputs that kick off other processes, I just can't see how the simulator is going to help.  It may be several thousand cycles in before I get to the part I want to see.  I may actually be using a logic analyzer at the FPGA level to analyze a hang in the operating system.  Maybe something to do with booting the system from the Compact Flash.  There are a lot of cycles before I get this far.  What sector did I read?  What did the data look like?  No, printf() is not a solution here!

So, I try to use a board with enough IO to feed a fairly wide logic analyzer.  Now I can create some kind of trigger that starts the capture just before the spot I am interested in and not have to wade through a bazillion nanoseconds of trace.

Bottom line:  I head straight to hardware.  This takes time because the system has to be fully synthesized, placed and routed.  It is probably not the most productive way to design FPGA projects but it works for me in my hobby world.

I don't have enough time with Vivado and a real project to know if the in-circuit logic analyzer will be a help.  From what I have seen, it is a really nice feature.  Once you become an uber-guru of the constraints file.  What a PITA!

I have seen designs where the code writes straight to the LUT.  All of the logic is specified around LUTS and DFFs.  I don't tend to understand it...

I write my VHDL as though I want to understand it several years later.  Just simple code, no tricky bits.  I let the toolchain worry about the details.  Again, this is not the high performance approach but it works for me.

 

Offline mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 13748
  • Country: gb
    • Mike's Electric Stuff
Re: Learning FPGAs: wrong approach?
« Reply #15 on: June 15, 2017, 02:41:45 pm »
I'd agree that to get something happening quickly & get a feel for things, avoiding simulation is probably a good start as it's yet another tool to learn before you see anything working.
You may or may not choose to use it later on - personally I've never used one.
You need to trade off the time setting it all up versus the savings in compile/program time not having to do place & route every iteration.
Youtube channel:Taking wierd stuff apart. Very apart.
Mike's Electric Stuff: High voltage, vintage electronics etc.
Day Job: Mostly LEDs
 

Offline free_electron

  • Super Contributor
  • ***
  • Posts: 8517
  • Country: us
    • SiliconValleyGarage
Re: Learning FPGAs: wrong approach?
« Reply #16 on: June 15, 2017, 03:03:53 pm »
To start, both Altera and Xilinx are the top dogs here.  They both can be programmed in VHDL and Verilog.
and schematic capture as well, or ABEL or AHDL.

As an FPGA designer you don't deal with the 'guts' of the FPGA; that is handled by the synthesizer and mapper.
Simply make your schematic/code, or a mix thereof, click compile and blast it into the chip. Done.
The tools come with extensive libraries with almost anything you can think of (including whole CPUs and peripherals).
Professional Electron Wrangler.
Any comments, or points of view expressed, are my own and not endorsed , induced or compensated by my employer(s).
 

Offline Rasz

  • Super Contributor
  • ***
  • Posts: 2616
  • Country: 00
    • My random blog.
Re: Learning FPGAs: wrong approach?
« Reply #17 on: June 15, 2017, 03:37:13 pm »
The internal stuff gets important when optimizing your design, when you are a tight-ass with a low budget, or when you need every last MHz out of it.

http://zipcpu.com/blog/2017/06/12/minimizing-luts.html

If you don't care about $ you can "program" FPGAs in Python, or even Go like the JavaScript-fed kids do these days: https://reconfigure.io/ (on a $3K dev board, haha)
Who logs in to gdm? Not I, said the duck.
My fireplace is on fire, but in all the wrong places.
 

Offline julian1

  • Frequent Contributor
  • **
  • Posts: 735
  • Country: au
Re: Learning FPGAs: wrong approach?
« Reply #18 on: June 15, 2017, 10:54:08 pm »
I like the Lattice iCE40 FPGAs. I believe they're the only ones where the "guts" - meaning the LUTs and interconnect as well as the bitstream format - have actually been reverse engineered and documented, by Clifford Wolf. If you want to tinker at that low level, the code is available as an example. I don't believe any of the other vendors - e.g. Xilinx or Altera - document this stuff. In fact the opposite is true, and it's all encumbered by patent protections.

It helps that there's a lightweight and open-source Verilog compiler and place-and-route available - sufficiently mature to synthesize the pico RISC-V core. From memory, I believe that toolchain even beats Lattice's proprietary toolchain in terms of reduced LUT counts and better timing.
 

Online Sal Ammoniac

  • Super Contributor
  • ***
  • Posts: 1672
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #19 on: June 15, 2017, 11:26:07 pm »
I like the Digilent FPGA dev boards. They have a good selection and they're not too costly.

Don't worry about how the FPGA fabric works at first--you really don't need to know those low-level details as a beginner. Later, when you have more experience, you can explore the inner workings of the part.

Simulation is your friend. If something doesn't work in simulation, it's not likely to work on the chip. Learn how to write test benches at the same time you learn how to write Verilog or VHDL code. You'll save lots of time in the long run.
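For example, a bare-bones test bench skeleton looks something like the sketch below (the DUT entity "mydesign" and its ports are hypothetical); the idea is just to generate a clock, wiggle the inputs, and watch the waveforms:
Code: [Select]
library ieee;
use ieee.std_logic_1164.all;

entity tb_mydesign is
end entity;                         -- a test bench has no ports

architecture sim of tb_mydesign is
    signal clk  : std_logic := '0';
    signal rst  : std_logic := '1';
    signal din  : std_logic := '0';
    signal dout : std_logic;
begin
    -- device under test (assumed entity "work.mydesign" with these ports)
    dut: entity work.mydesign
        port map (clk => clk, rst => rst, din => din, dout => dout);

    clk <= not clk after 10 ns;     -- 50 MHz clock, runs forever

    stimulus: process
    begin
        wait for 50 ns;
        rst <= '0';                 -- release reset
        din <= '1';
        wait for 100 ns;
        din <= '0';
        wait for 200 ns;
        assert false report "end of simulation" severity failure;  -- stop the simulator
    end process;
end architecture;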

You probably already have a good grasp of state machines, but if not, bone up on them because you'll be using them a lot when working with FPGAs.

Start with simple projects, like a serial UART or SPI interface, before trying to tackle something like HDMI.

Strive for simplicity. Complex designs are rarely the best designs.
Complexity is the number-one enemy of high-quality code.
 

Offline Cerebus

  • Super Contributor
  • ***
  • Posts: 10576
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #20 on: June 16, 2017, 12:50:49 am »
Off in another thread:
Quote
  Do you know if there's any document which would describe the structure of the UDB in details?
(UDBs are the little FPGA-like block in a Cypress PSoC microcontroller.  But this question is generic to all FPGAs, CLPDs, and similar devices.)

So I (a software engineer with an EE degree from pre-FPGA times) have looked at various FPGAs at various times, and there always seems to be a hump that I have trouble getting over.   And I'm wondering if that's because of where I start - with the datasheet that describes the internal structure of the device. 

Yah, not the best place, as you've figured out for yourself.

If you're old school I presume you shouldn't have any trouble with the actual logic design for what you want to do. So start there. Start with the familiar and work towards the unfamiliar.

Design your logic with discrete flip-flops, registers, gates, whatever you need - but don't reach for your dog-eared TI 7400 logic guide; just invent your own parts as you need them, because you can have any part you want. With the magic of HDLs and FPGAs you can make, and interconnect, those parts inside the FPGA. Seventeen-bit adder? No problem. Twenty-nine input 'and' gate? No problem. You get the idea.

Next step would be to learn one of the HDLs. My recommendation would be for Verilog, but in 30 seconds there will be 10 fan-boys coming along to tell you that I'm muddle-headed and VHDL is the only true way to the light. (If HDLs were church denominations VHDL would be the Calvinists and Verilog the Pentecostal Baptists; although I suspect some of the VHDL guys would quite like to find a Plymouth Brethren HDL.  :))

Once you've got the beginning of a grip on your chosen HDL, take the 'discrete' design you already made and implement the discrete parts you 'made up' in HDL, interconnect them in HDL, scribble a little HDL test-bed and hit the simulator.

In the process of doing this I think you'll find what I did, that you start thinking in HDL instead of flip-flops, gates, etc and you'll start to be able to do your design work directly in HDL.

The vendor tools for actual FPGAs can be quite a struggle to set up and get running with - not what you want at the 'hello world' stage. I'd recommend that if you're going Verilog that you grab the open source Icarus iverilog simulator and have a play with that and get yourself comfortable with some working results in simulation before you try and get them anywhere near an actual FPGA. If you want to go VHDL I'm sure someone can point you at some tools.

But otherwise ... it's like when you think about designing a SW algorithm, reading up on the multiple internal buses of your microcontroller is not the best starting point...

I'd prefer cache as the programming analogy. Some algorithms are going to suck unless you understand cache coherency, cache occupancy etc. You can always get something that works on any architecture, but getting it working well may mean tuning for the cache implementation on each architecture that you run it on.

Similarly, you can probably get a design in Verilog to work on any FPGA, but you may need to dig into the specific FPGA architecture to get it to work well, or to fit it into a smaller-capacity chip, etc.

One area where it is worth knowing the ins and outs of your particular FPGA is literally the ins and the outs. You can save quite a lot of grief by understanding how to use the I/O cells to your advantage, and how to get the right clock into the right pin and distributed around inside the FPGA the right way.

As a software engineer you're going to have to keep reminding yourself that the HDL you're writing represents wires, gates and registers. It's all parallel and any temptation to fall back onto classic iterative programing habits will bite you in the backside. Every time your instincts tell you to write a for loop you're almost always looking for a Mealy/Moore state machine instead.
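To make that concrete, a minimal sketch (shown in VHDL, with invented names) of the mental translation: a software-style "for i in 0..15: sum += byte[i]" becomes a little state machine that processes one element per clock cycle.
Code: [Select]
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: sum 16 bytes, one per clock, instead of "looping".
entity sum_bytes is
    port (
        clk   : in  std_logic;
        start : in  std_logic;
        byte  : in  unsigned(7 downto 0);            -- byte(index), supplied by a RAM/stream
        index : out unsigned(3 downto 0);            -- which byte we are asking for
        sum   : out unsigned(11 downto 0);
        done  : out std_logic
    );
end entity;

architecture rtl of sum_bytes is
    type state_t is (IDLE, RUNNING, FINISHED);
    signal state : state_t := IDLE;
    signal i     : unsigned(3 downto 0)  := (others => '0');
    signal acc   : unsigned(11 downto 0) := (others => '0');
begin
    process(clk)
    begin
        if rising_edge(clk) then
            case state is
                when IDLE =>
                    if start = '1' then
                        i     <= (others => '0');
                        acc   <= (others => '0');
                        state <= RUNNING;
                    end if;
                when RUNNING =>                      -- one "loop iteration" per clock
                    acc <= acc + byte;
                    if i = 15 then
                        state <= FINISHED;
                    else
                        i <= i + 1;
                    end if;
                when FINISHED =>
                    state <= IDLE;
            end case;
        end if;
    end process;

    index <= i;
    sum   <= acc;
    done  <= '1' when state = FINISHED else '0';
end architecture;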
Anybody got a syringe I can use to squeeze the magic smoke back into this?
 
The following users thanked this post: chickenHeadKnob

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Learning FPGAs: wrong approach?
« Reply #21 on: June 16, 2017, 08:02:07 am »
if you dont care about $ you can "program" fpgas in python, or even GO like javascript fed kids do these days https://reconfigure.io/ (on $3K dev board haha)

LOL  :-DD :-DD :-DD
 

Offline jprozas

  • Newbie
  • Posts: 3
  • Country: es
Re: Learning FPGAs: wrong approach?
« Reply #22 on: June 16, 2017, 11:14:47 am »
This link is for learning digital design with FPGAs (Verilog) using open tools. (In Spanish.)

https://github.com/Obijuan/open-fpga-verilog-tutorial/wiki

Sent from my Aquaris_A4.5 using Tapatalk
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #23 on: June 16, 2017, 02:27:09 pm »
if you dont care about $ you can "program" fpgas in python ...

You probably think you're joking, but human madness is already past that. Xilinx marketers take python very seriously, and even "scientists" from California University believe that python is the most efficient way to program FPGAs:

https://forums.xilinx.com/t5/Xcell-Daily-Blog/Best-Short-Paper-at-FCCM-2017-gets-30x-from-Python-based-PYNQ/ba-p/765899

https://arxiv.org/pdf/1705.05209.pdf
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #24 on: June 16, 2017, 02:37:52 pm »

You probably already have a good grasp of state machines, but if not, bone up on them because you'll be using them a lot when working with FPGAs.


A state machine is just a C switch statement inside the while(1) loop.  But, just ahead of the switch(), you need to define a default output state for every output signal you create.  Otherwise, you have to define the output state of every signal at every state.

Like this:
Code: [Select]

    process(Reset,Clk) is
    begin
        if Reset = '1' then
            state <= s0;
        elsif rising_edge(clk) then
            state <= NextState;
        end if;
    end process;


    process (state, FullEA, FetchOpnd, F, TAG, IA, CO, OFL, OVFLInd, COtemp, CSET, VSET,
                r_Button0, CCC, CondMet, BOSC_Flag, SavedSign, A_BUS(15), ShiftCount,
                SZ, ZR, DVDS, Result, Ones, OVR,
                CountShifts, ACC, IncludeEXT, EXTN, Rotate, AFR,
                BitCount, XIO_Device, XIO_Function, XIO_Modifier,
                DisplaySwitch,
                ConsoleXIOCmdBusy, ConsoleXIOCmdAck,
                PrinterXIOCmdAck, PrinterXIOCmdBusy,
                ReaderXIOCmdBusy, ReaderXIOCmdAck,
                DiskXIOCmdBusy, DiskXIOCmdAck,
                DiskReady, IAR,
                SingleStep, BreakPointActive, BreakPoint,
                PendingInterrupt, ReturnState_r, StartState) is
    begin
        A_BusCtrl <= A_BUS_NOP;
        ACC_Ctrl <= ACC_NOP;
        ACC_ShiftIn <= '0';
        Add <= '0';
        AFR_Ctrl <= AFR_NOP;
        BitCountCtrl <= BitCount_NOP;
        CI <= '0';
        CIn <= '0';
        CIX <= '0';
        CarryIndCtrl <= CARRY_IND_NOP;

        <and so on...>
       
        case state is
            when s0    => NextState <= s0a; -- use this to IPL
            when s0a  => if DiskReady = '0' then -- wait for disk to go not ready
                                      NextState <= s0b;
                                else
                                      NextState <= s0a;
                                end if;
            when s0b => if DiskReady = '1' and ColdstartHold = '0' then -- wait for disk to go ready and
                                                                                                     -- coldstart code to be copied

            <and so on>
 

There are two processes to create this FSM:  The first just changes the state according to the NextState value on every clock cycle.  In the case of a loop, the state may not actually change.  See the second process...

The second process does all the work and it is not clocked.  It is just a huge collection of combinatorial logic.

Here I defined default outputs for 10 signals (although they aren't shown in the snippet of FSM code).  In the real world, there are 49 of these default outputs and 117 states.

I didn't say anything about the 'sensitivity list' that starts out as
Code: [Select]
process (state, FullEA, FetchOpnd, F, TAG, IA, CO, OFL, OVFLInd, COtemp, CSET, VSET,

This sensitivity list tells the simulator which signals to monitor to decide to actually run the process.  If there are no changes to any signals in the list, the simulator won't evaluate the process.

This list is meaningless to synthesis but the synthesizer will whine if an input signal to the process is undeclared.  But it's just whine and snivel, the output works with or without the list.
« Last Edit: June 16, 2017, 02:44:42 pm by rstofer »
 

Offline Cerebus

  • Super Contributor
  • ***
  • Posts: 10576
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #25 on: June 16, 2017, 02:50:06 pm »
if you dont care about $ you can "program" fpgas in python ...

You probably think you're joking, but human madness is already past that. Xilinx marketers take python very seriously, and even "scientists" from California University believe that python is the most efficient way to program FPGAs:

https://forums.xilinx.com/t5/Xcell-Daily-Blog/Best-Short-Paper-at-FCCM-2017-gets-30x-from-Python-based-PYNQ/ba-p/765899

https://arxiv.org/pdf/1705.05209.pdf

I think you're jumping to conclusions there. The first URL is talking about using Python (running an a dedicated or soft processor on the FPGA) with pre-packaged bitstreams for the FPGA fabric. So it's just about using Python to interface to things implemented on the FPGA. Just a short way down the page you'll find this quote:

Quote
PYNQ does not currently provide or perform any high-level synthesis or porting of Python applications directly into the FPGA fabric. As a result, a developer still must use create a design using the FPGA fabric. While PYNQ does provide an Overlay framework to support interfacing with the board’s IO, any custom logic must be created and integrated by the developer.

They do indirectly reference a Python to HDL tool, but the thrust of that page (and paper) is not on programming the FPGA fabric in Python.
Anybody got a syringe I can use to squeeze the magic smoke back into this?
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #26 on: June 16, 2017, 03:10:22 pm »

You probably already have a good grasp of state machines, but if not, bone up on them because you'll be using them a lot when working with FPGAs.


A state machine is just a C switch statement inside the while(1) loop.  But, just ahead of the switch(), you need to define a default output state for every output signal you create.  Otherwise, you have to define the output state of every signal at every state.

Like this:
Code: [Select]

    process(Reset,Clk) is
    begin
        if Reset = '1' then
            state <= s0;
        elseif rising_edge(clk) then
            state <= NextState;
        end if;
    end process;


    process (state, FullEA, FetchOpnd, F, TAG, IA, CO, OFL, OVFLInd, COtemp, CSET, VSET,
                r_Button0, CCC, CondMet, BOSC_Flag, SavedSign, A_BUS(15), ShiftCount,
                SZ, ZR, DVDS, Result, Ones, OVR,
                CountShifts, ACC, IncludeEXT, EXTN, Rotate, AFR,
                BitCount, XIO_Device, XIO_Function, XIO_Modifier,
                DisplaySwitch,
                ConsoleXIOCmdBusy, ConsoleXIOCmdAck,
                PrinterXIOCmdAck, PrinterXIOCmdBusy,
                ReaderXIOCmdBusy, ReaderXIOCmdAck,
                DiskXIOCmdBusy, DiskXIOCmdAck,
                DiskReady, IAR,
                SingleStep, BreakPointActive, BreakPoint,
                PendingInterrupt, ReturnState_r, StartState) is
    begin
        A_BusCtrl <= A_BUS_NOP;
        ACC_Ctrl <= ACC_NOP;
        ACC_ShiftIn <= '0';
        Add <= '0';
        AFR_Ctrl <= AFR_NOP;
        BitCountCtrl <= BitCount_NOP;
        CI <= '0';
        CIn <= '0';
        CIX <= '0';
        CarryIndCtrl <= CARRY_IND_NOP;

        <and so on...>
       
        case state is
            when s0    => NextState <= s0a; -- use this to IPL
            when s0a  => if DiskReady = '0' then -- wait for disk to go not ready
                                      NextState <= s0b;
                                else
                                      NextState <= s0a;
                                end if;
            when s0b => if DiskReady = '1' and ColdstartHold = '0' then -- wait for disk to go ready and
                                                                                                     -- coldstart code to be copied

            <and so on>
 

There are two processes to create this FSM:  The first just changes the state according to the NextState value on every clock cycle.  In the case of a loop, the state may not actually change.  See the second process...

The second process does all the work and it is not clocked.  It is just a huge collection of combinatorial logic.

Here I defined default outputs for 10 signals (although they aren't shown in the snippet of FSM code).  In the real world, there are 49 of these default outputs and 117 states.

I didn't say anything about the 'sensitivity list' that starts out as
Code: [Select]
process (state, FullEA, FetchOpnd, F, TAG, IA, CO, OFL, OVFLInd, COtemp, CSET, VSET,

This sensitivity list tells the simulator which signals to monitor to decide to actually run the process.  If there are no changes to any signals in the list, the simulator won't evaluate the process.

This list is meaningless to synthesis but the synthesizer will whine if an input signal to the process is undeclared.  But it's just whine and snivel, the output works with or without the list.
This is pretty bad coding because it is prone to creating latches. As a rule of thumb you only have 2 signals at most in the sensitivity list of a process: clock and reset. If there are other signals then it smells fishy. The problem is likely better solved using a function instead of a process.
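To show what that latch risk looks like, a minimal, hypothetical VHDL example: an output that isn't assigned on every path through a combinational process has to hold its old value, which infers a latch; a default assignment at the top of the process, as in the quoted code, is one way to close every path.
Code: [Select]
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example of the latch problem being discussed.
entity latch_demo is
    port (
        busy      : in  std_logic;
        req       : in  std_logic;
        grant_bad : out std_logic;   -- version that infers a latch
        grant_ok  : out std_logic    -- version with a default assignment
    );
end entity;

architecture rtl of latch_demo is
begin
    -- Combinational process with a missing else branch: when the condition is
    -- false, grant_bad must "remember" its old value, so a latch is inferred.
    bad: process(busy, req)
    begin
        if busy = '1' and req = '1' then
            grant_bad <= '1';
        end if;
    end process;

    -- Same logic with a default value up front: every path assigns the output,
    -- so only plain combinational logic is inferred.
    good: process(busy, req)
    begin
        grant_ok <= '0';             -- default, overridden below where needed
        if busy = '1' and req = '1' then
            grant_ok <= '1';
        end if;
    end process;
end architecture;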
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #27 on: June 16, 2017, 04:01:13 pm »
They do indirectly reference a Python to HDL tool, but the thrust of that page (and paper) is not on programming the FPGA fabric in Python.

This is all semantics.

"At a system-level the skill set necessary to integrate multiple custom IP hardware cores, interconnects, memory interfaces, and now heterogeneous processing elements
is complex. Rather than drive FPGA development from the hardware up, we consider the impact of leveraging Python to accelerate application development."

This may not look as FPGA programming to you, but it is to them. Certainly, as they say, they're only at the beginning of that road, but they're on that road.

Note that the fabric per-se doesn't even appear in their description of the pre-Python FPGA programming. For them, FPGA programming is merely integration and interconnection of IPs. They see Python as the way forward to replace the process.

 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Learning FPGAs: wrong approach?
« Reply #28 on: June 16, 2017, 04:15:31 pm »
Xilinx marketers take python very seriously, and even "scientists" from California University believe that python is the most efficient way to program FPGA

So, we have HDL which means Hardware Description Language, and we need to use python?
Does it make sense?

 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #29 on: June 16, 2017, 04:23:16 pm »

This is pretty bad coding because it is prone to creating latches. As a rule of thumb you only have 2 signals at most in the sensitivity list of a process: clock and reset. If there are other signals then it smells fishy. The problem is likely better solved using a function instead of a process.

The point of defining default output values before the case statement is to guarantee that latches are NOT inferred.  In any event, XST complains when latches are inferred.  Just fix the problem and move on.

I guess I don't see the point of replacing a simple case structure with a multitude of functions, although I have seen similar implementations.  In my implementation, I can see by looking at a particular case exactly what outputs I am setting up, and it will only be a small subset of the 49 declared.  When I want to know what happens in each step of the Divide instruction, it is all in one place.  Sure, it takes 7 states but they are all written together, as neighbors, not split over several functions.  I could instead create a function for the Load Accumulator signal, for example, but that would require some kind of logic based on 18 values of the 'state' vector.  Basically, a big OR statement on the 'state' vector.  But that scatters the logic all over the place!

Actually, it wouldn't work well because my accumulator process takes 8 different values of the ACC_Ctrl signal to determine what it should do at each clock.  These need to be mutually exclusive and I can't imagine using an 'if-endif' on 8 discrete signals.

Code: [Select]
process(Reset, Clk, ACC_Ctrl)
begin
    if Reset = '1' then
        ACC <= (others => '0');
    elsif Clk'event and Clk = '1' then
        case ACC_Ctrl is
            when ACC_NOP         => null;
            when ACC_LOAD        => ACC <= A_BUS;
            when ACC_AND         => ACC <= ACC and A_BUS;
            when ACC_OR          => ACC <= ACC or A_BUS;
            when ACC_EOR         => ACC <= ACC xor A_BUS;
            when ACC_SHIFT_LEFT  => ACC <= ACC(14 downto 0) & ACC_ShiftIn;
            when ACC_SHIFT_RIGHT => ACC <= ACC_ShiftIn & ACC(15 downto 1);
            when ACC_XCHG        => ACC <= EXTN;
            when others          => null;
        end case;
    end if;
end process;

But, yes, it is possible to create functions using the 'state' vector as one of the parameters.

There's always another way...

 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #30 on: June 16, 2017, 04:37:20 pm »
Xilinx marketers take python very seriously, and even "scientists" from California University believe that python is the most efficient way to program FPGA

So, we have HDL which means Hardware Description Language, and we need to use python?
Does it make sense?

Not in my world!

One of my favorite quotes (in "A Compiler Generator" McKeeman, Horning & Wortman, 1970, page 11):
Quote

"It is possible by ingenuity and at the expense of clarity..[to do almost anything in any language].  However, the fact that it is possible to push a pea up a mountain with your nose does not mean that this is a sensible way of getting it there.  Each of these techniques of language extension should be used in its proper place."

Christopher Strachey
NATO Summer School in Programming (1969?)
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #31 on: June 16, 2017, 04:43:29 pm »
So, we have HDL which means Hardware Description Language, and we need to use python?
Does it make sense?

It doesn't.

But you cannot explain this to The Python programmer. VHDL programming would look totally bizarre to him, because programming in Python is easy (whatever that means).

Similarly, programming by connecting individual LUTs and FFs would look bizarre to a VHDL programmer (such as yourself). If you can imagine the feeling, you know how The Python programmer would feel about the VHDL.
 

Offline Bruce Abbott

  • Frequent Contributor
  • **
  • Posts: 627
  • Country: nz
    • Bruce Abbott's R/C Models and Electronics
Re: Learning FPGAs: wrong approach?
« Reply #32 on: June 16, 2017, 04:58:40 pm »
So, we have HDL which means Hardware Description Language, and we need to use python?
Does it make sense?
Yes, it makes perfect sense.

“The combining of both Python software and FPGA’s performance potential is a significant step in reaching a broader community of developers, akin to Raspberry Pi and Arduino. This work studied the performance of common image processing pipelines in C/C++, Python, and custom hardware accelerators to better understand the performance and capabilities of a Python + FPGA development environment. The results are highly promising, with the ability to match and exceed performances from C implementations, up to 30x speedup."

This is what we were promised 20 years ago - the ability to accelerate software performance using reconfigurable hardware.  But instead we just got faster general-purpose CPUs.


 

Offline Cerebus

  • Super Contributor
  • ***
  • Posts: 10576
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #33 on: June 16, 2017, 06:04:26 pm »
They do indirectly reference a Python to HDL tool, but the thrust of that page (and paper) is not on programming the FPGA fabric in Python.

This is all semantics.

It's got nothing to do with semantics which is "the branch of linguistics and logic concerned with meaning". That phrase would only make sense if we were quibbling over the precise meaning of words, which we weren't.

This may not look as FPGA programming to you, but it is to them.

The article says quite explicitly what they are doing and makes it explicitly clear that that does not include trying to program the FPGA fabric in Python, so I can see no basis for your assertion. It is quite clear that they understand the difference between programming the FPGA fabric and building an application framework around the FPGA in Python. It doesn't fit your narrative of 'Ho, ho, look at them, they think you can program an FPGA in Python'.
Anybody got a syringe I can use to squeeze the magic smoke back into this?
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #34 on: June 16, 2017, 06:28:23 pm »
It's got nothing to do with semantics which is "the branch of linguistics and logic concerned with meaning". That phrase would only make sense if we were quibbling over the precise meaning of words, which we weren't.

We are. The words are "programming FPGAs". You interpret them as "programming the fabric with VHDL or the like". The other, broader meaning is "building applications with FPGAs".

It doesn't fit your narrative of 'Ho, ho, look at them, they think you can program an FPGA in Python'.

That's not my narrative. My narrative is:

"Look. Python came to FPGAs too. The guys who are deemed to be scientists, but in fact know very little, write papers where they misinterpret their own facts and come to a wrong conclusions about Python efficiency and suitability. This false interpretation is spread and promoted by Xilinx as an established fact. Now more people will believe all this gibberish. So sad."

 

Offline Howardlong

  • Super Contributor
  • ***
  • Posts: 5319
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #35 on: June 16, 2017, 06:28:56 pm »

So I (a software engineer with an EE degree from pre-FPGA times) have looked at various FPGAs at various times, and there always seems to be a hump that I have trouble getting over.   And I'm wondering if that's because of where I start - with the datasheet that describes the internal structure of the device.

I was in A very similar boat to you, of a certain vintage EE well before FPGAs.

I must've had a dozen false starts. I (usually) could get as far as running a demo on a dev board and getting a tool chain to work as a script monkey, but beyond that I floundered. There was still a gap between what I wanted to do, and where the tutorials finished.

This was the video that changed my understanding and got me in a position to write my own HDL:



 

Offline Cerebus

  • Super Contributor
  • ***
  • Posts: 10576
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #36 on: June 16, 2017, 06:58:50 pm »
It's got nothing to do with semantics which is "the branch of linguistics and logic concerned with meaning". That phrase would only make sense if we were quibbling over the precise meaning of words, which we weren't.

We are. The words are "Programming FPGA". You interpret them as "Programming fabric with VHDL or alike". The other, broader meaning is "Building applications with FPGA".

It doesn't fit your narrative of 'Ho, ho, look at them, they think you can program an FPGA in Python'.

That's not my narrative. My narrative is:

"Look. Python came to FPGAs too. The guys who are deemed to be scientists, but in fact know very little, write papers where they misinterpret their own facts and come to a wrong conclusions about Python efficiency and suitability. This false interpretation is spread and promoted by Xilinx as an established fact. Now more people will believe all this gibberish. So sad."

I think I'll just leave it at: People can follow the link and decide for themselves what the authors said and whether it agrees with my interpretation or yours.
Anybody got a syringe I can use to squeeze the magic smoke back into this?
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #37 on: June 16, 2017, 07:27:12 pm »
That MachXO2 video is great!  I really like the Lattice Diamond toolchain.
The fact that the board has a ton of IO on pads is a real selling point.
I wonder if I just changed vendors?

Alas, no...  That device doesn't have anywhere near enough BlockRAM and I'm pretty sure it would be short of LUTs for my main project.

OTOH, as a starter board, with LOTS of IO and a compelling price, it seems like an excellent choice!
« Last Edit: June 16, 2017, 07:39:20 pm by rstofer »
 

Online MK14

  • Super Contributor
  • ***
  • Posts: 4539
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #38 on: June 16, 2017, 07:45:35 pm »
That MachXO2 video is great!  I really like the Lattice Diamond toolchain.
The fact that the board has a ton of IO on pads is a real selling point.
I wonder if I just changed vendors?

The amazing and really nice thing about it is that you can buy them, as shown in the video (now the MachXO3 series), for only about $25/£19 from Digikey (there are probably other sellers). It has about 6,900 LEs, so it is reasonably powerful for many things.
It even has about 8 programmable LEDs + a few more LEDs and a few tiny DIL switches for messing with.
Most other FPGA kits cost considerably more (there are exceptions, I know). They even have configuration storage onboard and/or within the chip, as necessary. The programmer (USB) is included as well (built into the board), along with any voltage regulators, crystals etc., as needed.
I.e. it is all ready to run, as is.

But at that price, it is practicable to design it into your FPGA-powered projects, without having to worry about soldering big pin-count BGA parts or designing complicated BGA-ready PCBs.

EDIT: You edited your post.
I agree, very big projects (FPGA complexity wise), would need much more powerful chips. Lattice seem to be aiming for the low end.
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #39 on: June 16, 2017, 08:59:54 pm »
That MachXO2 video is great!  I really like the Lattice Diamond toolchain.
The fact that the board has a ton of IO on pads is a real selling point.
I wonder if I just changed vendors?

The amazing and really nice thing about it is that you can buy them, as shown in the video (now the MachXO3 series), for only about $25/£19 from Digikey (there are probably other sellers). It has about 6,900 LEs, so it is reasonably powerful for many things.
It even has about 8 programmable LEDs + a few more LEDs and a few tiny DIL switches for messing with.
Most other FPGA kits cost considerably more (there are exceptions, I know). They even have configuration storage onboard and/or within the chip, as necessary. The programmer (USB) is included as well (built into the board), along with any voltage regulators, crystals etc., as needed.
I.e. it is all ready to run, as is.

But at that price, it is practicable to design it into your FPGA-powered projects, without having to worry about soldering big pin-count BGA parts or designing complicated BGA-ready PCBs.

EDIT: You edited your post.
I agree, very big projects (FPGA complexity wise), would need much more powerful chips. Lattice seem to be aiming for the low end.

One thing I believe as a newcomer:  The toolchain is more important than the device.  Assuming, of course, that the device is large enough for the project.  I really liked the presentation on Lattice Diamond.  Looking at the MachXO3, I think I'll order a board just so I can play with the tools.  The toolchain looks a lot like Xilinx ISE with the added Logic Analyzer feature of Vivado.  This really might be the ultimate startup board.

I never underestimate the number of things that have to work right in order to blink LEDs.  The "HelloWorld" 'program' for FPGAs is every bit the equal of getting it running in C.

---

OK, I ordered the board direct from Lattice - they had stock.  But, damn, they don't offer anything like a reasonable shipping rate.  I'll quit whining...  Soon...
 
The following users thanked this post: MK14

Offline Mattjd

  • Regular Contributor
  • *
  • Posts: 230
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #40 on: June 16, 2017, 09:32:50 pm »

You probably already have a good grasp of state machines, but if not, bone up on them because you'll be using them a lot when working with FPGAs.


A state machine is just a C switch statement inside the while(1) loop.  But, just ahead of the switch(), you need to define a default output state for every output signal you create.  Otherwise, you have to define the output state of every signal at every state.

Like this:
Code: [Select]

    process(Reset,Clk) is
    begin
        if Reset = '1' then
            state <= s0;
        elsif rising_edge(clk) then
            state <= NextState;
        end if;
    end process;


    process (state, FullEA, FetchOpnd, F, TAG, IA, CO, OFL, OVFLInd, COtemp, CSET, VSET,
                r_Button0, CCC, CondMet, BOSC_Flag, SavedSign, A_BUS(15), ShiftCount,
                SZ, ZR, DVDS, Result, Ones, OVR,
                CountShifts, ACC, IncludeEXT, EXTN, Rotate, AFR,
                BitCount, XIO_Device, XIO_Function, XIO_Modifier,
                DisplaySwitch,
                ConsoleXIOCmdBusy, ConsoleXIOCmdAck,
                PrinterXIOCmdAck, PrinterXIOCmdBusy,
                ReaderXIOCmdBusy, ReaderXIOCmdAck,
                DiskXIOCmdBusy, DiskXIOCmdAck,
                DiskReady, IAR,
                SingleStep, BreakPointActive, BreakPoint,
                PendingInterrupt, ReturnState_r, StartState) is
    begin
        A_BusCtrl <= A_BUS_NOP;
        ACC_Ctrl <= ACC_NOP;
        ACC_ShiftIn <= '0';
        Add <= '0';
        AFR_Ctrl <= AFR_NOP;
        BitCountCtrl <= BitCount_NOP;
        CI <= '0';
        CIn <= '0';
        CIX <= '0';
        CarryIndCtrl <= CARRY_IND_NOP;

        <and so on...>
       
        case state is
            when s0    => NextState <= s0a; -- use this to IPL
            when s0a  => if DiskReady = '0' then -- wait for disk to go not ready
                                      NextState <= s0b;
                                else
                                      NextState <= s0a;
                                end if;
            when s0b => if DiskReady = '1' and ColdstartHold = '0' then -- wait for disk to go ready and
                                                                                                     -- coldstart code to be copied

            <and so on>
 

There are two processes to create this FSM:  The first just changes the state according to the NextState value on every clock cycle.  In the case of a loop, the state may not actually change.  See the second process...

The second process does all the work and it is not clocked.  It is just a huge collection of combinatorial logic.

Here I defined default outputs for 10 signals (although they aren't shown in the snippet of FSM code).  In the real world, there are 49 of these default outputs and 117 states.

I didn't say anything about the 'sensitivity list' that starts out as
Code: [Select]
process (state, FullEA, FetchOpnd, F, TAG, IA, CO, OFL, OVFLInd, COtemp, CSET, VSET,

This sensitivity list tells the simulator which signals to monitor to decide to actually run the process.  If there are no changes to any signals in the list, the simulator won't evaluate the process.

This list is meaningless to synthesis but the synthesizer will whine if an input signal to the process is undeclared.  But it's just whine and snivel, the output works with or without the list.
This is pretty bad coding because it is prone to creating latches. As a rule of thumb you only have 2 signals at most in the sensitivity list of a process: clock and reset. If there are other signals then it smells fishy. The problem is likely better solved using a function instead of a process.

I'm guessing what Sal wrote is in Python? Explain to me why latches are bad. The overall structure of his code looks similar to what would be found in a state machine written in Verilog. In Verilog, to create a state machine you need a control and a datapath. The datapath has all the logic you would need for the input and output signals. The control defines your states using parameters, followed by three always blocks: 1) for transitioning states on clock edges, 2) for defining the transitions, and 3) for defining the outputs for each state. Here is code I wrote to display "EXTRA CREDIT PLZ" on a Hitachi HD44780, written in Verilog. I know the code is ugly, and I've been working on making my code more legible, but there are striking resemblances between what I wrote and what Sal wrote. When I compile this code, Quartus infers latches, but as far as I know, they're necessary.

The Control
Code: [Select]
module LCD_SM(Clock,Reset,
  Delay45ms,Delay80ns,Delay240ns,Delay_TO,Inst_Cnt32,FinalWrite,
  Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,
  Reset40us,Reset100us,
  CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,
  EN,
  FirstWrite);

input Clock, Reset, Delay45ms, Delay80ns, Delay240ns, Delay_TO;
input [4:0] Inst_Cnt32;
input FinalWrite;

output reg Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us;
output reg CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us;
output reg EN;
output reg FirstWrite;

parameter Pwr_Up = 4'b0000;
parameter Pwr_Up_Delay = 4'b0001;
parameter Off_Pwr_Up_Delay = 4'b0010;
parameter Write_Data = 4'b0011;
parameter Data_Setup_Delay = 4'b0100;
parameter E_Pulse_Hi = 4'b0101;
parameter E_Hi_Time = 4'b0110;
parameter E_Pulse_Lo = 4'b0111;
parameter Proc_Comp_Delay = 4'b1000;
parameter Load_Next_Data = 4'b1001;
parameter End0 = 4'b1010;
parameter End1 = 4'b1011;
parameter End2 = 4'b1100;
parameter End3 = 4'b1101;
parameter End4 = 4'b1110;
parameter End5 = 4'b1111;

reg [3:0] state, next_state;

always@(posedge Clock or posedge Reset)
begin
if(Reset)
state <= Pwr_Up;
else
state <= next_state;
end

always@(state or Delay45ms or Delay80ns or Delay240ns or Delay_TO or FinalWrite) //need to add transition signals to go with state
begin
case(state)

default: next_state <= Pwr_Up;

Pwr_Up: next_state <= Pwr_Up_Delay;

Pwr_Up_Delay: if (Delay45ms)
next_state <= Off_Pwr_Up_Delay;
else
next_state <= Pwr_Up_Delay;

Off_Pwr_Up_Delay: next_state <= Write_Data;

Write_Data: next_state <= Data_Setup_Delay;

Data_Setup_Delay: if(Delay80ns)
next_state <= E_Pulse_Hi;
else
next_state <= Data_Setup_Delay;

E_Pulse_Hi: next_state <= E_Hi_Time;

E_Hi_Time: if(Delay240ns)
next_state <= E_Pulse_Lo;
else
next_state <= E_Hi_Time;

E_Pulse_Lo: next_state <= Proc_Comp_Delay;

Proc_Comp_Delay: if(Delay_TO)
next_state <= Load_Next_Data;
else
next_state <= Proc_Comp_Delay;

Load_Next_Data: if(FinalWrite)
next_state <= End0;
else
next_state <= Write_Data;

End0: next_state <= End1;

End1: next_state <= End2;

End2: next_state <= End3;

End3: next_state <= End4;

End4: next_state <= End5;

End5: next_state <= End5;

endcase
end

always@(state or Inst_Cnt32)
begin
case(state)

default:       {Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000000000;

Pwr_Up:          {Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000000000;

Pwr_Up_Delay:     {Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000010000000;

Off_Pwr_Up_Delay: {Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000000000;

Write_Data:       {Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000000000;

Data_Setup_Delay: {Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000100000000;

E_Pulse_Hi:       {Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000001000000010;

E_Hi_Time:        {Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000001000000010;

Proc_Comp_Delay:
begin
if (Inst_Cnt32 == 0)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000000001;
end
else if (Inst_Cnt32 == 1)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000100000;
end
else if (Inst_Cnt32 == 2)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000000100;
end
else if (Inst_Cnt32 == 3)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 4)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 5)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 6)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000010000;
end
else if (Inst_Cnt32 == 7)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 8)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 9)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 10)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 11)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 12)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 13)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 14)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 15)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 16)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 17)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 18)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 19)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 20)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 21)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 22)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 23)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 24)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 25)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 26)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 27)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 28)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 29)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 30)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 31)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else if (Inst_Cnt32 == 32)
begin
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000001000;
end
else
{Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000000000;
end

Load_Next_Data: {Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b011000000001001000;

End0:       {Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000000000;

End1:       {Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000000000;

End2:       {Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000000000;

End3:       {Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000000000;

End4:       {Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000000000;

End5:       {Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000000000;

endcase
end

endmodule


The Datapath
Code: [Select]
module LCD_Datapath(CE240ns,CE80ns,CE45ms,CE32,
  CE4ms,CE2ms,CE40us,CE100us,
  Clock,
  Delay45ms,Delay80ns,Delay240ns,Inst_Cnt32,Delay_TO,
  Reset45ms,Reset80ns,Reset240ns,ResetPC,
  Reset4ms,Reset2ms,Reset40us,Reset100us,
  FinalWrite,
  FirstWrite);
 
input Clock;
input Reset45ms,Reset80ns,Reset240ns,ResetPC;
input Reset4ms,Reset2ms,Reset40us,Reset100us;
input CE240ns,CE80ns,CE45ms,CE32;
input CE4ms,CE2ms,CE40us,CE100us;
input FirstWrite;

output [4:0] Inst_Cnt32;
output Delay45ms,Delay80ns,Delay240ns,Delay_TO;
output FinalWrite;

wire [4:0] Eighty_ns,TwoForty_ns;
wire [21:0] FortyFive_ms,Four_ms,Two_ms,Forty_us,Hundred_us;
wire Delay4ms,Delay2ms,Delay40us,Delay100us;
wire FirstWrite;

assign Delay_TO = Delay4ms|Delay2ms|Delay40us|Delay100us|FirstWrite;

//module Counter_22bit(Clock,Reset,CE,Counter);
Counter_22bit FortyFiveMilSec(Clock,Reset45ms,CE45ms,FortyFive_ms),
  FourMilSec(Clock,Reset4ms,CE4ms,Four_ms),
  TwoMilSec(Clock,Reset2ms,CE2ms,Two_ms),
  FortyMicSec(Clock,Reset40us,CE40us,Forty_us),
  HundredMicSec(Clock,Reset100us,CE100us,Hundred_us);


//module Counter_5bit(Clock,Reset,CE,Counter);
Counter_5bit EightyNanSec(Clock,Reset80ns,CE80ns,Eighty_ns),
TwoFourtyNanSec(Clock,Reset240ns,CE240ns,TwoForty_ns),
WriteCounter(Clock,ResetPC,CE32,Inst_Cnt32);

//module comparator_standalone(A,B,G,E,L);
comparator_5bit FinalWriteCompar(Inst_Cnt32,22,G,FinalWrite,L),
Eightyns(Eighty_ns,4,g,Delay80ns,L),
TwoFortyns(TwoForty_ns,12,g,Delay240ns,L);
comparator_22bit FortyFivems(FortyFive_ms,2250000,G,Delay45ms,L),
  Fourms(Four_ms,200000,G,Delay4ms,L),
  Twoms(Two_ms,100000,G,Delay2ms,L),
  Fortyus(Forty_us,2000,G,Delay40us,L),
  Hundredus(Hundred_us,5000,G,Delay100us,L);



endmodule
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #41 on: June 16, 2017, 10:09:40 pm »
Latches are bad because you can't control their timing and they could oscillate before settling to a state, or not get into the right state at all. The key word is asynchronous logic. In an FPGA you want to avoid using asynchronous logic unless you really know what you are doing. The logic inside an FPGA does not receive all its inputs simultaneously, and most architectures use (strings of) lookup tables to create a combinatorial output, so you can get a wild variety of signals at the output of a LUT.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #42 on: June 16, 2017, 11:01:25 pm »

Code: [Select]
Pwr_Up:          {Reset45ms,Reset80ns,Reset240ns,ResetPC,Reset4ms,Reset2ms,Reset40us,Reset100us,CE240ns,CE80ns,CE45ms,CE32,CE4ms,CE2ms,CE40us,CE100us,EN,FirstWrite} <= 18'b000000000000000000;

This type of coding eliminates the requirement to define default values for the 18 signals but it sure takes a lot of typing when only one or two signals are changing.  Furthermore, if the 8th bit is set, I have to wander through the signal list and count until I figure out which signal has been set.  It isn't immediately obvious.

I have no idea how to assign default values to individual signals in Verilog.

In some ways, your coding looks a lot like microcode.  So, you could create an array of 18 bit values and alias the bits to signal names (does Verilog have aliases?).  Then you could just index into the array as a function of state.
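
Something like this rough, untested Verilog sketch is what I have in mind - the module and table names are invented, only two of the real control words are filled in, and while Verilog has no true alias, named slices of the control word come close:

Code: [Select]
// Rough sketch of the "microcode" idea (module/table names invented, untested).
module lcd_ucode (
    input  wire [3:0] state,      // same 4-bit state encoding as the FSM
    output wire       EN,
    output wire       FirstWrite
    // ... the other 16 control bits would be sliced out the same way ...
);
    reg [17:0] ucode [0:15];      // one 18-bit control word per state
    reg [17:0] ctrl;
    integer i;

    initial begin
        for (i = 0; i < 16; i = i + 1)
            ucode[i] = 18'b0;                      // default: everything off
        ucode[4'b0001] = 18'b000000000010000000;   // Pwr_Up_Delay
        ucode[4'b0101] = 18'b000000001000000010;   // E_Pulse_Hi
        // ... only the states that differ from the default need an entry ...
    end

    always @*                     // combinational lookup replaces the big case
        ctrl = ucode[state];

    // Verilog has no VHDL-style alias, but named slices serve the purpose.
    assign EN         = ctrl[1];
    assign FirstWrite = ctrl[0];
endmodule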

I like microcode and I have thought about building a CPU using that scheme.  It worked for the IBM 360 and a lot of other machines.  Microcoding brought structure to CPU design.

I have always wanted to write a meta-assembler like was used on the AMD bit slice devices.  Those were fun days!

 

Offline Cerebus

  • Super Contributor
  • ***
  • Posts: 10576
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #43 on: June 16, 2017, 11:02:13 pm »
I think it's not that latches are per se bad, it's that in HDL it's very easy to get a latch inferred without you realizing it. If you're looking at a device that has a data sheet that starts 'Transparent D-type latch' you choose it only when it is appropriate, whereas with case statements you have to be careful to make sure that you get what you're asking for.
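
A contrived little Verilog illustration of the trap (signal names invented; the first always block is the accident, the second is what was probably intended):

Code: [Select]
module latch_trap (
    input  wire en,
    input  wire d,
    output reg  q_latched,
    output reg  q_comb
);
    // Incomplete if: q_latched must hold its old value when en == 0,
    // so the tools quietly infer a transparent latch.
    always @* begin
        if (en)
            q_latched = d;
    end

    // Every path assigns q_comb, so this is plain combinational logic.
    always @* begin
        if (en)
            q_comb = d;
        else
            q_comb = 1'b0;
    end
endmodule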
Anybody got a syringe I can use to squeeze the magic smoke back into this?
 

Offline Mattjd

  • Regular Contributor
  • *
  • Posts: 230
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #44 on: June 16, 2017, 11:31:19 pm »
For a case statement in Verilog, the default is just "default". I have one in two of my always blocks: a default for state transitions and a default for state outputs. I suppose writing it all out could be a lot of typing; I use a combination of Excel and Sublime Text to do my typing, and it really speeds things up. For debugging purposes I use a combination of testbenches, compilation reports, and the RTL (netlist) viewer. The RTL view is nice because it creates a visual, so I can easily find those output bits there.

For example



That's from the RTL viewer. Say I want to know which bits change from state Pwr_Up to Pwr_Up_Delay: I go into the RTL view, find the State instance and select the net that belongs to the Pwr_Up_Delay output; the entire net highlights and can be easily traced. That's just me.

Btw, as far as I know, one must account for the outputs and transitions for EVERY state, regardless of whether those outputs change or not. The "default" case (for outputs) is simply what the signals are going to be upon start-up. The "default" case (for transitions) is what the initial state of the state machine is. If you forget to include a state because the outputs don't change, you will get an error.
« Last Edit: June 16, 2017, 11:36:54 pm by Mattjd »
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #45 on: June 16, 2017, 11:44:52 pm »
I think it's not that latches are per se bad, it's that in HDL it's very easy to get a latch inferred without you realizing it. If you're looking at a device that has a data sheet that starts 'Transparent D-type latch' you choose it only when it is appropriate, whereas with case statements you have to be careful to make sure that you get what you're asking for.
One way to avoid that is not to use clock-less processes in VHDL (besides being careful with x when a else y).
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline Mattjd

  • Regular Contributor
  • *
  • Posts: 230
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #46 on: June 16, 2017, 11:52:05 pm »
I think it's not that latches are per se bad, it's that in HDL it's very easy to get a latch inferred without you realizing it. If you're looking at a device that has a data sheet that starts 'Transparent D-type latch' you choose it only when it is appropriate, whereas with case statements you have to be careful to make sure that you get what you're asking for.

From the compilation report.

Info (10041): Inferred latch for "y[0]" at Mux_9_bit_32_to_1_behavorial.v(19)
Info (10041): Inferred latch for "y[1]" at Mux_9_bit_32_to_1_behavorial.v(19)
Info (10041): Inferred latch for "y[2]" at Mux_9_bit_32_to_1_behavorial.v(19)
Info (10041): Inferred latch for "y[3]" at Mux_9_bit_32_to_1_behavorial.v(19)
Info (10041): Inferred latch for "y[4]" at Mux_9_bit_32_to_1_behavorial.v(19)
Info (10041): Inferred latch for "y[5]" at Mux_9_bit_32_to_1_behavorial.v(19)
Info (10041): Inferred latch for "y[6]" at Mux_9_bit_32_to_1_behavorial.v(19)
Info (10041): Inferred latch for "y[7]" at Mux_9_bit_32_to_1_behavorial.v(19)
Info (10041): Inferred latch for "y[8]" at Mux_9_bit_32_to_1_behavorial.v(19)
Info (10041): Inferred latch for "y[9]" at Mux_9_bit_32_to_1_behavorial.v(19)
Info (10041): Inferred latch for "y[10]" at Mux_9_bit_32_to_1_behavorial.v(19)
Info (10041): Inferred latch for "y[11]" at Mux_9_bit_32_to_1_behavorial.v(19)
Warning (14026): LATCH primitive "Mux_9_bit_32_to_1_behavorial:MUX_DUT|y[0]" is permanently enabled
Warning (14026): LATCH primitive "Mux_9_bit_32_to_1_behavorial:MUX_DUT|y[1]" is permanently enabled
Warning (14026): LATCH primitive "Mux_9_bit_32_to_1_behavorial:MUX_DUT|y[2]" is permanently enabled
Warning (14026): LATCH primitive "Mux_9_bit_32_to_1_behavorial:MUX_DUT|y[3]" is permanently enabled
Warning (14026): LATCH primitive "Mux_9_bit_32_to_1_behavorial:MUX_DUT|y[4]" is permanently enabled
Warning (14026): LATCH primitive "Mux_9_bit_32_to_1_behavorial:MUX_DUT|y[5]" is permanently enabled
Warning (14026): LATCH primitive "Mux_9_bit_32_to_1_behavorial:MUX_DUT|y[6]" is permanently enabled
Warning (14026): LATCH primitive "Mux_9_bit_32_to_1_behavorial:MUX_DUT|y[8]" is permanently enabled


Now I want those because of how I am using the Mux. I don't know what other software is like, but Quartus by Altera gives a nice detailed report of stuff like that. It also tells you about any optimizations, like the removal of registers because of redundant or otherwise bad logic (basically the value of the register is always the same, so Quartus removes it), along with loads of other stuff.


edit: I'm not arguing about the latches, but that, in my experience, there is a load of information provided upon compilation that is sooo helpful.
« Last Edit: June 16, 2017, 11:58:51 pm by Mattjd »
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #47 on: June 17, 2017, 12:07:29 am »
IMHO Xilinx' tools output so many messages for a reasonably sized project that it all becomes useless noise.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline Mattjd

  • Regular Contributor
  • *
  • Posts: 230
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #48 on: June 17, 2017, 12:16:51 am »
To each their own, I suppose. I don't know what "reasonably sized" is, but I have built a 64-bit processor on a DE0 (Cyclone III EP3C16F484C6), using the LCD I spoke of earlier as a peripheral. I even wrote a program for it: the ROM would be read, and it would take keyboard input from PS/2, then send the input to the MUX that controls the output to the LCD. Had maybe 500 messages. I found going through them easy. *Shrugs*

This processor was an academic requirement of course and may very well still be considered small. It overclocked to 66 MHz; the base clock was 50 MHz. It was not pipelined, did not have an FPU, and could not perform multiplication or division. Very basic processor. The clock multiplier was provided through Altera's megafunction IPs. For the RAM, I made two: one using Altera-provided megafunctions, and one described directly in Verilog.

 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Learning FPGAs: wrong approach?
« Reply #49 on: June 17, 2017, 12:39:54 am »
Explain to me why latches are bad.

Latches are bad because they are ambiguous.

Code: [Select]
  signal a: std_logic := '0';
  signal b: std_logic := '0';

-- Setting up the test cases
process(clk)
  begin
    if rising_edge(clk) then
       a <= not a;
       b <= not b;
    end if;
  end process;

-- and now a latch.
process(a,b)
  begin
     if a = '1' then
      latch <= b;
     end if;
  end process;

So, if 'a' changes from 1 to 0, and 'b' also changes from 1 to 0 at the same time, what value ends up in 'latch'?

Ambiguity is evil in digital design.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #50 on: June 17, 2017, 12:54:28 am »

Btw, as far as I know, one must account for the outputs and transitions for EVERY state, regardless of whether those outputs change or not. The "default" case (for outputs) is simply what the signals are going to be upon start-up. The "default" case (for transitions) is what the initial state of the state machine is. If you forget to include a state because the outputs don't change, you will get an error.

Not exactly...

In VHDL you specify default values before the case statement and then you only need to change the value if a particular state needs to do that.

Code: [Select]
process(state, PlotterXIOCmdReq, PlotterXIOCmd, XIOFunction)
begin
    SetIntBusy      <= '0';
    ClearIntBusy    <= '0';
    ClearInterrupts <= '0';
    case state is
        when ACK =>
            PlotterXIOCmdAck_i <= '1';
            if PlotterXIOCmdReq = '1' then
                next_state <= ACK;
            else
                case XIOFunction is
                    when XIO_SenseDevice => ClearInterrupts <= PlotterXIOCmd(0);

<clip>


The signal ClearInterrupts is defined to be '0' just before 'case state'.  It will always have a value of '0' unless overridden as it is in case XIOFunction.  Since the value is defined for all states, it will never infer a latch.

This code makes no sense as it was hacked from a much larger FSM.  Nevertheless, it shows the proper technique for declaring default values for the FSM outputs.  The trick is to add new states during development and then add new outputs while not forgetting to declare the default value.  Nothing works if latches are inferred.
« Last Edit: June 17, 2017, 01:07:52 am by rstofer »
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #51 on: June 17, 2017, 01:31:10 am »
To each their own, I suppose. I don't know what "reasonably sized" is, but I have built a 64-bit processor on a DE0 (Cyclone III EP3C16F484C6), using the LCD I spoke of earlier as a peripheral. I even wrote a program for it: the ROM would be read, and it would take keyboard input from PS/2, then send the input to the MUX that controls the output to the LCD. Had maybe 500 messages. I found going through them easy. *Shrugs*

This processor was an academic requirement of course and may very well still be considered small. It overclocked to 66 MHz; the base clock was 50 MHz. It was not pipelined, did not have an FPU, and could not perform multiplication or division. Very basic processor. The clock multiplier was provided through Altera's megafunction IPs. For the RAM, I made two: one using Altera-provided megafunctions, and one described directly in Verilog.

As a rough cut, your device had 15k logic elements (LUTs?) while the MachXO3 we were talking about earlier is less than half that size at 6900 elements.  My Digilent Nexys2 board has 19,512 logic elements so somewhat larger.

The Digilent board costs a lot more than the Lattice board but it has switches, LEDs, 7-Segment display, PS/2 input and VGA output.  It also has parallel flash and RAM on board.  The Lattice board is a LOT cheaper but you're on your own for peripherals.

Still, if we are talking about a beginner board, the Lattice board will do a lot of things.  It just won't hold my CPU project...

 

Online BrianHG

  • Super Contributor
  • ***
  • Posts: 7738
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #52 on: June 17, 2017, 04:55:53 am »
I think the illustration code in these latest posts has already gone a little too far for what the OP asked, being someone who has never developed on an FPGA before and doesn't yet know what the languages are or how they work.
 

Offline westfwTopic starter

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #53 on: June 17, 2017, 05:07:57 am »
(I'm very pleased with the discussion that is being generated here.  A big "thank you" to everyone who is participating!)


Quote
Next step would be to learn one of the HDLs.

Once you've got the beginning of a grip on your chosen HDL, take the 'discrete' design you already made and implement the discrete parts you 'made up' in HDL, interconnect them in HDL, scribble a little HDL test-bed and hit the simulator.

The vendor tools for actual FPGAs can be quite a struggle to set up and get running with - not what you want at the 'hello world' stage. I'd recommend that if you're going Verilog that you grab the open source Icarus iverilog simulator

Wait!  I can write verilog/VHDL and simulate it without picking some vendor tool?  I was thinking that the only compiler/simulators around were in one vendor tool or another...  How does this work, without having the limitations of a particular chip in mind?   I just write my design files, and "later" move it to some vendor chip, tie in pin definitions and such, and see if it fits?  Very interesting!
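
So, in principle, something as tiny as this (a made-up blinker plus a throwaway testbench) could be pushed through the open-source Icarus iverilog mentioned above, before a chip has even been chosen:

Code: [Select]
`timescale 1ns/1ps

// blink.v - a made-up design under test: toggles 'led' every 10 clocks
module blink (input wire clk, input wire rst, output reg led);
    reg [3:0] count;
    always @(posedge clk) begin
        if (rst) begin
            count <= 4'd0;
            led   <= 1'b0;
        end else if (count == 4'd9) begin
            count <= 4'd0;
            led   <= ~led;
        end else begin
            count <= count + 4'd1;
        end
    end
endmodule

// blink_tb.v - throwaway testbench: clock, reset, waveform dump
module blink_tb;
    reg  clk = 1'b0, rst = 1'b1;
    wire led;

    blink dut (.clk(clk), .rst(rst), .led(led));

    always #5 clk = ~clk;              // 100 MHz-ish clock

    initial begin
        $dumpfile("blink.vcd");        // open the result in GTKWave
        $dumpvars(0, blink_tb);
        #20 rst = 1'b0;
        #2000 $finish;
    end
endmodule

(Something like "iverilog -o blink.vvp blink.v blink_tb.v" followed by "vvp blink.vvp" should run it; pin assignments and the device choice only come into play later.)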
 
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Learning FPGAs: wrong approach?
« Reply #54 on: June 17, 2017, 05:44:30 am »
Wait!  I can write verilog/VHDL and simulate it without picking some vendor tool?  I was thinking that the only compiler/simulators around were in one vendor tool or another...  How does this work, without having the limitations of a particular chip in mind?   I just write my design files, and "later" move it to some vendor chip, tie in pin definitions and such, and see if it fits?  Very interesting!

Yes - exactly this. The only proviso is that as soon as you use a single vendor-specific doohickey and don't somehow isolate it from the rest of your design, you have lost that freedom (much the same as mixing OS-specific calls into your software). It is very easy to get seduced by things like "Block RAM" macros, IP wizards and/or "megafunctions".

The more portable way is to find out how to infer them (e.g. write code where the tools go "Oh, I know that pattern! I can optimize that pattern into a RAM block!").

For Altera, have a look at http://www.gstitt.ece.ufl.edu/courses/spring10/eel4712/lectures/vhdl/qts_qii51007.pdf

For Xilinx have a look at https://www.xilinx.com/support/documentation/sw_manuals/xilinx2014_1/ug901-vivado-synthesis.pdf

The tools are very picky about how they match the patterns; the closer you get to the vendor's published code, the more likely you are to get the result you really want.
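
For instance, a generic RAM written along these lines (a sketch in the spirit of those two documents; module and parameter names are made up) is the kind of pattern both vendors' tools will normally recognise and map onto block RAM rather than LUTs:

Code: [Select]
// Inferred simple dual-port RAM: synchronous write, registered read.
module infer_ram #(
    parameter DATA_W = 32,
    parameter ADDR_W = 10                     // 2**10 = 1024 words
)(
    input  wire                 clk,
    input  wire                 we,
    input  wire [ADDR_W-1:0]    waddr,
    input  wire [DATA_W-1:0]    wdata,
    input  wire [ADDR_W-1:0]    raddr,
    output reg  [DATA_W-1:0]    rdata
);
    reg [DATA_W-1:0] mem [0:(1<<ADDR_W)-1];

    always @(posedge clk) begin
        if (we)
            mem[waddr] <= wdata;
        rdata <= mem[raddr];                  // registered read is what makes it block-RAM friendly
    end
endmodule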

There is nothing worse than changing your toolset and realizing that you have used, in 30+ files, a design pattern that doesn't work. This usually shows up as the design not fitting, because it hasn't used hard blocks for RAM and multipliers. You then have to recode and retest everything again. (Yes, this happened on a video design that went from Altera to Xilinx.)

I would even go as far as suggesting that any parts of the design where you do these sorts of things should be isolated out into a sub-directory of vendor specific code.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Online BrianHG

  • Super Contributor
  • ***
  • Posts: 7738
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #55 on: June 17, 2017, 05:54:26 am »
Third-party stand-alone VHDL and Verilog compilers and simulators do exist, though they are not my area & they may cost money.
Whether you choose Altera's Quartus or Xilinx's free tool suite and write your code in Verilog or VHDL, the program you write will be compatible with both tool suites except when you try to use a vendor-specific library.  In fact, your code is even more cross-compatible than a C program written for a PIC vs an ATmega.  Remember, your Verilog/VHDL code describes nothing more than clocked boolean logic, with inputs and outputs.  The FPGA vendor's editor suite just allows you to wire the inputs and outputs of each of your Verilog/VHDL source files to the pins of the FPGA.  There are optimized IO pins in some cases, like dedicated clock inputs, but this is the same for whichever FPGA type you choose.

Now, when I say multiple VHDL/Verilog source files, this means that in one chip you can wire up multiple copies of your code, or multiple different modules wired together or to different IO pins, or anything you can imagine.  For example, in my FPGA-based video scaler, I have these Verilog programs:

DDR3_Ram_sequencer.v  (State machine which drives the RAS/CAS/WE/DQS... and RD_RDY and WR_RDY and DQ_OE)
Ram_8port_priority_bridge.v  (Has 8 read address, 8 write address inputs, sends the next one in the queue to the DDR3 ram controller)
Video_Line_Cache_in.v   (works with the above 2 codes.v for DDR ram 128 bit access, takes an input video stream at 32 bit at input pixel_in clock speed)
Video_Line_Cache_out.v   (works with the above 2 codes.v for DDR ram 128 bit access, sends video out at 32 bit at pixel_out clock speed)
Video_color-space-converter.v  (Works on the 32 bit video pipe, in between the input/output pins and the Video_Line_Cache_xxx.v; has brightness, contrast, saturation & hue controls.)
MCU_pic24_emulator.v     (Uses onchip FPGA ram to run code for onscreen menus and system operations like listen to the Ethernet and front panel, instructs all the other .v modules which have configuration inputs.)
RS232_bidir-fifo_com_port.v
Master_Raster_Sync_Generator.v
Others....v

You may think of each of these .v modules as a new digital IC, and they can be wired just to IOs or to each other internally.
These .v programs (they could be described as modules) will compile in both Altera's and Xilinx's IDE tools, except for two minor inconveniences: setting up the custom PLL, which differs between the two chips, and defining the FPGA's internal dual-port RAM memories, since I want to use dedicated enhanced features.  But this is a lesser problem, since those configured functions are nothing more than another verilog_special_memory.v file personalized to the vendor's chip, which, for example, my MCU_pic24_emulator.v would be wired to.  But this shouldn't be anything you need to worry about at this stage.
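
As a trivial illustration of that wiring idea (made-up modules, nothing to do with the scaler above), a top level is really just instances and nets:

Code: [Select]
// Two copies of the same home-made "IC", chained together and wired to pins.
module shift8 (input wire clk, input wire din, output wire dout);
    reg [7:0] sr = 8'h00;
    always @(posedge clk)
        sr <= {sr[6:0], din};        // shift one bit per clock
    assign dout = sr[7];
endmodule

module top (input wire clk, input wire serial_in, output wire serial_out);
    wire mid;                        // internal net "between the two chips"
    shift8 first  (.clk(clk), .din(serial_in), .dout(mid));
    shift8 second (.clk(clk), .din(mid),       .dout(serial_out));
endmodule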
« Last Edit: June 17, 2017, 06:06:06 am by BrianHG »
 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Learning FPGAs: wrong approach?
« Reply #56 on: June 17, 2017, 08:46:42 am »
simulate it without picking some vendor tool? 

Well, here I use ModelSim, but it's not the version included with Xilinx's tools; it's an external tool. As editor & checker I use Sigasi, another external tool. It's very productive as it has a deep understanding of what you write.

So, I write HDL with Sigasi, I simulate it with ModelSim, then I move to the Vendor's toolchain (Xilinx in my case) for two new purposes

-1- timing constraints and their analysis
-2- synthesis (and optionally, optimization)


 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #57 on: June 17, 2017, 08:59:58 am »
(I'm very pleased with the discussion that is being generated here.  A big "thank you" to everyone who is participating!)


Quote
Next step would be to learn one of the HDLs.

Once you've got the beginning of a grip on your chosen HDL, take the 'discrete' design you already made and implement the discrete parts you 'made up' in HDL, interconnect them in HDL, scribble a little HDL test-bed and hit the simulator.

The vendor tools for actual FPGAs can be quite a struggle to set up and get running with - not what you want at the 'hello world' stage. I'd recommend that if you're going Verilog that you grab the open source Icarus iverilog simulator

Wait!  I can write verilog/VHDL and simulate it without picking some vendor tool?  I was thinking that the only compiler/simulators around were in one vendor tool or another...  How does this work, without having the limitations of a particular chip in mind?   I just write my design files, and "later" move it to some vendor chip, tie in pin definitions and such, and see if it fits?  Very interesting!
There is a free one: GHDL. I use that to simulate VHDL which I later use in a Xilinx FPGA. However, as usual with simulation, you have to be aware that it is only as good as the stimuli you feed into it.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Learning FPGAs: wrong approach?
« Reply #58 on: June 17, 2017, 09:06:12 am »
p.s.
As I said before, there are some features which are vendor-specific.

e.g. Spartan-6 comes with a useful built-in DDR controller. To use it ... you need to invoke an IP wizard which automatically instantiates it for you, resulting in an interface entity with the implementation hidden in a black box. It's hardware, implemented inside the FPGA as a special block which you can't change; you can only use it the way Xilinx designed it.

Keep in mind, it's vendor-specific, and technology specific: not portable!

In this case, I take the interface entity and I try to idealize its behavior in ModelSim, just to be able to simulate the whole system. In practice the DDR block is not simulated in detail (at RTL level): I assume it works (Xilinx's homework) and that it has been correctly instantiated (my homework), so I just add a large, ideal memory block to ModelSim.

Of course, I then have to verify these two hypotheses.
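
The "ideal memory" stand-in is usually nothing fancier than a behavioural model of this sort (sketched here in Verilog with invented port names and widths; simulation only, never for synthesis):

Code: [Select]
// Simulation-only stand-in for the memory behind the vendor's DDR controller.
module ideal_mem #(
    parameter DATA_W = 128,
    parameter ADDR_W = 20
)(
    input  wire                clk,
    input  wire                wr_en,
    input  wire                rd_en,
    input  wire [ADDR_W-1:0]   addr,
    input  wire [DATA_W-1:0]   wdata,
    output reg  [DATA_W-1:0]   rdata,
    output reg                 rd_valid
);
    reg [DATA_W-1:0] mem [0:(1<<ADDR_W)-1];    // "large and ideal" storage

    always @(posedge clk) begin
        if (wr_en)
            mem[addr] <= wdata;
        if (rd_en)
            rdata <= mem[addr];
        rd_valid <= rd_en;                     // data is valid one cycle after the request
    end
endmodule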
 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Learning FPGAs: wrong approach?
« Reply #59 on: June 17, 2017, 09:12:27 am »
GHDL

-1- it depends on GNAT, which is perpetually full of problems and bugs
-2- it doesn't cover the full VHDL specification, just a subset
-3- too much effort is required, since you need to adapt your source to it
-4- the error messages are silly; you can never understand what is wrong, you have to guess
-5- stimuli are a mess, and very error-prone, as you have to write a lot of test-bench code
-6- all of which, especially points {3, 4, 5}, reduces productivity by five orders of magnitude

Conclusion:
GHDL is good if you don't have money, if you are a student, and if your project is a university homework assignment.

For professional projects done for business (when someone checks how long your job takes, and how complex it can get in a working team), ModelSim is *THE* simulator to go for.
« Last Edit: June 17, 2017, 09:59:01 am by legacy »
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #60 on: June 17, 2017, 09:57:49 am »
If you simulate complete designs then going for Modelsim is a no-brainer but I don't simulate large designs. I only simulate small pieces and for that GHDL is good enough.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Learning FPGAs: wrong approach?
« Reply #61 on: June 17, 2017, 10:06:09 am »
small pieces and for that GHDL is good enough.

Even for small pieces, GHDL is defective for the above reasons. I spent three years on it; frankly, I wish someone had pointed those points out to me instead of letting me waste my time trying to fix/use it.
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #62 on: June 17, 2017, 02:40:06 pm »
Vendor specific...

The video above that demonstrates how to install and use the Lattice toolchain is very good and as good a place to start as any.  However, right out of the gate, the author uses the internal oscillator provided on the chip and this absolutely won't be portable to any other device family.  The good news is that the MachXO3 board itself does have an external 12 MHz oscillator.  What do you want to bet that the PLL used to kick up the speed won't be portable either?

I have decided to use the features provided and worry about porting later.  My hobby projects just aren't complex enough to worry about.  Portability is an illusion!  We can't even get the clock to work without vendor specific gadgets!

One thing I would hate is porting initialized BlockRAM.  I have written external programs that grab the memory contents from some file and write the entire VHDL file.  Just one more task when porting...
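
For what it's worth, both big toolchains will usually accept an inferred memory initialised straight from a hex file - on the Verilog side that is a one-line $readmemh - which avoids regenerating the whole HDL source. A sketch (file name and sizes invented):

Code: [Select]
// Inferred ROM whose initial contents come from a hex file at synthesis time.
module boot_rom #(
    parameter DATA_W = 16,
    parameter ADDR_W = 9                      // 512 words
)(
    input  wire                clk,
    input  wire [ADDR_W-1:0]   addr,
    output reg  [DATA_W-1:0]   data
);
    reg [DATA_W-1:0] mem [0:(1<<ADDR_W)-1];

    initial
        $readmemh("boot_rom.hex", mem);       // one hex word per line

    always @(posedge clk)
        data <= mem[addr];                    // registered read infers block RAM
endmodule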

Back to the video... It covers:

1) Toolchain installation
2) License management - don't worry, the license is free!
3) Project creation
4) Verilog design entry
5) Testbench creation
6) Simulation
7) Synthesis
8) Pin assignment
9) Device programming
10) Virtual logic analyzer

Of course the coverage depth is quite shallow but it's a short video.  It is enough to get started!  The board itself is cheap enough, it's the shipping that I snivel about!



« Last Edit: June 17, 2017, 04:30:33 pm by rstofer »
 

Online MK14

  • Super Contributor
  • ***
  • Posts: 4539
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #63 on: June 17, 2017, 02:52:26 pm »
it's the shipping that I snivel about!

I get free shipping (from the US to the UK) with Digikey, who sell it, as long as the order value is at least £33, which is quite easy to achieve.
Hopefully within the US, it is similar.

But too late for now, as you seemed to say you already bought it from Lattice.
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #64 on: June 17, 2017, 04:29:56 pm »
it's the shipping that I snivel about!

I get free shipping (from the US to the UK) with Digikey, who sell it, as long as the order value is at least £33, which is quite easy to achieve.
Hopefully within the US, it is similar.

But too late for now, as you seemed to say you already bought it from Lattice.

I don't get free shipping from Digikey but it is usually Priority Mail and that is very cheap and FAST.  I looked for stock at Mouser and they didn't have any.  I didn't look at Digikey and probably should have as they do have stock.  All my bad...

Digikey is a great supplier.


Late breaking news:  The board has shipped - from Mouser.  The very place I looked for stock.  I must have had a serious bout of 'senior moments' yesterday!
« Last Edit: June 17, 2017, 04:33:56 pm by rstofer »
 
The following users thanked this post: MK14

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Learning FPGAs: wrong approach?
« Reply #65 on: June 17, 2017, 04:56:31 pm »
Portability is an illusion!  We can't even get the clock to work without vendor specific gadgets!

Yup, sadly the Truth, especially if you use the Digital Clock Manager (DCM) primitive in Xilinx FPGA parts to implement delay locked loops, PLLs, digital frequency synthesizers, digital phase shifters, etc. This point is also relevant during timing-constraints analysis, which is both vendor and device specific, and it's a MUST-BE-DONE if you have to check low-level requirements from your customers.

p.s.
why Lattice? Never used, I am curious.
 

Online MK14

  • Super Contributor
  • ***
  • Posts: 4539
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #66 on: June 17, 2017, 05:03:31 pm »
I don't get free shipping from Digikey but it is usually Priority Mail and that is very cheap and FAST.  I looked for stock at Mouser and they didn't have any.  I didn't look at Digikey and probably should have as they do have stock.  All my bad...

Digikey is a great supplier.


Late breaking news:  The board has shipped - from Mouser.  The very place I looked for stock.  I must have had a serious bout of 'senior moments' yesterday!

Don't worry, similar/same things would wind me up. I find the Amazon system of almost constantly fluctuating prices, on many things, annoying.

Sometimes I buy something, and while it is being shipped, the price drops, and I find that annoying. But I'm kind of philosophical about it, and accept I will gain sometimes, and lose other times.

I know in theory, some people claim you can hassle Amazon customer services, and get the price dropped, on your order. Because the price dropped just after you ordered it. But I don't want to bother them and/or waste their and my time, over what is usually quite small amounts of money.
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #67 on: June 17, 2017, 05:35:03 pm »

p.s.
why Lattice? Never used, I am curious.


You're right, why Lattice?  Beats me...  I have a large assortment of Digilent-Xilinx boards and I certainly don't need a low end board.  So, why am I interested?

Well, I watched the video.  I REALLY like the toolchain.  The licensing scheme is pretty painless and not nearly as obtuse as Xilinx's.  I like the way pins are configured with a spreadsheet.  I like the touch and feel of Diamond as it is similar to Xilinx's ISE (sort of).  In any event, the startup curve is a lot flatter than Vivado's (does anybody really understand the .XDC file?).  I like the Just In Time syntax analysis - save the file and syntax analysis is automatic and FAST (at least for small projects).

I can see the value in a $25 startup board; I don't personally have any use for it, but I'm sure something will come up.  I like the high pin count on the headers; I don't really like Digilent's PMOD connectors, as there simply aren't enough pins.  I do understand that the board has no peripheral gadgets except a bank of LEDs.  If I need SRAM, I'm on my own!

For the newcomers, this setup is all they really need to start creating logic.  The regrettable lack of switches and buttons is something of a bother but I imagine they can figure out something.  If they can't, well, maybe golf is a better hobby.

In the back of my mind, I am thinking about Caxton C Foster's 'minicomputer' - BLUE.  I have been thinking about this trivial 16 bit CPU for about 40 years.  As a CPU, it implements only the most trivial operations but it's a good first project.  In my case, it is just something I want to play with.  One thing it needs is a lot of IO for the switches and LEDs.  IO Expanders are one option but for the MachXO3 board, there is no need.  There are plenty of pins.  Maybe I'll finally get around to implementing it.  Al Williams https://www.awce.com/ did a vastly expanded version a few years ago but I don't see it around on his site.

ETA:  The BLUE project is available on OpenCores http://opencores.org/project,blue

Why do all this?  Well, I hope my grandson gets into EE or CS as a major.  It might be useful to have a trivial computer around just to discuss elementary architecture and 'the way it used to be'.  I also suspect that Vivado will sink a newcomer.  Just guessing...

« Last Edit: June 17, 2017, 05:48:27 pm by rstofer »
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #68 on: June 17, 2017, 05:40:34 pm »
But I don't want to bother them and/or waste their and my time, over what is usually quite small amounts of money.

In the bigger scheme of things, the amount is trivial.  If I didn't want to pay it, I wouldn't have bought it.  Money is not one of my larger problems.  Old age is a much larger concern.
 
The following users thanked this post: MK14

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #69 on: June 17, 2017, 06:09:09 pm »
I like the way pins are configured with a spreadsheet.  I like the touch and feel of Diamond as it is similar to Xilinx's ISE (sort of).  In any event, the startup curve is a lot flatter than Vivado's (does anybody really understand the .XDC file?).

You can do this in Vivado too. Open "Elaborated Design" and it has a similar pin table. Once you select pins, it'll create an XDC file with the definitions (or update an existing one). You'll have to re-run synthesis though :(

I like the Just In Time syntax analysis - save the file and syntax analysis is automatic and FAST (at least for small projects).

Vivado does continuous syntax check for VHDL files. If something is wrong it draws a red squiggle and you can hover over it to see the error message. It is fast enough for me. Very handy when the synthesis is so slow.

In the Lattice video, the synthesis is rather fast, but I couldn't figure out if it was normal speed or fast forward.

 

Offline mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 13748
  • Country: gb
    • Mike's Electric Stuff
Re: Learning FPGAs: wrong approach?
« Reply #70 on: June 17, 2017, 06:12:19 pm »
I like the way pins are configured with a spreadsheet.  I like the touch and feel of Diamond as it is similar to Xilinx's ISE (sort of).  In any event, the startup curve is a lot flatter than Vivado's (does anybody really understand the .XDC file?).

You can do this in Vivado too. Open "Elaborated Design" and it has a similar pin table. Once you select pins, it'll create an XDC file with the definitions (or update an existing one). You'll have to re-run synthesis though :(

I like the Just In Time syntax analysis - save the file and syntax analysis is automatic and FAST (at least for small projects).

Vivado does continuous syntax check for VHDL files. If something is wrong it draws a red squiggle and you can hover over it to see the error message. It is fast enough for me. Very handy when the synthesis is so slow.

In the Lattice video, the synthesis is rather fast, but I couldn't figure out if it was normal speed or fast forward.
Diamond will synthesise &  place & route a simple design in about 10 seconds.
Youtube channel:Taking wierd stuff apart. Very apart.
Mike's Electric Stuff: High voltage, vintage electronics etc.
Day Job: Mostly LEDs
 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Learning FPGAs: wrong approach?
« Reply #71 on: June 17, 2017, 06:18:56 pm »
If you have the possibility ( = if your boss/customers pay it ), switch to Sigasi. It's the Eclipse-like for HDL :D
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #72 on: June 17, 2017, 06:50:49 pm »
If you have the possibility ( = if your boss/customers pay it ), switch to Sigasi. It's the Eclipse-like for HDL :D
Sigasi is nice but what I don't like is the time limited node locked license. For me such software is a no-go. What if they go out of business, or my PC breaks just when I need to finish a project and I can't afford to wait until they change the license to a new PC? It would be great if Sigasi offered a perpetual license and someone cracked it so it is no longer node locked. I'd buy it in a heartbeat.

A reasonable alternative is the (open source) Eclipse plugin called Veditor. It can do much less than Sigasi but combined with Eclipse it is lightyears better than the editor in Xilinx ISE.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #73 on: June 17, 2017, 06:53:41 pm »
In the Lattice video, the synthesis is rather fast, but I couldn't figure out if it was normal speed or fast forward.

For the simple counter LEDs, synthesis is about 1/2 second and building both the bitstream and JEDEC file, from 'Rerun All', takes about 15 seconds.  Pretty impressive!
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #74 on: June 17, 2017, 06:55:42 pm »
If you have the possibility ( = if your boss/customers pay it ), switch to Sigasi. It's the Eclipse-like for HDL :D

I'll check with Social Security and see what they have to say (not!).
 

Offline Cerebus

  • Super Contributor
  • ***
  • Posts: 10576
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #75 on: June 17, 2017, 07:09:05 pm »
Late breaking news:  The board has shipped - from Mouser.  The very place I looked for stock.  I must have had a serious bout of 'senior moments' yesterday!

Rather than a 'senior moment', it's more likely that Lattice have some reserved fulfilment stock at Mouser that doesn't show up as stock available for sale.
Anybody got a syringe I can use to squeeze the magic smoke back into this?
 

Offline Cerebus

  • Super Contributor
  • ***
  • Posts: 10576
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #76 on: June 17, 2017, 07:19:11 pm »
p.s.
why Lattice? Never used, I am curious.

For the equivalent sized parts to those offered by Xilinx or Altera I don't think that Lattice necessarily offers parts with any particular advantages. Where I think they have a winner is in the ICE40 range where there are a number of FPGAs in the £3-5 bracket (one off prices) with 1k to 8k LEs/cells/pick-your-own-terminology available in prototyping friendly QFP and QFN packages.
Anybody got a syringe I can use to squeeze the magic smoke back into this?
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #77 on: June 17, 2017, 07:53:42 pm »
To the recent comment re: Vivado and its capability, yes, it really will do everything.  And, in many cases, there are multiple ways to get things done.  But, if you were a brand new EE student, would you want to use Vivado for your very first project?

Part of my problem with Vivado is that I am used to ISE.  I have been using ISE for 13 years or so and I still use it for Spartan 3 projects.  I haven't spent enough time with Vivado to get comfortable.  Lattice Diamond doesn't do everything that Vivado does and, in my view, Diamond is an easier way to start.  Or maybe I just like it because it is closer to ISE.

But, yes, Vivado is a tremendous upgrade from ISE.

 

Offline mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 13748
  • Country: gb
    • Mike's Electric Stuff
Re: Learning FPGAs: wrong approach?
« Reply #78 on: June 17, 2017, 09:16:30 pm »
p.s.
why Lattice? Never used, I am curious.

For the equivalent sized parts to those offered by Xilinx or Altera I don't think that Lattice necessarily offers parts with any particular advantages. Where I think they have a winner is in the ICE40 range where there are a number of FPGAs in the £3-5 bracket (one off prices) with 1k to 8k LEs/cells/pick-your-own-terminology available in prototyping friendly QFP and QFN packages.
Not familiar with ICE40 but an advantage of the XO2 family is onboard flash, plus  core voltage regulator, and even an internal oscillator, so they are very useable on 2-layer PCBs with no additional support parts - just a 3.3v supply, a JTAG header and off you go.
Youtube channel:Taking wierd stuff apart. Very apart.
Mike's Electric Stuff: High voltage, vintage electronics etc.
Day Job: Mostly LEDs
 

Offline Cerebus

  • Super Contributor
  • ***
  • Posts: 10576
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #79 on: June 17, 2017, 09:58:30 pm »
p.s.
why Lattice? Never used, I am curious.

For the equivalent sized parts to those offered by Xilinx or Altera I don't think that Lattice necessarily offers parts with any particular advantages. Where I think they have a winner is in the ICE40 range where there are a number of FPGAs in the £3-5 bracket (one off prices) with 1k to 8k LEs/cells/pick-your-own-terminology available in prototyping friendly QFP and QFN packages.
Not familiar with ICE40 but an advantage of the XO2 family is onboard flash, plus  core voltage regulator, and even an internal oscillator, so they are very useable on 2-layer PCBs with no additional support parts - just a 3.3v supply, a JTAG header and off you go.

Some, but not all, of the ICE40 range have those features with the exception of an on-board core voltage regulator - they need a nominal 1.2V plus whatever your I/O standard requires. Lattice have always been good at integrating features that get you closer to the ideal of 'just needs a supply and a programming header'. Anybody else remember their in system programmable PALs, when everybody else's PALs needed dedicated out of circuit, high voltage programming?
Anybody got a syringe I can use to squeeze the magic smoke back into this?
 

Offline Bassman59

  • Super Contributor
  • ***
  • Posts: 2501
  • Country: us
  • Yes, I do this for a living
Re: Learning FPGAs: wrong approach?
« Reply #80 on: June 20, 2017, 10:47:22 pm »
If you have the possibility ( = if your boss/customers pay it ), switch to Sigasi. It's the Eclipse-like for HDL :D

What does Sigasi cost these days? Their web site has the usual "contact me with pricing information" form, which usually indicates an expensive product. I seem to remember that it was $80 a month, but that was a few years ago.
 

Offline jefflieu

  • Contributor
  • Posts: 43
  • Country: au
Re: Learning FPGAs: wrong approach?
« Reply #81 on: June 20, 2017, 10:57:18 pm »
Would learning FPGAs by writing peripherals for a NIOS system be interesting to you?
I learned quite a lot during an internship when I had to modify a peripheral of an existing Microblaze system to extend its functionality.
Everything else had been set up: timing constraints, pin configuration ... etc ... etc.
I only needed to work out how the bus worked and write simple code to let the bus read and write registers. Clear registers on read ... etc
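
To give a flavour, here is a cut-down sketch of that kind of register peripheral in VHDL. The bus signal names (bus_addr, bus_rd, bus_wr, ...) are invented for illustration - they are not the real Avalon or AXI names - and the register map is made up.

Code: [Select]
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity simple_regs is
  port (
    clk      : in  std_logic;
    bus_addr : in  unsigned(1 downto 0);
    bus_rd   : in  std_logic;                      -- read strobe from the bus
    bus_wr   : in  std_logic;                      -- write strobe from the bus
    bus_din  : in  std_logic_vector(31 downto 0);
    bus_dout : out std_logic_vector(31 downto 0);
    event_in : in  std_logic                       -- hardware event that sets a status bit
  );
end entity;

architecture rtl of simple_regs is
  signal control : std_logic_vector(31 downto 0) := (others => '0');
  signal status  : std_logic_vector(31 downto 0) := (others => '0');
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if event_in = '1' then
        status(0) <= '1';                          -- latch the event
      end if;
      if bus_wr = '1' and bus_addr = 0 then
        control <= bus_din;                        -- writable control register
      end if;
      if bus_rd = '1' then
        case to_integer(bus_addr) is
          when 0      => bus_dout <= control;
          when 1      => bus_dout <= status;
                         status   <= (others => '0');   -- clear on read
          when others => bus_dout <= (others => '0');
        end case;
      end if;
    end if;
  end process;
end architecture;

(If the event and the read land on the same clock, the clear wins here - working out that kind of corner case is exactly what you learn by poking at a running system.)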

I have a project here and always need new peripherals then verification on different boards.
www.github.com/jefflieu/recon
If you've been doing software then most of the stuff should be familiar to you.

Cheers,
Jeff


i love Melbourne
 

Offline mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 13748
  • Country: gb
    • Mike's Electric Stuff
Re: Learning FPGAs: wrong approach?
« Reply #82 on: June 21, 2017, 07:42:34 am »
p.s.
why Lattice? Never used, I am curious.

For the equivalent sized parts to those offered by Xilinx or Altera I don't think that Lattice necessarily offers parts with any particular advantages. Where I think they have a winner is in the ICE40 range where there are a number of FPGAs in the £3-5 bracket (one off prices) with 1k to 8k LEs/cells/pick-your-own-terminology available in prototyping friendly QFP and QFN packages.
Not familiar with ICE40 but an advantage of the XO2 family is onboard flash, plus  core voltage regulator, and even an internal oscillator, so they are very useable on 2-layer PCBs with no additional support parts - just a 3.3v supply, a JTAG header and off you go.

Some, but not all, of the ICE40 range have those features with the exception of an on-board core voltage regulator - they need a nominal 1.2V plus whatever your I/O standard requires. Lattice have always been good at integrating features that get you closer to the ideal of 'just needs a supply and a programming header'. Anybody else remember their in system programmable PALs, when everybody else's PALs needed dedicated out of circuit, high voltage programming?
I thought ICE40 had  OTP memory, or are there now some flash versions?
Youtube channel:Taking wierd stuff apart. Very apart.
Mike's Electric Stuff: High voltage, vintage electronics etc.
Day Job: Mostly LEDs
 

Offline mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 13748
  • Country: gb
    • Mike's Electric Stuff
Re: Learning FPGAs: wrong approach?
« Reply #83 on: June 21, 2017, 08:02:54 am »
Would learning FPGA by writing peripherals for NIOS system be an interesting to you?
I did learn quite a lot when I was doing intern and I had to modify a peripheral of an existing Microblaze sytem to extend its functionality.
Everything else had been setup, timing constraints, pins configuration ... etc ... etc.
I only needed to work out how the bus worked and wrote simple codes to let the bus read registers and write registers. Clear registers on read ... etc

I have a project here and always need new peripherals then verification on different boards.
www.github.com/jefflieu/recon
If you've been doing software then most of the stuff should be familiar to you.

Cheers,
Jeff
Out of interest, what's the compile/run/debug cycle time doing that? Does including the NIOS stuff add a lot ?
Youtube channel:Taking wierd stuff apart. Very apart.
Mike's Electric Stuff: High voltage, vintage electronics etc.
Day Job: Mostly LEDs
 

Offline jefflieu

  • Contributor
  • Posts: 43
  • Country: au
Re: Learning FPGAs: wrong approach?
« Reply #84 on: June 21, 2017, 08:48:30 am »
Would learning FPGA by writing peripherals for NIOS system be an interesting to you?
I did learn quite a lot when I was doing intern and I had to modify a peripheral of an existing Microblaze sytem to extend its functionality.
Everything else had been setup, timing constraints, pins configuration ... etc ... etc.
I only needed to work out how the bus worked and wrote simple codes to let the bus read registers and write registers. Clear registers on read ... etc

I have a project here and always need new peripherals then verification on different boards.
www.github.com/jefflieu/recon
If you've been doing software then most of the stuff should be familiar to you.

Cheers,
Jeff
Out of interest, what's the compile/run/debug cycle time doing that? Does including the NIOS stuff add a lot ?
Can you please be more specific, doing what? (this could be off topic though)
When you say "add a lot": if you mean resources, then the NIOS stuff costs about 1000 to 1500 LUTs + flops for a simple CPU core and the Avalon bus.
If you mean a lot of effort, then yeah, it takes some effort to set up the hardware and software correctly, but it's not so bad.
I think once the NIOS system is set up, the FPGA can be learnt by adding/creating new peripherals; especially if you're familiar with software, it'll be more interesting.
The coding for CPU peripherals is mostly RTL design, I'd say.
i love Melbourne
 

Offline mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 13748
  • Country: gb
    • Mike's Electric Stuff
Re: Learning FPGAs: wrong approach?
« Reply #85 on: June 21, 2017, 09:45:53 am »
Would learning FPGA by writing peripherals for NIOS system be an interesting to you?
I did learn quite a lot when I was doing intern and I had to modify a peripheral of an existing Microblaze sytem to extend its functionality.
Everything else had been setup, timing constraints, pins configuration ... etc ... etc.
I only needed to work out how the bus worked and wrote simple codes to let the bus read registers and write registers. Clear registers on read ... etc

I have a project here and always need new peripherals then verification on different boards.
www.github.com/jefflieu/recon
If you've been doing software then most of the stuff should be familiar to you.

Cheers,
Jeff
Out of interest, what's the compile/run/debug cycle time doing that? Does including the NIOS stuff add a lot ?
Can you please be more specific, doing what? (this could be off topic though)
When you say add a lot, if you mean resources then NIOS stuff costs about 1000 to 1500 LUs + Flops, simple CPU core and Avalon bus
If you mean add a lot of effort, then yeah, it takes some effort to setup hardware and software correctly, but not so bad.
I think if the NIOS system is setup, FPGA can be learnt by adding/creating new peripherals, especially if you're familiar with software, it'll be more interesting.
The coding for CPU peripherals is mostly RTL design I'd say.
No I mean comparing developing a standalone function versus hanging something off a NIOS processor, what is the time penalty of the synthesize/place & route time doing the latter?
I've not used Altera, but IME with ISE and Diamond, for small designs, compile cycles of low tens of seconds are typical, and tolerable for a write/compile/debug/repeat workflow.
 If adding a processor makes this a lot longer, any benefit of using a processor to simplify testing may be outweighed by the extended debug cycle times. 
Youtube channel:Taking wierd stuff apart. Very apart.
Mike's Electric Stuff: High voltage, vintage electronics etc.
Day Job: Mostly LEDs
 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Learning FPGAs: wrong approach?
« Reply #86 on: June 21, 2017, 10:23:57 am »
What does Sigasi cost these days?

I don't know, nor do I want to know :D

What is the benefit of being employed (even if not permanently - say on a one year contract)? When you work freelance you might be expected to look after your own tools (software, laptop, DSO, LA, RLC-meter, etc.), which means you have to phone vendors and waste your time with their marketing office; sometimes you also have to phone your bank asking for funds to buy them, since your customer will only refund you once the job is done. I mean money advanced up front, refunded with a margin.

As an employee there is always a wonderful secretary (yes, I have a secretary now, and two tulip plants in my office) who does that job for you, and a person on the staff who pays for your tools.

Awesome!!! So, who cares? Now, I am more interested in productivity, since more productivity means more plants in my office, maybe a bigger office with two secretaries and an aquarium with tropical fishes  :D :D :D
 

Offline jefflieu

  • Contributor
  • Posts: 43
  • Country: au
Re: Learning FPGAs: wrong approach?
« Reply #87 on: June 21, 2017, 11:47:39 am »
No I mean comparing developing a standalone function versus hanging something off a NIOS processor, what is the time penalty of the synthesize/place & route time doing the latter?
I've not used Altera, but IME with ISE and Diamond, for small designs, compile cycles of low tens of seconds are typical, and tolerable for a write/compile/debug/repeat workflow.
 If adding a processor makes this a lot longer, any benefit of using a processor to simplify testing may be outweighed by the extended debug cycle times.
Compilation time is about 4 minutes for a design of 3K LUTs. I wouldn't say there's any penalty for using NIOS; it depends on what you want to do. Generally, compilation time is related to the size of the design and the chip. An embedded processor lets you do certain stuff quickly once the hardware is done. Interesting systems often consist of a processor running the control stuff and FPGA fabric implementing the custom stuff.
i love Melbourne
 

Offline JPortici

  • Super Contributor
  • ***
  • Posts: 3461
  • Country: it
Re: Learning FPGAs: wrong approach?
« Reply #88 on: June 21, 2017, 12:29:31 pm »
Awesome!!! So, who cares? Now, I am more interested in productivity since more productivity means more plants in my office, may be a bigger office with two secretaries and an aquarium with tropical fishes  :D :D :D

[OT]
And Fantozzi's rise in ranks scene comes to mind :D
[/OT]
 

Offline Cerebus

  • Super Contributor
  • ***
  • Posts: 10576
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #89 on: June 21, 2017, 02:36:58 pm »
I thought ICE40 had  OTP memory, or are there now some flash versions?

Sorry, in an effort at writing economy I stuffed that up. Some have OTP, some don't, all can work with external SPI flash, all can be configured over SPI by an MPU. Some have an on-board oscillator, some don't.
Anybody got a syringe I can use to squeeze the magic smoke back into this?
 

Offline sporadic

  • Regular Contributor
  • *
  • Posts: 72
  • Country: us
    • forkineye.com
Re: Learning FPGAs: wrong approach?
« Reply #90 on: June 22, 2017, 06:58:37 pm »
For all the Python haters, yes.. you can design hardware with Python - http://www.myhdl.org :)
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #91 on: June 22, 2017, 07:46:33 pm »
No I mean comparing developing a standalone function versus hanging something off a NIOS processor, what is the time penalty of the synthesize/place & route time doing the latter?
I've not used Altera, but IME with ISE and Diamond, for small designs, compile cycles of low tens of seconds are typical, and tolerable for a write/compile/debug/repeat workflow.
 If adding a processor makes this a lot longer, any benefit of using a processor to simplify testing may be outweighed by the extended debug cycle times.
You can always simulate. I like that better for complex designs because it allows you to see ANY signal in detail and figure out what is wrong (very similar to stepping through a piece of C code with a debugger).
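
For anyone who hasn't tried it: a bare-bones testbench is only a clock generator, a reset and the unit under test. This is just a sketch - 'my_design' and its ports are placeholders, not a real core.

Code: [Select]
library ieee;
use ieee.std_logic_1164.all;

entity tb_my_design is
end entity;

architecture sim of tb_my_design is
  signal clk  : std_logic := '0';
  signal rst  : std_logic := '1';
  signal dout : std_logic_vector(7 downto 0);
begin
  clk <= not clk after 5 ns;          -- 100 MHz simulation clock

  uut: entity work.my_design          -- placeholder unit under test
    port map (clk => clk, rst => rst, dout => dout);

  stimulus: process
  begin
    wait for 100 ns;
    rst <= '0';                       -- release reset and let it run
    wait for 10 us;
    assert false report "end of simulation" severity failure;
    wait;
  end process;
end architecture;

Every internal signal of the design is then visible in the wave window, which is what makes this so much more informative than staring at a handful of pins on a scope.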
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline Bruce Abbott

  • Frequent Contributor
  • **
  • Posts: 627
  • Country: nz
    • Bruce Abbott's R/C Models and Electronics
Re: Learning FPGAs: wrong approach?
« Reply #92 on: June 22, 2017, 07:47:05 pm »
For all the Python haters, yes.. you can design hardware with Python - http://www.myhdl.org :)
As if VHDL and Verilog weren't confusing enough, now we have another HDL to learn.

What advantages does MyHDL have over the other two?
 

Offline sporadic

  • Regular Contributor
  • *
  • Posts: 72
  • Country: us
    • forkineye.com
Re: Learning FPGAs: wrong approach?
« Reply #93 on: June 22, 2017, 07:52:00 pm »
For all the Python haters, yes.. you can design hardware with Python - http://www.myhdl.org :)
As if VHDL and Verilog weren't confusing enough, now we have another HDL to learn.

What advantages does MyHDL have over the other two?
It actually translates into VHDL or Verilog for synthesis.  The site does a better job of explaining the pros and cons than I ever could.  It's legit though - it's been used for ASICs.
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #94 on: June 22, 2017, 09:18:59 pm »
For all the Python haters, yes.. you can design hardware with Python - http://www.myhdl.org :)
As if VHDL and Verilog weren't confusing enough, now we have another HDL to learn.

What advantages does MyHDL have over the other two?

I don't see it either!  Anything I can do with Python HDL, I can do with VHDL and skip a couple of steps.  Perhaps the Python simulation is a little faster (maybe even a lot faster) but I don't usually bother with simulation.  If I did do simulation, I would use the chip vendor's simulator.  It's the only opinion that matters.

I look at it as "just because I can".

Maybe somebody can make the case that I should care about this but, at the moment, I don't.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Learning FPGAs: wrong approach?
« Reply #95 on: June 22, 2017, 09:27:01 pm »
For all the Python haters, yes.. you can design hardware with Python - http://www.myhdl.org :)
As if VHDL and Verilog weren't confusing enough, now we have another HDL to learn.

What advantages does MyHDL have over the other two?

All these High Level Synthesis (HLS) HDLs seem to have common threads to address these (and other) problems:

- Couldn't hardware design be more like programming?
- The level of abstraction in HDLs is too low
- I don't want to micromanage bits - I just want it to work like integers and floats
- Productivity of HDLs is too low - e.g. testing through simulation is slow.
- I want to use programmers, not hardware designers

I have played with a couple.

- "It can't be more like programming?". There is a solid barrier that makes one not like the other. Programming updates a little bit of data every cycle. To make the most of FPGAs you can not use them like that, you need to make pipelines and have data flow through your design in a way programming can't do.

- "The level of abstraction in HDLs is too low". You can break out of low level HDL programming if you want, but at the cost of doing things somebody else's way, and most likely paying a lot for IP blocks that are huge, complex and costly. However if your needs are unique, then you need to work at low levels of abstraction, for at least part of the design. The 80/20 rule applies

- "I don't want to micromanage bits" - If you want to burn through FPGA resources at an alarming rate, and have minimal performance, all your 'variables' can be 64-bit integers. Sometimes the tools will pick up a reduced range and optimize unused bits way, sometimes it wont. The tighter you constrain the design (e.g. size of counters) the better the design will perform.

- "Productivity of HDLs is too low compared to programming" - Very valid. Being able to efficiently test designs like software is awesome. But then you have to verify that the resulting design actually is equivalent to the software...

- "I want to use programmers, not hardware designers" - If the programmer can't envisage what the design will look like in hardware, then they are just fumbling around in the dark. They will spend a lot of time trying to find an efficient way to express what they are trying to do in a way that the tool set likes and produces an efficient design.

In short - it seems to be great when cost is no object (e.g. research), performance is no object (e.g. research) and rapidly testing new things (e.g. research).

You can also paint yourself into a dead end. If your design tests out OK and fits into the target chip, but does not meet timing requirements, then what can you do? You need a skilled HDL coder to re-write the slow bit.

For some commercial use it is also workable, but requires a skilled hardware designer who knows the HLS tools and the problem space intimately, rather than a generic C/Python/whatever hack.

So in short it ends up with high-level code that is written in a quirky, ungainly way, but a 'normal programmer' can read and maybe make sense of - but a normal programmer will have minimal understanding of why it is like that. A single "refactor" of a module to make it "more normal" will break everything.
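
To make the pipeline point above concrete, here is a trivial three-stage multiply-add, written as a sketch (assume a, b and c are 8-bit unsigned signals, prod and result are 16-bit unsigned, and the _r/_rr signals are the pipeline registers):

Code: [Select]
-- a*b + c computed over three clock cycles, one new input accepted every clock
process(clk)
begin
  if rising_edge(clk) then
    -- stage 1: register the raw inputs
    a_r <= a;
    b_r <= b;
    c_r <= c;
    -- stage 2: multiply, and carry c along with it
    prod <= a_r * b_r;
    c_rr <= c_r;
    -- stage 3: add
    result <= prod + c_rr;
  end if;
end process;

Three different data items are being worked on in the same clock cycle - that overlap is the thing a sequential program can't express directly.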

Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 
The following users thanked this post: MK14

Offline Bruce Abbott

  • Frequent Contributor
  • **
  • Posts: 627
  • Country: nz
    • Bruce Abbott's R/C Models and Electronics
Re: Learning FPGAs: wrong approach?
« Reply #96 on: June 22, 2017, 09:54:51 pm »
It actually processes into VHDL or Verilog for synthesis.   The site does a better job explaining the pros and cons better than I ever could. 
Apart from 'empowering hardware designers with the elegance and simplicity of the Python language' the only advantage I could see is that you can apparently quickly create and simulate a design interactively. I say 'apparently' because the website is full of 'page not founds' everywhere.

"For more information about installing on non-Linux platforms such as Windows, read about Installing Python Modules." - 404 Not Found.

Great! can't even get started...
 

Offline Cerebus

  • Super Contributor
  • ***
  • Posts: 10576
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #97 on: June 22, 2017, 10:10:12 pm »
Perhaps the Python simulation is a little faster (maybe even a lot faster) but I don't usually bother with simulation.  If I did do simulation, I would use the chip vendor's simulator.  It's the only opinion that matters.

For general purpose verilog simulation (i.e. not process specific verification) there's verilator, an open source simulator that 'compiles' the verilog into C, which can then itself be compiled into machine code. It's fast, and on the right source material it's blazingly fast.

Also you can get at the internals of the simulation in a controlled fashion. I've used this to do mixed model simulation writing the analogue side of the simulation in C.
Anybody got a syringe I can use to squeeze the magic smoke back into this?
 

Online Someone

  • Super Contributor
  • ***
  • Posts: 4531
  • Country: au
    • send complaints here
Re: Learning FPGAs: wrong approach?
« Reply #98 on: June 23, 2017, 01:08:41 am »
- "The level of abstraction in HDLs is too low". You can break out of low level HDL programming if you want, but at the cost of doing things somebody else's way, and most likely paying a lot for IP blocks that are huge, complex and costly. However if your needs are unique, then you need to work at low levels of abstraction, for at least part of the design. The 80/20 rule applies.
Well, you can go to extremely high levels of abstraction in VHDL, so it's possible to have a higher level language just by using the existing tools better. But the core issue is that programming for simultaneous execution is radically different to programming for sequential execution.

There have been some good attempts at C-to-HDL and they work well at matching some patterns, but remain poor at handling arbitrary code. So even with the high level tools you still end up needing to understand the flow and the patterns that fit into logic, just as if you were writing HDL to begin with.
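
As a small example of what I mean by using the existing tools better (my own sketch, nothing from a library): VHDL handles unconstrained vectors, so one function covers every bus width.

Code: [Select]
-- parity of a std_logic_vector of any width
function parity(v : std_logic_vector) return std_logic is
  variable p : std_logic := '0';
begin
  for i in v'range loop
    p := p xor v(i);
  end loop;
  return p;
end function;

You can call parity() on an 8-bit byte or a 64-bit word and the loop unrolls into a plain XOR tree - but you still have to understand that it becomes a tree of gates, not something that iterates at run time, which is exactly the point about simultaneous versus sequential execution.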
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #99 on: June 23, 2017, 02:09:53 am »
- "The level of abstraction in HDLs is too low". You can break out of low level HDL programming if you want, but at the cost of doing things somebody else's way, and most likely paying a lot for IP blocks that are huge, complex and costly. However if your needs are unique, then you need to work at low levels of abstraction, for at least part of the design. The 80/20 rule applies.
Well you can go to extremely high levels of abstraction in VHDL, so its possible to have a higher level language by using the existing tools better. But the core issue is that programming for simultaneous execution is radically different to programming for sequential execution.

There have been some good attempts at C-hdl and they work well at matching some patterns, but remain poor at improving all code. So even with the high level tools you still end up needing to understand the flow and patterns that fit into logic, just as if you were programming HDL to begin with.

This is exactly the point.  There is a difference between writing sequential C code and designing hardware, and hardware design is, well, hard.  That's why it's called hardware.

Software speaks for itself  - soft.

It doesn't seem to me that CS majors are going to do well with HDL unless they also took some EE courses.  HDL is an entirely different thing.
 

Offline Mattjd

  • Regular Contributor
  • *
  • Posts: 230
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #100 on: June 23, 2017, 07:25:33 am »
- "The level of abstraction in HDLs is too low". You can break out of low level HDL programming if you want, but at the cost of doing things somebody else's way, and most likely paying a lot for IP blocks that are huge, complex and costly. However if your needs are unique, then you need to work at low levels of abstraction, for at least part of the design. The 80/20 rule applies.
Well you can go to extremely high levels of abstraction in VHDL, so its possible to have a higher level language by using the existing tools better. But the core issue is that programming for simultaneous execution is radically different to programming for sequential execution.

There have been some good attempts at C-hdl and they work well at matching some patterns, but remain poor at improving all code. So even with the high level tools you still end up needing to understand the flow and patterns that fit into logic, just as if you were programming HDL to begin with.

This is exactly the point.  There is a difference between writing sequential C code and designing hardware  and hardware design is, well, hard.  That's why it's called hardware.

Software speaks for itself  - soft.

It doesn't seem to me that CS majors are going to do well with HDL unless they also took some EE courses.  HDL is an entirely different thing.


Yes, without a course in digital logic/design a CS major will not be able to do well with HDL at all. You don't necessarily have to know how to design an IC at the transistor level, but you sure as shit need to know the boolean algebra to be able to multiplex, decode, encode, create registers, etc., and most importantly the graph theory for state machines.

I don't think people realize that when doing HDL, you create a module, and that module is essentially an IC - let's call it IC1. IC1 can be dropped into a solder-less breadboard or be put on a surface mount board, or whatever. Every time you "instantiate" a module, you're plugging another copy of IC1 into the breadboard.


See this guy (video embedded in the original post):

He built an 8-bit computer on a huge breadboard. When I created a 64-bit processor on my FPGA, I described each and every one of the ICs he has on that board, and described how to wire them together using HDL. The tools then interpreted what I described, synthesized it, and configured the FPGA's fabric to create those ICs and connections.
« Last Edit: June 23, 2017, 07:28:44 am by Mattjd »
 

Offline AndyC_772

  • Super Contributor
  • ***
  • Posts: 4228
  • Country: gb
  • Professional design engineer
    • Cawte Engineering | Reliable Electronics
Re: Learning FPGAs: wrong approach?
« Reply #101 on: June 23, 2017, 08:23:15 am »
Yes, without a course in digital logic/design a CS major will not be able to do well with HDL at all. You don't necessarily have to know how to design an IC at the transistor level, but you sure a shit need to know the boolean algebra to be able to multiplex, decode, encode, create registers, etc. and most importantly the graph theory for state machines.

I'm not sure I'd agree with that. When you design using HDL, you're describing the behaviour of the finished design in terms of what you want it to do. The synthesis tool might infer the need for multiplexers and D-types, but that's not the way the designer has to think. We're a level abstracted.

For example, suppose you're writing an SPI slave, which needs to be able to return one of a number of different values depending on which address is being read. If you're well versed in fundamental digital building blocks, then you might start thinking about how this would be realised using multiplexers and latches. It's fine to be aware, at a very general level, that these are the components which will be required, but you don't need to actually work out how to implement your desired logic using them.

Your code might look something like this:
Code: [Select]
IF sclk'event AND sclk = '1' THEN
  IF reset_n = '0' THEN
    spi_result <= 0;
  ELSE
    CASE spi_addr IS
    WHEN 0 =>
      spi_result <= version_register;
    WHEN 1 =>
      spi_result <= bytes_remaining;
    WHEN 2 =>
      spi_result <= irq_outstanding;
      irq_clear_sig <= NOT irq_clear_ack;
      counter <= 0;
    WHEN 3 =>
      spi_result <= measured_value (counter);
      counter <= counter + 1;
    WHEN 4 =>
      counter <= spi_written_value;
    WHEN OTHERS =>
      NULL;    -- addresses with no special behaviour: nothing changes
    END CASE;
  END IF;
END IF;

In this example, reading different register addresses should return different values, so clearly a multiplexer is required. Some values stored in latches are also clearly going to be needed.

But: there's quite a bit more to it than that. Reading the interrupt flag at address 2 also has the effect of clearing the flag and resetting a counter, so the design also requires a comparator and some reset logic for the counter. The counter also increments every time a result is read, so we need an adder, and it can also be directly updated by writing another register address.

Trying to work out the underlying building blocks required to implement this rapidly gets out of hand, but thankfully that's the job of the synthesis tool. I only need to be vaguely aware that X number of bits need to be preserved from one clock to the next, so I can estimate the logic usage of the design - and only then if it's big enough compared to the capacity of the chip for that to even possibly be an issue.

Quote
When I created a 64 bit processor on my FPGA, I described each and every one of those IC he has on that board and described how to wire them together using HDL. The HDL then interpreted what I described, synthesized it, and wired the transistors of the FPGA to create those IC and connections that it interpreted.

Oh, my. I do hope that something has got lost in translation somewhere, because describing individual discrete ICs (ie. standard, low level logic functions) and then joining them up is a terrible, terrible way to program an FPGA. HDL allows us to describe what we actually want a device to do, not how we think the thing we want could be built up out of basic logic elements.

Offline chris_leyson

  • Super Contributor
  • ***
  • Posts: 1541
  • Country: wales
Re: Learning FPGAs: wrong approach?
« Reply #102 on: June 23, 2017, 09:43:55 am »
Quote
Oh, my. I do hope that something has got lost in translation somewhere, because describing individual discrete ICs (ie. standard, low level logic functions) and then joining them up is a terrible, terrible way to program an FPGA. HDL allows us to describe what we actually want a device to do, not how we think the thing we want could be built up out of basic logic elements.
I once described the Cinematronics CPU as a bunch of individual TTL ICs for a simulation test bench but gave up trying to write a behavioral model as it was just taking far too long. Ended up turning the simulation model into something that could be synthesized and it worked. I totally agree it's the wrong approach but it was quick and dirty and just for fun.
 

Offline Bruce Abbott

  • Frequent Contributor
  • **
  • Posts: 627
  • Country: nz
    • Bruce Abbott's R/C Models and Electronics
Re: Learning FPGAs: wrong approach?
« Reply #103 on: June 23, 2017, 03:11:02 pm »
When you design using HDL, you're describing the behavior of the finished design in terms of what you want it to do. The synthesis tool might infer the need for multiplexers and D-types, but that's not the way the designer has to think. We're a level abstracted.
When designing any logic circuit you should start out with what you want it to do. HDL just skips the boring part of having to decide what gates to use and how to wire them.

Quote
describing individual discrete ICs (ie. standard, low level logic functions) and then joining them up is a terrible, terrible way to program an FPGA.
It has one valid use - converting an existing discrete design. But for a new design it's working backwards. The purpose of HDL is to avoid having to describe the circuit at the individual gate level. Being aware of what type of logic circuit you are creating is good, but trying to reproduce specific 'discrete' logic chips is unnecessarily limiting. Unfortunately many tutorials start out by doing that, presumably to give the student something they are familiar with (not a bad thing in itself, but it may give a wrong impression of how best to create a design).

However as a rank beginner who so far has only used WinCUPL - and tried to understand VHDL - the main problem I have is getting to grips with the language itself. Most of the tutorials I have tried  threw in new concepts without adequate explanation, and assumed you will pick up clues to the required syntax just by looking at examples. The result is I can read a piece of VHDL code and almost understand what is going on, but little details trip me up.
 

Offline AndyC_772

  • Super Contributor
  • ***
  • Posts: 4228
  • Country: gb
  • Professional design engineer
    • Cawte Engineering | Reliable Electronics
Re: Learning FPGAs: wrong approach?
« Reply #104 on: June 23, 2017, 03:36:03 pm »
It has one valid use - converting an existing discrete design. But for a new design it's working backwards.

I can see that there might be justification for working this way if:
  • a known working design already exists, and
  • the new design must be a drop-in functional equivalent of the existing one, and
  • it must be formally proved, for some reason, that the new functionality exactly replicates the old under all conditions

Then I can see an argument for building a new HDL design based on existing circuits. In any other case, I do think that describing the desired behaviour is the way to go.

Quote
the main problem I have is getting to grips with the language itself. Most of the tutorials I have tried  threw in new concepts without adequate explanation, and assumed you will pick up clues to the required syntax just by looking at examples. The result is I can read a piece of VHDL code and almost understand what is going on, but little details trip me up.

There's a lot to be tripped up on, especially if you try to read VHDL the same way as you might try to read and interpret a sequentially executed language that runs on a microprocessor. Most of us do, of course, because it's entirely natural to read it from top to bottom, and in some instances things which happen towards the end of the source file (note: most definitely not "later" in the file!) do take precedence over things which happen nearer the beginning (not "earlier"!).

The complete absence of any correlation between time of execution, and position in the source file, can easily do anyone's head in.

I'm not sure there's an easy way round this, other than asking questions when you get stuck - sorry.

Offline Cerebus

  • Super Contributor
  • ***
  • Posts: 10576
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #105 on: June 23, 2017, 03:53:50 pm »
It has one valid use - converting an existing discrete design. But for a new design it's working backwards.

I can see that there might be justification for working this way if:
  • a known working design already exists, and
  • the new design must be a drop-in functional equivalent of the existing one, and
  • it must be formally proved, for some reason, that the new functionality exactly replicates the old under all conditions

Then I can see an argument for building a new HDL design based on existing circuits. In any other case, I do think that describing the desired behaviour is the way to go.

Lest we forget, this is the very reason that VHDL exists. The US Department of Gung-ho and Killin' Furruners found that it was increasingly relying on systems full of VLSI chips that might disappear from the supply chain before the weapons system they were in reached end-of-life. They wanted a way of formally documenting chip designs to allow them to re-create these chips as necessary to keep systems in operation. The use of HDL as primary design tool and fodder for synthesis came later.

I'm not sure there's an easy way round this, other than asking questions when you get stuck - sorry.

You can make it less painful by picking Verilog instead of VHDL. [fx: ducks]
Anybody got a syringe I can use to squeeze the magic smoke back into this?
 

Offline AndyC_772

  • Super Contributor
  • ***
  • Posts: 4228
  • Country: gb
  • Professional design engineer
    • Cawte Engineering | Reliable Electronics
Re: Learning FPGAs: wrong approach?
« Reply #106 on: June 23, 2017, 04:02:55 pm »
I've been doing a new FPGA design from scratch all this week. It's now beer o'clock on Friday evening and I'm completely frazzled. Every time I close my eyes I see traces wiggling in ModelSim.

Does it really show that badly?  :-BROKE

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #107 on: June 23, 2017, 04:28:11 pm »
However as a rank beginner who so far has only used WinCUPL - and tried to understand VHDL - the main problem I have is getting to grips with the language itself. Most of the tutorials I have tried  threw in new concepts without adequate explanation, and assumed you will pick up clues to the required syntax just by looking at examples. The result is I can read a piece of VHDL code and almost understand what is going on, but little details trip me up.
IMHO one of the problems of VHDL is that many don't know how to really take advantage of it and do stupid things like using the std_logic_vector for all multi-bit signals and/or describe logic instead of functionality. That leads to longwinded incomprehensible code very quickly. Above all VHDL is a parallel programming language. Treat it as such and you will discover it has great power.
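A quick illustration of the std_logic_vector point - a sketch only, assuming 'count_slv' is declared as std_logic_vector(15 downto 0) and 'count' as unsigned(15 downto 0) with numeric_std:

Code: [Select]
-- counter kept as std_logic_vector: every operation needs conversions
count_slv <= std_logic_vector(unsigned(count_slv) + 1);

-- counter declared as unsigned: says what it means
count <= count + 1;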
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #108 on: June 23, 2017, 04:47:39 pm »
There's a lot to be tripped up on, especially if you try to read VHDL the same way as you might try to read and interpret a sequentially executed language that runs on a microprocessor. Most of us do, of course, because it's entirely natural to read it from top to bottom, and in some instances things which happen towards the end of the source file (note: most definitely not "later" in the file!) do take precedence over things which happen nearer the beginning (not "earlier"!).

They do happen earlier (or later). However, the things in VHDL don't happen at run time (as in C, for example); they happen at compile (synthesis) time. VHDL is more like Basic, where an interpreter reads the code and executes it immediately. The circuit being built is the result of this execution.

This is completely different from traditional languages (such as C) where the compiler builds a program, but doesn't execute it. The program executes later at run time.

 

Offline AndyC_772

  • Super Contributor
  • ***
  • Posts: 4228
  • Country: gb
  • Professional design engineer
    • Cawte Engineering | Reliable Electronics
Re: Learning FPGAs: wrong approach?
« Reply #109 on: June 23, 2017, 06:24:59 pm »
That's not a distinction I'd have made. BASIC and C may normally be processed into op-codes using different methods, but it's physically possible to compile BASIC and to interpret C if you were so inclined. Regardless of how each is parsed and processed, they both represent a set of instructions to be executed one at a time in a particular order.

Not so with VHDL. The example I like to use is the classic 'how not to swap two values' example found in beginner level textbooks:

Code: [Select]
a <= b;
b <= a;

We're all familiar with how this fails to work. The first assignment makes a equal to b, and the previous value of a is lost forever. The second assignment makes b equal to a, which it was already. Both end up equal to the original value of b.

In VHDL that's not the case. Put these signal assignments into a clocked process, and they do indeed switch values, because the meaning is quite different. "a <= b" means "signal a must, at a time a very short distance in the future, take the value which signal b has right now".

Since no time elapses between one line of code and the next, both signals do indeed switch.

Moreover, it doesn't matter in which order these two lines are written, the meaning is identically the same thing regardless.

However: the order in which commands are placed does affect their precedence in the event of a conflict. For example:

Code: [Select]
a <= b;
a <= a;

...has absolutely no effect whatsoever. The value of a remains completely unchanged, because the later assignment ('a' in the future takes the value of whatever 'a' is right now) simply overrides the earlier one. Synthesize this code, and precisely no FPGA resources at all will be required. There certainly won't be a glitch on the output, as there would be if a similar sequence of operations were to be carried out in order by a microprocessor.

I use this type of construct a lot when dealing with FIFOs. For example:

Code: [Select]
fifo_we <= '0';

IF <interesting set of conditions> THEN
  fifo_data <= new_data_value;
  fifo_we <= '1';
END IF;

This ensures that the FIFO is only ever advanced when there really is new data, and under no other possible conditions.

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #110 on: June 23, 2017, 07:02:40 pm »
I don't like to rely on how expressions are ordered. It will confuse people who are less familiar with VHDL and thus it costs time in the long run. For similar reasons I avoid certain constructs in C like the comma operator.

As a rule I code VHDL in a way so a signal assignment has a single condition. Sometimes this makes things harder at first but after some thinking about what I'm trying to achieve it usually results in less lines of code and a solution which is much easier to follow.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #111 on: June 23, 2017, 07:05:00 pm »
However as a rank beginner who so far has only used WinCUPL - and tried to understand VHDL - the main problem I have is getting to grips with the language itself. Most of the tutorials I have tried  threw in new concepts without adequate explanation, and assumed you will pick up clues to the required syntax just by looking at examples. The result is I can read a piece of VHDL code and almost understand what is going on, but little details trip me up.
IMHO one of the problems of VHDL is that many don't know how to really take advantage of it and do stupid things like using the std_logic_vector for all multi-bit signals and/or describe logic instead of functionality. That leads to longwinded incomprehensible code very quickly. Above all VHDL is a parallel programming language. Treat it as such and you will discover it has great power.

Most, if not all, entry level tutorials will use std_logic_vector rather than unsigned and that's how folks get started using std_logic_arith.all in order to implement counters or add vectors.  OTOH, the more pedantic approach of using unsigned simply means that I spend a lot of time writing casts between the types. 

As an example, I can't write the 32 bit unsigned output of an adder to a BlockRam (just to contrive an example) because the BlockRAM, over which I have no control of the definition, expects std_logic_vector.

http://www.synthworks.com/papers/vhdl_math_tricks_mapld_2003.pdf

And, yes, I am moving toward unsigned but I sure don't know why.
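
For reference, with numeric_std those casts are at least mechanical. A sketch of the BlockRAM case, with invented signal names:

Code: [Select]
-- 32-bit unsigned adder result onto a std_logic_vector RAM data port
bram_din <= std_logic_vector(adder_result);

-- and back to unsigned when reading the RAM
adder_in <= unsigned(bram_dout);

-- an integer address onto a std_logic_vector address port
bram_addr <= std_logic_vector(to_unsigned(addr, bram_addr'length));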

 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #112 on: June 23, 2017, 07:10:09 pm »
I don't like to rely on how expressions are ordered. It will confuse people who are less familiar with VHDL and thus it costs time in the long run. For similar reasons I avoid certain constructs in C like the comma operator.


Assuming that the code above (FIFO) was part of a larger FSM, the choice is to define a default condition for fifo_we and then override it when necessary, or to define it in every single state, which is a lot of useless typing.

I have been recommending the default value throughout this topic and this is another example where it applies.

Different people use different styles.  I don't shy away from the C comma operator if it means I can avoid an 'else' block containing just a single line.

Again, different styles...
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #113 on: June 23, 2017, 07:14:13 pm »
However as a rank beginner who so far has only used WinCUPL - and tried to understand VHDL - the main problem I have is getting to grips with the language itself. Most of the tutorials I have tried  threw in new concepts without adequate explanation, and assumed you will pick up clues to the required syntax just by looking at examples. The result is I can read a piece of VHDL code and almost understand what is going on, but little details trip me up.
IMHO one of the problems of VHDL is that many don't know how to really take advantage of it and do stupid things like using the std_logic_vector for all multi-bit signals and/or describe logic instead of functionality. That leads to longwinded incomprehensible code very quickly. Above all VHDL is a parallel programming language. Treat it as such and you will discover it has great power.

Most, if not all, entry level tutorials will use std_logic_vector rather than unsigned and that's how folks get started using std_logic_arith.all in order to implement counters or add vectors.  OTOH, the more pedantic approach of using unsigned simply means that I spend a lot of time writing casts between the types. 

As an example, I can't write the 32 bit unsigned output of an adder to a BlockRam (just to contrive an example) because the BlockRAM, over which I have no control of the definition, expects std_logic_vector.

http://www.synthworks.com/papers/vhdl_math_tricks_mapld_2003.pdf

And, yes, I am moving toward unsigned but I sure don't know why.
That is why you should use numeric_std. Basic rule: if it is a number then use the types signed and unsigned. And don't instantiate blockrams (or any other primitives) directly. Just create an array and the synthesizer will decide whether to use blockrams or other resources.

Then you can do stuff like this to read data from a memory:
Code: [Select]
if rising_edge(clk) then
  a <= ram_data(read_pointer);
end if;
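
Fleshing that out a little, here is a sketch of the whole inferred RAM (clk, we, data_in and a are assumed to be declared elsewhere; most synthesizers will map this pattern onto block RAM):

Code: [Select]
type ram_type is array (0 to 1023) of std_logic_vector(31 downto 0);
signal ram_data      : ram_type;
signal read_pointer  : integer range 0 to 1023;
signal write_pointer : integer range 0 to 1023;

process(clk)
begin
  if rising_edge(clk) then
    if we = '1' then
      ram_data(write_pointer) <= data_in;   -- synchronous write port
    end if;
    a <= ram_data(read_pointer);            -- synchronous read port
  end if;
end process;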
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline AndyC_772

  • Super Contributor
  • ***
  • Posts: 4228
  • Country: gb
  • Professional design engineer
    • Cawte Engineering | Reliable Electronics
Re: Learning FPGAs: wrong approach?
« Reply #114 on: June 23, 2017, 07:31:51 pm »
I like the general approach of having the usual, most likely value of a signal be defined as a default, then have the code describe those things which are interesting or noteworthy under various conditions.

The FIFO example is one which works well. Another might be a counter, which increments on every clock edge except when some event causes it to reset to zero. You could end up with a lot of paths through a FSM all of which boil down to "do something interesting, and yes, don't forget to increment the bl**dy counter".
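
In code, that counter looks something like this (a sketch - 'counter' is assumed to be an unsigned signal and 'sync_reset' whatever event clears it):

Code: [Select]
IF rising_edge (clk) THEN
  counter <= counter + 1;          -- the usual case, stated once as the default

  IF sync_reset = '1' THEN
    counter <= (others => '0');    -- the interesting case overrides it
  END IF;
END IF;

The later assignment wins, so there's no glitch and no duplicated 'increment' lines scattered through the states.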

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #115 on: June 23, 2017, 08:34:52 pm »
"a <= b" means "signal a must, at a time a very short distance in the future, take the value which signal b has right now".

What do you mean by "right now?"
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #116 on: June 23, 2017, 08:50:07 pm »

That is why you should use std_numeric. Basic rule: if it is a number then use the types signed and unsigned. And don't infer blockrams (or any other primitives). Just create an array and the synthesizer will decide whether to use blockrams or other resources.


I don't usually infer BlockRams, I instantiate them and, more often than not, specify the initial contents.  Furthermore, it is easy to initialize contents post bit file generation by using Xilinx's data2mem utility.  This means I don't need to re-synthesize or rerun place/route just to test a different program (assuming a CPU project).

At one time, I was specifying the contents in the .ucf file simply because it eliminated having to re-synthesize.  Later on I ran across data2mem and now I don't even need to place/route or generate the bitfile.

These days, playing with the Lattice toolchain, it seems simpler to use their IPexpress to create the memory block and attach a filename for the contents.  Unfortunately, this implies I will have to re-synthesize to update the contents.  I'm not too sure what to think about that.  The good news is that I am only messing around with the Lattice MachXO3 board.  I'll spend most of my time in the Xilinx world.
 

Offline AndyC_772

  • Super Contributor
  • ***
  • Posts: 4228
  • Country: gb
  • Professional design engineer
    • Cawte Engineering | Reliable Electronics
Re: Learning FPGAs: wrong approach?
« Reply #117 on: June 23, 2017, 10:19:37 pm »
"a <= b" means "signal a must, at a time a very short distance in the future, take the value which signal b has right now".

What do you mean by "right now?"

I mean, at the precise instant when either:

a) an active clock edge occurs, or
b) a signal in a process's sensitivity list changes state, causing the process to (for want of a better term) execute.

For example:

Code: [Select]
IF rising_edge (clk) THEN
  b <= a;
  a <= b;
END IF;

...causes the values stored in registers 'a' and 'b' to switch places at the precise time when a rising edge of the clock occurs.

In a real device, there are of course propagation delays to consider, so the values will actually switch a few nsec after the edge. Nevertheless, the meaning of the code is unambiguous, and the synthesis tool will work out the necessary layout to make the real logic behave correctly, including any signals which depend on those which have just been assigned.

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #118 on: June 23, 2017, 10:32:22 pm »
Nitpicking mode: For synchronous processes it will work, but for asynchronous processes (sensitive to signals other than a clock) it will result in a mess and the synthesizer won't be able to do anything about it. You can't rule out the nature of the FPGA fabric entirely, so signals won't arrive at the same time.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #119 on: June 23, 2017, 11:23:00 pm »
Quote
What do you mean by "right now?"

I mean, at the precise instant when either:

a) an active clock edge occurs, or
b) a signal in a process's sensitivity list changes state, causing the process to (for want of a better term) execute.

You relate your time to the events which are going to happen in the FPGA when you run your design. From this timing viewpoint, VHDL looks weird and non-sequential, which might be a cause of confusion. To see the sequence in VHDL, try instead to relate to the time when the VHDL compiler goes through your VHDL code.

Imagine you're building a circuit from ICs on a breadboard. You have inserted the ICs and now you need to connect some wires. You will do this sequentially, but the exact sequence doesn't really matter - you can do it in any order as long as the end result is the same. The sequence of connecting wires has nothing to do with the sequence of events which will transpire in the circuit when you turn it on.

Same with VHDL. As the VHDL compiler reads your code, it connects wires accordingly (or makes other changes to the circuit being built). These operations are perfectly sequential. But there's no direct relation between the sequence of VHDL statements and the sequence of events in the FPGA.

For example, look at the following VHDL code:

Code: [Select]
process(clk)
begin
  if rising_edge(clk) then
    -- Connect 7 wires to produce a shift
    for i in 0 to 6 loop
      a(i) <= a(i+1);
    end loop;
    -- Right Now the 8-th wire is still unconnected, so connect it
    a(7) <= a(0);
  end if;
end process;

Here VHDL goes through the "for" loop, making a single connection on every pass. This loop exists only in VHDL - nothing is going to loop in the FPGA. The "i" variable will not exist in the FPGA either. The words "Right Now" in the comment refer to the time during the process of connecting wires (building the circuit). This time point will not exist in the FPGA. If you look at it this way, VHDL is perfectly sequential. That's how I look at it.

 

Online BrianHG

  • Super Contributor
  • ***
  • Posts: 7738
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #120 on: June 23, 2017, 11:28:24 pm »
"a <= b" means "signal a must, at a time a very short distance in the future, take the value which signal b has right now".

What do you mean by "right now?"

This is how I like to think about it.  The '<=' means the logic, or math, will be performed once every clock cycle.  It's basically a set of D flipflops.  The variable to the left of '<=' is the output of those D flipflops.

So, if we say:

a <= b + c;

This means that the variables 'b' and 'c' go through addition logic and feed the data inputs of the flipflops which create variable 'a'.  So at the next clock cycle, the value of 'a' will change to the sum of 'b+c'.

If we say:
a <= a + 1;

This still looks like a simple C or Basic program line.  The data inputs of the 'a' flipflops are tied to the outputs of the 'a' flipflops plus 1.  Once again, with each clock, the outputs of 'a' will take a new value.

Now, if we say:
a <= b;
b <= a;

What's going on here is that the 'a' data inputs are tied to the 'b' flipflops' outputs, and the 'b' data inputs are tied to the 'a' outputs.  Since Verilog/VHDL runs all the logic in parallel, when a single clock edge comes, 'a' latches the 'b' outputs and 'b' latches the 'a' outputs.  This effectively swaps the contents of 'a' & 'b' on each and every clock.

Now, if we say:
a <= b;
b <= c;
c <= a;

Just follow the above rules and you will see we have sort of made a circular 3-word buffer, where on every clock all 3 variables a, b, c move at once, not one after the other.

Now, for that pesky phrase 'at a time a very short distance in the future'.  This has to do with how fast the clock is going.  If the clock period is longer than the time the wiring and logic gates on the silicon need to settle all the data inputs (the expression on the right-hand side of the '<='), then on the clock edge the registers on the left-hand side of the '<=' will capture the correct result.  If your clock is going too fast, some or all of the bits feeding the data inputs of those flipflop registers will not have settled yet, and the registers will capture the wrong result.
 

Online BrianHG

  • Super Contributor
  • ***
  • Posts: 7738
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #121 on: June 24, 2017, 01:35:47 am »
Just a note on HDL developers vs software developers.  I develop in both, and I do see how some of my HDL debugging can drive me nuts finding that tiny logic error which always creeps up somewhere in a sophisticated design, and it does take me longer to work with because of the compile times for designs which grow too large to simulate on a small home setup.  However, I thoroughly enjoy the work and effort put into a working FPGA, just for the raw awesome potential to make anything happen, even with all the costs involved.

How many of you feel this way?
Or, do you go to FPGA just because you have no other choice and would prefer a simple MCU only solution?
 

Offline Bruce Abbott

  • Frequent Contributor
  • **
  • Posts: 627
  • Country: nz
    • Bruce Abbott's R/C Models and Electronics
Re: Learning FPGAs: wrong approach?
« Reply #122 on: June 24, 2017, 08:07:33 pm »
There's a lot to be tripped up on, especially if you try to read VHDL the same way as you might try to read and interpret a sequentially executed language that runs on a microprocessor... The complete absence of any correlation between time of execution, and position in the source file, can easily do anyone's head in.
I don't have a problem with that. It's obvious that hardware operates sequentially only when it is wired sequentially, and why should the position in the source file matter? 

What I am talking about is stuff like this:-

A VHDL Tutorial from Green Mountain Computing Systems, Inc. introduces you to VHDL with this:-

Quote
entity latch is
  port (s,r: in bit;
        q,nq: out bit);
end latch;

The first line indicates a definition of a new entity, whose name is latch. The last line marks the end of the definition. The lines in between, called the port clause, describe the interface to the design. The port clause contains a list of interface declarations. Each interface declaration defines one or more signals that are inputs or outputs to the design.

Each interface declaration contains a list of names, a mode, and a type. In the first interface declaration of the example, two input signals are defined, s and r. The list to the left of the colon contains the names of the signals, and to the right of the colon is the mode and type of the signals. The mode specifies whether this is an input (in), output (out), or both (inout). The type specifies what kind of values the signal can have. The signals s and r are of mode in (inputs) and type bit. Next the signals q and nq are defined to be of the mode out (outputs) and of the type bit (binary). Notice the particular use of the semicolon in the port clause. Each interface declaration is followed by a semicolon, except the last one, and the entire port clause has a semicolon at the end.
That's a lot to take in for your very first exposure to VHDL - a dense paragraph that introduces many new concepts but expects you to infer others (what is the meaning of 'is'? where is whitespace permitted? etc.) 

Then they follow it with this:-
Quote
architecture dataflow of latch is
  signal q0 : bit := '0';
  signal nq0 : bit := '1';
begin
  q0<=r nor nq0;
  nq0<=s nor q0;

  nq<=nq0;
  q<=q0;
end dataflow;

The first line of the declaration indicates that this is the definition of a new architecture called dataflow and it belongs to the entity named latch. So this architecture describes the operation of the latch entity. The lines in between the begin and end describe the latch's operation.
What does 'signal' mean in this context? Why is there no mode? What does ':=' mean? Why do we need 'begin'? nq is less than or equal to nq0?

I am currently reading this tutorial, which is making a lot more sense to me so far...
 
 

Offline Mattjd

  • Regular Contributor
  • *
  • Posts: 230
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #123 on: June 24, 2017, 11:59:27 pm »
Now i feel like everything i've learned about HDL is wrong.
 

Online BrianHG

  • Super Contributor
  • ***
  • Posts: 7738
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #124 on: June 25, 2017, 01:48:22 am »
Now i feel like everything i've learned about HDL is wrong.
I use Verilog to make my life soooo much easier.  Especially if you use a simple single synchronous clock for everything, nothing asynchronous.  Coding this way makes for very portable designs across all FPGAs and PLDs.  As for the above VHDL example, it twists my head and I avoid it at all costs and won't ever use it.
 

Offline Amazing

  • Regular Contributor
  • *
  • Posts: 59
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #125 on: June 25, 2017, 03:41:36 am »
How many of you feel this way?
Or, do you go to FPGA just because you have no other choice and would prefer a simple MCU only solution?

I got dragged into FPGA work kicking and screaming due to a contractor who bailed on the project after creating the hardware but before writing the VHDL.  So we were stuck with an FPGA-based board and no one to program it.

I got lucky in that I found another contractor who was a wiz at VHDL and he got our board going.  He also taught me a ton and now I really enjoy being able to harness the power of FPGAs in my designs.

One thing that I think is really fun is getting deeply involved in breaking down a problem, designing cores (e.g. ALUs) specifically for that problem, and pipelining out the wazoo to increase efficiency.

Sadly though I tend not to have time for that sort of thing on a paying gig -- then I just buy the next size up, describe the logic in state machines, and let the synthesizer do its thing.  Much more cost efficient for low volume production that way.

What I learned about writing VHDL is that it's all about the mindset.  EE's love to remind us software folks to "remember, you're creating hardware, not writing a program".  But it's really much deeper and less obvious than that.

To everyone learning for the first time, I'd say, persevere, take small steps, don't worry about simulation or test benches at first, and read as much as you can on different styles of programming VHDL.  Eventually it will soak in and you will "get it".

I use Verilog to make my life soooo much easier.  Especially if you use a simple single synchronous clock for everything, nothing asynchronous.  Coding this way makes for very portable designs across all FPGAs and PLDs.  As for the above VHDL example, it twists my head and I avoid it at all costs and won't ever use it.

That's funny, I learned VHDL first and I think that Verilog is incomprehensible.
« Last Edit: June 25, 2017, 03:46:08 am by Amazing »
 

Offline mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 13748
  • Country: gb
    • Mike's Electric Stuff
Re: Learning FPGAs: wrong approach?
« Reply #126 on: June 25, 2017, 08:53:18 am »
You must never forget that you are describing hardware.
<= means 'is connected to', not 'becomes equal to'.
Youtube channel:Taking wierd stuff apart. Very apart.
Mike's Electric Stuff: High voltage, vintage electronics etc.
Day Job: Mostly LEDs
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Learning FPGAs: wrong approach?
« Reply #127 on: June 25, 2017, 09:55:04 am »
You must never forget that you are describing hardware.
<= means 'is connected to', not 'becomes equal to'.

"is connected to" doesn't work in clocked processes. e.g:

Code: [Select]
  if rising_edge(clk) then
     a <= b;
  end if;

I don't think of it as "on the rising edge of 'clk', 'a' is connected to 'b'" - if asked to describe it, I would say "on each 'clk' tick, store the value of 'b' in 'a'".

I can't actually find words that match what "<=" does in each of the different contexts in which it is used. I think of "=>" more as 'connected to', for example

Code: [Select]
i_counter: counter port map (
    clk     => sys_clk,
    count => cycle_count);

I think of that as "a counter, with 'clk' connected to 'sys_clk' and 'count' connected to 'cycle_count'"...
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline AndyC_772

  • Super Contributor
  • ***
  • Posts: 4228
  • Country: gb
  • Professional design engineer
    • Cawte Engineering | Reliable Electronics
Re: Learning FPGAs: wrong approach?
« Reply #128 on: June 25, 2017, 10:08:34 am »
<= means 'is connected to', not 'becomes equal to'.

Sorry, Mike, I don't agree with you there. The only symbol which means "is connected to" is "=>", when used to map the ports of a component to signals at a higher level of hierarchy:

Code: [Select]
my_logic_gate: d_type PORT MAP (
  d_in => my_data,
  q_out => my_output,
  clk => master_clock
);

I think of "=>" as meaning "takes its new value from", or indeed, "becomes equal to" (at a point in the future one time quantum from now, but not actually now)

In a clocked process:
Code: [Select]
PROCESS (clk)
-- exchange the values of a and b on every clock edge
BEGIN
  IF clk'event AND clk = '1' THEN
    a <= b;
    b <= a;
  END IF;
END PROCESS;

...or in asynchronous logic...

Code: [Select]
PROCESS (a)
BEGIN
  b <= NOT a;
END PROCESS;


Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #129 on: June 25, 2017, 10:22:07 am »
You must never forget that you are describing hardware.
Actually you must forget about the hardware, otherwise you'll be writing way too much code. When programming in C you also don't bother about whether a variable is stored in register r1 or r2, or where exactly it sits in RAM. VHDL is the same. For example: you can write a <= a*(b+d) + c; in VHDL and the synthesizer will figure out it needs a multiplier and how it needs to be connected. No need to instantiate one yourself and deal with how it is actually connected.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline AndyC_772

  • Super Contributor
  • ***
  • Posts: 4228
  • Country: gb
  • Professional design engineer
    • Cawte Engineering | Reliable Electronics
Re: Learning FPGAs: wrong approach?
« Reply #130 on: June 25, 2017, 10:27:04 am »
What does 'signal' mean in this context? Why is there no mode? What does ':=' mean? Why do we need 'begin'? nq is less than or equal to nq0?

I hate this, when tutorials are written by people a little too familiar with the subject matter, and they begin with material that should have been on about page 5, leaving out the important introduction to the subject (definitions, context, general explanation of what the heck is going on) which should have filled pages 1 to 4.

A "signal" is any value which needs to be stored, or output from the device. Almost every piece of data which your FPGA handles will be a "signal". The values of signals" are generally retained in the D-type latches which form part of the FPGA fabric.

I don't know what you mean by "mode" in this context.

":=" is a symbol used, in this context, to assign a default value to a signal, which it will have at the point when the FPGA has just been powered up and configured. It's a method often used to ensure that counters start at zero, state machines initialise to a valid 'idle' state, and so on.

"<=" does indeed mean "less than or equal" when used in the context of a comparison, but here, it means assignment (see long rambling posts above).

"Begin" just means "by this point, we've declared all the signals we're going to use... now here's the logic which defines their behaviour". It's just semantics. Some things must go before the 'begin', and some after. Don't read too much into it, just copy an example and structure your code the same way.

Offline Cerebus

  • Super Contributor
  • ***
  • Posts: 10576
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #131 on: June 25, 2017, 11:15:58 am »
What does 'signal' mean in this context? Why is there no mode? What does ':=' mean? Why do we need 'begin'? nq is less than or equal to nq0?

I hate this, when tutorials are written by people a little too familiar with the subject matter, and they begin with material that should have been on about page 5, leaving out the important introduction to the subject (definitions, context, general explanation of what the heck is going on) which should have filled pages 1 to 4.

A "signal" is ...

I don't know what you mean by "mode" in this context.

":=" is a ...

"<=" does indeed mean ....


"Begin" just means "by this point, we've declared all the signals we're going to use... now here's the logic which defines their behaviour". It's just semantics. Some things must go before the 'begin', and some after. Don't read too much into it, just copy an example and structure your code the same way.

I think Bruce's questions were meant to be rhetorical. And I think you mean 'syntax' not "semantics".
Anybody got a syringe I can use to squeeze the magic smoke back into this?
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #132 on: June 25, 2017, 01:02:04 pm »
"<=" doesn't mean "connect", but it infers connection(s).

The only way to make things work on a breadboard is to place ICs and connect them with wires.

An FPGA is a huge collection of elements (LUTs, FFs, RAM etc.). They're connected through configuration switches. The bitstream is simply a collection of bits.  Each bit controls a switch (or switches), thus making or breaking a connection.

The VHDL code is simply a mechanism to convey which connections are needed.

Code: [Select]
PROCESS (clk)
-- exchange the values of a and b on every clock edge
BEGIN
  IF clk'event AND clk = '1' THEN -- A signal which changes in this block is going to be a flip-flop clocked by clk
    a <= b; -- connect the output of flip-flop b to the input of flip-flop a
    b <= a; -- connect the output of flip-flop a to the input of flip-flop b
  END IF;
END PROCESS;

Code: [Select]
PROCESS (a)
BEGIN
  b <= NOT a; -- build an inverter. Connect its input to a and output to b.
END PROCESS;
« Last Edit: June 25, 2017, 01:37:00 pm by NorthGuy »
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #133 on: June 25, 2017, 01:38:55 pm »
What is wrong with seeing <= and := as assignment operators? Just like in C the = assigns the value from what is on the right to what is on the left. In VHDL <= and := assign what is on the right to what is on the left so there really isn't any difference.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #134 on: June 25, 2017, 03:32:15 pm »
Now i feel like everything i've learned about HDL is wrong.
I use Verilog to make my life soooo much easier.  Especially if you use a simple single synchronous clock for everything, nothing asynchronous.  Coding this way makes for very portable designs across all FPGAs and PLDs.  As for the above VHDL example, it twists my head and I avoid it at all costs and won't ever use it.

It's odd how the language you start with becomes your language of choice.  I started with VHDL and, for the life of me, I can't figure out Verilog.  VHDL tends to be more Pascal like in that it is quite verbose.  Verilog, in my view, is C like in that it can be quite terse.

I have made several half-hearted attempts to understand Verilog and I can't get there.  What I need to do is design an entire project using only Verilog and force myself to work with it.  But, no, I will get to the point where all I want is the finished project and it will be coded in VHDL.

I have NEVER understood the difference between blocking and non-blocking assignments in an 'always' block and whether it matters if the block is clocked.  I read this and get completely confused...

https://electronics.stackexchange.com/questions/91688/difference-between-blocking-and-nonblocking-assignment-verilog

In VHDL, it's a simple concept:  If the block is clocked, all assignments in the block are registered.  If the block isn't clocked, all assignments are combinatorial.  THIS I can understand!
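A minimal sketch of that rule (made-up signal names):
Code: [Select]
-- clocked process: q becomes a register
process(clk)
begin
  if rising_edge(clk) then
    q <= d;
  end if;
end process;

-- unclocked process: y is pure combinatorial logic
process(a, b)
begin
  y <= a and b;
end process;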

Verilog has the '=' symbol for 'blocking' assignment and '<=' for 'non-blocking' assignments (whatever that may mean).  But the idea that one creates sequential logic and the other creates parallel logic within the 'always' block escapes me.  It's ALL parallel inside the chip!

I think I'm just too old to catch on...

 

Offline mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 13748
  • Country: gb
    • Mike's Electric Stuff
Re: Learning FPGAs: wrong approach?
« Reply #135 on: June 25, 2017, 04:16:06 pm »
What is wrong with seeing <= and := as assignment operators? Just like in C the = assigns the value from what is on the right to what is on the left. In VHDL <= and := assign what is on the right to what is on the left so there really isn't any difference.
The problem is that in a programming language, assignment happens at a specific moment. In asynchronous logic, the assignment is effectively happening continuously.
Youtube channel:Taking wierd stuff apart. Very apart.
Mike's Electric Stuff: High voltage, vintage electronics etc.
Day Job: Mostly LEDs
 

Offline Cerebus

  • Super Contributor
  • ***
  • Posts: 10576
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #136 on: June 25, 2017, 04:44:01 pm »
Verilog has the '=' symbol for 'blocking' assignment and '<=' for 'non-blocking' assignments (whatever that may mean).

A 'blocking' assignment blocks anything else from happening (simultaneously) in the same code block while the assignment is happening; a 'non-blocking' one doesn't.

So, if we start off with three registers and their initial values A=1, B=2 and C=3.

If we execute the following sequence of blocking assignments:

begin
   B = A;
   C = B;
end


we get the result A=1, B=1, C=1. That is, the first statement executed in its entirety before the second, each blocking assignment is 'executed' in sequence. Now let's do the same thing with non-blocking assignments, and the same initial values as before:

begin
   B <= A;
   C <= B;
end


This time the result is A=1, B=1, C=2. The values for the right hand sides were taken as we 'passed' 'begin', all the assignments happened simultaneously, and they all finished at the same time, just as we reached 'end'.

That's slightly simplistic and wouldn't probably satisfy a language lawyer, but it gives the essentially flavour of what's going on.

The blocking assignment is useful in writing test beds and the like but dangerous, and usually wrong, in writing code that you actually expect to be implemented in hardware. You can fake up quite a complex signal for a test bed by combining blocking assignments with delays but that kind of usage is not synthesizeable and so will never make it to real hardware.
Anybody got a syringe I can use to squeeze the magic smoke back into this?
 

Offline Bruce Abbott

  • Frequent Contributor
  • **
  • Posts: 627
  • Country: nz
    • Bruce Abbott's R/C Models and Electronics
Re: Learning FPGAs: wrong approach?
« Reply #137 on: June 25, 2017, 04:50:22 pm »
I think Bruce's questions were meant to be rhetorical.
At the time I read the tutorial that was what I was thinking. I now know better, but this thread is helping to clarify some things in my mind.   
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #138 on: June 25, 2017, 05:11:48 pm »
Verilog has the '=' symbol for 'blocking' assignment and '<=' for 'non-blocking' assignments (whatever that may mean).

A 'blocking' assignment blocks anything else from happening (simultaneously) in the same code block while the assignment is happening; a 'non-blocking' one doesn't.

So, if we start off with three registers and their initial values A=1, B=2 and C=3.

If we execute the following sequence of blocking assignments:

begin
   B = A;
   C = B;
end


we get the result A=1, B=1, C=1. That is, the first statement executed in its entirety before the second, each blocking assignment is 'executed' in sequence. Now let's do the same thing with non-blocking assignments, and the same initial values as before:

begin
   B <= A;
   C <= B;
end


This time the result is A=1, B=1, C=2. The values for the right hand sides were taken as we 'passed' 'begin', all the assignments happened simultaneously, and they all finished at the same time, just as we reached 'end'.

That's slightly simplistic and probably wouldn't satisfy a language lawyer, but it gives the essential flavour of what's going on.

The blocking assignment is useful in writing test beds and the like but dangerous, and usually wrong, in writing code that you actually expect to be implemented in hardware. You can fake up quite a complex signal for a test bed by combining blocking assignments with delays but that kind of usage is not synthesizeable and so will never make it to real hardware.

The first one infers a flip-flop with A as input and both B and C as outputs. Sequential (blocking) execution of Verilog statements produces parallel wiring.

The second one infers two flip-flops connected in a chain. A->FF->B->FF->C. Parallel (non-blocking) execution of Verilog statements produces serial wiring.

This is certainly a case of weird terminology.

I use VHDL because I started with it (pure coincidence). I have no intention of using Verilog. VHDL lets me do everything I would want it to do. I'm absolutely sure that if I had started with Verilog, the situation would be reversed and I would never want to use VHDL. Just as rstofer suggested. Imprinting :)
 

Offline hans

  • Super Contributor
  • ***
  • Posts: 1640
  • Country: nl
Re: Learning FPGAs: wrong approach?
« Reply #139 on: June 25, 2017, 05:36:28 pm »
in VHDL:

"<=" is used in assignments of signals.
":=" is used for assignment of variables.

Signals can exist in an architecture, process and procedures.
Variables can exist in process and functions.

A signal at an architecture level is basically a wire. It connects signals together with perhaps a few gates, like:
Code: [Select]
ARCHITECTURE ... OF ... IS
SIGNAL a, b, c : STD_LOGIC;
BEGIN
a <= b AND c;
END ARCHITECTURE;

This way you can compute new values within an entity (not shown in example).

Using a process you could compute new values of a at the rising edge of a clock, i.e. sequential logic:
Code: [Select]
ARCHITECTURE ... OF ... IS
SIGNAL a, b, c : STD_LOGIC;
BEGIN
PROCESS(clk)
BEGIN
IF rising_edge(clk) THEN
a <= b AND c;
END IF;
END PROCESS;
END ARCHITECTURE;

Why have variables when we have signals? Because if you assign a new value to a signal, its new value will not take effect immediately. Only after the process has finished running is the new value used.

A variable, however, is updated instantly, so you can assign a value and then read that new value straight back. A variable does hold its value after you "exit" the process as well. But you cannot use them in an architecture, so they are best used for intermediate values.
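A minimal sketch of the difference (made-up names; a, b, c, d and clk are signals):
Code: [Select]
process(clk)
  variable tmp : std_logic;
begin
  if rising_edge(clk) then
    tmp := b and c;    -- variable: updated immediately, usable on the very next line
    a   <= tmp or d;   -- signal: takes its new value only after the process suspends
  end if;
end process;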

In terms of simulation this is a key difference. Signals are simulated using delta delays. That means that if a new value is assigned to a signal, it is scheduled to take that value at t+1 'delta'. If new values for other signals then need to be computed (e.g. b or c changed in the first example), that will happen at t+2 delta, t+3 delta, etc. A delta is an arbitrarily small time step, just to indicate that something happens slightly later in the future.

Because all statements in a process happen at one timestamp, time can only advance when the process is left or a wait statement has been hit (unusual to use if you target hardware, especially with the free tools).

In terms of synthesis onto real hardware, either a signal or a variable in a process can result in a wire or a D flip-flop. This depends on whether the value is first written and then read (= wire) or first read and then written (= flip-flop).


I'm sure this has strong similarities to Verilog's blocking and non-blocking assignments, but I haven't programmed much Verilog, mostly read code. Both languages are very similar; VHDL is strongly typed, Verilog is loosely typed. Verilog has some unique features, but so does VHDL...
« Last Edit: June 25, 2017, 05:39:46 pm by hans »
 

Offline mark03

  • Frequent Contributor
  • **
  • Posts: 711
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #140 on: June 25, 2017, 05:55:11 pm »
For all the Python haters, yes.. you can design hardware with Python - http://www.myhdl.org :)
As if VHDL and Verilog weren't confusing enough, now we have another HDL to learn.

What advantages does MyHDL have over the other two?

All these High Level Synthesis (HLS) HDLs seem to have common threads to address these (and other) problems:

MyHDL is not HLS.  It is a bona fide HDL which just so happens to be implemented within the [very flexible] syntax of Python.  Everything you have in Verilog and VHDL you get in MyHDL too, and there is very little in MyHDL which does not map 1:1 back into the incumbent languages.

As to why MyHDL and not an incumbent HDL, I think the author would claim to have avoided some of the mistakes that were made in Verilog/VHDL, in the same way that *any* second try usually comes out better, simply because it is informed by experience.  He (Jan) is more of a VHDL guy, and that definitely shows in MyHDL, but the verbosity and archaisms many people dislike in VHDL are much reduced in MyHDL.

Another big reason: writing test benches in Python is going to be almost infinitely better than writing them in Verilog/VHDL.  You can take advantage of the Python unit-test frameworks, simulate your DSP flow using NumPy/SciPy, make actually useful plots, and so on.  I think this aspect alone would tip the scales in MyHDL's favor were it not for...

The biggest reason NOT to use MyHDL:  It's not directly supported by FPGA vendors, and never will be.  The generated Verilog/VHDL output is fine as long as you are using vanilla HDL, but as soon as you need to work with and simulate a vendor-specific hard block, it becomes a major headache.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Learning FPGAs: wrong approach?
« Reply #141 on: June 25, 2017, 09:50:59 pm »
Actually you must forget about the hardware otherwise you'll be writing way too much code.

I don't think that is 100% true - if you forget that you are working in h/w you can drop into writing code that does not map well to H/W.

Thought experiment - A design requires a module that takes a clk signal and four 8-bit numbers (a_in, b_in, c_in, d_in) and sorts them low to high, to generate four outputs (a_out, b_out, c_out and d_out). Design it with a software mindset, and then a H/W mindset.

Software is easy:
Code: [Select]
  // copy them all across
  a_out <= a_in;  b_out <= b_in;  c_out <= c_in; d_out <= d_in;
  // bubble sort them
  if(a_out > b_out) swap(a_out, b_out);
  if(b_out > c_out) swap(b_out, c_out);
  if(c_out > d_out) swap(c_out, d_out);

  if(a_out > b_out) swap(a_out, b_out);
  if(b_out > c_out) swap(b_out, c_out);

  if(a_out > b_out) swap(a_out, b_out);
  // Should now be in order
(I might have 20% more code / cycles than needed)

The H/W mindset has additional factors
- Latency - can it be done in a single cycle? how many cycles are needed?
- Speed - what will clock fastest? - fastest is most likely three cycles.
- Logic resource used
- Maximizing concurrency
- Can it efficiently scale when the need for five or more inputs inevitably comes along?

So when it comes to "which is the best way" for H/W there are more factors in play, even for as simple a task as ranking four numbers in order.



Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #142 on: June 25, 2017, 10:12:13 pm »
Just like in software, things like speed, resources and size only become relevant in corner cases, and optimising them takes a lot of time & effort. Why should you suddenly optimise all facets of an FPGA design if you have lots of gates and lots of speed?
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Learning FPGAs: wrong approach?
« Reply #143 on: June 25, 2017, 10:56:53 pm »
Why should you suddenly optimise all facets of an FPGA design if you have lots of gates and lots of speed?
Plenty of reasons, some of which may or may not apply.

- If you didn't have constraints that you need to hit (speed, power, latency, cost, size) then you wouldn't be using FPGAs, and you would do it in S/W.

- If your design is even somewhat well thought out, you know the bits you have to worry about performance-wise before you even start implementing, and you know what is fluff where you don't even have to try.

- Battery life. Making the bulk of the design more efficient is the best way to reduce power demands.

- If working on a product, the device will usually be selected well before the design is finished, and all the economics are pretty much fixed. If you are in the nice place of using 60% of the resources then you can let the design bloat. If you are using 85% or 90% then bloat might force you to use a bigger part with a compatible footprint.

- Spare resources = can add more features = better product for same price

- The easier the bulk of a design is to place and route, the more flexibility the tools have for placing and routing the toughest parts of the design.

- Changing pipeline depths late in the development process to improve timing is costly (redesign, retest, reintegrate)

- 6.73ns - The common FullHD pixel clock is 148.5MHz. If you are working on a video design you need to hit this and have a wee bit of slack.

- A sharp tool is better than a dull one
« Last Edit: June 25, 2017, 10:59:58 pm by hamster_nz »
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline Cerebus

  • Super Contributor
  • ***
  • Posts: 10576
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #144 on: June 25, 2017, 11:31:54 pm »
Thought experiment - A design requires a module that takes a clk signal and four 8-bit numbers (a_in, b_in, c_in, d_in) and sorts them low to high, to generate four outputs (a_out, b_out, c_out and d_out). Design it with a software mindset, and then a H/W mindset.

Oh, I like that. I might give that a crack in Verilog tomorrow and see where I get.
Anybody got a syringe I can use to squeeze the magic smoke back into this?
 
The following users thanked this post: hamster_nz

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #145 on: June 26, 2017, 12:42:34 am »
Thought experiment - A design requires a module that takes a clk signal and four 8-bit numbers (a_in, b_in, c_in, d_in) and sorts them low to high, to generate four outputs (a_out, b_out, c_out and d_out). Design it with a software mindset, and then a H/W mindset.

I don't think you gain a lot in terms of efficiency, but I would argue it is easier to design with a hardware mindset. You can synthesise your "software mindset" design and see how many resources it uses. You can then compare to what you would get with a "hardware mindset".

Assuming Xilinx 7-series 6-input LUTs, you would need:

- 6 modules to do 6 comparisons - 4 LUTs each = 24 LUTs. It'll take 2 layers of combinatory logic. You'll get 6 outputs from this representing the results of the comparisons

- For each 8 bit output - 6 x 2 table which converts 6 outputs from the previous layer into the 2-bit index. The 2-bit index will select which input you want to multiplex to the given output. 2 LUTs each = 8 LUTs. One layer of combinatory logic.

- For each bit of the outputs (32 bits total) a mux which uses 2-bit index from the previous layer to select one of the 4 inputs. 1 LUT each = 32 LUTs. One layer of combinatory logic.

Bottom line:

24 + 8 + 32 = 64 LUTs = 16 slices.

2 + 1 + 1 = 4 layers of combinatory logic roughly 0.7 ns each (including intra-layer routing) = 2.8 ns. I'd expect it would run fine with 4 ns clock period - 250 MHz.

 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Learning FPGAs: wrong approach?
« Reply #146 on: June 26, 2017, 04:31:10 am »
Thought experiment - A design requires a module that takes a clk signal and four 8-bit numbers (a_in, b_in, c_in, d_in) and sorts them low to high, to generate four outputs (a_out, b_out, c_out and d_out). Design it with a software mindset, and then a H/W mindset.

I don't think you gain a lot in terms of efficiency, but I would argue it is easier to design with a hardware mindset. You can synthesise your "software mindset" design and see how many resources it uses. You can then compare to what you would get with a "hardware mindset".

Assuming Xilinx 7-series 6-input LUTs, you would need:

- 6 modules to do 6 comparisons - 4 LUTs each = 24 LUTs. It'll take 2 layers of combinatory logic. You'll get 6 outputs from this representing the results of the comparisons

- For each 8 bit output - 6 x 2 table which converts 6 outputs from the previous layer into the 2-bit index. The 2-bit index will select which input you want to multiplex to the given output. 2 LUTs each = 8 LUTs. One layer of combinatory logic.

- For each bit of the outputs (32 bits total) a mux which uses 2-bit index from the previous layer to select one of the 4 inputs. 1 LUT each = 32 LUTs. One layer of combinatory logic.

Bottom line:

24 + 8 + 32 = 64 LUTs = 16 slices.

2 + 1 + 1 = 4 layers of combinatory logic roughly 0.7 ns each (including intra-layer routing) = 2.8 ns. I'd expect it would run fine with 4 ns clock period - 250 MHz.

Pretty much the same idea I had - get all the comparisons out the way, then select the outputs.

I asked a software friend how they would do it. The first reply was to put an "ORDER BY" clause on the SQL query used to get the items.

The second one was along the lines of

Code: [Select]
   array items = [a_in, b_in, c_in, d_in];
   sort(items);
   a_out = items[0];
   b_out = items[1];
   c_out = items[2];
   d_out = items[3];


Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline AndyC_772

  • Super Contributor
  • ***
  • Posts: 4228
  • Country: gb
  • Professional design engineer
    • Cawte Engineering | Reliable Electronics
Re: Learning FPGAs: wrong approach?
« Reply #147 on: June 26, 2017, 06:23:45 am »
I asked a software friend how they would do it. The first reply was to put an "ORDER BY" clause on the SQL query used to get the items.
That's scary on so many levels  :scared:

In an FPGA, I'd do it one of two ways depending on the required clock speed and latency.

To do it in a single cycle, I'd make use of VHDL variables, and translate your 'software mindset' example more or less directly.
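Something like this, as a rough sketch (untested; assuming numeric_std, with a_in..d_in and a_out..d_out declared as unsigned(7 downto 0) signals):
Code: [Select]
process(clk)
  variable va, vb, vc, vd, t : unsigned(7 downto 0);
begin
  if rising_edge(clk) then
    va := a_in; vb := b_in; vc := c_in; vd := d_in;
    -- bubble-sort the variables, all inside one clock
    if va > vb then t := va; va := vb; vb := t; end if;
    if vb > vc then t := vb; vb := vc; vc := t; end if;
    if vc > vd then t := vc; vc := vd; vd := t; end if;
    if va > vb then t := va; va := vb; vb := t; end if;
    if vb > vc then t := vb; vb := vc; vc := t; end if;
    if va > vb then t := va; va := vb; vb := t; end if;
    -- registered, sorted outputs
    a_out <= va; b_out <= vb; c_out <= vc; d_out <= vd;
  end if;
end process;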

If that method ended up too slow to meet the required fmax, then it would need to be pipelined. On the first clock, perform three of the compare/swap operations, store the intermediate results in internal registers, and set a flag. Then, on the second, perform the other three compare/swaps, assign the final result to the outputs, and clear the flag again.

Online Someone

  • Super Contributor
  • ***
  • Posts: 4531
  • Country: au
    • send complaints here
Re: Learning FPGAs: wrong approach?
« Reply #148 on: June 26, 2017, 07:59:16 am »
To do it in a single cycle, I'd make use of VHDL variables, and translate your 'software mindset' example more or less directly.

If that method ended up too slow to meet the required fmax, then it would need to be pipelined. On the first clock, perform three of the compare/swap operations, store the intermediate results in internal registers, and set a flag. Then, on the second, perform the other three compare/swaps, assign the final result to the outputs, and clear the flag again.
I'm trying to find the reference, but one of the big open source processor/SoC teams was using a strict coding style where the work was all done in functions, and registers were directly inferred as a discrete block with nothing else in it. Very tidy style when you're doing algorithm-intensive work.
 

Online BrianHG

  • Super Contributor
  • ***
  • Posts: 7738
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #149 on: June 26, 2017, 08:39:31 am »
I asked a software friend how they would do it. The first reply was to put an "ORDER BY" clause on the SQL query used to get the items.
That's scary on so many levels  :scared:

In an FPGA, I'd do it one of two ways depending on the required clock speed and latency.

To do it in a single cycle, I'd make use of VHDL variables, and translate your 'software mindset' example more or less directly.

If that method ended up too slow to meet the required fmax, then it would need to be pipelined. On the first clock, perform three of the compare/swap operations, store the intermediate results in internal registers, and set a flag. Then, on the second, perform the other three compare/swaps, assign the final result to the outputs, and clear the flag again.

To pipeline speed this one up, here is how I would do it:

(a)
Compare all inputs with each other and generate 4 sets of 2 bit selection flags/words.
--- and ---  Store all 4 inputs in D-flipflop registers.

(b) The output of all 4 D-flipflop registers would feed 4 x  4:1 mux selection units, each unit receiving the 2 bit selection flags generating 4 sorted outputs.

At a basic level, (a) can be done with 4x4 'if/else' statements generating the 4 sets of 2-bit selection registers, plus 4 temporary storage registers.
(b) can be done with 4 x 'case' or 'if' statements creating the 4 sorted output registers.  There are better, more compact, advanced coding methods to achieve the same results, but this would be simple and right in your face.

With Altera FPGAs, doing it this way with 4 inputs of up to 16 bits each sorted to 4 outputs, your sorts will be delayed by 2 clocks instead of 1, but this would achieve the best reasonable fmax & you can feed in a new set of 4 numbers every single clock.  To achieve the best fmax with 32-bit numbers, or when sorting more than 4 16-bit numbers, you will need a multi-step pipeline that breaks down the magnitude of the numbers, and even the mux selection of the sorted result will need to be piped over multiple clocks.  That is due to the size of Altera's logic cells, where the FMAX seems to deteriorate badly once an operation squeezes in more than a 2x32-bit comparison, or even a mux selection, per clock.

(NOTE: this is not an example of clean coding. I chose this strategy based on experience with Altera's Quartus, knowing that the fitter will synthesize this code for top FMAX rather than for the tightest possible gate count, and I know there are many other methods to achieve the same results.)

I'm sure a hardwired ASIC could do much larger magnitude sorts at full speed in a single clock & the VHDL/Verilog code would be down to the few lines described a few posts above.
« Last Edit: June 26, 2017, 09:01:49 am by BrianHG »
 

Offline AndyC_772

  • Super Contributor
  • ***
  • Posts: 4228
  • Country: gb
  • Professional design engineer
    • Cawte Engineering | Reliable Electronics
Re: Learning FPGAs: wrong approach?
« Reply #150 on: June 26, 2017, 09:30:46 am »
To pipeline speed this one up, here is how I would do it:

(a)
Compare all inputs with each other and generate 4 sets of 2 bit selection flags/words.
--- and ---  Store all 4 inputs in D-flipflop registers.

(b) The output of all 4 D-flipflop registers would feed 4 x  4:1 mux selection units, each unit receiving the 2 bit selection flags generating 4 sorted outputs.

I think that's a better algorithm, thanks.

My method requires three comparators on the first cycle, three more on the second cycle, and the inputs to some depend on the outputs of others, so there's an extra propagation delay to consider, which might limit fmax.

Your method also requires six comparators, but all their inputs are known at the start of the first cycle, so they can operate faster.

You also require multiplexers, but I'm willing to bet they're faster than logical comparators.

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #151 on: June 26, 2017, 10:02:10 am »
Regarding sorting: write a VHDL function with a variable number of inputs which does the sorting in a for-loop.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline Cerebus

  • Super Contributor
  • ***
  • Posts: 10576
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #152 on: June 26, 2017, 11:32:10 am »
A purely combinatorial version of the sorting problem in Verilog. Code actually simulated against random numbers and it works.

Because it's purely combinatorial one hopes that a decent synthesizer would mush this down to the minimum possible number of gates. If you want to count up discrete circuit elements it's 12 8-bit comparators, 12 1-bit adders, 16 2-bit comparators, 128 2-input AND gates and 8 4-bit OR gates.



Code: [Select]
module sorter (input wire [7:0] A, B, C, D, output wire [7:0] E, F, G, H);

wire AgtB = (A > B);
wire AgtC = (A > C);
wire AgtD = (A > D);
wire [1:0] Apos = (AgtB + AgtC + AgtD); // population count of how many other inputs this input is greater than

wire BgtA = (B > A);
wire BgtC = (B > C);
wire BgtD = (B > D);
wire [1:0] Bpos = (BgtA + BgtC + BgtD);

wire CgtA = (C > A);
wire CgtB = (C > B);
wire CgtD = (C > D);
wire [1:0] Cpos = (CgtA + CgtB + CgtD);

wire DgtA = (D > A);
wire DgtB = (D > B);
wire DgtC = (D > C);
wire [1:0] Dpos = (DgtA + DgtB + DgtC);

// For all you VHDL-only crowd the {8{aBit}} 'widens' the single bit to 8 bits
assign E = A & {8{Apos==3}} | B & {8{Bpos==3}} | C & {8{Cpos==3}} | D & {8{Dpos==3}};
assign F = A & {8{Apos==2}} | B & {8{Bpos==2}} | C & {8{Cpos==2}} | D & {8{Dpos==2}};
assign G = A & {8{Apos==1}} | B & {8{Bpos==1}} | C & {8{Cpos==1}} | D & {8{Dpos==1}};
assign H = A & {8{Apos==0}} | B & {8{Bpos==0}} | C & {8{Cpos==0}} | D & {8{Dpos==0}};

endmodule

Anybody got a syringe I can use to squeeze the magic smoke back into this?
 

Offline Cerebus

  • Super Contributor
  • ***
  • Posts: 10576
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #153 on: June 26, 2017, 11:35:51 am »
Regarding sorting: write a VHDL function with a variable number of inputs which does the sorting in a for-loop.

Wouldn't it have been quicker to write "Wave magic wand." or "Assign task to minion."?  :)
Anybody got a syringe I can use to squeeze the magic smoke back into this?
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Learning FPGAs: wrong approach?
« Reply #154 on: June 26, 2017, 11:41:41 am »
Spent a night watching Doctor Who and fiddling with code.

All solutions are single-cycle, and outputs are registered, constrained for 200MHz. Results:

1) Bubble sort 
   96.833 MHZ
   10.327 ns
   124 LUTs

2) A bit like a shell sort -
  148.65 MHZ
  6.727ns
  105 LUTs


3) H/W optimized design (six tests to index a lookup table, that is then used to MUX the outputs), as per NorthGuy -
   234.19 MHz 
   4.027ns
   61 LUTs
 
So the last design is twice as fast and well under half the size, but it took 4x longer to write :-)
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #155 on: June 26, 2017, 11:49:02 am »
Regarding sorting: write a VHDL function with a variable number of inputs which does the sorting in a for-loop.

Wouldn't it have been quicker to write "Wave magic wand." or "Assign task to minion."?  :)
No, it lets the synthesizer deal with the problem. You might be surprised by the results.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Online BrianHG

  • Super Contributor
  • ***
  • Posts: 7738
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #156 on: June 26, 2017, 12:06:06 pm »
Spent a night watching Doctor Who and fiddling with code.

All solutions are single-cycle, and outputs are registered, constrained for 200MHz. Results:

1) Bubble sort 
   96.833 MHZ
   10.327 ns
   124 LUTs

2) A bit like a shell sort -
  148.65 MHZ
  6.727ns
  105 LUTs


3) H/W optimized design (six tests to index a lookup table, that is then used to MUX the outputs), as per NorthGuy -
   234.19 MHz 
   4.027ns
   61 LUTs
 
So the last design is twice as fast and well under half the size, but it took 4x longer to write :-)

The 2 stage HW optimized recommendation was not NorthGuy, it was me BrianHG...
As for the longer writes of my optimized designs: after making 1080p video mixers and filters on really old, slow Cyclone 1 devices a decade ago, with a buggy, crashing Quartus at the time and slow compiles, you could imagine my frustrations.  But to get such old FPGAs running 2-channel 30-bit color at 148.5MHz with simple DDR ram, you'd better believe the ingenious chunks of Verilog I created were as compact & as fast as can be, without having to resort to AHDL and without any special Altera functions other than the PLL clock function block and their pipelined multiply/add and dual-port ram megafunctions.
« Last Edit: June 26, 2017, 12:28:23 pm by BrianHG »
 

Offline Cerebus

  • Super Contributor
  • ***
  • Posts: 10576
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #157 on: June 26, 2017, 12:09:45 pm »
Regarding sorting: write a VHDL function with a variable number of inputs which does the sorting in a for-loop.

Wouldn't it have been quicker to write "Wave magic wand." or "Assign task to minion."?  :)
No, it lets the synthesizer deal with the problem. You might be surprised by the results.

I doubt the synthesizer is going to "write a VHDL function" for you. Sounds like the Montgomery Scott solution - [Fx: pick up mouse, use as microphone] "Computer: write me a VHDL function that sorts a variable number of 8 bit numbers, and pour me a nice single malt in the replicator."
Anybody got a syringe I can use to squeeze the magic smoke back into this?
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #158 on: June 26, 2017, 02:13:10 pm »
To pipeline speed this one up, here is how I would do it:

(a)
Compare all inputs with each other and generate 4 sets of 2 bit selection flags/words.
--- and ---  Store all 4 inputs in D-flipflop registers.

(b) The output of all 4 D-flipflop registers would feed 4 x  4:1 mux selection units, each unit receiving the 2 bit selection flags generating 4 sorted outputs.


About pipelining.

The speed of combinatorial logic depends on the number of layers. Simple design has only one layer:

IN->LUT->OUT

Of course, there may be many parallel paths like that, but all LUTs are fed directly from the input. This makes it the fastest.

Then you introduce LUTs which depends on the values produced by other LUTs, like this:

IN->LUT->LUT->OUT

Here you have two layers of LUTs. You need to wait until the LUTs of the first layer settle and provide stable outputs to the LUTs of the second layer. Then you must wait for the LUTs of the second layer. Therefore, it takes longer. Each layer adds roughly 0.7ns on Xilinx.

The design we're discussing has 4 layers:

IN->LUT->LUT->LUT->LUT->OUT

Only the longest path affects the overall speed. For example, in this design there's a shorter path which goes from an input to the final MUX. It only has one LUT. It could be done faster, but the presence of longer paths doesn't let the design run faster. The speed is roughly determined by the number of layers on the longest path.

Any combinatorial design can be pipelined.

You don't do it as AndyC suggested by splitting things which already can run in parallel. You do it by inserting flip-flops between combinatorial layers:

IN->LUT->LUT->FF->LUT->LUT->OUT

Now the clock doesn't need to wait for all four layers to complete. Once two layers are done, the flip-flop can clock and remember the intermediate result. On the next clock, the next two layers of LUTs finish the job. You've turned a 4-layer design into a 2-layer design, but now there's one clock of extra delay.
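A generic sketch of that flip-flop insertion (nothing to do with the sorter specifically; made-up signals, numeric_std assumed). Two products and their sum are split across two register stages, so y lags the inputs by two clocks:
Code: [Select]
-- declarations:
signal a, b, c, d : unsigned(7 downto 0);
signal p1, p2     : unsigned(15 downto 0);
signal y          : unsigned(16 downto 0);

-- clocked process:
process(clk)
begin
  if rising_edge(clk) then
    -- stage 1: first layers of logic, captured in flip-flops
    p1 <= a * b;
    p2 <= c * d;
    -- stage 2: remaining layer, fed from the stage-1 registers
    y <= resize(p1, y'length) + resize(p2, y'length);
  end if;
end process;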

One flip-flop must be inserted in every path, be it a simple wire or a LUT.

To maximize the clock speed, you need to minimize the number of layers. This can be done by inserting flip-flops exactly in the middle of LUT chain. In the example above, two layers go before the flip-flop and two layers go after it.

You don't do it as BrianHG did:

IN->LUT->LUT->LUT->FF->LUT->OUT

In his design, he put 3 layers (2 layers of comparison and one layer to generate the MUX inputs) before the flip-flops, and only one layer (the MUX) after the flip-flops. If you do this, the first stage is a 3-layer design and the second stage a 1-layer design. Since they're clocked by the same clock, the overall design is still 3-layer. It is faster than the 4-layer design, but slower than a 2-layer design.

To get 2 layer design you need this:

IN->LUT->LUT->FF->LUT->LUT->OUT

Which means the 2 layers of comparisons go before the flip-flop, and everything else goes after, as this:

Stage 1. The 6 bits of comparison results are saved using 6 flip-flops. Since flip-flops must go into every path, we also need 32 flip-flops to save the original inputs.

Stage 2. MUX input is generated from comparison results (one layer) and MUX selects the appropriate input (second layer).

This produces fast 2-layer design.

We can pipeline even further:

IN->LUT->FF->LUT->FF->LUT->FF->LUT->OUT

Now we've got a one-layer design, which is as fast as it gets, but you need to wait 3 extra clocks to get the result. Also, this will be tedious to program - you'll have to pipeline the comparison operations themselves.

« Last Edit: June 26, 2017, 02:16:33 pm by NorthGuy »
 

Offline AndyC_772

  • Super Contributor
  • ***
  • Posts: 4228
  • Country: gb
  • Professional design engineer
    • Cawte Engineering | Reliable Electronics
Re: Learning FPGAs: wrong approach?
« Reply #159 on: June 26, 2017, 02:24:26 pm »
You don't do it as AndyC suggested by splitting things which already can run in parallel.

Just for the sake of clarity, what I had in mind was an implementation of the 'bubble sort' method, not the 'rank-then-multiplex' method:

- on the first clock, perform the first three compare/swap operations (a-b, b-c, c-d). The outcome of each of these depends on the previous operation, so it takes 3 levels' worth of delay time

- on the second clock, perform the second set of three compare/swaps (a-b, b-c, a-b) on the intermediate results which were stored after the first clock.

The overall effect is to take a logical operation that would have needed 6 levels' worth of delay and split it into two operations, each of which takes only 3. It would, of course, be possible to split this into a 6-stage pipe, each stage of which does just one compare/swap, and that might not be a bad implementation at all if you don't mind the latency or storage requirement.
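In VHDL the two-clock version might look roughly like this (an untested sketch: 'quad' is assumed to be an array of four unsigned 8-bit values, and in_array, mid_array and sorted_array are signals of that type):

Code: [Select]
process(clk)
    -- plain compare-and-swap on two 8-bit values
    procedure cswap(variable x, y : inout unsigned(7 downto 0)) is
        variable t : unsigned(7 downto 0);
    begin
        if x > y then
            t := x;  x := y;  y := t;
        end if;
    end procedure;
    variable v : quad;
begin
    if rising_edge(clk) then
        -- clock 1: first three compare/swaps (a-b, b-c, c-d)
        v := in_array;
        cswap(v(0), v(1));
        cswap(v(1), v(2));
        cswap(v(2), v(3));
        mid_array <= v;

        -- clock 2: second three compare/swaps (a-b, b-c, a-b) on the
        -- intermediate result stored on the previous clock
        v := mid_array;
        cswap(v(0), v(1));
        cswap(v(1), v(2));
        cswap(v(0), v(1));
        sorted_array <= v;
    end if;
end process;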

That's not splitting things that can run in parallel... is it?

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #160 on: June 26, 2017, 02:42:33 pm »
Regarding sorting: write a VHDL function with a variable number of inputs which does the sorting in a for-loop.

Wouldn't it have been quicker to write "Wave magic wand." or "Assign task to minion."?  :)
No, it lets the synthesizer deal with the problem. You might be surprised by the results.

I doubt the synthesizer is going to "write a VHDL function" for you. Sounds like the Montgomery Scott solution - [Fx: pick up mouse, use as microphone] "Computer: write me a VHDL function that sorts a variable number of 8 bit numbers, and pour me a nice single malt in the replicator."
Duhhu  :palm: .  You are supposed to write the VHDL function yourself but let the synthesizer deal with the actual implementation.
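Something along these lines, as a rough sketch (8-bit unsigned elements assumed; the type and function go in a package):

Code: [Select]
type u8_array is array (natural range <>) of unsigned(7 downto 0);

function sort(a : u8_array) return u8_array is
    variable v : u8_array(a'range) := a;
    variable t : unsigned(7 downto 0);
begin
    for j in v'low to v'high - 1 loop
        for i in v'low to v'high - 1 loop
            if v(i) > v(i + 1) then
                t := v(i);
                v(i) := v(i + 1);
                v(i + 1) := t;
            end if;
        end loop;
    end loop;
    return v;
end function;

Then inside a clocked process you just write sorted_array <= sort(in_array); with however many elements you have, and the synthesizer unrolls the loops into whatever logic it sees fit.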
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline Cerebus

  • Super Contributor
  • ***
  • Posts: 10576
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #161 on: June 26, 2017, 02:54:16 pm »
Regarding sorting: write a VHDL function with a variable number of inputs which does the sorting in a for-loop.

Wouldn't it have been quicker to write "Wave magic wand." or "Assign task to minion."?  :)
No, it lets the synthesizer deal with the problem. You might be surprised by the results.

I doubt the synthesizer is going to "write a VHDL function" for you. Sounds like the Montgomery Scott solution - [Fx: pick up mouse, use as microphone] "Computer: write me a VHDL function that sorts a variable number of 8 bit numbers, and pour me a nice single malt in the replicator."
Duhhu  :palm: .  You are supposed to write the VHDL function yourself but let the synthesizer deal with the actual implementation.

Indeed one is, but you just waved your hand and regally said 'Let it be done', that's what I'm poking fun at.
Anybody got a syringe I can use to squeeze the magic smoke back into this?
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #162 on: June 26, 2017, 03:19:11 pm »
Spent a night watching Doctor Who and fiddling with code.

All solutions are single-cycle, and outputs are registered, constrained for 200MHz. Results:

1) Bubble sort 
   96.833 MHZ
   10.327 ns
   124 LUTs

2) A bit like a shell sort -
  148.65 MHZ
  6.727ns
  105 LUTs


3) H/W optimized design (six tests to index a lookup table, that is then used to MUX the outputs), as per NorthGuy -
   234.19 MHz 
   4.027ns
   61 LUTs
 
So the last design is twice as fast and well under half the size, but it took 4x longer to write :-)
I just ran this VHDL software approach bubble sort through the Xilinx synthesizer using 4 inputs each 8 bits wide:
https://stackoverflow.com/questions/42420983/bubble-sort-in-vhdl
Result: 73 LUTs when optimised for speed and 70 LUTs when optimised for area (Spartan6)

The result speaks for itself. The synthesizer does a way better job than off-the-cuff hardware-like implementations in HDL, so just describe the problem and let the synthesizer deal with it. These discussions remind me of the endless C versus assembly arguments.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #163 on: June 26, 2017, 04:05:10 pm »
I just ran this VHDL software approach bubble sort through the Xilinx synthesizer using 4 inputs each 8 bits wide:
https://stackoverflow.com/questions/42420983/bubble-sort-in-vhdl
Result: 73 LUTs when optimised for speed and 70 LUTs when optimised for area (Spartan6)

What clock speed are you getting with this?
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #164 on: June 26, 2017, 04:08:56 pm »
I just ran this VHDL software approach bubble sort through the Xilinx synthesizer using 4 inputs each 8 bits wide:
https://stackoverflow.com/questions/42420983/bubble-sort-in-vhdl
Result: 73 LUTs when optimised for speed and 70 LUTs when optimised for area (Spartan6)
What clock speed are you getting with this?
That depends entirely on the FPGA so I didn't include that.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #165 on: June 26, 2017, 04:28:26 pm »
That depends entirely on the FPGA so I didn't include that.

Please tell us.

LUTs also depend on the FPGA. Spartan-6 has 6-input LUTs. Others have 4-input LUTs, so you would need a lot more of them for the same design.
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #166 on: June 26, 2017, 05:00:56 pm »
That depends entirely on the FPGA so I didn't include that.
Please tell us.

LUTs also depend on the FPGA. Spartan-6 has 6-input LUTs. Others have 4-input LUTs, so you would need a lot more of them for the same design.
On a Spartan6 speed grade 2 device I can go slightly over 100MHz (while making sure all paths are constrained by adding extra input and output registers). If I enable 'register balancing' (more or less automatic pipelining) I can get it to run at over 400MHz. Both frequencies come from place&routed designs.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline Yansi

  • Super Contributor
  • ***
  • Posts: 3893
  • Country: 00
  • STM32, STM8, AVR, 8051
Re: Learning FPGAs: wrong approach?
« Reply #167 on: June 26, 2017, 05:39:06 pm »
I am currently reading this tutorial, which is making a lot more sense to me so far...

Thank you very much for that book. It might be really helpful for me, a totally dumb CPLD/FPGA beginner who spent all of his previous life with sequential MCUs.  :)
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #168 on: June 26, 2017, 05:40:02 pm »
On a Spartan6 speed grade 2 device I can go slightly over 100MHz (while making sure all paths are constrained by adding extra input and output registers). If I enable 'register balancing' (more or less automatic pipelining) I can get it to run at over 400MHz. Both frequencies come from place&routed designs.

This is a similar result to what hamster_nz has posted. The "hardware mindset" produces about 2x the speed for combinatorial logic compared to the "software mindset" optimized with tools. This is about the same speed difference as the difference between Xilinx UltraScale+ and Spartan-6.

I'm surprised that the tools didn't do a better job. They take so much time to get from code to bitstream. What the hell are they doing all this time? I expected their optimizations to be nearly perfect.
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #169 on: June 26, 2017, 05:49:31 pm »
On a Spartan6 speed grade 2 device I can go slightly over 100MHz (while making sure all paths are constrained by adding extra input and output registers). If I enable 'register balancing' (more or less automatic pipelining) I can get it to run at over 400MHz. Both frequencies come from place&routed designs.
This is a similar result to what hamster_nz has posted. The "hardware mindset" produces about 2x the speed for combinatorial logic compared to the "software mindset" optimized with tools. This is about the same speed difference as the difference between Xilinx UltraScale+ and Spartan-6.
Without knowing which FPGA Hamster_nz targeted and what synthesis settings he used you can't make this comparison. So where do you get a 2x speed improvement from? Also 400MHz is more than 234MHz so I'd say the 'software approach' is ahead for now.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #170 on: June 26, 2017, 06:36:23 pm »
Without knowing which FPGA Hamster_nz targeted and what synthesis settings he used you can't make this comparison. So where do you get a 2x speed improvement from?

Whatever he used was the same FPGA and he's got roughly 2x difference. Your numbers are similar to his, and why wouldn't they be - you did the same thing.

Also 400MHz is more than 234MHz so I'd say the 'software approach' is ahead for now.

As I explained few posts ago, you can pipeline any pure combinatorial design.

The speed of the design depends on the number of combinatorial layers. You can either run all layers in a single clock - then your clock speed gets limited. Or you can pipeline the layers (by inserting flip-flops between them). If completely pipelined, the clock speed will be roughly the same for any design, but there'll be one extra clock of delay for every combinatorial layer you remove by pipelining.

It is meaningless to compare a pipelined design with a purely combinatorial design in terms of clock speed (or in terms of clock cycles, for that matter).

 

Offline Cerebus

  • Super Contributor
  • ***
  • Posts: 10576
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #171 on: June 26, 2017, 07:06:17 pm »
The speed of the design depends on the number of combinatorial layers. You can either run all layers in a single clock - then your clock speed gets limited. Or you can pipeline the layers (by inserting flip-flops between them). If completely pipelined, the clock speed will be roughly the same for any design, but there'll be one extra clock of delay for every combinatorial layer you remove by pipelining.

It is meaningless to compare a pipelined design with a purely combinatorial design in terms of clock speed (or in terms of clock cycles, for that matter).

It would be helpful if you didn't use 'speed' for both 'latency' and 'throughput'; what you're trying to say would be much clearer if you used the two separate terms.
Anybody got a syringe I can use to squeeze the magic smoke back into this?
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Learning FPGAs: wrong approach?
« Reply #172 on: June 26, 2017, 07:09:27 pm »
Spent a night watching Doctor Who and fiddling with code.

All solutions are single-cycle, and outputs are registered, constrained for 200MHz. Results:

1) Bubble sort 
   96.833 MHZ
   10.327 ns
   124 LUTs

2) A bit like a shell sort -
  148.65 MHZ
  6.727ns
  105 LUTs


3) H/W optimized design (six tests to index a lookup table, that is then used to MUX the outputs), as per NorthGuy -
   234.19 MHz 
   4.027ns
   61 LUTs
 
So the last design is twice as fast and well under half the size, but it took 4x longer to write :-)
I just ran this VHDL software approach bubble sort through the Xilinx synthesizer using 4 inputs each 8 bits wide:
https://stackoverflow.com/questions/42420983/bubble-sort-in-vhdl
Result: 73 LUTs when optimised for speed and 70 LUTs when optimised for area (Spartan6)

The result speaks for itself. The synthesizer does a way better job than off-the-cuff hardware-like implementations in HDL, so just describe the problem and let the synthesizer deal with it. These discussions remind me of the endless C versus assembly arguments.

TLDR: Can you check that the result is actually a LUT count, and not an occupied-slice count?

Really interesting! Your results literally kept me awake at night.... :)

A 4-element bubble sort is six identical compare-then-maybe-swap stages. Each stage requires an 8-bit comparison and two 2:1 8-bit MUXes - around 4+2*8 = 20 LUTs. That checks out with my numbers, as 124 is close to 6 x 20. The second method only uses five of these stages, hence it uses about 5/6 of the resources.

Performance-wise the critical path of the bubble sort is through all six compare-then-maybe-swap stages, and in my second method it is only four stages, hence the second method clocking around 50% faster.

The final method uses six 8-bit compares, a 32x8-bit memory, and four 8-bit 4:1 MUXes, so should use around 6*4+8+32 = 64 LUTs. It gets its efficiency by having the pre-computed (and somewhat error prone) values in the 32x8 memory. It removes some of the work required and everything fits nicely with a LUT-6 architecture. As the critical path is only through a comparison and two LUTs, it should be about 3x faster than the bubble sort (as it may well be, if I constrain it harder).

If your method is a bubble sort (and I don't doubt it is), and does use 73 LUTs (which I slightly doubt), then it has taken less than 11 LUTs to do what should take at least 20, and I want to know why!

If it is a slice count, then the LUT count is most likely around the 120 number that I would expect, and my universe is back in balance, and I will sleep well.

The performance is also pretty good for what is a generation older silicon, but not so good that I suspect a bug.
« Last Edit: June 26, 2017, 07:12:00 pm by hamster_nz »
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #173 on: June 26, 2017, 07:33:20 pm »
Actually my earlier LUT number is for an Artix7. Somehow ISE didn't catch I wanted to use a Spartan 6! The other numbers (speed) are for the Spartan6 design. The Spartan 6 design uses 79 Slice LUTs and occupies 33 slices (optimised for speed). I think your reasoning goes off the trail because the synthesizer turns the problem into logic equations which are then minimized keeping the architecture of the FPGA in mind. This means that some of the hardware you describe is probably combined in a way you can't see when designing 'in hardware'. I think it is very similar to a C compiler optimising for pre-fetching and caching.
« Last Edit: June 26, 2017, 07:34:53 pm by nctnico »
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline Bruce Abbott

  • Frequent Contributor
  • **
  • Posts: 627
  • Country: nz
    • Bruce Abbott's R/C Models and Electronics
Re: Learning FPGAs: wrong approach?
« Reply #174 on: June 26, 2017, 07:48:22 pm »
I just ran this VHDL software approach bubble sort through the Xilinx synthesizer using 4 inputs each 8 bits wide:
https://stackoverflow.com/questions/42420983/bubble-sort-in-vhdl
I have a question about that code.

This:-
Code: [Select]
        if rising_edge(clk) then
            for j in bubble'LEFT to bubble'RIGHT - 1 loop
                for i in bubble'LEFT to bubble'RIGHT - 1 - j loop
                    if unsigned(var_array(i)) > unsigned(var_array(i + 1)) then
                        temp := var_array(i);
                        var_array(i) := var_array(i + 1);
                        var_array(i + 1) := temp;
                    end if;
                end loop;
            end loop;
            sorted_array <= var_array;
        end if;

unfolds into multiple iterations (with different array indexes) of this, right?
Code: [Select]
if unsigned(var_array(0)) > unsigned(var_array(1)) then
                        temp := var_array(0);
                        var_array(0) := var_array(1);
                        var_array(1) := temp;

So we have a comparator whose output determines whether the two array entries are either 1. swapped, or 2. left alone. This is all happening during one clock cycle, and the ':=' means that the operation occurs immediately, i.e. the logic is not clocked but simply runs as fast as it can, right? What stops the values in temp, var_array(0) and var_array(1) from continuously cycling around until the comparator changes state?

     
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #175 on: June 26, 2017, 08:13:31 pm »
It would be helpful if you didn't use 'speed' for both 'latency' and 'throughput'; what you're trying to say would be much clearer if you used the two separate terms.

Sorry for the confusion. I'll try to re-write in your terms.

When running in pure combinatorial form (in one clock cycle), a design with more combinatorial layers will require a longer clock period and a lower clock frequency. Thus, it will have lower throughput and longer latency. The latency will be equal to one clock period.

If fully pipelined, any design will have the same maximum clock frequency and the same throughput. However, a design with more combinatorial layers will have longer latency. Its latency will be equal to the number of combinatorial layers multiplied by the clock period.
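Or, as rough formulas (ignoring routing and flip-flop overhead):

Code: [Select]
Tclk(min)  ~  layers_per_stage x Tlut     (roughly 0.7 ns per layer on Xilinx, as above)
Fmax       =  1 / Tclk(min)
Throughput =  1 result per clock, once the pipeline is full
Latency    =  number_of_stages x Tclk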

Is this more understandable?
 

Offline AndyC_772

  • Super Contributor
  • ***
  • Posts: 4228
  • Country: gb
  • Professional design engineer
    • Cawte Engineering | Reliable Electronics
Re: Learning FPGAs: wrong approach?
« Reply #176 on: June 26, 2017, 09:04:07 pm »
What stops the values in temp, var_array(0) and var_array(1) from continuously cycling around until the comparator changes state?
It's worth taking a moment to re-iterate: VHDL code is *not* a sequence of instructions that are executed one at a time in order by the target device.

When you write an algorithm using variables, think of it as saying "Hey, compiler! I want you to synthesize some logic for me. Here's a method which describes what outputs I want for a given set of inputs. Now *you* go off and work out the best set of logic gates to give me the outputs I want, OK?"

All that this nested loop is doing, is providing a way - some way - of determining what the outputs should be for a given set of inputs. The compiler executes the loops, works out what the eventual relationship will be between input and output on a given clock edge, then programs this into look-up tables.
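For the 4-element case, what the compiler effectively evaluates on each clock edge is the fully unrolled sequence below (a sketch, where cswap stands for the compare/temp/swap block from the code above):

Code: [Select]
v := in_array;
cswap(v(0), v(1));  cswap(v(1), v(2));  cswap(v(2), v(3));  -- pass 1
cswap(v(0), v(1));  cswap(v(1), v(2));                      -- pass 2
cswap(v(0), v(1));                                          -- pass 3
sorted_array <= v;   -- only this final value ever reaches a flip-flop

Nothing 'cycles around' because the intermediate values of v exist only as nodes in the eventual logic network, not as storage.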

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Learning FPGAs: wrong approach?
« Reply #177 on: June 26, 2017, 09:16:53 pm »
Actually my earlier LUT number is for an Artix7. Somehow ISE didn't catch I wanted to use a Spartan 6! The other numbers (speed) are for the Spartan6 design. The Spartan 6 design uses 79 Slice LUTs and occupies 33 slices (optimised for speed). I think your reasoning goes off the trail because the synthesizer turns the problem into logic equations which are then minimized keeping the architecture of the FPGA in mind. This means that some of the hardware you describe is probably combined in a way you can't see when designing 'in hardware'. I think it is very similar to a C compiler optimising for pre-fetching and caching.
Can you post/PM me the code you are using?

The code I saw on Stack Exchange only made one pass over the items per clock cycle, so to sort four items would take three cycles.

It would also explain the low LUT usage
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #178 on: June 26, 2017, 09:30:41 pm »
Actually my earlier LUT number is for an Artix7. Somehow ISE didn't catch I wanted to use a Spartan 6! The other numbers (speed) are for the Spartan6 design. The Spartan 6 design uses 79 Slice LUTs and occupies 33 slices (optimised for speed). I think your reasoning goes off the trail because the synthesizer turns the problem into logic equations which are then minimized keeping the architecture of the FPGA in mind. This means that some of the hardware you describe is probably combined in a way you can't see when designing 'in hardware'. I think it is very similar to a C compiler optimising for pre-fetching and caching.
Can you post/PM me the code you are using?

The code I saw on Stack Exchange only made one pass over the items per clock cycle, so to sort four items would take three cycles.

It would also explain the low LUT usage

Here it is but it uses a nested loop so it seems to me it is doing a full bubble-sort.

Code: [Select]
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use ieee.numeric_std.all;

package array_type is
    type bubble is array (0 to 3) of unsigned(7 downto 0);
end package;

library ieee;
use ieee.std_logic_1164.all;
use work.array_type.all;

entity bubblesort is
    port (
        signal clk:             in  std_logic;
        signal reset:           in  std_logic;
        signal in_array_in:        in  bubble;
        signal sorted_array_out:    out bubble
    );
end entity;


architecture foo of bubblesort is
    use ieee.numeric_std.all;

--signals to allow optimal routing
    signal in_array: bubble;
    signal sorted_array: bubble;

begin


BSORT:
    process (clk)
        variable temp:      unsigned (7 downto 0);
        variable var_array:     bubble;       
    begin

--move inside if rising_edge... to catch the whole thing inside the clock constraint and
--not depend on routing delays between input & output pads.
in_array<=in_array_in;
sorted_array_out <=sorted_array;
--

        var_array := in_array;
        if rising_edge(clk) then

            for j in bubble'LEFT to bubble'RIGHT - 1 loop
                for i in bubble'LEFT to bubble'RIGHT - 1 - j loop
                    if var_array(i) > var_array(i + 1) then
                        temp := var_array(i);
                        var_array(i) := var_array(i + 1);
                        var_array(i + 1) := temp;
                    end if;
                end loop;
            end loop;
            sorted_array <= var_array;
        end if;
    end process;
end architecture foo;
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Online BrianHG

  • Super Contributor
  • ***
  • Posts: 7738
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #179 on: June 26, 2017, 09:46:57 pm »
Am I wrong (I'm a Verilog guy, not VHDL), but isn't this bubble sort sorting 4 variables 'array (0 to 3)' only 3 bit 'of unsigned(7 downto 0)' ?
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #180 on: June 26, 2017, 09:50:11 pm »
Am I wrong (I'm a Verilog guy, not VHDL), but isn't this bubble sort sorting 4 variables 'array (0 to 3)' only 3 bit 'of unsigned(7 downto 0)' ?
No, it is sorting an array with 4 elements where each element is an 8 bit unsigned int.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 
The following users thanked this post: BrianHG

Online BrianHG

  • Super Contributor
  • ***
  • Posts: 7738
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #181 on: June 27, 2017, 12:38:57 am »
Am I wrong (I'm a Verilog guy, not VHDL), but isn't this bubble sort sorting 4 variables 'array (0 to 3)' only 3 bit 'of unsigned(7 downto 0)' ?
No, it is sorting an array with 4 elements where each element is an 8 bit unsigned int.
Sorry, I'm used to seeing 255 downto 0, as in my default internal RAM bus size.  It's been over 5 years since I've done any HDL.
Or in Verilog, I skimp out and just use wire[255:0] or reg[255:0] for a single 256-bit word/bus, or just 'integer' and let the compiler work out how many bits it needs to be to finalize the logic...
Back when I started in 2004, Quartus' internal compiler was very crappy at just decoding a bus and would crash on anything too complex, so at the time the recommendation was to use a third-party HDL compiler with Quartus, or to learn AHDL, Altera's hardware description language.  This is why my coding style leans more towards 'assembly' rather than letting the compiler work for you as in 'C' coding.
« Last Edit: June 27, 2017, 12:46:43 am by BrianHG »
 

Offline Bruce Abbott

  • Frequent Contributor
  • **
  • Posts: 627
  • Country: nz
    • Bruce Abbott's R/C Models and Electronics
Re: Learning FPGAs: wrong approach?
« Reply #182 on: June 27, 2017, 02:37:22 am »
It's worth taking a moment to re-iterate: VHDL code is *not* a sequence of instructions that are executed one at a time in order by the target device.
Yes, and that's where I was confused because there seemed to be a circular assignment. Now I can see that while ':=' assignments are immediate, statements using them are executed sequentially by the compiler as it builds up the logical relationship between them.

Quote
All that this nested loop is doing, is providing a way - some way - of determining what the outputs should be for a given set of inputs. The compiler executes the loops, works out what the eventual relationship will be between input and output on a given clock edge, then programs this into look-up tables.
Right. So after examining all the statements sequentially it figures out what logic is required to bubble sort the entire array in a single clock cycle?
 

Offline AndyC_772

  • Super Contributor
  • ***
  • Posts: 4228
  • Country: gb
  • Professional design engineer
    • Cawte Engineering | Reliable Electronics
Re: Learning FPGAs: wrong approach?
« Reply #183 on: June 27, 2017, 06:26:43 am »
Right. So after examining all the statements sequentially it figures out what logic is required to bubble sort the entire array in a single clock cycle?

Almost. It figures out what logic is required to *sort* the entire array in a single clock - not necessarily *bubble* sort.

The details of how the outputs were derived from each possible set of inputs are lost. The compiler executes the loop, builds up a table of outputs vs inputs, then uses that table to generate the necessary logic.

That's not to say it can't make use of the original code to get some hints as to how the logic might work, but it doesn't have to.

There might even be an interesting exercise here. For example, a bubble sort algorithm to sort 'n' elements has to do (n-1) compare/swap operations on the first pass, then (n-2) on the second, and (n-3) on the third, and so on. In a software implementation, shortening each subsequent pass by one element is a trivial and worthwhile optimisation. In VHDL, though, it really shouldn't make any difference at all whether this is done, because the outcome of the nested loop is exactly the same whether each subsequent pass gets shortened or not.

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Learning FPGAs: wrong approach?
« Reply #184 on: June 27, 2017, 09:35:45 am »
Actually my earlier LUT number is for an Artix7. Somehow ISE didn't catch I wanted to use a Spartan 6! The other numbers (speed) are for the Spartan6 design. The Spartan 6 design uses 79 Slice LUTs and occupies 33 slices (optimised for speed). I think your reasoning goes off the trail because the synthesizer turns the problem into logic equations which are then minimized keeping the architecture of the FPGA in mind. This means that some of the hardware you describe is probably combined in a way you can't see when designing 'in hardware'. I think it is very similar to a C compiler optimising for pre-fetching and caching.
Can you post/PM me the code you are using?

The code I saw on Stack Exchange only made one pass over the items per clock cycle, so to sort four items would take three cycles.

It would also explain the low LUT usage

Here it is but it uses a nested loop so it seems to me it is doing a full bubble-sort.

Code: [Select]
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use ieee.numeric_std.all;

package array_type is
    type bubble is array (0 to 3) of unsigned(7 downto 0);
end package;

library ieee;
use ieee.std_logic_1164.all;
use work.array_type.all;

entity bubblesort is
    port (
        signal clk:             in  std_logic;
        signal reset:           in  std_logic;
        signal in_array_in:        in  bubble;
        signal sorted_array_out:    out bubble
    );
end entity;


architecture foo of bubblesort is
    use ieee.numeric_std.all;

--signals to allow optimal routing
    signal in_array: bubble;
    signal sorted_array: bubble;

begin


BSORT:
    process (clk)
        variable temp:      unsigned (7 downto 0);
        variable var_array:     bubble;       
    begin

--move inside if rising_edge... to catch the whole thing inside the clock constraint and
--not depend on routing delays between input & output pads.
in_array<=in_array_in;
sorted_array_out <=sorted_array;
--

        var_array := in_array;
        if rising_edge(clk) then

            for j in bubble'LEFT to bubble'RIGHT - 1 loop
                for i in bubble'LEFT to bubble'RIGHT - 1 - j loop
                    if var_array(i) > var_array(i + 1) then
                        temp := var_array(i);
                        var_array(i) := var_array(i + 1);
                        var_array(i + 1) := temp;
                    end if;
                end loop;
            end loop;
            sorted_array <= var_array;
        end if;
    end process;
end architecture foo;

Got back home and tested it, using the same testbench - a 32-bit counter feeding the inputs, outputting to pins.

With the Vivado default Strategy & PerfOptimized_High:
Code: [Select]
1. Utilization by Hierarchy
---------------------------

+----------------+----------------+------------+------------+---------+------+-----+--------+--------+--------------+
|    Instance    |     Module     | Total LUTs | Logic LUTs | LUTRAMs | SRLs | FFs | RAMB36 | RAMB18 | DSP48 Blocks |
+----------------+----------------+------------+------------+---------+------+-----+--------+--------+--------------+
| top_sort_4     |          (top) |        125 |        125 |       0 |    0 |  96 |      0 |      0 |            0 |
|   (top_sort_4) |          (top) |          1 |          1 |       0 |    0 |  64 |      0 |      0 |            0 |
|   uut          | sort_4_wrapper |        124 |        124 |       0 |    0 |  32 |      0 |      0 |            0 |
|     uut        |     bubblesort |        124 |        124 |       0 |    0 |  32 |      0 |      0 |            0 |
+----------------+----------------+------------+------------+---------+------+-----+--------+--------+--------------+
Timing is 10.563ns / 94.67 MHz (when constrained for 200MHz)

With the Vivado "Area Optimized" Strategy:
Code: [Select]
+----------------+----------------+------------+------------+---------+------+-----+--------+--------+--------------+
|    Instance    |     Module     | Total LUTs | Logic LUTs | LUTRAMs | SRLs | FFs | RAMB36 | RAMB18 | DSP48 Blocks |
+----------------+----------------+------------+------------+---------+------+-----+--------+--------+--------------+
| top_sort_4     |          (top) |        109 |        109 |       0 |    0 |  96 |      0 |      0 |            0 |
|   (top_sort_4) |          (top) |          1 |          1 |       0 |    0 |  64 |      0 |      0 |            0 |
|   uut          | sort_4_wrapper |        108 |        108 |       0 |    0 |  32 |      0 |      0 |            0 |
|     uut        |     bubblesort |        108 |        108 |       0 |    0 |  32 |      0 |      0 |            0 |
+----------------+----------------+------------+------------+---------+------+-----+--------+--------+--------------+
Interestingly, timing is a slightly faster 10.205 ns / 97.99 MHz (when constrained for 200MHz)

"sort_4_wrapper.vhd" just makes the interface compatible with the interface I used at the top level:

Code: [Select]
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

use work.array_type.all;


entity sort_4_wrapper is
    Port ( clk   : in STD_LOGIC;
           a_in  : in STD_LOGIC_VECTOR (7 downto 0);
           b_in  : in STD_LOGIC_VECTOR (7 downto 0);
           c_in  : in STD_LOGIC_VECTOR (7 downto 0);
           d_in  : in STD_LOGIC_VECTOR (7 downto 0);
           a_out : out STD_LOGIC_VECTOR (7 downto 0);
           b_out : out STD_LOGIC_VECTOR (7 downto 0);
           c_out : out STD_LOGIC_VECTOR (7 downto 0);
           d_out : out STD_LOGIC_VECTOR (7 downto 0));
end sort_4_wrapper;

architecture Behavioral of sort_4_wrapper is
    signal in_array_in : bubble;
    component bubblesort is
    port (
        signal clk              : in  std_logic;
        signal reset            : in  std_logic;
        signal in_array_in      : in  bubble;
        signal sorted_array_out : out bubble
    );
    end component;
    signal sorted_array_out : bubble;
begin

    in_array_in(0)   <= unsigned(a_in);
    in_array_in(1)   <= unsigned(b_in);
    in_array_in(2)   <= unsigned(c_in);
    in_array_in(3)   <= unsigned(d_in);
   
uut: bubblesort port map (
        clk              => clk,
        reset            => '0',
        in_array_in      => in_array_in,
        sorted_array_out => sorted_array_out);
    a_out  <= std_logic_vector(sorted_array_out(0));
    b_out  <= std_logic_vector(sorted_array_out(1));
    c_out  <= std_logic_vector(sorted_array_out(2));
    d_out  <= std_logic_vector(sorted_array_out(3));
end Behavioral;

So now I am at a loss - I can't recreate your results. I get exactly what my unrolled bubble sort does (which is what I expected). Do you have any hints as to what I have missed?

What are you using as the 'source' for your inputs? As mentioned, I'm just using a 32-bit counter, and the outputs just go to external pins. It was the easiest way to ensure that nothing can get optimized away...

Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #185 on: June 27, 2017, 09:54:29 am »
I have I/O pins at the inputs & outputs. It might be that the synthesizer from ISE 14.7 is better than the one from Vivado. Last news I heard is that Vivado's synthesizer isn't quite there yet. Besides that, I also placed & routed the design, which gives some extra logic optimisation.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Learning FPGAs: wrong approach?
« Reply #186 on: June 27, 2017, 10:58:33 am »
I have I/O pins at the inputs & outputs. It might be that the synthesizer from ISE 14.7 is better than the one from Vivado. Last news I heard is that Vivado's synthesizer isn't quite there yet. Besides that, I also placed & routed the design, which gives some extra logic optimisation.

Humm, fully P+R under ISE with defaults, for Spartan 6 LX 4. I get 255 LUTs (see attached image).

One thing I do see as "odd" is that as written you need to add two more signals to the process's sensitivity list - it should be:

    process (clk, in_array_in, sorted_array)

I haven't made the change, but I wonder if that is the cause for our differences?
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #187 on: June 27, 2017, 11:12:57 am »
No, not to the sensitivity list but inside the 'if rising_edge...' clause so the logic is caught by the clock constraint. Otherwise the input to logic and flipflop to output routing would add extra delays.
Did you enable optimise across hierarchy / keep hierarchy? AFAIK it is off by default but it produces lesser results but it would clutter your results because they would include the counter.
« Last Edit: June 27, 2017, 11:16:01 am by nctnico »
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline Bruce Abbott

  • Frequent Contributor
  • **
  • Posts: 627
  • Country: nz
    • Bruce Abbott's R/C Models and Electronics
Re: Learning FPGAs: wrong approach?
« Reply #188 on: June 27, 2017, 06:05:22 pm »
Almost. It figures out what logic is required to *sort* the entire array in a single clock - not necessarily *bubble* sort.
Yes, I understand that. The compiler creates logic that performs the requested function, but it decides how to do that.  The source algorithm is a bubble sort, but the logic doesn't have to look anything like a bubble sort - it just has to produce the same output as a bubble sort.

In software we think of an array as a block of memory with numbers stored in it, and sorting the array changes the order of its contents. An advantage of Bubble Sort over other algorithms is that since elements are swapped 'in place' its memory footprint can be very low. I had incorrectly assumed that the VHDL code was also sorting the array 'in place'. However it actually takes an array as input and then fills another array with the sorted data. IOW the output is a (separate) sorted version of the input. If the array data came from some memory (registers or RAM) then it could be stored back into that memory if desired, or it could be used elsewhere without affecting the original array's contents.
 
The disadvantage of Bubble Sort is that as the array size increases, the processing time increases quadratically. In a hardware implementation this is not necessarily true because an entire array can be sorted in one clock cycle, but (I presume) in real hardware the number of gates required will increase just as quickly, which may increase latency and reduce the maximum permitted clock frequency. Also the largest array that can be sorted in 1 clock cycle is limited by the maximum number of bits that can be operated on in parallel. Large arrays would have to be stored in RAM and sorted in several passes just like in software.
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #189 on: June 27, 2017, 06:34:57 pm »
Also the largest array that can be sorted in 1 clock cycle is limited by the maximum number of bits that can be operated on in parallel. Large arrays would have to be stored in RAM and sorted in several passes just like in software.

Exactly. If you wanted it to be scalable, or if you wanted to sort arrays of variable size, it would be much better to store the array in BRAM and sort it in place. You could create a specialized soft core for this (or a state machine if you will), which would do it sequentially. But it would be much, much slower.

If you wanted to make it faster, you could use a more advanced software sorting algorithm, such as quicksort. Or you could come up with an FPGA-friendly algorithm which manages to establish a pipeline and make the bubble sort faster. But such an approach would be much more complex than sorting a small array with combinatorial logic.
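A rough sketch of such a sequential sorter is below - one compare/swap per clock, with the storage shown as a plain register array to keep it short (a real BRAM version would add a cycle of read latency and an address register; the entity, the names and the packed-vector interface are all mine):

Code: [Select]
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- N >= 2; din/dout are the N 8-bit elements packed into one vector
entity seq_bubble_sort is
    generic (N : positive := 8);
    port (
        clk   : in  std_logic;
        start : in  std_logic;                       -- pulse to begin sorting
        din   : in  std_logic_vector(8*N-1 downto 0);
        dout  : out std_logic_vector(8*N-1 downto 0);
        done  : out std_logic
    );
end entity;

architecture rtl of seq_bubble_sort is
    type word_array is array (0 to N-1) of unsigned(7 downto 0);
    signal mem    : word_array;
    signal i      : integer range 0 to N-2 := 0;     -- position within a pass
    signal pass   : integer range 0 to N-2 := 0;
    signal busy   : std_logic := '0';
    signal done_i : std_logic := '0';
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if busy = '0' then
                if start = '1' then
                    for k in 0 to N-1 loop           -- load the array
                        mem(k) <= unsigned(din(8*k+7 downto 8*k));
                    end loop;
                    i      <= 0;
                    pass   <= 0;
                    busy   <= '1';
                    done_i <= '0';
                end if;
            else
                -- one compare/swap per clock
                if mem(i) > mem(i+1) then
                    mem(i)   <= mem(i+1);
                    mem(i+1) <= mem(i);
                end if;
                -- walk the indices: i covers one pass, pass counts the passes
                if i < N-2-pass then
                    i <= i + 1;
                else
                    i <= 0;
                    if pass < N-2 then
                        pass <= pass + 1;
                    else
                        busy   <= '0';
                        done_i <= '1';
                    end if;
                end if;
            end if;
        end if;
    end process;

    done <= done_i;

    unpack : process(mem)
    begin
        for k in 0 to N-1 loop
            dout(8*k+7 downto 8*k) <= std_logic_vector(mem(k));
        end loop;
    end process;
end architecture;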
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #190 on: June 27, 2017, 07:33:04 pm »
Also the largest array that can be sorted in 1 clock cycle is limited by the maximum number of bits that can be operated on in parallel. Large arrays would have to be stored in RAM and sorted in several passes just like in software.

Exactly. If you wanted it to be scalable. or if you wanted to sort arrays of variable size, it would be much better to store the array in BRAM and sort it in place. You could create a specialized soft core for this (or a state machine if you will), which would do it sequentially. But it would be much much slower.

If you wanted to make it faster, you could use a more advanced software sorting algorithms, such as quicksort. Or you could come up with FPGA-friendly algorithm which manages to establish a pipeline and make bubble-sort faster. But such approach would be much more complex than sorting a small array with combinatorial logic.
One of the things I did to the sorting example I posted was adding extra buffer (flipflop) stages between the input & output. The synthesizer can use that to do the pipelining for you. IOW you can let the tools do a lot of work for you before you need to resort to getting into the nitty-gritty bits of an FPGA.
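Concretely, the 'extra buffer stages' are nothing more exotic than something like this wrapped around the existing sort process (a sketch; in_reg1, in_reg2, out_reg1 and sorted_comb are signal names I've made up, all of the 'bubble' array type), combined with switching on register balancing in the synthesis options:

Code: [Select]
-- spare registers before and after the combinatorial sort; with register
-- balancing enabled the tools are free to push these into the logic
process(clk)
begin
    if rising_edge(clk) then
        in_reg1          <= in_array_in;    -- spare input stage
        in_reg2          <= in_reg1;        -- spare input stage
        out_reg1         <= sorted_comb;    -- sorted_comb: output of the sort logic, fed from in_reg2
        sorted_array_out <= out_reg1;       -- spare output stage
    end if;
end process;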

However sorting large amounts of data is better done using an iterative approach.
« Last Edit: June 27, 2017, 07:36:39 pm by nctnico »
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline Mattjd

  • Regular Contributor
  • *
  • Posts: 230
  • Country: us
Re: Learning FPGAs: wrong approach?
« Reply #191 on: June 27, 2017, 08:09:12 pm »
How are you going about doing the pipelining? Are you running all your in/outs through always blocks that represent a register or what?
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #192 on: June 27, 2017, 08:49:03 pm »
How are you going about doing the pipelining? Are you running all your in/outs through always blocks that represent a register or what?

I don't use Verilog. In VHDL it is very simple.

Combinatorial within one clock cycle:

Code: [Select]
process(clk)
begin
  if rising_edge(clk) then
    x <= (a+b)+c;
  end if;
end process;

Pipelined:

Code: [Select]
process(clk)
begin
  if rising_edge(clk) then
    -- stage 1
    a_plus_b <= a+b;
    c_stage_2 <= c;

    -- stage 2
    x <= a_plus_b + c_stage_2;
  end if;
end process;

Signals a_plus_b and c_stage_2 are added flip-flops.
« Last Edit: June 27, 2017, 09:02:37 pm by NorthGuy »
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Learning FPGAs: wrong approach?
« Reply #193 on: June 28, 2017, 12:01:04 am »
No, not to the sensitivity list but inside the 'if rising_edge...' clause so the logic is caught by the clock constraint. Otherwise the input to logic and flipflop to output routing would add extra delays.
Did you enable optimise across hierarchy / keep hierarchy? AFAIK it is off by default but it produces lesser results but it would clutter your results because they would include the counter.
So... mystery deepens. I see a learning experience ahead for me.

I'm now using ISE. I've promoted your module to be the top level, so all 32 inputs are on pins, and the outputs are registered before the pins - usage is now 76 LUTs  :o :scared:

As everything except the output registers is async, the timing looks to be > 20ns for the inputs to the output registers to be valid. Not quite sure how to read the numbers in the timing report... However, registering the  inputs makes the LUT count go up dramatically to 222, with a Fmax of 89.8 MHz.

The "designed with hardware in mind" version is 76 LUTs (with an Fmax of > 213MHz), so without trawling through the technology schematic it seems that the inputs-unregistered bubble sort version has the freedom to be optimized down to the same design, but the restrictions placed on it by having registers on the inputs prevent this from occurring. (wonder why that would be?...)

So the "designed with hardware in mind" version can be almost 3x smaller, and > 2x faster, than the "simple bubble sort" version. It also delivers consistent performance and usage no matter how it is used.

Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Online BrianHG

  • Super Contributor
  • ***
  • Posts: 7738
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #194 on: June 28, 2017, 01:53:06 am »
Though not with this code: with Altera's Quartus II v9 & above, just adding an extra stage of DFFs, without any additional logic or deliberate pipelining, before or after such a piece of HDL code will actually have the same effect, potentially slimming the LUT count and doubling the Fmax.  This may just be the way I was coding at the time, but there are features in the compiler to decompose & reconstruct logic to achieve the best possible Fmax, both in the compiler stage and in the fitting/physical synthesis stage.

Darn, if I had quartus installed on one of my PC's today, I would have played with this VHDL code already and posted the results...
« Last Edit: June 28, 2017, 01:59:32 am by BrianHG »
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Learning FPGAs: wrong approach?
« Reply #195 on: June 28, 2017, 03:22:15 am »
Though not with this code: with Altera's Quartus II v9 & above, just adding an extra stage of DFFs, without any additional logic or deliberate pipelining, before or after such a piece of HDL code will actually have the same effect, potentially slimming the LUT count and doubling the Fmax.  This may just be the way I was coding at the time, but there are features in the compiler to decompose & reconstruct logic to achieve the best possible Fmax, both in the compiler stage and in the fitting/physical synthesis stage.

Darn, if I had quartus installed on one of my PC's today, I would have played with this VHDL code already and posted the results...

There are a lot of undocumented tricks on how you can get great results with inference - how to cast your code 'just right' so a DSP block is inferred, with all the right pipeline registers, or so it uses block RAM, or so LUTs become shift registers rather than a chain of FFs.

The thing that annoys me is that the patterns that work are not well defined. For example, in a clocked process

 data <= memory(to_integer(unsigned(address)));

should infer a block RAM if memory is big enough, but as a general rule, anything with an expression for the array index won't:

  data <= memory(to_integer(unsigned(address)+1));

Will only infer LUTs and flip-flops. It leaves you with 'land mines' in your code:

  addr_temp := unsigned(address)+1;  -- assign address to variable
  data <= memory(to_integer(addr_temp)); -- look up address

They have big flags waving away saying "Who wrote this junk! Make me shiny!", and when you touch them your design blows up.

(the example is somewhat contrived, I would have to test to find an exact case when I can prove this to be the case, but you get the idea)
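For what it's worth, the shape that I find infers a block RAM reliably is the boring one below - synchronous write, registered read, plain index - with any 'address + 1' style arithmetic done into its own register on the previous clock (a rough sketch; names are mine):

Code: [Select]
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity bram_template is
    port (
        clk  : in  std_logic;
        we   : in  std_logic;
        addr : in  std_logic_vector(9 downto 0);
        din  : in  std_logic_vector(7 downto 0);
        dout : out std_logic_vector(7 downto 0)
    );
end entity;

architecture rtl of bram_template is
    type ram_t is array (0 to 1023) of std_logic_vector(7 downto 0);
    signal ram : ram_t;
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if we = '1' then
                ram(to_integer(unsigned(addr))) <= din;
            end if;
            -- registered read with a plain index: the shape the tools recognise
            dout <= ram(to_integer(unsigned(addr)));
            -- whereas something like
            --   dout <= ram(to_integer(unsigned(addr) + 1));
            -- risks falling back to LUTs and flip-flops; register the
            -- incremented address into its own signal a clock earlier instead.
        end if;
    end process;
end architecture;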
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Online BrianHG

  • Super Contributor
  • ***
  • Posts: 7738
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #196 on: June 28, 2017, 03:48:02 am »
There are a lot of undocumented tricks on how you can get great results with inference - how to cast your code 'just right' so a DSP block is inferred, with all the right pipeline registers, or so it uses block RAM, or so LUTs become shift registers rather than a chain of FFs.
That pretty much boils down to what's been going on...
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #197 on: June 28, 2017, 04:04:45 am »
The thing that annoys me is that the patterns that work are not well defined. For example, in a clocked process

 data <= memory(to_integer(unsigned(address)));

should infer a block RAM if memory is big enough, but as a general rule, anything with an expression for the array index won't:

  data <= memory(to_integer(unsigned(address)+1));

On Xilinx BRAM must be clocked. You cannot calculate the address, give it to BRAM and get a combinatorial result. Therefore, when you try to do that (as in your second expression above), you will never get BRAM. You may get "distributed memory" instead, because the distributed memory can get you the combinatorial result you want.

If "address" is registered, then you may get the BRAM by removing "+1".
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #198 on: June 28, 2017, 06:01:04 am »
No, not to the sensitivity list but inside the 'if rising_edge...' clause so the logic is caught by the clock constraint. Otherwise the input to logic and flipflop to output routing would add extra delays.
Did you enable optimise across hierarchy / keep hierarchy? AFAIK it is off by default but it produces lesser results but it would clutter your results because they would include the counter.
So... mystery deepens. I see a learning experience ahead for me.

I'm now using ISE. I've promoted your module to be the top level, so all 32 inputs are on pins, and the outputs are registered before the pins - usage is now 76 LUTs  :o :scared:

As everything except the output registers is async, the timing looks to be > 20ns for the inputs to the output registers to be valid. Not quite sure how to read the numbers in the timing report... However, registering the  inputs makes the LUT count go up dramatically to 222, with a Fmax of 89.8 MHz.

The "designed with hardware in mind" version is 76 LUTs (with an Fmax of > 213MHz), so without trawling through the technology schematic it seems that the inputs-unregistered bubble sort version has the freedom to be optimized down to the same design, but the restrictions placed on it by having registers on the inputs prevent this from occurring. (wonder why that would be?...)

So the "designed with hardware in mind" version can be almost 3x smaller, and > 2x faster, than the "simple bubble sort" version. It also delivers consistent performance and usage no matter how it is used.
That is the wrong conclusion. By adding the registers you add more logic to the design. Also: did you P&R the design? There is an extra logic optimisation stage in there as well.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Learning FPGAs: wrong approach?
« Reply #199 on: June 28, 2017, 08:03:47 am »
That is the wrong conclusion. By adding the registers you add more logic to the design. Also: did you P&R the design? There is an extra logic optimisation stage in there as well.

I checked the Technology Schematic - Inputs go straight into a FF from the IBUF, outputs straight from the FF to OBUF.

And then I checked the FPGA editor, and all 32 inputs run into a slice, and in that slice they run directly into a FF's D input (the input MUX on the FF that acts as CE is set to a fixed value).

As for the 32 outputs, they are all directly from the output of a flipflop to the output buffer.

So that is all 64 FFs accounted for. All the logic is between these two sets of FFs, and no retiming has occurred.

As far as I have seen for Xilinx, the P+R optimization makes zero difference to what generic primitives are actually used - only where they are placed on the die and how they are connected (hence the name place and route). The logic of the design at that point is fixed by the Implementation step.

(And of course the timing of a design depends on how well it is P+Red due to how well it minimizes routing delays, and there are a few little corner cases like route-throughs, which do consume additional LUTs by running signals through them like a buffer to tweak timing, but resource usage can only go up during P+R, and the logical design is not transformed at all)
« Last Edit: June 28, 2017, 08:31:50 am by hamster_nz »
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Learning FPGAs: wrong approach?
« Reply #200 on: June 28, 2017, 10:25:46 am »
So the "designed with hardware in mind" version can be almost 3x smaller, and > 2x faster, than the "simple bubble sort" version. It also delivers consistent performance and usage no matter how it is used.
That is the wrong conclusion. By adding the registers you add more logic to the design. Also: did you P&R the design? There is an extra logic optimisation stage in there as well.

Nah, that conclusion is bang on: the two designs are nothing alike even when they use the same quantity of resources. The inferred design is complete crap. Just to show how different they are, here is the slowest path in each design:

Slowest path in the "outputs registered only, inferred design":

Code: [Select]
Slack (setup path):     -13.355ns (requirement - (data path - clock path - clock arrival + uncertainty))
  Source:               b_in<1> (PAD)
  Destination:          uut/sorted_array_out_1_6 (FF)
  Destination Clock:    clk_BUFGP rising at 0.000ns
  Requirement:          10.000ns
  Data Path Delay:      25.822ns (Levels of Logic = 25)
  Clock Path Delay:     2.492ns (Levels of Logic = 2)
  Clock Uncertainty:    0.025ns

  Clock Uncertainty:          0.025ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.050ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.000ns
    Phase Error (PE):           0.000ns

  Maximum Data Path at Slow Process Corner: b_in<1> to uut/sorted_array_out_1_6
    Location             Delay type         Delay(ns)  Physical Resource
                                                       Logical Resource(s)
    -------------------------------------------------  -------------------
    P139.I               Tiopi                 0.790   b_in<1>
                                                       b_in<1>
                                                       b_in_1_IBUF
                                                       ProtoComp29.IMUX.10
    SLICE_X1Y59.C1       net (fanout=4)        1.933   b_in_1_IBUF
    SLICE_X1Y59.C        Tilo                  0.259   uut/in_array_in[1][7]_in_array_in[0][7]_LessThan_1_o1
                                                       uut/in_array_in[1][7]_in_array_in[0][7]_LessThan_1_o3
    SLICE_X2Y59.B5       net (fanout=1)        0.764   uut/in_array_in[1][7]_in_array_in[0][7]_LessThan_1_o1
    SLICE_X2Y59.B        Tilo                  0.203   N30
                                                       uut/in_array_in[1][7]_in_array_in[0][7]_LessThan_1_o1_SW2
    SLICE_X2Y59.A5       net (fanout=1)        0.222   N30
    SLICE_X2Y59.A        Tilo                  0.203   N30
                                                       uut/in_array_in[1][7]_in_array_in[0][7]_LessThan_1_o1
    SLICE_X6Y54.C5       net (fanout=1)        1.238   uut/in_array_in[1][7]_in_array_in[0][7]_LessThan_1_o2
    SLICE_X6Y54.C        Tilo                  0.204   uut/in_array_in[0][7]_in_array_in[1][7]_mux_1_OUT<4>
                                                       uut/in_array_in[1][7]_in_array_in[0][7]_LessThan_1_o21
    SLICE_X4Y59.CX       net (fanout=18)       1.004   uut/in_array_in[1][7]_in_array_in[0][7]_LessThan_1_o
    SLICE_X4Y59.CMUX     Tcxc                  0.164   N28
                                                       uut/in_array_in[2][7]_in_array_in[1][7]_LessThan_4_o5
    SLICE_X4Y59.A2       net (fanout=1)        0.624   uut/in_array_in[2][7]_in_array_in[1][7]_LessThan_4_o1
    SLICE_X4Y59.A        Tilo                  0.203   N28
                                                       uut/in_array_in[2][7]_in_array_in[1][7]_LessThan_4_o1_SW2
    SLICE_X7Y50.A6       net (fanout=1)        1.058   N28
    SLICE_X7Y50.A        Tilo                  0.259   uut/in_array_in[1][7]_in_array_in[2][7]_mux_4_OUT<0>
                                                       uut/in_array_in[2][7]_in_array_in[1][7]_LessThan_4_o1
    SLICE_X7Y50.B6       net (fanout=1)        0.118   uut/in_array_in[2][7]_in_array_in[1][7]_LessThan_4_o2
    SLICE_X7Y50.B        Tilo                  0.259   uut/in_array_in[1][7]_in_array_in[2][7]_mux_4_OUT<0>
                                                       uut/in_array_in[2][7]_in_array_in[1][7]_LessThan_4_o2
    SLICE_X5Y52.D4       net (fanout=12)       0.987   uut/in_array_in[2][7]_in_array_in[1][7]_LessThan_4_o
    SLICE_X5Y52.D        Tilo                  0.259   uut/in_array_in[2][7]_in_array_in[1][7]_mux_5_OUT<3>
                                                       uut/Mmux_in_array_in[1][7]_in_array_in[2][7]_mux_4_OUT141
    SLICE_X6Y51.D6       net (fanout=4)        0.668   uut/in_array_in[2][7]_in_array_in[1][7]_mux_5_OUT<3>
    SLICE_X6Y51.CMUX     Topdc                 0.368   uut/in_array_in[3][7]_in_array_in[2][7]_LessThan_7_o1
                                                       uut/in_array_in[3][7]_in_array_in[2][7]_LessThan_7_o1_F
                                                       uut/in_array_in[3][7]_in_array_in[2][7]_LessThan_7_o1
    SLICE_X7Y49.C4       net (fanout=1)        0.513   uut/in_array_in[3][7]_in_array_in[2][7]_LessThan_7_o2
    SLICE_X7Y49.C        Tilo                  0.259   uut/in_array_in[2][7]_in_array_in[3][7]_mux_7_OUT<3>
                                                       uut/in_array_in[3][7]_in_array_in[2][7]_LessThan_7_o2
    SLICE_X7Y50.C5       net (fanout=10)       0.372   uut/in_array_in[3][7]_in_array_in[2][7]_LessThan_7_o
    SLICE_X7Y50.C        Tilo                  0.259   uut/in_array_in[1][7]_in_array_in[2][7]_mux_4_OUT<0>
                                                       uut/Mmux_in_array_in[2][7]_in_array_in[3][7]_mux_7_OUT18
    SLICE_X7Y51.D2       net (fanout=3)        0.602   uut/in_array_in[2][7]_in_array_in[3][7]_mux_7_OUT<0>
    SLICE_X7Y51.D        Tilo                  0.259   uut/in_array_in[2][7]_in_array_in[1][7]_LessThan_13_o3
                                                       uut/in_array_in[2][7]_in_array_in[1][7]_LessThan_13_o2
    SLICE_X6Y48.B6       net (fanout=1)        0.468   uut/in_array_in[2][7]_in_array_in[1][7]_LessThan_13_o3
    SLICE_X6Y48.B        Tilo                  0.203   uut/in_array_in[2][7]_in_array_in[1][7]_LessThan_13_o1
                                                       uut/in_array_in[2][7]_in_array_in[1][7]_LessThan_13_o4
    SLICE_X6Y48.D1       net (fanout=2)        0.482   uut/in_array_in[2][7]_in_array_in[1][7]_LessThan_13_o1
    SLICE_X6Y48.CMUX     Topdc                 0.368   uut/in_array_in[2][7]_in_array_in[1][7]_LessThan_13_o1
                                                       uut/in_array_in[2][7]_in_array_in[1][7]_LessThan_13_o1_F
                                                       uut/in_array_in[2][7]_in_array_in[1][7]_LessThan_13_o1
    SLICE_X5Y48.B6       net (fanout=1)        0.607   uut/in_array_in[2][7]_in_array_in[1][7]_LessThan_13_o2
    SLICE_X5Y48.B        Tilo                  0.259   uut/in_array_in[2][7]_in_array_in[1][7]_mux_14_OUT<3>
                                                       uut/in_array_in[2][7]_in_array_in[1][7]_LessThan_13_o24_SW0
    SLICE_X5Y48.A5       net (fanout=1)        0.187   N24
    SLICE_X5Y48.A        Tilo                  0.259   uut/in_array_in[2][7]_in_array_in[1][7]_mux_14_OUT<3>
                                                       uut/in_array_in[2][7]_in_array_in[1][7]_LessThan_13_o24
    SLICE_X5Y47.C3       net (fanout=14)       0.507   uut/in_array_in[2][7]_in_array_in[1][7]_LessThan_13_o
    SLICE_X5Y47.C        Tilo                  0.259   uut/in_array_in[1][7]_in_array_in[2][7]_mux_13_OUT<0>
                                                       uut/Mmux_in_array_in[1][7]_in_array_in[2][7]_mux_13_OUT12
    SLICE_X4Y49.C6       net (fanout=3)        0.489   uut/in_array_in[1][7]_in_array_in[2][7]_mux_13_OUT<0>
    SLICE_X4Y49.CMUX     Tilo                  0.361   uut/in_array_in[0][7]_in_array_in[1][7]_mux_1_OUT<0>
                                                       uut/in_array_in[1][7]_in_array_in[0][7]_LessThan_16_o4_G
                                                       uut/in_array_in[1][7]_in_array_in[0][7]_LessThan_16_o4
    SLICE_X5Y51.B1       net (fanout=2)        0.643   uut/in_array_in[1][7]_in_array_in[0][7]_LessThan_16_o1
    SLICE_X5Y51.B        Tilo                  0.259   uut/in_array_in[0][7]_in_array_in[1][7]_mux_1_OUT<3>
                                                       uut/in_array_in[1][7]_in_array_in[0][7]_LessThan_16_o1_SW1
    SLICE_X4Y50.C5       net (fanout=2)        0.352   N19
    SLICE_X4Y50.CMUX     Tilo                  0.361   N18
                                                       uut/in_array_in[1][7]_in_array_in[0][7]_LessThan_16_o1_G
                                                       uut/in_array_in[1][7]_in_array_in[0][7]_LessThan_16_o1
    SLICE_X5Y49.D4       net (fanout=1)        0.424   uut/in_array_in[1][7]_in_array_in[0][7]_LessThan_16_o2
    SLICE_X5Y49.D        Tilo                  0.259   N26
                                                       uut/in_array_in[1][7]_in_array_in[0][7]_LessThan_16_o24_SW0
    SLICE_X7Y45.A3       net (fanout=1)        0.838   N26
    SLICE_X7Y45.AMUX     Tilo                  0.313   uut/in_array_in[0][7]_in_array_in[1][7]_mux_16_OUT<2>
                                                       uut/in_array_in[1][7]_in_array_in[0][7]_LessThan_16_o24
    SLICE_X7Y46.D3       net (fanout=7)        0.543   uut/in_array_in[1][7]_in_array_in[0][7]_LessThan_16_o
    SLICE_X7Y46.DMUX     Tilo                  0.313   uut/in_array_in[0][7]_in_array_in[1][7]_mux_16_OUT<6>
                                                       uut/Mmux_in_array_in[0][7]_in_array_in[1][7]_mux_16_OUT171
    OLOGIC_X12Y23.D1     net (fanout=1)        2.214   uut/in_array_in[1][7]_in_array_in[0][7]_mux_17_OUT<6>
    OLOGIC_X12Y23.CLK0   Todck                 0.803   uut/sorted_array_out_1<6>
                                                       uut/sorted_array_out_1_6
    -------------------------------------------------  ---------------------------
    Total                                     25.822ns (7.965ns logic, 17.857ns route)
                                                       (30.8% logic, 69.2% route)



Slowest path in the "outputs registered only, coded for H/W design":

Code: [Select]
Paths for end point c_out_2 (OLOGIC_X11Y2.D1), 268 paths
--------------------------------------------------------------------------------
Slack (setup path):     0.064ns (requirement - (data path - clock path - clock arrival + uncertainty))
  Source:               a_in<1> (PAD)
  Destination:          c_out_2 (FF)
  Destination Clock:    clk_BUFGP rising at 0.000ns
  Requirement:          10.000ns
  Data Path Delay:      12.477ns (Levels of Logic = 6)
  Clock Path Delay:     2.566ns (Levels of Logic = 2)
  Clock Uncertainty:    0.025ns

  Clock Uncertainty:          0.025ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.050ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.000ns
    Phase Error (PE):           0.000ns

  Maximum Data Path at Slow Process Corner: a_in<1> to c_out_2
    Location             Delay type         Delay(ns)  Physical Resource
                                                       Logical Resource(s)
    -------------------------------------------------  -------------------
    P111.I               Tiopi                 0.790   a_in<1>
                                                       a_in<1>
                                                       a_in_1_IBUF
                                                       ProtoComp13.IMUX.25
    SLICE_X4Y58.D4       net (fanout=7)        2.715   a_in_1_IBUF
    SLICE_X4Y58.D        Tilo                  0.203   a[7]_c[7]_LessThan_3_o22
                                                       a[7]_c[7]_LessThan_3_o23
    SLICE_X5Y40.C3       net (fanout=1)        1.505   a[7]_c[7]_LessThan_3_o22
    SLICE_X5Y40.C        Tilo                  0.259   a[7]_c[7]_LessThan_3_o23
                                                       a[7]_c[7]_LessThan_3_o24
    SLICE_X5Y40.B6       net (fanout=1)        0.285   a[7]_c[7]_LessThan_3_o23
    SLICE_X5Y40.B        Tilo                  0.259   a[7]_c[7]_LessThan_3_o23
                                                       a[7]_c[7]_LessThan_3_o25
    SLICE_X7Y36.C2       net (fanout=8)        1.240   a[7]_c[7]_LessThan_3_o
    SLICE_X7Y36.C        Tilo                  0.259   BUS_0001_d[7]_wide_mux_11_OUT<5>
                                                       Mram_table41
    SLICE_X7Y14.A6       net (fanout=8)        2.389   _n0044<4>
    SLICE_X7Y14.A        Tilo                  0.259   BUS_0003_d[7]_wide_mux_10_OUT<2>
                                                       Mmux_BUS_0003_d[7]_wide_mux_10_OUT31
    OLOGIC_X11Y2.D1      net (fanout=1)        1.511   BUS_0003_d[7]_wide_mux_10_OUT<2>
    OLOGIC_X11Y2.CLK0    Todck                 0.803   c_out_2
                                                       c_out_2
    -------------------------------------------------  ---------------------------
    Total                                     12.477ns (2.832ns logic, 9.645ns route)
                                                       (22.7% logic, 77.3% route)

The "coded for H/W" design beats the pants off the fully inferred design: 6 levels of logic vs 25, and it meets the timing requirement rather than missing it by over 130%.
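For anyone following along, "coded for H/W" here just means building the sort out of explicit compare-and-swap elements wired into a fixed network. A minimal VHDL sketch of such an element (not the actual code behind the report above, and the names are made up):

Code: [Select]
-- Compare-and-swap ("compare-exchange"): the building block of a sorting
-- network. Purely combinational; register the network outputs as needed.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity cmp_swap is
  generic ( WIDTH : positive := 8 );
  port (
    x  : in  std_logic_vector(WIDTH-1 downto 0);
    y  : in  std_logic_vector(WIDTH-1 downto 0);
    lo : out std_logic_vector(WIDTH-1 downto 0);  -- smaller of the two
    hi : out std_logic_vector(WIDTH-1 downto 0)   -- larger of the two
  );
end entity;

architecture rtl of cmp_swap is
begin
  process(x, y)
  begin
    if unsigned(x) < unsigned(y) then
      lo <= x;  hi <= y;
    else
      lo <= y;  hi <= x;
    end if;
  end process;
end architecture;

One comparator plus two multiplexers per element; a handful of these wired into a fixed network, with registers where needed, is the whole sort.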

Can you supply any data to support your conclusion?
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #201 on: June 28, 2017, 01:20:05 pm »
... 6 levels of logic ...

It did the comparisons in 3 levels, but it certainly can be done in 2.
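For example, split the operands into small chunks so that each chunk's "less-than" and "equal" fit in a single 6-input LUT, then combine the chunk results in one more LUT. A quick VHDL sketch of the idea (my own illustration, assuming a LUT6 fabric; the synthesizer still decides the final packing):

Code: [Select]
-- 8-bit unsigned a < b in two LUT levels (sketch).
-- Level 1: per-chunk "less" and "equal", chunks sized so no function
--          needs more than 6 inputs.
-- Level 2: combine the three chunks in one LUT.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity lt8_2level is
  port (
    a, b   : in  std_logic_vector(7 downto 0);
    a_lt_b : out std_logic
  );
end entity;

architecture rtl of lt8_2level is
  signal lt2, eq2 : std_logic;  -- bits 7..6 (2-bit chunk)
  signal lt1, eq1 : std_logic;  -- bits 5..3 (3-bit chunk)
  signal lt0      : std_logic;  -- bits 2..0 (its "equal" is never needed)
begin
  lt2 <= '1' when unsigned(a(7 downto 6)) < unsigned(b(7 downto 6)) else '0';
  eq2 <= '1' when a(7 downto 6) = b(7 downto 6) else '0';
  lt1 <= '1' when unsigned(a(5 downto 3)) < unsigned(b(5 downto 3)) else '0';
  eq1 <= '1' when a(5 downto 3) = b(5 downto 3) else '0';
  lt0 <= '1' when unsigned(a(2 downto 0)) < unsigned(b(2 downto 0)) else '0';

  a_lt_b <= lt2 or (eq2 and lt1) or (eq2 and eq1 and lt0);
end architecture;

Roughly six LUTs and two logic levels for an 8-bit unsigned compare; the tools may pack it differently, but the depth is the point.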
 

Offline mrflibble

  • Super Contributor
  • ***
  • Posts: 2051
  • Country: nl
Re: Learning FPGAs: wrong approach?
« Reply #202 on: July 02, 2017, 04:52:04 am »
Hey, fun exercise! :-+

... The inferred design is complete crap. Just to show how different they are, here are the slowest path in each design: ...
No kidding. I didn't even try to do an inferred design for this problem. Last time I did that it gave me a headache and made my hex vision act up for days.  :scared:  :o

Anyways, below are the timings of my attempt at a hardware-targeted design. Worst path:

Code: [Select]
================================================================================
 Timing constraint: TS_clk_400 = PERIOD TIMEGRP "clk_400" TS_GCLK / 4 HIGH 50% INPUT_JITTER 0.2 ns;
 For more information, see Period Analysis in the Timing Closure User Guide (UG612).
  416 paths analyzed, 278 endpoints analyzed, 0 failing endpoints
  0 timing errors detected. (0 setup errors, 0 hold errors, 0 component switching limit errors)
  Minimum period is   2.454ns.
 --------------------------------------------------------------------------------
 
 Paths for end point sort_four/select_sort_order/mux_this_2/out_7 (SLICE_X47Y81.D5), 1 path
 --------------------------------------------------------------------------------
 Slack (setup path):     0.046ns (requirement - (data path - clock path skew + uncertainty))
   Source:               sort_four/packed_evals_to_sels/sel_2_0 (FF)
   Destination:          sort_four/select_sort_order/mux_this_2/out_7 (FF)
   Requirement:          2.500ns
   Data Path Delay:      2.357ns (Levels of Logic = 1)
   Clock Path Skew:      -0.015ns (0.297 - 0.312)
   Source Clock:         clk_400 rising at 0.000ns
   Destination Clock:    clk_400 rising at 2.500ns
   Clock Uncertainty:    0.082ns
 
   Clock Uncertainty:          0.082ns  ((TSJ^2 + DJ^2)^1/2) / 2 + PE
     Total System Jitter (TSJ):  0.070ns
     Discrete Jitter (DJ):       0.147ns
     Phase Error (PE):           0.000ns
 
   Maximum Data Path at Slow Process Corner: sort_four/packed_evals_to_sels/sel_2_0 to sort_four/select_sort_order/mux_this_2/out_7
     Location             Delay type         Delay(ns)  Physical Resource
                                                        Logical Resource(s)
     -------------------------------------------------  -------------------
     SLICE_X37Y83.CQ      Tcko                  0.430   sort_four/packed_evals_to_sels/sel_2<0>
                                                        sort_four/packed_evals_to_sels/sel_2_0
     SLICE_X47Y81.D5      net (fanout=8)        1.554   sort_four/packed_evals_to_sels/sel_2<0>
     SLICE_X47Y81.CLK     Tas                   0.373   sort_four/select_sort_order/mux_this_2/out<7>
                                                        sort_four/select_sort_order/mux_this_2/Mmux_sel[1]_d[7]_wide_mux_1_OUT81
                                                        sort_four/select_sort_order/mux_this_2/out_7
     -------------------------------------------------  ---------------------------
     Total                                      2.357ns (0.803ns logic, 1.554ns route)
                                                        (34.1% logic, 65.9% route)
 
 --------------------------------------------------------------------------------

I constrained it conservatively at 400 MHz, and with a decent amount of clock uncertainty. Did several runs, and it easily meets timing. And based on some things I noticed (el stupido routing decisions by PAR) I'm guessing that with some extra constraints it would probably do around 425 MHz. Still have margin left on the clock uncertainty as well....

This is using ISE 14.7 and targeting a spartan-6: xc6slx45-2csg324. The design is pipelined, 3 stages.
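To give a rough idea of the structure (a simplified sketch, not my actual code; the entity and signal names are invented): each pipeline stage is one rank of compare-exchanges with a register behind it, so the per-clock critical path is a single comparator plus a mux.

Code: [Select]
-- Sketch of a 3-stage pipelined 4-input sorting network (Batcher-style:
-- ranks (0,1)(2,3) -> (0,2)(1,3) -> (1,2)). Each rank is registered.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity sort4_pipe is
  port (
    clk   : in  std_logic;
    d_in  : in  std_logic_vector(31 downto 0);  -- four packed 8-bit values
    d_out : out std_logic_vector(31 downto 0)   -- sorted, smallest first
  );
end entity;

architecture rtl of sort4_pipe is
  type vec4 is array (0 to 3) of unsigned(7 downto 0);

  -- compare-exchange helpers
  function cs_lo(a, b : unsigned(7 downto 0)) return unsigned is
  begin
    if a < b then return a; else return b; end if;
  end function;

  function cs_hi(a, b : unsigned(7 downto 0)) return unsigned is
  begin
    if a < b then return b; else return a; end if;
  end function;

  signal s1, s2, s3 : vec4;
begin
  process(clk)
    variable v : vec4;
  begin
    if rising_edge(clk) then
      -- unpack the input word
      for i in 0 to 3 loop
        v(i) := unsigned(d_in(8*i+7 downto 8*i));
      end loop;

      -- rank 1: (0,1) and (2,3)
      s1(0) <= cs_lo(v(0), v(1));  s1(1) <= cs_hi(v(0), v(1));
      s1(2) <= cs_lo(v(2), v(3));  s1(3) <= cs_hi(v(2), v(3));

      -- rank 2: (0,2) and (1,3)
      s2(0) <= cs_lo(s1(0), s1(2));  s2(2) <= cs_hi(s1(0), s1(2));
      s2(1) <= cs_lo(s1(1), s1(3));  s2(3) <= cs_hi(s1(1), s1(3));

      -- rank 3: (1,2), ends 0 and 3 just pass through
      s3(0) <= s2(0);
      s3(1) <= cs_lo(s2(1), s2(2));  s3(2) <= cs_hi(s2(1), s2(2));
      s3(3) <= s2(3);
    end if;
  end process;

  gen_out : for i in 0 to 3 generate
    d_out(8*i+7 downto 8*i) <= std_logic_vector(s3(i));
  end generate;
end architecture;

Latency is three clocks, with one new input word accepted per clock.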

Incidentally, do you have the project settings? Either .xise file or empty project .zip will work. Just to make sure that I am not using different settings that will give skewed results.
 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Learning FPGAs: wrong approach?
« Reply #203 on: July 02, 2017, 05:42:04 pm »


I am playing with old CPLDs, the XC9500 series; the above PCB is a matrix-keyboard controller. Nothing special, but it makes me appreciate what comes for free with CoolRunner: built-in "pullup" :D

I recycled what I happened to find at home, a few big CPLD chips. Good because they are 5V tolerant, but the constraints don't allow pullup/pulldown since the physical XC9500 hardware doesn't have them.

So that's the reason I added a big-and-long SIL pack to the PCB.
 

Offline Yansi

  • Super Contributor
  • ***
  • Posts: 3893
  • Country: 00
  • STM32, STM8, AVR, 8051
Re: Learning FPGAs: wrong approach?
« Reply #204 on: July 02, 2017, 05:51:15 pm »
I also have a ton of such old devices lying around, including some crazy old FPGAs. But why bother with those non-flash-based devices? Get yourself at least either an Altera MAX II device or a Xilinx XC9500XL. Both are flash based, and the latter is also 5V tolerant. Both are cheap too.  :)
 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Learning FPGAs: wrong approach?
« Reply #205 on: July 02, 2017, 07:51:05 pm »
Yup, I also have a few XC9572 chips in PLCC84 packages as well as a couple of XC2C64A in SMD packages. Maybe I will build a second board.

What I really miss is ... a couple of Spartan-2 FPGA chips. They are 5V tolerant, which on some designs is easier than using a 5V <-> 3.3V level shifter. I have plenty of Spartan-3 and Spartan-6 chips, whose I/O is 3.3V maximum, but the last Spartan-2 chip I had ... was soldered onto a Nintendo ADV adapter (5V I/O), which I built several years ago when Spartan-2 was available everywhere.

My regret ... I didn't buy more chips. :palm: :palm: :palm:
« Last Edit: July 03, 2017, 09:02:16 am by legacy »
 

Online Someone

  • Super Contributor
  • ***
  • Posts: 4531
  • Country: au
    • send complaints here
Re: Learning FPGAs: wrong approach?
« Reply #206 on: July 03, 2017, 12:22:02 am »
Hey, fun exercise! :-+
...
This is using ISE 14.7 and targeting a spartan-6: xc6slx45-2csg324. The design is pipelined, 3 stages.
Fun indeed. I've access to fully licensed tools, so I might have a slight edge here (possibly some extra options/strategies unlocked), but I'm not running SmartXplorer to get the last few % out of the design, and yet there appears to be a lot of slack available from the attempts so far.

ISE 14.7 xc6slx45-2csg324
Minimum area, combinatorial only. 58 LUTs
Logical pipeline of 3 stages. 106 LUTs >440 MHz
Fully pipelined with 4 stages. 118 LUTs >540 MHz
(requires using both edges of clock)

ISE 14.7 xc7a100t-2csg324
Minimum area, combinatorial only. 58 LUTs
Logical pipeline of 3 stages. 132 LUTs >580 MHz
Fully pipelined with 4 stages. 141 LUTs >580 MHz
(both switching limited)

Vivado X.X xc7a35t-2csg324
Minimum area, combinatorial only. 76 LUTs
Logical pipeline of 3 stages. 114 LUTs >380 MHz
Fully pipelined with 4 stages. 147 LUTs >400 MHz

Vivado X.X xc7a100t-2csg324
Minimum area, combinatorial only. 76 LUTs
Logical pipeline of 3 stages. 114 LUTs >380 MHz
Fully pipelined with 4 stages. 147 LUTs >410 MHz

It's known that ISE can do a better synthesis job on many designs, but it's orphaned for device support now and harder to use going forward. 7-series parts, though, are easily 50-100% faster than Spartan-6, so many designs need to be reassessed for the area/speed tradeoff and can be adapted to the new Vivado synthesis at the same time. The results above use a sort algorithm better suited to FPGA implementation but still written as a high-level functional description in VHDL, so it's not necessary to get down to gate-level descriptions; rather, knowing how to map to resources lets you design for minimum area while still using high-level constructs.
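As a rough illustration of what I mean by a high-level functional description that still maps to a regular comparator network, here is a generic odd-even transposition network written with plain loops (my own sketch, not the algorithm or code behind the numbers above; the entity name, generics and ports are all invented):

Code: [Select]
-- Sketch: generic odd-even transposition sorting network, written as a
-- high level combinational description (plain loops over a regular
-- structure). Synthesis unrolls the loops into a fixed comparator network.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity oe_sort is
  generic ( N : positive := 4; WIDTH : positive := 8 );
  port (
    d_in  : in  std_logic_vector(N*WIDTH-1 downto 0);
    d_out : out std_logic_vector(N*WIDTH-1 downto 0)  -- sorted, smallest first
  );
end entity;

architecture rtl of oe_sort is
begin
  process(d_in)
    type arr_t is array (0 to N-1) of unsigned(WIDTH-1 downto 0);
    variable v : arr_t;
    variable t : unsigned(WIDTH-1 downto 0);
  begin
    -- unpack
    for i in 0 to N-1 loop
      v(i) := unsigned(d_in((i+1)*WIDTH-1 downto i*WIDTH));
    end loop;

    -- N rounds of alternating even/odd compare-exchange ranks
    for round in 0 to N-1 loop
      for i in 0 to N/2-1 loop
        if (round mod 2) = 0 then           -- even round: (0,1),(2,3),...
          if 2*i+1 <= N-1 then
            if v(2*i) > v(2*i+1) then
              t := v(2*i);  v(2*i) := v(2*i+1);  v(2*i+1) := t;
            end if;
          end if;
        else                                -- odd round: (1,2),(3,4),...
          if 2*i+2 <= N-1 then
            if v(2*i+1) > v(2*i+2) then
              t := v(2*i+1);  v(2*i+1) := v(2*i+2);  v(2*i+2) := t;
            end if;
          end if;
        end if;
      end loop;
    end loop;

    -- pack
    for i in 0 to N-1 loop
      d_out((i+1)*WIDTH-1 downto i*WIDTH) <= std_logic_vector(v(i));
    end loop;
  end process;
end architecture;

The loops have static bounds, so synthesis unrolls them into a fixed network of compare-exchange elements rather than anything resembling a sequential software sort; registers can then be inserted between ranks when speed matters more than area.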
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Learning FPGAs: wrong approach?
« Reply #207 on: July 03, 2017, 02:06:01 am »
Hey, fun exercise! :-+
...
This is using ISE 14.7 and targeting a spartan-6: xc6slx45-2csg324. The design is pipelined, 3 stages.
Fun indeed, I've access to fully licensed tools so might have a slight edge here (possibly some extra options/strategies unlocked) but I'm not running smart explorer to get the last few % out of the design and yet there appears to be a lot of slack available from the attempts so far.

ISE 14.7 xc6slx45-2csg324
Minimum area, combinatorial only. 58 LUTs
Logical pipeline of 3 stages. 106 LUTs >440 MHz
Fully pipelined with 4 stages. 118 LUTs >540 MHz
(requires using both edges of clock)

ISE 14.7 xc7a100t-2csg324
Minimum area, combinatorial only. 58 LUTs
Logical pipeline of 3 stages. 132 LUTs >580 MHz
Fully pipelined with 4 stages. 141 LUTs >580 MHz
(both switching limited)

Vivado X.X xc7a35t-2csg324
Minimum area, combinatorial only. 76 LUTs
Logical pipeline of 3 stages. 114 LUTs >380 MHz
Fully pipelined with 4 stages. 147 LUTs >400 MHz

Vivado X.X xc7a100t-2csg324
Minimum area, combinatorial only. 76 LUTs
Logical pipeline of 3 stages. 114 LUTs >380 MHz
Fully pipelined with 4 stages. 147 LUTs >410 MHz

It's known that ISE can do a better synthesis job on many designs but its orphaned for device support now and harder to use going forward. But 7 series parts are easily 50-100% faster than Spartan 6 so many designs need to be reassessed for area/speed tradeoff and can be adapted to the new Vivado synthesis at the same time. These results above are using a sort algorithm better suited for FPGA implementation but still written with a high level functional description in VHDL, so its not necessary to get down to gate level descriptions but rather knowing how to map to resources allows you to design for minimum area while still using high level constructs.

Wow - these are quite significant differences. I wonder what ISE knows that Vivado doesn't? 

Maybe they are using different underlying timing models... When compared to ISE, Vivado seems to spend half an age doing nothing when working on small designs. I assume it is dynamically building routing/timing models for the whole die before it places/routes anything.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #208 on: July 03, 2017, 02:55:35 am »
When compared to ISE, Vivado seems to spend half an age doing nothing when working on small designs. I assume it is dynamically building routing/timing models for the whole die before it places/routing anything.

If that was the case, then Vivado would work faster with smaller parts (e.g. the synthesis/implementation for XC7A50T would be faster than for XC7A200T), but this doesn't seem to be the case. I'd rather suspect the usual - poor design, overbloat. Vivado is just generally terribly slow.
 

Online Someone

  • Super Contributor
  • ***
  • Posts: 4531
  • Country: au
    • send complaints here
Re: Learning FPGAs: wrong approach?
« Reply #209 on: July 03, 2017, 03:04:29 am »
Wow - these are quite significant differences. I wonder what ISE knows that Vivado doesn't? 

Maybe they are using different underlying timing models... When compared to ISE, Vivado seems to spend half an age doing nothing when working on small designs. I assume it is dynamically building routing/timing models for the whole die before it places/routing anything.
My understanding is that the move from ISE to Vivado was a radical redesign, importantly so that the tools could keep scaling to larger designs. ISE scales poorly when you use the larger devices at high utilisation, while Vivado can run in much less memory and route at higher utilisation (the iterative routing seems to do very well).
When compared to ISE, Vivado seems to spend half an age doing nothing when working on small designs. I assume it is dynamically building routing/timing models for the whole die before it places/routing anything.
If that was the case, then Vivado would work faster with smaller parts (e.g. the synthesis/implementation for XC7A50T would be faster than for XC7A200T), but this doesn't seem to be the case. I'd rather suspect the usual - poor design, overbloat. Vivado is just generally terribly slow.
Slower, but with less memory, and it's able to close timing on designs that ISE couldn't.
 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Learning FPGAs: wrong approach?
« Reply #210 on: July 03, 2017, 09:19:51 am »
How much memory does Vivado usually eat during the synthesis?

p.s. about computing horsepower, the i9 has already been released by Intel, which means .... the i7 is going to have a price drop  :D :D :D !!!
 

Online Someone

  • Super Contributor
  • ***
  • Posts: 4531
  • Country: au
    • send complaints here
Re: Learning FPGAs: wrong approach?
« Reply #211 on: July 03, 2017, 09:33:56 am »
How much memory does Vivado usually eat during the synthesis?
The peak memory use is typically during routing, and Xilinx only suggests memory for the overall process rather than for each stage, as it would be unusual to run the stages on different machines:
https://www.xilinx.com/products/design-tools/vivado/memory.html
You can hunt down the ISE version with the Wayback Machine, but both tables are a little optimistic and real-world use is higher once you add in the other things running during a build.
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Learning FPGAs: wrong approach?
« Reply #212 on: July 03, 2017, 10:39:35 am »
When compared to ISE, Vivado seems to spend half an age doing nothing when working on small designs. I assume it is dynamically building routing/timing models for the whole die before it places/routing anything.
If that was the case, then Vivado would work faster with smaller parts (e.g. the synthesis/implementation for XC7A50T would be faster than for XC7A200T), but this doesn't seem to be the case. I'd rather suspect the usual - poor design, overbloat. Vivado is just generally terribly slow.
Slower, but with less memory and its able to close timing on designs that ISE couldn't.
When it comes to getting good results out of ISE, whether it can meet timing or not depends a lot on the placer cost table settings. With a poor setting the P&R can run for 24 hours without meeting timing, while with other settings the design goes through the P&R stage in less than 10 minutes and meets all timing constraints. Unfortunately it takes trial & error to find the right placer cost table settings.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Learning FPGAs: wrong approach?
« Reply #213 on: July 03, 2017, 01:42:42 pm »
Slower, but with less memory ...

That's one of the poor decisions. Since the world migrated to 64-bit, you can have huge amounts of memory. Speed, however, hasn't progressed much - my 6-year-old i5 is only 30% slower than the best modern mass-produced Intel CPU. How stupid is it to sacrifice speed in order to reduce memory usage?

I'm sure there were hundreds of bad decisions like that on different levels which made Vivado as slow as it is. It's funny that it's being marketed as Ultra-Fast.

... and its able to close timing on designs that ISE couldn't.

Maybe. I don't know.
 

Online Someone

  • Super Contributor
  • ***
  • Posts: 4531
  • Country: au
    • send complaints here
Re: Learning FPGAs: wrong approach?
« Reply #214 on: July 04, 2017, 01:24:07 am »
When compared to ISE, Vivado seems to spend half an age doing nothing when working on small designs. I assume it is dynamically building routing/timing models for the whole die before it places/routing anything.
If that was the case, then Vivado would work faster with smaller parts (e.g. the synthesis/implementation for XC7A50T would be faster than for XC7A200T), but this doesn't seem to be the case. I'd rather suspect the usual - poor design, overbloat. Vivado is just generally terribly slow.
Slower, but with less memory and its able to close timing on designs that ISE couldn't.
When it comes to ISE getting good results it depends a lot on the placing cost tables settings whether it can meet the timing or not. With a poor setting the P&R can run for 24 hours without meeting timing while with others settings the design goes through the P&R stage is less than 10 minutes and meet all timing constraints. Unfortunately it takes trial & error to get the right placing cost table settings.
That's not unique to ISE; Vivado suffers the same wildly variable results from the initial seeds.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Learning FPGAs: wrong approach?
« Reply #215 on: July 04, 2017, 02:07:00 am »
When it comes to ISE getting good results it depends a lot on the placing cost tables settings whether it can meet the timing or not. With a poor setting the P&R can run for 24 hours without meeting timing while with others settings the design goes through the P&R stage is less than 10 minutes and meet all timing constraints. Unfortunately it takes trial & error to get the right placing cost table settings.
Thats not unique to ISE, Vivado suffers the same wildly variable results from the initial seeds.
It is most likely unavoidable - you have to add enough randomness to prevent the P&R process from falling into the same local minima all the time (a.k.a. "getting stuck in a rut"). How often this happens is most likely design dependent, and something to do with the bisection width of the design. Highly connected designs will be more likely to suffer bad placement decisions, but flowing pipelines will usually play nice.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Online Someone

  • Super Contributor
  • ***
  • Posts: 4531
  • Country: au
    • send complaints here
Re: Learning FPGAs: wrong approach?
« Reply #216 on: July 04, 2017, 03:25:22 am »
When it comes to ISE getting good results it depends a lot on the placing cost tables settings whether it can meet the timing or not. With a poor setting the P&R can run for 24 hours without meeting timing while with others settings the design goes through the P&R stage is less than 10 minutes and meet all timing constraints. Unfortunately it takes trial & error to get the right placing cost table settings.
Thats not unique to ISE, Vivado suffers the same wildly variable results from the initial seeds.
It is most likely unavoidable - you have to add enough randomness prevent the P+R process from falling into the same local minima all the time (a.k.a. "getting stuck in a rut"). How often this happens is most likely design dependant, and something to do with the bisection width of the design. Highly connected designs will be more likely to suffer bad placement decisions, but flowing pipelines will usually play nice.
I find it easier with Vivado, as there is a diverse group of directives ("strategies") which can be applied individually (usually iteratively) at each stage: much more flexibility, some feeling of control, and less reliance on the initial seed being lucky.
 

Offline Cerebus

  • Super Contributor
  • ***
  • Posts: 10576
  • Country: gb
Re: Learning FPGAs: wrong approach?
« Reply #217 on: July 04, 2017, 01:23:44 pm »
It is most likely unavoidable - you have to add enough randomness prevent the P+R process from falling into the same local minima all the time (a.k.a. "getting stuck in a rut"). How often this happens is most likely design dependant, and something to do with the bisection width of the design. Highly connected designs will be more likely to suffer bad placement decisions, but flowing pipelines will usually play nice.

That's a direct consequence of the underlying graph layout algorithms (graph as in vertices and edges, not squiggly lines on paper). I saw exactly the same phenomenon some years back when I was working on a network management tool that tried to draw a decent network diagram from the connectivity graph of the network. It was surprising how big a change in layout one would see from little tweaks to weightings and other parameters.
Anybody got a syringe I can use to squeeze the magic smoke back into this?
 

Offline mrflibble

  • Super Contributor
  • ***
  • Posts: 2051
  • Country: nl
Re: Learning FPGAs: wrong approach?
« Reply #218 on: January 31, 2018, 01:51:13 pm »
Fun indeed, I've access to fully licensed tools so might have a slight edge here (possibly some extra options/strategies unlocked) but I'm not running smart explorer to get the last few % out of the design and yet there appears to be a lot of slack available from the attempts so far.

ISE 14.7 xc6slx45-2csg324
Minimum area, combinatorial only. 58 LUTs
Logical pipeline of 3 stages. 106 LUTs >440 MHz
Fully pipelined with 4 stages. 118 LUTs >540 MHz
(requires using both edges of clock)

ISE 14.7 xc7a100t-2csg324
Minimum area, combinatorial only. 58 LUTs
Logical pipeline of 3 stages. 132 LUTs >580 MHz
Fully pipelined with 4 stages. 141 LUTs >580 MHz
(both switching limited)

Vivado X.X xc7a35t-2csg324
Minimum area, combinatorial only. 76 LUTs
Logical pipeline of 3 stages. 114 LUTs >380 MHz
Fully pipelined with 4 stages. 147 LUTs >400 MHz

Vivado X.X xc7a100t-2csg324
Minimum area, combinatorial only. 76 LUTs
Logical pipeline of 3 stages. 114 LUTs >380 MHz
Fully pipelined with 4 stages. 147 LUTs >410 MHz

Wow, that's pretty damn impressive. Especially the ISE result is ... well, fast!

Quote from: Someone
It's known that ISE can do a better synthesis job on many designs but its orphaned for device support now and harder to use going forward. But 7 series parts are easily 50-100% faster than Spartan 6 so many designs need to be reassessed for area/speed tradeoff and can be adapted to the new Vivado synthesis at the same time. These results above are using a sort algorithm better suited for FPGA implementation but still written with a high level functional description in VHDL, so its not necessary to get down to gate level descriptions but rather knowing how to map to resources allows you to design for minimum area while still using high level constructs.

Does this sort algorithm have a name? I'd guess maybe a bitonic sort network, but even then >540 MHz on a Spartan-6 is a neat result. :) A somewhat related question: do you know of any good books or other reference material where one can go and read up on the various parallel algorithms? Specifically with an eye to FPGA implementation, but if there's a good compendium of handy circuits for VLSI then that's certainly better than what I have now. I find that, as with any problem really, a large part of the job is "pick the right data structure and algorithm / representation and possible operators". I don't mind reinventing the wheel every now and then, provided the payoff is some extra insight that can be used in future projects. But every once in a while it would be nice just to be able to browse the catalog as it were, read up on several ways to get the computation of the day done, and then pick one. Then "all" you have to do is not fsck up the implementation. Which can be enough of a challenge already. Especially without coffee.
 

Online Someone

  • Super Contributor
  • ***
  • Posts: 4531
  • Country: au
    • send complaints here
Re: Learning FPGAs: wrong approach?
« Reply #219 on: February 01, 2018, 07:36:01 am »
Does this sort algorithm have a name? I'd guess maybe a bitonic sort network, but even then 580 MHz on a spartan-6 is neat result. :) A somewhat related question, do you know of any good books or other forms of reference material where one can go and read up on the various parallel algorithms? Specifically with an eye to fpga implementation, but if there's a good compendium of handy ciruits for VLSI then that's certainly better than what I have now.
Even working to a specific sort algorithm, it still takes a lot of experience to map it efficiently to primitives, and several networks can achieve the same result:
https://en.wikipedia.org/wiki/Bitonic_sorter
https://en.wikipedia.org/wiki/Batcher_odd–even_mergesort
https://en.wikipedia.org/wiki/Pairwise_sorting_network
(https://en.wikipedia.org/wiki/Sorting_network)
any one might be optimal for the particular network/data size or platform. For algorithm design there aren't canned examples like there are with analog circuits, as the assumptions/constraints that can be used to optimise a given problem are tightly intertwined with the implementation; it's always good to spend some time looking at possible ways to solve the problem before committing too much effort to any single one.
 

Offline mrflibble

  • Super Contributor
  • ***
  • Posts: 2051
  • Country: nl
Re: Learning FPGAs: wrong approach?
« Reply #220 on: February 01, 2018, 08:15:24 pm »
For algorithm design there aren't canned examples like with analog circuits as assumptions/constraints which can be used to optimise any given problem are tightly intertwined with the implementation, its always good to spend some time looking at possible ways to solve the problem before committing too much effort into any single one.
Oh, I don't expect any canned examples. Besides, where would be the fun in that? Fully agreed on spending some time on multiple different ways to solve it. I guess the point I was trying to make is that you can only spend time on those multiple different ways if you actually know they exist. Basically I would already be happy with a dictionary of algorithms usable on programmable logic, each with a one-line description. At least then I'd have a term I can google and hunt for papers to read. Right now it's a case of you don't know what you don't know... For example, a fat tree encoder is damn handy, but I don't think I'd ever have come across that on the software side of algorithms. And the hardware side is definitely less accessible. Well, that or I need glasses + a google refresher course or something...
 

Offline mrflibble

  • Super Contributor
  • ***
  • Posts: 2051
  • Country: nl
Re: Learning FPGAs: wrong approach?
« Reply #221 on: February 04, 2018, 04:26:51 pm »
While working out the logic bits for another project, I just realized that I totally missed something with the sorting circuit. :palm: At the time I was feeling all clever and stuff, because I had just optimized the way of doing a comparison. Before that I had actually described the hardware as per the hardware-description-language mantra, so (a < b). Not totally unexpected, that gave crap timing. So I worked out what a comparison actually is in arithmetic and implemented that. Yup, definitely better. Hence the feeling all clever and stuff. Only to realize now that while yes, that may have been better, I could have done in 1 slice what I did there using 2 whole frigging slices. Doh! Well great, now I have to try that as well. Curse you, curiosity!
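For anyone wondering what "what a comparison actually is in arithmetic" means: an unsigned a < b is just the borrow out of a - b, which the tools can drop straight onto the carry chain. Something along these lines (an illustrative sketch, not the exact circuit I ended up with; the names are made up):

Code: [Select]
-- "A comparison is just a subtraction": for unsigned values, a < b exactly
-- when the width-extended subtraction a - b goes negative (i.e. borrows).
-- On most FPGAs this maps onto the dedicated carry chain.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity lt_carry is
  generic ( WIDTH : positive := 8 );
  port (
    a, b   : in  std_logic_vector(WIDTH-1 downto 0);
    a_lt_b : out std_logic
  );
end entity;

architecture rtl of lt_carry is
  signal diff : unsigned(WIDTH downto 0);
begin
  -- one extra bit catches the borrow
  diff   <= ('0' & unsigned(a)) - ('0' & unsigned(b));
  a_lt_b <= diff(WIDTH);  -- borrow set => a < b
end architecture;

Whether that actually beats a plain (a < b) depends on what the synthesizer was already doing with it - hence the slice-count surprise.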

Incidentally, I was just checking the timing report of the old circuit. What kind of clock uncertainty to use? Would be good to compare apples with apples. Or just reduce every inconvenience to zero and see how high the numbers get. Because if benchmarks have taught me anything, it is that higher numbers moar better.
 

