Author Topic: How much "Higher level" verilog is used in industry? (Adder example)  (Read 3710 times)


Offline Dmeads (Topic starter)

  • Regular Contributor
  • *
  • Posts: 164
  • Country: us
  • who needs deep learning when you have 555 timers
Okay, so I consider myself a newbie in FPGAs.

Today I built an 8-bit adder using full adders and one half adder.

All the parts (even the XOR gates needed) I built from the Verilog gate primitives and, or, and not.

It was really cool to see everything work amazingly, but then I realized I could replace my 178 lines of code with:

assign sum = inputA + inputB;  // "output" is a reserved word in Verilog, so the result needs another name

This is way less fun to build, but does the exact same in the simulation and is WAYYYYY faster.
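
For readers who want to see the two styles side by side, here is a minimal sketch. The module and signal names are made up for illustration, and bit 0 uses a full adder with its carry-in tied low rather than the half adder described above, so this is not Dom's actual 178-line design.

```verilog
// Structural full adder built only from the and/or/not gate primitives.
module full_adder (
    input  wire a,
    input  wire b,
    input  wire cin,
    output wire sum,
    output wire cout
);
    wire na, nb, a_nb, na_b, axb;   // pieces of a XOR b
    wire naxb, ncin, x_nc, nx_c;    // pieces of (a XOR b) XOR cin
    wire ab, xc;                    // carry terms

    // axb = a XOR b
    not (na, a);
    not (nb, b);
    and (a_nb, a, nb);
    and (na_b, na, b);
    or  (axb, a_nb, na_b);

    // sum = axb XOR cin
    not (naxb, axb);
    not (ncin, cin);
    and (x_nc, axb, ncin);
    and (nx_c, naxb, cin);
    or  (sum, x_nc, nx_c);

    // cout = (a AND b) OR (axb AND cin)
    and (ab, a, b);
    and (xc, axb, cin);
    or  (cout, ab, xc);
endmodule

// Structural 8-bit ripple-carry adder: a chain of eight full adders.
module adder8_structural (
    input  wire [7:0] a,
    input  wire [7:0] b,
    output wire [7:0] sum,
    output wire       cout
);
    wire [8:0] c;                   // carry chain, c[0] is the carry-in
    assign c[0] = 1'b0;
    assign cout = c[8];

    genvar i;
    generate
        for (i = 0; i < 8; i = i + 1) begin : g_fa
            full_adder fa (.a(a[i]), .b(b[i]), .cin(c[i]),
                           .sum(sum[i]), .cout(c[i+1]));
        end
    endgenerate
endmodule

// Behavioural equivalent: the 9-bit left-hand side keeps the carry.
module adder8_behavioural (
    input  wire [7:0] a,
    input  wire [7:0] b,
    output wire [7:0] sum,
    output wire       cout
);
    assign {cout, sum} = a + b;
endmodule
```

Both adders should produce identical results in simulation; how each one maps to hardware depends on the synthesis tool and the target device.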

My question is this:

Let's say a company is prototyping an ALU on an FPGA.

Would they do what I did and spend a lot of time on the structural design? Or would they simply use the higher level code for addition?

I'm sure there are advantages to both, but could someone tell me what they are please?

Thanks.

-Dom
 

Online hamster_nz

  • Super Contributor
  • ***
  • Posts: 2812
  • Country: nz
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #1 on: December 29, 2019, 04:46:47 am »
Have a look at the OpenSPARC CPU source and see what a formerly closed source high end CPU looks like...

https://opencores.org/projects/sparc64soc
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 
The following users thanked this post: Dmeads

Online ataradov

  • Super Contributor
  • ***
  • Posts: 11694
  • Country: us
    • Personal site
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #2 on: December 29, 2019, 04:52:42 am »
High level code as much as possible. You will die of old age trying to assemble an x86 from NAND gates.

But you obviously follow some basic rules to help the tool out. Don't just type random stuff like it is Python. There are certain constructs the tools understand better than others.

And things like memories are instantiated manually from the fab library.
« Last Edit: December 29, 2019, 05:08:11 am by ataradov »
Alex
 
The following users thanked this post: Dmeads

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9929
  • Country: us
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #3 on: December 29, 2019, 06:06:37 am »
How do the 2 versions deal with signed, unsigned, carry in, carry out and overflow?  Those little details are a really big deal!
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8027
  • Country: ca
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #4 on: December 29, 2019, 08:28:39 am »
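Quote
How do the 2 versions deal with signed, unsigned, carry in, carry out and overflow?  Those little details are a really big deal!

Carry in (sum needs to be declared two bits wider than the operands so the carry and borrow bits are kept):
assign sum = in_a + in_b + carry_in - borrow_in;

Registered carry out, example 1:
carry_out  <= (in_a + in_b) >= (2**adder_bits);
borrow_out <= (in_a + in_b) < 0;  // only meaningful with signed operands; an unsigned expression is never < 0
Example 2:
carry_out  <= sum[adder_bits];
borrow_out <= sum[adder_bits+1];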
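
The "example 2" idea of reading the carry from an extra sum bit can be written as a small self-contained module. This is only a sketch for the unsigned, add-only case; the module name `add_carry` and the parameter `ADDER_BITS` are assumptions for illustration, not names from the post.

```verilog
// Unsigned adder with carry-in and carry-out.
module add_carry #(
    parameter ADDER_BITS = 8
) (
    input  wire [ADDER_BITS-1:0] in_a,
    input  wire [ADDER_BITS-1:0] in_b,
    input  wire                  carry_in,
    output wire [ADDER_BITS-1:0] sum,
    output wire                  carry_out
);
    // The left-hand side is one bit wider than the operands, so the
    // whole expression is evaluated at ADDER_BITS+1 bits and the
    // carry is not truncated.
    wire [ADDER_BITS:0] wide_sum = in_a + in_b + carry_in;

    assign sum       = wide_sum[ADDER_BITS-1:0];
    assign carry_out = wide_sum[ADDER_BITS];
endmodule
```

The key detail is the width of the assignment target: if `wide_sum` were only `ADDER_BITS` wide, the carry would be silently dropped.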

DSP overflow/range limiting, used for example in color space converters with programmable contrast & brightness:

if      (formula[bits+1]) result <= 0;              // formula returned a negative number, so set the result to 0
else if (formula[bits+0]) result <= (2**bits) - 1;  // formula returned a number too large, so set the result to the highest number
else                      result <= formula;        // formula is between 0 and the highest possible value, so pass it through

Or you can declare things as 'wire signed', 'reg signed' or 'integer signed'.  'unsigned' is also useful for deliberate 2's complement adaptations.
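
As a rough illustration of the signed route, the sketch below adds two signed values, flags overflow, and clamps the result instead of letting it wrap. The module name `sat_add_signed` and its ports are hypothetical, and it assumes `BITS` is at least 2.

```verilog
// Signed adder with overflow flag and saturation (assumes BITS >= 2).
module sat_add_signed #(
    parameter BITS = 8
) (
    input  wire signed [BITS-1:0] in_a,
    input  wire signed [BITS-1:0] in_b,
    output wire signed [BITS-1:0] result,
    output wire                   overflow
);
    // Both operands are signed, so they are sign-extended to BITS+1
    // bits before the addition and the true sum always fits.
    wire signed [BITS:0] wide_sum = in_a + in_b;

    // The true sum fits in BITS bits only if the two top bits agree.
    assign overflow = wide_sum[BITS] ^ wide_sum[BITS-1];

    // Most positive and most negative representable values.
    wire signed [BITS-1:0] max_pos = {1'b0, {(BITS-1){1'b1}}};
    wire signed [BITS-1:0] max_neg = {1'b1, {(BITS-1){1'b0}}};

    // wide_sum[BITS] is the sign of the true sum: clamp towards it.
    assign result = !overflow      ? wide_sum[BITS-1:0] :
                    wide_sum[BITS] ? max_neg            : max_pos;
endmodule
```

Note that mixing one unsigned operand into a signed expression makes the whole expression unsigned in Verilog, which is a common source of surprises with this style.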
« Last Edit: December 29, 2019, 09:10:08 am by BrianHG »
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8027
  • Country: ca
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #5 on: December 29, 2019, 08:42:24 am »
Here is a home made 32-bit ALU for a processor with an 8-bit data bus:
(you may replace all the 'if()' with a case statement)
Code:
module alu_32bit (out, clk, func, in);

output [7:0] out;
input clk;
input [4:0] func;
input [7:0] in;

reg signed [31:0] a_reg;
reg signed [31:0] b_reg;
reg [7:0] out;

wire signed [63:0] mult_out;
wire signed [31:0] div_out;
wire signed [31:0] add_out;
wire signed [31:0] sub_out;

assign mult_out = a_reg * b_reg;
assign add_out  = a_reg + b_reg;
assign sub_out  = a_reg - b_reg;


//***************************************************************************
//***************************************************************************
//***************************************************************************
wire [31:0] sub_wire1;

lpm_divide lpm_divide_component (
.denom (b_reg),
.clock (clk),
.numer (a_reg),
.quotient (div_out),
.remain (sub_wire1),
.aclr (1'b0),
.clken (1'b1));
defparam
lpm_divide_component.lpm_drepresentation = "SIGNED",
lpm_divide_component.lpm_hint = "MAXIMIZE_SPEED=0,LPM_REMAINDERPOSITIVE=TRUE",
lpm_divide_component.lpm_nrepresentation = "SIGNED",
lpm_divide_component.lpm_pipeline = 13,
lpm_divide_component.lpm_type = "LPM_DIVIDE",
lpm_divide_component.lpm_widthd = 32,
lpm_divide_component.lpm_widthn = 32;
//***************************************************************************
//***************************************************************************
//***************************************************************************

always@(posedge clk) begin

if (func == 'h10 ) begin
a_reg[31:24] <= in[7:0];

end else if (func == 'h11 ) begin
a_reg[23:16] <= in[7:0];

end else if (func == 'h12 ) begin
a_reg[15:8]  <= in[7:0];

end else if (func == 'h13 ) begin
a_reg[7:0]   <= in[7:0];

end else if (func == 'h14 ) begin
b_reg[31:24] <= in[7:0];

end else if (func == 'h15 ) begin
b_reg[23:16]  <= in[7:0];

end else if (func == 'h16 ) begin
b_reg[15:8] <= in[7:0];

end else if (func == 'h17 ) begin
b_reg[7:0]  <= in[7:0];

end else if (func == 'h18 ) begin // A = a_reg * b_reg
a_reg[31:0]  <= mult_out[31:0];

end else if (func == 'h19 ) begin // A = a_reg / b_reg
a_reg[31:0]  <= div_out[31:0];

end else if (func == 'h1A ) begin // A = a_reg + b_reg
a_reg[31:0]  <= add_out[31:0];

end else if (func == 'h1B ) begin // A = a_reg - b_reg
a_reg[31:0]  <= sub_out[31:0];


end else if (func == 'h1C ) begin // Copy a-reg to b-reg
b_reg[31:0]  <= a_reg[31:0];

end else if (func == 'h1D ) begin // Copy b-reg to a-reg
a_reg[31:0]  <= b_reg[31:0];

end else if (func == 'h1E ) begin // Swap b-reg to a-reg
a_reg[31:0]  <= b_reg[31:0];
b_reg[31:0]  <= a_reg[31:0];

end else if (func == 'h1F ) begin // compare a_reg to b_reg
out[0] <= (a_reg == 0);     // Zero flag
out[1] <= (a_reg == b_reg); // Equal
out[2] <= (a_reg < b_reg); // a < b
out[3] <= (a_reg > b_reg); // a > b
end


if (func == 'h08 ) begin
out[7:0] <= a_reg[31:24];

end else if (func == 'h09 ) begin
out[7:0] <= a_reg[23:16];

end else if (func == 'h0A ) begin
out[7:0] <= a_reg[15:8];

end else if (func == 'h0B ) begin
out[7:0] <= a_reg[7:0];

end else if (func == 'h0C ) begin
out[7:0] <= b_reg[31:24];

end else if (func == 'h0D ) begin
out[7:0] <= b_reg[23:16];

end else if (func == 'h0E ) begin
out[7:0] <= b_reg[15:8];

end else if (func == 'h0F ) begin
out[7:0] <= b_reg[7:0];
end

end // always
endmodule
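
The suggested case-statement form might look like the sketch below. It is a cut-down illustration rather than a drop-in replacement: the multiply and `lpm_divide` paths and the copy operations are left out, the module name `alu_case_sketch` is made up, and the two `if` chains are merged into a single `case` because the function codes do not overlap.

```verilog
// Cut-down version of the ALU above using one case statement.
module alu_case_sketch (
    input  wire       clk,
    input  wire [4:0] func,
    input  wire [7:0] in,
    output reg  [7:0] out
);
    reg signed [31:0] a_reg;
    reg signed [31:0] b_reg;

    always @(posedge clk) begin
        case (func)
            // Read back a_reg / b_reg one byte at a time
            5'h08: out <= a_reg[31:24];
            5'h09: out <= a_reg[23:16];
            5'h0A: out <= a_reg[15:8];
            5'h0B: out <= a_reg[7:0];
            5'h0C: out <= b_reg[31:24];
            5'h0D: out <= b_reg[23:16];
            5'h0E: out <= b_reg[15:8];
            5'h0F: out <= b_reg[7:0];

            // Load a_reg / b_reg one byte at a time
            5'h10: a_reg[31:24] <= in;
            5'h11: a_reg[23:16] <= in;
            5'h12: a_reg[15:8]  <= in;
            5'h13: a_reg[7:0]   <= in;
            5'h14: b_reg[31:24] <= in;
            5'h15: b_reg[23:16] <= in;
            5'h16: b_reg[15:8]  <= in;
            5'h17: b_reg[7:0]   <= in;

            // Arithmetic
            5'h1A: a_reg <= a_reg + b_reg;
            5'h1B: a_reg <= a_reg - b_reg;

            // Swap: non-blocking assignments read the old values
            5'h1E: begin
                a_reg <= b_reg;
                b_reg <= a_reg;
            end

            // Compare flags
            5'h1F: begin
                out[0] <= (a_reg == 0);
                out[1] <= (a_reg == b_reg);
                out[2] <= (a_reg <  b_reg);
                out[3] <= (a_reg >  b_reg);
            end

            default: ;  // every other code leaves the registers unchanged
        endcase
    end
endmodule
```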
 
The following users thanked this post: Dmeads

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 20357
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #6 on: December 29, 2019, 09:53:36 am »
Many FPGAs have both
  • "low level" LUTs/cells used to implement primitive and/or/not and flip-flop functions
  • "high level" blocks such as adders and multipliers, and many other functions; these are faster and smaller than the equivalent LUT functions

It is unlikely that an FPGA synthesiser would convert a whole blob of explicit "low level" "structural" LUTs into the equivalent "high level" "behavioural" block.

The "high level" blocks can be and are instantiated in both structural models and behavioural models, as appropriate and convenient.
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 
The following users thanked this post: Dmeads

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3238
  • Country: ca
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #7 on: December 29, 2019, 02:52:15 pm »
FPGA's building blocks are not gates, but LUTs. Therefore, your primitives do not map well to real FPGA hardware, hence the failure.

FPGAs usually have something special to implement addition. For example, Xilinx's logic cells have built-in carry chain logic. If you use addition in Verilog, that's what you are going to get (in most cases anyway). It has built-in carry-in and carry-out wires. You can instantiate it directly. In most cases, it'll be the same as if you used Verilog addition. So, there's no reason not to use Verilog addition, unless you want to make things faster.

If you want to make things faster, you ought to know the underlying hardware and work with it manually. In an FPGA, there's no access to individual gates, so that wouldn't be something you want to use. Instead you need to study the structure of the FPGA and find hardware-specific methods. It might be hard to beat the FPGA's built-in addition methods, but usually you can do something at the expense of using much more logic. For example, you can split your number in halves and calculate the carry for each half independently. Say, for a 32-bit adder you use two 16-bit halves. Of course, you have to feed the carry from the bottom half to the upper half. Therefore, to parallelize the process, you'd need two adders for the upper half - one for '0' carry in and another for '1' carry in. By the time the real carry is ready, both adders are done and you can use the real carry to mux out the correct result. This will be faster, but it is more work, and uses massively more logic than a simple adder.
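
The scheme described here is commonly known as a carry-select adder. A minimal behavioural sketch for the 32-bit case with two 16-bit halves could look like this; the module and signal names are assumptions for illustration.

```verilog
// 32-bit carry-select adder built from 16-bit halves.
module carry_select_add32 (
    input  wire [31:0] a,
    input  wire [31:0] b,
    input  wire        cin,
    output wire [31:0] sum,
    output wire        cout
);
    // Lower half: 17-bit result, bit 16 is the real carry into the upper half.
    wire [16:0] lo  = a[15:0] + b[15:0] + cin;

    // Upper half computed twice in parallel, once for each possible carry.
    wire [16:0] hi0 = a[31:16] + b[31:16];
    wire [16:0] hi1 = a[31:16] + b[31:16] + 1'b1;

    // Once the real carry is known, pick the matching upper result.
    wire [16:0] hi  = lo[16] ? hi1 : hi0;

    assign sum  = {hi[15:0], lo[15:0]};
    assign cout = hi[16];
endmodule
```

Whether this actually beats a plain `a + b` on a given FPGA is something to measure; a synthesis tool is free to restructure the logic, and the dedicated carry chain is often hard to outrun.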

I would be surprised if I found out that the adders in the ALU of modern Intel processors are written as a Verilog addition, but who knows.
 
The following users thanked this post: Dmeads

Online asmi

  • Super Contributor
  • ***
  • Posts: 2778
  • Country: ca
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #8 on: December 29, 2019, 04:51:59 pm »
On Xilinx 7 series you can make a super-fast 48-bit adder (48 + 48 bits) using a single DSP tile. AFAIK you can push it into the 400+ MHz range if you pipeline it properly. I seriously doubt the synthesizer will infer it from the addition operator. This kind of reinforces my point that you need to utilize chip-specific hardware if you want to extract maximum performance.
But in my experience adders are almost never on a critical path; if anything, bit shifters are on a critical path inside an ALU a lot more often. Case in point: a simple HDL addition operator produces a 64-bit adder which can go above 200 MHz on Artix speed grade -2 (SG2) fabric, while I was unable to make a single-cycle 64-bit bit shifter which can run at 200+ MHz.
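
If an extra clock of latency is acceptable, one common way around a slow single-cycle shifter is to split the shift into a coarse and a fine stage with a register in between. The sketch below is an assumption about how that could be done, not asmi's design, and no timing figures are claimed for it.

```verilog
// 64-bit logical left shift split into two registered stages.
// Latency: two clocks from din/amt to dout.
module shl64_2stage (
    input  wire        clk,
    input  wire [63:0] din,
    input  wire [5:0]  amt,
    output reg  [63:0] dout
);
    reg [63:0] stage1;   // result of the coarse shift
    reg [2:0]  amt_lo;   // low shift bits, delayed to line up with stage1

    always @(posedge clk) begin
        // Stage 1: coarse shift by 0, 8, 16, ... 56 bits
        stage1 <= din << {amt[5:3], 3'b000};
        amt_lo <= amt[2:0];

        // Stage 2: fine shift by 0..7 bits
        dout   <= stage1 << amt_lo;
    end
endmodule
```

This trades latency for clock speed, so it only helps when the surrounding ALU can tolerate a multi-cycle shift.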
 
The following users thanked this post: Dmeads

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 15185
  • Country: fr
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #9 on: December 29, 2019, 05:03:13 pm »
Quote
On Xilinx 7 series you can make a super-fast 48-bit adder (48 + 48 bits) using a single DSP tile. AFAIK you can push it into the 400+ MHz range if you pipeline it properly. I seriously doubt the synthesizer will infer it from the addition operator. This kind of reinforces my point that you need to utilize chip-specific hardware if you want to extract maximum performance.

I think in most cases, synthesis will not infer anything pipelined by itself as automatic pipelining of hdl could get very complicated fast, and often impossible due to your code not allowing for the extra latency required for a pipeline (so automatically pipelining a typically non-pipelined code would lead to incorrect translation).

It certainly infers non-pipelined versions of resources such as adders and multipliers, though, when they are available. And surely this won't get you the top performance as it's not pipelined.

HDL-wise, I don't know what kind of new language construct could allow expressing a pipelined adder (for instance) with just the '+' operator. It would certainly require additional constructs.
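
With plain Verilog the pipeline has to be spelled out by hand. As a sketch of what that means for an adder, the hypothetical module below splits a 64-bit addition over two clock cycles, carrying the low-half carry across in a register.

```verilog
// 64-bit adder pipelined by hand into two stages (latency: two clocks).
module add64_pipelined (
    input  wire        clk,
    input  wire [63:0] a,
    input  wire [63:0] b,
    output reg  [63:0] sum
);
    reg [32:0] lo_q;     // low-half sum, bit 32 is its carry out
    reg [31:0] a_hi_q;   // high halves delayed by one clock
    reg [31:0] b_hi_q;

    always @(posedge clk) begin
        // Stage 1: add the low halves, delay the high halves
        lo_q   <= a[31:0] + b[31:0];
        a_hi_q <= a[63:32];
        b_hi_q <= b[63:32];

        // Stage 2: add the high halves plus the registered carry
        sum    <= {a_hi_q + b_hi_q + lo_q[32], lo_q[31:0]};
    end
endmodule
```

The extra latency is visible to the rest of the design, which is exactly why a tool cannot add it behind a bare `+` without changing behaviour.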

I don't know if there's any higher-level HDL that allows some kind of automatic pipelining (which would still not guarantee that internal pipelined resources of FPGAs would get used, as most if not all higher-level HDLs translate to Verilog or VHDL anyway...)
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 15185
  • Country: fr
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #10 on: December 29, 2019, 05:06:54 pm »
Quote
I would be surprised if I found out that the adders in the ALU of modern Intel processors are written as a Verilog addition, but who knows.

I doubt it. They are most likely heavily pipelined,  and they'd run into the problem I just stated in the above post, how to express pipelined ADDs with a simple '+' operator...

Actually, I wouldn't be too surprised if the very basic blocks like adders were hand-optimized and not actually written in HDL at all...
 

Online asmi

  • Super Contributor
  • ***
  • Posts: 2778
  • Country: ca
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #11 on: December 29, 2019, 05:11:23 pm »
Quote
I don't know if there's any higher-level HDL that allows some kind of automatic pipelining (which would still not guarantee that internal pipelined resources of FPGAs would get used, as most if not all higher-level HDLs translate to Verilog or VHDL anyway...)

I know for a fact that the Vivado synthesizer will "absorb" registers (== pipeline) into the DSP tile when implementing HDL's multiply operator:
Code:
always @(posedge clk) begin
    arg0     <= input_1;      // input_1 / input_2 stand for your two operands
    arg1     <= input_2;
    res_pipe <= arg0 * arg1;
    result   <= res_pipe;
end
here arg0 and arg1 will be absorbed as input pipe registers (B and D) of DSP tile, res_pipe - as pipeline register M, and result as output pipeline register P of a DSP tile.

For any random logic you will need to enable retiming in order for Vivado to optimize timing by moving pipeline registers around while preserving functionality.
« Last Edit: December 29, 2019, 05:17:12 pm by asmi »
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 15185
  • Country: fr
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #12 on: December 29, 2019, 05:29:52 pm »
Quote
I don't know if there's any higher-level HDL that allows some kind of automatic pipelining (which would still not guarantee that internal pipelined resources of FPGAs would get used, as most if not all higher-level HDLs translate to Verilog or VHDL anyway...)
I know for a fact that the Vivado synthesizer will "absorb" registers (== pipeline) into the DSP tile when implementing HDL's multiply operator:

Didn't know Vivado was this "smart", I wonder whether all other tools are able to do this (such as ISE, or Quartus, Lattice stuff...)

So OK, in this case you're giving the synthesizer hints about the pipeline by using a chain of registers. I suppose if you want to express a deeper pipeline for some operation, you'll add one more register per stage. That's a nice feature if it's inferred properly, but at least yes the result will be correct as you actually expressed the pipeline structure in your code.

For a 100% guarantee of the actual implementation, I would probably still tend to use the IP version of the operator directly if it has pipelined variants... so making sure the code won't actually get synthesized as a non-pipelined ADD with just extra registering... would certainly make me feel more comfortable.
 

Online asmi

  • Super Contributor
  • ***
  • Posts: 2778
  • Country: ca
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #13 on: December 29, 2019, 06:08:46 pm »
Quote
Didn't know Vivado was this "smart", I wonder whether all other tools are able to do this (such as ISE, or Quartus, Lattice stuff...)

So OK, in this case you're giving the synthesizer hints about the pipeline by using a chain of registers. I suppose if you want to express a deeper pipeline for some operation, you'll add one more register per stage. That's a nice feature if it's inferred properly, but at least yes the result will be correct as you actually expressed the pipeline structure in your code.

I think the concept of retiming exists in many synthesizers, but I don't have any personal experience with others.
Also Vivado does similar things with other HW blocks - like BRAM for example, which also has output pipeline register to help with clock-to-out delays.

Quote
For a 100% guarantee of the actual implementation, I would probably still tend to use the IP version of the operator directly if it has pipelined variants... so making sure the code won't actually get synthesized as a non-pipelined ADD with just extra registering... would certainly make me feel more comfortable.

Just for fun I just wrote a simple test:
Code:
module top #(
    parameter WIDTH = 17
) (
    input clk,
    input [WIDTH-1:0] arg0,
    input [WIDTH-1:0] arg1,
    output logic [2*WIDTH-1:0] result
);

bit [WIDTH-1:0] a0, a1;

bit [2*WIDTH-1:0] res_pipe[0:1], res;
   
always_ff @(posedge clk) begin
    a0 <= arg0;
    a1 <= arg1;
   
    res_pipe[0] <= a0 * a1;
    res_pipe[1] <= res_pipe[0];
   
    res <= res_pipe[1];
   
    result <= res;
end
endmodule
When implemented on an a100t-fgg484-2, it runs at Fmax for the DSP tile (550.661 MHz) with no problems. All DSP tile pipeline registers are utilized.
 
The following users thanked this post: SiliconWizard

Online ataradov

  • Super Contributor
  • ***
  • Posts: 11694
  • Country: us
    • Personal site
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #14 on: December 29, 2019, 06:58:48 pm »
FPGA tools will infer things pretty well if they are described in a certain way. Unfortunately sometimes this differs between the vendors. There is no way to describe a dual-port block RAM for Xilinx and Altera using the same Verilog code, for example.

That's why there are pretty massive coding guides, and Xilinx had the template library right in the ISE.

I feel like some of those optimizations are literally pattern matching on the code.
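
To give an idea of what such a template looks like, below is a sketch of a simple dual-port RAM (one write port, one read port, one clock). This is a simpler case than the true dual-port RAM mentioned above, and whether a particular tool maps it to block RAM should be checked against that vendor's coding guide. The module and parameter names are assumptions.

```verilog
// Simple dual-port RAM: one write port, one registered read port.
module sdp_ram #(
    parameter DATA_W = 8,
    parameter ADDR_W = 10
) (
    input  wire              clk,
    input  wire              we,
    input  wire [ADDR_W-1:0] waddr,
    input  wire [DATA_W-1:0] wdata,
    input  wire [ADDR_W-1:0] raddr,
    output reg  [DATA_W-1:0] rdata
);
    reg [DATA_W-1:0] mem [0:(1<<ADDR_W)-1];

    always @(posedge clk) begin
        if (we)
            mem[waddr] <= wdata;

        // Registered read. In simulation, reading an address that is
        // being written in the same clock returns the old contents.
        rdata <= mem[raddr];
    end
endmodule
```

Small changes to a template like this, such as an asynchronous read or a reset on the array, can be enough to stop a tool from using block RAM, which fits the pattern-matching impression.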
Alex
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 20357
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #15 on: December 29, 2019, 08:15:40 pm »
Quote
FPGA tools will infer things pretty well if they are described in a certain way. Unfortunately sometimes this differs between the vendors. There is no way to describe a dual-port block RAM for Xilinx and Altera using the same Verilog code, for example.

I prefer to encapsulate such concepts, and directly instantiate the manufacturer's block. That way I know what I'm getting simply by reading the data sheet.

Quote
That's why there are pretty massive coding guides, and Xilinx had the template library right in the ISE.

Yup.
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8027
  • Country: ca
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #16 on: December 29, 2019, 10:33:32 pm »
Quote
FPGA tools will infer things pretty well if they are described in a certain way. Unfortunately sometimes this differs between the vendors. There is no way to describe a dual-port block RAM for Xilinx and Altera using the same Verilog code, for example.

That's why there are pretty massive coding guides, and Xilinx had the template library right in the ISE.

I feel like some of those optimizations are literally pattern matching on the code.

Yes, for certain functions, if you want that FMAX and cross platform capabilities, the best way is to make those required functions sub-modules instead of using inline Verilog functions.  Have 2 of the same function built, 1 called xxxx_INTEL.sv and the other xxxx_XILINX.sv, each having identical ins and outs, while inside each you will infer that vendor's specific memory or DSP function.  In your main code, you can put a 'generate' & 'if' around the module instantiations to choose which vendor's tool you are using, so the design auto-compiles for that specific environment.
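
A minimal sketch of that wrapper pattern is shown below. All names here (`mult_INTEL`, `mult_XILINX`, `mult_wrap`, the `VENDOR` parameter) are hypothetical, and the two vendor modules contain only a placeholder multiply where the real vendor-specific primitive or coding template would go.

```verilog
// Intel flavour: same ports as the Xilinx flavour.
module mult_INTEL #(
    parameter W = 18
) (
    input  wire           clk,
    input  wire [W-1:0]   a,
    input  wire [W-1:0]   b,
    output reg  [2*W-1:0] p
);
    // Placeholder: the Intel-specific DSP code would go here.
    always @(posedge clk) p <= a * b;
endmodule

// Xilinx flavour: same ports as the Intel flavour.
module mult_XILINX #(
    parameter W = 18
) (
    input  wire           clk,
    input  wire [W-1:0]   a,
    input  wire [W-1:0]   b,
    output reg  [2*W-1:0] p
);
    // Placeholder: the Xilinx-specific DSP code would go here.
    always @(posedge clk) p <= a * b;
endmodule

// Wrapper used by the main code: picks one flavour at elaboration time.
module mult_wrap #(
    parameter W      = 18,
    parameter VENDOR = "XILINX"   // set to "INTEL" for the other flavour
) (
    input  wire           clk,
    input  wire [W-1:0]   a,
    input  wire [W-1:0]   b,
    output wire [2*W-1:0] p
);
    generate
        if (VENDOR == "INTEL") begin : g_intel
            mult_INTEL  #(.W(W)) u_mult (.clk(clk), .a(a), .b(b), .p(p));
        end else begin : g_xilinx
            mult_XILINX #(.W(W)) u_mult (.clk(clk), .a(a), .b(b), .p(p));
        end
    endgenerate
endmodule
```

Only the selected branch is elaborated, but many tools still parse every module in the file, so real vendor primitives usually end up in separate files that are added per project, as the `_INTEL.sv` / `_XILINX.sv` naming suggests.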
« Last Edit: December 30, 2019, 02:14:38 am by BrianHG »
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8027
  • Country: ca
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #17 on: December 30, 2019, 02:33:36 am »
Quote
When implemented on an a100t-fgg484-2, it runs at Fmax for the DSP tile (550.661 MHz) with no problems. All DSP tile pipeline registers are utilized.

Careful now, at 17 or even 18 bits, your selected Artix DSP slice is specified to operate at 550MHz in a single clock cycle, no pipelining needed.  If you need that pipeline to achieve 550MHz, you must have needed the pipes for routing the IO pins or to route other sections in the FPGA to and from the multiplier slice, not for getting the multiplier itself to operate at 550MHz like you say.  I once had an identical problem with Altera.  If you truly called Xilinx's dedicated DSP function instead and defined a true pipeline of 2 clocks, your FMAX should have gone up or stayed the same if you exceeded 18 bits x 18 bits.  I'm looking at the figures from Xilinx's brochure '7-series-product-selection-guide.pdf'.
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 27702
  • Country: nl
    • NCT Developments
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #18 on: December 30, 2019, 09:40:12 am »
Quote
I don't know if there's any higher-level HDL that allows some kind of automatic pipelining (which would still not guarantee that internal pipelined resources of FPGAs would get used, as most if not all higher-level HDLs translate to Verilog or VHDL anyway...)
I know for a fact that the Vivado synthesizer will "absorb" registers (== pipeline) into the DSP tile when implementing HDL's multiply operator:
Didn't know Vivado was this "smart", I wonder whether all other tools are able to do this (such as ISE, or Quartus, Lattice stuff...)

In general the synthesizers are that smart. I usually write high level code. Xilinx has a cook-book which tells you exactly how high level constructs are mapped to IP blocks. When dealing with HDL my motto is: don't describe hardware but describe the process. This leads to 10 times less code which is easier to follow and scales better too. If very high performance is needed it could be necessary to instantiate IP blocks directly but this usually goes for less than 1% of the design (like using assembler in a C program).
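
One small illustration of "describe the process" is counting the set bits in a word. Rather than wiring up an adder tree by hand, the loop below states what is wanted and leaves the structure to the synthesizer. The module name `popcount` is made up, and `$clog2` needs a Verilog-2005 or SystemVerilog tool.

```verilog
// Count the number of '1' bits in din.
module popcount #(
    parameter W = 32
) (
    input  wire [W-1:0]           din,
    output reg  [$clog2(W+1)-1:0] count   // wide enough to hold 0..W
);
    integer i;

    always @* begin
        count = 0;
        for (i = 0; i < W; i = i + 1)
            count = count + din[i];
    end
endmodule
```

Changing `W` is enough to rescale it, which is the kind of scaling a hand-built structural version does not give for free.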
« Last Edit: December 30, 2019, 09:41:48 am by nctnico »
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 
The following users thanked this post: BrianHG, emece67

Offline emece67

  • Frequent Contributor
  • **
  • !
  • Posts: 614
  • Country: 00
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #19 on: December 30, 2019, 10:08:38 am »
.
« Last Edit: August 19, 2022, 02:43:54 pm by emece67 »
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 15185
  • Country: fr
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #20 on: December 30, 2019, 05:10:04 pm »
Quote
When implemented on an a100t-fgg484-2, it runs at Fmax for the DSP tile (550.661 MHz) with no problems. All DSP tile pipeline registers are utilized.
Careful now, at 17 or even 18 bits, your selected Artix DSP slice is specified to operate at 550MHz in a single clock cycle, no pipelining needed.

Even for a multiplier? That's impressive.

 

Offline Someone

  • Super Contributor
  • ***
  • Posts: 4858
  • Country: au
    • send complaints here
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #21 on: December 30, 2019, 11:12:28 pm »
Quote
When implemented on an a100t-fgg484-2, it runs at Fmax for the DSP tile (550.661 MHz) with no problems. All DSP tile pipeline registers are utilized.
Careful now, at 17 or even 18 bits, your selected Artix DSP slice is specified to operate at 550MHz in a single clock cycle, no pipelining needed.
Even for a multiplier? That's impressive.

Not so simple:
 
The following users thanked this post: BrianHG, SiliconWizard

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 27702
  • Country: nl
    • NCT Developments
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #22 on: December 31, 2019, 12:19:47 am »
But do realise that these speeds are extremely hard to achieve in real designs. You probably have to resort to manual placement. The thing is that routing delays will eat into the timing budget. This can be partly mitigated by using the pipeline register (which is why it is there). The maximum clock frequency is mostly a nice & shiny number to put in a datasheet.
« Last Edit: December 31, 2019, 12:21:38 am by nctnico »
 

Offline Someone

  • Super Contributor
  • ***
  • Posts: 4858
  • Country: au
    • send complaints here
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #23 on: December 31, 2019, 02:32:10 am »
Quote
But do realise that these speeds are extremely hard to achieve in real designs. You probably have to resort to manual placement. The thing is that routing delays will eat into the timing budget. This can be partly mitigated by using the pipeline register (which is why it is there). The maximum clock frequency is mostly a nice & shiny number to put in a datasheet.

The tools are very aggressive at absorbing pipeline registers into hard cells, often causing the reverse problem: timing is met on the RAM/DSP, but the route immediately before/after its hard registers fails. Getting within 20% of the fmax is easy enough without hand placement, and I've seen designs where the hard block's fmax itself was the timing failure (again no hand placement, just instantiating primitives).
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8027
  • Country: ca
Re: How much "Higher level" verilog is used in industry? (Adder example)
« Reply #24 on: December 31, 2019, 02:40:28 am »
Quote
But do realise that these speeds are extremely hard to achieve in real designs. You probably have to resort to manual placement. The thing is that routing delays will eat into the timing budget. This can be partly mitigated by using the pipeline register (which is why it is there). The maximum clock frequency is mostly a nice & shiny number to put in a datasheet.
The tools are very aggressive at absorbing pipeline registers into hard cells, often causing the reverse problem with timing met on the RAM/DSP and the route immediately before/after its hard registers fails. Getting within 20% of the fmax is easy enough without hand placement, and I've seen designs hitting fmax as the timing failure (again no hand placement, just instantiating primitives).

Coding design architecture counts.  Design for it from the beginning and achieving the FMAX shouldn't be a problem without any manual placement.  Achieving that FMAX with an over 90% full FPGA, with full multicorner timing analysis over the full temperature range, also illustrates excellent design practice from the ground up.  Doing so with an over 96% full FPGA is entering a degree of luck, so long as that 96% is true generic logic and not logic auto generated by the fitter to aid in achieving the FMAX.

Or, maybe I'm just giving Intel's Quartus too little credit, as I personally had to make architecture the most important thing when I once needed their data sheet's stated 270MHz FMAX for dual port RAM, all of it in 1 huge chunk running at that speed, with logic and math all around it.
« Last Edit: December 31, 2019, 02:43:18 am by BrianHG »
 

