Notes on Gowin ALU Primitive Usage

Rainwater:
Due to the incomplete documentation provided by Gowin regarding use of the CLU as an ALU in the modes 'countup', 'countdown', and 'countupdown', I've spent the last few hours running multiple tests on the 'tang nano 9k' dev board, which uses the 'GW1NR-9c' chip. My goal was to be able to simulate the functions of the ALU in the oss-cad-suite toolchain.
I want to share my results because this info is not on Google, and because I lose stuff like this very regularly, so here it will be easy to find.
All constructive criticism is welcome.
truth tables

--- Code: ---    alu_simulation_wrapper
    (
        .I0(    alu_I0 ),
        .I1(    settings[2] ),                 
        .I3(    settings[1] ),
        .CIN(   settings[0] ),
        .COUT(  alup_cout ),
        .SUM(   alup_sum )
    );
output in binary truth table format
 I0
   I1
     CIN
       I3
          COUT
            SUM
`ALU_TYPE_ADD
TYPE 0    BC + AC + AB = |{ &{I1, CIN}, &{I0, CIN}, &{I0, I1} };
            A'B'C + A'BC' + AB'C' + ABC = |{ &{!I0, !I1, CIN}, &{!I0, I1, !CIN}, &{I0, !I1, !CIN}, &{I0, I1, CIN} };
 0|0|0|0  0|0
 1|0|0|0  0|1
 0|1|0|0  0|1
 1|1|0|0  1|0
 0|0|1|0  0|1
 1|0|1|0  1|0
 0|1|1|0  1|0
 1|1|1|0  1|1
 0|0|0|1  0|0
 1|0|0|1  0|1
 0|1|0|1  0|1
 1|1|0|1  1|0
 0|0|1|1  0|1
 1|0|1|1  1|0
 0|1|1|1  1|0
 1|1|1|1  1|1

`ALU_TYPE_SUB
TYPE 1    B'C + AB' + AC = |{ &{!I1, CIN}, &{I0, !I1}, &{I0, CIN} };
            A'B'C' + A'BC + AB'C + ABC' = |{ &{!I0, !I1, !CIN}, &{!I0, I1, CIN}, &{I0, !I1, CIN}, &{I0, I1, !CIN} };
 0|0|0|0  0|1
 1|0|0|0  1|0
 0|1|0|0  0|0
 1|1|0|0  0|1
 0|0|1|0  1|0
 1|0|1|0  1|1
 0|1|1|0  0|1
 1|1|1|0  1|0
 0|0|0|1  0|1
 1|0|0|1  1|0
 0|1|0|1  0|0
 1|1|0|1  0|1
 0|0|1|1  1|0
 1|0|1|1  1|1
 0|1|1|1  0|1
 1|1|1|1  1|0

`ALU_TYPE_ADDSUB
TYPE 2    AC + B'CD' + BCD + AB'D' + ABD = |{ &{I0, CIN}, &{!I1, CIN, !I3}, &{I1, CIN, I3}, &{I0, !I1, !I3}, &{I0, I1, I3} };
            A'B'C'D' + A'B'CD + A'BC'D + A'BCD' + AB'C'D + AB'CD' + ABC'D' + ABCD = |{ &{!I0, !I1, !CIN, !I3}, &{!I0, !I1, CIN, I3}, &{!I0, I1, !CIN, I3}, &{!I0, I1, CIN, !I3}, &{I0, !I1, !CIN, I3}, &{I0, !I1, CIN, !I3}, &{I0, I1, !CIN, !I3}, &{I0, I1, CIN, I3} };
 0|0|0|0  0|1
 1|0|0|0  1|0
 0|1|0|0  0|0
 1|1|0|0  0|1
 0|0|1|0  1|0
 1|0|1|0  1|1
 0|1|1|0  0|1
 1|1|1|0  1|0
 0|0|0|1  0|0
 1|0|0|1  0|1
 0|1|0|1  0|1
 1|1|0|1  1|0
 0|0|1|1  0|1
 1|0|1|1  1|0
 0|1|1|1  1|0
 1|1|1|1  1|1

`ALU_TYPE_NOTEQ
TYPE 3    C + A'B + AB' = |{ CIN, &{!I0, I1}, &{I0, !I1} };
             A'B'C' + A'BC + AB'C + ABC' = |{ &{!I0, !I1, !CIN}, &{!I0, I1, CIN}, &{ I0, !I1, CIN}, &{I0, I1, !CIN} };
 0|0|0|0  0|1
 1|0|0|0  1|0
 0|1|0|0  1|0
 1|1|0|0  0|1
 0|0|1|0  1|0
 1|0|1|0  1|1
 0|1|1|0  1|1
 1|1|1|0  1|0
 0|0|0|1  0|1
 1|0|0|1  1|0
 0|1|0|1  1|0
 1|1|0|1  0|1
 0|0|1|1  1|0
 1|0|1|1  1|1
 0|1|1|1  1|1
 1|1|1|1  1|0

`ALU_TYPE_GREATER_EQ
TYPE 4    B'C + AB' + AC = |{ &{!I1, CIN}, &{I0, !I1}, &{I0, CIN} };
            A'B'C' + A'BC + AB'C + ABC' = |{ &{!I0, !I1, !CIN}, &{!I0, I1, CIN}, &{I0, !I1, CIN}, &{I0, I1, !CIN} };
 0|0|0|0  0|1
 1|0|0|0  1|0
 0|1|0|0  0|0
 1|1|0|0  0|1
 0|0|1|0  1|0
 1|0|1|0  1|1
 0|1|1|0  0|1
 1|1|1|0  1|0
 0|0|0|1  0|1
 1|0|0|1  1|0
 0|1|0|1  0|0
 1|1|0|1  0|1
 0|0|1|1  1|0
 1|0|1|1  1|1
 0|1|1|1  0|1
 1|1|1|1  1|0

`ALU_TYPE_LESS_EQ
TYPE 5    A'C + A'B + BC = |{ &{!I0, CIN}, &{!I0, I1}, &{I1, CIN} };
            A'B'C' + A'BC + AB'C + ABC' = |{ &{!I0, !I1, !CIN}, &{!I0, I1, CIN}, &{I0, !I1, CIN}, &{I0, I1, !CIN} };
 0|0|0|0  0|1
 1|0|0|0  0|0
 0|1|0|0  1|0
 1|1|0|0  0|1
 0|0|1|0  1|0
 1|0|1|0  0|1
 0|1|1|0  1|1
 1|1|1|0  1|0
 0|0|0|1  0|1
 1|0|0|1  0|0
 0|1|0|1  1|0
 1|1|0|1  0|1
 0|0|1|1  1|0
 1|0|1|1  0|1
 0|1|1|1  1|1
 1|1|1|1  1|0

`ALU_TYPE_COUNTUP
TYPE 6    AC = &{I0, CIN};
            A'C + AC' = |{ &{!I0, CIN}, &{I0, !CIN} };
 0|0|0|0  0|0
 1|0|0|0  0|1
 0|1|0|0  0|0
 1|1|0|0  0|1
 0|0|1|0  0|1
 1|0|1|0  1|0
 0|1|1|0  0|1
 1|1|1|0  1|0
 0|0|0|1  0|0
 1|0|0|1  0|1
 0|1|0|1  0|0
 1|1|0|1  0|1
 0|0|1|1  0|1
 1|0|1|1  1|0
 0|1|1|1  0|1
 1|1|1|1  1|0

`ALU_TYPE_COUNTDOWN
TYPE 7    C + A = |{ CIN, I0 };
            A'C' + AC = |{ &{!I0, !CIN}, &{I0, CIN} };
 0|0|0|0  0|1
 1|0|0|0  1|0
 0|1|0|0  0|1
 1|1|0|0  1|0
 0|0|1|0  1|0
 1|0|1|0  1|1
 0|1|1|0  1|0
 1|1|1|0  1|1
 0|0|0|1  0|1
 1|0|0|1  1|0
 0|1|0|1  0|1
 1|1|0|1  1|0
 0|0|1|1  1|0
 1|0|1|1  1|1
 0|1|1|1  1|0
 1|1|1|1  1|1

`ALU_TYPE_COUNTUPDOWN
TYPE 8    CD' + AD' + AC = |{ &{CIN, !I3}, &{I0, !I3}, &{I0, CIN} };
            A'C'D' + A'CD + AC'D + ACD' = |{ &{!I0, !CIN, !I3}, &{!I0, CIN, I3}, &{I0, !CIN, I3}, &{I0, CIN, !I3} };
 0|0|0|0  0|1
 1|0|0|0  1|0
 0|1|0|0  0|1
 1|1|0|0  1|0
 0|0|1|0  1|0
 1|0|1|0  1|1
 0|1|1|0  1|0
 1|1|1|0  1|1
 0|0|0|1  0|0
 1|0|0|1  0|1
 0|1|0|1  0|0
 1|1|0|1  0|1
 0|0|1|1  0|1
 1|0|1|1  1|0
 0|1|1|1  0|1
 1|1|1|1  1|0

`ALU_TYPE_MULTIPLIER
TYPE 9    ABC = &{ I0, I1, CIN };
            A'C + B'C + ABC' = |{ &{!I0, CIN}, &{!I1, CIN}, &{I0, I1, !CIN} };
 0|0|0|0  0|0
 1|0|0|0  0|0
 0|1|0|0  0|0
 1|1|0|0  0|1
 0|0|1|0  0|1
 1|0|1|0  0|1
 0|1|1|0  0|1
 1|1|1|0  1|0
 0|0|0|1  0|0
 1|0|0|1  0|0
 0|1|0|1  0|0
 1|1|0|1  0|1
 0|0|1|1  0|1
 1|0|1|1  0|1
 0|1|1|1  0|1
 1|1|1|1  1|0
--- End code ---

So after about 2 tries, I cheated and used an online K-map calculator to pinpoint my errors: http://www.32x8.com/
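
For anyone wanting to reuse the tables in simulation, here is a minimal sketch of a behavioral model built only from the COUT/SUM equations listed above. It is not the attached ALU.v: the module name alu_model, the MODE parameter, and the testbench are made up for illustration, and the XOR forms are just the sum-of-products terms above written more compactly. The small testbench prints rows in the same column order as the tables (I0, I1, CIN, I3, then COUT, SUM).

```verilog
// Hypothetical behavioral model of one ALU bit, derived from the tables above.
// MODE uses the same numbering as TYPE 0..9; all names are illustrative.
module alu_model #(
    parameter integer MODE = 0
) (
    input  wire I0,
    input  wire I1,
    input  wire I3,
    input  wire CIN,
    output reg  COUT,
    output reg  SUM
);
    // Shared terms
    wire add_c = (I1 & CIN) | (I0 & CIN) | (I0 & I1);     // BC + AC + AB
    wire sub_c = (~I1 & CIN) | (I0 & ~I1) | (I0 & CIN);   // B'C + AB' + AC
    wire xor3  = I0 ^ I1 ^ CIN;                           // odd parity of A, B, C

    always @* begin
        case (MODE)
            0: begin COUT = add_c; SUM = xor3;  end           // ADD
            1: begin COUT = sub_c; SUM = ~xor3; end           // SUB
            2: begin                                          // ADDSUB
                COUT = I3 ? add_c : sub_c;   // I3=1 add, I3=0 subtract
                SUM  = I3 ? xor3  : ~xor3;
            end
            3: begin COUT = CIN | (I0 ^ I1); SUM = ~xor3; end // NOTEQ
            4: begin COUT = sub_c; SUM = ~xor3; end           // GREATER_EQ
            5: begin                                          // LESS_EQ
                COUT = (~I0 & CIN) | (~I0 & I1) | (I1 & CIN);
                SUM  = ~xor3;
            end
            6: begin COUT = I0 & CIN; SUM = I0 ^ CIN;    end  // COUNTUP
            7: begin COUT = I0 | CIN; SUM = ~(I0 ^ CIN); end  // COUNTDOWN
            8: begin                                          // COUNTUPDOWN
                COUT = I3 ? (I0 & CIN) : (I0 | CIN);   // I3=1 up, I3=0 down
                SUM  = I3 ? (I0 ^ CIN) : ~(I0 ^ CIN);
            end
            9: begin COUT = I0 & I1 & CIN; SUM = (I0 & I1) ^ CIN; end // MULTIPLIER
            default: begin COUT = 1'b0; SUM = 1'b0; end
        endcase
    end
endmodule

// Simulation-only: print the 16 rows for one mode.
module alu_model_tb;
    parameter integer MODE = 0;
    reg  I0, I1, I3, CIN;
    wire COUT, SUM;
    integer i;

    alu_model #(.MODE(MODE)) dut (
        .I0(I0), .I1(I1), .I3(I3), .CIN(CIN), .COUT(COUT), .SUM(SUM)
    );

    initial begin
        for (i = 0; i < 16; i = i + 1) begin
            {I3, CIN, I1, I0} = i[3:0];   // I0 toggles fastest, as in the tables
            #1;
            $display(" %b|%b|%b|%b  %b|%b", I0, I1, CIN, I3, COUT, SUM);
        end
        $finish;
    end
endmodule
```

Changing MODE on the testbench and diffing the printed rows against the hardware tables above is one way to catch a mistyped term.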

The effect of the ALU_CHAIN.CIN value is as follows:

--- Code: ---+   `ALU_TYPE_ADD       .CIN should be set to 'LOW'.    otherwise it will count up   by 1   .COUT indicates overflow when HIGH
-   `ALU_TYPE_SUB       .CIN should be set to 'HIGH'.   otherwise it will count down by 1.  .COUT indicates underflow when LOW
+-  `ALU_TYPE_ADDSUB     The effect is based on I3.     I3('b0) for subtraction,  I3('b1) for addition. (see .CIN & .COUT above)
!=  `ALU_TYPE_NOTEQ     .CIN should be set to 'LOW'.    otherwise .COUT will return HIGH    .COUT (LOW I0 == I1 ) (HIGH I0 != I1)
>= `ALU_TYPE_GREATER_EQ .CIN should be set to 'HIGH'.   otherwise function is I0 > I1       .COUT (LOW I0 <  I1 ) (HIGH I0 >= I1)
<=  `ALU_TYPE_LESS_EQ   .CIN should be set to 'HIGH'.   otherwise function is I0 < I1       .COUT (LOW I0 >  I1 ) (HIGH I0 <= I1)
++  `ALU_TYPE_COUNTUP   .CIN should be set to 'HIGH'.   otherwise the function is paused    .COUT indicates overflow when HIGH
--  `ALU_TYPE_COUNTDOWN .CIN should be set to 'LOW'.    otherwise the function is paused    .COUT indicates underflow when LOW
+-`ALU_TYPE_COUNTUPDOWN  The effect is based on I3.     I3('b0) for counting down, I3('b1) for counting up. (see .CIN & .COUT above)
*  `ALU_TYPE_MULTIPLIER  it's complicated

--- End code ---
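
As a sketch of how the CIN/COUT notes above play out across a whole chain, the following wires WIDTH copies of the TYPE 1 (SUB) bit equations together and checks them against ordinary arithmetic. The names sub_chain, a, b, diff are hypothetical, and this is plain Verilog rather than the Gowin primitive itself. It is meant to show that CIN HIGH gives a - b, CIN LOW gives a - b - 1, and that COUT LOW on the top bit corresponds to a borrow.

```verilog
// Sketch: WIDTH copies of the TYPE 1 (SUB) bit equations chained through CIN/COUT.
// sub_chain and its port names are hypothetical.
module sub_chain #(
    parameter integer WIDTH = 4
) (
    input  wire [WIDTH-1:0] a,      // I0 of each bit
    input  wire [WIDTH-1:0] b,      // I1 of each bit
    input  wire             cin,    // CIN of bit 0: HIGH for a - b
    output wire [WIDTH-1:0] diff,   // SUM of each bit
    output wire             cout    // COUT of the top bit: LOW means underflow
);
    wire [WIDTH:0] c;
    assign c[0] = cin;

    genvar i;
    generate
        for (i = 0; i < WIDTH; i = i + 1) begin : g_bit
            // COUT = B'C + AB' + AC, SUM = inverted parity of A, B, C
            assign c[i+1]  = (~b[i] & c[i]) | (a[i] & ~b[i]) | (a[i] & c[i]);
            assign diff[i] = ~(a[i] ^ b[i] ^ c[i]);
        end
    endgenerate

    assign cout = c[WIDTH];
endmodule

// Simulation-only check against ordinary arithmetic for every 4-bit input pair.
module sub_chain_tb;
    reg  [3:0] a, b;
    reg        cin;
    wire [3:0] diff;
    wire       cout;
    integer    i, errors;

    sub_chain #(.WIDTH(4)) dut (.a(a), .b(b), .cin(cin), .diff(diff), .cout(cout));

    initial begin
        errors = 0;
        for (i = 0; i < 512; i = i + 1) begin
            {cin, a, b} = i[8:0];
            #1;
            // a - b is a + ~b + 1, so CIN LOW gives a - b - 1
            if ({cout, diff} !== ({1'b0, a} + {1'b0, ~b} + cin))
                errors = errors + 1;
        end
        if (errors == 0) $display("sub_chain: all cases match");
        else             $display("sub_chain: %0d mismatches", errors);
        $finish;
    end
endmodule
```

The same chain with the TYPE 4 equations is the >= comparison: only the top COUT is used and the SUM bits are ignored.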

attached is
  AUL_truth_tables.v - used to generate the truth tables
  ALU.v - used to simulate the ALU primitives for the tang nano 9k.

(rant)
This has been a challenge.

--- Quote --- Has to be an easier way to extract this information.
--- End quote ---
The first 3 chapters of my text book.

I have had great difficulty getting Gowin's toolchain to synthesize simple counters, FIFO pointers, really any arithmetic, at 100MHz.
Sometimes it will infer an ALU; most of the time it will not. This is a combination of my inexperience with Verilog and the IDE settings.
 
for example

--- Code: ---    always @( posedge clk ) begin
        if( enable ) begin                  // when running
            counter <= counter + 'd1;
            if( counter >= reset_value ) begin             
                strobe_ff <= 1'b1;
            end
        end
        if( !rst_n || strobe_ff ) begin       // reset the counter on reset
            counter <= 'd1;
            strobe_ff <= 1'b0;
        end
    end

--- End code ---
Fmax values are taken from the timing report, not the inaccurate synthesis report.
counter is defined as
--- Code: ---reg unsigned [15:0] counter = 0;
--- End code ---
The critical path here is 'counter <= counter + 'd1;'. Because counter is also set elsewhere (line #9), there is a mux in front of it, and I get an Fmax of around 80~95MHz.

By swapping that line out for a declared ALU, Fmax reaches between 150~175MHz.
When I remove the reset and allow a free-running counter (removing the mux), Fmax increases further.
Using 'CIN' as an enable port removed another level of logic.
And by adding additional registers to calculate and store the reset value, I get an Fmax of 197.8MHz.
I'm so close to 200 I can almost touch it.
Studying the 'Floor planner', or placement constraints editor, shows that the ALU_CHAIN is broken up across multiple slices, and these slices are not adjacent to one another. I'm hoping a few constraints will fix the issue and get me past the 200 mark. (Future me here: the synthesis and PNR tools have settings that are, by default, set for area, not speed. Changing these now infers an ALU for everything, tightly packed and fast. Still not 200MHz yet.) SoC SDRAM runs at 208MHz; that's my goal. Their protected IP HSFIFO builds at speeds approaching 250MHz, so I know it is doable.
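
The "CIN as enable" step can be sketched with the TYPE 6 (COUNTUP) equations chained across the counter bits. This is an assumed structure written in generic Verilog, not the attached ALU.v and not a Gowin primitive instantiation; countup_chain and its ports are hypothetical names, and whether a given tool maps it onto ALU slices depends on its settings.

```verilog
// Sketch: free-running up-counter whose enable is the chain's CIN.
// countup_chain and its port names are hypothetical.
module countup_chain #(
    parameter integer WIDTH = 16
) (
    input  wire             clk,
    input  wire             enable,    // CIN of bit 0: LOW pauses the count
    output reg  [WIDTH-1:0] counter,
    output wire             overflow   // COUT of the top bit
);
    wire [WIDTH:0]   c;
    wire [WIDTH-1:0] next;
    assign c[0] = enable;

    genvar i;
    generate
        for (i = 0; i < WIDTH; i = i + 1) begin : g_bit
            assign c[i+1]  = counter[i] & c[i];   // COUT = AC
            assign next[i] = counter[i] ^ c[i];   // SUM  = A'C + AC'
        end
    endgenerate

    assign overflow = c[WIDTH];

    initial counter = {WIDTH{1'b0}};

    // No "if (enable)" here: with CIN LOW the chain returns the old value,
    // so the register input needs no extra mux.
    always @(posedge clk)
        counter <= next;
endmodule
```

Here overflow goes HIGH while the counter is all ones and enable is HIGH, which matches the ".COUT indicates overflow when HIGH" note for COUNTUP.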


laugensalm:
These are interesting observations. I haven't dug into the layout of the Gowin silicon any further, but I remember a similar situation on Xilinx when running out of multipliers: f_max optimization has to be done manually, as the tools exhibit random behaviour or end up synthesizing for a very long time.

Have you tried replacing the binary counters by gray logic? That's what most FIFO architectures drive on to allow higher frequencies while providing support for differing in/out clocks, like the FIFOHS IP core. I would be surprised if this IP core would allocate an ALU.
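
A minimal sketch of what such a counter could look like in plain Verilog, assuming the common "binary register plus conversion" style. All names here are hypothetical and nothing is taken from the FIFOHS core. Note that this style still contains a binary incrementer internally; what changes is that the registered output only ever flips one bit per step, which the small testbench checks.

```verilog
// Sketch of a binary-plus-conversion Gray counter; all names are hypothetical.
module gray_counter #(
    parameter integer SIZE = 4
) (
    input  wire            clk,
    input  wire            rst,      // synchronous, active high
    input  wire            enable,
    output reg  [SIZE-1:0] gray
);
    reg  [SIZE-1:0] bin;
    wire [SIZE-1:0] bin_next = bin + 1'b1;

    always @(posedge clk) begin
        if (rst) begin
            bin  <= {SIZE{1'b0}};
            gray <= {SIZE{1'b0}};
        end else if (enable) begin
            bin  <= bin_next;
            gray <= bin_next ^ (bin_next >> 1);   // binary to Gray
        end
    end
endmodule

// Simulation-only: count steps where the output changed by other than one bit.
module gray_counter_tb;
    reg clk = 0, rst = 1, enable = 1;
    wire [3:0] gray;
    reg  [3:0] prev;
    integer n, k, flips, errors;

    gray_counter #(.SIZE(4)) dut (.clk(clk), .rst(rst), .enable(enable), .gray(gray));
    always #5 clk = ~clk;

    initial begin
        errors = 0;
        @(posedge clk); #1 rst = 0;
        for (n = 0; n < 32; n = n + 1) begin
            prev = gray;
            @(posedge clk); #1;
            flips = 0;
            for (k = 0; k < 4; k = k + 1)
                flips = flips + (prev[k] ^ gray[k]);
            if (flips != 1) errors = errors + 1;
        end
        $display("gray_counter: %0d steps with a multi-bit change", errors);
        $finish;
    end
endmodule
```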

gnuarm:
I was going to use the Gowin parts on a project, but it was shot down by the customer.  My customer sells a lot to the US government and Gowin had been placed on a ban list of being too aligned with the Chinese military.  They managed to get off the list, but the customer was still not interested, so I planned on using parts from a Chinese company with their headquarters in the US.  I guess it pays to not be so obvious. 

Anyway, I remember that the documentation was very, very poor on the various functional blocks like the ALUs.  I seem to recall that I just let the synthesizer do what it wanted to.  As long as it met my speed requirements, the parts were large enough to do it in random logic. 

I remember going through a US sales rep or distributor to get some details on something else I was using, likely using a differential input pair as part of a sigma/delta ADC.  They had an app note on it, but were applying the RC in a funny way.  When I tried to discuss this, they literally could not understand the concept of not following the app note blindly.  I never got any useful info from them.

I would recommend the Efinix parts over Gowin.  Or, I should say I recommend Efinix support over Gowin. 

Rainwater:
Selection of this dev board was based on the beginner verilog tutorials posted at
https://learn.lushaylabs.com/tang-nano-series/
Their plugin for Visual Studio has made my first experience with this dev board easy and pleasurable. But very quickly I outgrew SymbiYosys's limitations with Gowin products, so I started using the Gowin IDE. Visual Studio is still my go-to for writing and testing, but building, timing, and programming are best left to the manufacturer's software.
edit:

--- Quote ---Have you tried replacing the binary counters by gray logic
--- End quote ---

Honestly, that never occurred to me as a possibility. I understand the concept from reading about Gray encoding, but have yet to apply it to a project. I've mainly been focused on comparing what the post-synthesis structures of my Verilog look like against the circuit I desire: sometimes post-synthesis is better, most of the time it is equivalent, and a few times it is not.
I only discovered what an FPGA was a few months ago, so there is currently a lot of learning going on. I started C/C++ in the 90s, at an amateur level. The transition has been very smooth; I really wish I had learned this first, but these tools were not available at the time.
In short, I know what circuits I need to string together to make what I want, but I have yet to learn to write Verilog in a manner that properly describes this without explicitly writing primitives.

laugensalm:

--- Quote from: gnuarm on May 25, 2024, 09:03:08 am ---I was going to use the Gowin parts on a project, but it was shot down by the customer.  My customer sells a lot to the US government and Gowin had been placed on a ban list of being too aligned with the Chinese military.  They managed to get off the list, but the customer was still not interested, so I planned on using parts from a Chinese company with their headquarters in the US.  I guess it pays to not be so obvious. 

--- End quote ---

Somehow this reminds me of the ridiculous accusations against Huawei, where an open telnet port was proof enough for some three letter government organisation to shout for a ban.
(I've done some reversing).
How was Xilinx 'aligned' again?
Although this might sound paradoxical, and I'm certainly not aligned with Chinese politics: from the technical aspect, there's no reason not to go with Gowin for the long term.

Anyhow, back to the subject:
Gray counters will take some tension out on most sane architectures, even when cross-clock transition is not really required. They'll also use less power and create less digital noise. Some LFSR architectures can reduce logic usage even further and run faster, but obviously they only work as counters when all you need is to compare the state against a fixed value. For sync FIFOs with the same in/out clock they do fine.
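
To illustrate the LFSR remark, here is a small sketch of a 4-bit LFSR used as a divide-by-15 counter. The module name, ports, and the choice of the polynomial x^4 + x^3 + 1 are assumptions made for the example; a wider counter would need taps taken from a maximal-length table. The next-state logic is a single XOR with no carry chain, but the state order is scrambled, so only equality against a precomputed state is meaningful.

```verilog
// Sketch: 4-bit Fibonacci LFSR (x^4 + x^3 + 1) used as a counter.
// lfsr4_counter and its ports are hypothetical names.
module lfsr4_counter (
    input  wire       clk,
    input  wire       rst,    // synchronous, active high
    output reg  [3:0] q,
    output wire       wrap    // HIGH in the last state before the sequence restarts
);
    wire fb = q[3] ^ q[2];    // feedback taps

    always @(posedge clk) begin
        if (rst) q <= 4'b0001;        // all zeros is the lock-up state, never load it
        else     q <= {q[2:0], fb};   // shift left, feedback into bit 0
    end

    // Starting from 0001 the register walks through 15 states and reaches
    // 1000 just before returning to 0001.
    assign wrap = (q == 4'b1000);
endmodule
```

A strobe every N cycles (N below 15 here) would compare q against whichever state the sequence reaches after N - 1 shifts, which has to be worked out ahead of time.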

You can simply create a gray counter here:
https://mybinder.org/v2/gh/hackfin/cyrite.howto.git/master?labpath=examples%2Fgray_counter.ipynb

Just run each cell and modify the SIZE parameter, once it's up (random startup time applies).
