### Topic: Notes on Gowin ALU Primitive Usage

#### Rainwater

• Regular Contributor
• Posts: 60
• Country:
##### Notes on Gowin ALU Primitive Usage
« on: May 13, 2024, 01:19:02 am »
Due to the incomplete documentation Gowin provides on using the CLU as an ALU in the 'countup', 'countdown', and 'countupdown' modes, I've spent the last few hours running tests on the Tang Nano 9K dev board, which uses the GW1NR-9C chip. My goal was to be able to simulate the functions of the ALU in the oss-cad-suite toolchain.
I want to share my results because this information doesn't turn up on Google, and because I lose notes like this regularly, so posting it here makes it easy to find again.
All constructive criticism is welcome.
Truth tables

Code:

    alu_simulation_wrapper    (
        .I0(    alu_I0 ),
        .I1(    settings[2] ),
        .I3(    settings[1] ),
        .CIN(   settings[0] ),
        .COUT(  alup_cout ),
        .SUM(   alup_sum )
    );

    output in binary truth table format
     I0|I1|CIN|I3  COUT|SUM

    ALU_TYPE_ADD
    TYPE 0    BC + AC + AB = |{ &{I1, CIN}, &{I0, CIN}, &{I0, I1} };
              A'B'C + A'BC' + AB'C' + ABC = |{ &{!I0, !I1, CIN}, &{!I0, I1, !CIN}, &{I0, !I1, !CIN}, &{I0, I1, CIN} };
     0|0|0|0  0|0
     1|0|0|0  0|1
     0|1|0|0  0|1
     1|1|0|0  1|0
     0|0|1|0  0|1
     1|0|1|0  1|0
     0|1|1|0  1|0
     1|1|1|0  1|1
     0|0|0|1  0|0
     1|0|0|1  0|1
     0|1|0|1  0|1
     1|1|0|1  1|0
     0|0|1|1  0|1
     1|0|1|1  1|0
     0|1|1|1  1|0
     1|1|1|1  1|1

    ALU_TYPE_SUB
    TYPE 1    B'C + AB' + AC = |{ &{!I1, CIN}, &{I0, !I1}, &{I0, CIN} };
              A'B'C' + A'BC + AB'C + ABC' = |{ &{!I0, !I1, !CIN}, &{!I0, I1, CIN}, &{I0, !I1, CIN}, &{I0, I1, !CIN} };
     0|0|0|0  0|1
     1|0|0|0  1|0
     0|1|0|0  0|0
     1|1|0|0  0|1
     0|0|1|0  1|0
     1|0|1|0  1|1
     0|1|1|0  0|1
     1|1|1|0  1|0
     0|0|0|1  0|1
     1|0|0|1  1|0
     0|1|0|1  0|0
     1|1|0|1  0|1
     0|0|1|1  1|0
     1|0|1|1  1|1
     0|1|1|1  0|1
     1|1|1|1  1|0

    ALU_TYPE_ADDSUB
    TYPE 2    AC + B'CD' + BCD + AB'D' + ABD = |{ &{I0, CIN}, &{!I1, CIN, !I3}, &{I1, CIN, I3}, &{I0, !I1, !I3}, &{I0, I1, I3} };
              A'B'C'D' + A'B'CD + A'BC'D + A'BCD' + AB'C'D + AB'CD' + ABC'D' + ABCD = |{ &{!I0, !I1, !CIN, !I3}, &{!I0, !I1, CIN, I3}, &{!I0, I1, !CIN, I3}, &{!I0, I1, CIN, !I3}, &{I0, !I1, !CIN, I3}, &{I0, !I1, CIN, !I3}, &{I0, I1, !CIN, !I3}, &{I0, I1, CIN, I3} };
     0|0|0|0  0|1
     1|0|0|0  1|0
     0|1|0|0  0|0
     1|1|0|0  0|1
     0|0|1|0  1|0
     1|0|1|0  1|1
     0|1|1|0  0|1
     1|1|1|0  1|0
     0|0|0|1  0|0
     1|0|0|1  0|1
     0|1|0|1  0|1
     1|1|0|1  1|0
     0|0|1|1  0|1
     1|0|1|1  1|0
     0|1|1|1  1|0
     1|1|1|1  1|1

    ALU_TYPE_NOTEQ
    TYPE 3    C + A'B + AB' = |{ CIN, &{!I0, I1}, &{I0, !I1} };
              A'B'C' + A'BC + AB'C + ABC' = |{ &{!I0, !I1, !CIN}, &{!I0, I1, CIN}, &{I0, !I1, CIN}, &{I0, I1, !CIN} };
     0|0|0|0  0|1
     1|0|0|0  1|0
     0|1|0|0  1|0
     1|1|0|0  0|1
     0|0|1|0  1|0
     1|0|1|0  1|1
     0|1|1|0  1|1
     1|1|1|0  1|0
     0|0|0|1  0|1
     1|0|0|1  1|0
     0|1|0|1  1|0
     1|1|0|1  0|1
     0|0|1|1  1|0
     1|0|1|1  1|1
     0|1|1|1  1|1
     1|1|1|1  1|0

    ALU_TYPE_GREATER_EQ
    TYPE 4    B'C + AB' + AC = |{ &{!I1, CIN}, &{I0, !I1}, &{I0, CIN} };
              A'B'C' + A'BC + AB'C + ABC' = |{ &{!I0, !I1, !CIN}, &{!I0, I1, CIN}, &{I0, !I1, CIN}, &{I0, I1, !CIN} };
     0|0|0|0  0|1
     1|0|0|0  1|0
     0|1|0|0  0|0
     1|1|0|0  0|1
     0|0|1|0  1|0
     1|0|1|0  1|1
     0|1|1|0  0|1
     1|1|1|0  1|0
     0|0|0|1  0|1
     1|0|0|1  1|0
     0|1|0|1  0|0
     1|1|0|1  0|1
     0|0|1|1  1|0
     1|0|1|1  1|1
     0|1|1|1  0|1
     1|1|1|1  1|0

    ALU_TYPE_LESS_EQ
    TYPE 5    A'C + A'B + BC = |{ &{!I0, CIN}, &{!I0, I1}, &{I1, CIN} };
              A'B'C' + A'BC + AB'C + ABC' = |{ &{!I0, !I1, !CIN}, &{!I0, I1, CIN}, &{I0, !I1, CIN}, &{I0, I1, !CIN} };
     0|0|0|0  0|1
     1|0|0|0  0|0
     0|1|0|0  1|0
     1|1|0|0  0|1
     0|0|1|0  1|0
     1|0|1|0  0|1
     0|1|1|0  1|1
     1|1|1|0  1|0
     0|0|0|1  0|1
     1|0|0|1  0|0
     0|1|0|1  1|0
     1|1|0|1  0|1
     0|0|1|1  1|0
     1|0|1|1  0|1
     0|1|1|1  1|1
     1|1|1|1  1|0

    ALU_TYPE_COUNTUP
    TYPE 6    AC = &{I0, CIN};
              A'C + AC' = |{ &{!I0, CIN}, &{I0, !CIN} };
     0|0|0|0  0|0
     1|0|0|0  0|1
     0|1|0|0  0|0
     1|1|0|0  0|1
     0|0|1|0  0|1
     1|0|1|0  1|0
     0|1|1|0  0|1
     1|1|1|0  1|0
     0|0|0|1  0|0
     1|0|0|1  0|1
     0|1|0|1  0|0
     1|1|0|1  0|1
     0|0|1|1  0|1
     1|0|1|1  1|0
     0|1|1|1  0|1
     1|1|1|1  1|0

    ALU_TYPE_COUNTDOWN
    TYPE 7    C + A = |{ CIN, I0 };
              A'C' + AC = |{ &{!I0, !CIN}, &{I0, CIN} };
     0|0|0|0  0|1
     1|0|0|0  1|0
     0|1|0|0  0|1
     1|1|0|0  1|0
     0|0|1|0  1|0
     1|0|1|0  1|1
     0|1|1|0  1|0
     1|1|1|0  1|1
     0|0|0|1  0|1
     1|0|0|1  1|0
     0|1|0|1  0|1
     1|1|0|1  1|0
     0|0|1|1  1|0
     1|0|1|1  1|1
     0|1|1|1  1|0
     1|1|1|1  1|1

    ALU_TYPE_COUNTUPDOWN
    TYPE 8    CD' + AD' + AC = |{ &{CIN, !I3}, &{I0, !I3}, &{I0, CIN} };
              A'C'D' + A'CD + AC'D + ACD' = |{ &{!I0, !CIN, !I3}, &{!I0, CIN, I3}, &{I0, !CIN, I3}, &{I0, CIN, !I3} };
     0|0|0|0  0|1
     1|0|0|0  1|0
     0|1|0|0  0|1
     1|1|0|0  1|0
     0|0|1|0  1|0
     1|0|1|0  1|1
     0|1|1|0  1|0
     1|1|1|0  1|1
     0|0|0|1  0|0
     1|0|0|1  0|1
     0|1|0|1  0|0
     1|1|0|1  0|1
     0|0|1|1  0|1
     1|0|1|1  1|0
     0|1|1|1  0|1
     1|1|1|1  1|0

    ALU_TYPE_MULTIPLIER
    TYPE 9    ABC = &{ I0, I1, CIN };
              A'C + B'C + ABC' = |{ &{!I0, CIN}, &{!I1, CIN}, &{I0, I1, !CIN} };
     0|0|0|0  0|0
     1|0|0|0  0|0
     0|1|0|0  0|0
     1|1|0|0  0|1
     0|0|1|0  0|1
     1|0|1|0  0|1
     0|1|1|0  0|1
     1|1|1|0  1|0
     0|0|0|1  0|0
     1|0|0|1  0|0
     0|1|0|1  0|0
     1|1|0|1  0|1
     0|0|1|1  0|1
     1|0|1|1  0|1
     0|1|1|1  0|1
     1|1|1|1  1|0

After about two tries, I cheated and used an online K-map calculator to pinpoint my errors: http://www.32x8.com/
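
For anyone who wants to drop the measured equations straight into a simulator, here is a minimal behavioral sketch of a single cell. It is only a transcription of the COUT/SUM equations listed in the truth tables, with A = I0, B = I1, C = CIN, D = I3. The module name `alu_cell_model` and the `MODE` parameter are made up for this sketch; this is not the attached ALU.v and not Gowin's own primitive model.

```verilog
// Behavioral sketch of one ALU cell, transcribed from the measured equations.
// Module and parameter names are hypothetical, not Gowin's.
module alu_cell_model #(
    parameter integer MODE = 0          // 0..9, numbered as in the truth tables
) (
    input  wire I0,
    input  wire I1,
    input  wire I3,
    input  wire CIN,
    output reg  COUT,
    output reg  SUM
);
    always @* begin
        case (MODE)
            0: begin                    // ADD
                COUT = (I1 & CIN) | (I0 & CIN) | (I0 & I1);
                SUM  = I0 ^ I1 ^ CIN;
            end
            1, 4: begin                 // SUB and GREATER_EQ measured identical
                COUT = (~I1 & CIN) | (I0 & ~I1) | (I0 & CIN);
                SUM  = ~(I0 ^ I1 ^ CIN);
            end
            2: begin                    // ADDSUB: I3=1 adds, I3=0 subtracts
                COUT = I3 ? ((I1 & CIN) | (I0 & CIN) | (I0 & I1))
                          : ((~I1 & CIN) | (I0 & ~I1) | (I0 & CIN));
                SUM  = ~(I0 ^ I1 ^ CIN ^ I3);
            end
            3: begin                    // NOTEQ
                COUT = CIN | (I0 ^ I1);
                SUM  = ~(I0 ^ I1 ^ CIN);
            end
            5: begin                    // LESS_EQ
                COUT = (~I0 & CIN) | (~I0 & I1) | (I1 & CIN);
                SUM  = ~(I0 ^ I1 ^ CIN);
            end
            6: begin                    // COUNTUP (I1 and I3 ignored)
                COUT = I0 & CIN;
                SUM  = I0 ^ CIN;
            end
            7: begin                    // COUNTDOWN (I1 and I3 ignored)
                COUT = I0 | CIN;
                SUM  = ~(I0 ^ CIN);
            end
            8: begin                    // COUNTUPDOWN: I3=1 up, I3=0 down
                COUT = I3 ? (I0 & CIN) : (I0 | CIN);
                SUM  = ~(I0 ^ CIN ^ I3);
            end
            9: begin                    // MULTIPLIER
                COUT = I0 & I1 & CIN;
                SUM  = (I0 & I1) ^ CIN;
            end
            default: begin              // unknown mode: propagate X
                COUT = 1'bx;
                SUM  = 1'bx;
            end
        endcase
    end
endmodule
```

The muxed forms in modes 2 and 8 are just the measured sum-of-products split on I3; for example CD' + AD' + AC reduces to A + C when D = 0 and to AC when D = 1.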

The effect of the ALU_CHAIN.CIN value is as follows:

Code:

    +   ALU_TYPE_ADD          .CIN should be set to 'LOW'.    otherwise it will count up   by 1    .COUT indicates overflow when HIGH
    -   ALU_TYPE_SUB          .CIN should be set to 'HIGH'.   otherwise it will count down by 1.   .COUT indicates underflow when LOW
    +-  ALU_TYPE_ADDSUB       The effect is based on I3.      I3('b0) for subtraction,  I3('b1) for addition. (see .CIN & .COUT above)
    !=  ALU_TYPE_NOTEQ        .CIN should be set to 'LOW'.    otherwise .COUT will return HIGH     .COUT (LOW I0 == I1 ) (HIGH I0 != I1)
    >=  ALU_TYPE_GREATER_EQ   .CIN should be set to 'HIGH'.   otherwise function is I0 > I1        .COUT (LOW I0 <  I1 ) (HIGH I0 >= I1)
    <=  ALU_TYPE_LESS_EQ      .CIN should be set to 'HIGH'.   otherwise function is I0 < I1        .COUT (LOW I0 >  I1 ) (HIGH I0 <= I1)
    ++  ALU_TYPE_COUNTUP      .CIN should be set to 'HIGH'.   otherwise the function is paused     .COUT indicates overflow when HIGH
    --  ALU_TYPE_COUNTDOWN    .CIN should be set to 'LOW'.    otherwise the function is paused     .COUT indicates underflow when LOW
    +-  ALU_TYPE_COUNTUPDOWN  The effect is based on I3.      I3('b0) for subtraction,  I3('b1) for addition. (see .CIN & .COUT above)
    *   ALU_TYPE_MULTIPLIER   its complicated

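
To make the COUNTUP row concrete, here is a small sketch that chains the measured per-bit COUNTUP equations (SUM = I0 ^ CIN, COUT = I0 & CIN) into a 4-bit incrementer and checks it exhaustively against `value + cin`. The names `countup_chain_model`, `value_next` and the testbench are invented for illustration; the sketch models the equations only and says nothing about how the real chain is placed or timed.

```verilog
// Chain of COUNTUP cells built from the measured per-bit equations.
// All names here are hypothetical.
module countup_chain_model #(
    parameter WIDTH = 4
) (
    input  wire [WIDTH-1:0] value,
    input  wire             cin,        // HIGH = count, LOW = hold (paused)
    output wire [WIDTH-1:0] value_next,
    output wire             cout        // HIGH when the increment overflows
);
    wire [WIDTH:0] carry;
    assign carry[0] = cin;

    genvar i;
    generate
        for (i = 0; i < WIDTH; i = i + 1) begin : g_bit
            assign value_next[i] = value[i] ^ carry[i];   // SUM  = I0 ^ CIN
            assign carry[i+1]    = value[i] & carry[i];   // COUT = I0 & CIN
        end
    endgenerate

    assign cout = carry[WIDTH];
endmodule

// Self-checking testbench: tries every value/cin combination.
module countup_chain_tb;
    reg  [3:0] value;
    reg        cin;
    wire [3:0] value_next;
    wire       cout;
    integer    n;
    integer    errors;

    countup_chain_model #(.WIDTH(4)) dut (
        .value(value), .cin(cin), .value_next(value_next), .cout(cout)
    );

    initial begin
        errors = 0;
        for (n = 0; n < 32; n = n + 1) begin
            {cin, value} = n[4:0];
            #1;
            // Both sides are evaluated at 5 bits, so the carry is kept.
            if ({cout, value_next} !== value + cin) begin
                errors = errors + 1;
                $display("MISMATCH value=%b cin=%b", value, cin);
            end
        end
        $display("finished with %0d mismatches", errors);
        $finish;
    end
endmodule
```

With `cin` LOW the chain passes `value` through unchanged, which is the "paused" behaviour noted in the table, so `cin` can double as a count enable.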
Attached are:
AUL_truth_tables.v - used to generate the truth tables
ALU.v - used to simulate the ALU primitives for the tang nano 9k.

(rant)
This has been a challenge.
Quote
Has to be an easier way to extract this information.
The first 3 chapters of my text book.

I have had great difficulty getting Gowin's toolchain to synthesize simple counters, FIFO pointers, really any arithmetic, at 100MHz.
Sometimes it will infer an ALU; most of the time it will not. This is down to a combination of my inexperience with Verilog and the IDE settings.

For example:

Code:

    always @( posedge clk ) begin
        if( enable ) begin                  // when running
            counter <= counter + 'd1;
            if( counter >= reset_value ) begin
                strobe_ff <= 1'b1;
            end
        end
        if( !rst_n || strobe_ff ) begin       // reset the counter on reset
            counter <= 'd1;
            strobe_ff <= 1'b0;
        end
    end

Fmax values are taken from the timing report, not the inaccurate synthesis report.
counter is defined as

Code:

    reg unsigned [15:0] counter = 0;

The critical path here is 'counter <= counter + 'd1;'. Because counter is also set elsewhere (line #9), there is a mux in front of it, and I get an Fmax of around 80~95MHz.

By swapping that line out for an explicitly declared ALU, Fmax reaches 150~175MHz.
When I remove the reset and allow a free-running counter (removing the mux), Fmax increases further.
Using 'CIN' as an enable port removed another level of logic.
And by adding registers to calculate and store the reset value, I get an Fmax of 197.8MHz.
I'm so close to 200 I can almost touch it.
In the Floor Planner (the placement constraints editor), I can see that the ALU_CHAIN is broken up across multiple slices, and these slices are not adjacent to one another. I'm hoping a few constraints will fix the issue and get me past the 200 mark. (Future me here: the synthesis and PnR tools have settings that default to optimizing for area, not speed. Changing these now infers an ALU for everything, tightly packed and fast. Still not 200MHz yet.)
SoC SDRAM runs at 208MHz; that's my goal. Their protected IP HSFIFO builds at speeds approaching 250MHz, so I know it is doable.

« Last Edit: May 16, 2024, 11:45:53 pm by Rainwater »
"You can't do that" - challenge accepted

#### laugensalm

• Regular Contributor
• Posts: 125
• Country:
##### Re: Notes on Gowin ALU Primitive Usage
« Reply #1 on: May 25, 2024, 07:32:20 am »
These are interesting observations. I haven't dug into the layout of the Gowin silicon any further, but I remember a similar situation on Xilinx: when running out of multipliers, f_max optimization has to be done manually, as the tools exhibit random behaviour or end up synthesizing for a very long time.

Have you tried replacing the binary counters by gray logic? That's what most FIFO architectures rely on to allow higher frequencies while supporting differing in/out clocks, like the FIFOHS IP core. I would be surprised if this IP core allocated an ALU.

#### gnuarm

• Super Contributor
• Posts: 2247
• Country:
##### Re: Notes on Gowin ALU Primitive Usage
« Reply #2 on: May 25, 2024, 09:03:08 am »
I was going to use the Gowin parts on a project, but it was shot down by the customer.  My customer sells a lot to the US government and Gowin had been placed on a ban list of being too aligned with the Chinese military.  They managed to get off the list, but the customer was still not interested, so I planned on using parts from a Chinese company with their headquarters in the US.  I guess it pays to not be so obvious.

Anyway, I remember that the documentation was very, very poor on the various functional blocks like the ALUs.  I seem to recall that I just let the synthesizer do what it wanted to.  As long as it met my speed requirements, the parts were large enough to do it in random logic.

I remember going through a US sales rep or distributor to get some details on something else I was using, likely a differential input pair as part of a sigma/delta ADC.  They had an app note on it, but were applying the RC in a funny way.  When I tried to discuss this, they literally could not understand the concept of not following the app note blindly.  I never got any useful info from them.

I would recommend the Efinix parts over Gowin.  Or, I should say I recommend Efinix support over Gowin.
Rick C.  --  Puerto Rico is not a country... It's part of the USA

#### Rainwater

• Regular Contributor
• Posts: 60
• Country:
##### Re: Notes on Gowin ALU Primitive Usage
« Reply #3 on: May 25, 2024, 07:15:23 pm »
I selected this dev board based on the beginner Verilog tutorials posted at
https://learn.lushaylabs.com/tang-nano-series/
Their plugin for Visual Studio made my first experience with this dev board easy and pleasurable. But I very quickly outgrew SymbiYosys's limitations with Gowin products, so I started using the Gowin IDE. Visual Studio is still my go-to for writing and testing, but building, timing, and programming are best left to the manufacturer's software.
edit:
Quote
Have you tried replacing the binary counters by gray logic

Honestly, that never occurred to me as a possibility. I understand the concept from reading about Gray encoding, but have yet to apply it to a project. I've mainly been focused on comparing what the post-synthesis Verilog structures look like against the circuit I want. Sometimes post-synthesis is better, most of the time it is equivalent, and a few times it is not.
I only discovered what an FPGA was a few months ago, so there is currently a lot of learning going on. I started C/C++ in the 90s, at amateur level. The transition has been very smooth; I really wish I had learned this first, but these tools were not available at the time.
In short, I know what circuits I need to string together to make what I want, but I have yet to learn to write Verilog in a manner that properly describes this without explicitly writing primitives.
« Last Edit: May 25, 2024, 10:04:02 pm by Rainwater »

#### laugensalm

• Regular Contributor
• Posts: 125
• Country:
##### Re: Notes on Gowin ALU Primitive Usage
« Reply #4 on: May 26, 2024, 11:25:49 am »
Quote
I was going to use the Gowin parts on a project, but it was shot down by the customer.  My customer sells a lot to the US government and Gowin had been placed on a ban list of being too aligned with the Chinese military.  They managed to get off the list, but the customer was still not interested, so I planned on using parts from a Chinese company with their headquarters in the US.  I guess it pays to not be so obvious.

Somehow this reminds me of the ridiculous accusations against Huawei, where an open telnet port was proof enough for some three letter government organisation to shout for a ban.
(I've done some reversing).
How was Xilinx 'aligned' again?
Although this might sound paradoxical, and I'm certainly not aligned with Chinese politics: from the technical aspect, there's no reason not to go with Gowin for the long term.

Anyhow, back to the subject:
Gray counters will take some tension out on most sane architectures, even when a clock domain crossing is not really required. They'll also use less power and create less digital noise. Some LFSR architectures can reduce logic usage even further and run faster, but obviously they only work as counters where the value is merely compared against. For sync FIFOs with the same in/out clock they do fine.

You can simply create a gray counter here:
https://mybinder.org/v2/gh/hackfin/cyrite.howto.git/master?labpath=examples%2Fgray_counter.ipynb

Just run each cell and modify the SIZE parameter, once it's up (random startup time applies).
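
For readers who would rather see the idea in plain Verilog, here is a minimal sketch of one common Gray counter form: a binary register plus a registered binary-to-Gray conversion, so that exactly one bit of `gray` changes per count. This is not the code from the notebook linked above. The module name, port names and the active-low synchronous reset are assumptions of this sketch; only the `SIZE` parameter name is borrowed from the notebook.

```verilog
// Gray counter sketch: binary count register + registered Gray output.
// Hypothetical module; not the design generated by the linked notebook.
module gray_counter_sketch #(
    parameter SIZE = 4
) (
    input  wire            clk,
    input  wire            rst_n,     // synchronous, active low (assumption)
    input  wire            enable,    // count when HIGH, hold when LOW
    output reg  [SIZE-1:0] gray
);
    reg  [SIZE-1:0] bin;
    wire [SIZE-1:0] bin_next = bin + enable;   // enable is zero-extended

    always @(posedge clk) begin
        if (!rst_n) begin
            bin  <= {SIZE{1'b0}};
            gray <= {SIZE{1'b0}};
        end else begin
            bin  <= bin_next;
            // Standard binary-to-Gray conversion: g = b ^ (b >> 1)
            gray <= bin_next ^ (bin_next >> 1);
        end
    end
endmodule
```

Note that this form still contains a binary incrementer, so on its own it does not remove the carry chain; it only guarantees the single-bit-change property of the output. Whether it helps Fmax on a given part would have to be checked in the timing report.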
