Topic: Clock Enable and timing constants

Rainwater · « **on:** May 25, 2024, 10:45:29 pm »

I have avoided using any timing constraints other than clock speed. But now I want to use a `clock enable` pin for some more complex logic that just can't be done in one clock cycle.
So now I have a lot of red setup violations in my timing report (103) and want to know if there is an easier way to write these constraints.
Really simple: my external clock is 27 MHz and feeds an rPLL with a 200.6 MHz output (clk_200).
Attached to clk_200 is a counter and register; for every 8 'clk_200' ticks, I get one tick on 'ce_25mhz'.

My first attempt was to write

Code:

set_multicycle_path -through [get_nets {ce_25mhz}] -setup -end 8
set_multicycle_path -through [get_nets {ce_25mhz}] -hold -end 7

My thinking was that the toolchain would see the constraint on ce_25mhz and carry it through into the modules using it.
This didn't work; the violation count only went down to 99.

So my next step was to break out the pen and paper, go through all the Verilog using CE, and record the registers that are clocked on CE.
After writing a constraint containing 35 registers, I now pass timing.

This cannot be the proper way. This process would be worth automating. Surely I'm mucking it up.
Any advice would be greatly appreciated.

BrianHG · « **Reply #1 on:** May 25, 2024, 11:15:24 pm »

I'm not an expert, but making a clock enable input with a 7 to 8 cycle clock delay from your 27 MHz reference would be nearly impossible to fit properly in your design's timing, since a delay window 7-8x slower than 27 MHz would need to be achieved in async logic or IO pin delay on the FPGA fabric. This is a huge delay. For multicycle, the most I've used in the past was 2-1 for transferring data from a 400 MHz domain to a 300 or 200 MHz domain.

What I would do is have a single DFF running at your high-speed system clock sample your CE pin at 200.6 MHz; the output of that DFF is then used to enable your 200.6 MHz code.

The only timing .sdc entry should be the input pin's relative setup & hold with reference to your 200.6 MHz clock.

Then, I would make a 'falsepath' from the input pin to the 27 MHz clock so that the compiler does not attempt to constrain that input's timing against the clock going into my PLL source.

Note that this is how I would approach this issue.
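
A minimal sketch of the sampling register described above, assuming the enable really does arrive on an external pin. The names `ce_sampler`, `ce_pin` and `ce_sync` are made up for illustration; only `clk_200` comes from the thread. A second register stage is included because that is the usual precaution against metastability on an asynchronous input, not something stated in this reply.

```verilog
`default_nettype none

// Samples an asynchronous enable input into the 200.6 MHz domain.
// Module and port names are hypothetical.
module ce_sampler (
    input  wire clk_200,   // PLL output clock
    input  wire ce_pin,    // external, asynchronous enable input
    output wire ce_sync    // enable, synchronous to clk_200
);
    reg ce_meta = 1'b0;    // first stage: may go metastable
    reg ce_q    = 1'b0;    // second stage: settled value, safe to fan out

    always @(posedge clk_200) begin
        ce_meta <= ce_pin;
        ce_q    <= ce_meta;
    end

    assign ce_sync = ce_q;
endmodule

`default_nettype wire
```

Everything downstream would then use `ce_sync` and be timed as ordinary single-cycle clk_200 logic.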

nctnico · « **Reply #2 on:** May 25, 2024, 11:26:25 pm »

Quote from: Rainwater on May 25, 2024, 10:45:29 pm

I have avoided using any timing constraints other than clock speed. But now I want to use a `clock enable` pin for some more complex logic that just can't be done in one clock cycle.
So now I have a lot of red setup violations in my timing report (103) and want to know if there is an easier way to write these constraints.
Really simple: my external clock is 27 MHz and feeds an rPLL with a 200.6 MHz output (clk_200).
Attached to clk_200 is a counter and register; for every 8 'clk_200' ticks, I get one tick on 'ce_25mhz'.

My first attempt was to write
Code:
set_multicycle_path -through [get_nets {ce_25mhz}] -setup -end 8
set_multicycle_path -through [get_nets {ce_25mhz}] -hold -end 7

My thinking was that the toolchain would see the constraint on ce_25mhz and carry it through into the modules using it.
This didn't work; the violation count only went down to 99.

So my next step was to break out the pen and paper, go through all the Verilog using CE, and record the registers that are clocked on CE.
After writing a constraint containing 35 registers, I now pass timing.

This cannot be the proper way. This process would be worth automating. Surely I'm mucking it up.

I don't think you are mucking things up. I don't recognise the timing constraint syntax you are using but it might be possible to group all the registers enabled by the CE line and have a single constraint. An alternative approach is to create a lower frequency clock from the PLL and use that to clock the lower speed logic. Typically the constraints will be generated automatically because the routing software now knows the relationships between the clocks by looking at the PLL configuration. Or at least you can create a constraint for the lower speed clock. The advantage of this approach is that you can add / remove logic from the low speed clock domain without needing to change the constraints. The CE line becomes a strobe to kick the lower speed logic into action.

Rainwater · « **Reply #3 on:** May 26, 2024, 01:06:17 am »

My apologies for not explaining this better. The external clock is only being used to drive the PLL; everything else comes from the 200 MHz PLL output. That output is how I'm generating a 25 MHz CE pin. The external clock is not being used for any logic, just the PLL.

I have a clk_200 domain, which runs a few modules (SRAM, external input synchronizers, and soon some DSPs).
I want to use a clock enable pin (CE pin) to drive other logic that doesn't need that high speed, such as a UART, some 2 kHz PWM and a few SPI interfaces that max out at 2 MHz.

All the reading I have done says there are two solutions to this problem: #1 is to generate another clock domain, and #2 is to use a CE pin.
There are pros and cons to each solution. Solution #2 is better suited to my needs and design.
My device does not have dual-port RAM.
Solution #1 would complicate access to and from the FIFOs between the slow clock and the fast clock: an access to the read or write enable pin of the FIFO would span multiple fast-clock ticks, resulting in multiple reads/writes with each slow-clock access. It requires a complex setup to solve (edge detection) and overcomplicates burst writes/reads from the slow clock domain.

Solution #2 does not have this problem. I can toggle pins as fast as my clock speed, at the cost of routing resources for an additional CE pin with a high fanout.

When I say pins, I am not talking about external pins, but ports within the Verilog modules.

I have been able to greatly simplify the constraint by adding a prefix to the registers used in the multi-cycle path, then using wildcards to search for these nets and apply the MCP.
https://cdn.gowinsemi.com.cn/SUG940E.pdf is the timing constraints user guide for my device.

BrianHG · « **Reply #4 on:** May 26, 2024, 02:16:02 am »

If the CE pin is just a system enable, not a timing-critical logic point which must meet the 5 ns 200 MHz timing at every endpoint in your design, then just use 'set_false_path' for that net. Your compiler will route everything else as timing critical and just route the CE so that it gets there whenever.

This means it will be easier for your compiler to focus on the important timing elsewhere.

Looking at one of my .sdc files here:
BrianHG_DDR3_DECA.sdc (Scroll down to the 'set_false_path' section...)

You can see I even use 'set_false_path' between my MAX 10's 50 MHz clock input pin to the PLL source input and all my DDR3 core clocks and VGA clocks. Without this exclusion, the Quartus compiler would try to not only sync the PLL core clock with its driven IOs, but also with the 50 MHz reference source input pin and my VGA clock source and its PLL. Cutting these paths means I don't care about the PLL's timing and uncertainty between its input and core with IO; I only care about the PLL's core outputs and its driven IOs. That makes it easier for Quartus, which no longer has to route around an additional timing constraint for the 50 MHz source that I only use as a reference for the system PLL.

nctnico · « **Reply #5 on:** May 26, 2024, 10:19:03 am »

Quote from: BrianHG on May 26, 2024, 02:16:02 am

If the CE pin is just a system enable, not a logic timing critical point which must meet the 5ns 200Mhz timing at every endpoint in your design, then just use the 'set_false_path' for that net.

I don't think that is a good idea because then you'll get random clock enables at points in time where you don't expect it (while the logic is still stabilising).
CE has to be a global net but since there is no logic involved, routing it to meet setup & hold for a 5ns clock period shouldn't be a problem. So even though the CE line is enabled once every few clocks, ALL the flipflops affected by the CE line need to act on the SAME clock edge for the design to work.

Rainwater · « **Reply #6 on:** May 26, 2024, 11:11:18 am »

After some hardware testing, my design did not work with all the MCP constraints, so I removed the constraints and uploaded the code full of timing errors, which runs smoothly.
Digging deeper, my worst-case slack without timing constraints is -0.8 ns, so I'm not failing by much.
After a little brainstorming and some sleep, I turned the MCP back on, debounced a button and drove my ce_25mhz with it.
The observations led me to generate a report on the ce_25mhz pin itself, which had a total delay of 12 ns when using the MCP constraints. 80% was routing delays and replication.
I removed the CE pin from the constraint; it now has positive slack and the design is functional.
The CE pin gets && and || in a few places; I might need to optimize this away.

This still leads me to the conclusion that I have to hand-pick every register clocked on the CE pin to use within the timing constraints for this style of logic control to work.

BrianHG · « **Reply #7 on:** May 26, 2024, 11:22:30 am »

Quote from: nctnico on May 26, 2024, 10:19:03 am

Quote from: BrianHG on May 26, 2024, 02:16:02 am
If the CE pin is just a system enable, not a logic timing critical point which must meet the 5ns 200Mhz timing at every endpoint in your design, then just use the 'set_false_path' for that net.
I don't think that is a good idea because then you'll get random clock enables at points in time where you don't expect it (while the logic is still stabilising).
CE has to be a global net but since there is no logic involved, routing it to meet setup & hold for a 5ns clock period shouldn't be a problem. So even though the CE line is enabled once every few clocks, ALL the flipflops affected by the CE line need to act on the SAME clock edge for the design to work.

I'm still expecting 'Rainwater' to take the CE input pin and clock it through a DFF running at his 200 MHz core clock before feeding his logic's multiple internal CEs. That DFF's output should still have normal timing constraints and, once set, all the daughter logic should trigger within a single 200 MHz clock cycle. Only the source IO pin will no longer be constrained to the source 27 MHz clock.

If 'Rainwater' is having trouble meeting timing constraints and needs an absurd 7-8 multicycle from a 27 MHz clock domain source, then maybe he should instead DFF the CE input at 27 MHz, then pass that DFF output through a 200 MHz DFF, then use that DFF's output as his internal CE.

If 27 MHz is his source clock and his PLL-derived output is 200.6 MHz, that odd ratio means the source 27 MHz domain reference needs to be guaranteed to sit on any phase of his PLL's 200.6 MHz output, with the added uncertainty between the PLL's input and its output. This can be a timing nightmare. Multiple DFFs clocked in the 2 clock domains before distributing the node to the rest of your design mean the compiler only needs to worry about 1 single net in between to perfect its timing. Or just use the false_path.

The other choice is to use an exact integer multiple of the clock input for your PLL's output. IE: choosing a 216 MHz core (a frequency I use a lot, as I have done a lot of SMPTE broadcast video designs...) means a perfect 8:1 from your 27 MHz source reference, so the compiler knows the constant relationship between the 27 MHz domain and the PLL output.

Rainwater · « **Reply #8 on:** May 26, 2024, 11:58:11 am »

Quote from: BrianHG on May 26, 2024, 11:22:30 am

I'm still expecting 'Rainwater' to take the CE input pin and clock it through a DFF running at his 200mhz core clock before feeding his logic internal multiple CEs.

The CE is internally generated, not external.
The 200 mhz clock is counted on a 3 bit register, the carry out feeds a DFF that is the CE_25mhz pin.
My terminology is horrible. I keep calling this a pin; it is not, it is an internal register. I apologize for the confusion.
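
For reference, the arrangement described here can be sketched roughly as follows. This is a guess at the structure from the description (a 3-bit counter whose carry out is registered), not the actual project code; the module name `ce_div8` and its port names are hypothetical.

```verilog
`default_nettype none

// Divide-by-8 clock enable generator (hypothetical names).
// With clk at 200.6 MHz the enable pulses at 200.6 / 8 = 25.075 MHz.
module ce_div8 (
    input  wire clk,   // 200.6 MHz PLL output (clk_200 in the thread)
    input  wire rst,   // synchronous reset
    output wire ce     // high for one clk period in every eight
);
    reg [2:0] count = 3'd0;
    reg       ce_ff = 1'b0;

    always @(posedge clk) begin
        if (rst) begin
            count <= 3'd0;
            ce_ff <= 1'b0;
        end else begin
            count <= count + 3'd1;        // wraps 7 -> 0
            ce_ff <= (count == 3'd7);     // carry out, registered in a DFF
        end
    end

    assign ce = ce_ff;
endmodule

`default_nettype wire
```

The enable is the direct output of a flip-flop, so the only delay on the CE net itself is routing.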

BrianHG · « **Reply #9 on:** May 26, 2024, 12:29:58 pm »

Quote from: Rainwater on May 26, 2024, 11:58:11 am

Quote from: BrianHG on May 26, 2024, 11:22:30 am
I'm still expecting 'Rainwater' to take the CE input pin and clock it through a DFF running at his 200mhz core clock before feeding his logic internal multiple CEs.
The CE is internally generated, not external.
The 200 mhz clock is counted on a 3 bit register, the carry out feeds a DFF that is the CE_25mhz pin.
My terminology is horrible. I keep calling this a pin; it is not, it is an internal register. I apologize for the confusion.

Ok, this changes things by quite an amount. If all your CEs need parallel activation, then you cannot use my 'set_false_path' trick. Your design will go corrupt.

Now, how is the CE generated? Using a carry out from a 3 bit counter sounds like it may be generated from combinational logic. This is a no-no when designing high speed enables.

You need to pass the carry out through a simple DFF sort of like so:
always @(posedge clk) CE_for_my_logic <= CE_from_my_carry_out;

This will make your CE driven from a single DFF logic cell, with no combinational logic in between to add nets for routing timing.

To make things even faster, you can split your CEs in parallel: (you need to disable 'remove duplicate logic' for these registers)

always @(posedge clk) begin
    CE_for_my_logic_1 <= CE_from_my_carry_out;
    CE_for_my_logic_2 <= CE_from_my_carry_out;
    CE_for_my_logic_3 <= CE_from_my_carry_out;
end

Or, sequentially: (potentially the best FMAX if your logic can get away with it, or just run everything on output #2...)
always @(posedge clk) begin
    CE_for_my_logic_1 <= CE_from_my_carry_out;
    CE_for_my_logic_2 <= CE_for_my_logic_1;
end

Now, if you need everything to begin 1 clock early, all you need to do is make 'CE_from_my_carry_out' into a 'look ahead carry out' so that it is ready 1 clock early before getting the DFF delay treatment.

Also, when using multicycle, only use a multicycle setup of 2 and hold of 1. If you use anything larger, then your logic design will completely fail unless you have taken specific measures to accommodate a random 7-8 clock delay across your entire design. And remember, you still need to have coded appropriately for that setup time of 2 clock cycles. You are better off just piping and DFF clocking your CE logic like I mentioned above, as all you are doing is forcing a man-made setup time of 2 clocks and 1 hold cycle, without the random timing constraint set up by the 'multicycle' timing made in your .sdc file.
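
As a rough illustration of the 'look ahead carry out' idea combined with duplicated enable registers, here is a sketch assuming a free-running 3-bit counter. The module and signal names are hypothetical. The attribute or setting needed to stop the tool from merging the two identical registers is vendor-specific and is not shown.

```verilog
`default_nettype none

// Registered clock enable with a look-ahead decode (hypothetical names).
// Decoding count == 6 instead of count == 7 absorbs the one-clock delay
// of the output DFF, so the enable is high while count == 7.
module ce_div8_lookahead (
    input  wire clk,
    output wire ce_a,   // copy for one group of loads
    output wire ce_b    // identical copy for a second group of loads
);
    reg [2:0] count   = 3'd0;
    reg       ce_a_ff = 1'b0;
    reg       ce_b_ff = 1'b0;

    // True one clock before the counter reaches its terminal value.
    wire carry_next = (count == 3'd6);

    always @(posedge clk) begin
        count   <= count + 3'd1;   // wraps 7 -> 0
        ce_a_ff <= carry_next;
        ce_b_ff <= carry_next;     // duplicate register to split the fanout
    end

    assign ce_a = ce_a_ff;
    assign ce_b = ce_b_ff;
endmodule

`default_nettype wire
```

Both copies change on the same clock edge, so every load still sees the enable in the same cycle.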

Rainwater · « **Reply #10 on:** May 26, 2024, 12:33:47 pm »

Sorry, we were posting at the same moment.
Here is a simplistic example:

Code:

`default_nettype none

`define second_length  'd25000000

module counter_with_strobe
    #( 
            parameter WIDTH = 4
    )
    (
        input   wire                rst,
        input   wire                clk,
        input   wire                enable,
        input   wire [WIDTH-1:0]    reset_value,
        output  wire                strobe
    );
    reg     [WIDTH-1:0]    cws_counter_ff   = 'd1;
    reg                    strobe_ff        = 0;
    assign                 strobe           = strobe_ff;

    always @( posedge clk ) begin
        if( enable ) begin  // optional multi-cycle path
            cws_counter_ff     <= cws_counter_ff + 'd1;
            if( cws_counter_ff >= reset_value ) begin
                cws_counter_ff <= 'd1;
                strobe_ff <= 1'b1;
            end
        end                 // end of multi-cycle path
        if( rst )
            cws_counter_ff <= 'd1;
        if( rst || strobe_ff)
            strobe_ff <= 1'b0;
    end
endmodule

module top(
    input   wire            clk_27,
    output  wire    [5:0]   led,
    input   wire            btn1
);
    // 200MHz clk
    wire clk_200;
    wire clk_200_lock;
    Gowin_rPLL pll_clk_gen_200(
        .clkout(    clk_200 ), //output clkout
        .lock(      clk_200_lock ),
        .clkin(     clk_27  ) //input clkin
    );

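    wire rst = ~clk_200_lock;   // assumption: 'rst' was undeclared in the post; held in reset until the PLL locks
    wire ce_25mhz;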
    counter_with_strobe #( .WIDTH( $clog2('d8) + 1 ) ) 
    clk_enable_25mhz
    (   .clk  (         clk_200 ),
        .rst(           rst ),
        .enable(        1'b1),      // not using multi cycle path
        .reset_value(   'd8 ),
        .strobe(        ce_25mhz )
    );

    wire second_strobe;
    counter_with_strobe #( .WIDTH( $clog2(`second_length) + 1 ) ) 
    second_counter
    (   .clk  (         clk_200 ),
        .rst(           rst ),
        .enable(        ce_25mhz),          // using multi cycle path
        .reset_value(   `second_length ),
        .strobe(        second_strobe )
    );

// drive leds so the code doesnt get optimized away
    reg     [5:0] led_reg = 0;
    assign led = led_reg;
    reg     second_ticker = 0;
    always @( posedge clk_200 )
        if( second_strobe )
            second_ticker <= ~second_ticker;

    always @( posedge clk_200 ) begin
        led_reg[5] <= second_ticker;
    end
endmodule

the timing constraints to make this large counter work have to include 'second_counter/cws_counter_ff*'. to tell the timing analyzer that it is ok for the addition and `greater than or equal to` operations to take more than 1 clock cycle.
My question is:
Will I always have to specify each vector/register, or is there a way to tell the toolchain to apply the constraint to anything that involves the 'ce_25mhz' register?

BrianHG · « **Reply #11 on:** May 26, 2024, 05:46:42 pm »

Ok, your code looks fine. The 'strobe_ff <= 1'b1;' means your strobe's output is already a single flip-flop, so everything is as fast as it can be. (Yes, there are tricks to do things faster if you are driving a huge chunk of your FPGA fabric all in parallel, but we will stay away from that for now...)

Quote

the timing constraints to make this large counter work have to include 'second_counter/cws_counter_ff*'. to tell the timing analyzer that it is ok for the addition and `greater than or equal to` operations to take more than 1 clock cycle.

Ok, now we have a problem. You should not have any timing constraints allowing your CE to deal with huge counters. Such a constraint is telling the compiler that you do not mind if the CE enables/disables portions of your logic on the current clock cycle, or a cycle late. You may get lucky, but your 25 MHz CE counter running at 200 MHz may have portions which try to increment on the current 200 MHz cycle while others increment on the next, creating random corruption which you may not notice during operation.

In this case, nctnico is correct:

Quote from: nctnico on May 26, 2024, 10:19:03 am

I don't think that is a good idea because then you'll get random clock enables at points in time where you don't expect it (while the logic is still stabilising).

I get the feeling that your:

Quote

to tell the timing analyzer that it is ok for the addition and `greater than or equal to` operations to take more than 1 clock cycle.

Are you talking about another counter's reset cycle in your system (a 30-bit one, for example), or are you talking about your 'module counter_with_strobe'?

Because of your input 'input wire [WIDTH-1:0] reset_value,' and the 'if( cws_counter_ff >= reset_value )', if 200 MHz is at the edge of your FPGA's speed capabilities, a random input from somewhere else instead of a hard-wired parameter means the way you wrote this code can bog down a slow FPGA.

If you like, I can re-write this module up-side-down where even an over 20bit programmable range counter will easily achieve 200mhz if a 4 bit counter is already able to run at 200mhz on your FPGA.

Rainwater · « **Reply #12 on:** May 26, 2024, 10:03:59 pm »

Quote from: BrianHG on May 26, 2024, 05:46:42 pm

You should not have any timing constraints allowing your CE to deal with huge counters. Such a constraint is telling the compiler that you do not mind if the CE enables/disables portions of your logic on the current clock cycle, or, a cycle late.

Correct. As I learned this morning, constraining the CE caused all sorts of problems, as some registers received the enable signal anywhere between 0 and 12 ns.

Quote

I get the feeling that your:
Quote
to tell the timing analyzer that it is ok for the addition and `greater than or equal to` operations to take more than 1 clock cycle.

Are you talking about another for example

No. I'm referencing line numbers 22 & 23:

Code:

            cws_counter_ff     <= cws_counter_ff + 'd1;
            if( cws_counter_ff >= reset_value ) begin

Quote

Because of your input 'input wire [WIDTH-1:0] reset_value,' and the 'if( cws_counter_ff >= reset_value )', if 200 MHz is at the edge of your FPGA's speed capabilities, a random input from somewhere else instead of a hard-wired parameter means the way you wrote this code can bog down a slow FPGA.

The code I posted was a minimal example.
The sample code demonstrates the need for a multicycle path.
As much as I love optimization and minimalist design, I would like to keep this post on topic.
Using a multicycle path allows the design to use a simpler circuit at the cost of learning timing constraints.

Preview/post mixup:
I have ensured that CE will only go high for 1 clock cycle, every 8 cycles,
and that reset_value will remain stable for the duration of the current ce_25mhz period.
I'm trying to learn the best practice for writing a timing constraint that will properly describe this.
Currently this requires 3 constraints:
One through cws_counter_ff, for resetting the initial value and incrementing the value.
One for the >= comparison.
And one for the rst condition.

BrianHG · « **Reply #13 on:** May 26, 2024, 11:04:28 pm »

What I am saying is that so long as the output path of your CE is ok, but it is the magnitude comparator '>=' which is killing your fitter's timing, you can re-write your existing code so that even a programmable length 32bit counter should achieve 200mhz without any timing constraints settings at all. IE: full 200MHz, single clock execution, no BS tricks.

BrianHG · « **Reply #14 on:** May 26, 2024, 11:16:38 pm »

Take a look at this:

Code:

module counter_with_strobe
    #(
            parameter WIDTH = 4
    )
    (
        input   wire                rst,
        input   wire                clk,
        input   wire                enable,
        input   wire [WIDTH-1:0]    reset_value,
        output  wire                strobe
    );
    reg     [WIDTH-1:0]    cws_counter_ff   = 'd1;
    reg                    strobe_ff        = 0;
    assign                 strobe           = strobe_ff;

    always @( posedge clk ) begin

      if (rst) begin
                    cws_counter_ff <= reset_value;
                    strobe_ff      <= 1'b0;
      end else begin

        if( enable ) begin  // optional multi-cycle path

            if (cws_counter_ff == 'd0 || cws_counter_ff == 'd1) begin
                                    cws_counter_ff <= reset_value;   // reload the down counter
                                    strobe_ff      <= 1'b1;
            end else begin
                                    cws_counter_ff <= cws_counter_ff - 1'b1;
                                    strobe_ff      <= 1'b0;
            end
        end                 // end of enable
      end                   // end of !rst
    end                     // end of always@clk

endmodule

On line 25, the '|| cws_counter_ff == 'd1' term inside the 'if' is optional. With it, a reset_value of N gives one strobe every N enabled clocks, so a divisor of 1 gives you an enable every single clock cycle (as does 0). Without it, the period becomes N+1 enabled clocks, so only 0 would give you an enable every clock cycle.

This code should count and enable at 200mhz even if you make it 32bits wide. No fancy timing constraints tricks.

Rainwater · « **Reply #15 on:** May 27, 2024, 01:41:23 am »

Attached is the full implementation of the desired circuit.
It was written with primitives first (TYPE=1), then formally verified, then written in Verilog (TYPE=0).
When speed is needed, TYPE=1 is the choice, but if it meets timing, TYPE=0 takes fewer resources.

Thank you for the module. It takes fewer LUT resources, but changes the function/features of the module.
With your configuration, any change to the reset_value will not take effect until the counter reaches 1 || 0.
My FMax is limited by my cheap hardware; currently a primitive 15-bit ALU runs at around 200 MHz. See this post for details.
I change the value of reset_counter throughout the state machine that uses it, after it has started running. Though not the best practice, I find the logic easier to write, read and follow this way: adjusting the reset_value when I enter a state is easier and takes fewer LUTs than adjusting it before I jump into the state.

Perhaps this better explains how I'm using this one module, but it is not the only one which I wish to use MCP with.
I have been able to make the constraints much more manageable by prefixing 'MCP_' to the register name, then writing the constraints using 'MCP_*' wildcards. This also helps prevent errors when writing/reading the logic, because it is obvious the registers are used differently.
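
A small sketch of the naming convention described above. The module name `mcp_counter`, its ports and the `*MCP_*` pattern are made up for illustration. The constraint lines in the comment reuse the `-through [get_nets ...]` form from the first post; whether that exact wildcard pattern matches the synthesized net names is an assumption to check against SUG940 and the netlist.

```verilog
`default_nettype none

// Every register that only updates when 'ce' is high carries an MCP_ prefix,
// so one wildcard constraint can cover all of them, e.g. (assumed syntax):
//   set_multicycle_path -through [get_nets {*MCP_*}] -setup -end 8
//   set_multicycle_path -through [get_nets {*MCP_*}] -hold -end 7
module mcp_counter #(
    parameter WIDTH = 26
) (
    input  wire             clk,          // clk_200
    input  wire             ce,           // ce_25mhz: high for 1 clk in every 8
    input  wire [WIDTH-1:0] reset_value,  // must stay stable between ce pulses
    output wire             wrapped       // set for one ce period on wrap
);
    reg [WIDTH-1:0] MCP_count   = 'd1;
    reg             MCP_wrapped = 1'b0;

    always @(posedge clk) begin
        if (ce) begin                     // both registers are CE-gated
            if (MCP_count >= reset_value) begin
                MCP_count   <= 'd1;
                MCP_wrapped <= 1'b1;
            end else begin
                MCP_count   <= MCP_count + 1'b1;
                MCP_wrapped <= 1'b0;
            end
        end
    end

    assign wrapped = MCP_wrapped;
endmodule

`default_nettype wire
```

The relaxed timing is only valid for paths that both start and end at registers gated by the same enable. The CE net itself still has to meet single-cycle timing, as found earlier in the thread.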

BrianHG · « **Reply #16 on:** May 27, 2024, 01:57:21 am »

Quote from: Rainwater on May 27, 2024, 01:41:23 am

With your configuration, any change to the reset_value will not take effect until the counter reaches 1 || 0;

Sorry, I was unaware of your different requirement.

Also remember, in an FPGA, LUT usage is not as important as LC usage. LC is usually your final cap.

There are other ways to make a magnitude compare or change the behavior of my code.
For example, you may reset the counter on any change of the reset value.

Or, you may make a 2-clock magnitude compare. IE: create numerous 8x8 bit compares (compare the bottom 8 bits, clocked, and then the top 8 bits, into 2 different regs), then 'AND' those reg results to use as your period reset. This 16-bit magnitude compare will have a much higher FMAX than a true 16-bit to 32-bit compare done in 1 clock cycle. You just need to remember to start your up counter at 2 instead of 1, and make a special global 'IF' in case the input count value is 0 or 1 to immediately set the output CE.

EG:
compare_p1 <= (cws_counter_ff[15:8] >= reset_value[15:8]);
compare_p2 <= (cws_counter_ff[7:0 ] >= reset_value[7:0 ]); // compare_p1/2 are each a 1 bit reg.

if( compare_p1 && compare_p2 ) begin

Need faster, just make 4x 4bit compares 'compare_p1/2/3/4'...
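
One caveat with ANDing two '>=' results: the AND is only true when both halves are '>='. For 16'h0100 against 16'h00FF the upper bytes pass but the lower bytes fail, although the full value is larger. For an up counter that lands exactly on reset_value this does not matter, since both halves are equal at that moment. If a general two-stage magnitude compare is wanted, a common form is sketched below. The module and signal names are hypothetical and this is not taken from anyone's project.

```verilog
`default_nettype none

// Two-stage pipelined 16-bit 'a >= b' (hypothetical names).
// Stage 1 compares each byte separately; stage 2 combines the results.
// 'ge' is valid two clocks after a and b are presented.
module ge16_pipelined (
    input  wire        clk,
    input  wire [15:0] a,
    input  wire [15:0] b,
    output wire        ge
);
    reg hi_gt = 1'b0;   // upper byte of a is greater than upper byte of b
    reg hi_eq = 1'b0;   // upper bytes are equal
    reg lo_ge = 1'b0;   // lower byte of a is greater than or equal
    reg ge_ff = 1'b0;

    always @(posedge clk) begin
        // Stage 1: three narrow compares, each only 8 bits wide.
        hi_gt <= (a[15:8] >  b[15:8]);
        hi_eq <= (a[15:8] == b[15:8]);
        lo_ge <= (a[7:0]  >= b[7:0]);
        // Stage 2: a >= b when the upper byte wins outright, or ties and
        // the lower byte is greater than or equal.
        ge_ff <= hi_gt | (hi_eq & lo_ge);
    end

    assign ge = ge_ff;
endmodule

`default_nettype wire
```

The two clocks of latency have to be absorbed by the counter, in the same spirit as the note above about starting the up counter at 2.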

BrianHG · « **Reply #17 on:** May 27, 2024, 02:18:11 am »

Also, just try registering the 'reset value' into a local register. That will improve the timing of large magnitude compares since, wherever the source came from, it would have a local buffer placed where it needs to be for best performance.

IE: (Literally this stupid...)
reset_value_latched <= reset_value_input_port;

In your 'if', just use the latched version.

Rainwater · « **Reply #18 on:** May 27, 2024, 03:04:58 am »

Quote from: BrianHG on May 27, 2024, 01:57:21 am

For example, you may reset the counter on any change of the reset value.

So long as the adjustment does not lose the current count. For example, I use this to make my UART's last stop bit 2 clocks shorter; this gives me 1 tick in the idle state, then one more tick to react to the next data word to be sent, keeping the end of the stop bit and beginning of the start bit in perfect timing while managing the ce_25mhz timing.
The best alternative I thought of was to implement a running counter and track the start time for calculating the new reset_value input, but this tripled my registers.
After that I wrote the ALU_PIPELINE module. It works fast, 250+ MHz, but requires converting the reset value into a ..... complicated format ... before it can be used.
With the pipelined counter, if the counter_ff arrives at the reset_value with the carry chain $countones() != 0, then the strobe would be skipped. The solution is to precalculate what the counter_ff and its carry chain would be, then use that as the comparison. Way too complicated just to strobe a light.
So I left it and moved on to the next module.

Quote

Also, just try registering the 'reset value' into a local register. That will improve the timing of large magnitude compares since

Correct, but that eats more registers, and using the MCP gives my signal about 40 ns to lounge in the breakroom before it has to show up and work.

Thanks for the tips, it is very much appreciated.

Quote

In your 'if', just use the latched version

The synthesis report says zero latches. But it also gives the fmax that I wish I had.

From the reading I have done, latches can only be inferred from combinational blocks and not sequential blocks.
And the design viewer usually will show a LUT into a DFF, or a DFFE with either a register or LUT feeding the CE. If using a vector with an initial value, a combination of DFFSE and DFFRE are usually inferred.
By defining a value in every case, the implementation does not use any CE pins and builds a mux to feed the D port of the primitives, even for a 1-bit register. When I omit changing the value unless I need to, a DFFRE primitive is built, with strobe_ff(.D(1'b1) .CE(cws_counter_ff >= reset_value) .RESET(Q || rst)). The result uses fewer resources because (>=) is shared between the counter_ff RESET port and the strobe_ff CE port. Instead of a WIDTH-wide 3x mux (reset value, incremented value, comparison value) feeding counter_ff.D, only the incremented value feeds it, without a mux.
The primitive guide can be viewed here.
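
The two coding styles being compared can be put side by side as below. The module names are hypothetical and the bodies are simplified from the counter posted earlier (no enable input). Which primitives a given synthesis tool actually infers from each style is the observation reported in this post, not something the code guarantees. The two modules behave the same for a reset_value of 2 or more.

```verilog
`default_nettype none

// Style A: every register is assigned on every path.
// Reported above to end up as a mux feeding the D input.
module strobe_style_a #(parameter WIDTH = 4) (
    input  wire             clk,
    input  wire             rst,
    input  wire [WIDTH-1:0] reset_value,
    output wire             strobe
);
    reg [WIDTH-1:0] counter_ff = 'd1;
    reg             strobe_ff  = 1'b0;
    assign strobe = strobe_ff;

    always @(posedge clk) begin
        if (rst) begin
            counter_ff <= 'd1;
            strobe_ff  <= 1'b0;
        end else if (counter_ff >= reset_value) begin
            counter_ff <= 'd1;
            strobe_ff  <= 1'b1;
        end else begin
            counter_ff <= counter_ff + 'd1;
            strobe_ff  <= 1'b0;
        end
    end
endmodule

// Style B: a default assignment plus overrides; strobe_ff is written only
// when it has to change. Reported above to map onto the flip-flop's
// CE and RESET pins, sharing the >= comparison.
module strobe_style_b #(parameter WIDTH = 4) (
    input  wire             clk,
    input  wire             rst,
    input  wire [WIDTH-1:0] reset_value,
    output wire             strobe
);
    reg [WIDTH-1:0] counter_ff = 'd1;
    reg             strobe_ff  = 1'b0;
    assign strobe = strobe_ff;

    always @(posedge clk) begin
        counter_ff <= counter_ff + 'd1;          // default: increment
        if (counter_ff >= reset_value) begin
            counter_ff <= 'd1;                   // override: wrap
            strobe_ff  <= 1'b1;
        end
        if (rst)
            counter_ff <= 'd1;
        if (rst || strobe_ff)
            strobe_ff <= 1'b0;                   // last assignment wins
    end
endmodule

`default_nettype wire
```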

BrianHG · « **Reply #19 on:** May 27, 2024, 07:58:43 am »

I did not know this was being used to create a UART. You can take a look here on how I did mine, a UART which corrects for bit skew, horrible timing errors and also synchronizes the TX output with the RX input down to the ns.

https://www.eevblog.com/forum/fpga/verilog-rs232-uart-and-rs232-debugger-source-code-and-educational-tutorial/

Just look at the 'SYNC_RS232_UART.v' portion and its timing illustration.
You may change the parameter baud rate with a 16bit input port.
I do not know if it will achieve 250mhz on a Gowin FPGA.
You can just modify the RX&TX byte's LSB/MSB for even and odd parity and add another input to count 10 bits for 8E1/8N1 modes.

BrianHG · « **Reply #20 on:** May 27, 2024, 06:37:49 pm »

Quote from: Rainwater on May 27, 2024, 03:04:58 am

Quote
In your 'if', just use the latched version
The synthesis report says zero latches. But it also gives the fmax that I wish I had.

From the reading I have done, latches can only be inferred from combinational blocks and not sequential blocks.
And the design viewer usually will show a LUT into a DFF, or a DFFE with either a register or LUT feeding the CE. If using a vector with an initial value, a combination of DFFSE and DFFRE are usually inferred.

Note that I called it a latch. It is nothing more than a DFF register word with an optional write enable. Every DFF register bit usually consumes a Logic Cell, except when you are just buffering or pipe clocking data without logic at its input; some compilers may auto-infer those bits into the FPGA's block-RAM bits of various types, saving valuable Logic Cells.

Unless you are trying to create Gowin device-specific code, or trying to debug weird compiler behavior, try not to read too much into what type of cells the compiler chooses to use. Try to write code which can be ported to any vendor's FPGA studio, and learn coding strategies which will increase your FMAX no matter which vendor's FPGA you send your code to. Otherwise, your code may only operate at high speed on 1 chip.

Rainwater · « **Reply #21 on:** May 30, 2024, 01:01:04 am »

You're right, these timing constraints are making other things much more difficult. After more reading: yes, a clock enable can be used in this manner, but it is much easier to use a separate clock domain. The more effective use of clock enable is to add latency to the pathway. I finally see the points you've been making now. I'm new to Verilog and high-speed design, and have to learn things the hard way. Thank you for your suggestions. It would most likely have been a few more weeks before I fully understood this problem and came to the same conclusion.

Rainwater · « **Reply #22 on:** June 01, 2024, 01:46:11 am »

Using the pointers you have provided (thank you again, BrianHG), I came up with the code posted below. There are still a few bugs in it. Really just one, and I have the solution, I just have not implemented it yet. Line #64 causes the strobe to be missed when formal verification plays with the 'reset_value'.

These values are for a 15 bit wide counter with 4 ticks of latency. If I implement over 16 bits, line #64 again requires another logic layer, due to my device only having 4-input LUTs, and that gets me below 200 MHz. Once I implement the bug fix, I'll add some generated pipelining to this as well and solve that issue too. This is a lot of work just to count a baud rate, but it has been an enjoyable academic exercise. So much different than C/C++. I love it.

Code: [Select]

`default_nettype none
`define CWS_TYPE_SIMPLE 0
`define CWS_TYPE_PRIMITIVE 1
`define CWS_TYPE_PIPELINED 2

module counter_with_strobe
    #( 
        `ifdef FORMAL
            parameter WIDTH = 4,
            parameter ALU_WIDTH = 2,
        `else
            parameter WIDTH = 15,
            parameter ALU_WIDTH = 4,
        `endif

        parameter TYPE = 2
    )
    (
        input   wire                rst,
        input   wire                clk,
        input   wire                enable,
        input   wire [WIDTH-1:0]    reset_value,
        output  wire                strobe,
        output  wire                valid
    );
    generate
        case (TYPE)
        `CWS_TYPE_PIPELINED: begin
            // the 'reg [WIDTH-1:0] counter;' will be broken into chunks.
            // each chunk's arithmetic COUT will be stored in the reg carry_chain[] for the next chunk's CIN.
            // the first chunk will not have a CIN, but the enable signal
            // the last chunk will not have a COUT
            // the counter may contain only one chunk.
            localparam CHUNK_COUNT      = WIDTH % ALU_WIDTH == 0         // find the minimum amount of chunks needed to contain the full counter
                                                ? WIDTH / ALU_WIDTH
                                                : WIDTH / ALU_WIDTH + 1;
            localparam LAST_CHUNK_SIZE  = WIDTH % ALU_WIDTH == 0         // find the size of the last chunk needed to contain for the counter.
                                                ? ALU_WIDTH
                                                : WIDTH % ALU_WIDTH;
            reg [WIDTH-1:0]         counter_ff = 'd1;
            reg [CHUNK_COUNT-1:0]   carry_chain = 0;
            reg [CHUNK_COUNT-1:0]   cmp_chain = 0;
            reg                     strobe_ff = 0;
            assign  strobe  = strobe_ff;
            // used for formal verification. optimized out when put into hardware
            reg [CHUNK_COUNT:0] valid_tracker   = 0;
            assign              valid           = valid_tracker[CHUNK_COUNT];
            always @( posedge clk ) begin
                if( rst ) begin
                    valid_tracker <= 1'd1;
                end else begin
                    if( enable )
                        valid_tracker <= 1'd1;
                    else
                        valid_tracker <= { valid_tracker[CHUNK_COUNT-1:0], 1'b0 };
                end
            end
            //// end valid output.
            reg r_trigger = 0;
            wire trigger = r_trigger;
            if( CHUNK_COUNT == 1)
                always @(posedge clk) r_trigger <= counter_ff >= reset_value;
            else
                always @(posedge clk) r_trigger <= &cmp_chain; // bug here

            integer idx;    // for loop iterator ... current_chunk
            always @( posedge clk ) begin
                strobe_ff <= 0;   // turn strobe_ff off.
                if( rst ) begin
                    counter_ff <= 'd1;
                    carry_chain <= 0;
                    cmp_chain <= 0;
                end else begin
                    // carry chain propagation,
                    // exceptions
                    //  first chunk - .CIN(enable), all others .CIN(carry_chain[idx-1])
                    //  last_chunk  - .COUT(null)
                    //  last_chunk  - .WIDTH(LAST_CHUNK_SIZE)
                    for( idx = 0; idx <= CHUNK_COUNT - 1; idx = idx + 1 ) begin
                        if( idx != CHUNK_COUNT - 1 ) begin // !LAST_CHUNK
                            { carry_chain[idx], counter_ff[idx*ALU_WIDTH+:ALU_WIDTH] } <= { 1'b0, counter_ff[idx*ALU_WIDTH+:ALU_WIDTH] } + (idx == 0 ? enable : carry_chain[idx-1]);
                            cmp_chain[idx] <= counter_ff[idx*ALU_WIDTH+:ALU_WIDTH] >= reset_value[idx*ALU_WIDTH+:ALU_WIDTH];
                        end else begin    // == LAST_CHUNK
                            counter_ff[WIDTH-1:WIDTH-LAST_CHUNK_SIZE] <= counter_ff[WIDTH-1:WIDTH-LAST_CHUNK_SIZE] + (idx == 0 ? enable : carry_chain[idx-1]);
                            cmp_chain[idx] <= counter_ff[WIDTH-1:WIDTH-LAST_CHUNK_SIZE] >= reset_value[WIDTH-1:WIDTH-LAST_CHUNK_SIZE];
                        end
                    end 
                    if( enable ) begin
                        if( trigger ) begin
                            counter_ff <= 'd1;
                            carry_chain <= 0;
                            strobe_ff <= 1;
                        end
                    end
                end // !rst
            end 
        end
        endcase
    endgenerate
endmodule
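
As a side note on the `&cmp_chain` reduction marked "bug here": ANDing per-chunk `>=` results is not the same as a full-width `>=`. Whether this is the exact failure the author has in mind is an assumption, since the post does not say. The small simulation-only sketch below uses hypothetical names (`chunk_compare_demo`, `counter`, `limit`) and two 4 bit chunks to show one value pair where the two comparisons disagree.

```verilog
// Simulation-only sketch, hypothetical names, not part of the posted module.
module chunk_compare_demo;
    reg [7:0] counter;
    reg [7:0] limit;

    // Full-width magnitude compare.
    wire full_ge  = counter >= limit;

    // Per-chunk compares ANDed together, as a reduction over a
    // two-entry compare chain would do.
    wire chunk_ge = (counter[3:0] >= limit[3:0]) &
                    (counter[7:4] >= limit[7:4]);

    initial begin
        counter = 8'h10;   // low chunk 0, high chunk 1
        limit   = 8'h0F;   // low chunk F, high chunk 0
        #1;
        // full_ge is 1 (0x10 >= 0x0F), chunk_ge is 0 (0 >= F fails).
        $display("counter=%h limit=%h full_ge=%b chunk_ge=%b",
                 counter, limit, full_ge, chunk_ge);
        $finish;
    end
endmodule
```

A per-chunk equality compare (`==`) ANDed across chunks does match the full-width equality, so that is one possible direction. It only fires when the counter hits the limit exactly, which may or may not suit a design where `reset_value` can change mid-count.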

