Author Topic: Derived clocks, best practices? (Read 4725 times)

aheid · « **on:** February 08, 2020, 02:27:04 pm »

So I've just made my first dumb and simple LED "breather" using a sin table lookup and PWM. For the table lookup, I needed a slow clock to increment the lookup address, so I made a simple one from a counter (see clock_div module attached).

I got a warning though which may be related: "WARNING - The preferred point for defining clocks is top level ports and driver pins. Pad delays will not be taken into consideration if clocks are defined on nets." Is this related or something else?

In either case, this got me thinking about how best to do this: what are the preferred ways of making derived clocks? Say I want to make an I2C master; then I need a 100 kHz clock. Or for driving those WS2812 LED strips, etc.

Please excuse the code quality, got barely a few hours of HDL programming under my belt, and I know my modules could benefit from parameters (that's on the to-do list).
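Whichever method is used, the first step is the same arithmetic: the divide ratio is the system clock divided by the target rate, rounded to an integer. The sketch below assumes a 24 MHz system clock (the iCE40 HSOSC setting used later in this thread) and takes 800 kHz as the nominal WS2812 bit rate; `divider_for` is a hypothetical helper name, not part of any tool.

```python
# Sketch: integer divide ratio for a tick/enable derived from a faster clock.
# Assumption: 24 MHz system clock (HSOSC 48 MHz with CLKHF_DIV = "0b01").

def divider_for(f_clk_hz: float, f_target_hz: float) -> tuple[int, float, float]:
    """Return (N, achieved frequency, relative error) for a tick every N clocks."""
    n = max(1, round(f_clk_hz / f_target_hz))
    achieved = f_clk_hz / n
    error = (achieved - f_target_hz) / f_target_hz
    return n, achieved, error


if __name__ == "__main__":
    # e.g. an I2C-rate tick and a WS2812-rate bit tick
    for target in (100e3, 800e3):
        n, f, err = divider_for(24e6, target)
        print(f"target {target / 1e3:.0f} kHz -> N = {n}, got {f / 1e3:.3f} kHz ({err:+.2%})")
```

When the ratio does not come out as an integer (for example 24 MHz to 115200 Hz gives N = 208), the rounding leaves a small frequency error, which the helper reports.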

aheid · « **Reply #1 on:** February 08, 2020, 02:41:34 pm »

To clarify, I guess I'm after design principles or guidelines that I can keep in mind. I know an LFSR is an efficient way to do clock division; I've used those before as RNGs, so I'm familiar with them. Reading material welcome.

OwO · « **Reply #2 on:** February 08, 2020, 02:57:18 pm »

The usual advice is to not do clock division in user logic, and only use PLL blocks for that, or to use a fast clock and a clock enable that is high every N cycles.

In practice, as long as there isn't any interaction between logic clocked from the original clock and logic clocked from the derived clock, it doesn't matter and you can ignore the warning. Otherwise, all crossings must be treated as asynchronous. In Xilinx devices I think you can drive a BUFG from the derived clock to get rid of the warning, but for timing analysis the two clocks must be assumed to be independent.

I personally do not like big clock enables, because a clock enable is a high-fanout signal that also must be routed as if it were a clock (and takes up clock buffers), so clock gating (BUFGCE) is usually preferred instead and can give you power savings.
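The "fast clock plus a clock enable that is high every N cycles" option can be sketched as a cycle-level Python model: one counter on the system clock asserts `ce` for a single cycle every N clocks, and a second register on the same clock advances only when `ce` is high. The function names and the divide factor are assumptions for illustration.

```python
# Cycle-level model of a clock enable: one clock, one counter, one-cycle ce pulse.

def clock_enable_trace(n_div: int, cycles: int) -> list[int]:
    """Value of ce at each rising edge of the system clock."""
    ctr = 0
    trace = []
    for _ in range(cycles):
        ce = 1 if ctr == n_div - 1 else 0   # combinational terminal-count decode
        trace.append(ce)
        ctr = 0 if ce else ctr + 1          # counter register update at the edge
    return trace


def slow_counter(n_div: int, cycles: int) -> int:
    """A register clocked by the SAME clock that only advances when ce is high."""
    value = 0
    for ce in clock_enable_trace(n_div, cycles):
        if ce:
            value += 1
    return value
```

Because everything stays in one clock domain, there is no derived clock net for the tools to warn about and no crossing to analyse.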

aheid · « **Reply #3 on:** February 08, 2020, 03:40:18 pm »

Thanks for the detailed response. Glad I asked.

Seems my current hardware (iCE40) does not support clock gating, but the keyword did lead me to some useful documentation both from Lattice and others.

SiliconWizard · « **Reply #4 on:** February 08, 2020, 04:20:08 pm »

Note that with your example, you'd get the same result embedding the clock divide counter inside the "sin_addr" generator process, thus not needing to actually generate an extra clock with the issues that come with clock distribution. Your design would remain fully synchronous. This is an approach that you can take if using a dedicated PLL/clock divider is inconvenient.

aheid · « **Reply #5 on:** February 08, 2020, 07:22:47 pm »

Quote from: SiliconWizard on February 08, 2020, 04:20:08 pm

Note that with your example, you'd get the same result embedding the clock divide counter inside the "sin_addr" generator process, thus not needing to actually generate an extra clock with the issues that come with clock distribution.

Since I'm a n00b, could you elaborate (no pun intended) a bit? The only thing that comes to mind right away is storing the previous bit value and doing an edge detect based on (prev < cur), but I suspect I'm missing something.
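The edge-detect idea does work as a clock enable: register the previous value of one counter bit and pulse for one cycle when it goes from 0 to 1, which for single bits is the same test as (prev < cur). This Python cycle model (hypothetical function name, bit index chosen for illustration) shows that it only gives power-of-two periods of 2^(bit+1) clocks, which is why a compare-and-reload counter is usually preferred.

```python
# Edge detect on bit `bit` of a free-running counter -> one-cycle enable.

def rising_edge_enables(bit: int, cycles: int) -> list[int]:
    ctr = 0
    prev = 0                          # the "previous bit" register
    out = []
    for _ in range(cycles):
        cur = (ctr >> bit) & 1
        out.append(cur & (prev ^ 1))  # 1 only on a 0 -> 1 transition (prev < cur)
        prev = cur
        ctr += 1
    return out
```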

SiliconWizard · « **Reply #6 on:** February 08, 2020, 08:55:49 pm »

Quote from: aheid on February 08, 2020, 07:22:47 pm

Quote from: SiliconWizard on February 08, 2020, 04:20:08 pm
Note that with your example, you'd get the same result embedding the clock divide counter inside the "sin_addr" generator process, thus not needing to actually generate an extra clock with the issues that come with clock distribution.

Since I'm a n00b, could you elaborate (no pun intended) a bit? Only thing that comes to my mind right away is storing previous bit value, and do an edge detect based on (prev < cur), but I suspect I'm missing something.

OK. Remove CLOCK_DIV, its instantiation and the 'sin_clk' signal.
The top level will now be something like this (I've also removed the 'count' signal, which doesn't appear to be used?):

Code:

```verilog
module top(rst, RGB0, a1);
input rst;
output RGB0, a1;

wire a1;
wire oclk;                          // system clock from the internal oscillator

HSOSC OSCInst0 (
.CLKHFEN(1'b1),
.CLKHFPU(1'b1),
.CLKHF(oclk)
);
defparam OSCInst0.CLKHF_DIV = "0b01";   // 48 MHz nominal / 2 = 24 MHz

reg [17:0] sin_ctr;
reg [7:0] sin_addr;
wire [7:0] sin_val;

parameter NDIV = 18'd100000;        // sin_addr advances once every NDIV clocks

// Single clock domain: the divider counter lives in the same block as the
// address register, so no derived clock is created.
always @(posedge oclk)
	begin
		if (sin_ctr == (NDIV - 18'd1)) begin
			sin_addr <= sin_addr + 8'd1;
			sin_ctr <= 18'd0;
		end
		else begin
			sin_ctr <= sin_ctr + 18'd1;
		end
	end

ROM_SIN_256 sin_inst(
	.clk(oclk),
	.addr(sin_addr),
	.dout(sin_val)
);

PWM pwm_inst(
	.clk(oclk),
	.rst(rst),
	.duty_cycle(sin_val),
	.pulse(a1)
);

RGB inst1(	
				.CURREN(1'b1), 
				.RGBLEDEN(1'b1), 
				.RGB0PWM(a1), 
				.RGB1PWM(), 
				.RGB2PWM(), 
				.RGB2(), 
				.RGB1(), 
				.RGB0(RGB0)
				);
defparam inst1.CURRENT_MODE = 1 ;
defparam inst1.RGB0_CURRENT = "0b000011" ;
defparam inst1.RGB1_CURRENT = "0b000111" ;
defparam inst1.RGB2_CURRENT = "0b001111" ;

// Power-up values (supported for registers on iCE40)
initial
begin
	sin_addr = 8'd0;
	sin_ctr = 18'd0;
end

endmodule
```

You can use any dividing factor this way, not just a power of two. You can also use a register instead of a constant parameter for the factor, then in your small example, you could easily modulate the sine frequency. Have fun! (Probably not very useful for driving LEDs, but would be fun for generating a sine wave through some DAC.)

Report back if it works; Verilog is not my HDL, so I hope I got it right.
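A quick check of what the single-clock breather above should do, assuming HSOSC runs at its nominal 48 MHz with CLKHF_DIV = "0b01" (divide by 2), NDIV = 100000 and the 256-entry ROM_SIN_256 table. The helper `ndiv_for_period` is hypothetical; it shows that replacing the constant with a register is just a matter of choosing a new divide factor.

```python
# Back-of-envelope numbers for the breather (all inputs are assumptions above).

F_CLK_HZ = 48e6 / 2          # oclk
NDIV = 100_000               # clocks per sin_addr step
TABLE_SIZE = 256             # entries in one sine period

addr_rate_hz = F_CLK_HZ / NDIV                 # sin_addr steps per second
breath_period_s = TABLE_SIZE / addr_rate_hz    # time for one full sine cycle


def ndiv_for_period(period_s: float) -> int:
    """Hypothetical helper: divide factor for a desired breathing period."""
    return round(period_s * F_CLK_HZ / TABLE_SIZE)


if __name__ == "__main__":
    print(f"{addr_rate_hz:.0f} steps/s, one breath every {breath_period_s:.3f} s")
    n = ndiv_for_period(2.0)
    print(f"2 s breath -> NDIV = {n} ({n.bit_length()} counter bits)")
```

With these numbers one breath takes about 1.07 s. Stretching it to 2 s needs NDIV = 187500, which needs an 18-bit counter and an 18-bit constant.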

langwadt · « **Reply #7 on:** February 08, 2020, 10:35:03 pm »

Quote from: SiliconWizard on February 08, 2020, 08:55:49 pm

Quote from: aheid on February 08, 2020, 07:22:47 pm
Quote from: SiliconWizard on February 08, 2020, 04:20:08 pm
Note that with your example, you'd get the same result embedding the clock divide counter inside the "sin_addr" generator process, thus not needing to actually generate an extra clock with the issues that come with clock distribution.

Since I'm a n00b, could you elaborate (no pun intended) a bit? Only thing that comes to my mind right away is storing previous bit value, and do an edge detect based on (prev < cur), but I suspect I'm missing something.

OK. Remove CLOCK_DIV, its instantiation and the 'sin_clk' signal.
The top level will be something like this now: (i've also removed the 'count' signal which doesn't appear to be used?)

..

You can use any dividing factor this way, not just a power of two. You can also use a register instead of a constant parameter for the factor, then in your small example, you could easily modulate the sine frequency. Have fun! (Probably not very useful for driving LEDs, but would be fun for generating a sine wave through some DAC.)

Report back if it works, as Verilog is not my HDL, so hope I got it right.

You can do it all in one and hit almost any frequency (on average), and with a register you can safely change the frequency at any time:

Code:

```verilog
..

reg  [25:0] sin_ctr = 0;

// f = fclk/2^26 * NDIV  
parameter NDIV = 26'd33554;

always @(posedge oclk)
	begin
		sin_ctr <= sin_ctr + NDIV;
	end

ROM_SIN_256 sin_inst(
	.clk(oclk),
	.addr(sin_ctr[25:25-7]),
	.dout(sin_val)
);
...
```
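langwadt's version is a phase accumulator (the DDS technique): add NDIV to a W-bit register every clock and use the top 8 bits as the table address, so the sine repeats at f_clk · NDIV / 2^W on average. Assuming the 24 MHz clock from earlier, NDIV = 33554 with W = 26 works out to roughly 12 kHz, and the frequency step size is 24 MHz / 2^26, about 0.36 Hz. The Python model below uses hypothetical names, and the check uses a narrow 12-bit accumulator so it can be simulated quickly.

```python
# Phase-accumulator model: W-bit register, +step per clock, top bits = address.

def phase_accumulator(step: int, width: int, cycles: int, addr_bits: int = 8):
    """Yield the table address (top addr_bits of the accumulator) each cycle."""
    mask = (1 << width) - 1
    acc = 0
    for _ in range(cycles):
        yield acc >> (width - addr_bits)
        acc = (acc + step) & mask          # wraps like a W-bit register


def output_frequency(f_clk_hz: float, step: int, width: int) -> float:
    """Average table-wrap (sine) frequency: f_clk * step / 2**width."""
    return f_clk_hz * step / (1 << width)
```

Over exactly 2^W clocks the accumulator wraps exactly NDIV times, which is why the frequency is exact on average even though individual periods differ by a clock.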

OwO · « **Reply #8 on:** February 09, 2020, 05:16:21 am »

Quote from: SiliconWizard on February 08, 2020, 04:20:08 pm

Note that with your example, you'd get the same result embedding the clock divide counter inside the "sin_addr" generator process, thus not needing to actually generate an extra clock with the issues that come with clock distribution.

... and that's why I tell people to stop using "process" in HDL, because it easily misleads you about what hardware is actually generated. When you add a nested "if" like that, what it actually does is add conditions to the clock enable of the flip-flops that the logic drives. If you have more than a handful of signal assignments in an if block, and some of them are wide signals (e.g. 32-bit values), that leads to a high-fanout clock enable, which the tools may promote to a clock (taking up clock routing resources) or which may lead to bad timing (when it isn't promoted).

For beginners who want to actually understand hardware design, I recommend generating the clock enable explicitly. I'm not sure of the Verilog syntax, but in VHDL you might write something like:

Code:

```vhdl
signal cnt, cntNext: unsigned(7 downto 0);
signal cmp, cmpNext: std_logic;
...

cmpNext <= '1' when cnt = 99 else '0';
cntNext <= to_unsigned(0, 8) when cmpNext='1' else cnt+1;
cnt <= cntNext when rising_edge(clk);
cmp <= cmpNext when rising_edge(clk);
```

That will generate one pulse on "cmp" every 100 cycles. You can then use "cmp" as a clock enable:

Code:

```vhdl
sin_addr <= sin_addr+1 when cmp='1' and rising_edge(clk);
```
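The timing of OwO's VHDL can be checked with a small Python cycle model: `cmp_next`/`cnt_next` are the combinational next values and `cnt`/`cmp` are the registers, all updated together at each rising edge. Because `cmp` is registered, its pulse appears one cycle after `cnt = 99`, when `cnt` has already wrapped to 0, and then repeats every 100 cycles. The function name is hypothetical.

```python
# Cycle model of the registered-compare clock enable (cnt counts 0..99).

def simulate(cycles: int, terminal: int = 99) -> list[tuple[int, int]]:
    cnt, cmp = 0, 0
    history = []                         # (cnt, cmp) after each rising edge
    for _ in range(cycles):
        cmp_next = 1 if cnt == terminal else 0
        cnt_next = 0 if cmp_next else cnt + 1
        cnt, cmp = cnt_next, cmp_next    # both registers update on the same edge
        history.append((cnt, cmp))
    return history
```

Registering the compare costs one flip-flop and one cycle of latency, but the enable then comes straight from a flip-flop instead of a wide comparator, which helps timing when it fans out.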

aheid · « **Reply #9 on:** February 09, 2020, 12:49:22 pm »

Excellent info guys, thanks a lot!

Quote from: OwO on February 09, 2020, 05:16:21 am

For beginners, if you want to actually understand hardware design, I recommend generating the clock enable explicitly.

I see the same kind of style is promoted in Lattice's HDL coding guide (which I just found). I think I understand the point of it. Using this style vs. the one illustrated by SiliconWizard took a bit more resources (possibly due to the rst handling?). However, the maximum delay on the critical path went down from ~18000 to ~14000, which I presume means it's a lot faster, yes?

For comparison here's the two variants of my "address generator" module:

Code:

```verilog
module ADDR_GEN(
    input clk,
    input rst,
    output reg[7:0] addr
);
    parameter PRESCALER = 18'd100000;
    parameter START_ADDR = 8'b0;

    reg [17:0] ctr;

    always @(posedge clk or posedge rst)
        if (rst)
        begin
            ctr <= 18'b0;
            addr <= START_ADDR;
        end
        else if (ctr == (PRESCALER - 18'b1))
        begin
            ctr <= 18'b0;
            addr <= addr + 8'b1;
        end
        else
        begin
            ctr <= ctr + 18'b1;
        end
    
endmodule
```

Code:

```verilog
module ADDR_GEN(
    input clk,
    input rst,
    output reg[7:0] addr
);
    parameter PRESCALER = 18'd100000;
    parameter START_ADDR = 8'b0;

    reg [17:0] ctr;
    wire [17:0] ctr_next;

    reg rollover;    
    wire rollover_next;

    // Next-state logic (combinational)
    assign rollover_next = (ctr == (PRESCALER - 18'b1)) ? 1'b1 : 1'b0;

    assign ctr_next = (rollover_next) ? 18'b0 : ctr + 18'b1;

    // Registers: nonblocking (<=) assignments in clocked blocks avoid
    // simulation races between the always blocks below
    always @(posedge clk or posedge rst)
        if (rst)
            ctr <= 18'b0;
        else
            ctr <= ctr_next;

    // Registered terminal count: a one-cycle clock enable for addr
    always @(posedge clk)
        rollover <= rollover_next;

    always @(posedge clk or posedge rst)
        if (rst)
            addr <= START_ADDR;
        else if (rollover)
            addr <= addr + 8'b1;
                
endmodule
```
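The two ADDR_GEN variants produce the same address rate; the only behavioral difference is one clock of latency, because the second one registers the terminal-count decode in `rollover` before using it as the enable. A Python cycle model (reset omitted, small PRESCALER, hypothetical function names) makes that explicit.

```python
# Cycle models of the two ADDR_GEN variants (8-bit addr, no reset).

def variant1(prescaler: int, cycles: int) -> list[int]:
    """Decode ctr == PRESCALER-1 and increment addr on the same edge."""
    ctr, addr, trace = 0, 0, []
    for _ in range(cycles):
        if ctr == prescaler - 1:
            ctr, addr = 0, (addr + 1) & 0xFF
        else:
            ctr += 1
        trace.append(addr)
    return trace


def variant2(prescaler: int, cycles: int) -> list[int]:
    """Register the decode in `rollover`, then use it as addr's enable."""
    ctr, rollover, addr, trace = 0, 0, 0, []
    for _ in range(cycles):
        rollover_next = 1 if ctr == prescaler - 1 else 0
        ctr_next = 0 if rollover_next else ctr + 1
        addr_next = (addr + 1) & 0xFF if rollover else addr
        ctr, rollover, addr = ctr_next, rollover_next, addr_next  # nonblocking
        trace.append(addr)
    return trace
```

The simultaneous tuple assignment plays the role of Verilog's nonblocking `<=`: every register is computed from the values before the edge.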

OwO · « **Reply #10 on:** February 09, 2020, 05:53:57 pm »

In this simple example I don't think there's going to be much performance difference; the idea is mainly to be "aware" of what kind of logic you are laying down. There are also vendor-specific differences to be aware of. For example, Altera flip-flops have a "synchronous load" input that can be selected, which is basically a mux in front of the flip-flop for free, while Xilinx and most other vendors don't have that, so you might avoid putting a mux after an adder, because that's going to overflow into a new slice.

Btw I recommend removing that asynchronous reset (or making it synchronous)

aheid · « **Reply #11 on:** February 09, 2020, 06:17:06 pm »

Quote from: OwO on February 09, 2020, 05:53:57 pm

Btw I recommend removing that asynchronous reset (or making it synchronous)

Ah yes, didn't focus on that so just dragged it along. Making it synchronous reduced LUT count significantly. I'll keep that in mind.

Thanks again

tchicago · « **Reply #12 on:** January 05, 2021, 08:29:26 pm »

Quote from: aheid on February 08, 2020, 02:27:04 pm

I got a warning though which may be related: "WARNING - The preferred point for defining clocks is top level ports and driver pins. Pad delays will not be taken into consideration if clocks are defined on nets." Is this related or something else?

Have you been able to get rid of this warning? I'm using the iCE40UP device, and whenever I add the internal oscillator IP or an externally driven PLL IP, I get this warning. I've already spent two days trying all kinds of pre-constraints describing clocks and derived clocks, including exactly the ones in the PLL documentation. I tried reformulating the clock constraints in every way possible and simplified the project to doing almost nothing, and I still get this warning. Only if I remove the PLL and drive the clock externally does the warning go away and the clocks get routed as expected.

Interestingly, if I open up Lattice Radiant's example counter_reveal (blinking blue LED) for the iCE40UP, which uses HSOSC, and try to synthesize it, it also gives the same warning. Could be a bug in Radiant or something...

ale500 · « **Reply #13 on:** February 13, 2021, 04:42:28 pm »

I know it could be a bit OT, but in iCEcube2 there was a primitive to route a signal through a global buffer (SB_GB), and in Radiant I cannot find any similar buffer. I used it once to route a derived clock from the internal oscillator, as 6 MHz was a bit too much. Does anyone know what this global buffer is called now?

gnuarm · « **Reply #14 on:** February 13, 2021, 10:23:17 pm »

I didn't read every reply in detail, but it doesn't look like this was covered adequately.

It is much better to have only one clock in a design and use clock enables for events that happen at lower rates. Otherwise you have to worry about signals crossing clock domains, which is a messy subject.

I'm working on a joint project and I'm finding designers who like to create separate clocks for things like SPI ports, thinking they need to actually clock registers at the bus speed rather than just enable the FFs.

I did design an SPI slave once that shifted the data on the external clock and synchronized to the internal clock at the handshake to the internal logic, but there was no real advantage. If that interface had a very high speed and presented setup and hold time issues working from the internal clock I might reconsider. Otherwise just use clock enables using the same internal main clock always. Avoid the messy clock domain crossing issues or at least restrict them to the I/Os.

asmi · « **Reply #15 on:** February 14, 2021, 01:08:11 am »

Quote from: gnuarm on February 13, 2021, 10:23:17 pm

I didn't read every reply in detail, but it doesn't look like this was covered adequately.

It is much better to have only one clock in a design and use clock enables for events that happen at lower rates. Otherwise you have to worry with signals crossing clock domains which is a messy subject.

I'm working on a joint project and I'm finding designers who like to create separate clocks for things like SPI ports, thinking they need to actually clock registers at the bus speed rather than just enable the FFs.

I did design an SPI slave once that shifted the data on the external clock and synchronized to the internal clock at the handshake to the internal logic, but there was no real advantage. If that interface had a very high speed and presented setup and hold time issues working from the internal clock I might reconsider. Otherwise just use clock enables using the same internal main clock always. Avoid the messy clock domain crossing issues or at least restrict them to the I/Os.

It is a good goal, but it's not always possible. In some cases you need very precise timing for an IO interface; in others, you need to clock some logic at a lower rate because there is too much of it for the main clock, and pipelining is infeasible for some reason.
In general, I tend to use a dedicated clock for each IO interface, even if they all end up connected to a single clock source, as this allows decoupling the IO clock from the main one in case you need to adjust it later on (say, you decide to replace a 133 MHz QSPI flash with a 100 MHz one, and so you need to change the IO clock to support that).

gnuarm · « **Reply #16 on:** February 23, 2021, 04:52:39 pm »

Quote from: asmi on February 14, 2021, 01:08:11 am

It is a good goal, but it's not always possible. In some cases you need very precise timings for IO interface, in others - you need to clock some logic at lower rate because there is too much of it for main clock, and pipelining is infeasible for some reason.
In general I tend to use a dedicated clock for each IO interface, even if later they all are connected to a single clock source as this allows to decouple IO clock for the main one in case you need to adjust it later on (say, you decided to replace 133 MHz QSPI flash for a 100 MHz one, and so you need to change the IO clock to support that).

Sure, if you absolutely need to clock the inputs and outputs at a specific time, that requires clocking off the external clock. It does not require any processing at that rate, other than potentially converting DDR data to parallel.

The point is that the more processing is done in the outside clock domain, typically the more complex the interfaces and the more attention needs to be paid to the clock domain crossing. That is not to say all processing is verboten. I'm saying switch to the main clock rate at the earliest possible point.

It is exactly the fallacy you mention about "too much" processing for the main clock that I am addressing by the use of clock enables.

asmi · « **Reply #17 on:** February 23, 2021, 05:46:00 pm »

Quote from: gnuarm on February 23, 2021, 04:52:39 pm

It is exactly the fallacy you mention about "too much" processing for the main clock that I am addressing by the use of clock enables.

In order to do that, you will have to add multicycle constraints, otherwise your timing is going to fail (because STA has no idea about your clock enables, and will always assume propagation needs to occur within a single clock cycle). That is pain in the butt, so I prefer using different clocks. Especially since FPGAs I use have hardware FIFOs, so crossing clock domains is not that big of a deal.
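The arithmetic behind such a multicycle constraint is simple: if both the launching and the capturing registers are enabled only once every N cycles of the same clock, the path between them has N clock periods to settle instead of one. In SDC-style constraints this is commonly expressed as a setup multiplier of N and a hold multiplier of N - 1 (the usual same-clock convention; check your tool's constraint documentation). The function below is a hypothetical sketch of that calculation, ignoring clock skew and uncertainty.

```python
# Timing budget for a path between registers sharing a 1-in-N clock enable.

def multicycle(f_clk_hz: float, n: int) -> dict:
    period_ns = 1e9 / f_clk_hz
    return {
        "single_cycle_budget_ns": period_ns,      # what STA assumes by default
        "multicycle_budget_ns": n * period_ns,    # what the enable actually allows
        "setup_multiplier": n,
        "hold_multiplier": n - 1,                 # moves the hold check back to the launch edge
    }
```

For example, at 24 MHz with an enable every 4 cycles, the budget grows from about 41.7 ns to about 166.7 ns.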

gnuarm · « **Reply #18 on:** February 23, 2021, 05:56:49 pm »

Quote from: asmi on February 23, 2021, 05:46:00 pm

Quote from: gnuarm on February 23, 2021, 04:52:39 pm
It is exactly the fallacy you mention about "too much" processing for the main clock that I am addressing by the use of clock enables.
In order to do that, you will have to add multicycle constraints, otherwise your timing is going to fail (because STA has no idea about your clock enables, and will always assume propagation needs to occur within a single clock cycle). That is pain in the butt, so I prefer using different clocks. Especially since FPGAs I use have hardware FIFOs, so crossing clock domains is not that big of a deal.

I can't recall a design that didn't have multicycle constraints. There's always something that runs slower than the clock rate.

When doing professional work clock domain crossing requires specific analysis to verify it will work and there are no tools for this, so it has to be done by hand. Too much documentation for me. One clock domain and the timing tools take care of the multicycle timing analysis. I just have to point them out, easy to do in comparison and easy to verify.

SiliconWizard · « **Reply #19 on:** February 23, 2021, 06:24:30 pm »

Quote from: gnuarm on February 23, 2021, 05:56:49 pm

Quote from: asmi on February 23, 2021, 05:46:00 pm
Quote from: gnuarm on February 23, 2021, 04:52:39 pm
It is exactly the fallacy you mention about "too much" processing for the main clock that I am addressing by the use of clock enables.
In order to do that, you will have to add multicycle constraints, otherwise your timing is going to fail (because STA has no idea about your clock enables, and will always assume propagation needs to occur within a single clock cycle). That is pain in the butt, so I prefer using different clocks. Especially since FPGAs I use have hardware FIFOs, so crossing clock domains is not that big of a deal.

I can't recall a design that didn't have multicycle constraints. There's always something that runs slower than the clock rate.

Yeah. (Well, of course, if your clock freq is low enough, you may not need multicycle constraints to pass timing analysis...)
But otherwise, multicycle constraints are pretty handy. There's a large number of cases for which they can be useful. Quite often, you even have "clock enables" that are not completely obvious - every time you have conditional constructs inside a process, for instance, you may have "hidden" clock enables.
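The "hidden" clock enable can be seen in a tiny truth-table model: a register assigned only inside nested ifs in a clocked process is a flip-flop whose enable is the AND of the conditions, holding its old value otherwise. The Python functions below are hypothetical names; the check covers every combination of 1-bit inputs.

```python
# Nested ifs in a clocked process == flip-flop with clock enable ce = a AND b.
from itertools import product


def process_style(q: int, a: int, b: int, d: int) -> int:
    """Next q as written with nested ifs in a clocked process."""
    if a:
        if b:
            return d
    return q  # no assignment on this path: the register keeps its value


def enable_style(q: int, a: int, b: int, d: int) -> int:
    """Same register with the enable made explicit."""
    ce = a & b                  # the "hidden" clock enable
    return d if ce else q       # feedback mux (or the flip-flop's CE pin)


def equivalent() -> bool:
    """True if both descriptions agree for all 1-bit inputs."""
    return all(
        process_style(q, a, b, d) == enable_style(q, a, b, d)
        for q, a, b, d in product((0, 1), repeat=4)
    )
```

Every condition added around an assignment is ANDed into that enable, which is how a wide, deeply nested if block ends up with the high-fanout enable OwO describes.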

