Author Topic: FPGA: More elegant (and less timing violating) way of doing simple register map?  (Read 5925 times)


Offline dmills

  • Super Contributor
  • ***
  • Posts: 2093
  • Country: gb
I would amend that to: don't divide the clock in fabric and then feed it to a CLOCK input on other logic. Feeding it to a clock enable input is fine.

if rising_edge(clk) and clk_en = '1' then ....

You do see a lot of truly horrible HDL out there, particularly from folk who think of HDL as programming and do not get that statement order expresses priority, not execution flow.
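As a rough illustration of the clock-enable approach described above, here is a minimal sketch. The entity name `ce_divider`, the generic `DIVIDE_BY`, and the port names are assumptions made for the example, not something from this thread. Everything stays on the single clock `clk`; the "divided" logic simply updates once every `DIVIDE_BY` cycles.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity ce_divider is
    generic (
        DIVIDE_BY : positive := 4  -- hypothetical ratio: one enable pulse every 4 clk cycles
    );
    port (
        clk : in  std_logic;
        d   : in  std_logic;
        q   : out std_logic
    );
end entity ce_divider;

architecture rtl of ce_divider is
    signal count  : natural range 0 to DIVIDE_BY - 1 := 0;
    signal clk_en : std_logic := '0';
begin
    -- Generate a strobe that is high for one clk cycle out of every DIVIDE_BY.
    gen_en : process (clk) is
    begin
        if rising_edge(clk) then
            if count = DIVIDE_BY - 1 then
                count  <= 0;
                clk_en <= '1';
            else
                count  <= count + 1;
                clk_en <= '0';
            end if;
        end if;
    end process gen_en;

    -- The "slow" register is still clocked by clk; clk_en only gates the update.
    slow_reg : process (clk) is
    begin
        if rising_edge(clk) then
            if clk_en = '1' then
                q <= d;
            end if;
        end if;
    end process slow_reg;
end architecture rtl;
```

Because `clk_en` is an ordinary registered signal in the same clock domain, no second clock is created. Paths between enabled registers may still need multi-cycle constraints if they cannot close timing at the full clock rate.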

Regards, Dan.
 

Offline simmconn

  • Regular Contributor
  • *
  • Posts: 55
In general, I agree. But like Bassman59 suggested, there are always cases where you have to use logic to divide the clock and then feed it to the global clock tree. For instance, I once had a clock divider whose output clock phase needed to quickly align with a one-shot input trigger signal edge. The number of possible phases was greater than the number of simultaneously available DCM/PLL outputs, and the 'time to lock' needed was shorter than the time it takes to reconfigure the DCM/PLL through its serial interface.

IMO if the FPGA vendor provided a function (or routing possibility), they have envisioned its use. It is up to the designer to use it wisely.
 

Offline Someone

  • Super Contributor
  • ***
  • Posts: 4510
  • Country: au
Quote from: simmconn
IMO if the FPGA vendor provided a function (or routing possibility), they have envisioned its use. It is up to the designer to use it wisely.

This is it. The tools used to have problems timing and/or routing these sorts of paths, but both of those aspects have been improved and it's possible to build those sorts of designs with relative efficiency. There are even situations where it's desirable (and already done by the vendors without you realising).
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Quote from: simmconn
IMO if the FPGA vendor provided a function (or routing possibility), they have envisioned its use. It is up to the designer to use it wisely.

I think you paint this a bit too passively - do not go against recommendations and recommended practices unless you want to fight dragons.

Example: There is a clocking path that allows a Transceiver PLL's reference clock to be driven from the global clocking network. Selecting GTGREFCLK as the clock source on GTXE2_CHANNEL is possible, but it comes with the note "Reference clock generated by the internal FPGA logic. This input is reserved for internal testing purposes only".

You can use it - and I have used it, and it might save your bacon when a ref clock is in the wrong place, but oh my, I wouldn't recommend it to others. Spin a new PCB with the clock on the correct pins if you have to.

Likewise driving clocking networks from the output of flip flops is "a bad idea". Use it only when you really have to. There are better ways and you should choose these in preference to routing FF outputs onto clocking networks, if at all possible.





Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline Bassman59

  • Super Contributor
  • ***
  • Posts: 2501
  • Country: us
  • Yes, I do this for a living


Quote from: simmconn on Today at 06:10:49 AM
In general, I agree. But like Bassman59 suggested, there are always cases where you have to use logic to divide the clock and then feed it to the global clock tree. For instance, I once had a clock divider whose output clock phase needed to quickly align with a one-shot input trigger signal edge. The number of possible phases was greater than the number of simultaneously available DCM/PLL outputs, and the 'time to lock' needed was shorter than the time it takes to reconfigure the DCM/PLL through its serial interface.

IMO if the FPGA vendor provided a function (or routing possibility), they have envisioned its use. It is up to the designer to use it wisely.



Some comments on the use of the clock enable as a mechanism for dividing clocks.

One, it is always instructive to see what a synthesizer does with your code. For example, Xilinx flops have a D and a CE input, and you might think that if your code was something like:

Code:
UseCE : process (clk) is
begin
    if rising_edge(clk) then
        if ce = '1' then
            q <= d;
        end if;
    end if;
end process UseCE;

it will synthesize to what you expect: the ce signal driving the CE input on the flop. But in more complex situations, I’ve seen the synthesizer build interesting logic in front of the CE input as well as the D, because ultimately the goal of the synthesizer is logic minimization, and in those cases the synth thought splitting the logic between the D and CE inputs was more efficient.
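To make the "more complex situations" concrete, here is a hypothetical register whose update condition depends on several signals. The entity and signal names (`ce_split_example`, `load`, `en`, `sel`, `a`, `b`) are invented for illustration. There is no single obvious CE here, so the tool has to decide how to partition the logic between the D and CE pins.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity ce_split_example is
    port (
        clk  : in  std_logic;
        load : in  std_logic;
        en   : in  std_logic;
        sel  : in  std_logic;
        a    : in  std_logic;
        b    : in  std_logic;
        q    : out std_logic
    );
end entity ce_split_example;

architecture rtl of ce_split_example is
begin
    prio_reg : process (clk) is
    begin
        if rising_edge(clk) then
            -- Statement order expresses priority: load wins over (en and sel).
            if load = '1' then
                q <= a;
            elsif en = '1' and sel = '1' then
                q <= b;
            end if;
            -- No final else: q holds its value, which implies some form of enable.
        end if;
    end process prio_reg;
end architecture rtl;
```

One plausible mapping is CE = load or (en and sel) with a small mux on D selecting between `a` and `b`, but the synthesizer is free to fold the whole condition into D with feedback from Q, or to split it some other way. Checking the post-synthesis schematic is the only way to know what it actually chose.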

Two, the clock enable signal can have a high fan-out, which makes it a candidate for a low-skew global net, but in newer architectures those global nets are for clocks only, and you can’t route from a clock net to a logic net. So even if you’ve worked out the timing constraints for the multi-cycle paths, you still have to ensure that the clock enable signal itself meets timing.

I think we all agree that the “Best” thing to do is to use a DLL or PLL or other clock-conditioning block to generate a lower-frequency clock which is guaranteed to be synchronous with the generating clock. Crossing the clock domains in this case is straightforward. Xilinx pushes you to do this, what with the generous number of DLLs and global clock nets available in their parts, combined with the inability to route a CE on a global net.

But it’s the case where you don’t have or can’t use a DLL/PLL and you also have a lot of logic that doesn’t meet timing at some high clock frequency, and the clock enable is loaded down such that it won’t work either, that you have to resort to dividing the clock with flops and then putting that output on a global net. And like I said before, once you do that, you have to consider the two clock domains to be totally asynchronous and manage the crossing very carefully. And yeah, you’ll feel dirty doing it.
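For the "manage the crossing very carefully" part, the usual starting point for a single-bit level signal is a two-flop synchronizer in the destination domain. This is only a sketch under that assumption: the entity and signal names are made up, and `ASYNC_REG` is a Xilinx-specific attribute that other vendors' tools will ignore or handle differently.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity sync_2ff is
    port (
        dst_clk  : in  std_logic;  -- clock of the receiving domain
        async_in : in  std_logic;  -- single-bit signal from the other domain
        sync_out : out std_logic
    );
end entity sync_2ff;

architecture rtl of sync_2ff is
    signal meta   : std_logic := '0';  -- first stage, may go metastable
    signal stable : std_logic := '0';  -- second stage, safe to use

    -- Xilinx-specific hint: keep the two flops together and treat them as a synchronizer.
    attribute ASYNC_REG : string;
    attribute ASYNC_REG of meta, stable : signal is "TRUE";
begin
    sync : process (dst_clk) is
    begin
        if rising_edge(dst_clk) then
            meta   <= async_in;
            stable <= meta;
        end if;
    end process sync;

    sync_out <= stable;
end architecture rtl;
```

This only works for individual bits that change slowly relative to `dst_clk`. Multi-bit buses need a handshake, Gray coding, or an asynchronous FIFO, because each bit can resolve on a different cycle.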

If you’ve worked with Microsemi ProASIC-3 parts this all might seem familiar.
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3137
  • Country: ca
Quote from: Bassman59
it will synthesize to what you expect: the ce signal driving the CE input on the flop. But in more complex situations, I’ve seen the synthesizer build interesting logic in front of the CE input as well as the D, because ultimately the goal of the synthesizer is logic minimization, and in those cases the synth thought splitting the logic between the D and CE inputs was more efficient.

Xilinx's logic cells have only one CE per slice, which controls all 8 flip-flops, so if you have lots of different CE signals, the tool has to manufacture something. I don't think this is more efficient; it's just that there's no other way. If you have a single CE which covers wide areas of logic, this shouldn't happen.

At any rate, if you want lots of logic driven by the same clock, you just apply CE to the clock. This way you don't need to worry about CE fan-out for flip-flops.
 

Offline Bassman59

  • Super Contributor
  • ***
  • Posts: 2501
  • Country: us
  • Yes, I do this for a living
Quote from: NorthGuy
At any rate, if you want lots of logic driven by the same clock, you just apply CE to the clock. This way you don't need to worry about CE fan-out for flip-flops.

If by "apply CE to the clock" you mean drive the clock input with the CE signal (after running it through a BUFG), sure, but that's just a variant on the divided-with-flip-flops idea that many here oppose!
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Quote from: NorthGuy
At any rate, if you want lots of logic driven by the same clock, you just apply CE to the clock. This way you don't need to worry about CE fan-out for flip-flops.

Quote from: Bassman59
If by "apply CE to the clock" you mean drive the clock input with the CE signal (after running it through a BUFG), sure, but that's just a variant on the divided-with-flip-flops idea that many here oppose!

I think they are talking about using a BUFGCE primitive. When CE is deasserted it holds the clock buffer output low, and once CE is asserted again the input clock is passed through to the output. Because the clock buffer does the switching, the original timing relationship is still valid.

If so, the end result still needs to use two clocking networks - the one using BUFG and then the 'divided' one using BUFGCE - end result is only a little bit different from using a PLL/MMCM/DCM/whatever, except the slow clock doesn't have a 50% duty cycle.
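A minimal sketch of the two-network arrangement described above, using the Xilinx `BUFG` and `BUFGCE` primitives from the UNISIM library. The wrapper entity `gated_clock` and its port names are assumptions for the example, and `slow_en` is assumed to be generated synchronously to the fast clock elsewhere in the design (for instance, a one-cycle strobe from a counter).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

library unisim;
use unisim.vcomponents.all;

entity gated_clock is
    port (
        clk_in   : in  std_logic;  -- fast source clock
        slow_en  : in  std_logic;  -- enable strobe, synchronous to the fast clock
        clk_fast : out std_logic;  -- full-rate clock on a global net
        clk_slow : out std_logic   -- gated clock on a second global net
    );
end entity gated_clock;

architecture rtl of gated_clock is
begin
    -- Full-rate clock network.
    fast_buf : BUFG
        port map (
            I => clk_in,
            O => clk_fast
        );

    -- "Divided" clock network: pulses of clk_in pass only while slow_en is high,
    -- so the output is a train of narrow pulses rather than a 50% duty cycle clock.
    slow_buf : BUFGCE
        port map (
            I  => clk_in,
            CE => slow_en,
            O  => clk_slow
        );
end architecture rtl;
```

Both outputs come from the same source through clock buffers, which is why the tools can still time paths between the two domains. The exact setup requirements on CE depend on the device family, so check the clocking resources guide for the part in question.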


 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3137
  • Country: ca
Quote from: NorthGuy
At any rate, if you want lots of logic driven by the same clock, you just apply CE to the clock. This way you don't need to worry about CE fan-out for flip-flops.

Quote from: Bassman59
If by "apply CE to the clock" you mean drive the clock input with the CE signal (after running it through a BUFG), sure, but that's just a variant on the divided-with-flip-flops idea that many here oppose!

Many clock primitives allow CE - BUFGCE, BUFMRCE, BUFR.
 

