Just another musing. Not strictly FPGA stuff, it's more like a general digital logic matter.
For serializing data, one can either use a shift register and output the LSB or MSB at each cycle, or use a multiplexer and select the right bit at each cycle.
In your opinion, what is the better approach in terms of area and Fmax? Is there a break-even number of bits at which one becomes better than the other? How scalable (in terms of number of bits) is one compared to the other?
From what I've seen, dedicated serializers in FPGAs (though honestly I have only seen a few details, for a limited number of models) tend to be a mix of both shift registers and multiplexers, cascaded in some way.
Your thoughts? (Again it's a general question about low-level implementation of serializers, not a question dealing with implementing serializers in HDL.)
Every so often I muse on this, too. Yes, it's weird.
Anyway.
The shift register approach fits in well with FPGA fabrics that have shift registers in the slice, such as the Xilinx SRL32. There are no routing or area penalties. But! This is a "fake" shift register, because it's actually implemented in the slice LUT as a small memory read out through a mux.
The mux approach requires a counter to pick the desired output, and that counter has to be initialized prior to the start of the shift process. Doing this with the load which precedes the shift makes sense. The code ends up looking like:
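mux_based_serializer : process (clk, rst_l) is
begin
    if rst_l = '0' then
        bitcnt <= 0;
        sr     <= (others => '0');
    elsif rising_edge(clk) then
        if load = '1' then
            -- capture the word and point the mux at the first bit out (MSB)
            sr     <= loadvalue;
            bitcnt <= MSB;
        elsif bitcnt > 0 then
            -- step the select toward bit 0, then hold there
            bitcnt <= bitcnt - 1;
        end if;
    end if;
end process mux_based_serializer;

-- the mux: bitcnt selects which bit of sr drives the output
outbit <= sr(bitcnt);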
This has an advantage in that the bit select (for the mux) and the counter which drives the shift process are the same. The serializer idles when the counter indicates the last bit.
The shift-register approach might look like it doesn't need the counter:
shiftreg_based_serializer : process (clk, rst_l) is
begin
    if rst_l = '0' then
        sr <= (others => '0');
    elsif rising_edge(clk) then
        if load = '1' then
            sr <= loadvalue;
        else
            sr <= sr(sr'left - 1 downto 0) & '0';
        end if;
    end if;
end process shiftreg_based_serializer;

outbit <= sr(sr'left);
... but something has to manage when the shift register is loaded, and that's often a counter, or a state machine which has to know how many bits to shift so that it knows when it can load the next word. So there are no savings there.
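To make that concrete, here is a rough sketch of the shift-register version with the bookkeeping counter included. The entity name, the NBITS generic and the busy output are all my own inventions for illustration, and I'm assuming MSB-first output and an active-low asynchronous reset like the snippets above. Take it as one possible arrangement, not the canonical one.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity shiftreg_serializer is
    generic (
        NBITS : positive := 8  -- word width (hypothetical name; assumed >= 2)
    );
    port (
        clk       : in  std_logic;
        rst_l     : in  std_logic;  -- asynchronous reset, active low
        load      : in  std_logic;  -- load a new word; assert when busy = '0'
        loadvalue : in  std_logic_vector(NBITS - 1 downto 0);
        outbit    : out std_logic;  -- serial output, MSB first
        busy      : out std_logic   -- '1' while bits remain after the current one
    );
end entity shiftreg_serializer;

architecture rtl of shiftreg_serializer is
    signal sr     : std_logic_vector(NBITS - 1 downto 0);
    signal bitcnt : natural range 0 to NBITS - 1;
begin

    shifter : process (clk, rst_l) is
    begin
        if rst_l = '0' then
            sr     <= (others => '0');
            bitcnt <= 0;
        elsif rising_edge(clk) then
            if load = '1' then
                sr     <= loadvalue;
                bitcnt <= NBITS - 1;  -- bits still to follow the first one
            elsif bitcnt > 0 then
                sr     <= sr(NBITS - 2 downto 0) & '0';
                bitcnt <= bitcnt - 1;
            end if;                   -- bitcnt = 0: hold the last bit
        end if;
    end process shifter;

    outbit <= sr(NBITS - 1);
    busy   <= '1' when bitcnt > 0 else '0';

end architecture rtl;
```

With this arrangement busy drops while the last bit is on the output, so whatever feeds the serializer can assert load in that cycle and get a gapless stream. Note that the counter is the same size as the one in the mux version, which is the point: the register count ends up the same either way.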
Expanding the length of the serializer is easily managed by changing a constant or generic which defines the number of bits, and that's the same in both cases.
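As an illustration of the generic approach, here is the mux-based process wrapped in a complete entity. The entity name and the NBITS generic are assumptions of mine; the process body is unchanged from the snippet earlier in this post.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity mux_serializer is
    generic (
        NBITS : positive := 8  -- word width (hypothetical name)
    );
    port (
        clk       : in  std_logic;
        rst_l     : in  std_logic;  -- asynchronous reset, active low
        load      : in  std_logic;
        loadvalue : in  std_logic_vector(NBITS - 1 downto 0);
        outbit    : out std_logic   -- serial output, MSB first
    );
end entity mux_serializer;

architecture rtl of mux_serializer is
    constant MSB  : natural := NBITS - 1;
    signal sr     : std_logic_vector(MSB downto 0);
    signal bitcnt : natural range 0 to MSB;  -- doubles as the mux select
begin

    mux_based_serializer : process (clk, rst_l) is
    begin
        if rst_l = '0' then
            bitcnt <= 0;
            sr     <= (others => '0');
        elsif rising_edge(clk) then
            if load = '1' then
                sr     <= loadvalue;
                bitcnt <= MSB;
            elsif bitcnt > 0 then
                bitcnt <= bitcnt - 1;
            end if;
        end if;
    end process mux_based_serializer;

    -- NBITS-to-1 mux on the held word
    outbit <= sr(bitcnt);

end architecture rtl;
```

Changing NBITS resizes the holding register, the counter range and the mux together. What the synthesizer does with a wide mux versus a long shift chain is where the two approaches would actually differ, and that depends on the target.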
And that brings up the larger idea, which is that a serializer doesn't exist in a vacuum. It needs logic to control it and that all gets wrapped up into the design and might be hard to quantify.
I think using an SRL32 (or equivalent) might end up being the "fastest" and use the fewest routing resources. But my guess is that for reasonably sized shift registers (around 32 bits in length) neither approach "wins."