Author Topic: MUXes or tristate for busses?  (Read 3773 times)


Online SiliconWizardTopic starter

  • Super Contributor
  • ***
  • Posts: 15198
  • Country: fr
MUXes or tristate for busses?
« on: November 16, 2020, 05:16:31 pm »
Just a quick question to ponder on...

When dealing with busses (with several interconnects) in HDL, do you use MUXes or tristate buffers? Are there situations where one is better than the other? Would you do it differently when targeting FPGAs vs. ASICs?

I for one use MUXes almost exclusively for this, but MUXes can get pretty expensive on FPGAs for large busses, so I'm wondering what's your take on this.
 

Online asmi

  • Super Contributor
  • ***
  • Posts: 2782
  • Country: ca
Re: MUXes or tristate for busses?
« Reply #1 on: November 16, 2020, 05:34:33 pm »
Tristate buffers do not exist inside FPGAs (at least those I worked with). You will always end up with MUXes, even if you didn't think you would.

Offline langwadt

  • Super Contributor
  • ***
  • Posts: 4655
  • Country: dk
Re: MUXes or tristate for busses?
« Reply #2 on: November 16, 2020, 05:43:58 pm »
Tristate buffers do not exist inside FPGAs (at least those I worked with). You will always end up with MUXes, even if you didn't think you would.

Yeah, AFAIK FPGAs haven't had internal tristate buffers for nearly 20 years.
 

Online ataradov

  • Super Contributor
  • ***
  • Posts: 11699
  • Country: us
    • Personal site
Re: MUXes or tristate for busses?
« Reply #3 on: November 16, 2020, 05:46:22 pm »
Yes, internal tri-state logic will either be rejected outright (Lattice tools did that) or be translated to regular muxes where possible, with an error reported when it's not possible.
Alex
 

Online SiliconWizardTopic starter

  • Super Contributor
  • ***
  • Posts: 15198
  • Country: fr
Re: MUXes or tristate for busses?
« Reply #4 on: November 17, 2020, 03:29:15 pm »
Yeah, I know most FPGAs these days don't have internal tristate buffers. (I remember FPGAs that did, but that's old enough that I can't even remember the exact models for sure...)

So yes, if your code uses internal tristate busses, multiplexers will be inferred (whenever possible though...)
My question is more about code style, knowing that (as I hinted) I always favor generic code whenever possible (so code that can be reused for ASICs and not just for FPGAs).
As I said, I only use MUXes these days, but I was wondering...
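Just to be concrete about the coding style in question, here is a made-up minimal sketch (invented module and signal names): an internal shared bus driven tristate-style with 'z', which a modern FPGA tool will either map onto a multiplexer or reject.

Code: [Select]
// Made-up sketch of an internal bus coded tristate-style (invented names).
// Most modern FPGAs have no internal tristate buffers, so synthesis either
// turns the two conditional drivers below into mux logic or errors out.
module internal_tristate_style (
    input  wire [7:0] a_data, b_data,
    input  wire       a_oe, b_oe,      // at most one enable high at a time
    output wire [7:0] q
);
    wire [7:0] shared_bus;             // internal bus with two drivers

    assign shared_bus = a_oe ? a_data : 8'bz;
    assign shared_bus = b_oe ? b_data : 8'bz;

    assign q = shared_bus;             // the reader of the bus
endmodule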

In other words, are tristate buffers for interconnecting busses more efficient than multiplexers when directly synthesized for ASICs? Or less efficient? (In terms of area, speed and power consumption...?)

If anyone has more experience with this...
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 4221
  • Country: gb
Re: MUXes or tristate for busses?
« Reply #5 on: November 17, 2020, 03:41:53 pm »
are tristate buffers for interconnecting busses more efficient than multiplexers when directly synthesized for ASICs? Or less efficient?

According to the book "Real World FPGA Design With Verilog", they are less efficient.
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline Berni

  • Super Contributor
  • ***
  • Posts: 5022
  • Country: si
Re: MUXes or tristate for busses?
« Reply #6 on: November 17, 2020, 03:45:08 pm »
I did them as MUXes because that's what they end up as, but even if you write it as a tristate bus the compiler will probably create the same MUX in the end since, as mentioned above, modern FPGAs don't support physical internal tristates. Never made an ASIC design, so I haven't had to care about that side.

On an ASIC, however, tristate tends to be the way to go because it requires less logic/transistors to implement and signal routing is more convenient, but it may present a harder-to-drive load due to being one huge net. It's a compromise; one might be better in some cases but not all.

I'd say look into it once you get to the point of actually having to make an ASIC design. The timing analysis will tell you how much of your valuable setup/hold margin you are using up in the bus switching.
 

Online chris_leyson

  • Super Contributor
  • ***
  • Posts: 1544
  • Country: wales
Re: MUXes or tristate for busses?
« Reply #7 on: November 17, 2020, 05:59:05 pm »
Xilinx XC3000 and XC4000 had internal tri-state buffers for signal routing. I might still have an XC4025 dev board from way back, somewhere.
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8031
  • Country: ca
Re: MUXes or tristate for busses?
« Reply #8 on: November 18, 2020, 11:10:37 pm »
The trick I use to mux data/address busses and gain FMAX is that every module which outputs data ANDs its output data with its own output-enable. When the enable is high, the output data passes through unchanged; when it is low, all of that module's output bits are forced low.

Now, I take all my different modules' output busses and just logically OR them together.
The OR of all those busses is my mux's result.

Pros: you may significantly increase FMAX if you are combining (OR'ing) 8 busses from all over the FPGA to feed a result latch, since part of the mux work (the AND stage) is already done ahead of time at each bus's source. This can also help if you then mux 4 banks of 8 OR'ed busses together, i.e. a huge 32:1.

Cons: if more than one bus's 'AND' driver stage is enabled, your output data will be corrupted.

It is usually just easier to mux a bus down over multiple clock stages to get a high FMAX, at the cost of everything sitting in a 2-or-more-tier pipeline.

The AND-OR trick is there to eliminate those pipeline clock stages when necessary.
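Roughly, in code, it looks like this; an untested sketch with made-up module and signal names, four sources shown for brevity:

Code: [Select]
// Rough sketch of the AND-OR bus mux described above (untested, names made up).
// Each source module gates its data with its own enable, then all the gated
// busses are OR'ed together to form the shared result bus.
module and_or_bus_mux #(
    parameter WIDTH = 32
) (
    input  wire [WIDTH-1:0] src0_data, src1_data, src2_data, src3_data,
    input  wire [3:0]       src_en,       // at most one enable high per cycle
    output wire [WIDTH-1:0] bus_result
);
    // AND stage, done locally at each source
    wire [WIDTH-1:0] g0 = src0_data & {WIDTH{src_en[0]}};
    wire [WIDTH-1:0] g1 = src1_data & {WIDTH{src_en[1]}};
    wire [WIDTH-1:0] g2 = src2_data & {WIDTH{src_en[2]}};
    wire [WIDTH-1:0] g3 = src3_data & {WIDTH{src_en[3]}};

    // OR stage: if more than one enable is high, the result is garbage
    assign bus_result = g0 | g1 | g2 | g3;
endmodule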
 
The following users thanked this post: SiliconWizard

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 4221
  • Country: gb
Re: MUXes or tristate for busses?
« Reply #9 on: November 19, 2020, 12:27:12 pm »
I'd like to have a crossbar matrix inside the FPGA. I know there was an IBM chip providing such a feature, but it hasn't been (reverse-)documented yet.
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 
The following users thanked this post: SiliconWizard

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 4221
  • Country: gb
Re: MUXes or tristate for busses?
« Reply #10 on: November 19, 2020, 12:29:54 pm »
Cons: if more than one bus's 'AND' driver stage is enabled, your output data will be corrupted.

Isn't there an address-to-slot decoder that solves this? I mean, on each cycle there should be only one address on the bus, thus only one slot decoded, hence only one device's output enabled.
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Online SiliconWizardTopic starter

  • Super Contributor
  • ***
  • Posts: 15198
  • Country: fr
Re: MUXes or tristate for busses?
« Reply #11 on: November 19, 2020, 02:27:00 pm »
I'd like to have a crossbar matrix inside the FPGA. I know there was an IBM chip providing such a feature, but it hasn't been (reverse-)documented yet.

That would be great!
 

Online SiliconWizardTopic starter

  • Super Contributor
  • ***
  • Posts: 15198
  • Country: fr
Re: MUXes or tristate for busses?
« Reply #12 on: November 19, 2020, 02:29:55 pm »
On an ASIC, however, tristate tends to be the way to go because it requires less logic/transistors to implement and signal routing is more convenient, but it may present a harder-to-drive load due to being one huge net. It's a compromise; one might be better in some cases but not all.

Yep, that was my thought.

I'd say look into it once you get to the point of actually having to make an ASIC design. The timing analysis will tell you how much of your valuable setup/hold margin you are using up in the bus switching.

Yeah. Partitioning your multiplexing (not necessarily for everything of course, but at least the large busses) would be the way to go to make this easier.
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8031
  • Country: ca
Re: MUXes or tristate for busses?
« Reply #13 on: November 19, 2020, 04:17:50 pm »
Cons: if more than one bus's 'AND' driver stage is enabled, your output data will be corrupted.

Isn't there an address-to-slot decoder that solves this? I mean, on each cycle there should be only one address on the bus, thus only one slot decoded, hence only one device's output enabled.
Well, the question is: are you decoding internal code and address selection,
or are you decoding a selection among multiple external address inputs?

SiliconWizard wasn't particularly clear, though it sounded as if he had multiple internal source muxes and addresses to select from.
Do you need the best 'static' selection speed, or can you pipeline it with a clock delay, or even multiple ones?

All these factors change what approach you will take if you are trying to achieve a restrictive FMAX or IO timing.

Tree-pipelined muxes will always be the fastest. Adding clock stages may eat more logic cells, but because of the tree-branch structure your tree ends up as a bunch of 2:1 muxes, which can usually clock beyond 400MHz even on the slowest FPGAs.  In Quartus there is already a parameterizable 'MUX' megafunction (meaning you do not need to code your own mux tree in HDL) where you specify 3 parameters: the number of sources to mux, the width, and the number of piped clock cycles you want before the result. That makes it easy for a 32:1, 24-bit bus mux to achieve >400MHz performance on any old Cyclone, though it will take 3 or 4 clocks in the pipeline from input to output.

Also, when comparing address ranges, decoding only the upper address bits and ignoring the least significant ones (i.e. defining a page size, so an address is considered in range only when its upper bits equal a particular value) offers the fastest performance, since you are only creating an equality comparator instead of a magnitude-A-to-magnitude-B comparator, which takes more gates to evaluate.
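For example, something like this; an untested sketch with an invented address map:

Code: [Select]
// Untested sketch (invented address map): only the upper address bits are
// compared, so each slot select reduces to a simple equality comparator.
module slot_decoder (
    input  wire [23:0] address,
    output wire [3:0]  slot_enable          // one-hot device enables
);
    wire [7:0] page = address[23:16];       // low 16 bits ignored => 64K page per slot

    assign slot_enable[0] = (page == 8'h00); // e.g. RAM
    assign slot_enable[1] = (page == 8'h10); // e.g. UART
    assign slot_enable[2] = (page == 8'h11); // e.g. timer
    assign slot_enable[3] = (page == 8'h12); // e.g. GPIO
endmodule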
« Last Edit: November 19, 2020, 04:22:22 pm by BrianHG »
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 4221
  • Country: gb
Re: MUXes or tristate for busses?
« Reply #14 on: November 20, 2020, 10:51:48 am »
the question is are you decoding internal code and address selection?
Or are you decoding multiple external address input selection?

Code: [Select]
// read path: mux the selected device's output back to the CPU
cpu.data_in  <= mux[slot_id] { device0.data_out, device1.data_out, ... device_(n-1).data_out }
// write path: CPU data fans out to every device; only the decoded slot is enabled
cpu.data_out =>              { device0.data_in, device1.data_in, ... device_(n-1).data_in },
                               device(slot_id).enable <= True,
                                        Others.enable <= False
    slot_id <= decode_address(address)
    address <= cpu.address;


This is the current working scheme of my Softcore_System_On_Fpga (SSOF << SoC).
Note: only one device is enabled per bus cycle.
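In Verilog terms, the mux[slot_id] read-back path is roughly a registered case statement; an untested equivalent with invented names, four devices shown:

Code: [Select]
// Untested sketch of the read-back mux above (invented names, four devices).
// The decoded slot_id guarantees only one source is selected per bus cycle.
module cpu_read_mux #(
    parameter WIDTH = 32
) (
    input  wire             clk,
    input  wire [1:0]       slot_id,
    input  wire [WIDTH-1:0] device0_data_out,
    input  wire [WIDTH-1:0] device1_data_out,
    input  wire [WIDTH-1:0] device2_data_out,
    input  wire [WIDTH-1:0] device3_data_out,
    output reg  [WIDTH-1:0] cpu_data_in
);
    always @(posedge clk) begin
        case (slot_id)
            2'd0:    cpu_data_in <= device0_data_out;
            2'd1:    cpu_data_in <= device1_data_out;
            2'd2:    cpu_data_in <= device2_data_out;
            default: cpu_data_in <= device3_data_out;
        endcase
    end
endmodule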


SiliconWizard wasn't particularly clear, though it sounded as if he had multiple internal source muxes and addresses to select from.

I assumed he has a softcore and wants to connect it to some useful devices, such as a UART, timer, GPIO, ...

All these factors change what approach you will take if you are trying to achieve a restrictive FMAX or IO timing.

My mux approach is very wide (32-bit data + 22-bit address + bus control), but I can still achieve my FMAX.

Tree-pipelined muxes will always be the fastest. Adding clock stages may eat more logic cells, but because of the tree-branch structure your tree ends up as a bunch of 2:1 muxes, which can usually clock beyond 400MHz even on the slowest FPGAs.  In Quartus there is already a parameterizable 'MUX' megafunction (meaning you do not need to code your own mux tree in HDL) where you specify 3 parameters: the number of sources to mux, the width, and the number of piped clock cycles you want before the result. That makes it easy for a 32:1, 24-bit bus mux to achieve >400MHz performance on any old Cyclone, though it will take 3 or 4 clocks in the pipeline from input to output.

Also, when comparing address ranges, decoding only the upper address bits and ignoring the least significant ones (i.e. defining a page size, so an address is considered in range only when its upper bits equal a particular value) offers the fastest performance, since you are only creating an equality comparator instead of a magnitude-A-to-magnitude-B comparator, which takes more gates to evaluate.

Here I am confused, and I have probably lost your point... this answer sounds too complex to me.
« Last Edit: November 20, 2020, 10:58:32 am by DiTBho »
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline emece67

  • Frequent Contributor
  • **
  • !
  • Posts: 614
  • Country: 00
Re: MUXes or tristate for busses?
« Reply #15 on: November 20, 2020, 11:03:23 am »
.
« Last Edit: August 19, 2022, 04:04:42 pm by emece67 »
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 4221
  • Country: gb
Re: MUXes or tristate for busses?
« Reply #16 on: November 20, 2020, 12:45:50 pm »
I remember having used tristates on an ASIC

Did you design the chip?
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Online SiliconWizardTopic starter

  • Super Contributor
  • ***
  • Posts: 15198
  • Country: fr
Re: MUXes or tristate for busses?
« Reply #17 on: November 20, 2020, 03:44:49 pm »
Yes, I was strictly speaking about internal busses, and yes, with "wide" busses in mind. The question, and the "FPGA vs. ASIC" point were a hint, but after re-reading my OP, I admit it wasn't perfectly clear.

External interconnections are a different beast, even though the same question could still be interesting there. But in both cases, multiplexers are always more expensive in terms of area. (And yes again, this is a general question: on most modern FPGAs internal tristate buffers do not exist, so strictly for FPGAs it's more a matter of code style, assuming the synthesis tool can infer the MUXes properly, which, frankly, having witnessed a number of inference bugs in many of these tools, is something I wouldn't even play with these days...)

And the idea of crossbar matrices sounds interesting. I wonder why FPGAs don't embed those? Maybe it would just be hard to pick a reasonable number of them and distribute them properly on the die so they are effectively usable... whereas LUTs are general-purpose structures that are much easier to deal with.
 

Offline emece67

  • Frequent Contributor
  • **
  • !
  • Posts: 614
  • Country: 00
Re: MUXes or tristate for busses?
« Reply #18 on: November 20, 2020, 06:23:48 pm »
.
« Last Edit: August 19, 2022, 04:04:49 pm by emece67 »
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8031
  • Country: ca
Re: MUXes or tristate for busses?
« Reply #19 on: November 27, 2020, 02:28:36 am »
the question is are you decoding internal code and address selection?
Or are you decoding multiple external address input selection?

Code: [Select]
cpu.data_in  <= mux[slot_id] { device0.data_out, device1.data_out, ... device_(n-1).data_out }
cpu.data_out =>              { device0.data_in, device1.data_in, ... device_(n-1).data_in },
                               device(slot_id).enable <= True,
                                        Others.enable <= False
    slot_id <= decode_address(address)
    address <= cpu.address;


This is the current working scheme of my Softcore_System_On_Fpga (SSOF << SoC ).
Note: only one device is enabled per bus cycle.


SiliconWizard wasn't particularly clear, though it sounded as if he had multiple internal source muxes and addresses to select from.

I assumed he has a Softcore and wants to connect it to some useful device, such as uart, timer, gpio, ...

All these factors change what approach you will take if you are trying to achieve a restrictive FMAX or IO timing.

My mux approach is very wide (32-bit data + 22-bit address + bus control), but I can still achieve my FMAX.

Tree-pipelined muxes will always be the fastest. Adding clock stages may eat more logic cells, but because of the tree-branch structure your tree ends up as a bunch of 2:1 muxes, which can usually clock beyond 400MHz even on the slowest FPGAs.  In Quartus there is already a parameterizable 'MUX' megafunction (meaning you do not need to code your own mux tree in HDL) where you specify 3 parameters: the number of sources to mux, the width, and the number of piped clock cycles you want before the result. That makes it easy for a 32:1, 24-bit bus mux to achieve >400MHz performance on any old Cyclone, though it will take 3 or 4 clocks in the pipeline from input to output.

Also, when comparing address ranges, decoding only the upper address bits and ignoring the least significant ones (i.e. defining a page size, so an address is considered in range only when its upper bits equal a particular value) offers the fastest performance, since you are only creating an equality comparator instead of a magnitude-A-to-magnitude-B comparator, which takes more gates to evaluate.

Here I am confused, and I have probably lost your point... this answer sounds too complex to me.


A simplified example of a home-made 2-level mux tree:

Code: [Select]
// decode once, then register the select so slot_id_stage2 lines up with the stage-1 results
    slot_id_stage1 <= decode_address(address)
    slot_id_stage2 <= slot_id_stage1

// first stage: four registered 4:1 muxes, each picking within its group of four devices
cpu.data_in_stage1_a <= mux[slot_id_stage1[1:0]] { device0.data_out, device1.data_out, device2.data_out, device3.data_out }
cpu.data_in_stage1_b <= mux[slot_id_stage1[1:0]] { device4.data_out, device5.data_out, device6.data_out, device7.data_out }
cpu.data_in_stage1_c <= mux[slot_id_stage1[1:0]] { device8.data_out, device9.data_out, device10.data_out, device11.data_out }
cpu.data_in_stage1_d <= mux[slot_id_stage1[1:0]] { device12.data_out, device13.data_out, device14.data_out, device15.data_out }

// now for the output stage: one more registered 4:1 mux picks between the four branch results
cpu.data_in          <= mux[slot_id_stage2[3:2]] {cpu.data_in_stage1_a, cpu.data_in_stage1_b, cpu.data_in_stage1_c, cpu.data_in_stage1_d }
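
Expressed as synthesizable Verilog, the same 2-level tree could look roughly like this (untested, 16 sources, invented names):

Code: [Select]
// Untested sketch of the 2-level pipelined mux tree above (invented names):
// four registered 4:1 muxes in stage 1, one registered 4:1 mux in stage 2,
// so the selected data arrives two clocks after slot_id is presented.
module mux16_tree #(
    parameter WIDTH = 32
) (
    input  wire                clk,
    input  wire [3:0]          slot_id,
    input  wire [16*WIDTH-1:0] device_data,   // sixteen sources, flattened
    output reg  [WIDTH-1:0]    cpu_data_in
);
    reg [WIDTH-1:0] stage1 [0:3];
    reg [1:0]       slot_id_stage2;

    integer i;
    always @(posedge clk) begin
        // stage 1: each branch picks one of its four sources
        for (i = 0; i < 4; i = i + 1)
            stage1[i] <= device_data[(i*4 + slot_id[1:0])*WIDTH +: WIDTH];
        slot_id_stage2 <= slot_id[3:2];        // delay the upper select bits
        // stage 2: pick one of the four branch results
        cpu_data_in <= stage1[slot_id_stage2];
    end
endmodule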

In Quartus, read about the 'LPM_MUX' function.  Its use is simple, like this:
Code: [Select]
lpm_mux LPM_MUX_component (
    .clock  (clock),
    .data   (sub_wire1),   // all sources concatenated into one lpm_size*lpm_width bus
    .sel    (sel),         // source select
    .result (sub_wire17)   // mux output, valid lpm_pipeline clocks later
    // synopsys translate_off
    ,
    .aclr   (),
    .clken  ()
    // synopsys translate_on
);
defparam
    LPM_MUX_component.lpm_pipeline = 1,    // number of pipeline register stages
    LPM_MUX_component.lpm_size     = 16,   // number of sources to mux
    LPM_MUX_component.lpm_type     = "LPM_MUX",
    LPM_MUX_component.lpm_width    = 32,   // width of each data source and the result
    LPM_MUX_component.lpm_widths   = 4;    // width of the sel input
Just put your chosen number in the 'xxxx.lpm_pipeline' parameter and it will do all the hidden dirty work for you.  The larger the pipeline, the higher the FMAX.
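For the 'data' port, all the sources get concatenated into one lpm_size x lpm_width wide bus (if I remember the convention right, source 0 sits in the least significant slice; double-check the IP documentation). An invented wiring fragment for the instance above:

Code: [Select]
// Invented wiring: pack sixteen 32-bit sources into the single flat bus
// feeding the LPM_MUX 'data' port of the instance above.
wire [31:0]      src [0:15];      // the sixteen source busses
wire [16*32-1:0] sub_wire1;       // flattened data input (lpm_size * lpm_width bits)
wire [3:0]       sel;             // source select (lpm_widths bits)
wire [31:0]      sub_wire17;      // pipelined mux result (lpm_width bits)

genvar i;
generate
    for (i = 0; i < 16; i = i + 1) begin : pack_sources
        assign sub_wire1[i*32 +: 32] = src[i];
    end
endgenerate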

See the documentation in the 'IP Library'.
Other third-party source code for auto-generating variable pipelined muxes might already exist out there, but Altera's internal one has existed since the start.
« Last Edit: November 27, 2020, 02:30:26 am by BrianHG »
 

