Author Topic: Optimizing MUXes  (Read 2582 times)

Offline notadave (Topic starter)

  • Contributor
  • Posts: 49
  • Country: de
Optimizing MUXes
« on: April 08, 2020, 04:30:44 pm »
I did some thinking about what costs area without doing much, then about MUXes and how to optimize them. There is some reading on the subject, but I ended up with the old truth of optimizing:
Optimizing a MUX might be a good idea, but not having it in the first place is even better.
I wonder how much of this:

with selector_OPERANT_source select
  dst_operationA_in <= src_registerA when 'A',
                       src_registerB when 'B';

with selector_OPERANT_source select
  dst_operationB_in <= src_registerA when 'A',
                       src_registerB when 'B';

dst_operationA_out <= foo(dst_operationA_in, ...);
dst_operationB_out <= foo(dst_operationB_in, ...);

with selector_RESULT_source select
  dst_register <= dst_operationA_out when 'A',
                  dst_operationB_out when 'B';

... finds its way into silicon
when the following would have done the same:

with selector_OPERANT_source select
  dst_operationX_in <= src_registerA when 'A',
                       src_registerB when 'B';
dst_operationA_in <= dst_operationX_in;
dst_operationB_in <= dst_operationX_in;

dst_operationA_out <= foo(dst_operationA_in, ...);
dst_operationB_out <= foo(dst_operationB_in, ...);

with selector_RESULT_source select
  dst_register <= dst_operationA_out when 'A',
                  dst_operationB_out when 'B';

Most likely not written as a combinational statement but inside a process with if-elsif too.
Do synthesisers catch that?
My point is this: Any circuit that drives a MUX input but is not selected is in that state effectively 'open', therefore it does not matter what it is driven with at its input at that moment. Should that circuit's input be on the output of another MUX, then that MUX is also not needed in that state. In the above example, and possibly in many cases, there is a MUX for each of the sources of the downstream MUX. All but one will be unused at any time.
 

Offline Bassman59

  • Super Contributor
  • ***
  • Posts: 2501
  • Country: us
  • Yes, I do this for a living
Re: Optimizing MUXes
« Reply #1 on: April 08, 2020, 07:13:11 pm »
Quote
Most likely not written as a combinational statement but inside a process with if-elsif too.
Do synthesisers catch that?

The synthesis result from the continuous "with-select" assignment should be identical to a case statement inside a process. That should end up being implemented as an n-to-1 mux.

The if-then-elsif results in priority-encoded logic, which is not the same.
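
A minimal sketch of the difference described above, assuming a hypothetical entity mux_styles whose port names are made up for illustration and do not come from the thread. The case process describes one parallel selection on a single selector, which is the same structure a with-select assignment describes. The if-elsif process tests unrelated conditions in order, so req0 overrides req1 and the logic has to encode that priority.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity; all names are assumptions.
entity mux_styles is
  port (
    sel        : in  std_logic_vector(1 downto 0);
    req0, req1 : in  std_logic;
    a, b, c, d : in  std_logic;
    y_case     : out std_logic;
    y_prio     : out std_logic
  );
end entity mux_styles;

architecture rtl of mux_styles is
begin
  -- Parallel selection: mutually exclusive choices on one selector.
  case_mux : process (sel, a, b, c, d)
  begin
    case sel is
      when "00"   => y_case <= a;
      when "01"   => y_case <= b;
      when "10"   => y_case <= c;
      when others => y_case <= d;
    end case;
  end process case_mux;

  -- Priority chain: req1 is only looked at when req0 is not '1'.
  prio_mux : process (req0, req1, a, b, c)
  begin
    if req0 = '1' then
      y_prio <= a;
    elsif req1 = '1' then
      y_prio <= b;
    else
      y_prio <= c;
    end if;
  end process prio_mux;
end architecture rtl;
```

When every if-elsif branch compares the same selector against mutually exclusive constants, a tool may still reduce the chain to a plain mux, so the synthesis report is the place to check what was actually built.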

Quote
My point is this: Any circuit that drives a MUX input but is not selected is in that state effectively 'open', therefore it does not matter what it is driven with at its input at that moment.

That is true! But that is true in the larger sense, as all of the things on the right-hand-side of an assignment that are not selected by an if statement (or a case) don't matter to the target of the assignment.

Quote
Should that circuit's input be on the output of another MUX, then that MUX is also not needed in that state. In the above example and possibly in many cases there is a MUX for each of the sources of the downstream MUX. All but one will be unused at any time.

Yes.

But all of that logic has to exist, because you expect to select it at some point, right? Otherwise, if the selectors in your case statements or your if statements were constant, then the tools should optimize the never-selected branches away.
 

Offline notadave (Topic starter)

  • Contributor
  • Posts: 49
  • Country: de
Re: Optimizing MUXes
« Reply #2 on: April 09, 2020, 05:42:08 am »
Quote
But all of that logic has to exist, because you expect to select it at some point, right?

Yes and No. It will never select all of it at any one point. So it depends on whether there is similarity among that logic. If it is selecting from entirely different sources, then it must exist. In the example I provided it doesn't have to exist more than once at all. Only what is different or is utilized at the same time must exist.
 

Offline OwO

  • Super Contributor
  • ***
  • Posts: 1250
  • Country: cn
  • RF Engineer.
Re: Optimizing MUXes
« Reply #3 on: April 09, 2020, 06:18:39 am »
That stuff is up to you to optimize. If my code describes the same logic twice, my intent is to generate two copies of the hardware (for example for two distant parts of the circuit). I do not want the tools merging same logic or doing other high level logic optimizations. The only simplification I expect the synthesizer to do is constant propagation, which I don't consider an optimization but rather a required step in synthesis.
Email: OwOwOwOwO123@outlook.com
 

Offline notadave (Topic starter)

  • Contributor
  • Posts: 49
  • Country: de
Re: Optimizing MUXes
« Reply #4 on: April 09, 2020, 06:26:26 am »
Quote
(for example for two distant parts of the circuit)
Is that even possible?
 

Offline OwO

  • Super Contributor
  • ***
  • Posts: 1250
  • Country: cn
  • RF Engineer.
Re: Optimizing MUXes
« Reply #5 on: April 09, 2020, 06:35:06 am »
Yes, often I have to duplicate registers that generate control signals, after I determine that high fanout is the cause of not meeting timing. You will have to use vendor-specific attributes to tell the synthesizer not to merge duplicate registers (attribute "keep" for Vivado).
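
A minimal sketch of that register duplication, assuming a hypothetical entity dup_ctrl_reg with made-up signal names. It declares the "keep" attribute mentioned above as a string attribute and applies it to both copies of the register, so each copy can be placed near its own loads. Vivado also documents a "dont_touch" attribute for similar purposes; which one a given flow needs is something to confirm against the tool's synthesis guide.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity; all names are assumptions.
entity dup_ctrl_reg is
  port (
    clk     : in  std_logic;
    ctrl_in : in  std_logic;
    ctrl_a  : out std_logic;  -- feeds one region of the design
    ctrl_b  : out std_logic   -- feeds a distant region
  );
end entity dup_ctrl_reg;

architecture rtl of dup_ctrl_reg is
  signal ctrl_a_r : std_logic := '0';
  signal ctrl_b_r : std_logic := '0';

  -- Ask the synthesizer to keep both registers although they are equivalent.
  attribute keep : string;
  attribute keep of ctrl_a_r : signal is "true";
  attribute keep of ctrl_b_r : signal is "true";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      ctrl_a_r <= ctrl_in;  -- first copy
      ctrl_b_r <= ctrl_in;  -- intentional duplicate
    end if;
  end process;

  ctrl_a <= ctrl_a_r;
  ctrl_b <= ctrl_b_r;
end architecture rtl;
```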
 
The following users thanked this post: notadave

Offline notadave (Topic starter)

  • Contributor
  • Posts: 49
  • Country: de
Re: Optimizing MUXes
« Reply #6 on: April 09, 2020, 06:37:48 am »
I drew an example that is less clear-cut than the one I wrote. I also did not draw how you could do the same with five MUX2 instead of those two MUX4. Yes, that is not breathtaking, but it also proves that there is potential.
 

Offline OwO

  • Super Contributor
  • ***
  • Posts: 1250
  • Country: cn
  • RF Engineer.
Re: Optimizing MUXes
« Reply #7 on: April 09, 2020, 06:53:24 am »
No, the synthesizer will not and should not optimize that. Plus on a modern FPGA you want to use big logic functions rather than many small 2-input logic functions, so the many 2-MUX approach is inferior to the 4-MUX one. On a 7 series FPGA a single LUT can implement a 4-MUX, which isn't very expensive at all. Micro-optimizations like these do not make sense on an FPGA.
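
For reference, a 4-to-1 mux of one bit has four data inputs plus two select inputs, six inputs in total, which is why it can fit a single 6-input LUT. A minimal sketch, assuming a hypothetical entity mux4 with made-up port names:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity; all names are assumptions.
entity mux4 is
  port (
    d   : in  std_logic_vector(3 downto 0);  -- four data inputs
    sel : in  unsigned(1 downto 0);          -- two select inputs
    y   : out std_logic
  );
end entity mux4;

architecture rtl of mux4 is
begin
  -- Indexing with the selector describes a 4-to-1 mux:
  -- a single-output function of six inputs.
  y <= d(to_integer(sel));
end architecture rtl;
```

Whether a particular tool actually packs this into one LUT is best confirmed in the utilization report.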
 

Offline notadave (Topic starter)

  • Contributor
  • Posts: 49
  • Country: de
Re: Optimizing MUXes
« Reply #8 on: April 09, 2020, 07:18:33 am »
Quote
Plus on a modern FPGA you want to use big logic functions rather than many small 2-input logic functions, so the many 2-MUX approach is inferior to the 4-MUX one. On a 7 series FPGA a single LUT can implement a 4-MUX, which isn't very expensive at all. Micro-optimizations like these do not make sense on an FPGA.

Feel free to draw an example of any size you like. Do use 8-MUX to make it future-proof.
 

Offline OwO

  • Super Contributor
  • ***
  • Posts: 1250
  • Country: cn
  • RF Engineer.
Re: Optimizing MUXes
« Reply #9 on: April 09, 2020, 10:46:49 am »
8-MUX will only use two LUTs on a 7 series FPGA because of the builtin mux combiner. OTOH 2-MUX may in practice use one LUT each because of slice routing restrictions. I looked at a random example, an AXI skid buffer implementation, that used a 32 bit 2-MUX that ended up using 32 LUTs for the mux part. I don't think MUX optimization like described is worth it on a modern FPGA. You should focus your efforts elsewhere, for example looking at where your design could map to builtin primitives. Going back to the AXI skid buffer, my implementation used a SRL16 which is a shift register that uses half of a LUT on 7 series devices, and allows a 15-deep FIFO for no extra cost.
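
A minimal sketch of the kind of description that synthesizers commonly map to an SRL16-style primitive, assuming a hypothetical entity srl_delay with made-up names. It is only the addressable shift register, not the skid buffer or FIFO control logic from the post. The register chain has no reset, since a reset on the individual stages typically prevents the mapping to a LUT-based shift register.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity; all names are assumptions.
entity srl_delay is
  port (
    clk  : in  std_logic;
    ce   : in  std_logic;             -- shift enable
    d    : in  std_logic;
    addr : in  unsigned(3 downto 0);  -- selects which tap to read
    q    : out std_logic
  );
end entity srl_delay;

architecture rtl of srl_delay is
  signal sr : std_logic_vector(15 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if ce = '1' then
        sr <= sr(14 downto 0) & d;  -- shift in at bit 0, no reset
      end if;
    end if;
  end process;

  -- Asynchronous read of the addressed tap.
  q <= sr(to_integer(addr));
end architecture rtl;
```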
 

Offline OwO

  • Super Contributor
  • ***
  • Posts: 1250
  • Country: cn
  • RF Engineer.
Re: Optimizing MUXes
« Reply #10 on: April 09, 2020, 10:56:43 am »
In your last image the obvious solution is a 6-MUX and a single instance of L, but don't expect the synthesizer to figure that out for you. Why not just describe that structure instead? Yes, if you write your code with a long process and lots of if/else branches you may end up with the structure shown in your picture, so expect to get spaghetti logic. I always advocate against behavioral code in synthesis for this reason. The code should describe the exact structure of the hardware.
 

Offline notadave (Topic starter)

  • Contributor
  • Posts: 49
  • Country: de
Re: Optimizing MUXes
« Reply #11 on: April 09, 2020, 12:27:36 pm »
OwO, you are off topic / beside the point. Yes, I should have made that L1 and L2 instead of writing just L, but I thought it would be clear enough.
 

Offline mrflibble

  • Super Contributor
  • ***
  • Posts: 2051
  • Country: nl
Re: Optimizing MUXes
« Reply #12 on: April 09, 2020, 01:48:36 pm »
Quote
OwO, you are off topic / beside the point. Yes, I should have made that L1 and L2 instead of writing just L, but I thought it would be clear enough.
Or you are not reading it right?

The way I read it OwO is pretty much on topic. The point is NOT the labeling of L. The point IS that if you describe your logic in an overly complicated fashion, do NOT expect the braindead synthesizer to figure out the most optimal equivalent for you. Especially if you decide to be double clever and put each tiny bit of logic into a separate module. Logic optimization across hierarchy can be rather sub-optimal.

IMO there is nothing to be gained (besides education) by optimizing muxes as per your first post. But by all means try it out, and check the synthesizer output.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 15148
  • Country: fr
Re: Optimizing MUXes
« Reply #13 on: April 09, 2020, 02:26:17 pm »
After reading all this, I'm not convinced there is really ground to optimize much, at least at the HDL level.

The only real thing to be careful about, as Bassman59 mentioned, is the use of if-then-elsif... constructs. A simple if-then-else will result in a simple MUX, but if you introduce additional cases with "elsif", the whole thing will be synthesized as a priority encoder, which is clearly suboptimal. So avoid if-then-elsif in general unless priority IS a requirement.

Other than that, from the few examples you gave, I don't think it will make much difference. Feel free to experiment though with your favorite synthesis tool.
 

Offline notadave (Topic starter)

  • Contributor
  • Posts: 49
  • Country: de
Re: Optimizing MUXes
« Reply #14 on: April 09, 2020, 02:45:45 pm »
Did a single one of you read the HDL example I gave in the first post?
 

Offline mrflibble

  • Super Contributor
  • ***
  • Posts: 2051
  • Country: nl
Re: Optimizing MUXes
« Reply #15 on: April 09, 2020, 03:36:54 pm »
Quote
Did a single one of you read the HDL example I gave in the first post?
Sure. I had a long reply to the first post. But I guess it got optimized away  ;) , because the entire point is 100% moot.

And in SiliconWizard's post I also note plenty of snippets that are a positive indication of having read your first post. Odd that you fail to notice the reference to your "Most likely not written as a combinational statement but inside a process with if-elsif too." Because I notice several posts pointing out to you that this <snippet-in-first-post> is a bad idea because of <reasons-given-in-replies-to-your-first-post>.

So maybe we read it just fine, but you are just not getting the replies you were hoping for?  :-// When wondering what the synthesizer will catch ... why not simply try it? My summary on synthesizer results is this: synthesizer sucks donkeybawls (*). If you have fairly lax timing requirements, by all means write behavioral code that's 10 logic levels deep. Result will be suboptimal, but your design will meet timing requirements just fine. If however this part of the design has to be fast, and you need the synthesis result to be 1 logic level max, then the tools unfortunately still need a fair bit of hand-holding. Now what has that to do with your first post?!? Everything, that's what. It means that you are misallocating your optimization developer time units. But as said, just code it up and check the synthesizer output. Best way to verify your hypothesis regarding optimization.
 
(*) Technical term commonly used to rate synthesizer output on the Donkey scale. It's a bit like dB, but with more donkeys.
 

Offline notadave (Topic starter)

  • Contributor
  • Posts: 49
  • Country: de
Re: Optimizing MUXes
« Reply #16 on: April 09, 2020, 04:26:32 pm »
Quote
Sure. I had a long reply to the first post. But I guess it got optimized away, because the entire point is 100% moot.
I would be amused if I were not so disappointed.

Quote
I also note plenty of snippets that are a positive indication of having read your first post.
Now I know that reading comprehension is the issue. The OECD's PISA results might be true after all.

Quote
Because I notice several posts pointing out to you that this <snippet-in-first-post> is a bad idea because of <reasons-given-in-replies-to-your-first-post>.
That part was not even about the actual topic.

Quote
not getting the replies you were hoping for?
To be understood is a low expectation.

Quote
timing requirements
It is about area and utilization, timing was never part of it.

Quote
your optimization developer time units
I was not talking about any circuit in particular.

Quote
But as said, just code it up and check the synthesizer output. Best way to verify your hypothesis regarding optimization.
Ok, you are right. I already have the HDL, might as well write some more and look what one synthesis tool does with it.
Possibly it picks up on the fact that there are multiple MUXes with the same selector (selector_OPERANT_source) and the same inputs.
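
One way such an experiment could be set up is sketched below, assuming a hypothetical package mux_example_pkg and entity mux_example. The signal names follow the first post. The 8-bit width, the selector type sel_t, and the body of foo (an addition with a constant second argument standing in for the elided arguments) are placeholders invented for illustration. Synthesizing the two architectures separately and comparing the utilization reports shows whether the tool merges the duplicated input MUX. Some tools restrict enumerated types on top-level ports, so this entity may need a wrapper.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package mux_example_pkg is
  type sel_t is ('A', 'B');  -- assumed selector type
  -- Hypothetical stand-in for foo(...) from the first post.
  function foo (x : unsigned; k : natural) return unsigned;
end package mux_example_pkg;

package body mux_example_pkg is
  function foo (x : unsigned; k : natural) return unsigned is
  begin
    return x + k;  -- placeholder operation, same width as x
  end function foo;
end package body mux_example_pkg;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.mux_example_pkg.all;

entity mux_example is
  port (
    selector_OPERANT_source : in  sel_t;
    selector_RESULT_source  : in  sel_t;
    src_registerA           : in  unsigned(7 downto 0);
    src_registerB           : in  unsigned(7 downto 0);
    dst_register            : out unsigned(7 downto 0)
  );
end entity mux_example;

-- Variant 1: one input MUX per operation, as in the first listing.
architecture duplicated of mux_example is
  signal dst_operationA_in, dst_operationB_in   : unsigned(7 downto 0);
  signal dst_operationA_out, dst_operationB_out : unsigned(7 downto 0);
begin
  with selector_OPERANT_source select
    dst_operationA_in <= src_registerA when 'A',
                         src_registerB when 'B';
  with selector_OPERANT_source select
    dst_operationB_in <= src_registerA when 'A',
                         src_registerB when 'B';
  dst_operationA_out <= foo(dst_operationA_in, 1);
  dst_operationB_out <= foo(dst_operationB_in, 2);
  with selector_RESULT_source select
    dst_register <= dst_operationA_out when 'A',
                    dst_operationB_out when 'B';
end architecture duplicated;

-- Variant 2: a single shared input MUX, as in the second listing.
architecture shared of mux_example is
  signal dst_operationX_in                      : unsigned(7 downto 0);
  signal dst_operationA_out, dst_operationB_out : unsigned(7 downto 0);
begin
  with selector_OPERANT_source select
    dst_operationX_in <= src_registerA when 'A',
                         src_registerB when 'B';
  dst_operationA_out <= foo(dst_operationX_in, 1);
  dst_operationB_out <= foo(dst_operationX_in, 2);
  with selector_RESULT_source select
    dst_register <= dst_operationA_out when 'A',
                    dst_operationB_out when 'B';
end architecture shared;
```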

Because it does not seem to be understood I drew the example from post one.
 

Offline Bassman59

  • Super Contributor
  • ***
  • Posts: 2501
  • Country: us
  • Yes, I do this for a living
Re: Optimizing MUXes
« Reply #17 on: April 09, 2020, 05:26:43 pm »
Quote
Did a single one of you read the HDL example I gave in the first post?

I did.

And what I saw were a bunch of continuous assignments that any synthesizer worth using will optimize based on the target device fabric. In a Xilinx 7-series, it may end up taking one LUT. In a MicroSemi ProASIC-3, it'll be a chain of muxes. In a Lattice MachXO2, it'll take a couple of LUTs, I don't know.

I suppose I should ask: did you read my response to your post?
 
The following users thanked this post: mrflibble, SiliconWizard
