Author Topic: Vivado net delay tuning  (Read 6563 times)


Online asmi

  • Super Contributor
  • ***
  • Posts: 2874
  • Country: ca
Re: Vivado net delay tuning
« Reply #25 on: March 02, 2023, 03:09:04 am »
- I was already down to 1 logic level and thinking "this can't be pipelined any more" (unless all you wanted was a shift register / delay line). Thanks to further nudges in that direction on here, it then dawned on me that the extra pipeline stage might actually reduce net delays as opposed to LUT/MUX delays.
It's actually easy to visualize. You have a long net delay when your connection is, well, long. But if you add a pipeline register, the placer can put that register in the middle of that long line, which cuts the net delay roughly in half, because the signal now only needs to travel half of the original distance in a clock cycle (either from the source to the register, or from the register to the destination). There is a reason Xilinx implemented a crap ton of registers in their CLBs (there are twice as many registers as LUTs in a slice), so use them to your advantage. If your design is not latency-sensitive, you can achieve very high frequency by pipelining the crap out of it. Like someone here once said, just keep adding pipeline registers into failing nets until they stop failing, or you reach the limit of the latency your design can tolerate.
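As a rough illustration of the "keep adding registers" rule, here is a minimal Python sketch. It assumes the net delay scales linearly with distance and that the placer can space the registers evenly, and it ignores the clock-to-Q and setup overhead each extra register adds. The function names and the 3000 ps / 1750 ps figures are made up for the example, not taken from any timing report.

```python
import math

def registers_needed(net_delay_ps: float, budget_ps: float) -> int:
    """Smallest number of evenly spaced pipeline registers so that
    every hop of the net fits in the per-cycle delay budget.

    Assumption: delay is proportional to distance, so splitting the
    net into N equal hops gives each hop net_delay_ps / N.
    """
    if net_delay_ps <= 0 or budget_ps <= 0:
        raise ValueError("delays must be positive")
    hops = math.ceil(net_delay_ps / budget_ps)
    return hops - 1  # N hops need N-1 registers in between

def hop_delay_ps(net_delay_ps: float, registers: int) -> float:
    """Delay of each hop once the net is cut by `registers` registers."""
    return net_delay_ps / (registers + 1)

# Hypothetical numbers: a 3000 ps net with 1750 ps available per cycle.
n = registers_needed(3000, 1750)
print(n, hop_delay_ps(3000, n))
```

Each register added this way costs one clock cycle of latency, which is the limit mentioned above.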
 
The following users thanked this post: Mario87

Offline Someone

  • Super Contributor
  • ***
  • Posts: 5188
  • Country: au
    • send complaints here
Re: Vivado net delay tuning
« Reply #26 on: March 02, 2023, 04:31:28 am »
Not sure quite what you're after regarding working / reference / source, but the information I stated came from the datasheet and the implementation timing report. It was telling me that I have a slack of -129ps. The clock was declared at 400MHz, or 2500ps period. At 75% of 464MHz we are at 348MHz as you state, which is a period of 2873ps, a difference of 373ps, which is significantly more than the 129ps I needed to have zero slack, and would have closed timing if I had dropped the clock to that speed.
But you have changed nothing, so there is no gain to your system by wishing for extra timing. Was 400MHz, is still 400MHz, and the timing failed for that design.

We are not saying 75% is a measure that is easily achievable for all systems; it's a finger-in-the-air figure for about the best most people would attempt. That still requires effort and attention, so just waving it away as a given is really foolish.
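For reference, the period arithmetic in the quoted text can be checked with a short Python sketch. It only restates the numbers from the quote (400MHz constraint, 464MHz datasheet figure, 75% derating, -129ps slack). It also assumes, as a first-order estimate, that the path delay stays the same when the clock constraint is relaxed, which a fresh place and route would not guarantee.

```python
def period_ps(freq_mhz: float) -> float:
    """Clock period in picoseconds for a frequency in MHz."""
    return 1e6 / freq_mhz

constraint_mhz = 400.0   # clock as declared in the design
dsp_fmax_mhz = 464.0     # datasheet figure quoted in the thread
derating = 0.75          # the "75%" rule of thumb under discussion
reported_slack_ps = -129.0

derated_mhz = derating * dsp_fmax_mhz                       # 348 MHz
extra_ps = period_ps(derated_mhz) - period_ps(constraint_mhz)

# First-order estimate only: same path delay, longer period.
estimated_slack_ps = reported_slack_ps + extra_ps

print(f"period at {derated_mhz:.0f} MHz: {period_ps(derated_mhz):.1f} ps")
print(f"extra time per cycle: {extra_ps:.1f} ps")
print(f"estimated slack: {estimated_slack_ps:.1f} ps")
```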
 

Offline _pat_ (Topic starter)

  • Contributor
  • Posts: 32
  • Country: gb
Re: Vivado net delay tuning
« Reply #27 on: March 02, 2023, 12:26:03 pm »
asmi,

One of the things that was causing some confusion is that the nets were electrically long, but physically short. So imagine you have a LUT or FDRE within a slice or two of the DSP48E1, but it is still failing timing. That's what I was seeing in the Device view, and at that time I was thinking "well, how can adding another FDRE, which will have a hard time being any closer, actually help?". I figured that Vivado ought to know what the net delays are between any two points on the die, using whichever routing is fastest, so how do I ask it to use that info in order to change the placement or the routing? In theory, that should have been a solution, but reality said otherwise. As I alluded to earlier and as you have also stated, inserting an FDRE may reduce the number of net segments / interconnect junctions, thus reducing net delay on part of the failing path.

Someone,

I'm not sure how you concluded that I changed nothing. As per the last post, I found that adding a reg into the failing path, and then doing post-route PhysOpt, gave timing closure where previously it was failing. Now it should be noted that this was done as an academic exercise (i.e. theory suggested that the introduction of the reg could split the failing net into two smaller paths that may then meet timing, which they did), not as a long-term solution.

With regard to waving away the 75%, I think this is a misunderstanding. My point was that if I had dropped the clock to 348MHz, it would have given timing closure with ample room to spare, at that point in the design process. That's not the same thing as saying that if I do that, then the rest of the design will be easy, only that it would have worked right now. I need to have a think about how far to drop the clock speed before progressing the design.

Many thanks,

Pat.
 

Offline tchiwam

  • Regular Contributor
  • *
  • Posts: 134
  • Country: ca
Re: Vivado net delay tuning
« Reply #28 on: March 06, 2023, 12:08:12 pm »
This is in no way the solution, but I found out that moving stuff out of the timing-critical area helps a bit. For some reason Vivado slapped my IP address in the middle of my ADC and DAC processing. Moving these fixed values near the Ethernet helped a lot.
 

Offline pbernardi

  • Contributor
  • Posts: 20
  • Country: br
Re: Vivado net delay tuning
« Reply #29 on: March 06, 2023, 05:39:33 pm »
Hello _pat_,

Two comments about your discussion.

- You are using a Series 7 speed grade -1. If speed is so critical, can't a faster speed grade be used?

- Regarding the BRAM x DSP max. frequency: if you connect the BRAM output directly to the DSP, you are limited to the BRAM frequency. But you can always think in terms of bandwidth to improve the results. For example, if you have a 16-bit operation at your DSP, you can read the data from the BRAM as 32 bits, pipeline it, and send these 32 bits to your DSP as 2 cycles of 16 bits. This way, if your BRAM works @380 MHz, the output logic may work at up to 760 MHz (theoretically). Not sure if this is applicable to your design.
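The width-conversion idea can be sketched in Python as a behavioural model, not as HDL. It assumes the low 16 bits of each 32-bit word are sent first, which is an arbitrary choice for the example, and the function names are made up. The rate figure is the theoretical one from the text and says nothing about whether the downstream logic can actually run that fast.

```python
from typing import Iterable, Iterator

def split_words(words32: Iterable[int]) -> Iterator[int]:
    """Turn each 32-bit BRAM word into two 16-bit samples.

    Assumption: the low half is emitted first; a real design would
    pick whichever order matches how the data was written.
    """
    for word in words32:
        yield word & 0xFFFF           # low 16 bits, first fast cycle
        yield (word >> 16) & 0xFFFF   # high 16 bits, second fast cycle

def output_rate_mhz(bram_mhz: float, bram_width: int, sample_width: int) -> float:
    """Sample rate on the narrow side when bandwidth is conserved."""
    return bram_mhz * bram_width / sample_width

print([hex(s) for s in split_words([0x12345678])])
print(output_rate_mhz(380, 32, 16))
```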
 

Offline _pat_ (Topic starter)

  • Contributor
  • Posts: 32
  • Country: gb
Re: Vivado net delay tuning
« Reply #30 on: March 07, 2023, 01:35:43 pm »
tchiwam,

Agreed, moving as much as you can out of the path helps. The issue I had was that I was down to only one item in the path, and I couldn't move it out of the path. 70% of the delay was coming from the net, not from the logic, so the usual approach of moving things around in the pipeline wasn't going to work; it needed a shorter net delay.

pbernardi,

You are of course correct that a faster speed grade might get timing closure where the slower device cannot, but this misses the point that the slower device *should* have been able to get timing closure. Remember that the only reason it was failing was due to net delays... the logic was plenty fast enough.

With regard to the BRAM suggestion, yes you can MUX wider data into the DSP48E1. That wasn't going to work for my application (which is more latency-oriented than throughput-oriented, which is why adding a pipeline stage is a PITA, since that requires everything else to compensate for the extra latency), but yes, for raw throughput it could certainly work... though only up to 464MHz, since that's the max pipelined throughput of a DSP48E1 in the SG1 Spartan7. [You can of course run the DSP48E1 as 2 lots of 24-bit-wide data, or 4 lots of 12-bit-wide data, for a max of 1856M data-elements per second, per DSP48E1].
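The 1856M figure follows from the lane count. A small Python check, assuming a 48-bit datapath split into equal lanes (as implied by the 2 x 24 and 4 x 12 options above) and one result per lane per clock at the 464MHz limit; the function name is just for illustration:

```python
DATAPATH_BITS = 48   # assumed from the 2 x 24 / 4 x 12 split above
FMAX_MHZ = 464       # max pipelined rate quoted above

def elements_per_second_m(lane_bits: int) -> int:
    """Millions of data elements per second for a given lane width."""
    lanes = DATAPATH_BITS // lane_bits
    return lanes * FMAX_MHZ

for width in (48, 24, 12):
    print(width, elements_per_second_m(width))
```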

All,

In order to avoid confusion, and because I did actually have some failures which had *no* logic in the path at all, I'll concentrate on that case and not muddy the waters with having logic in the path :

The major lesson I learnt from this exercise is that the timing report does not make it obvious that a net is not necessarily monolithic, i.e. that it is made up of multiple segments, each of which adds its own delay to the net, and that one can leverage that fact to gain timing closure by introducing extra flops somewhere along the path. Again, just to be clear, we are talking about a direct connection from point A to point F, which is reported as a total net delay between two sequential elements with *no logic* in between them (it's just a wire). In reality this takes a path ABCDEF, representing multiple segments of the same net (*no* logic!). By breaking this path at C/D you have two sub-paths, ABC and DEF. A flop between C and D solves the timing closure, since an edge launched at A arrives at C in time, and an edge launched at D arrives at F in time. It does introduce a pipeline delay, and you may have to compensate for that, BUT it will get timing closure where previously you couldn't.
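To make the ABCDEF picture concrete, here is a toy Python model that picks where to break a segmented net. The segment delays are invented for the example, the register is modelled as sitting at a junction rather than between two, and the model ignores clock-to-Q, setup time and the fact that re-routing after the insertion changes the segments. It is only a sketch of the reasoning, not of what Vivado does internally.

```python
from typing import List, Tuple

def best_split(segment_delays_ps: List[float]) -> Tuple[int, float]:
    """Find where to put one register on a net made of several segments.

    Returns (k, worst): the register goes after the first k segments,
    and `worst` is the larger of the two resulting sub-path delays.
    """
    if len(segment_delays_ps) < 2:
        raise ValueError("need at least two segments to split a net")
    total = sum(segment_delays_ps)
    best_k, best_worst = 0, float("inf")
    running = 0.0
    for k in range(1, len(segment_delays_ps)):
        running += segment_delays_ps[k - 1]
        worst = max(running, total - running)  # slower of the two halves
        if worst < best_worst:
            best_k, best_worst = k, worst
    return best_k, best_worst

# Hypothetical delays for segments A-B, B-C, C-D, D-E, E-F (2300 ps total).
delays = [400, 500, 450, 550, 400]
k, worst = best_split(delays)
print(f"register at junction {'ABCDEF'[k]}, worst sub-path {worst} ps")
```

With these made-up numbers, a net that fails a hypothetical 1500 ps budget as a whole becomes two sub-paths that both fit.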

Separately, even when doing this, it was necessary to get Vivado to do post-route PhysOpt, but once that was done, it worked :)

Cheers,

Pat.
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3305
  • Country: ca
Re: Vivado net delay tuning
« Reply #31 on: March 07, 2023, 06:04:28 pm »
Don't trust it when it classifies the delays as net delays vs logic delays. Many things that I would classify as logic delays (such as delays on inputs on the LUT) are classified as net delays.

You may be able to optimize delays by manual placing and routing, or by instantiating LUTs manually. For example, the A1 input of a LUT is slower than A6. Thus you can see which pins it is using on your LUTs and try to fix that. Or you can try to find innovative routing. Most of the time, the tools can find the best solution, but sometimes they miss, since this is not a deterministic process. So you may be able to do better manually. But manual routing is a tedious job. You don't really want to do it unless there's no other choice.
 

Offline _pat_ (Topic starter)

  • Contributor
  • Posts: 32
  • Country: gb
Re: Vivado net delay tuning
« Reply #32 on: March 07, 2023, 09:28:45 pm »
NorthGuy,

Thanks for the heads up on the attribution of delays. Must admit the timing report was a tad "curious" at first :)

I did have a play with manual placement, tweaking things after Vivado did the initial heavy lifting, but I made it worse, LOL. I suspect that's to do with my present lack of knowledge of the routing to/from the interconnects, which could result in a BEL being physically closer but electrically further away, just because of the way that it has to be routed. With more experience it should be possible to get a better result, for small sections of the design, than the tools can achieve, but as you say, you really wouldn't want to do it unless you had no choice.

Cheers,

Pat.
 

