Author Topic: Vivado net delay tuning  (Read 7033 times)

Offline _pat_Topic starter

  • Contributor
  • Posts: 32
  • Country: gb
Vivado net delay tuning
« on: February 28, 2023, 10:50:09 pm »
Hello All!

Working on a design in Vivado for an XC7S50, speed grade 1 - this "should" be capable of core speed of around 460MHz. The design uses a clock at 400MHz and it is pipelined to try to meet timing, but Vivado still struggles. It is almost always down to net delay, not logic... eg 30% logic and 70% net delay. Even with only a single LUT in the path it often fails. Think I may have even seen it fail from one sequential element to another even without any combinatorial logic in the path at all.

At this point it seems that either a) I am being too optimistic about the achievable speed with this device or b) I am missing a P&R option to make it prioritise net delay more. At one point Performance_NetDelay_low managed to succeed, but then just adding some new logic elsewhere can break a timing path that had previously been fixed again. At this point I seem to be spending most of the time fighting the tools rather than actually making progress.

Any suggestions most welcome.

Cheers,

Pat.
 

Offline Someone

  • Super Contributor
  • ***
  • Posts: 5210
  • Country: au
    • send complaints here
Re: Vivado net delay tuning
« Reply #1 on: March 01, 2023, 04:45:07 am »
Any suggestions most welcome.
Unsolvably high net delay is usually either: congestion (how high is the utilisation?) or routing between "hard"/fixed primitives that can't be freely swapped like IOs.

The speed grades and switching characteristics are a very rubbery guide, few designers try to get near them. Which parameter are you citing for your 460MHz ?
 

Offline _pat_Topic starter

  • Contributor
  • Posts: 32
  • Country: gb
Re: Vivado net delay tuning
« Reply #2 on: March 01, 2023, 11:51:23 am »
Hello Someone,

Thanks for taking the time to reply!

With regard to your questions :

The utilisation is very low at the moment. I figured that this part of the design would be the hardest to get right, and so decided that should be the place to start, since any design decisions (eg pipeline stages) will affect other parts - may as well mitigate the possibility of having to re-time things that already exist, by implementing things relative to something that already works.

It is an interesting point that you make with regard to hard/fixed resources - it looks like the net delays that are giving the most trouble are those that go to/from such resources (DSP48E1 in this case). This had me thinking that maybe the "fix" is to deliberately introduce some extra delay and then make these paths multi-cycle (virtual-pipeline)... but then one would need to persuade the optimiser that these "unnecessary" delays are actually needed. Mmmmmm.....

With regard to the speed expectations, they were based on DS189, the Spartan 7 Datasheet. Specifically, the maximum stated for the DSP48E1 slice is 464.25MHz (with all pipeline regs), I can't use the BlockRAM in this section since it is limited to 388.2MHz so I'm having to use DistRAM (which is specced for 2.5ns clock period). Whilst the MMCM does allow a somewhat faster clock, you can't get it to "go places" since the BUFGs and BUFHs are limited to 464.00MHz (and BUFRs are 315MHz). The limiting factor, then, is the maximum clock through the DistRAM, which is 400MHz. But I was getting timing failures at 400MHz even before implementing the RAM (as mentioned, into the DSP48E1s).
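
One way to sanity-check the reasoning above is to take the minimum of the datasheet limits for the resources the clock actually passes through. The sketch below uses only the figures quoted in this post (they have not been re-checked against DS189), and the dictionary and function names are purely illustrative.

```python
# Datasheet maxima as quoted in the post above (MHz); not re-checked against DS189.
LIMITS_MHZ = {
    "DSP48E1 (all pipeline regs)": 464.25,
    "BUFG/BUFH": 464.00,
    "Block RAM": 388.2,
    "Distributed RAM": 1000.0 / 2.5,  # quoted as a 2.5 ns minimum clock period
    "BUFR": 315.0,
}


def limiting_resource(used):
    """Return (name, fmax_mhz) of the slowest resource among those used."""
    name = min(used, key=lambda r: LIMITS_MHZ[r])
    return name, LIMITS_MHZ[name]


# Resources this part of the design relies on, per the post.
used = ["DSP48E1 (all pipeline regs)", "BUFG/BUFH", "Distributed RAM"]
name, fmax = limiting_resource(used)
print(f"{name}: {fmax:.1f} MHz ceiling, {1e6 / fmax:.0f} ps period")
```

This only gives the ceiling set by the primitives themselves; it says nothing about routing delay between them, which is what the rest of the thread is about.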

Separately, I stumbled upon some further info regarding post-place and post-route PhysOpt that I should investigate a little more. I believe that is already in play with Performance_NetDelay_low, but apparently it is iterative and can be run again, and again, but it seems like using a cannon to crack a walnut, and ideally I'd like to get timing closure without having to torture the tools.

Many thanks,

Pat.
 

Offline Someone

  • Super Contributor
  • ***
  • Posts: 5210
  • Country: au
    • send complaints here
Re: Vivado net delay tuning
« Reply #3 on: March 01, 2023, 05:18:57 pm »
Specifically, the maximum stated for the DSP48E1 slice is 464.25MHz (with all pipeline regs), I can't use the BlockRAM in this section since it is limited to 388.2MHz so I'm having to use DistRAM (which is specced for 2.5ns clock period).
Ambitious. Typical designs on Xilinx reach "high effort" at 2/3 to 3/4 of the DSP maximum frequency. Any higher and it will need almost ideal pipelining, which the tools have actually been quite bad at automating, so it's into RPM and/or hand placements. It is like people trying to use the data sheet values of MOSFETs which assume the die is held at 25 degrees on an infinite heatsink: guaranteed performance, but only in unrealistic situations.
This had me thinking that maybe the "fix" is to deliberately introduce some extra delay and then make these paths multi-cycle (virtual-pipeline)... but then one would need to persuade the optimiser that these "unnecessary" delays are actually needed. Mmmmmm.....
The way timing constraints work is not always obvious. Any multi-cycle path would leave the intermediate edges completely unconstrained, so it's not possible to do what you say. Xilinx are pretty clear on that:
Quote from: UG949
Multicycle path exceptions must reflect the design functionality and must be applied on paths that do not have an active clock edge at every cycle, on either the source clock, the destination clock or both clocks.

There is no magic flag/option for the tools to suddenly work at these sorts of performance levels. Dust off the reference books and get ready to do the critical parts manually.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2883
  • Country: ca
Re: Vivado net delay tuning
« Reply #4 on: March 01, 2023, 05:41:06 pm »
In my experience at such super-high frequencies (for Artix fabric) you will need to add additional pipeline registers on top of the registers which exist inside hard blocks (both BRAM and DSP), as Vivado's placement is driven by timing and so it tends to clump stuff around MMCM or I/O columns, which usually leads to loooong connections to hard blocks which are quite far away from the I/O columns (depending on the density and the specific device). I usually try to avoid going above 250 MHz for my designs whenever possible for easier timing closure.

Online KE5FX

  • Super Contributor
  • ***
  • Posts: 2137
  • Country: us
    • KE5FX.COM
Re: Vivado net delay tuning
« Reply #5 on: March 01, 2023, 06:45:02 pm »
Does it help if you tag your pipeline registers with ASYNC_REG=TRUE?  Maybe it's removing them, thinking they aren't necessary.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2883
  • Country: ca
Re: Vivado net delay tuning
« Reply #6 on: March 01, 2023, 06:55:39 pm »
Does it help if you tag your pipeline registers with ASYNC_REG=TRUE? 
That flag is irrelevant for pipeline registers.

Maybe it's removing them, thinking they aren't necessary.
That is not possible because it will break the functionality.

Online KE5FX

  • Super Contributor
  • ***
  • Posts: 2137
  • Country: us
    • KE5FX.COM
Re: Vivado net delay tuning
« Reply #7 on: March 01, 2023, 07:04:39 pm »
That is not possible because it will break the functionality.

Your faith in Vivado is an inspiration to us all. :)

But yeah, it would be pretty outlandish if it were removing pipeline registers. 
 

Offline _pat_Topic starter

  • Contributor
  • Posts: 32
  • Country: gb
Re: Vivado net delay tuning
« Reply #8 on: March 01, 2023, 07:11:07 pm »
Many thanks to Someone and asmi for your replies!

Thanks for the heads up of 2/3 to 3/4 max speed. I did stipulate that I was being ambitious/optimistic and that my issue could be down to that :) I am seeing that the tools aren't great at keeping the net delays sensible. I have some degree of control over the logic delays by way of the HDL, but there's "not much" I can do about the net delays. At present, certainly after PhysOpt, it does seem to be putting logic near the DSP slices, but it seems it's still too far away :(

The multi-cycle-path "suggestion" was kind of meant as a tongue-in-cheek thing, hence the "Mmmmmmm....." :) A real register is likely a better approach (though as mentioned, I have seen timing failures between an FDRE and an adjacent DSP48E1 with no logic in the path, PhysOpt did fix that particular path).

Presently, re-running the PhysOpt does help a little, but it's still not there, so yes, there doesn't appear to be a magic button (or switch / widget / wizard / whatever) that seems to be able to achieve timing closure just by setting it.

With regard to the addition of extra pipeline regs, I was contemplating that on the way home in the car - I had initially tried that yesterday on a path that was failing from the DistRAM to the DSP48E1 - but Vivado then inferred BlockRAM and made the situation a whole lot worse, LOL. Thinking about it logically though, I believe this has a chance of working insofar as there are presently two net delays into the DSP48E1 and that could be reduced to 1. I will tweak the HDL to make it keep the DistRAM and infer an actual register to see if that helps.

At this point I could "fix" the design in its existing state by dropping it down to 380MHz, but before I concede defeat I thought I'd try to explore a few more options. I had contemplated conceding full-on defeat and dropping down to 200MHz, where timing closure would be "trivial", BUT then the non-trivial work that I had done to mitigate the latency introduced by the necessary pipelining would have been a wasted effort.

Many thanks,

Pat.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2883
  • Country: ca
Re: Vivado net delay tuning
« Reply #9 on: March 01, 2023, 07:23:05 pm »
Your faith in Vivado is an inspiration to us all. :)
It's not faith - it's experience. In all my years with Vivado I have never seen it make a change which breaks functionality.

But yeah, it would be pretty outlandish if it were removing pipeline registers.
"Only amateurs blame their tools" (C)

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2883
  • Country: ca
Re: Vivado net delay tuning
« Reply #10 on: March 01, 2023, 07:24:08 pm »
You can also try enabling retiming, together with physical optimizations sometimes it can do small wonders.

Offline _pat_Topic starter

  • Contributor
  • Posts: 32
  • Country: gb
Re: Vivado net delay tuning
« Reply #11 on: March 01, 2023, 07:28:25 pm »
KE5FX,

I haven't used ASYNC_REG=TRUE. Where possible, I'm using the regs built into the DSP48E1, otherwise they're created in process blocks with the 400MHz clock in the sensitivity list and gated on rising edge. I would be somewhat perturbed if Vivado decided that it could eliminate a register in the fabric by, for example, upping the number of A:B registers and bringing the CE across to the new reg, especially since the DSP48E1s are instantiated rather than inferred. Thankfully it doesn't seem to be doing that. Though it is inferring RAM64Ms but then only using 3 of the 4 bits... because Vivavoodoo ;)

asmi,

Nice vote of confidence in Vivado :) Now if I can persuade it to keep things close together, I might be in with a fighting chance! I haven't tried the retiming option yet, so I'll give that a whirl. Might get lucky....

Many thanks,

Pat.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2883
  • Country: ca
Re: Vivado net delay tuning
« Reply #12 on: March 01, 2023, 07:36:52 pm »
but Vivado then inferred BlockRAM and made the situation a whole lot worse, LOL.
At such high frequencies I recommend using explicit primitives instead of relying on inference to make sure it implements exactly the hardware you expect.

Also there is the "last argument of the King" - you can "pretend" that you have a speed grade 2 device and program your sg1 device with this bitstream; it will very likely still work OK, even if it will be consuming more power and heating up more than normal. Obviously you will need to do a full characterization and testing across the full PVT envelope to ensure YOUR design still works properly, and in general it's not something I would ever do for production designs, but for one-offs and non-commercial designs - why not?

Offline _pat_Topic starter

  • Contributor
  • Posts: 32
  • Country: gb
Re: Vivado net delay tuning
« Reply #13 on: March 01, 2023, 08:05:57 pm »
asmi,

I guess I was just being lazy with regard to inference vs instantiation on the DistRAM - it is easy enough to check what Vivado actually inferred, and in fairness to it, the way I "added" the register was always going to be hinting that a BlockRAM was more appropriate, so I don't specifically hold that part of it against Vivado - what is interesting is that despite knowing the clock period to the process that infers the RAM exceeds the capabilities of BlockRAM, Vivado still goes and infers BlockRAM that is guaranteed to fail timing  :palm:

Agreed, as a one-off bench fun project I could just put the bitstream into the device - and perhaps if the timing was only out by 5ps then that would be a decent bet, but it's over 100ps in a critical area, so I'm either going to have to improve the path delay, or slow down the clock [or maybe sg2 might just meet timing, but the wallet will be having a critical timing failure of its own resulting in a FIFO underflow ;) ]

Many thanks,

Pat.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2883
  • Country: ca
Re: Vivado net delay tuning
« Reply #14 on: March 01, 2023, 08:20:08 pm »
I guess I was just being lazy with regard to inference vs instantiation on the DistRAM - it is easy enough to check what Vivado actually inferred, and in fairness to it, the way I "added" the register was always going to be hinting that a BlockRAM was more appropriate, so I don't specifically hold that part of it against Vivado - what is interesting is that despite knowing the clock period to the process that infers the RAM exceeds the capabilities of BlockRAM, Vivado still goes and infers BlockRAM that is guaranteed to fail timing  :palm:
That's easily explained because inference is happening during synthesis, and at that point it's not concerned with timings at all because it doesn't yet know where this stuff is going to be placed. Before placement, all timings are "provisional", i.e. essentially are simply guesses.

Agreed, as a one-off bench fun project I could just put the bitstream into the device - and perhaps if the timing was only out by 5ps then that would be a decent bet, but it's over 100ps in a critical area, so I'm either going to have to improve the path delay, or slow down the clock [or maybe sg2 might just meet timing, but the wallet will be having a critical timing failure of its own resulting in a FIFO underflow ;) ]
SG2 silicon is exactly the same as SG1 (or SG3 for that matter), the only difference is binning, and so an SG1 silicon has a pretty good chance to work as if it were an SG2 one (unless you are unlucky). I only tried it once on a Zynq-015 device, and the SG1 device worked just fine as an SG2 one, including the multigigabit transceivers; the latter were the whole reason I went for it, because SG2 MGTs can do PCIE 2 (5Gbps) while SG1 can only do PCIE 1 (2.5 Gbps). All PCIE devices I happened to have laying around worked just fine and reported good PCIE 2 links. The way I made it happen is I set up the Vivado project for the SG2 part, and then programmed my SG1 device with it. Everything worked just fine, I only had to add a heatsink as it was heating up quite a bit more than usual.

Offline Someone

  • Super Contributor
  • ***
  • Posts: 5210
  • Country: au
    • send complaints here
Re: Vivado net delay tuning
« Reply #15 on: March 01, 2023, 09:15:41 pm »
I would be somewhat perturbed if Vivado decided that it could eliminate a register in the fabric by, for example, upping the number of A:B registers and bringing the CE across to the new reg, especially since the DSP48E1s are instantiated rather than inferred.
The tools have pulled and pushed registers in/out of hard IP to "help" and made it worse in other designs. The above mention of attributes should point to the KEEP and DONT_TOUCH options.

I guess I was just being lazy with regard to inference vs instantiation on the DistRAM - it is easy enough to check what Vivado actually inferred, and in fairness to it, the way I "added" the register was always going to be hinting that a BlockRAM was more appropriate, so I don't specifically hold that part of it against Vivado - what is interesting is that despite knowing the clock period to the process that infers the RAM exceeds the capabilities of BlockRAM, Vivado still goes and infers BlockRAM that is guaranteed to fail timing  :palm:
That's easily explained because inference is happening during synthesis, and at that point it's not concerned with timings at all because it doesn't yet know where this stuff is going to be placed. Before placement, all timings are "provisional", i.e. essentially are simply guesses.
There were (and surely still are?) timing driven choices in synthesis.
 

Offline _pat_Topic starter

  • Contributor
  • Posts: 32
  • Country: gb
Re: Vivado net delay tuning
« Reply #16 on: March 01, 2023, 09:21:21 pm »
asmi,

You are of course correct that inference happens during synthesis, but it's not like it couldn't check constraints during synth - it just chooses not to. Now there may be some very good reasons for that. But it's not like it's impossible to do :)

Also agreed on the binning. All the silicon comes out the same fab, lots from the same wafer, so there's nothing "magical" about sg2 - other than a binning test at the fab (or downstream) determined that piece of silicon was "better" than others. Again, I'd be quite happy to push an sg2 bitstream onto an sg1 device for fun on the bench, but not on production stuff.

Many thanks,

Pat.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 16107
  • Country: fr
Re: Vivado net delay tuning
« Reply #17 on: March 01, 2023, 09:29:03 pm »
You got some more elaborate answers here.

Short answer will be: a).
In practice, even for relatively basic single-level logic, you'll rarely reach the typical rated frequency shown in datasheets, when routing delays are included. Even after tweaking options, setting timing constraints and even trying to manually place some blocks.

Add to this the fact that the timing models used in FPGA tools (same for all vendors, certainly not specific to Vivado) are pretty pessimistic - point is to give you figures that will work reliably on real hardware for sure, not to give you the best estimate.
 

Offline _pat_Topic starter

  • Contributor
  • Posts: 32
  • Country: gb
Re: Vivado net delay tuning
« Reply #18 on: March 01, 2023, 09:33:12 pm »
Someone,

I could certainly accept the tools moving regs to "help" when things are inferred - that's fair play since the engineer hasn't told the tool precisely what they want.... they hinted at it. When instantiating something like a DSP48E1 the tool should IMHO either instantiate precisely what has been requested or bomb saying that it cannot, explaining why. Whether they do that in the real world or not is another matter! In my case, thus far, the tool has respected the instantiated stuff so I can't complain there. I also take a fair chunk of the responsibility for it inferring BlockRAM (even though it could have figured out that cannot work) since I hinted at BlockRAM in the HDL - I should have been more explicit that I want a register external to the RAM.

With regard to timing driven choices in synth, I am inadequately familiar with Vivado at this point to make any meaningful observations beyond those I already have.

Many thanks,

Pat.
 

Offline _pat_Topic starter

  • Contributor
  • Posts: 32
  • Country: gb
Re: Vivado net delay tuning
« Reply #19 on: March 01, 2023, 09:40:00 pm »
SiliconWizard,

I am more and more coming to the conclusion that I'm either going to have to experience an underflow error in the wallet for sg2 silicon, drop the clock speed, or spend inordinate amounts of time manually tweaking things every time I implement the next stage in the design.

I certainly can't complain about the tools being pessimistic - it is understandable given that Xilinx could be on the wrong end of litigation if their tools predicted something would work, then it later failed in the field due to excessive optimism, resulting in consequential losses - better safe than sorry. Though a way to tweak the pessimism level might be a useful knob one could tweak (on the proviso that results are only guaranteed in the default setting).

Many thanks,

Pat.
 

Offline glenenglish

  • Frequent Contributor
  • **
  • Posts: 473
  • Country: au
  • RF engineer. AI6UM / VK1XX . Aviation pilot. MTBr
Re: Vivado net delay tuning
« Reply #20 on: March 01, 2023, 10:51:07 pm »
- Use deliberate primitives unless you verify inference / synthesis
- don't expect > 75% of datasheet speed unless you hand place / floorplan it yourself.
- aggressively pipeline
- don't expect miracles with 7 series > 70% utilization.
- review the implemented schematic to see if the tool did what you expect.

I run my 7 series designs at about 50% of max. up to 70% in very  small regions with own clock  and clock domain crossing to get in and out of the fast region....

I think Vivado is one of the best tools around. ever. took a while for it to mature.

-glen

« Last Edit: March 01, 2023, 10:52:51 pm by glenenglish »
 
The following users thanked this post: Someone

Offline _pat_Topic starter

  • Contributor
  • Posts: 32
  • Country: gb
Re: Vivado net delay tuning
« Reply #21 on: March 01, 2023, 11:28:43 pm »
Hiya Glen,

Many thanks for your suggestions :

- Already on the same page w.r.t. deliberate primitives / checking SCM when inferring.
- The 75% rule would "easily" close timing here, it would buy me 373ps and I don't need anywhere near that (at the mo), and that's 75% of the 464MHz the DSP48E1 is rated at. At 75% of the DistRAM speeds, there'd be 833ps "spare".
- I was already down to 1 logic level and thinking "this can't be pipelined any more" (unless all you wanted was a shift register / delay line). Thanks to further nudges in that direction on here, it then dawned on me that the extra pipeline stage might actually reduce net delays as opposed to LUT/MUX delays.
- Thanks for the heads up on the 70% utilisation. As mentioned, I kind of fell at the first hurdle, so that's not going to be an issue for the foreseeable future.
- When inferring, I do like to look at the SCM. Does pose some interesting questions, like why it chose to only use A,B and C data bits in RAM64Ms it inferred, for a 64x16 RAM - it could have done it with 4 RAM64Ms but ended up with 6. I may end up instantiating these rather than inferring.

There is certainly a theme with regard to implemented speeds being considerably lower than max... 50-66% up to 70-75% seems to be the ranges being suggested.
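
To put numbers on those rules of thumb, the sketch below derates the two datasheet figures quoted in the thread and reports how much clock period each derating would free up relative to the 2500 ps (400 MHz) clock. The helper names are illustrative only, and the fractions are the ones suggested by posters here, not vendor guidance.

```python
def period_ps(freq_mhz):
    """Clock period in picoseconds for a frequency in MHz."""
    return 1e6 / freq_mhz


def derated(fmax_mhz, fraction, target_mhz=400.0):
    """Return (derated MHz, extra period in ps gained versus the target clock)."""
    f = fmax_mhz * fraction
    return f, period_ps(f) - period_ps(target_mhz)


# Datasheet maxima as quoted in the thread, derated by the suggested fractions.
for label, fmax in (("DSP48E1", 464.0), ("DistRAM", 400.0)):
    for fraction in (0.50, 2 / 3, 0.75):
        f, gain = derated(fmax, fraction)
        print(f"{label} at {fraction:.0%}: {f:6.1f} MHz, {gain:+7.1f} ps vs 400 MHz")
```

The 75% rows come out at roughly 373 ps and 833 ps, matching the figures quoted in the list above.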

It would seem that Vivado is well liked / respected so I'm not regretting that decision at the moment (even if it has thrown me a few curve balls).

Cheers,

Pat.
 

Online hamster_nz

  • Super Contributor
  • ***
  • Posts: 2821
  • Country: nz
Re: Vivado net delay tuning
« Reply #22 on: March 01, 2023, 11:51:47 pm »
400MHz is pretty fast for those parts...

Just in case you haven't discovered this workflow yet.

With an implemented design open:

* View the timing report and click on the failed route of interest.

* Press "F4" to get a schematic of what is failing.

* Maybe also click on the 'clk' signals to the source and target FFs to chase them back to the clock source (e.g. pin, MMCM, PLL). This lets you check that you are not swapping between clock domains that are related but not the same.

* When you click on a signal/net in the schematic view it will also be highlighted in the device view, so you can see where the net is running physically.

Hope it helps!


Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline Someone

  • Super Contributor
  • ***
  • Posts: 5210
  • Country: au
    • send complaints here
Re: Vivado net delay tuning
« Reply #23 on: March 02, 2023, 12:30:46 am »
I could certainly accept the tools moving regs to "help" when things are inferred - that's fair play since the engineer hasn't told the tool precisely what they want.... they hinted at it. When instantiating something like a DSP48E1 the tool should IMHO either instantiate precisely what has been requested or bomb saying that it cannot, explaining why. Whether they do that in the real world or not is another matter
So you want it to move things around to help timing, but not move things around to help timing? Registers on DSPs and RAM are treated almost the same as fabric for timing and optimisation. If you really don't want things moved then it is over to you to tell the tools that.

- The 75% rule would "easily" close timing here, it would buy me 373ps and I don't need anywhere near that (at the mo), and that's 75% of the 464MHz the DSP48E1 is rated at. At 75% of the DistRAM speeds, there'd be 833ps "spare".
Please put your working/references/source up.

At the high end of the rule of thumb several of us here use:
DSP max frequency 464MHz,  x 0.75 = 348MHz well short of your target
LUT ram max frequency 400MHz, zero margin

Yes it is more complex than that with the fixed and routing delays and specific setup/hold. Which is why when you try and push things it is not one dimension "but the datasheet says 464MHz !@$!$!FFF". Even stepping up a speed grade you are still at the pointy end of hard work to achieve the speeds you say you need.
I am more and more coming to the conclusion that I'm either going to have to experience an underflow error in the wallet for sg2 silicon, drop the clock speed, or spend inordinate amounts of time manually tweaking things every time I implement the next stage in the design.
Hard things are hard, what's new?
 

Offline _pat_Topic starter

  • Contributor
  • Posts: 32
  • Country: gb
Re: Vivado net delay tuning
« Reply #24 on: March 02, 2023, 01:55:45 am »
hamster_nz,

Many thanks for your workflow suggestions. It pretty much mirrors what I was doing in order to find where things were going awry. The major issue is with the net delays. If it was going through a lot of logic, and the tool had shown me that, then it would have been a case of tweaking the HDL. When there's only one LUT in the path and it still fails, it gets "interesting" :)

Someone,

Think we are slightly at cross-purposes on what I mean with regard to the tool "helping" - yes I would like the tool to move fabric regs if that is what it needs to do in order to lower net delay to close timing, but no I would not want it moving an instantiated DSP48E1 reg out into the fabric on its own.... if it is not possible to close timing on an instantiated DSP48E1 then I would expect to have to change the instantiation to disable that reg, and to create a register in the HDL. Thankfully, it doesn't appear to have done that (yet, maybe it is lulling me into a false sense of security, LOL).

Not sure quite what you're after regarding working / reference / source, but the information I stated came from the datasheet and the implementation timing report. It was telling me that I have a slack of -129ps. The clock was declared at 400MHz, or 2500ps period. At 75% of 464MHz we are at 348MHz as you state, which is a period of 2873ps, a difference of 373ps, which is significantly more than the 129ps I needed to have zero slack, and would have closed timing if I had dropped the clock to that speed.
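
The same arithmetic can be turned around to ask what clock the failing path would support as currently placed and routed. This is a minimal sketch, assuming the usual single-clock relationship where the achievable period is the constrained period minus the worst slack; the function name is made up for illustration.

```python
def fmax_from_slack_mhz(period_ps, worst_slack_ps):
    """Highest clock the worst path supports: 1 / (period - slack)."""
    return 1e6 / (period_ps - worst_slack_ps)


PERIOD_PS = 2500.0  # the 400 MHz constraint
WNS_PS = -129.0     # worst negative slack quoted from the timing report

print(f"{fmax_from_slack_mhz(PERIOD_PS, WNS_PS):.1f} MHz")
```

That lands at about 380 MHz, in line with the fallback clock mentioned earlier in the thread. It is only an estimate, since a re-run at a different constraint will place and route differently.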

The LUTRAM frequency is a bit of an enigma. We both agree the datasheet states 2.5ns period / 400MHz max, but when you look at the timing report it shows a tiny propagation delay from read address in to data out. I guess that shouldn't be surprising since these are LUTs used as RAM and they are fast in that mode. I suspect the limit has more to do with the synchronous write side of the DistRAM than the async read.

With regard to the sg2 / pointy end / hard things being hard - I had failed to appreciate just how "slow" the signal routing is. Let me explain. I am seeing net delays up to 1.5ns. That's the best part of 450mm worth of propagation at c, and even if phase velocity was as low as 0.66 then it would still be 300mm.... in other words many, many die-sizes worth, so I just wasn't expecting such long delays. You live and learn :)
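
For reference, the distance figures above follow from distance = velocity x time. A short sketch (the 0.66 velocity factor is the value assumed in the post, not a measured property of the FPGA interconnect):

```python
C_MM_PER_NS = 299.792458  # speed of light in vacuum, in mm per ns


def distance_mm(delay_ns, velocity_factor=1.0):
    """Distance a signal travels in delay_ns at velocity_factor times c."""
    return delay_ns * velocity_factor * C_MM_PER_NS


print(f"{distance_mm(1.5):.0f} mm at c")
print(f"{distance_mm(1.5, 0.66):.0f} mm at 0.66 c")
```

The comparison is a useful intuition check rather than a model of the routing: on-chip net delay is generally dominated by the RC of the wires and by the programmable switches and buffers a net passes through, not by time of flight, which is why it is so much larger than the die size alone would suggest.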

As an exercise I did try the experiment of adding another pipeline stage whose sole purpose in life is to reduce net delays and I am pleased to report that it worked.... after a little prodding.... needed to add a RAM_STYLE attribute and it did need one round of post-route PhysOpt to close timing - then it managed to get the slack up from -129ps to 43ps  :-+

That being said, I suspect that as I add more things, it is going to come back and haunt me again....

Many thanks,

Pat.
 

