Author Topic: Vivado is shit at timing optimization?  (Read 8194 times)


Offline OwO (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1250
  • Country: cn
  • RF Engineer.
Re: Vivado is shit at timing optimization?
« Reply #25 on: June 27, 2019, 08:45:38 am »
1. De-multiplex the address A, so that you now have Aeven (for even clocks) and Aodd (for odd clocks)
2. Bring Aeven and Aodd from f clock domain to f/2.
3. After clock domain crossing, you supply Aeven to one of the dual read ports of BRAM and Aodd to the other read port
That would work if it's a readonly block ram, but for the transposer memory needed in the FFT it has to write at the same rate as it's reading (the requirement is the same in a single path delay feedback FFT architecture), so you would need quad port memory (2 read ports and 2 write ports). I'm not aware of any Xilinx FPGA with quad port block rams, but it's doable in LUTRAM.
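A rough behavioural sketch of the read side of that even/odd scheme, in Python rather than HDL. The names `demux_addresses` and `read_half_rate` are made up for illustration, the memory is just a list, and an even number of addresses is assumed. It only models reads; the objection above is that the transposer would also need two writes per slow cycle.

```python
def demux_addresses(addresses):
    """Split a full-rate address stream into even-cycle and odd-cycle streams."""
    a_even = addresses[0::2]  # addresses issued on even fast-clock cycles
    a_odd = addresses[1::2]   # addresses issued on odd fast-clock cycles
    return a_even, a_odd


def read_half_rate(memory, addresses):
    """Serve a full-rate read stream from a memory clocked at f/2.

    Each loop iteration stands for one slow clock cycle in which both
    read ports are used at once (port A gets Aeven, port B gets Aodd).
    """
    a_even, a_odd = demux_addresses(addresses)
    out = []
    for port_a_addr, port_b_addr in zip(a_even, a_odd):
        out.append(memory[port_a_addr])
        out.append(memory[port_b_addr])
    return out


memory = list(range(100, 116))  # hypothetical contents: memory[i] == 100 + i
result = read_half_rate(memory, [4, 9, 1, 7])
```

The output order matches the original full-rate address order, so downstream logic would see the same data sequence, just delivered two words per slow cycle.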
Email: OwOwOwOwO123@outlook.com
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3246
  • Country: ca
Re: Vivado is shit at timing optimization?
« Reply #26 on: June 27, 2019, 02:30:36 pm »
That would work if it's a readonly block ram, but for the transposer memory needed in the FFT it has to write at the same rate as it's reading (the requirement is the same in a single path delay feedback FFT architecture), so you would need quad port memory (2 read ports and 2 write ports). I'm not aware of any Xilinx FPGA with quad port block rams, but it's doable in LUTRAM.

I don't think you can do it with LUT RAM. To write at 2x speed you need 2 write ports, but LUT RAM has only one write port. In fact, it isn't really multi-port at all: the so-called multi-port configurations are just multiple LUTs that are written with the same data and then read independently.

I think BRAM has more potential. Presumably you will have multiple BRAM blocks in your memory array, and each of them can be accessed independently. You can even make two simultaneous writes to the same block (but only if it is not being read at the same time). If you arrange your algorithm so that there are no consecutive writes to the same BRAM block, you can get 2x write speed. Given that you can freely shuffle address lines, this might be doable.
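To make the banking idea concrete, here is a simplified Python model, not an HDL implementation. `BankedMemory` and `write_pair` are hypothetical names, and using the low address bits as the bank select is just one possible way of "shuffling address lines". The model allows one write per bank per slow cycle and reports a conflict when two consecutive writes land in the same bank.

```python
class BankedMemory:
    """Memory array split into independent banks (one BRAM block each)."""

    def __init__(self, num_banks, bank_depth):
        self.num_banks = num_banks
        self.banks = [[0] * bank_depth for _ in range(num_banks)]

    def _split(self, addr):
        # Assumption: low address bits pick the bank, the rest pick the row.
        return addr % self.num_banks, addr // self.num_banks

    def write_pair(self, first, second):
        """Try to perform two (addr, data) writes in one slow cycle.

        Returns False and writes nothing if both target the same bank.
        """
        (addr0, data0), (addr1, data1) = first, second
        bank0, row0 = self._split(addr0)
        bank1, row1 = self._split(addr1)
        if bank0 == bank1:
            return False  # bank conflict: the 2x write rate is not achieved
        self.banks[bank0][row0] = data0
        self.banks[bank1][row1] = data1
        return True

    def read(self, addr):
        bank, row = self._split(addr)
        return self.banks[bank][row]


mem = BankedMemory(num_banks=2, bank_depth=8)
ok = mem.write_pair((4, "a"), (7, "b"))        # even + odd address: different banks
conflict = mem.write_pair((2, "x"), (6, "y"))  # both even: same bank
```

Whether the FFT's write pattern can be arranged so that conflicts never occur depends on the address sequence, which this sketch does not try to answer.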
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2797
  • Country: ca
Re: Vivado is * at timing optimization?
« Reply #27 on: June 27, 2019, 02:57:40 pm »
Why can't you just use two BRAM blocks in parallel to double the word width and get 2x bandwidth this way? There are plenty of BRAM blocks in modern FPGA (even relatively small A35T has 50/100 of them!). When I need more than two read ports, I just write the same data into multiple BRAMs at the same time, and then you can read with however many addresses at the same time by reading from different BRAMs that all have the same data.
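A minimal Python sketch of the replication trick for extra read ports. `ReplicatedMemory` is an illustrative name, and each list stands in for one BRAM holding a full copy of the data.

```python
class ReplicatedMemory:
    """One write port, N read ports, built from N identical copies."""

    def __init__(self, depth, num_read_ports):
        self.copies = [[0] * depth for _ in range(num_read_ports)]

    def write(self, addr, data):
        # The single write port updates every copy in the same cycle.
        for copy in self.copies:
            copy[addr] = data

    def read_all(self, addrs):
        # Each read port has its own copy, so all addresses can differ.
        return [copy[addr] for copy, addr in zip(self.copies, addrs)]


rep = ReplicatedMemory(depth=16, num_read_ports=3)
rep.write(2, 20)
rep.write(5, 50)
rep.write(9, 90)
values = rep.read_all([9, 2, 5])
```

Every write still has to go to all copies, so this adds read ports only; it does not raise write bandwidth.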

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 15323
  • Country: fr
Re: Vivado is * at timing optimization?
« Reply #28 on: June 27, 2019, 05:13:56 pm »
Why can't you just use two BRAM blocks in parallel to double the word width and get 2x bandwidth this way? There are plenty of BRAM blocks in modern FPGA (even relatively small A35T has 50/100 of them!). When I need more than two read ports, I just write the same data into multiple BRAMs at the same time, and then you can read with however many addresses at the same time by reading from different BRAMs that all have the same data.

Ditto.
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3246
  • Country: ca
Re: Vivado is * at timing optimization?
« Reply #29 on: June 27, 2019, 06:07:26 pm »
Why can't you just use two BRAM blocks in parallel to double the word width and get 2x bandwidth this way? There are plenty of BRAM blocks in modern FPGA (even relatively small A35T has 50/100 of them!). When I need more than two read ports, I just write the same data into multiple BRAMs at the same time, and then you can read with however many addresses at the same time by reading from different BRAMs that all have the same data.

Ditto.

If your consecutive memory accesses always use consecutive addresses then you can get 2x bandwidth by widening your memory. However, if you need random access, widening will not help because the 2 consecutive accesses may work with data at two different addresses.

As to multiple copies of the same memory to create multiple read ports, they can help with reads, but not with writes, because you need to write to all copies anyway.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2797
  • Country: ca
Re: Vivado is * at timing optimization?
« Reply #30 on: June 27, 2019, 06:31:41 pm »
If your consecutive memory accesses always use consecutive addresses then you can get 2x bandwidth by widening your memory. However, if you need random access, widening will not help because the 2 consecutive accesses may work with data at two different addresses.
It works for random access just as well. You just double the size of the addressable element (i.e. a single address will correspond to a 64-bit word instead of a 32-bit one). You pass the same address to both BRAMs, take half of the data bits from the first BRAM, and the other half from the second one.
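A small Python model of that widening arrangement, with two 32-bit memories sharing one address. `WideMemory` is a made-up name for illustration.

```python
MASK32 = 0xFFFFFFFF


class WideMemory:
    """64-bit memory built from two 32-bit BRAMs driven by the same address."""

    def __init__(self, depth):
        self.low = [0] * depth   # first BRAM: bits 31..0
        self.high = [0] * depth  # second BRAM: bits 63..32

    def write(self, addr, word64):
        self.low[addr] = word64 & MASK32
        self.high[addr] = (word64 >> 32) & MASK32

    def read(self, addr):
        # Both halves always come from the same address.
        return (self.high[addr] << 32) | self.low[addr]


wide = WideMemory(depth=8)
wide.write(3, 0x1122334455667788)
```

Note that `read` takes a single address, so the two 32-bit halves it returns are always stored side by side.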

As to multiple copies of the same memory to create multiple read ports, they can help with reads, but not with writes, because you need to write to all copies anyway.
I can't remember any situation when I needed many write ports at the same time as many read ports. There were however quite a few cases when I needed multiple read ports and a single write one.

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3246
  • Country: ca
Re: Vivado is * at timing optimization?
« Reply #31 on: June 27, 2019, 06:59:49 pm »
If your consecutive memory accesses always use consecutive addresses then you can get 2x bandwidth by widening your memory. However, if you need random access, widening will not help because the 2 consecutive accesses may work with data at two different addresses.
It works for random access just as well. You just double the size of the addressable element (i.e. a single address will correspond to a 64-bit word instead of a 32-bit one). You pass the same address to both BRAMs, take half of the data bits from the first BRAM, and the other half from the second one.

Exactly. If you want to read it from the same address it'll work, but if you need one 32-bit word from one address and another 32-bit from a different address (which is called random access), 64-bit wide memory won't help.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2797
  • Country: ca
Re: Vivado is * at timing optimization?
« Reply #32 on: June 27, 2019, 07:21:53 pm »
Exactly. If you want to read it from the same address it'll work, but if you need one 32-bit word from one address and another 32-bit from a different address (which is called random access), 64-bit wide memory won't help.
This is not random access. Random access means accessing random addresses sequentially, not simultaneously. And doubling the data width means you process a double word per cycle by doubling your processing pipeline.

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3246
  • Country: ca
Re: Vivado is * at timing optimization?
« Reply #33 on: June 27, 2019, 08:38:53 pm »
Exactly. If you want to read it from the same address it'll work, but if you need one 32-bit word from one address and another 32-bit from a different address (which is called random access), 64-bit wide memory won't help.
This is not random access. Random access means accessing random addresses sequentially, not simultaneously. And doubling the data width means you process a double word per cycle by doubling your processing pipeline.

Yeap. Originally, we want to fetch 32-bit data sequentially at random addresses (such as 0x04, 0x20, 0x1e, 0x03 etc.). But we want to do it 2x faster than the memory can sustain.

You suggest using fetches from 64-bit memory (two halves simultaneously) to solve this problem. This way you do get two 32-bit data pieces at once, thus meeting the speed requirement. But your method implies that both the upper and lower halves of the 64-bit data you have fetched must come from the same address (e.g. if we fetched the first 32-bit word from 0x04, the second one cannot come from 0x20), which is not what we originally wanted. Thus the random access requirement is not met.
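To put numbers on this, here is a quick Python count of how many 64-bit fetches are needed for a stream of 32-bit word addresses taken two at a time. `wide_fetches_needed` is a hypothetical helper. It assumes the addresses are 32-bit word indices and that words 2k and 2k+1 share one 64-bit row.

```python
def wide_fetches_needed(addresses, words_per_fetch=2):
    """Count wide fetches needed when addresses are consumed in groups.

    Each group of `words_per_fetch` consecutive requests costs one fetch
    per distinct wide row it touches.
    """
    fetches = 0
    for i in range(0, len(addresses), words_per_fetch):
        group = addresses[i:i + words_per_fetch]
        wide_rows = {addr // words_per_fetch for addr in group}
        fetches += len(wide_rows)
    return fetches


random_stream = [0x04, 0x20, 0x1E, 0x03]  # the example addresses above
sequential_stream = [0, 1, 2, 3]

random_cost = wide_fetches_needed(random_stream)          # 4 fetches for 4 words
sequential_cost = wide_fetches_needed(sequential_stream)  # 2 fetches for 4 words
```

With the random stream each pair touches two different rows, so the wide memory gives no speedup. With consecutive addresses each pair shares a row and the fetch count halves.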
 

