Author Topic: Designing an FFT co-processor. How Parallel can FFT be?  (Read 1398 times)

0 Members and 1 Guest are viewing this topic.

Offline InterestedTomTopic starter

  • Regular Contributor
  • *
  • Posts: 56
  • Country: england
Designing an FFT co-processor. How Parallel can FFT be?
« on: June 17, 2018, 03:27:33 pm »
I want to design an FFT co-processor to fit in the Cyclone V SoC on the Terasic DE10-Nano.

I have seen a paper about implementing FFT in an FPGA, but they only used one "Butterfly Unit" and just clocked through the data and twiddle factor RAM inside the FPGA in the correct sequence for a radix-2 algorithm.

I want to explore implementing an FFT co-processor with built in configurability, i.e. an array of butterfly units with a configurable number of points for the FFT.

How feasible is this?
 

Offline donmr

  • Regular Contributor
  • *
  • Posts: 155
  • Country: us
  • W7DMR
Re: Designing an FFT co-processor. How Parallel can FFT be?
« Reply #1 on: June 17, 2018, 06:15:17 pm »
This depends on things like:

- How many points of what size do you want to handle?
- Can you present all of the inputs at once?
- How big an FPGA do you have?
- How fast does it need to run?

Besides the gate count of the butterfly units, large multiplexers can get big and slow too.

Try drawing some diagrams of what it would look like and estimate the sizes.

Any function can be make fully parallel if you can present all of the inputs, you have enough gates, and you are willing to take long enough for the result.  Think of a huge ROM.
 

Online tggzzz

  • Super Contributor
  • ***
  • Posts: 19507
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Designing an FFT co-processor. How Parallel can FFT be?
« Reply #2 on: June 17, 2018, 06:28:28 pm »
Concentrate on the routing. ISTR there is an irreducible minimum number of wires that cross over each other.

With FPGAs the routing is always the problem. (I exaggerate, of course, but there's truth in that jest!)
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8646
  • Country: gb
Re: Designing an FFT co-processor. How Parallel can FFT be?
« Reply #3 on: June 17, 2018, 06:54:15 pm »
There is limited potential for parallel operations in an FFT, so extremely low latency is a problem. There is a lot of potential for pipelining, so getting high throughput is straightforward.
 

Offline InterestedTomTopic starter

  • Regular Contributor
  • *
  • Posts: 56
  • Country: england
Re: Designing an FFT co-processor. How Parallel can FFT be?
« Reply #4 on: June 17, 2018, 08:18:19 pm »
There is limited potential for parallel operations in an FFT, so extremely low latency is a problem. There is a lot of potential for pipelining, so getting high throughput is straightforward.

Ok so pipelining would suggest to me that I split the FFT into 2n parts and then combine them with a number of subsequent stages. Or did you mean pipeline the single butterfly unit by adding registers between each operation?
 

Offline jbb

  • Super Contributor
  • ***
  • Posts: 1143
  • Country: nz
Re: Designing an FFT co-processor. How Parallel can FFT be?
« Reply #5 on: June 17, 2018, 08:26:34 pm »
There is limited potential for parallel operations in an FFT, so extremely low latency is a problem. There is a lot of potential for pipelining, so getting high throughput is straightforward.

This is true. Each output term will eventually map back to all the input terms, so there’s a lot of data path to think about.

FFTs also require you to have all of the inputs before you can begin, so there’s always the latency of collecting your data samples. Might it be possible to begin the FFT as your samples come in? I did an FIR filter bank this way once; it precomputed as much as possible and then waited for the final sample to come in.

Finally, you mentioned variable size. I expect that this would, in hardware terms, involve more MUXing, which is difficult for place and route of the FPGA. I suspect the way forward there is to look at a 2^n radix, and use software to zero pad inputs of less than 2^n
 
The following users thanked this post: InterestedTom

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8646
  • Country: gb
Re: Designing an FFT co-processor. How Parallel can FFT be?
« Reply #6 on: June 17, 2018, 08:39:27 pm »
There is limited potential for parallel operations in an FFT, so extremely low latency is a problem. There is a lot of potential for pipelining, so getting high throughput is straightforward.

Ok so pipelining would suggest to me that I split the FFT into 2n parts and then combine them with a number of subsequent stages. Or did you mean pipeline the single butterfly unit by adding registers between each operation?
You can pipeline within the butterfly, so that you can crank up the clock rate of the butterfly. You can pipeline the columns of butterflies, using multiple sets of butterfly hardware, to increase throughput for the same clock rate. There is a lot of potential for pipelining in the FFT, depending how much hardware you want to throw at the problem.
 
The following users thanked this post: InterestedTom

Offline chris_leyson

  • Super Contributor
  • ***
  • Posts: 1541
  • Country: wales
Re: Designing an FFT co-processor. How Parallel can FFT be?
« Reply #7 on: June 17, 2018, 09:06:55 pm »
Quote
There is limited potential for parallel operations in an FFT
There is lots of potential for parallel operations. Consider a 144 point complex FFT, if you do it radix 12 then you have to calculate 24 12-point butterflies, it's 24 because there are two stages. If you use the prime factor algorithm then you don't need twiddle factors for the 12-point FFTs or butterflies so that save a bit of hardware, however, you still need them for the 144-point FFT. With two pipelined 12-point FFT engines you can do a 144-point FFT in 12 clock cycles. Add two more 12-point FFT engines and you can do fast convolution with the same throughput. It was for doing fast 2D convolution and I only got as far as implementing one fully parallel pipelined 12-point FFT and it was huge. I never got to looking at adding another one as the company went bust.
« Last Edit: June 17, 2018, 09:09:22 pm by chris_leyson »
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Designing an FFT co-processor. How Parallel can FFT be?
« Reply #8 on: June 17, 2018, 11:02:29 pm »
Google...

Try searching for 'pipeline fft architectures optimized for fpgas'  or any other combination of fpga & fft
 

Offline rhb

  • Super Contributor
  • ***
  • Posts: 3483
  • Country: us
Re: Designing an FFT co-processor. How Parallel can FFT be?
« Reply #9 on: June 18, 2018, 12:37:09 am »
Texas Memory Systems made dedicated FFT engines which they sold for medical and military applications.  The company was founded by Holly Frost and later acquired by IBM.

The FFT is fundamentally a matrix multiplication problem which can be decomposed into prime factor sizes.  So it is "trivially" parallel in one dimension.  Things get stickier as you move to higher dimensions because of the data access.

In any case, the way to implement an FFT co-processor today is with an FPGA.
 


Share me

Digg  Facebook  SlashDot  Delicious  Technorati  Twitter  Google  Yahoo
Smf