Author Topic: Xilinx FFT IP Core - window filtering function is not implemented?  (Read 4761 times)


Offline FlyingDutch (Topic starter)

  • Regular Contributor
  • *
  • Posts: 144
  • Country: pl
Hello,

I have just finished studying the Xilinx documentation for "Fast Fourier Transform v9.1" - see link:

https://www.xilinx.com/support/documentation/ip_documentation/xfft/v9_1/pg109-xfft.pdf

and I wasn't able to find any information about "window filtering". It seems that such a function isn't implemented. I am wondering why?
The situation is the same with the Gowin "FFT" IP core. I thought that such an important function would be implemented in an "FFT" IP core.
I found this article about implementing "window filtering" in an FPGA (VHDL):
https://discourse.world/h/2018/10/23/Features-of-window-filtering-on-FPGA

Could somebody give me some clues about implementing "window filtering" in an FPGA (especially in Verilog)?

Thanks in advance and regards
 

Offline SMB784

  • Frequent Contributor
  • **
  • Posts: 421
  • Country: us
    • Tequity Surplus
« Last Edit: February 22, 2022, 01:07:20 pm by SMB784 »
 
The following users thanked this post: FlyingDutch

Offline Bassman59

  • Super Contributor
  • ***
  • Posts: 2501
  • Country: us
  • Yes, I do this for a living
Re: Xilinx FFT IP Core - window filtering function is not implemented?
« Reply #2 on: February 21, 2022, 07:06:21 pm »

Windowing is a separate process performed before you do the FFT.

The FFT expects windowed data.

Why isn't it included with an FFT core? Because there are many window choices, and which you use depends on your needs.
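
To make the "many window choices" point concrete, here is a minimal NumPy sketch that builds three common windows and prints their coherent gain (the factor by which a window attenuates a tone's amplitude). The block length N = 1024 is only an assumption for illustration, not something the FFT core dictates.

```python
import numpy as np

N = 1024  # block length, chosen only for illustration

# Three common windows; each trades main-lobe width against side-lobe level.
windows = {
    "hann": np.hanning(N),
    "hamming": np.hamming(N),
    "blackman": np.blackman(N),
}

# Coherent gain = mean of the window. Dividing spectrum amplitudes by it
# compensates for the attenuation the window introduces.
coherent_gain = {name: w.mean() for name, w in windows.items()}

for name, gain in coherent_gain.items():
    print(f"{name:9s} coherent gain = {gain:.3f}")
```

Whichever window you pick, the coefficients end up as a lookup table that the input samples are multiplied by before they reach the FFT.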
 
The following users thanked this post: FlyingDutch

Offline FlyingDutch (Topic starter)

  • Regular Contributor
  • *
  • Posts: 144
  • Country: pl
Re: Xilinx FFT IP Core - window filtering function is not implemented?
« Reply #3 on: February 22, 2022, 07:05:24 am »
Hello,

thanks to all for the answers (especially for the "Moving Average" filter implementation). I have one more question: I have this module with an ADC:

https://pl.aliexpress.com/item/1005001593721645.html?gatewayAdapt=glo2pol&spm=a2g0o.9042311.0.0.6ab85c0fZdvO8a

This board is based on the ADS1256 IC - here is the datasheet:
https://www.ti.com/lit/ds/symlink/ads1256.pdf

This ADC has a maximum sampling rate of 30 ksps. I am wondering how many points I should use for the FFT. The number of points is 2 to the power of m, where m is from 3 to 16. What number of points would be optimal? I am considering m = 15 or 16 - is that a good assumption?

Thanks in advance and regards
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Xilinx FFT IP Core - window filtering function is not implemented?
« Reply #4 on: February 22, 2022, 08:28:08 am »
It really depends on your output bin resolution, and how much latency you can get away with.

If you need to FFT 2^15 points, that's going to be over 1 second lag at 30kS/s, but you get bin resolution at about 1Hz.
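
The trade-off can be tabulated with a few lines of Python. This is just arithmetic on the figures in this thread (30 kS/s, power-of-two lengths); the helper name `fft_tradeoff` is made up for this sketch.

```python
def fft_tradeoff(fs_hz, m):
    """Return (points, bin width in Hz, block acquisition time in s) for a 2**m-point FFT."""
    n = 2 ** m
    return n, fs_hz / n, n / fs_hz

fs = 30_000  # ADS1256 maximum sample rate quoted in the thread
for m in (10, 13, 15, 16):
    n, df, t = fft_tradeoff(fs, m)
    print(f"m={m:2d}  N={n:6d}  bin width={df:8.3f} Hz  block time={t:6.3f} s")
```

Bin width and block time are reciprocals of each other, so finer resolution always costs a longer wait for a full block of samples.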

And back to your windowing, you just want to multiply the values by your window as you send them into the FFT block.

In a non-existing HDL:

Code: [Select]
    if rising_edge(clk):
        if input_valid:
            # scale each incoming sample by the window coefficient for its position
            to_fft = input_value * window_constant[sample_in_block]
            if sample_in_block == block_size-1:
                fft_trigger = 1       # last sample of the block: start the FFT
                sample_in_block = 0
            else:
                fft_trigger = 0
                sample_in_block++
        else:
            fft_trigger = 0

Depends on the way the data is expected to be presented to the FFT block though (e.g. if you need to buffer the samples first)
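
For checking an HDL implementation against known-good numbers, a bit-accurate software model of the multiply is handy. The sketch below is one possible model, under assumptions that are not from the posts above: a 1024-sample block, a Hann window stored as 15-bit unsigned fractions, and signed 16-bit input samples.

```python
import numpy as np

BLOCK_SIZE = 1024   # assumed FFT length
FRAC_BITS = 15      # window stored as unsigned fractions with 15 bits (assumption)

# Window ROM contents: Hann window scaled to integers 0..32767.
window_rom = np.round(np.hanning(BLOCK_SIZE) * (2**FRAC_BITS - 1)).astype(np.int64)

def window_block(samples):
    """Multiply one block of signed integer samples by the window ROM.

    The product is shifted right by FRAC_BITS so the result keeps
    the same scaling as the input samples.
    """
    samples = np.asarray(samples, dtype=np.int64)
    if samples.shape != (BLOCK_SIZE,):
        raise ValueError("expected exactly one block of samples")
    return (samples * window_rom) >> FRAC_BITS

# Example: a full-scale constant input simply traces out the window shape.
block = np.full(BLOCK_SIZE, 32767, dtype=np.int64)
windowed = window_block(block)
print(windowed[0], windowed[BLOCK_SIZE // 2], windowed[-1])
```

The right shift truncates rather than rounds, which is what a plain bit-slice of the multiplier output does in hardware.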
« Last Edit: February 22, 2022, 08:29:43 am by hamster_nz »
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 
The following users thanked this post: FlyingDutch

Offline FlyingDutch (Topic starter)

  • Regular Contributor
  • *
  • Posts: 144
  • Country: pl
Re: Xilinx FFT IP Core - window filtering function is not implemented?
« Reply #5 on: February 22, 2022, 01:18:27 pm »
Hello,

today I installed "Anaconda" with Python 3.9 and the packages "matplotlib", "numpy" and "Jupyter Notebook" on my computer.

Using a Python script I generated 2**14 samples of an audio signal. The signal is the sum of seven sinusoids with these amplitudes and frequencies:

1) Amp=2 f=50 Hz
2) Amp=3 f=200 Hz
3) Amp=1.8 f=1 kHz
4) Amp=0.9 f=6.5 kHz
5) Amp=2.4 f=10 kHz
6) Amp=3.7 f=12.4 kHz
7) Amp=2 f=14.8 kHz

The sampling frequency is 30 kHz (the max. sampling rate of the ADC IC) - see code:
Code: [Select]
n = np.arange(16384) #2**14
sinus = 2*np.sin(2 * np.pi * n * 50 / fs)+3*np.sin(2 * np.pi * n * 200 / fs)+1.8*np.sin(2 * np.pi * n * 1000 / fs)+0.9*np.sin(2 * np.pi * n * 6500 / fs)+2.4*np.sin(2 * np.pi * n * 10000 / fs)+3.7*np.sin(2 * np.pi * n * 12400 / fs)+2*np.sin(2 * np.pi * n * 14800 / fs)
widmo = np.fft.fft(sinus)

I haven't added a phase shift to each sinusoid yet, but I will do it later.

Here is the full Python code that generates the samples, calculates the FFT and draws the spectrum plot:
Code: [Select]
%matplotlib inline
import numpy as np
import scipy.signal as sig
from scipy.io import wavfile
import matplotlib.pyplot as plt

# sampling frequency
fs = 30000

n = np.arange(16384) #2**14
sinus = 2*np.sin(2 * np.pi * n * 50 / fs)+3*np.sin(2 * np.pi * n * 200 / fs)+1.8*np.sin(2 * np.pi * n * 1000 / fs)+0.9*np.sin(2 * np.pi * n * 6500 / fs)+2.4*np.sin(2 * np.pi * n * 10000 / fs)+3.7*np.sin(2 * np.pi * n * 12400 / fs)+2*np.sin(2 * np.pi * n * 14800 / fs)
widmo = np.fft.fft(sinus)

# scale by N/2 so that a sinusoid of amplitude A on a bin centre gives a peak of about A
widmo_amp = np.abs(np.fft.rfft(sinus)) / (len(n) / 2)
f = np.fft.rfftfreq(len(n), 1/fs)
plt.plot(f, widmo_amp)
plt.xlabel('frequency [Hz]')
plt.ylabel('spectrum amplitude')
plt.title('"Real" spectrum of the sinusoidal signal')
plt.show()
Below is a screenshot from "Jupyter Notebook" with these calculations - JupyterNot1.png

As the spectrum plot shows, the calculated Fast Fourier Transform is correct. Now I have to alter the Python script to write the generated samples to a text file in a format that I can load into a ROM IP core (Xilinx). First I will try to write the code for a Xilinx FPGA (whose tools I am more familiar with), and after that I will try to port this Verilog code to a Gowin FPGA (Tang Nano 4K).

Update:

I managed to write the generated samples (2**14) to a text file - every row of the file holds one sample (amplitude) as a float64 (double) number.
To do it I reshaped the numpy array with the samples (the sinus array) to one row and 16384 columns - see code:
Code: [Select]
sinus2=sinus.reshape((1,16384)) # reshape to a single row before writing to the file
n2=n.reshape((1,16384))
print('Array:', sinus2.shape)
print('Datatype:', sinus2.dtype)

It gives as result:
Array: (1, 16384)

Datatype: float64


After reshaping, I was able to write the generated samples to a file with this code:

Code: [Select]
# the with-block closes the file even if savetxt raises
with open("test.txt", "w") as a_file:
    for row in sinus2:
        np.savetxt(a_file, row)
I attached a zipped file with the generated samples: PrzebiegFunkcjaT.zip. Here are the first few rows of the generated file:

Quote
0.000000000000000000e+00
5.480056624774116258e+00
-4.133662592834458138e+00
4.712084740351380141e+00
-4.044070760789608698e-02
2.579540703684484626e+00
3.405856484408328111e+00
3.067913355752533100e+00
2.780774431272005742e+00
-1.718370032419028748e-01
7.782158983178106837e+00
1.245049810715485039e+00
3.412742857835758814e-01
7.493511121116378959e+00
-4.273783738137240995e+00
7.670704301834878613e+00
4.283694862937921233e-01
6.055390249418126647e-01
8.343503988437683816e-01
2.363535211179068618e+00
1.980815387486884571e+00
-1.049024421720973876e+00
2.694032242034156255e+00
4.651970443412252187e-01
-1.296731759010284479e+00


And with this code I made a plot of the signal (in the time domain) - see screenshot:
Code: [Select]
plt.plot(n, sinus)
plt.xlabel('sample index')  # n is the sample number; divide by fs for time in seconds
plt.ylabel('amplitude')
plt.title('Sinusoidal signal waveform')
plt.show()
Now I have to find a way to convert these float64 samples into a format from which I will be able to read the values into a "ROM" IP core (Xilinx) - any clues on how to do it are warmly welcome. Then I will be able to calculate the FFT with the "FFT" IP core.
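
One possible approach, sketched with plain NumPy: scale each float by 2**n_frac, round, saturate, and print the result as a two's-complement bit string. The 16-bit word with 10 fractional bits is only an assumption here, and the function name `to_twos_complement_bits` is made up for this example.

```python
import numpy as np

def to_twos_complement_bits(values, n_word=16, n_frac=10):
    """Quantize floats to signed fixed point and return binary strings.

    Values are rounded to the nearest step of 2**-n_frac, saturated to the
    representable range, then encoded as n_word-bit two's complement.
    """
    scaled = np.round(np.asarray(values, dtype=np.float64) * 2**n_frac)
    lo, hi = -(2 ** (n_word - 1)), 2 ** (n_word - 1) - 1
    ints = np.clip(scaled, lo, hi).astype(np.int64)
    mask = (1 << n_word) - 1
    return [format(int(v) & mask, f"0{n_word}b") for v in ints]

print(to_twos_complement_bits([0.0, 1.0, -1.0, 5.480056624774116]))
```

With 10 fractional bits the representable range is roughly -32 to +32, which covers this test signal because the seven amplitudes sum to 15.8.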

BTW: I forgot about the "Phase Spectrum" - here is the code to plot it (the first spectrum plot showed the amplitude):
Code: [Select]
widmo_faz = np.angle(np.fft.rfft(sinus))
plt.plot(f, widmo_faz)
plt.xlabel('frequency [Hz]')
plt.ylabel('phase [rad]')
plt.title('Phase spectrum of the sinusoidal signal')
plt.show()

See last screenshot.

Best Regards
« Last Edit: February 22, 2022, 02:36:47 pm by FlyingDutch »
 

Offline FlyingDutch (Topic starter)

  • Regular Contributor
  • *
  • Posts: 144
  • Country: pl
Re: Xilinx FFT IP Core - window filtering function is not implemented?
« Reply #6 on: February 22, 2022, 04:29:33 pm »
Hello @hamster_nz,

thank you very much for the answer. I have more questions about the window width (filter window). Let's assume that I need the FFT to be calculated 30 times per second. Since the ADC I want to use has a max. sampling rate of 30 ksps, the max. number of points for the FFT will be 1024. Taking 1024 samples takes about 34.13 ms (so about 29.3 times per second). So let's assume that I take 1024 samples and calculate the FFT over these points. The frequency resolution is OK for me (I need only 24 stripes of spectrum). Now my main question: what should the window width be (how many samples and how many coefficients)? The second question is: is it possible to make a pipelined design for FFT sampling and calculation, and how would I do it? Into what stages should such a pipelined design be partitioned? How often would a calculated FFT be available with a pipelined design?

Thanks in advance and Regards
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14471
  • Country: fr
Re: Xilinx FFT IP Core - window filtering function is not implemented?
« Reply #7 on: February 22, 2022, 06:37:50 pm »
Windowing is a separate process performed before you do the FFT.

Not just that, but you don't even necessarily need to apply a window, depending on your particular use case.
The definition of the transform, which is just an efficient DFT, doesn't include windowing.

The FFT expects windowed data.

The FFT doesn't *expect* anything much. Use it according to your requirements, which could include having to apply a window, or not.

Why isn't it included with an FFT core? Because there are many window choices, and which you use depends on your needs.

Yes, and ultimately, as I said, those are two different things altogether.
Asking that question would be like asking why a multiplier IP doesn't include an adder and accumulator, since *my* own use of a multiplier is for implementing MAC operations. =)
 
The following users thanked this post: FlyingDutch

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Xilinx FFT IP Core - window filtering function is not implemented?
« Reply #8 on: February 23, 2022, 05:05:42 am »
Hello @hamster_nz,

thank you very much for the answer. I have more questions about the window width (filter window). Let's assume that I need the FFT to be calculated 30 times per second. Since the ADC I want to use has a max. sampling rate of 30 ksps, the max. number of points for the FFT will be 1024. Taking 1024 samples takes about 34.13 ms (so about 29.3 times per second). So let's assume that I take 1024 samples and calculate the FFT over these points. The frequency resolution is OK for me (I need only 24 stripes of spectrum). Now my main question: what should the window width be (how many samples and how many coefficients)? The second question is: is it possible to make a pipelined design for FFT sampling and calculation, and how would I do it? Into what stages should such a pipelined design be partitioned? How often would a calculated FFT be available with a pipelined design?

Thanks in advance and Regards

So designed like that the resolution will be 29.3Hz
output bin 0 will be DC.
output bin 1 will be 29.3 Hz
output bin 2 will be 58.6 Hz
output bin 3 will be 87.9 Hz
...
bin 511 will be 14.971 kHz.
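
The bin centre frequencies follow directly from k * fs / N. A short NumPy check of the list above, using the 30 kS/s rate and 1024-point length from this thread:

```python
import numpy as np

fs = 30_000   # sample rate in Hz
N = 1024      # transform length

bin_width = fs / N                           # 29.296875 Hz
bin_freqs = np.arange(N // 2) * bin_width    # centre frequencies of bins 0..511

for k in (0, 1, 2, 3, 511):
    print(f"bin {k:3d}: {bin_freqs[k]:10.3f} Hz")
```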

At those data rates I think FFT is a waste of time. If I needed to implement this I would just use DFT.

Code: [Select]

    if data_enable
       windowed_sample = new_sample * window_constant[sample_no];

    trig_index = 0
    for(i = 0; i < 512; i++) {
       if sample_no == 0
          bin[i].r = windowed_sample * cosine_table[trig_index];
          bin[i].i = windowed_sample * sine_table[trig_index];
       else
          bin[i].r = bin[i].r + windowed_sample * cosine_table[trig_index];
          bin[i].i = bin[i].i + windowed_sample * sine_table[trig_index];
       trig_index = (trig_index + sample_no) % 1024;
    }
    if(sample_no == 1023) {
        dft_done = 1;  // Trigger the readout
        sample_no = 0;
     } else {
        sample_no++;
     }

Implemented in HDL, that loop will take around 514 cycles per sample, so it could be done with a system clock of 30k * 514 = about 15.4 MHz, with two DSP blocks for the DFT and one for the window. If you want to use fewer resources, interleave the sines and cosines and use two DSP blocks at about 31 MHz, or maybe, with some scheduling to reuse the same DSP for windowing and DFT, a single DSP block.

Because the cosine and sine tables are the same (just with a 256-entry phase offset), you only need two 1024-word memories - one for the combined sine and cosine, and one for the window.

So a resource budget of two 18-bit x 1024 Block RAMs for constants, another two block RAMs to hold the DFT values (36-bits x 512 entries to keep precision), and a couple of DSP slices would be about right.

Of course this doesn't scale to faster sample rates or larger DFT sizes, it's just that 1024 samples at 30 kS/s isn't at the scale where FFT is needed.
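
A floating-point reference model of the sample-by-sample accumulation is useful for checking the indexing before committing to HDL. The sketch below follows the pseudocode's table lookup (index = bin number times sample number, modulo 1024) and compares against NumPy's FFT. The Hann window and random test input are assumptions for the test only.

```python
import numpy as np

N = 1024                      # samples per block
BINS = N // 2                 # bins 0..511, as in the pseudocode
k = np.arange(N)
cosine_table = np.cos(2 * np.pi * k / N)
sine_table = np.sin(2 * np.pi * k / N)

def streaming_dft(samples, window):
    """Accumulate the DFT one sample at a time, as the pseudocode does."""
    bin_r = np.zeros(BINS)
    bin_i = np.zeros(BINS)
    bins = np.arange(BINS)
    for sample_no, new_sample in enumerate(samples):
        windowed_sample = new_sample * window[sample_no]
        # The table index for bin i is (i * sample_no) mod N.
        trig_index = (bins * sample_no) % N
        bin_r += windowed_sample * cosine_table[trig_index]
        bin_i += windowed_sample * sine_table[trig_index]
    return bin_r, bin_i

rng = np.random.default_rng(0)
x = rng.standard_normal(N)
w = np.hanning(N)
r, i = streaming_dft(x, w)

# np.fft uses exp(-j...), so its imaginary part has the opposite sign.
ref = np.fft.fft(x * w)[:BINS]
print(np.allclose(r, ref.real), np.allclose(i, -ref.imag))
```

The sign difference in the imaginary part does not matter for a magnitude display, since r*r + i*i is the same either way.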

Of note is how nicely the code below maps onto the DSP slice's multiply-accumulate structure - like it's made for it  :D

Code: [Select]
bin[i].r = bin[i].r + windowed_sample * cosine_table[trig_index];
 
The following users thanked this post: FlyingDutch

Online gf

  • Super Contributor
  • ***
  • Posts: 1170
  • Country: de
Re: Xilinx FFT IP Core - window filtering function is not implemented?
« Reply #9 on: February 23, 2022, 06:56:21 am »
The frequency resolution is OK for me (I need only 24 stripes of spectrum).

What are the "24 stripes of spectrum" that interest you?
Is it supposed to be a spectrogram, covering a contiguous frequency range with equal spacing?
With a spacing of 30000/1024 = 29.297 Hz, 24 adjacent bins cover a range of ~700 Hz.
Or do you rather want to detect 24 (unrelated) single frequencies (out of the 512 frequencies you get from a 1024 point DFT)?
 
The following users thanked this post: FlyingDutch

Offline Someone

  • Super Contributor
  • ***
  • Posts: 4531
  • Country: au
    • send complaints here
Re: Xilinx FFT IP Core - window filtering function is not implemented?
« Reply #10 on: February 23, 2022, 07:47:13 am »
Or as an extension to the suggestion by hamster_nz
https://en.wikipedia.org/wiki/Goertzel_algorithm
Pick your frequencies, calculate just them.
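
As a rough sketch of what that looks like, here is the Goertzel recurrence for a single bin in Python, checked against NumPy's FFT. The test tone on bin 34 (close to 1 kHz at 30 kS/s with N = 1024) is only an example choice.

```python
import numpy as np

def goertzel_power(samples, k):
    """Return |X[k]|**2 of an N-point DFT using the Goertzel recurrence."""
    n = len(samples)
    coeff = 2.0 * np.cos(2.0 * np.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        # One multiply and two adds per sample per frequency of interest.
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev**2 + s_prev2**2 - coeff * s_prev * s_prev2

N = 1024
tone = np.sin(2 * np.pi * 34 * np.arange(N) / N)   # test tone exactly on bin 34
print(goertzel_power(tone, 34), abs(np.fft.fft(tone)[34]) ** 2)
```

Each frequency needs only two state variables and one coefficient, so 24 display bands mean 24 small recurrences rather than a full transform.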
 
The following users thanked this post: FlyingDutch

Offline FlyingDutch (Topic starter)

  • Regular Contributor
  • *
  • Posts: 144
  • Country: pl
Re: Xilinx FFT IP Core - window filtering function is not implemented?
« Reply #11 on: February 23, 2022, 07:57:12 am »
The frequency resolution is OK for me (I need only 24 stripes of spectrum).

What are the "24 stripes of spectrum" that interest you?
Is it supposed to be a spectrogram, covering a contiguous frequency range with equal spacing?
With a spacing of 30000/1024 = 29.297 Hz, 24 adjacent bins cover a range of ~700 Hz.
Or do you rather want to detect 24 (unrelated) single frequencies (out of the 512 frequencies you get from a 1024 point DFT)?

Hello @gf,

the outputs from the FFT, after additional processing, would be used for a simple spectrum analyzer (audio signal) displayed on an LED matrix. So this is the second scenario that you described.

Best Regards
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Xilinx FFT IP Core - window filtering function is not implemented?
« Reply #12 on: February 23, 2022, 09:36:20 am »
To do the spectrum analyzer display, after the FFT/DFT you will need to sum up the power of each bin into the display bands of your choosing... octaves or half octaves or whatever.
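
A minimal sketch of that band summing in Python, assuming 24 logarithmically spaced bands starting at 40 Hz (both numbers, and the function name `band_powers`, are assumptions for illustration):

```python
import numpy as np

def band_powers(spectrum, fs_hz, n_bands=24, f_lo=40.0):
    """Sum |X[k]|**2 into logarithmically spaced bands between f_lo and fs/2.

    `spectrum` is the output of np.fft.rfft, i.e. bins 0..N/2.
    """
    n_bins = len(spectrum)
    freqs = np.linspace(0.0, fs_hz / 2, n_bins)          # bin centre frequencies
    edges = np.geomspace(f_lo, fs_hz / 2, n_bands + 1)   # band edges
    power = np.abs(spectrum) ** 2
    # Band index per bin: -1 for bins below f_lo; the top edge joins the last band.
    idx = np.searchsorted(edges, freqs, side="right") - 1
    idx[idx == n_bands] = n_bands - 1
    bands = np.zeros(n_bands)
    valid = idx >= 0
    np.add.at(bands, idx[valid], power[valid])
    return bands

fs, N = 30_000, 1024
x = np.sin(2 * np.pi * 2000 * np.arange(N) / fs) * np.hanning(N)
bands = band_powers(np.fft.rfft(x), fs)
print(int(np.argmax(bands)))   # index of the band holding the 2 kHz tone
```

With a 29.3 Hz bin spacing, the lowest logarithmic bands are narrower than one bin and can come out empty, so the lower band edges may need widening for a 1024-point transform.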

But I was a little bit inspired by how to implement the DFT, and got this far...

Code: [Select]
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity dft_256 is
    Port ( clk        : in  STD_LOGIC;
           din        : in  STD_LOGIC_VECTOR (15 downto 0);
           din_valid  : in  STD_LOGIC;
           dout_r     : out STD_LOGIC_VECTOR (15 downto 0) := (others => '0');
           dout_i     : out STD_LOGIC_VECTOR (15 downto 0) := (others => '0');
           dout_valid : out STD_LOGIC                       := '0');
end dft_256;

architecture Behavioral of dft_256 is
    signal sequence_counter         : unsigned(7 downto 0) := (others => '1');
    signal sequence_counter_delay_1 : unsigned(7 downto 0) := (others => '1');
    signal sequence_counter_delay_2 : unsigned(7 downto 0) := (others => '1');
    signal sample_count             : unsigned(7 downto 0) := (others => '0');
   
    signal data_reg         : std_logic_vector(17 downto 0);
    type a_working is array(0 to 127) of std_logic_vector(35 downto 0);
    signal working_r        : a_working := (others => (others => '0'));
    signal working_i        : a_working := (others => (others => '0'));
    signal temp_r           : std_logic_vector(35 downto 0) := (others => '0');
    signal temp_i           : std_logic_vector(35 downto 0) := (others => '0');

    signal running_total_r  : std_logic_vector(35 downto 0) := (others => '0');
    signal running_total_i  : std_logic_vector(35 downto 0) := (others => '0');
    signal result_r         : std_logic_vector(35 downto 0) := (others => '0');
    signal result_i         : std_logic_vector(35 downto 0) := (others => '0');


    type a_sine_table is array(0 to 255) of std_logic_vector(15 downto 0);
    signal sine_table : a_sine_table := (
     x"0000", x"0192", x"0324", x"04B6", x"0647", x"07D9", x"096A", x"0AFB",
     x"0C8B", x"0E1B", x"0FAB", x"1139", x"12C7", x"1455", x"15E1", x"176D",
     x"18F8", x"1A82", x"1C0B", x"1D93", x"1F19", x"209F", x"2223", x"23A6",
     x"2527", x"26A7", x"2826", x"29A3", x"2B1E", x"2C98", x"2E10", x"2F86",
     x"30FB", x"326D", x"33DE", x"354D", x"36B9", x"3824", x"398C", x"3AF2",
     x"3C56", x"3DB7", x"3F16", x"4073", x"41CD", x"4325", x"447A", x"45CC",
     x"471C", x"4869", x"49B3", x"4AFA", x"4C3F", x"4D80", x"4EBF", x"4FFA",
     x"5133", x"5268", x"539A", x"54C9", x"55F4", x"571D", x"5842", x"5963",
     x"5A81", x"5B9C", x"5CB3", x"5DC6", x"5ED6", x"5FE2", x"60EB", x"61F0",
     x"62F1", x"63EE", x"64E7", x"65DD", x"66CE", x"67BC", x"68A5", x"698B",
     x"6A6C", x"6B4A", x"6C23", x"6CF8", x"6DC9", x"6E95", x"6F5E", x"7022",
     x"70E1", x"719D", x"7254", x"7306", x"73B5", x"745E", x"7503", x"75A4",
     x"7640", x"76D8", x"776B", x"77F9", x"7883", x"7908", x"7989", x"7A04",
     x"7A7C", x"7AEE", x"7B5C", x"7BC4", x"7C29", x"7C88", x"7CE2", x"7D38",
     x"7D89", x"7DD5", x"7E1C", x"7E5E", x"7E9C", x"7ED4", x"7F08", x"7F37",
     x"7F61", x"7F86", x"7FA6", x"7FC1", x"7FD7", x"7FE8", x"7FF5", x"7FFC",
     x"7FFF", x"7FFC", x"7FF5", x"7FE8", x"7FD7", x"7FC1", x"7FA6", x"7F86",
     x"7F61", x"7F37", x"7F08", x"7ED4", x"7E9C", x"7E5E", x"7E1C", x"7DD5",
     x"7D89", x"7D38", x"7CE2", x"7C88", x"7C29", x"7BC4", x"7B5C", x"7AEE",
     x"7A7C", x"7A04", x"7989", x"7908", x"7883", x"77F9", x"776B", x"76D8",
     x"7640", x"75A4", x"7503", x"745E", x"73B5", x"7306", x"7254", x"719D",
     x"70E1", x"7022", x"6F5E", x"6E95", x"6DC9", x"6CF8", x"6C23", x"6B4A",
     x"6A6C", x"698B", x"68A5", x"67BC", x"66CE", x"65DD", x"64E7", x"63EE",
     x"62F1", x"61F0", x"60EB", x"5FE2", x"5ED6", x"5DC6", x"5CB3", x"5B9C",
     x"5A81", x"5963", x"5842", x"571D", x"55F4", x"54C9", x"539A", x"5268",
     x"5133", x"4FFA", x"4EBF", x"4D80", x"4C3F", x"4AFA", x"49B3", x"4869",
     x"471C", x"45CC", x"447A", x"4325", x"41CD", x"4073", x"3F16", x"3DB7",
     x"3C56", x"3AF2", x"398C", x"3824", x"36B9", x"354D", x"33DE", x"326D",
     x"30FB", x"2F86", x"2E10", x"2C98", x"2B1E", x"29A3", x"2826", x"26A7",
     x"2527", x"23A6", x"2223", x"209F", x"1F19", x"1D93", x"1C0B", x"1A82",
     x"18F8", x"176D", x"15E1", x"1455", x"12C7", x"1139", x"0FAB", x"0E1B",
     x"0C8B", x"0AFB", x"096A", x"07D9", x"0647", x"04B6", x"0324", x"0192");

    signal trig_entry : unsigned(7 downto 0);
    signal sin_value  : std_logic_vector(15 downto 0);
    signal cos_value  : std_logic_vector(15 downto 0);
   
    component mac_block is
    port (
        clk : in std_logic;
        a : in std_logic_vector(17 downto 0);
        b : in std_logic_vector(15 downto 0);
        c : in std_logic_vector(35 downto 0);
        r : out std_logic_vector(35 downto 0)
    );
    end component;
begin

mac_block_r: mac_block port map (
    clk => clk,
    a   => data_reg,
    b   => cos_value,
    c   => running_total_r,
    r   => result_r);

mac_block_i: mac_block port map (
    clk => clk,
    a   => data_reg,
    b   => sin_value,
    c   => running_total_i,
    r   => result_i);
   
    -- For the first sample in a block set the running total to zero
    running_total_r <= (others => '0') when sample_count /= 0 else temp_r;
    running_total_i <= (others => '0') when sample_count /= 0 else temp_i;

process(clk)
    begin
        if rising_edge(clk) then
            -- Output the last completed DFT while inputting sample 0 of the next           
            if sample_count = 0 and sequence_counter >= 1 and sequence_counter < 128+1 then
                dout_r <= std_logic_vector(temp_r(35 downto 20));
                dout_i <= std_logic_vector(temp_i(35 downto 20));
                dout_valid <= '1';
            else
                dout_r <= (others => '0');
                dout_i <= (others => '0');
                dout_valid <= '0';
            end if;
           
            -- Write back any update value
            if sequence_counter >= 2 and sequence_counter < 128+2 then
                working_r(to_integer(sequence_counter_delay_2)) <= result_r;
                working_i(to_integer(sequence_counter_delay_2)) <= result_i;
            end if;
           
            -- Look up the working values and the sin/cos values
            if sequence_counter < working_r'high then
                temp_r    <= working_r(to_integer(sequence_counter));
                temp_i    <= working_i(to_integer(sequence_counter));
                sin_value <= sine_table(to_integer(trig_entry));
                cos_value <= sine_table(to_integer(trig_entry+64));
            end if;

            -- Restart the sequencer when a new sample arrives
            if din_valid = '1' then
                data_reg         <= din & "00";
                sequence_counter <= (others => '0');
                trig_entry       <= (others => '0');
                sample_count     <= sample_count + 1;
            elsif sequence_counter /= 255 then
                sequence_counter <= sequence_counter+1;
                trig_entry       <= trig_entry + sample_count;
            end if;
           
            -- Delayed sequence count for the write-back to working_r and working_i
            sequence_counter_delay_2 <= sequence_counter_delay_1;
            sequence_counter_delay_1 <= sequence_counter;
        end if;       
    end process;
end Behavioral;


It's most probably littered with 'off by one errors' at the moment but should give you an idea of what I'm talking about (it is most definitely wrong at the moment!!!), and I'm yet to adjust the scaling factor for all the fixed point math...

Sim is attached so you can see how it just bursts out the DFT results once it sees the first sample of a new block.
« Last Edit: February 23, 2022, 09:38:18 am by hamster_nz »
 
The following users thanked this post: Someone, FlyingDutch

Offline Someone

  • Super Contributor
  • ***
  • Posts: 4531
  • Country: au
    • send complaints here
Re: Xilinx FFT IP Core - window filtering function is not implemented?
« Reply #13 on: February 23, 2022, 10:10:35 am »
For that application you are probably better off with a (time multiplexed) "analog" style bandpass filter. You want some overlap so that a swept tone maintains the display across all frequencies, rather than disappearing and then reappearing as it passes the centre frequencies of the display. Still just one DSP unit + one block/LUT RAM.

P.S. this is low throughput enough to be accomplished with a big microcontroller!
 
The following users thanked this post: FlyingDutch

Offline FlyingDutch (Topic starter)

  • Regular Contributor
  • *
  • Posts: 144
  • Country: pl
Re: Xilinx FFT IP Core - window filtering function is not implemented?
« Reply #14 on: February 23, 2022, 11:33:24 am »

P.S. this is low throughput enough to be accomplished with a big microcontroller!

Hello @Someone,

I was aware that these objectives can be accomplished with a microcontroller, but I would like to try this on an FPGA. The second reason is that in the final project I may use a 200 ksps ADC (AD7606), and I would need higher sampling and processing speed. The assumptions I described are my first attempt at this subject, but it is likely that there will also be a second one.

Here is a link to an AD7606 board (max. 200 ksps sampling speed):

https://www.aliexpress.com/item/33007417676.html?gatewayAdapt=glo2pol&spm=a2g0o.detail.1000060.3.25e21dbcHG5qjT&gps-id=pcDetailBottomMoreThisSeller&scm=1007.13339.169870.0&scm_id=1007.13339.169870.0&scm-url=1007.13339.169870.0&pvid=befe63fe-bf4d-4abd-b142-794ab3616e6f&_t=gps-id:pcDetailBottomMoreThisSeller,scm-url:1007.13339.169870.0,pvid:befe63fe-bf4d-4abd-b142-794ab3616e6f,tpp_buckets:668%232846%238116%232002&pdp_ext_f=%257B%2522sku_id%2522%253A%252267081646518%2522%252C%2522sceneId%2522%253A%25223339%2522%257D&pdp_pi=-1%253B121.71%253B-1%253B-1%2540salePrice%253BPLN%253Brecommend-recommend

Best Regards
 

Offline FlyingDutch (Topic starter)

  • Regular Contributor
  • *
  • Posts: 144
  • Country: pl
Re: Xilinx FFT IP Core - window filtering function is not implemented?
« Reply #15 on: February 23, 2022, 11:36:02 am »
Hello @hamster_nz,

thank you very much for the code; this afternoon (after work) I will try the code you posted.

I have one question about it: I don't know what the component mac_block is.
Code: [Select]
    component mac_block is
    port (
        clk : in std_logic;
        a : in std_logic_vector(17 downto 0);
        b : in std_logic_vector(15 downto 0);
        c : in std_logic_vector(35 downto 0);
        r : out std_logic_vector(35 downto 0)
    );
    end component;

What is its implementation?

Best Regards
« Last Edit: February 23, 2022, 12:55:29 pm by FlyingDutch »
 

Offline FlyingDutch (Topic starter)

  • Regular Contributor
  • *
  • Posts: 144
  • Country: pl
Re: Xilinx FFT IP Core - window filtering function is not implemented?
« Reply #16 on: February 23, 2022, 04:12:14 pm »
Hello,

Today I altered the Python script and generated 2**13=8192 audio signal samples. I added phase shifts to two of the sinusoids. Here is the audio signal definition:

Code: [Select]
n = np.arange(8192) #2**13
sinus = 2*np.sin(2 * np.pi * n * 50 / fs)+3*np.sin(2 * np.pi * n * 200 / fs+1.5707)+1.8*np.sin(2 * np.pi * n * 1000 / fs)+0.9*np.sin(2 * np.pi * n * 6500 / fs+0.7853)+2.4*np.sin(2 * np.pi * n * 10000 / fs)+3.7*np.sin(2 * np.pi * n * 12400 / fs)+2*np.sin(2 * np.pi * n * 14800 / fs)
widmo = np.fft.fft(sinus)

I also managed to generate a .coe file (Xilinx ROM/RAM IP core) from which I am able to initialize the ROM memory.
To do it I installed the "fxpmath" package and converted the float64 numbers into 16-bit fixed-point numbers. These fixed-point numbers are 16 bits wide with a 10-bit fractional part (so there are 6 bits, including the sign bit, before the binary point and 10 bits after it). I did this with this simple code:
Code: [Select]
# write all 8192 samples, one 16-bit binary word per line
with open("C:\\Rob\\AudioSignal.txt", "w") as file:
    for x in range(8192):
        z = Fxp(sinus2[0][x], signed=True, n_word=16, n_frac=10)
        file.write(z.bin()+"\n")
        #print(z.hex())   # hex repr
        #print(z.val)     # raw val (decimal of binary stored)

Here is the beginning of the generated file:

Code: [Select]
memory_initialization_radix=2;
memory_initialization_vector=
0000111010001011
0010000011100111
1111011110101011
0001110010011011
0000110000100111
0001010100111101
0001010001111001
0001000111011110
0001001101001010
0000100011001100
0010010110010010
0000011111001011
0000010010101001
0010010000000101
1111010010001000
0010000001001000
0000000011110100

I am attaching a file with the coe content to this post (change the txt extension to coe).
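
The script only writes the bare binary words, so the two header lines have to be added separately. A small sketch of a helper that produces the whole file text in one go is shown here. The name `format_coe` is hypothetical, and the syntax details (comma-separated entries, a semicolon closing the vector) are written from memory of the COE format, so please check them against the Xilinx documentation.

```python
def format_coe(words, radix=2):
    """Build the text of a .coe file from an iterable of already-formatted words."""
    words = list(words)
    if not words:
        raise ValueError("a .coe file needs at least one word")
    lines = [
        f"memory_initialization_radix={radix};",
        "memory_initialization_vector=",
    ]
    lines += [w + "," for w in words[:-1]]   # comma after every word except the last
    lines.append(words[-1] + ";")            # the last word closes the vector
    return "\n".join(lines) + "\n"

# First three words of the generated file, used as sample input
coe_text = format_coe(["0000111010001011", "0010000011100111", "1111011110101011"])
print(coe_text)
```

The returned string can be written with an ordinary `open(path, "w")`, which avoids renaming and editing the txt file by hand.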

And here is the full code of the Python script (the altered version):
Code: [Select]
#-------------------------------------------------------------------------------
# Name:        module1
# Purpose:
#
# Author:      mgabr
#
# Created:     23.02.2022
# Copyright:   (c) mgabr 2022
# Licence:     <freer licence>
#-------------------------------------------------------------------------------

def main():
    pass

if __name__ == '__main__':
    main()

import numpy as np
import scipy.signal as sig
from scipy.io import wavfile
import matplotlib.pyplot as plt
from fxpmath import Fxp

# sampling frequency
fs = 30000

n = np.arange(8192) #2**13
sinus = 2*np.sin(2 * np.pi * n * 50 / fs)+3*np.sin(2 * np.pi * n * 200 / fs+1.5707)+1.8*np.sin(2 * np.pi * n * 1000 / fs)+0.9*np.sin(2 * np.pi * n * 6500 / fs+0.7853)+2.4*np.sin(2 * np.pi * n * 10000 / fs)+3.7*np.sin(2 * np.pi * n * 12400 / fs)+2*np.sin(2 * np.pi * n * 14800 / fs)
widmo = np.fft.fft(sinus)

widmo_amp = np.abs(np.fft.rfft(sinus)) / 4096  # divide by N/2 (8192/2) so a tone shows its amplitude
f = np.fft.rfftfreq(8192, 1/fs)
plt.plot(f, widmo_amp)
plt.xlabel('frequency [Hz]')
plt.ylabel('spectrum amplitude')
plt.title('"Real" spectrum of the sinusoidal signal')
plt.show()

plt.plot(n / fs, sinus)  # convert the sample index to seconds to match the axis label
plt.xlabel('time [s]')
plt.ylabel('amplitude')
plt.title('Waveform of the sinusoidal signal')
plt.show()

sinus2=sinus.reshape((1,8192)) # the shape has to be changed so that writing to the file works
n2=n.reshape((1,8192))


# printing the array and checking datatype
print('Array:', sinus2.shape)
print('Datatype:', sinus2.dtype)

file = open("C:\\Rob\\AudioSignal.txt", "w")

for x in range(8192):  # all 2**13 samples; range(0, 8191) stopped one short
    z = Fxp(sinus2[0][x], signed=True, n_word=16, n_frac=10)
    file.write(z.bin()+"\n")
    #print(z.hex())   # hex repr
    #print(z.val)     # raw val (decimal of binary stored)

file.close()


I was using the free "PyScripter" Python editor (and Python version 3.8.9). Now I am able to initialize the "ROM" IP core (Xilinx) and try to calculate the FFT (also an IP core) on a Spartan7 FPGA (XC7s15).
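
A side note that ties the script back to the topic of this thread: with fs = 30000 and 8192 samples the bin spacing is 30000/8192, about 3.66 Hz, so a tone such as the 200 Hz one does not land exactly on a bin and leaks into the rest of the spectrum. The sketch below compares the leakage far away from the tone with and without a Hann window (`np.hanning`). The helper `level_db` and the choice of 4 kHz as the probe frequency are only for illustration.

```python
import numpy as np

fs = 30000
N = 8192
n = np.arange(N)
tone = 3.0 * np.sin(2 * np.pi * 200 * n / fs + 1.5707)   # the 200 Hz component of the test signal

def level_db(x, window, freq_hz):
    """Level in dB (relative to amplitude 1.0) at the bin closest to freq_hz."""
    # dividing by sum(window)/2 keeps the amplitude scaling comparable between windows
    spectrum = np.abs(np.fft.rfft(x * window)) / (np.sum(window) / 2)
    k = int(round(freq_hz * len(x) / fs))
    return 20 * np.log10(max(spectrum[k], 1e-15))         # clamp to avoid log10(0)

rect_leak = level_db(tone, np.ones(N), 4000)      # no window
hann_leak = level_db(tone, np.hanning(N), 4000)   # Hann window applied before the FFT
print(f"leakage at 4 kHz: rectangular {rect_leak:.1f} dB, Hann {hann_leak:.1f} dB")
```

On the FPGA the equivalent step would be one multiplication per sample in front of the FFT core, with the window coefficients stored in a ROM.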

Best Regards
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Xilinx FFT IP Core - window filtering function is not implemented?
« Reply #17 on: February 23, 2022, 07:12:04 pm »
Here's the mac_block -  it's just an inferred DSP slice:

Code: [Select]
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity mac_block is
    Port ( clk : in STD_LOGIC;
           a : in STD_LOGIC_VECTOR (17 downto 0);
           b : in STD_LOGIC_VECTOR (15 downto 0);
           c : in STD_LOGIC_VECTOR (35 downto 0);
           r : out STD_LOGIC_VECTOR (35 downto 0));
end mac_block;

architecture Behavioral of mac_block is
    signal a_s : signed (17 downto 0);
    signal b_s : signed (15 downto 0);
    signal c_s : signed (35 downto 0);
    signal r_s : signed (35 downto 0);

begin
    r   <= std_logic_vector(r_s);

process(clk)
    begin
        if rising_edge(clk) then
            -- Calculate the output
            r_s <= a_s*b_s+c_s;
            -- Register the inputs
            a_s <= signed(a);
            b_s <= signed(b);
            c_s <= signed(c);
        end if;
    end process;

end Behavioral;

I'll just do a little more work on actually making it work properly, then I'll post revised DFT code...
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 
The following users thanked this post: FlyingDutch

Offline Bassman59

  • Super Contributor
  • ***
  • Posts: 2501
  • Country: us
  • Yes, I do this for a living
Re: Xilinx FFT IP Core - window filtering function is not implemented?
« Reply #18 on: February 23, 2022, 07:16:48 pm »
Why not make the ports signed instead of std_logic_vector?
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: Xilinx FFT IP Core - window filtering function is not implemented?
« Reply #19 on: February 24, 2022, 01:35:47 am »
To answer Bassman59's question about why not "signed" on the module? Shrug, I don't have an opinion either way. My view is that outside the MAC block they are just bits, with no real meaning.

Here is the working DFT code.

Code: [Select]
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity dft_256 is
    Port ( clk        : in  STD_LOGIC;
           din        : in  STD_LOGIC_VECTOR (15 downto 0);
           din_valid  : in  STD_LOGIC;
           dout_r     : out STD_LOGIC_VECTOR (15 downto 0) := (others => '0');
           dout_i     : out STD_LOGIC_VECTOR (15 downto 0) := (others => '0');
           dout_valid : out STD_LOGIC                       := '0');
end dft_256;

architecture Behavioral of dft_256 is
    signal sequence_counter         : unsigned(7 downto 0) := (others => '1');
    signal sequence_counter_delay_1 : unsigned(7 downto 0) := (others => '1');
    signal sequence_counter_delay_2 : unsigned(7 downto 0) := (others => '1');
    signal sequence_counter_delay_3 : unsigned(7 downto 0) := (others => '1');
    signal sample_count             : unsigned(7 downto 0) := (others => '1');
   
    signal data_reg         : std_logic_vector(17 downto 0);
    type a_working is array(0 to 127) of std_logic_vector(40 downto 0);
    signal working_r        : a_working := (others => (others => '0'));
    signal working_i        : a_working := (others => (others => '0'));
    signal temp_r           : std_logic_vector(40 downto 0) := (others => '0');
    signal temp_i           : std_logic_vector(40 downto 0) := (others => '0');

    signal running_total_r  : std_logic_vector(40 downto 0) := (others => '0');
    signal running_total_i  : std_logic_vector(40 downto 0) := (others => '0');
    signal result_r         : std_logic_vector(40 downto 0) := (others => '0');
    signal result_i         : std_logic_vector(40 downto 0) := (others => '0');


    type a_sine_table is array(0 to 255) of std_logic_vector(15 downto 0);
    -- Table body generated with this C code
    --
    -- #include <math.h>
    -- #include <stdio.h>
    --
    -- int main(void) {
    --    for(int i = 0; i < 256; i++) {
    --      if(i%8 == 0) {
    --         printf("    ");
    --      }
    --      printf(" x\"%04X\",", (int)(sin(i*2*M_PI/256)*32767)&0xFFFF);
    --      if(i%8 == 7) {
    --        printf("\n");
    --      }
    --    }
    -- }
    signal sine_table : a_sine_table := (
     x"0000", x"0324", x"0647", x"096A", x"0C8B", x"0FAB", x"12C7", x"15E1",
     x"18F8", x"1C0B", x"1F19", x"2223", x"2527", x"2826", x"2B1E", x"2E10",
     x"30FB", x"33DE", x"36B9", x"398C", x"3C56", x"3F16", x"41CD", x"447A",
     x"471C", x"49B3", x"4C3F", x"4EBF", x"5133", x"539A", x"55F4", x"5842",
     x"5A81", x"5CB3", x"5ED6", x"60EB", x"62F1", x"64E7", x"66CE", x"68A5",
     x"6A6C", x"6C23", x"6DC9", x"6F5E", x"70E1", x"7254", x"73B5", x"7503",
     x"7640", x"776B", x"7883", x"7989", x"7A7C", x"7B5C", x"7C29", x"7CE2",
     x"7D89", x"7E1C", x"7E9C", x"7F08", x"7F61", x"7FA6", x"7FD7", x"7FF5",
     x"7FFF", x"7FF5", x"7FD7", x"7FA6", x"7F61", x"7F08", x"7E9C", x"7E1C",
     x"7D89", x"7CE2", x"7C29", x"7B5C", x"7A7C", x"7989", x"7883", x"776B",
     x"7640", x"7503", x"73B5", x"7254", x"70E1", x"6F5E", x"6DC9", x"6C23",
     x"6A6C", x"68A5", x"66CE", x"64E7", x"62F1", x"60EB", x"5ED6", x"5CB3",
     x"5A81", x"5842", x"55F4", x"539A", x"5133", x"4EBF", x"4C3F", x"49B3",
     x"471C", x"447A", x"41CD", x"3F16", x"3C56", x"398C", x"36B9", x"33DE",
     x"30FB", x"2E10", x"2B1E", x"2826", x"2527", x"2223", x"1F19", x"1C0B",
     x"18F8", x"15E1", x"12C7", x"0FAB", x"0C8B", x"096A", x"0647", x"0324",
     x"0000", x"FCDC", x"F9B9", x"F696", x"F375", x"F055", x"ED39", x"EA1F",
     x"E708", x"E3F5", x"E0E7", x"DDDD", x"DAD9", x"D7DA", x"D4E2", x"D1F0",
     x"CF05", x"CC22", x"C947", x"C674", x"C3AA", x"C0EA", x"BE33", x"BB86",
     x"B8E4", x"B64D", x"B3C1", x"B141", x"AECD", x"AC66", x"AA0C", x"A7BE",
     x"A57F", x"A34D", x"A12A", x"9F15", x"9D0F", x"9B19", x"9932", x"975B",
     x"9594", x"93DD", x"9237", x"90A2", x"8F1F", x"8DAC", x"8C4B", x"8AFD",
     x"89C0", x"8895", x"877D", x"8677", x"8584", x"84A4", x"83D7", x"831E",
     x"8277", x"81E4", x"8164", x"80F8", x"809F", x"805A", x"8029", x"800B",
     x"8001", x"800B", x"8029", x"805A", x"809F", x"80F8", x"8164", x"81E4",
     x"8277", x"831E", x"83D7", x"84A4", x"8584", x"8677", x"877D", x"8895",
     x"89C0", x"8AFD", x"8C4B", x"8DAC", x"8F1F", x"90A2", x"9237", x"93DD",
     x"9594", x"975B", x"9932", x"9B19", x"9D0F", x"9F15", x"A12A", x"A34D",
     x"A57F", x"A7BE", x"AA0C", x"AC66", x"AECD", x"B141", x"B3C1", x"B64D",
     x"B8E4", x"BB86", x"BE33", x"C0EA", x"C3AA", x"C674", x"C947", x"CC22",
     x"CF05", x"D1F0", x"D4E2", x"D7DA", x"DAD9", x"DDDD", x"E0E7", x"E3F5",
     x"E708", x"EA1F", x"ED39", x"F055", x"F375", x"F696", x"F9B9", x"FCDC");

    signal trig_entry   : unsigned(7 downto 0);
    signal trig_entry_c : unsigned(7 downto 0);
    signal sin_value    : std_logic_vector(15 downto 0);
    signal cos_value    : std_logic_vector(15 downto 0);
   
    component mac_block is
    port (
        clk : in std_logic;
        a : in std_logic_vector(17 downto 0);
        b : in std_logic_vector(15 downto 0);
        c : in std_logic_vector(40 downto 0);
        r : out std_logic_vector(40 downto 0)
    );
    end component;
begin

mac_block_r: mac_block port map (
    clk => clk,
    a   => data_reg,
    b   => cos_value,
    c   => running_total_r,
    r   => result_r);

mac_block_i: mac_block port map (
    clk => clk,
    a   => data_reg,
    b   => sin_value,
    c   => running_total_i,
    r   => result_i);
   
    -- For the first sample in a block set the running total to zero
    running_total_r <= (others => '0') when sample_count = 0 else temp_r;
    running_total_i <= (others => '0') when sample_count = 0 else temp_i;
    trig_entry_c    <= trig_entry + 64;


process(clk)
    begin
        if rising_edge(clk) then
            -- Output the last completed DFT while inputting sample 0 of the next           
            if sample_count = 0 and sequence_counter >= 1 and sequence_counter < 128+1 then
                dout_r <= std_logic_vector(temp_r(temp_r'high downto temp_r'high-15));
                dout_i <= std_logic_vector(temp_i(temp_i'high downto temp_i'high-15));
                dout_valid <= '1';
            else
                dout_r <= (others => '0');
                dout_i <= (others => '0');
                dout_valid <= '0';
            end if;
           
            -- Write back any update value
            if sequence_counter >= 3 and sequence_counter < 128+3 then
                working_r(to_integer(sequence_counter_delay_3)) <= result_r;
                working_i(to_integer(sequence_counter_delay_3)) <= result_i;
            end if;
           
            -- Look up the working values and the sin/cos values
            if sequence_counter <= working_r'high then
                temp_r    <= working_r(to_integer(sequence_counter));
                temp_i    <= working_i(to_integer(sequence_counter));
                sin_value <= sine_table(to_integer(trig_entry));
                cos_value <= sine_table(to_integer(trig_entry_c));
            end if;

            -- Restart the sequencer when a new sample arrives
            if din_valid = '1' then
                data_reg         <= din & "00";
                sequence_counter <= (others => '0');
                trig_entry       <= (others => '0');
                sample_count     <= sample_count + 1;
            elsif sequence_counter /= 255 then
                sequence_counter <= sequence_counter + 1;
                trig_entry       <= trig_entry + sample_count;
            end if;
           
            -- Delayed sequence count for the write-back to working_r and working_i
            sequence_counter_delay_3 <= sequence_counter_delay_2;
            sequence_counter_delay_2 <= sequence_counter_delay_1;
            sequence_counter_delay_1 <= sequence_counter;
        end if;       
    end process;

end Behavioral;


You could play around with the scaling factors of the 'sine' table, to tweak where the full scale output is on the DFT.
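
For anyone who wants to try other scaling factors without recompiling the C program, here is a sketch of the same generator in Python. The function names and the example scale of 16383 are arbitrary choices for this illustration; `int()` truncates toward zero in the same way as the C cast.

```python
import math

def sine_table(scale=32767, entries=256):
    """16-bit two's complement sine table, one full period."""
    return [int(math.sin(i * 2 * math.pi / entries) * scale) & 0xFFFF
            for i in range(entries)]

def as_vhdl(table, per_line=8):
    """Format the table as rows of x"...." literals for pasting into the VHDL array."""
    rows = []
    for start in range(0, len(table), per_line):
        row = table[start:start + per_line]
        rows.append("     " + ", ".join(f'x"{v:04X}"' for v in row))
    return ",\n".join(rows)

full_scale = sine_table()        # same scale as the table in the DFT code
half_scale = sine_table(16383)   # example of a reduced scale
print(as_vhdl(full_scale[:8]))
```

Halving the scale halves every product going into the accumulator, so the full-scale point of the 16 output bits moves by one bit.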

And here is the MAC block:

Code: [Select]
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity mac_block is
    Port ( clk : in STD_LOGIC;
           a : in STD_LOGIC_VECTOR (17 downto 0);
           b : in STD_LOGIC_VECTOR (15 downto 0);
           c : in STD_LOGIC_VECTOR (40 downto 0);
           r : out STD_LOGIC_VECTOR (40 downto 0));
end mac_block;

architecture Behavioral of mac_block is
    signal a_s : signed (17 downto 0);
    signal b_s : signed (15 downto 0);
    signal c_s : signed (40 downto 0);
    signal r_s : signed (40 downto 0);
   
begin
    r   <= std_logic_vector(r_s);

process(clk)
    begin
        if rising_edge(clk) then
            r_s <= a_s*b_s+c_s;
            -- Register the inputs
            a_s <= signed(a);
            b_s <= signed(b);
            c_s <= signed(c);
        end if;
    end process;

end Behavioral;


Here's a test sample source:

Code: [Select]
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;


entity source is
    Port ( clk         : in STD_LOGIC;
           counter_max : in std_logic_vector(15 downto 0);
           data        : out STD_LOGIC_VECTOR (15 downto 0) := (others => '0');
           data_valid  : out std_logic                       := '0'
     );
end source;

architecture Behavioral of source is
    type a_data is array(0 to 31) of std_logic_vector(15 downto 0);
    signal values : a_data := (
       -- Slightly offset source sine
--     x"30FB", x"471C", x"5A81", x"6A6C", x"7640", x"7D89", x"7FFF", x"7D89",
--     x"7640", x"6A6C", x"5A81", x"471C", x"30FB", x"18F8", x"0000", x"E708",
--     x"CF05", x"B8E4", x"A57F", x"9594", x"89C0", x"8277", x"8001", x"8277",
--     x"89C0", x"9594", x"A57F", x"B8E4", x"CF05", x"E708", x"0000", x"18F8"

       -- Perfectly aligned square
       x"0000", x"7FFF", x"7FFF", x"7FFF", x"7FFF", x"7FFF", x"7FFF", x"7FFF",
       x"7FFF", x"7FFF", x"7FFF", x"7FFF", x"7FFF", x"7FFF", x"7FFF", x"7FFF",
       x"0000", x"8001", x"8001", x"8001", x"8001", x"8001", x"8001", x"8001",
       x"8001", x"8001", x"8001", x"8001", x"8001", x"8001", x"8001", x"8001"
    );
   
    signal counter : unsigned (15 downto 0) := (others => '0');
    signal sample : unsigned(9 downto 0) := (others => '0');
begin

process(clk)
    begin
        if rising_edge(clk) then
            if counter = 0 then
                data <= values(to_integer(sample));
                data_valid <= '1';
                counter <= unsigned(counter_max);
                if sample = values'high then
                    sample <= (others => '0');
                else
                    sample <= sample+1;
                end if;
            else
                counter <= counter-1;
                data_valid <= '0';
            end if;
        end if;
    end process;
end Behavioral;


Top level test bench:
Code: [Select]
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity tb_dft_top is
end tb_dft_top;

architecture Behavioral of tb_dft_top is
    component dft_top is
    Port ( clk : in STD_LOGIC;
           fft_r : out STD_LOGIC_VECTOR (15 downto 0);
           fft_i : out STD_LOGIC_VECTOR (15 downto 0);
           fft_valid : out STD_LOGIC);
    end component;

    signal clk       : STD_LOGIC;
    signal fft_r     : STD_LOGIC_VECTOR (15 downto 0);
    signal fft_i     : STD_LOGIC_VECTOR (15 downto 0);
    signal fft_valid : STD_LOGIC;

begin

clk_proc: process
    begin
        clk <= '0';
        wait for 10 ns;
        clk <= '1';
        wait for 10 ns;
    end process;

uut: dft_top port map (
    clk       => clk,
    fft_r     => fft_r,
    fft_i     => fft_i,
    fft_valid => fft_valid
);

end Behavioral;

Images attached are simulations of the sine and square test waves, showing the spectrum as it is streamed out.
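
To have something to compare the square-wave simulation against, the expected spectrum can be computed with NumPy. This is only a reference sketch. It assumes that a 256-sample block lines up with the 32-sample pattern of the test source, so that exactly 8 periods fall into one block.

```python
import numpy as np

# One period of the "perfectly aligned square" from the test source, as signed values
period = np.array([0] + [0x7FFF] * 15 + [0] + [-0x7FFF] * 15)   # x"8001" is -32767
block = np.tile(period, 256 // len(period))                      # 8 periods per 256-sample block

magnitude = np.abs(np.fft.rfft(block))[:128]   # the DFT core streams out 128 bins per block
strongest = np.argsort(magnitude)[::-1][:3]    # indices of the three largest bins
print("strongest bins:", sorted(int(k) for k in strongest))
```

Because the second half of the period is the negated first half, only odd harmonics of the fundamental (bin 8) should show up, and the even ones such as bin 16 should be empty.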

This still doesn't address the issue of windowing the data on the way in, or calculating the magnitude, sqrt(r*r+i*i), on the way out of the DFT.

For the former, this line in the DFT could be adapted:
Code: [Select]
                data_reg         <= din & "00";
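
One possible way to adapt that line, sketched here as a bit-level Python model rather than VHDL: multiply the signed 16-bit sample by an unsigned 16-bit window coefficient (0 to 65535 standing for 0.0 to just under 1.0) and keep 18 bits, so that a coefficient near full scale gives nearly the same value as din & "00". The name `windowed_data_reg` and the coefficient format are assumptions for this illustration, not part of the posted design.

```python
def windowed_data_reg(din, coeff):
    """Model of an 18-bit signed data_reg holding din * window.

    din   : signed 16-bit sample (-32768..32767)
    coeff : unsigned 16-bit window coefficient; coeff / 65536 is the window value
    """
    assert -32768 <= din <= 32767 and 0 <= coeff <= 65535
    product = din * coeff   # fits in 32 bits signed; one multiplier in hardware
    # din & "00" equals din * 4, so the windowed value is din * 4 * coeff / 65536,
    # which is the product shifted right by 14 (arithmetic shift, rounds toward -infinity)
    return product >> 14

unwindowed = 32767 << 2                       # what din & "00" gives for a full-scale sample
near_full = windowed_data_reg(32767, 65535)   # window value just under 1.0
at_edge = windowed_data_reg(32767, 0)         # window value 0.0 at the block edges
print(unwindowed, near_full, at_edge)
```

The result always fits the existing 18-bit port of the MAC block, so in this model nothing downstream would need to change.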


For the latter, a CORDIC magnitude would be sweet....
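
As a rough idea of what that involves, here is an integer-only Python model of a CORDIC in vectoring mode. It uses only shifts and adds in the loop, plus one constant multiply at the end to remove the CORDIC gain. The function name, the 16 iterations and the 16-bit gain constant are choices made for this sketch.

```python
def cordic_magnitude(re, im, iterations=16):
    """Approximate sqrt(re*re + im*im) with shift-and-add CORDIC (vectoring mode)."""
    x, y = abs(re), abs(im)   # the magnitude ignores signs, so fold into the first quadrant
    for i in range(iterations):
        if y > 0:             # vector above the x axis: rotate clockwise
            x, y = x + (y >> i), y - (x >> i)
        else:                 # vector on or below the axis: rotate counter-clockwise
            x, y = x - (y >> i), y + (x >> i)
    # x now holds gain * magnitude, where gain = prod(sqrt(1 + 2**(-2*i))), about 1.64676
    # 39797 is about 65536 / 1.64676, so one multiply and a shift remove the gain
    return (x * 39797) >> 16

mag = cordic_magnitude(3000, 4000)
print(mag)   # close to 5000, off by a few LSBs because of the truncating shifts
```

In hardware each loop iteration would become one pipeline stage, and the final multiply can be skipped if only relative levels matter.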
 
The following users thanked this post: FlyingDutch

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14471
  • Country: fr
Re: Xilinx FFT IP Core - window filtering function is not implemented?
« Reply #20 on: February 24, 2022, 02:18:01 am »
When dealing with internal registers/ports that are by design supposed to hold "numbers", I usually use numeric types in VHDL, and use std_logic_vector basically only for the top-level ports, which are indeed just connected to "pins" and thus are just signals. Makes it more readable IMHO and avoids a number of conversions. But that's just my 2 cents.
 
The following users thanked this post: nctnico, 2N3055

Offline dawnclaude

  • Contributor
  • Posts: 15
  • Country: tr
Re: Xilinx FFT IP Core - window filtering function is not implemented?
« Reply #21 on: February 24, 2022, 06:02:08 am »
I prefer generating the filter coefficients in HDL over using .coe files if the mathematical derivation isn't too complex. Here is an incomplete snippet (it is a piece of a larger design):

Code: [Select]
architecture Behavioral of hann_window is

    constant WINDOW_ADDR_WIDTH : integer := clogb2(WINDOW_LENGTH);
   
    TYPE WINDOW_ROM_TYPE IS ARRAY(0 TO WINDOW_LENGTH-1) OF STD_LOGIC_VECTOR(WINDOW_WIDTH-1 downto 0);
    SIGNAL WINDOW_ROM: WINDOW_ROM_TYPE; -- The ROM block
   
    signal window_addr : unsigned(WINDOW_ADDR_WIDTH-1 downto 0);
    signal window_data : std_logic_vector(WINDOW_WIDTH-1 downto 0);
   
begin
   
    process(clk)
    begin
        if rising_edge(clk) then               
                window_data <= WINDOW_ROM(to_integer(window_addr)); -- Synchronous ROM
        end if;
    end process;


    WINDOW_LUT:
    FOR idx in 0 TO WINDOW_LENGTH-1 GENERATE 
        CONSTANT x: REAL := real(0.5) - real(0.5)*cos(real(2)*real(MATH_PI)*real(idx)/real(WINDOW_LENGTH-1));
        CONSTANT xn: UNSIGNED (WINDOW_WIDTH-1 DOWNTO 0) := to_unsigned(INTEGER(FLOOR(x*real(2**WINDOW_WIDTH))),WINDOW_WIDTH);
    BEGIN
        WINDOW_ROM(idx) <= STD_LOGIC_VECTOR(xn);
    END GENERATE;

end Behavioral;
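
To see which values that generate loop produces, the same expression can be evaluated in Python. This is a sketch with a hypothetical name `hann_rom`. One thing it shows is that for an odd WINDOW_LENGTH the centre coefficient is exactly 1.0, and floor(1.0 * 2**WINDOW_WIDTH) needs one bit more than WINDOW_WIDTH, so the model clamps it. The clamp is an addition for this sketch and is not in the VHDL above.

```python
import math

def hann_rom(window_length, window_width):
    """Mirror of the generate loop: floor(x * 2**width), clamped to fit the word."""
    max_code = (1 << window_width) - 1
    rom = []
    for idx in range(window_length):
        x = 0.5 - 0.5 * math.cos(2 * math.pi * idx / (window_length - 1))
        code = math.floor(x * (1 << window_width))
        rom.append(min(code, max_code))   # clamp: x == 1.0 would not fit in window_width bits
    return rom

rom_even = hann_rom(256, 16)   # even length: the peak stays just below 1.0
rom_odd = hann_rom(257, 16)    # odd length: the centre sample is exactly 1.0
print(max(rom_even), rom_odd[128])
```

For even lengths such as 256 the clamp never triggers, so the VHDL as posted should be fine there.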
 

Offline FlyingDutchTopic starter

  • Regular Contributor
  • *
  • Posts: 144
  • Country: pl
Re: Xilinx FFT IP Core - window filtering function is not implemented?
« Reply #22 on: February 24, 2022, 07:06:39 am »
Hello,

I just tried this code in "Vivado" and it synthesizes properly. I also ran the test bench.
Thank you very much.

BTW: A violent war is taking place at my country's borders. I am really afraid, and I have lost the mood for electronics projects.

Best Regards
« Last Edit: February 24, 2022, 04:57:16 pm by FlyingDutch »
 

Offline FlyingDutchTopic starter

  • Regular Contributor
  • *
  • Posts: 144
  • Country: pl
Re: Xilinx FFT IP Core - window filtering function is not implemented?
« Reply #23 on: February 26, 2022, 02:04:32 pm »
Hello,

Today I created a "Vivado" (Xilinx) project for an FPGA board with a Spartan7 (xc7s15ftgb196-2) and made a "Block Design". Then I placed the few IP cores I needed on the "Block Design":

1) Clocking Wizard

2) Processor System Reset

3) Block Memory Generator (memory is initialized from coe file with 8192 samples of audio signal, generated by Python script)

4) AXI4-Stream Data Width Converter

5) Fast Fourier Transform

and connected them with suitable buses/wires, and also brought out the input/output ports (for the block design). See the screenshot from Xilinx Vivado. I also attached a PDF document with the "Block Design".

Then I generated the "HDL Wrapper" for the "Block Design" - see the code below:

Code: [Select]
//Copyright 1986-2020 Xilinx, Inc. All Rights Reserved.
//--------------------------------------------------------------------------------
//Tool Version: Vivado v.2020.1 (win64) Build 2902540 Wed May 27 19:54:49 MDT 2020
//Date        : Sat Feb 26 12:29:52 2022
//Host        : DESKTOP-4Q2NB93 running 64-bit major release  (build 9200)
//Command     : generate_target design_fft_wrapper.bd
//Design      : design_fft_wrapper
//Purpose     : IP block netlist
//--------------------------------------------------------------------------------
`timescale 1 ps / 1 ps

module design_fft_wrapper
   (Address12b,
    clk_50MHz,
    event_data_in_channel_halt,
    event_data_out_channel_halt,
    event_fft_overflow,
    event_frame_started,
    event_status_channel_halt,
    event_tlast_missing,
    event_tlast_unexpected,
    m_axis_data_tdata,
    m_axis_data_tlast,
    m_axis_data_tready,
    m_axis_data_tuser,
    m_axis_data_tvalid,
    reset_rtl_0,
    s_axis_config_tdata,
    s_axis_config_tready,
    s_axis_config_tvalid,
    s_axis_data_tlast,
    s_axis_tready,
    s_axis_tvalid);
  input [12:0]Address12b;
  input clk_50MHz;
  output event_data_in_channel_halt;
  output event_data_out_channel_halt;
  output event_fft_overflow;
  output event_frame_started;
  output event_status_channel_halt;
  output event_tlast_missing;
  output event_tlast_unexpected;
  output [31:0]m_axis_data_tdata;
  output m_axis_data_tlast;
  input m_axis_data_tready;
  output [23:0]m_axis_data_tuser;
  output m_axis_data_tvalid;
  input reset_rtl_0;
  input [31:0]s_axis_config_tdata;
  output s_axis_config_tready;
  input s_axis_config_tvalid;
  input s_axis_data_tlast;
  output s_axis_tready;
  input s_axis_tvalid;

  wire [12:0]Address12b;
  wire clk_50MHz;
  wire event_data_in_channel_halt;
  wire event_data_out_channel_halt;
  wire event_fft_overflow;
  wire event_frame_started;
  wire event_status_channel_halt;
  wire event_tlast_missing;
  wire event_tlast_unexpected;
  wire [31:0]m_axis_data_tdata;
  wire m_axis_data_tlast;
  wire m_axis_data_tready;
  wire [23:0]m_axis_data_tuser;
  wire m_axis_data_tvalid;
  wire reset_rtl_0;
  wire [31:0]s_axis_config_tdata;
  wire s_axis_config_tready;
  wire s_axis_config_tvalid;
  wire s_axis_data_tlast;
  wire s_axis_tready;
  wire s_axis_tvalid;

  design_fft design_fft_i
       (.Address12b(Address12b),
        .clk_50MHz(clk_50MHz),
        .event_data_in_channel_halt(event_data_in_channel_halt),
        .event_data_out_channel_halt(event_data_out_channel_halt),
        .event_fft_overflow(event_fft_overflow),
        .event_frame_started(event_frame_started),
        .event_status_channel_halt(event_status_channel_halt),
        .event_tlast_missing(event_tlast_missing),
        .event_tlast_unexpected(event_tlast_unexpected),
        .m_axis_data_tdata(m_axis_data_tdata),
        .m_axis_data_tlast(m_axis_data_tlast),
        .m_axis_data_tready(m_axis_data_tready),
        .m_axis_data_tuser(m_axis_data_tuser),
        .m_axis_data_tvalid(m_axis_data_tvalid),
        .reset_rtl_0(reset_rtl_0),
        .s_axis_config_tdata(s_axis_config_tdata),
        .s_axis_config_tready(s_axis_config_tready),
        .s_axis_config_tvalid(s_axis_config_tvalid),
        .s_axis_data_tlast(s_axis_data_tlast),
        .s_axis_tready(s_axis_tready),
        .s_axis_tvalid(s_axis_tvalid));
endmodule


Now I will try to write a test bench for this project and see in simulation what the calculated Fourier transform of these 8192 samples looks like.
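
For checking the simulation output, it helps to know in advance which bins the seven tones should appear in. The sketch below computes them from the values used in the Python script, and also splits a 32-bit output word into two signed 16-bit parts. The word layout (real part in bits 15..0, imaginary part in bits 31..16) is an assumption that should be verified against PG109 for the chosen core configuration, as should the output ordering (natural or bit-reversed).

```python
FS = 30000   # sampling rate used by the Python script
N = 8192     # transform length
TONES_HZ = [50, 200, 1000, 6500, 10000, 12400, 14800]

def expected_bin(freq_hz, fs=FS, n=N):
    """Index of the FFT bin closest to freq_hz."""
    return int(round(freq_hz * n / fs))

def unpack_tdata(word):
    """Split a 32-bit m_axis_data_tdata word into signed (re, im).

    Assumes re in bits 15..0 and im in bits 31..16.
    """
    def to_signed16(v):
        return v - 0x10000 if v & 0x8000 else v
    return to_signed16(word & 0xFFFF), to_signed16((word >> 16) & 0xFFFF)

peak_bins = {f: expected_bin(f) for f in TONES_HZ}
print(peak_bins)
```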

Best regards
« Last Edit: February 26, 2022, 02:12:10 pm by FlyingDutch »
 

Offline FlyingDutchTopic starter

  • Regular Contributor
  • *
  • Posts: 144
  • Country: pl
Re: Xilinx FFT IP Core - window filtering function is not implemented?
« Reply #24 on: March 20, 2022, 09:57:40 am »
Hello,

In the meantime, I made a "proof of concept" using an external I2S Delta-Sigma ADC (PCM1808 IC) to sample the audio signal. The "proof of concept" was made with an external board carrying the I2S PCM1808 ADC and an "ESP32 Wrover" board. Here is a link to the I2S ADC board I used:

https://www.aliexpress.com/item/32830812025.html?gatewayAdapt=glo2pol&spm=a2g0o.order_list.0.0.21ef1c24fMPGJ3

Here is the page on a Polish electronics forum where I described these trials in detail (English translation of the page made by "Google Translate"):

https://forbot-pl.translate.goog/forum/topic/21177-sipeed-tang-nano-4k-z-adc-gowin-fpga-designer/page/9/?_x_tr_sl=pl&_x_tr_tl=en&_x_tr_hl=pl&_x_tr_pto=wapp#comments

Because the I2S protocol is very simple, it should be easy to write a Verilog module on an FPGA board that gets audio samples from the I2S (PCM1808) ADC. I would like to use the "Sipeed Tang Nano 4K" Gowin FPGA board for this purpose. I would like to calculate the 512-point FFT using the IP core from the "Gowin EDA" environment. I will report the results in this thread.
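
Before writing the Verilog, the receive logic can be modelled in a few lines of Python. This sketch assumes the standard I2S framing (MSB first, first data bit one BCLK after the LRCK edge) with 24-bit samples in 32-bit slots. Whether the PCM1808 board is configured exactly like that should be checked in the datasheet, and both function names are made up here.

```python
SLOT_BITS = 32     # BCLK cycles per channel slot (assumed)
SAMPLE_BITS = 24   # sample width (assumed)

def encode_slot(sample):
    """Bits of one channel slot as seen on the data line, in arrival order."""
    word = sample & ((1 << SAMPLE_BITS) - 1)   # two's complement pattern
    data = [(word >> (SAMPLE_BITS - 1 - i)) & 1 for i in range(SAMPLE_BITS)]  # MSB first
    padding = [0] * (SLOT_BITS - 1 - SAMPLE_BITS)
    return [0] + data + padding                # I2S: the MSB arrives one BCLK late

def decode_slot(bits):
    """What a receiver shift register would do: skip one bit, shift in 24, sign-extend."""
    value = 0
    for bit in bits[1:1 + SAMPLE_BITS]:
        value = (value << 1) | bit
    if value & (1 << (SAMPLE_BITS - 1)):       # sign bit set: negative number
        value -= 1 << SAMPLE_BITS
    return value

decoded = decode_slot(encode_slot(-2))
print(decoded)
```

In Verilog this maps to a shift register clocked by BCLK plus a bit counter that is reset on every LRCK edge.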

Best Regards
 

