Author Topic: Starting with STM32 (NUCLEO-L412KB)  (Read 11175 times)


Online PicuinoTopic starter

  • Frequent Contributor
  • **
  • Posts: 990
  • Country: 00
    • Picuino web
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #100 on: May 20, 2024, 04:01:02 pm »
Impulse response:
Code: [Select]
44314902
148402228
221708163
222778963
186551583
137137416
89104182
49710214
21259325
3128889
-6704820
-10651461
-10921222
-9256912
-6873443
-4513878
-2556128
-1128066
-207962
299208
510446
534733
458937
344221
228442
131230
59608
12961
-13151
-24399
-26152
-22736
-17228
-11554
-6731
-3142
-779
563
1159
1273
1121
857
579
340
161
42
-27
-59
-66
-59
-46
-32
-20
-11
-5
-2
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
 

Online PicuinoTopic starter

  • Frequent Contributor
  • **
  • Posts: 990
  • Country: 00
    • Picuino web
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #101 on: May 20, 2024, 04:04:27 pm »
Step (of value = 100000) response:

Code: [Select]
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
4127
17947
38594
59341
76715
89487
97786
102416
104396
104687
104062
103070
102053
101191
100551
100131
99893
99788
99769
99797
99844
99893
99935
99967
99988
100000
100006
100008
100007
100005
100003
100001
99999
99998
99997
99996
99996
99996
99996
99996
95869
82048
61400
40652
23278
10506
2208
-2421
-4401
-4692
-4067
-3075
-2058
-1196
-556
-136
102
207
226
198
150
100
57
25
4
-8
-13
-14
-13
-11
-9
-7
-5
-4
-3
-2
-2
-2
-2
-2

Now the scale is OK.
 

Online PicuinoTopic starter

  • Frequent Contributor
  • **
  • Posts: 990
  • Country: 00
    • Picuino web
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #102 on: May 20, 2024, 04:05:29 pm »
STM32 program:

Code: [Select]
        // Define variables
        q31_t pSrc[100];
        q31_t pDst[100];
        arm_biquad_casd_df1_inst_q31 S;
        // {b0, b1, b2, a1, a2}, scaled by 2^30 (hence postShift = 1 below)
        q31_t pCoeffs[] = {44314902, 88629804, 44314902, 1448274717, -551792502};
               // { 0x02A43116, 0x0548622C, 0x02A43116, 0xA9AD14E3, 0x20E3AF76 }; // old values: a1, a2 with opposite sign
        q31_t pState[4];  // DF1 needs 4 state words per biquad stage

        uart_init();
        while (1) {
            // Initialize buffer pSrc
            for (int i = 0; i < 100; i++)
                pSrc[i] = 0;
            //pSrc[0]= 0x40000000;
            for (int i = 20; i < 60; i++)  pSrc[i] = 100000;

            // Process data
            arm_biquad_cascade_df1_init_q31(&S, 1, pCoeffs, pState, 1);
            GPIOA->BSRR = (1 << 3); // Set PA3
            arm_biquad_cascade_df1_q31(&S, pSrc, pDst, 100);
            GPIOA->BRR = (1 << 3);  // Reset PA3

            // Output data
            for (int i = 0; i < 100; i++) {
                printf("%ld\r\n", pDst[i]);
                HAL_Delay(2);
            }
            HAL_Delay(2000);
        }

The processing time stays at 42us for 100 samples.
 

Online gf

  • Super Contributor
  • ***
  • Posts: 1297
  • Country: de
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #103 on: May 20, 2024, 04:26:22 pm »
Gain is basically correct now, but there is still a problem with a full step (-2147483648 -> 2147483647). Then the overshoot will exceed the q31 range, and the response will go crazy :scared:. If you want the filter to withstand a full step, the gain needs to be reduced. This can be done by scaling down all three b coefficients by the same factor (say 0.8, or whatever is necessary to keep the overshoot within the -1...1 range).

EDIT:

And keep in mind: The smaller the signal, the larger the (relative) numerical error.
Note that your step amplitude of 100000 in Q31 corresponds to only 4.6566e-05.
That's less than one ADC count after scaling the ADC full-scale range to Q31.
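
The scaling factor can be estimated offline. The following is only a sketch in floating point, using the coefficients from the STM32 program above (stored scaled by 2**30); the `headroom` margin and the helper name `unit_step_peak` are assumptions made for this illustration. It simulates the 0 -> 1 step response, derives the peak for a -1 -> +1 step by superposition (assuming a DC gain of 1), and scales only the b coefficients.

```python
# Coefficients from the STM32 program in this thread, stored scaled by 2**30.
# CMSIS sign convention:
#   y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] + a1*y[n-1] + a2*y[n-2]
Q30 = 2 ** 30
b = [44314902 / Q30, 88629804 / Q30, 44314902 / Q30]
a1, a2 = 1448274717 / Q30, -551792502 / Q30


def unit_step_peak(b, a1, a2, n=200):
    """Largest output value of the biquad for a 0 -> 1 step input."""
    x1 = x2 = y1 = y2 = 0.0
    peak = 0.0
    for _ in range(n):
        x = 1.0
        y = b[0] * x + b[1] * x1 + b[2] * x2 + a1 * y1 + a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        peak = max(peak, y)
    return peak


peak = unit_step_peak(b, a1, a2)
# A -1 -> +1 step is the settled level (-1, DC gain 1) plus twice the unit step response.
full_step_peak = 2.0 * peak - 1.0
headroom = 0.99                      # hypothetical margin, not a value from the thread
scale = headroom / full_step_peak    # apply to b0, b1 and b2 only
b_scaled_q30 = [round(c * scale * Q30) for c in b]

print("unit step peak:       %.4f" % peak)
print("full step peak:       %.4f" % full_step_peak)
print("scale for b0..b2:     %.4f" % scale)
print("scaled b (2**30):    ", b_scaled_q30)
```

This only covers the step input discussed here; other input signals may need a different margin, and the fixed-point filter adds rounding effects that the floating-point simulation ignores.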
« Last Edit: May 20, 2024, 04:41:13 pm by gf »
 

Online PicuinoTopic starter

  • Frequent Contributor
  • **
  • Posts: 990
  • Country: 00
    • Picuino web
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #104 on: May 20, 2024, 05:05:14 pm »
No problem.
I will multiply the value of two ADCs (12bits x 12bits = 24bits) and perhaps add several samples (16) before applying the filter. This produces 28-bit values (worst case), which will not saturate the filter and, I hope, will be sufficiently large values.

The problem now is to downsample from 1.333MHz or higher to an output frequency of 10kHz at most. In slower cases, I will need an output frequency lower than 1Hz and I don't know how the filter will behave in that case. Perhaps I will need to sum several blocks of input samples before applying the output filter.
« Last Edit: May 20, 2024, 05:07:48 pm by Picuino »
 

Online PicuinoTopic starter

  • Frequent Contributor
  • **
  • Posts: 990
  • Country: 00
    • Picuino web
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #105 on: May 20, 2024, 05:37:20 pm »
As the filter has a lower cutoff frequency, the coefficients become smaller and smaller and, therefore, resolution is lost.
There comes a time when the coefficients are very small and become almost zero.

Code: [Select]
   // Cutoff frequency = 10000
   coeff = { 0x0008CE29, 0x00119C53, 0x0008CE29, 0x7BBC3A97, 0xC4208CC3 };

   // Cutoff frequency = 1000
   coeff = { 0x00001738, 0x00002E70, 0x00001738, 0x7F92C8E9, 0xC06CDA36 };

   // Cutoff frequency = 100
   coeff = { 0x0000003C, 0x00000077, 0x0000003C, 0x7FF51415, 0xC00AEAFD };

   // Cutoff frequency = 10
   coeff = { 0x00000001, 0x00000001, 0x00000001, 0x7FFEE868, 0xC0011795 };

   // Cutoff frequency = 1
   coeff = { 0x00000000, 0x00000000, 0x00000000, 0x7FFFE40A, 0xC0001BF6 };


Script:
Code: [Select]
#
# Python script to calculate Butterworth filter coefficients
#
from scipy import signal

fs = 1333000   # Sample rate

for fc in [10000, 1000, 100, 10, 1]:   # Cutoff frequencies
    b, a = signal.butter(2, fc, btype='low', analog=False, output='ba', fs=fs)
    coeff = list(b) + list(-a)[1:]   # CMSIS expects the a coefficients negated
    coeff = [round(c * (2 ** 30)) for c in coeff]   # scaled by 2^30, so use postShift = 1
    coeff = [c if c >= 0 else c + 0x100000000 for c in coeff]   # two's complement for hex output
    print()
    print("   // Cutoff frequency = %d" % fc)
    print("   coeff = { 0x%08X, 0x%08X, 0x%08X, 0x%08X, 0x%08X };" % tuple(coeff))
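
The loss of resolution can also be reproduced without SciPy. The sketch below uses the textbook closed-form bilinear-transform design of a 2nd-order Butterworth low-pass, which should agree with `butter(2, ...)` for these settings but is an assumption here, not the script used above. The function names are made up for the illustration. It prints b0 before and after rounding to 30 fractional bits.

```python
import math


def butter2_lowpass(fc, fs):
    """2nd-order Butterworth low-pass (bilinear transform with prewarping).

    Returns (b0, b1, b2, a1, a2) with a1 and a2 already negated, i.e. in the
    sign convention used for the coefficient arrays in this thread.
    """
    k = math.tan(math.pi * fc / fs)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    a1 = 2.0 * (1.0 - k * k) * norm
    a2 = -(1.0 - math.sqrt(2.0) * k + k * k) * norm
    return b0, 2.0 * b0, b0, a1, a2


def quantized_b0(fc, fs, frac_bits=30):
    """b0 rounded to an integer with frac_bits fractional bits."""
    return round(butter2_lowpass(fc, fs)[0] * 2 ** frac_bits)


fs = 1333000
for fc in (10000, 1000, 100, 10, 1):
    exact = butter2_lowpass(fc, fs)[0] * 2 ** 30
    q = quantized_b0(fc, fs)
    err = abs(q - exact) / exact * 100.0
    print("fc = %5d Hz: b0 * 2^30 = %12.4f -> %6d (rounding error %5.1f %%)"
          % (fc, exact, q, err))
```

b0 shrinks roughly with (fc/fs)^2, so every decade of cutoff frequency costs about 6 to 7 bits of coefficient resolution.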
 

Online PicuinoTopic starter

  • Frequent Contributor
  • **
  • Posts: 990
  • Country: 00
    • Picuino web
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #106 on: May 20, 2024, 06:26:23 pm »
High Precision Q31 Biquad Cascade Filter
https://arm-software.github.io/CMSIS-DSP/latest/group__BiquadCascadeDF1__32x64.html

Code: [Select]
arm_biquad_cas_df1_32x64_q31 (const arm_biquad_cas_df1_32x64_ins_q31 *S, const q31_t *pSrc, q31_t *pDst, uint32_t blockSize)
Quote
This function implements a high precision Biquad cascade filter which operates on Q31 data values. The filter coefficients are in 1.31 format and the state variables are in 1.63 format. The double precision state variables reduce quantization noise in the filter and provide a cleaner output. These filters are particularly useful when implementing filters in which the singularities are close to the unit circle. This is common for low pass or high pass filters with very low cutoff frequencies.

EDIT:

Python script with postShift handling:
Code: [Select]
#
# Python script to calculate Butterworth filter coefficients
#
from scipy import signal

fs = 1333000   # Sample rate
fc = 100000    # Cutoff frequency
postShift = 0  # Post shift gain

b, a = signal.butter(2, fc, btype='low', analog=False, output='ba', fs=fs)
coeff = list(b) + list(-a)[1:]   # CMSIS expects the a coefficients negated
# Q31 covers [-1, 1), so +1.0 itself is not representable: halve until all fit
while max(coeff) >= 1.0 or min(coeff) < -1.0:
    postShift += 1
    coeff = [c/2 for c in coeff]
coeff = [round(c * (2 ** 31)) for c in coeff]   # coeff is already divided by 2**postShift
coeff = [c if c >= 0 else c + 0x100000000 for c in coeff]   # two's complement for hex output
print("   // Sample rate = %0.2f" % fs)
print("   // Cutoff frequency = %0.2f" % fc)
print("   postShift = %d;" % postShift)
print("   q31_t pCoeffs[] = { 0x%08X, 0x%08X, 0x%08X, 0x%08X, 0x%08X };" % tuple(coeff))
« Last Edit: May 20, 2024, 06:42:11 pm by Picuino »
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14692
  • Country: fr
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #107 on: May 20, 2024, 09:29:23 pm »
Sorry if you discussed it before - but doesn't the STM32L4 have an FPU? Are you sure any of these fixed-point implementations will be faster than using floating point?
 

Online gf

  • Super Contributor
  • ***
  • Posts: 1297
  • Country: de
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #108 on: May 20, 2024, 10:15:13 pm »
Sorry if you discussed it before - but doesn't the STM32L4 have an FPU? Are you sure any of these fixed-point implementations will be faster than using floating point?

I would not be sure either that fixed point is really faster.

Floating point normalization certainly solves scaling issues.
But I have doubts that (single precision) FP solves IIR precision issues, since mantissa precision is only 24 bits.
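
As a rough illustration of the mantissa limit (it says nothing about speed or about the resulting output noise), the sketch below rounds values to single precision with the standard `struct` module. The helper name `to_f32` is made up for this example; the coefficient is the a1 value listed earlier in the thread for a 1 Hz cutoff.

```python
import struct


def to_f32(x):
    """Round a Python float (double) to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Feedback coefficient a1 listed earlier for fc = 1 Hz, stored with 30 fractional bits.
a1 = 0x7FFFE40A / 2 ** 30

print("a1 with 30 fractional bits: %.12f" % a1)
print("a1 rounded to float32:      %.12f" % to_f32(a1))
print("float32 step in [1, 2):     %.3e" % 2 ** -23)
print("step with 30 frac. bits:    %.3e" % 2 ** -30)

# In single precision, an increment of 2**-24 added to 1.0 is lost entirely.
print(to_f32(1.0 + 2 ** -24) == 1.0)
```

Near 2.0, where a1 ends up for very low cutoff frequencies, single precision is therefore coarser than the 32-bit fixed-point coefficient, which is consistent with the doubt expressed above.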
« Last Edit: May 20, 2024, 11:10:14 pm by gf »
 

Online dietert1

  • Super Contributor
  • ***
  • Posts: 2187
  • Country: br
    • CADT Homepage
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #109 on: May 21, 2024, 05:11:52 am »
For a lock-in one uses a direct receiver, that is, signal detection by mixing the input signal with a carrier of known frequency and phase. This mixing with a synthetic carrier of adjustable phase happens at the ADC sampling rate: one multiplication per ADC sample.
The output of that mixer is band-limited to the carrier frequency. Any meaningful output sampling rate won't be higher than the carrier frequency, apart from the Nyquist factor of 2. So there should be some downsampling, e.g. using a boxcar.
As far as I understand, the filter design discussed here is meant to reduce the detection bandwidth even further, e.g. to 1 Hz with a 1 kHz carrier. In this case I see the need for precision but I don't see the need for high speed. At 10 kHz carrier frequency the rate will be 20 kHz (50 usec).
For an IIR filter I would try to use the FPU. If the dynamic range is a problem, one can get an F7 or H7 with the double-precision FPU. Others get the precision by using two or three filter stages running at ever lower rates, where each stage reduces bandwidth and rate by at most a factor of 10 or so. As far as I remember, Cortex-M supports filters with a double-precision accumulator (32 x 32 to 64-bit multiply and add into a 64-bit accumulator).

Regards, Dieter
 

Online PicuinoTopic starter

  • Frequent Contributor
  • **
  • Posts: 990
  • Country: 00
    • Picuino web
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #110 on: May 21, 2024, 07:58:47 am »
Yes, what I am going to do is to multiply some 32 or 64 samples of the two ADCs and add the result in an accumulator. The output will be a buffer with the data from the accumulators, which should come out of the DAC at a rate of about 100kHz.
To that 100kHz buffer is where I should apply the filter or filters and, in the case of slower outputs, reduce the number of points.

What is not clear to me is how to reduce the number of points of the DAC after a filter. Do I add several points again to generate a single point?
 

Online gf

  • Super Contributor
  • ***
  • Posts: 1297
  • Country: de
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #111 on: May 21, 2024, 09:00:53 am »
Yes, what I am going to do is to multiply some 32 or 64 samples of the two ADCs and add the result in an accumulator. The output will be a buffer with the data from the accumulators, which should come out of the DAC at a rate of about 100kHz.

What you have in mind is basically a boxcar filter (moving average filter) with 32...64 taps, applied to the stream of multiplied samples, followed by downsampling from 1333kSa/s to ~100kSa/s (factor 13x) by picking only every 13th sample and discarding the samples in between. And obviously you want a filter length which is larger than the decimation factor.

If the number of boxcar taps is an integer multiple of the decimation factor, then the cheapest way to do that is a CIC decimator (e.g. 13x5 -> 65 taps, or 16x4 -> 64 taps for fsout=fs/16=83kSa/s). See https://www.dsprelated.com/showarticle/1337.php.

EDIT:

Attached is the frequency response plot of a 64-tap boxcar filter @1333kSa/s. Check for yourself what this means for the suppression of the carrier and carrier harmonics if the carrier frequency can be arbitrary. Also keep in mind that any frequencies which pass through the filter (with more or less attenuation) are folded down to frequencies < fs/R/2 by the downsampling (where fs is the original sample rate and R is the downsampling factor). If the carrier frequency happens to be "unfavorable", the folded frequencies can even fall into the 0...10kHz region of interest.

[ With your milliohm meter, the length of the boxcar was an integer multiple of the carrier period. Then the carrier and carrier harmonics fall into zeros of the filter's frequency response and are rejected completely. But this is no longer guaranteed if the carrier frequency can be arbitrary. ]

For comparison I also added a plot with the frequency response of a "proper" FIR downsampling filter which avoids aliasing (with a stopband attenuation of ~65dB; more than that is possible, too, with more taps). 9x13=117 taps means that 9 taps must be calculated for each source sample when the downsampling factor is 13. With CMSIS, you could use arm_fir_decimate_q31() to do the filtering and decimation.
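
Without the plots, the boxcar behaviour can be checked numerically. This is a small sketch assuming a 64-tap moving average at 1333 kSa/s decimated by 16 (one of the combinations mentioned above). The function names and the 85 kHz carrier are made-up examples. It evaluates the gain at a chosen frequency and the frequency to which that component folds after decimation.

```python
import cmath
import math


def boxcar_gain_db(f, fs, ntaps):
    """Gain in dB (0 dB at DC) of an ntaps-point moving average at frequency f."""
    h = sum(cmath.exp(-2j * math.pi * f * k / fs) for k in range(ntaps)) / ntaps
    mag = abs(h)
    return 20.0 * math.log10(mag) if mag > 1e-12 else float("-inf")


def folded(f, fs_out):
    """Frequency at which f appears after resampling at fs_out (range 0...fs_out/2)."""
    return abs(f - round(f / fs_out) * fs_out)


fs = 1333.0        # kSa/s, from the thread
ntaps = 64
R = 16             # decimation factor, so fs_out = fs / 16 (about 83 kSa/s)
for carrier in (fs / ntaps, 1.5 * fs / ntaps, 85.0):   # kHz; 85 kHz is a made-up example
    print("%8.3f kHz: %7.1f dB, folds to %6.3f kHz"
          % (carrier, boxcar_gain_db(carrier, fs, ntaps), folded(carrier, fs / R)))
```

A carrier exactly at a multiple of fs/64 lands in a null, one halfway between two nulls is only attenuated by the sidelobe level, and a carrier slightly above the output sample rate folds down into the 0...10kHz band.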
« Last Edit: May 21, 2024, 10:05:24 am by gf »
 

Online PicuinoTopic starter

  • Frequent Contributor
  • **
  • Posts: 990
  • Country: 00
    • Picuino web
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #112 on: May 21, 2024, 10:56:14 am »
There is a very simple detail that I don't quite understand.
It seems that a Boxcar filter or a CIC filter is a filter that adds the previous N samples, so the algorithm would be something like this:
Code: [Select]
out[10] =                            in[10] + in[9] + in[8] + in[7]
out[11] =                   in[11] + in[10] + in[9] + in[8]
out[12] =          in[12] + in[11] + in[10] + in[9]
out[13] = in[13] + in[12] + in[11] + in[10]
and so on...

This means that the number of output samples is equal to the number of input samples, but filtered.

However what I need is to reduce the number of samples (for example from 1333kHz to 133kHz).

The only way I can think of to do this is to sum blocks:
Code: [Select]
out[1] =                                     in[10] + in[9] + in[8] + in[7]
out[2] = in[14] + in[13] + in[12] + in[11]

Is there any other way to do the decimation?
 

Online gf

  • Super Contributor
  • ***
  • Posts: 1297
  • Country: de
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #113 on: May 21, 2024, 11:18:34 am »
There is a very simple detail that I don't quite understand.
It seems that a Boxcar filter or a CIC filter is a filter that adds the previous N samples, so the algorithm would be something like this:
Code: [Select]
out[10] =                            in[10] + in[9] + in[8] + in[7]
out[11] =                   in[11] + in[10] + in[9] + in[8]
out[12] =          in[12] + in[11] + in[10] + in[9]
out[13] = in[13] + in[12] + in[11] + in[10]
and so on...

This means that the number of output samples is equal to the number of input samples, but filtered.

And the next step after filtering is downsampling, i.e. you keep only every (say) 10th sample of out[] and discard the 9 samples in between. Of course you can optimize: You do not need to calculate those filtered samples, which are discarded in the next step.
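
A minimal sketch of that optimization, in plain Python with made-up function names: the first function filters every sample and then discards, the second computes only the outputs that are kept. Both give the same result, and when the boxcar length equals the decimation factor the second one reduces to the block sums proposed in the question.

```python
def boxcar_then_decimate(x, n, r):
    """Reference: sum of the last n samples at every index, then keep every r-th output."""
    full = [sum(x[max(0, i - n + 1): i + 1]) for i in range(len(x))]
    return full[r - 1::r]


def boxcar_decimate_direct(x, n, r):
    """Same result, computing only the outputs that survive the decimation."""
    return [sum(x[max(0, i - n + 1): i + 1]) for i in range(r - 1, len(x), r)]


x = list(range(1, 41))                      # arbitrary test data
print(boxcar_then_decimate(x, 8, 4) == boxcar_decimate_direct(x, 8, 4))

# With n == r the windows no longer overlap: plain block sums.
print(boxcar_decimate_direct([1, 2, 3, 4, 5, 6, 7, 8], 4, 4))
```

With a window longer than the decimation factor (here 8 taps, factor 4), consecutive outputs share input samples, which is what distinguishes a longer boxcar from simple block summing.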
 

Online PicuinoTopic starter

  • Frequent Contributor
  • **
  • Posts: 990
  • Country: 00
    • Picuino web
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #114 on: May 21, 2024, 11:25:21 am »
https://arm-software.github.io/CMSIS_5/DSP/html/group__FIR__decimate.html#ga6a19d62083e85b3f5e34e8a8283c1ea0
https://arm-software.github.io/CMSIS_5/DSP/html/group__FIR__decimate.html#ga27c05d7892f8a327aab86fbfee9b0f29

Thank you very much for your help.
Do you know what would be the way to obtain the FIR filter coefficients?


Code: [Select]
arm_status arm_fir_decimate_init_q31 (
arm_fir_decimate_instance_q31 *  S,
uint16_t  numTaps,
uint8_t  M,
const q31_t *  pCoeffs,
q31_t *  pState,
uint32_t  blockSize
)

Parameters
    [in,out] S points to an instance of the Q31 FIR decimator structure
    [in] numTaps number of coefficients in the filter
    [in] M decimation factor
    [in] pCoeffs points to the filter coefficients
    [in] pState points to the state buffer
    [in] blockSize number of input samples to process

Returns
    execution status

        ARM_MATH_SUCCESS : Operation successful
        ARM_MATH_LENGTH_ERROR : blockSize is not a multiple of M

Details
    pCoeffs points to the array of filter coefficients stored in time reversed order:

        {b[numTaps-1], b[numTaps-2], ..., b[1], b[0]}

    pState points to the array of state variables. pState is of length numTaps+blockSize-1 words where blockSize is the number of input samples passed to arm_fir_decimate_q31(). M is the decimation factor.

« Last Edit: May 21, 2024, 11:27:11 am by Picuino »
 

Online gf

  • Super Contributor
  • ***
  • Posts: 1297
  • Country: de
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #115 on: May 21, 2024, 11:49:44 am »
https://arm-software.github.io/CMSIS_5/DSP/html/group__FIR__decimate.html#ga6a19d62083e85b3f5e34e8a8283c1ea0
https://arm-software.github.io/CMSIS_5/DSP/html/group__FIR__decimate.html#ga27c05d7892f8a327aab86fbfee9b0f29

Thank you very much for your help.
Do you know what would be the way to obtain the FIR filter coefficients?

For 10x decimation, try this one:

Code: [Select]
pkg load signal
R = 10      % decimation factor
BW = 10     % end of passband (start of transition band), kHz
fs =  1333  % sample rate, kSa/s
ntaps = 80  % number of taps
h = remez(ntaps-1,[0 BW fs/2/R fs/2]/(fs/2),[1 1 0 0]);
% plot frequency response
[H,f] = freqz(h,1,10000,fs);
plot(f,20*log10(abs(H)))
grid on
ylim([-70 0])
% scale h to integers (note: the Q31 scale factor is 2^31; 2^32 as used here doubles the DC gain, see Reply #123)
int32(round(h*2^32))

Code: [Select]
     1943208
      464978
      305641
      -49046
     -638438
    -1497324
    -2650941
    -4109235
    -5862456
    -7878098
   -10098245
   -12435856
   -14770057
   -16950103
   -18809941
   -20144286
   -20744894
   -20388173
   -18853952
   -15933574
   -11440900
    -5223349
     2827034
    12763159
    24575302
    38187347
    53451404
    70147061
    87988766
   106625888
   125658165
   144643853
   163116013
   180598380
   196623003
   210748070
   222574464
   231761399
   238041130
   241228017
   241228017
   238041130
   231761399
   222574464
   210748070
   196623003
   180598380
   163116013
   144643853
   125658165
   106625888
    87988766
    70147061
    53451404
    38187347
    24575302
    12763159
     2827034
    -5223349
   -11440900
   -15933574
   -18853952
   -20388173
   -20744894
   -20144286
   -18809941
   -16950103
   -14770057
   -12435856
   -10098245
    -7878098
    -5862456
    -4109235
    -2650941
    -1497324
     -638438
      -49046
      305641
      464978
     1943208

The question is how fast this function is.

[ If it is too slow, the decimation could be split into several stages, using half-band filters for the first stages. Only for the last stage, the stopband must start below Nyquist. For the above case, the first decimation-by-2 stage would need only 6 taps or so. The function for the first stage could also be hand-optimized in order to avoid some of the overhead of the generic CMSIS function. ]
« Last Edit: May 21, 2024, 12:23:31 pm by gf »
 

Online dietert1

  • Super Contributor
  • ***
  • Posts: 2187
  • Country: br
    • CADT Homepage
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #116 on: May 21, 2024, 12:19:55 pm »
Let's say I want to use an exponential running average as a low-pass filter before downsampling, e.g.
Yn = 0.000 002 * Xn + 0.999 998 * Yn-1
If Xn is a 12-bit ADC value, the first term rounds to zero in integer arithmetic, so I would be adding zeros. But I can rewrite the formula, with Yn now scaled up by 500 000, as
Yn = Xn + Yn-1 - Yn-1 / 500 000
Is a 32-bit integer unit good enough to do this? I think it can work and the operation can be done at ADC rate. Maybe one should use a right shift instead of the division, using a power of 2 instead of an arbitrary number.

If one wants to input a 12 * 12 product into the filter, one can extend the arithmetic to 64 bits using the same idea. Twice the number of operations per cycle, but it can still run at ADC rate.
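
The right-shift variant could look like the sketch below. It is written in Python for illustration only; on the microcontroller the accumulator would be a 32-bit (or 64-bit) integer. The function name and the choice k = 19 (2**19 = 524288, the power of two closest to 500 000) are assumptions, not values from this thread.

```python
def ema_shift(samples, k):
    """Integer exponential average with a time constant of 2**k samples.

    acc holds the output scaled by 2**k, so the update
    acc += x - (acc >> k) is Yn = Xn + Yn-1 - Yn-1 / 2**k.
    """
    acc = 0
    out = []
    for x in samples:
        acc += x - (acc >> k)
        out.append(acc >> k)   # scale back to input units
    return out


# Short demo with a small k so that it settles quickly.
print(ema_shift([1000] * 500, 4)[-1])

# Headroom check for 12-bit input and k = 19: a loose upper bound on the
# accumulator is 4095 * 2**19 plus one input sample, which is below 2**31.
K = 19
ACC_BOUND = 4095 * 2 ** K + 4095
print(ACC_BOUND < 2 ** 31)
```

With 12-bit unsigned input the scaled accumulator therefore just fits into 32 bits for k = 19; a 24-bit product as input would need the 64-bit version.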

Regards, Dieter
« Last Edit: May 21, 2024, 12:45:39 pm by dietert1 »
 

Online PicuinoTopic starter

  • Frequent Contributor
  • **
  • Posts: 990
  • Country: 00
    • Picuino web
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #117 on: May 21, 2024, 01:14:16 pm »
Within a week I will receive the board with the other micro (STM32G431KB), which is the one I am going to use in the end.
This other model has a maximum speed of 4Msps.

In practice I will set the main clock speed to 170MHz and the ADC clock speed to 1/4, which gives me a conversion speed of 170/4/15 = 2.833Msps.

I am not going to do hardware oversampling because that only serves to increase the number of bits of resolution and I already checked that when taking data, with all the noise produced by the instrumentation amplifier, the results do not improve by increasing the number of bits of the ADC.
I prefer to take many samples per second and filter after multiplying the two signals.
This sampling rate is too high to apply a filter with so many taps (80), which is very slow.

To start, I will try to multiply the two ADC signals and add 8 results to decimate the sampling rate to 354kHz.
I will try to apply the filter at this lower frequency.

I have no idea about the speed of the other processor (STM32G431KB). In principle it has a higher clock speed and also has instructions to accelerate digital filters. Until it arrives (around the 29th) I can't test it.
« Last Edit: May 21, 2024, 01:16:02 pm by Picuino »
 

Online PicuinoTopic starter

  • Frequent Contributor
  • **
  • Posts: 990
  • Country: 00
    • Picuino web
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #118 on: May 21, 2024, 01:26:08 pm »
I'm going to try programming on my current board (STM32L412KB) just to get a rough idea of the timing.
 

Online PicuinoTopic starter

  • Frequent Contributor
  • **
  • Posts: 990
  • Country: 00
    • Picuino web
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #119 on: May 21, 2024, 01:42:53 pm »
Input buffer = 1000 samples
Output buffer = 100 samples

process time = 936us @ 80MHz with STM32L412KB

Program:
Code: [Select]
        // Define variables
        q31_t pSrc[1000];
        q31_t pDst[1000];
        q31_t pCoeffs[] = { 1943208, 464978, 305641, -49046, -638438, -1497324,
                -2650941, -4109235, -5862456, -7878098, -10098245, -12435856,
                -14770057, -16950103, -18809941, -20144286, -20744894,
                -20388173, -18853952, -15933574, -11440900, -5223349, 2827034,
                12763159, 24575302, 38187347, 53451404, 70147061, 87988766,
                106625888, 125658165, 144643853, 163116013, 180598380,
                196623003, 210748070, 222574464, 231761399, 238041130,
                241228017, 241228017, 238041130, 231761399, 222574464,
                210748070, 196623003, 180598380, 163116013, 144643853,
                125658165, 106625888, 87988766, 70147061, 53451404, 38187347,
                24575302, 12763159, 2827034, -5223349, -11440900, -15933574,
                -18853952, -20388173, -20744894, -20144286, -18809941,
                -16950103, -14770057, -12435856, -10098245, -7878098, -5862456,
                -4109235, -2650941, -1497324, -638438, -49046, 305641, 464978,
                1943208, };
        q31_t pState[1000 + 80];
        arm_fir_decimate_instance_q31 S;

        uart_init();
        while (1) {
            // Initialize buffer pSrc
            for (int i = 0; i < 1000; i++)
                pSrc[i] = 0;
            for (int i = 0; i < 500; i++)
                pSrc[i] = 10000000;

            // Process data
            arm_fir_decimate_init_q31(&S, 80, 10, pCoeffs, pState, 1000);
            GPIOA->BSRR = (1 << 3); // Set PA3
            arm_fir_decimate_q31(&S, pSrc, pDst, 1000);
            GPIOA->BRR = (1 << 3);  // Reset PA3

            // Output data
            for (int i = 0; i < 100; i++) {
                printf("%ld\r\n", pDst[i]);
                while(uart_sending());
            }
            HAL_Delay(2000);
        }


Attached: output response to a step signal

Code: [Select]
9048
-140025
-933846
1473627
11115434
19592287
20790631
20040573
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19984258
19975209
20124282
20918103
18510631
8868823
391970
-806373
-56316
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
« Last Edit: May 21, 2024, 01:46:48 pm by Picuino »
 

Online PicuinoTopic starter

  • Frequent Contributor
  • **
  • Posts: 990
  • Country: 00
    • Picuino web
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #120 on: May 21, 2024, 01:49:07 pm »
The other microcontroller may be able to reduce 2.833Msps in real time.
 

Online PicuinoTopic starter

  • Frequent Contributor
  • **
  • Posts: 990
  • Country: 00
    • Picuino web
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #121 on: May 21, 2024, 01:58:16 pm »
Yet another test:

Code: [Select]
pkg load signal
R = 10      % decimation factor
BW = 20     % end of passband (start of transition band), kHz
fs =  2833  % sample rate, kSa/s
ntaps = 70 % number of taps
h = remez(ntaps-1,[0 BW fs/2/R fs/2]/(fs/2),[1 1 0 0]);
% plot frequency response
[H,f] = freqz(h,1,10000,fs);
plot(f,20*log10(abs(H)))
grid on
ylim([-70 0])
% scale h to integers (note: the Q31 scale factor is 2^31; 2^32 as used here doubles the DC gain, see Reply #123)
int32(round(h*2^32))

Program:
Code: [Select]
        // Define variables
        q31_t pSrc[1000];
        q31_t pDst[1000];
        q31_t pCoeffs[] = { -3102388, -3113797, -4515965, -6180048, -8075456,
                -10141397, -12290802, -14416199, -16366203, -17980278,
                -19065273, -19421132, -18834989, -17096251, -14007364, -9390118,
                -3103346, 4955244, 14832503, 26518003, 39934516, 54936181,
                71309499, 88773820, 106991524, 125572769, 144091117, 162094330,
                179121976, 194721956, 208466551, 219971072, 228905848,
                235012496, 238111387, 238111387, 235012496, 228905848,
                219971072, 208466551, 194721956, 179121976, 162094330,
                144091117, 125572769, 106991524, 88773820, 71309499, 54936181,
                39934516, 26518003, 14832503, 4955244, -3103346, -9390118,
                -14007364, -17096251, -18834989, -19421132, -19065273,
                -17980278, -16366203, -14416199, -12290802, -10141397, -8075456,
                -6180048, -4515965, -3113797, -3102388, };
        q31_t pState[1000 + 70];
        arm_fir_decimate_instance_q31 S;

        uart_init();
        while (1) {
            // Initialize buffer pSrc
            for (int i = 0; i < 1000; i++)
                pSrc[i] = 0;
            for (int i = 0; i < 500; i++)
                pSrc[i] = 10000000;

            // Process data
            arm_fir_decimate_init_q31(&S, 70, 10, pCoeffs, pState, 1000);
            GPIOA->BSRR = (1 << 3); // Set PA3
            arm_fir_decimate_q31(&S, pSrc, pDst, 1000);
            GPIOA->BRR = (1 << 3);  // Reset PA3

            // Output data
            for (int i = 0; i < 100; i++) {
                printf("%ld\r\n", pDst[i]);
                while (uart_sending())
                    ;
            }
            HAL_Delay(2000);
        }

Process time: 836us (less)

 

Online PicuinoTopic starter

  • Frequent Contributor
  • **
  • Posts: 990
  • Country: 00
    • Picuino web
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #122 on: May 21, 2024, 02:45:48 pm »
Code: [Select]
#
# Python script to calculate
# FIR coefficients with remez algorithm
#

import numpy as np
from scipy import signal
import matplotlib.pyplot as plt

fs = 2833000   # Sample rate, Hz
cutoff = 20000 # Desired cutoff frequency, Hz
R = 10         # Decimation factor
numtaps = 70   # Size of the FIR filter.

def plot_response(fs, w, h, title):
    plt.figure()
    plt.plot(0.5*fs*w/np.pi, 20*np.log10(np.abs(h)))
    plt.ylim(-100, 5)
    plt.xlim(0, 0.1*fs)
    plt.grid(True)
    plt.xlabel('Frequency (Hz)')
    plt.ylabel('Gain (dB)')
    plt.title(title)
    plt.show()


bands = [0, cutoff, 0.5*fs/R, 0.5*fs]
gains = [1, 0]

taps = signal.remez(numtaps, bands, gains, fs=fs)

q31_taps = [round(t*2**31) for t in taps]
# Two's complement for hex output; zero must stay zero, hence >= 0
q31_taps = [t if t >= 0 else t + 0x100000000 for t in q31_taps]
q31_taps = [f"0x{t:08X}" for t in q31_taps]
print(", ".join(q31_taps))


w, h = signal.freqz(taps, [1], worN=2000)
plot_response(fs, w, h, "Low-pass Filter")

Python equivalent code for calculating FIR coefficients and frequency response.

EDIT: Q31 conversion corrected.
« Last Edit: May 21, 2024, 03:27:42 pm by Picuino »
 

Online gf

  • Super Contributor
  • ***
  • Posts: 1297
  • Country: de
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #123 on: May 21, 2024, 03:15:13 pm »
Code: [Select]
q31_taps =[round(t*2**32) for t in taps]

Sorry, my mistake. I got that wrong too. When converting to Q31, the scaling is 2**31, not 2**32.
Basically, the sum of the taps should be 1, so that the DC gain of the filter becomes 1.
[ But I noticed that remez does not always produce a sum of exactly 1; it can be slightly off. ]
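
One way to handle both points is to normalize the taps before conversion. The sketch below is an illustration with a made-up helper name and made-up tap values, not the remez output from this thread. It divides the taps by their sum, scales by 2**31 and clamps to the Q31 range.

```python
def taps_to_q31(taps):
    """Scale taps so they sum to 1 (DC gain 1), then convert to signed Q31 integers."""
    total = sum(taps)
    q = [round(t / total * 2 ** 31) for t in taps]
    # Clamp to the representable Q31 range [-2**31, 2**31 - 1].
    return [max(-2 ** 31, min(2 ** 31 - 1, v)) for v in q]


taps = [0.1, 0.2, 0.4, 0.2, 0.12]   # hypothetical taps whose sum (1.02) is slightly off
q31 = taps_to_q31(taps)
print(q31)
print("DC gain after conversion: %.9f" % (sum(q31) / 2 ** 31))
```

For reference, the step plateau of 19984258 for an input of 10000000 posted earlier corresponds to a DC gain of about 1.998, which fits both remarks: a factor of 2 from the 2**32 scaling and a tap sum slightly below 1.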
 

Online gf

  • Super Contributor
  • ***
  • Posts: 1297
  • Country: de
Re: Starting with STM32 (NUCLEO-L412KB)
« Reply #124 on: May 21, 2024, 03:41:36 pm »
Input buffer = 1000 samples
Output buffer = 100 samples
process time = 936us @ 80MHz with STM32L412KB

Yet another test:
Process time: 836us (less)

Both are about 9.5 cycles per output sample per tap.
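
The arithmetic behind that figure, as a small sketch using the measured times from this thread (80 MHz clock, 1000 input samples, 100 output samples); the function names are made up for the illustration:

```python
def cycles_per_tap(time_us, clock_mhz, n_outputs, n_taps):
    """CPU cycles spent per output sample per filter tap."""
    return time_us * clock_mhz / (n_outputs * n_taps)


def max_input_rate_msps(time_us, n_inputs):
    """Highest input sample rate (MSa/s) that the measured block time can sustain."""
    return n_inputs / time_us


for taps, t_us in ((80, 936), (70, 836)):
    print("%d taps: %.2f cycles per tap per output, at most %.2f MSa/s input"
          % (taps, cycles_per_tap(t_us, 80, 100, taps), max_input_rate_msps(t_us, 1000)))
```

The second figure leaves no CPU time for anything else, so the usable input rate on this board is lower in practice.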
 

