For your purpose, look at the CMSIS library.
https://arm-software.github.io/CMSIS_6/latest/General/index.html
I can't find the advantage for what I need.
Now I need to implement IIR filter to output filtered results.
Look at the CMSIS DSP libraries.
I want to implement a simple second order Butterworth filter. But I think I will have to implement it with fixed point operations (with integers) to gain speed.
I have about 30 CPU cycles per ADC sample (at 2666kHz ADC), so the operations have to be very fast.
Another thing I can do is to group data and apply the filter to aggregate data, not to all samples.
How are the CMSIS libraries downloaded and installed?
Getting them from the GitHub page is complicated; there is no download button anywhere.
Are they downloaded from the Eclipse environment?
I doubt that 30 cycles per sample are enough for a biquad stage, but I may be wrong. You need to measure. I'd measure all three: q15, q31 and float. Don't forget to enable optimization with -O2. You can also check whether -O3 is even faster.
For the project I am thinking about now (a Lock-in amplifier) the output frequencies should be in the range from 10kHz (audio output) to 0.1Hz, selectable by software.
EDIT:
Perhaps this is a good choice?
arm_biquad_cascade_df1_fast_q31()
https://arm-software.github.io/CMSIS-DSP/latest/group__BiquadCascadeDF1.html#gaa09ea758c0b24eed9ef92b8d1e5c80c2
Note that a biquad is just a second-order IIR filter (2 taps).
- A first-order (1-tap) IIR filter can be implemented by zeroing the second-order coefficients.
- Higher-order IIR coefficients (as in the book examples) cannot be applied directly to a biquad cascade; they first have to be factored into second-order sections.
The arm_biquad_cascade_xxx() functions implement N cascaded biquad stages, where the output of the first stage is connected to the input of the 2nd stage, and so on. N can be 1, for 2nd order.
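To make the cascade structure concrete, here is a minimal floating-point sketch in Python. It is not the CMSIS implementation; the name biquad_cascade is made up for illustration. It assumes the CMSIS coefficient order {b0, b1, b2, a1, a2} per stage with the feedback terms already negated, i.e. y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] + a1 y[n-1] + a2 y[n-2].

```python
def biquad_cascade(stages, samples):
    """Run samples through cascaded direct-form-I biquads.

    stages: list of (b0, b1, b2, a1, a2) tuples, feedback terms already
    negated so that y = b0*x + b1*x1 + b2*x2 + a1*y1 + a2*y2.
    """
    out = list(samples)
    for b0, b1, b2, a1, a2 in stages:
        x1 = x2 = y1 = y2 = 0.0  # per-stage state, starts at rest
        stage_out = []
        for x in out:
            y = b0 * x + b1 * x1 + b2 * x2 + a1 * y1 + a2 * y2
            x2, x1 = x1, x
            y2, y1 = y1, y
            stage_out.append(y)
        out = stage_out  # output of this stage feeds the next one
    return out


# First-order low-pass y[n] = 0.5*x[n] + 0.5*y[n-1], written as a biquad
# by zeroing b1 and the second-order coefficients b2 and a2.
first_order = [(0.5, 0.0, 0.0, 0.5, 0.0)]
impulse = [1.0, 0.0, 0.0, 0.0]
print(biquad_cascade(first_order, impulse))  # [0.5, 0.25, 0.125, 0.0625]
```

Passing two or more tuples in stages gives a higher-order filter, provided the coefficients were designed as second-order sections in the first place.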
How do I generate the 5 filter coefficients? (I am currently using random ones.)
pkg load signal
fc = 100000 % cutoff frequency
fs = 1333000 % sample rate
[b,a] = butter(2,fc/(fs/2))
coeff = floor([ b -a(2:3) ] / 2 * 2**31 + 0.5)
printf("%d, %d, %d, %d, %d\n", coeff)
% ==> 44314902, 88629804, 44314902, 1448274717, -551792502
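For readers without Octave, the same coefficients can also be derived in closed form with the bilinear transform. The sketch below uses only the Python standard library; the helper names are hypothetical, and it assumes the same 1/2 scaling as the Octave line (to be paired with postShift = 1). It should reproduce the values above, possibly differing in the last digit because of floating-point rounding.

```python
import math


def butter2_lowpass(fc, fs):
    """2nd-order Butterworth low-pass via the bilinear transform.

    Returns (b, a) with a[0] == 1, the same convention as butter().
    """
    k = math.tan(math.pi * fc / fs)  # pre-warped cutoff
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b = [b0, 2.0 * b0, b0]
    a = [1.0,
         2.0 * (k * k - 1.0) * norm,
         (1.0 - math.sqrt(2.0) * k + k * k) * norm]
    return b, a


def to_q31_halved(b, a):
    """Order as {b0, b1, b2, -a1, -a2}, halve, and round to Q31."""
    values = b + [-a[1], -a[2]]
    return [math.floor(v / 2 * 2**31 + 0.5) for v in values]


b, a = butter2_lowpass(100000, 1333000)
print(to_q31_halved(b, a))
```

The halving is needed because -a1 is about 1.35, which does not fit the Q31 range of [-1, 1).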
Does the ADC data need some kind of treatment to convert it to Q31 format, or can it be used as it is?
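// 12-bit range: center on zero, then scale to Q31
uint16_t sample; // 0...4095
q31_t qsample = ((q31_t) sample - 2048) << 20;
// Wider range: same idea with a smaller shift
uint16_t sample; // 0...16380
q31_t qsample = ((q31_t) sample - 8192) << 18;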
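The two conversions above can be checked off-target. This Python sketch (function name hypothetical) mirrors the C expression: subtract the mid-scale value to get a signed number, then shift left so that full scale lands on the Q31 range. It assumes unsigned, offset-binary ADC codes.

```python
Q31_MIN = -2**31
Q31_MAX = 2**31 - 1


def adc_to_q31(sample, midscale, shift):
    """Center an unsigned ADC code on zero and scale it to Q31."""
    value = (sample - midscale) * (1 << shift)
    if not Q31_MIN <= value <= Q31_MAX:
        raise OverflowError("result does not fit in 32 bits")
    return value


# 12-bit range: codes 0...4095, mid-scale 2048, shift by 20
print(adc_to_q31(0, 2048, 20))       # -2147483648
print(adc_to_q31(4095, 2048, 20))    # 2146435072
# wider range 0...16380, mid-scale 8192, shift by 18
print(adc_to_q31(16380, 8192, 18))   # 2146435072
```

Both variants use almost the full 32-bit range, which leaves no headroom; whether that matters depends on the filter gain.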
...
I'm a bit unsure regarding the scaling, but I think the scaling below in conjunction with postShift=1 should work.
arm_biquad_cascade_df1_q31():
With one second-order stage, filtering a 500-sample buffer takes 404 us at 80 MHz on the STM32L412KB.
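For comparison with the cycle budget mentioned earlier, the timing converts to cycles per sample as follows. This is plain arithmetic on the figures quoted in the thread; the helper name is made up.

```python
def cycles_per_sample(elapsed_s, cpu_hz, n_samples):
    """CPU cycles spent per sample for a measured block time."""
    return elapsed_s * cpu_hz / n_samples


budget = 80e6 / 2666e3  # CPU clock / ADC sample rate
measured = cycles_per_sample(404e-6, 80e6, 500)
print(f"budget {budget:.1f}, measured {measured:.1f} cycles per sample")
```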
#include "arm_math.h"

// Define variables
q31_t pSrc[100];
q31_t pDst[100];
arm_biquad_casd_df1_inst_q31 S;
// Coefficient order: {b0, b1, b2, a1, a2}
q31_t pCoeffs[] = { 0x02A43116, 0x0548622C, 0x02A43116, 0xA9AD14E3,
                    0x20E3AF76 };
q31_t pState[4]; // 4 state variables per biquad stage

uart_init();
while (1) {
    // Initialize buffer pSrc with an impulse
    for (int i = 0; i < 100; i++)
        pSrc[i] = 0;
    pSrc[0] = 0x40000000;
    // Process data: 1 stage, postShift = 0
    arm_biquad_cascade_df1_init_q31(&S, 1, pCoeffs, pState, 0);
    GPIOA->BSRR = (1 << 3); // Set PA3
    arm_biquad_cascade_df1_q31(&S, pSrc, pDst, 100);
    GPIOA->BRR = (1 << 3); // Reset PA3
    // Output data
    for (int i = 0; i < 100; i++) {
        printf("0x%08lX\r\n", (unsigned long) pDst[i]);
        HAL_Delay(2);
    }
    HAL_Delay(2000);
}
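The output of this program can be cross-checked on a PC with a small integer model. The Python sketch below is an approximation, not the CMSIS source: it assumes the five products are summed in a wide accumulator and the result is truncated when shifted back to 1.31 format, and it models neither saturation nor any rounding the library may apply. Function names are hypothetical.

```python
def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def biquad_df1_q31(coeffs, samples, post_shift=0):
    """Model of one Q31 direct-form-I biquad with a wide accumulator.

    coeffs: 32-bit words {b0, b1, b2, a1, a2}; the feedback terms are added.
    """
    b0, b1, b2, a1, a2 = (to_signed32(c) for c in coeffs)
    x1 = x2 = y1 = y2 = 0
    out = []
    for x in samples:
        acc = b0 * x + b1 * x1 + b2 * x2 + a1 * y1 + a2 * y2
        # Back to 1.31: drop 31 fractional bits, less the post shift
        y = to_signed32(acc >> (31 - post_shift))
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


coeffs = [0x02A43116, 0x0548622C, 0x02A43116, 0xA9AD14E3, 0x20E3AF76]
impulse = [0x40000000] + [0] * 99
response = biquad_df1_q31(coeffs, impulse, post_shift=0)
print(["0x%08X" % (v & 0xFFFFFFFF) for v in response[:2]])
```

With the coefficient words from the C code and postShift = 0, the first two outputs of the model are 0x0152188B and 0x01C02D93.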
0x0152188B
0x01C02D93
0x007AB721
0x00206613
0x0009AE7F
0x0001CBA3
0x000146DD
0xFFFF99AA
0x00009900
0xFFFF7E85
0x00007EA2
0xFFFF8954
0x00007092
0xFFFF9596
0x000064B0
0xFFFFA0C0
0x00005A1B
0xFFFFAAC2
0x000050A3
0xFFFFB3B7
0x0000482A
0xFFFFBBBB
0x00004095
0xFFFFC2E7
0x000039CC
0xFFFFC952
0x000033BA
0xFFFFCF10
0x00002E4B
0xFFFFD434
0x0000296E
0xFFFFD8CE
0x00002514
0xFFFFDCEC
0x0000212F
0xFFFFE09B
0x00001DB2
0xFFFFE3E8
0x00001A93
0xFFFFE6DC
0x000017C8
0xFFFFE980
0x00001548
0xFFFFEBDD
0x0000130C
0xFFFFEDFB
0x0000110B
0xFFFFEFE0
0x00000F41
0xFFFFF191
0x00000DA7
0xFFFFF315
0x00000C38
0xFFFFF470
0x00000AEF
0xFFFFF5A7
0x000009C9
0xFFFFF6BD
0x000008C2
0xFFFFF7B6
0x000007D7
0xFFFFF895
0x00000704
0xFFFFF95C
0x00000647
0xFFFFFA0F
0x0000059E
0xFFFFFAAF
0x00000507
0xFFFFFB3E
0x00000480
0xFFFFFBBE
0x00000407
0xFFFFFC30
0x0000039B
0xFFFFFC96
0x0000033A
0xFFFFFCF2
0x000002E3
0xFFFFFD44
0x00000295
0xFFFFFD8E
0x00000250
0xFFFFFDCF
0x00000212
0xFFFFFE0A
0x000001DA
0xFFFFFE3F
0x000001A8
0xFFFFFE6E
0x0000017C
0xFFFFFE98
0x00000154
0xFFFFFEBE
0x00000130
0xFFFFFEE0
0x00000110
0xFFFFFEFE
0x000000F3
0xFFFFFF19
22157451
29371795
8042273
2123283
634495
117667
83677
-26198
39168
-33147
32418
-30380
28818
-27242
25776
-24384
23067
-21822
20643
-19529
18474
-17477
16533
-15641
14796
-13998
13242
-12528
11851
-11212
10606
-10034
9492
-8980
8495
-8037
7602
-7192
6803
-6436
6088
-5760
5448
-5155
4876
-4613
4363
-4128
3905
-3695
3495
-3307
3128
-2960
2799
-2649
2505
-2371
2242
-2122
2007
-1899
1796
-1700
1607
-1521
1438
-1361
1287
-1218
1152
-1090
1031
-976
923
-874
826
-782
739
-700
661
-626
592
-561
530
-502
474
-449
424
-402
380
-360
340
-322
304
-288
272
-258
243
-231
// Initialize buffer pSrc
for (int i = 0; i < 100; i++)
    pSrc[i] = 0;
for (int i = 20; i < 60; i++)
    pSrc[i] = 0x10000;
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1352
3145
3635
3766
3803
3812
3815
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
3816
2464
671
180
51
11
5
-1
1
-1
0
-1
0
-1
0
-1
0
-1
0
-1
0
-1
0
-1
0
-1
0
-1
0
-1
0
-1
0
-1
0
-1
0
-1
0
-1
0
Optimizing with -O3, filtering 100 samples takes 42us (34 clock cycles per sample @ 80MHz)
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1352
4968
8412
9806
9861
9540
9309
9236
9246
9271
9286
9289
9288
9286
9285
9285
9285
9285
9285
9285
9285
9285
9285
9285
9285
9285
9285
9285
9285
9285
9285
9285
9285
9285
9285
9285
9285
9285
9285
9285
7933
4316
872
-521
-576
-255
-24
49
39
13
-2
-5
-3
-1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
22157451
59258008
56428053
22829136
896995
-5260972
-3778510
-1196450
163989
418020
239778
54298
-24992
-30807
-14355
-1766
2497
2137
799
-11
-213
-141
-41
8
15
8
1
-2
-2
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
-1
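The two step responses above settle at 3816 and 9285 for a step of 0x10000 (65536), whereas a Butterworth low-pass should settle at the input value. One possible explanation, offered as a hypothesis rather than a diagnosis: the first run used feedback coefficients with flipped signs, and both runs used halved coefficients with postShift = 0 instead of 1. The sketch below computes the steady-state gain under those assumptions, ignoring truncation; dc_gain is a made-up helper.

```python
def dc_gain(coeffs, post_shift):
    """Steady-state gain of one Q31 biquad, ignoring rounding.

    coeffs: signed integers {b0, b1, b2, a1, a2}, feedback terms added.
    """
    b0, b1, b2, a1, a2 = coeffs
    scale = 1 << post_shift
    return scale * (b0 + b1 + b2) / (2**31 - scale * (a1 + a2))


step = 0x10000
octave = [44314902, 88629804, 44314902, 1448274717, -551792502]
# Feedback signs as in the words 0xA9AD14E3, 0x20E3AF76
flipped = octave[:3] + [-octave[3], -octave[4]]

print(round(step * dc_gain(flipped, 0)))  # 3816
print(round(step * dc_gain(octave, 0)))   # 9286
print(round(step * dc_gain(octave, 1)))   # 65536
```

The predicted levels are close to the measured ones, and under these assumptions postShift = 1 brings the gain back to unity.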
from scipy import signal

fc = 100000   # Cutoff frequency
fs = 1333000  # Sample rate
b, a = signal.butter(2, fc, btype='low', analog=False, output='ba', fs=fs)
# Order as {b0, b1, b2, -a1, -a2}
coeff = list(b) + list(-a)[1:]
# Scale by 2**30, i.e. Q31 with the coefficients halved
coeff = [round(c * (2 ** 30)) for c in coeff]
for c in coeff:
    if c < 0:
        c += 0x100000000  # two's complement for printing
    print(" 0x%08X," % c)
0x02A43116,
0x0548622C,
0x02A43116,
0x5652EB1D,
0xDF1C508A,
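The hex words can be related back to the signed decimal values from the Octave script with a small round-trip helper. This sketch uses masking instead of adding 0x100000000, which is equivalent for 32-bit values; both function names are made up.

```python
def to_hex32(value):
    """Format a signed integer as a 32-bit two's-complement hex word."""
    return "0x%08X" % (value & 0xFFFFFFFF)


def from_hex32(word):
    """Inverse of to_hex32: recover the signed value."""
    raw = int(word, 16)
    return raw - (1 << 32) if raw >= (1 << 31) else raw


print(to_hex32(-551792502))      # 0xDF1C508A
print(from_hex32("0x5652EB1D"))  # 1448274717
```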
Impulse response: ...