Author Topic: Starting with STM32 (NUCLEO-L412KB) (Read 9727 times)

gf · « **Reply #75 on:** May 18, 2024, 06:33:07 pm »

Quote from: Picuino on May 18, 2024, 05:52:30 pm

Quote from: gf on May 18, 2024, 05:32:51 pm
For your purpose, look at the CMSIS library.

https://arm-software.github.io/CMSIS_6/latest/General/index.html
I can't find the advantage for what I need.

Why not? It can do FIR filtering, IIR filtering, and various other stuff. It does exactly what you need. And it seems that the code is not just written naively, but tries to benefit from SIMD and DSP instructions, etc. Are you sure that you can do it faster?

Quote from: Picuino on May 18, 2024, 05:57:56 pm

Now I need to implement IIR filter to output filtered results.

What kind of IIR filter do you want to design?
Try the functions butter, cheby1, cheby2 or ellip in Matlab or Octave.

gf · « **Reply #76 on:** May 18, 2024, 06:33:57 pm »

Quote from: dave j on May 18, 2024, 06:30:05 pm

Quote from: Picuino on May 18, 2024, 05:52:30 pm
Quote from: gf on May 18, 2024, 05:32:51 pm
For your purpose, look at the CMSIS library.

https://arm-software.github.io/CMSIS_6/latest/General/index.html
I can't find the advantage for what I need.

Look at the CMSIS DSP libraries.

Yes, that's what I meant. I should have been more specific.

Picuino · « **Reply #77 on:** May 18, 2024, 06:46:25 pm »

I want to implement a simple second order Butterworth filter. But I think I will have to implement it with fixed point operations (with integers) to gain speed.
Mixing integers and floats has not given me good results in the tests I have done. The results were slower than I expected.

I have about 30 CPU cycles per ADC sample (at 2666kHz ADC), so the operations have to be very fast.

Another thing I can do is to group data and apply the filter to aggregate data, not to all samples. But it is not clear to me how to do it. In that case I could use the CMSIS libraries:

https://arm-software.github.io/CMSIS-DSP/latest/group__groupFilters.html

dietert1 · « **Reply #78 on:** May 18, 2024, 08:08:44 pm »

Block size should be small if the filter is inside a control loop. Otherwise one can decide depending on available memory.
And if you want more speed, you could use a higher clock frequency. E.g. STM32G4xx MCUs are very similar except they run up to 170 MHz and power consumption will be three times higher.
I happened to get STM32F103 bluepill modules with fake MCUs and replaced them with STM32L433. They are (almost) pin compatible.

Regards, Dieter

gf · « **Reply #79 on:** May 18, 2024, 08:28:28 pm »

Quote from: Picuino on May 18, 2024, 06:46:25 pm

I want to implement a simple second order Butterworth filter. But I think I will have to implement it with fixed point operations (with integers) to gain speed.

You want either arm_biquad_cascade_df1_q15() or arm_biquad_cascade_df1_q31(), for 16-bit and 32-bit integers respectively.
See https://arm-software.github.io/CMSIS-DSP/latest/group__BiquadCascadeDF1.html
A second order IIR needs just a single biquad stage.

Note that most integer CMSIS DSP functions work with Q numbers. At the end this is just a matter of scaling, but you need to take care.
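As a rough illustration of what "Q numbers" means here, the sketch below converts between floats and Q15/Q31 integers. The helper names `float_to_q` and `q_to_float` are made up for this example and are not CMSIS-DSP functions; it only shows the scaling idea, assuming values in the range [-1, 1).

```python
def float_to_q(value, frac_bits):
    """Convert a float in [-1, 1) to a signed fixed-point integer with
    `frac_bits` fractional bits (15 for Q15, 31 for Q31)."""
    scale = 1 << frac_bits
    raw = int(round(value * scale))
    # Saturate to the representable range instead of wrapping around.
    return max(-scale, min(scale - 1, raw))


def q_to_float(raw, frac_bits):
    """Convert a Q-format integer back to a float."""
    return raw / (1 << frac_bits)


q15_half = float_to_q(0.5, 15)   # 0.5 in Q15
q31_half = float_to_q(0.5, 31)   # 0.5 in Q31
q15_one = float_to_q(1.0, 15)    # +1.0 is not representable, so it saturates
```

The point to take care of is that +1.0 itself does not fit, and that any value outside [-1, 1) (a filter coefficient close to 2, for example) needs extra scaling before it can be stored.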

Quote

I have about 30 CPU cycles per ADC sample (at 2666kHz ADC), so the operations have to be very fast.

I have doubts that 30 cycles per sample are enough for a biquad stage. But I may be wrong. You need to measure. I'd measure all three, q15, q31 and float. Don't forget optimization -O2. You can also try if -O3 is even faster.

Quote

Another thing I can do is to group data and apply the filter to aggregate data, not to all samples.

Decimation with a 1st order boxcar filter is likely the fastest you can do. Just add-up (say) 16 adjacent samples and replace the 16 samples with a single sample containing the sum, and so on. I guess this fits into 5 cycles/sample, so that you get (30-5)*16=400 cycles for filtering each decimated sample. However, a 1st order boxcar filter is not a good anti-aliasing filter at all. This may or may not matter, depending on the frequency of a potential undesired (picked-up) interfering signal. If the frequency happens to be folded by the downsampling into the frequency band of interest, then it matters.
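The add-up-16-samples idea and its weakness as an anti-aliasing filter can be sketched as follows. This is only an illustration in Python, not MCU code; `boxcar_decimate` and `boxcar_gain` are hypothetical names, and the 2.666 MHz rate, the factor of 16 and the 5 cycles/sample guess are taken from the posts above.

```python
import math


def boxcar_decimate(samples, factor):
    """Replace each run of `factor` adjacent samples with their sum.
    An incomplete run at the end is dropped."""
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) for i in range(0, usable, factor)]


def boxcar_gain(freq, fs, factor):
    """Magnitude response of a length-`factor` moving sum at `freq`,
    normalised so that the gain at DC is 1."""
    x = math.pi * freq / fs
    if math.sin(x) == 0.0:
        return 1.0
    return abs(math.sin(factor * x) / (factor * math.sin(x)))


FS = 2.666e6          # ADC sample rate from the thread
FACTOR = 16           # decimation factor used as the example above
fs_out = FS / FACTOR  # rate after decimation

# Cycle budget per decimated sample, using the guessed 5 cycles/sample.
cycles_left = (30 - 5) * FACTOR

# An interferer 5 kHz above the output rate folds down to 5 kHz.
alias_gain = boxcar_gain(fs_out + 5e3, FS, FACTOR)
alias_db = 20.0 * math.log10(alias_gain)
```

`alias_db` shows how little such an interferer is attenuated before it lands in the band of interest, which is the "may or may not matter" case described above.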

What cutoff frequency do you have in mind for the lowpass?

EDIT: The larger the sample rate to cutoff frequency ratio, the higher precision is required for the coefficients and the calculation. Then it can happen that Q15 (16-bit) is not sufficient.
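A quick way to see the precision problem is to look at the b0 coefficient of a 2nd order Butterworth lowpass as the cutoff drops. The sketch below uses the textbook bilinear-transform formula and assumes the coefficients are halved before storage (postShift = 1); both the helper names and that scaling are assumptions for illustration only.

```python
import math


def butter2_b0(fc, fs):
    """b0 of a 2nd order Butterworth lowpass (bilinear transform, prewarped)."""
    k = math.tan(math.pi * fc / fs)
    return k * k / (1.0 + math.sqrt(2.0) * k + k * k)


def quantize(value, frac_bits, post_shift=1):
    """Round `value / 2**post_shift` to an integer with `frac_bits`
    fractional bits (15 for Q15, 31 for Q31)."""
    return int(round(value / (1 << post_shift) * (1 << frac_bits)))


FS = 2.666e6  # full ADC rate from the thread

for fc in (10e3, 100.0, 0.1):
    b0 = butter2_b0(fc, FS)
    print(f"fc={fc:>8} Hz  b0={b0:.3e}  "
          f"Q15={quantize(b0, 15)}  Q31={quantize(b0, 31)}")
```

Once the quantized b0 collapses to 0 (or to a value with only one or two significant bits) the filter no longer has the designed response, which is one more reason to decimate before applying a very low cutoff.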

Picuino · « **Reply #80 on:** May 19, 2024, 10:28:31 am »

For the project I am thinking about now (a Lock-in amplifier) the output frequencies should be in the range from 10kHz (audio output) to 0.1Hz, selectable by software.

EDIT:
Perhaps this is a good choice?
arm_biquad_cascade_df1_fast_q31()
https://arm-software.github.io/CMSIS-DSP/latest/group__BiquadCascadeDF1.html#gaa09ea758c0b24eed9ef92b8d1e5c80c2

I'm going to install CMSIS libraries to test it.

Picuino · « **Reply #81 on:** May 19, 2024, 10:49:03 am »

How are the CMSIS libraries downloaded and installed?
From the GitHub page it is complicated. There is no download button anywhere.
Are they downloaded from the Eclipse environment?

Tation · « **Reply #82 on:** May 19, 2024, 11:42:27 am »

Quote from: Picuino on May 19, 2024, 10:49:03 am

How are the CMSIS libraries downloaded and installed?
From the GitHub page it is complicated. There is no download button anywhere.
Are they downloaded from the Eclipse environment?

https://github.com/ARM-software/CMSIS-DSP/releases/tag/v1.15.0

jnk0le · « **Reply #83 on:** May 19, 2024, 12:51:13 pm »

Quote from: gf on May 18, 2024, 08:28:28 pm

Quote
I have about 30 CPU cycles per ADC sample (at 2666kHz ADC), so the operations have to be very fast.

I have doubts that 30 cycles per sample are enough for a biquad stage. But I may be wrong. You need to measure. I'd measure all three, q15, q31 and float. Don't forget optimization -O2. You can also try if -O3 is even faster.

If we are talking about latency then it is possible, but not with generic libraries. (Transposed direct form II, TDF2, has O(1) latency.)

For such sample rates an STM32G4 would be a better choice. It also has dedicated FIR/IIR hardware.

Quote from: Picuino on May 18, 2024, 06:46:25 pm

Another thing I can do is to group data and apply the filter to aggregate data, not to all samples.

In such a scenario, a low-order IIR (2-5 taps) is possible. It might again require optimized assembly.

Quote from: Picuino on May 19, 2024, 10:28:31 am

For the project I am thinking about now (a Lock-in amplifier) the output frequencies should be in the range from 10kHz (audio output) to 0.1Hz, selectable by software.

That means you don't need to filter that out of a 2.5 MSPS signal, which makes things easier.

Quote from: Picuino on May 19, 2024, 10:28:31 am

EDIT:
Perhaps this is a good choice?
arm_biquad_cascade_df1_fast_q31()
https://arm-software.github.io/CMSIS-DSP/latest/group__BiquadCascadeDF1.html#gaa09ea758c0b24eed9ef92b8d1e5c80c2

Note that the biquad is just a second order IIR (2 taps).

- 1 tap IIR can be implemented by zeroing the 2nd order coeffs
- higher order IIR coefficients (as in the book examples) cannot be applied directly to biquad cascade.

gf · « **Reply #84 on:** May 19, 2024, 01:00:02 pm »

Quote from: jnk0le on May 19, 2024, 12:51:13 pm

Quote
arm_biquad_cascade_df1_fast_q31()
https://arm-software.github.io/CMSIS-DSP/latest/group__BiquadCascadeDF1.html#gaa09ea758c0b24eed9ef92b8d1e5c80c2
note that biquad is just second order IIR (2 taps).

- 1 tap iir can be implemented by zeroing 2nd order coeffs
- higher order IIR coefficients (as in the book examples) cannot be applied directly to biquad cascade.

The arm_biquad_cascade_xxx() functions implement N cascaded biquad stages, where the output of the first stage is connected to the input of the 2nd stage, and so on. N can be 1, for 2nd order.
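The chaining described here can be sketched in floating point as below. This is not the CMSIS-DSP implementation, only the same direct form I structure written out in Python; `biquad_cascade_df1` is a hypothetical name, and the coefficients follow the Matlab/Octave sign convention (denominator 1 + a1·z⁻¹ + a2·z⁻²), which is the opposite sign for a1 and a2 from the one the CMSIS-DSP documentation uses.

```python
import math


def biquad_cascade_df1(samples, stages):
    """Run `samples` through biquad stages in series (direct form I).
    Each stage is (b0, b1, b2, a1, a2) with a0 = 1 implied."""
    signal = list(samples)
    for b0, b1, b2, a1, a2 in stages:
        x1 = x2 = y1 = y2 = 0.0          # per-stage state
        filtered = []
        for x0 in signal:
            y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
            x2, x1 = x1, x0
            y2, y1 = y1, y0
            filtered.append(y0)
        signal = filtered                # output of this stage feeds the next
    return signal


# 2nd order Butterworth lowpass with cutoff at fs/4 (closed-form values).
b0 = 1.0 / (2.0 + math.sqrt(2.0))
stage = (b0, 2.0 * b0, b0, 0.0, (2.0 - math.sqrt(2.0)) / (2.0 + math.sqrt(2.0)))

step = [1.0] * 200
one_stage = biquad_cascade_df1(step, [stage])          # N = 1, 2nd order
two_stages = biquad_cascade_df1(step, [stage, stage])  # N = 2, chaining only
```

Two identical Butterworth stages do not make a 4th order Butterworth filter; the second call is there only to show how the stages are connected.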

jnk0le · « **Reply #85 on:** May 19, 2024, 01:24:50 pm »

Quote from: gf on May 19, 2024, 01:00:02 pm

Quote from: jnk0le on May 19, 2024, 12:51:13 pm
Quote
arm_biquad_cascade_df1_fast_q31()
https://arm-software.github.io/CMSIS-DSP/latest/group__BiquadCascadeDF1.html#gaa09ea758c0b24eed9ef92b8d1e5c80c2
note that biquad is just second order IIR (2 taps).

- 1 tap iir can be implemented by zeroing 2nd order coeffs
- higher order IIR coefficients (as in the book examples) cannot be applied directly to biquad cascade.

The arm_biquad_cascade_xxx() functions implement N cascaded biquad stages, where the output of the first stage is connected to the input of the 2nd stage, and so on. N can be 1, for 2nd order.

which requires converting the coefficients from direct form to cascaded form

Picuino · « **Reply #86 on:** May 19, 2024, 05:59:02 pm »

I have managed to install and use the CMSIS library.
https://arm-software.github.io/CMSIS-DSP/latest/group__BiquadCascadeDF1.html#ga5563b156af44d1be2a7548626988bf4e
https://arm-software.github.io/CMSIS-DSP/latest/group__BiquadCascadeDF1.html#ga4e7dad0ee6949005909fd4fcf1249b79

arm_biquad_cascade_df1_q31():
With one second-order stage, filtering a 500-sample buffer takes 404 µs at 80 MHz on the STM32L412KB.
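For comparison with the cycle budgets mentioned earlier in the thread, the measurement can be converted to cycles per sample. This is only arithmetic on the figures quoted in the thread (404 µs, 500 samples, 80 MHz, the 2.666 MHz and 1.333 MHz rates, and the 170 MHz maximum mentioned for the STM32G4xx); it says nothing about call overhead or how the time was measured.

```python
CPU_HZ = 80e6        # STM32L412KB clock used for the measurement
ELAPSED_S = 404e-6   # measured time for the whole buffer
BLOCK_SIZE = 500     # samples per buffer

# Measured cost of one biquad stage, in CPU cycles per input sample.
cycles_per_sample = ELAPSED_S * CPU_HZ / BLOCK_SIZE


def budget(cpu_hz, sample_rate_hz):
    """CPU cycles available per sample at a given clock and sample rate."""
    return cpu_hz / sample_rate_hz


budget_full_rate = budget(80e6, 2.666e6)    # every ADC sample, 80 MHz
budget_decimated = budget(80e6, 1.333e6)    # decimated by 2, 80 MHz
budget_g4 = budget(170e6, 1.333e6)          # decimated by 2, 170 MHz

print(f"measured: {cycles_per_sample:.1f} cycles/sample")
print(f"budget: {budget_full_rate:.1f} / {budget_decimated:.1f} / {budget_g4:.1f}")
```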

I'm going to wait until I receive the STM32G431KB this week to see how much faster it is thanks to the higher clock speed and the dedicated FIR/IIR hardware.

EDIT:
The CMSIS library seems fast enough to use without problems on samples decimated to 1.333 MHz.

What is not yet clear to me:
1. How to generate the 5 filter coefficients (I am now using random ones).
2. If the ADC data need some kind of treatment to convert them to Q31 format or if they can be used as they are.
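On the second question, one common approach is to remove the mid-scale offset and shift the result up so that full scale fills the Q31 range. The sketch below assumes 12-bit, unsigned, right-aligned ADC readings centred on mid-scale; `adc_to_q31` and the optional `headroom_bits` parameter are hypothetical and only illustrate the scaling.

```python
ADC_BITS = 12  # assumption: 12-bit unsigned, right-aligned readings


def adc_to_q31(sample, headroom_bits=0):
    """Map an unsigned ADC reading to a signed Q31 value.
    `headroom_bits` leaves spare bits at the top to reduce the risk of
    overflow inside the filter, at the cost of resolution."""
    centered = sample - (1 << (ADC_BITS - 1))        # -2048 .. 2047
    return centered << (31 - (ADC_BITS - 1) - headroom_bits)


mid_scale = adc_to_q31(2048)   # zero
bottom = adc_to_q31(0)         # most negative Q31 value
top = adc_to_q31(4095)         # just below +1.0
```

Whether headroom is needed depends on the filter gain and on which variant of the biquad function is used, so it is worth checking against the overflow notes in the CMSIS-DSP documentation.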

gf · « **Reply #87 on:** May 19, 2024, 09:48:27 pm »

Quote from: Picuino on May 19, 2024, 05:59:02 pm

How to generate the 5 filter coefficients (I am now using random ones).

You can calculate the transfer function coefficients with "butter".
For 2nd order, the transfer function coefficients are the biquad coefficients.
[ For higher order you would need to decompose them into biquad coefficients for multiple stages. ]

In the coefficients for arm_biquad_cascade_df1_init_q31(), the first entry of a is omitted (always 1), that's why there are only 5 and not 6.
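For readers without Matlab or Octave at hand, the 2nd order case can be worked out by hand with the bilinear transform. The sketch below is one way to do that and to arrange the result as five Q31 values; it is not the scaling from the original post. The function names are hypothetical, `post_shift=1` (coefficients halved) is an assumption, and the negation of a1 and a2 follows the sign convention described in the CMSIS-DSP documentation, so check both against the docs before relying on them.

```python
import math


def butter2_lowpass(fc, fs):
    """2nd order Butterworth lowpass, Matlab/Octave convention:
    returns b = (b0, b1, b2) and a = (1, a1, a2)."""
    k = math.tan(math.pi * fc / fs)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    a1 = 2.0 * (k * k - 1.0) * norm
    a2 = (1.0 - math.sqrt(2.0) * k + k * k) * norm
    return (b0, 2.0 * b0, b0), (1.0, a1, a2)


def to_cmsis_q31(b, a, post_shift=1):
    """Arrange coefficients as {b0, b1, b2, a1, a2}: a[0] is omitted,
    the feedback terms are negated, and everything is divided by
    2**post_shift so that values up to magnitude 2 fit into Q31."""
    scale = (1 << 31) / (1 << post_shift)
    coeffs = (b[0], b[1], b[2], -a[1], -a[2])
    limit = 1 << 31
    return [max(-limit, min(limit - 1, int(round(c * scale)))) for c in coeffs]


# Example: 10 kHz cutoff at a 1.333 MHz sample rate (values from the thread).
b, a = butter2_lowpass(10e3, 1.333e6)
coeffs_q31 = to_cmsis_q31(b, a)
```

With a cutoff this far below the sample rate, a1 is close to -2, which is why the coefficients cannot be stored in Q31 without first being scaled down.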

I'm a bit unsure regarding the scaling, but I think the scaling below in conjunction with postShift=1 should work.

