Author Topic: Starting with STM32 (NUCLEO-L412KB) (Read 15937 times)

gf · « **Reply #175 on:** May 29, 2024, 07:51:13 pm »

Quote from: dietert1 on May 29, 2024, 05:07:22 pm

You will get better results if you mix the ADC data at 2.5 MHz and then filter and downsample.
Above i posted the code "lockin.c". If you define BuffSize as 8 and ExpAverScale as 3, the filter state variables llOut1 and llOut can be int32_t, as 12 + 12 + 6 = 30 bit. The mixer+filter will need about 15 cycles per combined ADC1/ADC2 sample when compiled with optimizer.

With 32-bit accumulator it is indeed not very expensive.

Quote

The IIR low pass filter is superior to a boxcar filter.

Yes, your 2nd order IIR is better than a 1st order boxcar. However, a 2nd order boxcar has even more attenuation for the frequency ranges that matter for the first decimation stage (fs/8 +- 20kHz, fs/4 +- 20kHz, 3*fs/8 +- 20kHz, fs/2 - 20kHz ... fs/2), and implemented as a CIC decimator, I think it needs even a few cycles less than your 2nd order IIR.

EDIT: https://godbolt.org/z/3vs554f49

dietert1 · « **Reply #176 on:** May 30, 2024, 05:21:05 am »

When i think about those integrators in the CIC decimator: What is a good method to prevent them from accumulation of residual DC input and overflow? Some feedback scheme?

gf · « **Reply #177 on:** May 30, 2024, 07:03:23 am »

Quote from: dietert1 on May 30, 2024, 05:21:05 am

When i think about those integrators in the CIC decimator: What is a good method to prevent them from accumulation of residual DC input and overflow? Some feedback scheme?

There is no need to prevent the integrators from overflowing; you deliberately let them wrap around. They only need enough bits to represent the largest possible output value.
See https://www.dsprelated.com/showarticle/1337.php for more details.

In the end, the CIC decimator in my example is exactly equivalent to a FIR filter with a 15-tap triangular kernel (which is again equivalent to two cascaded 8-tap boxcar filters), followed by 8x downsampling. If you feed DC in, you get DC out. It's a lowpass - there is no DC accumulation at the output. DC gain is also 64 [ like the gain of two cascaded IIR filters with transfer function 1/(1-0.875*z^-1) ].
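As a rough illustration of the wrap-around argument (a sketch only, not the code behind the godbolt link above), here is a 2nd order CIC decimator by 8 with 32-bit integrators. The names `cic2_t` and `cic2_decimate` are made up for this example. Unsigned types are used for the state because wrap-around is well defined for them in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CIC_R 8u   /* decimation ratio */

typedef struct {
    uint32_t i1, i2;   /* integrators, allowed to wrap modulo 2^32 */
    uint32_t c1, c2;   /* comb delay elements (running at fs/8) */
} cic2_t;

/* Processes n input samples (n must be a multiple of 8) and writes n/8
 * output samples. Returns the number of outputs written. */
size_t cic2_decimate(cic2_t *s, const int32_t *in, size_t n, int32_t *out)
{
    assert(n % CIC_R == 0);
    size_t m = 0;
    for (size_t k = 0; k < n; k++) {
        s->i1 += (uint32_t)in[k];          /* integrator 1 */
        s->i2 += s->i1;                    /* integrator 2 */
        if ((k + 1) % CIC_R == 0) {        /* every 8th sample: comb stages */
            uint32_t d1 = s->i2 - s->c1;
            s->c1 = s->i2;
            uint32_t d2 = d1 - s->c2;
            s->c2 = d1;
            /* Conversion of values above INT32_MAX is implementation-defined
             * in C; GCC and Clang wrap, which is what is wanted here. */
            out[m++] = (int32_t)d2;
        }
    }
    return m;
}
```

Feeding a constant 4000000 for 64 samples pushes the second integrator past 2^32, yet every output after the first (which still sees the start-up transient) is 64 * 4000000, i.e. the DC gain of 64.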

It is rather an (undesired) property of fixed point IIR filters that they don't decay to zero when the input goes to zero, but retain a small residual DC value at the output.
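A minimal sketch of that effect, assuming a 1st order IIR with the pole at 0.875 implemented with a truncating shift (the helper name `iir1_step` is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* One step of state = 0.875*state + x, with the state kept at a gain of 8.
 * Illustration only. */
int32_t iir1_step(int32_t *state, int32_t x)
{
    assert(state != 0);
    /* For 0 <= state < 8 the term (state >> 3) is 0, so with x == 0 the
     * state stops decaying before it reaches zero. */
    *state += x - (*state >> 3);
    return *state;
}
```

Starting from a positive state and feeding zeros, the state settles at 7 instead of 0. Rounding instead of truncating changes the size of the residual but does not remove the dead band.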

Picuino · « **Reply #178 on:** May 30, 2024, 08:01:50 am »

I had not seen the program, thanks I will try it.

I am finding that the filters take up a large amount of time, during which the microcontroller is blocked. Later on I would like to handle commands over UART and send information. I think it would be desirable to split the filter into several stages and return control to the micro every so often so it can attend to other tasks.
I would like to use a cooperative RTOS for this, with its yield() command for context switching.
Would FreeRTOS be ok or is there a reliable alternative?

Tation · « **Reply #179 on:** May 30, 2024, 08:47:18 am »

Wouldn't it be easier to use a preemptive RTOS to automatically switch out from the filter code? FreeRTOS is perfectly able to do it (or work in cooperative mode, if you want).
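For reference, the scheduling mode is selected in `FreeRTOSConfig.h`. A minimal fragment (the rest of the configuration is project specific and omitted here):

```c
/* FreeRTOSConfig.h (fragment) */
#define configUSE_PREEMPTION    1   /* 1 = preemptive scheduler, 0 = cooperative */
#define configUSE_TIME_SLICING  1   /* share time between ready tasks of equal priority */
```

In cooperative mode a task keeps the CPU until it blocks or calls `taskYIELD()`.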

Picuino · « **Reply #180 on:** May 30, 2024, 10:57:59 am »

How is it easier to control the time of each routine? It seemed to me that cooperative mode would make that easier, but maybe I'm wrong.

One of the things I want to do in idle time is to calculate the sine function to generate the output reference signal, to change the output frequency. It is a time-consuming operation and can not block the microcontroller, because the input filtering is a priority.

gf · « **Reply #181 on:** May 30, 2024, 11:03:12 am »

Quote from: Picuino on May 30, 2024, 10:57:59 am

One of the things I want to do in idle time is to calculate the sine function to generate the output reference signal, to change the output frequency. It is a time-consuming operation and can not block the microcontroller, because the input filtering is a priority.

What exactly do you have in mind? Wouldn't you pre-calculate the table before you start measurements and run the whole ADC/DAC/filter stuff?

EDIT: Or do you want arbitrary frequency (which is not necessarily an integer fraction of the sample rate)? Then you need a DDS.
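A minimal DDS sketch, assuming a 32-bit phase accumulator and a 2.5 MHz update rate. The 16-entry table and the names `sine_lut`, `dds_tuning_word` and `dds_next` are only for illustration; a real design would use a much longer table or interpolation.

```c
#include <assert.h>
#include <stdint.h>

/* One sine period, amplitude 1000 (illustrative resolution only). */
static const int16_t sine_lut[16] = {
       0,  383,  707,  924,  1000,  924,  707,  383,
       0, -383, -707, -924, -1000, -924, -707, -383
};

/* Tuning word = f_out / f_s * 2^32, rounded down. */
uint32_t dds_tuning_word(uint32_t f_out_hz, uint32_t f_s_hz)
{
    assert(f_s_hz != 0 && f_out_hz < f_s_hz / 2);
    return (uint32_t)(((uint64_t)f_out_hz << 32) / f_s_hz);
}

/* Returns the sample for the current phase, then advances the phase. */
int16_t dds_next(uint32_t *phase, uint32_t tuning_word)
{
    int16_t sample = sine_lut[*phase >> 28];  /* top 4 bits index the table */
    *phase += tuning_word;                    /* wraps modulo 2^32 by design */
    return sample;
}
```

With a 32-bit accumulator the frequency resolution is f_s / 2^32, roughly 0.0006 Hz at 2.5 MHz, and changing frequency only means changing the tuning word.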

Picuino · « **Reply #182 on:** May 30, 2024, 11:05:50 am »

I am thinking of being able to generate several output frequencies and having them all pre-calculated would take up a lot of memory.

gf · « **Reply #183 on:** May 30, 2024, 11:11:44 am »

Quote from: Picuino on May 30, 2024, 11:05:50 am

I am thinking of being able to generate several output frequencies and having them all pre-calculated would take up a lot of memory.

But you do not need to measure while you change the frequency, do you?

Picuino · « **Reply #184 on:** May 30, 2024, 11:15:56 am »

Yes, I don't need to measure while changing the frequency.
I don't know what I was thinking.

I want to learn how to use RTOS anyway. It can come in handy and it's a utility I haven't used before on other smaller micros.

Tation · « **Reply #185 on:** May 30, 2024, 11:40:30 am »

Quote from: Picuino on May 30, 2024, 10:57:59 am

How is it easier to control the time of each routine? It seemed to me that cooperative mode would make that easier, but maybe I'm wrong.

You do not control that. Each task runs until either a) it is done (for now) and yields, or b) the RTOS decides to switch to another task; in that case it is interrupted and later resumed at the same instruction.

If you need to synchronize tasks (e.g. apply the filter only when some buffer is valid), the RTOS provides mechanisms for tasks to send signals to other tasks.

Quote from: Picuino on May 30, 2024, 10:57:59 am

One of the things I want to do in idle time is to calculate the sine function to generate the output reference signal, to change the output frequency. It is a time-consuming operation and can not block the microcontroller, because the input filtering is a priority.

Look for the state-variable approach to generate samples of a sine. No need for tables, and each sample requires, if my memory serves me well, just two additions and two multiplications.
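One common form of such an oscillator (sometimes called the "magic circle") can be sketched as follows. This is an assumption about what is meant, not a transcription of any particular reference; the names `svo_t`, `svo_init` and `svo_next` are made up.

```c
#include <assert.h>

typedef struct {
    double s;   /* sine-like state (the output) */
    double c;   /* cosine-like state */
    double k;   /* tuning coefficient, k = 2*sin(pi*f/fs) */
} svo_t;

/* k is computed once per frequency change (with sin() from <math.h>, a
 * table, or a CORDIC unit); it is passed in here so the sketch has no
 * library dependencies. c0 sets the amplitude: with c0 = cos(pi*f/fs)
 * the output is sin(2*pi*f*n/fs). */
void svo_init(svo_t *o, double k, double c0)
{
    assert(k > 0.0 && k < 2.0);
    o->s = 0.0;
    o->c = c0;
    o->k = k;
}

/* Two multiplications and two additions per sample. */
double svo_next(svo_t *o)
{
    double out = o->s;
    o->s += o->k * o->c;
    o->c -= o->k * o->s;   /* uses the already updated s, which keeps the
                              amplitude from growing or decaying */
    return out;
}
```

For example, k = 1 corresponds to f = fs/6, and with c0 = 1 the output repeats 0, 1, 1, 0, -1, -1 exactly. In fixed point the same structure works, but rounding errors should be checked for the intended run time.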

Tation · « **Reply #186 on:** May 30, 2024, 11:54:40 am »

For the "state variable" oscillator: https://www.njohnson.co.uk/pdf/drdes/Chap7.pdf

gf · « **Reply #187 on:** May 30, 2024, 12:51:05 pm »

Quote from: Tation on May 30, 2024, 11:54:40 am

For the "state variable" oscillator: https://www.njohnson.co.uk/pdf/drdes/Chap7.pdf

If I understand correctly, the STM32G4 has a CORDIC accelerator unit.

Picuino · « **Reply #188 on:** May 30, 2024, 01:52:10 pm »

Quote from: Tation on May 30, 2024, 11:54:40 am

For the "state variable" oscillator: https://www.njohnson.co.uk/pdf/drdes/Chap7.pdf

Great!!!
I had no idea you could make such an accurate digital oscillator with so few calculations. It can be implemented in real time for the lower frequencies.

Quote from: gf on May 30, 2024, 12:51:05 pm

If I understand correctly, the STM32G4 has a CORDIC accelerator unit.

Yes, but I don't know how to use it yet.

Picuino · « **Reply #189 on:** May 30, 2024, 04:33:21 pm »

Quote from: gf on May 29, 2024, 07:51:13 pm

Yes, your 2nd order IIR is better than a 1st order boxcar. However, a 2nd order boxcar has even more attenuation for the frequency ranges that matter for the first decimation stage (fs/8 +- 20kHz, fs/4 +- 20kHz, 3*fs/8 +- 20kHz, fs/2 - 20kHz ... fs/2), and implemented as a CIC decimator, I think it needs even a few cycles less than your 2nd order IIR.

EDIT: https://godbolt.org/z/3vs554f49

The problem with this code is that I don't actually know the DC voltage level of the signals. This level has to be calculated in real time with the average of the sampled values, applying a good low pass filter to each of the two signals (ADC1 and ADC2).

What I had started to program was a boxcar filter that multiplies the two values with DC voltage and then subtracts the DC voltage value.

Code:

for (i=0; i<N; i++) {
    acc += ADC1 * ADC2;
    mean_ADC1 += ADC1;
    mean_ADC2 += ADC2;
}

out = acc / N - (mean_ADC1 / N) * (mean_ADC2 / N);  // mean of products minus product of means

Picuino · « **Reply #190 on:** May 30, 2024, 04:38:55 pm »

It is assumed to work for a large number of samples because the variable terms cancel:

Sum( [VAC1 + VDC1] * [VAC2 + VDC2] ) =
Sum( VAC1 * VAC2 + VAC2 * VDC1 + VAC1 * VDC2 + VDC1 * VDC2 ) =
Sum( VAC1 * VAC2 + VDC1 * VDC2 )

Because: Sum( VAC2 * VDC1 + VAC1 * VDC2 ) tends to cancel out and averages to zero.
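A sketch of the same calculation with integer accumulators, assuming 12-bit samples already collected in two buffers (the function name `covariance_scaled` is hypothetical). Dividing each sum by N before multiplying throws away fractional bits, so this version keeps everything as sums and leaves a single division by N*N to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns n*n times the covariance of x1 and x2, i.e.
 *   n * sum(x1*x2) - sum(x1) * sum(x2)
 * The DC levels of both inputs drop out of this expression. */
int64_t covariance_scaled(const uint16_t *x1, const uint16_t *x2, size_t n)
{
    /* Limit chosen so that all terms fit in int64_t for 12-bit samples. */
    assert(n > 0 && n <= ((size_t)1 << 18));
    int64_t sum12 = 0, sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < n; i++) {
        sum12 += (int64_t)x1[i] * x2[i];
        sum1 += x1[i];
        sum2 += x2[i];
    }
    return (int64_t)n * sum12 - sum1 * sum2;
}
```

For two in-phase square waves of amplitude 10 and 5 on DC levels of 2048 and 2000, four samples give 800 = 16 * 50, i.e. a covariance of 50 regardless of the DC levels.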

EDIT:
It would be better to have a real-time estimate of the average of the two signals, but that means averaging with low-pass filtering the two signals separately in real time.

dietert1 · « **Reply #191 on:** May 30, 2024, 05:35:28 pm »

I thought that one could build the input circuitry with a high pass and assign the DC level about in the middle of the ADC input range. One could measure and then subtract the residual VDC1*VDC2 offset by turning off modulation for short periods, let's say once per second or once every ten seconds.

Regards, Dieter

gf · « **Reply #192 on:** May 31, 2024, 09:28:14 am »

Quote from: Picuino on May 30, 2024, 04:38:55 pm

It is assumed to work for a large number of samples because the variable terms cancel:

Sum( [VAC1 + VDC1] * [VAC2 + VDC2] ) =
Sum( VAC1 * VAC2 + VAC2 * VDC1 + VAC1 * VDC2 + VDC1 * VDC2 ) =
Sum( VAC1 * VAC2 + VDC1 * VDC2 )

Because: Sum( VAC2 * VDC1 + VAC1 * VDC2 ) tends to cancel out and averages to zero.

Let's look at the mixing product of two sine wave signals (same frequency and phase) with DC offsets:

(DC1+A1*sin(x)) * (DC2+A2*sin(x)) = 0.5*A1*A2 + DC1*DC2 + (A2*DC1+A1*DC2)*sin(x) - 0.5*A1*A2*cos(2*x)

We get a DC component (0.5*A1*A2 + DC1*DC2), a sine wave at the carrier frequency with amplitude A2*DC1+A1*DC2, and a cosine wave at 2x the carrier frequency with amplitude 0.5*A1*A2.

The DC value we want to measure is 0.5*A1*A2, but it is biased by DC1*DC2, i.e. we measure 0.5*A1*A2 + DC1*DC2 instead of the true value. The sin(x) and cos(2*x) terms are AC components, and they must be filtered out by the lowpass filter after the mixer. This leads to the following objectives:

1) We want to get rid of the bias DC1*DC2

2) The lowpass filter must be able to remove the AC components (or at least attenuate them sufficiently).
Here I'd like to distinguish two cases:

2a) If the lowpass filter has zeros at the carrier frequency and 2x carrier frequency, then it can eliminate them completely

2b) If 2a is not guaranteed, then the removal of the AC components relies on the filter's stop band attenuation (which is limited)

Why do I distinguish 2a and 2b? Because it makes a big difference whether you allow arbitrary carrier frequencies or not. Arbitrary carrier frequencies imply case 2b, which means that you require a computationally expensive filter with high selectivity and high stop band attenuation. OTOH, for case 2a, even a trivial 1st order boxcar filter can eliminate the carrier and all its harmonics completely, provided that the length of the boxcar is an exact integer multiple of the carrier period.

[ Let's reconsider your AVR milliohmmeter. Why did it work pretty well although it was much simpler? Because you had a frequency plan. The carrier frequency was exactly 1/16 of the sample rate and the integration period was an exact integer multiple of the carrier period. Furthermore, the reference signal was numerically generated and had a DC offset of exactly zero. Under these conditions, the bias DC1*DC2 becomes zero as well, and the boxcar is able to eliminate the carrier and its harmonics completely. So it did not matter if there was an uncompensated DC offset in the signal being measured. ]
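The frequency-plan argument can be checked with integer arithmetic. The sketch below assumes a carrier at exactly fs/16, a reference taken from a 16-entry table with zero mean, and a measured signal that is the same table plus a DC offset; `ref16` and `mix_boxcar` are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One carrier period at fs/16, amplitude 1000, mean exactly zero. */
static const int16_t ref16[16] = {
       0,  383,  707,  924,  1000,  924,  707,  383,
       0, -383, -707, -924, -1000, -924, -707, -383
};

/* Mixes reference = dc_ref + ref16[] with signal = dc_sig + ref16[]
 * (in phase) and integrates over a whole number of carrier periods. */
int64_t mix_boxcar(int32_t dc_ref, int32_t dc_sig, size_t periods)
{
    assert(periods > 0);
    int64_t acc = 0;
    for (size_t n = 0; n < 16 * periods; n++) {
        int32_t r = dc_ref + ref16[n % 16];
        int32_t s = dc_sig + ref16[n % 16];
        acc += (int64_t)r * s;
    }
    return acc;
}
```

With dc_ref = 0 the result per period is 8001256 (about 16 * 0.5 * 1000 * 1000, the small excess comes from rounding in the table) for any dc_sig. With dc_ref = 10 and dc_sig = 2048 the result grows by 16 * 10 * 2048, which is the DC1*DC2 bias.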

Back to case 2b: Here we also must take care that the amplitude A2*DC1+A1*DC2 of the sin(x) term of the mixing product does not become too large, because the stopband attenuation of the lowpass filter after the mixer is limited. After attenuation by the filter, the residual amplitude of the sin(x) term should be well below the level of the signal we want to measure (which is 0.5*A1*A2). There are two ways to accomplish that: Either the residual DC offsets must be sufficiently low, or the filter must be designed with an even higher stop band attenuation, making it even more computationally expensive.

Quote

It would be better to have a real-time estimate of the average of the two signals, but that means averaging with low-pass filtering the two signals separately in real time.

Yes, it means exactly that. You could do it in principle, but it is computationally expensive (possibly too expensive).

Picuino · « **Reply #193 on:** May 31, 2024, 01:56:09 pm »

Yes, having an external frequency and not being able to synchronize with it gives a lot of problems.

After several problems and difficulties, I have managed to get FreeRTOS up and running, so now I can devote time to making the filters and we will see what can be achieved with each technique (boxcar, IIR, etc).

EDIT:
I'll start by doing a simple BOXCAR to reduce the samples by a ratio of 1/8, followed by an IIR filter for each of the three signals (sum([ADC1-DC1]*[ADC2-DC2]), sum(ADC1) and sum(ADC2)).
DC1 = sum(ADC1) + IIR (previous)
DC2 = sum(ADC2) + IIR (previous)
There may be better solutions, but this one has the advantage of being simple for me and leaving enough processor time to perform the filtering at lower sample rates.

gf · « **Reply #194 on:** May 31, 2024, 03:45:56 pm »

What is the lowest carrier frequency you want to support?

Picuino · « **Reply #195 on:** May 31, 2024, 05:17:19 pm »

Good question. Because that's going to determine the low pass filter of the DC component detector.

Another solution I am thinking of is to set the DC component at the ADC input=2048 (actually the DC component is going to be very close to that number) and add a small correction limited to +-100 points with a low pass filter.

In any case, the DC detection low pass filter has to settle very accurately within a few seconds. So I will try to give it an approximate response time tau = 1 second.

Picuino · « **Reply #196 on:** May 31, 2024, 05:56:28 pm »

My first attempt:

Code:

#include "main.h"
#include "filter.h"

#define ADC_MEAN  (2048)
#define ADC_MEAN_MAX_DEVIATION (1024*8)

volatile uint32_t adc_acc[FILTER_BUFF_SIZE];
volatile int32_t adc1_mean;
volatile int32_t adc2_mean;

volatile uint32_t mul;
volatile uint32_t *p_adc_acc;
volatile uint16_t *p1;
volatile uint16_t *p2;

void filter_adc_decimate_init(void) {
    adc1_mean = 0;
    adc2_mean = 0;
}

void filter_adc_decimate(uint16_t init, uint16_t end, uint16_t dest) {
    HAL_GPIO_WritePin(GPIOB, GPIO_PIN_5, GPIO_PIN_SET);

    p1 = &adc1_buff[init];
    p2 = &adc2_buff[init];
    p_adc_acc = &adc_acc[dest];
    int16_t adc1_acc = 0;
    int16_t adc2_acc = 0;
    int16_t adc1_dc = ADC_MEAN + (int16_t) (adc1_mean >> 19);
    int16_t adc2_dc = ADC_MEAN + (int16_t) (adc2_mean >> 19);
    for (int i = init; i < end; i += 8) {
        mul = 0;
        
        mul += (*p1 - adc1_dc) * (*p2 - adc2_dc);
        adc1_acc += *p1++ - adc1_dc;
        adc2_acc += *p2++ - adc2_dc;

        mul += (*p1 - adc1_dc) * (*p2 - adc2_dc);
        adc1_acc += *p1++ - adc1_dc;
        adc2_acc += *p2++ - adc2_dc;

        mul += (*p1 - adc1_dc) * (*p2 - adc2_dc);
        adc1_acc += *p1++ - adc1_dc;
        adc2_acc += *p2++ - adc2_dc;

        mul += (*p1 - adc1_dc) * (*p2 - adc2_dc);
        adc1_acc += *p1++ - adc1_dc;
        adc2_acc += *p2++ - adc2_dc;

        mul += (*p1 - adc1_dc) * (*p2 - adc2_dc);
        adc1_acc += *p1++ - adc1_dc;
        adc2_acc += *p2++ - adc2_dc;

        mul += (*p1 - adc1_dc) * (*p2 - adc2_dc);
        adc1_acc += *p1++ - adc1_dc;
        adc2_acc += *p2++ - adc2_dc;

        mul += (*p1 - adc1_dc) * (*p2 - adc2_dc);
        adc1_acc += *p1++ - adc1_dc;
        adc2_acc += *p2++ - adc2_dc;

        mul += (*p1 - adc1_dc) * (*p2 - adc2_dc);
        adc1_acc += *p1++ - adc1_dc;
        adc2_acc += *p2++ - adc2_dc;

        *p_adc_acc++ = mul;
        
        if (adc1_acc > ADC_MEAN_MAX_DEVIATION) {
            adc1_acc = ADC_MEAN_MAX_DEVIATION;
        }
        if (adc1_acc < -ADC_MEAN_MAX_DEVIATION) {
            adc1_acc = -ADC_MEAN_MAX_DEVIATION;
        }
        if (adc2_acc > ADC_MEAN_MAX_DEVIATION) {
            adc2_acc = ADC_MEAN_MAX_DEVIATION;
        }
        if (adc2_acc < -ADC_MEAN_MAX_DEVIATION) {
            adc2_acc = -ADC_MEAN_MAX_DEVIATION;
        }
        adc1_mean += -(adc1_mean >> 16) + adc1_acc;
        adc2_mean += -(adc2_mean >> 16) + adc2_acc;
    }

    HAL_GPIO_WritePin(GPIOB, GPIO_PIN_5, GPIO_PIN_RESET);
}

First order IIR:

a1 = 1 - 1/65536

Impulse response after one second = a1^(2500000/8) = 0.00849
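The figure above can be reproduced with a few lines of C. This is a plain repeated multiplication to avoid any library dependency; the function name `iir1_decay` is made up.

```c
#include <assert.h>

/* Remaining fraction of an impulse after `steps` updates of a 1st order
 * IIR with feedback coefficient a1. */
double iir1_decay(double a1, unsigned long steps)
{
    assert(a1 > 0.0 && a1 < 1.0);
    double r = 1.0;
    for (unsigned long i = 0; i < steps; i++)
        r *= a1;
    return r;
}
```

Calling `iir1_decay(1.0 - 1.0 / 65536.0, 2500000UL / 8UL)` gives a value between 0.00848 and 0.00850. Put differently, the time constant is about 65536 updates, which is roughly 0.21 s at 312500 updates per second, so one second corresponds to about 4.8 time constants.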

EDIT:
Process time = 54% of CPU time

gf · « **Reply #197 on:** May 31, 2024, 05:58:56 pm »

Quote from: Picuino on May 31, 2024, 05:17:19 pm

Another solution I am thinking of is to set the DC component at the ADC input=2048 (actually the DC component is going to be very close to that number) and add a small correction limited to +-100 points with a low pass filter.

I think the ADC can also be configured to return signed samples (-2048...2047), where 0 corresponds to Vref/2.

dietert1 · « **Reply #198 on:** May 31, 2024, 06:01:39 pm »

When i tested the signal chain, i was using that four step "triangle" wave from the ST example. The two neutral states came out to be near 2024 in the ADC capture and that value was pretty stable. The DAC introduced visible shifts due to a difference in rise and fall times. With a sine wave output signal of 100 kHz or less this should not happen though.
Meanwhile i started to test the CIC decimator with a decimation ratio of 64. So at 2.5 MHz sampling its output rate is about 40 kHz - enough for 10 kHz bandwidth. And it uses very few cycles, so one could think about implementing two mixers in order to upgrade the R meter to an RCL meter. Arm Cortex can certainly do this with everything in registers:
1 ADC read pointer
2 termination pointer
3 constant 2048 for DC subtraction
4 Q phase read pointer
5 I phase read pointer
6 Integrator 1 Q
7 Integrator 1 I
8,9 Integrator 2 Q
10,11 Integrator 2 I

So i would generate the two reference phases numerically and use the second ADC channel only for calibration/supervision.

Regards, Dieter

gf · « **Reply #199 on:** June 01, 2024, 04:08:43 pm »

Quote from: Picuino on May 31, 2024, 05:56:28 pm

My first attempt:

Code:
...
First order IIR: a1 = 1 - 1/65536
Impulse response after one second = a1^(2500000/8) = 0.00849

Process time = 54% of CPU time

Here's an optimized and corrected version: https://godbolt.org/z/G34oKdv4r
I guess it should run almost twice as fast. And if you manage to configure the ADC to return signed samples, you can set ADC_MEAN to 0, which saves an additional 22 instructions in the loop. You should not declare variables global if they are only used inside the function (except for large buffers, to save stack space). And avoid volatile whenever possible; it also prevents some compiler optimizations. One key to speed is to keep all (or as many as possible) variables (or, better, values in the sense of SSA) inside the innermost loop in registers. Spilling to memory and reloading costs extra instructions and cycles.

With your IIR filter and 1kHz carrier of almost full-scale amplitude, the estimated DC offset has a residual ripple of 3 ADC counts. See the output of the included test in the bottom right Output window. And with lower carrier frequency, the estimation variability becomes larger, of course.
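To illustrate the advice about locals and volatile, here is one possible restructuring of the mix-and-decimate loop. It is a sketch under assumptions and NOT the code behind the godbolt link: the names `dc_state_t` and `mix_decimate` are invented, the clamping of the block sums is left out for brevity, and the DC trackers integrate the deviation from ADC_MEAN so that the estimate settles at the full offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADC_MEAN 2048   /* nominal mid-scale of the 12-bit ADC */
#define DECIM    8      /* input samples per output value */

typedef struct {
    int32_t mean1;      /* DC tracker states, gain 2^19 (8 samples * 2^16) */
    int32_t mean2;
} dc_state_t;

void mix_decimate(const uint16_t *adc1, const uint16_t *adc2, size_t n,
                  int32_t *out, dc_state_t *st)
{
    assert(n % DECIM == 0);
    /* Local, non-volatile copies: the compiler may keep them in registers. */
    int32_t mean1 = st->mean1, mean2 = st->mean2;
    /* >> on negative values is an arithmetic shift with GCC/Clang on ARM;
     * the C standard leaves it implementation-defined. */
    const int32_t dc1 = ADC_MEAN + (mean1 >> 19);
    const int32_t dc2 = ADC_MEAN + (mean2 >> 19);

    for (size_t i = 0; i < n; i += DECIM) {
        int32_t mul = 0, acc1 = 0, acc2 = 0;    /* restarted for every block */
        for (size_t k = 0; k < DECIM; k++) {    /* fixed count, easy to unroll */
            int32_t x1 = adc1[i + k];
            int32_t x2 = adc2[i + k];
            mul  += (x1 - dc1) * (x2 - dc2);    /* mixer + 8-tap boxcar */
            acc1 += x1 - ADC_MEAN;
            acc2 += x2 - ADC_MEAN;
        }
        *out++ = mul;
        mean1 += acc1 - (mean1 >> 16);          /* 1st order IIR, a1 = 1 - 2^-16 */
        mean2 += acc2 - (mean2 >> 16);
    }
    st->mean1 = mean1;
    st->mean2 = mean2;
}
```

The inner loop has a constant trip count, so an optimizing compiler can unroll it without the source having to repeat the body eight times.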

