Author Topic: STM32L432 numerical issues? (Read 2534 times)

cloidnerux · « **on:** August 10, 2020, 10:09:25 am »

Hello,

I want to measure current and voltage in a 50 Hz system and calculate the phase between them. For this I fit a sin-wave to the measured data and extract the phase information.
ADC sampling works correctly and I get 256 values for a 20ms period.
I tried the algorithm on my computer in python for all sorts of angles and noise amplitudes and it works. I also tried it on the raw values from the MCU and it works.
Trying the algorithm on the STM32L4 I see a correct value for the amplitude, however the phase jumps around periodically and is wrong, as seen in the second graph of the second row(phase of channel 1 and channel 2):

The algorithm calculates the least squares fit for the function f(x)=A*cos(x)+B*sin(x)+c. I use some precalculated values for the XiXi, XiYi and YiYi which I got from the working python script. The algorithm is the following:

Code: [Select]

void GetSineFit(float * data, float * amplitude, float * angle, float * _a, float * _b){
    float a = 0.0f, b = 0.0f;
    float p[3] = {0.0f};
    float XiZi = 0.0f, YiZi = 0.0f;
    float Zi = 0.0f;    
    for(int i = 0; i < BUF_SIZE; i++){
        XiZi += datax_lut[i] * data[i];
        YiZi += datay_lut[i] * data[i];
        Zi   += data[i];
    }
    float A[3][3]= {{XiXi, XiYi, Xi}, {XiYi, YiYi, Yi}, {Xi, Yi, (float)BUF_SIZE}};
    float C[3][3] = {{0.0f}};
    //Calculate the determinant of A
    float n2 = A[0][2] * A[1][0] * A[2][1];
    n2 += A[0][1] * A[1][2] * A[2][0];
    n2 += A[0][0] * A[1][1] * A[2][2];
    n2 -= A[2][2] * A[1][0] * A[0][1];
    n2 -= A[2][1] * A[1][2] * A[0][0];
    n2 -= A[2][0]*A[1][1]*A[0][2];
    if(n2 > 0.0f){
      float x = 1.0f/n2;
      //Calculate the adjunct matrix to A
      C[0][0] =  (A[1][1] * A[2][2]) - (A[2][1] * A[1][2]);
      C[0][1] =  (A[0][2] * A[2][1]) - (A[0][1] * A[2][2]);
      C[0][2] =  (A[0][1] * A[1][2]) - (A[0][2] * A[1][1]);

      C[1][0] =  (A[1][2] * A[2][0]) - (A[1][0] * A[2][2]);
      C[1][1] =  (A[0][0] * A[2][2]) - (A[0][2] * A[2][0]);
      C[1][2] =  (A[0][2] * A[1][0]) - (A[0][0] * A[1][2]);

      C[2][0] =  (A[1][0] * A[2][1]) - (A[1][1] * A[2][0]);
      C[2][1] =  (A[0][1] * A[2][0]) - (A[0][0] * A[2][1]);
      C[2][2] =  (A[0][0] * A[1][1]) - (A[0][1] * A[1][0]);
      for(int i = 0; i < 3; i++){
           for(int j = 0; j < 3; j++){
                C[i][j] *= x;
           }
      }
      p[0] = XiZi;
      p[1] = YiZi;
      p[2] = Zi;
      a = C[0][0] * p[0] + C[0][1] * p[1] + C[0][2] * p[2];
      b = C[1][0] * p[0] + C[1][1] * p[1] + C[1][2] * p[2];
      //c = C[2][0] * p[0] + C[2][1] * p[1] + C[2][2] * p[2];
    } else
    {
      printf("Determinant lower than zero, least squares fit not possible");
      a = 1.0f;
      b = 1.0f;
      //c = 0;
    }
    *amplitude = sqrtf(a*a + b*b);
    *angle = atan2f(a, b) * 180.0f / (float)M_PI;
    *_a = a;
    *_b = b;
}

The two graphs in the bottom row show the calculated A and B values for both channels, these also show the periodic behaviour, but somehow the amplitude calculation is correct.
Do I have a numerical issue with the floating point operations here? Is it some stupid mistake or some interrupt/register thing?

Best regards

Psi · « **Reply #1 on:** August 10, 2020, 10:45:57 am »

I don't know if this is your issue, but you have lines like

n2 -= A[2][0] * A[1][1] * A[0][2];

Where the order of execution is left up to the compiler.

And i'm not sure how the compiler will interpret k -= x * y * z

It could become

k = k - (x * y * z)
or
k = (k - x) * y * z

I'd like to think the compiler will follow the usual rules, but whenever i do anything where order of execution matter i always use brackets.

My only other suggestion is to confirm that float is the same size in python as it is in your STM compiler.

newbrain · « **Reply #2 on:** August 10, 2020, 10:52:39 am »

Quote from: Psi on August 10, 2020, 10:45:57 am

[...]
k = k - (x * y * z)

No. There is no doubt at all that the one I left above is the only correct interpretation.

If you want the standard chapter and verse, it's (C11) 6.5.16.2 Compound assignment, paragraph 3:

Quote

A compound assignment of the form E1 op= E2 is equivalent to the simple assignment
expression E1 = E1 op (E2), except that the lvalue E1 is evaluated only once

Psi · « **Reply #3 on:** August 10, 2020, 11:03:33 am »

I didnt say both were correct,

Point i was making was, trust the compiler at your peril.

janoc · « **Reply #4 on:** August 10, 2020, 11:17:50 am »

Quote from: Psi on August 10, 2020, 11:03:33 am

I didnt say both were correct,

Point i was making was, trust the compiler at your peril.

Or rather - don't use shitty compiler that doesn't implement the actual language standard. I have never heard about a C compiler that did addition/subtraction first and multiplication after. There are ambiguities in the standard and things left to the implementations but this is not one of them.

The only possible problem with that statement is the order of the multiplications there - in some cases it could matter if it causes a temporary over/underflow or casting and thus loss of precision. However, that is impossible to know without knowing what sort of values are being multiplied. That's where brackets could be justified in some situations.

newbrain · « **Reply #5 on:** August 10, 2020, 11:33:41 am »

Quote from: Psi on August 10, 2020, 11:03:33 am

trust the compiler at your peril.

Always.

But here we are not talking about some obscure optimization bug or corner case.
This is basic C syntax and semantic.

newbrain · « **Reply #6 on:** August 10, 2020, 11:45:54 am »

Apart improbable compiler's vagaries:

Could you share also the python script?
As janoc also says some very small value (or large) could affect the calculation, floats only have about 6 significant digits.
Especially if the matrix involved is not well-behaved.
This, IIUC, runs every 20ms: are you sure the HW FPU is being used? If compiled with soft-fp it might take longer than expected (but I did not try to make a precise estimation of the time it would take).

cloidnerux · « **Reply #7 on:** August 10, 2020, 12:08:45 pm »

The python script:

Code: [Select]

import numpy as np
import matplotlib.pyplot as plt



phase = 150
offset = 2047
amplitude = 1000
samples = 512
noise_amplitude = 100
scale = 1


x_val = np.linspace(0, 2*np.pi, samples)
y1 = amplitude*np.sin(x_val) + (np.random.rand(samples)*2*noise_amplitude- noise_amplitude)
y2 = amplitude*np.sin((x_val + phase*np.pi/180)*scale) + (np.random.rand(samples)*2*noise_amplitude- noise_amplitude)
y_ref = np.sin(x_val)
y_ref2 = np.cos(x_val)

def SineFit(dataz):
    #http://mariotapilouw.blogspot.com/2012/03/sine-fitting.html
    global samples, x_val
    datax = np.cos(x_val, dtype=float)
    datay = np.sin(x_val, dtype=float)
    dataz -= np.mean(dataz, dtype=float)
    a = 0
    b = 0
    c = 0
    p = np.zeros((3, 1), dtype=float)
    XiYi = 0
    XiZi = 0
    YiZi = 0
    XiXi = 0
    YiYi = 0
    Xi = 0
    Yi = 0
    Zi = 0
    x = 0
    
    for i in range(samples):
        XiYi += float(datax[i] * datay[i])
        XiZi += float(datax[i] * dataz[i])
        YiZi += float(datay[i] * dataz[i])
        XiXi += float(datax[i] * datax[i])
        YiYi += float(datay[i] * datay[i])
        Xi   += float(datax[i])
        Yi   += float(datay[i])
        Zi   += float(dataz[i])
    A = np.zeros((3,3), dtype=float)
    B = A
    C = A
    X = A
    A = np.array(((XiXi, XiYi, Xi), (XiYi, YiYi, Yi), (Xi, Yi, samples)))
    n = np.linalg.det(A)
    n2 = A[0][2] * A[1][0] * A[2][1]
    n2 += A[0][1] * A[1][2] * A[2][0]
    n2 += A[0][0] * A[1][1] * A[2][2];
    n2 -= A[2][2] * A[1][0] * A[0][1];
    n2 -= A[2][1] * A[1][2] * A[0][0];
    n2 -= A[2][0]*A[1][1]*A[0][2];
    n2 = float(n2)
    if(n > 0):

        C[0][0] =  (A[1][1] * A[2][2]) - (A[2][1] * A[1][2])
        C[0][1] =  (A[0][2] * A[2][1]) - (A[0][1] * A[2][2])
        C[0][2] =  (A[0][1] * A[1][2]) - (A[0][2] * A[1][1])
        
        C[1][0] =  (A[1][2] * A[2][0]) - (A[1][0] * A[2][2])
        C[1][1] =  (A[0][0] * A[2][2]) - (A[0][2] * A[2][0])
        C[1][2] =  (A[0][2] * A[1][0]) - (A[0][0] * A[1][2])
        
        C[2][0] =  (A[1][0] * A[2][1]) - (A[1][1] * A[2][0])
        C[2][1] =  (A[0][1] * A[2][0]) - (A[0][0] * A[2][1])
        C[2][2] =  (A[0][0] * A[1][1]) - (A[0][1] * A[1][0])
         
        for i in range(3):
           for j in range(3):
                X[i][j] = C[i][j] / n2
        p[0] = XiZi
        p[1] = YiZi
        p[2] = Zi
        a = X[0][0] * p[0] + X[0][1] * p[1] + X[0][2] * p[2]
        b = X[1][0] * p[0] + X[1][1] * p[1] + X[1][2] * p[2]
        c = X[2][0] * p[0] + X[2][1] * p[1] + X[2][2] * p[2]
    else:
        a = 1
        b = 1
        c = 0
    mag = np.sqrt(a*a + b*b, dtype=float)
    phi = 0
    phi = np.arctan2(a, b, dtype=float)
    phi = np.unwrap(phi)
    
    rn = dataz - a*datax - b*datay-c
    error = np.sqrt(np.sum(rn*rn)/samples)
    return mag, phi*180/np.pi, error

phase = np.linspace(0, 360, 20)

phi = np.zeros(phase.shape)

ind = 0
for p in phase:
    y2 = amplitude*np.sin((x_val + p*np.pi/180)*scale) + (np.random.rand(samples)*2*noise_amplitude- noise_amplitude)
    mag, phi[ind],_ = SineFit(y2)
    ind += 1

fig, ax = plt.subplots(2,1)
ax[0].plot(phase)
ax[0].plot(phi)
ax[1].plot(np.rad2deg(np.unwrap(np.deg2rad(phase-phi))))
plt.show()

Quote

As janoc also says some very small value (or large) could affect the calculation, floats only have about 6 significant digits.

Do you know if the floats in a STM32 are 16 or 32-bit?

Code: [Select]

This, IIUC, runs every 20ms: are you sure the HW FPU is being used? If compiled with soft-fp it might take longer than expected (but I did not try to make a precise estimation of the time it would take).

I have some GPIOs that toggle on start and completion. With complete serial output it takes 5ms, just the math takes 500us. I activated the hard-fp options in platform.io, so it should use the FPU.

newbrain · « **Reply #8 on:** August 10, 2020, 12:53:47 pm »

Quote from: cloidnerux on August 10, 2020, 12:08:45 pm

The python script:
[...]
Do you know if the floats in a STM32 are 16 or 32-bit?
Code: [Select]
This, IIUC, runs every 20ms: are you sure the HW FPU is being used? If compiled with soft-fp it might take longer than expected (but I did not try to make a precise estimation of the time it would take).I have some GPIOs that toggle on start and completion. With complete serial output it takes 5ms, just the math takes 500us. I activated the hard-fp options in platform.io, so it should use the FPU.

Thanks for the script - I might have a look later, no guarantee though...

If you are using a reasonable compiler on a common architecture floats are 32 bits and double 64 bits, the sizes are more or less forced by the minimum standard limits (IEEE754 is not mandatory for C, but very common).
Specifically on STM32 this holds true.
The FPU on STM32L432 is only single precision (float, 32 bits), while the default for numpy is 64 bits (IIRC).

If lack of precision is a problem or not depends on the actual data and the algorithm, you might try to see if the result changes using doubles in C (or using float32 in numpy!).
Doubles would significantly slow down the MCU, as they'd be SW emulated.

Good job checking the times - so that is not an issue.

voltsandjolts · « **Reply #9 on:** August 10, 2020, 01:04:33 pm »

Curve fitting for phase difference measurement seems a bit odd to me - is it normal?

What are the alternative ways of measuring phase difference, that are perhaps more computationally efficient?

Maybe deriving ADC sample clock from one input signal and use a Goertzel filter on the other input signal to measure phase?

Or just do two Goertzel filters, one on each ADC input channel, to calculate phase of each and then get the delta.

hans · « **Reply #10 on:** August 10, 2020, 01:13:18 pm »

Are you sure that those float matrices/arrays are initialized properly?

Code: [Select]

#include <stdio.h>

int main(int a, char** b) {
    float A[3] = {1.0f};
    float C[3][3] = {{1.0f}};
    for (auto i = 0; i < 3; i++) {
        printf("A[%d]=%f\r\n", i, A[i]);
    }

    for (auto i = 0; i < 3; i++) {
        for (auto j = 0; j < 3; j++) {
            printf("C[%d][%d]=%f\r\n", i, j, C[i][j]);
        }
    }
    return 0;
}

Output:

Quote

A[0]=1.000000
A[1]=0.000000
A[2]=0.000000
C[0][0]=1.000000
C[0][1]=0.000000
C[0][2]=0.000000
C[1][0]=0.000000
C[1][1]=0.000000
C[1][2]=0.000000
C[2][0]=0.000000
C[2][1]=0.000000
C[2][2]=0.000000

I changed the float value to 1.0f, because on my PC the small stack is preinitialized to zeros, so I wanted to some all floats change at the time (they don't). On embedded you can't be 100% sure that the stack is indeed properly initialized.

Another thing you could try is copy the C code, add some stimulus ADC data captured from the hardware, and let the algorithm run on the computer (not the python script). This way you for sure know that the C code is correctly translated from MATLAB/Python/etc. (assuming you use some standard x86 compiler like GCC or similar).

edit: hmm, I see values are also written before read, so that shouldn't be that big of an issue then..

cloidnerux · « **Reply #11 on:** August 10, 2020, 03:29:18 pm »

I checked the floats on the STM and windows, they are both 4 byte/32-Bit.
I copy/pasted the code into a c-file and get correct results, independent of phase shift or noise, which is the same as with the python script.
I changed the function and all variables to static, so there is no heap or stack issue going on.
And still it is not working.
I also tried to switch to -mfloat-abi=softfp with no luck.

I will try to update the arm-gcc of PIO and see what happens.

Edit: Changing to some Version 8 of arm-gcc did not help either
I also tried to send the data I give the algorithm over the serial port and calculate the sine-fit on my computer, which gives the desired results, even for the crappiest of waveforms. The phase is slowly changing, as the ADC sampling is slightly offset each time. But still, the on-board calculations go haywire.

iMo · « **Reply #12 on:** August 10, 2020, 04:28:04 pm »

Doublecheck sqrtf and atanf within a simple routine - whether they work as expected..

cloidnerux · « **Reply #13 on:** August 10, 2020, 04:32:27 pm »

So, somehow I chased a ghost, so to say.
I processed 3 channel. I changed everything to one channel and suddenly I got the correct results.
Then I tried sending all three channels, but only processing one and got the same weird graphs..
So it seems there is an issue with the serial port sending the data rather than an issue with the algorithm.

Thanks for all the help, I got a step further.

SiliconWizard · « **Reply #14 on:** August 10, 2020, 04:55:31 pm »

Quote from: voltsandjolts on August 10, 2020, 01:04:33 pm

Curve fitting for phase difference measurement seems a bit odd to me - is it normal?

I also found this a weird approach.
Curve fitting in such an example (with the number of points used) is notoriously fiddly. A small loss in precision can get you completely wrong values.

As said above - apart from using another approach - one possible reason for the discrepancy is: precision indeed. STM32L4 FPUs are single-precision, whereas Python's "float" uses the default precision which is probably at least double-precision FP (at least AFAIK, I don't know Python much.)

You may want to test your C code on a computer with a C compiler instead, and see what you get.

SiliconWizard · « **Reply #15 on:** August 10, 2020, 05:18:43 pm »

Quote from: cloidnerux on August 10, 2020, 04:32:27 pm

So, somehow I chased a ghost, so to say.

Alright.

But I would still suggest considering a few of the points we made above.

In particular, I would still suggest evaluating the robustness of your "algorithm" in presence of noise or loss of precision. I would somehow simulate that and see how well your curve fitting fares.

cloidnerux · « **Reply #16 on:** August 10, 2020, 05:29:41 pm »

Quote

In particular, I would still suggest evaluating the robustness of your "algorithm" in presence of noise or loss of precision. I would somehow simulate that and see how well your curve fitting fares.

I simulated with nois up to 20% of the signal amplitude. I simulated frequency deviations and all phase angles. I also have a screenshot of some real-world 50Hz hum the circuit picked up:

I would say the fit is rather robust and will give me the information I need.
I looked into different options, but somehow this is one of the computational more efficient ones. It looks like a lot of code, but it has fairly few computations for what it does, especially fewer than a FFT.

voltsandjolts · « **Reply #17 on:** August 10, 2020, 06:44:02 pm »

You don't need to do a full FFT (DFT) because you are only interested in one frequency bin, either 50Hz or 60Hz.
So you can use Goertzel method which is like one multiply and two additions per ADC sample.

cloidnerux · « **Reply #18 on:** August 10, 2020, 11:37:50 pm »

Quote

So you can use Goertzel method which is like one multiply and two additions per ADC sample.

I will give it a look. I didn't read about it before, but it sound interesting.

cloidnerux · « **Reply #19 on:** August 11, 2020, 07:16:05 am »

So, I "implemented" the goertzel algorithm(From https://stackoverflow.com/questions/11579367/implementation-of-goertzel-algorithm-in-c)

Code: [Select]

void goertzel(float * data, int numSamples,int TARGET_FREQUENCY,int SAMPLING_RATE, float * amplitude, float * angle)
{
    int     k,i;
    float   floatnumSamples;
    float   omega,sine,cosine,coeff,q0,q1,q2,magnitude,real,imag;

    float   scalingFactor = numSamples / 2.0f;

    floatnumSamples = (float) numSamples;
    k = (int) (0.5f + ((floatnumSamples * TARGET_FREQUENCY) / SAMPLING_RATE));
    omega = (2.0f * M_PI * k) / floatnumSamples;
    sine = sin(omega);
    cosine = cos(omega);
    coeff = 2.0f * cosine;
    q0=0.0f;
    q1=0.0f;
    q2=0.0f;

    for(i=0; i<numSamples; i++)
    {
        q0 = coeff * q1 - q2 + data[i];
        q2 = q1;
        q1 = q0;
    }

    // calculate the real and imaginary results
    // scaling appropriately
    real = (q1 - q2 * cosine) / scalingFactor;
    imag = (q2 * sine) / scalingFactor;

    *amplitude = sqrtf(real*real + imag*imag);
	*angle = atan2(real, -imag) * 180.0f / (float)M_PI;
}

Looking at the results, the goertzel algorithm seems to have a slight offset, but is not that much influenced by noise(0.5% noise, orange - goertzel, blue - fitting, in the top graph blue is the reference):

And with 20% noise:

I then tested it against the curve-fitting approach by processing 1000000 times the same data on my computer with gcc 8.2.0-5:

Code: [Select]

clock_t begin = clock();
	for(int i = 0; i < iters; i++){
		GetSineFit(data, &mag, &phi, &a, &b);
	}
	clock_t end = clock();
	double time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
	printf("Data fit: %f\n", time_spent);
	begin = clock();
	for(int i = 0; i < iters; i++){
		goertzel(data, BUF_SIZE, 50, BUF_SIZE*50, &mag2, &phi2);
	}
	end = clock();
	time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
	printf("Goertzel: %f\n", time_spent);

The results are quite interesting:

Code: [Select]

Data fit: 0.982000
Goertzel: 1.968000

The goertzel algorithm takes almost twice the time. I tried with option -Os:

Code: [Select]

Goertzel: 0.993000
Data fit: 0.296000

And still goertzel is slower, which is not intuitive. I would suspect that the compiler and the x86 processor can handle the matrix stuff quite well and can simplify the first loop to some multiply-add instructions, or something like that. I assume however, that if I don't precalculate some of the results the curve-fitting approach will not be that much faster.

KE5FX · « **Reply #20 on:** August 11, 2020, 07:28:49 am »

Quote from: hans on August 10, 2020, 01:13:18 pm

Are you sure that those float matrices/arrays are initialized properly?
float A[3] = {1.0f};
float C[3][3] = {{1.0f}};

Note that this only works for zero initialization. It should have worked the way you assumed, but it doesn't.

Basically, if you initialize any array elements at all, the whole array will be zero-initialized. Assigning { 0 } happens to have the effect of zeroing the array, but it doesn't actually specify the initialized value for any elements beyond the first.

voltsandjolts · « **Reply #21 on:** August 11, 2020, 08:57:28 am »

Something is not right with that comparison between curve fitting and Goertzel.

With the pre-computed coefficients (as you would do in 'the real world') the Goertzel calculations done on each block are:

Code: [Select]

    q0=0.0f;    q1=0.0f;    q2=0.0f;
    for(i=0; i<numSamples; i++)    {
        q0 = coeff * q1 - q2 + data[i];
        q2 = q1;
        q1 = q0;
    }
    real = (q1 - q2 * cosine) / scalingFactor;
    imag = (q2 * sine) / scalingFactor;
    *amplitude = sqrtf(real*real + imag*imag);
    *angle = atan2(real, -imag) * 180.0f / (float)M_PI;

...and the curve fit code is this which has many more multiplications:

Code: [Select]

for(int i = 0; i < BUF_SIZE; i++){
        XiZi += datax_lut[i] * data[i];
        YiZi += datay_lut[i] * data[i];
        Zi   += data[i];
    }
    float A[3][3]= {{XiXi, XiYi, Xi}, {XiYi, YiYi, Yi}, {Xi, Yi, (float)BUF_SIZE}};
    float C[3][3] = {{0.0f}};
    //Calculate the determinant of A
    float n2 = A[0][2] * A[1][0] * A[2][1];
    n2 += A[0][1] * A[1][2] * A[2][0];
    n2 += A[0][0] * A[1][1] * A[2][2];
    n2 -= A[2][2] * A[1][0] * A[0][1];
    n2 -= A[2][1] * A[1][2] * A[0][0];
    n2 -= A[2][0]*A[1][1]*A[0][2];
    if(n2 > 0.0f){
      float x = 1.0f/n2;
      //Calculate the adjunct matrix to A
      C[0][0] =  (A[1][1] * A[2][2]) - (A[2][1] * A[1][2]);
      C[0][1] =  (A[0][2] * A[2][1]) - (A[0][1] * A[2][2]);
      C[0][2] =  (A[0][1] * A[1][2]) - (A[0][2] * A[1][1]);

      C[1][0] =  (A[1][2] * A[2][0]) - (A[1][0] * A[2][2]);
      C[1][1] =  (A[0][0] * A[2][2]) - (A[0][2] * A[2][0]);
      C[1][2] =  (A[0][2] * A[1][0]) - (A[0][0] * A[1][2]);

      C[2][0] =  (A[1][0] * A[2][1]) - (A[1][1] * A[2][0]);
      C[2][1] =  (A[0][1] * A[2][0]) - (A[0][0] * A[2][1]);
      C[2][2] =  (A[0][0] * A[1][1]) - (A[0][1] * A[1][0]);
      for(int i = 0; i < 3; i++){
           for(int j = 0; j < 3; j++){
                C[i][j] *= x;
           }
      }
      p[0] = XiZi;
      p[1] = YiZi;
      p[2] = Zi;
      a = C[0][0] * p[0] + C[0][1] * p[1] + C[0][2] * p[2];
      b = C[1][0] * p[0] + C[1][1] * p[1] + C[1][2] * p[2];
      //c = C[2][0] * p[0] + C[2][1] * p[1] + C[2][2] * p[2];
    } else
    {
      printf("Determinant lower than zero, least squares fit not possible");
      a = 1.0f;
      b = 1.0f;
      //c = 0;
    }
    *amplitude = sqrtf(a*a + b*b);
    *angle = atan2f(a, b) * 180.0f / (float)M_PI;
    *_a = a;
    *_b = b;

So, something going on with gcc or PC processing.
I think these comparisons need to be done on the target microcontroller to be meaningful, and then the Goertzel will be faster.

Harvs · « **Reply #22 on:** August 11, 2020, 09:58:52 am »

I haven't looked at the details of the specific curve fitting algorithm used by the OP, however an LES fit for determining phase is one of the common approaches for Phasor Measurement Units used in power systems. It is computationally very quick with precomputed pseudo-inverse matrix. The bit where there's more variation in algorithms is determining frequency on a cycle by cycle basis.

Not to derail this thread, but I have a open source project in work (well, the project's essentially finished but the documentation has just started) of a phasor measurement unit for use in research.

In this project I'm downsampling from 12.8ksps to 1.6ksps with a cascade of two biquad filters, then running the LES algorithm on each new sample (which helps with frequency tracking). I'm doing it all in fixed point math (apart from the ATAN and SQRT), and for a 3 phase system it used around 20% processing time on a STM32F407 @168MHz. So not that difficult at all. I've splurged on a stm32H7 for the new version just to try new things.

cloidnerux · « **Reply #23 on:** August 11, 2020, 11:23:40 am »

Quote

Not to derail this thread, but I have a open source project in work (well, the project's essentially finished but the documentation has just started) of a phasor measurement unit for use in research.

In this project I'm downsampling from 12.8ksps to 1.6ksps with a cascade of two biquad filters, then running the LES algorithm on each new sample (which helps with frequency tracking). I'm doing it all in fixed point math (apart from the ATAN and SQRT), and for a 3 phase system it used around 20% processing time on a STM32F407 @168MHz. So not that difficult at all. I've splurged on a stm32H7 for the new version just to try new things.

Do you have some public github for this project?

Harvs · « **Reply #24 on:** August 12, 2020, 01:53:43 am »

Yes, but it's all still a work in progress, needs a lot of code clean up and I need to finish the timestamping component before I can do real-world testing against another PMU that I've lined up. I've also started a blog where I'm documenting parts of the system (again just in its infancy.)
https://github.com/harvie256/xmu
https://xmuopen.wordpress.com/

What's your application?

BTW this is the paper I've implemented
https://ieeexplore.ieee.org/document/6547769


EEVblog Main Site	EEVblog on Youtube	EEVblog on Twitter	EEVblog on Facebook	EEVblog on Odysee

Author Topic: STM32L432 numerical issues? (Read 2534 times)

Share me