Author Topic: This code is so broken it some how fixes itself.  (Read 4096 times)

0 Members and 1 Guest are viewing this topic.

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4032
  • Country: gb
This code is so broken it some how fixes itself.
« on: January 27, 2023, 08:21:59 pm »
So this code works.  It shouldn't, but it does.  (Friend wrote it, acchmm).  I just find it amusing how broken it is that it seems to fix itself.

The SAI's are in 16bit extended.

The for loop is nuts.  There are not 8 16bit ints there.  There are 8 8bit ints.  Yet it works.  In fact a whole bunch of different viarities work also.

      for (int i = 0; i < 8; i+=4)

Is my favourite.

      for (int i = 0; i < 2; i++)  // Nope , not even with a int32_t buffer.
      for (int i = 0; i < 4; i++)  // Nope
      for (int i = 0; i < 8; i++)  // YES
      for (int i = 0; i < 8; i+=2)  // YES
      for (int i = 0; i < 8; i+=4)  // YES

I think it's just lucky word alignment.

Code: [Select]
int16_t buffer[8];

float gain = 0.75f;
while (1) {
HAL_SAI_Receive(&hsai_BlockA1, (uint8_t*) buffer, 8, 10);
for (int i = 0; i < 8; i++) {

double sample = buffer[i];
sample = sample * gain; // pre-gain
sample = PeakFilter_Update(&bassBoost, sample);
sample = PeakFilter_Update(&midCut, sample);
sample = PeakFilter_Update(&highBoost, sample);

buffer[i] = (int16_t) sample;
}
HAL_SAI_Transmit(&hsai_BlockA2, (uint8_t*) buffer, 8, 10);
}

EDIT: I'm laughing at myself but this is a classic example.  Classic.  Your code works first time.  "That was easy".  Never pay any attention.  "Some time later...", "That doesn't look right... I'll take a look...."

"Wait.  How does this work?  How did this ever work?  What the hell?"

No... I'm not looking for help to fix it.  I can see a symetry of problems that if fixes will make it make sense, but I just found it funny.  Symmetrical insanity just happens to work.

EDIT2: Oh and it has many other issues to boot.
« Last Edit: January 27, 2023, 08:29:35 pm by paulca »
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline Sherlock Holmes

  • Frequent Contributor
  • **
  • !
  • Posts: 570
  • Country: us
Re: This code is so broken it some how fixes itself.
« Reply #1 on: January 27, 2023, 09:33:35 pm »
So this code works.  It shouldn't, but it does.  (Friend wrote it, acchmm).  I just find it amusing how broken it is that it seems to fix itself.

The SAI's are in 16bit extended.

The for loop is nuts.  There are not 8 16bit ints there.  There are 8 8bit ints.  Yet it works.  In fact a whole bunch of different viarities work also.

      for (int i = 0; i < 8; i+=4)

Is my favourite.

      for (int i = 0; i < 2; i++)  // Nope , not even with a int32_t buffer.
      for (int i = 0; i < 4; i++)  // Nope
      for (int i = 0; i < 8; i++)  // YES
      for (int i = 0; i < 8; i+=2)  // YES
      for (int i = 0; i < 8; i+=4)  // YES

I think it's just lucky word alignment.

Code: [Select]
int16_t buffer[8];

float gain = 0.75f;
while (1) {
HAL_SAI_Receive(&hsai_BlockA1, (uint8_t*) buffer, 8, 10);
for (int i = 0; i < 8; i++) {

double sample = buffer[i];
sample = sample * gain; // pre-gain
sample = PeakFilter_Update(&bassBoost, sample);
sample = PeakFilter_Update(&midCut, sample);
sample = PeakFilter_Update(&highBoost, sample);

buffer[i] = (int16_t) sample;
}
HAL_SAI_Transmit(&hsai_BlockA2, (uint8_t*) buffer, 8, 10);
}

EDIT: I'm laughing at myself but this is a classic example.  Classic.  Your code works first time.  "That was easy".  Never pay any attention.  "Some time later...", "That doesn't look right... I'll take a look...."

"Wait.  How does this work?  How did this ever work?  What the hell?"

No... I'm not looking for help to fix it.  I can see a symetry of problems that if fixes will make it make sense, but I just found it funny.  Symmetrical insanity just happens to work.

EDIT2: Oh and it has many other issues to boot.

How do you distinguish between "works" and "broken"?
“When you have eliminated all which is impossible, then whatever remains, however improbable, must be the truth.” ~ Arthur Conan Doyle, The Case-Book of Sherlock Holmes
 

Offline Sherlock Holmes

  • Frequent Contributor
  • **
  • !
  • Posts: 570
  • Country: us
Re: This code is so broken it some how fixes itself.
« Reply #2 on: January 27, 2023, 09:37:58 pm »
Note:

Code: [Select]
HAL_SAI_Receive(&hsai_BlockA1, (uint8_t*) buffer, 8, 10);

If that 8 is the size of the buffer to write to, then only 8 bytes are being used despite the fact the declared buffer is 16 bytes long...

The code declares a 16 byte buffer, but it only ever needs and uses an 8 byte buffer, so long as buffer is at least 8 bytes long, everything is fine - is that what you're talking about though?


« Last Edit: January 27, 2023, 09:40:48 pm by Sherlock Holmes »
“When you have eliminated all which is impossible, then whatever remains, however improbable, must be the truth.” ~ Arthur Conan Doyle, The Case-Book of Sherlock Holmes
 
The following users thanked this post: thm_w

Offline thm_w

  • Super Contributor
  • ***
  • Posts: 6349
  • Country: ca
  • Non-expert
Re: This code is so broken it some how fixes itself.
« Reply #3 on: January 27, 2023, 10:27:14 pm »
Note:

Code: [Select]
HAL_SAI_Receive(&hsai_BlockA1, (uint8_t*) buffer, 8, 10);

If that 8 is the size of the buffer to write to, then only 8 bytes are being used despite the fact the declared buffer is 16 bytes long...

The code declares a 16 byte buffer, but it only ever needs and uses an 8 byte buffer, so long as buffer is at least 8 bytes long, everything is fine - is that what you're talking about though?

16 bytes are used, assuming HAL_SAI_Recieve is set to 16-bit (instead of 8-bit) in the initialization. Which OP is saying it is.

I think the point is that HAL_SAI_Receive is writing to an address using 16-bit each time, but then OP thought the loop was pulling out data 8-bit at a time?
Which I don't think it is, because buffer is a 16-bit array, right?

int16_t buffer = {1, 2, 3, 4};
buffer[0] = 1
buffer[1] = 2
Profile -> Modify profile -> Look and Layout ->  Don't show users' signatures
 

Online hans

  • Super Contributor
  • ***
  • Posts: 1637
  • Country: nl
Re: This code is so broken it some how fixes itself.
« Reply #4 on: January 27, 2023, 11:34:35 pm »
That function looks like typical C HAL hackery mess - where 8 defines the number of items and not the number of bytes, and the item size is configured elsewhere. It increments the dataPtr by 2 bytes at a time, and only after byte increments casts it to 16-bits for the write operation. That is functionally correct, but IMO very odd way of dealing with sizes, pointers and casts. But I rest my case, C does not really allow for a type generic implementation without either macros (= bigger mess) or C++ templates..

Anyhow, if we assume that OP tested a SAI passthrough which works and does not discard any samples, then it kind of is a mystery what behaviour is observed. As Sherlock says, what does broken and what does working look like? Are there really zero differences to spot for the 3 different "YES" situations listed?

Considering that the filter behaviour can be tested independently.. one may be able to generate a sweep of input frequencies into a buffer and pull it through the suspecting code. The output samples can then be plotted in e.g. a bode plot. It's a bit tedious work to set up the testbench, and perhaps there are frameworks or tools that don't require you to write everything yourself (e.g. thinking of MATLAB/Octave), but it could give some insight what the numerical operations are affecting to your raw ADC/DAC samples.

However, I think some sort of unit testing is always beneficial for DSP or business logic code like this. The HAL functions serve their purposes of transferring in ADC/DAC data with some real-time element attached to it (not to cause time-related glitches etc.), but other than that the calculations can be modelled very well on a PC with unit tests instead. If that catches 95% of the mistakes before uploading to hardware, it can reduce the head scratching alot between distrusting the HAL configuration/usage and/or your own tested application code.
« Last Edit: January 27, 2023, 11:37:56 pm by hans »
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14445
  • Country: fr
Re: This code is so broken it some how fixes itself.
« Reply #5 on: January 27, 2023, 11:39:48 pm »
That function looks like typical C HAL hackery mess - where 8 defines the number of items and not the number of bytes, and the item size is configured elsewhere.

So? How is a function that takes a number of items as parameter instead of bytes hackery?
The HAL is poorly documented, and that's the real issue (but we all know documentation is hard shit to get right.) When using the HAL, I've always looked at its source code when using HAL functions. Rather than trying to guess. May sound annoying, but that's as good as you'll get.

 

Offline helius

  • Super Contributor
  • ***
  • Posts: 3639
  • Country: us
Re: This code is so broken it some how fixes itself.
« Reply #6 on: January 28, 2023, 12:02:02 am »
Without refering to the code for HAL_SAI_Receive(), it's not possible to say whether it is correct, or not. I looked at the github page linked above by thm_w, but couldn't understand the workings of the function enough to say. I will note, in passing, that the function is written in an almost unbelievably stupid way, effectively testing whether it is in 8 or 16 bit mode for every single word transferred.

Simply because the code calls that function using a pointer cast to (uint8_t *) doesn't really mean anything. It definitely does not mean that "there are 8 8bit ints". It doesn't even mean that the ints are unsigned! All it does is mollify the type checker that the function is being called with arguments compatible with the function's prototype. The data representation is unchanged, and this cast has no effect on further uses of buffer later in the caller.

I've observed that many C programmers imbue the cast with magical, ineffable powers. For example, the code inside the loop which does
Code: [Select]
buffer[i] = (int16_t) sample;
uses a totally pointless cast expression. When the lvalue and the rvalue in an assignment are both arithmetic types, conversion is automatic.
 
The following users thanked this post: karpouzi9

Offline thm_w

  • Super Contributor
  • ***
  • Posts: 6349
  • Country: ca
  • Non-expert
Re: This code is so broken it some how fixes itself.
« Reply #7 on: January 28, 2023, 12:29:59 am »
I don't disagree that code is not optimized, I wouldn't call it unbelievably stupid though. I doubt the performance difference is significant.
Profile -> Modify profile -> Look and Layout ->  Don't show users' signatures
 

Offline helius

  • Super Contributor
  • ***
  • Posts: 3639
  • Country: us
Re: This code is so broken it some how fixes itself.
« Reply #8 on: January 28, 2023, 12:59:37 am »
I guess it's an opportunity for another gripe which is that hardware companies seldom understand software enough to use it properly. If the prototype for HAL_SAI_Receive() declared that the type of the Data argument was void *, then callers could pass in their 8-bit or 16-bit or 32-bit arrays without any cast at all. Declaring the type as uint8_t* is very misleading since audio samples are always signed!
 
The following users thanked this post: thm_w, paulca

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14445
  • Country: fr
Re: This code is so broken it some how fixes itself.
« Reply #9 on: January 28, 2023, 01:52:36 am »
I don't disagree that code is not optimized, I wouldn't call it unbelievably stupid though. I doubt the performance difference is significant.

Optimizers almost certainly optimize this by factoring the tests. So while not pretty to read, it probably doesn't make a difference at all, unless maybe you compile with zero optimization.

As to 'uint8_t *' for parameters taking buffers, except if said buffers actually make sense as a succession of 8-bit words, I'm fully against that style. It requires casting if the pointer passed is not actually a 'uint8_t *', and could be misleading indeed regarding any alignment requirement.

Use 'void *', and properly document if there is any alignment requirement or not. No cast needed, no possible confusion.

Note: also, it's pretty straightforward to check alignment at run time from a pointer, and I don't think they do it in the HAL (I would have to check that back.)

For instance, 16-bit aligned:
Code: [Select]
if (((uintptr_t) pointer) % 2 == 0) {... }
Not rocket science.
« Last Edit: January 28, 2023, 02:05:54 am by SiliconWizard »
 
The following users thanked this post: thm_w

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: This code is so broken it some how fixes itself.
« Reply #10 on: January 28, 2023, 09:30:48 am »
Quote
There are not 8 16bit ints there.  There are 8 8bit ints.
Why do you say that?  The buffer is clearly 8 16bit ints.  So it's all dependent on how HAL_SAI_RECEIVE works.


If HAI_SAI_RECEIVE() "Size" parameter is actually a count of data whose size was specified sometime back in initialization and is stored in the hsai handle data structure.
That appears to be the case.  So it's not your co-workers code that sucks, it's the HAL library (yet again.)

Here's an edited copy of the function (from some STM32F4xx HAL version), with some annotations added.

Code: [Select]
HAL_StatusTypeDef HAL_SAI_Receive(SAI_HandleTypeDef *hsai, uint8_t *pData, uint16_t Size, uint32_t Timeout)
{
   // BLAH BLAH
    hsai->pBuffPtr = pData;
    hsai->XferCount = Size;
   //   BLAH
    /* Receive data */
    while (hsai->XferCount > 0U) {
      if ((hsai->Instance->SR & SAI_xSR_FLVL) != SAI_FIFOSTATUS_EMPTY) {
//***** Pick up "item" size from handle, copy that many bytes
        if ((hsai->Init.DataSize == SAI_DATASIZE_8) && (hsai->Init.CompandingMode == SAI_NOCOMPANDING)) {
          (*hsai->pBuffPtr++) = hsai->Instance->DR;
        } else if (hsai->Init.DataSize <= SAI_DATASIZE_16) {
          *((uint16_t *)hsai->pBuffPtr) = hsai->Instance->DR;
//***** adjust the pointer by the size from the handle.
          hsai->pBuffPtr += 2U;
        } else {
          *((uint32_t *)hsai->pBuffPtr) = hsai->Instance->DR;
          hsai->pBuffPtr += 4U;
        }
//****** Decrement "Size" parameter value by 1
        hsai->XferCount--;
      }
    // More BLAH
    }

 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4032
  • Country: gb
Re: This code is so broken it some how fixes itself.
« Reply #11 on: January 28, 2023, 10:01:55 am »
It has many issues, not one.  The size of what SAI reads and how the loop goes through it appear to only be part of the problem. 

Anyway.  It shall be rewritten.  First off, the TX should be interrupt, the handling code should be in the ISR.  I shoudn't be calling the SAI HAL function, because I explicitly want to copy the contents of the SAI FIFO only.  I also don't want the FIFOs to block or be consumed if that makes sense, I just want them to free run.  If I read an input SAI at twice the sample frequency I want to receive duplicate FIFO contents.  If HAL decides to go around it's 1ms Tick delay loop it will get ugly.  I also really don't need it to check anything, at this stage all I want it to do really is give me the contents of the Rx FIFO, verbatim as 2x32bit ints.  I'll do the rest.  That should be a simple read and shift the 8 bytes out of the hardware array.  No need for HAL abstractions for that part.  I have a limited cycle budget obviously.

I just found it funny that it worked, like I wrote it out in more or less one go (i've played a lot with the eq params), ran it and it played music with EQ without distortion.  I've been using it on trial for a week now.  It was only when I read what I'd written I went, "Wait... hang on."
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4032
  • Country: gb
Re: This code is so broken it some how fixes itself.
« Reply #12 on: January 28, 2023, 10:03:20 am »
I've observed that many C programmers imbue the cast with magical, ineffable powers. For example, the code inside the loop which does
Code: [Select]
buffer[i] = (int16_t) sample;
uses a totally pointless cast expression. When the lvalue and the rvalue in an assignment are both arithmetic types, conversion is automatic.

I believe the term used is "explicit cast" where an "implicit cast" would have occurred anyway.  There is a reason why it exists and even has a name for the technique.
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 
The following users thanked this post: hans

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8168
  • Country: fi
Re: This code is so broken it some how fixes itself.
« Reply #13 on: January 28, 2023, 04:06:12 pm »
Being explicit over implicit is not "totally pointless"; code is written to communicate the intent to both human readers, and compilers.

I really struggle to see why most libraries are so poorly documented. I mean, given typical 3-4 arguments for a function, it takes maybe 5-10 minutes to write a simple comment which lists all the arguments and clearly explains what they are. Heck, I consider myself a lazy documentation writer yet my own internal functions are usually better documented than ones in widely used libraries.

There should never be confusion if a parameter is a count or size. Count should be named count, and size named size, and then the comment should read:
// number of items
or
// size in bytes

It's sad such a thing even needs to be discussed, but result of absolute laziness during code writing wastes a lot of time afterwards. Why don't standards like MISRA C mandate commenting the purpose and usage of each argument and return value of non-static functions?
« Last Edit: January 28, 2023, 04:08:36 pm by Siwastaja »
 
The following users thanked this post: Ed.Kloonk, hans, thm_w, SiliconWizard

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4032
  • Country: gb
Re: This code is so broken it some how fixes itself.
« Reply #14 on: January 28, 2023, 04:27:15 pm »
Being explicit over implicit is not "totally pointless"; code is written to communicate the intent to both human readers, and compilers.

Indeed.  It can also catch some fairly nasty surprise bugs should someone want to attempt changing the type of the receiver variable to see what happens.  Probably not a big concern in a single C function, but good habits are habits.
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: This code is so broken it some how fixes itself.
« Reply #15 on: January 28, 2023, 08:48:06 pm »
Quote
First off, the TX should be interrupt, the handling code should be in the ISR.  I shoudn't be calling the SAI HAL function
Ah, but those are "failures of design" rather than "failures of coding."
 

Offline Leeima

  • Contributor
  • Posts: 31
  • Country: gb
Re: This code is so broken it some how fixes itself.
« Reply #16 on: January 28, 2023, 09:11:39 pm »
I guess it's an opportunity for another gripe which is that hardware companies seldom understand software enough to use it properly.

In my experience I've noticed the inverse is often true! Software companies seldom understand hardware enough to use it properly!  ;)
 

Offline helius

  • Super Contributor
  • ***
  • Posts: 3639
  • Country: us
Re: This code is so broken it some how fixes itself.
« Reply #17 on: January 28, 2023, 09:21:31 pm »
From The Hacker's Dictionary:
Quote
finger-pointing syndrome: n.

All-too-frequent result of bugs, esp. in new or experimental configurations. The hardware vendor points a finger at the software. The software vendor points a finger at the hardware. All the poor users get is the finger.
 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4032
  • Country: gb
Re: This code is so broken it some how fixes itself.
« Reply #18 on: January 29, 2023, 02:59:10 pm »
Well, this is my second attempt at the same:

Caveats:  It's hardcoded for 1 Rx 1 Tx for now and fresh out of the fingers.

Code: [Select]
uint32_t rx1Buffer[2];
uint32_t tx1Buffer[2];

void SAI_Tx_Interrupt(SAI_HandleTypeDef *hsai) {
// Get rx1 it status
uint32_t ite = NVIC_GetEnableIRQ(SAI1_IRQn);
// Shhh it
NVIC_DisableIRQ(SAI1_IRQn);
// Move its FIFO to Tx buffer
tx1Buffer[0] = rx1Buffer[0];
tx1Buffer[1] = rx1Buffer[1];
        // Release it
if (ite) {
NVIC_EnableIRQ(SAI1_IRQn);
}
        EQ_Run(tx1Buffer);
////////////////////////////
/* Transmit 2x32 bits of data */
hsai->Instance->DR = tx1Buffer[0];
hsai->Instance->DR = tx1Buffer[1];
}

void SAI_Rx_Interrupt(SAI_HandleTypeDef *hsai) {

// Dont want the Tx reading as we write
        uint32_t ite = NVIC_GetEnableIRQ(SAI2_IRQn);
NVIC_DisableIRQ(SAI2_IRQn);
////////////////////////////
/* Receive 2x32 bits of data */
rx1Buffer[0] = hsai->Instance->DR;
rx1Buffer[1] = hsai->Instance->DR;
if (ite) {
NVIC_EnableIRQ(SAI2_IRQn);
}
}

Much less code.  A basic "borrow and strip" approach from HAL in most cases.  "Works" in that it through-puts music with EQ.  I have noticed one or two clicks, so it might need a bit of tweaking.

The handling of the actual 16bit (in this case) audio, now becomes much cleared.

The raw 32bit words contain right hand justified signed 16bit words.

Code: [Select]
void EQ_Run(uint32_t*buf) {
int16_t *buffer = (int16_t*)buf;
for (int i = 0; i < 4; i+=2) {
double sample = buffer[i];

Now that's a bit clearer, I can try 24bit.

EDIT:  I'm still a little concerned about:
   for (int i = 0; i < 4; i+=2) {

Which could be "left" justified surely?
« Last Edit: January 29, 2023, 03:04:02 pm by paulca »
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8168
  • Country: fi
Re: This code is so broken it some how fixes itself.
« Reply #19 on: January 29, 2023, 03:53:54 pm »
Geez, if you want to control how interrupts pre-empt each other, just set the priorities accordingly (NVIC_SetPriority(IRQn, priority)) once. Enabling and disabling other interrupt sources from other ISRs is an ugly hack, there is absolutely no need for this kind of crap, it very likely has some nasty corner cases.
« Last Edit: January 29, 2023, 04:11:09 pm by Siwastaja »
 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4032
  • Country: gb
Re: This code is so broken it some how fixes itself.
« Reply #20 on: January 29, 2023, 04:55:45 pm »
Geez, if you want to control how interrupts pre-empt each other, just set the priorities accordingly (NVIC_SetPriority(IRQn, priority)) once. Enabling and disabling other interrupt sources from other ISRs is an ugly hack, there is absolutely no need for this kind of crap, it very likely has some nasty corner cases.

I know.  Giving them equal prio and equal sub-prio and they shouldn't interrupt each other.

But....  the thought I had, was, when there are more than one Rx SAI, I wanted to limit the granularity of the interrupts, not disable them entirely.  EG:

Silence Rx1
   Copy Rx1 buffer
Release Rx1
Silence Rx2
  Copy Rx2 buffer
Release Rx2
Silence Rx3
  Copy Rx3 buffer
Release Rx3

Might be over kill / unnecessary... or might make things wose!
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4032
  • Country: gb
Re: This code is so broken it some how fixes itself.
« Reply #21 on: January 30, 2023, 05:18:36 pm »
Good news, bad news.

Good news, it works and after some tweaking and testing, it's verified as working.

Bad news.  1 stereo input + 6 peak filters = 69% duty cycle.  Actually that makes it sound better than it is, as many things upset it skipping into half sample rating the output to 24k.  That's 48k 16bit too.  Dreams of 96k and 24bit are dashed entirely.

I can optimize, but I'm not sure I'm going to make the following filter code at least twice as fast.??

Code: [Select]
float PeakFilter_Update(PeakFilter_t *filt, float in) {
filt->x[2] = filt->x[1];
filt->x[1] = filt->x[0];
filt->x[0] = in;

filt->y[2] = filt->y[1];
filt->y[1] = filt->y[0];

filt->y[0] = (filt->a[0] * filt->x[0] + filt->a[1] * filt->x[1]
+ filt->a[2] * filt->x[2]
+ (filt->b[1] * filt->y[1] + filt->b[2] * filt->y[2])) * filt->b[0];
return (filt->y[0]);
}
« Last Edit: January 30, 2023, 05:33:11 pm by paulca »
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4032
  • Country: gb
Re: This code is so broken it some how fixes itself.
« Reply #22 on: January 30, 2023, 05:25:51 pm »
Caps showing the Tx and Rx interrupt duty cycles. 

First with 2 samples.
Second with 8 samples.

As an aside.  I had a rather unpleasant practical in Nyquist aliasing and artefacts when I tripple checked the 24kHz output sample rate, but.... playing a 20kHz sine wave through it.  My recommendation is... Do not try this at home.  It's unpleasant. I don't think the PCM5102's filterless output filter liked it.  As best I could determine on an FFT it was warbing between 3.8 and 4.2 kHz!
« Last Edit: January 30, 2023, 05:30:15 pm by paulca »
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4032
  • Country: gb
Re: This code is so broken it some how fixes itself.
« Reply #23 on: February 01, 2023, 09:45:25 am »
It would seem I under-estimated just how well the code optimizes.  -Ofast and -O3 both take it from a 67% ish DC to about 21%.  Impressive.

That changes the outlook towards the positive.  I've already spoken too soon on this bit of code about 3 times now, but ... a basic test, non conclusive, suggests I might squeeze a 5 Band EQ + a 3 Band EQ and dual outputs and at 96kHz.

Testing should reveal.  I expect it might be tight, but in theory it can be tight and still work if interrupts are under control.
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline cv007

  • Frequent Contributor
  • **
  • Posts: 825
Re: This code is so broken it some how fixes itself.
« Reply #24 on: February 01, 2023, 02:43:45 pm »
The use of unions/structs can also be easier to use than casting, especially when done on a single type in various places. The casting gets eliminated and each use will be easier to understand (your 'casting' equivalent takes place only once when you create the union/struct). There are many ways one could go about this, but the following is one example-

Code: [Select]
static const uint32_t SAI_BUF_SIZE = 2;
typedef union {
    volatile uint32_t dr_u32;
    volatile int16_t eq_i16;
} sai_buffer_t;

sai_buffer_t rx1Buffer[SAI_BUF_SIZE];
sai_buffer_t tx1Buffer[SAI_BUF_SIZE];

//usage

//maybe always 2, don't know, but can use loop in any case as optimizer will figure it out and produce optimal code
for(int i = 0; i < SAI_BUF_SIZE; i++ ){
        tx1Buffer[i].dr_u32 = rx1Buffer[i].dr_u32;
}


EQ_Run( tx1Buffer );

void EQ_Run(sai_buffer_t* buf) {
for (int i = 0; i < SAI_BUF_SIZE; i++) {
double sample = buf[i].eq_i16;
                ...



Obviously lots of options and whether this one is better or not, I like to eliminate casting if there is a better option. In this case I would rather deal with the union than to have to (re)think about how the underlying data fits together when casting.
 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4032
  • Country: gb
Re: This code is so broken it some how fixes itself.
« Reply #25 on: February 01, 2023, 03:23:33 pm »
Obviously lots of options and whether this one is better or not, I like to eliminate casting if there is a better option. In this case I would rather deal with the union than to have to (re)think about how the underlying data fits together when casting.

True.  I was looking into unions for something else yesterday I can see how they can help with the constant duplicity of their types between signed 16 and unsigned 32.

It also depends on how the compiler optimises things.

That part of the code will change, as I have to try 24bit and you can't just "cast" right justified signed 24bit as easily as 16.
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline cv007

  • Frequent Contributor
  • **
  • Posts: 825
Re: This code is so broken it some how fixes itself.
« Reply #26 on: February 01, 2023, 04:04:10 pm »
Quote
I have to try 24bit and you can't just "cast" right justified signed 24bit as easily as 16.

Code: [Select]
typedef union {
    volatile uint32_t dr_u32;
    volatile int16_t eq_i16;
    volatile int32_t eq_i24 : 24;
} sai_buffer_t;

When used, the compiler will sign extend the eq_i24 to an int32_t.

edit-
could also have a single member which would work the same for i16/i24, each sign extending to i32 when used-

Code: [Select]
static const unsigned SAI_DATA_BITS = 16;

union sai_buffer_t {
    volatile uint32_t dr_u32;
    volatile int32_t data : SAI_DATA_BITS;
};
« Last Edit: February 01, 2023, 05:55:42 pm by cv007 »
 

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6239
  • Country: fi
    • My home page and email address
Re: This code is so broken it some how fixes itself.
« Reply #27 on: February 01, 2023, 05:32:57 pm »
I habitually use
Code: [Select]
typedef struct {
    union {
        double          d;
        float           f;
        uint64_t        u64;
        int64_t         i64;
        uint32_t        u32[2];
        int32_t         i32[2];
        uint16_t        u16[4];
        int16_t         i16[4];
        uint8_t         u8[8];
        int8_t          i8[8];
        unsigned char   uc[8];
        signed char     sc[8];
        char            c[8];
    };
} word64;

typedef struct {
    union {
        float           f;
        uint32_t        u32;
        int32_t         i32;
        uint16_t        u16[2];
        int16_t         i16[2];
        uint8_t         u8[4];
        int8_t          i8[4];
        unsigned char   uc[4];
        signed char     sc[4];
        char            c[4];
    };
} word32;
to examine the various parts of data elements, and reinterpret the storage as another type.  For example, ((word32){ .f = value }).u32 returns the 32-bit storage representation of float value (and the code kinda-sorta assumes IEEE 754 Binary64 and Binary32 floating point formats).
When optimized, the machine code simplifies either to nothing (ARMv7-a with hardware floating point support) or a single register move (x86-64), too.

If you have a buffer with 24-bit signed integer or fixed-point samples, you can easily expand it to 32-bit signed integer or fixed point using
Code: [Select]
static inline void i24_to_i32_quad(int32_t *dst, const uint32_t src0, const uint32_t src1, const uint32_t src2)
{
    *(dst++) = ((int32_t)( src0 << 8 )) >> 8;
    *(dst++) = ((int32_t)( (src1 << 16) | ((src0 >> 24) << 8) )) >> 8;
    *(dst++) = ((int32_t)( ((src1 >> 16) << 8) | (src2 << 24) )) >> 8;
    *(dst++) = ((int32_t)( src2 )) >> 8;
}

void i24_to_i32(int32_t *buf, const size_t count)
{
    const size_t  quads = (count / 4) + !!(count & 3);

    const uint32_t *src = (uint32_t *)buf + 3*quads;
    int32_t *dst = buf + 4*quads;

    while (dst > buf) {
        dst -= 4;
        src -= 3;
        i24_to_i32_quad(dst, src[0], src[1], src[2]);
    }
}
but the buffer size must be padded with zeroes to a multiple of 16 bytes (or count a multiple of 4), and have room for 4 bytes per 24-bit value.  As it progresses backwards from end of array to beginning of array, the conversion is done in-place, in units of 12 bytes in, 16 bytes out.  This does do sign extension correctly, too.

Note that it does expect (negative integer) >> N to be arithmetic shift right; it is on GCC and Clang.

With GCC and Clang, one can also use
Code: [Select]
typedef struct {
    union {
       unsigned int  u24:24;
       int           i24:24;
       uint8_t       u8[3];
       int8_t        i8[3];
       char          c[3];
    } __attribute__((packed));
} __attribute__ ((packed)) word24;

typedef struct {
    union {
        uint32_t    u32[3];
        int32_t     i32[3];
        word24      w24[4];
        uint16_t    u16[6];
        int16_t     i16[6];
        uint8_t     u8[12];
        int8_t      i8[12];
        char        c[12];
    };
} word24x4;

void unpack_word24x4(int32_t out[4], const uint32_t in[3])
{
    const word24x4  temp = { .u32 = { in[0], in[1], in[2] } };
    out[0] = temp.w24[0].i24;
    out[1] = temp.w24[1].i24;
    out[2] = temp.w24[2].i24;
    out[3] = temp.w24[3].i24;
}
but this relies on the packed type attribute limiting the size of the word24 type to exactly three bytes, with byte alignment; this is true for GCC and Clang, but is not "standard" C.  (Plus right shift with signed integer types being arithmetic shift right.)

As shown, unpack_word24x4() takes 12 bytes (in three 32-bit unsigned integers), and expands them to four signed 32-bit integers (with sign extension as expected).

Note that i24_to_i32_quad() is faster than unpack_word24x4() on 32-bit ARMv7 at least, if one examines the code at Compiler Explorer.
That is,
Code: [Select]
void unpack_word24x4(int32_t out[4], const uint32_t in[3])
{
    out[0] = ((int32_t)( in[0] << 8 )) >> 8;
    out[1] = ((int32_t)( (in[1] << 16) | ((in[0] >> 24) << 8) )) >> 8;
    out[2] = ((int32_t)( ((in[1] >> 16) << 8) | (in[2] << 24) )) >> 8;
    out[3] = ((int32_t)( in[2] )) >> 8;
}
is faster but harder to maintain than the other version above, on ARMv7 at least; both do yield the exact same results.
« Last Edit: February 01, 2023, 05:37:07 pm by Nominal Animal »
 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4032
  • Country: gb
Re: This code is so broken it some how fixes itself.
« Reply #28 on: February 01, 2023, 05:44:40 pm »
Most of that went over my head.  However, the basic filters / processing I am using are only there because they were the first ones I found in a palatable (theivable) form.

I really should try and get biquad equivalents and use the arm math library with DSP optimisations.  I might for example get a wider range of high order filters.

So I will return and review the data packing/unpacking at the same time.

For now I think I'm going to do a blunt ignorant shift and or like the HAL code does :)

On the optimisation...  my assumption was that the C compiler would include things like hardware DP FPU and DSP extensions if they existed with no optimisations.

However now I'm wondering if it actually disables all of that so you can step over some floating point calcs in the debugger, where as if they are offloaded to teh FPU that becomes difficult?

I mean how much "extra" stuff is turned on for -O3, -Ofast
« Last Edit: February 01, 2023, 05:46:51 pm by paulca »
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6239
  • Country: fi
    • My home page and email address
Re: This code is so broken it some how fixes itself.
« Reply #29 on: February 01, 2023, 08:45:52 pm »
Most of that went over my head.
Sorry!  :-[

To summarise, if you have arrays of 24-bit samples, three bytes per sample, I showed that it is best to handle them in groups of four samples (12 bytes), and very efficiently sign-expand to 32-bit.  The rest was details, and two different approaches you can choose from, although one has better performance.

On the optimisation...  my assumption was that the C compiler would include things like hardware DP FPU and DSP extensions if they existed with no optimisations.
STM32F4 series has an ARM Cortex-M4 core, and thus ARMv7E-M architecture.
ST's AN4841: Digital signal processing for STM32 microcontrollers using CMSIS describes its DSP features (but also includes some of the STM32F7 series).
Since GCC and Clang try to implement ARM C Language Extensions, ACLE (ihi0053) is also useful.

In general, C is not an easy language to automatically SIMD-vectorize.  For STM32F4, there are instructions that add, subtract, and/or multiply using pairs of signed or unsigned 16-bit values in each register, or quartets of 8-bit signed or unsigned values in each registers.

As of right now, you do need to use the ACLE macros to SIMD-vectorize your C code.

For example, let's say you have an array of 16-bit signed integer samples data, another array of 16-bit signed integer coefficients coefficient, both arrays having count elements (even; multiple of two count), and both arrays being 32-bit aligned (__attribute__((align (4)))), you can obtain the 64-bit sum (which cannot overflow since each product is between -0x3FFF8000 and +0x40000000, i.e. 31 bits):
Code: [Select]
#include <arm_acle.h>

typedef struct {
    union {
        uint32_t   u32;
        int32_t    i32;
        uint16_t   u16[2];
        int16_t    i16[2];
        uint16x2_t u16x2;
        int16x2_t  i16x2;
        uint8_t     u8[4];
        int8_t      i8[4];
        uint8x4_t  u8x4;
        int8x4_t   i8x4;
        char       c[4];
    };
} word32;

int64_t sum_i16xi16(const int16_t *data, const int16_t *coeff, const uint32_t count)
{
    const word32 *dz = (const word32 *)data + (count / 2);
    const word32 *d = (const word32 *)data;
    const word32 *c = (const word32 *)coeff;
    int64_t  result = 0;

    while (d < dz)
        result = __smlald((*(d++)).i16x2, (*(c++)).i16x2, result);

    return result;
}
using -O2 -march=armv7e-m+fp -mtune=cortex-m4 -mthumb (with gcc; use -Os instead of -O2 with clang).

The inner loop will compile to
    .L3:
        ldr     r0, [r3], #4        ; 2 cycles
        ldr     r2, [r1], #4        ; 2 cycles
        cmp     ip, r3, #1          ; 1 cycle
        smlald  r4, r5, r2, r0      ; 1 cycle
        bhi     .L3                 ; 2-5 cycles when taken
which by my count should yield 8 cycles, or 4 cycles per sample, or equivalently 0.25 samples per cycle.
Furthermore, the result is trivial to shift and then clamp to the proper range, so it is just a few additional cycles per buffer to adjust for the fractional bits in samples and/or coefficients.  Me Like Dis.

If count is odd, the last sample and coefficient will be ignored.  I suggest you ensure your array sizes are even, and set the final element of both arrays to zero, so you can round count up to the next even number without affecting the result.

The inner loop is equivalent to
Code: [Select]
    for (uint32_t i = 0; i < count/2; i++)
        result = __smlald(d[i].i16x2, c[i].i16x2, result);
but is written in a form that both gcc and clang can optimize to the above code or equivalent.
Each loop iteration computes two 16-bit products, summing them to result.
« Last Edit: February 01, 2023, 08:47:42 pm by Nominal Animal »
 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4032
  • Country: gb
Re: This code is so broken it some how fixes itself.
« Reply #30 on: February 01, 2023, 09:24:12 pm »
It's an H7, but I get the idea :)
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4032
  • Country: gb
Re: This code is so broken it some how fixes itself.
« Reply #31 on: February 01, 2023, 09:26:22 pm »
While I have you :)  (sorry).

Is there a quick win to gain access to the DTCM SRAM?  Googling finds me long linker script modifications.

Can't I just ignorantly use hard coded offsets by casting an integer like 0x200000 to a pointer?  I'm only declaring 4 tiny arrays.
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline thm_w

  • Super Contributor
  • ***
  • Posts: 6349
  • Country: ca
  • Non-expert
Re: This code is so broken it some how fixes itself.
« Reply #32 on: February 01, 2023, 10:21:59 pm »
Guess its harder than ITCM for some reason?

Quote
@note Some code parts can be executed in the ITCM-RAM (64 KB) which decrease critical task execution time, compared to code execution from Flash memory. This feature can be activated using '#pragma location = ".itcmram"' to be placed above function declaration, or using the toolchain GUI (file options) to execute a whole source file in the ITCM-RAM.

@note The application needs to ensure that the SysTick time base is always set to
      1 millisecond to have correct operation.

@Note For the Cortex-M7, if the application is using the DTCM/ITCM memories (@0x20000000/ 0x0000000: not cacheable and only accessible
      by the Cortex-M7 and the  MDMA), no need for cache maintenance when the Cortex M7 and the MDMA access these RAMs.
      If the application needs to use DMA(or other masters) based access or requires more RAM, then  the user has to:
              - Use a non TCM SRAM. (example : D1 AXI-SRAM @ 0x24000000)
              - Add a cache maintenance mechanism to ensure the cache coherence between CPU and other masters(DMAs,DMA2D,LTDC,MDMA).
              - The addresses and the size of cacheable buffers (shared between CPU and other masters)
                must be properly defined to be aligned to L1-CACHE line size (32 bytes).

https://github.com/STMicroelectronics/STM32CubeH7/blob/master/Projects/STM32H747I-EVAL/Templates_LL/readme.txt
https://community.st.com/s/question/0D53W00001sKgU0SAK/stm32cubeide-doesn-not-support-declaring-variable-in-dtcm-on-stm32h7-directly
Profile -> Modify profile -> Look and Layout ->  Don't show users' signatures
 

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6239
  • Country: fi
    • My home page and email address
Re: This code is so broken it some how fixes itself.
« Reply #33 on: February 01, 2023, 11:18:13 pm »
It's an H7, but I get the idea :)
H7 is Cortex-M7, so also ARMv7E-M, but with newer FP with double precision support (+fp.dp), so you'll want to use
    gcc -O2 -march=armv7e-m+fp.dp -mtune=cortex-m7 -mthumb ...
or
    clang -Os -target=arm-arm-none-eabi -mcpu=cortex-m7+fp -mthumb ...
to compile the code.  I use these for i.MX RT1062 (Teensy 4.x), too: it has a very similar ARM Cortex-M7 core (but NXP, not ST).

The same documentation applies.
The cycle counts are likely completely off, though, since I can find them only for Cortex-M0 and Cortex-M4, not Cortex-M7.

Is there a quick win to gain access to the DTCM SRAM?
You can use a fixed pointer to access DTCM SRAM.  Thing is, how will you ensure the addresses you use are not used by the compiler and/or linker already?

That is why modifying your linker script, typically by declaring a section that will be placed in the correct addresses, is used.  If you ensure section "dtcm" is placed in the correct region, then
    static type arrayname[count] __attribute__((section "dtcm"));
puts the array at the correct section, so it will end up in the correct memory addresses, and the linker will decide its exact address.  Without modifications or support from the startup (startup.s, I believe), it will be uninitialized (have random contents) after bootup.

If you can show the linker script you already use (look for "Linker script for STM32H7 series", file name probably ends with .ld), and it does contain the "This software component is licensed by ST under BSD 3-Clause license", then you are allowed to publish it here and I can show how to modify it to add such a section.  Most likely, it is just one added line.
 
The following users thanked this post: paulca

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4032
  • Country: gb
Re: This code is so broken it some how fixes itself.
« Reply #34 on: February 14, 2023, 06:25:46 pm »
Looked back into this today.

It would seem the linker script is already setup to use the DTC RAM.  However, most things get allocated in RAM_D1 'above' it.

Linker script attached.  I believe it's BSD license.

Any quick wins?
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Online eutectique

  • Frequent Contributor
  • **
  • Posts: 386
  • Country: be
Re: This code is so broken it some how fixes itself.
« Reply #35 on: February 14, 2023, 06:58:05 pm »
However, most things get allocated in RAM_D1 'above' it.

What are these things? What the map file shows? Any insights from arm-none-eabi-objdump -h yourfile.elf ? Are some symbols matching .data.* pattern located in RAM_D1 and some in DTCMRAM?

BTW, your linker script does not have section RAM_D1. Are you sure about it?
 

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4032
  • Country: gb
Re: This code is so broken it some how fixes itself.
« Reply #36 on: February 14, 2023, 07:37:34 pm »
However, most things get allocated in RAM_D1 'above' it.

What are these things? What the map file shows? Any insights from arm-none-eabi-objdump -h yourfile.elf ? Are some symbols matching .data.* pattern located in RAM_D1 and some in DTCMRAM?

BTW, your linker script does not have section RAM_D1. Are you sure about it?

Sorry, there are two linkers.  The other one is attached,.

The "other" things are general variables, buffers, (.bss and .data I will assume)

The idea is to create an labelled section like .dtcm and allocate some of the buffers in there.
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Online eutectique

  • Frequent Contributor
  • **
  • Posts: 386
  • Country: be
Re: This code is so broken it some how fixes itself.
« Reply #37 on: February 14, 2023, 08:47:38 pm »
The idea is to create an labelled section like .dtcm and allocate some of the buffers in there.

Ok, then create it!

Add the following to the linker script:

Code: [Select]
.dtcm_section {
    KEEP(*dtcm*)
} > DTCMRAM

and the following, as suggested by Nominal Animal, to your source code:

Code: [Select]
static type arrayname[count] __attribute__((section "dtcm"));
 
The following users thanked this post: paulca

Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4032
  • Country: gb
Re: This code is so broken it some how fixes itself.
« Reply #38 on: February 21, 2023, 04:50:30 pm »
Thanks!  Got it working.

Adding /modified a few bits though.

Code: [Select]
__attribute__((section(".dtcm_section"))) static uint32_t rx1Buffer[8];
 __attribute__((section(".dtcm_section"))) static uint32_t rx2Buffer[8];
 __attribute__((section(".dtcm_section"))) static uint32_t tx1Buffer[8] ;
 __attribute__((section(".dtcm_section"))) static uint32_t tx2Buffer[8];


Code: [Select]
  .dtcm_section :
  {
      . = ALIGN(4);
      KEEP(*dtcm*)
      . = ALIGN(4);
  } > DTCMRAM

I might test how much impact it has with the scope next time I'm checking the performance/timing.
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 
The following users thanked this post: eutectique


Share me

Digg  Facebook  SlashDot  Delicious  Technorati  Twitter  Google  Yahoo
Smf