Author Topic: Your experience in embedded software engineering hell  (Read 2470 times)

Offline crayonTopic starter

  • Newbie
  • Posts: 7
  • Country: mx
Your experience in embedded software engineering hell
« on: March 27, 2020, 07:47:50 pm »
Hi!

I'm interested in hearing about some of your worst experiences working in embedded software engineering, and what lessons you learned personally that made you better at sorting out those kinds of problems.


Regards.
 
The following users thanked this post: boB

Offline dmills

  • Super Contributor
  • ***
  • Posts: 2093
  • Country: gb
Re: Your experience in embedded software engineering hell
« Reply #1 on: April 01, 2020, 08:58:55 am »
I am currently chasing the 'fun' of a random seg fault that occurs about once a day.....
 

Offline bd139

  • Super Contributor
  • ***
  • Posts: 23021
  • Country: gb
Re: Your experience in embedded software engineering hell
« Reply #2 on: April 01, 2020, 09:17:43 am »
Two pet hates at the moment:

1. STM32CubeMX keeps deleting bits of code it says it won't when I update pin assignments >:(
2. Deciphering timer functionality on any MCU.
 

Offline NivagSwerdna

  • Super Contributor
  • ***
  • Posts: 2495
  • Country: gb
Re: Your experience in embedded software engineering hell
« Reply #3 on: April 01, 2020, 09:22:48 am »
Spent two weeks once looking for a corruption in the alternate carry flag (Z80A) whilst writing some of MSX-DOS... turned out I had a typo: a "0" which should have been an "O".

Joy  :)
 

Offline AndyC_772

  • Super Contributor
  • ***
  • Posts: 4228
  • Country: gb
  • Professional design engineer
    • Cawte Engineering | Reliable Electronics
Re: Your experience in embedded software engineering hell
« Reply #4 on: April 01, 2020, 09:42:00 am »
https://www.eevblog.com/forum/microcontrollers/most-annoying-bug/

I'd also like to nominate the hardware bug in the STM32F7 which means speculative QSPI memory fetches can occur on exit from interrupts, even when QSPI is configured in a mode which means transfers should only be initiated manually.

Offline cdwijs

  • Regular Contributor
  • *
  • Posts: 57
Re: Your experience in embedded software engineering hell
« Reply #5 on: May 20, 2020, 07:45:22 pm »
I'm now writing medical software. The fun part is to program the prototype code that does *almost* what I need it to do.
The not-so-fun part is to write all the state diagrams, the unit tests, and make all the code pass all the tests. This is all hand written, and therefore prone to human error.

See this question for more detail:
https://www.eevblog.com/forum/programming/syncing-c-state-machine-with-doxygen-and-graphwiz-diagrams/

Cheers,
Cedric
 

Offline Chris42

  • Contributor
  • Posts: 32
  • Country: pl
Re: Your experience in embedded software engineering hell
« Reply #6 on: July 08, 2020, 10:27:06 pm »
The worst hell that I experienced was when I was building a portable weather monitoring station. It was a little hobby project and the program was written using the Arduino framework.

The hell started when I noticed that the MCU was randomly hanging up. That was happening once every 1-2 days. After many days of debugging I finally traced the bug back to one of the libraries that I used. I don't remember the exact details but for some reason it was overriding one of my interrupt settings, which was causing the interrupt routine for the wind sensor to keep executing even when it was not supposed to. The interrupt would just keep getting triggered so the whole thing would get stuck servicing that event and not doing anything else.

That was an absolute nightmare to debug and, needless to say, that was the last time I used the Arduino framework for any project.
 

Offline Whales

  • Super Contributor
  • ***
  • Posts: 1899
  • Country: au
    • Halestrom
Re: Your experience in embedded software engineering hell
« Reply #7 on: July 08, 2020, 11:36:11 pm »
My story is relatively minor, not hell but still frustrating.  On an 8051 STC (8-bit) micro.

Why does running this code, outside of the interrupt, make the interrupt stop firing for X time? Stages I went through:

  • Interrupt disable: none being used in the software.
  • Some weird contention over volatile vars shared between the interrupt and normal code?  Shouldn't be, there are no locking instructions in this ISA and nothing complicated is being done.
  • Fetching from data/flash memory leading to stalls: my most likely culprit for a while, as replacing the fetches with dummy data seemed to fix the problem.  Nothing in the datasheet suggested that arbitrary data/flash reads would do something like this.

It took me a week but I eventually discovered what the problem actually was.  A solder bridge behind two pins on my SMD DAC.  Whenever certain data patterns were read and outputted one of the data lines would get stuck/shorted to another line (making it look like my interrupt had stopped firing).  I was lucky that it eventually got into a state where the output on a certain pin became 3-level (ie two resistively summed digital output pins), this made it very obvious that I needed to look for a hardware bug, but before that I had no clues. 

Offline Howardlong

  • Super Contributor
  • ***
  • Posts: 5319
  • Country: gb
Re: Your experience in embedded software engineering hell
« Reply #8 on: July 09, 2020, 12:40:58 am »
Microchip's PIC32MZ EC series, which originally promised simultaneous-sampling ADCs at circa 28 MSa/s... it took them two years to accept the ADC was barely capable of 100 kSa/s. It was an expensive mistake for a project I'd invested a lot of time in. The delays meant I shelved the project semi-permanently, but the lack of acceptance from Microchip that it was crap kept me hanging on, gullible fool that I was.

On the flip side it did mean I became reasonably adept on a number of vendors' Cortex M offerings though.
 

Offline NivagSwerdna

  • Super Contributor
  • ***
  • Posts: 2495
  • Country: gb
Re: Your experience in embedded software engineering hell
« Reply #9 on: July 09, 2020, 09:43:03 am »
Actually generally dodging Microchip Silicon Errata and also Compiler Errata should probably be added to the list.  :)
 

Offline capt bullshot

  • Super Contributor
  • ***
  • Posts: 3033
  • Country: de
    • Mostly useless stuff, but nice to have: wunderkis.de
Re: Your experience in embedded software engineering hell
« Reply #10 on: July 09, 2020, 10:34:47 am »
What hell?

Dealing quite a lot with various uC, professionally (@work) and hobby (@home), there's one common thing: They all have bugs, quirks and errata, the more complex the more issues to deal with, sometimes it's just a bug in a compiler you've discovered, sometimes it's an interesting HW behaviour and the manufacturer publishes the errata one year after you discovered and fixed that for your project, ...

Some of my experience and habits I adopted (quite outdated now, and quite a non-representative random choice taken): https://wunderkis.de/thestm32files.html
Safety devices hinder evolution
 

Offline Berni

  • Super Contributor
  • ***
  • Posts: 4953
  • Country: si
Re: Your experience in embedded software engineering hell
« Reply #11 on: July 09, 2020, 12:30:28 pm »
A simple one is "Don't assume stack allocated structures start off as zeroes"

Ran into this on an STM32 too, due to the HAL library init functions constantly wanting big structures of parameters passed into them. This often makes you temporarily allocate a local struct to populate and feed into the function. But on a big struct you might not set all of the fields. It works most of the time, but sooner or later the stack will contain some non-zero value in one of the fields you didn't set, causing the HAL driver to do something stupid for apparently no reason.
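
A minimal sketch of the trap described above. The `pin_config_t` struct and `pin_init()` function are made up for illustration and are not taken from ST's HAL. The `memset` with 0xA5 only simulates stale stack contents, since reading a genuinely uninitialised local would be undefined and not reproducible.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical parameter struct, standing in for a HAL-style init struct. */
typedef struct {
    uint32_t mode;
    uint32_t pull;
    uint32_t speed;
    uint32_t alternate; /* the field the caller forgets to set */
} pin_config_t;

/* Hypothetical driver init: returns 0 on success, -1 if the optional
 * field holds a value it does not understand. */
int pin_init(const pin_config_t *cfg)
{
    return (cfg->alternate == 0u) ? 0 : -1;
}

/* Risky pattern: only some fields are assigned. The memset stands in
 * for whatever the stack happened to contain at the time. */
int configure_without_clearing(void)
{
    pin_config_t cfg;
    memset(&cfg, 0xA5, sizeof cfg); /* simulated stale stack contents */
    cfg.mode = 1u;
    cfg.pull = 2u;
    cfg.speed = 3u;
    return pin_init(&cfg); /* alternate is still 0xA5A5A5A5 */
}

/* Safer pattern: zero every member first, then set what you need. */
int configure_with_clearing(void)
{
    pin_config_t cfg = {0};
    cfg.mode = 1u;
    cfg.pull = 2u;
    cfg.speed = 3u;
    return pin_init(&cfg); /* alternate is 0 */
}
```

In this sketch `configure_without_clearing()` fails while `configure_with_clearing()` succeeds, with the `= {0}` initialiser as the only difference. On real hardware the first version may well work for weeks, which is what makes it hard to find.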

This also leads into not trusting ST's HAL drivers; they do very stupid stuff sometimes. Use them to get off the ground and then write your own optimized driver.
 

Offline capt bullshot

  • Super Contributor
  • ***
  • Posts: 3033
  • Country: de
    • Mostly useless stuff, but nice to have: wunderkis.de
Re: Your experience in embedded software engineering hell
« Reply #12 on: July 09, 2020, 12:38:42 pm »
A simple one is "Don't assume stack allocated structures start off as zeroes"

What's your background? In embedded one can never assume anything is zeroed before usage unless you've done that yourself. Okay, most boilerplate startup code linked in by default does initialize your data and bss, but one can never be sure without having checked that.
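
For anyone who hasn't looked inside their startup file, here is a rough sketch of what that default initialization typically amounts to. The function names and parameters are made up for illustration. Real startup code usually takes the addresses from linker-script symbols, whose names vary between toolchains, so they are passed in as plain pointers here to keep the sketch runnable on a PC.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy initialised variables (.data) from their load image, normally in
 * flash, to their run-time location in RAM. */
void startup_copy_data(uint32_t *ram_dst, const uint32_t *flash_src,
                       size_t words)
{
    for (size_t i = 0; i < words; i++) {
        ram_dst[i] = flash_src[i];
    }
}

/* Clear zero-initialised variables (.bss). If this step is missing or
 * skipped, globals without an initialiser hold whatever was in RAM. */
void startup_zero_bss(uint32_t *bss_start, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        bss_start[i] = 0u;
    }
}
```

Note that neither step touches the stack, so locals stay uninitialised regardless of what the startup code does.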
Safety devices hinder evolution
 

Offline bd139

  • Super Contributor
  • ***
  • Posts: 23021
  • Country: gb
Re: Your experience in embedded software engineering hell
« Reply #13 on: July 09, 2020, 12:52:07 pm »
Yeah my go to is still AVR because of that.
 

Offline Berni

  • Super Contributor
  • ***
  • Posts: 4953
  • Country: si
Re: Your experience in embedded software engineering hell
« Reply #14 on: July 09, 2020, 12:54:23 pm »
A simple one is "Don't assume stack allocated structures start off as zeroes"

What's your background? In embedded one can never assume anything is zeroed before usage unless you've done that yourself. Okay, most boilerplate startup code linked in by default does initialize your data and bss, but one can never be sure without having checked that.

The initialization code that STM CubeMX generates defines a struct at the start, fills the fields one by one and then passes it to the init function. It never clears the struct before use and sometimes leaves an occasional field untouched before passing it.

At one point I copy-pasted such code to do a quick thing; it worked for what I needed, then later on mysteriously broke. And of course this was the last place I would think to look, so it was a wild goose chase before it was fixed for good.
 

Offline capt bullshot

  • Super Contributor
  • ***
  • Posts: 3033
  • Country: de
    • Mostly useless stuff, but nice to have: wunderkis.de
Re: Your experience in embedded software engineering hell
« Reply #15 on: July 09, 2020, 01:10:41 pm »
A simple one is "Don't assume stack allocated structures start off as zeroes"

What's your background? In embedded one can never assume anything is zeroed before usage unless you've done that yourself. Okay, most boilerplate startup code linked in by default does initialize your data and bss, but one can never be sure without having checked that.

The initialization code that STM CubeMX generates defines a struct at the start, fills the fields one by one and then passes it to the init function. It never clears the struct before use and sometimes leaves an occasional field untouched before passing it.

At one point I copy-pasted such code to do a quick thing; it worked for what I needed, then later on mysteriously broke. And of course this was the last place I would think to look, so it was a wild goose chase before it was fixed for good.

Yes, noticed that behaviour too. Luckily didn't get trapped yet or just didn't notice - don't know. Anyway, thanks, one more thing to consider when chasing mystery bugs.

Another anecdotal one:
Replaced (by soldering) the STM32F767 on a nucleo board by the newer silicon revision due to that ethernet MAC related bug with no workaround. Guess what, after replacing the chip, ethernet didn't work at all. After one and a half day of bug hunting I noticed I accidentally removed one of the jumpers from the board while doing the soldering job. Ethernet works fine with the jumper plugged into place again. And best: Did this (unnoticed removal of that jumper) to a second board before I found the root cause ;)
Safety devices hinder evolution
 

Offline Howardlong

  • Super Contributor
  • ***
  • Posts: 5319
  • Country: gb
Re: Your experience in embedded software engineering hell
« Reply #16 on: July 09, 2020, 02:18:28 pm »
I've found that despite being like a dog with a bone, one of the best ways to fix a problem is to sleep on it.
 

Offline Berni

  • Super Contributor
  • ***
  • Posts: 4953
  • Country: si
Re: Your experience in embedded software engineering hell
« Reply #17 on: July 09, 2020, 02:58:59 pm »
Another anecdotal one:
Replaced (by soldering) the STM32F767 on a nucleo board by the newer silicon revision due to that ethernet MAC related bug with no workaround. Guess what, after replacing the chip, ethernet didn't work at all. After one and a half day of bug hunting I noticed I accidentally removed one of the jumpers from the board while doing the soldering job. Ethernet works fine with the jumper plugged into place again. And best: Did this (unnoticed removal of that jumper) to a second board before I found the root cause ;)

I was using Ethernet on an STM32H7 and after a lot of hair pulling discovered that the supplied HAL drivers have the transmitting functionality completely broken. It would receive Ethernet frames perfectly fine, while transmitting would return success with nothing coming out on the bus. Eventually found out that a carefully placed delay inside one of the HAL Ethernet functions fixes it. No idea why: it happens with all caching disabled, it happens at other clock speeds, it just simply seems like you need to wait a tiny bit before telling the DMA to start shoveling data into it.

Perhaps this is now in the errata, I have not looked, but not that it matters, because the new revision of that STM32H7 chip includes changes like changing ADC clock dividers around for no reason and juggling USB peripheral registers around. Yet it is still being sold under the same part number, even though any old firmware using the ADC or USB will not run correctly on the new revision. Oh, and it also changed the CPU clock speed from 400 to 480 MHz in that revision.
 

Offline capt bullshot

  • Super Contributor
  • ***
  • Posts: 3033
  • Country: de
    • Mostly useless stuff, but nice to have: wunderkis.de
Re: Your experience in embedded software engineering hell
« Reply #18 on: July 09, 2020, 08:04:55 pm »
I was using Ethernet on an STM32H7 and after a lot of hair pulling discovered that the supplied HAL drivers have the transmitting functionality completely broken.

Yes, I'm pretty annoyed by the STM32H7 Ethernet MAC, just because it's a completely new one, different from the rest of the STM32 family. I'd need a new driver for my framework based on ChibiOS and LWIP; none is available yet, but I can't be bothered to roll my own or port the HAL-supplied driver to ChibiOS. Somewhere else I read that the supplied HAL is in a state somewhere between totally broken and having subtle bugs that show up under certain circumstances only (of which not fully using the available performance is of least concern to me, as long as it works at all). Scary things like dependencies on the memory configuration used were mentioned. Reading info like that didn't motivate me either, so the project is put aside for now.
« Last Edit: July 09, 2020, 08:07:46 pm by capt bullshot »
Safety devices hinder evolution
 

Offline RJSV

  • Super Contributor
  • ***
  • Posts: 2121
  • Country: us
Re: Your experience in embedded software engineering hell
« Reply #19 on: August 11, 2020, 05:52:49 am »
My story is much simpler to recall and tell... However, the solution? Couldn't even say if there is, likely, a solution.

   'Farnsworth' owned, like, 40% of the little embedded products unit. ... Mr. Jay 'Farnsworth', and he dabbled in coding the 8/16 bit. In fact, Jay headed that team.
   Every time somebody asked: 'Why is it (the embedded code) doing it THAT WAY...?'

  Answer was always, deadpan delivered:
"Jay did it that way...".  End of inquiry.
  Got old, I had to listen to that one-liner about 55 times. OR, as JAY would write: '0037h times'
 

