
Parsing UART packets / state machines / packed structs


Nominal Animal:

--- Quote from: Siwastaja on December 07, 2022, 08:32:20 am ---
--- Quote from: Nominal Animal on December 07, 2022, 04:38:23 am ---In other words, the one wrong use case of packed structs is to avoid using accessor functions.
--- End quote ---
One could also claim that having one accessor function per member for validation is not always the best strategy. Sometimes validity is in combination of variables, for example: assert(a<10 && b<10 && (a+b)<15).
--- End quote ---
I agree.  It is equally valid to just verify all the fields at some point.
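As a minimal sketch of verifying all the fields in one place, including a constraint that spans two of them: the struct, its field names, and the limits below are hypothetical, borrowed only from the assert() example in the quote.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical received message; field names and limits are illustrative only. */
struct msg {
    uint8_t a;
    uint8_t b;
};

/* Returns true only if each field is in range AND the combined constraint holds. */
static bool msg_is_valid(const struct msg *m)
{
    if (m->a >= 10)
        return false;
    if (m->b >= 10)
        return false;
    /* Cross-field rule: a per-member accessor could not check this one. */
    if (m->a + m->b >= 15)
        return false;
    return true;
}
```

A caller would run msg_is_valid() once, right after reception, and reject or report the message instead of asserting, since bad input from outside is not a programming error.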

In my own code, when the structures are accessed randomly –– so, a tree or other structured data thingy –– I sometimes unpack and validate the structures at reception/reading, because that tends to have the best cache footprint and I/O throughput.  The idea being, since we touch the data anyway, we could make it cheaper to access.

Similarly, when receiving data via a UART, I prefer methods that update checksums and calculate command hashes on the fly, online.
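A rough sketch of the "on the fly" idea, as a byte-at-a-time receive state machine that updates a running checksum as each byte arrives.  The frame layout (start byte 0xA5, one length byte, payload, one XOR checksum byte over length and payload) and all names are assumptions made up for this illustration, not a protocol anyone in this thread described.  A command hash could be accumulated in the same place as the checksum.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_SOF      0xA5u   /* assumed start-of-frame marker */
#define FRAME_MAX_LEN  16u     /* assumed maximum payload length */

enum rx_state { RX_WAIT_SOF, RX_LEN, RX_PAYLOAD, RX_CHECKSUM };

struct rx {
    enum rx_state state;
    uint8_t len;                     /* payload length announced by the sender */
    uint8_t pos;                     /* payload bytes stored so far */
    uint8_t sum;                     /* running XOR over length and payload */
    uint8_t payload[FRAME_MAX_LEN];
};

static void rx_init(struct rx *r)
{
    r->state = RX_WAIT_SOF;
    r->len = 0;
    r->pos = 0;
    r->sum = 0;
}

/* Feed one received byte.  Returns true exactly when a complete frame with a
   matching checksum is available in r->payload[0 .. r->len-1]. */
static bool rx_feed(struct rx *r, uint8_t byte)
{
    switch (r->state) {
    case RX_WAIT_SOF:
        if (byte == FRAME_SOF)
            r->state = RX_LEN;
        return false;

    case RX_LEN:
        if (byte > FRAME_MAX_LEN) {  /* validate the length before trusting it */
            r->state = RX_WAIT_SOF;
            return false;
        }
        r->len = byte;
        r->pos = 0;
        r->sum = byte;               /* checksum starts with the length byte */
        r->state = (byte > 0) ? RX_PAYLOAD : RX_CHECKSUM;
        return false;

    case RX_PAYLOAD:
        r->payload[r->pos++] = byte;
        r->sum ^= byte;              /* updated online, no second pass needed */
        if (r->pos == r->len)
            r->state = RX_CHECKSUM;
        return false;

    case RX_CHECKSUM:
        r->state = RX_WAIT_SOF;
        return byte == r->sum;
    }

    r->state = RX_WAIT_SOF;          /* unknown state: resynchronize */
    return false;
}
```

The receive interrupt or polling loop would call rx_feed() for each byte; by the time the last byte is in, the frame is already known to be good or bad, and no packed struct has been trusted before its length and checksum were checked.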

--- Quote from: Siwastaja on December 07, 2022, 08:32:20 am ---In fact, I don't think there is any fundamental difference in verifying the input values in a "normal" function in a program, or input values coming in a struct that was received through UART.
--- End quote ---

So much of the code I write is centered around working with structured data, that I find the idea that verifying input could be considered somehow special or different, completely bonkers.

I like to think of all data as suspicious, because that way any oddity or issue in processing that cannot be compensated for becomes an issue for the user to deal with.  I absolutely hate programs that silently corrupt data, because "those errors occur so rarely it is not worth it to check for them", because with my luck, it is my most precious data that gets silently corrupted.  No, I want programs to do their best, but also tell me if something odd or unexpected happened.

In an embedded appliance, especially microcontrollers, it can be a difficult user interface question to decide how to handle communications errors.  But, if you start with "I'll worry about that later", then you basically decide to handle them randomly, because that kind of functionality is a design issue, and not something that can be bolted on top later on.  It has never been successfully fixed later on, only semi-patched with ugly kludges.
Security is like that: it is either designed in, or it does not exist.

Key advice.
Validate your inputs.

A gigantic number of bugs, crashes (that can lead to death) and security exploits have come, do come, and will come from not validating inputs.

Even a very large number of memory-related bugs are linked to not properly validated inputs. That's the case for - I'd dare say - most buffer overflows.

Yet it looks like most quality-oriented discussions and programming language features these days obsess over memory access and management, and almost nothing about input validation (and more generally invariant checks, and "contracts".) That's probably because we tend to focus on adding safety nets rather than on prevention, and that doesn't just apply to software.


--- Quote ---Validate your inputs.
--- End quote ---

Yeah.  Once upon a time, I wrote one of the first implementations of IP option support.  It was pretty neat; you could put "record route" options in ping packets and get interesting data, and so on.

It was "done" just before an "InterOp" trade show.  So we took it to the show and loosed it upon the "show network" (you could get away with things like that in those days.)  Broadcast pings with options.  AFAIK, everything we sent out was correct.  But other systems all over crashed and burned.   I hadn't done much to validate things coming in, so some of the other systems tried to respond, and sent out ... really horribly formatted attempts.  We crashed too!

It was a very educational experience for a lot of people/companies!  Me included, I think.

Nominal Animal:
All that I describe below has quite naturally led me to where I am now, and to say:

Validate your inputs.
Make sure you understand the limitations of the processing or calculations.
Present the results in an effective, useful form.
Let the user know if anything unexpected happens.
Document your intent and purpose for your code in detail.

I started doing web backend stuff very early in the era, before the turn of the century.
I had web clients that did not care about accept-charset, and simply sent data using whatever encoding they liked.
I had to use hidden input fields to detect the character set (codepage 437, codepage 1252, ISO 8859-1, ISO 8859-15, MacRoman, early UTF-8).

Since the first Microsoft web servers were vulnerable to path exploits –– ../../../../../../system32/cmd.exe "escaping" the web root, and referring to the command-line interpreter –– I saw lots of these in my server logs.  I also found out that many users had difficulties remembering the difference between uppercase and lowercase, and complained about case sensitivity in URL path components.
So, early on, I did my own URL path and query string verification too: removed any illegal sequences, replaced common non-ASCII characters with their closest ASCII equivalents, lowercased it all, and removed all but alphanumerics (including dashes not at the beginning or end) between slashes.
It, too, had a twofold reason: one was security, the other helped users if they accidentally used say /pähkähullu/ instead of /pahkahullu/.
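A rough sketch of that kind of path normalization, under loud assumptions: the input is taken to be UTF-8, only three Nordic letters are folded to ASCII, and the function names are made up for this illustration.  It is not the code described above, only an instance of the same idea: lowercase everything, keep alphanumerics and inner dashes between slashes, and drop the rest (which also removes any "../" sequences).

```c
#include <ctype.h>
#include <stddef.h>

/* Map the second byte of a few two-byte UTF-8 sequences (lead byte 0xC3)
   to an ASCII letter.  Returns 0 for anything outside this tiny table. */
static char fold_c3(unsigned char second)
{
    switch (second) {
    case 0x84: case 0xA4: return 'a';   /* A/a with diaeresis */
    case 0x85: case 0xA5: return 'a';   /* A/a with ring */
    case 0x96: case 0xB6: return 'o';   /* O/o with diaeresis */
    default:              return 0;
    }
}

/* Writes a normalized copy of path into out (outsize bytes, including NUL).
   Returns the length written, or (size_t)-1 if out is too small. */
static size_t normalize_path(const char *path, char *out, size_t outsize)
{
    size_t n = 0;     /* bytes written so far */
    size_t seg = 0;   /* index in out where the current segment starts */

    if (outsize == 0)
        return (size_t)-1;

    for (const unsigned char *p = (const unsigned char *)path; *p; p++) {
        char c;

        if (*p == '/') {
            while (n > seg && out[n - 1] == '-')
                n--;                        /* no dash at the end of a segment */
            if (n > 0 && out[n - 1] == '/')
                continue;                   /* collapse empty segments */
            c = '/';
        } else if (*p < 0x80 && isalnum(*p)) {
            c = (char)tolower(*p);
        } else if (*p == '-') {
            if (n == seg)
                continue;                   /* no dash at the start of a segment */
            c = '-';
        } else if (*p == 0xC3 && p[1] != 0) {
            c = fold_c3(p[1]);
            p++;                            /* consume the second byte too */
            if (c == 0)
                continue;
        } else {
            continue;                       /* everything else is dropped */
        }

        if (n + 1 >= outsize)
            return (size_t)-1;              /* keep room for the NUL */
        out[n++] = c;
        if (c == '/')
            seg = n;
    }

    while (n > seg && out[n - 1] == '-')
        n--;
    out[n] = '\0';
    return n;
}
```

With this sketch, "/Pähkähullu/" (UTF-8 encoded) and "/pahkahullu/" normalize to the same string, and the dots in an attempted "../../" escape simply vanish.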

Not verifying the data simply would have never worked at all.

Later on, when I got back to Uni and started doing computational physics (ie. computer simulations, mostly molecular dynamics simulations), I soon learned the most important rule in simulation at the visceral level: you first check if it makes any sense.  If the total energy in an isolated system changes, your simulation is faulty, and so on.

My favourite early failure in simulation was with Morse copper (Morse potential fitted roughly to copper), where I investigated the limit at which a system composed of perfect cubic lattice, one half of which was at absolute zero, and the other half had very high random (Boltzmann-distributed) kinetic vectors, would end up melted; with the system being periodic and infinite.  My error was to join the two parts so that atoms at the interface were much closer than normal in the lattice.  The effects I observed from this specific type of "superheating" were really odd.  I saw everything from acoustic waves with wavelength much longer than my system, to a molten droplet encased in solid Morse copper, apparently shielded by the Leidenfrost effect.

That taught me that even tiny errors in the inputs could make the outputs completely unphysical and irrelevant (albeit interesting).

Around the same time, I also started working on sonification and visualization, the former helping me focus on what is important and useful (in that case, when monitoring a running simulation), and the latter on how to control the information conveyed, to convey the most important data to the viewer in an intuitive manner.  Showing atoms as gleaming marbles is pretty, but the intuitive associations and extrapolations such images generate are completely wrong.
For example, "atoms colliding" sounds like a trivial, well defined phenomenon.  Except it isn't, when you actually investigate say ion implantation in solid matter, and try to find when "atoms collide".  You can say that the ion collides, yes, but as to the solid matter, the atoms there are in continuous interaction and movement (due to finite temperature; even at room temperature the movement is significant), so the macro-scale concept of "collision" has no useful meaning or relevance: they all "collide" with each other, all the time, exchanging energy.

That taught me intuition is useful, but easily mistaken; and most importantly, aimable/directable.  So, I learned to be suspicious of my own assumptions and intuition, and shifted my learning more towards understanding (in an intuitive manner) the underlying concepts correctly, rather than just applying existing knowledge to solve a specific problem.

(I've left out my endeavours related to the demoscene, art, multimedia, et cetera, so the above is a simplified storyline, picking only the major plot-relevant points here.  I do consider myself lucky for having such a winding path, because every step in a different domain has taught me something useful and applicable here.)

All this has taught me to consider the inputs with suspicion; consider the limits and errors inherent in the processing; and how to present the results in an effective and useful manner.  And when privileged information is at stake, the privileges needed at each step, and how to stop information leaking (including backwards in the above chain).  It all starts at the conceptual level, at the core design level.

Of course, I'm no "ideas man, leaving the details for the lesser mortals to implement".  At the core design level, I constantly bump into details and problems I haven't worked on before, so I create practical test cases –– unit tests, in the form of complete executable programs or scripts; sometimes brute-force searching for the 'optimal' solution in the solution phase space –– to find out.  I nowadays write these for even ideas I have, just to work out if that idea would work in practice.  I quickly learned to file them away well: I use a separate directory for each test case, with a README, sources, and an executable, that when run without any command-line parameters tells exactly what it does.  This way I can do a quick find or grep to find an old test case based on keywords I may recall, or at least limit the subset to check (by running the test cases) to a dozen or less, and a quick review of the README followed by the source code tells me all pertinent information.

And that is also the source of why I always remind new programmers to write comments that describe their intent, and not what the code does.  We/I can always read the code to see what it does, but lacking crosstime telepathy, I cannot divine whether that was the intent of the programmer or not.
I myself had to learn to do this later on, after already being adept (paid to write code) in several programming languages, so it is very hard for me: even now, I still slip into the useless type of comments, unless I'm careful.


--- Quote from: Nominal Animal on December 08, 2022, 06:49:04 am ---And that is also the source of why I always remind new programmers to write comments that describe their intent, and not what the code does.  We/I can always read the code to see what it does, but lacking crosstime telepathy, I cannot divine whether that was the intent of the programmer or not.

--- End quote ---
I agree here. I deeply loathe 'documentation' made using Doxygen because the output just lacks the one important thing: why is the code the way it is and what are the relations of functions? On top of that, any decent IDE can produce an overview of functions in real time; I don't need Doxycrap for that.

