Electronics > Microcontrollers
Parsing UART packets / state machines / packed structs
Siwastaja:
--- Quote from: NorthGuy on December 06, 2022, 07:25:01 pm ---You usually don't need to use packed structs. Most of the protocols will use struct fields which are naturally aligned. There may be padding added to ensure this, which you explicitly add to your struct where needed. Thus, once you ensure that the beginning of the struct is aligned, the rest of the fields get aligned automatically.
--- End quote ---
Well designed protocols do this for performance. You need to remember the ages old good trick, which is still useful: when the order of fields does not matter for you (and usually it does not), just order the fields from biggest to smallest (e.g. floats go first, chars last), for least amount of padding.
Then, for protocols, you can add padding yourself. The advantage over compiler-generated padding is also that these padding fields become natural "reserved" fields which allows you to extend the protocol later.
But I'm sure this is somehow wrong, too, definitely not professional and robust :-//
Sherlock Holmes:
--- Quote from: Siwastaja on December 06, 2022, 07:35:57 pm ---
--- Quote from: NorthGuy on December 06, 2022, 07:25:01 pm ---You usually don't need to use packed structs. Most of the protocols will use struct fields which are naturally aligned. There may be padding added to ensure this, which you explicitly add to your struct where needed. Thus, once you ensure that the beginning of the struct is aligned, the rest of the fields get aligned automatically.
--- End quote ---
Well designed protocols do this for performance. You need to remember the ages old good trick, which is still useful: when the order of fields does not matter for you (and usually it does not), just order the fields from biggest to smallest (e.g. floats go first, chars last), for least amount of padding.
Then, for protocols, you can add padding yourself. The advantage over compiler-generated padding is also that these padding fields become natural "reserved" fields which allows you to extend the protocol later.
But I'm sure this is somehow wrong, too, definitely not professional and robust :-//
--- End quote ---
Perhaps such field ordering could be done with greater flexibility, I agree that the "largest" first is a sensible strategy sometimes but .Net does that automatically, one has to take special steps to order them oneself.
SiliconWizard:
--- Quote from: Siwastaja on December 06, 2022, 08:48:14 am ---
--- Quote from: SiliconWizard on December 06, 2022, 05:26:32 am ---Any platform I've ever used in C from 8-bit to 64-bit had compilers allowing struct packing. The only problem to be aware of is again alignment, which may be a problem on some platforms for accessing certain ill-aligned struct members directly. That's the case in particular for the 16-bit PIC MCUs for accessing 16-bit or 32-bit struct members. They need to be aligned.
--- End quote ---
But, once again, with the packed attribute, compiler uses correct instructions to access unaligned 16- or 32-bit member; e.g. 8-bit loads and shifting + orring to combine the result into a register. Just like what you do manually in C by writing val = (uint32_t)in[1]<<24 | ... and so on. This is obviously slower than aligned loads, and the reason not to pack structs "just because".
--- End quote ---
I have certainly run into compilers (xc16 here, don't remember the version, but it's based on GCC although the port is handled by Microchip and never made it to mainline GCC, so it's possibly very not on par with what GCC is supposed to be doing) that wouldn't automatically handle unaligned access in the way you describe, and accessing unaligned struct members would crash the CPU.
Given that the "packed" attribute AFAIK is compiler-specific and not standard C, you're of course not guaranteed any of this anyway. Let me know if this has made it to the standard though. Maybe C23, as I think they finally introduced attributes?
That said, I checked on godbolt using Clang and GCC and various targets and your statement seems to hold, so I'm gonna assume that it should be safe if you use a reasonably recent version with a reasonably well supported target. I'm again not too sure about 16-bit PICs here, but xc16 is kind of a nasty beast.
Nominal Animal:
I use both (packed) structures and (character) buffer accessors.
The structures work really well with fixed structures, like say PNG files. When the actual layout depends on the content, the buffer accessors work much better than trying to use structures. IPv4 datagrams are on the borderline, because the first 20 bytes of the header are fixed, and only options may vary; so I might use a packed structure (as all C compilers I use do support those) for the first 20 bytes, but accessors for the options.
I do not understand the argument here, because in my opinion, the two do not overlap that much.
I use packed structures with fixed data structures when I have ensured they work and documented the need for them to work, and buffer accessors for varying data structures and when packed structures may not be available.
Even when using structures with associated data, I tend to use the struct fields through accessors when the field deals with things like data length to ensure memory safety: that through normal use of the interface, one will not accidentally access past the known/received data, because the content happens to be buggy or malicious.
This checking is important. The recent CVE-2022-41674 is a good example. Code did not check that the MaxBSSID field in a WiFi packet had a sensible value (between 1 and 8, inclusive), so a buffer overrun ensued. Oopsie.
Thus, even when using a packed struct, you should never trust its contents to be safe; and to make sure, we use helper accessors to do the checking for us. (Well, not for pure data fields, those that just state some fact, but do not affect the layout or access to other data.)
In other words, the one wrong use case of packed structs is to avoid using accessor functions. The two are not mutually exclusive.
For worry-free binary data stream protocols, I use prototype binary representations of known values as both the identifier as well as byte order indicator, so that the data generator does not need to care about the byte order at all. On the host computer, the datastream can be trivially 'normalized' to local-endian, or kept as-is. To detect the byte order, one simply tests the byte order permutations (including 'none'/'as-is') that yield a recognized value. For example, for IEEE 754 Binary32 'float' type, one can use 0x3F2C1880 = 0.67224884f.
For structures, on a computer, the "packed" datastream can easily be expanded into a stream/array of native structures. It is an extra copy, yes, but it turns out that due to the cache behaviour of current machines, doing it as soon as the data is read or received, is actually very, very cheap in terms of CPU or wall time used, because on typical computers we tend to be memory bandwidth limited, and not at all CPU limited. Even in Python (when interfacing to microcontrollers) this makes sense –– for example, one should definitely convert an incoming bytes stream into NumPy.arrays via struct.unpack() for manipulation. (On the prototype part, just try the different byte order prefixes on the struct formats, until the values match the expected, and then instantiate that using struct.Struct for mass unpacking.)
I really like the prototype approach, because not only does a single value (of each type used) ensure byte order, but they also act like protocol and format identifiers, 'magic pattern', in the Unix file command magic sense.
Siwastaja:
--- Quote from: Nominal Animal on December 07, 2022, 04:38:23 am ---In other words, the one wrong use case of packed structs is to avoid using accessor functions.
--- End quote ---
One could also claim that having one accessor function per member for validation is not always the best strategy. Sometimes validity is in combination of variables, for example: assert(a<10 && b<10 && (a+b)<15).
I like to do the sanity check of the len field in the protocol level handler (which is agnostic to the payloads), but when len and CRC check out and thus non-corrupted data is received (leaving logical mistakes / bugs at the sender), I use a handler function which gets the whole payload type struct, and verifies the sanity of data fields "as it goes", in the same way one usually checks function input arguments.
In fact, I don't think there is any fundamental difference in verifying the input values in a "normal" function in a program, or input values coming in a struct that was received through UART.
But as is obvious to most of sensible programmers, different paradigms work for different problems.
Navigation
[0] Message Index
[#] Next page
[*] Previous page
Go to full version