Author Topic: Best practice for messaging protocol parsing implementation  (Read 2754 times)

0 Members and 1 Guest are viewing this topic.

Offline ricko_ukTopic starter

  • Super Contributor
  • ***
  • Posts: 1015
  • Country: gb
Best practice for messaging protocol parsing implementation
« on: April 08, 2021, 05:12:16 pm »
Hi,
regardless of the hardware protocol (SPI, UART, I2C etc) what are some general best practices for implementing a messaging protocol parsing?

I am not looking for a "universal" type of solution that tries to address all possible scenarios, but a simple one where you have two micros connected by, let's say, SPI and want to exchange messages:
1) with payload that varies depending on the header
2) ACK/NACK based on checksum and timeout

I have always implemented them using switch/case statements, but that can sometimes end up as long, convoluted code.

Are there some general guidelines, "elegant" sample code or perhaps tutorials?

Thank you :)
 

Offline ajb

  • Super Contributor
  • ***
  • Posts: 2603
  • Country: us
Re: Best practice for messaging protocol parsing implementation
« Reply #1 on: April 08, 2021, 06:39:43 pm »
You said you're not looking for a "universal" solution, but there's not really enough information here to give more specific advice.

If you have a checksum, then the first step should probably be to validate that and NACK if necessary.  After that it will depend on the complexity of the messages. 

If the number of message types is fixed and relatively small then a switch statement that dispatches the message to different handlers may be just fine.  If you have a lot of message types then a list of message type descriptors that have message type identifiers with associated handlers may be better.  If you have variable message types or some types need to be handled by different modules within your application, you may want to use a linked list, either of individual message type descriptors or of lists of descriptors.  That would allow an individual code module to maintain a list of message types it needs to handle, which it can register with the protocol handler.  When a message comes in, the protocol handler can scan the list(s) of message type descriptors to find the right one and dispatch the message to the given handler.  This is slightly more complex initially, but can have substantial benefits in keeping big/complicated applications well modularized which ultimately makes them easier to develop and maintain. 
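As a rough C sketch of the descriptor-list idea (the type and handler names here are invented for illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per message type: an ID plus the handler for that type. */
typedef struct {
    uint8_t id;
    int (*handler)(const uint8_t *payload, size_t len);
} msg_descriptor_t;

static int handle_ping(const uint8_t *p, size_t n)  { (void)p; (void)n; return 0; }
static int handle_reset(const uint8_t *p, size_t n) { (void)p; (void)n; return 1; }

static const msg_descriptor_t msg_table[] = {
    { 0x01, handle_ping  },
    { 0x02, handle_reset },
};

/* Scan the table for the message ID and dispatch; -1 if unknown. */
int dispatch_message(uint8_t id, const uint8_t *payload, size_t len)
{
    for (size_t i = 0; i < sizeof msg_table / sizeof msg_table[0]; i++)
        if (msg_table[i].id == id)
            return msg_table[i].handler(payload, len);
    return -1; /* unknown type: caller can NACK */
}
```

Adding a message type is then just a matter of adding a table entry and writing its handler; the dispatch code never changes.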

You can use a similar approach if you have a protocol where you address different parameters by ID.  A switch statement may be fine if you have a small number of IDs, or if the IDs are numerically contiguous maybe you have an array of handlers/variables where the ID is the index into the array.  But in more complex situations, or where the IDs are sparse/noncontiguous, you may want to have a list of parameter descriptors where each one contains the ID and then whatever information the protocol needs to handle that parameter.  The descriptor may just have a pointer to a variable, which can be a tagged union if you have different data types, and the protocol can use the same handler for all parameters, or the descriptor can have a reference to a handler specific to that parameter.  Or a hybrid approach where the descriptor has an optional function pointer for a parameter-specific handler, and if that pointer is NULL the protocol code just uses a default handler.
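A minimal sketch of that hybrid parameter-descriptor idea in C (names, IDs, and the clamping handler are all invented; the value type is simplified to a plain int32_t rather than a tagged union):

```c
#include <stddef.h>
#include <stdint.h>

/* Parameter descriptor: ID, backing variable, and an optional
 * parameter-specific write handler (NULL means use the default). */
typedef struct {
    uint16_t id;
    int32_t *value;
    void (*write)(int32_t *dest, int32_t v);
} param_desc_t;

static int32_t speed, limit;

/* Hypothetical clamped writer for one particular parameter. */
static void write_clamped(int32_t *dest, int32_t v)
{
    *dest = v > 100 ? 100 : v;
}

static const param_desc_t params[] = {
    { 0x0010, &speed, NULL },          /* default handler */
    { 0x0011, &limit, write_clamped }, /* custom handler  */
};

int param_write(uint16_t id, int32_t v)
{
    for (size_t i = 0; i < sizeof params / sizeof params[0]; i++) {
        if (params[i].id != id) continue;
        if (params[i].write) params[i].write(params[i].value, v);
        else                 *params[i].value = v; /* default: plain store */
        return 0;
    }
    return -1; /* unknown ID */
}
```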

Whatever you do, if you end up with a large/complicated switch statement or deeply nested conditionals, that's a code smell.  It's not necessarily bad--sometimes it's the best/only way to get a hairy bit of logic to work--but when it happens it's well worth taking a step back and looking for a better way, which there usually is.  You may need to break parsing down into stages or steps, so for example if your top-level message handler identifies that a particular message is part of a particular class of messages, it might make sense to write a separate handler function for that class of messages that the top level handler can invoke rather than doing further parsing in the top level handler. 

If you end up with one of these more complex situations, it's a good idea to create some simple macros or functions to get/set fields within a message to make it easier to create and maintain the various handlers you will eventually need to write.  This is some extra work up front, but can save a lot of headaches versus manually accessing byte indexes into the message buffer all over the place.  Even just functions that insert/extract multi-byte values into/from a message buffer can be useful so you don't have to worry about endianness if you eventually need to communicate across implementations where that's different--if you need to change it, that's all in one place instead of sprinkled all over your code.  You can also create more sophisticated functions that construct message headers for you (maybe based on a few arguments like message type and payload length).  Similarly basic ACK/NACK responses should probably be handled by one or two functions that take whatever arguments you need to construct those messages so you don't have to manually construct those responses from scratch everywhere they're needed.  This is all about one of the key rules of pragmatic software development: Don't Repeat Yourself.  If a whole lot of spots in your code need to do a particular thing, then write one function that does that thing and call it where needed instead of doing that thing over and over again.
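For example, a pair of helpers that insert/extract a multi-byte value might look like this (a sketch; big-endian wire order is an assumption):

```c
#include <stdint.h>

/* Read a 16-bit big-endian value from a message buffer,
 * independent of host endianness. */
static inline uint16_t msg_get_u16(const uint8_t *buf)
{
    return (uint16_t)((buf[0] << 8) | buf[1]);
}

/* Write a 16-bit value into a message buffer in big-endian order. */
static inline void msg_put_u16(uint8_t *buf, uint16_t v)
{
    buf[0] = (uint8_t)(v >> 8);
    buf[1] = (uint8_t)(v & 0xFF);
}
```

If the wire byte order ever has to change, only these two functions change, not every site that touches the buffer.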

Whatever approach you end up with, two more tips:
- ALWAYS check buffer lengths! When dispatching a message from a higher level handler to a lower level one, you probably want to include the message length as an argument.  That allows the receiving handler to validate the length and avoid reading junk data if the received length isn't what's expected
- Catch all errors and especially the impossible ones as you write the code.  This doesn't have to be fancy, just throw ASSERT()s everywhere so that when something goes wrong you find out immediately.  You can remove these asserts (or use a macro-based ASSERT and #define it out) or replace them with something more suitable later.
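The length check from the first tip might look something like this (FOO_MSG_LEN and the handler are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

#define FOO_MSG_LEN 4  /* hypothetical fixed length for this message type */

/* A lower-level handler validates the length it was handed
 * before touching the payload. */
int handle_foo(const uint8_t *payload, size_t len)
{
    if (len != FOO_MSG_LEN)
        return -1;                  /* wrong length: refuse, don't read junk */
    return payload[0] + payload[3]; /* safe: length already validated */
}
```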
 

Offline ricko_ukTopic starter

  • Super Contributor
  • ***
  • Posts: 1015
  • Country: gb
Re: Best practice for messaging protocol parsing implementation
« Reply #2 on: April 08, 2021, 07:59:22 pm »
Thank you very much Ajb for the detailed explanation! :)

What do you mean by "higher level" and "lower level" handlers when you say "When dispatching a message from a higher level handler to a lower level one"?

Thank you
 

Offline ajb

  • Super Contributor
  • ***
  • Posts: 2603
  • Country: us
Re: Best practice for messaging protocol parsing implementation
« Reply #3 on: April 08, 2021, 08:50:30 pm »
I mean that if the protocol is sufficiently complex (not necessarily VERY complex, just above a certain level) you'll probably end up with more than one layer of message handling functions.  So at the highest level, you have a function that handles the raw message from the wire, and does some initial validation and maybe sends a NACK if it finds a problem with the message (like a checksum failure or an invalid message type); this would be the highest level message handler.  If the message is valid, it may then invoke a handler for the specific message type received, which would be a lower level handler.  You may have additional levels, where the second layer handles a class of message and dispatches different types within that class to more specific handlers, or where you have a nested protocol where the payload of a message contains one or more submessages.  Each time you make a handoff like that you need to make sure that the function you're invoking has the context it needs to interpret the message properly, which probably includes a length argument, but the details beyond that depend on the protocol.

(Higher/lower is relative, in this case I mean in terms of application organization/specificity--in a protocol sense you might say that the first handler is at a lower level because it cares about the details of the message on the wire, while the second handler may only care about the message payload and thus is at a higher level.  TCP/IP/Ethernet in the OSI model would be an example of this, where the payload of an incoming message from the physical/link layer (Ethernet) is passed "up" to the network layer (IP), the payload from that is passed "up" to the transport layer (TCP), and then finally that payload is passed "up" to the application layer.)
 

Offline ricko_ukTopic starter

  • Super Contributor
  • ***
  • Posts: 1015
  • Country: gb
Re: Best practice for messaging protocol parsing implementation
« Reply #4 on: April 08, 2021, 09:14:54 pm »
Thank you Ajb! :)
 

Offline jc101

  • Frequent Contributor
  • **
  • Posts: 627
  • Country: gb
Re: Best practice for messaging protocol parsing implementation
« Reply #5 on: April 08, 2021, 09:27:23 pm »
I quite like the U-Blox UBX protocol structure. 

Page 168 onwards in this https://www.u-blox.com/sites/default/files/products/documents/u-blox8-M8_ReceiverDescrProtSpec_UBX-13003221.pdf

The actual packet can be read byte by byte generically for any kind of payload; only if the checksum matches do you then need to parse the payload.  The payload for each Class and ID, which identifies what the packet holds/does, has fields aligned to 4-byte boundaries, so the payload can easily be mapped to a union of structures to pick out the data.

So one small state machine to receive the packet, and if the checksums match, then it's a single switch statement for each Class and ID to extract the data directly from the structures.

I'll admit much of this is fixed-length messages, but there are some variable length ones in there too.  So the union would end up being sized to hold the largest possible length of any variable item of data.
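A rough C sketch of the union-of-structures idea (the field names are invented, the packed attribute is GCC/Clang-specific, and same-host endianness is assumed; note the portability caveats raised elsewhere in the thread):

```c
#include <stdint.h>
#include <string.h>

/* Invented payload layouts with 4-byte-aligned fields, UBX-style. */
typedef struct __attribute__((packed)) {
    uint32_t iTOW;   /* ms time of week */
    int32_t  lon;    /* 1e-7 deg */
    int32_t  lat;    /* 1e-7 deg */
} nav_pos_t;

typedef struct __attribute__((packed)) {
    uint32_t iTOW;
    int32_t  velN;
    int32_t  velE;
} nav_vel_t;

typedef union {
    nav_pos_t pos;
    nav_vel_t vel;
    uint8_t   raw[sizeof(nav_pos_t)]; /* sized for the largest member */
} payload_t;

/* Copy received bytes into the union, then pick fields by Class/ID. */
int32_t decode_lat(const uint8_t *buf)
{
    payload_t p;
    memcpy(p.raw, buf, sizeof p.raw); /* copy avoids misaligned access */
    return p.pos.lat;
}
```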

Having spent some time this week talking to their GNSS modules via their UBX protocol, it has been quite quick and simple to implement.
 

Offline ricko_ukTopic starter

  • Super Contributor
  • ***
  • Posts: 1015
  • Country: gb
Re: Best practice for messaging protocol parsing implementation
« Reply #6 on: April 09, 2021, 12:37:31 am »
Thank you jc101 :)
 

Offline JustMeHere

  • Frequent Contributor
  • **
  • Posts: 744
  • Country: us
Re: Best practice for messaging protocol parsing implementation
« Reply #7 on: April 09, 2021, 02:54:40 am »
One way to get rid of the switch statement may be dynamic class loading.  Don't know how well it works in C, but I know it would work in Java. 

In Java I would look for classes then load them up into a HashMap.  The switch statement would be replaced by looking for "commands" in the HashMap.

Here is some info on C
https://pubs.opengroup.org/onlinepubs/009695399/functions/dlsym.html
 

Offline TomS_

  • Frequent Contributor
  • **
  • Posts: 834
  • Country: gb
Re: Best practice for messaging protocol parsing implementation
« Reply #8 on: April 09, 2021, 06:35:05 am »
So at the highest level, you have a function that handles the raw message from the wire, and does some initial validation and maybe sends a NACK if it finds a problem with the message (like a checksum failure or an invalid message type); this would be the highest level message handler.

Personally I would probably refer to this as the lowest level handler - if you were to follow a similar kind of layered approach as the OSI model for example.

The physical aspects of a network are considered to be the lowest level, while things like protocols and applications are increasingly higher levels respectively.
 

Offline voltsandjolts

  • Supporter
  • ****
  • Posts: 2300
  • Country: gb
Re: Best practice for messaging protocol parsing implementation
« Reply #9 on: April 09, 2021, 09:45:17 am »
Personally I would probably refer to this as the lowest level handler - if you were to follow a similar kind of layered approach as the OSI model for example.
The physical aspects of a network are considered to be the lowest level, while things like protocols and applications are increasingly higher levels respectively.

^^ Yup.

At the lower level, for packetisation you might consider using COBS, which is quite simple to implement but gives great flexibility, particularly in a layered approach.
It can packetise any length of arbitrary data; just add a layer above that with a CRC or checksum and you have a robust transport layer.
Then use whatever packet data format you want above that, binary, ascii,...whatever.

https://www.eevblog.com/forum/microcontrollers/implementing-uart-data-packets-with-consistent-overhead-byte-stuffing-(cobs)/
 

Online DavidAlfa

  • Super Contributor
  • ***
  • Posts: 5907
  • Country: es
Re: Best practice for messaging protocol parsing implementation
« Reply #10 on: April 09, 2021, 10:37:19 am »
I think the best way to keep the code readable and ordered is the switch method.

I played a while ago with the Sony Unilink protocol (an old protocol for car stereos and CD changers).
It's simple enough yet pretty solid. Based on that, here's my idea:

SRC ADDR = The device address this packet comes from
DST ADDR = The device address this packet goes to
CHKSUM1 = Start + src + dst + cmd + opts (use any algorithm you like: CRC, or a classic checksum such as XOR, AND, or sum)
CHKSUM2 = Start + src + dst + cmd + opts + data

[Start = 0xAA] [SRC ADDR] [DST ADDR] [CMD] [CMD OPTS] [CHKSUM1] [DATA(size expected in CMD OPTS)] [CHKSUM2]

So from the start you will discard any data not starting with 0xAA.
Once you received the 0xAA you start saving and parsing the received data, determine if it's for you, and if the sender is valid.

Before doing anything else, you check the stored data matches checksum1.
If not, drop everything and wait for the next packet.
If ok, based on CMD and CMD opts, determine if you are going to receive more data.
Keep saving the data, and finally compare with checksum2.

You can just send simple commands (turn on, sleep, turn led on) without needing any data, but also send almost unlimited sizes.
Ex. CMD=write to sector. OPTS=sector address. Data size= known sector size, or defined in CMD or opts.

At the end, send an ACK with your computed checksum, the master compares it with the sent checksum, and repeats or sends next frame.

CHKSUM = CHECKSUM from last packet (CHKSUM1 or 2, depending).
CHKSUM2 = CHECKSUM for this packet

[Start = 0xAA] [SRC ADDR] [DST ADDR] [ACK] [CHKSUM] [CHKSUM2]

To detect packets, the easiest way is to have a timeout between frames, e.g. you send bytes every 2 µs, but wait 1 ms between packets.
So you set a 500 µs timer. At every byte received, you clear the timer.
If at some point you were receiving a valid packet, but the timer overflows, reset the protocol.

Some examples:
[0xAA] [0x80] [0x02] [LED1] [ON/OFF/BLINK] [CHKSUM1]
[0xAA] [0x80] [0x02] [WRITE SECTOR] [0x0] [CHKSUM1] [256Bytes] [CHKSUM2]
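A minimal sketch of the header phase of such a receive state machine (MY_ADDR and the XOR checksum are assumptions for the sketch; the data/CHKSUM2 phase is omitted):

```c
#include <stdint.h>

#define FRAME_START 0xAA
#define MY_ADDR     0x02  /* hypothetical own address */

enum { ST_IDLE, ST_HDR, ST_DONE, ST_ERR };

typedef struct {
    int     state;
    uint8_t hdr[5];  /* start, src, dst, cmd, opts */
    int     n;
} rx_t;

/* Feed one received byte into the state machine. */
void rx_byte(rx_t *rx, uint8_t b)
{
    switch (rx->state) {
    case ST_IDLE:
        if (b != FRAME_START) return;      /* discard until start byte */
        rx->hdr[0] = b; rx->n = 1; rx->state = ST_HDR;
        break;
    case ST_HDR:
        if (rx->n < 5) { rx->hdr[rx->n++] = b; break; }
        /* Sixth byte is CHKSUM1: XOR of the five header bytes. */
        {
            uint8_t c = 0;
            for (int i = 0; i < 5; i++) c ^= rx->hdr[i];
            if (c == b && rx->hdr[2] == MY_ADDR)
                rx->state = ST_DONE;       /* header valid, for us */
            else
                rx->state = ST_ERR;        /* drop, wait for next frame */
        }
        break;
    default:
        break;                             /* ST_DONE/ST_ERR: caller resets */
    }
}
```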
« Last Edit: April 09, 2021, 11:01:55 am by DavidAlfa »
Hantek DSO2x1x            Drive        FAQ          DON'T BUY HANTEK! (Aka HALF-MADE)
Stm32 Soldering FW      Forum      Github      Donate
 

Offline MIS42N

  • Frequent Contributor
  • **
  • Posts: 511
  • Country: au
Re: Best practice for messaging protocol parsing implementation
« Reply #11 on: April 09, 2021, 11:07:19 am »
I think you are dealing with two different areas: (1) message integrity (2) message payload.
(1) is making sure a payload is valid (2) deals with the payload content.

The NMEA messages from a GPS module are a good example: Each line of data starts with a $, finishes with a *XX where XX is the checksum, there is a limited number of characters between $ and * (I think 78), and there are no $ or * in the payload. There is no ACK/NAK system because a missed message is not a catastrophe. For guaranteed delivery, some form of acknowledgement is needed. If you read about some of the old modem protocols such as https://en.wikipedia.org/wiki/ZMODEM it may give you some ideas.

There's no point in looking at the message payload until message integrity is verified. How you deal with the payload depends on whether you have control of it. You can design your payload descriptors (headers) to fit the data. Sequential numbers or letters allow a table lookup process, which can be quite efficient. It is a long time since I wrote C, but IIRC you could call a subroutine through a function pointer variable: look up a table of subroutine addresses and call the appropriate one. I think C++ (I never did get that far) does it more neatly. Other languages have constructs like GO TO A B C D DEPENDING ON X (is that BASIC?). Most processors implement this by modifying the program counter, which makes for very quick evaluation. If you use a sparse set of headers (e.g. A=add, I=insert, etc.) then you could use a case statement, or you can set up a character lookup array so value['A'] = 1, value['I'] = 2, etc., which maps the sparse headers onto a sequential lookup.

You can hide the underlying sequential numbering behind defines - #define add 1, #define insert 2 - allowing statements like messgtype = add, messgtype = insert, etc.
I am a fan of the table lookup method, because adding or removing functions is changing a couple of tables and the code that uses the tables doesn't change.
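A small C sketch of the character lookup array described above (the command letters and indices are invented):

```c
#include <stdint.h>

/* Sparse command letters mapped onto small sequential indices via a
 * 256-entry lookup array (0 = unknown command). */
enum { CMD_NONE, CMD_ADD, CMD_INSERT };

static uint8_t cmd_index[256];

/* Fill in the table once at startup; adding a command is one line here. */
void cmd_table_init(void)
{
    cmd_index['A'] = CMD_ADD;
    cmd_index['I'] = CMD_INSERT;
}

/* O(1) lookup, no switch statement needed. */
int lookup_cmd(uint8_t header)
{
    return cmd_index[header];
}
```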
My 2¢ worth.
 

Online nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Best practice for messaging protocol parsing implementation
« Reply #12 on: April 09, 2021, 12:14:05 pm »
I quite like the U-Blox UBX protocol structure. 

Page 168 onwards in this https://www.u-blox.com/sites/default/files/products/documents/u-blox8-M8_ReceiverDescrProtSpec_UBX-13003221.pdf

The actual packet can be read byte by byte generically for any kind of payload; only if the checksum matches do you then need to parse the payload.  The payload for each Class and ID, which identifies what the packet holds/does, has fields aligned to 4-byte boundaries, so the payload can easily be mapped to a union of structures to pick out the data.

Having spent some time this week talking to their GNSS modules via their UBX protocol, it has been quite quick and simple to implement.
I completely disagree. Such protocols are horrible! Text-based protocols are so much easier to work with and extend. I have implemented the u-blox UBX protocol as well (which is the first problem: you have to implement yet another protocol), and with information being in specific bytes you need to count which field you are in. There is no way to monitor the communication using a terminal program. A text-based protocol OTOH is much easier to work with and - surprisingly - doesn't result in longer packets, because none of the excess (zeros) and padding data needs to be sent. For example: the UBX protocol's header is already 6 bytes long. That can easily be a command and a space with room to spare.
« Last Edit: April 09, 2021, 12:27:58 pm by nctnico »
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline ricko_ukTopic starter

  • Super Contributor
  • ***
  • Posts: 1015
  • Country: gb
Re: Best practice for messaging protocol parsing implementation
« Reply #13 on: April 09, 2021, 04:34:54 pm »
Thank you all for the detailed infos!!! :)

As I will start implementing it I will probably have more questions but in the meantime thank you! Much appreciated as usual!!:)
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14472
  • Country: fr
Re: Best practice for messaging protocol parsing implementation
« Reply #14 on: April 09, 2021, 05:11:06 pm »
Regarding the OP's real question - which as I see it is not really about the protocol itself but about how to handle "messages" - I'd say it depends on the language, your personal coding style, the number of possible different messages, and so on.

Assuming you're using C, 'switch' constructs can obviously only be used for message "IDs" that are integers. If they are not, you can alternatively transform them into integers first.

A couple points:
* If the number of different IDs is large, that means a long switch statement. The longer a given statement or function is, the harder it is to maintain, generally speaking.
* To keep things tidy using 'switch', you'd best use a function for each kind of ID there is, and call it from each case, instead of stuffing all the handling code into each case, which would yield a messy and unwieldy long piece of code, hard to read and maintain.
* With that said, you may think of implementing some kind of handler table instead of using a switch; so basically an array of function pointers, indexed by the message ID, each function pointer pointing to the corresponding handler function. For this approach,  all the different message IDs must be reasonably contiguous, else you may need a large array with a lot of space wasted. To mitigate this, you may use some kind of hash table instead of a raw array. Either way, handling a new message ID is just a matter of adding a new entry in the table and writing the corresponding handler function. Neat.
* Note that regarding performance, any reasonable optimizing compiler will usually implement a 'switch' statement having at least a few cases as a jump table, so performance should be pretty similar to the handler table above. But the handler table will be easier to maintain, and the message "parsing" code, very compact.
* As a side note, some could argue that the handler table suggested above may be a security hazard, so to be used with care.
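A minimal sketch of the handler-table approach, assuming small contiguous IDs (all names here are invented):

```c
#include <stddef.h>
#include <stdint.h>

/* Handler table indexed directly by message ID. */
typedef int (*msg_handler_t)(const uint8_t *payload, size_t len);

static int h_status(const uint8_t *p, size_t n) { (void)p; (void)n; return 10; }
static int h_config(const uint8_t *p, size_t n) { (void)p; (void)n; return 20; }

static const msg_handler_t handlers[] = { h_status, h_config };

/* The "parsing" code stays this compact no matter how many IDs exist. */
int handle_msg(uint8_t id, const uint8_t *payload, size_t len)
{
    if (id >= sizeof handlers / sizeof handlers[0] || !handlers[id])
        return -1;  /* unknown ID */
    return handlers[id](payload, len);
}
```

Handling a new ID is just a new array entry plus its handler function, as described above.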
« Last Edit: April 09, 2021, 05:13:46 pm by SiliconWizard »
 

Offline ricko_ukTopic starter

  • Super Contributor
  • ***
  • Posts: 1015
  • Country: gb
Re: Best practice for messaging protocol parsing implementation
« Reply #15 on: April 09, 2021, 06:25:00 pm »
Thank you SiliconWizard,
yes that's what my OP was about. And I was thinking of functions for each case to keep it clean as you mentioned.
But your suggestion of array of function pointers indexed by message ID  is even better!

My other question is about each handler function. I guess there are two ways to implement it: one where you parse every byte as it comes in, and one where you parse only after the entire message has been received (obviously assuming you know the length from the first byte, or the first few bytes).

Any suggestions, pros, cons for each solution?

Thank you
 

Offline ajb

  • Super Contributor
  • ***
  • Posts: 2603
  • Country: us
Re: Best practice for messaging protocol parsing implementation
« Reply #16 on: April 09, 2021, 07:02:11 pm »
So at the highest level, you have a function that handles the raw message from the wire, and does some initial validation and maybe sends a NACK if it finds a problem with the message (like a checksum failure or an invalid message type); this would be the highest level message handler.

Personally I would probably refer to this as the lowest level handler - if you were to follow a similar kind of layered approach as the OSI model for example.

The physical aspects of a network are considered to be the lowest level, while things like protocols and applications are increasingly higher levels respectively.

Yes, I mentioned this in the paragraph after the one you quoted  :)

Anyway it's tangential to the point I was making since I was talking about code organization, not protocol structure (which has not been given).  Even within one layer of the OSI model, given any complexity you likely end up having some sort of `receiveMessage()` function at the top level of a module (the part that's exposed as the API) which dispatches messages to more specific functions like `receiveFooMessage()` and `receiveBarMessage()`, and the latter may further delegate processing to `receiveBarMessageSubtypeBaz()`. 

One way to get rid of the switch statement may be dynamic class loading.  Don't know how well it works in C, but I know it would work in Java. 

In Java I would look for classes then load them up into a HashMap.  The switch statement would be replaced by looking for "commands" in the HashMap.

Here is some info on C
https://pubs.opengroup.org/onlinepubs/009695399/functions/dlsym.html

C doesn't have dynamic class loading because it doesn't have classes  :P  What you'd typically do instead is have each code module provide some sort of description (usually a struct) of the functionality it offers and then register that description with some sort of dispatcher at init.  That dispatcher then needs to have some way of figuring out how to resolve events or invocations to functionalities from the correct module.  This does require coordination between the module being loaded and the dispatcher--the description of functionality has to be an object type that the dispatcher understands, so probably a struct type defined by the dispatcher module--so it's generally specific to a type of functionality.  A sort of related example would be the way that LWIP handles network interfaces, where an implementation provides an instance of the `netif` struct (defined by LWIP) that holds state information as well as driver functions for a given network interface.  That struct gets handed off to LWIP which then handles the interface's state and invokes its drivers as necessary (usually indirectly via calls into the APIs for the other modules in the stack).

This obviously isn't as flexible or as expressive as what you can do in object oriented languages, but generally works well in embedded applications.

The payload for each Class and ID, which identifies what the packet holds/does, has fields aligned to 4-byte boundaries, so the payload can easily be mapped to a union of structures to pick out the data.

So one small state machine to receive the packet, and if the checksums match, then it's a single switch statement for each Class and ID to extract the data directly from the structures.

This is worth highlighting, because the layout of structures in memory is not guaranteed to be contiguous.  So in structures that have mixed types with different storage sizes you can end up with padding that will make the struct layout differ from the packet layout.  Depending on the platform/compiler/language version there are some options to control this, but it can trip you up if you don't account for it.  If you have the same platform at both ends you can sometimes ignore this and just copy the struct from memory straight into the message byte-for-byte, but if you need to communicate between different platforms (or even software versions) it's probably better to be a bit more deliberate, which may involve manually marshalling fields from whatever data object you have in memory into and out of the message buffer.
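The padding issue can be demonstrated by comparing the same fields with and without packing (the attribute syntax is GCC/Clang-specific, and the exact unpacked size depends on the ABI):

```c
#include <stdint.h>

/* Natural layout: on most ABIs the compiler inserts a padding byte
 * after 'type' so that 'seq' lands on a 2-byte boundary. */
typedef struct {
    uint8_t  type;
    uint16_t seq;
    uint32_t value;
} natural_t;

/* Packed layout: matches the wire format byte-for-byte, but fields
 * may be misaligned, so access them via memcpy on strict platforms. */
typedef struct __attribute__((packed)) {
    uint8_t  type;
    uint16_t seq;
    uint32_t value;
} packed_t;
```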

My other question is each handler function. I guess there are two ways to implement it. One where you parse every byte coming in and one where you parse it only after the entire message has been received (obviously assuming you know the length from the first (or first few) bytes.

Any suggestions, pros, cons for each solution?

Thank you

There are a couple of cons to doing the byte-for-byte thing:
- if you are handing off each byte to a specific handler as it comes in, then you need to first identify which handler to invoke, which probably means you need a generic handler to parse the header and make that handoff
- the handoff between the generic handler and the specific one requires relaying any necessary state, which may be more or less of a problem depending on the protocol complexity
- the specific handler shouldn't do anything with the message contents until it knows the message is valid, which requires the whole message to be received so the checksum can be validated
- each specific handler has to be able to validate the checksum
- if the specific handler DOES do anything with the partial message, you may need to be able to undo those things if the message turns out to be invalid

So given those, there's not much point in handing each byte off to a specific handler.  It's generally going to be a lot simpler to have one function that receives and validates the complete message and then hands the whole thing off to the appropriate specific handler. 

As to whether that initial handler should get a byte at a time, that depends on the protocol.  If you have variable message lengths, or addressing that needs to be checked, you probably need to hand it a byte at a time at least for the header so it can figure out how many more bytes to wait for, and maybe it also checks the destination address and just ignores the rest of the message if it doesn't match.  Once it knows the destination is correct and how many bytes to wait for you could then switch to DMA for the rest of the message, or continue to process each byte as it comes in.  The latter may make sense if you need to enforce an inter-byte timeout or something, or a per-byte ACK.
 

Online nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Best practice for messaging protocol parsing implementation
« Reply #17 on: April 09, 2021, 09:18:26 pm »
I missed this one piece of really bad advice:

The payload for each Class and ID, which identifies what the packet holds/does, has fields aligned to 4-byte boundaries, so the payload can easily be mapped to a union of structures to pick out the data.

Never ever map structures onto buffers of incoming data! It will break at some point, and it requires contorting the C compiler to pack a struct (or not) in very specific ways. Some CPUs / C ABIs require very specific alignment of variables and may exhibit odd (and hard-to-detect) behaviour if the alignment is bad. And then there is the difference between big and little endian, which breaks multi-byte numbers anyway and/or makes the code non-portable. Always process a buffer byte-by-byte and combine multi-byte values using shifts.
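For example, combining four little-endian wire bytes with shifts, which gives the same result regardless of host endianness or alignment:

```c
#include <stdint.h>

/* Combine four received bytes (little-endian on the wire) into a
 * uint32_t using shifts -- portable across host endianness. */
uint32_t rd_u32le(const uint8_t *b)
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```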
« Last Edit: April 09, 2021, 09:23:03 pm by nctnico »
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 
The following users thanked this post: artag

Offline artag

  • Super Contributor
  • ***
  • Posts: 1070
  • Country: gb
Re: Best practice for messaging protocol parsing implementation
« Reply #18 on: April 09, 2021, 10:26:41 pm »
...  what are some general best practices for implementing a messaging protocol parsing?

Respect Postel's Law : https://en.wikipedia.org/wiki/Robustness_principle : "be conservative in what you do, be liberal in what you accept from others"
 

Offline artag

  • Super Contributor
  • ***
  • Posts: 1070
  • Country: gb
Re: Best practice for messaging protocol parsing implementation
« Reply #19 on: April 09, 2021, 10:32:32 pm »
Always process a buffer byte-by-byte and combine multi-byte values using shifts.

This.  ^^^^^^

It may seem painful but when you run your code on another sort of machine, it will still work.
If you want to avoid too much fiddly code, consider the library functions htobe16() and its siblings, which you can be sure will be optimised for each machine they appear on (they're the functions used to implement correctly-ordered bitfields in TCP/IP)
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8172
  • Country: fi
Re: Best practice for messaging protocol parsing implementation
« Reply #20 on: April 10, 2021, 08:19:11 am »
Just ignore nctnico's "truths" such as the bliss of ASCII protocols in machine-to-machine communication, or the massive danger of communication structs and the bliss of bitshifting.

Handling buffers as structs is a very widely used, widely tested solution. The whole world's network infrastructure is based on this design pattern and it works pretty well.

All common compilers implement packing and alignment attributes. C99 officially introduced fixed-width types such as int16_t. Use macros like htons()/htonl() to convert endianness. It's also perfectly acceptable to document that your system only applies to little-endian, especially in embedded where the whole system is developed in sync. Big-endian systems are exceedingly rare.

The benefit is utterly obvious: code quality. A system that is based on structs is automatically in sync - you can simply include the same header on both sides - and requires no update of the "parser" when the contents are modified; in fact it requires no "parser" at all. Everything just automatically works. The only things that can go wrong are related to alignment and accessing through pointers. You need to be aware of this.
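As an illustration only, a minimal sketch of that pattern (the message layout is hypothetical, and __attribute__((packed)) is the GCC/Clang spelling; other compilers use e.g. #pragma pack):

```c
#include <stdint.h>
#include <string.h>

/* Shared header included by both sides of the link: the struct layout
   IS the wire format. packed removes any padding between fields. */
typedef struct __attribute__((packed)) {
    uint8_t  msg_type;  /* selects the payload interpretation */
    uint16_t length;    /* payload length */
    uint32_t value;     /* example payload field */
    uint8_t  checksum;
} msg_t;

/* memcpy into a properly aligned local struct rather than casting the
   raw buffer pointer; this sidesteps unaligned-access pitfalls. */
static msg_t parse_msg(const uint8_t *buf)
{
    msg_t m;
    memcpy(&m, buf, sizeof m);
    return m;
}
```

Since both ends compile the same header, the layout can never drift out of sync; endianness of the multi-byte fields still has to be agreed (or converted with the hton*() family).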

I have been through the byte-shifting orgy as a beginner like everybody else, and hunted down hard-to-find, rarely occurring bugs caused by typos such as using < instead of << somewhere, for days and days. Realistically speaking, a parser module doing a significant amount of bitshifting really requires:
* A unit test bench,
* A locked-down specification that never changes (or a change triggers a tedious manual rewrite of the parser module PLUS the unit test bench).

This is all syntactic boiler-plate for no reason whatsoever.

You can grow up from that beginner phase, or you can rationalize your coding style until the cows come home.

Yes, there are always risks when doing anything. Yes, certain things require skill. Using structs to access communication buffers requires much less skill than flying an airplane or operating as a surgeon. Yet, the effort is non-zero, and you can always fail.

I have worked with this pattern for years and years and never seen any issue. The opposite is true: I have hunted down several bugs caused by bitshifting code, which is very error-prone manual work - unless you automate it through code generation.
« Last Edit: April 10, 2021, 08:22:02 am by Siwastaja »
 

Offline artag

  • Super Contributor
  • ***
  • Posts: 1070
  • Country: gb
Re: Best practice for messaging protocol parsing implementation
« Reply #21 on: April 10, 2021, 09:32:51 am »
Just ignore nctnico's "truths" such as the bliss of ASCII protocols in machine-to-machine communication, or the massive danger of communication structs and the bliss of bitshifting.

Handling buffers as structs is a very widely used, widely tested solution. The whole world's network infrastructure is based on this design pattern and it works pretty well.

nctnico is correct. Some protocols use binary for efficiency, but higher levels like SMTP, HTTP and FTP use ASCII protocols for control, for the reason he states. Using structs without the host-to-network-order calls is non-portable and results in the broken proprietary network world that we've left behind. If your network world is TCP/IP and the protocols that run above it, then it doesn't make that mistake.
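For comparison, a minimal sketch of a line-based ASCII control command, in the spirit of those protocols (the "SET" command and return convention are made up for the example):

```c
#include <stdio.h>

/* Parse a hypothetical line-based ASCII command such as "SET 42\n".
   Human-readable framing like this sidesteps endianness, padding and
   alignment entirely, and is trivially inspectable on a terminal. */
static int parse_command(const char *line, int *value)
{
    if (sscanf(line, "SET %d", value) == 1)
        return 0;   /* recognised command */
    return -1;      /* NACK: unknown or malformed command */
}
```

The cost is a few extra bytes on the wire and an sscanf/strtol call instead of a struct access; the benefit is that the protocol is self-describing and portable by construction.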
 
 

Online nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Best practice for messaging protocol parsing implementation
« Reply #22 on: April 10, 2021, 10:10:30 am »
Just ignore nctnico's "truths" such as the bliss of ASCII protocols in machine-to-machine communication, or the massive danger of communication structs and the bliss of bitshifting.

Handling buffers as structs is a very widely used, widely tested solution. The whole world's network infrastructure is based on this design pattern and it works pretty well.
Until you spend days hunting a bug in 'inherited code' which was tested on a PC but doesn't run on -for example- ARM, because the ARM doesn't support unaligned accesses. So there goes your unit test out of the window, because it didn't catch the bug due to differences between platforms. What is a unit test worth if you need to run it for every specific platform?

Mapping buffers onto structs also requires specific compiler settings to pack the structs in a certain way. This can clash with other compiler options or may be different between compilers from different vendors. All in all, mapping structs onto buffers brings you a whole heap of potential platform and code portability problems.
« Last Edit: April 10, 2021, 10:14:09 am by nctnico »
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8172
  • Country: fi
Re: Best practice for messaging protocol parsing implementation
« Reply #23 on: April 10, 2021, 10:17:32 am »
Unaligned access results in an immediate exception called - guess what - "bus error" or "unaligned access"! But I'm not too surprised you have to hunt it down for days.

But, the struggle to find a typoed "<" when the programmer meant "<<" is very real as it compiles, runs, and even works with certain lucky values and only produces the wrong result randomly.

But all in all, I'm not saying you are completely wrong. All your points are valid and I totally agree with them. It's just that there is much more to it than your simplistic good/bad right/wrong viewpoint.
« Last Edit: April 10, 2021, 10:19:58 am by Siwastaja »
 

Online nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: Best practice for messaging protocol parsing implementation
« Reply #24 on: April 10, 2021, 10:23:20 am »
Unaligned access results in an immediate exception called - guess what - "bus error" or "unaligned access"!
Well, you'd expect that, but -surprisingly- not all ARM cores throw that exception. Some will just silently mess up the data and keep on going. Do you really think I'd need to hunt a bug for days if it threw an exception?  :palm: With an exception I'd get a nice stack trace which pinpoints the problem.

Experiences like that taught me it is wise to write code in such a way that it can't fail due to platform/compiler dependencies. In the end that is a far more efficient use of my time.
« Last Edit: April 10, 2021, 10:30:58 am by nctnico »
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

