Thoughts on Message Protocols on Multidrop UART

Posted by DaAwesomeP on 20 Jul, 2021 01:40
Hello!

I'm working on a project with many devices on a multi-drop UART or USRT line talking to a central controller.

Many times I have made simple byte message protocols consisting of something like STARTBYTE | ADDR | CMD | DATALEN | DATA... | SUM | STOPBYTE. I can do this again, but it can be tedious to write/make changes to as the project gets bigger and I add more and more commands with differing data types and combinations. I'm using Zephyr and I noticed it has support for protocol buffers (nanopb) and CBOR (tinycbor) libraries in their samples, so here I am: Is it worth exploring the other data formats? Will I save any time with the packet implementation or extending the protocol, or will I spend all my time getting a new protocol type to work? Should I stick with my simple messages?

The first thing I realized is that it will make it much harder to troubleshoot/debug on a scope/analyzer. But at the same time it would make handling the different types of data that need to be transmitted in one command much easier to deal with and more flexible.
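For reference, the simple frame layout mentioned above (STARTBYTE | ADDR | CMD | DATALEN | DATA... | SUM | STOPBYTE) could be built roughly like this. This is only a sketch: the marker values `0x7E`/`0x7F`, the function name `frame_encode`, and the choice of an 8-bit additive sum over ADDR, CMD, DATALEN and DATA are assumptions for illustration, not something specified in the thread.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_START    0x7Eu  /* assumed start marker */
#define FRAME_STOP     0x7Fu  /* assumed stop marker */
#define FRAME_OVERHEAD 6u     /* START, ADDR, CMD, DATALEN, SUM, STOP */

/* Build one frame into out[]. Returns the frame length in bytes,
 * or 0 if the output buffer is too small. */
size_t frame_encode(uint8_t addr, uint8_t cmd,
                    const uint8_t *data, uint8_t len,
                    uint8_t *out, size_t out_cap)
{
    size_t total = (size_t)len + FRAME_OVERHEAD;
    if (out_cap < total)
        return 0;

    /* Assumed checksum: 8-bit sum of ADDR, CMD, DATALEN and all data bytes. */
    uint8_t sum = (uint8_t)(addr + cmd + len);
    size_t n = 0;

    out[n++] = FRAME_START;
    out[n++] = addr;
    out[n++] = cmd;
    out[n++] = len;
    for (size_t i = 0; i < len; i++) {
        out[n++] = data[i];
        sum = (uint8_t)(sum + data[i]);
    }
    out[n++] = sum;
    out[n++] = FRAME_STOP;
    return n;
}
```

Note that this sketch does not escape payload bytes that happen to equal the start or stop marker, so a receiver would have to rely on DATALEN rather than on the markers alone.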

I appreciate any thoughts on this,
Perry

#1 Reply
Posted by errorprone on 20 Jul, 2021 02:04
Another option is Modbus, which is similar in principle. It is variable length with different commands and a CRC. Instead of a start and stop it has a timeout. With a USB-to-UART converter, it’s possible to use a PC and Python to decode/debug the messages.

#2 Reply
Posted by AaronLee on 20 Jul, 2021 02:30
Quote from: DaAwesomeP on 20 Jul, 2021 01:40
Hello!

I'm working on a project with many devices on a multi-drop UART or USRT line talking to a central controller.

Many times I have made simple byte message protocols consisting of something like STARTBYTE | ADDR | CMD | DATALEN | DATA... | SUM | STOPBYTE. I can do this again, but it can be tedious to write/make changes to as the project gets bigger and I add more and more commands with differing data types and combinations. I'm using Zephyr and I noticed it has support for protocol buffers (nanopb) and CBOR (tinycbor) libraries in their samples, so here I am: Is it worth exploring the other data formats? Will I save any time with the packet implementation or extending the protocol, or will I spend all my time getting a new protocol type to work? Should I stick with my simple messages?

The first thing I realized is that it will make it much harder to troubleshoot/debug on a scope/analyzer. But at the same time it would make handling the different types of data that need to be transmitted in one command much easier to deal with and more flexible.

I appreciate any thoughts on this,
Perry

I've done exactly what you've been doing (simple protocol with STARTBYTE | ADDR | CMD | DATALEN | DATA... | SUM | STOPBYTE) all my career. Often times it's just a one-to-one situation, but other times it's been multidrop, ie. one master and multiple slaves. I've never had the need for multi-master networks. I've used UART, RS-232, RS-485, etc. And I've used other protocols (Modbus, HART, etc.) where the network requirements mandated it. If there's no mandate for a particular protocol, in my experience just doing what you're doing is the simplest. On all the devices on a multi-drop network, I add a debug UART port, and just print out the commands as they're received/sent and send them to a PC terminal via a USB to UART adapter. If the debugging gets complicated, I can synchronize all the clocks and printout a timestamp to the debugging port to go along with the commands received/sent.

While it's no problem for me to use Modbus or various other protocols, I find it's a bit more work to implement and never as simple as a basic protocol such as you're using.

Of course, if you have errors in communication, then you need some error detection/correction method, which always adds a lot to the complexity of things. So if you have a communications library that already has those functions built-in, you might find that easy to use. For me, it's just a headache though in dealing with someone else's libraries, and especially if I need my system optimized for memory, timing, or various other parameters, I can easily do it if it's all my own code, but not so easily done if I'm using someone else's library. It's really frustrating to me to use someone else's library and waste days or even weeks trying to get it to do exactly what you want it to do, when you could have just written it yourself from scratch and saved a lot of time. But that's a personal decision for each engineer to make.

#3 Reply
Posted by DaAwesomeP on 20 Jul, 2021 03:57
Quote from: errorprone on 20 Jul, 2021 02:04
Another option is Modbus, which is similar in principle. It is variable length with different commands and a CRC. Instead of a start and stop it has a timeout. With a USB-to-UART converter, it’s possible to use a PC and Python to decode/debug the messages.

Ah I forgot about Modbus. I guess that would more be the way to go if I go with a standard protocol.

Quote from: AaronLee on 20 Jul, 2021 02:30
I've done exactly what you've been doing (simple protocol with STARTBYTE | ADDR | CMD | DATALEN | DATA... | SUM | STOPBYTE) all my career. Often times it's just a one-to-one situation, but other times it's been multidrop, ie. one master and multiple slaves. I've never had the need for multi-master networks. I've used UART, RS-232, RS-485, etc. And I've used other protocols (Modbus, HART, etc.) where the network requirements mandated it. If there's no mandate for a particular protocol, in my experience just doing what you're doing is the simplest. On all the devices on a multi-drop network, I add a debug UART port, and just print out the commands as they're received/sent and send them to a PC terminal via a USB to UART adapter. If the debugging gets complicated, I can synchronize all the clocks and printout a timestamp to the debugging port to go along with the commands received/sent.

While it's no problem for me to use Modbus or various other protocols, I find it's a bit more work to implement and never as simple as a basic protocol such as you're using.

Of course, if you have errors in communication, then you need some error detection/correction method, which always adds a lot to the complexity of things. So if you have a communications library that already has those functions built-in, you might find that easy to use. For me, it's just a headache though in dealing with someone else's libraries, and especially if I need my system optimized for memory, timing, or various other parameters, I can easily do it if it's all my own code, but not so easily done if I'm using someone else's library. It's really frustrating to me to use someone else's library and waste days or even weeks trying to get it to do exactly what you want it to do, when you could have just written it yourself from scratch and saved a lot of time. But that's a personal decision for each engineer to make.

Thank you very much for your advice. I will stick with my simple solution for now. I think at the very least I should get everything working the simple way I know how and then re-evaluate if it makes sense (of course I will have other priorities then).

Yeah I was also thinking about how ACK/checksum would work. I guess in a simple system I would add a CRC and then if it fails throw it out and ask for the message again. Or even just going into an error state at that point would be fine for this application. But Zephyr should make dealing with CRCs pretty simple for just pass/failing messages.

#4 Reply
Posted by AaronLee on 20 Jul, 2021 04:38
Quote from: DaAwesomeP on 20 Jul, 2021 03:57

Thank you very much for your advice. I will stick with my simple solution for now. I think at the very least I should get everything working the simple way I know how and then re-evaluate if it makes sense (of course I will have other priorities then).

Yeah I was also thinking about how ACK/checksum would work. I guess in a simple system I would add a CRC and then if it fails throw it out and ask for the message again. Or even just going into an error state at that point would be fine for this application. But Zephyr should make dealing with CRCs pretty simple for just pass/failing messages.

What I do, if the system has enough memory to allow for it, is add a sequential transaction number to the protocol: each packet sent carries the current transaction number, and the count is then incremented. The sent packets are kept in a buffer, so if there's no acknowledgment for a particular packet, it is still in the buffer and can easily just be resent. The logic to implement that isn't too overwhelming. But if memory is at a premium, other strategies might be needed. Of course, in some cases the sequence of commands is critical and resending a command out of sequence can cause issues, so in that case the system needs to hold off sending new commands until it gets an acknowledgment. Sometimes a system with plenty of bandwidth can simply resend/re-request all the data that needs to be communicated in a round-robin fashion, so that if some packet is corrupted/missing, it'll end up getting sent in the next cycle. There are definitely many systems where this approach wouldn't work, but where it does, it really doesn't require any error handling other than verifying your CRC/checksum so that you don't process garbage packets.

As for CRC/checksum calculations, it's all super trivial in my book. Once you have a function to calculate the CRC/checksum you want, it's super easy to make a function that calls it on all the data you're about to send or have just received.
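As an illustration of how small that is, here is a bitwise CRC-16 plus the two wrappers for sending and receiving. The thread does not say which CRC to use; CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF) is picked here purely as an assumption, and the names `frame_append_crc` and `frame_crc_ok` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, no reflection, no final XOR. */
uint16_t crc16_ccitt_false(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Sender side: append the CRC (high byte first) after len bytes.
 * The buffer must have room for len + 2 bytes. Returns the new length. */
size_t frame_append_crc(uint8_t *frame, size_t len)
{
    uint16_t crc = crc16_ccitt_false(frame, len);
    frame[len]     = (uint8_t)(crc >> 8);
    frame[len + 1] = (uint8_t)(crc & 0xFFu);
    return len + 2;
}

/* Receiver side: returns 1 if the trailing two bytes match the CRC
 * of everything before them, 0 otherwise. */
int frame_crc_ok(const uint8_t *frame, size_t len_with_crc)
{
    if (len_with_crc < 2)
        return 0;
    size_t n = len_with_crc - 2;
    uint16_t received = (uint16_t)(((uint16_t)frame[n] << 8) | frame[n + 1]);
    return crc16_ccitt_false(frame, n) == received;
}
```

A table-driven version would be faster; the bitwise loop is used here only because it is short enough to check by eye.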

If you decide to use Modbus, while it has a defined CRC for each packet, and an error code for invalid packets, there's no defined mechanism for resending packets that were corrupted or not received. So you still have the same issue with how to handle those errors once they're detected.

#5 Reply
Posted by Berni on 20 Jul, 2021 05:52
If you are after reliability then use CAN.

Lots of MCUs have built-in CAN controllers these days. Those controllers handle all the tricky bits of multi-master communication by themselves: they checksum the data to make sure it is intact and automatically resend it if it gets garbled. CAN works over differential pairs, so it's more resilient to noise. If you have CAN FD support, you can even get a decent amount of speed out of it too. All the software has to do is throw some data into a buffer, and the CAN controller takes care of getting it out.

That being said the classical UART packet format of START ADDR LEN DATA..... CRC STOP works really well and is pretty flexible. You can designate the first data byte as a command and simply have devices ignore unknown commands. That way you can add any new packets of any new length and not break anything. If you need more than 255 length you can designate the 255 as meaning "This message is 255 long and continues in the next message" so you can string together multiple messages into one giant 100KB message if you so required. If you need more than 256 commands then you can reserve command 255 as meaning "Extended Command, see next byte for command". That way the old devices will still simply ignore the Extended Command but new devices that know it can read it. So if you start off your protocol in a smart way you can just keep extending it indefinitely.
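A rough sketch of that extension scheme, with the first data byte treated as the command, unknown commands ignored, and 255 reserved as an escape to a second command byte. The specific command numbers and names (`CMD_SET_LED`, `CMD_READ_TEMP`, `XCMD_SELF_TEST`) are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define CMD_EXTENDED 0xFFu  /* reserved: the real command is in the next byte */

/* Hypothetical command numbers, for illustration only. */
enum { CMD_SET_LED = 0x01, CMD_READ_TEMP = 0x02 };
enum { XCMD_SELF_TEST = 0x00 };

typedef enum { DISPATCH_HANDLED, DISPATCH_IGNORED } dispatch_result;

/* data points at the DATA field of a frame that already passed its CRC check. */
dispatch_result dispatch(const uint8_t *data, size_t len)
{
    if (len == 0)
        return DISPATCH_IGNORED;

    if (data[0] == CMD_EXTENDED) {
        /* An older device would simply not have this branch and would
         * fall through to "ignored" below. */
        if (len < 2)
            return DISPATCH_IGNORED;
        switch (data[1]) {
        case XCMD_SELF_TEST:
            return DISPATCH_HANDLED;   /* real handler would go here */
        default:
            return DISPATCH_IGNORED;
        }
    }

    switch (data[0]) {
    case CMD_SET_LED:
    case CMD_READ_TEMP:
        return DISPATCH_HANDLED;       /* real handlers would go here */
    default:
        return DISPATCH_IGNORED;       /* unknown command: silently skip */
    }
}
```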

The easy way to ensure reliable delivery of a message is to make ACKs and NACKs mandatory. That way, after sending a packet, the node expects an ACK or NACK from the recipient. So unless it gets an ACK in a timely manner, it just resends the same buffer onto the bus again. No extra memory needed and no keeping track of packet indexes. But it does slow down the bus waiting for the responses (since nobody else is allowed to talk in that time). Then again, if you are after high speed, you should probably be looking at something other than UART.
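The stop-and-wait idea fits in a few lines. In this sketch the actual UART transmit and the wait for a reply are hidden behind a hypothetical callback (`send_and_wait_fn`), since those depend entirely on the platform; `flaky_link` is only a stand-in that drops the first few attempts so the retry logic can be exercised.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { REPLY_ACK, REPLY_NACK, REPLY_TIMEOUT } reply_t;

/* Hypothetical link hook: transmit buf, then block until a reply arrives
 * or a timeout expires. A real one would wrap the UART driver. */
typedef reply_t (*send_and_wait_fn)(const uint8_t *buf, size_t len, void *ctx);

/* Send the same buffer up to max_tries times. Returns true once an ACK is
 * seen; false means the caller should go into its error state. */
bool send_reliable(send_and_wait_fn link, void *ctx,
                   const uint8_t *buf, size_t len, unsigned max_tries)
{
    for (unsigned attempt = 0; attempt < max_tries; attempt++) {
        if (link(buf, len, ctx) == REPLY_ACK)
            return true;
        /* NACK or timeout: the same buffer goes out again unchanged. */
    }
    return false;
}

/* Stand-in link for demonstration: ctx points at a counter of how many
 * attempts to drop before answering with an ACK. */
reply_t flaky_link(const uint8_t *buf, size_t len, void *ctx)
{
    (void)buf;
    (void)len;
    unsigned *drops_left = ctx;
    if (*drops_left > 0) {
        (*drops_left)--;
        return REPLY_TIMEOUT;
    }
    return REPLY_ACK;
}
```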

#6 Reply
Posted by AaronLee on 20 Jul, 2021 06:04
Quote from: Berni on 20 Jul, 2021 05:52
If you are after reliability then use CAN.

Or Ethernet and use TCP/IP, which has built-in error correction. Of course then you need to add a TCP/IP stack to your software, and have an MCU that has an Ethernet port. One possible advantage of Ethernet is if you use PoE and power all the slave devices from the master. Of course that's yet even more hardware.

#7 Reply
Posted by nctnico on 20 Jul, 2021 07:50
Quote from: AaronLee on 20 Jul, 2021 06:04
Quote from: Berni on 20 Jul, 2021 05:52
If you are after reliability then use CAN.

Or Ethernet and use TCP/IP, which has built-in error correction. Of course then you need to add a TCP/IP stack to your software, and have an MCU that has an Ethernet port.
You can run Ethernet frames over any medium (as long as it supports variable-length messages), including a multi-drop UART interface. Been there, done that.

#8 Reply
Posted by ajb on 20 Jul, 2021 15:35
Quote from: DaAwesomeP on 20 Jul, 2021 03:57
Yeah I was also thinking about how ACK/checksum would work. I guess in a simple system I would add a CRC and then if it fails throw it out and ask for the message again.

Be careful with this in a multidrop situation. An integrity check failure means that none of the contents of the message can be trusted, and that includes the header containing the destination address and/or source address (unless the header has its own integrity check, but that's uncommon, esp for UART protocols). So a device that receives a message that fails integrity check and sends an error response could be responding to a message that was never meant for it, or could be sending that response to a device that didn't actually send it. Then you have a mess on your hands, especially if the error response collides with any other response on the bus, causing more CRC failures...

If you need reliability in a multidrop system, it's generally better to send a positive ACK on valid messages, a negative ACK on messages with good checksums but otherwise bad contents (invalid commands or whatever), and no response on messages with bad checksums so the sender can retransmit if no response is received.
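Written as a decision function, that rule might look like the following. The names are hypothetical; the point is only the order of the checks: the CRC is tested first, because until it passes not even the address field can be trusted.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { RESP_NONE, RESP_ACK, RESP_NACK } response_t;

/* Decide what a slave should put on the bus after receiving a frame. */
response_t choose_response(bool crc_ok, uint8_t dest_addr,
                           uint8_t my_addr, bool cmd_valid)
{
    if (!crc_ok)
        return RESP_NONE;   /* corrupted: stay silent, sender will time out */
    if (dest_addr != my_addr)
        return RESP_NONE;   /* intact, but meant for another device */
    return cmd_valid ? RESP_ACK : RESP_NACK;
}
```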

#9 Reply
Posted by Siwastaja on 20 Jul, 2021 15:47
If you are having a scalability/maintainability problem with your start | addr | cmd | len | payload | checksum scheme, that doesn't mean the concept itself is bad; you are just not abstracting it properly and are doing unnecessary manual work constructing and parsing the messages.

In constrained embedded environments, I prefer to overlay packed structs directly over the data (either by pointer cast or union type punning), allowing one single header file, which can be included from both sides, define the whole protocol, so that there is no need to write any data generation or parsing code anywhere; just set the variables, send the struct, receive struct, access variables.

This is the most writable, most readable, most maintainable pattern by far, but the downside is, this is non-portable* and possibly requires a compiler flag like -fno-strict-aliasing and use of alignment attributes. The biggest practical risk, though, is getting hit by a hyperventilating language lawyer, so I can't recommend doing this, just describing what I do with zero problems (other than said language lawyers) during years of doing that. I'm also not discussing this further this time, this is just something OP might want to look into and make their own decisions so please allow them to do that.

*) non-portable meaning, doesn't run automatically everywhere as-is, not that it couldn't be ported with trivial amount of work per platform combination.

#10 Reply
Posted by SiliconWizard on 20 Jul, 2021 16:30
Quote from: Siwastaja on 20 Jul, 2021 15:47
If you are having a scalability/maintainability problem with your start | addr | cmd | len | payload | checksum scheme, that doesn't mean the concept itself is bad; you are just not abstracting it properly and are doing unnecessary manual work constructing and parsing the messages.

I agree. And using protocol buffers for that is often pretty overkill.

Using any kind of packet format doesn't in itself prevent you from properly abstracting/factoring code to make maintenance easy.
As we said in another thread - the key is to avoid duplicating code as much as you can. There are many ways of doing this. Pick one you're comfortable with. And factor, factor, factor.

#11 Reply
Posted by DaAwesomeP on 20 Jul, 2021 18:00
Wow! So many insights! Thank you!

Quote from: Berni on 20 Jul, 2021 05:52
If you are after reliability then use CAN.

Quote from: AaronLee on 20 Jul, 2021 06:04
Or Ethernet and use TCP/IP, which has built-in error correction. Of course then you need to add a TCP/IP stack to your software, and have an MCU that has an Ethernet port. One possible advantage of Ethernet is if you use PoE and power all the slave devices from the master. Of course that's yet even more hardware.

I thought about RS485, CAN, and Ethernet (PoE would be really nice here), but a goal of this project is to keep it as cheap as possible. The distance between boards isn't long enough for me to be too concerned. Boards will essentially consist of an MCU, regulator, buffers, caps, and connectors.

Quote from: Berni on 20 Jul, 2021 05:52
That being said the classical UART packet format of START ADDR LEN DATA..... CRC STOP works really well and is pretty flexible. You can designate the first data byte as a command and simply have devices ignore unknown commands. That way you can add any new packets of any new length and not break anything. If you need more than 255 length you can designate the 255 as meaning "This message is 255 long and continues in the next message" so you can string together multiple messages into one giant 100KB message if you so required. If you need more than 256 commands then you can reserve command 255 as meaning "Extended Command, see next byte for command". That way the old devices will still simply ignore the Extended Command but new devices that know it can read it. So if you start off your protocol in a smart way you can just keep extending it indefinitely.

Fantastic strategy, thank you!

Quote from: Berni on 20 Jul, 2021 05:52
The easy way to ensure reliable delivery of a message is to make ACKs and NACKs mandatory. That way, after sending a packet, the node expects an ACK or NACK from the recipient. So unless it gets an ACK in a timely manner, it just resends the same buffer onto the bus again. No extra memory needed and no keeping track of packet indexes. But it does slow down the bus waiting for the responses (since nobody else is allowed to talk in that time). Then again, if you are after high speed, you should probably be looking at something other than UART.
That's a good enough strategy for my application. Ensure delivery for each message, and then resend a few times before going into an error state.

Quote from: ajb on 20 Jul, 2021 15:35
If you need reliability in a multidrop system, it's generally better to send a positive ACK on valid messages, a negative ACK on messages with good checksums but otherwise bad contents (invalid commands or whatever), and no response on messages with bad checksums so the sender can retransmit if no response is received.

I did not think about NACKs. I'll add that to my list.

Quote from: Siwastaja on 20 Jul, 2021 15:47
In constrained embedded environments, I prefer to overlay packed structs directly over the data (either by pointer cast or union type punning), allowing one single header file, which can be included from both sides, define the whole protocol, so that there is no need to write any data generation or parsing code anywhere; just set the variables, send the struct, receive struct, access variables.

That's very simple but definitely makes perfect sense. I can't believe I didn't think of that! I'm a team of one (me) with nobody peering over my shoulder, and I am comfortable with that implementation, so I will probably go with that.

Thank you everyone so much! This is amazingly helpful!

#12 Reply
Posted by Berni on 21 Jul, 2021 05:31
Quote from: Siwastaja on 20 Jul, 2021 15:47
In constrained embedded environments, I prefer to overlay packed structs directly over the data (either by pointer cast or union type punning), allowing one single header file, which can be included from both sides, define the whole protocol, so that there is no need to write any data generation or parsing code anywhere; just set the variables, send the struct, receive struct, access variables.

This is the most writable, most readable, most maintainable pattern by far, but the downside is, this is non-portable* and possibly requires a compiler flag like -fno-strict-aliasing and use of alignment attributes. The biggest practical risk, though, is getting hit by a hyperventilating language lawyer, so I can't recommend doing this, just describing what I do with zero problems (other than said language lawyers) during years of doing that. I'm also not discussing this further this time, this is just something OP might want to look into and make their own decisions so please allow them to do that.

Yeah casting the whole byte array as a struct is a really good way of "parsing" the data.

It's pretty popular over here. The packing is mostly done by compiler directives right next to the struct to make sure it does not get lost, and everything we use tends to be ARM with GCC, so the code runs fine across different projects. But yeah, moving to a different compiler would need some adjustments to the code, while moving to something with the opposite endianness would really break things. But I'd say very few large piles of code would not break if suddenly run on an opposite-endianness CPU.

Sure, it has some downsides, but they are minor compared to the huge upside of literally taking 0 CPU cycles, as it doesn't actually have to move any data around, and no extra memory is needed. Besides, there are plenty of other, worse ways to shoot yourself in the foot with a C pointer.

#13 Reply
Posted by JOEBOBSICLE on 21 Jul, 2021 09:13
I'd use protobuf for encoding messages. I think CAN is just as cheap as UART and a lot more reliable.

Most microcontrollers will have CAN and it's easy to use plus robust.

#14 Reply
Posted by Siwastaja on 21 Jul, 2021 10:14
Quote from: DaAwesomeP on 20 Jul, 2021 18:00
Quote from: Siwastaja on 20 Jul, 2021 15:47
In constrained embedded environments, I prefer to overlay packed structs directly over the data (either by pointer cast or union type punning), allowing one single header file, which can be included from both sides, define the whole protocol, so that there is no need to write any data generation or parsing code anywhere; just set the variables, send the struct, receive struct, access variables.

That's very simple but definitely makes perfect sense. I can't believe I didn't think of that! I'm a team of one (me) with nobody peering over my shoulder, and I am comfortable with that implementation, so I will probably go with that.

Just remember, assuming GCC:

* If you do it by pointer casting and not unions, use -fno-strict-aliasing to prevent the compiler from making stupid assumptions.
* Use stdint.h types like uint32_t, not "int".
* __attribute__((packed)), so that the structs are indeed the same on both sides.
* __attribute__((aligned(8))), to make sure each of the structs is aligned in memory, for example if you have a buffer containing the data.
* If alignment is impossible to define beforehand, for example you receive an arbitrary number of bytes in a buffer and need to parse something at an arbitrary position, memcpy first. While there is a performance penalty, it is still at least as efficient as, or more efficient than, piecing the value together with arithmetic, and it's still just one line of code. But it's best to avoid such a situation. (I'm assuming ARM here. A pure x86 solution would work fine with unaligned access, just slower.)

Most of the networking code out there uses this pattern, and for example the TCP header fields have been designed from scratch with alignment in mind! Do the same, for example
Code:
struct PACK {
    uint8_t status;
    uint8_t reserved_for_future_use;
    int16_t temperature;
    uint32_t counter;
}
instead of
Code:
struct PACK {
    uint8_t status;
    uint32_t counter;
    uint8_t reserved_for_future_use;
    int16_t temperature;
}
In the former, everything is aligned as long as the struct itself is aligned per the requirements of the longest type (8 pretty much always works, but 4 would work here). In the latter, compiler would want to add padding, which is prevented, and you have unaligned access for counter and temperature.
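The layout claims for the first struct can be checked at compile time, and the memcpy route for unaligned buffers is a one-liner. This is a sketch assuming GCC or Clang; `PACK` is assumed here to be a macro for `__attribute__((packed))`, and the struct tag `status_msg` and the helper `status_msg_read` are made-up names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed meaning of PACK in the post; GCC/Clang syntax. */
#define PACK __attribute__((packed))

struct PACK status_msg {
    uint8_t  status;
    uint8_t  reserved_for_future_use;
    int16_t  temperature;
    uint32_t counter;
};

/* Compile-time checks: no padding, and every field naturally aligned
 * relative to the start of the struct. */
_Static_assert(sizeof(struct status_msg) == 8, "unexpected struct size");
_Static_assert(offsetof(struct status_msg, temperature) == 2, "temperature misplaced");
_Static_assert(offsetof(struct status_msg, counter) == 4, "counter misplaced");

/* Read a message that sits at an arbitrary (possibly unaligned) position
 * in a receive buffer: copy first, then access the fields. */
struct status_msg status_msg_read(const uint8_t *buf)
{
    struct status_msg m;
    memcpy(&m, buf, sizeof m);
    return m;
}
```

This does nothing about endianness: both ends still have to agree on byte order, as discussed elsewhere in the thread.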

Yeah, a few things to remember and take care of.

#15 Reply
Posted by fchk on 22 Jul, 2021 06:53
Automotive has created LIN as a robust form of multidrop UART. Maybe this is for you. Cheap, easy to use, proven.

#16 Reply
Posted by peter-h on 22 Jul, 2021 13:53
"Another option is Modbus which is similar in principle. It is variable length with different commands and a CRC. Instead of a start and stop it has a timeout. "

I am informed that many products don't implement that timeout feature (to detect end of message, etc.). So they receive data until the CRC works out right.

Most protocols are way too complex for most tasks - because a whole committee got stuck into it. Just use a unique start byte, etc, just like at present. Keep it simple...

CAN and LIN are complex.

If you want error detection/correction then it gets much more complex because (for example) you can't be sure if the recipient failed to get the message. It could simply be that his ACK got lost on the way to you. This leads to a need for an incrementing serial number in each packet, OR a design where multiple identical packets don't cause a problem.
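A minimal sketch of the serial-number approach on the receiving side: the receiver remembers the last sequence number it executed, and a repeat of that number is acknowledged again but not executed again. It assumes a single sender with at most one packet in flight (stop-and-wait); the names `rx_state` and `rx_accept` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool    have_last;  /* false until the first packet has been accepted */
    uint8_t last_seq;   /* sequence number of the last executed packet */
} rx_state;

/* Returns true if the packet is new and should be executed, false if it
 * is a retransmission of the previous one (the sender probably lost our
 * ACK). In both cases the receiver would send an ACK. */
bool rx_accept(rx_state *st, uint8_t seq)
{
    if (st->have_last && st->last_seq == seq)
        return false;
    st->have_last = true;
    st->last_seq = seq;
    return true;
}
```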

#17 Reply
Posted by ajb on 22 Jul, 2021 15:45
Quote from: peter-h on 22 Jul, 2021 13:53
Most protocols are way too complex for most tasks - because a whole committee got stuck into it. Just use a unique start byte, etc, just like at present. Keep it simple...

CAN and LIN are complex.

If you want error detection/correction then it gets much more complex because (for example) you can't be sure if the recipient failed to get the message. It could simply be that his ACK got lost on the way to you. This leads to a need for an incrementing serial number in each packet, OR a design where multiple identical packets don't cause a problem.

This last paragraph is part of why a lot of protocols are complex. Even if the device/application is simple, the real world around it is not, and you need to be prepared for when the real world gets involved in your protocol. It's also why a lot of protocols rely on committees, because bringing in more people means more experience and expertise to help inform how the protocol should account for various use cases and potential problems. Of course it's possible to account for too many use cases and end up with something monstrous, but that's a failure of scope setting more than protocol development. Finally, there's often a tradeoff in complexity between levels of a system. Simplifying the protocol may just end up shifting complexity to other areas of the applications that use it, whether because you end up shoehorning functionality into an inadequate communication channel, or because it didn't account for a particular error condition, or because the protocol specification ends up leaving enough room for interpretation that different devices will have differences in implementations.

#18 Reply
Posted by nctnico on 22 Jul, 2021 20:24
Quote from: Berni on 21 Jul, 2021 05:31
Quote from: Siwastaja on 20 Jul, 2021 15:47
In constrained embedded environments, I prefer to overlay packed structs directly over the data (either by pointer cast or union type punning), allowing one single header file, which can be included from both sides, define the whole protocol, so that there is no need to write any data generation or parsing code anywhere; just set the variables, send the struct, receive struct, access variables.

This is the most writable, most readable, most maintainable pattern by far, but the downside is, this is non-portable* and possibly requires a compiler flag like -fno-strict-aliasing and use of alignment attributes. The biggest practical risk, though, is getting hit by a hyperventilating language lawyer, so I can't recommend doing this, just describing what I do with zero problems (other than said language lawyers) during years of doing that. I'm also not discussing this further this time, this is just something OP might want to look into and make their own decisions so please allow them to do that.

Yeah casting the whole byte array as a struct is a really good way of "parsing" the data.
Until you get burned a couple of times. I have stopped casting structs onto received data a long time ago. You will run into endianness problems and unaligned accesses which may silently fail causing corruption which is hard to spot. The portability between systems & compilers just sucks. I'm using readable text based protocols almost exclusively because they are easy to debug, easy to extend and easy to port.
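To give an idea of what a text-based message can look like in code, here is a small parser for lines of the form "<addr> <CMD> <value>". The line format, the struct `text_msg`, and the function name are invented for this example; the post does not describe a specific format.

```c
#include <stdio.h>
#include <string.h>

typedef struct {
    unsigned addr;    /* destination address */
    char     cmd[8];  /* command word, up to 7 characters */
    int      value;   /* single integer argument */
} text_msg;

/* Parse one line such as "12 TEMP -25\n". Returns 1 on success, 0 if the
 * line has missing or extra fields. */
int text_msg_parse(const char *line, text_msg *out)
{
    char extra;
    /* %7s bounds the command word to the buffer size; the trailing %c only
     * matches if unexpected non-whitespace follows the value. */
    int n = sscanf(line, "%u %7s %d %c",
                   &out->addr, out->cmd, &out->value, &extra);
    return n == 3;
}
```

Byte order and alignment never come up here because every field travels as decimal text, at the cost of more bytes on the wire and some parsing work.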

#19 Reply
Posted by Berni on 23 Jul, 2021 05:45
Quote from: nctnico on 22 Jul, 2021 20:24
Quote from: Berni on 21 Jul, 2021 05:31

Yeah casting the whole byte array as a struct is a really good way of "parsing" the data.
Until you get burned a couple of times. I have stopped casting structs onto received data a long time ago. You will run into endianness problems and unaligned accesses which may silently fail causing corruption which is hard to spot. The portability between systems & compilers just sucks. I'm using readable text based protocols almost exclusively because they are easy to debug, easy to extend and easy to port.

Well, when things break due to endianness or unaligned access, the whole thing pretty clearly blows up to let you know. As for padding issues making it overrun the memory, that can be caught with an assertion by running the struct through sizeof().

Human-readable clear-text protocols definitely are nice for debugging, but they are not always a good option when it comes to moving large amounts of data or squeezing bandwidth out of a slow communication link (like radio communication). Something that just turns the various lights in a room on/off over UART is a great use for ASCII commands. Sending data to some big LED art installation that updates 1000 RGB LEDs at 100 fps, not so much: that case would waste so much precious link bandwidth and cause a ton of extra computational load for all the parsing.
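To put rough numbers on the LED example, the sketch below computes the minimum bit rate for the raw pixel data alone. The assumptions are mine, not from the post: one byte per colour channel (3000 payload bytes per frame), 8N1 framing (10 bits on the wire per byte), no protocol overhead, and for the text case a plain hex encoding at two characters per byte.

```c
#include <stdint.h>

/* Minimum UART bit rate for a steady stream of frames, assuming 8N1
 * framing: 1 start + 8 data + 1 stop = 10 bits on the wire per byte.
 * wire_bytes_per_payload_byte is 1 for raw binary, 2 for hex text. */
uint32_t uart_min_baud(uint32_t payload_bytes, uint32_t frames_per_s,
                       uint32_t wire_bytes_per_payload_byte)
{
    return payload_bytes * wire_bytes_per_payload_byte * frames_per_s * 10u;
}
```

Under those assumptions, `uart_min_baud(3000, 100, 1)` gives 3,000,000 bit/s for binary and `uart_min_baud(3000, 100, 2)` gives 6,000,000 bit/s for hex text, before any addressing, checksums, or separators are added.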

For example the standard NMEA protocol for GPS receivers that pretty much every GPS module on the planet supports is human readable text. But then most modules also implement some vendor proprietary protocol that is almost always a binary one, only supporting the more advanced feature on that protocol. For a good reason too since if you start sending out live raw satellite data at high update rates (for example for RTK differential GPS) you tend to need something along the lines of 115200 baud UART to get that much data out even in the really compact binary format.

#20 Reply
Posted by MadScientist on 23 Jul, 2021 08:46
Err, LIN bus

#21 Reply
Posted by MadScientist on 23 Jul, 2021 08:48
Quote from: Berni on 23 Jul, 2021 05:45
Quote from: nctnico on 22 Jul, 2021 20:24
Quote from: Berni on 21 Jul, 2021 05:31

Yeah casting the whole byte array as a struct is a really good way of "parsing" the data.
Until you get burned a couple of times. I have stopped casting structs onto received data a long time ago. You will run into endianness problems and unaligned accesses which may silently fail causing corruption which is hard to spot. The portability between systems & compilers just sucks. I'm using readable text based protocols almost exclusively because they are easy to debug, easy to extend and easy to port.

Well, when things break due to endianness or unaligned access, the whole thing pretty clearly blows up to let you know. As for padding issues making it overrun the memory, that can be caught as an assertion by running the struct through sizeof().

Human-readable clear-text protocols definitely are nice for debugging, but they are not always a good option when it comes to moving large amounts of data or squeezing bandwidth out of a slow communication link (like radio communication). Something that just turns the various lights in a room on/off over UART is a great use for ASCII commands. Sending data to some big LED art installation that updates 1000 RGB LEDs at 100 fps, not so much: that case would waste a lot of precious link bandwidth and cause a ton of extra computational load for all the parsing.

For example, the standard NMEA protocol for GPS receivers, which pretty much every GPS module on the planet supports, is human-readable text. But most modules also implement some vendor-proprietary protocol that is almost always binary, with the more advanced features supported only on that protocol. For good reason too: if you start sending out live raw satellite data at high update rates (for example for RTK differential GPS), you tend to need something along the lines of a 115200 baud UART to get that much data out, even in the really compact binary format.

Of course, the marine industry has largely moved to NMEA 2000, which is CAN-based and not human-readable.

It's getting harder to find older NMEA 0183 units.

#22 Reply
Posted by Tagli on 23 Jul, 2021 09:28
Quote from: peter-h on 22 Jul, 2021 13:53
I am informed that many products don't implement that timeout feature (to detect end of message, etc). So they receive data until the CRC works out right
If you can spare a timer in your microcontroller, timeouts can be implemented easily. I've implemented a simple Modbus RTU slave on PIC16F & dsPIC30F devices which lack this kind of timeout support and DMA.

Some STM32 devices support an adjustable timeout (called Receiver Timeout), while others only have a fixed 1-character timeout (called Idle Detection). The Modbus spec calls for a 3.5-character timeout, but in practice a 1-character timeout works okay. I don't think that any modern USART transmitter places gaps into its output stream.
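For anyone loading a timer with the 3.5-character value, a minimal sketch of the calculation follows. It assumes the usual Modbus RTU character of 11 bits (start, 8 data, parity, stop) and uses the fixed 1750 µs that the Modbus serial line spec recommends above 19200 baud. The function name modbus_t35_us is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Inter-frame silence (t3.5) in microseconds for Modbus RTU.
 * Assumes 11 bits per character (start + 8 data + parity + stop). */
uint32_t modbus_t35_us(uint32_t baud)
{
    assert(baud > 0u);
    if (baud > 19200u) {
        return 1750u;  /* fixed value recommended above 19200 baud */
    }
    /* 3.5 chars * 11 bits = 38.5 bit times. Use 77/2 to stay in integers,
     * and round up so the timeout is never shorter than required. */
    return (uint32_t)((77ull * 1000000ull + 2ull * baud - 1ull) / (2ull * baud));
}
```

At 9600 baud this gives 4011 µs. The result can go straight into a one-shot timer that is restarted on every received byte.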

The only thing I don't like about Modbus is its 16-bit registers. It's possible to implement workarounds in the slave firmware. Also, it's hard to find free bus testing GUI tools which support different data types, like int32_t or float.
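One common workaround is to spread a 32-bit value across two consecutive registers. The sketch below assumes a 32-bit IEEE 754 float and high-word-first ordering. The word order is purely an assumption, since Modbus does not define one and devices differ. The function names are hypothetical.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Split a float into two 16-bit registers, high word first (assumed). */
void float_to_regs(float value, uint16_t regs[2])
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);   /* copy the bit pattern safely */
    regs[0] = (uint16_t)(bits >> 16);
    regs[1] = (uint16_t)(bits & 0xFFFFu);
}

/* Rebuild the float from two registers in the same word order. */
float regs_to_float(const uint16_t regs[2])
{
    uint32_t bits = ((uint32_t)regs[0] << 16) | regs[1];
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

The master has to agree on the same word order, which is exactly the part generic GUI tools tend to get wrong or not expose.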

#23 Reply
Posted by nctnico on 23 Jul, 2021 10:05
Quote from: Berni on 23 Jul, 2021 05:45
Quote from: nctnico on 22 Jul, 2021 20:24
Quote from: Berni on 21 Jul, 2021 05:31

Yeah casting the whole byte array as a struct is a really good way of "parsing" the data.
Until you get burned a couple of times. I have stopped casting structs onto received data a long time ago. You will run into endianness problems and unaligned accesses which may silently fail causing corruption which is hard to spot. The portability between systems & compilers just sucks. I'm using readable text based protocols almost exclusively because they are easy to debug, easy to extend and easy to port.

Well, when things break due to endianness or unaligned access, the whole thing pretty clearly blows up to let you know. As for padding issues making it overrun the memory, that can be caught as an assertion by running the struct through sizeof().
No, it can break silently, causing subtle errors you won't spot until you've got a huge load of units in the field. Not every CPU throws an exception for an unaligned access (much to my own surprise). Endianness is another pitfall: you need to know in what byte order the protocol is defined and then check what the CPU supports. If you instead assemble each multi-byte element from its individual bytes with shifts, struct alignment and CPU byte order don't matter. The code will always work. Also keep in mind that when the compiler can't tell whether an element is aligned, it may produce code which does byte-by-byte shifting anyway, so any speed improvement from mapping the struct is not guaranteed. Things may get iffy with variables that like to be on 64-bit boundaries (like doubles on ARM).
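A minimal sketch of the byte-shifting approach, assuming the protocol defines its multi-byte fields as little-endian (swap the shift order for a big-endian protocol). The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 16-bit field starting at any byte offset. */
uint16_t read_u16_le(const uint8_t *buf, size_t offset)
{
    assert(buf != NULL);
    return (uint16_t)(buf[offset] | ((uint16_t)buf[offset + 1] << 8));
}

/* Read a little-endian 32-bit field starting at any byte offset. */
uint32_t read_u32_le(const uint8_t *buf, size_t offset)
{
    assert(buf != NULL);
    return (uint32_t)buf[offset]
         | ((uint32_t)buf[offset + 1] << 8)
         | ((uint32_t)buf[offset + 2] << 16)
         | ((uint32_t)buf[offset + 3] << 24);
}
```

Because only single bytes are ever loaded, the result is the same on big-endian and little-endian CPUs, and an odd offset is no problem. The caller is still responsible for checking that the buffer is long enough.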

#24 Reply
Posted by SiliconWizard on 23 Jul, 2021 16:29
Quote from: nctnico on 23 Jul, 2021 10:05
Quote from: Berni on 23 Jul, 2021 05:45
Quote from: nctnico on 22 Jul, 2021 20:24
Quote from: Berni on 21 Jul, 2021 05:31

Yeah casting the whole byte array as a struct is a really good way of "parsing" the data.
Until you get burned a couple of times. I have stopped casting structs onto received data a long time ago. You will run into endianness problems and unaligned accesses which may silently fail causing corruption which is hard to spot. The portability between systems & compilers just sucks. I'm using readable text based protocols almost exclusively because they are easy to debug, easy to extend and easy to port.

Well, when things break due to endianness or unaligned access, the whole thing pretty clearly blows up to let you know. As for padding issues making it overrun the memory, that can be caught as an assertion by running the struct through sizeof().
No, it can break silently causing subtle errors you won't spot until you've got a huge load of units in the field.

Uh. In theory, you're right. In practice, for this to happen, meaning never having spotted any issue with endianness or alignment during the whole development cycle of the product, including testing, it would take a clueless team with near-zero testing. Seriously.

But anyway, it's obvious, or at least should be, that if you're using structs for directly mapping data blocks, you'll need to know your tools well and know what you're doing. If you're declaring your structs "packed" with whatever your compiler has for ensuring that, and you're carefully crafting the members so that proper alignment is guaranteed on a given target, then nothing subtle will happen. You just need to know how to implement this properly. And, of course, it requires anyone working on it to know how to do this properly. That's where it could go wrong if code maintenance falls into the wrong hands. But even so...

Oh, and checking structs can be done statically, for instance using asserts and offsetof(), if you really want to check that the compiler does what you wanted it to do.
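A minimal sketch of such compile-time checks, assuming a C11 compiler. The struct is a hypothetical header loosely modeled on the STARTBYTE | ADDR | CMD | DATALEN layout from the first post, with an invented seq field added to show a multi-byte member. The packed attribute is GCC/Clang syntax; other compilers spell it differently.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical wire header, for illustration only. */
struct __attribute__((packed)) msg_header {
    uint8_t  start;
    uint8_t  addr;
    uint8_t  cmd;
    uint8_t  data_len;
    uint16_t seq;       /* placed at offset 4 so it is naturally aligned */
};

/* The build fails if the compiler lays the struct out differently. */
static_assert(sizeof(struct msg_header) == 6, "unexpected padding in msg_header");
static_assert(offsetof(struct msg_header, cmd) == 2, "cmd is not at offset 2");
static_assert(offsetof(struct msg_header, seq) == 4, "seq is not at offset 4");
```

These checks cover size and member offsets only. They say nothing about byte order, so endianness still has to be handled or tested separately.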

Also, you probably need to do more testing. Frankly, if you ever implement data structures with mismatched endianness or mismatched alignment and your tests can't spot that, then you (or your test team) should probably change jobs. Yes, yes, sure, tests are not everything, but again, if your tests haven't spotted a problem with this, then test coverage is clearly poor. When testing a communication protocol, it looks like a good idea to at least test all parameters in a given protocol block, and a reasonable range of values for each. Just saying.