Hello!
I'm working on a project with many devices on a multi-drop UART or USRT line talking to a central controller.
Many times I have made simple byte message protocols consisting of something like STARTBYTE | ADDR | CMD | DATALEN | DATA... | SUM | STOPBYTE. I can do this again, but it can be tedious to write/make changes to as the project gets bigger and I add more and more commands with differing data types and combinations. I'm using Zephyr and I noticed it has support for protocol buffers (nanopb) and CBOR (tinycbor) libraries in their samples, so here I am: Is it worth exploring the other data formats? Will I save any time with the packet implementation or extending the protocol, or will I spend all my time getting a new protocol type to work? Should I stick with my simple messages?
The first thing I realized is that it will make it much harder to troubleshoot/debug on a scope/analyzer. But at the same time it would make handling the different types of data that need to be transmitted in one command much easier to deal with and more flexible.
I appreciate any thoughts on this,
Perry
Another option is Modbus, which is similar in principle. It is variable length with different commands and a CRC. Instead of start and stop bytes, it uses a timeout to mark the end of a message. With a USB-to-UART converter, it's possible to use a PC and Python to decode/debug the messages.
I've done exactly what you've been doing (simple protocol with STARTBYTE | ADDR | CMD | DATALEN | DATA... | SUM | STOPBYTE) all my career. Often it's just a one-to-one situation, but other times it's been multidrop, i.e. one master and multiple slaves. I've never had the need for multi-master networks. I've used UART, RS-232, RS-485, etc. And I've used other protocols (Modbus, HART, etc.) where the network requirements mandated it. If there's no mandate for a particular protocol, in my experience just doing what you're doing is the simplest. On all the devices on a multi-drop network, I add a debug UART port, print out the commands as they're received/sent, and send them to a PC terminal via a USB-to-UART adapter. If the debugging gets complicated, I can synchronize all the clocks and print a timestamp to the debug port alongside each command received/sent.
While it's no problem for me to use Modbus or various other protocols, I find it's a bit more work to implement and never as simple as a basic protocol such as you're using.
Of course, if you have errors in communication, then you need some error detection/correction method, which always adds a lot to the complexity of things. So if you have a communications library that already has those functions built-in, you might find that easy to use. For me, it's just a headache though in dealing with someone else's libraries, and especially if I need my system optimized for memory, timing, or various other parameters, I can easily do it if it's all my own code, but not so easily done if I'm using someone else's library. It's really frustrating to me to use someone else's library and waste days or even weeks trying to get it to do exactly what you want it to do, when you could have just written it yourself from scratch and saved a lot of time. But that's a personal decision for each engineer to make.
Thank you very much for your advice. I will stick with my simple solution for now. I think at the very least I should get everything working the simple way I know how and then re-evaluate if it makes sense (of course I will have other priorities then).
Yeah I was also thinking about how ACK/checksum would work. I guess in a simple system I would add a CRC and then if it fails throw it out and ask for the message again. Or even just going into an error state at that point would be fine for this application. But Zephyr should make dealing with CRCs pretty simple for just pass/failing messages.
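A minimal sketch of the pass/fail CRC check described above, in plain C. It does not use Zephyr's CRC helpers; the function names and the frame layout (CRC in the last two bytes, high byte first) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, no reflection. */
uint16_t crc16_ccitt_false(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000) {
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/* Assumed layout: the last two bytes of the frame carry the CRC, high byte first.
 * Returns true if the frame passes; on false the caller drops the frame and
 * either asks for a resend or enters its error state. */
bool frame_crc_ok(const uint8_t *frame, size_t len)
{
    assert(frame != NULL);
    if (len < 3) {
        return false; /* too short to hold any data plus a CRC */
    }
    uint16_t received = (uint16_t)((frame[len - 2] << 8) | frame[len - 1]);
    return crc16_ccitt_false(frame, len - 2) == received;
}
```

The receiver only needs the boolean result: a failed check means the frame is thrown away.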
If you are after reliability then use CAN.
Or Ethernet and use TCP/IP, which has built-in error detection and retransmission. Of course, then you need to add a TCP/IP stack to your software, and have an MCU that has an Ethernet port.
If you are having a scalability/maintainability problem with your start | addr | cmd | len | payload | checksum scheme, that doesn't mean the concept itself is bad. You are just not abstracting it properly, and are doing unnecessary manual work constructing and parsing the messages.
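One way to read "abstracting it properly" is to build every frame through a single helper, so adding a command never means hand-assembling bytes again. The sketch below assumes the frame layout from the original post; the start/stop byte values, the 8-bit additive checksum over ADDR..DATA, and the name frame_encode are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_START    0x02u /* assumed start byte value */
#define FRAME_STOP     0x03u /* assumed stop byte value */
#define FRAME_OVERHEAD 6u    /* START, ADDR, CMD, LEN, SUM, STOP */

/* Builds START | ADDR | CMD | LEN | DATA... | SUM | STOP into out.
 * SUM is the 8-bit sum of ADDR, CMD, LEN and all DATA bytes.
 * Returns the total frame length, or 0 if out is too small. */
size_t frame_encode(uint8_t addr, uint8_t cmd,
                    const void *payload, uint8_t len,
                    uint8_t *out, size_t out_size)
{
    assert(out != NULL);
    assert(payload != NULL || len == 0);

    size_t total = FRAME_OVERHEAD + len;
    if (out_size < total) {
        return 0;
    }

    out[0] = FRAME_START;
    out[1] = addr;
    out[2] = cmd;
    out[3] = len;
    if (len > 0) {
        memcpy(&out[4], payload, len);
    }

    uint8_t sum = 0;
    for (size_t i = 1; i < 4u + len; i++) {
        sum = (uint8_t)(sum + out[i]);
    }
    out[4u + len] = sum;
    out[5u + len] = FRAME_STOP;
    return total;
}
```

Each command then becomes a one-line call with its own payload, and a change to the framing is made in one place.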
One possible advantage of Ethernet is that you can use PoE and power all the slave devices from the master. Of course, that's yet more hardware.
That being said the classical UART packet format of START ADDR LEN DATA..... CRC STOP works really well and is pretty flexible. You can designate the first data byte as a command and simply have devices ignore unknown commands. That way you can add any new packets of any new length and not break anything. If you need more than 255 length you can designate the 255 as meaning "This message is 255 long and continues in the next message" so you can string together multiple messages into one giant 100KB message if you so required. If you need more than 256 commands then you can reserve command 255 as meaning "Extended Command, see next byte for command". That way the old devices will still simply ignore the Extended Command but new devices that know it can read it. So if you start off your protocol in a smart way you can just keep extending it indefinitely.
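The extension scheme above (first data byte is the command, 255 means "extended command, see next byte", unknown commands are ignored) could look roughly like this. The command names, the numbering of extended commands as 0x100 plus the second byte, and both function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CMD_EXTENDED 0xFFu /* escape value: real command is in the next byte */

/* Hypothetical command set. Extended commands are numbered 0x100 | second byte. */
enum {
    CMD_PING         = 0x01,
    CMD_SET_LED      = 0x02,
    CMD_EXT_READ_LOG = 0x100 | 0x01
};

/* Splits the DATA field into a command id and its arguments.
 * Returns false if the field is too short to contain a command. */
bool command_decode(const uint8_t *data, size_t len,
                    uint16_t *cmd, const uint8_t **args, size_t *args_len)
{
    assert(data != NULL && cmd != NULL && args != NULL && args_len != NULL);

    if (len < 1) {
        return false;
    }
    if (data[0] != CMD_EXTENDED) {
        *cmd = data[0];
        *args = data + 1;
        *args_len = len - 1;
        return true;
    }
    if (len < 2) {
        return false; /* escape byte without the extended command byte */
    }
    *cmd = (uint16_t)(0x100u | data[1]);
    *args = data + 2;
    *args_len = len - 2;
    return true;
}

/* A device that does not know a command returns false and simply ignores it. */
bool command_is_known(uint16_t cmd)
{
    switch (cmd) {
    case CMD_PING:
    case CMD_SET_LED:
    case CMD_EXT_READ_LOG:
        return true;
    default:
        return false;
    }
}
```

An older device would lack the CMD_EXT_READ_LOG case, so the extended frame falls through to the default branch and is dropped without breaking anything.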
The easy way to ensure reliable delivery of a message is to make ACKs and NACKs mandatory. That way, after sending a packet, the node expects an ACK or NACK from the recipient. Unless it gets an ACK in a timely manner, it just resends the same buffer onto the bus. No extra memory needed and no keeping track of packet indexes. But it does slow down the bus waiting for the responses (since nobody else is allowed to talk in that time). Then again, if you are after high speed you should probably be looking at something other than UART.
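A rough sketch of that resend-until-ACK rule. The transport is abstracted behind two callbacks so the logic can be shown without a real UART; link_ops, send_reliable, the retry limit, and the fake transport are all hypothetical names and choices, not part of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { REPLY_ACK, REPLY_NACK, REPLY_TIMEOUT } reply;

/* Hypothetical transport hooks supplied by the application. */
typedef struct {
    void  (*send)(const uint8_t *buf, size_t len, void *ctx);
    reply (*wait_reply)(uint32_t timeout_ms, void *ctx);
    void   *ctx;
} link_ops;

/* Sends the same buffer again until an ACK arrives or max_attempts is used up.
 * The attempt limit is an added assumption so a dead node cannot block forever. */
bool send_reliable(const link_ops *ops, const uint8_t *buf, size_t len,
                   unsigned max_attempts, uint32_t timeout_ms)
{
    assert(ops != NULL && ops->send != NULL && ops->wait_reply != NULL);

    for (unsigned attempt = 0; attempt < max_attempts; attempt++) {
        ops->send(buf, len, ops->ctx);
        if (ops->wait_reply(timeout_ms, ops->ctx) == REPLY_ACK) {
            return true;
        }
        /* NACK or timeout: fall through and resend the unchanged buffer. */
    }
    return false;
}

/* Fake transport for illustration: the peer ACKs only from transmission
 * number ack_on onwards and stays silent before that. */
typedef struct { unsigned sent; unsigned ack_on; } fake_link;

void fake_send(const uint8_t *buf, size_t len, void *ctx)
{
    (void)buf;
    (void)len;
    ((fake_link *)ctx)->sent++;
}

reply fake_wait_reply(uint32_t timeout_ms, void *ctx)
{
    const fake_link *l = ctx;
    (void)timeout_ms;
    return (l->sent >= l->ack_on) ? REPLY_ACK : REPLY_TIMEOUT;
}
```

Because the sender only ever holds one outstanding buffer, no queue or packet index is needed, which matches the memory argument made above.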
If you need reliability in a multidrop system, it's generally better to send a positive ACK on valid messages, a negative ACK on messages with good checksums but otherwise bad contents (invalid commands or whatever), and no response on messages with bad checksums so the sender can retransmit if no response is received.
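The three-way rule above is small enough to write down as a single function. This is only a sketch; the enum and function names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { RESP_NONE = 0, RESP_ACK, RESP_NACK } response;

/* A zero-initialised response variable should mean "stay silent". */
static_assert(RESP_NONE == 0, "RESP_NONE must be the zero value");

/* Bad checksum: say nothing, so the sender times out and retransmits.
 * Good checksum but invalid contents: NACK.
 * Good checksum and valid contents: ACK. */
response choose_response(bool checksum_ok, bool contents_valid)
{
    if (!checksum_ok) {
        return RESP_NONE;
    }
    return contents_valid ? RESP_ACK : RESP_NACK;
}
```

Staying silent on a bad checksum matters on a multidrop line because the address byte itself may be the corrupted part, so the receiver cannot know the frame was meant for it.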
In constrained embedded environments, I prefer to overlay packed structs directly over the data (either by pointer cast or union type punning), allowing one single header file, which can be included from both sides, define the whole protocol, so that there is no need to write any data generation or parsing code anywhere; just set the variables, send the struct, receive struct, access variables.
This is the most writable, most readable, most maintainable pattern by far, but the downside is, this is non-portable* and possibly requires a compiler flag like -fno-strict-aliasing and use of alignment attributes. The biggest practical risk, though, is getting hit by a hyperventilating language lawyer, so I can't recommend doing this, just describing what I do with zero problems (other than said language lawyers) during years of doing that. I'm also not discussing this further this time, this is just something OP might want to look into and make their own decisions so please allow them to do that.
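For readers who want to see the shape of that pattern, here is a minimal sketch using the union variant. It assumes GCC or Clang (the packed attribute is a compiler extension) and C11 for static_assert. The message fields and all names are invented for the example, and the caveats about portability stated above still apply.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Contents of a hypothetical shared header included by both ends of the link. */
typedef struct __attribute__((packed)) {
    uint8_t  status;
    uint8_t  reserved_for_future_use;
    int16_t  temperature;
    uint32_t counter;
} status_msg;

/* Catches accidental padding or a changed field at compile time. */
static_assert(sizeof(status_msg) == 8, "status_msg must be 8 bytes on the wire");

/* The same storage viewed either as named fields or as raw bytes for the UART. */
typedef union {
    status_msg msg;
    uint8_t    bytes[sizeof(status_msg)];
} status_frame;

/* Stand-in for the physical link: moves the raw bytes from sender to receiver.
 * Both ends are assumed to share the same byte order. */
void fake_wire_transfer(const status_frame *tx, status_frame *rx)
{
    assert(tx != NULL && rx != NULL);
    memcpy(rx->bytes, tx->bytes, sizeof rx->bytes);
}
```

The sender sets tx.msg fields and transmits tx.bytes; the receiver fills rx.bytes and reads rx.msg fields, with no separate encode or parse step.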
That's very simple but definitely makes perfect sense. I can't believe I didn't think of that! I'm a team of one (me) with nobody peering over my shoulder, and I am comfortable with that implementation, so I will probably go with that.
    struct PACK
    {
        uint8_t  status;
        uint8_t  reserved_for_future_use;
        int16_t  temperature;
        uint32_t counter;
    };

instead of

    struct PACK
    {
        uint8_t  status;
        uint32_t counter;
        uint8_t  reserved_for_future_use;
        int16_t  temperature;
    };
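The text that introduced these two layouts did not survive, but the point appears to be member ordering: in the first layout every field sits at an offset that is a multiple of its own size, while in the second the 32-bit counter lands at offset 1. The sketch below makes that visible with offsetof, assuming GCC or Clang for the packed attribute; the struct tag names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First layout: fields ordered so each one is naturally aligned. */
struct __attribute__((packed)) msg_ordered {
    uint8_t  status;
    uint8_t  reserved_for_future_use;
    int16_t  temperature;
    uint32_t counter;
};

/* Second layout: same fields, but counter starts at an odd offset. */
struct __attribute__((packed)) msg_unordered {
    uint8_t  status;
    uint32_t counter;
    uint8_t  reserved_for_future_use;
    int16_t  temperature;
};

/* Both are 8 bytes when packed, so size alone does not reveal the difference. */
static_assert(sizeof(struct msg_ordered) == 8, "unexpected size");
static_assert(sizeof(struct msg_unordered) == 8, "unexpected size");

/* The offsets do: counter is 4-byte aligned in one layout and not in the other. */
static_assert(offsetof(struct msg_ordered, counter) % 4 == 0, "counter aligned");
static_assert(offsetof(struct msg_unordered, counter) % 4 != 0, "counter misaligned");
```

With the first ordering, the packed attribute does not change the layout at all on typical targets, which reduces the exposure to unaligned accesses.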
Most protocols are way too complex for most tasks - because a whole committee got stuck into it. Just use a unique start byte, etc, just like at present. Keep it simple...
CAN and LIN are complex.
If you want error detection/correction then it gets much more complex because (for example) you can't be sure if the recipient failed to get the message. It could simply be that his ACK got lost on the way to you. This leads to a need for an incrementing serial number in each packet, OR a design where multiple identical packets don't cause a problem.
Yeah casting the whole byte array as a struct is a really good way of "parsing" the data.
Until you get burned a couple of times. I stopped casting structs onto received data a long time ago. You will run into endianness problems and unaligned accesses which may silently fail, causing corruption which is hard to spot. The portability between systems & compilers just sucks. I'm using readable text-based protocols almost exclusively because they are easy to debug, easy to extend and easy to port.
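For a binary protocol, the endianness and alignment problems mentioned here can also be avoided by reading each field byte by byte instead of casting. The sketch below assumes a little-endian wire format and an 8-byte status message; the field layout and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-wise reads work on any host byte order and at any buffer alignment. */
uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Decoded form of the hypothetical message; an ordinary unpacked struct. */
typedef struct {
    uint8_t  status;
    int16_t  temperature;
    uint32_t counter;
} status_fields;

/* Assumed wire layout: status at offset 0, one reserved byte, temperature
 * (little-endian) at offset 2, counter (little-endian) at offset 4. */
void status_decode(const uint8_t wire[8], status_fields *out)
{
    assert(wire != NULL && out != NULL);
    out->status = wire[0];
    /* Reinterprets the 16-bit pattern as signed (two's complement hosts). */
    out->temperature = (int16_t)get_le16(&wire[2]);
    out->counter = get_le32(&wire[4]);
}
```

This costs a few lines per message type, in exchange for code that behaves the same on every compiler and CPU.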
Well, when things break due to endianness or unaligned access, the whole thing pretty clearly blows up to let you know. As for padding issues making it overrun the memory, that can be caught with an assertion on the struct's sizeof().
Human readable clear text protocols definitely are nice for debug, but they are not always a good option when it comes to moving large amounts of data or squeezing bandwidth out of a slow communication link (Like radio communication). Something that just turns the various lights in a room on/off over UART, great use for ascii commands. Sending data to some big LED art installation that updates 1000 RGB LEDs at 100 fps not so much, that case would waste so much precious link bandwidth and cause a ton of extra computational load for all the parsing.
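To put rough numbers on the LED example: the calculation below assumes 3 bytes per RGB LED, 10 bits on the wire per byte (8N1 framing), and, for the text case, two hex characters per payload byte. Framing overhead and separators are ignored, and the function name is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Minimum line rate in bits per second for a given payload stream.
 * wire_bytes_per_payload_byte is 1 for raw binary and 2 for ASCII hex.
 * Assumes 10 bits on the wire per transmitted byte (start + 8 data + stop). */
uint64_t required_baud(uint32_t payload_bytes_per_frame,
                       uint32_t frames_per_second,
                       uint32_t wire_bytes_per_payload_byte)
{
    assert(wire_bytes_per_payload_byte >= 1);
    return (uint64_t)payload_bytes_per_frame
         * wire_bytes_per_payload_byte
         * frames_per_second
         * 10u;
}
```

For 1000 LEDs at 100 fps this gives required_baud(3000, 100, 1) = 3,000,000 bit/s in binary and required_baud(3000, 100, 2) = 6,000,000 bit/s as hex text, before any parsing cost.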
For example the standard NMEA protocol for GPS receivers that pretty much every GPS module on the planet supports is human readable text. But then most modules also implement some vendor proprietary protocol that is almost always a binary one, only supporting the more advanced feature on that protocol. For a good reason too since if you start sending out live raw satellite data at high update rates (for example for RTK differential GPS) you tend to need something along the lines of 115200 baud UART to get that much data out even in the really compact binary format.
I am informed that many Modbus products don't implement that timeout feature (to detect end of message, etc.). Instead, they keep receiving data until the CRC works out right.
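A sketch of what "receive until the CRC works out" might look like for Modbus RTU. The CRC routine is the standard Modbus CRC-16 (reflected polynomial 0xA001, initial value 0xFFFF, low byte sent first); the scanning function, its name, and the 4-byte minimum length are assumptions about how such products behave, not taken from any particular implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Modbus RTU CRC-16: init 0xFFFF, reflected polynomial 0xA001. */
uint16_t modbus_crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u) {
                crc = (uint16_t)((crc >> 1) ^ 0xA001);
            } else {
                crc >>= 1;
            }
        }
    }
    return crc;
}

/* Returns the length of the shortest prefix of buf (at least 4 bytes) whose
 * last two bytes match the CRC of the bytes before them, or 0 if none does.
 * A receiver without the idle timeout would call this after every byte. */
size_t modbus_find_frame_end(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    for (size_t n = 4; n <= len; n++) {
        uint16_t received = (uint16_t)(buf[n - 2] | ((uint16_t)buf[n - 1] << 8));
        if (modbus_crc16(buf, n - 2) == received) {
            return n;
        }
    }
    return 0;
}
```

The weakness of this approach is that a 16-bit CRC can match by chance on a partial frame, so it trades the timing requirement for a small risk of cutting a message short.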
No, it can break silently, causing subtle errors you won't spot until you've got a huge load of units in the field.