Author Topic: custom bytecode protocol, length first? Command first? Guidance?  (Read 3847 times)

0 Members and 1 Guest are viewing this topic.

Offline jnzTopic starter

  • Frequent Contributor
  • **
  • Posts: 593
I have to send some data up to a PC from a micro over a very slow radio PHY. Multiple packets of 18 bytes or so each that need to be combined on the receiving side.

The content is an opcode format with as-yet-undefined needs. For the most part there are OPCODE, SUBFUNCTION, LENGTH, ARGS, and possibly a CRC.

I know opcodes (op) and subfunctions (sf) can be one byte. Length (ln) needs to be two. And args could be 0-(65535-overhead) bytes.

I need guidance on which makes more sense...

ln ln op sf args
op sf ln ln args

The critical info required to build a complete message can be received in the first radio packet.

Either way, both the sender and receiver need to know that an op will be followed by an sf, and that the next bytes will be length or args.

I’ve seen both styles in practice. Any pointers to established guides or rules would be helpful.
 

Offline ogden

  • Super Contributor
  • ***
  • Posts: 3731
  • Country: lv
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #1 on: April 30, 2018, 05:27:49 pm »
I need guidance on which makes more sense...

Exactly. Let's start with: you provide information that makes sense, first

Quote
ln ln op sf args
op sf ln ln args
 

Offline C

  • Super Contributor
  • ***
  • Posts: 1346
  • Country: us
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #2 on: April 30, 2018, 06:08:22 pm »

The important thing is having a standard.

Think of your radio: that is one transmitter with many possible receivers.

If the receiver knows the length, then it can skip over what it does not understand without problems.
This allows you to add new message types that some receivers will not understand.

The format for each packet type should be totally defined.
There should be no undefined bits.
But you can say "send this bit as 0; it will be defined in a future expansion." That way even the future expansion is defined.

The receiver should have a simple decode for each packet:
check these bits first, then based on those bits check these other bits.
You end up with a tree structure of decisions,
with each action being either something to do or reserved for future use.

Structure = speed and fewer errors.
An IF or a bit always has two choices: True & False.
Two bits = four states.
3 bits = 8 states.
8 bits = 256 states.

From your list
Length & CRC are the foundation.
The foundation should define what to do when a CRC error happens.

With the CRC being computed over the content, it is often at the end so that computing it while sending is possible.
With the CRC at the end and the length at the start, the length gives you all you need to know where the CRC is.

Having the message type first would let you have a bit in the message type that changes the max length.
A bit could select whether 8 bits or 16 bits are used for the length.

Simple is better.

C
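The skip-over point above can be sketched in a few lines of Python. The wire layout (a 2-byte big-endian length counting op + sf + args, per the OP's fields) and the "known" opcode values are assumptions for illustration only:

```python
def walk_messages(stream: bytes):
    """Collect (op, sf, args) for known opcodes; hop over unknown ones by length."""
    KNOWN = {0xA0, 0xA1}          # hypothetical opcodes this receiver understands
    out, i = [], 0
    while i + 2 <= len(stream):
        length = int.from_bytes(stream[i:i+2], "big")
        body = stream[i+2:i+2+length]
        op, sf, args = body[0], body[1], body[2:]
        if op in KNOWN:
            out.append((op, sf, args))
        i += 2 + length           # the length field lets us skip what we don't understand
    return out
```

Because every message carries its own length, an old receiver skips a new message type instead of losing sync on it.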
 

Offline jnzTopic starter

  • Frequent Contributor
  • **
  • Posts: 593
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #3 on: April 30, 2018, 06:10:47 pm »
Quote
Exactly. Let's start with: you provide information that makes sense, first

Yea, read the line above it where I show fairly clearly what op sf ln mean.

As a packet...

LengthB0 LengthB1 OPByte SFByte ARGB0 ARGB1 ARGB2 ARGB3 -  Ex: 00 06 A0 22 01 02 03 04, where 0006 is the length, A0 22 is the command and subfunction, and 01 02 03 04 are the args

Is storing the length up front advantageous compared to presenting known bytes like the command/opcode and subfunction bytes first, then the length for the unknown bytes that may be coming? Either way you need to know what the spec is on both the tx and rx ends. So does it even matter?
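For what it's worth, both layouts are mechanically trivial to parse once the spec is fixed. A minimal sketch (Python, added for illustration), assuming a big-endian length; in the length-first layout the length is assumed to count op + sf + args (matching the 00 06 example above), while in the opcode-first variant it is assumed to count only the args:

```python
import struct

def parse_length_first(pkt: bytes):
    """ln ln op sf args: the length is available before anything else."""
    (length,) = struct.unpack(">H", pkt[:2])
    op, sf = pkt[2], pkt[3]
    args = pkt[4:2 + length]      # length covers op + sf + args
    return op, sf, args

def parse_opcode_first(pkt: bytes):
    """op sf ln ln args: the opcode is available before the length."""
    op, sf = pkt[0], pkt[1]
    (length,) = struct.unpack(">H", pkt[2:4])
    args = pkt[4:4 + length]      # here length covers only the args
    return op, sf, args

print(parse_length_first(bytes.fromhex("0006A02201020304")))
# (160, 34, b'\x01\x02\x03\x04')
```

Either way the receiver needs the same four header bytes before it can do anything; the real difference only shows up in how early you can size buffers or skip unknown messages.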
« Last Edit: April 30, 2018, 06:13:28 pm by jnz »
 

Online rstofer

  • Super Contributor
  • ***
  • Posts: 9889
  • Country: us
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #4 on: April 30, 2018, 06:24:55 pm »
I would start with a variant of the Intel Hex Protocol:
https://en.wikipedia.org/wiki/Intel_HEX

This protocol is pretty busy so it may not be workable or it would need to be stripped down:
http://www.bdmicro.com/code/robin/

But, yes, there should be a record length at the beginning.  It is another question if there should be a SOF character ahead of that - Start Of Frame - or just the ':' in the Intel protocol.

ETA:  You might do some research on protocols with error recovery.  This may be important on a limited bandwidth system.
 

Offline C

  • Super Contributor
  • ***
  • Posts: 1346
  • Country: us
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #5 on: April 30, 2018, 06:48:08 pm »

Saying what rstofer stated, a different way:

Intel Hex Protocol limits what bytes are used for data.

So a byte not used for sending data can be used for framing.
If a framing byte can appear in the data, then when you lose sync you have a problem: is it a framing byte or a data byte?

If you have good packet framing then you can have many sub packets in the bigger packet.

You need to assume that what you receive is BAD and then be able to prove that it is GOOD.
If you cannot prove it is good, it remains BAD.

XML uses < & > for framing and replaces these characters in data.

A study of protocols with error recovery is very important.
C
 

Online rstofer

  • Super Contributor
  • ***
  • Posts: 9889
  • Country: us
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #6 on: April 30, 2018, 06:54:54 pm »
So a byte not used for sending data can be used for framing.
If a framing byte can appear in the data, then when you lose sync you have a problem: is it a framing byte or a data byte?

Which kind of pushes the protocol to an ASCII representation.  If the data can be binary, how can a framing byte be represented? 

This is the topic of books and a simple answer just doesn't work out well.
 

Offline Scrts

  • Frequent Contributor
  • **
  • Posts: 797
  • Country: lt
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #7 on: April 30, 2018, 07:03:16 pm »
I think the clear idea is that the length has to come prior to the data sent. I'd vote for opcode, then length, then data. It was so much easier to work with such a format when used with FPGAs. You run the state machine to trigger on the opcode, then you can allocate the counter for the data based on the message length and pump the data to memory.
 

Offline jnzTopic starter

  • Frequent Contributor
  • **
  • Posts: 593
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #8 on: April 30, 2018, 07:09:17 pm »
I'd vote for opcode, then length, then data. It was so much easier to work with such a format when used with FPGAs. You run the state machine to trigger on the opcode, then you can allocate the counter for the data based on the message length and pump the data to memory.

Right, that's the argument for opcode-length-data: you can switch on it right away without having all of the data processed; very obvious for an FPGA! The counter to that is that I need to assemble multiple packets before acting on them anyhow. So in that case, I need my command bytes encapsulated anyhow, and maybe both of those should be in the same format, one inside the other, in which case I think length first makes sense.

Good points here, I'll keep reading before replying to others.
 

Offline Karel

  • Super Contributor
  • ***
  • Posts: 2217
  • Country: 00
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #9 on: April 30, 2018, 07:19:32 pm »
HDLC as used in AX.25 (amateur packet radio) is a very robust protocol.

https://en.wikipedia.org/wiki/High-Level_Data_Link_Control
 

Offline C

  • Super Contributor
  • ***
  • Posts: 1346
  • Country: us
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #10 on: April 30, 2018, 08:03:46 pm »
HDLC, SDLC & CAN and others use bit stuffing to create unique bit stream markers.

Current high speed things like HDMI, DVI, SATA, SAS use the 8b/10b method to do DC balance, framing and some error checking.

These are all serial streams.
In case you do not know, HDLC & SDLC were world-spanning networks long before Ethernet, which also has unique framing marks.

Without hardware-based framing in place, you get into a mess where a low level has to create two or more streams of data to give a higher level data transparency.

Intel hex is just a way of converting 4 binary bits to a byte, at the cost of 1/2 the possible transmission speed.
Each data byte becomes two ASCII characters.

So for a normal serial interface the question is: is this normal data or Intel hex mode? It often takes a human to decide.

You get a similar problem with terminal control, color and formatting. For a long time each terminal had its own way of doing things. This led to UNIX having TERMCAP, so as to have a standard way for a Unix program to work with many different terminals.

You also had problems with printers

All because you could not say simple things

You will also find base64, where three bytes become four to send/receive.

You have UTF-8, which uses the 8th bit to distinguish plain ASCII from multi-byte Unicode sequences.

The base problem is that you have a USART without a way of sending control/status in the stream separate from the data while maintaining transparent byte data.



The 9-bit mode some microcontrollers can use is a workaround,
where the parity bit is not used as parity.
I have not found this to work with PCs.

The cure can be as simple as not using a USB-to-serial converter.
A USB device can send/receive packets; you just need to extend the actual packet interface to the user of the data.
The start and stop of a radio transmission can be used to mark packet start/stop.

To handle binary transfer between computer and terminal,
ZMODEM, YMODEM, XMODEM and others were often used.

This allowed a computer acting as a terminal to send/receive files.

For computer-to-computer links using a USART and wanting IP,
you have PPP & SLIP as layer 2 protocols.

And you have all of these because people tried to use a UART for more than sending a simple set of bits that has no idea of larger structure.

C
 

Online rstofer

  • Super Contributor
  • ***
  • Posts: 9889
  • Country: us
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #11 on: April 30, 2018, 08:55:01 pm »
I'd vote for opcode, then length, then data. It was so much easier to work with such a format when used with FPGAs. You run the state machine to trigger on the opcode, then you can allocate the counter for the data based on the message length and pump the data to memory.

Right, that's the argument for opcode-length-data: you can switch on it right away without having all of the data processed; very obvious for an FPGA!

I don't know how you can transition on an op code when you haven't even validated the message and you can't do that until you have seen the checksum (CRC?).

Inside the FPGA, I would expect to see a FIFO and something assembling a command string from the contents.  But I suspect the entire message has to be present and validated before anything can be done.  As a result, the sequence of fields is relatively unimportant but if the length field is somehow involved with validation (and it probably will be), one of the things it does is point to where the checksum should be located.  How far into the message should a pointer to the end be located?
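The validate-before-dispatch argument above can be sketched like so. The layout (2-byte big-endian length counting op + sf + args) follows the OP's fields, but the trailing CRC-32 is an illustrative choice, not something from the thread:

```python
import binascii

def validated_message(pkt: bytes):
    """Return (op, sf, args) only after the CRC the length points at checks out."""
    length = int.from_bytes(pkt[:2], "big")
    body_end = 2 + length                        # the length tells us where the CRC starts
    body, crc = pkt[:body_end], pkt[body_end:body_end + 4]
    if binascii.crc32(body).to_bytes(4, "big") != crc:
        return None                              # never dispatch on an unvalidated opcode
    return body[2], body[3], body[4:body_end]
```

So the length field does double duty: it sizes the message and locates the checksum that guards it.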
 

Offline jnzTopic starter

  • Frequent Contributor
  • **
  • Posts: 593
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #12 on: April 30, 2018, 09:17:34 pm »

[whoa dude]

C

I tried to read your last post a few times. Seems all over the place.
 

Offline C

  • Super Contributor
  • ***
  • Posts: 1346
  • Country: us
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #13 on: April 30, 2018, 09:53:33 pm »

Sorry jnz

Look at the simple facts.

A USART is built to send a group of bits, with the option to have time space between groups of bits.

You have an option for the number of bits.
You have an option for a parity bit.
You have an option for the stop bit length.

That is it for the basics on the wire.
Then you have
a clock for the bit rate, and
an option for an extended break (the line goes active for many bit times).

Today's UART hardware often adds
line control signals that can do flow control.

What is not there is a concept of groups of bits.

HDLC & SDLC (USRT), which are synchronous, add a few things:
a sync bit stream to sync the receiver's clock,
packet framing, and
bit stuffing to keep the packet framing unique while allowing transparent byte transfer.

Without this foundation, you have to use a level 2 protocol to build it if you need or want it.

All the different ways in my last post are just the many ways to work around the foundation problem of no hardware framing.
UART framing is 5 to 8 bits of data.

C
 

Offline ogden

  • Super Contributor
  • ***
  • Posts: 3731
  • Country: lv
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #14 on: April 30, 2018, 10:31:19 pm »
Opcode shall come first ONLY when it defines the length of the packet. Otherwise the packet shall start with the length field.
 

Offline jnzTopic starter

  • Frequent Contributor
  • **
  • Posts: 593
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #15 on: April 30, 2018, 11:21:03 pm »
Opcode shall come first ONLY when it defines the length of the packet. Otherwise the packet shall start with the length field.

Unhelpful and probably incorrect.

If I have a length size of one byte, and I know that I'll have 20 bytes of packet metadata/overhead: if I put the length first and it counts everything, I will have a max data size of 235 bytes. If I put the overhead up front, knowing it'll always be 20 bytes and well defined, then I can store that plus an actual 255 bytes of real data.

 

Online rstofer

  • Super Contributor
  • ***
  • Posts: 9889
  • Country: us
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #16 on: April 30, 2018, 11:33:30 pm »
Depends on where you start counting.  Having the length come first and then skipping over some metadata before counting seems legitimate.  Of course, the checksum has to include all bytes and should probably be a CRC.
 

Offline sokoloff

  • Super Contributor
  • ***
  • Posts: 1799
  • Country: us
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #17 on: May 01, 2018, 12:16:51 am »
It may help to frame the problem this way (pun intended).

Imagine the receiver is turned on and begins listening after 3 packets of a 5 packet message have been sent. What allows the receiver to ignore packets 4 and 5 (being an incomplete transmission at the link layer) and then start on packet 6 as a new packet?

Imagine the receiver fails to receive packet 8 of a 10 packet transmission. How does the receiver know to discard packets 1-7, 9 and 10, and start listening again anew for a packet after that?
 

Online rstofer

  • Super Contributor
  • ***
  • Posts: 9889
  • Country: us
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #18 on: May 01, 2018, 12:26:26 am »
That goes to the entire TCP protocol.  Guaranteed delivery.

TCP/IP has been implemented on serial lines and packet radio.

https://en.wikipedia.org/wiki/AMPRNet
 

Offline jeremy

  • Super Contributor
  • ***
  • Posts: 1079
  • Country: au
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #19 on: May 01, 2018, 12:42:48 am »
Over slow unreliable links I usually use a simplified version of HDLC (without bit stuffing):

Start of frame/end of frame: 0x7E
Escape character: 0x7D

Frame looks like: 0x7E payload payload payload ... crc crc crc crc 0x7E

If you see a 0x7E, or hit a maximum packet size or timeout, you attempt to process whatever is in your buffer by checking the CRC, then clear the buffer. You can send 0x7E as often as you like between packets as a keepalive, as it should just reset the state machine.
If you need to send a 0x7D or 0x7E as part of the actual packet data, you first send 0x7D and then send (data ^ (1<<5)). On the RX end, if you see a 0x7D, you just drop it and flip bit 5 of the next byte.

Works great so far, and can be done with no issues using static allocation on a tiny micro.

So just do this, but with a bit of a packet structure. Something like 0x7E nonce packet_type payload ... crc 0x7E. Each group of packets has the same nonce, and if it increments then you drop the previous packets from your buffer with the same nonce. Just need to be careful of nonce overflow.
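That escaping rule, sketched in Python (the CRC and nonce handling are left out; only the byte stuffing itself is shown):

```python
FLAG, ESC = 0x7E, 0x7D

def stuff(payload: bytes) -> bytes:
    """Wrap payload in 0x7E flags, escaping any 0x7E/0x7D inside it."""
    out = bytearray([FLAG])
    for b in payload:
        if b in (FLAG, ESC):
            out += bytes([ESC, b ^ (1 << 5)])   # send 0x7D, then data ^ (1<<5)
        else:
            out.append(b)
    out.append(FLAG)
    return bytes(out)

def unstuff(frame: bytes) -> bytes:
    """Reverse of stuff(): drop 0x7D and flip bit 5 of the byte after it."""
    assert frame[0] == FLAG and frame[-1] == FLAG
    out, esc = bytearray(), False
    for b in frame[1:-1]:
        if esc:
            out.append(b ^ (1 << 5))
            esc = False
        elif b == ESC:
            esc = True
        else:
            out.append(b)
    return bytes(out)
```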
« Last Edit: May 01, 2018, 12:47:54 am by jeremy »
 

Offline andyturk

  • Frequent Contributor
  • **
  • Posts: 895
  • Country: us
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #20 on: May 01, 2018, 02:25:45 am »
A nice way to frame packets of variable length is by using Consistent Overhead Byte Stuffing.
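For reference, COBS itself is only a few lines. This compact (unoptimized) Python sketch encodes so that the output contains no zero bytes, letting 0x00 serve as an unambiguous frame delimiter:

```python
def cobs_encode(data: bytes) -> bytes:
    """Replace each zero with a distance-to-next-zero code byte."""
    out, block = bytearray(), bytearray()
    for b in data:
        if b == 0:
            out.append(len(block) + 1)
            out += block
            block.clear()
        else:
            block.append(b)
            if len(block) == 254:       # max run: emit code 0xFF, no implicit zero
                out.append(255)
                out += block
                block.clear()
    out.append(len(block) + 1)
    out += block
    return bytes(out)

def cobs_decode(data: bytes) -> bytes:
    out, i = bytearray(), 0
    while i < len(data):
        code = data[i]
        out += data[i + 1:i + code]
        i += code
        if code < 255 and i < len(data):
            out.append(0)               # code < 0xFF implies a zero followed the block
    return bytes(out)
```

The overhead is at most one byte per 254, versus the worst-case doubling of 0x7D-style escaping.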
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #21 on: May 01, 2018, 02:47:28 am »
So this is radio? Will it have an AGC? Don't you want some sort of preamble to allow time for that to settle? Maybe a couple of sync bytes at the start to help you get framing?

SYNC SYNC SYNC LASTSYNC PACK-TYPE LEN DATA DATA.... CRC CRC

(SYNC could be 11001100, LASTSYNC 11001111)

Hey! Almost an Ethernet frame :-)

(And you want the length to determine where the CRC will be. If the signal is lost sooner, you know the length was invalid, and you will need to ignore the trailing data after where you expected the CRC to be.)

Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline gmb42

  • Frequent Contributor
  • **
  • Posts: 294
  • Country: gb
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #22 on: May 01, 2018, 11:30:28 am »
FWIW, a couple of industrial telemetry protocols I work with (DNP3 and IEC 60870-5-101) use a data link layer comprising a header with a checksum and then a data body with further checksums.  These protocols were originally used for radio and serial comms, for both monitoring and control functionality, and now run over IP networks.

They both use a header of the form:

start byte1 : 0x05
start byte2 : 0x64
length byte
control byte, aka function code
destination address, 2 bytes
source address, 2 bytes
crc, 2 bytes

These have a destination address because multiple devices may be accessed over the same radio link.  The length is the length of all remaining bytes in the message, disregarding CRC bytes.

The body consists of blocks of at most 16 bytes each, each with a 16-bit CRC.  A data link layer message can thus have at most 250 bytes of data (255 minus the control byte and the dest and src addresses).

Over the data link layer they superimpose a transport layer that allows reassembly of multiple data link layer messages into a larger message.
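A sketch of parsing that header in Python. The field order is as listed in the post; the little-endian address byte order is an assumption here (check the actual DNP3 spec), and the CRC is returned rather than verified:

```python
import struct

def parse_header(hdr: bytes):
    """Unpack the 10-byte DNP3-style link header described above."""
    start1, start2, length, control, dest, src, crc = struct.unpack("<BBBBHHH", hdr)
    if (start1, start2) != (0x05, 0x64):
        raise ValueError("bad start bytes")
    return {"length": length, "control": control, "dest": dest, "src": src, "crc": crc}
```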

 

Offline Scrts

  • Frequent Contributor
  • **
  • Posts: 797
  • Country: lt
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #23 on: May 01, 2018, 01:15:06 pm »
I'd vote for opcode, then length, then data. It was so much easier to work with such a format when used with FPGAs. You run the state machine to trigger on the opcode, then you can allocate the counter for the data based on the message length and pump the data to memory.

Right, that's the argument for opcode-length-data: you can switch on it right away without having all of the data processed; very obvious for an FPGA!

I don't know how you can transition on an op code when you haven't even validated the message and you can't do that until you have seen the checksum (CRC?).

Inside the FPGA, I would expect to see a FIFO and something assembling a command string from the contents.  But I suspect the entire message has to be present and validated before anything can be done.  As a result, the sequence of fields is relatively unimportant but if the length field is somehow involved with validation (and it probably will be), one of the things it does is point to where the checksum should be located.  How far into the message should a pointer to the end be located?

You have a FIFO inside the FPGA for data reception most of the time anyway, because there is a clock domain crossing. You run the data receiver at one speed and the rest of the FPGA logic at another (e.g. controlled by the PLL of the memory controller). So you push the data into the FIFO, and if the CRC does not match at the end, you just clear the FIFO. If the data length is long and you're actually pushing much of the data into external memory, then if the CRC does not match you just jump the address pointer back to ignore the whole packet.
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #24 on: May 02, 2018, 06:16:33 am »
I highly recommend START and END characters to help frame your data, and to help when pushing it through other links.


HDLC has an asynchronous byte-based form, used by both PPP and, IIRC, the AX.25 protocol that someone has already mentioned.  Worth looking at!


(Ah.  The UART-oriented part of AX.25 is apparently https://en.m.wikipedia.org/wiki/KISS_(TNC) )

« Last Edit: May 02, 2018, 06:25:19 am by westfw »
 

Offline jnzTopic starter

  • Frequent Contributor
  • **
  • Posts: 593
Re: custom bytecode protocol, length first? Command first? Guidance?
« Reply #25 on: May 03, 2018, 09:26:11 pm »
Mostly got sidetracked here because I mentioned RADIO, but that's already encapsulated. What isn't is that I'll get multiple radio packets, so while each is confirmed itself, I didn't have a way of putting them all together.


I would start with a variant of the Intel Hex Protocol:
This protocol is pretty busy so it may not be workable or it would need to be stripped down:
But, yes, there should be a record length at the beginning.  It is another question if there should be a SOF character ahead of that - Start Of Frame - or just the ':' in the Intel protocol.

rstofer won with the link above. I modified that serial packet protocol to be included in the first radio packet, so now I have a CRC for multiple individually-confirmed packets, plus total length, source and dest, which could be useful later.
« Last Edit: May 04, 2018, 05:11:20 am by jnz »
 

