This is not "popular", just a description of what worked well for me; note that I use Linux, not Windows.
I have a very small Odroid HC1 single-board computer, with a Samsung Exynos 5422 octa-core big.LITTLE ARM processor (four Cortex-A15 and four Cortex-A7 cores), with a built-in SATA connector for storage. It is a variant of Odroid XU4. Aside from USB, it has a single UART with 1.8V logic levels, and I wanted to use the UART only for my own display, voltage and current monitoring, and some UI buttons.
This meant that I needed to split the single hardware serial port into multiple virtual serial ports, sharing the bandwidth. (I also used to do a lot of web development, so I wanted to reuse the same multiplex-demultiplex on TLS-over-TCP/IP, since it is the TLS handshake that is the costly operation, and multiplexing multiple logical streams on top of one physical stream is the common problem here. It is slightly more complex, because it allows an arbitrary number of substreams with almost arbitrary stream identifiers, so I'll just describe the simpler one with a limited number of substreams.)
I've also been bitten by bufferbloat, so anything like netstrings or any of the existing multiplexing protocols like MIME multipart/mixed has too much latency and buffering requirements, especially since the UART is limited to 2Mbaud or so.
What I ended up with was a simple multiplexing stream protocol, and a stateless robust escaping mechanism.
Each substream has its own unique reserved start byte. There is one additional reserved byte that acts as an escape marker.
As an example, I'll use three substreams (thus 4 reserved bytes) below. One of these substreams is the initial one when the connection is made.
(For a number of reasons, I explicitly chose that the escape marker followed by any substream start byte is ignored, and can thus be used as padding. If the escape marker is followed by another escape marker, the preceding one is ignored. Thus, a sequence of escape markers followed by a substream start byte encodes nothing; and a sequence of escape markers followed by a valid escape value encodes the same thing as one escape marker and that same escape value.)
There are three cases, in increasing order of complexity, to consider in the unescaped data stream. Let D denote any non-reserved byte value, and R one of the four reserved byte values:
- R: A single reserved byte value. 4 escape values needed. 50% overhead.
- R R: Two consecutive reserved byte values. 16 escape values needed. 0% overhead.
- R D R: Two reserved byte values, with a non-reserved byte value in between. 16 escape values needed, but the following byte is part of the escape sequence. 0% overhead.
(You can add a fourth case, R R R, to encode three reserved values in two bytes, by using 4×4×4=64 escape values, achieving compression. Because R R is encoded without overhead, D R R and R R D are encoded with that, with no overhead.)
With a trivial greedy encoder/multiplexer that sees only three bytes forward, the maximum overhead is 33% for any 3-byte sequence (for D D R, D R D, and R D D, which all encode to four bytes). For two- and one-byte sequences, the maximum overhead is 50%. Those figures do not include the start byte.
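The three cases and the greedy look-ahead can be sketched roughly like this in Python. The concrete byte assignments are my assumptions for illustration only: reserved bytes 0xDC..0xDF, with 0xDF as the escape marker and 0xDC..0xDE as the substream start bytes, and escape values 0x00..0x03 (single R), 0x10..0x1F (R R) and 0x20..0x2F (R D R). The names escape() and frame() are hypothetical as well.

```python
# Hypothetical byte assignments, chosen only for this sketch.
RESERVED_BASE = 0xDC            # reserved bytes are 0xDC..0xDF
ESC = 0xDF                      # escape marker
START = (0xDC, 0xDD, 0xDE)      # start bytes of substreams 0, 1, 2
ESC_SINGLE = 0x00               # 4 values:  encodes R
ESC_PAIR = 0x10                 # 16 values: encodes R R
ESC_SPLIT = 0x20                # 16 values: encodes R D R (D follows the escape value)


def is_reserved(b):
    """True for the four reserved byte values."""
    return (b & 0xFC) == RESERVED_BASE


def escape(data):
    """Greedy escaping that looks at most three bytes forward."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if not is_reserved(b):
            out.append(b)                       # D: copied as-is
            i += 1
            continue
        a = b & 3                               # index of the first reserved byte
        if i + 1 < n and is_reserved(data[i + 1]):
            # R R: two input bytes become two output bytes
            out += bytes((ESC, ESC_PAIR + 4 * a + (data[i + 1] & 3)))
            i += 2
        elif i + 2 < n and is_reserved(data[i + 2]):
            # R D R: data[i + 1] is known to be non-reserved here
            out += bytes((ESC, ESC_SPLIT + 4 * a + (data[i + 2] & 3), data[i + 1]))
            i += 3
        else:
            # Lone R: one input byte becomes two output bytes
            out += bytes((ESC, ESC_SINGLE + a))
            i += 1
    return bytes(out)


def frame(stream, data):
    """Prefix escaped data with the start byte of the given substream."""
    return bytes((START[stream],)) + escape(data)
```

With these assumed values, the only reserved byte that can appear inside escaped payload is the escape marker itself, so a start byte on the wire always means a substream switch (or padding, when it directly follows an escape marker).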
Given uniform random data and the three cases above, 88% of 8-byte source sequences are encoded without overhead, with typical maximum overhead being 37.5%; and 37% of 64-byte source sequences are encoded without overhead, with typical maximum overhead being 12.5%.
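As a rough sanity check of those percentages, the chance that a uniformly random n-byte sequence contains none of the four reserved values is (252/256)^n. This is only a lower bound on the "no overhead" fraction, since sequences whose reserved bytes all fall into R R or R D R patterns also cost nothing. The function name below is hypothetical.

```python
def p_no_reserved(n, reserved=4):
    """Probability that n uniformly random bytes contain no reserved value."""
    return ((256 - reserved) / 256) ** n


for n in (8, 64):
    print(n, round(p_no_reserved(n), 3))
```

This gives roughly 0.88 for 8 bytes and roughly 0.365 for 64 bytes, in line with the figures quoted above once the zero-overhead R R and R D R patterns are added on top.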
As the average overhead is much lower than the maximum, I never bothered to try and reduce the absolute worst-case maximum.
The decoding and demultiplexing is very straightforward, as long as there is room for at least three output bytes, and the decoder/demultiplexer can see two future incoming bytes at the same time. You'll want the four values to be relatively rare, but trivially detected: say, 220-223, (byte&0xFC)==0xDC, which happen to be suitably rare in both binary data and Unicode text; or for pure text, say 16-19, (byte&0xFC)==0x10. Also note that it is then preferable that the escape values are also easily detected (contiguous in each of the three cases), but do not overlap with the four reserved values.
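A decoder/demultiplexer along these lines might look like the Python sketch below. It works on a complete buffer rather than a live stream to keep it short. All concrete values are assumptions made for the example: reserved bytes 220-223 with 0xDF as the escape marker and 0xDC..0xDE as start bytes, and three contiguous escape value ranges (0x00..0x03, 0x10..0x1F, 0x20..0x2F) for the three cases. Dropping unknown escape values is also my assumption, not something described above.

```python
# Hypothetical byte assignments, chosen only for this sketch.
RESERVED_BASE = 0xDC            # reserved bytes: (byte & 0xFC) == 0xDC
ESC = 0xDF                      # escape marker
START = (0xDC, 0xDD, 0xDE)      # start bytes of substreams 0, 1, 2
ESC_SINGLE, ESC_PAIR, ESC_SPLIT = 0x00, 0x10, 0x20


def is_reserved(b):
    return (b & 0xFC) == RESERVED_BASE


def demux(wire, initial=0):
    """Split wire bytes into per-substream payloads, undoing the escaping."""
    out = [bytearray() for _ in START]
    cur = initial                               # substream active at connect time
    i, n = 0, len(wire)
    while i < n:
        b = wire[i]
        if not is_reserved(b):
            out[cur].append(b)                  # plain data byte
            i += 1
        elif b != ESC:
            cur = START.index(b)                # start byte: switch substream
            i += 1
        elif i + 1 >= n:
            break                               # incomplete escape: need more input
        else:
            v = wire[i + 1]
            if v == ESC:
                i += 1                          # ESC ESC: the preceding marker is ignored
            elif is_reserved(v):
                i += 2                          # ESC + start byte: padding, no switch
            elif ESC_SINGLE <= v < ESC_SINGLE + 4:
                out[cur].append(RESERVED_BASE | (v - ESC_SINGLE))
                i += 2
            elif ESC_PAIR <= v < ESC_PAIR + 16:
                v -= ESC_PAIR
                out[cur] += bytes((RESERVED_BASE | (v >> 2), RESERVED_BASE | (v & 3)))
                i += 2
            elif ESC_SPLIT <= v < ESC_SPLIT + 16:
                if i + 2 >= n:
                    break                       # the D byte has not arrived yet
                v -= ESC_SPLIT
                out[cur] += bytes((RESERVED_BASE | (v >> 2), wire[i + 2],
                                   RESERVED_BASE | (v & 3)))
                i += 3
            else:
                i += 2                          # unknown escape value: dropped (assumption)
    return [bytes(s) for s in out]
```

Note how the longest escape sequence is the marker plus two following bytes and produces three output bytes, which is where the "two future incoming bytes" and "room for three output bytes" requirements come from.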