Warning: a lot of conditionals ahead.

I’m not entirely sure I understand your idea correctly. I can interpret it in two ways, each requiring a very different response: either as a universal packet decoder, which reads octets from a packet and writes the corresponding data into objects, or as a corner-cutting packet decoder, trying to avoid writing decoding routines by mapping objects onto arrays of octets.
**Case 1**

It is a good and sane idea, but you are approaching it from a direction, and with a tool, that will make it hard. The first, minor issue: the problem is too big to be written as an ad-hoc add-on to another project. Perfectionism hurts.

The bigger issue relates to the means of implementation.
There are two approaches: static and dynamic. The dynamic approach builds description structures in memory and then reads them to process packets: you have a list of field descriptors, and each of them is applied in turn to the packet. The downside is describing behavior as data, which costs throughput, adds latency, and increases memory demand.
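A minimal sketch of such a descriptor-driven decoder, with hypothetical names: each descriptor tells a generic loop where a field sits in the packet, how wide it is, and where to store it in the target object (big-endian wire format assumed).

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical field descriptor: where the field sits in the
 * packet, its width in octets, and where it goes in the object. */
struct FieldDesc {
    size_t pkt_offset;
    size_t width;       /* 1, 2, or 4 */
    size_t obj_offset;
};

/* Generic loop applying each descriptor in turn to the packet. */
void decode_fields(const uint8_t *pkt, void *obj,
                   const struct FieldDesc *desc, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t v = 0;
        for (size_t b = 0; b < desc[i].width; b++)
            v = (v << 8) | pkt[desc[i].pkt_offset + b];  /* big-endian */
        char *dst = (char *)obj + desc[i].obj_offset;
        switch (desc[i].width) {  /* store in host representation */
        case 1: { uint8_t  t = (uint8_t)v;  memcpy(dst, &t, 1); break; }
        case 2: { uint16_t t = (uint16_t)v; memcpy(dst, &t, 2); break; }
        case 4:                             memcpy(dst, &v, 4); break;
        }
    }
}
```

A packet type is then described by a table such as `{ { 0, 2, offsetof(struct Msg, id) }, … }`, built statically or at load time from a protocol description.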
The other option comes with no such limitations on average. The method is code generation. While this is possible in C using preprocessor metaprogramming, it’s a pain in all aspects. Instead, use a second language to generate your C code: scripts in that language are invoked during the build to produce the actual sources, which are then compiled the normal way. You may write this yourself if you must deal with a particular protocol. Otherwise, others have already written it (a sketch of what such generated output can look like follows the list):
- ProtoBuf,
- FlatBuffers,
- Thrift.
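For a feel of the generated-code technique, the output of such a generator might look like the hand-written equivalent below; the `MsgHeader` layout is invented purely for illustration.

```c
/* What a generator might emit for a header with fields
 * {id: u16, flags: u8, length: u32}, big-endian on the wire. */
#include <stdint.h>

typedef struct {
    uint16_t id;
    uint8_t  flags;
    uint32_t length;
} MsgHeader;

static inline MsgHeader decode_msg_header(const uint8_t *p)
{
    MsgHeader h;
    h.id     = (uint16_t)((uint16_t)p[0] << 8 | p[1]);
    h.flags  = p[2];
    h.length = (uint32_t)p[3] << 24 | (uint32_t)p[4] << 16
             | (uint32_t)p[5] << 8  | (uint32_t)p[6];
    return h;
}
```

Reading octet by octet like this sidesteps both the aliasing and the layout problems discussed under Case 2 below.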
For the first option, building it dynamically, you may consider BSON.
**Case 2**

If you don’t care about reliability: skip the rest of this post.
If you care about your sanity and want to sleep well: skip the rest of this post.
If you are uncertain about your answers to either of the above: skip the rest of this post.
You have been warned.

I may warn you, too, that this is likely to trigger two kinds of responses. First: “I’ve used this for 20 years and it worked for me”, which is invalid as a response to what is written below. Second: “this is the compilers’ fault and compiler vendors are stupid”. No, this is how C is defined; many people simply substitute guesses for knowledge. Which is, again, something to blame the C environment for.
**Case 2, part A**

So, the news is: this is not going to reliably work, even if it may appear for some time that it does. If you are lucky and a specific set of conditions coincides, a product incorporating this kind of code may actually never experience a failure. But in the general case it is doomed to fail one day. The appearance of everything working has misled even skilled programmers: Linus Torvalds was blindsided by it and, despite his tremendous experience, gave Linux serious security vulnerabilities and horribly-hard-to-track-down bugs. This led to a major production of his famous rants, and to this day he blames compiler vendors and ISO, which is a common reaction upon realizing this kind of mistake.
The problem is that C uses an unexpected computation model. In particular, its memory model is not what most programmers expect. There is zero correspondence between any notion of memory used in practice (be it physical RAM, hierarchical caches, NUMA, or virtual address space) and how C understands memory. The way compilers usually(!) translate C code to machine code makes it appear otherwise, which is exactly what misleads many programmers. But no, the model used in C does not account for actual memory at all, and any effects on actual memory are literally a coincidence. This is amended to some degree by compiler-specific and platform-specific extensions, but those are not part of the language itself.
With exceptions, each object type lives in its own memory domain. The borders of these domains can’t be crossed. So if you create an object of type `struct Foo` and have a pointer to it (`struct Foo*`), it lives in its own “memory of Foos”. If you then have a `struct Bar*`, it identifies an object of type `struct Bar` living in the “memory of Bars”. These two memories never mix. If you are familiar with microcontroller programming, you may think of it as SRAM variables versus EEPROM variables, except that it applies to every single type (with exceptions). The exception is accessing objects through a `char` pointer or through “compatible types”.
The consequences are severe. You can’t take an arbitrary buffer, cast the pointer to another type, and simply use it as if it were that type. Of course you can syntactically (it’s valid syntax), but the semantics are different from what one expects: the code is interpreted as if these were two separate memories.
Consider what it means for this code:

```c
uint8_t* buf = …;
read_into_buffer(buf);
T* typed = (T*)buf;
use(typed->something);
```
The code doesn’t impose any relationship between writing to `*buf` and accessing `typed->something`. Neither is `*buf` ever read. The two snippets below are perfectly fine equivalent interpretations:

```c
use(typed->something);
read_into_buffer(buf);
```
or
```c
use(0xDEADBEEF);
// note the missing read (assuming read_into_buffer has no side effects)
```
And the code is going to be translated into machine code according to that interpretation, not what you expected it to be.
Why does it seem to work so often? Because compilers often take the easiest route, and the easiest route is translating statement by statement, not in larger chunks. Even if a range of statements is processed as a whole, often only simple transforms are applied. But once in a while the generated code does not match what the programmer imagines the program means. And that mismatch is hard to detect, because the observed behavior closely follows expectations, except in some very specific case.
The keyword to search for and read up on is “strict aliasing”. I did not provide it at the beginning for a reason. People facing this problem usually fail to admit that the fault lies in them not knowing the language. Instead, they blame everything around, starting with compiler vendors supposedly intentionally “abusing undefined behavior to make code faster”. Meanwhile, the thing is already well described by Loyd Blankenship in his famous “The Conscience of a Hacker”:
“I found a computer. Wait a second, this is cool. It does what I want it to. If it makes a mistake, it's because I screwed it up. Not because it doesn't like me... Or feels threatened by me.. Or thinks I'm a smart ass.. Or doesn't like teaching and shouldn't be here.”

My goal was, instead, to show you this from the program’s perspective: that this is literally what the C code says to do. The compiler just translates what the code author wrote.
Now to the practical part. This thing is abused by programmers so often that there actually are ways to make it reliable. There are two.
The first is using the “compatible types” exception.(1) Specifically, one of the possibilities is types grouped in a `union` type:

```c
union Foo {
    unsigned char buffer[128];
    struct Packet interpreted;
};
```
You then read into `buffer`, and it is guaranteed that `interpreted` will contain what you expect it to contain (but see “part B” below). Unfortunately, this is a bit clumsy and produces a lot of boilerplate. You also can’t map this onto an arbitrary buffer (the write must go directly through that union), in particular one holding many packets; you may read exactly one packet this way. It can’t handle variable-length packets easily. And to have multiple possible types readable from such a buffer, you must group all of them in a single `union`.
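A minimal sketch of the union approach, assuming a fixed-size packet and POSIX `read()`; `struct Packet` and its field layout are illustrative (and still subject to the padding and endianness caveats of part B below):

```c
#include <stdint.h>
#include <unistd.h>   /* POSIX read(), ssize_t */

struct Packet {        /* illustrative layout */
    uint8_t  type;
    uint8_t  flags;
    uint16_t length;
};

union PacketBuffer {
    unsigned char buffer[sizeof(struct Packet)];
    struct Packet interpreted;
};

/* The write must go through the union object itself for the
 * compatible-types guarantee to apply. */
int read_packet(int fd, union PacketBuffer *pb)
{
    ssize_t n = read(fd, pb->buffer, sizeof pb->buffer);
    return n == (ssize_t)sizeof pb->buffer ? 0 : -1;  /* 0 on success */
}
```

After a successful `read_packet()`, the caller inspects `pb->interpreted.type` and friends directly.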
The other option boils down to using external guarantees from the compiler or platform: precisely controlling the build environment (compiler, version, flags) and sometimes checking the machine-code output. But this is not within the language itself, and one has to be particularly cautious about matching all the factors in just the right way.
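As one concrete instance of such an external guarantee: GCC and Clang provide the `-fno-strict-aliasing` option, which disables type-based alias analysis. With the toolchain and flags pinned, a cast-based decoder like the sketch below becomes dependable on that toolchain, though it remains formally invalid C:

```c
/* decode.c -- relies on being built with, for example:
 *   gcc -O2 -fno-strict-aliasing -c decode.c
 * The cast below is guaranteed by the toolchain option,
 * not by the C language. */
#include <stdint.h>

struct Packet { uint8_t type; uint8_t flags; };  /* illustrative */

uint8_t packet_type(const unsigned char *buf)
{
    const struct Packet *p = (const struct Packet *)buf;
    return p->type;
}
```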
**Case 2, part B**

C does not provide any means of controlling how a structure is laid out exactly. The order of elements must be preserved, but that’s it. The sizes of elements are platform-dependent, and so is the padding between them. So, even ignoring the strict-aliasing issue, you can’t tell whether the values in a received packet map as expected onto the structure’s fields.
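To see the layout problem concretely, here is a small sketch; the size printed is typical for common desktop ABIs but, as explained above, not guaranteed by the language:

```c
#include <stdint.h>
#include <stdio.h>

struct WireGuess {     /* 5 octets on the wire... */
    uint8_t  type;
    uint32_t value;    /* ...but a compiler may insert 3 padding
                          octets before this field for alignment */
};

int main(void)
{
    /* Commonly prints 8, not 5: overlaying this struct onto a
     * 5-octet packet would shift every field after the first. */
    printf("%zu\n", sizeof(struct WireGuess));
    return 0;
}
```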
Again, this can be countered using platform-specific solutions. Most notably, some compilers offer a “packed” mode, in which no padding is used. Combined with relying exclusively on the `uintN_t` types(2) and always using endianness-conversion functions, this gives a somewhat robust method of doing such a mapping. But, again, this is outside the language itself, and the details of the “packed” interpretation may vary (assuming it is available at all). While everybody expects that no octets are inserted between fields, the behavior with bit-fields, both regarding padding and grouping, is not as certain.
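A sketch of the combination just described, assuming the GCC/Clang-style `packed` attribute and the POSIX byte-order helpers from `<arpa/inet.h>`; the packet layout is illustrative:

```c
#include <stdint.h>
#include <arpa/inet.h>   /* ntohs()/ntohl(): network-to-host order */

struct __attribute__((packed)) WirePacket {
    uint8_t  type;
    uint16_t length;     /* big-endian on the wire */
    uint32_t value;      /* big-endian on the wire */
};

/* The compiler emits whatever (possibly slower) load sequence the
 * unaligned packed members require; see the alignment note below. */
void decode(const struct WirePacket *p,
            uint8_t *type, uint16_t *length, uint32_t *value)
{
    *type   = p->type;
    *length = ntohs(p->length);
    *value  = ntohl(p->value);
}
```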
The final problem is type alignment. On many platforms, values of a given size must be aligned to particular addresses; this is a hardware limitation. If they are not, the result ranges from the impossibility of generating corresponding machine code, through the processor crashing or acting in unexpected ways, to decreased performance. The relevance here is that a field may end up wrongly aligned in a received packet.
This may be corrected either by the compiler, by generating a more expensive load sequence (which is expected of any compiler that supports “packed” structures), or by the processor on some platforms at the cost of performance (most notably x86 and x86_64 do this, though nowadays the slowdown is almost unmeasurable).
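A hedged illustration of the hazard: assuming `buf` itself is 4-byte aligned, the load below targets a misaligned address. It may trap on strict-alignment hardware, while merely running slower, if at all measurably, on x86:

```c
#include <stdint.h>

/* buf is assumed 4-byte aligned, so buf + 1 is not. This is also
 * an aliasing violation; it is shown purely to illustrate the
 * alignment hazard described in the text. */
uint32_t misaligned_load(const unsigned char *buf)
{
    return *(const uint32_t *)(buf + 1);
}
```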
(1) Some other examples of the compatible-types exceptions include: accessing structures with matching fields at the beginning, up to the first non-matching field, and casting a structure pointer to the type of its first field. Not useful in this case.
(2) The `intN_t`/`uintN_t` requirement comes from these types being the only ones with a well-defined storage format. They are always 2’s complement, always exactly N bits, and they have no trap representations and no padding bits. Only endianness remains unspecified.