I never really thought about this until today, but when using bitfields there must be some kind of trade-off between memory size and execution speed, right?
Say you have a struct like the following:
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t foo;
    uint16_t bar;
    uint8_t baz : 3;
    bool flag_a : 1;
    bool flag_b : 1;
    bool flag_c : 1;
    bool flag_d : 1;
    bool flag_e : 1;
} my_struct_t;
Now, by setting baz to 3 bits (assuming it only needs to represent values 0-7) and using single-bit fields for our 5 boolean flags, we've gone from 9 bytes of field data to only 4 bytes (before any padding the compiler adds, e.g. to align bar). Nice, some memory saved. However, when you write code that accesses these bitfields, what kind of machine code is the compiler actually generating?
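The real sizes, padding included, are easy to check with sizeof. This is only a sketch: unpacked_t, packed_t and report_sizes are names made up for the example. With GCC or Clang on x86-64 I'd expect 10 and 6 bytes, because a padding byte goes in front of bar, but the exact numbers are implementation-defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* The question's fields, each stored in its full type (hypothetical name). */
typedef struct {
    uint8_t foo;
    uint16_t bar;
    uint8_t baz;
    bool flag_a, flag_b, flag_c, flag_d, flag_e;
} unpacked_t;

/* The question's bitfield struct, identical to my_struct_t. */
typedef struct {
    uint8_t foo;
    uint16_t bar;
    uint8_t baz : 3;
    bool flag_a : 1;
    bool flag_b : 1;
    bool flag_c : 1;
    bool flag_d : 1;
    bool flag_e : 1;
} packed_t;

/* Print both sizes; they include whatever padding the ABI inserts (e.g. before bar). */
void report_sizes(void) {
    printf("without bitfields: %zu bytes\n", sizeof(unpacked_t));
    printf("with bitfields:    %zu bytes\n", sizeof(packed_t));
}
```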
I assume there must be some penalty on reads and writes to these fields, in the form of extra instructions needed to 'resize' the data. Without knowing what actually goes on, I would guess the compiler has to generate masking and shifting instructions. For instance, when assigning some other uint8_t variable to baz, it would need to mask off the relevant three bits and, depending on where the field sits within the struct, shift those 3 bits into the correct position.
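Written out by hand, the masking and shifting described above looks roughly like this. It assumes a hypothetical layout where baz occupies the top three bits (5..7) of the byte holding the bitfields; BAZ_SHIFT, BAZ_MASK, set_baz and get_baz are invented names, and the real position is whatever your compiler/ABI picks (little-endian GCC targets usually put baz in bits 0..2, in which case the shift disappears).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: baz in bits 5..7, the five flags in bits 0..4. */
#define BAZ_SHIFT 5u
#define BAZ_WIDTH 3u
#define BAZ_MASK  (((1u << BAZ_WIDTH) - 1u) << BAZ_SHIFT)   /* 0xE0 */

/* Equivalent of `s.baz = v;`: clear the old field, shift the new value into
   place, drop any bits that don't fit in 3 bits, then OR the two together. */
static uint8_t set_baz(uint8_t byte, uint8_t v) {
    return (uint8_t)((byte & ~BAZ_MASK) | (((unsigned)v << BAZ_SHIFT) & BAZ_MASK));
}

/* Equivalent of `x = s.baz;`: isolate the field, then shift it down to bit 0. */
static uint8_t get_baz(uint8_t byte) {
    return (uint8_t)((byte & BAZ_MASK) >> BAZ_SHIFT);
}
```

Note that the write is a read-modify-write of the whole byte, since the flags sharing it must be preserved.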
Am I right? Or, because the C standard doesn't actually define how struct bitfields are physically packed, is this all platform- and/or compiler-dependent?
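One way to see which layout your own compiler chose is to set a single flag on a zeroed struct and look at the raw bytes. This is a rough probe, not a portable guarantee: flag_c_bit_index is a made-up helper, and the standard lets padding bytes take unspecified values after a store, so the "only one bit set" assumption holds in practice rather than by rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t foo;
    uint16_t bar;
    uint8_t baz : 3;
    bool flag_a : 1;
    bool flag_b : 1;
    bool flag_c : 1;
    bool flag_d : 1;
    bool flag_e : 1;
} my_struct_t;

/* Returns the bit index (byte * 8 + bit, counting from the LSB of each byte)
   where the compiler stored flag_c, or -1 if no bit was found set. */
int flag_c_bit_index(void) {
    my_struct_t s;
    memset(&s, 0, sizeof s);   /* start from all-zero bytes, padding included */
    s.flag_c = true;

    unsigned char bytes[sizeof s];
    memcpy(bytes, &s, sizeof s);   /* inspect the object representation */
    for (size_t i = 0; i < sizeof s; i++)
        for (int b = 0; b < 8; b++)
            if (bytes[i] & (1u << b))
                return (int)(i * 8 + b);
    return -1;
}
```

Comparing the result across compilers or targets shows directly that the placement is implementation-defined.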
Exactly which instructions the compiler has to generate depends on which instructions the CPU has available :-)
At one extreme, CPUs such as the 68020 can do it in a single instruction: BFINS Dn,<ea>{offset:width}. The VAX could as well, with its INSV instruction. It's not necessarily fast, though: on the 68020, BFINS from one register to another takes 10 clock cycles if the instruction is in the cache and it can't be pipelined with other instructions (the most common case).
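To make clear what that single instruction does, here is a simplified C model of a 68020-style BFINS into a data register. It is a sketch under stated assumptions: bfins_model is a hypothetical name, the offset counts from the most significant bit (as 68k bitfield offsets do), fields that would wrap past bit 0 are not modelled, and condition codes are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Insert the low `width` bits of `src` into `dst`. `offset` counts from bit 31
   (the MSB). Simplification: requires offset + width <= 32, no wrap-around. */
static uint32_t bfins_model(uint32_t dst, uint32_t src, unsigned offset, unsigned width) {
    assert(width >= 1 && width <= 32 && offset + width <= 32);
    unsigned lsb = 32u - offset - width;          /* bit position of the field's lowest bit */
    uint64_t ones = (1ull << width) - 1u;         /* `width` ones; 64-bit so width == 32 is safe */
    uint32_t mask = (uint32_t)(ones << lsb);      /* field mask within the 32-bit register */
    return (dst & ~mask) | ((uint32_t)(src << lsb) & mask);
}
```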
By comparison, on the 68020 simple instructions such as add, sub, and, or and xor take 2 clock cycles, and shifts by constant amounts take 4. So you can do several simple operations in the time taken by one BFINS.
At the other extreme, you might not have any support for bitfields at all, in which case you have to do everything with shifts or rotates and masking. If you at least have rotate instructions (the RISC-V base instruction set doesn't), you can insert a field of size bits at bit offset off like this:
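ROR t1, rs1, #off       // rotate the target word so the old field sits at bit 0
ASL t2, rs2, #32-size   // shift the new value so its low size bits fill the top of t2
LSR t1, t1, #size       // shift the old field out; the top size bits of t1 become 0
OR  rd, t1, t2          // drop the new value into that gap
ROL rd, rd, #size+off   // rotate everything back to its original position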
That takes five clock cycles on most RISC CPUs, or four if the first two instructions can be executed together.
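The five-step rotate sequence can be modelled in C like this, as a sketch assuming 32-bit registers and off counted from the least significant bit. rotr32, rotl32 and insert_by_rotate are made-up names; compilers generally recognise this rotate idiom, but whether they emit exactly this sequence is up to them.

```c
#include <assert.h>
#include <stdint.h>

/* Rotates for n in 0..31; the `& 31u` avoids an undefined 32-bit shift when n == 0. */
static uint32_t rotr32(uint32_t x, unsigned n) { return (x >> n) | (x << ((32u - n) & 31u)); }
static uint32_t rotl32(uint32_t x, unsigned n) { return (x << n) | (x >> ((32u - n) & 31u)); }

/* Insert the low `size` bits of `val` into `word` at bit `off`,
   mirroring ROR, ASL, LSR, OR, ROL step by step. */
static uint32_t insert_by_rotate(uint32_t word, uint32_t val, unsigned off, unsigned size) {
    assert(size >= 1 && size <= 31 && off + size <= 32);
    uint32_t t1 = rotr32(word, off);          /* ROR: old field now at bits 0..size-1 */
    uint32_t t2 = val << (32u - size);        /* ASL: new value in the top `size` bits */
    t1 >>= size;                              /* LSR: old field gone, top `size` bits zero */
    uint32_t rd = t1 | t2;                    /* OR: merge */
    return rotl32(rd, (size + off) & 31u);    /* ROL: back into place (32 means no rotate) */
}
```

A plain mask-and-OR version gives the same result; what the rotate form buys is that it never needs a mask constant in a register.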
In between, some CPUs have instructions that don't do the bitfield insert directly but make it easier. For example, on the Motorola M88000:
CLR t1, rs1, size<off> // clears the destination bitfield
MAK t2, rs2, size<off> // shifts the inserted value to the right place and clears all other bits
OR rd, t1, t2
That can be done in two clock cycles because the first two instructions can be executed at the same time.
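For comparison, the same clear/make/OR idea in portable C, which is roughly the mask-and-shift pattern the question guessed at. This is a sketch: field_mask, clr_field, mak_field and insert_clr_mak_or are hypothetical names that only mimic what the comments above say CLR and MAK do, and offsets are assumed to count from bit 0 (the least significant bit).

```c
#include <assert.h>
#include <stdint.h>

/* `size` ones starting at bit `off`; requires 1 <= size and off + size <= 32. */
static uint32_t field_mask(unsigned off, unsigned size) {
    assert(size >= 1 && size <= 32 && off + size <= 32);
    return (uint32_t)(((1ull << size) - 1u) << off);
}

/* CLR-like: clear the destination field, keep every other bit. */
static uint32_t clr_field(uint32_t word, unsigned off, unsigned size) {
    return word & ~field_mask(off, size);
}

/* MAK-like: move the low `size` bits of `val` to `off`, zero everything else. */
static uint32_t mak_field(uint32_t val, unsigned off, unsigned size) {
    return (val << off) & field_mask(off, size);
}

/* The two steps above don't depend on each other; one OR combines them. */
static uint32_t insert_clr_mak_or(uint32_t word, uint32_t val, unsigned off, unsigned size) {
    return clr_field(word, off, size) | mak_field(val, off, size);
}
```

Because clr_field and mak_field are independent, a CPU that issues two instructions per cycle can overlap them, which is where the two-cycle figure comes from.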
If you need to load the word from RAM and store it back afterwards, you'd probably never notice the speed difference between any of these!