Author Topic: How to prevent the compiler messing with a structures allignment (Read 11379 times)

nctnico · « **Reply #100 on:** September 01, 2023, 02:04:10 pm »

Quote from: tridac on August 31, 2023, 02:29:18 pm

Hi,

typedef a union containing the structure and a U8 byte array, sizeof (struct). Then, you can send the structure by accessing that array, in several parts as need be. Compilers may pad out smaller than word size structure elements and that will depend on the compiler. You can force alignment to word size by including dummy elements, useful if you are mapping a machine peripheral register set onto a structure..

Another way might be to just copy each structure element into an array, adding the size of element to the index for each, then send as required. The receiver gets the actual structure contents in size, with no compiler padding...

This still doesn't take care of alignment issues which is one of the potential problems when mapping structs onto byte buffers. The root of the problem is that the data types of C were never designed/defined to be portable across platforms in their binary/platform native form.
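For reference, the alignment hazard under discussion is the pointer cast, e.g. `*(uint32_t *)(buf + 1)`, which is undefined behaviour in C when the address is not suitably aligned for the type. A minimal sketch of one common way around it, using `memcpy`; the helper names are made up for illustration, and the value is in host byte order, so endianness still needs separate handling:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value (host byte order) from any byte address.
   memcpy has no alignment requirement on its arguments. */
static uint32_t load_u32(const uint8_t *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Write a 32-bit value (host byte order) to any byte address. */
static void store_u32(uint8_t *p, uint32_t v)
{
    assert(p != NULL);
    memcpy(p, &v, sizeof v);
}
```

Calling `store_u32(buf + 1, x)` followed by `load_u32(buf + 1)` gives back `x` regardless of how `buf` itself is aligned.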

NorthGuy · « **Reply #101 on:** September 01, 2023, 04:13:07 pm »

Quote from: nctnico on September 01, 2023, 02:04:10 pm

This still doesn't take care of alignment issues which is one of the potential problems when mapping structs onto byte buffers. The root of the problem is that the data types of C were never designed to be portable across platforms.

The keyword here is "potential". You need a little bit of basic understanding to avoid the problems.

People lacking such understanding, when they encounter problems, see such problems as inexplicable and consequently are making an elephant out of a fly.

nctnico · « **Reply #102 on:** September 01, 2023, 04:17:16 pm »

Quote from: NorthGuy on September 01, 2023, 04:13:07 pm

Quote from: nctnico on September 01, 2023, 02:04:10 pm
This still doesn't take care of alignment issues which is one of the potential problems when mapping structs onto byte buffers. The root of the problem is that the data types of C were never designed to be portable across platforms.

The keyword here is "potential". You need a little bit of basic understanding to avoid the problems.

In theory you are right, but the real world is different. The latter is why all modern data exchange formats and protocols are text based. There is no ambiguity about how to interpret or use the data. But this has been explained at length several times already.

NorthGuy · « **Reply #103 on:** September 01, 2023, 04:44:45 pm »

Quote from: nctnico on September 01, 2023, 04:17:16 pm

The latter is why all modern data exchange formats and protocols are text based.

How come? HTTP/1 was text based, but HTTP/2 is binary.

I haven't seen any text based CAN protocols though.

nctnico · « **Reply #104 on:** September 01, 2023, 05:09:20 pm »

Quote from: NorthGuy on September 01, 2023, 04:44:45 pm

Quote from: nctnico on September 01, 2023, 04:17:16 pm
The latter is why all modern data exchange formats and protocols are text based.

How come? HTTP/1 was text based, but HTTP/2 is binary.

I haven't seen any text based CAN protocols though.

I have

Siwastaja · « **Reply #105 on:** September 01, 2023, 05:27:55 pm »

Quote from: nctnico on September 01, 2023, 04:17:16 pm

In theory you are right, but the real world is different. The latter is why all modern data exchange formats and protocols are text based. There is no ambiguity about how to interpret or use the data. But this has been explained at length several times already.

Your trolling is getting too obvious. This "clicked" to me maybe a year or so ago, that you are some kind of artistic installment / interesting troll, but many have not realized it yet, but if you go this far, I think too many will get your true intentions and hence lose interest.

The "packed struct causes unaligned access" lie is excellent, 90-95% just believe it, do not confirm it and you can do real harm with such lies, and since you repeat it ad nauseam, people like me lose interest in correcting this misinformation in every thread, multiple times per thread. But ALL modern data exchange based on text? Give me a break, now that's not going to fly. No one will take this seriously.

nctnico · « **Reply #106 on:** September 01, 2023, 05:39:51 pm »

Quote from: Siwastaja on September 01, 2023, 05:27:55 pm

Quote from: nctnico on September 01, 2023, 04:17:16 pm
In theory you are right, but the real world is different. The latter is why all modern data exchange formats and protocols are text based. There is no ambiguity about how to interpret or use the data. But this has been explained at length several times already.

Your trolling is getting too obvious.

No, you are just getting tired of being proven wrong all the time and now need to resort to some kind of petty vendetta. Just play the ball. Many of your contributions are useful but don't degrade yourself with your vendetta. This is not the place.

Based on what you wrote so far I probably have been doing engineering for a living for 1 or 2 decades longer than you have. Worked on a wider variety of systems with a wider variety of people. A lot of engineering problems do not stem from technical problems but human error. And then mix in being on a budget time & money wise. So part of the solutions I provide also attempt to mitigate human error and use solutions that have very little chance of leading to wasting time. You are simply not seeing that (yet). Likely because you rarely work within a team of people who have multiple disciplines and skill levels or are just stuck in a mode where all your co-workers should have a god-level skillset or can bugger off otherwise.

Siwastaja · « **Reply #107 on:** September 01, 2023, 06:06:20 pm »

Yeah, I admit, I'm wrong. Really all modern data exchange formats are text-based. Good examples are the text-based H.264 video codec, the text-based JPEG2000 which replaced the obsolete binary JPEG format, the text-based Ogg Vorbis audio storage format which is superior to the stupid legacy binary MPEG layer 3 format, not to mention ZIP, the text-based data compression format. As we all know, IPv6 headers also transitioned into text format from the old legacy binary IPv4, and the ext4 filesystem uses XML and stores the file contents encoded in base64 within that XML. What modernism!

Sorry 'bout that. My shorter work experience prevented me from seeing how all modern data exchange formats and protocols are text based. They clearly are, because nctnico can't be wrong with such high work status.

SiliconWizard · « **Reply #108 on:** September 01, 2023, 08:53:45 pm »

Quote from: nctnico on September 01, 2023, 05:39:51 pm

Quote from: Siwastaja on September 01, 2023, 05:27:55 pm
Quote from: nctnico on September 01, 2023, 04:17:16 pm
In theory you are right, but the real world is different. The latter is why all modern data exchange formats and protocols are text based. There is no ambiguity about how to interpret or use the data. But this has been explained at length several times already.

Your trolling is getting too obvious.
No, you are just getting tired of being proven wrong all the time (...). Many of your contributions are useful (...)

So is he wrong all the time or are many of his contributions useful? Surely that wouldn't really add up. Unless being wrong is actually useful. Or something.
Maybe one can design a text-based protocol out of this.

tridac · « **Reply #109 on:** September 02, 2023, 12:34:30 am »

It only works in a portable way if all the structure elements are U8, but if the union is defined the same at both ends of the comms, then it should work, assuming the same compiler is used. Copying structure elements, breaking down U16 / U32 into bytes while copying to the array, is also useful. Have used both methods here in the past, but it tends to be application specific as to which is the best way. Perhaps best to organise the data to avoid having to send a structure in the first place. Comms works in byte streams, not structures...
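A minimal sketch of the element-by-element copy described above, breaking U16 / U32 fields into bytes. The struct, its field names, and the little-endian wire order are assumptions chosen for illustration; the point is only that the wire layout is fixed by the code, not by the compiler's padding or the CPU's byte order:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message; the compiler may pad this struct as it likes. */
struct sample {
    uint8_t  id;
    uint16_t voltage_mv;
    uint32_t timestamp;
};

enum { SAMPLE_WIRE_SIZE = 1 + 2 + 4 };  /* bytes on the wire, no padding */

/* Serialize field by field, least significant byte first. */
static size_t sample_pack(const struct sample *s, uint8_t *buf)
{
    size_t i = 0;
    buf[i++] = s->id;
    buf[i++] = (uint8_t)(s->voltage_mv & 0xFFu);
    buf[i++] = (uint8_t)(s->voltage_mv >> 8);
    buf[i++] = (uint8_t)(s->timestamp & 0xFFu);
    buf[i++] = (uint8_t)((s->timestamp >> 8) & 0xFFu);
    buf[i++] = (uint8_t)((s->timestamp >> 16) & 0xFFu);
    buf[i++] = (uint8_t)((s->timestamp >> 24) & 0xFFu);
    assert(i == SAMPLE_WIRE_SIZE);
    return i;
}

/* Rebuild the struct from the same byte layout on the receiving side. */
static void sample_unpack(struct sample *s, const uint8_t *buf)
{
    s->id = buf[0];
    s->voltage_mv = (uint16_t)(buf[1] | ((uint16_t)buf[2] << 8));
    s->timestamp = (uint32_t)buf[3]
                 | ((uint32_t)buf[4] << 8)
                 | ((uint32_t)buf[5] << 16)
                 | ((uint32_t)buf[6] << 24);
}
```

The byte array can then be sent in as many parts as the transport needs, and the receiver does not have to use the same compiler or architecture.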

Siwastaja · « **Reply #110 on:** September 02, 2023, 06:23:46 am »

Quote from: SiliconWizard on September 01, 2023, 08:53:45 pm

Unless being wrong is actually useful.

Bingo! This is the key. Remember, wrong information, complex/difficult solutions, and especially adding confusion to things instead of trying to clarify them is an opportunity to some. When some lose, others win. Many make full careers out of this principle, and I totally loathe it.

PlainName · « **Reply #111 on:** September 02, 2023, 08:57:56 am »

Quote from: SiliconWizard on September 01, 2023, 08:53:45 pm

Quote from: nctnico on September 01, 2023, 05:39:51 pm
Quote from: Siwastaja on September 01, 2023, 05:27:55 pm
Quote from: nctnico on September 01, 2023, 04:17:16 pm
In theory you are right, but the real world is different. The latter is why all modern data exchange formats and protocols are text based. There is no ambiguity about how to interpret or use the data. But this has been explained at length several times already.

Your trolling is getting too obvious.
No, you are just getting tired of being proven wrong all the time (...). Many of your contributions are useful (...)

So is he wrong all the time or are many of his contributions useful? Surely that wouldn't really add up.

Perhaps we could calm down a little and revert to talking instead of being nitpickers. When we talk, as opposed to writing detailed specs, we often exaggerate or use hyperbole (or just talk 'in general') so there will be exceptions, different percentages and all that stuff. Easy to pick on if you want, but it's pointless and just stirs the flames (to mix metaphors). When I read "wrong all the time.... many useful" I don't read that literally but understand what he is getting at - that the chap is wrong a lot, but that he also often comes out with good stuff. And please note I am writing what I think nctnico is saying, not giving my opinion of Siwastaja.

As for the protocol thing, treat it literally and you can easily prove it wrong, as you could pretty much any statement (and which members here do quite often!). The photo example is a good one. And yet... we do send photos as text in emails so that kind of thing isn't completely silly. And taken in the manner I said above, where we may use a bit of hyperbole, I think the perception is that we would use text-based protocols when possible. That doesn't mean it has to be every time, but if performance or whatever isn't an overriding factor then surely we would revert to something like JSON or XML or whatever the current flavour is.

Simon · « **Reply #112 on:** September 02, 2023, 01:39:17 pm »

We seem to be getting confused between communication methods and data storage. Nothing is stored as text or transmitted as text. If I want the number 256 in my program it is stored in binary as 2 bytes, when it is transmitted, it will be transmitted as such.

Nominal Animal · « **Reply #113 on:** September 02, 2023, 02:55:52 pm »

Quote from: Simon on September 02, 2023, 01:39:17 pm

If I want the number 256 in my program it is stored in binary as 2 bytes, when it is transmitted, it will be transmitted as such.

It can be useful to decouple the transmission format from the storage format, though.

Some say packed structs are the way, some use accessor functions (or pack-unpack functions); both work, and have their upsides and downsides.

Regardless of which approach you use, using a temporary structure to manage the multi-message transfer can be very useful. As an example, consider a case where you receive a chunk of information split in multiple messages whose order (or even source) may vary. You could have a bit mask in the temporary structure, one bit per message part, which is cleared when the temporary structure is copied to the variables or structure used by the main code. Packing this temporary structure is not a good idea, because you typically have only one, and having access to it as lightweight and "atomic" as possible will lead to more robust code.

If there is a key or index in each message indicating which update each message belongs to, you'll also want to reset the bit mask when the key or index is newer than the one being constructed in the temporary structure. If the key or index order is unpredictable, you will need at least two temporary structures, plus optionally a ring buffer of the few latest completed and/or discarded keys or indexes; with each structure having an extra (monotonic) counter. When a message arrives with a key that does not match any of those in the temporary structures, you check the ring buffer to see if it is an old message. If it is not an old message, it is a new message, so you clear the older temporary structure, replacing the oldest key or index in the ring buffer with its key or index, and update the cleared temporary structure with the current message contents. If you have messages coming in for the same structure using different buses, this may be necessary to receive all updates correctly.

(It can be extremely informative to keep counters of the messages completely received, messages with keys/indices that had to be discarded, and duplicated messages ignored. Perhaps not during ordinary runtime, but definitely so when debugging.)
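A minimal sketch of the simplest case above: one temporary structure with a bit mask, one bit per message part. The part count, part length, and the `key` field are assumptions for illustration, and the sketch restarts on any key that differs from the current one rather than checking which is newer; the two-structure and ring-buffer variants are not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { ASM_PARTS = 3, ASM_PART_LEN = 8, ASM_ALL = (1u << ASM_PARTS) - 1u };

/* Unpacked, naturally aligned scratch area; there is only one of these. */
struct assembly {
    uint32_t received;                       /* bit n set: part n has arrived */
    uint8_t  key;                            /* update currently being built  */
    uint8_t  data[ASM_PARTS][ASM_PART_LEN];
};

/* Feed one message part. Returns true when every part of one update is in;
   the caller then copies a->data out, and the mask has been cleared. */
static bool assembly_feed(struct assembly *a, uint8_t key, unsigned part,
                          const uint8_t payload[ASM_PART_LEN])
{
    assert(part < ASM_PARTS);
    if (key != a->key) {                     /* different update: start over */
        a->key = key;
        a->received = 0;
    }
    memcpy(a->data[part], payload, ASM_PART_LEN);
    a->received |= 1u << part;
    if (a->received != (uint32_t)ASM_ALL)
        return false;
    a->received = 0;                         /* cleared on hand-over */
    return true;
}
```

Parts may arrive in any order; a duplicate part simply overwrites the earlier copy and leaves its bit set.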

When all parts of an update have been received, and the main code uses a structure, you can reduce latencies (assuming you use the "disable interrupts, update values, re-enable interrupts" scheme) by composing a temporary (and properly aligned) copy of that main-code structure locally, on stack. That way, the update simplifies to disable interrupt, fast copy of a fixed number of native words, re-enable interrupts, yielding the minimum time interrupts are disabled.
Of course, when your main code uses fields of that structure, it too should disable interrupts, copy the entire structure to a local one (on stack), and re-enable interrupts, to ensure atomicity, and that the values do not unexpectedly change during the main code (as things like average/mean voltage/current can drift from each other if that happens; they will only stay in sync as long as all statistically collected data involves the exact same data points).
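A minimal sketch of the "disable interrupts, copy, re-enable interrupts" scheme for both the writer and the reader. `irq_disable()` and `irq_enable()` are placeholders standing in for whatever the platform provides, and the structure fields are invented; a real build would likely also need `volatile` or a compiler barrier around the shared copy.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholders for the platform's interrupt control; here they only
   track nesting so the sketch can be compiled and run on a host. */
static int irq_off_depth;
static void irq_disable(void) { irq_off_depth++; }
static void irq_enable(void)  { assert(irq_off_depth > 0); irq_off_depth--; }

/* Hypothetical main-code structure, left with natural alignment. */
struct readings {
    int32_t  voltage_mv;
    int32_t  current_ma;
    uint32_t sample_count;
};

static struct readings shared;

/* Receive side: the update is composed in a local copy first, so the
   critical section is only a fixed-size struct copy. */
static void readings_publish(const struct readings *fresh)
{
    irq_disable();
    shared = *fresh;
    irq_enable();
}

/* Main code: take a consistent snapshot, then work on the local copy. */
static struct readings readings_snapshot(void)
{
    struct readings local;
    irq_disable();
    local = shared;
    irq_enable();
    return local;
}
```

All fields of the snapshot then come from the same update, which is what keeps derived values such as averages consistent with each other.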

Of course, if you use an RTOS and not bare hardware, then you'd use the synchronization methods provided by that RTOS; but the principles still apply: temporary structures that can hold the necessary information, but not having the same format/alignment/packing as the transferred data or the structure used by the main code, can actually solve quite a few otherwise annoying and sticky problems. And minimizing exclusive lock duration is always a good thing, usually making a real difference with throughput and/or latencies.

Hopefully, you can see how the posts by myself and others (before this text protocol nonsense) sketch out related issues and solutions to the binary-structured-data-via-serial-protocols general problem, those more usual in parallel and distributed processing under fully-featured OSes, but also applicable to microcontrollers (both under RTOSes and on bare metal) in this kind of situation.

slugrustle · « **Reply #114 on:** September 02, 2023, 03:06:19 pm »

Quote from: Simon on September 02, 2023, 01:39:17 pm

We seem to be getting confused between communication methods and data storage. Nothing is stored as text or transmitted as text.

Emphasis mine.

Possible exceptions are MICR codes on checks (stored on the check as text, "transmitted" as text), fax ("stored" as text on either end), and any storage or transmission method using OCR. Then there are letters, books, and postcards, but I'm assuming the phrase communication methods here implies communication between computers.

I am only writing this to amuse myself and realize it is off topic.

NorthGuy · « **Reply #115 on:** September 02, 2023, 03:29:30 pm »

Quote from: PlainName on September 02, 2023, 08:57:56 am

And yet... we do send photos as text in emails so that kind of thing isn't completely silly.

The email message format in the form which is used today was generally finalized in 1982 - you can read RFC 822. At that time, the majority of communication lines were 7-bit (ASCII is also 7-bit), therefore to transmit data over these lines, everything had to be converted to text. That's why. Further, many of the mail programs of those days couldn't accept lines longer than 72 characters, so the text had to be broken down into short lines. This all was done out of necessity. This made the email format somewhat convoluted, but there was no other choice.

Later, in 1996, when the official HTTP specification appeared, this was no longer the case, so they kept textual headers, but binary data (e.g. pictures) was transmitted as is. This was a leap forward.

This was about 30 years ago. Now we have much better rockets than Coyote.

DiTBho · « **Reply #116 on:** September 04, 2023, 04:06:40 am »

Quote from: Siwastaja on September 02, 2023, 06:23:46 am

Quote from: SiliconWizard on September 01, 2023, 08:53:45 pm
Unless being wrong is actually useful.

Bingo! This is the key. Remember, wrong information, complex/difficult solutions, and especially adding confusion to things instead of trying to clarify them is an opportunity to some. When some lose, others win. Many make full careers out of this principle, and I totally loathe it.

So the logical choice, to avoid being misunderstood and wasting your energy and time, is to avoid exposing too much personal professional knowledge and opinions based on personal working experience, instead of wasting them on imbeciles who could show no gratitude at best or accuse you of trolling at worst.

DiTBho · « **Reply #117 on:** September 04, 2023, 04:34:08 am »

Quote from: SiliconWizard on September 01, 2023, 08:53:45 pm

Maybe one can design a text-based protocol out of this.

home automation sensors: text-based
tilt sensors: text-based
debugger interfaces (gdb-like, trace32-like, ... ): text-based

actually it doesn't surprise me at all to read a similar answer, although I would NOT use it with canbus, but simply because the bandwidth is limited to 1Mbps and text-form halves the payload, plus, canbus should be used for fast event-processing, and transmitting text, while reducing ambiguity, costs more CPU cycles to decode.

DiTBho · « **Reply #118 on:** September 04, 2023, 04:37:18 am »

personally, I would invest time in the design and implementation of what Nominal Animal suggested in his last post.

Siwastaja · « **Reply #119 on:** September 04, 2023, 05:26:41 am »

Quote from: DiTBho on September 04, 2023, 04:34:08 am

Quote from: SiliconWizard on September 01, 2023, 08:53:45 pm
Maybe one can design a text-based protocol out of this.

home automation sensors: text-based
tilt sensors: text-based

I actually happen to work professionally with both home automation sensors and inertial measurement units (basically tilt sensors, yes), and most I have seen use binary protocols. The reasons are simple: home automation is often wireless* (mesh even) and packets have to be very small, so they can't afford text. Inertial measurement units and tilt sensors, on the other hand, tend to produce data at quite high rates while still being used in small embedded systems. With CAN, they easily hog the whole 1 Mbit/s bandwidth, or with RS485/422, a 1 Mbit/s UART. No one really wants to parse text in microcontrollers (except nctnico) when that can be avoided.

*) and wired uses something like modbus rtu, still binary

With home automation though, there often is another software layer (server, "data hub", "bridge") which might do conversion from binary to text format, so then you have both in one system. This is handy for example if you want to serialize the data for a web page to be accessed in JavaScript, in which case JSON makes a lot of sense.

As usual, generalizations tend to go seriously wrong if they are made on feelings / cargo cult basis, instead of actual knowledge and experience.

JPortici · « **Reply #120 on:** September 04, 2023, 06:08:55 am »

Quote from: Amelia Smith on September 04, 2023, 02:45:03 am

I see, CAN bus causes such types of issues. You have to use it perfectly. Have you tried fragmentation? Your message has a size of more than 8 bytes and the middle part is received by the wrong node. The message is corrupted and I think fragmentation of the message is the right solution. Tell me if I am right or not.

not.
You usually employ two methods to broadcast data that has size larger than the data field
1) you send multiple nodes (message IDs), for example 0x4F0 0x4F1 0x4F2, at the same time, you do this to broadcast data with minimum latency
2) You use transport protocols, such as CAN-TP described in ISO 15765-2, you do this to send a variable length amount of data from one producer to one consumer (the consumer has to acknowledge and control the transfer progress)

I have also seen a single node used with one data byte representing the part of the message (similar to CAN-TP, but not really); don't do this. It confuses the hell out of protocol analyzers unless you write your own that interprets the data.

DiTBho · « **Reply #121 on:** September 04, 2023, 06:16:46 am »

Quote from: Siwastaja on September 04, 2023, 05:26:41 am

As usual, generalizations tend to go seriously wrong if they are made on feelings / cargo cult basis, instead of actual knowledge and experience.

I think I said both bandwidth/rate and wired/wireless both matter.

edit:
instead of getting pissed, I'll tell a little anecdote:

- - -

Ironically, I'm working on converting some 3rd party tilt sensors sold as "text based" to "binary protocol" (canbus) sensors.

It's 20hz 3 axes tilt, and it's sold as-is. RS232, text-based, in a plastic shield with a 4pin cable.

With my colleagues, I'm developing a control motor for a cargo-bicycle based on a Bosch motor + Shimano 8SP internal gearbox coupled with a custom digital linear actuator that controls the gear.

Nothing special, but when you stop at the traffic light going uphill with full load (200Kg!!!) the engine should know that you are going uphill to correctly select the gear and the crawl start procedure, as well as going downhill, the engine should know how to use the brake motor correctly in order to save energy and charge the supercapacitor and therefore the battery.

This is because we would like to outperform the competition by offering something slightly better, especially in the mountains, where having a greater electric range means not having to use a lot of muscle traction or, worse, staying on foot because there is no way to recharge/replace the battery, and the load is too heavy for simple muscle traction.

Anyway, cargo bikes are not a "new thing"; we are hacking some existing products, as the primary focus was the algorithm, which is "the" added value; the sensors, the protocols, the costs didn't matter for the first proof of concept.

We worked with a prototype with "off the shelf" sensors, once the algorithm was designed and tested in a lab field. Once completed, analyzed, and verified, we can think of implementing our hardware with our own canbus inclination sensor, as required by our customers.

September-October. It's time to think about how to design a little MPU board with a CanBus adapter and how it uses accelerometers, gyroscopes and sensor fusion math to bring out the artificial horizon. Design, implement it, debug and test it. A sub-project into the main project.

Why binary, this time? Because it is the sensor that we produce for our own hardware, and therefore we can do it as we like. Still 20Hz, still very slow, but made binary it is better suited to CanBus because it reduces the payload and all messages + some meta can be packed and sent in one shot.
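A minimal sketch of what "packed and sent in one shot" could look like for a 3-axis tilt reading, next to the same reading as a text line. The layout (three signed 16-bit angles in 0.01 degree units, a sequence counter, a status byte) and the text format are assumptions for illustration, not the actual sensor's protocol:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

enum { CAN_MAX_PAYLOAD = 8 };  /* classic CAN data field */

static void put_i16le(uint8_t *p, int16_t v)
{
    uint16_t u = (uint16_t)v;
    p[0] = (uint8_t)(u & 0xFFu);
    p[1] = (uint8_t)(u >> 8);
}

/* Three axes plus two bytes of metadata fill exactly one frame. */
static int tilt_pack(uint8_t out[CAN_MAX_PAYLOAD],
                     int16_t x, int16_t y, int16_t z,
                     uint8_t seq, uint8_t status)
{
    assert(out != NULL);
    put_i16le(out + 0, x);
    put_i16le(out + 2, y);
    put_i16le(out + 4, z);
    out[6] = seq;
    out[7] = status;
    return CAN_MAX_PAYLOAD;
}

/* The same reading as a text line, for size comparison only. */
static int tilt_text(char *out, size_t cap, int16_t x, int16_t y, int16_t z)
{
    return snprintf(out, cap, "X=%d,Y=%d,Z=%d\r\n", x, y, z);
}
```

For x = -1234, y = 567, z = 9000, the text line comes to 22 bytes against 8 for the binary frame, so the text form would need three CAN frames and still carry no sequence counter.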

Nominal Animal · « **Reply #122 on:** September 04, 2023, 08:06:38 am »

Do we even agree to the distinction between "text" and "binary" formats? In my use, they correspond roughly to "human-readable/writable" and "non-human-readable", which is pretty vague and unhelpful. Here are the four format types that do matter to myself:

Fixed formats, where the relevance of each byte is dictated by its position
Chunked formats, where each chunk includes the exact length of said chunk
Stream formats, where control and formatting begins with a reserved value that is escaped or otherwise masked in the data stream
Hybrid formats

It does not matter to me much if values are restricted in range or in a specific base: parsing big-endian multi-byte binary integers is not that different to parsing big-endian decimal numbers.

For example, HTTP/1 is a hybrid format. Most of it is a stream format with CR LF (0x0D 0x0A) being a delimiter, but with chunked transfer encoding, it provides a length (hexadecimal encoded, followed by a CR LF), then exactly that many raw bytes of payload (followed by an "extra" CR LF delimiter). It even supports gzip compression for the data payload, on top of chunked transfer encoding, in which case the length refers to the encoded length, not the decoded length.
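A minimal sketch of parsing one such length-prefixed chunk, assuming the whole chunk is already in a buffer. In HTTP/1.1 the chunk size is written in hexadecimal digits; the function name is made up, and chunk extensions and trailers are deliberately not handled:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Parse "<hex length>\r\n<payload>\r\n" at the start of buf[0..n).
   On success, stores the payload pointer and length and returns the
   number of bytes consumed. Returns 0 if malformed or incomplete. */
static size_t chunk_parse(const char *buf, size_t n,
                          const char **payload, size_t *len)
{
    size_t i = 0, value = 0;
    assert(buf != NULL && payload != NULL && len != NULL);
    while (i < n) {
        char c = buf[i];
        unsigned digit;
        if (c >= '0' && c <= '9')      digit = (unsigned)(c - '0');
        else if (c >= 'a' && c <= 'f') digit = (unsigned)(c - 'a') + 10u;
        else if (c >= 'A' && c <= 'F') digit = (unsigned)(c - 'A') + 10u;
        else break;
        value = value * 16u + digit;
        if (value > n)                 /* cannot possibly fit in the buffer */
            return 0;
        i++;
    }
    if (i == 0 || i + 2 > n || buf[i] != '\r' || buf[i + 1] != '\n')
        return 0;
    i += 2;                            /* payload starts here */
    if (n - i < value + 2)
        return 0;
    if (buf[i + value] != '\r' || buf[i + value + 1] != '\n')
        return 0;
    *payload = buf + i;
    *len = value;
    return i + value + 2;
}
```

The stream part (finding CR LF) and the chunked part (skipping exactly `value` raw bytes without inspecting them) are both visible here, which is what makes the format a hybrid.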

GIF images are an example of a fixed format. PNG and JFIF JPEG images are an example of a chunked format. HTML and XML are stream formats.
Standard encoding of protobufs is a stream format, but netstrings are a chunked format.

Many people suggest chunked formats, but they tend to require significant buffering (consider e.g. netstrings: to provide a value, you need to know its exact length in bytes before you can start emitting it, thus necessitating an output buffer of the maximum value length), making them annoying/problematic on small microcontrollers.
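A minimal sketch of a netstring encoder, to make the buffering point concrete: the decimal length goes out before the first payload byte, so the complete value has to exist in memory (or its length be known some other way) before anything can be emitted. The function name and return convention are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Encode data[0..len) as "<len>:<data>," into out[0..cap).
   Returns the encoded size, or -1 if it does not fit.
   The output is not NUL-terminated. */
static int netstring_encode(char *out, size_t cap,
                            const void *data, size_t len)
{
    int head;
    assert(out != NULL && data != NULL);
    head = snprintf(out, cap, "%zu:", len);   /* needs len up front */
    if (head < 0 || (size_t)head + len + 1 > cap)
        return -1;
    memcpy(out + head, data, len);
    out[(size_t)head + len] = ',';
    return head + (int)len + 1;
}
```

Encoding the 12 bytes of `hello world!` produces the 16 bytes `12:hello world!,`. A stream format with a terminator could start sending the first byte immediately instead.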

I do have my own solution, a stream-based format that can be generated and parsed with minimal memory and processing, using a relatively simple state machine, supports lossless conversion from any XML-derived format with a lot less overhead than XML, can be used to replace protocols like HTTP, and can easily be extended to support multiple independent data streams on top of the same transport stream with minimal buffering (4-8 bytes). It can be expressed and defined in human-readable terms, but a binary representation (reserved byte values) tends to be more effective. With the multi-stream extension, the worst case overhead can be as high as 33%, but is negligible for typical data given a sensible choice of reserved byte values.
I only bring it up because it fits both "binary" and "text" format categories, depending on the reserved byte value choices.

(I really should try and publish it, because I think it would be useful to many if they just knew of the technique. Alas, I don't have the social wherewithal to become a vocal proponent for it and push it to the relevant working groups and projects. Mentioning it here, like DiTBho does for their my-c, is near my limit.)

It is not suitable for CAN bus, however, which has its own standard frame types with base and extended frames having 0 to 8 data payload bytes (and the number of payload bytes already specified in the data frame). Each payload segment is also so short that only a fixed format makes sense here. For up to a reasonable number of different logical payloads, different CAN bus identifiers can be used for each part/sub-message (each bus supporting 2048 unique identifiers), so multi-part messages do not need to use the payload bytes to identify the payload itself.

Note that if the CAN bus message order cannot be absolutely controlled, you can use N times the number of CAN bus identifiers, in a cyclic round-robin scheme, for N different logical messages in time. (You'd also want to have the additional bit mask I described, to determine when all parts of a logical message have been received.) This is also useful when using an RTOS and a mailbox scheme, because then you have (theoretically) N-1 logical message intervals to process the message in the main loop, before it is overwritten by a receive interrupt. (Obviously you can also use a queue for the logical messages to avoid losing any messages, but the queue primitives needed may not be available in an interrupt context – see my post about the problem of using a mutex in an interrupt context above.)
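A minimal sketch of the cyclic round-robin identifier scheme, assuming 3 parts per logical message and N = 4 slots in flight. The base identifier and all names are hypothetical; the receiver would keep one temporary structure (with its bit mask) per slot:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
    RR_BASE_ID = 0x4F0,                 /* hypothetical first identifier   */
    RR_PARTS   = 3,                     /* parts per logical message       */
    RR_SLOTS   = 4,                     /* N logical messages in the cycle */
    RR_COUNT   = RR_PARTS * RR_SLOTS    /* identifiers used in total       */
};

/* Sender: identifier for a given part of the seq-th logical message. */
static uint16_t rr_can_id(unsigned seq, unsigned part)
{
    assert(part < RR_PARTS);
    return (uint16_t)(RR_BASE_ID + (seq % RR_SLOTS) * RR_PARTS + part);
}

/* Receiver: recover slot and part from the identifier alone. */
static bool rr_decode(uint16_t id, unsigned *slot, unsigned *part)
{
    unsigned offset;
    if (id < RR_BASE_ID || id >= RR_BASE_ID + RR_COUNT)
        return false;
    offset = (unsigned)(id - RR_BASE_ID);
    *slot = offset / RR_PARTS;
    *part = offset % RR_PARTS;
    return true;
}
```

Because the slot is carried by the identifier, none of the 8 payload bytes are spent on sequencing, and a late part of message k cannot be mistaken for a part of message k+1.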

PlainName · « **Reply #123 on:** September 04, 2023, 01:01:40 pm »

Did you allow for chunks not in a fixed order? I think that's one of the things that text lends itself to (apart from being readable by us): the data can be shuffled around and transferred in arbitrary order, which can make things both simpler to generate and trickier to parse. And, of course, arbitrary chunk lengths: text is defined by terminators and I think that's probably more robust than having the expected length noted before the actual data.

DiTBho · « **Reply #124 on:** September 04, 2023, 01:57:25 pm »

Like GPS NMEA

