Topic: array of pointers to different types C

Simon · « **on:** December 05, 2023, 10:17:21 am »

So, in my efforts to receive a CAN bus message in a way that does not have me using "if" statements to check the message ID or filter element number to decide what to do with it, I am trying to use an array of pointers. I find pointers a little confusing, but only by using them do I become more versed in their use.

So far it is a right mess of errors, and since these are pointers, even a successful compilation is no guarantee.

The CAN message filter elements are all stored consecutively in RAM (Bosch MCAN controller), and the buffer that the message goes into references the filter element number. This is very useful, as it means that I can represent each ID with a number from 0-127 rather than having to use the 11- or 29-bit ID. This makes it easier to keep the data in a small array of up to 128 entries rather than one covering 2048 (11-bit) to about 536 million (29-bit) IDs.

Now the fun bit, of course, is that the data in each array element will be of a different type. So I thought it made sense to set up an array of pointers to each variable (the structures) and use memcpy to copy from the buffer of received data to the variable pointed to by the array element that is indexed by the filter element number (0-127).

Naturally not so easy.

So how do I do this? Or should I have a union between an array of a generic structure type and the specific structure types that are used for each message ID, also known by its filter element number?

onsokumaru · « **Reply #1 on:** December 05, 2023, 01:09:56 pm »

I think using a union is probably a good idea. It feels like you're overcomplicating things for yourself, but I kinda get your point on this.
If you have defined structures for each type of message, then just declare a union type with all of them:
union AllMessageData {
    struct MessageDataA msgA;
    struct MessageDataB msgB;
    // Add other message types here
};
Then you need a mechanism to do the appropriate typecast when you want to access those structs in the array. Again, I can't see how to do that while avoiding a switch case or an if chain. It looks like you are looking for object-oriented programming features in this case, like having a method with the same name in different classes, or even C++ templates (if you thought pointers were hard enough).
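A minimal sketch of the union suggestion, assuming two hypothetical 8-byte message layouts and an added raw byte member (not part of the post above) so that the receive code can store the bytes without knowing the message type. Reading still requires knowing which member is valid for a given filter index, and each struct layout must match the bytes on the wire.

```c
#include <stdint.h>
#include <string.h>

#define NUM_FILTERS 128u   /* MCAN allows up to 128 standard filter elements */

/* Hypothetical message layouts, 8 bytes each on common compilers */
struct MessageDataA { uint16_t speed; uint16_t rpm; uint32_t odometer; };
struct MessageDataB { uint8_t flags[4]; float temperature; };

union AllMessageData {
    uint8_t raw[8];            /* written by the receive code */
    struct MessageDataA msgA;  /* read when the filter index is known to carry A */
    struct MessageDataB msgB;  /* read when the filter index is known to carry B */
};

static union AllMessageData rx_data[NUM_FILTERS];

/* Copy the 8 received bytes into the slot selected by the filter element number */
void store_rx(uint8_t filter_index, const uint8_t payload[8])
{
    if (filter_index < NUM_FILTERS) {
        memcpy(rx_data[filter_index].raw, payload, sizeof rx_data[filter_index].raw);
    }
}

/* Application code "knows" that filter 0 carries message type A */
uint16_t read_speed(void)
{
    return rx_data[0].msgA.speed;
}
```

The fixed design-time knowledge that filter 0 carries message A is what replaces the runtime typecast mechanism here.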

magic · « **Reply #2 on:** December 05, 2023, 01:27:11 pm »

Not entirely sure what's going on here.

Could you provide concrete examples of the types, variables or arrays that already exist or need to exist?
What specific types or variables would you like to add and for what exact purpose?

Storing multiple different "things" in some sort of universal "container" is possible and not too hard, what's hard is taking a random item out of there and knowing what it is and what to do with it.
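A common way to make the "knowing what it is" part explicit is a tagged union: a type tag stored next to the union says which member is valid, and the consumer switches on it. A minimal sketch with hypothetical names:

```c
#include <stdint.h>

enum MsgKind { MSG_SPEED, MSG_TEMPERATURE };

struct SpeedMsg { uint16_t kmh; };
struct TempMsg  { int16_t deci_celsius; };

/* The tag records which union member currently holds valid data */
struct TaggedMsg {
    enum MsgKind kind;
    union {
        struct SpeedMsg speed;
        struct TempMsg  temp;
    } u;
};

/* Reads whichever member the tag says is valid */
int32_t tagged_value(const struct TaggedMsg *m)
{
    switch (m->kind) {
    case MSG_SPEED:       return m->u.speed.kmh;
    case MSG_TEMPERATURE: return m->u.temp.deci_celsius;
    }
    return 0;  /* unknown tag */
}
```

The switch is the price of taking an arbitrary item out of the container: the tag only records the type, and code still has to act on it.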

C++ templates, as the name suggests, are a great way to produce a thousand copies of the same code specialized for different types, but they don't help one code to deal with a thousand types at the same time.

Simon · « **Reply #3 on:** December 05, 2023, 01:57:53 pm »

No, I am not trying to put different data into the same container.

The data arrives in a RAM buffer, and the layout of each element of this RAM buffer is fixed. Each message will contain its own variables, which can be represented as one structure per message; the structures may be different or they may be the same. The RAM buffer contains the index number of the message ID filter element, so from one small number that will be at most 127 I can identify each message.

What I am trying to do is create an array of pointers to each variable:

void *can_0_rx_variable_index[NUMBER_OF_IDS]; // one entry per message ID

So I need to put the address of each variable into its element; this works so far:

Code: [Select]

void can_0_std_filter_setup(uint16_t id , uint8_t filter_element_n , void * variable )
{
	can_0_rx_variable_index[filter_element_n] = variable ;
	
	// see MTTCAN 2.4.5
	can_0_rx_std_filter[filter_element_n].reg = 0x1 << 30 // SFT Standard filter type: 0 = range, 1 = dual filter, 2 = filter + mask, 3 = disabled
	| 0x1 << 27 // SFEC Standard Filter Element Configuration: Store in Rx FIFO 0 if filter matches
	| id << 16  // Primary filter ID
	| 0x0 << 15 // SSYNC: Standard Sync Message: generates the timestamp
	| id << 0 ; // Secondary Filter ID: for a single ID use the same as for ID 1 with SFT set to 1
}

When called to set up a filter, this function will be given the address of the variable (&variable), and it allocates the index that points to the variable that messages in that slot correspond to. The "variable" will be the structures.

What I am struggling with is how to use memcpy() to take the data from the buffer and direct it into each address-indexed structure.
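A minimal sketch of the memcpy step, assuming every destination structure is exactly 8 bytes; register_rx_variable, on_rx, EngineData and BatteryData are hypothetical names. The notation point is that the array element is already a pointer, so it is passed to memcpy as it is, with no & or *.

```c
#include <stdint.h>
#include <string.h>

#define NUM_FILTERS 128u

/* One pointer per filter element; each points at the structure that message fills */
static void *can_0_rx_variable_index[NUM_FILTERS];

/* Hypothetical 8-byte message structures of different types */
struct EngineData  { uint16_t rpm; uint16_t torque; uint32_t hours; };
struct BatteryData { uint8_t soc; uint8_t soh; uint16_t cells; uint32_t millivolts; };

struct EngineData  engine;
struct BatteryData battery;

/* Called at filter setup time with &variable, as can_0_std_filter_setup() does */
void register_rx_variable(uint8_t filter_element_n, void *variable)
{
    if (filter_element_n < NUM_FILTERS) {
        can_0_rx_variable_index[filter_element_n] = variable;
    }
}

/* Called from the receive interrupt: copy the 8 data bytes to the registered variable */
void on_rx(uint8_t filter_element_n, const uint8_t data[8])
{
    if (filter_element_n < NUM_FILTERS && can_0_rx_variable_index[filter_element_n] != NULL) {
        memcpy(can_0_rx_variable_index[filter_element_n], data, 8);
    }
}
```

Setup would call register_rx_variable(0, &engine) and so on. Because the interrupt writes these structures while the main code reads them, the main code may need to briefly disable the receive interrupt while reading a structure, so it never sees a half-updated copy.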

screwbreaker · « **Reply #4 on:** December 05, 2023, 04:03:36 pm »

Maybe I'm trying to do something very similar.
Tell me if I understood correctly:

You want an array of structures.
The structure is passed to your function as a void* pointer.
Each structure is different and you want to know which type of structure you are dealing with.
To know that, you want to check a specific variable inside the structure itself, to be clear this one: "can_0_rx_std_filter[filter_element_n].reg".

But you have problems accessing the variable because you are trying to access it through a generic pointer, not a pointer to the correct structure type.

If this is right, you are in the same situation as me.
What I'm trying to do at the moment is use a generic structure type which has only one element, the one I use as the ID.
Then, based on this ID, I use this structure as the correct type.
Something like:

generic_struct *temp = (generic_struct *)variable; // variable is a void*

if (temp->reg == 1) { ((struct1 *)variable)->var = 1.5f; }       // some float value
if (temp->reg == 2) { ((struct2 *)variable)->string = "string"; } // assuming string is a const char *

My problem is that I don't know whether the elements of an array are consecutive in memory, so I don't know if I can trust the result of accessing the ID element.

Simon · « **Reply #5 on:** December 05, 2023, 04:13:09 pm »

I think it is slightly different but very similar. Basically, I want the same code to be able to put data into any of the variable types, which are all of the same size. Because they are potentially different types, rather than putting them in an array, the only way to automate the access is to use an array of pointers to these structures.

When the message is filtered and stored, the index number of the filter element is stored with the data. These numbers are very convenient as array indexes because, rather than using the CAN ID, where I may have 4 message IDs out of a possible 2048, I will instead have the numbers 0-3. These basically take the place of the CAN message ID in identifying the received data. This filter index can be used as the index into the array of pointers that point to the variable structure for each unique message.

Basically, ignore the message ID stuff and only worry about the fact that each message comes with a number, unique to the data it carries, that is part of a consecutive set of numbers and can be used as an array index.

To get around wanting to store the data in variables of different types, instead of putting the data into the array, the array simply points to the address of the correct variable for that message.

magic · « **Reply #6 on:** December 05, 2023, 04:42:52 pm »

Is this what you want?

Code: [Select]

void rx_message(int type, char *data, size_t length) {
    // check if type is valid
    // check if length is valid for given type; a second array of size_t or an array of {void*, size_t} pairs may be useful
    // then simply:
    memcpy(can_0_rx_variable_index[type], data, length);
}

This should work but the usual warning applies: memory layout of your structs must match perfectly with the format of those messages - same fields, same size, same order, same signedness, endianness and so on. It's usually achievable in practice for any given combination of compiler and target CPU, though pedants may want you to write serialization/deserialization routines by hand or use some library.
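The hand-written deserialization route can be sketched as one decode function per message that assembles each field from individual bytes with shifts, so struct padding and host endianness no longer matter. The little-endian byte order and the field layout below are assumptions for illustration only.

```c
#include <stdint.h>

/* Hypothetical decoded form; its in-memory layout no longer matters */
struct EngineData { uint16_t rpm; int16_t coolant_deci_c; uint32_t hours; };

/* Assemble little-endian fields from raw bytes (works on any host endianness) */
static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

void decode_engine(const uint8_t data[8], struct EngineData *out)
{
    out->rpm            = get_le16(&data[0]);
    /* Two's complement on the wire; the conversion is modular on mainstream compilers */
    out->coolant_deci_c = (int16_t)get_le16(&data[2]);
    out->hours          = get_le32(&data[4]);
}
```

This costs a few lines per message type but removes the dependence on the compiler and target CPU.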

More complex processing than a simple memcpy is doable too: have an array of pointers to functions, one function per type of message, each specifically written to deal with its type. The functions may take arguments like the destination variable address (as void*), and rx_message has the means of providing this information when calling them.
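The function-pointer variant can be sketched as a table indexed by filter number, where each entry pairs a handler with the destination it writes to. All names here are hypothetical; copy_raw covers the plain memcpy case, and decode_percent shows a handler doing more than a copy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*rx_handler_fn)(void *dst, const uint8_t *data, size_t len);

struct RxEntry {
    rx_handler_fn handler;  /* how to process this message */
    void *dst;              /* where the result goes */
};

/* Plain copy: the destination layout matches the message bytes */
static void copy_raw(void *dst, const uint8_t *data, size_t len)
{
    memcpy(dst, data, len);
}

/* Extra processing: first byte is a percentage, stored as a float 0..1 */
static void decode_percent(void *dst, const uint8_t *data, size_t len)
{
    if (len >= 1) {
        *(float *)dst = data[0] / 100.0f;
    }
}

uint8_t status_bytes[8];
float throttle;

static const struct RxEntry rx_table[] = {
    { copy_raw,       status_bytes },  /* filter element 0 */
    { decode_percent, &throttle },     /* filter element 1 */
};

void rx_dispatch(size_t filter_index, const uint8_t *data, size_t len)
{
    /* Ignore unknown indexes and anything longer than a classic CAN frame */
    if (filter_index >= sizeof rx_table / sizeof rx_table[0] || len > 8) {
        return;
    }
    rx_table[filter_index].handler(rx_table[filter_index].dst, data, len);
}
```

Most entries can share copy_raw, so the function-pointer table only adds code for the messages that actually need processing.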

IanB · « **Reply #7 on:** December 05, 2023, 04:55:04 pm »

A typical design pattern for this would be a dispatch table to call an appropriate handler function for each type of message.

You would have an array of function pointers indexed by message type, and you would call each function through the function pointer, and pass it the address of the incoming message buffer as a void pointer. In the function itself, you would cast the void pointer to a structure pointer corresponding to the data layout for the given message type. In C, it is common to use void pointers in this way.

I don't have time to give you a code example right now, but maybe someone else can help with that.

Simon · « **Reply #8 on:** December 05, 2023, 05:49:42 pm »

I would assume that the whole point of having the index number of the message filter in the receive buffer is to do the sort of thing I am trying to do. If I used a high number of IDs, I would be writing a serious amount of code if I used function pointers. My point is that whatever structure type I want to point to, every one will have the same layout and size. All that will change is what I call the data in its 8 bytes and what type that data is. It's really not that difficult; I am just struggling with the pointer notation.

IanB · « **Reply #9 on:** December 05, 2023, 05:55:35 pm »

Quote from: Simon on December 05, 2023, 05:49:42 pm

I would assume that the whole point of having the index number of the message filter in the receive buffer is to do the sort of thing I am trying to do. If I used a high number of IDs, I would be writing a serious amount of code if I used function pointers. My point is that whatever structure type I want to point to, every one will have the same layout and size. All that will change is what I call the data in its 8 bytes and what type that data is. It's really not that difficult; I am just struggling with the pointer notation.

Yes, but the idea of the dispatch table is you store the function pointers in an array, and you index the array by the index number in the message. Then you just call the function directly from the entry in the array. This avoids having a long switch statement (which is not necessarily wrong, but can look ugly if you have 100 or so choices to make). But a switch statement with 10 cases is not so bad.

I understand the challenge with using pointer notation, but I'm afraid I am not in a position to write a code example to illustrate it.

IanB · « **Reply #10 on:** December 05, 2023, 06:01:13 pm »

A short answer to your question is that, no, you cannot have an array containing a mixture of different types in C. Every element of an array has to be the same data type.

The advantage of using function pointers is that all the function pointers can be of the same pointer type, but each pointer can point to a different message handling function according to the contents of the message.

Simon · « **Reply #11 on:** December 05, 2023, 06:20:24 pm »

Quote from: IanB on December 05, 2023, 06:01:13 pm

The advantage of using function pointers is that all the function pointers can be of the same pointer type, but each pointer can point to a different message handling function according to the contents of the message.

And also of an array of pointers to memory locations, they are all the same type regardless of what resides there. I think I am probably overcomplicating some aspects of what I am trying to do and will revisit it. I am probably making an actual error somewhere in trying to do something that is not possible.

IanB · « **Reply #12 on:** December 05, 2023, 06:31:27 pm »

Quote from: Simon on December 05, 2023, 06:20:24 pm

And also of an array of pointers to memory locations, they are all the same type regardless of what resides there. I think I am probably overcomplicating some aspects of what I am trying to do and will revisit it. I am probably making an actual error somewhere in trying to do something that is not possible.

I'm not sure why you want an array of pointers to memory locations. Presumably you have an incoming message in a message buffer that has, for example, a message identifier, a control field, and a data field. When dispatching the message to a message handler, you would pass a pointer to the buffer and the handler function would deal with the contents. It might get more complicated if you have to make a copy of the message before processing it, as then you would have to keep track of the copies and delete them when they have been processed.

Simon · « **Reply #13 on:** December 05, 2023, 08:59:00 pm »

Quote from: IanB on December 05, 2023, 06:31:27 pm

Quote from: Simon on December 05, 2023, 06:20:24 pm
And also of an array of pointers to memory locations, they are all the same type regardless of what resides there. I think I am probably overcomplicating some aspects of what I am trying to do and will revisit it. I am probably making an actual error somewhere in trying to do something that is not possible.

I'm not sure why you want an array of pointers to memory locations. Presumably you have an incoming message in a message buffer that has, for example, a message identifier, a control field, and a data field. When dispatching the message to a message handler, you would pass a pointer to the buffer and the handler function would deal with the contents. It might get more complicated if you have to make a copy of the message before processing it, as then you would have to keep track of the copies and delete them when they have been processed.

The messages go into a buffer; the buffer is set up in RAM and I can choose how many buffer slots to have, up to a limit. This means that I can handle the data from one slot while others are being filled. Separately, there are message ID filter elements; there can be up to 128 filters. When a message is stored in the buffer, the filter element number (0-127) is stored along with the sender ID. So if I use each filter element to filter one message ID only, I can "replace" the message IDs, which may be scattered throughout a range of 2048 (11-bit) or over half a billion (29-bit), with numbers that start at 0 and count up. This is ideal to use as the index of an array.

Except that I would then have to treat all of the data in an array of variables holding the message data as the same type, so 8x 8-bit, or 2x 32-bit, or some other combination. This limits how I deal with the data and what I can do with it. If, instead of putting the data into an array of all the same type, I use a structure to manage the data of each ID, that works much better, as I can have whatever variable mapping I need within the data. But now I have a problem: instead of referencing my data with an index number I have variable names, and lots of different types of data.

So instead of putting the data into the array, I put the addresses of the individual variables into the array. Now I can call the data whatever I like and organize the 8 bytes in any way I like; it doesn't matter. The interrupt handler simply dumps the data into whatever location it is told to and the main program can access that data by whatever name and type it likes.

magic · « **Reply #14 on:** December 05, 2023, 09:11:46 pm »

Quote from: Simon on December 05, 2023, 08:59:00 pm

interrupt handler simply dumps the data into whatever location it is told to and the main program can access that data by whatever name and type it likes.

Then you simply need the single memcpy line I posted, plus taking care to ensure that your data types are really compatible with the bits coming from the cable.
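One cheap way to catch some of these incompatibilities at build time, assuming a C11 compiler, is to assert the size and field offsets of each structure the memcpy fills. The structure below is hypothetical; a failed check stops the build instead of silently misplacing bytes. It does not check endianness or signedness.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for one 8-byte message */
struct BatteryData {
    uint8_t  soc;         /* byte 0 */
    uint8_t  soh;         /* byte 1 */
    uint16_t cell_count;  /* bytes 2-3 */
    uint32_t millivolts;  /* bytes 4-7 */
};

/* Fail the build if padding or type sizes break the 8-byte mapping */
_Static_assert(sizeof(struct BatteryData) == 8, "BatteryData must be exactly 8 bytes");
_Static_assert(offsetof(struct BatteryData, cell_count) == 2, "cell_count must start at byte 2");
_Static_assert(offsetof(struct BatteryData, millivolts) == 4, "millivolts must start at byte 4");
```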

IanB · « **Reply #15 on:** December 05, 2023, 09:16:53 pm »

I'm not sure I can explain this without showing an example.

You have message buffer slots, but each message buffer is the same, fixed, length, I believe? And the length is known.

So you can have a void pointer pointing to the start of the buffer, and now you know where the buffer is in memory.

Secondly, the messages can be of different types, and the data in the buffer is arranged in different ways according to the message type? So the ideal way to handle this will be to map the buffer contents onto a struct that is unique for each message type.

What you now do is look at the message type and call a function to process that message, a different function for each message type. You pass into that function the void pointer to the message buffer containing the data to be processed.

Inside the function, the void pointer is converted to a struct pointer, where the structure has the right layout for that kind of message. Now the function can address the data fields appropriately. In C, you do this with the arrow syntax: p->field1, p->field2, etc.

Lastly, the way to call the functions. You could do this with a giant switch statement on the message type, but another way is to have an array of function pointers, where function[0] is the function for the first message type, function[1] is the function for the second message type, and so on.

Now you can just call the functions directly, by writing something like "result = function[index](buffer_ptr);"

Simon · « **Reply #16 on:** December 05, 2023, 09:28:35 pm »

Quote from: IanB on December 05, 2023, 09:16:53 pm

Secondly, the messages can be of different types, and the data in the buffer is arranged in different ways according to the message type? So the ideal way to handle this will be to map the buffer contents onto a struct that is unique for each message type.

No, it's not. The data is in the buffer, it's 8 bytes of data, and it has no types associated with any of the data. I need to take the data out of the buffer and use it to update the various structure variables that the data belongs to. I can identify what the data is either by the CAN ID, which is silly, or by the filter element number that captured the message; this indirectly identifies the identifier if I set the filter elements up properly. For simplicity, the data will always be 8 bytes. Using pointers to functions is a roundabout way of doing it. It is more direct to simply point to the structure variable that the data belongs to. The data is copied in with memcpy; no one cares what type of data it is, just where it needs to go. The program then accesses the structures as it wishes and will find the latest data in the variable as it is received.

What got complicated, and what I now realize I was doing wrong, is that I was trying to copy bitfields from the message buffer into other bitfields of the data structures. This will not work, because memcpy addresses the physical memory: I can't use it to access bits 29 to 0. It has to copy the whole 32-bit chunk, and then I need to process it myself.
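Processing the 32-bit chunk by hand comes down to a shift and a mask per field. A minimal sketch; the word layout used in rx_id (bit 30 as the extended-ID flag, bits 28..0 as the ID, an 11-bit standard ID in bits 28..18) is an assumption modelled on the M_CAN Rx buffer element and should be checked against the datasheet.

```c
#include <stdint.h>

/* Return 'width' bits of 'word' starting at bit 'lsb' (lsb < 32, width 1..32) */
static inline uint32_t get_bits(uint32_t word, unsigned lsb, unsigned width)
{
    uint32_t mask = (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (word >> lsb) & mask;
}

/* Assumed layout (check the datasheet): bit 30 = extended-ID flag,
   bits 28..0 = extended ID, standard 11-bit ID stored in bits 28..18 */
uint32_t rx_id(uint32_t r0)
{
    if (get_bits(r0, 30, 1)) {
        return get_bits(r0, 0, 29);   /* 29-bit extended ID */
    }
    return get_bits(r0, 18, 11);      /* 11-bit standard ID */
}
```

The same get_bits helper can unpack any packed field from the data words once they have been copied out of the buffer.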

golden_labels · « **Reply #17 on:** December 06, 2023, 05:08:23 am »

Warning: a lot of conditionals ahead.

I’m not entirely sure I understand your idea correctly. I can interpret it in two ways, each requiring a very different response: either as a universal packet decoder, which reads octets from packets and writes the corresponding data into objects, or as a corner-cutting packet decoder, trying to avoid writing decoding routines by mapping objects onto arrays of octets.

Case 1
It is a good and sane idea. But you are approaching it from a direction, and with tools, that are going to make it hard. The first, minor issue: the size of the problem makes it unsuitable to write as an ad-hoc add-on to another project. Perfectionism hurts.

The bigger issue relates to the means of implementation.

There are two approaches: static and dynamic. The dynamic approach is building description structures in memory and then reading them to process packets. You just have a list of field descriptors, and each of them is applied in turn to the packet. The downside is describing behavior as data, which leads to limited throughput, worse latency, and increased memory demand.

The other option comes with no such limitations, on average. The method is code generation. While this is possible in C using preprocessor metaprogramming, it’s a PITA in every respect. Instead, use a second language to generate your C code. Scripts in that language are invoked during the build to produce the actual sources, which are then compiled the normal way. You may write this yourself if you must deal with a particular protocol. Otherwise, others have already written such tools: ProtoBuf, FlatBuffers, Thrift.

For the first option, building it dynamically, you may consider BSON.
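The dynamic approach can be sketched as a table of field descriptors that one generic routine walks for every packet. A minimal sketch under stated assumptions: fields are unsigned little-endian integers of 1, 2 or 4 bytes, each destination member has the same width as its wire field, and the descriptor and structure names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One descriptor per field: where it is in the packet and where it goes in the struct */
struct FieldDesc {
    uint8_t packet_offset;  /* first byte in the packet */
    uint8_t width;          /* 1, 2 or 4 bytes, little-endian; must equal the member size */
    size_t  struct_offset;  /* offsetof() of the destination member */
};

/* Hypothetical destination structure */
struct Engine { uint16_t rpm; uint8_t gear; uint32_t hours; };

static const struct FieldDesc engine_fields[] = {
    { 0, 2, offsetof(struct Engine, rpm) },
    { 2, 1, offsetof(struct Engine, gear) },
    { 4, 4, offsetof(struct Engine, hours) },
};

/* Generic decoder: applies each descriptor in turn to the packet */
void decode_fields(const struct FieldDesc *fields, size_t n, const uint8_t *packet, void *dst)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t v = 0;
        for (uint8_t b = 0; b < fields[i].width; b++) {
            v |= (uint32_t)packet[fields[i].packet_offset + b] << (8u * b);
        }
        unsigned char *field = (unsigned char *)dst + fields[i].struct_offset;
        /* Store through a temporary of the member's width to avoid aliasing issues */
        if (fields[i].width == 1) {
            uint8_t x = (uint8_t)v;
            memcpy(field, &x, 1);
        } else if (fields[i].width == 2) {
            uint16_t x = (uint16_t)v;
            memcpy(field, &x, 2);
        } else {
            memcpy(field, &v, 4);
        }
    }
}
```

Adding a message type means adding a descriptor table rather than code, which is exactly the "behavior as data" trade-off described above.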

Case 2
If you don’t care about reliability: skip the rest of this post.
If you care about your sanity and want to sleep well: skip the rest of this post.
If you are uncertain about your answers to either of the above: skip the rest of this post.
You have been warned.

I should also warn you that this is likely to trigger two kinds of responses. First: “I have used this for 20 years and it worked for me”, which is not a valid response to what is written below. Second: “this is the compilers’ fault and compiler vendors are stupid”. No, this is how C is defined: many people are simply substituting guesses for knowledge. Which is, again, something to blame the C environment for.

Case 2, part A
So, the news is: this is not going to work reliably, even if it may appear to work for some time. If you are lucky and a specific set of conditions coincide, a product incorporating this kind of code may actually never experience a failure. But it is doomed to fail one day in the more general case. The appearance of everything working has misled even skilled programmers: Linus Torvalds got blindsided by it and, despite his tremendous experience, made Linux have serious security vulnerabilities and horribly hard-to-track-down bugs. It led to a good number of his famous rants, and to this day he blames compiler vendors and ISO, which is a common reaction in people upon realizing that kind of mistake.

The problem is that C uses an unexpected computation model. In particular, its memory model is not what most programmers expect. There is zero correspondence between any notion of memory used in practice (be it physical RAM, hierarchical caches, NUMA, or virtual address space) and how C understands memory. The way compilers usually(!) translate C code to machine code makes it appear otherwise, which is exactly what misleads many programmers. But no, the model used in C does not account for actual memory at all, and any effects on actual memory are literally a coincidence. This is amended to some degree by compiler-specific and platform-specific extensions, but that’s not a part of the language itself.

With exceptions, each object type lives in its own memory domain. Borders of these domains can’t be crossed. So if you create an object of type struct Foo and have a pointer to it (struct Foo*), it lives in its own “memory of Foos”. If you then have struct Bar*, it identifies an object of type struct Bar living in “memory of Bars”. These two memories never mix together. If you are familiar with microcontroller programming, you may think of it as SRAM variables and EEPROM variables. Except that it applies to every single type (with exceptions). The exception is accessing objects through a char pointer or “compatible types”.

The consequences are severe. You can’t take an arbitrary buffer, cast the pointer to another type, and simply use it as if it were that type. Of course you can syntactically (it’s valid syntax), but the semantics are different from what one expects. The code is interpreted as if these were two separate memories.
Consider what this means for the following code:

Code: [Select]

uint8_t* buf = …;
read_into_buffer(buf);

T* typed = (T*)buf;
use(typed->something);

The code doesn’t impose any relationship between writing to *buf and accessing typed->something. Neither is *buf ever read. The two snippets below are perfectly fine equivalent interpretations:

Code: [Select]

use(typed->something);
read_into_buffer(buf);

or

Code: [Select]

use(0xDEADBEEF);
// note missing read (assuming it has no side effects)

And the code is going to be translated into machine code according to that interpretation, not what you expected it to be.

Why does it seem to work so often? Because compilers often take the easiest route, and the easiest route is translating statement by statement, not in larger chunks. Even if a range of statements is processed as a whole, often only simple transforms are applied. But once in a while the generated code does not match what the programmer imagines the program means. That mismatch is also hard to detect, as the observed behavior closely follows expectations, except in some very specific case.

The keyword you want to use to read up on this yourself is “strict aliasing”. I did not provide it at the beginning for a reason. People facing this problem usually fail to admit that the fault lies in them not knowing the language. Instead, they blame everything around them, starting with compiler vendors supposedly “abusing undefined behavior to make code faster” on purpose. Yet this was already well described by Loyd Blankenship in his famous “Conscience of a Hacker” letter: “I found a computer. Wait a second, this is cool. It does what I want it to. If it makes a mistake, it's because I screwed it up. Not because it doesn't like me... Or feels threatened by me.. Or thinks I'm a smart ass.. Or doesn't like teaching and shouldn't be here.” My goal was instead to show it to you from the program's perspective: that it is literally what the C code says to do. The compiler just translates what the code author wrote.

Now to the practical part. This is abused so often by programmers that there actually are ways to make it reliable. There are two.

The first is using the “compatible types” exception. Specifically, one of the possibilities is grouping the types in a union type:⁽¹⁾

Code: [Select]

union Foo {
    unsigned char buffer[128];
    struct Packet interpreted;
};

You then read into buffer and it is guaranteed that interpreted is going to contain what you expect it to contain (but see “part B” below). Unfortunately, this is a bit clumsy and produces a lot of boilerplate. You also can’t map this onto an arbitrary buffer (the write must be made directly through that union), in particular one holding many packets. You may read one and exactly one packet this way, and it can’t handle variable-length packets easily. To have multiple possible types readable from such a buffer, you must group all of them in a single union.

The other option boils down to using external guarantees from the compiler or platform, precisely controlling the build environment (compiler, version, flags), sometimes checking the machine code output. But this is not within the language itself and one has to be particularly cautious about matching all the factors in just the right way.

Case 2, part B
C does not provide any means of controlling exactly how a structure is laid out. The order of elements must be preserved, but that’s it. Sizes of elements are platform-dependent, and so is the padding between them. So, even ignoring the strict aliasing issue, you can’t tell whether values in a received packet map as expected to the structure fields.

Again, this can be countered using platform-specific solutions. Most notably, some compilers offer a “packed” mode in which no padding is used. Combined with always relying on uintN_t types⁽²⁾ and always using endianness-conversion functions, this gives a somewhat robust method of doing such a mapping. But, again, this is outside of the language itself, and the details of the “packed” interpretation may vary (assuming it is available at all). While everybody expects that no octets are inserted between fields, the behavior with bit-fields, both regarding padding and grouping, is not as certain.

The final problem is type alignment. On many platforms, values of a given size must be aligned to particular addresses. This is a hardware limitation. If they are not, the result may range from the compiler being unable to generate corresponding machine code, through the processor crashing or acting in unexpected ways, to merely decreased performance. The relevance here is that a field may have the wrong alignment in a received packet.

This may be corrected either by the compiler, by generating more expensive load instructions (which is expected of any compiler that supports “packed” structures), or by the processor on some platforms at the cost of performance (most notably x86 and x86_64 do this, though nowadays the slowdown is almost unmeasurable).

⁽¹⁾ Some other examples of compatible type exceptions include: accessing structures with matching fields at the beginning, up to the first non-matching field, and casting a structure pointer to the type of its first field. Not useful in this case.
⁽²⁾ The intN_t/uintN_t requirement comes from these types being the only ones with a well-defined storage format. They are always two’s complement, always exactly N bits, and have no trap representations and no padding bits. Only endianness remains unspecified.

IanB · « **Reply #18 on:** December 06, 2023, 07:01:38 am »

If I declare:

Code: [Select]

struct Buffer {
    uint8_t data[64];
};

And then I declare:

Code: [Select]

struct Message1 {
    uint16_t ivar[32];
};

Then as long as I take care to consider padding and alignment issues, I should be free to cast a pointer to one into a pointer to the other, and access the same data in memory. This is the kind of thing the C language was designed to support as a systems programming language.

The second consideration is that, if the buffer contains data that was transmitted from a remote system, I need to take care that the remote system and the local system have the same interpretation of the data (e.g. endianness).

But even so, if you have full control over what's going on, it should not be that hard.

IanB · « **Reply #19 on:** December 06, 2023, 07:06:24 am »

Quote from: Simon on December 05, 2023, 09:28:35 pm

No, it's not. The data is in the buffer, it's 8 bytes of data, and it has no types associated with any of the data. I need to take the data out of the buffer and use it to update the various structure variables that the data belongs to.

Here is the idea I was trying to convey with the dispatch table and the processing functions. The buffer itself contains raw data, but each handler can interpret the data differently according to the message type (or filter type):

Code: [Select]

#include <stdio.h>
#include <stdint.h>

/* Create and initialize a message buffer for testing */
struct MessageBuffer {
    uint16_t message_type;
    uint8_t message_data[16];
};

static struct MessageBuffer buffer = { 0, {11, 14, 23, 68, 21, 19, 88, 41}};

/* Create a dispatch table for message processing by type */

void ProcessMessage1(void *p);
void ProcessMessage2(void *p);
/* ProcessMessage3, 4, 5, etc... */

typedef void (*MessageProcessor)(void *);

static MessageProcessor ftable[] = {ProcessMessage1, ProcessMessage2}; /* etc. */

/* Main function */
int main(void)
{
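    /* invoke appropriate message handling function by type, after checking the index is in the table */
    if (buffer.message_type < sizeof ftable / sizeof ftable[0]) {
        ftable[buffer.message_type](&buffer.message_data);
    }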
    return 0;
}

/* Message processors */
void ProcessMessage1(void *p)
{
    struct Message1
    {
        uint16_t ivar[4];
    };
    
    struct Message1 *message = p;

    printf("The first short integer is %d\n", message->ivar[0]);
}

void ProcessMessage2(void *p)
{
    struct Message2
    {
        uint32_t ivar[2];
    };
    
    struct Message2 *message = p;

    printf("The first long integer is %d", message->ivar[0]);
}
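A minimal sketch of the same dispatch-table idea that avoids pointing a struct pointer at the byte buffer: each handler copies the raw bytes into a properly typed, properly aligned object with memcpy(). The payload layouts, the latest1/latest2 variables and dispatch() are hypothetical. The copied fields still use the host's byte order, so the remote-system endianness caveat above still applies.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical payload layouts for two filter elements. */
struct Message1 { uint16_t ivar[4]; };
struct Message2 { uint32_t ivar[2]; };

/* Hypothetical application-side copies that the handlers update. */
static struct Message1 latest1;
static struct Message2 latest2;

typedef void (*MessageProcessor)(const uint8_t *data, size_t len);

static void ProcessMessage1(const uint8_t *data, size_t len)
{
    /* Copy raw bytes into an aligned object of the right type. */
    if (len >= sizeof latest1)
        memcpy(&latest1, data, sizeof latest1);
}

static void ProcessMessage2(const uint8_t *data, size_t len)
{
    if (len >= sizeof latest2)
        memcpy(&latest2, data, sizeof latest2);
}

/* Index = filter element number reported by the CAN controller. */
static const MessageProcessor ftable[] = { ProcessMessage1, ProcessMessage2 };

/* Returns 0 on success, -1 if the filter index has no handler. */
static int dispatch(unsigned filter_index, const uint8_t *data, size_t len)
{
    if (filter_index >= sizeof ftable / sizeof ftable[0])
        return -1;
    ftable[filter_index](data, len);
    return 0;
}
```

The bounds check replaces the "if" chain with a single range test; the table lookup does the rest.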

Jeroen3 · « **Reply #20 on:** December 06, 2023, 07:36:48 am »

You should make software CAN messageboxes. I've used this approach many times.

A large lookup table holds the messageboxes, each with its own storage for the data.
A callback function reads or writes the messagebox data (the flags tell it which).
This is very convenient, because a timer can mark a box ready for transmit, and the CAN thread will run the callback to populate the data and send it.
Also, any received message can be processed in the ISR immediately, or later in the CAN thread.
The CAN thread could also track receive errors or timeouts, triggering the callback with different flags.

The important bit is to get the data out of the CAN controller, because it usually has limited memory and you have more CAN messages than it can hold.
Another benefit is that you decouple any HW can controller from how your software deals with it.

Adaptive and scalable.

Reduced example:

Code: (c) [Select]

#include <stdint.h>

typedef int (*CAN_messagebox_handler_fn)(uint32_t flags, uint8_t dlc, uint8_t *data);

typedef struct {
	/* Direction/status flags (CAN_MSGBOX_FLAG_...), used by the table below */
	uint32_t flags;
	/* Interval in systicks for cyclic transmit */
	uint32_t time;
	/* CAN Message */
	uint32_t id;
	uint32_t dlc;
	uint8_t data[8];
	CAN_messagebox_handler_fn callback;
	// and other stuff, eg keeping track of receive timeouts, errors etc.
} CAN_messagebox_t;

You then have a CAN application .c file somewhere with:

Code: [Select]

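/* Example flag values (hypothetical; define them to suit your driver) */
#define CAN_MSGBOX_FLAG_RX          (1u << 0)
#define CAN_MSGBOX_FLAG_TX          (1u << 1)
#define CAN_MSGBOX_FLAG_RX_TIMEDOUT (1u << 2)

/* Callbacks must be declared before they are used in the table */
int can_msg_example_rx(uint32_t flags, uint8_t dlc, uint8_t *data);
int can_msg_example_tx(uint32_t flags, uint8_t dlc, uint8_t *data);

volatile CAN_messagebox_t can_msgbox[32] = {
	{ /* rx */
		.flags = CAN_MSGBOX_FLAG_RX,
		.time = 0,
		.id = 0x101,
		.dlc = 8,
		.callback = can_msg_example_rx,
	},
	{ /* tx */
		.flags = CAN_MSGBOX_FLAG_TX,
		.time = 100,
		.id = 0x202,
		.dlc = 8,
		.callback = can_msg_example_tx,
	},
};

/* Receive callback: copy what the application needs out of the messagebox */
int can_msg_example_rx(uint32_t flags, uint8_t dlc, uint8_t *data) {
	extern uint8_t someModulesData;
	if (flags & CAN_MSGBOX_FLAG_RX_TIMEDOUT) {
		// Receive timeout
		return 0;
	} else {
		if (dlc == 8) {
			someModulesData = data[0];
		}
	}
	return 0;
}

/* Transmit callback: fill in the payload and return the number of bytes */
int can_msg_example_tx(uint32_t flags, uint8_t dlc, uint8_t *data) {
	extern uint8_t someModulesData;
	data[0] = someModulesData;
	data[1] = 0xFF;
	data[2] = 0xFF;
	data[3] = 0xFF;
	data[4] = 0xFF;
	data[5] = 0xFF;
	data[6] = 0xFF;
	data[7] = 0xFF;
	return 8;
}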

You may want to "simplify" things with casts to structs or arrays... Don't. Endianness will bite you, and you will end up mixing CAN code into your normal code, such as the "special values" for out-of-range or invalid data in J1939 and similar protocols. Those are CAN things, and the CAN code should know about them.
In the above example, the interface between CAN and your code is the callback function. Easily unit tested or simulated as well!
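A sketch of the receive side under these assumptions: the flag values and can_rx_dispatch() are hypothetical, the table is searched linearly by ID (it could equally be indexed by filter element number), and volatile and ISR locking are omitted for brevity. Because a callback only sees (flags, dlc, data), it can also be called directly from a unit test without any CAN hardware.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag values; the post does not define them. */
#define CAN_MSGBOX_FLAG_RX (1u << 0)
#define CAN_MSGBOX_FLAG_TX (1u << 1)

typedef int (*CAN_messagebox_handler_fn)(uint32_t flags, uint8_t dlc, uint8_t *data);

typedef struct {
    uint32_t flags;
    uint32_t id;
    uint32_t dlc;
    uint8_t  data[8];
    CAN_messagebox_handler_fn callback;
} CAN_messagebox_t;

static uint8_t someModulesData;

static int can_msg_example_rx(uint32_t flags, uint8_t dlc, uint8_t *data)
{
    (void)flags;
    if (dlc == 8)
        someModulesData = data[0];
    return 0;
}

static CAN_messagebox_t can_msgbox[] = {
    { .flags = CAN_MSGBOX_FLAG_RX, .id = 0x101, .dlc = 8, .callback = can_msg_example_rx },
};

/* Copy a frame out of the controller into the matching software
 * messagebox, then run its callback. Returns -1 if no RX box matches. */
static int can_rx_dispatch(uint32_t id, uint8_t dlc, const uint8_t *payload)
{
    if (dlc > 8)
        return -1;
    for (size_t i = 0; i < sizeof can_msgbox / sizeof can_msgbox[0]; i++) {
        CAN_messagebox_t *box = &can_msgbox[i];
        if ((box->flags & CAN_MSGBOX_FLAG_RX) && box->id == id) {
            memcpy(box->data, payload, dlc);
            return box->callback(CAN_MSGBOX_FLAG_RX, dlc, box->data);
        }
    }
    return -1;
}
```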

I have borrowed this approach from a proprietary CAN protocol driver you can buy for too much money.

m k · « **Reply #21 on:** December 06, 2023, 07:54:57 am »

Quote from: Simon on December 05, 2023, 09:28:35 pm

What got complicated, and where I now realize I was doing it wrong, was that I was trying to copy bitfields from the message buffer into other bitfields of the data structures. This will not work, as memcpy addresses the physical memory: I can't use it to access bits 29 to 0; it has to be the whole 32-bit chunk, and I need to process it myself.

How typical of me, but a byte-oriented mind seems to be a popular disability.
The first time around I also missed some dots, like CAN and 29.

Yes, if the message has no filler bits, you must do the bit manipulation manually.
Sometimes you must even start from the end.
There have been 9-bit machines, but that's another thing.
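A sketch of doing that bit manipulation by hand, assuming (hypothetically) a 29-bit value in bits 28..0 of a little-endian 32-bit word, with 3 flag bits on top. The word is assembled with shifts, so the result does not depend on the host's byte order. The layout and function names are illustrative only.

```c
#include <stdint.h>

/* Assemble a 32-bit word from four payload bytes, least significant first.
 * Little-endian is an assumption about the sender. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Extract 'width' bits (1..32) starting at bit 'lsb' (0..31). */
static uint32_t get_bits(uint32_t word, unsigned lsb, unsigned width)
{
    uint32_t mask = (width >= 32) ? UINT32_C(0xFFFFFFFF)
                                  : (((uint32_t)1 << width) - 1u);
    return (word >> lsb) & mask;
}

struct decoded { uint32_t value; uint8_t flags; };

/* Hypothetical layout: bits 28..0 = value, bits 31..29 = flags. */
static struct decoded decode_example(const uint8_t payload[4])
{
    uint32_t w = get_le32(payload);
    struct decoded d = { get_bits(w, 0, 29), (uint8_t)get_bits(w, 29, 3) };
    return d;
}
```

The shift uses (uint32_t)1 rather than 1u because unsigned int may be only 16 bits wide on some microcontrollers.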

magic · « **Reply #22 on:** December 06, 2023, 08:47:20 am »

Quote from: golden_labels on December 06, 2023, 05:08:23 am

C uses an unexpected computation model. In particular its memory model is not what most programmers expect. There is zero correspondence between any notion of memory used in practice — be it physical RAM, hierarchical caches, NUMA, or virtual address space — and how C understands memory. The way compilers usually(!) translate C code to machine code makes it appear otherwise. Which is exactly what misleads many programmers. But no, the model used in C does not account for actual memory at all and any effects on actual memory are literally a coincidence. This is amended to some degree by compiler-specific and platform-specific extensions, but that’s not a part of the language itself.

With exceptions, each object type lives in its own memory domain. Borders of these domains can’t be crossed. So if you create an object of type struct Foo and have a pointer to it (struct Foo*), it lives in its own “memory of Foos”. If you then have struct Bar*, it identifies an object of type struct Bar living in “memory of Bars”. These two memories never mix together. If you are familiar with microcontroller programming, you may think of it as SRAM variables and EEPROM variables. Except that it applies to every single type (with exceptions). The exception is accessing objects through a char pointer or “compatible types”.

I extracted the relevant part and highlighted the sentence which really matters.

The rest is not entirely true. All memory is a "memory of chars" and every other object is stored as a sequence of one or more chars. You are permitted to copy objects by casting them to char* and copying byte by byte into another char* derived from a valid object of the same type.
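A minimal illustration of copying an object's representation one byte at a time through unsigned char pointers, which is essentially what memcpy() does. struct sample and copy_bytes() are made-up names for the example.

```c
#include <stddef.h>

/* Copy n bytes of object representation from src to dst.
 * unsigned char may alias any object type. */
static void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

struct sample { int a; double b; };
```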

One thing the standard doesn't say is that this will work across networks and different CPU architectures, but packed structs and endian conversion solve this problem reliably in sane real-world cases (it also helps that everything you are likely to encounter uses 8/16/32/64-bit integers in two's complement). Alignment isn't an issue either, since the compiler can synthesize code for accessing unaligned data members once it knows that they are unaligned due to packing, which it does.

One thing you can't do is take an arbitrary, possibly misaligned char pointer and cast it to a pointer of another type: the pointer may end up misaligned for the target type, and dereferencing it is "undefined behavior". Writing your chars into an existing, properly allocated variable of the correct type avoids this pitfall.
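A sketch of that approach for payload fields at arbitrary byte offsets: memcpy() into a real uint32_t variable, so no misaligned pointer is ever dereferenced, plus shift-based decoders for fields with a known byte order. This uses explicit shifts rather than the packed-struct approach mentioned above, since packed structs are a compiler extension (e.g. GCC/Clang __attribute__((packed))). The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Host-order uint32_t from any byte offset, via an aligned local variable. */
static uint32_t load_u32_host(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Big-endian (most significant byte first) uint16_t, independent of host order. */
static uint16_t load_be16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Little-endian (least significant byte first) uint16_t, independent of host order. */
static uint16_t load_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((unsigned)p[1] << 8));
}
```

Modern GCC and Clang usually compile a small fixed-size memcpy() like this to a single load where the target permits unaligned access.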

Nominal Animal · « **Reply #23 on:** December 06, 2023, 10:18:54 am »

There are exactly two ways of reinterpreting data as another type in C:

Type punning via a union:

    #include <stdint.h>

    uint32_t float_to_u32(float f)
    {
        const union {
            float flt;
            uint32_t u32;
        } temp = { .flt = f };
        return temp.u32;
    }

Copying the data via a char array (or equivalent memcpy()/memmove() call):

    #include <stdint.h>
    #include <string.h>

    uint32_t float_to_u32(float f)
    {
        uint32_t result;
        memcpy(&result, &f, sizeof result);
        return result;
    }

Of these, type punning via a union is the more efficient one, and will always do the right thing.
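A quick check that both methods yield the same bit pattern, assuming float is IEEE 754 binary32 (true on practically every current target, but not guaranteed by ISO C). The two functions are renamed here so both can live in one file.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Type punning via a union. */
static uint32_t float_to_u32_union(float f)
{
    const union { float flt; uint32_t u32; } temp = { .flt = f };
    return temp.u32;
}

/* Copying the object representation with memcpy(). */
static uint32_t float_to_u32_memcpy(float f)
{
    uint32_t result;
    memcpy(&result, &f, sizeof result);
    return result;
}
```

For example, 1.0f gives 0x3F800000 with either function.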

Many people do type punning via a pointer –– constructing a pointer of new type to the existing storage –– but that has not been a valid method since C99, and it is problematic for compilers to generate correct code for. So, I suggest you avoid relying on it.

Quote from: IanB on December 06, 2023, 07:01:38 am

If I declare:

Code: [Select]
struct Buffer { uint8_t data[64]; };
And then I declare:

Code: [Select]
struct Message1 { uint16_t ivar[32]; };
Then as long as I take care to consider padding and alignment issues, I should be free to cast a pointer to one into a pointer to the other, and access the same data in memory.

That works for the wrong reasons: because a structure cannot contain padding before its first member, and because the storage representations of fixed-size uintN_t types are fixed by the standard, unlike most other types. You cannot expect that to work in the general case. There is also a contentious language-standard point about this, stemming from the historical definitions of BSD and POSIX struct sockaddr, but that does not matter when we are talking about writing new code.

In the general case, casting the pointer to another type is only valid if you access common initial members, or if the original or the new type is a char array or a pointer to char. Yes, there is a lot of code that expects such pointer casts to work, but that code is fraught with compiler bugs, issues, and arguments between C users and compiler developers. I'd avoid going there.

The correct pattern is to define the structures inside a union, and access each structure through the union. This way, there is no overhead, and the compiler understands the pattern and generates correct code for it. Even the ISO C standard contains footnotes about exactly this pattern. The most common of such types I use are

Code: [Select]

typedef struct {
    union {
        double          f64;
        uint64_t        u64;
        int64_t         i64;
        float           f32[2];
        uint32_t        u32[2];
        int32_t         i32[2];
        uint16_t        u16[4];
        int16_t         i16[4];
        uint8_t         u8[8];
        int8_t          i8[8];
        unsigned char   uc[8];
        signed char     sc[8];
        char            c[8];
    };
} word64;

typedef struct {
    union {
        float           f32;
        uint32_t        u32;
        int32_t         i32;
        uint16_t        u16[2];
        int16_t         i16[2];
        uint8_t         u8[4];
        int8_t          i8[4];
        unsigned char   uc[4];
        signed char     sc[4];
        char            c[4];
    };
} word32;

(To be precise, the above word32 and word64 are structure types, containing an anonymous union.)
For example, to convert a 32-bit unsigned integer u into the storage representation of a float, use (((word32){ .u32 = u }).f32).
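A sketch of the word32/word64 idea applied to an 8-byte CAN payload, with the unions trimmed to a few members. Which bytes land in u16[0] depends on the host's byte order. It needs C11 (anonymous unions); payload_to_word64() and bits_to_float() are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    union {
        uint64_t u64;
        uint32_t u32[2];
        uint16_t u16[4];
        uint8_t  u8[8];
    };
} word64;

typedef struct {
    union {
        float    f32;
        uint32_t u32;
    };
} word32;

/* Load a raw 8-byte payload; every view then shares the same storage. */
static word64 payload_to_word64(const uint8_t payload[8])
{
    word64 w;
    memcpy(w.u8, payload, sizeof w.u8);
    return w;
}

/* The compound-literal idiom from the text. */
static float bits_to_float(uint32_t u)
{
    return ((word32){ .u32 = u }).f32;
}
```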

There are a few tricks involving the common initial members. For example, if you have

Code: [Select]

struct msgtype1 { uint16_t type; uint16_t slot; /* ... */ };
struct msgtype2 { uint16_t type; uint16_t slot; /* ... */ };
struct msgtype3 { uint16_t type; uint16_t slot; /* ... */ };
typedef struct {
    union {
        struct msgtype1 msg1;
        struct msgtype2 msg2;
        struct msgtype3 msg3;
    };
} message;

it is valid to access type and slot via any union member (msg1.type, msg3.slot, etc.).
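A small sketch of that rule, with hypothetical payload members after type and slot: the message is stored through msg2, but its type and slot are read through msg1. The switch on type is just one way to use that shared tag.

```c
#include <stdint.h>

/* Hypothetical payloads; only the leading type/slot members are shared. */
struct msgtype1 { uint16_t type; uint16_t slot; uint32_t value; };
struct msgtype2 { uint16_t type; uint16_t slot; int16_t  temp[2]; };

typedef struct {
    union {
        struct msgtype1 msg1;
        struct msgtype2 msg2;
    };
} message;

/* Reads the shared 'type' through msg1, whichever member was stored. */
static int32_t message_value(const message *m)
{
    switch (m->msg1.type) {
    case 1:  return (int32_t)m->msg1.value;
    case 2:  return m->msg2.temp[0];
    default: return -1;
    }
}
```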

It is absolutely acceptable to create an array of unions. For example,

Code: [Select]

struct message {
    union {
        struct msgtype1  msg1;
        struct msgtype2  msg2;
        struct msgtype3  msg3;
        struct rxmsg  rx;
        char  data[16];
    };
};

struct message  msg[64];

At runtime, you will want to verify that the anonymous union member sizes are exactly the same:
    sizeof msg[0] == sizeof msg[0].msg1 && sizeof msg[0] == sizeof msg[0].msg2 && sizeof msg[0] == sizeof msg[0].msg3 && sizeof msg[0] == sizeof msg[0].rx && sizeof msg[0] == sizeof msg[0].data
Note that you will want to examine the size of an actual variable and its members, and not just the types.

This will work correctly even if the message types have bit fields.

If your CAN message queue provides a new message struct rxmsg newmsg, you can copy it to slot 5 using simply msg[5].rx = newmsg;. No memcpy() needed; the compiler will handle it itself. (And if the structures are large enough or non-aligned, GCC for example will do the copy via a memcpy() call on many architectures. You can assume this assignment to be the most efficient copy possible, because it typically is. On x86-64, for example, both GCC and LLVM/Clang use XMM/YMM vector registers for the copy whenever the alignment is sufficient and the vector copy is known to be fastest for that architecture and those optimization settings.)

There is absolutely no way to determine which of the union members is the "correct" one. If you need to know that, you need to store it as a common initial member.
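A sketch of the array-of-unions pattern with a stored type tag, assuming a hypothetical 16-byte struct rxmsg whose first two members match the other message type. Because sizeof here yields a constant expression, the size check from the text can also be done at compile time with C11 _Static_assert, still applied to the actual variable and its members.

```c
#include <stdint.h>

/* Hypothetical layouts sharing the initial members type and slot. */
struct rxmsg    { uint16_t type; uint16_t slot; uint8_t  data[12]; };
struct msgtype1 { uint16_t type; uint16_t slot; uint32_t value[3]; };

struct message {
    union {
        struct msgtype1 msg1;
        struct rxmsg    rx;
        char            data[16];
    };
};

static struct message msg[64];

_Static_assert(sizeof msg[0] == sizeof msg[0].msg1, "msg1 size differs");
_Static_assert(sizeof msg[0] == sizeof msg[0].rx,   "rx size differs");
_Static_assert(sizeof msg[0] == sizeof msg[0].data, "data size differs");

/* Store a received message by plain structure assignment. */
static int store_rx(unsigned slot, struct rxmsg newmsg)
{
    if (slot >= sizeof msg / sizeof msg[0])
        return -1;
    msg[slot].rx = newmsg;
    return 0;
}
```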

The reason for the pattern of anonymous union within structure instead of plain union, is historical-practical: it is absolutely ubiquitous in POSIX and BSD library code, whereas plain union types are rarer. Since it costs nothing (there is no runtime overhead), using the anonymous union within a structure makes sure we rely on a common code generation case, instead of a rare one, and hopefully avoids any compiler bugs related to aliasing and undefined behaviour. Essentially, this way we know we can rely on C99 and later structure type rules, and not be bitten by an obscure bug related to union types in some compiler version.

magic · « **Reply #24 on:** December 06, 2023, 11:17:25 am »

Quote from: Nominal Animal on December 06, 2023, 10:18:54 am

Quote from: IanB on December 06, 2023, 07:01:38 am
If I declare:

Code: [Select]
struct Buffer { uint8_t data[64]; };
And then I declare:

Code: [Select]
struct Message1 { uint16_t ivar[32]; };
Then as long as I take care to consider padding and alignment issues, I should be free to cast a pointer to one into a pointer to the other, and access the same data in memory.
That works for the wrong reasons: because a structure cannot contain padding before its first member, and because the storage representations of fixed-size uintN_t types are fixed by the standard, unlike most other types.

These are good reasons, not wrong at all.

But it almost certainly won't work anyway due to aliasing rules. You can't simultaneously use pointers of different types pointing at the same storage location, unless one of the pointers is char*. The compiler is allowed to assume (and sooner or later will assume) that pointers to different types point at different things, and generate code that no longer does what you intended:

Code: [Select]

struct A { int x; };
struct B { int x; };

int f(struct A *a)
{
    struct B *b = (struct B*) a;
    int x = b->x;
    a->x++;
    x = b->x;   // x may not be reloaded here because the compiler "knows" that b hasn't been modified
    return x;
}

It's an infamous source of subtle and difficult bugs. You may get what you want for a while if the compiler doesn't perform such optimizations or doesn't notice the opportunity for optimization (for example, due to the code being spread across multiple functions or files), but it's a ticking time bomb waiting for a sufficiently advanced compiler to set it off. And compiler optimizations are only getting more complex and aggressive with time, not less.

GCC has the -fno-strict-aliasing mode which I believe is supposed to make such code work correctly (i.e. as expected by a sane person, not a language lawyer). I have never tried it, though, preferring to rewrite unsafe aliasing when I see it.
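One way to rewrite that example without unsafe aliasing, as a sketch: instead of keeping a struct B pointer into a struct A object, copy the bytes into a real struct B whenever a fresh view is needed. Each copy reads the current bytes of *a, so the increment is always observed. read_as_b() and bump_and_read() are hypothetical names.

```c
#include <string.h>

struct A { int x; };
struct B { int x; };

_Static_assert(sizeof(struct B) == sizeof(struct A), "layouts must match");

/* Copy the current bytes of *a into a genuine struct B object. */
static int read_as_b(const struct A *a)
{
    struct B b;
    memcpy(&b, a, sizeof b);
    return b.x;
}

static int bump_and_read(struct A *a)
{
    int before = read_as_b(a);
    a->x++;
    int after = read_as_b(a);   /* sees the increment */
    return after - before;
}
```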

IanB · « **Reply #25 on:** December 06, 2023, 04:14:37 pm »

Quote from: magic on December 06, 2023, 11:17:25 am

It's an infamous source of subtle and difficult bugs. You may get what you want for a while if the compiler doesn't perform such optimizations or doesn't notice the opportunity for optimization (for example, due to the code being spread across multiple functions or files), but it's a ticking time bomb waiting for a sufficiently advanced compiler to set it off. And compiler optimizations are only getting more complex and aggressive with time, not less.

GCC has the -fno-strict-aliasing mode which I believe is supposed to make such code work correctly (i.e. as expected by a sane person, not a language lawyer). I have never tried it, though, preferring to rewrite unsafe aliasing when I see it.

This is a really unfortunate state of affairs.

I have been out of touch with C for over 20 years, and it appears things have been changing over the decades.

K&R conceived of C as a low-level programming language, to be used as an alternative to assembly language. As such, it should/would be hardware-oriented and close to the metal, as you would need when working with microcontrollers. If a high-level general purpose programming language is needed, then C++ is available.

If, as I understand, the standards committee has been abstracting C into a general-purpose programming language that is hardware agnostic, then that leaves an unfortunate gap. What language is one supposed to use to write operating systems, device drivers and microcontroller code?

From what I read, it seems the Linux kernel has been affected by this too.

magic · « **Reply #26 on:** December 06, 2023, 05:59:53 pm »

C++ is a similar mess.

Pointers make C an unpleasant language to compile. Pointer variables can alias, i.e. point to the same object. Pointer arithmetic means that the target of a pointer variable can change in ways difficult to predict. The heavy use of pointers and pointer arithmetic in typical C code means that everything the compiler "knows" about the value behind a pointer can change at any time when other pointers are written through. It's hard to prove with certainty that two pointers will never alias at run time, particularly if they come "from outside" - as function parameters or global variables that could have been initialized by anyone to any value.

C is not only a systems language, it's also a speed freak language. People want C code to run fast. People want operating systems to be fast too. So compiler vendors convinced the standards committee to accept a compromise: pointers to the same type can alias any time they want, and the compiler must prove that they don't before assuming so; pointers to different types must not. This relieves the compiler from aliasing worries which in 99.9% of cases would be completely unfounded, at the cost of breaking the remaining 0.1% of code.

You can still type pun by unions or by copying individual chars between variables of different type. I believe it's illegal to cast a char array to any other type and access it as such, although I have surely done it many times before I knew better and it worked - typically problems only occur when concurrent accesses are made through the two incompatible pointers.

IanB · « **Reply #27 on:** December 06, 2023, 07:15:43 pm »

Quote from: magic on December 06, 2023, 05:59:53 pm

Pointers make C an unpleasant language to compile.

Unpleasant to compile, or unpleasant to optimize? Maybe if predictable behavior is needed we should turn off optimizations?

Quote

C is not only a systems language, it's also a speed freak language.

I'm certainly familiar with that perspective. But might it also be the case that people rely too much on the optimizer instead of their own coding skills?

Imagine if you were writing in assembly and the assembler decided to re-write your code for you?

Nominal Animal · « **Reply #28 on:** December 07, 2023, 07:36:06 am »

I don't like quoting the C standards in general, but here, read the following as me nodding along to the posts above; I'll explain why further below.

Quote from: magic on December 06, 2023, 05:59:53 pm

I believe it's illegal to cast a char array to any other type and access it as such, although I have surely done it many times before I knew better and it worked - typically problems only occur when concurrent accesses are made through the two incompatible pointers.

Well, unsigned char is the special type: it allows access to the storage representation of other types:

Quote from: ISO C99 6.2.6.1p4

Values stored in non-bit-field objects of any other object type consist of n×CHAR_BIT bits, where n is the size of an object of that type, in bytes. The value may be copied into an object of type unsigned char [n] (e.g., by memcpy); the resulting set of bytes is called the object representation of the value.

C99 has three type qualifiers: const, volatile, and restrict (C11 added a fourth, _Atomic).
const is a promise that the code itself will not try to modify the value. volatile tells the compiler that the value may be changed by external code or causes at any point during execution. restrict is an aliasing-related promise: that the pointed-to object will be referenced, directly or indirectly, only via this particular pointer; that is, any access to the pointed-to object will depend on the value of this pointer. (An entire section, 6.7.3.1 in C99, is dedicated to the formal definition of restrict, though.)
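A minimal example of the restrict promise: the compiler may assume dst and src never overlap, so calling this with overlapping arrays would be undefined behaviour. This is also why the standard declares memcpy() with restrict-qualified parameters, while memmove(), which must handle overlap, has none. add_arrays() is a made-up name.

```c
#include <stddef.h>

/* dst and src are promised not to overlap, so the compiler may keep
 * values in registers or vectorize the loop without re-reading memory. */
static void add_arrays(size_t n, int *restrict dst, const int *restrict src)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}
```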

Type punning via a union was described in ISO C99 in a footnote (6.5.2.3 Structure and union members, footnote 82):

Quote from: ISO C 6.5.2.3, 82.

If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

The common initial sequence was described in ISO C99 6.5.2.3p5:

Quote from: ISO C 6.5.2.3p5

if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the complete type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

In a very real sense, the "default C" language has become more abstract and "further away from the hardware" in succeeding ISO C standards. However, in my opinion, ISO C99 also added the tools to drill straight through those abstractions: type punning, exact-width two's complement types intN_t and uintN_t, minimum-width and fastest types int_leastN_t/uint_leastN_t and int_fastN_t/uint_fastN_t, size_t, intmax_t and uintmax_t, intptr_t and uintptr_t, and so on.

The main point about ISO C99 was that it did not state anything new, only documented existing agreements and behaviour of C compilers that their users had found useful/necessary. You could say that the increased abstraction was necessary to allow better optimization schemes to evolve, while the added features were necessary for the low-level programmers (mostly kernel and library programmers) to keep performance and portability across a large diverse set of architectures. (At this point, computer architectures were even more diverse than now.)

Then came the odd misstep that is ISO C11. It was mostly a push by Microsoft to allow their C++ compiler to compile ISO C as well (they still refuse to support ISO C99, though), plus the infamous Annex K defining their "safe I/O functions", which is likely to be removed from the next ISO C standard. Its main impact was aligning the atomic memory model semantics with C++, plus the _Generic selection facility, which lets a preprocessor macro provide type-dependent polymorphic functions: e.g. func(X) resolves to func_int(X) if X is of type int, to func_d(X) if X is of type double, and so on.
(Some disagree vehemently with this characterization, but I say the existence of Annex K is proof enough. There is also the entire OOXML debacle in the same timeframe (first decade of this century), which in my opinion illustrates the approach MS then had with "standardization": weapon, rather than collaboration.)
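A minimal _Generic sketch matching that description: func(X) dispatches on the static type of X. func_int and func_d are the illustrative names from the text; their bodies here are made up. With no default: association, an argument of any other type (e.g. a float) is a compile-time error.

```c
static int    func_int(int x)  { return x + 1; }
static double func_d(double x) { return x / 2.0; }

/* Select the function by the type of X, then call it with X. */
#define func(X) _Generic((X), int: func_int, double: func_d)(X)
```

So func(3) calls func_int and func(5.0) calls func_d, with the choice made entirely at compile time.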

ISO C17 was basically a stationary point. Not only was this around the time Microsoft changed its approach to open source (and, to a lesser degree, to standardization), but C17 also added very little that was new.

If we look at what is to become ISO/IEC 9899:2024 (Wikipedia), it looks like the standard development is switching back to the practice-driven way C99 was developed, by incorporating features and facilities already provided by various C compilers that have been found useful (and sometimes necessary). Sure, the new bit operations in <stdbit.h> have new names, like stdc_count_ones() instead of popcount(), but we can deal with those as things settle. (I also haven't checked how the new things stand with respect to freestanding vs. hosted implementations –– i.e., their impact on embedded development ––, but I'm expecting it is sane/positive.)

In my opinion, all of the above means that those who want to write efficient low-level code in C need to keep track of the ISO C standards, but even more so of the features and facilities their toolchains provide, especially binutils, gcc, and clang. The original language as described by K&R has drifted a lot since then, away from its low-level simple origins; but the same tasks and performance (with even better portability!) can still be achieved by using the new language features.

In particular, in embedded development, I very much rely on ELF object file format features exposed by the compiler and linker: the __attribute__((section("foo"))) shenanigans. Even in systems programming, <dlfcn.h> is indispensable for me for run-time extensions (plug-ins and such).

Is it worth it, chasing a moving target like this, instead of just staying with good ol' K&R C?

Well, I remember the time in the nineties when it was easy to exceed the performance of C compiler-generated assembly (by gcc, icc, pathscale, portland group) by rewriting it by hand. Nowadays, SIMD vectorization and possibly avoiding one or two unnecessary register moves at the beginning of a function is about it: the optimization has progressed by leaps and bounds. To me, the changes are worth it: I do eagerly expect switching to C23/C24 as soon as it becomes practical. And I do write a lot of C, both freestanding (microcontroller/embedded) and hosted (especially combining with POSIX C) systems stuff.

