My plan is to set everything up in the program at the time of writing, no run time edits. I am afraid that you have rather lost me.
I explained it poorly, because I was still thinking it up at the time.
I'm wondering if what you are talking about is a set of structures, each holding arrays that describe some range of indices and their sub-indices, perhaps with each pointing to the next mini-dictionary.
Yes, except that the structures are in RAM and form an array, not a linked list.
Instead of a single dictionary, you can use a sequence of dictionaries, looking up the index in each, one at a time, until found. (This kind of construction is often called an overlay, but I chose to call it a fragment, because of the inherent structure of the indexes, as I describe below.)
Each fragment can be used in more than one dictionary, and a change of value in one will be reflected in all dictionaries using the same fragment.
On the face of it, this sounds like a lot of wasted computation, but it is not: it turns out that for typical implementations, each fragment will essentially define a range of indexes that rarely overlaps with other fragments' ranges.
If one uses a separate dictionary each for DEFTYPE, DEFSTRUCT, VAR, ARRAY, and RECORD objects, their ranges will overlap, but the implementation becomes much more efficient. In fact, it turns out that it makes sense to implement ARRAY objects as RECORD objects whose fields all just happen to have the same type. In practice, this means that to look up any index, one will check at most two dictionary fragments. With binary search, it should not be too costly at all.
What I was thinking was that in order to find an object at the sub-index level, I have to first locate it by index; then I can go by sub-index. Sub-indices seem to be consecutive, and if I am using one of the sub-index objects, I may as well set up the entire index, as I am likely using more than just the one. Having an array of sub-index descriptions for each index makes it possible to pinpoint a sub-index object within the index, and lets me vary the array length for each index so as not to waste space.
Yes. Similarly, indexes are rather tightly clustered and not at all random.
For example, all indexes below 0x1000 are definition objects (and there are none in the range 0x0260..0x0FFF). 0x0060..0x025F and 0x6000..0x9FFF are used by the eight logical device profiles. As a "manufacturer", you'll put your profile stuff in 0x2000..0x5FFF. Network variables are in 0xA000..0xAFFF, and system variables in 0xB000..0xBFFF. Most accesses will be to the communication profile area, 0x1000..0x1FFF. These ranges are not randomly filled, either: they tend to be densely filled at the lower indexes, with an occasional hole here and there.
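To exploit that clustering, each fragment can advertise the index range it covers, so non-matching fragments are skipped with a single comparison before any binary search. A sketch, with range constants and the helper named by me (they are not from any standard header):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical range constants mirroring the layout described above. */
#define OD_DEFS_FIRST     0x0001u  /* definition objects */
#define OD_DEFS_LAST      0x025Fu  /* hole at 0x0260..0x0FFF */
#define OD_COMM_FIRST     0x1000u  /* communication profile area */
#define OD_COMM_LAST      0x1FFFu
#define OD_MANUF_FIRST    0x2000u  /* manufacturer-specific area */
#define OD_MANUF_LAST     0x5FFFu
#define OD_PROFILE_FIRST  0x6000u  /* logical device profiles */
#define OD_PROFILE_LAST   0x9FFFu
#define OD_NETVAR_FIRST   0xA000u  /* network variables */
#define OD_NETVAR_LAST    0xAFFFu
#define OD_SYSVAR_FIRST   0xB000u  /* system variables */
#define OD_SYSVAR_LAST    0xBFFFu

/* Cheap range check: a fragment whose [first, last] range excludes the
   sought index can be skipped without examining its index array at all.
   The unsigned subtraction folds both bounds into one comparison. */
static inline bool od_in_range(uint16_t index, uint16_t first, uint16_t last)
{
    return (uint16_t)(index - first) <= (uint16_t)(last - first);
}
```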
DEFTYPE objects, when read, simply provide the length of the defined type in bits, as an UNSIGNED32, or 0 if variable. These are all also read-only or const, so all you need is 8 bits per index. DEFSTRUCT objects are slightly more complicated: their sub-index 0 indicates the number of fields in the struct as an UNSIGNED8, and the following sub-indexes give the field types as UNSIGNED16. In practice, each index can just refer to an array of UNSIGNED16 values. It turns out that these can easily be packed into a single array of UNSIGNED16, with each object index referring to an offset in this array. Thus, these are rather easy.
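A sketch of that packing; the fragment contents and the helper are illustrative (the type codes follow the usual CANopen static data types, e.g. 0x0005 = UNSIGNED8, 0x0007 = UNSIGNED32):

```c
#include <stdint.h>

/* Hypothetical DEFTYPE fragment: per type index, the defined length in
   bits (0 = variable); standard lengths fit in 8 bits per index. */
static const uint16_t deftype_index[] = { 0x0001, 0x0002, 0x0003, 0x0007 };
static const uint8_t  deftype_bits[]  = { 1, 8, 16, 32 };

/* Hypothetical DEFSTRUCT fragments packed into one shared UNSIGNED16
   array: each object starts with its field count (sub-index 0) followed
   by the field type codes; defstruct_start[k] is the k'th object's offset. */
static const uint16_t defstruct_index[] = { 0x0040, 0x0041 };
static const uint16_t defstruct_start[] = { 0, 3 };
static const uint16_t defstruct_types[] = {
    2, 0x0005, 0x0006,          /* 0x0040: 2 fields: UNSIGNED8, UNSIGNED16 */
    3, 0x0007, 0x0007, 0x0004,  /* 0x0041: 3 fields: 2x UNSIGNED32, INTEGER32 */
};

/* Read sub-index `sub` of the k'th DEFSTRUCT object: sub 0 is the field
   count, subs 1..count are the types. Returns 0xFFFF when out of range. */
static uint16_t defstruct_read(unsigned k, unsigned sub)
{
    uint16_t start = defstruct_start[k];
    uint16_t count = defstruct_types[start];
    return (sub > count) ? 0xFFFF : defstruct_types[start + sub];
}
```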
VAR objects are those that have the data field in sub-index 0, and use no other sub-indexes. RECORD objects have up to 254 fields of varying types, and ARRAY objects up to 254 fields of the same type; both having the length as UNSIGNED8 in sub-index 0. These are the ones you'll most often be accessing.
When looking up a field via index:sub-index, there are several pieces of information you end up needing:
- The address to the field itself
- Number of bits/bytes used by the value
- Type of the value (BOOLEAN, UNSIGNED, INTEGER, REAL, VISIBLE_STRING, OCTET_STRING, UNICODE_STRING)
- Access mode (const, read-only, write-only, read-write)
Annoyingly enough, ARMv6-m can return a 64-bit integer in registers R0 and R1, but will not do so for 64-bit unions or structures. So, one ends up having to do something like
return ((fieldspec){ .ref = addr, .size = size, .datatype = type, .access = access }).packed;
and
const fieldspec field = { .packed = find_field_using_index_and_subindex() };
with
typedef union {
uint64_t packed;
struct {
union {
void *ref;
uint32_t *u32;
uint16_t *u16;
uint8_t *u8;
int32_t *i32;
int16_t *i16;
int8_t *i8;
uint32_t *boolean;
char *visible_string;
unsigned char *octet_string;
uint16_t *unicode_string;
};
uint16_t size;
uint8_t datatype:6;
uint8_t access:2;
};
} fieldspec;
so that you can use a single function to find the field, whether you are getting or setting the value, instead of having to copy-paste much of the same code. Such a function can easily support all different dictionary fragment types, too, not only doing the binary search within each fragment that contains indexes in the target range, but also scanning through all fragments belonging to a "full dictionary". (The access mode is rather important, if you do not want a CANopen SDO message trying to modify a data field in Flash to crash your MCU.)
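A minimal sketch of such a shared lookup function, using a trimmed-down fieldspec and a hypothetical three-entry VAR fragment; the datatype/access codes are made up, and a linear scan stands in for the per-fragment binary search:

```c
#include <stdint.h>
#include <stddef.h>

/* The fieldspec union from above, trimmed to fewer pointer variants
   to keep this sketch short. */
typedef union {
    uint64_t packed;   /* fits R0:R1 only where pointers are 32-bit */
    struct {
        union {
            void     *ref;
            uint32_t *u32;
        };
        uint16_t size;
        uint8_t  datatype:6;
        uint8_t  access:2;
    };
} fieldspec;

enum { DATATYPE_U32 = 7, ACCESS_RO = 1 };  /* hypothetical codes */

/* A hypothetical three-entry VAR fragment. */
static const uint16_t var_index[] = { 0x1000, 0x1007, 0x1018 };
static uint32_t       var_value[] = { 0x00000000u, 0u, 0x12345678u };

/* One lookup shared by the read and write paths: the caller checks
   .access before writing, and .ref is NULL when the index is unknown. */
static fieldspec find_var(uint16_t index)
{
    fieldspec f;
    f.ref = NULL;
    f.size = 0;
    f.datatype = 0;
    f.access = 0;
    for (size_t i = 0; i < sizeof var_index / sizeof var_index[0]; i++) {
        if (var_index[i] == index) {
            f.u32      = &var_value[i];
            f.size     = sizeof var_value[i];
            f.datatype = DATATYPE_U32;
            f.access   = ACCESS_RO;
            break;
        }
    }
    return f;
}
```

On ARMv6-m proper, the function would return f.packed as a plain uint64_t instead, so the compiler places the whole spec in R0:R1 as described above.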
The downside is that their initialization in C is annoying, because the description of a single object is split across multiple arrays. Generating the C code for the structures from a definition, however, is simple. I don't like code generation at all, but fortunately these are just structures and arrays in Flash and RAM, plus macros to define the names of entries in RAM so you can also access the values directly (if you want).
For example, the VAR dictionary consists of three arrays (that would be members of a structure, here shown separately):
const uint16_t var_index[INDEXES];
const acctype var_acctype[INDEXES];
field32 var_value[INDEXES];
with
typedef union {
uint8_t acctype;
struct {
uint8_t datatype:6;
uint8_t access:2;
};
} acctype;
typedef union {
uint32_t u32;
uint16_t u16;
uint8_t u8;
int32_t i32;
int16_t i16;
int8_t i8;
uint32_t boolean;
char *visible_string;
unsigned char *octet_string;
uint16_t *unicode_string;
} field32;
So that to initialize, say, index 0x1007 as the third element in the arrays, you end up having
[2] = 0x1007, // in the var_index array initializer
[2] = { .datatype = DATATYPE_U32, .access = ACCESS_RO }, // in the var_acctype array initializer
[2] = { .u32 = 0 }, // in the var_value array initializer
plus a macro definition similar to
#define STATUS_REGISTER(dict) ((dict).var_value[2].u32)
The ARRAY and RECORD dictionaries are even more annoying, because you have a variable number of fields per index. No problem at all if you generate the structure initializers, but damned annoying and error-prone if you maintain the initializers by hand. The implementation I'd look at combines the two, using arrays similar to those for VARs, but with an additional per-index offset into the acctype and value arrays, indicating where each object starts:
const uint16_t rec_index[INDEXES];
const uintN_t rec_start[INDEXES];
const acctype rec_acctype[INDEXES+FIELDS];
field32 rec_value[INDEXES+FIELDS];
where N is 16 if the number of indexes and fields (including sub-index 0) fits in 16 bits, and 32 otherwise.
To find a specific index:subindex, one does a binary search in the rec_index array. If the index is found as the k'th element of that array, then start = rec_start[k] tells where the object starts in the rec_acctype and rec_value arrays. The number of sub-indexes is always described by sub-index 0, i.e. rec_value[start].u8, so that tells whether the sought sub-index is within the object. If it is, then rec_acctype[start+subindex] describes the data type and access mode, and rec_value[start+subindex] contains the field value itself. To define a preprocessor macro for a specific index:subindex, you use
#define MACRONAME(dict) ((dict).rec_value[offset].u32)
with offset precomputed as start+subindex, allowing direct access to the field without any lookups.
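Put together, a record lookup along these lines might look like the following sketch. The fragment contents (an identity-object-shaped RECORD at 0x1018), the type codes, and the function name are all made up, and a linear scan stands in for the binary search over rec_index:

```c
#include <stdint.h>
#include <stddef.h>

/* The acctype and field32 unions from above, reproduced so this compiles. */
typedef union {
    uint8_t acctype;
    struct {
        uint8_t datatype:6;
        uint8_t access:2;
    };
} acctype;

typedef union {
    uint32_t u32;
    uint16_t u16;
    uint8_t  u8;
} field32;

enum { DATATYPE_U8 = 5, DATATYPE_U32 = 7, ACCESS_RO = 1 };  /* hypothetical */

/* A hypothetical fragment with one RECORD at index 0x1018:
   sub 0 = field count (4), subs 1..4 = UNSIGNED32 fields. */
static const uint16_t rec_index[]   = { 0x1018 };
static const uint16_t rec_start[]   = { 0 };
static const acctype  rec_acctype[] = {
    { .datatype = DATATYPE_U8,  .access = ACCESS_RO },  /* sub 0: count */
    { .datatype = DATATYPE_U32, .access = ACCESS_RO },
    { .datatype = DATATYPE_U32, .access = ACCESS_RO },
    { .datatype = DATATYPE_U32, .access = ACCESS_RO },
    { .datatype = DATATYPE_U32, .access = ACCESS_RO },
};
static field32 rec_value[] = {
    { .u8  = 4 },           /* sub 0: number of sub-indexes */
    { .u32 = 0x00000191u }, /* made-up field values */
    { .u32 = 0x00000001u },
    { .u32 = 0x00010002u },
    { .u32 = 0x00000001u },
};

/* Look up index:subindex; returns a pointer to the value, or NULL. */
static field32 *rec_find(uint16_t index, uint8_t subindex)
{
    for (size_t k = 0; k < sizeof rec_index / sizeof rec_index[0]; k++) {
        if (rec_index[k] != index)
            continue;                    /* real code binary-searches here */
        uint16_t start = rec_start[k];
        if (subindex > rec_value[start].u8)
            return NULL;                 /* sub-index 0 holds the count */
        return &rec_value[start + subindex];
    }
    return NULL;
}
```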
Because you need the two access mode bits per field, I see no sense in treating arrays and records differently; it only saves 6 bits per field, but duplicates a lot of common things. Instead, I think it makes sense to treat arrays internally as records that just happen to have all fields the same type (except sub-index 0 as UNSIGNED8).
In all cases, instead of actual arrays, the dictionary fragment structures could use pointers, thus allowing the same fragment structure to be used with different values. Obviously, that makes hand-maintenance a nightmare, although with proper naming, generating the arrays, and letting the user construct the fragments as needed (via, say, helper macros), maintenance and use could be made quite nice and easy for us humans. After the underlying logic is explained, of course; without that, this definitely looks like black magic at first glance.
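One possible shape for such a pointer-based fragment descriptor; all names here are mine and purely illustrative:

```c
#include <stdint.h>

/* Trimmed stand-ins for the acctype and field32 unions defined above. */
typedef union { uint8_t acctype; } acctype;
typedef union { uint32_t u32; } field32;

/* A fragment descriptor holding pointers instead of embedded arrays,
   so the const index/start/acctype tables in Flash can be shared
   between dictionaries that differ only in their RAM values. */
typedef struct {
    uint16_t        indexes;  /* number of object indexes in this fragment */
    const uint16_t *index;    /* sorted object indexes */
    const uint16_t *start;    /* per-index offset into acctype[]/value[] */
    const acctype  *acctype;  /* per-field data type and access mode */
    field32        *value;    /* per-field values, typically in RAM */
} fragment;

/* A "full dictionary" is then just a list of fragments, scanned in order. */
typedef struct {
    uint16_t        fragments;
    const fragment *fragment;
} dictionary;

/* Example: two dictionaries sharing the same layout tables, not values. */
static const uint16_t demo_index[] = { 0x2000 };
static const uint16_t demo_start[] = { 0 };
static const acctype  demo_acc[]   = { { 0 } };
static field32 a_values[] = { { .u32 = 1 } };
static field32 b_values[] = { { .u32 = 2 } };
static const fragment frag_a = { 1, demo_index, demo_start, demo_acc, a_values };
static const fragment frag_b = { 1, demo_index, demo_start, demo_acc, b_values };
```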
One thing I have struggled on is how to initialize structures when I am setting them up.
Exactly! The C99 and later array initializers allow you to define array entries in an arbitrary order, so that for example
const unsigned char A[10] = { 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 };
const unsigned char B[10] = { [9] = 0, [8] = 1, [7] = 2, [6] = 3, [5] = 4, [4] = 5, [3] = 6, [2] = 7, [1] = 8, [0] = 9 };
define the exact same array values. However, each object (index definition) has to be spread into several different arrays, and modifying the length of one array or record will move the datatype, access, and value indexes for many other objects in the same fragment, making hand-maintenance of the initializers in C basically a fool's errand.
If you use Awk (from a tabulated definition) or Python (from a tabulated, CSV, or structured/XML definition) to generate the initializers, then there is no problem. One can even add enough comments to make human verification/reading of the structures easy. With a Makefile dependency on the source definition on the generated files, it'd even build easily.
My offer of example code still stands, but be fully aware of the above-mentioned limitations/properties of my approach. I do claim the binary search to be extremely effective here, although I haven't microbenchmarked it on any real-world CANopen dictionaries (simply because I don't have any real-world profiles). Cache locality is far from perfect, but I suspect that on Cortex-M0/M0+ it won't matter that much.
The way I implement binary search in this case, assuming indexes sorted in ascending order, in pseudocode is:
- If the sought-for index is less than the first index in this fragment, it is not within this fragment.
- If the sought-for index is the first index in this fragment, we found it at offset 0.
- If there are fewer than two indexes in this fragment, it is not within this fragment. Otherwise, set last = one less than the number of indexes in this fragment.
- If the sought-for index is larger than the last index in this fragment, it is not within this fragment.
- If the sought-for index is the last index in this fragment, we found it at offset last.
- Do a binary search between (and excluding) offsets 0 and last: set offset_min = 0 and offset_max = last, then loop:
  - Set offset = (offset_min + offset_max) / 2, using 32-bit unsigned integer arithmetic to avoid overflow. If offset == offset_min, it is not within this fragment.
  - If the sought-for index is less than the index at offset, set offset_max = offset.
  - Otherwise, if the sought-for index is greater than the index at offset, set offset_min = offset.
  - Otherwise, the sought-for index is at offset offset.
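Translated to C, the pseudocode above might look like this sketch (the function name is mine; the count < 1 test is the extra up-front check needed when empty fragments are possible):

```c
#include <stdint.h>

/* Returns the offset of `index` within the sorted idx[count] array,
   or -1 if it is not within this fragment. */
static int_fast32_t fragment_search(const uint16_t *idx, uint32_t count,
                                    uint16_t index)
{
    if (count < 1 || index < idx[0])
        return -1;                     /* empty fragment, or below range */
    if (index == idx[0])
        return 0;                      /* first entry matches */
    if (count < 2)
        return -1;                     /* single-entry fragment, no match */
    uint32_t last = count - 1;
    if (index > idx[last])
        return -1;                     /* above range */
    if (index == idx[last])
        return (int_fast32_t)last;     /* last entry matches */
    /* Binary search strictly between (and excluding) offsets 0 and last. */
    uint32_t offset_min = 0, offset_max = last;
    while (1) {
        /* 32-bit unsigned arithmetic, so the sum cannot overflow. */
        uint32_t offset = (offset_min + offset_max) / 2;
        if (offset == offset_min)
            return -1;                 /* range exhausted, not present */
        if (index < idx[offset])
            offset_max = offset;
        else if (index > idx[offset])
            offset_min = offset;
        else
            return (int_fast32_t)offset;
    }
}
```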
This is basically bulletproof. It does mean that the first and last entries in the index array are examined first (which is horrible for cache locality, but on simple architectures like ARMv6-m shouldn't make any measurable difference), but no offset is examined twice, nor are there any excess conditional checks that could be avoided. For example, the single-object check ensures that last will never be negative, which could cause an access out of array bounds. If it is possible to have empty dictionary fragments, that does need an additional check up front. Exactly 1+⌈log₂ N⌉ accesses are done for an array of N elements; the exact number of comparisons depends on the data and/or the hardware architecture. On ARMv6-m, for a full < = > comparison, the data is loaded to a register, then the two registers are compared, followed by two conditional branch instructions, followed by the third case. Binary search is compact and simple, but does do rather many conditional jumps (which I believe are "cheap" on ARMv6-m, compared to say x86-64, where they are quite "expensive").