EEVblog Electronics Community Forum

Products => Computers => Programming => Topic started by: Simon on March 14, 2020, 06:58:03 pm

Title: text / strings in C
Post by: Simon on March 14, 2020, 06:58:03 pm
I am looking to send data from a micro-controller to an LCD over SPI. Looking for discussions online yields nothing about this generically other than issues with getting the protocol to work at all.

What I am wondering about is the mechanics of the code to accomplish this. I read that strings are not actually a thing in C. So does this mean that I create all "text" as arrays of characters, and either tell my transmission function how many characters there are to cycle through, or put some sort of end code in, like the NUL that would be on a string, and stop transmission when I detect that character?

I expect that if I were to use a string I would then have to use a pointer to move through the string, so in effect I may as well use an array, as I can "address" each character more easily.

I am using GCC for ARM
Title: Re: text / strings in C
Post by: rhodges on March 14, 2020, 07:10:02 pm
I think it is easiest to terminate the characters with a zero (NUL) byte. When you use a string like "hello", the compiler will add the zero at the end for you.

Here is a snippet from my LCD library:
void lcd_puts(char *s)
{
    while (1) {
        if (*s == 0)
            return;
        lcd_putc(*s);
        s++;
    }
}

And you could do something like this:
{
    lcd_puts("Hello, world\r\n");
}
 
Title: Re: text / strings in C
Post by: greenpossum on March 14, 2020, 07:41:33 pm
Quote
I read that strings are not actually a thing in C. So does this mean that I create all "text" as arrays of characters and either tell my transmission function how many characters there are to cycle through or put some sort of end code in like the NUL that would be on a string and when i detect that character stop transmission.

I don't know what you think a string needs to be to be a thing in C. Certainly all the literature talks about strings and string functions.

When you write "hello" in a C program, the compiler treats this as an array with the characters h e l l o plus a NUL byte. All the string functions expect this NUL byte. Therefore if you write a function to output to a LCD you can stop on that NUL byte.

Here are some initialisation idioms and what they will create:

char string[] = "hello";  // string is a 6 char array with NUL as the last byte
char *string = "hello"; // string is a pointer to an anonymous 6 char array with NUL as the last byte
char string[6] = "hello";  // same as 1st case
char string[5] = "hello";  // legal, the NUL byte is omitted (this feature added to classic C for usage that didn't want to waste the last byte)

printf("Hello world\n"); // the function printf is handed the address of an anonymous 13 char array containing hello SP world NL NUL
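To check the sizes described above, here is a small sketch. The function names are made up for illustration; only sizeof and the literals come from the idioms listed.

```c
#include <stddef.h>

static const char by_literal[] = "hello";  /* array of 6: five letters plus NUL */
static const char *as_pointer  = "hello";  /* pointer to an anonymous 6-char array */

/* sizeof on the array gives the whole array, NUL included. */
size_t size_of_array(void)      { return sizeof by_literal; }

/* sizeof on the pointer gives the size of a pointer, not of the text. */
size_t size_of_pointer(void)    { return sizeof as_pointer; }

/* A string literal is itself an array, so sizeof counts its NUL too. */
size_t size_of_printf_arg(void) { return sizeof "Hello world\n"; }
```

The first function returns 6 and the last returns 13, matching the counts given in the comments above. The pointer case is the one to watch: sizeof cannot tell you the length of the text, so use strlen there.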
Title: Re: text / strings in C
Post by: Jeroen3 on March 14, 2020, 07:59:44 pm
Code: (dirty example) [Select]
typedef struct {
    size_t len;
    char *string;
} string;

int string_create(string &this){
    this.len = 0;
    this.string = NULL;
}

int string_set(string &this, const char *s){
    if(this.string){
       this.len = strlen(s);
       void *t = malloc(this.len + 1);  // +1: strcpy also writes the terminating NUL
       strcpy(t, s);
       free(this.string);
       this.string = t;
   }else{
       this.len = strlen(s);
       this.string = malloc(this.len + 1);  // +1 for the NUL here as well
       strcpy(this.string, s);
   }
}
Now strings are a thing in C!
You get the idea. C doesn't have data containers. C only has pointers to data. The rest is up to your imagination.
As programmers without containers we have settled on a standard that representations of printable text in memory are terminated with a NUL ('\0', 0x00) character. So this is what the compiler does.
Your protocol may use more of the ASCII set (https://www.ascii-code.com/), though. If you're transmitting printable text you may as well use those codes.

You may be able to use a string library if that makes you feel more comfortable:
https://github.com/kozross/awesome-c#string-manipulation

Quote
I expect that if i were to use a string i would then have to use a pointer to move through the string so in effect may as well use an array as i can "address" each character more easily.
p[a] just means *(p + a) on a pointer p; the compiler scales a by sizeof(type) when it computes the address.
Title: Re: text / strings in C
Post by: Simon on March 14, 2020, 08:05:40 pm
So long as throwing a string at something is fine for the compiler, it's fine by me. So I can use a string and then need to use pointers to address the bytes, or if I use

char string[] = "hello";  // string is a 6 char array with NUL as the last byte

I can just index through until I find a NUL character.
Title: Re: text / strings in C
Post by: Yansi on March 14, 2020, 08:19:01 pm
Indexing through is the wrong approach (resulting in slower code in most cases).

Use a pointer and post-increment it. What's so difficult?

Quote
I think it is easiest to terminate the characters with a zero (NUL) byte. When you use a string like "hello", the compiler will add the zero at the end for you.

Here is a snippet from my LCD library:
void lcd_puts(char *s)
{
    while (1) {
        if (*s == 0)
            return;
        lcd_putc(*s);
        s++;
    }
}

Polishing the example from rhodges a bit:

Code: [Select]
void lcd_puts(char *s)
{
    while (*s) {
        lcd_putc(*s++);
    }
}
Title: Re: text / strings in C
Post by: Simon on March 14, 2020, 08:55:35 pm
Aren't incrementing an index and incrementing a pointer the same effort? Your cleaned up code will go on forever as there is nothing to stop it at the end of the string so it will just run through the entire memory shoving random stuff onto the SPI.
Title: Re: text / strings in C
Post by: Simon on March 14, 2020, 08:58:00 pm
OK I see it but is 0 the nul code?
Title: Re: text / strings in C
Post by: greenpossum on March 14, 2020, 09:06:37 pm
Indexing into an array is more expensive than dereferencing a pointer, because of the addition of the index to the base address, although a smart compiler could reduce the disadvantage.

Yes NUL is the zero byte.
Title: Re: text / strings in C
Post by: dmills on March 14, 2020, 10:10:41 pm
Code: [Select]

int string_create(string &this){
    this.len = 0;
    this.string = NULL;
}
Pretty sure that ain't C.

Looks like C++ (Which has references) not C, but if you are going for C++, just skin it with std::string, possibly with a custom allocator, why reinvent the wheel, it is not like C++ does not already have 57 different varieties of kitchen sink built right in.
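For reference, the string struct posted earlier in the thread can be written in plain C by passing a pointer instead of a reference. This is only a sketch: the names lstring, lstring_init, lstring_set and lstring_free are made up for illustration, and it allocates len + 1 bytes so the NUL fits.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t len;   /* number of characters, not counting the NUL */
    char  *data;  /* heap copy, NUL-terminated; NULL when empty */
} lstring;

void lstring_init(lstring *self)
{
    self->len = 0;
    self->data = NULL;
}

/* Replaces the contents with a copy of s. Returns 0 on success, -1 if out of memory. */
int lstring_set(lstring *self, const char *s)
{
    size_t len = strlen(s);
    char *copy = malloc(len + 1);      /* +1 for the terminating NUL */
    if (copy == NULL)
        return -1;                     /* old contents are left untouched */
    memcpy(copy, s, len + 1);
    free(self->data);                  /* free(NULL) is a harmless no-op */
    self->data = copy;
    self->len = len;
    return 0;
}

void lstring_free(lstring *self)
{
    free(self->data);
    lstring_init(self);
}
```

On a small microcontroller you may prefer to avoid malloc altogether and use fixed buffers, in which case the plain NUL-terminated array is hard to beat.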
Title: Re: text / strings in C
Post by: rhodges on March 14, 2020, 11:28:15 pm
Quote
Your cleaned up code will go on forever as there is nothing to stop it at the end of the string ...
The final zero byte will evaluate as false and terminate the loop. No problem.

Yansi's code is exactly the same as mine, and most experienced C programmers will prefer that style. My code is expanded to make it easy for anyone to understand it perfectly. I am sure the compiler output will be exactly the same.
Title: Re: text / strings in C
Post by: brucehoult on March 15, 2020, 02:08:59 am
Quote
Indexing into an array is more expensive than dereferencing a pointer, because of the addition of the index to the base address, although a smart compiler could reduce the disadvantage.

Yes NUL is the zero byte.

It's exactly the same with any decent compiler e.g. gcc or LLVM.
Title: Re: text / strings in C
Post by: SiliconWizard on March 15, 2020, 02:26:14 am
Quote
Indexing into an array is more expensive than dereferencing a pointer, because of the addition of the index to the base address

Although it was true for pretty old architectures (or more recent but limited 8-bitters) in the general case, it's certainly not for most modern targets. Indexing comes at no cost as most modern ISAs have memory access instructions with base and offset. If anything, depending on the ISA, that could cost an extra register though, but only that. And if the indexing is linear in a loop, the compiler will select whatever is most efficient between that and just incrementing the base address register anyway. Either way, it's just incrementing a register, same cost.
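The two styles can be put side by side to compare. In this sketch lcd_putc is a stand-in that just records bytes in a buffer (an assumption, so the code can run on a PC); the real one would write to the SPI port.

```c
#include <stddef.h>

/* Stand-in for the real LCD byte writer: collects the output in a buffer. */
static char   lcd_log[64];
static size_t lcd_log_len;

static void lcd_putc(char c)
{
    if (lcd_log_len < sizeof lcd_log - 1) {
        lcd_log[lcd_log_len++] = c;
        lcd_log[lcd_log_len] = '\0';   /* keep the log a valid string */
    }
}

/* Indexed version: walk the array until the NUL is found. */
void lcd_puts_index(const char *s)
{
    for (size_t i = 0; s[i] != '\0'; i++)
        lcd_putc(s[i]);
}

/* Pointer version: same loop, advancing the pointer instead of an index. */
void lcd_puts_pointer(const char *s)
{
    while (*s)
        lcd_putc(*s++);
}
```

Both functions send exactly the same bytes. If you want to see whether your compiler treats them differently, compile with optimisation on and compare the generated assembly for your target.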

Title: Re: text / strings in C
Post by: greenpossum on March 15, 2020, 02:41:44 am
Of course I know that decent compilers will do the right thing but pointer manipulation is part and parcel of C(++) so no need to stick to Pascal habits. :P

References are not part of C, this to the other poster.
Title: Re: text / strings in C
Post by: T3sl4co1l on March 15, 2020, 03:10:25 am
You may find it's better to add more functionality.  Reasons include non-blocking functions, shared use (i.e., from many functions, .c files), standardizing interface (could map it to stdout for example), etc.

For example, writing a string to the display could take up whole milliseconds at a time, precious time that might need to be spent polling devices or responding to interrupts.  Not that the write function would be interrupt-blocking, but it has to be if you want to get debug output from interrupt sources...

If it's buffered, like stdout, you don't have to worry about where or when data is sent.  As long as the buffer is updated atomically (usually by disabling interrupts, so it should be done quickly, too), it can receive data from any source (main or interrupt).

Downside is the buffer can fill up, so you do need to add checks for that.  Much better than completely missing a critical operation though.

Last time I did a HD44780 display, I used a buffer with in-band signaling.  Think it was that chars 0-7 were taken over for special functions (locate cursor, clear screen, etc.), and the rest are normal, like 8-15 are the programmable ones (CGRAM) which are mirrored in that range so there's no loss of functionality, and 32-255 are normal.

Forget what string format I used on that project... it was a while ago.  I prefer to avoid ASCIIZ strings; the downside is that fixed-length strings have to be passed as two parameters at all times (pointer and length), and the length is limited (to whatever fits in a char, a short, or whatever type you pick).

Of course if you're doing simple sequential stuff, basically doing Arduino stuff, blocking write functions are perfectly fine, too.

Tim
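A minimal sketch of such a buffer, with the "buffer full" check mentioned above. Everything here is illustrative: the names are made up, and irq_disable/irq_enable are empty placeholders for whatever your part provides (on a Cortex-M with CMSIS that would typically be __disable_irq() and __enable_irq(), which is an assumption about your toolchain).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TX_BUF_SIZE 16u               /* holds up to TX_BUF_SIZE - 1 bytes */

static volatile uint8_t tx_buf[TX_BUF_SIZE];
static volatile size_t  tx_head;      /* index of the next free slot */
static volatile size_t  tx_tail;      /* index of the next byte to send */

/* Placeholders: replace with the real interrupt masking for your MCU. */
static void irq_disable(void) { }
static void irq_enable(void)  { }

/* Setter: callable from main code or an interrupt. Returns false if full. */
bool tx_put(uint8_t byte)
{
    bool ok = false;
    irq_disable();                    /* keep the update atomic, and short */
    size_t next = (tx_head + 1u) % TX_BUF_SIZE;
    if (next != tx_tail) {            /* one slot stays empty to mark "full" */
        tx_buf[tx_head] = byte;
        tx_head = next;
        ok = true;
    }
    irq_enable();
    return ok;
}

/* Getter: called by whatever feeds the display. Returns false if empty. */
bool tx_get(uint8_t *byte)
{
    bool ok = false;
    irq_disable();
    if (tx_tail != tx_head) {
        *byte = tx_buf[tx_tail];
        tx_tail = (tx_tail + 1u) % TX_BUF_SIZE;
        ok = true;
    }
    irq_enable();
    return ok;
}
```

If tx_put can be called from inside an interrupt handler, unconditionally re-enabling interrupts is not what you want; save and restore the previous interrupt state instead.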
Title: Re: text / strings in C
Post by: Siwastaja on March 15, 2020, 08:15:00 am
Of course strings are a thing in C. C strings are series of bytes in memory, terminated with value 0. (I like to say 0 when I mean 0, instead of some possibly confusing special names like nul or NULL depending on context. Of course, be careful not to confuse the ASCII character 0 to the actual digital value 0.) A string literal "A" differs from the character literal 'A' by the fact that compiler generates the terminating byte: "A" is {65, 0} in memory.

It's easy to understand why the C designers made such choice: it's freaking simple, and easy. It's so simple that apparently people think it's not "a thing" because it's too simple.

There are other ways to do it, like storing the length separately; then you don't need the termination byte, and the risk of going out of bounds is lower. (With the zero-termination, you can accidentally either forget to add the zero, or forget to allocate space for it, or accidentally overwrite it, causing almost infinite loops (until there is a zero byte in memory by luck). These bugs are the reason for a bunch of "safer" library functions like snprintf instead of sprintf).

For such a simple use case, there is no compelling reason why you should implement any more complexity on your own. The zero-termination comes with traps and risks of bugs, but implementing a more complex string system comes with the risk of bugs, as well.

Use the simple few-liner already posted, but make sure you understand how it works.
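As an illustration of the "safer" functions mentioned above, here is a sketch that formats a value into a fixed-size line buffer with snprintf and reports truncation. The function name format_line and the "label=value" layout are made up for the example.

```c
#include <stdio.h>

/* Writes "label=value" into dst, never more than dst_size bytes including
   the NUL. Returns 1 if the text had to be truncated (or on error), else 0. */
int format_line(char *dst, size_t dst_size, const char *label, int value)
{
    int n = snprintf(dst, dst_size, "%s=%d", label, value);
    /* snprintf returns the length it wanted to write, not what it wrote. */
    return n < 0 || (size_t)n >= dst_size;
}
```

With an 8-byte buffer, format_line(line, sizeof line, "Temperature", 25) leaves "Tempera" in the buffer, still terminated, and returns 1. The same call with sprintf would have written past the end of the array.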
Title: Re: text / strings in C
Post by: Simon on March 15, 2020, 08:56:09 am
Quote
You may find it's better to add more functionality.  Reasons include non-blocking functions, shared use (i.e., from many functions, .c files), standardizing interface (could map it to stdout for example), etc.

For example, writing a string to the display could take up whole milliseconds at a time, precious time that might need to be spent polling devices or responding to interrupts.  Not that the write function would be interrupt-blocking, but it has to be if you want to get debug output from interrupt sources...

If it's buffered, like stdout, you don't have to worry about where or when data is sent.  As long as the buffer is updated atomically (usually by disabling interrupts, so it should be done quickly, too), it can receive data from any source (main or interrupt).

Downside is the buffer can fill up, so you do need to add checks for that.  Much better than completely missing a critical operation though.

Last time I did a HD44780 display, I used a buffer with in-band signaling.  Think it was that chars 0-7 were taken over for special functions (locate cursor, clear screen, etc.), and the rest are normal, like 8-15 are the programmable ones (CGRAM) which are mirrored in that range so there's no loss of functionality, and 32-255 are normal.

Forget what string format I used on that project... it was a while ago.  I prefer to avoid ASCIIZ strings, the downside is fixed-length strings have to be passed by two parameters at all times (pointer and length), and the length is limited (whatever fits in a, char, short, whatever).

Of course if you're doing simple sequential stuff, basically doing Arduino stuff, blocking write functions are perfectly fine, too.

Tim

Why not use interrupts, so that the port is left to transmit while the rest of the program carries on, and the interrupt handler loads the next character until the string ends? Obviously more complicated. The maximum SPI frequency appears to be 2MHz, but of course if this is wired to the display a lower frequency is probably more prudent.

At 2MHz that is 4.5µs per character or 72µs per 16 character line.
Title: Re: text / strings in C
Post by: T3sl4co1l on March 15, 2020, 09:17:50 am
Precisely -- and when the transmitter finishes, the interrupt has to read from somewhere asynchronously.  Hence, the buffer.  So you have a buffer, and the getters/setters to access it safely.  Simple once you've done it a few times. :)

Tim
Title: Re: text / strings in C
Post by: brucehoult on March 15, 2020, 10:19:28 am
Quote
Of course strings are a thing in C. C strings are series of bytes in memory, terminated with value 0.

That's purely a convention. The compiler happens to support a convenient syntax for making global arrays containing constant strings i.e. "Hello" instead of {'H', 'e', 'l', 'l', 'o', '\0'} or even {72, 101, 108, 108, 111, 0} but that's .. *all*. And there are some completely optional-to-use functions such as strlen and strcpy and strcmp in the standard library.

Even *character* isn't a proper type in C -- it's just a small integer. You don't even know whether it's signed or unsigned.

None of the string library functions (not even strdup) care how you allocated the memory space for your zero-terminated series of bytes in memory. You could use a global array, a local variable, something allocated on the heap, or a pointer to any random byte in the middle of any of those -- or completely outside those. Nothing cares, nothing checks. You can allocate one big chunk of memory and then store a bunch of "strings" inside it. You can make the tails of strings overlap. Whatever.

All of this is *totally* foreign to users of languages such as Python, Perl, PHP, Ruby, JavaScript, Java, C# in which "string" is a real thing.
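Two of those points in code form, as a rough sketch (the names packed, packed_get and tail_of are made up): several "strings" stored in one array, and a "string" that is just a pointer into the middle of another one.

```c
#include <stddef.h>
#include <string.h>

/* Three strings packed into one array; the embedded NULs separate them,
   and the compiler adds the final NUL as usual. */
static const char packed[] = "volts\0amps\0ohms";

/* Returns the n-th packed string (0-based), or NULL if n is past the end. */
const char *packed_get(unsigned n)
{
    const char *p   = packed;
    const char *end = packed + sizeof packed;
    while (p < end) {
        if (n == 0)
            return p;
        p += strlen(p) + 1;           /* hop over this string and its NUL */
        n--;
    }
    return NULL;
}

/* The tail of a string is a perfectly good string too.
   The caller must make sure skip does not exceed strlen(s). */
const char *tail_of(const char *s, size_t skip)
{
    return s + skip;
}
```

Nothing here is checked by the language: strlen, strcmp and friends will happily work on any of these pointers, and equally happily run off the end if the NUL is missing.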
Title: Re: text / strings in C
Post by: T3sl4co1l on March 15, 2020, 10:50:48 am
What's more, any named variable is simply a pointer to memory; so you can have the valid program,

Code: [Select]
int main = { ... };

Where "..." is the machine code for main().

Or, I forget if the compiler complains that it's not a function type and there's some type abuse to make it work.  But anyway, such an approach holds the honor of having forced an IOCCC (International Obfuscated C Code Contest) rule change, namely that programs must be portable, not specific to CPUs or hardware. ;D

An aside: any executable statement simply gets compiled, eventually into machine code; can you place arbitrary statements within array initializers and have them compile?  Or do you need to write a function and read its contents to do that?  Never tried...  Also, how does it know, does it know that an array type can only have numeric initializers, and a function can only have statement initializers?  So in effect, the comma operator is context sensitive?

Conversely, you can commit such horrors as:

Code: [Select]
strcmp((char*)main, machine_code_to_compare_to)

But be careful doing

Code: [Select]
strcmp((char*)main(some_params), other_pointer_to_compare_to)

because main returns int so it better be some very special int where valid memory happens to be found... not to mention managing the correct recursion conditions. :D  (Recursive main() being another favorite of IOCCC entries.)

Tim
Title: Re: text / strings in C
Post by: Simon on March 15, 2020, 12:06:09 pm
I think basically the "string" and 'char' are just a way to tell the compiler that the character is not to be taken literally but converted to an 8 bit ASCII code. So we can read what we wrote and the compiler knows what we mean and puts the right value.

I expect my first move is to send a string to the display with waits for the buffer empty flag and then convert it to an interrupt driven function using a global variable.
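A rough sketch of the interrupt-driven version with a global pointer. The hardware is simulated here: spi_write_data stands in for writing the SPI data register, and spi_tx_empty_isr is what the body of the "transmit buffer empty" handler might look like. All names are assumptions, not taken from any particular vendor header.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simulated SPI data register: records what would have gone out on the wire. */
static char   spi_sent[32];
static size_t spi_sent_len;

static void spi_write_data(char c)
{
    if (spi_sent_len < sizeof spi_sent - 1) {
        spi_sent[spi_sent_len++] = c;
        spi_sent[spi_sent_len] = '\0';
    }
}

/* Next character to send; NULL while the transmitter is idle. */
static const char *volatile tx_next;

bool lcd_send_busy(void)
{
    return tx_next != NULL;
}

/* Starts sending s. The string must stay valid until the transfer is done.
   Returns false if a transfer is already running or s is empty. */
bool lcd_send_start(const char *s)
{
    if (tx_next != NULL || *s == '\0')
        return false;
    tx_next = s + 1;                  /* set up the state before the first write */
    spi_write_data(*s);               /* first byte; the interrupt does the rest */
    return true;
}

/* Body of the "transmit buffer empty" interrupt handler. */
void spi_tx_empty_isr(void)
{
    const char *p = tx_next;
    if (p == NULL)
        return;
    if (*p == '\0') {                 /* reached the NUL: transfer complete */
        tx_next = NULL;               /* real code would also mask the interrupt */
        return;
    }
    spi_write_data(*p);
    tx_next = p + 1;
}
```

Note that the string is not copied, so passing a local array that goes out of scope before the last byte is sent would be a bug. That is one reason the buffered approach suggested earlier tends to be more robust.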
Title: Re: text / strings in C
Post by: Simon on March 15, 2020, 02:25:39 pm
Right, I get it. With pointers I can tell "a" function to send "any" string. It's a bit more difficult to write code to pass "any" variable, but I can pass the pointer.
Title: Re: text / strings in C
Post by: Simon on March 15, 2020, 02:28:26 pm
The advantage of using an array for something like a display is that given the semi autonomy of the hardware and the ease of just refreshing the whole display an array can act as the "display ram" where I can put any text that is due to appear in the next screen load.
Title: Re: text / strings in C
Post by: T3sl4co1l on March 15, 2020, 03:10:49 pm
On that note -- on the upside, character displays are fairly light weight, so it's not unreasonable to use a (full screen) framebuffer.  It's only 80 bytes even for a larger display (e.g. 4x20).  This would probably be best done by getter/setter functions to manage the buffer, and a low priority refresh (can be done say by main() loop polling), copying the whole framebuffer to the display regardless of how much has changed.

This of course gets a lot less practical on larger, and especially graphic, displays.  Even the classic 80x25 text screen needs 4kB, a huge amount for say an Arduino (but, still quite practicable for many ARM platforms).

Tim
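A minimal sketch of such a framebuffer for a 4x20 character display. The function names are made up, and fb_refresh takes the byte writer as a parameter so the sketch does not assume any particular driver; a real HD44780-style display would also need its cursor address set at the start of each row.

```c
#include <string.h>

#define LCD_ROWS 4
#define LCD_COLS 20

/* The whole screen: 80 bytes. No NUL terminators, these are fixed-length rows. */
static char framebuffer[LCD_ROWS][LCD_COLS];

/* Setter: blank the screen contents. */
void fb_clear(void)
{
    memset(framebuffer, ' ', sizeof framebuffer);
}

/* Setter: write text at (row, col), clipping at the right-hand edge. */
void fb_print(unsigned row, unsigned col, const char *text)
{
    if (row >= LCD_ROWS)
        return;
    while (*text != '\0' && col < LCD_COLS)
        framebuffer[row][col++] = *text++;
}

/* Low-priority refresh: push every cell out through put_byte,
   regardless of how much has changed. */
void fb_refresh(void (*put_byte)(char))
{
    for (unsigned r = 0; r < LCD_ROWS; r++)
        for (unsigned c = 0; c < LCD_COLS; c++)
            put_byte(framebuffer[r][c]);
}
```

The rest of the program only ever calls fb_clear and fb_print, which are fast; the slow part is confined to fb_refresh, which can be called from the main loop whenever there is time.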
Title: Re: text / strings in C
Post by: Yansi on March 15, 2020, 03:39:50 pm
Note that most displays have painstakingly slow interfaces, so rather than wasting thousands of CPU cycles or more on fetching a byte back from the display, you'd better buffer it internally for fast access whenever required.
Title: Re: text / strings in C
Post by: IanB on March 15, 2020, 04:40:51 pm
Just a sideways comment here, but C is one of the few languages that really does support strings. To justify this statement, consider what "string" means: it is a string of bytes (or characters) in memory. (A string being a sequence of things lined up one after the other.)

Other languages with a String datatype are not really providing strings as such, they are providing a text datatype where a block of text can be treated as a single object.
Title: Re: text / strings in C
Post by: Simon on March 15, 2020, 06:24:59 pm
Well I am only sending and ultimately it will be interrupt driven so I use the buffer empty interrupt to load the next character, minimal overhead. The little 2x16 screen I want to use for testing will take µs to refresh if I were to run at the declared 2MHz spi clock but for a wired application maybe a bit slower.
Title: Re: text / strings in C
Post by: Nominal Animal on March 16, 2020, 01:04:42 pm
As a sideways comment to those who are interested in such details -- wall of text follows:

C actually has two string types: ordinary strings, and wide strings.

They are both simply unspecified-length arrays, terminated by a zero value ('\0' and L'\0', respectively).
(Because it can be unclear whether by zero one means code point zero or the zero digit character, I like to call this value nul.  In comparison, the zero pointer value I call NULL, with the length of the final consonant separating them in everyday speech.)

For ordinary strings, each character in the array is of type char.  Note that in C a character constant like 'X' has type int, not char, and char values used in expressions are promoted to int by the integer promotion rules.

For wide strings, each character in the array is of type wchar_t, and so is a wide character constant like L'X'.  The type wint_t is not related to integer promotion at all; it is just a type that can hold any wchar_t value, plus the WEOF value (indicating end-of-stream for wide character streams).

C99 added support for specifying Unicode code points in both ordinary and wide character constants and string literals, using \uHHHH or \UHHHHHHHH, where HHHH and HHHHHHHH are the code point in hexadecimal.  (C11 added the u8"", u"" and U"" literal prefixes.)

The exact character set used for ordinary and wide character constants and string literals is a bit of a complex issue.  In practice nowadays, the ordinary character set is ASCII compatible, either UTF-8 or one of the 8-bit ASCII-compatible character sets.  The character set used for wide characters is even messier, partly because the Microsoft C library used in Windows uses UTF-16, where some glyphs can require more than one wide character; I'm not exactly clear which Windows versions and libraries actually support that, and which are limited to the first 65536 characters of the Unicode set.

For POSIXy systems -- that means Linux, *BSDs, Mac OS, Android, and some other esoteric systems -- the C library provides iconv (http://man7.org/linux/man-pages/man0/iconv.h.0p.html) conversion facilities.  It can basically convert, at run time, between various character sets (using ordinary strings), and to/from wide character strings, using a very simple but efficient conversion interface.

Standard C also contains wide character equivalents of the typical I/O functions -- wprintf()/printf(), wfprintf()/fprintf(), wscanf()/scanf(), wcslen()/strlen() -- and so on.  (The only thing that is missing is the wide character equivalents of POSIX getline() and getdelim(), really; you have to roll your own (https://stackoverflow.com/a/36188053) for those.)

But to be most practical, we should just use UTF-8 everywhere (https://utf8everywhere.org/).
(This is most important when dealing with internet-of-things gadgets and such.)

As an example, if you write a Linux/Mac/BSD/POSIXy program that states that it only works in UTF-8 locales, and the sources use UTF-8 encoding, you can use ordinary string literals that contain non-ASCII characters like "Öbaut 2.50€", and they will work fine.  What will not work, however, is single-character non-ASCII literals like '€' or 'Ö', unfortunately.  This is because non-ASCII characters in UTF-8 are composed of 2 to 4 chars (the original UTF-8 design allowed up to 6).  However, if you write your code to consider substrings instead of single character constants, it is not a problem at all.
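A small sketch of what "composed of several chars" means in practice. To keep it independent of the source file encoding, the euro sign is spelled out as its three UTF-8 bytes; utf8_count is a made-up helper that assumes its input is valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* The euro sign, U+20AC, written as its three UTF-8 bytes. */
static const char euro[] = "\xE2\x82\xAC";

/* Counts code points in a valid UTF-8 string by counting every byte
   that is not a continuation byte (continuation bytes look like 10xxxxxx). */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0u) != 0x80u)
            n++;
    }
    return n;
}
```

strlen(euro) is 3 because strlen counts chars, while utf8_count(euro) is 1. This is why "€" works as a string (a substring to search for, compare, or send) but not as a single char constant.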

I do personally have a bit of a chip on my solder about Microsoft wrt. C11 and getline() and wide-character support.  If MS hadn't made the mistake of assuming early on that 65536 characters would be enough for everyone (Unicode has 1,114,112 code points), we could have proper Unicode support standardized for C wide characters now, with widget and file system access libraries having wide character interfaces.  But enough of that: the world is what it is, and it is much better to be practical and robust, and forget whining about what could be.  Sorry about that.  ;)

In practice, you have two robust approaches to choose from, depending on where the C code you are working on will run.
If anyone is interested enough, I'd be happy to provide some example code for the various cases; just let me know of a specific situation you'd like to see.

(Full disclosure: I first encountered this problem in late nineties, when implementing a localized web form for course feedback reports for students, using Windows, Mac (pre-OS X), and Linux machines.  Internet Explorer in particular used the character set in current user locale for non-ASCII characters, regardless of the form data.  So, I developed hidden form fields with specific detector characters, to detect the actual character set the browser used for the input fields.  I have worked on character set and localization issues a lot, in other words.)
Title: Re: text / strings in C
Post by: Nominal Animal on March 16, 2020, 01:40:00 pm
A second wall of text: why terminated strings instead of netstrings (aka Pascal-style length-first-then-data)?

TL;DR: Because buffers and bufferbloat (https://en.wikipedia.org/wiki/Bufferbloat).

This matters a lot when you are designing your own custom protocol to talk with a computer or network-attached gadget.

There are two basic structures one can construct protocols on: size-structured, or stream-like.

File formats like PNG (https://en.wikipedia.org/wiki/Portable_Network_Graphics) and WAV (https://en.wikipedia.org/wiki/WAV) are size-structured, with each field being either fixed size (in bytes), or associated with the explicit length of that field.
File formats like HTML and XML (https://en.wikipedia.org/wiki/XML), and network protocols like HTTP, are stream-like; delimited by characters or strings, without explicit lengths for particular fields.

Note that being stream-like does not mean unstructured.  XML is most definitely a structured format.

In general terms, to process (send, or receive and handle) a complete field in a size-structured format, one needs to hold the entire field in memory at the same time.  (There are exceptions, particularly when the field itself can be decomposed into fixed- or known-size subfields, in which case one only needs to hold the subfield in memory at once.)

In comparison, stream formats can be, and often are, processed using a finite-state machine (https://en.wikipedia.org/wiki/Finite-state_machine).  The amount of RAM needed to process a stream format is typically a bit of state (a pointer or two per nesting level, up to maximum allowed depth is typical) and the length of the longest value field that cannot be processed as it arrives character-by-character.  In particular, structured input with known named fields and numerical values can often be processed using an FSM that parses/converts the numeric data on the fly ("on-line algorithm"), with very little RAM use.
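As a sketch of that idea, here is a tiny finite-state machine that parses fields of the made-up form "k=123;" one character at a time, converting the number on the fly. The format, the type names and the function names are all invented for illustration; the state is a few bytes no matter how long the stream is.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { WAIT_KEY, WAIT_EQUALS, IN_VALUE } parse_state;

typedef struct {
    parse_state state;
    char        key;     /* single-letter field name */
    uint32_t    value;   /* number accumulated so far (no overflow check) */
} parser;

void parser_init(parser *p)
{
    p->state = WAIT_KEY;
    p->key = 0;
    p->value = 0;
}

/* Feed one received character. Returns true when a complete field has
   just ended with ';', in which case p->key and p->value hold the result. */
bool parser_feed(parser *p, char c)
{
    switch (p->state) {
    case WAIT_KEY:
        if (c >= 'a' && c <= 'z') {
            p->key = c;
            p->value = 0;
            p->state = WAIT_EQUALS;
        }
        break;
    case WAIT_EQUALS:
        p->state = (c == '=') ? IN_VALUE : WAIT_KEY;
        break;
    case IN_VALUE:
        if (c >= '0' && c <= '9') {
            p->value = p->value * 10u + (uint32_t)(c - '0');
        } else {
            p->state = WAIT_KEY;      /* field over, whether valid or not */
            return c == ';';
        }
        break;
    }
    return false;
}
```

Feeding it "t=123;" character by character returns true on the ';' with key 't' and value 123. A size-structured version of the same field would need the whole field buffered before it could be interpreted.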

Both types have their downsides with respect to error detection and correction.  Checksums are often used, included into the protocol format, sometimes as an optional feature.  I have seen many protocols with optional checksum support in both size-structured and stream-like protocols, and cannot really say there is any difference between the two wrt. checksumming; it is just something that has to be considered up front, and is very difficult to add on afterwards as an afterthought.

In stream-like protocols, numeric values are often in decimal, or a similar variable-length basis.  Base64 and Base85 are particularly common in protocols used on top of ASCII-compatible character sets.  In size-structured protocols, numeric data is usually in raw binary form.

Now, raw binary numbers have their own issue: byte order.  When a numeric value consists of multiple bytes, the order of those bytes needs to be specified and accounted for.  Currently, the two most used formats are big-endian (most significant byte first, then the others in decreasing order of significance), and little-endian (the inverse of big-endian).  The mixed byte orders like PDP-endian are rare to nonexistent.  While the most used desktop and server processors (made by Intel and AMD) are little-endian, a number of microcontrollers are big-endian, and this means the byte order must be considered at both ends.  In the "worst case", there are two conversions, wasting a bit of time.  With IOT, the amount of data transferred is so small that the conversion time is completely irrelevant.
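One common way to account for byte order at both ends is to build and read multi-byte values with shifts, which works the same on any host. A sketch for 32-bit values (the function names are made up):

```c
#include <stdint.h>

/* Store v most significant byte first (big-endian). */
void put_u32_be(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Read four bytes as a big-endian value. */
uint32_t get_u32_be(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Read the same four bytes as a little-endian value. */
uint32_t get_u32_le(const uint8_t in[4])
{
    return ((uint32_t)in[3] << 24) | ((uint32_t)in[2] << 16) |
           ((uint32_t)in[1] << 8)  |  (uint32_t)in[0];
}
```

Because the code never looks at how the host stores a uint32_t in memory, it needs no #ifdef for the host byte order.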

For floating-point and custom integer types, in addition to byte order, the exact binary format must also be specified somehow.  Currently, most microcontrollers and DSPs use IEEE-754 (https://en.wikipedia.org/wiki/IEEE-754) binary32 and binary64 formats (typically corresponding to float and double in C), or at least the conversion between these and whatever internal format they might use.

For typical IOT devices, the overhead of byte order conversion or binary format conversion is completely negligible, due to the relatively small amount of data transferred.  I personally work with simulations that generate megabytes to gigabytes of data, and there the conversion starts to matter.  As a solution, I developed a file format with prototype values in the header for each numeric type used, with the reader being responsible for the conversion.  If both the writer and the reader use the same byte order (and binary floating-point format), no conversion is necessary.
This boils down to picking prototype numeric values whose bit patterns are easily distinguished in different byte orders.  (I've never seen anything else than IEEE-754 binary32 and binary64, in either big-endian or little-endian byte order, so I'd say picking values with each byte having a different bit pattern is good enough.)  Plus, you want to use values you can specify exactly in decimal, so that you can express those values in any programming language.

In practice, this means that if you intend to build any data acquisition device, or similar wide bandwidth device producing lots of numeric data, the byte order deserves a bit of thought.  (As of this writing, "use little-endian" is the simple robust answer; I am just pointing out that in some cases you might arrive at a different answer, like I did wrt. MD simulator output.)
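The prototype-value idea can be sketched for a single 32-bit integer type as follows. This is not the file format described above, just an illustration with made-up names: the writer stores a known value in its native byte order, and the reader looks at what it got.

```c
#include <stdint.h>
#include <string.h>

/* Each byte has a different bit pattern, so the orders are easy to tell apart. */
#define PROTOTYPE_U32 0x01020304u

typedef enum { ORDER_SAME, ORDER_SWAPPED, ORDER_UNKNOWN } byte_order;

/* Writer side: store the prototype exactly as it sits in memory. */
void write_prototype(uint8_t header[4])
{
    uint32_t v = PROTOTYPE_U32;
    memcpy(header, &v, sizeof v);
}

/* Reader side: decide whether the data needs byte swapping. */
byte_order detect_order(const uint8_t header[4])
{
    uint32_t v;
    memcpy(&v, header, sizeof v);
    if (v == PROTOTYPE_U32)
        return ORDER_SAME;            /* no conversion needed */
    if (v == 0x04030201u)
        return ORDER_SWAPPED;         /* opposite byte order: swap on read */
    return ORDER_UNKNOWN;             /* some other layout, or a corrupt header */
}
```

When writer and reader share a byte order the data can be used as-is, which is the whole point: the conversion cost is only paid when it is actually needed.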
Title: Re: text / strings in C
Post by: Jan Audio on March 17, 2020, 02:12:16 pm
Why would you read back what you have written ?
Title: Re: text / strings in C
Post by: Simon on April 12, 2020, 10:52:54 am
Well I now have my SPI port communicating and I am reading what it put out with my oscilloscope.

While I can create an array and assign a string to it on declaration I cannot put a string into the array later on, I get:
Code: [Select]
Severity Code Description Project File Line
Warning assignment makes integer from pointer without a cast [-Wint-conversion]

So it looks like I will need to have slightly less readable code and assign one letter at a time, or just come up with a string-to-array converter function.
Title: Re: text / strings in C
Post by: grumpydoc on April 12, 2020, 11:21:53 am
Quote
I cannot put a string into the array later on

No, you can't say eg
Code: [Select]
unsigned char foo[] = "Hello World";

and later say
Code: [Select]
foo = "Goodbye Cruel World";

nor say (which is what I think you might be trying from the error message)
Code: [Select]
foo[0] = "Goodbye Cruel World";

For one thing these are not the same: one is initialisation, the other is assignment, and "foo" is an array, which is not a modifiable l-value, so you can't assign to it.

This is where strcpy and the other str* functions come in.

However it would *still* be an error to say

Code: [Select]
strcpy(foo, "Goodbye Cruel World");
Even though that might compile, it overflows the array, because when initialised only 12 bytes were allocated to foo (including the \0 terminator).
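One way to avoid that overflow is to check the size before copying. A minimal sketch, where copy_if_fits is a made-up helper rather than a standard function:

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst only if it fits, terminating NUL included.
   Returns 0 on success, or -1 (leaving dst untouched) if src is too long. */
int copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    size_t need = strlen(src) + 1;    /* characters plus the NUL */
    if (need > dst_size)
        return -1;
    memcpy(dst, src, need);
    return 0;
}
```

With char foo[] = "Hello World"; sizeof foo is 12, so copy_if_fits(foo, sizeof foo, "Goodbye Cruel World") is refused (it needs 20 bytes), while a shorter replacement such as "Bye" is copied. Note that sizeof only gives the array size where foo is visible as an array, not through a pointer parameter.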

Could you post the code of what you are trying?
Title: Re: text / strings in C
Post by: Simon on April 12, 2020, 11:24:44 am
int8_t text[16] ;
int8_t text1[5] = "hello";
int8_t text2[4] = "you";
text[0] = text1;
text[5] = text2;
Title: Re: text / strings in C
Post by: Yansi on April 12, 2020, 11:28:37 am
that's what STRCPY is for.

Using signed int for characters is bad practice too. Either use the basic CHAR, or uint8_t (if you want the hassle of typecasting every time).

And last but not least, if using text constants like this:

int8_t text1[5] = "hello";

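declare them as CONST if they are supposed to be const. And btw, you are missing a character there for the terminating null byte. In C this still compiles, but the array then holds no terminator, so the str* functions will run off its end (in C++ it is an error). The code below leaves room for the terminator: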

int8_t text1[6] = "hello";
Title: Re: text / strings in C
Post by: grumpydoc on April 12, 2020, 11:31:36 am
int8_t text[16] ;
int8_t text1[5] = "hello";
int8_t text2[4] = "you";
text[0] = text1;
text[5] = text2;

Hmmm text1 does not have enough space allocated (you forgot to allow for the '\0', though you got it right for text2)

should have been
Code: [Select]
int8_t text[16] ;
int8_t text1[6] = "hello";
int8_t text2[4] = "you";
strcpy(text, text1);
strcpy(text+5, text2);

That gets "helloyou" in text;

strncpy is safer than strcpy - but watch out for it not copying the '\0' if you are not careful.
Title: Re: text / strings in C
Post by: Yansi on April 12, 2020, 11:32:57 am
STRCPY should also be avoided, as it does not check for memory overflow.

Use STRNCPY instead, that checks number of copied characters is within the limits of the destination memory.

Also, STRxxx family of functions work with CHARs, so you should typecast your nonstandard integer texts:

strcpy((char*)text, (char*)text1);

and for string concatenation there is the STRCAT
Title: Re: text / strings in C
Post by: Simon on April 12, 2020, 11:34:20 am
yes I did use the signed type, thought the name of the type was shorter than usual.
Title: Re: text / strings in C
Post by: grumpydoc on April 12, 2020, 11:40:16 am
STRCPY should also be avoided, as it does not check for memory overflow.

Use STRNCPY instead, that checks number of copied characters is within the limits of the destination memory.

strncpy has its own "gotcha" in that it does not copy the '\0' if it "thinks" the destination string is full.

strcpy is fine IF you have already checked lengths

Agree with the comments about types though.
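
To make the strncpy caveat concrete, here is a minimal sketch of a bounded copy that always terminates the destination. The helper name copy_bounded is made up for illustration; it is not a standard library function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most dst_size-1 characters and always
   write a terminating '\0'. Returns 1 if the whole source fitted,
   0 if it had to be truncated. */
static int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return 0;                     /* no room even for the '\0' */
    strncpy(dst, src, dst_size - 1);  /* may leave dst unterminated... */
    dst[dst_size - 1] = '\0';         /* ...so terminate it explicitly */
    return strlen(src) < dst_size;
}
```

The return value lets the caller decide whether a truncated message is acceptable on the display or not.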
Title: Re: text / strings in C
Post by: Simon on April 12, 2020, 12:43:54 pm
my problem seems to be trying to assign the contents of one array to another, i take it I need to cycle through and assign each index location to another.
Title: Re: text / strings in C
Post by: grumpydoc on April 12, 2020, 01:04:09 pm
I take it I need to cycle through and assign each index location to another.

Yes, there are functions in the standard library to help though.
Title: Re: text / strings in C
Post by: Jeroen3 on April 12, 2020, 02:19:40 pm
int8_t text[16] ;
int8_t text1[5] = "hello";
int8_t text2[4] = "you";
text[0] = text1;
text[5] = text2;
What are you even trying to do here. It isn't javascript or python.

my problem seems to be trying to assign the contents of one array to another, i take it I need to cycle through and assign each index location to another.
C can't do that for you. Although, the libs (https://pubs.opengroup.org/onlinepubs/009695399/basedefs/string.h.html) offer memcpy for you. C is stupid. That's why it's amazing, and hard to use.

Code: [Select]
// assign the contents of one array to another
int8_t array_a[16];
int8_t array_b[16];
memcpy(array_b, array_a, sizeof(array_a));

Code: [Select]
// assign the contents of one string to another
char string_a[16] = "from here";
char string_b[16] = "to here";
strncpy(string_b, string_a, sizeof(string_b) - 1);
string_b[sizeof(string_b) - 1] = '\0'; // strncpy does not terminate on truncation

Concatenating two strings, regardless of either length... Without risk of writing outside of your destination.
Code: [Select]
char string_a[6] = "hello";
char string_b[6] = "world";
char text[32] = "both strings: %s %s";
char result[32];
snprintf(result, sizeof(result), text, string_a, string_b);
printf("%s", result); // never pass data as the format string itself
Outputs: "both strings: hello world";
Title: Re: text / strings in C
Post by: Simon on April 12, 2020, 03:10:56 pm
Well at the end of the day I don't care for strings, they are just a convenient way to write code that puts ASCII into an array that I can pump out over a serial port. The null character would need removing anyway as otherwise it would show up in the middle of the text. So if I put a string into an array that does not have space for the last null character do I cause issues for the program as a whole as the memory location after the end of the array gets affected, or does it mean I lose the null character as intended?

i can then write a function that puts a certain amount of variables from one string into another to fill my display buffer with what is required.
Title: Re: text / strings in C
Post by: radar_macgyver on April 12, 2020, 05:37:47 pm
Indexing into an array is more expensive than dereferencing a pointer, because of the addition of the index to the base address

Although it was true for pretty old architectures (or more recent but limited 8-bitters) in the general case, it's certainly not for most modern targets. Indexing comes at no cost as most modern ISAs have memory access instructions with base and offset. If anything, depending on the ISA, that could cost an extra register though, but only that. And if the indexing is linear in a loop, the compiler will select whatever is most efficient between that and just incrementing the base address register anyway. Either way, it's just incrementing a register, same cost.

Sometimes it helps to look at the output from the compiler to decide if that's the case. Here's a nice tool to do just that:

https://godbolt.org/z/Z-g6W3

So at least on ARM gcc, the two are equivalent. On x86 it seems like the indexed array method produces more code. My assembler skills are somewhat rusty so I can't tell if it's actually more efficient or not. Once optimizations are turned on (even -O1 or -Os), both produce exactly the same code.
Title: Re: text / strings in C
Post by: Simon on April 12, 2020, 05:58:13 pm
Yea the K&R book on C says that pointers are faster. From what i can tell I need to use pointers anyway to bring the data into my function. I would have to put the strings into an array as nothing else will hold that sort of data.
Title: Re: text / strings in C
Post by: Jeroen3 on April 12, 2020, 06:03:25 pm
So if i put a string into an array that does not have space for the last null character do I cause issues for the program as a whole as the memory location after the end of the array gets affected or does it mean I loose the null character as intended?
It will just keep reading memory until the first null.
On a PC that usually ends in a segmentation fault (https://en.wikipedia.org/wiki/Segmentation_fault); on embedded bare-metal devices it carries on until it finds a zero byte or triggers a bus error.

i can then write a function that puts a certain amount of variables from one string into another to fill my display buffer with what is required.
You don't have to, the libs provide the printf family.
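
If you really do want text without a terminator, a length-based send works. Below is a rough sketch under some assumptions: port_write_byte stands in for whatever SPI or UART transmit routine you actually have (here it just records bytes so the sketch can run on a PC), and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the real transmit routine (hypothetical): it only
   records the bytes that would have gone out on the wire. */
static uint8_t sent[32];
static size_t  sent_count;

static void port_write_byte(uint8_t b)
{
    if (sent_count < sizeof sent)
        sent[sent_count++] = b;
}

/* Send exactly len bytes; no terminator is needed or looked for. */
static void port_write(const char *data, size_t len)
{
    for (size_t i = 0; i < len; i++)
        port_write_byte((uint8_t)data[i]);
}

/* In C (not C++) an array sized exactly to the text is legal: the
   '\0' is simply not stored, and nothing past the array is touched. */
static const char greeting[5] = "hello";

static void send_greeting(void)
{
    port_write(greeting, sizeof greeting);
}
```

The catch is that such an array must never be handed to strlen, strcpy, printf("%s") and friends, since they all look for the missing terminator.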
Title: Re: text / strings in C
Post by: Simon on April 12, 2020, 06:06:26 pm
i can then write a function that puts a certain amount of variables from one string into another to fill my display buffer with what is required.
You don't have to, the libs provide the printf family.

i am talking about setting up my array to act as a message holder or display ram buffer and then copy into that the text to appear on the screen. So words and variable will get copied into locations to build up the screen contents that is then refreshed.
Title: Re: text / strings in C
Post by: Yansi on April 12, 2020, 06:45:30 pm
That is when SNPRINTF or VSNPRINTF  indeed comes in handy.

Title: Re: text / strings in C
Post by: Yansi on April 12, 2020, 06:58:38 pm
For example to make a custom PRINTF-like function to spit data on UART:
Code: [Select]
void USART_Printf(const char *fmt, ...);

/* Classy printf implementation for USART using variable argument */
void USART_Printf(const char *fmt, ...)
{
    va_list ap; /* variable argument list */
    char s[MAX_PRINTF_LEN]; /* string */
    char *ps; /* pointer to string */

    va_start(ap, fmt);

    /* Print to string s */
    vsnprintf(s, MAX_PRINTF_LEN, fmt, ap);

    va_end(ap);

    ps = s;

    /* push the string out through the USART */
    while (*ps) {

        /* wait transmit empty */
        while (!(USART2->SR & USART_SR_TXE)) {}

        /* tx the char */
        USART2->DR = *ps++;
    }
}


...and don't forget to also include stdarg.h (and stdio.h for vsnprintf)!

Or an example of how to declare an LCD printing function that also takes the position of the text, with an example of use below:

Code: [Select]
//declaration:
void LCD_Printf(uint8_t pos_x, uint8_t pos_y, const char *fmt, ...);

//example function call printing a text, variable and a unit of measurement:
LCD_Printf(10,2, "Vbat = %.2f V", batt_volts);


The variable argument list is handled the same way as in the USART example above.
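
A possible body for LCD_Printf, as a sketch only. LCD_SetCursor, LCD_PutChar, the 20x4 geometry and the lcd_ram buffer are all assumptions for illustration; on real hardware LCD_PutChar would talk to the display controller instead of writing to an array.

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>

#define LCD_COLS 20   /* assumed display geometry */
#define LCD_ROWS 4

/* Stand-in frame buffer and cursor (hypothetical driver layer). */
static char    lcd_ram[LCD_ROWS][LCD_COLS];
static uint8_t cur_x, cur_y;

static void LCD_SetCursor(uint8_t x, uint8_t y) { cur_x = x; cur_y = y; }

static void LCD_PutChar(char c)
{
    if (cur_y < LCD_ROWS && cur_x < LCD_COLS)
        lcd_ram[cur_y][cur_x++] = c;   /* clip at the right edge */
}

void LCD_Printf(uint8_t pos_x, uint8_t pos_y, const char *fmt, ...)
{
    char s[LCD_COLS + 1];   /* one line is the most that can be shown */
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(s, sizeof s, fmt, ap);   /* result is always terminated */
    va_end(ap);

    LCD_SetCursor(pos_x, pos_y);
    for (const char *ps = s; *ps; ps++)
        LCD_PutChar(*ps);
}
```

Sizing the temporary string to one display line means an over-long format result is cut off by vsnprintf rather than overflowing anything.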
Title: Re: text / strings in C
Post by: Simon on April 12, 2020, 07:03:44 pm
Ah, turns out that as arrays are practically pointers I can pass the array names and let the function work on them.
Title: Re: text / strings in C
Post by: Yansi on April 12, 2020, 07:05:15 pm
Yes, in most expressions the array identifier decays to a pointer to its first element. See my example with vararg above. I deliberately added the line ps = s; so you can see the array being used as a pointer. Compiles without warnings.
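
One place where the two are visibly not the same thing is sizeof. A small sketch (the function names are made up for illustration):

```c
#include <stddef.h>

static char  s[16];    /* an array: 16 bytes of storage */
static char *ps = s;   /* a pointer: holds the address of s[0] */

/* sizeof sees the whole array, but only the pointer object for ps. */
static size_t array_bytes(void)   { return sizeof s; }
static size_t pointer_bytes(void) { return sizeof ps; }

/* A parameter written as "char buf[]" means exactly "char *buf":
   the array length does not travel with it into the function. */
static size_t param_bytes(char *buf) { return sizeof buf; }
```

This is why functions that work on arrays normally take a separate length argument, or rely on a terminator as the str* functions do.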
Title: Re: text / strings in C
Post by: Simon on April 12, 2020, 08:11:31 pm
Code: [Select]
void copy_array_to_array(uint8_t destination[], uint8_t destination_start, uint8_t source[], uint8_t source_start, uint8_t source_end ){

uint8_t destination_index = destination_start;
uint8_t source_index      = source_start;

while(source_index <= source_end){

destination[destination_index] = source[source_index];
destination_index++;
source_index++;
}

}

I know I have probably reinvented the wheel but I need the learning and mental exercise in making code up, works a treat.
Title: Re: text / strings in C
Post by: Jeroen3 on April 12, 2020, 08:52:44 pm
Code: [Select]
void copy_array_to_array(uint8_t destination[], uint8_t destination_start, uint8_t source[], uint8_t source_start, uint8_t source_end ){

uint8_t destination_index = destination_start;
uint8_t source_index      = source_start;

while(source_index <= source_end){

destination[destination_index] = source[source_index];
destination_index++;
source_index++;
}

}

I know I have probably reinvented the wheel but I need the learning and mental exercise in making code up, works a treat.
That is a very convoluted way of writing
Code: [Select]
memcpy(&destination[destination_start], &source[source_start], source_end - source_start + 1);
But ok..

It looks like you just want to replace some part of a string with another string?
Attempting to display something.
Write up a printf format string that fits your display. Probably a character display I'm guessing.
Code: [Select]
printf("Text: %5s.", "A"); // prints A right-aligned in a 5-character field
printf("Text: %-5s.\n", "B"); // prints B left-aligned in a 5-character field
Printf is magic. (https://pubs.opengroup.org/onlinepubs/9699919799/)
Maybe not all the features are available in your toolchain, read the docs. These probably are.
This is the most luxury you're going to get, and you do not have to worry about memory access violation or forgotten null characters and getting gibberish on the screen.
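
As a sketch of what "a format string that fits your display" could look like when writing into a buffer instead of stdout: the 16-column width and the helper name format_line are assumptions, not something from this thread.

```c
#include <stdio.h>

#define LINE_LEN 16   /* assumed display width; adjust to the real panel */

/* Hypothetical helper: label left-aligned in 8 columns, value
   right-aligned in 8 columns. Returns what snprintf returns, i.e. the
   length the full text needs; anything above LINE_LEN was cut off. */
static int format_line(char line[LINE_LEN + 1], const char *label, int value)
{
    return snprintf(line, LINE_LEN + 1, "%-8s%8d", label, value);
}
```

Because every call produces the full line width, an old longer value is always overwritten by the padding of a newer shorter one.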
Title: Re: text / strings in C
Post by: brucehoult on April 12, 2020, 10:57:12 pm
Yea the K&R book on C says that pointers are faster. From what i can tell I need to use pointers anyway to bring the data into my function. I would have to put the strings into an array as nothing else will hold that sort of data.

K&R was written over 40 years ago. Compilers have improved slightly in that time.

Also ignore any advice about declaring variables as "register" as modern compilers simply ignore that and decide for themselves what will be in registers -- usually "everything" for typical (not huge) functions on a machine with 16 or 32 registers.
Title: Re: text / strings in C
Post by: brucehoult on April 12, 2020, 11:20:47 pm
Code: [Select]
void copy_array_to_array(uint8_t destination[], uint8_t destination_start, uint8_t source[], uint8_t source_start, uint8_t source_end ){

uint8_t destination_index = destination_start;
uint8_t source_index      = source_start;

while(source_index <= source_end){

destination[destination_index] = source[source_index];
destination_index++;
source_index++;
}

}

I know I have probably reinvented the wheel but I need the learning and mental exercise in making code up, works a treat.

That's absolutely fine. I write loops and functions like this all the time, especially if the types of source and destination are different, in which case memcpy() won't help anyway. With at least -O1 optimization and small values of source_end-source_start this will be just as fast as memcpy anyway.

A few absolutely unimportant notes:

- unless you are actually on an 8 bit CPU such as AVR or PIC it would be better (more flexible and possibly a little more efficient) to have destination_start, source_start, source_end and *especially* source_index and destination_index as plain int. Then it will be ok for source[] and destination[] to have more than 256 bytes, and to work with offsets greater than 255 in them.

- source_index and destination_index are unnecessary but harmless. Arguably they serve to make the code slightly more readable, so it's fine, but as a matter of how C works it's absolutely fine to just use (and increment) source_start and destination_start. This won't propagate back to the variables the caller uses. (It would in FORTRAN or ALGOL but those are ancient)

- I'd probably name source_end instead as source_last. Or change the comparison to < instead of <=. The name "source_end" makes me think it's as usual in C one-past-the-end. This would avoid a lot of doing "size-1" in the caller and fit most C programmer's expectations.

- with 5 function arguments you're getting close to the limit of what can be passed in registers on x86_64 (6) and past the limit on ARM32 (4). It's fine to exceed the limit but it just means more code in both the caller and the called function to copy the value to the stack and back.
Title: Re: text / strings in C
Post by: Simon on April 13, 2020, 07:41:57 am

Also ignore any advice about declaring variables as "register" as modern compilers simply ignore that and decide for themselves what will be in registers -- usually "everything" for typical (not huge) functions on a machine with 16 or 32 registers.


How do you mean? I have to set any pointer to a register as volatile or nothing happens and the code does not work.
Title: Re: text / strings in C
Post by: Simon on April 13, 2020, 07:46:35 am

That's absolutely fine. I write loops and functions like this all the time, especially of the types of source and destination are different, in which case memcpy() won't help anyway. With at least -O1 optimization and small values of source_end-source_start this will be just as fast as memcpy anyway.

A few absolutely unimportant notes:

- unless you are actually on an 8 bit CPU such as AVR or PIC it would be better (more flexible and possibly a little more efficient) to have destination_start, source_start, source_end and *especially* source_index and destination_index as plain int. Then it will be ok for source[] and destination[] have more than 256 bytes, and work with offsets greater than 256 in them.

- source_index and destination_index are unnecessary but harmless. Arguably they serve to make the code slightly more readable, so it's fine, but as a matter of how C works it's absolutely fine to just use (and increment) source_start and destination_start. This won't propagate back the variables the caller uses. (It would in FORTRAN or ALGOL but those are ancient)

- I'd probably name source_end instead as source_last. Or change the comparison to < instead of <=. The name "source_end" makes me think it's as usual in C one-past-the-end. This would avoid a lot of doing "size-1" in the caller and fit most C programmer's expectations.

- with 5 function arguments you're getting close to the limit of what can be passed in registers on x86_64 (6) and past the limit on ARM32 (4). It's fine to exceed the limit but it just means more code in both the caller and the called function to copy the value to the stack and back.


It's for ARM at the moment. For my intended use I can't see me needing more than 256 elements.

Not sure what I can do to reduce the amount of elements, the only way to get it to 4 is to have it just copy all of the source array, this is probably how i would use it most of the time.
Title: Re: text / strings in C
Post by: IanB on April 13, 2020, 07:52:45 am
Code: [Select]
void copy_array_to_array(uint8_t destination[], uint8_t destination_start, uint8_t source[], uint8_t source_start, uint8_t source_end ){

uint8_t destination_index = destination_start;
uint8_t source_index      = source_start;

while(source_index <= source_end){

destination[destination_index] = source[source_index];
destination_index++;
source_index++;
}

}

I know I have probably reinvented the wheel but I need the learning and mental exercise in making code up, works a treat.

It is by the way not necessary to create new variables for destination_index and source_index, since you already have destination_start and source_start variables available and ready to use. You can just increment them from their starting values in the loop.
Title: Re: text / strings in C
Post by: IanB on April 13, 2020, 08:03:40 am
Not sure what I can do to reduce the amount of elements, the only way to get it to 4 is to have it just copy all of the source array, this is probably how i would use it most of the time.

You don't really need a function for this since the code is so compact:

Code: [Select]
/* copy 8 elements from position 0 in array text to position 10 in array buf: */
   int i, j, n;
   for (i = 10, j = 0, n = 8; n > 0; ++i, ++j, --n) {
       buf[i] = text[j];
   }
Title: Re: text / strings in C
Post by: Simon on April 13, 2020, 08:51:01 am
It's something that will be reused often. Potentially to save RAM I can create a number of constant strings that are different display options that get copied to the display RAM buffer with the numbers of variables inserted into the array in the appropriate places.
Title: Re: text / strings in C
Post by: Nominal Animal on April 13, 2020, 09:07:19 am
If you are using a GCC-derived compiler, using the __builtin_memcpy() (https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html) should yield optimal code for each processor; these are provided by gcc itself, not by any library per se. If you want your exact interface, then
Code: [Select]
static inline void copy_array_to_array(uint8_t destination[], int destination_start, uint8_t source[], int source_start, int source_end)
{
    __builtin_memcpy(destination + destination_start, source + source_start, (source_end - source_start + 1) * sizeof destination[0]);
}
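The useful points to realize here are
  • GCC provides a number of built-ins that are normally provided by the C library; these are optimized for the target architecture.  The ones relevant for microcontrollers include strlen(), strchr(), strrchr(), strcmp(), memset(), memcpy(), and memchr().
  • When stuff is an array or a pointer, and index is an integer, (stuff + index) == &(stuff[index]).
  • sizeof stuff[0] evaluates to the size of the elements in stuff, without dereferencing it; the compiler computes the size at compile time.
    I recommend omitting the parentheses when possible, to remind us humans that sizeof is an operator, not a function.
    (For a type, I recommend sizeof (type), with an extra space in between.)
Whether these are relevant for Simon, I'm not sure.  Typically, the amount of time spent in copying stuff around in memory is negligible on microcontrollers.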
Title: Re: text / strings in C
Post by: Simon on April 13, 2020, 09:12:52 am
Well as I said it's good code practicing for me. There is learning the mechanics of C and there is actually devising ways of using it to solve problems which is what this was more of an exercise in. In the time it takes me to look this stuff up I can write it myself and know that i am familiar with how it works.
Title: Re: text / strings in C
Post by: Yansi on April 13, 2020, 09:22:31 am

That's absolutely fine. I write loops and functions like this all the time, especially of the types of source and destination are different, in which case memcpy() won't help anyway. With at least -O1 optimization and small values of source_end-source_start this will be just as fast as memcpy anyway.

A few absolutely unimportant notes:

- unless you are actually on an 8 bit CPU such as AVR or PIC it would be better (more flexible and possibly a little more efficient) to have destination_start, source_start, source_end and *especially* source_index and destination_index as plain int. Then it will be ok for source[] and destination[] have more than 256 bytes, and work with offsets greater than 256 in them.

- source_index and destination_index are unnecessary but harmless. Arguably they serve to make the code slightly more readable, so it's fine, but as a matter of how C works it's absolutely fine to just use (and increment) source_start and destination_start. This won't propagate back the variables the caller uses. (It would in FORTRAN or ALGOL but those are ancient)

- I'd probably name source_end instead as source_last. Or change the comparison to < instead of <=. The name "source_end" makes me think it's as usual in C one-past-the-end. This would avoid a lot of doing "size-1" in the caller and fit most C programmer's expectations.

- with 5 function arguments you're getting close to the limit of what can be passed in registers on x86_64 (6) and past the limit on ARM32 (4). It's fine to exceed the limit but it just means more code in both the caller and the called function to copy the value to the stack and back.


It's for ARM at the moment. For my intended use I can't see me needing more than 256 elements.

Not sure what I can do to reduce the amount of elements, the only way to get it to 4 is to have it just copy all of the source array, this is probably how i would use it most of the time.

If it's for ARM (32bit architecture), it is not a generally good idea to limit yourself to 8bit index/size variables. It makes for an evil size limitation (when you will need larger sizes afterwards) and 32bit index variable will be as fast, in some cases even faster!
Title: Re: text / strings in C
Post by: Simon on April 13, 2020, 09:25:47 am
not really. ARM addresses memory in bytes. I have saved 3 bytes of RAM on the 4 you suggest. Next I could use 16 bits but really i can't see myself at the moment with an array of more than 256 elements. I can of course make this an 8 bit version and write the 16 bit version when required.
Title: Re: text / strings in C
Post by: Simon on April 13, 2020, 09:26:43 am
My current use here is for a display of up to 80 characters.
Title: Re: text / strings in C
Post by: Yansi on April 13, 2020, 09:30:05 am
If you are using a GCC-derived compiler, using the __builtin_memcpy() (https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html) should yield optimal code for each processor; these is provided by gcc, not by any library per se. If you want your exact interface, then
Code: [Select]
static inline void copy_array_to_array(uint8_t destination[], int destination_start, uint8_t source[], int source_start, int source_end)
{
    __builtin_memcpy(destination + destination_start, source + source_start, (source_end - source_start + 1) * sizeof destination[0]);
}
The useful points to realize here are
  • GCC provides a number of built-ins that are normally provided by the C library; these are optimized for the target architecture.  The ones relevant for microcontrollers include strlen(), strchr(), strrchr(), strcmp(), memset(), memcpy(), and memchr().
  • When stuff is an array or a pointer, and index is an integer, (stuff + index) == &(stuff[index]).
  • sizeof stuff[0] evaluates to the size of the elements in stuff, without dereferencing it; the compiler computes the size at compile time.
    I recommend omitting the parentheses when possible, to remind us humans that sizeof is an operator, not a function.
    (For a type, I recommend sizeof (type), with an extra space in between.)
Whether these are relevant for Simon, I'm not sure.  Typically, the amount of time spent in copying stuff around in memory is neglible on microcontrollers.

Not really true. Maybe true for most "arduino type of work".  As Simon uses ARM, memory access speed in fact IS a concern.  ARM is an awful load-store architecture, with pretty inefficient memory access speeds.

For the occasional generic data move, this is not a concern. But whenever copying (especially of larger data) is a repeated task, the programmer should be well aware of what resources will be consumed. It is especially important when working with external memory chips, and especially DRAM, which is well suited to burst access.

Repeated tasks such as: Continuously refreshing complex drawings on LCDs, any data processing, etc.

I would not like to go tangent here with an argument, just making a point, that memory access is not  free on ARM. Can take more cycles than you would think. It is now up to Simon to learn or ask if interested.

Regarding the topic under discussion, copying memory: the buzzword to look for is loop unrolling, and the general idea behind improving memory access speed is to read or write in bursts of the largest possible size.

Title: Re: text / strings in C
Post by: Yansi on April 13, 2020, 09:33:37 am
not really. ARM addresses memory in bytes. I have saved 3 bytes of RAM on the 4 you suggest. Next I could use 16 bits but really i can't see myself at the moment with an array of more than 256 elements. I can of course make this an 8 bit version and write the 16 bit version when required.

That's just a misconception. Just make a function that has a parameter of BYTE size (8 bits) and see how it compiles: you should not be surprised to find the parameter is actually passed in a 32bit register anyway.

Restricting a variable to 8bits on ARM may lead to unnecessary instructions executed, such as SIGN EXTENSION, or other truncation of resulting data to fit back to the memory location.

Memory access speed to BYTE (8bit), HALFWORD (16bit) and WORD (32bit) on ARM is exactly the same.*

And rarely you need to optimize for RAM size.  Most of times you need to optimize for execution time instead.

In fact, I have never come across a situation on ARM where I had to start optimizing memory use.

//EDIT: * if talking about aligned memory access. unaligned ones are slow or not possible at all, leading to a hardfault.
Title: Re: text / strings in C
Post by: Simon on April 13, 2020, 09:39:30 am
So you're saying that if I can spare the working RAM I should always use 32 bit variables to save the time spent dealing with converting what will be a 32 bit memory transaction anyway into 8 bit. I thought ARM M0+, which is what this is going on, ran a 16 bit instruction set (thumb) in order to improve memory bandwidth. Would 16 bit be a fair compromise that the CPU is more used to natively handling?

Why have byte addressable RAM if the chip works best at 32 bit access chunks?
Title: Re: text / strings in C
Post by: Yansi on April 13, 2020, 09:46:02 am
Not always. That is not what I meant. I am just trying to make a point and make you aware what the difference may be, when using just 8bit variables and 32bits.

Having byte addressable RAM does not rule anything out. It is an actual benefit.

I am trying to make a point, that the memory has a 32bit wide data bus and that reading it by 32bits at a time is obviously faster, than reading 4 bytes successively.

Also, all processing is done using 32bit wide registers.   So you do not necessarily gain speed by processing data 8 bits at a time; you can lose time instead.

Title: Re: text / strings in C
Post by: Nominal Animal on April 13, 2020, 09:54:13 am
Note that very often function parameters and local variables are not stored in RAM, and only exist in registers.  This is especially true for 32-bit ARMs. If you use a smaller explicit-size type like unsigned char, int8_t, or int16_t, the compiler may have to add unnecessary AND instructions to ensure the passed register contains a value representable by that type.

Technically, C does nowadays provide types like int_fastN_t and uint_fastN_t for N = 8, 16, 32, and 64, where the type is at least that size, possibly larger, whatever is most efficient for the target processor, exactly for this purpose (local variables and static inline function parameters).   For example, on x86-64 on Linux, gcc uses 8-bit uint_fast8_t and int_fast8_t, but 64-bit for the larger types.
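
A small sketch of how that could look in practice; the function sum_bytes is invented for illustration. The buffer elements stay uint8_t because that is the data format going out on the wire, while the counter and accumulator use the "fast" types so the compiler can pick a register-friendly width.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum a small buffer. Only the stored data is forced to 8 bits;
   the local variables are at least 8 or 16 bits wide, possibly more. */
static uint_fast16_t sum_bytes(const uint8_t *data, uint_fast8_t count)
{
    uint_fast16_t total = 0;
    for (uint_fast8_t i = 0; i < count; i++)
        total += data[i];
    return total;
}
```

Note that code using these types must not rely on wrap-around at 8 or 16 bits, since the actual width differs between targets.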



The one oddball trick I have sometimes used is to fill in the string buffer backwards, from right to left.

For example:
Code: [Select]
char *prepend_uint(char *p, unsigned int u)
{
    do {
        *(--p) = '0' + (u % 10);
        u /= 10;
    } while (u);
    return p;
}

char *prepend_block(char *p, const void *s, const int n)
{
    memcpy(p - n, s, n);
    return p - n;
}

char *prepend_reverse(char *p, const char *s)
{
    while (*s) {
        *(--p) = *(s++);
    }
    return p;
}
Let's say you want to construct the string "No accidents for num days", and you have a buffer with room for, say, 32 characters: char buffer[32];.  You would use the above functions like this:
Code: [Select]
    char *p = buffer + sizeof buffer;
    *(--p) = '\0';  /* End-of-string marker */
    p = prepend_reverse(p, "syad ");
    p = prepend_uint(p, num);
    p = prepend_reverse(p, " rof stnedicca oN");
or, equivalently,
Code: [Select]
    char *p = buffer + sizeof buffer;
    *(--p) = '\0';
    p = prepend_block(p, " days", 5);
    p = prepend_uint(p, num);
    p = prepend_block(p, "No accidents for ", 17);
and in both cases, you would have the desired string starting at p .

To see exactly why this would be useful, one would need to look at the machine code: this compiles to very tight little code.  Right-aligning fixed-width fields, and alternate versions for architectures where ROM/Flash memory access is special, is easy to implement.

When filling buffers in the normal order, you can convert numbers to strings in reverse (swapped right-to-left), append any left padding, reverse the string, and finally append any right padding.
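
As a sketch of the right-aligned fixed-width variant mentioned above: prepend_uint is repeated from this post so the block stands on its own, while prepend_uint_width is a hypothetical extension, not part of any library.

```c
/* Same as prepend_uint() earlier in this post. */
static char *prepend_uint(char *p, unsigned int u)
{
    do {
        *(--p) = (char)('0' + (u % 10));
        u /= 10;
    } while (u);
    return p;
}

/* Hypothetical extension: right-align the number in a field of
   `width` characters by prepending spaces until the field is full.
   The caller must make sure the buffer has room to the left of p. */
static char *prepend_uint_width(char *p, unsigned int u, int width)
{
    char *const end = p;        /* remember where the field ends */

    p = prepend_uint(p, u);
    while (end - p < width)
        *(--p) = ' ';
    return p;
}
```

A number wider than the field is not truncated here; it simply takes the extra characters, which may or may not be what a fixed display layout wants.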
Title: Re: text / strings in C
Post by: Simon on April 13, 2020, 09:54:31 am

I am trying to make a point, that the memory has a 32bit wide data bus and that reading it by 32bits at a time is obviously faster, than reading 4 bytes successively.


But how does that help me? I can't put more than one 8 bit variable into a single 32 one as I would have to then "decode" the variables which would in turn waste time. Yes if I was doing a bit flag variable I would make one 32 bit one rather than 4 8 bit ones.
Title: Re: text / strings in C
Post by: Simon on April 13, 2020, 09:57:19 am
Ultimately my buffer is being pumped out on an 8 bit serial port, are there any better variable types that I should use? what does GCC for ARM support?
Title: Re: text / strings in C
Post by: brucehoult on April 13, 2020, 10:03:44 am
Not sure what I can do to reduce the amount of elements, the only way to get it to 4 is to have it just copy all of the source array, this is probably how i would use it most of the time.

You don't really need a function for this since the code is so compact:

Code: [Select]
/* copy 8 elements from position 0 in array text to position 10 in array buf: */
   int i, j, n;
   for (i = 10, j = 0, n = 8; n > 0; ++i, ++j, --n) {
       buf[i] = text[j];
   }

That generates more code than memcpy(buf+10, text, 8) and you'd never notice the speed difference. You'd only win if your program didn't already link in memcpy.
Title: Re: text / strings in C
Post by: brucehoult on April 13, 2020, 10:09:51 am
- with 5 function arguments you're getting close to the limit of what can be passed in registers on x86_64 (6) and past the limit on ARM32 (4). It's fine to exceed the limit but it just means more code in both the caller and the called function to copy the value to the stack and back.
Not sure what I can do to reduce the amount of elements, the only way to get it to 4 is to have it just copy all of the source array, this is probably how i would use it most of the time.

You can reduce it by not passing both destination and destination_start to the function but just the sum of them -- or to put it another way, the address of the first byte to write to. And the same for source.
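A minimal sketch of that idea, with a hypothetical function name copy_bytes: the caller adds the start offsets to the array addresses itself, so only three arguments are left and they fit in registers on ARM32.

```c
#include <stddef.h>

/* Hypothetical three-argument copy: dst is the address of the first byte
   to write and src the address of the first byte to read, so no separate
   start indices are needed. */
void copy_bytes(char *dst, const char *src, size_t n)
{
    while (n-- > 0) {
        *dst++ = *src++;
    }
}
```

A call such as copy_bytes(buf + 10, text + 2, 8); then does what a five-argument version would do with (buf, 10, text, 2, 8).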
Title: Re: text / strings in C
Post by: brucehoult on April 13, 2020, 10:13:42 am
not really. ARM addresses memory in bytes. I have saved 3 bytes of RAM on the 4 you suggest. Next I could use 16 bits but really i can't see myself at the moment with an array of more than 256 elements. I can of course make this an 8 bit version and write the 16 bit version when required.

No, that's not correct. Those function arguments (the first 4 of them anyway, on ARM) and local variables (up to 12 or so total that are live at the same time) are in CPU registers, not RAM. The registers are always 32 bits even if you put an 8 bit value into them.
Title: Re: text / strings in C
Post by: Nominal Animal on April 13, 2020, 10:20:19 am
Here is a practical example why local variables and function parameters should generally be ints and not chars or shorts or explicit-length types:
Code: [Select]
int  my_slen1(const char *p) { return __builtin_strlen(p); }
char my_slen2(const char *p) { return __builtin_strlen(p); }
When compiling the above using GCC for Cortex-M4 using Thumb instructions, the two compile essentially to
Code: [Select]
    .text
    .thumb

my_slen1:
    b   strlen

my_slen2:
    push {r3, lr}
    bl   strlen
    uxtb  r0, r0
    pop  {r3, pc}
Instead of saving memory, my_slen2() generates extra code, because the return value must be limited to 8 bits!

Similar effects happen when calling functions that take parameters that are limited to smaller range of values than the native registers support.

So, what to do?

It is a portability issue.  On 8-bit Arduinos, int is a 16-bit type, and technically requires two registers.  On 64-bit architectures, int may be just 32-bits.

Although the int_fastN_t and uint_fastN_t types would be the technically best option (with N being the smallest reasonable value for that particular variable; 8, 16, 32, or 64), not many programmers use them.  If others work on your code, they may be surprised and get it wrong, which incurs a maintenance burden.

Current POSIXy systems I use are either ILP32 or LP64, so long is the "native register type".  However, for in-memory sizes and counts, I use size_t .

Some projects define their own types, using preprocessor #if directives to choose the best mapping.  For example, you might have iregN and uregN, analogously to int_fastN_t and uint_fastN_t.  Because of the nonstandard type name, other programmers might actually read the documentation or comments on how the types are intended to be used.  However, don't fall into the WORD/DWORD/QWORD trap; better assume the types are binary (two's complement if signed) with a fixed number of bits, and have that number (or lower limit) in the type.

Most current C code seems to use size_t (for in-memory sizes and counts) and int for everything else.
Title: Re: text / strings in C
Post by: Simon on April 13, 2020, 10:21:30 am
not really. ARM addresses memory in bytes. I have saved 3 bytes of RAM on the 4 you suggest. Next I could use 16 bits but really i can't see myself at the moment with an array of more than 256 elements. I can of course make this an 8 bit version and write the 16 bit version when required.

No, that's not correct. Those function arguments (the first 4 of them anyway, on ARM) and local variables (up to 12 or so total that are live at the same time) are in CPU registers, not RAM. The registers are always 32 bits even if you put an 8 bit value into them.


So what is the difference if i give the register up to 8 bits or up to 32 bits? or is it the time taken to fill in the other 24 bits? but then the register should be empty before the 8 bits are put into it?
Title: Re: text / strings in C
Post by: Simon on April 13, 2020, 10:25:48 am
Here is a practical example why local variables and function parameters should generally be ints and not chars or shorts or explicit-length types:
Code: [Select]
int  my_slen1(const char *p) { return __builtin_strlen(p); }
char my_slen2(const char *p) { return __builtin_strlen(p); }
When compiling the above using GCC for Cortex-M4 using Thumb instructions, the two compile essentially to
Code: [Select]
    .text
    .thumb

my_slen1:
    b   strlen

my_slen2:
    push {r3, lr}
    bl   strlen
    uxtb  r0, r0
    pop  {r3, pc}
Instead of saving memory, my_slen2() generates extra code, because the return value must be limited to 8 bits!

Similar effects happen when calling functions that take parameters that are limited to smaller range of values than the native registers support.

So, what to do?

It is a portability issue.  On 8-bit Arduinos, int is a 16-bit type, and technically requires two registers.  On 64-bit architectures, int may be just 32-bits.

Although the int_fastN_t and uint_fastN_t types would be the technically best option (with N being the smallest reasonable value for that particular variable; 8, 16, 32, or 64), not many programmers use them.  If others work on your code, they may be surprised and get it wrong, which incurs a maintenance burden.

Current POSIXy systems I use are either ILP32 or LP64, so long is the "native register type".  However, for in-memory sizes and counts, I use size_t .

Some projects define their own types, using preprocessor #if directives to choose the best mapping.  For example, you might have iregN and uregN, analogously to int_fastN_t and uint_fastN_t.  Because of the nonstandard type name, other programmers might actually read the documentation or comments on how the types are intended to be used.  However, don't fall into the WORD/DWORD/QWORD trap; better assume the types are binary (two's complement if signed) with a fixed number of bits, and have that number (or lower limit) in the type.

Most current C code seems to use size_t (for in-memory sizes and counts) and int for everything else.

So should i just use int/uint instead? and let the compiler choose?
Title: Re: text / strings in C
Post by: Yansi on April 13, 2020, 10:35:15 am
Ultimately my buffer is being pumped out on an 8 bit serial port, are there any better variable types that I should use? what does GCC for ARM support?

Well, on ARM, you most of the time get more cleverly built peripherals. And having an 8bit wide SPI or UART or whatever interface bus does not necessarily mean you need to feed it 8 bits at a time. Most peripherals (you need to RTFM the exact manual specific to your device!) support wider accesses to feed in data in a more efficient manner, that is, with fewer memory accesses and fewer bus cycles. Also, you have the DMA to help with all this.

Let me demonstrate a common one:

For example on STM32 microcontrollers, it is a well known case that numerous programmers fight with SPI peripheral. That's because the data register is 16bit wide and writing to the register just like this

Code: [Select]
uint8_t data = 0x55;
SPIx->DR = data;

triggers a transmission of 2 bytes instead of one.  That is because DR is 16bit and the compiler does exactly what it is supposed to: it uses a 16bit write to a 16bit memory location (the DR register).

To correct it, one must typecast the data register to 8bit, to tell the compiler to produce just 8bit memory write access:

Code: [Select]
*(uint8_t*)&(SPIx->DR) = data;
Title: Re: text / strings in C
Post by: Simon on April 13, 2020, 10:53:30 am
The SPI port i am using is 8bits wide, if it were 16 or 32 bits wide I would just store my predefined texts in 32 bit variables so that I have 1/4 the interrupt calls when i get round to driving this with interrupts.
Title: Re: text / strings in C
Post by: Jeroen3 on April 13, 2020, 10:56:10 am
For example on STM32 microcontrollers, it is a well known case that numerous programmers fight with SPI peripheral. That's because the data register is 16bit wide and writing to the register just like this

Code: [Select]
uint8_t data = 0x55;
SPIx->DR = data;

triggers a transmission of 2 bytes instead of one.  That is because DR is 16bit and the compiler does exactly what it is supposed to: it uses a 16bit write to a 16bit memory location (the DR register).

To correct it, one must typecast the data register to 8bit, to tell the compiler to produce just 8bit memory write access:

Code: [Select]
*(uint8_t*)&(SPIx->DR) = data;
False. SPI can only be accessed with a 16 or 32 bit load/store. Using an 8 bit load/store instruction on SPI will cause a bus error. (entire APB can't do 8 bits I think)

Simon is still struggling with C string pointers and you're doing micro optimizations?
Title: Re: text / strings in C
Post by: Simon on April 13, 2020, 11:04:55 am
I'm still trying to find out what data types i am supposed to be using, another one of those best kept secrets, is it a case of making a compromise on memory use and speed of execution? i still don't get why putting 8 bits on a 32 bit bus is slower than putting 32 bits.
Title: Re: text / strings in C
Post by: brucehoult on April 13, 2020, 12:08:56 pm
I'm still trying to find out what data types i am supposed to be using, another one of those best kept secrets, is it a case of making a compromise on memory use and speed of execution? i still don't get why putting 8 bits on a 32 bit bus is slower than putting 32 bits.

It's not. It's usually the same.

Putting 8 bits four times in a row will be slower than putting 32 bits once. But that can complicate the programming a lot. And usually unnecessarily.

Just keep doing what you're doing. Write code that makes sense for your problem, not for the exact CPU you're using. It will be *good enough*.
Title: Re: text / strings in C
Post by: Simon on April 13, 2020, 12:22:40 pm
yes which is what I said earlier. I did consider the idea of putting all port and pin information into one byte but soon decided that the time required to separate the port information and pin information made it a really silly idea. i can't see how there is any way to encode anything into larger variables and save time over single memory transfers.
Title: Re: text / strings in C
Post by: Yansi on April 13, 2020, 12:29:01 pm
For example on STM32 microcontrollers, it is a well known case that numerous programmers fight with SPI peripheral. That's because the data register is 16bit wide and writing to the register just like this

Code: [Select]
uint8_t data = 0x55;
SPIx->DR = data;

triggers a transmission of 2 bytes instead of one.  That is because DR is 16bit and the compiler does exactly what it is supposed to: it uses a 16bit write to a 16bit memory location (the DR register).

To correct it, one must typecast the data register to 8bit, to tell the compiler to produce just 8bit memory write access:

Code: [Select]
*(uint8_t*)&(SPIx->DR) = data;
False. SPI can only be accessed with a 16 or 32 bit load/store. Using an 8 bit load/store instruction on SPI will cause a bus error. (entire APB can't do 8 bits I think)

Simon is still struggling with C string pointers and you're doing micro optimizations?

I think Simon is making quite some progress already, so what is wrong with giving a little bit of insight into why stuff is done the way it is? The worst thing that can happen is that he won't understand and will ask for clarification. Better than to stay dumb and know nothing.


Regarding the SPI what you call "false", you should look it up first.
https://electronics.stackexchange.com/questions/324439/stm32-spi-slave-send-data-problem
https://community.st.com/s/question/0D50X00009kHTHu/bug-in-stm32f0-spi-lowlevel-driver-llspireceivedata8-and-llspitransmitdata8
etc...  just look how a STM32F0_LL_HAL driver implements LL_SPI_TransmitData8 ...
Title: Re: text / strings in C
Post by: Simon on April 13, 2020, 12:38:57 pm
I do need to tackle one topic at a time. The SPI is 8 or 9 bit on the SAMC, so there is no point in putting data into 32 bit variables. I would incur code overhead to get it in and out. The fact is that much of the data in an embedded system is 8 bit. I do like the fact that with ARM, when I do come to deal in 16 and 32 bit variables, they will take it in their stride, unlike 8 bitters.
Title: Re: text / strings in C
Post by: Simon on April 13, 2020, 01:25:47 pm
silly question related to another thing I wrote some code for. If I have a function that works on an array variable but only on one can I do a return variable at the end of the function or should I just access the variable externally. It's an external variable anyway.
Title: Re: text / strings in C
Post by: Nominal Animal on April 13, 2020, 01:50:22 pm
So should i just use int/uint instead? and let the compiler choose?
Like I wrote, it depends on how important portability is to you, and to what sort of systems.

For code running on 32-bit processors, using int or unsigned int works well.
Title: Re: text / strings in C
Post by: Nominal Animal on April 13, 2020, 02:29:51 pm
silly question related to another thing I wrote some code for. If I have a function that works on an array variable but only on one can I do a return variable at the end of the function or should I just access the variable externally. It's an external variable anyway.
If there is a real reason why it is always the same array, just access the global variable.

If there is a possibility the same operation could be used for different arrays, pass the array (and length) as a parameter.

For example, consider:
Code: [Select]
char *uint_string(unsigned int  u)
{
    static char buffer[12]; /* Max. 99,999,999,999 */
    char *p = buffer + sizeof buffer;

    if (u > 99999999999) {
        /* Not enough room in buffer! */
        return NULL;
    }

    *(--p) = '\0';
    do {
        *(--p) = '0' + (u % 10);
        u /= 10;
    } while (u);

    return p;
}
This uses 12 bytes of RAM for the buffer, and each consecutive invocation overwrites the previous value.  The return value is a pointer to a string (in RAM) containing the unsigned integer value as a decimal string.

It would be perfectly okay to move the static char buffer[]; outside the function, for example if several functions could use the same buffer.

However, what if one needs more than one result at the same time?  We could instead do
Code: [Select]
char *uint_buffer(char *buf, size_t len, unsigned int u)
{
    char *ptr = buf + len;

    if (len > 0) {
        /* Terminate buffer; the terminator uses up one byte of len */
        --len;
        *(--ptr) = '\0';
    }

    do {
        if (len < 1) {
            /* Not enough room in buffer! */
            return NULL;
        }

        --len;
        *(--ptr) = '0' + (u % 10);
        u /= 10;
    } while (u);

    return ptr;
}

char *uint_string(unsigned int u)
{
    static char buffer[12]; /* 11 digits is enough for everyone! ;-) */
    return uint_buffer(buffer, sizeof buffer, u);
}
which provides the same uint_string(), but only as a wrapper around uint_buffer(), which takes the buffer array as a parameter.

For embedded code, I'd lean towards using the array directly (first example).

It does mean that if you realize you could reuse the function with a different array, instead of just copy-pasting and tweaking the copy to "just work", you really need to do refactoring, to extract the common code to a single function, possibly with wrapper functions for the particular cases, similar to uint_buffer(). It's not a lot of extra work, but there is the temptation to just copy-paste-tweak it; and that leads to unmaintainable bloated blob of spaghetti code, which is no fun.
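To illustrate the "more than one result at the same time" case, here is a small sketch. It carries its own copy of uint_buffer() so that it compiles on its own (this copy counts the terminating NUL against len), and total_digits() is a made-up caller, not something from the code above.

```c
#include <stddef.h>
#include <string.h>

/* Convert u to decimal, right-aligned in buf[0..len-1].
   Returns a pointer to the first digit, or NULL if it does not fit. */
char *uint_buffer(char *buf, size_t len, unsigned int u)
{
    char *ptr = buf + len;

    if (len > 0) {
        --len;                  /* one byte goes to the terminator */
        *(--ptr) = '\0';
    }

    do {
        if (len < 1) {
            return NULL;        /* not enough room in buffer */
        }
        --len;
        *(--ptr) = '0' + (u % 10);
        u /= 10;
    } while (u);

    return ptr;
}

/* Hypothetical caller: each conversion has its own buffer, so both
   strings stay valid side by side. Returns 0 if either did not fit. */
size_t total_digits(unsigned int x, unsigned int y)
{
    char xbuf[12], ybuf[12];
    const char *xs = uint_buffer(xbuf, sizeof xbuf, x);
    const char *ys = uint_buffer(ybuf, sizeof ybuf, y);

    if (!xs || !ys) {
        return 0;
    }
    return strlen(xs) + strlen(ys);
}
```

With the single static buffer of uint_string(), the second conversion would overwrite the first before both could be used.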
Title: Re: text / strings in C
Post by: IanB on April 13, 2020, 06:01:22 pm
So should i just use int/uint instead? and let the compiler choose?

For the most part, yes, you should use the ordinary types. The design of the C language is that the compiler will try to choose a representation that is most efficient for the target hardware. The only time you should really use the special types like uint8_t is for example when you very specifically want the value to fit into a hardware register of known size.

For general programming, use char, int, long, unsigned int, etc.
Title: Re: text / strings in C
Post by: Simon on April 13, 2020, 07:31:19 pm
so if I use uint the compiler will decide if i need 8, 16 or 32 bits? sounds a bit dodgy for things like writes to an spi peripheral.
Title: Re: text / strings in C
Post by: IanB on April 13, 2020, 07:59:13 pm
so if I use uint the compiler will decide if i need 8, 16 or 32 bits? sounds a bit dodgy for things like writes to an spi peripheral.

No, an int is always at least 16 bits, but it may be more if the compiler thinks that would be more efficient.

If you need at least 8 bits use "char".
If you need at least 16 bits use "int".
If you need at least 32 bits use "long int".
If you need at least 64 bits use "long long int".

If you need it to be unsigned put "unsigned" in front of it. Therefore unsigned integers of at least 16 bits are "unsigned int".

As I mentioned above, when dealing with hardware you may need to construct bit patterns of exact size and order. You can do this in C if needed, but for general programming it is better to use the regular datatypes and let the compiler have freedom to optimize your code.
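The minimum widths listed above can be checked at compile time. This is only a sketch, assuming a compiler that supports C11 _Static_assert (current GCC does); uint_bits() is a made-up helper to show what the compiler actually chose.

```c
#include <limits.h>

/* These hold on every conforming C compiler; they fail to compile otherwise. */
_Static_assert(CHAR_BIT >= 8, "char has at least 8 bits");
_Static_assert(UINT_MAX >= 0xFFFFu, "unsigned int has at least 16 bits");
_Static_assert(ULONG_MAX >= 0xFFFFFFFFul, "unsigned long has at least 32 bits");
_Static_assert(ULLONG_MAX >= 0xFFFFFFFFFFFFFFFFull,
               "unsigned long long has at least 64 bits");

/* Storage size of unsigned int, in bits, on the target being compiled for. */
unsigned int uint_bits(void)
{
    return (unsigned int)(sizeof(unsigned int) * CHAR_BIT);
}
```

The standard only promises the lower limits, so uint_bits() can report more than 16 when the compiler picks a wider int for the target.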
Title: Re: text / strings in C
Post by: Simon on April 13, 2020, 08:25:55 pm
Well this is sort of what i asked above. If using 8 bit variables causes code overhead does using the generic data types allow the compiler to optimise, so if I only need 8 bits but there is plenty of memory to use 32 bits and it's faster it does that ?
Title: Re: text / strings in C
Post by: IanB on April 13, 2020, 09:31:28 pm
Well this is sort of what i asked above. If using 8 bit variables causes code overhead does using the generic data types allow the compiler to optimise, so if I only need 8 bits but there is plenty of memory to use 32 bits and it's faster it does that ?

It's best not to overthink it. If you are sending text to an LCD you are using (probably) ASCII characters, so use "char".

If you are doing integer arithmetic or bit manipulation, use "int" or "unsigned int".

Contrary to what I think I saw earlier in the thread, it is perfectly OK to do this:

Code: [Select]
    const char* const message = "Some menu item";

This will allow the compiler to store the text string in some area of memory reserved for constant data and you promise the compiler you will never try to overwrite it. The two consts also make sure you never accidentally reassign the message pointer to point somewhere else and thus lose access to the message.

This construct is very good for menus, since the menu items do not change during the running of the program.

You could also do this:
Code: [Select]
    const char* menu_items[] = { "First item", "Second item", "Third Item" };
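As a rough sketch of how such a table could be sent to the display: lcd_putc() and lcd_puts() here are stand-ins that only record the bytes in a capture buffer so the code can be tried on a PC, and MENU_COUNT and lcd_show_menu() are made-up names. On real hardware lcd_putc() would hand the byte to the SPI driver instead.

```c
#include <stddef.h>

/* Stand-in for the real LCD output: just records what was "sent". */
char lcd_capture[128];
size_t lcd_capture_len = 0;

void lcd_putc(char c)
{
    if (lcd_capture_len < sizeof lcd_capture - 1) {
        lcd_capture[lcd_capture_len++] = c;
        lcd_capture[lcd_capture_len] = '\0';
    }
}

/* Send a NUL-terminated string one character at a time. */
void lcd_puts(const char *s)
{
    while (*s) {
        lcd_putc(*s++);
    }
}

const char *const menu_items[] = { "First item", "Second item", "Third Item" };

/* Number of entries, computed by the compiler from the initializer. */
#define MENU_COUNT (sizeof menu_items / sizeof menu_items[0])

/* Send every menu entry, one per line. */
void lcd_show_menu(void)
{
    size_t i;
    for (i = 0; i < MENU_COUNT; i++) {
        lcd_puts(menu_items[i]);
        lcd_puts("\r\n");
    }
}
```

Because the count comes from sizeof, adding or removing a menu string needs no other change.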
Title: Re: text / strings in C
Post by: Nominal Animal on April 14, 2020, 12:22:03 am
Well this is sort of what i asked above. If using 8 bit variables causes code overhead does using the generic data types allow the compiler to optimise, so if I only need 8 bits but there is plenty of memory to use 32 bits and it's faster it does that ?
If we consider code that will only run on 32-bit microcontrollers, and ignore the portability issues to other types of architectures (64-bit, 8-bit), then:

If you need at least 8 bits use [...]
Noooooo!

The proper types are provided by the compiler when you include <stdint.h> (this header file is provided by the compiler; GCC in Simon's case).

Here is the more complete, portable but complex, criterion for choosing the best integer type for each variable or structure member:

The bit array word size autodetection is actually very simple:
Code: [Select]
#if !defined(BITGROUP) && !defined(BITGROUP_BITS)
#if __WORDSIZE < 16
#define  BITGROUP  uint8_t
#define  BITGROUP_BITS  8
#elif __WORDSIZE < 32
#define  BITGROUP  uint16_t
#define  BITGROUP_BITS  16
#elif __WORDSIZE < 64
#define  BITGROUP  uint32_t
#define  BITGROUP_BITS  32
#else
#define  BITGROUP  uint64_t
#define  BITGROUP_BITS  64
#endif
#endif
#define  BITGROUPSFOR(bits)  (((bits) + BITGROUP_BITS - 1) / BITGROUP_BITS)

static inline BITGROUP bitgroup_getbit(const BITGROUP *const map, const size_t bit)
{
    const BITGROUP  mask = ((BITGROUP)1) << (bit % BITGROUP_BITS);
    return !!(map[bit / BITGROUP_BITS] & mask);
}

static inline void bitgroup_clearbit(BITGROUP *const map, const size_t bit)
{
    const BITGROUP  mask = ((BITGROUP)1) << (bit % BITGROUP_BITS);
    map[bit / BITGROUP_BITS] &= ~mask;
}

static inline void bitgroup_setbit(BITGROUP *const map, const size_t bit)
{
    const BITGROUP  mask = ((BITGROUP)1) << (bit % BITGROUP_BITS);
    map[bit / BITGROUP_BITS] |= mask;
}

static inline void bitgroup_togglebit(BITGROUP *const map, const size_t bit)
{
    const BITGROUP  mask = ((BITGROUP)1) << (bit % BITGROUP_BITS);
    map[bit / BITGROUP_BITS] ^= mask;
}

static inline void bitgroup_setbit_to(BITGROUP *const map, const size_t bit, BITGROUP state)
{
    if (state) {
        map[bit / BITGROUP_BITS] |= ((BITGROUP)1) << (bit % BITGROUP_BITS);
    } else {
        map[bit / BITGROUP_BITS] &= ~(((BITGROUP)1) << (bit % BITGROUP_BITS));
    }
}
so that to declare an array of words enough for e.g. 293 bits, you use
Code: [Select]
BITGROUP mybits[BITGROUPSFOR(293)];
The BITGROUPSFOR(bits) macro calculates the number of elements needed for bits bits, rounding up.  To get, set, clear, or toggle specific bits, you can use the bitgroup_getbit()/_setbit()/_setbit_to()/_clearbit()/_togglebit() inline helper functions.  I normally add also bit range functions, for clearing ranges of bits.  To clear the entire map, I use memset(mybits, 0, sizeof mybits);.

You can also use the BITGROUP type for unsigned integers of native register size.  You might wish to name it better, though.

The default type and size can be overridden by defining the preprocessor macros BITGROUP and BITGROUP_BITS before the above code; for example, at the GCC command line, using -DBITGROUP=type -DBITGROUP_BITS=size .
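A cut-down usage sketch follows. To keep it short it fixes the group type at 32 bits instead of autodetecting it, and the dirty_rows map with its three helper functions is a made-up example of one flag per display row.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed at 32-bit groups for brevity; normally chosen per architecture. */
#define BITGROUP            uint32_t
#define BITGROUP_BITS       32
#define BITGROUPSFOR(bits)  (((bits) + BITGROUP_BITS - 1) / BITGROUP_BITS)

static inline int bitgroup_getbit(const BITGROUP *map, size_t bit)
{
    return (int)((map[bit / BITGROUP_BITS] >> (bit % BITGROUP_BITS)) & 1u);
}

static inline void bitgroup_setbit(BITGROUP *map, size_t bit)
{
    map[bit / BITGROUP_BITS] |= ((BITGROUP)1) << (bit % BITGROUP_BITS);
}

/* Hypothetical use: one "needs redraw" flag for each of 293 rows. */
#define ROW_COUNT 293
BITGROUP dirty_rows[BITGROUPSFOR(ROW_COUNT)];

void mark_row_dirty(size_t row)
{
    if (row < ROW_COUNT) {          /* ignore out-of-range rows */
        bitgroup_setbit(dirty_rows, row);
    }
}

size_t count_dirty_rows(void)
{
    size_t row, count = 0;
    for (row = 0; row < ROW_COUNT; row++) {
        count += (size_t)bitgroup_getbit(dirty_rows, row);
    }
    return count;
}

void clear_dirty_rows(void)
{
    memset(dirty_rows, 0, sizeof dirty_rows);
}
```

With 32-bit groups the 293 flags take 10 array elements (40 bytes), instead of 293 bytes with one char per flag.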
Title: Re: text / strings in C
Post by: IanB on April 14, 2020, 12:46:12 am
If you need at least 8 bits use [...]
Noooooo!

The proper types are provided by the compiler when you include <stdint.h> (this header file is provided by the compiler; GCC in Simon's case).

Well I think that depends on your definition of the word "proper"  :)

If I have one of the special cases you list, I should use one of the special types provided for those cases (e.g. size_t for sizes and indices).

If I am just doing simple integer arithmetic with small numbers I will use "int". As long as I know that I am only guaranteed support for values from -32768 to 32767 this will be fine and will be perfectly portable. Code should never contain more decoration and special cases than are truly needed, otherwise readers will ask, "Why is this here?"
Title: Re: text / strings in C
Post by: Nominal Animal on April 14, 2020, 04:46:34 am
Well I think that depends on your definition of the word "proper"  :)
For sure.  There wasn't a good emoticon to express it.  I vacillated between "noooo" and "yes, but, butbut"  :).

Basically, I too recommended int + size_t to Simon in a message or two earlier, for the very same reasons.

Yet, the simplistic char/short/int/long int (+ long long int for compilers that support it) selection rule has failed me before (and caused me pain in existing old projects).  It works fine on any single architecture, except for the "if you choose a too small type for the native register size, you get extra machine code" detail, that might or might not matter.

However, when porting code to different architectures, with different type sizes (in particular, from an ILP32 to LP64, i.e. from 32-bit ints, longs and pointers to 32-bit ints but 64-bit longs and pointers), the types don't work that well.  That's why the "new" integer types were added to C99, after all.

So, consider the "nooooo" as a quiet sob, not a yell.

The "new" integer types were designed to solve the exact questions Simon has posed here.  When moving from 16-bit to 32-bit architectures, and again when moving from 32-bit to 64-bit architectures, code that used the simple size-based rules became inefficient.  (The integer/pointer size disparities between e.g. ILP32 and LP64 did lead to bugs, and the intptr_t/uintptr_t types are designed to fix those now and in the future, but I'm talking about inefficiencies as in compiler generating unneeded extra code to follow the standard C integer type rules because it does not know the programmer intent.  These "new" integer types "size behaviour" better expresses programmer intent, even if the hardware implementation is unchanged.)

In particular, with these "new" types, there is no reason why int_fast16_t, int16_t, or int_least16_t should all/any be the same type: the first one is the one that is suitable for temporary variables and function parameters (i.e., is of native integer arithmetic size), second is exactly 16 bits, and the third is at least 16 bits, but the architecture can use a larger type if keeping it to just 16 bits would mean extra code (say, accessing the upper 16 bits of a 32-bit word would require bit shifting).

While there is nothing special in their hardware implementation, the interesting part in them is the rules on how their sizes are defined on different architectures, and how useful they can be -- for portable code.  But few C programmers write truly portable code, so not many C programmers know this.  (Which is kinda why I'm harping about it here, even though this is just a thread where Simon is looking for quick hints on how to progress.)
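A small sketch of how the different families might be combined in practice; sum_samples() and its parameters are invented for illustration. Stored data uses an exact-width type, while the loop counter and accumulator use the "fast" types so the compiler can pick whatever suits the registers.

```c
#include <stdint.h>

/* samples: data with a fixed 16-bit layout (e.g. from an ADC or a packet).
   count:   needs only 8 bits of range, but may live in a full register.
   return:  needs at least 32 bits, so that summing 16-bit values fits. */
int_fast32_t sum_samples(const int16_t *samples, uint_fast8_t count)
{
    int_fast32_t total = 0;
    uint_fast8_t i;

    for (i = 0; i < count; i++) {
        total += samples[i];
    }
    return total;
}
```

On a 32-bit ARM the fast types here would typically be 32 bits wide, so no extra masking instructions are needed, while the array still takes only two bytes per sample.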
Title: Re: text / strings in C
Post by: Simon on April 14, 2020, 08:33:39 am
so who created the uint8_t stuff?
Title: Re: text / strings in C
Post by: SiliconWizard on April 14, 2020, 03:45:47 pm
However, when porting code to different architectures, with different type sizes (in particular, from an ILP32 to LP64, i.e. from 32-bit ints, longs and pointers to 32-bit ints but 64-bit longs and pointers), the types don't work that well.  That's why the "new" integer types were added to C99, after all.

This was a longgg overdue addition. For a standardized language that was supposed to be good for low-level stuff, not having standard exact-width (and guaranteed minimal width, with the least* variants) types was mind-boggling.

Even much higher-level languages such as Ada had this.

Title: Re: text / strings in C
Post by: Nominal Animal on April 15, 2020, 02:43:56 am
so who created the uint8_t stuff?
I'm not sure who came up with them first, but by the mid-nineties, everyone agreed they were a good, necessary thing, and the C standard included them in C99, also known as the ISO/IEC 9899:1999 standard.

Although you need to #include <stdint.h> to use them, they aren't really a part of the standard C library, and are provided by the compiler itself.  So, even when you are writing freestanding code, say an operating system kernel, or bare metal code for a microcontroller, this is still available.  It is very much part of the core language itself, ever since C99.  (Although, *all* C compilers, even ANSI C/C89 ones, can provide them; and even if one does not, anyone can trivially add it themselves, so basically all C compilers, even oddball specialized ones, can be expected to provide these.)

And SiliconWizard is exactly right, C should have had these from the get go.

IanB is also right in that most C programmers are still unaware of them (well, everyone knows about intN_t and uintN_t and size_t, but very few regularly use int_fastN_t and uint_fastN_t -- not even myself), but I think that is a human problem -- the tutorials don't describe or recommend their use, and the huge amount of existing C code doesn't use them much either; some human C programmers never encounter them at all.

In real life, it is much more common to use custom type names in library code.  For example, glib uses g_uint, Windows APIs use DWORD et cetera.

If I were to write my own OS kernel or a Hardware Abstraction Library for programming a microcontroller or embedded system on bare metal in C or the appropriate subset of C++, I would probably use unsigned char for all strings, size_t for sizes, inative_t and unative_t for native register size signed and unsigned integer types (8, 16, 32, or 64 bit two's complement binary), inativeN_t and unativeN_t instead of int_fastN_t and uint_fastN_t, and intN_t and uintN_t for fixed-size signed and unsigned integer types.

If the Arduino environment had done that, it would be much easier to write code that works near-optimally on different hardware architectures (like 8-bit AVRs and 32-bit ARMs).
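A very rough sketch of what such a project header could look like. Everything in it is hypothetical: NATIVE_BITS is a made-up macro, and guessing it from the pointer width is only an assumption that is wrong on some targets (an 8-bit AVR has 16-bit pointers), so a real project would set it per target.

```c
#include <stddef.h>
#include <stdint.h>

/* Native register width; override with -DNATIVE_BITS=8 and so on.
   The default guess from pointer width is an assumption, not a rule. */
#ifndef NATIVE_BITS
#if UINTPTR_MAX > 0xFFFFFFFFu
#define NATIVE_BITS 64
#elif UINTPTR_MAX > 0xFFFFu
#define NATIVE_BITS 32
#else
#define NATIVE_BITS 16
#endif
#endif

#if NATIVE_BITS == 64
typedef int64_t  inative_t;
typedef uint64_t unative_t;
#elif NATIVE_BITS == 32
typedef int32_t  inative_t;
typedef uint32_t unative_t;
#elif NATIVE_BITS == 16
typedef int16_t  inative_t;
typedef uint16_t unative_t;
#else
typedef int8_t   inative_t;
typedef uint8_t  unative_t;
#endif

/* Example use: accumulator in a native-width register, size_t for the count. */
unative_t sum_bytes(const unsigned char *data, size_t len)
{
    unative_t total = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        total += data[i];
    }
    return total;
}
```

The point of the custom names is that application code only ever mentions inative_t and unative_t, and the mapping is changed in one place when the target changes.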
Title: Re: text / strings in C
Post by: Simon on April 15, 2020, 09:16:05 am
Well, I would guess that using a specific size variable will make code more portable. Not that that is an issue with bare metal programs, as they are not meant to be portable.
Title: Re: text / strings in C
Post by: Nominal Animal on April 15, 2020, 11:37:23 am
Surprisingly, there is quite a lot of bare-metal C code, that is portable between wildly different architectures.  And I'm not talking only about the Linux kernel, either -- although it is probably the most well known example.

Any embedded device that has a product lifetime longer than say 5 years should consider the portability aspect, really; it may make a significant difference to the BOM cost, later on.  The portability aspect is the only thing that can save on the software development cost.  If you look at many embedded devices, like routers, you'll notice they can completely switch hardware architectures within the same product, between versions/revisions.  You don't do that, unless (most of) the same code can work on both.

Others who work on actual commercial embedded products could chime in, as I don't, but even using an interface shim layer using the custom types I mentioned on top of existing HALs, can make the actual product much easier to port between vendors (and their HALs).

In particular, for libraries and HALs, one must remember that it does not matter if you have one or more implementation, as long as the user-developer facing side is the same across all of them; then, the user-developer does not even need to port their code between the architectures, as it should Just Work -- much like in the Arduino system.  (Except that because Arduino folks did not consider the integer types, there is a lot of stuff that makes life hard for library writers, and compiled Arduino code less than optimal, particularly when comparing 8-bit AVR and 32-bit ARMs.)



A lot of what I have written here, Simon, is something to consider only, and perhaps let simmer; something you might recall when encountering a related problem.  In particular, do feel comfortable using just size_t and int and unsigned int, because that's how most existing C code does it.

Besides, especially during the learning phase, it is most important to get stuff working, even if it is not that elegant/clean/optimal, as writing the code is just a small part of software engineering, and you need to have somewhat working code to get experience on the rest, especially testing, maintenance (and porting, yes), documentation, and so on.

Also, I've found out that if something turns out to be useful in the long term or in more than one environment, you do end up rewriting it, incorporating all the features (and dropping the unneeded ones) and details one has learned from experience.  So, it is not "ugly" or "bad" to write code that you know is far from optimal!  (Security, on the other hand, must be designed in to the software, and cannot be bolted on top afterwards.)

Indeed, one of the common programmer faults is premature optimization.  Algorithmic and system-level optimizations always yield much better results than code-level optimizations, and my own experience says that one shouldn't bother with code-level optimizations at all before the first rewrite; the actual use and testing of the "crude"/"naïve" version always teaches me so much about the actual human-scale problem/task at hand, that code optimization before that is usually just wasted time.  There are exceptions, of course, but there is something in code-level optimization that tends to attract programmer minds, and being aware of it and that it doesn't matter much at all in real life, is kinda important.
Title: Re: text / strings in C
Post by: Simon on April 15, 2020, 01:16:24 pm
Barebones code cannot be portable unless the new device has the exact same peripheral functionality and registers. Application code can be portable. I assume this is the aim of any HAL: to abstract out the hardware so that the same top-level application code works. Currently I am working with the SERCOM of the SAMC in SPI mode. It is unlikely that outside of the SAMC family the registers will be the same. I have already found the SAMD registers to be different for the timer/counter even though it's a similar device. But yes, I aim to present to my application code functions to call to interact with the hardware that can be rewritten for other architectures, so that after a change of target, once I have rewritten my low-level drivers, I can just port the application over. My SPI libraries are in two files: one that does hardware interaction and one that creates useful things that the main code can call.
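
A minimal sketch of that two-layer split, shown in one listing for brevity. All names here (spi_hw_send_byte, spi_send_string, spi_hw_log, spi_hw_count) are made up for illustration, and the low-level part is a host-side stand-in that records bytes in a buffer instead of touching any SERCOM register, so it can be compiled and tried on a PC:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ---- Low-level layer (would live in its own file, e.g. a hypothetical spi_hw.c) ----
 * Assumption: on real hardware this function would wait for the transmitter
 * and write the byte to the SPI data register. Here it only logs the byte,
 * so the sketch runs on a host machine. */
uint8_t spi_hw_log[64];
size_t  spi_hw_count = 0;

void spi_hw_send_byte(uint8_t byte)
{
    if (spi_hw_count < sizeof spi_hw_log)
        spi_hw_log[spi_hw_count++] = byte;
}

/* ---- Upper layer (the "useful things" file the main code calls) ----
 * Sends a NUL-terminated string one byte at a time and returns the number
 * of characters sent. It only knows about spi_hw_send_byte(), so it does
 * not change when the low-level file is rewritten for another chip. */
size_t spi_send_string(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != '\0') {
        spi_hw_send_byte((uint8_t)s[n]);
        n++;
    }
    return n;
}
```

With this split, moving to another target means rewriting only the body of the low-level function; spi_send_string() and everything built on it stay as they are.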
Title: Re: text / strings in C
Post by: SiliconWizard on April 15, 2020, 06:59:53 pm
Surprisingly, there is quite a lot of bare-metal C code, that is portable between wildly different architectures.

Yes. Most of the code I've ever written was completely portable. Even low-level stuff. Obviously hardware-related code would need to be modified for a different target, but even that you can write as portably as possible, so the porting effort is minimal. It saves a gigantic amount of time in the long run. I'm sad to see that many people never realize it, and kind of keep "rewriting" the same stuff over and over again (possibly with the same initial bugs to iron out), as though they were paid by the number of lines of code. Sadly, I think this is very close to being the case for many employed developers.

Title: Re: text / strings in C
Post by: Nominal Animal on April 16, 2020, 04:26:51 am
Barebones code cannot be portable unless the new device has the exact same peripheral functionality and registers.
No, that is not true.

There is a difference between "being portable" and "compiling as-is" on different architectures.  The code can have modular parts, where alternative implementations of some part all provide the exact same interfaces, differing only in their internal implementation.  This is usually called abstraction -- and if you do it in a library form for a set of hardware, you get a hardware abstraction layer, or a driver.  But at the core, it is just a modular approach: choosing how a detail is implemented (often, but not always, based on the hardware it needs to access), within a single software project.
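
As a rough illustration of "same interface, different internals", here is a small sketch that uses a structure of function pointers.  This is only one possible mechanism, and every name in it (struct byte_sink, capture_sink, checksum_sink, sink_puts) is hypothetical.  Both implementations run on a host machine; on a microcontroller, one of them would be the part that writes to the actual peripheral.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The interface every implementation provides (hypothetical names). */
struct byte_sink {
    void (*put)(uint8_t byte);
};

/* Implementation A: stores the bytes in a buffer. */
uint8_t capture_buf[32];
size_t  capture_len = 0;

static void capture_put(uint8_t byte)
{
    if (capture_len < sizeof capture_buf)
        capture_buf[capture_len++] = byte;
}

const struct byte_sink capture_sink = { capture_put };

/* Implementation B: keeps only a running sum of the bytes, modulo 256. */
uint8_t checksum_value = 0;

static void checksum_put(uint8_t byte)
{
    checksum_value = (uint8_t)(checksum_value + byte);
}

const struct byte_sink checksum_sink = { checksum_put };

/* Portable code: written once against the interface, and works with any
 * implementation.  Returns the number of characters passed to the sink. */
size_t sink_puts(const struct byte_sink *sink, const char *s)
{
    size_t n = 0;

    assert(sink != NULL && sink->put != NULL && s != NULL);
    while (s[n] != '\0') {
        sink->put((uint8_t)s[n]);
        n++;
    }
    return n;
}
```

Calling sink_puts(&capture_sink, "AB") and sink_puts(&checksum_sink, "AB") runs the exact same portable function; only the part behind the interface differs.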

(Note that I am writing the following in case someone finds it interesting.)

There are four common ways this can be done:

In all four cases, the way the C standard defines the integer types (char, short, int, long, and their unsigned variants) makes it hard to write efficient portable code that works on different hardware (different register and word sizes, and memory access methods).  Custom types, on top of exact-width intN_t/uintN_t, size_t, and int_fastN_t/uint_fastN_t, much better match the programmer intent, while still allowing the compiler to generate optimal code.
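
A small sketch of what such custom types could look like.  The type names (spi_byte_t, lcd_col_t) and the function spi_checksum() are invented for this example; only the <stdint.h> types underneath them are standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical project-specific types, each stating the intent:
 *   spi_byte_t: exactly 8 bits, because that is what goes on the wire.
 *   lcd_col_t:  a column index that needs at least 0..255; the compiler
 *               may pick whatever size is fastest on the target. */
typedef uint8_t      spi_byte_t;
typedef uint_fast8_t lcd_col_t;

/* Sum of all bytes, modulo 256.
 * The accumulator is uint_fast8_t, which may be wider than 8 bits on a
 * 32-bit target, so the result is masked explicitly instead of relying
 * on wraparound at 8 bits. */
spi_byte_t spi_checksum(const spi_byte_t *data, size_t len)
{
    uint_fast8_t sum = 0;

    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        sum = (uint_fast8_t)((sum + data[i]) & 0xFFu);

    return (spi_byte_t)sum;
}
```

The code states only the range it needs for the accumulator, so the compiler is free to keep it in a full-width register where that is cheaper, while the bytes on the wire remain exactly 8 bits wide.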

One could of course argue that because it is not the exact same preprocessed C source for different architectures, it is not exactly the same C code, but I disagree, because the code is part of the same project, is in the same intertwined and interconnected source code set, written and maintained by the same people.