Author Topic: Crazyness: C and C++ mixing (Read 10645 times)

Siwastaja · « **Reply #25 on:** June 14, 2021, 03:36:11 pm »

When it comes to RAII in embedded microcontroller work, I don't think it's the best pattern, or any good at all.

Despite its name, the "strength" of RAII is not in initialization, but in deinitialization when the objects fall out of scope.

For initialization, it's just a bit of syntactic sugar: calling a constructor with (params) instead of init_object(&object, params), and then error-checking through exceptions instead of "if(!object)". Both are short and readable, just a bit different.

But the real argument for RAII in desktop is that you don't accidentally forget to deinit, and don't need to do it explicitly, it happens automatically when the object goes out of scope. Great, I agree!

But on MCUs, the #1 pattern is to never deinitialize at all! You allocate mostly fixed resources as the very first thing in main(), even on quite complex projects. And further, when you do need to deinitialize, that is always for some very good reason, and it's not going to be a case of "oh, let the compiler do it somewhere implicitly, I don't care"; no, it's going to be something where the exact timing is likely going to matter, it might interfere with other peripherals in twisted ways, and you want to handle the whole deinit/reinit/whatever shebang on the lowest possible level, very likely having to break "nice abstractions".

High level properly designed C++ projects are amazing cathedrals, and yes they do exist, but in real life it's not always that easy, especially on a small MCU with peripherals that connect to each other by design.

SiliconWizard · « **Reply #26 on:** June 14, 2021, 04:48:37 pm »

Just to add a quick note about the original question, which was mixing C and C++ in the same project.

I have done this in the past - never in embedded projects, but on some "desktop" projects, for which I had to use a library written in one language within a project mainly written in another. So that's pretty much what the OP describes. The first step when deciding to do this is to have a good rationale for choosing one particular language, and for choosing one particular library written in another. Once it's clear that this is a good idea (or maybe sometimes the only option), you can proceed.

First, define and implement an interface - that may be a thin layer over the original interface. Of course, if you're using a C library in a C++ project, you may not need to define an extra interface at all - C++ can use C functions provided they are declared as extern "C". Now if this is the opposite (using a C++ lib in a C project, something I've done a couple times too), obviously you'll need to add a bridge interface. This will be written in C++ (to be able to use the C++ library) and will expose C functions (and possibly data structures) that can be called from C.

Compiling C as C++ is rarely a good idea and actually yields no benefit, except for not necessarily having to care about 'extern "C"', but really, that's something basic to learn. C is definitely NOT a subset of C++. Many rules are different, and this is a nice recipe for headaches and posting a lot of annoying questions on dev forums. Sure, if you're writing C code *yourself* that you want to be compilable as C++, it's doable. But you'll need to know what is not C about that. Now if this is some third-party C code, chances are it will require a lot of modifications to even be compilable. Rarely worth it.
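To make "many rules are different" a bit more concrete, here is a small, non-exhaustive sketch. It is a C++ file; the comments note what valid C would do differently. The function names are made up for illustration.

```cpp
#include <cstddef>
#include <cstdlib>

// 1. A character constant has type int in C, but type char in C++.
//    So sizeof('a') is sizeof(int) in C and 1 in C++.
constexpr std::size_t char_literal_size = sizeof('a');

// 2. C converts void* to any object pointer implicitly; C++ needs a cast.
//    Valid C:  int *p = malloc(count * sizeof *p);
inline int *make_ints(std::size_t count)
{
    return static_cast<int *>(std::malloc(count * sizeof(int)));
}

// 3. "new", "class", "template" and friends are ordinary identifiers in C
//    but keywords in C++, so C code like `int class = 0;` fails to compile.

// 4. A string literal is char[N] in C but const char[N] in C++,
//    so `char *s = "text";` is ill-formed in C++11 and later.
inline const char *greeting()
{
    return "text";
}
```

Third-party C code tends to hit several of these at once, which is why compiling it as C (and only declaring its interface to C++) is usually less work.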

So if you have a C lib that you want to use within C++, just compile it as C, and make sure its interface (through some header file) is enclosed in 'extern "C"'. That's all there is to it.
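A minimal sketch of that header pattern, squeezed into one file for illustration. The library name sensor.h and the function sensor_read_raw() are hypothetical; in a real project the middle part would live in a .c file built by the C compiler.

```cpp
// --- sensor.h: header of a hypothetical C library ---
#ifdef __cplusplus
extern "C" {            // only seen by C++ compilers; C compilers skip it
#endif

int sensor_read_raw(int channel);

#ifdef __cplusplus
}
#endif

// --- sensor.c stand-in: normally compiled as C, shown here so the
//     sketch links on its own ---
extern "C" int sensor_read_raw(int channel)
{
    return 100 + channel;   // dummy value instead of a real ADC read
}

// --- C++ caller: uses the C function like any other function ---
inline int read_scaled(int channel)
{
    return sensor_read_raw(channel) * 2;
}
```

The #ifdef __cplusplus guard lets the same header be included from both C and C++ sources, so the library itself does not need to change.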

This can apply to any other mix of languages. Some complex projects can be made of pieces written in different programming languages; this can actually make sense. So in general, each will be compiled with its own compiler, an interface between languages will be required, and that's it. In particular, many languages can call C stuff, even Ada!

Just my 2 cents.

Unixon · « **Reply #27 on:** June 14, 2021, 06:22:03 pm »

Thinking of C and C++ as two different languages seems wrong, it is one language that comes in full featured and reduced modes.
Please correct me here, but I tend to think that the only reason to choose "pure C" is when a compiler for full C++ mode is not available for a particular hardware architecture.

Siwastaja · « **Reply #28 on:** June 14, 2021, 06:32:50 pm »

Quote from: Unixon on June 14, 2021, 06:22:03 pm

Thinking of C and C++ as two different languages seems wrong, it is one language that comes in full featured and reduced modes.

No, no and no. It makes no sense to spell it out to you specifically; you'll find a lot of educational material and discussion about this using the popular search engines. But in a nutshell, C is not a subset of C++. The seemingly common parts have many more or less subtle differences, and you notice it right away when you try to compile any large/complex C project using a C++ compiler (in C++ mode).

They are both properly standardized and specified languages developed by different steering groups resulting in different standards.

And obviously, born almost two decades earlier, C is completely its own language and guess what, there are C compilers in existence, which are not "C++ compilers working in reduced mode".

Also there are many good reasons to choose "pure C" over some subset of C++, one of which is actually to keep harmful programmers out of the project.

Unixon · « **Reply #29 on:** June 14, 2021, 07:11:43 pm »

Quote from: Siwastaja on June 14, 2021, 06:32:50 pm

and guess what, there are C compilers in existence, which are not "C++ compilers working in reduced mode".

Sure, I am aware of that. This is exactly my argument about the reason behind choosing C instead of C++.

p.s. After re-reading the list of C/C++ incompatibilities, I would say that where those occur, C allows for more bad coding practices.
Here I'm referring to Wiki [https://en.wikipedia.org/wiki/Compatibility_of_C_and_C%2B%2B].

AntiProtonBoy · « **Reply #30 on:** June 15, 2021, 03:13:55 am »

Quote from: Siwastaja on June 14, 2021, 03:36:11 pm

When it comes to RAII in embedded microcontroller work, I don't think it's the best pattern, or any good at all.

If the language offers the feature, then why not use it as needed? Every time you need to manage state and there is a requirement to clean up, RAII is the best pattern for the job. The resource in question could be literally anything, ranging from memory to IO ports that control an external device, or whatever.
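As a rough sketch of what I mean for a non-memory resource: a guard that asserts a chip-select line for the duration of a scope. The chip_select_active variable is a made-up stand-in for a GPIO register, and no exceptions are needed for this to be useful; early returns are enough.

```cpp
// Hypothetical stand-in for a GPIO pin; on real hardware this would be
// a write to a memory-mapped register.
static bool chip_select_active = false;

class ChipSelectGuard {
public:
    ChipSelectGuard()  { chip_select_active = true; }   // assert on entry
    ~ChipSelectGuard() { chip_select_active = false; }  // release on exit

    // A guard owns the line for one scope; copying it makes no sense.
    ChipSelectGuard(const ChipSelectGuard &) = delete;
    ChipSelectGuard &operator=(const ChipSelectGuard &) = delete;
};

// Returns true if the line was asserted while the transfer ran.
inline bool transfer_byte_ok()
{
    ChipSelectGuard guard;
    if (!chip_select_active)
        return false;       // early return: the destructor still runs
    return true;            // the line is released on every path out
}
```

Whether the release timing is acceptable for a given peripheral is a separate question; the guard only guarantees that the release is not forgotten.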

Nominal Animal · « **Reply #31 on:** June 15, 2021, 09:35:20 am »

Quote from: SiliconWizard on June 14, 2021, 04:48:37 pm

Just to add a quick note about the original question, which was mixing C and C++ in the same project.

No, the OP just didn't word it properly. Everything they mentioned was specifically not standard C nor C++, but the kinds of freestanding/oddball libraries using subsets of C++ and/or C in embedded environments.

Yes, extern "C" { ... } pattern is relatively common in these environments, simply because they do mix both C and C++. Typically the main reason is not so much language compatibility, but link-time and symbol compatibility, as GCC does not "mangle" C symbols like it does C++ symbols. Even Arduino core <Arduino.h> includes that.

RAII does not apply well to embedded environments with limited resources, because the vast majority of objects are always present (on ARMs and AVRs, often initialized by the bootloader, by copying initial data structures from ROM/Flash, so that when code execution starts, all static initialization has already been done).

To ensure the order of objects requiring dynamic initialization (probing etc., say enumerating sensors on an I²C bus) across compilation units, GCC/G++ init_priority attribute is sometimes needed to ensure the initialization order required by hardware, when those aren't trivially expressed by the C++ code. So, typical C++ objects used in this environment are used in a very different manner/pattern than what you see in typical application or system-level C++ code: vast majority of nontrivial objects in this environment are declared statically in file scope, because anything else would be a waste of resources.
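A small sketch of the init_priority attribute, assuming GCC or Clang. The Device type and the object names are made up; the point is only that the lower priority number is constructed first, regardless of definition order.

```cpp
// Plain ints are zero-initialized before any dynamic initialization runs,
// so they are safe to use from the constructors below.
static int init_order[2];
static int init_count;

struct Device {
    explicit Device(int id) { init_order[init_count++] = id; }
};

// GNU extension: lower numbers are constructed first. Values up to 100
// are reserved for the implementation.
// "sensor" is defined first, but "bus" is constructed before it.
Device sensor __attribute__((init_priority(2000))) = Device(2);
Device bus    __attribute__((init_priority(1000))) = Device(1);
```

Without the attributes, objects in one translation unit are constructed in definition order, and the order across translation units is unspecified; the attribute is what lets hardware-imposed ordering be stated explicitly.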

Besides, the raison d'être for the RAII approach is exception safe resource management – and as I've described, exceptions are not supported by most of these environments at all. It is just not the correct tool for the job here.

Simply put, neither the typical C++ nor the typical C development paradigms are directly suitable for this not-very-standard environment. Parts and aspects, yes, definitely; this isn't that different. But the differences are big enough to trip people.

I have not described any specific paradigms that are suitable here, because they are heavily influenced by the base libraries available, as well as the regulatory domain one wants to conform to (MISRA aka Motor Industry Software Reliability Association in particular). In short, there isn't one; there are many. And that, too, means that it is best to start from minimal expectations, and build up from there.

Personally, I do not even have "one": I have a set, like a toolbag, from which I select the ones that seem to fit the problem best, and am continuously learning new ones. I will never, ever know what the "best" one is, because I never use just one in isolation; and for each use case, the set is different. Even the license for the work product is a tool. This approach seems to work quite well.

Tagli · « **Reply #32 on:** June 15, 2021, 10:22:28 pm »

RAII becomes irrelevant in these 2 cases:

1. Dynamic allocation isn't used in the project.
2. Dynamic allocation is used, but during initialization only and the objects never get destroyed.

Both of these cases are common in embedded systems with constrained resources.

I think dynamic allocation helps to organize object initialization order, and makes it easier to write generalized & easy to configure classes. In my projects, I generally use "allocate during initialization only and never destroy anything" method.

Placement new is another good C++ tool that allows object initialization on statically allocated memory buffers. It can help avoid using the heap.
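A minimal sketch of that, combined with the "initialize once, never destroy" approach. The Logger type and init_logger() are hypothetical names for illustration.

```cpp
#include <cstdint>
#include <new>      // declares placement operator new

struct Logger {
    explicit Logger(std::uint32_t baud) : baud_rate(baud) {}
    std::uint32_t baud_rate;
};

// Statically allocated, correctly aligned raw storage. No heap involved.
alignas(Logger) static unsigned char logger_storage[sizeof(Logger)];
static Logger *logger = nullptr;

// Called during start-up. The object is constructed on the first call
// and never destroyed; later calls return the same object unchanged.
inline Logger &init_logger(std::uint32_t baud)
{
    if (logger == nullptr)
        logger = new (logger_storage) Logger(baud);
    return *logger;
}
```

Compared to a plain static object, this lets the constructor arguments be decided at run time (for example after probing hardware) while the memory footprint stays visible to the linker.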

westfw · « **Reply #33 on:** June 16, 2021, 12:30:06 am »

Quote

see Kate Gregory's excellent presentation on the subject.

Watched it (again, actually.) While her points make a lot of sense for desktop application developers, you'll note that she pretty much IMMEDIATELY recommends using exactly the features that Animal mentions are frequently not present in "freestanding C++ environments." "Things will be much simpler and understandable if you use C++ <String> instead of "char*" arrays, and <vector> instead of C arrays (and the rest of <algorithm> instead of re-inventing wheels.) Also, avoid using pointers."

That's swell in a traditional C++ environment, but fails pretty immediately in most microcontroller environments. Not only is the STL frequently "not included", but the usual implementations are full of features (like dynamic allocation) that are actively avoided in small embedded environments. (For example, Arduino has its own simplified implementation of <String>, but it's widely suspected of being broken for "typical" usage, because of the way it thrashes memory.)

Nominal Animal · « **Reply #34 on:** June 16, 2021, 06:59:13 am »

You hit the nail exactly, westfw.

This is not about "normal" C or C++ or their relative merits. This is about a funky environment that works best with a specific subset of both, both in the language spec, and in the approach. Plus quite a few utterly nonstandard GNU and ELF tricks, because they're the most widely supported toolchain (with those tricks supported by many non-GNU toolchains also).

As an analogy:

Many people know that when dealing with aluminium, especially cast aluminium, you can actually use woodworking tools to work with it. They don't work perfectly, and dedicated aluminium tools will perform better and leave a nicer finish; but you can do it in a pinch without damaging your woodworking tools if you do it carefully (not forcing the tool, letting it do its job at its pace; not letting the tools bind or jam, heat up, etc).

My approach here is to AVOID starting with that, and instead examine the properties of aluminium and see how it can be machined and worked with, and what kind of tools work best for it; what to avoid, what to do in a pinch, and what some of the hard-won practical tips and tricks are.

I believe, and claim, that starting by comparing to woodworking or working with steel and hard metals, is problematic, because it does not "build" new knowledge; it relies on existing knowledge being correct, and teaching by comparison. Instead, I want learners to start at the very basics, so they can expand their understanding, tie it to whatever they already know – and optionally fix/expand their knowledge elsewhere, if they discover they learned/believe something incorrectly – without having to start with their existing knowledge and habits and change those to apply to this situation.

It makes a lot of sense, even if it offends those who insist on comparing C and C++ merits, and using a singular tool for everything ignoring the task at hand.

Example: ELF-assisted automatically-collected arrays

Since most embedded toolchains use the ELF file format for object file representation, we can use the ELF file format properties, and the linker, to do useful work at link time. This is most commonly used to collect variables and objects declared at random places into a single, contiguous array of memory; either RAM or ROM/Flash.
The variables or objects only need to be declared in the file scope, but can be static (their name/symbol not exported outside the compilation unit or scope), with a custom section attribute, using syntax __attribute__((section ("name"))).

For full control of how sections are mapped to the final binary or target address space, a linker file is used. However, most/all default to a linker file that has catch-all rules based on prefixes; for example, that ".rodata.foo" is merged with read-only data, ".rodata", and so on; the linker even provides symbols whose address corresponds to the beginning and end of these sections. So, for simple cases, like collecting structures or objects that define a supported command the embedded device provides into a single consecutive array with known size, one only needs to add the section attributes, declare the section start and end "variables" as externs, and that's it; the linker will do the work for you, even if the structures and objects are defined in a number of different compilation units (separate object files).

The only "trick" here is that each of the structures/objects/variables thus collected needs to have a specific size; and this is affected by packing and padding rules. Either the size must be the same for all objects and match that of an array element of that object type, in which case it can be treated as a normal array; or the exact size must be at the beginning of the object, so that the "array" can be traversed like a list.

For base type objects (data pointers, for example), or objects of the same type with a suitable size (end padding is often a tricky problem), you don't need to bother, and just keeping the structures as C++ will work absolutely fine. (This is exactly how GCC implements static initializer and finalizer functions: their addresses are collected in .init_array and .fini_array sections, which the library start and exit code uses to call those functions without parameters.)

AIUI, C rules differ, so you may need to use extern "C" { ... } and declare them as C structures, with members in specific order and explicit padding members (making each N-bit member aligned to N-bit boundary, with largest members first), to get this to be portable across different hardware architectures, even across 8-bit/32-bit ones. I would also use the C rules for objects of varying size, with the size as the first member in each object.
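One way to keep such a layout honest is to let the compiler check it. The following is only a sketch: command_record and its members are invented for illustration, with members ordered largest first and an explicit padding member, as described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

struct command_record {
    std::uint32_t handler_id;    // offset 0, 4 bytes
    std::uint16_t flags;         // offset 4, 2 bytes
    std::uint8_t  name_length;   // offset 6, 1 byte
    std::uint8_t  padding;       // offset 7, explicit instead of implicit
};

// Compile-time checks: the build fails if a target lays this out differently.
static_assert(std::is_standard_layout<command_record>::value,
              "record must stay a plain C-compatible structure");
static_assert(sizeof(command_record) == 8,
              "no hidden padding expected");
static_assert(offsetof(command_record, flags) == 4,
              "flags must follow handler_id directly");
static_assert(offsetof(command_record, name_length) == 6,
              "name_length must follow flags directly");
```

If the size matches the sum of the members, consecutive records placed in the same section can be walked as an ordinary array.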

Usually, a bit of preprocessor macro magic is used, so that all the source code shows is something like
EXPORT_COMMAND("foo", command_processing_function);
in the file scope of a module or source file implementing a specific command or command set. The command_processing_function does not even need to be exported; it is sufficient for the symbol to be visible at that point in the file scope.
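A sketch of how such a macro might look, assuming an ELF target with GCC or Clang and the GNU linker's automatic __start_NAME / __stop_NAME symbols (provided for sections whose name is a valid C identifier). The struct command layout, the section name "commands", and the two example handlers are my own choices for illustration.

```cpp
#include <cstring>

struct command {
    const char *name;
    int (*handler)(int argument);
};

// Defined by the linker: first entry and one-past-the-last entry of the
// "commands" section.
extern "C" const command __start_commands[];
extern "C" const command __stop_commands[];

// Places one entry into the "commands" section. "used" keeps the compiler
// from discarding the otherwise unreferenced static object.
#define EXPORT_COMMAND(name_, handler_)                                  \
    static const command command_entry_##handler_                        \
        __attribute__((section("commands"), used)) = { name_, handler_ }

static int cmd_double(int argument) { return 2 * argument; }
EXPORT_COMMAND("double", cmd_double);

static int cmd_negate(int argument) { return -argument; }
EXPORT_COMMAND("negate", cmd_negate);

// Walks the linker-collected array; no central table has to be maintained.
inline const command *find_command(const char *name)
{
    for (const command *c = __start_commands; c != __stop_commands; ++c)
        if (std::strcmp(c->name, name) == 0)
            return c;
    return nullptr;
}
```

In a real firmware the EXPORT_COMMAND lines would be spread over many source files, and a custom linker script (or the default prefix rules) decides whether the section ends up in RAM or Flash.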

I hope you see what this can mean for typical command-processing firmware implementation; how much cleaner and simpler it can make both the source organization and the build machinery. Yet, it is rarely used, because it is not something you use or teach others about in standard hosted environments, because of non-standardness and limited portability there. Here, the situation is different. It is perfectly suitable for the approach/paradigm in this environment.

(This is also exactly how the Linux kernel implements kernel module information including licenses and module options/parameters: it uses the linker to do the work. That's where I initially learned about it. I have used it in systems programming on ELF-based architectures, too; it works fine in userspace in Linux, Mac OS, BSDs, etc.)

netmonk · « **Reply #35 on:** June 16, 2021, 11:14:53 am »

Well, I'm more familiar with C than with C++ idiom. (I come from ASM and TP.)
My issue was that I need a lorawan library written in C++ to use in a C project.
As far as I can tell, there isn't a C library I can use on ESP32 with esp-idf.

For a little testing, I was able to use the exemple.cpp provided by the lorawan library to send a packet successfully to TTN. But I already have working code for an i2c temp sensor in C, and I'm more familiar with C, so I was asking what the strategy would be.
I just don't want to learn bloated C++ or whatever you call it; for 25 years I have been allergic to OOP in any form. (It started at univ when my partner was dealing with all the Java work while I was dealing with all the C work.)

I remember one of the HFT devs I was working with a few years ago saying, after interviewing candidates: "that's horrible, they cannot do a *hello world* without libboost".
Maybe I'm ultra biased, maybe I'm stupid, but I'm really allergic to C++, and seeing all those C++ libraries, written in C++ for no reason, that I cannot use is frustrating me.

Unixon · « **Reply #36 on:** June 16, 2021, 04:17:52 pm »

Quote from: netmonk on June 16, 2021, 11:14:53 am

My issue was that I need a lorawan library written in C++ to use in a C project.

Which library exactly?

Quote from: netmonk on June 16, 2021, 11:14:53 am

what would be the strategy ?

That depends on what you want from that library and what it provides and how.
Basically, you either write a C wrapper for it or cure the C++ allergy.

Quote from: netmonk on June 16, 2021, 11:14:53 am

all those C++ libraries, written in C++ for no reason, that I cannot use is frustrating me

Well, maybe there are reasons... including that it could be more comfortable for people to write that way.

I agree that the latest standards have changed good old C++ beyond recognition, and now it's not rare to see code in formally correct C++ without having the slightest idea what it does and how.

Nominal Animal · « **Reply #37 on:** June 16, 2021, 04:18:45 pm »

Quote from: netmonk on June 16, 2021, 11:14:53 am

I just don't want to learn bloated C++ or whatever you call it

Yet, the parts of C++ that you need in this domain – classes, templates, namespaces – would make development easier for you.

In this domain, classes are typically used for encapsulation. Consider a microcontroller with seven UARTs, and you want to implement a command to format and send string type data via any of them. In C, you need to pass a parameter identifying the UART; in C++, if you call via the object, the reference is implicit.
In other words, your C interface might look like
send_string(uart, "format", ...)
and your C++ interface like
uart.send_string("format", ...)

Within a class, you can access the class members as if they were in local scope, without any kind of prefix or pointer notation. Each member can be public, private, or protected (which is kinda-sorta public to classes derived from this one, and private to all others).

Instead of duplicating functionality in different classes, you can create a class (or an abstract class aka interface) that implements the common functionality, and have the other classes inherit the functionality. This is how e.g. Arduino implements just one Print class for its print() and println() functions, but lets users use it with UARTs, USB, et cetera.

Function overloading is easier in C++ than it is in C. You can have multiple functions with different signatures, and the compiler will call the one with compatible parameters. In C, you need to use the C11 _Generic selection in a preprocessor macro named as the "generic" function, choosing the actual function based on parameter type(s). This way, if you use const char and const unsigned char for immutable strings in ROM/Flash, and non-const for those in RAM, you can overload your send_string function so that
send_string(uart, "string literal")
send_string(uart, message)
and
uart.send_string("string literal")
uart.send_string(message)
will Do The Right Thing for both string literals in Flash/ROM, as well as non-const char array message in RAM. For C, it does require preprocessor macro trickery; C++ does it "natively".
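On the C++ side, the overload pair could look like this sketch. The Uart type here is a host-side dummy that only records which overload was chosen; on a real AVR the const variant would read from program memory instead.

```cpp
enum class Source { Flash, Ram };

struct Uart {
    Source last_source = Source::Ram;

    // Chosen for string literals, whose type in C++ is const char[N].
    void send_string(const char *text)
    {
        (void)text;
        last_source = Source::Flash;
    }

    // Chosen for mutable char arrays and char pointers.
    void send_string(char *text)
    {
        (void)text;
        last_source = Source::Ram;
    }
};
```

Usage is the same call for both cases, e.g. uart.send_string("literal") and uart.send_string(message) with char message[] = "hi";. Note the convention this relies on: a const char buffer that actually lives in RAM would also pick the first overload, so the environment has to guarantee that const data really is placed in Flash/ROM.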

Remember, right now, Arduino completely messes this up, requiring developers to use F() macros and _P() -suffix functions to handle these correctly. Yuk.

Namespaces are more of a visual/brevity thing. Instead of having to write uart_send_string(), you can enclose a set of functions, objects, variables etc. inside a namespace, where they can be accessed using short names that are also used in other namespaces. You can use "using namespace NAME;" to set the "default" namespace, import individual names from other namespaces via "using NAME::member;", and refer to a specific name in a specific namespace using NAME::member. This means you don't need to prefix each globally visible name with the module/library/feature prefix, you can let the compiler handle that detail.
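The three forms side by side, as a small sketch. The uart and usb namespaces and their dummy send_string() functions are invented; the return values only exist so it is visible which one was called.

```cpp
namespace uart {
inline int send_string(const char *) { return 1; }
}

namespace usb {
inline int send_string(const char *) { return 2; }
}

// Fully qualified name: always unambiguous.
inline int demo_qualified()
{
    return usb::send_string("x");
}

// using-declaration: imports one name into this scope.
inline int demo_using_declaration()
{
    using uart::send_string;
    return send_string("x");
}

// using-directive: makes every name in the namespace visible here.
inline int demo_using_directive()
{
    using namespace usb;
    return send_string("x");
}
```

Keeping the using lines inside function bodies, as here, limits their effect to that function instead of the whole file.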

Quote from: netmonk on June 16, 2021, 11:14:53 am

i'm really allergic to C++

Like I keep saying, this is not "real" C++, this really is just a small subset of C++ (plus freestanding C), and should be considered its own language.

I'm allergic to Perl, myself; but even so, I very happily use regular expressions, even the Perl variants. I will happily use the Linux kernel checkpatch.pl, even examine it if need be (although reluctantly); as long as I don't have to fix others' bugs in it.

Before you reject this subset of C++, make sure you know what you're rejecting, and are not just assuming it is the same stuff.

SiliconWizard · « **Reply #38 on:** June 16, 2021, 04:57:20 pm »

Quote from: Nominal Animal on June 15, 2021, 09:35:20 am

Quote from: SiliconWizard on June 14, 2021, 04:48:37 pm
Just to add a quick note about the original question, which was mixing C and C++ in the same project.
No, the OP just didn't word it properly. Everything they mentioned was specifically not standard C nor C++, but the kinds of freestanding/oddball libraries using subsets of C++ and/or C in embedded environments.

The OP didn't mention much actually. I think you read a lot from what they were asking. A bit too much.

As their last post says, they exactly wanted to do this: use a C++ library in a C project. Your points were all interesting and a great read, but I still think my straight answer was more to the point.

If they insist on using a C++ third party library and call it from C, they'll need to write an interface. That's not rocket science. And as I mentioned in my previous post, whether in the end this is a good idea or not depends on a number of factors.

I haven't looked at the library the OP mentions specifically. First step would be of course to try and find a pure C library implementing the same. Now if there isn't any, or if the OP is convinced this one is very good and worth using, then two options: use C++ for their whole project, or write an interface for accessing the lib from C. It's perfectly doable, has been done a lot and there's nothing inherently wrong with that. Of course yet another option would be to re-implement the C++ library in pure C. If said library is not overly big, this is probably not a lot of work. But this would of course require the OP to know C++ well enough to do this; which is probably not the case here.
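For the "write an interface" option, a rough sketch of the usual shape: an opaque handle plus a few extern "C" functions. Everything here is hypothetical (vendor::Radio stands in for whatever class the C++ library provides, and the radio_* names are made up); it is squeezed into one file for illustration.

```cpp
// --- stand-in for the third-party C++ library ---
namespace vendor {
class Radio {
public:
    explicit Radio(int channel) : channel_(channel) {}
    int send(const unsigned char *data, int length)
    {
        (void)data;
        sent_ += length;
        return length;
    }
    int channel() const { return channel_; }
    int bytes_sent() const { return sent_; }
private:
    int channel_;
    int sent_ = 0;
};
}

// --- radio_bridge.h: the only thing the C project includes ---
#ifdef __cplusplus
extern "C" {
#endif
typedef struct radio_handle radio_handle;   /* opaque to C code */
radio_handle *radio_open(int channel);
int radio_send(radio_handle *radio, const unsigned char *data, int length);
int radio_bytes_sent(const radio_handle *radio);
#ifdef __cplusplus
}
#endif

// --- radio_bridge.cpp: compiled as C++, callable from C ---
struct radio_handle {
    explicit radio_handle(int channel) : impl(channel) {}
    vendor::Radio impl;
};

extern "C" radio_handle *radio_open(int channel)
{
    // Single static instance, no heap; only the first call's channel is used.
    static radio_handle instance(channel);
    return &instance;
}

extern "C" int radio_send(radio_handle *radio, const unsigned char *data, int length)
{
    return radio->impl.send(data, length);
}

extern "C" int radio_bytes_sent(const radio_handle *radio)
{
    return radio->impl.bytes_sent();
}
```

The C side only ever sees the handle pointer and plain functions. If the wrapped library can throw, the bridge functions are also the place to catch and turn that into an error code, since exceptions must not propagate into C code.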

But all in all, selecting a language JUST because you want to use some specific library sounds like a very bad idea to me. So if the OP meant to write their project in C, suddenly switching to C++ just to be able to use some lib is fucked up. I mean, if they don't have any other solid rationale. This would be a recipe for a lot of frustration, probably bugs, and this would be promoting once again the use of a poorly (read: not) defined subset of C++, which is a real plague IMHO. But that point was discussed in a number of other threads...

Nominal Animal · « **Reply #39 on:** June 16, 2021, 05:25:43 pm »

Quote from: SiliconWizard on June 16, 2021, 04:57:20 pm

The OP didn't mention much actually. I think you read a lot from what they were asking. A bit too much.

They did. ESP-IDF and LoRa RF are explicitly embedded/IoT stuff. They also specifically stated "Is it something usual in MCU world?"

Feel free to disagree, but in my opinion, the context of the question is clear. And it is not what one might think by reading only the subject title, at all.

netmonk · « **Reply #40 on:** June 16, 2021, 09:17:27 pm »

This library https://github.com/manuelbl/ttn-esp32
I even opened an issue requesting full C translation : https://github.com/manuelbl/ttn-esp32/issues/38

westfw · « **Reply #41 on:** June 16, 2021, 09:56:17 pm »

Quote

if you use const char and const unsigned char for immutable strings in ROM/Flash
:
Arduino completely messes this up, requiring developers to use F() macros and _P() -suffix functions to handle these correctly.

I don't think this is Arduino's fault. At the C and even architectural level, "const" does not and can not be sufficient to put immutable strings in Flash on (traditional) AVR, without breaking or bloating all of the standard C library functions, not to mention users' code. (And the _P functions are from avr-libc, not from Arduino...)

Unixon · « **Reply #42 on:** June 17, 2021, 07:42:28 am »

Quote from: netmonk on June 16, 2021, 09:17:27 pm

This library https://github.com/manuelbl/ttn-esp32
I even opened an issue requesting full C translation : https://github.com/manuelbl/ttn-esp32/issues/38

If an app needs multiple instances of something it is much better to stay with classes, otherwise decoration with namespace is sufficient.
But wait, C doesn't even have namespaces and what a mess of identifiers this creates! No, this is bad.
I don't know if instantiating multiple entities of TTN-something over LMIC is necessary.
If this is not necessary, you can pretty much do yourself a C version easily by stripping class declarations and moving class members to .c file.
This is basic stuff, it doesn't require you to know template magic and other newer concepts.

Nominal Animal · « **Reply #43 on:** June 17, 2021, 08:13:57 am »

Quote from: westfw on June 16, 2021, 09:56:17 pm

"const" does not and can not be sufficient to put immutable strings in Flash on (traditional) AVR [...]

Of course it is; it is just a matter of link time configuration.

Neither C nor C++ inherently has any notion of "const" implying anything about address space, but the base set of functions implemented for the environment can do that, using either C11 _Generic, or C++ function overloading.

Quote from: westfw on June 16, 2021, 09:56:17 pm

[...] without breaking or bloating all of the standard C library functions, [...]

But that's exactly my point: that is an arbitrary requirement.

By incorporating avr-libc as part of Arduino, they cornered themselves. Why rely on the standard C library function implementation, when all they do is cause you trouble?

It is true that some bloat would be inevitable. Let's look at the memory function variants we'd need, from the users' point of view, assuming either C++ or C11 generics were used:

void *memset(void *dst_ram, int byte, size_t bytes);
void *memcpy(void *dst_ram, void *src_ram, size_t bytes);
void *memcpy(void *dst_ram, const void *src_rom, size_t bytes);
void *memmove(void *dst_ram, void *src_ram, size_t bytes);
void *memmove(void *dst_ram, const void *src_rom, size_t bytes);
void *memchr(void *src_ram, int byte, size_t bytes);
const void *memchr(const void *src_rom, int byte, size_t bytes);
void *memrchr(void *src_ram, int byte, size_t bytes);
const void *memrchr(const void *src_rom, int byte, size_t bytes);
void *memmem(void *data1_ram, size_t data1_bytes, void *data2_ram, size_t data2_bytes);
void *memmem(void *data1_ram, size_t data1_bytes, const void *data2_rom, size_t data2_bytes);
const void *memmem(const void *data1_rom, size_t data1_bytes, void *data2_ram, size_t data2_bytes);
const void *memmem(const void *data1_rom, size_t data1_bytes, const void *data2_rom, size_t data2_bytes);

Copy functions have two variants, comparison functions four. Some functions (memset()) do not have any variants. During linking, only the variants used are included in the final binary. A similar list can be constructed for the string functions (str*()).

However, to avoid that Flash/ROM bloat, you now copy all string literals to RAM. Does that sound like a good tradeoff to you? It does not to me, especially because even avr-libc implements most of those variants with the _P or _PF suffix anyway.

In other words, the "bloat" you object to, already exists within avr-libc.

Quote from: westfw on June 16, 2021, 09:56:17 pm

[...] not to mention [breaking] users' code.

Embedded/IoT/Arduino/etc. code does not heavily use the functions provided by avr-libc, so I am unsure of exactly how much breakage or work it would be for users to port their code to a new environment, if they agreed it was a completely new one. Like Arduino was supposed to be, originally.

Personally, I claim that starting from scratch, and designing the functionality to give the maximum power and control to the user, would be preferable. Yes, there would be a lot of annoyed people who hate reading documentation, and just want their C or C++ code to be copy-pastable and just work, but their code is utter shit anyway, and catering to the lowest common denominator only works if your strategy is to be cheaper than other alternatives.

As to Arduino, their build mechanism already "breaks" C/C++ expectations, so I don't really see any issue with having the base functions assume that const char * and const unsigned char * refer to immutable data in ROM/Flash (in the old PROGMEM address space), whereas char * and unsigned char * refer to mutable data in RAM.

How hard would it be to get through to users that in this Notarduino environment there are two address spaces, and that the base functions use const to indicate the code one, non-const the data one? We could replace it with a macro, say immutable, but in the 2005-2012 timeframe it would have had to incorporate const.

Since 2012 or so, GCC has had named address space support, so instead of const, we can use __flash qualifier for pointers, indicating data in the .progmem.data section (which a linker file should map to Flash). It is a type qualifier like const or volatile, so both C11 _Generic and C++ function overloading do differentiate signatures that only differ by __flash. This one could be "renamed" using a preprocessor directive to whatever new keyword that does not correspond to an existing C or C++ keyword, without any problems.

SiliconWizard · « **Reply #44 on:** June 17, 2021, 05:11:59 pm »

Quote from: Nominal Animal on June 16, 2021, 05:25:43 pm

Quote from: SiliconWizard on June 16, 2021, 04:57:20 pm
The OP didn't mention much actually. I think you read a lot from what they were asking. A bit too much.
They did. ESP-IDF and LoRa RF are explicitly embedded/IoT stuff. They also specifically stated "Is it something usual in MCU world?"

Feel free to disagree, but in my opinion, the context of the question is clear. And it is not what one might think by reading only the subject title, at all.

The fact it's embedded stuff doesn't change one thing about what I said in my previous posts.

westfw · « **Reply #45 on:** June 18, 2021, 01:11:25 am »

Quote

Quote
"const" does not and can not be sufficient to put immutable strings in Flash on (traditional) AVR [...]
Of course it is; it is just a matter of link time configuration.

No it's not. An AVR uses different instructions for accessing data in flash, vs accessing data in RAM. You would have to add width to all pointers, and have all code that deals with indirect access to values (pointers, references, etc) be prepared to handle either pointers to read/write data, or to const data. ("if (*p == 0) {}" would need to be aware, which is core compiler functionality, not just libc behavior.) That's WAY more than "a little bloat." (It would have one of the mis-features that C programmers often ascribe to C++ - seemingly minor changes would have dramatic effects on size/performance.)
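To make the cost of "adding width to all pointers" concrete, here is a minimal host-side sketch. It is not AVR code and not how any existing toolchain works: Space, WidePtr, wide_read() and wide_strlen() are hypothetical names, and the two arrays merely stand in for Flash and SRAM. The point it illustrates is that every dereference has to branch on the tag at run time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical address-space tag (assumption: nothing like this exists in avr-gcc).
enum class Space : std::uint8_t { Ram, Flash };

// Host-side stand-ins for the two memories. On a real AVR these would be
// program memory (read with LPM) and SRAM (read with LD).
static const char sim_flash[] = "in flash";
static char       sim_ram[]   = "in ram";

// A "wide" pointer: a 16-bit offset plus the space it refers to.
struct WidePtr {
    Space         space;
    std::uint16_t offset;
};

// Every single read must test the tag before it can pick the load instruction.
static char wide_read(WidePtr p)
{
    if (p.space == Space::Flash)
        return sim_flash[p.offset];   // would be an LPM-based load on AVR
    return sim_ram[p.offset];         // would be a plain LD on AVR
}

// Even a trivial strlen() pays for that branch once per character.
static std::uint16_t wide_strlen(WidePtr p)
{
    std::uint16_t n = 0;
    while (wide_read(p) != '\0') {
        ++p.offset;
        ++n;
    }
    return n;
}
```

Calling wide_strlen(WidePtr{Space::Flash, 0}) walks the simulated Flash string and wide_strlen(WidePtr{Space::Ram, 0}) the RAM one. Both the larger pointer and the per-access branch would apply to all pointer code, not just to libc functions.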

C actually added support for "named memory spaces" that could have helped solve this problem, but the C++ folks have rejected the idea.

Quote

to avoid that Flash/ROM bloat, you now copy all string literals to RAM. Does that sound like a good tradeoff to you?

It seems to have worked pretty well up till now. Perhaps because AVRs had so little memory that string literals were not very common, anyway.

I can vaguely imagine a "pure C++" implementation that overloaded a bunch of the normal C pointer operators that might WORK, but it sounds like it'd be a lot more distasteful than the current F() and _P() hacks...

DiTBho · « **Reply #46 on:** June 18, 2021, 09:08:35 am »

Quote from: Nominal Animal on June 17, 2021, 08:13:57 am

Quote from: westfw on June 16, 2021, 09:56:17 pm
"const" does not and can not be sufficient to put immutable strings in Flash on (traditional) AVR [...]
Of course it is; it is just a matter of link time configuration.

The Avr8 is somewhat like the 8051 and exotic DSPs, where code and data belong to separate address spaces and you need an "instruction bridge" to load something from the code space and use it as a constant; otherwise the code space would only be accessible for fetching op-codes.

I had the same problem with a couple of weird Japanese graphing calculators; surprisingly the official C compiler was not Gcc but rather a proprietary tool capable of automatically figuring out when to use the "bridge instructions".

So whenever you write "const" in front of a variable declaration, the machine level correctly understands that you want to have a constant within the flash and instantiates a "bridge instruction" to manage it.

Sweet, but it's not how Gcc-Avr8 does its job.

langwadt · « **Reply #47 on:** June 18, 2021, 09:18:46 am »

Quote from: DiTBho on June 18, 2021, 09:08:35 am

Quote from: Nominal Animal on June 17, 2021, 08:13:57 am
Quote from: westfw on June 16, 2021, 09:56:17 pm
"const" does not and can not be sufficient to put immutable strings in Flash on (traditional) AVR [...]
Of course it is; it is just a matter of link time configuration.

The Avr8 is somewhat like the 8051 and exotic DSPs, where code and data belong to separate address spaces and you need an "instruction bridge" to load something from the code space and use it as a constant; otherwise the code space would only be accessible for fetching op-codes.

https://en.wikipedia.org/wiki/Harvard_architecture
https://en.wikipedia.org/wiki/Von_Neumann_architecture

Nominal Animal · « **Reply #48 on:** June 18, 2021, 09:51:25 am »

Quote from: westfw on June 18, 2021, 01:11:25 am

Quote
Quote
"const" does not and can not be sufficient to put immutable strings in Flash on (traditional) AVR [...]
Of course it is; it is just a matter of link time configuration.
No it's not. An AVR uses different instructions for accessing data in flash, vs accessing data in RAM. You would have to add width to all pointers, [...]

I am fully aware! But you completely misunderstand.

We can use const in the base function signatures to indicate which address space the pointer parameter is in.

The pointer itself does not carry the information; the function signature does. The functions are overloaded to handle this transparently, so the user does not have to use a different function name depending on the pointer type, as one has to with avr-libc (_P(), _PF() suffixes).

Remember, avr-libc already has all of these variants, so "code bloat" is not a valid argument.

Here is an example implementation of strcmp() for avr-g++:

Code: [Select]

#include <avr/pgmspace.h>
#define  FLASH  const
#define  RAM

__attribute__((noinline, weak))
int lib_strcmp_ff(FLASH char *s1, FLASH char *s2)
{
    while (1) {
        const unsigned char  c1 = pgm_read_byte(s1++);
        const unsigned char  c2 = pgm_read_byte(s2++);
        if (c1 != c2)
            return (int)c1 - (int)c2;
        if (!c1)
            return 0;
    }
}

__attribute__((noinline, weak))
int lib_strcmp_fr(FLASH char *s1, RAM char *s2)
{
    while (1) {
        const unsigned char  c1 = pgm_read_byte(s1++);
        const unsigned char  c2 = *(unsigned char *)(s2++);
        if (c1 != c2)
            return (int)c1 - (int)c2;
        if (!c1)
            return 0;
    }
}

__attribute__((noinline, weak))
int lib_strcmp_rr(RAM char *s1, RAM char *s2)
{
    while (1) {
        const unsigned char  c1 = *(unsigned char *)(s1++);
        const unsigned char  c2 = *(unsigned char *)(s2++);
        if (c1 != c2)
            return (int)c1 - (int)c2;
        if (!c1)
            return 0;
    }
}

static inline int  lib_strcmp(FLASH char *s1, FLASH char *s2) { return  lib_strcmp_ff(s1, s2); }
static inline int  lib_strcmp(FLASH char *s1, RAM   char *s2) { return  lib_strcmp_fr(s1, s2); }
static inline int  lib_strcmp(RAM   char *s1, FLASH char *s2) { return -lib_strcmp_fr(s2, s1); }
static inline int  lib_strcmp(RAM   char *s1, RAM   char *s2) { return  lib_strcmp_rr(s1, s2); }

Here is a trivial example of how the above can be used, using current avr-libc notation for Flash storage:

Code: [Select]

const char  m1[] PROGMEM = "First string, in Flash";
const char  m2[] PROGMEM = "Second string, also in Flash";
char        m3[] = "Third string, in RAM";
char        m4[] = "Fourth string, also in RAM";

unsigned char  test1(void)
{
    return ((lib_strcmp(m1, m2) < 0) ?  1 : 0)
         | ((lib_strcmp(m1, m3) < 0) ?  2 : 0)
         | ((lib_strcmp(m1, m4) < 0) ?  4 : 0)
         | ((lib_strcmp(m2, m3) < 0) ?  8 : 0)
         | ((lib_strcmp(m2, m4) < 0) ? 16 : 0)
         | ((lib_strcmp(m3, m4) < 0) ? 32 : 0);
}

unsigned char  test2(const char *fs1, const char *fs2, char *rs1, char *rs2)
{
    return ((lib_strcmp(fs1, fs2) < 0) ?  1 : 0)
         | ((lib_strcmp(fs1, rs1) < 0) ?  2 : 0)
         | ((lib_strcmp(fs1, rs2) < 0) ?  4 : 0)
         | ((lib_strcmp(fs2, rs1) < 0) ?  8 : 0)
         | ((lib_strcmp(fs2, rs2) < 0) ? 16 : 0)
         | ((lib_strcmp(rs1, rs2) < 0) ? 32 : 0);
}

Both test1() and test2() will call the correct variants of the lib_strcmp() function.
test2() assumes fs1 and fs2 to point to Flash/ROM, and rs1 and rs2 to RAM.

See? We can exploit the type of the parameter to determine which address space it points to. We do NOT need to record that information in the pointer itself.
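The dispatch itself can be checked on any host compiler. The following is a minimal sketch, not part of avr-libc: pick() and Variant are hypothetical names, and FLASH/RAM are the same macros as in the example above. Since a PC has a single address space, each overload only reports which variant overload resolution selected.

```cpp
#include <cassert>

// Same hypothetical markers as in the strcmp example: FLASH is const, RAM is nothing.
#define FLASH const
#define RAM

// Identifies which of the four overloads was chosen.
enum class Variant { FlashFlash, FlashRam, RamFlash, RamRam };

// One overload per combination of const / non-const pointer parameters.
// The compiler picks among them from the static types of the arguments only.
static Variant pick(FLASH char *, FLASH char *) { return Variant::FlashFlash; }
static Variant pick(FLASH char *, RAM   char *) { return Variant::FlashRam; }
static Variant pick(RAM   char *, FLASH char *) { return Variant::RamFlash; }
static Variant pick(RAM   char *, RAM   char *) { return Variant::RamRam; }
```

With const char f[] = "flash"; and char r[] = "ram";, the call pick(f, r) selects the FlashRam overload and pick(r, r) selects RamRam. Note the choice is purely static: a const char * that actually points to RAM would still select a FLASH overload, so this convention only works if const consistently means "in Flash" throughout the environment.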

Furthermore, GCC uses separate sections for const and non-const file-scope variables and objects, so it is trivial to map file-scope const variables and objects to ROM/Flash. Dropping the standard C library implementation from avr-libc, and implementing optimized versions of the above (remember, I already pointed out GCC can provide many of these functions as built-in optimized versions), would make for a better IoT programming environment.

DiTBho · « **Reply #49 on:** June 18, 2021, 10:07:30 am »

Quote from: langwadt on June 18, 2021, 09:18:46 am

https://en.wikipedia.org/wiki/Harvard_architecture
https://en.wikipedia.org/wiki/Von_Neumann_architecture

Thanks for the links, but my * confused face * at the end of my previous post is only related to the last sentence I wrote: I mean, the part where a custom C compiler automatically understands when to use "bridge instructions", whereas gcc-Avr8 requires the user to request them explicitly.

