Author Topic: constant suffixes (Read 5401 times)

golden_labels · « **Reply #25 on:** September 17, 2021, 06:25:00 pm »

Quote from: Simon on September 17, 2021, 12:37:38 pm

so I either create the variables to be the same size as that which they will be multiplied into or I have to cast each one at a time in any multiplication.

If you want to obtain the full result, then at least one of the operands should be of a type that can hold it: either naturally, through a cast, or for other reasons. But keep in mind that this doesn’t imply it must always be twice the size. If you are multiplying two uint_fast16_t values in the range [0, 10], the product will fit even into the smallest possible signed char.
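A minimal sketch of both cases, assuming a platform where int is 32 bits wide (the function names mul_u16 and mul_small are made up for illustration). Casting a single operand is enough, because the other operand is then converted to match:

```c
#include <stdint.h>

/* Hypothetical helper: full 32-bit product of two 16-bit values. */
uint32_t mul_u16(uint16_t a, uint16_t b)
{
    /* Without the cast, both operands would be promoted to int, and on a
       32-bit-int platform 65535 * 65535 would overflow signed int. */
    return (uint32_t)a * b;
}

/* Hypothetical helper: operands known to lie in [0, 10] need no widening,
   since the product is at most 100 and fits in a signed char. */
signed char mul_small(uint_fast16_t a, uint_fast16_t b)
{
    return (signed char)(a * b);
}
```

With these definitions, mul_u16(65535, 65535) yields 4294836225, and mul_small(10, 10) yields 100.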

Quote from: Jeroen3 on September 17, 2021, 12:44:19 pm

How have I not been bitten by that in all those years?

Likely for the reasons explained in my previous post.

hamster_nz · « **Reply #26 on:** September 18, 2021, 09:35:54 am »

I might be getting the new twist in this thread wrong....

Take this code on a 32-bit platform:

Code: [Select]

#include <stdio.h>
#include <stdint.h>
uint16_t my_func(uint8_t a, uint8_t b)
{
    return a*b ;
}

Called with "my_func(20,20)" I can't work out if people are suggesting it should be 400 or 144.

My understanding is that the required behavior is smaller types are first converted to the native signed or unsigned integers, the calculation is then carried out, and then the result is truncated during the assignment.

This also agrees with the generated code:

Code: [Select]

        push    rbp
        mov     rbp, rsp
        mov     edx, edi
        mov     eax, esi
        mov     BYTE PTR [rbp-4], dl
        mov     BYTE PTR [rbp-8], al
        movzx   edx, BYTE PTR [rbp-4] ; Promote 8-bit unsigned value to 32-bit unsigned value
        movzx   eax, BYTE PTR [rbp-8] ; Promote 8-bit unsigned value to 32-bit unsigned value
        imul    eax, edx                       ; 32-bit multiply to give result in eax.
        pop     rbp
        ret

Likewise all floating point calculations are carried out as doubles, unless you override this with a compiler switch.
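For the integer case, the my_func(20, 20) question can be checked directly. The sketch below repeats the function from the post and adds a hypothetical variant, my_func_u8, whose 8-bit return type is where truncation would actually happen:

```c
#include <stdint.h>

/* Same function as in the post: both operands are promoted to int
   before the multiplication, so the product is computed as 400. */
uint16_t my_func(uint8_t a, uint8_t b)
{
    return a * b;              /* 400 fits in uint16_t */
}

/* Hypothetical variant: the int result is truncated only when it is
   converted to the 8-bit return type. */
uint8_t my_func_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a * b);   /* 400 mod 256 = 144 */
}
```

So my_func(20, 20) gives 400, while 144 only appears if the result itself is stored in an 8-bit type.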

Nominal Animal · « **Reply #27 on:** September 18, 2021, 01:44:27 pm »

In arithmetic expressions, an operand whose type is smaller than int is promoted to int if int can represent all of its values; otherwise it is converted to unsigned int. All other integer types (i.e., those with range at least as large as int or unsigned int) are kept unchanged. (See e.g. C99 6.3.1.1.)
Thus, all operands in an integer arithmetic expression are always at least int or unsigned int, or larger integer types.

Arithmetic operations (like addition and subtraction) cause additional conversions (see e.g. C99 6.3.1.8). Basically, floating-point and integer operands are promoted to the type of the larger operand. (Technically, floating point type ranking is float, double, and long double; integer type ranks are listed in e.g. C99 6.3.1.1p1.)
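One way to see these rules in action, assuming a C11 compiler, is to let _Generic report the type of an expression. The TYPE_NAME macro and the function names below are made up for this sketch:

```c
#include <stdint.h>

/* Hypothetical helper macro: names the type of an expression (C11 _Generic). */
#define TYPE_NAME(expr) _Generic((expr),             \
        int:                "int",                    \
        unsigned int:       "unsigned int",           \
        long:               "long",                   \
        unsigned long:      "unsigned long",          \
        long long:          "long long",              \
        unsigned long long: "unsigned long long",     \
        float:              "float",                  \
        double:             "double",                 \
        long double:        "long double",            \
        default:            "other")

/* Integer promotion: both 8-bit operands become int. */
const char *type_u8_plus_u8(void)
{
    uint8_t a = 1, b = 2;
    return TYPE_NAME(a + b);
}

/* Usual arithmetic conversions: same rank, so the unsigned type wins. */
const char *type_int_plus_uint(void)
{
    int a = 1;
    unsigned int b = 2;
    return TYPE_NAME(a + b);
}

/* Usual arithmetic conversions: the higher-ranked type wins. */
const char *type_int_plus_long(void)
{
    int a = 1;
    long b = 2;
    return TYPE_NAME(a + b);
}

/* Floating-point ranking: float is converted to double. */
const char *type_float_plus_double(void)
{
    float a = 1.0f;
    double b = 2.0;
    return TYPE_NAME(a + b);
}
```

The four functions return "int", "unsigned int", "long" and "double" respectively.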

This means that if you define your own Q15 fixed point format, you can write the multiplication operation (A×B×2^-15) as

Code: [Select]

#include <stdint.h>

typedef  int32_t  q15;

static inline q15 q15_mul(const q15 a, const q15 b)
{
    return ((int_fast64_t)a * b) / INT32_C(32768);
}

Using return ((int_fast64_t)a * b) >> 15; will work, but is technically "implementation defined". (I personally do use it, because it is common enough so that any implementation that fails to generate equivalent code, will silently miscompile a ton of existing C code anyway.)

Because a and b are 32-bit quantities, we want the intermediate result to be 64-bit, before the division/shift. To do this, we need to cast one of the operands to a 64-bit signed integer type, here int_fast64_t. The compiler will promote the other operand(s) to that type also, due to the arithmetic operations.

The INT32_C(32768) is simply a way to write the 16-bit divisor without worrying about the maximum range of an int. Written this way, we leave the compiler to promote it to int_fast64_t. All C compilers currently used will optimize the division to a bit shift, on all architectures where that works (all GCC supports).
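A short usage sketch for the q15_mul function above. Q15_ONE and the two wrapper functions are hypothetical names added only for this illustration:

```c
#include <stdint.h>

typedef int32_t q15;

#define Q15_ONE  INT32_C(32768)   /* hypothetical name for 1.0 in Q15 */

static inline q15 q15_mul(const q15 a, const q15 b)
{
    /* The cast makes the multiplication 64-bit; b and the divisor
       are converted to the same type by the compiler. */
    return (q15)(((int_fast64_t)a * b) / Q15_ONE);
}

/* 0.5 * 0.5: 16384 * 16384 / 32768 = 8192, which is 0.25 in Q15. */
q15 q15_half_times_half(void)
{
    return q15_mul(Q15_ONE / 2, Q15_ONE / 2);
}

/* Smallest negative step times smallest positive step:
   -1 / 32768 rounds toward zero, giving 0. */
q15 q15_tiny_product(void)
{
    return q15_mul(-1, 1);
}
```

One thing worth checking on your own toolchain: where the right shift is implemented as an arithmetic shift, the >> 15 form rounds toward negative infinity instead, so the two forms can differ by one unit for negative products that are not exact multiples of 32768.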

We could explicitly cast all three operands to int64_t or int_fast64_t, but the last time I checked the behaviour of various C compilers a decade ago, letting the compiler do the type promotions let them generate better optimized code: fewer register moves in static inline function, especially on LP64 architectures like x86-64 on Linux.

Writing your own mixed-size fixed point arithmetic operations (in assembly, e.g. GCC or Clang extended asm) on types larger than the native word size can save a significant number of operations, because (at least the last time I checked) compilers typically use compiler-provided functions to do the operations, without optimized mixed-size versions. (A particular example is on 8-bit architectures, converting an N-bit unsigned integer to decimal via repeated multiplications by 10, and subtracting the smallest power of ten at least 2^N using N+8-bit unsigned integer type.)

TheCalligrapher · « **Reply #28 on:** September 24, 2021, 05:37:13 am »

Quote from: Nominal Animal on September 18, 2021, 01:44:27 pm

let them generate better optimized code: fewer register moves in static inline function

Why not fewer register moves in any function? What is the importance of `static inline` in this context, especially considering that `static inline` is always redundant and 100% equivalent to plain `static`.

Berni · « **Reply #29 on:** September 24, 2021, 07:15:28 am »

A static function in C just means that the function is only reachable from within that one file.

But "inline" has a completely different effect. Normally the compiler places a function somewhere in memory and jumps into it when needed, but inline tells the compiler to instead just copy paste the whole function code into program memory on the spot where you call it. This not only saves the time required to jump in and out of a function but allows the compiler to additionally optimize the function for that particular use case (like if a parameter is being passed a constant value that causes parts of the function to never execute). This can be a big performance boost for functions that get called very often, but can also consume a lot of extra code memory.

But not making a function inline doesn't mean it will never get inlined. The compiler might still decide it is worth inlining for very short functions. Making it inline just forces the compiler to do it every time, because you deem the performance gain to be worth the code size sacrifice.

SiliconWizard · « **Reply #30 on:** September 24, 2021, 04:36:54 pm »

There is semantics and then there is actual compiler behavior.

'static' is only meant to make a declaration (be it functions or global variables, for local variables, that's a bit different yet) only visible to the current "compilation unit" (well, "file" is not completely exact, because a compilation unit can be made of several files with included files...), while 'inline' (at least in C) is just a hint to the compiler that you'd like a function to get inlined wherever it's called - to be more exact, actually, that it must be fast to call, the method to this end not being defined. There is no guarantee a given compiler will honor this though. Likewise, a given compiler may absolutely inline functions that you never qualified 'inline'.

Source:

Quote

A function declared with an inline function specifier is an inline function. The
function specifier may appear more than once; the behavior is the same as if it appeared
only once. Making a function an inline function suggests that calls to the function be as
fast as possible.
The extent to which such suggestions are effective is
implementation-defined.

In particular, and that's where there's a link between the two, many C compilers, above a certain optimization level, will inline 'static' functions when they determine it's worthwhile, while never generating a callable function for them - since they are not callable from anywhere else.

But compilers can also inline functions in a given compilation unit that are qualified neither 'static' nor 'inline'. In that case, they will usually both inline said functions in the same compilation unit - if they determine it's worthwhile - and generate the code for a callable function, since it may be called from outside of the compilation unit.

Only some compiler extensions can guarantee that a given function will get inlined for sure. That's not standard.

TheCalligrapher · « **Reply #31 on:** September 24, 2021, 05:54:35 pm »

Quote from: Berni on September 24, 2021, 07:15:28 am

A static function in C just means that the function is only reachable from within that one file.

But "inline" has a completely different effect. Normally the compiler places a function somewhere in memory and jumps into it when needed, but inline tells the compiler to instead just copy paste the whole function code into program memory on the spot where you call it.

Absolutely not. Keyword `inline` has no direct relation to "telling the compiler to instead just copy paste the whole function code". It is true that back in the dark ages this keyword was somewhat "planned" to serve that purpose, but the idea was laughed out of existence virtually instantly. It has never been that way.

Keyword `inline` does not tell compiler anything about inlining the function code. It does not request inlining. It is completely unrelated.

The purpose of keyword `inline` is very simple and very focused: keyword `inline` allows you to circumvent the restrictions of the dreaded One Definition Rule (ODR). Keyword `inline` allows you to make multiple definitions (!) of the same entity with external linkage (!) in your program, and at the same time keep the linker happy (i.e. prevent the linker from barking a "multiple definition" error). In C++ keyword `inline` is equally applicable to functions and variables for that purpose.

One might ask: why would anyone want to provide multiple definitions of the same external-linkage entity?

There are many valid reasons for that, which involve such things as header-only libraries, template metaprogramming, constant substitution, bla bla bla, etc etc etc. It is a very convenient feature.

But one of those reasons is, not surprisingly: to facilitate function inlining. In order to inline a function call the compiler has to see the entire definition of the function from the point of the call. Which means that if you want a global function to be inlinable everywhere, the full definition of the function has to be visible in every translation unit. But doing it literally that way for a function with external linkage will immediately trigger ODR violations. I.e. the linker will kill you for trying that. And this is where keyword `inline` comes to the rescue. In simple words, keyword `inline` suppresses ODR violations. It tells the linker: "if you encounter multiple definitions of the same entity, just keep your mouth shut, quietly choose and keep just one definition". That is all keyword `inline` does.

Now, whether the compiler will decide to actually inline a function call (or not) depends entirely on its internal decision criteria. Keyword `inline` has no bearing on it whatsoever. Keyword `inline` indirectly helps to facilitate inlining, but does not trigger or request inlining in any way, shape or form (contrary to a popular misguided belief).

Now, if you paid attention, above I'm talking specifically about entities with external linkage. But what about entities with internal linkage, like freestanding `static` functions? Well, for entities with internal linkage the aforementioned problem simply does not exist at all. The aforementioned ODR issue simply does not apply to entities with internal linkage. Functions with internal linkage are always defined in their respective translation units. Their full definitions are always visible to the compiler. Which means that for them inlining is already fully facilitated by the mere fact that these functions have internal linkage. There's no need to do anything extra.

This is the reason why `static inline` is always redundant for a freestanding function or variable. `static inline` is equivalent to plain `static`. Once you declared a function as `static` you have already done everything you could to facilitate its inlining. Adding `inline` on top of it will not achieve anything extra.

End of story.

(The above is written with C++ in mind. The semantics of `inline` with external linkage functions is different in C. But when it comes to `static inline` the semantics is pretty much the same. `static inline` is exactly as redundant in C as it is in C++ for exactly the same reasons.)

Quote from: SiliconWizard on September 24, 2021, 04:36:54 pm

while 'inline' (at least in C) is just a hint to the compiler that you'd like a function to get inlined wherever it's called

No, it isn't. This "old wives' tale" just refuses to die. There's no "hint" of any kind in `inline`.

In C, just like in C++, `inline` is nothing more than an "ODR circumventer": a way to marry external linkage with multiple definitions.
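Since the post notes that the external-linkage semantics differ in C, here is a minimal sketch of the C99 pattern, collapsed into a single file for brevity. The names twice and use_twice are hypothetical, and the comments mark where each part would normally live:

```c
/* Normally placed in a header: an inline definition, visible in every
   translation unit that includes it. In C99 this alone does not provide
   an external definition of the function. */
inline int twice(int x)
{
    return 2 * x;
}

/* Normally placed in exactly one .c file: this declaration makes the
   translation unit provide the external definition that any
   non-inlined call links against. */
extern inline int twice(int x);

int use_twice(int x)
{
    /* The compiler may inline this call or emit a real call; either way
       the program links, because an external definition exists. */
    return twice(x) + 1;
}
```

Without the extern inline declaration, a build in which the call is not inlined (for example with optimization disabled) would typically fail at link time with an undefined reference to twice.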

Jeroen3 · « **Reply #32 on:** September 24, 2021, 06:48:08 pm »

inline never has the effect you are looking for. It's better to use the compiler specific options if you want the literal effect of the keyword, like __attribute__((always_inline)) or __forceinline.

Yes, you lose portability. But when you need to do this you probably are micro-optimizing anyway.
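A common way to contain that portability loss is to hide the compiler-specific spelling behind one macro. The sketch below is only an illustration: FORCE_INLINE, swap_nibbles and swap_example are made-up names, and the last branch is a plain fallback with no inlining guarantee:

```c
/* Hypothetical wrapper macro selecting a compiler-specific
   "always inline" spelling. */
#if defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define FORCE_INLINE static __forceinline
#else
#define FORCE_INLINE static inline   /* fallback: inlining not guaranteed */
#endif

/* Small helper of the kind one might want inlined at every call site. */
FORCE_INLINE unsigned int swap_nibbles(unsigned int byte)
{
    return ((byte & 0x0Fu) << 4) | ((byte & 0xF0u) >> 4);
}

unsigned int swap_example(void)
{
    return swap_nibbles(0xA5u);   /* 0x5A */
}
```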

SiliconWizard · « **Reply #33 on:** September 24, 2021, 07:15:21 pm »

Quote from: TheCalligrapher on September 24, 2021, 05:54:35 pm

Quote from: SiliconWizard on September 24, 2021, 04:36:54 pm
while 'inline' (at least in C) is just a hint to the compiler that you'd like a function to get inlined wherever it's called

No, it isn't. This "old wives' tale" just refuses to die. There's no "hint" of any kind in `inline`.

I quoted the f*cking standard as a source. And added that what it said was more about the call being "fast" than inlining, which you omitted here.
Apparently, your own tale is better than standards.

Nominal Animal · « **Reply #34 on:** September 24, 2021, 11:52:55 pm »

The key reason why I mark macro-like accessor functions static inline as opposed to just static, is that GCC does not issue a warning for unused functions of the former type, while it does for the latter, when default/recommended/common warnings are enabled. In other words,

static inline: Accessor-like function, okay if not used at all. Only defined in this compilation unit; does not generate a linkable symbol in the symbol table. If not used, the object file won't implement the function at all.
static: Local function. The compiler should warn, if not used at all. Only defined in this compilation unit; does not generate a linkable symbol in the symbol table.
Neither static nor inline: Externally accessible function (part of API, generates a linkable symbol in the symbol table in the object file).

Note that these are all at least as much directed to my fellow developers as they are to the compiler; and do not make any assumptions about whether the function is actually inlined by the compiler or not.

Anyway, no need to listen to me or anyone else, when you can verify the facts for yourself. Take for example the following example.c:

Code: [Select]

#include <stdlib.h>
#include <stdio.h>

#undef   FUNC_PREFIX

#if defined(USE_STATIC_INLINE)
#define  FUNC_PREFIX  static inline
#elif defined(USE_STATIC)
#define  FUNC_PREFIX  static
#elif defined(USE_INLINE)
#define  FUNC_PREFIX  inline
#else
#define  FUNC_PREFIX
#endif

FUNC_PREFIX void describe(const int num, const char *val)
{
    printf("%d: %s\n", num, val);
}

FUNC_PREFIX int unused_function(int num)
{
    return num + 1;
}

int main(int argc, char *argv[])
{
    for (int i = 0; i < argc; i++)
        describe(i, argv[i]);

    return EXIT_SUCCESS;
}

and compile the four versions (I will be using -O2 because that's my habit, but do check other optimization options as well as omitting it):

Code: [Select]

gcc -Wall -O2 example.c -o ex.none
gcc -DUSE_STATIC -Wall -O2 example.c -o ex.static
gcc -DUSE_INLINE -Wall -O2 example.c -o ex.inline
gcc -DUSE_STATIC_INLINE -Wall -O2 example.c -o ex.static_inline

On my system, the -DUSE_STATIC causes the compiler (GCC 7.5.0) to complain about unused_function() being defined but not used.

(Clang does complain for both -DUSE_STATIC and -DUSE_STATIC_INLINE, though.)

The above example is too simple to exhibit any code changes. It always triggers the compiler logic on when to inline a function. That is, all run the same code, but only ex.none contains the binary symbols describe and unused_function. Feel free to investigate your own functions (my own focus was in funky complicated double-precision arithmetic functions and basic 3D vector algebra operations) to see if your code tends to be affected the way I described in my earlier post.

Although GCC code generation has improved a lot since the GCC 2 (1992-2001) and 3 (2001-2006) era, GCC 4 still generated a lot of superfluous register moves, increasing register pressure, and often using stack for temporary variables. This was particularly noticeable when inlining a function (which can occur with or without declaring the function inline).

If you are interested in how GCC static inline has evolved, compare 4.0.4 to 7.5.0 to latest GCC version inlining documentation. As described in various versions, static inline has similar semantics in both C and C++, which is very useful when working on microcontrollers (that rely on a funky mix of freestanding C and C++ environments).

In short, the reasons I personally mark a function static inline have nothing to do with inlining per se, and everything to do with my intent regarding that function; especially whether it is okay for it to be completely omitted from the compiled binaries (i.e., not used/needed at all).

Asking myself: Okay, but how does that relate to your statement that "let them generate better optimized code: fewer register moves in static inline function"?

About a decade ago, I had access to GCC (4.x.y), Intel Compiler Collection, Pathscale, and AMD Open64 C compilers on Linux; that's when I did those experiments on x86 and x86-64 to see the effects on the generated code.

The understanding I developed from testing the abovementioned compilers (and ignoring "no change either way" cases; thus not trying to get the best results for a specific compiler, but to avoid the worst cases regardless of compiler), was that implicit and explicit casting are done at different stages of code synthesization, and that implicit casting makes it easier for the compilers to realize a register is unused, or always filled with zeros –– for example, when casting a 32-bit or smaller value to uint64_t on a 32-bit architecture. When the code is in a smallish local scope, say a macro-like accessor function, or a pure arithmetic function, this was more noticeable. Obviously, this only matters when these expressions are heavily used in a program; I was dealing with potential models in simulations, calculated hundreds of millions of times per second.

The best way to explain this, is to compare the following code (a×b/2³²) compiled for 32-bit Cortex-M4 and Cortex-M0:

Code: [Select]

#include <stdint.h>

int64_t mul64q32(const int64_t a, const int64_t b) { return a*b >> 32; }
int32_t mul32q32(const int32_t a, const int32_t b) { return ((int_fast64_t)a * b) >> 32; }

Compiling these to Cortex-M4 on GCC-7.5.0 yields (essentially)

Code: [Select]

mul64q32:
        mul     r3, r0, r3
        mla     r1, r2, r1, r3
        umull   r2, r3, r0, r2
        adds    r0, r1, r3
        asrs    r1, r0, #31
        bx      lr

mul32q32:
        smull   r0, r1, r0, r1
        mov     r0, r1
        bx      lr

Because of the 32-bit shift, one of the four 32-bit multiplications can be omitted in mul64q32. Cortex-M4 has 32×32bit multiplication with 64-bit result (in a register pair), so a single operation suffices. If mul32q32 gets inlined, and the surrounding code can use the result directly in the r1 register, the mov can be avoided, too: it then simplifies to a single smull instruction.

Now, compile the same code for Cortex-M0, and we get

Code: [Select]

mul64q32:
        push    {r4, lr}
        bl      __aeabi_lmul
        movs    r0, r1
        asrs    r1, r1, #31
        pop     {r4, pc}

mul32q32:
        movs    r2, r1
        push    {r4, lr}
        asrs    r1, r0, #31
        asrs    r3, r2, #31
        bl      __aeabi_lmul
        movs    r0, r1
        pop     {r4, pc}

where __aeabi_lmul is a compiler-provided 64×64-bit multiplication with 64-bit result (r1:r0 × r3:r2 = r1:r0).

Because the ARM GCC implementation on Cortex-M0 does not have a 32×32-bit multiplication with 64-bit result as a compiler-provided function, it has to expand the multiplicands to 64 bits, and use a generic 64×64-bit multiplication function. (Clang-10 does the same, using __muldi3 function, but does a funky shuffle to swap the two register pairs - essentially five unnecessary register-to-register moves. Odd.)

The root problem here is not at all in inlining or anything related to that, but the premature promotion of arguments to a multi-word type, then using a generic but slower operation to do the arithmetic (because the compiler does not realize it can simply omit doing the superfluous operations).

This seems to still be an issue, so much so that if using GCC or Clang-10 to compile for Cortex-M0, it would be worth the effort to implement mul32q32 in inline assembly, since it would need only two multiplication instructions, compared to four in __aeabi_lmul/__muldi3, assuming mul32q32 was so heavily used the difference would matter in real life. (Personally, I implement both [inline assembly and naïve-but-easily-verifiably-correct versions], selectable at compile time, with runnable unit tests on the target to verify they produce identical results for all arguments.)
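The "two versions plus a unit test" idea can be sketched in plain C. Everything except mul32q32 is hypothetical here: mul32q32_alt is only a portable stand-in for a hand-optimized variant (it uses four 16×16-bit multiplications, not the inline assembly discussed above), and the test samples a handful of values rather than all arguments. It also assumes an arithmetic right shift and two's complement wrap-around on the final conversion:

```c
#include <stddef.h>
#include <stdint.h>

/* Reference version from the post; the right shift of a negative value
   is implementation-defined (arithmetic shift assumed). */
static int32_t mul32q32(const int32_t a, const int32_t b)
{
    return (int32_t)(((int_fast64_t)a * b) >> 32);
}

/* Upper 32 bits of an unsigned 32x32-bit product, from 16-bit halves. */
static uint32_t mulhu32(const uint32_t a, const uint32_t b)
{
    const uint32_t a0 = a & 0xFFFFu, a1 = a >> 16;
    const uint32_t b0 = b & 0xFFFFu, b1 = b >> 16;
    const uint32_t lo = a0 * b0, m1 = a1 * b0, m2 = a0 * b1, hi = a1 * b1;
    /* Carry out of the middle 16-bit column; the sum cannot overflow. */
    const uint32_t mid = (lo >> 16) + (m1 & 0xFFFFu) + (m2 & 0xFFFFu);
    return hi + (m1 >> 16) + (m2 >> 16) + (mid >> 16);
}

/* Stand-in for the optimized variant: unsigned high word, corrected
   for negative operands (modulo 2^32). */
static int32_t mul32q32_alt(const int32_t a, const int32_t b)
{
    uint32_t hi = mulhu32((uint32_t)a, (uint32_t)b);
    if (a < 0) hi -= (uint32_t)b;
    if (b < 0) hi -= (uint32_t)a;
    return (int32_t)hi;
}

/* Returns the number of sampled argument pairs where the versions differ. */
unsigned int count_mismatches(void)
{
    static const int32_t samples[] = {
        0, 1, -1, 2, -2, 32767, -32768, 65536, -65536,
        123456789, -987654321, INT32_MAX, INT32_MIN
    };
    const size_t n = sizeof samples / sizeof samples[0];
    unsigned int bad = 0;

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (mul32q32(samples[i], samples[j]) !=
                mul32q32_alt(samples[i], samples[j]))
                bad++;

    return bad;
}
```

On the target, count_mismatches() would be called from the test harness and expected to return zero; a real test would use a much larger or randomized sample set.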

As I always say, reality beats theory. Here, it means that while the C (and C++) standards describe the rules that should yield portable code (for example, "correctness"), individual compilers have behaviours ("efficiency") we can examine and rely on because of practical reasons. Yes, it does mean that before one can rely on these features, the output of each new (major) version of one's compiler has to be checked.

Simply put, standards describe "correctness", whereas "efficiency" is up to individual compilers. If you want the latter, you need to examine how different compilers generate efficient code.

In my experience, the key point is actually not optimum code generation, but to try and avoid the silliest and worst cases instead. (A good example of this is how optimizing for size, -Os, can often yield as efficient code as -O2 or even better. Then, the efficiency gained is just a side effect of trying to keep code size down.)

When you have something like mul32q32 above, used millions of times a second, implementing it in assembly for specific architectures is often worth the effort; you only know after examining the code generated by your toolchain for that particular architecture. You basically sidestep the compiler altogether by switching to assembly, instead of trying to find the best C or C++ expression for the job. (On x86-64, one can use <immintrin.h> intrinsics for Single-Instruction-Multiple-Data operations, instead of resorting to assembly. This was my main observation on x86 and x86-64 with floating-point math, really; and not relying on the compiler to vectorize the expressions also means that one has to think of data ordering and access patterns, which makes a major difference wrt. efficiency with SIMD.)

TheCalligrapher · « **Reply #35 on:** September 25, 2021, 04:06:04 pm »

Quote from: Nominal Animal on September 24, 2021, 11:52:55 pm

Anyway, no need to listen to me or anyone else, when you can verify the facts for yourself.

That's a rather naive and self-contradictory statement.

In order to "verify the facts oneself" one needs to be able to properly interpret the results obtained from the experiments with specific compilers. Which implies an extremely high level of theoretical expertise in the language concepts and rationales behind these concepts. And that in turn entails either spending those long winter nights buried neck-deep in language-related documentation (if you want to eventually cover the whole thing) or devotedly listening to me and committing every word I say to memory (if you want an expert explanation focused on a specific topic).

Without it, these attempts to "verify the facts oneself" will only lead to rather silly, blindly misguided conclusions, as we all perfectly well know.

But even in the best case, quirks of a specific compiler are just that. And most of the time they are just consequences of the fact that some users don't even bother to learn to use these compilers properly.

Nominal Animal · « **Reply #36 on:** September 25, 2021, 08:40:03 pm »

Quote from: TheCalligrapher on September 25, 2021, 04:06:04 pm

Quote from: Nominal Animal on September 24, 2021, 11:52:55 pm
Anyway, no need to listen to me or anyone else, when you can verify the facts for yourself.
In order to "verify the facts oneself" one needs to be able to properly interpret the results obtained from the experiments with specific compilers.

Yes, but only to the extent of comparison, and detecting the worst cases; definitely not down to estimating instruction timings and such. Remember, the key is not to detect the optimum or analyze anything at high precision; it is to detect if something produces superfluous code compared to other options. The goal is not to get the optimum on a specific compiler version, but to keep results acceptable. (When I want optimum, I implement architecture-specific function variants in assembly, and auto-choose at compile time using Pre-defined Compiler Macros (Sourceforge project); using extended asm for the most of a static inline function body lets GCC and Clang inline (or not) the code, saving register moves, and even pick better (different!) registers used in the inlined part in different use cases.)

As I often say: Opinions are not that interesting, but the reasons and facts behind them are. I am attempting to show others how to re-discover those facts by oneself, especially because it has been a significant time (in GCC and Clang development terms) since I spent the few dozen hours myself.

The most useful tool for this kind of analysis (in addition to -S compile option flag, which causes the compiler to generate assembly .s source files from the C or C++ source files) in a typical GCC/Clang toolchain is objdump; the object file examination tool. The key options are
-t List symbols in the symbol table (usable across compilation units)
-T List dynamic symbols (if dynamically linked binary)
-d Disassemble (show assembly of object code)

In addition, you need to understand the basic features of the assembly code on the target architecture. For example, on AVRs, most instructions are one or two cycles (see Atmel AVR Instruction Set Manual (PDF)) and since there is no instruction cache or speculative execution or other stuff complicating things, the assembly is not difficult to read: comparing two functionally equivalent assembly snippets at the "is one of these silly, much larger or slower than necessary?" level, does not require much expertise.

Even in the Arduino environment, if you bother to find out the locations for the Arduino AVR-GCC installation and the sketch temporary directory locations, you can use avr-objdump (with above option flags) to examine the user-code object files. If one copies the snippets (removing the address part) to separate text files, diffutils (in particular, diff -Nabur file1 file2) provides an excellent tool for comparisons.

SiliconWizard · « **Reply #37 on:** September 26, 2021, 01:42:51 am »

Experimenting with compilers is not unreasonable when you're dealing with features that are implementation-defined as per the standard. Now whether one is qualified enough to interpret the output of a given compiler - that may not be trivial indeed, but let's not insult people. If you attempt to do this, chances are you at least half know what you're looking at.

Whether it's worth bothering about what is implementation-defined, it's yet another matter. It should usually *not* matter if you're only writing 100% portable code, but otherwise, it's often useful, or sometimes necessary.

Now I admit getting a given compiler's implementation for a given language feature from the documentation of said compiler would be better than trying to "reverse-engineer" it. But this kind of documentation is not always available, or if so, not always accurate or up-to-date.

Let's see what GCC's docs say about 'inline' functions, for instance: https://gcc.gnu.org/onlinedocs/gcc/Inline.html
Unfortunately, it doesn't seem really up-to-date. But should still give you a general idea of how GCC deals with 'inline'.

Nominal Animal · « **Reply #38 on:** September 27, 2021, 08:55:03 am »

Quote from: SiliconWizard on September 26, 2021, 01:42:51 am

Experimenting with compilers is not unreasonable when you're dealing with features that are implementation-defined as per the standard.

Also, the only implementation-defined part in my examples is the right bit shift of signed values. The rest is about how the compiler implements the behaviour, not about analyzing the implemented behaviour. I'm only suggesting looking at the assembly to compare functionally equivalent code to see if one is unacceptably stupid or complicated; not even to see if one is preferable over the other.

Jan Audio · « **Reply #39 on:** September 30, 2021, 03:25:05 pm »

Simon, just take a value type that fits your calculation;

signed long i = yourbyte;
i += yoursignedchar;
i -= moreofthesame;

With XC8 it takes yourbyte and works no further than 0..255 if you code it like this:
signed long i = yourbyte + yoursignedchar - moreofthesame;

All compilers work differently, but they all understand the step-by-step form above.

TheCalligrapher · « **Reply #40 on:** September 30, 2021, 04:19:20 pm »

Quote from: SiliconWizard on September 26, 2021, 01:42:51 am

Let's see what GCC's docs say about 'inline' functions, for instance: https://gcc.gnu.org/onlinedocs/gcc/Inline.html
Unfortunately, it doesn't seem really up-to-date. But should still give you a general idea of how GCC deals with 'inline'.

The description is not as much "out of date", as it is embogged in historical GCC's idiosyncrasies. GCC jumped the gun, made an attempt to introduce its own idea of `inline` before it was standardized by C99, and eventually failed miserably. Basically, the primary outcome of that attempt was a big mess. The page at your link is a rather cursory description of that mess. I don't see any reason to waste time on that, unless one's trying to do some archeological research on old code.

In order to properly understand compiler-specific behavior, be it presented in form of compiler-documentation or in form of translated code sample, one has to be able to properly immerse that compiler-specific behavior into the context of abstract requirements of the language. Compilers do follow those requirements, no way around that.

But anyway, this all looks like a smoke screen that can only obfuscate a rather specific issue I was originally commenting upon: the proper idea of `inline` in modern C and C++ immediately and invariably entails the understanding that `static inline` is redundant and fully equivalent to plain `static`. (Which, not surprisingly, the above experiments with the compiler fully confirm.)

Nusa · « **Reply #41 on:** September 30, 2021, 05:15:00 pm »

Quote from: Simon on September 15, 2021, 08:00:31 am

I have had two days of fun trying to work out why I could not get the correct result from a calculation in 32 bit envelope on a mega 0 series (8 bit CPU). I am using defines to work things out with various calculations that I expect to be optimized out. But it turns out that writing 5000 does not mean 5000, it's down to what the compiler chooses to interpret it as. In this case possibly an 8 bit number? so gibberish. I solved all problems relating to math errors by running around putting "u" or "ul" on the end of every defined constant.

OK so lesson learnt but u means "unsigned int" which it seems in this context is 16 bits. What if I have a 16 bit signed value what do I put? Looking around the net most explanations seem to assume a non 8 bit system that seems to treat everything as 8 bit unless told.

What do I do?

I could try using constants but then I cannot carry out calculations so easily. I can't calculate a constant that is in turn calculated, the compiler will complain.

Umm...you do realize that #defines are macros that result in literal TEXT substitutions in the context you're using them in, and are generally not numerically evaluated by the preprocessor until they are referenced in a line of code aka the context? We really need to see your specific example (#defines and context) if you want specific feedback.
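
A small sketch of what text substitution means in practice. All macro and function names here are hypothetical, made up for illustration. The macro body is pasted into the expression as text, so both its grouping and the types of its literals are decided in the context where it is used:

```c
#include <stdint.h>

#define PERIOD_BAD   2 + 3      /* hypothetical: body without parentheses */
#define PERIOD_GOOD  (2 + 3)    /* hypothetical: body with parentheses */

/* Expands to 2 + 3 * 4, so the multiplication binds first. */
int scaled_bad(void)  { return PERIOD_BAD * 4; }

/* Expands to (2 + 3) * 4. */
int scaled_good(void) { return PERIOD_GOOD * 4; }

/* Expands to (UINT32_C(5000) * 10): the suffix added by UINT32_C keeps
 * the multiplication in at least 32 bits, even where int is 16 bits. */
#define TICKS (UINT32_C(5000) * 10)
uint32_t ticks(void) { return TICKS; }
```

Without the UINT32_C wrapper, 5000 * 10 would be evaluated as plain int, which overflows where int is 16 bits wide.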

Nominal Animal · « **Reply #42 on:** October 01, 2021, 05:20:28 pm »

Quote from: Simon on September 15, 2021, 08:00:31 am

But it turns out that writing 5000 does not mean 5000, it's down to what the compiler chooses to interpret it as. In this case possibly an 8 bit number? so gibberish. I solved all problems relating to math errors by running around putting "u" or "ul" on the end of every defined constant.

The most portable way to define the size of a literal integer constant (like 5000) is to use one of the NAME_C(literal) macros as defined in the compiler-provided <stdint.h>. (This header file is available even in freestanding environments where the standard C library (like <stdio.h> and others) is NOT available.) Specifically:

INT8_C(-128...+127)
UINT8_C(0...255)
INT16_C(-32768...+32767)
UINT16_C(0...65535)
INT32_C(-2147483648...+2147483647)
UINT32_C(0...4294967295)

These evaluate to the literal itself, with the appropriate suffix (u, uL, LL, uLL, et cetera) appended, taking the guessing completely out.
This also means you cannot do e.g. UINT8_C(NAMED_CONSTANT). Use a cast, (uint8_t)(NAMED_CONSTANT), for those instead.

Again, these are provided by the compiler (and must be provided by the compiler since C99 or so, and are provided even by Microchip XC compilers), and do not depend on any standard C library being available.
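
A minimal sketch of how these macros might be used. The constant and function names (TICKS_PER_SECOND, SECONDS_PER_RUN, LIMIT, ticks_per_run, limit_as_u8) are hypothetical and only serve the illustration:

```c
#include <stdint.h>

#define TICKS_PER_SECOND  UINT32_C(5000)      /* hypothetical constants */
#define SECONDS_PER_RUN   UINT32_C(1000000)

uint64_t ticks_per_run(void)
{
    /* UINT64_C(1) comes first, so the whole product is evaluated
     * left to right in a type of at least 64 bits. */
    return UINT64_C(1) * TICKS_PER_SECOND * SECONDS_PER_RUN;
}

/* A named constant cannot be wrapped in UINT8_C(); use a cast instead. */
#define LIMIT 200
uint8_t limit_as_u8(void) { return (uint8_t)(LIMIT); }
```

The product 5000 * 1000000 does not fit in 32 bits, which is why one operand is given a 64-bit type before the multiplication happens.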

SiliconWizard · « **Reply #43 on:** October 01, 2021, 06:47:32 pm »

Quote from: Nominal Animal on October 01, 2021, 05:20:28 pm

Quote from: Simon on September 15, 2021, 08:00:31 am
But it turns out that writing 5000 does not mean 5000, it's down to what the compiler chooses to interpret it as. In this case possibly an 8 bit number? so gibberish. I solved all problems relating to math errors by running around putting "u" or "ul" on the end of every defined constant.
The most portable way to define the size of a literal integer constant (like 5000) is to use one of the NAME_C(literal) macros as defined in the compiler-provided <stdint.h>. (This header file is available even in freestanding environments where the standard C library (like <stdio.h> and others) is NOT available.) Specifically:

INT8_C(-128...+127)
UINT8_C(0...255)
INT16_C(-32768...+32767)
UINT16_C(0...65535)
INT32_C(-2147483648...+2147483647)
UINT32_C(0...4294967295)

These evaluate to the literal itself, with the appropriate suffix (u, uL, LL, uLL, et cetera) appended, taking the guessing completely out.
This also means you cannot do e.g. UINT8_C(NAMED_CONSTANT). Use a cast, (uint8_t)(NAMED_CONSTANT), for those instead.

Again, these are provided by the compiler (and must be provided by the compiler since C99 or so, and are provided even by Microchip XC compilers), and do not depend on any standard C library being available.

Yes, yes. That has even already been suggested earlier in the thread. Now I guess it needs to sink in.

Now, as we discuss on a regular basis, is C a bit tricky for arithmetic? It is. That comes from both legacy and efficiency reasons.

Regarding literals, and otherwise potential conversion issues, there is a GCC warning that is very useful and that I always enable, especially for embedded development: '-Wconversion'. Enable it!

A quick example (using large values just because I tested it with GCC for x86_64):

Code: [Select]

int Test(int n)
{
        int m = 20000000000 * n;
        return m;
}

The literal here is 20 × 10^9. It exceeds what can be represented as an 'int' in this context.
With 'gcc -Wall': no warning.
With 'gcc -Wall -Wconversion':

Code: [Select]

test1.c: In function 'Test':
test1.c:3:10: warning: conversion from 'long long int' to 'int' may change value [-Wconversion]
    3 |  int m = 20000000000 * n;
      |          ^~~~~~~~~~~

I really suggest the OP test this warning option with their original code, and report back if it indeed catches the issue. It should.
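
One possible way to address such a warning, sketched under the assumption that the full 64-bit product is what is actually wanted. The function name `test_wide` is hypothetical:

```c
#include <stdint.h>

/* Hypothetical variant of Test(): keep the product in a 64-bit type,
 * so there is no narrowing conversion left to warn about. */
int64_t test_wide(int n)
{
    int64_t m = INT64_C(20000000000) * n;   /* n is converted to 64 bits */
    return m;
}
```

If the narrowing to int is intended instead, an explicit cast on the result documents that intent.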

Nominal Animal · « **Reply #44 on:** October 01, 2021, 07:35:58 pm »

Quote from: SiliconWizard on October 01, 2021, 06:47:32 pm

Yes, yes. That has even already been suggested earlier in the thread. Now I guess it needs to sink in.

True; in my defense, the explicit forms (type_C() macros) appeared only in my own post, and I thought that listing them in a post-it note-worthy form would be useful.

Quote from: SiliconWizard on October 01, 2021, 06:47:32 pm

Regarding literals, and otherwise potential conversion issues, there is a GCC warning that is very useful and that I always enable, especially for embedded development: '-Wconversion'. Enable it!

Definitely!

To me, it is an important tool on microcontroller targets with less common sizes of the base types (short, int, and long). I often only realize I need to wrap a constant in one of the abovementioned macros from the compiler warning...

In addition to the above, let me repeat that when an operation produces an intermediate value that can exceed the range of the operands (remembering that the C compiler already promotes the operands to int automatically, and if one is even larger, the smaller to the larger operand's type), one only needs to explicitly cast (via (type)variable or (type)(expression)) one of the operands, logically before the operation is done. In particular, a = (type)(b*c) or const int64_t a = b*c; only converts the result, and it is up to the compiler whether the result is limited to the range and precision of b and c. This bug is so common that on x86-64, at least GCC provides the 64-bit result; I consider this bad, because on other architectures and compilers, it can differ and often leads to very frustrating bugs. For correct code, you need to do e.g. const int64_t a = (int64_t)b * c; instead.

Let's say you have 32-bit variables a, b, and c, and you want to calculate (a+b)*c, but you know that a+b will always fit in a 32-bit integer (for some external reason). My earlier text basically attempted to explain why in that case (int64_t)(a+b) * c or (a+b) * (int64_t)c will work. If the sum does not necessarily fit in a 32-bit integer, then you should use for example ((int64_t)a + b)*c, so that b and c are automatically promoted to int64_t too, by the compiler. (Other expressions will also work, for example (a + (int64_t)b)*c.)
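
A minimal sketch of the difference, assuming a platform where int is 32 bits wide. The function names are hypothetical. The multiplication examples use unsigned operands so that the too-narrow case wraps around in a defined way instead of being undefined behaviour:

```c
#include <stdint.h>

/* Cast applied to the result: the product is computed in 32 bits
 * and wraps before it is widened. */
uint64_t mul_narrow(uint32_t b, uint32_t c) { return (uint64_t)(b * c); }

/* Cast applied to one operand: c is converted to 64 bits too,
 * and the product is computed in 64 bits. */
uint64_t mul_wide(uint32_t b, uint32_t c) { return (uint64_t)b * c; }

/* (a+b)*c where the sum itself may not fit in 32 bits:
 * widen a first, then b and c follow automatically. */
int64_t sum_times(int32_t a, int32_t b, int32_t c)
{
    return ((int64_t)a + b) * c;
}
```

With b = c = 100000, mul_narrow gives 1410065408 (the product modulo 2^32), while mul_wide gives the full 10000000000.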

Taken all together, this can produce quite verbose-looking code with (int64_t)(...) and INT64_C(...) mixed in. Do not let it bother you; such verbosity is sometimes preferable. And not just for the compiler; it can help us humans too, as reading new code with such notation, even when verbose and on long lines, makes the programmer intent/understanding (of valid ranges for that expression) easier to see and understand. Code golf may be fun, but it is definitely not practical in real life.
Adding good comments, describing the math formulae, operand/variable ranges, and programmer reasoning why exactly this implementation was chosen, makes such code excellent in my opinion, even if long and somewhat arduous to unravel. (The opposite, "nasty" code, is brief code with "clever tricks" hiding in the implementation, unmentioned anywhere and hard to notice, that are the linchpin for the implementation to work correctly.)

SiliconWizard · « **Reply #45 on:** October 01, 2021, 07:44:34 pm »

Quote from: Nominal Animal on October 01, 2021, 07:35:58 pm

Taken all together, this can produce quite verbose-looking code with (int64_t)(...) and INT64_C(...) mixed in. Do not let it bother you; such verbosity is sometimes preferable. And not just for the compiler; it can help us humans too, as reading new code with such notation, even when verbose and on long lines, makes the programmer intent/understanding (of valid ranges for that expression) easier to see and understand. Code golf may be fun, but it is definitely not practical in real life.

I agree. And this is all due to the fact C needs to remain the most efficient possible in most situations. So you need to "instruct" the compiler what it is you really want it to implement.

Sure some could dream of easier arithmetics, not needing to care about types and conversions, and with all expressions always being mathematically correct in all cases; that would almost invariably lead to much less efficient code in general. If you expect that, use another language.

blacksheeplogic · « **Reply #46 on:** October 01, 2021, 11:39:49 pm »

Quote from: Jeroen3 on September 24, 2021, 06:48:08 pm

inline never has the effect you are looking for. It's better to use the compiler specific options if you want the literal effect of the keyword, like: __attribute__((always_inline)); or __forceinline.

Yes, you lose portability. But when you need to do this you probably are micro-optimizing anyway.

In general I don't second-guess the compiler, but if there's a specific need for it, use the compiler-specific directives and use the preprocessor to deal with it:
#ifdef __GNUC__
static inline ureg load32(void __iomem *address) __attribute__((always_inline))
#endif

SiliconWizard · « **Reply #47 on:** October 02, 2021, 12:34:43 am »

Quote from: blacksheeplogic on October 01, 2021, 11:39:49 pm

Quote from: Jeroen3 on September 24, 2021, 06:48:08 pm
inline never has the effect you are looking for. It's better to use the compiler specific options if you want the literal effect of the keyword, like: __attribute__((always_inline)); or __forceinline.

Yes, you lose portability. But when you need to do this you probably are micro-optimizing anyway.

In general I don't second guess the compiler but if there's a specific need for it use the compile specific directives and use the preprocessor to deal with it:
#ifdef __GNUC__
static inline ureg load32(void __iomem *address) __attribute__((always_inline))
#endif

If you want to *force* inlining with GCC, yes, this is the only thing that will work 100% of the time, even at -O0 optimization level.
(A minor remark here: GCC yields an error when you attach the attribute at the end of the declarator in a function definition, as you did. For some reason, you need to put it at the beginning.)

Code: [Select]

error: attributes should be specified before the declarator in a function definition

I'm not 100% sure, but LLVM/Clang should support this attribute too. Just guessing, as it tends to support a good range of GCC extensions and attributes.
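
A minimal sketch of the attribute placed before the declarator, hidden behind a preprocessor macro. The macro name FORCE_INLINE and the helper `read_twice` are hypothetical, and plain `uint32_t` stands in for the `ureg` and `__iomem` names from the quoted snippet, which are not defined in this thread:

```c
#include <stdint.h>

/* Hypothetical wrapper macro: pick the compiler-specific spelling. */
#if defined(__GNUC__)
#  define FORCE_INLINE static inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#  define FORCE_INLINE static __forceinline
#else
#  define FORCE_INLINE static inline   /* fallback: only a hint */
#endif

/* The attribute comes before the declarator, so this is accepted
 * as a function definition. */
FORCE_INLINE uint32_t load32(const volatile uint32_t *address)
{
    return *address;
}

/* Hypothetical caller: two volatile reads through the inlined helper. */
uint32_t read_twice(const volatile uint32_t *reg)
{
    return load32(reg) + load32(reg);
}
```

On compilers that match neither branch, the macro falls back to plain `static inline`, so the code still builds but inlining is left to the compiler.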

But otherwise, unless you selectively disable function inlining through options, GCC will inline almost every function it can starting at -O1, with or without 'inline', even with or without 'static'. In the latter case, a non-static function will usually be inlined where it's called inside the same compilation unit (source file if you will), AND callable code for this function will also be generated at the same time, which will be the version that gets called outside of its compilation unit. Depending on optimization level though, GCC will inline more or less aggressively. At -O1, I think it will automatically inline only functions that are "small enough", but at -O3, it will inline practically ALL functions, which can lead to significantly larger binaries. I don't know GCC's exact criteria though, and this may change at every new version...

So while "inline" is indeed pretty much ignored by most modern compilers *for call optimization*, the standard, as I quoted, still states that it can be an implementation-defined suggestion to the compiler that the call should be "fast" - without mentioning a particular method. The C standard adds two footnotes about it:

Quote

By using, for example, an alternative to the usual function call mechanism, such as ‘‘inline
substitution’’. (...)

Quote

For example, an implementation might never perform inline substitution, or might only perform inline
substitutions to calls in the scope of an inline declaration.

As it's entirely up to the implementation, of course, it may do nothing as far as optimization is concerned. Which is indeed the case with modern GCC and many other C compilers.
But yes, it's true that a number of C compilers already implemented "inline" well before it got standardized (and its getting included in the std probably comes from this fact, actually), and AFAIR, in older compilers, the keyword was indeed the only way of having the compiler inline a function (when the function could be inlined...), as they would not inline anything on their own.

