Topic: Pointer confusion in C -language

Veketti · « **on:** June 20, 2021, 01:56:16 pm »

Dear All,

Please help me understand why the following works. It is an example from this page:
https://www.geeksforgeeks.org/convert-floating-point-number-string/

I'm wondering why the pointer "res" works with this syntax. First of all variable "res" is not with * inside the ftoa -function. Then in main function, res is not called with & -prefix.

Code: [Select]

// Converts a floating-point/double number to a string.
void ftoa(float n, char* res, int afterpoint)
{
    // Extract integer part
    int ipart = (int)n;
  
    // Extract floating part
    float fpart = n - (float)ipart;
  
    // convert integer part to string
    int i = intToStr(ipart, res, 0);
  
    // check for display option after point
    if (afterpoint != 0) {
        res[i] = '.'; // add dot
  
        // Get the value of fraction part upto given no.
        // of points after dot. The third parameter 
        // is needed to handle cases like 233.007
        fpart = fpart * pow(10, afterpoint);
  
        intToStr((int)fpart, res + i + 1, afterpoint);
    }
}
  
// Driver program to test above function
int main()
{
    char res[20];
    float n = 233.007;
    ftoa(n, res, 4);
    printf("\"%s\"\n", res);
    return 0;
}

To my understanding this should be:

Code: [Select]

// Converts a floating-point/double number to a string.
void ftoa(float n, char *res, int afterpoint)
{
    // Extract integer part
    int ipart = (int)n;
  
    // Extract floating part
    float fpart = n - (float)ipart;
  
    // convert integer part to string
    int i = intToStr(ipart, *res, 0);
  
    // check for display option after point
    if (afterpoint != 0) {
        *res[i] = '.'; // add dot
  
        // Get the value of fraction part upto given no.
        // of points after dot. The third parameter 
        // is needed to handle cases like 233.007
        fpart = fpart * pow(10, afterpoint);
  
        intToStr((int)fpart, *res + i + 1, afterpoint);
    }
}
  
// Driver program to test above function
int main()
{
    char res[20];
    float n = 233.007;
    ftoa(n, &res, 4);
    printf("\"%s\"\n", res);
    return 0;
}

Are both examples actually the same, but the first is just some simplified version that the compiler also understands? I am testing in STM32CubeIDE.

Thank you in advance

golden_labels · « **Reply #1 on:** June 20, 2021, 02:07:17 pm »

Quote from: Veketti on June 20, 2021, 01:56:16 pm

First of all variable "res" is not with * inside the ftoa -function.

Because:

Code: [Select]

a[b] == *(a + b)

Quite literally, including both + and the square bracket operators being commutative.

Quote from: Veketti on June 20, 2021, 01:56:16 pm

Then in main function, res is not called with & -prefix.

Because in C an array of T is implicitly converted to a T* pointing to the first element of that array.

More specifically:
In main there is a variable res. That variable is an object consisting of 20 elements of type char.
The ftoa function accepts a pointer to char as its second argument (char*).
Upon invocation, a pointer to the first element of res is taken and passed to that second argument. It is exactly equivalent to:

Code: [Select]

char res[20];
char* ptr = &res[0];
ftoa(…, ptr, …);

This should not be confused with &res, which would have a different type:

Code: [Select]

&res[0] // `res[0]` is `char`, `&res[0]` is `char*`
&res // `res` is a `char[20]`, `&res` is `char(*)[20]`

Due to how compilers are implemented, they may appear numerically equal if inspected and, ignoring compile errors/warnings, using them interchangeably may by pure coincidence seem to “work”. But they are not equal as they have different types. The latter type is also rarely seen in the wild, so don’t worry if you don’t understand the notation now; quite likely you will not see it for the next few years.
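
To make the type difference concrete, here is a minimal sketch (the function names are made up for illustration). Both pointers hold the same address, but adding 1 moves an element pointer by one char and an array pointer by the whole 20-byte array:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes between p and p + 1 when p is a `char *` to the first element. */
size_t step_of_element_pointer(void)
{
    char res[20];
    char *first = &res[0];              /* same as plain `res` after conversion */
    return (size_t)((char *)(first + 1) - (char *)first);
}

/* Bytes between p and p + 1 when p is a `char (*)[20]` to the whole array. */
size_t step_of_array_pointer(void)
{
    char res[20];
    char (*whole)[20] = &res;           /* pointer to the entire array */
    assert(sizeof *whole == sizeof res);
    return (size_t)((char *)(whole + 1) - (char *)whole);
}

/* Nonzero if both expressions refer to the same address. */
int same_address(void)
{
    char res[20];
    return (void *)&res[0] == (void *)&res;
}
```

Here `step_of_element_pointer()` gives 1 and `step_of_array_pointer()` gives 20, even though `same_address()` is true. That is why passing `&res` where a `char *` is expected draws a compiler warning although it may appear to work.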

Siwastaja · « **Reply #2 on:** June 20, 2021, 02:10:36 pm »

Yes, as explained above, you can access a pointer like it was an array. Accessing something[0] is the same as *something. Then something[1] will be the next element, i.e., the compiler knows the size of the type and goes forward that many bytes.

You get used to it; it's very common to see
uint8_t single_variable;
uint8_t buffer[1024];
some_function(&single_variable);
some_function(buffer);

In such case, the latter would be equivalent to:
some_function(&buffer[0])
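
A small sketch of the same idea, with hypothetical names: one function takes a pointer and indexes it like an array, so it accepts both a single variable and a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum n elements through a pointer, using array syntax on the pointer. */
uint32_t sum_u16(const uint16_t *values, size_t n)
{
    assert(values != NULL);
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += values[i];             /* identical to *(values + i) */
    return total;
}

/* Byte distance between element 0 and element 1: sizeof(uint16_t). */
size_t stride_u16(const uint16_t *values)
{
    assert(values != NULL);
    return (size_t)((const char *)&values[1] - (const char *)&values[0]);
}
```

With `uint16_t single = 7;` and `uint16_t buffer[4] = {1, 2, 3, 4};`, both `sum_u16(&single, 1)` and `sum_u16(buffer, 4)` are valid calls. The function cannot tell the two cases apart; only the count you pass tells it how far it may index.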

Veketti · « **Reply #3 on:** June 20, 2021, 04:03:21 pm »

Thank you. One more thing. I noticed that if I gave the char array in the main function a different name, e.g.:

Code: [Select]

    char notthesamename[20];
    float n = 233.007;
    ftoa(n, notthesamename, 4);

It did still work. Should it, as ftoa -function is still handling char* res?

However if I changed the char array size to 33, it worked for a while and then the MCU started behaving strangely.

Thanks for helping

ataradov · « **Reply #4 on:** June 20, 2021, 04:28:50 pm »

Quote from: Veketti on June 20, 2021, 04:03:21 pm

It did still work. Should it, as ftoa -function is still handling char* res?

Yes, the function is independent of the calling code.

Quote from: Veketti on June 20, 2021, 04:03:21 pm

However if I changed the char array size to 33, it worked for a while and then the MCU started behaving strangely.

How strangely? There is nothing inherently wrong with larger array size.

Your code does the work and exits, so I don't see how working "for a while" is even possible.

The only thing to keep in mind is that this array is allocated on the stack, and the stack may overflow. But 33 bytes and this simple program should not cause stack overflow issues.

ejeffrey · « **Reply #5 on:** June 20, 2021, 04:45:24 pm »

Yes that will still work the same. There is no relationship between the variable names in the caller and the formal parameter names in the callee. Sometimes they are the same but often not.

It should also be fine to pass a pointer to a larger character array than is needed, ftoa will just fill as much as it needs. So I suspect that whatever troubles happen later are unrelated to the ftoa call.

SiliconWizard · « **Reply #6 on:** June 20, 2021, 05:14:22 pm »

Just a quick look at the code. We don't know how intToStr() is implemented? So is that certain that ftoa() will zero-terminate the string in all cases? (I guess it should if intToStr() ensures that.)

Veketti · « **Reply #7 on:** June 20, 2021, 06:37:07 pm »

Oh, I was just wishing it was a pointer issue, but I guess it was not then. I tested this with more complex code which had FreeRTOS and two tasks. One task basically read the ADC and the other task manipulated the data and updated the display. ftoa was called in that data manipulation task. What I meant by strange behavior was that the ADC data either stopped updating completely or started receiving zeros. I don’t know yet which. However, it didn’t completely lock up the MCU, as it is still updating the display. I’m stumped: how come making the char array size 33 messes with the ADC read? Not immediately, but after a minute or so of running. Maybe some buffer overflow or something which writes to the same memory address where the ADC data is? With array size 20 it works flawlessly.

Thank you all for helping.

ataradov · « **Reply #8 on:** June 20, 2021, 06:48:55 pm »

If you have an RTOS, then you assign a stack size to each task, so you need to check how big a stack you have assigned to the task that uses the array. You pass the stack pointer and size to the task creation function.

Veketti · « **Reply #9 on:** June 20, 2021, 07:58:00 pm »

Wow, that was it! So happy, thank you so much! Task size was:
.stack_size = 128 * 4
Increased it to:
.stack_size = 1024

And now it works perfectly. Never would have thought about it and probably never would have figured that out by myself.
Can I btw. use: .stack_size = sizeof(thread_name)
Can it figure it out by itself?

Sorry, this is already bit off of the pointer topic..

ataradov · « **Reply #10 on:** June 20, 2021, 08:10:01 pm »

No, there is no automatic way to figure out required stack size. Recursive functions may want to use unlimited amount of stack, for example.

A typical way to estimate the required stack size is to fill the stack area with a known value (like 0xaa) and then let your program run for a while, then look at the stack area and see what was the last address that is no longer 0xaa.

And you need to keep the stack limitation in mind when making changes. This is something you do automatically with experience.
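
The fill-and-scan estimate described above can be sketched as follows. This is only a model: the stack region is an ordinary byte array here, the function names are invented, and it assumes a descending stack whose lowest address is `region[0]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL 0xAAu   /* known value painted over the unused stack */

/* Paint the whole region before the task starts running. */
void stack_paint(uint8_t *region, size_t size)
{
    assert(region != NULL);
    memset(region, STACK_FILL, size);
}

/* Count fill bytes still untouched, scanning up from the lowest address. */
size_t stack_unused_bytes(const uint8_t *region, size_t size)
{
    assert(region != NULL);
    size_t unused = 0;
    while (unused < size && region[unused] == STACK_FILL)
        unused++;
    return unused;
}

/* Deepest stack usage seen so far, in bytes. */
size_t stack_high_water(const uint8_t *region, size_t size)
{
    return size - stack_unused_bytes(region, size);
}
```

The result is an estimate: a task that happens to store the fill value at its deepest point is undercounted. FreeRTOS provides `uxTaskGetStackHighWaterMark()`, which reports the minimum free stack a task has had, so on that RTOS you may not need to write this yourself.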

SiliconWizard · « **Reply #11 on:** June 22, 2021, 04:31:45 pm »

Quote from: ataradov on June 20, 2021, 08:10:01 pm

No, there is no automatic way to figure out required stack size. Recursive functions may want to use unlimited amount of stack, for example.

A typical way to estimate the required stack size is to fill the stack area with a known value (like 0xaa) and then let your program run for a while, then look at the stack area and see what was the last address that is no longer 0xaa.

And you need to keep the stack limitation in mind when making changes. This is something you do automatically with experience.

Which is giving me an idea that I can try on my RISC-V core. In a typical architecture in which the stack pointer is growing downwards, we could just keep track of the *minimum* value of the stack pointer every time it changes, and store that in a dedicated register (like in a CSR.)

Is there any existing CPU/MCU implementing this kind of thing?
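
As a rough software model of what such a register would do (all names are hypothetical; this is not an existing CSR), every write to the stack pointer also updates a running minimum:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the proposed hardware: the SP plus a "lowest SP seen" register. */
typedef struct {
    uintptr_t sp;       /* current stack pointer */
    uintptr_t sp_min;   /* lowest value sp has held since the last reset */
} sp_tracker;

/* Reset both registers, e.g. at startup or when a task is created. */
void sp_tracker_reset(sp_tracker *t, uintptr_t initial_sp)
{
    assert(t != NULL);
    t->sp = initial_sp;
    t->sp_min = initial_sp;
}

/* What the hardware would do on every instruction that changes sp. */
void sp_tracker_write(sp_tracker *t, uintptr_t new_sp)
{
    assert(t != NULL);
    t->sp = new_sp;
    if (new_sp < t->sp_min)
        t->sp_min = new_sp;
}

/* Peak stack depth in bytes for a descending stack. */
uintptr_t sp_tracker_peak_bytes(const sp_tracker *t, uintptr_t initial_sp)
{
    assert(t != NULL);
    return initial_sp - t->sp_min;
}
```

With several stacks, `sp_min` would have to be saved and restored together with `sp` on every stack switch.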

ataradov · « **Reply #12 on:** June 22, 2021, 04:58:11 pm »

Why would you implement it in the hardware when it can trivially be implemented in the software?

The only thing that makes some sense to do in the hardware is to set the stack limit and have an exception when stack overflow is about to happen.

SiliconWizard · « **Reply #13 on:** June 22, 2021, 05:42:32 pm »

Quote from: ataradov on June 22, 2021, 04:58:11 pm

Why would you implement it in the hardware when it can trivially be implemented in the software?

Maybe because the method you mentioned above:
* Requires a modified startup code. Not necessarily a big problem, but it's not completely "non-invasive".
* Is not fully reliable: although the probability of data written to the stack being equal to your magic values is likely pretty low in general, it's not completely guaranteed. For an approximate indicator, it's probably OK. For something exact, a bit less so.
* Would have too much overhead for tracing stack size during program execution, something that can be interesting. Your method is OK for manual analysis after program execution, but not really for "real-time" tracing. There could be a lot of interesting uses for this IMO.

Quote from: ataradov on June 22, 2021, 04:58:11 pm

The only thing that makes some sense to do in the hardware is to set the stack limit and have an exception when stack overflow is about to happen.

That's something else, and complementary indeed. That's an interesting feature to have, but doesn't give you the same kind of information. Also, on implementations that make use of several stacks, the stack limit has of course to be strictly updated every time the stack is switched. Obvious, but could be a source of additional bugs. Anyway, this feature is a protection feature, whereas the above one is a tracing feature. Pretty different.

ataradov · « **Reply #14 on:** June 22, 2021, 07:13:12 pm »

I have never seen a case where I need to trace stack usage to the byte in real time. I don't know of any other implementations that do this, probably because it is not really needed.

You don't need to modify startup code itself, you can fill the top of the stack from the main code. In fact that's what many RTOSes do for the "stack guard" feature. They put a magic word on top of the task stack, and periodically check for that guard value when switching tasks. If the guard value is wrong, then the stack has overflowed.

And in this case we are discussing the RTOS use case, so stack pointer would be jumping all the time, so your hardware thing would have to be resettable for each task too.
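
A minimal sketch of such a guard word, assuming a descending stack stored as an array of 32-bit words whose index 0 is the overflow end. The magic value and function names are arbitrary choices for illustration, not taken from any particular RTOS:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_GUARD 0xDEADBEEFu   /* arbitrary magic word */

/* Place the guard at the far end of the stack when the task is created. */
void stack_guard_init(uint32_t *stack_words)
{
    assert(stack_words != NULL);
    stack_words[0] = STACK_GUARD;
}

/* Call on each task switch; false means the stack reached its limit. */
bool stack_guard_intact(const uint32_t *stack_words)
{
    assert(stack_words != NULL);
    return stack_words[0] == STACK_GUARD;
}
```

The check only notices an overflow after the fact, and only if the overflowing code actually wrote over the guard word with a different value.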

SiliconWizard · « **Reply #15 on:** June 22, 2021, 09:55:04 pm »

Yep. And yep for having to save and restore this special register upon stack switch as well.

Thing is, you are describing what is being done with existing hardware. But if there was a specific register, as I suggested, to track the stack pointer, it could be used instead, in a much simpler way with less overhead than checking for guard values (and the fact guard values are not 100% foolproof). If it was implemented, why not use it?

It frankly doesn't look like much in terms of hardware. Sure you wouldn't add this on very small cores, but for anything moderately complex, it would be pretty negligible.

The fact this kind of thing hasn't been implemented probably has a number of reasons. One is, as you mentioned, it can more or less be emulated in pure software (but as all software solutions, they are bound to add some overhead AND not be as robust as a pure hardware implementation). Another is, IMHO, that actually very little has been done for stack protection/monitoring if you compare what has been done with all the disastrous stack overflow issues that have plagued software in the last decades. Now on systems that embed a MMU, stack protection can be implemented with the MMU (at least stack overflow/underflow). But there is relatively little that has been done in terms of stack monitoring. I know you mentioned software approaches, but I again think a pure hardware approach would be more robust and have less overhead.

Regarding the usefulness of stack monitoring during program execution, I do think it could give interesting insights. Actually most of the time, we just set up stacks, try to implement reasonable code, and hope for the best. We may implement stack protection when it's available, through periodic checks or exceptions. That would trigger when things have gone bad.

But we often have absolutely no clue how the stack is really being used during program execution. I'm sure it could be interesting, and I bet we would often not expect what we see. I've always found using stacks a bit like shooting in the dark. Just a thought.

brucehoult · « **Reply #16 on:** June 23, 2021, 03:08:46 am »

Quote from: SiliconWizard on June 22, 2021, 09:55:04 pm

Yep. And yep for having to save and restore this special register upon stack switch as well.

Thing is, you are describing what is being done with existing hardware. But if there was a specific register, as I suggested, to track the stack pointer, it could be used instead, in a much simpler way with less overhead than checking for guard values (and the fact guard values are not 100% foolproof). If it was implemented, why not use it?

It frankly doesn't look like much in terms of hardware. Sure you wouldn't add this on very small cores, but for anything moderately complex, it would be pretty negligible.

The fact this kind of thing hasn't been implemented probably has a number of reasons. One is, as you mentioned, it can more or less be emulated in pure software (but as all software solutions, they are bound to add some overhead AND not be as robust as a pure hardware implementation). Another is, IMHO, that actually very little has been done for stack protection/monitoring if you compare what has been done with all the disastrous stack overflow issues that have plagued software in the last decades. Now on systems that embed a MMU, stack protection can be implemented with the MMU (at least stack overflow/underflow). But there is relatively little that has been done in terms of stack monitoring. I know you mentioned software approaches, but I again think a pure hardware approach would be more robust and have less overhead.

Even the smallest commercial RISC-V CPU cores usually implement PMP (Physical Memory Protection) with typically 8 or 16 memory regions which can each be assigned Read/Write/eXecute protections for each of Machine/Supervisor (if it exists)/User modes. This is separate to and much lighter weight than MMU.

On big machines MMU is typically managed by Supervisor-mode software while PMP is used by Machine mode/Hypervisor to stop Supervisor mode OS software from poking around in hardware or other OS areas.

On small machines with an RTOS, PMP can be used to protect threads from each other. Of course it needs to be swapped out on task switch, but it's less state than the integer registers (let alone FP) so no big deal, especially if only two or three regions are changed. If every thread uses the same PMP configuration register for its stack region (which of course the RTOS probably would do) and stacks are constrained to be naturally aligned and a power of 2 in size (different size and address for each thread) then you need to update a single 32 bit CSR to swap between stack regions.

FE310-G002 and later have PMP (and User mode). So do GD32V and K210.
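
To show why a single CSR write can be enough, here is a sketch of how a naturally aligned power-of-two (NAPOT) region is packed into one `pmpaddr` value on RV32, as I understand the RISC-V privileged specification. The helper name is made up, and the encoding should be checked against the manual for your core before relying on it:

```c
#include <assert.h>
#include <stdint.h>

/* Encode a NAPOT region for a pmpaddr CSR (address bits 33:2 on RV32).
   The number of trailing 1 bits selects the size: none means 8 bytes,
   one means 16 bytes, and so on. */
uint32_t pmp_napot_addr(uint32_t base, uint32_t size)
{
    assert(size >= 8 && (size & (size - 1)) == 0);   /* power of two, 8 or more */
    assert((base & (size - 1)) == 0);                /* naturally aligned */
    return (base >> 2) | ((size >> 3) - 1);
}
```

For a 1 KiB stack at 0x20000400 this gives 0x0800017F. Swapping to another task's stack region then means writing that task's precomputed value into the same `pmpaddr` register, with the matching `pmpcfg` entry left in NAPOT mode.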

ataradov · « **Reply #17 on:** June 23, 2021, 03:38:42 am »

You can do the same with MPU on ARM. Swapping all that state sucks. And realistically not worth it, especially given that stack overflow results in unrecoverable error anyway. Simple software check is sufficient in most cases.

brucehoult · « **Reply #18 on:** June 23, 2021, 03:55:07 am »

Quote from: ataradov on June 23, 2021, 03:38:42 am

You can do the same with MPU on ARM. Swapping all that state sucks. And realistically not worth it, especially given that stack overflow results in unrecoverable error anyway. Simple software check is sufficient in most cases.

Only if the typical thread runs only a couple of function calls between task switches! In which case you're being killed by register save/restore anyway and would be better off using continuations not threads.

If you dedicate a register to stack limit (which is not really affordable on 32 bit ARM) then you need two fast instructions in every function to check it. If the stack limit has to be loaded from memory then that's at least a slow load instruction as well. It's significant code bloat as well.

brucehoult · « **Reply #19 on:** June 23, 2021, 03:57:39 am »

Quote from: golden_labels on June 20, 2021, 02:07:17 pm

Quote from: Veketti on June 20, 2021, 01:56:16 pm
First of all variable "res" is not with * inside the ftoa -function.
Because:
Code: [Select]
a[b] == *(a + b)
Quite literally, including both + and the square bracket operators being commutative.

Even more than that (which maybe you know as you mentioned both being commutative, but the OP won't):

Code: [Select]

a[b]  == *(a + b) == *(b + a) == b[a]
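
A tiny sketch to try this out (the function name is invented); all four spellings read the same element:

```c
#include <assert.h>

/* Returns element b of a after checking that every spelling agrees. */
int index_any_way(const int *a, int b)
{
    assert(a[b] == *(a + b));
    assert(*(a + b) == *(b + a));
    assert(*(b + a) == b[a]);
    return b[a];            /* legal C, but please don't write it in real code */
}
```

With `int a[5] = {10, 20, 30, 40, 50};`, `index_any_way(a, 2)` returns 30.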

Nusa · « **Reply #20 on:** June 23, 2021, 08:07:47 am »

As for declarations, these are all equivalent:
char *res;
char * res;
char* res;
char*res;

You'll see the first three if you see enough code, the last one not so much. The majority of spaces in C are there for human readability and style purposes, not because they're required.

golden_labels · « **Reply #21 on:** June 23, 2021, 09:40:32 am »

Quote from: brucehoult on June 23, 2021, 03:57:39 am

Even more than that (which maybe you know as you mentioned both being commutative, but the OP won't):

Indeed I do and initially I entered that code, but later removed it to not confuse OP.

Nominal Animal · « **Reply #22 on:** June 23, 2021, 01:51:36 pm »

Quote from: SiliconWizard on June 22, 2021, 04:31:45 pm

Which is giving me an idea that I can try on my RISC-V core. In a typical architecture in which the stack pointer is growing downwards, we could just keep track of the *minimum* value of the stack pointer every time it changes, and store that in a dedicated register (like in a CSR.)

If you expand that idea, so that every access to memory is verified in hardware to be within an allowed range or an interrupt is triggered, you'll end up with a segmented memory model with linear address space.

Due to experimenting with named address spaces in C and C++ (no, they are definitely not standardized; just an extension implemented by GCC for C and clang for C and C++), I've been surprised to find how extremely useful these are when they do have hardware support. (Even on x86-64, there are essentially four independent address spaces: the "default" one, the stack (accessed via the SS segment), and FS and GS segments. On many AVRs, being based on Harvard architecture, there are two: code ("progmem"), and data.)

Named address space support as implemented by gcc and clang use the type of a pointer variable to indicate its address space; and this can be overridden/cast via explicit expressions. The named address space in the type specification does affect overloading and templates (C++) and type generics (C _Generic), so with a workable software engineering mindset, these Just Work.

It seems to me OP has the misguided understanding that types and type specifications are just specifications for how the underlying hardware memory access is done; and that the confusion stems from this fundamental misconception. (You could, and I'm sure someone will, claim that "You're wrong! That's exactly what types are!", based on the fact that compilers and interpreters use the type to perform the correct memory access; to that, I retort that just because a human can be eaten, does not make humans "food".)

In a much more fundamental sense, type specifies all the known information about the variable or memory access expression at that point, that the value of the variable or expression does not.

(Not realizing this, and being fixated on their own definitions, is why the GCC C++ front-end developers and C++ standard committee claim that named address space support is impossible/"too hard" to implement for C++; and inversely, having realized the above, the Clang/LLVM C++ developers just went and implemented it because it was needed for OpenCL anyway. I'm very tempted to draw a parallel to a universal list-based structural markup format we could replace XML with, and both significantly increase the parsing speed and minimize RAM usage (especially important for IoT) while allowing basically a complexity explosion in the options for specifying structured metadata associated with each node; and how getting it accepted or standardized without first implementing actual real-world examples handily beating their current XML/XML-derivative "competitors" (and even afterwards!) is basically impossible because most human minds are set in their ways. Smart people can be scarily stupid, you see.)

In C, the exact meaning of the asterisk (*) is heavily dependent on the context. It can be the binary multiplication operator, the unary dereferencing operator, or a pointer type specifier. Similarly, the ampersand (&) can be the binary 'and' operator, or the unary address-of operator. So, one of the first skills one needs to cultivate, is the skill of recognizing what context to apply to any statement, lexical sequence, or expression.

For type specifications – for example, when specifying what kind of parameters a function takes –, the asterisk-as-pointer-type-specifier is particularly simple: it reads as "is a pointer to". Type specifications themselves are split at asterisks into type specifier sets. The order within a type specifier set is irrelevant: const volatile int, volatile const int, int volatile const, volatile int const, and so on, are all equal; although many humans prefer a specific order to keep things familiar. The order of the sets, however, is important: they are read from right to left. A final wrinkle is that if the type specification names a variable, any const or volatile preceding the variable name without an asterisk in between, means those are associated with the variable and not the type.

For example, const volatile int *const x; says, quite literally, "x is a const pointer to a const volatile int". The first const in the English statement, corresponding to the rightmost const in the C expression, is a promise to the compiler that the code does not try to change the value of variable x. The other const, the leftmost const in the C expression, is a promise to the compiler that the code does not try to change the value of whatever this pointer refers to. The volatile tells the compiler that although this code won't try to change that value, other code may, and therefore the compiler should not try and cache the value. Aside from the asterisk * that tells us the declared variable x is a pointer, the only thing left is the int: it specifies the type of the object that variable x points to, is an int.
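
A short sketch of what each const allows and forbids; the variable names are only for illustration, and the lines that would be rejected by the compiler are left commented out:

```c
#include <assert.h>

int const_pointer_demo(void)
{
    int a = 1, b = 2;

    const int *to_const = &a;   /* pointee is read-only through this pointer */
    to_const = &b;              /* fine: the pointer itself can be re-aimed */
    /* *to_const = 5; */        /* error: assignment to a const int */

    int *const fixed = &a;      /* the pointer itself is read-only */
    *fixed = 10;                /* fine: the pointee is writable */
    /* fixed = &b; */           /* error: assignment to a const pointer */

    const volatile int *const x = &b;   /* neither x nor *x may be assigned */
    assert(*x == b);            /* every read of *x really reads memory */

    return a + *to_const + *x;  /* 10 + 2 + 2 */
}
```

Note that `b` itself is not const: the const in `const int *` restricts only what may be done through that pointer.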

The same simple logic applies to all type specifications. For function pointers, the fact that the pointer variable name is in the middle with its parameter specification in parentheses to the right does make things harder to read, but the rules stay the same.

Type specifications can also appear in cast expressions. In C, a cast is an operation that affects the type of a variable or an expression. These are common in "accessor" functions: small functions whose purpose is to make code more maintainable and easier to read. For example, if you have a binary communications protocol, you might wish to have an accessor function that can convert four consecutive bytes in a specific byte order ("endianness") to a 32-bit signed integer:

Code: [Select]

static inline int32_t  get_s32_le(const void *ref)
{
    const unsigned char *const buf = ref;  /* = (const unsigned char *const)ref */
    return (int32_t)(  ((uint32_t)buf[0])
                    | (((uint32_t)buf[1]) << 8)
                    | (((uint32_t)buf[2]) << 16)
                    | (((uint32_t)buf[3]) << 24));
}

However, some "programmers" think they need to write clever code minimizing the length to show how "good" they are, so they may instead write the above as a macro

Code: [Select]

#define  S32_LE(ptr)  ((int32_t)( *((const unsigned char *)(ptr)+0) \
                                | ((uint32_t)(*((const unsigned char *)(ptr)+1)) << 8) \
                                | ((uint32_t)(*((const unsigned char *)(ptr)+2)) << 16) \
                                | ((uint32_t)(*((const unsigned char *)(ptr)+3)) << 16) ))

or even as

Code: [Select]

#define  S32_LE(ptr)  ({ const unsigned char *_p = (ptr); (int32_t)( p[0] | (((uint32_t)p[1]) << 8) | (((uint32_t)p[2]) << 16) | (((uint32_t)p[3]) << 24) );})

and congratulate themselves for having "optimized" the heck out of this operation, without realizing that using either GCC or Clang with typical/recommended optimizations enabled (-O2), in the final binaries, all three end up generating the same machine code. Yet, two of the three are quite unreadable (thus likely sources of bugs; can you find the one that I might have inserted there on purpose?).

True, the static inline function does have unnecessary code – it's verbose like me; but not for verbosity's sake, only to try and convey the underlying concepts and ideas in as useful a form as possible – like the cast of the first byte to 32-bit unsigned integer. However, since they generate no extra code, whether one should keep or drop them, depends on which form one believes is the most efficiently maintainable one: which form is easiest to maintain in the long term, keeping the probability of a bug (being accidentally introduced and/or left unnoticed in this code) as low as possible.
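
Whichever form one keeps, a known-vector check is a cheap way to catch a wrong shift or index. Below is a sketch that repeats the inline function from above so it stands alone. The check function is my own addition, and it assumes the usual two's complement conversion from uint32_t to int32_t that GCC and Clang implement:

```c
#include <assert.h>
#include <stdint.h>

/* Same accessor as in the post, repeated to keep the sketch self-contained. */
static inline int32_t get_s32_le(const void *ref)
{
    const unsigned char *const buf = ref;
    return (int32_t)(  ((uint32_t)buf[0])
                    | (((uint32_t)buf[1]) << 8)
                    | (((uint32_t)buf[2]) << 16)
                    | (((uint32_t)buf[3]) << 24));
}

/* The first vector has a distinct value in every byte, so a wrong shift
   count or a swapped index changes the result. */
void check_get_s32_le(void)
{
    static const unsigned char ascending[4] = { 0x78, 0x56, 0x34, 0x12 };
    static const unsigned char minus_two[4] = { 0xFE, 0xFF, 0xFF, 0xFF };

    assert(get_s32_le(ascending) == 0x12345678);
    assert(get_s32_le(minus_two) == -2);
}
```

Running the same vectors against a macro variant is a quick way to compare the forms.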

If the couple of us still left reading this novel of a post circle back to the original question at hand, using what we learned just above (with the named address spaces being just a spice on top to remind us how oddly useful and strange types can be), we'll find we have the following:

We have functions void ftoa(float v, char *p, int n); and int intToStr(int v, char *p, int n);
We have char res[20]; and float n;
We call ftoa(n, res, 4);
Within function ftoa(), we have int i = intToStr( (int)v, p, 0);.

A detail in C not yet discussed is that the name of an array variable "decays" to a pointer to its first element.
That is, in a very real sense, res is an array of 20 chars, but when we use res in an expression (that does not just specify a type, so excluding for example expressions like sizeof res, which evaluates to 20 and not to sizeof (char *)), it behaves like it was declared as char *res;.
Indeed, (res + 1) == &(res[1]) is true.

This means that when main() calls ftoa(n, res, 4), res decays to a pointer to the first element in an array of 20 chars. The definition of the function ftoa() says the second parameter is a pointer to char, so this is absolutely fine.

To convert the integer part of n to i chars, ftoa() calls the equivalent of int i = intToStr( (int)v, p, 0); . Why is the second parameter just p and not &p? Because p is a pointer to char, that's why. &p would pass a parameter of type char **: a pointer to a pointer to a char.

Because of pointer arithmetic and array variables decaying to pointers to their first element in C, there is no distinction in C between pointers pointing to a single element, and pointers pointing to several consecutive elements. Put simply, there is no reason to expect p above to point to a single character. If there is no separate parameter specifying how many there are, we just do not know. If the code overflows the buffer, then we can just say "ouch, that wasn't what I intended", and then fix it.
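
A small sketch of that loss of information, with invented function names: the caller can still ask for the array size, but the callee only ever sees a pointer.

```c
#include <assert.h>
#include <stddef.h>

/* In the callee, the parameter is just a pointer; the length 20 is gone. */
size_t size_seen_by_callee(char *p)
{
    assert(p != NULL);
    return sizeof p;            /* sizeof (char *), typically 4 or 8 */
}

/* In the caller, sizeof sees the real array, because no conversion happens. */
size_t size_seen_by_caller(void)
{
    char res[20];
    return sizeof res;          /* 20 */
}

/* The converted array and the address of its first element are the same pointer. */
int decay_matches_first_element(void)
{
    char res[20];
    return res == &res[0] && (res + 1) == &res[1];
}
```

This is why a function that writes into a buffer usually needs the length as a separate parameter.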

It would be much better to use something like the following function signature here:
char *float2str(char *buffer, size_t buflen, float value, int decimals);
The first parameter is a pointer to the array of characters used to store the resulting string. (A string in C is just an array of chars with a terminating nul byte, '\0', at end.) The second parameter is the number of chars in that array. Because we have the wonderful sizeof operator (and in C, sizeof (char) == 1 always), this includes the string-terminating nul byte at the end. The third parameter is the value to be converted, and the fourth is the number of decimal digits desired. The function returns a pointer to the first character of the string describing the value as a decimal number, stored somewhere within the specified buffer, but not necessarily at the beginning of the buffer.
If there is a problem, for example the buffer is too small for the number of decimals desired, the function can return a NULL pointer indicating the conversion was impossible. (For embedded code, I would use a dedicated "error" pointer, one that always points to a nul byte, however. That way checking for errors would be optional, but barring implementation bugs, it would Always Just Work.)

The trick to implementing such a function efficiently, is to start at the decimal point. The integral and fractional parts are constructed separately; it does not really matter which one first. Operate on magnitudes only (absolute values); and if the original value was negative, prepend a '-' just before the most significant decimal digit. The integral part advances left via repeated division (of the integral part only) by ten, and the fractional part right via repeated multiplication (of the fractional part only) by ten. The same approach works fine for all fixed-point formats also, and you only need integer division-by-ten-with-remainder (the remainder corresponding to the digit at that position), and fractional multiplication by ten (followed by extracting the integral part of the result as the digit, with rounding only ever applied to the last presented digit).
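
For readers who want something to experiment with, here is a deliberately simplified sketch of the signature above. It is not the start-at-the-decimal-point method just described: it scales the magnitude to an integer, rounds once, and writes digits right to left from the end of the buffer. The limit of 9 decimals, the use of double for scaling, and the helper `float2str_matches()` are my own assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the string inside buffer, or NULL if it cannot be
   produced (buffer too small, NaN/infinity, value too large, bad decimals). */
char *float2str(char *buffer, size_t buflen, float value, int decimals)
{
    assert(buffer != NULL);
    if (buflen < 2 || decimals < 0 || decimals > 9 || value != value)
        return NULL;

    int negative = (value < 0.0f);
    double mag = negative ? -(double)value : (double)value;

    double scale = 1.0;
    for (int k = 0; k < decimals; k++)
        scale *= 10.0;

    double scaled = mag * scale + 0.5;      /* round half up on the magnitude */
    if (scaled >= 1.8e19)                   /* too big for 64 bits, or infinity */
        return NULL;
    unsigned long long digits = (unsigned long long)scaled;

    char *p = buffer + buflen;              /* start one past the end */
    *--p = '\0';

    int emitted = 0;
    do {
        if (p == buffer) return NULL;       /* out of room */
        *--p = (char)('0' + digits % 10);
        digits /= 10;
        emitted++;
        if (decimals > 0 && emitted == decimals) {
            if (p == buffer) return NULL;
            *--p = '.';
        }
    } while (digits != 0 || emitted <= decimals);

    if (negative) {
        if (p == buffer) return NULL;
        *--p = '-';
    }
    return p;                               /* usually not the buffer start */
}

/* Usage sketch: format into a local buffer and compare with a string. */
int float2str_matches(float value, int decimals, const char *expected)
{
    char buf[24];
    const char *s = float2str(buf, sizeof buf, value, decimals);
    return s != NULL && strcmp(s, expected) == 0;
}
```

Because the digits are written backwards, the returned pointer usually lies somewhere inside the buffer, which is exactly why the signature returns a pointer instead of assuming the string starts at `buffer`. A 4-byte buffer is too small for 233.007 with four decimals, so that call returns NULL.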

If anyone is interested, I can post an example implementation; however, I'd prefer if OP tried their hand at it first. Not only is it interesting – implemented this way the function is much faster than e.g. snprintf() in hosted C environments, while still yielding the exact same string for all finite floats/fixed-point numbers – so there is real motivation to implement and test one of their own, but everything I blabbed about above about learning the proper context, is easier to learn in practice. You'll find that whenever you read or write code that specifies a type, you automatically think in terms of "type specification context and syntax".
The English equivalent is that lead and lead do not rhyme.

SiliconWizard · « **Reply #23 on:** June 23, 2021, 04:24:25 pm »

Quote from: Nominal Animal on June 23, 2021, 01:51:36 pm

Quote from: SiliconWizard on June 22, 2021, 04:31:45 pm
Which is giving me an idea that I can try on my RISC-V core. In a typical architecture in which the stack pointer is growing downwards, we could just keep track of the *minimum* value of the stack pointer every time it changes, and store that in a dedicated register (like in a CSR.)
If you expand that idea, so that every access to memory is verified in hardware to be within an allowed range or an interrupt is triggered, you'll end up with a segmented memory model with linear address space.

Yes, but to you and those who mentioned something similar: keep in mind what I said above: what you describe is memory protection. What I suggested was for code instrumentation (with stacks in mind here, but I do have a knack for code instrumentation anyway.) Those are two different things.
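As a rough software model of the instrumentation idea (tracking the minimum value the stack pointer ever held), something like the following could be used in a simulator. The struct and function names are hypothetical, and a downward-growing stack is assumed; in hardware the comparison would happen in parallel with every SP write.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of a hypothetical "minimum SP" watermark register. */
struct sp_watermark {
    uintptr_t sp;      /* current stack pointer value */
    uintptr_t min_sp;  /* lowest value sp has ever held */
};

void sp_wm_init(struct sp_watermark *wm, uintptr_t initial_sp)
{
    assert(wm);
    wm->sp = initial_sp;
    wm->min_sp = initial_sp;
}

/* Models every write to SP: update SP and, if lower, the watermark. */
void sp_wm_write(struct sp_watermark *wm, uintptr_t new_sp)
{
    assert(wm);
    wm->sp = new_sp;
    if (new_sp < wm->min_sp)
        wm->min_sp = new_sp;
}

/* Peak stack depth in bytes, relative to the initial SP value. */
uintptr_t sp_wm_peak(const struct sp_watermark *wm, uintptr_t initial_sp)
{
    assert(wm);
    return initial_sp - wm->min_sp;
}
```

This only records how deep SP went; it does not prevent or detect any individual bad access, which is the instrumentation versus protection distinction.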

(Just a thought about memory protection for stacks - but we had a long discussion about stacks a while ago, so it's probably already all there: you can define a memory area for a stack and protect it, but there's little way you can prevent some program to write outside of the stack area, when it means to write inside of it, if this 'outside' location is itself another memory area that is allowed to be written to. Sure you can leave some space between the stack and any other memory area to mitigate this, but this is assuming any access to the stack would be strictly monotonous, meaning if the program ever had a stack overflow, it would necessarily overflow *right* after the allowed stack area. This certainly can't catch *all* faulty accesses to the stack, but again, let's refer to the other thread, because I think we discussed this, and I'm already sorry to have gotten a bit off-topic here.)

There's a number of stack analysis tools on the market. Some are purely static analysis tools (and a few commercial ones are pretty expensive). Some are dynamic tools. The dynamic approaches usually consist of adding instrumentation code. Filling stacks with magic values to estimate how much stack was used, as someone mentioned above, is a classic. I'm just suggesting something implemented in hardware. Now of course, something like this would work only for languages sticking to the standard ABI.
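For reference, the classic "fill with magic values" approach can be sketched as below. The fill pattern and function names are arbitrary choices for illustration, and a downward-growing stack is assumed, so untouched words remain at the low end of the region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_MAGIC 0xDEADBEEFu  /* arbitrary fill pattern (an assumption) */

/* Fill the whole stack region with the magic pattern before it is used. */
void stack_paint(uint32_t *lowest, size_t words)
{
    assert(lowest || words == 0);
    for (size_t i = 0; i < words; i++)
        lowest[i] = STACK_MAGIC;
}

/* Count untouched words from the low end; the rest is assumed used.
 * Returns the estimated number of words used so far. */
size_t stack_words_used(const uint32_t *lowest, size_t words)
{
    size_t untouched = 0;
    while (untouched < words && lowest[untouched] == STACK_MAGIC)
        untouched++;
    return words - untouched;
}
```

It is only an estimate: a frame that reserves space without writing to it, or that happens to write the magic value, is not counted.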

A word on static analysis tools: GCC can give you the stack usage of each defined function with the '-fstack-usage' option. Although it can be interesting, it obviously doesn't tell you much about total stack usage, as it doesn't combine this with call tree analysis, so you don't get an estimated total stack usage. I haven't found any open-source tools that can do this, but if anyone knows of one, please share.

DiTBho · « **Reply #24 on:** June 23, 2021, 04:27:23 pm »

Quote from: brucehoult on June 23, 2021, 03:08:46 am

Even the smallest commercial RISC-V CPU cores usually implement PMP (Physical Memory Protection) with typically 8 or 16 memory regions which can each be assigned Read/Write/eXecute protections for each of Machine/Supervisor (if it exists)/User modes. This is separate to and much lighter weight than MMU.

Hey? That's awesome! I had to pay extra money to buy MIPS32 commercial cores with PMP implemented. I really like to hear it's a default feature with RISC-V commercial cores

SiliconWizard · « **Reply #25 on:** June 23, 2021, 04:31:29 pm »

Quote from: DiTBho on June 23, 2021, 04:27:23 pm

Quote from: brucehoult on June 23, 2021, 03:08:46 am
Even the smallest commercial RISC-V CPU cores usually implement PMP (Physical Memory Protection) with typically 8 or 16 memory regions which can each be assigned Read/Write/eXecute protections for each of Machine/Supervisor (if it exists)/User modes. This is separate to and much lighter weight than MMU.

Hey? That's awesome! I had to pay extra money to buy MIPS32 commercial cores with PMP implemented. I really like to hear it's a default feature with RISC-V commercial cores

Before you get too excited, you'd probably need to compare the cost of the respective IPs, though.

Nominal Animal · « **Reply #26 on:** June 23, 2021, 05:30:33 pm »

Quote from: SiliconWizard on June 23, 2021, 04:24:25 pm

Yes, but to you and those who mentioned something similar: keep in mind what I said above: what you describe is memory protection. What I suggested was for code instrumentation (with stacks in mind here, but I do have a knack for code instrumentation anyway.) Those are two different things.

Agreed; that is also why I qualified it with "If you expand the idea".

The security aspect can obviously be used for instrumentation (at a suitable granularity), by simply having the out-of-bounds access interrupt expand the region. It is probably not useful to make it byte-granular, but it does cover all accesses to that segment.

Quote from: SiliconWizard on June 23, 2021, 04:24:25 pm

you can define a memory area for a stack and protect it, but there's little way you can prevent some program to write outside of the stack area, when it means to write inside of it, if this 'outside' location is itself another memory area that is allowed to be written to.

That is precisely why the segmented memory model so intrigues me: because the "segment register" acts as the key to the address space, it does affect all accesses. For a stack, it means that an OS on x86-64 could trivially have separate segments (and segment descriptors) for stack data, and initially only allocate a single page; and whenever a page fault in that segment occurs, the OS kernel could expand the mapping up to whatever process-specific limits. There would be some overhead/slowdown, but I guess it would be negligible compared to the issues with fixed-size per-thread stacks in e.g. Linux. (Even virtual address space without backing pages costs resources.)

What I don't know how to implement, is any sort of automatic shrinking of such segments; or even how to instrument it. Yes, you can track accesses, but only the process itself knows when the data in that segment is no longer needed; perhaps it knows that it will never need more than N bytes of stack, and uses a region beyond that as an extra scratchpad or something.

It is also the reason why I find named address spaces (on any hardware that can use them, not just on hardware that has to use them like Harvard architecture AVRs without an unified address space) so darned intriguing and useful in embedded/freestanding C and C++. It just makes all the complexity and overhead *vanish*.

(I've experimented a bit about vectors or indirect addressing modes including address space identifiers within the value, or address of the pointer, some time ago, and how to expose that to a C or C++ like systems programming language. I admitted defeat there; it is technically possible, but the complexity needed and overhead spent is way more than what benefits I could squeeze out of it. "Segment register overrides", or instruction prefixes/modifiers that specify the nonstandard address space the instruction operates on, with a couple of CSRs defining those address spaces, is just delightfully simple and effective in comparison. Granted, I have NOT implemented anything on an FPGA yet, so my opinion on this is subject to change once I do.)

SiliconWizard · « **Reply #27 on:** June 23, 2021, 05:39:27 pm »

Quote from: Nominal Animal on June 23, 2021, 05:30:33 pm

Quote from: SiliconWizard on June 23, 2021, 04:24:25 pm
you can define a memory area for a stack and protect it, but there's little way you can prevent some program to write outside of the stack area, when it means to write inside of it, if this 'outside' location is itself another memory area that is allowed to be written to.
That is precisely why the segmented memory model so intrigues me: because the "segment register" acts as the key to the address space, it does affect all accesses. For a stack, it means that an OS on x86-64 could trivially have separate segments (and segment descriptors) for stack data, and initially only allocate a single page; and whenever a page fault in that segment occurs, the OS kernel could expand the mapping up to whatever process-specific limits.

OK, with this example, I can see the benefits of a segmented memory model. (Of course not the horrible segments of non-protected x86!)

Actually, any memory access could be implemented as a pair of some key (that you can call "segment") identifying a memory block with given access rights, and an address (offset) within this block. This scheme can of course be used also for MMIO.
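The (key, offset) scheme can be modeled in a few lines. This is only a sketch of the idea, with made-up names (struct segment, seg_translate) and a flat descriptor table; real hardware would do the checks in the memory access path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { SEG_R = 1, SEG_W = 2, SEG_X = 4 };

/* Hypothetical descriptor for one memory block. */
struct segment {
    uintptr_t base;    /* start of the block in the underlying address space */
    size_t    limit;   /* size of the block in bytes */
    unsigned  rights;  /* combination of SEG_R, SEG_W, SEG_X */
};

/* Translate (key, offset) to a linear address for an access of len bytes.
 * Returns false (a "fault") if the key is unknown, a required right is
 * missing, or any part of the access falls outside the block. */
bool seg_translate(const struct segment *table, size_t nsegs,
                   size_t key, size_t offset, size_t len,
                   unsigned access, uintptr_t *out)
{
    assert(table && out);
    if (key >= nsegs)
        return false;

    const struct segment *s = &table[key];
    if ((s->rights & access) != access)
        return false;
    if (len > s->limit || offset > s->limit - len)
        return false;

    *out = s->base + offset;
    return true;
}
```

Because the offset is always checked against the limit of the block selected by the key, an access "meant for" one block cannot land in a neighbouring writable block, which is what a flat address space cannot guarantee.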

Flat address spaces are more convenient to use, but they are pretty crappy security-wise.

Nominal Animal · « **Reply #28 on:** June 23, 2021, 08:51:21 pm »

Quote from: SiliconWizard on June 23, 2021, 05:39:27 pm

Actually, any memory access could be implemented as a pair of some key (that you can call "segment") identifying a memory block with given access rights, and an address (offset) within this block. This scheme can of course be used also for MMIO.

Exactly. For most accesses, like on x86-64 (aka AMD64), the default key is used (as implied by the instruction).

My experiments thus far have shown that encoding the key in the pointer value does not really work. You do want the instruction to specify the key, and not the address being referred to. In C and C++, this maps surprisingly well to the named address space extensions.

Encoding the key in the address value seems interesting at first. I examined using the least significant bits of a base address, with base addresses having sufficiently large alignment requirements (64 bits or 8 bytes seems a good value on 64-bit arches; 32 bits or 4 bytes would be okay on 32-bit ones, I believe, based on examining how segments and named address spaces are used in current code). It just does not map at all to the sort of continuous linear address spaces C and C++ expect.
After that frustration, seeing how well "segments" work as address space overrides, and how naturally they are expressed via GCC C or clang C/C++ named address space extensions, surprised the heck out of me.

On an embedded architecture without virtual memory of any sort, the protection mechanism is still useful. It does mean that there are quite likely ways to get around the limitations, say by using DMA memory-to-memory copy operation; but the thing that worries me wrt. embedded architectures is the stack.

I'd really want to have stack pointer arithmetic operations (assignment, addition and subtraction, push and pop) and stack pointer relative addressing modes to be compared against a pair of limiting registers –– even if it added an extra clock cycle to all stack-based instructions. At least that way you would find out that when the device crashed due to stack overflow, that it indeed was due to stack overflow and not something else. The software running on the device might not be able to do much besides let the user know somehow, and restart – or perhaps the interrupt can only occur after the access has already occurred –; but I am absolutely certain that for those of us humans that care why a crash occurred, the overhead and costs associated with this would be worth it.

I don't think I'm the only one who has wondered how to insert a stack pointer check to the preamble of a specific function or specific functions, on embedded architectures, just to be able to know whether at that point, the stack pointer looks sane enough to proceed.
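One way such a preamble check might look in software, as a sketch only: it approximates the stack pointer with the frame address via __builtin_frame_address(0), which is a GCC/clang extension. The limit variables, the headroom parameter and both function names are hypothetical; on a real target the limits would probably come from linker symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stack limits; wide open by default so the check passes. */
uintptr_t stack_limit_lo = 0;
uintptr_t stack_limit_hi = UINTPTR_MAX;

/* True if the current frame lies inside the limits with at least
 * headroom bytes left below it (downward-growing stack assumed). */
bool stack_looks_sane(uintptr_t headroom)
{
    const uintptr_t fp = (uintptr_t)__builtin_frame_address(0);
    assert(stack_limit_lo <= stack_limit_hi);
    return fp >= stack_limit_lo
        && fp - stack_limit_lo >= headroom
        && fp < stack_limit_hi;
}

/* Example of a function that checks before doing its work. */
int risky_operation(int x)
{
    if (!stack_looks_sane(256))
        return -1;   /* report the problem instead of proceeding */
    return x * 2;
}
```

This is a software check at specific points only; it does nothing for accesses made between checks, which is why comparing every stack access against limit registers in hardware is the more interesting option.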

SiliconWizard · « **Reply #29 on:** June 23, 2021, 10:20:18 pm »

Quote from: Nominal Animal on June 23, 2021, 08:51:21 pm

My experiments thus far have shown that encoding the key in the pointer value does not really work. You do want the instruction to specify the key, and not the address being referred to.

Wherever you encode the "key" wouldn't really matter IMO. There are drawbacks and benefits either way.

What I have in mind to be more flexible, but still more secure, would be to add a field in the "segment descriptor" (or equivalent) specifying which instruction(s) are allowed to access this particular memory block. Read, write, execute rights are common flags, but I don't think I have ever seen this: restricting access to some specific instructions.

Thus, you could allow access to some memory block only for some instructions, and not others.
For stack manipulation, having specific instructions (instead of just using general-purpose load/store instructions) would definitely help here. To avoid making the ISA too complex, it could just be a matter of encoding a flag in the load/store instructions telling if they are accessing a stack or not. Would just cost one bit.

ataradov · « **Reply #30 on:** June 23, 2021, 10:58:46 pm »

But auto variables are stored on the stack, and they should be accessed by regular instructions.

I feel like you are over complicating things that are not a real problem.

DiTBho · « **Reply #31 on:** June 23, 2021, 11:26:52 pm »

Quote from: ataradov on June 23, 2021, 10:58:46 pm

I feel like you are over complicating things that are not a real problem.

I do feel so. The same feeling. I mean, I wouldn't do it

DiTBho · « **Reply #32 on:** June 23, 2021, 11:55:10 pm »

Quote from: Nominal Animal on June 23, 2021, 08:51:21 pm

At least that way you would find out that when the device crashed due to stack overflow, that it indeed was due to stack overflow and not something else.

With my little 68hc11 I have a "memory segregation" unit for the stack slice (2Kbyte of 62Kbyte addressable, 2Kbyte are reserved).

It's a simple circuit attached to the address bus. It does the following:

Code: [Select]

is_stack_ok = (SP >= SP_begin) and (SP < SP_end);
if is_stack_ok is false ('0') it raises an interrupt to inform the CPU about the stack overflow/underflow.

Unfortunately the 68hc11 doesn't have any separation between kernelspace and userspace, the stackpointer is physically the same register reloaded according to the working condition, if an application does something bad, there is no way to resume, but I added a flip-flop with a LED, so if the CPU crashes on a program, I can visually see what happened.

The typical scenario is: d'oh, it just crashed. WTF !? Oh, see, the red segregation memory LED is on, was there any issue with the stack? Really? Maybe too many nested function calls? Let's check it out ... 70% of times, a couple of iterations (sometimes many many more) later: problem solved.

No doubt it's wild, but it has worked well for a long time. The flip flop is reset by a GPIO or by reset
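A small C model of the comparator plus sticky flip-flop described above may help when reasoning about it. The struct, the function names and the address values used in any test are made up for illustration; only the comparison itself follows the pseudocode in the post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the stack window comparator and its fault latch. */
struct stack_guard {
    uint16_t sp_begin;   /* lowest valid SP value (inclusive) */
    uint16_t sp_end;     /* one past the highest valid SP value */
    bool     fault_led;  /* sticky flip-flop driving the LED */
};

/* Evaluate the comparator for the current SP. Returns true if an
 * interrupt should be raised; the latch stays set until cleared. */
bool stack_guard_check(struct stack_guard *g, uint16_t sp)
{
    assert(g && g->sp_begin < g->sp_end);
    const bool is_stack_ok = (sp >= g->sp_begin) && (sp < g->sp_end);
    if (!is_stack_ok)
        g->fault_led = true;
    return !is_stack_ok;
}

/* Models clearing the flip-flop via the GPIO line or a reset. */
void stack_guard_clear(struct stack_guard *g)
{
    assert(g);
    g->fault_led = false;
}
```

The latch is the useful part for post-mortem debugging: even if SP returns to a valid value (or the CPU crashes), the LED still shows that the window was left at some point.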

Nominal Animal · « **Reply #33 on:** June 24, 2021, 12:06:04 am »

Quote from: ataradov on June 23, 2021, 10:58:46 pm

I feel like you are over complicating things that are not a real problem.

I know; you've made it clear having devices crash does not bother you, as long as it does no harm to you personally.

brucehoult · « **Reply #34 on:** June 24, 2021, 12:26:52 am »

Quote from: DiTBho on June 23, 2021, 04:27:23 pm

Quote from: brucehoult on June 23, 2021, 03:08:46 am
Even the smallest commercial RISC-V CPU cores usually implement PMP (Physical Memory Protection) with typically 8 or 16 memory regions which can each be assigned Read/Write/eXecute protections for each of Machine/Supervisor (if it exists)/User modes. This is separate to and much lighter weight than MMU.

Hey? That's awesome! I had to pay extra money to buy MIPS32 commercial cores with PMP implemented. I really like to hear it's a default feature with RISC-V commercial cores

It's a standard option. It may or may not be a zero cost option. I have no idea.

If you go to the SiFive Core Designer at ...

https://scs.sifive.com/core-designer/customize/96d93240-cade-4c9e-9987-a12c8d15c9ad/

... and click "security" you'll see that you can adjust the number of PMP regions to any number from 1 to 16, or eliminate PMP entirely (which also removes User mode).

Do you save licensing cost by removing PMP and User mode? I don't know. Your chip will be smaller, certainly.

DiTBho · « **Reply #35 on:** June 24, 2021, 12:41:57 am »

Quote from: Nominal Animal on June 23, 2021, 05:30:33 pm

What I don't know how to implement

Why don't you write a Python simulator? So you can see if it works, and how good it goes

(I said "Python" only because I do find it as one of the most comfortable and rapid development language, but it's just "me" ... ).

ataradov · « **Reply #36 on:** June 24, 2021, 12:45:58 am »

Quote from: DiTBho on June 23, 2021, 11:55:10 pm

It's a simple circuit attached to the address bus. It does the following:

Not going to work on ARM or any RISC. Compilers rarely actually update SP. They use SP-relative addressing. So SP itself would be in the range, but instruction accessing the memory would still access outside of the stack space.

This would catch instructions that implicitly use SP, like push and pop, but not much more.

ataradov · « **Reply #37 on:** June 24, 2021, 12:46:43 am »

Quote from: Nominal Animal on June 24, 2021, 12:06:04 am

I know; you've made it clear having devices crash does not bother you, as long as it does no harm to you personally.

This is not true. There is already sufficient support in the hardware and software to catch stack overflows.

SiliconWizard · « **Reply #38 on:** June 24, 2021, 12:49:10 am »

Quote from: Nominal Animal on June 24, 2021, 12:06:04 am

Quote from: ataradov on June 23, 2021, 10:58:46 pm
I feel like you are over complicating things that are not a real problem.
I know; you've made it clear having devices crash does not bother you, as long as it does no harm to you personally.

Well, to be honest, maybe I was going a little too far with my "per-instruction access rights", but I was like thinking out loud.

I do agree and think that certainly not enough has been done for software security. Now the "good" ideas are hard to come by.

SiliconWizard · « **Reply #39 on:** June 24, 2021, 12:55:44 am »

Quote from: ataradov on June 24, 2021, 12:45:58 am

Quote from: DiTBho on June 23, 2021, 11:55:10 pm
It's a simple circuit attached to the address bus. It does the following:
Not going to work on ARM or any RISC. Compilers rarely actually update SP.

Ah, that's not true, at least on RISC-V, for which I have extensively worked with assembly. Although solely relying on checking SP is not completely robust indeed, the compilers I've used do update SP. At least with GCC, every time the stack is being used (for local variables and/or register saving), it starts by emitting an instruction decreasing SP by the amount it needs, and then accesses the stack with offsets indeed, but only positive offsets. Even at the most aggressive optimization levels. At least that's what I've seen so far.

But, I agree this is compiler-dependent, and checking the register defined as SP in the ABI is not a 100% reliable way of handling this.

brucehoult · « **Reply #40 on:** June 24, 2021, 12:58:32 am »

Quote from: Nominal Animal on June 23, 2021, 05:30:33 pm

Quote from: SiliconWizard on June 23, 2021, 04:24:25 pm
Yes, but to you and those who mentioned something similar: keep in mind what I said above: what you describe is memory protection. What I suggested was for code instrumentation (with stacks in mind here, but I do have a knack for code instrumentation anyway.) Those are two different things.
Agreed; that is also why I qualified it with "If you expand the idea".

The security aspect can obviously be used for instrumentation (at a suitable granularity), by simply having the out-of-bounds access interrupt expand the region. It is probably not useful to make it byte-granular, but it does cover all accesses to that segment.

Quote from: SiliconWizard on June 23, 2021, 04:24:25 pm
you can define a memory area for a stack and protect it, but there's little way you can prevent some program to write outside of the stack area, when it means to write inside of it, if this 'outside' location is itself another memory area that is allowed to be written to.
That is precisely why the segmented memory model so intrigues me: because the "segment register" acts as the key to the address space, it does affect all accesses.

Sure. This was all done 40 years ago or more. Many minicomputers had this functionality. Data General ones, for example. And I think Pr1me. Probably others. HP? You can make loading the segment base and limit registers something only the OS can do, not the user program.

Such machines also tended to have "gates" for system calls. You didn't load a syscall number into a register and use a single syscall instruction -- you just did a normal function call into the OS address space. But because the caller and callee were in a different protection "ring", the call instruction would check that the address called held some special data structure specifying that it was an allowed entry point to the OS. These were rather complex and CISCy but you could achieve pretty much the same thing in a RISCy way by just requiring the first instruction of the entry point to be a special instruction indicating that it was allowed to be called. (We are getting some of the same ideas in modern CPUs as part of "control flow integrity" to prevent ROP and other security breaches)

The 386 provided basically all of this. And it was pretty much never used because existing operating systems and programming languages weren't built around using such things.

Quote

What I don't know how to implement, is any sort of automatic shrinking of such segments; or even how to instrument it. Yes, you can track accesses, but only the process itself knows when the data in that segment is no longer needed; perhaps it knows that it will never need more than N bytes of stack, and uses a region beyond that as an extra scratchpad or something.

Most ABIs say that everything below the current stack pointer (or below a "red zone") can be arbitrarily nuked by the OS, by interrupts, by whatever. There are debugging tools that do this regularly. Boehm GC clears several KB of memory below the stack pointer at the start of every garbage collection, to prevent dead pointers being traced. I've never seen this cause a problem, and it's certainly a bug if it does.

So, you can on some regular basis clear the accessed and/or dirty bits on stack segment pages, and some time later return any pages below SP that haven't been used in that time period to the OS, using madvise() with MADV_DONTNEED or equivalent.

brucehoult · « **Reply #41 on:** June 24, 2021, 01:02:05 am »

Quote from: SiliconWizard on June 23, 2021, 10:20:18 pm

Thus, you could allow access to some memory block only for some instructions, and not others.
For stack manipulation, having specific instructions (instead of just using general-purpose load/store instructions) would definitely help here. To avoid making the ISA too complex, it could just be a matter of encoding a flag in the load/store instructions telling if they are accessing a stack or not. Would just cost one bit.

Having load/store instructions with a flag bit is exactly the same thing as having normal load/store and special load/store stack instructions. It's just a difference in how you write it in assembly language, not a difference in the binary encoding.

SiliconWizard · « **Reply #42 on:** June 24, 2021, 01:07:02 am »

Quote from: brucehoult on June 24, 2021, 12:58:32 am

The 386 provided basically all of this. And it was pretty much never used because existing operating systems and programming languages weren't built around using such things.

Yes, I remember having studied the 386(+) protected mode, and it allowed pretty nice stuff. Actually, whereas the x86 ISA itself was pretty mediocre, the protected mode was not too badly thought out IMHO.

I was always baffled seeing how most OSs were not making good use of it (except a couple experimental OSs I had seen). Sure that would have required some serious changes, but that was not rocket science either. Oh well. Just thinking how long we've had to wait till Windows actually made use of the execute flag.

SiliconWizard · « **Reply #43 on:** June 24, 2021, 01:10:13 am »

Quote from: brucehoult on June 24, 2021, 01:02:05 am

Quote from: SiliconWizard on June 23, 2021, 10:20:18 pm
Thus, you could allow access to some memory block only for some instructions, and not others.
For stack manipulation, having specific instructions (instead of just using general-purpose load/store instructions) would definitely help here. To avoid making the ISA too complex, it could just be a matter of encoding a flag in the load/store instructions telling if they are accessing a stack or not. Would just cost one bit.

Having load/store instructions with a flag bit is exactly the same thing as having normal load/store and special load/store stack instructions. It's just a difference in how you write it in assembly language, not a difference in the binary encoding.

Oh yes I know. By that I was just meaning that the instructions themselves could otherwise be identical and thus there wouldn't be any overhead implementing this on a hardware level.

But, as was noted, this is a can of worms. Because most languages make use of stacks that can be accessed as regular memory, in particular through pointers, and thus there is no way to make this really work. Except if we completely change the programming model...

brucehoult · « **Reply #44 on:** June 24, 2021, 01:10:55 am »

Quote from: ataradov on June 23, 2021, 10:58:46 pm

But auto variables are stored on the stack, and they should be accessed by regular instructions.

On modern machines with a sufficient number of registers, auto variables are stored in registers UNLESS they have their address taken.

If you run out of registers then some auto variables may be on the stack. The function that declares them knows this, and can use the special stack access instructions.

The difficulty is if the address is taken and passed to another function. Then the other function that uses the address has no way of knowing that special stack access instructions are needed.

This is a rather C-centric problem. Many other programming languages can't take the address of an auto variable.

In any case it's easily solved. The compiler knows which variables are affected and can store them on the heap instead of on the stack, with a malloc() at the start of the function and a free() at the end. This is probably a rare enough thing to be low overhead.
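Written out by hand, the transformation would look something like the pair of functions below. This is only an illustration of the idea with made-up function names; it is not something any existing compiler is claimed to do.

```c
#include <assert.h>
#include <stdlib.h>

/* Some other function that receives the address; it has no way of
 * knowing whether the pointee lives on a stack or elsewhere. */
void store_answer(int *out)
{
    assert(out);
    *out = 42;
}

/* As written by the programmer: x has its address taken. */
int original(void)
{
    int x = 0;
    store_answer(&x);
    return x;
}

/* What a compiler could hypothetically emit instead: x lives on the
 * heap, so store_answer() only ever needs ordinary load/store. */
int transformed(void)
{
    int *x = malloc(sizeof *x);
    if (!x)
        abort();
    *x = 0;
    store_answer(x);
    const int result = *x;
    free(x);
    return result;
}
```

Both functions return the same value; the difference is only where x is stored while its address is visible to other code.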

Nominal Animal · « **Reply #45 on:** June 24, 2021, 01:27:39 am »

Quote from: brucehoult on June 24, 2021, 12:58:32 am

The 386 provided basically all of this. And it was pretty much never used because existing operating systems and programming languages weren't built around using such things.

I take it you never read the Speeding Up Thread-Local Storage Access report by A. Oliva and G. Araújo, 2005? This is used by the Linux kernel and standard C library for thread-local storage, right now, on AMD64 – via the separate FS segment address space.

(And it is a separate address space, too. The same offset to the FS segment refers to different memory, depending on the current thread doing the access, as the FS segment base address varies. The protection features are not used. The toolchain even has a dedicated relocation entry type (tpoff) for offsets to these objects.)

In particular,

Code: [Select]

static __thread int  foo;
int get_foo(void) { return foo; }
int *get_foo_addr(void) { return &foo; }

compiles to

Code: [Select]

get_foo:
    movl    %fs:foo@tpoff, %eax
    ret

get_foo_addr:
    movq    %fs:0, %rax
    addq    $foo@tpoff, %rax
    ret

where foo@tpoff is the 32-bit ELF-relocated TLS offset (@tpoff being the relocation type). As you can see, address 0 relative to the FS segment contains the base address of the object in the standard address space.

Nominal Animal · « **Reply #46 on:** June 24, 2021, 01:38:46 am »

Quote from: ataradov on June 24, 2021, 12:46:43 am

Quote from: Nominal Animal on June 24, 2021, 12:06:04 am
I know; you've made it clear having devices crash does not bother you, as long as it does no harm to you personally.
This is not true. There is already sufficient support in the hardware and software to catch stack overflows.

That statement pair is evidence of my claim.

The support for catching stack overflows is based on heuristics. It is not deterministic. By the very definition, your "sufficient" can only mean "to the extent that I care". I find the extent to which you care, lacking.

The existing support on architectures like AVR is limited to canaries. Or does your "hardware and software" exclude AVR, too? Single-byte stack canaries have a one in 256 chance of not recording an overflow, regardless of whether the canary value is a fixed constant or not. Perhaps you find those odds "sufficient"? Multibyte canaries can reduce the probability, but not eliminate it. Thus, heuristic, not deterministic.
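To illustrate why a canary is a heuristic, here is a minimal single-byte canary sketch. The canary value and function names are arbitrary assumptions; the point is that the check can only tell whether the byte currently holds the expected value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CANARY 0xA5u   /* arbitrary single-byte canary value (an assumption) */

/* Place the canary in a byte the stack should never legitimately reach. */
void canary_arm(volatile uint8_t *guard)
{
    assert(guard);
    *guard = CANARY;
}

/* True if the byte still holds CANARY. This does not prove that nothing
 * ever wrote to it, only that its current value matches. */
bool canary_intact(const volatile uint8_t *guard)
{
    assert(guard);
    return *guard == CANARY;
}
```

An overflow that happens to store 0xA5 into the guard byte, or that skips over it entirely, leaves canary_intact() returning true.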

Whatever the exact definitions to your words are, I definitely have observed a consistent attitude from you that it is not worth your time to consider "rarely" occurring problems, even when those problems cause a crash or reboot of the affected device. I disagree. I do fully acknowledge that your attitude makes much more business sense, is the one employers prefer, and is likely to lead to commercial success; while mine does not.

brucehoult · « **Reply #47 on:** June 24, 2021, 01:49:26 am »

I haven't read that paper, but I'm well aware of the use of FS for thread local storage.

Other ISAs with more registers than i386's eight can just dedicate a general purpose register for this e.g. TP is register 4 in the standard RISC-V ABI. If I recall correctly, arm64 has a CSR that contains a pointer to thread-local storage and this is moved to a GP register in each function that needs it.

ataradov · « **Reply #48 on:** June 24, 2021, 01:50:00 am »

Quote from: SiliconWizard on June 24, 2021, 12:55:44 am

Ah, that's not true, at least on RISC-V

So what is the problem overflowing it from the other side? Or really any side you want with the code like this:

Code: [Select]

int foo(int a)
{
  volatile char zz[10];
  zz[a] = 1;
  return zz[a];
}

And the resulting assembly:

Code: [Select]

0000000000000000 <foo>:
   0:	1141                	addi	sp,sp,-16
   2:	081c                	addi	a5,sp,16
   4:	953e                	add	a0,a0,a5
   6:	4785                	li	a5,1
   8:	fef50823          	sb	a5,-16(a0) // OOPS
   c:	ff054503          	lbu	a0,-16(a0)
  10:	0141                	addi	sp,sp,16
  12:	0ff57513          	zext.b	a0,a0
  16:	8082                	ret

Call this with foo(100) or foo(-100) and it will overwrite the memory well outside the stack, while SP is perfectly fine.

And this is the most common way stack overflows happen, so any system that does not catch this is not worth considering.

brucehoult · « **Reply #49 on:** June 24, 2021, 02:42:38 am »

Wow gcc could use some improvement there.

The second instruction is unnecessary, as the third instruction could just be add a0,a0,sp, with the offsets on sb and lbu changed to the more natural 0. And the zext.b is completely unnecessary as the byte was loaded unsigned.

I guess you compiled this for rv64 as I got -12 offset on the sb/lbu in rv32.

Was it done with the system compiler on a BeagleV/Unmatched?

ataradov · « **Reply #50 on:** June 24, 2021, 02:46:01 am »

Yes, I noticed the code quality too. This was compiled with riscv64-unknown-elf-gcc from SiFive's binary distribution (gcc version 10.2.0 (SiFive GCC-Metal 10.2.0-2020.12.8 )).

The only optimization flag was -O3.

brucehoult · « **Reply #51 on:** June 24, 2021, 03:12:55 am »

It's the same as long as you have at least -O.

Clang 11.0.1 does better:

Code: [Select]

foo:                                    # @foo
        addi    sp, sp, -16
        addi    a1, sp, 6
        add     a0, a0, a1
        addi    a1, zero, 1
        sb      a1, 0(a0)
        lbu     a0, 0(a0)
        addi    sp, sp, 16
        ret

Though, again, the add 6 could be removed and 6 used as the offset for the sb and lbu.

I feel as if, after a few years of playing catchup, leading edge RISC-V compiler activity has shifted to LLVM in the last 6-12 months. For example support for both B and V extensions is better in LLVM.

westfw · « **Reply #52 on:** June 24, 2021, 03:41:26 am »

The venerable PDP-10 had a bunch of instructions where the upper half of a register would hold a count (or negative count), and the bottom half would hold an address. You could step through arrays with "Add one to both halves and jump if negative" instructions, and IIRC push/pop would inc/dec both halves and check the count, so you could trap or detect either stack overflows or stack underflows (but not both). It all fell apart when people wanted to address more than a megabyte of stuff. :-( (36-bit word size for both memory and registers, 18-bit address, no byte addressability.)
But it seems like if you have 64-bit registers, you could revive that sort of strategy. At least for "moderately sized and possibly mapped" address spaces.
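A rough sketch of how that could look with a 64-bit value, purely as an illustration: the upper 32 bits hold a negative count and the lower 32 bits hold an index standing in for the address. All names here (counted_ptr and friends) are made up for the example, and it assumes the count fits in 31 bits and the lower half never overflows into the upper half.

```c
#include <stdint.h>

/* Hypothetical packed value: negative count in the upper 32 bits,
   element index (standing in for an address) in the lower 32 bits. */
typedef uint64_t counted_ptr;

static counted_ptr counted_ptr_make(uint32_t index, uint32_t count)
{
    /* The upper half holds -count as a 32-bit two's complement value. */
    uint32_t neg = (uint32_t)0 - count;
    return ((uint64_t)neg << 32) | index;
}

/* "Add one to both halves"; returns nonzero while the count is negative. */
static int counted_ptr_step(counted_ptr *p)
{
    *p += ((uint64_t)1 << 32) + 1;
    return (*p >> 63) != 0;
}

static uint32_t counted_ptr_index(counted_ptr p)
{
    return (uint32_t)p;
}

/* Sum an array using the packed value as both index and loop counter. */
static long sum_counted(const int *data, uint32_t count)
{
    long total = 0;
    if (count == 0)
        return 0;
    counted_ptr p = counted_ptr_make(0, count);
    do {
        total += data[counted_ptr_index(p)];
    } while (counted_ptr_step(&p));
    return total;
}
```

The loop ends exactly when the count half reaches zero, so the same step that advances the index also does the bounds test.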

(Although no one seems to care much. Once you get 16MB+ stacks, you're generally on platforms that implement VM and are already paying for pager context switching, and stack protection is only incrementally more.)

Nominal Animal · « **Reply #53 on:** June 24, 2021, 05:32:31 am »

Quote from: brucehoult on June 24, 2021, 01:49:26 am

I'm well aware of the use of FS for thread local storage.

So why did you claim "pretty much never used", then? I'd say thread local storage on Linux in userspace is pretty damn far from "never used". I'm confused.

Note that this is NOT just a base register dedicated to point to the beginning of TLS; it is a real, completely separate address space. It just happens to be also mapped to be visible in the general address space, and has its starting address in the general address space stored at address zero, to make for C and C++ support easy. (g++ and clang (both C and C++) all generate the same machine instructions as gcc does, at least when using -O2 for all.)

DiTBho · « **Reply #54 on:** June 24, 2021, 08:19:44 am »

Quote from: brucehoult on June 24, 2021, 01:10:55 am

This is a rather C-centric problem. Many other programming languages can't take the address of an auto variable.

Yup, Erlang doesn't have this problem.

DiTBho · « **Reply #55 on:** June 24, 2021, 08:57:41 am »

Quote from: Nominal Animal on June 24, 2021, 01:38:46 am

Quote from: ataradov on June 24, 2021, 12:46:43 am
Quote from: Nominal Animal on June 24, 2021, 12:06:04 am
I know; you've made it clear having devices crash does not bother you, as long as it does no harm to you personally.
This is not true. There are already sufficient support in the hardware and software to catch stack overflows.
That statement pair is evidence of my claim.

The support for catching stack overflows is based on heuristics. It is not deterministic. By the very definition, your "sufficient" can only mean "to the extent that I care". I find the extent to which you care, lacking.

I have worked in places where the job requires "Software Considerations in Mission Critical Systems and Mission Critical Equipment" certifications(1). I can assure you that for things where a single software bug can kill people, there are commercial ICEs, simulators and tools to catch these defects deterministically.

There is no public documentation, they are not open source, and you need a lot of money to buy them, but they exist!

(1) Like life support systems for emergency landings in arctic areas. If the plane has a failure in flight, the firmware of the air-brake must work without any defect; it is the only thing that can save people's lives during the landing, and it makes the difference between crashing on ice (usually deadly) and having a chance to make a hard, violent, but safe landing. You need to test every single detail of your firmware, and you have powerful hardware and software tools to achieve it.

Talking about "air-brake": maybe an air-brake "on steroids" will be used in future missions to Mars. The idea is to use a super fast spaceship coupled with a gravitational slingshot to reach Mars in a shorter time (less than 5 months, probably 2 months), and to use the air-brake to decelerate when entering the Martian atmosphere, a kind of looooooooong landing with the air-brake activated full time.

The firmware must be perfect: a single error would make the mission fail and, worse yet, no help can be sent to Mars (supposing someone survives the crash by using the airbag).

brucehoult · « **Reply #56 on:** June 24, 2021, 09:14:33 am »

Because it's very very far from what the 386 designers intended.

Using *one* segment register for a trivial purpose -- mapped, as you say, into the usual address space, so there's no protection benefit, no expansion of address space benefit, no cheap remapping (within a process) benefit.

It really is using a segment register in a way that a base register would do just as good a job for, if the ISA wasn't so desperately short of registers.

The 386 designers would have thrown up their hands in despair if they knew that was what their work was going to come to.

brucehoult · « **Reply #57 on:** June 24, 2021, 09:28:36 am »

Quote from: DiTBho on June 24, 2021, 08:57:41 am

Quote from: Nominal Animal on June 24, 2021, 01:38:46 am
Quote from: ataradov on June 24, 2021, 12:46:43 am
Quote from: Nominal Animal on June 24, 2021, 12:06:04 am
I know; you've made it clear having devices crash does not bother you, as long as it does no harm to you personally.
This is not true. There are already sufficient support in the hardware and software to catch stack overflows.
That statement pair is evidence of my claim.

The support for catching stack overflows is based on heuristics. It is not deterministic. By the very definition, your "sufficient" can only mean "to the extent that I care". I find the extent to which you care, lacking.

I have worked in places where the job requires "Software Considerations in Mission Critical Systems and Mission Critical Equipment" certifications(1). I can assure you that for things where a single software bug can kill people, there are commercial ICEs, simulators and tools to catch these defects deterministically.

There is no public documentation, they are not open source, and you need a lot of money to buy them, but they exist!

:
:

Talking about "air-brake": maybe an air-brake "on steroids" will be used in future missions to Mars. The idea is to use a super fast spaceship coupled with a gravitational slingshot to reach Mars in a shorter time (less than 5 months, probably 2 months), and to use the air-brake to decelerate when entering the Martian atmosphere, a kind of looooooooong landing with the air-brake activated full time.

I worked here in NZ in 1999 with a guy who previously worked on a spacecraft called "Mars Climate Orbiter".

Apparently that deterministic catching of bugs wasn't good enough, because the mission was lost due to a mismatch between SI and colonial units.

You're going to tell me that was a long time ago and people were stupid then and such a thing can't happen now?

Nominal Animal · « **Reply #58 on:** June 24, 2021, 03:35:56 pm »

Quote from: brucehoult on June 24, 2021, 09:14:33 am

Because it's very very far from what the 386 designers intended.

Uh, who cares what the designers intended? The way they intended their design to be used was not the way it ended up being used. It would be pretty fair to say the '386 was a success not because of its design choices, but despite them.

The designers made the absolutely critical error of relying on descriptor tables anyway – and that's what killed it on '386, and got full segmented memory model support removed from AMD64. Descriptor tables, or mappings from arbitrary small integers as address space keys, "segment selectors", to the definitions of those address spaces, are just not a useful abstraction, and create an extra unneeded point of failure, particularly from a security standpoint.

Also, having the result of the segment mappings be an address in a single virtual address space was probably thought of as a necessity, because there was no (and still is no) named address space support in standard C. Nevertheless, that turned out to be a failure, because later processors had to implement PAE to get over the 4GB hump.

If, in the first place, the segments themselves had had their own page tables, and had not been limited to a single virtual address space (which itself was then optionally paged in the '386), only the maximum consecutive memory region and the simultaneously addressable memory would have been limited (to 4GB, and number of simultaneous segments×4GB = 16 GB). So, by designing in a unified virtual address space, they shot themselves in the foot.

Right now, comparing the memory model in OpenCL to the neutered one provided by SYCL, just to cater to compilers that do not want to or cannot provide named address space support, is an excellent repeat of the mistakes the i386 designers made in choosing how to implement segmented memory; they re-do the exact same choices, proven erroneous (by never being used the way their designers intended), and apparently hope that this time it leads to different results.

I do recommend reading the A. Gozillon, P. Keir: Towards Programmable Address Spaces paper from 2017. I cannot say I wholeheartedly agree or support the choices they describe (and ended up being implemented, and is now available in for example Clang 10), but I did find it informative and interesting; and very relevant to address spaces and segmented memory in general. Note that OpenCL, as discussed in that paper, has a four-level memory hierarchy: "global", "constant", "local", "private". Although this hierarchy is based on the asymmetric multiprocessing hardware OpenCL runs on, the way this memory hierarchy is used, matches pretty darn well with the segmented memory features I'd like to see; call my "ideas" security paranoia and attempts at future-proofing it, to avoid the erroneous design choices already known to be erroneous.

For the topic at hand, assuming it is still something about pointers in embedded/freestanding/nonstandard-nonhosted C and C++ environments, those four also map very well to the address spaces I'd personally love to see in such environments, for security and robustness. The "constant" address space obviously matches the Flash and ROM currently ubiquitous; "local" and "private" match the two types of limited-duration/local-scope variables and objects (that Ataradov called auto), the former being the ones that need to be accessible to the caller or callees if nontrivial function calls are made in this scope, and the latter being those that are completely local to the current scope. "global" matches whatever hardware or physical address space the environment uses, if it has such a single unified address space. The missing one is globally accessible data, possibly split into static mutable objects and variables and dynamically managed mutable objects and variables.
I find it funky that what works fine for OpenCL, is considered "too hard" or "too complex" for the embedded/freestanding environments.

Quote from: ataradov on June 24, 2021, 01:50:00 am

Code: [Select]

int foo(int a)
{
  volatile char zz[10];
  zz[a] = 1;
  return zz[a];
}

[...] Call this with foo(100) or foo(-100) and it will overwrite the memory well outside the stack, while SP is perfectly fine.

And this is the most common way stack overflows happen, so any system that does not catch this is not worth considering.

If by system you include both the compiler and the hardware, then I absolutely agree, and to both points in that last sentence.

And will revise my understanding of your actual attitude toward robustness and reliability accordingly. (Not that it matters to anyone but me, but I do find it important to point out my opinions are based on my observations, and when provided with new information, my opinions are likely to change.)

I do suspect that to truly fix this, we do need fundamental changes to C and C++.

Consider hardware that applies a check to each and every effective access using any stack pointer relative addressing modes. The check is a simple bounds check, perhaps written as (EA < base || EA >= limit), where base and limit are internal registers, and when the check triggers, a hardware interrupt is raised, with the effective address available in another internal register. (As discussed, this interrupt can default to just updating base or limit, becoming just stack size instrumentation.)
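To make the idea concrete, here is a small software model of that check. It is only a sketch: struct stack_guard and both functions are hypothetical names, and on real hardware the comparison would happen in the address generation path rather than in C.

```c
#include <stdint.h>

/* Hypothetical model of the internal registers described above. */
struct stack_guard {
    uintptr_t base;      /* lowest valid stack address */
    uintptr_t limit;     /* one past the highest valid stack address */
    uintptr_t fault_ea;  /* effective address that triggered the check */
    unsigned  faults;    /* number of times the "interrupt" fired */
};

/* Returns 1 if the access is allowed, 0 if the check triggered. */
static int guard_check(struct stack_guard *g, uintptr_t ea)
{
    if (ea < g->base || ea >= g->limit) {
        g->fault_ea = ea;   /* what the interrupt handler would read */
        g->faults++;
        return 0;
    }
    return 1;
}

/* Instrumentation variant: instead of faulting, widen the bounds, so
   base and limit end up recording the extent of stack actually touched. */
static void guard_instrument(struct stack_guard *g, uintptr_t ea)
{
    if (ea < g->base)
        g->base = ea;
    if (ea >= g->limit)
        g->limit = ea + 1;
}
```

The second function corresponds to the default handler mentioned in the parenthesis: the bounds just follow the accesses, which turns the mechanism into stack size measurement.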

If the compiler does not add an extra software check similar to one verifying a >= 0 && a < 10 before zz[a] = 1 in ataradov's example, then even the above hardware effective address check would fail to catch

Code: [Select]

int bar(char *p, int b);

int foo(int a, int b)
{
    volatile char  zz[10];
    return bar(zz + a, b);
}

simply because the error occurs when the pointer value is constructed – it is out of bounds for the referred to object – and the pointer p function bar() receives, will not be dereferenced using stack pointer relative addressing anyway (because a single unified address space is used).

A lot of the blame can be placed on the programmer, too. If we wanted bar to be able to detect invalid indexing, we should declare it something like int bar(char *buf, size_t len, size_t index, int b); instead. The standard C library in particular could have better interfaces. It would only need one line of added code to check that the value of a is a valid index to the zz array before constructing the pointer zz+a. And so on.
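A minimal sketch of both suggestions together, under some assumptions of mine: the bodies are invented for illustration (bar just stores and reads back one byte), the buffer parameter is volatile-qualified to match zz, and -1 is used as the error value even though a stored byte could in principle also read back as -1.

```c
#include <stddef.h>

/* Hypothetical callee: it receives the buffer, its length, and the index,
   so it can refuse an out-of-bounds access instead of performing it. */
static int bar(volatile char *buf, size_t len, size_t index, int b)
{
    if (index >= len)
        return -1;          /* invalid index: report, do not touch memory */
    buf[index] = (char)b;
    return buf[index];
}

static int foo(int a, int b)
{
    volatile char zz[10];
    /* The one added check: validate a before it is used as an index. */
    if (a < 0 || (size_t)a >= sizeof zz)
        return -1;
    return bar(zz, sizeof zz, (size_t)a, b);
}
```

With this shape no out-of-bounds pointer is ever constructed, so foo(100, 1) and foo(-100, 1) are rejected before any memory is touched.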

It is not an easy problem to solve; and is basically impossible, if the compiler developers choose not to participate.

For what it is worth, I have not found a combination of options to get gcc-7.5.0, g++-7.5.0, nor clang-10 compiling C or C++, to complain even a peep about my example above. Yet, it is something that immediately sticks in my eye when I look at code, exactly because it so often leads to annoying bugs.

DiTBho · « **Reply #59 on:** June 24, 2021, 04:28:39 pm »

Quote from: Nominal Animal on June 24, 2021, 03:35:56 pm

Note that OpenCL, as discussed in that paper, has a four-level memory hierarchy: "global", "constant", "local", "private"

like the ijvm machine invented and described by Andrew S. Tanenbaum

four-level memory!
- global-pool
- constant-pool
- local (it means local stack, since ijvm is a stack-based machine)
- private

SiliconWizard · « **Reply #60 on:** June 24, 2021, 05:32:12 pm »

Quote from: ataradov on June 24, 2021, 01:50:00 am

Quote from: SiliconWizard on June 24, 2021, 12:55:44 am
Ah, that's not true, at least on RISC-V
So what is the problem overflowing it from the other side? Or really any side you want with the code like this:

Code: [Select]

int foo(int a)
{
  volatile char zz[10];
  zz[a] = 1;
  return zz[a];
}

And the resulting assembly:

Code: [Select]

0000000000000000 <foo>:
   0:	1141                	addi	sp,sp,-16
   2:	081c                	addi	a5,sp,16
   4:	953e                	add	a0,a0,a5
   6:	4785                	li	a5,1
   8:	fef50823          	sb	a5,-16(a0) // OOPS
   c:	ff054503          	lbu	a0,-16(a0)
  10:	0141                	addi	sp,sp,16
  12:	0ff57513          	zext.b	a0,a0
  16:	8082                	ret

Call this with foo(100) or foo(-100) and it will overwrite the memory well outside the stack, while SP is perfectly fine.

And this is the most common way stack overflows happen, so any system that does not catch this is not worth considering.

You're mixing two things. They may be equally bad, but two different things nonetheless.

The first thing is the typical stack overflow, which I was talking about. That would come from using more stack than is available, usually due to a greater call depth than expected and/or too much data allocated on the stack within one particular function.

What you are showing here is just what we call a buffer overflow, and more generally speaking, is writing (or reading) at a memory location that is not supposed to be accessed by the piece of code of interest. It can happen in all kinds of situations, not just when said memory is supposed to be a "stack".

You mentioned it probably because "buffer overflows" are a very typical and very well known software security issue, but this is a separate issue from pure stack overflows.

SiliconWizard · « **Reply #61 on:** June 24, 2021, 05:37:38 pm »

Quote from: brucehoult on June 24, 2021, 01:10:55 am

Quote from: ataradov on June 23, 2021, 10:58:46 pm
But auto variables are stored on the stack, and they should be accessed by regular instructions.

On modern machines with a sufficient number of registers, auto variables are stored in registers UNLESS they have their address taken.

If you run out of registers then some auto variables may be on the stack. The function that declares them knows this, and can use the special stack access instructions.

The difficulty is if the address is taken and passed to another function. Then the other function that uses the address has no way of knowing that special stack access instructions are needed.

This is a rather C-centric problem. Many other programming languages can't take the address of an auto variable.

Yes, but this doesn't make it a C-centric problem.
Many languages don't allow directly taking the address of a variable, be it on the stack or anywhere else. Some only on the stack. But that's from a programmer's POV. Many other languages still allow calling functions on local variables passed "by reference", which is essentially getting an address to it behind the scenes. So that's the same. It's just that said address is not directly accessible to the programmer.
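In C the same thing is simply spelled out. A tiny sketch (function and variable names are made up) of what a by-reference call on a local amounts to:

```c
/* The callee only ever sees an address; it cannot tell whether the
   object behind it lives on the caller's stack or somewhere else. */
static void increment(int *p)
{
    *p += 1;
}

static int caller(void)
{
    int counter = 7;       /* auto variable whose address gets taken */
    increment(&counter);   /* "by reference": pass the address */
    return counter;
}
```

A language with reference parameters would hide the & and the *, but what crosses the call boundary is the same kind of address.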

Nominal Animal · « **Reply #62 on:** June 24, 2021, 07:47:58 pm »

Quote from: SiliconWizard on June 24, 2021, 05:32:12 pm

stack overflow [versus] buffer overflow [on a stack based buffer]

Very good point; I missed that myself.

Perhaps it is a good idea to remind oneself that on small microcontrollers with limited RAM, heap and stack are typically the opposite ends of a single continuous block of RAM. Dynamic memory allocations reduce the space left for the stack (unless they use a hole left from an earlier allocation since freed), so basically we have this waterline that varies at runtime (indicating the end of currently allocated dynamic memory with the highest address) that the stack must not cross.

One reason runtime heuristics like stack canaries have such a bad time detecting this before the device has already crashed and pooped all over, is that that waterline does not stay constant, it moves (if any dynamic memory allocations are done), and it could be either a dynamic memory allocation or the stack growing that caused the waterline to be crossed.

Now, add a nasty buffer overrun – especially the kind that does not just fill an array over its allocated size, but nefariously accesses/modifies a single byte or a group of buffer entries way past the end (or the beginning) of the buffer – and you get the kind of bugnest that can cause one to decide to switch to woodworking. At least there you get to use a hammer on any bugs you see. Canaries are rather unlikely to happen to be exactly where that access ended up modifying memory, so may not help at all.

(That said, off by one errors seem to be the most common buffer overrun cases, i.e. overwriting a byte/int just preceding or immediately succeeding the intended object. Those are relatively easy to catch. But the nasty ones are the jumpy ones, as they can be very hard to spot in the code. Integer promotion causing sign extension on something that was intended to be an unsigned value can be very hard to spot, and if they occur at an index calculation, the end result can be way off. This is one reason you'll see my own code using way more explicit casts than what are technically required; since the casts typically only cost human observation and do not generate extra machine code, I consider it an appropriate way to try and avoid some of those nasty kangaroo indexing bugs. A semi-related case in point: how many C programmers know or care that if they happen to have a char or int c, the proper way to test in a hosted environment if c is a whitespace character, is NOT isspace(c), but isspace((unsigned char)c)?)
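Two small illustrations of those casts, as a sketch only (function names are invented). The first is the isspace() case; the second shows a byte from a char buffer used as a table index, where a signed char would otherwise turn bytes above 0x7F into negative indexes.

```c
#include <ctype.h>

/* Count whitespace characters in a string. The cast matters: isspace()
   requires a value representable as unsigned char (or EOF), and plain
   char may be signed, so bytes above 0x7F could otherwise be negative. */
static int count_spaces(const char *s)
{
    int n = 0;
    for (; *s != '\0'; s++)
        if (isspace((unsigned char)*s))
            n++;
    return n;
}

/* Index a 256-entry table with a byte taken from a char buffer.
   Without the cast, a byte such as 0xE9 becomes a negative index
   where char is signed, landing well before the start of the table. */
static int lookup(const int table[256], const char *bytes, int i)
{
    return table[(unsigned char)bytes[i]];
}
```

Neither cast should cost anything in the generated code on typical targets; they only pin down which conversion is meant.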

On an embedded architecture, it would be rather nice to have that waterline address in a special register, even one that is relatively slow to access and update, if the stack pointer crossing it would cause a hardware interrupt. It would not help with the buffer under/overrun/overflow bugs, but it would make the stack waterline crossing detection at runtime, deterministic.

I can even imagine/describe a couple of C programming patterns (admittedly using setjmp()/longjmp() which I do not like at all to use) that could set up a safe state to revert to if a waterline crossing event were to occur, so that a reboot or crash could actually be avoidable in many situations. (It won't complete/revert I/O done meanwhile, so it is more about cancelling computational rather than I/O work when that work cannot be done with the currently available stack space.)
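One such pattern, very roughly sketched. Since no such waterline register exists here, the crossing is faked with a recursion depth budget, and waterline_crossed() stands in for whatever the interrupt handler would do; whether longjmp() is usable from a real interrupt context depends entirely on the platform. All names are hypothetical.

```c
#include <setjmp.h>

static jmp_buf safe_state;           /* hypothetical recovery point */

/* Stand-in for the handler a waterline-crossing interrupt would invoke. */
static void waterline_crossed(void)
{
    longjmp(safe_state, 1);
}

/* Recursive work; the depth budget simulates running out of stack. */
static long deep_sum(int n, int depth, int budget)
{
    if (depth >= budget)
        waterline_crossed();         /* does not return */
    if (n == 0)
        return 0;
    return n + deep_sum(n - 1, depth + 1, budget);
}

/* Returns 0 and stores the result on success, -1 if the work was cancelled. */
static int try_sum(int n, int budget, long *result)
{
    if (setjmp(safe_state) != 0)
        return -1;                   /* back at the safe state */
    *result = deep_sum(n, 0, budget);
    return 0;
}
```

The caller of try_sum() gets an ordinary error return instead of a crash, and *result is left untouched when the work is cancelled.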

SiliconWizard · « **Reply #63 on:** June 24, 2021, 09:00:25 pm »

Quote from: Nominal Animal on June 24, 2021, 07:47:58 pm

Perhaps it is a good idea to remind oneself that on small microcontrollers with limited RAM, heap and stack are typically the opposite ends of a single continuous block of RAM. Dynamic memory allocations reduce the space left for the stack (unless they use a hole left from an earlier allocation since freed), so basically we have this waterline that varies at runtime (indicating the end of currently allocated dynamic memory with the highest address) that the stack must not cross.

Yes, that is the usual layout.

I tend to avoid dynamic allocation on embedded stuff. But when I have to use it, here is what I do: I write a linker script so as to reserve space for the stack. It exports a symbol with the lowest stack address. Then I implement the _sbrk() function so that dynamic allocations can never overflow into the stack. Such that if a dynamic allocation would get into the stack, it will just fail (returning a NULL).

Reserving the stack in the linker script also prevents static allocation from decreasing the usable stack size.

Drawback of this scheme is that of course, now you have a fixed reserved stack space that can't be used for anything else, but I wouldn't trade this for the ability to allocate more heap if not all stack is used, or conversely. Way too slippery.

Of course, this scheme prevents heap allocation from eating the reserved stack, but it doesn't prevent the stack from overflowing. And this is where a hardware-based check would be useful.
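The heap side of this can be sketched as follows. This is a simulation, not my actual code: a static array stands in for RAM, HEAP_START and STACK_LIMIT stand in for symbols a linker script would export (the names are made up), and checked_sbrk() follows the usual sbrk convention of returning (void *)-1 on failure, which is what makes the allocator on top of it return NULL.

```c
#include <stddef.h>
#include <errno.h>

/* Simulated RAM block; on a real target these two boundaries would be
   symbols exported by the linker script (names here are hypothetical). */
static unsigned char ram[1024];
#define HEAP_START  (ram)
#define STACK_LIMIT (ram + sizeof ram - 256)   /* lowest stack address */

static unsigned char *heap_end = HEAP_START;

/* Grow (or shrink) the heap by incr bytes, refusing to enter the
   reserved stack region. Returns the previous break, or (void *)-1. */
static void *checked_sbrk(ptrdiff_t incr)
{
    unsigned char *prev = heap_end;
    if (incr > STACK_LIMIT - heap_end || incr < HEAP_START - heap_end) {
        errno = ENOMEM;
        return (void *)-1;
    }
    heap_end += incr;
    return prev;
}
```

Here 256 bytes at the top are reserved for the stack, so the heap can never grow past the first 768 bytes no matter how the requests arrive.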

brucehoult · « **Reply #64 on:** June 24, 2021, 09:43:56 pm »

Quote from: SiliconWizard on June 24, 2021, 05:37:38 pm

Many languages don't allow directly taking the address of a variable, be it on the stack or anywhere else. Some only on the stack. But that's from a programmer's POV. Many other languages still allow calling functions on local variables passed "by reference", which is essentially getting an address to it behind the scenes. So that's the same. It's just that said address is not directly accessible to the programmer.

Which ones, that anyone uses today? Fortran?

Note that neither Pascal's "var" nor Ada's "in out" require by reference.

Java, Python etc don't have address-taking.

DiTBho · « **Reply #65 on:** June 25, 2021, 08:09:07 am »

Quote from: brucehoult on June 24, 2021, 09:43:56 pm

Which ones, that anyone uses today? Fortran?
Note that neither Pascal's "var" nor Ada's "in out" require by reference.
Java, Python etc don't have address-taking.

Technically, Fortran is mandatory in a couple of Linux distributions.

Code: [Select]

# mandatory languages
enable-languages += c,c++,fortran

I don't know what for, I know that as an administrator I have to spend a lot of my time building gcc with it enabled and properly patched.

Both the rootfs for the Jetson and Coral dev-clusters have dependencies with fortran

Speaking about things I have to support, it appears that PHP requires var by reference in functions that require "in out".

Code: [Select]

function do_foo
(
   &$core      /* in out */
)

I don't know how PHP interpreters are implemented and invoked by web-server (e.g. apache2 PHP-mod), but I have here a PHP compiler written by a crazy guy from a company I collaborate with, it's quite a personal project, but two months ago he started using it for some things in production. That's really crazy.

I hope no one follows his idea, but who knows?

Veketti · « **Reply #66 on:** June 25, 2021, 08:29:27 am »

These pointers have been like voodoo to me and never really had to get involved with them. So far I’ve been managing with functions returning values. That is easier to understand. But I’m starting to get the hang of it, because of you. However is there a case when you should use function return instead of pointers?

Then about the different variable names in caller and callee. So if caller sends Mike to callee and in callee it is called Tiffany. It doesn’t matter as Mike and Tiffany have the same address? Let’s say Mike’s memory address is 7, we pass just address 7 to callee and don’t care about the names we call them. Did I understand right? This is confusing, why would you do that, if you’re not meaning to confuse.

Then regarding this:

Code: [Select]

a[b] == *(a + b) == *(b + a) == b[a]
Lets assume

Code: [Select]

a{11,22,33,44} and b==3 then “a[b]” a[3] ==33, but how come b[a] is equal? Does not make sense to me, please explain?

Then if it’s ok to bring volatile here as well. If I have global variable which is used in two threads, I must declare it to be volatile. However is there a case that global variable shouldn’t be declared as volatile? And could they always be volatile as default?

Thanks again for your help.

Nusa · « **Reply #67 on:** June 25, 2021, 10:31:42 am »

Quote from: Veketti on June 25, 2021, 08:29:27 am

These pointers have been like voodoo to me and never really had to get involved with them. So far I’ve been managing with functions returning values. That is easier to understand. But I’m starting to get the hang of it, because of you. However is there a case when you should use function return instead of pointers?

Then about the different variable names in caller and callee. So if caller sends Mike to callee and in callee it is called Tiffany. It doesn’t matter as Mike and Tiffany have the same address? Let’s say Mike’s memory address is 7, we pass just address 7 to callee and don’t care about the names we call them. Did I understand right? This is confusing, why would you do that, if you’re not meaning to confuse.

Then regarding this:
Code: [Select]
a[b] == *(a + b) == *(b + a) == b[a]
Lets assume
Code: [Select]
a{11,22,33,44} and b==3 then “a[b]” a[3] ==33, but how come b[a] is equal? Does not make sense to me, please explain?

Then if it’s ok to bring volatile here as well. If I have global variable which is used in two threads, I must declare it to be volatile. However is there a case that global variable shouldn’t be declared as volatile? And could they always be volatile as default?

Thanks again for your help.

Actually, in your example, a[3] is 44, not 33. C is zero-indexed, even if the real world likes to start counting at 1.
Let's say that a points to memory address 1000, and b is 3. a[3], *(a+b), *(1000+3) are equivalent, no? Ditto for b[a], *(b+a), *(3+1000). The address is just math under the surface, and it's commutative because of math.
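If you want to try it yourself, here is a small sketch using the array from the question (the function names are just for illustration). Note that a + b moves forward by b elements, not b bytes; the compiler scales by the element size for you.

```c
static const int a[4] = {11, 22, 33, 44};

/* Three spellings of the same access; each reads element b of a. */
static int via_index(int b)   { return a[b]; }
static int via_pointer(int b) { return *(a + b); }
static int via_swapped(int b) { return b[a]; }
```

All three return 44 for b == 3 and 33 for b == 2, because the compiler treats a[b] as shorthand for *(a + b).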

As for volatile, that's a keyword that tells the compiler that it can't make assumptions about the value of the variable when optimizing -- it forces an actual check of the memory value every time instead of reusing a register value. If you want an analogy: if you're the only one using a blackboard for data, you don't bother looking at the blackboard if you remember what you wrote. But if more than one person is using the blackboard, what you wrote may have been changed by the other guy when you weren't looking. So you have to look every time to get the current value.
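A small sketch of the blackboard situation, using a signal handler as "the other guy" since that is the one case plain standard C covers directly. The names are made up, and SIGINT is used only because it is guaranteed to exist.

```c
#include <signal.h>

/* The flag is written by the handler and read by the main code. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Installs the handler, raises the signal, then looks at the flag.
   Because the flag is volatile, every loop iteration reads memory
   instead of reusing a value remembered from before. */
static int wait_for_flag(void)
{
    got_signal = 0;
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    raise(SIGINT);
    while (!got_signal)
        ;
    return got_signal;
}
```

In the loop itself nothing visibly changes got_signal, which is exactly the situation where the compiler would otherwise be tempted to read it once and remember the answer.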

Veketti · « **Reply #68 on:** June 25, 2021, 11:45:12 am »

Ah, yes of course 44, that was brainfart from me, I forgot it starts from 0.
Thanks for your explanation.

Siwastaja · « **Reply #69 on:** June 25, 2021, 01:33:03 pm »

a[ b ] being equivalent to b[ a ] is most often just a funny remark; I don't remember ever seeing an actual use for this. After all, [] is eye candy making things more readable, and idx[ array ] isn't readable. But I'm sure there's some obscure use for this I haven't seen.

Nominal Animal · « **Reply #70 on:** June 25, 2021, 04:33:15 pm »

Quote from: Veketti on June 25, 2021, 08:29:27 am

However is there a case that global variable shouldn’t be declared as volatile? And could they always be volatile as default?

Declaring a variable volatile is always safe, just potentially inefficient.

You see, the C standards define volatile as

Quote

Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.

and points out in a footnote that

Quote

An implementation might define a one-to-one correspondence between abstract and actual semantics: at every sequence point, the values of the actual objects would agree with those specified by the abstract semantics. The keyword volatile would then be redundant.

Indeed, some C compilers did do just that.

A core way current C compilers generate much more efficient code is by relying on the rule that if an object is not examined, and it is not volatile, its value does not matter. (This is also why you will see all memory-mapped I/O register objects in C and C++ declared volatile. If they are not, it is a bug.)

The way I define volatile may not be exactly correct (in the language lawyer sense), but it is a very useful intuitive definition and correct in the real world: it tells the compiler that the object may be concurrently modified by some other code the compiler does not know about, and therefore the compiler must not, is not allowed to, make any assumptions. Without volatile, an assumption a C compiler can make, is for example that if object foo is not modified by any code the compiler knows about between sequence points X and Y, the compiler can reuse the value of foo it had at sequence point X at the later sequence point Y.

For example, if you have say

Code: [Select]

double doh(const double *const xref, const double *const yref, const double *const zref)
{
    double  result;
    result  = (*xref) * (*yref);
    do_something_slow_1();
    result += (*xref) * (*zref);
    do_something_slow_2();
    result += (*yref) * (*zref);
    do_something_slow_3();
    return result;
}

a C compiler is free to generate the same machine code as it would for

Code: [Select]

double doh(const double *const xref, const double *const yref, const double *const zref)
{
    const double x = *xref;
    const double y = *yref;
    const double z = *zref;
    do_something_slow_1();
    do_something_slow_2();
    do_something_slow_3();
    return x*y + x*z + y*z;
}

only because the pointers do not point to volatile doubles, and result is only observable within its local scope (and not in do_something_slow_n() functions).

If the pointers were declared as const volatile double *, then the compiler would NOT be allowed to do this: it would have to dereference the pair of pointers between the calls to do_something_slow_n() functions, to acquire the values of the referred to objects without "caching" them across sequence points.

To see why volatile matters, just consider another thread modifying the values that xref, yref, and zref point to, during the calls to the do_something_slow_n() functions. The result you obtain from the function call then depends on whether you declare the values the pointers point to volatile or not. (Declaring the pointer variable itself volatile, say const double *const volatile xref, would be silly, because it'd tell the compiler that the pointer may be modified by some unseen code.)

In all cases, having the volatile there means the compiler will follow the C standard abstract machine model more strictly, so if you ever find code that behaves correctly without volatile, and incorrectly with volatile, then that code is strange and very suspect indeed; it must rely on the compiler to generate the code in some specific way, regardless of what the C standard says the compiler is allowed or should do in that situation. Bad, bad code, that; needs a rewrite for sure.
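One standard-C case of "some other code the compiler does not know about" is a signal handler. The following is only a minimal sketch, assuming a hosted environment; the names got_signal, on_signal, and run_until_signalled are made up for illustration, and raise() merely stands in for an event arriving from outside.

```c
#include <signal.h>

/* Written by the handler, read by the loop: volatile sig_atomic_t. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signum)
{
    (void)signum;
    got_signal = 1;
}

/* Spins until the handler has run. Returns the number of loop
   iterations, or -1 if the handler could not be installed. */
int run_until_signalled(void)
{
    int spins = 0;

    got_signal = 0;
    if (signal(SIGTERM, on_signal) == SIG_ERR)
        return -1;

    while (!got_signal) {
        spins++;
        if (spins == 3)
            raise(SIGTERM);   /* stand-in for an asynchronous event */
    }

    signal(SIGTERM, SIG_DFL);
    return spins;
}
```

Because got_signal is volatile, every evaluation of !got_signal is a fresh read. If the flag were not volatile and the compiler could see that the loop body never modifies it, it would be allowed to read it once and keep using that value.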

The final wrinkle is exactly what a sequence point is in the C standard. Fortunately, the standards have an informative annex (Annex C in C11; so not "this is what it is", but "we the standard writers believe that the sequence points are these, but if the text of the standard disagrees, then the text of the standard is right and this list wrong") stating that sequence points are:

Between the evaluations of the function designator and actual arguments in a function call and the actual call
Between the evaluations of the first and second operands of logical AND (&&), logical OR (||), and the comma operator (,)
Between the evaluations of the first operand of the conditional ? : operator and whichever of the second and third operands is evaluated
The end of a full declarator
Between the evaluation of a full expression and the next full expression to be evaluated. (Full expressions being an initializer that is not part of a compound literal, the expression in an expression statement, the controlling expression of an if or switch selection statement, the controlling expression of a while or do statement, each of the (optional) expressions of a for statement, and the (optional) expression in a return statement.)
Immediately before a library function returns
After the actions associated with each formatted input/output function conversion specifier
Immediately before and immediately after each call to a comparison function, and also between any call to a comparison function and any movement of the objects passed as arguments to that call

according to the final published draft of the C11 standard, also known as n1570.pdf. Sequence points themselves are just the concept of how the C standard defines the order of effects. Between two sequence points, effects or side effects can occur in whatever order; but generally speaking, the sequence points are defined such that each useful effect or observable result or side effect is nicely bracketed between two sequence points.
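To make the && and || entries in that list concrete, here is a small sketch (step, order, calls, and short_circuit_demo are illustrative names, not from any library). Each call to step() records its id, so the order of side effects can be inspected afterwards.

```c
#include <stddef.h>

static int order[8];       /* ids of step() calls, in call order */
static size_t calls = 0;   /* number of step() calls so far */

/* Records that it was called, then returns value unchanged. */
static int step(int id, int value)
{
    if (calls < sizeof order / sizeof order[0])
        order[calls] = id;
    calls++;
    return value;
}

/* The sequence point after the first operand of && and || fixes the
   order: step(1) runs first and yields 0, so step(2) is never
   evaluated; then step(3) runs as the right operand of ||. */
int short_circuit_demo(void)
{
    calls = 0;
    return (step(1, 0) && step(2, 1)) || step(3, 1);
}
```

By contrast, an expression like i = i++ + 1 modifies i twice with no sequence point in between, which is why its behaviour is undefined.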

Nominal Animal · « **Reply #71 on:** June 25, 2021, 06:27:06 pm »

I anticipate that there might be some discussion looming whether a C compiler is allowed to generate the same code for the two doh() functions I showed.

Instead of getting bogged down in language-lawyerism, let's expand it a bit into a complete example we can compile and examine:

Code: [Select]

static volatile int  n = 0;

void do_something_slow_1(void) { n += 1; }
void do_something_slow_2(void) { n += 2; }
void do_something_slow_3(void) { n += 3; }

static inline double doh1i(const double *const xref, const double *const yref, const double *const zref)
{
    double  result;
    result  = (*xref) * (*yref);
    do_something_slow_1();
    result += (*xref) * (*zref);
    do_something_slow_2();
    result += (*yref) * (*zref);
    do_something_slow_3();
    return result;
}

static inline double doh2i(const double *const xref, const double *const yref, const double *const zref)
{
    const double x = *xref;
    const double y = *yref;
    const double z = *zref;
    do_something_slow_1();
    do_something_slow_2();
    do_something_slow_3();
    return x*y + x*z + y*z;
}

double doh1(const int ix, const int iy, const int iz)
{
    const double x = ix, y = iy, z = iz;
    return doh1i(&x, &y, &z);
}

double doh2(const int ix, const int iy, const int iz)
{
    const double x = ix, y = iy, z = iz;
    return doh2i(&x, &y, &z);
}

Using clang-10 -Wall -O2 -std=c11 -c doh.c this compiles to

Code: [Select]

doh1:                           doh2:
    cvtsi2sd %edi, %xmm1            cvtsi2sd %edi, %xmm1
    cvtsi2sd %esi, %xmm0            cvtsi2sd %esi, %xmm0
    cvtsi2sd %edx, %xmm2            cvtsi2sd %edx, %xmm2
    addl     $1, n(%rip)            addl     $1, n(%rip)
    movapd   %xmm1, %xmm3           addl     $2, n(%rip)
    mulsd    %xmm0, %xmm3           addl     $3, n(%rip)
    mulsd    %xmm2, %xmm1           movapd   %xmm1, %xmm3
    addsd    %xmm3, %xmm1           mulsd    %xmm0, %xmm3
    addl     $2, n(%rip)            mulsd    %xmm2, %xmm1
    mulsd    %xmm2, %xmm0           addsd    %xmm3, %xmm1
    addsd    %xmm1, %xmm0           mulsd    %xmm2, %xmm0
    addl     $3, n(%rip)            addsd    %xmm1, %xmm0
    retq                            retq

You do not need to understand AT&T syntax AMD64 assembly (which has source on the left and destination on the right, opposite to Intel syntax): All you need to know is that the instructions that load the doubles from memory have the -offset(%rsp), %xmmN format, and the slow function calls correspond to the addl $1, n(%rip) instructions.

Simply put, Clang-10 keeps the instruction order basically intact even without the volatile.

GCC-7.5.0 (gcc -Wall -O2 -std=c11 -c doh.c) generates

Code: [Select]

doh1:                           doh2:
    pxor     %xmm2, %xmm2           pxor     %xmm2, %xmm2
    movl     n(%rip), %eax          movl     n(%rip), %eax
    pxor     %xmm3, %xmm3           pxor     %xmm1, %xmm1
    pxor     %xmm1, %xmm1           pxor     %xmm3, %xmm3
    cvtsi2sd %edi, %xmm2            cvtsi2sd %edi, %xmm2
    addl     $1, %eax               addl     $1, %eax
    cvtsi2sd %edx, %xmm3            cvtsi2sd %esi, %xmm1
    movl     %eax, n(%rip)          movl     %eax, n(%rip)
    cvtsi2sd %esi, %xmm1            cvtsi2sd %edx, %xmm3
    movl     n(%rip), %eax          movl     n(%rip), %eax
    addl     $2, %eax               addl     $2, %eax
    movl     %eax, n(%rip)          movl     %eax, n(%rip)
    movl     n(%rip), %eax          movl     n(%rip), %eax
    addl     $3, %eax               addl     $3, %eax
    movl     %eax, n(%rip)          movl     %eax, n(%rip)
    movapd   %xmm2, %xmm4           movapd   %xmm2, %xmm0
    mulsd    %xmm3, %xmm2           mulsd    %xmm3, %xmm2
    mulsd    %xmm1, %xmm4           mulsd    %xmm1, %xmm0
    mulsd    %xmm3, %xmm1           mulsd    %xmm3, %xmm1
    movapd   %xmm2, %xmm0           addsd    %xmm0, %xmm2
    addsd    %xmm4, %xmm0           addsd    %xmm1, %xmm2
    addsd    %xmm1, %xmm0           movapd   %xmm2, %xmm0
    ret                             ret

which is basically identical for both, aside from register naming differences.

Language-lawyerism aside, it means that if you use GCC-7.5.0, with this kind of a code pattern, what I described in my previous post will happen to you too:
without volatile, the two versions of doh() will generate the same machine code.

The instruction pattern GCC-7.5.0 generates for updating the counter n is
movl n(%rip), %eax
addl $N, %eax
movl %eax, n(%rip)
which annoys the heck out of me. It is not just the sane addl $N, n(%rip) clang-10 uses, and I cannot fathom why; I thought this kind of superfluous register dance was more or less fixed a couple of major versions ago... This is also why I don't trust compilers any further than I examine their output, and is the reason why I use extended inline assembly functions for oddball memory-mapped I/O register accesses: to ensure the exact instruction I want will be used.

Nevertheless, I should be happy, because it backs up my argument. (I'm not, because I don't want to win. I want to help others write better code, and especially to write and show me better code than I myself can write, because I'm selfish and self-centered and only care about winning my past self. That dude was an asshole.)

If we replace const double *const with const volatile double *const, then clang-10 generates

Code: [Select]

doh1:                           doh2:
    cvtsi2sd %edi, %xmm0            cvtsi2sd %edi, %xmm0
    cvtsi2sd %esi, %xmm1            movsd    %xmm0, -8(%rsp)
    movsd    %xmm0, -8(%rsp)        xorps    %xmm0, %xmm0
    movsd    %xmm1, -16(%rsp)       cvtsi2sd %esi, %xmm0
    xorps    %xmm0, %xmm0           cvtsi2sd %edx, %xmm1
    cvtsi2sd %edx, %xmm0            movsd    %xmm0, -16(%rsp)
    movsd    %xmm0, -24(%rsp)       movsd    %xmm1, -24(%rsp)
    movsd    -8(%rsp), %xmm0        movsd    -8(%rsp), %xmm1
    mulsd    -16(%rsp), %xmm0       movsd    -16(%rsp), %xmm0
    addl     $1, n(%rip)            movsd    -24(%rsp), %xmm2
    movsd    -8(%rsp), %xmm1        addl     $1, n(%rip)
    mulsd    -24(%rsp), %xmm1       addl     $2, n(%rip)
    addl     $2, n(%rip)            addl     $3, n(%rip)
    addsd    %xmm0, %xmm1           movapd   %xmm1, %xmm3
    movsd    -16(%rsp), %xmm0       mulsd    %xmm0, %xmm3
    mulsd    -24(%rsp), %xmm0       mulsd    %xmm2, %xmm1
    addsd    %xmm1, %xmm0           addsd    %xmm3, %xmm1
    addl     $3, n(%rip)            mulsd    %xmm2, %xmm0
    retq                            addsd    %xmm1, %xmm0
                                    retq

the difference being that now doh1() has exactly the behaviour we/I/the author intended.

Like I claimed, volatile stops clang-10 from generating the same code it does for doh2(). This means we can use volatile as I described in my previous post to control what kind of assumptions the compiler can make. Here, we want to do the slow calls in between accesses to the pointed-to doubles, so we need to tell the compiler the pointed-to objects are volatile, and it does what we expect it to. Nice.
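For reference, the change being described is only in the parameter types. A sketch of the volatile variant, written out in full so it can be compiled on its own (doh1v, doh1_volatile, and slow_total are names invented here; the slow functions are made static only to keep the sketch in one file):

```c
static volatile int n = 0;

static void do_something_slow_1(void) { n += 1; }
static void do_something_slow_2(void) { n += 2; }
static void do_something_slow_3(void) { n += 3; }

/* Same body as doh1i(); only the pointed-to type gained volatile,
   so each *ref below is a read the compiler may not cache or move
   across the calls. */
static inline double doh1v(const volatile double *const xref,
                           const volatile double *const yref,
                           const volatile double *const zref)
{
    double  result;
    result  = (*xref) * (*yref);
    do_something_slow_1();
    result += (*xref) * (*zref);
    do_something_slow_2();
    result += (*yref) * (*zref);
    do_something_slow_3();
    return result;
}

double doh1_volatile(const int ix, const int iy, const int iz)
{
    const double x = ix, y = iy, z = iz;
    return doh1v(&x, &y, &z);
}

/* Helper for checking how far the counter has advanced. */
int slow_total(void) { return n; }
```

The numeric result is the same as before; what changes is where the loads sit relative to the increments of n in the generated code.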

As I'm writing this, I'm seriously considering switching from gcc-7.5.0 to clang-10 on at least AMD64. I didn't realize before that clang-10 output is that much better.

Anyway, here is the GCC-7.5.0 output, when function parameters are declared as const volatile double *const ref:

Code: [Select]

doh1:                           doh2:
    pxor     %xmm0, %xmm0           pxor     %xmm0, %xmm0
    subq     $40, %rsp              subq     $40, %rsp
    movq     %fs:40, %rax           movq     %fs:40, %rax
    movq     %rax, 24(%rsp)         movq     %rax, 24(%rsp)
    xorl     %eax, %eax             xorl     %eax, %eax
    cvtsi2sd %edi, %xmm0            cvtsi2sd %edi, %xmm0
    movsd    %xmm0, (%rsp)          movsd    %xmm0, (%rsp)
    pxor     %xmm0, %xmm0           pxor     %xmm0, %xmm0
    movsd    (%rsp), %xmm1          movsd    (%rsp), %xmm2
    cvtsi2sd %esi, %xmm0            cvtsi2sd %esi, %xmm0
    movsd    %xmm0, 8(%rsp)         movsd    %xmm0, 8(%rsp)
    pxor     %xmm0, %xmm0           pxor     %xmm0, %xmm0
    cvtsi2sd %edx, %xmm0            movsd    8(%rsp), %xmm1
    movsd    %xmm0, 16(%rsp)        cvtsi2sd %edx, %xmm0
    movsd    8(%rsp), %xmm0         movsd    %xmm0, 16(%rsp)
    movl     n(%rip), %eax          movapd   %xmm2, %xmm0
    mulsd    %xmm0, %xmm1           movsd    16(%rsp), %xmm3
    addl     $1, %eax               movl     n(%rip), %eax
    movl     %eax, n(%rip)          mulsd    %xmm1, %xmm0
    movsd    (%rsp), %xmm0          mulsd    %xmm3, %xmm2
    movsd    16(%rsp), %xmm2        mulsd    %xmm3, %xmm1
    movl     n(%rip), %eax          addl     $1, %eax
    mulsd    %xmm2, %xmm0           movl     %eax, n(%rip)
    addl     $2, %eax               movl     n(%rip), %eax
    movl     %eax, n(%rip)          addsd    %xmm2, %xmm0
    addsd    %xmm0, %xmm1           addl     $2, %eax
    movsd    8(%rsp), %xmm0         movl     %eax, n(%rip)
    movsd    16(%rsp), %xmm2        movl     n(%rip), %eax
    movl     n(%rip), %eax          addsd    %xmm1, %xmm0
    mulsd    %xmm2, %xmm0           addl     $3, %eax
    addl     $3, %eax               movl     %eax, n(%rip)
    movl     %eax, n(%rip)          movq     24(%rsp), %rax
    movq     24(%rsp), %rax         xorq     %fs:40, %rax
    xorq     %fs:40, %rax           jne      .L12
    addsd    %xmm1, %xmm0           addq     $40, %rsp
    jne      .L8                    ret
    addq     $40, %rsp          .L12:    
    ret                              call    __stack_chk_fail@PLT
.L8:                            
    call    __stack_chk_fail@PLT

If we were to pore through it, we'd find that doh1() indeed does the slow function calls (inlined) in between (re)loading the double values and using the recently (re)loaded values for the product it adds to the sum; exactly as we/I/the author apparently intended it to work.
doh2() now has completely different machine code compared to doh1(), and indeed does the slow function calls (inlined) in a batch near the end of the function.

Simply put, volatile made GCC generate machine code for doh1() with exactly the order of side effects (increments of n) we want, exactly as I claimed.

I'm just not happy at how inefficient code GCC-7.5.0 generates here, at all. It does not detract from my argument, and kinda even reinforces the idea that no matter what the standards say, you're better off examining the actual output of your tools.

TL;DR: This long-ass examination of GCC-7.5.0 and Clang-10 output on AMD64, proves that even if my understanding of the C standard is wrong, the example case shown here shows that what I described does happen in real life, as it happens exactly as I described to this particular example code on AMD64. I could still be wrong, but even if I am, it means the situation is even more complex in reality, and while my understanding may need fixing, it does apply to at least this here case exactly.

SiliconWizard · « **Reply #72 on:** June 25, 2021, 06:59:49 pm »

Quote from: brucehoult on June 24, 2021, 02:42:38 am

Wow gcc could use some improvement there.

The second instruction is unnecessary, as the 2nd instruction could just be add a0,a0,sp and change the offsets on sb and lbu to the more natural 0. And the zext.b is completely unnecessary as the byte was loaded unsigned.

Yep. I tried this with latest GCC 11.1.0. The unnecessary zext.b is not generated, but the rest of the sequence is the same.

Quote from: brucehoult on June 24, 2021, 02:42:38 am

I guess you compiled this for rv64 as I got -12 offset on the sb/lbu in rv32.

Yep, that's for RV64. This is what I get for RV32:

Code: [Select]

foo:
        addi    sp,sp,-16
        addi    a5,a0,16
        add     a0,a5,sp
        li      a5,1
        sb      a5,-12(a0)
        lbu     a0,-12(a0)
        addi    sp,sp,16
        jr      ra

Anyway. This illustrates what I said earlier: the 'sp' register is a good indicator of how much stack is being used.

As to again what I said above: do not confuse "stack overflows" with "stack-based buffer overflows". And, do not confuse monitoring and protection.

gf · « **Reply #73 on:** June 25, 2021, 10:03:12 pm »

Quote

The instruction pattern GCC-7.5.0 generates for updating the counter n is
movl n(%rip), %eax
addl $N, %eax
movl %eax, n(%rip)
which annoys the heck out of me.

It obviously does not generate the movl,add,movl sequence if the variable n is not volatile. Seems to me that gcc wants to clearly separate the volatile fetch and the volatile store ¹⁾, by avoiding memory read-modify-write instructions. Still I'm not sure whether this really makes a difference, unless (non-atomic) RMW instructions would lead to a different bus access pattern than separate fetch + store instructions. Does it, possibly?

¹⁾ "Every access (both read and write) made through an lvalue expression of volatile-qualified type is considered an observable side effect for the purpose of optimization and is evaluated strictly according to the rules of the abstract machine..."

Nusa · « **Reply #74 on:** June 25, 2021, 10:27:27 pm »

Do you really think the essay length discussions of compiler details is of any help to the original poster who is still asking very basic questions? It's sort of like talking about different tire treads to someone who is still learning to steer. Overwhelming and mostly irrelevant to what he needs to know right now.

gf · « **Reply #75 on:** June 25, 2021, 10:44:09 pm »

Sorry, just responded to the previous message w/o reading the whole thread. Seems indeed that it drifted a bit off-topic.

Nominal Animal · « **Reply #76 on:** June 26, 2021, 12:06:19 am »

Quote from: Nusa on June 25, 2021, 10:27:27 pm

Do you really think the essay length discussions of compiler details is of any help to the original poster who is still asking very basic questions?

When it shows step by step techniques on how to find out the answer for oneself, I do.

Have you noticed that my essay length answers do not just state things and tell you to trust me, but that the length is because I show why and how one can check? Because I have zero faith in authority, and not much trust in beliefs, yours or mine. You say Einstein, I say boo-hoo. You say keep it short, I say hogwash: it is better to show how to find the answer than to just state the answer and hope they believe you.

Nominal Animal · « **Reply #77 on:** June 26, 2021, 12:11:40 am »

Quote from: gf on June 25, 2021, 10:03:12 pm

¹⁾ "Every access (both read and write) made through an lvalue expression of volatile-qualified type is considered an observable side effect for the purpose of optimization and is evaluated strictly according to the rules of the abstract machine..."

That is exactly the sort of argument I want to avoid.

Put yourself in the developer's shoes. Tell me the situation where you want your code to do a three-instruction load-modify-write cycle instead of a single update instruction (especially one which, on AMD64, also happens to be atomic with respect to interrupts on the same core), when both work without additional external side effects (so no "well if you have a special memory-mapped register that" garbage; those need the inline assembly accessors anyway). Every single case I can think of has something to do with trying to trick something or someone, and none have anything to do with trying to compute or achieve a result efficiently. For the latter, the single update instruction always wins.

Just because the standard says something, does not mean its precise wording is the way it should be done. Reality always wins over theory. It does not matter how the C standard abstract machine works; what matters is whether the code a compiler generates is fit for practical purposes or not.

Consider the C standard the sales pitch for all C compilers. If it delivers what it promises, then all is good. If a compiler does not deliver, it better have a good reason for it. But the sales pitch should never be how you measure things.

brucehoult · « **Reply #78 on:** June 26, 2021, 02:42:57 am »

Quote from: Siwastaja on June 25, 2021, 01:33:03 pm

a[ b ] being equivalent to b[ a ] is most often just a funny remark, I don't remember ever seeing actual use for this. After all, [] is eye candy making things more readable, and idx isn't readable. But I'm sure there's some obscure use for this I haven't seen.

You can use it to make your C code look more like assembly language.

Code: [Select]

char* silly_memcpy(char *a0, char *a1, char *a2){
  int a4; char *a5;

  if (!a2) goto L2;
  a2 = a1 + (int)a2;
  a5 = a0;

 L3:
  a4 = 0[a1];
  0[a5] = a4;
  a5 = a5 + 1;
  a1 = a1 + 1;
  if (a1 != a2) goto L3;

 L2:
  return a0;
}

This compiles to:

Code: [Select]

00000000 <silly_memcpy>:
   0:   ca19                    beqz    a2,16 <.L2>
   2:   962e                    add     a2,a2,a1
   4:   87aa                    mv      a5,a0

00000006 <.L3>:
   6:   0005c703                lbu     a4,0(a1)
   a:   00e78023                sb      a4,0(a5)
   e:   0785                    addi    a5,a5,1
  10:   0585                    addi    a1,a1,1
  12:   feb61ae3                bne     a2,a1,6 <.L3>

00000016 <.L2>:
  16:   8082                    ret

Sorry.
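For anyone wondering why 0[a1] compiles at all: the standard defines E1[E2] as (*((E1)+(E2))), and pointer-plus-integer addition is commutative. A tiny sketch (same_element is just an illustrative name):

```c
/* All three spellings designate the same element, and the address of
   that element is simply a + i. Returns 1 when they agree. */
int same_element(const int *a, int i)
{
    return a[i] == i[a] && a[i] == *(a + i) && &a[i] == a + i;
}
```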

PlainName · « **Reply #79 on:** June 26, 2021, 11:19:14 am »

Quote

One reason runtime heuristics like stack canaries have such a bad time detecting this before the device has already crashed and pooped all over

Aren't such things meant to be debug/test tools, much like asserts, that highlight issues during development but aren't intended to be a cure for anything in production? In that context, a canary is useful after a crash because it tells you that some particular memory got corrupted. Sure, there may be several potential culprits, but you now know what you're looking for, and knowing it happened just after some particular operation is a massive clue. In production, like asserts, it's pretty useless since all you can really do is a reset rather than a halt.

Nominal Animal · « **Reply #80 on:** June 26, 2021, 02:26:34 pm »

Quote from: dunkemhigh on June 26, 2021, 11:19:14 am

Aren't such things meant to be debug/test tools, much like asserts, that highlight issues during development but aren't intended to be a cure for anything in production?

What is "a cure for [anything] in production"? I do not believe such a thing exists, just like I do not think security is something you can add on to a design. You design it in in the first place; and if it is sick or needs a cure, you need a redesign. Adding something on top to "cure" the design is a fallacy, and will fail.

In my view, in production, you either prevent a problem, or detect a problem; anything else is useless.

Heuristics like canaries are post-mortem indicators without false positives. They can help pinpoint the mechanism of the problem (because if a canary is dead, showing positively a problem, then it for sure was involved), but as problem indicators they are of dubious utility (because a canary can be dead, but look perfectly alive).

The only reason I would use stack canaries in production would be if I had access to core dumps from problem cases, or if I had nothing better available.

Ataradov's objection to hardware stack pointer tracking was that existing techniques like canaries, or their extension, pre-filling the stack with a detectable pattern, are trivially implemented and sufficient for the use cases he can see.
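The pre-filling technique can be sketched in a few lines. This is a simplified model that treats the stack region as a plain byte array and assumes a descending stack; STACK_FILL, stack_prefill, and stack_untouched are names chosen for this sketch, and on a real target the region bounds would come from the linker script or startup code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL 0xA5u   /* arbitrary pattern; an assumption of this sketch */

/* Fill the whole, not yet used, stack region with the pattern. */
void stack_prefill(uint8_t *lowest, size_t size)
{
    memset(lowest, STACK_FILL, size);
}

/* Count the bytes, starting from the lowest address, that still hold
   the pattern. With a descending stack, lowest[0] is the last byte to
   be used, so the result estimates the remaining headroom. */
size_t stack_untouched(const uint8_t *lowest, size_t size)
{
    size_t n = 0;
    while (n < size && lowest[n] == STACK_FILL)
        n++;
    return n;
}
```

A result of zero means the region was used right down to its limit, so an overflow is likely. The weakness mentioned above also shows: if the program happens to write bytes equal to the pattern, those bytes look untouched, so the estimate can be too optimistic.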

I strongly disagreed, because what I want in production, is to catch and prevent the stack overflow in the first place. (In later messages, after SiliconWizard pointed out both Ataradov and I were conflating buffer under/overruns with stack overflow – the first one being accesses outside of their intended target object, and the second one needing/using more stack than is available –, I pointed out that buffer overflows need support from the compiler, because the current C and C++ rules are such that they currently do not complain about obvious "this is very possibly outside the target object, and therefore a possible bug" code patterns that stick in my eye like a dirty thumb.)

Seeing that you seem to share how Ataradov sees the situation, forces me to think hard why that is.

To me, the situation in production is simple: A) Heuristics are useless for post-mortems because we don't get core dumps. B) Heuristics can detect some problem situations, but not all; and the likelihood of a problem being detected is related to the computational effort spent. This means that to make a robust thing, we must always balance computational efficiency and the statistical likelihood of detecting problems immediately when they occur: we cannot have both.

And that is the issue I have. I do want both. I need efficient code, but I also need to detect all problems that are possible to detect, reliably, when they do occur.

To anyone designing a commercial product, that is just one of the issues being balanced. There, not having both is a practical limitation, and accepting it as such, and moving on to solving other problems, is perfectly acceptable, even clearly preferable attitude compared to mine. If I was a CTO or product line head honcho, I would always hire Ataradov over myself. And that is saying a lot, if you have any sort of a clue of how capable a developer I believe myself to be.

My own primary motivation on anything related to this stuff is to make sure that the stuff we have in the future is not just "better", but more robust than the shit we have now.

I know, for a fact, that robustness is something undesirable from a business point of view; planned obsolescence is not malice, but simply an obviously working business strategy, one of the few that you can mathematically show from basic principles will work.
So, when you see me rail against long-term development efforts being directed using business rules, I am railing against choosing to race to the bottom-quality, maximum-profit products. I see business (or more precisely, market competition) as one absolutely required part of a functioning society; but it too must balance/compete against humanity's own long term interests. Thus, I do not object to business at all, just to using business as the yardstick here.

Elsewhere, I have described my own long term efforts, currently focusing at a replacement base library for systems programming in C (and as a variant, the base subset of functionality needed for the C/C++ freestanding environment used to develop low-resource embedded targets like microcontrollers). It is slow going, exactly because it is not a business proposal but a research project. It will not stop a developer using pointer expressions that can scribble over unrelated memory, because the C and C++ we have right now just does not even detect many of such patterns, and I do not expect the compiler developers to be interested in helping with that either; but if my shenanigans actually work, then the replacement base library might just induce developers to use patterns that avoid those problematic cases completely.

One surprisingly hard problem I'm chewing at is how to show that passing more information, often theoretically unneeded information, on the object being accessed is worth the extra "cost".

(This ought to be interesting to even beginners at C.)

Consider languages like Fortran and Python that support array slicing. That is, instead of just passing a pointer to a consecutive sequence of elements, they can actually pass a subset or slice from an array. The way it is used in high-performance computing in Fortran shows that while it does have overhead (more parameters passed per function call), it is worth it, because code of average complexity doing this stuff tends to be more efficient if written in Fortran than when written in C.

At some point over a decade ago, I investigated this at the machine code level, read some computer science papers about efficient operations on 2D matrices, and discovered that passing the full description of how the matrix data is to be accessed, makes average complexity code more efficient and robust.

Here are the two structures used for double-precision floating-point data:

Code: [Select]

struct owner {
    long    refcount;
    size_t  size;  /* In doubles */
    double  data[];
};

struct matrix {
    struct owner *owner;
    int           rows, cols;
    long          rowstep, colstep;
    double       *origin;
};

Given struct matrix m, the expression used by the underlying code that implements the basic matrix operations to access the matrix element on row r, column c, is (m.origin + c*m.colstep + r*m.rowstep), which evaluates to a pointer to a double. At runtime, r >= 0, c >= 0, r < m.rows, c < m.cols. As an optional consistency check, the pointers m.origin, (m.origin+(m.rows-1)*m.rowstep), (m.origin+(m.cols-1)*m.colstep), and (m.origin+(m.rows-1)*m.rowstep+(m.cols-1)*m.colstep) all must be at or above m.owner->data and below m.owner->data+m.owner->size.
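As a rough sketch of how that access expression and the optional consistency check could look in code (owner_new, matrix_at, and matrix_consistent are illustrative names, not part of any existing library). The check takes the upper limit as owner->data + owner->size, size being the member of struct owner, and compares element offsets rather than forming the corner pointers, since merely computing an out-of-range pointer is already undefined in C. It assumes origin was derived from owner->data.

```c
#include <stddef.h>
#include <stdlib.h>

struct owner {
    long    refcount;
    size_t  size;  /* In doubles */
    double  data[];
};

struct matrix {
    struct owner *owner;
    int           rows, cols;
    long          rowstep, colstep;
    double       *origin;
};

/* Allocate an owner with room for size doubles, all zero. */
struct owner *owner_new(size_t size)
{
    struct owner *o = malloc(sizeof *o + size * sizeof o->data[0]);
    if (o) {
        o->refcount = 0;
        o->size = size;
        for (size_t i = 0; i < size; i++)
            o->data[i] = 0.0;
    }
    return o;
}

/* Pointer to the element on row r, column c; NULL if out of range. */
static inline double *matrix_at(const struct matrix *m, int r, int c)
{
    if (r < 0 || c < 0 || r >= m->rows || c >= m->cols)
        return NULL;
    return m->origin + (long)c * m->colstep + (long)r * m->rowstep;
}

/* Returns 1 if all four corner elements lie inside the owner's data. */
int matrix_consistent(const struct matrix *m)
{
    if (!m || !m->owner || !m->origin || m->rows < 1 || m->cols < 1)
        return 0;

    const long size = (long)m->owner->size;
    const long base = (long)(m->origin - m->owner->data);
    const long dr   = (long)(m->rows - 1) * m->rowstep;
    const long dc   = (long)(m->cols - 1) * m->colstep;
    const long corner[4] = { base, base + dr, base + dc, base + dr + dc };

    for (int i = 0; i < 4; i++)
        if (corner[i] < 0 || corner[i] >= size)
            return 0;
    return 1;
}
```

Checking only the corners is enough because the offset is linear in r and c, so its extremes over the whole index range occur at the corners, whatever the signs of the steps.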

You might think that that means the code has to do two multiplications per matrix element access, but that's not true in practice. You see, to optimize the efficiency of matrix operations, and to leverage single-instruction-multiple-data instructions available on many architectures, depending on the exact operation, we'll want to rearrange the data anyway. For all but the smallest fixed-size matrices, the access order and data locality determine how long the operation takes. The arithmetic operations themselves (addition, subtraction, and multiplication in particular) are not the bottleneck; the time needed to access the elements is.
(This also makes this annoying to benchmark. If you use a synthetic microbenchmark, you are reserving the entire cache for this operation only. But, in real life code, the operation is only a part, a single step, in some chain of operations; and overusing cache at one step often means another step has to pay a much higher price for memory access. So, optimizing the heck out of one operation, can easily lead to real world code that performs worse.)

The true strength those structures bring to the programming table, is that now you can have a matrix and its transpose refer to the exact same data. You can even have a vector corresponding to its diagonal elements. Modifying an element in one, modifies the elements in all others, because they refer to the same data in memory. None of them are "secondary" or "views"; each is just as primary as every other matrix referring to the same data. The refcount is optional, and normally records the number of uses of the referred to data, including matrices and temporary uses in elementary calculations. (That is, when a function does say a matrix-matrix multiplication, it starts by incrementing the refcounts of the two owners of the data. After the operation is completed, the refcounts are decremented.)
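To illustrate the shared-data idea, here is a sketch of how a transpose and a diagonal could be built purely by rearranging the step fields. The function names are made up for this example, and incrementing refcount whenever a new matrix starts referring to the data is my reading of the description above, not a fixed rule.

```c
#include <stddef.h>

struct owner {
    long    refcount;
    size_t  size;  /* In doubles */
    double  data[];
};

struct matrix {
    struct owner *owner;
    int           rows, cols;
    long          rowstep, colstep;
    double       *origin;
};

/* Element access, without range checks, to keep the sketch short. */
static inline double *matrix_at(const struct matrix *m, int r, int c)
{
    return m->origin + (long)c * m->colstep + (long)r * m->rowstep;
}

/* Transpose: swap the roles of rows and columns; no data is copied. */
struct matrix matrix_transpose(const struct matrix *m)
{
    struct matrix t = *m;
    t.rows    = m->cols;
    t.cols    = m->rows;
    t.rowstep = m->colstep;
    t.colstep = m->rowstep;
    t.owner->refcount++;
    return t;
}

/* Diagonal as a column vector: one step down the vector moves one row
   and one column in the original matrix. */
struct matrix matrix_diagonal(const struct matrix *m)
{
    struct matrix d = *m;
    d.rows    = (m->rows < m->cols) ? m->rows : m->cols;
    d.cols    = 1;
    d.rowstep = m->rowstep + m->colstep;
    d.colstep = 0;
    d.owner->refcount++;
    return d;
}
```

Writing through the transpose with *matrix_at(&t, c, r) = value changes what matrix_at(&m, r, c) sees, because both resolve to the same double in the owner.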

As an example for single-dimensional arrays, C still does not have a standard function or a really efficient way to repeat a byte pattern at the beginning of an array to the rest of the array –– even though that is exactly what the unused branch of every single memmove() implementation would do!
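For illustration, one way such a function could be written with only standard calls. memrepeat is a hypothetical name, not a standard or existing library function, and this doubling approach is just one option. The effect is the same as the byte loop p[i] = p[i - plen] for i from plen to total - 1.

```c
#include <stddef.h>
#include <string.h>

/* Repeat the first plen bytes of buf until total bytes are filled.
   Each pass copies the already-filled prefix onto the bytes right
   after it, so source and destination never overlap and plain
   memcpy() is safe. The filled length stays a multiple of plen until
   the final, possibly partial, pass, which keeps the pattern aligned. */
void *memrepeat(void *buf, size_t plen, size_t total)
{
    unsigned char *p = buf;
    size_t have = plen;

    if (plen == 0 || total <= plen)
        return buf;

    while (have < total) {
        size_t chunk = (total - have < have) ? total - have : have;
        memcpy(p + have, p, chunk);
        have += chunk;
    }
    return buf;
}
```

For example, with char buf[11] = "abc", calling memrepeat(buf, 3, 10) leaves "abcabcabca" in the buffer.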

PlainName · « **Reply #81 on:** June 26, 2021, 02:54:42 pm »

A canary is most useful when the program doesn't crash. It may appear to run fine and you release to production, whereupon it goes wrong. Or it seems to run fine until something strange happens, the cause of which may have occurred 20 minutes previously so you have no idea how or why it went wrong. The canary will tell you it has just done something bad, and then preferably halt the system since continuing wouldn't be useful (and also because if you have a debugger attached you'll drop into that).

As you note, detecting a problem like this in production is only useful in that you can do a reset. What you're trying to do is not have that problem in the first place, but without something like a canary you may never know one is lurking. That's why I suggest that canaries, like asserts, are entirely a test tool and not a production save-our-ass thing. Core dump or not, production isn't the place to have to deal with it.

Hardware-based protection in production code is a different matter. AFAICS, its use is not to make things recoverable (other than by restarting) but to prevent further Bad Things from happening. It shouldn't be needed, but shit happens and it will make sure the splashes stay in the bowl rather than propagate to the floor and walls, in a manner of speaking.

DiTBho · « **Reply #82 on:** June 26, 2021, 04:00:26 pm »

Quote from: dunkemhigh on June 26, 2021, 02:54:42 pm

shit happens and it will make sure the splashes stay in the bowl rather than propagate to the floor and walls, in a manner of speaking.

This is exactly the philosophy behind the carbon fans used in Arctic airplanes. They can break at some point, but are designed to keep all splinters segregated inside a cage rather than letting them propagate

DiTBho · « **Reply #83 on:** June 26, 2021, 04:11:33 pm »

Quote from: dunkemhigh on June 26, 2021, 11:19:14 am

Aren't such things meant to be debug/test tools, much like asserts, that highlight issues during development but aren't intended to be a cure for anything in production? In that context, a canary is useful after a crash because it tells you that some particular memory got corrupted.

That was a great help when I debugged a PCI-board on a RISC workstation. The hardware comes with a special NVRAM assisted by the firmware where the last PCI-transactions are logged; if something crashes the Linux kernel, you can read the logs and have some good clue.

In my case, there was a bug with the PCI-sATA chip; thanks to this trick I caught and fixed it.

Nominal Animal · « **Reply #84 on:** June 26, 2021, 08:55:29 pm »

Quote from: dunkemhigh on June 26, 2021, 02:54:42 pm

Hardware-based protection in production code is a different matter. AFAICS, its use is not to make things recoverable (other than by restarting) but to prevent further Bad Things from happening. It shouldn't be needed, but shit happens and it will make sure the splashes stay in the bowl rather than propagate to the floor and walls, in a manner of speaking.

Emphatic yes, and a tentative no. What you describe is the minimum we know is achievable; so definitely yes. What we really do not know, is what real-world patterns would work with regards to say stack overflow (not due to any programming errors, just needing more stack than is available).

Here is that practical example of mine, more fleshed out this time.

Let's assume we have hardware with three internal registers not directly supported by the instruction set, but available (albeit perhaps a bit "slow") to interrupt code. Two of them define a minimum and maximum address to be used with stack pointer relative addressing, and instructions that manipulate the stack pointer. The hardware computes the effective address of each stack pointer relative access and compares it to these two registers. If it is outside the range, the effective address is loaded into the third register, and a hardware interrupt is fired.

Let's say we are using freestanding C/C++ similar to many microcontroller development environments currently in use, without any specific compiler modifications or additions.

For the hardware interrupt, we implement a function that keeps two addresses somewhere in RAM not likely to be clobbered by stack or heap operations. One address is used to reset the stack pointer, and the other address is used to pass control to after the interrupt returns. (Since this kind of interrupt cannot use RAM, there likely is a fourth register containing the address to jump to to "return" from the interrupt, so that the body of the interrupt function would simply set the stack pointer and that register to the stored values.)

By default, the runtime environment (bootloader) would set the stack reset address to beginning of RAM, and the jump target to a hardware reset routine. This means that without any other changes, if the stack grows out of bounds, the hardware gets reset. No heuristics, guaranteed correct operation. (However, this will not be able to catch second-hand buffer overflow issues, ie. when other code accesses beyond an object that is stored on stack, because such accesses, as Ataradov showed, do not use stack pointer relative addressing. The basic pattern is passing the address of the object on stack to a function, and that function doing an access beyond the object it is given.)

However, now consider a microcontroller firmware interfacing to some data acquisition circuitry or sensors, using say a SCPI compatible protocol. (Processing such protocol messages is where I typically see stack use exceeding the available.)

At the beginning of processing an incoming control message, the firmware stores the current stack overflow address pair, and sets them to a specific instruction in the same C scope, and the current stack pointer. Then, it starts parsing the control message data. After the message has been parsed, the original stack overflow address pair is restored, and execution continues as normal.

Here is where the Useful Magic could happen.

Let's say that because of available stack space limits, the function call chain trying to parse the message runs out of available stack space, and triggers the stack overflow interrupt. (Here, we assume that the interrupt occurs after the calculation of effective address, but before the access to that address is performed, so nothing unrecoverable like stack spilling over to non-stack RAM has yet happened.)

If the call chain has done no changes to globally observed state –– and this being a message parsing call chain, there either are no such changes, or the changes can be safely ignored –– then this mechanism reverts the essential machine state back to the one at beginning of the parsing, so the code can simply discard the message and respond with "Sorry; I'm so low on stack, I cannot process that message right now."

See? This is the second pattern I personally would use with such hardware. The first would be that hard restart on stack overflow. Third would be to use it for dynamic instrumentation, where the effective address just pushes the limit onwards; with some other code regularly checking the stack pointer against the limit, and shrinking the limit if it looks to be "too far". This does not give optimum granularity, but during development, especially with purpose-built nasty testcases, it would be very informative. I'd love to have this too.

And, to repeat, this pattern is not unprecedented in C at all: setjmp()/longjmp() provides basically the same "revert to earlier state" functionality in bog-standard C.
The pair does have a bad reputation, because it is one of the footguns that seems to have maimed more feet than the bugs it has prevented. The most widely known one is probably the privilege escalation in ftp servers by simply interrupting the server side at just the right moment.

I don't know if there are other practical patterns one could use with such hardware stack pointer related effective address checking, but for embedded targets alone, the above two use cases make it (in my opinion) obviously desirable on any and all microcontroller architectures.

Then again, I am not an EE nor have I designed even the simplest core on an FPGA yet, so I don't really know how much effort and resources such effective-address checking and interrupt facilities would require. I personally would happily accept say an additional clock cycle use for stack pointer relative addressing, if necessary for the support; I see the above two use cases so darned useful in practice, if one wants to make more robust embedded devices of this described type. And I want the more robust embedded devices, because we already have enough cheap shit.

DiTBho · « **Reply #85 on:** June 26, 2021, 11:10:14 pm »

Quote from: Nominal Animal on June 26, 2021, 08:55:29 pm

And I want the more robust embedded devices, because we already have enough cheap shit.

That's the problem: cheap! These tools already exist for mission critical CPUs, e.g. Lauterbach TRACE32 costs a lot of money, but offers a couple of power ICE and simulators which can check everything everywhere run-time. It can also be used for dynamic coverage, profiling ... etc

DiTBho · « **Reply #86 on:** June 26, 2021, 11:42:35 pm »

Quote from: Nominal Animal on June 26, 2021, 08:55:29 pm

I don't really know how much effort and resources such effective-address checking and interrupt facilities would require.

Talking about "poor man" solutions, the previously mentioned 68hc11 trick comes from a companion chip used by a company affiliated with Motorola back in the 90s.

Unfortunately this company is no longer in business and I wasn't able to contact any retired engineer. Back in 1989, they made a super powerful and relatively inexpensive ICE. I found one "as-is" on eBay. I don't have the software, except a demo program that shows how the ICE can be useful for checking SP. I started reverse engineering the board to find out how it works and only partially understood it, but I did find that there is a companion chip monitoring the CPU at the bus level, and I guess it's able to decode each instruction that uses the stack.

Code: [Select]

(68hc11) is_stack_ok = (address_on_the_bus >= SP_begin) and (address_on_the_bus < SP_end) and is_it_a_stack_operation;

is_it_a_stack_operation = fetch_and_decode the data_bus of the CPU
         for certain op-codes like { push, pop, function_call, ...} the answer is 1, otherwise 0

There are several modules on the ICE; I don't know what they all do or how they work, but the companion chip in particular is really a stand-alone hardware "stack boundary checker". Looking at what goes on the bus from the 68hc11, I found it's a memory mapped device with two 16 bit registers to set SP_begin and SP_end, plus a register to "enable/disable" a pin that is set high when SP is not within its programmed range.

Code: [Select]

0x0800 SP_begin
0x0802 SP_end
0x0804 control
(how the ICE11's companion chip maps its registers)

You can connect its output pin with whatever you want. To a flip-flop with a LED, or to an interrupt pin, or why not to both?

The companion chip can be memory mapped and programmed. In my setup, I have a 68HC11EVB in expanded mode (the CPU directly addresses its RAM, ROM, and peripherals), and the companion chip is memory mapped into the "expansion area" of the 68hc11 as if it were an ACIA device (UART). I can load programs through ACIA0 to program the boundaries, and I can load other programs and run them step-by-step. That's how I discovered how it reacts when it's programmed by the demo application.

----- Can it work with a softcore? -----

I don't know how the ICE11 works internally, I only use it for my purpose and I have only added a latch and a LED, but I can reproduce its behavior for a MIPS32 Softcore.

How?

Code: [Select]

(mips32) is_stack_ok = (EA >= SP_begin) and (EA < SP_end) and (is_SP_involved_with_EA);

is_SP_involved_with_EA = ((ireg(reg) and mask) not_equal_to zero)

On a simple RISC (like MIPS and RISC-V), the load/store stage always comes in the form (reg + offset),
which always ends with a simple EA register, EA = reg + offset, that is then directly passed to the MAR (memory address register) along the CPU data-path

MAR = EA = reg + offset That's good for the trick!

You set a mask, say register { 29 30 }.

Code: [Select]

0         1         2         3
01234567890123456789012345678901
00000000000000000000000000000110

Say you use "reg 30" as stack-pointer and "reg 29" for a push operator. This is where Gcc goes crazy: other compilers always use SP + offset, while Gcc first loads SP into another register, does some math with it, and then uses that register in the load/store. Moral of the story: you have to check from time to time which register is involved in the operations.

Anyway, supposing you somehow know it's "reg 29"

The ireg(reg) circuit will output

Code: [Select]

00000000000000000000000000000100

The circuit "is_SP_involved_with_EA":

Code: [Select]

ireg 00000000000000000000000000000100 <---- in0: which reg is currently used during load/store?
mask 00000000000000000000000000000110 <---- in1: which reg/regs are supposed to be SP-related?
and  00000000000000000000000000000100
....
out  00000000000000000000000000000001 <---- out: is the load/store actually a stack operation?

Bingo! This stuff doesn't cost many resources, it just costs "something" to properly update the mask *before the stack operation* (good news, it could be integrated with the machine-layer of Gcc), plus a couple of stall-cycles in the load/store unit (only in a pipe-lined design), and you have a circuit that checks if the effective address of a load/store is within its assigned range.

On MIPS, this stuff can be implemented
1) as Coprocessor, and programmed by using special cop-instructions
2) as memory mapped device, similar to the ICE11's companion chip

PlainName · « **Reply #87 on:** June 27, 2021, 12:03:23 am »

Quote

Here is where the Useful Magic could happen.

That's an interesting thought experiment, but I don't think you need hardware for that. Typically, stack grows one way and heap grows the other, but it's usually possible to know where the limit of the stack is. All you need is a quick check of where the stack pointer is pointing vs the known end of the stack, and you can figure out if you don't have room for another call.

Admittedly, hardware would be quicker, but then you're having to clean up after the fact rather than just not get in the mess in the first place.

Looked at another way, if it were heap you'd know if you've got a problem because malloc() will say so. What you're suggesting is, it seems to me, like malloc() always returning a pointer, but when you try to access the memory you get an access interrupt or exception. In that context, the stack check I'm describing is like saying to malloc() "Hey, got an extra 10K I can have?" and it replies OK or not; then you can grab it or... well, fail however you want to.

Maybe

Quote

And I want the more robust embedded devices, because we already have enough cheap shit.

Well, as DiTBho says, you can apparently have all that if you pay the price. The thing is that cheap shit is why everyone has at least one mobile phone, IoT are ten a penny, etc.

Nominal Animal · « **Reply #88 on:** June 27, 2021, 12:17:59 am »

Quote from: DiTBho on June 26, 2021, 11:10:14 pm

Quote from: Nominal Animal on June 26, 2021, 08:55:29 pm
And I want the more robust embedded devices, because we already have enough cheap shit.
That's the problem: cheap! These tools already exist for mission critical CPUs, e.g. Lauterbach TRACE32 costs a lot of money, but offers a couple of power ICE and simulators which can check everything everywhere run-time. It can also be used for dynamic coverage, profiling ... etc

That's the business way to go about it, yes. You make a product you can charge through the nose for, even though the same features could be had for cheaper if anyone cared. Nobody cares, because cheap == unreliable; you must spend money to be believable in the business world.

That is also the reason I myself am not trying to design a new programming language and a new compiler that would blow the competitors out of the water. Because to try and do that, is to forget the human aspect: you never know beforehand what those clever monkeys get up to. So, instead of trying to force them into a pre-designed mold, I want to see small incremental steps, an evolutionary pressure if you will, towards more robust devices. I don't want everyone to switch to expensive Enterprise-quality Mission Critical Devices with fifteen Certificates of Quality from ten different agencies, because that is just as shitty but in a different way.

Quote from: DiTBho on June 26, 2021, 11:42:35 pm

Talking about "poor man" solutions

The true "poor man" solution would be to modify the compiler to emit machine code doing the checks explicitly, and emulate the interrupt whenever necessary. It costs zero hardware.

Quote from: dunkemhigh on June 27, 2021, 12:03:23 am

it's usually possible to know where the limit of the stack is

You still do not seem to be able to differentiate between "heuristic" and "deterministic".

hamster_nz · « **Reply #89 on:** June 27, 2021, 03:08:39 am »

Just throwing this on the fire...

C support for variable length arrays.

WORST MOVE EVER.

brucehoult · « **Reply #90 on:** June 27, 2021, 04:31:59 am »

But C doesn't support arrays at all, whether fixed or variable length.

hamster_nz · « **Reply #91 on:** June 27, 2021, 04:45:09 am »

Quote from: brucehoult on June 27, 2021, 04:31:59 am

But C doesn't support arrays at all, whether fixed or variable length.

Code: [Select]

#include <stdio.h>
#include <string.h>

void puts_both(char *a, char *b) {
   char c[strlen(a)+strlen(b)+2];  // Urgh! What were they thinking!
   strcpy(c,a);
   strcat(c," ");
   strcat(c,b);
   puts(c);
}

int main(int argc, char *argv[]) {
   puts_both("Hello","World");
   return 0;
}

Builds fine without errors.

Code: [Select]

hamster@hamster-acer5:~/vla$ make
gcc -o vla main.c -Wall -pedantic
hamster@hamster-acer5:~/vla$ ./vla
Hello World
hamster@hamster-acer5:~/vla$

brucehoult · « **Reply #92 on:** June 27, 2021, 06:28:25 am »

That's not C until c99. Using --std=c89 -pedantic will cause it to be rejected. I had a feeling it's allowed in c++14 but gcc rejects it in all cases with -pedantic. gcc/g++ accept it in all cases without -pedantic.

But I think you miss my point. That's not declaring what I or several others here would call an "array". It's merely allocating space and giving you a pointer to the start of it. If you forgot the +2 or made a mistake and put +1 then you would have potential catastrophe, with no warning.

hamster_nz · « **Reply #93 on:** June 27, 2021, 07:07:38 am »

Nevertheless that is what this stack-shagging feature is called:

https://c-for-dummies.com/blog/?p=3488

Quote

The C99 standard added a feature to dimension an array on-the-fly. The official term is variable length array or VLA. And while you might think everyone would love it, they don’t.

It also has the problem that people who use it can't see the issue with using it... "But you are allocating what could be huge blocks of memory on the stack at runtime!" usually gets the response of either a look of confusion, or "I know! It's neat eh?"

DiTBho · « **Reply #94 on:** June 27, 2021, 08:36:51 am »

Quote from: Nominal Animal on June 27, 2021, 12:17:59 am

That's the business way to go about it, yes. You make a product you can charge through the nose for, even though the same features could be had for cheaper if anyone cared.

I made this mistake years ago with an electric kart ... I had said "I don't see what could go wrong, let's do it with the cheapest wrenches", then I lost one wheel in a corner ... and I crashed into a hay bale. Now I know why pneumatic ratchet wrenches are better.

Oh, but talking about things that could go wrong ... have you ever used a gasoline arctic camping stove? I made two big mistakes: underestimating the value of paying for good advice, and buying the cheapest rubber seals.

Inspected, they looked good ... just ... I then found out that below -20°C they break. At -45°C, it means no fire, no liquid water to drink, no food.

Arctic rubber seals cost much, much more than the cheapest rubber seals, but they are worth every penny.

DiTBho · « **Reply #95 on:** June 27, 2021, 09:15:57 am »

Quote from: Nominal Animal on June 27, 2021, 12:17:59 am

The true "poor man" solution would be to modify the compiler to emit machine code doing the checks explicitly, and emulate the interrupt whenever necessary. It costs zero hardware.

Yeah, I used that approach a lot when I was a student, and have seen it in paid jobs several times.

I remember a couple of paid consultancies: they were working on an engine control unit based on the Motorola 332, and I used a very powerful Avoget toolsuite to instrument the code and check everything in software(1), but then they bought me a professional BDM-based ICE to complete the job.

(1) the m68k architecture facilitates this task.

DiTBho · « **Reply #96 on:** June 27, 2021, 09:27:25 am »

Quote from: hamster_nz on June 27, 2021, 03:08:39 am

Just throwing this on the fire...
C support for variable length arrays.
WORST MOVE EVER.

Languages like Erlang don't even allow you to reassign a value to a variable; if you want to do it, you have to destroy the variable and recreate it. That looks extremely restrictive, but it has its purpose. That stuff in C, on the other hand ... I do find it very stupid, because it is too prone to fail and very hard to debug, even with a smart ICE: you would have to instrument the ICE to check it.

I agree with you: worst move ever!

brucehoult · « **Reply #97 on:** June 27, 2021, 11:04:46 am »

Quote from: hamster_nz on June 27, 2021, 07:07:38 am

Nevertheless that is what this stack-shagging feature is called:

Sure. Because C "arrays" in general are not arrays, but only allocating space and pointing at it.

Quote

It also has the problem that people who use it can't see the issue with using it... "But you are allocating what could be huge blocks of memory on the stack at runtime!" usually gets the response of either a look of confusion, or "I know! It's neat eh?"

It's fine if you're on a machine with GB of stack space in VM. Or if you do some sanity check on the size.

Linus complains that it generates less efficient code than a statically-sized array. True, but it's a lot more efficient than malloc() at the start of the function and free() at the end. You can't always know a sensible maximum size to pick for that statically sized array, especially in library code where you don't know how big a stack the machine you're running on will have -- a few KB may seem sensible until someone runs your code on a machine with hundreds of MB of stack and passes in big data. Or until they run your code on an AVR.

I think I can safely say that I have never written a dynamically sized array in C. I do however use alloca() sometimes, which amounts to the same thing.

Once you get to a certain size, the execution time to actually do something with the array will be enough more than the execution time of malloc() and free() that you don't care and it may as well be on the heap. If your malloc package is sensible and doesn't fragment things all to hell with large malloc()/free() repeated millions of times.

PlainName · « **Reply #98 on:** June 27, 2021, 11:16:41 am »

Quote

but it's a lot more efficient than malloc() at the start of the function and free() at the end.

Efficiency counts for nowt if you run out of stack. With malloc() at least you can see it failed before just running off the end. I'm surprised Linus went for efficiency over robustness with his argument.

Code: [Select]

char c[strlen(a)+strlen(b)+2]; // Urgh! What were they thinking!

ummm... shouldn't that be +1? I mean, 2 does no harm but only 1 is necessary. Unless I haven't yet had enough coffee this morning and missed something.

hamster_nz · « **Reply #99 on:** June 27, 2021, 11:23:20 am »

Quote from: dunkemhigh on June 27, 2021, 11:16:41 am

Code: [Select]
char c[strlen(a)+strlen(b)+2]; // Urgh! What were they thinking!

ummm... shouldn't that be +1? I mean, 2 does no harm but only 1 is necessary. Unless I haven't yet had enough coffee this morning and missed something.

I wanted room for:

* the first word
* a space
* the second word
* the terminating null character.

So it seems right to me...

Or maybe more explicitly:

Code: [Select]

char c[strlen(a)+1+strlen(b)+1]; // Urgh! What were they thinking!

PlainName · « **Reply #100 on:** June 27, 2021, 11:34:58 am »

Ah, yes. As I say, not enough coffee yet :-\

Nominal Animal · « **Reply #101 on:** June 27, 2021, 11:38:10 am »

Quote from: DiTBho on June 27, 2021, 08:36:51 am

I made this mistake years ago with an electric kart ... I had said "I don't see what could go wrong, let's do it with the cheapest wrenches", then I lost one wheel in a corner ... and I crashed into a hay bale. Now I know why pneumatic ratchet wrenches are better.

Dude, pick the right tool for the job!

Price alone is not a yardstick you want to use for comparing tools. You don't get what you don't pay for.
(The opposite, "You get what you pay for", is patently untrue, because a LOT of people are willing to sell you repackaged shit if they can make a profit.)

(And no, that is not ironic, because I've already explained the exact cost I'd be willing to pay for the stack guard interrupt stuff: Basically zero on the price of the MCU, only a small reduction in performance for stack-related operations. If I had more faith in, or knew compiler developers well enough they'd be willing to listen to me, I would probably implement this in software. As is right now, the darn things don't even recognize the most typical bug scenarios, so if I go that route, I'll be bogged down in that swamp for years.. with very little results, because of the human aspect, and because of "the standard says we can do inane thing, and it is easy; so that's what we'll do".)

Quote from: DiTBho on June 27, 2021, 08:36:51 am

I then found out that below -20°C they break. At -45°C, it means no fire, no liquid water to drink, no food.

Yep. Just a couple of weeks ago, my brother found out what I meant when I told him "these devices cannot handle being stored in freezing temperatures". In his haste to keep things tidy, he decided to store a bunch of power supplies in a shed outside for the winter, because that way they weren't an eyesore in the storage closet indoors. In a locale where winter temperatures usually reach -30°C or below. The end result? A box of broken supplies, wasted. (I haven't gotten one to do a post-mortem, but first suspect would be capacitors and physically cracked components due to differential thermal expansion.)

It's not just proper silicone rubber gaskets; silicone rubber leads and mains cables have exactly the same issue.

Have you tested your "silicone" leads? If they burn, they're not actually silicone, just over-plasticized PVC, which has completely different performance to silicone, but at room temperature feels pretty much exactly the same to a human. If you never use them in extreme conditions, you usually only notice when your "silicone" leads somehow embed themselves in the plastic container or pouch they're in (the Aneng AN8008 pouch and the test leads it comes with in particular). That's because the extra plasticizer slowly leaches out of the leads, and if what it leaches onto is also plastic compatible with that plasticizer, it ends up fusing the two plastics together. Better than gluing, too.

I did get him to spend an extra ten euros for a proper silicone rubber exhaust hose for the new heat pump he installed. It is not critical for operation, but I think it is nice to know that you won't find out one nice day in the spring that some of the yucky condensate is now leaking along the wall both inside and outside, because the hose couldn't handle the environmental changes without cracking.

(Edited to add: I am using the term "silicone rubber" here for the family of materials, including buna-n AKA nitrile rubber, with "rubber" referring to the properties of said materials and not specifically to isoprene or isoprene derivatives.)

Nominal Animal · « **Reply #102 on:** June 27, 2021, 12:25:34 pm »

Quote from: brucehoult on June 27, 2021, 11:04:46 am

Linus complains that [variable length arrays on stack] generates less efficient code than a statically-sized array. True, but it's a lot more efficient than malloc() at the start of the function and free() at the end.

Well, not "a lot", really. A LOT of Linux kernel code now uses page-sized allocations for temporary data, and there had been a lot of comparisons done when moving from VLAs-on-stack to these explicit allocations (via domain-specific get/put allocators/deallocators), especially in the filesystem layers.

A scope-assisted no-realloc heap allocator would be very nice. Within each scope, allocating more memory for local data would basically boil down to extending the stack (but that stack pointer probably being at say TLS rather than in a dedicated register). Passing out of a scope would simply reset that data stack pointer to a previous value, essentially freeing everything without any explicit free() type calls. So, it would really boil down into a secondary stack for local data, without a hardware stack pointer register.

I have played some with "stack buckets", basically data structures implementing stacks as a bucket brigade. This has the benefit of letting the data stack scatter in the address space. It is surprisingly cheap and robust, but as usual, I never tried to add support for that into a C or C++ compiler, only experimented with it in pure C in a more-or-less freestanding environment. If you have 48 or 64 bit address space, I would not bother; a single linear growable/shrinkable region (in the sense of whether it is backed by actual RAM or not) should work better.

Quote from: brucehoult on June 27, 2021, 11:04:46 am

I think I can safely say that I have never written a dynamically sized array in C. I do however use alloca() sometimes, which amounts to the same thing.

I have; often. But those functions always have explicit checks on the size. I never use VLAs when the function might be used in a recursive call chain.

With thread pools and worker threads, I do the opposite: I keep the stack use to an absolute minimum, and use malloc()/free() for everything. This is important, because then one can create the threads with much smaller stacks (I nowadays tend to use 2*PTHREAD_STACK_MIN), reducing the cost of worker threads in terms of virtual address space used.

(Pool workers end up not doing many of those, because the work tends to be already packaged in a suitable bucket. Typical ones I use are things like work space, when additional memory is needed to speed up the processing, with the work space being "owned" by the thread and not any particular work. I do heavily reuse freed work buckets, too, so the actual number of malloc()/free() calls in my pool workers tends to be rather minimal.)

Quote from: dunkemhigh on June 27, 2021, 11:16:41 am

Efficiency counts for nought if you run out of stack.

Very true. My beef with you and Ataradov can be similarly simplified to "An unreliable heuristic telling you a problem might or might not have occurred is speculation. Please do not speculate with other people's work and data."

brucehoult · « **Reply #103 on:** June 27, 2021, 12:53:06 pm »

Quote from: Nominal Animal on June 27, 2021, 12:25:34 pm

A scope-assisted no-realloc heap allocator would be very nice. Within each scope, allocating more memory for local data would basically boil down to extending the stack (but that stack pointer probably being at say TLS rather than in a dedicated register). Passing out of a scope would simply reset that data stack pointer to a previous value, essentially freeing everything without any explicit free() type calls. So, it would really boil down into a secondary stack for local data, without a hardware stack pointer register.

I fail to see the point of this, assuming you're not on a 6502 or something.

Appropriately sizing one stack per thread can be a challenge. Figuring out the right size for *two* stacks per thread, and the balance between them, just seems harder.

gf · « **Reply #104 on:** June 27, 2021, 01:25:49 pm »

Quote from: Nominal Animal on June 27, 2021, 12:25:34 pm

A scope-assisted no-realloc heap allocator would be very nice. Within each scope, allocating more memory for local data would basically boil down to extending the stack (but that stack pointer probably being at say TLS rather than in a dedicated register). Passing out of a scope would simply reset that data stack pointer to a previous value, essentially freeing everything without any explicit free() type calls. So, it would really boil down into a secondary stack for local data, without a hardware stack pointer register.

Reminds me a bit of the GNU obstack library (not exactly, but some similarities).

Quote from: brucehoult on June 27, 2021, 12:53:06 pm

I fail to see the point of this, assuming you're not on a 6502 or something.
Appropriately sizing one stack per thread can be a challenge. Figuring out the right size for *two* stacks per thread, and the balance between them, just seems harder.

One advantage I see is that overflow of the separate allocation stack can be easily and reliably detected, and max. usage can be tracked (w/o support from the compiler or hardware).
Furthermore, the "stack" of this allocator could also be segmented (i.e. a list of big chunks), so that it could even grow dynamically by adding additional chunks/pages on demand.
(Depending on the particular use case, dynamic growth may or may not be desired, of course.)

Edit:

I just noticed that gcc and llvm support a "split stack" / "segmented stack" feature, too, for the regular call stack. The generated code seems to be OS/platform-specific, though, and the feature is obviously not available for all platforms (I just tried a few ones on godbolt.org, e.g. https://godbolt.org/z/hbnGYsxqx).

PlainName · « **Reply #105 on:** June 27, 2021, 03:11:13 pm »

Quote from: dunkemhigh on June 27, 2021, 11:34:58 am

Ah, yes. As I say, not enough coffee yet :=\

I realise the example was aimed at showing displeasure of a feature, but I think it illustrates something really well (unintentionally).

The example line is:

Code: [Select]

char c[strlen(a)+strlen(b)+2]; // Urgh! What were they thinking!

And the first glance at this triggers the off-by-one reflex because there are two strings, and as any fule no, a string has a hidden nul. Thus the copy is clearly assumed to copy two nuls, although only one is actually required.

It's only if one looks further down the function that the added space is noticed. Or not noticed - these things are easily overlooked, and since strcat() is used the terminating nul is dealt with automatically. (You can see when the coffee was needed!).

This is an excellent example of the value of apparently useless comments. Why use 2 instead of 1? Anti-commenters would say that it's obvious from the code and you don't repeat yourself, but IMV an appropriate comment here would be as critical and useful as a coded range check.
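For illustration, this is roughly what such a comment could look like. It is only a sketch: `join_with_space` is a hypothetical helper, and it uses malloc() instead of the VLA from the quoted line so that the result can be returned to the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated "a b", or NULL on failure.
   The caller is responsible for free()ing the result. */
static char *join_with_space(const char *a, const char *b)
{
    /* +2: one byte for the separating space and one for the terminating nul.
       Each source string has its own nul, but only one ends up in the result. */
    size_t len = strlen(a) + strlen(b) + 2;
    char *c = malloc(len);
    if (!c)
        return NULL;
    strcpy(c, a);
    strcat(c, " ");     /* this is what the "extra" byte is for */
    strcat(c, b);       /* strcat() writes the terminating nul */
    return c;
}
```

With the comment in place, the reader no longer has to scan down to the strcat() calls to find out that the 2 is intentional.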

PlainName · « **Reply #106 on:** June 27, 2021, 03:19:05 pm »

Quote

Quote from: dunkemhigh on Today at 01:03:23

it's usually possible to know where the limit of the stack is

You still do not seem to be able to differentiate between "heuristic" and "deterministic", it seems.

I am well aware of the difference. I think you forget that we are only here discussing this because we can present alternative views, or want to dig a bit deeper into stuff. If the only thing we said was "Yes, agree 100%" the board would be a dead dodo within a month.

And it must be recognised that pragmatic engineering doesn't consist of implementing ivory-towered ideals but is mostly about 'good enough' solutions. Sure, you can wish for clever whizzy stuff, but making something here and now is a matter of balancing compromises. Sometimes a compromise is indeed 'good enough', and I think a decently implemented canary used appropriately falls into that category.

Nominal Animal · « **Reply #107 on:** June 27, 2021, 03:49:18 pm »

Quote from: brucehoult on June 27, 2021, 12:53:06 pm

Appropriately sizing one stack per thread can be a challenge. Figuring out the right size for *two* stacks per thread, and the balance between them, just seems harder.

The second one grows and shrinks dynamically; it is not sized beforehand.

Quote from: dunkemhigh on June 27, 2021, 03:19:05 pm

Quote
You still do not seem to be able to differentiate between "heuristic" and "deterministic", it seems.
I am well aware of the difference.

Yet, you insist on counterarguments that hide their core statement in weasel words like "usually" that mean nothing when considering security or robustness.

To simplify, your (both your and ataradov's) arguments seem to boil down to "I don't see why you want deterministic behaviour; the heuristics we have right now are sufficient". My counterargument to that is "the shit quality of software we have now shows that argument is patently false; so much so as to be utterly ridiculous, really".

What gives? Me fail English, or you getting angry for making me realize you're basing your work product on belief and statistics instead of science, math, and engineering?

(Oh, and make no mistake: I much prefer you – everybody – disagreeing with me over agreeing with me, because I can only learn more in the first case. I don't like people enough to get a dopamine kick when someone agrees with me; I only get those when I learn something new and useful, or someone else discovers something with my help. Agreeing with me is useless, except when incidental, like when adding detail or scenarios or discoveries that support the approach or opinion.)

Nominal Animal · « **Reply #108 on:** June 27, 2021, 04:36:28 pm »

Quote from: gf on June 27, 2021, 01:25:49 pm

One advantage I see is that overflow of the separate allocation stack can be easily and reliably detected, and max. usage can be tracked (w/o support from the compiler or hardware).
Furthermore, the "stack" of this allocator could also be segmented (i.e. a list of big chunks), so that it could even grow dynamically by adding additional chunks/pages on demand.
(Depending on the particular use case, dynamic growth may or may not be desired, of course.)

Without dynamic growth, it is no better than a single pre-sized stack, and the extra computational cost makes it not worth the effort.

(The "list of big chunks" is what I used the term "bucket brigade" for.)

Additional benefits include things like trivial runtime control of resource use during data processing. Like the stack waterline control registers (that I'd love to use for detecting stack overflow), a scope could trivially restrict the amount of local/automatic memory any of its sub-scopes may consume. In systems programming, this is a useful feature, because that sort of a limit corresponds well to per-request limits: it provides a simple way to limit the amount of memory used to service a request, without having to reimplement your own memory allocator like Apache Runtime does.

So, in a very real sense, it is just a separate malloc()/free() interface intended and optimized for local data storage; leveraging the fact that there is no realloc(), and allocations and deallocations always occur in opposite order. (That is, if A is allocated before B, then B is always freed before A. No holes possible.)
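A rough sketch of that interface, under the assumption of a fixed-size backing arena (a growing one would work the same way from the caller's side). All names here (`local_stack`, `scope_enter`, `scope_alloc`, `scope_leave`, `ARENA_BYTES`) are invented for the example. Leaving a scope just restores the saved data stack pointer, and a scope can cap what its sub-scopes may consume.

```c
#include <stddef.h>

#define ARENA_BYTES 8192   /* assumed arena size for the sketch */

typedef struct {
    _Alignas(max_align_t) unsigned char mem[ARENA_BYTES];
    size_t top;      /* data stack pointer: offset of the first free byte */
    size_t limit;    /* the current scope may not push top beyond this */
} local_stack;

typedef struct {
    size_t top, limit;   /* values to restore when the scope ends */
} scope_mark;

static void local_stack_init(local_stack *s)
{
    s->top = 0;
    s->limit = ARENA_BYTES;
}

/* Enter a scope that may use at most 'budget' more bytes. */
static scope_mark scope_enter(local_stack *s, size_t budget)
{
    scope_mark m = { s->top, s->limit };
    if (budget < s->limit - s->top)
        s->limit = s->top + budget;   /* sub-scopes can only tighten the limit */
    return m;
}

/* Allocate n bytes of local data, or return NULL if over the scope's limit. */
static void *scope_alloc(local_stack *s, size_t n)
{
    const size_t a = _Alignof(max_align_t);
    if (n > ARENA_BYTES)
        return NULL;
    n = (n + a - 1) / a * a;
    if (n > s->limit - s->top)
        return NULL;
    void *p = s->mem + s->top;
    s->top += n;
    return p;
}

/* Leave the scope: everything allocated since scope_enter() is released. */
static void scope_leave(local_stack *s, scope_mark m)
{
    s->top = m.top;
    s->limit = m.limit;
}
```

There is no per-allocation free() at all; because B is always released before A, restoring `top` is enough and no holes can form.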

But then, because actual call stack would be used in small, fixed-size units, you could rely on a single stack guard page to detect the need for dynamically growing the stack – letting you switch from pre-sized stacks to dynamically growing ones.

(Currently, it is not feasible in practice, because stacks grow down, and when objects are initialized, they are usually initialized at increasing addresses. On AMD64, a stack-based allocation (alloca(), for example) subtracts the desired size from the stack pointer to obtain the starting address of the allocated memory. When that memory is first accessed, it is likely to be accessed at its initial address or close by. This means that when objects are allocated on stack, first accesses are often quite far from the previous stack pointer, and easily beyond a single stack guard page. To work around this, you could have a SIGSEGV handler that examines the current stack pointer (or stack frame base address, if one is used) and compares it to the current size of the stack; and if the difference is within acceptable limits, simply attempts to grow the stack mapping to cover the stack pointer and/or stack frame base address. It then lets the kernel retry the same instruction. If it was a valid stack access, it now succeeds, with the stack having grown dynamically to sufficient size. In theory it should work, but the context in which a userspace SIGSEGV handler is executed even in Linux is so fragile, I am not sure if it would be robust enough in practice.)

For example, if the compiler agreed that accesses to a new stack frame must be done in steps of less than half a page apart – even inserting dummy reads if necessary –, then we could immediately switch to dynamically growing stack segments, using just two guard pages per stack. (The first one is the functional one, the second one acts as a pre-allocation without memory backing, making the SIGBUS handler more robust against temporary mmap() allocation failures.)

@brucehoult:

All that said, I do perfectly understand that it is difficult to see the benefits of any such, without seeing code that implements it.

I could whip up an example library this afternoon, but omitting making the real stack dynamic (as opposed to fixed size). Would it be useful? Interesting?

Also, I was thinking of starting a separate thread here, with the title "Mind over bugs: the C pointer edition", where I explain how and why changing how one sees or considers C pointers, makes an entire class of bugs (namely, the annoying buffer underrun/overrun ones) transparent and much, much easier to avoid. All C programmers, even beginners, as a target audience. It would be better as a video, Dave style (rather similar to the one about capacitance multipliers, actually), but since I have the on-screen personality of a slightly annoyed gerbil, I can't do it. It might spawn a few other useful approaches in quashing C pointer bugs from other members such as yourself (never underestimate the value of anecdotes for those still learning!), and I for one would be very interested to read those.

But, like my dear friend Blabbo is Verbose says, I might already be causing more discord by my lengthy posts here than their content could ever offset.

SiliconWizard · « **Reply #109 on:** June 27, 2021, 05:21:17 pm »

A dynamic-size stack would be pretty inefficient. Obviously depending on the memory layout, it could require being reallocated, and thus possibly relocated on the fly. I don't think this would be the best idea, although, if it's a rare enough event (assuming not on a real-time system), it would certainly be better than a crash. If you have a reliable way of estimating the stack usage at all times (which was actually what I was after and the start of the whole discussion), then you can certainly implement this. You'd just need to pair this with an exception if the current stack usage approaches the max size by a predefined amount (I would favor this approach rather than waiting for an overflow to actually happen), and in the exception handler, just reallocate the stack (in the realloc() C sense, thus copying its current content if it has to be relocated, and update the current stack pointer accordingly). Why not.

With that said, there is a common conception that heap allocation would be more "reliable" than allocating stuff on the stack. This just isn't true. It mostly comes from two facts: one, that on most systems these days, return addresses are stored in the same stack as local objects, which is pretty bad (we already discussed this in other threads), and second, that again on most common systems and programming languages, the allocated stack is usually much smaller than the space reserved for the heap, because heap allocation is favored. We can think of almost-fully stack-based languages that would make this irrelevant. I think we discussed this before also, but stacks have benefits too. Such as automatic memory management at very low cost.

Although risky as many other C features, I wouldn't completely reject VLAs either. It's pretty common, for instance, to allocate fixed-size arrays as local variables and pick a max size. Using VLAs can make more efficient use of the stack. But I agree they introduce a whole range of new potential bugs, and are almost impossible to analyze statically, so security-wise, they aren't that good. But just note that some of the bugs they allow are the same as the ones made possible with heap allocation.

Nominal Animal · « **Reply #110 on:** June 27, 2021, 08:58:03 pm »

Quote from: SiliconWizard on June 27, 2021, 05:21:17 pm

A dynamic-size stack would be pretty inefficient. Obviously depending on the memory layout, it could require being reallocated

No, that won't work. We do not know which of the values in the stack are addresses to the stack, so cannot do a fixup pass. In other words, the stacks must never move.

We can allocate new pages (and if using a bucket brigade approach, the local data stack does not need consecutive address space), and code can choose to check and release unused stack pages back to the kernel (shrinking the stack), but not move or reallocate it.

The overhead from using a dynamic hardware stack is from the SIGBUS/SIGSEGV signal on first access to each new stack page. Typical work done in the handler is one mprotect() call, and one mmap() call, plus an update to at least one TLS variable (keeping the current stack size; another variable is a set of flags, one of which indicates whether the stack can be grown by at least one page or not).

Perhaps a better way to think about it, is to compare to using a PROT_NONE mapping for the entire stack address space except for the initially used pages. Whenever a SIGBUS/SIGSEGV fires, we essentially do an mprotect() call, telling the kernel that it needs to switch that page from inaccessible (PROT_NONE) to backed by RAM (PROT_READ|PROT_WRITE) and populated.

(So why don't we do just that instead? Because no resources are really saved if one does it this way. You then do in userspace what the kernel would do automatically.)
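For comparison, the PROT_NONE reservation variant described above looks roughly like this when the growing is done by an explicit call instead of from a signal handler. This is a sketch for Linux or a similar POSIX system; `grow_region`, `region_reserve`, `region_grow` and `region_release` are made-up names, and the signal handling itself is deliberately left out.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    unsigned char *base;      /* lowest address of the reserved range */
    size_t         reserved;  /* size of the whole reservation, in bytes */
    size_t         committed; /* accessible bytes at the high end */
    size_t         page;      /* page size, in bytes */
} grow_region;

/* Reserve address space only: PROT_NONE pages are inaccessible. */
static int region_reserve(grow_region *r, size_t pages)
{
    long pg = sysconf(_SC_PAGESIZE);
    if (pg <= 0)
        return -1;
    r->page = (size_t)pg;
    r->reserved = pages * r->page;
    r->committed = 0;
    void *p = mmap(NULL, r->reserved, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    r->base = p;
    return 0;
}

/* Make one more page readable and writable, growing downwards from the
   top of the reservation like a stack would. Returns -1 when full. */
static int region_grow(grow_region *r)
{
    if (r->committed + r->page > r->reserved)
        return -1;
    unsigned char *next = r->base + r->reserved - r->committed - r->page;
    if (mprotect(next, r->page, PROT_READ | PROT_WRITE) != 0)
        return -1;
    r->committed += r->page;
    return 0;
}

static void region_release(grow_region *r)
{
    munmap(r->base, r->reserved);
}
```

In the scheme discussed in this thread, the equivalent of `region_grow()` would run inside the SIGBUS/SIGSEGV handler, after the policy decision on whether the thread is allowed more stack.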

The similarity to the fixed-size stack is that stack accesses up to the current stack size have no overhead. None; no difference. When the stack size exceeds that estimate, a SIGBUS/SIGSEGV handler is invoked. The process itself implements it, and its purpose is to decide whether to kill the thread or to grow the stack. Without dynamically growing stacks, such a policy can only be enforced at the function scope, by checking whether the current stack pointer or stack frame is past some previously set limit; or, if the exact amount of stack needed by the next scope is known (i.e., this is possible to do in a compiler, not so much in one's own C without help from the compiler), by checking whether the needed amount would put the stack beyond the previously set limit.

Furthermore, there is no real need to spend resources for the virtual address space mapping ("address reservation", as is done with PROT_NONE), genuinely freeing resources from rather large default stacks (8 MiB per thread typical in Linux on AMD64).

It may sound like it amounts to only a little overhead, but I do sometimes write code with dozens of worker threads in pools; hundreds, when using a dedicated thread per client. Address space is cheap but not free. Especially for service processes run on virtual hosts, I would rather trade a bit of performance for smaller memory requirements.

Because the SIGBUS/SIGSEGV handler uses `mmap(nextpage, page, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN | MAP_STACK, -1, 0)` to ask for a new stack guard page just beyond/next to the current one, no existing mappings will be clobbered: the kernel either provides that address, provides nothing at all, or provides a different address (which will be rejected/freed, and the stack marked no longer growable). Note the lack of the MAP_FIXED flag; this is crucial.

I do not see ordinary code ever shrinking the stack; only a worker pool library implementation might (when returning a worker), but possibly not even those. I suspect it would be more efficient just to let ballooned workers die when they complete their current work, and create fresh new ones instead. However, exactly how large a thread stack would be allowed to grow is a policy question, something that only the userspace process itself can and should decide; and this lets it do so at runtime, even changing its mind, instead of deciding a hard limit up front.

Let's estimate, then, the overhead for a real world case. We start with say PTHREAD_STACK_MIN (16,384 bytes on AMD64 in Linux) stack size, and during the lifetime of the thread, it expands to the default maximum of 8,388,608 bytes, in increments of single page (4,096 bytes in this particular case). That means an overhead of roughly 2,048 SIGBUS/SIGSEGV signals (2,044 once the four initial pages are subtracted), and about twice that in system calls. Is that a lot? No, it is not. To be easily detectable, it should happen within one second or so; anything longer, and it will be lost in the noise, and be very difficult to notice.
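The arithmetic behind that estimate, as a tiny sketch. `growth_faults` is a made-up helper name, and it assumes exactly one signal per newly added page.

```c
#include <stddef.h>

/* Number of one-page growth steps needed to get from the initial stack
   size to the maximum, assuming one SIGBUS/SIGSEGV per added page. */
static size_t growth_faults(size_t initial, size_t maximum, size_t page)
{
    if (page == 0 || maximum <= initial)
        return 0;
    return (maximum - initial + page - 1) / page;   /* round up */
}
```

With the figures above, `growth_faults(16384, 8388608, 4096)` gives 2,044, which is the "about two thousand signals over the lifetime of the thread" scale being discussed; dividing the full 8 MiB by the page size gives 2,048.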

This is exactly why I wondered if an example implementation would help, because even trivial test cases would show the *real* overhead associated with the approach. Unless you have already implemented something similar, human intuition is unlikely to give a useful initial guess as to what the overhead in reality is.

(I have only done enough testing myself to have that as an *opinion*; but I know that even if I do a test case, if I am the only one who tests it, the results are still only enough to base one's own opinion on. Stuff needs to be rigorously investigated and pushed to the limit, and I think this kind of stuff needs more than one viewpoint to properly examine.)

SiliconWizard · « **Reply #111 on:** June 27, 2021, 09:09:27 pm »

Quote from: Nominal Animal on June 27, 2021, 08:58:03 pm

Quote from: SiliconWizard on June 27, 2021, 05:21:17 pm
A dynamic-size stack would be pretty inefficient. Obviously depending on the memory layout, it could require being reallocated
No, that won't work. We do not know which of the values in the stack are addresses to the stack, so cannot do a fixup pass. In other words, the stacks must never move.

Ah, I was thinking too fast and overlooked that.

So I guess what you are suggesting relies on virtual addressing to be able to resize it dynamically without having to move the already existing stack? That doesn't sound practical for small embedded stuff obviously though. It requires a full-fledged MMU with corresponding software support.

OTOH, it's on a small target that using "ad-hoc" stacks (meaning: as small as possible) would really make sense, due to limited memory. On larger systems, that's usually not a big problem to largely oversize the stacks, even if that means some wasted memory. Just a thought. As I said above, if you can afford to reserve a large amount of stack, then you don't need to care all that much about it, except maybe for very specific cases, with very deep recursive functions.

Nominal Animal · « **Reply #112 on:** June 27, 2021, 09:45:36 pm »

Quote from: SiliconWizard on June 27, 2021, 09:09:27 pm

That doesn't sound practical for small embedded stuff obviously though. It requires a full-fledged MMU with corresponding software support.

For sure. I do not believe separate data stacks or dynamically growing stacks are useful on embedded targets. I see some use for them in systems programming: specifically, daemons with lots of threads. (In Linux, identity (uid, gid, supplementary gids) and capabilities (CAP_CHOWN et cetera) are per-thread properties, not per-process ones; the standard C library just reserves a realtime signal for synchronizing them across the process, because POSIX says they are per-process properties.)

On embedded targets, I'd love to have that hardware stack pointer relative effective address checks as a tool for "cancelling" recursive processing when it cannot complete within currently available stack space. It would rely on the compiler generating code that always uses stack-pointer relative addressing (even a dummy load) before assigning it to a pointer, so I guess it would make more sense to work on integrating stack address check support into a compiler one uses instead.

Hey, does anyone have any contacts with clang/llvm AVR developers? If they did not reject such an extension (function preamble and stack allocations checking the resulting effective address against a runtime limit) as an idea before seeing some patches first, I might just try and find out what they'd think of some suggested patches. As mentioned before, I do not have the ability to push through GCC levels of cliqueness, though.
It would not help with buffer under/overrun bugs, but it would give those of us interested almost complete control over stack overflows. (Almost, because correctness when an interrupt causes the stack overflow is something I have not worked out yet. Is complicated.) And, of course, it would also work as runtime stack use instrumentation.

Quote from: SiliconWizard on June 27, 2021, 09:09:27 pm

On larger systems, that's usually not a big problem to largely oversize the stacks, even if that means some wasted memory.

Except for those pesky daemons with lots of threads running on virtual hosts. You end up paying money for that memory address space, even though it won't be used.

How people are happy to run Java Virtual Machines with their hunger for fixed-size RAM, does still surprise me a bit. I'm not saying it cannot or should not be done; I'm saying it is part of the crappiness of the current software design style I'd like to do something about.

Just like I'd love to do something about how silly and inefficient the core design is for all molecular dynamics simulators whose sources I've seen. They really are straight from the seventies. With the latest CUDA snazziness sprinkled on top.

Yep; I may be tilting at windmills here, and completely missing more important problems I might actually help with. But until shown proof and not just opinions, I have no better information to work on than my own experience and observations.

brucehoult · « **Reply #113 on:** June 27, 2021, 10:07:58 pm »

Quote from: Nominal Animal on June 27, 2021, 03:49:18 pm

Quote from: brucehoult on June 27, 2021, 12:53:06 pm
Appropriately sizing one stack per thread can be a challenge. Figuring out the right size for *two* stacks per thread, and the balance between them, just seems harder.
The second one grows and shrinks dynamically; it is not sized beforehand.

Huh?

The stack pointer of course moves dynamically, but you need to decide on the initial stack pointer at the outset, and also decide on the initial stack pointer for the thread with its stack at the next lowest place in the address space, and decide on how far apart to put them. You can't move them afterwards (assuming things are allowed to contain pointers into the stack, including from other parts of the stack, and that you don't know where those pointers are, i.e. we're talking about C still).

This is easier for something with a MMU where you can just place things very far apart, especially with a 64 bit address space, but you still have to decide on the maximum stack size beforehand.

brucehoult · « **Reply #114 on:** June 27, 2021, 10:21:59 pm »

Quote from: SiliconWizard on June 27, 2021, 05:21:17 pm

With that said, there is a common conception that heap allocation would be more "reliable" than allocating stuff on the stack. This just isn't true. It mostly comes from two facts: one, that on most systems these days, return addresses are stored in the same stack as local objects, which is pretty bad (we already discussed this in other threads), and second, that again on most common systems and programming languages, the allocated stack is usually much smaller than the space reserved for the heap, because heap allocation is favored. We can think of almost-fully stack-based languages that would make this irrelevant.

There is a large and important class of programs which absolutely can not be fully stack based.

These are programs based on an event loop, with an amount of state retained between iterations that is not known in advance.

This includes any interactive program (whether character based or GUI) where the user is creating or maintaining a document or in-memory database of some kind.

gf · « **Reply #115 on:** June 27, 2021, 11:03:56 pm »

Quote from: Nominal Animal on June 27, 2021, 09:45:36 pm

On embedded targets, I'd love to have that hardware stack pointer relative effective address checks as a tool for "cancelling" recursive processing when it cannot complete within currently available stack space. It would rely on the compiler generating code that always uses stack-pointer relative addressing (even a dummy load) before assigning it to a pointer, so I guess it would make more sense to work on integrating stack address check support into a compiler one uses instead.

Hey, does anyone have any contacts with clang/llvm AVR developers? If they did not reject such an extension (function preamble and stack allocations checking the resulting effective address against a runtime limit) as an idea before seeing some patches first, I might just try and find out what they'd think of some suggested patches. As mentioned before, I do not have the ability to push through GCC levels of cliqueness, though.
It would not help with buffer under/overrun bugs, but it would give those of us interested almost complete control over stack overflows. (Almost, because correctness when an interrupt causes the stack overflow is something I have not worked out yet. Is complicated.) And, of course, it would also work as runtime stack use instrumentation.

Already now, gcc's split stack feature basically needs to check the stack pointer against the (thread-specific) limit, in order that a new segment can be allocated when the current segment is exhausted. So I guess that replacing the __morestack... routines (so that the allocation of additional segments fails) would degrade the functionality to an overflow checker for the one and only stack segment. As always, the devil will be certainly in the details (you already mentioned interrupts, etc.). Llvm obviously has a similar feature, called "segmented stack".

Nominal Animal · « **Reply #116 on:** June 28, 2021, 12:44:35 am »

Quote from: brucehoult on June 27, 2021, 10:07:58 pm

decide on how far apart to put them.

Exactly; but that does NOT determine the size of the stack, only the maximum possible size of the stack.

Quote from: brucehoult on June 27, 2021, 10:07:58 pm

This is easier for something with a MMU where you can just place things very far apart, especially with a 64 bit address space, but you still have to decide on the maximum stack size beforehand.

Yes, exactly, and this is only really useful there (say anything with at least 48-bit address space).

To repeat – because I now see I did not indicate in any way that I intended this to be completely separate from stack stuff on embedded devices; I apologise –, this is useful in cases where you have lots of threads: it reduces the up-front cost of threads in exchange for a small performance cost whenever the amount of stack space needed crosses a new page boundary.

A compiler could generate code for this right now. (In Linux, signal handlers can run on an alternate stack, so if one required that, there would be no "interrupts" to worry about, and a simple function preamble check – "does adding a new N-byte stack frame run afoul of the limit foo in TLS?", with room enough for full processor state and a return address beyond the set limit – would suffice.)
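Hand-written, such a preamble check might look something like the sketch below. It is only an approximation of what a compiler would emit: it assumes a downward-growing stack, uses the address of a local variable as a stand-in for the stack pointer, and the names `stack_floor`, `STATE_RESERVE`, `set_stack_budget` and `frame_fits` are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-thread limit: the lowest address the stack is allowed to reach. */
static _Thread_local uintptr_t stack_floor;

/* Assumed room kept free for full processor state and a return address. */
#define STATE_RESERVE 1024

/* Allow the calling thread 'bytes' more stack, measured from here. */
static void set_stack_budget(size_t bytes)
{
    volatile char here;                   /* approximates the stack pointer */
    uintptr_t sp = (uintptr_t)&here;
    stack_floor = (bytes < sp) ? sp - bytes : 0;
}

/* "Does adding a new frame_bytes-sized stack frame run afoul of the limit?"
   Returns nonzero if the frame fits, zero if it would cross the limit. */
static int frame_fits(size_t frame_bytes)
{
    volatile char here;
    uintptr_t sp = (uintptr_t)&here;
    if (sp < stack_floor)
        return 0;                         /* already past the limit */
    size_t room = sp - stack_floor;
    return room >= STATE_RESERVE && frame_bytes <= room - STATE_RESERVE;
}
```

A recursive function could call `frame_fits()` before descending and return an error instead of overflowing; a compiler doing the same thing would know the exact frame size rather than needing a guess.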

By splitting the return address and saved register state from local data storage, the physical stack would be accessed sequentially (or at most separated by full processor state size), so a virtual memory based guard page mechanism (pages being more than twice as large as complete userspace process state and a return address) would also work.

The local data stack could be separately instrumented for verification on a function by function basis, to help detect buffer under/overrun bugs, and it might (or might not) help convert the nastiest heisenbugs overwriting return addresses to more easily detected and debugged bugs... but I am more interested in what kind of changes we could introduce to how we human programmers use these things. (Yet, the idea of having full guard pages around the current local data stack frame is kinda interesting as a debugging tool. You know, if one could just mark/label a function as suspect, and have the compiler generate extra-special but slow code generating that local data stack frame. Again, almost doable with current compilers; only the automatic release of the memory-mapped stack frame when it passes out of scope is iffy.)

Even having the local data stack grow upwards instead of downwards might make a measurable difference in buffer overflow bug behaviour, simply because this way an overflow is less likely to corrupt local data belonging to parent scopes. I don't know for sure, but it would be interesting to find out and check; perhaps a pattern pops out that helps eliminate those annoying bugs in plain-ol' C, that we could shoehorn back into traditional stack implementations too. Or it might be an excuse for bad programmers to write even worse code, in which case I'd end up chewing my own foot off in anger and disappointment.

If the data stack concept is morphed into context, we could look at efficient coroutine implementations using the same. (The local data stack or scope being the same for each coroutine, but the real stack used to maintain call chain and register data.) But that passes beyond C.

As a side note, it is interesting to examine how C processes asked more data memory from the kernel prior to memory mapping interfaces: the brk() function.
Essentially, it maintains the size of the uninitialized data section (or program break, the end of the uninitialized data section) in exactly the way a local data stack would be maintained by functions requiring space for local variables.

Nothing new under the sun, eh?

brucehoult · « **Reply #117 on:** June 28, 2021, 01:18:57 am »

Quote from: Nominal Animal on June 28, 2021, 12:44:35 am

Quote from: brucehoult on June 27, 2021, 10:07:58 pm
decide on how far apart to put them.
Exactly; but that does NOT determine the size of the stack, only the maximum possible size of the stack.

Uhhh.

When I say something like "you have to figure out the size to allocate for the stack" I'm of course talking about the maximum possible size the stack can grow to. The stack starts off empty.

Quote

As a side note, it is interesting to examine how C processes asked more data memory from the kernel prior to memory mapping interfaces: the brk() function.
Essentially, it maintains the size of the uninitialized data section (or program break, the end of the uninitialized data section) in exactly the way a local data stack would be maintained by functions requiring space for local variables.

Nothing new under the sun, eh?

Sure. You can call sbrk() with a negative increment (or call brk() with a smaller value than you previously called it with) and thus use it as a stack. But not if, as is usual, you used that space as a heap and there are objects allocated and not free'd at the end of the heap space. If you get to some point where the last object(s) in the heap are free'd then you can move the break down. This *might* return memory to the OS and make it available for other processes.
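A minimal sketch of that push/pop use of the program break, assuming a Linux-like system where sbrk() is available. `brk_stack_demo` is a made-up name, and the function deliberately avoids calling malloc() or stdio between the two sbrk() calls, since the C library's own heap may live at the break too.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Push 'bytes' onto the program break, use them, and pop them again.
   Returns 1 if the break ends up where it started, 0 if not,
   and -1 if sbrk() failed. */
static int brk_stack_demo(size_t bytes)
{
    void *base = sbrk(0);                   /* current program break */
    if (base == (void *)-1)
        return -1;

    void *p = sbrk((intptr_t)bytes);        /* returns the old break */
    if (p == (void *)-1)
        return -1;
    memset(p, 0xA5, bytes);                 /* the new space is usable */

    if (sbrk(-(intptr_t)bytes) == (void *)-1)   /* negative increment: pop */
        return -1;
    return sbrk(0) == base;
}
```

This only works as a stack because nothing else was allocated above `p` in the meantime, which is exactly the caveat described above.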

This is more likely in a (compacting) GC environment than in a malloc()/free() one.

Nominal Animal · « **Reply #118 on:** June 28, 2021, 08:46:07 am »

Quote from: brucehoult on June 28, 2021, 01:18:57 am

When I say something like "you have to figure out the size to allocate for the stack" I'm of course talking about the maximum possible size the stack can grow to. The stack starts off empty.

When we have virtual memory, and this indeed requires that to be useful, that is not a useful simplification.

We do need to consider the stack a mapping to physical memory here. The current size of this mapping is the current size of the stack. Assuming the stack grows downwards, the stack pointer (roughly, *) points to the smallest address currently in use in the stack. Above that the stack contains data needed by code in the current call chain; below that is unused stack space.

(* ignoring red zone and such, and details like whether the stack pointer points to the first free element or the last used element, and so on.)

Resizing a stack only changes the amount of unused stack space. In virtualized environments, the total memory available to the virtualized system is often controlled via a similar control mechanism; memory ballooning.

Why would anyone have the stack mappings vary in size? For the exact same reason memory ballooning is used and useful in virtualized systems: that way we do not tie up resources in preallocations, and can instead use the resources where they are needed. We trade some run time cost –– small delays in initial accesses –– for this.

Right now, on AMD64 in Linux, the default stack size is 8 MiB, and typical virtual host environments start at 2 GiB of RAM.
If you use a kernel which initializes and/or fully backs each stack with RAM, you are limited to fewer than 256 processes and threads, because their stacks alone consume all available RAM.

Even when the stack is only mapped – virtual memory data structures set up to describe everything, but instead of reserving actual RAM for all of it, most of the pages are marked "unpopulated" –, we run hard into the limit called overcommit.

Overcommitting means having more mappings set up than you can ever back with available RAM or swap. Essentially, you are relying on never having to back all your promises.

Overcommit has to be limited, because even a single tiny moment when more RAM backing is needed than is available, leads to the kernel having to kill running processes. This is what the OOM killer in Linux does. It is pretty darn smart, but it isn't an AI, and configuring it optimally for a known workload is harder than database optimization. (You know it if you've done both. I hate databases, and I'd still prefer working on one over trying to get the OOM killer to behave in a deterministic, workload-optimal manner.)

So, the answer is not, cannot be to just leave those stacks unbacked by RAM, because we know that vm.overcommit_ratios above 100 or so tend to make systems unstable.
And that ratio only doubles the maximum number of threads and processes with those 8 MiB stacks on 2 GiB system to 512.
(The typical default on desktop systems is 50, for a 50% overcommit. Even that is known to cause troubles with dense workloads – those that do use all the memory they request from the kernel.)
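The thread counts above are easy to check with a few lines of C. This is only a sketch of the simplified model used in this post, where an overcommit ratio of r is taken to add r% on top of RAM; it is not the kernel's exact accounting, and the function name `max_full_stacks` is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many fully accounted stacks of `stack_bytes`
 * fit, if the commit limit is modelled (as in the post, not as in the
 * kernel's exact formula) as RAM * (100 + ratio) / 100.
 * The multiplication can overflow for absurdly large RAM sizes. */
uint64_t max_full_stacks(uint64_t ram_bytes, uint64_t stack_bytes,
                         unsigned overcommit_percent)
{
    assert(stack_bytes > 0);
    uint64_t limit = ram_bytes * (100u + overcommit_percent) / 100u;
    return limit / stack_bytes;
}
```

With 2 GiB of RAM and 8 MiB stacks this gives 256 with no overcommit and 512 with a ratio of 100, matching the numbers above.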

What if most processes only need about 1 MiB of stack during their entire lifetime, with rare ones – but it being unpredictable exactly which ones! – needing more, perhaps up to that 8 MiB limit? Instead of preparing backings for 8 MiB stacks, we create smaller mappings, and when necessary, "waste" some CPU time at run time to decide whether a process gets its stack mapping transparently expanded, or whether that process gets killed then and there.

I'm sure you can do the math. The above numbers are not indicative of any specific workload, but even as hand-waving, they are in the right ballpark or scale.

Perhaps one is tempted to say that it would be better to not count stack pages against the overcommit limit. That way lies madness: it is exactly the precise nature of the overcommit accounting that makes it the least bit useful in the first place. Add fuzziness around the accounting, and the first thing that happens is that overcommit gets disabled on every single virtualized instance. Besides, overcommitting those stack pages does contribute to the overall stability issues when the overcommit factor is set too high, so relaxing their accounting rules would be just a particularly nasty footgun.

To a first approximation, we can say dynamically resizing stack mappings does to threads and processes running in virtualized systems, what memory ballooning does to previously overprovisioned virtualized systems. Needs and reasons are the same.

Quote from: brucehoult on June 28, 2021, 01:18:57 am

Sure. You can call sbrk() with a negative increment (or call brk() with a smaller value than you previously called it with) and thus use it as a stack. But not if, as is usual, you used that space as a heap and there are objects allocated and not free'd at the end of the heap space.

Right, right. My point was that "heap" (in the sense of having a C function to ask the OS for more memory) really started as a separate data stack. It is nowhere near as odd a concept to C as one might think.
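A rough sketch of what using the program break as a data stack looks like, assuming a Unix-like system where `sbrk()` is still available (it is marked legacy or deprecated on some platforms). The names `brk_push` and `brk_pop` are made up, and this only works if nothing else, such as `malloc()`, moves the break in between.

```c
#define _DEFAULT_SOURCE   /* expose sbrk() on glibc; a platform assumption */
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Push `bytes` onto the program break and return the start of the new
 * region, or NULL on failure. */
void *brk_push(intptr_t bytes)
{
    assert(bytes > 0);
    void *old_break = sbrk(bytes);        /* returns the previous break */
    return old_break == (void *)-1 ? NULL : old_break;
}

/* Pop `bytes` again; only valid if nothing else moved the break meanwhile. */
int brk_pop(intptr_t bytes)
{
    assert(bytes > 0);
    return sbrk(-bytes) == (void *)-1 ? -1 : 0;
}
```

As Bruce notes, this strict last-in, first-out discipline is exactly what stops working once the same space is used as a general heap.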

PlainName · « **Reply #119 on:** June 28, 2021, 11:09:07 am »

'Dynamic' is the killer here. If you statically assign everything there is no problem. If you dynamically assign/create/delete stuff you have implicitly overcommitted and there is the potential to go tits up. Thus with dynamic anything you have to check that asking for stuff has succeeded, and have a plan to manage the situation if it fails. With the likes of malloc() that's easily done (which isn't to say easily coped with), but the stack is at too low a level for that to be appropriate.
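For the malloc() case, the check looks something like the sketch below. The helper name `dup_bytes` is made up; the point is only that the failure is reported to the caller, which then has to have a plan. A function call has no comparable return value to test when the stack runs out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate `len` bytes, or return NULL so the caller
 * can run its fallback plan instead of crashing on a null dereference. */
void *dup_bytes(const void *src, size_t len)
{
    assert(src != NULL);
    void *copy = malloc(len ? len : 1);   /* malloc(0) may legally return NULL */
    if (copy == NULL)
        return NULL;                      /* report failure; caller decides */
    memcpy(copy, src, len);
    return copy;
}
```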

Automatically increasing stack space by nicking memory from somewhere else (if it were possible) isn't a fix. It merely puts off the moment of truth until more memory is needed to replace the stolen memory. Rather, it is a fix in the same way that malloc/free is and suffers the same issues.

golden_labels · « **Reply #120 on:** June 28, 2021, 11:43:29 am »

Quote from: brucehoult on June 27, 2021, 11:04:46 am

Sure. Because C "arrays" in general are not arrays, but only allocating space and pointing at it.

They are not. Arrays are proper C objects and they’re accessible as arrays, not pointers. Don’t confuse object type with implicit conversions it may undergo. A few examples:

Code: [Select]

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

struct Foo {
    int x;
    char y[100];
    int z;
};

struct Bar {
    int x;
    char* y;
    int z;
};

int main(void) {
    printf("Foo: x:%zu z:%zu\n",
          offsetof(struct Foo, x), offsetof(struct Foo, z));
    printf("Bar: x:%zu z:%zu\n",
          offsetof(struct Bar, x), offsetof(struct Bar, z));
    
    return EXIT_SUCCESS;
}

Code: [Select]

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int array[10];
    
    printf("%s", _Generic((&array),
        int*: "int*", int(*)[10]: "int[10]"));
    
    return EXIT_SUCCESS;
}

Code: [Select]

void bar(int** ptr);

void foo(void) {
    int array[10];
    bar(&array); // Would be no error, if `array` was a pointer
}

In C++, in which arrays work in a similar manner, it is much easier to notice, as you can have C++ references to arrays as a type.

brucehoult · « **Reply #121 on:** June 28, 2021, 11:44:41 am »

Quote from: Nominal Animal on June 28, 2021, 08:46:07 am

Quote from: brucehoult on June 28, 2021, 01:18:57 am
When I say something like "you have to figure out the size to allocate for the stack" I'm of course talking about the maximum possible size the stack can grow to. The stack starts off empty.
When we have virtual memory, and this indeed requires that to be useful, that is not a useful simplification.

We do need to consider the stack a mapping to physical memory here. The current size of this mapping is the current size of the stack. Assuming the stack grows downwards, the stack pointer (roughly, *) points to the smallest address currently in use in the stack. Above that the stack contains data needed by code in the current call chain; below that is unused stack space.

Y'know, I do actually know how virtual memory works. I've been using it for literally 40 years, and was using computers in the transition from a PDP11 program having 8 segments of 8 KB each, which had base and limit registers and could be transparently copied around and relocated in physical memory by the OS and swapped to disk and back (as a unit) on a heavily loaded system, to the VAX with paged virtual memory, a tree-structured page table, and a TLB.

I've *implemented* virtual memory. I've hacked Android to have compressed virtual memory (compressing and uncompressing pages in RAM instead of swapping them to/from flash).

Quote

Right now, on AMD64 in Linux, the default stack size is 8 MiB, and typical virtual host environments start at 2 GiB of RAM.
If you use a kernel which initializes and/or fully backs each stack with RAM, you are limited to less than 256 processes and threads, because their stacks alone consume all available RAM.

And that's fine on 64 bit, as long as no one wants more than 8 MB of stack, which can very easily happen if a program decides to load even a simple JPG into a buffer on the stack instead of on the heap.

It would be much more sensible for a 64 bit system to put thread stacks at least a couple of GB apart. It costs almost nothing to do so, and you'd still be able to have a billion threads in one process. (Not as Linux threads -- you'd have to do your own thread manager)

If you try that on a 32 bit system then you'll only be able to create -- as you say -- 256 threads in one process.

The fact that most threads are probably using only a few kb and that is all that is mapped and allocated in physical memory is irrelevant to my point. My point is that on a 16 or 32 bit machine you can very quickly run out of VIRTUAL address space. You have to manage it.

I've worked on systems that have tens of thousands of threads in a single process, on Linux. For example an "Intelligent Network" system attached to a telephone exchange. Every active phone call that someone is charged for has a thread in the IN machine. The thread decides which physical number to connect the call to (can be different for e.g. 0800 numbers, possibly depending on the caller's location and other things), decides who should be charged for the call (caller, callee, someone else), checks to see if they have enough credit for the next 3 minutes / 1 minute / 1 second, and tells the exchange to connect them, and to ask again in 1 minute (or whatever).

Here in NZ, in ~2002, such systems were spec'd to handle 500 call initiations per second, with average call times of 3 minutes. That's up to 90000 calls being tracked, with a thread for each. Other clients we supplied the same software to, such as networks in Poland and Indonesia and India, had much higher call initiation rates.
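The 90000 figure is just arrival rate times mean call duration (Little's law). A trivial sketch, with a made-up function name:

```c
#include <assert.h>
#include <limits.h>

/* Steady-state number of calls in progress = arrival rate x mean duration.
 * With one thread per call, this is also the number of live threads. */
unsigned long concurrent_calls(unsigned long calls_per_second,
                               unsigned long mean_call_seconds)
{
    /* Guard against overflow of the product. */
    assert(mean_call_seconds == 0 ||
           calls_per_second <= ULONG_MAX / mean_call_seconds);
    return calls_per_second * mean_call_seconds;
}
```

500 call initiations per second at 3 minutes (180 seconds) each gives 90000.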

You can program such things as state machines (and some systems do) but that's a far more complex thing to manage and very error prone. A thread abstraction with local variables and function calls that you don't have to return from before switching to the next thread is far more convenient. Especially when your business is coming up with and implementing and selling new IN features on a short time-line. For example, I helped implement a feature where people in shared accommodation could each have their own section of the phone bill on the shared phone. Whenever a chargeable call was initiated the system would go to an IVR menu and prompt for which resident was making the call and have them enter a PIN.

One of my contributions at that company in 2002/3 was to enable call processing code to be written as a thread per call, with local state and function calls etc, and transparently convert that to continuations.

The same principles led six or seven years later to a revolution in the ease and reliability of making complex web sites with complex interactions when systems such as node.js made it possible to create them using an abstraction of a thread per session instead of a state machine.

SiliconWizard · « **Reply #122 on:** June 28, 2021, 05:36:33 pm »

Quote from: brucehoult on June 27, 2021, 10:21:59 pm

Quote from: SiliconWizard on June 27, 2021, 05:21:17 pm
With that said, there is a common conception that heap allocation would be more "reliable" than allocating stuff on the stack. This just isn't true. It mostly comes from two facts: one, that on most systems these days, return addresses are stored in the same stack as local objects, which is pretty bad (we already discussed this in other threads), and second, that again on most common systems and programming languages, the allocated stack is usually much smaller than the space reserved for the heap, because heap allocation is favored. We can think of almost-fully stack-based languages that would make this irrelevant.

There is a large and important class of programs which absolutely can not be fully stack based.

This is programs based on an event loop, with an amount of state retained between iterations that is not known in advance.

This includes any interactive program (whether character based or GUI) where the user is creating or maintaining a document or in-memory database of some kind.

I don't agree with this in the least. The use of stack and heap is, IMHO, entirely dependent on the programming model, not on the applications.
We can absolutely write GUI programs with a stack only. With appropriate languages and implementations, of course.

Now whether this would be a good idea/be worse/be better than the usual paradigms, that's a question that could take a whole thesis to answer. Or several.

SiliconWizard · « **Reply #123 on:** June 28, 2021, 05:50:04 pm »

Quote from: golden_labels on June 28, 2021, 11:43:29 am

Quote from: brucehoult on June 27, 2021, 11:04:46 am
Sure. Because C "arrays" in general are not arrays, but only allocating space and pointing at it.
They are not. Arrays are proper C objects and they’re accessible as arrays, not pointers. Don’t confuse object type with implicit conversions it may undergo.

I more or less agree with you there, but it is nitpicking depending on what exactly you mean by "accessible as arrays".

Bruce is right: in C, accessing arrays is implemented in totally the same way as any pointer. The only extra things an array gives you:
- Static allocation (that's not linked to how they are accessed);
- The sizeof operator gives you the size of the allocated memory;
- Some possibility of static analysis obviously not possible with mere pointers;
- Some compiler extensions/options may allow to automatically generate run-time out-of-bounds checks when accessing an array, but that's non-standard, not necessarily available and must be explicitly enabled. This of course affects performance.

If you exclude the last point, which is clearly non-standard, there is no difference how access is done between an array and a pointer.
Your example with _Generic is not per se about "access". It's just about the fact in C, arrays are indeed types of their own. Allowing what I mentioned above, plus the use of _Generic (beginning with C11) as you mentioned.

In the end, this just means arrays are not pointers in terms of data type. But "item" access is strictly the same.
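A small sketch of both halves of that statement; the function names are made up for illustration. Element access is written and defined the same way for both, while `sizeof` shows the types are different.

```c
#include <assert.h>
#include <stddef.h>

/* Element access is spelled and defined identically: a[i] is *(a + i). */
int same_access(void)
{
    int array[10] = {0};
    int *pointer = array;          /* array converts to &array[0] here */
    array[3] = 42;
    assert(pointer[3] == *(array + 3));
    return pointer[3];
}

/* The types still differ, which sizeof makes visible. */
size_t array_bytes(void)
{
    int array[10];
    return sizeof array;           /* size of the type int[10] */
}

size_t pointer_bytes(void)
{
    int array[10];
    int *pointer = array;
    return sizeof pointer;         /* size of the type int* */
}
```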

In C++, things are a bit more elaborate, of course.

PlainName · « **Reply #124 on:** June 28, 2021, 05:56:47 pm »

Quote

- The sizeof operator gives you the size of the allocated memory;

Tells you the size of the element too.

SiliconWizard · « **Reply #125 on:** June 28, 2021, 06:12:09 pm »

Quote from: dunkemhigh on June 28, 2021, 05:56:47 pm

Quote
- The sizeof operator gives you the size of the allocated memory;

Tells you the size of the element too.

Like 'sizeof(array[0])'.

But, you also get that with a pointer: 'sizeof(*pointer)'. Note that 'sizeof(pointer[0])' would also work.

So, nothing specific to arrays with this. Of course the only pointers you can't get the size of the pointed-to type from are pointers to void. And of course, you can't declare an array of void.
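To illustrate, a minimal sketch (the helper names and the `ARRAY_LEN` macro are just for this example). The element size is available either way; the element count is only recoverable from a real array.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array; silently wrong if handed a pointer. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

size_t element_size_via_array(void)
{
    double array[8];
    return sizeof(array[0]);
}

size_t element_size_via_pointer(void)
{
    double array[8];
    double *pointer = array;
    assert(sizeof(*pointer) == sizeof(pointer[0]));  /* both are the type double */
    return sizeof(*pointer);
}

size_t element_count(void)
{
    double array[8];
    return ARRAY_LEN(array);       /* works only because array is not a pointer */
}
```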

Nominal Animal · « **Reply #126 on:** June 28, 2021, 07:28:08 pm »

Quote from: brucehoult on June 28, 2021, 11:44:41 am

Y'know, I do actually know how virtual memory works.

I know, and you know I know; I've said so in other threads.

I am trying to make the argument understandable to anyone encountering this issue, and not "elevate" it into technical jargon only those with a minimum of two decades of practical experience, plus at least one decade of research and development, have.

If you think of a way how I could do the above and avoid being taken as implicitly denigrating your or anyone else's skills or experiences, let me know. I'll pay cash for that advice.

I already know that if I use self-deprecating language, that just causes everybody to ignore what I say, because they take humility for uncertainty or weakness.
At least when they get annoyed, there is still at least a possibility of two-sided communication. So, better to err on the side of friction, but keep reminding others it is only to try and keep learners in tow (or at least theoretically possible to follow the discussion even if details are hazy), I think; I loved to do that when I was a learner, and it was often invaluable.

I have already mentioned elsewhere on this forum that I never, ever "talk down"; that is a nasty social trick I absolutely detest, and I don't do those. I'd rather chew off my own foot and eat it raw. If I sound like "talking down", it is because I don't have the kind of command of English that I sometimes may appear to have: I do spend a lot of effort (and post-submit edits after re-reading my own posts to see if I still believe I managed to convey the idea or not). Not backed by skill, only by sheer hard effort.

I do avoid, hard, jargon language, and repeat known facts like the definitions of concepts, because I don't want to be one of those snobs who think newbies are not worth listening to or talking to just because they don't know the jargon and exact terms to use to sound "professional".

If you feel slighted, I do apologise. No slight or aspersions of any kind was ever intended on my part.

Same applies to Ataradov and dunkemhigh. I am genuinely puzzled as to their opinions, because I do value their skills, highly, but I seem to be observing contradictions between their suggestions and their stated design goals. I feel I do need to be pushy rather than placating here, because being placating will be misunderstood as subservience or uncertainty or tacit admission of some kind of personal or social error; but what I want, is to drag out the reasoning and experiences behind the opinions. Those are always informative, interesting, and useful.

Quote from: brucehoult on June 28, 2021, 11:44:41 am

My point is that on a 16 or 32 bit machine you can very quickly run out of VIRTUAL address space. You have to manage it.

Yes, for sure. 100% in agreement here.

When I read your "Appropriately sizing one stack per thread can be a challenge. Figuring out the right size for *two* stacks per thread, and the balance between them, just seems harder." response, it seemed to me you did not consider the resource savings achievable in reality.

After reading your above post, it appears to me that your underlying argument is something along the lines of "address space policy is complicated, difficult, and fragile; and in my experience, doubling the number of regions involved in such policy can make things untenable".

I obviously agree with that. But running out of address space is not the problem at hand on 48-64 bit address space architectures, where the provisioning problems this could help with are common.

I outlined the scenario where a typical virtualized hosting environment (AMD64, 2GB RAM) is limited to an artificially low number of processes and threads, because having their entire stacks mapped (instead of just "reserved" somehow without actual mappings) means we need to either risk high overcommit ratios (that tend to be unstable because of how they behave in sudden increases in connection attempts et cetera), or pay real money to overprovision the server.

Anyway, I do understand it is also difficult to track my stack-related suggestions and descriptions, when they range from embedded (stack pointer relative effective address checking) to 64-bit only (dealing with correctly provisioning services and virtualized systems, when the prevailing programming paradigm is "just throw more memory and/or more CPU cores at it and it'll work fine") in the very same thread. Sorry about that.

TheCalligrapher · « **Reply #127 on:** June 29, 2021, 12:49:21 am »

Quote from: SiliconWizard on June 28, 2021, 05:50:04 pm

Bruce is right: in C, accessing arrays is implemented in totally the same way as any pointer.

That is absolutely incorrect. The "pointer" in array access semantics is purely an imaginary concept. It does not exist in reality, meaning that it does not exist as an lvalue. This alone already overturns the claim that array is accessed "the same way as any pointer".

Quote from: SiliconWizard on June 28, 2021, 05:50:04 pm

The only extra things an array gives you:
- Static allocation (that's not linked to how they are accessed);

That's false. Array is not a pointer that points to statically allocated memory, if that's what you mean.

Quote from: SiliconWizard on June 28, 2021, 05:50:04 pm

- The sizeof operator gives you the size of the allocated memory;

That's false. `sizeof` never works with any kind of "memory" and has no relation to "memory" at all. `sizeof` works with types, types and only types. Both forms of `sizeof` are, by definition, funneled into working with the type supplied as an argument (in one way or another).

When `sizeof` is applied to an array, it returns the size of the array type. Array types are regular types in C. No "pointers" of any kind are involved here. No "allocated memory" is involved either. Note that this is also true when `sizeof` is applied to a VLA.
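One way to see that `sizeof` only cares about the type: for a non-VLA operand the expression is never evaluated. A minimal sketch, with made-up names:

```c
#include <assert.h>
#include <stddef.h>

int sizeof_operand_calls = 0;

int counted(void)
{
    ++sizeof_operand_calls;
    return 0;
}

/* sizeof needs only the type of counted(), namely int, so the function
 * is never actually called. Returns the number of calls made. */
int calls_made_by_sizeof(void)
{
    sizeof_operand_calls = 0;
    size_t size = sizeof(counted());
    assert(size == sizeof(int));
    return sizeof_operand_calls;
}

/* The expression form and the type form agree for an array. */
int forms_agree(void)
{
    int array[10];
    return sizeof array == sizeof(int[10]);
}
```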

Quote from: SiliconWizard on June 28, 2021, 05:50:04 pm

If you exclude the last point, which is clearly non-standard, there is no difference how access is done between an array and a pointer.

That is false. One can claim that array access is equivalent to rvalue pointer access, rvalue being the critical detail here. But without it, the above statement is patently false.
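A short sketch of the lvalue/rvalue distinction, as I read it; the function name is made up. The array object holds only its elements, while a real pointer is a separate object with its own address.

```c
#include <assert.h>

/* Returns 1 if the observations below hold. */
int array_is_just_elements(void)
{
    int array[4] = {1, 2, 3, 4};
    int *pointer = array;               /* a real pointer object, an lvalue */

    assert(sizeof array == 4 * sizeof(int));   /* no room for a hidden pointer */

    /* &array and &array[0] are the same address with different types. */
    int same_address = (void *)&array == (void *)&array[0];

    /* &pointer is the address of the pointer object itself, elsewhere. */
    int separate_object = (void *)&pointer != (void *)pointer;

    pointer++;                          /* fine: pointer is a modifiable lvalue */
    /* array++;  or  array = pointer;      would not compile */

    return same_address && separate_object && *pointer == 2;
}
```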

TheCalligrapher · « **Reply #128 on:** June 29, 2021, 01:02:37 am »

Quote from: brucehoult on June 27, 2021, 04:31:59 am

But C doesn't support arrays at all, whether fixed or variable length.

Quote from: brucehoult on June 27, 2021, 11:04:46 am

Sure. Because C "arrays" in general are not arrays, but only allocating space and pointing at it.

Here we go. This popular confusion is the root of the above incorrect claims.

I suggest you read Dennis Ritchie's "The Development of the C Language" article (https://www.bell-labs.com/usr/dmr/www/chist.html), where he describes the decisions he made when developing the idea of arrays in C. Specifically, in that article he explains why he decided to abandon the BCPL's "only allocating space and pointing at it" approach as completely unacceptable for C.

He also explains why arrays in C tend to "mimic" BCPL's behavior, pretending to act as pointers in some contexts. Yet, this simple and rather easily understood "mimicking" behavior managed to confuse quite a few people down the road, who for some reason managed to trap themselves in a silly belief that arrays in C are "only allocating space and pointing at it".

hamster_nz · « **Reply #129 on:** June 29, 2021, 01:54:47 am »

Quote from: TheCalligrapher on June 29, 2021, 01:02:37 am

Quote from: brucehoult on June 27, 2021, 04:31:59 am
But C doesn't support arrays at all, whether fixed or variable length.
Quote from: brucehoult on June 27, 2021, 11:04:46 am
Sure. Because C "arrays" in general are not arrays, but only allocating space and pointing at it.

Here we go. This popular confusion is the root of the above incorrect claims.

I suggest you read Dennis Ritchie's "The Development of the C Language" article (https://www.bell-labs.com/usr/dmr/www/chist.html), where he describes the decisions he made when developing the idea of arrays in C. Specifically, in that article he explains why he decided to abandon the BCPL's "only allocating space and pointing at it" approach as completely unacceptable for C.

He also explains why arrays in C tend to "mimic" BCPL's behavior, pretending to act as pointers in some contexts. Yet, this simple and rather easily understood "mimicking" behavior managed to confuse quite a few people down the road, who for some reason managed to trap themselves in a silly belief that arrays in C are "only allocating space and pointing at it".

I think partially Bruce was baiting me in a friendly way, and partially what he says is true.

C doesn't have "high functioning" arrays, the type that allow you to append, insert or delete items.

And because C arrays so easily decay into pointers, it is hardly like they are there at all, as the information about the size of the array is so quickly lost. All you have to do is call a function or return and it is gone, and that array is now a pointer that points to an unknown number of items.
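A sketch of that loss of size information, plus two common ways around it. The function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* A parameter written as `int arr[10]` is adjusted to `int *arr`,
 * so the callee only ever sees a pointer. */
size_t bytes_seen_after_decay(int *arr)
{
    return sizeof arr;                 /* size of a pointer */
}

/* Passing a pointer to the whole array keeps the length in the type. */
size_t bytes_seen_via_array_pointer(int (*arr)[10])
{
    return sizeof *arr;                /* 10 * sizeof(int) */
}

/* The usual workaround: carry the length alongside the pointer. */
long sum(const int *items, size_t count)
{
    assert(items != NULL || count == 0);
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += items[i];
    return total;
}
```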

gf · « **Reply #130 on:** June 29, 2021, 07:42:10 am »

Quote

C doesn't have "high functioning" arrays, the type that allow you to append, insert or delete items.

The term "variable-length array" is indeed a bit misleading. Once a VLA has been allocated (in the stack frame, at runtime), its size still remains fixed. The size just isn't a compile-time constant.
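A minimal sketch of that, assuming a compiler that supports VLAs (they are optional since C11). The function name is made up.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the VLA keeps its size after `n` changes. Requires n > 0. */
int vla_size_is_fixed(size_t n)
{
    assert(n > 0);
    int vla[n];                        /* size chosen here, at run time */
    size_t before = sizeof vla;        /* evaluated at run time for a VLA */
    n *= 2;                            /* does not resize the array */
    return before == sizeof vla && before == (n / 2) * sizeof(int);
}
```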

DiTBho · « **Reply #131 on:** June 29, 2021, 09:35:27 am »

Arrays are not "objects".
In C there are no "objects".
To be an "object", an array should have
- methods
- identifier
- auto pointer (this)

In C, lists are a "construct" of the language, with some properties like
- data alignment
- data stuffing
- data size

But for example there is no "+=" method to append an item to the tail of the list, nor is there a method to sort or reverse the order of the list.
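For reference, the hand-written equivalent of an append usually ends up looking something like this sketch. All names here are illustrative, not from any particular library.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable list of ints. */
struct int_list {
    int   *items;
    size_t count;
    size_t capacity;
};

/* Append one value, doubling the storage when full. Returns 0 on success,
 * -1 if memory could not be obtained (the list is left unchanged). */
int int_list_append(struct int_list *list, int value)
{
    assert(list != NULL);
    if (list->count == list->capacity) {
        size_t new_capacity = list->capacity ? list->capacity * 2 : 8;
        int *grown = realloc(list->items, new_capacity * sizeof *grown);
        if (grown == NULL)
            return -1;
        list->items = grown;
        list->capacity = new_capacity;
    }
    list->items[list->count++] = value;
    return 0;
}

/* Release the storage and reset the list to empty. */
void int_list_free(struct int_list *list)
{
    assert(list != NULL);
    free(list->items);
    list->items = NULL;
    list->count = list->capacity = 0;
}
```

A zero-initialised `struct int_list` is a valid empty list, since `realloc(NULL, n)` behaves like `malloc(n)`.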

Yesterday I wrote an A* algorithm in C for a micro-robotic application, the A* is a sub-optimal path finder, you can use it to help your robot to better manage its motion planning. My robot is happy to have a new LiDAR, a RANSAC algorithm, and this new A* algorithm, but personally for all of these things, especially with the A*, I suffered the lack of some good "lists object" with proper methods to manage items, and I had to implement everything by myself.

Not to mention that my algorithm is polymorphic, I have to fit "lists" of different types, those of LiDAR are different from those of A*, and those of the RANSAC algorithm are also different, but all these lists must remain in a super binder called "motion-list", and once again I had to implement a wild polymorphism by hand since this is another thing that doesn't exist in C.

If the C language supported objects, lists would be objects, and if lists were objects I would also have the polymorphism ready.

You may wonder, so why didn't you use C++? Well ... good question, long story on the reasons

DiTBho · « **Reply #132 on:** June 29, 2021, 09:48:27 am »

Quote from: TheCalligrapher on June 29, 2021, 12:49:21 am

the type supplied as an argument (in one way or another).

It was decommissioned years ago as it was more than obsolete, but with my old Avoget compiler, it was the linker who knew and used this information, and the type size of everything was stored into specific fields within the ".o" file and used only in linking stage.

I mean, when I disassembled ".o" files, I always saw a header with the sizeof(everything), but in the .section code area, the sizeof(something) was always left "to be completed in linking stage".

Never tried with Gcc, anyway

DiTBho · « **Reply #133 on:** June 29, 2021, 10:02:15 am »

Quote from: TheCalligrapher on June 29, 2021, 01:02:37 am

He also explains why arrays in C tend to "mimic" BCPL's behavior, pretending to act as pointers in some contexts. Yet, this simple and rather easily understood "mimicking" behavior managed to confuse quite a few people down the road, who for some reason managed to trap themselves in a silly belief that arrays in C are "only allocating space and pointing at it".

You didn't explain why it should be "silly", while my Avoget compiler does exactly that

When I declare an array in C and look at the produced assembly, I see
- declare the array as "uninitialized" or "initialized"
- reserve space in the proper section (here, "initialized" could mean "constant pool")
- report the array name, its size, and its address into the object file

Exactly: space allocation, possibly filled with pre-initialized data, and pointing

PlainName · « **Reply #134 on:** June 29, 2021, 10:19:37 am »

Quote

When I declare an array in C and look at the produced assembly, I see

In this context, surely what is produced as assembler is irrelevant. C is not assembler, just as an engine is not a lump of aluminium, and EEVBlog is not a database.

gf · « **Reply #135 on:** June 29, 2021, 10:24:17 am »

Quote from: DiTBho

In C there are no "objects".

In C, an "object" is defined as "region of data storage in the execution environment, the contents of which can represent values"

Quote from: DiTBho on June 29, 2021, 09:48:27 am

Quote from: TheCalligrapher on June 29, 2021, 12:49:21 am
the type supplied as an argument (in one way or another).

It was decommissioned years ago as it was more than obsolete, but with my old Avoget compiler, it was the linker who knew and used this information, and the type size of everything was stored into specific fields within the ".o" file and used only in linking stage.

I mean, when I disassembled ".o" files, I always saw a header with the sizeof(everything), but in the .section code area, the sizeof(something) was always left "to be completed in linking stage".

Never tried with Gcc, anyway

The sizeof operator is an operator of the C language. It has nothing to do with the linker. Except for variable-length arrays, sizeof(type) or sizeof(expression) is generally a constant C expression, which is known at compile-time. And for VLAs, the size is not known at link time either, but only at runtime.
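A quick way to convince yourself that the value is a compile-time constant: use it where the language demands a constant expression. This sketch assumes C11 (`static_assert` from `<assert.h>`); the type mirrors a 32-byte array typedef and the other names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define MY_ARRAY_SIZE 32
typedef uint8_t my_array_t[MY_ARRAY_SIZE];

/* Checked by the compiler; no linker and no run-time code involved. */
static_assert(sizeof(my_array_t) == 32, "my_array_t must be 32 bytes");

/* A constant expression can initialise an enum or size another array. */
enum { MY_ARRAY_BYTES = sizeof(my_array_t) };
uint8_t shadow_copy[sizeof(my_array_t)];

unsigned shadow_copy_bytes(void)
{
    return (unsigned)sizeof shadow_copy;
}
```

If `sizeof` were resolved only at link time, neither the `static_assert` nor the enum initialiser could compile.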

golden_labels · « **Reply #136 on:** June 29, 2021, 10:31:35 am »

Quote from: DiTBho on June 29, 2021, 09:35:27 am

Arrays are not "objects".
In C there are no "objects".

Really? Damnit, someone should tell everyone who was involved in creation of the C language, starting from Kernighan and Ritchie themselves, that they don’t know the language they have defined. Explain to them, how there are no objects, because the language lacks some syntactic sugar! </sarcasm>

Or — that will probably be a better idea — learn the actual abstract machine the C language models.

DiTBho · « **Reply #137 on:** June 29, 2021, 11:39:00 am »

Quote from: dunkemhigh on June 29, 2021, 10:19:37 am

In this context, surely what is produced as assembler is irrelevant.

For me it's the *ONLY* thing that really matters.

DiTBho · « **Reply #138 on:** June 29, 2021, 11:57:40 am »

Yup, Gcc does it differently than Avoget does

Code: [Select]

#define my_array_t_size 32
typedef uint8_t my_array_t[my_array_t_size];

...

   my_array_t my_array;
   console_out_nl("sizeof(my_array)=%ld",sizeof(my_array));

Code: [Select]

...
        stmfd   sp!, {fp, lr}
        add     fp, sp, #4
        sub     sp, sp, #32
        ldr     r0, .L3
        mov     r1, #32 <---------------- sizeof(my_array) = 32, it's known at compile time!!! WOW!!!
        bl      console_out_nl
        sub     sp, fp, #4
        ldmfd   sp!, {fp, pc}
...

.L3:
        .ascii  "sizeof(my_array)=%ld\012\000"

(gcc-armv5tel)

PlainName · « **Reply #139 on:** June 29, 2021, 12:23:52 pm »

Quote from: DiTBho on June 29, 2021, 11:39:00 am

Quote from: dunkemhigh on June 29, 2021, 10:19:37 am
In this context, surely what is produced as assembler is irrelevant.

For me it's the *ONLY* thing that really matters.

Sure, but only in the assembler context. For all the language cares, an array could resolve to a box of jellybeans so long as it acts like an array should. Thus a pointer in assembler is as meaningless to the C language as would be a homing pigeon carrying a bit of paper.

Nominal Animal · « **Reply #140 on:** June 29, 2021, 12:32:13 pm »

Quote from: DiTBho on June 29, 2021, 09:35:27 am

In C there are no "objects".

object ≢ object.

See e.g. n1570 (final draft of the C11 standard). In that context, "object" is not the same "object" that say C++11 uses.

I've been very amazed to see exactly the kind of approaches to pointers in this thread that I keep seeing leading to horribly hard to fix bugs in the real life. I guess I shouldn't be, because the root cause is NOT a technical problem or definition, but a simple human one. The compilers are not the cause of most buffer/object overflow/underflow bugs –– I don't actually recall any ––, we humans are. To change that, we need to change how we think and use pointers to express our ideas. "Just don't do that then" won't work; we're already at the proverbial doctor's office, and simply "avoid writing buggy code" is unrealistic. Thus, the "mind over" hyperbole.

It is not just programming, though. In physics, it is at the core of whether one can apply their "knowledge" to solve new types of problems or not; whether their "knowledge" integrates with their other "knowledge" or just stays as separate never-interacting blobbets. Just read any of my posts in the physics question thread in the chat section; they deal with exactly this, just in physics. (And "knowledge" itself is a damnably hard concept, because you can "know" something because you have repeated it so many times by rote it is now handled by your autonomous nervous system –– like bike riding; I definitely call it "knowing how to ride a bike" –– or you can "know" how something happens and why, but have difficulty expressing it to others because you don't know any/all the words to express those concepts properly. I was 14 or so when I described to a physicist my idea of using at least two lasers in a transparent box to excite atoms of the enclosed gas in multiple stages, so that only the atoms at the intersection of those lasers would be excited enough to emit visible light. (I had just learned the very, very basics of how lasers work, and was looking at reference data in an effort to see if a suitable gas was already known (with metastable excitation states in infrared, and one or more within visible light, that could be cascaded with those lasers), especially hoping for nitrogen to be viable here.) I was damned lucky, and instead of laughing at me, the physicist talked to me for a good couple of hours about the practical issues, told me the keywords I'd need to look up and understand, and even gave me a couple of trade magazines about lasers he had lying around. I want to be like him myself. A few years later, a sharp young lady at MIT did exactly that as their graduate thesis – except of course where I just had a kids' idea without any fixed form or understanding of the whole, just a vague idea, she did both the theory and a practical implementation.
I just don't remember whether she managed to do it in nitrogen (air), or whether she had to use some other gas.)

No, this isn't holography; it is just laser-induced light-emitting true 3D display tech. While it was demonstrated in practice over 20 years ago, I do not believe even militaries use it today, although it does work, and looks better than those "flickering holograms" with occasional shear plane errors you see in movies. Works as a high-power space heater, too.

gf · « **Reply #141 on:** June 29, 2021, 01:07:05 pm »

Quote from: DiTBho on June 29, 2021, 11:57:40 am

Yup, Gcc does it differently than Avoget does

The assembly/object code generated by the compiler certainly matters when platform-specific issues like interfacing, ABI, performance, or similar are considered.
From the abstract point of view, you rather treat the compiled code as a black box which is supposed to do (according to the semantics of the C language) what you have written in your C program -- regardless of how the compiler achieves this goal.

SiliconWizard · « **Reply #142 on:** June 29, 2021, 05:44:47 pm »

Quote from: TheCalligrapher on June 29, 2021, 12:49:21 am

Quote from: SiliconWizard on June 28, 2021, 05:50:04 pm
Bruce is right: in C, accessing arrays is implemented in totally the same way as any pointer.

That is absolutely incorrect. The "pointer" in array access semantics is purely an imaginary concept. It does not exist in reality, meaning that it does not exist as an lvalue. This alone already overturns the claim that an array is accessed "the same way as any pointer".

Quote from: SiliconWizard on June 28, 2021, 05:50:04 pm
The only extra things an array gives you:
- Static allocation (that's not linked to how they are accessed);

That's false. Array is not a pointer that points to statically allocated memory, if that's what you mean.

Quote from: SiliconWizard on June 28, 2021, 05:50:04 pm
- The sizeof operator gives you the size of the allocated memory;

That's false. `sizeof` never works with any kind of "memory" and has no relation to "memory" at all. `sizeof` works with types, types and only types. Both forms of `sizeof` are, by definition, funneled into working with the type supplied as an argument (in one way or another).

When `sizeof` is applied to an array, it returns the size of the array type. Array types are regular types in C. No "pointers" of any kind are involved here. No "allocated memory" is involved either. Note that this is also true when `sizeof` is applied to a VLA.

Quote from: SiliconWizard on June 28, 2021, 05:50:04 pm
If you exclude the last point, which is clearly non-standard, there is no difference how access is done between an array and a pointer.

That is false. One can claim that array access is equivalent to rvalue pointer access, rvalue being the critical detail here. But without it, the above statement is patently false.

Not sure what all this pile of crap means. A lot of words, but I haven't seen any concrete argument that would actually make my few above points "false".
Like, a pointer for array access "an imaginary concept"? Do you actually know what a compiler emits for array access? How is that different from pointers (except for the few points I mentioned)? And as I said, except for exotic compiler extensions, the compiled code for accessing an array is no different from doing this directly with a pointer. You can of course assign arrays to pointers for accessing them, and it makes no difference either.

Or possibly you're being overly pedantic to the point of expressing worthless points that can only confuse people instead of helping them.

To make things very clear, I'm strictly talking about C here. C++ is completely different. In C++ you can even overload the [] operator, so that's a completely different territory.

Arrays in C are just effectively allocated memory buffers, whether statically or on the stack.
And then, as obviously a different type than pointers themselves, you get some compiler checks and extensions if any. Nothing more. As I clearly said, of course an array identifier is NOT a pointer. But for accessing its content, it makes no difference. If you think it does, will you please show us how.

Accessing an array in my world (maybe yours is in a different elevated sphere) means: accessing data within it. Which works exactly the same as with any other memory buffer. (Again as I said, you may get some static analysis checks if your compilers have them that can't be done with pointers. But static checks are just static checks.)

Of course an array identifier can't be an l-value, just like a constant pointer. Just like functions.

SiliconWizard · « **Reply #143 on:** June 29, 2021, 05:54:42 pm »

Quote from: gf on June 29, 2021, 10:24:17 am

Quote from: DiTBho
In C there are no "objects".

In C, an "object" is defined as "region of data storage in the execution environment, the contents of which can represent values"

Yes absolutely. It's a common "trap" to think of "objects" in a purely object-oriented manner (so with all the baggage, methods, etc.)
Actually, it's the "oriented" in the "object-oriented" term that further defines a language in which objects can define their own behavior, as opposed to languages in which functions/procedures operate on objects but are not "bound" to them.

In non-object-oriented languages, it's relatively common to use the term "object" to refer to what you quoted above. What other generic term to describe this would you use anyway?

gf · « **Reply #144 on:** June 29, 2021, 06:29:16 pm »

Quote from: SiliconWizard on June 29, 2021, 05:44:47 pm

Of course an array identifier can't be an l-value, just like a constant pointer.

It is still an lvalue, but in the context of most (but not all) expressions it is implicitly converted to a (non-lvalue) pointer to the first element.
https://en.cppreference.com/w/c/language/array, section "Array to pointer conversion"
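The conversion rule described above can be checked with a short sketch. This is only an illustration, assuming a C11 compiler; the function names are made up here. It contrasts two contexts where the array is not converted (`sizeof` and unary `&`) with one where it is.

```c
#include <assert.h>
#include <stddef.h>

/* The array type itself has the size of all its elements. */
static_assert(sizeof(int[10]) == 10 * sizeof(int),
              "array type has the size of all its elements");

/* sizeof does not trigger array-to-pointer conversion:
   it sees the array type int[10]. */
size_t array_size(void)
{
    int a[10];
    return sizeof a;                 /* 10 * sizeof(int) */
}

/* In an ordinary value context the array is converted to int *. */
size_t decayed_size(void)
{
    int a[10];
    return sizeof (a + 0);           /* sizeof(int *) */
}

/* Unary & does not trigger the conversion either: &a has type
   int (*)[10], so stepping it by one skips the whole array. */
size_t step_of_address(void)
{
    int a[10];
    return (size_t)((char *)(&a + 1) - (char *)&a);
}
```

So `a` stays an array (and an lvalue) until an expression context asks for a value, at which point a temporary `int *` is produced.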

PlainName · « **Reply #145 on:** June 29, 2021, 06:35:39 pm »

Quote

Do you actually know what a compiler emits for array access? How is that different from pointers (except for the few points I mentioned)?

How is that relevant to C?

Or C++? Or, indeed, anything at all.

TheCalligrapher · « **Reply #146 on:** June 30, 2021, 05:48:13 pm »

Quote from: SiliconWizard on June 29, 2021, 05:44:47 pm

Not sure what all this pile of crap means.

This "pile of crap" is known as a C programming language. It often triggers an emotional reaction from people who fail to understand its concepts, and principles it is built upon. The latter are quite simple, so I attribute that lack of understanding to the lack of effort, aka laziness.

Quote from: SiliconWizard on June 29, 2021, 05:44:47 pm

A lot of words, but I haven't seen any concrete argument that would actually make my few above points "false".

The "words" are actually contained in the language specification, known as the C standard. Moreover, the amount of effort spent by language enthusiasts on explaining that peculiar matter of arrays and pointers is enormous. I don't see any reason to copy-paste all that here for the umpteenth time.

My intent is to point out your errors. And I did that with my signature surgical precision. The subsequent research is left entirely to you. I can only give you direction.

Quote from: SiliconWizard on June 29, 2021, 05:44:47 pm

Like, a pointer for array access "an imaginary concept"?

It is something that exists only in the mind of the "C abstract machine". It only exists in paper documents, in formal definitions of various language features that rely on implicit array-to-pointer conversion (e.g. the `[]` operator).

It has no reason to exist in the semantics of an actual real-life C implementation. And it doesn't.

Quote from: SiliconWizard on June 29, 2021, 05:44:47 pm

Do you actually know what a compiler emits for array access?

Oh, yes, very well.

Quote from: SiliconWizard on June 29, 2021, 05:44:47 pm

How is that different from pointers (except for the few points I mentioned)?

Um... You are diverging into some unrelated area here. At the machine language level all accesses to memory are performed through "pointers". From that point of view every lvalue in C can be thought of as a "pointer". Every time you access a variable in memory, it will go through some sort of "pointer". So, from that point of view you can ask the same question ("How is that different from pointers") about the entire object model of the C language.

Sorry, dear. The machine-level concepts and your "how is that different from pointers" have no bearing on the C language concepts. You are trying to mix two different C language concepts: a concept of lvalue and a concept of pointer. In C these are not the same.

Quote from: SiliconWizard on June 29, 2021, 05:44:47 pm

And as I said, except for exotic compiler extensions, the compiled code for accessing an array is no different from doing this directly with a pointer.

That is false. The compiled code for accessing an immediate array will be very different from the compiled code for accessing a pointed-to array through a pointer.

Moreover, compiled code for accessing an immediate struct object is the same as compiled code for accessing an immediate array. And compiled code for accessing a struct object through a pointer is the same as compiled code for accessing an array through a pointer.

Array objects are no different from struct objects. (Referring to commonly applicable operations, of course. You can't index-access a struct in C). Yet, you don't claim that struct are also "pointers", do you?

Quote from: SiliconWizard on June 29, 2021, 05:44:47 pm

You can of course assign arrays to pointers for accessing them, and it makes no difference either.

It makes no difference cosmetically: the code looks the same. But it makes a huge difference semantically and at the level of generated code. As I said above, accessing an immediate array and accessing an array through a pointer will generate completely different code.

The difference will become much more obvious when we start considering multi-dimensional arrays. In fact, many people trapped in that misguided belief that "arrays are no different from pointers" get their first rude-awakening-style experience when they start working with multi-dimensional arrays.

For example

Code: [Select]

int a[2][2] = { 0 };
int *const b[2] = { (int [2]) { 0 }, (int [2]) { 0 } };

a[0][0] = 42;
b[0][0] = 42;

In this little example above we have two different implementations of two-dimensional arrays `a` and `b`. One is a "classic" built-in 2D array. Another is a hand-made pointer-based "jagged" array. In order to build the latter we "assign arrays to pointers", as you said above. Yet, even though array access looks the same for both arrays (i.e. we just use `[k][m]` with both) this is purely a superficial similarity. The semantics of these two data structures are completely different and completely incompatible.
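To make the difference between the two layouts concrete, here is a rough sketch, assuming C11. It uses named row arrays (`row0`, `row1`, hypothetical names) instead of the compound literals from the example above, so that everything can live at file scope.

```c
#include <assert.h>
#include <stddef.h>

/* Built-in 2D array: one contiguous object of 2 * 2 ints, no pointers stored. */
static int a[2][2];

/* Hand-made "jagged" array: an array of two pointers, each aimed at a
   separate row object. */
static int row0[2], row1[2];
static int *const b[2] = { row0, row1 };

static_assert(sizeof a == 4 * sizeof(int), "a holds the ints themselves");
static_assert(sizeof b == 2 * sizeof(int *), "b holds two pointers");

/* Rows of `a` are adjacent in memory: row 1 starts right after row 0. */
int rows_of_a_are_adjacent(void)
{
    return (char *)a[1] - (char *)a[0] == (ptrdiff_t)sizeof a[0];
}

/* Same syntax, different work: a[i][j] is address arithmetic inside one
   object, while b[i][j] first reads the stored pointer b[i] and then
   indexes through it. */
int same_syntax_both_work(void)
{
    a[1][1] = 42;
    b[1][1] = 42;
    return a[1][1] == 42 && row1[1] == 42;
}
```

Nothing guarantees that `row0` and `row1` are adjacent, which is one reason a function expecting `int (*)[2]` cannot be handed `b`.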

Quote from: SiliconWizard on June 29, 2021, 05:44:47 pm

Or possibly you're being overly pedantic to the point of expressing worthless points that can only confuse people instead of helping them.

I'm a better judge of what helps people better in this matter.

Quote from: SiliconWizard on June 29, 2021, 05:44:47 pm

To make things very clear, I'm strictly talking about C here. C++ is completely different.

When it comes to the matter of raw built-in arrays? No, C++ is not different at all, as long as we are talking about meaningfully comparable semantics.

Quote from: SiliconWizard on June 29, 2021, 05:44:47 pm

In C++ you can even overload the [] operator, so that's a completely different territory.

No, you can't overload `[]` operator for a built-in type. So, I don't see why you are even mentioning it here. No need to bring C++ into the picture.

Quote from: SiliconWizard on June 29, 2021, 05:44:47 pm

Arrays in C are just effectively allocated memory buffers, whether statically or on the stack.

All objects in C are just memory buffers (see my reference to structs above, for a specific example), with the exception of bit-fields. So, I don't see what point you are trying to make by this.

Yes, array `T [N]` in C is just a contiguous block of memory of size `N * sizeof(T)`. That's all. No pointers here.

Quote from: SiliconWizard on June 29, 2021, 05:44:47 pm

Accessing an array in my world (maybe yours is in a different elevated sphere) means: accessing data within it. Which works exactly the same as with any other memory buffer.

I agree. And as I said above, accessing an immediate array is no different from accessing an immediate struct object. Yet, this is no reason to claim that struct objects are "just pointers".

Quote from: SiliconWizard on June 29, 2021, 05:44:47 pm

Of course an array identifier can't be an l-value, just like a constant pointer. Just like functions.

Um... What??? This is patently false.

Arrays in C are lvalues. Just like constant pointers are lvalues:

Code: [Select]

int *const a = 0; /* `a` is a constant pointer, and an lvalue */
int b[100]; /* `b` is an lvalue */

It is possible to create a non-lvalue array in C, but that would require some deliberate jumping through some hoops.

So, what are you talking about?

(Functions in C are not lvalues though - you got that right)
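One way to jump through those hoops, as far as I can tell, is to return a struct containing an array from a function. The following is a minimal sketch assuming C11 semantics (temporary lifetime for the returned object); `struct wrapped`, `make_wrapped` and `read_middle` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical wrapper type: a struct whose only member is an array. */
struct wrapped { int data[3]; };

/* Returns the struct by value; the call expression is not an lvalue. */
struct wrapped make_wrapped(void)
{
    struct wrapped w = { { 10, 20, 30 } };
    return w;
}

/* make_wrapped().data is an array that is not an lvalue. Since C11 the
   returned object has temporary lifetime until the end of the full
   expression, so reading an element from it is fine. */
int read_middle(void)
{
    int v = make_wrapped().data[1];
    assert(v == 20);
    return v;
}
```

By contrast, `&make_wrapped().data` should be rejected by a conforming compiler, because unary `&` needs an lvalue operand.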

SiliconWizard · « **Reply #147 on:** June 30, 2021, 06:43:22 pm »

Quote from: gf on June 29, 2021, 06:29:16 pm

Quote from: SiliconWizard on June 29, 2021, 05:44:47 pm
Of course an array identifier can't be an l-value, just like a constant pointer.

It is still an lvalue, but in the context of most (but not all) expressions it is implicitly converted to a (non-lvalue) pointer to the first element.
https://en.cppreference.com/w/c/language/array, section "Array to pointer conversion"

You can't assign anything to an array identifier, which is exactly what I meant.
I'm not sure if we are talking about the same thing or whatever else.

If you convert an array identifier to some other type, obviously you can do weird stuff with it. But you still can't modify the *value* of the identifier itself. It will still refer to the same area in memory.
Again, just like a declared function.

like if you declare:
int array[10];
there is no way you can do: array = xxx; at any point. Which is the same as with const pointers, which I mentioned. And, do not confuse const pointers with pointers to const... (which can be confusing syntax-wise in C, btw.)

Just like if you declare a function:
int foo() { ... }
there is no way you can do: foo = xxx; at any point.

If you assign an array identifier to a pointer, then obviously the pointer can be anything as long as the conversion is legit and/or you use a proper cast.

I don't get why you'd mention array to pointer conversion to reply to my statement. An array identifier itself can't be used as an l-value. If you convert it to anything else, then it's not an array identifier anymore. That's all I meant.

Like what I've read above, it looks like a couple of you confuse "arrays" with accessing their content. I can't see how a C "array" itself could ever be an l-value, if by "array" you consider the array identifier itself, and not expressions accessing its contents.
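For readers tripped up by the const pointer versus pointer-to-const syntax mentioned above, here is a small sketch. The variable names are made up; the commented-out lines are the ones a compiler is expected to reject.

```c
#include <assert.h>

static int x = 1, y = 2;

int const_placement_demo(void)
{
    int *const fixed = &x;      /* const pointer: it cannot be re-aimed     */
    *fixed = 10;                /* but the int it points at can change      */
    /* fixed = &y; */           /* would not compile: fixed itself is const */

    const int *view = &x;       /* pointer to const: it can be re-aimed     */
    view = &y;                  /* fine                                     */
    /* *view = 20; */           /* would not compile: *view is a const int  */

    int arr[3] = { 0 };
    arr[0] = 5;                 /* elements are assignable                  */
    /* arr = &x; */             /* would not compile: no array assignment   */

    assert(*fixed == 10 && *view == 2 && arr[0] == 5);
    return *fixed + *view;
}
```

Reading declarations right to left helps: `int *const fixed` is "fixed is a const pointer to int", while `const int *view` is "view is a pointer to const int".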

TheCalligrapher · « **Reply #148 on:** June 30, 2021, 07:04:51 pm »

Quote from: SiliconWizard on June 30, 2021, 06:43:22 pm

You can't assign anything to an array identifier, which is exactly what I meant.

If that's what you meant, then you certainly expressed yourself rather... ambiguously. The property of being (or not being) an lvalue has nothing to do with the ability (or inability) to serve as the recipient side of an assignment.

Quote from: SiliconWizard on June 30, 2021, 06:43:22 pm

If you convert an array identifier to some other type, obviously you can do weird stuff with it. But you still can't modify the *value* of the identifier itself. It will still refer to the same area in memory.

Of course. Every (!) object, every variable in C is permanently attached to a specific location in memory for its entire lifetime. You can't "move" it to another location in memory. It is just how the language-wide object model in C is defined. No reason to bring it up in the context of arrays. Arrays are in no way special in that regard.

Quote from: SiliconWizard on June 30, 2021, 06:43:22 pm

I don't get why you'd mention array to pointer conversion to reply to my statement. An array identifier itself can't be used as an l-value.

You are abusing the terminology again. In the C language the property of "being an lvalue" is approximately equivalent to "data that can serve as a valid argument for unary &". (This is not a strict equivalence, but still a fairly precise rule of thumb). If a piece of data has a location in memory, then it is an lvalue:

Code: [Select]

int a[10];
&a; /* a valid expression, which immediately proves that an array is an lvalue */

This is why the term "lvalue" is often expanded as "locator value". Yes, we all know how this term was originally coined, but the truth is that its anecdotal origin never had any relation to its actual meaning. So why you are trying to use that term in a meaning it never had is not clear to me. And you are the one who accused me of confusing people...
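The gap between "is an lvalue" and "can be assigned to" is what the C standard calls a modifiable lvalue. A short sketch of that distinction follows; the names are invented for illustration, and the commented-out lines are the ones expected to be rejected.

```c
#include <assert.h>

int lvalue_not_assignable_demo(void)
{
    const int limit = 7;        /* an lvalue, but not a modifiable one      */
    int a[10] = { 0 };          /* an lvalue of array type, likewise        */

    const int *pl = &limit;     /* unary & works: limit designates an object */
    int (*pa)[10] = &a;         /* &a is a pointer to the whole array       */

    /* limit = 8; */            /* rejected: needs a modifiable lvalue      */
    /* a = a; */                /* rejected for the same reason             */
    /* &(limit + 1); */         /* rejected: limit + 1 is not an lvalue     */

    (*pa)[3] = *pl;             /* element access through the array pointer */
    assert(a[3] == 7);
    return a[3];
}
```

The rule of thumb holds here: everything that accepted `&` has a location, whether or not it may appear on the left of `=`.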

Quote from: SiliconWizard on June 30, 2021, 06:43:22 pm

Just like if you declare a function:
int foo() { ... }
there is no way you can do: foo = xxx; at any point.

As a side note: the decision to consider (or not) functions as lvalues is a purely administrative one, a matter of decree. You can decree it either way. E.g. in C++ functions are considered lvalues, even though in C++ you can't assign anything to a function either.

Nominal Animal · « **Reply #149 on:** June 30, 2021, 07:49:40 pm »

Not directed at anyone in particular, intended to be read as a help guide or study point:

Consider the value of the C expression
(buffer == (char *)(&buffer))
given
char buffer[1];

If you do not believe the expression is a compile-time constant with value 1 –– remember, this is C and not C++ ––, your belief is incorrect. I suggest revision and practical experimentation to fix.

If you have problems seeing how the expression is a constant with value 1, your understanding of C is incomplete. I suggest further study.

If you don't have the terminology to put in a single paragraph with two or three sentences exactly why the expression is a constant with value 1, your terminology is incorrect or incomplete. I suggest redefining the terms and idioms you use internally to describe C structures and rules, until you can.

If you think you need to consult the C standard to prove why the above is not a constant with value 1, I recommend switching to a different programming language, or perhaps another career. The purpose of the C standard is to describe existing behaviour developers should be able to rely on, according to the very cover sheet of the standard. Proving why existing, intuitive, correctly working as expected code should not work because of an esoteric strict reading of the standard, has nothing to do with programming, and everything to do with linguistics (of human language) and language-lawyerism.
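For anyone who wants to do the practical experimentation suggested above, a minimal sketch could look like this (assuming C11; `wide` and the function names are made up). It only checks the run-time value and the types involved, not how a compiler classifies the expression.

```c
#include <assert.h>

/* Returns the value of the expression from the post for a local buffer. */
int buffer_expression(void)
{
    char buffer[1];

    /* buffer   -> converted to char *, pointing at the first element
       &buffer  -> char (*)[1], pointing at the whole array
       Both designate the same starting byte, so after the cast they
       compare equal. */
    return buffer == (char *)(&buffer);
}

/* The two pointers still differ in type, which shows once the array
   is larger than one element. */
int steps_differ(void)
{
    char wide[8];
    static_assert(sizeof *wide == 1, "element step is one char");
    static_assert(sizeof *&wide == 8, "whole-array step is eight chars");
    return wide == (char *)(&wide);
}
```

Same address, different pointer types: that is why the cast is needed before the comparison.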

TheCalligrapher · « **Reply #150 on:** June 30, 2021, 08:42:56 pm »

Quote from: Nominal Animal on June 30, 2021, 07:49:40 pm

If you think you need to consult the C standard to prove why the above is not a constant with value 1, I recommend switching to a different programming language, or perhaps another career. The purpose of the C standard is to describe existing behaviour developers should be able to rely on, according to the very cover sheet of the standard. Proving why existing, intuitive, correctly working as expected code should not work because of an esoteric strict reading of the standard, has nothing to do with programming, and everything to do with linguistics (of human language) and language-lawyerism.

While I understand the point you are trying to make, I don't see why you decided to abuse the language terminology while doing so.

In the general case the above expression is not a constant at all (regardless of its value). And if you don't want to dive into the peculiarities of the C standard, I can instead demonstrate it with a simple practical example:

Code: [Select]

char static_buffer[1];
int a[static_buffer == (char *)(&static_buffer)];

int main()
{
  char local_buffer[1];
  int b[local_buffer == (char *)(&local_buffer)] = { 0 };
}

In the above code sample the declarations of `a` and `b` are invalid, i.e. they will fail to compile in many real-life C compilers. And they will fail to compile specifically (!) because in C your `(buffer == (char *)(&buffer))` is in the general case not a constant at all, not a constant expression.

http://coliru.stacked-crooked.com/a/24d47b2b9caf797e
http://coliru.stacked-crooked.com/a/9c6591575b93393d

Again, while I understand the point you are trying to make, your remarks along the lines of "If you think you need to consult the C standard..." clearly and unambiguously indicate that it is you that would be better off "switching to a different programming language". I hope you simply misspoke.

This kind of abuse of terminology is the underlying reason for 99 misunderstandings out of 100. Most of the unnecessary internet noise grows out of such roots.

DiTBho · « **Reply #151 on:** June 30, 2021, 08:58:11 pm »

@TheCalligrapher
are you a C-compiler writer? if so, for which architecture, and based on what? LCC-v3? LCC-v4? lib-llvm/clang?

Nominal Animal · « **Reply #152 on:** June 30, 2021, 10:01:28 pm »

Quote from: TheCalligrapher on June 30, 2021, 08:42:56 pm

In the general case the above expression is not a constant at all (regardless of its value).

If that is exactly true, give me one example case where it evaluates to anything but 1. I'll wait right here; don't you dare just brush this off.

The fact that compilers have a difficult time recognizing that fact, and instead treat the arrays as VLAs, and therefore interpret the code to be in error, is a separate issue; compilers are stupid, but we work with them because we have nothing better. And I know at least half a dozen expressions that do that to a compiler; so what?

(Edited to add: Yes, I fully expect you to respond with something as inane as "Even if the value of an expression is always 1 and can never be anything but 1 in C, that does not make the expression a constant". From the point of view of someone using C to write actual real world programs, that statement is illogical and insane. I prefer sanity and the real world over the world of language lawyers and theoretical abstract state machines every single day. The only concession I am willing to make to a C compiler writer is, if they just come out and say that "okay, the expression is constant, but we don't really have a good way to make the compiler see it, so at least for now, we cannot treat it as a compile time constant in all cases, especially those calculating array sizes. It has to do with the phase at which the constant-ness of the expression is detected, you see", then I nod and shrug and deal with it. Real world and all.)

TheCalligrapher · « **Reply #153 on:** June 30, 2021, 10:19:07 pm »

Quote from: Nominal Animal on June 30, 2021, 10:01:28 pm

If that is exactly true, give me one example case where it evaluates to anything but 1. I'll wait right here; don't you dare just brush this off.

It cannot. The value of that expression is always 1. But from the language point of view it is neither a "constant", nor a "constant expression".

Quote from: Nominal Animal on June 30, 2021, 10:01:28 pm

The fact that compilers have a difficult time recognizing that fact, and instead treat the arrays as VLAs, and therefore interpret the code to be in error, is a separate issue; compilers are stupid, but we work with them because we have nothing better.

Compilers just follow the language specification. It is the language specification that is supposedly "stupid", by your logic. But the reality is that the language specification often has good (if non-obvious) reasons to make the decisions it makes. Especially in situations where address comparisons are involved.

C language does not even recognize `const int a = 42;` as a constant. This might be called "stupid", but that's just a historical peculiarity.
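In practice this is why C code tends to use enumeration constants or macros where a constant expression is required. A small sketch, assuming C11; the names are invented, and note that some compilers fold a `const int` in these positions as an extension, so the commented-out lines may only draw a diagnostic in strict mode.

```c
#include <assert.h>

const int limit = 42;          /* read-only object, not a constant expression in C */
enum { LIMIT = 42 };           /* enumeration constant: an integer constant expression */
#define LIMIT_MACRO 42         /* macro expanding to an integer constant */

/* int bad[limit]; */          /* standard C requires a constant expression here */
int ok_enum[LIMIT];
int ok_macro[LIMIT_MACRO];

static_assert(sizeof ok_enum == 42 * sizeof(int), "enum constant sized the array");
static_assert(sizeof ok_macro == sizeof ok_enum, "macro gives the same size");

/* Case labels also need integer constant expressions. */
int is_limit(int v)
{
    switch (v) {
    case LIMIT:                /* `case limit:` is not valid standard C */
        return 1;
    default:
        return 0;
    }
}
```

The `const` qualifier is a promise not to modify the object through that name; it does not turn the name into a compile-time constant.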

Nominal Animal · « **Reply #154 on:** June 30, 2021, 10:20:14 pm »

As an aside: If you ever use a C or C++ compiler to write complex freestanding code, for example to implement an operating system kernel, you do need to be fully prepared to deal with the developers of the compiler you use. Some of them will insist that just because the C standard says that this expression is undefined, although you have shown that the only sensible real-world usable code generated for that expression is this machine code, and it yields a crucial operation needed in practice, and here is the patch that makes it do so in a manner that is natural to the codebase, does not mean that the compiler should be changed. In fact, they will reject any such patches with extreme prejudice, because they simply believe they are servicing the C standard, and not the users of their product.

Let that sink in. It is true, and for example the GCC - Linux Kernel cases are well documented, and numerous.

This is annoying, but even very technically adept humans can be utterly, utterly stupid about the real world. No, they never change or learn; but they do sometimes get replaced when the number of complaints by the users reach high enough numbers (a number of core GCC devs got kicked out because of exactly this years ago).

We have to deal with them, because killing them on sight is a crime. Which is a pity.

Nominal Animal · « **Reply #155 on:** June 30, 2021, 10:32:18 pm »

Quote from: TheCalligrapher on June 30, 2021, 10:19:07 pm

Compilers just follow the language specification.

That's where we fundamentally disagree.

Compilers translate human-readable code to machine-executable code. They do so on behalf of their users. They are not envoys of the C standard.

And there is nothing on this Earth that will make you understand or accept that. Not even if I point out that even if the C standard was completely abolished and removed from all media today, it would not affect whether a single C compiler worked tomorrow or not. However, if all C compilers started suddenly producing code that technically follows the C standard, but is useless in the real world, they would be replaced with something sane faster than you can read your Holy Standard from end to end.

Quote from: TheCalligrapher on June 30, 2021, 10:19:07 pm

C language does not even recognize `const int a = 42;` as a constant.

Are you terminally stupid, or do you just skip what I've written? I've explained what const volatile int x means from the programmer perspective, and it has nothing to do with constancy, and everything to do with promises between the programmer and the compiler.

At this point, I'm dropping you into my ignore list, because I just cannot stand people who believe that doing X is okay because "the standard says so" or worse, because "the standard doesn't say that doing X is okay, so I must assume it is not okay", even if it is utterly idiotic and useless in the real world. I'm not interested in the make-believe La-La world where The Standard is the truth and the real world is just an irrelevant side note, because I live in the real world, and utterly reject the relevance of the standards except to the extent they are useful in the real world.

DiTBho · « **Reply #156 on:** June 30, 2021, 10:39:00 pm »

Quote from: Nominal Animal on June 30, 2021, 10:20:14 pm

As an aside: If you ever use a C or C++ compiler to write complex freestanding code, for example to implement an operating system kernel, you do need to be fully prepared to deal with the developers of the compiler you use [...]

Yup

Just to give a fresh example of what I am fighting against just right now, Linux kernel v3.4.39 doesn't compile with anything > gcc-v4, and if you look at the reasons ... you find it uses several inner functions that explicitly rely on how gcc-v4 handles some implementation details.

If you hack the compiler-check to force a build up with gcc-v5...v10, the kernel somehow compiles, but then it doesn't work correctly, it emits oops like if it was raining and does panic pretty immediately after the boot, whereas when compiled with gcc-v4.1.2, everything works as expected.

(edit: yes, it's for that bloody Allwinner H3 SoM ... it's really a horse kicking)

TheCalligrapher · « **Reply #157 on:** June 30, 2021, 11:14:30 pm »

Wow!

I think, folks, we are dealing with a classic case of either

1. An individual, who realizes that understanding the language specification (aka "The Holy Standard") is beyond their intellectual capacity, and subsequently assumes defensive/hostile/combative stance towards said standard. Basically, it is a case of "if I can't comprehend it, then it's stupid". This is a rather widespread behavioral pattern, a form of "sour grapes syndrome", which we see expressed quite often on the Net. Expressed towards mathematics, towards physics, and now towards language standards, apparently.

or

2. An individual gets so embogged in their own grassroot delusions and misconceptions about the language (and likely for a long time), that when they finally get exposed to the proper fundamental ideas and principles of that language, they assume defensive/hostile/combative stance towards said concepts. They choose to stick to their delusions because it is easier than making an effort to relearn. It is a form of "baby duck syndrome", apparently.

I don't see why the proper formal knowledge of the language should be seen as contradictory to its real-life usage. In fact, such perception usually indicates incompetence. We do use the C language in massive real-life projects, yet I've never seen a professional who'd have to perceive the language specification as some sort of impediment in that process. The latter would get laughed off the market rather quickly.

PlainName · « **Reply #158 on:** June 30, 2021, 11:19:29 pm »

Quote

Linux kernel v3.4.39 doesn't compile with anything > gcc-v4, and if you look at the reasons ... you find it uses several inner functions that explicitly rely on how gcc-v4 handles some implementation details.

Isn't that an excellent illustration that persuading the compiler writers to ignore the C standard and "do what's right in the real world" just leads to tears later on? If you need particular assembler, write in assembler rather than C.

Nominal Animal · « **Reply #159 on:** July 01, 2021, 02:01:51 am »

Quote from: dunkemhigh on June 30, 2021, 11:19:29 pm

Quote
Linux kernel v3.4.39 doesn't compile with anything > gcc-v4, and if you look at the reasons ... you find it uses several inner functions that explicitly rely on how gcc-v4 handles some implementation details.

Isn't that an excellent illustration that persuading the compiler writers to ignore the C standard and "do what's right in the real world" just leads to tears later on?

No; it shows you that not everything useful and necessary is captured by the C standard.

Even the GNU C library relies on GCC extensions and GCC-specific behaviour. Are you too seriously suggesting that instead of talking to the tool developers to see if everyone agrees a specific behaviour is useful, desired, and does not negatively affect other use cases, people should just consult The Oracle Of Correct Behaviour, preferably via a lawyer who gets to redefine all terms to mean what they want them to mean, regardless of their real-world usage?

Sounds pretty damn stupid and inefficient to me.

Development does not occur because the C standard writers invent it. It grows from the grassroots up. Features get added, because they are needed and useful. When enough people agree the features are useful and necessary, it gets codified into the standard. I'm liking Clang over GCC right now, exactly because they do this, and GCC is once again moving away from this.

Only idiots like Microsoft push stuff through standards first. How many of you actually use the Microsoft-specific C11 and later Annex K functions (bounds-checking with a _s suffix)? Or their OOXML formats? If you develop only for Windows, and believe no other OS actually matters; sure. Makes sense then, as your world is limited to that. None of my systems have those. Why? Because users did not ask for those; Microsoft pushed them into the standard by stuffing the committee (a well known and documented business tactic of theirs, as they need to stop C diverging from C++ so they can keep claiming their C++ compiler does C too, and at the time they were desperately looking for walls to stop leaking developers from their walled garden). Using the standard to steer where C is heading is not working, and will not work, because the users do what users do, not what the C standard tells them to do. And if you wonder why, you are much less smart than I gave you credit for.

Then again, it seems I am among the very small minority who believe programmers, as software engineers, should be as responsible for their work product as e.g. structural engineers are.

brucehoult · « **Reply #160 on:** July 01, 2021, 03:37:12 am »

Quote from: Nominal Animal on July 01, 2021, 02:01:51 am

Then again, it seems I am among the very small minority who believe programmers, as software engineers, should be as responsible for their work product as e.g. structural engineers are.

I also believe that.

I just no longer -- unlike my first couple of decades in the industry -- think bondage&discipline languages are the key to that, or even all that helpful.

The things that do make a difference are creating and using appropriate safe abstractions -- which can be done with structs, functions, and macros -- and test driven development. Garbage collection and appropriate exception handling techniques (*very* different to C++ and Java ones) allow the use of better abstractions.

A good engineer should be able to write perfectly safe code in assembly language.

However, you can do it faster and generate faster code by using an optimising C compiler. That's because of the automatic management of register allocation, minimising peak register usage in each function (and thus the number saved/restored) by reusing registers, and cleaning up code generated by macro-expansion or function inlining using compile-time evaluation, constant propagation, dead code elimination (including if/then/else with known condition).

A higher level language such as Swift or Lisp can further increase programmer productivity and program performance (from more efficient implementation of high level abstractions), but the additional gains are small and there is a big price to pay in loss of portability compared to C/C++ and lack of support for small or unusual machines.

Nominal Animal · « **Reply #161 on:** July 01, 2021, 05:38:52 am »

Quote from: brucehoult on July 01, 2021, 03:37:12 am

A higher level language such as Swift or Lisp can further increase programmer productivity and program performance (from more efficient implementation of high level abstractions)

Yes; and for those learning C, exposing oneself to different languages can help understand abstractions and how abstractions differ between programming languages.

Furthermore, there are situations where a single programming language is not the most efficient approach. (I often mention I like to write UIs in high-level interpreted languages like Python, because that way the UI is most malleable to end user modification, and I can still keep the heavy computational core in C or C++ or perhaps some other systems programming language.)

There are good reasons why so many games and applications incorporate domain-specific languages (from Lua to Lisp to Python), and it is not just "because it lets us use cheaper developers for the unimportant stuff"; sometimes it just makes new and worthwhile things possible. NPC logic is easier if you have abstractions to support behaviour creation, instead of lifting it directly from low-level arithmetic with few abstractions, like you'd do in C. The recent discussions on design software supporting arithmetic expressions, instead of just numerical constants, are another example: you can implement a simple numerical processor, but if you instead embed a scripting language, you suddenly make things parametric and programmable. Of course the key is whether users need these or find them useful; a feature nobody uses or needs is only a plus in the marketing wank. The need for another language is very difficult to see unless you have used one as a user in a similar situation and found its power first hand, or have enough experience using different languages to solve different problems that you have some grasp of the different abstractions they provide, and you notice your mind telling you "feature X of Y would be nice here". (Sometimes it is wrong, though; mine is, at least. It is not an oracle, just bubbles up ideas to test/check/verify.)

I myself use C for freestanding and systems programming. I can do full graphical UIs (and have used GTK+ for this) in C, but it is not a very good fit; I only do so if I am resource-constrained and higher-level abstractions cannot perform sufficiently well on a given hardware. Computational power growth in the last couple of decades means even the cheap SBCs have enough memory and CPU power so that just doesn't happen anymore, so I don't. Using a language just because it is the one you know is not really a sensible way to pick a language, unless you are doing it for learning purposes or fun more than any other long-term reason. Besides, if you use C++, you can use Qt even on top of a raw framebuffer on an embedded machine.

Quite a lot of high-performance computing (often written in C, or a subset of C++) nowadays embeds CUDA or OpenCL code. On the graphics side, we have HLSL and GLSL (Direct3D and OpenGL shader programming languages), and so on. Thus, sometimes an "embedded" or domain-specific language is a compiled, low-level one, too. There are counter-efforts like SYCL, that try to unify these so that a single compiler can handle all in the name of programming productivity, but I'm not convinced: there will always be cases where specialized, purpose-designed tools beat the generic ones, no matter how powerful its abstractions, for the very simple reason that some abstractions are contradictory so a single tool can never hold them all. I think of the history of PHP as an example why trying to do that – be everything and all things for everybody – backfires.

If someone tells you that language X is the only one you ever need; I suggest you think of it the same way you would if they had told you foodstuff Y is the only one you should ever eat. Even if they were technically correct, and I don't think they are or ever will be, it'd be rather dull and constricting.

I am following Rust with interest. The way its "borrow checker" operates when it constructs "pointers" to provide/enforce its safety guarantees, seem similar/compatible to the approaches I've used in C for a couple of decades, and described here and in the other pointer thread. No, I'm not saying that shows I'm smart; I'm only saying one reason I'm interested is because I can see useful common ground to build on. I do like the idea of people trying to create something better, to avoid pitfalls found by earlier effort, while reaching for better heights. But, I haven't an opinion on Rust overall yet, not even on things like adding Linux kernel support for writing (parts of) it in Rust. Could be good, could be irrelevant, could be bad; I don't know yet. Only interested.

Comparing to the C standard, well, I just haven't believed the standard writers have C programmers' interests anywhere near the top of their priorities for well over a decade now, because of the things they have concentrated on and pushed forward, and things they have completely failed to try and address, so I am slowly moving on. For the longest time, POSIX (at the systems programming level, which is what I've done most of) offset/overrode any issues I could have had with the C standard, but now that "almost-POSIX" environments like WSL are cropping up again, it too may be going the way of the Dodo, due to confusion and frustration engendered in the developers. Reality wins over theory and texts, but if reality becomes unreliable or chaotic, developers tend to move on. This is not fast, however; I'm talking about one or two decades here, not next year.

None of this should alarm a new programmer. Their experience will be different to mine; just take note, observe, check/verify, and decide for oneself.

DiTBho · « **Reply #162 on:** July 01, 2021, 08:25:24 am »

Quote from: Nominal Animal on July 01, 2021, 02:01:51 am

people should just consult The Oracle Of Correct Behaviour, preferably via a lawyer who gets to redefine all terms to mean what they want them to mean, regardless of their real-world usage?

That's precisely the attitude I see with theoretical researchers.

Basically their minds operate by trying to grab some good construct from the HyperUranium, but since their minds are limited they cannot physically grab anything, so they operate in the mathematical world, where their intellect can use math to reconstruct an imperfect copy (but hey, better than nothing) of what was partially seen in the HyperUranium ... that's their daily job, knowing it will be up to someone else to push their math models into the miserable prosaic of the world

Code: [Select]

HyperUranium ---> mathematical world ---> miserable prosaic of the world
(Plato's vision)

HyperUranium is a true concept of Plato expressed in the flesh. According to Plato, the Hyperuranium is the world beyond the celestial vault that has always existed, in which there are immutable and perfect ideas, reachable only by the intellect, and which is not tangible by earthly, corruptible bodies.

It is an evergreen vision as old as the Greek philosophy of the aforementioned Oracle.

Speaking of this, the modern view is ... only after the singularity will there be an AI capable of penetrating the HyperUranium ... which leads to the question ... if the AI was created by humans, and the human being is imperfect, how can a creature be superior to its creator? ... you see this question in movies like Prometheus, when David asks young Peter Weyland some questions, and is told in reply to serve a cup of tea.

peter-h · « **Reply #163 on:** July 21, 2021, 07:26:16 am »

I think the problem is that everybody here is a C expert, while those who are not aren't posting

I run a "tech" forum (not electronics) so I am well familiar with the psychology.

Most people I know who do C spent some time writing crap, because they didn't do a formal course.

And if you go back far enough, programming teaching was crap anyway. At univ, 1975-78, the 8080/6800/Z80 already existed, but were they teaching them? No. We did Pascal! An almost totally useless language for embedded work (IAR did a sort of usable Pascal compiler, for about 1000 quid, IIRC) due to its ridiculously strict typing. It survived in Delphi for many years though. And I even developed a user programmable protocol converter c. 1990 which had a Pascal compiler (from HiSoft - another long gone company) built-in, along with a Wordstar compatible editor

Most C coders learnt the hard way, and a lot of crap got written.

My exposure to C is occasional so I will never get really current in it, so I use it as a "simple language" and that way I avoid getting subtle bugs.

One of the issues with pointer arithmetic (rather than operating on an explicit array index, and bounds-limiting it) is that bugs are more likely to remain hidden because the program can appear to work but is actually crapping over some memory which you just happen to not be using at that moment. That was always a risk in assembler, of course (I wrote literally megabytes of that) but one was naturally more careful with that.

I also think asm background is useful for C because you tend to know where the skeletons are likely to be buried

brucehoult · « **Reply #164 on:** July 21, 2021, 08:23:14 am »

Quote from: peter-h on July 21, 2021, 07:26:16 am

And if you go back far enough, programming teaching was crap anyway. At univ, 1975-78, the 8080/6800/Z80 already existed, but were they teaching them? No. We did Pascal!

What did you use Pascal *on* in 1975-78? It would pretty much have had to be on a CDC 6000 or ICL 1900, and in the UK, I'd think. Or Switzerland, obviously.

UCSD Pascal started widespread use of Pascal, but wasn't available until late 1977 and probably not widely at first -- it became very popular on the Apple ][ starting in late 1979.

The university I went to was teaching FORTRAN to 1st year students until 1980. In 1981 (the year I started) we got Pascal on a PDP 11/34. At first we were using a compiler from the US NBS (National Bureau of Standards). It was rather buggy. Halfway through the year we switched to OMSI Pascal.

We learned assembly language programming and interfacing to hardware in 1982 using Rockwell AIM65 (6502) boards, though I taught myself PDP-11 assembly language/machine code the year before.

Quote

An almost totally useless language for embedded work due to its ridiculously strict typing.

Perhaps in the official standard.

I've never used a Pascal compiler that didn't have extensions making it effectively a different syntax for C and just fine for low level programming, if a bit more verbose. That includes Apple (UCSD) Pascal, OMSI Pascal (it had octal constants, bitwise operations, direct access to memory, inline assembly language), VAX Pascal, Turbo Pascal, THINK Pascal, MPW Pascal.

PlainName · « **Reply #165 on:** July 21, 2021, 08:54:46 am »

Quote

I also think asm background is useful for C because you tend to know where the skeletons are likely to be buried

I agree with pretty much all of your post, but this bit I think is quite important in a non-obvious way and applicable not just to programming. One doesn't need to be an expert in the lower (and higher) levels, but a working familiarity smooths things along.

brucehoult · « **Reply #166 on:** July 21, 2021, 09:30:34 am »

Quote from: dunkemhigh on July 21, 2021, 08:54:46 am

Quote
I also think asm background is useful for C because you tend to know where the skeletons are likely to be buried

I agree with pretty much all of your post, but this bit I think is quite important in a non-obvious way and applicable not just to programming. One doesn't need to be an expert in the lower (and higher) levels, but a working familiarity smooths things along.

Absolutely.

I always recommend that people should start by learning an assembly language before high level languages.

Just not x86.

peter-h · « **Reply #167 on:** July 21, 2021, 10:03:31 am »

Pascal and Fortran were on ICL1900S, punched cards

But in 1978 we got teletypes and you could feed in paper tape!

The unfortunate thing about the crappy univ course ("Electronics and Applied Sciences", Sussex Univ) was that IF you could program a micro anytime in the 1975-1985 timeframe, you could print your own money. We were paying £500/DAY to somebody, early 80s, writing a token ring LAN around a WD2840, for months, before I decided to abandon throwing money at him and did it all myself with a Z180+85C30 (SDLC, MILSTD1553 physical). People I knew who were good at embedded devt and working as product devt consultants were driving Ferraris (back when a Ferrari cost real money), but they had to be good at digital hw and sw and analog, which was always rare, and is even more rare today.

80x86 asm is horrible.

I don't think many used Pascal for actual products. It was all done in asm and/or C. I didn't touch C until 1994.

I still don't really get subtle pointer stuff

I find it is like the devt tools. You need a lot of currency to get good and stay good. I struggle with the POS called ST Cube IDE, and the crappy libraries where somebody was paid €10/line and put in error traps even in places where the code cannot possibly fail. I don't think any of this has got better. The silicon does a lot more; that's all. The software is as crap as ever.

SiliconWizard · « **Reply #168 on:** July 21, 2021, 03:59:31 pm »

Even at the time when C was still officially taught at universities as a major programming language, courses were already kind of crap IMO. I'm not as old as some others here I guess, but at uni, Pascal was still the main language taught for introductory courses at the time. That was in the early 90s. (But I had already learned it in high school on CP/M machines.) I think courses were much better than anything I've seen with C. Which is not surprising; Pascal was designed from the ground up for teaching purposes; C definitely wasn't.

Apart from the particular merits and characteristics of C, one point explaining this IMO is a matter of chronology. C wasn't standardized before 1989, and frankly, original C was not all that good for teaching, especially for teaching good programming practice. In the 90's and even 2000's, most teachers still knew C as original C. So C was a moving target and there was some confusion between original C and ANSI C. Then things moved on and universities switched to other languages, so time was up for teaching C. After that, C was mainly taught as a tool for EE students, not in proper programming language courses. So the situation is not surprising.

And, Pascal was still actually used in products in the 80s and 90s. Heck, it's still used today in the form of Delphi and derivatives. Although a small market, it's definitely alive.

Siwastaja · « **Reply #169 on:** July 21, 2021, 04:29:10 pm »

C is problematic for teaching because C doesn't have a clear scope which it tries to solve. It's kind of universal tool and the official specifications are not touching all practical aspects.

C was originally conceived as a high-level language to create applications (something like what we run on our desktop computers, or which run on servers), definitely not for what's now called "embedded", especially microcontroller. But nowadays, C has primarily become de facto embedded language, despite not being most optimal for that purpose.

In reality, C isn't optimal for anything. But it's good enough; or surprisingly good for many different uses which explains its popularity; but also explains why it's difficult to teach, because teachers want a clear, easily specified goal, not a messy bazaar where anything goes but you get what you needed. Instead, universities like cathedrals: a UML diagram specified over 5 years, then an implementation written in C++ or Java over the next 5 years, after which it is obvious to everyone that the resulting software does not work and no part of it can be practically reused, because everything is built on top of the object hierarchy.

Being able to efficiently use C in real world projects (where the aim is not sadomasochism, but getting the job actually done) also requires to understand that the standard is not the Holy Bible. This is unsuitable in the mental model of university teachers.

peter-h · « **Reply #170 on:** July 21, 2021, 05:25:41 pm »

The main "problem" with C, but also what makes it so powerful for embedded work, is that you need a clear understanding of what it is doing underneath i.e. the machine, addressing, word sizes, etc. It is thus like assembler but a lot more productive.

If one doesn't understand the hardware in great detail, one just writes something like ch=getc(), not realising this is a blocking function and will hang for ever if it is a serial port and nothing comes in. That's probably why you can't teach C generally; anybody who can use it well could code the job in assembler, given enough time.

SiliconWizard · « **Reply #171 on:** July 21, 2021, 05:38:51 pm »

Quote from: Siwastaja on July 21, 2021, 04:29:10 pm

C is problematic for teaching because C doesn't have a clear scope which it tries to solve. It's kind of universal tool and the official specifications are not touching all practical aspects.

Yes and no. I only partly agree with this. As a general-purpose language, of course it doesn't have a clear scope. Isn't that what general-purpose is all about? That can be said about many other programming languages.

As I said, I personally think that the best languages for teaching programming are those that are specifically designed for this task. In CS teaching history, if you think about it, the languages that were most successful for teaching all were designed in universities, often by professors and their teams. OTOH, those languages were often not quite fit for "industrial use", so to speak. Conversely, languages designed in industrial settings have been a better fit for real-world use, but poorer for teaching. Yes, even Java - I'm not very fond of it, and I'm not too convinced it's all that good for teaching. Sure there are worse alternatives out there. Python is definitely one IMO.

Quote from: Siwastaja on July 21, 2021, 04:29:10 pm

C was originally conceived as a high-level language to create applications (something like what we run on our desktop computers, or which run on servers), definitely not for what's now called "embedded", especially microcontroller. But nowadays, C has primarily become de facto embedded language, despite not being most optimal for that purpose.

I'm not sure I completely agree with this. It was clearly designed to be low-level enough for implementing low-level parts of an OS, and at the same time be high-level enough to be fit for writing complex applications. It addressed both right from its conception. You can argue that it ended up not being particularly good for either, but it's simple enough and it "works".

Quote from: Siwastaja on July 21, 2021, 04:29:10 pm

Being able to efficiently use C in real world projects (where the aim is not sadomasochism, but getting the job actually done) also requires to understand that the standard is not the Holy Bible. This is unsuitable in the mental model of university teachers.

I'm not sure what you mean exactly. The standard is, IMHO, actually one of C's strong points. Of course reading it is clearly not enough for using C right and efficiently, but IMHO it doesn't hinder much either. As is often discussed here, not following the standard should be usually done with a lot of caution, sparingly and should be the exception. I'd much rather have young engineers taught perfectly standard C properly, and then teach them some possible deviations and compiler extensions in the field, than engineers taught weird non-compliant stuff that are likely to get them more confused than anything and later on have to teach them the standard to get back on track, which unfortunately is the most common option I've seen, by far.

Now again there certainly are languages that are much better for both teaching and everyday development. C is popular though because in the end it seems to have made the "right" compromises for real-world use and wide acceptance.

hamster_nz · « **Reply #172 on:** July 22, 2021, 03:10:42 am »

Quote from: peter-h on July 21, 2021, 05:25:41 pm

The main "problem" with C, but also what makes it so powerful for embedded work, is that you need a clear understanding of what it is doing underneath i.e. the machine, addressing, word sizes, etc. It is thus like assembler but a lot more productive.

The main "problem" with C is that the C programmer needs two key skills:

- Solve the programming problem that they have at hand.

- Keep the execution environment healthy (i.e. keep track of resources, be sure to only address things that you should, avoid undefined behavior, detect and handle errors appropriately)

Most learning resources seem to think that the second skill is not required, or can be treated as "on the job" learning. Heck, even the man pages for "fopen()" don't advise you that you should call "fclose()". But keeping the execution environment healthy is the most important thing.

For example the 'man' page for fopen() should explicitly state "all valid file pointers returned from fopen() should be closed by calling fclose() when the file is no longer needed."

Have a look at https://www.geeksforgeeks.org/c-fopen-function-with-examples/ - sure it uses fclose() in the example, but no mention of why it is vital to close files. FWIW it doesn't even detect if fopen() fails either.

Here's code from http://faculty.winthrop.edu/dannellys/csci325/02_c_IO.htm - which I assume is part of "CSCI 325 - File Structures (3).":

Code: [Select]

/**********************************************
Demo Three - Use C to copy file1 to file2
*********************************************/
#include <stdlib.h>
#include <stdio.h>

/******************************************************/

int main ()
{
   FILE *infile, *outfile;
   char string[20];

   /***** open input and output files *****/
   infile = fopen ("file1", "r");
   if (infile == NULL)
     {
      fprintf(stderr, "Error opening input file\n\n");
      exit (1);
     }
   outfile = fopen ("file2", "w");
   if (outfile == NULL)
     {
      fprintf(stderr, "Error opening output file\n\n");
      exit (1);
     }

   /***** copy everything  ****/
   while (!feof(infile))
     {
      fscanf  (infile,  "%s", string);
      fprintf (outfile, "%s ", string);
     }
}

When people are learning with these piles of crap, is it any surprise we get such bad results?

brucehoult · « **Reply #173 on:** July 22, 2021, 04:00:33 am »

Quote from: SiliconWizard on July 21, 2021, 05:38:51 pm

As I said, I personally think that the best languages for teaching programming are those that are specifically designed for this task. In CS teaching history, if you think about it, the languages that were most successful for teaching all were designed in universities, often by professors and their teams. OTOH, those languages were often not quite fit for "industrial use", so to speak. Conversely, languages designed in industrial settings have been a better fit for real-world use, but poorer for teaching. Yes, even Java - I'm not very fond of it, and I'm not too convinced it's all that good for teaching. Sure there are worse alternatives out there. Python is definitely one IMO.

Python has the very great advantage over Java of being able to write a simple program e.g. HelloWorld in a single line of code, without ten lines of boilerplate code that won't be and can't be explained until you know much more about the language. C is worse than Python in that, but a lot better than Java.

I have three major beefs with Python:

- I will never accept significant whitespace.

- having no type declarations at all is bad. Even if you have dynamic typing -- i.e. you can accept objects of different types -- it's good to be able to declare what messages you will be sending to the objects and expect them to be able to respond to. It's documentation of intent for yourself, later. Or others, of course.

- the language is not orthogonal. Scheme has all the same advantages as Python (except popularity), but you can combine Scheme features in arbitrary ways and it works. Dylan and Julia have the advantages of Scheme, but with optional type annotations.

My attributes for a good language to learn programming:

1) trivial programs are small

2) it's easy to learn essentially all of the language, so you spend your time learning to program, not learning the language.

3) it's easy to form a correct mental model of what is happening when the program runs

4) it's easy to know what you CAN'T write. If something seems to make syntactic sense then it should be permitted and make semantic sense.

5) a good facility for abstraction, so you can create features that could have been built into the language, but weren't

6) when you create new abstractions they look but especially PERFORM as if they had been built into the language.

Obviously there are other desirable characteristics of the programming environment, but I'm concentrating on the language here.

peter-h · « **Reply #174 on:** July 22, 2021, 05:43:46 pm »

I was never a fan of "teaching languages" because you basically leave university having learnt little that's actually useful anyway, and not learning useful "computing stuff" makes it even worse.

I designed a function generator which drove a 256 byte fusible link PROM from an 8-bit sync counter and fed its output into an 8-bit DAC (state of the art stuff in 1978!) and I needed to generate the binary data to program into the PROM (setting up every byte with toggle switches on the programmer!). So, to generate a sinewave, I wanted a program which would print out sin(x), with x going 0-255 representing 0-360deg, and the value going 0-255. In Pascal. Impossible! Nobody knew how to write it. I did it in Fortran...

brucehoult · « **Reply #175 on:** July 22, 2021, 11:41:58 pm »

Quote from: peter-h on July 22, 2021, 05:43:46 pm

I was never a fan of "teaching languages" because you basically leave university having learnt little that's actually useful anyway, and not learning useful "computing stuff" makes it even worse.

I designed a function generator which drove a 256 byte fusible link PROM from an 8-bit sync counter and fed its output into an 8-bit DAC (state of the art stuff in 1978!) and I needed to generate the binary data to program into the PROM (setting up every byte with toggle switches on the programmer!). So, to generate a sinewave, I wanted a program which would print out sin(x), with x going 0-255 representing 0-360deg, and the value going 0-255. In Pascal. Impossible! Nobody knew how to write it. I did it in Fortran...

I have no idea why you think that would be impossible or even difficult in Pascal in 1978. It would be trivial in BASIC on a TRS-80, or Apple ][, or Commodore Pet in 1978, and no more difficult in Pascal.

SiliconWizard · « **Reply #176 on:** July 22, 2021, 11:45:12 pm »

Quote from: brucehoult on July 22, 2021, 11:41:58 pm

Quote from: peter-h on July 22, 2021, 05:43:46 pm
I was never a fan of "teaching languages" because you basically leave university having learnt little that's actually useful anyway, and not learning useful "computing stuff" makes it even worse.

I designed a function generator which drove a 256 byte fusible link PROM from an 8-bit sync counter and fed its output into an 8-bit DAC (state of the art stuff in 1978!) and I needed to generate the binary data to program into the PROM (setting up every byte with toggle switches on the programmer!). So, to generate a sinewave, I wanted a program which would print out sin(x), with x going 0-255 representing 0-360deg, and the value going 0-255. In Pascal. Impossible! Nobody knew how to write it. I did it in Fortran...

I have no idea why you think that would be impossible or even difficult in Pascal in 1978. It would be trivial in BASIC on a TRS-80, or Apple ][, or Commodore Pet in 1978, and no more difficult in Pascal.

Ditto?

DiTBho · « **Reply #177 on:** July 23, 2021, 12:02:47 am »

MicroPascal is used in embedded

brucehoult · « **Reply #178 on:** July 23, 2021, 03:39:25 am »

Quote from: brucehoult on July 22, 2021, 11:41:58 pm

Quote from: peter-h on July 22, 2021, 05:43:46 pm
I was never a fan of "teaching languages" because you basically leave university having learnt little that's actually useful anyway, and not learning useful "computing stuff" makes it even worse.

I designed a function generator which drove a 256 byte fusible link PROM from an 8-bit sync counter and fed its output into an 8-bit DAC (state of the art stuff in 1978!) and I needed to generate the binary data to program into the PROM (setting up every byte with toggle switches on the programmer!). So, to generate a sinewave, I wanted a program which would print out sin(x), with x going 0-255 representing 0-360deg, and the value going 0-255. In Pascal. Impossible! Nobody knew how to write it. I did it in Fortran...

I have no idea why you think that would be impossible or even difficult in Pascal in 1978. It would be trivial in BASIC on a TRS-80, or Apple ][, or Commodore Pet in 1978, and no more difficult in Pascal.

See for example page 23 of the 1976 Pascal classic "Algorithms + Data Structures = Programs", where we see real variables used with sqrt, sin, cos.

https://doc.lagout.org/science/0_Computer%20Science/2_Algorithms/Algorithms%20%20%20Data%20Structures%20%3D%20Programs%20%5BWirth%201976-02%5D.pdf

Plus of course you can convert real to integer, do div and mod on integers to get hex digits, chr() to convert ASCII codes to char etc etc. I don't see any obstacle to doing trig calculations and printing the results as 0..255 or 00..FF or in binary if you want. In Pascal circa 1972-1976.

hamster_nz · « **Reply #179 on:** July 23, 2021, 05:12:38 am »

Quote from: brucehoult on July 23, 2021, 03:39:25 am

I don't see any obstacle to doing trig calculations and printing the results as 0..255 or 00..FF or in binary if you want. In Pascal circa 1972-1976.

Maybe the documentation was lacking.

We quickly forget how hard it was to find good documentation for dev tools in the early years, and what a revelation/revolution "man" pages were.

One of my first holiday jobs was entering all of a bank's IBM publications into a mainframe database, and then updating manuals in 3-ring binders with corrected pages - and that was the mid 80s!

SiliconWizard · « **Reply #180 on:** July 23, 2021, 06:05:46 pm »

Quote from: brucehoult on July 22, 2021, 04:00:33 am

Quote from: SiliconWizard on July 21, 2021, 05:38:51 pm
As I said, I personally think that the best languages for teaching programming are those that are specifically designed for this task. In CS teaching history, if you think about it, the languages that were most successful for teaching were all designed in universities, often by professors and their teams. OTOH, those languages were often not quite fit for "industrial use", so to speak. Conversely, languages designed in industrial settings have been a better fit for real-world use, but poorer for teaching. Yes, even Java - I'm not very fond of it, and I'm not too convinced it's all that good for teaching. Sure there are worse alternatives out there. Python is definitely one IMO.

Python has the very great advantage over Java of being able to write a simple program e.g. HelloWorld in a single line of code, without ten lines of boilerplate code that won't be and can't be explained until you know much more about the language. C is worse than Python in that, but a lot better than Java.

Well, frankly, if all it has going for it is that it takes a couple fewer lines to write a Hello World program, huh...

Sure some languages just take a one-liner for this. Basic was one of them. That doesn't make them particularly good languages as such. I understand that it makes the very first hours of learning programming, for someone who has never been exposed to it, easier.

Otherwise, I agree with your other points about Python.
I personally sum up my view of it like so: Python is a great tool, but very poor as a programming language. Teaching it as a now widely-used tool is fine. Using it for teaching the fundamentals of programming and good programming practices is uh... not.

