EEVblog Electronics Community Forum

Products => Computers => Programming => Topic started by: Dmeads on June 14, 2020, 08:15:50 am

Title: What does the "&" do in my fscanf statement in C
Post by: Dmeads on June 14, 2020, 08:15:50 am
Hullo!

I thought the "&" operator (when dealing with pointers) just returns the address, so why do I need it to put a value from my file into an array?

Thanks.

-Dom

Code: [Select]
#include <stdio.h>   // for files
#include <math.h>
#include <stdlib.h>  // for rand()



int main()
{
    float my_array[8] = {};
    float max = 2147483647;  // RAND_MAX
    float read_array[8] = {};  // to put the read values into.
   
    FILE * fp;
   
    fp = fopen("MyRandomNumbers.txt","w+");  // open/create file (w+ means read and write, truncating any existing file)
   
    for(int i = 0; i < 8; i++)
    {
        my_array[i] = rand() / max;  // create normalized random number
        fprintf(fp,"%f\n",my_array[i]);  // print a random number to the file
    }
   
    rewind(fp);  // reset position to beginning of file
   
    for(int i = 0; i < 8; i++)
    {
        fscanf(fp,"%f", &read_array[i]);  // read each line of file and store in array
        printf("line %d of the file holds the number %f\n", i, read_array[i]);  // print the values
    }
   
   
    fclose(fp);  // close file
   
    return 0;
}
Title: Re: What does the "&" do in my fscanf statement in C
Post by: greenpossum on June 14, 2020, 08:21:34 am
When you index the array it's no longer an address so you need the &.
Title: Re: What does the "&" do in my fscanf statement in C
Post by: T3sl4co1l on June 14, 2020, 11:41:32 am
Indeed, array access foo[bar] is syntactic sugar for *(foo + bar).  (Note that when C performs arithmetic on pointers, it's done in terms of elements of the defined type, in this case float, which is most likely 4 bytes on the physical platform.  The physical address is fully (uint8_t*)foo + sizeof(foo) * bar.)

In effect, & undoes *.  Heh, hmm, I've never actually written &*&*&*&*&*&foo[bar] before, but it should work.

Likewise, you could write fscanf(fp,"%f", read_array + i);, which should work just as well.
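
To make that concrete, here is a minimal sketch. The helper names same_element_address and parse_into are made up for illustration, and sscanf stands in for fscanf so that no file is needed. It shows that &a[i] and a + i are the same pointer, so either can be handed to a scanf-family function.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 when &a[i] and a + i name the same
   element. This holds for any valid index because a[i] is *(a + i). */
int same_element_address(float *a, size_t i)
{
    float *via_index   = &a[i];   /* address-of applied to the element */
    float *via_pointer = a + i;   /* pointer arithmetic, no & needed   */
    return via_index == via_pointer;
}

/* Hypothetical helper: parses one float from text into a[i], passing
   the destination as a + i instead of &a[i]. Returns the number of
   items converted (1 on success), like the scanf family does. */
int parse_into(const char *text, float *a, size_t i)
{
    return sscanf(text, "%f", a + i);
}
```

Either way, the function receives the address of the element, which is what %f needs in order to store a value there.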

Tim
Title: Re: What does the "&" do in my fscanf statement in C
Post by: magic on June 14, 2020, 04:33:01 pm
The physical address is really (char*)foo + sizeof(foo[0]) * bar because sizeof(foo) is the size of a pointer and uint8_t may not even be defined.

Generally, read_array is of type "float pointer" while read_array[anything] is of type "float" because you wouldn't want to write *read_array[n] to get the n-th element. Hence the need for &.
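
A rough sketch of that address calculation, for anyone who wants to check it. The function name element_by_bytes is hypothetical. It steps through memory in units of char and scales by the size of one element.

```c
#include <stddef.h>

/* Hypothetical helper: computes the address of foo[bar] by hand,
   counting in chars and scaling by the size of one element. */
float *element_by_bytes(float *foo, size_t bar)
{
    char *base = (char *)foo;   /* char is the unit sizeof counts in */
    return (float *)(base + sizeof foo[0] * bar);
}
```

Inside this function foo is a pointer parameter, so sizeof foo would be the size of a pointer. For an array object that is in scope, sizeof foo would be the size of the whole array. Neither is the right multiplier; sizeof foo[0] is.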
Title: Re: What does the "&" do in my fscanf statement in C
Post by: T3sl4co1l on June 15, 2020, 03:19:06 am
Sorry, yes, sizeof element, not the array! :D

Also, sizeof(char) is defined as 1 (should've looked this up a long time ago!).  So uint8_t should? always be defined as unsigned char.

With uint8_t, I am assuming #include <stdint.h>, yes.

Tim
Title: Re: What does the "&" do in my fscanf statement in C
Post by: magic on June 15, 2020, 06:19:36 am
Well, uint8_t may not be defined at all (old implementation, weird architecture, lazy implementer) while char is guaranteed to exist and mean the lowest addressable unit of storage, in which sizeofs are counted. I don't really get that recent fad to replace char* with uint8_t*, unless one wants to prevent accidental compilation on systems with 9 or 64 bit char because the code treats it as strictly 8 bit.

As for whether sizeof(uint8_t) may be anything other than 1 when it exists, hopefully not. ISO mandates that char holds at least ±127 or 0-255 so there would be no obvious need for more than 1 char to store a uint8_t in a conforming implementation. But when it comes to strict lawyerism, I'm not sure if anything prohibits padding every uint8_t with a second dummy char just to break your code ;)
Title: Re: What does the "&" do in my fscanf statement in C
Post by: brucehoult on June 15, 2020, 11:12:01 am
Quote
Well, uint8_t may not be defined at all (old implementation, weird architecture, lazy implementer) while char is guaranteed to exist and mean the lowest addressable unit of storage, in which sizeofs are counted. I don't really get that recent fad to replace char* with uint8_t*, unless one wants to prevent accidental compilation on systems with 9 or 64 bit char because the code treats it as strictly 8 bit.

As for whether sizeof(uint8_t) may be anything other than 1 when it exists, hopefully not. ISO mandates that char holds at least ±127 or 0-255 so there would be no obvious need for more than 1 char to store a uint8_t in a conforming implementation. But when it comes to strict lawyerism, I'm not sure if anything prohibits padding every uint8_t with a second dummy char just to break your code ;)

It is not defined whether plain char is signed or unsigned, so if you care about the difference (i.e. need negative values or ones greater than 127) then you either need to use int8_t or uint8_t or else create your own typedef.

It's a bit pointless to create your own typedef to signify an 8 bit integer, but it makes complete sense to make a typedef for something in the application domain such as PortNum or whatever.
Title: Re: What does the "&" do in my fscanf statement in C
Post by: Nominal Animal on June 15, 2020, 11:21:27 am
Quote
I don't really get that recent fad to replace char* with uint8_t*
No, it's the signedness thing.  (That char may be signed on some systems and unsigned on others.)

Here's an example.  Did you know that to write a function that returns the number of whitespace characters in a string, say
Code: [Select]
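#include <ctype.h>   // for isspace()
#include <stddef.h>  // for size_t

size_t  whitespaces(const char *s)
{
    size_t  n = 0;
    if (s) {
        while (*s) {
            // isspace() returns nonzero, not necessarily 1, so normalize it
            n += !!isspace((unsigned char)(*s));
            s++;
        }
    }
    return n;
}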
you have to cast the argument of isspace() to unsigned char, or it may not work for non-ASCII characters?  (When using e.g. ISO Latin 1 or 9, ISO 8859-1 or ISO 8859-15, this includes non-breaking space.)  And that this is actually stated in the C standard? The same for all the is*() functions declared in <ctype.h>.

Clunky, yeah.

The solution would be to use unsigned char instead of char everywhere, but uint8_t is shorter.
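
A minimal sketch of what the cast does, with a made-up helper name (ctype_arg). Where plain char is signed, a byte such as 0xA0 arrives as a negative int unless it is first converted to unsigned char.

```c
/* Hypothetical helper: turns a plain char into a value that is safe to
   pass to the <ctype.h> functions, whether char is signed or unsigned. */
int ctype_arg(char c)
{
    /* Converting to unsigned char maps every possible char value into
       the range 0..UCHAR_MAX, which is what the is*() functions accept
       (besides EOF). */
    return (unsigned char)c;
}
```

Without the conversion, a negative value other than EOF would fall outside what the is*() functions are specified to handle.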
Title: Re: What does the "&" do in my fscanf statement in C
Post by: magic on June 15, 2020, 12:07:44 pm
I still wouldn't make this function take a uint8_t* argument. Can anyone actually guarantee that uint8_t is equivalent to unsigned char when it exists?

I'm not talking about using char for maths, although you could if you specify signedness explicitly. I'm talking about the fad of treating random pointers as uint8_t* rather than char*. Begone, Satan.
Title: Re: What does the "&" do in my fscanf statement in C
Post by: Nominal Animal on June 15, 2020, 02:00:15 pm
TL;DR: uint8_t is equivalent to unsigned char if you only use them to describe unsigned integers between 0 and 255.  This is guaranteed.

Quote
I still wouldn't make this function take a uint8_t* argument.
Me neither, I'm quite comfy using the casts.  Others don't like typing.

In my own code, I tend to use const unsigned char and unsigned char throughout, which also avoids the problems and the need for aforementioned casts.  I've gotten comments on how odd that looks, too.  :-//

Quote
Can anyone actually guarantee that uint8_t is equivalent to unsigned char when it exists?
It is true if and only if CHAR_BIT == 8.  (It is a macro defined in <limits.h> which is available in both hosted and freestanding environments.)

The underlying issue is that the C standard allows unsigned char to represent values between 0 and 2^CHAR_BIT - 1, inclusive, with CHAR_BIT required to be at least 8, but allowed to be larger. uint8_t on the other hand is an unsigned integer type that can represent values between 0 and 2^8 - 1, exactly.  The arithmetic etc. rules are otherwise the same for both.

An indirect consequence of the wording in the standard is that sizeof (uint8_t) == sizeof (unsigned char) == sizeof (char) == 1 is required/guaranteed for a conforming implementation.  As far as I know, it has been true even on oddball and experimental architectures.

If you write code that expects an ASCII-compatible 7- or 8-bit execution character set, then having CHAR_BIT > 8 does not matter, and effectively unsigned char and uint8_t are the same, since you only use the types for values 0 to 255, inclusive.  Because of Unicode (and UTF-8), this is already a perfectly acceptable expectation.
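
If you would rather have the compiler check these expectations than take anyone's word for it, something like the following sketch could be used. It assumes a C11 compiler for _Static_assert, and the helper name have_uint8 is made up. UINT8_MAX is only defined when the implementation provides uint8_t, so it doubles as a feature test.

```c
#include <limits.h>
#include <stdint.h>

/* These checks are only compiled when the optional uint8_t exists. */
#ifdef UINT8_MAX
_Static_assert(CHAR_BIT == 8, "uint8_t exists, so char should have 8 bits");
_Static_assert(sizeof(uint8_t) == 1, "uint8_t should occupy exactly one char");
#endif

/* Hypothetical helper: 1 if uint8_t is available here, 0 otherwise. */
int have_uint8(void)
{
#ifdef UINT8_MAX
    return 1;
#else
    return 0;
#endif
}
```

If either assertion ever fails to compile, that implementation is one of the odd cases discussed in this thread.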
Title: Re: What does the "&" do in my fscanf statement in C
Post by: magic on June 15, 2020, 09:21:56 pm
Well, I found a few interesting counterarguments on the Interwebz.

There is supposedly no guarantee that encoding is identical for the two types. Might be different bit order or perhaps even weirder bullshit, AFAIK the standard says nothing about encoding of non-stdint types.
There are special aliasing rules concerning char pointers which affect optimizations and UB, nothing like that for uint8_t. I wouldn't be surprised if casting float* to uint8_t* and then trying to load the first byte as uint8_t actually results in UB, though it is explicitly legal to do so with char.

Nobody mentioned it, but I'm still not convinced if an implementation which always stores uint8_t at 16 bit alignment wouldn't be legal.
Title: Re: What does the "&" do in my fscanf statement in C
Post by: golden_labels on June 15, 2020, 11:47:42 pm
Quote
As for whether sizeof(uint8_t) may be anything other than 1 when it exists, hopefully not. ISO mandates that char holds at least ±127 or 0-255 so there would be no obvious need for more than 1 char to store a uint8_t in a conforming implementation.
Aren’t you conflating uint8_t with uint_least8_t or uint_fast8_t? uint8_t, like all exact-width types, has very strict requirements regarding size and implementation. It must be exactly that many bits, it can’t have any padding bits, it must have all the arithmetic behaviour of exactly that many bits. Signed exact-width types are even worse, as they must be 2's complement. This is why all exact types are optional: they may be simply unimplementable on a given platform. So if CHAR_BIT is anything greater than 8, uint8_t can’t even be implemented. Which also means that sizeof(uint8_t) is always 1.

leastN and fastN, on the other hand, may be larger than N, may have padding bits, may have any representation and the arithmetic is not limited to the N-bit one. However, uint_least8_t and int_least8_t by definition must always be defined as (unsigned/signed) char and their size is always 1, which makes them pretty useless.

Quote
It is not defined whether plain char is signed or unsigned, so if you care about the difference (i.e. need negative values or ones greater than 127) then you either need to use int8_t or uint8_t or else create your own typedef.
You can simply use signed char or unsigned char. No need for exact-width types here.

Quote
you have to cast the argument of isspace() to unsigned char, or it may not work for non-ASCII characters?  (When using e.g. ISO Latin 1 or 9, ISO 8859-1 or ISO 8859-15, this includes non-breaking space.)  And that this is actually stated in the C standard? The same for all the is*() functions declared in <ctype.h>.
Since Nominal Animal hasn’t provided that: to get an explanation better than “standard requires” (in this case the specs are not clear on why you should do that), see STR37-C (https://wiki.sei.cmu.edu/confluence/display/c/STR37-C.+Arguments+to+character-handling+functions+must+be+representable+as+an+unsigned+char) in SEI CERT CCS.

Code: [Select]
float max = 2147483647;  // RAND_MAX
Then simply use RAND_MAX: do not guess values, do not hardcode your guesses. This is exactly why you have this macro defined: because the value may be different.
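
A minimal sketch of that change. The helper name normalized_rand is hypothetical and not part of the original program.

```c
#include <stdlib.h>

/* Hypothetical helper: a pseudo-random float in [0, 1], scaled by the
   library's own RAND_MAX instead of a hard-coded guess. */
float normalized_rand(void)
{
    return (float)rand() / (float)RAND_MAX;
}
```

In the original loop this would replace rand() / max, and the max variable could then be dropped.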
Title: Re: What does the "&" do in my fscanf statement in C
Post by: magic on June 16, 2020, 06:27:39 am
Quote
As for whether sizeof(uint8_t) may be anything other than 1 when it exists, hopefully not. ISO mandates that char holds at least ±127 or 0-255 so there would be no obvious need for more than 1 char to store a uint8_t in a conforming implementation.
Aren’t you conflating uint8_t with uint_least8_t or uint_fast8_t?
I think I'm not.

Now, let's consider a platform with 16 bit registers and memory words and special "8 bit" instructions which truncate the input and output to accurately mimic 8 bit calculations. Say that I pack two chars per word for space efficiency but decide to store 8 bit integer types one per word to avoid shifting overhead on loads and stores at odd addresses. There are padding bits, but my implementation makes them invisible to arithmetic operations in C by the use of 8 bit truncating instruction. You can only discover those bits by reading a variable stored in memory char by char, because its sizeof is 2.

Are you sure the wording about padding makes it illegal to call this contraption a uint8_t? I think it depends on how the "padding bits" are defined and to what extent the implementation needs to hide them from the user. I don't know, IANAL ;)
Title: Re: What does the "&" do in my fscanf statement in C
Post by: golden_labels on June 16, 2020, 07:21:20 am
Quote
Now, let's consider a platform with 16 bit registers and memory words and special "8 bit" instructions which truncate the input and output to accurately mimic 8 bit calculations. Say that I pack two chars per word for space efficiency but decide to store 8 bit integer types one per word to avoid shifting overhead on loads and stores at odd addresses.
Not only “let’s say”, but we would have to do it. Object pointers are expressed in terms of sizeof(char) and can’t be “fractional”. This is the case with _Bool, which is required to only store integers in range [0, 1], but it can’t have less than 8 bits. So 8 wasted bits would be a requirement for uint8_t.

Quote
There are padding bits, but my implementation makes them invisible to arithmetic operations in C by the use of 8 bit truncating instruction. You can only discover those bits by reading a variable stored in memory char by char, because its sizeof is 2.

Are you sure the wording about padding makes it illegal to call this contraption a uint8_t? I think it depends on how the "padding bits" are defined and to what extent the implementation needs to hide them from the user. I don't know, IANAL ;)
That would be even more interesting, in the negative sense, because it has potential to produce trap representations. But even assuming it does not, we have 8 padding bits. And those are explicitly forbidden for fixed-width types. And for a good reason: fixed-width types are expected to have very strictly defined behaviour in wide range of operations across platforms (if they are present).

Padding bits of an integer are defined as any bits that are not value bits or sign bit.
Title: Re: What does the "&" do in my fscanf statement in C
Post by: hamster_nz on June 16, 2020, 09:09:25 am
One thing that made it clear (to me at least) that char should be unsigned, even though it isn't, is the value used for EOF.

Code: [Select]
#include <stdio.h>

int main(int argc, char *argv[]) {
  signed char sc = (signed char)EOF;
  unsigned char uc = (unsigned char)EOF;
  char c = (char)EOF;

  if(sc == EOF) {
     printf("Signed char can be EOF\n");
  } else {
     printf("Signed char cannot be EOF\n");
  }

  if(uc == EOF) {
     printf("Unsigned char can be EOF\n");
  } else {
     printf("Unsigned char cannot be EOF\n");
  }

  if(c == EOF) {
     printf("Char can be EOF\n");
  } else {
     printf("Char cannot be EOF\n");
  }

  return 0;
}
Title: Re: What does the "&" do in my fscanf statement in C
Post by: magic on June 16, 2020, 09:13:50 am
Quote
Padding bits of an integer are defined as any bits that are not value bits or sign bit.
The matter is how it is defined whether padding bits exist or not. What kind of C statements are expected to produce what kind of results depending on presence or absence of padding bits. If it's only arithmetic, I will get away with it, but if I'm not allowed to store padding bits in memory in a manner which permits seeing them by scanning the variable char by char then my wonderful architecture is screwed.

It probably is, I think you are right.

But there is still the interesting issue of aliasing. I know for sure that loading a float value through a uint32_t pointer is undefined behavior, the same almost certainly applies to uint16_t and then why not to uint8_t? I suspect that the following actually is illegal but people do it all the time and implementations probably wouldn't dare to break it:

Code: [Select]
void *memcpy(void *dest, const void *src, size_t n) {
  uint8_t *d = (uint8_t*)dest;
  const uint8_t *s = (const uint8_t*)src;
  while (n--) *d++ = *s++;
  return dest;
}
Title: Re: What does the "&" do in my fscanf statement in C
Post by: greenpossum on June 16, 2020, 09:25:52 am
void *s don't need to be cast to any other (data) pointer, straight assignment will work.
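
A sketch of what that looks like, under a made-up name (copy_bytes) so it does not clash with the library's memcpy. It uses unsigned char rather than uint8_t, since character types are allowed to access the bytes of any object.

```c
#include <stddef.h>

/* Hypothetical byte-copy routine. In C, void * converts implicitly to
   any object pointer type, so the assignments below need no casts. */
void *copy_bytes(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dest;
}
```

Note that this applies to C only; C++ does require the casts.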
Title: Re: What does the "&" do in my fscanf statement in C
Post by: andersm on June 16, 2020, 10:27:39 am
Quote
One thing that made it clear (to me at least) that char should be unsigned, even though it isn't, is the value used for EOF.
EOF is deliberately defined to be of type int. It is a bug to assign a return value to a char (of any type) and only then test against EOF.
Title: Re: What does the "&" do in my fscanf statement in C
Post by: hamster_nz on June 16, 2020, 12:34:35 pm
Quote
One thing that made it clear (to me at least) that char should be unsigned, even though it isn't, is the value used for EOF.
EOF is deliberately defined to be of type int.
Not in my Linux box's 'stdio.h':

Code: [Select]
/usr/include/stdio.h:# define EOF (-1)

Anyhow we have what we have and learn to live with it  :)
Title: Re: What does the "&" do in my fscanf statement in C
Post by: greenpossum on June 16, 2020, 12:45:14 pm
The important fact is that the functions that return EOF like getchar() and getc() return int, not char.
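
A minimal sketch of the usual pattern. The helper name count_bytes is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes remaining in a stream. The
   result of getc() is kept in an int so that EOF stays distinct from
   every valid byte value, including 0xFF. */
long count_bytes(FILE *fp)
{
    int c;        /* int, not char */
    long n = 0;
    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```

If c were declared as a char, a 0xFF byte could compare equal to EOF where char is signed, and the loop could never see EOF at all where char is unsigned.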
Title: Re: What does the "&" do in my fscanf statement in C
Post by: Nominal Animal on June 16, 2020, 01:44:38 pm
Quote
Since Nominal Animal hasn’t provided that: to get an explanation better than “standard requires” (in this case the specs are not clear on why you should do that), see STR37-C (https://wiki.sei.cmu.edu/confluence/display/c/STR37-C.+Arguments+to+character-handling+functions+must+be+representable+as+an+unsigned+char) in SEI CERT CCS.
Yeah, sorry about that; it's a bit of a chore to grab the exact references, as only the final draft PDFs (n1256 for C99 (http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf), n1570 for C11 (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf), and n2176 for C18 (https://web.archive.org/web/20181230041359if_/http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf)) are available on the net.

In all three versions it is clearly stated in 7.4p1:
Quote
The header <ctype.h> declares several functions useful for classifying and mapping characters.  In all cases the argument is an int , the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF . If the argument has any other value, the behavior is undefined.

C99 7.19.1p3, C11 7.21.1p3, and C18 7.21.1p3 further state that
Quote
EOF [is a macro that] expands to an integer constant expression, with type int and a negative value, that is returned by several functions to indicate end-of-file, that is, no more input from a stream [.]

This means that if one intends to simply and easily use the standard C library, one must cast individual character codes to unsigned char before supplying to e.g. the <ctype.h> functions; and that EOF is an integer macro and not a character at all: it indicates a condition (so like a non-error error) many standard C library functions can report.

Code: [Select]
float max = 2147483647;  // RAND_MAX
Unfortunately, RAND_MAX can be as small as 32767, and actually is on many older standard C libraries.

For typical use, the Xorshift (https://en.wikipedia.org/wiki/Xorshift) family of generators is superior: faster, and more random: these pass almost all BigCrush randomness tests, whereas standard C library linear congruential generators (https://en.wikipedia.org/wiki/Linear_congruential_generator) produce quite poor sequences, and are slower to compute.
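
For reference, a sketch of the simplest member of that family, Marsaglia's 32-bit xorshift with the shift triple 13, 17, 5. This is only meant to show the shape of the algorithm. The plain 32-bit version is not one of the variants that do well in BigCrush; the larger-state and scrambled variants are the ones to look at for that.

```c
#include <stdint.h>

/* One step of the 32-bit xorshift generator (shifts 13, 17, 5).
   The state must be seeded with a nonzero value, because zero is a
   fixed point: the generator would return zero forever. */
uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}
```

The caller owns the state word, so several independent sequences can be run side by side, unlike with rand().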

For statistical work, use Mersenne Twister (https://en.wikipedia.org/wiki/Mersenne_Twister). (It too fails two tests in BigCrush, but if you are writing an article, you don't have to defend its use because it is the commonly used one; using Xorshift would require at least a minimal argument for it.)

For cryptographic work, concatenate high bits of the generator output (say, half, or perhaps three quarters of the bits) to the block width of a cryptographic hash algorithm (like 512 bits for SHA512), and use the result.  It will pass all randomness tests, and even with a complete output sequence of any reasonable length and a cryptographic oracle (that provides all ciphertexts given the hashes) you cannot reliably estimate the next generated value, because it depends on the discarded bits that cannot be extrapolated from the output.  The number of opportunities depends on the number of bits you drop.  If you drop a quarter of the bits, you have a 1 in 2^128 chance of guessing the next generated 512-bit value, and that is with a cryptographic oracle.  (This assumes generators that mostly pass BigCrush, and whose period is on the order of their output size, or larger.)
Title: Re: What does the "&" do in my fscanf statement in C
Post by: Nominal Animal on June 16, 2020, 01:50:26 pm
Quote
Not in my Linux box's 'stdio.h':
Code: [Select]
/usr/include/stdio.h:# define EOF (-1)
By standard C rules ("integer promotion" in the drafts I linked to above), -1 (and (-1)) are of type int, so even on that machine, EOF is defined to a negative value of type int.
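
If you want the compiler to confirm this, here is a small sketch that assumes a C11 compiler for _Generic. The macro IS_INT and the function eof_is_int are names made up for this example.

```c
#include <stdio.h>

/* Evaluates to 1 when the expression has type int, 0 for any other
   type. Requires C11 or later for _Generic. */
#define IS_INT(expr) _Generic((expr), int: 1, default: 0)

/* Hypothetical helper: reports whether EOF has type int here. */
int eof_is_int(void)
{
    return IS_INT(EOF);
}
```

On a conforming implementation eof_is_int() returns 1, and IS_INT(-1) is 1 as well, while IS_INT((char)'a') is 0.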
Title: Re: What does the "&" do in my fscanf statement in C
Post by: golden_labels on June 16, 2020, 01:56:34 pm
Quote
The matter is how it is defined whether padding bits exist or not. What kind of C statements are expected to produce what kind of results depending on presence or absence of padding bits. If it's only arithmetic, I will get away with it, but if I'm not allowed to store padding bits in memory in a manner which permits seeing them by scanning the variable char by char then my wonderful architecture is screwed.
Padding bits is about representation, not semantics of the expressions. In particular they are accessible if you access the object through char*, even if they have no meaning to the object itself.

Quote
But there is still the interesting issue of aliasing. I know for sure that loading a float value through a uint32_t pointer is undefined behavior, the same almost certainly applies to uint16_t and then why not to uint8_t? I suspect that the following actually is illegal but people do it all the time and implementations probably wouldn't dare to break it:
It does too. It doesn’t produce UB only because uint8_t is typically just an alias to unsigned char and you can access any data through a pointer to char, unsigned char and signed char.

Also that particular piece of code is an internal implementation tailored for particular compilers. Those are not written in “just any C”, but in the very specific flavour of C supported by the given compiler.
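
For ordinary code that wants to look at the bits of a float without running into the aliasing rules, a sketch along these lines should do. The helper names float_bits and first_byte are hypothetical. The code assumes that float and uint32_t have the same size, which the static assertion checks at compile time (C11).

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Hypothetical helper: returns the object representation of a float
   as a uint32_t, without reading the float through a uint32_t *. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* byte-wise copy, no aliasing violation */
    return u;
}

/* Reading an object's bytes through unsigned char * is permitted for
   any object type. */
unsigned char first_byte(const float *f)
{
    return *(const unsigned char *)f;
}
```

Compilers commonly turn a small fixed-size memcpy like this into a plain register move, so there is usually no cost to doing it the well-defined way.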


A side note. Earlier I’ve mentioned padding bits and trap representations. This is something often dismissed as “this happens only on imaginary platforms and no one has ever seen that in the wild”. The truth is quite opposite. Both are present even on platforms as popular as x86_64. long double has padding bits. Right, no one uses long double, so what obscure type will I use for trap representation? bool! :D For example this code, if compiled with clang or gcc on x86_64 Linux, with -O2 optimizations, produces undefined behaviour:
Code: [Select]
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

static int decide(bool* b) {
    if (*b) {
        return 1;
    } else {
        return 0;
    }
}

int main(void) {
    bool cond;
   
    *(char*)&cond = 3;
   
    printf("Decide = %d\n", decide(&cond));
    return EXIT_SUCCESS;
}
It’s not hard to determine what has happened under the hood. But the code, as seen from programmer’s perspective, doesn’t express such behaviour. The only two possible outputs are "Decide = 0\n" or "Decide = 1\n", because decide can’t return anything other than 0 or 1. Yet, due to UB and optimization, it “returns” 3.

Quote
Not in my Linux box's 'stdio.h':
Code: [Select]
/usr/include/stdio.h:# define EOF (-1)
Which is… int. The expression -1 has type int. The point is, however, not what is the type of the expression generated by the EOF macro, but the type with which it is to be used, which is int.