Author Topic: Are data type compiler-dependent or target dependent (Read 7359 times)

King123 · « **on:** August 02, 2022, 01:19:22 pm »

I am being confused with data type in c programming language.

My question:

Are data type compiler-dependent or target dependent?

golden_labels · « **Reply #1 on:** August 02, 2022, 01:50:28 pm »

What exactly do you mean by “data type”? Or rather, what kind of dependence do you mean?

If you mean ranges, sizes, or representation: both the compiler and the platform. What's more, they may also depend on particular options passed to the compiler. The platform limits what makes sense, so it’s the primary factor, but not the only one.

tellurium · « **Reply #2 on:** August 02, 2022, 02:48:06 pm »

Quote from: King123 on August 02, 2022, 01:19:22 pm

I am being confused with data type in c programming language.

My question:

Are data type compiler-dependent or target dependent?

There are basic types, like int and long. They do not require any header file. Their size depends on the target. For example, if you're compiling on a 64-bit Windows machine using the Arduino IDE, the compiler targets an 8-bit AVR, where the "int" type is 2 bytes. If you're compiling for 64-bit Windows itself, the "int" type would be 4 bytes.

There are other types, like size_t, uint32_t, etc. They do require header files. When a C compiler compiles a piece of code, all headers get inlined and the more complex types resolve to the basic types. The header files, too, depend on the target. Usually, header files are bundled together with the compiler.

Hope that clarifies

DiTBho · « **Reply #3 on:** August 02, 2022, 03:28:49 pm »

Quote from: King123 on August 02, 2022, 01:19:22 pm

Are data type compiler-dependent or target dependent?

data-type is always the same, but the data-size is a bit weird

e.g.
on MIPS4 "long" means 64-bit, "long long" means 64-bit
on hc11, gcc 3.0.*, "int" means 16 bit, while with icc-v11, "int" means 32-bit
on hc11, gcc 3.4.6 + patch, "int" is either a 16-bit or a 32-bit entity depending on a special flag

DiTBho · « **Reply #4 on:** August 03, 2022, 07:03:33 am »

endianness { BE, LE } is also target dependent.
MIPS, SH, POWER and PowerPC can be LE or BE depending on a configuration bit at boot.
AMD and intel x86 are always LE.

King123 · « **Reply #5 on:** August 03, 2022, 11:05:17 am »

I am trying to understand What are difference between int, short and long in context of c standards?

tggzzz · « **Reply #6 on:** August 03, 2022, 12:03:18 pm »

Quote from: King123 on August 02, 2022, 01:19:22 pm

I am being confused with data type in c programming language.

My question:

Are data type compiler-dependent or target dependent?

Yes

And add processor dependent.

DiTBho · « **Reply #7 on:** August 03, 2022, 12:04:22 pm »

Quote from: King123 on August 03, 2022, 11:05:17 am

difference between int, short and long in context of c standards

char is always supposed to be 8bit
short, int, long are always bigger than 8bit

int and long can vary their data-size, when unsure, you can check with sizeof(x)

Code: [Select]

#include <stdio.h>

int main(void)
{
    /* sizeof yields a size_t, so the matching printf conversion is %zu, not %d */
    printf("the size of \"char\"      is %zu byte\n", sizeof(char));
    printf("the size of \"short\"     is %zu byte\n", sizeof(short));
    printf("the size of \"int\"       is %zu byte\n", sizeof(int));
    printf("the size of \"long\"      is %zu byte\n", sizeof(long));
    printf("the size of \"long long\" is %zu byte\n", sizeof(long long));

    return 0;
}

the data-size and endianness are the only target/processor-dependent and compiler-dependent differences

Code: [Select]

the size of "char"      is 1 byte
the size of "short"     is 2 byte
the size of "int"       is 4 byte
the size of "long"      is 4 byte
the size of "long long" is 8 byte

Gcc v4.1.2 on PowerPC-7550

emece67 · « **Reply #8 on:** August 03, 2022, 12:13:04 pm »

T3sl4co1l · « **Reply #9 on:** August 03, 2022, 12:38:39 pm »

https://en.cppreference.com/w/cpp/language/types

Nominal Animal · « **Reply #10 on:** August 03, 2022, 12:40:50 pm »

Quote from: King123 on August 03, 2022, 11:05:17 am

I am trying to understand What are difference between int, short and long in context of c standards?

The range of values they can represent. If you include <limits.h>, the compiler and the C library will expose a set of constants:

SHRT_MIN and SHRT_MAX describing the minimum and maximum integer values a short can represent.
SHRT_MIN will be -32767 or smaller (-32768 is most common), and SHRT_MAX will be 32767 or larger (32767 is most common)
INT_MIN and INT_MAX describing the minimum and maximum integer values an int can represent.
INT_MIN will be -32767 or smaller (-2147483648 is most common), and INT_MAX will be 32767 or larger (2147483647 is most common)
LONG_MIN and LONG_MAX describing the minimum and maximum integer values a long can represent.
LONG_MIN will be -2147483647 or smaller (-2147483648 is most common), and LONG_MAX will be 2147483647 or larger (2147483647 is most common)

The type char can be signed or unsigned; it too varies depending on the compiler and the target (C library).
The type size_t is some unsigned integer type that can handle any in-memory size; int range may not suffice (and does not on typical 64-bit architectures).
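As a minimal sketch of how those <limits.h> constants can be inspected on a given compiler and target: the function names print_integer_ranges and ranges_meet_standard_minimums are made up for this illustration, and the printed values will differ from one implementation to another.

```c
#include <limits.h>
#include <stdio.h>

/* Prints the ranges this particular compiler and target actually provide. */
void print_integer_ranges(void)
{
    printf("short: %d .. %d\n", SHRT_MIN, SHRT_MAX);
    printf("int:   %d .. %d\n", INT_MIN, INT_MAX);
    printf("long:  %ld .. %ld\n", LONG_MIN, LONG_MAX);
    /* CHAR_MIN is negative only when plain char is a signed type. */
    printf("plain char is %s\n", (CHAR_MIN < 0) ? "signed" : "unsigned");
}

/* Returns 1 when the minimum ranges listed above hold, 0 otherwise.
   On a conforming implementation this should always return 1. */
int ranges_meet_standard_minimums(void)
{
    return SHRT_MAX >= 32767 && SHRT_MIN <= -32767
        && INT_MAX >= 32767 && INT_MIN <= -32767
        && LONG_MAX >= 2147483647L && LONG_MIN <= -2147483647L;
}
```

Calling print_integer_ranges() from main() on two different targets is a quick way to see which of the "most common" values apply to each.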

The main thing to remember is that when doing arithmetic or logic, any integer type with smaller range than an int, will be promoted to int. This is a "quirk" in the C language. To limit the range and precision of an expression, we use casts: (type)(expression). For example, if we have unsigned char a, b; then (a + b) yields an int, but (unsigned char)(a + b) yields an unsigned char.

Similarly, when <stdarg.h> variable argument lists are used, types smaller than int are promoted to int, float is promoted to double, and so on.
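The integer promotion rule and the cast described above can be sketched as follows. The function names are hypothetical, and the wrapped result in the second function assumes an 8-bit unsigned char (CHAR_BIT == 8), which is common but not guaranteed.

```c
/* a and b are promoted to int before the addition, so the sum is not
   reduced to the range of unsigned char. */
int sum_promoted(unsigned char a, unsigned char b)
{
    return a + b;
}

/* The cast converts the int result back to unsigned char, which reduces
   it modulo (UCHAR_MAX + 1). */
unsigned char sum_truncated(unsigned char a, unsigned char b)
{
    return (unsigned char)(a + b);
}
```

With a = 200 and b = 100, sum_promoted() yields 300, while sum_truncated() yields 300 modulo 256, that is 44, when unsigned char has 8 bits.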

When you include <stdint.h> (or <inttypes.h>), additional types may be (and are in practice) exposed with very useful features:

intN_t and uintN_t with N being 8, 16, 32, and 64.
These are signed (with two's complement representation) and unsigned integer types of exactly N bits with no padding bits.
These are extremely useful for file formats and other data interchange. You still need to consider byte order, since that varies from architecture to architecture (although the vast majority is "little endian" or "big endian"), but there are very simple ways to do that.
int_fastN_t and uint_fastN_t with N being 8, 16, 32, and 64.
These are signed (with two's complement representation) and unsigned integer types of at least N bits, that provide "fastest" arithmetic and logic on a given architecture.
For example, on some architectures 16-bit arithmetic requires extra machine instructions (masking out the extra bits). On those, int_fast16_t might correspond to int32_t or int64_t for example, whichever yields faster arithmetic.
These are extremely useful for internal variables, say for loops and such. Instead of hoping the compiler will generate efficient code, by choosing a suitable N to match the range you expect, using these types the compiler will generate the fastest code it can.
intmax_t and uintmax_t, corresponding to the signed and unsigned integer types with the largest range of representable values the architecture supports.
intptr_t and uintptr_t, corresponding to signed and unsigned integer types that are compatible with pointers; a pointer converted to one of these and then back to its original pointer type will retain its value.
These are useful when one needs to represent a pointer as an integer for some reason. There is a lot of code that does (int)(pointer_expression) , but that is a BUG: the int type often does not have the range to represent all possible pointer values, so such code may work on some machines, but fail on others, on the exact same architecture, depending on the actual pointer values! The pain and suffering this assumption alone has caused is immense: please do not let anyone do that.
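A small sketch of the pointer round trip through uintptr_t, as opposed to the (int) cast warned about above. The function names are hypothetical; uintptr_t itself is optional in the standard, although it is provided on practically every hosted platform.

```c
#include <stdint.h>

/* Converts a pointer to uintptr_t and back (via void *), then checks that
   the result still points at the same object. Returns 1 on success. */
int pointer_roundtrip_demo(void)
{
    int value = 42;
    uintptr_t bits = (uintptr_t)(void *)&value;   /* pointer as an integer */
    int *back = (int *)(void *)bits;              /* and back again */
    return back == &value && *back == 42;
}

/* Returns 1 if int is at least as large as a pointer on this target.
   On typical LP64 and LLP64 systems this returns 0, which is why
   (int)(pointer_expression) loses information there. */
int int_is_as_wide_as_pointer(void)
{
    return sizeof(int) >= sizeof(void *);
}
```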

Quote from: T3sl4co1l on August 03, 2022, 12:38:39 pm

https://en.cppreference.com/w/cpp/language/types

C and C++ are two completely different languages. Please, do not let the superficial similarities confuse people into thinking they are the same. While a C++ compiler can compile most C code, it cannot compile all C standards compliant code.

T3sl4co1l · « **Reply #11 on:** August 03, 2022, 12:58:04 pm »

Quote from: Nominal Animal on August 03, 2022, 12:40:50 pm

Quote from: T3sl4co1l on August 03, 2022, 12:38:39 pm
https://en.cppreference.com/w/cpp/language/types
C and C++ are two completely different languages. Please, do not let the superficial similarities confuse people into thinking they are the same. While a C++ compiler can compile most C code, it cannot compile all C standards compliant code.

Well fuck me, guess I better dump all my code in the bitbucket and start over.

(The table shows same as what you quoted, unless you mean to tell me they do in fact differ on this most basic of properties and I've missed something?)

Tim

DiTBho · « **Reply #12 on:** August 03, 2022, 01:10:31 pm »

Quote from: tggzzz on August 03, 2022, 12:03:18 pm

add processor dependent.

yup, weird example, but I am on my R18200 MIPS4 prototype right now, and I have some issue accessing the ASIC chip due to the way the CPU performs load/store operations.

There is a memory mapped ASIC chip, I need to access its registers with byte-granularity, but the hardware doesn't support such operations.
The CPU has a 64-bit register file, sizeof(long long) is the same as sizeof(long), and at the load/store side, the CPU only performs 64-bit load/store accesses.

When you issue a load/store.byte, it always accesses 64 bits in a single read cycle, and then only considers the lowest byte
e.g.

Code: [Select]

load data from address EA
return data = data bitwiseAnd 0x00.00.00.00.00.00.00.ff <------ only consider the lowest byte

Thus, to access a byte with odd address, you have to properly calculate both EA and the mask

e.g. load byte @ EA=0x8000.0002 <----- you cannot use this address as is because it would trap an hw exception, bad alignment address

Code: [Select]

load {byte[0..7]} data from address (EA & 0xfffffff8) <----------- the address becomes 0x8000.0000
return data = { 0x00.00.00.00.00.00.00.byte[EA % 8] } <------------ it only considers the third byte, namely "byte2"

This can be done in assembly, by the C compiler, or in hardware if the CPU supports it.
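In C, the address and byte-lane calculation above might look roughly like this. It is only a sketch: the function names are hypothetical, the 64-bit word is assumed to have been loaded already from the aligned address, and the lane numbering assumes that byte 0 sits in the least significant bits (little-endian); a big-endian bus would number the lanes the other way round.

```c
#include <stdint.h>

/* Rounds an effective address down to the 8-byte boundary that a
   64-bit-only load can use. */
uint64_t aligned_address(uint64_t ea)
{
    return ea & ~UINT64_C(7);
}

/* Picks the byte belonging to address `ea` out of the 64-bit word that was
   loaded from aligned_address(ea). Assumes byte 0 is in bits 0..7. */
uint8_t extract_byte(uint64_t word, uint64_t ea)
{
    unsigned lane = (unsigned)(ea % 8);            /* which of the 8 bytes */
    return (uint8_t)((word >> (8 * lane)) & 0xFF);
}
```

For EA = 0x80000002 this gives the aligned address 0x80000000 and lane 2, the third byte of the word.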

From the high level, unless you directly play with a sensitive ASIC chip, you don't notice the difference. For you, it's a "long"-datatype read from memory.

Other CPUs like ijvm do exactly the opposite. They have an 8-bit load/store unit, so when you access 64 bits they issue eight read cycles on the bus
e.g.
load data0 from address EA+0
load data1 from address EA+1
load data2 from address EA+2
load data3 from address EA+3
load data4 from address EA+4
load data5 from address EA+5
load data6 from address EA+6
load data7 from address EA+7
return data= { data0.data1.data2.data3.data4.data5.data6.data7 }

Again, from the high level, unless you directly play with a sensitive ASIC chip, you don't notice the difference. For you, it's a "long"-datatype read from memory.

Endianness is also exposed, but only if you do low-level things.

DiTBho · « **Reply #13 on:** August 03, 2022, 01:21:07 pm »

Quote from: Nominal Animal on August 03, 2022, 12:40:50 pm

C and C++ are two completely different languages. Please, do not let the superficial similarities confuse people into thinking they are the same. While a C++ compiler can compile most C code, it cannot compile all C standards compliant code.

Yup

Code: [Select]

c       is 0xdeadbeaf
c++     is 0xdeadcafe
C & C++ is 0xdead8aae

Nominal Animal · « **Reply #14 on:** August 03, 2022, 02:24:07 pm »

Quote from: T3sl4co1l on August 03, 2022, 12:58:04 pm

(The table shows same as what you quoted, unless you mean to tell me they do in fact differ on this most basic of properties and I've missed something?)

No, that's not what I meant.

I meant that using a C++ reference to show something about C gives new programmers an untenable intuition: that the two are the same, or that one is a superset/subset of the other.

You used the link because in this detail, C and C++ use the same rules. You didn't mention that, just posted the link. So, I did the "That dog äläht's to which the kalig calaht's" thing, and barked, because therein lies a hidden danger that I've seen bite others in the butt.

Maybe it does not matter to you, but it for sure matters to me, especially because I like to do stuff in a mixed freestanding C and C++ environment. Why? Try explaining such an environment and its quirks to a coder who thinks C is a subset of C++, and is angry that nothing works like they expect it to. I have tried, and I've found it is useless, unless I first disabuse them of their incorrect notions (including C being a subset of C++; just consider the struct namespace, which is separate in C but the same as the type namespace in C++). Or when you port such freestanding code, and instead of understanding the boundary between standards and implementations, the author just made a full set of unstated assumptions that are only valid with that specific version of the compiler and target, possibly because they checked it on their compiler, and it seemed to produce the wanted results. Holy hell is that annoying and horrible to work with; basically an unsalvageable mess. I just want to save others from that sort of pain, OK?

T3sl4co1l · « **Reply #15 on:** August 03, 2022, 02:41:32 pm »

Fine, use this link:
https://en.cppreference.com/w/c/language/arithmetic_types

Nominal Animal · « **Reply #16 on:** August 03, 2022, 03:14:00 pm »

Quote from: T3sl4co1l on August 03, 2022, 02:41:32 pm

Fine, use this link:
https://en.cppreference.com/w/c/language/arithmetic_types

No, that's not it either. I'd prefer you wrote something along the lines of "See C++ Reference for example. C and C++ have the same definitions for integer types."

Personally, I do not trust cppreference.com at all when it comes to C, by the way. (Edited to note: It is a Wiki directed towards C++ developers, after all.) Just like I don't trust microsoft.com when it comes to Mac OS, either. In particular, any "C reference" that excludes POSIX C is worthless shite in my opinion. But you do you.

eugene · « **Reply #17 on:** August 03, 2022, 03:39:26 pm »

Quote from: King123 on August 03, 2022, 11:05:17 am

I am trying to understand What are difference between int, short and long in context of c standards?

That question has been answered and tellurium mentioned <stdint.h> in reply #2. I just want to emphasize that if what you are really asking is how to get an int of a specific size, then don't try to second guess the compiler. Just use types that are in <stdint.h>: int8_t when you want signed 8 bit integer, uint32_t when you want unsigned 32 bit, etc.

Nominal Animal · « **Reply #18 on:** August 03, 2022, 05:48:11 pm »

In case this is OP's homework, let's drown them with useful information.

The following terms will help when one is looking for details on this stuff:

architecture – the instruction set and/or processor family used on the target
ABI – the binary interface provided by the compiler and the base libraries
Common C data models:
- ILP32: int, long, and pointers are all 32 bits in size
- LLP64: int and long are 32-bit, but long long and pointers are 64-bit values
- LP64: int is 32-bit, but long and pointers are 64-bit
  (plus some rare others, like ILP64 and SILP64)
The primitive used for atomic access: CAS (compare-and-swap, also called compare and exchange) or LL/SC (load-link, store-conditional)
CAS is based on an instruction that will update target memory atomically if it contains a specific value, and will fail otherwise. LL/SC implements a load instruction that pairs with a store to the same address, with the store only succeeding if nothing modified that memory in between.
Endianness: when storing multi-byte values, the order in which the bytes are stored.
Note that bits have only one order within a byte, based on their numerical value: bit n has numerical value 2ⁿ, with the least significant (integer) bit being bit 0. In an N-bit unit, the most significant bit is bit N-1.
When transmitted using a serial connection like SPI or I2C, the bits can be transmitted either the least significant or the most significant bit first.
Finally, some documentation (like old IBM docs) labels bits starting at 0 at the most significant bit. This can be problematic when trying to map the bit to a specific value (of a register containing bits labelled using such a scheme).
API or application programming interface – the programming interface that the base libraries and/or the kernel provides in a specific programming language like C.

For example, Windows running on 64-bit Intel and AMD x86-64 processors uses the Windows ABI and API, which has an LLP64 data model, is little-endian, and the base libraries provide a subset of C11, and even C code is intended to be compiled using a C++ compiler (as Microsoft does not provide a C compiler, only a C++ compiler that can compile most C code).

In comparison, Linux running on that same hardware uses the System V ABI, providing almost all of POSIX.1-2008 C, and some additional Linux- and GNU-specific C extensions. It has an LP64 data model, is little-endian.

However, Linux also runs on a lot of other architectures (processors and instruction sets), on both 32-bit and 64-bit ones. The 32-bit ones all have an ILP32 data model, and the 64-bit ones an LP64 data model. (Because of this, in Linux, long and unsigned long are the same size as pointers.)
Depending on the architecture, byte order is either little-endian or big-endian. (Some, like many ARM cores, can even switch between the two at run time, but I do not believe it is supported for userspace programs in Linux.)
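To see which of the data models named above a given compiler and target use, something like the following sketch can help. The function name data_model_name is made up for this example, and it compares storage sizes (sizeof times CHAR_BIT), so it would be misled by an implementation that has padding bits in its integer types.

```c
#include <limits.h>
#include <stddef.h>

/* Returns the name of the C data model, judged by the storage size of
   int, long, long long and pointers on this target. */
const char *data_model_name(void)
{
    const size_t int_bits   = sizeof(int) * CHAR_BIT;
    const size_t long_bits  = sizeof(long) * CHAR_BIT;
    const size_t llong_bits = sizeof(long long) * CHAR_BIT;
    const size_t ptr_bits   = sizeof(void *) * CHAR_BIT;

    if (int_bits == 32 && long_bits == 32 && ptr_bits == 32)
        return "ILP32";
    if (int_bits == 32 && long_bits == 32 && llong_bits == 64 && ptr_bits == 64)
        return "LLP64";
    if (int_bits == 32 && long_bits == 64 && ptr_bits == 64)
        return "LP64";
    if (int_bits == 64 && long_bits == 64 && ptr_bits == 64)
        return "ILP64";
    return "other";
}
```

Going by the descriptions above, one would expect "LLP64" on 64-bit Windows and "LP64" on 64-bit Linux.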

The <stdint.h> header is actually provided by the compiler, in the sense that it is available even for freestanding code (with no standard C library features available). You could say it provides much saner, easily predictable types for one to use in a reliable, portable manner.

To solve byte order issues, one can use either the __BYTE_ORDER macro (which matches either __BIG_ENDIAN or __LITTLE_ENDIAN, if defined; see Pre-defined Compiler Macros for further info) at compile time, or C code similar to the following at run time:

Code: [Select]

#include <stdint.h>

typedef union {
    uint64_t       u64[1];
    int64_t        s64[1];
    uint32_t       u32[2];
    int32_t        s32[2];
    uint16_t       u16[4];
    int16_t        s16[4];
    uint8_t        u8[8];
    int8_t         s8[8];
    unsigned char  uc[8];
    signed char    sc[8];
    char           c[8];
    double         d[1];    /* Assumes IEEE 754 Binary64 - verify at run time */
    float          f[2];    /* Assumes IEEE 754 Binary32 - verify at run time */
} word64;

typedef union {
    uint32_t       u32[1];
    int32_t        s32[1];
    uint16_t       u16[2];
    int16_t        s16[2];
    uint8_t        u8[4];
    int8_t         s8[4];
    unsigned char  uc[4];
    signed char    sc[4];
    char           c[4];
    float          f[1];    /* Assumes IEEE 754 Binary32 - verify at run time */
} word32;

static inline word64 byteorder64(word64 w, int_fast8_t order)
{
    if (order & 1)
        w.u64[0] = ((w.u64[0] >>  8) & UINT64_C(0x00FF00FF00FF00FF))
                 | ((w.u64[0] & UINT64_C(0x00FF00FF00FF00FF)) <<  8);
    if (order & 2)
        w.u64[0] = ((w.u64[0] >> 16) & UINT64_C(0x0000FFFF0000FFFF))
                 | ((w.u64[0] & UINT64_C(0x0000FFFF0000FFFF)) << 16);
    if (order & 4)
        w.u64[0] = ((w.u64[0] >> 32) & UINT64_C(0x00000000FFFFFFFF))
                 | ((w.u64[0] & UINT64_C(0x00000000FFFFFFFF)) << 32);
    return w;
}

static inline word32 byteorder32(word32 w, int_fast8_t order)
{
    if (order & 1)
        w.u32[0] = ((w.u32[0] >>  8) & UINT32_C(0x00FF00FF))
                 | ((w.u32[0] & UINT32_C(0x00FF00FF)) <<  8);
    if (order & 2)
        w.u32[0] = ((w.u32[0] >> 16) & UINT32_C(0x0000FFFF))
                 | ((w.u32[0] & UINT32_C(0x0000FFFF)) << 16);
    return w;
}

Note that the inline above is irrelevant to the C compiler; static alone suffices. I just use the static inline as an indicator for us humans that these are accessor or helper functions, typically defined in the header file, and not just ordinary "internal" functions (that I declare static only).

The basic idea here is that type punning via a union is described in a footnote in all C ISO standards as a way to reinterpret the storage of a variable, so we use that to reinterpret the storage representation of the known fixed-size integer types, as well as the two floating-point types that usually match IEEE 754 binary32 (single precision, float) and binary64 (double precision, double) types.

For 32 bit values, there are only four possible byte orders: native (0), swapped (3), or the two intermediate ones (1 and 2) that were only used in certain old systems. If you check ((word32){ .uc = { 0x01, 0x02, 0x03, 0x04 }}).u32[0], it will have value 0x04030201 on little-endian architectures, and 0x01020304 on big-endian architectures. The two other possibilities are 0x03040102 and 0x02010403, but they are nowadays extremely, extremely rare.

For 64 bit values, there are eight possible byte orders, with native (0) and swapped (7) being the most common ones.
If you check ((word64){ .uc = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08 }}).u64[0], it will have value UINT64_C(0x0807060504030201) on little-endian architectures, and UINT64_C(0x0102030405060708) on big-endian architectures. The others are exceedingly rare.

The reason for using the UINTN_C(value) macros is that since the value might exceed the range int can represent (see my previous post above), <stdint.h> provides macros that add a suffix (U, L, UL, LL, or ULL) if needed to tell the compiler the base type of the constant. They can be used even in preprocessor #if statements.

If you have a peer (or client or server) you are communicating with, and it uses its native byte order but the types used in the above word32 and word64 unions, all you need to do is agree on a set of prototype numeric values (I recommend one double, one float, one negative 64-bit signed integer, and one 32-bit signed or unsigned integer, for completeness; these take a total of 24 bytes), and then do a byte order discovery loop. For example:

Code: [Select]

int_fast8_t find_byteorder64(word64 w, word64 expected)
{
    for (int_fast8_t  order = 0; order < 8; order++)
        if ((byteorder64(w, order)).u64[0] == expected.u64[0])
            return order;
    return -1;
}

int_fast8_t find_byteorder32(word32 w, word32 expected)
{
    for (int_fast8_t  order = 0; order < 4; order++)
        if ((byteorder32(w, order)).u32[0] == expected.u32[0])
            return order;
    return -1;
}

These will return the order needed for byteorderN() to convert the byte order to current native byte order, or -1 if no byte order conversion of the given word w matches the expected word expected.
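To make the return values concrete, here is a compact, self-contained variant of the 32-bit case that works on plain uint32_t values instead of the word32 union. The names swap_order32 and find_order32 are hypothetical stand-ins for byteorder32() and find_byteorder32(); the bit manipulation is the same.

```c
#include <stdint.h>

/* Applies byte order transformation `order` (0..3) to a 32-bit value:
   bit 0 swaps the bytes within each 16-bit half, bit 1 swaps the halves. */
uint32_t swap_order32(uint32_t v, int order)
{
    if (order & 1)
        v = ((v >> 8) & UINT32_C(0x00FF00FF)) | ((v & UINT32_C(0x00FF00FF)) << 8);
    if (order & 2)
        v = ((v >> 16) & UINT32_C(0x0000FFFF)) | ((v & UINT32_C(0x0000FFFF)) << 16);
    return v;
}

/* Returns the order that turns `received` into `expected`, or -1 if none does. */
int find_order32(uint32_t received, uint32_t expected)
{
    for (int order = 0; order < 4; order++)
        if (swap_order32(received, order) == expected)
            return order;
    return -1;
}
```

If the agreed prototype value is 0x01020304 and the peer's bytes read as 0x04030201 locally, find_order32() returns 3 (fully swapped); if they read as 0x01020304, it returns 0 (same byte order).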

You can use a heuristic check to verify that the target architecture has the same byte order for floating-point and integer types, and that the floating-point types match the assumption above, via for example

Code: [Select]

    if (((word32){ .f[0] = 0.0498918667435646f }).u32[0] != UINT32_C(0x3D4C5B6A))
        fprintf(stderr, "Warning: Invalid 32-bit byte order, or 'float' not IEEE 754-2008 Binary32.\n");
    if (((word64){ .d[0] = -2.125982314494425 }).u64[0] != UINT64_C(0xc001020304050607))
        fprintf(stderr, "Warning: Invalid 64-bit byte order, or 'double' not IEEE 754-2008 Binary64.\n");

Most C compilers, when optimizing (-Og or -O2) this code, generate no machine code because they can determine at compile time that the code cannot ever issue a warning, given the target properties. Of course, instead of printing a warning to standard error, you can make this an assert() (#include <assert.h>).

DiTBho · « **Reply #19 on:** August 03, 2022, 06:42:09 pm »

Quote from: Nominal Animal on August 03, 2022, 05:48:11 pm

The primitive used for atomic access

CAS, CAS2, TAS, TAS2 are usually read-modify-write, looooooooooooooooong operation on the bus.
I used them on 68020 and 68030.

The CAS instruction compares the value in a memory location with the value in a data-register, and copies a second data register into the memory location if the compared values are equal. This provides a means of updating system counters, history information, and globally shared pointers. The instruction uses an indivisible read-modify-write cycle; after CAS reads the memory location, no other instruction can change that location before CAS has written the new value.

In a multiprocessor environment, the other processors must wait until the CAS instruction completes before accessing a global pointer, this doesn't perform well because it locks the bus.

read-modify-write is not used in pure RISC design (like MIPS) because for such special and lengthy bus operations the load / store requires a special stage to keep the pipeline busy which would be a big problem with superscalar machines like PowerPC where you would need special pipeline instructions { isync sync and eieio } to have minimal guarantees.

So, you will find CAS/TAS on neither PowerPC nor MIPS

LL/SC is better, and simpler to be implemented in a RISC design, and it's also better with multiprocessors because it doesn't lock the bus.
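From the C side, both primitives are usually reached through the same C11 <stdatomic.h> interface, and the compiler picks whatever the target offers. A minimal sketch of a compare-and-swap retry loop follows; cas_fetch_add and cas_demo are hypothetical names, and a real program would simply use atomic_fetch_add() for this particular job.

```c
#include <stdatomic.h>

/* Atomically adds `amount` to *counter with a compare-and-swap retry loop
   and returns the value the counter had before the addition. */
int cas_fetch_add(atomic_int *counter, int amount)
{
    int seen = atomic_load(counter);
    /* When the exchange fails, the current value is written to `seen`,
       so the next attempt works with fresh data. */
    while (!atomic_compare_exchange_weak(counter, &seen, seen + amount)) {
        /* retry */
    }
    return seen;
}

/* Single-threaded demonstration; returns 1 if the counter went from 5 to 8. */
int cas_demo(void)
{
    atomic_int counter;
    atomic_init(&counter, 5);
    int previous = cas_fetch_add(&counter, 3);
    return previous == 5 && atomic_load(&counter) == 8;
}
```

The "weak" variant is allowed to fail spuriously, which is what lets it map onto an LL/SC pair; that is why it sits inside a loop.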

SiliconWizard · « **Reply #20 on:** August 03, 2022, 07:21:41 pm »

Quote from: DiTBho on August 03, 2022, 12:04:22 pm

Quote from: King123 on August 03, 2022, 11:05:17 am
difference between int, short and long in context of c standards

char is always supposed to be 8bit

Well, nope. It's supposed to be at least 8-bit.

Quote from: DiTBho on August 03, 2022, 12:04:22 pm

short, int, long are always bigger than 8bit

Well. Not necessarily. They are supposed to be at least 16-bit for short and int, and at least 32-bit for long.
All that being implementation-defined per the standard.
Which means that on some implementation, char, short and int could all be the same, like 32-bit, or even wider, and it would still be compliant with the standard.

The standard gives minimum ranges for all those types. They are not guaranteed to have a different width.

DiTBho · « **Reply #21 on:** August 03, 2022, 09:46:42 pm »

Quote from: SiliconWizard on August 03, 2022, 07:21:41 pm

Well, nope. It's supposed to be at least 8-bit.

yeah, will a char always-always-always have 8 bits?

Technically not always, most often *de-facto* it will, and that's the big shit with C since on the vast majority of platforms out there (including the Linux kernel and u-boot), "char" is always assumed to be of the same size of a byte even if technically the C standard says that it's supposed to be at least 8-bit.

Why? Well, because "8 bit" on CPUs(1), MPUs and DSPs made in the 90s, 2000s, 2010s and 2020s (earlier CPUs were a bit different) is *de-facto* the smallest addressable amount of memory(2), and so *it should be* a char in C, not least because sizeof(char) is supposed to always return 1.

What I usually tend to assume

Code: [Select]

typedef unsigned char uint8_t;

(and it must be always verified)
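One way to do that verification at compile time, sketched with C11 _Static_assert. The typedef name my_u8 and the function char_bits are hypothetical, used here so the example does not clash with the real uint8_t from <stdint.h>.

```c
#include <limits.h>

/* Project-local 8-bit type, standing in for the assumed uint8_t typedef. */
typedef unsigned char my_u8;

/* Compilation stops here on a target where char is not 8 bits wide. */
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit char");

/* Always true on a conforming compiler: sizeof counts in units of char. */
_Static_assert(sizeof(my_u8) == 1, "sizeof(unsigned char) must be 1");

/* Number of bits in a char on this target, for run-time reporting. */
int char_bits(void)
{
    return CHAR_BIT;
}
```

On a target with 16-bit char, the first assertion fails at compile time instead of the assumption silently breaking at run time.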

And you know, it must be always verified because 99% of the time it's true on the vast majority of platforms out there, but then someday you find that "char" on Ti320 is 16bit and, worse still, sizeof(char) should return 1, but on some weird C-Ti320 compiler I have personally seen that sizeof(char) returns 2, which is "intuitively" correct, but it's completely wrong according to the C standard, because sizeof(char) MUST always return 1 even if the size of char is 2 bytes.

if char is 16bit, sizeof(char)=1
if char is 8bit, sizeof(char)=1

WTF?

That's pretty shit with the C language definition, which honestly is nothing but insane.

And, worse still, if you are interested in finding out just exactly how many bits of space your data types consume on your system, you can use the following line of code:

Code: [Select]

sizeof(type) * CHAR_BIT

So, that's how you can verify how many bits char is on a system

Code: [Select]

printf("The number of bits a 'char' has on my system: %zu\n", sizeof(char) * CHAR_BIT);

(taken from the GNU C Library Reference Manual)

crazy, but ... that's it

(1) CPUs, where "char" is 8 bit
8080, 8085, 8088, z80, z8000, etc
8051, 80c390, 80C400, etc
68hc11 and 68xx
m68xxx & Coldfire
m88xxx
PowerPC, PPC60x, 62x, 7xx, 74xx, e500, 40x, 440, 460, ...
POWER9, POWER10, POWER11
MIPS1, MIPS2, MIPS3, MIPS4, MIPS32, MIPS64
x86
SH1, SH2, SH3, SH4
HPPA1 and HPPA2
ARM*

(2) ok, on my R18200 there is a problem with the load / store unit, and it is not physically possible to output a single byte bus-cycle; if you want to access a byte, the CPU physically emits a uint64 bus-cycle, but the CPU is somehow still able to handle 8-bit granularity thanks to special instructions that a clever C compiler can use, so there is no reason to define char as 8 bytes in size.

Corollary
it's my personal opinion that defining a hardware architecture where char is more than 8bit, and you don't have any other way to access 8 bit, means the design is crappy.

In fact the Ti320, where char is 16 bit, sucks.

SiliconWizard · « **Reply #22 on:** August 03, 2022, 09:55:05 pm »

This is why stdint was introduced in C99.
Note that exact-width types are only optional, but they are supported on most platforms these days except possibly for the very odd ones.

While I (and many others) wholeheartedly suggest using stdint's when you need to have some control over the width of integers, since exact-width types are only optional, if you use them, then your code is not guaranteed to be strictly portable anymore. Yeah, just the way it is.

The odd targets on which char may not be 8-bit and on which none, or only some of the exact-width integers are defined, are usually DSPs these days.
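A sketch of how code can cope with the exact-width types being optional: <stdint.h> defines UINT32_MAX only when uint32_t exists, while uint_least32_t is always available. The names u32_wire, U32_WIRE_IS_EXACT and wrap32 are made up for this example.

```c
#include <stdint.h>

#ifdef UINT32_MAX
typedef uint32_t u32_wire;          /* exact-width type is available */
#define U32_WIRE_IS_EXACT 1
#else
typedef uint_least32_t u32_wire;    /* always available, but may be wider */
#define U32_WIRE_IS_EXACT 0
#endif

/* Keeps only the low 32 bits, so results wrap the same way whether or not
   u32_wire is exactly 32 bits wide. */
u32_wire wrap32(u32_wire value)
{
    return value & UINT32_C(0xFFFFFFFF);
}
```

On mainstream targets the first branch is taken and wrap32() is a no-op; on an odd DSP the masking does the work that the exact-width type would otherwise do implicitly.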

All this is not "insane", it all comes from the fact that the C standard has always had the goal of supporting a very wide range of targets while making it possible for compilers to produce efficient code.

westfw · « **Reply #23 on:** August 04, 2022, 08:25:16 am »

Quote

Are data type compiler-dependent or target dependent?

Compiler dependent.

But Compiler authors DO pay attention to the underlying CPU architecture...
You could theoretically write an ARM C compiler that made "int" 16 bits (say, if you were making a misguided attempt to address the 8bit microcontroller market), but AFAIK, no one has done that, and anyone who tried would probably be laughed at.
"char" is famously ambiguous as to whether it is signed or not (and different between different compilers.)
And breaking the 4Gbyte memory addressability limit introduced all sorts of "issues" when people were faced with pointers that maybe shouldn't be the same size as either "long" or "int."

HwAoRrDk · « **Reply #24 on:** August 04, 2022, 08:57:41 am »

Quote from: Nominal Animal on August 03, 2022, 03:14:00 pm

Personally, I do not trust cppreference.com at all when it comes to C, by the way. (Edited to note: It is a Wiki directed towards C++ developers, after all.) Just like I don't trust microsoft.com when it comes to Mac OS, either. In particular, any "C reference" that excludes POSIX C is worthless shite in my opinion. But you do you.

I frequently refer to the C language information on cppreference.com. Can't say I recall the information provided there ever leading me astray or being flat-out wrong.

That it doesn't cover POSIX stuff is even a boon for me when doing embedded stuff, because many embedded C platforms don't have POSIX libraries.

What resource would you suggest instead? Straight from the horse's mouth, poring over insipid C standard PDF documents?

