I'm trying to understand why compilers allocate additional memory (padding) when a structure is used in a program. I'm puzzled as to why the compiler doesn't allocate exactly the memory needed for each structure object.
In my observation, for a structure with two members, one of type int and the other of type char, the compiler allocates a total of 8 bytes (4 bytes for the int member and 4 bytes for the char member). I'm curious why an extra 3 bytes are allocated for the char member. Why doesn't the compiler allocate just 1 byte?
Could someone kindly explain the reason for structure padding in the C language?
First off, it's not about the C language, except that the C language defines that platforms are *allowed* to add padding bytes within a structure (with some restrictions). It's a platform-specific requirement.
The reason they do this, and the reason it is allowed, is that most CPUs require data of different types to be aligned (the lowest few bits of the address are zero) for maximum performance. For instance, on most 32-bit and 64-bit platforms, a 32-bit integer needs to be aligned to an address that is a multiple of 4 bytes. Failure to do that will either result in the CPU taking more cycles than necessary to load the data, or it could even cause a fault that either crashes the program or requires a software handler to fix.
When you write:
struct foo {
    int32_t a;  /* fixed-width type from <stdint.h> */
    char b;
};
the structure needs to be aligned on a 4-byte boundary for performance. For a single structure you might think it would be OK for the length to be 5, as long as the starting address is properly aligned. But if you want to make an array of struct foo, the elements need to be spaced 8 bytes apart to keep the alignment correct for every element. C doesn't allow padding *between* elements of arrays, so the struct itself must have padding. That is why sizeof(struct foo) == 8.
The way data is represented by compilers on a specific platform is governed by the platform ABI (application binary interface). This sets up rules to make sure that code interoperates properly, and includes specifying the data representation and layout of data types. The ABI will specify padding based on the architecture's requirements, and compilers will follow the ABI even when it wouldn't be necessary in a particular situation so that code can interoperate.
In C itself there is no such requirement. The actual quote, with the part you removed, is (deleted part underlined):
The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and size less than or equal to the size requested.
The reason you may see apparent alignment to some specific value is an implementation detail of the allocator, not a requirement of the language, as described in my post above. It’s also likely to be higher than a “word”. For example, on my x86_64 Linux system, glibc uses 32-byte alignment for one-byte allocations.
The C language is perfectly fine with creating objects as dictated by their actual alignment requirements. The example below works only in one direction (only the positive result is meaningful), and it must use the stack to avoid going through the allocator, but if it works on your platform, you can clearly see that the language itself doesn’t care at all:
#include <stdlib.h>
#include <stdio.h>
int main(void) {
char a[1];
char b[2];
char c[4];
char d[1];
char e[3];
char f[8];
printf("%p\n%p\n%p\n%p\n%p\n%p\n", (void *)a, (void *)b, (void *)c, (void *)d, (void *)e, (void *)f);
return EXIT_SUCCESS;
}
malloc and friends are also not a “higher-level interface” to brk, as they do a completely different thing. brk is merely a step in obtaining address space from the system. You wouldn’t be able to do anything with it alone. Management is done by the allocator, and malloc & co. are a standardized interface to the allocator. Relying on brk, or at least solely on brk, is also a somewhat dated concept. Nowadays mmap is used as well, or instead. For example, glibc uses a hybrid approach (https://www.gnu.org/software/libc/manual/html_node/The-GNU-Allocator.html).