Topic: nesting of .h files and .h files which start with an underscore

Siwastaja

  • Super Contributor
  • Posts: 5789
  • Country: fi
« Reply #25 on: October 14, 2021, 09:28:04 am »
Quote from: newbrain
*in fairness, about C, I only remember one minor style thing where I did not.

Was it shift vs. division, my argument being explicit instead of implicit?  :)

If it was: I have become less strict about that reasoning.  If the exact asm instruction matters, one should carefully look at the assembly listing anyway, because being explicit in C is not always possible, even if it is with the shift.  So now I kinda agree with you on that as well.
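To make the shift-versus-division point concrete: for signed operands, / and % round toward zero (guaranteed since C99), whereas >> on a negative value is implementation-defined, and on GCC and clang it is an arithmetic shift that rounds toward negative infinity. So x / 2 and x >> 1 (or x % 8 and x & 7) are not interchangeable for negative x, which is why the compiler typically emits a few extra instructions for signed division by a power of two. A small sketch; the function names are made up for illustration:

```c
/* Signed: '/' and '%' truncate toward zero (C99 and later). */
static int div2(int x) { return x / 2; }
static int mod8(int x) { return x % 8; }

/* Signed: '>>' of a negative value is implementation-defined
   (an arithmetic shift on GCC and clang); '&' works on the bits. */
static int shr1(int x) { return x >> 1; }
static int and7(int x) { return x & 7; }

/* Unsigned: dividing by 8 and shifting right by 3 are exactly equivalent. */
static unsigned udiv8(unsigned x) { return x / 8u; }
static unsigned ushr3(unsigned x) { return x >> 3; }
```

For example, div2(-7) is -3 while shr1(-7) is -4 on GCC and clang, and mod8(-7) is -7 while and7(-7) is 1. For unsigned operands the pairs agree, and an optimizing compiler typically generates the same code for both spellings, which is essentially the "say what you mean" argument.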
« Last Edit: October 14, 2021, 09:30:58 am by Siwastaja »
 

Nominal Animal

  • Super Contributor
  • Posts: 3995
  • Country: fi
« Reply #26 on: October 14, 2021, 10:18:09 am »
Perhaps a simple example might be illustrative here.

Because of "stuff", I decided I wanted a simple C/C++ implementation of efficient (hardware-accelerated, if possible) 3- and 4-component vectors of 32-bit integers, and basic arithmetic operations (addition, subtraction, multiplication, division, dot product, and three-component cross product).

Note: This requires compiler-specific extensions, but those are available at least in GCC, clang, and Intel CC.  This is my design choice here; I chose to trade some portability for the efficiency of the generated code on SIMD-capable architectures in an architecture-agnostic manner.

Anyway, I started by creating a wrapper header file, ivec.h:
Code:
#ifndef   IVEC_H
#define   IVEC_H
/*
 *  Four-component hardware-accelerated vector type support
 *
 *  The r component is first, followed by the x, y, and z components.
*/

#include <stdint.h>

#define  ALT4(_1, _2, _3, _4, NAME, ...) NAME

#if defined(__clang__)
/* Clang has vector support built-in as an extension to C.
   Clang also defines __GNUC__, so it must be tested for first. */
typedef  int32_t  ivec __attribute__((vector_size (4 * sizeof (int32_t))));
#ifndef  PURE_ACCESSOR
#define  PURE_ACCESSOR  __attribute__((unused, const, always_inline)) static inline
#endif
#include "ivec_internal.h"

#elif defined(__GNUC__)
/* GCC has vector support built-in as an extension to C. */
typedef  int32_t  ivec __attribute__((vector_size (4 * sizeof (int32_t))));
#ifndef  PURE_ACCESSOR
#define  PURE_ACCESSOR  __attribute__((unused, const, always_inline)) static inline
#endif
#include "ivec_internal.h"

#else
/* Unsupported compiler. */
#error  File "ivec.h" does not contain an implementation for this compiler!
#endif

#endif /* IVEC_H */

Because the header file is wrapped in #ifndef IVEC_H, #define IVEC_H, and #endif /* IVEC_H */, it can safely be included more than once.  The idea is that any source or header file that needs the types or functionality does an #include "ivec.h".
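A sketch of what the preprocessor effectively sees when a guarded header ends up included twice in one translation unit (for example, once directly and once via another header). The POINT_H guard and struct point are hypothetical stand-ins for the contents of a header like ivec.h; the header body is pasted in twice by hand here, standing in for two #include directives:

```c
/* First "inclusion": POINT_H is not yet defined, so the body is kept. */
#ifndef  POINT_H
#define  POINT_H
struct point { int x, y; };   /* a struct may be defined only once per scope */
#endif /* POINT_H */

/* Second "inclusion": POINT_H is now defined, so the body is skipped,
   and struct point is not redefined (which would be a compile error). */
#ifndef  POINT_H
#define  POINT_H
struct point { int x, y; };
#endif /* POINT_H */

static int point_sum(const struct point p)
{
    return p.x + p.y;
}
```

Removing the guards makes this fail to compile with a redefinition error, which is exactly what the guards in ivec.h prevent.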

To detect the compiler, we can use predefined compiler macros.  Here, __GNUC__ is defined when compiling with GCC, and __clang__ when compiling with clang.  However, clang also defines __GNUC__, and so do recent versions of Intel C++, so to add compiler-specific quirks, one needs to test for __clang__ or __INTEL_COMPILER before __GNUC__.  (In ivec.h, the implementations for both compilers are currently identical, so the order does not matter yet.)  I don't have the Intel compiler installed on this particular machine (only some versions of gcc and clang), so I omitted it here.
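A minimal sketch of that detection order; COMPILER_NAME and compiler_name() are hypothetical names, not part of ivec.h. The more specific macros are tested before __GNUC__, because clang and the Intel compiler define __GNUC__ as well:

```c
/* Most specific first: testing __GNUC__ first would misreport
   clang and the Intel compiler as GCC. */
#if defined(__INTEL_COMPILER)
#define  COMPILER_NAME  "Intel"
#elif defined(__clang__)
#define  COMPILER_NAME  "clang"
#elif defined(__GNUC__)
#define  COMPILER_NAME  "GCC"
#else
#define  COMPILER_NAME  "unknown"
#endif

static const char *compiler_name(void)
{
    return COMPILER_NAME;
}
```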

The ALT4() macro is used to choose macros or functions depending on the number of arguments (2, 3, or 4).  It's one of those preprocessor "tricks" that look really odd at first, but are genuinely useful.  The way it is used is
    #define  generic_name(...)  ALT4(__VA_ARGS__, four_args, three_args, two_args)(__VA_ARGS__)
so that generic_name(a,b) expands to two_args(a,b); generic_name(a,b,c) expands to three_args(a,b,c); and generic_name(a,b,c,d) expands to four_args(a,b,c,d).  This works in both C and C++.
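A self-contained sketch of the same argument-counting dispatch, using made-up add2()/add3()/add4() functions. The trailing 0 in the ALT4() call is my own addition, not in ivec.h: it keeps the variadic part of ALT4() non-empty even with two arguments, which ISO C before C23 requires (GCC's -pedantic warns about it otherwise).

```c
#define  ALT4(_1, _2, _3, _4, NAME, ...) NAME

static int add2(int a, int b)               { return a + b; }
static int add3(int a, int b, int c)        { return a + b + c; }
static int add4(int a, int b, int c, int d) { return a + b + c + d; }

/* The arguments push the candidate names to the right, so the name that
   lands in the NAME slot is the one matching the argument count:
     add(1, 2)       -> ALT4(1, 2, add4, add3, add2, 0)       -> add2
     add(1, 2, 3)    -> ALT4(1, 2, 3, add4, add3, add2, 0)    -> add3
     add(1, 2, 3, 4) -> ALT4(1, 2, 3, 4, add4, add3, add2, 0) -> add4 */
#define  add(...)  ALT4(__VA_ARGS__, add4, add3, add2, 0)(__VA_ARGS__)
```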

The PURE_ACCESSOR macro should evaluate to static or static inline with compiler-specific attributes that describe the functions as operating only on their parameters (not accessing any other objects at all), so that calls with the same arguments always produce the same results.  The intent is to help the compiler optimize these pure accessor type helper functions as much as possible, without affecting the results.
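A sketch of how PURE_ACCESSOR might fall back on compilers without the GNU attributes, with a hypothetical clamp_i32() helper. On GCC and clang, const states that the result depends only on the arguments and no other memory is read, so calls with the same arguments can be merged or folded; unused silences -Wunused-function for helpers a given translation unit never calls; always_inline asks for inlining even when not optimizing.

```c
#include <stdint.h>

#if defined(__GNUC__)
#define  PURE_ACCESSOR  __attribute__((unused, const, always_inline)) static inline
#else
/* Without the attributes the results are the same, the compiler just gets fewer hints. */
#define  PURE_ACCESSOR  static inline
#endif

/* Hypothetical helper: depends only on its arguments, touches no other memory. */
PURE_ACCESSOR int32_t  clamp_i32(const int32_t v, const int32_t lo, const int32_t hi)
{
    return (v < lo) ? lo : (v > hi) ? hi : v;
}
```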

Both supported compilers currently use the same ivec_internal.h:
Code:
#ifndef IVEC_H
#error  Include "ivec.h", never "ivec_internal.h" directly!
#endif

#ifndef   IVEC_INTERNAL_H
#define   IVEC_INTERNAL_H

#include <stdint.h>

#define  IVEC4(x, y, z, r)  ((ivec){ r, x, y, z })
#define  IVEC3(x, y, z)     ((ivec){ 0, x, y, z })
#define  IVEC2(x, y)        ((ivec){ 0, x, y, 0 })
#define  IVEC(...)          ALT4(__VA_ARGS__, IVEC4, IVEC3, IVEC2)(__VA_ARGS__)

#define  IVEC_R(a)          ((a)[0])
#define  IVEC_X(a)          ((a)[1])
#define  IVEC_Y(a)          ((a)[2])
#define  IVEC_Z(a)          ((a)[3])

/* Accessors */
PURE_ACCESSOR int32_t  ivec_r(const ivec  a) { return a[0]; }
PURE_ACCESSOR int32_t  ivec_x(const ivec  a) { return a[1]; }
PURE_ACCESSOR int32_t  ivec_y(const ivec  a) { return a[2]; }
PURE_ACCESSOR int32_t  ivec_z(const ivec  a) { return a[3]; }

/* Definition (in a function form) */
PURE_ACCESSOR ivec  ivec4_def(const int32_t  x,
                              const int32_t  y,
                              const int32_t  z,
                              const int32_t  r)
{
    const ivec  result = { r, x, y, z };
    return result;
}

/* Definition (in a function form) */
PURE_ACCESSOR ivec  ivec3_def(const int32_t  x,
                              const int32_t  y,
                              const int32_t  z)
{
    const ivec  result = { 0, x, y, z };
    return result;
}

/* Definition (in a function form) */
PURE_ACCESSOR ivec  ivec2_def(const int32_t  x,
                              const int32_t  y)
{
    const ivec  result = { 0, x, y, 0 };
    return result;
}

/* Pick definition function based on number of arguments. */
#define  ivec_def(...)  ALT4(__VA_ARGS__, ivec4_def, ivec3_def, ivec2_def)(__VA_ARGS__)

/* Component-wise addition */
PURE_ACCESSOR ivec  ivec_add(const ivec  a, const ivec  b) { return a + b; }

/* Component-wise subtraction */
PURE_ACCESSOR ivec  ivec_sub(const ivec  a, const ivec  b) { return a - b; }

/* Component-wise multiplication */
PURE_ACCESSOR ivec  ivec_mul_ivec(const ivec  a, const ivec  b) { return a * b; }

/* Component-wise division */
PURE_ACCESSOR ivec  ivec_div_ivec(const ivec  a, const ivec  b) { return a / b; }

/* Component-wise multiplication by a scalar */
PURE_ACCESSOR ivec  ivec_mul_i32(const ivec  a, const int32_t  s)
{
    const ivec  b = { s, s, s, s };
    return a * b;
}

/* Component-wise division by a scalar */
PURE_ACCESSOR ivec  ivec_div_i32(const ivec  a, const int32_t  s)
{
    const ivec  b = { s, s, s, s };
    return a / b;
}

#ifdef __cplusplus
PURE_ACCESSOR ivec  ivec_mul(const ivec  a, const ivec  b) { return ivec_mul_ivec(a, b); }
PURE_ACCESSOR ivec  ivec_mul(const ivec  a, const int32_t  b) { return ivec_mul_i32(a, b); }
PURE_ACCESSOR ivec  ivec_div(const ivec  a, const ivec  b) { return ivec_div_ivec(a, b); }
PURE_ACCESSOR ivec  ivec_div(const ivec  a, const int32_t  b) { return ivec_div_i32(a, b); }
#else

/* Generic multiplication depends on the second parameter */
#define  ivec_mul(a, b)  _Generic((b),                \
                                 ivec: ivec_mul_ivec, \
                              default: ivec_mul_i32   )(a, b)

/* Generic division depends on the second parameter */
#define  ivec_div(a, b)  _Generic((b),                \
                                 ivec: ivec_div_ivec, \
                              default: ivec_div_i32   )(a, b)

#endif

/* Dot product of the x, y, and z components (r is ignored) */
PURE_ACCESSOR int32_t  ivec_dot3(const ivec  a, const ivec  b)
{
    const ivec  v = a * b;
    return v[1] + v[2] + v[3];
}

/* Dot product of all four components */
PURE_ACCESSOR int32_t  ivec_dot4(const ivec  a, const ivec  b)
{
    const ivec  v = a * b;
    return v[0] + v[1] + v[2] + v[3];
}

/* Cross product of the x, y, and z components; r component zero */
PURE_ACCESSOR ivec  ivec_cross3(const ivec  a, const ivec  b)
{
    const ivec  apos = { 0, a[2], a[3], a[1] },
                bpos = { 0, b[3], b[1], b[2] },
                aneg = { 0, a[3], a[1], a[2] },
                bneg = { 0, b[2], b[3], b[1] };
    return apos*bpos - aneg*bneg;
}

#endif /* IVEC_INTERNAL_H */

It first verifies that IVEC_H is defined, i.e. that this file was included through ivec.h and not directly.  It is also protected against multiple inclusion via IVEC_INTERNAL_H; that should not happen unless ivec.h has a bug, so the IVEC_H check alone would definitely be enough for now.  (The reason I have it is that I'll expand this with float and double vectors, and functions that convert between those vector types may need to include the internal headers for those types.)

The definition of the IVEC() macro uses the aforementioned ALT4() macro to pick the correct initialization macro.  If only three arguments are specified, the r component is initialized to zero, and if only two, both r and z components are initialized to zero.

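Since this file expects the compiler to provide the underlying vector (as in SIMD vector, not as in C++ std::vector) support, there is nothing unusual in the basic arithmetic operations.  Note that they operate on all four lanes, including r: ivec_div() and ivec_div_ivec() divide the r component too, so a divisor created with the three-argument IVEC() (r = 0) causes an integer division by zero in that lane.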
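For readers who haven't met the GCC/clang vector extension before, a minimal sketch of what it provides: ordinary operators apply lane by lane, and [] reads a single lane. v4i, muladd() and lane_sum() are illustrative names only:

```c
#include <stdint.h>

/* Four int32_t lanes in one 16-byte value, declared the same way as ivec. */
typedef  int32_t  v4i __attribute__((vector_size (4 * sizeof (int32_t))));

/* Lane-wise a*b + c; the compiler may use SIMD instructions for this. */
static v4i muladd(const v4i a, const v4i b, const v4i c)
{
    return a * b + c;
}

/* Horizontal sum: subscripting reads individual lanes. */
static int32_t lane_sum(const v4i v)
{
    return v[0] + v[1] + v[2] + v[3];
}
```

With a = {1, 2, 3, 4}, b = {5, 6, 7, 8} and c = {1, 1, 1, 1}, muladd() yields {6, 13, 22, 33}, whose lane sum is 74.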

In C++, ivec_mul() is an overloaded function, which calls either ivec_mul_ivec() or ivec_mul_i32(), depending on the type of the second parameter.
Similarly for ivec_div(), ivec_div_ivec(), and ivec_div_i32().

In C, ivec_mul() is a convenience macro based on the C11 _Generic() facility, expanding to ivec_mul_ivec() if the second parameter is an ivec, and to ivec_mul_i32() otherwise.
Similarly for ivec_div(), ivec_div_ivec(), and ivec_div_i32().
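The _Generic dispatch itself does not depend on vector types. A minimal C11 sketch of the same pattern, using a hypothetical two-component struct vec2 and made-up function names:

```c
typedef struct { int x, y; } vec2;

/* Component-wise product of two vectors. */
static vec2 vec2_mul_vec2(const vec2 a, const vec2 b)
{
    const vec2 r = { a.x * b.x, a.y * b.y };
    return r;
}

/* Product of a vector and a scalar. */
static vec2 vec2_mul_int(const vec2 a, const int s)
{
    const vec2 r = { a.x * s, a.y * s };
    return r;
}

/* Pick the function by the type of the second argument, then call it. */
#define  vec2_mul(a, b)  _Generic((b), vec2: vec2_mul_vec2, default: vec2_mul_int)(a, b)
```

One caveat with macros like this: an argument written as a compound literal, such as (vec2){ 2, 3 }, must be wrapped in an extra pair of parentheses, ((vec2){ 2, 3 }), because the comma inside the braces would otherwise split it into two macro arguments.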

ivec_cross3() uses four temporary vector constants to reorder the components, so that the difference of the products yields the expected 3D vector cross product.  This tends to let the compiler generate faster SIMD code, compared to just defining the result as an array with the component-wise formulae.
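A sketch of how the reordered-lane formula can be checked against the textbook cross product (x = ay*bz - az*by, and so on). The lane order (r, x, y, z) and the four shuffled vectors are copied from ivec_cross3(); the naive reference, the Xorshift64* generator, and the function names are my own illustrative additions. Input values are kept within [-1024, 1023] so that no product can overflow int32_t.

```c
#include <stdint.h>

typedef  int32_t  ivec __attribute__((vector_size (4 * sizeof (int32_t))));

/* Reordered-lane form, as in ivec_cross3(); lanes are (r, x, y, z). */
static ivec cross3_simd(const ivec a, const ivec b)
{
    const ivec  apos = { 0, a[2], a[3], a[1] },
                bpos = { 0, b[3], b[1], b[2] },
                aneg = { 0, a[3], a[1], a[2] },
                bneg = { 0, b[2], b[3], b[1] };
    return apos*bpos - aneg*bneg;
}

/* Naive reference on plain (x, y, z) arrays, easy to check by eye. */
static void cross3_naive(const int32_t a[3], const int32_t b[3], int32_t c[3])
{
    c[0] = a[1]*b[2] - a[2]*b[1];
    c[1] = a[2]*b[0] - a[0]*b[2];
    c[2] = a[0]*b[1] - a[1]*b[0];
}

/* Xorshift64* pseudo-random number generator; the state should be nonzero. */
static uint64_t xorshift64s(uint64_t *state)
{
    uint64_t  x = *state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    *state = x;
    return x * UINT64_C(0x2545F4914F6CDD1D);
}

/* Random value in [-1024, 1023]: the top 11 bits, shifted down and recentered. */
static int32_t small_random(uint64_t *state)
{
    return (int32_t)(xorshift64s(state) >> 53) - 1024;
}

/* Number of random input pairs for which the two forms disagree. */
static int cross3_mismatches(uint64_t seed, int rounds)
{
    int  bad = 0;
    for (int i = 0; i < rounds; i++) {
        int32_t  a[3], b[3], c[3];
        for (int k = 0; k < 3; k++) {
            a[k] = small_random(&seed);
            b[k] = small_random(&seed);
        }
        const ivec  va = { 0, a[0], a[1], a[2] },
                    vb = { 0, b[0], b[1], b[2] };
        const ivec  vc = cross3_simd(va, vb);
        cross3_naive(a, b, c);
        if (vc[0] != 0 || vc[1] != c[0] || vc[2] != c[1] || vc[3] != c[2])
            bad++;
    }
    return bad;
}
```

For the inputs used in the test program, (1, 2, 3) and (4, 5, 6), both forms give (-3, 6, -3).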

Here is a simple test program to visually verify that the operations work.  (Note that this is not a unit test: for a unit test, I'd use a PRNG, most likely Xorshift64*, to generate random test values, and compare the results to those of naïve versions implemented in the unit test, written in as easy-to-read/verify a form as possible.) test.c or test.cpp:
Code:
#include <stdlib.h>
#include <stdio.h>
#include "ivec.h"

static void describe_ivec(const char *description, const ivec v)
{
    printf("%s = (%d, %d, %d)\n", description, (int)IVEC_X(v), (int)IVEC_Y(v), (int)IVEC_Z(v));
}

int main(void)
{
    ivec  a = IVEC(1, 2, 3);
    ivec  b = IVEC(4, 5, 6, 1);  /* r = 1, so ivec_div(a, b) does not divide by zero in the r lane */

    describe_ivec("a", a);
    describe_ivec("b", b);
    describe_ivec("ivec_add(a, b)", ivec_add(a, b));
    describe_ivec("ivec_sub(a, b)", ivec_sub(a, b));
    describe_ivec("ivec_mul(a, b)", ivec_mul(a, b));
    describe_ivec("ivec_div(a, b)", ivec_div(a, b));
    describe_ivec("ivec_mul(b, 3)", ivec_mul(b, 3));
    describe_ivec("ivec_div(b, 2)", ivec_div(b, 2));
    describe_ivec("ivec_cross3(a, b)", ivec_cross3(a, b));
    printf("ivec_dot3(a, b) = %d\n", (int)ivec_dot3(a, b));

    return EXIT_SUCCESS;
}

You can compile the above (as either test.c or test.cpp) using gcc, g++, or clang; I haven't checked other compilers, but the Intel compiler at least either works or can be made to work by adding a small snippet of code to ivec.h.  (When compiled as C, _Generic requires C11 or later, which is the default in current gcc and clang.)

If you enable optimization (I tested with -O2) and examine the generated binary, you'll find that the example executable does not actually do any SIMD operations at all, and just prints precomputed constants!  This is exactly what I wanted: a desired side effect of how the compiler is allowed to handle the PURE_ACCESSOR-annotated helper functions.  (But if one or both vectors are generated at runtime, randomly or based on input, SIMD instructions will be used where the hardware has them, of course.)

(Additional functions, like component-wise minimum and maximum, and Manhattan distance (sum of component magnitudes), could use <immintrin.h> intrinsics on x86 and x86-64, or GCC x86/x86-64/ARM vector built-ins, and rely on the compiler to generate sane code from component-wise expressions on other architectures.  I omitted those to keep to some semblance of simplicity, but I'd probably put those alternatives into sub-headers ivec_internal_generic.h, ivec_internal_sse3.h, ivec_internal_avx.h, and ivec_internal_neon.h, for example.)
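A sketch of what such generic fallback variants might look like, written as plain per-lane expressions and left to the compiler to vectorize. ivec_min(), ivec_max(), ivec_manhattan3() and the scalar helpers are hypothetical names, not part of the headers above, and the Manhattan distance here is taken over the x, y, z lanes of the difference a - b:

```c
#include <stdint.h>

typedef  int32_t  ivec __attribute__((vector_size (4 * sizeof (int32_t))));

static inline int32_t i32_min(const int32_t p, const int32_t q) { return (p < q) ? p : q; }
static inline int32_t i32_max(const int32_t p, const int32_t q) { return (p > q) ? p : q; }
/* Assumes p != INT32_MIN, whose negation would overflow. */
static inline int32_t i32_abs(const int32_t p) { return (p < 0) ? -p : p; }

/* Component-wise minimum of all four lanes. */
static inline ivec ivec_min(const ivec a, const ivec b)
{
    const ivec  r = { i32_min(a[0], b[0]), i32_min(a[1], b[1]),
                      i32_min(a[2], b[2]), i32_min(a[3], b[3]) };
    return r;
}

/* Component-wise maximum of all four lanes. */
static inline ivec ivec_max(const ivec a, const ivec b)
{
    const ivec  r = { i32_max(a[0], b[0]), i32_max(a[1], b[1]),
                      i32_max(a[2], b[2]), i32_max(a[3], b[3]) };
    return r;
}

/* Manhattan distance over x, y, z: |ax - bx| + |ay - by| + |az - bz|. */
static inline int32_t ivec_manhattan3(const ivec a, const ivec b)
{
    const ivec  d = a - b;
    return i32_abs(d[1]) + i32_abs(d[2]) + i32_abs(d[3]);
}
```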



If I add fvec (four-component float (32-bit FP) vectors), I'll add an fvec.h and fvec_internal.h.

Then I'll also create a vecs.h that includes both ivec.h and fvec.h, and a new vecs_internal.h that defines the conversion functions between the ivec and fvec types.

This way, files or headers that use either type (or both, but not the conversion functions) just include "ivec.h" and/or "fvec.h", and those that also need the conversion functions include "vecs.h".  Because the features are split into the internal parts, simple include guards as implemented above take care of ordering and re-inclusion just fine.  Even including all three (ivec.h, fvec.h, vecs.h) is absolutely fine, and yields the same generated code as including only vecs.h.
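A sketch of the kind of conversion functions a vecs_internal.h could hold. fvec, ivec_to_fvec() and fvec_to_ivec() are hypothetical names, and the lane-by-lane casts are just one way to do it; GCC 9 and later, and clang, also provide __builtin_convertvector() for the same purpose:

```c
#include <stdint.h>

/* Hypothetical types: ivec as in ivec.h, fvec as its float counterpart. */
typedef  int32_t  ivec __attribute__((vector_size (4 * sizeof (int32_t))));
typedef  float    fvec __attribute__((vector_size (4 * sizeof (float))));

/* int32_t to float, lane by lane; exact for magnitudes up to 2^24. */
static inline fvec ivec_to_fvec(const ivec v)
{
    const fvec  r = { (float)v[0], (float)v[1], (float)v[2], (float)v[3] };
    return r;
}

/* float to int32_t truncates toward zero, like any C conversion;
   out-of-range values are undefined behavior, so callers must avoid them. */
static inline ivec fvec_to_ivec(const fvec v)
{
    const ivec  r = { (int32_t)v[0], (int32_t)v[1], (int32_t)v[2], (int32_t)v[3] };
    return r;
}
```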

In this scheme, therefore, #include "header" means "this particular source or header file requires the facilities declared by header".
When tracing the source code, it is common for a compilation unit to include the same header several times, just via different header files.

This scheme also plays perfectly well with dependency tracking.  For example, if you use gcc, the options -M -MF sourcefile.deps generate no object file, but save the Makefile rule listing the dependencies of the object file into sourcefile.deps.  These files can then be include'd in the Makefile, so that changes to any source or header files, including system header files (i.e. modification timestamps later than that of the dependent object file), will cause only the affected object files to be recompiled.  Usually, the dependency tracking target is called deps, so that if you create new files, delete old ones, or change any #include directives, you only need to run make deps to update the dependency information.  It only takes a few lines in the Makefile to automate all this, but the details depend very much on how you structure your source trees, so there is no single recipe that fits everyone.
« Last Edit: October 14, 2021, 10:24:11 am by Nominal Animal »
 

newbrain

  • Super Contributor
  • Posts: 1428
  • Country: se
« Reply #27 on: October 14, 2021, 12:54:41 pm »
Quote from: Siwastaja
Was it shift vs. division, my argument being explicit instead of implicit?  :)
Exactly, I'm honoured you remember or took the time to search for it!

As I have a "say what you mean"* approach to C programming, I think it also applies here to include rules: include what you need; do not rely on side effects.

*E.g. the mentioned '/' vs '>>' and '%' vs '&', using 'name[]' for a function parameter if it is an array rather than a pointer to a single object, etc.
I don't know everything; I only know what I know. (Nandemo wa shiranai wa yo, shitteru koto dake.)
 

