One ability I'd like to have is to tell the compiler to accomplish unrelated things in whichever order is best.
You'd probably need to be more specific though. At high optimization levels, good C compilers already re-order operations when they can safely do so. You probably had something more advanced in mind, would you care to give an example?
I'd like to allow reordering and especially interleaving of operations, ignoring sequence points and the order of side effects. (If you consider C++ atomics and their memory-ordering models, you'll immediately see how this is essentially extending those models to cover sequence points and the order of side effects.)
Essentially, I'd like to be able to tell the compiler that if I have code like
{ v1 = complex_expression_one(); v2 = complex_expression_two(); }
the compiler is allowed to evaluate the contents of the block in parallel (interleaved, using a single thread of execution), completely ignoring sequence points or the order of side effects.
These code segments are rare, but very hot, and I'd like to also annotate them so the compiler knows to do lots of extra work to optimize this particular chunk of code, even as far as brute-forcing some details. Technically, you could do that via attributes or pragmas, but I'd like the language to natively support it.
I really liked the Fortran 95 FORALL loop (where the iteration order is undefined), but they're deprecating that in future Fortran standards.
I've seen that kind of loop in ParaSail (the language I created a thread for - didn't get much traction), I think. Of course it's just a curiosity more than anything else at the moment.
Right. It is in the same category as memrep(buffer, offset, length), the complement of memcpy()/memmove(), in the sense that it fills buffer with repeated copies of its first offset bytes. (Implementation-wise, it is the inverse of memcpy() with respect to memmove(): an overlapping copy that deliberately exploits the overlap memmove() guards against. It is very useful for initializing arrays with structures and/or floating-point members, since only the storage representation matters. While you can trivially implement it yourself, it should be something the compiler can optimize for the target architecture; i.e., in GCC, a built-in function, and not just a library function.)
From the little I've explored of OpenMP, I think you can definitely do that with it.
OpenMP parallelizes loops across multiple threads. That's not what I am talking about. I'm talking about generating machine code that interleaves several operations on superscalar architectures with enough registers.
Also, like I said, I want this even closer to the metal than C, and OpenMP is quite a complicated abstraction, with a lot of hidden costs in the thread management. When writing kernels, or simulators using MPI and a fixed number of threads per node (which is typical), OpenMP is a square peg in a round hole.
Does anyone know of a programming language that exposes the flag register after operations or function calls? The standard flags (Zero, Carry, Negative, Overflow) would be rather useful. I do realize that it would require emulation on MIPS and RISC-V at least, as they don't have them in a dedicated register.