Well, after much discussion with others, it has emerged that a useful improvement would be a construct that looks a bit like a procedure or function but is not one. It could simply be called an "intrinsic", a new kind of entity. Assembler could then be embedded only inside an "intrinsic"; it would not be possible to embed assembler freely in any old code.
Furthermore, an intrinsic will specify a target in some way, perhaps as simply as "target(x64)". Only the intrinsics whose target matches the one the compiler is building for will be visible and accessible to the rest of the code, and if there are multiple intrinsics with the same name but differing targets, they can be resolved on that basis.
An intrinsic will look like a procedure or function when referenced, but will not be one: it will represent a literal embedding of the associated machine instructions, with support for passing arguments and returning values, much like many of today's common C intrinsics.
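For comparison, here is a rough sketch of how "same name, different target" resolution can be approximated in GNU C today, with the preprocessor standing in for the proposed target(...) clause. The name swap32 is hypothetical and only for illustration; __x86_64__, __i386__ and __aarch64__ are the target macros GCC and Clang predefine. Only the definition matching the build target is ever visible to the rest of the code.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
/* x86 variant: BSWAP reverses the four bytes of a 32-bit register in place,
   so the operand is both input and output ("+r"). */
static inline uint32_t swap32(uint32_t value) {
    __asm__ ("bswap %0" : "+r" (value));
    return value;
}
#elif defined(__aarch64__)
/* AArch64 variant: REV on 32-bit (w) registers reverses the byte order. */
static inline uint32_t swap32(uint32_t value) {
    uint32_t result;
    __asm__ ("rev %w0, %w1" : "=r" (result) : "r" (value));
    return result;
}
#else
/* Any other target: plain C fallback with the same name and signature. */
static inline uint32_t swap32(uint32_t value) {
    return (value >> 24) | ((value >> 8) & 0x0000FF00u)
         | ((value << 8) & 0x00FF0000u) | (value << 24);
}
#endif
```

Callers just write swap32(x) and never see which variant was selected, which is roughly the behaviour described for intrinsics above, minus the language-level support.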
Yes, this pattern is already in use in some high-performance computational software on Linux. I've used it myself. In GNU C/C++, it looks roughly like
__attribute__((always_inline, target("machine-dependent-options")))
static inline return_type name(args...) {
    return_type return_value; // and any other local variables
    asm volatile( "extended assembly"
                : output-operands // return_value
                : input-operands
                : clobbers );
    return return_value;
}
See common function attributes and extended asm for the details. This is supported by both the GCC and Clang C and C++ frontends, although some of the attribute names and definitions vary a bit, so it's better to wrap them in preprocessor macros.
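A minimal sketch of that kind of macro wrapping, assuming a compiler that either provides __has_attribute (Clang, and GCC 5 or later) or can live with a plain static inline. FORCE_INLINE and rotl32 are hypothetical names picked for the example, not anything GCC or Clang define.

```c
#include <stdint.h>

/* Compilers without __has_attribute get a stub that always answers "no". */
#ifndef __has_attribute
#  define __has_attribute(x) 0
#endif

/* Use the always_inline attribute where it exists, else just static inline. */
#if __has_attribute(always_inline)
#  define FORCE_INLINE __attribute__((always_inline)) static inline
#else
#  define FORCE_INLINE static inline
#endif

/* Example use: rotate a 32-bit value left by count bits (count taken mod 32). */
FORCE_INLINE uint32_t rotl32(uint32_t value, unsigned count) {
    count &= 31u;
    return (value << count) | (value >> ((32u - count) & 31u));
}
```

The attribute spelling then lives in one place, so a compiler that names or defines it differently only needs another branch in the macro, not a change in every function.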
The key here is that it is extended assembly, not just text copy-pasted to an assembler with the result included in the object code. Instead of specifying exact register names, you define operands, with their constraints specifying which registers the compiler may choose for each operand. Operands are automatically numbered in the order they're defined, from %0 onwards, but can also be named. For example, on 32-bit x86, you might use
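__attribute__((always_inline, const))
static inline int64_t mul32(const int32_t lhs, const int32_t rhs) {
    int64_t result;
    asm volatile ( "imul\t%2"            // %2 is rhs; lhs is already in eax
                 : "=A" (result)         // %0: 64-bit product in edx:eax
                 : "a" (lhs), "r" (rhs)  // %1: eax, %2: any general register
                 );
    return result;
}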
instead of ((int64_t)((int32_t)(lhs))*(int64_t)((int32_t)(rhs))), to ensure you get a 64-bit product from two 32-bit multiplicands. The one-operand form of the x86 imul machine instruction requires one multiplicand to be in the eax register (a constraint), but the other can be in any register (r constraint); the result is put in the edx:eax register pair (A constraint).
(We could also have used the named version, "imul %[rh]" : "=A" (result) : "a" (lhs), [rh] "r" (rhs) ; but the numbering is more common. I find the named version easier to maintain.)
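To make the named form concrete, here is a sketch of the same multiplication written out in full with named operands. It departs from the 32-bit example in a few labeled ways: on x86-64 the A constraint does not name the edx:eax pair, so the two halves are returned through separate "=a" and "=d" outputs; volatile is dropped because the result depends only on the inputs; and a plain C fallback is included for non-x86 targets. The name mul32_named is hypothetical.

```c
#include <stdint.h>

static inline int64_t mul32_named(int32_t lhs, int32_t rhs) {
#if defined(__x86_64__) || defined(__i386__)
    uint32_t low, high;
    /* One-operand IMUL: edx:eax = eax * operand (signed). */
    __asm__ ("imull %[rh]"
             : [lo] "=a" (low), [hi] "=d" (high)  /* product halves */
             : [lh] "a" (lhs), [rh] "r" (rhs)     /* lhs must be in eax */
             : "cc");                             /* flags are modified */
    /* Reassemble the 64-bit signed product from its two 32-bit halves. */
    return (int64_t)(((uint64_t)high << 32) | low);
#else
    /* Assumed fallback for other targets: let the compiler widen. */
    return (int64_t)lhs * (int64_t)rhs;
#endif
}
```

Only [rh] appears in the template; the other names are there so each operand line documents itself, which is most of the maintenance benefit.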
(In case you wonder, the GCC convention is to add newline and tab, \n\t, after each instruction, but not after the last instruction. Due to how GCC does this, this makes the code look "normal" if someone compiles the file to assembly using -S, like e.g. Compiler Explorer does.)
When inlining, the compiler can choose the registers used to reduce unnecessary register moves. This is very important, because it means that instead of simply plopping down a copy of the inlined body, the compiler can optimize both the inlined body register choices, and the surrounding code, at the back-end/machine code level. (In other words, this is not usually translatable to intermediate representation... but it can yield pretty tight inlined code.)
Obviously, looking at this example above, the syntax GCC/Clang use for this is pretty bad. Nobody remembers the machine constraints; a list of allowed registers (and memory reference types) would be much better, for example. Also the named operand format in the assembly, %[name], is pretty cumbersome.
One option to consider is to let the entire function/intrinsic body be written in an assembly-like language, which is compiled by the same compiler, so that it can determine the possible registers to be used automatically, for example. This would be a very nice mechanism to embed other-language function-like objects, like OpenCL/CUDA/GPGL computing kernels, pixel shaders, and so on, in a much easier to maintain way; but it would require either a multi-language compiler, or co-operating compilers. But this is just one option to consider, and I'm just describing what I've thought of before myself, having used the above pattern in real-world code (and knowing how annoying it is to maintain: faster to rewrite, really, than it is to debug); I'm pretty sure there are even better ways of doing this.
Yes, the register protection is an interesting point. I guess today the coder has to be very careful to save and restore as needed; some machine help here would be nice. I looked at ARM assembly language (Thumb) in some detail, and it seems rather straightforward to design a text->binary operation, so the actual assembler conversion can readily be coded in a uniform manner in .Net; a small library per target seems very doable.
I did a code generator years ago for x86 (originally 16-bit, then later 32-bit) and found that rather interesting work. This is simpler too: literally just a text conversion and validation, perhaps even doable without a parser; simple lexing/regex might do it.
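As a rough illustration of how small such a text->binary step can be, here is a sketch that handles just one 16-bit Thumb encoding family: move/compare/add/subtract with an 8-bit immediate on a low register (bits 15-13 = 001, a 2-bit opcode, a 3-bit register, then the immediate). It is written in C to match the other code in this thread rather than .Net, uses sscanf in place of a lexer, and thumb_encode_imm8 is a hypothetical name. A real per-target library would need many more instruction formats than this.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Encode "<op> r<0-7>, #<0-255>" where op is movs, cmp, adds or subs.
   Stores the 16-bit instruction in *out and returns 0, or returns -1 if the
   line does not parse or an operand is out of range. */
static int thumb_encode_imm8(const char *line, uint16_t *out) {
    /* Array index is the 2-bit opcode field: 0=MOV, 1=CMP, 2=ADD, 3=SUB. */
    static const char *const mnemonics[4] = { "movs", "cmp", "adds", "subs" };
    char op[8];
    unsigned reg, imm;
    int consumed = 0;

    /* %n records how much input was consumed, to reject trailing junk. */
    if (sscanf(line, " %7[a-z] r%u , #%u %n", op, &reg, &imm, &consumed) != 3)
        return -1;
    if (consumed == 0 || line[consumed] != '\0')
        return -1;
    if (reg > 7 || imm > 255)
        return -1;

    for (unsigned i = 0; i < 4; i++) {
        if (strcmp(op, mnemonics[i]) == 0) {
            *out = (uint16_t)(0x2000u | (i << 11) | (reg << 8) | imm);
            return 0;
        }
    }
    return -1; /* unknown mnemonic */
}
```

For instance, "movs r0, #1" comes out as 0x2001 and "adds r1, #5" as 0x3105, while "movs r8, #1" is rejected because this encoding only reaches r0 to r7.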
The operation of the compiler choosing registers is quite interesting, but it would require the compiler to know the exact nature of the code at every call site. That's a global analysis, with no room for linking in precompiled object files that might also leverage an intrinsic.
I too find the text strings a little unwieldy, so I'm wondering if there might be an abstract grammar that could work for any assembly language. I'd need to do some research on this, as it could make the code much more readable and open the door to syntax coloring for assembler code. (Take a look into language servers, they are very powerful.)
The Antlr grammar tools are very powerful too. One can define context-aware grammars so that when something is recognized, the tokenizer can enter a mode specific to that thing. This would literally enable support for this: when the compiler sees the intrinsic target, it can enter a target-specific assembler mode and recognize that specific target's assembly syntax, totally eliminating the crude strings.
This is precisely how I'd implement, say, comments with embedded XML directives like those seen in Java or C#; it's designed for this kind of parsing.
If you're interested, here are some formal Antlr grammars for a bunch of targets.