Well, after much discussion with others it has emerged that a useful improvement would be a thing that looks a bit like a procedure or function but is not; it could simply be called an "intrinsic", a new kind of entity. Then implement things so that it is only inside an "intrinsic" that one can embed assembler; it will not be possible to embed assembler freely in any old code.
Furthermore, an intrinsic will specify a target in some way, perhaps as simply as "target(x64)", and then only the intrinsics that target the same target as the compiler (when it builds) will be visible and accessible to the rest of the code. If there are multiple intrinsics with the same name but differing targets, they can be resolved on that basis.
An intrinsic will appear like a procedure or function when referenced, but will not be one; it will represent a literal embedding of the associated machine instructions, with support for passing arguments and returning values in a similar way to many of today's common C intrinsics.
Yes, this pattern is already in use in some high-performance computational software in Linux. I've used it myself. In GNU C/C++, it looks roughly like
__attribute__((always_inline, target("machine-dependent-options")))
static inline return_type name(args...) {
    return_type return_value; // and any other local variables
    asm volatile( "extended assembly"
                : output-operands // return_value
                : input-operands
                : clobbers );
    return return_value;
}
See common function attributes and extended asm for the details. This is supported by both GCC and Clang C and C++ frontends, although some of the attribute names and definitions vary a bit; better use preprocessor macros.
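As a rough sketch of the "better use preprocessor macros" advice, the attribute and the per-target body can both be hidden behind macros. The names INTRINSIC, spin_hint and spin_hint_target are made up for this illustration, and the choice of pause/yield as the per-target instruction is just an assumption to have something small to show:

```c
#include <assert.h>

/* Hypothetical wrapper macro: force inlining where the attribute exists. */
#if defined(__GNUC__)
#  define INTRINSIC static inline __attribute__((always_inline))
#else
#  define INTRINSIC static inline
#endif

/* Reports which variant the preprocessor selected for this build. */
INTRINSIC const char *spin_hint_target(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return "x86";
#elif defined(__GNUC__) && defined(__aarch64__)
    return "aarch64";
#else
    return "generic";
#endif
}

/* Same name on every target; only the matching body is ever compiled. */
INTRINSIC void spin_hint(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ __volatile__("pause" ::: "memory");
#elif defined(__GNUC__) && defined(__aarch64__)
    __asm__ __volatile__("yield" ::: "memory");
#elif defined(__GNUC__)
    __asm__ __volatile__("" ::: "memory");  /* compiler barrier only */
#else
    /* no inline assembly available: do nothing */
#endif
}
```

This is more or less the "same name, differing targets" resolution done by hand: the rest of the code just calls spin_hint() and never sees the other variants.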
The key here is that it is extended assembly, not just text copy-pasted to an assembler with the result included in the object code. Instead of specifying exact register names, you define operands, with their constraints specifying which registers the compiler may choose for each operand. Each operand is automatically numbered in the order they're defined, from %0 onwards, but can also be named. For example, on 32-bit x86, you might use
__attribute__((always_inline, const))
static inline int64_t mul32(const int32_t lhs, const int32_t rhs) {
    int64_t result;
    asm volatile ( "imul\t%2"
                 : "=A" (result)
                 : "a" (lhs), "r" (rhs)
                 );
    return result;
}
instead of ((int64_t)((int32_t)(lhs))*(int64_t)((int32_t)(rhs))), to ensure you get a 64-bit product from two 32-bit multiplicands. The x86 imul machine instruction requires one multiplicand to be in the eax register (a constraint), but the other can be in any register (r constraint); and the result is put in the edx:eax register pair (A constraint).
(We could also have used the named version, "imul %[rh]" : "=A" (result) : "a" (lhs), [rh] "r" (rhs) ; but the numbering is more common. I find the named version easier to maintain.)
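For completeness, here is a sketch of the named-operand form as a whole function. It is not the exact function above: the name mul32_named is made up, the result comes back as two 32-bit halves ("=a" and "=d") instead of "=A" so that the same text also assembles correctly on x86-64 (where A does not mean the edx:eax pair for a 64-bit value), and there is a plain C fallback for other targets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; 32x32 -> 64-bit signed multiply with named operands. */
static inline int64_t mul32_named(int32_t lhs, int32_t rhs) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t lo, hi;
    /* One-operand imul: edx:eax = eax * operand (signed). */
    __asm__ ("imull %[rh]"
             : [lo] "=a" (lo), [hi] "=d" (hi)
             : "a" (lhs), [rh] "r" (rhs)
             : "cc");
    return (int64_t)(((uint64_t)hi << 32) | lo);
#else
    /* Portable fallback: let the compiler pick the instruction. */
    return (int64_t)lhs * (int64_t)rhs;
#endif
}
```

Calling mul32_named(-2, 3) should give -6 on either path; the names make it obvious which operand the instruction text refers to without counting colons.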
(In case you wonder, the GCC convention is to add newline and tab, \n\t, after each instruction, but not after the last instruction. Due to how GCC does this, this makes the code look "normal" if someone compiles the file to assembly using -S, like e.g. Compiler Explorer does.)
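A small sketch of that \n\t convention with two instructions, assuming x86 and a made-up function name add64_split: a 64-bit add done as add plus add-with-carry on 32-bit halves. The low half is marked earlyclobber (+&r) because it is written before the second instruction reads its inputs. Other targets just get the plain C expression.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: 64-bit addition from two 32-bit steps. */
static inline uint64_t add64_split(uint64_t a, uint64_t b) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t lo = (uint32_t)a;
    uint32_t hi = (uint32_t)(a >> 32);
    __asm__ ("addl %[bl], %[lo]\n\t"   /* newline + tab after each instruction */
             "adcl %[bh], %[hi]"       /* ...but not after the last one */
             : [lo] "+&r" (lo), [hi] "+r" (hi)
             : [bl] "r" ((uint32_t)b), [bh] "r" ((uint32_t)(b >> 32))
             : "cc");
    return ((uint64_t)hi << 32) | lo;
#else
    return a + b;
#endif
}
```

Compiling this with -S shows the two instructions on separate, tab-indented lines, matching the compiler's own output.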
When inlining, the compiler can choose the registers used to reduce unnecessary register moves. This is very important, because it means that instead of simply plopping down a copy of the inlined body, the compiler can optimize both the inlined body register choices, and the surrounding code, at the back-end/machine code level. (In other words, this is not usually translatable to intermediate representation... but it can yield pretty tight inlined code.)
Obviously, looking at this example above, the syntax GCC/Clang use for this is pretty bad. Nobody remembers the machine constraints; a list of allowed registers (and memory reference types) would be much better, for example. Also the named operand format in the assembly, %[name], is pretty cumbersome.
One option to consider is to let the entire function/intrinsic body be written in an assembly-like language, which is compiled by the same compiler, so that it can determine the possible registers to be used automatically, for example. This would be a very nice mechanism to embed other-language function-like objects, like OpenCL/CUDA/GPGL computing kernels, pixel shaders, and so on, in a much easier to maintain way; but it would require either a multi-language compiler, or co-operating compilers. But this is just one option to consider, and I'm just describing what I've thought of before myself having used the above pattern in real-world code (and how annoying it is to maintain – faster to rewrite, really, than it is to debug); I'm pretty sure there are even better ways of doing this.
I don't understand how that is different than a C function with the entire body an inline assembler block. Do you want the compiler to write the prologue and epilog and handle the ABI? Or do you want to do all that in assembly?
procedure breakpoint(X) intrinsic(arm) ;
emit ("bkpt 255");
emit ("bx lr");
end;
Extended syntax is pretty neat. It's peculiar, and the documentation isn't exactly exhaustive (I think... the docs are old, somewhat out of date?), and it varies between targets (it follows the traditional asm style(s) of the target), which I'm sure is a pain to keep all the docs updated or whatever. And of course it's peculiar, it has to encode additional information that asm never otherwise needs; in short, you need to tell the compiler what it can do with it and where/how.
I guess I just wish it weren't so god awful ugly? Get rid of all the quotes and tabs and newlines, it's all in the source as it is, just consume it verbatim, come on... Anyway.
Like, I finally sat down and crafted this operation last year:
/**
 * Multiplies two 16-bit integers, with rounding, as an intermediate
 * format in 16.16 fixed point, returning the top (integral, 16.0) part.
 */
uint16_t asm_umul16x0p16(uint16_t a, uint16_t b) {
    uint16_t acc;
    // acc = (((uint32_t)a * (uint32_t)b) + 0x8000ul) >> 16;
    __asm__ __volatile__(
        "mul %A[argB], %A[argA]\n\t"
        "mov r19, r1\n\t"
        "mul %B[argB], %B[argA]\n\t"
        "movw %A[aAcc], r0\n\t"
        "mul %A[argB], %B[argA]\n\t"
        "add r19, r0\n\t"
        "adc %A[aAcc], r1\n\t"
        "eor r18, r18\n\t"
        "adc %B[aAcc], r18\n\t"
        "mul %B[argB], %A[argA]\n\t"
        "adc r19, r0\n\t"
        "adc %A[aAcc], r1\n\t"
        "adc %B[aAcc], r18\n\t"
        "subi r19, 0x80\n\t"
        "sbci %A[aAcc], 0xff\n\t"
        "sbci %B[aAcc], 0xff\n\t"
        "eor r1, r1\n\t"
        : [aAcc] "=&d" (acc)
        : [argA] "r" (a), [argB] "r" (b)
        : "r18", "r19"
    );
    return acc;
}
On AVR8, there is only 8x8 hardware multiply; this gets used inline well enough (and even 8x16 if you only need the low part), but 16x16 is implemented with a software library (_mulhisi3, etc.)*, the object of which is static, no optimization semantics or anything -- so it's never inlined, and usually makes a mess when patching up to whatever registers the calling function is using.
*Which interestingly enough, don't strictly obey the ABI; they bend it a bit to get better register utilization. (So, it's not even as bad as it could be. Mind, not to say what they did is bad in all respects; it's simply a cromulent solution. Clearly there is room to improve, but it's certainly better than a completely stand-alone function call would be.) I'm not sure what custom patchups/hacks are internal to GCC to enable this.
So I copied this bit from an earlier pure-asm module, fixed up the register allocations, and used the extended syntax. This can be inlined properly, and only uses two extra registers (the clobbers), which are normally free to use (r18-r27 are call-used, i.e. call-clobbered, registers in the avr-gcc ABI, so any function can use them without having to push/pop).
At least, I think I got the allocations and everything right? Maybe there's a few edge cases I forgot, but it compiled correctly (output matches on inspection) in the contexts I used it on. (Yes yes, one can work out all the constraints perfectly, well, must be nice to be able to reason about these things. I'm a bit hopeless on combinatorial problems like that, I'm afraid.)
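One way to gain some confidence without reasoning through every constraint is to compare against plain C. The sketch below is only an assumption about how such a check could look (the names ref_umul16x0p16 and parts_umul16x0p16 are made up): the first function is the one-line formula from the comment in the asm version, the second rebuilds the same product from four 8x8 partial products, which is the general shape of what the mul sequence does, though not a line-by-line translation of it.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: the formula from the comment in the asm version. */
static inline uint16_t ref_umul16x0p16(uint16_t a, uint16_t b) {
    return (uint16_t)((((uint32_t)a * (uint32_t)b) + UINT32_C(0x8000)) >> 16);
}

/* Same result built from 8x8 partial products, as an 8-bit multiplier would. */
static inline uint16_t parts_umul16x0p16(uint16_t a, uint16_t b) {
    uint32_t al = a & 0xFFu, ah = (uint32_t)a >> 8;
    uint32_t bl = b & 0xFFu, bh = (uint32_t)b >> 8;
    uint32_t sum = al * bl;        /* weight 2^0  */
    sum += (al * bh) << 8;         /* weight 2^8  */
    sum += (ah * bl) << 8;         /* weight 2^8  */
    sum += (ah * bh) << 16;        /* weight 2^16 */
    sum += UINT32_C(0x8000);       /* round to nearest */
    return (uint16_t)(sum >> 16);  /* keep the top 16 bits */
}
```

On the target, looping over a set of inputs and comparing asm_umul16x0p16 against ref_umul16x0p16 would catch most allocation mistakes, since a wrong register choice tends to corrupt results rather than fail quietly.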
So, besides being a bit shorter, it's got no calling overhead (when inlined), and less register pressure overhead, which did nicely for its purpose. I forget if it fully halved the cycles in the critical path, but it did help out.
Tim
That would be niiice.
Practical real-world experience tends to provide very useful information in the design process.
That's why I suggested the OP get some experience before starting loooong discussions like he did in his topic.
Just, in Avionics you cannot use assembly inline, so when you need machine instruction level, you have to *segregate* code into assembly modules.
So who said you "cannot use assembly inline" in an avionics setting? What sources do you have for this claim?
It's not "incapable of using", it's "not allowed to use", i.e. due to design guidelines required by the client the software is written for. Specs like MISRA C.
Ada fully supports embedding assembler code, Ada Core's GNAT Pro
So who said you "cannot use assembly inline" in an avionics setting? what sources do you have for this claim? I do hope you have sufficient experience to discuss this here...
GNAT is not used in avionics, 99% of times, Green Hills Ada is what I find and use
Ada and GNAT Pro see a growing usage in high-integrity and safety-certified applications, including commercial aircraft avionics, military systems, air traffic management/control, railroad systems, and medical devices, and in security-sensitive domains such as financial services.
Lockheed Martin Aeronautics, Marietta, Georgia, will be using GNAT Pro to develop the Flight Management System Interface Manager and Radio Control software on the C-130J Super Hercules aircraft.
DO178B-C, with all of its integrated rules.
But once again, what is actually stipulated in this document that you interpret as a prohibition on the use of assembler or inline assembler?
It's not "incapable of using", it's "not allowed to use", i.e. due to design guidelines required by the client the software is written for.
Yup, ... you are allowed to use it, but *ONLY* if the project is level D or E.
Just, in Avionics you cannot use assembly_inline,
DO-178B is not a standard, it was in fact a guideline, you should be aware of DO-178C, the devil's in the detail as they say.
Yet you wrote earlier: "Just, in Avionics you cannot use assembly inline,"
Levels A and B are for supersonic aircraft units [...], C can be used in FoM and MotoGP [...] and it's good in automotive.
Level D and E are for domestic gears and common units used in offices and coffee/food machines.
cannot = not allowed; if that's a cognitive problem, consider it never written.
Because DiTBho referred to avionics projects, i.e. level A and B (and not D or E):
Levels A and B are for supersonic aircraft units [...], C can be used in FoM and MotoGP [...] and it's good in automotive.
Level D and E are for domestic gears and common units used in offices and coffee/food machines.
So, no conflict. Avionics = A, B; and A, B, C do not allow inline assembly. What's your problem?
My problem is that the claim that inline assembler is not permitted by some standard or other, in avionics software (in whatever context you care to dream up) remains unsubstantiated, anecdotal, conjecture.
If, as has been repeatedly and dogmatically asserted, this is true, then it should be a simple matter to quote said paragraph here and cite the source. That's my problem.
I am simply asking for evidence, yet none is being presented, so I consider it false, or at best a misinterpretation, unless you can show why I should regard it otherwise.
I am simply asking for evidence
You mean, the kind where my experience is "anecdotes", but the opinion of a random person whose blog you quote is "proof"?
No, you're playing the kind of social word games that you're used to playing in your workplace, instead of doing actual work.
DiTBho does work for clients that require DO-178 compliance. They're just telling you what their clients demand.
If you do not trust a single word DiTBho says anyway, why would anyone bother to prove to you anything related to what DiTBho says?
We've tried that with electrodacus and aetherist, and it just does not work, even when we do. You'll find a way to dodge that and find some other detail to complain about.
If you weren't such a colossal asshole, you would have just said "OK, my mistake; apologies." and moved on.
Instead, you scrambled like crazy to find any detail at Wikipedia or StackOverflow/StackExchange that you could use to take a snipe at DiTBho.
Stop it. This is a technical forum, not some social media politically-correct snide-fest. Talk tech, or take a walk.
This is America
Nope.
Dave's site, Dave's forum, Dave's server, so this is Australia, mate.
It does tell a lot about you to assume this is America, though. Go stick your head somewhere moist.