In some parts of my code, I use short critical sections (all interrupts disabled) to avoid concurrency issues. But I find the C compiler is extending the time spent in these critical sections by moving work into them that doesn't need to be there. Here's a specific example:
#define ZINLINE __attribute__((always_inline)) static __inline__

ZINLINE void MultiBitClear16(UI16 volatile * addr, UI16 mask) {
    mask = ~mask;
    UI32 _mbcs = __builtin_disable_interrupts();   // save status, disable interrupts
    *addr &= mask;
    _mtc0(_CP0_STATUS, _CP0_STATUS_SELECT, _mbcs); // restore prior status
}

ZINLINE void BitClear16(volatile UI16* addr, UNATIVE bit) {
    MultiBitClear16(addr, (1 << bit));
}
When I call BitClear16 and disassemble the generated code, I find that two steps (shift and invert) occur inside the critical section, even though they could have been done before it:
DI V1 // save status register and disable interrupts
EHB
LW A1, -32720(GP) // load *addr
SLLV A0, S3, V0 // mask=1<<bit
NOR A0, ZERO, A0 // mask=~mask
AND A0, A0, A1 // *addr&=mask
SW A0, -32720(GP) // store *addr
MTC0 V1, Status // restore prior status
EHB
And depending on the context in which BitClear16 is called, I suspect it could be worse. For example, if I called:
BitClear16(addr,bit+1);
There is nothing preventing the compiler from opting to do the +1 inside the critical section as well. It seems like ensuring that operations occur in a particular order would be a common need, so I did some research and found references to this being done with a "compiler barrier":
asm volatile ("" ::: "memory")
But this doesn't help me. The offending instructions operate on registers, not memory, so a memory clobber places no constraint on them. (I tried it anyway, just to be sure.)
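To make the attempt concrete, here is a minimal sketch of where such a barrier would sit. It is illustrative only: the typedefs are assumed definitions of UI16 and UI32, the function name MultiBitClear16_barrier is made up for this example, and irq_disable/irq_restore are hypothetical host-side stand-ins for __builtin_disable_interrupts and _mtc0 so that the snippet compiles off-target with GCC or Clang.

```c
#include <stdint.h>

typedef uint16_t UI16;  /* assumed definition of the typedef used above */
typedef uint32_t UI32;  /* assumed definition of the typedef used above */

/* Hypothetical stand-ins for the target's interrupt control, so the
 * sketch builds on a host. Bit 0 of fake_status means "interrupts enabled". */
static UI32 fake_status = 1;

static UI32 irq_disable(void)
{
    UI32 old = fake_status;
    fake_status &= ~(UI32)1;    /* clear the enable bit */
    return old;                 /* hand back the prior status */
}

static void irq_restore(UI32 old)
{
    fake_status = old;          /* put the prior status back */
}

static inline void MultiBitClear16_barrier(UI16 volatile *addr, UI16 mask)
{
    mask = (UI16)~mask;

    /* Compiler barrier: tells the compiler that memory may have changed,
     * so memory accesses are not moved across this point. It names no
     * register operands, so it says nothing about when 'mask' must be
     * computed. */
    __asm__ volatile ("" ::: "memory");

    UI32 saved = irq_disable(); /* start of critical section */
    *addr &= mask;
    irq_restore(saved);         /* end of critical section */
}
```

The function still behaves correctly. The point is only that the barrier constrains memory accesses, while the shift and invert live purely in registers and remain free to be scheduled after the interrupt-disable step.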
Is there some other hint I can provide to the compiler to prevent this? Preferably other than marking MultiBitClear16 with the 'noinline' attribute (adds overhead), or coding it in pure ASM (adds headaches)?