EEVblog Electronics Community Forum

Electronics => Microcontrollers => Topic started by: syntax333 on September 22, 2021, 05:24:09 am

Title: Is this Circular Event Buffer thread safe? (STM32G431)
Post by: syntax333 on September 22, 2021, 05:24:09 am
I created a circular buffer structure for my MCU(STM32G431) where elements of structure are:


I use the events to drive a state machine. For my application ISRs are producers and my main is consumer. ISRs cannot preempt each other, each has same priority. ISRs push events to head index of event buffer and increment head variable while my main code pops events from tail index of event buffer and increment tail variable. If head or tail pointer reach end of the buffer they simply wrap around.

In main code I check the buffer like:
Code: [Select]
CircBuffer.Head = 0;
CircBuffer.Tail = 0;
enable_ISRs();

while(1){
  if(CircBuffer.Head == CircBuffer.Tail)
  {
  //Low power mode
  }
  else
  {
    statemachine_dispatch(CircBuffer.EventBuff[CircBuffer.Tail])
    increment CircBuffer.Tail & BUFFLEN_MASK
  }
}

In ISRs I add event to buffer like:

Code: [Select]
void example_ISR1()
{
  clear flag
  CircBuffer.EventBuff[CircBuffer.Head] = example_EVENT1;
  increment CircBuffer.Head & BUFFLEN_MASK
}
void example_ISR2()
{
  clear flag
  CircBuffer.EventBuff[CircBuffer.Head] = example_EVENT2;
  increment CircBuffer.Head & BUFFLEN_MASK
}

If I have to post an event to event buffer from main I disable and enable the interrupts.

I was wondering if this structure is thread safe or do I have to disable and enable the interrupts in my main loop while checking if buffer is empty (Head==Tail, No events to process) or dispatching the events or incrementing tail variable? Am I missing something?
Title: Re: Is this Circular Event Buffer thread safe? (STM32G431)
Post by: capt bullshot on September 22, 2021, 07:45:25 am
I created a circular buffer structure for my MCU(STM32G431) where elements of structure are:

... snip ...

I was wondering if this structure is thread safe or do I have to disable and enable the interrupts in my main loop while checking if buffer is empty (Head==Tail, No events to process) or dispatching the events or incrementing tail variable? Am I missing something?

Looks fine to me. The interrupts don't use the tail pointer, and CPU read access to head pointers should be atomic, so the main will read always a valid variable. Don't forget to declare the pointers modified by the interrupts "volatile", otherwise the compiler might optimise your (Head == Tail) comparison out and you'll get never an input.
Title: Re: Is this Circular Event Buffer thread safe? (STM32G431)
Post by: newbrain on September 22, 2021, 09:03:22 am
Looks almost good to me.

One thing I think it's missing is checking for a full queue.
The way it's written, new events will overwrite old ones and this might be a design choice, but there's a latent problem.
If the processing of one event is slow, and enough events pile up while it's being processed, the case might arise when Head==Tail - so no further event dequeueing will happen, until a new one comes in (and a full queue of events would be lost).

Now this might or might not be a worry depending on many conditions, but in similar cases I've always used a separate atomic variable to keep count of the used (or free) places in the queue to be checked for empty (in main) or for full (in the ISRs). It's very little additional code and load, and allows to select the behaviour on a full queue (overwrite older events or throw new events away).

Another nitpick, but I know here opinions differ: if you mean modulo, say modulo (%). Using binary and (&) works only for 2^n sizes of the queue.
Any compiler worth its salt will change the % to a & even without optimizations (https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename:'1',fontScale:14,fontUsePx:'0',j:1,lang:c%2B%2B,selection:(endColumn:20,endLineNumber:2,positionColumn:20,positionLineNumber:2,selectionStartColumn:20,selectionStartLineNumber:2,startColumn:20,startLineNumber:2),source:'%23define+CONSTANT1+16%0A%23define+CONSTANT2+133%0A%0A%0Aunsigned+int+modulo_mod(unsigned+int+num)+%7B%0A++++return+num+%25+CONSTANT1%3B%0A%7D%0A%0Aunsigned+int+modulo_and(unsigned+int+num)+%7B%0A++++return+num+%26+(CONSTANT1-1)%3B%0A%7D%0A%0Aint+modulo2(unsigned+int+num)+%7B%0A++++return+num+%25+CONSTANT2%3B%0A%7D'),l:'5',n:'0',o:'C%2B%2B+source+%231',t:'0')),k:50,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:armv7-clang1101,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'0',trim:'1'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:1,lang:c%2B%2B,libs:!(),options:'-O0',selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1,tree:'1'),l:'5',n:'0',o:'armv7-a+clang+11.0.1+(C%2B%2B,+Editor+%231,+Compiler+%231)',t:'0')),k:50,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4) and even when a modulo must be used, an efficient multiplication only algorithm is generally used (speaking for arm and similar, if you are on an 8bit machine, YMMV).
Title: Re: Is this Circular Event Buffer thread safe? (STM32G431)
Post by: syntax333 on September 22, 2021, 09:53:32 am

Looks almost good to me.

One thing I think it's missing is checking for a full queue.
The way it's written, new events will overwrite old ones and this might be a design choice, but there's a latent problem.
If the processing of one event is slow, and enough events pile up while it's being processed, the case might arise when Head==Tail - so no further event dequeueing will happen, until a new one comes in (and a full queue of events would be lost).

Now this might or might not be a worry depending on many conditions, but in similar cases I've always used a separate atomic variable to keep count of the used (or free) places in the queue to be checked for empty (in main) or for full (in the ISRs). It's very little additional code and load, and allows to select the behaviour on a full queue (overwrite older events or throw new events away).

Another nitpick, but I know here opinions differ: if you mean modulo, say modulo (%). Using binary and (&) works only for 2^n sizes of the queue.
Any compiler worth its salt will change the % to a & even without optimizations (https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename:'1',fontScale:14,fontUsePx:'0',j:1,lang:c%2B%2B,selection:(endColumn:20,endLineNumber:2,positionColumn:20,positionLineNumber:2,selectionStartColumn:20,selectionStartLineNumber:2,startColumn:20,startLineNumber:2),source:'%23define+CONSTANT1+16%0A%23define+CONSTANT2+133%0A%0A%0Aunsigned+int+modulo_mod(unsigned+int+num)+%7B%0A++++return+num+%25+CONSTANT1%3B%0A%7D%0A%0Aunsigned+int+modulo_and(unsigned+int+num)+%7B%0A++++return+num+%26+(CONSTANT1-1)%3B%0A%7D%0A%0Aint+modulo2(unsigned+int+num)+%7B%0A++++return+num+%25+CONSTANT2%3B%0A%7D'),l:'5',n:'0',o:'C%2B%2B+source+%231',t:'0')),k:50,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:armv7-clang1101,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'0',trim:'1'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:1,lang:c%2B%2B,libs:!(),options:'-O0',selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1,tree:'1'),l:'5',n:'0',o:'armv7-a+clang+11.0.1+(C%2B%2B,+Editor+%231,+Compiler+%231)',t:'0')),k:50,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4) and even when a modulo must be used, an efficient multiplication only algorithm is generally used (speaking for arm and similar, if you are on an 8bit machine, YMMV).

I ensure that Head pointer never catches tail so checking full queue is not necessary.
I only use 2^n size queue which is not a problem for my application but thanks for the advice.

I created a circular buffer structure for my MCU(STM32G431) where elements of structure are:

... snip ...

I was wondering if this structure is thread safe or do I have to disable and enable the interrupts in my main loop while checking if buffer is empty (Head==Tail, No events to process) or dispatching the events or incrementing tail variable? Am I missing something?

Looks fine to me. The interrupts don't use the tail pointer, and CPU read access to head pointers should be atomic, so the main will read always a valid variable. Don't forget to declare the pointers modified by the interrupts "volatile", otherwise the compiler might optimise your (Head == Tail) comparison out and you'll get never an input.


I checked the if statement and comparison consists 4 ldr instructions 2 for head and 2 for tail and then 1 cmp instruction so read access to head pointer is not atomic. So I changed my code to:

Code: [Select]
..
..

while(1)
{
  disable_interrupts();

  if(Head != Tail)
  {
     enable_interrupts();
     statemachine_dispatch(CircBuffer.EventBuff[CircBuffer.Tail])

     disable_interrupts();
     increment CircBuffer.Tail & BUFFLEN_MASK
     enable_interrupts();

  }
  else
  {
  //Low Power Mode
  enable_interrupts();
  }

}

Title: Re: Is this Circular Event Buffer thread safe? (STM32G431)
Post by: capt bullshot on September 22, 2021, 10:05:05 am

I checked the if statement and comparison consists 4 ldr instructions 2 for head and 2 for tail and then 1 cmp instruction so read access to head pointer is not atomic. So I changed my code to:

...


I don't think the disable / enable interrupts is necessary here. What I was thinking of when writing "atomic" is: Interrupt modifies Head pointer at any time. The CPU loads the value of Head pointer in one unbreakable memory access, this should be the case with almost all modern CPUs. Doesn't matter if it does load Head and Tail non-atomic, and check what the two ldr instructions do in detail - do they load e.g. 8 bits of a 16 bit word and combine that in a register (bad) or is the first ldr a pointer load and the second the pointer dereferencing, loading the register in one fetch (OK)?
From my (rather cursorily) knowledge of ARM CortexM assembly, the two ldr instructions would rather be one pointer register load and one indirect memory access to atomically fetch the Head pointer.
Title: Re: Is this Circular Event Buffer thread safe? (STM32G431)
Post by: newbrain on September 22, 2021, 11:22:08 am
I ensure that Head pointer never catches tail so checking full queue is not necessary.
I only use 2^n size queue which is not a problem for my application but thanks for the advice.
[---8<---]
I checked the if statement and comparison consists 4 ldr instructions 2 for head and 2 for tail and then 1 cmp instruction so read access to head pointer is not atomic.
Ok, this was not in the pseudo code - that's also fine if the wrapping cases are handled correctly.

About the '%' vs. '&' for me it's more of a "say what you mean" style thing, and letting the compilers work for you - also, a safety net if the code needs maintenance in future, where someone else might unknowingly relax the 2^n constraint.

I'm a bit surprised that for an STM32G4 (Cortex-M4) core two accesses are needed for what I suppose to be 32 bit (or less) variables.
In any case, relying implicitly on any access to be atomic is bad (and volatile will not help with that).
So it's either your solution, or, if you use a C11 compliant compiler with atomic support (gcc (https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename:'1',fontScale:14,fontUsePx:'0',j:1,lang:___c,selection:(endColumn:33,endLineNumber:7,positionColumn:33,positionLineNumber:7,selectionStartColumn:33,selectionStartLineNumber:7,startColumn:33,startLineNumber:7),source:'%23include+%3Cstdatomic.h%3E%0A%0Avolatile+atomic_uint_fast32_t+i%3B%0A%0Avoid+thread1(void)%0A%7B%0A++++i%2B%2B%3B+/*+Guaranteed+atomic+*/%0A%7D%0A%0Aextern+void+do_stuff(void)%3B%0A%0Avoid+thread2(void)%0A%7B%0A++++if(i%3D%3D10)+do_stuff()%3B%0A%7D'),l:'5',n:'0',o:'C+source+%231',t:'0')),k:50,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:carm1021,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'0',trim:'1'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:1,lang:___c,libs:!(),options:'-Og+-mcpu%3Dcortex-m4',selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1,tree:'1'),l:'5',n:'0',o:'ARM+gcc+10.2.1+(none)+(C,+Editor+%231,+Compiler+%231)',t:'0')),k:50,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4), clang (https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename:'1',fontScale:14,fontUsePx:'0',j:1,lang:___c,selection:(endColumn:33,endLineNumber:7,positionColumn:33,positionLineNumber:7,selectionStartColumn:33,selectionStartLineNumber:7,startColumn:33,startLineNumber:7),source:'%23include+%3Cstdatomic.h%3E%0A%0Avolatile+atomic_uint_fast32_t+i%3B%0A%0Avoid+thread1(void)%0A%7B%0A++++i%2B%2B%3B+/*+Guaranteed+atomic+*/%0A%7D%0A%0Aextern+void+do_stuff(void)%3B%0A%0Avoid+thread2(void)%0A%7B%0A++++if(i%3D%3D10)+do_stuff()%3B%0A%7D'),l:'5',n:'0',o:'C+source+%231',t:'0')),k:50,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:armv7-cclang1101,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'0',trim:'1'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:1,lang:___c,libs:!(),options:'-Og+-mcpu%3Dcortex-m4',selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1,tree:'1'),l:'5',n:'0',o:'armv7-a+clang+11.0.1+(C,+Editor+%231,+Compiler+%231)',t:'0')),k:50,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4)), taking advantage of the _Atomic type qualifier/specifier or the definitions in <stdatomic.h>
Title: Re: Is this Circular Event Buffer thread safe? (STM32G431)
Post by: syntax333 on September 22, 2021, 11:54:34 am
I am not good with assembly the generated code for if is show below. You are probably right. My head pointer is 8 bit and "ldrb.w" says byte load.

Code: [Select]
ldr r3, [pc, #76];
ldrb.w r3, [r3, #256];
uxtb r2, r3
ldr r3, [pc, #68];
ldrb.w r3, [r3, #257];
uxtb r3, r3
cmp r2,r3
beq.n 0x8000640

However, I might want to use the same code for other non-ARM microcontrollers so I have to be little bit more generic. So I changed my code to

Code: [Select]

while(1)
{
  disable_interrupts();
  if(Head != Tail)
  {
     enable_interrupts();

     statemachine_dispatch(CircBuffer.EventBuff[CircBuffer.Tail])
     increment CircBuffer.Tail & BUFFLEN_MASK
  }
  else
  {
  //Low Power Mode
      enable_interrupts();
  }
     
}


I got rid off the disable/enable interrupts part from tail pointer incrementation since only time tail increases when state machine handles the event thus only CPU access tail. So can I assume that this part of the code is thread safe?
Title: Re: Is this Circular Event Buffer thread safe? (STM32G431)
Post by: NorthGuy on September 22, 2021, 03:03:17 pm
However, I might want to use the same code for other non-ARM microcontrollers so I have to be little bit more generic. So I changed my code to

Code: [Select]
while(1)
{
  disable_interrupts();
  if(Head != Tail)

...


Don't get scared. It was perfectly fine without interrupt disables. Your Head and Tail are one-byte variables. These will be accessed atomically on any CPU. Don't forget "volatile" though.
Title: Re: Is this Circular Event Buffer thread safe? (STM32G431)
Post by: SiliconWizard on September 22, 2021, 04:57:21 pm
I'm still with newbrain here, I'm not quite sure how you handle a queue full condition. From the code you posted, if Head == Tail, then you're just going in a low power mode. Then what?
Another thing I don't quite like with this kind of approach: there is no difference between a queue full and a queue empty condition. Then you may be tempted to work around this with some tricks. It's not pretty and not easy to test and prove correct.

I personally favor adding an extra member indicating the number of items currently in the queue, instead of comparing Head and Tail. The benefit of this approach is that it makes it easy and fast to determine the current number of items in the queue, but also see if it's full or empty. Without tricks.

A related point is that when using this extra member, only its accesses need to be atomic. You don't need atomic accesses to Head and Tail, as long as there is only one 'thread' writing to the queue and another one reading from it, which is a very common case.

This may not matter if such members are so small as you "know" the accesses will be atomic anyway with no extra cost, but this is pretty much always implementation-dependent. If you want a truly portable implementation that is provably correct, then the best approach IMO is to add this extra "n" member (current number of items) and define it as an atomic type (stdatomic) as long as you have access to a C11-compliant compiler. This may add zero cost if the type in question can be accessed atomically without any extra instruction on  a given platform, but otherwise will add what it takes to make it atomic. So this is a portable approach.

Note that, as I hinted above, if there may be more than one thread reading the queue, or more than one thread writing to the queue, then atomic accesses would not be enough - you would need to add mutexes. So the above approach is thread-safe only if there is at most one thread reading the queue and one writing to the queue, which may two threads in total or a single thread depending on your needs. If typically the queue is written to in an ISR, and read somewhere else (either in another ISR or in main code), this will be the case, but if, say, two different ISRs with different priorities can write to the queue, then you'd need mutexes...
Title: Re: Is this Circular Event Buffer thread safe? (STM32G431)
Post by: ejeffrey on September 22, 2021, 07:03:28 pm
However, I might want to use the same code for other non-ARM microcontrollers so I have to be little bit more generic. So I changed my code to

Code: [Select]
while(1)
{
  disable_interrupts();
  if(Head != Tail)

...


Don't get scared. It was perfectly fine without interrupt disables. Your Head and Tail are one-byte variables. These will be accessed atomically on any CPU. Don't forget "volatile" though.

I'm guessing he needs the disable interrupts to allow low power mode.

The way I expect this work is that the interrupt wakes the MCU up from sleep mode.  If an interrupt arrives between checking head==tail and entering sleep mode then you have a race condition where the MCU will sleep while there is pending data.  This may turn out OK if you have a periodic timer that wakes up from sleep mode anyway, but otherwise can result in arbitrary delays in processing data.  Disabling interrupts before checking for an empty queue means any subsequent events will leave the interrupt pending bit set and presumably prevent / end sleep mode.  At that time interrupts will be re-enabled and the ISR will fire.
Title: Re: Is this Circular Event Buffer thread safe? (STM32G431)
Post by: ejeffrey on September 22, 2021, 11:29:55 pm
Looks fine to me. The interrupts don't use the tail pointer, and CPU read access to head pointers should be atomic, so the main will read always a valid variable. Don't forget to declare the pointers modified by the interrupts "volatile", otherwise the compiler might optimise your (Head == Tail) comparison out and you'll get never an input.

The entire data structure should be volatile or it could be cached or even optimized away entirely.

Quote from: Silicon Wizard
I personally favor adding an extra member indicating the number of items currently in the queue, instead of comparing Head and Tail. The benefit of this approach is that it makes it easy and fast to determine the current number of items in the queue, but also see if it's full or empty. Without tricks.

A related point is that when using this extra member, only its accesses need to be atomic. You don't need atomic accesses to Head and Tail, as long as there is only one 'thread' writing to the queue and another one reading from it, which is a very common case.

One advantage of the head and tail pointers is that each variable is only written from a single context even though it is read from both.  That means you only need atomic load/store which are guaranteed on many platforms simply by using the correct type.  With the size variable you need atomic read-modify-write in the form of increment or decrement operations. 
Title: Re: Is this Circular Event Buffer thread safe? (STM32G431)
Post by: NorthGuy on September 23, 2021, 01:20:04 am
Another thing I don't quite like with this kind of approach: there is no difference between a queue full and a queue empty condition. Then you may be tempted to work around this with some tricks. It's not pretty and not easy to test and prove correct.

It is entirely possible that the buffer cannot overflow by design, in which case you don't need to test for overflows, so his method is perfectly perfect.

I usually maintain head and tail longer than necessary to address the buffer, at least by one bit. This way you can always calculate

Code: [Select]
number_of_elements_in_the_buffer = (head - tail);
and then check if it's full or empty.

When I access the buffer, I just and the tail and head with the mask, e.g.

Code: [Select]
value = [tail & MASK];
tail++;

or

Code: [Select]
buffer[head & MASK] = value;
head++;

assuming the buffer's size is power of 2 (as is the case with all my buffers).

This has added benefit that you can peek on the buffer very easily

Code: [Select]
value = buffer[(tail + look_ahead) & MASK];
without taking the elements out.

Assuming one thread only writes and the other only reads, this is thread safe if head and tail are accessed atomically. For 8-bit CPUs this limits the useful buffer size to 128 bytes, which is typically Ok. For others, never a problem.
Title: Re: Is this Circular Event Buffer thread safe? (STM32G431)
Post by: syntax333 on September 23, 2021, 10:52:25 am
Quote
If an interrupt arrives between checking head==tail and entering sleep mode then you have a race condition where the MCU will sleep while there is pending data.  This may turn out OK if you have a periodic timer that wakes up from sleep mode anyway, but otherwise can result in arbitrary delays in processing data.  Disabling interrupts before checking for an empty queue means any subsequent events will leave the interrupt pending bit set and presumably prevent / end sleep mode.  At that time interrupts will be re-enabled and the ISR will fire.

Yes that is what I was concerned about. I am going to assume that below code is thread safe(without race condition).
I won't disable/enable interrupts for tail incrementation since only CPU access tail after dispatching event and ISRs access Head.
If I have to post event to event buffer(self event) from main then I disable/enable interrupt for Head pointer and event buffer.

Code: [Select]
while(1)
{
    disable_interrupts();
    if(Head != Tail)
    {
         enable_interrupts();
         
         state_machine_dispatch(EventBuffer[Tail]);
         tail = ( tail + 1 ) & BUFFMASK;
    }
    else
    {
      //Low Power Mode
      enable_interrupts();
    }
}

Thank you everybody for their opinion.
Title: Re: Is this Circular Event Buffer thread safe? (STM32G431)
Post by: snarkysparky on September 23, 2021, 07:45:29 pm

Before adding to Que

if(((Head +1) & BUFFMASK) != Tail)
{
    //  Add to que

}
else
{
    loose the message but not the que
}
Title: Re: Is this Circular Event Buffer thread safe? (STM32G431)
Post by: SiliconWizard on September 23, 2021, 09:03:29 pm
Looks fine to me. The interrupts don't use the tail pointer, and CPU read access to head pointers should be atomic, so the main will read always a valid variable. Don't forget to declare the pointers modified by the interrupts "volatile", otherwise the compiler might optimise your (Head == Tail) comparison out and you'll get never an input.

The entire data structure should be volatile or it could be cached or even optimized away entirely.

Quote from: Silicon Wizard
I personally favor adding an extra member indicating the number of items currently in the queue, instead of comparing Head and Tail. The benefit of this approach is that it makes it easy and fast to determine the current number of items in the queue, but also see if it's full or empty. Without tricks.

A related point is that when using this extra member, only its accesses need to be atomic. You don't need atomic accesses to Head and Tail, as long as there is only one 'thread' writing to the queue and another one reading from it, which is a very common case.

One advantage of the head and tail pointers is that each variable is only written from a single context even though it is read from both.  That means you only need atomic load/store which are guaranteed on many platforms simply by using the correct type.  With the size variable you need atomic read-modify-write in the form of increment or decrement operations.

Those are all assumptions that depend on the exact target. It's probably even micro-optimizing, while I favor correctness, portability and ease of maintenance. In any case, you can compare the two approaches. Chances are, there won't be much difference in terms of code size and performance. But I still find my approach more portable and easier to get bug-free.

As to your last remark, this is typically done in C11 with the following - assuming 'n' is an atomic type:

atomic_fetch_add(&n, 1); // increment
atomic_fetch_sub(&n, 1); // decrement