Doesn't a shared variable (one used in ISRs) also need to be declared volatile, so the compiler doesn't cache it in a register?
Also, unless you are absolutely certain that your particular processor reads and writes a given integer type atomically, it's best not to assume it. Different hardware behaves differently. For example, I wouldn't expect an 8-bit MCU to do 32-bit operations atomically: a 32-bit load there is typically several separate byte loads, and an interrupt can fire in between them. And technically, C and C++ make no atomicity guarantees for ordinary (non-_Atomic, non-std::atomic) variables, not even for a simple assignment on 32-bit hardware. On most 32-bit processors, naturally aligned 32-bit loads and stores do happen to be atomic, but that doesn't hold for members of packed structs that break natural alignment.