I was debugging some Cortex-M4 assembly code tonight when I noticed that it ran fine when I single-stepped it in GDB, but crashed with a usage fault when I just let it run.
I traced the usage faults to some "BX r0" instructions. I had forgotten to set bit 0 (the Thumb bit) of r0, so each branch tried to switch the core to ARM state, which Cortex-M MCUs don't support, and that raised the usage fault. Once I made sure every "BX r0" was preceded by code that sets the Thumb bit, my code ran fine.
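For reference, the fix amounts to something like the sketch below. It is a host-side C model of the bit manipulation, not the actual firmware: the helper names `with_thumb_bit` and `bx_target_is_thumb` and the example address are made up for illustration. On the target, the equivalent would be an instruction such as `ORR r0, r0, #1` before the `BX r0`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of a BX target selects the instruction set state:
 * 1 = Thumb, 0 = ARM. Cortex-M cores only support Thumb. */
#define THUMB_BIT 1u

/* Hypothetical helper: return the branch target with the Thumb bit set.
 * Models what "ORR r0, r0, #1" does to r0 before "BX r0". */
static uint32_t with_thumb_bit(uint32_t addr)
{
    return addr | THUMB_BIT;
}

/* Hypothetical helper: true if a BX to this value would stay in Thumb state. */
static bool bx_target_is_thumb(uint32_t target)
{
    return (target & THUMB_BIT) != 0u;
}

/* Small self-check using an arbitrary, made-up halfword-aligned address. */
static void self_check(void)
{
    uint32_t raw = 0x08000100u;                    /* bit 0 clear: would request ARM state */
    assert(!bx_target_is_thumb(raw));
    assert(bx_target_is_thumb(with_thumb_bit(raw)));
    assert((with_thumb_bit(raw) & ~THUMB_BIT) == raw); /* code address itself is unchanged */
}
```

Setting the bit does not change where execution lands, since the core ignores bit 0 when forming the instruction address; it only tells `BX` which state to enter.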
The question is: why did it run at all when single-stepping? Why didn't the first "BX r0" executed with bit 0 of r0 clear cause a usage fault?