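That depends a lot on the specific core and its bus implementation. Most implementations have at least a write buffer, so a store can finish from the core's point of view before the data actually reaches its destination. A read stalls execution until the data arrives, but it may or may not block the bus entirely.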
And no, there are no status bits to check; it just works as you would expect, except that a write may actually take effect much later than the instruction that issued it. Synchronization barrier instructions help deal with this.
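As a rough illustration of the barrier approach, here is a minimal sketch. It assumes an ARM core where the barrier is the `DSB` instruction (for example a Cortex-M); `data_sync_barrier` and `reg_write_sync` are hypothetical helper names, not part of any vendor library, and the non-ARM fallback is only there so the sketch builds on a host. How far the barrier's guarantee reaches beyond the core still depends on the implementation.

```c
#include <stdint.h>

/* Hypothetical helper: data synchronization barrier.
 * Assumption: on ARM targets the barrier instruction is DSB.
 * On other targets a generic fence is used so the sketch still compiles. */
static inline void data_sync_barrier(void)
{
#if defined(__arm__) || defined(__aarch64__)
    __asm__ volatile ("dsb sy" ::: "memory");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/* Hypothetical helper: write a register, then wait for outstanding
 * memory accesses to complete before executing further instructions. */
static inline void reg_write_sync(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;          /* the store may sit in a write buffer */
    data_sync_barrier();   /* do not continue until it has drained */
}
```

Something like this is typically only needed where ordering matters, for example right after clearing an interrupt flag and before returning from the handler.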
Some devices (like the Atmel/Microchip SAM Dxx) also have peripherals that may themselves need synchronization. The bus transfer still happens as usual; the peripheral just takes time to prepare the result, and you have to actively poll for it.
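The polling pattern might look roughly like the sketch below. The register layout, the `PERIPH_STATUS_SYNCBUSY` bit position, and the function names are all made up for illustration; the real names and bits come from the device header and datasheet. The poll limit is an added safety measure so a stuck peripheral cannot hang the loop forever.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register block, for illustration only. */
typedef struct {
    volatile uint32_t CTRL;
    volatile uint32_t STATUS;
} periph_regs_t;

/* Assumed bit position of the "synchronization busy" flag. */
#define PERIPH_STATUS_SYNCBUSY (1u << 7)

/* Poll until the peripheral reports that synchronization is done.
 * Returns false if the flag is still set after max_polls extra checks. */
static bool periph_wait_sync(const periph_regs_t *p, uint32_t max_polls)
{
    while (p->STATUS & PERIPH_STATUS_SYNCBUSY) {
        if (max_polls-- == 0u) {
            return false;
        }
    }
    return true;
}

/* Write a control value, then wait for the peripheral to take it over. */
static bool periph_write_ctrl(periph_regs_t *p, uint32_t value,
                              uint32_t max_polls)
{
    p->CTRL = value;                        /* ordinary bus write */
    return periph_wait_sync(p, max_polls);  /* peripheral-side delay */
}
```

Note that this is independent of the write buffer issue: the bus write has completed normally, and the wait is purely for the peripheral's internal logic.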