There are no STM32 chips with GPIOs sitting on a bus slower than the CPU core, are there?
Oops, missed that.
Well, actually, yes there are. STM32s implement pretty much the same architecture as most Cortex-based MCUs I've seen (there may be exceptions, but I haven't seen any myself), following the AMBA architecture. Peripherals, including GPIOs, are usually connected to one or more APB buses, and each APB bus connects to the AHB bus through its own bridge. APB buses are typically clocked from a separate and often slower clock (programmable on many MCUs). Some higher-performance MCUs have some of their peripherals (like Ethernet) connected directly to the AHB bus, but most peripherals still sit on an APB. Some MCUs also have their GPIOs connected directly to AHB instead. I can't tell off the top of my head which STM32 lines have their GPIOs on AHB and which on APB, but I'm pretty sure both approaches have been implemented.
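To make the clocking relationship concrete, here is a minimal sketch in C of how the bus clocks fall out of the prescaler chain, assuming the common arrangement where the system clock is divided once to get the AHB clock and again to get each APB clock. The struct and function names are made up for illustration and are not taken from any ST header; the code just does the arithmetic on the host and touches no registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one clock configuration (names are
 * illustrative only, not from any vendor header). */
typedef struct {
    uint32_t sysclk_hz; /* system clock feeding the AHB prescaler */
    uint32_t ahb_div;   /* AHB prescaler divider, e.g. 1, 2, 4, ... */
    uint32_t apb_div;   /* APB prescaler divider, e.g. 1, 2, 4, 8, 16 */
} bus_clock_cfg;

/* AHB clock: system clock divided by the AHB prescaler. */
uint32_t hclk_hz(const bus_clock_cfg *cfg)
{
    assert(cfg->ahb_div != 0);
    return cfg->sysclk_hz / cfg->ahb_div;
}

/* APB clock: AHB clock divided by the APB prescaler. */
uint32_t pclk_hz(const bus_clock_cfg *cfg)
{
    assert(cfg->apb_div != 0);
    return hclk_hz(cfg) / cfg->apb_div;
}

/* How many AHB clock cycles elapse per APB clock cycle. A value above 1
 * means a peripheral on that APB runs slower than the AHB side. */
uint32_t ahb_cycles_per_apb_cycle(const bus_clock_cfg *cfg)
{
    return hclk_hz(cfg) / pclk_hz(cfg);
}
```

For example, with an assumed 72 MHz system clock, an AHB divider of 1 and an APB divider of 2, `pclk_hz` gives 36 MHz and `ahb_cycles_per_apb_cycle` gives 2, so a peripheral on that APB sees one bus clock for every two AHB clocks.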
Edit: I took a deeper look to give more precise information. The STM32 series in which the GPIOs are on an APB bus is the famous F1 series (including the very popular F103). On other series, such as the more recent L4, GPIOs are on the AHB2 bus, which can be accessed faster than a typical APB, but AHB2 itself can be clocked at a lower speed than the CPU core. So synchronization issues are still potentially there (but any actual problem would be due to a silicon bug).
On older ARM-based (non-Cortex) MCUs not implementing this AMBA architecture (which I think dates back to the early 2000s), that may indeed have been a problem, but I don't know of any current STM32 that doesn't implement it.
But it's still not completely impossible that some silicon bug in the AHB-APB bridges (or between the different AHB buses when there are several) would require the use of workarounds. I just haven't seen any myself in all the STM32 (and most other Cortex-based) MCUs I've run into.