STM32 MCUs have a BSRR register for GPIOs (bit set/reset), I'm not sure if it's a common ARM Cortex feature or specific to these.
As it is a peripheral, it does vary among Cortex-M implementations. I suspect that each chip manufacturer has their own GPIO subsystem they use.
The STM32F4 series uses 32-bit registers, but has only up to 16 pins per bank. The low 16 bits of
GPIOx_ODR correspond to the output pin states;
GPIOx_BSRR combines set (low 16 bits) and clear/reset (high 16 bits). You can use 8-bit, 16-bit or 32-bit accesses to these registers. There is no toggle, but one can do
GPIOx_BSRR = bus_lookup[new_bits & BUS_MASK] | ((uint32_t)bus_lookup[(~new_bits) & BUS_MASK] << 16); to set and clear the bits corresponding to the new bus state (any subset of pins within the same bank, up to 16) in a single register write, once the 32-bit value to be written has been computed. Also note that on STM32F4,
bus_lookup[] values only need be 16-bit, not 32-bit (hence the cast before the shift, so that it is done in 32-bit unsigned arithmetic), but buses wider than 16 bits need to be split into sub-buses on different GPIO ports.
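To make the lookup-table idea concrete, here is a minimal sketch in C. It runs against a software stand-in for the port (sim_gpio_t and sim_bsrr_write() are hypothetical, as is the 4-bit bus wired to pins 3, 7, 12 and 15); on real hardware the write would go to GPIOx_BSRR instead.

```c
#include <stdint.h>

#define BUS_BITS 4u                       /* hypothetical 4-bit bus */
#define BUS_MASK ((1u << BUS_BITS) - 1u)

/* Hypothetical wiring: bus bits 0..3 go to pins 3, 7, 12, 15 of one port. */
static const uint16_t bus_pin[BUS_BITS] = {
    1u << 3, 1u << 7, 1u << 12, 1u << 15
};

/* bus_lookup[v] = mask of port pins that are high when the bus carries v. */
static uint16_t bus_lookup[1u << BUS_BITS];

/* Software stand-in for one GPIO port; on hardware this would be GPIOx. */
typedef struct { uint32_t ODR; } sim_gpio_t;

/* Models a write to GPIOx_BSRR: low half sets, high half resets, set wins. */
static void sim_bsrr_write(sim_gpio_t *gpio, uint32_t value)
{
    uint32_t set   = value & 0xFFFFu;
    uint32_t reset = value >> 16;
    gpio->ODR = ((gpio->ODR & ~reset) | set) & 0xFFFFu;
}

/* Fill the lookup table once at startup. */
static void bus_init(void)
{
    for (uint32_t v = 0; v <= BUS_MASK; v++) {
        uint16_t pins = 0;
        for (uint32_t b = 0; b < BUS_BITS; b++)
            if (v & (1u << b))
                pins |= bus_pin[b];
        bus_lookup[v] = pins;
    }
}

/* Compute the single 32-bit value that sets the 1 bits and resets the 0 bits. */
static uint32_t bus_bsrr_value(uint32_t new_bits)
{
    return bus_lookup[new_bits & BUS_MASK]
         | ((uint32_t)bus_lookup[~new_bits & BUS_MASK] << 16);
}

/* Update all bus pins in one register write; other pins are untouched. */
static void bus_write(sim_gpio_t *gpio, uint32_t new_bits)
{
    sim_bsrr_write(gpio, bus_bsrr_value(new_bits));
}
```

Call bus_init() once, then bus_write(&port, value) for each new bus state. Pins of the same port that are not part of the bus keep their state, because their bits are zero in both halves of the written value.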
On a comparable NXP Kinetis K20 Cortex-M4 (MK20DX64VLH7, MK20DX128VLH7, MK20DX256VLH7), there are up to 32 pins per bank. The
GPIOx_PDOR corresponds to the output pin states,
GPIOx_PSOR is the set-bits register,
GPIOx_PCOR is the clear-bits register, and
GPIOx_PTOR is the toggle-bits register. 8-, 16-, and 32-bit accesses are supported.
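As a rough illustration of how separate set, clear, and toggle registers can be used for a bus update, here is a small software model in C. The sim_ names and the struct are hypothetical stand-ins, not the vendor headers; on hardware the three write helpers would be plain stores to GPIOx_PSOR, GPIOx_PCOR, and GPIOx_PTOR.

```c
#include <stdint.h>

/* Software stand-in for one Kinetis-style port; only PDOR holds state. */
typedef struct { uint32_t PDOR; } sim_kinetis_port_t;

/* Each helper models a write to the corresponding write-only register. */
static void sim_psor_write(sim_kinetis_port_t *p, uint32_t bits) { p->PDOR |=  bits; }
static void sim_pcor_write(sim_kinetis_port_t *p, uint32_t bits) { p->PDOR &= ~bits; }
static void sim_ptor_write(sim_kinetis_port_t *p, uint32_t bits) { p->PDOR ^=  bits; }

/* Two writes: the bus pins pass through an intermediate state in between. */
static void bus_write_set_clear(sim_kinetis_port_t *p, uint32_t mask, uint32_t value)
{
    sim_psor_write(p,  value & mask);
    sim_pcor_write(p, ~value & mask);
}

/* One write: toggle exactly the bus pins that differ from the wanted state.
   This needs a read of PDOR first, so it is not atomic against other writers. */
static void bus_write_toggle(sim_kinetis_port_t *p, uint32_t mask, uint32_t value)
{
    sim_ptor_write(p, (p->PDOR ^ value) & mask);
}
```

The toggle variant changes all bus pins in the same register write, at the cost of reading the current output state first; the set/clear variant needs no read but briefly shows a mix of old and new bits on the bus.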
If we switch to Cortex-M7, like the NXP i.MX RT106x (as used on e.g. Teensy 4), there are actually two different GPIO subsystems that one can select between, similar to how one selects between pin functions. Both only support 32-bit accesses. One is for DMA access, and the other is for faster processor access. This means that on RT106x, if you have only one DMA'd parallel output bus, you can direct it to
GPIO[1-4]_DR, and only enable GPIO1-4 for the pins used by that DMA'd bus. Processor code (
digitalWriteFast() etc.) uses
GPIO6-9_DR,
_DR_SET,
_DR_CLEAR, and
_DR_TOGGLE for manipulating pin states, which will not interfere with
GPIO1-4_DR and related registers, so the two really behave as separate peripherals.
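The separation can be pictured with a small software model in C. Everything here is a hypothetical stand-in: the struct models one bank pair (say GPIO1 and GPIO6), and the select field models the per-pin choice between the two blocks, which as far as I recall lives in the IOMUXC_GPR_GPR26 to GPR29 registers (treat that as an assumption and check the reference manual).

```c
#include <stdint.h>

/* Software model of one bank pair; not the real register layout. */
typedef struct {
    uint32_t dr_regular;  /* models GPIO1_DR, the DMA-accessible block       */
    uint32_t dr_fast;     /* models GPIO6_DR, the processor-only fast block  */
    uint32_t select;      /* per pin: 1 = pad follows fast block, 0 = regular */
} sim_rt_bank_t;

/* Models GPIO6_DR_SET, _DR_CLEAR, and _DR_TOGGLE writes. */
static void sim_fast_set(sim_rt_bank_t *b, uint32_t bits)    { b->dr_fast |=  bits; }
static void sim_fast_clear(sim_rt_bank_t *b, uint32_t bits)  { b->dr_fast &= ~bits; }
static void sim_fast_toggle(sim_rt_bank_t *b, uint32_t bits) { b->dr_fast ^=  bits; }

/* What the pads actually output: each pin follows only its selected block. */
static uint32_t sim_pad_levels(const sim_rt_bank_t *b)
{
    return (b->dr_fast & b->select) | (b->dr_regular & ~b->select);
}
```

With select = 0xFFFFFF00, pins 0 to 7 follow dr_regular (the DMA'd bus) and the rest follow dr_fast, so processor writes to the fast block cannot disturb the bus pins, and vice versa.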
A comparable ST Cortex-M7, STM32F7 family, has the ST-style
_ODR and
_BSRR registers.
The RP2040 (as used on the Raspberry Pi Pico) has two Cortex-M0+ cores. Each pin's function is selected in its own per-pin control register, but the SIO block still provides a common GPIO_OUT register with GPIO_OUT_SET, GPIO_OUT_CLR, and GPIO_OUT_XOR (toggle) companions; on top of that, you have the eight PIO state machines. The NXP Kinetis KL26 sub-family also has Cortex-M0+ cores, but has the same GPIO registers as the K20 Cortex-M4 above.
In my experience, Cortex-M's provide at least the set and reset facility (either via
_SET/
_CLEAR, or
_BSRR), but toggle is rarer; and the entire lineup of Cortex-M's from the same manufacturer tends to have a very similar GPIO subsystem. (Without looking at 16-bit PIC MCUs, I'd guess they too have the
_ODR and
_BSRR register interface; I would not be too surprised if the subsystem implementation was really similar to ST's Cortex-M's, too. It'd just make sense, really.)
Funnily enough, even some 8-bit MCUs like ATtiny4/5/9/10, and even ATmega32u4, have a 'toggle' facility: When a pin is configured as an output,
writing to
PINx (port input register) causes the corresponding bits that were set to toggle in the
PORTx registers. (That might be an AVR specialty, since I haven't seen same in 8051's or MC68HC08's.)
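The AVR behaviour is easy to model in a few lines of C. The struct and function names below are hypothetical; the model only captures the effect of a PINx write on PORTx.

```c
#include <stdint.h>

/* Software stand-in for one AVR port; only the PORTx latch is modelled. */
typedef struct { uint8_t PORT; } sim_avr_port_t;

/* Models a write to PINx: every 1 bit written flips the matching PORTx bit,
   and every 0 bit written leaves its PORTx bit alone. */
static void sim_pin_write(sim_avr_port_t *p, uint8_t bits)
{
    p->PORT ^= bits;
}
```

On real hardware with avr-libc this would be a plain assignment such as PINB = _BV(PB0); rather than PINB |= _BV(PB0);, since the read-modify-write form would also write back every input bit that currently reads as 1 and toggle those pins too.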