A fresh example: a SAM4S series uC. At 120 MHz, using pure assembly, bit-banding and an unrolled loop (e.g. STR [addr1], 1; STR [addr2], 1; STR [addr1], 1; ... etc.), the output topped out at 10 MHz. The output drivers themselves do well up to 30-ish MHz.
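For illustration, here is a rough C sketch of the unrolled-store idea and the arithmetic behind it. It is not the code that was actually measured: the function names are made up, and it assumes addr1 and addr2 are the bit-band alias words of the pin's bit in a "set" and a "clear" register. The alias formula is the standard Cortex-M3/M4 peripheral bit-band mapping; the functions take plain pointers so the logic can also be tried off-target.

```c
#include <assert.h>
#include <stdint.h>

/* Cortex-M3/M4 peripheral bit-banding: every bit in the 1 MB region at
 * 0x40000000 has its own 32-bit word in the alias region at 0x42000000. */
uint32_t bitband_alias(uint32_t reg_addr, uint32_t bit)
{
    assert(reg_addr >= 0x40000000u && reg_addr < 0x40100000u && bit < 32u);
    return 0x42000000u + (reg_addr - 0x40000000u) * 32u + bit * 4u;
}

/* Unrolled burst: four output periods with no loop overhead in between.
 * set_word and clr_word are hypothetical; on real hardware they would be
 * the alias words computed above (assumption: separate set/clear registers). */
void toggle_burst(volatile uint32_t *set_word, volatile uint32_t *clr_word)
{
    *set_word = 1u; *clr_word = 1u;   /* period 1: high, then low */
    *set_word = 1u; *clr_word = 1u;   /* period 2 */
    *set_word = 1u; *clr_word = 1u;   /* period 3 */
    *set_word = 1u; *clr_word = 1u;   /* period 4 */
}

/* Core cycles spent per store, assuming two stores per output period. */
uint32_t cycles_per_store(uint32_t core_hz, uint32_t out_hz)
{
    return core_hz / (2u * out_hz);
}
```

With the numbers above, cycles_per_store(120000000, 10000000) gives 6, assuming the 10 MHz figure is the full square-wave frequency. So each store costs several cycles on the way to the port, which is why the CPU and bus become the limit long before the output drivers do.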
STM32 outputs can do up to 50 MHz, obviously using a timer; above that you get a sine-like waveform. Aside from timers, hardware SPI can also be used to generate very fast square waves: just set up DMA to write indefinitely from some address.
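A small sketch of the SPI variant, under assumptions: the SCK pin toggles at the bit clock whatever data is sent, while on the data pin the frequency depends on the pattern in the DMA source buffer. The helper names below are hypothetical, the code assumes MSB-first 8-bit frames sent back to back with no gap between them, and the actual SPI and DMA register setup is vendor-specific and left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte that yields a square wave on the data pin when shifted out MSB-first.
 * bits_per_half = number of SPI bit times the line stays high (or low). */
uint8_t square_pattern_byte(unsigned bits_per_half)
{
    assert(bits_per_half == 1u || bits_per_half == 2u || bits_per_half == 4u);
    switch (bits_per_half) {
    case 1u:  return 0xAAu;  /* 10101010 */
    case 2u:  return 0xCCu;  /* 11001100 */
    default:  return 0xF0u;  /* 11110000 */
    }
}

/* Fill the buffer that a circular DMA channel would keep feeding to SPI. */
void fill_square_pattern(uint8_t *buf, size_t len, unsigned bits_per_half)
{
    uint8_t pattern = square_pattern_byte(bits_per_half);
    for (size_t i = 0; i < len; i++) {
        buf[i] = pattern;
    }
}

/* Resulting square-wave frequency on the data pin for a given bit clock. */
uint32_t spi_square_freq_hz(uint32_t sck_hz, unsigned bits_per_half)
{
    return sck_hz / (2u * bits_per_half);
}
```

For example, a buffer of 0xAA bytes clocked out at 50 MHz would give 25 MHz on the data pin. If the peripheral inserts idle time between frames, the waveform will have gaps, so this only works where the SPI can stream continuously.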