Some STM32H7 parts pair a Cortex-M7 with a Cortex-M4. If you want more cores than that, the Propeller parts have eight "cogs", which are claimed to do 80+ MIPS each on the latest parts.
I don't know how many MCU applications would really benefit from more than two cores. Running a wireless stack alongside a user application is a common use case, as is pairing one hard real-time block with some less time-sensitive processing, and both are sensible fits for dual-core parts. Beyond that, the market probably just isn't there to support a lot of 3+ core MCUs. Especially at the M0 level, there's a lot of room to just move up to a higher-performance single- or dual-core part: you'd have to bolt together a ton of ~50 DMIPS M0 cores to get the same raw processing capacity as a single ~1200 DMIPS M7, which also has an FPU, cache, etc. The application processor space is very different: there you're generally already squeezing the maximum reasonably possible performance out of a single core, so the only way to increase performance is to add cores.
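
To put a rough number on the M0 vs. M7 comparison above, here's a back-of-the-envelope sketch. The function name `cores_needed` and the `efficiency` parameter are made up for illustration, and the DMIPS figures are just the ballpark ones quoted above, not datasheet values. It only counts raw throughput and ignores the FPU, cache, memory contention, and whether the workload can be split across cores at all.

```python
import math

def cores_needed(target_dmips, per_core_dmips, efficiency=1.0):
    """Estimate how many small cores match one big core's raw DMIPS.

    efficiency is a hypothetical fudge factor (0 < efficiency <= 1) for
    how much of each small core's throughput is actually usable once
    the work is split up. 1.0 means perfect scaling, which is optimistic.
    """
    if per_core_dmips <= 0 or not 0 < efficiency <= 1:
        raise ValueError("per_core_dmips must be > 0 and 0 < efficiency <= 1")
    # Round up: you can't buy a fraction of a core.
    return math.ceil(target_dmips / (per_core_dmips * efficiency))

# Ballpark figures from the comment above: ~1200 DMIPS M7, ~50 DMIPS M0.
print(cores_needed(1200, 50))  # 24
```

So even with perfect scaling you'd need about two dozen M0 cores, and any realistic efficiency below 1.0 pushes that count higher.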