It's all about the peripherals only when the core is a good modern one.
...only when the core is "good enough" for a particular application, regardless of age.
The success of Microchip's PICs, and even the 8051, would suggest that these are good enough for a very large range of applications.
I'd contend that, within a given class of MCUs (bus width, clock speed), there is very rarely enough differentiation in the core alone to matter to the majority of people, compared to peripherals, power and other non-core parameters.
For example, I did the HW/SW for a little handheld device recently. A few pushbuttons, LEDs, etc.; you can change the rate/intensity of the mechanical function among a few presets, and that's about it. BMS/charger, and MCU with a power-down state. Pretty much anything would do, but a pure software solution (plus GPIOs, say) would need... well, I'm not sure exactly, maybe some tens of MIPS if done on pure CPU, as far as bit-banging the LEDs goes, which I drove as a dimmable matrix. Needless to say, just one timer helps out greatly with that. I also used the ADC to compensate for Vbatt, and another timer to PWM the mechanicals.
Almost anything has that set of features available, so, aside from size, probably even most '80s MCUs could've done it. I went with AVR, because of course, and ran it at all of 4 MHz. Plenty of CPU cycles to spare (plain C, no particular effort spent on optimization). Any dank old PIC or 8051 could do that. Well, I did need a couple of multiplies (to compensate for Vbatt), which maybe an 8051 couldn't do fast enough to avoid flicker; but one with hardware multiply would likely be fine.
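Just for flavor, the Vbatt compensation amounts to one fixed-point multiply per channel: scale the duty by Vnom/Vbatt from an ADC reading. A minimal sketch, assuming a 3.7 V nominal pack and a hypothetical adc_read_vbatt() returning millivolts (both made up for illustration):

#include <stdint.h>

#define VBATT_NOMINAL 3700U  /* mV, assumed design point */

uint16_t adc_read_vbatt(void);  /* hypothetical: battery voltage in mV */

/* Scale an 8-bit duty so perceived brightness stays roughly constant as
 * the battery sags: duty' = duty * Vnom / Vbatt, in 8.8 fixed point. */
uint8_t compensate_duty(uint8_t duty, uint16_t vbatt_mv)
{
    uint16_t gain = (uint16_t)((VBATT_NOMINAL * 256UL) / vbatt_mv);  /* 8.8 */
    uint16_t out  = (uint16_t)(((uint32_t)duty * gain) >> 8);
    return (out > 255) ? 255 : (uint8_t)out;
}

Call it once per matrix refresh, e.g. duty = compensate_duty(duty, adc_read_vbatt()). The LED current through a resistor isn't truly linear in voltage, so take the linear scaling as a first-order sketch; it's close enough over a Li-ion's range.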
Whereas sometimes you have piddly little applications like this, but they need communications instead: I2C, SPI, USART, CAN, IrDA... Some to talk with external peripherals, some for direct communication, some just to generate relevant waveforms (like using SPI to generate WS2812 pulses, or IrDA, SPI, USART or others for generating remote-control signals, radio comms / modem stuff, etc.). Or maybe you don't even need a timer (or many), or an ADC, etc., but you're using up every single one of the six USARTs provided...
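To make the WS2812-over-SPI trick concrete: run the SPI clock near 2.4 MHz and each WS2812 bit becomes exactly three SPI bits (1.25 us total): 100 for a 0, 110 for a 1. A sketch of the encoder, with spi_write_buf() standing in for whatever the actual driver provides:

#include <stdint.h>

void spi_write_buf(const uint8_t *buf, uint16_t len);  /* hypothetical driver call */

/* Expand one color byte (MSB first) into 24 SPI bits = 3 bytes. */
static void ws2812_encode_byte(uint8_t b, uint8_t out[3])
{
    uint32_t bits = 0;
    for (uint8_t i = 0; i < 8; i++) {
        bits <<= 3;
        bits |= (b & 0x80) ? 0x6 : 0x4;  /* 110 = long pulse, 100 = short */
        b <<= 1;
    }
    out[0] = (uint8_t)(bits >> 16);
    out[1] = (uint8_t)(bits >> 8);
    out[2] = (uint8_t)bits;
}

/* Send one pixel; WS2812 wants GRB order. */
void ws2812_send_pixel(uint8_t r, uint8_t g, uint8_t b)
{
    uint8_t buf[9];
    ws2812_encode_byte(g, &buf[0]);
    ws2812_encode_byte(r, &buf[3]);
    ws2812_encode_byte(b, &buf[6]);
    spi_write_buf(buf, sizeof buf);
}

The catch is the bytes must go out back-to-back (DMA, or a tight buffered loop), since an inter-byte gap stretches the low time and can be read as a reset.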
Another recentish project used the AVR-DA's TCD as very much a core element; this could probably be handled by a bit of external logic with more traditional/basic timers, but having all the features integrated not only reduces external hardware but reduces CPU responsibility as well (the fault-disable mechanism, routed via the event system, means the CPU can do just whatever in the meantime). Or, put still another way: it reduces a traditional system of discrete hardware, say CPU + ADC + DAC + TL494, to just the MCU. If a minimal-chip solution is required, it simply wouldn't be possible without a timer like this, and/or a regular timer with EVSYS and CCL.
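For the curious, the fault routing is only a few register writes; a sketch in the AVR-Dx style, with register and constant names written from memory (treat them as assumptions and check against <avr/io.h> and the datasheet for the actual part):

#include <avr/io.h>

void tcd_fault_setup(void)
{
    /* Assumed: fault comparator output on PA2, routed onto event channel 0. */
    EVSYS.CHANNEL0 = EVSYS_CHANNEL0_PORTA_PIN2_gc;
    EVSYS.USERTCD0INPUTA = EVSYS_USER_CHANNEL0_gc;  /* ...into TCD0 input A */

    /* Enable event input A asynchronously; the input mode forces the outputs
     * inactive while the fault is asserted, with no CPU involvement at all. */
    TCD0.EVCTRLA = TCD_TRIGEI_bm | TCD_CFG_ASYNC_gc;
    TCD0.INPUTCTRLA = TCD_INPUTMODE_WAIT_gc;

    while (!(TCD0.STATUS & TCD_ENRDY_bm)) { }  /* wait for enable-ready */
    TCD0.CTRLA = TCD_CLKSEL_OSCHF_gc | TCD_ENABLE_bm;
}

(Compare values, dead time, etc. omitted; the point is just that the shutdown path runs entirely in hardware.)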
So there's very much value in spamming every possible peripheral that can fit, and letting them do as much work as possible. The CPU needn't be all that important; sometimes it is, and sometimes you can get by with fewer/simpler peripherals thanks to extra CPU power, etc. Although, as CPUs get more complex, so too do the peripherals; after all, why shouldn't they?
Another perspective: look at what's been done on any classic PC or game console. Partly by absolutely heroic ASM coding, but also by eking every last clock cycle of value out of the available peripherals. On raster graphics devices you see raster bars and other scanline tricks a lot; there's a bias toward horizontal banding and movement. But you'll never* see full-screen rotation and scaling on a Sega Genesis; that feature is unique to the contemporary SNES hardware (Mode 7).
*There are hacks, like moving around tiles/sprites in software. The blocks themselves don't rotate, but their arrangement does. Which looks alright -- correct, even, at small angles. And, sprites up to a modest size can be blitted directly by the CPU -- thanks to the powerful 68k, something the SNES' pitifully slow 65C816 can't hope to do -- but not enough, I think, to cover the full screen, or while doing more than basic animation on other layers.
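For concreteness, the arrangement-rotation hack is just rotating each block's position, not its pixels; a self-contained sketch with a generated 8.8 fixed-point sine table (a real demo would bake the table into ROM):

#include <stdint.h>
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define CENTER_X 160  /* assuming a 320x224, Genesis-style screen */
#define CENTER_Y 112

static int16_t sin_q8[256];  /* 8.8 fixed-point sine, 256 steps per turn */

void init_tables(void)  /* fill once at startup */
{
    for (int i = 0; i < 256; i++)
        sin_q8[i] = (int16_t)lround(sin(i * 2.0 * M_PI / 256.0) * 256.0);
}

#define COS_Q8(a) sin_q8[(uint8_t)((a) + 64)]  /* cos = sine shifted 90 deg */

/* Rotate a tile's nominal position (x, y) about the screen center by
 * angle a, where 0..255 spans a full turn. */
void rotate_tile_pos(int16_t x, int16_t y, uint8_t a,
                     int16_t *out_x, int16_t *out_y)
{
    int32_t dx = x - CENTER_X, dy = y - CENTER_Y;
    int16_t s = sin_q8[a], c = COS_Q8(a);
    *out_x = (int16_t)(CENTER_X + ((dx * c - dy * s) >> 8));
    *out_y = (int16_t)(CENTER_Y + ((dx * s + dy * c) >> 8));
}

At small angles the blocks' internal pixels barely matter, which is why the trick reads as correct there.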
Or the modern C64/Amiga/etc. scene, people crafting accelerator cards for them, for example. Basically, plug in an SBC and use the base machine as a graphics terminal. Or the main CPU is still in control as such, but the amount of power at its proverbial fingertips is as a magic oracle, able to compute whole megabytes in the blink of a scanline. For such an interface, you need to somehow enumerate exactly what operations can be performed (or somehow program them, i.e., drop in "microcode" for the accelerator to run), but the number of operations can be extremely large even for very modest APIs (e.g. 65,536 different operations from a 16-bit index); and if one does what you're looking for, or the composition of a few operations will get you there -- why not, right?
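In C terms, the whole interface can be as dumb as a function-pointer table indexed by the opcode; the names here are made up for illustration:

#include <stdint.h>
#include <stddef.h>

typedef void (*accel_op_fn)(const uint8_t *args);

static void op_fill_rect(const uint8_t *args) { (void)args; /* ... */ }
static void op_blit(const uint8_t *args)      { (void)args; /* ... */ }
static void op_run_ucode(const uint8_t *args) { (void)args; /* ... */ }

/* Up to 65,536 entries from a 16-bit opcode; only a few defined here. */
static const accel_op_fn op_table[] = {
    [0x0000] = op_fill_rect,
    [0x0001] = op_blit,
    [0x0002] = op_run_ucode,  /* "microcode": interpret args as a program */
};

/* Host writes opcode + args to a mailbox; the accelerator just dispatches. */
void accel_dispatch(uint16_t opcode, const uint8_t *args)
{
    if (opcode < sizeof op_table / sizeof op_table[0] && op_table[opcode])
        op_table[opcode](args);
    /* else: unknown opcode -- ignore, or flag an error back to the host */
}

Composition then falls out for free: the host queues a handful of opcodes and lets the card chew through them.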
Or for a more realistic example: your boss doesn't need to be smart, as in, able to do every job under them; they just need to be responsible, and understand what those jobs do, how they go together, etc. (Much as we would like them to know better... the fact remains, there are more than enough successful (read: not constantly going bankrupt) companies which fit this pattern, for better or worse.)
Tim