And it's not just big-company commercial applications. Countless one-man bands are running Kickstarters these days, and selling 1,000 units on Kickstarter is trivial. That extra 70 cents on the BOM really matters, even at a quantity of 1,000. Ordering your parts pre-programmed can be a big deal too. Also, it's not uncommon to put a small, cheap, pre-programmed 8-bit 5-pin micro in a circuit just to do one small dedicated job, rather than have the main processor take care of it.
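To put the 70 cents into numbers, here is a minimal sketch that turns a per-unit BOM difference into a total cost and into the equivalent number of engineering hours. The 0.70 USD and 1,000 units come from the post above; the 100 USD/hour engineering rate and the helper names are assumptions for illustration only.

```python
def extra_bom_cost(delta_per_unit: float, units: int) -> float:
    """Total extra spend caused by a per-unit BOM difference (USD)."""
    return delta_per_unit * units


def break_even_hours(delta_per_unit: float, units: int, rate_per_hour: float) -> float:
    """Engineering hours that cost the same as the total BOM difference."""
    return extra_bom_cost(delta_per_unit, units) / rate_per_hour


if __name__ == "__main__":
    RATE = 100.0  # assumed engineering rate in USD/hour (hypothetical)
    for units in (1_000, 100_000, 1_000_000):
        hours = break_even_hours(0.70, units, RATE)
        print(f"{units:>9} units: {extra_bom_cost(0.70, units):>12.2f} USD "
              f"= {hours:,.0f} engineering hours")
```

At 1,000 units the difference is worth only a few hours of engineering time under this assumed rate; at a million units it is worth thousands of hours, which is why the trade-off flips with volume.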
Microchip Direct offers a programming service, which is reasonably cheap (at least for small parts, though I guess it doesn't cost much more for larger parts), and there is no minimum order quantity: you could order just 10 pre-programmed chips (although shipping from Microchip Direct would cost more than buying a PICkit on eBay). For example, in a client project I'm using a small PIC for the reset circuit, replacing a dedicated reset IC that costs more than $1, and additionally using it to control the main power of the circuit in sleep mode. All of this fits in one of these small, ultra-low-power PICs, and the programming costs only a few cents.
8-bit CPUs are perfect for such applications because they are still cheaper than 32-bit CPUs and need less power (this might change in the future as more RISC-V CPUs appear, since they carry no license cost).
But I don't think it matters much, even for Kickstarter projects with 1,000 units. Sure, you might save $1,000 by using an 8-bit MCU instead of a 32-bit MCU, but you might then need a few extra days of development time to make everything work on the less powerful MCU, and at typical engineering rates those days cost more than you save. This is not only because you may have to use fixed-point math instead of a hardware floating-point unit (nowadays you can get MCUs with a hardware FPU for as little as EUR 2.55, e.g. the ATSAMD51G18A).

The peripherals of modern 32-bit MCUs are usually much better as well. For example, the last time I checked the ADC of the Silabs Giant Gecko series, its internal bandgap reference was factory calibrated and it supported hardware oversampling, resulting in 16-bit effective resolution. Out of the box, and with very few lines of code using the HAL, I could measure with better than 10 mV accuracy and better than 1 mV resolution over the full range from 0 V to 3.3 V. For some applications, this alone can save external components and calibration setup time.
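For readers who haven't done fixed-point math: without an FPU, fractional values are usually stored as scaled integers. Below is a minimal sketch in Q16.16 format (16 integer bits, 16 fractional bits); the format choice and helper names are assumptions for illustration. It is written in Python for brevity; on a real 8-bit target the same logic would be C, with a 64-bit intermediate for the product.

```python
FRAC_BITS = 16           # Q16.16: 16 fractional bits (assumed format)
ONE = 1 << FRAC_BITS     # the fixed-point representation of 1.0


def to_fixed(x: float) -> int:
    """Convert a float to Q16.16, rounding to the nearest step."""
    return int(round(x * ONE))


def to_float(q: int) -> float:
    """Convert a Q16.16 value back to a float."""
    return q / ONE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values.

    The raw product has 32 fractional bits, so it is shifted right by 16
    to return to Q16.16. Adding half a step first rounds half up.
    Python ints never overflow; in C the product would need int64_t.
    """
    product = a * b
    return (product + (1 << (FRAC_BITS - 1))) >> FRAC_BITS


if __name__ == "__main__":
    a, b = to_fixed(1.5), to_fixed(2.25)
    print(to_float(fixed_mul(a, b)))  # 3.375
```

Every multiply needs this extra shift and rounding, and you must track ranges by hand to avoid overflow. That bookkeeping is part of the development time a hardware FPU saves.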
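The 16-bit figure is consistent with the usual oversampling-and-decimation rule: each extra bit of resolution needs 4x as many samples, so a 12-bit converter needs 4^4 = 256 samples, which are summed and then shifted right by 4. This only works if the input carries roughly 1 LSB of noise to act as dither. The simulation below assumes a 12-bit ADC, a 3.3 V reference and Gaussian noise; the function names are hypothetical, and this is not the Gecko HAL API (on the Gecko, the hardware does this accumulation itself).

```python
import random


def make_noisy_adc(v_in: float, vref: float = 3.3, bits: int = 12,
                   noise_lsb: float = 1.0, seed: int = 0):
    """Return a function that simulates one reading of a noisy ADC (assumed model)."""
    rng = random.Random(seed)
    full_scale = (1 << bits) - 1

    def read() -> int:
        ideal = v_in / vref * (1 << bits)
        code = int(round(ideal + rng.gauss(0.0, noise_lsb)))
        return min(max(code, 0), full_scale)  # clamp to the valid code range

    return read


def oversample(read_adc, extra_bits: int) -> int:
    """Sum 4**extra_bits samples, then shift right by extra_bits (decimation)."""
    n = 4 ** extra_bits
    total = sum(read_adc() for _ in range(n))
    return total >> extra_bits


if __name__ == "__main__":
    v_in = 1.2345
    code16 = oversample(make_noisy_adc(v_in), extra_bits=4)  # 12 + 4 = 16 bits
    volts = code16 * 3.3 / (1 << 16)
    print(code16, f"{volts:.5f} V")
```

One 16-bit step is 3.3 V / 65536, about 50 µV, so resolution well below 1 mV is plausible. Absolute accuracy still depends on the reference, which is why the factory-calibrated bandgap matters.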
Of course, if you are developing an electronic die with 6 LEDs and one battery, use an 8-bit MCU: development time is the same, but the MCU is cheaper and needs less power. And the numbers change again if you plan to sell millions of units.