When you're writing in C, the underlying architecture is of minimal significance ...
Except when the underlying architecture runs out of address space; then it becomes a pain in the neck. It's not really an issue for data/RAM, but it's getting easier to fill up your flash with networking code (e.g., Bluetooth, USB, TCP/IP), leaving little room for what actually makes the product valuable. For example, I tried putting an open-source BT stack on an MSP430 almost a year ago and nearly blew out the 64K address space before I got to add any application code. The 20-bit mspgcc implementation was still a little raw at that point, so I hopped over to Cortex M3 and never looked back.
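To put rough numbers on that, here is a minimal sketch in C. It assumes a flat address space where an N-bit address reaches 2^N bytes; the helper name `addressable_bytes` is made up for illustration, and the real usable flash is smaller because the MSP430 shares that one address space between flash, RAM and peripherals.

```c
#include <stdint.h>

/* Hypothetical helper: bytes reachable with a flat address of `bits` width.
 * Only valid for bits < 32, which covers the 16- and 20-bit cases here. */
static uint32_t addressable_bytes(unsigned bits)
{
    return (uint32_t)1 << bits;
}

/* Same figure in KiB, which is how flash sizes are usually quoted. */
static uint32_t addressable_kib(unsigned bits)
{
    return addressable_bytes(bits) / 1024u;
}
```

`addressable_kib(16)` gives 64 and `addressable_kib(20)` gives 1024, so the 20-bit extension is the difference between a 64K ceiling and a 1 MB one.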
Your points about pricing, packaging and "familiarity" are right on, though. I suspect that most embedded engineers have more toolchain allegiance than code monkeys (like me) do. The proprietary toolchains make the segmented-memory problems (placing code above the 64K boundary) mostly disappear.
Another example of sorts is the Nike FuelBand you tore down a while back (I just watched the video again today :-). It has both an MSP430 *and* a low-power Cortex M3. Interestingly, the MSP430 is located right next to the Bluetooth transceiver, and I suspect the only thing that the TI part is doing is running a BT stack, with the ARM chip left to implement the "algorithm".
The STM32L family has more than enough juice to run a Bluetooth stack and the application code, but I think the lack of a qualified BT stack (for the M3) was the issue there. The hardware would certainly be cheaper with only one MCU, but that would require the stack vendor to port and/or open up their code to other architectures, which the vendor might have been reluctant to do.
You'd think that it'd be hardware driving the big design decisions for these products, but it seems like software (e.g., toolchains and networking stacks) is just as important.