It gets complicated. A lot of uCs (AVR, for example) can't run code from their RAM at all, because they have a strict "Harvard architecture": the program memory bus is completely separate from the data memory bus, and may even be a different width. (The PIC baseline architecture has 12-bit program memory words, for example.)
Some of the MIPS (PIC32) and ARM microcontrollers can execute from RAM, but they have a "modified" Harvard architecture that still has a separate bus to flash for program fetches. That means that while the flash is slower than RAM, running code from RAM causes more bus contention, because the RAM bus is now carrying both instructions AND data. So it's not easy to decide which code will actually benefit from being moved to RAM. (I'm reminded of my beloved PDP-10, on which the 16 "registers" were also addressable as memory, and you could make code faster by copying it into the registers and running it there. But then you had fewer registers available, and had to absorb the overhead of moving the code into the registers...)
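For what it's worth, here's roughly how moving one function to RAM tends to look with GCC on an ARM part. This is only a sketch: the section name ".ramfunc" and the function sum_bytes() are made up for illustration, and it assumes your linker script puts that section in RAM and your startup code copies it there from flash. Check what your vendor's files actually call it.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: ".ramfunc" is a hypothetical section name. It has to match a
 * section that your linker script locates in RAM and that the startup code
 * copies out of flash before main() runs.
 * On non-ARM builds the macro expands to nothing, so the same file still
 * compiles and runs on a desktop. */
#if defined(__GNUC__) && defined(__arm__)
#define RAMFUNC __attribute__((section(".ramfunc"), noinline))
#else
#define RAMFUNC
#endif

/* A small, hot loop: the kind of code that might (or might not) run
 * faster from RAM, depending on how much it fights with data accesses. */
RAMFUNC uint32_t sum_bytes(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}
```

You call it like any other function, e.g. sum_bytes(data, sizeof data). Whether it ends up faster is exactly the bus contention question above, so measure before and after.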
There was at least one DSP-like processor I read about some years ago that had large amounts of on-chip RAM, and for NORMAL execution it would load the RAM from external nonvolatile memory of some kind. (Flash has gotten faster since then, though.)
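The "load RAM from nonvolatile memory, then run from RAM" scheme boils down to a copy loop at boot. A minimal sketch, assuming a 32-bit part; load_image() and its parameters are hypothetical names, and on real hardware the two addresses and the length would come from the linker script or a boot ROM rather than being passed around like this:

```c
#include <stddef.h>
#include <stdint.h>

/* Copy a code image, one word at a time, from its load address (slow
 * nonvolatile memory) to its run address (fast on-chip RAM).
 * Returns the number of words copied. */
size_t load_image(uint32_t *run_addr, const uint32_t *load_addr, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        run_addr[i] = load_addr[i];
    }
    return words;
}
```

After the copy, the boot code jumps to the entry point in RAM. The cost is paid once at startup, instead of on every instruction fetch.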
At some point, vendors give up and put a well-defined instruction cache on the chip, instead of those quirky "flash accelerators", and then it doesn't matter so much.