The process also isn't compatible*.
*FeRAM is basically a DRAM-style cell made of the same stuff (more or less) -- the hysteresis of the material (which shows up for us as useless power loss) is what stores the nonvolatile data. I don't know how much the extra steps change the process, in terms of cost or suitability for other purposes. So, at least when it comes to generic parts... it's added cost if nothing else.
It would also preclude some off-label uses, e.g., using an LM317 as an RF oscillator.
The bigger chips are bypassed on device, though not necessarily on die. For many years, CPU modules have had local bypass integrated into the module, at least to handle the highest frequency transients. Arguably, the Pentium IIs were the first to do this, but since a Slot module is just a regular PCB, that hardly counts; the PGA versions may have done things this way, though.
Altera, for example, recommends keeping power supply impedance low up to 100 MHz or so, beyond which they don't care, meaning on-device reserves take over. FPGAs are usually chip-on-board, where the board carries the die and whatever other bits, under encapsulation, with balls on the bottom side.
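To put rough numbers on why board-level bypass stops mattering somewhere around 100 MHz, here is a quick sketch that treats a bypass cap as a series R-L-C. The part values (100 nF, 20 mOhm ESR, 1 nH mounted ESL) are my own illustrative assumptions, not figures from Altera or any datasheet.

```python
import math

def cap_impedance(f_hz, c_farads, esr_ohms, esl_henries):
    """Impedance magnitude of a capacitor modeled as series R-L-C."""
    w = 2 * math.pi * f_hz
    reactance = w * esl_henries - 1.0 / (w * c_farads)
    return math.hypot(esr_ohms, reactance)

def self_resonant_freq(c_farads, esl_henries):
    """Frequency where capacitive and inductive reactance cancel."""
    return 1.0 / (2 * math.pi * math.sqrt(esl_henries * c_farads))

# Assumed, illustrative values for one board-mounted ceramic cap.
C = 100e-9    # 100 nF
ESR = 0.02    # 20 milliohm
ESL = 1e-9    # 1 nH, including mounting inductance

print(f"self-resonance: {self_resonant_freq(C, ESL) / 1e6:.1f} MHz")
for f in (1e6, 10e6, 100e6, 1e9):
    z = cap_impedance(f, C, ESR, ESL)
    print(f"{f / 1e6:6.0f} MHz  |Z| = {z:.3f} ohm")
```

Above self-resonance the cap looks like an inductor, so its impedance rises in proportion to frequency no matter how much capacitance is on the board. That part of the spectrum has to be covered by whatever is inside the package.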
Tim