You still haven't quite answered what benefits you'd expect from using FPGAs instead of MCUs. Answering that alone would help direct the discussion IMO, if you really want to discuss it or maybe have some ideas you haven't disclosed yet.
As I said, I see one benefit: if you need custom logic in your design that can't be implemented at all, or maybe not as reliably, in pure software on an MCU. Only you know what you had in mind there.
But if the main reason was that you thought it could actually be a way of reducing power consumption, then as we've explained above, it just won't work: a general-purpose reconfigurable logic IC can't beat a fully custom IC such as an MCU. It's basic physics.
Now of course, you may achieve lower power consumption with a 100% custom design adapted to a specific application. You'd need to design an ASIC to achieve this, not use an FPGA, and possibly target process nodes that are out of reach for anyone but the larger companies. And doing this would have to be seriously justified.
Could you please explain this "basic physics" in detail? It is not obvious to me.
An FPGA is just a bunch of logic "units" that can be wired together through configuration. A logic function synthesized directly onto an IC uses only the transistors it needs and nothing more. The same function synthesized/mapped to an FPGA will never be as efficient: it always has to make use of the existing logic "units" (or tiles), which overall add up to a lot more transistors than strictly needed. Also, all the unused tiles will still draw power: static of course, but also dynamic, as clock gating in FPGAs really depends on the underlying architecture and can never be done at the transistor level! So whole blocks of unused transistors are still likely to get clocked. On top of that, FPGAs contain large clock distribution structures that draw significant dynamic power even if you use very few logic tiles.
The basic physics is mostly about that: an optimized implementation (using just the needed resources) vs. an implementation on a general-purpose, reconfigurable device. It's almost impossible to make the latter as efficient.
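To put that in more concrete terms, here is the usual first-order CMOS power model. It is a textbook approximation (it ignores short-circuit current, among other things), and the symbols are my own notation, not something from earlier in the thread: \alpha is the average switching activity, C_{sw} the total switched capacitance, V_{DD} the supply voltage, f the clock frequency and I_{leak} the total leakage current.

```latex
P_{total} \approx \underbrace{\alpha \, C_{sw} \, V_{DD}^{2} \, f}_{\text{dynamic}}
          + \underbrace{V_{DD} \, I_{leak}}_{\text{static}}
```

For the same logic function at the same voltage and clock, the FPGA version carries more transistors, programmable routing and a large clock tree, so C_{sw} goes up, and every tile leaks whether it is used or not, so I_{leak} goes up too. Both terms get worse.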
In practice, if your needs for added custom logic are very modest, a very small reconfigurable logic block could be enough and still be adequate power-wise. It will still not be as efficient as a custom implementation, but could still be interesting. As some have mentioned above, there are now some SoCs that contain MCU cores + FPGA. Unfortunately, they are often targeted at "performance" and general-purpose applications rather than small and low-power ones. That said, I think a few MCUs actually contain very small blocks of reconfigurable logic. I don't remember off the top of my head, but I think Microchip has a couple of them, and Cypress may as well? (Probably others too, I can't remember exact models though.) AFAIR, none are on really "ULP" MCUs though. The market is probably just not large enough.
Also, as others have said, I'm pretty sure you can get by with embedded peripherals on a decently chosen MCU by maybe re-thinking your design and/or selecting different sensors. I'm not quite sure I really got the problem here. Is it something about being able to access external peripherals/sensors while the MCU is in deep sleep? I'm not sure what the practical use case is though. It would all depend on how frequent the accesses are I guess, and how the data is going to be used? If the MCU is not woken up, all you could do (with the right embedded SPI or I2C peripheral and DMA) is store some values in RAM and nothing else. You'd still have to find an MCU with a DMA that can run while the whole CPU is in deep sleep. If no DMA is involved, accessing an external peripheral while in deep sleep is kinda pointless, as you can't do anything with the data? Also, some ULP MCUs now provide very short wake-up times from deep sleep, so just waking up, issuing a small transfer (during which the MCU can go back to sleep), then doing whatever with the data and getting back to deep sleep is largely usable, and the average power consumption can be negligible. Unless again the access is very frequent, in which case I'm not sure what you'd do with all the data without ever waking up...
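To get a feel for the "wake up briefly, then back to deep sleep" approach, here is a rough sketch of the time-weighted average current. The function name and all the figures are made up for illustration (they are not from any datasheet), and wake-up time is simply lumped into the active time, so plug in the numbers for whatever MCU you are actually looking at.

```python
def average_current_ua(sleep_ua, active_ua, active_ms, period_ms):
    """Time-weighted average current over one wake/sleep cycle, in microamps."""
    if period_ms <= 0 or not 0 <= active_ms <= period_ms:
        raise ValueError("need period_ms > 0 and 0 <= active_ms <= period_ms")
    duty = active_ms / period_ms  # fraction of the cycle spent awake
    return active_ua * duty + sleep_ua * (1.0 - duty)


# Hypothetical figures, for illustration only.
SLEEP_UA = 1.0       # deep-sleep current
ACTIVE_UA = 2000.0   # current while awake and doing the transfer
ACTIVE_MS = 1.0      # time awake per access, wake-up time included

if __name__ == "__main__":
    for period_ms in (10_000, 1_000, 100, 10):
        avg = average_current_ua(SLEEP_UA, ACTIVE_UA, ACTIVE_MS, period_ms)
        print(f"one access every {period_ms:>6} ms -> {avg:8.2f} uA average")
```

With infrequent accesses the average stays close to the sleep current, and it only climbs toward the active current once the accesses get very frequent, which is the case where you'd have to wake up to deal with the data anyway.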