I think most applications that use a 1GHz MCU are really better served by an FPGA: for example, high-speed concurrent I/O, realtime DSP, or anything requiring low latency. A single CPU core can NOT do things truly concurrently no matter how you dress it up. "Multitasking" is just time-slicing one instruction stream, so it is a poor fit for tightly timed concurrent I/O. CPUs also suck at DSP. I had an application that needed to demodulate a certain signal, and the most optimized NEON implementation on a Cortex-A9 took a few hundred clock cycles per sample. In contrast, a half-decent hardware implementation using very few FPGA resources (only one multiplier) would take no more than 30 cycles per sample.
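To make that cycle count concrete, here is a rough Python model of the kind of structure I mean: one multiplier and one accumulator time-shared across the taps of a FIR filter (the sort of matched filter a demodulator might use). The filter itself, the class name `SerialFir`, and the coefficients are hypothetical; they are not the actual demodulator from my application. The point is just the arithmetic: with N taps the single multiplier is busy for N clock cycles per input sample, so a 30-tap filter fits a 30-cycle-per-sample budget.

```python
class SerialFir:
    """Cycle-level model of a FIR filter built around ONE shared multiplier.

    Hypothetical illustration only. Each input sample costs len(coeffs)
    clock cycles: on every cycle the single multiplier forms one
    coeff * tap product and the accumulator adds it in.
    """

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.taps = [0] * len(self.coeffs)  # delay line (a shift register in hardware)
        self.cycles = 0                     # total clock cycles consumed so far

    def push(self, sample):
        # Shift the new sample into the delay line (all registers shift at once).
        self.taps = [sample] + self.taps[:-1]
        acc = 0
        for k in range(len(self.coeffs)):
            # One clock cycle: one multiply and one accumulate on the shared MAC.
            acc += self.coeffs[k] * self.taps[k]
            self.cycles += 1
        return acc


if __name__ == "__main__":
    fir = SerialFir([1, 2, 3])
    print([fir.push(x) for x in [1, 0, 0, 0]])  # impulse response: [1, 2, 3, 0]
    print(fir.cycles)                           # 3 taps * 4 samples = 12
```

In real hardware the inner loop is a counter driving a coefficient ROM address and a tap-select mux into the one multiplier, not a software loop. That is why the cost per sample is fixed and predictable rather than depending on caches, interrupts, or the scheduler.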
This sad state of affairs (fast CPUs doing jobs an FPGA would do better) exists simply because most people can't be bothered to learn hardware design, and even when they do, they stick to software-isms like "processes" or behavioral logic design. IMO, FPGA design should be like designing a circuit: you describe the circuit elements at a high level (for example, in terms of adders, multipliers, and sub-modules), but in the end you are still wiring up a circuit, not describing "operations" to be performed.