Now that low processor power consumption is so fashionable while computing power is still highly valued, I was wondering which processor offers the best ratio of the two, i.e. the most computing power per watt.
Absolutely meaningless question without specifying the workload.
- do you need to run an OS, or bare metal?
- how much address space is needed?
- what is the element size of the important data?
- is the data integer? fixed point? floating point? Is overflow likely and does it need special handling e.g. saturation?
- does the algorithm use a small amount of code and a lot of data, or complex algorithms on small amounts of data?
- is the control flow predictable?
- is the data flow predictable?
- does the algorithm benefit from SIMD/SIMT? (see the sketch after this list)
- can the algorithm be partitioned into dozens / hundreds / thousands of independent cores? With how much communication?
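To make the SIMD and control-flow bullets concrete, here is a minimal C sketch (illustrative only; the function and type names are made up) contrasting a kernel a compiler can auto-vectorize with one whose data-dependent branches and pointer chasing defeat both vector units and wide out-of-order cores:

```c
#include <stddef.h>

/* SIMD-friendly: contiguous data, no data-dependent branches.
 * A compiler can auto-vectorize this, and a wide vector unit or GPU
 * gets close to peak throughput per watt on it. */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* SIMD-hostile: the next load address and the branch outcome both depend
 * on the data just read. Pointer chasing plus unpredictable control flow
 * leaves vector units and deep pipelines idle; a small, simple in-order
 * core may be the most efficient way to run it. */
struct node { int key; struct node *next; };

int list_count_above(const struct node *p, int threshold)
{
    int count = 0;
    while (p) {
        if (p->key > threshold)   /* data-dependent branch */
            count++;
        p = p->next;              /* data-dependent load address */
    }
    return count;
}
```

The same chip can sit at opposite ends of the efficiency ranking for those two loops, which is why the question can't be answered without pinning down the workload.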
There will be workloads most efficiently done by an 8 bit AVR-like ISA (whether or not Atmel's actual chips are power efficient). There will be workloads most efficiently done by an FPGA. There will be workloads most efficiently done by a GPU. There will be workloads most efficiently done with a very wide SIMD/Vector unit, e.g. 16k bits or more.
Ditto for Google's TPUs; for Esperanto's ET-SoC-1, with its 1088 simple in-order ET-Minion cores, each with big FP and integer vector units; for Microsoft's Azure Cobalt 100 with 128 Arm Neoverse N2 cores; and so on.
Pretty much every different or weird thing out there is aiming to get maximum performance per Watt or per dollar on something.