It can be tricky to compare two very different architectures directly. One or the other may come out faster depending on which metric you choose.
For DSP performance, a good benchmark is the FFT.
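For reference, the FFT computes an N-point DFT in O(N log N) operations by recursively splitting it into half-size transforms. The sketch below is a minimal recursive radix-2 Cooley-Tukey FFT in Python. It is purely illustrative: it is not the Cray Fortran/CAL code or the ARM library timed here, and the name `fft_radix2` is hypothetical.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # DFT of the even-indexed samples
    odd = fft_radix2(x[1::2])   # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor e^(-2*pi*i*k/n) combines the two half-size DFTs
        # in a single "butterfly".
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


# An impulse transforms to a flat spectrum of ones.
print([round(abs(v), 6) for v in fft_radix2([1, 0, 0, 0, 0, 0, 0, 0])])
# prints [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

Each of the log2(N) levels of recursion does O(N) butterfly work, which is why the timings grow somewhat faster than linearly with N.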
Here is an FFT implementation on the Cray-1:
https://www.ecmwf.int/file/24115/download?token=s6ilqSrlSo
According to the table, a single FFT in compiled Fortran takes:
N32: 157us
N64: 302us
N128: 736us
N1024: 7477us
But if you compute 128 FFTs in parallel to fill its wide vector datapaths, and use the more efficient CAL (Cray Assembly Language) implementation instead of compiled Fortran, the average equivalent time per FFT is:
N32: 7us
N64: 15us
N128: 37us
N1024: 415us
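The gap between the two Cray tables can be quantified by simply dividing the figures. Note that the ratio lumps together two effects, batching across the vector hardware and switching from compiled Fortran to CAL, so it cannot say how much each one contributes. The variable names are illustrative only.

```python
# Cray-1 figures quoted above, in microseconds.
single = {32: 157, 64: 302, 128: 736, 1024: 7477}   # one FFT, compiled Fortran
batched = {32: 7, 64: 15, 128: 37, 1024: 415}       # per-FFT average, 128 at once, CAL
BATCH = 128

for n in single:
    speedup = single[n] / batched[n]
    # Approximate wall-clock time for the whole batch; only approximate
    # because the per-FFT averages in the table are rounded.
    batch_total_us = batched[n] * BATCH
    print(f"N={n:5d}: {speedup:5.1f}x faster per FFT, "
          f"batch of {BATCH} takes ~{batch_total_us} us")
```

This gives roughly 22x at N=32, falling to about 18x at N=1024, while a full batch of 128 transforms at N=32 still takes around 900 us of wall-clock time.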
Now compare a modern ARM, say a Cortex-M4 at 180 MHz in the form of an NXP Kinetis K66:
http://openaudio.blogspot.com/2016/09/benchmarking-fft-speed.html
N32: 40000 per second = 25 us
N64: 21227 per second = 47 us
N128: 9524 per second = 105 us
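The ARM figures are given as throughput, so the per-FFT times are just 10^6 divided by the FFTs-per-second rate. Multiplying the time by the 180 MHz clock also gives cycles per transform. The helper name `us_per_fft` is hypothetical.

```python
CLOCK_HZ = 180e6  # Kinetis K66 core clock quoted above

# FFTs per second on the Cortex-M4, as quoted above.
rates = {32: 40000, 64: 21227, 128: 9524}


def us_per_fft(ffts_per_second):
    """Convert a throughput figure into average microseconds per transform."""
    return 1e6 / ffts_per_second


for n, rate in rates.items():
    cycles = CLOCK_HZ / rate  # core clock cycles spent per transform
    print(f"N={n:4d}: {us_per_fft(rate):6.1f} us, ~{cycles:.0f} cycles")
```

For N=32 this gives 25.0 us, or 4500 cycles at 180 MHz.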
This is with a library that is assembler-optimised for ARM; with the generic, architecture-independent C implementation in KissFFT, the speed is about 50% of that.
So if you go by the first table, the ARM is significantly faster, but by the second table the Cray-1 is far faster. If you run the same test with a fixed-point FFT, the results are different again.
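The sketch below puts numbers on that conclusion, using only the microsecond figures above for the sizes both machines were tested at (N32 to N128). It reports which machine wins each comparison and by what factor. The helper `winner` is a hypothetical name for illustration.

```python
# Microsecond figures quoted above.
cray_single = {32: 157, 64: 302, 128: 736}  # one FFT, compiled Fortran
cray_batched = {32: 7, 64: 15, 128: 37}     # per-FFT average, 128 at once, CAL
arm = {32: 25, 64: 47, 128: 105}            # Cortex-M4, optimised library


def winner(a_us, b_us, a_name, b_name):
    """Return (faster machine, speed factor); the lower time wins."""
    if a_us < b_us:
        return a_name, b_us / a_us
    return b_name, a_us / b_us


for n in arm:
    name1, f1 = winner(arm[n], cray_single[n], "ARM", "Cray-1")
    name2, f2 = winner(arm[n], cray_batched[n], "ARM", "Cray-1")
    print(f"N={n:3d}: vs single: {name1} {f1:.1f}x | vs batched: {name2} {f2:.1f}x")
```

Against the single-FFT table the ARM wins by about 6x to 7x; against the batched table the Cray-1 wins by about 3x to 3.6x. Doubling the ARM times to approximate the generic KissFFT build, the ARM still wins the single-FFT comparison by roughly 3x, but loses the batched one by roughly 6x to 7x.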