It can be tricky to compare two very different architectures directly. One or the other may be faster depending on which metric you choose.

For DSP computation performance, a good benchmark is the FFT.

Here is an FFT implementation on the Cray 1:

https://www.ecmwf.int/file/24115/download?token=s6ilqSrlSo

According to the table, a single FFT in compiled Fortran takes:

N32: 157us

N64: 302us

N128: 736us

N1024: 7477us

But if you calculate 128 FFTs in parallel to fill up its wide datapaths, and use the more efficient CAL (Cray Assembly Language) version instead of compiled Fortran, the average equivalent time per FFT is:

N32: 7us

N64: 15us

N128: 37us

N1024: 415us

And if we look at a modern ARM, say a Cortex-M4 at 180 MHz in the form of an NXP Kinetis K66:

http://openaudio.blogspot.com/2016/09/benchmarking-fft-speed.html

N32: 40000 per second = 25 us

N64: 21227 per second = 47 us

N128: 9524 per second = 105 us
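The conversion from throughput to time per FFT used in the list above is just a reciprocal. A minimal sketch, with the rates copied from the figures quoted above (the variable and function names are only illustrative):

```python
# FFT throughput figures quoted above for the Cortex-M4 (FFTs per second).
ffts_per_second = {32: 40000, 64: 21227, 128: 9524}

def time_per_fft_us(rate_per_second):
    """Convert a throughput in FFTs per second to microseconds per FFT."""
    return 1_000_000 / rate_per_second

times_us = {n: time_per_fft_us(rate) for n, rate in ffts_per_second.items()}

for n, t in times_us.items():
    print(f"N{n}: {t:.0f} us")
```

This reproduces the rounded values of 25, 47 and 105 us.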

This is with a library that is assembler-optimised for ARM. With the generic, architecture-independent C implementation of KissFFT, the speed is about 50% of that.

So if you take the first table, the ARM is significantly faster, but if you take the second table, the Cray 1 is way faster. If you do the same test with a fixed-point FFT, the results are again very different.
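To put rough numbers on that comparison, here is a small sketch that divides the quoted times against each other for the sizes that appear in all three lists. The timings are copied from the tables above; the dictionary names are just for illustration, and the ratios are only as good as the quoted figures.

```python
# Times per FFT in microseconds, copied from the tables above.
cray_single_us = {32: 157, 64: 302, 128: 736}   # one FFT, compiled Fortran
cray_batched_us = {32: 7, 64: 15, 128: 37}      # 128 FFTs in parallel, CAL
arm_m4_us = {32: 25, 64: 47, 128: 105}          # Cortex-M4, optimised library

# How many times faster the ARM is than a single Cray 1 FFT.
arm_vs_cray_single = {n: cray_single_us[n] / arm_m4_us[n] for n in arm_m4_us}

# How many times faster the batched Cray 1 is than the ARM.
cray_batched_vs_arm = {n: arm_m4_us[n] / cray_batched_us[n] for n in arm_m4_us}

for n in arm_m4_us:
    print(f"N{n}: ARM is {arm_vs_cray_single[n]:.1f}x faster than single-FFT Cray 1, "
          f"batched Cray 1 is {cray_batched_vs_arm[n]:.1f}x faster than ARM")
```

With these figures the Cortex-M4 comes out roughly 6 to 7 times faster than a single compiled-Fortran FFT, while the batched CAL results are roughly 3 times faster than the Cortex-M4.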