Incidentally, if I drop the FU540 down to 37.75 MHz (the slowest clock the board can generate) then I get:

* memcpy(): 1 KiB, Exec. cycles = 554 (0.54) cycles/byte)

* memcpy(): 2 KiB, Exec. cycles = 1016 (0.50) cycles/byte)

* memcpy(): 4 KiB, Exec. cycles = 1969 (0.48) cycles/byte)

* memcpy(): 8 KiB, Exec. cycles = 3906 (0.48) cycles/byte)

* memcpy(): 16 KiB, Exec. cycles = 9683 (0.59) cycles/byte)

* memcpy(): 32 KiB, Exec. cycles = 19035 (0.58) cycles/byte)

* memcpy(): 64 KiB, Exec. cycles = 115153 (1.76) cycles/byte)

* memcpy(): 128 KiB, Exec. cycles = 257464 (1.96) cycles/byte)

* memcpy(): 256 KiB, Exec. cycles = 486513 (1.86) cycles/byte)

* memcpy(): 512 KiB, Exec. cycles = 943371 (1.80) cycles/byte)

* memcpy(): 1024 KiB, Exec. cycles = 1751138 (1.67) cycles/byte)

* memcpy(): 2048 KiB, Exec. cycles = 3342729 (1.59) cycles/byte)

* memcpy(): 4096 KiB, Exec. cycles = 6900410 (1.65) cycles/byte)

* memcpy(): 8192 KiB, Exec. cycles = 14309351 (1.71) cycles/byte)

* memcpy(): 16384 KiB, Exec. cycles = 28623709 (1.71) cycles/byte)

* memcpy(): 32768 KiB, Exec. cycles = 57386550 (1.71) cycles/byte)

* memcpy(): 65536 KiB, Exec. cycles = 114705607 (1.71) cycles/byte)

So slowing the CPU down doesn't slow the RAM down by as much: large copies settle at about 1.71 cycles/byte here. (I hit ^C before the program finished, which is why the list stops at 65536 KiB.)
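For anyone checking the numbers, the figure in parentheses appears to be just the cycle count divided by the copy size in bytes. A minimal sketch, assuming that is how the benchmark computes it (the function name `cycles_per_byte` is mine, not from the benchmark source):

```python
def cycles_per_byte(exec_cycles: int, size_kib: int) -> float:
    """Cycles spent per byte copied, for a copy of size_kib KiB."""
    return exec_cycles / (size_kib * 1024)


# A few rows copied from the 37.75 MHz run above: (size in KiB, cycles).
rows = [(1, 554), (32, 19035), (64, 115153), (65536, 114705607)]

for size_kib, cycles in rows:
    print(f"{size_kib:>6} KiB: {cycles_per_byte(cycles, size_kib):.2f} cycles/byte")
```

The rounded results match the printed 0.54, 0.58, 1.76 and 1.71, so the table is at least self-consistent.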

I can get 2.00 cycles per byte at 182 MHz:

* memcpy(): 1 KiB, Exec. cycles = 544 (0.53) cycles/byte)

* memcpy(): 2 KiB, Exec. cycles = 1013 (0.49) cycles/byte)

* memcpy(): 4 KiB, Exec. cycles = 1937 (0.47) cycles/byte)

* memcpy(): 8 KiB, Exec. cycles = 3859 (0.47) cycles/byte)

* memcpy(): 16 KiB, Exec. cycles = 9441 (0.58) cycles/byte)

* memcpy(): 32 KiB, Exec. cycles = 33845 (1.03) cycles/byte)

* memcpy(): 64 KiB, Exec. cycles = 70787 (1.08) cycles/byte)

* memcpy(): 128 KiB, Exec. cycles = 160051 (1.22) cycles/byte)

* memcpy(): 256 KiB, Exec. cycles = 447387 (1.71) cycles/byte)

* memcpy(): 512 KiB, Exec. cycles = 990994 (1.89) cycles/byte)

* memcpy(): 1024 KiB, Exec. cycles = 1994527 (1.90) cycles/byte)

* memcpy(): 2048 KiB, Exec. cycles = 3930281 (1.87) cycles/byte)

* memcpy(): 4096 KiB, Exec. cycles = 7860005 (1.87) cycles/byte)

* memcpy(): 8192 KiB, Exec. cycles = 16408431 (1.96) cycles/byte)

* memcpy(): 16384 KiB, Exec. cycles = 33264848 (1.98) cycles/byte)

* memcpy(): 32768 KiB, Exec. cycles = 66778139 (1.99) cycles/byte)

* memcpy(): 65536 KiB, Exec. cycles = 133847271 (1.99) cycles/byte)

* memcpy(): 131072 KiB, Exec. cycles = 267943099 (2.00) cycles/byte)

* memcpy(): 262144 KiB, Exec. cycles = 536411759 (2.00) cycles/byte)

That works out to about 87 MiB/sec at that clock speed, or about 21 MiB/sec at 37.75 MHz (using the 1.71 cycles/byte from the first run).
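The conversion is clock frequency divided by cycles per byte. A rough sketch of the arithmetic, assuming 1 MiB = 2^20 bytes and taking the steady-state cycles/byte from the two runs above (the helper name `throughput_mib_per_s` is just for illustration):

```python
MIB = 1024 * 1024  # bytes per MiB


def throughput_mib_per_s(clock_hz: float, cycles_per_byte: float) -> float:
    """Bytes copied per second, expressed in MiB/s."""
    return clock_hz / cycles_per_byte / MIB


fast = throughput_mib_per_s(182e6, 2.00)    # 182 MHz run
slow = throughput_mib_per_s(37.75e6, 1.71)  # 37.75 MHz run

print(f"182 MHz:   {fast:.1f} MiB/s")
print(f"37.75 MHz: {slow:.1f} MiB/s")

# The clock drops by a larger factor than the copy bandwidth does.
print(f"clock ratio:     {182e6 / 37.75e6:.2f}x")
print(f"bandwidth ratio: {fast / slow:.2f}x")
```

The clock ratio comes out larger than the bandwidth ratio, which is the same point as the lower cycles/byte at the slower clock.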