Small update.
I read the docs, which say not to write to the Tx buffer space while a TCP packet is being transmitted. I'm not sure why, though. I suspect they don't want you to change the Tx WR pointer while a packet is in transmission, but why should the buffer space itself be inaccessible? I'd expect them to use dual-port RAM so the Ethernet packet engine and the SPI engine can operate independently. So I went ahead and implemented this overlapped approach anyway.
The old code was:
while (1) {
    // wait for enough free space in the Tx buffer
    do {
        fsr = le2be(SpiW5500_Read<uint16_t>(s0, 0x20));
    } while (fsr < sizeof(myPacket));
    // write packet
    myPacket.data[0]++;
    SpiW5500_Write<Packet1K>(s0tx, wp, myPacket);
    wp += sizeof(myPacket);
    SpiW5500_Write<uint16_t>(s0, 0x24, le2be(wp)); // Tx WR pointer
    // transmit packet
    SpiW5500_Write<uint8_t>(s0, 1, 0x20); // Command: Send
    // wait until transmit is done
    do {
        ir = SpiW5500_Read<uint8_t>(s0, 2); // Rd IR (SendOK = 0x10)
    } while ((ir & 0x10) == 0);
    SpiW5500_Write<uint8_t>(s0, 2, 0x1F); // Clr IR (SendOK = 0x10)
}
This first checks for free space, then writes a new 1K packet and transmits it. Everything happens sequentially, which is slow.
The new code doesn't serialize those steps:
bool hasWritten = false;
bool canSend = true; // must start armed: no send is pending yet
while (1) {
    fsr = le2be(SpiW5500_Read<uint16_t>(s0, 0x20)); // Rd Tx free space
    ir = SpiW5500_Read<uint8_t>(s0, 2);             // Rd IR
    if (!hasWritten && fsr >= sizeof(myPacket)) { // write packet once space is available and it wasn't done yet
        myPacket.data[0]++;
        SpiW5500_Write<Packet1K>(s0tx, wp, myPacket);
        wp += sizeof(myPacket);
        hasWritten = true;
    }
    if (ir & 0x10) { // SendOK = 0x10: previous send finished, re-arm the send command
        SpiW5500_Write<uint8_t>(s0, 2, 0x10); // Clr IR (SendOK = 0x10)
        canSend = true;
    }
    if (hasWritten && canSend) { // new data written & send armed => set new pointer and transmit
        SpiW5500_Write<uint16_t>(s0, 0x24, le2be(wp)); // Tx WR pointer
        SpiW5500_Write<uint8_t>(s0, 1, 0x20);          // Command: Send
        hasWritten = false;
        canSend = false;
    }
}
This code works fine: no TCP retransmits or weird issues to be seen in Wireshark, and the packet counter is incrementing normally.
For these tests, socket 0 has the maximum buffer space (16 KiB). Throughput went up from ~27.8 Mbit/s (old code, SPI clock of 43.75 MHz) to 37 Mbit/s (new code).
When I change the packet size to the MTU (1472 bytes), throughput peaks at 28.1 Mbit/s (old) and 38.4 Mbit/s (new).
With an improved SPI driver (without the STM32 HAL overhead): max 28.75 / 39.3 Mbit/s respectively.
All throughputs were measured in user space with a small Python socket script.
In that last test, ~90% of the SPI bus clock translates into application throughput. That's promising for designs that can reach SPI clocks of 70 or 80 MHz (which hopefully I can, once I fix the board-to-board connectors).
Not sure why so many published benchmark figures for this chip are lower. Obviously the SPI bus is the bottleneck (this STM32H7 has a FIFO, so it can saturate the SPI bus with CPU cycles alone, no DMA needed), but if you combine DMA with this 'trick', it should just be a matter of maximizing the SPI clock.