My take on this is that the 125k bytes/sec limit is set by how fast the CPU can service interrupts, moving bytes from the buffers to the SPI TX. If this requirement is met, then there will be no underrun, because the CPU has already laid out the packet(s) in the buffers.
I posted this Q on the u-blox forum, here:
https://portal.u-blox.com/s/question/0D72p000009bAGVCA2/detail?s1oid=00D200000001ogf&t=1651691683831&s1nid=0DB2p0000008PMD&emkind=chatterCommentNotification&s1uid=0052p00000AKKM8&emtm=1651683154013&fromEmail=1&s1ext=0
and while nobody there actually knows, the speculation is that there is a buffer for each packet type. So once you have seen the header of a packet, there should be no underrun (no FFs returned) for the rest of that packet.
But the doc doesn't say that.
It seems obvious that the SPI is extremely primitive. Just a 1 byte buffer, loaded by an ISR, hence the 8us/byte speed limit.
If you were doing this properly, you would lay out the whole packet in a buffer and then SPI could fetch this at the max SPI clock speed (5.5MHz in this case). But this module was designed for the UART, and SPI was just tacked on.
I will email Alphamicro; I buy from them.
EDIT: that thread I started on the u-blox forum ended up being completely ridiculous.
Grampy, who must be an OSI 127 layer protocol specialist, in between teaching me that an octet is a byte, ended up basically agreeing that if the SPI clock is faster than some value (which, by over-interpreting the data sheet, seems to be 1MHz) then you will get 0xFF bytes within a packet, and then the binary protocol is impossible to decode (as I've been saying all along).
And Clive (who is a really clever guy and knows these modules in huge detail) stopped disagreeing with the above and moved to telling me that getting the data out too slowly results in a latency in time+position. That is obviously true, but actually running this over SPI at 125kbytes/sec (1MHz SPI clock) is still vastly faster than running it via the UART at 38400 baud, and still about 9x faster than running it at 115200 baud (that is on raw bit rate; it is nearer 11x in actual bytes if the UART is 8N1, since each byte then costs 10 bit times).
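For what it's worth, here is the arithmetic behind that comparison as a quick C sketch. It assumes the UART runs 8N1 (one start bit, 8 data bits, one stop bit, so 10 bit times per byte) and that SPI bytes go back to back with no gap between them. The function names are made up for illustration.

```c
#include <stdint.h>

/* Payload bytes/sec on a UART, ASSUMING 8N1 framing:
   1 start + 8 data + 1 stop = 10 bit times per byte. */
uint32_t uart_bytes_per_sec(uint32_t baud)
{
    return baud / 10u;
}

/* Bytes/sec on SPI, ASSUMING bytes are clocked back to back
   with no inter-byte gap: 8 clocks per byte. */
uint32_t spi_bytes_per_sec(uint32_t clock_hz)
{
    return clock_hz / 8u;
}
```

Under those assumptions a 1MHz SPI clock gives 125000 bytes/sec, against 11520 bytes/sec at 115200 baud and 3840 bytes/sec at 38400 baud, i.e. roughly 11x and 33x.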
So SPI is well worth getting going.
My current RTOS GPS thread polls the UART to see if there is any RX data (if not, it yields to the RTOS) and processes that with a state machine. So in this case, instead of polling the UART, I will continually transmit 0xFFs with a 1MHz SPI clock, chuck away any 0xFFs received, and process the other bytes exactly as I have been doing.
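A minimal sketch of the "chuck away the 0xFFs" step, in C. This is not taken from any u-blox code or from my actual firmware; `gps_spi_strip_idle` and `SPI_IDLE_BYTE` are hypothetical names, and the SPI transfer itself (clocking out 0xFFs and collecting what comes back) is assumed to have already filled `rx`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPI_IDLE_BYTE 0xFFu  /* what comes back when there is nothing to send */

/* Copy every non-idle byte from rx[0..len) into out[], preserving order.
   Returns the number of bytes kept. out must have room for len bytes;
   out == rx is allowed, since the write index never overtakes the read index. */
size_t gps_spi_strip_idle(const uint8_t *rx, size_t len, uint8_t *out)
{
    size_t kept = 0;

    assert(rx != NULL && out != NULL);

    for (size_t i = 0; i < len; i++) {
        if (rx[i] != SPI_IDLE_BYTE) {
            out[kept++] = rx[i];
        }
    }
    return kept;
}
```

The surviving bytes can then go into the existing state machine one at a time, and if the function returns 0 the thread yields to the RTOS as before. Note that dropping every 0xFF is only safe while the stream cannot legitimately contain 0xFF, which holds for NMEA (plain ASCII) but not for a binary payload.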