I have narrowed it down quite a bit.
The data loss happens only when certain debugs from the TLS code are enabled. These come out via the USB VCP. So now I am dealing with something more complex: the USB VCP code. That code runs fine in other contexts; the problem surfaces only when it is invoked from TLS.
And USB VCP is unbelievably complex.
I will do a bit more work on this, but I might just document that this specific debug should not be used together with that serial port copy function. It is an extremely unlikely scenario anyway.
The issue disappears if I put a 1-2 ms delay after the USB VCP debug! This is not wholly surprising: sending data to the USB host (a PC) is supposed to work with flow control (see past threads on USB VCP flow control), but this is such a complex area that almost nobody understands it. In this project flow control has been implemented and tested, but it is hard to test because of the high data rates involved. The serial port (UART) data loss occurs only when a large amount of data (a few kB) is being sent out via USB VCP.
So it is almost certainly nothing to do with the RTOS priorities.
Does the serial port hardware have a HW FIFO overflow flag?
I put a GPIO wiggle in the RX ISR and verified that when the data loss occurs, the right number of interrupts has occurred.
Have you actually tried that: making the buffer impractically large? Does that actually work?
I can't really do that; not enough RAM left. But I did check that even with tiny packets, say 10 bytes, data is still lost. With 10-byte packets, sometimes all 10 get lost.
But it could well be something to do with the USB interrupt (the STM32F4 USB CDC code is entirely interrupt driven) affecting the UART interrupt. The latter is pretty simple but the former is, as I said above, so complex...
It could be something simple, but after a day spent on an extremely narrow-applicability scenario...
Hard to debug too, with TLS running infrequently.
Thank you all for your suggestions