the data stage could be much bigger than one sector,
I may well be misunderstanding, but Windows always writes to the flash in multiples of 1 sector. It can't do anything else, because that is the standard removable block device API. The write-flash func handles just 1 sector per call.
At the higher USB stack level it may well be more, and the USB stack data buffers are a few k (IIRC) so yes probably a few sectors get in there and then there are multiple calls to my flash-sector function.
The plan would be to offload each flash write to an RTOS task, which checks a global flag every say 5ms and then picks up a buffer address and flashes it.
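Roughly what I have in mind for that handoff, as a minimal sketch in plain C so it can be tried on a PC. All the names (flash_job_post, flash_worker_poll, flash_write_sector_blocking) are made up for illustration and are not from the ST stack. On the target, flash_worker_poll() would be the body of the RTOS task, called every ~5ms, and the stub would be the real blocking sector write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u
#define NUM_SECTORS 8u

/* Stand-in for the real flash: an array in RAM (assumption, for testing). */
static uint8_t fake_flash[NUM_SECTORS][SECTOR_SIZE];

/* Hypothetical blocking sector write; on the target this takes ~17 ms. */
static void flash_write_sector_blocking(uint32_t sector, const uint8_t *buf)
{
    assert(buf != NULL && sector < NUM_SECTORS);
    memcpy(fake_flash[sector], buf, SECTOR_SIZE);
}

/* One pending job, shared between the USB side and the worker task.
   All three are volatile so the compiler keeps the order of the stores. */
static volatile bool           job_pending;
static const uint8_t *volatile job_buf;
static volatile uint32_t       job_sector;

/* Called from the USB side. Returns false if the previous job is not
   finished yet, i.e. the caller still has to hold off the host somehow. */
bool flash_job_post(uint32_t sector, const uint8_t *buf)
{
    if (job_pending) {
        return false;
    }
    job_buf = buf;
    job_sector = sector;
    job_pending = true;   /* set last, after buffer and sector are stored */
    return true;
}

/* Called by the worker task every ~5 ms. Returns true if it wrote a sector. */
bool flash_worker_poll(void)
{
    if (!job_pending) {
        return false;
    }
    flash_write_sector_blocking(job_sector, job_buf);
    job_pending = false;  /* the buffer may be reused from here on */
    return true;
}
```

Note the sketch only holds one job, and the buffer must stay untouched until the worker clears the flag, so it does not by itself solve the problem of holding off the host between sectors.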
check if the flash is busy before STORAGE_Write_FS() call
This gets complicated because, while I have a flash check status function, it can't be called from within an ISR, because checking the flash status is done over SPI.
This is why my flash r/w func is blocking on the SPI portion. That conveniently produces a fully blocking read function. But it doesn't produce a fully blocking write function, because the SPI part is ~200us (and is fully blocking) and then you have ~17ms of the internal programming cycle.
It is actually very interesting how this can work. Until yesterday, the sector write function was blocking only on the SPI part, but I realised that it was a miracle it worked, with USB interrupts being able to get in there during the internal programming cycle. I think it worked because the flash is happy to continue the programming cycle while SPI is getting on with something else, and if SPI gets another write, that just gets delayed. I now have a fully blocking flash write, all 17ms of it, and it has reduced some "rare funny stuff" to zero.
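To show the difference between "blocking on SPI only" and "fully blocking", here is a minimal sketch that runs without hardware. The names spi_send_sector and flash_is_busy are hypothetical stand-ins for my real SPI transfer and status check, and the chip is simulated by a counter; the poll limit is there only so the loop cannot hang forever.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated chip state (assumption, so the sketch runs without hardware):
   number of status polls for which the chip still reports busy. */
static unsigned busy_polls_left;

/* Hypothetical: clock the sector out over SPI (~200 us on the target).
   The chip then starts its internal programming cycle (~17 ms). */
static void spi_send_sector(uint32_t sector, const uint8_t *buf)
{
    (void)sector;
    (void)buf;
    busy_polls_left = 5;
}

/* Hypothetical: read the chip status over SPI; true while programming. */
static bool flash_is_busy(void)
{
    if (busy_polls_left > 0) {
        busy_polls_left--;
        return true;
    }
    return false;
}

/* Fully blocking write: returns true only once the chip is idle again,
   or false if it is still busy after max_polls status reads. */
bool flash_write_sector_fully_blocking(uint32_t sector, const uint8_t *buf,
                                       unsigned max_polls)
{
    spi_send_sector(sector, buf);      /* the old function stopped here */
    for (unsigned i = 0; i < max_polls; i++) {
        if (!flash_is_busy()) {
            return true;
        }
    }
    return false;
}
```

The old version returned straight after the SPI part, so the next caller could arrive while the chip was still programming.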
No matter how I look at this, I think disabling USB interrupts is essential during all flash SPI activity, but that is quick and affects USB for only a short time (under 300us) and obviously doesn't affect anything else.
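What I mean by that, as a sketch with stand-in functions so it can be run on a PC. usb_irq_disable, usb_irq_enable and spi_transfer are hypothetical names; on an STM32 with the HAL the first two would presumably be something like HAL_NVIC_DisableIRQ() / HAL_NVIC_EnableIRQ() on the USB interrupt line, but treat that as an assumption about the target rather than tested code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Stand-ins so the sketch runs on a PC (assumptions, not ST code). */
static bool usb_irq_enabled = true;
static bool usb_irq_open_during_spi = false;

static void usb_irq_disable(void) { usb_irq_enabled = false; }
static void usb_irq_enable(void)  { usb_irq_enabled = true; }

/* Hypothetical SPI transfer; here a simple loopback of tx into rx. */
static void spi_transfer(const uint8_t *tx, uint8_t *rx, uint32_t len)
{
    /* Record whether a USB interrupt could have cut into this transfer. */
    if (usb_irq_enabled) {
        usb_irq_open_during_spi = true;
    }
    for (uint32_t i = 0; i < len; i++) {
        if (rx != NULL) {
            rx[i] = (tx != NULL) ? tx[i] : 0xFF;
        }
    }
}

/* All flash SPI traffic goes through here, with USB masked for the
   duration of the transfer only (under 300 us on my hardware). */
void flash_spi_guarded(const uint8_t *tx, uint8_t *rx, uint32_t len)
{
    usb_irq_disable();
    spi_transfer(tx, rx, len);
    usb_irq_enable();
}
```

The unconditional re-enable is only right if the wrapper is never called with the USB interrupt already masked; otherwise it would have to save and restore the previous state.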
So yes I have a check status function but it could be delayed for 300us.
ready: proceed normally
That will work for the first sector only.
busy: STALL the data stage, send CSW with csw.dCSWDataResidue = cbw.dCBWDataTransferLength
I think this is problematic due to having to test it with "everything out there".
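For reference, this is how I understand the CSW for the "busy" case would be put together, as a sketch in plain C. The 13-byte layout and the "USBS" signature are from the Bulk-Only Transport spec; the function names are made up, and using the "command failed" status is my assumption, since the suggestion only says what the residue should be.

```c
#include <stdint.h>

#define CSW_LENGTH         13u
#define CSW_SIGNATURE      0x53425355u  /* "USBS" when sent little-endian */
#define CSW_STATUS_PASSED  0x00u
#define CSW_STATUS_FAILED  0x01u

/* Serialise a CSW into a 13-byte buffer, all fields little-endian:
   dCSWSignature, dCSWTag, dCSWDataResidue, then bCSWStatus. */
void csw_build(uint8_t out[CSW_LENGTH], uint32_t tag,
               uint32_t residue, uint8_t status)
{
    const uint32_t words[3] = { CSW_SIGNATURE, tag, residue };
    for (unsigned w = 0; w < 3; w++) {
        for (unsigned b = 0; b < 4; b++) {
            out[4 * w + b] = (uint8_t)(words[w] >> (8 * b));
        }
    }
    out[12] = status;
}

/* Busy case: nothing was accepted, so the residue is the whole transfer
   length from the CBW. Status "failed" is an assumption (see above). */
void csw_build_busy(uint8_t out[CSW_LENGTH], uint32_t cbw_tag,
                    uint32_t cbw_transfer_len)
{
    csw_build(out, cbw_tag, cbw_transfer_len, CSW_STATUS_FAILED);
}
```

Building the packet is the easy bit. What each host does after seeing it is the part that would need testing against everything.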
The Q is how do flash sticks do this? They all work.
they just set up direct DMA xfers from USB peripheral to NAND sequencer and wait for completion
Which, if I understand right, is exactly what I am doing, with the only issue being that the 17ms USB ISR is affecting other stuff in my system. The host sees my system as it would see a USB stick. No CSW until write (sector or block?) complete, and note that most "cheap" flash chips have a 4k blocksize so probably no CSW for 4k (but then big flash chips have much faster writes than mine).
Well, it also affects the USB VCP (CDC) data flow, which has the benefit of some host retries but which I found gets corrupted if you are doing solid writing. This could be solved by documenting it.
I think I need to implement what ataradov is suggesting, but I am not smart enough to understand the ST USB stack to implement the "return CSW after flash write ends" part.
The suggestions in the other thread that I write a "driver" for the flash just postpone the problem further down. One still needs a way to hold off the host on a series of USB writes.