I've found that if you only wait for the TX register to be empty, you exit too early, which causes problems if you raise CS right after that.
You're right. I found this piece of code that I wrote two years ago (some irrelevant lines skipped):
unsigned short xfer_srio(void)
{
    unsigned short out, in;

    out = srio1_out | srio2_out;
    SPI1->DR = out;
    while (!(SPI1->SR & SPI_I2S_FLAG_TXE));  /* wait until the TX buffer is empty */
    while ((SPI1->SR & SPI_I2S_FLAG_BSY));   /* wait until the frame has been shifted out */
    in = SPI1->DR;
    SRIO_LATCH_LOW;
    asm("nop; ");
    asm("nop; ");
    asm("nop; ");
    asm("nop; ");
    asm("nop; ");
    SRIO_LATCH_HIGH;
    SRIO_OE_LOW;
    srioz_out = out;
    return in;
}
The nops stretch the latch pulse slightly after the data has been shifted in and out (some HC595 / HC597 shift registers are involved here). I haven't touched this code since then, and the device is still running (on an STM32F303, as far as I remember). I've never had problems using the SPI this way on various STM32F1 and F3 devices.
Using the SPI on an STM32F7 Nucleo board (with the ST HAL and Cube-generated drivers) didn't work out well; these drivers appear to be quite buggy. Better to roll your own.
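To illustrate why the second loop on BSY matters, here is a rough host-side sketch. It is a deliberately simplified model and not the real STM32 peripheral: `spi_model_t` and all `spi_model_*` names are made up for this example, and the FIFO of the F3 parts is ignored. The only behaviour it assumes is that TXE is set as soon as the frame moves from the TX buffer into the shift register, while BSY stays set until the last bit has been clocked out.

```c
#include <assert.h>
#include <stdbool.h>

#define FRAME_BITS 16u  /* matches the 16-bit data size used in the post */

/* Hypothetical, simplified model of an SPI transmitter (not real registers). */
typedef struct {
    bool     txe;        /* TX buffer empty flag */
    bool     bsy;        /* shift register busy flag */
    unsigned bits_left;  /* bits still to be clocked out */
} spi_model_t;

static void spi_model_init(spi_model_t *s)
{
    s->txe = true;
    s->bsy = false;
    s->bits_left = 0;
}

/* Model of a write to the data register: the frame lands in the TX buffer. */
static void spi_model_write(spi_model_t *s)
{
    assert(s->txe);  /* only write when the TX buffer is empty */
    s->txe = false;
}

/* Advance the model by one clock period. */
static void spi_model_tick(spi_model_t *s)
{
    if (s->bits_left > 0) {
        s->bits_left--;
        if (s->bits_left == 0 && s->txe)
            s->bsy = false;  /* last bit out and nothing queued */
    }
    if (!s->txe && s->bits_left == 0) {
        /* TX buffer -> shift register: TXE is set before any bit is sent */
        s->txe = true;
        s->bsy = true;
        s->bits_left = FRAME_BITS;
    }
}

/* Bits still on the wire if we stop waiting as soon as TXE is set. */
unsigned bits_left_after_txe_only(void)
{
    spi_model_t s;
    spi_model_init(&s);
    spi_model_write(&s);
    while (!s.txe)
        spi_model_tick(&s);
    return s.bits_left;
}

/* Bits still on the wire if we wait for TXE and then for BSY to clear. */
unsigned bits_left_after_txe_and_bsy(void)
{
    spi_model_t s;
    spi_model_init(&s);
    spi_model_write(&s);
    while (!s.txe)
        spi_model_tick(&s);
    while (s.bsy)
        spi_model_tick(&s);
    return s.bits_left;
}
```

In this model the TXE-only wait returns with the whole frame still to be sent, which is exactly the moment where raising CS or pulsing the latch would be too early. Waiting for BSY as well returns with nothing left to shift.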
This is the associated SPI init (GPIO init is done elsewhere):
void init_srio(void)
{
    SPI_InitTypeDef SPI_InitStructure;

    RCC_APB2PeriphClockCmd(RCC_APB2Periph_SPI1, ENABLE);

    SPI_InitStructure.SPI_Direction = SPI_Direction_2Lines_FullDuplex;
    SPI_InitStructure.SPI_Mode = SPI_Mode_Master;
    SPI_InitStructure.SPI_DataSize = SPI_DataSize_16b;
    SPI_InitStructure.SPI_CPOL = SPI_CPOL_High;
    SPI_InitStructure.SPI_CPHA = SPI_CPHA_2Edge;
    SPI_InitStructure.SPI_NSS = SPI_NSS_Hard;
    SPI_InitStructure.SPI_BaudRatePrescaler = SPI_BaudRatePrescaler_16;
    SPI_InitStructure.SPI_FirstBit = SPI_FirstBit_MSB;
    SPI_InitStructure.SPI_CRCPolynomial = 7;
    SPI_Init(SPI1, &SPI_InitStructure);
    SPI_SSOutputCmd(SPI1, ENABLE);
    SPI_Cmd(SPI1, ENABLE);
}
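For a feel of the timing this init produces, here is a small back-of-the-envelope sketch. The prescaler of 16 and the 16-bit frame come from the init above; the 72 MHz peripheral clock is only an assumption for illustration (check your own clock tree), and the helper names `spi_sck_hz` and `spi_frame_ns` are made up for this example.

```c
#include <stdint.h>

/* SCK frequency in Hz: the SPI clock is the peripheral clock divided by the prescaler. */
uint32_t spi_sck_hz(uint32_t pclk_hz, uint32_t prescaler)
{
    return pclk_hz / prescaler;
}

/* Time to clock out one frame of `bits` bits, in nanoseconds (rounded down). */
uint32_t spi_frame_ns(uint32_t sck_hz, uint32_t bits)
{
    return (uint32_t)(((uint64_t)bits * 1000000000ull) / sck_hz);
}
```

With the assumed 72 MHz clock, `spi_sck_hz(72000000u, 16u)` gives 4500000 (4.5 MHz), and `spi_frame_ns(4500000u, 16u)` gives 3555, so one 16-bit transfer takes roughly 3.6 µs. That is about how long the BSY flag stays set after TXE is already set again.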