Author Topic: Interface mismatch, SPI protocol: ARM CPU to Microchip Nonvolatile memory (Read 1874 times)

AussieBruce · « **on:** August 31, 2022, 12:02:47 pm »

I’m trying to interface a Microchip 25LC1024 nonvolatile memory device with an ARM CPU, using the SPI protocol. Reading the datasheets, there seems to be a significant mismatch: the memory requires the SSEL line to remain low all the way through the data transfer, which encompasses multiple bytes (an instruction byte, followed by 3 address bytes, then a data byte). Looking at the ARM SPI specs (and those of the other available peripheral, an SSI channel that can be configured for SPI), this streamed mode doesn’t appear to be available; SSEL returns high at the end of every byte transmitted.

That looks like a stopper to me, or am I missing something? If it is, can anyone suggest a SPI NV memory device that will work with the typical ARM SPI channel behaviour?

I’m locked into NXP ARM devices so changing to Microchip, which presumably will interface, isn’t an option here. CPU is LPC1768 but all the NXP devices I’ve checked have the same SPI behaviour.

newbrain · « **Reply #1 on:** August 31, 2022, 12:48:02 pm »

The SPI peripherals are a vendor choice, and their behaviour depends very much on the specific IP.
In the NXP range, for example, the i.MX RT 10xx series SPI allows a continuous SSEL, provided that the transmit FIFO is kept non-empty.

The LPC176x does not have this capability, AFAICS, so the usual way is to use a separate GPIO pin as SSEL, controlled by software.

mikerj · « **Reply #2 on:** August 31, 2022, 12:51:19 pm »

You don't have to use the dedicated chip select line, you can just use a GPIO pin to control that signal instead so you can have whatever behaviour you need defined by your firmware. Dedicated chip select outputs for master mode SPI peripherals have a reputation for being annoying or not working as described.
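As a rough illustration of the GPIO approach suggested above, here is a minimal sketch of a 25LC1024 read with the select line held low for the whole transaction. The `spi_port_t` hooks and the `fake_*` stand-ins are hypothetical names invented for this example (they are not from any vendor library); on real hardware `cs_write` would drive a GPIO pin and `xfer_byte` would write the SPI/SSP data register and wait for the received byte. The READ instruction value 0x03 is the one given in the 25LC1024 datasheet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical board hooks (assumption, not a vendor API). */
typedef struct {
    void    (*cs_write)(int level);     /* 0 = selected, 1 = deselected        */
    uint8_t (*xfer_byte)(uint8_t out);  /* returns only after the byte has been
                                           fully clocked and the reply read    */
} spi_port_t;

#define EEPROM_CMD_READ 0x03u           /* 25LC1024 READ instruction */

/* Read len bytes from a 24-bit address, keeping CS low throughout. */
void eeprom_read(const spi_port_t *port, uint32_t addr, uint8_t *dst, size_t len)
{
    assert(port != NULL && dst != NULL);

    port->cs_write(0);                          /* select the memory          */
    port->xfer_byte(EEPROM_CMD_READ);           /* instruction byte           */
    port->xfer_byte((uint8_t)(addr >> 16));     /* 3 address bytes, MSB first */
    port->xfer_byte((uint8_t)(addr >> 8));
    port->xfer_byte((uint8_t)addr);
    for (size_t i = 0; i < len; i++) {
        dst[i] = port->xfer_byte(0x00u);        /* dummy byte clocks data out */
    }
    port->cs_write(1);                          /* deselect only at the end   */
}

/* Host-side stand-ins so the sketch can be run on a PC. */
int     fake_cs_level = 1;
int     fake_cs_edges = 0;
uint8_t fake_tx_log[16];
size_t  fake_tx_count = 0;

void fake_cs_write(int level)
{
    if (level != fake_cs_level) fake_cs_edges++;
    fake_cs_level = level;
}

uint8_t fake_xfer_byte(uint8_t out)
{
    assert(fake_cs_level == 0);                 /* CS must be low for every byte */
    if (fake_tx_count < sizeof fake_tx_log) fake_tx_log[fake_tx_count++] = out;
    return 0xA5u;                               /* arbitrary reply pattern       */
}
```

The important property is that the select line changes exactly twice per transaction, regardless of how many bytes are clocked in between.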

ataradov · « **Reply #3 on:** August 31, 2022, 04:26:58 pm »

Hardware control of the SS line in the SPI peripheral is the stupidest thing in the world. It never works as you need it to work. Just use it in a GPIO mode and control it manually.

Scrts · « **Reply #4 on:** August 31, 2022, 06:06:45 pm »

Quote from: ataradov on August 31, 2022, 04:26:58 pm

Hardware control of the SS line in the SPI peripheral is the stupidest thing in the world. It never works as you need it to work. Just use it in a GPIO mode and control it manually.

I didn't have many problems and you need it when you use DMA.
However, on the rare occasions when it didn't work, the SPI SS could be alt-selected to be a regular GPIO pin by software.

ataradov · « **Reply #5 on:** August 31, 2022, 06:12:15 pm »

Quote from: Scrts on August 31, 2022, 06:06:45 pm

you need it when you use DMA.

No you don't. Set GPIO low, start DMA transfer, wait for the transfer to be done, set GPIO high.

It is actually the opposite in some cases, you don't want it with DMA. If at some point DMA can't fulfill the request in time, you will have a hole on the SS signal.

This happens on real devices all the time. Some controllers include a protection method for this, where the first DMA transfer sets SS low but you have to release it manually at the end of the transfer. I don't see this being any clearer or more useful than plain manual control.

rstofer · « **Reply #6 on:** August 31, 2022, 06:30:50 pm »

Quote from: ataradov on August 31, 2022, 06:12:15 pm

Quote from: Scrts on August 31, 2022, 06:06:45 pm
you need it when you use DMA.
No you don't. Set GPIO low, start DMA transfer, wait for the transfer to be done, set GPIO high.

It sometimes happens when sending data byte-by-byte that SSEL gets raised programmatically before the last byte clears the transmitter. There's a reason I know this...

ataradov · « **Reply #7 on:** August 31, 2022, 06:35:21 pm »

Quote from: rstofer on August 31, 2022, 06:30:50 pm

It sometimes happens when sending data byte-by-byte that SSEL gets raised programmatically before the last byte clears the transmitter.

Well, you need to make sure that you wait until the data is actually flushed from the shift register. You have all the information for this. It is far better than possible intermittent drops in SS when the bus gets busy or some other DMA transfers happen at the same time.

AussieBruce · « **Reply #8 on:** September 01, 2022, 06:06:32 am »

Thanks to all commenters. I'd partially figured out that the solution might be to handle SSEL myself; my reservation was whether the clock might be doing things that interspersed dummy bits. However, a close look at the timing diagrams has dispelled that concern. Totally agree now, taking control of the select line is a no-brainer.

Siwastaja · « **Reply #9 on:** September 01, 2022, 09:18:43 am »

If you want automated things like some MCU peripheral transparently memory-mapping the external memory chip for you, then you would need hardware-nCS and compatibility between the two devices (same nCS usage pattern, or enough configurability on the MCU side).

Same if you need most performance with least CPU interaction.

But, usually you just toggle it in software. In such case, only having to toggle it for multi-byte packets instead of single bytes saves CPU work. It's the best case anyway, as the nCS will then work as a packet delimiter, setting the slave in correct state for each transaction even if the previous one missed a clock or something.

But as you probably won't run the SPI much over 20MHz anyway, 40-bit packet is already long enough so that two IO writes per packet is not a huge overhead; depending on how you design your program, it will be 1 or 2 ISRs per packet. If your SPI peripheral has FIFOs available, highest performance is just letting the FIFOs fill and in the same ISRs where you activate/deactivate the nCS, generate/process the data. If the FIFOs can't hold the 5 bytes, then DMA will be beneficial so that you won't need 5 extra ISRs for the data bytes.

peter-h · « **Reply #10 on:** September 02, 2022, 07:52:35 pm »

Quote

start DMA transfer, wait for the transfer to be done, set GPIO high.

It is a little bit more involved because the transmit completion happens too early to raise CS. One has to receive the data (which an SPI master always receives - even if it is junk) and when the transfer count has decremented to zero, then you know all the bits have been shifted out.

How exactly this works will depend on the chip, but on the 32F4, once you link a DMA to SPI, it is automatically bidirectional, and the transfer count is actually counting the receive data, so when NDTR=0 you are done.

If driving SPI in software then one definitely needs to make sure it is the receive data one is monitoring.
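A minimal sketch of that idea for a software-driven master: every transmitted byte is paired with one received byte, so the function only returns once the last bit has actually been shifted. The `spi_regs_t` accessors and the `fake_*` loopback are hypothetical names made up for this example; on an STM32F4 they would map to the TXE and RXNE flags and the data register, but that mapping is an assumption here, not taken from the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register accessors (assumption, not a vendor API). */
typedef struct {
    int     (*tx_empty)(void);       /* transmit buffer can take a byte  */
    int     (*rx_not_empty)(void);   /* a received byte is waiting       */
    void    (*write_dr)(uint8_t b);
    uint8_t (*read_dr)(void);
} spi_regs_t;

/* Polled full-duplex transfer. tx or rx may be NULL for rx-only / tx-only.
 * Returns the number of bytes received, which equals len on completion. */
size_t spi_transfer(const spi_regs_t *spi, const uint8_t *tx, uint8_t *rx, size_t len)
{
    assert(spi != NULL);
    size_t received = 0;

    for (size_t i = 0; i < len; i++) {
        while (!spi->tx_empty()) { }            /* room for the next byte     */
        spi->write_dr(tx ? tx[i] : 0x00u);
        while (!spi->rx_not_empty()) { }        /* byte i is fully shifted    */
        uint8_t in = spi->read_dr();            /* always read, even if junk  */
        if (rx) rx[i] = in;
        received++;
    }
    return received;                            /* now it is safe to raise CS */
}

/* Host-side loopback fake: replies with the inverted byte after 3 polls. */
uint8_t fake_shift      = 0;
int     fake_polls_left = 0;

int     fake_tx_empty(void)       { return 1; }
void    fake_write_dr(uint8_t b)  { fake_shift = (uint8_t)~b; fake_polls_left = 3; }
int     fake_rx_not_empty(void)
{
    if (fake_polls_left > 0) fake_polls_left--;
    return fake_polls_left == 0;
}
uint8_t fake_read_dr(void)        { return fake_shift; }
```

Waiting on the receive side costs nothing extra, since the master clocks a byte in for every byte it clocks out.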

The mfgs should have provided a GPIO output which is LOW anytime NDTR != 0 (basically). On the face of it, it would have worked with just one SPI peripheral but with external gating, say a 3->8 demux, you could have loads of SPI slaves.

newbrain · « **Reply #11 on:** September 02, 2022, 08:49:10 pm »

Quote from: peter-h on September 02, 2022, 07:52:35 pm

How exactly this works will depend on the chip, but on the 32F4, once you link a DMA to SPI, it is automatically bidirectional, and the transfer count is actually counting the receive data, so when NDTR=0 you are done.

What are you talking about?

On the STM32F4, you can have DMA on Tx, Rx, both (on separate DMA channels, ça va sans dire) or none.
See chapter 28.3.9 in the RM:

Quote

When the SPI is used only to transmit data, it is possible to enable only the SPI Tx DMA channel. In this case, the OVR flag is set because the data received are not read.
When the SPI is used only to receive data, it is possible to enable only the SPI Rx DMA channel.

And the description of bit 0 and bit 1 in SPI_CR2 register:

Quote

Bit 1 TXDMAEN: Tx buffer DMA enable
When this bit is set, the DMA request is made whenever the TXE flag is set.
0: Tx buffer DMA disabled
1: Tx buffer DMA enabled
Bit 0 RXDMAEN: Rx buffer DMA enable
When this bit is set, the DMA request is made whenever the RXNE flag is set.
0: Rx buffer DMA disabled
1: Rx buffer DMA enabled

Maybe you were thinking of the HAL?

As for determining when a transmission ends, one can check the TXE flag, and then the BSY flag - I see an errata only for slave mode.
EtA: to clarify, one must check both, in sequence, as BSY is set with a small delay after a write to SPI_DR.
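The two-step check described above can be sketched as follows. The bit positions (TXE = bit 1, BSY = bit 7 of SPI_SR) are the STM32F4 ones; the `read_sr_fn` callback and the `fake_*` status trace are hypothetical, added only so the loop can be exercised on a PC. On a target the callback would simply return the SPI status register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SR_TXE_BIT (1u << 1)   /* transmit buffer empty */
#define SR_BSY_BIT (1u << 7)   /* SPI busy              */

typedef uint32_t (*read_sr_fn)(void);  /* hypothetical status accessor */

/* Wait for the end of a transmission: TXE first, then BSY.
 * Returns the number of polls that had to be repeated. */
unsigned spi_wait_tx_done(read_sr_fn read_sr)
{
    assert(read_sr != NULL);
    unsigned polls = 0;

    /* 1. The last byte has left the transmit buffer. */
    while ((read_sr() & SR_TXE_BIT) == 0u) polls++;

    /* 2. The last bit has left the shift register. */
    while ((read_sr() & SR_BSY_BIT) != 0u) polls++;

    return polls;
}

/* Illustrative trace: right after the write neither TXE nor BSY is set,
 * so checking BSY alone at that point would report "done" too early. */
const uint32_t fake_sr_trace[] = { 0x00u, 0x80u, 0x82u, 0x82u, 0x02u };
size_t fake_sr_index = 0;

uint32_t fake_read_sr(void)
{
    size_t last = sizeof fake_sr_trace / sizeof fake_sr_trace[0] - 1;
    uint32_t v = fake_sr_trace[fake_sr_index < last ? fake_sr_index : last];
    fake_sr_index++;
    return v;
}
```

Only after both loops have exited should the firmware raise the chip select.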

peter-h · « **Reply #12 on:** September 02, 2022, 08:57:40 pm »

Code:

/*
*
* DMA-only version of HAL_SPI2_TransmitReceive() but fixed for SPI3 and xx-only options added.
*
* For use where fast transfers are needed, on the limit of SPI3 speed so with zero gaps. This is impossible
* to do by polling at 10.5Mbps or 21Mbps and is probably marginal at 5.25Mbps. The 16 bit SPI mode just
* manages gap-free with polling but works only with even block sizes, and has the "first byte problem" which
* DMA gets around.
*
* ** DMA ONLY SO NO CCM ACCESS SO THE TWO BUFFERS HAVE TO BE "STATIC" **
*
* This function is blocking so the caller can set CS=1 right away (check device data sheet!). A non-blocking
* version would make sense only if transmitting only (txonly=true) but the caller would have to tidy up
* the DMA and SPI3->CR2.
*
*
* Two modes, obviously mutually exclusive, for tx-only and rx-only:
* If txonly, dumps rx data so you don't need to allocate a buffer for it
* If rxonly, transmits all-0x00 so you don't need to feed SPI with some "known garbage"
*
* The rx-only mode is superfluous in most cases but it does avoid shifting non-0x00 data to the device
* while we are reading data out of it. With some devices this can matter. The ADS1118 ADC is one such.
*
* The yield parameter just releases control to RTOS, so equal-priority tasks don't have to wait until DMA
* finished. This might be useful for long/slow DMA transfers.
* 31/8/22 if true, corrupts incoming USB VCP data!
*
* Returns (false) if memory is in CCM, size=0, null pointers...
*
* NULL pointers are allowed if using the txonly or rxonly modes; then the unused one can be NULL.
*
* Originally streams 2+5 were used but it didn't work and produced bizarre effects like transfers hanging
* with the transfer counters having 1 and 127 in them, for a required count of 1. Changing the streams to
* 0+7 fixed it. This is because DMA1 Stream 5 is also used by the wavegen DACs.
* https://peter-ftp.co.uk/screenshots/20220202184125715.jpg
*
*/

bool SPI3_DMA_TransmitReceive(uint8_t *pTxData, uint8_t *pRxData, uint16_t Size, bool txonly, bool rxonly, bool yield)
{

	#ifdef TIMING_DEBUG
		ADS1118_an_2p5v_external(0);
	#endif
	// Check for invalid inputs

	if ( Size==0 ) return (false);
	if ( (pTxData==NULL) && !rxonly ) return (false);
	if ( (pRxData==NULL) && !txonly ) return (false);

	uint32_t txadd = (uint32_t) pTxData;
	uint32_t rxadd = (uint32_t) pRxData;
	static uint8_t txonly_target=0;					// must not be in CCM
	static uint8_t rxonly_source=0;					// must not be in CCM

	// Test for CCMRAM (rw) : ORIGIN = 0x10000000, LENGTH = 64K (linkerscript.ld)
	if ( ((txadd>=0x10000000) && (txadd<0x10010000)) || ((rxadd>=0x10000000) && (rxadd<0x10010000)) )
	{
		return(false);
	}

	// DMA ch 1 clock enable already done in b_main.c
	//RCC->AHB1ENR |= (1u << 21);					// DMA1EN=1 - DMA1 clock enable
	//hang_around_us(1);							// give it a chance to wake up

	// DMA1 Ch 0 Stream 0 is SPI3 RX

	DMA1_Stream0->CR = 0;							// disable DMA so all regs can be written

	DMA1->LIFCR = (0x03d << 0);					// clear int flags & transfer complete - 111101 stream 0

	DMA1_Stream0->NDTR = Size;

	if (txonly)
	{
		DMA1_Stream0->M0AR = (uint32_t) &txonly_target;		// memory address to dump rx data to
	}
	else
	{
		DMA1_Stream0->M0AR = rxadd;					// memory address in normal mode
	}

	DMA1_Stream0->PAR = (uint32_t) &(SPI3->DR);		// peripheral address
	DMA1_Stream0->FCR = 0;							// direct mode

	if (txonly)
	{
		DMA1_Stream0->CR = 0 << 25					// CHSEL: ch 0
						|  0 << 23			  		// MBURST: memory burst - single transfer
						|  0 << 21					// PBURST: peripheral burst - single transfer
						|  3 << 16					// PL: highest priority
						|  0 << 15					// PINCOS: no peripheral address increment offset
						|  0 << 13					// MSIZE: memory data size: byte
						|  0 << 11					// PSIZE: peripheral data size: byte
						|  0 << 10					// MINC: memory address increment: 0
						|  0 << 9					// PINC: peripheral address increment: 0
						|  0 << 8					// CIRC: no circular mode
						|  0 << 6					// DIR: peripheral to memory
						|  0 << 5					// PFCTRL: DMA is flow controller
						|  1 << 0;					// EN: enable stream
	}
	else
	{
		DMA1_Stream0->CR = 0 << 25					// CHSEL: ch 0
						|  0 << 23			  		// MBURST: memory burst - single transfer
						|  0 << 21					// PBURST: peripheral burst - single transfer
						|  3 << 16					// PL: highest priority
						|  0 << 15					// PINCOS: no peripheral address increment offset
						|  0 << 13					// MSIZE: memory data size: byte
						|  0 << 11					// PSIZE: peripheral data size: byte
						|  1 << 10					// MINC: memory address increment: 1
						|  0 << 9					// PINC: peripheral address increment: 0
						|  0 << 8					// CIRC: no circular mode
						|  0 << 6					// DIR: peripheral to memory
						|  0 << 5					// PFCTRL: DMA is flow controller
						|  1 << 0;					// EN: enable stream
	}

	// DMA1 Ch 0 Stream 7 is SPI3 TX

	DMA1_Stream7->CR = 0;							// disable DMA so all regs can be written

	DMA1->HIFCR = (0x03d << 22);					// clear int flags & transfer complete - 111101 stream 7

	DMA1_Stream7->NDTR = Size;

	if (rxonly)
	{
		DMA1_Stream7->M0AR = (uint32_t) &rxonly_source;		// memory address to fetch dummy tx data from
	}
	else
	{
		DMA1_Stream7->M0AR = txadd;					// memory address in normal mode
	}

	DMA1_Stream7->PAR = (uint32_t) &(SPI3->DR);		// peripheral address
	DMA1_Stream7->FCR = 0;							// direct mode

	if (rxonly)
	{
		DMA1_Stream7->CR = 0 << 25					// CHSEL: ch 0
						|  0 << 23			  		// MBURST: memory burst - single transfer
						|  0 << 21					// PBURST: peripheral burst - single transfer
						|  0 << 16					// PL: priority low
						|  0 << 15					// PINCOS: no peripheral address increment offset
						|  0 << 13					// MSIZE: memory data size: byte
						|  0 << 11					// PSIZE: peripheral data size: byte
						|  0 << 10					// MINC: memory address increment: 0
						|  0 << 9					// PINC: peripheral address increment: 0
						|  0 << 8					// CIRC: no circular mode
						|  1 << 6					// DIR: memory to peripheral
						|  0 << 5					// PFCTRL: DMA is flow controller
						|  1 << 0;					// EN: enable stream
	}
	else
	{
		DMA1_Stream7->CR = 0 << 25					// CHSEL: ch 0
						|  0 << 23			  		// MBURST: memory burst - single transfer
						|  0 << 21					// PBURST: peripheral burst - single transfer
						|  0 << 16					// PL: priority low
						|  0 << 15					// PINCOS: no peripheral address increment offset
						|  0 << 13					// MSIZE: memory data size: byte
						|  0 << 11					// PSIZE: peripheral data size: byte
						|  1 << 10					// MINC: memory address increment: 1
						|  0 << 9					// PINC: peripheral address increment: 0
						|  0 << 8					// CIRC: no circular mode
						|  1 << 6					// DIR: memory to peripheral
						|  0 << 5					// PFCTRL: DMA is flow controller
						|  1 << 0;					// EN: enable stream
	}

	// Config SPI3 to let DMA handle the data. These need to be cleared when transfer complete!
	// This starts the transfer

	SPI3->CR2 |= 3;									// TXDMAEN, RXDMAEN: 11 - both set in one go
	SPI3->CR1 |= (1<<6);							// SPE=1 enable SPI

	// Wait for DMA to finish. Blocking is necessary to prevent device CS=1 too early.
	// There could be a timeout here but a failure is impossible short of duff silicon, because
	// we are a Master and generating the SPI clock.

	while(true)
	{

		// Either method below worked fine

		//uint16_t temp1;
		uint32_t temp2;

		//temp1 = DMA1_Stream0->NDTR;
		//if ( temp1 == 0 ) break;					// transfer count = 0

		temp2 = DMA1->LISR;
		if ( (temp2 & (1<<5)) !=0 ) break;			// TCIF0

		if (yield) taskYIELD();						// release to RTOS (see notes in comments)

	}

	SPI3->CR2 &= ~3;								// TXDMAEN, RXDMAEN: 00 - both cleared in one go

	// Clear int pending flags. They get cleared at the top of this function anyway, but...

	DMA1->LIFCR = (0x03d << 0);						// clear int flags & transfer complete - 111101 stream 0
	DMA1->HIFCR = (0x03d << 22);					// clear int flags & transfer complete - 111101 stream 7

	// Clear any rx data and the overrun flag in case not all received data was read

	SPI3->CR1 &= ~(1<<6);							// SPE=0 disable SPI

	//hang_around_us(1);
	(void) SPI3->DR;								// dummy reads: discard any stale rx data
	(void) SPI3->DR;
	(void) SPI3->SR;								// reading SR after DR clears OVR

	#ifdef TIMING_DEBUG
		ADS1118_an_2p5v_external(1);
	#endif

	return (true);

}

newbrain · « **Reply #13 on:** September 02, 2022, 09:21:49 pm »

Yes, and?

Maybe it's my limited English knowledge, but I don't see how:

Quote from: peter-h, emphasis mine

on the 32F4, once you link a DMA to SPI, it is automatically bidirectional

That's the way you have implemented it - IIUC based on the HAL - quite reasonable, though I did not check the details, apart from the yield you know I don't like.
The code always uses two DMA channels and checks the receive one for the end of the transaction, but this is just an implementation choice; where's the automatism?

Neither in the F4 SPI, which, apart from some combination of BIDIMODE and RXONLY, is always bidirectional, nor in the DMA, which requires one separate channel per direction, each one optional.

peter-h · « **Reply #14 on:** September 02, 2022, 09:36:03 pm »

Apologies - I thought that was normal.

That code is not HAL. It is one I have been using for a long time.

Quote

apart from the yield you know I don't like

That I don't understand.

newbrain · « **Reply #15 on:** September 02, 2022, 09:57:39 pm »

Quote from: peter-h on September 02, 2022, 09:36:03 pm

Apologies - I thought that was normal.

That code is not HAL. It is one I have been using for a long time.

Quote
apart from the yield you know I don't like

That I don't understand.

No worries, I was trying to understand if there was some kind of misconception behind it. Glad to see there's not.

As for the yield:

Blocks all lower priority tasks (I know, you don't have any today). This is against all good practices.
Does a lot of useless and expensive context switches.
May introduce a delay of up to N-1 time slices in the loop (where N is the number of ready tasks at the same priority).

What's not to like, especially considering that:

It breaks other parts of the code.

peter-h · « **Reply #16 on:** September 03, 2022, 06:00:38 am »

OK, yes, I have taken out the taskyield (as discussed in the other thread). OsDelay(1) works a lot better. But there are still errors, at a 10x lower rate, even with no switch at all, in USB code which is interrupt-driven and should not be affected by any of this, which suggests this is a timing issue of some sort, together with a bug in the ST USB code (which is ex Cube MX).

I would also totally agree that doing any kind of "yield" while waiting for DMA to run is worthless unless moving a lot of data and/or the SPI clock is slow. For example I might be doing a few k bytes to an SPI RAM, or sending data to some ancient-design chip which has a max SPI clock of 500kHz.

Siwastaja · « **Reply #17 on:** September 03, 2022, 06:22:58 am »

Quote from: peter-h on September 02, 2022, 07:52:35 pm

It is a little bit more involved because the transmit completion happens too early to raise CS.

SPI master TX is completely deterministic and takes constant time, easily pre-calculated. Failing an interrupt provided by the peripheral, just use a timer.
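For illustration, the pre-calculation can be as small as the sketch below. The function name and the idea of expressing the result in timer ticks are assumptions made for this example; it also assumes the peripheral clocks bytes back to back, so any inter-byte gap the hardware inserts would have to be added on top.

```c
#include <assert.h>
#include <stdint.h>

/* Duration of an SPI master transfer, in timer ticks, rounded up.
 * Assumes gap-free clocking and a result that fits in 32 bits. */
uint32_t spi_transfer_ticks(uint32_t n_bytes, uint32_t spi_hz, uint32_t timer_hz)
{
    assert(spi_hz != 0u);
    uint64_t bits = (uint64_t)n_bytes * 8u;
    /* ticks = ceil(bits * timer_hz / spi_hz) */
    return (uint32_t)((bits * timer_hz + spi_hz - 1u) / spi_hz);
}
```

With a 1 MHz timer, the 40-bit packet at 20 MHz discussed earlier in the thread comes out at 2 ticks, and the same packet at 650 kHz at 62 ticks, so one function can replace a per-clock-rate table.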

For continuous back-to-back transfers with devices which specify a very short minimum nCS inactive period, you can just use one repeating timer interrupt which deactivates nCS, busy loops a few clock cycles if necessary, activates nCS and starts the next transfer (either using DMA or filling the SPI FIFO). This is the least amount of CPU intervention possible without hardware nCS. If the SPI slave requires a longer nCS deactivation period, then you obviously would not use a busy loop, but two separate interrupts.

It might also be possible to use some timer with output compare channels to generate the nCS waveform and also give triggers to DMA.

peter-h · « **Reply #18 on:** September 03, 2022, 06:37:38 am »

I prefer to keep things simple.

I did this timing thing on a 2-wire RS485 product where the UART didn't have an "all sent" flag, so I had to wait until I saw TX buffer space and then I used a hardware timer. Then I had a table of timer values for every baud rate - just as one will need with SPI if feeding peripherals which use different SPI clocks. It was a bit of a hassle. In my current project I have 5+ slaves, at different rates from 650kHz to 21MHz.

It is IMHO better to use a (probably unused) DMA RX channel to receive the stuff you send out (well it won't be the same values, but it will be the right number of bits) and then you know when it's all gone out, without any timing, interrupts, etc. and can raise CS immediately.

It would be ingenious to feed the SPI clock to a timer set up as a counter, which raises CS after the right # of bits automatically. One would have to preload the counter compare register with #bytes * 8.

