Anyone here familiar with LWIP?

peter-h:
This is a weird one. I posted it on the ST forum but nobody there replies (a huge number of posts every day).

To interface LWIP to the low level ETH hardware, one has a file normally called ethernetif.c. This contains low_level_input() and low_level_output() functions. In the simplest case (mine) these are polled, not interrupt-driven.

My version of the latter function (this code is all over the internet, in pretty much identical form) is


--- Code: ---static err_t low_level_output(struct netif *netif, struct pbuf *p)
{
  err_t errval;
  struct pbuf *q;
  uint8_t *buffer = (uint8_t *)(EthHandle.TxDesc->Buffer1Addr);
  __IO ETH_DMADescTypeDef *DmaTxDesc;
  uint32_t framelength = 0;
  uint32_t bufferoffset = 0;
  uint32_t byteslefttocopy = 0;
  uint32_t payloadoffset = 0;

  DmaTxDesc = EthHandle.TxDesc;
  bufferoffset = 0;
 
  /* copy frame from pbufs to driver buffers */
  for(q = p; q != NULL; q = q->next)
  {
    /* Is this buffer available? If not, goto error */
    if((DmaTxDesc->Status & ETH_DMATXDESC_OWN) != (uint32_t)RESET)
    {
      errval = ERR_USE;
      goto error;
    }
   
    /* Get bytes in current lwIP buffer */
    byteslefttocopy = q->len;
    payloadoffset = 0;
   
    /* Check if the length of data to copy is bigger than Tx buffer size*/
    while( (byteslefttocopy + bufferoffset) > ETH_TX_BUF_SIZE )
    {

      //osDelay(2); // TODO

      /* Copy data to Tx buffer*/
      memcpy( (uint8_t*)((uint8_t*)buffer + bufferoffset), (uint8_t*)((uint8_t*)q->payload + payloadoffset), (ETH_TX_BUF_SIZE - bufferoffset) );
     
      /* Point to next descriptor */
      DmaTxDesc = (ETH_DMADescTypeDef *)(DmaTxDesc->Buffer2NextDescAddr);
     
      /* Check if the buffer is available */
      if((DmaTxDesc->Status & ETH_DMATXDESC_OWN) != (uint32_t)RESET)
      {
        errval = ERR_USE;
        goto error;
      }
     
      buffer = (uint8_t *)(DmaTxDesc->Buffer1Addr);
     
      byteslefttocopy = byteslefttocopy - (ETH_TX_BUF_SIZE - bufferoffset);
      payloadoffset = payloadoffset + (ETH_TX_BUF_SIZE - bufferoffset);
      framelength = framelength + (ETH_TX_BUF_SIZE - bufferoffset);
      bufferoffset = 0;
    }
   
    /* Copy the remaining bytes */
    memcpy( (uint8_t*)((uint8_t*)buffer + bufferoffset), (uint8_t*)((uint8_t*)q->payload + payloadoffset), byteslefttocopy );
    bufferoffset = bufferoffset + byteslefttocopy;
    framelength = framelength + byteslefttocopy;
  }
 
  /* Prepare transmit descriptors to give to DMA */
  sys_mutex_lock(&lock_eth_if_out);
  HAL_ETH_TransmitFrame(&EthHandle, framelength);
  sys_mutex_unlock(&lock_eth_if_out);
 
  errval = ERR_OK;
 
error:
 
  /* When Transmit Underflow flag is set, clear it and issue a Transmit Poll Demand to resume transmission */
  if ((EthHandle.Instance->DMASR & ETH_DMASR_TUS) != (uint32_t)RESET)
  {
    /* Clear TUS ETHERNET DMA flag */
    EthHandle.Instance->DMASR = ETH_DMASR_TUS;
   
    /* Resume DMA transmission*/
    EthHandle.Instance->DMATPDR = 0;
  }
  return errval;
}
--- End code ---

Note the commented-out osDelay(2). This fixed a bug whereby a large data transfer (2MB) was fairly often corrupted. An examination of the data found that the size was preserved but some data appeared in 2 places consecutively, as if a buffer was being written before previous data was extracted out of it (by the dedicated DMA controller which services the 32F4 ETH subsystem).

I suspected the memcpy was running while the DMA was still reading that buffer. That delay immediately fixed the issue. I use (2) rather than (1) because osDelay(1) is actually 0ms to 1ms.
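One way to pin down this kind of duplication is to send a self-describing test pattern and scan the received copy. The following is only a sketch under assumptions: `fill_pattern` and `first_mismatch` are hypothetical helper names, not part of LWIP or the ST HAL, and the pattern is simply "each 32-bit word holds its own index".

```c
#include <stddef.h>
#include <stdint.h>

/* Fill buf so that every 32-bit word holds its own index. */
static void fill_pattern(uint32_t *buf, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        buf[i] = (uint32_t)i;
    }
}

/* Return the index of the first word that does not hold its own index,
 * or `words` if the whole buffer matches the pattern. */
static size_t first_mismatch(const uint32_t *buf, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        if (buf[i] != (uint32_t)i) {
            return i;
        }
    }
    return words;
}
```

With this pattern, the value found at the first mismatch is the offset the stray data originally came from, which makes it easy to see whether a whole TX buffer was sent twice.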

The obvious Q is: how come this works for other people? Or maybe lots of people fixed this bug and never posted about it. OTOH I can easily see that most users would never discover it, because it shows up only on

- big transfers (1MB+)
- data source is very fast (my flash FS read speed is > 1 MB/s)

If point #2 is not the case then this won't show up because LWIP will be feeding the packets to low_level_output too slowly and the DMA will always be under-running. Those people are wasting RAM having more than 1 TX buffer anywhere, too, so all the stuff all over the internet about "tuning TCP" is wasted ;)

My first suspicion was that the check below (which AIUI tests whether a DMA TX buffer is available) ought to come before the memcpy that precedes it. Looking at the program flow, though, it should be OK, because the same check is repeated at the top of the outer for loop.


--- Code: ---      /* Check if the buffer is available */
      if((DmaTxDesc->Status & ETH_DMATXDESC_OWN) != (uint32_t)RESET)
      {
        errval = ERR_USE;
        goto error;
      }
--- End code ---

My other thought was: why return an error code when no TX buffer is available, given that a) the TX DMA cannot fail (unless the silicon is duff); b) at 10/100 Mbps a buffer will become available very quickly; and c) the error is returned to LWIP, and according to Google (there is virtually zero support for LWIP anywhere, even on the LWIP mailing lists) LWIP does not always retransmit after a TX error?
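For a sense of scale on point b), the time one full-size frame occupies the wire can be worked out directly. This is a back-of-envelope sketch: it assumes an untagged frame with a 1500-byte payload and counts preamble and inter-frame gap; `max_frame_time_ns` is a name made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes on the wire for one maximum-size untagged Ethernet frame:
 * 8 preamble/SFD + 14 header + 1500 payload + 4 FCS + 12 inter-frame gap. */
#define WIRE_BYTES_MAX_FRAME (8u + 14u + 1500u + 4u + 12u)

/* Time to clock out one maximum-size frame, in nanoseconds,
 * for a link speed given in Mbit/s (10 or 100 here). */
static uint32_t max_frame_time_ns(uint32_t link_mbps)
{
    assert(link_mbps > 0u);
    /* bits * 1000 / (Mbit/s) gives nanoseconds */
    return (WIRE_BYTES_MAX_FRAME * 8u * 1000u) / link_mbps;
}
```

That comes to roughly 123 us per frame at 100 Mbps and about 1.23 ms at 10 Mbps, so a single TX buffer should never stay owned by the DMA for longer than that if the link is up and the DMA is running.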

Anyway, the osDelay(2) is a bad bodge, so I fixed it inside HAL_ETH_TransmitFrame() by waiting for the DMA status to show "all transfers complete", with


--- Code: ---  // Make this function blocking, otherwise following code overwrites the last DMA buffer!
  if ( (((heth->Instance)->DMASR) & (0x7 << 20)) != 0 )
  {
    taskYIELD();   // not really necessary since the time here would be a max of 1 MTU at 10-100 Mbps
  }
--- End code ---

which is probably suboptimal because it prevents LWIP from getting the next packet ready while the DMA is transmitting the previous one to ETH. With some test code, the output speed is about 200 kB/s, which is totally fine for the application.
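The mask `0x7 << 20` picks out the transmit process state (TPS) field of DMASR. The sketch below decodes that field on a plain integer so the states can be inspected; the state encodings are written from my reading of the F4 reference manual and should be checked against RM0090 before relying on them, and `tps_name` and `tx_dma_busy` are hypothetical helpers, not HAL functions.

```c
#include <stdint.h>

#define DMASR_TPS_SHIFT 20u
#define DMASR_TPS_MASK  (0x7u << DMASR_TPS_SHIFT)

/* Human-readable name for the TPS field of a DMASR value
 * (encodings assumed from the reference manual, verify before use). */
static const char *tps_name(uint32_t dmasr)
{
    switch ((dmasr & DMASR_TPS_MASK) >> DMASR_TPS_SHIFT) {
    case 0u: return "stopped";
    case 1u: return "running: fetching descriptor";
    case 2u: return "running: waiting for status";
    case 3u: return "running: reading buffer";
    case 6u: return "suspended";
    case 7u: return "running: closing descriptor";
    default: return "reserved";
    }
}

/* Nonzero only while the DMA is actively working on a frame,
 * i.e. in one of the "running" states above. */
static int tx_dma_busy(uint32_t dmasr)
{
    uint32_t tps = (dmasr & DMASR_TPS_MASK) >> DMASR_TPS_SHIFT;
    return tps == 1u || tps == 2u || tps == 3u || tps == 7u;
}
```

If those encodings are right, "suspended" is also a non-zero value, so a test of the form `!= 0` cannot by itself tell "idle between frames" from "busy"; a check against the running states may be closer to the intent.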

I also found that replacing the "check if buffer is available" with


--- Code: ---     // Is this buffer available? If not, wait
      while ( (DmaTxDesc->Status & ETH_DMATXDESC_OWN) != 0 ) {}
--- End code ---

works too, and the function never returns an error code.
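An unbounded `while` will hang the task if the DMA ever stops (link down, clock problem). A bounded variant might look like the sketch below, which runs on a host with a stand-in descriptor: `tx_desc_t`, `wait_desc_free` and `fake_dma_step` are hypothetical names for illustration, and only the OWN bit position (bit 31, as in ETH_DMATXDESC_OWN) is taken from the real hardware.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TXDESC_OWN (1u << 31)   /* same bit position as ETH_DMATXDESC_OWN */

/* Hypothetical stand-in for the Status word of a TX DMA descriptor. */
typedef struct {
    volatile uint32_t Status;
} tx_desc_t;

/* Poll until the DMA releases the descriptor or max_polls is used up.
 * `idle` is called between polls (on target this could yield to the RTOS).
 * Returns true if the CPU now owns the descriptor. */
static bool wait_desc_free(tx_desc_t *d, uint32_t max_polls,
                           void (*idle)(tx_desc_t *))
{
    while ((d->Status & TXDESC_OWN) != 0u) {
        if (max_polls-- == 0u) {
            return false;           /* give up: caller can return ERR_USE */
        }
        if (idle != NULL) {
            idle(d);
        }
    }
    return true;
}

/* Test double: pretends the DMA finishes after three polls. */
static void fake_dma_step(tx_desc_t *d)
{
    static int calls;
    if (++calls >= 3) {
        d->Status &= ~TXDESC_OWN;
    }
}
```

On the target the caller would still return an error when the wait times out, but only after giving the DMA a realistic chance to free the buffer.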

Does anyone know anything about this stuff?

ttt:
The STM32F4 series have I+D caches. If you have them enabled, have you tried flushing the cache before the memcpy (setting the DCRST and ICRST bits in FLASH_ACR)?

wek:

--- Quote from: ttt on July 18, 2022, 08:34:27 pm ---The STM32F4 series have I+D caches. If you have them enabled, have you tried flushing the cache before the memcpy (setting the DCRST and ICRST bits in FLASH_ACR)?

--- End quote ---
Those caches are on the FLASH interface. There's no reason to flush them unless you change the FLASH content (i.e. reprogram FLASH), and it's not the case here.

JW

wek:

--- Quote ---An examination of the data found that the size was preserved but some data appeared in 2 places consecutively
--- End quote ---
Dropped frames should result in data missing, not corrupted.

And IP should cope with missing data; that's what TCP is for.

I don't offer answers, just doubts.

JW

peter-h:

--- Quote ---The STM32F4 series have I+D caches. If you have them enabled, have you tried flushing the cache before the memcpy (setting the DCRST and ICRST bits in FLASH_ACR)?
--- End quote ---

I found that too but all I could find is that the H7 has this issue, not the F4 - as wek says above.


--- Quote ---Dropped frames should result in data missing, not corrupted.
--- End quote ---

I don't think frames are dropped. The packet serial numbering system would pick that up. I think, with my very limited knowledge of TCP/IP, that if you overwrite that buffer with "junk" and the buffer was holding a complete packet, then the ETH controller will generate a "good" CRC on the packet (this I believe is a low level hardware feature; LWIP is not computing a CRC32, is it?) as it is being transmitted. And since there is no end-to-end CRC, the corruption is not detected.
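For reference, TCP segments do carry a 16-bit ones'-complement checksum over header and payload, separate from the Ethernet CRC32. Whether it would catch an overwritten buffer depends on where it is computed: if the MAC inserts it at transmit time (an assumption about this setup, not something confirmed here), a buffer overwritten before the DMA reads it would go out with a valid checksum over the wrong data. A minimal sketch of the algorithm (RFC 1071 style; `inet_checksum` is a name chosen for the example, not LWIP's function):

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones'-complement Internet checksum over `len` bytes.
 * The 32-bit accumulator is adequate for packet-sized inputs. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    /* Add the data as big-endian 16-bit words. */
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    /* An odd trailing byte is padded with zero on the right. */
    if (len == 1) {
        sum += (uint32_t)data[0] << 8;
    }
    /* Fold the carries back into the low 16 bits. */
    while ((sum >> 16) != 0u) {
        sum = (sum & 0xFFFFu) + (sum >> 16);
    }
    return (uint16_t)~sum;
}
```

Because it is just a sum, this checksum is much weaker than a CRC: reordering 16-bit words within the data, for example, leaves it unchanged.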

Possibly my bold text above is irrelevant and actually each of the buffers does simply get sent as a packet on ETH, with a 1:1 buffer-packet relationship. I know the 32F4 ETH controller picks up buffers according to order in some sort of list, but I don't think it concatenates them to fill up an MTU.

For amusement: many years ago I designed a token-ring LAN (before ETH was cost-effective, or even less than unbelievably complicated) using a Z180+85c30, SDLC packets, and a Manchester-encoded, isolated MIL1553 physical interface. I did that after a contracted-out LAN project with the WD2840 token ring controller ground to a halt (£500/day in 1985!), so I did it myself the hard way. I had issues like this too. With just CRC16, errors would slip through if there was enough noise.
