Author Topic: LwIP freezes when unplugging the cable during a 'write' operation.  (Read 2204 times)


Offline tilblackout (Topic starter)

  • Contributor
  • Posts: 31
  • Country: cn
Hello everyone.
My board connects to the internet over Ethernet (RJ45), and I need to use LwIP to send data packets to a TCP server at approximately 50 Hz. However, if the program is in the middle of the write function when I suddenly unplug the network cable, the write function gets stuck in an LwIP assertion, as shown in the screenshot below:

[screenshot of the assertion not preserved]
The higher the frequency of the 'write' operation, the easier it is to reproduce the issue.

Similarly, if LwIP is in the process of connecting (lwip_netconn_do_connect), closing (lwip_netconn_do_close), or deleting (lwip_netconn_do_delconn) a connection, an external disturbance such as unplugging the cable can also leave the program stuck at the second state check in these functions.

Since I am using FreeRTOS, and multiple tasks are using LwIP-related functions, it seems that LWIP_TCPIP_CORE_LOCKING should be enabled. However, in the above situations, LwIP causes the program to enter the assertion directly, without any alternative handling. If I skip this assertion, LwIP will eventually crash due to insufficient memory.

Do any of you have good solutions or workarounds? I would greatly appreciate your responses.
 

Offline GromBeestje

  • Frequent Contributor
  • **
  • Posts: 280
  • Country: nl
Re: LwIP freezes when unplugging the cable during a 'write' operation.
« Reply #1 on: December 16, 2023, 08:50:59 am »
What do you mean by skipping the assertion? Just commenting it out? I suppose that will cause trouble, as the function continues in the bad state. Better to replace it with a return.
 

Offline tilblackout (Topic starter)

  • Contributor
  • Posts: 31
  • Country: cn
Re: LwIP freezes when unplugging the cable during a 'write' operation.
« Reply #2 on: December 16, 2023, 09:13:07 am »
What do you mean by skipping the assertion? Just commenting it out? I suppose that will cause trouble, as the function continues in the bad state. Better to replace it with a return.

After I commented out this line, the code that followed was already a return. But this seems unsafe: some memory may never be freed, which could ultimately lead to errors.
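
One way to limit the risk of leaked resources is to make every failed write end in a full teardown of the connection, so that nothing half-open survives into the next attempt. The following is a minimal host-side sketch of that pattern only. `sender_t`, `sender_send` and the `fake_*` hooks are hypothetical names invented for illustration, not LwIP API; on a real target the hooks would presumably wrap calls such as `netconn_connect`, `netconn_write`, `netconn_close` and `netconn_delete`. It does not by itself stop an assertion from firing inside LwIP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transport hooks (illustration only, not LwIP API). */
typedef struct {
    bool (*open)(void *ctx);
    bool (*write)(void *ctx, const void *data, size_t len);
    void (*close)(void *ctx);   /* must release everything open() acquired */
    void *ctx;
    bool is_open;
} sender_t;

/* Send one packet. On any failure the connection is torn down completely,
 * so the next call starts again from a clean state. */
bool sender_send(sender_t *s, const void *data, size_t len)
{
    assert(s != NULL);

    if (!s->is_open) {
        if (!s->open(s->ctx)) {
            return false;           /* nothing was acquired, nothing to free */
        }
        s->is_open = true;
    }

    if (!s->write(s->ctx, data, len)) {
        s->close(s->ctx);           /* free the connection on the error path */
        s->is_open = false;
        return false;
    }
    return true;
}

/* Fake transport for host testing: open and write fail while the link is down. */
typedef struct {
    bool link_up;
    int opens;
    int closes;
} fake_link_t;

bool fake_open(void *ctx)
{
    fake_link_t *f = ctx;
    if (!f->link_up) {
        return false;
    }
    f->opens++;
    return true;
}

bool fake_write(void *ctx, const void *data, size_t len)
{
    (void)data;
    (void)len;
    return ((fake_link_t *)ctx)->link_up;
}

void fake_close(void *ctx)
{
    ((fake_link_t *)ctx)->closes++;
}
```

With this shape, the 50 Hz task simply calls `sender_send` each period and drops the packet when it returns false; reconnection happens on a later call once the link is back.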
 

Offline ttt

  • Regular Contributor
  • *
  • Posts: 87
  • Country: us
Re: LwIP freezes when unplugging the cable during a 'write' operation.
« Reply #3 on: December 16, 2023, 10:06:15 am »
My board connects to the internet through Ethernet RJ45, and I need to use LwIP to send approximately 50Hz data packets to a TCP server.

Just to make sure, where are you calling lwip_sendmsg/sendmsg/lwip_send/send from? From what I remember you are not supposed to call these functions from a low level interrupt, like a timer.
 

Offline tilblackout (Topic starter)

  • Contributor
  • Posts: 31
  • Country: cn
Re: LwIP freezes when unplugging the cable during a 'write' operation.
« Reply #4 on: December 18, 2023, 01:06:38 am »
My board connects to the internet through Ethernet RJ45, and I need to use LwIP to send approximately 50Hz data packets to a TCP server.

Just to make sure, where are you calling lwip_sendmsg/sendmsg/lwip_send/send from? From what I remember you are not supposed to call these functions from a low level interrupt, like a timer.
I called the "write" function in a high priority task of FreeRTOS, and I set the priority of each task related to LwIP differently to prevent conflicts between them.
 

Offline peter-h

  • Super Contributor
  • ***
  • Posts: 3701
  • Country: gb
  • Doing electronics since the 1960s...
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 
The following users thanked this post: tilblackout

Offline tilblackout (Topic starter)

  • Contributor
  • Posts: 31
  • Country: cn
Re: LwIP freezes when unplugging the cable during a 'write' operation.
« Reply #6 on: December 19, 2023, 08:25:07 am »
What have you done about core locking (lwipopts.h)?

See
https://www.eevblog.com/forum/microcontrollers/any-stm-32f4-eth-lwip-freertos-mbedtls-experts-here-(not-free-advice)/msg4298761/#msg4298761

I've set LWIP_TCPIP_CORE_LOCKING to 1 and MEMP_MEM_MALLOC to 0.
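
For reference, those two settings would look roughly like the lwipopts.h fragment below. The two debug hooks after them are an addition, not something from my current configuration: as far as I know they exist in LwIP 2.1 and later, and they can help catch calls into the stack that are made without holding the core lock. `sys_check_core_locking` and `sys_mark_tcpip_thread` are functions the port has to provide; the names follow the LwIP contrib ports and are an assumption for any other SDK.

```c
/* lwipopts.h fragment (sketch; check every name against your LwIP version) */
#define LWIP_TCPIP_CORE_LOCKING   1   /* one global mutex protects the core */
#define MEMP_MEM_MALLOC           0   /* memp pools are static, not heap-backed */

/* Optional debug hooks (assumed: LwIP 2.1 or later). The port must implement
 * both functions, for example by comparing the current task handle with the
 * owner of the core mutex and asserting on a mismatch. */
void sys_check_core_locking(void);
void sys_mark_tcpip_thread(void);
#define LWIP_ASSERT_CORE_LOCKED()  sys_check_core_locking()
#define LWIP_MARK_TCPIP_THREAD()   sys_mark_tcpip_thread()
```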

I read your post. Do you mean that the ethernet_input function should not use the mutex associated with the LWIP_TCPIP_CORE_LOCKING macro?
 

Offline peter-h

  • Super Contributor
  • ***
  • Posts: 3701
  • Country: gb
  • Doing electronics since the 1960s...
Re: LwIP freezes when unplugging the cable during a 'write' operation.
« Reply #7 on: December 19, 2023, 09:25:53 am »
I worked on this 2 years ago and haven't been back, but...

  * Mutex protection in low_level_input etc not needed according to here
  * https://community.st.com/s/question/0D73W000001P2M3SAK
  *
  * __dmb not needed according to https://community.st.com/s/question/0D73W000001PMYnSAO

This is my eth input - it uses polling since that avoids all kinds of issues, and in my system (most embedded systems actually) it is plenty fast enough

Code: [Select]

/**
  * This function is the ethernetif_input task. It uses the function low_level_input()
  * that handles the actual reception of bytes from the network interface.
  *
  * This is a standalone RTOS task so is a forever loop.
  * Could be done with interrupts but then we have the risk of hanging the unit with fast input
  * (unlikely if ETH is via a switch, but you could have a faulty LAN with lots of
  * broadcasts) plus we have the issue of link status change detection in a thread-safe way.
  *
  */

void ethernetif_input( void * argument )
{
    struct pbuf *p;
    struct netif *netif = (struct netif *) argument;
    uint32_t link_change_check_count = ETH_LINK_CHANGE_COUNT;

    // Define RX activity timer, for dropping fast poll down to slow poll
    // (TimerHandle_t is already a handle type, so it is not declared as a pointer)
    TimerHandle_t rxactive_timer = xTimerCreate("ETH RX active timer", pdMS_TO_TICKS(ETH_SLOW_POLL_DELAY), pdFALSE, NULL, RXactiveTimerCallback);

    // Start "rx active" timer
    xTimerStart(rxactive_timer, 20); // 20 is just a wait time for timer allocation

    do
    {
        p = low_level_input( netif ); // This sets rxactive=true if it sees data

        if (p!=NULL)
        {
            if (netif->input( p, netif) != ERR_OK )
            {
                pbuf_free(p);
            }
        }

        if (rxactive)
        {
            rxactive=false;
            // Seen rx data - reload timeout
            xTimerReset(rxactive_timer, 20); // Reload "rx active" timeout (with ETH_SLOW_POLL_DELAY)
            // and get osDelay below to run fast
            rx_poll_period=ETH_RX_FAST_POLL_INTERVAL;
        }

        // This has a dramatic effect on ETH speed, both ways (TCP/IP acks all packets)
        osDelay(rx_poll_period);

        // Do ETH link status change check

        link_change_check_count--;
        if (link_change_check_count==0)
        {
            // reload counter
            link_change_check_count = ETH_LINK_CHANGE_COUNT;

            // Get most recently recorded link status
            bool net_up = netif_is_link_up(&g_netconf_netif);

            // Read the physical link status
            ethernetif_set_link(&g_netconf_netif);

            // Has the link status changed
            if (net_up != netif_is_link_up(&g_netconf_netif))
            {
                ethernetif_update_config(&g_netconf_netif);

                if (net_up) {
                    // Link was up so must have dropped
                    debug_thread_printf("Ethernet link down");
                }
                else {
                    // Link was down so must be up - restart DHCP
                    debug_thread("Ethernet link up");
                    network_restart_DHCP();
                }
            }
        }

    } while(true);

}


The link status detect is important

Code: [Select]

/**
 * @brief  This function is called on change of link status
 *         to update low level driver configuration.
 * @param  netif: The network interface
 *
 * Called from ethernetif_input if there was a link change.
 *
 */
static void ethernetif_update_config(struct netif *netif)
{
    __IO uint32_t tickstart = 0;
    uint32_t regvalue = 0;

    if(netif_is_link_up(netif))
    {
        /* Restart the auto-negotiation */
        if(EthHandle.Init.AutoNegotiation != ETH_AUTONEGOTIATION_DISABLE)
        {
            /* Enable Auto-Negotiation */
            HAL_ETH_WritePHYRegister(&EthHandle, PHY_BCR, PHY_AUTONEGOTIATION);

            /* Get tick */
            tickstart = HAL_GetTick();

            /* Wait until the auto-negotiation will be completed */
            do
            {
                HAL_ETH_ReadPHYRegister(&EthHandle, PHY_BSR, &regvalue);

                /* Check for the Timeout ( 1s ) */
                if((HAL_GetTick() - tickstart ) > 1000)
                {
                    /* In case of timeout */
                    goto error;
                }

            } while (((regvalue & PHY_AUTONEGO_COMPLETE) != PHY_AUTONEGO_COMPLETE));

            /* Read the result of the auto-negotiation */
            HAL_ETH_ReadPHYRegister(&EthHandle, PHY_SR, &regvalue);

            /* Configure the MAC with the Duplex Mode fixed by the auto-negotiation process */
            if((regvalue & PHY_DUPLEX_STATUS) != (uint32_t)RESET)
            {
                /* Set Ethernet duplex mode to Full-duplex following the auto-negotiation */
                EthHandle.Init.DuplexMode = ETH_MODE_FULLDUPLEX;
            }
            else
            {
                /* Set Ethernet duplex mode to Half-duplex following the auto-negotiation */
                EthHandle.Init.DuplexMode = ETH_MODE_HALFDUPLEX;
            }
            /* Configure the MAC with the speed fixed by the auto-negotiation process */
            if(regvalue & PHY_SPEED_STATUS)
            {
                /* Set Ethernet speed to 10M following the auto-negotiation */
                EthHandle.Init.Speed = ETH_SPEED_10M;
            }
            else
            {
                /* Set Ethernet speed to 100M following the auto-negotiation */
                EthHandle.Init.Speed = ETH_SPEED_100M;
            }
        }
        else /* AutoNegotiation Disable */
        {
        error :

            /* Set MAC Speed and Duplex Mode to PHY */
            HAL_ETH_WritePHYRegister(&EthHandle, PHY_BCR, ((uint16_t)(EthHandle.Init.DuplexMode >> 3) |
                                                           (uint16_t)(EthHandle.Init.Speed >> 1)));
        }

        /* ETHERNET MAC Re-Configuration */
        HAL_ETH_ConfigMAC(&EthHandle, (ETH_MACInitTypeDef *) NULL);

        /* Restart MAC interface */
        HAL_ETH_Start(&EthHandle);
    }
    else
    {
        /* Stop MAC interface */
        HAL_ETH_Stop(&EthHandle);
    }

    ethernetif_notify_conn_changed(netif);
}




// Called from ETHIF thread. Detects link change and resets the link state.

static void ethernetif_set_link(struct netif *netif)
{
    uint32_t regvalue = 0;

    // Read PHY_BSR
    HAL_ETH_ReadPHYRegister(&EthHandle, PHY_BSR, &regvalue);

    regvalue &= PHY_LINKED_STATUS;

    // Check whether the netif link down and the PHY link is up
    if (!netif_is_link_up(netif) && (regvalue))
    {
        /* network cable is connected */
        netif_set_link_up(netif);
    }
    else if (netif_is_link_up(netif) && (!regvalue))
    {
        /* network cable is disconnected */
        netif_set_link_down(netif);
    }

}



You can see the link change is detected by the PHY chip (LAN8742 etc) and this needs to be used to stop other things hanging.

But this stuff is so complex... and LWIP was done c. 15 years ago and everybody who worked on it has long since moved on, so there are no support forums for it, and the API descriptions are mostly useless unless you already know the answer. It works solidly though.

On the ST forum there is a super clever guy calling himself Piranha, but he feeds on people less clever than himself (IQ below 200) which is OK - I don't mind being insulted if I get a solution - but he rarely posts a complete solution to anything.

Which CPU are you using? I believe there are working sources for the H7.
 

Offline tilblackout (Topic starter)

  • Contributor
  • Posts: 31
  • Country: cn
Re: LwIP freezes when unplugging the cable during a 'write' operation.
« Reply #8 on: December 20, 2023, 01:33:52 am »

Thank you for your response and for providing the solutions (though it seems that the links to the two ST forums cannot be opened).

I am using LwIP on two NXP microcontrollers (Kinetis MK64 and i.MX RT1170), and it seems that NXP's SDK team has addressed some bugs in LwIP. The MK64 went into mass production in 2015, and its SDK has been discontinued for years. The program freezing issue I encountered occurs on this chip. Its ethernetif_input is as follows:
Code: [Select]
/**
 * This function should be called when a packet is ready to be read
 * from the interface. It uses the function ethernetif_linkinput() that
 * should handle the actual reception of bytes from the network
 * interface. Then the type of the received packet is determined and
 * the appropriate input function is called.
 *
 * @param netif the lwip network interface structure for this ethernetif
 */
void ethernetif_input(struct netif *netif)
{
    struct pbuf *p;

    LWIP_ASSERT("netif != NULL", (netif != NULL));

    /* move received packet into a new pbuf */
    while ((p = ethernetif_linkinput(netif)) != NULL)
    {
        /* pass all packets to ethernet_input, which decides what packets it supports */
        if (netif->input(p, netif) != ERR_OK)
        {
            LWIP_DEBUGF(NETIF_DEBUG, ("ethernetif_input: IP input error\n"));
            pbuf_free(p);
            p = NULL;
        }
    }
}
Meanwhile, the i.MX RT1170 is currently NXP's flagship chip, so its SDK is updated promptly, and NXP seems to actively address bugs in LwIP. Its implementation of ethernetif_input is also quite different from the MK64's (similarly, STM32 may also carry some fixes for LwIP bugs).
Code: [Select]
/**
 * This function should be called when a packet is ready to be read
 * from the interface. It uses the function ethernetif_linkinput() that
 * should handle the actual reception of bytes from the network
 * interface. Then the type of the received packet is determined and
 * the appropriate input function is called.
 *
 * @param netif_ the lwip network interface structure for this ethernetif
 */
void ethernetif_input(struct netif *netif_)
{
#if ETH_DO_RX_IN_SEPARATE_TASK
    (void)netif_;

    if (__get_IPSR())
    {
        portBASE_TYPE taskToWake = pdFALSE;
        xTaskNotifyFromISR(ethernetif_rx_task, netif_to_bitmask(netif_), eSetBits, &taskToWake);
        if (taskToWake == pdTRUE)
        {
            portYIELD_FROM_ISR(taskToWake);
        }
    }
    else
    {
        (void)xTaskNotifyGive(ethernetif_rx_task);
    }
#else
    fetch_all_received_pkts(netif_);
#endif /* ETH_DO_RX_IN_SEPARATE_TASK */
}

static void fetch_all_received_pkts(struct netif *netif_)
{
    struct pbuf *p;
    /* move received packet into a new pbuf */
    while ((p = ethernetif_linkinput(netif_)) != NULL)
    {
        /* pass all packets to ethernet_input, which decides what packets it supports */
        if (netif_->input(p, netif_) != (err_t)ERR_OK)
        {
            LWIP_DEBUGF(NETIF_DEBUG, ("fetch_all_received_pkts: IP input error\n"));
            ethernetif_pbuf_free_safe(p);
            p = NULL;
        }
    }
}
The macro ETH_DO_RX_IN_SEPARATE_TASK is enabled by default, so Ethernet packets are processed in a separate task (which in fact also calls fetch_all_received_pkts). The fetch_all_received_pkts function here is similar to ethernetif_input on the MK64, but the function that releases the pbuf appears, judging by its name, to be thread-safe. If you're interested, you can download this SDK from NXP's SDK Dashboard.
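
The excerpt above only shows the notifying side (xTaskNotifyFromISR with eSetBits and netif_to_bitmask); the receiving task is not shown. The sketch below is my guess at the general pattern, written for a host PC: a plain uint32_t stands in for the task notification value, and `netif_index_to_bit`, `rx_dispatch`, `fake_fetch` and `MAX_NETIFS` are hypothetical names, not taken from the SDK.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NETIFS 4u   /* hypothetical upper bound on interfaces */

/* Hypothetical counterpart of netif_to_bitmask: one bit per interface. */
uint32_t netif_index_to_bit(unsigned idx)
{
    assert(idx < MAX_NETIFS);
    return (uint32_t)1u << idx;
}

/* Receiving side: walk the pending bits and fetch packets for each interface
 * that was flagged. Returns how many interfaces were serviced. */
unsigned rx_dispatch(uint32_t pending, void (*fetch)(unsigned idx))
{
    unsigned serviced = 0u;

    for (unsigned idx = 0u; idx < MAX_NETIFS; idx++) {
        if ((pending & netif_index_to_bit(idx)) != 0u) {
            fetch(idx);     /* on target: drain every packet for this netif */
            serviced++;
        }
    }
    return serviced;
}

/* Host-test stand-in for the per-interface fetch function. */
unsigned fetch_log[MAX_NETIFS];

void fake_fetch(unsigned idx)
{
    fetch_log[idx]++;
}
```

Because setting the same bit twice leaves a single bit set, several interrupts for one interface collapse into one wake-up. That is presumably why fetch_all_received_pkts loops until ethernetif_linkinput returns NULL instead of handling one packet per notification.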

This reminds me that in the future, when encountering problems, I may consider referring to the SDKs of different semiconductor manufacturers.

My next project will use this chip. I'll see if the SDK for this chip has already addressed this issue.
« Last Edit: December 20, 2023, 05:33:52 am by tilblackout »
 

Offline ttt

  • Regular Contributor
  • *
  • Posts: 87
  • Country: us
Re: LwIP freezes when unplugging the cable during a 'write' operation.
« Reply #9 on: December 21, 2023, 07:05:45 am »

This reminds me that in the future, when encountering problems, I may consider referring to the SDKs of different semiconductor manufacturers.

My next project will use this chip. I'll see if the SDK for this chip has already addressed this issue.

Like my previous post, this will not fix anything, but going forward may I suggest switching to ThreadX/NetX, as it will be licensed under the MIT license (https://threadx.io/)? At least it seems NXP has some support for it:

https://www.nxp.com/design/design-center/software/embedded-software/azure-rtos-for-nxp-microcontrollers:AZURE-RTOS

Overall I find it much easier to debug and way more stable. It's more resource hungry compared to lwIP but that should not be an issue for your target chips. lwIP is still the only good option if you have less than 128K of RAM though.
 

Offline peter-h

  • Super Contributor
  • ***
  • Posts: 3701
  • Country: gb
  • Doing electronics since the 1960s...
Re: LwIP freezes when unplugging the cable during a 'write' operation.
« Reply #10 on: December 21, 2023, 07:46:23 am »
LWIP should not need anywhere remotely near 128k. More like 20-30k. But it depends on the way it is used; some packet output functions will simply hang if some buffers (lwipopts.h) are too small.

Unfortunately, once I got this stuff working I rarely revisited it, because it is so damn complex and opaque.

OTOH it is obvious that some parts of LWIP work in quite a simple way, e.g. outputting data just wraps it up and sends it out directly to the ETH subsystem (low_level_output etc). There, it disappears down the wire at 100 Mbps really fast, so having lots of TX buffers is a bit pointless. But they need to be big enough.

There are also zero-copy implementations but I am not sure if any of them actually work properly. I would expect max tx packet to have to be < (1 MTU + header) whether using the netconn or the sockets API. I also did lots of timing tests on the data copying and found the penalty to be a few microseconds per packet, only (memcpy is very fast, even 1 byte at a time).
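
To put rough numbers on the above: the sketch below computes how long a frame occupies the wire, counting the standard Ethernet overhead (14-byte header, 4-byte FCS, 8-byte preamble/SFD, 12-byte inter-frame gap). `wire_time_ns` and `MAX_FRAME_WIRE_BYTES` are just illustrative names. A full-size frame comes out at about 123 µs at 100 Mbps, so a copy costing a few microseconds per packet is a small fraction of that.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes on the wire for a maximum-size untagged Ethernet frame:
 * 1500 payload + 14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap. */
#define MAX_FRAME_WIRE_BYTES (1500u + 14u + 4u + 8u + 12u)

/* Time in nanoseconds that wire_bytes occupy a link running at rate_mbps.
 * bits / (Mbit/s) gives microseconds, so multiply by 1000 first to get ns. */
uint32_t wire_time_ns(uint32_t wire_bytes, uint32_t rate_mbps)
{
    assert(rate_mbps != 0u);
    return (wire_bytes * 8u * 1000u) / rate_mbps;
}
```

For example, wire_time_ns(MAX_FRAME_WIRE_BYTES, 100) gives 123040 ns, and a minimum-size frame (64 + 8 + 12 = 84 bytes on the wire) gives 6720 ns.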
« Last Edit: December 21, 2023, 08:56:05 am by peter-h »
 

