I cannot thank you enough, dare, for these amazing and illuminating posts.
BTW I did more tests, with a) LWIP_TCPIP_CORE_LOCKING ON and b) LWIP_TCPIP_CORE_LOCKING OFF (and with LWIP_ALLOW_MEM_FREE_FROM_OTHER_CONTEXT and SYS_LIGHTWEIGHT_PROT ON, as apparently required if calling the API from multiple tasks). The only difference I can reproduce is that the former delivers better file transfer performance for a given, fairly small, number of PBUFs. But this is a transfer from LWIP, so presumably the increase is related to being able to buffer more ACKs. So I have gone back to LWIP_TCPIP_CORE_LOCKING ON. As you said, it should not make much difference because LWIP internally serialises the processing of the messages anyway (it runs only one RTOS task).
Given your configuration, I think the simple answer is that MEM_SIZE governs the number of buffers available for outgoing data (i.e. data your app send over the network) and PBUF_POOL_SIZE governs the buffers available for incoming data (i.e. packets you receive from the network). Setting MEM_SIZE too small will induce the kind of buffer starvation/write blocking behavior described above. Setting PBUF_POOL_SIZE too small will result in inbound packets being dropped in the low-level input function.
This is so poorly documented in the various online sources. I did find, empirically, that MEM_SIZE directly limits the biggest packet which can be sent in one go. What happens if you try to send a bigger one I didn't investigate, because I've been using
netconn_write(conn, PAGE_HEADER_ALL, strlen((char*)PAGE_HEADER_ALL), NETCONN_COPY);
and not checking the return code. In that context there was little one could do with it. A MEM_SIZE of 6k allowed about 4k to be sent out. So MEM_SIZE is really about transmit.
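For anyone who does want to act on the return code, here is a minimal sketch of sending a large block in pieces and stopping at the first error. It is deliberately independent of lwIP so it compiles on its own: write_fn, send_chunked, struct sink and sink_write are all made-up names for illustration. In a real build the writer callback would wrap netconn_write(conn, data, len, NETCONN_COPY) and return its err_t (ERR_OK is 0), and the chunk size would be chosen to stay under whatever MEM_SIZE allows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical writer signature: returns 0 on success, non-zero on error.
 * With lwIP this would wrap netconn_write() and pass back its err_t. */
typedef int (*write_fn)(void *ctx, const void *data, size_t len);

/* Send len bytes in pieces of at most chunk bytes.
 * Returns 0 if everything was written, else the first error seen. */
int send_chunked(write_fn wr, void *ctx,
                 const void *data, size_t len, size_t chunk)
{
    const unsigned char *p = (const unsigned char *)data;

    assert(wr != NULL && chunk > 0);
    while (len > 0) {
        size_t n = (len < chunk) ? len : chunk;
        int err = wr(ctx, p, n);
        if (err != 0) {
            return err;          /* caller decides: retry, close, log */
        }
        p += n;
        len -= n;
    }
    return 0;
}

/* Stand-in writer for illustration only: counts calls and bytes, and
 * rejects any single write larger than max_len (mimicking a heap limit). */
struct sink {
    size_t calls;
    size_t bytes;
    size_t max_len;
};

int sink_write(void *ctx, const void *data, size_t len)
{
    struct sink *s = (struct sink *)ctx;

    (void)data;
    if (len > s->max_len) {
        return -1;
    }
    s->calls++;
    s->bytes += len;
    return 0;
}
```

With a 4-byte limit, a 10-byte buffer goes out as three writes when chunked at 4, and fails immediately when chunked at 8, which is roughly the behaviour one would want to detect rather than ignore.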
Various sources suggest the PBUFs are used for rx only, while others suggest they are used for both rx and tx. I could not see the point of using this "list of buffers" system for tx, because tx can be done by taking the caller's packet and repackaging it (presumably using some buffer temporarily allocated in MEM_SIZE) into one or more packets which are sent down to the PHY layer. In transmit mode, LWIP has total control. OTOH TCP packets are ACKed, so presumably the PBUFs are used for these too, and there would lie another bottleneck; that may be a reason for having a larger number of smaller PBUFs (the original ST port of LWIP used 512 byte PBUFs).
I see that you have set TCP_SND_BUF to 4 * TCP_MSS
It was already set to 2xTCP_MSS following your earlier suggestion.
Perhaps this is what is happening with your TLS task. TLS sends a huge chunk of data (the server certificates) to the peer during the handshake
I asked the guy who did the TLS integration on this product. It was empirically determined that TLS sends out about 1k during the setup. For the size of subsequent data packets, one can negotiate the size, and we have set this to 4k. So yeah, one comes back to your 4k. This relates to TLS needing two 16k buffers by definition (16k rx and 16k tx), but if you control one or both ends then these can be reduced. With an HTTPS client the rx buffer must still be 16k, but the tx buffer can be smaller because you control what is sent out, and this saves 12k of RAM.
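To make the 16k/4k split concrete, here is what it might look like in a TLS config header. This is a sketch under a loud assumption: I am using Mbed TLS option names purely as an example, and nothing above says which TLS library is actually in use here. Other libraries have equivalent knobs under different names.

```c
/* Sketch only. ASSUMPTION: an Mbed TLS style config header; the TLS
 * library used in this product is not stated in the post. */

/* rx: an arbitrary server may send full-size (16k) TLS records */
#define MBEDTLS_SSL_IN_CONTENT_LEN   16384

/* tx: we decide how much goes into each outgoing record */
#define MBEDTLS_SSL_OUT_CONTENT_LEN   4096
```

The difference, 16384 - 4096 = 12288 bytes, is the 12k saving mentioned above.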
For background: Currently TLS is working with a statically allocated block of 48k in which it runs its private heap; this seems sufficient for everything that's been tested, but unfortunately this is not (and cannot be) deterministic. It depends on what crypto suites get negotiated. AES was using quite a lot of RAM (the TLS implementation was actually pretty fast, at 800 kbytes/sec) and moving it to the 32F417's hardware AES saved 12k. But other savings are much smaller, and the 417 doesn't do RSA/EC, hence all this hassle with the 3 sec hangs.
We have not tested DES/3DES, which could be moved to 417 hardware, but I've not been able to find out if anybody out there might still be using it, and there is a proposal for new TLS to drop DES. I think this is dangerous because e.g. it was found that some root certificates (some of these are old, e.g. 2005) are signed with a "supposedly deprecated" hash, and if you tried to be clever and removed support for obsolete hashes because you read on the internet that they should not be used, you would have a problem! So the guy ported our TLS code to a Win32 executable which can be run on a connection which isn't working and will tell you directly why not.
The problem with doing crypto in hardware is that it prevents one using one of the Chinese copies of the 417, which are just a 407 (no crypto).
I've tried increasing MEM_SIZE to silly values and it doesn't fix the hanging issue.
I've tried the priority adjustment and it makes no difference. Currently all "internet" tasks run at idle (0).
The main task which was getting hung up by TLS is now running OK due to pool size = 6 (was 4)
/* ---------- Pbuf dynamically allocated buffer blocks ----------
*
* These settings make little difference to performance, which is dominated by
* the low_level_input poll period. These PBUFs relate directly to the netconn API netbufs.
*
* PBUF_POOL_SIZE: the number of buffers in the pbuf pool.
* Statically allocated, so each increment costs about 1.5k of RAM. Any value under 4 slows
* down the data rate a lot.
* 12/2/23 PH 4 -> 6 fixes TLS blocking of various RTOS tasks (5 almost does).
* The most this can go to with TLS still having enough free RAM for its 48k block is ~9.
*/
#define PBUF_POOL_SIZE 6
/* PBUF_POOL_BUFSIZE: the size of each pbuf in the pbuf pool.
* The +2 is to get a multiple of 4 bytes so the PBUFs are 4-aligned (for memcpy_fast())
* Probably the +2 is not needed because the PBUFs are 4-aligned anyway in the LWIP code.
*/
#define PBUF_POOL_BUFSIZE (1500 + PBUF_LINK_ENCAPSULATION_HLEN + PBUF_LINK_HLEN + 2)
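A side note on size macros like PBUF_POOL_BUFSIZE: when the macro body is a bare sum, any multiplication at the point of use binds only to the first term. The standalone sketch below shows the difference. The numbers are illustrative assumptions (14 standing in for PBUF_LINK_HLEN, 0 for the encapsulation header), and I am not claiming lwIP itself expands the macro in a way that trips over this; it is just cheap insurance.

```c
#include <assert.h>

/* Illustrative values only: 14 stands in for PBUF_LINK_HLEN and the
 * encapsulation header length is taken as 0. Check your own build. */
#define BUFSIZE_BARE   1500 + 0 + 14 + 2
#define BUFSIZE_PAREN (1500 + 0 + 14 + 2)

/* 6 * 1500 + 0 + 14 + 2: only the 1500 gets multiplied */
static_assert(6 * BUFSIZE_BARE == 9016, "bare macro: precedence surprise");

/* 6 * (1500 + 0 + 14 + 2): the intended total */
static_assert(6 * BUFSIZE_PAREN == 9096, "parenthesised macro: as intended");

/* Payload bytes for a pool of n buffers (struct pbuf headers not counted). */
unsigned pool_bytes_bare(unsigned n)  { return n * BUFSIZE_BARE; }
unsigned pool_bytes_paren(unsigned n) { return n * BUFSIZE_PAREN; }
```

For a pool of 6 the two versions differ by 80 bytes, which is small here but is exactly the kind of thing that is hard to spot later.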
The remaining task that is still getting hung up by TLS is an http server which uses the netconn API, and nothing I do seems to fix that, not even a priority of 40 (same as LWIP) or a huge increase in the buffers, but it doesn't matter because it is intended for local config only. I would never let a user run a server on an open port anyway.
I am very sure that its RTOS task isn't actually hanging; it is hanging on the LWIP API calls. That in turn suggests insufficient buffers somewhere.
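One way to test the "insufficient buffers somewhere" theory would be lwIP's own statistics counters, which record usage and allocation failures for the heap and for each memp pool. A sketch of the lwipopts.h changes, assuming lwIP 2.x option names (note that LWIP_STATS is 0 in my file, so this is untried here):

```c
/* lwipopts.h additions (sketch, untested in this project) */
#define LWIP_STATS          1   /* master switch for the counters */
#define LWIP_STATS_DISPLAY  1   /* compiles in stats_display() */
#define MEM_STATS           1   /* heap (MEM_SIZE): used, max, err */
#define MEMP_STATS          1   /* per-pool counters, including PBUF_POOL */
```

Calling stats_display() from a debug task while the server is stuck should then show whether any pool or the heap is reporting errors, which would narrow down which setting to raise.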
FWIW my lwipopts.h file is below, in case anybody is digging around here in years to come.
/**
******************************************************************************
* @file LwIP/LwIP_HTTP_Server_Netconn_RTOS/Inc/lwipopts.h
* @author MCD Application Team
* @brief lwIP Options Configuration.
******************************************************************************
*
* This sort of explains the memory usage
* [url]https://lwip-users.nongnu.narkive.com/dkzkPa8l/lwip-memory-settings[/url]
* [url]https://www.cnblogs.com/shangdawei/p/3494148.html[/url]
* [url]https://lwip.fandom.com/wiki/Tuning_TCP[/url]
* [url]https://groups.google.com/g/osdeve_mirror_tcpip_lwip/c/lFYJ7Fw0Cxg[/url]
* ST UM1713 document gives an overview of integrating all this.
*
*
*
*
* 7/7/22 PH MEM_SIZE set to 5k (was 10k). Only ~1.5k is used.
* 13/7/22 PH MEMP_MEM_MALLOC=1, MEM_SIZE=16k.
* 14/7/22 PH MEMP_MEM_MALLOC=0; 1 was unreliable. 8 x 512 byte buffers now.
* 6/8/22 PH 4 x MTU buffers for RX. Done partly for EditGetData().
* MEM_SIZE=6k. 5k is used - see .map file for ram_heap address.
* Sizes of various static RAM structures determined experimentally.
* 17/12/22 PH LWIP_TCP_KEEPALIVE=1 (for ethser)
* 27/1/23 PH IP_SOF_BROADCAST etc added, later commented-out.
* 10/2/23 PH PBUF_POOL_BUFSIZE 4-aligned.
* 12/2/23 PH PBUF_POOL_SIZE = 6.
*
*
*
*
*
*
*
*
*/
#ifndef __LWIPOPTS_H__
#define __LWIPOPTS_H__
/**
* NO_SYS==1: Provides VERY minimal functionality. Otherwise,
* use lwIP facilities.
*/
#define NO_SYS 0
// Flag to make LWIP API thread-safe. The netconn and socket APIs are claimed
// to be thread-safe anyway. The raw API is never thread-safe.
// A huge amount of online discussion on this topic; most of it unclear, but
// ON (1) seems to be recommended, as being more efficient.
#define LWIP_TCPIP_CORE_LOCKING 1
// If LWIP_TCPIP_CORE_LOCKING=0 then these two need to be 1
// See [url]https://www.nongnu.org/lwip/2_1_x/multithreading.html[/url]
//#define LWIP_ALLOW_MEM_FREE_FROM_OTHER_CONTEXT 1
//#define SYS_LIGHTWEIGHT_PROT 1
// This places more objects into the static block defined by MEM_SIZE.
// Uses mem_malloc/mem_free instead of the lwip pool allocator.
// MEM_SIZE now needs to be increased by about 10k.
// It doesn't magically produce extra memory, and causes crashes.
// There is also a performance loss, apparently. AVOID.
#define MEMP_MEM_MALLOC 0
//NC: Need for sending PING messages by keepalive
#define LWIP_RAW 1
#define DEFAULT_RAW_RECVMBOX_SIZE 4
// For ETHSER
#define LWIP_TCP_KEEPALIVE 1
/*-----------------------------------------------------------------------------*/
/* LwIP Stack Parameters (modified compared to initialization value in opt.h) -*/
/* Parameters set in STM32CubeMX LwIP Configuration GUI -*/
/*----- Value in opt.h for LWIP_DNS: 0 -----*/
#define LWIP_DNS 1
/* ---------- Memory options ---------- */
/* MEM_ALIGNMENT: should be set to the alignment of the CPU for which
lwIP is compiled. 4 byte alignment -> define MEM_ALIGNMENT to 4, 2
byte alignment -> define MEM_ALIGNMENT to 2. */
#define MEM_ALIGNMENT 4
/*
* MEM_SIZE: the size of the heap memory. This is a statically allocated block. You can find it
* in the .map file as the symbol ram_heap and you can see how much of this RAM gets used.
* If MEMP_MEM_MALLOC=0, this holds just the PBUF_ stuff.
* If MEMP_MEM_MALLOC=1 (which is not reliable) this greatly expands and needs 16k+.
* Empirically this needs to be big enough for at least 4 x PBUF_POOL_BUFSIZE.
* This value also limits the biggest block size sent out by netconn_write. With a MEM_SIZE
* of 6k, the biggest block netconn_write (and probably socket write) will send out is 4k.
* This setting is mostly related to outgoing data.
*/
#define MEM_SIZE (6*1024)
// MEMP_ structures. Their sizes have been determined experimentally, by
// increasing them and seeing free RAM changing.
/* MEMP_NUM_PBUF: the number of memp struct pbufs. If the application
sends a lot of data out of ROM (or other static memory), this
should be set high. */
//NC: Increased to 20 for ethser
#define MEMP_NUM_PBUF 20 // each 1 is 20 bytes of static RAM
/* MEMP_NUM_UDP_PCB: the number of UDP protocol control blocks. One
per active UDP "connection". */
#define MEMP_NUM_UDP_PCB 6 // each 1 is 32 bytes of static RAM
/* MEMP_NUM_TCP_PCB: the number of simultaneously active TCP
connections. */
//NC: Increased to 20 for ethser
#define MEMP_NUM_TCP_PCB 20 // each 1 is 145 bytes of static RAM
//NC: Have more sockets available. Is set to 4 in opt.h
#define MEMP_NUM_NETCONN 10
/* MEMP_NUM_TCP_PCB_LISTEN: the number of listening TCP
connections. */
//NC: Increased to 20 for ethser
#define MEMP_NUM_TCP_PCB_LISTEN 20 // each 1 is 28 bytes of static RAM
/* MEMP_NUM_TCP_SEG: the number of simultaneously queued TCP
segments. */
// Was 8; increased to 16 as it improves ETHSER reliability when running
// HTTP server
#define MEMP_NUM_TCP_SEG 16 // each 1 is 20 bytes of static RAM
/* MEMP_NUM_SYS_TIMEOUT: the number of simultaneously active
timeouts. */
#define MEMP_NUM_SYS_TIMEOUT 10 // each 1 is 16 bytes of static RAM
/* ---------- Pbuf dynamically allocated buffer blocks ----------
*
* These settings make little difference to performance, which is dominated by
* the low_level_input poll period. These PBUFs relate directly to the netconn API netbufs.
*
* PBUF_POOL_SIZE: the number of buffers in the pbuf pool.
* Statically allocated, so each increment costs about 1.5k of RAM. Any value under 4 slows
* down the data rate a lot.
* 12/2/23 PH 4 -> 6 fixes TLS blocking of various RTOS tasks (5 almost does).
* The most this can go to with TLS still having enough free RAM for its 48k block is ~9.
*/
#define PBUF_POOL_SIZE 6
/* PBUF_POOL_BUFSIZE: the size of each pbuf in the pbuf pool.
* The +2 is to get a multiple of 4 bytes so the PBUFs are 4-aligned (for memcpy_fast())
* Probably the +2 is not needed because the PBUFs are 4-aligned anyway in the LWIP code.
*/
#define PBUF_POOL_BUFSIZE (1500 + PBUF_LINK_ENCAPSULATION_HLEN + PBUF_LINK_HLEN + 2)
/* ---------- TCP options ---------- */
#define LWIP_TCP 1
#define TCP_TTL 255
/* Controls if TCP should queue segments that arrive out of
order. Define to 0 if your device is low on memory. */
#define TCP_QUEUE_OOSEQ 0
/* TCP Maximum segment size. */
#define TCP_MSS (1500 - 40) /* TCP_MSS = (Ethernet MTU - IP header size - TCP header size) */
/* TCP sender buffer space (bytes). */
// Reduced from 4*MSS to leave more room for TX packets in the LWIP heap (MEM_SIZE).
#define TCP_SND_BUF (2*TCP_MSS) // no effect on static RAM
/* TCP_SND_QUEUELEN: TCP sender buffer space (pbufs). This must be at least
as much as (2 * TCP_SND_BUF/TCP_MSS) for things to work. */
// Was 2*; increased to 4* as it improves ETHSER reliability when running
// HTTP server
#define TCP_SND_QUEUELEN (4* TCP_SND_BUF/TCP_MSS) // (2* TCP_SND_BUF/TCP_MSS)
/* TCP advertised receive window. */
// Should be less than PBUF_POOL_SIZE * (PBUF_POOL_BUFSIZE - protocol headers)
#define TCP_WND (2*TCP_MSS) // no effect on static RAM
/* ---------- ICMP options ---------- */
#define LWIP_ICMP 1
/* ---------- DHCP options ---------- */
#define LWIP_DHCP 1
/* ---------- UDP options ---------- */
#define LWIP_UDP 1
#define UDP_TTL 255
// These are build flags which disable the support for the SOF_BROADCAST option on raw and UDP PCBs
// Commented-out because changing these requires a recompilation, and an application which receives
// broadcast packets may one day be necessary (set g_eth_multi=true to disable the packet filter in
// ethernetif.c)
//#define IP_SOF_BROADCAST 1
//#define IP_SOF_BROADCAST_RECV 1
/* ---------- Statistics options ---------- */
#define LWIP_STATS 0
/* ---------- link callback options ---------- */
/* LWIP_NETIF_LINK_CALLBACK==1: Support a callback function from an interface
* whenever the link changes (i.e., link down)
* 8/2022 this is done from the low_level_input RTOS task.
*/
#define LWIP_NETIF_LINK_CALLBACK 0
/*
--------------------------------------
---------- Checksum options ----------
--------------------------------------
*/
/*
The STM32F4xx allows computing and verifying the IP, UDP, TCP and ICMP checksums by hardware:
- To use this feature leave the following define uncommented.
- To disable it and compute the checksums on the CPU, comment out the define.
*/
#define CHECKSUM_BY_HARDWARE
#ifdef CHECKSUM_BY_HARDWARE
/* CHECKSUM_GEN_IP==0: Generate checksums by hardware for outgoing IP packets.*/
#define CHECKSUM_GEN_IP 0
/* CHECKSUM_GEN_UDP==0: Generate checksums by hardware for outgoing UDP packets.*/
#define CHECKSUM_GEN_UDP 0
/* CHECKSUM_GEN_TCP==0: Generate checksums by hardware for outgoing TCP packets.*/
#define CHECKSUM_GEN_TCP 0
/* CHECKSUM_CHECK_IP==0: Check checksums by hardware for incoming IP packets.*/
#define CHECKSUM_CHECK_IP 0
/* CHECKSUM_CHECK_UDP==0: Check checksums by hardware for incoming UDP packets.*/
#define CHECKSUM_CHECK_UDP 0
/* CHECKSUM_CHECK_TCP==0: Check checksums by hardware for incoming TCP packets.*/
#define CHECKSUM_CHECK_TCP 0
/* CHECKSUM_GEN_ICMP==0: Generate checksums by hardware for outgoing ICMP packets.*/
#define CHECKSUM_GEN_ICMP 0
#else
/* CHECKSUM_GEN_IP==1: Generate checksums in software for outgoing IP packets.*/
#define CHECKSUM_GEN_IP 1
/* CHECKSUM_GEN_UDP==1: Generate checksums in software for outgoing UDP packets.*/
#define CHECKSUM_GEN_UDP 1
/* CHECKSUM_GEN_TCP==1: Generate checksums in software for outgoing TCP packets.*/
#define CHECKSUM_GEN_TCP 1
/* CHECKSUM_CHECK_IP==1: Check checksums in software for incoming IP packets.*/
#define CHECKSUM_CHECK_IP 1
/* CHECKSUM_CHECK_UDP==1: Check checksums in software for incoming UDP packets.*/
#define CHECKSUM_CHECK_UDP 1
/* CHECKSUM_CHECK_TCP==1: Check checksums in software for incoming TCP packets.*/
#define CHECKSUM_CHECK_TCP 1
/* CHECKSUM_GEN_ICMP==1: Generate checksums in software for outgoing ICMP packets.*/
#define CHECKSUM_GEN_ICMP 1
#endif
/*
----------------------------------------------
---------- Sequential layer options ----------
----------------------------------------------
*/
/**
* LWIP_NETCONN==1: Enable Netconn API (require to use api_lib.c)
*/
#define LWIP_NETCONN 1
/*
------------------------------------
---------- Socket options ----------
------------------------------------
*/
/**
* LWIP_SOCKET==1: Enable Socket API (require to use sockets.c)
*/
#define LWIP_SOCKET 1
/*
------------------------------------
---------- httpd options ----------
------------------------------------
*/
/** Set this to 1 to include "fsdata_custom.c" instead of "fsdata.c" for the
* file system (to prevent changing the file included in CVS) */
#define HTTPD_USE_CUSTOM_FSDATA 0
/*
---------------------------------
---------- OS options ----------
---------------------------------
*/
#define TCPIP_THREAD_NAME "TCP/IP"
#define TCPIP_THREAD_STACKSIZE 4096
#define TCPIP_MBOX_SIZE 6
#define DEFAULT_UDP_RECVMBOX_SIZE 6
#define DEFAULT_TCP_RECVMBOX_SIZE 6
#define DEFAULT_ACCEPTMBOX_SIZE 6
#define DEFAULT_THREAD_STACKSIZE 512
#define TCPIP_THREAD_PRIO osPriorityHigh // should be >= that of any TCP/IP apps
#define LWIP_DEBUG 1
/*
#define IP_DEBUG LWIP_DBG_ON
#define DHCP_DEBUG LWIP_DBG_OFF
#define UDP_DEBUG LWIP_DBG_ON
#define SOCKETS_DEBUG LWIP_DBG_ON
//#define ICMP_DEBUG LWIP_DBG_ON|LWIP_DBG_TRACE
//#define NETIF_DEBUG LWIP_DBG_OFF
#define LWIP_DBG_TYPES_ON (LWIP_DBG_TRACE|LWIP_DBG_STATE)
*/
#define LWIP_SO_RCVTIMEO 1
#define LWIP_NETIF_HOSTNAME 1
#define SO_REUSE 1
// Defining these produces various errors
//#define LWIP_IPV6 1
//#define LWIP_IPV6_DHCP6 1
/*
// TODO
#ifdef LWIP_DEBUG
#define MEMP_OVERFLOW_CHECK ( 1 )
#define MEMP_SANITY_CHECK ( 1 )
#define MEM_DEBUG LWIP_DBG_OFF
#define MEMP_DEBUG LWIP_DBG_OFF
#define PBUF_DEBUG LWIP_DBG_ON
#define API_LIB_DEBUG LWIP_DBG_ON
#define API_MSG_DEBUG LWIP_DBG_ON
#define TCPIP_DEBUG LWIP_DBG_ON
#define NETIF_DEBUG LWIP_DBG_ON
#define SOCKETS_DEBUG LWIP_DBG_ON
#define DEMO_DEBUG LWIP_DBG_ON
#define IP_DEBUG LWIP_DBG_ON
#define IP6_DEBUG LWIP_DBG_ON
#define IP_REASS_DEBUG LWIP_DBG_ON
#define RAW_DEBUG LWIP_DBG_ON
#define ICMP_DEBUG LWIP_DBG_ON
#define UDP_DEBUG LWIP_DBG_ON
#define TCP_DEBUG LWIP_DBG_ON
#define TCP_INPUT_DEBUG LWIP_DBG_ON
#define TCP_OUTPUT_DEBUG LWIP_DBG_ON
#define TCP_RTO_DEBUG LWIP_DBG_ON
#define TCP_CWND_DEBUG LWIP_DBG_ON
#define TCP_WND_DEBUG LWIP_DBG_ON
#define TCP_FR_DEBUG LWIP_DBG_ON
#define TCP_QLEN_DEBUG LWIP_DBG_ON
#define TCP_RST_DEBUG LWIP_DBG_ON
#define PPP_DEBUG LWIP_DBG_OFF
#define LWIP_DBG_TYPES_ON (LWIP_DBG_ON|LWIP_DBG_TRACE|LWIP_DBG_STATE|LWIP_DBG_FRESH|LWIP_DBG_HALT)
#endif
*/
#endif /* __LWIPOPTS_H__ */
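Two of the comments in the file state relationships between settings (TCP_SND_QUEUELEN versus TCP_SND_BUF/TCP_MSS, and TCP_WND versus the pbuf pool). Below is a standalone sketch that checks them at compile time with the values copied from the file. The header sizes (14 for the link header, 0 for encapsulation, 40 for IP + TCP) are my assumptions, and this is meant to be compiled on its own, not together with the lwIP headers.

```c
#include <assert.h>

/* Values copied from the lwipopts.h above */
#define TCP_MSS           (1500 - 40)
#define TCP_SND_BUF       (2 * TCP_MSS)
#define TCP_SND_QUEUELEN  (4 * TCP_SND_BUF / TCP_MSS)
#define TCP_WND           (2 * TCP_MSS)
#define PBUF_POOL_SIZE    6

/* ASSUMED header sizes, not taken from the file */
#define LINK_HLEN         14   /* Ethernet header */
#define IP_TCP_HLEN       40   /* IP + TCP headers, no options */
#define PBUF_POOL_BUFSIZE (1500 + 0 + LINK_HLEN + 2)

/* "This must be at least as much as (2 * TCP_SND_BUF/TCP_MSS)" */
static_assert(TCP_SND_QUEUELEN >= 2 * TCP_SND_BUF / TCP_MSS,
              "TCP_SND_QUEUELEN too small for TCP_SND_BUF");

/* "Should be less than PBUF_POOL_SIZE * (PBUF_POOL_BUFSIZE - protocol headers)" */
static_assert(TCP_WND < PBUF_POOL_SIZE *
                        (PBUF_POOL_BUFSIZE - LINK_HLEN - IP_TCP_HLEN),
              "TCP_WND larger than the pbuf pool can hold");

/* Payload bytes the whole pool can hold under the assumptions above. */
int rx_capacity(void)
{
    return PBUF_POOL_SIZE * (PBUF_POOL_BUFSIZE - LINK_HLEN - IP_TCP_HLEN);
}

/* Number of full-size segments covered by the advertised window. */
int wnd_segments(void)
{
    return TCP_WND / TCP_MSS;
}
```

With these numbers the window is 2 segments (2920 bytes) against a pool capacity of 8772 bytes, so both conditions hold with room to spare; dropping PBUF_POOL_SIZE to 2 would make the second check fail at compile time.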