
[32F417] LWIP/MbedTLS - any idea if SPI SRAM could be used for buffers?


tellurium:

--- Quote from: dare on July 05, 2022, 09:21:04 pm ---Add to this the time taken to generate the client's signature (if using client certs), and the key derivation/confirmation operations, and the time does not seem surprising to me.  In the past, I implemented a system that achieved certificate-based, mutually authenticated session establishment in 1.8 seconds on an 80 MHz Cortex-M4.  But this was using a faster crypto library (MbedTLS is not great) and a more efficient protocol than TLS.

--- End quote ---

As you've mentioned, the time taken depends on the chain length, and also on the certificate type (RSA/EC) and size.
We've seen cases where long cert chains with large RSA certs took 20+ seconds on a 160 MHz MCU.

peter-h:
The TLS action involves

- pinging https://healthchecks.io/ (which can be done with HTTP or HTTPS, but is currently done with HTTPS for test purposes)
- sending a tiny file to a commercial server (legit, but I am not posting which one for commercial reasons)

For the first one, here is a log with timestamps in ms. I believe we are not verifying their certificate, i.e. we get encryption but not authentication. This is because the product cannot currently walk a chain of multiple certificates, as is needed for any "open" commercial server.


--- Code: ---82797: healthcheck interval= 60 secs
82806: healthcheck url= http://hc-ping.com/8cd78cf4-eb2b-4844-b572-d5639a3128b6
82819:
         . Seeding the random number generator...
82829:  ok
82838:   . Loading the CA root certificate ...
82851:  ok
82860:   . Connecting to tcp/hc-ping.com/443...
82870: udp_bind(ipaddr =
82880: 0.0.0.0
82890: , port = 23000)
82900: udp_bind: bound to
82910: 0.0.0.0
82920: , port 23000)
82930: udp_send
82982: udp_input: calculating checksum
83042:  connect ok
83052:   . Setting up the SSL/TLS structure...
83062:  ok
83073:   . Performing the SSL/TLS handshake...
85389: ../Middlewares/Third_Party/mbedTLS/library/ssl_tls.c:5757: x509_verify_cert() returned -9984 (-0x2700)
88063:  TLS handshake ok
88072:   . Verifying peer X.509 certificate...
88082:  peer cert failure ignored
88092:   > Write to server:
88103:  72 initial bytes written:
88112:   < Read from server:
88523: Headers length = 188, Content-Length = 2
88532:  190 bytes read:
88542: HTTP/1.1 200 OK
server: nginx
date: Wed, 06 Jul 2022 05:58:46 GMT
content-type: text/plain; chars...

--- End code ---

The above takes 5.745 seconds. If I strip out all other RTOS tasks, it is the same to within 100 ms.
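The 5.745 s figure is simply the last timestamp minus the first. A small sketch that does the same subtraction for each phase of the first log, with the milestone values copied by hand from it (struct milestone, hc_log and print_phases() are illustrative names, not part of the product code):

```c
#include <stddef.h>
#include <stdio.h>

/* One timestamped milestone from the debug log (ms since boot). */
struct milestone {
    const char *label;
    long t_ms;
};

/* Values copied by hand from the healthchecks.io log above. */
static const struct milestone hc_log[] = {
    { "healthcheck start",          82797 },
    { "connecting to port 443",     82860 },
    { "connect ok",                 83042 },
    { "handshake started",          83073 },
    { "x509_verify_cert() failed",  85389 },
    { "TLS handshake ok",           88063 },
    { "write to server",            88092 },
    { "response received",          88542 },
};

/* Print the time between consecutive milestones and return the
   first-to-last total in ms. */
static long print_phases(const struct milestone *m, size_t n)
{
    if (n == 0)
        return 0;
    for (size_t i = 1; i < n; i++)
        printf("%5ld ms  %s -> %s\n", m[i].t_ms - m[i - 1].t_ms,
               m[i - 1].label, m[i].label);
    return m[n - 1].t_ms - m[0].t_ms;
}
```

Called as print_phases(hc_log, sizeof hc_log / sizeof hc_log[0]), this returns 5745 ms in total; 4990 ms of that lies between "Performing the SSL/TLS handshake" (83073) and "TLS handshake ok" (88063), and the x509_verify_cert() message at 85389 splits it into roughly 2.3 s and 2.7 s. The same subtraction on the second log gives 294544 - 289594 = 4950 ms.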

The log from the other server interaction (some info obfuscated) is here and takes 4.95s:


--- Code: ---289594: xxxxxx access token response= HTTP/1.1 200 OK
Cache-Control: no-cache, no-store, must-revalidate
Expires: 0
Pragma: no-cac
289603: xxxxxx_token_expiry_seconds= 14400
289613: xxxxxx_token= sl.BK1cExxxxxxx
289626:
          . Seeding the random number generator...
289636:  ok
289645:   . Loading the CA root certificate ...
289658:  ok
289667:   . Connecting to tcp/content.xxxxxxapi.com/443...
289677: udp_bind(ipaddr =
289687: 0.0.0.0
289697: , port = 1942)
289707: udp_bind: bound to
289717: 0.0.0.0
289727: , port 1942)
289737: udp_send
289762: udp_input: calculating checksum
289802:  connect ok
289812:   . Setting up the SSL/TLS structure...
290926: ../Middlewares/Third_Party/mbedTLS/library/ssl_tls.c:5757: x509_verify_cert() returned -9984 (-0x2700)
293053:  TLS handshake ok
293062:   . Verifying peer X.509 certificate...
293072:  peer cert failure ignored
293082:   > Write to server:
293093:  474 initial bytes written:
293102:   < Read from server:
294503: Headers length = 0, Content-Length = 405
294512:  256 bytes read:
294522: ...
294534: xxxxxx response= HTTP/1.1 200 OK
Cache-Control: no-cache
X-Content-Type-Options: nosniff
X-Server-Response-Time: 9
294544: xxxxxx upload succeeded

--- End code ---

So 2-3 s goes into just setting up (AIUI) the public and private keys, and there is some certificate validation which fails (taking 10 ms in total) because we don't have the root certificate collection.
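The -9984 in both logs is just -0x2700 printed in decimal; in mbedTLS that value is MBEDTLS_ERR_X509_CERT_VERIFY_FAILED. The handshake carrying on afterwards is consistent with the verify mode being MBEDTLS_SSL_VERIFY_OPTIONAL, although that is an inference from the log, not something checked in the code. It also looks as if the chain check itself runs inside the handshake (the messages at 85389 and 290926), so the 10 ms after "Verifying peer X.509 certificate" may only be the reporting step. A self-contained sketch for spotting that code in captured logs; the helper names are hypothetical, and on the target mbedtls_strerror() gives the official text when MBEDTLS_ERROR_C is enabled:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Local copy of one mbedTLS error code; the value matches
   MBEDTLS_ERR_X509_CERT_VERIFY_FAILED in mbedtls/x509.h. */
#define ERR_X509_CERT_VERIFY_FAILED (-0x2700)

/* Render a negative mbedTLS return code in the "%d (-0x%04x)" style used
   by the debug lines above. Returns the snprintf() result. */
static int format_mbedtls_ret(int ret, char *buf, size_t len)
{
    return snprintf(buf, len, "%d (-0x%04x)", ret, (unsigned int) -ret);
}

/* True if a captured debug log line reports the X.509 verify failure. */
static bool line_has_verify_failure(const char *line)
{
    char code[32];
    format_mbedtls_ret(ERR_X509_CERT_VERIFY_FAILED, code, sizeof code);
    return strstr(line, code) != NULL;
}
```

Both failure lines in the logs above match, while lines such as " TLS handshake ok" do not.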

The main part takes ~2.5 s, and there are occasional stretches of considerable remote-system latency (~500 ms) that nothing can be done about. Probably much of that is deliberate, to make DoS attacks harder.

What is embarrassing is that with all the other RTOS tasks removed, the HTTP thread is not affected by the TLS stuff :) Watch this space!

EDIT: what breaks it is an NTP server/client thread, which uses LWIP sockets. I am wondering whether there is a #define somewhere for the maximum number of sockets that can be allocated, and whether we are at that limit.

Here is part of that task:


--- Code: ---
int NtpOpenSocket()
{
  //Open a UDP socket, bind it and return the socket id (or -1 on failure)
  //This function binds to all network interfaces (0.0.0.0)
  int fd = -1;

  struct sockaddr_in sock_addr;
  fd = socket(AF_INET, SOCK_DGRAM, 0);
  if (fd < 0)
    {
    dbg_printf("NTP can't get a socket!\n");
    osDelay(pdMS_TO_TICKS(1000)); //Wait 1 second before retrying
    return fd;  //failed, return
    }

  memset(&sock_addr, 0, sizeof(sock_addr));
  sock_addr.sin_family = AF_INET;
  sock_addr.sin_port = htons(NTP_PORT);
  sock_addr.sin_addr.s_addr = htonl(INADDR_ANY); //explicit; memset() already zeroed it

  if (bind(fd, (struct sockaddr*) &sock_addr, sizeof(sock_addr)) !=0 )
    {
    //bind failed!
    dbg_printf("NTP socket Bind failed!\n");
    close(fd);
    fd = -1;
    }

  if (fd >=0 )
    {
    dbg_printf("NTP socket init OK\n");
    }

  return fd;
}


--- End code ---

Digging around, I found these values defined in lwipopts.h (the defaults in lwIP's opt.h apply only if they are not already defined).

They appear to be 10 for TCP and 6 for UDP:


--- Code: ---/* ---------- Memory options ---------- */
/* MEM_ALIGNMENT: should be set to the alignment of the CPU for which
   lwIP is compiled. 4 byte alignment -> define MEM_ALIGNMENT to 4, 2
   byte alignment -> define MEM_ALIGNMENT to 2. */
#define MEM_ALIGNMENT           4

/* MEM_SIZE: the size of the heap memory. If the application will send
a lot of data that needs to be copied, this should be set high. */
#define MEM_SIZE                (10*1024)

/* MEMP_NUM_PBUF: the number of memp struct pbufs. If the application
   sends a lot of data out of ROM (or other static memory), this
   should be set high. */
#define MEMP_NUM_PBUF           10
/* MEMP_NUM_UDP_PCB: the number of UDP protocol control blocks. One
   per active UDP "connection". */
#define MEMP_NUM_UDP_PCB        6
/* MEMP_NUM_TCP_PCB: the number of simultaneously active TCP
   connections. */
#define MEMP_NUM_TCP_PCB        10
/* MEMP_NUM_TCP_PCB_LISTEN: the number of listening TCP
   connections. */
#define MEMP_NUM_TCP_PCB_LISTEN 5
/* MEMP_NUM_TCP_SEG: the number of simultaneously queued TCP
   segments. */
#define MEMP_NUM_TCP_SEG        8
/* MEMP_NUM_SYS_TIMEOUT: the number of simultaneously active
   timeouts. */
#define MEMP_NUM_SYS_TIMEOUT    10


/* ---------- Pbuf options ---------- */
/* PBUF_POOL_SIZE: the number of buffers in the pbuf pool. */
#define PBUF_POOL_SIZE          8

/* PBUF_POOL_BUFSIZE: the size of each pbuf in the pbuf pool. */
#define PBUF_POOL_BUFSIZE       512


--- End code ---

I've played with all these values before. Setting MEMP_NUM_UDP_PCB=10 or 20 makes no difference.
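One pool the fragment above does not set is MEMP_NUM_NETCONN. In the lwIP sources I know, every netconn and every socket created with socket() takes one struct netconn from this pool, the socket table size (NUM_SOCKETS) is derived from it, and the opt.h default is 4 unless lwipopts.h overrides it. Because opt.h wraps each default in #ifndef, a value in lwipopts.h wins. Whether this pool is the culprit here is an assumption worth checking, not a diagnosis, and the numbers below are illustrative, not tested:

```c
/* Hypothetical additions to lwipopts.h; the numbers are illustrative only. */

/* MEMP_NUM_NETCONN: number of struct netconn. Every netconn (including the
   HTTP server's listening and accepted connections) and every socket
   created with socket() uses one. lwIP's opt.h default is 4. */
#define MEMP_NUM_NETCONN        12

/* MEMP_NUM_NETBUF: number of struct netbuf used on the netconn/socket
   receive path (UDP in particular). opt.h default is 2. */
#define MEMP_NUM_NETBUF         8

/* Per-pool statistics: each pool's "err" counter increments when an
   allocation from that pool fails. */
#define LWIP_STATS              1
#define MEMP_STATS              1
```

With the statistics enabled, the per-pool err counters in lwip_stats should show which pool (if any) runs dry while the HTTP server is failing.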

This is the line in the above source that breaks it:

  fd = socket(AF_INET, SOCK_DGRAM, 0);

One suggested possibility is that while the NTP server above uses the socket API, the HTTP server uses the netconn API, which works at a lower level through a different interface. However, this
https://lwip.fandom.com/wiki/Application_API_layers
suggests both are very similar in structure. Also, the guy working on this on Mondays tells me that the LWIP error counters start incrementing while the HTTP server is failing, suggesting that LWIP is unable to transmit some packets. The question remains why that single call above, allocating one UDP socket, is on its own enough to cause this problem.

This is interesting too
https://www.nongnu.org/lwip/2_0_x/raw_api.html

      Netconn or Socket API functions are thread safe against the
      core thread but they are not reentrant at the control block
      granularity level. That is, a UDP or TCP control block must
      not be shared among multiple threads without proper locking.

The NTP thread opens a UDP socket, and merely opening that socket, without ever using it, is enough to stop the netconn HTTP server from working while the TLS handshake is in progress.

In the NTP thread, closing that UDP socket and then just waiting afterwards (while yielding to the RTOS) fixes the problem. Removing the close() brings the problem back.


peter-h:
Problem summary, and a hack solution:

The issue is: the NTP thread opens a single UDP socket. If this socket is open (even if never used) the netconn HTTP server breaks during TLS handshake.
The solution implemented is HTTP client activity detection with a 10 sec timeout, during which the NTP thread is suspended (and its UDP socket is closed, which is essential).
This hack may or may not be needed for the new HTTP server, which uses the socket API instead of netconn. But the socket API is itself built on top of netconn, so both end up in the same lower-level code...
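The activity gate described above boils down to a timestamp comparison. A minimal sketch, assuming a free-running 32-bit millisecond tick (for example xTaskGetTickCount() with a 1 kHz tick rate); struct http_activity, http_activity_note() and ntp_may_run() are hypothetical names, not the code actually used:

```c
#include <stdbool.h>
#include <stdint.h>

/* Quiet period after the last HTTP client activity (the 10 s above). */
#define HTTP_QUIET_MS 10000u

/* State shared between the HTTP server and the NTP thread. On the real
   target the two fields would need to be updated under a mutex or in a
   critical section, since they are written by one task and read by another. */
struct http_activity {
    uint32_t last_ms;   /* tick (ms) of the most recent HTTP client activity */
    bool     seen;      /* false until the first activity is recorded */
};

/* Called by the HTTP server whenever a client connects or sends a request. */
static void http_activity_note(struct http_activity *a, uint32_t now_ms)
{
    a->last_ms = now_ms;
    a->seen = true;
}

/* Called by the NTP thread before (re)opening its UDP socket. Unsigned
   subtraction keeps the comparison correct across 32-bit tick wraparound. */
static bool ntp_may_run(const struct http_activity *a, uint32_t now_ms)
{
    return !a->seen || (uint32_t)(now_ms - a->last_ms) >= HTTP_QUIET_MS;
}
```

In the NTP thread, the idea would be: while ntp_may_run() returns false, close() the UDP socket if it is open and osDelay() for a while; once it returns true, reopen the socket with NtpOpenSocket() and carry on. The unsigned subtraction means the 10 s window still behaves when the millisecond tick wraps after about 49.7 days.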

The hack works. TLS still affects HTTP but all that happens is a very rare extension of a ~50ms server response time to ~300ms, which is fine.

What a rabbit-hole!

peter-h:
I could start a new thread on this, but this is a simple question: what ways are there in LWIP to open a UDP socket?

The code in question uses socket(), i.e. the socket API (which is layered on top of the netconn API). That ends up in the same function as quite a few other callers, including TLS. But my board also has DHCP and DNS, which are also supposed to use UDP, yet (according to breakpoints) they don't call this function. They go through very deep code that I was not able to track down easily.
