[32F417] LWIP/MbedTLS - any idea if SPI SRAM could be used for buffers?

peter-h:
My target has 128k SRAM which has about 60k spare, and 64k CCM which is allocated whole to FreeRTOS stacks etc (its private heap, memory model #4).

I am running a simplified HTTP server (for local config etc), which uses fairly minimal RAM (a few k), and an HTTPS/TLS client which uses about 50k (for its private heap).

So if both of the above are running concurrently, there is only ~10k RAM left. It does work, but while TLS is doing its handshake/negotiation (which takes 2-3 seconds on a 168MHz 32F417) the HTTP server temporarily hangs.

Investigating this, it appears that LWIP is running out of buffers during TLS and is rejecting incoming packets.

I don't really want to change the CPU to the next one up which has another 64k RAM, because a) I have stock of the 417 and this took about a year to get, b) the design is rock solid and I don't want to tempt fate (there is a lot of subtle hardware usage e.g. DAC ADC DMA timers) even though in theory it should be just alternate function pin changes, c) some versions of the product may not need TLS at all.

I have an option of an 8MB SPI-attached RAM
https://www.eevblog.com/forum/microcontrollers/lyontek-ly68l6400-8-megabyte-spi-sram-am-i-reading-the-data-sheet-right/
which does work and is not bad at $3 (there are cheaper 128kbyte versions too), but obviously cannot be addressed as normal RAM. The ESP32 can do that but the 32F4 can't without a huge amount of work to do what is basically invalid address trapping and virtual memory emulation
https://www.eevblog.com/forum/microcontrollers/st-32f417-any-way-to-make-an-spi-sram-to-look-like-normal-ram/

Does anyone know enough about the internals of LWIP, and MbedTLS, to know whether the memory usage structure lends itself to this kind of "overlay" memory? One can read or write say 1k bytes in 400us, in my target (21MHz SPI with DMA). Obviously this would be horribly inefficient for a byte at a time emulation but perhaps one can switch buffers in and out...
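As a rough sanity check of the 400us figure, the time on the wire can be worked out from the SPI clock alone. This is a minimal sketch: `spi_transfer_us` and its `overhead_bytes` parameter (command and address bytes sent before the payload) are illustrative names, and DMA setup and chip-select timing are ignored.

```c
#include <stdint.h>

/* Microseconds needed to clock payload_bytes plus overhead_bytes over SPI
 * at spi_hz, rounded up. Covers only the time spent shifting bits. */
uint32_t spi_transfer_us(uint32_t payload_bytes, uint32_t overhead_bytes,
                         uint32_t spi_hz)
{
    uint64_t bits = ((uint64_t)payload_bytes + overhead_bytes) * 8u;
    return (uint32_t)((bits * 1000000u + spi_hz - 1u) / spi_hz);
}
```

Under these assumptions `spi_transfer_us(1024, 0, 21000000)` gives 391, which matches the measured figure, and a full 1514-byte Ethernet frame with an assumed 4-byte command header comes to 579.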

There is at least one MbedTLS "support forum" but almost nobody answers any questions there. If somebody knows of a concrete route, I am happy to pay for the time.

Scrts:
Not an expert here, but instead of RAM, maybe offload TLS to some secure element? E.g. ATECC608B?

peter-h:
That chip does only the crypto, which isn't too much of an issue. It is also unobtainable - typical Microchip situation now, 1.5 years lead time.

dare:

--- Quote ---Does anyone know enough about the internals of LWIP, and MbedTLS, to know whether the memory usage structure lends itself to this kind of "overlay" memory? One can read or write say 1k bytes in 400us, in my target (21MHz SPI with DMA). Obviously this would be horribly inefficient for a byte at a time emulation but perhaps one can switch buffers in and out...

--- End quote ---

LwIP has a number of different memory allocation schemes, and can be straightforwardly modified to support others. However all memory use by LwIP involves fine-grained access to allocated memory blocks. This is undoubtedly true for MbedTLS as well.

Modern TCP and TLS were not designed to run in highly memory-constrained environments. It can be done, but it requires very careful tuning of stack behavior, allocation strategies and threading disciplines. In theory you could use your SPI RAM to create an external queue of incoming packets between the network interface layer and the LwIP input function. However this would entail a fair amount of work, and really should only be done after optimizing the rest of the stack (and even then there's no guarantee it will solve the problem).
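To make the external-queue idea concrete, here is a minimal sketch of a fixed-slot frame queue held in SPI RAM. Everything in it is an assumption for illustration: the names `fq_push`, `fq_pop`, `spi_ram_write` and `spi_ram_read` are hypothetical, the slot size and count are arbitrary, and the "SPI RAM" is a plain array so the code runs on a PC. On the target the two access functions would be replaced by the DMA-driven SPI driver, and push and pop would need a lock or critical section because they run in different contexts.

```c
#include <stdint.h>
#include <string.h>

#define SLOT_SIZE   1536u  /* assumed: one Ethernet frame plus a 2-byte length */
#define SLOT_COUNT  64u    /* assumed: 64 slots = 96 KiB of the external RAM */

/* Stand-in for the external chip (hypothetical driver interface). */
static uint8_t fake_spi_ram[SLOT_SIZE * SLOT_COUNT];

static void spi_ram_write(uint32_t addr, const void *src, uint32_t len)
{
    memcpy(&fake_spi_ram[addr], src, len);
}

static void spi_ram_read(uint32_t addr, void *dst, uint32_t len)
{
    memcpy(dst, &fake_spi_ram[addr], len);
}

/* Only this bookkeeping lives in internal SRAM. */
static uint32_t fq_head;  /* next slot to write */
static uint32_t fq_tail;  /* next slot to read */
static uint32_t fq_used;  /* slots currently occupied */

/* Store one received frame. Returns 0 on success, -1 if the queue is full
 * or the frame does not fit in a slot. */
int fq_push(const uint8_t *frame, uint16_t len)
{
    if (fq_used == SLOT_COUNT || len == 0 || len > SLOT_SIZE - 2u) {
        return -1;
    }
    uint32_t base = fq_head * SLOT_SIZE;
    spi_ram_write(base, &len, 2);          /* length prefix */
    spi_ram_write(base + 2u, frame, len);  /* frame payload */
    fq_head = (fq_head + 1u) % SLOT_COUNT;
    fq_used++;
    return 0;
}

/* Fetch the oldest frame into buf. Returns its length, 0 if the queue is
 * empty, or -1 if buf is too small (the frame then stays queued). */
int fq_pop(uint8_t *buf, uint16_t buf_size)
{
    uint16_t len;
    if (fq_used == 0) {
        return 0;
    }
    uint32_t base = fq_tail * SLOT_SIZE;
    spi_ram_read(base, &len, 2);
    if (len > buf_size) {
        return -1;
    }
    spi_ram_read(base + 2u, buf, len);
    fq_tail = (fq_tail + 1u) % SLOT_COUNT;
    fq_used--;
    return (int)len;
}
```

In such a design the Ethernet receive path would call `fq_push` instead of allocating a packet buffer, and a separate task would call `fq_pop`, copy the frame into a pbuf and hand it to the netif input function only when LwIP has buffers to spare.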

Here are some ideas for optimizing things:

First off, ensure that you've turned off all LwIP and MbedTLS features that you don't need. Some of these come with initialization-time memory overheads that, in simple contexts, represent completely wasted space.

Second, ensure you're using the pooled allocator for all important LwIP structures (packet buffers, PCBs, etc.). Using a general heap-based allocator (e.g. malloc) invites fragmentation that can drastically reduce memory utilization efficiency.
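As an illustration of the pooled setup, an lwipopts.h fragment might look like the sketch below. The option names are standard LwIP options, but every value is an assumption chosen for the example, and `RX_POOL_BYTES` is a hypothetical helper macro that is not part of LwIP.

```c
/* lwipopts.h fragment: illustrative values only, tune for the target. */
#define MEM_LIBC_MALLOC     0           /* use LwIP's own heap, not the C library malloc */
#define MEMP_MEM_MALLOC     0           /* keep the memp pools static instead of heap-allocated */
#define MEM_SIZE            (8 * 1024)  /* LwIP heap, mostly used for outgoing PBUF_RAM data */
#define PBUF_POOL_SIZE      12          /* number of receive buffers in the pool */
#define PBUF_POOL_BUFSIZE   1524        /* assumed: one full Ethernet frame per buffer */
#define MEMP_NUM_TCP_PCB    4           /* simultaneously active TCP connections */
#define MEMP_NUM_TCP_SEG    16          /* simultaneously queued TCP segments */

/* Hypothetical helper: RAM taken by receive buffer payloads alone
 * (the pbuf headers add a little more per buffer). */
#define RX_POOL_BYTES       (PBUF_POOL_SIZE * PBUF_POOL_BUFSIZE)
```

With these example numbers the receive pool alone costs 18288 bytes, which shows how quickly a generous pool eats into 60k of spare SRAM.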

Third, ensure your TCP window size is set appropriately. If you're truly discarding incoming packets due to lack of packet buffers, the implication is that the other side is sending data too fast. This could be because the advertised receive window size is too large.
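The relationship between the window and the buffer pool can be put into numbers. In this sketch `TCP_MSS` and `TCP_WND` are the standard LwIP options with assumed values, and `RX_SEGMENTS_IN_FLIGHT` is a hypothetical helper added only for the arithmetic.

```c
/* lwipopts.h fragment: illustrative values only. */
#define TCP_MSS   536            /* assumed: small segments to save RAM */
#define TCP_WND   (2 * TCP_MSS)  /* advertised receive window per connection */

/* Hypothetical helper: worst-case number of full-size segments that peers
 * may legitimately have in flight if every connection fills its window. */
#define RX_SEGMENTS_IN_FLIGHT(conns)  ((conns) * (TCP_WND / TCP_MSS))
```

With three connections this gives 6 segments. If the packet buffer pool cannot hold that many on top of ARP, DHCP and other traffic, packets can be dropped even though each peer is obeying the advertised window.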

Fourth, based on a quick review of some open source projects using MbedTLS with LwIP, it appears that a copy is taken when data flows out of the TCP connection and into the MbedTLS session. This is an unfortunate design, but may be unavoidable based on the way MbedTLS works. If this is the case in your context, then you need to ensure that the memory pool in MbedTLS is sized appropriately to handle the largest TLS message that will be processed. This will be dominated by the cipher suite chosen and the type of certificates used by the peer. The challenge, of course, is coming up with the memory. It may be, however, that you have too much memory devoted to the LwIP pool and not enough devoted to MbedTLS.

If you have control over the types of certificates used by the peer, choose elliptic curve certs, which are smaller than RSA certs, and thus reduce the overall memory usage while receiving and processing TLS handshake messages. Additionally, if you're minting your own certificates (i.e. running your own trust domain), use certs with short distinguished names and minimal certificate extensions to further reduce certificate size.
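On the MbedTLS side, the record buffers and the elliptic curve settings are compile-time options. The fragment below is a sketch with assumed values; the option names are those used by recent Mbed TLS releases (older ones use a single MBEDTLS_SSL_MAX_CONTENT_LEN) and should be checked against the version in use.

```c
/* Mbed TLS config fragment: illustrative values only. */
#define MBEDTLS_SSL_MAX_FRAGMENT_LENGTH           /* allow negotiating records below 16 KiB */
#define MBEDTLS_SSL_IN_CONTENT_LEN    4096        /* incoming record buffer (assumed size) */
#define MBEDTLS_SSL_OUT_CONTENT_LEN   2048        /* outgoing record buffer (assumed size) */

#define MBEDTLS_KEY_EXCHANGE_ECDHE_ECDSA_ENABLED  /* elliptic curve key exchange and certs */
#define MBEDTLS_ECP_DP_SECP256R1_ENABLED          /* enable only the curve(s) actually needed */
#define MBEDTLS_ECP_WINDOW_SIZE       2           /* smallest window: less RAM, slower ECC */
#define MBEDTLS_AES_ROM_TABLES                    /* keep AES tables in flash instead of RAM */
```

An incoming buffer below 16 KiB is only safe if the peer keeps its records that small. A client can request this with `mbedtls_ssl_conf_max_frag_len()`, but a server is free to ignore the request, so this mainly helps when you control the peer.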

Finally, consider the effects of crypto compute overhead on your threading model. On small devices, the fundamental crypto operations can take a considerable amount of compute time; seconds in many cases. In LwIP, incoming data over a TCP connection is delivered up-stack via a callback that runs on the LwIP TCPIP thread. If this upcall ends up performing crypto operations (note: I'm not familiar enough with MbedTLS to confirm that this is true), then the LwIP TCPIP thread will be blocked from further work until the crypto op completes. This means it will be unable to do things like deliver ACKs to the peer, or process additional incoming packets. As a result, the peer may retransmit packets that have already been received, but just not ACKed yet. This multiplies the pressure on the buffer pool, which can lead to buffer exhaustion.
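One common way to keep crypto off the TCPIP thread is for the receive callback to do nothing except post the data to a bounded mailbox, with a separate TLS task fetching from it. The sketch below shows only that handoff; `mbox_t`, `mbox_try_post` and `mbox_try_fetch` are hypothetical names, and the code is not thread-safe as written. On a real target an RTOS queue with a zero timeout (or LwIP's own `sys_mbox_trypost`) would take its place.

```c
#include <stddef.h>

#define MBOX_SLOTS 8u  /* assumed depth */

typedef struct {
    void    *slot[MBOX_SLOTS];
    unsigned head;   /* index of the oldest item */
    unsigned count;  /* number of items queued */
} mbox_t;

/* Called from the network thread's receive callback: constant time,
 * never blocks, never runs crypto. Returns 0 if queued, -1 if full. */
int mbox_try_post(mbox_t *m, void *item)
{
    if (m->count == MBOX_SLOTS) {
        return -1;
    }
    m->slot[(m->head + m->count) % MBOX_SLOTS] = item;
    m->count++;
    return 0;
}

/* Called from the TLS worker task: returns the oldest item, or NULL if
 * nothing is waiting. The slow crypto work happens after this returns. */
void *mbox_try_fetch(mbox_t *m)
{
    if (m->count == 0) {
        return NULL;
    }
    void *item = m->slot[m->head];
    m->head = (m->head + 1u) % MBOX_SLOTS;
    m->count--;
    return item;
}
```

The mailbox is bounded on purpose: when it fills up, the callback finds out immediately and can apply back-pressure rather than letting queued buffers quietly drain the pool.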

peter-h:
Thank you for a very informative reply.

It seems that your last para is the closest, although the callback mechanism seems to be slightly different in that it uses an internal "MBOX" messaging system.

The funny thing is that I have been playing with the lwipopts.h and tried all kinds of stuff like doubling every buffer size, and nothing makes the slightest difference. This is (to me) suspicious and maybe it is something quite narrow, which is your suggestion.

One available option: since the HTTP server is required only for simple local config tasks, one can simply disable TLS while it is running.
