Topic: [Video] Embedded TCP/IP stack explained: step-by-step code walk-through

Offline tellurium (Topic starter)

  • Frequent Contributor
  • **
  • Posts: 267
  • Country: ua
Just made a video and want to share it:


Feedback is welcome.
Open source embedded network library https://github.com/cesanta/mongoose
TCP/IP stack + TLS1.3 + HTTP/WebSocket/MQTT in a single file
 
The following users thanked this post: Ed.Kloonk, slothsd, eutectique, wek, 8goran8, IOsetting

Offline peter-h

  • Super Contributor
  • ***
  • Posts: 3743
  • Country: gb
  • Doing electronics since the 1960s...
I watched bits of it.

It is an incredibly complex piece of software. Where / how did you learn how to write this? Did you write it all yourself or is it based on snippets from elsewhere?

The video goes really deep but that could be just me never having integrated a TCP/IP stack. In my project another guy implemented LWIP - supplied by STM with Cube IDE; I think it took him a month or two of Monday afternoons and he did a lot of googling to find various bug fixes.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline tellurium (Topic starter)

  • Frequent Contributor
  • **
  • Posts: 267
  • Country: ua
Quote
It is an incredibly complex piece of software. Where / how did you learn how to write this? Did you write it all yourself or is it based on snippets from elsewhere?

I am the original author, but now it is not just me - other engineers from our team work on it too. Drivers, TCP/IP stack, TLS stack, protocol handlers, good reference projects and docs - there's plenty of stuff to do.

It is not based on anything: it was written from scratch and has been rewritten several times since 2004. The only pieces of 3rd party code that we use are crypto algorithms. We don't reinvent those, we just take code that others have released into the public domain.

It started very simple, as a web server for the Web UI, and then gradually evolved over the years.

Network programming, and especially embedded network programming, is difficult. Our team is now working on a tool that is going to work like Grafana. A simple configuration wizard and a simple UI constructor should make it possible to create a sophisticated Web UI and other network functionality (like remote control over MQTT) by simple point / click / configure - and we'll generate the working (Mongoose-based) code. Anyone, even inexperienced developers, should be able to create decent implementations with little effort. That's our goal.
« Last Edit: March 26, 2024, 04:34:50 pm by tellurium »
 

Offline tellurium (Topic starter)

  • Frequent Contributor
  • **
  • Posts: 267
  • Country: ua
Just published another video - BSD sockets explained

 
The following users thanked this post: Ed.Kloonk

Offline PDP-1

  • Contributor
  • Posts: 18
  • Country: us
  • Spacewar!
That's really cool, thanks for sharing!

I wrote my own bare metal driver for an STM32F429 last year. It was a much less ambitious project, as it's only ever intended to run on that chip with IPv4 UDP and very simple sockets, with no gateway/DHCP implementation, since it is never intended to talk to the outside world. Just enough ARP/ICMP is there to ping the thing.

One big difference I saw was that you were doing memcpy in the IRQ, while I used pointer math to pop say a received packet off of the Rx DMA chain and move it to a queue where it would later be picked up by the main thread and processed up the stack. Then I'd grab another Rx buffer descriptor out of a pool of unused ones to replace the one I just took and keep the same number of unused Rx descriptors on the Rx DMA queue. I figured that was a bit less work for the CPU than copying buffers and got out of the interrupt handler quicker.
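
A rough sketch of that buffer-swapping scheme, for illustration only. Everything here is an assumption made for the example: the descriptor layout is a simplified stand-in rather than the real STM32 MAC descriptor, and names such as `rx_irq`, `rx_poll` and `rx_release` are invented. On a real target the main-loop functions would also need to mask the Rx interrupt briefly, because they share state with the handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_RING_LEN 4     /* descriptors handed to the DMA (assumed count) */
#define SPARE_BUFS  4     /* extra buffers kept in the free pool (assumed) */
#define BUF_SIZE    1536  /* large enough for one full Ethernet frame */

/* Hypothetical, simplified descriptor; real MAC descriptors differ. */
struct rx_desc {
  uint8_t *buf;
  uint16_t len;
  volatile uint8_t owned_by_dma;  /* 1: DMA may fill it, 0: frame is ready */
};

struct frame { uint8_t *buf; uint16_t len; };

static uint8_t s_storage[RX_RING_LEN + SPARE_BUFS][BUF_SIZE];
static struct rx_desc s_ring[RX_RING_LEN];
static size_t s_ring_pos;                 /* next descriptor to check */
static uint8_t *s_pool[SPARE_BUFS];       /* stack of unused buffers */
static size_t s_pool_count;
static struct frame s_ready[SPARE_BUFS];  /* frames waiting for the main loop */
static size_t s_ready_head, s_ready_count;

void rx_init(void) {
  for (size_t i = 0; i < RX_RING_LEN; i++) {
    s_ring[i].buf = s_storage[i];
    s_ring[i].len = 0;
    s_ring[i].owned_by_dma = 1;
  }
  for (size_t i = 0; i < SPARE_BUFS; i++) s_pool[i] = s_storage[RX_RING_LEN + i];
  s_pool_count = SPARE_BUFS;
  s_ring_pos = s_ready_head = s_ready_count = 0;
}

/* Rx interrupt: only pointers move, the payload is never copied. */
void rx_irq(void) {
  while (!s_ring[s_ring_pos].owned_by_dma) {
    struct rx_desc *d = &s_ring[s_ring_pos];
    if (s_pool_count > 0) {
      size_t tail = (s_ready_head + s_ready_count) % SPARE_BUFS;
      s_ready[tail].buf = d->buf;        /* queue the filled buffer */
      s_ready[tail].len = d->len;
      s_ready_count++;
      d->buf = s_pool[--s_pool_count];   /* swap in an unused buffer */
    }                                    /* else: pool empty, frame dropped */
    d->owned_by_dma = 1;                 /* hand the descriptor back to DMA */
    s_ring_pos = (s_ring_pos + 1) % RX_RING_LEN;
  }
}

/* Main loop: fetch the oldest received frame. Returns 0 if none is queued. */
int rx_poll(struct frame *out) {
  if (s_ready_count == 0) return 0;
  *out = s_ready[s_ready_head];
  s_ready_head = (s_ready_head + 1) % SPARE_BUFS;
  s_ready_count--;
  return 1;
}

/* Main loop: give a processed buffer back to the pool. */
void rx_release(uint8_t *buf) {
  assert(s_pool_count < SPARE_BUFS);     /* catches a double release */
  s_pool[s_pool_count++] = buf;
}
```

The number of buffers sitting on the DMA ring stays constant, which matches the description above: each buffer taken off the ring is replaced by one from the pool, and comes back to the pool once the main thread is done with it.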

I still need to watch through the TCP part as that's something I'd like to add to my system some day. It was neat to see how someone else approached the STM32 Eth/MAC implementation. You really can make a small, performant Ethernet implementation on those chips but it's amazing how obfuscated the poorly written datasheet makes doing it for the first time.

 
The following users thanked this post: tellurium

Offline peter-h

  • Super Contributor
  • ***
  • Posts: 3743
  • Country: gb
  • Doing electronics since the 1960s...
The video shows how incredibly complex this stuff is. It dives straight into very deep stuff. One would need to spend a year learning about TCP/IP etc to really understand it.

That's the problem these days. Your product is likely to be 10% functionality and 90% connectivity.
 
The following users thanked this post: tellurium

Offline tellurium (Topic starter)

  • Frequent Contributor
  • **
  • Posts: 267
  • Country: ua
Quote
One big difference I saw was that you were doing memcpy in the IRQ, while I used pointer math to pop say a received packet off of the Rx DMA chain and move it to a queue where it would later be picked up by the main thread and processed up the stack. Then I'd grab another Rx buffer descriptor out of a pool of unused ones to replace the one I just took and keep the same number of unused Rx descriptors on the Rx DMA queue. I figured that was a bit less work for the CPU than copying buffers and got out of the interrupt handler quicker.

Neat! Using such a pool makes things faster for sure. Also, it eliminates the need for a queue.

And that queue is not straightforward to implement, because it should be thread safe: the IRQ handler is the producer, and the network task is the consumer of frames. Mongoose implemented a lockless queue using compiler built-in primitives.
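
To give an idea of what such a queue can look like, here is a minimal single-producer / single-consumer sketch built on the GCC/Clang `__atomic` built-ins. It is not Mongoose's actual code: the names (`spsc_queue`, `q_push`, `q_pop`) and the sizes are made up for this example, and it assumes exactly one producer (the IRQ handler) and one consumer (the network task).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define Q_SLOTS   8    /* number of frames the queue can hold (assumed) */
#define SLOT_SIZE 128  /* enough for short frames (assumed) */

static_assert((Q_SLOTS & (Q_SLOTS - 1)) == 0, "Q_SLOTS must be a power of two");

struct slot { uint16_t len; uint8_t data[SLOT_SIZE]; };

struct spsc_queue {
  struct slot slots[Q_SLOTS];
  size_t head;  /* free-running counter, written only by the producer */
  size_t tail;  /* free-running counter, written only by the consumer */
};

/* Producer (IRQ handler). Returns 1 on success, 0 if the frame does not
   fit into a slot or the queue is full. */
int q_push(struct spsc_queue *q, const void *buf, size_t len) {
  size_t head = __atomic_load_n(&q->head, __ATOMIC_RELAXED);
  size_t tail = __atomic_load_n(&q->tail, __ATOMIC_ACQUIRE);
  if (len == 0 || len > SLOT_SIZE || head - tail == Q_SLOTS) return 0;
  struct slot *s = &q->slots[head % Q_SLOTS];
  memcpy(s->data, buf, len);
  s->len = (uint16_t) len;
  /* Release: the slot contents become visible before the new head does. */
  __atomic_store_n(&q->head, head + 1, __ATOMIC_RELEASE);
  return 1;
}

/* Consumer (network task). Copies the oldest frame into buf and returns
   the number of bytes copied, or 0 if the queue is empty. */
size_t q_pop(struct spsc_queue *q, void *buf, size_t cap) {
  size_t tail = __atomic_load_n(&q->tail, __ATOMIC_RELAXED);
  size_t head = __atomic_load_n(&q->head, __ATOMIC_ACQUIRE);
  if (head == tail) return 0;
  struct slot *s = &q->slots[tail % Q_SLOTS];
  size_t len = s->len < cap ? s->len : cap;
  memcpy(buf, s->data, len);
  /* Release: the slot is handed back only after we finished reading it. */
  __atomic_store_n(&q->tail, tail + 1, __ATOMIC_RELEASE);
  return len;
}
```

No locks and no interrupt masking are needed because each index has exactly one writer; the acquire/release pairs make sure the other side never sees an index move before the slot data is in place.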

The only advantage of a queue I can think of is the situation where a network peer produces a burst of 4+ small frames. For example, 20 frames, 60 bytes each. The max is 4 chained descriptors, as far as I remember. Your user task may not be fast enough at handling those frames, so the IRQ handler can run out of RX buffers and lose frames. A queue, however, serves as a buffer - the IRQ handler can just copy a short frame to the queue, and the RX descriptor is ready again!

Not a huge deal for TCP anyway, cause the TCP layer would retransmit if a segment gets lost.
 

Offline peter-h

  • Super Contributor
  • ***
  • Posts: 3743
  • Country: gb
  • Doing electronics since the 1960s...
Looking at the ST forum, some effort went into STM32 - LWIP integration with zero-copy drivers. AFAIK it was done for the 32H7 (or some such) only and only one guy understood how it worked. All that stuff was buggy and nobody supported it.

I spent some time on execution time profiling the memcpy in the low level input/output in my 32F4-LWIP integration and the copy overhead was negligible - just a few us per MTU-sized packet (with a 4-aligned memcpy specialised to use word transfers; note the GCC newlib memcpy always does a byte at a time!) which in the context of some real user code generating or consuming data was completely immaterial.
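
For illustration, a word-at-a-time copy of the kind described above could look roughly like this. It is a sketch, not the code from that project: `copy_aligned4` is an invented name, both pointers are assumed to be 4-byte aligned, and the `may_alias` attribute is a GCC/Clang extension used to keep the 32-bit accesses legal under strict aliasing. Whether the toolchain's own memcpy already copies by words depends on how the C library was built, so it is worth measuring before replacing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: a 32-bit type that may alias any other object,
   so byte buffers can be accessed through it as words. */
typedef uint32_t __attribute__((__may_alias__)) word_t;

/* Copy len bytes between two 4-byte-aligned buffers using 32-bit
   transfers, then finish the remaining 0..3 bytes one at a time. */
void copy_aligned4(void *dst, const void *src, size_t len) {
  assert(((uintptr_t) dst & 3u) == 0 && ((uintptr_t) src & 3u) == 0);
  word_t *d = (word_t *) dst;
  const word_t *s = (const word_t *) src;
  for (size_t n = len / 4; n > 0; n--) *d++ = *s++;
  uint8_t *db = (uint8_t *) d;
  const uint8_t *sb = (const uint8_t *) s;
  for (size_t n = len & 3u; n > 0; n--) *db++ = *sb++;
}
```

A 1514-byte frame is then moved in 378 word transfers plus 2 trailing bytes instead of 1514 byte transfers.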

However I don't think LWIP could work with 4-aligned buffers because while the ETH subsystem buffers can be aligned to anything you want, inside LWIP is some "packet construction" which prevents that.
 

Offline tellurium (Topic starter)

  • Frequent Contributor
  • **
  • Posts: 267
  • Country: ua
Quote
The video shows how incredibly complex this stuff is. It dives straight into very deep stuff. One would need to spend a year learning about TCP/IP etc to really understand it.

That's the problem these days. Your product is likely to be 10% functionality and 90% connectivity.

I agree. Network programming is difficult. Even with a decent API, it is still not trivial to program. Especially if the functionality includes a Web UI (think device dashboard). Now, suddenly, an engineer must be a three-headed dragon: they should know embedded systems, network programming, and frontend programming. A difficult-to-find blend! Therefore tasks get split up and outsourced, which creates an extra source of issues.

We as a company try to alleviate this by creating a visual tool that generates network code: https://mongoose.ws/wizard/ . The goal is that any embedded engineer, having zero network programming skills and zero frontend skills, can create production-level, "connected" firmware code.
 

Offline tellurium (Topic starter)

  • Frequent Contributor
  • **
  • Posts: 267
  • Country: ua
Quote
Looking at the ST forum, some effort went into STM32 - LWIP integration with zero-copy drivers. AFAIK it was done for the 32H7 (or some such) only and only one guy understood how it worked. All that stuff was buggy and nobody supported it.

Looking at the ST software dynamics, and talking to a number of ST Application Engineers, it looks like ST is steering away from a "traditional" SDK, which is a blend of various open source components like LWIP, FatFS and FreeRTOS, towards an "integrated" SDK: Zephyr / Azure.

Having a single vendor that provides the whole software stack is better from many perspectives.

So, in my own subjective opinion, we're going to see way less LWIP, and way more Zephyr/Azure in the coming years.
 

Offline peter-h

  • Super Contributor
  • ***
  • Posts: 3743
  • Country: gb
  • Doing electronics since the 1960s...
OK; sure, ST can change what they bundle at any time. Almost none of it works right out of the box anyway - unless you just want to blink an LED on a dev board.

The main problem is that all this stuff is unsupported. AFAIK ST (and the other vendors) support big OEMs only.

So switching from FreeRTOS to Zephyr will change nothing, and switching from LWIP to Azure will change nothing. FreeRTOS actually works well, and I am sure Azure will be just like LWIP, with forums filling up with the same questions :) A google on Azure just does one's head in :)
« Last Edit: May 10, 2024, 02:46:06 pm by peter-h »
 

Offline globoy

  • Regular Contributor
  • *
  • Posts: 202
  • Country: us
From the FWIW dept, I've been using the Berkeley socket compatibility layer above NetXduo in an application on an STM32F4 and it has been working very well. Telnet + multiple Modbus TCP master and slave ports running simultaneously in different threads. We have yet to do real stress testing (waiting on a resource) but I suspect the logic will be ok and I may have to increase the allocation of memory to the NetXduo pool and perhaps adjust the priority of the threads. Definitely seemed easier than using LWIP.
 

Offline tellurium (Topic starter)

  • Frequent Contributor
  • **
  • Posts: 267
  • Country: ua
Quote
Definitely seemed easier than using LWIP.

Note that LWIP provides three different APIs, one of which is BSD sockets API.
Perhaps you were comparing Netx's sockets API with LWIP's raw API -  and I'd agree, LWIP's raw API is complex.
 

Offline globoy

  • Regular Contributor
  • *
  • Posts: 202
  • Country: us
No, I know about the three layers.  Unfortunately I can't remember the exact pain but I started the project using LWIP in a standard project and quickly ran into several pain points.  I was also planning to use FreeRTOS which I have a lot of experience with.  But decided to give ThreadX a try (the client thinks that someday they may want to use Azure) and ended up really liking it.  Everything just worked and I had running code very quickly.
« Last Edit: May 10, 2024, 07:05:34 pm by globoy »
 
The following users thanked this post: tellurium

Offline globoy

  • Regular Contributor
  • *
  • Posts: 202
  • Country: us
And don't mean to hijack your thread.  I keep trying to find a good project to try Mongoose with  :)
 

Offline tellurium (Topic starter)

  • Frequent Contributor
  • **
  • Posts: 267
  • Country: ua
Quote
No, I know about the three layers.  Unfortunately I can't remember the exact pain but I started the project using LWIP in a standard project and quickly ran into several pain points.  I was also planning to use FreeRTOS which I have a lot of experience with.  But decided to give ThreadX a try (the client thinks that someday they may want to use Azure) and ended up really liking it.  Everything just worked and I had running code very quickly.

Maybe the client's thinking is similar to ST's: if the whole software stack is provided by one reputable vendor (Microsoft), then it is less risky in the long run than having a blend of various open source components that no one is responsible for.

It may happen that the embedded software market will gradually become less fragmented and we'll see just a few major stacks like Azure and Zephyr. Just like we have Linux / Windows / Mac in the workstation market.

A project for Mongoose - easy! The next time you need to make a Web UI for your device is a good opportunity. Because whatever option you choose - NetxDuo, LWIP, whatever else - you'll need to do frontend. HTML / JS, that is. And most likely, that is something you're not very fond of. Writing the UI, then developing the REST API, gluing them together, and then attaching the REST API to your firmware functionality can be a long and winding road. And there, our wizard comes to the rescue.
 

Offline peter-h

  • Super Contributor
  • ***
  • Posts: 3743
  • Country: gb
  • Doing electronics since the 1960s...
Quote
if the whole software stack is provided by one reputable vendor (Microsoft)

Hands up all those who think M$ will support anything :) They do not support anything whatever unless you are a paying contract customer.

Everybody else, like all the smaller devs, will be posting all over the internet looking for solutions, just like they do currently.

The best assurance of something working is a large number of design-ins, and LWIP has millions of those, despite having no support.
 

