I've got it up and running, and the number of things I've learned could fill a couple of tutorials.
Some tips & thoughts:
1. First of all, do not use the STM32F429I Discovery board for Ethernet prototyping like I did. The plan was to design only the physical layer electronics as a shield, just to make things faster. That failed because the MII pinout conflicts with the accelerometer, screen and USB peripherals on the board. I had to rip all of it out. Well, I thought I had ripped all of it out, only to find that I had forgotten one capacitor - that was enough to stop one of the MII TX channels from working. 2 days of debugging.
2. Speaking of debugging - probe all MII signals. Do not assume that any single one of them works. When designing your board, prepare convenient, labeled test points for all the lines. You'll need a 4-channel scope to see that there actually is data going both ways on 4xTX and 4xRX. If you cannot see outgoing packets on a sniffer (like Wireshark), there has to be an electrical problem.
3. Do not probe the RJ45 side unless you have a fully isolated scope (each channel isolated from every other channel)! Also, you will not find anything there unless you have a scope with a proper Ethernet decoder. When debugging the TX channels I wanted to see if there was activity on the TX wires after something got sent to the PHY chip. That DOES NOT make sense. As long as there is an active link you will see constant activity on the line. Sure, you can decode it by hand, but the signal is extremely hard to capture (trigger on) since the line is always doing something.
3. Start debugging with a static IP configuration and a 1-to-1 connection (direct cable to the PC). DHCP on LwIP did not work out of the box. On the other hand, DHCP is convenient as a "hello world" packet source: your device starts up and you should see the DHCP packet in Wireshark, coming from the 0.0.0.0 IP address.
4. Use the Discovery as a programmer and as a platform to test the RTOS. Then spin your own board with the same MCU.
5. Use 50 ohm termination on the MII lines because otherwise they ring like crazy. The PHY also injects a lot of noise into the power supply, so decouple it properly. Read this:
http://www.murata.com/~/media/webrenewal/support/library/catalog/products/emc/emifil/c39e.ashx
Pi filters were not enough; in the final version feed-through capacitors will be needed. That is, at least, if you plan to use the system for analog measurements & interfacing.
6. TCP Builder is a cool bit of software for sending arbitrary TCP and UDP packets.
7. Get yourself some help. This task is so complex, and stuff may fail on so many levels, that you absolutely need someone to talk it through with. I was stuck twice and I made it only because a friend of mine, a former Intel employee now running his own company, spared 2 x 20 minutes to consult on the problems I was experiencing. Even with the help of someone with 10+ years of experience it took me 3 weeks of 8h+ work days to get it up and running.
On the other hand - I started with 0 knowledge about networking and RTOS. It is doable but hard.
8. RTOS is the best thing I've ever used on an embedded system! After using it I think there is no point in programming an ARM Cortex without it (at least in 90% of cases). Use FreeRTOS + Trace - it will let you see what the kernel is actually doing and get load stats, thread switching... priceless data.
9. Do not bother with the socket API. Netconn works and is documented at an acceptable level. The socket API has crap documentation (or in other words, almost none).
Answers to my questions:
A. Do not attempt to start with DHCP (see the tip above about starting with a static IP). Prepare yourself a connection manager thread that initializes the Ethernet peripherals, then initializes LwIP and optionally does DHCP or sets a static IP config. Then it should signal the communication threads that they may bind TCP/UDP connections and start using the LwIP stack & the newly established link. This thread should then be able to restart the slave connection threads if the link goes down and optionally redo DHCP.
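The connection manager described in A can be thought of as a small state machine. Below is a minimal, host-runnable sketch of just the state transitions; it is not the code I actually run. All names (cm_state_t, cm_inputs_t, cm_next_state, cm_workers_may_run) are made up for illustration, and the real Ethernet, LwIP and DHCP calls are reduced to boolean inputs that your thread would have to fill in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states of the connection manager thread. */
typedef enum {
    CM_INIT_HW,      /* bring up the Ethernet peripheral and PHY */
    CM_INIT_STACK,   /* initialize LwIP */
    CM_CONFIGURE_IP, /* apply static IP config or wait for DHCP */
    CM_RUNNING,      /* communication threads may bind TCP/UDP */
    CM_LINK_DOWN     /* communication threads stopped, waiting for link */
} cm_state_t;

/* Hypothetical inputs; on a real target these come from the HAL and LwIP. */
typedef struct {
    bool hw_ok;      /* Ethernet peripheral/PHY init succeeded */
    bool stack_ok;   /* LwIP init succeeded */
    bool use_dhcp;   /* false = static IP (recommended for first bring-up) */
    bool dhcp_bound; /* DHCP lease obtained (ignored for static IP) */
    bool link_up;    /* PHY reports link */
} cm_inputs_t;

/* Pure transition function: one step of the manager loop. */
cm_state_t cm_next_state(cm_state_t s, const cm_inputs_t *in)
{
    assert(in != NULL);
    switch (s) {
    case CM_INIT_HW:
        return in->hw_ok ? CM_INIT_STACK : CM_INIT_HW;
    case CM_INIT_STACK:
        return in->stack_ok ? CM_CONFIGURE_IP : CM_INIT_STACK;
    case CM_CONFIGURE_IP:
        if (!in->link_up)
            return CM_LINK_DOWN;
        if (in->use_dhcp && !in->dhcp_bound)
            return CM_CONFIGURE_IP;   /* keep waiting for a lease */
        return CM_RUNNING;
    case CM_RUNNING:
        return in->link_up ? CM_RUNNING : CM_LINK_DOWN;
    case CM_LINK_DOWN:
        /* When the link returns, redo the IP configuration (and DHCP if used). */
        return in->link_up ? CM_CONFIGURE_IP : CM_LINK_DOWN;
    }
    return CM_INIT_HW;
}

/* Communication threads should only touch the stack in CM_RUNNING. */
bool cm_workers_may_run(cm_state_t s)
{
    return s == CM_RUNNING;
}
```

On the target, the manager thread would call something like cm_next_state() in a loop and signal or restart the communication threads (e.g. with a semaphore or event flag) whenever cm_workers_may_run() changes.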
B. There are two LwIP threads - the stack thread and the frame input thread, which is effectively the ethernetif_input() function running in a loop. The function void MX_LWIP_Process(void) should be marked as APPLICABLE ONLY to the NO RTOS scenario. Calling it separately in another thread is not only unnecessary but harmful, as you'll end up with two instances running! Sadly the documentation is bad, and there are many such small problems in the code, so you'll need to RTFC because there is no M in the equation.
I also found that the Ethernet HAL functions use HAL delay. When generating code with CubeMX, the SysTick interrupt is used only as a clock source for the RTOS. There was no HAL SysTick call, which caused the HAL delay functions to wait forever! Since those delays are only used for Ethernet PHY initialization, they are acceptable and appropriate. Either try to use RTOS delays (I do not know if that is possible in HAL drivers, but it should be) or include the HAL SysTick function call in the SysTick interrupt routine.
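To show why the missing HAL tick call hangs the delay, here is a small host-side model, not target code. Every name in it (model_tick, model_systick_handler, model_delay_elapses) is made up for illustration; on the real MCU the tick increment is done by the HAL function HAL_IncTick(), and the busy wait is the one inside HAL_Delay(). The loop is bounded by max_interrupts only so the model can terminate; the real delay would spin forever.

```c
#include <stdbool.h>
#include <stdint.h>

static volatile uint32_t model_tick; /* stands in for the HAL tick counter */
static uint32_t model_rtos_ticks;    /* stands in for the RTOS tick count */

/* Plays the role of HAL_IncTick() on the real target. */
static void model_hal_inc_tick(void) { model_tick++; }

/* Plays the role of the RTOS tick handler. */
static void model_rtos_tick(void) { model_rtos_ticks++; }

/*
 * Models the SysTick interrupt routine.
 * call_hal_tick == false is the generated setup I ran into:
 * only the RTOS is clocked, the HAL tick never advances.
 */
void model_systick_handler(bool call_hal_tick)
{
    if (call_hal_tick)
        model_hal_inc_tick();
    model_rtos_tick();
}

/*
 * Models a HAL_Delay()-style busy wait on the HAL tick.
 * Returns true if the delay elapsed within max_interrupts SysTick
 * interrupts, false if it would still be spinning.
 */
bool model_delay_elapses(uint32_t delay_ms, bool call_hal_tick,
                         uint32_t max_interrupts)
{
    uint32_t start = model_tick;
    for (uint32_t i = 0; i < max_interrupts; i++) {
        if ((uint32_t)(model_tick - start) >= delay_ms)
            return true;
        model_systick_handler(call_hal_tick); /* one simulated interrupt */
    }
    return (uint32_t)(model_tick - start) >= delay_ms;
}
```

With call_hal_tick set to true the delay ends after delay_ms simulated interrupts; with false it never ends, which is exactly the hang during PHY initialization.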
If anyone in the future struggles with LwIP and RTOS, feel free to ask questions here.