Author Topic: ARP causes trouble with Nucleo F7xx LwIP  (Read 4884 times)


Offline Kremmen (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1289
  • Country: fi
ARP causes trouble with Nucleo F7xx LwIP
« on: May 09, 2017, 03:41:24 pm »
I have tried to replicate the few simple examples that exist for the LwIP stack on F7xx Nucleos. The example I am testing is a trivial web server servicing simple HTTP GETs. It kind of works, but only sporadically, and my question is whether there are known issues with this combination.
- Ethernet and ADC defined in CubeMX, as well as FreeRTOS and LwIP,
- toolchain is SW4STM32 i.e. Eclipse, OpenOCD, St-Link,
- I have mangled the project into C++, not that it should have any effect on this particular issue but who knows.

The specific problem I have is that ARP works very sporadically. I run Wireshark on the debugging host so I can see the traffic on the Ethernet wire, and it is obvious that ARP fails almost consistently. The LwIP stack should issue a gratuitous ARP broadcast on startup, but this is almost never seen. Also, ARP queries from the HTTP client (the same host that runs the debugger) usually go unanswered.

Occasionally there is a response to an ARP query and then the HTTP GET immediately succeeds, so it is obvious the problem is ARP not working properly.
I have traced the code, and the function that issues the gratuitous ARP broadcast is called consistently, but mostly this does not result in an ARP packet showing up on the wire. So what gives? Is the Nucleo Ethernet hardware flaky or is there some other (hopefully known) issue?

P.S. My code never worked at all on a Nucleo F767, but works significantly better on a Nucleo F746. Not well, but sometimes at least.
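
When comparing against the Wireshark capture, it can help to know exactly what the startup ARP should look like on the wire. The following is a minimal sketch in plain C, not LwIP code: `build_gratuitous_arp` and the constants are hypothetical names introduced here, and the layout is the standard ARP-over-Ethernet format with the target IP set equal to the sender IP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN   14u
#define ARP_LEN       28u
#define ARP_FRAME_LEN (ETH_HDR_LEN + ARP_LEN)   /* 42 bytes before padding */

/* Fill `out` (at least ARP_FRAME_LEN bytes) with a gratuitous ARP request
 * announcing `ip` from `mac`. Returns the number of bytes written. */
static size_t build_gratuitous_arp(uint8_t *out,
                                   const uint8_t mac[6],
                                   const uint8_t ip[4])
{
    static const uint8_t arp_fixed[8] = {
        0x00, 0x01,   /* hardware type: Ethernet */
        0x08, 0x00,   /* protocol type: IPv4 */
        0x06,         /* hardware address length */
        0x04,         /* protocol address length */
        0x00, 0x01    /* opcode: request */
    };
    uint8_t *p = out;

    memset(p, 0xFF, 6);       p += 6;   /* destination MAC: broadcast */
    memcpy(p, mac, 6);        p += 6;   /* source MAC */
    *p++ = 0x08; *p++ = 0x06;           /* EtherType: ARP */
    memcpy(p, arp_fixed, 8);  p += 8;
    memcpy(p, mac, 6);        p += 6;   /* sender hardware address */
    memcpy(p, ip, 4);         p += 4;   /* sender protocol address */
    memset(p, 0x00, 6);       p += 6;   /* target hardware address: unknown */
    memcpy(p, ip, 4);         p += 4;   /* target IP == sender IP (gratuitous) */

    return (size_t)(p - out);
}
```

The first 42 bytes of the buffer handed to the Ethernet MAC can be compared with this reference at a breakpoint. On the wire the frame is normally padded to the 60-byte Ethernet minimum.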
« Last Edit: May 09, 2017, 03:44:42 pm by Kremmen »
Nothing sings like a kilovolt.
Dr W. Bishop
 

Offline Scrts

  • Frequent Contributor
  • **
  • Posts: 797
  • Country: lt
Re: ARP causes trouble with Nucleo F7xx LwIP
« Reply #1 on: May 09, 2017, 06:46:00 pm »
How do you debug? Do you use an Ethernet switch? I've seen cases where I could not see packets when going through an Ethernet switch, even though the send function was executed.
When you put a breakpoint on the ARP send function, does it get hit?
 

Offline Kremmen (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1289
  • Country: fi
Re: ARP causes trouble with Nucleo F7xx LwIP
« Reply #2 on: May 09, 2017, 08:10:08 pm »
Debugger is ST-Link v2 + OpenOCD. The code executes as planned as far as I have been able to see via the debugger. Specifically, the calls to send gratuitous ARP broadcasts from the LwIP stack are executed every time as planned but usually do not result in corresponding packets on the wire.

Switches do "hide" (or selectively forward) packets addressed to physically separated nodes on a different Ethernet leg, but not in this case. ARP requests are by nature link-layer broadcasts, precisely because the sender cannot yet know the receiver's link-layer address (MAC address). That is the reason the ARP protocol exists: to find out the MAC address corresponding to an IP address.
And even if this were not the case, I would still see at least the reply to the specific ARP query, because I am monitoring the Ethernet interface of the same host that is making the ARP query in the first place. The response must reach that interface to be of any use at all. I know that interface is OK because there are lots of ARP queries flying around to/from it and they behave exactly as expected. But not so the Nucleo.
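
To check at a breakpoint whether a broadcast ARP request really arrives at the board, a small filter on the raw received frame can be used. This is only a sketch under assumptions: `is_broadcast_mac` and `is_arp_request_for` are hypothetical helpers (not part of LwIP or the CubeMX code), and the frame is assumed to be untagged Ethernet II with no VLAN header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if the 6-byte destination address is ff:ff:ff:ff:ff:ff. */
static bool is_broadcast_mac(const uint8_t *dst)
{
    for (size_t i = 0; i < 6; i++) {
        if (dst[i] != 0xFF) {
            return false;
        }
    }
    return true;
}

/* True if `frame` is an ARP request whose target IP equals `ip`. */
static bool is_arp_request_for(const uint8_t *frame, size_t len,
                               const uint8_t ip[4])
{
    if (len < 42) {
        return false;                     /* too short for Ethernet + ARP */
    }
    if (frame[12] != 0x08 || frame[13] != 0x06) {
        return false;                     /* EtherType is not ARP */
    }
    if (frame[20] != 0x00 || frame[21] != 0x01) {
        return false;                     /* opcode is not "request" */
    }
    return memcmp(&frame[38], ip, 4) == 0; /* target protocol address */
}
```

Calling this on each frame in the driver's receive path, with a breakpoint on the `true` branch, would show whether the host's queries reach the stack at all or are lost before that.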
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26906
  • Country: nl
    • NCT Developments
Re: ARP causes trouble with Nucleo F7xx LwIP
« Reply #3 on: May 09, 2017, 10:01:40 pm »
From uIP I recall that it maps structs onto packets, which can break due to compiler optimisations and alignment issues. Does LwIP do the same?
Maybe put a breakpoint on the place where the data is fed into the ethernet MAC and look at the actual packet data.
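
As a sketch of the kind of breakage meant here (generic C, not taken from uIP or LwIP; the struct and function names are made up for illustration): an ARP header contains 32-bit IP fields at offsets that are not naturally aligned, so a naively declared struct gets padding and no longer matches the wire format. LwIP, as far as I know, handles this with its own `PACK_STRUCT_*` macros defined by the port, so a wrong definition there would show up in the same way. The `packed` attribute below assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive layout: the compiler may insert padding before the 32-bit fields. */
struct arp_naive {
    uint16_t htype;
    uint16_t ptype;
    uint8_t  hlen;
    uint8_t  plen;
    uint16_t oper;
    uint8_t  sha[6];
    uint32_t spa;
    uint8_t  tha[6];
    uint32_t tpa;
};

/* Wire layout: packing removes the padding (GCC/Clang syntax). */
struct arp_wire {
    uint16_t htype;
    uint16_t ptype;
    uint8_t  hlen;
    uint8_t  plen;
    uint16_t oper;
    uint8_t  sha[6];
    uint32_t spa;
    uint8_t  tha[6];
    uint32_t tpa;
} __attribute__((packed));

/* Compile-time checks that the packed struct matches the 28-byte ARP body. */
static_assert(sizeof(struct arp_wire) == 28, "ARP body must be 28 bytes");
static_assert(offsetof(struct arp_wire, spa) == 14, "sender IP at offset 14");
static_assert(offsetof(struct arp_wire, tpa) == 24, "target IP at offset 24");

/* Alignment-safe read of a big-endian 32-bit value from a raw buffer. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Adding similar static assertions next to the stack's own header structs is a cheap way to find out whether the toolchain settings (or a C to C++ conversion) changed the layout.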
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline Scrts

  • Frequent Contributor
  • **
  • Posts: 797
  • Country: lt
Re: ARP causes trouble with Nucleo F7xx LwIP
« Reply #4 on: May 10, 2017, 12:42:31 pm »
Are your timers running correctly? Do you update them as required in LwIP? Or do you use NO_TIMERS?
 

Offline capt bullshot

  • Super Contributor
  • ***
  • Posts: 3033
  • Country: de
    • Mostly useless stuff, but nice to have: wunderkis.de
Re: ARP causes trouble with Nucleo F7xx LwIP
« Reply #5 on: May 10, 2017, 01:13:15 pm »
Check the chip revision of the F767 chip. Rev. A chips have a hardware bug that is fixed with Rev. Z. This does not apply to F746 chips, they don't have this particular bug.
http://www.st.com/resource/en/errata_sheet/dm00257543.pdf
Section 2.7.6.
As I was told by STM, all STM32F767 Nucleo boards have had Rev. A chips so far.
So, if you're affected by this bug, the STM32F767 Nucleo board is expected to work sporadically, while the STM32F746 should always work, assuming your code is bug-free.
And yes, the PHY on the nucleo boards uses RMII, so the boards are affected by this bug.
Try to debug and run your code on a F746 nucleo board.
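
The revision can also be read in software from the DBGMCU_IDCODE register instead of from the package marking. A minimal sketch of decoding it follows; the helper names are hypothetical, and the register address, the device IDs (0x449, 0x451) and the revision codes (0x1000, 0x1001) are quoted from memory of the ST reference manuals, so treat them as assumptions and verify them against the manual for your part.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed register address; check the reference manual for your device. */
#define DBGMCU_IDCODE_ADDR 0xE0042000u

/* DEV_ID is in bits 11:0, REV_ID in bits 31:16 of DBGMCU_IDCODE. */
static uint16_t idcode_dev_id(uint32_t idcode)
{
    return (uint16_t)(idcode & 0x0FFFu);
}

static uint16_t idcode_rev_id(uint32_t idcode)
{
    return (uint16_t)(idcode >> 16);
}

/* Assumed mapping of DEV_ID to device family. */
static const char *f7_family_name(uint16_t dev_id)
{
    switch (dev_id) {
    case 0x449: return "STM32F74x/75x";
    case 0x451: return "STM32F76x/77x";
    default:    return "unknown";
    }
}

/* Assumed mapping of REV_ID to revision letter; '?' if not recognised. */
static char f7_rev_letter(uint16_t rev_id)
{
    switch (rev_id) {
    case 0x1000: return 'A';
    case 0x1001: return 'Z';
    default:     return '?';
    }
}
```

On the target the raw value would come from `*(volatile uint32_t *)DBGMCU_IDCODE_ADDR`, or from the HAL's `HAL_GetDEVID()` and `HAL_GetREVID()`.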

Edit: Performance and reliability of LWIP depend on the amount of memory that you allow it to use. I've never tried the CubeMX-provided FreeRTOS / LWIP code, but in my experience with this tool, I wouldn't expect the code to work out of the box. I've got my own implementation of LWIP / ChibiOS on a Nucleo F746: http://wunderkis.de/cube-lwip-bsp/index.html (not ported to the F767 yet, and not expected to be done soon). I wouldn't guarantee that my code works under all circumstances either.
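
For reference, the memory-related knobs live in `lwipopts.h`. The fragment below is only an illustration of which options are involved: the option names are standard LwIP ones, but the values are made-up placeholders, not tuned or tested settings for any Nucleo board.

```c
/* lwipopts.h fragment: illustrative values only, not a tested configuration */
#define MEM_SIZE        (16 * 1024)  /* size of the LwIP heap in bytes */
#define PBUF_POOL_SIZE  16           /* number of buffers in the pbuf pool */
#define ARP_TABLE_SIZE  10           /* number of cached IP-to-MAC entries */
#define ARP_QUEUEING    1            /* queue packets while ARP resolves */

/* Optional ARP debug output; needs a working LWIP_PLATFORM_DIAG in the port */
#define LWIP_DEBUG      1
#define ETHARP_DEBUG    LWIP_DBG_ON
```

If ARP replies are silently dropped because no buffer is available, raising the pool sizes or enabling the ARP debug output is one way to find out.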
« Last Edit: May 10, 2017, 01:21:24 pm by capt bullshot »
Safety devices hinder evolution
 

Offline Kremmen (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1289
  • Country: fi
Re: ARP causes trouble with Nucleo F7xx LwIP
« Reply #6 on: May 10, 2017, 06:30:15 pm »
Quote
Are your timers running correctly? Do you update them as required in LwIP? Or do you use NO_TIMERS?
LwIP is configured to use FreeRTOS, so there are separate tasks to handle the stack-internal things and no playing with timers is needed in the application.
 

Offline Kremmen (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1289
  • Country: fi
Re: ARP causes trouble with Nucleo F7xx LwIP
« Reply #7 on: May 10, 2017, 06:46:08 pm »
Quote
Check the chip revision of the F767 chip. Rev. A chips have a hardware bug that is fixed with Rev. Z. This does not apply to F746 chips, they don't have this particular bug.
http://www.st.com/resource/en/errata_sheet/dm00257543.pdf
Section 2.7.6.
As I was told by STM, all STM32F767 Nucleo boards have had Rev. A chips so far.
Yes, I noted this and sure enough the chip revision on my F767 is A. So no cigar.
Quote
So, if you're affected by this bug, the STM32F767 Nucleo board is expected to work sporadically, while the STM32F746 should always work, assuming your code is bug-free.
And yes, the PHY on the nucleo boards uses RMII, so the boards are affected by this bug.
Try to debug and run your code on a F746 nucleo board.
You will note I wrote in my first post that the code never works on the F767 and works sporadically on the F746. So yes, I have done that, with partial results.
Quote
Edit: Performance and reliability of LWIP depend on the amount of memory that you allow it to use. I've never tried the CubeMX-provided FreeRTOS / LWIP code, but in my experience with this tool, I wouldn't expect the code to work out of the box. I've got my own implementation of LWIP / ChibiOS on a Nucleo F746: http://wunderkis.de/cube-lwip-bsp/index.html (not ported to the F767 yet, and not expected to be done soon). I wouldn't guarantee that my code works under all circumstances either.
Well, now we are approaching the core of the poodle (sorry, local idiom...). Of course the CubeMX code _should_ work out of the box, and if it doesn't, we are where I find myself now. Hence my question whether anyone has encountered this and whether there are known fixes for whatever might be wrong.
I did encounter an optimization issue of the kind nctnico mentioned (thanks) and fixed that. But the problem remains. It cannot be a consistently systematic error, because ARP sometimes works, though mostly not. And when ARP works just once, the stack keeps working OK as long as the cached MAC entry stays valid. So the actual traffic has not failed once it gets underway.
 

