Topic: STM32F417 - any reason why a min PCLK2 speed is required for ethernet to work?


peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
With a 168MHz core, I am finding that ethernet works with PCLK2 at DIV8 (21MHz) or anything faster, but fails with DIV16.

There isn't anything in the ST ethernet code AFAIK which relates to PCLK2. Ethernet uses its own clock source (the 50MHz RMII reference clock).

There are 2 timers - SysTick and TIM6, both running at 1kHz, neither related to PCLK2.

Could it be there is some data sampling issue which requires PCLK2 to run at x times something?

Needless to say debugging this will be complicated.

I don't need 10.5MHz PCLK2; it is just handy for low baud rates:
https://www.eevblog.com/forum/microcontrollers/32f417-any-way-to-get-baud-rates-below-1200/

EDIT: it turns out somebody has been here before, and as usual without any resolution:
https://community.st.com/s/question/0D50X00009Xkgdi/ethernet-mac-stops-transmitting-when-apb2-divided-by-16-stm32f407

It appears that slowing down PCLK1 is OK, but that screws up SPI2, SPI3 and other stuff.

The problem is that one can't use DIV8 (21MHz) because one doesn't know the margin, so the slowest sensible PCLK2 is DIV4 (42MHz) if you are using ethernet.
« Last Edit: May 17, 2021, 07:55:41 pm by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 90S1200 32F417
 

wek

  • Contributor
  • Posts: 9
  • Country: sk
Today I tried setting the APB2 clock to AHB/16 on an 'F427 (which is almost identical to the 'F405/407/415/417), and ETH worked as usual.

I don't use Cube.

Debug as usual: start by checking that the clocks are set properly in RCC, and check for the presence of the 25MHz/50MHz clock (do you use MII or RMII? What's your hardware? How do you generate the ETH clock?). Check whether SMI/MDIO works, observe the waveforms on the pins, and check that the PHY responds properly. Read out the ETH registers and check/compare their contents.

> EDIT: it turns out somebody has been here before, and as usual without any resolution:

As usual, that somebody did not care to pursue the problem.

JW
 

peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
This is the clock config



There was another report somewhere of this issue, and it said that the width of the pulses on the ethernet cable shrinks when you do DIV16 on PCLK2.

This is not related to Cube IDE; well not in any obvious way.

The hardware is based on their dev board i.e. LAN8742 fed with a 50MHz clock from PA1. 32F417 xtal is 25MHz. The frequencies of PCLK1, PCLK2 have been checked in several ways, including viewing SPI2/3 clocks with a scope.

Of course, the cause could be some third factor.

Frankly, most people are not able to pursue this sort of thing, at the physical layer, because of the expertise required, not to mention the equipment, and they will already have spent so much time debugging the ST drivers.
« Last Edit: May 18, 2021, 09:23:19 pm by peter-h »
 

wek

  • Contributor
  • Posts: 9
  • Country: sk
Quote
There was another report somewhere of this issue, and it said that the width of the pulses on the ethernet cable shrinks when you do DIV16 on PCLK2.

Can you please give a link to this claim?

Quote
This is not related to Cube IDE; well not in any obvious way.

No, the IDE is just that, an IDE (more precisely, Eclipse). By Cube I mean the whole "ecosystem", whose central part is the "library", in this case CubeF4.

Quote
The hardware is based on their dev board

Who is "them", ST? Which dev board in particular?

PA1 is input in this case, ETH_RMII_REF_CLK. So the 50MHz clock is generated by PHY; what is the clock source to that PHY, then?

You can also try to run your code on said dev board, to have a "known good" reference point.

Quote
Frankly, most people are not able to pursue this sort of thing

That's why there are more ready-made solutions like the Wiznet chips, or boards like RPi, around.

JW
 

peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
The 50MHz comes from the LAN8742:



Same as the dev board, IIRC - the STM32F407G-DISC1. The ethernet stuff is on STM32F4DIS-EXT.





« Last Edit: May 19, 2021, 06:03:38 am by peter-h »
 

wek

  • Contributor
  • Posts: 9
  • Country: sk
Quote
The 50MHz comes from the LAN8742

Yes I see that, but what's the primary clock source of LAN8742?

Quote
STM32F4DIS-EXT
That module did not come from ST, although they do have a page for it (for historical reasons: they actively promoted some third-party stuff back then). As the schematics show (they are not available from ST, but from Farnell), it was produced by a Chinese company called Embest Technology, at a time when that company was already in the process of being absorbed into Farnell.

Although it uses the LAN8720 rather than the LAN8742 as PHY, the two chips are perhaps similar enough. In any case, STM32F4DIS-EXT uses a 25MHz crystal with the LAN8720 as its primary clock source, so for now I assume that's what you use, too.

The reason I'm asking is that it's relatively common practice to feed the PHY's primary clock from an MCO output of the STM32. If that clock comes from the PLL, its jitter may make the clock marginal for ETH (especially RMII), and changes in the RCC may indirectly push it out of spec. There can of course be marginal-clock issues with other arrangements of the primary clock source too, but with a dedicated crystal oscillator directly on the PHY I find it unlikely to be related to the APB clock (still not impossible, though, as an improper power supply/ground/track arrangement can do wonders). That, IMO, is probably the cause of the issue you mentioned above, the "pulse shrink on cable" - I'd like to see that source to confirm or reject it.

So, assuming the primary clock source is the PHY's dedicated crystal oscillator, and assuming it is precise and stable enough to be excluded as the source of trouble, you might want to concentrate on the debugging path I outlined in my first post, i.e. find out what exactly "does not work": check whether SMI/MDIO works, observe the waveforms on the MDIO/MDC pins, and check that the PHY responds properly. Check whether the PHY detects the connected cable and speed properly. Check the waveforms on the RMII pins. Read out the ETH registers and check/compare their contents. Check whether packets arrive and depart, by observing the behaviour of the ETH registers.

JW


PS. For laughs, a personal story of how I found out the importance of clock integrity for ETH on the STM32 is here.
« Last Edit: May 19, 2021, 06:59:38 am by wek »
 

peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
" but what's the primary clock source of LAN8742?"

The CPU - 25 MHz out of PC9. The dev board circuit is this



but here we are getting the 25MHz out of the CPU's PC9. That came out of some other ST appnote, but I can't recall which. I found this

https://community.st.com/s/question/0D50X00009XkiAE/can-stm32f207xx-generate-50mhz-clock-for-rmii-ethernet-phy-

There is also this
https://docs.rs-online.com/97b6/0900766b813f6b00.pdf
(search for MCO) which shows the option for the CPU to generate the 25MHz clock.

What is somewhat worrying is the ambiguity on whether this needs to be 25MHz or 50MHz. I would think 25 must be fine since it is definitely fine with a 25MHz xtal.

You are right about possible jitter issue, but why would PCLK2 DIV16 affect that?

I did the hardware design, having read every word of the 32F4x7 300-page data sheet, but I didn't do the original investigation on this 25MHz setup. But yes, this does show 50MHz for the reduced interface (RMII)



The 8742 data sheet makes it clear you can use 25MHz from an oscillator, and therefore also from the 32F4, and the following is exactly what we are doing:



It would certainly be easy enough to scope this 25MHz, and do that with DIV16 and DIV8. Let me try that today.
« Last Edit: May 19, 2021, 08:37:48 am by peter-h »
 

peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
The PCLK2 divider value has no effect on the 25MHz coming out of PC9, and there is no difference in the jitter (invisible on a scope even at x10 magnification; the photos below are just out of focus, but I re-examined them later). The following are from a 200MHz scope:

This is the 25MHz, 1V/div vertically, with DIV4 (42MHz) on PCLK2



This is with DIV16 (10.5MHz)



The 50MHz output of the LAN8742:



The 50MHz goes through a 22R to reduce EMI, but the waveform looks no different on the scope on either side of it.

Previously there was a mistake there: that resistor was 220R and that made the waveform more like a sawtooth, but that had no bearing on this issue.

A ping over the LAN with 1470-byte packets (about the max) is error-free over ~1000 packets:

Ping statistics for 192.168.1.63:
    Packets: Sent = 1000, Received = 1000, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 20ms, Average = 9ms


The lack of jitter tells me the 25MHz PC9 output is generated from the 25MHz 32F4 crystal oscillator directly, and not from the 168MHz core clock (which would need to be 150MHz to have any hope of an exact jitter-free 25MHz). I vaguely recall the 25MHz xtal being a required frequency for ethernet operation.
« Last Edit: May 19, 2021, 02:34:39 pm by peter-h »
 

harerod

  • Regular Contributor
  • *
  • Posts: 172
  • Country: de
  • ee - digital & analog stuff
peter-h, how is your German? I don't want to re-iterate the whole story about PHYs on the STM32F4:
https://www.mikrocontroller.net/topic/360978

Long story short: myself and several customers who asked me to fix their designs, ran into issues with PHYs on the F4. One typical issue is with the clock. This is reliably solved by giving the PHY its own quartz. That issue was introduced with the F4's clock system. Other solutions might be possible.


A secondary issue is present and not documented in some PHYs. That issue stems from the PHY's internal reset circuitry, which requires a specific slope to work. If one faithfully follows the reference design, the issue will never surface. It took a while and a bit of wasted money, but Micrel helped me out, eventually.

BOTH issues described will not prevent a simple PING. However, during production operation with high LAN-load the connection will be slow and/or unstable. Both issues were a royal PITA to track down.

« Last Edit: May 19, 2021, 07:00:09 pm by harerod »
Before you speak, let your words pass through three gates: At the first gate, ask yourself “Is it true?” At the second gate ask, “Is it necessary?” At the third gate ask, “Is it kind?” – Rumi
 

peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
Harerod - thank you. I must be missing something because I read through that thread, and the two linked ones, and there is no actual information there.

What I don't get is why, when the 8742 "works" with a 25MHz xtal, it "should not work" with the 25MHz coming from the 32F4.

This has to be "nonsense" :)

But it is possible, for unrelated reasons e.g.

- it needs a sinewave input, and/or is somehow sensitive to the waveform, risetimes, over/undershoots, etc, which just "happen" to not be present with a xtal osc

- giving it its own xtal means its 25M clock is de-correlated with the 32F4 clock, and this avoids some subtle data sampling issues, possibly related to metastability BUT these issues will still occur periodically each time the two clocks pass each other, which given say a 20ppm difference will happen a few hundred times per second BUT it won't be noticed unless there is actual data being sampled AND it is within the metastability window (which is extremely narrow, with GHz-speed logic of the 32F4) AND if it corrupts data nobody will notice because of error detection/correction at the TCP/IP layer ;)

whereas if one uses the 32F4's 25MHz output then the two are obviously correlated and if there is a problem, it will be there never, or all the time.

So unless I see something concrete, I don't buy this.

My hunch is that IF this was related to the 25MHz clock out of the 32F4, it would be a subtle sensitivity (of the 8742) to the waveform. I have seen this many times; Xilinx FPGAs (X3k, X4k) were notoriously sensitive to the edges on the config load clock.

BUT, as you can see in the photos above, the 25MHz out of the 32F4 looks exactly the same whether PCLK2 has a DIV4 or a DIV16 on it. On DIV16 the thing is totally unresponsive, on DIV8 or less it runs fine. That points to some sampling issue inside the 32F4.

What I can't determine is whether the DIV4 v. DIV16 shifts some data sampling internal to the 32F4 and exposes this problem. One of the old threads posted earlier alluded to delays between configuring one clock and configuring another clock so maybe somebody had this same idea. It is like the issue with the UARTs, where if you change the baud rate, it doesn't become active until the various counters have got all shifted out.

The PCB was done as carefully as I could. You can see the 8742 (IC17) and the 32F4. R87 is a 22R on the 50MHz clock



The integrated-magnetics RJ45 is a Hanrun HR911105A, which is a very common part. There are many variations of it, and many more counterfeits (marked Hanrun too). We have tested a few of these and all work the same. I am inclined to go for a particular fake variant which is not only half the cost of the Hanrun but has capacitors in series with the 75R resistors, since this prevents the RJ45 smoking when somebody plugs in one of the Chinese dumb PoE injectors; another common problem...

"Both issues were a royal PITA to track down."

What did you actually find?
« Last Edit: May 19, 2021, 07:31:37 pm by peter-h »
 

ajb

  • Super Contributor
  • ***
  • Posts: 2062
  • Country: us
What is somewhat worrying is the ambiguity on whether this needs to be 25MHz or 50MHz. I would think 25 must be fine since it is definitely fine with a 25MHz xtal.

The LAN8742 datasheet explains this. The reference clock can be a 25MHz crystal, a 25MHz clock, or a 50MHz clock. For a 50MHz clock input, REF_CLK In Mode must be enabled (by strapping nINTSEL low).

I'm not sure why you're focused on clock correlation between the PHY clock and the MAC clock; after all, it's perfectly normal to have the RMII interface clocked by the PHY, which may have an arbitrary relationship with the MAC's peripheral clock or other clocks in the processor. The whole point of having a clock line on a bus interface is not having to worry about the correlation of clocks between the devices at either end of that bus, and while timing issues are possible between the MAC's MII interface block and the parts that are synchronous with the internal bus clock, such issues would show up a lot, given how common ethernet applications with either clock configuration are.

It's also not clear what makes you so sure that the clock signal between the MAC and PHY is the problem and not something else. Cube has a lot of layers between the API calls and the peripheral registers in an attempt to make it all "just work" regardless of hardware and peripheral configuration, but that means lots of opportunities for bugs or missed cases. In particular, the Cube libraries provide a lot of automated clock tree handling, and it's entirely possible that there's some issue with how they handle DIV16 in relation to configuring the MAC. As wek has suggested, you should do some additional checks to see what is going on in the system: poke the SMI interface, check the configuration registers in the MAC (and compare the values in the working configuration with those in the non-working one), probe the RMII interface and see if there's any activity, etc.
« Last Edit: May 20, 2021, 05:37:38 pm by ajb »
 

peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
I can make it work, or break it, by changing just the one line, defining the divisor of PCLK2.

/** Initializes the CPU, AHB and APB busses clocks
   * These settings override the stuff in SetSysClock which gets called earlier.
  */
  RCC_ClkInitStruct.ClockType = RCC_CLOCKTYPE_HCLK|RCC_CLOCKTYPE_SYSCLK
                              |RCC_CLOCKTYPE_PCLK1|RCC_CLOCKTYPE_PCLK2;
  RCC_ClkInitStruct.SYSCLKSource = RCC_SYSCLKSOURCE_PLLCLK;
  RCC_ClkInitStruct.AHBCLKDivider = RCC_SYSCLK_DIV1;
  RCC_ClkInitStruct.APB1CLKDivider = RCC_HCLK_DIV4;    // 42MHz - SPI2,SPI3,UART2,UART3,TIM2,3,4,5,6,7,12,13,15 etc
  RCC_ClkInitStruct.APB2CLKDivider = RCC_HCLK_DIV4;    // 42MHz - UART1,UART6

  if (HAL_RCC_ClockConfig(&RCC_ClkInitStruct, FLASH_LATENCY_5) != HAL_OK)
  {
    Error_Handler();
  }

and then inside HAL_RCC_ClockConfig we have

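    /* HAL first sets the highest APBx dividers (DIV16) so that the buses never
       go through an out-of-spec phase while HCLK is being changed; the requested
       APB1/APB2 dividers are written later in the same function. */
    if(((RCC_ClkInitStruct->ClockType) & RCC_CLOCKTYPE_PCLK1) == RCC_CLOCKTYPE_PCLK1)
    {
      MODIFY_REG(RCC->CFGR, RCC_CFGR_PPRE1, RCC_HCLK_DIV16);
    }

    if(((RCC_ClkInitStruct->ClockType) & RCC_CLOCKTYPE_PCLK2) == RCC_CLOCKTYPE_PCLK2)
    {
      MODIFY_REG(RCC->CFGR, RCC_CFGR_PPRE2, (RCC_HCLK_DIV16 << 3));
    }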

and MODIFY_REG is just a bit field merging macro.

I don't see any funny stuff there (like there is in say the UART baud rate setting code).

I don't think it is anything to do with the 25MHz clock, but that was suggested above, so I set out to see if there is anything in it.
« Last Edit: May 19, 2021, 08:14:03 pm by peter-h »
 

ajb

  • Super Contributor
  • ***
  • Posts: 2062
  • Country: us
Those code snippets are only one part of the equation. Other parts of the Cube libraries, including the parts that set up the MAC, will often refer to the clock tree configuration when they set up peripherals, so you'd need to look at those calls as well to see what implications changing the APB2 divider might have and where that might be going wrong. IIRC Cube keeps track of the various bus clock frequencies when the RCC configuration is changed, so that other parts of the libraries can use that information without having to inspect the RCC registers, so there could also be a problem there.

But again, you can shave the problem space down quite a bit by first doing some more basic checks of what's going on in the system when it doesn't work.
« Last Edit: May 19, 2021, 08:23:02 pm by ajb »
 

peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
OK, so if I get this right, we have two assertions as to the possible cause:

1) There is something magical about using an xtal (or perhaps some other async clock source) for the 8742's 25MHz clock

2) There is a bug in ST's config code, where it extracts some bit field out of the PCLK clock config registers and does the wrong thing with it (I would have expected this to surface much sooner, given how long ago it was first reported)

Point 2) can be checked by examining whether the ethernet config registers change between PCLK2 having DIV4 v. DIV16 on it. Can someone suggest which ones to look at?
« Last Edit: May 19, 2021, 09:13:40 pm by peter-h »
 

Yansi

  • Super Contributor
  • ***
  • Posts: 3787
  • Country: 00
  • STM32, STM8, AVR, 8051
Wrong.

The PCLK minimum clock speed is related to the correct operation of the synchronization logic in between the internal bus and the external asynchronous clock domain, that may be or is dictated by an external clock source.

This PCLK minimum applies to a lot more different peripherals, than just ETH.

The minimum PCLK2 clock speed for ETH operation is even clearly stated in the manual, so what's all this fuss about? This is not a bug.

//EDIT: Just looked in the reference manual, and huh, the minimum of 25 MHz is specified for HCLK, not PCLK2. Interesting... Now I take back what I said; no idea what the issue with PCLK2 is, then. I thought ETH was tied to the PCLK2 domain... which it is not.
« Last Edit: May 19, 2021, 09:53:10 pm by Yansi »
 

bson

  • Supporter
  • ****
  • Posts: 1930
  • Country: us
Note that with RMII the reference clock on the RMII interface is 50MHz, not 25MHz. Unless the 8742 has an internal PLL to double the 25MHz clock (I see it has a PLL), you might need to feed it a 50MHz clock. This is not unusual with PHYs.
 

peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
It must have a doubler otherwise it would not work with a 25MHz xtal, which is shown.

I wonder if ST's ethernet code uses any of the timers on APB2/PCLK2 i.e. TIM1,8,9,10,11...
 

bson

  • Supporter
  • ****
  • Posts: 1930
  • Country: us
Does the ethernet controller sit on APB2?  It might need a minimum clock to work.
 

Yansi

  • Super Contributor
  • ***
  • Posts: 3787
  • Country: 00
  • STM32, STM8, AVR, 8051
The Ethernet peripheral seems to be an AHB bus master - it even has its own linked-list DMA. (Dunno why I thought it was on APB2...)

But... maybe part of the interface sits on APB2? Who knows...  Only ST knows and hard to get anything out of them.

 

peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
I have raised a ticket on the ST website.

This sort of thing could easily have remained undiscovered for years, because why would anybody run PCLK2 that slowly? The only reasons would be to get baud rates below 1200 (UART1 or UART6) or to make TIM1, 8, 9, 10, 11 etc run slowly. But only TIM2 and TIM5 are 32-bit, and they are on PCLK1.
 

Yansi

  • Super Contributor
  • ***
  • Posts: 3787
  • Country: 00
  • STM32, STM8, AVR, 8051
Let's hope they will respond with anything useful.

Btw, what does one need baud rates below 1200 for? (As was probably said, it could be done by simple software bit-banging.) But what is the real application these days?

 

Siwastaja

  • Super Contributor
  • ***
  • Posts: 4070
  • Country: fi
Have you carefully verified that no part of the library ethernet code uses a peripheral clocked from the APB2 domain? I mean, maybe they use something other than the ETH itself for auxiliary functions, for example some timer to trigger something. That something would start operating differently if you just change the divider but don't regenerate the whole code with the correct clock information; or it might have a bug or undocumented limitation ("feature") preventing operation with too slow a clock.
« Last Edit: May 20, 2021, 04:25:06 pm by Siwastaja »
 

peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
I asked the guy who implemented the ST ETH library on this target and he is pretty sure no TIMs are used, but it does hook up into the 1kHz tick. Obviously timers are needed for ethernet. I did a search of the whole code tree for TIMx and found nothing.

If you look at what APB2/PCLK2 drives, it's hard to imagine any of it could be used for ETH code. It is this lot




Regarding rates below 1200 baud, this is indeed very rare these days. It used to be used for slow links, especially 20mA loop ("TTY"), which believe it or not is still in use. There is some amazingly old legacy stuff installed. But in my business I have other products which can do everything down to 30 baud, so they can use those :)

Probably the main application for low baud rates (below 9600) is licence-free radio modems. They are supposed to implement hardware handshaking, allowing the host bitrate to be, say, 9600 even if the link bitrate is something slower. There are also some silly low rates in remote metering, as in talking to electricity meters.

At the other end of the scale there are some bizarre high rates, e.g. 500kbps, which were always a PITA to do because the divisors are small; I used to buy special xtals, and do it with a product I had which used an 85C30, with its own xtal, for two serial ports, and then anything was possible. The 32F4 does all these values really well, with error < 1%.
« Last Edit: May 20, 2021, 04:46:49 pm by peter-h »
 

ajb

  • Super Contributor
  • ***
  • Posts: 2062
  • Country: us
Ethernet itself doesn't require any timers (unless maybe you're using ingress/egress timestamping), but a lot of the IP stack above that does.  Depending on which parts of the stack you're using there are a handful of different task functions that need to be executed periodically to maintain ARP, DHCP, IGMP, TCP, etc.  I could imagine there being an issue with the way Cube handles those based on various clock configuration options. 

I assume Cube uses LWIP for its IP stack, in which case you could look for where its timer functions are called from, for example `etharp_tmr` (since you're certainly going to be relying on ARP for anything IP). That could provide a hint about how Cube handles periodic timing, and will also give you a reasonable entry point for debugging the IP stack, as seeing which functions are executing will give you some idea of what's working and what isn't.
 

peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
Yes, LWIP.





How this hangs together, I don't know. It either hooks into the 1kHz tick (and there are two of those, one from SysTick and one from TIM6; if you ask why there are two 1kHz ticks, yes, it seems dumb, but search for FreeRTOS interrupt priority issues with SysTick: it seems to be a well-known issue and this is the approach they recommend), or there is some timer associated with the ETH hardware (in the way ETH has its own DMA controller). Or it is using one of the 32F4 timers, in which case we may have found something, but why isn't that all over Google? :)

LWIP is running as a task under FreeRTOS.
 

dgtl

  • Regular Contributor
  • *
  • Posts: 178
  • Country: ee
According to the errata, at least on the F207 the MCO is not usable for the ETH PHY when fed from the PLL, due to high jitter. Perhaps some PCLK divider values somehow make the stability worse? Because of that erratum I've always used a 25MHz xtal for the MCU and driven the MCO directly from the HSE, without the PLL.
In what way doesn't the ethernet work? Are the MDIO registers usable? Does it link up to the remote party (link LED on)?
 

Yansi

  • Super Contributor
  • ***
  • Posts: 3787
  • Country: 00
  • STM32, STM8, AVR, 8051
What is the issue with running FreeRTOS from SysTick? I have not done much work with FreeRTOS yet, but I have noticed that a lot of ST examples use a different timer for the 1ms tick.

I tried searching for it, but a quick search found only some argument about what priority the tick interrupt should have.
 

peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
With DIV16 on PCLK2, the two RJ45 LEDs flicker at ~1Hz intervals and after ~10 secs settle to both on continuously. But no DHCP.

Re FreeRTOS tick interrupt priority:
https://community.st.com/s/question/0D50X0000A4nQxpSQE/when-freertos-is-used-it-is-strongly-recommended-to-use-a-hal-time-base-source-other-than-the-systick-
https://www.digikey.com/en/maker/projects/getting-started-with-stm32-introduction-to-freertos/ad275395687e4d85935351e16ec575b1

I think it is fairly obvious the 25MHz MCO output comes straight from the 25MHz xtal osc:



Well, if it doesn't, it's a bloody stupid wasted opportunity :)

And I can't see any jitter on a scope on it.
« Last Edit: May 20, 2021, 06:40:27 pm by peter-h »
 

Yansi

  • Super Contributor
  • ***
  • Posts: 3787
  • Country: 00
  • STM32, STM8, AVR, 8051
It does not come straight out; it passes through a lot of gates. Increased jitter or whatever it is, that is what makes it out of spec for the PHY. That is also a known issue, addressed in the documentation.

If you want one crystal less in the design, put the crystal on the PHY and let its PLL do its job. Then you can use the clock output of the PHY to run the STM32. That is the recommended way both for ETH and especially for USB HS, where the timing is even more critical.

Supplying the PHY clock from MCO is right on the edge of acceptable. It sometimes works, sometimes not. It works with some PHYs, and may not work with others. As far as I remember, there is an appendix in the manual that clearly warns you about this. So you have been warned :)
« Last Edit: May 20, 2021, 06:48:57 pm by Yansi »
 

peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
But... why?

The scope shows a trace which is just like what it would be from a xtal, or an xtal osc module.

I have a distinct feeling this is a case of the blind leading the blind, and somebody puts something in some doc because something didn't work and they never bothered to find out why not.

One still has to explain why the different PCLK2 speeds totally make it or totally break it.

Where is this appendix in the manual?
« Last Edit: May 20, 2021, 07:07:34 pm by peter-h »
 

Yansi

  • Super Contributor
  • ***
  • Posts: 3787
  • Country: 00
  • STM32, STM8, AVR, 8051
You will see shit all jitter on an oscilloscope. Measure it with a decent spectrum analyzer.

And I tell you because I think I know it - or at least am pretty convinced. I worked at ST and this is what I was specifically told when I was working there. I would be very surprised if this had changed at all over the years. From what I remember, the path between HSE and MCO contains just too many gates to meet the jitter spec of the PHY. That was exactly the explanation I was given.
« Last Edit: May 20, 2021, 07:13:05 pm by Yansi »
 

Yansi

  • Super Contributor
  • ***
  • Posts: 3787
  • Country: 00
  • STM32, STM8, AVR, 8051
Where is this appendix in the manual?

Chapter A.3 in the datasheet.

Also, now looking at it, I think it was maybe only valid for the 50 MHz output. I am not sure, but I have already asked; I still have some friends there. I will post back what I get. We were always told to check that the PHY is not clocked from MCO at all. That is what I remember. Now looking in the datasheet, it seems only valid for RMII/50MHz.

//wait a minute, it seems I have a pretty outdated datasheet for the F417... will load a new one ... okay, downloaded a datasheet 8 years newer - no change in appendix A.3. But I am pretty confident that we were told never to use MCO for the PHY, as it was not reliable.

//EDIT2: Just wanted to add, this is not very much related to the APB2 clock dividing... Maybe I create a lot of offtopic lately. :-/
« Last Edit: May 20, 2021, 08:15:34 pm by Yansi »
 

wek

  • Contributor
  • Posts: 9
  • Country: sk
Okay so just throw out everything related to APB2 peripherals from your program.

Still problems with ETH?

JW


PS. Just FYI, tried the APB2 being set to AHB/16 on a 'F407 too (that's an identical chip to the 'F417), and as expected, ETH works flawlessly.
« Last Edit: May 20, 2021, 07:47:59 pm by wek »
 

Offline peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
I have a 2.4GHz spectrum analyser. Can check tomorrow. What specifically should I look for? What analyser settings should I use?

There will be loads of harmonics, given it is a "square" wave :)

The only peripherals in the project that run off PCLK2 are USART1 and USART6, and they aren't being used; they are only initialised. BTW they worked perfectly with DIV16.

"Just FYI, tried the APB2 being set to AHB/16 on a 'F407 too (that's an identical chip to the 'F417), and as expected, ETH works flawlessly."

That's a great data point.

Is your core running at 168MHz?

Which devices on PCLK2 have you got enabled/configured?

Did you verify that PCLK2 has actually changed? ST's init code sets up the clocks in two places. The CPU powers up at 16MHz. The first setup of the APB1/APB2 clocks is called from SystemInit(), which is called from startup_stm32f407xx.s. The second is done in SystemClock_Config() in main(). The second overrides the first, obviously. This confused the hell out of me.
« Last Edit: May 20, 2021, 09:03:06 pm by peter-h »
 

Offline wek

  • Contributor
  • Posts: 9
  • Country: sk
Quote
Is your core running at 168MHz?
Yes.

Quote
Which devices on PCLK2 have you got enabled/configured?
ADC1, which is just lazily polled; USART6, which did nothing in that experiment; TIM8, which is the source of timing (but its constants were recalculated so that its output frequency remained unchanged); and SPI1, which is used heavily as the source of data displayed on an LCD, so the difference was obvious.

Quote
Did you verify the PCLK2 has actually changed?
Besides being obvious as I said above, I also confirmed it by reading back RCC_CFGR.
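Reading RCC_CFGR back only helps if the field is decoded correctly. The sketch below decodes the APB2 prescaler (PPRE2) from a raw RCC_CFGR value and derives PCLK2 and the APB2 timer clock, which is the kind of recalculation needed to keep TIM8's output frequency unchanged. The bit positions (PPRE2 in bits 15:13) and the rule that the timer clock is 2 x PCLK2 whenever APB2 is divided are my reading of RM0090 for the 'F405/407/415/417, assuming the TIMPRE option, where present, is left at 0; the function names are hypothetical.

```c
#include <stdint.h>

/* RCC_CFGR.PPRE2 (APB2 prescaler) on the STM32F405/407/415/417,
 * assumed from RM0090: bits 15:13, 0xx = /1, 100 = /2 ... 111 = /16. */
#define CFGR_PPRE2_POS  13u
#define CFGR_PPRE2_MASK (0x7u << CFGR_PPRE2_POS)

/* Return the APB2 divider (1, 2, 4, 8 or 16) encoded in a raw RCC_CFGR value. */
unsigned apb2_divider(uint32_t cfgr)
{
    unsigned ppre2 = (unsigned)((cfgr & CFGR_PPRE2_MASK) >> CFGR_PPRE2_POS);
    if ((ppre2 & 0x4u) == 0u) {
        return 1u;                     /* 0xx: HCLK not divided */
    }
    return 2u << (ppre2 & 0x3u);       /* 100:/2, 101:/4, 110:/8, 111:/16 */
}

/* PCLK2 for a given HCLK and RCC_CFGR value. */
uint32_t pclk2_hz(uint32_t hclk_hz, uint32_t cfgr)
{
    return hclk_hz / apb2_divider(cfgr);
}

/* Clock seen by the APB2 timers (TIM1, TIM8, TIM9..11): PCLK2 if APB2 is
 * undivided, otherwise 2 x PCLK2 (assuming TIMPRE = 0 where it exists). */
uint32_t apb2_timer_hz(uint32_t hclk_hz, uint32_t cfgr)
{
    uint32_t pclk2 = pclk2_hz(hclk_hz, cfgr);
    return (apb2_divider(cfgr) == 1u) ? pclk2 : 2u * pclk2;
}
```

On the target you would pass `RCC->CFGR` and the real HCLK. With HCLK = 168 MHz, DIV4 (PPRE2 = 101) gives PCLK2 = 42 MHz and DIV16 (PPRE2 = 111) gives 10.5 MHz, with the APB2 timers running at twice those values.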

Quote
The only devices I am using in the project which run off PCLK2 are UART1,6.
So it will be easy to remove all code related to these. Then run the code and check if ETH still does not work.

You could read RCC_APB2ENR to check that it's all cleared (perhaps except SYSCFGEN, which should be harmless). Also check the EXTI registers to see whether any interrupt is enabled there.
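Checking RCC_APB2ENR by eye is error-prone, so here is a small sketch that lists which APB2 clock enables are set in a raw value and counts everything other than SYSCFGEN. The bit positions are my reading of RM0090 for the 'F405/407/415/417 (the 'F42x adds a few more peripherals, which are not listed); `report_apb2enr` is a hypothetical helper, and the raw value would come from `RCC->APB2ENR`, read in the debugger or printed from code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* RCC_APB2ENR bit positions for the STM32F405/407/415/417 (assumed from
 * RM0090; the 'F42x adds SPI4/5/6, SAI1 and LTDC, not listed here). */
struct apb2_bit { unsigned bit; const char *name; };

static const struct apb2_bit apb2_bits[] = {
    { 0, "TIM1EN" },  { 1, "TIM8EN" },    { 4, "USART1EN" }, { 5, "USART6EN" },
    { 8, "ADC1EN" },  { 9, "ADC2EN" },    { 10, "ADC3EN" },  { 11, "SDIOEN" },
    { 12, "SPI1EN" }, { 14, "SYSCFGEN" }, { 16, "TIM9EN" },  { 17, "TIM10EN" },
    { 18, "TIM11EN" },
};

#define SYSCFGEN_BIT 14u

/* Print every known clock enable set in apb2enr and return how many are
 * set, not counting SYSCFGEN. */
unsigned report_apb2enr(uint32_t apb2enr)
{
    unsigned others = 0;
    for (size_t i = 0; i < sizeof apb2_bits / sizeof apb2_bits[0]; i++) {
        if (apb2enr & (1u << apb2_bits[i].bit)) {
            printf("  %s\n", apb2_bits[i].name);
            if (apb2_bits[i].bit != SYSCFGEN_BIT) {
                others++;
            }
        }
    }
    return others;
}
```

For example, a value of 0x00004330 (SYSCFG, ADC1, ADC2, USART1 and USART6 enabled) reports four enables besides SYSCFGEN.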

JW
« Last Edit: May 21, 2021, 12:44:12 am by wek »
 

Offline peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
Can these be read in debugging / single stepping?
 

Offline wek

  • Contributor
  • Posts: 9
  • Country: sk
Quote
Can these be read in debugging / single stepping?

Yes, in the same way as you do with all other peripherals' registers. Just don't forget that using the debugger may be intrusive.

Or use your favourite method of choice - output to SWO, print to UART, whatever works for you.
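If you are worried that the debugger is intrusive, the same registers can be printed by the firmware itself. Here is a minimal sketch, assuming `printf` is already retargeted to a UART or to SWO/ITM. The base addresses (RCC at 0x40023800, EXTI at 0x40013C00) and offsets (CFGR 0x08, APB2ENR 0x44, IMR 0x00, EMR 0x04) are my reading of RM0090 for the F4, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define RCC_BASE  0x40023800u   /* STM32F4 RCC (assumed from RM0090) */
#define EXTI_BASE 0x40013C00u   /* STM32F4 EXTI (assumed from RM0090) */

struct reg { const char *name; uint32_t addr; };

static const struct reg regs_of_interest[] = {
    { "RCC_CFGR",    RCC_BASE + 0x08u },
    { "RCC_APB2ENR", RCC_BASE + 0x44u },
    { "EXTI_IMR",    EXTI_BASE + 0x00u },
    { "EXTI_EMR",    EXTI_BASE + 0x04u },
};

/* Format one "NAME = 0xXXXXXXXX" line; returns snprintf's result. */
int format_reg_line(char *buf, size_t len, const char *name, uint32_t value)
{
    return snprintf(buf, len, "%-12s = 0x%08lX\n", name, (unsigned long)value);
}

/* Target only: read each register with a volatile load and print it. */
void dump_regs(void)
{
    char line[40];
    for (size_t i = 0; i < sizeof regs_of_interest / sizeof regs_of_interest[0]; i++) {
        uint32_t v = *(volatile const uint32_t *)(uintptr_t)regs_of_interest[i].addr;
        format_reg_line(line, sizeof line, regs_of_interest[i].name, v);
        fputs(line, stdout);
    }
}
```

Calling `dump_regs()` once after initialisation with DIV4 and once with DIV16 gives two listings that can be compared line by line.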

JW
 

Offline peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
OK I have some data here:

EXTI is all zeroes



Here is RCC, with DIV4 on PCLK2



and here is RCC with DIV16



APB2ENR is



It gets better. This is a 0-26MHz span, and you can see no subharmonics.



I also checked the 25MHz clock with a delayed-timebase scope, at a magnification of about 1000:1, triggering from the input (not from the delayed segment), and there is no jitter. So all this stuff about jitter is definitely nonsense, in this case. Also, looking at the frequency division config (posted earlier), one can see the clock goes out directly.
« Last Edit: May 21, 2021, 06:33:11 pm by peter-h »
 

Offline ajb

  • Super Contributor
  • ***
  • Posts: 2062
  • Country: us
Quote
How this hangs together, I don't know. It either hooks up the 1kHz tick (and there are two of those, one from SysTick and one from TIM6; if you ask why there are two 1kHz ticks, yes, it seems dumb, but search for FreeRTOS interrupt priority issues with SysTick: it seems a well-known issue and this is the approach they recommend), or there is some timer associated with the ETH hardware (ETH has its own DMA controller, for instance). Or it is using one of the 32F4 timers, in which case we may have found something, but why isn't that all over Google? :)

The ethernet block has its own timer but it's only used for timestamping and is clocked from HCLK, so the various IP stack intervals must be controlled from another time base.  The struct you shared earlier showing the reference to etharp_tmr is just the list of handlers that LWIP provides, it doesn't tell you where those handlers are actually invoked.  Easiest way to find that within all of the layers of Cube may be to throw a break point in etharp_tmr or one of the other LWIP _tmr functions and then skim up the call stack until you get out of LWIP and into Cube.  Then start looking for a reference to one of the other timers and see what's being used and if it's getting configured right.  Honestly throwing breakpoints into a few strategic places and running with the two different clock div settings could tell you a lot very quickly--seeing what is and isn't getting hit in either case will give you some hints about what the underlying problem is, and seeing what shows up above the breakpoint in the call stack can quickly clarify how the application is structured, since it's basically a cross section of the layers of abstraction provided by Cube/LWIP. 
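To make "another time base" concrete: an IP stack's periodic handlers (ARP aging and the like) only run if something keeps calling them from a tick. The sketch below is a generic, hypothetical dispatcher driven by a millisecond counter, similar in spirit to the handler list lwIP keeps. It is not lwIP's code, and the intervals in the usage note are illustrative. A `runs` counter per entry gives you something to watch in the debugger under both clock settings, much like the breakpoints suggested above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical periodic handler table; names and intervals are illustrative. */
typedef void (*periodic_fn)(void);

struct periodic_task {
    uint32_t interval_ms;
    uint32_t last_run_ms;
    periodic_fn fn;           /* may be NULL when only counting */
    uint32_t runs;            /* call counter, handy to watch in a debugger */
};

/* Call from a loop or a low-priority task with a monotonically increasing
 * millisecond count (e.g. derived from the 1 kHz tick). The unsigned
 * subtraction keeps working when the counter wraps. */
void run_periodic(struct periodic_task *tasks, size_t n, uint32_t now_ms)
{
    for (size_t i = 0; i < n; i++) {
        if ((uint32_t)(now_ms - tasks[i].last_run_ms) >= tasks[i].interval_ms) {
            tasks[i].last_run_ms = now_ms;
            tasks[i].runs++;
            if (tasks[i].fn != NULL) {
                tasks[i].fn();
            }
        }
    }
}
```

Feeding entries with 1000 ms and 250 ms intervals a counter running from 1 to 2000 ms runs them 2 and 8 times respectively. If the counter stops advancing, nothing runs at all, which is one failure mode worth ruling out.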

There are also some debug macros defined in LWIP with flags to enable/disable them on a per-protocol basis, these are usually given in lwipopts.h (which needs to be provided by the application to configure LWIP), not sure where Cube puts that file, but look for symbols like TCPIP_DEBUG, NETIF_DEBUG and see where they're defined.  You may need to provide a definition for the LWIP_PLATFORM_DIAG macro if Cube doesn't provide something suitable, as that's what LWIP uses to invoke platform-specific diagnostic output.
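As a starting point, here is a sketch of the relevant lines. The option names (`LWIP_DEBUG`, `LWIP_DBG_ON`, `LWIP_DBG_MIN_LEVEL`, `LWIP_DBG_TYPES_ON`, `ETHARP_DEBUG`, `NETIF_DEBUG`, `DHCP_DEBUG`, `TCPIP_DEBUG`, `LWIP_PLATFORM_DIAG`) are lwIP's own. Which modules to switch on, and which files Cube uses for them, are assumptions to check against your project. `printf` is assumed to be retargeted to a UART or SWO.

```c
/* lwipopts.h (excerpt): enable lwIP debug output for a few modules. */
#define LWIP_DEBUG                          /* master switch tested by debug.h */
#define LWIP_DBG_MIN_LEVEL   LWIP_DBG_LEVEL_ALL
#define LWIP_DBG_TYPES_ON    LWIP_DBG_ON
#define ETHARP_DEBUG         LWIP_DBG_ON
#define NETIF_DEBUG          LWIP_DBG_ON
#define DHCP_DEBUG           LWIP_DBG_ON
#define TCPIP_DEBUG          LWIP_DBG_ON

/* Port header (commonly arch/cc.h): route diagnostics to printf. The
 * argument arrives as a parenthesised printf argument list, hence "printf x". */
#include <stdio.h>
#define LWIP_PLATFORM_DIAG(x) do { printf x; } while (0)
```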
« Last Edit: May 21, 2021, 04:43:22 pm by ajb »
 

Offline peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
Thank you.

We do have a problem with making DHCP work on different systems, and the main suspect is timeouts being too tight. One can make it work, or not, by commenting out some RTOS tasks, although currently I have it running DHCP fine on one system, with all tasks enabled, by having made them better behaved (putting osDelay(1) in various places where it doesn't matter, to yield to the RTOS).

So it is possible that this whole DIV16 business is another aspect of the same thing. Interesting however that somebody else found the same thing, as posted at the start.
 

Offline Silenos

  • Contributor
  • Posts: 35
  • Country: pl
  • Fumbling in ignorance
You could actually have posted where the screenshots differed (the CFGR register); it took me way too much time to spot it...
You know, you spend days writing on forums about seemingly random, "suspect" clocking issues and didn't even bother to read the RM section about RCC?
IIRC on the F4 the clock sources, PLL, routing and divider options are in the CR, CFGR and PLLCFGR registers (didn't check it now); compare their values with the numbers you calculate from the RM and see if they are what you want. The xxxENR registers are for "enabling"/"connecting" the clock to the peripherals hooked to a given bus.
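The "numbers you calculate from the RM" can be checked mechanically. Here is a sketch that decodes a raw RCC_PLLCFGR value into the PLL's main (P) and 48 MHz (Q) outputs. The field layout is my reading of RM0090 for the 'F405/407/415/417, the oscillator frequencies must be those of your board, and `decode_pllcfgr` is a hypothetical helper. It does not check the VCO range limits; the reference manual lists those.

```c
#include <stdint.h>

/* RCC_PLLCFGR fields on the STM32F405/407/415/417 (assumed from RM0090):
 * PLLM[5:0], PLLN[14:6], PLLP[17:16] (00=/2 ... 11=/8), PLLSRC[22], PLLQ[27:24]. */
struct pll_clocks {
    uint32_t sysclk_hz;   /* main PLL output P, when selected as SYSCLK */
    uint32_t pll48_hz;    /* PLL48CK (output Q), used by USB OTG FS, SDIO, RNG */
};

/* hsi_hz/hse_hz: oscillator frequencies of your board. Returns -1 for
 * PLLM or PLLQ values the reference manual marks as invalid. */
int decode_pllcfgr(uint32_t pllcfgr, uint32_t hsi_hz, uint32_t hse_hz,
                   struct pll_clocks *out)
{
    uint32_t m = pllcfgr & 0x3Fu;
    uint32_t n = (pllcfgr >> 6) & 0x1FFu;
    uint32_t p = 2u * (((pllcfgr >> 16) & 0x3u) + 1u);
    uint32_t q = (pllcfgr >> 24) & 0xFu;
    uint32_t src = (pllcfgr & (1u << 22)) ? hse_hz : hsi_hz;

    if (m < 2u || q < 2u) {
        return -1;
    }
    /* VCO = src / M * N, computed in 64 bits to avoid overflow. */
    uint64_t vco = (uint64_t)src * n / m;
    out->sysclk_hz = (uint32_t)(vco / p);
    out->pll48_hz  = (uint32_t)(vco / q);
    return 0;
}
```

For example, an 8 MHz HSE with M = 8, N = 336, P = 2, Q = 7 (PLLCFGR = 0x07405408) gives 168 MHz and 48 MHz.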

Then read up on the ETH core, or search the RM/DS, and find out whether ETH requires some specific clocking scheme; don't rely on Cube here.
Then, if your schematic is OK and you measure a good clock, potential clock issues should be excluded by now. I personally had a module with PHY/crystal/magnetics dangling on random 15 cm wires, connected over RMII to an F4 Discovery board. Shameful as it was, it surprisingly worked OK.

Another thing: AFAIK that ST ETH code is still littered with bugs; I still remember stumbling upon a plain mistake in their init code long before HAL came along.
Also, the STM ETH cores tend to have some hardware bugs here and there, which interfere with the Cube code. Have you checked the errata?
On the "ST community" site there is/was a guy, IIRC named "Piranha", who made the ETH module work perfectly, with all the bugs handled, though AFAIK he didn't share the code. Still, he was helpful with insights.
Quote
One can make it work, or not, by commenting out some RTOS tasks, although currently I have it running DHCP fine on one system, with all tasks enabled, by having made them better behaved (putting osDelay(1) in various places where it doesn't matter, to yield to the RTOS).
:palm: This code will fail you in the moment of truth...

Anyway, it looks like you have piled a non-trivial ST driver over a non-trivial ETH core over a non-trivial third-party stack over a non-trivial (preemptive? :horse:) RTOS, and it doesn't "just work"... I would drop the RTOS, maybe even the stack (though I think the stack is the most trustworthy piece of code here), and check whether the driver alone misbehaves, what it outputs, and whether there are any error flags in the ETH registers.
And clock the bus at maximum frequency, for now at least, to get flawless ETH operation. E.g. IIRC the USB cores on STM parts have the AHB/core interface actually documented in the manuals and registers, and you can see it already requires special handling at lowered bus speeds. I wouldn't be surprised if the ETH core were also vulnerable to a low bus frequency.
 

Online bson

  • Supporter
  • ****
  • Posts: 1930
  • Country: us
1kHz tick?  In a real-time OS?  I didn't think kernels used heartbeat style ticks since... the 1980s.

The ARP cache doesn't need a timer.  It can simply detect on lookup that an entry has expired, and fire off a re-arp.  (And, if it's not overly stale, use the old value without waiting for the response.  When it gets the response, if it gets one, it updates the cache.  This way nobody blocks on the ARP cache during normal operation, it gets refreshed, there are no timers and stuffs, and everything is happy.  And if it doesn't get response, it can eventually start failing lookups altogether - nobody ever blocks.)
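The timer-less cache described above can be sketched in a few dozen lines of C. This is a minimal illustration of the idea, not how lwIP's etharp module works: entries carry a timestamp, and age is only checked at lookup. A moderately stale entry is still used while a refresh is requested; a very stale one is dropped. Table size, ages and all names are illustrative assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARP_ENTRIES     8
#define ARP_REFRESH_MS  60000u    /* after this: use the entry, but re-ARP */
#define ARP_EXPIRE_MS   300000u   /* after this: the entry is not trusted */

struct arp_entry {
    bool     valid;
    uint32_t ip;
    uint8_t  mac[6];
    uint32_t updated_ms;
};

enum arp_result { ARP_HIT, ARP_HIT_REFRESH, ARP_MISS };

static struct arp_entry arp_cache[ARP_ENTRIES];

/* Store or update a mapping, e.g. when an ARP reply arrives. */
void arp_update(uint32_t ip, const uint8_t mac[6], uint32_t now_ms)
{
    struct arp_entry *slot = NULL;
    for (int i = 0; i < ARP_ENTRIES && slot == NULL; i++) {
        if (arp_cache[i].valid && arp_cache[i].ip == ip) {
            slot = &arp_cache[i];           /* existing entry for this IP */
        }
    }
    for (int i = 0; i < ARP_ENTRIES && slot == NULL; i++) {
        if (!arp_cache[i].valid) {
            slot = &arp_cache[i];           /* first free slot */
        }
    }
    if (slot == NULL) {                     /* cache full: evict the oldest */
        slot = &arp_cache[0];
        for (int i = 1; i < ARP_ENTRIES; i++) {
            if ((uint32_t)(now_ms - arp_cache[i].updated_ms) >
                (uint32_t)(now_ms - slot->updated_ms)) {
                slot = &arp_cache[i];
            }
        }
    }
    slot->valid = true;
    slot->ip = ip;
    memcpy(slot->mac, mac, 6);
    slot->updated_ms = now_ms;
}

/* Look up ip without ever blocking. On ARP_HIT_REFRESH the caller sends with
 * the old MAC and also fires off a new ARP request; on ARP_MISS it sends an
 * ARP request and drops or queues the packet. */
enum arp_result arp_lookup(uint32_t ip, uint32_t now_ms, uint8_t mac_out[6])
{
    for (int i = 0; i < ARP_ENTRIES; i++) {
        struct arp_entry *e = &arp_cache[i];
        if (!e->valid || e->ip != ip) {
            continue;
        }
        uint32_t age = now_ms - e->updated_ms;   /* wrap-safe unsigned age */
        if (age >= ARP_EXPIRE_MS) {
            e->valid = false;                    /* too stale: forget it */
            return ARP_MISS;
        }
        memcpy(mac_out, e->mac, 6);
        return (age >= ARP_REFRESH_MS) ? ARP_HIT_REFRESH : ARP_HIT;
    }
    return ARP_MISS;
}
```

Nothing here needs a periodic callback: an entry's state is determined entirely by `now_ms` at the moment somebody asks for it.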
 

Offline amyk

  • Super Contributor
  • ***
  • Posts: 7441
I've been working with microcontrollers for a long time and the amount of complexity in the larger 32-bit MCUs still amazes me. Thousands of pages of documentation, for something that can fit on the tip of a finger...
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 4070
  • Country: fi
Quote
I've been working with microcontrollers for a long time and the amount of complexity in the larger 32-bit MCUs still amazes me. Thousands of pages of documentation, for something that can fit on the tip of a finger...

But you usually use maybe 5-30% of it, depending on the project; the rest is "dead silicon", powered down, and obviously you don't even need to look at those parts of the manual.

The same goes inside any complex peripheral: say a CAN peripheral takes 100 of those 4000 pages, yet you completely skip 50 pages of TTCAN feature description when you don't use it.
 

Offline wek

  • Contributor
  • Posts: 9
  • Country: sk
In RCC_APB2ENR, you still have ADC1 and ADC2 enabled.

Remove those two, too - mainly interrupts or any reading/writing to them - and try. Still no ETH?

Try to run it without the debugger connected (in case the debugger would want to access the APB2 domain regularly, for any reason, e.g. trying to display ADC results). Still no ETH?

JW
« Last Edit: May 22, 2021, 09:04:38 am by wek »
 

Offline peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
I have found that the ADC interrupt is enabled by the macro

    __HAL_RCC_ADC1_CLK_ENABLE();

which is ... jesus!

do { \
                                        volatile uint32_t tmpreg = 0x00U; \
                                        ((((RCC_TypeDef *) ((0x40000000UL + 0x00020000UL) + 0x3800UL))->APB2ENR) |= ((0x1UL << (8U))));\
                                        /* Delay after an RCC peripheral clock enabling */ \
                                        tmpreg = ((((RCC_TypeDef *) ((0x40000000UL + 0x00020000UL) + 0x3800UL))->APB2ENR) & ((0x1UL << (8U))));\
                                        (void)tmpreg; \
                                          } while(0U)

but, hang on, APB2ENR doesn't enable interrupts. It enables the clocks, which are required. These peripherals are not being accessed by any software.
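If the expansion above is hard to read, here is what it boils down to in plain C: OR the ADC1EN bit (bit 8, the `0x1UL << 8` in the expansion) into RCC_APB2ENR, then read the register back. The HAL's comment calls that read a delay after enabling a peripheral clock. Nothing in it touches interrupts; it only gates the ADC1 clock on. The register is passed as a pointer so the sketch runs against an ordinary variable; on the 'F4 it would be `&RCC->APB2ENR`. `enable_clock_bit` is a hypothetical name.

```c
#include <stdint.h>

#define RCC_APB2ENR_ADC1EN (1u << 8)   /* same bit as in the expansion above */

/* Set a clock-enable bit, then read it back (the HAL's "delay after an RCC
 * peripheral clock enabling"). Returns the read-back bit. */
uint32_t enable_clock_bit(volatile uint32_t *enr, uint32_t mask)
{
    *enr |= mask;           /* read-modify-write: turn the clock on */
    return *enr & mask;     /* volatile read-back */
}
```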

Of course RCC_CFGR will change between DIV4 and DIV16 on PCLK2 - that is where these are configured. Nothing else has changed.
« Last Edit: May 22, 2021, 11:31:02 am by peter-h »
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 4070
  • Country: fi
Quote
I have found that the ADC interrupt is enabled by the macro

    __HAL_RCC_ADC1_CLK_ENABLE();

Why in the world do you expect that? Neither the macro name nor the implementation says anything about interrupts.

Quote
but, hang on, APB2ENR doesn't enable interrupts.

Of course not.

I think you should start from the very basics; read the reference manual. You need to know what RCC is and what it does, otherwise there is no chance of success.

And why do you think ADC1&ADC2 peripheral clock enables are "required"? Just turn them off and don't use them.

I have said this numerous times, and I know it's not an entirely popular opinion, but Cube and the libraries are preventing you from understanding what's happening and from debugging the system.
« Last Edit: May 22, 2021, 11:50:39 am by Siwastaja »
 

Offline peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
The ADC clock enables are part of the ADC init function. I can skip calling that function and see if it makes ETH work with DIV16. I will do that on Monday.

I do read the ref manual; that's how I found what RCC does :)

The interrupts were mentioned further above, and not by me.

Yes, the ST ETH code has various bugs but a colleague has spent a year or so, part-time, trawling the internet and fixing these :) There are still issues with it e.g. DHCP works if the LAN is 192.168.1.* but not if the LAN is 192.168.3.*. It is as if the 192.168.1.* was hard-coded somewhere...

I am just very resource-limited, which is why I am asking for advice. Nothing else I can do.
« Last Edit: May 22, 2021, 12:06:43 pm by peter-h »
 

Offline peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
Turns out I was right about a bit of the circuit using PCLK2, but not everybody will see the problem. This was posted on the ST forum, after some time:

Thank you for pointing out this issue.
The problem seems to be in the RMII/MII selection, which is done in the SYSCFG peripheral, clocked from APB2. During initialization, the RMII/MII switch is configured and then a reset of the ETH MAC is performed. If the APB2 clock is much slower than AHB, it might happen that the reset is performed before RMII/MII is switched properly. To ensure proper functionality, a dummy read from SYSCFG needs to be added.

In the HAL driver (function HAL_ETH_Init in stm32f4xx_hal_eth.c) the following workaround can be applied:

  /* Select MII or RMII Mode*/
  SYSCFG->PMC &= ~(SYSCFG_PMC_MII_RMII_SEL);
  SYSCFG->PMC |= (uint32_t)heth->Init.MediaInterface;
  (void)SYSCFG->PMC; // <---- Workaround: Dummy read to sync SYSCFG with ETH
  /* Ethernet Software reset */
  /* Set the SWR bit: resets all MAC subsystem internal registers and logic */
  /* After reset all the registers holds their respective reset values */
  (heth->Instance)->DMABMR |= ETH_DMABMR_SR;

I'm not 100% sure if "(void)SYSCFG->PMC;" works in all compilers, but it worked in GCC as a dummy read.

This should fix the issue. I also reported this internally so it should be fixed in future releases.
Could you please check if this workaround fixes your issue?


I am away from the "lab" currently so can't test it.
 

Offline wek

  • Contributor
  • Posts: 9
  • Country: sk
Quote
Turns out I was right about a bit of the circuit using PCLK2, but not everybody will see the problem.

Oh. This is a gotcha indeed. Thanks for sharing.

Quote
This was posted on the ST forum

Can you please post a link to this?

Please let us know of your findings when you get to testing it.

Thanks,

JW
 

Offline peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
I looked for my original thread there (https://community.st.com/) but can't find it. I also can't find the above text, there or via Google, and there appears to be no way to view my own posts in my profile there.

EDIT: the info posted came from ST by email.
« Last Edit: June 08, 2021, 10:53:36 am by peter-h »
 

Offline wek

  • Contributor
  • Posts: 9
  • Country: sk
Probably related: https://community.st.com/s/question/0D50X0000BABrWMSQ1/stm32f767-no-ethernet-when-apb2clkdivider-rcchclkdiv8-or-greater

JW

(That thread has more than 10 messages, and one has to click the blue "More answers" at the bottom to show them. If there's no "More answers", reload. ST insists on using Salesforce instead of proper forum software.)
« Last Edit: June 09, 2021, 02:32:35 pm by wek »
 

Offline peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
Yes - same issue exactly.

Interesting that DIV8 or DIV16 is reported as breaking it. I found that DIV16 definitely broke it, but DIV8 possibly didn't; may have been right on the edge. So clearly DIV4 or DIV2 are the only sensibly available values.

That ST guy, Adam Berlinger, is the one who sent me the email.

Yes, their forum software is crap, but this is not unusual; U-BLOX use a similarly useless one, e.g.
https://portal.u-blox.com/s/question/0D52p0000AT0rV9CQJ/how-to-detect-waasegnos-is-actually-being-used-for-the-fix

« Last Edit: June 10, 2021, 09:07:08 am by peter-h »
 

Offline ajb

  • Super Contributor
  • ***
  • Posts: 2062
  • Country: us
Quote
Interesting that DIV8 or DIV16 is reported as breaking it. I found that DIV16 definitely broke it, but DIV8 possibly didn't; may have been right on the edge. So clearly DIV4 or DIV2 are the only sensibly available values.

If the issue really is that the MII/RMII switch setting needs time to be completed before the MAC is reset then no, DIV2 and DIV4 aren't the only available values, they're just the only values where you don't have to be aware of the timing issue between the two clock domains.  That's still a really annoying bug, and maybe they could have solved it with some better design decisions in how the MAC integrates with the system, but it's not that hard to work around it if you need to use the other DIV ratios.
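One way to do that, sketched under assumptions: ST's suggested dummy read between selecting MII/RMII and resetting the MAC, plus a bounded wait for the reset bit to clear. The bit positions (SYSCFG_PMC.MII_RMII_SEL = bit 23, ETH_DMABMR.SR = bit 0) are my reading of RM0090, and the struct and function names are hypothetical. The read-back copies the volatile register into a volatile local, the same `tmpreg` pattern the HAL clock-enable macro uses, which forces the compiler to emit the load. Per ST's explanation, that read only completes once the preceding write to SYSCFG has, however slow APB2 is.

```c
#include <stdint.h>

#define PMC_MII_RMII_SEL (1u << 23)   /* SYSCFG_PMC: 1 = RMII (assumed) */
#define DMABMR_SR        (1u << 0)    /* ETH_DMABMR: software reset (assumed) */

/* On the 'F4 these would be &SYSCFG->PMC and &ETH->DMABMR; passing them in
 * lets the sequence be exercised against RAM on a PC. */
struct eth_init_regs {
    volatile uint32_t *syscfg_pmc;
    volatile uint32_t *eth_dmabmr;
};

/* Select RMII (rmii != 0) or MII, wait for the write to reach the APB2
 * domain, then soft-reset the MAC and poll SR a bounded number of times.
 * Returns 0 when SR cleared, -1 on timeout. */
int eth_select_and_reset(const struct eth_init_regs *r, int rmii, uint32_t max_polls)
{
    uint32_t pmc = *r->syscfg_pmc & ~PMC_MII_RMII_SEL;
    if (rmii) {
        pmc |= PMC_MII_RMII_SEL;
    }
    *r->syscfg_pmc = pmc;

    volatile uint32_t sync = *r->syscfg_pmc;   /* dummy read-back */
    (void)sync;

    *r->eth_dmabmr |= DMABMR_SR;                /* start the MAC software reset */
    for (uint32_t i = 0; i < max_polls; i++) {
        if ((*r->eth_dmabmr & DMABMR_SR) == 0u) {
            return 0;                           /* hardware cleared SR */
        }
    }
    return -1;
}
```

On a PC nothing clears SR, so the function times out. On real hardware, SR is commonly reported to stay set when the PHY is not supplying the MII/RMII reference clock, so a timeout there is itself a useful diagnostic.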
 

Offline peter-h

  • Frequent Contributor
  • **
  • Posts: 741
  • Country: gb
  • Doing electronics since the 1960s...
The above suggested fix works perfectly. ETH is now running with DIV16 on PCLK2.

 

