Author Topic: STM32F417 - any reason why a min PCLK2 speed is required for ethernet to work? (Read 6295 times)

dgtl · « **Reply #25 on:** May 20, 2021, 06:16:37 pm »

At least on F207, the MCO is not usable for eth phy when fed from PLL due to high jitter according to errata. Perhaps some PCLK divider values somehow make the stability worse? I've always used a 25MHz xtal due to that errata for the MCU and driven the MCO directly from HSE, without PLL.
In what way doesn't the ethernet work? Are the MDIO registers usable? Does it link up to the remote party (link led on)?

Yansi · « **Reply #26 on:** May 20, 2021, 06:34:28 pm »

What is the issue with running FreeRTOS from Systick? I have not done much work with FreeRTOS yet, but I have noticed a lot of ST examples used a different timer for the 1ms tick.

Tried searching for it, but quick search found only some argument, at what priority the tick interrupt should be.

peter-h · « **Reply #27 on:** May 20, 2021, 06:35:54 pm »

With DIV16 on PCLK2, the two RJ45 LEDs flicker at ~1Hz intervals and after ~10 secs settle to both on continuously. But no DHCP.

Re FreeRTOS tick interrupt priority:
https://community.st.com/s/question/0D50X0000A4nQxpSQE/when-freertos-is-used-it-is-strongly-recommended-to-use-a-hal-time-base-source-other-than-the-systick-
https://www.digikey.com/en/maker/projects/getting-started-with-stm32-introduction-to-freertos/ad275395687e4d85935351e16ec575b1

I think it is fairly obvious the 25MHz MCO output comes straight from the 25MHz xtal osc:

Well, if it doesn't, it's a bloody stupid wasted opportunity

And I can't see any jitter on a scope on it.

Yansi · « **Reply #28 on:** May 20, 2021, 06:44:28 pm »

It does not come straight; it comes through a lot of gates. Increased jitter or what, that is what makes it out of spec for the PHY. That is also a known issue, that is addressed in the documentation.

If you want one crystal less in the design, put the crystal to the PHY and let its PLL do its job. Then you can use the clock output of the PHY to run the STM32. That is the recommended way both for ETH and especially USB HS, where the timing is even more critical.

Supplying the clock to the PHY from MCO is exactly on the edge of acceptable. It sometimes works, sometimes not. It works with some PHYs, it may not work with others. As far as I remember, there is an appendix in the manual that clearly warns you about this. So you have been warned

peter-h · « **Reply #29 on:** May 20, 2021, 07:05:18 pm »

But... why?

The scope shows a trace which is just like what it would be from a xtal, or an xtal osc module.

I have a distinct feeling this is a case of the blind leading the blind, and somebody puts something in some doc because something didn't work and they never bothered to find out why not.

One still has to explain why the different PCLK2 speeds totally make it or totally break it.

Where is this appendix in the manual?

Yansi · « **Reply #30 on:** May 20, 2021, 07:11:22 pm »

You will see shit all jitter on an osmelloscope. Measure it with a decent spectrum analyzer.

And I tell you, because I think I know it - or at least am pretty convinced so. I worked in ST and this is what I was specifically told, when I was working there. I would be very surprised, if this changed at all, after the years. From what I remember, the path in between HSE and MCO contains just too many gates to meet the jitter spec of the PHY. That was exactly the explanation I was given.

Yansi · « **Reply #31 on:** May 20, 2021, 07:14:56 pm »

Quote from: peter-h on May 20, 2021, 07:05:18 pm

Where is this appendix in the manual?

Chapter A.3 in the datasheet.

Also, now looking at it, I think it was only valid for 50 MHz output, maybe. Am not sure, but I have already asked, I still have some friends there. I will post back what I get. We were always told to check that the PHY must not be clocked from MCO, like at all. That is what I remember. Now looking in the datasheet, it seems only valid for RMII/50MHz.

//wait a minute, seems I have some pretty outdated datasheet for the F417... will load a new one ... okay, downloaded 8 years younger datasheet - no change in A.3 appendix. But am pretty confident, that we have been told to never use MCO for the PHY, as it was not reliable.

//EDIT2: Just wanted to add, this is not very much related to the APB2 clock dividing... Maybe I create a lot of offtopic lately. :-/

wek · « **Reply #32 on:** May 20, 2021, 07:45:00 pm »

Okay so just throw out everything related to APB2 peripherals from your program.

Still problems with ETH?

JW

PS. Just FYI, tried the APB2 being set to AHB/16 on a 'F407 too (that's an identical chip to the 'F417), and as expected, ETH works flawlessly.

peter-h · « **Reply #33 on:** May 20, 2021, 08:20:28 pm »

I have a 2.4GHz spectrum analyser. Can check tomorrow. What specifically should I look for? What analyser settings should I use?

There will be loads of harmonics, given it is a "square" wave

The only devices I am using in the project which run off PCLK2 are UART1,6. And they aren't being used. They are initialised. BTW they worked perfectly with DIV16.

"Just FYI, tried the APB2 being set to AHB/16 on a 'F407 too (that's an identical chip to the 'F417), and as expected, ETH works flawlessly."

That's a great data point.

Is your core running at 168MHz?

Which devices on PCLK2 have you got enabled/configured?

Did you verify the PCLK2 has actually changed? ST's init code sets up clocks in 2 places. The CPU powers up at 16MHz. There are two bits of code which set up the APB1/APB2 clocks. The first gets called from SystemInit() which gets called from startup_stm32f407xx.s. The second is done in SystemClock_Config() in main(). The 2nd overrides the first, obviously. This confused the hell out of me.

wek · « **Reply #34 on:** May 21, 2021, 12:37:29 am »

Quote

Is your core running at 168MHz?

Yes.

Quote

Which devices on PCLK2 have you got enabled/configured?

ADC1 which is just lazily polled, USART6 which did nothing in that experiment, TIM8 which is the source of timing (but constants are recalculated so that the output frequency of it remained unchanged). And SPI1, which is used heavily, being source of data displayed on an LCD, so the difference was obvious.

Quote

Did you verify the PCLK2 has actually changed?

Besides being obvious as I said above, I also confirmed it by reading back RCC_CFGR.
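
For anyone wanting to repeat this read-back check, here is a minimal sketch of decoding the APB prescalers from a raw RCC_CFGR value. The field positions (PPRE1 in bits 12:10, PPRE2 in bits 15:13) and the divider encoding are as I recall them from the F4 reference manual, and the helper names are made up for illustration, so verify them against the RM for your part.

```c
#include <stdint.h>

/* Assumed field positions in RCC_CFGR (check your reference manual). */
#define CFGR_PPRE1_SHIFT 10u   /* APB1 prescaler, bits 12:10 */
#define CFGR_PPRE2_SHIFT 13u   /* APB2 prescaler, bits 15:13 */

/* Hypothetical helper: convert a 3-bit PPREx code into a divider.
 * 0xx -> /1, 100 -> /2, 101 -> /4, 110 -> /8, 111 -> /16 */
static unsigned apb_divider_from_code(uint32_t code)
{
    code &= 0x7u;
    if ((code & 0x4u) == 0u) {
        return 1u;               /* top bit clear: HCLK not divided */
    }
    return 2u << (code & 0x3u);  /* 2, 4, 8 or 16 */
}

/* Divider between HCLK and PCLK2 for a given raw RCC_CFGR value. */
static unsigned apb2_divider(uint32_t cfgr)
{
    return apb_divider_from_code(cfgr >> CFGR_PPRE2_SHIFT);
}

/* Divider between HCLK and PCLK1 for a given raw RCC_CFGR value. */
static unsigned apb1_divider(uint32_t cfgr)
{
    return apb_divider_from_code(cfgr >> CFGR_PPRE1_SHIFT);
}
```

On the target this would be called as apb2_divider(RCC->CFGR), after SystemClock_Config() has run, and should report 16 for the DIV16 case discussed here.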

Quote

The only devices I am using in the project which run off PCLK2 are UART1,6.

So it will be easy to remove all code related to these. Then run the code and check if ETH still does not work.

You could read RCC_APB2ENR to check it's all cleared (perhaps except SYSCFGEN which should be harmless). Check also the EXTI registers, if there's any interrupt enabled there.
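
A rough sketch of that APB2ENR check, written so it can be run on a raw register value. Only the ADC1EN position (bit 8) is confirmed by the HAL macro quoted later in this thread; the other bit positions are from my memory of the F405/407/415/417 register map, and the helper name is hypothetical.

```c
#include <stdint.h>

/* Assumed RCC_APB2ENR bit positions (verify in the reference manual). */
#define APB2ENR_USART1EN (1u << 4)
#define APB2ENR_USART6EN (1u << 5)
#define APB2ENR_ADC1EN   (1u << 8)
#define APB2ENR_ADC2EN   (1u << 9)
#define APB2ENR_SYSCFGEN (1u << 14)

/* Hypothetical helper: return every enable bit that is set apart from
 * SYSCFGEN. A result of 0 means nothing else on APB2 is clocked. */
static uint32_t apb2_unexpected_clocks(uint32_t apb2enr)
{
    return apb2enr & ~APB2ENR_SYSCFGEN;
}
```

On the target, apb2_unexpected_clocks(RCC->APB2ENR) could be printed over UART or SWO; any non-zero bit points at a peripheral whose init code is still being called.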

JW

peter-h · « **Reply #35 on:** May 21, 2021, 08:57:27 am »

Can these be read in debugging / single stepping?

wek · « **Reply #36 on:** May 21, 2021, 11:02:52 am »

Quote

Can these be read in debugging / single stepping?

Yes, in the same way as you do with all other peripherals' registers. Just don't forget that using the debugger may be intrusive.

Or use your favourite method of choice - output to SWO, print to UART, whatever works for you.

JW

peter-h · « **Reply #37 on:** May 21, 2021, 01:53:37 pm »

OK I have some data here:

EXTI is all zeroes

Here is RCC, with DIV4 on PCLK2

and here is RCC with DIV16

APB2ENR is

It gets better. This is 0-26MHz span and you can see no subharmonics

I also checked the 25MHz with a delay timebase scope, magnification about 1000:1, and triggering from the input (not from the delayed segment) and there is no jitter. So all this stuff about jitter is definitely nonsense, in this case. Also looking at the frequency division config (posted earlier) one can see it goes direct.

ajb · « **Reply #38 on:** May 21, 2021, 04:38:32 pm »

Quote from: peter-h on May 20, 2021, 06:12:46 pm

How this hangs together, I don't know. It either hooks up the 1kHz tick (and there are two of those, one from SysTick and one from TIM6, and if you ask why there are two 1Khz ticks, yes it seems dumb, but search for FreeRTOS interrupt priority issues with Systick; it seems a well known issue and this is the approach they recommend) or there is some timer associated with the ETH hardware (like ETH has its own DMA controller). Or it is using one of the 32F4 timers in which case we may have found something, but why isn't that all over google?

The ethernet block has its own timer but it's only used for timestamping and is clocked from HCLK, so the various IP stack intervals must be controlled from another time base. The struct you shared earlier showing the reference to etharp_tmr is just the list of handlers that LWIP provides, it doesn't tell you where those handlers are actually invoked. Easiest way to find that within all of the layers of Cube may be to throw a break point in etharp_tmr or one of the other LWIP _tmr functions and then skim up the call stack until you get out of LWIP and into Cube. Then start looking for a reference to one of the other timers and see what's being used and if it's getting configured right. Honestly throwing breakpoints into a few strategic places and running with the two different clock div settings could tell you a lot very quickly--seeing what is and isn't getting hit in either case will give you some hints about what the underlying problem is, and seeing what shows up above the breakpoint in the call stack can quickly clarify how the application is structured, since it's basically a cross section of the layers of abstraction provided by Cube/LWIP.

There are also some debug macros defined in LWIP with flags to enable/disable them on a per-protocol basis, these are usually given in lwipopts.h (which needs to be provided by the application to configure LWIP), not sure where Cube puts that file, but look for symbols like TCPIP_DEBUG, NETIF_DEBUG and see where they're defined. You may need to provide a definition for the LWIP_PLATFORM_DIAG macro if Cube doesn't provide something suitable, as that's what LWIP uses to invoke platform-specific diagnostic output.
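
As a hedged illustration of the above, a configuration fragment along these lines should turn on lwIP's diagnostic output for the protocols of interest. The option names are standard lwIP ones, but where Cube keeps lwipopts.h and cc.h, and whether printf is retargeted to a UART or SWO on your board, are assumptions you would need to check.

```c
/* lwipopts.h (application-provided lwIP configuration) */
#define LWIP_DEBUG          1                   /* master switch for debug output */
#define LWIP_DBG_MIN_LEVEL  LWIP_DBG_LEVEL_ALL  /* do not filter by severity */
#define LWIP_DBG_TYPES_ON   LWIP_DBG_ON         /* do not filter by message type */

#define NETIF_DEBUG         LWIP_DBG_ON
#define ETHARP_DEBUG        LWIP_DBG_ON
#define DHCP_DEBUG          LWIP_DBG_ON
#define TCPIP_DEBUG         LWIP_DBG_OFF        /* enable only what you need */

/* arch/cc.h (port layer), only if the port does not already define it.
 * lwIP passes the whole parenthesised argument list as x, hence "printf x". */
#include <stdio.h>
#define LWIP_PLATFORM_DIAG(x) do { printf x; } while (0)
```

With DHCP_DEBUG on, comparing the log between the working and failing PCLK2 settings would show whether DISCOVER packets are sent at all or whether replies never arrive.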

peter-h · « **Reply #39 on:** May 21, 2021, 05:02:28 pm »

Thank you.

We do have a problem with making DHCP work on different systems, and the main suspect is timeouts being too tight. One can make it work, or not, by commenting out some RTOS tasks, although currently I have it running DHCP fine on one system, with all tasks enabled, by having made them better behaved (putting osDelay(1) in various places where it doesn't matter, to yield to the RTOS).

So it is possible that this whole DIV16 business is another aspect of the same thing. Interesting however that somebody else found the same thing, as posted at the start.

Silenos · « **Reply #40 on:** May 21, 2021, 09:28:12 pm »

You could actually post where the screenshots differed (CFGR register), it took me way too much time to spot it...
You know, you spend days on writing over forums about seemingly random, "suspect" issues around clocking and didn't even bother to read RM section about rcc?
IIRC on F4 the clock sources, PLL, routing and divs options should be in CR, CFGR, PLLCFGR registers (didn't check it now); have their values compared with the numbers you should calculate from RM and see if they meet your desires. The xxxENR registers are for "enabling"/"connecting" the clock to the cores hooked to given bus.

Then read some of the eth core, or search over RM/DS, find if the ETH requires some specific clocking scheme, don't rely on cube here.
Then if your schematic is ok, you measure good clock, the potential clock issues should be excluded by now. I personally had module with phy/crystal/magnetics dangling on random 15 cm wires with RMII from F4 discovery board. Shameful as it was, yet it surprisingly worked ok.

Another thing: the ST eth code afaik is still littered with bugs. I still remember how I stumbled upon a plain mistake in their init code a long time before HAL came.
Another thing, that STM eth cores use to have some hardware bugs here and there too, which interfere with the cube code. Have you checked errata?
On the "ST community" site there is/was a guy, iirc named "Piranha", who has made the eth module work perfectly, with all the bugs handled, though afaik he didn't share the code. Yet he was helpful with insights.

Quote from: peter-h on May 21, 2021, 05:02:28 pm

One can make it work, or not, by commenting out some RTOS tasks, although currently I have it running DHCP fine on one system, with all tasks enabled, by having made them better behaved (putting osDelay(1) in various places where it doesn't matter, to yield to the RTOS).

This code will fail you in the moment of truth...

Anyway, looks like you have piled a non-trivial ST driver over a non-trivial eth core over a non-trivial 3rd party stack over a non-trivial (preempting?) rtos and it doesn't "just work"... I would drop the rtos, maybe even the stack (though I think the stack is the most trusty piece of code here) and check if the driver alone misbehaves, what it outputs, or whether there are any error flags in the eth registers.
And clock the bus for max f, for now at least, to achieve eth flawless operation. Eg. iirc the USB cores on STM have the ahb/core interface actually exposed in papers and registers, and you can see it already requires special handling with lowered bus speed. Wouldn't be surprised if eth core also was vulnerable to low f.

bson · « **Reply #41 on:** May 21, 2021, 11:31:03 pm »

1kHz tick? In a real-time OS? I didn't think kernels used heartbeat style ticks since... the 1980s.

The ARP cache doesn't need a timer. It can simply detect on lookup that an entry has expired, and fire off a re-arp. (And, if it's not overly stale, use the old value without waiting for the response. When it gets the response, if it gets one, it updates the cache. This way nobody blocks on the ARP cache during normal operation, it gets refreshed, there are no timers and stuffs, and everything is happy. And if it doesn't get response, it can eventually start failing lookups altogether - nobody ever blocks.)
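
A minimal sketch of that timer-free scheme, for illustration only. This is not how LWIP does it (LWIP ages entries from etharp_tmr); the thresholds, type names and function name are all assumptions.

```c
#include <stdint.h>
#include <stdbool.h>

#define ARP_REFRESH_MS  60000u   /* assumed: entry counts as stale after 60 s */
#define ARP_EXPIRE_MS  300000u   /* assumed: entry is unusable after 5 min */

typedef enum { ARP_MISS, ARP_FRESH, ARP_STALE } arp_status_t;

typedef struct {
    bool     valid;
    uint32_t ip;
    uint8_t  mac[6];
    uint32_t updated_ms;   /* time of the last ARP reply for this entry */
} arp_entry_t;

/* Classify an entry at lookup time; no periodic timer is involved.
 * Unsigned subtraction keeps the age correct across a wrap of the
 * millisecond counter, provided the true age is below 2^32 ms. */
static arp_status_t arp_check(arp_entry_t *e, uint32_t ip, uint32_t now_ms)
{
    if (!e->valid || e->ip != ip) {
        return ARP_MISS;
    }
    uint32_t age = now_ms - e->updated_ms;
    if (age >= ARP_EXPIRE_MS) {
        e->valid = false;        /* too old: lookups fail from now on */
        return ARP_MISS;
    }
    if (age >= ARP_REFRESH_MS) {
        return ARP_STALE;        /* still usable; caller should re-ARP */
    }
    return ARP_FRESH;
}
```

On ARP_STALE the caller would send the packet using the cached MAC and fire off an ARP request without waiting; the reply handler then rewrites mac and updated_ms.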

amyk · « **Reply #42 on:** May 22, 2021, 02:55:59 am »

I've been working with microcontrollers for a long time and the amount of complexity in the larger 32-bit MCUs still amazes me. Thousands of pages of documentation, for something that can fit on the tip of a finger...

Siwastaja · « **Reply #43 on:** May 22, 2021, 07:18:41 am »

Quote from: amyk on May 22, 2021, 02:55:59 am

I've been working with microcontrollers for a long time and the amount of complexity in the larger 32-bit MCUs still amazes me. Thousands of pages of documentation, for something that can fit on the tip of a finger...

But you usually use maybe some 5-30% of it, depending on project, rest is "dead silicon" powered down, and obviously you don't need to even look at those parts of the manual.

Same inside any complex peripheral, say a CAN peripheral is 100 pages of that 4000 pages yet you completely skip 50 pages of TTCAN feature description when you don't use it.

wek · « **Reply #44 on:** May 22, 2021, 09:01:29 am »

In RCC_APB2ENR, you still have ADC1 and ADC2 enabled.

Remove those two, too - mainly interrupts or any reading/writing to them - and try. Still no ETH?

Try to run it without the debugger connected (in case the debugger would want to access the APB2 domain regularly, for any reason, e.g. trying to display ADC results). Still no ETH?

JW

peter-h · « **Reply #45 on:** May 22, 2021, 11:25:31 am »

I have found that the ADC interrupt is enabled by the macro

__HAL_RCC_ADC1_CLK_ENABLE();

which is ... jesus!

do { \
    volatile uint32_t tmpreg = 0x00U; \
    ((((RCC_TypeDef *) ((0x40000000UL + 0x00020000UL) + 0x3800UL))->APB2ENR) |= ((0x1UL << (8U)))); \
    /* Delay after an RCC peripheral clock enabling */ \
    tmpreg = ((((RCC_TypeDef *) ((0x40000000UL + 0x00020000UL) + 0x3800UL))->APB2ENR) & ((0x1UL << (8U)))); \
    (void)tmpreg; \
} while(0U)

but, hang on, APB2ENR doesn't enable interrupts. It enables the clocks, which are required. These peripherals are not being accessed by any software.

Of course RCC_CFGR will change between DIV4 and DIV16 on PCLK2 - that is where these are configured. Nothing else has changed.

Siwastaja · « **Reply #46 on:** May 22, 2021, 11:47:54 am »

Quote from: peter-h on May 22, 2021, 11:25:31 am

I have found that the ADC interrupt is enabled by the macro

__HAL_RCC_ADC1_CLK_ENABLE();

Why in the world do you expect that? Neither the macro name nor the implementation says anything about interrupts.

Quote

but, hang on, APB2ENR doesn't enable interrupts.

Of course not.

I think you should start from the very basics; read the reference manual. You need to know what RCC is and what it does, otherwise there is no chance of success.

And why do you think ADC1&ADC2 peripheral clock enables are "required"? Just turn them off and don't use them.

I also have said this numerous times and I know it's not a completely popular opinion but Cube and libraries are preventing you from understanding what's happening and debugging the system.

peter-h · « **Reply #47 on:** May 22, 2021, 12:05:01 pm »

The ADC clock enables are a part of the ADC init function. I can skip calling that function and see if it makes ETH work with DIV16. I will do that on Monday.

I do read the ref manual; that's how I found what RCC does

The interrupts were mentioned further above, and not by me.

Yes, the ST ETH code has various bugs but a colleague has spent a year or so, part-time, trawling the internet and fixing these

There are still issues with it e.g. DHCP works if the LAN is 192.168.1.* but not if the LAN is 192.168.3.*. It is as if the 192.168.1.* was hard-coded somewhere...

I am just very resource-limited, which is why I am asking for advice. Nothing else I can do.

peter-h · « **Reply #48 on:** May 28, 2021, 06:40:23 pm »

Turns out I was right about a bit of the circuit using PCLK2, but not everybody will see the problem. This was posted on the ST forum, after some time:

thank you for pointing to this issue.
The problem seems to be in RMII/MII selection which is done in SYSCFG peripheral and clocked from APB2 clocks. During initialization, the RMII/MII switch is configured and then reset of ETH MAC is performed. If the APB2 clocks are much slower than AHB, it might happen that the reset is performed before RMII/MII is switched properly. To ensure proper functionality, dummy read from SYSCFG needs to be added.

In the HAL driver (function HAL_ETH_Init in stm32f4xx_hal_eth.c) there can be following workaround:

/* Select MII or RMII Mode*/
SYSCFG->PMC &= ~(SYSCFG_PMC_MII_RMII_SEL);
SYSCFG->PMC |= (uint32_t)heth->Init.MediaInterface;
(void)SYSCFG->PMC; // <---- Workaround: Dummy read to sync SYSCFG with ETH
/* Ethernet Software reset */
/* Set the SWR bit: resets all MAC subsystem internal registers and logic */
/* After reset all the registers holds their respective reset values */
(heth->Instance)->DMABMR |= ETH_DMABMR_SR;

I'm not 100% sure if "(void)SYSCFG->PMC;" works in all compilers, but it worked in GCC as dummy read.
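
On the compiler question: the CMSIS register structs declare their members volatile, and in C a statement consisting only of a volatile lvalue is generally treated as a read, which is what the dummy read relies on. If that feels too implicit, the write-then-read-back pattern can be wrapped in a small helper. The sketch below is an assumption of how one might do it, not ST's fix, and the helper name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: update a bit field with a single write, then read
 * the register back. The idea is that the read cannot complete until the
 * preceding write has gone through to the (slower) peripheral bus.
 * The pointer is volatile-qualified, so the compiler must emit the read. */
static inline void reg_modify_sync(volatile uint32_t *reg,
                                   uint32_t clear_mask, uint32_t set_mask)
{
    uint32_t v = *reg;
    v = (v & ~clear_mask) | set_mask;
    *reg = v;
    (void)*reg;   /* dummy read-back */
}
```

In HAL_ETH_Init this would be used as reg_modify_sync(&SYSCFG->PMC, SYSCFG_PMC_MII_RMII_SEL, (uint32_t)heth->Init.MediaInterface). Unlike the two read-modify-write statements in the original, the field is never briefly cleared in between.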

This should fix the issue. I also reported this internally so it should be fixed in future releases.
Could you please check if this workaround fixes your issue?

I am away from the "lab" currently so can't test it.

wek · « **Reply #49 on:** June 04, 2021, 08:51:19 am »

Quote

Turns out I was right about a bit of the circuit using PCLK2, but not everybody will see the problem.

Oh. This is a gotcha indeed. Thanks for sharing.

Quote

This was posted on the ST forum

Can you please post a link to this?

Please let us know of your findings when you get to testing it.

Thanks,

JW

