Topic: 32F417 - peripheral settling time after clock speed configuration

peter-h (topic starter)
  • Super Contributor
  • Posts: 4619
  • Country: gb
  • Doing electronics since the 1960s...
I have previously posted on this, reporting strange issues in the SPI config department. It seems that the UARTs have the same problem.

With a "normal old" UART, you load in the single divisor and that's it. It starts to decrement and when it gets to zero it toggles a flip-flop which generates the baud rate clock. There may also be a prescaler... The UART is usable immediately, probably because the baud clock is already correct from its very first edge.

With the 32F4xx UARTs you get a complex mantissa + fractional (12 and 4 bits) setup.
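A minimal sketch of how the 12 + 4 bit split maps onto a baud rate. It assumes OVER8 = 0 (16x oversampling) and the relation USARTDIV = f_PCLK / (16 × baud) from the RM's fractional baud rate section. The helper name `usart_brr` is made up for illustration and is not an ST/HAL function. The 84 MHz PCLK in the example below is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: USART_BRR value for OVER8 = 0 (16x oversampling).
 * USARTDIV = pclk / (16 * baud). Expressed in sixteenths, BRR is simply
 * round(pclk / baud): bits 15..4 = DIV_Mantissa, bits 3..0 = DIV_Fraction.
 * Working in sixteenths also handles the case where the rounded fraction
 * reaches 16 and has to carry into the mantissa.
 * Returns 0 if the divisor does not fit (mantissa 0 or > 4095). */
static uint16_t usart_brr(uint32_t pclk_hz, uint32_t baud)
{
    assert(baud > 0u);
    uint64_t sixteenths = ((uint64_t)pclk_hz + baud / 2u) / baud; /* rounded */
    if (sixteenths < 16u || sixteenths > 0xFFFFu)
        return 0;
    return (uint16_t)sixteenths;
}
```

For example, with an 84 MHz PCLK, 115200 baud gives BRR = 0x2D9 (mantissa 45, fraction 9). That is 84 MHz / 729 ≈ 115226 baud, about 0.02% fast.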

There is no prescaler, which limits the baud rates to the higher ones (generally 1200+, see here https://www.eevblog.com/forum/microcontrollers/32f417-any-way-to-get-baud-rates-below-1200/msg3508804/#msg3508804), but you have some degree of control by setting the PCLK divider (DIV2 to DIV16), although that affects loads of other stuff like timers.
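BRR tops out at 0xFFFF (mantissa 4095, fraction 15), so for OVER8 = 0 the slowest rate is simply f_PCLK / 65535. That leaves the APB divider as the only knob. Below is a sketch of that arithmetic, assuming a 168 MHz HCLK. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Slowest baud rate reachable with OVER8 = 0: BRR saturates at 0xFFFF
 * (mantissa 4095, fraction 15), and baud = pclk / BRR. */
static double usart_min_baud(uint32_t pclk_hz)
{
    assert(pclk_hz > 0u);
    return (double)pclk_hz / 65535.0;
}

/* Bus clock for a given APB prescaler setting (DIV1..DIV16). */
static uint32_t apb_clock(uint32_t hclk_hz, unsigned apb_div)
{
    assert(apb_div == 1u || apb_div == 2u || apb_div == 4u ||
           apb_div == 8u || apb_div == 16u);
    return hclk_hz / apb_div;
}
```

With a 168 MHz HCLK, DIV2 (84 MHz) bottoms out at about 1282 baud, which is why 1200 is just out of reach there. DIV4 (42 MHz) reaches about 641 baud and DIV16 about 160 baud, at the cost of slowing everything else on that bus.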

I am getting some funny results which suggest that after setting up the baud rate you have to wait for a surprisingly long time. It is not obviously baud rate related though - at least not monotonically. It looks like it may need hundreds of microseconds, before data can be loaded into the UART and before interrupts work.

Have they got some sort of DPLL running there? How else would they implement a mantissa+fractional system? You cannot multiply a clock, or divide a clock by a non integer value, unless you have a source of extra edges.

EDIT: I have done some measurements at low baud rates to make it easier. The required delay after configuring the serial port (baud rate etc.) is > 1 character period. If one doesn't wait for this time, everything still works, but the UART doesn't start shifting out the data loaded into it until 1 character period has elapsed since initialisation. And at high baud rates, say 115k and above, the thing appears to just lock up if you load the data in too early. I have never seen this before, and I can see why few people would have come across it: in typical applications you init the port and then mess about doing other stuff, and/or the delay would not be noticed anyway.
« Last Edit: March 14, 2021, 03:02:21 pm by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

newbrain
  • Super Contributor
  • Posts: 1847
  • Country: se
« Reply #1 on: March 14, 2021, 03:02:53 pm »
Quote
There is no prescaler, which limits the baud rates to the higher ones
[...]
I am getting some funny results which suggest that after setting up the baud rate you have to wait for a surprisingly long time. It is not obviously baud rate related though - at least not monotonically. It looks like it may need hundreds of microseconds, before data can be loaded into the UART and before interrupts work.

Have they got some sort of DPLL running there? How else would they implement a mantissa+fractional system? You cannot multiply a clock, or divide a clock by a non integer value, unless you have a source of extra edges.
Yes, as analysed in the other thread, low bit rates are quite limited.

How are you loading the new bit rate?
The internal counters are immediately reset to the new value, and the RM advises not to do that during transmission (it will bork any data which is being shifted out or received at that moment).
Remember also that when enabling the UART for transmission (TE bit) an idle data frame will be sent, see the RM, 30.3.2, very last paragraph "Idle characters".
Might this be the cause of the delays you are seeing?
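To put numbers on that: the idle frame queued when TE is set is one whole character of '1' bits. The first real byte therefore cannot start until about one frame time after enabling, which fits the "> 1 character period" measured above. Below is a sketch of the frame-time arithmetic, assuming a standard start/data/parity/stop frame layout. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bits in one UART frame: start bit + data bits + optional parity + stop bits.
 * For 8N1 this is 10. */
static unsigned frame_bits(unsigned data_bits, int parity, unsigned stop_bits)
{
    return 1u + data_bits + (parity ? 1u : 0u) + stop_bits;
}

/* Duration of one frame in microseconds, rounded up. The idle frame sent
 * after TE is set lasts this long, so the first data byte starts roughly
 * this many microseconds after enabling the transmitter. */
static uint32_t frame_time_us(uint32_t baud, unsigned bits)
{
    assert(baud > 0u);
    return (uint32_t)(((uint64_t)bits * 1000000u + baud - 1u) / baud);
}
```

For 8N1 (10 bits), that is about 1042 µs at 9600 baud and about 87 µs at 115200 baud.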

Last point: there are ways to implement fractional dividers with no "extra edges"; look up the Bresenham algorithm (the same one used for line drawing).
Yes, it's quite jittery, but consider that:
  • The actual receive/transmit bit rate is always an integer divisor of the clock frequency - look at the formulas in 30.3.4 Fractional baud rate generation - bits will be shifted in/out with no jitter.
  • This is probably not that important for this kind of oversampling.
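Here is a minimal model of the accumulator (Bresenham-style) idea, assuming nothing about how ST actually built the silicon. Every input clock adds 16 to an accumulator. Whenever the accumulator reaches BRR, BRR is subtracted and one oversampling tick is emitted. No extra edges are needed, only whole input clocks. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fractional divider without extra edges: divides the input clock by
 * brr/16 (e.g. 729/16 = 45.5625) on average, using only whole clocks. */
typedef struct {
    uint32_t brr;   /* divisor in sixteenths, as in USART_BRR */
    uint32_t acc;   /* error accumulator, always < brr */
} frac_div_t;

/* Advance by one input clock; returns 1 if an oversampling tick is emitted. */
static int frac_div_clock(frac_div_t *d)
{
    d->acc += 16u;
    if (d->acc >= d->brr) {
        d->acc -= d->brr;
        return 1;
    }
    return 0;
}

/* Count input clocks needed to produce n ticks, starting from acc = 0,
 * and report the shortest and longest gap between consecutive ticks. */
static uint32_t clocks_for_ticks(uint32_t brr, unsigned n,
                                 uint32_t *min_gap, uint32_t *max_gap)
{
    frac_div_t d = { brr, 0u };
    uint32_t clocks = 0u, last = 0u;
    unsigned ticks = 0u;
    assert(brr >= 16u);
    *min_gap = UINT32_MAX;
    *max_gap = 0u;
    while (ticks < n) {
        clocks++;
        if (frac_div_clock(&d)) {
            uint32_t gap = clocks - last;
            last = clocks;
            ticks++;
            if (gap < *min_gap) *min_gap = gap;
            if (gap > *max_gap) *max_gap = gap;
        }
    }
    return clocks;
}
```

With BRR = 729 (45.5625 clocks per tick), the ticks arrive 45 or 46 clocks apart. However, 16 of them always take exactly 729 clocks. Each bit is therefore an exact integer number of clocks even though the individual sample points jitter by one clock.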
Nandemo wa shiranai wa yo, shitteru koto dake. (I don't know everything, only what I know.)
 

peter-h (topic starter)
« Reply #2 on: March 14, 2021, 03:53:19 pm »
Amazing... yes the idle data frame explains it fully. Thank you!

I thought they perhaps used fractional-N for the frequency synthesis, on which the Marconi patents should have expired in recent years. Interesting about Bresenham; I implemented that in the 1980s in Z80 assembler when writing a graphics library. Bresenham for lines and Horn for arcs.
« Last Edit: March 14, 2021, 05:01:15 pm by peter-h »

peter-h (topic starter)
« Reply #3 on: March 16, 2021, 07:18:59 pm »
I've found something else that is unusual about this UART:

When you load a byte into the transmit holding register, the propagation delay to it starting to shift out is anything from about 1us, to 1 bit time of the selected baud rate.

So if you are doing RS485, you enable the driver, then put the byte into the UART, then wait for the UART to empty out, then drop the driver.

Watching this on a scope, there is obviously jitter due to this.

All other UARTs I have played with have jitter (of course) but they do the internal operations at the master clock speed - typically some MHz - so the jitter is negligible.

Of course it still works...

On a related topic, there is weird stuff going on with the TXE and TC bits. You would think that when both go to 1, the UART is empty and you can drop the driver. Well, TXE works as you would expect but TC doesn't. Not sure why; no way to single step through it.

newbrain
« Reply #4 on: March 17, 2021, 08:32:37 am »
Quote
When you load a byte into the transmit holding register, the propagation delay to it starting to shift out is anything from about 1us, to 1 bit time of the selected baud rate.
[...]
On a related topic, there is weird stuff going on with the TXE and TC bits. You would think that when both go to 1, the UART is empty and you can drop the driver. Well, TXE works as you would expect but TC doesn't. Not sure why; no way to single step through it.
If you look at the block diagram of the USART, fig 296 in the RM, you can see that the transmit control block, and hence the shift register, is clocked by the bit rate.
Note also, from descriptions scattered across several other sections, that the bit rate counters run continuously.
So, very well hidden information, but there nonetheless.
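If the bit-rate counter free-runs and the transmitter only acts on its edges, a byte written to DR has to wait for the next bit-clock boundary. The write-to-start-bit latency is then anywhere from 0 to just under one bit time, which matches the 1 µs to 1 bit spread measured above. Below is a toy model, not ST's implementation. It counts in PCLK cycles and assumes one bit = BRR cycles (OVER8 = 0). The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: the bit clock free-runs with period brr (in PCLK cycles),
 * with edges at multiples of brr, and the transmitter only reacts on
 * those edges. A write at PCLK cycle t_write waits for the next edge. */
static uint32_t start_latency_clocks(uint32_t t_write, uint32_t brr)
{
    assert(brr > 0u);
    uint32_t phase = t_write % brr;
    return phase == 0u ? 0u : brr - phase;
}
```

Take an assumed 84 MHz PCLK. At 9600 baud (BRR = 8750) the latency can be up to 8749 clocks, about 104 µs. At 115200 baud (BRR = 729) it stays under about 8.7 µs.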

I see no errata on the TC/TXE flags behaviour, how does it differ from what is shown in Figure 299?

peter-h (topic starter)
« Reply #5 on: March 17, 2021, 10:25:20 am »
In practice TC=1 seems to happen more frequently. Also the diagram doesn't mention if TC is ever cleared by hardware, which certainly seems to be the case.



What I have been trying to do is what I have done with various UARTs over many years: write an "output queue count" function which returns not just the #bytes in the circular buffer but also adds in the #bytes in the UART itself. And if this is done right, when the returned value falls to zero, you can disable the RS485 driver.

Of course this can also be done with an interrupt from TC in the 32F4 case, but a lot of UARTs can't do that. So the "opqcount" function:

- gets the buffer size
- adds 1 if there is a byte in the TX holding reg
- adds 1 if there is a byte in the TX shift reg

What I am finding is that the first works fine (obviously!), the 2nd works (TXE) and the 3rd doesn't (TC), because TC almost always reads 0. It reads 0 because we interrupt off it, and the ISR clears TC to stop the interrupt from recurring, which it seems to do even if that interrupt source is disabled.

I imagine that if the UART were entirely polled, TC would work correctly.

So we are creating a "proxy TC" flag which is set by loading a byte into the UART and reset by the TC=1 interrupt. Polling this flag, together with polling TXE, should make it possible to determine whether there are 2, 1 or 0 bytes in the UART.
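Here is a sketch of the proxy-flag scheme as described, with the hardware abstracted so it compiles off-target. `uart_busy` stands in for the proxy TC flag (set when a byte is written to DR, cleared in the TC ISR), and `txe` is the SR.TXE bit read by the caller. All names are hypothetical. One caveat: just after a byte is written to an idle UART, TXE reads 0 until the byte moves into the shift register (up to one bit time). During that window this count reports 2 where only 1 byte is present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver state for the "proxy TC" idea. */
typedef struct {
    size_t buf_count;        /* bytes still waiting in the software TX ring */
    volatile bool uart_busy; /* set when a byte is written to DR,
                                cleared by the TC interrupt */
} tx_state_t;

/* Output queue count: software buffer plus bytes inside the UART.
 * txe is the current SR.TXE bit (true = data register empty). */
static size_t opqcount(const tx_state_t *s, bool txe)
{
    assert(s != NULL);
    size_t in_uart = 0u;
    if (s->uart_busy)
        in_uart = txe ? 1u : 2u;   /* shift reg only, or DR + shift reg */
    return s->buf_count + in_uart;
}
```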

« Last Edit: March 17, 2021, 10:42:09 am by peter-h »

ajb
  • Super Contributor
  • Posts: 2823
  • Country: us
« Reply #6 on: March 17, 2021, 07:19:11 pm »
I'm not sure what you're trying to achieve with the 'output queue count' or the 'proxy TC' thing.  Surely your code that handles transmission can simply keep track of how many bytes remain to be relayed into the USART for transmission, then once the USART has been given the last byte, wait for the TC flag to be set and then disable the driver?  That's kind of the whole point of TC, to know when the actual transmission is complete, versus TXE, which indicates that the USART is ready to accept another byte for transmission, which due to double buffering or FIFOs will usually occur before the previous byte has been fully transmitted on the wire.  If you were doing interrupt-based sends you would enable the TXE IRQ initially, load the next byte each time that fires, then once the last byte has been given to the USART you would disable TXE and enable the TC IRQ, and then once the TC IRQ goes off you disable the driver (or do whatever else) and stop transmission.  DMA would be similar, with transfers triggered by TXE, then once the DMA transaction is complete you'd enable the TC IRQ.
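Here is the interrupt-driven sequence above as a sketch. To keep it compilable off-target, the USART registers are a plain struct and the RS485 driver-enable pin is a flag. On a real STM32F4 you would use the CMSIS `USARTx` registers and a GPIO instead. Bit positions follow the F4 reference manual (SR: TXE = bit 7, TC = bit 6; CR1: TXEIE = bit 7, TCIE = bit 6). The struct, function names and driver flag are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SR_TC     (1u << 6)
#define SR_TXE    (1u << 7)
#define CR1_TCIE  (1u << 6)
#define CR1_TXEIE (1u << 7)

typedef struct { volatile uint32_t SR, DR, CR1; } usart_regs_t; /* stand-in */

typedef struct {
    usart_regs_t *u;
    const uint8_t *data;
    size_t len, next;
    bool driver_on;          /* stand-in for the RS485 DE pin */
} rs485_tx_t;

/* Start a packet: enable the driver, then let TXE interrupts feed bytes. */
static void rs485_start(rs485_tx_t *t, const uint8_t *data, size_t len)
{
    assert(len > 0u);
    t->data = data;
    t->len = len;
    t->next = 0u;
    t->driver_on = true;
    t->u->CR1 |= CR1_TXEIE;
}

/* USART interrupt handler body. */
static void rs485_isr(rs485_tx_t *t)
{
    usart_regs_t *u = t->u;
    if ((u->CR1 & CR1_TXEIE) && (u->SR & SR_TXE)) {
        /* On hardware, the SR read above followed by this DR write clears
         * TXE and also any stale TC. */
        u->DR = t->data[t->next++];
        if (t->next == t->len) {         /* last byte handed over */
            u->CR1 &= ~CR1_TXEIE;
            u->CR1 |= CR1_TCIE;          /* now wait for the wire to go idle */
        }
    } else if ((u->CR1 & CR1_TCIE) && (u->SR & SR_TC)) {
        u->CR1 &= ~CR1_TCIE;             /* transmission really finished */
        t->driver_on = false;            /* safe to drop the driver */
    }
}
```

The design choice is the one described above: TXE paces the bytes into the UART, and only the final TC decides when the line is released.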

Quote
What I am finding is that the first works fine (obviously!), the 2nd works (TXE) and the 3rd doesn't (TC) because TC almost always returns 0. Well, it returns zero because we interrupt off it and the ISR clears TC to prevent the interrupt from recurring which it seems to even if that interrupt source is disabled.

This problem--where the ISR is re-entered immediately after the ISR disables the TC IRQ--I recall having run into, and I suspect it has something to do with the way that changes to the interrupt mask bits are handled versus the flag bits, such that by the time the write to clear TCIE propagates through the control block logic to clear the USART IRQ line the ISR has already exited and the NVIC has already pended another TC IRQ. 

But in general you need to clear status bits like that anyway, because if the hardware doesn't clear them you'll immediately get an interrupt when you re-enable that source later on. TC does actually get cleared on the 417 by reading the status register and then writing to the data register, but if you skip that sequence for some reason then you need to clear it manually. Plus, logically the TC flag represents an event rather than a static condition, so in most applications once that event (the completion of transmission) is handled you're done and the system is in a different state, generally waiting for a response or sending a different message, so it doesn't make sense to retain the flag.
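On the F4, TC (like RXNE, LBD and CTS) is an rc_w0 bit in USART_SR: writing 0 clears it, and writing 1 leaves it unchanged. The manual clear is therefore a plain write of the inverted mask (`USARTx->SR = ~USART_SR_TC` in CMSIS terms), not a read-modify-write. The model below only emulates that write behaviour off-target, and its macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SR_RXNE (1u << 5)
#define SR_TC   (1u << 6)
#define SR_TXE  (1u << 7)
#define SR_LBD  (1u << 8)
#define SR_CTS  (1u << 9)
/* Bits software may clear by writing 0 (rc_w0); the others are read-only. */
#define SR_RC_W0_MASK (SR_CTS | SR_LBD | SR_TC | SR_RXNE)

static_assert((SR_RC_W0_MASK & SR_TXE) == 0u, "TXE is read-only");

/* Emulates the effect of software writing `value` to SR:
 * rc_w0 bits written as 0 are cleared, everything else is unaffected. */
static uint32_t sr_after_write(uint32_t sr, uint32_t value)
{
    return sr & (value | ~SR_RC_W0_MASK);
}
```

Writing `~SR_TC` clears only TC. A read-modify-write such as `SR &= ~SR_TC` writes back whatever RXNE was at the moment of the read. If a byte arrives in between, that stale 0 clears the new RXNE and the byte can be missed.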
« Last Edit: March 17, 2021, 07:32:30 pm by ajb »
 

peter-h (topic starter)
« Reply #7 on: March 18, 2021, 12:17:16 pm »
The reason for having an accurate "output queue size" value is that it enables more precise character spacing. For example Modbus requires a minimum of 3.5 character periods between packets, and less than this within a packet.

It was solved by having a "UART not empty" flag which is set by putting a byte into the UART and is reset by the TC interrupt. Then the "output queue size" function tests that flag in conjunction with checking the TXE bit and determines whether there is 1 byte in there or 2 bytes. If TXE=0 there are two, if TXE=1 there is just one. This works correctly at the end of a packet which is the important place, but at the start of a packet the count goes 0 2 3 4... because the 1 lasts just one bit period. At a few hundred kbps or more I do see 0 1 2 3 4... :)
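For reference, the Modbus over Serial Line specification counts an RTU character as 11 bits (start, 8 data, parity or second stop, stop). Above 19200 baud it recommends fixed timers of 750 µs for t1.5 and 1750 µs for t3.5. Below is a sketch of the resulting minimum inter-frame gap. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Minimum Modbus RTU inter-frame silence (t3.5) in microseconds.
 * One RTU character = 11 bits; above 19200 baud the spec's fixed
 * 1750 us value is used instead of 3.5 character times. */
static uint32_t modbus_t35_us(uint32_t baud)
{
    assert(baud > 0u);
    if (baud > 19200u)
        return 1750u;
    /* 3.5 chars * 11 bits * 1e6 us = 38.5e6; divide by baud, round up. */
    return (uint32_t)((38500000ull + baud - 1u) / baud);
}
```

That gives about 4011 µs at 9600 baud and the fixed 1750 µs above 19200 baud. Timing it from the TC event measures the gap from the end of the last stop bit rather than from the last TXE.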
« Last Edit: March 18, 2021, 01:35:23 pm by peter-h »

ajb
« Reply #8 on: March 18, 2021, 05:13:10 pm »
I still don't see how your more complicated method is better than just using the TC flag/interrupt.  TC already tells you the transmission is finished, and it already takes into account whatever characters remain in the shift register (and FIFO if the device has one).  So if you need to control interpacket timing (or set timeouts or whatever) then just waiting for the TC IRQ (or polling for the flag) after you load the last byte into the UART is way easier and does precisely what you need. 

I have RS485-based protocols with specified turnaround timing, mark/break framing, and timeout requirements at a few hundred kbaud, and it's perfectly straightforward to use the TXC/RXC interrupts in the UART to start timing those things when a complete message has been sent or received.  I'm using a hardware timer in conjunction with the UART for this which makes the protocol implementation a little more complicated between the two peripheral ISRs, but the transition from finishing transmit to starting the timer is purely based on the TXC IRQ and it all works perfectly.
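Here is a sketch of the timer side of that approach. It assumes a general-purpose timer in one-pulse mode whose update event fires after (PSC + 1) × (ARR + 1) timer clocks. It also assumes an 84 MHz timer clock: on an F4 at 168 MHz, APB1 timers run at twice PCLK1 when the APB1 prescaler is not 1. The TC ISR would load these values and start the timer. The helper below only does the arithmetic, and its name and struct are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint16_t psc, arr; } timer_cfg_t;

/* Pick PSC for a 1 us tick and ARR for the requested delay.
 * Assumes timer_clk_hz is a whole number of MHz and a 16-bit counter.
 * Returns 0 on success, -1 if the delay cannot be represented. */
static int timer_delay_cfg(uint32_t timer_clk_hz, uint32_t delay_us,
                           timer_cfg_t *cfg)
{
    assert(timer_clk_hz % 1000000u == 0u);
    uint32_t psc = timer_clk_hz / 1000000u - 1u;     /* 1 MHz count rate */
    if (psc > 0xFFFFu || delay_us == 0u || delay_us > 0x10000u)
        return -1;
    cfg->psc = (uint16_t)psc;
    cfg->arr = (uint16_t)(delay_us - 1u);            /* ARR + 1 ticks */
    return 0;
}
```

For a 1750 µs gap at 84 MHz this gives PSC = 83 and ARR = 1749.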
 

harerod
  • Frequent Contributor
  • Posts: 486
  • Country: de
  • ee - digital & analog
« Reply #9 on: March 18, 2021, 07:41:16 pm »
Quote from: ajb on Today at 18:13:10
I still don't see how your more complicated method is better than just using the TC flag/interrupt.

...
My thoughts exactly. Just one remark regarding interrupt latency on the STM32F4: the overhead for entering and exiting an interrupt is way higher than on, say, an AVR or PIC, because the hardware automatically stacks and later restores eight core registers (R0-R3, R12, LR, PC, xPSR), plus the FPU registers when the FPU context is active. Interrupt chaining and other mechanisms mitigate that penalty in the case of overlapping interrupt requests. The F4 interrupt overhead will still be well below 1µs.
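Here is a back-of-envelope check of "well below 1µs". It assumes the nominal 12-cycle exception entry latency ARM quotes for the Cortex-M4 with zero-wait-state memory, and a 168 MHz core clock:

```latex
t_{\text{entry}} = \frac{12\ \text{cycles}}{168\ \text{MHz}} \approx 71\ \text{ns}
```

Flash wait states, FPU register stacking and the handler's own code add to this, so real figures are higher. Even several times this figure is still well inside one microsecond (168 cycles).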
Comparing this to the TO's baudrates, I see no reason not to go with a simple solution.
 

peter-h (topic starter)
« Reply #10 on: March 18, 2021, 08:56:49 pm »
Do you mean the 32F4 is fast on interrupts, or slow? Well below 1us is pretty fast.

Looking at some waveforms on a scope where interrupts would delay things and cause jitter, I see no more than a few us, and that is with serial comms, system tick (1kHz) and an RTOS all running.

What is the significance of the FPU comment? Does it mean that you can't have multiple RTOS tasks using floats? Admittedly most of them take just 1 clock anyway, but division takes 16 (for a float; a double will be a lot longer).

harerod
« Reply #11 on: March 19, 2021, 08:17:17 am »
Quote from: peter-h on Yesterday at 21:56:49
Do you mean the 32F4 is fast on interrupts, or slow? Well below 1us is pretty fast.
If you are used to programming 8bitters in assembly, the STM32's interrupt handling is pretty comfy, but overhead is huge. An AVR running on 20MHz will show similar interrupt latencies. To get the best performance out of an STM32, one has to avoid interrupts and use hardware peripherals, often clusters of peripherals. The catch is that the F4 alone comes with 10000+ pages of original docu. No CubeMX/HAL can replace reading and knowing your way around that heap of info. Since some functions are limited to certain pins, the firmware concept must be there at schematic design time.

If a designer can't afford the time to actually get to know his platforms, he can use more general approaches. This is why I suggested using interrupts, knowing that the 168MHz number crunching CPU will be fast enough to get the job done, although that isn't its specialty.

Quote from: peter-h on Yesterday at 21:56:49
Looking at some waveforms on a scope where interrupts would delay things and cause jitter, I see no more than a few us, and that is with serial comms, system tick (1kHz) and an RTOS all running.
So, that sounds like you wouldn't have any trouble using a simple ISR approach?!

Quote from: peter-h on Yesterday at 21:56:49
What is the significance of the FPU comment? Does it mean that you can't have multiple RTOS tasks using floats? Admittedly most of them take just 1 clock anyway, but division takes 16 (for a float; a double will be a lot longer).
That would be a hoot, wouldn't it? If the FPU is active, context changes simply require some extra CPU housekeeping, leading to higher interrupt penalties. CubeMX/HAL/freeRTOS should handle this adequately.
« Last Edit: March 19, 2021, 08:19:06 am by harerod »
 

peter-h (topic starter)
« Reply #12 on: March 19, 2021, 09:15:14 am »
OK; understood and thank you always for your input.

When I did the PCB design I used their eval kit as a starting point (doesn't everyone?) and then read every word of the hardware manual, which shows what can come out on which pins, etc. The Issue A PCB, which has tons of analog and digital, SPI, USB, ethernet, etc, worked first time, with no mistakes. Well, almost none, given the crappy documentation by ST. A colleague is doing the "system" software on this one, all in C, and after he spent months fixing bugs in the ST drivers (a well known topic) we are making good progress.

Yes, the UART issue was solved with interrupts.

I continue to be seriously impressed with the speed of this thing. I wrote a load of code (in Cube) to parse NMEA-type (some nonstandard binary) sentences from a GPS, which is definitely sub-optimal (e.g. I am searching for multiple 8 byte substrings concurrently, by comparing each one against a sliding buffer) and the overall processing rate, including a load of sscanfs, floats, doubles, uint32, uint64, etc, is just under a megabyte per second. That is about 600x faster than the GPS (a u-blox module) is generating the data.
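The brute-force matcher described above might look something like this sketch. Every received byte is shifted into an 8-byte window, which is then compared against each 8-byte tag. Tag contents, type and function names are hypothetical. This is the simple, admittedly sub-optimal approach as described, not an optimized multi-pattern search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAG_LEN 8u

typedef struct {
    uint8_t win[TAG_LEN];   /* last TAG_LEN bytes received, oldest first */
    size_t filled;          /* valid bytes so far, saturates at TAG_LEN */
} window_t;

/* Push one received byte; return the index of the matching tag, or -1. */
static int window_push(window_t *w, uint8_t byte,
                       const uint8_t (*tags)[TAG_LEN], size_t ntags)
{
    assert(w != NULL);
    memmove(w->win, w->win + 1, TAG_LEN - 1u);   /* slide left by one */
    w->win[TAG_LEN - 1u] = byte;
    if (w->filled < TAG_LEN)
        w->filled++;
    if (w->filled < TAG_LEN)
        return -1;                               /* window not full yet */
    for (size_t i = 0; i < ntags; i++)
        if (memcmp(w->win, tags[i], TAG_LEN) == 0)
            return (int)i;
    return -1;
}
```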

The fastest I programmed before, not counting some 80x86 asm PC stuff, was probably a Z280 (a now-obscure but interesting and fast chip; according to Zilog I was the 1st bulk user in Europe) running at 24MHz, and this is easily 10x faster, or 100x to 1000x on floats. Since then I have been on the Z180 and H8/3xx stuff, which is good enough for most industrial stuff but is slow; bit-banging an I2C port for an ADS7828 took 600us (yes, about 30x the ADC's conversion time!).

« Last Edit: March 19, 2021, 09:16:53 am by peter-h »

harerod
« Reply #13 on: March 19, 2021, 12:41:18 pm »
You're welcome. There is no bypassing the docu. For STM32F4 hardware/firmware design, the minimum would be the datasheet, the officially admitted errata sheet, RM0090 and PM0214. When necessary, the dedicated peripheral appnotes. And the countless silent outcries of suffering developers' souls on the interwebs.


I sense some generation gap here. The fanciest thing that I did on a Z80 might have been hacking Super Robin Hood on a Speccy for infinite health. While you were taming that Z280 beast, I was probably optimizing assembly loops for ADC data acquisition with my Atari ST's 68k, for a digital storage oscilloscope. All this, while yearning for a shot at Roger Wilson's revolutionary Acorn Risc Machine. Took another 20 years and half a dozen other architectures and changes, until in 2008 I got my first chance to work on an STM32F103. Coming from first generation AVRs, that modern ARM/NVIC-bolted-to-some-peripherals-architecture required a different mindset. Took me about a year to get good enough. Today might be faster, since better documentation and better IDEs have appeared. Might save a week, or so. ;)
 

peter-h (topic starter)
« Reply #14 on: March 19, 2021, 05:06:34 pm »
We are using the Cube code generator to generate the skeletons for the various bits of code e.g. for making the DAC work, and then throwing away the garbage which has no place in an embedded system e.g. callbacks for what can only be a hardware failure in the silicon and which has no practical way to report the error condition.

It does seem to save a lot of time to work that way.

They seem to have invented this dumb "HAL" concept. HAL stands for hardware abstraction layer but there is no such thing here. The generated code goes straight to the chip registers etc. So basically they have added a HAL_ to everything :)

Yes, of course one has to read the 2000-page software ref manual for the actual things one is programming.

harerod
« Reply #15 on: March 19, 2021, 06:34:03 pm »
I might have mentioned this before: I have only started to use CubeMX early last year. Reason was a pretty interesting project using freeRTOS, lwIP, USB, serial, I2C, SPI, you name it. I needed something presentable quickly (sensor unit for a PCR tester), so I bet on CubeMX being stable enough for that. I won that bet, but lost time fixing weird library issues.

First thing I did was download a CubeMX version and freeze updates. That ensures stable behavior on all three sites that are working on that project, so we stay bug-compatible. ;)


(Sidenote: ST has the nasty habit of renaming library interfaces every now and then. What they call LL_ used to be the third (AFAIR) iteration of their STM32 interface definitions libraries.) I have written my own drivers based on those libs or based on datasheets and direct register accesses. So I have a pretty good idea what the HAL-code should be doing.

When I write drivers and have the slightest doubt about ST's lib, I open the SFR view during test runs and single step (C or even machine instruction) through the offending piece of code. If it requires fixing, it gets fixed. (You will know this already: the DBGMCU-related block defines peripheral behavior during breakpoints.)
For most peripherals I don't use HAL, but LL definitions only. The UART-HAL would be a prime example for a broken piece of code.

At this point I can only assume that HAL is a homage to Clarke/Kubrick's creation of the same name, cruelly mocking any innocent newcomers.


My previous post made me think about stuff one could do with old MCUs. Maybe you would like to share a story here: https://www.eevblog.com/forum/projects/what-was-your-slowest-microcontroller-design/
 

