Topic: possible to synch digital outputs so they turn on or off at the same time? (Read 3942 times)


Gibson486 (topic starter)
Is it possible to sync the outputs of the STM32, i.e. sync or delay them so they all turn on at one time? Or is my best bet to use I/O on one port, control each port through its 32-bit (or is it 16-bit?) register, and deal with the delay from one port to another?
 

ataradov
« Reply #1 on: September 27, 2023, 08:11:54 pm »
There is no way to synchronize them. The best you can do is minimize the delay by carefully optimizing the write code.
Alex
 

eutectique
« Reply #2 on: September 27, 2023, 08:20:59 pm »
Use ODR to output a data pattern.
Use BRR to set selected pins to zero.
Use BSRR to set selected pins to one.

Is that what you are after?
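As a hypothetical host-side model (plain C, not the CMSIS or HAL API; `port_model` and the helper names are made up for illustration), this is how writes to those three registers act on one 16-pin port's output latch, following the STM32 reference-manual description: ODR replaces the whole pattern, BRR clears the pins whose bits are 1, and BSRR sets pins from its low half and resets pins from its high half, with set taking priority if a pin appears in both. BRR is not present on every family (STM32F4, for one, only has BSRR).

```c
#include <stdint.h>

/* Hypothetical host-side model of one STM32 GPIO port's output latch.
   On hardware these would be stores to GPIOx->ODR, ->BRR and ->BSRR. */
typedef struct {
    uint16_t odr;   /* output data latch: one bit per pin, 16 pins */
} port_model;

/* ODR write: all 16 output bits are replaced at once. */
static void write_odr(port_model *p, uint16_t value) {
    p->odr = value;
}

/* BRR write: pins whose bit is 1 are cleared, 0 bits are ignored. */
static void write_brr(port_model *p, uint16_t mask) {
    p->odr = (uint16_t)(p->odr & ~mask);
}

/* BSRR write: low half sets pins, high half resets pins.
   Reset is applied first, so a pin named in both halves ends up set. */
static void write_bsrr(port_model *p, uint32_t value) {
    uint16_t set   = (uint16_t)(value & 0xFFFFu);
    uint16_t reset = (uint16_t)(value >> 16);
    p->odr = (uint16_t)((p->odr & ~reset) | set);
}
```

Each call corresponds to a single store to one port, which is why none of these registers helps with pins spread over several ports.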
 

ataradov
« Reply #3 on: September 27, 2023, 08:23:38 pm »
No, he needs to synchronize multiple different ports (toggle 32 pins at the same time, for example).
 

ledtester
« Reply #4 on: September 27, 2023, 08:30:42 pm »
Would DMA be a possible solution?
 

ataradov
« Reply #5 on: September 27, 2023, 08:32:26 pm »
It would still have a delay. Two consecutive store instructions in the code would be faster than DMA.

No matter what, you will be accessing two different peripherals. It is not possible to do at the same time.
 

tggzzz
« Reply #6 on: September 27, 2023, 08:54:12 pm »
Quote from: Gibson486
Is it possible to sync the outputs of the STM32, i.e. sync or delay them so they all turn on at one time? Or is my best bet to use I/O on one port, control each port through its 32-bit (or is it 16-bit?) register, and deal with the delay from one port to another?

Replace adjectives with numbers, and the question might be answerable.

Are you thinking in terms of ms, µs, ns, ps, fs?

If you switch them all "at the same time", how much ground bounce will there be in your system?

Alternatively, design your system so that it is tolerant of different outputs changing at "different times".
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

SiliconWizard
« Reply #7 on: September 27, 2023, 09:05:07 pm »
Yeah, what do you define as "sync"?

If it's within say 1 µs, then it should be possible with care and disabling interrupts. Depending on the MCU, system clock and peripheral clock, you may manage to get down to 100 ns or less.
But otherwise, no go.

Just use the same port if that needs to be done. Still define what "sync" means to you, as there will still be a small delay between I/O of the same port, even if it looks tiny.

Alternatively, if you absolutely have to use I/Os that are on separate ports, you could use an external latch IC and control the output change with an additional GPIO as clock.
 

ddrown
« Reply #8 on: September 28, 2023, 02:15:05 pm »
Would a timer's output compare fit your needs? That would give you very precise control over the delay between two outputs.
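To put a number on "very precise": an output-compare event can only land on a whole timer tick. A small sketch of that arithmetic (the function names are hypothetical, and the timer clock is whatever your part is configured for):

```c
#include <stdint.h>

/* Nearest whole number of timer ticks for a requested delay in ns. */
static uint32_t delay_to_ticks(uint32_t delay_ns, uint32_t timer_clk_hz) {
    uint64_t scaled = (uint64_t)delay_ns * timer_clk_hz;   /* = 1e9 * ticks */
    return (uint32_t)((scaled + 500000000u) / 1000000000u);
}

/* Delay actually produced by a tick count, in ns (rounded down). */
static uint32_t ticks_to_ns(uint32_t ticks, uint32_t timer_clk_hz) {
    return (uint32_t)((uint64_t)ticks * 1000000000u / timer_clk_hz);
}
```

At an assumed 84 MHz timer clock, for example, one tick is about 11.9 ns and a requested 100 ns rounds to 8 ticks, i.e. about 95 ns. Channels of the same timer can be placed a chosen number of ticks apart, but each channel only drives its own output(s).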
 

DavidAlfa
« Reply #9 on: September 28, 2023, 02:21:02 pm »
I'd say simply avoid brainstorming until the OP properly answers the first messages... So many threads end in smoke!
Hantek DSO2x1x            Drive        FAQ          DON'T BUY HANTEK! (Aka HALF-MADE)
Stm32 Soldering FW      Forum      Github      Donate
 
The following users thanked this post: edavid, tooki, eutectique

Gibson486 (topic starter)
« Reply #10 on: September 29, 2023, 08:05:43 pm »
Quote from: ataradov
No, he needs to synchronize multiple different ports (toggle 32 pins at the same time, for example).

Correct, but it sounds like delays are gonna happen since you can only access one register/port at a time.
 

Gibson486 (topic starter)
« Reply #11 on: September 29, 2023, 08:06:42 pm »
Quote from: ledtester
Would DMA be a possible solution?

Hmmm... I have never used DMA before, so I would need to look into that.
 

Gibson486 (topic starter)
« Reply #12 on: September 29, 2023, 08:10:01 pm »
Quote from: SiliconWizard
Yeah, what do you define as "sync"?

If it's within say 1 µs, then it should be possible with care and disabling interrupts. Depending on the MCU, system clock and peripheral clock, you may manage to get down to 100 ns or less.
But otherwise, no go.

Just use the same port if that needs to be done. Still define what "sync" means to you, as there will still be a small delay between I/O of the same port, even if it looks tiny.

Alternatively, if you absolutely have to use I/Os that are on separate ports, you could use an external latch IC and control the output change with an additional GPIO as clock.

Are you talking about shift registers? If so, I have done it with cascaded 8-bit shift registers, but they can be painful code-wise (especially when you have to onboard people). I'd rather just code it from an IC than deal with shift registers. Also, when things go wrong, debugging can be painful.
« Last Edit: September 29, 2023, 08:12:48 pm by Gibson486 »
 

ataradov
« Reply #13 on: September 29, 2023, 08:10:24 pm »
DMA uses the same bus. Except that with DMA you would need to use linked lists, and there would be an additional delay. Two consecutive store instructions are the fastest way of doing it. The delay would be equal to one bus clock cycle.
 

Gibson486 (topic starter)
« Reply #14 on: September 29, 2023, 08:17:28 pm »
Quote from: tggzzz
Replace adjectives with numbers, and the question might be answerable.

Are you thinking in terms of ms, µs, ns, ps, fs?

If you switch them all "at the same time", how much ground bounce will there be in your system?

Alternatively, design your system so that it is tolerant of different outputs changing at "different times".

I was actually thinking "at the same time". I was wondering if there was something that can latch the outputs until another register bit is set. Otherwise, as fast as possible would be nice (ns range). I was thinking of maybe using an IO-expander type deal, but none have an input that lets you delay the output update until you set a pin, or at least I have not found one.
 

Marco
« Reply #15 on: September 29, 2023, 08:24:29 pm »
You would only use an IO expander if you also want to do serial to parallel at the same time.

Otherwise you could simply use D-type latch ICs from your favourite logic family.
 

SiliconWizard
« Reply #16 on: September 29, 2023, 08:29:47 pm »
Quote from: Gibson486
Are you talking about shift registers? If so, I have done it with cascaded 8-bit shift registers, but they can be painful code-wise (especially when you have to onboard people). I'd rather just code it from an IC than deal with shift registers. Also, when things go wrong, debugging can be painful.

No no, just a simple parallel register. N inputs, N outputs. 1 clock to latch the inputs to the outputs, so the outputs would all change in sync on the clock pulse.
From a programming POV, that wouldn't change anything except that an additional GPIO (for the clock input of the register) would trigger the change reflected on the outputs. A shift register wouldn't get you that behavior.
A typical reference would be a 74HC574.
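A behavioral sketch of the register described above (hypothetical host-side C; output enable and propagation delays are not modeled, and the names are made up): the D inputs can be written as many times as convenient, and Q changes only on a rising clock edge.

```c
#include <stdbool.h>
#include <stdint.h>

/* Behavioral model of an 8-bit edge-triggered D register (74HC574-style).
   Output enable and propagation delay are not modeled. */
typedef struct {
    uint8_t d;     /* inputs, driven by MCU GPIOs */
    uint8_t q;     /* outputs seen by the load */
    bool    clk;   /* last clock level, to detect the rising edge */
} reg574;

/* Changing the inputs never affects Q. */
static void reg574_set_inputs(reg574 *r, uint8_t d) {
    r->d = d;
}

/* A rising clock edge copies all 8 inputs to Q at once. */
static void reg574_set_clock(reg574 *r, bool level) {
    if (level && !r->clk) {
        r->q = r->d;
    }
    r->clk = level;
}
```

With several such registers sharing one clock line, all of their outputs change on the same edge, regardless of the skew between the GPIO writes that set up the D inputs.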
 

tggzzz
« Reply #17 on: September 29, 2023, 08:35:29 pm »
Quote from: Gibson486
I was actually thinking "at the same time". I was wondering if there was something that can latch the outputs until another register bit is set. Otherwise, as fast as possible would be nice (ns range). I was thinking of maybe using an IO-expander type deal, but none have an input that lets you delay the output update until you set a pin, or at least I have not found one.

The basic strategy would be to have two stages of registers. The first stage is set up at leisure, using as many write operations as convenient. The outputs of those registers would be the inputs to the second stage.

Then the processor would use a single operation to clock all the first stage outputs into the second stage registers.

Variants could include
  • only having a single stage of registers, and using a single processor operation to enable the outputs
  • using an FPGA

Such partitioning of functionality between pure hardware and pure software is a standard part of system design.
 

DavidAlfa
« Reply #18 on: September 29, 2023, 10:43:18 pm »
Still missing lots of details. How many different ports? Acceptable delay with numbers? STM32 speed?

Something like:

Code:
__attribute__((optimize("Ofast")))
void writePorts(uint32_t valA, uint32_t valB, uint32_t valC){
  GPIOA->ODR = valA;
  GPIOB->ODR = valB;
  GPIOC->ODR = valC;
}

Generates:
Code:
 8000168: b430      push {r4, r5}
 800016a: 4b04      ldr r3, [pc, #16] ; (800017c <writePorts+0x14>)
 800016c: 4d04      ldr r5, [pc, #16] ; (8000180 <writePorts+0x18>)
 800016e: 4c05      ldr r4, [pc, #20] ; (8000184 <writePorts+0x1c>)
 8000170: 60e8      str r0, [r5, #12]
 8000172: 60e1      str r1, [r4, #12]
 8000174: 60da      str r2, [r3, #12]

Each port is updated only one instruction after the previous one.
IIRC the STR instruction takes 2 CPU cycles on ARM; this would be ~40 ns @ 50 MHz.
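Generalizing that arithmetic: if consecutive port stores complete c CPU cycles apart at clock f, the skew between the first and last of n ports is (n - 1) * c / f. A minimal sketch, where the cycles per store is an input you would have to measure for your part rather than a known constant:

```c
/* Estimated skew (ns) between the first and last of n back-to-back port
   stores, assuming each store completes cycles_per_store CPU cycles after
   the previous one. Interrupts, bus contention and write buffering are
   ignored. */
static double port_skew_ns(unsigned n_ports, unsigned cycles_per_store,
                           double cpu_hz) {
    if (n_ports < 2) {
        return 0.0;   /* no inter-port skew with a single port */
    }
    return (double)(n_ports - 1) * cycles_per_store * 1e9 / cpu_hz;
}
```

With 2 cycles per store at 50 MHz and three ports, as in the listing above, that is about 80 ns from the first store to the last; with an assumed 10 cycles per store at 480 MHz it would be about 42 ns.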


Otherwise, use a latch.
You can daisy-chain any number of 74HC595s, send a serial stream through SPI and sync the outputs with the latch clock (RCLK, pin 12).
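A hypothetical host-side model of that scheme (only SER, SRCLK and RCLK are modeled; the struct and function names are made up, and the whole chain of up to eight '595s is held in one 64-bit word): shifting changes only the internal shift stages, and the outputs all update together when RCLK rises.

```c
#include <stdint.h>

/* Behavioral model of a daisy chain of up to eight 74HC595s.
   sr  = shift-register stages (change on every SRCLK rising edge)
   out = storage register, i.e. the pins the load sees (change on RCLK) */
typedef struct {
    uint64_t sr;
    uint64_t out;
    unsigned bits;   /* 8 * number of chips, at most 64 */
} chain595;

static uint64_t chain_mask(const chain595 *c) {
    return c->bits >= 64 ? UINT64_MAX : ((UINT64_C(1) << c->bits) - 1);
}

/* One SRCLK rising edge with SER = bit. The outputs do not change. */
static void chain_shift(chain595 *c, int bit) {
    c->sr = ((c->sr << 1) | (uint64_t)(bit & 1)) & chain_mask(c);
}

/* Shift a whole frame MSB first, as an MSB-first SPI transfer would. */
static void chain_shift_frame(chain595 *c, uint64_t frame) {
    for (unsigned i = c->bits; i-- > 0; ) {
        chain_shift(c, (int)((frame >> i) & 1));
    }
}

/* RCLK rising edge: every storage stage loads at the same moment. */
static void chain_latch(chain595 *c) {
    c->out = c->sr;
}
```

Which physical '595 ends up holding the most significant byte depends on the wiring; in this model the first bit shifted ends up in the highest stage.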
« Last Edit: September 29, 2023, 10:49:30 pm by DavidAlfa »
 

hans
« Reply #19 on: September 29, 2023, 10:51:04 pm »
Several problems with that code:

The compiler is not guaranteed to emit the STRs back to back at all.

It will break on longer sequences because of register scheduling pressure (it can probably only find space for 5-6 values).

Not nearly all ARM chips have a 2-cycle store latency. On STM32H7, a store to GPIOs can take 10 clock cycles.

The best way IMO is to automate this procedure in hardware. A timer could handle a handful of outputs with its compare channels, but as the bus gets wider it becomes increasingly tricky to do on an MCU (especially at high frequencies).
 

DavidAlfa
« Reply #20 on: September 29, 2023, 10:59:10 pm »
It was just a simple example showing it can be done pretty close :-//

He's not said anything about the device so what gives?
What's the true required accuracy for sync? No idea.
Is it 2 ports, 4 ports, 6 ports? Who knows.
STM32F0? F1? G0? Same thing.

Maybe it has a clock signal, so all this could be ignored entirely by simply updating that port last.

I'm seriously tired of seeing threads filled with hypothetical devices and conditions.
Maybe, perhaps, what if, I wonder...

Last message: It was for updating some leds! Damn! :-DD
 

wek
« Reply #21 on: September 30, 2023, 05:33:33 am »
Using FSMC/FMC on a sufficiently large package could bring you closer to 50 simultaneously switching pins.

JW
 

westfw
« Reply #22 on: September 30, 2023, 07:18:27 am »
The popular 74x595 shift register has a separately clocked output register.  You can shift in bits one at a time for essentially any number of cascaded bits, and then transfer all the bits to the output pins "at the same time."  Presumably it would be relatively easy to duplicate the logic using parallel loaded registers for the cpu-side.  As tggzzz described, I guess.  Essentially the same idea as a "master slave flip-flop."
It wouldn't be any faster than writing the bits to separate IO registers non-concurrently (I guess you could achieve equal speed, if the last data transfer automatically triggered the latch), but you could achieve simultaneous changes at the outputs.
 

Gibson486 (topic starter)
Quote from: westfw
The popular 74x595 shift register has a separately clocked output register.  You can shift in bits one at a time for essentially any number of cascaded bits, and then transfer all the bits to the output pins "at the same time."  Presumably it would be relatively easy to duplicate the logic using parallel loaded registers for the cpu-side.  As tggzzz described, I guess.  Essentially the same idea as a "master slave flip-flop."
It wouldn't be any faster than writing the bits to separate IO registers non-concurrently (I guess you could achieve equal speed, if the last data transfer automatically triggered the latch), but you could achieve simultaneous changes at the outputs.

That is what we do now. It works, but it is dreadfully painful to maintain. We have about 96 digital outs, so when issues happen, it becomes quite the adventure to debug.

To answer other questions, I did not specify which STM32 because any of them is fair game at this point. I have only used the F4, and to my knowledge it does not have any special feature that allows this. Also, not sure why people think this is for LEDs.

Also, at least for now, I am fine with ns timing, but obviously the same time would be the goal, though that does not seem possible with the MCU alone.
 

DavidAlfa
Managing 96 pins with 595s shouldn't be that much of an issue.
If you're fine using the STM32 ports, you can add transparent latches like the 74HC573. Update the ports and finally issue a pulse to LE (Latch Enable) so they all sync together.
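The 74HC573 is level-sensitive rather than edge-triggered: Q follows D while LE is high and holds while LE is low. A behavioral sketch (hypothetical names; output enable not modeled) of keeping LE low while the ports are written and then pulsing it:

```c
#include <stdbool.h>
#include <stdint.h>

/* Behavioral model of an 8-bit transparent latch (74HC573-style).
   While LE is high Q follows D; while LE is low Q holds its last value. */
typedef struct {
    uint8_t d;
    uint8_t q;
    bool    le;
} latch573;

static void latch573_update(latch573 *l) {
    if (l->le) {
        l->q = l->d;   /* transparent */
    }
}

static void latch573_set_inputs(latch573 *l, uint8_t d) {
    l->d = d;
    latch573_update(l);
}

static void latch573_set_le(latch573 *l, bool level) {
    l->le = level;
    latch573_update(l);
}
```

Since Q follows D while LE is high, the ports must already hold their final values before LE rises; the value held afterwards is whatever is present when LE falls.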
F4s easily run at 100MHz+. Have you tried to measure the actual delay between ports? Might be low enough.

You're the one who knows the details, we don't. You're still missing basically everything related to the timings.
What's the target receiving this data? Is the STM32 a master or a slave?
The only scenario I can think of requiring extremely tight timings is a data bus externally clocked, so you can't delay it more than the timing specifications.
But if you're the master, it should be doable somehow by calibrating the GPIO latency.
« Last Edit: October 02, 2023, 02:54:34 pm by DavidAlfa »
 

tggzzz
Quote from: Gibson486
Also, at least for now, I am fine with ns timing, but obviously the same time would be the goal, though that does not seem possible with the MCU alone.

Sigh.

Q: is X within walking distance?
A: yes, if you walk for long enough.

E.g. "Full Tilt" is the book that launched the career of travel writer Dervla Murphy. Her first trip from home in Ireland was by push-bike: to India, via Romania, Turkey and Afghanistan.
 

wek
> it ['595s] is dreadfully painful to maintain

Why?

Using DMA and timers you can automate update.

Atomicity may be an issue, if you need that; and so is latency; but then the potential for miraculous solutions is also quite small.

JW
 

PCB.Wiz
Quote from: Gibson486
Also, at least for now, I am fine with ns timing,
What does ns timing even mean ?
1ns or 500ns ?

Quote from: Gibson486
Also, at least for now, I am fine with ns timing, but obviously the same time would be the goal, though that does not seem possible with the MCU alone.
I've not seen any MCU that supports updating 96 pins on the same clock edge. 32 at a time is your best block access.

Quote from: Gibson486
That is what we do now. It works, but it is dreadfully painful to maintain. We have about 96 digital outs, so when issues happen, it becomes quite the adventure to debug.
The only real issue with serial+latch is the update rate. It is by far the simplest way to get a same-edge update of that many pins, and it gives you same-edge precision (likely ~1 ns).
The next step would be an FPGA or CPLD, or a number of small dedicated MCUs acting as GPIO.

Quote from: Gibson486
I was thinking of maybe using an IO-expander type deal, but none have an input that lets you delay the output update until you set a pin, or at least I have not found one.
You could roll your own IO expander, from a small 32b MCU, or CPLD, that can update 32 pins at a time.
If that sleeps waiting on a trigger pin, it can update across multiple MCUs within a sysclk window, so maybe 20ns ballpark.
If you also send a sysclk, to all slaves, that update could shrink further.

Most LED drivers etc, actually take some pains to avoid pins all changing at the same time ! 

TI has serial IO expanders for LEDs that can shift/latch 16, 24 or 48 pins, but the 48-pin parts have a series of 3 ns delays to avoid all pins changing at the same time; they settle within ~21 ns.
 
The following users thanked this post: Gibson486

SL4P
Quote from: westfw
The popular 74x595 shift register has a separately clocked output register.  You can shift in bits one at a time for essentially any number of cascaded bits, and then transfer all the bits to the output pins "at the same time."
Exactly… I did this years ago with NINE daisy-chained 595s to realise 72 outputs driven with software PWM. No difficulty or challenges at all, other than maximum pin update speed. For control applications or dimming LEDs it works fine.

https://youtu.be/eewAOwAOzpw?si=ppdkKtlc2F8lPp4C
« Last Edit: October 03, 2023, 10:21:30 pm by SL4P »
Don't ask a question if you aren't willing to listen to the answer.
 

tggzzz
...

Many correct points that have been made before - but which are worth repeating since the OP has failed to respond to them :(

"Numbers, not adjectives" is something i would put in my .sig, but I've run out of characters.
« Last Edit: October 04, 2023, 04:37:13 am by tggzzz »
 

DavidAlfa
If you see his profile, he does this all the time, starts a topic, then never answers nor thanks anything.
Another blocked user! I'm seriously tired of these lazy people.
 

Nominal Animal
As an aside, when one uses some pins in a single GPIO port as a parallel output bus, I like to use the DR_TOGGLE register to update the state.
If you have
    uint32_t  bus_lookup[BUS_MASK + 1];
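to look up the GPIO port bit pattern corresponding to the bus value (with BUS_MASK==(1<<BUS_BITS)-1),
    uint32_t  bus_toggle;
initialized to zero with the output bus pins low (it holds the port pattern currently driven on the bus), you can update the output bus state to new_bits using
    uint32_t  tog = bus_toggle ^ bus_lookup[new_bits & BUS_MASK];  /* pins that differ */
    GPIOn_DR_TOGGLE = tog;
    bus_toggle ^= tog;  /* now equals bus_lookup[new_bits & BUS_MASK] */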

The "standard" way of using the lookup table is to use the set and clear instead, say
    GPIOn_DR_CLEAR = bus_lookup[(~new_bits) & BUS_MASK];
    GPIOn_DR_SET = bus_lookup[new_bits & BUS_MASK];
noting that sometimes you want to clear before set (e.g. keyboard matrix column states), and other times set before clear.

On many development boards the GPIO pins are not consecutive, so the lookup approach allows one to pick the pin order that best suits the circuit in use.  Teensy 4.x are a good example of this.
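A host-side sketch of both update styles on a simulated bank with SET/CLEAR/TOGGLE registers (all names, and the pin order in `pin_of_bit`, are hypothetical). In the toggle version the shadow variable holds the port pattern currently driven, and the value stored to TOGGLE is the XOR of the old and new patterns:

```c
#include <stdint.h>

#define BUS_BITS 4
#define BUS_MASK ((1u << BUS_BITS) - 1)

/* Simulated GPIO bank: each "register store" below acts on dr. */
static uint32_t dr;
static void dr_set(uint32_t m)    { dr |= m; }    /* like GPIOn_DR_SET = m    */
static void dr_clear(uint32_t m)  { dr &= ~m; }   /* like GPIOn_DR_CLEAR = m  */
static void dr_toggle(uint32_t m) { dr ^= m; }    /* like GPIOn_DR_TOGGLE = m */

/* Hypothetical wiring: bus bit i is on pin pin_of_bit[i] of the bank. */
static const unsigned pin_of_bit[BUS_BITS] = { 17, 3, 30, 8 };
static uint32_t bus_lookup[BUS_MASK + 1];

static void build_lookup(void) {
    for (uint32_t v = 0; v <= BUS_MASK; v++) {
        uint32_t pattern = 0;
        for (unsigned i = 0; i < BUS_BITS; i++) {
            if (v & (1u << i)) {
                pattern |= UINT32_C(1) << pin_of_bit[i];
            }
        }
        bus_lookup[v] = pattern;
    }
}

/* Clear then set: two stores, other pins of the bank untouched. */
static void bus_write_clear_set(uint32_t new_bits) {
    dr_clear(bus_lookup[(~new_bits) & BUS_MASK]);
    dr_set(bus_lookup[new_bits & BUS_MASK]);
}

/* Toggle: one store. bus_state is the pattern currently driven,
   zero at start with all bus pins low. */
static uint32_t bus_state;
static void bus_write_toggle(uint32_t new_bits) {
    uint32_t next = bus_lookup[new_bits & BUS_MASK];
    dr_toggle(bus_state ^ next);   /* flip only the pins that differ */
    bus_state = next;
}
```

Mixing the two styles would also require updating `bus_state` after a clear/set write, since the toggle path relies on it being accurate.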

Wider buses do need to be split into multiple parts, but using DR_TOGGLE, i.e.
    uint32_t  tog_a = bus_a_toggle ^ bus_a_lookup[new_bits & BUS_A_MASK];
    uint32_t  tog_b = bus_b_toggle ^ bus_b_lookup[(new_bits >> BUS_A_BITS) & BUS_B_MASK];
    uint32_t  tog_c = bus_c_toggle ^ bus_c_lookup[(new_bits >> (BUS_A_BITS + BUS_B_BITS)) & BUS_C_MASK];
    GPIOa_DR_TOGGLE = tog_a;
    GPIOb_DR_TOGGLE = tog_b;
    GPIOc_DR_TOGGLE = tog_c;
    bus_a_toggle ^= tog_a;  /* each shadow now holds the new pattern */
    bus_b_toggle ^= tog_b;
    bus_c_toggle ^= tog_c;
will give you pretty minimal differences in the sub-bus state changes, time-wise.
(Note that GPIOa, GPIOb, and GPIOc may refer to the same or different GPIO ports, as long as each bus pin is separate.)
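A sketch of the split-bus version on simulated registers (hypothetical names and wiring: a 24-bit value split into three 8-bit sub-buses on three banks). All lookups and XORs are done first, so only the three stores separate the sub-bus updates:

```c
#include <stdint.h>

#define SUB_BITS 8
#define SUB_MASK ((1u << SUB_BITS) - 1)
#define N_SUB    3

/* Three simulated GPIO banks; dr[k] ^= m stands in for GPIOk_DR_TOGGLE = m. */
static uint32_t dr[N_SUB];

/* Hypothetical wiring: sub-bus k uses 8 consecutive pins from pin shift[k]. */
static const unsigned shift[N_SUB] = { 0, 4, 16 };
static uint32_t lookup[N_SUB][SUB_MASK + 1];
static uint32_t state[N_SUB];   /* pattern currently driven on each bank */

static void build_lookups(void) {
    for (unsigned k = 0; k < N_SUB; k++) {
        for (uint32_t v = 0; v <= SUB_MASK; v++) {
            lookup[k][v] = v << shift[k];
        }
    }
}

static void bus_write(uint32_t new_bits) {
    uint32_t toggle[N_SUB];
    /* All lookups and XORs first... */
    for (unsigned k = 0; k < N_SUB; k++) {
        uint32_t next = lookup[k][(new_bits >> (k * SUB_BITS)) & SUB_MASK];
        toggle[k] = state[k] ^ next;
        state[k] = next;
    }
    /* ...then the three stores back to back. */
    dr[0] ^= toggle[0];
    dr[1] ^= toggle[1];
    dr[2] ^= toggle[2];
}
```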
 

SiliconWizard
Yes, using the toggle register avoids having to read-modify-write.
As I remember, the PIC32 (MIPS based) also have CLEAR and SET registers, which allow setting or clearing given bits within a port in the same manner, without a read-modify-write and without having to store the state in a variable.
 

westfw
Quote from: SiliconWizard
using the toggle register avoids having to read - modify - write.
It doesn't avoid the RMW, but it does avoid the need for additional atomicity protections.
 

Nominal Animal
Quote from: SiliconWizard
using the toggle register avoids having to read - modify - write.
Quote from: westfw
It doesn't avoid the RMW
It does avoid the processor having to do a read-modify-write on the data register, as
    GPIOn_DR_TOGGLE = mask;
compiles to a simple memory store, but performs the same pin state changes as
    GPIOn_DR ^= mask;
would, the latter doing a read-exclusive_or-write cycle.  Exactly how the GPIO peripheral does the various bit flips is its internal magic, so irrelevant here.

Quote from: SiliconWizard
As I remember, the PIC32 (MIPS based) also have CLEAR and SET registers, which allow setting or clearing given bits within a port in the same manner, without a read-modify-write and without having to store the state in a variable.
Like I said, the standard approach is
    GPIOn_DR_CLEAR = bus_lookup[(~new_bits) & BUS_MASK];
    GPIOn_DR_SET = bus_lookup[new_bits & BUS_MASK];
when using such a parallel bus lookup array or arrays.

The clear and set approach does have the benefit that it explicitly sets the pin states –– and does not affect other pins in the same bank, assuming bus_lookup[] is correctly set ––, whereas the toggle approach will get confused if you change any of the bus bits/pins without updating the shadow mask value.  As these GPIO bank accesses tend to take quite a few clock cycles, using just one toggle instead of explicit sets and clears can give a significant speed boost in critical sections.
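The shadow-value caveat is easy to demonstrate on a simulated bank (hypothetical names; bus on pins 0 to 3): when a bus pin is changed behind the shadow's back, the next toggle update leaves that pin wrong, while a clear-and-set update restores it.

```c
#include <stdint.h>

static uint32_t dr;                      /* simulated data register */
static uint32_t shadow;                  /* pattern the code believes is driven */
static const uint32_t BUS_PINS = 0x0Fu;  /* hypothetical: bus on pins 0..3 */

/* One TOGGLE store, computed from the shadow value. */
static void update_toggle(uint32_t pattern) {
    dr ^= shadow ^ pattern;
    shadow = pattern;
}

/* One CLEAR store and one SET store; no shadow value needed. */
static void update_clear_set(uint32_t pattern) {
    dr &= ~(BUS_PINS & ~pattern);
    dr |= pattern;
}
```

For instance, after `update_toggle(0x5)` and an outside write that sets pin 1, `update_toggle(0x6)` leaves the bus at 0x4 instead of 0x6, because pin 1 was already high when it got flipped.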

For example, Teensyduino for Teensy 4.x implements a digitalWriteFast() using an always inlined function (using GCC extensions, always_inline function attribute, and __builtin_constant_p() built-in to branch only when the parameter expression is known at compile-time), so that when both the pin and the state are known at compile time, the generated machine code is a single assignment of a constant to GPIOn_DR_SET or GPIOn_DR_CLEAR; and otherwise two lookup arrays are used: one identifies the GPIO port (via a pointer), and another the bit within the port.  Overhead is obviously 8 bytes per supported I/O pin number (of Flash), but generates quite efficient code considering it allows completely arbitrary pin port and bit selection.
(The other efficient alternative is to number your pins so that the low bits give the bit within a port and the high bits identify the port, but even that will generate some bit-shift and addition instructions per function call in the cases where the pin and/or state varies at run time.)
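For comparison, a sketch of the two pin-numbering schemes on simulated SET registers (hypothetical: three banks of 32 pins, and a made-up four-pin board table). The encoded scheme needs a shift and a mask per call when the pin is only known at run time; the table scheme needs two table reads, and with a 32-bit pointer plus a 32-bit mask it costs 8 bytes per pin.

```c
#include <stdint.h>

/* Three simulated SET registers, one per 32-pin bank. */
static volatile uint32_t gpio_set_reg[3];

/* Scheme 1: pin number = bank * 32 + bit. */
static void pin_high_encoded(unsigned pin) {
    gpio_set_reg[pin >> 5] = UINT32_C(1) << (pin & 31u);
}

/* Scheme 2: arbitrary board pin order via two tables
   (made-up four-pin board: register pointer + bit mask per pin). */
static volatile uint32_t *const pin_reg[4] = {
    &gpio_set_reg[1], &gpio_set_reg[1], &gpio_set_reg[0], &gpio_set_reg[2]
};
static const uint32_t pin_bit[4] = { 1u << 16, 1u << 17, 1u << 3, 1u << 12 };

static void pin_high_table(unsigned pin) {
    *pin_reg[pin] = pin_bit[pin];
}
```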
 

westfw
Quote from: Nominal Animal
It does avoid having the processor from having to do a read-modify-write on the data register, as
    GPIO_DR_TOGGLE = mask;
Compiles to a single store.
Ah.  I thought you were talking about the hack that sets arbitrary bitfields (from a mask) by reading the current value and writing to toggle.
Code:
    GPIO_DR_TOGGLE = (GPIO_DR_OUTPORT ^ value) & mask;
https://github.com/raspberrypi/pico-sdk/blob/master/src/rp2_common/hardware_gpio/include/hardware/gpio.h#L718
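A host-side sketch of that masked-toggle idea on a simulated output register (a simplified stand-in, not the SDK code itself): XORing the current outputs with the wanted value and masking gives exactly the pins that must flip, and pins outside the mask are never touched.

```c
#include <stdint.h>

static uint32_t out;   /* simulated output register */

/* Stands in for one store to a TOGGLE register. */
static void toggle_store(uint32_t m) { out ^= m; }

/* Drive the pins in mask to the matching bits of value, leave all other
   pins alone. Still a read-modify-write: an interrupt that changes pins
   inside mask between the read and the store can leave them wrong. */
static void put_masked(uint32_t mask, uint32_t value) {
    toggle_store((out ^ value) & mask);
}
```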
 

Nominal Animal
Quote from: westfw
Ah.  I thought you were talking about the hack that sets arbitrary bitfields (from a mask) by reading the current value and writing to toggle.
No, the "current value" is an ordinary variable, and only maintained for the bus bits.  In real life, I often implement it as
    uint32_t  changed = bus_bits ^ new_bits;  /* bus bits that differ */
    GPIOn_DR_TOGGLE = bus_lookup[changed & BUS_MASK];
    bus_bits = new_bits;
i.e. the temporary variable tracks the bus state, not the output data register state.  I thought the other form was easier to read, that's all.

The 'Pi hack, using my notation here, would be written as
    GPIOn_DR_TOGGLE = (GPIOn_DR ^ bus_lookup[new_bits & BUS_MASK]) & bus_lookup[BUS_MASK];  /* mask = all bus pins */
It is superior to writing the result to GPIOn_DR, because there is a short window between the read from the port and the write back to it, during which an interrupt may change the port state.  Using the toggle register, an incorrect result is possible only if the interrupt modifies the same pins.
It is a read-modify-write cycle, even though two different registers are used, because they refer to the same underlying data.

Using a separate variable saves the data register read, which can take a few clock cycles (from an extra one to about a dozen on Cortex-M's, I believe).  As I understand it –– ataradov and others can correct me if I'm wrong –– this extra access cost is due to the GPIO subsystem using its own clock; i.e. the cost is due to different clock domains, and thus applies to both reads and writes equally.  On many Cortex-M's the same GPIO port can also be accessed via different internal buses, for example on Teensy 4.x GPIO1 and GPIO6 refer to the same bank, the former accessible via DMA, the latter non-DMA and faster.

Another trick is to use 32-bit DMA to a GPIOn_DR_TOGGLE register, allowing one to do up to 32-bit wide parallel bus output via DMA.  The trick is that the data buffer has to be exclusive-or-processed (and bus-lookup'ed) first, but fortunately that is very cheap to do when filling the buffer.  You can even use 64 bits per data element, if you have a write strobe pin in the same GPIO bank, with every DMA write toggling the write strobe pin; or 96 bits with each data element not toggling the write strobe pin, but followed by two words only toggling the write strobe pin.  When such a buffer is initially filled, subsequent refills do not need to modify the toggle words at all.
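The buffer pre-processing for that DMA trick is delta encoding: each word is the XOR of the port patterns of consecutive bus values, so replaying the words into a TOGGLE register reproduces the sequence. A host-side sketch (hypothetical names; `patterns` are assumed to be already mapped through the bus lookup, and the strobe pin is assumed not to be part of them):

```c
#include <stddef.h>
#include <stdint.h>

/* One TOGGLE word per value. prev is the pattern driven before the
   transfer; the return value seeds prev for the next buffer. */
static uint32_t encode_toggle(uint32_t *buf, const uint32_t *patterns,
                              size_t n, uint32_t prev) {
    for (size_t i = 0; i < n; i++) {
        buf[i] = prev ^ patterns[i];
        prev = patterns[i];
    }
    return prev;
}

/* 64-bit variant: two words per value. The first word changes the data
   pins and toggles the strobe, the second toggles only the strobe back,
   so every DMA write toggles the strobe pin. */
static uint32_t encode_toggle_strobe(uint32_t *buf, const uint32_t *patterns,
                                     size_t n, uint32_t prev, uint32_t strobe) {
    for (size_t i = 0; i < n; i++) {
        buf[2 * i]     = (prev ^ patterns[i]) | strobe;
        buf[2 * i + 1] = strobe;
        prev = patterns[i];
    }
    return prev;
}
```

On hardware the buffer would be the DMA source and GPIOn_DR_TOGGLE the fixed destination; on the host, replaying is just XORing the words into a variable in order. The strobe words are the same for every refill, which matches the note above that they need not be rewritten.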

That way, the bus_bits variable is actually kept in a register during the fill, and only saved for use by the next fill; unless one clears the bus state to all-zeroes/all-ones between DMA runs.  Quite useful with small display controllers like ILI9341, when you use a framebuffer with an unusual color format (or multiple layers), tiled graphics, or non-Cartesian framebuffer read order, and two or three separate DMA'd stripe buffers.  You recalculate the next stripe while the current one is transferred, so with fast MCUs you can do surprisingly interesting effects processing; definitely stuff like RGB blending and transparency and rotation of at least a couple of different "planes" on top of a static background.  This is how I reinvented this particular wheel for myself.



Just in case some of the readers of this thread have not guessed the relationship between the GPIOn_DR, GPIOn_DR_SET, GPIOn_DR_CLEAR, and GPIOn_DR_TOGGLE GPIO registers (and to make this post too a 'draft of a dissertation'):
  • Any bits set in the value written to GPIOn_DR_SET will cause the corresponding bits in GPIOn_DR to be set
  • Any bits set in the value written to GPIOn_DR_CLEAR will cause the corresponding bits in GPIOn_DR to be cleared
  • And any bit set in the value written to GPIOn_DR_TOGGLE will cause the corresponding bits in GPIOn_DR to change state
  • Unset/zero bits in the value written to GPIOn_DR_SET, _CLEAR, or _TOGGLE, are ignored: they do not cause any changes
These are common in some Cortex-M GPIO subsystems, but similar registers are available even on some 8-bit MCUs; check your MCU datasheet and reference manual.
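A small software model of those four registers makes the semantics concrete. `gpio_bank_model` and the `dr_*` functions are hypothetical. On real hardware, each call is a single store to the corresponding register, and the hardware performs the read-modify-write internally, so software never needs to read GPIOn_DR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one GPIO bank with DR, DR_SET, DR_CLEAR and
   DR_TOGGLE registers; only the output data register holds state. */
typedef struct {
    uint32_t dr;   /* output data register */
} gpio_bank_model;

/* Writes to GPIOn_DR_SET: 1 bits are set, 0 bits are ignored. */
static void dr_set(gpio_bank_model *g, uint32_t bits)    { g->dr |= bits; }

/* Writes to GPIOn_DR_CLEAR: 1 bits are cleared, 0 bits are ignored. */
static void dr_clear(gpio_bank_model *g, uint32_t bits)  { g->dr &= ~bits; }

/* Writes to GPIOn_DR_TOGGLE: 1 bits change state, 0 bits are ignored. */
static void dr_toggle(gpio_bank_model *g, uint32_t bits) { g->dr ^= bits; }
```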
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14481
  • Country: fr
STM32 MCUs have a BSRR register for GPIOs (bit set/reset), I'm not sure if it's a common ARM Cortex feature or specific to these.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6264
  • Country: fi
    • My home page and email address
STM32 MCUs have a BSRR register for GPIOs (bit set/reset), I'm not sure if it's a common ARM Cortex feature or specific to these.
As it is a peripheral, it does vary among Cortex-M implementations.  I suspect that each chip manufacturer has their own GPIO subsystem they use.

STM32F4 series uses 32-bit registers, but only up to 16 pins per bank.  16 low bits of GPIOx_ODR correspond to the output pin states; GPIOx_BSRR combines set (low 16 bits) and clear/reset (high 16 bits).  You can use 8-bit, 16-bit or 32-bit accesses to these registers.  There is no toggle, but one can do
    GPIOx_BSRR = bus_lookup[new_bits & BUS_MASK] | (bus_lookup[(~new_bits) & BUS_MASK] << 16);
to set and clear the bits corresponding to the new bus state (any subset within the same bank, up to 16 pins) in a single register write (after computing the 32-bit value to be written).  Also note that on STM32F4, bus_lookup[] values only need to be 16-bit, not 32-bit; buses wider than 16 bits have to be split into sub-buses on different GPIO ports.
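Here is a sketch of building `bus_lookup[]` and the BSRR value for an 8-bit bus. It assumes a hypothetical wiring where bus bit i is on pin `bus_pins[i]` of one 16-pin bank. `bsrr_apply()` is a software model of what the hardware does with the written value (set has priority over reset, per ST's reference manuals). That makes it handy for testing the table off-target.

```c
#include <assert.h>
#include <stdint.h>

#define BUS_WIDTH 8u
#define BUS_MASK  ((1u << BUS_WIDTH) - 1u)

/* Hypothetical wiring: logical bus bit i is on pin bus_pins[i] of a
   16-pin STM32-style GPIO bank. */
static const uint8_t bus_pins[BUS_WIDTH] = { 0, 1, 2, 3, 8, 9, 10, 11 };

/* bus_lookup[v] = pin bits (0..15) that are high for bus value v. */
static uint16_t bus_lookup[1u << BUS_WIDTH];

static void bus_lookup_init(void)
{
    for (uint32_t v = 0; v <= BUS_MASK; v++) {
        uint16_t pins = 0;
        for (uint32_t b = 0; b < BUS_WIDTH; b++) {
            assert(bus_pins[b] < 16);            /* one 16-pin bank only */
            if (v & (1u << b))
                pins |= (uint16_t)(1u << bus_pins[b]);
        }
        bus_lookup[v] = pins;
    }
}

/* Value for GPIOx_BSRR: bits to set in the low half, bits to reset in
   the high half, so all bus pins change with one register write. */
static uint32_t bsrr_value(uint32_t new_bits)
{
    return (uint32_t)bus_lookup[new_bits & BUS_MASK]
         | ((uint32_t)bus_lookup[(~new_bits) & BUS_MASK] << 16);
}

/* Software model of a BSRR write to ODR: set has priority over reset. */
static uint16_t bsrr_apply(uint16_t odr, uint32_t bsrr)
{
    uint16_t set   = (uint16_t)(bsrr & 0xFFFFu);
    uint16_t reset = (uint16_t)(bsrr >> 16);
    return (uint16_t)((odr & ~reset) | set);
}
```

Pins outside the bus (here 4..7 and 12..15) are left untouched by the write, because their bits are zero in both halves.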

On a comparable NXP Kinetis K20 Cortex-M4 (MK20DX64VLH7, MK20DX128VLH7, MK20DX256VLH7), there are up to 32 pins per bank.  GPIOx_PDOR corresponds to the output pin states, GPIOx_PSOR is the set-bits register, GPIOx_PCOR is the clear-bits register, and GPIOx_PTOR is the toggle-bits register.  8-, 16-, and 32-bit accesses are supported.

If we switch to Cortex-M7, like NXP i.MX RT106x (as used on e.g. Teensy 4), there are actually two different GPIO subsystems that one can select between, similar to how one selects between pin functions.  Both only support 32-bit accesses.  One is for DMA access, and the other is for faster processor access.  This means that on RT106x, if you have only one DMA'd parallel output bus, you can direct it to GPIO[1-4]_DR and enable GPIO1-4 only for the pins used by that DMA'd bus.  The processor side (digitalWriteFast() etc.) uses GPIO6-9_DR, _DR_SET, _DR_CLEAR, and _DR_TOGGLE to manipulate pin states.  These do not interfere with GPIO1-4_DR and related registers, so the two really behave as separate peripherals.

A comparable ST Cortex-M7, STM32F7 family, has the ST-style _ODR and _BSRR registers.

The Raspberry Pi Pico's RP2040 has two Cortex-M0+ cores, but each processor-programmable pin is controlled in a separate register; then again, you do have the eight PIO state machines instead.  The NXP Kinetis KL26 sub-family also has Cortex-M0+ cores, but has the same GPIO registers as the K20 Cortex-M4 above.

In my experience, Cortex-M's provide at least the set and reset facility (either via _SET/_CLEAR, or _BSRR), but toggle is rarer; and the entire lineup of Cortex-M's from the same manufacturer tends to have a very similar GPIO subsystem.  (Without looking at 16-bit PIC MCUs, I'd guess they too have the _ODR and _BSRR register interface; I would not be too surprised if the subsystem implementation was really similar to ST's Cortex-M's, too.  It'd just make sense, really.)

Funnily enough, even some 8-bit MCUs like ATtiny4/5/9/10, and even ATmega32u4, have a 'toggle' facility: when a pin is configured as an output, writing a 1 to its bit in PINx (the port input register) toggles the corresponding bit in the PORTx register.  (That might be an AVR specialty, since I haven't seen the same in 8051's or MC68HC08's.)
« Last Edit: October 09, 2023, 11:20:44 pm by Nominal Animal »
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14481
  • Country: fr
As a peripheral for a soft core, I've written a pretty useful kind of GPIO register (IMO). It allows one to modify given bits of a GPIO port to an arbitrary value (0 or 1) according to a mask: a given mask bit = 1 means modify the corresponding GPIO, = 0 means don't modify it. So the register is split into a mask part and a value part, rather than the set and clear parts of the BSRR registers. Very handy for modifying just a set of GPIOs within a port in a single cycle.
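Such a mask/value register might be modeled in software like this. The post does not specify the layout, so the layout here is an assumption for illustration: mask in the high 16 bits, value in the low 16 bits, and a 16-bit port. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* New port state after a write of 'reg' to the mask/value register.
   Assumed layout: bits 31..16 = mask (1 = modify this pin),
   bits 15..0 = value (new level for the modified pins). */
static uint16_t mask_value_write(uint16_t port, uint32_t reg)
{
    uint16_t mask  = (uint16_t)(reg >> 16);
    uint16_t value = (uint16_t)(reg & 0xFFFFu);
    return (uint16_t)((port & ~mask) | (value & mask));
}

/* Build the register value from a mask and a value. */
static uint32_t mask_value_make(uint16_t mask, uint16_t value)
{
    return ((uint32_t)mask << 16) | value;
}
```

For comparison, a BSRR-style register would need `value & mask` in the set half and `~value & mask` in the reset half, both computed at run time.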
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6264
  • Country: fi
    • My home page and email address
As a peripheral for a soft core, I've written a pretty useful kind of GPIO register (IMO). It allows one to modify given bits of a GPIO port to an arbitrary value (0 or 1) according to a mask: a given mask bit = 1 means modify the corresponding GPIO, = 0 means don't modify it. So the register is split into a mask part and a value part, rather than the set and clear parts of the BSRR registers. Very handy for modifying just a set of GPIOs within a port in a single cycle.
Yes, but it does limit the number of pins to half the register width.

In any case, these are excellent examples of the techniques one can use to minimize the timing differences when changing parallel bus states, depending on the exact MCU capabilities, without using external circuitry like latches.

If OP and others are wondering about how much time they can spend in changing the parallel bus pin states (i.e., how long can one spend to change the state of all the pins in a parallel bus), for typical parallel buses with a separate 50% duty cycle clock or read/write strobe signal, the answer is "almost half a bus clock cycle".  The "almost" is because the triggering edge is not instantaneous (it is a slope, and thus takes a measurable time to transition), and there is a bit of latency between writing to the port data registers and the pins changing state, and these too need to happen well within the half clock cycle of the parallel bus.

Typically, that means the bus will work just fine even if its pins are spread over two or three GPIO banks, as long as you can use a shared DMA trigger.  The trigger should have a 50% duty cycle, so you can use the opposite edge to the one the other end uses.  For some buses, even an interrupt handler might be fast enough, if it keeps the data for the next transfer prepared beforehand.
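The "almost half a bus clock cycle" budget is simple arithmetic. In this C sketch, the edge time and the write-to-pin latency are inputs you would have to estimate or measure for your board and MCU; they are not datasheet figures. The function names are hypothetical.

```c
#include <assert.h>

/* Time (ns) available for updating all bus pins between the two edges
   of a 50%-duty bus clock: half a period, minus the edge transition
   time and the register-write-to-pin latency. */
static double bus_update_budget_ns(double bus_clock_hz, double edge_ns,
                                   double latency_ns)
{
    double half_period_ns = 0.5e9 / bus_clock_hz;
    return half_period_ns - edge_ns - latency_ns;
}

/* The same budget expressed in CPU clock cycles. */
static double budget_in_cpu_cycles(double budget_ns, double cpu_clock_hz)
{
    return budget_ns * cpu_clock_hz / 1e9;
}
```

Take a 10 MHz bus clock, 5 ns allowed for the edge, and 10 ns of latency. About 35 ns remain, which is roughly 21 cycles of a 600 MHz core.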

Often, the "hard" part is generating the exact desired number of clock cycles or read/write strobes especially at higher frequencies.  My trick there is to use SPI SCK (clock) if the number of transfers is a multiple of SPI transfer size (often programmable), or SPI MOSI/DO with 8-bit transfers and double (0b01010101), quadruple (0b00110011), or octuple (0b00001111) clock frequency, allowing one to specify the pulse count as a multiple of 4, 2, and 1, respectively.  This often takes two or more external pins (one for the SCK or MOSI/DO, plus the DMA trigger pin) wired together, though.
Many MCUs allow a DMA to use the same source and target addresses but a variable count, so using (part of) one SPI and one DMA channel, one can often generate the desired clock, especially since on Cortex-M's the SPI cores can often go to pretty high frequencies (dozens of MHz).
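A few lines of C can check the pulse counts for the MOSI/DO patterns. This sketch assumes MSB-first shifting and a line that idles low before the first byte; check what your SPI peripheral does between transfers. The function names are hypothetical. Each pattern starts with 0 and ends with 1, so back-to-back bytes neither merge nor split pulses.

```c
#include <assert.h>
#include <stdint.h>

/* Count the high pulses one SPI data byte produces on MOSI/DO,
   assuming MSB-first shifting and a low line before the byte. */
static unsigned pulses_per_byte(uint8_t pattern)
{
    unsigned pulses = 0;
    unsigned prev = 0;                       /* assumed idle level: low */
    for (int bit = 7; bit >= 0; bit--) {
        unsigned cur = (pattern >> bit) & 1u;
        if (cur && !prev)
            pulses++;                        /* each rising edge starts a pulse */
        prev = cur;
    }
    return pulses;
}

/* Number of bytes to send for a desired pulse count, or 0 if the count
   is not a multiple of the pattern's pulses per byte. */
static unsigned bytes_for_pulses(uint8_t pattern, unsigned pulse_count)
{
    unsigned per = pulses_per_byte(pattern);
    if (per == 0 || pulse_count % per != 0)
        return 0;
    return pulse_count / per;
}
```

With 0b01010101, the SPI bit clock runs at twice the pulse frequency, giving 4 pulses per byte, so 12 pulses need 3 bytes. 10 pulses cannot be generated with that pattern. They need 0b00110011 (5 bytes) or 0b00001111 (10 bytes).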

On some MCUs, you can even use the SPI in slave mode, dividing the incoming clock by 2/4/8 (and even different dividers if the SPI subsystem supports transfer sizes other than 8 bits); if the SPI data register defaults to all zeros or all ones in the absence of (DMA) writes, you can still control the pulse count (to a multiple of 4, 2, or 1, respectively).
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8173
  • Country: fi
Yes, but it does limit the number of pins to half the register width.

Yeah. Since address space (on 32-bit MCUs) and simple logic gates are cheap, I would like to see more different options to access the same peripherals, e.g. 32-bit wide registers which can set/reset 16 bits at once in any combination (BSRR or SiliconWizard's mask/value, basically the same thing expressed differently), or write 32 pin values at once, or set 32 pin values, or reset 32 pin values.

Interestingly, while STM32 offers combined/atomic set-reset for 16 bits, nRF52 offers atomic set OR reset for 32 bits, but you can't do a combined set-reset. Having both options at once would be great, so one could choose based on application requirements. A fully routable IO matrix, as in nRF52, is highly helpful. You can arrange the pins you need to switch simultaneously into the same IO register without causing a PCB routing nightmare or painting yourself into a corner with other peripherals sharing the same pins.
« Last Edit: October 10, 2023, 06:23:52 am by Siwastaja »
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14481
  • Country: fr
As a peripheral for a soft core, I've written a pretty useful kind of GPIO register (IMO). It allows one to modify given bits of a GPIO port to an arbitrary value (0 or 1) according to a mask: a given mask bit = 1 means modify the corresponding GPIO, = 0 means don't modify it. So the register is split into a mask part and a value part, rather than the set and clear parts of the BSRR registers. Very handy for modifying just a set of GPIOs within a port in a single cycle.
Yes, but it does limit the number of pins to half the register width.

Of course, no free lunch here. But it can prove much better than set/reset registers when dealing with arbitrary values that can't be statically defined. With set/reset, extra cycles are needed at run time to work out which bits go into the set part and which into the reset part.

ST's BSRR registers are also half-half, but other implementations use separate SET and CLEAR registers. Of course, which approach is more efficient depends on your exact use case.

The usual implementation approach when using these single registers is to restrict GPIO "ports" to half the native register size (so 16 GPIOs per port for a 32-bit CPU), which is what ST does.
An alternative (which is what I did, for flexibility) is to implement N-bit ports (N = native width) that can be read or written in full in a single cycle (like the ODR registers). The value/mask registers then act on only half of the port, with the half selected by another register. In practice, that gives a ton of flexibility.
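Here is a software model of that half-select variant, as a sketch. The 32-bit port can be written whole, while a mask/value write only affects the half chosen by `half_sel`. The mask-high/value-low layout and all names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a full-width output register plus a half-select
   register that steers the 16-bit mask/value writes. */
typedef struct {
    uint32_t port;       /* full 32-bit output data, like ODR */
    unsigned half_sel;   /* 0 = mask/value act on bits 0..15, 1 = bits 16..31 */
} mv_port_model;

/* Mask/value write: bits 31..16 of 'reg' = mask, bits 15..0 = value,
   both shifted onto the currently selected half of the port. */
static void mv_write(mv_port_model *p, uint32_t reg)
{
    unsigned shift = p->half_sel ? 16u : 0u;
    uint32_t mask  = (reg >> 16) << shift;
    uint32_t value = (reg & 0xFFFFu) << shift;
    p->port = (p->port & ~mask) | (value & mask);
}
```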
 

Online newbrain

  • Super Contributor
  • ***
  • Posts: 1719
  • Country: se
one can do
    GPIOx_BSRR = bus_lookup[new_bits & BUS_MASK] | (bus_lookup[(~new_bits) & BUS_MASK] << 16);
to set and clear the bits corresponding to the new bus state
A nice thing with STM32 BSRR is that setting bits takes priority over resetting them, so one can have slightly more efficient code with:
    GPIOx_BSRR = bus_lookup[new_bits & BUS_MASK] | (bus_lookup[BUS_MASK] << 16);
The bits set in the higher word of BSRR will have no effect if the corresponding bit is set in the lower word.
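This is a quick off-target check that the two forms are equivalent. `bsrr_apply()` models the BSRR behaviour described above (set wins). `bus_lookup()` is a hypothetical identity mapping of an 8-bit bus onto pins 0..7, standing in for the lookup table.

```c
#include <assert.h>
#include <stdint.h>

#define BUS_MASK 0xFFu

/* Model of an STM32 BSRR write: bits 0..15 set, bits 16..31 reset;
   when both are given for the same pin, set wins. */
static uint16_t bsrr_apply(uint16_t odr, uint32_t bsrr)
{
    uint16_t set   = (uint16_t)bsrr;
    uint16_t reset = (uint16_t)(bsrr >> 16);
    return (uint16_t)((odr & ~reset) | set);
}

/* Hypothetical identity mapping of an 8-bit bus onto pins 0..7. */
static uint16_t bus_lookup(uint32_t v) { return (uint16_t)(v & BUS_MASK); }

/* Original form: explicitly reset the complementary bits. */
static uint32_t bsrr_explicit(uint32_t new_bits)
{
    return bus_lookup(new_bits & BUS_MASK)
         | ((uint32_t)bus_lookup((~new_bits) & BUS_MASK) << 16);
}

/* Shortcut: reset every bus pin and rely on set having priority. */
static uint32_t bsrr_set_wins(uint32_t new_bits)
{
    return bus_lookup(new_bits & BUS_MASK)
         | ((uint32_t)bus_lookup(BUS_MASK) << 16);
}
```

The reset half of the shortcut is a constant, so it can be precomputed once. That leaves a single table lookup and OR per bus update.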
Nandemo wa shiranai wa yo, shitteru koto dake.
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8173
  • Country: fi
But quite frankly, the most usual case why you have to do non-atomic modification in two or more steps is not because you are switching too many (say 17) pins at a time, but because you are switching maybe three or four pins which stupidly happen to reside in different ports. Having a full routing matrix on other peripherals, so that you can get all those UARTs and SPIs etc. out of the way instead of having them in fixed positions (or two-three choices only), pretty much solves this issue, and is hugely helpful in many other ways, too. This has proved an excellent feature in nRF52 and it can't be that expensive to implement. Peripherals with special analog requirements would still need fixed mappings, of course.
« Last Edit: October 10, 2023, 07:02:58 am by Siwastaja »
 

Online newbrain

  • Super Contributor
  • ***
  • Posts: 1719
  • Country: se
Having a full routing matrix on other peripherals, so that you can get all those UARTs and SPIs etc. out of the way instead of having them in fixed positions (or two-three choices only), pretty much solves this issue, and is hugely helpful in many other ways, too. This has proved an excellent feature in nRF52 and it can't be that expensive to implement. Peripherals with special analog requirements would still need fixed mappings, of course.
Another example is the PSoC family, with full routing of peripheral modules (and much more...).
But I also like what they did with the RP2040: a single 32-bit GPIO register set, with a full complement of set, clear, and XOR registers, and a very liberal (and regular, easy to memorize!) mapping of SPI/I2C/UART. Not 100% flexible, but 99% sensible.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6264
  • Country: fi
    • My home page and email address
But quite frankly, the most usual case why you have to do non-atomic modification in two or more steps is not because you are switching too many (say 17) pins at a time, but because you are switching maybe three or four pins which stupidly happen to reside in different ports.
I admit, because of the stuff I do –– using microcontrollers to interface to sensors and displays as extensions to Linux SBCs –– I spend a lot of time picking the pin sets using various spreadsheets before writing any code, so I don't usually get bitten by that.

I wonder how much silicon would be needed for a few full-matrix GPIO output buffer registers?

For example, let's say you have one or more 32-bit buffer registers, each with a full bit mask across all GPIO banks (128-bit mask if four 32-bit GPIO banks, for example), identifying the pins each bit affects.  A write to such a register sets the corresponding pins in parallel, with conflicting bit masks (i.e. two different full bit masks having common bits set) yielding undefined/unpredictable/don't-care results?  While the write would have no delay, the actual GPIO state change could be delayed by the typical cross-clock-domain delay, as long as all GPIO banks change state simultaneously, plus a constant number of subsystem clock cycles to generate the actual GPIO bank changes. (There are various ways to generate the needed bit masks and changesets in C/VHDL/hardware, the simplest being a shifter, leading to 32 cycle latency with 32-bit GPIO registers.)

If you wanted read support, it'd need to select a combinatorial logic function (OR, NOR, AND, XOR), but otherwise be very similar.  (Each output bit is the result of that function across the GPIO input data register bits corresponding to set bits in the full bit mask.)  Note how the read trigger would be immediate, but the data would only be available after the aforementioned latency.  A simple latch/FIFO mechanism would probably be acceptable: reading returns the previous latch contents and causes the next read to latch on the next clock cycle or so, starting the new data propagating.

Note that this is not a crossbar-type situation, but a mask approach affecting all GPIO banks in parallel.
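The full-matrix buffer register idea can be prototyped in software before going near an FPGA. In this sketch, `matrix_gpio_model`, `matrix_write()` and `matrix_read_or()` are hypothetical, and four 32-bit banks are assumed. Overlapping masks are undefined in the proposal; here they simply let set win. The OR-combining read uses the output registers as a stand-in for the input data registers.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BANKS 4   /* four 32-bit GPIO banks = 128 pins (assumption) */
#define BUF_BITS  32

/* Each of the buffer register's 32 bits has a 128-bit pin mask
   spanning all banks, identifying the pins that bit drives. */
typedef struct {
    uint32_t bank[NUM_BANKS];                 /* GPIO output data registers */
    uint32_t pin_mask[BUF_BITS][NUM_BANKS];   /* per-bit full pin masks */
} matrix_gpio_model;

/* Write: collect set/clear masks for every bank first, then update all
   banks in the same step (the point of the proposal). */
static void matrix_write(matrix_gpio_model *m, uint32_t value)
{
    uint32_t clr[NUM_BANKS] = {0}, set[NUM_BANKS] = {0};
    for (int b = 0; b < BUF_BITS; b++)
        for (int k = 0; k < NUM_BANKS; k++) {
            if ((value >> b) & 1u) set[k] |= m->pin_mask[b][k];
            else                   clr[k] |= m->pin_mask[b][k];
        }
    for (int k = 0; k < NUM_BANKS; k++)
        m->bank[k] = (m->bank[k] & ~clr[k]) | set[k];
}

/* OR-combining read: bit b is 1 if any pin in its mask is high. */
static uint32_t matrix_read_or(const matrix_gpio_model *m)
{
    uint32_t out = 0;
    for (int b = 0; b < BUF_BITS; b++)
        for (int k = 0; k < NUM_BANKS; k++)
            if (m->bank[k] & m->pin_mask[b][k])
                out |= 1u << b;
    return out;
}
```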

I haven't done FPGA stuff yet, but I see this as an added/optional subsystem with access to the GPIO port data registers in parallel (needing a mask-and-new-state write access to all GPIO banks in parallel).  It'd be very interesting to test this with some real-world use cases on a soft-core, I think.
« Last Edit: October 10, 2023, 08:55:49 pm by Nominal Animal »
 

Online radiolistener

  • Super Contributor
  • ***
  • Posts: 3378
  • Country: ua
Any two pins will always have some phase difference, so you need to be careful with the PCB design. Keep the electrical lengths of the traces from your pins as close to each other as possible, so they have minimal phase difference. There is no way to add or remove delay for an individual pin, as you can on an FPGA. The phase delay for each pin is fixed and depends on the wiring layout on the die, so you can measure it and compensate with the trace layout on your PCB.

But I suspect that the jitter of the STM32 doesn't allow you to be very precise with the phase delay across all pins. It's better to use an FPGA for that. It also has jitter, but it will be better than the STM32's.
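The trace-length compensation suggested above comes down to converting a measured pin-to-pin skew into extra trace length. This sketch assumes you know the signal velocity factor for your stackup. On FR-4 it is often quoted at roughly half the speed of light, but treat that as an assumption and check with your fab. The function names are hypothetical.

```c
#include <assert.h>

/* Speed of light in vacuum, in millimetres per nanosecond. */
#define C_MM_PER_NS 299.792458

/* Propagation delay of a trace, given the velocity factor (signal speed
   as a fraction of c), which depends on stackup and trace geometry. */
static double trace_delay_ns(double length_mm, double velocity_factor)
{
    assert(velocity_factor > 0.0 && velocity_factor <= 1.0);
    return length_mm / (velocity_factor * C_MM_PER_NS);
}

/* Extra trace length that adds skew_ns of delay, e.g. to slow down a
   pin that was measured to switch earlier than the others. */
static double compensation_length_mm(double skew_ns, double velocity_factor)
{
    assert(velocity_factor > 0.0 && velocity_factor <= 1.0);
    return skew_ns * velocity_factor * C_MM_PER_NS;
}
```

At a velocity factor of 0.5, for example, 0.1 ns of skew corresponds to roughly 15 mm of extra trace.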
« Last Edit: October 10, 2023, 09:50:06 pm by radiolistener »
 

