Author Topic: STM32 - How many clock cycles does it take to setup a bus?  (Read 8180 times)


Offline Pack34Topic starter

  • Frequent Contributor
  • **
  • Posts: 753
STM32 - How many clock cycles does it take to setup a bus?
« on: August 19, 2016, 02:02:31 pm »
I'm attempting to squeeze every ounce of performance out of this chip, and I think my DAC routine is consuming more clock cycles than I expected. Due to the lack of available pins, I'm operating two 16-bit DACs off a single 8-bit bus. The current setup works, but I think I should be able to push it faster.

My question is: how many clock cycles does it take to set up a bus? Is it a single clock cycle for the entire operation, or do I burn a cycle for every pin on that bus?

I'm currently resetting the pins, applying the control signals to target the 8 bits in question, applying the actual data, and then moving on to the next 8-bit segment.

I'm attempting to get this, plus some basic integer math, down to 1.5 us, but I seem to be hitting a wall at about 3 or 4 us.

Example:
Code: [Select]
DAC_CONTROL_PORT->BSRRL = DAC_LDAC | DAC_MSB_CS | DAC_LSB_CS | DAC_X_WRITE | DAC_Y_WRITE;
DAC_CONTROL_PORT->BSRRH = DAC_X_WRITE | DAC_LSB_CS | DAC_DATA_PINS;
DAC_CONTROL_PORT->BSRRL = Output & 0x00FF;
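
For a sense of scale, the timing target can be converted into core clock cycles. This is only a rough budget sketch: it assumes the 168 MHz core clock mentioned later in the thread, and the helper name is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: whole core clock cycles available in a time budget.
 * clock_hz is the core clock; budget_ns is the budget in nanoseconds. */
static uint32_t cycles_in_budget(uint32_t clock_hz, uint32_t budget_ns)
{
    /* 64-bit intermediate avoids overflow of clock_hz * budget_ns */
    return (uint32_t)(((uint64_t)clock_hz * budget_ns) / 1000000000u);
}
```

At 168 MHz, a 1.5 us budget works out to 252 cycles and 4 us to 672 cycles, so a handful of register writes should not use up the budget on their own.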

 

Offline danadak

  • Super Contributor
  • ***
  • Posts: 1875
  • Country: us
  • Reactor Operator SSN-583, Retired EE
Re: STM32 - How many clock cycles does it take to setup a bus?
« Reply #1 on: August 19, 2016, 06:58:45 pm »
You can always look at the asm listing for the code and, using an
STM32 manual, figure out how many clocks are used. There might
be add-on tools for the IDE that do that for you, not sure.

Note the ARM website has the clocks per instruction for that core.


Regards, Dana.
Love Cypress PSOC, ATTiny, Bit Slice, OpAmps, Oscilloscopes, and Analog Gurus like Pease, Miller, Widlar, Dobkin, obsessed with being an engineer
 

Offline andersm

  • Super Contributor
  • ***
  • Posts: 1198
  • Country: fi
Re: STM32 - How many clock cycles does it take to setup a bus?
« Reply #2 on: August 19, 2016, 09:03:33 pm »
I don't know the details of your bus or your DAC, but you can probably combine the last two writes into one operation. I have a vague memory that the Cortex-M0/3/4 cores can't support back-to-back I/O writes, which was one of the improvements of the M0+ cores.
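
Combining the clear and the set into one operation could look like the sketch below. It relies on the STM32F4 BSRR layout, where the low half-word sets pins and the high half-word resets them, with the set bit taking priority when both are written for the same pin. The mask values and helper names are hypothetical, and the port is modelled with a plain variable so the logic can be checked off-target.

```c
#include <stdint.h>

/* Hypothetical pin assignment: data on pins 0-7, controls above them. */
#define DATA_PINS   0x00FFu
#define LSB_CS      0x0100u
#define X_WRITE     0x0400u

/* Build one 32-bit BSRR word: low half sets pins, high half resets pins. */
static uint32_t bsrr_word(uint16_t set_mask, uint16_t reset_mask)
{
    return ((uint32_t)reset_mask << 16) | set_mask;
}

/* Off-target model of what a BSRR write does to the output data register.
 * When a pin appears in both halves, the set bit wins. */
static uint16_t apply_bsrr(uint16_t odr, uint32_t word)
{
    uint16_t set   = (uint16_t)(word & 0xFFFFu);
    uint16_t reset = (uint16_t)(word >> 16);
    return (uint16_t)((odr & (uint16_t)~reset) | set);
}

/* One write: drive the selected strobes low and put `byte` on the data pins. */
static uint32_t dac_byte_word(uint16_t strobes_low, uint8_t byte)
{
    return bsrr_word(byte, (uint16_t)(strobes_low | DATA_PINS));
}
```

On target, the word would be stored with a single 32-bit write to the port's BSRR; headers that expose it as `BSRRL`/`BSRRH` halves, like the code in this thread, would need a 32-bit access to the same address. Whether the data and the strobes may change at the same instant depends on the DAC's setup timing.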

Offline langwadt

  • Super Contributor
  • ***
  • Posts: 5039
  • Country: dk
Re: STM32 - How many clock cycles does it take to setup a bus?
« Reply #3 on: August 20, 2016, 12:10:12 am »
turned on any kind of optimization?
 

Offline Pack34Topic starter

  • Frequent Contributor
  • **
  • Posts: 753
Re: STM32 - How many clock cycles does it take to setup a bus?
« Reply #4 on: August 20, 2016, 01:37:52 am »
turned on any kind of optimization?

Nope. I have to use free tools.
 

Offline MT

  • Super Contributor
  • ***
  • Posts: 1759
  • Country: aq
Re: STM32 - How many clock cycles does it take to setup a bus?
« Reply #5 on: August 20, 2016, 02:00:11 am »
Lots of free tools have several levels of optimization that can be turned on and tuned.
 
The following users thanked this post: Kilrah

Offline bobaruni

  • Regular Contributor
  • *
  • Posts: 156
  • Country: au
Re: STM32 - How many clock cycles does it take to setup a bus?
« Reply #6 on: August 20, 2016, 03:11:13 am »
It can be hard to predict how many clock cycles depending on if the code is in cache and also if it's in an ISR etc.
Which STM32?
What clock speed are you running?
Have you tried relocating the DAC output routine to RAM?
Have you tried just bit banging the port to see what the maximum speed obtainable is as maybe fetching/creating data for your DAC is what's taking lots of time?
Can we see more of the function sending data to the DAC? We can't help you optimise without seeing more code.
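
Relocating a routine to RAM with GCC is typically done with a section attribute, roughly as sketched below. The section name `.ramfunc` is an assumption: it only works if the linker script places that section in RAM and the startup code copies it there from flash. The function itself is a hypothetical stand-in for the real output routine.

```c
#include <stdint.h>

/* Assumed section name; must match a RAM-resident section in the linker script.
 * Off-target builds get an empty macro so the code still compiles and runs. */
#if defined(__GNUC__) && defined(__arm__)
#define RAMFUNC __attribute__((section(".ramfunc"), noinline))
#else
#define RAMFUNC
#endif

/* Hypothetical stand-in for the timing-critical routine:
 * next sawtooth point from the previous point and a step size. */
RAMFUNC uint16_t next_point(uint16_t last, uint16_t step)
{
    return (uint16_t)(last + step); /* wraps at 16 bits */
}
```

Whether this helps at all depends on the flash wait states and the flash accelerator settings, so it is worth measuring before and after.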
« Last Edit: August 20, 2016, 03:29:52 am by bobaruni »
 

Offline andersm

  • Super Contributor
  • ***
  • Posts: 1198
  • Country: fi
Re: STM32 - How many clock cycles does it take to setup a bus?
« Reply #7 on: August 20, 2016, 07:00:18 am »
turned on any kind of optimization?
Nope. I have to use free tools.
That's a complete non sequitur. There's no shortage of tools that are either completely free, or have code-size or other limitations that don't affect optimizations.
« Last Edit: August 20, 2016, 08:18:45 am by andersm »
 

Offline Pack34Topic starter

  • Frequent Contributor
  • **
  • Posts: 753
Re: STM32 - How many clock cycles does it take to setup a bus?
« Reply #8 on: August 20, 2016, 05:20:13 pm »
It can be hard to predict how many clock cycles depending on if the code is in cache and also if it's in an ISR etc.
Which STM32?
What clock speed are you running?
Have you tried relocating the DAC output routine to RAM?
Have you tried just bit banging the port to see what the maximum speed obtainable is as maybe fetching/creating data for your DAC is what's taking lots of time?
Can we see more of the function sending data to the DAC? We can't help you optimise without seeing more code.

I'm using the STM32F407VET6 at a clock speed of 168 MHz. I wasn't the original programmer for this, but I plugged the values being set into the STMCube tools and the resulting configuration confirmed the clock speed. Logically, I should be able to reach these speeds, since the system clock is orders of magnitude faster than the output rate I'm seeing.

I could blame it partially on how I'm handling the output triggers and the output waveform. I'm producing a triangle / sawtooth waveform using one timer and sending out pulse triggers using a separate timer. The separate timer is there to solve a phase issue: the output waveform is piped through some analog circuitry that adds a delay, so if I pulse the output triggers from the same routine in the firmware, the triggers and the waveform will no longer be in sync.


Can you elaborate on relocating the routine to RAM? It's a function call from within the interrupt that triggers at about 160 Hz. When I push it further, the output seems to "glitch" or stall periodically.


I don't like how the firmware was initially structured. If I were designing it from scratch, I would have used the timer interrupts only to set flags, so that the appropriate routines are called from the main program loop; that would clean it up and allow better priority scheduling, like an RTOS.
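
The flag-based structure described here could be sketched as follows. All names are hypothetical, and the timer interrupts are simulated by ordinary function calls so the sketch runs off-target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flags set in interrupt context and consumed in the main loop. */
static volatile bool dac_update_due;
static volatile bool trigger_due;

static uint32_t dac_updates;   /* work counters, for illustration only */
static uint32_t triggers_sent;

/* Hypothetical timer handlers: do nothing but record that work is due. */
void waveform_timer_isr(void) { dac_update_due = true; }
void trigger_timer_isr(void)  { trigger_due = true; }

/* One pass of the main loop: service flags in a fixed priority order. */
void main_loop_once(void)
{
    if (trigger_due) {          /* highest priority first */
        trigger_due = false;
        triggers_sent++;
    }
    if (dac_update_due) {
        dac_update_due = false;
        dac_updates++;
    }
}
```

The trade-off is that work done in the main loop starts with more jitter than work done directly in the interrupt, which may matter for the trigger timing.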

Since it's looking like a substantial rewrite, and since I want to get the DAC and triggering routine running at > 320 Hz (preferably 480 Hz), I'm going to take advantage of the situation to rev the hardware to use a larger chip with a dedicated 16 or 32 bit wide bus to the DACs, plus take advantage of the 216 MHz clock on the STM32F7 processors.

For the curious, below is the entire DAC routine.

Code: [Select]
void DAC_LT1657_Set(const uint8_t Channel, uint16_t Output)
{

// LOAD LSB
// set up DAC, all control pins high
DAC_CONTROL_PORT->BSRRL = DAC_LDAC | DAC_MSB_CS | DAC_LSB_CS | DAC_X_WRITE | DAC_Y_WRITE;

if(Channel == DAC_SELECT_X)
DAC_CONTROL_PORT->BSRRH = DAC_X_WRITE | DAC_LSB_CS | DAC_DATA_PINS; // clear X select, LSB select and data pins
else
DAC_CONTROL_PORT->BSRRH = DAC_Y_WRITE | DAC_LSB_CS | DAC_DATA_PINS; // clear Y select, LSB select and data pins

DAC_CONTROL_PORT->BSRRL = Output & 0x00FF; // set LSB data pins
Output = Output >> 8; // set up data for next, DAC needs 60nS

// LOAD MSB
// set up DAC, all control pins high
DAC_CONTROL_PORT->BSRRL = DAC_LDAC | DAC_MSB_CS | DAC_LSB_CS | DAC_X_WRITE | DAC_Y_WRITE;

if(Channel == DAC_SELECT_X)
DAC_CONTROL_PORT->BSRRH = DAC_X_WRITE | DAC_MSB_CS | DAC_DATA_PINS; // clear X select, MSB select and data pins
else
DAC_CONTROL_PORT->BSRRH = DAC_Y_WRITE | DAC_MSB_CS | DAC_DATA_PINS; // clear Y select, MSB select and data pins

DAC_CONTROL_PORT->BSRRL = Output & 0x00FF; // set MSB data pins
asm("nop"); // delay, DAC needs 60nS
asm("nop"); // delay, DAC needs 60nS

// SET DAC OUTPUT
DAC_CONTROL_PORT->BSRRL = DAC_LDAC | DAC_MSB_CS | DAC_LSB_CS | DAC_X_WRITE | DAC_Y_WRITE;
DAC_CONTROL_PORT->BSRRH = DAC_LDAC; // Clear LDAC
asm("nop"); // delay, DAC needs 60nS
asm("nop"); // delay, DAC needs 60nS
asm("nop"); // delay, DAC needs 60nS
asm("nop"); // delay, DAC needs 60nS
asm("nop"); // delay, DAC needs 60nS
asm("nop"); // delay, DAC needs 60nS
DAC_CONTROL_PORT->BSRRL = DAC_LDAC; // Set LDAC
}
 

Offline Pack34Topic starter

  • Frequent Contributor
  • **
  • Posts: 753
Re: STM32 - How many clock cycles does it take to setup a bus?
« Reply #9 on: August 20, 2016, 05:24:49 pm »
turned on any kind of optimization?
Nope. I have to use free tools.
That's a complete non sequitur. There's no shortage of tools that are either completely free, or have code-size or other limitations that don't affect optimizations.

For this specific design it's all programmed using CooCox CoIDE. I haven't found any optimization configurations in the IDE. Since it seems to no longer be in active development I'm shifting towards STM's System Workbench since it's also Eclipse based (minimal re-learning of the tool) and since it's supported directly by the manufacturer of the chip I hopefully shouldn't need to switch again for the same reason.
 

Offline mikerj

  • Super Contributor
  • ***
  • Posts: 3466
  • Country: gb
Re: STM32 - How many clock cycles does it take to setup a bus?
« Reply #10 on: August 20, 2016, 05:38:03 pm »
For this specific design it's all programmed using CooCox CoIDE.

CoIDE is just an eclipse based IDE, it uses the GCC ARM compiler which has numerous optimisation features.
 

Offline Pack34Topic starter

  • Frequent Contributor
  • **
  • Posts: 753
Re: STM32 - How many clock cycles does it take to setup a bus?
« Reply #11 on: August 20, 2016, 05:46:53 pm »
For this specific design it's all programmed using CooCox CoIDE.

CoIDE is just an eclipse based IDE, it uses the GCC ARM compiler which has numerous optimisation features.

I'm aware it's based on eclipse, but I have not seen any hooks or menu items to fine tune optimization.
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 28846
  • Country: nl
    • NCT Developments
Re: STM32 - How many clock cycles does it take to setup a bus?
« Reply #12 on: August 20, 2016, 06:03:01 pm »
There should be some way to add options to the GCC compiler (if it uses GCC under the hood). -O3 is the one for maximum performance, but don't look at the flash usage. Eclipse CDT (the generic C/C++ development environment for Eclipse) allows separate options for individual source files, and every now and then I use that to compile only a timing-critical piece of code with -O3 and the rest with -Os (optimise for size). BTW it is worthwhile to spend some time on how the compilation process works. It saves you from dealing with the idiosyncrasies and limitations imposed by some crippled IDEs.
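
Besides per-file flags, GCC also accepts an optimization level per function through the `optimize` attribute. This is a different mechanism from the per-file options described above, and the GCC manual cautions that it is meant for debugging rather than production code. A minimal sketch with a hypothetical function:

```c
#include <stdint.h>

#if defined(__GNUC__) && !defined(__clang__)
#define HOT __attribute__((optimize("O3")))
#else
#define HOT /* attribute not available; falls back to the file's flags */
#endif

/* Hypothetical hot function: sum of n sawtooth points with a fixed step. */
HOT uint32_t sum_points(uint16_t start, uint16_t step, uint32_t n)
{
    uint32_t sum = 0;
    uint16_t v = start;
    for (uint32_t i = 0; i < n; i++) {
        sum += v;
        v = (uint16_t)(v + step); /* wraps at 16 bits */
    }
    return sum;
}
```

For anything beyond a quick experiment, compiling the timing-critical file with -O3 on its own, as described above, is probably the more robust route.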
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4408
  • Country: us
Re: STM32 - How many clock cycles does it take to setup a bus?
« Reply #13 on: August 20, 2016, 06:17:28 pm »
Did you imply in the OP that you're using the external memory bus interface to an external DAC, rather than an internal DAC or a gpio port?  If so, speed is probably determined by the setup of the bus interface.  Perhaps it can be tuned.  Perhaps you could go faster by treating the bus as gpio and bit banging everything, if the DACs are all that is on the bus.

 
The following users thanked this post: mubes

Offline Pack34Topic starter

  • Frequent Contributor
  • **
  • Posts: 753
Re: STM32 - How many clock cycles does it take to setup a bus?
« Reply #14 on: August 20, 2016, 06:30:50 pm »
Did you imply in the OP that you're using the external memory bus interface to an external DAC, rather than an internal DAC or a gpio port?  If so, speed is probably determined by the setup of the bus interface.  Perhaps it can be tuned.  Perhaps you could go faster by treating the bus as gpio and bit banging everything, if the DACs are all that is on the bus.

All of the waveform information is stored on the ARM chip. For most of the waveforms the points are calculated on-the-fly due to the limited amount of storage. In this specific setup it's simple addition of the last point generated and a step size.

The DAC bus is set up as regular GPIO:
Code: [Select]
GPIO_InitStructure.GPIO_Pin = GPIO_Pin_All;
GPIO_InitStructure.GPIO_Mode = GPIO_Mode_OUT;
GPIO_InitStructure.GPIO_OType = GPIO_OType_PP;
GPIO_InitStructure.GPIO_PuPd = GPIO_PuPd_NOPULL;
GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz;
GPIO_Init(GPIOE, &GPIO_InitStructure);

No care seems to have been taken to length-match the bus traces, but the propagation delay calculated in PADs is well within +/- 1 ns for each bit on the bus. I should be able to push this ten times faster before that's an issue.
 

Offline andersm

  • Super Contributor
  • ***
  • Posts: 1198
  • Country: fi
Re: STM32 - How many clock cycles does it take to setup a bus?
« Reply #15 on: August 20, 2016, 07:39:36 pm »
I haven't found any optimization configurations in the IDE.
You should try to never let that limit you.

If the timings of your bus allows for setting the data bits at the same time as the control lines, one optimization you could try is storing the data as 16-bit, and baking the control line bits into the upper 8 bits. It'll consume twice as much space, but also reduce the latter half to one memory load and one store. The next step after that is to bake the other control line manipulation into the same array, at which point you can DMA everything into the port output register.
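
The table idea could be sketched as below: each entry is a complete 16-bit port value with the control bits already merged in, so the output loop (or a DMA channel) only has to copy words to the port. The bit positions are hypothetical, active-low strobes are assumed from the code earlier in the thread, and the port is a plain variable here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: data on bits 0-7, active-low controls on bits 8-12. */
#define LDAC     0x0100u
#define MSB_CS   0x0200u
#define LSB_CS   0x0400u
#define X_WRITE  0x0800u
#define Y_WRITE  0x1000u
#define CTRL_ALL (LDAC | MSB_CS | LSB_CS | X_WRITE | Y_WRITE)

/* Port value with every control high except those in `low`, plus a data byte. */
static uint16_t port_word(uint16_t low, uint8_t data)
{
    return (uint16_t)((CTRL_ALL & ~low) | data);
}

/* Expand one X-channel sample into the two port words that latch it. */
static void bake_sample_x(uint16_t sample, uint16_t out[2])
{
    out[0] = port_word(X_WRITE | LSB_CS, (uint8_t)(sample & 0xFFu));
    out[1] = port_word(X_WRITE | MSB_CS, (uint8_t)(sample >> 8));
}

/* Copy a baked table to the port; on target this is the job DMA could take over. */
static void play(volatile uint16_t *port, const uint16_t *table, size_t n)
{
    for (size_t i = 0; i < n; i++)
        *port = table[i];
}
```

The sketch leaves out the steps that return the strobes high between bytes and pulse LDAC; those would be extra table entries. Writing whole port words also assumes that every pin on the port belongs to the DAC bus.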

Offline hamdi.tn

  • Frequent Contributor
  • **
  • Posts: 629
  • Country: tn
Re: STM32 - How many clock cycles does it take to setup a bus?
« Reply #16 on: August 20, 2016, 08:07:07 pm »
STMCube tools

To be honest, I didn't understand what you're trying to do with this DAC, how your code should flow, and what your routine is supposed to do.

Just one detail about code generated with STMCube, probably with the HAL library: it adds much more code than what you write yourself. If your routine is executed from an interrupt, for example, you may experience a bit of delay caused by all the sub-routines and the checks of various things that the HAL library runs before your code.



 
 

Offline T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 22435
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: STM32 - How many clock cycles does it take to setup a bus?
« Reply #17 on: August 20, 2016, 10:39:02 pm »
FYI,

While I haven't worked personally with STM parts, I've looked at some code (specifically for the STM32F4).  The copypasta and HAL and code generation tools produce deplorable results.  It's no wonder they have to put 256kB of Flash in the poor things; if they took the spam out of their tools, they could do it in 64k or less!

And I mean just the C code, I didn't even look at the assembler output.  But that's not usually too painful, as long as the compiler is good (ARM is pretty damn common today, so there shouldn't be a single compiler still working in the stone ages...), and optimization is turned on.

From reading the docs, it's not obvious if IO/peripherals hold the CPU in wait states during transactions.  If they're clocked much slower than the CPU, one would hope this is the case...  Anyway, if so, this should be the only limit to the speed of IO operations: updating registers, at best, one cycle at a time (whether it's a CPU or IO clock cycle, or a bus cycle consisting of many clock cycles).

Whether you can achieve this optimal rate with any of the standard approaches, who knows.  Any code generation tools are likely to have highly suspect output.  Make sure your initialization and operation functions are consistent, and don't faff around calling a thousand other functions.  Put in the register-poking statements yourself, if you have to.

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline Pack34Topic starter

  • Frequent Contributor
  • **
  • Posts: 753
Re: STM32 - How many clock cycles does it take to setup a bus?
« Reply #18 on: August 20, 2016, 11:39:01 pm »
FYI,

While I haven't worked personally with STM parts, I've looked at some code (specifically for the STM32F4).  The copypasta and HAL and code generation tools produce deplorable results.  It's no wonder they have to put 256kB of Flash in the poor things; if they took the spam out of their tools, they could do it in 64k or less!

And I mean just the C code, I didn't even look at the assembler output.  But that's not usually too painful, as long as the compiler is good (ARM is pretty damn common today, so there shouldn't be a single compiler still working in the stone ages...), and optimization is turned on.

From reading the docs, it's not obvious if IO/peripherals hold the CPU in wait states during transactions.  If they're clocked much slower than the CPU, one would hope this is the case...  Anyway, if so, this should be the only limit to the speed of IO operations: updating registers, at best, one cycle at a time (whether it's a CPU or IO clock cycle, or a bus cycle consisting of many clock cycles).

Whether you can achieve this optimal rate with any of the standard approaches, who knows.  Any code generation tools are likely to have highly suspect output.  Make sure your initialization and operation functions are consistent, and don't faff around calling a thousand other functions.  Put in the register-poking statements yourself, if you have to.

Tim

Going through some of the HAL stuff for the M7 (standard libraries are not supported), the only way to describe it is "funky."
 

Offline hamdi.tn

  • Frequent Contributor
  • **
  • Posts: 629
  • Country: tn
Re: STM32 - How many clock cycles does it take to setup a bus?
« Reply #19 on: August 21, 2016, 12:37:17 am »
Going through some of the HAL stuff for the M7 (standard libraries are not supported), the only way to describe it is "funky."

I've been playing with an F7 these past days too. The eval board is awesome, the library is easy to use, and you can start your own application from one of the examples already supplied with it. But it's a total nightmare to maintain, and to go through the code and get rid of the useless stuff they put in there. Just looking at all those switch/case and if statements that check that the HAL library is initialized and that some variable or other is set, only to finally set a bit in a register, so you end up with 10 lines of code instead of one, is too disturbing.

FYI,

While I haven't worked personally with STM parts, I've looked at some code (specifically for the STM32F4).  The copypasta and HAL and code generation tools produce deplorable results.  It's no wonder they have to put 256kB of Flash in the poor things; if they took the spam out of their tools, they could do it in 64k or less!

Just compiling a ready-to-use template, where the micro does nothing except init clocks and sit in a while loop, costs some kB of flash.
Doing the same with the standard library gives much better results.
We've talked about HAL here so many times, and every time I can't figure out ... WHY ?? :palm:
 

Offline C

  • Super Contributor
  • ***
  • Posts: 1346
  • Country: us
Re: STM32 - How many clock cycles does it take to setup a bus?
« Reply #20 on: August 21, 2016, 12:40:35 am »

Try very simple IO with no masks.

output 0x0055
output 0x00AA
output 0x0055
output 0x00AA
output 0x0055
output 0x00AA

These values cause each bit to go low while the bits on either side go high.
A sequence like this should show the maximum rate.
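
A bare toggle test along these lines could be sketched as follows; `port` stands in for the GPIO output data register and the function name is hypothetical. On hardware, a scope on the pins then shows the fastest rate at which the port can actually be written.

```c
#include <stdint.h>

/* Write alternating 0x55/0xAA patterns with no masking or read-modify-write.
 * Returns the number of writes issued. */
static uint32_t toggle_test(volatile uint16_t *port, uint32_t pairs)
{
    uint32_t writes = 0;
    for (uint32_t i = 0; i < pairs; i++) {
        *port = 0x0055u;
        *port = 0x00AAu;
        writes += 2;
    }
    return writes;
}
```

The loop overhead shows up as a longer gap after every second write, so writing the stores out back to back gives a cleaner measurement.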

 

Offline bobaruni

  • Regular Contributor
  • *
  • Posts: 156
  • Country: au
Re: STM32 - How many clock cycles does it take to setup a bus?
« Reply #21 on: August 21, 2016, 12:55:24 am »
Your code is running orders of magnitude slower than it should; going to an F7 might help, but you may not need it.
On an F4 running at the same speed, I can get tens of MHz of data out to a DAC, but not using an interrupt for each byte.
1. Please check the actual clock frequency is correct.
2. Try changing from debug to release to see if that speeds things up.
3. You can hand optimise the output routine slightly and mark it as inline, this should speed things up.

Instead of calling with the channel number, just call it with DAC_X_WRITE or DAC_Y_WRITE depending on the channel you want.

Code: [Select]
inline void DAC_LT1657_Set(const uint8_t Channel_Select_Bit, uint16_t Output)
{

 // LOAD LSB
 // set up DAC, all control pins high
 DAC_CONTROL_PORT->BSRRL = DAC_LDAC | DAC_MSB_CS | DAC_LSB_CS | DAC_X_WRITE | DAC_Y_WRITE;

 DAC_CONTROL_PORT->BSRRH = Channel_Select_Bit | DAC_LSB_CS | DAC_DATA_PINS; // clear channel select, LSB select and data pins

 DAC_CONTROL_PORT->BSRRL = Output & 0x00FF; // set LSB data pins

 // LOAD MSB
 // set up DAC, all control pins high
 DAC_CONTROL_PORT->BSRRL = DAC_LDAC | DAC_MSB_CS | DAC_LSB_CS | DAC_X_WRITE | DAC_Y_WRITE;

 DAC_CONTROL_PORT->BSRRH = Channel_Select_Bit | DAC_MSB_CS | DAC_DATA_PINS; // clear channel select, MSB select and data pins

 DAC_CONTROL_PORT->BSRRL = Output >> 8; // set MSB data pins
 asm("nop");         // delay, DAC needs 60nS
 asm("nop");         // delay, DAC needs 60nS

 // SET DAC OUTPUT
 DAC_CONTROL_PORT->BSRRL = DAC_LDAC | DAC_MSB_CS | DAC_LSB_CS | DAC_X_WRITE | DAC_Y_WRITE;
 DAC_CONTROL_PORT->BSRRH = DAC_LDAC;   // Clear LDAC
 asm("nop");         // delay, DAC needs 60nS
 asm("nop");         // delay, DAC needs 60nS
 asm("nop");         // delay, DAC needs 60nS
 asm("nop");         // delay, DAC needs 60nS
 asm("nop");         // delay, DAC needs 60nS
 asm("nop");         // delay, DAC needs 60nS
 DAC_CONTROL_PORT->BSRRL = DAC_LDAC;   // Set LDAC
}


Also, please post the actual timer ISR. What else is the F4 doing while the timer is firing?
You need to post more code for us to help.
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4408
  • Country: us
Re: STM32 - How many clock cycles does it take to setup a bus?
« Reply #22 on: August 28, 2016, 04:13:01 am »
Quote
Your code is running orders of magnitude slower than it should
What Bob said...
the actual code that you quote looks basically OK, so the problem is elsewhere...
Using the slow and bloated cube/HAL code to set things up should not affect the performance of the actual IO, which you are already doing using direct register access...
Even without optimization, things should be faster than you describe, but it's definitely time to look at the produced code to see if there are any obvious issues.

 

Offline richardman

  • Frequent Contributor
  • **
  • Posts: 427
  • Country: us
Re: STM32 - How many clock cycles does it take to setup a bus?
« Reply #23 on: August 28, 2016, 07:27:27 pm »
@Pack34

IO Register access is through constant address access, which can take extra cycles on the ARM. If you are doing the sequence of accesses in a loop or in the same function, cache the address in a register, e.g.

unsigned *dac_bsrrl = &(DAC_CONTROL_PORT->BSRRL);

(May have to be unsigned short *, depending on how BSRRL field is declared), then your access is just

*dac_bsrrl = ...

which is just a single STR instruction.

Ditto with the constant masks on the right hand side. If you use the same mask bits in multiple places, then use a local variable (which will almost certainly be assigned to a register) to cache the value.

-O3 should do these too, but if you only need to optimize a small code fragment, as in your case, then hand-optimizing it like this can be very effective.
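
Put together, the pointer and mask caching could look like the sketch below. The struct is a minimal stand-in for the port registers used in this thread, and the function name and mask values are hypothetical.

```c
#include <stdint.h>

/* Minimal stand-in for the port registers used in this thread. */
typedef struct {
    volatile uint16_t BSRRL; /* set half */
    volatile uint16_t BSRRH; /* reset half */
} fake_port_t;

/* Write one byte with the register addresses and the mask held in locals. */
void write_byte(fake_port_t *port, uint16_t strobes, uint8_t byte)
{
    volatile uint16_t *set   = &port->BSRRL; /* address computed once */
    volatile uint16_t *reset = &port->BSRRH;
    const uint16_t clear_mask = (uint16_t)(strobes | 0x00FFu);

    *reset = clear_mask;  /* strobes low, data pins cleared */
    *set   = byte;        /* data bits set */
    *set   = strobes;     /* strobes back high */
}
```

On target, `port` would be the real GPIO block, and the compiler listing shows whether each store really became a single STR.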
// richard http://imagecraft.com/
JumpStart C++ for Cortex (compiler/IDE/debugger): the fastest easiest way to get productive on Cortex-M.
Smart.IO: phone App for embedded systems with no app or wireless coding
 

