AtmelStudio SAMD21, running slow and sysclk_init() error

Posted by Jester on 28 Jan, 2019 03:08
Disclosure: I'm a hardware guy, not a programmer...

I'm trying to get a SAMD21 up and running properly so I can begin writing some useful code, and have struggled with AtmelStudio (first day). I'm now at the stage where I can program the part via SWD; for this initial test I'm using an Adafruit ItsyBitsy M0 Express board. It must be using the internal oscillator, as there is no xtal on the board.

With the following blinky code, I have a pin toggling, however only at 500kHz, I was hoping for something closer to 48MHz.
int main (void)
{
   system_init();
//   sysclk_init();
   board_init();

   ioport_init(); // call before using IOPORT service
   ioport_set_pin_dir(LED0, IOPORT_DIR_OUTPUT); // LED pin set as output
   ioport_set_pin_level(LED0, IOPORT_PIN_LEVEL_HIGH); // switch LED off

   /* Insert application code here, after the board has been initialized. */
   while (1) {
      ioport_set_pin_level(LED0, IOPORT_PIN_LEVEL_LOW); // drive LED pin low
      //delay_ms(400);
      ioport_set_pin_level(LED0, IOPORT_PIN_LEVEL_HIGH); // drive LED pin high
   }
}

My guess is there are two issues:
1) I'm using this code in conf_board.h, and it's not obvious how I would need to tweak it to get the maximum internal clock frequency:

#ifndef CONF_BOARD_H

#define CONF_BOARD_H

      // clock resonators
      #define BOARD_FREQ_SLCK_XTAL (32768U)
      #define BOARD_FREQ_SLCK_BYPASS (32768U)
      #define BOARD_FREQ_MAINCK_XTAL 0
      #define BOARD_FREQ_MAINCK_BYPASS 0
      #define BOARD_MCK CHIP_FREQ_CPU_MAX
      #define BOARD_OSC_STARTUP_US 15625

#endif // CONF_BOARD_H

2) I used ASF to add the libraries for I/O and the clock system, and the internet examples I found call sysclk_init(). However, when I uncomment the call to sysclk_init(), I get an "implicit declaration of function 'sysclk_init' [-Werror=implicit-function-declaration]" error.

Help Please

#1 Reply
Posted by ataradov on 28 Jan, 2019 03:23
It is not possible even in theory to toggle a pin at 48 MHz on a 48 MHz MCU.

The maximum you can get with a basic, naive approach, but with more or less optimal code, is 6 MHz using the set/clear functions, and probably 12 MHz using the toggle function.

ASF code is somewhat bloated, so 500 kHz seems reasonable here.
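As a rough sanity check on those numbers: the frequency a software loop produces is just the CPU clock divided by the number of cycles one complete high+low period takes. A minimal sketch, assuming a fixed cycle count per period (the function name is hypothetical, for illustration only):

```c
#include <stdint.h>

/* Square-wave frequency produced by a software loop whose complete
 * high+low period takes `cycles_per_period` CPU cycles.
 * Returns 0 for a zero cycle count instead of dividing by zero. */
uint32_t loop_pin_freq_hz(uint32_t cpu_hz, uint32_t cycles_per_period)
{
    if (cycles_per_period == 0)
        return 0;
    return cpu_hz / cycles_per_period;
}
```

At 48 MHz, 8 cycles per period gives 6 MHz and 4 cycles gives 12 MHz. The observed 500 kHz works out to 96 cycles per period at 48 MHz, or 16 cycles at 8 MHz, which is one way to tell which clock you are really running on once you know the loop's cycle count.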

#2 Reply
Posted by westfw on 28 Jan, 2019 05:01
Yeah; Arduino gets about 300kHz, so 500kHz from ASF doesn't seem awful. I did get about 12MHz using PORT_IOBUS (the single-cycle pin-access feature offered by the SAMD21): https://forums.adafruit.com/viewtopic.php?f=57&t=133497#p668379
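For reference, the IOBUS trick is just a second address alias for the same PORT registers. A minimal sketch of the address arithmetic, assuming the SAMD21 values as I understand them from the datasheet/CMSIS headers (PORT at 0x41004400, IOBUS alias at 0x60000000, 0x80 bytes per pin group, OUTCLR/OUTSET/OUTTGL at 0x14/0x18/0x1C; the 0x14/0x18 offsets match the `#20`/`#24` stores in the disassembly later in this thread). The macro and function names are hypothetical, and the values should be checked against your own device header:

```c
#include <stdint.h>

/* ASSUMED SAMD21 addresses; verify against your device header. */
#define SAMD21_PORT_APB     0x41004400u  /* PORT on the APB bus        */
#define SAMD21_PORT_IOBUS   0x60000000u  /* single-cycle IOBUS alias   */
#define PORT_GROUP_STRIDE   0x80u        /* PA, PB, ... are 0x80 apart */
#define PORT_OUTCLR_OFFSET  0x14u
#define PORT_OUTSET_OFFSET  0x18u
#define PORT_OUTTGL_OFFSET  0x1Cu

/* Address of a PORT register in the group containing `pin`
 * (pin 0..31 = PA00..PA31, 32..63 = PB00..PB31). */
uint32_t port_reg_addr(uint32_t base, unsigned pin, uint32_t offset)
{
    return base + (pin / 32u) * PORT_GROUP_STRIDE + offset;
}

/* Bit mask for `pin` within its group. */
uint32_t port_pin_mask(unsigned pin)
{
    return 1u << (pin % 32u);
}

/* Target-only: toggle `pin` with a single store through the IOBUS
 * alias. Calling this on a PC will fault. */
void port_iobus_toggle(unsigned pin)
{
    volatile uint32_t *outtgl = (volatile uint32_t *)(uintptr_t)
        port_reg_addr(SAMD21_PORT_IOBUS, pin, PORT_OUTTGL_OFFSET);
    *outtgl = port_pin_mask(pin);
}
```

Unrolling several `port_iobus_toggle()` calls back to back is what gets close to the 12 MHz figure; a loop around a single call adds the branch overhead.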

#3 Reply
Posted by Jester on 28 Jan, 2019 10:56
My "closer to 48MHz" comment was a bit tongue in cheek. That being said, I have always preferred Harvard architecture from an "x MHz = x instructions" perspective.

I can live with 12MHz, but would prefer higher. I'm not sure if I can enable a clock output for a set number of pulses; would PWM mode work?

Back to the original problem: surely a 48MHz controller can do something useful at more than 1/50 of its specified clock speed when running regular C. I'm guessing that whatever configuration this is operating on at the moment is much slower than 48MHz, perhaps 8MHz.

Can anyone comment on how to correct the clock configuration, when using the internal oscillator?

#4 Reply
Posted by JPortici on 28 Jan, 2019 11:25
Check the assembly output!!!
Anyway, in these microcontrollers there is usually a way to issue one pin toggle per instruction (and you should cascade several of those, because otherwise the looping through the main while(1) becomes a significant portion of the program, giving misleading results).
With pipelining it becomes, in theory, one toggle per clock, but you are ultimately limited by:
- Internal bus frequency
- GPIO bus frequency
- GPIO drive strength -> rise/fall time

This is why you have 400+ MHz controllers that can't toggle an IO at more than 50 MHz.
The correct way to measure the system clock would be to set up a timer that toggles a pin at a known frequency; or, if the MCU has a reference clock output module, to output the system clock (or a divided-down version of it); or to set up a PWM channel and see whether the carrier frequency matches your calculations.
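The timer approach boils down to a ratio: count how many ticks of the unknown clock fit into a known number of ticks of a trusted reference (for example a 32.768 kHz crystal). A minimal sketch of that arithmetic only; how you actually capture the two counts (which timer, which reference) is device-specific and not shown, and the function name is hypothetical:

```c
#include <stdint.h>

/* Estimate an unknown clock frequency from two counts taken over the
 * same time window:
 *   unknown_ticks - ticks of the clock being measured (e.g. the CPU clock)
 *   ref_ticks     - ticks of a reference clock of known frequency ref_hz
 * f_unknown = unknown_ticks * ref_hz / ref_ticks, rounded to the nearest Hz. */
uint32_t estimate_clock_hz(uint32_t unknown_ticks, uint32_t ref_ticks,
                           uint32_t ref_hz)
{
    if (ref_ticks == 0)
        return 0;                      /* no reference edges: no estimate */
    uint64_t num = (uint64_t)unknown_ticks * ref_hz;
    return (uint32_t)((num + ref_ticks / 2u) / ref_ticks);
}
```

The longer the window (more `ref_ticks`), the smaller the ±1-tick quantization error becomes relative to the result.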

#5 Reply
Posted by ataradov on 28 Jan, 2019 17:08
Quote from: Jester on 28 Jan, 2019 10:56
My 48MHz comment "closer" was a bit tongue in cheek. That being said, I have always preferred Harvard architecture from a x MHz = x instructions perspective.
This is mostly true for Cortex-M, but you need to keep memory limitations in mind too. At 48 MHz you are running with 1 flash wait state. But since the CPU does 32-bit fetches and most instructions are 16-bit, it all balances out. In some degenerate cases, though, the code may be slower because of that; then it is possible to place parts of the program into RAM.
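A minimal sketch of picking the flash wait-state count from the CPU clock. It ASSUMES the 2.7 to 3.63 V operating range, where my reading of the SAMD21 "Maximum Operating Frequency" table is 0 wait states up to 24 MHz and 1 wait state up to 48 MHz; lower supply voltages need more. Check your datasheet before relying on it. The function name is hypothetical; in ASF the result would go into `CONF_CLOCK_FLASH_WAIT_STATES`:

```c
#include <stdint.h>

/* Flash (NVM) wait states for a given CPU clock.
 * ASSUMPTION: VDD 2.7-3.63 V, where each wait state buys ~24 MHz
 * (0 WS up to 24 MHz, 1 WS up to 48 MHz). Verify in the datasheet. */
uint32_t nvm_wait_states(uint32_t cpu_hz)
{
    const uint32_t hz_per_step = 24000000u;
    if (cpu_hz == 0)
        return 0;
    return (cpu_hz - 1u) / hz_per_step;   /* ceil(cpu_hz / 24 MHz) - 1 */
}
```

Too few wait states can make flash reads unreliable; too many just waste cycles, which is the "pumpkin" failure mode mentioned further down the thread.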

Quote from: Jester on 28 Jan, 2019 10:56
I can live with 12MHz, would prefer higher, I'm not sure if I can enable a clock output for a set # of pulses, using PWM mode, that might work?
If you use PWM, then you can normally output up to Fper/2. So if you use a regular TC clocked from 48 MHz, then you can get up to 24 MHz. But if you use a TCC clocked from 96 MHz, then you can get the full 48 MHz. The problem is controlling the exact number of cycles. It all depends on your actual goal.
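The Fper/2 ceiling falls out of the single-slope (normal) PWM formula, f_PWM = f_GCLK / (N × (TOP + 1)), where N is the prescaler. A minimal sketch of that formula (the function name is hypothetical):

```c
#include <stdint.h>

/* Single-slope (normal) PWM frequency for a TC/TCC:
 *   f_pwm = f_gclk / (prescaler * (top + 1))
 * top = 1 with prescaler 1 gives the f_gclk / 2 maximum.
 * Returns 0 if the divisor would be zero. */
uint32_t pwm_freq_hz(uint32_t gclk_hz, uint32_t prescaler, uint32_t top)
{
    uint64_t div = (uint64_t)prescaler * ((uint64_t)top + 1u);
    return div ? (uint32_t)(gclk_hz / div) : 0u;
}
```

At TOP = 1 the only useful duty cycle is 50%, which is fine for a clock output but leaves no PWM resolution.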

Quote from: Jester on 28 Jan, 2019 10:56
Back to the original problem, surely a 48MHz controller can do something useful at more than 1/50 of its specified clock speed when running regular C.
It does a lot with this code. It takes your input parameters and calculates what register it should access to toggle the required pins. And I can see it taking 50 instructions to do that.

If you want more optimal code - write more optimal code. You are using a framework - it is a compromise between the convenience, speed and code size.

Quote from: Jester on 28 Jan, 2019 10:56
I'm guessing that whatever configuration this is operating on at the moment is much slower than 48MHz, perhaps 8MHz
Not really, it looks like it is running at 48 MHz.

Quote from: Jester on 28 Jan, 2019 10:56
Can anyone comment on how to correct the clock configuration, when using the internal oscillator?
It is hard to tell what you have at the moment. If it is a clean ASF code, then post conf_clocks.h file. Otherwise, post your complete project.

#6 Reply
Posted by westfw on 29 Jan, 2019 08:50
Ok, I pasted your main() code into AS7 and looked at the compiler output. It seems that the ioport_*() functions are actually pretty well optimized in the current-ish ASF (v3.40.0) for SAMD21 (inlined, in fact). The loop in main becomes:
Code: [Select]
 6b2:   001a        movs    r2, r3
 6b4:   000b        movs    r3, r1
 6b6:   6153        str     r3, [r2, #20]
        arch_ioport_pin_to_base(pin)->OUTSET.reg = arch_ioport_pin_to_mask(pin);
 6b8:   6193        str     r3, [r2, #24]
 6ba:   e7fc        b.n     6b6 <main+0x22>
Which is about as good as you can do. Given that, I think you are correct that it should be running faster than 500kHz, if your clock rate were actually 48MHz.

I can't help much with a corrected clock setup. ASF seems to only make the SAMD's confusing clock system even more confusing, so I gave up and wrote bare-metal code (also, I never used the internal clocks). :-( However:
- To get 48MHz from internal clocks, you'll need either the DFLL or DPLL
- Those both need an input frequency of around 32kHz.
- If you want to use the 8MHz internal clock as input, you need to divide it down to 32kHz first.
- (so you might as well use the internal 32kHz clock)
I do have my "bare metal" code for running at 48MHz based on an 8MHz external clock here: https://github.com/WestfW/SAMD10-experiments/blob/master/D10-LED_TOGGLE0/src/UserSource/led_toggle.c#L16 It's for a SAMD10, which theoretically has a "similar" clock setup. But beware of subtle differences between SAMD families (I don't know if any come into play here, but there could be).
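To make the "multiply a ~32 kHz reference up to 48 MHz" step concrete: the SAMD21 FDPLL96M output is f_out = f_ref × (LDR + 1 + LDRFRAC/16). A minimal sketch that picks LDR/LDRFRAC for a target frequency, rounding to the nearest 1/16 step. The type and function names are hypothetical, the datasheet's allowed input/output ranges are not checked here, and ASF's own driver computes these values from `CONF_CLOCK_DPLL_*` itself and may round differently:

```c
#include <stdint.h>

/* FDPLL96M multiplier: f_out = f_ref * (LDR + 1 + LDRFRAC / 16). */
typedef struct {
    uint32_t ldr;        /* integer part of the multiplier, minus one */
    uint32_t ldrfrac;    /* fractional part in 1/16 steps, 0..15      */
    uint32_t actual_hz;  /* frequency these values actually produce   */
} dpll_ratio;

dpll_ratio dpll_compute(uint32_t ref_hz, uint32_t target_hz)
{
    dpll_ratio r = {0u, 0u, 0u};
    if (ref_hz == 0)
        return r;
    /* whole multiplier in sixteenths, rounded to nearest */
    uint64_t steps = ((uint64_t)target_hz * 16u + ref_hz / 2u) / ref_hz;
    if (steps < 16u)                  /* multiplier is at least 1 */
        steps = 16u;
    r.ldr       = (uint32_t)(steps / 16u) - 1u;
    r.ldrfrac   = (uint32_t)(steps % 16u);
    r.actual_hz = (uint32_t)(((uint64_t)ref_hz * steps) / 16u);
    return r;
}
```

From a 32768 Hz reference this gives LDR = 1463, LDRFRAC = 14 and 48,001,024 Hz, i.e. not exactly 48 MHz, because 48 MHz is not a multiple of 32768/16 Hz.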

#7 Reply
Posted by ataradov on 29 Jan, 2019 08:55
If the code is indeed optimized to that extent, then you should at least post clock configuration file. There are many ways to screw up clock configuration. You may accidentally set flash wait states to some really high value, and 48 MHz system will turn into a pumpkin.

#8 Reply
Posted by westfw on 30 Jan, 2019 00:07
Quote
conf_board.h:
Code: [Select]
// clock resonators
#define BOARD_FREQ_SLCK_XTAL (32768U)
#define BOARD_FREQ_SLCK_BYPASS (32768U)
#define BOARD_FREQ_MAINCK_XTAL 0
#define BOARD_FREQ_MAINCK_BYPASS 0
#define BOARD_MCK CHIP_FREQ_CPU_MAX
#define BOARD_OSC_STARTUP_US 15625
Hmm. As far as I know, you should be modifying conf_clocks.h to change the clock setup.

I did a bit of hunting around, and it looks like the BOARD_* definitions you're quoting above are meant to describe the capabilities of the particular board you selected with ASF: "this board has a low-speed crystal but not a high-speed crystal", etc. The definitions aren't actually USED in the clock_init code, or referred to in the conf_clocks.h file where the magic happens (I guess that would make too much sense?). If you load up one of the more complex example projects, you'll find that the only place BOARD_MCK is referenced is in the .h file that describes the board. (On the bright side, the way AS sets up projects, you have ALL the ASF source code you are using "sucked in", so you can do that search with some confidence that you're actually seeing everywhere it is used.)

#9 Reply
Posted by cv007 on 30 Jan, 2019 01:54
On a SAMD10, doing my own thing in C++, not caring about speed, with the clock at 8 MHz (because the clock stuff is not much fun), no serial port output yet for info, no debugger, and a multimeter that does not do high frequencies, but I do have my Morse code class to transmit info:

https://github.com/cv007/Samd10XplainedMini/blob/master/main4.cpp

I get a value of 42 on the tc1 counter (after /4, as 8 toggles are done), for a speed of 190 kHz at 8 MHz. All those tog() calls are real function calls, and there is a check for which mode the pin is set to, so it toggles either OUT or DIR (if open-drain). Not ideal for speed, but without any effort to optimize you can see this is probably in the same ballpark, and it extrapolates to about the same 500 kHz when not using toggle (using my on/off instead of tog cuts the speed in half).

Peripherals do the high rate pin toggles

#10 Reply
Posted by Jester on 30 Jan, 2019 08:36
Quote from: ataradov on 28 Jan, 2019 17:08
Quote from: Jester on 28 Jan, 2019 10:56
Can anyone comment on how to correct the clock configuration, when using the internal oscillator?
It is hard to tell what you have at the moment. If it is a clean ASF code, then post conf_clocks.h file. Otherwise, post your complete project.

Thanks, I appreciate your help on this. I have attached conf_clocks.h as well as conf_board.h.

I should mention that I'm using this ItsyBitsy board only to get this up and running in a reasonable fashion and then I will move to the actual board that will have the identical SAMD21, however the real board has provision for a XTAL. So I realize this will have to be tweaked again, but for the moment, I just want to get a handle on the clock system so I know what I can expect.

conf_clocks.h

conf_board.h

#11 Reply
Posted by ataradov on 30 Jan, 2019 08:43
Your system does run at 8 MHz as configured by this line "# define CONF_CLOCK_GCLK_0_CLOCK_SOURCE SYSTEM_CLOCK_SOURCE_OSC8M"

The minimal changes needed are probably these:
Code: [Select]
# define CONF_CLOCK_FLASH_WAIT_STATES 1
# define CONF_CLOCK_DPLL_ENABLE true
# define CONF_CLOCK_GCLK_1_ENABLE true
# define CONF_CLOCK_GCLK_1_CLOCK_SOURCE SYSTEM_CLOCK_SOURCE_OSCULP32K
# define CONF_CLOCK_GCLK_0_CLOCK_SOURCE SYSTEM_CLOCK_SOURCE_DPLL
I'm not 100% sure of all the names and stuff, but the idea is there.

#12 Reply
Posted by Jester on 30 Jan, 2019 08:52
Quote from: westfw on 29 Jan, 2019 08:50
I do have my "bare metal" code for running at 48MHz based on an 8MHz external clock here: https://github.com/WestfW/SAMD10-experiments/blob/master/D10-LED_TOGGLE0/src/UserSource/led_toggle.c#L16 It's for a SAMD10, which theoretically has a "similar" clock setup. But beware of subtle differences between SAMD families (I don't know if any come into play here, but there could be).
westfw, thanks for the link to your code; I will peruse it when I have a moment. It seems like the serious coders prefer to bypass ASF, and my hunch is that I will end up going that route at the end of the day. They taught us assembler in school and my first few projects were written in it, so I come from a perspective of "you set a bit and it happens on the next clock".

I'm not a coder so I'm not well versed on searching GitHub etc. for bare metal code that could be the starting basis for my project. The two challenges for this project in my mind are getting the configuration sorted out and the display driver working. I have working Arduino code for the display, so when I get past the clock configuration I will investigate how to get the display code into this project. If I can get that working I'm comfortable enough with writing basic C/C++ to accomplish what I want to do.

Cheers

#13 Reply
Posted by Jester on 30 Jan, 2019 08:53
Quote from: ataradov on 30 Jan, 2019 08:43
Your system does run at 8 MHz as configured by this line "# define CONF_CLOCK_GCLK_0_CLOCK_SOURCE SYSTEM_CLOCK_SOURCE_OSC8M"

I will try today, much appreciated.

#14 Reply
Posted by westfw on 30 Jan, 2019 09:51
Ah hah! To use ASF to configure your clock to run at 48MHz, based on multiplication of the 32kHz internal clock, it looks like the relevant parts of the conf_clocks.h file should look something like:
Code: [Select]
#define CONF_CLOCK_FLASH_WAIT_STATES 1

/* SYSTEM_CLOCK_SOURCE_OSC32K configuration - Internal 32KHz oscillator */
# define CONF_CLOCK_OSC32K_ENABLE true
# define CONF_CLOCK_OSC32K_STARTUP_TIME SYSTEM_OSC32K_STARTUP_130
# define CONF_CLOCK_OSC32K_ENABLE_1KHZ_OUTPUT true
# define CONF_CLOCK_OSC32K_ENABLE_32KHZ_OUTPUT true
# define CONF_CLOCK_OSC32K_ON_DEMAND true
# define CONF_CLOCK_OSC32K_RUN_IN_STANDBY false

# define CONF_CLOCK_DPLL_ENABLE true
# define CONF_CLOCK_DPLL_ON_DEMAND true
# define CONF_CLOCK_DPLL_RUN_IN_STANDBY false
# define CONF_CLOCK_DPLL_LOCK_BYPASS false
# define CONF_CLOCK_DPLL_WAKE_UP_FAST false
# define CONF_CLOCK_DPLL_LOW_POWER_ENABLE false
# define CONF_CLOCK_DPLL_LOCK_TIME SYSTEM_CLOCK_SOURCE_DPLL_LOCK_TIME_DEFAULT
# define CONF_CLOCK_DPLL_REFERENCE_CLOCK SYSTEM_CLOCK_SOURCE_DPLL_REFERENCE_CLOCK_GCLK
# define CONF_CLOCK_DPLL_FILTER SYSTEM_CLOCK_SOURCE_DPLL_FILTER_DEFAULT
# define CONF_CLOCK_DPLL_REFERENCE_FREQUENCY 32768
# define CONF_CLOCK_DPLL_REFERENCE_DIVIDER 1
# define CONF_CLOCK_DPLL_OUTPUT_FREQUENCY 48000000

/* DPLL GCLK reference configuration */
# define CONF_CLOCK_DPLL_REFERENCE_GCLK_GENERATOR GCLK_GENERATOR_1
/* DPLL GCLK lock timer configuration */
# define CONF_CLOCK_DPLL_LOCK_GCLK_GENERATOR GCLK_GENERATOR_1

/* Configure GCLK generator 0 (Main Clock) */
# define CONF_CLOCK_GCLK_0_ENABLE true
# define CONF_CLOCK_GCLK_0_RUN_IN_STANDBY false
# define CONF_CLOCK_GCLK_0_CLOCK_SOURCE SYSTEM_CLOCK_SOURCE_DPLL
# define CONF_CLOCK_GCLK_0_PRESCALER 1
# define CONF_CLOCK_GCLK_0_OUTPUT_ENABLE false

/* Configure GCLK generator 1 */
# define CONF_CLOCK_GCLK_1_ENABLE true
# define CONF_CLOCK_GCLK_1_RUN_IN_STANDBY false
# define CONF_CLOCK_GCLK_1_CLOCK_SOURCE SYSTEM_CLOCK_SOURCE_OSC32K
# define CONF_CLOCK_GCLK_1_PRESCALER 1
# define CONF_CLOCK_GCLK_1_OUTPUT_ENABLE false
You'll notice that I used the DPLL rather than the DFLL.
That's because I decided I don't understand the DFLL. It looks like the DFLL always has an output of 48MHz, and you have the OPTION of syncing it to a higher-accuracy low-frequency source (the input clock frequency can be 8MHz, but the reference frequency is limited to 32kHz?) if configured properly. That's pretty boring. (I've asked in a couple of places why one would want to use the DFLL instead of the DPLL, given that the DPLL is so much more flexible. I don't think I ever got an answer.)
I got so annoyed at the way conf_clocks.h contains so much ... CLUTTER, that I split it into a conf_clocks.h where I put just the things I'm using, and a clock_defaults.h that defines all of the rest of the symbols that ASF wants defined. It is just a bunch of #ifndef wrappers (and it could use a lot more, actually). Attached.
(IMO, the fact that I had to do this is an example of the sort of thing that bugs me about ASF. Sigh.)
With the code shown, your code example runs at about 4.8MHz (10 clock cycles for the three-instruction loop). To get to ~12MHz (4 cycles per loop), you have to change the ioport definitions to use the fast PORT_IOBUS (single-cycle IO space) definitions instead of just PORT. So I guess that's 1 cycle (IOBUS) or 4 cycles (regular PORT) for each store, plus 2 cycles for the branch: 4 + 4 + 2 = 10 versus 1 + 1 + 2 = 4. Interestingly (?), increasing the number of flash wait states doesn't seem to change the speed of the loop; I guess such a short loop stays in the cache/flash-accelerator, or whatever...

conf_clocks.zip

#15 Reply
Posted by ataradov on 30 Jan, 2019 17:14
Quote from: westfw on 30 Jan, 2019 09:51
That's pretty boring (I've asked in a couple places why one would want to use the DFLL instead of the DPLL, given that the DPLL is so much more flexible. I don't think I ever got an answer.
The answer is simple. DFLL is specifically designed to provide the main clock. So it is limited in options, because most of the time you want your clock to be 48 MHz.

This leaves DPLL to do more interesting things, for example run TCCs at 96 MHz, or run I2S at some weird frequency, which is good for audio applications. DFLL also has the option for USB Clock Recovery, so if you want to use USB in a crystal-less mode, you have to use DFLL at least for USB.

I would use DFLL for the main clock in more or less complicated projects. But if all you need is basic 48 MHz, then it does not really matter.

Quote from: westfw on 30 Jan, 2019 09:51
increasing the number of flash wait states doesn't seem to change the speed of the loop I guess such a short loop stays in the cache/flash-accelerator, or whatever...
There is no real caching in those parts. There is no slowdown because flash fetches are always 32-bit wide and instructions are mostly 16-bit wide, so every two clock cycles the core fetches two instructions on average.

#16 Reply
Posted by ataradov on 30 Jan, 2019 17:18
Quote from: Jester on 30 Jan, 2019 08:52
I'm not a coder so I'm not well versed on searching GitHub etc. for bare metal code that could be the starting basis for my project.

Here are some of my projects you can use as a source for code snippets for various peripherals:

https://github.com/ataradov/mcu-starter-projects
https://github.com/ataradov/vcp
https://github.com/ataradov/dgw/tree/master/embedded
https://github.com/ataradov/siggen/tree/master/fw

With all of them, there is enough sample bare-metal code to get you started.

#17 Reply
Posted by cv007 on 30 Jan, 2019 18:51
I guess I was a little low on my numbers, as I was using the tgl/outset/outclr registers but was still doing rmw on them. Probably safest to be a little skeptical of anything I claim.

One thing I dislike about these SAM Ds (I only have the D10) is that the clock system seems needlessly complicated. Clock controls everywhere, the need to sync (or not), and the terminology gets a little fuzzy when reading about it all. The other thing I dislike is that, with the 32-bit address space available, it is still a big mix of 8/16/32-bit registers. I guess copy/paste is used in chip design similar to documentation (I know nothing of chip design).

I do have blinking lights, so in the end it must be ok.

#18 Reply
Posted by ataradov on 30 Jan, 2019 19:01
Quote from: cv007 on 30 Jan, 2019 18:51
One thing I dislike about these sam d's (I only have the D10), is the clock system seems needlessly complicated.
Actually, after you work with them for a while, everything else seems not sufficiently sophisticated. There are some improvements possible, for sure, but the flexibility is amazing. It does make simple projects harder to start, but it also makes things so much easier for big projects.

Quote from: cv007 on 30 Jan, 2019 18:51
Clock controls everywhere, the need to sync (or not), and the terminology gets a little fuzzy when reading about it all.
You actually almost never have to sync. Unfortunately the documentation is not very clear on that. Sync is an option that you just don't get with other MCUs that have peripheral clocks slower than the core (bus) clock. They always sync by blocking the bus, so no other activity can take place (code fetch, DMA transfers, etc.). If you simply ignore syncs, then you get the exact same behavior. With syncs you have the option to let those background operations continue while your code waits in a loop.

You may consider the SYNCBUSY bit to be a flag that the next write will block the bus. In that case you may still go ahead and do the write, or wait for the bit to clear; it is up to you.

Sometimes read sync is required, because this is what actually transfers the data from the peripheral clock domain to the core clock domain. But read synchronizations are very rare.

Quote from: cv007 on 30 Jan, 2019 18:51
The other thing I dislike, is with the 32bit address space available, it still is a big mix of 8/16/32bit registers. I guess copy/paste is used in chip design similar to documentation (I know nothing of chip design).
This is my biggest personal complaint about these devices. There is a [good] historical reason for it, but the end result is still not great. I'm insisting as much as I can on turning everything into 32-bit registers.

#19 Reply
Posted by westfw on 30 Jan, 2019 19:55
Quote
This leaves DPLL to do more interesting things
Ah! I am so used to having a single clock domain on 8-bit parts that it did not occur to me that I might want two high-speed clocks that are essentially unrelated to each other! Thanks for pointing that out.

Quote
If you simply ignore syncs, then you get the same exact behavior [of blocking the bus]
Someone pointed out elsewhere that such an automatic sync takes something like 6x the slower clock period, and if you let the bus block, the CPU won't service interrupts during that time. Assuming that's true, there could be very good reasons for looping and checking for sync manually when accessing slow peripherals (like the RTC).

Quote
the flexibility [of the SAMD clock system] is amazing.
What's the largest number of GCLKs you've ever used in a project? :-) I can't imagine ever needing 8, especially with most of the peripherals having their own prescaler.
Quote
There is no real caching in those parts.
I didn't think so either (instead I was expecting some vaguely described "accelerator", but I guess that's STM?). But the datasheet says there is a 64-byte direct-mapped cache (enabled by default, but it can be disabled or configured in "deterministic" mode).
Quote
- The NVM Controller cache reduces the device power consumption and improves system performance when wait states are required. Only the NVM main array address space is cached. It is a direct-mapped cache that implements 8 lines of 64 bits (i.e., 64 Bytes). NVM Controller cache can be enabled by writing a '0' to the Cache Disable bit in the Control B register (CTRLB.CACHEDIS).

#20 Reply
Posted by cv007 on 30 Jan, 2019 19:58
I was wrong again in my 'I was wrong' post. I was doing the right thing originally (a single write to the OUTCLR/OUTSET/OUTTGL registers). If I make a function with a toggle count, and skip the check for whether the pin is in open-drain mode (which uses DIR to toggle), then I can get 1 MHz+ at 8 MHz (16 toggles, anyway). I put the function in the header so the loop code gets inlined.

Quote
You may consider SYNCBUSY bit to be a flag that next write will block the bus. And in that case you may still go ahead and do the write. Or wait for the bit to clear, it is up to you.
I had thought that was the case, but in my (simple) rtc code if I didn't use the syncbusy in my rtc reset code (clear enable, set swrst) there didn't seem to be any blocking going on with the result being the rtc did not start. I'm sure there is a simple explanation, though.

Quote
I'm insisting as much as I can on turning everything into 32-bit registers.
I have mostly been using a PIC32MM, and am used to every register being 32-bit aligned. Of course, with the SET/CLR/INV registers available for almost every register, it cannot be anything else. With SET/CLR/INV available, one can get used to it quite fast, and my template code does the choosing. There are a few cases where you need to get at a specific byte or half-word, but you just need to cast the value to make sure the right template is used.

I tried to transfer most of the template code to the Xplained Mini (without the set/clr/inv decisions, of course), and it does work fine, but I have to be more careful because it is easier to get an unaligned access if I'm not paying attention. It becomes a little confusing, as I demonstrated: on a PIC32 I use setbit/clrbit and the template uses the right register to set/clr bits, whereas on the SAM setbit/clrbit does a read-modify-write, and when a non-RMW register is available (OUTTGL) I have to do a 'val' write (which does a normal single register write).

#21 Reply
Posted by ataradov on 30 Jan, 2019 21:18
Quote from: westfw on 30 Jan, 2019 19:55
Assuming that that's true, there could be very good reasons for looping and checking for sync manually, when accessing slow peripherals (like the RTC._)
Yes, the delay is generally in the range from 5*P_GCLK + 2*P_APB to 6*P_GCLK + 3*P_APB, where P_GCLK is the period of the peripheral (generic) clock, which can be very slow indeed, and P_APB is the period of the APB bus clock.

With old-style designs (like SAM4, SAM7, or even SAM V71) you typically get a peripheral clock that is the same as the CPU clock, or at most divided by some integer. In this case synchronization is not an issue.

But even there RTC is typically in its own domain, so working with RTC is subject to the same delay, you are just never exposed to it, nor do you have any option to avoid it.
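Plugging numbers into the synchronization-delay range quoted above shows why this matters for slow peripherals. A minimal sketch of that arithmetic only (the type and function names are hypothetical):

```c
/* Synchronization delay range:
 *   min = 5 * P_GCLK + 2 * P_APB
 *   max = 6 * P_GCLK + 3 * P_APB
 * where P = 1 / f. Results are in microseconds. */
typedef struct {
    double min_us;
    double max_us;
} sync_delay;

sync_delay sync_delay_us(double gclk_hz, double apb_hz)
{
    sync_delay d;
    double p_gclk = 1e6 / gclk_hz;   /* peripheral clock period, us */
    double p_apb  = 1e6 / apb_hz;    /* APB clock period, us        */
    d.min_us = 5.0 * p_gclk + 2.0 * p_apb;
    d.max_us = 6.0 * p_gclk + 3.0 * p_apb;
    return d;
}
```

With a 32.768 kHz peripheral clock and a 48 MHz APB this comes to roughly 153 to 183 us per synchronized write, close to 9,000 CPU cycles at 48 MHz; with the peripheral also at 48 MHz it is under 0.2 us.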

Quote from: westfw on 30 Jan, 2019 19:55
What's the largest number of GCLKs you've ever used in a project? :-) I can't imagine ever needing 8, especially with most of the peripherals having their own prescaler.
The largest I can find right now is 4. But having spares is nice, I guess.

Quote from: westfw on 30 Jan, 2019 19:55
I didn't think so either (instead I was expecting some vaguely decribed "accelerator" (but I guess that's STM?))But the datasheet says there is a 64-byte direct-mapped cache (enabled by default, but can be disabled to configured in "deterministic" mode.)
This is interesting. I thought the cache was a much more recent addition, but I guess not. However it is implemented, it is transparent enough not to cause any problems, since I never had any.

#22 Reply
Posted by ataradov on 30 Jan, 2019 21:25
Quote from: cv007 on 30 Jan, 2019 19:58
I had thought that was the case, but in my (simple) rtc code if I didn't use the syncbusy in my rtc reset code (clear enable, set swrst) there didn't seem to be any blocking going on with the result being the rtc did not start. I'm sure there is a simple explanation, though.
RTC is the worst peripheral in this regard. Since it has to run in all sleep modes, it is heavily isolated from the rest of the system. On some devices it is actually in its own power domain, so you can remove the power from the entire chip, but keep the RTC running powered from the battery.

But you still need to be mindful of how you use this synchronization. Typical WDT reset code that you will find looks something like this:
Code: [Select]
void wdt_reset(void)
{
  while (WDT->STATUS.bit.SYNCBUSY);
  WDT->CLEAR.reg = WDT_CLEAR_CLEAR_KEY;
}
Looks good, right? Even ASF has this code. Well, the problem is that if you just add this to the main while (1) {} loop, you will slow down your system to the 32 kHz speed, spending most of the time waiting for sync.

The correct version of the code is this:

Code: [Select]
void wdt_reset(void)
{
  if (0 == WDT->STATUS.bit.SYNCBUSY)
    WDT->CLEAR.reg = WDT_CLEAR_CLEAR_KEY;
}
You only perform a write if there is not one already pending. This avoids stalling the buses, or needlessly waiting in the loop.
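To see the difference between the two wdt_reset() versions above without hardware, here is a host-side simulation. Everything in it is hypothetical: `fake_wdt` is a toy model, NOT the real WDT register map, in which a CLEAR write keeps SYNCBUSY set for `sync_cycles` CPU cycles, and each pass through the main loop costs one cycle of useful work:

```c
#include <stdbool.h>

/* Toy model (hypothetical, not the real WDT): a CLEAR write keeps
 * SYNCBUSY set for sync_cycles CPU cycles. */
typedef struct {
    unsigned long now;          /* current CPU cycle               */
    unsigned long busy_until;   /* cycle at which SYNCBUSY clears  */
    unsigned long sync_cycles;  /* length of one synchronization   */
    unsigned long writes;       /* number of CLEAR writes issued   */
} fake_wdt;

static bool fake_syncbusy(const fake_wdt *w) { return w->now < w->busy_until; }

static void fake_clear_write(fake_wdt *w)
{
    w->writes++;
    w->busy_until = w->now + w->sync_cycles;
}

/* First version: spin until SYNCBUSY clears, then write. */
void kick_blocking(fake_wdt *w)
{
    while (fake_syncbusy(w))
        w->now++;               /* each spin burns one cycle */
    fake_clear_write(w);
}

/* Second version: write only if no write is already pending. */
void kick_nonblocking(fake_wdt *w)
{
    if (!fake_syncbusy(w))
        fake_clear_write(w);
}

/* Main loop: kick the watchdog, then do one cycle of useful work.
 * Returns the total number of cycles the loop took. */
unsigned long run_main_loop(void (*kick)(fake_wdt *), fake_wdt *w,
                            unsigned long iterations)
{
    for (unsigned long i = 0; i < iterations; i++) {
        kick(w);
        w->now++;
    }
    return w->now;
}
```

With an arbitrary 1000-cycle sync, 10,000 iterations take 9,999,001 cycles with the blocking version (almost all of it spent waiting) but only 10,000 cycles with the non-blocking one, which still issues a CLEAR every 1000 cycles.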

#23 Reply
Posted by westfw on 30 Jan, 2019 23:07
Quote
This is interesting. I thought the cache was much more recent addition, but I guess not. However it is implemented, it is transparent enough to not cause any problems, since I never had any
SAM3X just had 64/128-bit flash access (according to the datasheet). "Direct mapped" is pretty primitive, I guess; it might not even be actual cache memory, so much as some sort of page-mode access to the flash.
I can't think of any reason that one would ever need to use the "flush cache" NVM command, since the flash is read-only except for NVM peripheral commands (which incidentally flush the cache anyway). Perhaps it is a nod toward the deep embedded programmers who wanted better determinism (along with the single-cycle IO pins).

#24 Reply
Posted by Jester on 31 Jan, 2019 11:11
I just power-read the 30 or so pages associated with the clock system and now have an inkling of what's going on. It seems fairly complex, but I now understand that it allows for a lot of flexibility and power-saving capability.
•   One thing that jumped out at me was that after a reset, OSC8M is enabled and divided by 8, so by default it appears we have a 1 MHz clock! That would be consistent with my 500kHz bit-banging example.
•   Another interesting aspect is the synchronization system, I imagine this could be tricky when debugging a tricky timing related issue.

I tried implementing the suggested code (ataradov and westfw); when I build after making the changes I get two errors:
1)   Recipe for target ‘src/ASF/sam0/drivers/system/clock/clock_samd21_r21_da_ha1/clock.o’ failed
2)   ‘SYSTEM_CLOCK_SOURCE_DPLL_REFERENCE_CLOCK_OSC32K’ undeclared (first use in this function)

I have attached the updated conf_clocks.h

EDIT:

So ‘SYSTEM_CLOCK_SOURCE_DPLL_REFERENCE_CLOCK_OSC32K’ is undefined. I searched for valid arguments and found ‘SYSTEM_CLOCK_SOURCE_DPLL_REFERENCE_CLOCK_GCLK’; this compiles and produces a 4.7MHz blinker, an improvement of 9.4x, so it's moving in the right direction.

My boards arrived so I'm going to populate one (that has a 32.768 kHz external crystal) and try again this time with ‘SYSTEM_CLOCK_SOURCE_DPLL_REFERENCE_CLOCK_XOSC32K’

EDIT:
Update, populated board and it's up and running, thanks for the help everyone.
Moving on to [part 2 of 2 "the display"]

conf_clocks.h