Author Topic: SAMD51 External Interrupt latency 500ns (Read 1966 times)

snarkysparky · « **on:** November 14, 2023, 01:43:21 pm »

SAMD51

From the rising edge of the input to the first line of the interrupt handler (which raises a pin) takes 500-600 ns, measured with an oscilloscope.

I have the EIC clocked at full speed, 120 MHz. Triggering is set to asynchronous.

No filtering.

I would expect the latency to be lower, under 200 ns.

Anyone know a trick I am missing?

Thanks
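
For reference, the figures in this post can be converted to core clock cycles. This is a minimal sketch, assuming the CPU runs at the same 120 MHz quoted for the EIC (the post does not state the core clock explicitly); the function name is made up for illustration.

```python
def ns_to_cycles(latency_ns, clock_mhz):
    """Convert a latency in nanoseconds to clock cycles at clock_mhz."""
    # 1 MHz is one cycle per 1000 ns, so cycles = ns * MHz / 1000
    return latency_ns * clock_mhz / 1000

CORE_MHZ = 120  # assumption: CPU clock equals the 120 MHz quoted for the EIC

for ns in (500, 600, 200):
    print(f"{ns} ns = {ns_to_cycles(ns, CORE_MHZ):.0f} cycles at {CORE_MHZ} MHz")
```

Under that assumption the measured 500-600 ns is 60-72 cycles, and the 200 ns target is 24 cycles.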

ataradov · « **Reply #1 on:** November 14, 2023, 04:57:32 pm »

Are your vector table and the related interrupt handler located in the flash? That would be subject to the flash wait states. Move both into SRAM and you should see some improvement.

Although I just tried and it does not look like it changes much. And the delays happen in the core, EIC clock does not affect the result if asynchronous detection is enabled.

It looks like latency is about 30 cycles at slow core speeds (32 kHz and 12 MHz) and about 60 cycles at faster core speeds.

~~Ok, something really strange going on. With the system (GCLK0) running from 12 MHz XOSC, I get about 30 cycles of latency. With the system running from 120 MHz PLL divided by 10, I get 60 cycles.~~ disregard that for now.

snarkysparky · « **Reply #2 on:** November 14, 2023, 06:46:21 pm »

Is that due to wait states?

ataradov · « **Reply #3 on:** November 14, 2023, 06:47:26 pm »

No wait states should be involved, everything runs from SRAM.

But disregard my clock measurements for now, there is something strange with my setup. The clocks are just different in those two cases.

ataradov · « **Reply #4 on:** November 14, 2023, 07:37:08 pm »

My DPLL was not locking correctly. With it running as expected, I get consistent 28 cycle latency, same as with other clock sources.

In theory the Cortex-M4 has a 12 cycle latency from the context save. Then in my case the first instructions of the handler are:

Code: [Select]

4b06            ldr     r3, [pc, #24] // Get PORT address
2210            movs    r2, #16 // PB4
f8c3 209c       str.w   r2, [r3, #156] // Write OUTTGL

So, this would take 3-5 cycles to actually write the peripheral.

And when running at slow speeds (12 MHz), it is actually faster if both vector table and the handler are in the flash. Placing handler in the flash saves 3 clock cycles and placing vector table itself in the flash saves 1 more.

At fast clocks placing everything in the RAM wins by a lot.

But ultimately all this accounts for only about half of the observed latency. It is possible that the rest is just the PORT being slow.

Full measured latency is 250 ns in the best case scenario, which is about 30 clock cycles at 120 MHz.
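
The budget above can be tallied as a quick check. This is a minimal sketch using only the numbers from this post (12 cycles of exception entry, 3-5 cycles for the three handler instructions, 28 cycles measured); the variable names are illustrative.

```python
ENTRY_CYCLES = 12        # theoretical Cortex-M4 exception entry (context save)
HANDLER_CYCLES = (3, 5)  # estimated range for the ldr / movs / str.w sequence
MEASURED_CYCLES = 28     # latency measured in this post
CORE_MHZ = 120

# Cycles explained by the core-side estimate (low and high bound)
accounted = tuple(ENTRY_CYCLES + h for h in HANDLER_CYCLES)
# Cycles left over, possibly spent in the PORT or the bus access
unaccounted = tuple(MEASURED_CYCLES - a for a in accounted)

print("accounted cycles:  ", accounted)
print("unaccounted cycles:", unaccounted)
print("measured: %.0f ns" % (MEASURED_CYCLES * 1000 / CORE_MHZ))
```

This gives 15-17 cycles accounted for and 11-13 cycles unexplained, which is where the "about half" comes from. 28 cycles at 120 MHz is about 233 ns, close to the 250 ns (30 cycles) read off the scope.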

langwadt · « **Reply #5 on:** November 14, 2023, 08:01:02 pm »

Quote from: ataradov on November 14, 2023, 07:37:08 pm

My DPLL was not locking correctly. With it running as expected, I get consistent 28 cycle latency, same as with other clock sources.

In theory the Cortex-M4 has a 12 cycle latency from the context save. Then in my case the first instructions of the handler are:

Code: [Select]

4b06            ldr     r3, [pc, #24] // Get PORT address
2210            movs    r2, #16 // PB4
f8c3 209c       str.w   r2, [r3, #156] // Write OUTTGL

So, this would take 3-5 cycles to actually write the peripheral.

And when running at slow speeds (12 MHz), it is actually faster if both vector table and the handler are in the flash. Placing handler in the flash saves 3 clock cycles and placing vector table itself in the flash saves 1 more.

I'd guess that RAM and flash have separate buses, so by placing the handler and vector table in flash, fetching can happen in parallel with stacking.

Quote from: ataradov on November 14, 2023, 07:37:08 pm

At fast clocks placing everything in the RAM wins by a lot.

But ultimately all this accounts for only about half of the observed latency. It is possible that the rest is just the PORT being slow.

Full measured latency is 250 ns in the best case scenario, which is about 30 clock cycles at 120 MHz.

Some ARMs have more than one way of accessing GPIO because they started out with something that is very slow.
Does the SAMD51 have that too?

ataradov · « **Reply #6 on:** November 14, 2023, 08:08:03 pm »

No, there is no IOBUS version of the port. I assume this is because it would make it much harder to close timing at 120 MHz. Another big issue is that the Cortex-M4 does not have an I/O port interface, so it is not even possible there.

I'm trying to come up with an alternative way to indicate interrupt entry, but so far everything else is slower than a simple port write.

SiliconWizard · « **Reply #7 on:** November 14, 2023, 08:59:09 pm »

Quote from: ataradov on November 14, 2023, 08:08:03 pm

No, there is no IOBUS version of the port. I assume this is because it would make it much harder to close timing at 120 MHz. Another big issue is that the Cortex-M4 does not have an I/O port interface, so it is not even possible there.

I'm trying to come up with an alternative way to indicate interrupt entry, but so far everything else is slower than a simple port write.

Can you not use a counter like DWT CYCCNT instead? That may add a couple cycles though, and you will not be able to use a scope.

ataradov · « **Reply #8 on:** November 14, 2023, 09:13:19 pm »

Quote from: SiliconWizard on November 14, 2023, 08:59:09 pm

Can you not use a counter like DWT CYCCNT instead? That may add a couple cycles though, and you will not be able to use a scope.

How would you use that to measure latency from the external event?

ataradov · « **Reply #9 on:** November 14, 2023, 10:01:13 pm »

The shortest delay I can get so far from input to output is by using the event system directly. The resulting delay is about 50 ns or 6 clock cycles.

I don't think it really tells us anything useful, I just did it because I wanted to test a software event trigger instead of the direct PORT write. But that just added about 5-6 clock cycles compared to the direct PORT write.

It is possible that APB bus access is just that slow.

snarkysparky · « **Reply #10 on:** December 04, 2023, 02:07:30 pm »

All I am trying to do at this point is fill a slot in memory with the contents of the IO port.

Would using the parallel capture controller with DMA to fill the memory be significantly faster?

Thanks

ataradov · « **Reply #11 on:** December 04, 2023, 04:00:50 pm »

PCC is definitely faster, but it requires an external clock connection. You can route GCLK_IO to the pin, but this would require a board modification. Plus it can capture only a limited number of bits, and they are assigned to fixed pins.

snarkysparky · « **Reply #12 on:** December 04, 2023, 05:53:58 pm »

Ataradov,

I have to change the board layout anyway.

Thanks

ataradov · « **Reply #13 on:** December 04, 2023, 06:33:11 pm »

If you do, then also make sure to connect EN1 and EN2 pins to GPIOs as well. This way you have application control over the capture start/stop.

I have not tried PCC on the SAM D51, but a similar PCC on the SAM V7x captures data at up to 60 MHz with no issues. And I was able to send it over USB HS at ~49 MB/s to the host PC.

SAM D51 has slightly weaker DMA, and there may be issues at high clock speeds if more than one channel is used due to channel change overhead.

snarkysparky · « **Reply #14 on:** December 07, 2023, 08:18:19 pm »

I seem to have my DMA working.

But I have a question.

Does DMA transfer have higher priority over all interrupts?

I believe DMA halts the CPU core during the transfer, as it is using the buses.

So can I count on DMA transfer being mostly independent of what is executing?

Thanks

ataradov · « **Reply #15 on:** December 07, 2023, 08:42:17 pm »

DMA does not halt the CPU. There is a multi-layer bus matrix, so multiple masters may operate at the same time as long as they access different slaves. You can find the matrix connectivity structure in the datasheet.

And SRAM has 4 separate ports, so the CPU and DMA can access SRAM at the same time.
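
The rule described here can be illustrated with a toy model. This is only a sketch of the principle (masters stall only when they compete for the same slave port), not a model of the actual SAM D51 matrix: the master and slave names, the function name, and the "first listed wins" arbitration are all assumptions made for illustration.

```python
from collections import defaultdict

def stalled_masters(requests, ports=None):
    """Return the masters that must wait in one bus cycle.

    requests: dict mapping master name -> slave name it wants to access.
    ports:    dict mapping slave name -> number of simultaneous accesses
              it can accept (defaults to 1 per slave).
    """
    ports = ports or {}
    by_slave = defaultdict(list)
    for master, slave in requests.items():
        by_slave[slave].append(master)

    stalled = []
    for slave, masters in by_slave.items():
        allowed = ports.get(slave, 1)
        # Hypothetical arbitration: the first masters listed are served
        stalled.extend(masters[allowed:])
    return stalled

# Different slaves: nobody waits
print(stalled_masters({"CPU": "FLASH", "DMA": "SRAM"}))
# Same single-port slave: one master waits
print(stalled_masters({"CPU": "SRAM", "DMA": "SRAM"}))
# Same slave with 4 ports: nobody waits
print(stalled_masters({"CPU": "SRAM", "DMA": "SRAM"}, ports={"SRAM": 4}))
```

In this model a DMA transfer only delays the CPU when both want the same single-port slave in the same cycle, which matches the point that DMA does not halt the core.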

