Author Topic: Is Free Microchip XC8 v 2.1 compiler deliberately slowing down ISR Latency? (Read 3326 times)

SuzyC · « **on:** December 08, 2019, 07:15:33 pm »

If I want precise timing from an ISR that is triggered from TMR0, I must compensate for ISR latency and preset the TMR0 counter by some value to get very close to exactly 100uSec between ISRs.

With a Pro licensed compiler (max OPT) for the PIC16F886, I would preset the ISR TMR0 counter to 18 counts (20MHz system Xtal clock, 200nSec/instruction cycle).

Using the XC8 ver 2.1 Free compiler (OPT=2) and a PIC12F1572 at 32MHz (8MHz HFINTOSC x4 PLL), I find that I must preset TMR0 to exactly 59 counts to get precise timing.

Is the Free compiler wasting time by adding about 40 instruction cycles before entering the ISR code?
------------------------------------------------------
interrupt int_server(void) //Chip=12F1572
{ //F32 TMR0 Clk = Instruct Clk 8MHz TPeriod=125nS and using *4(prescaler) to feed TMR0 500nS clock
if (T0IF)
{ //59 is the verified exact preset of TMR0 to output 1-Sec blips on a port pin to see ISR timing accuracy on Dig Scope)
TMR0=59; //Preset for 32MHZ FOSC InstClk= 8MHz ISR presetting TMR0 to get 200counts x .5uS = 100uS/IRQ(if no latency to ISR)
T0IF=0;
pt1_mSec++; // Tick the 100-uS IRQ counter ticker
pt1_mSec2++; // used for A2D acquisition
...etc
-------------------
Someone who has a licensed version of XC8 ver 2.1 could compile and post their tweak of TMR0 to get exact timing.

FenTiger · « **Reply #1 on:** December 08, 2019, 08:45:45 pm »

If you try to re-arm the timer from the interrupt handler you can run into all sorts of problems. IRQ handler overhead is one of them, as you've found. Other interrupts, or code that disables interrupts, can randomly delay your interrupt handler too, introducing jitter into your timing.

Use a free-running timer, which automatically resets itself when triggered, and you'll get an accurate repetition rate regardless of what the software does. (I'm assuming PICs have free-running timers. I'd be surprised if they didn't, but I've never used one.)

cv007 · « **Reply #2 on:** December 08, 2019, 08:46:11 pm »

>Is Free Microchip XC8 v 2.1 compiler deliberately slowing down ISR Latency?

Definitely. They put a loop in there to kill some time unless you pay them.

Actually, you can start by doing some calculations.

You do not say what the old timer prescale is, but the old pic is running at 20MHz, so Fosc/4 is 5MHz. I'll assume the timer prescale is /2, for a 400ns timer clock. That means the timer needs to count 100us/400ns = 250 counts, for 100us.

New pic- 32MHz/4 = 8MHz, 8MHz/4 = timer clock of 500ns. The timer needs to count 100us/500ns = 200 counts, for 100us.

So the old pic needed 250 counts to get to 100us. The new pic needs 200 counts. That is a start and accounts for much of the difference.

Then you can figure out how much time is being lost in the isr, and it will turn out that the new pic is taking less than a third of the time to set the timer than the old pic:

old pic- Fosc/4 = 5MHz
250counts = 100us
256-18 = 238counts
250-238 = 12counts = 4.8us = 24 Fosc/4 clocks into isr you are setting the timer value

new pic- Fosc/4 = 8MHz
200counts = 100us
256-59 = 197counts
200-197 = 3counts = 1.5us = 12 Fosc/4 clocks into isr you are setting the timer value

so new pic = 12 clocks to get into isr and set new value, old pic = 24 clocks
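
The arithmetic above can be checked with a few lines of host-side C. This is only a sketch: the function name `isr_latency_cycles` and its parameters are made up for illustration, the /2 prescale for the old pic is the assumption stated above rather than a confirmed setting, and device-specific effects (such as any pause in counting after TMR0 is written) are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Instruction cycles (Fosc/4 clocks) that pass between the TMR0 overflow
 * and the ISR's write of the preset, inferred from a preset value that
 * was tuned on a scope.
 *   fosc_hz   : oscillator frequency in Hz
 *   prescale  : TMR0 prescaler ratio (1, 2, 4, ...)
 *   period_us : wanted interrupt period in microseconds
 *   preset    : value the ISR writes to TMR0
 */
int32_t isr_latency_cycles(uint32_t fosc_hz, uint32_t prescale,
                           uint32_t period_us, uint8_t preset)
{
    assert(prescale != 0u);

    uint32_t fcy_hz = fosc_hz / 4u;                 /* instruction clock */
    /* timer counts that make up one full period */
    uint32_t counts_needed =
        (uint32_t)((uint64_t)fcy_hz * period_us / 1000000u / prescale);
    /* counts the timer still has to run after the preset is written */
    uint32_t counts_left = 256u - preset;

    /* the missing counts were spent getting into the ISR; convert them
     * from timer counts back to instruction cycles */
    return ((int32_t)counts_needed - (int32_t)counts_left) * (int32_t)prescale;
}
```

Called as `isr_latency_cycles(20000000, 2, 100, 18)` for the old pic and `isr_latency_cycles(32000000, 4, 100, 59)` for the new one, it reproduces the 24 and 12 clocks worked out by hand.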

>I'm assuming PICs have free-running timers

The normal timers may not lend themselves well to doing this as they only seem to interrupt on overflow, but this pic has 3 separate 16bit pwm peripherals, which have 4 interrupts each, and can simply use the period interrupt. Set the period to 100us, and use its period interrupt which will always be 100us.

NorthGuy · « **Reply #3 on:** December 08, 2019, 11:18:18 pm »

You can disassemble the prologue and look, but the new PIC has hardware register saves, so it would be surprising if it took longer than the old one.

hugo · « **Reply #4 on:** December 09, 2019, 01:25:31 am »

Also TMR0 on some (old) pics has a 3 tick write latency so writes must be -3 IIRC.

SuzyC · « **Reply #5 on:** December 09, 2019, 03:38:51 am »

CV007
TMR0 has a free-running clock source for the TMR0 8-bit timer, the instruction clock (SysClk/4).

With the 16F886 I am presetting TMR0 with 18 which equals 18 TMR0 clocks, (derived from 20MHz Xtal for system clock)
I must also account for the unavoidable instruction cycles needed for the if(T0IF) and TMR0=18 statements to complete, on top of the other latency overhead spent saving essential register values.

The scope doesn't lie. It shows me I need to preset TMR0 by 18 TMR0 clock cycles for the '886 and 59 for the '1572.

On the 12F1572 side, I need to preset TMR0 to 59 TMR0 clock cycles. Yet with both chips TMR0 is being preset at the earliest possible point in the ISR, and both are using instruction clock x2 clock sources, so why shouldn't the overhead of 18 simply be increased by the ratio of the system clocks?

The 32-MHz system clock of the 12F1572 would lead me to expect that the observed real (scope verified) latency ratio would be 20MHz/32MHz less.

This leads me to think I should preset the 12F1572 TMR0 to match the Fosc ratio, to approx (32/20) x 18 TMR0 counts (but not to 59).

I am not in main using any code that would momentarily turn off T0IE.

cv007 · « **Reply #6 on:** December 09, 2019, 04:51:07 am »

I'll stick to my story, and claim you are not thinking this through correctly. You can re-read my post.

There is no way to know what the prescale is set to on the old pic, but it is /1 /2 /4 , etc. With a 20MHz clock and a Fosc/4 of 5MHz, the only thing that makes sense (for the 18 value you load) is a /2 prescale for a timer clock period of 400ns.

The new pic is at Fosc/4 of 8MHz, and a timer prescale of /4 for a 500ns timer clock period (which is in the info you provided)

Your timer load values are to produce an overflow, so look at the number of counts you are leaving for the timer to count- for the old pic it is 256-18=238, for the new pic it is 256-59=197.

To get 100us on the old pic, you needed 250 timer counts (250*400ns), and you are loading 18 for 238 counts to go till overflow. You needed 250, so you must have lost 12 getting to the isr and setting the timer. 12 timer counts is 4.8us, and 4.8us is 24 Fosc/4 clocks. It's taking you 24 instruction clocks to set the timer value.

To get to 100us on the new pic, you need 200 timer counts (200*500ns), and you are loading 59 for 197 counts to go till overflow. You needed 200, so you must have lost 3 getting to the isr and setting the timer. 3 timer counts is 1.5us, and 1.5us is 12 Fosc/4 clocks. It's taking you 12 instruction clocks to set the timer value.

So, the new pic with the non-pro compiler is getting to set the timer value quicker than the pro version on the old pic. That can be explained by the automatic saving of registers that the old pic does not have. You can also read the assembly listing to see it.

If you are using something else for the old pic timer prescale, just tell us and I'll adjust my story.

In addition, you can skip the whole thing and use one of the pwm timers, which is 16bit and can simply produce an interrupt every 100us, on time, every time. Set once, and never touch it again. You still have interrupt latency that can change from something like 3-5 clocks depending on what instruction is being interrupted.

SuzyC · « **Reply #7 on:** December 09, 2019, 11:36:18 am »

CV007, thanks so very much for your detailed and clear reply!

As my sweet Mom once said, "Do the math!".

I am embarrassed to notice I have needed you to restate your carefully thought out first reply that correctly calculates the clocks needed to finally get me to understand my error in using a knee-jerk reasoning simply based on system clock ratios.

The scope doesn't lie and neither does your count of system clocks and instruction counts needed to calculate latency with two different Fosc periods for both chips.

Your very well-thought-out calculations, better known as the process of "doing the math", are what counts here!

The idea of using PWM to get accurate 100uSec timings is a good one, except that I often need to use all the PWM's in a project.

I am still wondering what is the best guess of difference in code performance and compiled size results from using a XC8 v2.1 --OPT=2 max of the Free Ver compared to OPT=ALL allowed with the paid Pro compiler.

I am also totally confounded why MChip should stuff so many features into the 12F1572 that only has 5 1/2 pins to do anything with and only 2K of pgm memory and 256 bytes of data RAM, and then frustrate a coder and limit the compiler's efficiency to boot.

Why aren't there 10-pin MCU's?
Why aren't there 8-pin MCU's with at least ~2-K of data RAM and at least 8K words of pgm space?

The 12F1572 is such a tease!

But answering these mysteries would be a different topic!

SiliconWizard · « **Reply #8 on:** December 09, 2019, 02:36:09 pm »

Quote from: SuzyC on December 09, 2019, 11:36:18 am

I am still wondering what is the best guess of difference in code performance and compiled size results from using a XC8 v2.1 --OPT=2 max of the Free Ver compared to OPT=ALL allowed with the paid Pro compiler.

Your best guess would be to actually compare assembly output for your particular code as someone suggested. Unfortunately, whereas XC16 and XC32 are based on GCC, and you can figure out what kind of optimizations are available at a given level, I don't think XC8 is? And I don't think Microchip documents the various optimizations very well, if at all.

One thing I remember for XC16 and XC32 is that there are attributes you can add to your ISR definitions to make the compiler generate various kinds of prologs/epilogs, with various sets of registers automatically saved or not. I have never used XC8, but I used mcc8 a good while ago, and I think there were also some similar attributes? Something to check out. The default (without specific attributes) would generally save ALL registers, which is pretty "expensive". The max optimization level with the Pro version may optimize register saving without any special attribute needed?

Another thing I remember with the 8-bit PICs is the banks - I dunno if the PIC you're using has them. Depending on compiler optimization level, the resulting code may be pretty inefficient too.

Anyway, your best option would be to make it a habit to look at the generated assembly code and thus understand better how the compiler works and what you can do to get better code output, especially in critical parts.

cv007 · « **Reply #9 on:** December 09, 2019, 04:18:26 pm »

Since you have limited pins on the 1572, it's going to be difficult to use up the 3 pwm timers as pwm outputs, and still have other useful things going on with the pins. I also missed that timer2 has a period register, so it could also be used: same /4 prescale, and a pr2 of 199 would also get to a 100us period, with an interrupt.
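
A rough sketch of the timer2 idea, assuming the usual enhanced mid-range register and bit names from the XC8 device header (`PR2`, `T2CON`, `PIR1bits.TMR2IF`, `PIE1bits.TMR2IE`), which should be checked against the 12F1572 datasheet. The helper `pr2_for_period` and the function names are invented for this example. The PIC-only part is wrapped in `#ifdef __XC8` so the helper can also be compiled and checked on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* PR2 value for a wanted Timer2 period. Timer2 counts from 0 up to PR2
 * and then resets, so one period is (PR2 + 1) timer clocks. */
uint8_t pr2_for_period(uint32_t fosc_hz, uint32_t prescale, uint32_t period_us)
{
    assert(prescale != 0u);

    uint32_t timer_hz = fosc_hz / 4u / prescale;            /* Timer2 clock */
    /* integer math in kHz steps; fine for round values like these */
    uint32_t counts = (timer_hz / 1000u) * period_us / 1000u;

    assert(counts >= 1u && counts <= 256u);                 /* must fit PR2 */
    return (uint8_t)(counts - 1u);
}

#ifdef __XC8            /* the rest is only built by the XC8 compiler */
#include <xc.h>

void timer2_init_100us(void)
{
    PR2 = pr2_for_period(32000000UL, 4u, 100u);  /* 199 at 32MHz, /4 prescale */
    TMR2 = 0;
    T2CON = 0x05;               /* postscale 1:1, TMR2ON = 1, prescale 1:4 */
    PIR1bits.TMR2IF = 0;        /* clear any stale flag */
    PIE1bits.TMR2IE = 1;        /* enable the Timer2 match interrupt */
    INTCONbits.PEIE = 1;        /* peripheral interrupts on */
    INTCONbits.GIE = 1;         /* global interrupts on */
}

void __interrupt() isr(void)
{
    if (PIE1bits.TMR2IE && PIR1bits.TMR2IF) {
        PIR1bits.TMR2IF = 0;    /* the timer is never reloaded here */
        /* 100us tick work goes here */
    }
}
#endif
```

On a 2K part you would probably hard-code `PR2 = 199` rather than pull in the 32-bit divide routines; the helper is only there to show where the number comes from.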

Mchp has a truckload of mcu's, and its more difficult figuring out which to choose than it is to find something that will work. In the 'similar devices' tab for the 12f1572 there is listed a PIC16F1575 for example. A 14 pin device in 4 packages, has 4 times the flash and ram, in addition you also get the pps feature where you can get digital i/o routed to any pin. If you don't like the added size of a 14pin dip, use one of the smaller packages.

Then you have another couple dozen in the newer 16f family that are going to be similar enough, and you are just choosing different features.

With the free xc8 compiler, you get up to -O2 optimization, and the missing -O3/-Os options are not a big deal as it makes little difference. Maybe if you are buying in large quantity (10k+), and are on the edge of needing the next pic up in flash size, and it costs 5-10 cents more, the extra 100-200 bytes saved with the -Os optimization may be worth the price. If you are in more of a hobby or low volume situation, then just move up to the next pic if/when needed and pay the extra 10 cents.

edit-
I tested -O2 vs -Os on this project-
https://github.com/cv007/3DigitLed
and it is the exact same size.
I don't have the pro, but I know which byte to change to make it so just for curiosity (I don't know for a fact I'm getting -Os, but there are no complaints about the license for the optimization level). Not worth worrying about what you are missing, because you are not missing anything. There may be cases where it starts to show up in larger projects, but I doubt it's much.

SuzyC · « **Reply #10 on:** December 10, 2019, 02:21:56 am »

cv007, thanks again for your excellent help and advice.

I don't seem to be able to offer the xc8 v2.1 compiler --OPT=Os to set the compiler optimization. The xc8 compiler aborts immediately when this is in the command line that invokes it. I can however offer --OPT=9 without any problem.

I have experienced seeing a full compiler result output page of "cannot find xx bytes" lines when compiling a 16F886 using xc8 2.1 with a program that compiles without error to use 88% of pgm space with an older Pro compiler, and that is a big difference to me.

MarkF · « **Reply #11 on:** December 10, 2019, 05:31:58 am »

If you use a PIC (like a PIC16F876A and the 886) that has the Capture/Compare/PWM (CCP) module,
you can use it to reset the Timer without the need of any intervention in the ISR.

The ISR overhead does NOT affect the timing.

Example:

//============================================================
#include <xc.h>                  // device register definitions (XC8)

void main(void)
{
   // Setup CCP1 configuration for clock interrupt
   CCPR1H=0x00;                  // ~25 kHz interrupt with 20 MHz clock (0x00c8 = 200 counts at Fosc/4 = 5 MHz)
   CCPR1L=0xc8;
   CCP1CON=0x0b;                 // Compare mode, trigger special event
   // Setup Timer1 configuration
   TMR1H=0;
   TMR1L=0;
   T1CON=0x05;                   // 1:1 prescale, internal clock (Fosc/4), TMR1ON set
   // Enable CCP1 interrupt
   PIR1bits.CCP1IF=0;            // CCP1 Interrupt Flag bit
   PIE1bits.CCP1IE=1;            // CCP1 Interrupt Enable bit
   INTCON=0xc0;                  // GIE and PEIE interrupts


   // MAIN LOOP
   while (1) {

   }

}

//============================================================
void __interrupt() isr(void)
{
   if (PIE1bits.CCP1IE && PIR1bits.CCP1IF) {

      // ---------- Reset Timer1 interrupt ----------
      PIR1bits.CCP1IF = 0;       // Clear CCP1 Interrupt Flag

      // TODO:

   }
}

SuzyC · « **Reply #12 on:** December 11, 2019, 12:25:39 am »

Several replies suggest the use of a PWM counter instead of my TMR0 ISR.

Thanks for the advice!

The fact is very clear, I don't have any problem with using TMR0 to create a periodic ISR.

If the "TMR0=59;" statement is clearly at the absolute beginning of the ISR, immediately following the "if (T0IF)", then it always works perfectly and consistently and gives almost perfect timing, even when observed over tens of minutes on a storage scope.

It works perfect, of course, only if the magic right value to offset TMR0 is used.
The problem I had was not knowing what instructions were delaying ISR latency after a TMR0 overflow IRQ trigger, so I could by calculation, rather than by empirical means, find the magic number that works, the one and only right value.

It turns out that when the Ver 2.1 free version compiles a program that uses floating-point math with --OPT=2, it repeats the same code for each instance of (for example) a floating-point multiplication. The Pro version cleans up this unnecessary duplication, and this is one reason why the Pro-compiled code doesn't become bloatware.

NorthGuy · « **Reply #13 on:** December 11, 2019, 02:38:48 am »

Quote from: SuzyC on December 11, 2019, 12:25:39 am

The fact is very clear, I don't have any problem with using TMR0 to create a periodic ISR.

If the "TMR0=59;" statement is clearly at the absolute beginning of the ISR, immediately following the "if (T0IF)", then it always works perfectly and consistently and gives almost perfect timing, even when observed over tens of minutes on a storage scope.

If the timer overflows while your code is busy working on some different interrupt, or while interrupts are disabled for whatever reason, your timer ISR gets delayed. Your "TMR0=59" ties the timing of the next interrupt to the beginning of the ISR. This makes the delay permanent. As your code gets bigger and involves more stuff, ISR delays will happen more and more often, and the frequency of your interrupts will dwindle down from 10 kHz, generally unpredictably.

Even if you are strongly against free-running timers, you can at least do "TMR0 += 50 (or whatever)". This will make your timer intervals somewhat immune to interrupt delays.
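
To see why the "+=" form absorbs a late ISR, here is a small host-side model (not PIC code) of an 8-bit timer that overflows at 256. The function `total_ticks`, its parameters, and the latency values are made up for illustration. The model also leaves out real-hardware details such as the prescaler being cleared or counting pausing when TMR0 is written, so the actual add value would still need trimming on a scope.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Total timer ticks spanned by n interrupts when the ISR reloads an
 * 8-bit up-counter that overflows at 256.
 *   latency[i] : ticks between overflow i and the ISR's write to the timer
 *   reload     : value the ISR uses
 *   additive   : 0 -> "TMR0 = reload", nonzero -> "TMR0 += reload"
 */
uint32_t total_ticks(const uint8_t *latency, size_t n, uint8_t reload, int additive)
{
    uint32_t now = 0;                       /* time of the latest overflow */

    for (size_t i = 0; i < n; i++) {
        /* the timer rolled over to 0 and kept counting while the ISR was late */
        uint8_t tmr = latency[i];
        now += latency[i];                  /* the ISR reaches its write */

        if (additive) {
            /* the sum must not wrap, or an overflow would be missed */
            assert((uint32_t)tmr + reload <= 255u);
            tmr = (uint8_t)(tmr + reload);  /* keeps the ticks already counted */
        } else {
            tmr = reload;                   /* throws those ticks away */
        }

        now += 256u - tmr;                  /* ticks until the next overflow */
    }
    return now;
}
```

With latencies of 3, 3, 10 and 3 ticks, assigning 59 gives periods of 200, 200, 207 and 200 ticks, so the one late ISR pushes every later interrupt back by 7 ticks. Adding 56 instead gives 200 ticks every time, whatever the latency was.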

SuzyC · « **Reply #14 on:** December 11, 2019, 04:24:38 pm »

NorthGuy, thanks for your reply, but what are you saying? I am already setting TMR0=59 or (in the case of 16F886 TMR0=18.)

I have nothing against free-running timers, TMR0 is a free-running timer.

Of course I recognize that servicing other enabled interrupts could make timings from a TMR0-based ISR less accurate, but not if those IRQ's are not enabled. Enabled or not, other IRQ flags can be set.

The TMR0 ISR can service any other IRQ flags that are set, within 100uS and within the TMR0 ISR itself, without upsetting a main() pgm that relies on the best imitation of an RTC. I code this by just generating accurate timings derived from a TMR0 ISR made precise with a Xtal-based system Osc. I therefore don't write any other code that would delay a TMR0 ISR IRQ.

Most of the time, exact timing is what counts, but a lot of other interruptions, like even those in real life can wait a few microseconds more to get some attention.

SuzyC · « **Reply #15 on:** December 11, 2019, 04:41:10 pm »

Thanks MarkF, for your capturating advice!

Your idea is fine, but it also ties up Timer1, and it is the only 16-bit timer available(16F886 et al) and a likely resource needed to accomplish other tasks within a pgm.

I'd rather commit to TMR0, an eight-bit timer, and get just as good results.

SuzyC · « **Reply #16 on:** December 11, 2019, 04:45:18 pm »

CV007, "Since you have limited pins on the 1572, it's going to be difficult to use up the 3 pwm timers as pwm outputs, and still have other useful things going on with the pins."

Ever consider three long strings of Xmas lights that can be enlightened using three PWM's?

NorthGuy · « **Reply #17 on:** December 11, 2019, 04:47:12 pm »

Quote from: SuzyC on December 11, 2019, 04:24:38 pm

I have nothing against free-running timers, TMR0 is a free-running timer.

A free-running timer is the one which is only set once and then runs freely. Setting TMR0 to 59 in the middle makes it not free-running.

Quote from: SuzyC on December 11, 2019, 04:24:38 pm

Of course I recognize that servicing other enabled interrupts could make timings from a TMR0-based ISR less accurate, but not if those IRQ's are not enabled. Enabled or not, other IRQ flags can be set.

Of course, if you never enable other interrupts and never disable your timer interrupt then it doesn't matter.

Your ISR code checks for the T0IF flag. Apparently, if you do this, you think T0IF may not be set, which is only possible if another enabled interrupt exists which has caused the ISR to start. Or are you checking T0IF only to deliberately slow down ISR latency?

SuzyC · « **Reply #18 on:** December 11, 2019, 05:01:40 pm »

NorthGuy, I don't get it, the TMR0 ISR is only invoked by T0IF being set, so I can understand why you could say "Apparently, if you do this, you think the T0IF may not be set."

Of course, you are right, I may not need to doubt that T0IF invoked the ISR, but it's a cheap fail-safe statement that makes sure I am not being confused by some filthy, deep-buried bug in my pgm that could invoke an IRQ mistakenly left enabled and drive me a little more crazy for several hours. After all, in most of my many coding masterpieces I service other flags within the TMR0 ISR.
-----------------------------

"Setting TMR0 to 59 in the middle makes it not free-running."
There is no way to enable/stop TMR0 on the 16F886 (and many other similar chips). It is always free running if not being preset.

My code consistently presets TMR0, and this tiny always-the-same delay makes TMR0 as free-running as possible while setting exact timing.

SuzyC · « **Reply #19 on:** December 11, 2019, 06:58:12 pm »

We are drifting from the OP.

Can anyone show me the difference in compiled size between OPT=2 and OPT=ALL?

SiliconWizard · « **Reply #20 on:** December 11, 2019, 07:21:38 pm »

I suggest reading the "2.6.7 How Can I Make My Interrupt Routine Faster?" section of XC8's user guide (and "4.9.4 Context Switching"). It basically says what I said earlier.

