Author Topic: C runtime optimization - division and modulous together  (Read 4336 times)

0 Members and 1 Guest are viewing this topic.

Offline aurmerTopic starter

  • Regular Contributor
  • *
  • Posts: 84
  • Country: us
  • Insert text here.
C runtime optimization - division and modulous together
« on: December 16, 2016, 07:56:51 pm »
Hello fellow EEVers.

I have a C 'fundamentals' issue that I am dealing with. I am working with Microchip's XC16 compiler (which I believe is JUST LIKE C30 but with a few architecture-specific optimizations).

I know that / does the same machine operation as %.
When you perform this operation, the / result is placed in w0 and the % result is placed in w1.

My goal is runtime speed. In C, is there a way for me to perform this division operation ONCE and receive both the division result AND the remainder all with one C operation? My best idea is to take my two operands and use inline assembly to do division and then store w0 at address X and w1 at address Y and then proceed.

Am I missing something obvious? Is there a better way?

Thanks!
If I just asked the wrong question, shame on me for asking before I was ready for help. Please be kind and direct me to a resource which will teach me the question I SHOULD be asking. Thank you.
 

Offline IanB

  • Super Contributor
  • ***
  • Posts: 11885
  • Country: us
Re: C runtime optimization - division and modulous together
« Reply #1 on: December 16, 2016, 08:01:29 pm »
Have you considered the div() function in the standard library?

http://www.cplusplus.com/reference/cstdlib/div/
 

Offline aurmerTopic starter

  • Regular Contributor
  • *
  • Posts: 84
  • Country: us
  • Insert text here.
Re: C runtime optimization - division and modulous together
« Reply #2 on: December 16, 2016, 08:03:20 pm »
Thank you for that point, I will test that at runtime and present my results here.
If I just asked the wrong question, shame on me for asking before I was ready for help. Please be kind and direct me to a resource which will teach me the question I SHOULD be asking. Thank you.
 

Offline Twoflower

  • Frequent Contributor
  • **
  • Posts: 737
  • Country: de
Re: C runtime optimization - division and modulous together
« Reply #3 on: December 16, 2016, 08:20:15 pm »
Note: I don't know the quality of the XC16 compiler.

Have you checked what the compiler actually produces? If the data structures are well aligned usually the current compiler do know lots of tricks. Hiding your intention behind some optimisation tricks may led that the compiler don't understand your intention. If the compiler is reasonable well they might use some other arithmetic tricks that are faster than single assembler commands (how many cycles take the div on your target)? Probably y = a/b; x = a%b; will produce what you want.

If your target processor has a cache there are things you can do to increase the cache hit rate: It might be useful to check if any nested loops are 'the wrong way around' (swapping inner loop with outer loop) or thinking about the data structures (array of struct vs. struct of array).
 

Offline aurmerTopic starter

  • Regular Contributor
  • *
  • Posts: 84
  • Country: us
  • Insert text here.
Re: C runtime optimization - division and modulous together
« Reply #4 on: December 16, 2016, 11:05:50 pm »
here are my results

I set an interrupt timer to go off after 0x8000 cycles, and when the interrupt fired, each of these code blocks (which perform the exact task) performed as follows.


Code: [Select]
    i = 0;
    int j = 5;
    int quotient = 0;
    int remainder = 0;
    int this_op_represents_how_long_it_takes_to_access_quot = 0;
    int this_op_represents_how_long_it_takes_to_access_rem = 0;
   
    TMR1 = 0;
   
    while(1){
        i++;
        quotient = j / 2;
        remainder = j % 2;
        this_op_represents_how_long_it_takes_to_access_quot = quotient;
        this_op_represents_how_long_it_takes_to_access_rem = remainder;
    }
0x024A (decimal 586) times performed
56 cycles per loop

Code: [Select]
    i = 0;
    int j = 5;
    div_t divresult;
    int this_op_represents_how_long_it_takes_to_access_quot = 0;
    int this_op_represents_how_long_it_takes_to_access_rem = 0;
   
    TMR1 = 0;
   
    while(1){
        i++;
        divresult = div(j,2);
        this_op_represents_how_long_it_takes_to_access_quot = divresult.quot;
        this_op_represents_how_long_it_takes_to_access_rem = divresult.rem;
    }
0x02E9 (decimal 745) times performed
44 cycles per loop

Code: [Select]
    i = 0;
    this_op_represents_how_long_it_takes_to_access_quot = 0;
    this_op_represents_how_long_it_takes_to_access_rem = 0;
   
    TMR1 = 0;
   
    while(1){
        i++;
        asm volatile("MOV _j, w4");
        asm volatile("MOV #2,  w5");
        asm volatile("REPEAT #0x11");
        asm volatile("DIV.SW w4, w5");
        asm volatile("MOV w0, _this_op_represents_how_long_it_takes_to_access_quot");
        asm volatile("MOV w1, _this_op_represents_how_long_it_takes_to_access_rem");
    }
0x0493 (decimal 1171) times performed
28 cycles per loop


This is a demonstration of how the inline assembly can do it. I am just asking if C has a "perfectly" streamlined/optimized operator for this.
If I just asked the wrong question, shame on me for asking before I was ready for help. Please be kind and direct me to a resource which will teach me the question I SHOULD be asking. Thank you.
 

Offline bson

  • Supporter
  • ****
  • Posts: 2270
  • Country: us
Re: C runtime optimization - division and modulous together
« Reply #5 on: December 16, 2016, 11:19:44 pm »
If I remember correctly the free XC8 compiler doesn't even permit use of hardware MUL... and it's not like you can just inline the asm or machine code for it either, because the assembler won't accept it.  Presumably so you can't circumvent use of their library.  This and some other completely idiotic features is why I switched away from PIC to MSP430 and use of either TI's compiler ($10 with the purchase of a LaunchPad) or gcc.  Shitty tools is really the worst place to begin any project.
 
The following users thanked this post: aandrew

Online Ian.M

  • Super Contributor
  • ***
  • Posts: 12860
Re: C runtime optimization - division and modulous together
« Reply #6 on: December 17, 2016, 01:00:03 am »
If I remember correctly the free XC8 compiler doesn't even permit use of hardware MUL... and it's not like you can just inline the asm or machine code for it either, because the assembler won't accept it.
Either your memory is incorrect or the hardware multiplier has been enabled for free mode in a later XC8 version to the one you were using.
Code: [Select]
l649:
;main.c: 18: unsigned int t;
;main.c: 19: t=PORTA*PORTB;
movf ((c:3968)),c,w ;volatile
mulwf ((c:3969)),c ;volatile
movff prodl,(c:main@t)
movff prodh,(c:main@t+1)
line 20
That's a snippet of the .as file from a PIC18 test program that contained:
Code: [Select]
void main(void)
{
   unsigned int t;
   t=PORTA*PORTB;
built in FREE mode using XC8 v1.35.
 

Offline JPortici

  • Super Contributor
  • ***
  • Posts: 3461
  • Country: it
Re: C runtime optimization - division and modulous together
« Reply #7 on: December 17, 2016, 11:34:49 am »
Optimizations used?
I used to think i was smarter than the compiler but XC16 is actually good, so i let go. of course assembly is best for raw performance, but this seems like a benchmark: useless in the real world, where the sanity you keep from writing C is work a couple o megahertz more to keep up with performance :)

With no optimization the code is a bit bulky and reloads variables and relinks at every unnecessary occasion, but enabling standard optimization should result in your assembly code, give or take one instruction.
 

Offline bktemp

  • Super Contributor
  • ***
  • Posts: 1616
  • Country: de
Re: C runtime optimization - division and modulous together
« Reply #8 on: December 17, 2016, 11:54:28 am »
C30/XC16 have a lot of buildin functions for making use of the nice instruction set without the need of inline assembly:
Quote
__builtin_divmodsd
Description: Issues the 16-bit architecture’s native signed divide support with the same restrictions given in the “dsPIC30F/33F Programmer’s Reference Manual” (DS70157). Notably, if the quotient does not fit into a 16-bit result, the results (including remainder) are unexpected. This form of the built-in function will capture both the quotient and remainder.
Prototype: signed int __builtin_divmodsd(signed long dividend, signed int divisor,signed int *remainder);
Argument: dividend number to be divided
divisor number to divide by
remainder pointer to remainder
There is also __builtin_divmodud for unsigned division.
 

Offline John Coloccia

  • Super Contributor
  • ***
  • Posts: 1212
  • Country: us
Re: C runtime optimization - division and modulous together
« Reply #9 on: December 17, 2016, 12:06:12 pm »
You must have optimizations turned off. If not, the compiler would simply remove all loop that never does anything and never exits, right? I'm assuming it would, anyhow.

Why not turn on optimizations, and then use the real time clock (assuming there is one) to do a real timing analysis on the straight forward version of the code. Why are you trying to save a handful of cycles, anyhow? Are you starved for resources?
 

Offline bson

  • Supporter
  • ****
  • Posts: 2270
  • Country: us
Re: C runtime optimization - division and modulous together
« Reply #10 on: December 17, 2016, 10:17:34 pm »
Either your memory is incorrect or the hardware multiplier has been enabled for free mode in a later XC8 version to the one you were using.
They must have come to their senses and enabled it in the free version.
 

Online Ian.M

  • Super Contributor
  • ***
  • Posts: 12860
Re: C runtime optimization - division and modulous together
« Reply #11 on: December 17, 2016, 11:56:07 pm »
Either your memory is incorrect or the hardware multiplier has been enabled for free mode in a later XC8 version to the one you were using.
They must have come to their senses and enabled it in the free version.
Well I've just checked v1.10 (August 2012) and it generates the same MULWF instruction so you must have been using a *really* old version.
 

Offline julian1

  • Frequent Contributor
  • **
  • Posts: 735
  • Country: au
Re: C runtime optimization - division and modulous together
« Reply #12 on: December 18, 2016, 12:21:36 am »
It's a function of the quality of the compiler. Gcc on x86 will do it. Eg. take test function to compute result and remainder and return the sum.

Quote
int main( int a, int b)
{
  return a / b + a % b;
}

without optimisation gcc gives generates two division instructions (and lots of stack shuffling) then adds them

Quote
...
        movl    %esi, -8(%rbp)
        movl    -4(%rbp), %eax
        cltd
        idivl   -8(%rbp)
        movl    %eax, %ecx
        movl    -4(%rbp), %eax
        cltd
        idivl   -8(%rbp)
        movl    %edx, %eax
        addl    %ecx, %eax
...


compiled with -O2 optimisation uses a single idiv to get the result,
Quote
main:
.LFB0:
        .cfi_startproc
        movl    %edi, %eax
        cltd
        idivl   %esi
        addl    %edx, %eax
        ret
        .cfi_endproc
 

Offline aurmerTopic starter

  • Regular Contributor
  • *
  • Posts: 84
  • Country: us
  • Insert text here.
Re: C runtime optimization - division and modulous together
« Reply #13 on: December 20, 2016, 01:58:55 am »
Thanks all for the discussion. This was a great learning experience. The _builtin_divmodsd is what I will probably go with in the end.

I didn't understand the -O compiler flag when I posed this question, but since then I have registered for a 60 trial of the PRO version just to see what the -O3 can do. You are right, at the -O3 level of optimization, it figures out how to just do the DIV once.
If I just asked the wrong question, shame on me for asking before I was ready for help. Please be kind and direct me to a resource which will teach me the question I SHOULD be asking. Thank you.
 

Offline JPortici

  • Super Contributor
  • ***
  • Posts: 3461
  • Country: it
Re: C runtime optimization - division and modulous together
« Reply #14 on: December 20, 2016, 06:29:06 am »
o3? o1 should already do the trick
or at least be good enough so you don't have to pay for the lic.... hey wait you can recompile the compiler using the sources provided by microchip and patch away the licensing bit. completely legal, the license part is more a lazy tax. user karel made a comprehensive guide
https://www.eevblog.com/forum/microcontrollers/pic32-evolution/msg1007099/#msg1007099
 


Share me

Digg  Facebook  SlashDot  Delicious  Technorati  Twitter  Google  Yahoo
Smf