Author Topic: Divide algorithm  (Read 20716 times)

0 Members and 2 Guests are viewing this topic.

Offline npelovTopic starter

  • Frequent Contributor
  • **
  • Posts: 331
  • Country: bg
    • Microlab.info
Divide algorithm
« on: March 11, 2014, 05:29:18 pm »
Hi,

I need to divide 2 long ints up to an arbitrary precision. I must be able to output digits one by one. When the part before decimal point is over then the decimal point is output and then up to N digits after the decimal point.
So if N=6 and I have to divide  1234/33 then result would be:
37.393939
I'm not sure if I want to round correctly. I have limited memory so it would be fine without rounding the last digit.

I tried to search for an algorithm and I found this page:
http://www.cs.sunysb.edu/~skiena/392/programs/bignum.c

which is nice and does big integer calculations, but the result is also an integer. My head is now full of bugs, so any attempt to modify the algorithm fails. I'll try later, but any help will be appreciated. the idea is to have something like:

// outputs a/b up to N digits after decimal point
function divideBig(unsigned long int a, unsigned long int b){
  if(b = 0)
    return 0; // I'll handle that one later
  while(precision < N){
     // calc
     outputDigit(x);
  }
}

Thanks in advance!
 

Offline npelovTopic starter

  • Frequent Contributor
  • **
  • Posts: 331
  • Country: bg
    • Microlab.info
Re: Divide algorithm
« Reply #1 on: March 11, 2014, 05:48:47 pm »
Believe or not the biggest ideas come to you while you are in the toilet. The algorithm here allows math with only integer numbers. The result is also integer. The numbers are held simply in character arrays (well struct with character array, sign and count of digits). So in order to get 6 digits after the decimal point I can:
1. add 6 zeroes after first number (multiply by 1000000)
2. run math function
3. when outputting the result put a decimal point 6 digits before the end of the number

... it's that simple.

Well if you have any ideas to optimize that, you are welcome. I only have 2k instructions memory in PIC16F1503 and 128 bytes of ram. Maybe I should use BCD to reduce the usage of RAM...
 

Offline npelovTopic starter

  • Frequent Contributor
  • **
  • Posts: 331
  • Country: bg
    • Microlab.info
Re: Divide algorithm
« Reply #2 on: March 11, 2014, 06:29:09 pm »
Back to square 1. Can't use the library because it uses recursion :(

Code: [Select]
Error   [1089] E:\data\PIC\frequency-counter\firmware\bignum.c; 111. recursive function call to "_subtract_bignum"
Even if the compiler supports it the stack will probably not be enough.
 

Online nctnico

  • Super Contributor
  • ***
  • Posts: 28622
  • Country: nl
    • NCT Developments
Re: Divide algorithm
« Reply #3 on: March 11, 2014, 06:33:28 pm »
For for fixed point math. If N is 6 then multiply 1234 by 1000000, divide by 33 and send a dot after the 6th digit.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline miceuz

  • Frequent Contributor
  • **
  • Posts: 387
  • Country: lt
    • chirp - a soil moisture meter / plant watering alarm
Re: Divide algorithm
« Reply #4 on: March 11, 2014, 06:41:44 pm »
if I recall correctly, you can always rewrite a recursive algorithm as a loop
« Last Edit: March 11, 2014, 07:27:30 pm by miceuz »
 

Offline Len

  • Frequent Contributor
  • **
  • Posts: 553
  • Country: ca
Re: Divide algorithm
« Reply #5 on: March 11, 2014, 06:47:05 pm »
if I recall correctlym you can always rewrite a recursive algorithm as a loop
But you still need a stack-like data structure to hold the partial results, which uses a similar amount of memory.

[Edit] Actually, looking at the code, I think it could be re-written to not be recursive at all.
« Last Edit: March 11, 2014, 06:50:42 pm by Len »
DIY Eurorack Synth: https://lenp.net/synth/
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Divide algorithm
« Reply #6 on: March 11, 2014, 08:21:24 pm »
Quote
up to an arbitrary precision.

Not going to happen.

Set realistic goals and then you have a chance of getting something done.

Typical routes are BCD, or one of those Qn formats (Q15 or Q31 for example).
================================
https://dannyelectronics.wordpress.com/
 

Offline Mechatrommer

  • Super Contributor
  • ***
  • Posts: 11714
  • Country: my
  • reassessing directives...
Re: Divide algorithm
« Reply #7 on: March 11, 2014, 08:46:52 pm »
pls refine this, i'm not going to debug it, just come out of my biggest astral plane in the lab in the sleepy "almost dawn" 4AM here...
Code: [Select]
function divideBig(unsigned long int a, unsigned long int b) {
long int rem;

outputDigit(a \ b);
outputChar('.');
a = a % b; // remainder

while (a) {
outputDigit((a * 10) \ b);
a = (a * 10) % b; // remainder
};
};
note:
1) i'm not sure exactly in C (need to check this out later), but inverted slash "\" (instead of normal slash for division "/") as in VB means get integer part only without "rounding off" to nearest integer. so 19 \ 10 will result 1, not 2. and iirc % is for modulus (remainder) in C right? well you check that one out, i'm sleepy, its been a while i talked in C.
2) that one is designed for infinite precision (unless until the remainder is exhausted). so if you try to divide pi it will halt :P
3) since its not debugged, its highly probably there will be bugs. so you need 3X the effort to debug it ;)
4) or the whole thing is just entirely "something else"
5) and last edit... it only works for +ve numbers which is bad.
6) and last edit... i didnt understand your question which i believe now trying to do 64bit numbers in 8bits machine. so...
FWIW.
« Last Edit: March 11, 2014, 09:02:22 pm by Mechatrommer »
Nature: Evolution and the Illusion of Randomness (Stephen L. Talbott): Its now indisputable that... organisms “expertise” contextualizes its genome, and its nonsense to say that these powers are under the control of the genome being contextualized - Barbara McClintock
 

Offline T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 22436
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Divide algorithm
« Reply #8 on: March 11, 2014, 08:49:31 pm »
if I recall correctlym you can always rewrite a recursive algorithm as a loop
But you still need a stack-like data structure to hold the partial results, which uses a similar amount of memory.

[Edit] Actually, looking at the code, I think it could be re-written to not be recursive at all.

The advantage should be substantial, like several fold.  If you're talking C, each recursive call gets you the stack frame of that function, including the return address, and usually a frame pointer.  The classic x86 being;
Code: [Select]
CALL func
...
func:
PUSH BP
MOV BP, SP
ADD BP, -framesize
[PUSH used registers]
...
[POP used registers]
POP BP
RET
And I believe for something like AVR8, gcc normally uses X (or Y or Z) for the purpose of BP.  So you get at least two WORDs on the stack plus new copies of all local variables, plus backups of all registers (give or take how the compiler has allocated and optimized the function, whether it uses registers or memory, and whether it has free registers to clobber or not).

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2823
  • Country: nz
Re: Divide algorithm
« Reply #9 on: March 11, 2014, 08:52:05 pm »
Here's my stab at it - seems to work for the one test case you supplied. No recursion, not the highest performance but that wasn't one of your constraints.

#include "stdio.h"
unsigned long numerator, denominator;
char place;
#define DIV_MAX_INT_DIGITS 8

int div_setup(unsigned long a, unsigned long b)
{
   if(b == 0) return -1;

   /* Prevent any overflows */
   if(a > 0xFFFFFFFF/10)
     return -1;
   if(b > 0xFFFFFFFF/10)
     return -1;

   numerator = a;
   denominator = b;
   place = DIV_MAX_INT_DIGITS-1;
   return 0;
}

int divide_extract_next_digit(void)
{
   int digit = 0;
   unsigned long dtemp = denominator;
//   printf("Extracting place %i...",place);

   if(place > 0) {
     int i;
     for(i = 0; i < place; i++)
       dtemp*=10;
   }
   else if(place < 0)
   {
     /* Extracting the decimal digits */
     numerator *= 10;
   }

   /* Actually extract the digit */
   while(numerator >= dtemp)
   {
     digit++;
     numerator -= dtemp;
   }

   place--;
//   printf("%i\r\n",digit);
   return digit;
}

int main(int argc, char *argv[])
{
   if(div_setup(1234, 33) < 0)
     return -1;


   /* Surpress leading zeros */
   while(place > 0) {
     char c = divide_extract_next_digit();
     if(c != 0)
     {
       putchar('0'+c);
       break;
     }
   }

   while(place >= 0)
     putchar('0'+divide_extract_next_digit());
   putchar('.');

   while(place > -DIV_MAX_INT_DIGITS)
     putchar('0'+divide_extract_next_digit());

   putchar('\r');
   putchar('\n');
   return 0;
}
« Last Edit: March 11, 2014, 09:24:36 pm by hamster_nz »
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2823
  • Country: nz
Re: Divide algorithm
« Reply #10 on: March 11, 2014, 09:03:27 pm »
Quote
up to an arbitrary precision.

Not going to happen.


The input are two 32-bit unsigned integers (depending on platform), so output has a range of between (2^32-1)/1 down to 1/(2^32-1).

When just using the full range of 32-bits numbers it is a bit painful to handle corner cases - avoiding overflows and so on. In my code above I'm limiting both inputs to at most 429,496,729. This allows it to work with up to nine digits to the left of the decimal (well, actually eight and a half), but as many as you care to the right of the decimal.

 
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: Divide algorithm
« Reply #11 on: March 11, 2014, 10:54:12 pm »
Treat it like a long division but you will need to do it in assembler for performance.

Code: [Select]
1234/33

10011010010 / 100001
100001
------
000101
   101100
   100001
   ------
   00101110
     100001
     ------
     00110100 <-- here we past the integer part and we step into the fractional part
       100001
       ------
       0100110
        100001
        ------
        000101000
           100001
           ------
           000111000
              100001
              ------
              0101110
               100001
               ------
               00110100
                 100001
                 ------

 
Result: 0010 0101.0110 0100 1101

Explained a little:

1234d == 10011010010b (dividend)
33d == 100001b (divisor)

zeros at the beginning don't count so you want to shift left the dividend into an accumulator until the accumulator is bigger than the divisor and you have to take into account the number of digits. Since in C you don't get to access the Carry and Overflow flags you can use bitwise operators.

This is what I just came up with, not tested so the code might have bugs, but I think it should work or at least give you an idea.

you gotta handle the divisor == 0 as well that I didn't include in the code
and also the infinites, the undetermined and nans (not a number) as well if you need them

Code: [Select]
unsigned long divideBig(unsigned long dividend, unsigned long divisor)
{
  unsigned long result = 0;
  unsigned long accumulator = 0;
  int precission=sizeof(unsigned long)*8; // number of bits
  unsigned long msb = 1<<(precission-1); // most significant bit constant.
  unsigned char fraction = 0; // for fractional part later on (8 bits)

  while(precission) {
    // check most significant bit
    if(dividend&msb) { // We are going to carry a bit into accumulator
      accumulator = (accumulator<<1)|1;
    } else { // we just shift left the accumulator.
      accumulator = (accumulator<<1);
    }
    // Check if accumulator is binary divisible by divisor.
    if(accumulator >= divisor) {
      // we just need to substract the divisor from accumulator
      accumulator -= divisor;
      // adjust result with a 1
      result = (result<<1)|1;
    } else {
      // adjust result with a 0
      result = (result<<1);
    }
    // shift the dividend (for the next bit)
    divided <<=1;
    // You might be able to do an early out check
    // if the accumulator and dividend are 0.
    // but you will have to shift the result by the precission counter (number of bits) left.

    // decrement the number of bits to process.
    precission--;
  }
  // At this point you have the result (integer division)
  // if you want fixpoint decimals you can run it through the same loop for
  // as many binary bits of precission you need.
  // the fraction needed is in the accumulator and you just pump 0's per loop.
  // for example, we want the fraction for 8 bits.

  precission = sizeof(unsigned char)*8; // number of bits in an unsigned char (really is 8 if you want to hard code it)

  while(precission) {
    // we just shift left the accumulator since we are done with the dividend.
    accumulator = (accumulator<<1);
    // Check if accumulator is binary divisible by divisor.
    if(accumulator >= divisor) {
      // we just need to substract the divisor from accumulator
      accumulator -= divisor;
      // adjust fraction with a 1
      fraction = (fraction <<1)|1;
    } else {
      // adjust fraction with a 0
      fraction = (fraction<<1);
    }
    // shift the dividend (for the next bit)
    divided <<=1;
    // You might be able to do an early out check
    // if the accumulator is 0.
    // but you will have to shift the fraction by the precission counter (number of bits) left.

    // decrement the number of bits to process.
    precission--;
  }
  // fraction will have your fractional part in binary format.
  // meaning the most significant bit will be 2^-1 or 1/2
  // the next 1/4, 1/8, 1/16, 1/32, 1/64, 1/128 and 1/256
  // if the bit is on then you add the fraction and you should come up with the
  // value.
  ...
}


Edit: I didn't deal with the sign because the inputs were unsigned long, otherwise you have to compute the result for the sign with an xor

Edit2: once you reach the fractional part after exhausting the divider you can do some optimization. Say you keep a history of accumulator values if the accumulator gets to the same value again then you know it's periodical and will repeat over and over. since you have the binary result for those bits, then you can just shift and or the periodical result.

Edit3: also I didn't return the result and didn't set how to pass the fractional part back, but it was to give you an idea on how to do this.

Edit4: there are a lot of optimizations that you can make if you know the sizes of the operands, you can hard code a lot of those values for your needs, if using fixed point math say 16.16 you can optimize the code to deal with that too. If done in assembly you could make a function that will deal with both the integer part and the fractional part reducing code size, also you can use the overflow flag instead of the accumulator, the example is just a template in C but assembly language will give you the best results.
« Last Edit: March 12, 2014, 02:03:20 am by miguelvp »
 

Offline npelovTopic starter

  • Frequent Contributor
  • **
  • Posts: 331
  • Country: bg
    • Microlab.info
Re: Divide algorithm
« Reply #12 on: March 12, 2014, 01:43:03 am »
Thanks for all suggestions! I was a bit tired and I didn't explain the best. So the best term is "fixed point" math. The only problem is that maximum size of int is 32 bits in XC8. Some of you have nice suggestions. hamster_nz's code seams like it'll do the job - instead of multiplying it upfront, he does that for every digit after the integer division. Yes, performance is still not an issue, unless it takes seconds to calculate. Currently I'm using 32 bit floating point that is supported by microchip's XC8 and it takes about 10-15 ms to calculate with 10MHz clock (2.5MHz instruction clock). If this time rises to 100-200 ms it's still fine. miguelvp's idea to try it. I'm not sure how easy it'll be to do it with the bits directly though...
I'll review all posts tomorrow again and I'll share results.
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: Divide algorithm
« Reply #13 on: March 12, 2014, 02:45:52 am »
On:
   /* Prevent any overflows */
   if(a > 0xFFFFFFFF/10)
     return -1;
   if(b > 0xFFFFFFFF/10)
     return -1;

Dividing by 10 not only will bring a dividing library but 10 is 2*5 and 1/5 is not well suited for binary divisions. Not sure if the processor can do divisions (even integer ones) most don't, not even multiplications but I didn't look at the processor so I might be totally wrong.

Edit: looked it up, pretty spartan instruction set
http://en.wikipedia.org/wiki/PIC_microcontroller#Instruction_set

but my algorithm should map fine to that instruction set, but will be better to actually code it in assembler to begin with
« Last Edit: March 12, 2014, 03:08:13 am by miguelvp »
 

Offline npelovTopic starter

  • Frequent Contributor
  • **
  • Posts: 331
  • Country: bg
    • Microlab.info
Re: Divide algorithm
« Reply #14 on: March 12, 2014, 03:26:22 am »

Dividing by 10 not only will bring a dividing library but 10 is 2*5 and 1/5 is not well suited for binary divisions. Not sure if the processor can do divisions (even integer ones) most don't, not even multiplications but I didn't look at the processor so I might be totally wrong.

Yes, binary will be easier to divide, but the number still has to be output to the screen - how do you do that - with divide by 10. You still have to use divide library and yes, these basic 8 bit MCUs don't have divide or multiply.
Also the whole is to have more digits than long int can represent. Long is 4,294,967,295 (unsigned) which is about 9 and a bit more. So you can't fit write the number 16 000 000.123456 in a long (or integer representation of the fixed point float 16 000 000 123 456 )

In theory you can shift 2 long numbers and save the result in a 64 integer even when compiler doesn't support it, but in every operation you'll have to execute 8 shifts ... which better be an assembler function to use carry flag to avoid additional instructions when shifting bits between bytes. But even this is still not a show stopper how are you gonna output 64 bit integer as a decimal string?
« Last Edit: March 12, 2014, 03:31:45 am by npelov »
 

Offline npelovTopic starter

  • Frequent Contributor
  • **
  • Posts: 331
  • Country: bg
    • Microlab.info
Re: Divide algorithm
« Reply #15 on: March 12, 2014, 03:55:19 am »
Here's my stab at it - seems to work for the one test case you supplied. No recursion, not the highest performance but that wasn't one of your constraints.

I think it won't work. max digits is 8 in your case. So t the start place = 8-1 = 7. In int divide_extract_next_digit(void) first thing you do is:
 if(place > 0) {
     int i;
     for(i = 0; i < place; i++)
       dtemp*=10;
   }

dtemp here is the divisor. You check if the divisor is > 0xFFFFFFFF/10, but if divisor is 0xFFFFFFFF/10 - 1 then it'll overflow because you multiply it by 10 eight times.
I'll try to modify the algorithm.
 

Offline T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 22436
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Divide algorithm
« Reply #16 on: March 12, 2014, 04:22:43 am »
I wrote a uint16 divide-by-ten for AVR8, which takes 45 words and 60 cycles.  It uses a multiplier of (2^24 / 10) + 1 for an effective 24 bit multiply, getting the byte-aligned shifts for free, and returns a pair of uint16s (quotient and remainder).  The result is exact for all operands.

But AVR8 has 8x8 hardware multiply and piles of registers, so I'm not sure this is at all relevant to PIC.  I would just as well suggest dumping the PIC altogether and using something actually, you know, nice. :P

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2823
  • Country: nz
Re: Divide algorithm
« Reply #17 on: March 12, 2014, 04:37:01 am »
On:
   /* Prevent any overflows */
   if(a > 0xFFFFFFFF/10)
     return -1;
   if(b > 0xFFFFFFFF/10)
     return -1;

Dividing by 10 not only will bring a dividing library but 10 is 2*5 and 1/5 is not well suited for binary divisions. Not sure if the processor can do divisions (even integer ones) most don't, not even multiplications but I didn't look at the processor so I might be totally wrong.


That division can/will be optimized away at compile time - no division is used at run-time. It will just be testing a and b against a number a little bit more than 400,000,000. No division is needed - I just didn't want to include a magic number - at least using 0xFFFFFFFF / 10 makes some sense (ie. you can multiply it by 10 without an overflow in an unsigned 32-bit integer).

And the only multiplication used is by 10, so it can all be done in a single subroutine using bit-shifts "out = ((in<<3) + (in<<1)", making that pretty easy too.
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: Divide algorithm
« Reply #18 on: March 12, 2014, 04:52:28 am »
For your example 1234/33 = 37.393939

0010 0101.0110 0100 110(110 0100 110) where () means periodical so they repeat because the accumulator ended up to be the same.
Lets say you want the fixed point to be 8 bits integer and 24 bits fraction for 32 bits then you have

0010 0101(.)0110 0100 1101 1001 0011 0110

The integer part is simple you just take the 0010 0101 and print it as a decimal or use a table to add the values together.

Code: [Select]
00100101|01100100 11011001 00110110
++++++++|-------- -------- --------
00000000|00000000 01111111 11122222
76543210|12345678 90123456 78901234

The numbers below the + and - are the 2 exponential so on the positive part is fairly easy.

Now on the negative part I need to think about it.

You can't really use a table since it's the invert i.e. 1/2,1/4,1/8..... (Edit: you actually can use a table or shift a value, look edits towards the end).
I guess you could treat them backwards and make a table but then you will have to do the inverse which is another division.

I have to think about this further.

Edit: if you have a printf function that actually handles floats without math it might be possible to change the fractional float to a format that printf will recognize. i.e (1.)00101011001001101100100110110*2^6
which translates to  IEEE 754 single precision:



127 biased + 6 = 133 = 10000101 binary

so equal to:
Code: [Select]
   Exponent|Mantisa (1.) implied
0|10000101|00101011001001101100100
^
sign positive

Also on the integer number approach: 00100101011001001101100100110110 in decimal (the result for your sample) is 627366198
divided by 256^3 (16777216 because 24 bit mantissa on the fixed point) = 37.39393937587738037109375 (yeah, round off, or I made a mistake on my computations, not on the way to do it. Might have missed a 0 or a 1 since i'm doing these calculations manually)

I know that means the representation is binary sound but maybe there is something to it, but only base changes that come to mind implies exponents and logs (not cheap) but there has to be a way.

Edit again, found a way to print the mantissa in decimal form, make a table for the mantissa for the sample it yields: 393939375877380371093750 for the fractional side. And if you don't want a table, just shift right to get to the next value from the 500000000000000000000000 base number, the code should be simple and in assembly you should be able to use the carry to concatenate the shifts if using rotate right through carry (RRF)

Note: this denotes a fixed point 8 bit integer and 24 bit mantissa because it makes sense but I don't know your range and my earlier code sample was not that way it was a long integer and 8 bit mantissa but you should have enough to mold things together. You have to figure out if shifting is faster than fetching a table value on your micro-controller depending on your integer and fractional bits.

Code: [Select]
xxxx xxxx(.)0110 0100 1101 1001 0011 0110

500000000000000000000000 0 (1/2)        2^-1
250000000000000000000000 1 (1/4)        2^-2
125000000000000000000000 1 (1/8)        2^-3
062500000000000000000000 0 (1/16)       2^-4
031250000000000000000000 0 (1/32)       2^-5
015625000000000000000000 1 (1/64)       2^-6
007812500000000000000000 0 (1/128)      2^-7
003906250000000000000000 0 (1/256)      2^-8

001953125000000000000000 1 (1/512)      2^-9
000976562500000000000000 1 (1/1024)     2^-10
000488281250000000000000 0 (1/2048)     2^-11
000244140625000000000000 1 (1/4096)     2^-12
000122070312500000000000 1 (1/8192)     2^-13
000061035156250000000000 0 (1/16384)    2^-14
000030517578125000000000 0 (1/32768)    2^-15
000015258789062500000000 1 (1/65536)    2^-16

000007629394531250000000 0 (1/131072)   2^-17
000003814697265625000000 0 (1/262144)   2^-18
000001907348632812500000 1 (1/524288)   2^-19
000000953674316406250000 1 (1/1048576)  2^-20
000000476837158203125000 0 (1/2097152)  2^-21
000000238418579101562500 1 (1/4194304)  2^-22
000000119209289550781250 1 (1/8388608)  2^-23
000000059604644775390625 0 (1/16777216) 2^-24

total:
393939375877380371093750

The table for the integer part is easy as well, and you can shift from 1 left to get the next table value if needed and you don't want a table.

Code: [Select]
0010 0101(.)xxxx xxxx xxxx xxxx xxxx xxxx

128 0 (2^7 = 128)
064 0 (2^6 = 64)
032 1 (2^5 = 32)
016 0 (2^4 = 16)

008 0 (2^3 = 8)
004 1 (2^2 = 4)
002 0 (2^1 = 2)
001 1 (2^0 = 1)

32+4+1 = 37

So 1234/33 = 37(int)"."393939375877380371093750(fraction)


one last edit before bed, i had 2^0=0 that's wrong, it's = 1 so I corrected it
« Last Edit: March 12, 2014, 08:05:45 am by miguelvp »
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Divide algorithm
« Reply #19 on: March 12, 2014, 11:09:03 am »
Quote
I wrote a uint16 divide-by-ten for AVR8, which takes 45 words and 60 cycles. 

You can easily beat that.

Division of a given number is fairly easy. For example, in your case, X / 10 can be decomposed into (X * 3.2) / 32 = (X * 2 + X + X * 2 / 10) / 32. X * 2 / 10 can be further expanded to reach your desired precision, as the / 32 done with shifting.

Quote
It uses a multiplier of (2^24 / 10) + 1 for an effective 24 bit multiply

There are very efficient routines for (2^M / N), particularly in the DDS libraries.
================================
https://dannyelectronics.wordpress.com/
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: Divide algorithm
« Reply #20 on: March 13, 2014, 03:01:47 am »
Bringing it home and returning a formatted static string to display the result.

Notes:

Once you reach the fractional part after exhausting the divider you can do some optimization. Say you keep a history of accumulator values if the accumulator gets to the same value again then you know it's periodical and will repeat over and over. since you have the binary result for those bits, then you can just shift and or the periodical result.

There are a lot of optimizations that you can make if you know the sizes of the operands, you can hard code a lot of those values for your needs, if using fixed point math say 16.16 you can optimize the code to deal with that too. If done in assembly you could make a function that will deal with both the integer part and the fractional part reducing code size, also you can use the overflow flag instead of the accumulator, the example is just a template in C but assembly language will give you the best results.

Also note that I didn't even compile any of the sample code I'm submitting here, should work but consider it as a guideline.

Code: [Select]
//
// divideBig, the inputs can be fixed point and we don't care where the period is at
// since a division is a relation, so 3.56/2.45 will yield the same result as 356/245
// so as long as both inputs have the same fixed point representation the results will be good.
//
// The return string must be static or global somewhere because we are returning the pointer to it
// so the values must be valid when we are outside of the scope of this function.
// In this example the return string will be 8.8 bynary unsigned and 0 filled but converted to decimal
// So the integer part will be in the range [0,65535] and the fractional part will be in the range [0,9999847412109375] correction [0,999984741]
// in increments of 2^-16 (0000152587890625), correction (000015258) because that's the largest we can fit in an unsigned long of 32 bits.
//
// Even if the result and the fraction fit on unsigned shorts, we use unsigned longs so the compiler
// can do the right operations. You could try using unsigned shorts instead but you will have to test
// if there is bad behaviour specially since we ignore where the fraction is in the dividend and divisor.
// so the result will overflow in some cases.
//
char * divideBig(unsigned long dividend, unsigned long divisor)
{
  //
  // edit: 16 digits for the mantissa is too big for an unsigned long, correcting it.
  // static char strResult[5+1+16+1]; // you can hard code it to 23 but using adds to show why we pick this value
  //
  // Integer part 5 decimal digits, dot one digit, decimal mantissa 9 digits + null termination.
  static char strResult[5+1+9+1]; // you can hard code it to 16 but using adds to show why we pick this value
  unsigned long result = 0; // This holds the integer part we only need 16 bits of this so it could be an unsigned short.
  unsigned long accumulator = 0; // temporary accumulator for computing the division and coding the string.
  int precision=16; // number of bits for integer result and later own for the fractional number of bits.
  unsigned long msb = 1<<(precision-1); // most significant bit constant.
  unsigned long fraction = 0; // for fractional part later on could be an unsigned short (16 bits).
  unsigned long decimalFraction = 0; // for the decimal representation for the end result.

  // Place code here to deal with divide by 0, infinite (+|-), NaNs, undetermined values.
 
  // If inputs are not unsigned, place code here to extract the msb from both and compute the
  // resulting sign by XOring both bits if the result is 1 then the final result is negative
  // otherwise is positive, bitwise XOR operator in C is '^'
  // If inputs are signed you must strip the sign by doing an AND with ~msb
 
  // Do the integer part of the division.
  while(precision) {
    // check most significant bit
    if(dividend&msb) { // We are going to carry a bit into accumulator
      accumulator = (accumulator<<1)|1;
    } else { // we just shift left the accumulator.
      accumulator = (accumulator<<1);
    }
    // Check if accumulator is binary divisible by divisor.
    if(accumulator >= divisor) {
      // we just need to subtract the divisor from accumulator
      accumulator -= divisor;
      // adjust result with a 1
      result = (result<<1)|1;
    } else {
      // adjust result with a 0
      result = (result<<1);
    }
    // shift the dividend (for the next bit)
    dividend <<=1;
    // You might be able to do an early out check
    // if the accumulator and dividend are 0.
    // but you will have to shift the result by the precision counter (number of bits) left.

    // decrement the number of bits to process.
    precision--;
  }
  // At this point you have the result (integer division) if you were dividing by the smallest number
  // the result might take an extra digit
  // if you want fixed point fractional part you can run it through the same loop for
  // as many binary bits of precision you need.
  // the fraction needed is in the accumulator and you just pump 0's per loop.
  // for example, we want the fraction for 16 bits.

  precision = 16; // number of bits for the fractional result predefined for an 16.16 unsigned fixed point float.

  while(precision) {
    // we just shift left the accumulator since we are done with the dividend.
    accumulator = (accumulator<<1);
    // Check if accumulator is binary divisible by divisor.
    if(accumulator >= divisor) {
      // we just need to subtract the divisor from accumulator
      accumulator -= divisor;
      // adjust fraction with a 1
      fraction = (fraction <<1)|1;
    } else {
      // adjust fraction with a 0
      fraction = (fraction<<1);
    }
    // shift the dividend (for the next bit)
    dividend <<=1;
    // You might be able to do an early out check
    // if the accumulator is 0.
    // but you will have to shift the fraction by the precision counter (number of bits) left.

    // decrement the number of bits to process.
    precision--;
  }
  // fraction will have your fractional part in binary format.
  // meaning the most significant bit will be 2^-1 or 1/2
  // the next 1/4, 1/8, 1/16, 1/32, 1/64, 1/128, 1/256, 1/512, 1/1024, 1/2048, 1/4096, 1/8192, 1/16384, 1/32768 & 1/65536
  // if the bit is on then you add the fraction and you should come up with the
  // value.
 
  // Now to produce the string
  // result has the integer part so we don't need to mess with it
  // fraction however has to be converted to decimal.
  // we need enough space to hold the increment (0000152587890625) 16 decimal digits but max it out to 1/2
  // correction: increment 000015258, because we can only fit 9 digits on an unsigned long.
  //
  // Edit: this number is too big to fit on a 32 bit unsigned long
  // decimalFraction = 5000000000000000;
  decimalFraction = 500000000;
  precision = 16; // number of bits for the fractional part that needs to be converted to decimal form.
  accumulator = 0; // use the accumulator to store the decimal fractional value.
 
  while(precision) {
      // check most significant bit
    if(fraction&msb) { // We are going to carry a bit into accumulator otherwise nothing.
      accumulator += decimalFraction;
    } 
    // adjust the bit value weight to the right (divide by 2)
    decimalFraction >>= 1;
    // shift the fractional result left (for the next bit)
    fraction <<=1;
   
    // decrement the number of bits to process.
    precision--;
  }

  // 16 decimal digits too big for an unsigned long, correcting code
  // sprintf(strResult,"%05lu.%016lu", result, accumulator);

  // Now we have the integer part in result, and the fractional part printf friendly in the accumulator
  sprintf(strResult,"%05lu.%09lu", result, accumulator);

  // We are done
  return strResult;
}

Edit: change 16 to 9 so it will fit on an unsigned long
if you don't want the zero filled integer part, substitute the formatting string from "%05lu.%09lu" to "%5lu.%09lu" if you need to right justify it, or "%lu.%09lu" if you don't care about justification.

Caveats:

Overflow/Underflow, even if we are symmetrical (16.16) on the integer part and the fractional part, a division can overflow if you divide by the smallest possible number (2^-16) because the inverse will be 2^16 and that's too big to represent numerically in 16 bits.
However we are using 32 bits unsigned values and returning a string for the integer part can accommodate 32768, then in 5 digits it can accommodate the overflow value of 65536.

Changing the symmetry and/or the ranges will need some adjustments to the algorithm. That includes adjusting the msb to check if the bit being worked on (integer, fraction, or when computing the fraction decimal part) will need to be adjusted. I'm not sure if leaving msb to look at bit 15 for the whole algorithm will work, maybe needs to be adjusted when computing the fractional decimal representation on the third loop.

Similarly if you make the fractional part smaller then dividing by a large integer will underflow. Or something like that. In this example it can't underflow as far as I can tell, and since we are returning a string the overflow won't matter.

Since the smallest fractional step is 2^-16 (0.0000152587890625) (correction, we can only fit 0.000015258) your floating point accuracy is only 0.0001 with those 16 bits I think. So the results will be accurate  from 0.0 to 65536.0-0.0001 in 0.0001 steps. So you might want to adjust the sprintf to only give you the accurate values.

Taking it further
code this in assembler using the rotates through carry for concatenating shifts and using the carry for concatenating adds, etc.
Also in assembler you could make some of the code reentrant for computing the integral value and the fractional value reducing memory footprint. Some of the routines could be re-purposed for multiplication by doing jumps depending if you are dividing or multiplying.
you can use the Z (zero) flag too for early outs, and I bet more optimizations can be done, specially finding periodic values so you can exit the processing and exit early if there is a pattern that can be repeated.

Hope this helps someone.

Edit: the 16 digit conversion value was too big to fit on an unsigned long, corrected the code but probably introduced bugs if I missed anything. Then again, it's a guideline.

Edit: decided to compile the code and I was missing a couple of semicolons and instead of dividend i typed divided in a couple of places so I fixed it.
« Last Edit: March 13, 2014, 07:22:29 pm by miguelvp »
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Divide algorithm
« Reply #21 on: March 13, 2014, 11:33:43 am »
You have a choice:

1) build your propritetary fixed point routines - challenging, interesting and probably not as reliable;
2) use the widely available Qn format.
================================
https://dannyelectronics.wordpress.com/
 

Offline rs20

  • Super Contributor
  • ***
  • Posts: 2322
  • Country: au
Re: Divide algorithm
« Reply #22 on: March 13, 2014, 01:25:22 pm »
This whole thread seems very puzzling to me, it seems to have drifted off the tracks.

You have a choice:

1) build your propritetary fixed point routines - challenging, interesting and probably not as reliable;
2) use the widely available Qn format.

Miguel's code seems, as far as I can tell (based on this idea of incrementing by some coarse approximation of 2^-16), to be performing binary division, including a binary fractional part, and then converting that binary fraction part to a decimal representation, probably inaccurately. Similarly, the suggestion above to use the Qn format must be implying the same idea, because the Qn format is just fixed-point binary representation. Again, it requires further work to get the decimal output.

Key point: converting a fixed-point binary fraction to decimal is a hard (as in, about as hard as the problem the OP asked) problem, not a throwaway afterthought.

The OP's insight at the beginning to just calculate (A*1000000)/B was much closer to the mark. Remembering how to do *decimal* long division from your school days and writing that in code form is also a good approach.

If A*1000000 is too big, then just do integer divide A/B for the integer part, then another integer divide ((A%B)*1000000)/B to get the fractional part. If that's still to big (A%B can be as big as B, so it's not technically any better), then doing each digit one by one:

Code: [Select]
print A/B
remainder = A //yeah, this is looks weird, but it gets clobbered on the first line of the loop
for each digit() {
   remainder = (remainder % B) * 10
   print remainder / B;
}

This is, essentially, decimal long division. You can get really clever and work in base 100 or base 1000, depending on how much latitude you have from your denominator. Note also how it can be asked to provide a million digits, and it can do so in constant memory and linear time. The binary long division code above is probably good for doing the integer divisions, but the key is to stop once you reach the fractional bit, and just retain the remainder (notice how the loop prints x % B and x / B; long division produces a pair of outputs, divisor (x/b) and remainder (x%b)).

It would help to have clear bounds provided on the numerator, denominator, and how large the largest integer types available are. It's past midnight here so I'm dropping this for tonight, but I'm just going to encourage moving away from the binary fractional part business for now.

« Last Edit: March 13, 2014, 01:31:43 pm by rs20 »
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: Divide algorithm
« Reply #23 on: March 13, 2014, 07:38:35 pm »
So I decided to compile the code and test it, here are the results on an intel processor:

Code: [Select]
// debug
Result: 37.393936155, cycles: 7350   fixed point 16.16
Result: 37.393940, cycles: 133       floating point IEEE 754
// profile
Result: 37.393936155, cycles: 4981
Result: 37.393940, cycles: 81

IEEE 754 has 24 bit mantissa (1 implied) so it's going to be a bit more accurate, but then you need to use floating point

I did look a the assembly generated and you can do better manually.
the floating point result is what it takes the Floating Point Unit to do it in dedicated hardware (way faster of course).
Do not understand how the FPU can vary from debug build to profile build.
For cycle measurement I used the performance counter.

rs20, what kind of assembly does your code generate?
a PIC 18 has a very minimal set of instructions, it can't even divide integers or multiply for that matter.

If you look at my code, the first loop does just what you said, but with the bits and hinting the compiler how to produce the code, only better way would be assembly.

As for the converting a fixed point binary fraction to decimal, is not hard, the code is above and it works. (hint) third loop does that.

why do a long division base 10 when the processor is base 2? base 2 long division is easier too, if the value is divisible then you put a 1 and shift, if not you just shift because the value is 0.

base 10 has 100 permutations on multiplications and divisions 10x10 array
base 2 has only 4 permutations on multiplications and divisions 2x2 array

And out of those permutations it's only true on one case so it's a binary decision that is fast and simple.

Just apply the same basic math but instead of 10 digits you have 2, it's the same thing.

And no, i'm not incrementing things by 2^-16 steps, i'm doing a long division to obtain the integer, then the fraction, then converting the fraction to decimal in three loops with lots of comments so you can follow.

But the code has it's limitations, it only accepts dividends under 65535, you you can alter it to change precision too
« Last Edit: March 13, 2014, 08:37:17 pm by miguelvp »
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2823
  • Country: nz
Re: Divide algorithm
« Reply #24 on: March 13, 2014, 08:22:57 pm »
So I decided to compile the code and test it, here are the results on an intel processor:

Code: [Select]
// debug
Result: 37.393936155, cycles: 7350   fixed point 16.16
Result: 37.393940, cycles: 133       floating point IEEE 754
// profile
Result: 37.393936155, cycles: 4981
Result: 37.393940, cycles: 81

But it is wrong [grin] - "I need to divide 2 long ints up to an arbitrary precision" as the original poster's problem.

Now I agree that 16.16 should be enough for anybody (well, I have used use 4.68 on a FPGA fractal generator....) - but in my code I can change the how many decimal digits I want to extract to an arbitrary number and keep on extracting digits:


Code: [Select]
...
while(place >= -100)
     putchar('0'+divide_extract_next_digit());
...

37.3939393939393939393939393939393939393939393939393939393939393939393939393939393939393939393939393939

Sure, it might be off by one least significant count (as it always rounds down), but it can keep extracting valid digits to the end of time if needed...

As a check - what do you get for 3/400000001? I get

0.00000000749999998125000004687499988281250029296874926757812683105468292236329269409176826477057933807355165481612086295969784260075539349811151625472120936319697659200755851998110370004724074988189812

(as does 'bc')


« Last Edit: March 13, 2014, 09:09:40 pm by hamster_nz »
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline rs20

  • Super Contributor
  • ***
  • Posts: 2322
  • Country: au
Re: Divide algorithm
« Reply #25 on: March 14, 2014, 01:20:00 am »
As for the converting a fixed point binary fraction to decimal, is not hard, the code is above and it works. (hint) third loop does that.

Sorry, I meant doing it precisely was "hard" (as opposed to correct to 4dp), and as I mentioned in my original reply, I define "hard" as similarly difficult to the OP's question.

Quote
why do a long division base 10 when the processor is base 2? base 2 long division is easier too, if the value is divisible then you put a 1 and shift, if not you just shift because the value is 0.

You could explain to me how easy adding numbers is, it's not particularly relevant if you end up with something that's of no use to solving the problem at hand. For example, consider two different calculations, one of which works out to 0.10000000000000..., and another that works out to 0.099999999999... You want to print this number to 1 decimal place. Clearly, the first should print 0.1, the second should print 0.0. This is trivial to achieve correctly using base-10 division code.

Now, how do we achieve this with binary division and then decimal conversion second? 0.099999999999 and 0.10000000000000 in binary are:

0.0001100110011001100110011001100110011001100110011...
0.0001100110011001100110011001100110011000100000000...

So you need words and words of binary expansion before you can decide what the very first decimal digit is. One tenth has an infinite recurring binary expansion, never forget this. A contrived example for sure, but one that I think justifies my uneasiness with the approach in general. It's just unnecessarily wrong when a single line using AVR's standard compiler can do this with absolute precision.

Quote
And no, i'm not incrementing things by 2^-16 steps, i'm doing a long division to obtain the integer, then the fraction, then converting the fraction to decimal in three loops with lots of comments so you can follow.

Why have you got numbers like 000015258 in your comments, and nothing like that in the code itself? It's confusing. All I know is that your very own sample output shows an error in the fifth decimal place, so there has to be some sort of approximation going on here. When asked a problem as simple as outputting a division to 6 dp, something that could be done with perfect precision in one line in AVR's standard compiler, just making such approximations is bit rough, I suspect.

As I mentioned, my code does involve performing divides and getting remainders; if I were to write this code I'd use your divide code. I'd just pull out once I hit the fractional part, and keep the remainder, and start working digit by digit. All I ever need to do in addition to your code is * 10, which is trivial. My code will have longer run-time, of course, but still easily within OP's spec.
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4382
  • Country: us
Re: Divide algorithm
« Reply #26 on: March 14, 2014, 04:34:50 am »
You guys are making this a lot more complicated than it needs to be , assuming that you're allowed to use the math primitives that are already in C (except for Mechatrommer!)  When you do an integer division, you get a remainder.  To get digits of fraction, you essentially treat the remainder as if it were multiplied by some power of 10, and do more integer division (just like doing long division by hand; the decimal point is an illusion, and you can keep doing more division to get as many digits as you want.)  With 32bit longs, you can get about 9 digits per division, but you don't need to.  Here's an example (debugged, even.  I kept NDIGITS small so as to test more):
Code: [Select]
#include <stdio.h>

// Number of times to loop over remainder
#define NLOOPS 10
// Digits per loop
#define NDIGITS 3

// Support macros for building printf strings
#define xstr(s) str(s)
#define str(s) #s

int main(int argc, char *argv[])
{
    long dividend;
    long divisor;
    long multiplier = 1;

    if (argc < 2) {
printf("\n Want arguments \"divisor dividend\"");
return 0;
    }
    sscanf(argv[1], "%ld", &dividend);
    sscanf(argv[2], "%ld", &divisor);


    // Print integer part
    printf("\n%ld.", dividend / divisor);


    // Compute the multiplier
    for (int i=0; i < NDIGITS; i++) {
multiplier *= 10;
    }

    for (int i=0; i < NLOOPS; i++) {
dividend = dividend % divisor;  // start with the remainder
dividend *= multiplier;  // multiply to get bigger integer
        // print NDIGITS more digits of fraction (with leading zeros.)
printf("%0" xstr(NDIGITS) "ld", dividend/divisor);
    }
    printf("\n");
}
 

Offline rs20

  • Super Contributor
  • ***
  • Posts: 2322
  • Country: au
Re: Divide algorithm
« Reply #27 on: March 14, 2014, 05:25:42 am »
You guys are making this a lot more complicated than it needs to be...

My point exactly :-). My only concern with your code is that we're dealing with a very, very primitive MCU here. Integer division does not come for free; if you use the / operator, that's going to inject some general-purpose divider into your assembler. I'd be very curious to see what that assembler looks like, and if the / 10 and % 10 operations are actually combined together cleverly. Also, just a nitpick, if you set NDIGITS to 9, the denominator can't be more than 4, which is a bit restrictive; there's a tradeoff between number of digits computed at a time and allowed denominator size here. (e.g., if you change from longs to uint32_t's [or probably just ints on a typical desktop], calculating 7/9 gives garbage since 7*10^9 overflows the unsigned 32-bit.)

Here's my code. Based on hamster_nz's ideas, this will go about twice as fast, uses only operations directly mappable to the assembler in question, has no function call overhead, figures out how many places to use before the decimal place by itself, won't overflow, and is absolutely precise. I've also included features like truncating early (1/8 = 0.125 instead of 0.1250000000...) and round to nearest (2/3 = 0.666667), but these can be removed by just deleting the relevant lines. Presented as a GCC program, but the printDivide function is ready for MCU. I think this code should work out to be amongst the most compact once compiled, too.

Code: [Select]
#include <stdio.h>
#include <stdint.h>

#define DP 50

void printDivide(uint32_t num, uint32_t denom) {
  // Preconditions.
  if (num > 0xFFFFFFFF/10 || denom > 0xFFFFFFFF/10 || denom == 0) {
     putchar('X'); putchar('\r'); putchar('\n');
     return;
  }

  int8_t place = 0; // We always want to print the ones digit

  // Identify the place of the first digit by multiplying denom by 10 until it's within a factor of 10 of num.
  while (1) {
    // This multiplication is guaranteed to be safe because denom is less than num and num is less that UINT32_MAX / 10.
    uint32_t tendenom = (denom << 3) + (denom << 1); // Even a simple * 10 gets turned into a call to llmod by xc8.
    if (tendenom > num) {
      break;
    }
    denom = tendenom;
    ++place;
  }

  // Print answer to 50 d.p.
  for (; place >= -DP; place--) {
    if (place == -1) putchar('.');
    char digit = '0';
    // Slight optimization here: pull a bit of bit-shifting trickery to find the digit in 4 iterations rather than 10.
    uint32_t shifted_denom = denom << 3;
    int8_t bitt;
    for (bitt = 8;; bitt >>= 1, shifted_denom >>= 1) {
      if (shifted_denom <= num) {
        num -= shifted_denom;
        digit += bitt;
      }
      if (bitt == 1) break;
    }

    uint32_t tnum = num << 1;

    // Optional line: round to nearest (delete this line if you want, e.g., 0.6666 instead of 0.6667).
    if (place == -DP && tnum >= denom) ++digit;

    putchar(digit);

    // Optional line: bail out if the decimal expansion is complete.
    if (num == 0 && place <= 0) break;

    num = (tnum << 2) + tnum; // Even a simple * 5 gets turned into a call to llmod by xc8.
  }
  putchar('\r'); putchar('\n');
}

int main(void) {
  printDivide(1, 9801);
  printDivide(1, 1);
  printDivide(1, 8);
  printDivide(429496729, 1);
  printDivide(429496730, 3);
  printDivide(1, 429496729);
  printDivide(1, 429496730);
  printDivide(3, 400000001);
  printDivide(0, 77);
  printDivide(77, 0);
  printDivide(1, 3);
  printDivide(2, 3);
}

printDivide is 313 words when compiled for the 16F1503 using the free XC8 compiler. This is less than just writing printf("%ld\r\n", (long)2345214), which adds 445 words by itself. There's plenty of room for further optimisation.

Sample output:

Code: [Select]
$ gcc arbdivide.cc && ./a.out
0.00010203040506070809101112131415161718192021222324
1
0.125
429496729
X
0.00000000232830643979130281106285212244305590508933
X
0.00000000749999998125000004687499988281250029296875
0
X
0.33333333333333333333333333333333333333333333333333
0.66666666666666666666666666666666666666666666666667
« Last Edit: March 16, 2014, 04:26:55 am by rs20 »
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Divide algorithm
« Reply #28 on: March 14, 2014, 10:56:32 am »
Really nice ideas and implementations to do integer math with printf() + its siblings on "primitive mcus". With guys like these, we will need some super computers to just blink leds to the 27th digit! Talking about not making it overly complicated.

Keep up the good work, guys. :)
================================
https://dannyelectronics.wordpress.com/
 

Offline rs20

  • Super Contributor
  • ***
  • Posts: 2322
  • Country: au
Re: Divide algorithm
« Reply #29 on: March 14, 2014, 11:19:46 am »
Really nice ideas and implementations to do integer math with printf() + its siblings on "primitive mcus". With guys like these, we will need some super computers to just blink leds to the 27th digit! Talking about not making it overly complicated.

Keep up the good work, guys. :)

Forgive me if I'm taking this too seriously, but I don't understand what you're saying through the sarcasm.
- If you're saying that using printf on an MCU is something that should be avoided somewhat, I would agree with you entirely; indeed, I avoided *, /, % and printf so the code I provided above compiles to a smaller object file than printf alone.
- If you're saying that the MCU is underpowered for the task requested by the OP, I disagree and present the evidence of the solutions already presented.
- If you're saying that we're way over-optimizing, you are correct. And we're loving it.

I disagree that Qn is an appropriate choice. It sounds heavy, and suggesting Qn still leaves the question of rendering those fractional bits as a decimal completely unanswered, correct me if I'm wrong. And that's 90% of the problem. On the flip side, though, if the OP just wants 6dp, just doing (A*100000)/B, possibly with the support of an adaptation of the lldiv code below to support 64-bit ints, would be appropriate. It's true that we've taken the OPs request for "arbitrary precision" and taken it literally and run with it.


On an unrelated note, I've been playing around with the XC8 compiler and noticed some interesting things (I don't mess with assembler much, so I'm not sure how much of this is common; also, I have the free version, so I can't try optimizing to see if these things are fixed):

Check out this assembly code generated for a 32-bit shift-right-by-5 operation:
Code: [Select]
;arbdivide.c: 74: uint16_t y = x / 32;
        movf    main@x+1,w
        movwf   ??_main+1
        movf    main@x,w
        movwf   ??_main
        movlw   5
u885:
        lsrf    ??_main+1,f
        rrf     ??_main,f
        decfsz  9,f
        goto    u885
        movf    ??_main,w
        movwf   main@y
        movf    ??_main+1,w
        movwf   main@y+1
        line    76
That's right. It's basically a for loop doing x >>= 1 five times. That's like, at least 20 clock cycles right there. I've modified my code in my previous post to do less shifts-by-N and more shifts-by-1 accordingly. Because I'm all about premature optimization.

Also, taking a uint32_t x and going x % 10 makes a call to llmod:

Code: [Select]
__llmod(unsigned long int dividend, unsigned long int divisor)
{
        unsigned char   counter;

        if(divisor != 0) {
                counter = 1;
                while((divisor & 0x80000000UL) == 0) {
                        divisor <<= 1;
                        counter++;
                }
                do {
                        if((unsigned long)divisor <= (unsigned long)dividend)
                                dividend -= divisor;
                        *(unsigned long int *)&divisor >>= 1;
                } while(--counter != 0);
        }
        return dividend;
}

This is called even with a literal as the divisor. It's even called from printf as part of its %ld printing code. Which seems pretty bad when you consider that the first thing the code does is spend 22 iterations shifting the 10 left bit-by-bit until it becomes 0xA0000000. And that's repeated for every single digit in the printed number.  I find it hard to believe that the generated code would still do this with -O3, but also find it hard to believe that the compiler could be smart enough to optimise this away. Finally, division is done by lldiv, a completely separate function. If you want both x%10 and x/10, you calculate one and then calculate the other from scratch, even though it's straightforward to blend the two loops together and get both at once (as both hamster_nz's and my code do). Fair enough, C doesn't have a way of asking for div and mod simultaneously, but it's just worth being aware how much performance you can be throwing away by not sticking close to the metal.
« Last Edit: March 14, 2014, 01:27:37 pm by rs20 »
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Divide algorithm
« Reply #30 on: March 14, 2014, 04:20:07 pm »
-much of this is common; also, I have the free version, -

sounds like a case of having your cake (free tools) and eating it (quality code) at the same time
================================
https://dannyelectronics.wordpress.com/
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: Divide algorithm
« Reply #31 on: March 14, 2014, 07:21:15 pm »
So I decided to compile the code and test it, here are the results on an intel processor:

Code: [Select]
// debug
Result: 37.393936155, cycles: 7350   fixed point 16.16
Result: 37.393940, cycles: 133       floating point IEEE 754
// profile
Result: 37.393936155, cycles: 4981
Result: 37.393940, cycles: 81

But it is wrong [grin] - "I need to divide 2 long ints up to an arbitrary precision" as the original poster's problem.

The code is a guideline, he can expand it by changing the precision of the integral and fractional parts.
He will need to change the fractional constant as well to a 5000... that fits on the number of bits that he needs.

edit: 5000... constant explanation at the end of this post.

Also use more than one unsigned long to expand the operation to concatenate the accumulator and results.

I wouldn't do it in C because it's inefficient, Targeted to the very spartan instruction set that the PIC has will yield better results.

One more thing, I added my code to get those results to an existing program that I was working on that uses many threads, that might affect the precision counter so the cycles could be way less. Didn't want to do a one off or make a new project just to test the results.

Now I agree that 16.16 should be enough for anybody (well, I have used use 4.68 on a FPGA fractal generator....) - but in my code I can change the how many decimal digits I want to extract to an arbitrary number and keep on extracting digits:


Code: [Select]
...
while(place >= -100)
     putchar('0'+divide_extract_next_digit());
...

37.3939393939393939393939393939393939393939393939393939393939393939393939393939393939393939393939393939

Sure, it might be off by one least significant count (as it always rounds down), but it can keep extracting valid digits to the end of time if needed...

As a check - what do you get for 3/400000001? I get

0.00000000749999998125000004687499988281250029296874926757812683105468292236329269409176826477057933807355165481612086295969784260075539349811151625472120936319697659200755851998110370004724074988189812

(as does 'bc')
Very impressive! I will test your code to see what it generates in assembly and how fast it is.

I'm just trying to get people to think at the system level instead of relying on C code generation that makes sense programming wise but without knowing the underlying hardware you are not doing yourself a favour by not understanding these concepts. Lots of people shy from binary math, when in reality is easier than decimal math, just 2 values instead of 10 per digit.

As for the converting a fixed point binary fraction to decimal, is not hard, the code is above and it works. (hint) third loop does that.

Sorry, I meant doing it precisely was "hard" (as opposed to correct to 4dp), and as I mentioned in my original reply, I define "hard" as similarly difficult to the OP's question.
But that's the thing it's not hard at all, you place the maximum constant 500000.... that fits into your working memory model unsigned long in the sample above but you could concatenate it ad-infinitum by changing the precision value. for example if you place it on an unsigned long long (quad word) then
you can fit 5000000000000000000 into it and change the precision to 64 obtaining the accuracy of 2^-64 (5.4210108624275221703311375920553e-20) instead of just 2^-16 but it's all into what you need.

edit: 5000... constant explanation at the end of this post.


Quote
why do a long division base 10 when the processor is base 2? base 2 long division is easier too, if the value is divisible then you put a 1 and shift, if not you just shift because the value is 0.

You could explain to me how easy adding numbers is, it's not particularly relevant if you end up with something that's of no use to solving the problem at hand. For example, consider two different calculations, one of which works out to 0.10000000000000..., and another that works out to 0.099999999999... You want to print this number to 1 decimal place. Clearly, the first should print 0.1, the second should print 0.0. This is trivial to achieve correctly using base-10 division code.

Now, how do we achieve this with binary division and then decimal conversion second? 0.099999999999 and 0.10000000000000 in binary are:

0.0001100110011001100110011001100110011001100110011...
0.0001100110011001100110011001100110011000100000000...

So you need words and words of binary expansion before you can decide what the very first decimal digit is. One tenth has an infinite recurring binary expansion, never forget this. A contrived example for sure, but one that I think justifies my uneasiness with the approach in general. It's just unnecessarily wrong when a single line using AVR's standard compiler can do this with absolute precision.

You just add them as you would any integer as long as they are aligned (i.e. both numbers 16.16) it will keep the result because it doesn't matter where the fraction start just like on base 10, if you ignore the dot you can add them then you place the dot afterwards (figuratively since there is no dot unless you want to print the result) .

Quote
And no, i'm not incrementing things by 2^-16 steps, i'm doing a long division to obtain the integer, then the fraction, then converting the fraction to decimal in three loops with lots of comments so you can follow.

Why have you got numbers like 000015258 in your comments, and nothing like that in the code itself? It's confusing. All I know is that your very own sample output shows an error in the fifth decimal place, so there has to be some sort of approximation going on here. When asked a problem as simple as outputting a division to 6 dp, something that could be done with perfect precision in one line in AVR's standard compiler, just making such approximations is bit rough, I suspect.

As I mentioned, my code does involve performing divides and getting remainders; if I were to write this code I'd use your divide code. I'd just pull out once I hit the fractional part, and keep the remainder, and start working digit by digit. All I ever need to do in addition to your code is * 10, which is trivial. My code will have longer run-time, of course, but still easily within OP's spec.

It has to do with the precision you work with. IEEE 754 for example has a precision of 2^-24 but you can change the range via the exponent so it can shift the value by 128 shifts each way.

My example had a precision of 2^-16 (0.0000152587890625) but since I did choose and unsigned 32bit variable then it truncates precision so it can fit (.)999984742 at 000015258 increments. I can change the accumulator to be a quad word (64 bits unsigned) and the precision variable and increase the precision by many orders of magnitude at the cost of longer computations and resources, so it depends on what you need my constrains in the text under Caveats I state that this routine as coded will be accurate  from 0.0 to 65536.0-0.0001 in 0.0001 steps.

That said, it can be expanded by using more storage dealing with the extra bits at the cost of time to get the result by using a quad word as mentioned above.

Being fixed point the precision is stuck at that no shifting by exponent unless you want to add that into the function and make it floating point. shifting in binary is the same as moving the decimal dot in decimal, we tend to think in decimal so we think of shifting as multiplying or dividing by 2 but we are actually moving the binary fractional point that happens to be base 2.

Thanks for your code by the way I will look into it since it looks like a nice compromise between both methods.


Edit: for clarification the 500000... constant represents 2^-1 or 1/2 or 0.5 so when you shift that constant right it now represents 2^-2 or 1/4 or 0.25, further right shifts gives you the next negative exponent for the fractional part.
« Last Edit: March 14, 2014, 08:55:44 pm by miguelvp »
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: Divide algorithm
« Reply #32 on: March 14, 2014, 07:33:02 pm »
Really nice ideas and implementations to do integer math with printf() + its siblings on "primitive mcus". With guys like these, we will need some super computers to just blink leds to the 27th digit! Talking about not making it overly complicated.

Keep up the good work, guys. :)

Forgive me if I'm taking this too seriously, but I don't understand what you're saying through the sarcasm.
- If you're saying that using printf on an MCU is something that should be avoided somewhat, I would agree with you entirely; indeed, I avoided *, /, % and printf so the code I provided above compiles to a smaller object file than printf alone.
- If you're saying that the MCU is underpowered for the task requested by the OP, I disagree and present the evidence of the solutions already presented.
- If you're saying that we're way over-optimizing, you are correct. And we're loving it.

I disagree that Qn is an appropriate choice. It sounds heavy, and suggesting Qn still leaves the question of rendering those fractional bits as a decimal completely unanswered, correct me if I'm wrong. And that's 90% of the problem. On the flip side, though, if the OP just wants 6dp, just doing (A*100000)/B, possibly with the support of an adaptation of the lldiv code below to support 64-bit ints, would be appropriate. It's true that we've taken the OPs request for "arbitrary precision" and taken it literally and run with it.

I agree with rs20 100% i'm loving it as well because doing things the hard way and understanding the underlying architecture you are working on and optimizing your code to fit better on that architecture gives you an edge among the rest.

We rely on tools too much and treat them as black boxes, but sometimes you gotta peek into what's inside the black box.

Finally to rs20, awesome job and thanks a lot for your code and views. Optimization for small mcu's is a lost art, I had to deal with it back on z80 days but even that processor was more powerful instruction set wise.

Also I agree on the printf usage as not being good, I had the sprintf just to use it as a black box to show the results, printing the values would be a totally different topic.


« Last Edit: March 14, 2014, 08:00:39 pm by miguelvp »
 

Offline lewis

  • Frequent Contributor
  • **
  • Posts: 704
  • Country: gb
  • Nullius in verba
Re: Divide algorithm
« Reply #33 on: March 14, 2014, 08:12:34 pm »
Very insightful thread guys, might need to borrow rs20's code for future reference! It's so easy to type '/' or '%' until you realise how expensive these can be on a small architecture.
I will not be pushed, filed, stamped, indexed, briefed, debriefed or numbered.
 

Offline npelovTopic starter

  • Frequent Contributor
  • **
  • Posts: 331
  • Country: bg
    • Microlab.info
Re: Divide algorithm
« Reply #34 on: March 14, 2014, 08:51:00 pm »
Well you really took my task seriously, so thanks for that.

I needed 12 digits (total) and without putting much thought in it I made rough calculation that whatever format you use you can't fit it in 32 bit (the largest supported format). So built in floating point  wouldn't do the job. I've managed to do the job using hamster_nz's code. It's still pretty ugly to post it here, but the goal is constantly changing and I'm trying not to spend a lot of time on something I'll delete later. The next step would be to count total digits instead digits after decimal point. Currently it's 1068 instructions and I bet I can drop 200 or so if I spend 1-2 hours on it. But I'll do that at the end of the project.

One thing I don't understand. Why do you need to check if A and B are < 0xFFFFFFFF/10. Sorry I didn't have time to read rs20's code in details (I'll do later, I promise), but my code can divide 0xFFFFFFFF to 0xFFFFFFFE. I don't multiply by 10 at all.

Well, whatever, I'll post the code - but if someone wants to use it better optimize it or come back later - I'll post the optimized code. I also think I'll remove string buffering and I'll output directly. The string is 16 chars wide - just enough to display on 16 character LCD. However I left space at the end for "Hz" text. I can write that on the panel and have 16 digits, but I think I don't have measurement accuracy for 16 digits.

Code: [Select]
#ifndef DIVIDE_LIB_H
#define DIVIDE_LIB_H

#define DIV_MAX_INT_DIGITS 8
#define DECIMAL_POINT_AT 2 // normally this is 0. increase by 1
                           // to multiply by 10

#ifndef DIVIDE_LIB_C
extern char outputBuffer[17];
extern signed char outBuffIndex;
#endif

void longToStr(unsigned long x);
void divideToString(unsigned long a, unsigned long b);
void padOutBuffer(void);
void clearOutBuffer(void);

#endif //DIVIDE_LIB_H

Code: [Select]

#define DIVIDE_LIB_C
#include "divide_lib.h"

unsigned long numerator, denominator;
char place;
char outputBuffer[17];
signed char outBuffIndex;

void padOutBuffer(void){
  while(outBuffIndex>=0){
    outputBuffer[outBuffIndex--] = ' ';
  }
}

void clearOutBuffer(void){
  outBuffIndex = 15;
  padOutBuffer();
}

void longToStr(unsigned long x){
  unsigned char digit;
  do{
    digit = x % 10;
    outputBuffer[outBuffIndex--] = digit + '0';
    x = x / 10;
    if(outBuffIndex<0)
      break;
  }while(x>0);
}

unsigned char divide_extract_next_digit(void){
   int digit = 0;
//   printf("Extracting place %i...",place);


   /* Extracting the decimal digits */
   numerator *= 10;

  /* Actually extract the digit */
  while(numerator >= denominator){
    digit++;
    numerator -= denominator;
  }

   place++;
//   printf("%i\r\n",digit);
   return digit;
}

void divideToString(unsigned long a, unsigned long b){
  signed char c;
  signed char i;
  unsigned long intResult;


  if(b == 0){
    outputBuffer[0] = 'e';
    outputBuffer[1] = 0;
    return;
  }


   place = DIV_MAX_INT_DIGITS;
   intResult = a/b; // doing int division the lazy way - I'll change that later
   a = a % b; // I just noticed that - I do mod twice!!!  :))) it still works -  every time it'll return the same result. remove that and you saved 20 instructions

  outBuffIndex = 12-DIV_MAX_INT_DIGITS;
  longToStr(intResult);
  padOutBuffer();
  outBuffIndex = 13-DIV_MAX_INT_DIGITS;


   numerator = a%b; // well that's a-intResult ... optimization .... (wrong ... it's a-intResult*b)
    denominator = b;


  place = 0;
  while(place < DIV_MAX_INT_DIGITS){
    if(place == DECIMAL_POINT_AT)
    outputBuffer[outBuffIndex++] = '.';
    outputBuffer[outBuffIndex++] = '0'+divide_extract_next_digit();
  }

}

« Last Edit: March 15, 2014, 08:23:36 am by npelov »
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: Divide algorithm
« Reply #35 on: March 14, 2014, 09:14:34 pm »
On an unrelated note, I've been playing around with the XC8 compiler and noticed some interesting things (I don't mess with assembler much, so I'm not sure how much of this is common; also, I have the free version, so I can't try optimizing to see if these things are fixed):

Check out this assembly code generated for a 32-bit shift-right-by-5 operation:
Code: [Select]
;arbdivide.c: 74: uint16_t y = x / 32;
        movf    main@x+1,w
        movwf   ??_main+1
        movf    main@x,w
        movwf   ??_main
        movlw   5
u885:
        lsrf    ??_main+1,f
        rrf     ??_main,f
        decfsz  9,f
        goto    u885
        movf    ??_main,w
        movwf   main@y
        movf    ??_main+1,w
        movwf   main@y+1
        line    76
That's right. It's basically a for loop doing x >>= 1 five times. That's like, at least 20 clock cycles right there. I've modified my code in my previous post to do less shifts-by-N and more shifts-by-1 accordingly. Because I'm all about premature optimization.

Also, taking a uint32_t x and going x % 10 makes a call to llmod:

Code: [Select]
__llmod(unsigned long int dividend, unsigned long int divisor)
{
        unsigned char   counter;

        if(divisor != 0) {
                counter = 1;
                while((divisor & 0x80000000UL) == 0) {
                        divisor <<= 1;
                        counter++;
                }
                do {
                        if((unsigned long)divisor <= (unsigned long)dividend)
                                dividend -= divisor;
                        *(unsigned long int *)&divisor >>= 1;
                } while(--counter != 0);
        }
        return dividend;
}

This is called even with a literal as the divisor. It's even called from printf as part of its %ld printing code. Which seems pretty bad when you consider that the first thing the code does is spend 22 iterations shifting the 10 left bit-by-bit until it becomes 0xA0000000. And that's repeated for every single digit in the printed number.  I find it hard to believe that the generated code would still do this with -O3, but also find it hard to believe that the compiler could be smart enough to optimise this away. Finally, division is done by lldiv, a completely separate function. If you want both x%10 and x/10, you calculate one and then calculate the other from scratch, even though it's straightforward to blend the two loops together and get both at once (as both hamster_nz's and my code do). Fair enough, C doesn't have a way of asking for div and mod simultaneously, but it's just worth being aware how much performance you can be throwing away by not sticking close to the metal.

And that's why I used shifts and why I kept on saying that using 10 base and divides are not good.

Also, if you look at __llmod code there, it's pretty much what I wrote myself but without the call overhead, and with less loop usage for computing the modulus for the 2nd loop

Edit: also it's weird that those functions use C code instead of having optimized assembler, but then it will be too specific to for that micro controller.
« Last Edit: March 14, 2014, 10:42:18 pm by miguelvp »
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: Divide algorithm
« Reply #36 on: March 15, 2014, 01:08:07 am »

One thing I don't understand. Why do you need to check if A and B are < 0xFFFFFFFF/10. Sorry I didn't have time to read rs20's code in details (I'll do later, I promise), but my code can divide 0xFFFFFFFF to 0xFFFFFFFE. I don't multiply by 10 at all.


Code: [Select]
...
void longToStr(unsigned long x){
  unsigned char digit;
  do{
    digit = x % 10;
    outputBuffer[outBuffIndex--] = digit + '0';
    x = x / 10;
    if(outBuffIndex<0)
      break;
  }while(x>0);
}

unsigned char divide_extract_next_digit(void){
   int digit = 0;
//   printf("Extracting place %i...",place);


   /* Extracting the decimal digits */
   numerator *= 10;

  /* Actually extract the digit */
  while(numerator >= denominator){
    digit++;
    numerator -= denominator;
  }

   place++;
//   printf("%i\r\n",digit);
   return digit;
}

...
   place = DIV_MAX_INT_DIGITS;
   intResult = a/b; // doing int division the lazy way - I'll change that later
   a = a % b; // I just noticed that - I do mod twice!!!  :))) it still works -  every time it'll return the same result. remove that and you saved 20 instructions

...

   numerator = a%b; // well that's a-intResult ... optimization .... keep in mind that intresult is destroyed when converted to string.

The check to see if A and B are < 0xFFFFFFFF/10 is done so that if you multiply A or B by 10 it doesn't overflow for lack of storage in the unsigned long.

Since 0xFFFFFFFF/10 are both constants, the preprocessor of the compiler will substitute it with 0x19999999 (429496729) so if you multiply it by 10 decimal you will get 0xFFFFFFFA or 4294967290 decimal (Not overflowing even if the constant is truncated). On the comparison I would change it to <= to allow 429496729 as an input.

Edit: but if your A or B are one more 429496730, ifyou multiply it by 10 then it will overflow to 0x100000004

But the reasoning is that if A or B are larger than that number it might overflow.

On the statement you make that you don't multiply by 10 at all, well, you do.
Also all those divisions and modulus are going to cost you a lot.
« Last Edit: March 15, 2014, 01:14:04 am by miguelvp »
 

Offline rs20

  • Super Contributor
  • ***
  • Posts: 2322
  • Country: au
Re: Divide algorithm
« Reply #37 on: March 15, 2014, 01:47:43 am »

One thing I don't understand. Why do you need to check if A and B are < 0xFFFFFFFF/10. Sorry I didn't have time to read rs20's code in details (I'll do later, I promise), but my code can divide 0xFFFFFFFF to 0xFFFFFFFE. I don't multiply by 10 at all.


Nope, try to divide 4294967295UL by 4000000000UL, your code prints out 1.00000 instead of the correct answer, 1.07374.... Your example just happens to be a poor test because the correct output consists of just ones and zeros, the only output this code could possibly produce with a denominator that high. You do numerator *= 10 in divide_extract_next_digit, that's where it goes wrong.

So your denominator must be < 0xFFFFFFFF/10, but your code is indeed better than mine in the sense that your numerator may be any value. That's only because the "// Identify the place of the first digit" section of my code would get confused, it's not too hard to fix that and the rest of the algorithm would work just fine once that fix is made. What is the end intended use for this code, is a 32-bit by 28-bit divider really more useful to you that a 28-bit by 28-bit divider?

Also; you say in your comments "keep in mind that intresult is destroyed when converted to string". This is false, you're passing by value, not by reference, the function is operating on a copy of the value. Unless XC8 doesn't follow the rules of C.

"Also keep in mind that subtract 2 long ints compiles into more instructions than a%b, because the latter is a function." Good observation, but just make a subtract function and this objection to using subtraction disappears. Also, worrying about little things like this when you have completely separate, independent code for printing the integer and fractional parts is odd.

 

Offline npelovTopic starter

  • Frequent Contributor
  • **
  • Posts: 331
  • Country: bg
    • Microlab.info
Re: Divide algorithm
« Reply #38 on: March 15, 2014, 08:44:25 am »
Yes, you are right. I didn't put too much thought on it. Actually they don't both need to be < 0xFFFFFFFF/10.  a%b must be < 0xFFFFFFFF/10, because that's what I multiply by 10. So correct me if I'm wrong bu A doesn't have to be < 0xFFFFFFFF/10. If B is < 0xFFFFFFFF/10 then A%B will be < 0xFFFFFFFF/10. I tested it and this works just fine: divideToString(0xFFFFFFFF, 400000000UL);

And yes intResult is not destroyed now (when I moved this into function). I guess this bit of information was left in my mind before I make it's own function.
And numerator is not actually a-intResult. It's a-intResult*b  |O. well it's a mess. I'll rewrite it later.

You are right that having separate code for integer and fractional part is not the best. It's just the way I work - I need motivation to continue. I take a note when I do something stupid, then do the stupid thing in order to have quicker results just to see it working, and then I optimize. That''s the same for the divisions. I can completely remove divisions (even by 10) and I'll at least save lib code for division if not time.

And I said the code is bad. One of the reasons to post it is to have someone else look at it and tell me mistakes I didn't see.
« Last Edit: March 15, 2014, 10:03:39 am by npelov »
 

Offline rs20

  • Super Contributor
  • ***
  • Posts: 2322
  • Country: au
Re: Divide algorithm
« Reply #39 on: March 15, 2014, 09:45:46 am »
You're right, A can be any number (from my previous message, "your code is indeed better than mine in the sense that your numerator may be any value.") So yes, A%B must be less than 0x19999999, and a good way to guarantee that is to restrict B to less than 0x19999999.

As for everything else, my comments are intended as helpful hints and little warnings and things to watch out for, not as a criticism of your coding skills. You're doing well!
 

Offline npelovTopic starter

  • Frequent Contributor
  • **
  • Posts: 331
  • Country: bg
    • Microlab.info
Re: Divide algorithm
« Reply #40 on: March 15, 2014, 10:59:51 am »
I was studying your code to see if I can modify it to give fixed number of digits and I found that there is a place where you multiply by 16 (shift by 4) and this makes a limitation of denominator to max_int>>4. But you could shift left by 3 if you move the shift right at the end:

Code: [Select]
    unsigned long shifted_denom = denom << 3;
    for (unsigned char bitt = 8; bitt; bitt >>= 1) {
      if (shifted_denom <= num) {
        num -= shifted_denom;
        digit += bitt;
      }
      shifted_denom >>= 1;
    }

It also reduces time by one shift which is actually 4 byte shifts per digit. But like I said I don't really care about the time. The worst code will do the job 10 times faster than I really need to.

So here is what I came up with(I don't need to put it in a buffer, but it's easier to run in simulator and see result):
Code: [Select]
#define putChar(x) outputBuffer[outBuffIndex++] = x
#define TOTAL_DIGITS 14

char outputBuffer[17];
signed char outBuffIndex;

void divideToStr(unsigned long num, unsigned long denom) {
  outBuffIndex = 0;
  // Preconditions.
  if (num > 0xFFFFFFFF/10 || denom > 0xFFFFFFFF/10 || denom == 0) {
     putChar('X');
     return;
  }
  signed char place = 0; // We always want to print the ones digit

  // Identify the place of the first digit by multiplying denom by 10 until it's within a factor of 10 of num.
  while (1) {
    // This multiplication is guaranteed to be safe because denom is less than num and num is less that UINT32_MAX / 10.
    unsigned long tendenom = (denom << 3) + (denom << 1); // This is just * 10, surely the compiler can accept * 10? Anyway.
    if (tendenom > num) {
      break;
    }
    denom = tendenom;
    ++place;
  }

  // Print answer to 50 d.p.
  //for (; place >= -DP; place--) {
  for (; outBuffIndex<TOTAL_DIGITS; place--) {
    if (place == -1) putChar('.');
    char digit = '0';
    // Slight optimization here: pull a bit of bit-shifting trickery to find the digit in 4 iterations rather than 10.
    unsigned long shifted_denom = denom << 3;
    for (unsigned char bitt = 8; bitt; bitt >>= 1) {
      if (shifted_denom <= num) {
        num -= shifted_denom;
        digit += bitt;
      }
      shifted_denom >>= 1;
    }


    // Optional line: round to nearest (delete this line if you want, e.g., 0.6666 instead of 0.6667).
    if (outBuffIndex == TOTAL_DIGITS-1 && (num << 1) >= denom) ++digit;

    putChar(digit);

    // Optional line: bail out if the decimal expansion is complete.
    //if (num == 0) break;

    num = (num << 3) + (num << 1); // This is just * 10, surely the compiler can accept * 10? Anyway.
    //num = num*10; // This is just * 10, surely the compiler can accept * 10? Anyway.
  }
}


One thing I noticed. Maybe I don't need that many digits. Because I'm using long int here is the finest resolution I can get:
4294967295 / 3999999998L = 10.73741829118
4294967295 / 3999999999L = 10.73741826434
4294967295 / 400000000UL = 10.73741823750

Incrementing the number by one makes the eighth (never realized how wierd the letters in this word are) number after the decimal point change. So there is no point of calculating more than 10 digits in total. Because my algorithm allows any long int in divided number I have one decimal digit more resolution. But displaying more than 11 digits is not helpful.

So here is what I'll do - I'll still use my code because of higher possible value of dividend and I don't care it takes a lot more time unless I run out of program memory.
rs20's code is pretty neat and I'll save it for future use - I can reduce numbers to 16 bit and use it with ADC. It's the cheapest way to display fractional ADC results. I'm sure lots of other people would want to use it.
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Divide algorithm
« Reply #41 on: March 15, 2014, 12:12:11 pm »
I never quite understood the purpose of those all fabulous and elegant pieces of code presented here. In my highly simplistic mind, I thought something like this may be achieved in more simplistic ways, like this:

Code: [Select]
  tmp1=ul_n; tmp2=ul_n / ul_d; //ul_n is the numerator and ul_d the denominator
  for (tmp=0; tmp<MAXDIGs; tmp++) {
    tmp1 -= tmp2 * ul_d; //obtain the residual
    tmp1 *= 10; //go to the next digit
    tmp2 = tmp1 / ul_d; //tmp2 now contains the next decimal point, between 0..9
    //save tmp2 somewhere. Output in BCD or ascii.
  }

This will produce all the decimal points of ul_n / ul_d, up to MAXDIGs.

More optimization is possible but that's the general ideal.
================================
https://dannyelectronics.wordpress.com/
 

Offline npelovTopic starter

  • Frequent Contributor
  • **
  • Posts: 331
  • Country: bg
    • Microlab.info
Re: Divide algorithm
« Reply #42 on: March 15, 2014, 02:35:10 pm »
@dannyf I'm open to suggestions, but I'm having troubles understanding your code. Variables naming confused me. Also how do you know where the decimal point is?
 

Offline rs20

  • Super Contributor
  • ***
  • Posts: 2322
  • Country: au
Re: Divide algorithm
« Reply #43 on: March 16, 2014, 02:44:57 am »
I never quite understood the purpose of those all fabulous and elegant pieces of code presented here. In my highly simplistic mind, I thought something like this may be achieved in more simplistic ways, like this:

Code: [Select]
  tmp1=ul_n; tmp2=ul_n / ul_d; //ul_n is the numerator and ul_d the denominator
  for (tmp=0; tmp<MAXDIGs; tmp++) {
    tmp1 -= tmp2 * ul_d; //obtain the residual
    tmp1 *= 10; //go to the next digit
    tmp2 = tmp1 / ul_d; //tmp2 now contains the next decimal point, between 0..9
    //save tmp2 somewhere. Output in BCD or ascii.
  }

This will produce all the decimal points of ul_n / ul_d, up to MAXDIGs.

More optimization is possible but that's the general ideal.

Yes, this code works fine, although to complete the picture you need to print out ul_n/ul_d as an integer, which is not necessarily a single digit. Actually doing that is of similar difficulty to the fractional part, which is why I cheat in my code by multiplying ul_d by 10 until the integer part *is* a single digit, so that there is only a fractional part.

To clarify for other people watching, tmp1 -= tmp2 * ul_d could be re-written as tmp1 = tmp1 % ul_d and it'd be equivalent.

If you optimize it by removing all divides and multiplies, (and do a little trickery to avoid ever having to print ul_n/ul_d when it may be more than one digit), you end up with my code. It might seem over-the-top, and probably is, but when you know that the quotient from each division is a number between 0 and 9, and accordingly combine the divide-and-modulo calculation into a single 4-iteration loop instead of function calls out to two separate generic 32-iteration loops (llmod and lldiv), the code size and execution time savings seem really substantial. Obviously that may be a false economy iff you already have other code doing long division and modulos (or, equivalently, use printf with "%ld" ever).
 

Offline rs20

  • Super Contributor
  • ***
  • Posts: 2322
  • Country: au
Re: Divide algorithm
« Reply #44 on: March 16, 2014, 04:31:29 am »
I was studying your code to see if I can modify it to give fixed number of digits and I found that there is a place where you multiply by 16 (shift by 4) and this makes a limitation of denominator to max_int>>4. But you could shift left by 3 if you move the shift right at the end:

Code: [Select]
    unsigned long shifted_denom = denom << 3;
    for (unsigned char bitt = 8; bitt; bitt >>= 1) {
      if (shifted_denom <= num) {
        num -= shifted_denom;
        digit += bitt;
      }
      shifted_denom >>= 1;
    }

It also reduces time by one shift which is actually 4 byte shifts per digit. But like I said I don't really care about the time. The worst code will do the job 10 times faster than I really need to.

Good catch, I've updated my code in my previous post.
 

Offline Mechatrommer

  • Super Contributor
  • ***
  • Posts: 11714
  • Country: my
  • reassessing directives...
Re: Divide algorithm
« Reply #45 on: March 16, 2014, 06:30:07 am »
I never quite understood the purpose of those all fabulous and elegant pieces of code presented here. In my highly simplistic mind, I thought something like this may be achieved in more simplistic ways, like this:
Code: [Select]
  tmp1=ul_n; tmp2=ul_n / ul_d; //ul_n is the numerator and ul_d the denominator
  for (tmp=0; tmp<MAXDIGs; tmp++) {
    tmp1 -= tmp2 * ul_d; //obtain the residual
    tmp1 *= 10; //go to the next digit
    tmp2 = tmp1 / ul_d; //tmp2 now contains the next decimal point, between 0..9
    //save tmp2 somewhere. Output in BCD or ascii.
  }

Code: [Select]
tmp1 -= tmp2 * ul_d; //obtain the residualis similar to modulus operator % to get residual
the semantic of your whole code is somewhat similar to mine earlier except your finite loop.

why the complexity that you saw in this thread is that everybody tried to get rid of * and / or % operators wherever possible, which are more expensive than + and - or >> and << operators in cheap mcu that doesnt support "mul" or "div" op code. and/or probably tried to hack/exploit the features of binary numbers using the said op code ("add", "sub", "shr" or "shl").

in some other schools, the "pupils" just decided to punch in the * and / (just like you and i did) and hope for the best for the compiler to translate it to the most optimized way given the limitation of the related mcu op codes domain, or insert the non-native contructs/subroutines which are also hoped to be highly optimized.
Nature: Evolution and the Illusion of Randomness (Stephen L. Talbott): Its now indisputable that... organisms “expertise” contextualizes its genome, and its nonsense to say that these powers are under the control of the genome being contextualized - Barbara McClintock
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: Divide algorithm
« Reply #46 on: March 16, 2014, 06:40:34 am »
And because the operations you are relying on, use the same fabulous and elegant pieces of code done by the Philippine Electronics and Robotics Enthusiast Club maintained by a couple of guys. (well that's probably an adaptation).

I looked at the source and actually is really good but they use the same constructs as my code because they know the limitations of the PIC 16.

Meaning that if you want a specific functionality, there is no reason not to customize your code and understand how things work in the inside.

But you are way better off by doing it in assembler for only 2K of program space.

The code you wrote will call the following elegant pieces of code, done by programmers that care what's going on inside:

Code: (__lldiv) [Select]
// long unsigned unsigned division

unsigned long int
__lldiv(unsigned long int divisor, unsigned long int dividend)
{
unsigned long int quotient;
unsigned char counter;

quotient = 0;
if(divisor != 0) {
counter = 1;
while((divisor & 0x80000000UL) == 0) {
divisor <<= 1;
counter++;
}
do {
quotient <<= 1;
if(divisor <= dividend) {
dividend -= divisor;
quotient |= 1;
}
divisor >>= 1;
} while(--counter != 0);
}
return quotient;
}

Code: (__lmul) [Select]
unsigned long
__lmul(unsigned long multiplier, unsigned long multiplicand)
{
unsigned long product = 0;

do {
if(multiplier & 1)
product += multiplicand;
multiplicand <<= 1;
multiplier >>= 1;
} while(multiplier != 0);
return product;
}

Edit: I found that this code by google after I saw the llmod that rs20 posted, it finds the most significant bit to divide by or do the reminder with an ongoing counter then it performs the operation, simple but using binary math.

Also the PIC18 has a DAW (decimal adjust W register) instruction set and also MUL instructions, that might help I did search and found code that will convert a binary representation to BCD and then use this instruction for outputting the values, but there is an Errata that can be circumvented so it will take research (not much of it) to figure it out if you want to go that route on higher end PICs.

But your compiler might not use them, unless you investigate if they do or not. But the OP uses a PIC16 so he is out of luck.
« Last Edit: March 16, 2014, 07:15:15 am by miguelvp »
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4382
  • Country: us
Re: Divide algorithm
« Reply #47 on: March 16, 2014, 08:29:29 am »
I think you can divide the solutions into two groups:
1) Those that missed the "PIC with 2K code space" comment, and have used "standard" C functions for division, multiplication, modulus, and printing.
2) Those that are wrapping their own optimized math code into the solution to save space.

People using the free XC8 compiler should download the latest version.  Microchip seems to have been (quietly?) upping the level of "optimization" done in the free version.
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2823
  • Country: nz
Re: Divide algorithm
« Reply #48 on: March 16, 2014, 09:26:21 am »
I have a great amount of time for people who find interest in working on problems like these, rather than just using canned code.

It is not like anybody can wake up at 6am on Tuesday and say "today I am going to be an awesome coder". Great coders are made by people working on problems like this to develop a deep understanding of how to do things, and building up knowledge of a wide range of techniques to approach these problems.

I also think that there most probably isn't a "right" answer to these sorts of problems... there is always a better way, unless your use-case is the generic use-case, when the code is to go into a library. Out side of libraries people quite often have different constraints for what their "better" is - it might be portability, maintainability, freedom from legal issues, code size, data size, fastest performance, fixed latency...
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

Offline rs20

  • Super Contributor
  • ***
  • Posts: 2322
  • Country: au
Re: Divide algorithm
« Reply #49 on: March 16, 2014, 10:30:03 am »
People using the free XC8 compiler should download the latest version.  Microchip seems to have been (quietly?) upping the level of "optimization" done in the free version.

I downloaded the free version a couple of days ago (for the purposes of this thread), and the llmod and lldiv code is still used. (I don't mean to imply that you were claiming otherwise, but throwing it out there nevertheless.)

I found that this code by google after I saw the llmod that rs20 posted, it finds the most significant bit to divide by or do the reminder with an ongoing counter then it performs the operation, simple but using binary math.

For the record, I found this file when inspecting the assembler code generated by xc8. The assembler is annotated with the lines of source code, and the files from which that source code came from. Very nice little bits of code, but by necessity they need to work for divisions of any long by any other long. There's no "divide long by byte" code, so it's significantly suboptimal for dividing by 10.

On a increasingly off-topic note, all this made me interested in looking at the assembler for my own project I'm working on using an AVR32 chip. AVR-GCC does optimization for free, and (I'm not an AVR fanboy, honest) I was very impressed by some of the things it did. Here's the C code:

Code: [Select]
uint32_t valu = ...;
valu = ((valu - 10178195) * 71) / 1940 - 20;

Now, for background, the AVR32 can do 32-bit multiply in 1 clock, 32-bit times 32-bit to 64-bit output in 2 clocks, but 32-bit divide takes 17 clocks. So the assembler output, with my commentary added. 8000406c is the highlight for me:

Code: [Select]
The input valu is r8, the output ends up in r11.
r11 = r8 * 71, done as (r8 + r8 << 3) << 3 - r8 for some reason
8000404e: f0 08 00 3b add r11,r8,r8<<0x3
80004052: a3 7b        lsl r11,0x3
80004054: 10 1b        sub r11,r8

r8 = -722651845
80004056: e0 68 35 3b mov r8,13627
8000405a: ea 18 d4 ed orh r8,0xd4ed

r9 = -2027933011
8000405e: e0 69 32 ad mov r9,12973
80004062: ea 19 87 20 orh r9,0x8720

r11 += r8
80004066: 10 0b        add r11,r8

lr = r11 >> 31
80004068: f6 0e 14 1f asr lr,r11,0x1f

r9r8 (combined into 64-bit destination register) = r11 * r9
8000406c: f6 09 04 48 muls.d r8,r11,r9


r11 += r9 (the upper word)
r11 >>= 10
r11 -= lr
r11 -= 20

80004070: 30 aa        mov r10,10
80004072: 12 0b        add r11,r9
80004076: f6 0a 08 4b asr r11,r11,r10
8000407a: 1c 1b        sub r11,lr
8000407c: 21 4b        sub r11,20

I guess the highlight for me is the replacement of 17-clock division with 2-clock 64-bit multiply, keeping only the upper 32 bits of that multiplication (plus some other miscellaneous stuff that takes a few more clock cycles). I guess the multiplicand used is sort of the reciprocal of the denominator? The coolest thing is that the only reason I have that weird formula is to approximate a multiplication by a fractional number (I didn't realise division was so expensive); but it turns out I can just use muls.d and take the upper register and get it done in one hit. Learning assembler tricks from your compiler? Awesome!
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Divide algorithm
« Reply #50 on: March 16, 2014, 11:42:31 am »
Quote
If you optimize it by removing all divides and multiplies,

Whether that (removing divides and multipies) is optimization or not depends on the particular chip in question.
================================
https://dannyelectronics.wordpress.com/
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Divide algorithm
« Reply #51 on: March 16, 2014, 02:48:34 pm »
Quote
The code you wrote will call the following elegant pieces of code,

When the numerator is 2^N, that piece of code collapses nicely, and can be used to quickly generate tuning words at a given frequency for DDS.
================================
https://dannyelectronics.wordpress.com/
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: Divide algorithm
« Reply #52 on: March 16, 2014, 05:04:20 pm »
On a increasingly off-topic note, all this made me interested in looking at the assembler for my own project I'm working on using an AVR32 chip. AVR-GCC does optimization for free, and (I'm not an AVR fanboy, honest) I was very impressed by some of the things it did. Here's the C code:

Code: [Select]
uint32_t valu = ...;
valu = ((valu - 10178195) * 71) / 1940 - 20;

Now, for background, the AVR32 can do 32-bit multiply in 1 clock, 32-bit times 32-bit to 64-bit output in 2 clocks, but 32-bit divide takes 17 clocks. So the assembler output, with my commentary added. 8000406c is the highlight for me:

Code: [Select]
The input valu is r8, the output ends up in r11.
r11 = r8 * 71, done as (r8 + r8 << 3) << 3 - r8 for some reason
8000404e: f0 08 00 3b add r11,r8,r8<<0x3 ; *8(r8)+1(r8) = *9(r8)
80004052: a3 7b        lsl r11,0x3        ; *8(r8) = *72(r8)
80004054: 10 1b        sub r11,r8         ; -1(r8) = *71(r8)

I guess the highlight for me is the replacement of 17-clock division with 2-clock 64-bit multiply, keeping only the upper 32 bits of that multiplication (plus some other miscellaneous stuff that takes a few more clock cycles). I guess the multiplicand used is sort of the reciprocal of the denominator? The coolest thing is that the only reason I have that weird formula is to approximate a multiplication by a fractional number (I didn't realise division was so expensive); but it turns out I can just use muls.d and take the upper register and get it done in one hit. Learning assembler tricks from your compiler? Awesome!

I added comments following the code what it's doing for the r8*71.

Also the reason the compiler was able to optimize the division is because you are dividing by a constant. If you had a variable on the denominator there is no much the compiler can do.

It seems like  AVR-GCC is really working hard to optimize the code by looking what are constants and what not, seems it would have been cheaper to use the muls in one clock instead of that shift, add, shift, subs? strange. Might be a power optimization if the clock savings are not that much.

Edit:

The divide is done by multiplying with the inverse of 1940

0.00051546391752577319587628865979381

In binary (fixed point 0. implied at the left)
(.)0000000000 1000 0111 0010 0000 0011 0010 1010 1101
so it's the constant 0x872032AD (2267034285) shifted to the right 10 bits.


Pretty clever compiler, although it will be really dividing by
1939.9999992077755453971883799719
because of the binary limitations to store fractions and the inverse represented is:
2267034285/(1024*4294967296) = 0.00051546391773626965004950761795044
where 2267034285 is the constant the compiler figured out (0x872032AD), 1024 is the 10 bit right shift and 4294967296 is 2^32 needed to change the meaning of the constant to 2^-32.

Reason being for the inaccuracy is that 1940 = 2*2*5*97 and you can't store 1/5 or 1/97 very well in binary format.

Edit again: that said, you won't find a problem since if you divide max unsigned int32 by either value you'll get the same result 2213900 so the precision should not affect 32 bit unsigned operations by themselves.
« Last Edit: March 16, 2014, 06:48:36 pm by miguelvp »
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Divide algorithm
« Reply #53 on: March 17, 2014, 12:01:27 am »
Code: [Select]
r11 = r8 * 71, done as (r8 + r8 << 3) << 3 - r8 for some reason
(r8 + r8 << 3) = r8 + 8*r8 = 9*r8

(r8 + r8 << 3) << 3 = 9 * r8 * 8 = 72 * r8

(r8 + r8 << 3) << 3 - r8 = 72 * r8 - r8 = 71 * r8.

This CAN be quite inefficient on some chips.
================================
https://dannyelectronics.wordpress.com/
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: Divide algorithm
« Reply #54 on: March 17, 2014, 01:47:03 am »
Code: [Select]
r11 = r8 * 71, done as (r8 + r8 << 3) << 3 - r8 for some reason
(r8 + r8 << 3) = r8 + 8*r8 = 9*r8

(r8 + r8 << 3) << 3 = 9 * r8 * 8 = 72 * r8

(r8 + r8 << 3) << 3 - r8 = 72 * r8 - r8 = 71 * r8.

This CAN be quite inefficient on some chips.

I already noted all that on my reply but I disagree on being inefficient on ANY chip. It takes around 3 cycles on his, maybe others will take 4 cycles because they don't have a shift & add instruction.

I haven't seen a single mcu/cpu that can't do shifts, adds and subs fast (even through the carry flag). So I think the inefficiency concern is a bit overstated. Or if there is such a processor it's a pile of bat dung. because shifts adds subtracts and flags are the core of all processors out there.

But his chip can do a mul 32bit in 1 cycle so it's strange the compiler decided to use ~3|4cycles instead of the mul instruction. Then again, the mul instruction might need setup, to load the constant (71), do the multiplication then extract the answer so it's probably the same time and if the ALU doing the mul takes more power, then shifting will be better. And it looks like both approaches will take around 8 bytes of instruction set.

Edit: just setting up to call a function and processing the result will actually burn more program space than doing the shifts add, subs just to burn more cycles with a generic function.

But everything comes at a price, code maintenance if not well documented can be a pain if using "elegant" solutions, might take the next person a long time to ramp up. Then again, it might be a good test if to keep him or not. Clarification "as a programmer that is"

« Last Edit: March 17, 2014, 05:14:13 am by miguelvp »
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4382
  • Country: us
Re: Divide algorithm
« Reply #55 on: March 17, 2014, 03:14:54 am »
Quote
There's no "divide long by byte" code,
Yep; this (mixed size arithmetic) is one of the common areas where an assembly language programmer can do better than the C compiler (although, doing the primitive operations in C is another route, I guess.)

As for AVR multiply, it use registers R0/R1, which avr-gcc normally reserves for other purposes (R1 is "known zero"), so the setup/cleanup time for a multiply is relatively significant.  (now, I wonder why R1 is still the "known zero" instead of, say, R15.  It may have originated before the multiply instruction, but nowadays... (all the R0..R15 registers are sort of "second class registers" and don't get used too much by the compiler...))

 


Share me

Digg  Facebook  SlashDot  Delicious  Technorati  Twitter  Google  Yahoo
Smf