Author Topic: Divide algorithm (Read 18655 times)

npelov · « **on:** March 11, 2014, 05:29:18 pm »

Hi,

I need to divide 2 long ints up to an arbitrary precision. I must be able to output digits one by one. When the part before decimal point is over then the decimal point is output and then up to N digits after the decimal point.
So if N=6 and I have to divide 1234/33 then result would be:
37.393939
I'm not sure if I want to round correctly. I have limited memory so it would be fine without rounding the last digit.

I tried to search for an algorithm and I found this page:
http://www.cs.sunysb.edu/~skiena/392/programs/bignum.c

which is nice and does big integer calculations, but the result is also an integer. My head is now full of bugs, so any attempt to modify the algorithm fails. I'll try later, but any help will be appreciated. the idea is to have something like:

// outputs a/b up to N digits after decimal point
function divideBig(unsigned long int a, unsigned long int b){
if(b = 0)
return 0; // I'll handle that one later
while(precision < N){
// calc
outputDigit(x);
}
}

Thanks in advance!

npelov · « **Reply #1 on:** March 11, 2014, 05:48:47 pm »

Believe or not the biggest ideas come to you while you are in the toilet. The algorithm here allows math with only integer numbers. The result is also integer. The numbers are held simply in character arrays (well struct with character array, sign and count of digits). So in order to get 6 digits after the decimal point I can:
1. add 6 zeroes after first number (multiply by 1000000)
2. run math function
3. when outputting the result put a decimal point 6 digits before the end of the number

... it's that simple.

Well if you have any ideas to optimize that, you are welcome. I only have 2k instructions memory in PIC16F1503 and 128 bytes of ram. Maybe I should use BCD to reduce the usage of RAM...

npelov · « **Reply #2 on:** March 11, 2014, 06:29:09 pm »

Back to square 1. Can't use the library because it uses recursion

Code: [Select]

Error [1089] E:\data\PIC\frequency-counter\firmware\bignum.c; 111. recursive function call to "_subtract_bignum"
Even if the compiler supports it the stack will probably not be enough.

nctnico · « **Reply #3 on:** March 11, 2014, 06:33:28 pm »

For for fixed point math. If N is 6 then multiply 1234 by 1000000, divide by 33 and send a dot after the 6th digit.

miceuz · « **Reply #4 on:** March 11, 2014, 06:41:44 pm »

if I recall correctly, you can always rewrite a recursive algorithm as a loop

Len · « **Reply #5 on:** March 11, 2014, 06:47:05 pm »

Quote from: miceuz on March 11, 2014, 06:41:44 pm

if I recall correctlym you can always rewrite a recursive algorithm as a loop

But you still need a stack-like data structure to hold the partial results, which uses a similar amount of memory.

[Edit] Actually, looking at the code, I think it could be re-written to not be recursive at all.

dannyf · « **Reply #6 on:** March 11, 2014, 08:21:24 pm »

Quote

up to an arbitrary precision.

Not going to happen.

Set realistic goals and then you have a chance of getting something done.

Typical routes are BCD, or one of those Qn formats (Q15 or Q31 for example).

Mechatrommer · « **Reply #7 on:** March 11, 2014, 08:46:52 pm »

pls refine this, i'm not going to debug it, just come out of my biggest astral plane in the lab in the sleepy "almost dawn" 4AM here...

Code: [Select]

function divideBig(unsigned long int a, unsigned long int b) {
	long int rem;

	outputDigit(a \ b);
	outputChar('.');
	a = a % b; // remainder

	while (a) {
		outputDigit((a * 10) \ b);
		a = (a * 10) % b; // remainder
	};
};

note:
1) i'm not sure exactly in C (need to check this out later), but inverted slash "\" (instead of normal slash for division "/") as in VB means get integer part only without "rounding off" to nearest integer. so 19 \ 10 will result 1, not 2. and iirc % is for modulus (remainder) in C right? well you check that one out, i'm sleepy, its been a while i talked in C.
2) that one is designed for infinite precision (unless until the remainder is exhausted). so if you try to divide pi it will halt

3) since its not debugged, its highly probably there will be bugs. so you need 3X the effort to debug it

4) or the whole thing is just entirely "something else"
5) and last edit... it only works for +ve numbers which is bad.
6) and last edit... i didnt understand your question which i believe now trying to do 64bit numbers in 8bits machine. so...
FWIW.

T3sl4co1l · « **Reply #8 on:** March 11, 2014, 08:49:31 pm »

Quote from: Len on March 11, 2014, 06:47:05 pm

Quote from: miceuz on March 11, 2014, 06:41:44 pm
if I recall correctlym you can always rewrite a recursive algorithm as a loop
But you still need a stack-like data structure to hold the partial results, which uses a similar amount of memory.

[Edit] Actually, looking at the code, I think it could be re-written to not be recursive at all.

The advantage should be substantial, like several fold. If you're talking C, each recursive call gets you the stack frame of that function, including the return address, and usually a frame pointer. The classic x86 being;

Code: [Select]

CALL func
...
func:
PUSH BP
MOV BP, SP
ADD BP, -framesize
[PUSH used registers]
...
[POP used registers]
POP BP
RET

And I believe for something like AVR8, gcc normally uses X (or Y or Z) for the purpose of BP. So you get at least two WORDs on the stack plus new copies of all local variables, plus backups of all registers (give or take how the compiler has allocated and optimized the function, whether it uses registers or memory, and whether it has free registers to clobber or not).

Tim

hamster_nz · « **Reply #9 on:** March 11, 2014, 08:52:05 pm »

Here's my stab at it - seems to work for the one test case you supplied. No recursion, not the highest performance but that wasn't one of your constraints.

#include "stdio.h"
unsigned long numerator, denominator;
char place;
#define DIV_MAX_INT_DIGITS 8

int div_setup(unsigned long a, unsigned long b)
{
if(b == 0) return -1;

/* Prevent any overflows */
if(a > 0xFFFFFFFF/10)
return -1;
if(b > 0xFFFFFFFF/10)
return -1;

numerator = a;
denominator = b;
place = DIV_MAX_INT_DIGITS-1;
return 0;
}

int divide_extract_next_digit(void)
{
int digit = 0;
unsigned long dtemp = denominator;
// printf("Extracting place %i...",place);

if(place > 0) {
int i;
for(i = 0; i < place; i++)
dtemp*=10;
}
else if(place < 0)
{
/* Extracting the decimal digits */
numerator *= 10;
}

/* Actually extract the digit */
while(numerator >= dtemp)
{
digit++;
numerator -= dtemp;
}

place--;
// printf("%i\r\n",digit);
return digit;
}

int main(int argc, char *argv[])
{
if(div_setup(1234, 33) < 0)
return -1;

/* Surpress leading zeros */
while(place > 0) {
char c = divide_extract_next_digit();
if(c != 0)
{
putchar('0'+c);
break;
}
}

while(place >= 0)
putchar('0'+divide_extract_next_digit());
putchar('.');

while(place > -DIV_MAX_INT_DIGITS)
putchar('0'+divide_extract_next_digit());

putchar('\r');
putchar('\n');
return 0;
}

hamster_nz · « **Reply #10 on:** March 11, 2014, 09:03:27 pm »

Quote from: dannyf on March 11, 2014, 08:21:24 pm

Quote
up to an arbitrary precision.

Not going to happen.

The input are two 32-bit unsigned integers (depending on platform), so output has a range of between (2^32-1)/1 down to 1/(2^32-1).

When just using the full range of 32-bits numbers it is a bit painful to handle corner cases - avoiding overflows and so on. In my code above I'm limiting both inputs to at most 429,496,729. This allows it to work with up to nine digits to the left of the decimal (well, actually eight and a half), but as many as you care to the right of the decimal.

miguelvp · « **Reply #11 on:** March 11, 2014, 10:54:12 pm »

Treat it like a long division but you will need to do it in assembler for performance.

Code: [Select]

1234/33

10011010010 / 100001
100001
------
000101
   101100
   100001
   ------
   00101110
     100001
     ------
     00110100 <-- here we past the integer part and we step into the fractional part
       100001
       ------
       0100110
        100001
        ------
        000101000
           100001
           ------
           000111000
              100001
              ------
              0101110
               100001
               ------
               00110100
                 100001
                 ------

 
Result: 0010 0101.0110 0100 1101

Explained a little:

1234d == 10011010010b (dividend)
33d == 100001b (divisor)

zeros at the beginning don't count so you want to shift left the dividend into an accumulator until the accumulator is bigger than the divisor and you have to take into account the number of digits. Since in C you don't get to access the Carry and Overflow flags you can use bitwise operators.

This is what I just came up with, not tested so the code might have bugs, but I think it should work or at least give you an idea.

you gotta handle the divisor == 0 as well that I didn't include in the code
and also the infinites, the undetermined and nans (not a number) as well if you need them

Code: [Select]

unsigned long divideBig(unsigned long dividend, unsigned long divisor)
{
  unsigned long result = 0;
  unsigned long accumulator = 0;
  int precission=sizeof(unsigned long)*8; // number of bits
  unsigned long msb = 1<<(precission-1); // most significant bit constant.
  unsigned char fraction = 0; // for fractional part later on (8 bits)

  while(precission) {
    // check most significant bit
    if(dividend&msb) { // We are going to carry a bit into accumulator
      accumulator = (accumulator<<1)|1;
    } else { // we just shift left the accumulator.
      accumulator = (accumulator<<1);
    }
    // Check if accumulator is binary divisible by divisor.
    if(accumulator >= divisor) {
      // we just need to substract the divisor from accumulator
      accumulator -= divisor;
      // adjust result with a 1
      result = (result<<1)|1;
    } else {
      // adjust result with a 0
      result = (result<<1);
    }
    // shift the dividend (for the next bit)
    divided <<=1;
    // You might be able to do an early out check
    // if the accumulator and dividend are 0.
    // but you will have to shift the result by the precission counter (number of bits) left.

    // decrement the number of bits to process.
    precission--;
  }
  // At this point you have the result (integer division)
  // if you want fixpoint decimals you can run it through the same loop for
  // as many binary bits of precission you need.
  // the fraction needed is in the accumulator and you just pump 0's per loop.
  // for example, we want the fraction for 8 bits.

  precission = sizeof(unsigned char)*8; // number of bits in an unsigned char (really is 8 if you want to hard code it)

  while(precission) {
    // we just shift left the accumulator since we are done with the dividend.
    accumulator = (accumulator<<1);
    // Check if accumulator is binary divisible by divisor.
    if(accumulator >= divisor) {
      // we just need to substract the divisor from accumulator
      accumulator -= divisor;
      // adjust fraction with a 1
      fraction = (fraction <<1)|1;
    } else {
      // adjust fraction with a 0
      fraction = (fraction<<1);
    }
    // shift the dividend (for the next bit)
    divided <<=1;
    // You might be able to do an early out check
    // if the accumulator is 0.
    // but you will have to shift the fraction by the precission counter (number of bits) left.

    // decrement the number of bits to process.
    precission--;
  }
  // fraction will have your fractional part in binary format.
  // meaning the most significant bit will be 2^-1 or 1/2
  // the next 1/4, 1/8, 1/16, 1/32, 1/64, 1/128 and 1/256
  // if the bit is on then you add the fraction and you should come up with the
  // value.
  ...
}

Edit: I didn't deal with the sign because the inputs were unsigned long, otherwise you have to compute the result for the sign with an xor

Edit2: once you reach the fractional part after exhausting the divider you can do some optimization. Say you keep a history of accumulator values if the accumulator gets to the same value again then you know it's periodical and will repeat over and over. since you have the binary result for those bits, then you can just shift and or the periodical result.

Edit3: also I didn't return the result and didn't set how to pass the fractional part back, but it was to give you an idea on how to do this.

Edit4: there are a lot of optimizations that you can make if you know the sizes of the operands, you can hard code a lot of those values for your needs, if using fixed point math say 16.16 you can optimize the code to deal with that too. If done in assembly you could make a function that will deal with both the integer part and the fractional part reducing code size, also you can use the overflow flag instead of the accumulator, the example is just a template in C but assembly language will give you the best results.

npelov · « **Reply #12 on:** March 12, 2014, 01:43:03 am »

Thanks for all suggestions! I was a bit tired and I didn't explain the best. So the best term is "fixed point" math. The only problem is that maximum size of int is 32 bits in XC8. Some of you have nice suggestions. hamster_nz's code seams like it'll do the job - instead of multiplying it upfront, he does that for every digit after the integer division. Yes, performance is still not an issue, unless it takes seconds to calculate. Currently I'm using 32 bit floating point that is supported by microchip's XC8 and it takes about 10-15 ms to calculate with 10MHz clock (2.5MHz instruction clock). If this time rises to 100-200 ms it's still fine. miguelvp's idea to try it. I'm not sure how easy it'll be to do it with the bits directly though...
I'll review all posts tomorrow again and I'll share results.

miguelvp · « **Reply #13 on:** March 12, 2014, 02:45:52 am »

On:

Quote from: hamster_nz on March 11, 2014, 09:03:27 pm

/* Prevent any overflows */
if(a > 0xFFFFFFFF/10)
return -1;
if(b > 0xFFFFFFFF/10)
return -1;

Dividing by 10 not only will bring a dividing library but 10 is 2*5 and 1/5 is not well suited for binary divisions. Not sure if the processor can do divisions (even integer ones) most don't, not even multiplications but I didn't look at the processor so I might be totally wrong.

Edit: looked it up, pretty spartan instruction set
http://en.wikipedia.org/wiki/PIC_microcontroller#Instruction_set

but my algorithm should map fine to that instruction set, but will be better to actually code it in assembler to begin with

npelov · « **Reply #14 on:** March 12, 2014, 03:26:22 am »

Quote from: miguelvp on March 12, 2014, 02:45:52 am

Dividing by 10 not only will bring a dividing library but 10 is 2*5 and 1/5 is not well suited for binary divisions. Not sure if the processor can do divisions (even integer ones) most don't, not even multiplications but I didn't look at the processor so I might be totally wrong.

Yes, binary will be easier to divide, but the number still has to be output to the screen - how do you do that - with divide by 10. You still have to use divide library and yes, these basic 8 bit MCUs don't have divide or multiply.
Also the whole is to have more digits than long int can represent. Long is 4,294,967,295 (unsigned) which is about 9 and a bit more. So you can't fit write the number 16 000 000.123456 in a long (or integer representation of the fixed point float 16 000 000 123 456 )

In theory you can shift 2 long numbers and save the result in a 64 integer even when compiler doesn't support it, but in every operation you'll have to execute 8 shifts ... which better be an assembler function to use carry flag to avoid additional instructions when shifting bits between bytes. But even this is still not a show stopper how are you gonna output 64 bit integer as a decimal string?

npelov · « **Reply #15 on:** March 12, 2014, 03:55:19 am »

Quote from: hamster_nz on March 11, 2014, 08:52:05 pm

Here's my stab at it - seems to work for the one test case you supplied. No recursion, not the highest performance but that wasn't one of your constraints.

I think it won't work. max digits is 8 in your case. So t the start place = 8-1 = 7. In int divide_extract_next_digit(void) first thing you do is:
if(place > 0) {
int i;
for(i = 0; i < place; i++)
dtemp*=10;
}

dtemp here is the divisor. You check if the divisor is > 0xFFFFFFFF/10, but if divisor is 0xFFFFFFFF/10 - 1 then it'll overflow because you multiply it by 10 eight times.
I'll try to modify the algorithm.

T3sl4co1l · « **Reply #16 on:** March 12, 2014, 04:22:43 am »

I wrote a uint16 divide-by-ten for AVR8, which takes 45 words and 60 cycles. It uses a multiplier of (2^24 / 10) + 1 for an effective 24 bit multiply, getting the byte-aligned shifts for free, and returns a pair of uint16s (quotient and remainder). The result is exact for all operands.

But AVR8 has 8x8 hardware multiply and piles of registers, so I'm not sure this is at all relevant to PIC. I would just as well suggest dumping the PIC altogether and using something actually, you know, nice.

Tim

hamster_nz · « **Reply #17 on:** March 12, 2014, 04:37:01 am »

Quote from: miguelvp on March 12, 2014, 02:45:52 am

On:
Quote from: hamster_nz on March 11, 2014, 09:03:27 pm
/* Prevent any overflows */
if(a > 0xFFFFFFFF/10)
return -1;
if(b > 0xFFFFFFFF/10)
return -1;

Dividing by 10 not only will bring a dividing library but 10 is 2*5 and 1/5 is not well suited for binary divisions. Not sure if the processor can do divisions (even integer ones) most don't, not even multiplications but I didn't look at the processor so I might be totally wrong.

That division can/will be optimized away at compile time - no division is used at run-time. It will just be testing a and b against a number a little bit more than 400,000,000. No division is needed - I just didn't want to include a magic number - at least using 0xFFFFFFFF / 10 makes some sense (ie. you can multiply it by 10 without an overflow in an unsigned 32-bit integer).

And the only multiplication used is by 10, so it can all be done in a single subroutine using bit-shifts "out = ((in<<3) + (in<<1)", making that pretty easy too.

miguelvp · « **Reply #18 on:** March 12, 2014, 04:52:28 am »

For your example 1234/33 = 37.393939

0010 0101.0110 0100 110(110 0100 110) where () means periodical so they repeat because the accumulator ended up to be the same.
Lets say you want the fixed point to be 8 bits integer and 24 bits fraction for 32 bits then you have

0010 0101(.)0110 0100 1101 1001 0011 0110

The integer part is simple you just take the 0010 0101 and print it as a decimal or use a table to add the values together.

Code: [Select]

00100101|01100100 11011001 00110110
++++++++|-------- -------- --------
00000000|00000000 01111111 11122222
76543210|12345678 90123456 78901234

The numbers below the + and - are the 2 exponential so on the positive part is fairly easy.

Now on the negative part I need to think about it.

You can't really use a table since it's the invert i.e. 1/2,1/4,1/8..... (Edit: you actually can use a table or shift a value, look edits towards the end).
I guess you could treat them backwards and make a table but then you will have to do the inverse which is another division.

I have to think about this further.

Edit: if you have a printf function that actually handles floats without math it might be possible to change the fractional float to a format that printf will recognize. i.e (1.)00101011001001101100100110110*2^6
which translates to IEEE 754 single precision:

127 biased + 6 = 133 = 10000101 binary

so equal to:

Code: [Select]

   Exponent|Mantisa (1.) implied
0|10000101|00101011001001101100100
^
sign positive

Also on the integer number approach: 00100101011001001101100100110110 in decimal (the result for your sample) is 627366198
divided by 256^3 (16777216 because 24 bit mantissa on the fixed point) = 37.39393937587738037109375 (yeah, round off, or I made a mistake on my computations, not on the way to do it. Might have missed a 0 or a 1 since i'm doing these calculations manually)

I know that means the representation is binary sound but maybe there is something to it, but only base changes that come to mind implies exponents and logs (not cheap) but there has to be a way.

Edit again, found a way to print the mantissa in decimal form, make a table for the mantissa for the sample it yields: 393939375877380371093750 for the fractional side. And if you don't want a table, just shift right to get to the next value from the 500000000000000000000000 base number, the code should be simple and in assembly you should be able to use the carry to concatenate the shifts if using rotate right through carry (RRF)

Note: this denotes a fixed point 8 bit integer and 24 bit mantissa because it makes sense but I don't know your range and my earlier code sample was not that way it was a long integer and 8 bit mantissa but you should have enough to mold things together. You have to figure out if shifting is faster than fetching a table value on your micro-controller depending on your integer and fractional bits.

Code: [Select]

xxxx xxxx(.)0110 0100 1101 1001 0011 0110

500000000000000000000000 0 (1/2)        2^-1
250000000000000000000000 1 (1/4)        2^-2
125000000000000000000000 1 (1/8)        2^-3
062500000000000000000000 0 (1/16)       2^-4
031250000000000000000000 0 (1/32)       2^-5
015625000000000000000000 1 (1/64)       2^-6
007812500000000000000000 0 (1/128)      2^-7
003906250000000000000000 0 (1/256)      2^-8

001953125000000000000000 1 (1/512)      2^-9
000976562500000000000000 1 (1/1024)     2^-10
000488281250000000000000 0 (1/2048)     2^-11
000244140625000000000000 1 (1/4096)     2^-12
000122070312500000000000 1 (1/8192)     2^-13
000061035156250000000000 0 (1/16384)    2^-14
000030517578125000000000 0 (1/32768)    2^-15
000015258789062500000000 1 (1/65536)    2^-16

000007629394531250000000 0 (1/131072)   2^-17
000003814697265625000000 0 (1/262144)   2^-18
000001907348632812500000 1 (1/524288)   2^-19
000000953674316406250000 1 (1/1048576)  2^-20
000000476837158203125000 0 (1/2097152)  2^-21
000000238418579101562500 1 (1/4194304)  2^-22
000000119209289550781250 1 (1/8388608)  2^-23
000000059604644775390625 0 (1/16777216) 2^-24

total:
393939375877380371093750

The table for the integer part is easy as well, and you can shift from 1 left to get the next table value if needed and you don't want a table.

Code: [Select]

0010 0101(.)xxxx xxxx xxxx xxxx xxxx xxxx

128 0 (2^7 = 128)
064 0 (2^6 = 64)
032 1 (2^5 = 32)
016 0 (2^4 = 16)

008 0 (2^3 = 8)
004 1 (2^2 = 4)
002 0 (2^1 = 2)
001 1 (2^0 = 1)

32+4+1 = 37

So 1234/33 = 37(int)"."393939375877380371093750(fraction)

one last edit before bed, i had 2^0=0 that's wrong, it's = 1 so I corrected it

dannyf · « **Reply #19 on:** March 12, 2014, 11:09:03 am »

Quote

I wrote a uint16 divide-by-ten for AVR8, which takes 45 words and 60 cycles.

You can easily beat that.

Division of a given number is fairly easy. For example, in your case, X / 10 can be decomposed into (X * 3.2) / 32 = (X * 2 + X + X * 2 / 10) / 32. X * 2 / 10 can be further expanded to reach your desired precision, as the / 32 done with shifting.

Quote

It uses a multiplier of (2^24 / 10) + 1 for an effective 24 bit multiply

There are very efficient routines for (2^M / N), particularly in the DDS libraries.

miguelvp · « **Reply #20 on:** March 13, 2014, 03:01:47 am »

Bringing it home and returning a formatted static string to display the result.

Notes:

Once you reach the fractional part after exhausting the divider you can do some optimization. Say you keep a history of accumulator values if the accumulator gets to the same value again then you know it's periodical and will repeat over and over. since you have the binary result for those bits, then you can just shift and or the periodical result.

There are a lot of optimizations that you can make if you know the sizes of the operands, you can hard code a lot of those values for your needs, if using fixed point math say 16.16 you can optimize the code to deal with that too. If done in assembly you could make a function that will deal with both the integer part and the fractional part reducing code size, also you can use the overflow flag instead of the accumulator, the example is just a template in C but assembly language will give you the best results.

Also note that I didn't even compile any of the sample code I'm submitting here, should work but consider it as a guideline.

Code: [Select]

//
// divideBig, the inputs can be fixed point and we don't care where the period is at
// since a division is a relation, so 3.56/2.45 will yield the same result as 356/245
// so as long as both inputs have the same fixed point representation the results will be good.
//
// The return string must be static or global somewhere because we are returning the pointer to it
// so the values must be valid when we are outside of the scope of this function.
// In this example the return string will be 8.8 bynary unsigned and 0 filled but converted to decimal
// So the integer part will be in the range [0,65535] and the fractional part will be in the range [0,9999847412109375] correction [0,999984741]
// in increments of 2^-16 (0000152587890625), correction (000015258) because that's the largest we can fit in an unsigned long of 32 bits.
//
// Even if the result and the fraction fit on unsigned shorts, we use unsigned longs so the compiler
// can do the right operations. You could try using unsigned shorts instead but you will have to test
// if there is bad behaviour specially since we ignore where the fraction is in the dividend and divisor.
// so the result will overflow in some cases.
//
char * divideBig(unsigned long dividend, unsigned long divisor)
{
  //
  // edit: 16 digits for the mantissa is too big for an unsigned long, correcting it.
  // static char strResult[5+1+16+1]; // you can hard code it to 23 but using adds to show why we pick this value
  //
  // Integer part 5 decimal digits, dot one digit, decimal mantissa 9 digits + null termination.
  static char strResult[5+1+9+1]; // you can hard code it to 16 but using adds to show why we pick this value
  unsigned long result = 0; // This holds the integer part we only need 16 bits of this so it could be an unsigned short.
  unsigned long accumulator = 0; // temporary accumulator for computing the division and coding the string.
  int precision=16; // number of bits for integer result and later own for the fractional number of bits.
  unsigned long msb = 1<<(precision-1); // most significant bit constant.
  unsigned long fraction = 0; // for fractional part later on could be an unsigned short (16 bits).
  unsigned long decimalFraction = 0; // for the decimal representation for the end result.

  // Place code here to deal with divide by 0, infinite (+|-), NaNs, undetermined values.
  
  // If inputs are not unsigned, place code here to extract the msb from both and compute the
  // resulting sign by XOring both bits if the result is 1 then the final result is negative
  // otherwise is positive, bitwise XOR operator in C is '^'
  // If inputs are signed you must strip the sign by doing an AND with ~msb
  
  // Do the integer part of the division.
  while(precision) {
    // check most significant bit
    if(dividend&msb) { // We are going to carry a bit into accumulator
      accumulator = (accumulator<<1)|1;
    } else { // we just shift left the accumulator.
      accumulator = (accumulator<<1);
    }
    // Check if accumulator is binary divisible by divisor.
    if(accumulator >= divisor) {
      // we just need to subtract the divisor from accumulator
      accumulator -= divisor;
      // adjust result with a 1
      result = (result<<1)|1;
    } else {
      // adjust result with a 0
      result = (result<<1);
    }
    // shift the dividend (for the next bit)
    dividend <<=1;
    // You might be able to do an early out check
    // if the accumulator and dividend are 0.
    // but you will have to shift the result by the precision counter (number of bits) left.

    // decrement the number of bits to process.
    precision--;
  }
  // At this point you have the result (integer division) if you were dividing by the smallest number
  // the result might take an extra digit
  // if you want fixed point fractional part you can run it through the same loop for
  // as many binary bits of precision you need.
  // the fraction needed is in the accumulator and you just pump 0's per loop.
  // for example, we want the fraction for 16 bits.

  precision = 16; // number of bits for the fractional result predefined for an 16.16 unsigned fixed point float.

  while(precision) {
    // we just shift left the accumulator since we are done with the dividend.
    accumulator = (accumulator<<1);
    // Check if accumulator is binary divisible by divisor.
    if(accumulator >= divisor) {
      // we just need to subtract the divisor from accumulator
      accumulator -= divisor;
      // adjust fraction with a 1
      fraction = (fraction <<1)|1;
    } else {
      // adjust fraction with a 0
      fraction = (fraction<<1);
    }
    // shift the dividend (for the next bit)
    dividend <<=1;
    // You might be able to do an early out check
    // if the accumulator is 0.
    // but you will have to shift the fraction by the precision counter (number of bits) left.

    // decrement the number of bits to process.
    precision--;
  }
  // fraction will have your fractional part in binary format.
  // meaning the most significant bit will be 2^-1 or 1/2
  // the next 1/4, 1/8, 1/16, 1/32, 1/64, 1/128, 1/256, 1/512, 1/1024, 1/2048, 1/4096, 1/8192, 1/16384, 1/32768 & 1/65536
  // if the bit is on then you add the fraction and you should come up with the
  // value.
  
  // Now to produce the string
  // result has the integer part so we don't need to mess with it
  // fraction however has to be converted to decimal.
  // we need enough space to hold the increment (0000152587890625) 16 decimal digits but max it out to 1/2
  // correction: increment 000015258, because we can only fit 9 digits on an unsigned long.
  //
  // Edit: this number is too big to fit on a 32 bit unsigned long
  // decimalFraction = 5000000000000000;
  decimalFraction = 500000000;
  precision = 16; // number of bits for the fractional part that needs to be converted to decimal form.
  accumulator = 0; // use the accumulator to store the decimal fractional value.
  
  while(precision) {
      // check most significant bit
    if(fraction&msb) { // We are going to carry a bit into accumulator otherwise nothing.
      accumulator += decimalFraction;
    }  
    // adjust the bit value weight to the right (divide by 2)
    decimalFraction >>= 1;
    // shift the fractional result left (for the next bit)
    fraction <<=1;
   
    // decrement the number of bits to process.
    precision--;
  }

  // 16 decimal digits too big for an unsigned long, correcting code
  // sprintf(strResult,"%05lu.%016lu", result, accumulator);

  // Now we have the integer part in result, and the fractional part printf friendly in the accumulator
  sprintf(strResult,"%05lu.%09lu", result, accumulator);

  // We are done
  return strResult;
}

Edit: change 16 to 9 so it will fit on an unsigned long
if you don't want the zero filled integer part, substitute the formatting string from "%05lu.%09lu" to "%5lu.%09lu" if you need to right justify it, or "%lu.%09lu" if you don't care about justification.

Caveats:

Overflow/Underflow, even if we are symmetrical (16.16) on the integer part and the fractional part, a division can overflow if you divide by the smallest possible number (2^-16) because the inverse will be 2^16 and that's too big to represent numerically in 16 bits.
However we are using 32 bits unsigned values and returning a string for the integer part can accommodate 32768, then in 5 digits it can accommodate the overflow value of 65536.

Changing the symmetry and/or the ranges will need some adjustments to the algorithm. That includes adjusting the msb to check if the bit being worked on (integer, fraction, or when computing the fraction decimal part) will need to be adjusted. I'm not sure if leaving msb to look at bit 15 for the whole algorithm will work, maybe needs to be adjusted when computing the fractional decimal representation on the third loop.

Similarly if you make the fractional part smaller then dividing by a large integer will underflow. Or something like that. In this example it can't underflow as far as I can tell, and since we are returning a string the overflow won't matter.

Since the smallest fractional step is 2^-16 (0.0000152587890625) (correction, we can only fit 0.000015258) your floating point accuracy is only 0.0001 with those 16 bits I think. So the results will be accurate from 0.0 to 65536.0-0.0001 in 0.0001 steps. So you might want to adjust the sprintf to only give you the accurate values.

Taking it further
code this in assembler using the rotates through carry for concatenating shifts and using the carry for concatenating adds, etc.
Also in assembler you could make some of the code reentrant for computing the integral value and the fractional value reducing memory footprint. Some of the routines could be re-purposed for multiplication by doing jumps depending if you are dividing or multiplying.
you can use the Z (zero) flag too for early outs, and I bet more optimizations can be done, specially finding periodic values so you can exit the processing and exit early if there is a pattern that can be repeated.

Hope this helps someone.

Edit: the 16 digit conversion value was too big to fit on an unsigned long, corrected the code but probably introduced bugs if I missed anything. Then again, it's a guideline.

Edit: decided to compile the code and I was missing a couple of semicolons and instead of dividend i typed divided in a couple of places so I fixed it.

dannyf · « **Reply #21 on:** March 13, 2014, 11:33:43 am »

You have a choice:

1) build your propritetary fixed point routines - challenging, interesting and probably not as reliable;
2) use the widely available Qn format.

rs20 · « **Reply #22 on:** March 13, 2014, 01:25:22 pm »

This whole thread seems very puzzling to me, it seems to have drifted off the tracks.

Quote from: dannyf on March 13, 2014, 11:33:43 am

You have a choice:

1) build your propritetary fixed point routines - challenging, interesting and probably not as reliable;
2) use the widely available Qn format.

Miguel's code seems, as far as I can tell (based on this idea of incrementing by some coarse approximation of 2^-16), to be performing binary division, including a binary fractional part, and then converting that binary fraction part to a decimal representation, probably inaccurately. Similarly, the suggestion above to use the Qn format must be implying the same idea, because the Qn format is just fixed-point binary representation. Again, it requires further work to get the decimal output.

Key point: converting a fixed-point binary fraction to decimal is a hard (as in, about as hard as the problem the OP asked) problem, not a throwaway afterthought.

The OP's insight at the beginning to just calculate (A*1000000)/B was much closer to the mark. Remembering how to do *decimal* long division from your school days and writing that in code form is also a good approach.

If A*1000000 is too big, then just do integer divide A/B for the integer part, then another integer divide ((A%B)*1000000)/B to get the fractional part. If that's still to big (A%B can be as big as B, so it's not technically any better), then doing each digit one by one:

Code: [Select]

print A/B
remainder = A //yeah, this is looks weird, but it gets clobbered on the first line of the loop
for each digit() {
   remainder = (remainder % B) * 10
   print remainder / B;
}

This is, essentially, decimal long division. You can get really clever and work in base 100 or base 1000, depending on how much latitude you have from your denominator. Note also how it can be asked to provide a million digits, and it can do so in constant memory and linear time. The binary long division code above is probably good for doing the integer divisions, but the key is to stop once you reach the fractional bit, and just retain the remainder (notice how the loop prints x % B and x / B; long division produces a pair of outputs, divisor (x/b) and remainder (x%b)).

It would help to have clear bounds provided on the numerator, denominator, and how large the largest integer types available are. It's past midnight here so I'm dropping this for tonight, but I'm just going to encourage moving away from the binary fractional part business for now.

miguelvp · « **Reply #23 on:** March 13, 2014, 07:38:35 pm »

So I decided to compile the code and test it, here are the results on an intel processor:

Code: [Select]

// debug
Result: 37.393936155, cycles: 7350   fixed point 16.16
Result: 37.393940, cycles: 133       floating point IEEE 754
// profile
Result: 37.393936155, cycles: 4981
Result: 37.393940, cycles: 81

IEEE 754 has 24 bit mantissa (1 implied) so it's going to be a bit more accurate, but then you need to use floating point

I did look a the assembly generated and you can do better manually.
the floating point result is what it takes the Floating Point Unit to do it in dedicated hardware (way faster of course).
Do not understand how the FPU can vary from debug build to profile build.
For cycle measurement I used the performance counter.

rs20, what kind of assembly does your code generate?
a PIC 18 has a very minimal set of instructions, it can't even divide integers or multiply for that matter.

If you look at my code, the first loop does just what you said, but with the bits and hinting the compiler how to produce the code, only better way would be assembly.

As for the converting a fixed point binary fraction to decimal, is not hard, the code is above and it works. (hint) third loop does that.

why do a long division base 10 when the processor is base 2? base 2 long division is easier too, if the value is divisible then you put a 1 and shift, if not you just shift because the value is 0.

base 10 has 100 permutations on multiplications and divisions 10x10 array
base 2 has only 4 permutations on multiplications and divisions 2x2 array

And out of those permutations it's only true on one case so it's a binary decision that is fast and simple.

Just apply the same basic math but instead of 10 digits you have 2, it's the same thing.

And no, i'm not incrementing things by 2^-16 steps, i'm doing a long division to obtain the integer, then the fraction, then converting the fraction to decimal in three loops with lots of comments so you can follow.

But the code has it's limitations, it only accepts dividends under 65535, you you can alter it to change precision too

hamster_nz · « **Reply #24 on:** March 13, 2014, 08:22:57 pm »

Quote from: miguelvp on March 13, 2014, 07:38:35 pm

So I decided to compile the code and test it, here are the results on an intel processor:

Code: [Select]
// debug Result: 37.393936155, cycles: 7350 fixed point 16.16 Result: 37.393940, cycles: 133 floating point IEEE 754 // profile Result: 37.393936155, cycles: 4981 Result: 37.393940, cycles: 81

But it is wrong [grin] - "I need to divide 2 long ints up to an arbitrary precision" as the original poster's problem.

Now I agree that 16.16 should be enough for anybody (well, I have used use 4.68 on a FPGA fractal generator....) - but in my code I can change the how many decimal digits I want to extract to an arbitrary number and keep on extracting digits:

Code: [Select]

...
while(place >= -100)
     putchar('0'+divide_extract_next_digit());
...

37.3939393939393939393939393939393939393939393939393939393939393939393939393939393939393939393939393939

Sure, it might be off by one least significant count (as it always rounds down), but it can keep extracting valid digits to the end of time if needed...

As a check - what do you get for 3/400000001? I get

0.00000000749999998125000004687499988281250029296874926757812683105468292236329269409176826477057933807355165481612086295969784260075539349811151625472120936319697659200755851998110370004724074988189812

(as does 'bc')


EEVblog Main Site	EEVblog on Youtube	EEVblog on Twitter	EEVblog on Facebook	EEVblog on Odysee

Author Topic: Divide algorithm (Read 18655 times)

Share me