Bringing it home and returning a formatted static string to display the result.
Notes:
Once you reach the fractional part after exhausting the divider you can do some optimization. Say you keep a history of accumulator values if the accumulator gets to the same value again then you know it's periodical and will repeat over and over. since you have the binary result for those bits, then you can just shift and or the periodical result.
There are a lot of optimizations that you can make if you know the sizes of the operands, you can hard code a lot of those values for your needs, if using fixed point math say 16.16 you can optimize the code to deal with that too. If done in assembly you could make a function that will deal with both the integer part and the fractional part reducing code size, also you can use the overflow flag instead of the accumulator, the example is just a template in C but assembly language will give you the best results.
Also note that I didn't even compile any of the sample code I'm submitting here, should work but consider it as a guideline.
//
// divideBig, the inputs can be fixed point and we don't care where the period is at
// since a division is a relation, so 3.56/2.45 will yield the same result as 356/245
// so as long as both inputs have the same fixed point representation the results will be good.
//
// The return string must be static or global somewhere because we are returning the pointer to it
// so the values must be valid when we are outside of the scope of this function.
// In this example the return string will be 8.8 bynary unsigned and 0 filled but converted to decimal
// So the integer part will be in the range [0,65535] and the fractional part will be in the range [0,9999847412109375] correction [0,999984741]
// in increments of 2^-16 (0000152587890625), correction (000015258) because that's the largest we can fit in an unsigned long of 32 bits.
//
// Even if the result and the fraction fit on unsigned shorts, we use unsigned longs so the compiler
// can do the right operations. You could try using unsigned shorts instead but you will have to test
// if there is bad behaviour specially since we ignore where the fraction is in the dividend and divisor.
// so the result will overflow in some cases.
//
char * divideBig(unsigned long dividend, unsigned long divisor)
{
//
// edit: 16 digits for the mantissa is too big for an unsigned long, correcting it.
// static char strResult[5+1+16+1]; // you can hard code it to 23 but using adds to show why we pick this value
//
// Integer part 5 decimal digits, dot one digit, decimal mantissa 9 digits + null termination.
static char strResult[5+1+9+1]; // you can hard code it to 16 but using adds to show why we pick this value
unsigned long result = 0; // This holds the integer part we only need 16 bits of this so it could be an unsigned short.
unsigned long accumulator = 0; // temporary accumulator for computing the division and coding the string.
int precision=16; // number of bits for integer result and later own for the fractional number of bits.
unsigned long msb = 1<<(precision-1); // most significant bit constant.
unsigned long fraction = 0; // for fractional part later on could be an unsigned short (16 bits).
unsigned long decimalFraction = 0; // for the decimal representation for the end result.
// Place code here to deal with divide by 0, infinite (+|-), NaNs, undetermined values.
// If inputs are not unsigned, place code here to extract the msb from both and compute the
// resulting sign by XOring both bits if the result is 1 then the final result is negative
// otherwise is positive, bitwise XOR operator in C is '^'
// If inputs are signed you must strip the sign by doing an AND with ~msb
// Do the integer part of the division.
while(precision) {
// check most significant bit
if(dividend&msb) { // We are going to carry a bit into accumulator
accumulator = (accumulator<<1)|1;
} else { // we just shift left the accumulator.
accumulator = (accumulator<<1);
}
// Check if accumulator is binary divisible by divisor.
if(accumulator >= divisor) {
// we just need to subtract the divisor from accumulator
accumulator -= divisor;
// adjust result with a 1
result = (result<<1)|1;
} else {
// adjust result with a 0
result = (result<<1);
}
// shift the dividend (for the next bit)
dividend <<=1;
// You might be able to do an early out check
// if the accumulator and dividend are 0.
// but you will have to shift the result by the precision counter (number of bits) left.
// decrement the number of bits to process.
precision--;
}
// At this point you have the result (integer division) if you were dividing by the smallest number
// the result might take an extra digit
// if you want fixed point fractional part you can run it through the same loop for
// as many binary bits of precision you need.
// the fraction needed is in the accumulator and you just pump 0's per loop.
// for example, we want the fraction for 16 bits.
precision = 16; // number of bits for the fractional result predefined for an 16.16 unsigned fixed point float.
while(precision) {
// we just shift left the accumulator since we are done with the dividend.
accumulator = (accumulator<<1);
// Check if accumulator is binary divisible by divisor.
if(accumulator >= divisor) {
// we just need to subtract the divisor from accumulator
accumulator -= divisor;
// adjust fraction with a 1
fraction = (fraction <<1)|1;
} else {
// adjust fraction with a 0
fraction = (fraction<<1);
}
// shift the dividend (for the next bit)
dividend <<=1;
// You might be able to do an early out check
// if the accumulator is 0.
// but you will have to shift the fraction by the precision counter (number of bits) left.
// decrement the number of bits to process.
precision--;
}
// fraction will have your fractional part in binary format.
// meaning the most significant bit will be 2^-1 or 1/2
// the next 1/4, 1/8, 1/16, 1/32, 1/64, 1/128, 1/256, 1/512, 1/1024, 1/2048, 1/4096, 1/8192, 1/16384, 1/32768 & 1/65536
// if the bit is on then you add the fraction and you should come up with the
// value.
// Now to produce the string
// result has the integer part so we don't need to mess with it
// fraction however has to be converted to decimal.
// we need enough space to hold the increment (0000152587890625) 16 decimal digits but max it out to 1/2
// correction: increment 000015258, because we can only fit 9 digits on an unsigned long.
//
// Edit: this number is too big to fit on a 32 bit unsigned long
// decimalFraction = 5000000000000000;
decimalFraction = 500000000;
precision = 16; // number of bits for the fractional part that needs to be converted to decimal form.
accumulator = 0; // use the accumulator to store the decimal fractional value.
while(precision) {
// check most significant bit
if(fraction&msb) { // We are going to carry a bit into accumulator otherwise nothing.
accumulator += decimalFraction;
}
// adjust the bit value weight to the right (divide by 2)
decimalFraction >>= 1;
// shift the fractional result left (for the next bit)
fraction <<=1;
// decrement the number of bits to process.
precision--;
}
// 16 decimal digits too big for an unsigned long, correcting code
// sprintf(strResult,"%05lu.%016lu", result, accumulator);
// Now we have the integer part in result, and the fractional part printf friendly in the accumulator
sprintf(strResult,"%05lu.%09lu", result, accumulator);
// We are done
return strResult;
}
Edit: change 16 to 9 so it will fit on an unsigned long
if you don't want the zero filled integer part, substitute the formatting string from "%05lu.%09lu" to "%5lu.%09lu" if you need to right justify it, or "%lu.%09lu" if you don't care about justification.
Caveats:
Overflow/Underflow, even if we are symmetrical (16.16) on the integer part and the fractional part, a division can overflow if you divide by the smallest possible number (2^-16) because the inverse will be 2^16 and that's too big to represent numerically in 16 bits.
However we are using 32 bits unsigned values and returning a string for the integer part can accommodate 32768, then in 5 digits it can accommodate the overflow value of 65536.
Changing the symmetry and/or the ranges will need some adjustments to the algorithm. That includes adjusting the msb to check if the bit being worked on (integer, fraction, or when computing the fraction decimal part) will need to be adjusted. I'm not sure if leaving msb to look at bit 15 for the whole algorithm will work, maybe needs to be adjusted when computing the fractional decimal representation on the third loop.
Similarly if you make the fractional part smaller then dividing by a large integer will underflow. Or something like that. In this example it can't underflow as far as I can tell, and since we are returning a string the overflow won't matter.
Since the smallest fractional step is 2^-16 (0.0000152587890625) (correction, we can only fit 0.000015258) your floating point accuracy is only 0.0001 with those 16 bits I think. So the results will be accurate from 0.0 to 65536.0-0.0001 in 0.0001 steps. So you might want to adjust the sprintf to only give you the accurate values.
Taking it further
code this in assembler using the rotates through carry for concatenating shifts and using the carry for concatenating adds, etc.
Also in assembler you could make some of the code reentrant for computing the integral value and the fractional value reducing memory footprint. Some of the routines could be re-purposed for multiplication by doing jumps depending if you are dividing or multiplying.
you can use the Z (zero) flag too for early outs, and I bet more optimizations can be done, specially finding periodic values so you can exit the processing and exit early if there is a pattern that can be repeated.
Hope this helps someone.
Edit: the 16 digit conversion value was too big to fit on an unsigned long, corrected the code but probably introduced bugs if I missed anything. Then again, it's a guideline.
Edit: decided to compile the code and I was missing a couple of semicolons and instead of dividend i typed divided in a couple of places so I fixed it.