So I decided to compile the code and test it; here are the results on an Intel processor:
// debug
Result: 37.393936155, cycles: 7350 fixed point 16.16
Result: 37.393940, cycles: 133 floating point IEEE 754
// profile
Result: 37.393936155, cycles: 4981
Result: 37.393940, cycles: 81
But it is wrong [grin] - the original poster's problem was "I need to divide 2 long ints up to an arbitrary precision".
The code is a guideline; he can expand it by changing the precision of the integral and fractional parts.
He will also need to change the fractional constant to a 5000... value that fits in the number of bits he needs.
edit: 5000... constant explanation at the end of this post.
Also, use more than one unsigned long to widen the operation, concatenating the accumulator and results.
I wouldn't do it in C because it's inefficient; assembly targeted at the very spartan instruction set that the PIC has will yield better results.
One more thing: to get those results I added my code to an existing program I was working on that uses many threads, which might affect the precision counter, so the real cycle counts could be much lower. I didn't want to do a one-off or make a new project just to test the results.
Now I agree that 16.16 should be enough for anybody (well, I have used 4.68 on an FPGA fractal generator....) - but in my code I can change how many decimal digits I want to extract to an arbitrary number and keep on extracting digits:
...
...
while (place >= -100)
    putchar('0' + divide_extract_next_digit());
...
37.3939393939393939393939393939393939393939393939393939393939393939393939393939393939393939393939393939
Sure, it might be off by one least significant count (as it always rounds down), but it can keep extracting valid digits to the end of time if needed...
As a check - what do you get for 3/400000001? I get
0.00000000749999998125000004687499988281250029296874926757812683105468292236329269409176826477057933807355165481612086295969784260075539349811151625472120936319697659200755851998110370004724074988189812
(as does 'bc')
Very impressive! I will test your code to see what it generates in assembly and how fast it is.
I'm just trying to get people to think at the system level instead of relying on C code generation that makes sense programming-wise; without knowing the underlying hardware, you are not doing yourself a favour by not understanding these concepts. Lots of people shy away from binary math, when in reality it is easier than decimal math: just 2 values per digit instead of 10.
As for converting a fixed-point binary fraction to decimal, it is not hard; the code is above and it works. (Hint: the third loop does that.)
Sorry, I meant doing it precisely was "hard" (as opposed to correct to 4dp), and as I mentioned in my original reply, I define "hard" as similarly difficult to the OP's question.
But that's the thing: it's not hard at all. You place the maximum 500000.... constant that fits into your working memory model (an unsigned long in the sample above), but you could extend it ad infinitum by changing the precision value. For example, if you place it in an unsigned long long (quad word), then
you can fit 5000000000000000000 into it and change the precision to 64, obtaining an accuracy of 2^-64 (5.4210108624275221703311375920553e-20) instead of just 2^-16 - but it all comes down to what you need.
edit: 5000... constant explanation at the end of this post.
Why do a long division in base 10 when the processor is base 2? Base-2 long division is easier too: if the divisor fits into the remainder you put a 1 and shift; if not, you just shift, because the digit is 0.
You could explain to me how easy adding numbers is, but it's not particularly relevant if you end up with something that's of no use to solving the problem at hand. For example, consider two different calculations, one of which works out to 0.10000000000000..., and another that works out to 0.099999999999... You want to print this number to 1 decimal place. Clearly, the first should print 0.1, the second should print 0.0. This is trivial to achieve correctly using base-10 division code.
Now, how do we achieve this with binary division and then decimal conversion second? 0.099999999999 and 0.10000000000000 in binary are:
0.0001100110011001100110011001100110011001100110011...
0.0001100110011001100110011001100110011000100000000...
So you need words and words of binary expansion before you can decide what the very first decimal digit is. One tenth has an infinite recurring binary expansion, never forget this. A contrived example for sure, but one that I think justifies my uneasiness with the approach in general. It's just unnecessarily wrong when a single line using AVR's standard compiler can do this with absolute precision.
You just add them as you would any integers, as long as they are aligned (i.e. both numbers 16.16). The result stays correct because it doesn't matter where the fraction starts, just like in base 10: if you ignore the dot you can add the numbers and place the dot afterwards (figuratively, since there is no dot unless you want to print the result).
And no, I'm not incrementing things by 2^-16 steps; I'm doing a long division to obtain the integer, then the fraction, then converting the fraction to decimal, in three loops with lots of comments so you can follow.
Why have you got numbers like 000015258 in your comments, and nothing like that in the code itself? It's confusing. All I know is that your very own sample output shows an error in the fifth decimal place, so there has to be some sort of approximation going on here. When asked a problem as simple as outputting a division to 6 dp, something that could be done with perfect precision in one line in AVR's standard compiler, just making such approximations is a bit rough, I suspect.
As I mentioned, my code does involve performing divides and getting remainders; if I were to write this code I'd use your divide code. I'd just pull out once I hit the fractional part, and keep the remainder, and start working digit by digit. All I ever need to do in addition to your code is * 10, which is trivial. My code will have longer run-time, of course, but still easily within OP's spec.
It has to do with the precision you work with. Single-precision IEEE 754, for example, has a precision of 2^-24, but you can change the range via the exponent, so the value can be shifted by about 128 places either way.
My example had a precision of 2^-16 (0.0000152587890625), but since I chose an unsigned 32-bit variable, it truncates the precision so it can fit: (.)999984742 in 000015258 increments. I can change the accumulator to a quad word (64 bits unsigned), along with the precision variable, and increase the precision by many orders of magnitude at the cost of longer computations and more resources - so it depends on what you need. As for my constraints, in the text under Caveats I state that this routine as coded will be accurate from 0.0 to 65536.0-0.0001 in 0.0001 steps.
That said, it can be expanded by using more storage to deal with the extra bits, at the cost of the time needed to get the result, by using a quad word as mentioned above.
Being fixed point, the precision is stuck there: no shifting by exponent, unless you want to add that into the function and make it floating point. Shifting in binary is the same as moving the decimal point in decimal. We tend to think in decimal, so we think of shifting as multiplying or dividing by 2, but we are actually moving the fractional point of a number that happens to be base 2.
Thanks for your code, by the way - I will look into it, since it looks like a nice compromise between both methods.
Edit: for clarification, the 500000... constant represents 2^-1 (1/2, or 0.5), so when you shift that constant right it represents 2^-2 (1/4, or 0.25); each further right shift gives you the next negative power of two for the fractional part.