On an increasingly off-topic note, all this made me interested in looking at the assembler output for a project of my own that uses an AVR32 chip. AVR-GCC does optimization for free, and (I'm not an AVR fanboy, honest) I was very impressed by some of the things it did. Here's the C code:
uint32_t valu = ...;
valu = ((valu - 10178195) * 71) / 1940 - 20;
Now, for background: the AVR32 can do a 32-bit multiply in 1 clock and a 32-bit by 32-bit multiply with a 64-bit result in 2 clocks, but a 32-bit divide takes 17 clocks. So here's the assembler output, with my commentary added. 8000406c is the highlight for me:
The input valu is r8, the output ends up in r11.
r11 = r8 * 71, done as ((r8 + (r8 << 3)) << 3) - r8 for some reason
8000404e: f0 08 00 3b add r11,r8,r8<<0x3 ; r11 = r8 + 8*r8 = 9*r8
80004052: a3 7b lsl r11,0x3 ; r11 = 9*r8 * 8 = 72*r8
80004054: 10 1b sub r11,r8 ; r11 = 72*r8 - r8 = 71*r8
I guess the highlight for me is the replacement of the 17-clock division with a 2-clock 64-bit multiply, keeping only the upper 32 bits of the product (plus some other miscellaneous stuff that takes a few more clock cycles). The multiplier used is essentially a fixed-point reciprocal of the divisor. The coolest thing is that the only reason I have that weird formula is to approximate multiplication by a fractional number (I didn't realise division was so expensive); it turns out I can just use muls.d, take the upper register, and get it done in one hit. Learning assembler tricks from your compiler? Awesome!
I added comments after the code showing what it's doing for the r8 * 71.
Also, the reason the compiler was able to optimize the division is that you are dividing by a constant. If the denominator were a variable, there is not much the compiler could do.
It seems like AVR-GCC really works hard to optimize the code by tracking what is constant and what isn't. That said, wouldn't it have been cheaper to use the 1-clock mul instead of that add, shift, sub sequence? Strange. It might be a power optimization if the clock savings aren't worth much.
Edit:
The divide is done by multiplying by the inverse of 1940:
0.00051546391752577319587628865979381
In binary (fixed point, with the 0. implied at the left):
(.)0000000000 1000 0111 0010 0000 0011 0010 1010 1101
so it's the constant 0x872032AD (2267034285) shifted to the right by 10 bits.
Pretty clever compiler, although it will really be dividing by
1939.9999992077755453971883799719
because of the binary limitations of storing fractions; the inverse actually represented is:
2267034285/(1024*4294967296) = 0.00051546391773626965004950761795044
where 2267034285 is the constant the compiler figured out (0x872032AD), 1024 accounts for the 10-bit right shift, and 4294967296 is 2^32, needed to reinterpret the 32-bit constant as a fraction scaled by 2^-32.
The reason for the inaccuracy is that 1940 = 2*2*5*97, and you can't represent 1/5 or 1/97 exactly in binary.
Edit again: that said, you won't find a problem here, since dividing the max unsigned int32 by either value gives the same result, 2213900, so the precision loss should not affect these 32-bit unsigned operations by themselves.