As with many disciplines, there are almost always gaps in knowledge, and as CatalinaWOW pointed out, writing a few lines of code that use prebuilt libraries does give some people the sense that they are 'good at coding for microcontrollers' - with all manner of cool things happening, but with very little 'logic' being written. I see it quite often with people from IT backgrounds who start doing embedded work - they are easily tripped up by things they take for granted on larger CPUs with seemingly endless resources. They are then countered by what some have called the 'bare metal zealots', who go as far as to say you should code in assembly...
To the OP: if you're a software guy, I'm sure you're used to writing down exactly what you want to do in flow chart format, and used to using equations. What I think has tripped you up here is that you're dealing with an 8-bit device, which has limitations and can therefore require a different approach to achieve the desired effect.
I was surprised when you said you had realised that right shifting a value (unsigned long) by 16 divides it by 65536 (similarly, left shifting multiplies it). It's the same reason that shifting a decimal number left by one digit multiplies it by 10. Because it's an 8-bit device, it stores a 32-bit number in four locations, so it doesn't even need to shift any bits for that - it just takes the top two bytes and discards the bottom two. And this is where I believe quite a few people get stuck... bytes, chars, ints, words, longs etc. are of course finite. Whilst I'm sure many software people are aware of this, for micros one often needs more than just 'awareness' of logical functions; one needs to know how the mathematical operations are actually performed on the hardware. You may not need it to be fast, and higher level languages give you the option of using floats and dealing with very precise numbers, but what if you're trying to make something use less power? Or perform that calculation a thousand times a second? It seems a common approach these days is to 'use a bigger chip'.
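To make the shift-by-16 point concrete, here is a rough sketch in C. The function names are made up for illustration, and the byte-array version assumes the four bytes are stored least significant byte first, which is common on 8-bit parts but not guaranteed for every compiler. All three functions should return the same 16-bit result; on an 8-bit target the compiler can typically implement the shift by just picking the top two bytes.

```c
#include <stdint.h>

/* Hypothetical helper: upper 16 bits of a 32-bit value via a right shift. */
uint16_t upper_half_by_shift(uint32_t value)
{
    return (uint16_t)(value >> 16);
}

/* Hypothetical helper: the same result via a division by 65536 (2^16). */
uint16_t upper_half_by_divide(uint32_t value)
{
    return (uint16_t)(value / 65536UL);
}

/* Hypothetical helper: the same result by taking the top two bytes of the
 * four storage locations and ignoring the bottom two.
 * ASSUMPTION: bytes[0] is the least significant byte (little-endian). */
uint16_t upper_half_from_bytes(const uint8_t bytes[4])
{
    return (uint16_t)(((uint16_t)bytes[3] << 8) | bytes[2]);
}
```

For example, 0x12345678 stored as the bytes 0x78, 0x56, 0x34, 0x12 gives 0x1234 from all three functions.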
That is often why multiplications and divisions for things like averaging are done with powers of two - it makes it much easier for a chip without dedicated division hardware to divide a number. It simply becomes a case of bit shifting.
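As a minimal sketch of that averaging trick, assuming 16-bit samples and a buffer of 8 of them (both numbers and all the names are mine, purely for illustration): because the sample count is a power of two, the divide at the end is just a right shift.

```c
#include <stdint.h>

/* ASSUMPTION: 8 samples, chosen only because 8 = 2^3. */
#define SAMPLE_COUNT_LOG2 3u
#define SAMPLE_COUNT      (1u << SAMPLE_COUNT_LOG2)

/* Hypothetical helper: average SAMPLE_COUNT unsigned 16-bit samples.
 * The sum is held in 32 bits so it cannot overflow, and the division
 * by SAMPLE_COUNT is done as a right shift (result rounds down). */
uint16_t average_samples(const uint16_t samples[SAMPLE_COUNT])
{
    uint32_t sum = 0;
    uint8_t i;

    for (i = 0; i < SAMPLE_COUNT; i++) {
        sum += samples[i];
    }
    return (uint16_t)(sum >> SAMPLE_COUNT_LOG2);
}
```

If you averaged 10 samples instead, the compiler would have to pull in a software division routine on a chip with no divide hardware, which costs both cycles and code space.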
As for your project, it would help if you explained, in your first post, what it is you're trying to achieve and what you have tried so far. We've all been 'peeved' when the software we've written doesn't do what we expected, but in every single case it was my mistake and lack of understanding (with the possible exception of a tiny bug in MikroC PRO that caused an if statement to always execute...).