One easy mistake is to put the watchdog kick in an interrupt service routine (ISR). If your main loop has crashed or hung, ISRs will most likely continue to run just fine, so the watchdog keeps getting kicked and never fires. Watchdog kicks therefore need to be at strategic places in your main loop.
Another approach when using complex code comprising several parallel tasks is for each task to set an "I'm running OK" flag. The watchdog kick routine then checks that all the "OK" flags are set before kicking the watchdog and clearing all the flags.
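As a rough sketch of the flag scheme just described, assuming three tasks and a hypothetical hw_watchdog_kick() standing in for whatever register write your particular chip needs (here it only counts calls so the logic can be tried on a PC), it might look something like this:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_TASKS    3u                        /* assumed task count */
#define ALL_TASKS_OK ((1u << NUM_TASKS) - 1u)  /* one bit per task */

/* One "I'm running OK" bit per task. On a preemptive system the
 * read-modify-write below would need to be made atomic. */
static volatile uint32_t task_ok_flags = 0;

/* Hypothetical stand-in for the real hardware kick. */
static unsigned hw_kick_count = 0;
static void hw_watchdog_kick(void) { hw_kick_count++; }

/* Each task calls this from its own loop with its task number. */
void task_report_ok(unsigned task_id)
{
    if (task_id < NUM_TASKS) {
        task_ok_flags |= (1u << task_id);
    }
}

/* Called from the main loop. Kicks the watchdog only if every task
 * has reported in since the last kick, then clears all the flags.
 * Returns true if the watchdog was kicked. */
bool watchdog_service(void)
{
    if ((task_ok_flags & ALL_TASKS_OK) != ALL_TASKS_OK) {
        return false;   /* at least one task has gone quiet */
    }
    hw_watchdog_kick();
    task_ok_flags = 0;
    return true;
}
```

The watchdog timeout then has to be longer than the slowest task's reporting interval, otherwise a healthy system will get reset.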
It's quite easy to distinguish in your code between watchdog-initiated resets and power cycles. I select a digital input port and connect a capacitor from it to 0V and a resistor from it to Vcc. I set the values such that it takes a few seconds after power-up for the voltage at the port to reach a logic 1. Early on in my code I check the value of the port: if it is logic 0, then the chip has just powered up; if it is logic 1, the capacitor was already charged, so it must have been a watchdog reset (or some other reset that left the power on).
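The start-up check can be sketched roughly as below. The names are made up for illustration: read_rc_pin() is a hypothetical stand-in for reading the RC-delayed input on your chip, and here it just returns a variable so the logic can be exercised without hardware.

```c
#include <stdbool.h>

typedef enum {
    RESET_POWER_UP,   /* capacitor still discharged: pin reads logic 0 */
    RESET_WATCHDOG    /* capacitor already charged: pin reads logic 1 */
} reset_cause_t;

/* Hypothetical stand-in for the real port read. */
static bool rc_pin_level = false;
static bool read_rc_pin(void) { return rc_pin_level; }

/* Call this as early as possible after reset, well before the
 * capacitor has had time to charge following a power-up. */
reset_cause_t detect_reset_cause(void)
{
    return read_rc_pin() ? RESET_WATCHDOG : RESET_POWER_UP;
}
```

The check only works if it runs well inside the RC charging time, so it belongs before any slow initialisation.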
If you are concerned about instability as the voltage on this digital input port slowly ramps up you can add a Schmitt trigger, or alternatively use an analogue input port. So far in my projects it has never been necessary.
Finally, another thing you can do is move the watchdog function off the chip completely. I used a 556 timer chip connected to the reset line of the microcontroller. The code kicks the external watchdog via a digital output port, which restarts the timer. If the first 555 (there are two 555s in a 556) times out, it triggers the second 555, set up as a monostable, which creates a reset pulse lasting (say) 500 ms or so.
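On the software side, kicking an external watchdog like this comes down to pulsing the output pin. A minimal sketch, assuming an active-high pulse restarts the first timer (that depends on how the 555 is wired) and using a hypothetical write_kick_pin() in place of the real port write:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the real output port. It records the pin
 * state and counts rising edges so the logic can be checked on a PC. */
static bool kick_pin_state = false;
static unsigned kick_pulse_count = 0;

static void write_kick_pin(bool level)
{
    if (level && !kick_pin_state) {
        kick_pulse_count++;          /* rising edge seen */
    }
    kick_pin_state = level;
}

/* Pulse the pin once to restart the external timer. Real hardware
 * will probably need a short delay between the two writes so the
 * pulse is wide enough for the timer circuit to see it. */
void external_watchdog_kick(void)
{
    write_kick_pin(true);
    write_kick_pin(false);
}
```

Using a pulse rather than a steady level means a pin stuck high or low after a crash cannot hold the timer off indefinitely, provided the kick is coupled to the timer so that only the edge counts.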