I like this one, it's ultra robust. It works even if input never stabilizes and consists of never ending bounce.
<snip>
This method is equivalent to the method I detailed above, and similar or equivalent to those others have mentioned. The difference is practically semantic: my exact wording uses more shift registers (exactly N) and less logic (a handful of full adders); the above uses fewer registers (one for the input, and Lg(N) for the state) and more complicated logic (full Lg(N) bit adder/subtractor, plus comparators and mux to implement the saturating arithmetic). A sufficiently powerful compiler/synthesizer should be able to generalize more-or-less the same logic from either method, although I don't know if they are quite *that* powerful yet.
So as you can see, it's a pretty good way to do it.
Thanks to the Lg(N) state, and relatively modest logic requirement (loads of combinatorial logic often come for free, e.g. the huge macrocells of a CPLD, or the word-sized addition and conditional statements of software programs), this implementation is much better suited to large N, which through many variations, might be used for debouncing, timers, delay/phase shifter chains, missing pulse detectors, etc.
For the small N counts used for simple debounce, it's not usually worth it, but there might be reasons to do it this way regardless. For example, if your processor has SIMD "packed" instructions, you can do a whole pile of these additions and stuff at once. (True SIMD instructions often use saturating arithmetic anyway, so you don't even have to implement that, it comes for free!) Probably not likely to be relevant on a microcontroller, but a possibility to be aware of.
Tim