There's your problem! As you pointed out, Arduino libs are a joke when it comes to this sort of thing. Don't be hating on interrupts because Arduino is bad at it!
...the pin change doesn't tell you which edge it was, so you may poll it after the next bounce.
Missing bounces isn't a problem; as long as you're not using a rotary encoder so terrible that the bounces never actually stop, simple state machines will work. I do take your point that keeping timer interrupts from being starved can be difficult; it depends entirely on whether you're trying to handle higher-priority interrupts and whether you've got too much code in those interrupts, which can be a question of some subtlety. So I do agree with you that in many cases, a dedicated MCU is a reasonable solution.
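To illustrate the "simple state machine" point: a minimal sketch of a table-driven quadrature decoder (names and table layout are mine, not from any particular library). The idea is that only valid Gray-code transitions move the count, so contact bounce just walks back and forth and cancels itself out:

```c
#include <stdint.h>

/* Index = (previous AB state << 2) | current AB state.
   Valid Gray-code transitions score +1 or -1; no change, or an impossible
   double-flip (glitch), scores 0, so bounce chatter nets out to zero. */
static const int8_t transition[16] = {
     0, -1, +1,  0,
    +1,  0,  0, -1,
    -1,  0,  0, +1,
     0, +1, -1,  0,
};

typedef struct {
    uint8_t prev;   /* last sampled 2-bit AB state */
    int32_t count;  /* quadrature edges; typically 4 per detent */
} encoder_t;

void encoder_init(encoder_t *e, uint8_t ab) {
    e->prev = ab & 3;
    e->count = 0;
}

/* Feed the current A/B lines packed into 2 bits. Cheap enough to call from
   a pin-change ISR or a fast timer poll; which edge fired doesn't matter. */
void encoder_update(encoder_t *e, uint8_t ab) {
    ab &= 3;
    e->count += transition[(e->prev << 2) | ab];
    e->prev = ab;
}
```

Feeding it the full sequence 00→10→11→01→00 bumps the count by 4 (which direction counts up depends on how A and B are wired), while a bounce like 00→10→00 adds +1 then -1 and nets zero.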
I'm not hating on interrupts; I've written perfectly good uC rotary decoders before, otherwise I wouldn't feel qualified to comment on it. I'm just saying it's a non-trivial problem, and making things easy with hardware is trivial by comparison. Unless I was making a billion of something, saving a few cents isn't worth many hours of debugging and reduced flexibility in the main uC. And every time you write a firmware upgrade or try to add functionality, you have to take how it interacts with the decoder into account, so it's an ongoing dev cost you're committing to. It can't be amortized between projects either, because the particular solution is always tailored to whatever else the uC it has to interact with is actually doing.
I was mainly using Arduino as a visible example of what happens when people don't think about it and just assume "it can be done in software"; I see it all the time. Most of the Arduino libraries have this issue where they don't combine well: people will get the rotary encoder sketch working and think great, then try the LCD driver sketch and it works too, then design their hardware and suddenly be up the creek when they try to use both libraries at once and find they don't play well together at all without completely rewriting both to take each other's timing constraints into account. And if you didn't have the foresight to route your encoder to the edge-interrupt-capable pins, that's some bodge wires you're going to be soldering. Sigh.
I think a lot of open hardware designs suffer from this especially, since the hardware tends to be designed first and it's then up to the community to write software (not that there's anything wrong with that order of doing things). But spending slightly more time in the hardware, offloading as much as you can from the uC (whether or not it will turn out to be strictly necessary), makes everything really nice later when you try to write the firmware.
Note this doesn't apply to a dedicated uC properly implemented, because then you know beforehand there is nothing else the uC is doing that might cause an issue in the future; whether you do that or use some discrete logic depends on what you're doing. For reusable circuits, I tend to design a discrete logic solution with jellybean chips once and never worry about it again. The schematic is good forever with many different logic families, and it doesn't depend on any particular manufacturer's part or development environment still existing in ten years when I revisit it or just want to use it as a black box in something new. If it turns out something I design is going to be produced by the millions, where saving a dime makes a huge difference to the bottom line, then I can afford to spend the time rethinking things for a respin, which will probably need to happen anyway.