Since this keypad simulator does keypad simulation and nothing else, it can spend all of its time polling a port for the user's column input. If you have an interrupt-driven system instead, you have the interrupt latency to deal with.
I checked latency for a PIC16F triggered by an external interrupt - 4 cycles, worst case.
The shortest polling loop I can write is 4 cycles per pass. But when the user data is "true", the pass takes 3 cycles (no jump back to start-of-loop) -- polling wins by 1 cycle.
When polling just misses the data, it's 4 cycles until it can read the port again plus 2 more to fall out of the loop, 6 cycles in all -- interrupts win by 2 cycles.
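To make the comparison easier to check, here is a rough sketch of the arithmetic in Python. The cycle counts are just the figures quoted above (my own measurements for a PIC16F, not datasheet values), and the constant and function names are made up for illustration.

```python
# Cycle counts as quoted in the post; treat them as assumptions, not datasheet figures.
INTERRUPT_LATENCY = 4     # worst-case cycles for an external interrupt
POLL_LOOP = 4             # cycles for one pass of the polling loop that finds nothing
POLL_HIT = 3              # cycles for the pass that sees the data (no jump back)
POLL_EXIT_AFTER_MISS = 2  # extra cycles to fall out of the loop after a near miss


def polling_best_case():
    """Data is already present when the port is read."""
    return POLL_HIT


def polling_worst_case():
    """Data arrives just after the port was read: one full pass, then exit."""
    return POLL_LOOP + POLL_EXIT_AFTER_MISS


if __name__ == "__main__":
    best = polling_best_case()
    worst = polling_worst_case()
    print("polling best case: ", best, "cycles")
    print("polling worst case:", worst, "cycles")
    print("interrupt latency: ", INTERRUPT_LATENCY, "cycles (worst case)")
    # Positive margin means polling responds sooner than the interrupt.
    print("best-case margin for polling: ", INTERRUPT_LATENCY - best)
    print("worst-case margin for polling:", INTERRUPT_LATENCY - worst)
```

With these numbers, polling is 1 cycle ahead in the best case and 2 cycles behind in the worst case, so neither approach is a clear win on response time alone.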
So either way there are a few cycles of delay, and given any interrupt latency at all, I think you would still need to run the simulator faster than the user's micro.
Ages ago, we did this very thing on a Raytheon 500-based system. Instead of a micro, we built a Mealy-Moore machine. It was so much faster than the Raytheon bus that none of this was an issue.