Also, often the need for low latency is only to do one tiny, simple operation, like writing a single GPIO pin, after which the rest of the handler is not that latency critical anymore. In that case you can split the register stacking into two parts: the first part stacks zero registers (or just one) and does the critical operation, and only then does the second part stack whatever the rest of the ISR needs. I don't know if you can get a C compiler to do this for you, so you may need to write the whole ISR in assembly.