Writing cycle-accurate asm and manually interleaving all code between timed events is possible, but it is extremely time-consuming and a nightmare to maintain whenever you want to change anything. Using a somewhat more modern CPU core from this millennium would give you not only cheaper parts with better availability, but also things like prioritized interrupts and enough clock speed to overcome interrupt latency, which makes firmware development a lot easier. For example, I have real power + RMS current + power factor measurement (and much more) running on a Cortex-M3 @ 64 MHz. ADC DMA completion triggers a high-priority interrupt (@ 4 kHz or so), which processes the samples. Once enough samples have been processed at the end of a 3-second integration period, that interrupt triggers a lower-priority software interrupt, which has the full 3 seconds to do anything it wishes while the higher-priority interrupts keep running and preempt it as needed. The rest of the CPU time can be used for anything else, e.g. UI, either in the main loop (which is still completely free) or in other interrupts.