>ARM MCU
>UART
This combination is almost never a problem. A UART is almost always significantly slower than the CPU. Even with no FIFO at all, you can easily afford the overhead of taking an interrupt for every byte.
The usual mistake is to overestimate the interrupt latency or the processing cost. That can be a real concern on application processors, but not on microcontrollers.
As a rule of thumb, you can do "a few simple things" - a sanity check, a bounds check, a write to a buffer, maybe a few calculations - in 50-100 CPU clock cycles, including ISR entry and exit overhead.
A typical UART cannot clock faster than f_peripheral/8 because of its 8x oversampling, and each frame of 10 bit times (start - 8 data - stop) requires at most one interrupt. So even if f_peripheral equals f_cpu, and even if you run the UART at the maximum possible speed, you have 8 x 10 = 80 clock cycles to process each byte. In practice, a UART is rarely run anywhere near that theoretical maximum.