I try to follow:
You are copying the relevant info to RAM in binary format (efficient) on each ISR() call, and in the superloop you (slowly) dump it, but only when your RAM buffer is full.
By doing so (having a blocking dump) you may (and probably will) clog your superloop, to the point that you have probably moved all the rest of the code to ISRs.
My projects tend to have, especially on ARM Cortex (because it has such a good and simple-to-use interrupt priority system, including software interrupts that let a high-priority task continue as a low-priority task), either a completely empty "superloop", or a superloop which does only non-timing-critical things, such as blocking networking code or UART debug printing.
Yeah, the rest of the code is in ISRs. That works very well. It is like event-driven programming. (Really the main difference from using a threaded RTOS is that, with simple interrupts, a task always starts executing at the beginning of its function, while RTOS threads can have longer functions that wait for other signals and then continue execution. In a bare-metal (no OS) project, you only have ONE thread capable of that behavior (what you call the superloop). Sometimes I take advantage of that; sometimes not.) Interrupt-driven bare metal works very well in projects where you don't have dependencies on poorly designed, blocking libraries.
Besides, a blocking UART print has completely predictable timing.
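To make the "continue a high-priority task as a low-priority task" idea above concrete, here is a minimal sketch that runs on a PC. All names (adc_isr_high, processing_isr_low, pend_low_priority, run_pending_low_priority) are made up for illustration. The pending flag is only a stand-in: on a real Cortex-M part you would typically pend a spare interrupt configured at low priority, e.g. with CMSIS NVIC_SetPendingIRQ(), and the hardware would call the low-priority handler for you.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the interrupt controller's "pending" bit (assumption:
   real hardware would use a spare IRQ set to a low priority). */
static volatile bool low_prio_pending = false;

static volatile int32_t latest_sample = 0;
static volatile int32_t filtered = 0;

/* Hypothetical helper: request the low-priority handler to run later. */
static void pend_low_priority(void)
{
    low_prio_pending = true;
}

/* High priority: do only the urgent part, then hand the rest over. */
void adc_isr_high(int32_t raw)
{
    latest_sample = raw;      /* time-critical: grab the data */
    pend_low_priority();      /* the slow part continues at low priority */
}

/* Low priority: slower processing that the handler above may preempt. */
void processing_isr_low(void)
{
    filtered += (latest_sample - filtered) / 4;   /* simple low-pass filter */
}

/* PC-side replacement for the hardware dispatch: runs the low-priority
   handler once if it is pending. Returns true if it ran. */
bool run_pending_low_priority(void)
{
    if (!low_prio_pending)
        return false;
    low_prio_pending = false;
    processing_isr_low();
    return true;
}

/* Accessor so other code can read the result. */
int32_t get_filtered(void)
{
    return filtered;
}
```

Note that both handlers start at the top of their function every time they run, which is exactly the difference from an RTOS thread described above.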
Why do you wait for the buffer to be full before you start dumping? To simplify (and speed up) things without the circular-buffer hassle?
That was just a made-up example which I wrote in 3 minutes in the reply window, to communicate the idea with code instead of a wall of text. Obviously you choose exactly what you want to do. For periodic interrupts (like control loops), checking for a "trace full" condition is good because the buffer fills up quickly anyway, and if you have too much data, you just ignore the rest.
Also, I completely miss the role of trigger_condition :-(
I mean something like a pushbutton "start recording", or maybe trigger it when the motor turns on or some voltage level is exceeded, or maybe even leave that code in production and trigger on an error condition. What I show is, after all, pretty much equivalent to what storage oscilloscopes do, but with insight into internal state variables if you so wish.
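For reference, here is the triggered "record until full, then dump" idea as a self-contained sketch that compiles on a PC. It is not the original example from earlier in the thread, just one way to write it. The record fields, TRACE_LEN and all function names are assumptions picked for illustration, and printf() stands in for the blocking UART print.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TRACE_LEN 64            /* records; size it to the RAM you can spare */

/* One binary record per ISR call. The fields are made-up examples. */
typedef struct {
    uint16_t adc_raw;
    int16_t  pid_error;
    uint16_t pwm_duty;
} trace_rec_t;

static trace_rec_t trace[TRACE_LEN];
static volatile size_t trace_idx = 0;
static volatile bool trace_recording = false;
static volatile bool trace_full = false;

/* The trigger condition: call this from a pushbutton handler, when the
   motor turns on, on an error, etc. Ignored while a trace is being
   recorded or is still waiting to be dumped. */
void trace_trigger(void)
{
    if (!trace_recording && !trace_full) {
        trace_idx = 0;
        trace_recording = true;
    }
}

/* Periodic ISR (e.g. a control loop). The tracing part is cheap:
   a flag check and three stores. */
void control_isr(uint16_t adc_raw, int16_t pid_error, uint16_t pwm_duty)
{
    /* ... the actual control code would go here ... */
    if (trace_recording) {
        size_t i = trace_idx;
        trace[i].adc_raw = adc_raw;
        trace[i].pid_error = pid_error;
        trace[i].pwm_duty = pwm_duty;
        if (++i >= TRACE_LEN) {
            trace_recording = false;   /* too much data: ignore the rest */
            trace_full = true;
        }
        trace_idx = i;
    }
}

/* Superloop side: slow, blocking text dump, only once the buffer is full.
   Returns the number of records printed. */
size_t trace_dump_if_full(void)
{
    if (!trace_full)
        return 0;
    for (size_t i = 0; i < TRACE_LEN; i++)
        printf("%u %d %u\n", (unsigned)trace[i].adc_raw,
               (int)trace[i].pid_error, (unsigned)trace[i].pwm_duty);
    trace_full = false;                /* ready for the next trigger */
    return TRACE_LEN;
}
```

In the superloop you would just call trace_dump_if_full() on every pass. Since the ISR stops writing once the buffer is full, the dump can take as long as it likes without any circular-buffer bookkeeping.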
The only other (possibly even better-performing) option really is a hardware trace monitor system (like Segger RTT and friends). The massive advantage of your own code instrumentation is that it works in production, over the air, whatever. You don't need to plug in a debugger and hope to catch the event on the lab table!
And the performance gain of binary data collection over printf() is probably nearly two orders of magnitude, and the same can be said about RAM usage. You get a much longer trace in the same amount of RAM when you don't need to store repeated text labels, separators, or inefficient decimal numbers.