A few points for a painless process. With experience, any of these can be violated for a good reason, but I wouldn't recommend it.
1) Don't try to push microcontrollers to do literally cycle-accurate things in software. That's not their intended purpose.
Peripherals exist for this purpose, so take a careful look at which peripherals you need, including a sanity check that everything you need is actually available at the same time (pin mapping, and DMA channel mapping if that applies). Sometimes an external IC is what you need; the problem has likely already been solved for you.
Sometimes you cannot solve the cycle-accurate problem with on-chip peripherals; then look at microcontrollers that provide programmable glue logic, or go for an external CPLD. Or why not an XMOS product.
2) Don't confuse understanding timing constraints with being cycle-accurate. They are two different things, in completely different leagues.
Often the real-world specification is something like "do this thing within 1 us of a trigger signal" and/or "calculate thing A every 10 us with at most 2 us of jitter, and thing B every 100 us with at most 20 us of jitter". Neither requires cycle-accurate timing, not even close. It's enough to give interrupt A higher priority than interrupt B, make sure interrupt A completes in a few microseconds, and never disable interrupts for more than about 1 us, which is trivial if you follow the sane practice of only disabling interrupts for short atomic operations, typically a few cycles.
Note that on a typical mid-cost ARM MCU running at around 100 MHz, the interrupt latency is around 120 ns, with perhaps 50 ns of jitter. For example, if entry to a lower-priority interrupt is preempted by a higher-priority one, the CPU is "wise" enough to reuse the stack-push work already done, lowering the latency in that particular case and increasing the jitter. Some will argue it's a good feature, others will see it as a bad one. But the jitter is still in the tens of nanoseconds! Does that matter? I have yet to see a case where it does; all high-speed communication is standardized and runs through the peripherals. Peripherals designed to handle high-priority, high-accuracy safety signals, such as motor-controller peripherals, implement their own low-latency, low-jitter input channels directly into the peripheral (like a motor controller / power bridge overcurrent signal).
3) Leave margin. A lot of margin. The less margin you have, the closer to cycle-accurate your analysis needs to be, and the slower and more demanding the design becomes.
But CPU performance is cheap! An MCU that costs $1 more may save you $10000 in development, so you need to sell more than 10000 units before the cheaper part breaks even, and that only counts the direct employment cost. In the real world you see non-optimized solutions (i.e., "excess" CPU performance) even in large-scale mass products, because relying on optimization, with no option of adding software features later without a PCB respin, also stretches the calendar time in ways that a larger engineering budget cannot always fix; the product launch cannot wait. Hence, they throw more CPU power at it from the start, even if it costs more in components.
Margin allows you to leave minor details out of your timing estimate. Margin saves the day when you forgot about something you need to cover and have to add new, unexpected code.
4) Do actual projects to gain experience. With some experience, you can hand-wave a timing estimate accurate to, say, -50%/+100% in minutes. That's valuable: upon seeing the expected specification, you can instantly give the project a red or green (or maybe yellow) light, and estimate whether you need a $1 MCU, a $100 FPGA, or maybe a $10 application CPU to pull it off.
5) When #3 fails you, or more likely, you fail to follow #3: save an actual project with low-level optimization once or twice, so you learn that there usually are ways out of trouble even when you run out of margin. Don't make it a habit, though; you will likely miss the deadline, and regularly needing low-level manual optimization says something about your margins.