Typically, after months of work, you have something that kind of works, but you have locked yourself into a design that is hard to modify, so you decide to scrap the code and start from scratch, since you now have a better understanding of the requirements, the limitations of the microcontroller, and the hardware you are interfacing with.
Even with poorly defined requirements, it's possible to write code that supports iterative development, for example by avoiding excessive coupling between functional units. A lot of it comes with experience, especially experience writing reusable libraries. You learn how to localize dependencies, how to decide whether those dependencies should be resolved at link time or dynamically (via callbacks, pointers to driver functions, or OOP), and, in either case, how to keep the interface between functional units clean and maintainable. You'll still occasionally run into situations you didn't anticipate in a library, but you revise it and move on. It's a good idea to write anything you are likely to reuse as a library: you then have that much less to write next time, and you can get a basic implementation of an idea up and running with minimal investment.
If you're running into hardware issues, perhaps because it isn't clear how certain hardware features interact, well, that's what prototypes are for. Cheap dev boards are great, as long as you can restrict your testing to functionality the dev boards support. Sometimes that isn't possible, and you'll need to design a plug-in board for a dev board, or even a minimal custom PCB. Sometimes you still run into issues and have to re-spin a production board; you do your best to avoid that, but it happens.
As for the actual edit/compile/upload/test cycle: you can speed up the editing phase with a good IDE and a well-designed application, and the compile phase with a well-organized project (keeping the dependency tree simple so that each changed file forces as little recompilation as possible). The upload phase really depends on the debug tool and the memory size (J-Link probes are noticeably faster than ST-Links, in my experience). The test phase comes back to a well-designed application where side effects are minimized and the application state is clearly understood.