Check out SysTick for Cortex-M processors.
Yes, it's the same thing I mentioned, but for ARM processors: it returns the clock ticks elapsed. But you have to know the frequency of your CPU, and sample a well-known oscillator that can give you a second's worth of ticks you can rely on for the rest of the program.
If you don't care about actually keeping time, then that is fine. But if you are going to use it to interface with other devices or humans, I would recommend being able to accurately determine at least milliseconds without drift (so not discarding the extra ticks, but carrying them over to the next loop).
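For reference, getting that millisecond timebase on a Cortex-M usually comes down to a few lines with CMSIS. This is only a sketch: it assumes SystemCoreClock actually matches your clock configuration, the header is whatever your particular device uses, and g_ticks_ms / systick_get() are my own names, not a standard API.

#include <stdint.h>
#include "stm32f4xx.h"               /* assumption: pick the CMSIS header for your device */

static volatile uint32_t g_ticks_ms; /* free-running millisecond counter */

void SysTick_Handler(void)           /* standard CMSIS SysTick interrupt handler name */
{
    g_ticks_ms++;
}

void timebase_init(void)
{
    /* fire the SysTick interrupt every SystemCoreClock / 1000 core cycles, i.e. every 1 ms */
    SysTick_Config(SystemCoreClock / 1000U);
}

uint32_t systick_get(void)           /* read the counter; same helper name used in the delay snippet below */
{
    return g_ticks_ms;
}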
That's easily implemented. Here is one example:
uint32_t time0 = systick_get();                             // obtain the current tick count
while (systick_get() - time0 < desired_duration) continue;  // busy-wait for the desired duration to pass
                                                            // (unsigned subtraction also survives counter wraparound)
// do your things.
You can implement it, I am sure, a gazillion different ways too. But the basic idea is to have one free-running timer at all times.
But that doesn't keep a continuous clock, just delta times. Actually, just delta ticks.
Having a master clock that doesn't drift, and from which you can determine the milliseconds since start at any given time, would be a better approach.
Say systick_get() - time0 comes out 3 ticks over desired_duration when the loop finally exits: you have then drifted by those 3 extra ticks.
With your approach you are really just delaying until the next frame, which is fine for applications that need to keep a desired frame time, say 120 Hz per program loop, and then eat up the extra cycles before the next frame. But you have to keep the running clock to prevent the tick drift, even if it's probably no more than a few ticks per loop.
Edit: but if it were about keeping a constant frame time, I would do the work first and then figure out how many ticks I still have to delay by, instead of delaying first and then doing the work.
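Something like this is the shape I have in mind, as a rough sketch only (FRAME_TICKS, do_work() and systick_get() are placeholder names): do the frame's work, then wait out the rest of the frame against a deadline advanced from the previous deadline rather than from "now", so any extra ticks carry over instead of being thrown away.

#include <stdint.h>

#define FRAME_TICKS 8U              /* e.g. ~8 ms per loop at a 1 ms tick, roughly 120 Hz */

extern uint32_t systick_get(void);  /* the free-running counter from above */
extern void do_work(void);          /* placeholder for the real per-frame work */

void frame_loop(void)
{
    uint32_t next_deadline = systick_get() + FRAME_TICKS;

    for (;;) {
        do_work();                  /* work first... */

        /* ...then burn whatever ticks are left until the absolute deadline;
           the signed compare handles counter wraparound for deadlines
           within half the counter range */
        while ((int32_t)(next_deadline - systick_get()) > 0)
            continue;

        /* advance from the previous deadline, not from "now", so overshoot
           ticks are kept and cancelled out over the following frames */
        next_deadline += FRAME_TICKS;
    }
}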
And I'm all for having one free-running timer at all times; that's why I suggested it in the first place.