It's not a matter of calculating whether something is within the limit; it's not so simple. You have a decision variable that is a true hidden state.
I mean, if you close your eyes and ignore the accumulator in the middle of the algorithm, maybe.
Therefore it's a matter of having a floating-point error that changes at every step, while the algorithm needs to take a branch according to the decision variable, which is not predictable at each step.
How is it not predictable? It is the remainder of a division operation. These things aren't new by any means -- division, factoring and LCM/GCD algorithms were known to the ancients (and probably not even invented by them, that is, by the ones who left written history).
The ingenuity that went into them is probably harder to understand, though... many of these things, you look at them and it seems as if they're a gift from the heavens themselves.
Myself personally: I probably developed some of my insight on this subject working with raycasters (i.e., the pseudo-3D technique famous from Wolfenstein 3D and such). In this case, you must calculate not only the grid locations touched by a given ray, but the exact sub-pixel of each grid line intersection (which is used for the wall texture coordinate). Some linear algebra takes care of perspective correction (a linear sequence of vectors is rotated by the viewing angle, and the iteration proceeds according to the X and Y components of the vector), and then you have it.
It's a problem I come back to every couple of years (or, it seems, closer to a decade now). My first version was the naive add-a-fixed-distance-and-test method, which is very inefficient and coarse. Then I learned the slope-intercept method, which involves a multiplication per step and blows up in special cases (i.e., lines parallel to the axes). Finally, I developed the dual-accumulator method (I forget whether that's also what Carmack used back then, or whether I looked at the WOLF3D source and got it from there...), which uses two divisions to set up the calculation, then only conditional increment and addition in the inner loop. At the end, the accumulator residues are converted to the texture coordinate.
For your case, the endpoints are known, which are the vector components fed to the accumulators. At any point along the line, the accumulator values can be read and converted to the sub-pixel coordinate, in terms of X or Y intercept, however you like.
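To make the dual-accumulator idea concrete, here's a minimal sketch (the naming and structure are mine, not lifted from WOLF3D or any other source): each accumulator holds the scaled ray parameter of the next grid-line crossing on its axis, so deciding which axis steps next is a single compare, and the inner loop contains only adds. Any divisions live in the setup or in the residue readout at the end.

```c
#include <assert.h>

/* Dual-accumulator grid walk for a ray from (0,0) to (dx,dy), dx, dy > 0.
 * xAcc holds (i+1)*dy and yAcc holds (j+1)*dx: comparing them compares
 * the ray parameter of the next vertical vs. next horizontal crossing,
 * scaled by dx*dy, with no division in the loop.
 * Returns the number of grid cells visited (= dx + dy - gcd(dx, dy)). */
static int gridWalk(int dx, int dy)
{
    int i = 0, j = 0, cells = 1;    /* the starting cell (0,0) counts */
    long xAcc = (long)dy;           /* scaled parameter of crossing x = 1 */
    long yAcc = (long)dx;           /* scaled parameter of crossing y = 1 */

    while (i < dx - 1 || j < dy - 1) {
        if (xAcc == yAcc) {         /* exact corner: step both axes */
            i++;  xAcc += dy;
            j++;  yAcc += dx;
        } else if (xAcc < yAcc) {   /* vertical grid line comes first */
            i++;  xAcc += dy;
        } else {                    /* horizontal grid line comes first */
            j++;  yAcc += dx;
        }
        cells++;
    }
    return cells;
}
```

At any point in the loop, the residues xAcc - yAcc (or either accumulator taken modulo dx or dy) encode where along the grid line the crossing happened, which is what you'd convert to the sub-pixel/texture coordinate.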
(If you're interested, I've got a public example here:
https://github.com/T3sl4co1l/raycast_win/blob/master/main.cpp , written in floating point since a PC doesn't care, but it works equally well (with suitable adaptations of course) in fixed point. Also, this isn't textured, so it doesn't have the coordinate transformation step, sorry about that.)
As you see, the next point's x and y depend on "x += sx;" and "y += sy;", which depend on the decision variable "magic", which depends on the floating-point error, which may change at each step ("err += Dy;", "err += Dx;"). It's difficult to predict the value of the next step, because if you try to unroll this loop in order to calculate the error at each step, you get a recursive function.
It can be demonstrated, even mathematically, as Zeda did.
If you want to know it for a specific point, remainderAtX(Dx, Dy, AtX), then you have no choice but to calculate it the hard way, i.e., by division. That division can be performed iteratively, or by any of the various numerical algorithms known. It doesn't much matter, as long as the results are equivalent (as they must be).
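That is, under the usual setup (line from (0,0) to (dx,dy) with dx >= dy > 0), the error term the iterative loop carries along is just the remainder of a division, so you can evaluate it at any column in one shot. A sketch, using the remainderAtX name from above (the closed form is the obvious one, not taken from any particular implementation):

```c
#include <assert.h>

/* Non-iterative evaluation of the line state at column atX, for a line
 * from (0,0) to (dx,dy) with dx >= dy > 0 and 0 <= atX <= dx.
 * The ideal y is atX*dy/dx; the fractional part that the iterative
 * error accumulator tracks is the division remainder.
 * (Watch for overflow of atX*dy with large inputs.) */
static long remainderAtX(long dx, long dy, long atX)
{
    return (atX * dy) % dx;
}

static long yAtX(long dx, long dy, long atX)
{
    return (atX * dy) / dx;
}
```

Running the step-by-step loop to column atX necessarily lands on the same values; the closed form is just the division done all at once instead of by repeated subtraction.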
The loop can be unrolled, by using C-style logic operators. That is (e.g.):
int condition = xAccum > yAccum; // 1 if true, 0 else
xAccum += Dx * condition;
yAccum += Dy * (condition - 1);
xAccum -= Dy * (condition - 1);
yAccum -= Dx * condition;
On many machines, this will compile to a conditional anyway (i.e., when the only nonlinear operation available is a test-and-branch sequence), but some have an instruction that can perform it without the pipeline hit: a sign extend, or a proper C-style condition-bits-to-register instruction.
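As a sanity check, the branchless step above can be wrapped into a function and compared against the plain if/else it replaces; the two are step-for-step identical. (A sketch only: the update rule is copied from the snippet above, and the function names are mine.)

```c
#include <assert.h>

/* Branchless form: the comparison result (1 or 0) multiplies the updates,
 * so every statement executes unconditionally. */
static void stepBranchless(int *xAccum, int *yAccum, int Dx, int Dy)
{
    int condition = *xAccum > *yAccum;  /* 1 if true, 0 else */
    *xAccum += Dx * condition;
    *yAccum += Dy * (condition - 1);    /* (condition - 1) is 0 or -1 */
    *xAccum -= Dy * (condition - 1);
    *yAccum -= Dx * condition;
}

/* Equivalent branched form, for comparison. */
static void stepBranched(int *xAccum, int *yAccum, int Dx, int Dy)
{
    if (*xAccum > *yAccum) { *xAccum += Dx; *yAccum -= Dx; }
    else                   { *xAccum += Dy; *yAccum -= Dy; }
}
```

With condition = 1 the (condition - 1) terms vanish and the Dx terms apply; with condition = 0 the reverse. So whichever path the data takes, both functions perform the same net update.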
Or, on still other machines: multiplication and division may be practically free, compared to, say, IO bandwidth limitations. Vector machines -- GPUs and such -- fit here. Example:
https://youtu.be/bIjrSvGddDQ

Tim