My ramblings are not kleins'; I'm just trying to infer how each part of this works.

My understanding is this: let's say you feed in 7V. Feeding it through a fixed resistor gives a current, i.e. coulombs per second, in this case 70uA for 20ms. Multiplying out the time component, that is 1.4 microcoulombs of charge.

You need to cancel out that amount of charge to get the integrator back to the switching point of the comparator. To do that, you have 2 other voltages you can feed through a resistor.

In our current case those are 13.4V (2.68 microcoulombs) and -12.2V (-2.44 microcoulombs), each figured over the same 20ms. It may be easier to think of these as 2 lines of fixed slope.
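As a quick check of the charge figures above, here is a minimal sketch. The 100k resistor is my assumption, inferred from 7V giving 70uA, and the names `R_OHMS`, `T_SECONDS` and `charge_uc` are just illustrative.

```python
# Assumed values: R is inferred from the post's 7 V -> 70 uA; names are illustrative.
R_OHMS = 100e3
T_SECONDS = 20e-3


def charge_uc(volts, r_ohms=R_OHMS, t_seconds=T_SECONDS):
    """Charge in microcoulombs pushed through r_ohms by volts over t_seconds."""
    current_a = volts / r_ohms          # Ohm's law: amps are coulombs per second
    return current_a * t_seconds * 1e6  # multiply by time, then scale to micro


for label, volts in [("input", 7.0), ("ref +", 13.4), ("ref -", -12.2)]:
    print(f"{label}: {charge_uc(volts):+.2f} uC")
```

This reproduces the 1.40, 2.68 and -2.44 microcoulomb numbers, assuming each voltage is applied through the same resistor for the full 20ms.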

You're trying to measure, as best you can, the ratio of these 2 that cancels out the charge while still adding up to 100%, e.g. 40% negative and 60% positive. There is only 1 point where that can happen. You can see this in the modulation pattern if you imagine first using all the positive bits, then all the negative bits.
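To show that there really is only 1 such point, here is a rough sketch that solves for it. It assumes the input and both references feed the integrator through equal resistors with the same sign convention (a real circuit may invert one of them), and `positive_fraction` is a made-up helper name.

```python
def positive_fraction(v_in, v_pos, v_neg):
    """Fraction p of the run spent on v_pos (the rest on v_neg) for zero net charge.

    Assumes equal resistors, so charge is proportional to voltage * time:
        v_in + p * v_pos + (1 - p) * v_neg = 0
    """
    p = (-v_in - v_neg) / (v_pos - v_neg)
    if not 0.0 <= p <= 1.0:
        raise ValueError("input is outside the range the references can cancel")
    return p


p = positive_fraction(7.0, 13.4, -12.2)
print(f"positive: {p:.2%}, negative: {1 - p:.2%}")
```

With the voltages from this post that lands at roughly 20.3% positive and 79.7% negative. The equation is linear in `p`, which is why there is exactly one solution.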

The pattern is mainly there to ensure at least 1 step of the opposite reference every now and then, for charge injection reasons (or so I assume), but it could also be there to deliberately make the integrator go a bit further from 0.

So let's say you only counted the modulation patterns, a total of 1000 for a 20 step pattern. You can think of it like a triangle with a hypotenuse of 1000, with the +Count and -Count representing the contribution of each reference. That duty cycle is a fixed number after your conversion, but since it will rarely cancel perfectly with only those 1000 pattern steps, you are left with some residue. In the example image, you would use this residue to extract extra information, and move the true duty cycle point to correct out that residue.
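Here is a sketch of how I picture the residue correction, under simplifying assumptions: equal resistors, equal-length steps, and the residue expressed in "volt-steps" (voltage times number of steps, which is proportional to charge). The counts 203 and 797 and both function names are hypothetical, not taken from the actual design.

```python
def residue_after(v_in, n_pos, n_neg, v_pos, v_neg):
    """Leftover integrator charge, in volt-steps, after n_pos + n_neg steps."""
    return (n_pos + n_neg) * v_in + n_pos * v_pos + n_neg * v_neg


def refined_positive_fraction(n_pos, n_neg, residue, v_pos, v_neg):
    """Coarse duty cycle from the counts, shifted by what the residue implies."""
    n_total = n_pos + n_neg
    coarse = n_pos / n_total
    # One whole step swapped from negative to positive changes the charge by
    # (v_pos - v_neg), so the residue is worth this fraction of the run:
    return coarse - residue / (n_total * (v_pos - v_neg))


n_pos, n_neg = 203, 797  # hypothetical counts, 1000 steps in total
residue = residue_after(7.0, n_pos, n_neg, 13.4, -12.2)
refined = refined_positive_fraction(n_pos, n_neg, residue, 13.4, -12.2)
print(f"coarse {n_pos / 1000:.6f}, residue {residue:+.3f}, refined {refined:.6f}")
```

The counts alone give 0.203, and the residue of about -3.2 volt-steps nudges that to about 0.203125, which is the exact balance point for 7V in this simplified model.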

I was trying to determine this point by using the zero crossings to sum up the ratio of time above 0 (an excess of charge has been removed) and below 0 (there is still charge to be removed), to extract more information out of the pattern on top of the residue, in a sense trying to make it behave like a much faster modulation frequency. I'm still unwrapping the math on it, as it's not as clear to imagine as the residue example. The best picture would be capturing lots of smaller triangles and using them to reduce the uncertainty of the total count / residue.

The PWM approach I was describing above could be used as a true PWM, by changing the ratio of positive and negative per pattern. In reality, though, the steeper the slopes those references contribute, the more information can be extracted from the residue. If you imagine the triangle being a lot taller, moving the duty cycle to the left to remove the residue moves the duty cycle point less, with the most usable information extracted when that residue contributes only about 1-2 counts.
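A small sketch of the "taller triangle" point, again assuming equal resistors and a residue measured in volt-steps (voltage times number of steps). `duty_shift` is a made-up helper and the residue of 3.2 is only an illustrative number.

```python
def duty_shift(residue, n_steps, v_pos, v_neg):
    """How far the duty cycle point must move to absorb a given residue."""
    return residue / (n_steps * (v_pos - v_neg))


# Same residue, same step count, references scaled to make the slopes steeper.
for scale in (0.5, 1.0, 2.0):
    shift = duty_shift(3.2, 1000, 13.4 * scale, -12.2 * scale)
    print(f"reference scale x{scale}: duty cycle moves by {shift:.6f}")
```

In this simplified model the shift is inversely proportional to the spread between the two references, so doubling both references halves how far the duty cycle point moves for the same residue.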