The only issue I was seeing is that if I want to output 100 Hz with a 2 kHz PWM, that's 20 different PWM values per cycle; if I want 100 Hz at 100 kHz, that's 1000 values. I was thinking that's a lot of interpolation that needs to occur, and thought it would distort my output.
If you're worried about distortion, well, the great thing is you can run through all the numbers ahead of time, in a spreadsheet or whatever, and not wonder at all. You can prove it. Not to mention you have much greater analytical tools at hand than you may have in the lab: run an FFT on the series? No problem! Error from ideal (sin)? Easily done. Simulate inductor ripple (say, with the aim of compensating for its centerline value)? Can do! Anything else, done practically, means you have to build the whole damn circuit, and program the system, AND fix all the bugs that you put in that also made it blow up multiple times, oh, and replace the components that died in the process -- and why make so much work for yourself?!
And there are other ways to interpolate:
First off, the naive interpolation takes a set of (x, y) points, performs a division (slow!) by the spacing between points to get the slope, then multiplies by the offset into the segment to get the correction -- plus a bit of arithmetic to stitch the pieces together.
Well, if the table is fixed spacing, the division can be reduced to a constant. That speeds things up greatly. (Note that division by a constant is equivalent to multiplication by its reciprocal, give or take some shifting to account for the decimal point.)
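For instance, a rough sketch in C of how that looks (the table name and sizes here are just for illustration): with a power-of-two table spacing, the "division" collapses to a shift and a mask.

```c
#include <stdint.h>

/* 257 entries so index+1 never wraps; contents precomputed offline */
static const int16_t sine_table[257] = { 0 /* , ... */ };

int16_t sine_lookup(uint16_t phase)
{
    uint8_t idx  = phase >> 8;      /* top 8 bits: which segment of the table */
    uint8_t frac = phase & 0xFF;    /* low 8 bits: position within the segment */
    int16_t y0 = sine_table[idx];
    int16_t y1 = sine_table[idx + 1];
    /* linear interpolation: y0 + (y1 - y0) * frac / 256; the divide is just a shift */
    return y0 + (int16_t)(((int32_t)(y1 - y0) * frac) >> 8);
}
```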
If you want to minimize the size of the table (say you're short on memory -- note with 32/64kB Flash to spare, you'll have to be quite busy with other functions to be sweating about merely a couple kB!), maybe you store precalculated divisions for arbitrary sized steps, doubling the amount of data to store per point but greatly reducing computational overhead and number of points required (i.e., more points can be clustered closer in regions of rapid change).
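Again just a sketch, with invented breakpoints and Q8.8 slopes precomputed offline: per lookup, that's one short search, one subtract, one multiply and one shift -- no division at run time.

```c
#include <stdint.h>

typedef struct {
    int16_t x;        /* breakpoint input */
    int16_t y;        /* value at the breakpoint */
    int16_t slope_q8; /* (y_next - y) / (x_next - x), Q8.8, precomputed offline */
} seg_t;

static const seg_t segs[] = {
    /* made-up data: more points clustered where the curve changes quickly */
    { 0, 0, 64 }, { 100, 25, 180 }, { 150, 60, 90 }, { 400, 148, 20 },
};
#define NSEGS (sizeof(segs) / sizeof(segs[0]))

int16_t piecewise_eval(int16_t x)
{
    uint8_t i = NSEGS - 1;
    while (i > 0 && segs[i].x > x) i--;   /* linear search; binary search if the table is long */
    return segs[i].y + (int16_t)(((int32_t)segs[i].slope_q8 * (x - segs[i].x)) >> 8);
}
```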
Maybe you choose something more nuanced than a linear interpolation. We can rank types of interpolation by the order of the function used to smooth it.
Zeroth order is simply taking the nearest point. Pretty gross, right? Well, think of it as the rounding error of a conversion table. It's a resolution thing; you're always giving up something. You could always pick more points -- potentially as many as you have possible inputs, which isn't even bad if you had, say, an 8-bit phase accumulator here.
First order is linear: we take the extra bits (maybe our lookup table is 8-bit (256 elements) but our phase accumulator is 16-bit, so we take the low byte as the remainder) and use them to interpolate between adjacent elements. Nice, but if the curve is supposed to be, well, curved in this spot, we're missing a lot; the line segment has to cross through that arc segment in order to best-fit it, and it can only do that at two points (or very rarely, three or more, where the curve doubles back on itself).
Second order is quadratic: we record a value for that curvature, and do a quadratic spline interpolation.
Third order is cubic: we record not just the curvature, but the curvature's curvature, as it were.
And so on. We can generalize this to any order polynomial; we merely need to store the start/end points of each segment, and the curvature rates (coefficients) in each segment -- see the sketch below.
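In code, evaluating one such segment is just a handful of multiply-adds via Horner's rule. A rough sketch (placeholder coefficient layout, float used for clarity; fixed point works the same way, more on that below):

```c
typedef struct {
    float c[4];  /* cubic segment: c[3]*t^3 + c[2]*t^2 + c[1]*t + c[0], t = 0..1 within the segment */
} cubic_seg_t;

/* evaluate one segment at fractional position t (0..1), Horner's rule: ((c3*t + c2)*t + c1)*t + c0 */
static float cubic_eval(const cubic_seg_t *s, float t)
{
    return ((s->c[3] * t + s->c[2]) * t + s->c[1]) * t + s->c[0];
}
```

Higher order means a couple more multiplies per evaluation but far fewer segments to store -- the usual space/time trade.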
In fact, if we have a smooth enough function, we might not need to divide it into any points at all. We can trivially compute the squaring function f(x) = x^2 by just, well, computing the square. A quadratic spline will fit any range of this curve perfectly; they're equivalent for this purpose, and the same is true of other locally-quadratic-like curves.
Maybe we need a couple points yet, or a couple higher-order terms, to get a better fit. A circle for example, has the form f(x) = sqrt(r^2 - x^2). sqrt() is a heavy-weight function, but we know splines fit that shape very well (indeed, perfectly, for parametric quadratic splines, i.e. {x(t), y(t)}; the explicit function y = f(x) has a harder time, however).
As it happens, sin(x) has the form: x - x^3 / 3! + x^5 / 5! - x^7 / 7! + ..., and since the factorial denominator increases quite quickly (compared to small x), we know this series converges quite quickly. If we're only interested in the range from say -pi to pi, we might only need a handful of terms to solve it -- even for bit-exact accuracy!
Which, in fact, is exactly what most floating point libraries do -- as far as I know. Floats (i.e., ~24 bits of accuracy) might need 5 terms or so.
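As a sketch of that (just the plain Taylor coefficients here, not a tuned fit): over roughly -pi/2..+pi/2 the truncation error is a few ppm; out toward ±pi it grows to roughly 0.7%, so in practice you'd fold the argument into the first quadrant using sine's symmetry, or add a term, or tweak the coefficients as discussed below.

```c
/* 5-term truncated Taylor series for sin(x), evaluated by Horner's rule in x^2 */
static float sin_poly(float x)
{
    float x2 = x * x;
    /* x - x^3/3! + x^5/5! - x^7/7! + x^9/9! */
    return x * (1.0f + x2 * (-1.0f/6 + x2 * (1.0f/120 + x2 * (-1.0f/5040 + x2 * (1.0f/362880)))));
}
```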
But if you're only feeding say a 16-bit DAC, do you really need the time taken to compute those higher terms?
Or, for a 100kHz PWM counter clocked at even 100MHz, that's still only 10 bits of timing accuracy (granted, some timers support vernier or extended-precision timing features), so it seems unlikely you need anywhere close to even half the accuracy afforded by full floating point precision!
So, there is advantage to be gained by truncating the calculation earlier, taking matters into your own hands.
Also, as it happens, when the sin(x) Taylor series is truncated, the coefficients can be tweaked a bit to achieve a best-fit condition. (Taylor's theorem merely equates a function to a polynomial series plus an error term; only if that error term decreases sufficiently fast with the number of terms will the truncated series serve as an effective approximation, and at that, with error given by the value of that error term!) When you're computing values from approximations like this, the best idea is to just plug everything into a spreadsheet, or MATLAB/Octave, or Python, or whatever programming/mathing environment you prefer, and let a solver (root-finder algorithm) search for optimal values.
Anyway, the value of polynomials, for embedded platforms, is the cheapness of multiplication and addition, at least on most. Even on AVR, I can calculate a 5th-order correction for, say, a thermistor transfer curve, with guaranteed worst-case error comparable to the ADC (12 bits, ~0.05°C) -- needless to say, greatly outperforming the spec of the thermistor itself (~1%, say). And all this in just 300 cycles or so -- maybe a bit more than you'd want to spend in an ISR (about 15µs on AVR), but also something that can be done, evidently, many thousands of times a second! And obviously that can be greatly improved on a 32-bit platform like STM32, by both the higher clock frequency and the wider, more efficient operations.
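Not the actual code, but the general shape of such a correction looks something like this (coefficients are placeholders, not a real thermistor fit), all in 16x16 -> 32-bit integer math:

```c
#include <stdint.h>

/* Q1.15 coefficients of a 5th-order fit, highest order first -- placeholders here;
 * a real set comes from a least-squares or minimax fit done offline.  The fit is
 * assumed scaled so intermediate values stay within the Q1.15 range. */
static const int16_t coef[6] = { 1234, -5678, 9012, -3456, 7890, -123 };

/* x: ADC reading rescaled to Q1.15 (-1..+1); returns the corrected value in Q1.15 */
int16_t poly_correct(int16_t x)
{
    int32_t acc = coef[0];
    for (uint8_t i = 1; i < 6; i++) {
        /* Horner step: Q1.15 * Q1.15 -> Q2.30, shift back to Q1.15, add next coefficient */
        acc = ((acc * (int32_t)x) >> 15) + coef[i];
    }
    return (int16_t)acc;
}
```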
For reference, earlier this year I did a digital control on AVR, where the ISR computes a new timer value at up to 50kHz. ADC acquisitions are interleaved with timer interrupts, bringing in process values (power supply stuff, so, voltages and currents, also interleaved with housekeeping stuff like temperature measurements). The control consists of basic DSP operations: low-pass filters and a PID loop. It's pretty tight on CPU cycles at max frequency (~80% used), but as you can see, even for such a humble platform, real practical DSP work can be done! Granted, this is an optimized case (hand ASM), but that degree of optimization will not be necessary on an STM32, at least before going to much higher sample rates.
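Not code from that project, but to give a flavor of the per-sample work involved, here's a generic single-pole low-pass in fixed point -- one multiply and a couple of adds per update, cheap enough to live in an ISR:

```c
#include <stdint.h>

static int32_t lpf_state;  /* filter output, kept with 8 extra fraction bits */

/* y += alpha * (x - y); alpha_q8 is the filter coefficient in Q0.8 (e.g. 26 ~= 0.1) */
int16_t lpf_update(int16_t x, uint8_t alpha_q8)
{
    int16_t y = (int16_t)(lpf_state >> 8);
    lpf_state += (int32_t)alpha_q8 * ((int32_t)x - y);
    return (int16_t)(lpf_state >> 8);
}
```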
The low-hanging fruit is to just reduce work required at all. Avoid floating point (fixed point is absolutely worth understanding), simplify whatever math you can, reduce data flows, and finally only then, look into numerical tricks like these, or the last resort, hand-written ASM.
Tim