Thanks for the replies!
As I said, there is no shortage of relatively cheap devices to do this - I had noticed the PCA96 series line, but thought they were for slower PWM (servos). TI has many options.
I'm actually only after 1-3 channels, and I'm going to order a TLC5940 (stock 12-bit) and hopefully, if I can source one, a TLC5948 (with the newer "ES" PWM) for comparison. As I said, this is partly because I want a nice smooth fade for a single lamp (which could be the only light source in the room, which means flicker can be more noticeable) and partly out of interest as to what "looks" good. On top of this I have some PIC12F1572s I got for this very task a couple of years ago - I'll try the ES/spread algorithm with one LED, and stock 12-bit on the other. The PIC is fast enough to do 12-bit PWM at ~2kHz, so I can play about with the master clock and compare which has more/less flicker.
Mike, I was kind of hoping you'd chime in, after seeing your video on "driving LED matrix displays with an FPGA", which provided food for thought (binary modulation for one). I'm not sure I follow your pseudo code. I was initially going to do something like the following:
-Group 16 8-bit PWM periods to give effectively 12-bit PWM over a "super period". Using 8 bits is easier for data alignment whilst still keeping this relatively fast.
-Use a 16-entry lookup table of 16-bit patterns, where each bit says whether the corresponding one of the 16 PWM duty cycles gets incremented.
-Get the PWM value (12-bit) and divide by 16 (actually shift right 4 times) to get the base duty cycle value for each of the 16 PWM periods.
-Get the remainder (just the lower 4 bits of our 12-bit value) and...
-Use that 4-bit value to look up a 16-bit pattern in the table, then cycle through each duty cycle and increment it if its bit in the pattern is set.
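To make the steps above a bit more concrete, here's a rough C sketch of what I have in mind. The names (`spread_table`, `build_spread_table`, `expand_duty`) are made up for illustration, and I'm assuming the table is generated with a simple error accumulator so the set bits are spread evenly; any table with the right number of bits per entry should work. I've also used 16-bit duty values so that a base of 255 plus 1 doesn't wrap, which real 8-bit PWM hardware would need to handle in its own way.

```c
#include <assert.h>
#include <stdint.h>

#define SUB_PERIODS 16  /* 16 x 8-bit periods = one 12-bit "super period" */

/* Hypothetical spread table: entry r has exactly r bits set.
 * Bit i set means sub-period i gets its duty incremented by 1. */
static uint16_t spread_table[SUB_PERIODS];

/* Build the table with an error accumulator (assumption: even spreading). */
static void build_spread_table(void)
{
    for (uint8_t r = 0; r < SUB_PERIODS; r++) {
        uint16_t pattern = 0;
        uint8_t acc = 0;
        for (uint8_t i = 0; i < SUB_PERIODS; i++) {
            acc += r;
            if (acc >= SUB_PERIODS) {
                acc -= SUB_PERIODS;
                pattern |= (uint16_t)(1u << i);
            }
        }
        spread_table[r] = pattern;
    }
}

/* Expand a 12-bit PWM value into 16 per-period duty values. */
static void expand_duty(uint16_t value12, uint16_t duty[SUB_PERIODS])
{
    assert(value12 <= 0x0FFF);

    uint16_t base = value12 >> 4;                     /* upper 8 bits */
    uint16_t pattern = spread_table[value12 & 0x0F];  /* lower 4 bits */

    for (uint8_t i = 0; i < SUB_PERIODS; i++) {
        duty[i] = base + ((pattern >> i) & 1u);
    }
}
```

Call `build_spread_table()` once at start-up (or make the table a constant), then `expand_duty()` whenever the brightness changes. The 16 duty values always add up to the original 12-bit value.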
This seems like a lot of work just to update a duty cycle value for PWM, but it'll actually be pretty quick, and given that a PWM "super period" will be 16x 8-bit periods, one can easily update faster than the whole PWM period. Even then, because this is for slow fades, I will not be drastically changing the brightness regularly, so whilst the PWM clock should be quite fast, the software doesn't have to be fast at all. The only thing that may be slow is squaring for gamma correction - but most of the devices I use these days have at the very least an 8x8 hardware multiplier.
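For the gamma part, something like this is what I mean by squaring. It's only a sketch: I'm assuming an 8-bit brightness level as the input and a gamma of exactly 2, and the function name `gamma_square` is made up. Note the top of the range comes out at 4064 rather than 4095 with this simple shift.

```c
#include <assert.h>
#include <stdint.h>

/* Map an 8-bit brightness level to a 12-bit PWM value by squaring
 * (gamma = 2, an assumption). A single 8x8 multiply does the work. */
static uint16_t gamma_square(uint8_t level)
{
    uint32_t sq = (uint32_t)level * level;   /* 0 .. 65025 */
    uint16_t out = (uint16_t)(sq >> 4);      /* scale 16 bits down to 12 */

    assert(out <= 0x0FFF);                   /* always fits in 12 bits */
    return out;
}
```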
It seems you're doing something quite similar but "on the fly", that is, adding 1 to each PWM period within the super-period, as and when it is required. Again using the upper 8 bits as the base PWM value, and the lower 4 for the accumulator? That seems way easier and spreads the algorithm out across the whole PWM period (much better for an interrupt-driven implementation). Is this right?
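In other words, if I've read it right, something along these lines, called once per 8-bit PWM period (e.g. from the period interrupt). This is only my guess at what you meant, not your actual code; `frac_acc` and `next_period_duty` are names I've made up, and the return value is 16-bit so that 255 + 1 doesn't wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Running fractional accumulator, kept in the range 0..15. */
static uint8_t frac_acc;

/* Return the duty for the next 8-bit PWM period of a 12-bit value. */
static uint16_t next_period_duty(uint16_t value12)
{
    assert(value12 <= 0x0FFF);

    uint16_t base = value12 >> 4;             /* upper 8 bits */
    frac_acc += (uint8_t)(value12 & 0x0F);    /* lower 4 bits */

    if (frac_acc >= 16) {                     /* carry out of 4 bits */
        frac_acc -= 16;
        return base + 1;                      /* this period gets the +1 */
    }
    return base;
}
```

With a constant input, any 16 consecutive calls return duties that add up to the 12-bit value, so it should give the same average as the lookup-table version without building the whole pattern up front.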