You basically need to manually work out each serial bit position when you load a wide parallel word into the serializer, so that you get that extended 2x/4x/5x output rate from logic running at the slower clock. This is simple when you use, for example, a 2:1 serializer or a DDR output and have an even PWM size (period).
For example, say I want a 65536-step PWM running at 1GHz, but my FPGA logic can only handle a 500MHz counter at most. I would:
Main loop_timer = 65535 -> 0 , -2 per 500MHz clock, begin
Data serializer 2:1 source bit 1 <= PWM_output_val > (loop_timer - 1) ;
bit 0 <= PWM_output_val > (loop_timer - 0) ;
end
You can imagine using a 4:1 serializer by running the loop_timer at 250MHz, -4 per clock and extending the source bit input to source bits 3,2,1,0.
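To sanity-check the idea before touching HDL, here is a minimal behavioral sketch in Python of the loop above, generalized to any N:1 ratio. The function name `serializer_words` and its parameters are my own, and it assumes source bit 0 is the first bit shifted out by the serializer (check your vendor's bit order); it is a model of the scheme, not synthesizable code.

```python
def serializer_words(pwm_val, period=65536, ratio=2):
    """Yield one parallel word (list of `ratio` bits) per slow clock.

    Assumption: word[0] (source bit 0) is transmitted first.
    """
    assert period % ratio == 0, "period must be a multiple of the ratio"
    # loop_timer runs period-1 down to ratio-1, stepping -ratio per slow clock
    for loop_timer in range(period - 1, -1, -ratio):
        # source bit k compares against serial position (loop_timer - k)
        yield [int(pwm_val > (loop_timer - k)) for k in range(ratio)]


def serial_stream(pwm_val, period, ratio):
    """Flatten the parallel words into the bit stream seen on the pin."""
    return [bit for word in serializer_words(pwm_val, period, ratio) for bit in word]
```

Whatever ratio you pick, the flattened stream should come out the same: `pwm_val` high bits out of `period`, with the slow logic only running at 1/ratio of the output rate.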
Or you can do an 8:1 or 16:1 and slow down the speed of your loop_timer and its section of code. You should be able to use 3/10/20GHz transmitters to make an insanely precise PWM.
Non-power-of-2 ratios like 5:1 or 10:1 can also work, but the compiler won't simplify out the logic gates as neatly as it does for powers of 2.
(Oh, don't forget to latch 'PWM_output_val' once every loop around of 'loop_timer', otherwise you will get random garbage in the middle of a period if that reg changes at a bad time.)
Also, flip everything around so that the timer counts up and the bit offsets are added instead of subtracted, to simplify the gate count.
Writing it out my way allows the same code to function on any vendor's FPGA that has a DDR output or a serializer output with a parallel input. Only the IOBUF/OSERDES instantiation will be vendor specific.
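A rough Python model of the latching, to show why it matters. The class `LatchedPwm` and its field names are made up for illustration, and it assumes the value is captured on the first slow clock of each period and that source bit 0 is sent first.

```python
class LatchedPwm:
    """Behavioral model: the duty value is sampled only once per PWM period."""

    def __init__(self, period=16, ratio=2):
        self.period = period
        self.ratio = ratio
        self.loop_timer = period - 1
        self.requested = 0  # PWM_output_val, may change at any time
        self.latched = 0    # copy that the comparators actually use

    def clock(self):
        """Advance one slow clock and return the parallel word for the serializer."""
        if self.loop_timer == self.period - 1:
            # start of a new period: take a fresh copy of the requested value
            self.latched = self.requested
        word = [int(self.latched > (self.loop_timer - k)) for k in range(self.ratio)]
        self.loop_timer -= self.ratio
        if self.loop_timer < 0:
            self.loop_timer = self.period - 1
        return word
```

If you change `requested` halfway through a period, the current period still finishes with the old duty, and the new value only takes effect on the next loop around.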
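One way to read the "flip it around" suggestion, sketched as a Python model (the function name `serializer_words_up` is hypothetical, and bit 0 is again assumed to go out first): count the timer up in steps of the ratio and add the bit offset. With a power-of-2 ratio the timer's low bits are always zero, so the "add" is really just dropping the offset into those low bits, with no carry chain.

```python
def serializer_words_up(pwm_val, period=65536, ratio=2):
    """Count-up variant: yield one parallel word per slow clock.

    Assumption: word[0] (source bit 0) is transmitted first.
    """
    assert period % ratio == 0, "period must be a multiple of the ratio"
    for loop_timer in range(0, period, ratio):
        # loop_timer is a multiple of ratio, so for a power-of-2 ratio
        # loop_timer + k is the same as loop_timer | k
        yield [int(pwm_val > (loop_timer + k)) for k in range(ratio)]
```

The duty cycle is the same as in the count-down version; the only visible difference is that the high part of the pulse now comes at the start of the period instead of the end.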