Fixed the DMA memory copy.
We can now update the CPU buffer at 60 fps, as long as the work fits in 16.667 ms minus the time it takes to refresh the screen.
That is still plenty of time for a lot of things: at a 40MHz CPU clock and a 60Hz frame rate, one frame is 666,666.7 clock ticks.
The refresh, i.e. the DMA copy, fits inside the vertical retrace (28 of the 628 total lines), so the CPU has at least 636,942.7 cycles per frame while the screen is not being refreshed. It is "at least" because the DMA copy probably takes less than the 28 lines available.
That means we have at least 15.924 ms out of the 16.667 ms per frame to work on the CPU frame buffer.
Edit: 800x600@60Hz output with 60 fps updates from the CPU, on a small Cortex-M3 MCU on a $10 dev kit, is pretty cool IMHO.
Current schematic so far:
Added the archive to that post (archive 09) and also attached main.c (DMAmain.c).
Next up: the Verilog optimization using the UDBs and Datapaths, to make it even more resource-efficient and hopefully let us go back to the internal oscillator instead of the OCXO.
Edit: the Verilog optimization might take a while, because the UDBs and Datapaths take a lot of explaining, but I'll try to keep it concise.
Edit again: since the MCU can handle 80MHz, I could run the master clock at 80MHz and derive PIXEL_CLK from it by dividing by 2, which doubles the number of ticks we have to spare when it's not refreshing. The problem is that I want to target the internal IMO, so we can't hit 80MHz. There is still plenty of room for improvement, though: with the OCXO we could double the performance FWIW, since we are at just 50% of what the MCU can do.
Edit again once more: with the MCU at 80MHz at 800x600@60Hz, our idle time would be 1,273,885 cycles per frame, maybe better, since the DMA memory copy would take less than the 28-line retrace. This chip really rocks!
I do wish the PLL-derived clocks were separate, meaning that we could divide the digital clock by 2 and leave the CPU clock at the full 80MHz, but there are ways around that as well.