Use a dual-port, dual-clock FIFO in the FPGA. The read side of the FIFO should run at the pixel clock. On the internal system-clock side, use the 'Almost Full' flag to decide when to write more data. This method can have syncing issues, because you need to pass the horizontal alignment into the buffer and fill it with an exact pixel count per line.
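A rough behavioural sketch of the FIFO approach, written in Python rather than HDL, shows how the 'Almost Full' flag throttles the system-clock writer while the pixel-clock side reads exactly one word per pixel. The class name, the depth of 64, the margin of 8, and the assumption of two system-clock write opportunities per pixel period are all illustrative. This is not cycle-accurate clock-domain-crossing logic.

```python
from collections import deque


class DualClockFifoModel:
    """Behavioural stand-in for a dual-clock FIFO (ignores CDC timing)."""

    def __init__(self, depth: int, almost_full_margin: int):
        self.depth = depth
        # Flag asserts this many words before the FIFO is truly full,
        # leaving headroom for writes already in flight when it asserts.
        self.almost_full_level = depth - almost_full_margin
        self.data = deque()
        self.peak = 0

    @property
    def almost_full(self) -> bool:
        return len(self.data) >= self.almost_full_level

    def write(self, word: int) -> None:
        if len(self.data) >= self.depth:
            raise OverflowError("FIFO overflow")
        self.data.append(word)
        self.peak = max(self.peak, len(self.data))

    def read(self) -> int:
        if not self.data:
            raise RuntimeError("FIFO underflow: pixel side ran dry")
        return self.data.popleft()


def stream_line(pixels, depth=64, margin=8, sys_cycles_per_pixel=2):
    """Push one line through the FIFO; returns (pixels read, peak fill level)."""
    fifo = DualClockFifoModel(depth, margin)
    src = iter(pixels)
    pending = next(src, None)
    out = []
    while len(out) < len(pixels):
        # System-clock side: several write chances per pixel period,
        # each gated by the Almost Full flag.
        for _ in range(sys_cycles_per_pixel):
            if pending is not None and not fifo.almost_full:
                fifo.write(pending)
                pending = next(src, None)
        # Pixel-clock side: exactly one read per pixel period.
        out.append(fifo.read())
    return out, fifo.peak


line, peak = stream_line(list(range(1920)))
print(len(line), peak)  # 1920 56
```

Note that nothing in this model knows where a line starts: the FIFO only carries a stream of words, which is exactly why horizontal alignment has to be handled separately with this approach.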
My working method is a dual-port RAM sized as a multiple of 2048: 2048 x (X lines of cache) x 24 bits, with each cached line occupying its own 2048-word slot.
On the output side, running at the pixel clock, I have my horizontal raster line generator, with the address counter reset to the beginning of active video. The output of the dual-port RAM feeds my HDMI out. For the MSBs of the dual-port RAM address, I use a 2-bit counter which increments on HS out only during the active video region and resets on VS. This means that if the output mode is 1920 pixels wide, I waste 2048 - 1920 = 128 pixels per line in that cache RAM. Lower-resolution modes use less of each line slot.
On the system-clock side, all I monitor is an asynchronous VS from the output plus the output's 2-bit counter, which tells me my vertical position in the 4-line output buffer. In other words, right after a VS out, before the new active video, I begin to fill my 4-line video-out dual-port cache RAM. As its 2-bit output counter increases, I know I have new free lines to fill. So I have a video raster generator on the output clock, and only 3 asynchronous signals go back to the system core clock: VS out and the 2-bit vertical buffer position. (This makes building any core DDR video system RAM or scan-rate converter a breeze: you page a full line of DDR RAM in at a time, using the fastest possible burst and leaving spare DDR cycles for other uses.)
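The line-cache scheme can be modelled roughly as follows. This is a hypothetical Python behavioural sketch, not the author's HDL. `ram_address` places the 2-bit line counter in the address MSBs above an 11-bit pixel index. `LineCacheWriter` (a made-up name) assumes the system side grants itself one free line each time it sees the synchronized counter step, starting with all 4 slots free after VS.

```python
LINE_BITS = 11               # low address bits: pixel index within a line
LINE_SLOT = 1 << LINE_BITS   # 2048 words reserved per cached line
NUM_LINES = 4                # 2-bit line counter -> 4-line cache


def ram_address(line_counter: int, x: int) -> int:
    """2-bit line counter in the MSBs, horizontal pixel index in the LSBs."""
    assert 0 <= x < LINE_SLOT
    return ((line_counter % NUM_LINES) << LINE_BITS) | x


def wasted_words_per_line(active_width: int) -> int:
    return LINE_SLOT - active_width


class LineCacheWriter:
    """System-clock side: counts how many cache lines it may still fill."""

    def __init__(self):
        self.on_vsync()

    def on_vsync(self):
        self.next_line = 0          # next source line to write this frame
        self.credits = NUM_LINES    # after VS, all 4 slots are free
        self.last_counter = 0       # the output counter also resets on VS

    def observe_counter(self, counter: int):
        # Each step of the (already synchronized) 2-bit counter means one
        # more line has been shown, so its slot can be refilled.
        self.credits += (counter - self.last_counter) % NUM_LINES
        self.last_counter = counter

    def write_line(self) -> int:
        assert self.credits > 0
        slot = self.next_line % NUM_LINES
        self.next_line += 1
        self.credits -= 1
        return slot


def simulate_frame(active_lines=8, width=1920):
    ram = {}
    writer = LineCacheWriter()
    counter = 0                 # output-side 2-bit counter, reset by VS
    shown = []
    for _ in range(active_lines):
        # System side: fill every free slot (source line n -> slot n % 4).
        while writer.credits > 0 and writer.next_line < active_lines:
            n = writer.next_line
            slot = writer.write_line()
            for x in range(width):
                ram[ram_address(slot, x)] = (n, x)
        # Pixel-clock side: output one active line from the current slot.
        shown.append([ram[ram_address(counter, x)] for x in range(width)])
        counter = (counter + 1) % NUM_LINES   # HS after the active line
        writer.observe_counter(counter)
    return shown


frame = simulate_frame()
print(wasted_words_per_line(1920))   # 128
print(frame[5][0])                   # (5, 0)
```

Because the counter crosses clock domains, a real design would probably Gray-code it (00, 01, 11, 10) before the synchronizer, so only one bit changes per step; the model simply assumes a clean, already-synchronized binary value.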
Yes, there are more advanced methods to do this, but given the scope of your project's timeline, choose this extra-simple method to fill a clean, line-by-line video output buffer, and use the exact reverse for a video input buffer.
The rules change, and get simpler, for real-time processing where the output image format matches the input image format and you enhance the video on the fly. But going that way, you will not be able to cache an image in DDR memory and play back a full-screen buffer. You will also need to copy the input syncs to the output with an appropriate delay, in clocks or lines, to do this.
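For the real-time path, the sync copy can be sketched as a delay line whose length matches the processing latency, so HS/VS/DE stay aligned with the processed pixels. The `SyncDelay` class, the doubling "enhancement", and the 3-clock latency below are hypothetical stand-ins for whatever the real pipeline does. A pipeline that buffers whole lines would instead need a delay of (lines buffered x total clocks per line, including blanking) plus its own pipeline clocks.

```python
from collections import deque


class SyncDelay:
    """Delays (hs, vs, de) by a fixed number of pixel clocks."""

    def __init__(self, latency_clocks: int):
        self.pipe = deque([(0, 0, 0)] * latency_clocks)

    def clock(self, syncs):
        self.pipe.append(syncs)
        return self.pipe.popleft()


def process(stream, latency=3):
    """Run pixels through a hypothetical L-clock 'enhance' pipeline and
    delay the syncs by the same L clocks so they stay aligned."""
    pixel_pipe = deque([None] * latency)
    sync_pipe = SyncDelay(latency)
    out = []
    for pixel, syncs in stream:
        pixel_pipe.append(None if pixel is None else pixel * 2)  # stand-in
        out.append((pixel_pipe.popleft(), sync_pipe.clock(syncs)))
    return out


# (pixel, (hs, vs, de)): 2 blanking clocks, 4 active pixels, 5 blanking clocks
stream = ([(None, (0, 0, 0))] * 2
          + [(p, (0, 0, 1)) for p in range(4)]
          + [(None, (1, 0, 0))] * 5)
out = process(stream)
print([p for p, (hs, vs, de) in out if de])   # [0, 2, 4, 6]
```

Without the `SyncDelay`, DE would be high three clocks before the processed pixels arrive, which is the misalignment the delayed sync copy avoids.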