That is not the way I would do it. Since I cannot see the rest of your code, I cannot help you with it. It looks as if you are feeding in one character at a time, but where is that coming from?
Anyway, here is my example, elaborated:
-------------------------------------------------------
Step 1:
I have an OSD_raster_X & Y pixel position counter pair that runs only where I want my OSD text.
In my old designs, the text display memory was a dual-port RAM in the FPGA.
Port A had address, WE, and data inputs, through which I could write text into the display memory.
Port B of that RAM was read-only. Its address was wired to the OSD_raster_X position shifted down by 4 bits and the OSD_raster_Y position shifted down by 4 bits (my font was 16x16 pixels). Its data output was called 'Display_Character', and it arrived on the next pixel clock since the RAM module was registered.
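To make the Step 1 addressing concrete, here is a minimal cycle-level Python sketch of the display RAM's Port B read side. It assumes the RAM address is the row (OSD_raster_Y >> 4) concatenated above the column (OSD_raster_X >> 4), with a hypothetical COL_BITS = 6 (64 text columns). COL_BITS, display_address and RegisteredRam are illustrative names, not part of my design. The registered read is modeled by clock(), which latches the addressed word into the output, so the character only shows up after a clock edge.

```python
# Cycle-level sketch of the Step 1 display RAM (Port B read side only).
# Assumptions (hypothetical): 16x16 font, address = {row, column} with
# COL_BITS column bits; names are illustrative only.
COL_BITS = 6  # hypothetical: 64 character columns per text row


def display_address(osd_x: int, osd_y: int) -> int:
    """Port B address: drop the 4 LSBs (16-pixel font), then concatenate row and column."""
    column = osd_x >> 4
    row = osd_y >> 4
    return (row << COL_BITS) | column


class RegisteredRam:
    """A RAM with registered read data: the output changes only on a clock edge."""

    def __init__(self, contents):
        self.contents = list(contents)
        self.q = 0  # registered output ('Display_Character'), garbage until first clock

    def clock(self, address: int) -> int:
        """Model one rising edge: latch the word at 'address' into the output register."""
        self.q = self.contents[address]
        return self.q


# Text memory (hypothetical 32 rows x 64 columns) filled with spaces;
# the two writes below stand in for Port A writes from software.
text = [0x20] * (1 << (COL_BITS + 5))
text[display_address(16, 0)] = ord("H")  # character column 1, row 0
text[display_address(0, 16)] = ord("i")  # character column 0, row 1
display_ram = RegisteredRam(text)

print(display_address(17, 0))  # pixel 17 is still in character column 1 -> 1
display_ram.clock(display_address(16, 0))
print(chr(display_ram.q))  # -> H
```

No adder is needed here: the shifts and the concatenation are just wiring, which is why this path can run at the full pixel clock.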
--------------------------------------------------------------
Step 2:
I register a copy of the 4 LSBs of the OSD_raster_X & Y pixel counters, so this copy is delayed by 1 pixel clock. I call it DLY_OSD_raster_X & Y.
My font memory is also a dual-port RAM (just so I have a port to edit the font in software).
Port A: address, WE, and data inputs, to allow software modification of the font.
Port B: the read address was DLY_OSD_raster_X & Y, which came from the OSD position counters, together with 'Display_Character' from the display RAM's Port B data output. My font's Port B output was 2 bits, which then went to the palette memory and then to the display.
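A similar sketch for Step 2 follows, again assuming a concatenated address. Here the font RAM's Port B address is taken to be {Display_Character, DLY_OSD_raster_Y[3:0], DLY_OSD_raster_X[3:0]}, which gives each character a block of 256 two-bit pixels. This bit ordering, the helper names, and the toy glyph are assumptions for illustration, and your font layout may differ.

```python
# Sketch of the Step 2 font RAM addressing (Port B read side).
# Assumption (hypothetical layout): address = {character, y[3:0], x[3:0]},
# i.e. 16 x 16 = 256 two-bit pixels per character.


def font_address(display_character: int, dly_x: int, dly_y: int) -> int:
    """Concatenate the character code with the delayed 4-bit pixel coordinates."""
    return (display_character << 8) | ((dly_y & 0xF) << 4) | (dly_x & 0xF)


def lsb4(value: int) -> int:
    """The 4 LSBs of an OSD raster counter (what DLY_OSD_raster_X/Y hold)."""
    return value & 0xF


# Toy 2-character font: character 0 is blank; character 1 is a box with
# palette index 1 on the border and palette index 2 inside (2-bit values).
font = [0] * (2 * 256)
for y in range(16):
    for x in range(16):
        on_border = x in (0, 15) or y in (0, 15)
        font[font_address(1, x, y)] = 1 if on_border else 2

print(font[font_address(1, lsb4(16), lsb4(5))])  # OSD x=16 -> glyph column 0 -> 1
print(font[font_address(1, lsb4(21), lsb4(5))])  # OSD x=21 -> glyph column 5 -> 2
```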
---------------------------------------------------------------
Now, in this example, my code does not run Step 1 only and then Step 2 only. Both run simultaneously, all the time. What happens is that on every pixel clock, the screen position counters only increment, or reset at HS and VS. These counters feed the display memory's Port B read address in real time. At the same time, Step 2 clocks in the delayed DLY_OSD_raster_X & Y coordinates and the display memory's 'Display_Character' output, even though on the very first pixel clock these still hold bad (garbage) data.
On the next pixel clock, the display RAM's 'Display_Character' and its DLY_OSD_raster_X & Y coordinates hold the correct data for the previous pixel clock's position, and these now feed the font memory's address input.
On the clock after that, the display RAM's address input is at the third pixel position, the font's address input is at the second pixel position, and the font output finally holds a valid pixel for the first position. This keeps streaming on and on, and it is a valid pipeline for maximum speed.
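The timing just described can be checked with a small cycle-level Python model. It is a sketch under simplifying assumptions: a single text row, registered display and font RAMs, the hypothetical concatenated font address {char, y[3:0], x[3:0]}, and illustrative names throughout. Each loop iteration is one pixel clock edge. All registers are updated together from their old values, which mirrors Verilog non-blocking assignments.

```python
# Cycle-level model of the Step 1 / Step 2 pipeline for one text row.
# Assumptions (hypothetical): display address = character column (x >> 4)
# for a single row, font address = {char, y[3:0], x[3:0]}.


def simulate_row(text, font, osd_y, num_pixels):
    """Clock the pipeline num_pixels + 1 times and return the font output after
    each edge. The pixel for OSD x position n appears at index n + 1."""
    osd_x = 0              # OSD_raster_X counter (feeds display RAM Port B address)
    display_character = 0  # display RAM Port B registered output (garbage at start)
    dly_x = dly_y = 0      # DLY_OSD_raster_X/Y registers (garbage at start)
    font_out = 0           # font RAM Port B registered output
    outputs = []
    for _ in range(num_pixels + 1):
        # One rising edge: every register loads a value computed from the OLD
        # register values, like Verilog non-blocking (<=) assignments.
        font_out, display_character, dly_x, dly_y, osd_x = (
            font[(display_character << 8) | (dly_y << 4) | dly_x],
            text[(osd_x >> 4) % len(text)],  # wrap rather than run off this toy row
            osd_x & 0xF,
            osd_y & 0xF,
            osd_x + 1,
        )
        outputs.append(font_out)
    return outputs


# Toy font: character c draws palette index c at every pixel, so each output
# pixel simply equals the character code it came from.
font = [c for c in range(4) for _ in range(256)]
text = [1, 2, 3]  # three characters on this row
out = simulate_row(text, font, osd_y=0, num_pixels=48)
print(out[:3])  # [0, 1, 1]: garbage first, then pixels 0 and 1 of character 1
```

After the second edge, the counter is at the third position, the font address holds the second position, and out[1] holds the first pixel. This matches the description above. No stage does any arithmetic beyond the counter increment.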
What's important here is that no math is being done at all beyond the counter increments. All you need to feed into this is a reset for the OSD X & Y counters: always increment X when not in reset, and increment Y once on each HS unless it is in reset. Everything else is just data shifting blindly through registers or RAM blocks.
Now, I left out the part about when to reset the OSD's internal X & Y counters. This is done with 2 flags (registers), which are calculated in advance of Step 1. The FPGA can increment or reset these 2 X & Y position counters, and so position the OSD on the display, without doing any add or subtract on your master reference raster generator's X & Y counters. It only needs something as simple as:
---------------------------------------------------------------------------------------------------------------
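always @(posedge pixel_clk) begin  // pixel_clk: assumed name of your pixel clock
    // Leave reset at the OSD window's left edge, re-enter it at the right edge
    if (raster_position_X == X_OSD_POSITION_LEFT_PARAMETER)
        x_osd_reset <= 1'b0;
    else if (raster_position_X == X_OSD_POSITION_RIGHT_PARAMETER)
        x_osd_reset <= 1'b1;
    // Same for the top and bottom edges (this compare must use raster_position_Y)
    if (raster_position_Y == Y_OSD_POSITION_TOP_PARAMETER)
        y_osd_reset <= 1'b0;
    else if (raster_position_Y == Y_OSD_POSITION_BOTTOM_PARAMETER)
        y_osd_reset <= 1'b1;
end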
-----------------------------------------------------------------------------------------------------------------
(Your parameters may also be registers if you want a software-programmable OSD window position and programmable X & Y size.)
Notice that there are only 4 equality compares, which generate the x_osd_reset and y_osd_reset registers. These registers control my OSD raster X & Y counters. They are also passed through (register-delayed) to the display output, inverted, as the OSD enable for the MUX.
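Here is a minimal Python model of the horizontal half of that logic. It assumes a raster X counter that simply counts 0..line_length-1, one step per pixel clock, and uses hypothetical left/right values in place of the parameters. Because x_osd_reset is a register, it changes one clock after its compare matches. The window where the flag reads 0 therefore runs from LEFT + 1 through RIGHT, which is RIGHT - LEFT pixels. The vertical flag behaves the same way, one line at a time.

```python
# Model of the x_osd_reset flag: two equality compares, no add/subtract.
# Assumptions (hypothetical): raster X counts 0..line_length-1 every pixel
# clock; 'left'/'right' stand in for X_OSD_POSITION_LEFT/RIGHT_PARAMETER.


def x_reset_per_pixel(line_length, left, right, start_reset=1):
    """Return the value of x_osd_reset during each raster X position of one line."""
    x_osd_reset = start_reset
    flags = []
    for raster_x in range(line_length):
        flags.append(x_osd_reset)  # value the rest of the logic sees this cycle
        # Register update at the end of this cycle (rising edge):
        if raster_x == left:
            x_osd_reset = 0
        elif raster_x == right:
            x_osd_reset = 1
    return flags


flags = x_reset_per_pixel(line_length=32, left=8, right=20)
window = [x for x, f in enumerate(flags) if f == 0]
print(window[0], window[-1], len(window))  # 9 20 12
```

If you want the window to start exactly at your LEFT value, compare against one less than it. That is still a constant, so no adder ends up in the pixel path.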
Basically:
---------------------------------------------------------------------------------
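always @(posedge pixel_clk) begin  // pixel_clk: assumed name of your pixel clock
    if (x_osd_reset)
        OSD_raster_x <= 0;                  // X: held at 0 outside the window
    else
        OSD_raster_x <= OSD_raster_x + 1;   // +1 on every pixel clock inside it
    if (y_osd_reset)
        OSD_raster_y <= 0;                  // Y: held at 0 outside the window
    else if (HS)
        OSD_raster_y <= OSD_raster_y + 1;   // (This assumes HS is 1 pixel wide; otherwise you may need to adjust this code. I can give you a simple foolproof trick, but you'll need to ask.)
end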
----------------------------------------------------------------------------------
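A quick Python check of these counters follows. It assumes per-cycle reset flags like those produced by the compare logic, and osd_counter is a hypothetical helper. While the flag is 1 the counter is forced to 0, so on the first cycle the flag reads 0 the counter already shows 0, and it then advances by one per clock. The Y counter is the same code with HS as a clock enable. The last usage line shows why the HS width matters: a 2-clock HS pulse makes Y count twice.

```python
# Model of the OSD raster counters: reset to 0, otherwise +1 when enabled.
# Assumption (hypothetical helper): flags are the register values seen
# during each cycle; the counter starts at 0.


def osd_counter(reset_flags, enable_flags=None):
    """Return the counter value during each cycle. enable_flags=None means
    'count every clock' (the X counter); pass HS per cycle for the Y counter."""
    if enable_flags is None:
        enable_flags = [1] * len(reset_flags)
    count = 0
    values = []
    for reset, enable in zip(reset_flags, enable_flags):
        values.append(count)  # value visible during this cycle
        if reset:  # rising edge: reset has priority
            count = 0
        elif enable:
            count += 1
    return values


print(osd_counter([1, 1, 1, 0, 0, 0, 0, 1]))  # [0, 0, 0, 0, 1, 2, 3, 4]
print(osd_counter([0, 0, 0, 0], [0, 1, 1, 0]))  # [0, 0, 1, 2]
```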
Think this through, as you may be able to adapt some of your code without going all the way to my design.
And don't forget, I also have:
-----------------------------------------------------------------------
always @(posedge pixel_clk) begin  // pixel_clk: assumed name of your pixel clock
    OSD_output_enable_early2 <= !x_osd_reset && !y_osd_reset;
    OSD_output_enable_early1 <= OSD_output_enable_early2;
    OSD_output_enable_early0 <= OSD_output_enable_early1;
    OSD_output_enable        <= OSD_output_enable_early0;
end
------------------------------------------------------------------------
This generates OSD_output_enable in sync with the data delayed through the 3 registered memory clock cycles.
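Conceptually, that enable chain is just a shift register whose length must match the number of register stages the pixel data passes through. Here is a minimal Python sketch with the stage count as a parameter. delay_line is a hypothetical helper, and the example uses 4 stages to mirror the early2 -> early1 -> early0 -> OSD_output_enable chain. Set the count to whatever your own data path actually has.

```python
from collections import deque


def delay_line(samples, stages):
    """Delay a sequence of 1-bit values by 'stages' clocks, like a chain of
    registers that all start at 0. Requires stages >= 1."""
    regs = deque([0] * stages, maxlen=stages)  # regs[0] is the first register
    out = []
    for s in samples:
        out.append(regs[-1])  # last register's value during this cycle
        regs.appendleft(s)    # rising edge: shift; the oldest value drops off
    return out


x_reset = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]
enable_in = [int(not r) for r in x_reset]  # y_osd_reset assumed 0 on this line
print(delay_line(enable_in, 4))  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
```

If the count is off by one, the OSD window shifts one pixel relative to the font data. A quick way to check it is to count every register between the reset flags and the output MUX on both paths.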