Author Topic: FPGA VGA Controller for 8-bit computer (Read 426441 times)

BrianHG · « **Reply #3050 on:** December 12, 2021, 11:00:06 am »

A little typo above. I just had the closing brackets in the wrong place:

localparam endian_h16 = (ENDIAN[0] == "B") ? 1 : 0 ;
localparam endian_l16 = (ENDIAN[0] == "B") ? 0 : 1 ;
localparam endian_h32 = (ENDIAN[0] == "B") ? 3 : 0 ;
localparam endian_m32 = (ENDIAN[0] == "B") ? 2 : 1 ;
localparam endian_n32 = (ENDIAN[0] == "B") ? 1 : 2 ;
localparam endian_l32 = (ENDIAN[0] == "B") ? 0 : 3 ;

My earlier example would probably still function.

nockieboy · « **Reply #3051 on:** December 12, 2021, 07:17:42 pm »

Okay, so the changes to the 16- and 32-bit registers result in this:

Looking okay?

Quote from: BrianHG on December 12, 2021, 09:27:24 am

BTW, I still do not like your reset values system.
Remember, we want to specify the same Z80 address and data if it were a Z80 equivalent write.

... so instead of using zero-based addresses for the default byte values, you want full address values as if the Z80 were writing these values? At the moment, the HW_Regs__8bit array starts at 0x0000 as BASE_WRITE_ADDRESS is 20'h0, but we're not doing any processing of the incoming address so when we change BASE_WRITE_ADDRESS, we're going to jump outside of the HW_Regs__8bit array very quickly?

Is this needed, then?

Code: [Select]

HW_REGS__8bit[( (ADDR_IN[HW_REGS_SIZE-1:0] - BASE_WRITE_ADDRESS) | (i^(PORT_CACHE_BITS/8-1)) )] <= DATA_IN[i*8+:8] ;

I can then do this with the reset lines:

Code: [Select]

HW_REGS__8bit[{(RESET_VALUES[i][29:17] - BASE_WRITE_ADDRESS), 1'b0}] <= RESET_VALUES[i][ 7:0] ;
HW_REGS__8bit[{(RESET_VALUES[i][29:17] - BASE_WRITE_ADDRESS), 1'b1}] <= RESET_VALUES[i][15:8] ;

... and then the default RESET_VALUES can be specified with literal addresses for the HW registers being set to default values, instead of zero-based addresses. Is this what you meant?
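
As a rough software model of the addressing question above (a sketch, not the module itself): if the base address is assumed to be a multiple of 2**HW_REGS_SIZE, taking only the low bits of the incoming address gives the same index as subtracting the base, so no subtraction is needed. The `decode` helper and the example base `0x8000` are hypothetical.

```python
HW_REGS_SIZE = 14                       # 2**14 = 16384 byte-wide registers
REG_MASK = (1 << HW_REGS_SIZE) - 1      # selects ADDR_IN[HW_REGS_SIZE-1:0]


def decode(addr_in, base_write_address):
    """Return (enable, index) for one write address.

    enable: the bits above HW_REGS_SIZE must match those of the base address.
    index:  only the low HW_REGS_SIZE bits select a byte register.
    """
    enable = (addr_in >> HW_REGS_SIZE) == (base_write_address >> HW_REGS_SIZE)
    return enable, addr_in & REG_MASK


base = 0x8000                           # hypothetical base, a multiple of 16384
enable, index = decode(0x8010, base)
# For an aligned base, masking and subtracting give the same index.
assert index == (0x8010 - base) & REG_MASK
```

If the base were not aligned this way, the two forms would disagree and an explicit subtraction would be needed.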

Quote from: BrianHG on December 12, 2021, 09:27:24 am

I would personally make 3 sets of input reset parameters, one for 8 bit values, one for 16bit values and one for 32bit values. The 3 input parameters each would have their own counter for the number of presets and you can use the same 2&4 Endian localparams I created in the previous post to define how a 16bit or 32bit reset default ends up sorted into the 8bit regs.

Something like this?

Code: [Select]

module HW_Regs #(

    parameter string      ENDIAN                           = "Big" , // Enter "B****" for Big Endian, anything else for Little Endian.
    parameter int         PORT_ADDR_SIZE                   = 19    , // This parameter is passed by the top module
    parameter int         PORT_CACHE_BITS                  = 128   , // This parameter is passed by the top module
    parameter             HW_REGS_SIZE                     = 14    , // 2^14 = 16384 bytes
    parameter int         DEFAULT_PARAMS                   = 16    , // Indicate which set of RESET_VALUES to use; 8, 16 or 32 bit ones
    parameter int         RST_8_PARAM_SIZE                 = 4     , // Number of default values
    parameter int         RST16_PARAM_SIZE                 = 2     , // Number of default values
    parameter int         RST32_PARAM_SIZE                 = 1     , // Number of default values
    parameter int         BASE_WRITE_ADDRESS               = 20'h0 , // Where the HW_REGS are held in RAM
    parameter bit [23:0]  RESET_VALUES_8[1:RST_8_PARAM_SIZE] = '{
            {16'h00, 8'h10}, {16'h01, 8'h00}, {16'h02, 8'h10}, {16'h03, 8'h00}
    },
    parameter bit [31:0]  RESET_VALUES16[1:RST16_PARAM_SIZE] = '{
            {16'h00, 16'h0010}, {16'h02, 16'h0010}
    },
    parameter bit [47:0]  RESET_VALUES32[1:RST32_PARAM_SIZE] = '{
            {16'h00, 32'h00100010}
    }

)(

    input                               RESET,
    input                               CLK,
    input                               WE,
    input          [PORT_ADDR_SIZE-1:0] ADDR_IN,
    input         [PORT_CACHE_BITS-1:0] DATA_IN,
    input       [PORT_CACHE_BITS/8-1:0] WMASK,
    output  reg                 [  7:0] HW_REGS__8bit[0:(2**HW_REGS_SIZE-1)],
    output  logic               [ 15:0] HW_REGS_16bit[0:(2**HW_REGS_SIZE-1)],
    output  logic               [ 31:0] HW_REGS_32bit[0:(2**HW_REGS_SIZE-1)]

);

localparam endian_h16 = (ENDIAN[0] == "B") ? 1 : 0 ;
localparam endian_l16 = (ENDIAN[0] == "B") ? 0 : 1 ;

localparam endian_h32 = (ENDIAN[0] == "B") ? 3 : 0 ;
localparam endian_m32 = (ENDIAN[0] == "B") ? 2 : 1 ;
localparam endian_n32 = (ENDIAN[0] == "B") ? 1 : 2 ;
localparam endian_l32 = (ENDIAN[0] == "B") ? 0 : 3 ;

wire enable   = ( ADDR_IN[PORT_ADDR_SIZE-1:HW_REGS_SIZE] == BASE_WRITE_ADDRESS[PORT_ADDR_SIZE-1:HW_REGS_SIZE] ) ;   // upper x-bits of ADDR_IN should equal BASE_WRITE_ADDRESS for a successful read or write
wire valid_wr = WE && enable ;

integer x;
always_comb begin
    for (x = 0; x < (2**HW_REGS_SIZE) - 3; x = x + 1)  begin // 2**HW_REGS_SIZE registers; stop 3 short so x+3 stays in range
        HW_REGS_16bit[x] = { HW_REGS__8bit[x+endian_h16], HW_REGS__8bit[x+endian_l16] } ;
        HW_REGS_32bit[x] = { HW_REGS__8bit[x+endian_h32], HW_REGS__8bit[x+endian_m32], HW_REGS__8bit[x+endian_n32], HW_REGS__8bit[x+endian_l32] } ;
    end
end

integer i ;
always @( posedge CLK ) begin
    
    if ( RESET ) begin
        // reset registers to initial values
        if (DEFAULT_PARAMS == 8) begin
            for (i = 0; i < RST_8_PARAM_SIZE; i = i + 1) begin
                HW_REGS__8bit[(RESET_VALUES_8[i][21:8])] <= RESET_VALUES_8[i][ 7:0] ;
            end
        end else if (DEFAULT_PARAMS == 16) begin
            for (i = 0; i < RST16_PARAM_SIZE; i = i + 1) begin
                HW_REGS__8bit[{RESET_VALUES16[i][29:17], endian_h16}] <= RESET_VALUES16[i][ 7:0] ;
                HW_REGS__8bit[{RESET_VALUES16[i][29:17], endian_l16}] <= RESET_VALUES16[i][15:8] ;
            end
        end else if (DEFAULT_PARAMS == 32) begin
            for (i = 0; i < RST32_PARAM_SIZE; i = i + 1) begin
                HW_REGS__8bit[{RESET_VALUES32[i][45:34], endian_h32}] <= RESET_VALUES32[i][ 7: 0] ;
                HW_REGS__8bit[{RESET_VALUES32[i][45:34], endian_m32}] <= RESET_VALUES32[i][15: 8] ;
                HW_REGS__8bit[{RESET_VALUES32[i][45:34], endian_n32}] <= RESET_VALUES32[i][23:16] ;
                HW_REGS__8bit[{RESET_VALUES32[i][45:34], endian_l32}] <= RESET_VALUES32[i][31:24] ;
            end
        end
    end
    else
    begin
        for (i = 0; i < PORT_CACHE_BITS/8; i = i + 1)  begin
            if (valid_wr && WMASK[i]) begin
                HW_REGS__8bit[( ADDR_IN[HW_REGS_SIZE-1:0] | (i^(PORT_CACHE_BITS/8-1)) )] <= DATA_IN[i*8+:8] ;
            end
        end
    end
    
end

endmodule

I'm not sure that the part marked in red below is working as I'm intending it to:

HW_REGS__8bit[{RESET_VALUES16[29:17], endian_h16}] <= RESET_VALUES16[ 7:0] ;

I'm trying to set the last bit of the HW_REGS__8bit 'address' to 1 or 0, dependent on the endianness selected. endian_h16 is set up as a localparam with no type specified, so I guess SystemVerilog will treat it as though it's a 32-bit integer? Am I going to have to be more specific in how I use that value? Something like endian_h16[0] instead?

Quote from: BrianHG on December 12, 2021, 09:27:24 am

The only thing left is an optional 'strobe' output for each 8bit address. This will allow you to move the GPU geometry Z80 output ports to a Z80 write memory address. (There are some weird caveats here when using this feature without the Z80 doing the writing, but we can work out a few work arounds.)

Run this one past me again, I'm not getting the implications?

nockieboy · « **Reply #3052 on:** December 12, 2021, 07:19:04 pm »

Quote from: BrianHG on December 12, 2021, 11:00:06 am

A little typo above. I just had the closing brackets in the wrong place:

Code: [Select]
localparam endian_h16 = (ENDIAN[0] == "B") ? 1 : 0 ;
localparam endian_l16 = (ENDIAN[0] == "B") ? 0 : 1 ;
localparam endian_h32 = (ENDIAN[0] == "B") ? 3 : 0 ;
localparam endian_m32 = (ENDIAN[0] == "B") ? 2 : 1 ;
localparam endian_n32 = (ENDIAN[0] == "B") ? 1 : 2 ;
localparam endian_l32 = (ENDIAN[0] == "B") ? 0 : 3 ;
My earlier example would probably still function.

No matter, I must have copied that code after you edited it as I had no issues with it.

BrianHG · « **Reply #3053 on:** December 12, 2021, 11:33:28 pm »

How about changing :

Code: [Select]

            for (i = 0; i < RST16_PARAM_SIZE; i = i + 1) begin
                HW_REGS__8bit[RESET_VALUES16[i][29:16]+ endian_l16] <= RESET_VALUES16[i][ 7:0] ;
                HW_REGS__8bit[RESET_VALUES16[i][29:16]+ endian_h16] <= RESET_VALUES16[i][15:8] ;
            end
        end else if (DEFAULT_PARAMS == 32) begin
            for (i = 0; i < RST32_PARAM_SIZE; i = i + 1) begin
                HW_REGS__8bit[{RESET_VALUES32[i][45:34], endian_h32}] <= RESET_VALUES32[i][ 7: 0] ;
                HW_REGS__8bit[{RESET_VALUES32[i][45:34], endian_m32}] <= RESET_VALUES32[i][15: 8] ;
                HW_REGS__8bit[{RESET_VALUES32[i][45:34], endian_n32}] <= RESET_VALUES32[i][23:16] ;
                HW_REGS__8bit[{RESET_VALUES32[i][45:34], endian_l32}] <= RESET_VALUES32[i][31:24] ;
            end

to:

Code: [Select]

if (RST_8_PARAM_SIZE != 0) begin
            for (i = 1; i <= RST_8_PARAM_SIZE; i = i + 1) begin
                HW_REGS__8bit[(RESET_VALUES_8[i][21:8])] <= RESET_VALUES_8[i][ 7:0] ;
            end
end

if (RST16_PARAM_SIZE != 0) begin
            for (i = 1; i <= RST16_PARAM_SIZE; i = i + 1) begin
                HW_REGS__8bit[RESET_VALUES16[i][29:16]+ endian_l16] <= RESET_VALUES16[i][ 7:0] ;
                HW_REGS__8bit[RESET_VALUES16[i][29:16]+ endian_h16] <= RESET_VALUES16[i][15:8] ;
            end
end

if (RST32_PARAM_SIZE != 0) begin
            for (i = 1; i <= RST32_PARAM_SIZE; i = i + 1) begin
                HW_REGS__8bit[RESET_VALUES32[i][45:32]+ endian_l32] <= RESET_VALUES32[i][ 7: 0] ; // low byte goes to the endian_l32 offset, matching the HW_REGS_32bit read order
                HW_REGS__8bit[RESET_VALUES32[i][45:32]+ endian_n32] <= RESET_VALUES32[i][15: 8] ;
                HW_REGS__8bit[RESET_VALUES32[i][45:32]+ endian_m32] <= RESET_VALUES32[i][23:16] ;
                HW_REGS__8bit[RESET_VALUES32[i][45:32]+ endian_h32] <= RESET_VALUES32[i][31:24] ;
            end
end

No "else"s as we will have a few 8, 16 and 32 bit reset values.
Also, keep the + endian so any address we enter here will match the Z80 writes.
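
A small software model may help when checking the byte placement by hand. This is only a sketch: it copies the endian offsets from the localparams as posted and assumes a reset write is meant to read back unchanged through the 16-bit and 32-bit views. The helper names are hypothetical.

```python
def endian_offsets(endian):
    """Byte offsets as defined by the endian_* localparams in the thread."""
    big = endian[:1] == "B"
    return {
        "h16": 1 if big else 0, "l16": 0 if big else 1,
        "h32": 3 if big else 0, "m32": 2 if big else 1,
        "n32": 1 if big else 2, "l32": 0 if big else 3,
    }


def read16(regs, addr, off):
    # Mirrors {HW_REGS__8bit[x+endian_h16], HW_REGS__8bit[x+endian_l16]}
    return (regs[addr + off["h16"]] << 8) | regs[addr + off["l16"]]


def reset16(regs, addr, value, off):
    regs[addr + off["l16"]] = value & 0xFF
    regs[addr + off["h16"]] = (value >> 8) & 0xFF


def read32(regs, addr, off):
    # Mirrors the 4-byte concatenation used for HW_REGS_32bit
    return ((regs[addr + off["h32"]] << 24) | (regs[addr + off["m32"]] << 16)
            | (regs[addr + off["n32"]] << 8) | regs[addr + off["l32"]])


def reset32(regs, addr, value, off):
    # Assumption: the inverse of read32, so the value round-trips.
    regs[addr + off["l32"]] = value & 0xFF
    regs[addr + off["n32"]] = (value >> 8) & 0xFF
    regs[addr + off["m32"]] = (value >> 16) & 0xFF
    regs[addr + off["h32"]] = (value >> 24) & 0xFF


regs = bytearray(16)
off = endian_offsets("Big")
reset16(regs, 2, 0x0010, off)
reset32(regs, 4, 0x00100010, off)
assert read16(regs, 2, off) == 0x0010
assert read32(regs, 4, off) == 0x00100010
```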

Quote

I'm not sure that the part marked in red below is working as I'm intending it to:

HW_REGS__8bit[{RESET_VALUES16[29:17], endian_h16}] <= RESET_VALUES16[ 7:0] ;

This is because 'endian_h16' is seen as an integer of unknown width, up to 32 bits. There are 2 ways to fix this. 1:

Code: [Select]

localparam bit       endian_h16 = (ENDIAN[0] == "B") ? 1 : 0 ;
localparam bit       endian_l16 = (ENDIAN[0] == "B") ? 0 : 1 ;
localparam bit [1:0] endian_h32 = (ENDIAN[0] == "B") ? 3 : 0 ;
localparam bit [1:0] endian_m32 = (ENDIAN[0] == "B") ? 2 : 1 ;
localparam bit [1:0] endian_n32 = (ENDIAN[0] == "B") ? 1 : 2 ;
localparam bit [1:0] endian_l32 = (ENDIAN[0] == "B") ? 0 : 3 ;

2:
HW_REGS__8bit[{RESET_VALUES16[29:17], (1)'(endian_h16)}] <= RESET_VALUES16[ 7:0] ;

Telling the compiler to trim endian_h16 down to 1 bit.
Whenever you use the braces { }, the compiler needs to know how many bits to stuff together.
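
To see why the width matters, here is a rough model of a concatenation in which each field carries an explicit width. It is a sketch only; `concat` is a hypothetical helper, and the 32-bit case assumes the untyped localparam ends up 32 bits wide.

```python
def concat(*fields):
    """Model a Verilog-style concatenation.

    Each field is a (value, width) pair, most significant field first.
    Returns (combined_value, total_width).
    """
    result, total = 0, 0
    for value, width in fields:
        result = (result << width) | (value & ((1 << width) - 1))
        total += width
    return result, total


upper = 0x5                                    # stands in for RESET_VALUES16[i][29:17]
as_32bit = concat((upper, 13), (1, 32))        # endian_h16 treated as 32 bits wide
as_1bit = concat((upper, 13), (1, 1))          # endian_h16 declared or cast as 1 bit

assert as_32bit == ((upper << 32) | 1, 45)     # upper bits pushed far out of range
assert as_1bit == ((upper << 1) | 1, 14)       # the intended 14-bit register index
```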

BrianHG · « **Reply #3054 on:** December 12, 2021, 11:42:57 pm »

Also:

Code: [Select]

parameter int BASE_WRITE_ADDRESS = 20'h0 , // Where the HW_REGS are held in RAM
Should be:

Code: [Select]

parameter int BASE_WRITE_ADDRESS = 32'h0 , // Where the HW_REGS are held in RAM
Remember, the 'ram' address exceeds your puny Z80 address and you may assign these controls outside the Z80's range if you wish, then once the Z80 sets everything within its range, you may DMA copy that to where these addresses may be held.

nockieboy · « **Reply #3055 on:** December 13, 2021, 03:38:46 pm »

Okay, made those changes. Latest file attached. Now, can you expand on this next bit a little? I'm not sure I understand the implications:

Quote from: BrianHG on December 12, 2021, 09:27:24 am

The only thing left is an optional 'strobe' output for each 8bit address. This will allow you to move the GPU geometry Z80 output ports to a Z80 write memory address. (There are some weird caveats here when using this feature without the Z80 doing the writing, but we can work out a few work arounds.)

BrianHG · « **Reply #3056 on:** December 13, 2021, 03:51:05 pm »

Quote from: nockieboy on December 13, 2021, 03:38:46 pm

Okay, made those changes. Latest file attached. Now, can you expand on this next bit a little? I'm not sure I understand the implications:

Quote from: BrianHG on December 12, 2021, 09:27:24 am
The only thing left is an optional 'strobe' output for each 8bit address. This will allow you to move the GPU geometry Z80 output ports to a Z80 write memory address. (There are some weird caveats here when using this feature without the Z80 doing the writing, but we can work out a few work arounds.)

Remember when writing to the Geometry unit's FIFO using a port output?
You have 2 bytes for the 16 bit word, correct?
Now, how does the FIFO know that the 16bit word is ready to be accepted?

The issue arises if you have a piece of logic or function, tied to the HW_REGS, which needs to know you are sending new data. Imagine I want to send five 8'h00 then five 8'hFF to my serial port transmitter connected to HW_REGS address 15. How will it know that five and five identical bytes are being sent? This case is different from, for example, having my serial baud rate selector tied to HW_REGS address 16 & 17, which will be whatever it will be at any time once the Z80 writes to it. The baud rate will just control a count-down period in the serial transmitter, but address 15 may need to transmit hundreds of 8'h00. How will it know how many writes were made to address 15?
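
The difference can be shown with a toy clocked model. This is only a sketch of the idea, not the eventual HDL: `StrobedRegister` is a hypothetical class whose strobe is high for exactly the clock in which a write lands.

```python
class StrobedRegister:
    """One byte register plus a one-clock write strobe."""

    def __init__(self):
        self.value = 0
        self.strobe = False

    def tick(self, write_enable=False, data=0):
        # The strobe reflects whether a write happened on this clock.
        self.strobe = write_enable
        if write_enable:
            self.value = data & 0xFF


def bytes_seen_with_strobe(cycles):
    reg, seen = StrobedRegister(), []
    for write_enable, data in cycles:
        reg.tick(write_enable, data)
        if reg.strobe:
            seen.append(reg.value)
    return seen


def bytes_seen_by_value_change(cycles):
    reg, seen, last = StrobedRegister(), [], 0
    for write_enable, data in cycles:
        reg.tick(write_enable, data)
        if reg.value != last:          # only notices when the value differs
            seen.append(reg.value)
            last = reg.value
    return seen


# Five writes of 0x00, a pause, then five writes of 0xFF.
cycles = [(True, 0x00)] * 5 + [(False, 0)] * 3 + [(True, 0xFF)] * 5
assert bytes_seen_with_strobe(cycles) == [0x00] * 5 + [0xFF] * 5
assert bytes_seen_by_value_change(cycles) == [0xFF]
```

Watching the value alone sees a single change, while the strobe reports all ten writes.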

BrianHG · « **Reply #3057 on:** December 13, 2021, 04:08:56 pm »

You only want the strobe for the 8 bit address, so, something like this should simplify things:

output logic [(2**HW_REGS_SIZE-1):0] HW_REGS_strobe = 0, // default to '0'

inside a reset:
HW_REGS_strobe <= 0 ;

For now, scrap it as I think of a good way to write out the 1 clock bit set inside the strobe. It's going to be something simple stupid when I code it.

BrianHG · « **Reply #3058 on:** December 15, 2021, 07:06:16 am »

Ok Nockieboy, I need test files for my new multi-layer window video generator system, all in raw binary format. Note that if you need or want anything different, now is the time to speak up...

Tiles/Fonts: (Supports separate 4/8/16/32 widths and heights, 1/2/4/8/16a/16b/32 bpp, up to 16k characters with mirror and flip per character.)

1bpp 8x16, 256 character VGA Font (I already have this one.)
1bpp 8x8, 256 character VGA Font.
1bpp 16x32, up to 256 character Font.
1bpp 32x32, up to 256 character Font.
2bpp 16x32, up to 256 character Font. (4 color.)
4bpp 8x16, up to 256 character Font. (16 color.)
4bpp 16x32, up to 256 character Font. (16 color.)
8bpp 8x16, up to 256 character Font. (256 color.)
16bpp 8x16, up to 256, 16 is enough, character Font. (4x4x4x4 ABGR 4096 true color font with 16 translucency levels.)
16bpp 8x16, up to 256, 16 is enough, character Font. (5x6x5 BGR 65536 true color font.)
32bpp 8x16, 16 is enough, character Font. (8x8x8x8 ABGR 16.7M true color font with 256 translucency levels.)

Palette -> Supports 256 entries per layer at 32 bit, in ABGR format: (Yes, make a real palette as it needs to be used for the multicolored sample fonts and images.)

8 bit Alpha translucency - 0 = completely transparent to the layer below, 255= completely opaque.
8 bit Blue intensity.
8 bit Green intensity.
8 bit Red intensity.
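
For anyone starting from an RGB palette, a conversion along these lines might do. It is a sketch under assumptions: alpha is taken to sit in the top 8 bits and red in the bottom 8 bits of each 32-bit entry, following the order listed above, and how the entries are then written out as bytes is left open. The function names are hypothetical.

```python
def abgr32(r, g, b, a=255):
    """Pack 8-bit channels into one 32-bit entry, A in the top byte, R in the bottom."""
    return ((a & 0xFF) << 24) | ((b & 0xFF) << 16) | ((g & 0xFF) << 8) | (r & 0xFF)


def palette_from_rgb(rgb_triples, alpha=255, size=256):
    """Turn a list of (r, g, b) tuples into a fixed-size list of ABGR entries."""
    if len(rgb_triples) > size:
        raise ValueError("too many colors for the palette")
    entries = [abgr32(r, g, b, alpha) for r, g, b in rgb_triples]
    # Unused slots are left fully transparent black.
    return entries + [0] * (size - len(entries))


palette = palette_from_rgb([(255, 0, 0), (0, 255, 0), (0, 0, 255)])
assert palette[0] == 0xFF0000FF        # opaque red
assert palette[2] == 0xFFFF0000        # opaque blue
```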

Graphics: Supports 1/2/4/8/16a/16b/32 bpp, with window dimensions up to 65535x65535.
(All horizontal sizes need to round down to a multiple of 32 bits, i.e. 4 bytes.)

~2048x1536 1bpp monochrome image.
~1024x768 2bpp 4 color image. (Make it work with your provided palette)
~1024x768 4bpp 16 color image. (Make it work with your provided palette)
~1024x768 8bpp 256 color image. (Make it work with your provided palette)
~512x512 16bpp stored in 4x4x4x4 ABGR 4096 true color mode. (I suggest first making the 32bpp version, then computing this 16bpp version.)
~512x512 16bpp stored in 5x6x5 BGR 65536 true color mode.
~512x512 32bpp stored in 8x8x8x8 ABGR 16.7M true color mode with 256 translucency levels.
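
Computing the 16bpp versions from a 32bpp source could look like the sketch below. The bit positions are assumptions (alpha or blue in the top bits, red in the bottom bits, following the ABGR and BGR naming above), and the channels are simply truncated rather than dithered.

```python
def abgr8888_to_abgr4444(pixel):
    """Keep the top 4 bits of each 8-bit channel."""
    a = (pixel >> 24) & 0xFF
    b = (pixel >> 16) & 0xFF
    g = (pixel >> 8) & 0xFF
    r = pixel & 0xFF
    return ((a >> 4) << 12) | ((b >> 4) << 8) | ((g >> 4) << 4) | (r >> 4)


def abgr8888_to_bgr565(pixel):
    """Drop alpha; keep 5 bits of blue, 6 of green, 5 of red."""
    b = (pixel >> 16) & 0xFF
    g = (pixel >> 8) & 0xFF
    r = pixel & 0xFF
    return ((b >> 3) << 11) | ((g >> 2) << 5) | (r >> 3)


assert abgr8888_to_abgr4444(0xFF804020) == 0xF842
assert abgr8888_to_bgr565(0xFFFF0000) == 0xF800    # pure blue
```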

For the 32bpp, use a paint software to render a foreground with a soft stencil so that it may be overlayed on top of one of the other images you sent with smooth edges.

I'm around half way done with the new display system and I need to fill the DDR3 with graphics to test.

nockieboy · « **Reply #3059 on:** December 15, 2021, 09:32:19 am »

Quote from: BrianHG on December 15, 2021, 07:06:16 am

Ok Nockieboy, I need test files for my new multi-layer window video generator system, all in raw binary format. Note that if you need or want anything different, now is the time to speak up...

Can't think of anything, you've covered all the bases and far more than I'd ever have thought of.

Here's a monochrome test image in 2048x1536x1, RAW format. Let me know if this is suitable - I'm no expert with a paint program, so I need to know I'm saving in the right format etc.

BrianHG · « **Reply #3060 on:** December 15, 2021, 09:47:44 am »

Since it is not a picture of anything, I cannot tell.
However, the final binary file size should be:

1bpp -> 2048*1536/8 = 393216 bytes
2bpp -> 2048*1536/4 = 786432 bytes
4bpp -> 2048*1536/2 = 1572864 bytes
8bpp -> 2048*1536/1 = 3145728 bytes
16bpp -> 2048*1536*2 = 6291456 bytes
32bpp -> 2048*1536*4 = 12582912 bytes
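
The same arithmetic as a quick check for any resolution (a minimal sketch; `raw_size_bytes` is a hypothetical helper and assumes no header or row padding in the file):

```python
def raw_size_bytes(width, height, bpp):
    """Size of a headerless raw image with tightly packed pixels."""
    total_bits = width * height * bpp
    if total_bits % 8:
        raise ValueError("image does not fill a whole number of bytes")
    return total_bits // 8


for bpp in (1, 2, 4, 8, 16, 32):
    print(f"{bpp}bpp -> {raw_size_bytes(2048, 1536, bpp)} bytes")
```

Comparing the result against the saved file's size is a quick way to spot an exporter that wrote 8 bits per pixel instead of 1.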

a 1bpp is stored in single bits per pixel:
byte 1 byte 2 byte 3.....
0101010101010100101010....
24 Individual pixels on the screen from left to right, not 3 pixels, or 1 pixel.
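
Packing a row of on/off pixels into bytes can be sketched as below. The default assumes the leftmost pixel goes into the most significant bit of each byte; that choice is an assumption here, and the `msb_first` flag (a hypothetical parameter) flips it.

```python
def pack_1bpp(pixels, msb_first=True):
    """Pack a sequence of 0/1 pixels, left to right, into bytes (8 pixels each)."""
    if len(pixels) % 8:
        raise ValueError("pixel count must be a multiple of 8")
    out = bytearray()
    for start in range(0, len(pixels), 8):
        byte = 0
        for position, pixel in enumerate(pixels[start:start + 8]):
            if pixel:
                byte |= (0x80 >> position) if msb_first else (1 << position)
        out.append(byte)
    return bytes(out)


row = [1, 0, 0, 0, 0, 0, 0, 0] * 3      # 24 pixels -> 3 bytes
assert pack_1bpp(row) == b"\x80\x80\x80"
```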

*** I need actual pictures, not patterns which may be misinterpreted as a bug or may hide a bug.
*** Also, it is good to have a 1 pixel wide border surrounding the screen

You will need to find or generate your own converter. Or make the image in the GPU and copy the raw binary screen data to a file.

nockieboy · « **Reply #3061 on:** December 15, 2021, 10:48:40 am »

This isn't going to be quick and I'm short on time this week. I've got a lovely picture, right size, 1-pixel border, 1 bpp. Gimp won't save it any smaller than 3MB as a RAW/data file.

I'm going to have to dig out that really old copy of Photoshop I've got buried somewhere on the old PC.

Also, regarding the palette creation, is there a particular app or web resource I can use to create what you're after? Gimp allows custom palettes, but the one I'm using is RGB. I don't have time to write a Python script to translate it, especially if there's already options out there in the correct format, or offering the ability to convert between formats.

BrianHG · « **Reply #3062 on:** December 15, 2021, 11:00:47 am »

Well, I'm writing a new multi-layer window system from scratch supporting many functions which will be compatible with both my DDR3 system and then new TAP_xxx port which is looking to be around 3k lines of code. As for getting a picture and font out which is 8/32 bit, you might find a paint program to do that, but, for 16bit 4444 and 565 BGR mode, plus 4/2/1 bit modes, you will have to find tools for 8bit which store the image this way. Especially for the fonts. Though, the current VGA font is already in 1 bit mode, so, you are lucky there, but what about the larger X&Y sizes?

nockieboy · « **Reply #3063 on:** December 15, 2021, 08:33:24 pm »

Could you test this image for me please? It is 2048x1536x1 with a solid single-pixel border all around the edge - it's an image of a parrot in flight. Hopefully.

You'll have to change the attached file's .txt extension to something appropriate like .bin as the good old forum security won't allow me to upload the file with a .bin extension.

BrianHG · « **Reply #3064 on:** December 15, 2021, 11:27:18 pm »

You do realize you could have zipped the file, right?

Anyways, are you asking me to make a program to convert the .bin back to a .bmp? Remember, when I show it on hardware, I might end up coding my display hardware to correct for any mistakes in the file you provided me to show it correctly even if the data is in error.

A hint you may use is if you make a 1 bit binary to regular 24bit .bmp picture, if you load our VGA font and say it is 1 bit 8 pixels wide, your resulting .bmp should be a vertical tall entirely displayed font.
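
A text-only version of that sanity check is sketched below: it prints a 1bpp file as rows of characters, so an 8-pixel-wide font should come out as a tall column of readable glyphs. It assumes the most significant bit of each byte is the leftmost pixel; `render_1bpp` is a hypothetical helper.

```python
def render_1bpp(data, width_px, on="#", off="."):
    """Return the image as a list of text rows, one character per pixel."""
    if width_px % 8:
        raise ValueError("width must be a multiple of 8 pixels")
    bytes_per_row = width_px // 8
    rows = []
    for start in range(0, len(data), bytes_per_row):
        row_bytes = data[start:start + bytes_per_row]
        rows.append("".join(
            on if (byte >> (7 - bit)) & 1 else off
            for byte in row_bytes
            for bit in range(8)
        ))
    return rows


for line in render_1bpp(bytes([0x18, 0x3C, 0x66, 0x7E, 0x66]), 8):
    print(line)
```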

Also, I don't remember, but, I believe that my GPU test-bench should be able to load and blit convert it onto a 8bit image, though, I do not remember the maximum X/Y screen coordinates. You might need to re-do it with a res of something like 800x600.

BrianHG · « **Reply #3065 on:** December 16, 2021, 01:21:50 am »

Ok, I've patched the 'GPU_GEO_tb.sv' to support 2048x1536, though, the frame buffer maxes out at 1 megabyte due to the rest of the geometry unit. So, if any file you generate is over 1mb, it will be corrupted. You will just need to lower the image resolution.

I provided the edited 'GEO_tb_Blitter.txt' to process 3 files so you can see how to make the .bmp's.

Your provided image above has the bit orientation backwards, take a look.
I also provided the original VGA 1bit font, saved as a .BMP, and as you can see, the letters are forwards. Though, this bug may not be on you, as the bit orientation isn't a universal standard; it is just the way all the old 8-bit computers ran their 1bit color video modes. IE: Atari 8bit / Amiga / Commodore 64 / Old school VGA, and many others packed each byte the other way around.
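
If a converter writes the bits in the opposite order, the file can be fixed afterwards by reversing the bits within every byte. A minimal sketch, with hypothetical helper names:

```python
def reverse_bits(byte):
    """Reverse the order of the 8 bits in one byte value."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (byte & 1)
        byte >>= 1
    return result


# Lookup table so a whole file can be converted in one call.
_REVERSED = bytes(reverse_bits(value) for value in range(256))


def flip_bit_order(data):
    """Return a copy of data with the bit order of every byte reversed."""
    return bytes(data).translate(_REVERSED)


assert flip_bit_order(b"\x01\xf0") == b"\x80\x0f"
```

Applying it twice returns the original data, so it is safe to experiment with.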

Perhaps, I might add a display control which will allow you to select the 'bit-orientation'.
Note that you can only process 1/2/4/8 bit images. The BMP saver will not support 16/32 bit images.

BrianHG · « **Reply #3066 on:** December 16, 2021, 11:03:13 am »

Hi Nockieboy, please approve the new tile layer modes and features:

Code: [Select]

// **********************************************************************************
// Tile selection when using different 'CMD_vid_bpp' modes, 8/16a/32/16b bpp modes.
// * On a tile layer, bpp will actually mean bpc -> Bits Per Character Tile.
// ----------------------------------------------------------------------------------
// FGC  = Foreground color.  Adds this FGC value to any tile pixels whose color data is != 0.
// BGC  = Background color.  Replace tile pixels whose color data = 0 with this BGC value.
// MIR  = Mirror the tile.
// FLIP = Vertically flip the tile.
// ----------------------------------------------------------------------------------
//
//'CMD_vid_bpp' mode:
//
// 8   bpp -> Each byte = 1 character, 0 through 255, no color, mirror or flip functions.
//
//             BGC,  FGC,  Char 0-255.   *** BGC & FGC are multiplied by 16 in this mode.
// 16a bpp -> {4'h0, 4'h0, 8'h00 }                       = 16 bits / 256 possible tiles.
//
//             FLIP, MIR,  Char 0-16383.
// 16b bpp -> {1'b0, 1'b0, 14'h0000 }                    = 16 bits / 16384 possible tiles.
//
//              BGC,   FGC,  FLIP, MIR,  Char 0-16383.
// 32  bpp -> {8'h00, 8'h00, 1'b0, 1'b0, 14'h0000 }      = 32 bits / 16384 possible tiles.
//
//
// Remember, the contents inside a tile set's 'CMD_vid_tile_bpp' can be 1/2/4/8/16a/32/16b bpp.
// The tile set can only be as large as the reserved fixed available FPGA blockram.
// It is possible to have multiple tile layers when using the 'SDI_LAYERS' feature
// where each layer may share or have different tile sets so long as there is enough
// room in the single reserved FPGA blockram.
//
// **********************************************************************************
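
The field layouts in the comment block above can be modelled in software as follows. This is a sketch: it assumes the fields are packed most significant first exactly as the braces are written, and `decode_tile` is a hypothetical helper.

```python
def decode_tile(word, mode):
    """Split one tile word into its fields for the 16a, 16b and 32 bpp modes."""
    if mode == "16a":
        # {BGC[3:0], FGC[3:0], char[7:0]}, colors multiplied by 16 in this mode
        return {"bgc": ((word >> 12) & 0xF) * 16,
                "fgc": ((word >> 8) & 0xF) * 16,
                "flip": 0, "mir": 0,
                "char": word & 0xFF}
    if mode == "16b":
        # {FLIP, MIR, char[13:0]}
        return {"bgc": 0, "fgc": 0,
                "flip": (word >> 15) & 1, "mir": (word >> 14) & 1,
                "char": word & 0x3FFF}
    if mode == "32":
        # {BGC[7:0], FGC[7:0], FLIP, MIR, char[13:0]}
        return {"bgc": (word >> 24) & 0xFF,
                "fgc": (word >> 16) & 0xFF,
                "flip": (word >> 15) & 1, "mir": (word >> 14) & 1,
                "char": word & 0x3FFF}
    raise ValueError("unknown mode: " + mode)


tile = decode_tile(0x1241, "16a")
assert (tile["bgc"], tile["fgc"], tile["char"]) == (16, 32, 0x41)
```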

Note that your GPU will be using 2 for 'SDI_LAYERS' when running 720p60, 1080p30 modes.
If you stick to 480p, you can bump that up to 4 or maybe even 8 layers. Because of FMAX routing on the MAX10 fpga, the 'SDI_LAYERS' can only be set to 1 for 1080p60. (2 for 1080p60, 4 for 1080p30 would be pushing timings into the red, but most likely functional...) The Lattice ECP5 series may allow doubling all the SDI_LAYER figures as their core ram in some scenarios will clear the required 300MHz.

Also note that multiple window modules may run in parallel, in tile or graphics mode. So, running 4 graphics plus 1 tile unit in 480p at 4xSDI_LAYERS means your total layer count will be 4x4 graphics = 16 graphics layers and 4 tile layers for a total of 20 layers. The FPGA blockram exclusively holds the tile sets and individual palettes for each layer or combined palette within each module to save on blockram.

The text data and graphics data for each layer are all stored on DDR3 with a limit of 65535x65535 pixel window size. IE, on a tile layer, with a 16x16 pixel tile set, you can open a window displaying 1048576x1048576 pixels, but due to a lack of DDR3 ram, your limit would be more like 16384x16384, a window display of 262144x262144 pixels.
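
The window-size arithmetic, as a sketch with hypothetical helper names. It assumes one tile word per character cell and nothing else stored in the map.

```python
def window_pixels(cols, rows, tile_w, tile_h):
    """Pixel dimensions of a tile window of cols x rows character cells."""
    return cols * tile_w, rows * tile_h


def tile_map_bytes(cols, rows, bits_per_cell):
    """DDR3 bytes needed for the tile map alone."""
    return cols * rows * bits_per_cell // 8


assert window_pixels(16384, 16384, 16, 16) == (262144, 262144)
print(tile_map_bytes(16384, 16384, 16) // 2**20, "MiB at 16 bits per cell")
print(tile_map_bytes(16384, 16384, 32) // 2**20, "MiB at 32 bits per cell")
```

A 16384x16384 map already needs 512 MiB at 16 bits per cell, which is why memory rather than the 65535 limit sets the practical bound.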

Is this enough for your Z80?
How about a 68000?
Maybe a 68040, or even 68060 as they are only 75Mhz... I think my DDR3 could handle it with graphics and sound. Especially if we were to design a pcb with 32 or 64 bit DDR3 instead of 16bit.

BrianHG · « **Reply #3067 on:** December 16, 2021, 01:04:57 pm »

**** I need to know the 'BYTE' order your Z80 os uses to store 16bit numbers.

A) With a 16bit int stored at address 0, address 0 = low 8 bits, and address 1 = high 8 bits.

or

B) With a 16bit int stored at address 0, address 0 = high 8 bits, and address 1 = low 8 bits.

So far, I designed the tile system to operate in B) BigEndian.

nockieboy · « **Reply #3068 on:** December 16, 2021, 03:45:20 pm »

Quote from: BrianHG on December 15, 2021, 11:27:18 pm

You do realize you could have zipped the file, right?

Uh, just chalk that up to me doing two jobs and three things at once at the moment.

Quote from: BrianHG on December 15, 2021, 11:27:18 pm

Anyways, are you asking me to make a program to convert the .bin back to a .bmp. Remember, when I show it on hardware, I might end coding my display hardware to correct for any mistakes in the file you provided me to show it correctly even if the data is in error.

Oh jeez no, you've got more than enough on! I was just bouncing off walls yesterday trying to get lots of stuff done and was worried I was unable to do something as basic as create a bitmap.

Fortunately, the image you've produced is exactly what I expected for it to be working properly - yes, the bit order is reversed but I did that accidentally on purpose. Should be able to fix that with the next images I produce now I know the default order the bits are being spat out by the software I'm using.

Quote from: BrianHG on December 16, 2021, 01:21:50 am

Perhaps, I might add a display control which will allow you to select the 'bit-orientation'.

This shouldn't be necessary now I know the bit order of the conversion software I'm using.

Quote from: BrianHG on December 16, 2021, 01:04:57 pm

**** I need to know the 'BYTE' order your Z80 os uses to store 16bit numbers.

A) With a 16bit int stored at address 0, address 0 = low 8 bits, and address 1 = high 8 bits.

or

B) With a 16bit int stored at address 0, address 0 = high 8 bits, and address 1 = low 8 bits.

So far, I designed the tile system to operate in B) BigEndian.

The Z80 is a little-endian system. So A is the order it would store a 16-bit int. Always LSB, then MSB.
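
For reference, the two byte orders side by side, using Python's standard `struct` module (a small illustration; the function names are hypothetical):

```python
import struct


def store16_little(value):
    """Option A: low byte at the lower address, as the Z80 stores a 16-bit int."""
    return struct.pack("<H", value)


def store16_big(value):
    """Option B: high byte at the lower address."""
    return struct.pack(">H", value)


assert store16_little(0x1234) == b"\x34\x12"
assert store16_big(0x1234) == b"\x12\x34"
```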

BrianHG · « **Reply #3069 on:** December 16, 2021, 04:23:48 pm »

Quote from: nockieboy on December 16, 2021, 03:45:20 pm

Quote from: BrianHG on December 16, 2021, 01:04:57 pm
**** I need to know the 'BYTE' order your Z80 os uses to store 16bit numbers.

A) With a 16bit int stored at address 0, address 0 = low 8 bits, and address 1 = high 8 bits.

or

B) With a 16bit int stored at address 0, address 0 = high 8 bits, and address 1 = low 8 bits.

So far, I designed the tile system to operate in B) BigEndian.

The Z80 is a little-endian system. So A is the order it would store a 16-bit int. Always LSB, then MSB.

Ok, this produces a weird issue with the tile/font addressing when storing the 'text/tile' data.
Since I have modes which can address 16k characters, not just the old fashioned limited 256 where 1 byte = 1 character, and this is in the DDR3, I will need to add an ENDIAN swap option for the line buffer tile mode. Otherwise, when you address the extended character set, you will always need to swap bytes in the Z80.

This is not a problem with the HW_REGs controls since for those, we have 16 & 32 bit outputs where 1 parameter will change the entire GPU control system's address order, however, this does not change the order of the large ints stored in the DDR3 where the display characters are stored.
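
Without a hardware swap option, the software side would have to do something like the following to every 16-bit tile word before writing it out. A sketch only; `swap16` is a hypothetical helper.

```python
def swap16(data):
    """Swap the two bytes of every 16-bit word in a buffer."""
    if len(data) % 2:
        raise ValueError("buffer length must be a multiple of 2")
    source = bytes(data)
    out = bytearray(source)
    out[0::2] = source[1::2]    # even positions take the former odd bytes
    out[1::2] = source[0::2]    # odd positions take the former even bytes
    return bytes(out)


assert swap16(b"\x12\x34\xab\xcd") == b"\x34\x12\xcd\xab"
```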

nockieboy · « **Reply #3070 on:** December 16, 2021, 05:15:01 pm »

Quote from: BrianHG on December 16, 2021, 11:03:13 am

Hi Nockieboy, please approve the new tile layer modes and features:

Code: [Select]
// **********************************************************************************
// Tile selection when using different 'CMD_vid_bpp' modes, 8/16a/32/16b bpp modes.
// * On a tile layer, bpp will actually mean bpc -> Bits Per Character Tile.
// ----------------------------------------------------------------------------------
// FGC  = Foreground color.  Adds this FGC value to any tile pixels whose color data is != 0.
// BGC  = Background color.  Replace tile pixels whose color data = 0 with this BGC value.
// MIR  = Mirror the tile.
// FLIP = Vertically flip the tile.
// ----------------------------------------------------------------------------------
//
//'CMD_vid_bpp' mode:
//
// 8   bpp -> Each byte = 1 character, 0 through 255, no color, mirror or flip functions.
//
//             BGC,  FGC,  Char 0-255.   *** BGC & FGC are multiplied by 16 in this mode.
// 16a bpp -> {4'h0, 4'h0, 8'h00 }                       = 16 bits / 256 possible tiles.
//
//             FLIP, MIR,  Char 0-16383.
// 16b bpp -> {1'b0, 1'b0, 14'h0000 }                    = 16 bits / 16384 possible tiles.
//
//              BGC,   FGC,  FLIP, MIR,  Char 0-16383.
// 32  bpp -> {8'h00, 8'h00, 1'b0, 1'b0, 14'h0000 }      = 32 bits / 16384 possible tiles.
//
//
// Remember, the contents inside a tile set's 'CMD_vid_tile_bpp' can be 1/2/4/8/16a/32/16b bpp.
// The tile set can only be as large as the reserved fixed available FPGA blockram.
// It is possible to have multiple tile layers when using the 'SDI_LAYERS' feature
// where each layer may share or have different tile sets so long as there is enough
// room in the single reserved FPGA blockram.
//
// **********************************************************************************
Note that your GPU will be using 2 for 'SDI_LAYERS' when running 720p60, 1080p30 modes.
If you stick to 480p, you can bump that up to 4 or maybe even 8 layers. Because of FMAX routing on the MAX10 fpga, the 'SDI_LAYERS' can only be set to 1 for 1080p60. (2 for 1080p60, 4 for 1080p30 would be pushing timings into the red, but most likely functional...) The Lattice ECP5 series may allow doubling all the SDI_LAYER figures as their core ram in some scenarios will clear the required 300MHz.

Also note that multiple window modules may run in parallel, in tile or graphics mode. So, running 4 graphics plus 1 tile unit in 480p at 4xSDI_LAYERS means your total layer count will be 4x4 graphics = 16 graphics layers and 4 tile layers for a total of 20 layers. The FPGA blockram exclusively holds the tile sets and individual palettes for each layer or combined palette within each module to save on blockram.

The text data and graphics data for each layer are all stored on DDR3 with a limit of 65535x65535 pixel window size. IE, on a tile layer, with a 16x16 pixel tile set, you can open a window displaying 1048576x1048576 pixels, but due to a lack of DDR3 ram, your limit would be more like 16384x16384, a window display of 262144x262144 pixels.

Is this enough for your Z80?
How about a 68000?
Maybe a 68040, or even a 68060, as they are only 75MHz... I think my DDR3 could handle it with graphics and sound, especially if we were to design a PCB with 32- or 64-bit DDR3 instead of 16-bit.

Looks good.

Yes, should hopefully be enough for the Z80.

Just have one question - is tile mode necessary if we're going for a full DDR3 implementation? I've gone from tiled to full-graphics mode thanks to the increased space in the DDR3. Are there benefits to using a tiled mode instead of blitting the text characters in a full graphics mode? Have I got the wrong idea and am I missing something blatantly obvious?

BrianHG · « **Reply #3071 on:** December 16, 2021, 05:41:35 pm »

Quote from: nockieboy on December 16, 2021, 05:15:01 pm

Quote from: BrianHG on December 16, 2021, 11:03:13 am
Hi Nockieboy, please approve the new tile layer modes and features:

Code: [Select]
// **********************************************************************************
// Tile selection when using different 'CMD_vid_bpp' modes, 8/16a/32/16b bpp modes.
// * On a tile layer, bpp will actually mean bpc -> Bits Per Character Tile.
// ----------------------------------------------------------------------------------
// FGC  = Foreground color.  Adds this FGC value to any tile pixels whose color data is != 0.
// BGC  = Background color.  Replace tile pixels whose color data = 0 with this BGC value.
// MIR  = Mirror the tile.
// FLIP = Vertically flip the tile.
// ----------------------------------------------------------------------------------
//
//'CMD_vid_bpp' mode:
//
// 8   bpp -> Each byte = 1 character, 0 through 255, no color, mirror or flip functions.
//
//             BGC,  FGC,  Char 0-255.   *** BGC & FGC are multiplied by 16 in this mode.
// 16a bpp -> {4'h0, 4'h0, 8'h00 } = 16 bits / 256 possible tiles.
//
//             FLIP, MIR,  Char 0-16383.
// 16b bpp -> {1'b0, 1'b0, 14'h0000 } = 16 bits / 16384 possible tiles.
//
//             BGC,   FGC,   FLIP, MIR,  Char 0-16383.
// 32  bpp -> {8'h00, 8'h00, 1'b0, 1'b0, 14'h0000 } = 32 bits / 16384 possible tiles.
//
//
// Remember, the contents inside a tile set's 'CMD_vid_tile_bpp' can be 1/2/4/8/16a/32/16b bpp.
// The tile set can only be as large as the reserved fixed available FPGA blockram.
// It is possible to have multiple tile layers when using the 'SDI_LAYERS' feature
// where each layer may share or have different tile sets so long as there is enough
// room in the single reserved FPGA blockram.
//
// **********************************************************************************
Note that your GPU will be using 2 for 'SDI_LAYERS' when running 720p60, 1080p30 modes.
If you stick to 480p, you can bump that up to 4 or maybe even 8 layers. Because of FMAX routing limits on the MAX10 FPGA, 'SDI_LAYERS' can only be set to 1 for 1080p60. (2 for 1080p60 or 4 for 1080p30 would push timings into the red, but would most likely still function...) The Lattice ECP5 series may allow doubling all the 'SDI_LAYERS' figures, as their core RAM will in some scenarios clear the required 300MHz.

Also note that multiple window modules may run in parallel, in tile or graphics mode. So, running 4 graphics plus 1 tile unit in 480p at 4xSDI_LAYERS means your total layer count will be 4x4 graphics = 16 graphics layers and 4 tile layers for a total of 20 layers. The FPGA blockram exclusively holds the tile sets and individual palettes for each layer or combined palette within each module to save on blockram.

The text data and graphics data for each layer are all stored on DDR3 with a limit of 65535x65535 pixel window size. I.e., on a tile layer with a 16x16 pixel tile set, you can open a window displaying about 1048576x1048576 pixels, but due to a lack of DDR3 RAM, your limit would be more like 16384x16384 tiles, a window display of 262144x262144 pixels.

Is this enough for your Z80?
How about a 68000?
Maybe a 68040, or even a 68060, as they are only 75MHz... I think my DDR3 could handle it with graphics and sound, especially if we were to design a PCB with 32- or 64-bit DDR3 instead of 16-bit.

Looks good. Yes, should hopefully be enough for the Z80. Just have one question - is tile mode necessary if we're going for a full DDR3 implementation? I've gone from tiled to full-graphics mode thanks to the increased space in the DDR3. Are there benefits to using a tiled mode instead of blitting the text characters in a full graphics mode? Have I got the wrong idea and am I missing something blatantly obvious?

Well, every 4 bytes can equal an 8x8 or 16x16 256-color character. That's editing 4 bytes to change 64 or 256 bytes of pixels on the screen. Now, if you fill the screen with a repetitive pattern of tiles and then edit/paint new contents into those tiles, drawing into only 4 or 8 16x16 tiles will completely fill the screen with animated graphics, instead of re-drawing megabytes of blits to repaint that full-screen animated backdrop.

And again, super huge X/Y scrolling levels can be constructed with a fraction of the storage if they are painted in tiles.

What else do you want me to say? Imagine how many Super Nintendo and Amiga games were made, then blow up the memory content and available colors & palettes many fold.

Each open 1080p 256/65536 color screen eats a lot of DDR3 bandwidth just to maintain the picture, and a 16.7M color screen even more. A tile/text layer typically uses 1/16th the DDR3 bandwidth and 1/16th the memory for the same colors and screen size, making it 16x faster to fill/erase, load & blit data to and from.
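The 1/16th figure can be checked with a short Python sketch. It compares the bytes needed for one full 1080p frame as a bitmap against the bytes needed for a tile map covering the same area. This is only a storage and fill-cost comparison under the assumption of 32-bit tile cells; the real scan-out bandwidth depends on how the tile engine fetches and caches rows, which is not modelled here. The function names are made up for this example.

```python
import math


def bitmap_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Bytes for a full-graphics layer of the given size."""
    return width * height * bytes_per_pixel


def tile_map_bytes(width: int, height: int, tile_size: int, bytes_per_cell: int) -> int:
    """Bytes for a tile map covering the same screen area.

    Partial tiles at the screen edge still need a whole cell, hence ceil().
    """
    cols = math.ceil(width / tile_size)
    rows = math.ceil(height / tile_size)
    return cols * rows * bytes_per_cell


# 1080p, 256 colors (1 byte per pixel) as a plain bitmap.
full = bitmap_bytes(1920, 1080, 1)

# The same screen as 8x8 and 16x16 tiles with 4-byte (32 bpp mode) cells.
tiled_8 = tile_map_bytes(1920, 1080, 8, 4)
tiled_16 = tile_map_bytes(1920, 1080, 16, 4)

print(full, tiled_8, full / tiled_8)     # ratio for 8x8 tiles
print(full, tiled_16, full / tiled_16)   # ratio for 16x16 tiles
```

With 8x8 tiles the ratio works out to exactly 16, and with 16x16 tiles the map is smaller still, so 1/16th is the conservative end of the range.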

Even with accelerated graphics, this is not a NVIDIA RTX card and not a 64bit 5GHz cpu...

asmi · « **Reply #3072 on:** December 16, 2021, 06:04:07 pm »

Quote from: BrianHG on December 16, 2021, 01:04:57 pm

So far, I designed the tile system to operate in B) BigEndian.

Pretty much all modern systems are little endian, and even those which can be configured either way (like some ARM CPUs) are typically run in little endian mode.

asmi · « **Reply #3073 on:** December 16, 2021, 06:26:07 pm »

Quote from: BrianHG on December 16, 2021, 05:41:35 pm

Each open 1080p 256/65536 color screen eats a lot of DDR3 bandwidth just to maintain the picture, and a 16.7M color screen even more. A tile/text layer typically uses 1/16th the DDR3 bandwidth and 1/16th the memory for the same colors and screen size, making it 16x faster to fill/erase, load & blit data to and from.

Most modern cards actually support floating point colors, meaning each of the R, G and B components will take 4 bytes - this is often used for post-processing like tone mapping. Also, they often trade memory efficiency for speed, so for example for FullHD resolution (1920x1080) they actually use lines which are 2048 pixels long, as this makes the address math easier and allows for coarse alignment - remember that most video cards have a very wide memory bus (256 bits is pretty usual, but there are wider buses out there), so they read/write in big chunks. Also, just about all of them now use 32-bit pixels even in integer RGB mode - again to make processing them easier. Modern video memory has a very large access latency, so they need very regular access patterns, because they have to know well in advance what data they will require in order for it to be fetched in time for the processing cores to work on. Another way they deal with such large latency is through cycle-level parallelism, where processing cores send a request to memory (for example for a texture fetch), switch to some other task while that request is being executed, and switch back once the request is completed and the data has arrived. The reason it works is that there are literally millions of pixels in a single frame, so at any point in time you will have pixels which have their data ready for processing, and so those cores can be occupied pretty much all the time without any stalls. There is dedicated hardware on a video chip which controls all of that scheduling so that processing cores are utilized as much as possible, to the point that on each clock cycle they can end up processing a different pixel.
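The padded-line trick mentioned above can be sketched in a few lines of Python. With a 2048-pixel stride the row and column simply occupy separate bit fields of the address, so no multiplier is needed, at the cost of some unused memory per line. The 4 bytes per pixel and the function names are assumptions picked for this illustration, not a description of any particular card.

```python
def offset_packed(x: int, y: int, width: int = 1920, bytes_per_pixel: int = 4) -> int:
    """Byte offset with tightly packed lines: needs a multiply by the width."""
    return (y * width + x) * bytes_per_pixel


def offset_padded(x: int, y: int, stride_log2: int = 11, bytes_per_pixel: int = 4) -> int:
    """Byte offset with lines padded to 2**stride_log2 pixels.

    Row and column are separate bit fields, so a shift and OR is enough.
    """
    return ((y << stride_log2) | x) * bytes_per_pixel


def frame_bytes(stride_pixels: int, height: int = 1080, bytes_per_pixel: int = 4) -> int:
    """Memory for one frame given the line stride in pixels."""
    return stride_pixels * height * bytes_per_pixel


packed = frame_bytes(1920)
padded = frame_bytes(2048)
print(packed, padded, padded - packed)  # memory spent on padding per frame
```

The padding costs 128 extra pixels per line, a little under 7% more memory per frame, in exchange for the simpler addressing.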

nockieboy · « **Reply #3074 on:** December 16, 2021, 06:52:00 pm »

Quote from: BrianHG on December 16, 2021, 05:41:35 pm

Quote from: nockieboy on December 16, 2021, 05:15:01 pm
Looks good. Yes, should hopefully be enough for the Z80. Just have one question - is tile mode necessary if we're going for a full DDR3 implementation? I've gone from tiled to full-graphics mode thanks to the increased space in the DDR3. Are there benefits to using a tiled mode instead of blitting the text characters in a full graphics mode? Have I got the wrong idea and am I missing something blatantly obvious?

Well, every 4 bytes can equal an 8x8 or 16x16 256-color character. That's editing 4 bytes to change 64 or 256 bytes of pixels on the screen. Now, if you fill the screen with a repetitive pattern of tiles and then edit/paint new contents into those tiles, drawing into only 4 or 8 16x16 tiles will completely fill the screen with animated graphics, instead of re-drawing megabytes of blits to repaint that full-screen animated backdrop.

And again, super huge X/Y scrolling levels can be constructed with a fraction of the storage if they are painted in tiles.

What else do you want me to say? Imagine how many Super Nintendo and Amiga games were made, then blow up the memory content and available colors & palettes many fold.

Each open 1080p 256/65536 color screen eats a lot of DDR3 bandwidth just to maintain the picture, and a 16.7M color screen even more. A tile/text layer typically uses 1/16th the DDR3 bandwidth and 1/16th the memory for the same colors and screen size, making it 16x faster to fill/erase, load & blit data to and from.

Ah okay, I'm with you. I was just thinking of text when I thought of tile mode - clearly I was missing the whole tiled-graphics display possibilities.

Quote from: BrianHG on December 16, 2021, 05:41:35 pm

Even with accelerated graphics, this is not a NVIDIA RTX card and not a 64bit 5GHz cpu...

I wonder how long it'll be before we (hobbyists) have access to that level of power? I mean, the (low-end) FPGAs we're talking about and using provide us with power and versatility unheard of back in the 80s, even on dedicated hardware.

Quote from: asmi on December 16, 2021, 06:26:07 pm

Most modern cards actually support floating point colors, meaning each of the R, G and B components will take 4 bytes - this is often used for post-processing like tone mapping. Also, they often trade memory efficiency for speed, so for example for FullHD resolution (1920x1080) they actually use lines which are 2048 pixels long, as this makes the address math easier and allows for coarse alignment - remember that most video cards have a very wide memory bus (256 bits is pretty usual, but there are wider buses out there), so they read/write in big chunks. Also, just about all of them now use 32-bit pixels even in integer RGB mode - again to make processing them easier. Modern video memory has a very large access latency, so they need very regular access patterns, because they have to know well in advance what data they will require in order for it to be fetched in time for the processing cores to work on. Another way they deal with such large latency is through cycle-level parallelism, where processing cores send a request to memory (for example for a texture fetch), switch to some other task while that request is being executed, and switch back once the request is completed and the data has arrived. The reason it works is that there are literally millions of pixels in a single frame, so at any point in time you will have pixels which have their data ready for processing, and so those cores can be occupied pretty much all the time without any stalls. There is dedicated hardware on a video chip which controls all of that scheduling so that processing cores are utilized as much as possible, to the point that on each clock cycle they can end up processing a different pixel.

And I thought what BrianHG was doing with the DDR3 was complicated. My question is, is the access latency literally "very large", or does it just appear that way because the GPU is running at such high clock speeds that it's hitting the physical limits of DDR memory cell capacitor charge/discharge times? Would be cool if they could come up with a quicker memory technology. Isn't SRAM faster, but a lot more expensive, or did I imagine that?

