Author Topic: FPGA VGA Controller for 8-bit computer  (Read 426374 times)


Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7747
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #300 on: November 11, 2019, 10:51:26 am »
After instantiating the "gpu_dual_port_ram_INTEL gpu_RAM(.....);", you need the:
---------------------------
          defparam
                           gpu_RAM.MAX_ADDR_BITS = MAX_ADDR_BITS ;
---------------------------
     This will pass the module multiport_gpu_ram's MAX_ADDR_BITS parameter into the gpu_dual_port_ram_INTEL's MAX_ADDR_BITS parameter.  It may also be useful to pass the 'altsyncram_component.numwords_a&b', since the FPGA may have enough memory to allocate 24kb, yet not 32kb.

Okay, stupid question - altsyncram specifies altsyncram_component.numwords_a (and b) - I had 2 ** MAXSIZE in there, but if they're the number of words, I'll need to divide that by word size (8), otherwise the RAM will (try to be) 8 times larger than what I think I'm specifying?

So, for example, this:

Code: [Select]
// define the memory size (number of words) - this allows RAM sizes other than powers of 2
// but defaults to power-of-two sizing based on MAX_ADDR_BITS if not otherwise specified
parameter WORDS = 2 ** MAX_ADDR_BITS;


..needs to be this:

Code: [Select]
// define the memory size (number of words) - this allows RAM sizes other than powers of 2
// but defaults to power-of-two sizing based on MAX_ADDR_BITS if not otherwise specified
parameter WORDS = (2 ** MAX_ADDR_BITS) / 8;


??
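Nope, use the first one, not the divide by 8.  numwords_a/b count words, and each word here is 8 bits (width_a/b = 8), so 2 ** MAX_ADDR_BITS already gives the size in bytes.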
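To illustrate the point about word counts, here is a minimal behavioural sketch. It is not the altsyncram megafunction itself, and the module name ram_size_sketch and its ports are made up for illustration. It assumes an 8-bit word, so NUM_WORDS is the size in bytes, and shows a 24 KB memory that needs 15 address lines without filling the whole 32 KB address space:

```verilog
// Hypothetical behavioural model, for illustration only (not altsyncram).
module ram_size_sketch #(
    parameter ADDR_SIZE = 15,          // 15 address lines = [14:0]
    parameter NUM_WORDS = 24 * 1024    // 24576 words of 8 bits = 24 KB
)(
    input  wire                 clk,
    input  wire                 wr_en,
    input  wire [ADDR_SIZE-1:0] addr,
    input  wire [7:0]           data_in,
    output reg  [7:0]           data_out
);

// one array element per 8-bit word, so no divide by 8 is needed
reg [7:0] mem [0:NUM_WORDS-1];

always @(posedge clk) begin
    if (wr_en)
        mem[addr] <= data_in;
    data_out <= mem[addr];             // registered read, returns the old data on a write
end

endmodule
```

Addresses from NUM_WORDS up to 2 ** ADDR_SIZE - 1 have no storage behind them in this model, so nothing should rely on reading or writing them.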
Quote

   // address pass-thru bus (output)
   output reg [19:0] addr_out,   There should be 5 of these, to match the 5 read address inputs (0 through 4).

   // auxiliary read command buses (input)
   input [7:0] aux_read_0,
   input [7:0] aux_read_1,
   input [7:0] aux_read_2,
   input [7:0] aux_read_3,
   input [7:0] aux_read_4,
 change all these to cmd_in[15:0].  (global search and replace)

   // auxiliary read command buses (pass-thru output)
   output reg [7:0] auxRdPT_0,
   output reg [7:0] auxRdPT_1,
   output reg [7:0] auxRdPT_2,
   output reg [7:0] auxRdPT_3,
   output reg [7:0] auxRdPT_4,
change these to cmd_out[15:0]

reg [MAX_ADDR_BITS - 1:0] address_mux; 
change to reg [19:0] address_mux; 

reg [7:0] aux_read_mux;
change to reg [15:0] cmd_read_mux  (global search and replace)

These should all be present and correct now... I think.  Got a little confused earlier with all the changes, so I'll be double-checking it all, but I think it would benefit from a close look.

You're missing a few of the new ports for 'gpu_dual_port_ram_INTEL gpu_RAM(...);'

They should all be present and correct now.  :D

Almost done.  Next you will re-sort the read ram contents and the piped-through address & cmds into their output registers, and sync those to your new delayed 'pc_ena_out[3:0]' coming out of the Intel ram module.

Have made a bit of a start on this - the 5:1 mux code is modified according to my present understanding.  The read address is passed through to the ram module, the pass-through address is passed out to the appropriate address bus according to the current mux step, as is the data read from memory.

I'm a little unsure about the command bus, though.  It's piped into the memory via cmd_read_mux, but that seems like an unnecessary step as I only have one cmd_in bus (and one cmd_out bus) - should these be increased to 5 as well?  It's possible I've misunderstood your instruction to 'change all these to cmd_in[15:0]'...  ???
The 1 command bus is inside the INTEL dual port ram module.  Just like the read addresses, it should be piped through in a single file fashion.
On the multiport GPU ram, there should be 5 groups going in, grouped with the 5 read addresses going in, and 5 grouped 16 bit cmd coming out, just like the 5 read datas, 5 read addresses, 5 cmd_outs, all in parallel...
 
Quote

Note that we forgot to wire the 'pc_ena_out[3:0]' coming out of the Intel ram module through to the multiport_gpu_ram ( ...) ports, so that the rest of our graphics pipe heading to the output pins will incorporate the delay shift generated by the memory.  (Though we could work around this with sophisticated re-syncing of all the ram outputs back to the next pc_ena_in==0 cycle, this ena signal in the FPGA is beginning to drive so much logic that it is limiting our FMAX, so this is an opportune point to D-clock pipe the signals for the second half of our graphics pipe.)

Okay, I think I understand - but pc_ena passes through the gpu_dual_port_ram_INTEL module via a register pipe, which will fulfil the need to D-clock the signal, right?
Pipe it just like a read address and the auxiliary 16 bit cmd, delayed by two 125MHz clocks.  The difference is when it comes back through the GPU multiport ram module, there it is not muxed, it's just wired through without delay.
Quote


gpu_dual_port_ram_INTEL.v:

Code: [Select]
module gpu_dual_port_ram_INTEL (

// inputs
input clk,
input [3:0] pc_ena_in,
input clk_b,
input wr_en_b,
input [19:0] addr_a,
input [19:0] addr_b,
input [7:0] data_in_b,
input [15:0] cmd_in,

// registered outputs
output reg [19:0] addr_out_a,
output reg [3:0] pc_ena_out,
output reg [15:0] cmd_out,

// direct outputs
output wire [7:0] data_out_a,
output wire [7:0] data_out_b

);

// define the maximum address bit
parameter ADDR_SIZE = 14;   // **********************************************************

// define the memory size (number of words) - this allows RAM sizes other than powers of 2
// but defaults to power-of-two sizing based on ADDR_SIZE if not otherwise specified
parameter NUM_WORDS = 2 ** ADDR_SIZE;   // **********************************************************

// define delay pipe registers
reg [19:0] rd_addr_pipe_a;
reg [15:0] cmd_pipe;
reg [3:0] pc_ena_pipe;

// ****************************************************************************************************************************
// Dual-port GPU RAM
//
// Port A - read only by GPU
// Port B - read/writeable by host system
// Data buses - 8 bits / 1 byte wide
// Address buses - ADDR_SIZE wide (14 bits default)
// Memory size - NUM_WORDS words (16384 bytes default)
// ****************************************************************************************************************************
altsyncram altsyncram_component (
.clock0 (clk),
.wren_a (1'b0),   // port A is read-only; tying this high would write data_a (zeros) on every clk
.address_b (addr_b[ADDR_SIZE-1:0]),   // ***************************************************************
.clock1 (clk_b),
.data_b (data_in_b),
.wren_b (wr_en_b),
.address_a (addr_a[ADDR_SIZE-1:0]),   // ****************************************************************************
.data_a (8'b00000000),
.q_a (data_out_a),
.q_b (data_out_b),
.aclr0 (1'b0),
.aclr1 (1'b0),
.addressstall_a (1'b0),
.addressstall_b (1'b0),
.byteena_a (1'b1),
.byteena_b (1'b1),
.clocken0 (1'b1),
.clocken1 (1'b1),
.clocken2 (1'b1),
.clocken3 (1'b1),
.eccstatus (),
.rden_a (1'b1),
.rden_b (1'b1));

defparam
altsyncram_component.address_reg_b = "CLOCK1",
altsyncram_component.clock_enable_input_a = "BYPASS",
altsyncram_component.clock_enable_input_b = "BYPASS",
altsyncram_component.clock_enable_output_a = "BYPASS",
altsyncram_component.clock_enable_output_b = "BYPASS",
altsyncram_component.indata_reg_b = "CLOCK1",
altsyncram_component.init_file = "../osd_mem.mif",
altsyncram_component.intended_device_family = "Cyclone IV E",
altsyncram_component.lpm_type = "altsyncram",
altsyncram_component.numwords_a = NUM_WORDS,
altsyncram_component.numwords_b = NUM_WORDS,
altsyncram_component.operation_mode = "BIDIR_DUAL_PORT",
altsyncram_component.outdata_aclr_a = "NONE",
altsyncram_component.outdata_aclr_b = "NONE",
altsyncram_component.outdata_reg_a = "CLOCK0",
altsyncram_component.outdata_reg_b = "CLOCK1",
altsyncram_component.power_up_uninitialized = "FALSE",
altsyncram_component.read_during_write_mode_port_a = "OLD_DATA",
altsyncram_component.read_during_write_mode_port_b = "OLD_DATA",
altsyncram_component.widthad_a = ADDR_SIZE,  // ********************************************************************
altsyncram_component.widthad_b = ADDR_SIZE,  // *********************************************************************
altsyncram_component.width_a = 8,
altsyncram_component.width_b = 8,
altsyncram_component.width_byteena_a = 1,
altsyncram_component.width_byteena_b = 1,
altsyncram_component.wrcontrol_wraddress_reg_b = "CLOCK1";

// ****************************************************************************************************************************

always @(posedge clk) begin

// **************************************************************************************************************************
// *** Create a serial pipe where the PIPE_DELAY parameter selects the pixel count delay for the xxx_in to the xxx_out ports
// **************************************************************************************************************************
rd_addr_pipe <= addr_a;
addr_out_a <= rd_addr_pipe;

cmd_pipe <= cmd_in;
cmd_out <= cmd_pipe;

pc_ena_pipe <= pc_ena_in;
pc_ena_out <= pc_ena_pipe;
// **************************************************************************************************************************

end

endmodule


multiport_gpu_ram.v:

Code: [Select]
module multiport_gpu_ram (

input clk, // Primary clk input (125 MHz)
input [3:0] pc_ena_in, // Pixel clock enable
input clk_b, // Host (Z80) clock input
input write_ena_b, // Host (Z80) write enable

// address buses (input)
input [19:0] address_0,
input [19:0] address_1,
input [19:0] address_2,
input [19:0] address_3,
input [19:0] address_4,
input [19:0] addr_host,

// auxiliary read command buses (input)
input [15:0] cmd_in,

// outputs
output wire [3:0] pc_ena_out,

// address pass-thru bus (output)
output reg [19:0] addr_passthru_0,
output reg [19:0] addr_passthru_1,
output reg [19:0] addr_passthru_2,
output reg [19:0] addr_passthru_3,
output reg [19:0] addr_passthru_4,
output reg [19:0] addr_host_passthru,

// auxiliary read command bus (pass-thru output)
output reg [15:0] cmd_out,  // *************************************  NEED 5x cmd_out0/1/2/3/4 and we also need 5x cmd_in#

// data buses (output)
output reg [7:0] dataOUT_0,
output reg [7:0] dataOUT_1,
output reg [7:0] dataOUT_2,
output reg [7:0] dataOUT_3,
output reg [7:0] dataOUT_4,
output [7:0] data_host

);

// dual-port GPU RAM handler

// define the maximum address bits - effectively the RAM size
parameter ADDR_SIZE = 14;                 // *******************************************
parameter NUM_WORDS = 2 ** ADDR_SIZE ;                 // *******************************************

reg [19:0] address_mux;
reg [15:0] cmd_read_mux;
wire [19:0] addr_passthru_mux;
wire [7:0] data_mux;

// create a GPU RAM instance
gpu_dual_port_ram_INTEL gpu_RAM(
.clk(clk),
.pc_ena_in(pc_ena_in),
.clk_b(clk_b),
.wr_en_b(write_ena_b),
.addr_a(address_mux),
.addr_b(),
.data_in_b(),
.cmd_in(cmd_read_mux),
.addr_out_a(addr_passthru_mux),
.pc_ena_out(pc_ena_out),
.cmd_out(cmd_out),
.data_out_a(data_mux),
.data_out_b()
);
// pass ADDR_SIZE and NUM_WORDS into the gpu_RAM instance
defparam gpu_RAM.ADDR_SIZE = ADDR_SIZE,    // *************************************************************************
                 gpu_RAM.NUM_WORDS = NUM_WORDS ;  // **************  Actual word count

always @(posedge clk) begin

// route non-muxed pass-throughs
cmd_read_mux <= cmd_in;

// perform 5:1 mux for all inputs to the dual-port RAM
case (pc_ena_in[2:0]) // pc_ena_in counts 0 to 4
3'b000 : begin
address_mux <= address_0;
addr_passthru_0 <= addr_passthru_mux;
dataOUT_0 <= data_mux;
end
3'b001 : begin
address_mux <= address_1;
addr_passthru_1 <= addr_passthru_mux;
dataOUT_1 <= data_mux;
end
3'b010 : begin
address_mux <= address_2;
addr_passthru_2 <= addr_passthru_mux;
dataOUT_2 <= data_mux;
end
3'b011 : begin
address_mux <= address_3;
addr_passthru_3 <= addr_passthru_mux;
dataOUT_3 <= data_mux;
end
3'b100 : begin
address_mux <= address_4;
addr_passthru_4 <= addr_passthru_mux;
dataOUT_4 <= data_mux;
end
endcase

end // always @clk

endmodule


Read all my ********************************************** markers in the 2 code listings above.

To make 1 thing clear, I changed 'MAX_ADDR_BITS' to 'ADDR_SIZE'.  So, 14 = 14 address lines = [13:0]...
Studying the settings you set up in the Megawizard, and analyzing the example dualp...v file it generated, should confirm this.
Same for 'WORDS', I changed it to 'NUM_WORDS'.

Check all the new ***************************** markers, as there were one or two other items...


Next, re-assemble all the outputs of the INTEL dualport ram into 5 addresses, 5 datas, 5 cmds.
Helpful hint:
Since we want all 5 outputs to appear in parallel, each with the right contents when the input (pc_ena[2:0] == 0), and you have a bunch of delays through this module where you can easily lose count of clock cycles, especially if you need to make your mux take 2 or 3 clocks instead of 1 to help improve FMAX, make these localparams and I'll leave it up to you to figure out how to implement them:

localparam   CLK_CYCLES_MUX  = 1;  // Adjust this parameter to the number of 'clk' cycles it takes to select 1 of 5 muxed inputs
localparam   CLK_CYCLES_RAM  = 2;  // Adjust this figure to the number of clock cycles the DP_ram takes to retrieve a valid data from the read address in.

I'm not sure of this one.  We will need to send this parameter back to the OSD generator so it knows how many pixels to delay the H&V ena and OSD ena to align the picture.
localparam   CLK_CYCLES_PIXEL  ??? = ;  // Adjust this figure to the number of PIXEL clock cycles it takes the demuxed output data to be ready.
« Last Edit: November 11, 2019, 11:13:52 am by BrianHG »
 

Offline BrianHG

« Reply #301 on: November 11, 2019, 11:17:02 am »
Use only 1 'defparam' after each module is instantiated.  To set multiple parameters, put a " , " at the end of each parameter and a ' ; ' at the end of the final setting.


When you pass multiple parameters to a sub module, you type it like this:
--------------------------------------------------------------------
// pass ADDR_SIZE and NUM_WORDS into the gpu_RAM instance
defparam gpu_RAM.ADDR_SIZE = ADDR_SIZE, 
                 gpu_RAM.NUM_WORDS = NUM_WORDS ; 
-------------------------------------------------------------------
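As an aside, Verilog-2001 also allows the same overrides to be written inline in the instantiation using named parameter association, which keeps the values next to the instance. A minimal sketch, where child_sketch and parent_sketch are made-up stand-ins for gpu_dual_port_ram_INTEL and multiport_gpu_ram, and the in_range output exists only to give the parameters something to do:

```verilog
// Hypothetical stand-in for the RAM module: only the parameters matter here.
module child_sketch #(
    parameter ADDR_SIZE = 14,
    parameter NUM_WORDS = 2 ** ADDR_SIZE
)(
    input  wire [ADDR_SIZE-1:0] addr,
    output wire                 in_range
);
// high when the address has real storage behind it
assign in_range = (addr < NUM_WORDS);
endmodule

// Hypothetical stand-in for the parent module.
module parent_sketch #(
    parameter ADDR_SIZE = 15,
    parameter NUM_WORDS = 24 * 1024
)(
    input  wire [19:0] addr,
    output wire        in_range
);
// same effect as:
//   defparam u_child.ADDR_SIZE = ADDR_SIZE,
//            u_child.NUM_WORDS = NUM_WORDS;
child_sketch #(
    .ADDR_SIZE (ADDR_SIZE),
    .NUM_WORDS (NUM_WORDS)
) u_child (
    .addr     (addr[ADDR_SIZE-1:0]),
    .in_range (in_range)
);
endmodule
```

Either form should end up with the same parameter values in the sub-module; which one reads better is largely a matter of taste.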
« Last Edit: November 11, 2019, 11:19:31 am by BrianHG »
 

Offline nockieboy (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
« Reply #302 on: November 11, 2019, 11:39:18 am »
The 1 command bus is inside the INTEL dual port ram module.  Just like the read addresses, it should be piped through in a single file fashion.
On the multiport GPU ram, there should be 5 groups going in, grouped with the 5 read addresses going in, and 5 grouped 16 bit cmd coming out, just like the 5 read datas, 5 read addresses, 5 cmd_outs, all in parallel...

All done - I've renamed some of the buses as well to make it clearer what's going on with all the pass-throughs etc.


Pipe it just like a read address and the auxiliary 16 bit cmd, delayed by two 125MHz clocks.  The difference is when it comes back through the GPU multiport ram module, there it is not muxed, it's just wired through without delay.

Sorted.  pc_ena is treated the same way as the other delayed signals in the memory module, but passed straight to the output in multiport_gpu_ram.

Read all my ********************************************** markers in the 2 code listings above.

Thanks - have updated the code accordingly.

Next, re-assemble all the outputs of the INTEL dualport ram into 5 addresses, 5 datas, 5 cmds.
Helpful hint:
Since we want all 5 outputs to appear in parallel, each with the right contents when the input (pc_ena[2:0] == 0)...

Ooookay... so all five outputs should be valid when pc_ena[2:0] == 0?  At the moment, addr_out_0, cmd_out_0, data_out_0 will all be valid two or three clock cycles (at least) before addr_out_1, cmd_out_1 and data_out_1, etc. with the delays compounding up to the 5th set of outputs?  Not to mention pc_ena needing to be delayed as well until the 5th outputs of the mux are ready?

Would it work to just route the results of the first 4 mux cycles into registers and then assign all 5 sets of results to the outputs at the end of the 5th mux cycle?  Am I even understanding the issue?

Currently, the mux code is just putting the results onto the multiport outputs as soon as they come in.

...and you have a bunch of delays through this module where you can easily lose count of clock cycles, especially if you need to make your mux take 2 or 3 clocks instead of 1 to help improve FMAX, make these localparams and I'll leave it up to you to figure out how to implement them:

localparam   CLK_CYCLES_MUX  = 1;  // Adjust this parameter to the number of 'clk' cycles it takes to select 1 of 5 muxed inputs
localparam   CLK_CYCLES_RAM  = 2;  // Adjust this figure to the number of clock cycles the DP_ram takes to retrieve a valid data from the read address in.

I'm not sure of this one.  We will need to send this parameter back to the OSD generator so it knows how many pixels to delay the H&V ena and OSD ena to align the picture.
localparam   CLK_CYCLES_PIXEL  ??? = ;  // Adjust this figure to the number of PIXEL clock cycles it takes the demuxed output data to be ready.

localparams added.  CLK_CYCLES_PIXEL (CLK_CYCLES_PIX in code) needs to be added to the OSD generator code then?

gpu_dual_port_ram_INTEL.v:
Code: [Select]
module gpu_dual_port_ram_INTEL (

// inputs
input clk,
input [3:0] pc_ena_in,
input clk_b,
input wr_en_b,
input [19:0] addr_a,
input [19:0] addr_b,
input [7:0] data_in_b,
input [15:0] cmd_in,

// registered outputs
output reg [19:0] addr_out_a,
output reg [3:0] pc_ena_out,
output reg [15:0] cmd_out,

// direct outputs
output wire [7:0] data_out_a,
output wire [7:0] data_out_b

);

// define the maximum address bit
parameter ADDR_SIZE = 14;

// define the memory size (number of words) - this allows RAM sizes other than powers of 2
// but defaults to power-of-two sizing based on ADDR_SIZE if not otherwise specified
parameter NUM_WORDS = 2 ** ADDR_SIZE;

// define delay pipe registers
reg [19:0] rd_addr_pipe_a;
reg [15:0] cmd_pipe;
reg [3:0] pc_ena_pipe;

// ****************************************************************************************************************************
// Dual-port GPU RAM
//
// Port A - read only by GPU
// Port B - read/writeable by host system
// Data buses - 8 bits / 1 byte wide
// Address buses - ADDR_SIZE wide (14 bits default)
// Memory word size - NUM_WORDS (16384 bytes default)
// ****************************************************************************************************************************
altsyncram altsyncram_component (
.clock0 (clk),
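.wren_a (1'b0),   // port A is read-only; tying this high would write data_a (zeros) on every clk
.address_b (addr_b[ADDR_SIZE-1:0]),   // ADDR_SIZE = 14 means 14 address lines = [13:0]
.clock1 (clk_b),
.data_b (data_in_b),
.wren_b (wr_en_b),
.address_a (addr_a[ADDR_SIZE-1:0]),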
.data_a (8'b00000000),
.q_a (data_out_a),
.q_b (data_out_b),
.aclr0 (1'b0),
.aclr1 (1'b0),
.addressstall_a (1'b0),
.addressstall_b (1'b0),
.byteena_a (1'b1),
.byteena_b (1'b1),
.clocken0 (1'b1),
.clocken1 (1'b1),
.clocken2 (1'b1),
.clocken3 (1'b1),
.eccstatus (),
.rden_a (1'b1),
.rden_b (1'b1));

defparam
altsyncram_component.address_reg_b = "CLOCK1",
altsyncram_component.clock_enable_input_a = "BYPASS",
altsyncram_component.clock_enable_input_b = "BYPASS",
altsyncram_component.clock_enable_output_a = "BYPASS",
altsyncram_component.clock_enable_output_b = "BYPASS",
altsyncram_component.indata_reg_b = "CLOCK1",
altsyncram_component.init_file = "../osd_mem.mif",
altsyncram_component.intended_device_family = "Cyclone IV E",
altsyncram_component.lpm_type = "altsyncram",
altsyncram_component.numwords_a = NUM_WORDS,
altsyncram_component.numwords_b = NUM_WORDS,
altsyncram_component.operation_mode = "BIDIR_DUAL_PORT",
altsyncram_component.outdata_aclr_a = "NONE",
altsyncram_component.outdata_aclr_b = "NONE",
altsyncram_component.outdata_reg_a = "CLOCK0",
altsyncram_component.outdata_reg_b = "CLOCK1",
altsyncram_component.power_up_uninitialized = "FALSE",
altsyncram_component.read_during_write_mode_port_a = "OLD_DATA",
altsyncram_component.read_during_write_mode_port_b = "OLD_DATA",
altsyncram_component.widthad_a = ADDR_SIZE,   // address width in bits, not the top bit index
altsyncram_component.widthad_b = ADDR_SIZE,
altsyncram_component.width_a = 8,
altsyncram_component.width_b = 8,
altsyncram_component.width_byteena_a = 1,
altsyncram_component.width_byteena_b = 1,
altsyncram_component.wrcontrol_wraddress_reg_b = "CLOCK1";

// ****************************************************************************************************************************

always @(posedge clk) begin

// **************************************************************************************************************************
// *** Create a serial pipe where the PIPE_DELAY parameter selects the pixel count delay for the xxx_in to the xxx_out ports
// **************************************************************************************************************************
rd_addr_pipe <= addr_a;
addr_out_a <= rd_addr_pipe;

cmd_pipe <= cmd_in;
cmd_out <= cmd_pipe;

pc_ena_pipe <= pc_ena_in;
pc_ena_out <= pc_ena_pipe;
// **************************************************************************************************************************

end

endmodule


multiport_gpu_ram.v:
Code: [Select]
module multiport_gpu_ram (

input clk, // Primary clk input (125 MHz)
input [3:0] pc_ena_in, // Pixel clock enable
input clk_b, // Host (Z80) clock input
input write_ena_b, // Host (Z80) write enable

// address buses (input)
input [19:0] addr_in_0,
input [19:0] addr_in_1,
input [19:0] addr_in_2,
input [19:0] addr_in_3,
input [19:0] addr_in_4,
input [19:0] addr_host_in,

// auxiliary read command buses (input)
input [15:0] cmd_in_0,
input [15:0] cmd_in_1,
input [15:0] cmd_in_2,
input [15:0] cmd_in_3,
input [15:0] cmd_in_4,

// outputs
output wire [3:0] pc_ena_out,

// address pass-thru bus (output)
output reg [19:0] addr_out_0,
output reg [19:0] addr_out_1,
output reg [19:0] addr_out_2,
output reg [19:0] addr_out_3,
output reg [19:0] addr_out_4,
output reg [19:0] addr_host_out,

// auxiliary read command bus (pass-thru output)
output reg [15:0] cmd_out_0,
output reg [15:0] cmd_out_1,
output reg [15:0] cmd_out_2,
output reg [15:0] cmd_out_3,
output reg [15:0] cmd_out_4,

// data buses (output)
output reg [7:0] data_out_0,
output reg [7:0] data_out_1,
output reg [7:0] data_out_2,
output reg [7:0] data_out_3,
output reg [7:0] data_out_4,
output [7:0] data_host_out

);

// dual-port GPU RAM handler

// define the maximum address bits and number of words - effectively the RAM size
parameter ADDR_SIZE = 14;
parameter NUM_WORDS = 2 ** ADDR_SIZE;

localparam CLK_CYCLES_MUX = 1; // adjust this parameter to the number of 'clk' cycles it takes to select 1 of 5 muxed outputs
localparam CLK_CYCLES_RAM = 2; // adjust this figure to the number of clock cycles the DP_ram takes to retrieve valid data from the read address in
localparam CLK_CYCLES_PIX = 5; // adjust this figure to the number of PIXEL clock cycles it takes the demuxed output data to be ready

reg [19:0] addr_in_mux;
reg [15:0] cmd_mux_in;
reg [15:0] cmd_mux_out;
wire [19:0] addr_out_mux;
wire [7:0] data_mux_out;

// create a GPU RAM instance
gpu_dual_port_ram_INTEL gpu_RAM(
.clk(clk),
.pc_ena_in(pc_ena_in),
.clk_b(clk_b),
.wr_en_b(write_ena_b),
.addr_a(addr_in_mux),
.addr_b(),
.data_in_b(),
.cmd_in(cmd_mux_in),
.addr_out_a(addr_out_mux),
.pc_ena_out(pc_ena_out),
.cmd_out(cmd_mux_out),
.data_out_a(data_mux_out),
.data_out_b()
);

defparam gpu_RAM.ADDR_SIZE = ADDR_SIZE, // pass ADDR_SIZE into the gpu_RAM instance
gpu_RAM.NUM_WORDS = NUM_WORDS; // set non-default word size for the RAM (16 KB)

always @(posedge clk) begin

// perform 5:1 mux for all inputs to the dual-port RAM
case (pc_ena_in[2:0]) // pc_ena_in counts 0 to 4
3'b000 : begin
addr_in_mux <= addr_in_0;
cmd_mux_in <= cmd_in_0;
addr_out_0 <= addr_out_mux;
cmd_out_0 <= cmd_mux_out;
data_out_0 <= data_mux_out;
end
3'b001 : begin
addr_in_mux <= addr_in_1;
cmd_mux_in <= cmd_in_1;
addr_out_1 <= addr_out_mux;
cmd_out_1 <= cmd_mux_out;
data_out_1 <= data_mux_out;
end
3'b010 : begin
addr_in_mux <= addr_in_2;
cmd_mux_in <= cmd_in_2;
addr_out_2 <= addr_out_mux;
cmd_out_2 <= cmd_mux_out;
data_out_2 <= data_mux_out;
end
3'b011 : begin
addr_in_mux <= addr_in_3;
cmd_mux_in <= cmd_in_3;
addr_out_3 <= addr_out_mux;
cmd_out_3 <= cmd_mux_out;
data_out_3 <= data_mux_out;
end
3'b100 : begin
addr_in_mux <= addr_in_4;
cmd_mux_in <= cmd_in_4;
addr_out_4 <= addr_out_mux;
cmd_out_4 <= cmd_mux_out;
data_out_4 <= data_mux_out;
end
endcase

end // always @clk

endmodule

 

Offline BrianHG

« Reply #303 on: November 11, 2019, 12:04:28 pm »
Ooookay... so all five outputs should be valid when pc_ena[2:0] == 0?  At the moment, addr_out_0, cmd_out_0, data_out_0 will all be valid two or three clock cycles (at least) before addr_out_1, cmd_out_1 and data_out_1, etc. with the delays compounding up to the 5th set of outputs?  Not to mention pc_ena needing to be delayed as well until the 5th outputs of the mux are ready?

Would it work to just route the results of the first 4 mux cycles into registers and then assign all 5 sets of results to the outputs at the end of the 5th mux cycle?  Am I even understanding the issue?

Currently, the mux code is just putting the results onto the multiport outputs as soon as they come in.

Ok, here lies the trick/headache (at least until you figure out how to do it).
Yes, the way you have it written, the outputs are scrambled into the wrong demuxed ports.

Also, remember, I said I want all the demuxed outputs from the gpu_ram module to all become properly valid during the next valid pixel clock (pc_ena[3:0]==0) time slot, and, to hold their contents and all switch during the next valid pixel clock once again.

As an aid, I would move all the :

            addr_out_# <= addr_out_mux;
            cmd_out_# <= cmd_mux_out;
            data_out_# <= data_mux_out;

Outside the case statement.

Here is the hint#1 :
------------------------------------
localparam   CLK_CYCLES_MUX  = 1;  // Adjust this parameter to the number of 'clk' cycles it takes to select 1 of 5 muxed inputs
localparam   CLK_CYCLES_RAM  = 2;  // Adjust this figure to the number of clock cycles the DP_ram takes to retrieve a valid data from the read address in.
-------------------------------------
Additional variable-sized pipe registers will be used to generate the desired output.


Here is hint #2:   (make a bus version of this single bit/wire example)
-----------------------
bla_pipe[0]   <= bla_in;
bla_pipe[7:1] <= bla_pipe[6:0];
out           <= bla_pipe[PIPE_DELAY-2];
-----------------------------------
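One possible bus version of hint #2, as a sketch only: the module name, the port names, WIDTH and DEPTH are assumptions made for illustration, DEPTH is assumed to be at least 2, and PIPE_DELAY is assumed to lie between 2 and DEPTH+1. Each word in the pipe is WIDTH bits wide and the output tap is picked with an indexed part-select:

```verilog
// Hypothetical bus-wide delay pipe, modelled on the single-bit hint above.
module bus_delay_pipe_sketch #(
    parameter WIDTH      = 8,   // bits per word (8 for RAM data, 16 for cmd, 20 for addr)
    parameter DEPTH      = 8,   // number of words held in the pipe, at least 2
    parameter PIPE_DELAY = 4    // register stages from bus_in to bus_out, 2 .. DEPTH+1
)(
    input  wire             clk,
    input  wire [WIDTH-1:0] bus_in,
    output reg  [WIDTH-1:0] bus_out
);

reg [DEPTH*WIDTH-1:0] bus_pipe;

always @(posedge clk) begin
    // word 0 takes the new input
    bus_pipe[WIDTH-1:0]           <= bus_in;
    // every other word takes the word below it (shift up by one word)
    bus_pipe[DEPTH*WIDTH-1:WIDTH] <= bus_pipe[(DEPTH-1)*WIDTH-1:0];
    // output register taps word PIPE_DELAY-2
    bus_out                       <= bus_pipe[(PIPE_DELAY-2)*WIDTH +: WIDTH];
end

endmodule
```

Changing PIPE_DELAY then moves the tap without touching the shift logic, which seems to be the point of the hint.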


« Last Edit: November 11, 2019, 12:08:28 pm by BrianHG »
 

Offline nockieboy (Topic starter)

« Reply #304 on: November 11, 2019, 12:27:23 pm »
Ok, here lies the trick/headache (at least until you figure out how to do it).
Yes, the way you have it written, the outputs are scrambled into the wrong demuxed ports.

Also, remember, I said I want all the demuxed outputs from the gpu_ram module to all become properly valid during the next valid pixel clock (pc_ena[3:0]==0) time slot, and, to hold their contents and all switch during the next valid pixel clock once again.

Right, so I'm now assigning the data returned from the ram module into a register that will hold that data, for each section of the mux.  Outside of the case statement, I've got a check for pc_ena[3:0] == 0 - when that condition is satisfied, all five data paths from the ram module are passed to the multiport's output IOs:

Code: [Select]
// declare registers to hold data until pc_ena[3:0] == 0 and
// it can be passed to the output IOs
reg [19:0] addr_buf_out_0,
addr_buf_out_1,
addr_buf_out_2,
addr_buf_out_3,
addr_buf_out_4;

reg [15:0] cmd_buf_out_0,
cmd_buf_out_1,
cmd_buf_out_2,
cmd_buf_out_3,
cmd_buf_out_4;

reg [7:0] data_buf_out_0,
data_buf_out_1,
data_buf_out_2,
data_buf_out_3,
data_buf_out_4;

always @(posedge clk) begin

// perform 5:1 mux for all inputs to the dual-port RAM
case (pc_ena_in[2:0]) // pc_ena_in counts 0 to 4
3'b000 : begin
addr_in_mux <= addr_in_0;
cmd_mux_in <= cmd_in_0;
addr_buf_out_0 <= addr_out_mux;
cmd_buf_out_0 <= cmd_mux_out;
data_buf_out_0 <= data_mux_out;
end
3'b001 : begin
addr_in_mux <= addr_in_1;
cmd_mux_in <= cmd_in_1;
addr_buf_out_1 <= addr_out_mux;
cmd_buf_out_1 <= cmd_mux_out;
data_buf_out_1 <= data_mux_out;
end
3'b010 : begin
addr_in_mux <= addr_in_2;
cmd_mux_in <= cmd_in_2;
addr_buf_out_2 <= addr_out_mux;
cmd_buf_out_2 <= cmd_mux_out;
data_buf_out_2 <= data_mux_out;
end
3'b011 : begin
addr_in_mux <= addr_in_3;
cmd_mux_in <= cmd_in_3;
addr_buf_out_3 <= addr_out_mux;
cmd_buf_out_3 <= cmd_mux_out;
data_buf_out_3 <= data_mux_out;
end
3'b100 : begin
addr_in_mux <= addr_in_4;
cmd_mux_in <= cmd_in_4;
addr_buf_out_4 <= addr_out_mux;
cmd_buf_out_4 <= cmd_mux_out;
data_buf_out_4 <= data_mux_out;
end
endcase

if (pc_ena_in[3:0] == 0)
begin
addr_out_0 <= addr_buf_out_0;
cmd_out_0 <= cmd_buf_out_0;
data_out_0 <= data_buf_out_0;

addr_out_1 <= addr_buf_out_1;
cmd_out_1 <= cmd_buf_out_1;
data_out_1 <= data_buf_out_1;

addr_out_2 <= addr_buf_out_2;
cmd_out_2 <= cmd_buf_out_2;
data_out_2 <= data_buf_out_2;

addr_out_3 <= addr_buf_out_3;
cmd_out_3 <= cmd_buf_out_3;
data_out_3 <= data_buf_out_3;

addr_out_4 <= addr_buf_out_4;
cmd_out_4 <= cmd_buf_out_4;
data_out_4 <= data_buf_out_4;
end

end // always @clk

Is that going down the right road?  Obviously I haven't given any consideration to pipe delays yet...

Here is the hint#1 :
------------------------------------
localparam   CLK_CYCLES_MUX  = 1;  // Adjust this parameter to the number of 'clk' cycles it takes to select 1 of 5 muxed inputs
localparam   CLK_CYCLES_RAM  = 2;  // Adjust this figure to the number of clock cycles the DP_ram takes to retrieve a valid data from the read address in.
-------------------------------------
Additional variable-sized pipe registers will be used to generate the desired output.


Here is hint #2:   (make a bus version of this single bit/wire example)
-----------------------
bla_pipe[0]   <= bla_in;
bla_pipe[7:1] <= bla_pipe[6:0];
out           <= bla_pipe[PIPE_DELAY-2];
-----------------------------------

Just run this part by me once more - why do I need more delays?  If what I've done above is correct, the results are held in registers until pc_ena[3:0] == 0, where they all get passed out to the output IOs.  What are the additional delay pipes for?
 

Offline BrianHG

« Reply #305 on: November 11, 2019, 01:11:55 pm »
Ok, not quite.  Your outputs would still be scrambled, not only that, but, now, some of your outputs will be ahead by 1 pixel and the rest will have the correct pixel.

What you have setup has no configuration/adjustment capabilities.

Currently, at pc_ena 0, you may be switching to addr0, but the memory is seeing the previous addr4, while what is coming out of the ram is the previous addr2. but, you are currently snapping that into addr_buf0/data_buf0, and then feeding the previous  holdings of every addr_buf#/data_buf# into addr_out#/data_out#.

At pc_ena 1, you may be switching to addr1, but the memory is seeing the previous addr0, while what is coming out of the ram is the previous addr3. and are currently snapping that into addr_buf1/data_buf1, though the addr_out#/data_out# all hold their new state from when pc_ena0 phase has updated them.

Now, you can see the headache here.  Say you weed out everything as set the delays fixed and you code appears to work good.  What happens after adding some features, you might need to expand your mux for speed by making it collapse in 3 clocks stages instead of the current 1 clock.  What will sorting out the mess look like?

Step 1: Get rid of any and all demuxing in the case statement.
Step 2: Make an easily variable clock delay pipe tap coming out of the gpu_ram.

Example:
reg [9*8+7:0]  data_pipe;   // make a large enough register to store 10 words (words 0 through 9).  In the case of ram data, the width of each word is 8 bits.

Now, inside the always @(posedge clk) place

data_pipe[7:0]                <= data_mux_out[7:0];         // fill the first 8 bit word in the register pipe
data_pipe[9*8+7:1*8]  <= data_pipe[8*8+7:0*8];    // shift over the next 9 words in this 10 word 8 bit wide pipe


if (pc_ena[3:0] == 0)
   begin
   data_out_0 <= data_pipe[MUX_0_POS*8+7:MUX_0_POS*8];
   data_out_1 <= data_pipe[MUX_1_POS*8+7:MUX_1_POS*8];
   data_out_2 <= data_pipe[MUX_2_POS*8+7:MUX_2_POS*8];
   data_out_3 <= data_pipe[MUX_3_POS*8+7:MUX_3_POS*8];
   data_out_4 <= data_pipe[MUX_4_POS*8+7:MUX_4_POS*8];

....
end

Now, the key is to work out the values for the 5 'MUX_#_POS', which are fixed values based on the delays incurred by both CLK_CYCLES_MUX = 1 and CLK_CYCLES_RAM = 2, and, the tricky one, on where each piece of data is in the pipe when the next pc_ena==0 comes around, so that all 5 output regs take the correct pipe position.

Resolve MUX_#_POS using the 2 CLK_CYCLES parameters with the knowledge that pc_ena[] counts from 0-4.  In the future, if you make any changes which add additional clocks in the preparation for the memory data, or change the number of clock cycles in the FPGA memory core, or if you wire in external static memory, just adjusting the 'CLK_CYCLES' parameters will allow your design to continue to function without re-doing all the individually written hardwired logic.

Calculating the MUX_#_POS may also mean adding a full additional pixel delay if one or more pixels come in earlier than the rest, or if 1 pixel comes in later than the rest.
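
A small aside on notation, offered as a sketch rather than as part of the design above: Verilog-2001 also has an indexed part-select, [base +: width], which picks the same 8-bit word without writing the word index twice.  The module name pipe_tap_sketch and the TAP_POS parameter are made up for illustration; data_mux_out, data_pipe and the 10-word pipe follow the example above.

```verilog
// Hypothetical stand-alone module; only the data_pipe idea comes from the post above.
module pipe_tap_sketch #(
    parameter TAP_POS = 6               // assumed tap position (word index 0-9)
)(
    input  wire       clk,
    input  wire [7:0] data_mux_out,     // byte coming out of the RAM
    output reg  [7:0] data_tap          // byte found TAP_POS words down the pipe
);

    reg [9*8+7:0] data_pipe;            // 10 words of 8 bits each

    always @(posedge clk) begin
        data_pipe[7:0]       <= data_mux_out;          // newest word enters at word 0
        data_pipe[9*8+7:1*8] <= data_pipe[8*8+7:0*8];  // older words move up by one

        // Selects the same bits as data_pipe[TAP_POS*8+7 : TAP_POS*8]
        data_tap <= data_pipe[TAP_POS*8 +: 8];
    end

endmodule
```

Either spelling should synthesise to the same logic; the +: form is just harder to mistype when the word width changes.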
« Last Edit: November 11, 2019, 01:50:20 pm by BrianHG »
 

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: FPGA VGA Controller for 8-bit computer
« Reply #306 on: November 11, 2019, 01:57:37 pm »
Ok, not quite.  Your outputs would still be scrambled.  Not only that, but now some of your outputs will be ahead by 1 pixel while the rest will have the correct pixel.

What you have set up has no configuration/adjustment capabilities.

Currently, at pc_ena 0, you may be switching to addr0, but the memory is seeing the previous addr4, while what is coming out of the ram is the previous addr2.  Yet you are snapping that into addr_buf0/data_buf0, and then feeding the previous holdings of every addr_buf#/data_buf# into addr_out#/data_out#.

At pc_ena 1, you may be switching to addr1, but the memory is seeing the previous addr0, while what is coming out of the ram is the previous addr3.  You are snapping that into addr_buf1/data_buf1, though the addr_out#/data_out# all hold their new state from when the pc_ena 0 phase updated them.

Now, you can see the headache here.  Say you weed everything out, set the delays as fixed values, and your code appears to work well.  What happens when, after adding some features, you need to expand your mux for speed by making it collapse in 3 clock stages instead of the current 1 clock?  What will sorting out the mess look like?

My mind has just melted...  :scared:

Step 1: Get rid of any and all demuxing in the case statement.

Like so?

Code: [Select]
always @(posedge clk) begin

// perform 5:1 mux for all inputs to the dual-port RAM
case (pc_ena[2:0])
3'b000 : begin
addr_in_mux <= addr_in_0;
cmd_mux_in <= cmd_in_0;
end
3'b001 : begin
addr_in_mux <= addr_in_1;
cmd_mux_in <= cmd_in_1;
end
3'b010 : begin   // pc_ena counts 0-4, so port 2 is selected at count 2
addr_in_mux <= addr_in_2;
cmd_mux_in <= cmd_in_2;
end
3'b011 : begin
addr_in_mux <= addr_in_3;
cmd_mux_in <= cmd_in_3;
end
3'b100 : begin
addr_in_mux <= addr_in_4;
cmd_mux_in <= cmd_in_4;
end
endcase

end // always @clk

Step 2: Make an easily variable clock delay pipe tap coming out of the gpu_ram.

Example:
reg [9*8+7:0]  data_pipe;   // make a large enough register to store 10 words (words 0 through 9).  In the case of ram data, the width of each word is 8 bits.

Now, inside the always @(posedge clk) place

data_pipe[7:0]                <= data_mux_out[7:0];         // fill the first 8 bit word in the register pipe
data_pipe[9*8+7:1*8]  <= data_pipe[8*8+7:0*8];    // shift over the next 9 words in this 10 word 8 bit wide pipe


if (pc_ena[3:0] == 0)
   begin
   data_out_0 <= data_pipe[MUX_0_POS*8+7:MUX_0_POS*8];
   data_out_1 <= data_pipe[MUX_1_POS*8+7:MUX_1_POS*8];
   data_out_2 <= data_pipe[MUX_2_POS*8+7:MUX_2_POS*8];
   data_out_3 <= data_pipe[MUX_3_POS*8+7:MUX_3_POS*8];
   data_out_4 <= data_pipe[MUX_4_POS*8+7:MUX_4_POS*8];

....
end

Okay, so we're feeding the data from the RAM into a pipeline which, at the start when pc_ena=0, is passed out to the data outputs and is bit-shifted up the pipe by 8 places on each clk count?

Now, the key is to work out the values for the 5 'MUX_#_POS', which are fixed values based on the delays incurred by both CLK_CYCLES_MUX = 1 and CLK_CYCLES_RAM = 2, and, the tricky one, on where each piece of data is in the pipe when the next pc_ena==0 comes around, so that all 5 output regs take the correct pipe position.

Errrrrr..... I've just had to cold-reset my brain twice.  I have literally lost myself in all this and am almost throwing code at the wall to see what sticks...:

Code: [Select]
localparam MUX_0_POS = (CLK_CYCLES_RAM + CLK_CYCLES_MUX) * 1;
localparam MUX_1_POS = (CLK_CYCLES_RAM + CLK_CYCLES_MUX) * 2;
localparam MUX_2_POS = (CLK_CYCLES_RAM + CLK_CYCLES_MUX) * 3;
localparam MUX_3_POS = (CLK_CYCLES_RAM + CLK_CYCLES_MUX) * 4;
localparam MUX_4_POS = (CLK_CYCLES_RAM + CLK_CYCLES_MUX) * 5;

always @(posedge clk) begin

data_pipe[7:0] <= data_mux_out[7:0]; // fill the first 8-bit word in the register pipe with data from RAM
data_pipe[9*8+7:1*8] <= data_pipe[8*8+7:0*8]; // shift over the next 9 words in this 10 word, 8-bit wide pipe

if (pc_ena[3:0] == 0)
begin
data_out_0 <= data_pipe[MUX_0_POS*8+7:MUX_0_POS*8];
data_out_1 <= data_pipe[MUX_1_POS*8+7:MUX_1_POS*8];
data_out_2 <= data_pipe[MUX_2_POS*8+7:MUX_2_POS*8];
data_out_3 <= data_pipe[MUX_3_POS*8+7:MUX_3_POS*8];
data_out_4 <= data_pipe[MUX_4_POS*8+7:MUX_4_POS*8];
end

Resolve MUX_#_POS using the 2 CLK_CYCLES parameters with the knowledge that pc_ena[] counts from 0-4.  In the future, if you make any changes which add additional clocks in the preparation for the memory data, or change the number of clock cycles in the FPGA memory core, or if you wire in external static memory, just adjusting the 'CLK_CYCLES' parameters will allow your design to continue to function without re-doing all the individually written hardwired logic.

Calculating the MUX_#_POS may also mean adding a full additional pixel delay if one or more pixels come in earlier than the rest, or if 1 pixel comes in later than the rest.

Yeah, I'm going to have to walk away and come back to this later in the hope that I can get my head around it from a different angle, because this just isn't going in.   :-\
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7747
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #307 on: November 11, 2019, 02:25:57 pm »
Yes, the first part you got right.  Yes, the data_pipe connected to the output of the ram module clocks at 125MHz nonstop, just like the memory itself, and the 5 cycled address going into the ram.

Yeah, I'm going to have to walk away and come back to this later in the hope that I can get my head around it from a different angle, because this just isn't going in.   :-\

Ok, this should really help, however, you will still need to concentrate a little and print out my little 2 tables on some paper to make what's happening clear.

Lay out these 2 tables on 2 sheets of paper, one side list each reg in the pipe in order, and on the second is a list of how the muxed addresses and screen pixel number is being fed through:

On paper page #1
-------------------------------------------
into data_pipe[8*8+7:8*8]
into data_pipe[7*8+7:7*8]
into data_pipe[6*8+7:6*8]
into data_pipe[5*8+7:5*8]
into data_pipe[4*8+7:4*8]
into data_pipe[3*8+7:3*8]
into data_pipe[2*8+7:2*8]
into data_pipe[1*8+7:1*8]
into data_pipe[0*8+7:0*8]
inside ram 2
inside ram 1
addr to addr-mux
PC_ENA pos0
-------------------------------------------

On paper #2

----------------------------------------

pc_ena 0 pixel 0
pc_ena 1 pixel 0
pc_ena 2 pixel 0
pc_ena 3 pixel 0
pc_ena 4 pixel 0
pc_ena 0 pixel 1
pc_ena 1 pixel 1
pc_ena 2 pixel 1
pc_ena 3 pixel 1
pc_ena 4 pixel 1
pc_ena 0 pixel 2
pc_ena 1 pixel 2
pc_ena 2 pixel 2
pc_ena 3 pixel 2
pc_ena 4 pixel 2
pc_ena 0 pixel 3
pc_ena 1 pixel 3
pc_ena 2 pixel 3
pc_ena 3 pixel 3
pc_ena 4 pixel 3

------------------------------------

Slide paper #2 vertically across paper #1, 5 steps at a time, (paper 2 starts at the bottom, then you slide it vertically upwards) to see where each data is in your output pipe.  Remember, the numbers in red which all have the same pixel number in green are your 5 correct functional MUX_#_POS.  (Use the smallest possible valid pipe).  And every time you shift the paper vertically 5 steps, the next 5 pixels should line up to the same 5 MUX_#_POS.

Subtracting the pixel number at the reg output from the pixel number on the bottom line of page 1, plus 1, gives you the total number of 25MHz screen pixels it takes from address# in on your gpu multiport ram to data out#.

I hope this helps.  Note that this is the most difficult part of your project.  There may be a few simpler tricks to do this, but, this offers maximum flexibility and upgrade paths as the 5 ports can address anything and all results are given in parallel.


Remember, the 5 'MUX_#_POS' are 5 localparams you will set.
The 5 localparams are identical for the addr_out# & cmd_out# pipe selection.  It's only the *8 which will change to *20 for the address_pipe reg and *16 for the cmd_pipe reg.
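
Since only the word width changes between the three pipes, one way to avoid hand-writing the *8, *20 and *16 variants is a small parameterised module.  This is only a sketch: the module name tap5_pipe, its ports and the latch input (assumed to be high on the pc_ena == 0 clock) are invented here, and the default positions are placeholders, not the worked-out MUX_#_POS values.

```verilog
// Hypothetical helper: one delay pipe with 5 taps, usable for any word width.
module tap5_pipe #(
    parameter WIDTH = 8,                 // 8 for data, 20 for address, 16 for cmd
    parameter DEPTH = 10,                // number of words held in the pipe
    parameter POS_0 = 4, POS_1 = 3, POS_2 = 2, POS_3 = 1, POS_4 = 0  // placeholders
)(
    input  wire             clk,
    input  wire             latch,       // assumed high on the pc_ena == 0 clock
    input  wire [WIDTH-1:0] d,           // word coming out of the RAM side
    output reg  [WIDTH-1:0] q0, q1, q2, q3, q4
);

    reg [DEPTH*WIDTH-1:0] pipe;

    always @(posedge clk) begin
        // shift every word up by one position, newest word enters at word 0
        pipe <= {pipe[(DEPTH-1)*WIDTH-1:0], d};

        if (latch) begin
            q0 <= pipe[POS_0*WIDTH +: WIDTH];
            q1 <= pipe[POS_1*WIDTH +: WIDTH];
            q2 <= pipe[POS_2*WIDTH +: WIDTH];
            q3 <= pipe[POS_3*WIDTH +: WIDTH];
            q4 <= pipe[POS_4*WIDTH +: WIDTH];
        end
    end

endmodule
```

Three instances with WIDTH set to 8, 20 and 16 and the same five POS values would then cover the data, address and cmd pipes.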

« Last Edit: November 11, 2019, 02:43:43 pm by BrianHG »
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7747
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #308 on: November 11, 2019, 03:03:42 pm »
To the more advanced users out there: yes, all the 5 channel memory buses and shift pipes could be compacted down to a single reg in a single line with just the right expression within two brackets [], latching, shifting, muxing and demuxing the entire results.  For example, for the 5 data bytes output register, all 40 bits of the grouped bus may be latched in 1 line with the right 8 bit (1 word) offset from the source pipe.  This could make the multi-port-ram module nothing more than 3 lines of code for the 3 muxed bus inputs, and another 3 for the 3 bus outputs.  The same goes for the 5x20bit address, meaning a 100 bit address input and 100 bits out with the right 20 bit offset when latching the entire demux pipe.

However, this GPU project is an exercise for beginners and having me just dictate all these compact tricks to tap memory in a sequence pipe, aligning it to a multiphase system clock would not explain:

What I have done.
How it works.
Why it works.
Why I did it that way.
How can I design my own version?
What can I use it for?
Where should I use it?

Well, you should get the idea.
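
For the curious, here is one reading of the "latch all 40 bits in 1 line" remark above, as a sketch only; the compact version itself is not shown in this post.  It assumes the five tap positions are consecutive, with port 4 at the lowest word, and the module name, the latch input and the data_out_bus output are invented for illustration.

```verilog
// Hypothetical sketch of latching five consecutive pipe words as one 40-bit bus.
module grouped_latch_sketch #(
    parameter MUX_4_POS = 2              // assumed lowest of 5 consecutive tap positions
)(
    input  wire        clk,
    input  wire        latch,            // assumed high on the pc_ena == 0 clock
    input  wire [7:0]  data_mux_out,     // byte coming out of the RAM
    output reg  [39:0] data_out_bus      // {port 0, port 1, port 2, port 3, port 4}
);

    reg [79:0] data_pipe;                // 10 words of 8 bits each

    always @(posedge clk) begin
        data_pipe <= {data_pipe[71:0], data_mux_out};   // shift up one word

        // One statement grabs words MUX_4_POS .. MUX_4_POS+4.
        // Port 4 sits in the lowest word, so it lands in bits [7:0];
        // port 0 sits four words higher and lands in bits [39:32].
        if (latch)
            data_out_bus <= data_pipe[MUX_4_POS*8 +: 40];
    end

endmodule
```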
 

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: FPGA VGA Controller for 8-bit computer
« Reply #309 on: November 11, 2019, 05:29:01 pm »
Ok, this should really help, however, you will still need to concentrate a little and print out my little 2 tables on some paper to make what's happening clear.

One empty printer cartridge and sore wrist later...  ;D

Slide paper #2 vertically across paper #1, 5 steps at a time, (paper 2 starts at the bottom, then you slide it vertically upwards) to see where each data is in your output pipe.

Okay, so I'm starting with paper #2 below #1, so the first step brings pc_ena 0 pixel 0 into pc_ena pos 0.

Remember, the numbers in red which all have the same pixel number in green are your 5 correct functional MUX_#_POS.  (Use the smallest possible valid pipe).  And every time you shift the paper vertically 5 steps, the next 5 pixels should line up to the same 5 MUX_#_POS.

So after 10 steps, I get this:

into data_pipe [5*8+7:5*8] <= pc_ena 0 pixel 0
into data_pipe [4*8+7:4*8] <= pc_ena 1 pixel 0
into data_pipe [3*8+7:3*8] <= pc_ena 2 pixel 0
into data_pipe [2*8+7:2*8] <= pc_ena 3 pixel 0
into data_pipe [1*8+7:1*8] <= pc_ena 4 pixel 0
into data_pipe [0*8+7:0*8] <= pc_ena 0 pixel 1
inside ram 2                        <= pc_ena 1 pixel 1
inside ram 1                        <= pc_ena 2 pixel 1
addr to addr_mux                <= pc_ena 3 pixel 1
pc_ena pos 0                       <= pc_ena 4 pixel 1

So the five correct functional values are:
5,
4,
3,
2,
1??

Subtracting the pixel number at the reg output from the pixel number on the bottom line of page 1, plus 1, gives you the total number of 25MHz screen pixels it takes from address# in on your gpu multiport ram to data out#.

2 pixels then?  ???

I hope this helps.  Note that this is the most difficult part of your project.  There may be a few simpler tricks to do this, but, this offers maximum flexibility and upgrade paths as the 5 ports can address anything and all results are given in parallel.

That makes me feel so much better knowing this is the most difficult part.   :o ;D

Remember, the 5 'MUX_#_POS' are 5 localparams you will set.
The 5 localparams are identical for the addr_out# & cmd_out# pipe selection.  It's only the *8 which will change to *20 for the address_pipe reg and *16 for the cmd_pipe reg.

So my code looks like this:

Code: [Select]
module multiport_gpu_ram (

input clk, // Primary clk input (125 MHz)
input [3:0] pc_ena_in, // Pixel clock enable
input clk_b, // Host (Z80) clock input
input write_ena_b, // Host (Z80) clock enable

// address buses (input)
input [19:0] addr_in_0,
input [19:0] addr_in_1,
input [19:0] addr_in_2,
input [19:0] addr_in_3,
input [19:0] addr_in_4,
input [19:0] addr_host_in,

// auxilliary read command buses (input)
input [15:0] cmd_in_0,
input [15:0] cmd_in_1,
input [15:0] cmd_in_2,
input [15:0] cmd_in_3,
input [15:0] cmd_in_4,

// outputs
output wire [3:0] pc_ena_out,

// address pass-thru bus (output)
output reg [19:0] addr_out_0,
output reg [19:0] addr_out_1,
output reg [19:0] addr_out_2,
output reg [19:0] addr_out_3,
output reg [19:0] addr_out_4,
output reg [19:0] addr_host_out,

// auxilliary read command bus (pass-thru output)
output reg [15:0] cmd_out_0,
output reg [15:0] cmd_out_1,
output reg [15:0] cmd_out_2,
output reg [15:0] cmd_out_3,
output reg [15:0] cmd_out_4,

// data buses (output)
output reg [7:0] data_out_0,
output reg [7:0] data_out_1,
output reg [7:0] data_out_2,
output reg [7:0] data_out_3,
output reg [7:0] data_out_4,
output [7:0] data_host_out

);

// dual-port GPU RAM handler

// define the maximum address bits and number of words - effectively the RAM size
parameter ADDR_SIZE = 14;
parameter NUM_WORDS = 2 ** ADDR_SIZE;

localparam CLK_CYCLES_MUX = 1; // adjust this parameter to the number of 'clk' cycles it takes to select 1 of 5 muxed outputs
localparam CLK_CYCLES_RAM = 2; // adjust this figure to the number of clock cycles the DP_ram takes to retrieve valid data from the read address in
localparam CLK_CYCLES_PIX = 5; // adjust this figure to the number of PIXEL clock cycles it takes the demuxed output data to be ready

reg [19:0] addr_in_mux;
reg [15:0] cmd_mux_in;
wire [15:0] cmd_mux_out; // driven by the gpu_RAM instance output, so it must be a net, not a reg
wire [19:0] addr_mux_out;
wire [7:0] data_mux_out;

// create a GPU RAM instance
gpu_dual_port_ram_INTEL gpu_RAM(
.clk(clk),
.pc_ena_in(pc_ena_in),
.clk_b(clk_b),
.wr_en_b(write_ena_b), // this module's write-enable input is named write_ena_b
.addr_a(addr_in_mux),
.addr_b(),
.data_in_b(),
.cmd_in(cmd_mux_in),
.addr_out_a(addr_mux_out),
.pc_ena_out(pc_ena_out),
.cmd_out(cmd_mux_out),
.data_out_a(data_mux_out),
.data_out_b()
);

defparam gpu_RAM.ADDR_SIZE = ADDR_SIZE, // pass ADDR_SIZE into the gpu_RAM instance
gpu_RAM.NUM_WORDS = NUM_WORDS; // set non-default word size for the RAM (16 KB)

reg [9*8+7:0] data_pipe;
reg [9*20+19:0] addr_pipe;
reg [9*16+15:0] cmd_pipe;

localparam MUX_0_POS = 5;
localparam MUX_1_POS = 4;
localparam MUX_2_POS = 3;
localparam MUX_3_POS = 2;
localparam MUX_4_POS = 1;

always @(posedge clk) begin

data_pipe[7:0] <= data_mux_out[7:0]; // fill the first 8-bit word in the register pipe with data from RAM
data_pipe[9*8+7:1*8] <= data_pipe[8*8+7:0*8]; // shift over the next 9 words in this 10 word, 8-bit wide pipe
// this moves the data up one word at a time, dropping the top most 8 bits
addr_pipe[19:0] <= addr_mux_out;
addr_pipe[9*20+19:1*20] <= addr_pipe[8*20+19:0*20];

cmd_pipe[15:0] <= cmd_mux_out[15:0];
cmd_pipe[9*16+15:1*16] <= cmd_pipe[8*16+15:0*16];

if (pc_ena_in[3:0] == 0) // the port is named pc_ena_in in this module
begin
addr_out_0 <= addr_pipe[MUX_0_POS*20+19:MUX_0_POS*20];
addr_out_1 <= addr_pipe[MUX_1_POS*20+19:MUX_1_POS*20];
addr_out_2 <= addr_pipe[MUX_2_POS*20+19:MUX_2_POS*20];
addr_out_3 <= addr_pipe[MUX_3_POS*20+19:MUX_3_POS*20];
addr_out_4 <= addr_pipe[MUX_4_POS*20+19:MUX_4_POS*20];

cmd_out_0 <= cmd_pipe[MUX_0_POS*16+15:MUX_0_POS*16];
cmd_out_1 <= cmd_pipe[MUX_1_POS*16+15:MUX_1_POS*16];
cmd_out_2 <= cmd_pipe[MUX_2_POS*16+15:MUX_2_POS*16];
cmd_out_3 <= cmd_pipe[MUX_3_POS*16+15:MUX_3_POS*16];
cmd_out_4 <= cmd_pipe[MUX_4_POS*16+15:MUX_4_POS*16];

data_out_0 <= data_pipe[MUX_0_POS*8+7:MUX_0_POS*8];
data_out_1 <= data_pipe[MUX_1_POS*8+7:MUX_1_POS*8];
data_out_2 <= data_pipe[MUX_2_POS*8+7:MUX_2_POS*8];
data_out_3 <= data_pipe[MUX_3_POS*8+7:MUX_3_POS*8];
data_out_4 <= data_pipe[MUX_4_POS*8+7:MUX_4_POS*8];
end

// perform 5:1 mux for all inputs to the dual-port RAM
case (pc_ena_in[2:0]) // pc_ena_in counts 0-4, one value per read port
3'b000 : begin
addr_in_mux <= addr_in_0;
cmd_mux_in <= cmd_in_0;
end
3'b001 : begin
addr_in_mux <= addr_in_1;
cmd_mux_in <= cmd_in_1;
end
3'b010 : begin
addr_in_mux <= addr_in_2;
cmd_mux_in <= cmd_in_2;
end
3'b011 : begin
addr_in_mux <= addr_in_3;
cmd_mux_in <= cmd_in_3;
end
3'b100 : begin
addr_in_mux <= addr_in_4;
cmd_mux_in <= cmd_in_4;
end
endcase

end // always @clk

endmodule


CLK_CYCLES_MUX, RAM and PIX aren't used yet?
« Last Edit: November 11, 2019, 05:41:54 pm by nockieboy »
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7747
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #310 on: November 11, 2019, 11:57:57 pm »
Don't fret, I am deliberately pushing you to take big steps, and you are doing fine for such a complex 1st project.
You are doing great.  :-+

The bulk of your code is perfect, though there are 2 little mistakes:  (I hope I'm not making one....  ;) )
----------------------------------------
into data_pipe [5*8+7:5*8] <= pc_ena 0 pixel 0
into data_pipe [4*8+7:4*8] <= pc_ena 1 pixel 0
into data_pipe [3*8+7:3*8] <= pc_ena 2 pixel 0
into data_pipe [2*8+7:2*8] <= pc_ena 3 pixel 0
into data_pipe [1*8+7:1*8] <= pc_ena 4 pixel 0
into data_pipe [0*8+7:0*8] <= pc_ena 0 pixel 1
inside ram 2                        <= pc_ena 1 pixel 1
inside ram 1                        <= pc_ena 2 pixel 1
addr to addr_mux                <= pc_ena 3 pixel 1
pc_ena pos 0                       <= pc_ena 4 pixel 1
------------------------------------------------------

You need to push all the pixels on the right up by 1 more, since 'pc_ena pos 0' is the time in the 'if (pc_ena == 0)' statement and not a register.  At the beginning of 'pc_ena == 0', the 'addr to addr_mux' output should be holding the last 'pc_ena 4 pixel 1', and 'pc_ena pos 0' should be ready to take the next pixel, 'pc_ena 0 pixel 2'.  This also adds 1 whole pixel to the pixel pipe delay.

Next: (I corrected your offset figures)
localparam MUX_0_POS = 6;
localparam MUX_1_POS = 5;
localparam MUX_2_POS = 4;
localparam MUX_3_POS = 3;
localparam MUX_4_POS = 2;

This is functional; however, there is a smarter way to fill these out, and it includes my earlier hint.
Here is hint #1:
------------------------------------
localparam   CLK_CYCLES_MUX  = 1;  // Adjust this parameter to the number of 'clk' cycles it takes to select 1 of 5 muxed inputs
localparam   CLK_CYCLES_RAM  = 2;  // Adjust this figure to the number of clock cycles the DP_ram takes to retrieve a valid data from the read address in.
localparam   CLK_CYCLES_PCENA  = 5;  // Adjust this figure to the number of clock cycles per pixel.
-------------------------------------

Currently, in your code, your first mux takes 1 clock and the INTEL altsyncram megafunction takes 2 clocks:
----------------------------
into data_pipe[0*8+7:0*8]
inside ram 2                               2 clock cycles here for INTEL's altsyncram function.
inside ram 1
addr to addr-mux                       1 clock cycle here for your current MUX code.
PC_ENA pos0
-------------------------------
Now, also knowing that PC_ENA has 5 positions per pixel, and using those 3 reference 'CLK_CYCLES_xxxx' parameters which describe the # of clocks at each step in your delay pipe, write a formula which fills in all 5 'localparam MUX_#_POS' numbers.

Next, test your formula against a slower addr-mux algorithm which takes 2 clocks instead of 1 clock.

Step 1: Fill in 'localparam   CLK_CYCLES_MUX  = 2'.
Step 2: Change the table on page 1 so addr-mux has 2 clock steps:
-----------------------------
into data_pipe[2*8+7:2*8]
into data_pipe[1*8+7:1*8]
into data_pipe[0*8+7:0*8]
inside ram 2
inside ram 1
addr to addr-mux        step #2
addr to addr-mux        step #1
PC_ENA pos0
-------------------------------------------

Now verify that your formula generating the 5 'localparam MUX_#_POS' values gives valid pipe positions.
Do this a few more times with 'CLK_CYCLES_MUX  = 3', 'CLK_CYCLES_MUX  = 4', 'CLK_CYCLES_RAM  = 3'.
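
If sliding paper gets tedious, the same check can be sketched as a tiny simulation.  Everything here is an assumption made for illustration: the testbench name, the tag trick (each byte carries its pixel number and its pc_ena count instead of real RAM data), and the candidate formula for MUX_0_POS, which is just one possible answer to the exercise.  The mux and RAM are modelled as a plain delay line of CLK_CYCLES_MUX + CLK_CYCLES_RAM registers.

```verilog
`timescale 1ns/1ps
// Hypothetical self-checking testbench; not part of the GPU project itself.
module mux_pos_check_tb;

    // Parameters under test: edit these and re-run.
    localparam CLK_CYCLES_MUX = 2;
    localparam CLK_CYCLES_RAM = 2;
    localparam CLK_CYCLES_PIX = 5;

    // Candidate formula (an assumption of this sketch).
    localparam DLY       = CLK_CYCLES_MUX + CLK_CYCLES_RAM;
    localparam LATCH_T   = CLK_CYCLES_PIX * ((DLY + 2*CLK_CYCLES_PIX - 1) / CLK_CYCLES_PIX);
    localparam MUX_0_POS = LATCH_T - DLY - 1;      // MUX_k_POS = MUX_0_POS - k

    reg              clk       = 0;
    reg  [2:0]       pc_ena    = 0;
    reg  [4:0]       pixel     = 0;
    reg  [DLY*8-1:0] dly       = 0;                // stands in for mux + RAM latency
    reg  [16*8-1:0]  data_pipe = 0;                // 16 words, deeper than needed
    reg  [7:0]       tap, tap0;
    integer          cycles = 0, errors = 0, k;

    wire [7:0] tag_in       = {pixel, pc_ena};     // which pixel / which port
    wire [7:0] data_mux_out = dly[(DLY-1)*8 +: 8]; // oldest word of the delay line

    always #4 clk = ~clk;                          // 125 MHz

    always @(posedge clk) begin
        cycles <= cycles + 1;
        if (pc_ena == CLK_CYCLES_PIX-1) begin
            pc_ena <= 0;
            pixel  <= pixel + 1;
        end else
            pc_ena <= pc_ena + 1;

        dly       <= (dly << 8) | tag_in;
        data_pipe <= (data_pipe << 8) | data_mux_out;

        // At pc_ena == 0 every tap must hold its own port number,
        // and all five taps must belong to the same pixel.
        if (pc_ena == 0 && cycles > 40) begin
            tap0 = data_pipe[MUX_0_POS*8 +: 8];
            for (k = 0; k < 5; k = k + 1) begin
                tap = data_pipe[(MUX_0_POS-k)*8 +: 8];
                if (tap[2:0] !== k[2:0] || tap[7:3] !== tap0[7:3])
                    errors = errors + 1;
            end
        end
    end

    initial begin
        #2000;
        if (errors == 0) $display("PASS");
        else             $display("FAIL: %0d mismatches", errors);
        $finish;
    end

endmodule
```

The testbench prints PASS only if all five taps line up at every pc_ena == 0 after the pipe has filled; swapping in a different formula for MUX_0_POS shows quickly whether it survives a change to the CLK_CYCLES parameters.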

Everything else you wrote looks correct.  :-+

Also, you should realize trying to properly unmux the data stream coming out of the ram could never be done properly any other way without pure luck.  And with luck, if you ever had to increase a number of clock steps in for example the addr-mux stage, or, the FPGA's altsyncram dual port ram function, everything would fall apart and luck again would be needed to hope you get the right output all in parallel.
« Last Edit: November 12, 2019, 12:27:05 am by BrianHG »
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7747
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #311 on: November 12, 2019, 05:28:48 am »
Ok, next....

You will need to create a 16 kilobyte "gpu_ram_init.mif" file.  This file should contain my Atari font beginning at byte 16'h0800 & the test memory text file beginning at byte 16'h1000.

Next, remove my old 2 altsyncrams from the OSD generator and wire in your new "multiport_gpu_ram.v".
You will need to appropriately cross wire my old addresses into the 20 bit address, hard wiring the upper address bits to the 2 bases 16'h0800 & 16'h1000.

Also, in my old code, each read took 2 pixel clocks, now each new ram read takes 3 pixel clocks.  (even I might have counted wrong here, but I'm pretty sure it's 3...)
This means touching up inside the OSD:

Code: [Select]
parameter   PIPE_DELAY =  4;   // This parameter selects the number of pixel clocks which the output VDE and syncs are delayed



//  The disp_x is the X coordinate counter.  It runs from 0 to 512 and stops there
//  The disp_y is the Y coordinate counter.  It runs from 0 to 256 and stops there

assign disp_pos[4:0]  = disp_x[8:4] ;  // The disp_pos[4:0] is the lower address for the 32 characters wide display ascii text.
assign disp_pos[8:5]  = disp_y[7:4] ;  // the disp_pos[8:5] is the upper address for the 16 lines of text


//  The result from the ascii memory component 'altsyncram_component_osd_mem'  is called letter[7:0]
//  Since disp_pos[8:0] has entered the read address, it takes 2 pixel clock cycles for the resulting letter[7:0] to come out.

//  Now, font_pos[12:0] is the read address for the memory block containing the font memory

assign font_pos[12:6] = letter[6:0] ;       //  Select the upper font address with the 7 bit letter, note the atari font has only 128 characters.
assign font_pos[2:0]  = dly2_disp_x[3:1] ;  //  select the font X coordinate with a 2 pixel clock DELAYED disp_x address.  [3:1] is used so that every 2 x pixels are repeats
assign font_pos[5:3]  = dly2_disp_y[3:1] ;  //  select the font y coordinate with a 2 pixel clock DELAYED disp_y address.  [3:1] is used so that every 2 y lines are repeats


//  The resulting font image, 2 bits since I made a 2 bit color atari font is assigned to the OSD[1:0] output
//  Also, since there is an 8th bit in the ascii test memory, I use that as a third OSD output color bit.

assign osd_image[1:0] = osd_img[1:0];
assign osd_image[2]   = dly2_letter[7];  // Remember, it takes 2 pixel clocks for osd_img[1:0] data to be valid from read address letter[6:0]


// **********************************************************************************************
// AND
// **********************************************************************************************
osd_ena_out  <= dly2_dena; // This is used to drive a graphics A/B switch which tells when the OSD graphics should be shown
                           // It needs to be delayed by the number of pixel clocks required for the above memories


You should be able to recreate your last OSD image perfectly.  If so, congratulations, you passed the 50% mark of completing version 1.0!.

If you have time, you should now think about getting 12bit color working as this is coming up next.
Then comes proper dynamic display of any memory as text or graphics at any resolution.

The fun part is coming...  You will soon need to think about videos, not just pictures, but you will need to fill the memory_init file with data for things to happen, and you will need to get a Z80 connected with software to achieve anything interesting.
« Last Edit: November 12, 2019, 05:38:48 am by BrianHG »
 

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: FPGA VGA Controller for 8-bit computer
« Reply #312 on: November 12, 2019, 09:05:28 am »
Currently, in your code, your first mux takes 1 clock and the INTEL altsyncram megafunction takes 2 clocks:
----------------------------
into data_pipe[0*8+7:0*8]
inside ram 2                               2 clock cycles here for INTEL's altsyncram function.
inside ram 1
addr to addr-mux                       1 clock cycle here for your current MUX code.
PC_ENA pos0
-------------------------------
Now, also knowing that PC_ENA has 5 positions per pixel, and using those 3 reference 'CLK_CYCLES_xxxx' parameters which describe the # of clocks at each step in your delay pipe, write a formula which fills in all 5 'localparam MUX_#_POS' numbers.

Code: [Select]
localparam CLK_CYCLES_MUX = 1; // adjust this parameter to the number of 'clk' cycles it takes to select 1 of 5 muxed outputs
localparam CLK_CYCLES_RAM = 2; // adjust this figure to the number of clock cycles the DP_ram takes to retrieve valid data from the read address in
localparam CLK_CYCLES_PIX = 5; // adjust this figure to the number of PIXEL clock cycles it takes the demuxed output data to be ready

localparam MUX_0_POS = 6; // pixel offset positions in their respective synchronisation
localparam MUX_1_POS = 5; // pipelines (where the pixels will be found in the pipeline
localparam MUX_2_POS = 4; // when pc_ena[3:0]==0).
localparam MUX_3_POS = 3; //
localparam MUX_4_POS = 2; //

To get the MUX_x_POS values from those parameters, I've come up with this:

Code: [Select]
MUX_POS = 2*CLK_CYCLES_MUX + 2*CLK_CYCLES_RAM + 2*CLK_CYCLES_PIX - (POS + 10);
It's not elegant, but seems to do the job and works with parameter changes as far as I can tell.  I've had to add another parameter - POS - which replaces the _x_ in MUX_x_POS...  And I'm not 100% happy with the constant '10' being in there... so I guess what I'm trying to say is that I'm not 100% happy with the formula...  ::)
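
One possible way to derive the positions, sketched here under the assumption that the pipe behaves exactly like the paper tables (it is not necessarily the formula intended by the exercise).  Let DLY = CLK_CYCLES_MUX + CLK_CYCLES_RAM be the number of register stages between the address inputs and data_mux_out.  Port k is sampled k clocks after port 0, so its byte is in data_pipe word 0 once k + DLY + 1 clock edges have passed, counting the edge that sampled port 0 as the first.  The outputs can only be latched on a pc_ena == 0 clock, so the latch time is DLY + CLK_CYCLES_PIX rounded up to a whole number of pixels.  The names DLY and LATCH_T are made up for this sketch.

```verilog
// Hypothetical sketch: compute and print the five tap positions.
module mux_pos_formula_sketch;

    localparam CLK_CYCLES_MUX = 1;
    localparam CLK_CYCLES_RAM = 2;
    localparam CLK_CYCLES_PIX = 5;

    // register stages between the port address inputs and data_mux_out
    localparam DLY = CLK_CYCLES_MUX + CLK_CYCLES_RAM;

    // clocks from sampling port 0 to the latching pc_ena == 0 clock:
    // (DLY + CLK_CYCLES_PIX) rounded up to a multiple of CLK_CYCLES_PIX
    localparam LATCH_T = CLK_CYCLES_PIX *
                         ((DLY + 2*CLK_CYCLES_PIX - 1) / CLK_CYCLES_PIX);

    // port k entered the chain k clocks after port 0, so it sits k words lower
    localparam MUX_0_POS = LATCH_T - DLY - 1 - 0;
    localparam MUX_1_POS = LATCH_T - DLY - 1 - 1;
    localparam MUX_2_POS = LATCH_T - DLY - 1 - 2;
    localparam MUX_3_POS = LATCH_T - DLY - 1 - 3;
    localparam MUX_4_POS = LATCH_T - DLY - 1 - 4;

    initial
        // prints "6 5 4 3 2" for MUX = 1, RAM = 2, PIX = 5
        $display("%0d %0d %0d %0d %0d",
                 MUX_0_POS, MUX_1_POS, MUX_2_POS, MUX_3_POS, MUX_4_POS);

endmodule
```

With CLK_CYCLES_MUX = 2 this sketch gives 5, 4, 3, 2, 1, whereas the formula above gives 8, 7, 6, 5, 4 for the same parameters, so it is worth checking both against the paper tables before trusting either.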

I know Verilog isn't a programming language, but is there any chance we can replace all those MUX_x_POS parameters with a single array of values and access them using POS as the index?

Also, you should realize trying to properly unmux the data stream coming out of the ram could never be done properly any other way without pure luck.  And with luck, if you ever had to increase a number of clock steps in for example the addr-mux stage, or, the FPGA's altsyncram dual port ram function, everything would fall apart and luck again would be needed to hope you get the right output all in parallel.

Yeah, with my luck it'd blow up or something.  :o
 

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: FPGA VGA Controller for 8-bit computer
« Reply #313 on: November 12, 2019, 11:47:15 am »
Ok, next....

You will need to create a 16 kilobyte "gpu_ram_init.mif" file.  This file should contain my Atari font beginning at byte 16'h0800 & the test memory text file beginning at byte 16'h1000.

That's done.  File attached (gpu_16K_RAM.mif).

Next, remove my old 2 altsyncrams from the OSD generator and wire in your new "multiport_gpu_ram.v".
You will need to appropriately cross wire my old addresses into the 20 bit address, hard wiring the upper address bits to the 2 bases 16'h0800 & 16'h1000.

You make it sound so easy... :o

I'm a little confused about how to wire up the 5 batches of address, data and command buses.  Well, actually, completely clueless.   :scared:

So disp_pos is a 9-bit address that needs to go into the new multiport_gpu_ram module, and needs to be shifted to address the display RAM portion of memory starting at 0x1000 in gpu_16K_RAM.mif.  This needs to be put into a 13-bit address value - disp_addr for example:

Code: [Select]
wire [12:0] disp_addr;
assign disp_addr[12]   = 1'b1;             // set 13th bit to 1 to start address at 0x1000
assign disp_addr[11:9] = 3'b000;           // unused bits between the base and disp_pos
assign disp_addr[8:0]  = disp_pos[8:0];    // map display position address into new memory address

And to get the font data, which uses a 10-bit address, I'll need to map its address into a new 12-bit address for the new memory like so:

Code: [Select]
wire [11:0] font_addr;
assign font_addr[11]  = 1'b1;             // set 12th bit to 1 to start address at 0x800
assign font_addr[10]  = 1'b0;             // unused bit between the base and font_pos
assign font_addr[9:0] = font_pos[9:0];    // map font position address into new memory address

Above is pseudo-Verilog code... I'm writing this from the top of my head at the moment (not a great place to start from!)  ::)
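
As a rough sketch of the same mapping at the full 20-bit port width, the base addresses can be built with concatenation so that every upper bit is driven.  The module name and output names are invented here, the 10-bit font address is simply the width assumed above, and which read port each address should feed is still an open question at this point in the thread.

```verilog
// Hypothetical address-mapping helper; names are illustrative only.
module osd_addr_map_sketch (
    input  wire [8:0]  disp_pos,    // 9-bit character position (32 x 16 text)
    input  wire [9:0]  font_pos,    // font address width as assumed in this post
    output wire [19:0] text_addr,   // 0x1000 + disp_pos, zero-extended to 20 bits
    output wire [19:0] font_addr    // 0x0800 + font_pos, zero-extended to 20 bits
);

    // 7 + 1 + 3 + 9 = 20 bits: bit 12 set gives the 0x1000 base
    assign text_addr = {7'b0, 1'b1, 3'b000, disp_pos};

    // 8 + 1 + 1 + 10 = 20 bits: bit 11 set gives the 0x0800 base
    assign font_addr = {8'b0, 1'b1, 1'b0, font_pos};

    // Any read port that is not used yet is probably best tied to a constant,
    // e.g. .addr_in_2(20'd0), rather than left unconnected.

endmodule
```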

But, in any case, I'm not sure where these values should be going into the multiport_gpu_ram instance and what I should be doing with the masses of unassigned inputs?

Code: [Select]
// ****************************************************************************************************************************
// create a multiport GPU RAM handler instance
// ****************************************************************************************************************************
multiport_gpu_ram gpu_RAM(

.clk(clk),
.pc_ena_in(pc_ena),
.clk_b(),
.write_ena_b(),

.addr_in_0(),
.addr_in_1(),
.addr_in_2(),
.addr_in_3(),
.addr_in_4(),
.addr_host_in(),

.cmd_in_0(),
.cmd_in_1(),
.cmd_in_2(),
.cmd_in_3(),
.cmd_in_4(),

.pc_ena_out(),

.addr_out_0(),
.addr_out_1(),
.addr_out_2(),
.addr_out_3(),
.addr_out_4(),
.addr_host_out(),

.cmd_out_0(),
.cmd_out_1(),
.cmd_out_2(),
.cmd_out_3(),
.cmd_out_4(),

.data_out_0(),
.data_out_1(),
.data_out_2(),
.data_out_3(),
.data_out_4(),
.data_host_out()

);

Also, in my old code, each read took 2 pixel clocks, now each new ram read takes 3 pixel clocks.  (even I might have counted wrong here, but I'm pretty sure it's 3...)
This means touching up inside the OSD:

Code: [Select]
assign osd_image[1:0] = osd_img[1:0];
assign osd_image[2]   = dly2_letter[7];  // Remember, it takes 2 pixel clocks for osd_img[1:0] data to be valid from read address letter[6:0]

Is that second-to-last line of code correct?  Assigning osd_image[1:0] to itself?  (Assuming osd_img is a typo)

// **********************************************************************************************
// AND
// **********************************************************************************************
osd_ena_out  <= dly2_dena; // This is used to drive a graphics A/B switch which tells when the OSD graphics should be shown
                           // It needs to be delayed by the number of pixel clocks required for the above memories


You should be able to recreate your last OSD image perfectly.  If so, congratulations, you passed the 50% mark of completing version 1.0!

Yeah, not quite there yet with the code.  ;)

If you have time, you should now think about getting 12-bit color working as this is coming up next.

Ah okay - will see what I can do.

The fun part is coming...  You will soon need to think about videos, not just pictures, but you will need to fill the memory_init file with data for things to happen, and you will need to get a Z80 connected with software to achieve anything interesting.

 ;D

Looking forward to getting the host connected and getting a working video console.  That'll be a big win for me.
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7747
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #314 on: November 12, 2019, 11:51:42 am »
Currently, in your code, your first mux takes 1 clock and the INTEL altsyncram megafunction takes 2 clocks:
----------------------------
into data_pipe[0*8+7:0*8]
inside ram 2                               2 clock cycles here for INTEL's altsyncram function.
inside ram 1
addr to addr-mux                       1 clock cycle here for your current MUX code.
PC_ENA pos0
-------------------------------
Now, also knowing that PC_ENA has 5 positions per pixel, and using those 3 reference 'CLK_CYCLES_xxxx' parameters, which describe the number of clocks at each step in your delay pipe, write a formula which fills in all 5 'localparam MUX_#_POS' numbers.

Code: [Select]
localparam CLK_CYCLES_MUX = 1; // adjust this parameter to the number of 'clk' cycles it takes to select 1 of 5 muxed outputs
localparam CLK_CYCLES_RAM = 2; // adjust this figure to the number of clock cycles the DP_ram takes to retrieve valid data from the read address in
localparam CLK_CYCLES_PIX = 5; // adjust this figure to the number of PIXEL clock cycles it takes the demuxed output data to be ready

localparam MUX_0_POS = 6; // pixel offset positions in their respective synchronisation
localparam MUX_1_POS = 5; // pipelines (where the pixels will be found in the pipeline
localparam MUX_2_POS = 4; // when pc_ena[3:0]==0).
localparam MUX_3_POS = 3; //
localparam MUX_4_POS = 2; //

To get the MUX_x_POS values from those parameters, I've come up with this:

Code: [Select]
MUX_POS = 2*CLK_CYCLES_MUX + 2*CLK_CYCLES_RAM + 2*CLK_CYCLES_PIX - (POS + 10);
It's not elegant, but seems to do the job and works with parameter changes as far as I can tell.  I've had to add another parameter - POS - which replaces the _x_ in MUX_x_POS...  And I'm not 100% happy with the constant '10' being in there... so I guess what I'm trying to say is that I'm not 100% happy with the formula...  ::)

I know Verilog isn't a programming language, but is there any chance we can replace all those MUX_x_POS parameters with a single array of values and access them using POS as the index?


Ok, I'll give this one to you in a simpler way, but, to do so in a simple way, I've added 1 parameter, PIXEL_PIPE.  PIXEL_PIPE needs to be large enough so that the 'MUX_4_POS' doesn't become a negative number, otherwise, you will get a compile error and need to increase the new 'PIXEL_PIPE' parameter until 'MUX_4_POS' is a positive integer.

Code: [Select]
parameter   PIXEL_PIPE = 3;  // This externally set parameter defines the number of 25MHz pixels it takes to receive a new pixel from a presented address

localparam CLK_CYCLES_MUX = 1; // adjust this parameter to the number of 'clk' cycles it takes to select 1 of 5 muxed outputs
localparam CLK_CYCLES_RAM = 2; // adjust this figure to the number of clock cycles the DP_ram takes to retrieve valid data from the read address in
localparam CLK_CYCLES_PIX = 5; // adjust this figure to the number of 125MHz clocks there are for each pixel, IE number of muxed inputs for each pixel

//  This parameter begins with the wanted top number of 125MHz clocks of headroom for the pixel pipe, then subtracts the additional 125MHz clocks used by the _MUX and _RAM cycles to arrive at the first pixel out, the DEMUX_PIPE_TOP position.
localparam  DEMUX_PIPE_TOP    =  (( (PIXEL_PIPE - 1) * CLK_CYCLES_PIX ) - 1) - CLK_CYCLES_MUX - CLK_CYCLES_RAM;

localparam MUX_0_POS = DEMUX_PIPE_TOP - 0;  // pixel offset positions in their respective synchronisation
localparam MUX_1_POS = DEMUX_PIPE_TOP - 1;   // pipelines (where the pixels will be found in the pipeline
localparam MUX_2_POS = DEMUX_PIPE_TOP - 2;   // when pc_ena[3:0]==0).
localparam MUX_3_POS = DEMUX_PIPE_TOP - 3;   //
localparam MUX_4_POS = DEMUX_PIPE_TOP - 4; //
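
On the earlier question of indexing by POS instead of keeping five separate MUX_x_POS parameters, here is a minimal sketch, assuming a constant function is acceptable in place of an array. The module name mux_pos_sketch and the function mux_pos() are made up for illustration; the parameter values are the ones listed above.

```verilog
// Sketch only: mux_pos_sketch and mux_pos() are hypothetical names.
module mux_pos_sketch;

    parameter  PIXEL_PIPE     = 3; // pixels of latency allowed, as above
    localparam CLK_CYCLES_MUX = 1; // clk cycles through the address mux
    localparam CLK_CYCLES_RAM = 2; // clk cycles through the altsyncram
    localparam CLK_CYCLES_PIX = 5; // clk cycles (muxed inputs) per pixel

    localparam DEMUX_PIPE_TOP = (((PIXEL_PIPE - 1) * CLK_CYCLES_PIX) - 1)
                                - CLK_CYCLES_MUX - CLK_CYCLES_RAM;

    // Pipeline position of muxed input 'pos' (0 to 4).
    function integer mux_pos;
        input integer pos;
        begin
            mux_pos = DEMUX_PIPE_TOP - pos;
        end
    endfunction

    integer i;
    initial begin
        // Simulation-only check of the computed positions.
        for (i = 0; i < CLK_CYCLES_PIX; i = i + 1)
            $display("MUX_%0d_POS = %0d", i, mux_pos(i));
    end

endmodule
```

With the values above, DEMUX_PIPE_TOP works out to ((3-1)*5-1)-1-2 = 6, so the loop prints positions 6, 5, 4, 3 and 2, the same as the hand-entered MUX_0_POS to MUX_4_POS.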

Code: [Select]
// Now that we know the DEMUX_PIPE_TOP, we can assign the top size of the 3 pipe regs

reg [DEMUX_PIPE_TOP*8+7:0] data_pipe;
reg [DEMUX_PIPE_TOP*20+19:0] addr_pipe;
reg [DEMUX_PIPE_TOP*16+15:0] cmd_pipe;

Code: [Select]
// We also need to limit the pipe range in the 3 ' <= ' shift assignments

data_pipe[7:0]                     <= data_mux_out[7:0]; // fill the first 8-bit word in the register pipe with data from RAM
data_pipe[DEMUX_PIPE_TOP*8+7:1*8] <= data_pipe[ (DEMUX_PIPE_TOP-1) *8+7:0*8]; // shift the lower DEMUX_PIPE_TOP words up by one position in this (DEMUX_PIPE_TOP+1)-word, 8-bit wide pipe
// this moves the data up one word at a time, dropping the top most 8 bits
addr_pipe[19:0]                   <= addr_mux_out;
addr_pipe[DEMUX_PIPE_TOP*20+19:1*20] <= addr_pipe[ (DEMUX_PIPE_TOP-1) *20+19:0*20];

cmd_pipe[15:0]                         <= cmd_mux_out[15:0];
cmd_pipe[DEMUX_PIPE_TOP*16+15:1*16] <= cmd_pipe[ (DEMUX_PIPE_TOP-1) *16+15:0*16];
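
The same shift can also be written as one concatenation per pipe, and a word can be read back with an indexed part-select. A rough sketch for the data pipe only, assuming DEMUX_PIPE_TOP is at least 1; the module name, the TAP_POS parameter and the tap_word output are hypothetical and only there to make the example self-contained.

```verilog
// Sketch only: data_pipe_sketch, TAP_POS and tap_word are hypothetical names.
module data_pipe_sketch #(
    parameter DEMUX_PIPE_TOP = 6,  // index of the top word in the pipe (must be >= 1)
    parameter TAP_POS        = 6   // which word of the pipe to read back out
)(
    input  wire       clk,
    input  wire [7:0] data_mux_out,
    output wire [7:0] tap_word
);

    reg [DEMUX_PIPE_TOP*8+7:0] data_pipe;

    always @(posedge clk) begin
        // Same effect as the two '<=' lines above: the new word enters at the
        // bottom, every other word moves up one, and the top word is dropped.
        data_pipe <= { data_pipe[(DEMUX_PIPE_TOP-1)*8+7:0], data_mux_out[7:0] };
    end

    // Indexed part-select: 8 bits starting at bit TAP_POS*8.
    assign tap_word = data_pipe[TAP_POS*8 +: 8];

endmodule
```

Both forms should synthesize to the same registers; the concatenation just keeps the word width in one place.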

 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7747
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #315 on: November 12, 2019, 12:09:24 pm »
First, the only mistakes are in gpu_dual_port_ram_INTEL.v:

Code: [Select]
.address_b (addr_b[ (ADDR_SIZE-1) :0]),
.clock1 (clk_b),
.data_b (data_in_b),
.wren_b (wr_en_b),
.address_a (addr_a[ (ADDR_SIZE-1) :0]),
Changed the 'ADDR_SIZE)' to '(ADDR_SIZE-1)'.

Code: [Select]
altsyncram_component.widthad_a = ADDR_SIZE,
altsyncram_component.widthad_b = ADDR_SIZE,
Changed the 'ADDR_SIZE - 1' to 'ADDR_SIZE'.
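
The difference between the two fixes is a bit index versus a bit count. A small sketch, assuming a made-up ADDR_SIZE of 14 (the module name and the value are for illustration only, not taken from the project files):

```verilog
// Sketch only: addr_size_sketch and the value 14 are hypothetical.
module addr_size_sketch;

    localparam ADDR_SIZE = 14;             // number of address bits (a count)
    localparam NUMWORDS  = 2 ** ADDR_SIZE; // words reachable with that many bits

    // A bus that is ADDR_SIZE bits wide has indices ADDR_SIZE-1 down to 0,
    // so the range uses (ADDR_SIZE-1) while a width parameter uses ADDR_SIZE.
    wire [ADDR_SIZE-1:0] addr_a;

    initial
        $display("%0d address bits -> %0d words", ADDR_SIZE, NUMWORDS);

endmodule
```

So a port range takes the top index, while a width parameter such as widthad_a takes the count.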

Next comes the 'vid_osd_generator.v'.

 
The following users thanked this post: nockieboy

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7747
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #316 on: November 12, 2019, 12:39:28 pm »
Ok, I know we are jury-rigging my old disp_x/y addresses into this new algorithm, so temporarily, we will use wires to make this happen and temporarily touch up my OSD code.  The next step will make all of this obsolete, however, for now let's do it like this:

Insert the following into the OSD code:

wire [19:0] read_text_adr;
wire [19:0] read_font_adr;

assign read_text_adr[8:0]   = disp_pos[8:0];
assign read_text_adr[9]     = 1'b0;
assign read_text_adr[19:10] = 10'h4;

assign read_font_adr[9:0]   = font_pos[9:0];
assign read_font_adr[19:10] = 10'h2;
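
A note on the constants: the number in front of the tick is the width in bits, so a 10-bit slice like [19:10] wants a 10-bit constant, and a 1-bit constant keeps only bit 0 of the value. A quick simulation sketch to show the difference; the module name and the test value 10'h155 are made up for illustration.

```verilog
// Sketch only: literal_width_sketch and the value 10'h155 are hypothetical.
module literal_width_sketch;

    wire [9:0]  pos = 10'h155;   // stand-in for a 10-bit position value
    wire [19:0] adr_10bit;
    wire [19:0] adr_1bit;

    assign adr_10bit[9:0]   = pos;
    assign adr_10bit[19:10] = 10'h4;  // 10-bit constant: sets bit 12 of the address

    assign adr_1bit[9:0]    = pos;
    assign adr_1bit[19:10]  = 1'h4;   // 1-bit constant: 4 is truncated to 0 (expect a warning)

    initial begin
        #1;
        $display("10'h4 in [19:10] -> %h", adr_10bit);  // 01155
        $display("1'h4  in [19:10] -> %h", adr_1bit);   // 00155
    end

endmodule
```

With a 1-bit constant, both the text and the font address would end up with zeros in bits [19:10], so the two reads would overlap at the bottom of memory.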


Now, pass  read_font_adr &  read_text_adr into 2 memory address 'addr_in_#()' ports of your liking.
And in the appropriate 'data_out_#()' ports, place the 'letter[7:0]' and 'char_line[7:0]'.

Without any other changes, this should generate a messed up text display as my old pipe delays were designed for 2 pixel clocks on each read, not the new current 3.  To fix this, you need to make the changes I listed in red:

parameter   PIPE_DELAY =  6;   // This parameter selects the number of pixel clocks to delay the VDE and sync outputs.  Only use 2 through 9.


assign font_pos[12:6]= letter[6:0] ;       // Select the upper font address with the 7 bit letter, note the atari font has only 128 characters.
assign font_pos[2:0]   = dly3_disp_y[3:1] ;  // select the font x coordinate with a 2 pixel clock DELAYED disp_x address.  [3:1] is used so that every 2 x lines are repeats
assign font_pos[5:3]   = dly3_disp_y[3:1] ;  // select the font y coordinate with a 2 pixel clock DELAYED disp_y address.  [3:1] is used so that every 2 y lines are repeats

assign osd_image[1:0] = osd_image[1:0]; You are working with the wrong version of OSD generator. You already changed this line to convert a 8bit font line into 8 individual pixels, 1 bit color B&W font, remember?

assign osd_image[2] = dly3_letter[7];  // Remember, it takes 2 pixel clocks for osd_img[1:0] data to be valid from read address letter[6:0]



      osd_ena_out  <= dly4_dena;   // This is used to drive a graphics A/B switch which tells when the OSD graphics should be shown

« Last Edit: November 12, 2019, 12:41:53 pm by BrianHG »
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7747
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #317 on: November 12, 2019, 12:53:12 pm »
 :palm: There is a bug in your mux (partially my bad...) in the 'multiport_gpu_ram.v', however, we will fix it after you get your garbled text with the current version.  It's one of those 'gotcha' things which would leave many helpless without running a simulator, and even so, it would take days to fix depending on the rest of the design.  Don't fret, the real solution is easy if you know what you are doing, we will fix it next.
« Last Edit: November 12, 2019, 12:56:13 pm by BrianHG »
 

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: FPGA VGA Controller for 8-bit computer
« Reply #318 on: November 12, 2019, 01:30:12 pm »
Insert the following into the OSD code:

wire [19:0] read_text_adr;
wire [19:0] read_font_adr;

assign read_text_adr[8:0]   = disp_pos[8:0];
assign read_text_adr[9]     = 1'b0;
assign read_text_adr[19:10] = 10'h4;

assign read_font_adr[9:0]   = font_pos[9:0];
assign read_font_adr[19:10] = 10'h2;

Righto - I've pasted that in before the multiport_gpu_ram declaration.

Now, pass  read_font_adr &  read_text_adr into 2 memory address 'addr_in_#()' ports of your liking.
And in the appropriate 'data_out_#()' ports, place the 'letter[7:0]' and 'char_line[7:0]'.

Code: [Select]
parameter   PIPE_DELAY =  6;   // This parameter selects the number of pixel clocks to delay the VDE and sync outputs.  Only use 2 through 9.

wire [9:0] font_pos;
wire [8:0] disp_pos;
wire [2:0] osd_image;
wire [19:0] read_text_adr;
wire [19:0] read_font_adr;

assign read_text_adr[8:0] = disp_pos[8:0];
assign read_text_adr[9] = 1'b0;
assign read_text_adr[19:10] = 10'h4;

assign read_font_adr[9:0] = font_pos[9:0];
assign read_font_adr[19:10] = 10'h2;

// ****************************************************************************************************************************
// create a multiport GPU RAM handler instance
// ****************************************************************************************************************************
multiport_gpu_ram gpu_RAM(

.clk(clk),
.pc_ena_in(pc_ena),
.clk_b(),
.write_ena_b(),

.addr_in_0(read_font_adr),
.addr_in_1(read_text_adr),
.addr_in_2(),
.addr_in_3(),
.addr_in_4(),
.addr_host_in(),

.cmd_in_0(),
.cmd_in_1(),
.cmd_in_2(),
.cmd_in_3(),
.cmd_in_4(),

.pc_ena_out(),

.addr_out_0(),
.addr_out_1(),
.addr_out_2(),
.addr_out_3(),
.addr_out_4(),
.addr_host_out(),

.cmd_out_0(),
.cmd_out_1(),
.cmd_out_2(),
.cmd_out_3(),
.cmd_out_4(),

.data_out_0(letter[7:0]),
.data_out_1(char_line[7:0]),
.data_out_2(),
.data_out_3(),
.data_out_4(),
.data_host_out()

);


Without any other changes, this should generate a messed up text display as my old pipe delays were designed for 2 pixel clocks on each read, not the new current 3.  To fix this, you need to make the changes I listed in red:

parameter   PIPE_DELAY =  6;   // This parameter selects the number of pixel clocks to delay the VDE and sync outputs.  Only use 2 through 9.


assign font_pos[12:6]= letter[6:0] ;       // Select the upper font address with the 7 bit letter, note the atari font has only 128 characters.
assign font_pos[2:0]   = dly3_disp_y[3:1] ;  // select the font x coordinate with a 2 pixel clock DELAYED disp_x address.  [3:1] is used so that every 2 x lines are repeats
assign font_pos[5:3]   = dly3_disp_y[3:1] ;  // select the font y coordinate with a 2 pixel clock DELAYED disp_y address.  [3:1] is used so that every 2 y lines are repeats

All done.

assign osd_image[1:0] = osd_image[1:0]; You are working with the wrong version of OSD generator. You already changed this line to convert a 8bit font line into 8 individual pixels, 1 bit color B&W font, remember?

Yeah, I'd commented the line out where I'd changed it as you specified 'assign osd_image[1:0] = osd_image[1:0];' in place of 'assign osd_image[1:0] = char_line[(~dly4_disp_x[3:1])];'.  Which should I be using?

assign osd_image[2] = dly3_letter[7];  // Remember, it takes 2 pixel clocks for osd_img[1:0] data to be valid from read address letter[6:0]



      osd_ena_out  <= dly4_dena;   // This is used to drive a graphics A/B switch which tells when the OSD graphics should be shown

Done.  I've also added the extra dly lines.

Code: [Select]
module vid_osd_generator ( clk, pc_ena, hde_in, vde_in, hs_in, vs_in, osd_ena_out, osd_image, hde_out, vde_out, hs_out, vs_out,
wren_disp, wren_font, wr_addr, wr_data );

// To write contents into the display and font memories, the wr_addr[15:0] selects the address
// the wr_data[7:0] contains a byte which will be written
// the wren_disp is the write enable for the ascii text ram.  Only the wr_addr[8:0] are used as the character display is 32x16.
// the wren_font is the write enable for the font memory.  Only 2 bits are used of the wr_data[1:0] and wr_addr[12:0] are used.
// tie these ports to GND for now disabling them

input clk;
input [3:0] pc_ena;
input hde_in, vde_in, hs_in, vs_in;

output osd_ena_out;
reg    osd_ena_out;
output [2:0] osd_image;
output hde_out, vde_out, hs_out, vs_out;
reg hde_out, vde_out, hs_out, vs_out;

input wren_disp, wren_font;
input [15:0] wr_addr;
input [7:0] wr_data;

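reg  [9:0] disp_x,dly1_disp_x,dly2_disp_x,dly3_disp_x,dly4_disp_x;
reg  [8:0] disp_y,dly1_disp_y,dly2_disp_y,dly3_disp_y;

reg  dena,dly1_dena,dly2_dena,dly3_dena,dly4_dena;
reg  [7:0] dly1_letter, dly2_letter, dly3_letter;
wire [7:0] letter, char_line;   // read data coming back from the multiport GPU RAM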

reg  [7:0] hde_pipe, vde_pipe, hs_pipe, vs_pipe;

parameter   PIPE_DELAY =  6;   // This parameter selects the number of pixel clocks to delay the VDE and sync outputs.  Only use 2 through 9.

wire [9:0] font_pos;
wire [8:0] disp_pos;
wire [2:0] osd_image;
wire [19:0] read_text_adr;
wire [19:0] read_font_adr;

assign read_text_adr[8:0] = disp_pos[8:0];
assign read_text_adr[9] = 1'b0;
assign read_text_adr[19:10] = 10'h4;

assign read_font_adr[9:0] = font_pos[9:0];
assign read_font_adr[19:10] = 10'h2;

// ****************************************************************************************************************************
// create a multiport GPU RAM handler instance
// ****************************************************************************************************************************
multiport_gpu_ram gpu_RAM(

.clk(clk),
.pc_ena_in(pc_ena),
.clk_b(),
.write_ena_b(),

.addr_in_0(read_font_adr),
.addr_in_1(read_text_adr),
.addr_in_2(),
.addr_in_3(),
.addr_in_4(),
.addr_host_in(),

.cmd_in_0(),
.cmd_in_1(),
.cmd_in_2(),
.cmd_in_3(),
.cmd_in_4(),

.pc_ena_out(),

.addr_out_0(),
.addr_out_1(),
.addr_out_2(),
.addr_out_3(),
.addr_out_4(),
.addr_host_out(),

.cmd_out_0(),
.cmd_out_1(),
.cmd_out_2(),
.cmd_out_3(),
.cmd_out_4(),

.data_out_0(letter[7:0]),
.data_out_1(char_line[7:0]),
.data_out_2(),
.data_out_3(),
.data_out_4(),
.data_host_out()

);

//  The disp_x is the X coordinate counter.  It runs from 0 to 512 and stops there
//  The disp_y is the Y coordinate counter.  It runs from 0 to 256 and stops there

// Get the character at the current x, y position
assign disp_pos[4:0]  = disp_x[8:4] ;  // The disp_pos[4:0] is the lower address for the 32 characters for the ascii text.
assign disp_pos[8:5]  = disp_y[7:4] ;  // the disp_pos[8:5] is the upper address for the 16 lines of text

//  The result from the ascii memory component 'altsyncram_component_osd_mem'  is called letter[7:0]
//  Since disp_pos[8:0] has entered the read address, it takes 2 pixel clock cycles for the resulting letter[7:0] to come out.

//  Now, font_pos[12:0] is the read address for the memory block containing the character specified in letter[]

assign font_pos[12:6]= letter[6:0] ;       // Select the upper font address with the 7 bit letter, note the atari font has only 128 characters.
assign font_pos[2:0] = dly3_disp_y[3:1] ;  // select the font x coordinate with a 2 pixel clock DELAYED disp_x address.  [3:1] is used so that every 2 x lines are repeats
assign font_pos[5:3] = dly3_disp_y[3:1] ;  // select the font y coordinate with a 2 pixel clock DELAYED disp_y address.  [3:1] is used so that every 2 y lines are repeats

//  The resulting 2-bit font image at x is assigned to the OSD[1:0] output
//  Also, since there is an 8th bit in the ascii text memory, I use that as a third OSD output color bit
assign osd_image[1:0] = char_line[(~dly4_disp_x[3:1])];
assign osd_image[2] = dly3_letter[7];  // Remember, it takes 2 pixel clocks for osd_img[1:0] data to be valid from read address letter[6:0]

always @ ( posedge clk ) begin

if (pc_ena[2:0] == 0) begin

// **************************************************************************************************************************
// *** Create a serial pipe where the PIPE_DELAY parameter selects the pixel count delay for the xxx_in to the xxx_out ports
// **************************************************************************************************************************

hde_pipe[0]   <= hde_in;
hde_pipe[7:1] <= hde_pipe[6:0];
hde_out       <= hde_pipe[PIPE_DELAY-2];

vde_pipe[0]   <= vde_in;
vde_pipe[7:1] <= vde_pipe[6:0];
vde_out       <= vde_pipe[PIPE_DELAY-2];

hs_pipe[0]    <= hs_in;
hs_pipe[7:1]  <= hs_pipe[6:0];
hs_out        <= hs_pipe[PIPE_DELAY-2];

vs_pipe[0]    <= vs_in;
vs_pipe[7:1]  <= vs_pipe[6:0];
vs_out        <= vs_pipe[PIPE_DELAY-2];

// **********************************************************************************************
// This OSD generator's window is only 512 pixels by 256 lines.
// Since the disp_X&Y counters are the screens X&Y coordinates, I'm using an extra most
// significant bit in the counters to determine if the OSD ena flag should be on or off.

if (disp_x[9] || disp_y[8])
dena <= 0; // When disp_x > 511 or disp_y > 255, then turn off the OSD's output enable flag
else
dena <= 1; // otherwise, turn on the OSD output enable flag

if (~vde_in)
disp_y[8:0] <= 9'b111111111; // preset the disp_y counter to max while the vertical display is disabled

else if (hde_in && ~hde_pipe[0])
begin // isolate a single event at the beginning of the active display area

disp_x[9:0] <= 10'b0000000000; // clear the disp_x counter
if (!disp_y[8] | (disp_y[8:7] == 2'b11))
disp_y <= disp_y + 1; // only increment the disp_y counter if it hasn't reached its end

end
else if (!disp_x[9])
disp_x <= disp_x + 1;  // keep on adding to the disp_x counter until it reaches its end.

// **********************************************************************************************
// *** These delay pipes registers are explained in the 'assign's above
// **********************************************************************************************
dly1_disp_x <= disp_x;
dly2_disp_x <= dly1_disp_x;
dly3_disp_x <= dly2_disp_x;
dly4_disp_x <= dly3_disp_x;

dly1_disp_y <= disp_y;
dly2_disp_y <= dly1_disp_y;
dly3_disp_y <= dly2_disp_y;

dly1_letter <= letter;
dly2_letter <= dly1_letter;
dly3_letter <= dly2_letter;

dly1_dena   <= dena;
dly2_dena   <= dly1_dena;
dly3_dena   <= dly2_dena;
dly4_dena   <= dly3_dena;

// **********************************************************************************************
osd_ena_out  <= dly4_dena; // This is used to drive a graphics A/B switch which tells when the OSD graphics should be shown
// It needs to be delayed by the number of pixel clocks required for the above memories

end // ena

end // always@clk

endmodule

 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7747
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #319 on: November 12, 2019, 01:36:02 pm »
'assign osd_image[1:0] = char_line[(~dly4_disp_x[3:1])];' is the right line as your 8-bit font ram needs to be turned into 8 individual pixels.

Since I don't know how Quartus deals with a missing clock on a dual port ram, for now, feed a clock here:
-----------------
      .clk_b(clk),
-----------------

For all other unused inputs, set them to '0', so at least you are sure they are doing nothing...

Hopefully you can compile the thing and get a first picture.
« Last Edit: November 12, 2019, 01:38:43 pm by BrianHG »
 

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: FPGA VGA Controller for 8-bit computer
« Reply #320 on: November 12, 2019, 01:45:23 pm »
Yeah, getting it to compile was problematic - there were a few errors, but nothing insurmountable.

I'm not sure about the fix for this one, though:

Error (10663): Verilog HDL Port Connection error at multiport_gpu_ram.v(75): output or inout port "cmd_out" must be connected to a structural net expression

The line in question is this one:

Code: [Select]
reg [15:0] cmd_mux_out;              // **** I APPLIED FIX HERE - CHANGED TO 'WIRE'
wire [19:0] addr_mux_out;
wire [7:0] data_mux_out;

// create a GPU RAM instance
gpu_dual_port_ram_INTEL gpu_RAM(
.clk(clk),
.pc_ena_in(pc_ena_in),
.clk_b(clk_b),
.wr_en_b(wr_en_b),
.addr_a(addr_in_mux),
.addr_b(),
.data_in_b(),
.cmd_in(cmd_mux_in),
.addr_out_a(addr_mux_out),
.pc_ena_out(pc_ena_out),
.cmd_out(cmd_mux_out),    // *********** THIS LINE IS WHERE THE ERROR POINTS TO
.data_out_a(data_mux_out),
.data_out_b()
);

As stated in the code sample above, I changed the cmd_mux_out from a REG to a WIRE and it compiles now... Not sure whether that breaks something else later, though?

Off to see what the output is like on my screen....   :-BROKE

EDIT:

Just getting a black screen...

Here's the data I'm passing to the multiport_gpu_ram module:

Code: [Select]
multiport_gpu_ram gpu_RAM(

.clk(clk),
.pc_ena_in(pc_ena),
.clk_b(clk),
.write_ena_b(1'b1),

.addr_in_0(read_font_adr),
.addr_in_1(read_text_adr),
.addr_in_2(20'b0),
.addr_in_3(20'b0),
.addr_in_4(20'b0),
.addr_host_in(20'b0),

.cmd_in_0(16'b0),
.cmd_in_1(16'b0),
.cmd_in_2(16'b0),
.cmd_in_3(16'b0),
.cmd_in_4(16'b0),

.pc_ena_out(),

.addr_out_0(),
.addr_out_1(),
.addr_out_2(),
.addr_out_3(),
.addr_out_4(),
.addr_host_out(),

.cmd_out_0(),
.cmd_out_1(),
.cmd_out_2(),
.cmd_out_3(),
.cmd_out_4(),

.data_out_0(letter[7:0]),
.data_out_1(char_line[7:0]),
.data_out_2(),
.data_out_3(),
.data_out_4(),
.data_host_out()

);
« Last Edit: November 12, 2019, 01:47:28 pm by nockieboy »
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7747
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #321 on: November 12, 2019, 01:50:07 pm »
Does your altsyncram memory have anything in it?
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7747
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #322 on: November 12, 2019, 01:57:59 pm »
Yeah, getting it to compile was problematic - there were a few errors, but nothing insurmountable.

I'm not sure about the fix for this one, though:

Error (10663): Verilog HDL Port Connection error at multiport_gpu_ram.v(75): output or inout port "cmd_out" must be connected to a structural net expression

Something else is off, otherwise the "addr_out" would have a similar error.
 

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: FPGA VGA Controller for 8-bit computer
« Reply #323 on: November 12, 2019, 02:25:04 pm »
Does your altsyncram memory have anything in it?

Nope.  :palm:  Was just testing.  ;)

Still getting a black screen, though.  Latest files attached.

Error (10663): Verilog HDL Port Connection error at multiport_gpu_ram.v(75): output or inout port "cmd_out" must be connected to a structural net expression

Something else is off, otherwise the "addr_out" would have a similar error.

That's what I thought, but I can't see any differences that would cause the error for cmd_out and nothing else...

There is another error message after it - this is both together in the compilation report:

Error (10663): Verilog HDL Port Connection error at multiport_gpu_ram.v(75): output or inout port "cmd_out" must be connected to a structural net expression
Error (12152): Can't elaborate user hierarchy "vid_osd_generator:inst10|multiport_gpu_ram:gpu_RAM"

« Last Edit: November 12, 2019, 02:27:26 pm by nockieboy »
 

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: FPGA VGA Controller for 8-bit computer
« Reply #324 on: November 12, 2019, 02:38:57 pm »
Don't outputs from modules have to be wires?  So cmd_mux_out should be a wire that is assigned to a reg somewhere else?

Have tried this, with cmd_tmp_out being a wire on the cmd_out output of the module, and with the cmd_mux_out <= cmd_tmp_out in the always block, but still getting a black screen.
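
For reference, here is a stripped-down sketch of what I mean, as far as I understand the rule: the output port of an instantiated module has to land on a net (wire) in the parent, and the parent can then register it if it wants to. The child_sketch and parent_sketch modules are made up for this example and are not the real gpu_dual_port_ram_INTEL or multiport_gpu_ram.

```verilog
// Hypothetical child module standing in for the RAM wrapper.
module child_sketch (
    input  wire        clk,
    input  wire [15:0] cmd_in,
    output reg  [15:0] cmd_out   // a reg is fine on the inside of the child
);
    always @(posedge clk)
        cmd_out <= cmd_in;
endmodule

// Hypothetical parent module standing in for the mux wrapper.
module parent_sketch (
    input  wire        clk,
    input  wire [15:0] cmd_mux_in,
    output reg  [15:0] cmd_mux_out
);
    wire [15:0] cmd_tmp_out;     // net driven by the child's output port

    child_sketch u_child (
        .clk    (clk),
        .cmd_in (cmd_mux_in),
        .cmd_out(cmd_tmp_out)    // connecting a reg here is what triggers the port connection error
    );

    // Optional register stage in the parent. Note this adds one more clk
    // cycle of delay compared with using cmd_tmp_out directly.
    always @(posedge clk)
        cmd_mux_out <= cmd_tmp_out;
endmodule
```

If that extra register stage stays in, the cmd path is one clk later than the addr and data paths, which I'd guess needs accounting for in the pipe positions.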
 

