Author Topic: FPGA VGA Controller for 8-bit computer (Read 426364 times)

BrianHG · « **Reply #325 on:** November 12, 2019, 02:41:09 pm »

Your .mif file is still wrong, and it should be in the main file path.
Here is what you have now:
altsyncram_component.init_file = "../osd_mem.mif",

Code: [Select]

	defparam
		altsyncram_component.address_reg_b = "CLOCK1",
		altsyncram_component.clock_enable_input_a = "BYPASS",
		altsyncram_component.clock_enable_input_b = "BYPASS",
		altsyncram_component.clock_enable_output_a = "BYPASS",
		altsyncram_component.clock_enable_output_b = "BYPASS",
		altsyncram_component.indata_reg_b = "CLOCK1",
************************		altsyncram_component.init_file = "../osd_mem.mif",   *************************************
		altsyncram_component.intended_device_family = "Cyclone IV E",
		altsyncram_component.lpm_type = "altsyncram",
		altsyncram_component.numwords_a = NUM_WORDS,
		altsyncram_component.numwords_b = NUM_WORDS,
		altsyncram_component.operation_mode = "BIDIR_DUAL_PORT",
		altsyncram_component.outdata_aclr_a = "NONE",
		altsyncram_component.outdata_aclr_b = "NONE",
		altsyncram_component.outdata_reg_a = "CLOCK0",
		altsyncram_component.outdata_reg_b = "CLOCK1",
		altsyncram_component.power_up_uninitialized = "FALSE",
		altsyncram_component.read_during_write_mode_port_a = "OLD_DATA",
		altsyncram_component.read_during_write_mode_port_b = "OLD_DATA",
		altsyncram_component.widthad_a = ADDR_SIZE,
		altsyncram_component.widthad_b = ADDR_SIZE,
		altsyncram_component.width_a = 8,
		altsyncram_component.width_b = 8,
		altsyncram_component.width_byteena_a = 1,
		altsyncram_component.width_byteena_b = 1,
		altsyncram_component.wrcontrol_wraddress_reg_b = "CLOCK1",
		altsyncram_component.init_file = "gpu_16K_RAM.mif";

also, make:
   .write_ena_b(1'b1),

into a
   .write_ena_b(1'b0),

You're not writing anything to RAM....

The only difference I could find is this: (See red)
      // this moves the data up one word at a time, dropping the top most 8 bits
   addr_pipe[19:0]            <= addr_mux_out;
   addr_pipe[DEMUX_PIPE_TOP*20+19:1*20]   <= addr_pipe[(DEMUX_PIPE_TOP-1)*20+19:0*20];

   cmd_pipe[15:0]         <= cmd_mux_out[15:0];
   cmd_pipe[DEMUX_PIPE_TOP*16+15:1*16]      <= cmd_pipe[(DEMUX_PIPE_TOP-1)*16+15:0*16];

BrianHG · « **Reply #326 on:** November 12, 2019, 02:42:35 pm »

Quote from: nockieboy on November 12, 2019, 02:38:57 pm

Don't outputs from modules have to be wires? So cmd_mux_out should be a wire that is assigned to a reg somewhere else?

Have tried this, with cmd_tmp_out being a wire on the cmd_out output of the module, and with the cmd_mux_out <= cmd_tmp_out in the always block, but still getting a black screen.

cmd_out is currently not being used, so I'm not worried about that.

Check your compiler reports and see how much 'ram' is being used...

nockieboy · « **Reply #327 on:** November 12, 2019, 03:21:47 pm »

Quote from: BrianHG on November 12, 2019, 02:41:09 pm

Your .mif file is still wrong and it should be in the main file path.
here is what you have now:
altsyncram_component.init_file = "../osd_mem.mif",

Code: [Select]
	defparam
		altsyncram_component.address_reg_b = "CLOCK1",
		altsyncram_component.clock_enable_input_a = "BYPASS",
		altsyncram_component.clock_enable_input_b = "BYPASS",
		altsyncram_component.clock_enable_output_a = "BYPASS",
		altsyncram_component.clock_enable_output_b = "BYPASS",
		altsyncram_component.indata_reg_b = "CLOCK1",
************************		altsyncram_component.init_file = "../osd_mem.mif",   *************************************
		altsyncram_component.intended_device_family = "Cyclone IV E",
		altsyncram_component.lpm_type = "altsyncram",
		altsyncram_component.numwords_a = NUM_WORDS,
		altsyncram_component.numwords_b = NUM_WORDS,
		altsyncram_component.operation_mode = "BIDIR_DUAL_PORT",
		altsyncram_component.outdata_aclr_a = "NONE",
		altsyncram_component.outdata_aclr_b = "NONE",
		altsyncram_component.outdata_reg_a = "CLOCK0",
		altsyncram_component.outdata_reg_b = "CLOCK1",
		altsyncram_component.power_up_uninitialized = "FALSE",
		altsyncram_component.read_during_write_mode_port_a = "OLD_DATA",
		altsyncram_component.read_during_write_mode_port_b = "OLD_DATA",
		altsyncram_component.widthad_a = ADDR_SIZE,
		altsyncram_component.widthad_b = ADDR_SIZE,
		altsyncram_component.width_a = 8,
		altsyncram_component.width_b = 8,
		altsyncram_component.width_byteena_a = 1,
		altsyncram_component.width_byteena_b = 1,
		altsyncram_component.wrcontrol_wraddress_reg_b = "CLOCK1",
		altsyncram_component.init_file = "gpu_16K_RAM.mif";

Darn it - missed the component.init setting near the top - added another at the bottom. Beginner's mistake.

Quote from: BrianHG on November 12, 2019, 02:41:09 pm

also, make:
.write_ena_b(1'b1),

into a
.write_ena_b(1'b0),

You're not writing anything to RAM....

Chalk that one up to confusion over active high/low signals.

Quote from: BrianHG on November 12, 2019, 02:41:09 pm

The only difference I could find is this: (See red)
      // this moves the data up one word at a time, dropping the top most 8 bits
   addr_pipe[19:0]            <= addr_mux_out;
   addr_pipe[DEMUX_PIPE_TOP*20+19:1*20]   <= addr_pipe[(DEMUX_PIPE_TOP-1)*20+19:0*20];

   cmd_pipe[15:0]         <= cmd_mux_out[15:0];
   cmd_pipe[DEMUX_PIPE_TOP*16+15:1*16]      <= cmd_pipe[(DEMUX_PIPE_TOP-1)*16+15:0*16];

Okay, have removed the bit in red, but it's the same for the data_pipe as well (that specifies [7:0] and doesn't generate an error). Still getting the same error after compilation - must be to do with outputs from the module needing to be wires.

Quote from: BrianHG on November 12, 2019, 02:42:35 pm

Check your compiler reports and see how much 'ram' is being used...

According to the compilation report:

Code: [Select]

Flow Status                 Successful - Tue Nov 12 15:16:18 2019
Quartus Prime Version       18.1.0 Build 625 09/12/2018 SJ Lite Edition
Family	                    Cyclone IV E
Device	                    EP4CE6E22C8
Timing Models	            Final
Total logic elements	    168 / 6,272 ( 3 % )
Total registers	            127
Total pins	            10 / 92 ( 11 % )
Total virtual pins	    0
Total memory bits	    8,224 / 276,480 ( 3 % )
Embedded Multiplier 9-bit elements	0 / 30 ( 0 % )
Total PLLs	            1 / 2 ( 50 % )

Still just a black screen...

BrianHG · « **Reply #328 on:** November 12, 2019, 03:34:11 pm »

Quote from: nockieboy on November 12, 2019, 03:21:47 pm

Code: [Select]
Flow Status                 Successful - Tue Nov 12 15:16:18 2019
Quartus Prime Version       18.1.0 Build 625 09/12/2018 SJ Lite Edition
Family                      Cyclone IV E
Device                      EP4CE6E22C8
Timing Models               Final
Total logic elements        168 / 6,272 ( 3 % )
Total registers             127
Total pins                  10 / 92 ( 11 % )
Total virtual pins          0
Total memory bits           8,224 / 276,480 ( 3 % )
Embedded Multiplier 9-bit elements	0 / 30 ( 0 % )
Total PLLs                  1 / 2 ( 50 % )

Still just a black screen...

Total memory bits 8,224 / 276,480 ( 3 % ) OOOPSSSS this should be at least 131072....
The memory isn't being allocated...
Try this, all the parameters where you have NUM_WORDS = 2**ADDR_SIZE, change that to NUM_WORDS=16384.

There is another thing to try if this doesn't work.
Inside the 'altsyncram_component', hard wire the ADDR_SIZE to 14 and NUM_WORDS to 16384.
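
For reference, the 131072 figure is just words times bits per word. A throwaway, simulation-only sketch of that arithmetic (the module name `mem_bits_check` and the `WIDTH` parameter are made up here for illustration, they are not part of the project):

```verilog
// Simulation-only sketch: how many block-RAM bits the GPU RAM should occupy.
// mem_bits_check and WIDTH are illustrative names, not from the project.
module mem_bits_check;
    localparam ADDR_SIZE = 14;              // address bus width
    localparam NUM_WORDS = 2 ** ADDR_SIZE;  // 16384 words
    localparam WIDTH     = 8;               // bits per word

    initial begin
        // 16384 words * 8 bits = 131072 bits
        $display("expected memory bits = %0d", NUM_WORDS * WIDTH);
        $finish;
    end
endmodule
```

If the 'Total memory bits' line in the flow summary is well below that number, part of the RAM has probably been optimised away.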

nockieboy · « **Reply #329 on:** November 12, 2019, 03:43:30 pm »

Okay, nothing happening still. I've hardwired the values into the ALTSYNCRAM component, still no joy.

Here's my gpu_dual_port_ram_INTEL code:

Code: [Select]

module gpu_dual_port_ram_INTEL (

	// inputs
	input clk,
	input [3:0] pc_ena_in,
	input clk_b,
	input wr_en_b,
	input [19:0] addr_a,
	input [19:0] addr_b,
	input [7:0] data_in_b,
	input [15:0] cmd_in,
	
	// registered outputs
	output reg [19:0] addr_out_a,
	output reg [3:0] pc_ena_out,
	output reg [15:0] cmd_out,
	
	// direct outputs
	output wire [7:0] data_out_a,
	output wire [7:0] data_out_b
	
);

// define the maximum address bit
parameter ADDR_SIZE = 14;

// define the memory size (number of words) - this allows RAM sizes other than powers of 2
// but defaults to power-of-two sizing based on ADDR_SIZE if not otherwise specified
parameter NUM_WORDS = 16384;

// define delay pipe registers
reg [19:0] rd_addr_pipe_a;
reg [15:0] cmd_pipe;
reg [3:0] pc_ena_pipe;

// ****************************************************************************************************************************
// Dual-port GPU RAM
// 
// Port A 				- read only by GPU
// Port B 				- read/writeable by host system
// Data buses 			- 8 bits / 1 byte wide
// Address buses 		- ADDR_SIZE wide (14 bits default)
// Memory word size 	- NUM_WORDS (16384 bytes default)
// ****************************************************************************************************************************
	altsyncram	altsyncram_component (
				.clock0 (clk),
				.wren_a (1'b0),
				.address_b (addr_b[13:0]),
				.clock1 (clk_b),
				.data_b (data_in_b),
				.wren_b (wr_en_b),
				.address_a (addr_a[13:0]),
				.data_a (8'b00000000),
				.q_a (data_out_a),
				.q_b (data_out_b),
				.aclr0 (1'b0),
				.aclr1 (1'b0),
				.addressstall_a (1'b0),
				.addressstall_b (1'b0),
				.byteena_a (1'b1),
				.byteena_b (1'b1),
				.clocken0 (1'b1),
				.clocken1 (1'b1),
				.clocken2 (1'b1),
				.clocken3 (1'b1),
				.eccstatus (),
				.rden_a (1'b1),
				.rden_b (1'b1));
				
	defparam
		altsyncram_component.address_reg_b = "CLOCK1",
		altsyncram_component.clock_enable_input_a = "BYPASS",
		altsyncram_component.clock_enable_input_b = "BYPASS",
		altsyncram_component.clock_enable_output_a = "BYPASS",
		altsyncram_component.clock_enable_output_b = "BYPASS",
		altsyncram_component.indata_reg_b = "CLOCK1",
		altsyncram_component.init_file = "../gpu_16K_RAM.mif",
		altsyncram_component.intended_device_family = "Cyclone IV E",
		altsyncram_component.lpm_type = "altsyncram",
		altsyncram_component.numwords_a = 16384,
		altsyncram_component.numwords_b = 16384,
		altsyncram_component.operation_mode = "BIDIR_DUAL_PORT",
		altsyncram_component.outdata_aclr_a = "NONE",
		altsyncram_component.outdata_aclr_b = "NONE",
		altsyncram_component.outdata_reg_a = "CLOCK0",
		altsyncram_component.outdata_reg_b = "CLOCK1",
		altsyncram_component.power_up_uninitialized = "FALSE",
		altsyncram_component.read_during_write_mode_port_a = "OLD_DATA",
		altsyncram_component.read_during_write_mode_port_b = "OLD_DATA",
		altsyncram_component.widthad_a = 14,
		altsyncram_component.widthad_b = 14,
		altsyncram_component.width_a = 8,
		altsyncram_component.width_b = 8,
		altsyncram_component.width_byteena_a = 1,
		altsyncram_component.width_byteena_b = 1,
		altsyncram_component.wrcontrol_wraddress_reg_b = "CLOCK1";
		
// ****************************************************************************************************************************

always @(posedge clk) begin

	// **************************************************************************************************************************
	// *** Create a serial pipe where the PIPE_DELAY parameter selects the pixel count delay for the xxx_in to the xxx_out ports
	// **************************************************************************************************************************
	rd_addr_pipe_a <= addr_a;
	addr_out_a <= rd_addr_pipe_a;
	
	cmd_pipe <= cmd_in;
	cmd_out <= cmd_pipe;
	
	pc_ena_pipe <= pc_ena_in;
	pc_ena_out <= pc_ena_pipe;
	// **************************************************************************************************************************
		
end

endmodule

Reported memory usage is still 8,224 / 276,480 (3%)

BrianHG · « **Reply #330 on:** November 12, 2019, 03:47:21 pm »

way too few logic elements too:
Total logic elements 168 / 6,272 ( 3 % )
This is enough just for the sync generator and IO.

What was the logic element count when the project used to display text?

It's time to learn how to read the compiler reports, and see what is being left out. I have a feeling it may be connected to the odd error where you placed a 'REG' to fix the bug.

Can you zip it and send a copy? I believe it's a .html file.

Worst case, I may have to install Quartus on one of my PCs.

nockieboy · « **Reply #331 on:** November 12, 2019, 03:58:59 pm »

Not sure what's going on here in the compilation report:

Code: [Select]

Warning (14284): Synthesized away the following node(s):
	Warning (14285): Synthesized away the following RAM node(s):
		Warning (14320): Synthesized away node "vid_osd_generator:inst10|multiport_gpu_ram:gpu_RAM|gpu_dual_port_ram_INTEL:gpu_RAM|altsyncram:altsyncram_component|altsyncram_tnf2:auto_generated|ram_block1a8"
		Warning (14320): Synthesized away node "vid_osd_generator:inst10|multiport_gpu_ram:gpu_RAM|gpu_dual_port_ram_INTEL:gpu_RAM|altsyncram:altsyncram_component|altsyncram_tnf2:auto_generated|ram_block1a9"
		Warning (14320): Synthesized away node "vid_osd_generator:inst10|multiport_gpu_ram:gpu_RAM|gpu_dual_port_ram_INTEL:gpu_RAM|altsyncram:altsyncram_component|altsyncram_tnf2:auto_generated|ram_block1a10"
		Warning (14320): Synthesized away node "vid_osd_generator:inst10|multiport_gpu_ram:gpu_RAM|gpu_dual_port_ram_INTEL:gpu_RAM|altsyncram:altsyncram_component|altsyncram_tnf2:auto_generated|ram_block1a11"
		Warning (14320): Synthesized away node "vid_osd_generator:inst10|multiport_gpu_ram:gpu_RAM|gpu_dual_port_ram_INTEL:gpu_RAM|altsyncram:altsyncram_component|altsyncram_tnf2:auto_generated|ram_block1a12"
		Warning (14320): Synthesized away node "vid_osd_generator:inst10|multiport_gpu_ram:gpu_RAM|gpu_dual_port_ram_INTEL:gpu_RAM|altsyncram:altsyncram_component|altsyncram_tnf2:auto_generated|ram_block1a13"
		Warning (14320): Synthesized away node "vid_osd_generator:inst10|multiport_gpu_ram:gpu_RAM|gpu_dual_port_ram_INTEL:gpu_RAM|altsyncram:altsyncram_component|altsyncram_tnf2:auto_generated|ram_block1a14"
		Warning (14320): Synthesized away node "vid_osd_generator:inst10|multiport_gpu_ram:gpu_RAM|gpu_dual_port_ram_INTEL:gpu_RAM|altsyncram:altsyncram_component|altsyncram_tnf2:auto_generated|ram_block1a15"

Quartus isn't saving the compilation report as a file - there's probably a setting somewhere for it - although there is an export option in the right-click menu that doesn't seem to work.

BrianHG · « **Reply #332 on:** November 12, 2019, 04:19:24 pm »

Quote from: nockieboy on November 12, 2019, 03:58:59 pm

Not sure what's going on here in the compilation report:

Code: [Select]
Warning (14284): Synthesized away the following node(s):
	Warning (14285): Synthesized away the following RAM node(s):
		Warning (14320): Synthesized away node "vid_osd_generator:inst10|multiport_gpu_ram:gpu_RAM|gpu_dual_port_ram_INTEL:gpu_RAM|altsyncram:altsyncram_component|altsyncram_tnf2:auto_generated|ram_block1a8"
		Warning (14320): Synthesized away node "vid_osd_generator:inst10|multiport_gpu_ram:gpu_RAM|gpu_dual_port_ram_INTEL:gpu_RAM|altsyncram:altsyncram_component|altsyncram_tnf2:auto_generated|ram_block1a9"
		Warning (14320): Synthesized away node "vid_osd_generator:inst10|multiport_gpu_ram:gpu_RAM|gpu_dual_port_ram_INTEL:gpu_RAM|altsyncram:altsyncram_component|altsyncram_tnf2:auto_generated|ram_block1a10"
		Warning (14320): Synthesized away node "vid_osd_generator:inst10|multiport_gpu_ram:gpu_RAM|gpu_dual_port_ram_INTEL:gpu_RAM|altsyncram:altsyncram_component|altsyncram_tnf2:auto_generated|ram_block1a11"
		Warning (14320): Synthesized away node "vid_osd_generator:inst10|multiport_gpu_ram:gpu_RAM|gpu_dual_port_ram_INTEL:gpu_RAM|altsyncram:altsyncram_component|altsyncram_tnf2:auto_generated|ram_block1a12"
		Warning (14320): Synthesized away node "vid_osd_generator:inst10|multiport_gpu_ram:gpu_RAM|gpu_dual_port_ram_INTEL:gpu_RAM|altsyncram:altsyncram_component|altsyncram_tnf2:auto_generated|ram_block1a13"
		Warning (14320): Synthesized away node "vid_osd_generator:inst10|multiport_gpu_ram:gpu_RAM|gpu_dual_port_ram_INTEL:gpu_RAM|altsyncram:altsyncram_component|altsyncram_tnf2:auto_generated|ram_block1a14"
		Warning (14320): Synthesized away node "vid_osd_generator:inst10|multiport_gpu_ram:gpu_RAM|gpu_dual_port_ram_INTEL:gpu_RAM|altsyncram:altsyncram_component|altsyncram_tnf2:auto_generated|ram_block1a15"

Quartus isn't saving the compilation report as a file - there's probably a setting somewhere for it - although there is an export option in the right-click menu...

That part is just the rough overview. You should be able to enter a complete breakdown usage report where the compiler tells you everything it has done with every file in your project.

No memory should have been simplified away. Something fishy is happening.

Ok, try this one weird thing I used to have to do:

Inside "gpu_dual_port_ram_INTEL.v":

Code: [Select]

wire [7:0] sub_data_out_a;        // ***NEW***
wire [7:0] data_out_a = sub_data_out_a[7:0];      // ***NEW***
wire [7:0] sub_data_out_b;      // ***NEW***
wire [7:0] data_out_b = sub_data_out_b[7:0];      // ***NEW***

	altsyncram	altsyncram_component (
				.clock0 (clk),
				.wren_a (1'b1),
				.address_b (addr_b[ADDR_SIZE - 1:0]),
				.clock1 (clk_b),
				.data_b (data_in_b),
				.wren_b (wr_en_b),
				.address_a (addr_a[ADDR_SIZE - 1:0]),
				.data_a (8'b00000000),
				.q_a (sub_data_out_a),      // ***NEW******NEW******NEW******NEW***
				.q_b (sub_data_out_b),      // ***NEW******NEW******NEW******NEW***
				.aclr0 (1'b0),
				.aclr1 (1'b0),
				.addressstall_a (1'b0),
				.addressstall_b (1'b0),
				.byteena_a (1'b1),
				.byteena_b (1'b1),
				.clocken0 (1'b1),
				.clocken1 (1'b1),
				.clocken2 (1'b1),
				.clocken3 (1'b1),
				.eccstatus (),
				.rden_a (1'b1),
				.rden_b (1'b1));

ALSO: you have not set the ".defparam" for the multiport_gpu_ram gpu_RAM() in the osd generator, though, the sub_modules should have ended up using their default values.

I'm downloading Quartus Prime 18.1 now. How big is your project in .zip?

nockieboy · « **Reply #333 on:** November 12, 2019, 05:11:44 pm »

Quote from: BrianHG on November 12, 2019, 04:19:24 pm

Ok, try this one weird thing I used to have to do:

Inside "gpu_dual_port_ram_INTEL.v":
Code: [Select]
wire [7:0] sub_data_out_a;        // ***NEW***
wire [7:0] data_out_a = sub_data_out_a[7:0];      // ***NEW***
wire [7:0] sub_data_out_b;      // ***NEW***
wire [7:0] data_out_b = sub_data_out_b[7:0];      // ***NEW***

Quartus freaks with those declarations - I've had to change:

Code: [Select]

wire [7:0] data_out_a = sub_data_out_a[7:0];      // ***NEW***
wire [7:0] data_out_b = sub_data_out_b[7:0];      // ***NEW***

as they're already declared as outputs. Changed them to:

Code: [Select]

assign data_out_a = sub_data_out_a[7:0];      // ***NEW***
assign data_out_b = sub_data_out_b[7:0];      // ***NEW***

... which compiles, but still to a black screen.

Quote from: BrianHG on November 12, 2019, 04:19:24 pm

ALSO: you have not set the ".defparam" for the multiport_gpu_ram gpu_RAM() in the osd generator, though, the sub_modules should have ended up using their default values.

Yes, spotted that earlier but didn't think it'd matter as the sub-modules should use their default values.

Quote from: BrianHG on November 12, 2019, 04:19:24 pm

I'm downloading Quartus Prime 18.1 now. How big is your project in .zip?

Just over 7 MB - too big for the forum, unless I just archive the main files and ignore the sub-folders (db, greybox_tmp, incremental_db, output_files)?

BrianHG · « **Reply #334 on:** November 12, 2019, 06:13:31 pm »

I've confirmed that 'gpu_dual_port_ram_INTEL.v' compiles fine as its own project. See here:

As you can see, Quartus Prime reports 45% of your FPGA ram being used.

Next, I made a stand-alone project with 'multiport_gpu_ram.v' as the top level, with 'gpu_dual_port_ram_INTEL.v' being accessed within. See here:

As you can see, Quartus Prime still reports 45% of your FPGA ram being used. So, there is nothing wrong with the 5 channel multiport module. Next, I'll try the OSD code.

And this is with the cmd_mux_out as a 'WIRE':

Code: [Select]

wire [15:0] cmd_mux_out;

Next, I'll try the OSD module. Things like this usually come down to a typo, or you forgot to 'ADD SOURCE FILES' to your project and make your top .bdf file the top of the hierarchy.

YUP: The error is in the OSD module. Give me a few minutes...

See, we dropped to 3% ram usage. Now, why? Something is amiss.

nockieboy · « **Reply #335 on:** November 12, 2019, 07:25:02 pm »

Clearly something up with the osd module...

I have the project zipped and in a shared dropbox folder - if you pm me your email, I'll add you to it so you can download the entire project.

BrianHG · « **Reply #336 on:** November 12, 2019, 07:27:14 pm »

Found it!

In the OSD generator, I had:

Code: [Select]

assign read_text_adr[19:10] = 1'h4;
....
assign read_font_adr[19:10] = 1'h2;

When it should have been:

Code: [Select]

assign read_text_adr[19:10] = 10'h4;        // my mistake, I had 1 bit instead of 10 bits
......
assign read_font_adr[19:10] = 10'h2;        // my mistake, I had 1 bit instead of 10 bits

The compiler was shorting all the higher addresses to GND. This means only the first [9:0] of x&y counters made it to the memory and Quartus said, hey, all other addresses are shorted to GND, so, I only need 8192 bits of ram...
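
To see why the width prefix matters, here is a minimal simulation-only sketch (the module and wire names are invented for illustration). A sized literal keeps only as many low bits as its size says, so `1'h4` (binary 100) keeps just its lowest bit, which is 0, and that 0 is then zero-extended across the whole 10-bit slice:

```verilog
// Simulation-only sketch of sized-literal truncation.
// literal_width_demo and its wire names are illustrative, not from the project.
module literal_width_demo;
    wire [9:0] narrow = 1'h4;   // 1-bit literal: 4 = 'b100 is cut down to its LSB, 0
    wire [9:0] wide   = 10'h4;  // 10-bit literal: keeps the value 4

    initial begin
        #1;  // let the continuous assignments settle
        $display("1'h4  -> %0d", narrow);  // prints 0
        $display("10'h4 -> %0d", wide);    // prints 4
        $finish;
    end
endmodule
```

With the upper ten address bits stuck at 0, only addresses [9:0] can ever change, which is 1024 bytes, or the 8192 bits Quartus kept. Tools generally flag this kind of truncation as a warning rather than an error, so it is worth searching the messages for it.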

There were a few other little mistakes like:
   //output reg [19:0] addr_host_out, I never specified this port to be made...

   .wr_en_b(write_ena_b), // **** error, you wrote (wr_en_b), it should be (write_ena_b)

   .data_out_b(data_host_out) // ****** error, you had this field empty.

and all this wasn't updated from my earlier post:

Code: [Select]

parameter   PIXEL_PIPE = 3;  // This externally set parameter defines the number of 25MHz pixels it takes to receive a new pixel from a presented address

localparam CLK_CYCLES_MUX = 1;	// adjust this parameter to the number of 'clk' cycles it takes to select 1 of 5 muxed outputs
localparam CLK_CYCLES_RAM = 2;	// adjust this figure to the number of clock cycles the DP_ram takes to retrieve valid data from the read address in
localparam CLK_CYCLES_PIX = 5;	// adjust this figure to the number of 125MHz clocks there are for each pixel, IE number of muxed inputs for each pixel

//  This parameter begins with the wanted top number of 125Mhz pixel clock headroom for the pixel pipe, then subtracts the additional 125MHz clocks used by the _MUX and _RAM cycles used to arrive at the first pixel out, DEMUX_PIPE_TOP position.
localparam  DEMUX_PIPE_TOP    =  (( (PIXEL_PIPE - 1) * CLK_CYCLES_PIX ) - 1) - CLK_CYCLES_MUX - CLK_CYCLES_RAM;

localparam MUX_0_POS = DEMUX_PIPE_TOP - 0;  // pixel offset positions in their respective synchronisation
localparam MUX_1_POS = DEMUX_PIPE_TOP - 1;	  // pipelines (where the pixels will be found in the pipeline
localparam MUX_2_POS = DEMUX_PIPE_TOP - 2;	  // when pc_ena[3:0]==0).
localparam MUX_3_POS = DEMUX_PIPE_TOP - 3;	  //
localparam MUX_4_POS = DEMUX_PIPE_TOP - 4;	//

// Now that we know the DEMUX_PIPE_TOP, we can assign the top size of the 3 pipe regs

reg [DEMUX_PIPE_TOP*8+7:0] data_pipe;
reg [DEMUX_PIPE_TOP*20+19:0] addr_pipe;
reg [DEMUX_PIPE_TOP*16+15:0] cmd_pipe;

always @(posedge clk) begin

// We also need to limit the pipe in the 3 ' <= '

	data_pipe[7:0] 	                   	<= data_mux_out[7:0];		// fill the first 8-bit word in the register pipe with data from RAM
	data_pipe[DEMUX_PIPE_TOP*8+7:1*8]	   <= data_pipe[ (DEMUX_PIPE_TOP-1) *8+7:0*8];	// shift over the next 9 words in this 10 word, 8-bit wide pipe
																	// this moves the data up one word at a time, dropping the top most 8 bits
	addr_pipe[19:0]	                  	<= addr_mux_out;
	addr_pipe[DEMUX_PIPE_TOP*20+19:1*20]	<= addr_pipe[ (DEMUX_PIPE_TOP-1) *20+19:0*20];
	
	cmd_pipe[15:0]	                     	<= cmd_mux_out[15:0];
	cmd_pipe[DEMUX_PIPE_TOP*16+15:1*16]	   <= cmd_pipe[ (DEMUX_PIPE_TOP-1) *16+15:0*16];
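
As a quick check of the pipe sizing, the following simulation-only sketch (module name invented for illustration, and assuming PIXEL_PIPE is left at 3 as above) evaluates the same expressions and prints the results:

```verilog
// Simulation-only sketch: evaluate the pipe-depth expressions from the code above.
// pipe_depth_check is an illustrative name, not a file in the project.
module pipe_depth_check;
    parameter  PIXEL_PIPE     = 3;
    localparam CLK_CYCLES_MUX = 1;
    localparam CLK_CYCLES_RAM = 2;
    localparam CLK_CYCLES_PIX = 5;
    localparam DEMUX_PIPE_TOP = (((PIXEL_PIPE - 1) * CLK_CYCLES_PIX) - 1)
                                - CLK_CYCLES_MUX - CLK_CYCLES_RAM;

    initial begin
        $display("DEMUX_PIPE_TOP = %0d", DEMUX_PIPE_TOP);          // 6 when PIXEL_PIPE = 3
        $display("data_pipe bits = %0d", DEMUX_PIPE_TOP * 8 + 8);  // 56, i.e. 7 words of 8 bits
        $display("MUX_4_POS      = %0d", DEMUX_PIPE_TOP - 4);      // 2
        $finish;
    end
endmodule
```
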

I've attached the latest Verilog files for you to use. Also don't forget to update our top block diagram file with the OSD generator's new pipeline delay, as it may still be 4. It might be best to re-generate the symbol files, then clear and re-insert the OSD generator in your block diagram to clear out any old junk.

There is only the matter of patching your 'mux' in the multiport ram module, however, you should now be getting a picture. If so, I'll explain the patch and everything should work.

nockieboy · « **Reply #337 on:** November 12, 2019, 08:30:30 pm »

Quote from: BrianHG on November 12, 2019, 07:27:14 pm

Found it!

Well done!

Quote from: BrianHG on November 12, 2019, 07:27:14 pm

...I've attached the latest Verilog files for you to use. Also don't forget to update our top block diagram file with the OSD generator's new pipeline delay, as it may still be 4. It might be best to re-generate the symbol files, then clear and re-insert the OSD generator in your block diagram to clear out any old junk.

No worries, all done. Was so excited I rushed to see it working and got the black screen again...

Quote from: BrianHG on November 12, 2019, 07:27:14 pm

There is only the matter of patching your 'mux' in the multiport ram module, however, you should now be getting a picture. If so, I'll explain the patch and everything should work.

Nope - no picture - still just a black screen. :-\

BrianHG · « **Reply #338 on:** November 13, 2019, 07:42:29 am »

@nockieboy,

I'M GONNA MURDER YOU!!!!!!!

Ok, you are god damn lucky I still have a Quartus 9.1 full install, with its built-in high speed logic simulator.

Let's begin.
Step 1. Simulate the 'gpu_dual_port_ram_INTEL.v'.
I made a project with only the ram with the .mif file like so:

I set up a simulation feeding the above inputs to see what the outputs would look like when beginning the read address just before the ASCII text begins, where you have stored 0,1,2,3,4,5,...:

So far, the read of the data looks fine.

Step 2. Simulate the 'multiport_gpu_ram.v'.
I made the project feed the clock, ena, all 4 addresses, all 4 cmds, even the 'host' address & its read results. (I used a 484 pin Cyclone III to get the IOs)
See in green the 'host' address wiring is taken from the second addr_in.

This is what the simulator gave me.

WTF? Addr ports 1 and 2 seem to read their data out properly, yet the data out for port addr4 is all 0s. Also, a weird thing: the 'HOST' data out, which is also fed addr2, reads the right data, BUT it suddenly goes 'UU' (undefined), then goes back to 0.

Without that 'HOST' data out, seeing the data being erased, I would have never found the bug.
In your code, you have:

Code: [Select]

// ****************************************************************************************************************************
// Dual-port GPU RAM
// 
// Port A 				- read only by GPU
// Port B 				- read/writeable by host system
// Data buses 			- 8 bits / 1 byte wide
// Address buses 		- ADDR_SIZE wide (14 bits default)
// Memory word size 	- NUM_WORDS (16384 bytes default)
// ****************************************************************************************************************************
	altsyncram	altsyncram_component (
				.clock0 (clk),
				.wren_a (1'b1),  ************************F--K******************
				.address_b (addr_b[ADDR_SIZE - 1:0]),
				.clock1 (clk_b),
				.data_b (data_in_b),
				.wren_b (wr_en_b),
				.address_a (addr_a[ADDR_SIZE - 1:0]),
				.data_a (8'b00000000),
				.q_a (data_out_a),
				.q_b (data_out_b),
				.aclr0 (1'b0),
				.aclr1 (1'b0),
				.addressstall_a (1'b0),
				.addressstall_b (1'b0),
				.byteena_a (1'b1),
				.byteena_b (1'b1),
				.clocken0 (1'b1),
				.clocken1 (1'b1),
				.clocken2 (1'b1),
				.clocken3 (1'b1),
				.eccstatus (),
				.rden_a (1'b1),
				.rden_b (1'b1));
				
	defparam

.wren_a (1'b1), **********F--K***********

You made the Write Enable for the read address forced on!!!!!!!!
Every address we sent to be read was instead written clear to all 0's.
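
To make the effect concrete, here is a small behavioural sketch (simulation only; this is a hand-written stand-in and not the real altsyncram model, and every name in it is made up for illustration). The port returns the old contents on a read, but because write enable is tied high and the write data is tied to zero, each location is wiped the moment it is read:

```verilog
// Simulation-only sketch: a tiny behavioural RAM port with write enable stuck high.
// wren_stuck_demo, mem, q, etc. are illustrative; this is NOT the altsyncram model.
module wren_stuck_demo;
    reg        clk  = 0;
    reg  [3:0] addr = 0;
    reg  [7:0] mem [0:15];
    reg  [7:0] q;
    wire       wren   = 1'b1;   // the bug: should be 1'b0 for a read-only port
    wire [7:0] data_a = 8'h00;  // write data tied to zero, as in the instance above
    integer    i;

    always @(posedge clk) begin
        q <= mem[addr];                 // read returns the old data...
        if (wren) mem[addr] <= data_a;  // ...and the location is then overwritten
    end

    initial begin
        for (i = 0; i < 16; i = i + 1) mem[i] = 8'h30 + i;  // arbitrary non-zero preload
        for (i = 0; i < 4; i = i + 1) begin                 // first pass: read 0..3
            addr = i; #1 clk = 1; #1 clk = 0;
        end
        for (i = 0; i < 4; i = i + 1) begin                 // second pass: same addresses
            addr = i; #1 clk = 1; #1 clk = 0;
            $display("second read of addr %0d = %h", i, q); // contents are now 00
        end
        $finish;
    end
endmodule
```

Changing `wren` to 1'b0 in the sketch makes the second pass return the preloaded values again.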

It gets worse. In your mux algorithm in the 'multiport_gpu_ram.v' module, you did this:

Quote

   // perform 5:1 mux for all inputs to the dual-port RAM
   case (pc_ena_in[2:0])
      3'b000 : begin //******** Excellent, this is state 0 and you made the case 3'b000 which equals 0.
                  addr_in_mux <= addr_in_0;
                  cmd_mux_in <= cmd_in_0;
               end
      3'b001 : begin //******** Excellent, this is state 1 and you made the case 3'b001 which equals 1.
                  addr_in_mux <= addr_in_1;
                  cmd_mux_in <= cmd_in_1;
               end
      3'b011 : begin //******** Huh? What? This is state 2 and you made the case 3'b011 which equals 3?
                  addr_in_mux <= addr_in_2;
                  cmd_mux_in <= cmd_in_2;
               end
      3'b100 : begin //******** Huh? What? This is state 3 and you made the case 3'b100 which equals 4?
                  addr_in_mux <= addr_in_3;
                  cmd_mux_in <= cmd_in_3;
               end
      3'b101 : begin //******** Huh? What? This is state 4 and you made the case 3'b101 which equals 5?
                  addr_in_mux <= addr_in_4;
                  cmd_mux_in <= cmd_in_4;
               end
   endcase
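
A rough simulation-only sketch of what that encoding does, assuming pc_ena_in simply counts 0 to 4 for each pixel (the module name and the `sel` register are invented for illustration; `sel` stands in for addr_in_mux and records which input number was picked):

```verilog
// Simulation-only sketch of the original (wrong) case encoding.
// mux_case_demo and sel are illustrative names; assumes pc_ena_in counts 0..4.
module mux_case_demo;
    reg        clk = 0;
    reg  [2:0] pc_ena_in = 0;
    reg  [7:0] sel;           // index of the input that was selected
    integer    s;

    always @(posedge clk)
        case (pc_ena_in)
            3'b000 : sel <= 8'd0;
            3'b001 : sel <= 8'd1;
            3'b011 : sel <= 8'd2;   // should be 3'b010
            3'b100 : sel <= 8'd3;   // should be 3'b011
            3'b101 : sel <= 8'd4;   // should be 3'b100; state 5 never occurs
        endcase                     // no matching item: sel keeps its previous value

    initial begin
        for (s = 0; s < 5; s = s + 1) begin
            pc_ena_in = s; #1 clk = 1; #1 clk = 0;
            $display("state %0d -> input %0d selected", s, sel);
        end
        $finish;
    end
endmodule
```

Under that assumption, state 2 matches nothing and just repeats input 1, inputs 2 and 3 arrive one state late, and input 4 is never selected at all.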

After these 2 fixes, the simulation looks like this:

This is a zoom out of the simulation. As you can see, I'm reading 5 different ports at 5 different addresses simultaneously, with all the outputs coming in parallel.

Now, for the last little bit. When performing the 'mux' the only addition I wanted to do was 'snap' all the address_# and cmd_# inputs at (pc_ena_in==0), then feed those latched results to the ram. To do this, here is the simple addition I made to your 'case' statement in the 'mux'.

Code: [Select]

	case (pc_ena_in[2:0])
		3'b000 : begin
						addr_in_mux <= addr_in_0;  // Send the first, #0 addr & cmd to the memory module.
						cmd_mux_in <= cmd_in_0;
						
						addr_lat_1 <= addr_in_1;  // latch all addr_in_# in parallel
						addr_lat_2 <= addr_in_2;
						addr_lat_3 <= addr_in_3;
						addr_lat_4 <= addr_in_4;

						cmd_lat_1  <= cmd_in_1;  // latch all cmd_in_# in parallel
						cmd_lat_2  <= cmd_in_2;
						cmd_lat_3  <= cmd_in_3;
						cmd_lat_4  <= cmd_in_4;
						
					end
		3'b001 : begin
						addr_in_mux <= addr_lat_1; //  Send the latched, #1 addr & cmd to the memory module.
						cmd_mux_in  <= cmd_lat_1;
					end
		3'b010 : begin   
						addr_in_mux <= addr_lat_2; //  Send the latched, #2 addr & cmd to the memory module.
						cmd_mux_in  <= cmd_lat_2;
					end
		3'b011 : begin    
						addr_in_mux <= addr_lat_3; //  Send the latched, #3 addr & cmd to the memory module.
						cmd_mux_in  <= cmd_lat_3;
					end
		3'b100 : begin    
						addr_in_mux <= addr_lat_4; //  Send the latched, #4 addr & cmd to the memory module.
						cmd_mux_in  <= cmd_lat_4;
					end
	endcase

I've attached the latest source Verilog files. Setting up 2 different Quartus installs and preparing the simulation stimulus, plus finding that "Write Enable = 1", was a good 5 hours out of my day.

If you don't get a picture now, I will have no choice but to simulate your entire OSD project.

BrianHG · « **Reply #339 on:** November 13, 2019, 08:03:33 am »

@nockieboy, go here:

Code: [Select]

https://www.intel.com/content/www/us/en/programmable/downloads/software/quartus-ii-we/91sp2.html
And download QuartusII V9, sp2 and install it. Don't worry, you can have 2 different Quartuses in your system at 1 time.

We'll use its built-in 'Quick' simulator tool to create the graphics address generator, as we do not want to create the same hidden problem as above.

Note that you'll only use that Quartus to edit and simulate your Verilog .v file, selecting a Cyclone III with a huge IO pin count. To build your chip you may use Quartus Prime, or that same Quartus, as it does support Cyclone IV, just not with simulation support.

I don't have time to teach you how to use the new Modelsim, and the old, abandoned, forgotten built-in Quartus simulator is just too fast and easy to set up, manipulate inputs in, and compile instantly when in 'Functional' logic mode. Yes, the timing mode is still very quick for a small project like yours.

BrianHG · « **Reply #340 on:** November 13, 2019, 08:37:38 am »

Ooops, 1 typo: In 'vid_osd_generator.v', fix this one line #99:
------------------------------------------------
gpu_RAM.PIXEL_PIPE = 2; // set the length of the pixel pipe to offset multi-read port sequencing
------------------------------------------------
It said gpr_RAM instead of gpu_RAM.

nockieboy · « **Reply #341 on:** November 13, 2019, 09:26:43 am »

Quote from: BrianHG on November 13, 2019, 07:42:29 am

@nockieboy, I'M GONNA MURDER YOU!!!!!!!

Quote from: BrianHG on November 13, 2019, 07:42:29 am

Let's begin...

...You made the Write Enable for the read address forced on!!!!!!!!
Every address we sent to be read was instead written clear to all 0's.

So sorry about the write enable setting. That's because I'm used to LOWs being the active state with all the Z80 work I've been doing recently - I have to stop and check every time I look at a 1'b0 or 1'b1 now.

Thank you for persevering and finding the issues.

Quote from: BrianHG on November 13, 2019, 07:42:29 am

It gets worse, in you mux algorithm in the 'multiport_gpu_ram.v' module, you did this:
Quote
   // perform 5:1 mux for all inputs to the dual-port RAM
   case (pc_ena_in[2:0])
      3'b000 : begin //******** Excellent, this is state 0 and you made the case 3'b000 which equals 0.
                  addr_in_mux <= addr_in_0;
                  cmd_mux_in <= cmd_in_0;
               end
      3'b001 : begin //******** Excellent, this is state 1 and you made the case 3'b001 which equals 1.
                  addr_in_mux <= addr_in_1;
                  cmd_mux_in <= cmd_in_1;
               end
      3'b011 : begin //******** Hun? What? This is state 2 and you made the case 3'b011 which equals 3?
                  addr_in_mux <= addr_in_2;
                  cmd_mux_in <= cmd_in_2;
               end
      3'b100 : begin //******** Hun? What? This is state 3 and you made the case 3'b100 which equals 4?
                  addr_in_mux <= addr_in_3;
                  cmd_mux_in <= cmd_in_3;
               end
      3'b101 : begin //******** Hun? What? This is state 4 and you made the case 3'b101 which equals 5?
                  addr_in_mux <= addr_in_4;
                  cmd_mux_in <= cmd_in_4;
               end
   endcase

I can count in binary, honest!

It's really difficult for me to maintain a good focus on this when I'm working around it, doing bits here and there, then coming back to it later. I can only chalk this mistake up to some earlier code I'd written, when I made case statements for all 8 combinations of the 3-bit pc_ena value before you reminded me that it resets at 4, and to not checking the conditions when I deleted the extra states.
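
For reference, a minimal sketch of the 5:1 mux with decimal case labels, so each label reads the same as the state number, plus a default branch for the unused values 5 to 7. The module name, the CMD_SIZE parameter and the port widths are assumptions for illustration, not the actual contents of multiport_gpu_ram.v.

```verilog
// Hypothetical sketch only: names mirror the thread, widths are assumed.
module mux5_sketch #(
    parameter ADDR_SIZE = 14,   // address width (assumed)
    parameter CMD_SIZE  = 1     // command width (assumed)
)(
    input  wire                 clk,
    input  wire [2:0]           pc_ena_in,   // phase counter, expected to run 0..4
    input  wire [ADDR_SIZE-1:0] addr_in_0, addr_in_1, addr_in_2, addr_in_3, addr_in_4,
    input  wire [CMD_SIZE-1:0]  cmd_in_0, cmd_in_1, cmd_in_2, cmd_in_3, cmd_in_4,
    output reg  [ADDR_SIZE-1:0] addr_in_mux,
    output reg  [CMD_SIZE-1:0]  cmd_mux_in
);
    always @(posedge clk) begin
        case (pc_ena_in[2:0])
            3'd0 : begin addr_in_mux <= addr_in_0; cmd_mux_in <= cmd_in_0; end
            3'd1 : begin addr_in_mux <= addr_in_1; cmd_mux_in <= cmd_in_1; end
            3'd2 : begin addr_in_mux <= addr_in_2; cmd_mux_in <= cmd_in_2; end
            3'd3 : begin addr_in_mux <= addr_in_3; cmd_mux_in <= cmd_in_3; end
            3'd4 : begin addr_in_mux <= addr_in_4; cmd_mux_in <= cmd_in_4; end
            default : begin
                // States 5..7 should never occur; drive known values so a
                // counter bug shows up clearly in simulation.
                addr_in_mux <= {ADDR_SIZE{1'b0}};
                cmd_mux_in  <= {CMD_SIZE{1'b0}};
            end
        endcase
    end
endmodule
```

With decimal labels, deleting or renumbering states leaves an obvious gap in the sequence 0, 1, 2, 3, 4, which is harder to miss than a skipped binary pattern.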

Okay, I've made the additions to the code, fixed the stupid errors, and almost got it to compile.

In vid_osd_generator.v:

Code: [Select]

defparam gpu_RAM.ADDR_SIZE = 14,		// pass ADDR_SIZE into the gpu_RAM instance
         gpu_RAM.PIXEL_PIPE = 2;		// set the length of the pixel pipe to offset multi-read port sequencing

I was getting negative indexes in the demux pipe in multiport_gpu_ram.v. The pipeline wasn't deep enough, so I upped gpu_RAM.PIXEL_PIPE value to 3 and it compiles now.

This is what I'm getting on the screen now...

nockieboy · « **Reply #342 on:** November 13, 2019, 10:25:38 am »

Here's my workspace:

BrianHG · « **Reply #343 on:** November 13, 2019, 10:52:11 am »

Don't fret too much, FPGAs are a new experience for those used to buying a CPU, wiring it up, and programming it in software.
- get used to wiring everything as POSITIVE logic.
- if you want a negative input or output, place an 'EXP' by the final pin.
- if you can't count binary, e.g. " 4'b1010 ", use hex " 4'hA " or decimal " 4'd10 " instead.
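
A small sketch of the tips above, assuming a hypothetical active-low write strobe named wr_n_pin coming from the Z80 side: the inversion happens once at the boundary, so everything inside the FPGA stays positive logic. The module and signal names are invented for illustration.

```verilog
// Hypothetical sketch: keep internal logic active-high, invert only at the pin.
module we_polarity_sketch (
    input  wire       clk,
    input  wire       wr_n_pin,    // assumed active-low write strobe (Z80 style)
    output reg        write_ena,   // active-high inside the FPGA: 1 = write
    output wire [3:0] ten_bin,
    output wire [3:0] ten_hex,
    output wire [3:0] ten_dec
);
    // Single inversion at the boundary; no other module needs to know the
    // external signal is active-low.
    always @(posedge clk)
        write_ena <= ~wr_n_pin;

    // Three spellings of the same 4-bit constant (decimal ten).
    assign ten_bin = 4'b1010;
    assign ten_hex = 4'hA;
    assign ten_dec = 4'd10;
endmodule
```

A read-only port would then tie its write enable to 1'b0, which reads naturally as "not writing".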

There were numerous other errors. Today has turned into a non-stop 10 hour debug fest. Nothing wrong with the architecture, it's just that when I'm correcting what you have written, I don't know how you typed everything in.

Here is why your text is scrambled:
------------------------------------------------------------
.addr_in_0(read_font_adr[19:0]),
.addr_in_1(read_text_adr[19:0]),
-------------------------------------------------------------
The text and font were backwards.

This is what my full simulation setup looks like (Yes, full OSD output now.)

There were a few other little patches. Please don't change any of the pixel pipe settings until we get clean text. A pipe error would just shave a pixel on one side of the letters.

Now, I've attached my Quartus project to see if it opens in Prime.
To make it work on your PCB, all you need to do is change the FPGA to your FPGA IC#, enter the pin numbers (delete my unused test IOs), and add a PLL. You can always install QuartusIIv9sp2 since it supports Cyclone IV, and it's a lot faster.

BrianHG · « **Reply #344 on:** November 13, 2019, 11:08:29 am »

I just opened my 'GPU_OSD_Sim_q2v9.zip ' project in Quartus Prime 18.1.

During the open, it asks you to switch to a Cyclone IV or Max10 FPGA chip.
On the top hierarchy, only the parameter boxes need to be adjusted in size to show all the variables due to the system font being different.

Let's work with this version as it is a clean virgin with no old crap lingering around.
(You can tell by the .zip file size.)

Don't forget to set your IO pins and add a PLL to get 125MHz.

LOL, QuartusIIv9sp2 = 15 seconds for compile.
Quartus Prime = 40 seconds for compile.

BrianHG · « **Reply #345 on:** November 13, 2019, 12:02:38 pm »

I've added the PLL and changed the project to a Cyclone IV IC; I believe it's the one on your dev board.
You still need to assign the IO pins.
I also set up the timing analyzer so it gives you valid results.

Do not worry that I set the clock to 54MHz. Technically, you are just slightly under-clocking the design for now, but the PLL has been set up so that our builds are capable of driving DDR-IOs to transmit a home-built HDMI serdes directly.

LOL, my architecture is so well laid out, even though we asked for 135MHz, Quartus says it can achieve 238MHz, and this is on a -c8, the slowest version of the FPGA.

LOL, the C6 with my code will run at 315MHz....

nockieboy · « **Reply #346 on:** November 13, 2019, 12:06:11 pm »

Ookay... switched to the clean project now, added a PLL and just had to change a timing error in the sync generator on line 67:

Code: [Select]

	if (pc_ena[3:2] == 0)	// once per pixel

... was too slow for the monitor - changed it to pc_ena[2:0] and I'm getting this picture now:

BrianHG · « **Reply #347 on:** November 13, 2019, 12:27:16 pm »

Quote from: nockieboy on November 13, 2019, 12:06:11 pm

Ookay... switched to the clean project now, added a PLL and just had to change a timing error in the sync generator on line 67:

Code: [Select]
if (pc_ena[3:2] == 0) // once per pixel
... was too slow for the monitor - changed it to pc_ena[2:0] and I'm getting this picture now:

(Attachment Link)

Arrrrgggg this:
if (pc_ena[3:2] == 0) // once per pixel
Everywhere?

How?

Ok, patching everything...
Here is the patched project.
When uploading your latest xxxx.v files on the forum, please make sure they are actually the latest versions.
I feel as if I'm revisiting errors patched long ago...

Now, I hope the text comes out correct. The simulation on my side has the right horizontal size for the text box. Only an addressing error on the font address, or an error in the conversion from the 8bit wide font into a 1 bit pixel, would mean garbled up letters (I can't tell, since I only get an oscilloscope waveform).

BrianHG · « **Reply #348 on:** November 13, 2019, 12:33:31 pm »

Quote from: nockieboy on November 13, 2019, 12:06:11 pm

Ookay... switched to the clean project now, added a PLL and just had to change a timing error in the sync generator on line 67:

Code: [Select]
if (pc_ena[3:2] == 0) // once per pixel
... was too slow for the monitor - changed it to pc_ena[2:0] and I'm getting this picture now:

(Attachment Link)

Look carefully, your Microcom text is there, it's just the FONT which isn't being rendered properly.

BrianHG · « **Reply #349 on:** November 13, 2019, 04:32:52 pm »

Again, half old code, half new code error.....

This code was half the original Atari 2 bit font and half the 8bit Atari font code squished together.

Code: [Select]

assign font_pos[12:6]   = letter[6:0] ;       // Select the upper font address with the 7 bit letter, note the atari font has only 128 characters.
assign font_pos[2:0]	= dly3_disp_y[3:1] ;  // select the font x coordinate with a 2 pixel clock DELAYED disp_x address.  [3:1] is used so that every 2 x lines are repeats
assign font_pos[5:3]	= dly3_disp_y[3:1] ;  // select the font y coordinate with a 2 pixel clock DELAYED disp_y address.  [3:1] is used so that every 2 y lines are repeats

It should be this way, and you had it this way since you were able to display the red text:

Code: [Select]

assign font_pos[9:3]   = letter[6:0] ;       // Select the upper font address with the 7 bit letter, note the atari font has only 128 characters.
assign font_pos[2:0]	= dly3_disp_y[3:1] ;  // select the font y coordinate with a 2 pixel clock DELAYED disp_y address.  [3:1] is used so that every 2 y lines are repeats
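
As a rough illustration of the addressing above, here is a self-contained sketch for an 8-bit wide, 128 character font: the 7-bit letter forms the upper address bits, the low three bits pick the glyph row, and one bit of the byte read back becomes the pixel. The module name, the disp_x and font_byte ports, and the "bit 7 is the leftmost pixel" ordering are assumptions, not taken from the project files.

```verilog
// Hypothetical sketch of font addressing for an 8x8, 1 bit per pixel font.
module font_addr_sketch (
    input  wire [6:0] letter,     // 7-bit character code (128 glyphs)
    input  wire [3:0] disp_y,     // delayed row counter (dly3_disp_y in the thread)
    input  wire [3:0] disp_x,     // delayed column counter (assumed name and width)
    input  wire [7:0] font_byte,  // glyph row read back from font memory
    output wire [9:0] font_pos,   // 128 glyphs * 8 rows = 1024 byte addresses
    output wire       pixel       // 1-bit pixel selected from the glyph row
);
    // Upper 7 bits select the glyph, lower 3 bits select the row inside it.
    // Using disp_y[3:1] repeats each font row on two display lines.
    assign font_pos = {letter[6:0], disp_y[3:1]};

    // Assumed bit order: bit 7 is the leftmost pixel. disp_x[3:1] doubles
    // each pixel horizontally. In a real pipeline disp_x must be delayed to
    // line up with the memory read latency of font_byte.
    assign pixel = font_byte[3'd7 - disp_x[3:1]];
endmodule
```

With this layout, character 1 row 0 sits at address 8, so a glyph is always 8 consecutive bytes; the 13-bit form in the broken version spread each glyph over 64 addresses instead.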

Please, Please, Please, when doing corrections, please post your latest updates. It's taking hours to find something I thought was already working just to find that the last xxxxx.v source code you posted has bugs in it, yet you had a fine picture back with the 8bit Atari font in red...

Please be more careful in the future. I should not be fixing these problems as I feel like I'm giving you wrong directions, yet my layout is fine.

From now on, let's start using revision numbers and dates for each Verilog module. The next stuff coming up isn't as huge, but a single mistake in our existing code will lead to hours of debugging, not knowing where the problem originates.

(This took hours to prove by simulation....)
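
One possible way to carry a revision tag in each module is sketched below: a header comment with revision and date, plus localparams that can be read back in simulation or routed to a debug output. The numbering scheme, the module name and the rev_id port are purely hypothetical.

```verilog
// File     : revision_tag_sketch.v   (hypothetical example file)
// Revision : 1.0                     (hypothetical numbering scheme)
// Date     : 2019-11-13
module revision_tag_sketch (
    output wire [15:0] rev_id   // {major, minor}, visible in simulation waveforms
);
    // Bump these together with the header comment on every posted change.
    localparam [7:0] REV_MAJOR = 8'd1;
    localparam [7:0] REV_MINOR = 8'd0;

    assign rev_id = {REV_MAJOR, REV_MINOR};
endmodule
```

Checking rev_id in the simulator waveform is a quick way to confirm that the file being simulated is the one that was last posted.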
Make this attachment your new project and if it works and you are ready, we will tackle the universal pixel pointing read address generator.

(If everything works and you are waiting for me, think about getting the 4 bit RGB dac working...)
Also, install:

Code: [Select]

https://www.intel.com/content/www/us/en/programmable/downloads/software/quartus-ii-we/91sp2.html
You're going to need it for engineering the address generator.

