FPGA VGA Controller for 8-bit computer

#775 Reply
Posted by BrianHG on 25 Dec, 2019 18:41
Quote from: nockieboy on 25 Dec, 2019 15:34
Wow - thank you so much!!

Okay, so I've got some documentation to do and some work on the Z80 interface (the mux with the RS232_debugger interface is nowhere near done yet) and it's done.

Merry Christmas to you too!! (and anyone else reading this!)

(Attachment Link)

(Attachment Link)

In the image above, the white fade-out on the right-side of the Z80 image isn't a photographic artefact - could that be caused by capacitance in the output wires?

Check your IOs, as I have made the outputs 24 bits instead of 12 bits. You only want the upper 4 bits [7:4] of each color channel (12 bits in total) wired to the DAC pins. Maybe check the IO assignments, as you may be using the lower IOs, or new adjacent IOs may now be interfering with the video.
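For reference, a minimal Verilog sketch of that wiring, assuming the GPU presents 8 bits per color channel and the resistor DAC takes 4 bits per channel. The module and port names (dac12_tap, r8, r4 and so on) are hypothetical and not taken from the project files:

```verilog
// Hypothetical wrapper: names are illustrative, not from the actual project.
module dac12_tap (
    input  wire [7:0] r8, g8, b8,   // 8-bit-per-channel GPU output (24 bits total)
    output wire [3:0] r4, g4, b4    // pins wired to the 12-bit resistor DAC
);
    // Keep only the upper nibble [7:4] of each channel; the lower
    // nibble [3:0] is left unconnected for a 12-bit DAC.
    assign r4 = r8[7:4];
    assign g4 = g8[7:4];
    assign b4 = b8[7:4];
endmodule
```

If the lower four bits of a channel end up on the DAC pins instead, the image would be driven by the least significant color bits, which is one plausible way to get the kind of artefact described above.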

#776 Reply
Posted by nockieboy on 25 Dec, 2019 19:07
Quote from: BrianHG on 25 Dec, 2019 18:29
Strange. Since I'm using a real DAC, I see nothing but a perfect image on my end. Maybe something happened with the IO current strength settings, or there is too much on screen for the method of the analog DAC.

When you have time, see if you can get a scope shot.

Will check IOs when I get the time - it probably doesn't help that I've got wires from the FPGA board to the resistor DAC breadboard, then more wires to the VGA connector. I'm waiting on a couple of parts to arrive in the post then I'll have a proper PCB-based 24-bit DAC up and running.

Quote from: BrianHG on 25 Dec, 2019 18:29
BTW, I now have all 15 layers working on my side, but your Cyclone IV is too small. Maybe I will come up with an idea. One Cyclone size larger and 15 would be no problem on your side too, as you have run out of logic cells and the RAM needs to shrink to fit the design as well.

Hmm.. I knew this Cyclone IV board was cheap, but it's got the smallest Cyclone IV imaginable on it!

Well, I'm guessing the Lattice LFE5U won't have these issues.

Quote from: BrianHG on 25 Dec, 2019 18:41
Check your IOs, as I have made the outputs 24 bits instead of 12 bits. You only want the upper 4 bits [7:4] of each color channel (12 bits in total) wired to the DAC pins. Maybe check the IO assignments, as you may be using the lower IOs, or new adjacent IOs may now be interfering with the video.

Hmm.. will go check this in a bit.

#777 Reply
Posted by nockieboy on 25 Dec, 2019 20:18
Quote from: BrianHG on 25 Dec, 2019 18:41
Check your IOs, as I have made the outputs 24 bits instead of 12 bits. You only want the upper 4 bits [7:4] of each color channel (12 bits in total) wired to the DAC pins. Maybe check the IO assignments, as you may be using the lower IOs, or new adjacent IOs may now be interfering with the video.

Yeah, that's an improvement. I can see the HV_trigger lines around edges now, too. There is still a very faint amount of 'runoff' from the white background on the sprite into the black background behind it, but nowhere near as noticeable as before. I'm no electronics expert, but I'd be happy to bet that once I've got a less 'cobbled-together' DAC created, that runoff will disappear.

Thanks again, BrianHG. Hope you're taking a break over the holiday season?!!

#778 Reply
Posted by BrianHG on 25 Dec, 2019 22:41
Quote from: nockieboy on 25 Dec, 2019 19:07
Hmm.. I knew this Cyclone IV board was cheap, but it's got the smallest Cyclone IV imaginable on it!

Well, I'm guessing the Lattice LFE5U won't have these issues.

Quote from: BrianHG on 25 Dec, 2019 18:41
Check your IOs, as I have made the outputs 24 bits instead of 12 bits. You only want the upper 4 bits [7:4] of each color channel (12 bits in total) wired to the DAC pins. Maybe check the IO assignments, as you may be using the lower IOs, or new adjacent IOs may now be interfering with the video.

Hmm.. will go check this in a bit.

Well, 15 layers as the current project stands takes 11K logic gates and your CycloneIV has 6K.
To turn back on transparency between a number of layers, add 1K-2k more gates as it is all a bit table of 3x3x3 multiply adds.

I just compiled the project with the LFE5U settings, as my old CycloneIII EP3C55 has the gates and RAM for the job. It takes 17% of the logic elements and 76% of the RAM bits, with over 220 kilobytes of graphics RAM. The Lattice LFE5U is a 10% smaller FPGA, so no problem there.

With the addition of a $1 DDR RAM chip on your PCB, you can reserve 1 MAGGIE channel for that RAM controller and get a 32 bit graphics layer with 16mb of addressable space. Making a DDR RAM controller which only has to deal with a single 27 million pixels a second read and a Z80 at 2-8 million transactions a second is a joke compared to all the cross-reads going between all the other 14 MAGGIEs, which read all random memory locations every which way, as I designed the core to do.

Basically you would treat the onboard 220kb as texture and sprite memory as the bulk DDR ram would be for background and swapping of large chunks of additional animation graphics.

The rest of the FPGA logic would be 75% empty for a cool hardware accelerated drawing engine.

#779 Reply
Posted by nockieboy on 25 Dec, 2019 23:37
Quote from: BrianHG on 25 Dec, 2019 22:41
Well, 15 layers as the current project stands takes 11K logic gates and your CycloneIV has 6K.
To turn back on transparency between a number of layers, add 1K-2k more gates as it is all a bit table of 3x3x3 multiply adds.

I'm making a start on designing a custom board for an LFE5U - it'll be a few months and probably a few test runs before I get to attaching a BGA to a custom board, but once that's done it seems the sky is the limit.

Quote from: BrianHG on 25 Dec, 2019 22:41
With the addition of a $1 DDR RAM chip on your PCB, you can reserve 1 MAGGIE channel for that RAM controller and get a 32 bit graphics layer with 16mb of addressable space. Making a DDR RAM controller which only has to deal with a single 27 million pixels a second read and a Z80 at 2-8 million transactions a second is a joke compared to all the cross-reads going between all the other 14 MAGGIEs, which read all random memory locations every which way, as I designed the core to do.

DDR RAM, not the SDRAM that I've got on the dev board at the moment? Might have to look into that - very tempting.

Quote from: BrianHG on 25 Dec, 2019 22:41
Basically you would treat the onboard 220kb as texture and sprite memory as the bulk DDR ram would be for background and swapping of large chunks of additional animation graphics.

The rest of the FPGA logic would be 75% empty for a cool hardware accelerated drawing engine.

Sounds awesome. I wish I had 10% of the knowledge you have with FPGAs.. but then I'd be in a different career, I guess.

Random question - would there be any mileage in a hardware scrolling capability for text mode? Specifically, I was thinking of using the sub-pixel offset capabilities to scroll text upwards (or downwards) when a new line is written to a single-line buffer which would be 'off the top or bottom of screen' and not visible? Or having a 'viewport sub-pixel' setting, to allow the entire screen (of text) to be smoothly scrolled by offsetting the viewport up to a single line of text? Too much work for little payoff?

#780 Reply
Posted by BrianHG on 26 Dec, 2019 00:19
Quote from: nockieboy on 25 Dec, 2019 23:37
Random question - would there be any mileage in a hardware scrolling capability for text mode? Specifically, I was thinking of using the sub-pixel offset capabilities to scroll text upwards (or downwards) when a new line is written to a single-line buffer which would be 'off the top or bottom of screen' and not visible? Or having a 'viewport sub-pixel' setting, to allow the entire screen (of text) to be smoothly scrolled by offsetting the viewport up to a single line of text? Too much work for little payoff?

It's all there. If you hadn't noticed, I literally finished your graphics engine. Sub-pixel scrolling is there, as well as modifying the base address to shift 1 character at a time, or a line at a time. That includes the source bitmap/raster 'width' setting, which allows you to set a display width of up to 65535 characters. This is how side-scrolling video games like Super Mario Bros work: the backdrop is nothing more than character tiles from a font memory, with a horizontal display width set larger than the width of the screen. With what you have now, memory permitting of course, you can use 16 bit characters for a 4096 character font of 4x8 or 4x16 pixels, or 8x8/8x16 pixels, to make a huge display of regular repeated items. Or you can use multiple layers in graphics modes: 1 for a bottom background, the next layer for onscreen animated objects, and another layer or 2/3.../10 for player sprites.

Even the soft blending in the palette mixer is there. I just replaced the multiplies with a layer switch, but all the controls and assigns are present.

You're just going to have to learn, play and re-document every control to date to get a feel for what is possible.
Though I specified all the control names and they are all organized into defparams in the beginning of the MAGGIE & BART, you still need to fill the ram with usable image data and play with all the controls to see what happens.
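As a rough illustration of the idea of combining a base-address shift with a sub-pixel offset, here is a small Verilog sketch for vertical scrolling of an 8x16 text screen. It is not the MAGGIE register layout: the module name, port names and the LINE_BYTES parameter are all assumptions made for this example.

```verilog
// Hypothetical helper, not part of the GPU project: splits a vertical scroll
// position in pixel rows into a coarse text-line step and a fine pixel offset.
module text_scroll_split #(
    parameter LINE_BYTES = 80          // assumed: bytes per text line (80 characters)
) (
    input  wire [15:0] base_addr,      // address of the first character of the buffer
    input  wire [11:0] scroll_y,       // scroll position in pixel rows
    output wire [15:0] start_addr,     // address to begin fetching characters from
    output wire [3:0]  fine_y          // sub-character offset, 0..15 for a 16-row font
);
    // The low 4 bits select the pixel row inside a 16-row character cell.
    assign fine_y     = scroll_y[3:0];
    // The remaining bits count whole text lines, each LINE_BYTES long.
    assign start_addr = base_addr + scroll_y[11:4] * LINE_BYTES;
endmodule
```

With base_addr = 4608 and scroll_y = 35, this gives start_addr = 4768 (two whole lines down) and fine_y = 3. In the real design the equivalent values would be written into the layer's own control registers rather than computed by a separate module.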

#781 Reply
Posted by BrianHG on 27 Dec, 2019 07:46
How to generate rich, compact code in C++17 for x86. The code example here is also finally translated into 6502 for the Commodore C64, but the example output given is assembled into x86 code. The presenter shows you how to generate Z80 driven bitmap graphics and color tables using the right C++17 headers in the compiler, to make the tiniest, most compact Z80 assembly code possible which directly addresses memory in connected hardware. He gets the C++17 to actually synthesize bitplane data from arrays and encode it into the smallest x86 assembly possible. You would just need to get an x86 to Z80 converter, or a direct Z80 output from the C++ compiler.

#782 Reply
Posted by BrianHG on 30 Dec, 2019 07:16
Ok, here is the GPU, with the core RAM at 250MHz. Though I selected the -C7 FPGA, in -C8 it is so close to 250MHz that it should still work fine. (My Cyclone III -C8 works fine, and it compiles with an even slightly slower FMAX (220MHz) than the Cyclone IV -C8 (238MHz). The -C7 compiles with a 270MHz FMAX.)

Here are the changes:
Code: [Select]
Latest memory layout.

24576 = size of current GPU ram

0 - 511 = All HW_Regs shared with generic GPU RAM.
    00 thru 07 = H&V triggers for 4 yellow test cursors
    12..15 = H&V triggers which is the H&V reset coordinates for all 15 MAGGIE_Layer#s
    16..19 + 2*MAGGIE_Layer# = H&V Top left edge of each MAGGIE_Layer# window.
    96 + 16*MAGGIE_Layer# = 16 byte controls for each of the 15 MAGGIE layers.

512 - 4607 = Default IBM VGA 8x16 Font
4608 - 7007 = Default ASCII text buffer, 80 characters x 30 lines (MAGGIE 0&1)
7008 - 8207 = Color text buffer, 2 bytes per character, 40 characters x 15 lines. (MAGGIE 2&3)
13056 - 24575 = 16 color Z80 CPU graphic image. (Rendered at 3 sizes across 3 different palettes, MAGGIE 4,5,6)

31744 - 32255 = Primary palette, ARGB 4444 style.
32256 - 32767 = Secondary palette, RGB 565 style.

New GPU project parameters:

NUM_LAYERS = 2 through 15 = 2 layers through 15 layers.
PALETTE_ADDR = Sets the base address for the 2 palettes. This one is automatically set to (2**ADDR_SIZE - 1024), so the palettes are the last 1024 bytes.
Though you lost a little RAM because of the all 16bit core and the memory allocation size parameter, you now have 7 active layers. With this final code, all you need to do is change the parameter setting on the block diagram to 15 to get 15 layers, as well as increase the memory size, once you get a larger FPGA. As the HW Regs are now at the bottom of the RAM, it will always power up to the right default settings.
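The addresses in the layout above can be written down as Verilog constants. This is only a sketch: the module name, the localparam names and the layer_ctrl_addr function are made up for illustration, ADDR_SIZE = 15 is inferred from the palettes sitting at 31744 - 32767, and layers are assumed to be numbered from 0.

```verilog
// Hypothetical constants module: names are illustrative, values come from the
// memory layout posted above.
module gpu_mem_map_sketch;
    parameter ADDR_SIZE = 15;                             // assumed: 2**15 = 32768

    localparam PALETTE_ADDR    = (2**ADDR_SIZE) - 1024;   // primary palette base
    localparam PALETTE2_ADDR   = PALETTE_ADDR + 512;      // secondary palette base
    localparam FONT_BASE       = 512;                     // IBM VGA 8x16 font
    localparam TEXT_BASE       = 4608;                    // 80x30 ASCII text buffer
    localparam LAYER_CTRL_BASE = 96;                      // first layer's control block
    localparam LAYER_CTRL_SIZE = 16;                      // bytes of controls per layer

    // Start address of the 16 control bytes for a given MAGGIE layer.
    function integer layer_ctrl_addr;
        input integer layer;
        begin
            layer_ctrl_addr = LAYER_CTRL_BASE + LAYER_CTRL_SIZE * layer;
        end
    endfunction

    initial begin
        $display("PALETTE_ADDR = %0d", PALETTE_ADDR);               // 31744
        $display("layer 6 controls at %0d", layer_ctrl_addr(6));    // 192
    end
endmodule
```

Deriving the palette base from ADDR_SIZE rather than hard-coding it is what lets the palettes stay in the last 1024 bytes when the memory size is increased on a larger FPGA.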

The only thing missing is the semi-translucent layer feature which I will finish off tomorrow. However, turning the feature on may eat more logic cells, lowering your layer count since now you are at 94% utilization with 7 layers.

All the other changes means I also had to update the RS232_Debugger files.
Send photos of the powerup test screen once compiled. I hope it works.

GPU_7layers_250MHz_core.zip

RS232_Debugger_with16bit_mif_generator.zip

#783 Reply
Posted by nockieboy on 30 Dec, 2019 09:15
Quote from: BrianHG on 30 Dec, 2019 07:16
Ok, here is the GPU, with the core RAM at 250MHz. Though I selected the -C7 FPGA, in -C8 it is so close to 250MHz that it should still work fine. (My Cyclone III -C8 works fine, and it compiles with an even slightly slower FMAX (220MHz) than the Cyclone IV -C8 (235MHz). The -C7 compiles with a 270MHz FMAX.)

That looks great! Well done!!

Quote from: BrianHG on 30 Dec, 2019 07:16
Though you lost a little RAM because of the all 16bit core and the memory allocation size parameter, you now have 7 active layers. With this final code, all you need to do is change the parameter setting on the block diagram to 15 to get 15 layers, as well as increase the memory size, once you get a larger FPGA. As the HW Regs are now at the bottom of the RAM, it will always power up to the right default settings.

I've just scanned through the changes - I'm liking the vid_osd_generator parameters and starting GPU RAM contents file. It's all working, as the screenshot above shows. Thank you so much for this!

Quote from: BrianHG on 30 Dec, 2019 07:16
The only thing missing is the semi-translucent layer feature which I will finish off tomorrow. However, turning the feature on may eat more logic cells, lowering your layer count since now you are at 94% utilization with 7 layers.

I can manage without that for the moment until I get a bigger FPGA sorted - I've been working on getting the Z80_bridge and data_mux modules finished (though that sounds like I've actually had any time to work on them - I've basically just sorted out the wiring for them so far); when I get more time I'll see about getting a Lattice dev board done.

Thanks again.

#784 Reply
Posted by BrianHG on 30 Dec, 2019 09:53
It's a close match to my board's display on a CRT...

#785 Reply
Posted by BrianHG on 30 Dec, 2019 10:14
DVI output on LCD version:

In person, the CRT obviously looks best!

#786 Reply
Posted by nockieboy on 30 Dec, 2019 11:03
Quote from: BrianHG on 30 Dec, 2019 10:14
In person, the CRT obviously looks best!

Seems you can't beat quality CRT output.

It's a quiet morning here at the moment, so I've just soldered together my 12-bit resistor DAC since the SSOP 74HCT541s I ordered arrived this weekend. Have gone from this:

... to this:

... and the run-off in the white areas has pretty much disappeared. It plugs straight into the dev board, so no annoying jumper cables anymore. Just waiting on an ADV7125 to arrive in the post and I'll have a 24-bit DAC to use in its place as well.

#787 Reply
Posted by nockieboy on 30 Dec, 2019 13:01
I'm trying to make some progress on the data_mux to gate the RS232 and Z80_bridge I/O to the GPU RAM. I'm making baby steps, and testing as I go, but something really odd is going on.

In the attached project, the RS232_debugger should NOT be able to read or write to the GPU RAM. I've only attempted to implement reading anyway, but the data_mux should be disabled, yet when I test the project it works normally (reading AND writing)...

Note that this isn't the latest version of the project, but it is the latest version of the data_mux work.

GPU_Z80_dev.zip

#788 Reply
Posted by BrianHG on 30 Dec, 2019 19:04
Quote from: nockieboy on 30 Dec, 2019 13:01

I'm trying to make some progress on the data_mux to gate the RS232 and Z80_bridge I/O to the GPU RAM. I'm making baby steps, and testing as I go, but something really odd is going on.

In the attached project, the RS232_debugger should NOT be able to read or write to the GPU RAM. I've only attempted to implement reading anyway, but the data_mux should be disabled, yet when I test the project it works normally (reading AND writing)...

Note that this isn't the latest version of the project, but it is the latest version of the data_mux work.

Ok, you made a few booboos. Take a look at the changes I made on the main .bdf and inside your bridge.

You have too many cross versions of everything going around....
You will eventually need to upgrade everything to my 7-layer MAGGIE version as it is not backwards compatible as you have things now...

This means re-editing the top .bdf as well.

GPU_Z80_dev_patches.zip

#789 Reply
Posted by nockieboy on 30 Dec, 2019 20:01
Quote from: BrianHG on 30 Dec, 2019 19:04
You have too many cross versions of everything going around....
You will eventually need to upgrade everything to my 7-layer MAGGIE version as it is not backwards compatible as you have things now...

This means re-editing the top .bdf as well.

Yeah, I'd already made a start on the bridge and mux module modifications when you dropped the 7-layer project, so I shunted my existing project to the side with the intention of working on the bridge/mux portion and porting their .v files over to the 7-layer project once they're done.

I'll go take a look at the mods you've made - thanks very much.

#790 Reply
Posted by nockieboy on 30 Dec, 2019 20:34
Thanks for clarifying the DFFs on the RS232 output - I was going to use them (or another set) on the mux output to the vid_osd_generator, but I guess I won't need them as the mux runs at system clock speed?

As for the mux code - ENABLED is only there whilst I was trying to debug the RS232_debugger still being able to communicate with the GPU RAM, I'll remove that - or make it specific to the RS232 module.

Thanks for updating the code - I'd have got there eventually! Port A code will just be a copy-pasta of the existing two if-else-end blocks with vars renamed appropriately?

#791 Reply
Posted by BrianHG on 30 Dec, 2019 23:47
What you would be better off doing is making 2 wires, run_portA and run_portB.

Assign a set of rules for each wire with simple Boolean terms.

And use those wires to drive the 1-2 if statements and read/write request flags...

Remember to set and clear the busy status for each port in the 'if's, use those as status in the boolean selection, and make sure the Z80 gets first choice/priority over the RS232.

#792 Reply
Posted by nockieboy on 31 Dec, 2019 10:51
Quote from: BrianHG on 30 Dec, 2019 23:47
What you would be better off doing is making 2 wires, run_portA and run_portB.

Assign a set of rules for each wire with simple Boolean terms.

And use those wires to drive the 1-2 if statements and read/write request flags...

Hmm.. not sure I follow.. run_portA and run_portB would just reflect the existing 'if' conditions?

Quote from: BrianHG on 30 Dec, 2019 23:47
Remember to set and clear the busy status for each port in the 'if's, use those as status in the boolean selection, and make sure the Z80 gets first choice/priority over the RS232.

The read-ready flags are handled by the sequencer pipelines and assign statements. It's just the write flag to the GPU that is shared between the two ports and needs a little extra logic to enable the inclusion of port A? (Would it make sense to have two gpu_wr_ena's, one for each port, that are OR'd together to make the existing gpu_wr_ena output?) Otherwise it'll be set to zero each clock cycle by the port B 'else' condition in the write if.. block.

No doubt you're describing a more efficient way of doing what I was going to do by just copying the existing port B if.. statements, but I can't see it at the moment.

#793 Reply
Posted by BrianHG on 31 Dec, 2019 11:01
Does this help any?

wire run_r_porta, run_r_portb,run_w_porta, run_w_portb;

assign run_r_porta = rd_req_a && ~portb_bsy;
assign run_w_porta = wr_ena_a && ~portb_bsy;
assign run_r_portb = rd_req_b && ~porta_bsy && ~run_r_porta && ~run_w_porta;
assign run_w_portb = wr_ena_b && ~porta_bsy && ~run_r_porta && ~run_w_porta;

Note that there is no latching or latching logic here. Everything is realtime.
(Hint: since port B, the RS232, is running at 50MHz, its write and read requests will always be at least 2 or more clock cycles long. In fact, the way I programmed it, you can multiply that figure by a good 3x.)
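To see the priority rule in those four assigns at work, here is a small simulation sketch. The signal names are the ones from the post above; the testbench module itself (run_port_priority_tb) and its stimulus are illustrative assumptions, with the busy flags simply held low rather than driven by the real mux.

```verilog
// Hypothetical testbench: exercises only the combinational run_* terms.
module run_port_priority_tb;
    reg rd_req_a, wr_ena_a, rd_req_b, wr_ena_b;
    reg porta_bsy, portb_bsy;

    wire run_r_porta, run_r_portb, run_w_porta, run_w_portb;

    // Port A (Z80) only waits for port B's busy flag.
    assign run_r_porta = rd_req_a && ~portb_bsy;
    assign run_w_porta = wr_ena_a && ~portb_bsy;
    // Port B (RS232) also yields whenever port A wants to run this cycle.
    assign run_r_portb = rd_req_b && ~porta_bsy && ~run_r_porta && ~run_w_porta;
    assign run_w_portb = wr_ena_b && ~porta_bsy && ~run_r_porta && ~run_w_porta;

    initial begin
        // Both ports idle, both request a read in the same cycle.
        wr_ena_a = 0; wr_ena_b = 0; porta_bsy = 0; portb_bsy = 0;
        rd_req_a = 1; rd_req_b = 1;
        #1 $display("A read runs: %b, B read runs: %b", run_r_porta, run_r_portb);

        // Port A drops its request, so port B is free to run.
        rd_req_a = 0;
        #1 $display("A read runs: %b, B read runs: %b", run_r_porta, run_r_portb);
        $finish;
    end
endmodule
```

With simultaneous requests only port A's run term goes high. Port B's term goes high once port A's request is removed, which is workable because port B holds its requests for several clocks.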

#794 Reply
Posted by nockieboy on 31 Dec, 2019 11:20
Quote from: BrianHG on 31 Dec, 2019 11:01
Does this help any?

wire run_r_porta, run_r_portb,run_w_porta, run_w_portb;

assign run_r_porta = rd_req_a && ~portb_bsy;
assign run_w_porta = wr_ena_a && ~portb_bsy;
assign run_r_portb = rd_req_b && ~porta_bsy && ~run_r_porta && ~run_w_porta;
assign run_w_portb = wr_ena_b && ~porta_bsy && ~run_r_porta && ~run_w_porta;

Note that there is no latching or latching logic here. Everything is realtime.

Ah okay, thanks.

I've copied the data_mux.v and Z80_bridge.v files from the older project that you updated yesterday into the current project (7-layer one), so I'm working on the mux from the current project now.

Code: [Select]
gpu_wr_ena <= run_w_porta || run_w_portb;
Would the above be okay in the always loop?

#795 Reply
Posted by nockieboy on 31 Dec, 2019 11:24
Code: [Select]
always @ (posedge clk) begin

    gpu_wr_ena <= run_w_porta || run_w_portb;
    porta_bsy  <= run_r_porta || run_w_porta;
    portb_bsy  <= run_r_portb || run_w_portb;

    if (run_r_portb) begin
        rd_sequencer_b[9:0] <= { rd_sequencer_b[8:0], 1'b1 };
        gpu_address         <= address_b;
    end else begin
        rd_sequencer_b[9:0] <= { rd_sequencer_b[8:0], 1'b0 }; // this line must always run no matter any other state
    end

    if (run_w_portb) begin
        gpu_address  <= address_b;
        gpu_data_out <= data_in_b;
    end

end
Would that be better?

#796 Reply
Posted by BrianHG on 31 Dec, 2019 11:27
Does it work?

#797 Reply
Posted by nockieboy on 31 Dec, 2019 11:31
Quote from: BrianHG on 31 Dec, 2019 11:27
Does it work?

Sure does.

Compiles are taking 1 min 50 seconds now.

Code: [Select]
always @ (posedge clk) begin

    // *** UPDATE GPU_WR_ENA AND FLAGS EACH CLOCK ***
    gpu_wr_ena <= run_w_porta || run_w_portb;
    porta_bsy  <= run_r_porta || run_w_porta;
    portb_bsy  <= run_r_portb || run_w_portb;

    // *** HANDLE PORT A READ REQUESTS AND SEQUENCER ***
    if (run_r_porta) begin
        rd_sequencer_a[9:0] <= { rd_sequencer_a[8:0], 1'b1 };
        gpu_address         <= address_a;
    end else begin
        rd_sequencer_a[9:0] <= { rd_sequencer_a[8:0], 1'b0 }; // this line must always run no matter any other state
    end

    // *** HANDLE PORT A WRITE REQUESTS ***
    if (run_w_porta) begin
        gpu_address  <= address_a;
        gpu_data_out <= data_in_a;
    end

    // *** HANDLE PORT B READ REQUESTS AND SEQUENCER ***
    if (run_r_portb) begin
        rd_sequencer_b[9:0] <= { rd_sequencer_b[8:0], 1'b1 };
        gpu_address         <= address_b;
    end else begin
        rd_sequencer_b[9:0] <= { rd_sequencer_b[8:0], 1'b0 }; // this line must always run no matter any other state
    end

    // *** HANDLE PORT B WRITE REQUESTS ***
    if (run_w_portb) begin
        gpu_address  <= address_b;
        gpu_data_out <= data_in_b;
    end

end

#798 Reply
Posted by BrianHG on 31 Dec, 2019 11:35
So next, wire in the Z80,...

#799 Reply
Posted by nockieboy on 01 Jan, 2020 21:43
Quote from: BrianHG on 31 Dec, 2019 11:35
So next, wire in the Z80,...

Got that done today, have just been debugging the connection with a little spare time. Started out oddly - clearly a bad connection for bit 1 on the data bus, fixed that now but am still not getting 100% results.

I'm using two methods to read and set the contents of the GPU RAM from the Microcom - the PEEK direct command in my monitor program (DMI) and MEMX, a memory viewer/editor program in the DMI. I'm trying to set contents using the POKE direct command.

Here's the output from MEMX when viewing the first page of the GPU RAM (a 16KB bank of the GPU's memory window mapped to $C000), compared with the (simultaneous) output from the RS232_debugger viewing the same page:

MEMX:

RS232_debug:

As you can see, there are some errors in the MEMX output (addresses $C01C, $C01E, $C02C are the first). Looks like a timing issue getting data out from the GPU to the Z80?

Writing the GPU RAM using POKE seems to be fine for the most part, unless I try to perform a block write - I have another command that sets a block of RAM to a user-specified value (it uses LDIR iirc - one of the Z80's block data-shifting commands) - that completely fails - doesn't change anything according to the RS232_debugger, but returns all zeros (or whatever value was written) on the Microcom.

Going to dig out the Z80 manual and see what it says around the memory RD/WR cycles. I'm wondering if I need to keep the data valid on the data bus for a short while AFTER the end of the memory cycle, but I'm pretty sure the Z80 doesn't require the data to be held after RD goes high...

Sometimes PEEK returns the correct value, sometimes it just returns $FF and the data returned using MEMX changes each time I view the page (well, the errors change in terms of value and location).