EEVblog Electronics Community Forum

Electronics => FPGA => Topic started by: nockieboy on October 16, 2019, 01:42:20 pm

Title: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 16, 2019, 01:42:20 pm
Hi all,

Don't know if anyone has noticed, but Grant Searle's website (http://searle.hostei.com/grant/z80/) seems to have blinked out of existence.

This poses a problem for me as I didn't download the code or related material for his FPGA Multicomp design.  I'm now thinking about my first foray into FPGAs by creating a VGA driver for my DIY computer, but it seems a big 'leg up' has now disappeared from the interweb.  |O

Does anyone have the code for the Multicomp?  I was hoping to sift through it and either adapt or just learn from Grant's implementation of the graphics driver in VHDL.

For clarity, I've got an Altera Cyclone II EP2C5 which I was hoping to use to make a VGA controller, with either enough on-board RAM or a discrete SRAM chip for a frame buffer in the region of 128 KB, allowing anything from (up to) 640x480 with two colours down to 160x120 with 256 colours and double-buffering.
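For reference, the frame buffer sizes for the two modes mentioned work out as follows (a quick sanity check; the resolutions and 128 KB figure are from the post, the arithmetic is just illustrative):

```python
# 640x480 at 1 bit per pixel (two colours):
mono_bytes = 640 * 480 // 8
# 160x120 at 8 bits per pixel (256 colours), double-buffered:
indexed_bytes = 160 * 120 * 2

print(mono_bytes, indexed_bytes)  # 38400 38400 - both fit easily in 128 KB
```

Interestingly, both modes need exactly the same amount of memory.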

I've since ordered a Xilinx Spartan 6 as well, thinking it might be even more overpowered for the job.  ;D

I've seen designs where the controller outputs an RGB signal directly to the VGA connector via resistors, and others where the controller passes data from the frame buffer to a video DAC.  Bearing in mind I'm using 8-bit data (so a maximum of 256 colours unless I start doubling-up on bytes-per-pixel in the frame buffer), the DACs I've seen seem to expect a lot more bits on their inputs...

I'm also wondering how best to interface to the Z80 computer.  Would it be best to send data to the VGA controller via IO calls and let the VGA controller handle the frame buffer?  I'm not a fan of restricting the Z80's access to memory for 75% of its runtime whilst the VGA controller reads from a frame buffer in the Z80's memory space, and dual-port RAM is too expensive for my needs.

Any thoughts, comments etc. appreciated!  I'm not very experienced at this stuff having spent the last two years learning electronics by building a DIY Z80-based computer...

 :-+

EDIT:

For anyone interested, this topic evolved into a fully developed video card using the EP4CE10 FPGA from Altera.  The github for this project is here: https://github.com/nockieboy/gpu
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: SiliconWizard on October 16, 2019, 01:54:57 pm
Does this contain what you need? https://github.com/wsoltys/multicomp
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 16, 2019, 02:55:18 pm
Does this contain what you need? https://github.com/wsoltys/multicomp

Ah yes, thanks SiliconWizard!  :-+  That gives me a fair starting point to work from, I think.  As I'm only going to be building the video driver circuit, hopefully I'll be able to make use of some more RAM space.  Will have to go start on some VHDL/FPGA tutorials as I'm starting from scratch with them.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: dferyance on October 16, 2019, 05:43:02 pm
Just a few thoughts / ideas.

The resistor ladder DACs are popular and mostly work fine. However, I have run into their limitations in my own work. I've been doing VGA with 256 colors drawn from a 24-bit palette. I often see obvious color differences due to the limited number of bits in the resistor ladder. If you can live with this, they work great and are simple. It sounds like you are doing 256 colors without a palette, so you should be fine, but if you extend it later this is a limitation you will run into.

For interfacing the two, do you need framebuffer-level access? For example, the original gameduino outputs VGA but takes in different drawing commands (over SPI). This way you aren't having to read and transmit entire framebuffers over a slow bus.

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: NorthGuy on October 16, 2019, 08:10:33 pm
Why VGA? It would be more natural to go digital.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on October 16, 2019, 08:27:06 pm
Why VGA? It would be more natural to go digital.
Got a back-of-napkin design for that?
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: kizmit99 on October 16, 2019, 09:14:56 pm
Why VGA? It would be more natural to go digital.
Got a back-of-napkin design for that?

The Cyclone II OP mentioned supports 3.3V LVDS outputs which can (pretty easily) be used to drive an HDMI monitor.  The resolutions OP mentioned would have pixel clocks below the HDMI minimums, so would require pixel replication to get into the supported rates.  But I suspect that some quick math would find something acceptable without blowing past the output speed limit on the FPGA.
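As a rough sanity check on the pixel-clock point (the totals are the industry-standard 640x480@60 figures, not from the post):

```python
# Standard 640x480@60 timing: 800x525 total pixels per frame.
pixel_clock_hz = 800 * 525 * 60          # 25,200,000 (nominal VESA clock is 25.175 MHz)
# DVI/HDMI sinks require a pixel clock of at least 25 MHz, so 640x480@60
# just clears the bar; lower modes need pixel replication to reach it.
# TMDS serialises 10 bits per pixel on each data channel:
tmds_bits_per_sec = pixel_clock_hz * 10  # 252,000,000 per channel
print(pixel_clock_hz, tmds_bits_per_sec)
```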

Both of these are useful examples:
http://hamsterworks.co.nz/mediawiki/index.php/Minimal_HDMI
https://www.fpga4fun.com/HDMI.html
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Canis Dirus Leidy on October 16, 2019, 11:00:09 pm
Why VGA? It would be more natural to go digital.
Got a back-of-napkin design for that?
There were several examples at marsohod.org (in Russian):
Connecting MAXII (EPM240) to a LCD panel with a parallel RGB interface (https://marsohod.org/projects/plata1/172-phframe1)
Driving LVDS interface with MAXII (https://marsohod.org/projects/plata1/173-phframe2) (dirty hack, due to lack of PLL in EPM240, but works).
DIY framebuffer (from old SDRAM) for DIY photoframe (https://marsohod.org/projects/plata1/174-phframe3)
HDMI interface with MAX10 (https://marsohod.org/projects/proekty-dlya-platy-marsokhod3/307-max10-hdmi)

P.S. Also look for "Circuit Design and Simulation with VHDL". In particular, chapters 16 "VHDL Design of DVI Video Interfaces" and 17 "VHDL Design of FPD-Link Video Interfaces".
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on October 16, 2019, 11:35:23 pm
Well. I learned something today! I had preferred the VGA idea because it's more "tangible" and offers easier-to-read feedback in case of errors, but if it's that easy, objection withdrawn. :-+
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Codex on October 17, 2019, 03:59:37 am
Web Archive has a copy :)
https://web.archive.org/web/20181123194029/http://searle.hostei.com/grant/index.html

He has a mirror here.
http://zx80.netai.net/grant/index.html

twitter for updates and links to the mirror :)
https://twitter.com/zx80nut?lang=en
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: CJay on October 17, 2019, 05:22:24 am
Hi all,

Don't know if anyone has noticed, but Grant Searle's website (http://searle.hostei.com/grant/z80/) seems to have blinked out of existence.


Bugger.

Wish I'd archived it all now.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: hamster_nz on October 17, 2019, 06:21:39 am
Well. I learned something today! I had preferred the VGA idea because it's more "tangible" and offers easier-to-read feedback in case of errors, but if it's that easy, objection withdrawn. :-+

DVI-D is a direct replacement of the VGA signals with streams of bits. From the FPGA designer's point of view it is just a little bit of logic bolted onto the end of the video pipeline, in place of the DACs.

The only trick (if you can call it that) is the high-speed digital signalling from the FPGA to the connector, but for standard 640x480 that isn't too exacting, and you can get away with quite a few sins. The raw bit rate is 250Mb/s per channel, so bits are around a meter long on the wire (ignoring velocity factors and so on), and the protocol is designed to resist errors.

A few mm of length mismatch or an impedance bump is far less of a problem than at 1080p rates.
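The "meter long bits" figure checks out (free-space speed assumed; a real cable's velocity factor of ~0.7 shortens it somewhat):

```python
# One bit at 250 Mb/s, travelling at (roughly) the speed of light:
c = 3.0e8                   # m/s, free-space approximation
bit_rate = 250e6            # TMDS bit rate per channel at 640x480
bit_length = c / bit_rate   # metres per bit on the wire
print(bit_length)           # 1.2 - about a metre, as stated above
```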

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 17, 2019, 01:03:29 pm
Bugger.

Wish I'd archived it all now.

Don't panic CJay - Codex has linked to some excellent mirrors and I can personally vouch for the zx80.netai.net one, it's all there.  :-+

Web Archive has a copy :)
https://web.archive.org/web/20181123194029/http://searle.hostei.com/grant/index.html

He has a mirror here.
http://zx80.netai.net/grant/index.html

twitter for updates and links to the mirror :)
https://twitter.com/zx80nut?lang=en
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 17, 2019, 01:32:29 pm
Just a few thoughts / ideas.

The resistor ladder DACs are popular and mostly work fine. However, I have run into their limitations in my own work. I've been doing VGA with 256 colors drawn from a 24-bit palette. I often see obvious color differences due to the limited number of bits in the resistor ladder. If you can live with this, they work great and are simple. It sounds like you are doing 256 colors without a palette, so you should be fine, but if you extend it later this is a limitation you will run into.

A resistor ladder is the simplest way to get a colour output I guess, but it seems a little... inexact... and dependent on resistor tolerances and chosen values, so when I saw an example using a video DAC I thought I'd likely want to give that a try.

I'm not sure I understand colour encoding/generation properly, so would appreciate correction if I'm wrong, but whilst thinking about the frame buffer design the other day I gave some thought to how the data would be stored in the frame buffer.  Storing discrete values for R, G and B channels using a bit (or bits) seems a little wasteful.  If I want 64 colours for example, I could use 2 bits per channel for 6 bits of RGB per pixel, or one byte per pixel (with a couple of bits spare) in the frame buffer.  Thing is, if I want to go to 3 bits per channel or more, I'm looking at two bytes per pixel and doubling the memory requirement for a single frame.

So I hit on the idea of using a look-up table for the colours.  Is that the 'palette system' you're referring to?  Basically, I'd have one byte per pixel in the frame buffer.  Each byte would be a value between 0 and 255, so when the 'pixel' is read from the frame buffer, the value would be used to look up the RGB value in the LUT, which could be anything up to 24-bit values that would get passed out to the DAC...

... would that work?  The other thing is, I don't really know anything about FPGAs - I get the impression they're fast, so using a LUT for the RGB values shouldn't slow the FPGA down so much that it couldn't keep up with the clock? 

The FPGA I have currently isn't up to what I want from it, really - it's an Altera Cyclone II EP2C5T144, so doesn't have the RAM I need for a frame buffer for anything more than straight text display.  I'm wondering if using an external SRAM chip would be the way to go for the frame buffer?  Would something with an access time <15ns for example be fast enough?  Alternatively, I'm waiting on a Xilinx Spartan 6 which I'll likely develop on instead.  That has a 32MB SDRAM on its board; I might experiment with using that for the frame buffer if the RAM in the Spartan isn't enough (though I think I can make the internal RAM dual-port, which is highly desirable for a frame buffer?)

For interfacing the two, do you need framebuffer-level access? For example, the original gameduino outputs VGA but takes in different drawing commands (over SPI). This way you aren't having read and transmit entire framebuffers over a slow bus.

I'm intending to interface the VGA controller with my Z80-based computer running at 8 MHz.   I'm not hung-up on direct frame-buffer access for the Z80 at all - in fact, I'd rather not mess with the Z80's memory space at all if another interface method is fast enough.  Unless anyone here tells me otherwise, I'm going the route of having the Z80 send commands to the FPGA, which it will interpret and modify the frame buffer's contents accordingly.  How the Z80 does that, I haven't decided yet, but I could use a serial connection from the Z80's SIO, I could use my Z80's hardware I2C port, or a more direct connection using OUT commands and the data bus, which I'm erring towards as I feel it will be the quickest of the three.  An SPI interface isn't currently an option, but may be later if I ever finish a hardware SPI interface for my system.
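For a rough feel of whether OUT-based transfers would bottleneck: using the Z80's block-output instruction OTIR (21 T-states per repeated byte), the numbers look like this. The 8 MHz clock and 160x120 mode are from the thread; the rest is just illustrative arithmetic:

```python
clock_hz = 8_000_000        # Z80 clock from the post
t_states_per_byte = 21      # OTIR block output: 21 T-states per repeated byte
bytes_per_sec = clock_hz / t_states_per_byte
frame_bytes = 160 * 120     # one 8 bpp frame at 160x120
fps = bytes_per_sec / frame_bytes
print(int(bytes_per_sec), round(fps, 1))  # ~380952 B/s, ~19.8 full frames/s
```

So even a naive full-frame push over the I/O bus could manage nearly 20 fps at 160x120, and drawing commands would of course need far less data than that.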

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: dferyance on October 17, 2019, 01:40:32 pm
Quote
So I hit on the idea of using a look-up table for the colours.  Is that the 'palette system' you're referring to?  Basically, I'd have one byte per pixel in the frame buffer.  Each byte would be a value between 0 and 255, so when the 'pixel' is read from the frame buffer, the value would be used to look up the RGB value in the LUT, which could be anything up to 24-bit values that would get passed out to the DAC...

Yes, that is exactly right. The palette is a LUT. A 256-entry LUT can pretty easily fit in an FPGA, so the lookup can be quite fast. One common technique used in old DOS VGA games is to modify the palette to do simple animations. This is called palette shifting. So while the palette can be limiting, as you can only have 256 colors at a time, it can also be quite handy.
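A minimal sketch of the palette idea in Python (the 3-3-2 colour split and the helper names are made up for illustration; in hardware the LUT would live in block RAM):

```python
# 256-entry palette: index -> 24-bit RGB tuple. Here a simple 3-3-2 ramp.
palette = [((i >> 5) * 255 // 7, ((i >> 2) & 7) * 255 // 7, (i & 3) * 255 // 3)
           for i in range(256)]

def pixel_to_rgb(index):
    """Frame buffer stores one byte per pixel; the LUT expands it to RGB."""
    return palette[index]

def shift_palette(lo, hi):
    """'Palette shifting': rotate a range of entries to animate colours
    without touching the frame buffer at all."""
    palette[lo:hi] = palette[lo + 1:hi] + [palette[lo]]

print(pixel_to_rgb(255))  # (255, 255, 255) - white in this ramp
```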
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 17, 2019, 01:52:55 pm
Why VGA? It would be more natural to go digital.

Err, well a couple of reasons really - I'm not very knowledgeable when it comes to electronics (experienced users of the forum who remember my DIY Z80 computer posts will agree!  ;)), and because of that I guess the other reason is that I didn't even know it would be an option!  :o

I guess the concept of VGA is relatively easy for me to grasp, but if I can do HDMI then I most certainly will!


The Cyclone II OP mentioned supports 3.3V LVDS outputs which can (pretty easily) be used to drive an HDMI monitor.  The resolutions OP mentioned would have pixel clocks below the HDMI minimums, so would require pixel replication to get into the supported rates.  But I suspect that some quick math would find something acceptable without blowing past the output speed limit on the FPGA.

Both of these are useful examples:
http://hamsterworks.co.nz/mediawiki/index.php/Minimal_HDMI
https://www.fpga4fun.com/HDMI.html

Thanks kizmit - that second link is certainly intriguing and has piqued my interest.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 17, 2019, 01:59:57 pm
Quote
So I hit on the idea of using a look-up table for the colours.  Is that the 'palette system' you're referring to?  Basically, I'd have one byte per pixel in the frame buffer.  Each byte would be a value between 0 and 255, so when the 'pixel' is read from the frame buffer, the value would be used to look up the RGB value in the LUT, which could be anything up to 24-bit values that would get passed out to the DAC...

Yes, that is exactly right. The palette is a LUT. A 256-entry LUT can pretty easily fit in an FPGA, so the lookup can be quite fast. One common technique used in old DOS VGA games is to modify the palette to do simple animations. This is called palette shifting. So while the palette can be limiting, as you can only have 256 colors at a time, it can also be quite handy.

My goal is to build a computer as good as, if not better than, the first computer I ever had (an Amstrad CPC464 back in the 80s).  The only challenge left for me to meet or exceed is the graphics display, so 256 colours at a time may be 'limiting', but it'll blow my old Amstrad out of the water (it had a 27-colour palette)!

My ignorance of FPGAs is limiting my understanding here, though.  How could the palette be modified 'on the fly' in an FPGA?  Oh, unless it's stored in block RAM I suppose?
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on October 17, 2019, 02:40:49 pm
The thing is that R2R DACs built out of just cheap standard-tolerance resistors are only really good enough to get you about 4 bits of resolution reliably. Past that, the resistor tolerances start becoming a significant enough part of the signal that the non-linearity of the steps messes up the least significant bits. So it's not really worth building an 8-bit R2R ladder DAC, as its performance won't be all that great.
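A quick Monte Carlo sketch of that effect (purely illustrative: it folds the resistor tolerance into each bit's weight rather than solving the real ladder network, and assumes an ideal output buffer):

```python
import random

random.seed(1)  # any seed shows the same trend

def dac_output(code, weights):
    # Simplified model: each bit contributes its (mismatched) binary weight.
    return sum(w for bit, w in enumerate(weights) if code >> bit & 1)

def worst_dnl(bits=8, tol=0.05):
    # Nominal bit weights 1, 2, 4, ... each off by up to +/- tol.
    weights = [(1 << b) * (1 + random.uniform(-tol, tol)) for b in range(bits)]
    lsb = dac_output((1 << bits) - 1, weights) / ((1 << bits) - 1)
    steps = [dac_output(c + 1, weights) - dac_output(c, weights)
             for c in range((1 << bits) - 1)]
    # Worst deviation of any single step from the ideal 1 LSB:
    return max(abs(s / lsb - 1) for s in steps)

print(worst_dnl(8, 0.05))  # typically several LSBs of error at the major carry
```

The big error shows up at the 0x7F to 0x80 transition, where the MSB's mismatch has to cancel the sum of all the lower bits' mismatches at once, which is exactly why the bottom bits of an 8-bit ladder built from 5% parts aren't trustworthy.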

If you want proper 24-bit color you are better off buying a proper DAC chip that gives you 8 bits per channel. This can be a purpose-built VGA DAC or three separate generic 8-bit DACs, one per color. By the time you have all of this you will essentially have a 24-bit RGB bus (also called a DPI bus) feeding it. This is a simple, standard video bus, which means you can replace your 24-bit VGA DAC chip with an LCD panel and it will work, or replace it with an RGB-to-LVDS or RGB-to-HDMI converter to drive larger serial LCD panels or display on a modern TV. Though this conversion can also be done inside an FPGA, it does tend to need specialized SerDes capability to go fast enough for larger resolutions.

And once you have 24-bit color you can do palettized 256 colors nicely, because each palette entry can pick from the 16 million available colors that you are now capable of displaying.  Color palettes are just a way of keeping the amount of data down inside the graphics subsystem; they are not a way of getting around the limitations of the video output hardware, as it still has to be capable of reproducing all the colors the palette might ask for. It's a workaround for the graphics hardware not being powerful enough to tell the video output hardware exactly what color it wants, so its job is made easier by only having to pick from 256 colors.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: SiliconWizard on October 17, 2019, 02:56:25 pm
I'm also wondering about how best to interface to the Z80 computer, too.  Would it be best to send data to the VGA controller via IO calls and let the VGA controller handle the frame buffer?  I'm not a fan of restricting the Z80's access to memory for 75% of its runtime whilst the VGA controller reads from a frame buffer in the Z80's memory space, and dual-port RAM is too expensive for my needs.

Ideally, you could consider using a dedicated framebuffer (dedicated RAM chip) for the display.

The CPU could for instance access it (to write to it or read from it) via some kind of memory mapping scheme, like maybe in chunks ("windows") of 8KB, and a bank selection mechanism. More efficient than using I/O accesses IIRC. Of course, when the CPU would access the framebuffer, it could only do so when it's not being read by the display controller, but when the CPU is not accessing the framebuffer, it could mind its own business with its own RAM without any penalty.
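The bank-window idea can be sketched like this (the 8 KB window size is from the post; the window base address, framebuffer size, and register names are hypothetical):

```python
WINDOW_SIZE = 8 * 1024        # 8 KB window, as suggested above
WINDOW_BASE = 0x8000          # hypothetical spot in the Z80 address space

framebuffer = bytearray(128 * 1024)   # dedicated video RAM, hypothetical size
bank_select = 0                       # hypothetical bank-select I/O register

def cpu_write(addr, value):
    # A write into the window lands in the currently selected 8 KB bank.
    if WINDOW_BASE <= addr < WINDOW_BASE + WINDOW_SIZE:
        framebuffer[bank_select * WINDOW_SIZE + (addr - WINDOW_BASE)] = value

bank_select = 3                # select the fourth 8 KB bank
cpu_write(0x8000, 0xAB)
print(hex(framebuffer[3 * WINDOW_SIZE]))  # 0xab
```

The CPU sees only an 8 KB slice at a time, but by writing the bank register it can reach the whole framebuffer without giving up any of its own RAM map.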

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 17, 2019, 03:14:03 pm
Ideally, you could consider using a dedicated framebuffer (dedicated RAM chip) for the display.

The CPU could for instance access it (to write to it or read from it) via some kind of memory mapping scheme, like maybe in chunks ("windows") of 8KB, and a bank selection mechanism. More efficient than using I/O accesses IIRC. Of course, when the CPU would access the framebuffer, it could only do so when it's not being read by the display controller, but when the CPU is not accessing the framebuffer, it could mind its own business with its own RAM without any penalty.

I had considered this before - my system's MMU breaks the memory down into 16KB banks with a total of 4 MB of physical memory space.  I could just break out one (or more) of the eight 512 KB chip sockets to dedicate them to being a frame buffer, I suppose.  The only issue is that I'd need to put together some switching logic to isolate their data and address bus from the Z80's whilst the Z80 isn't accessing the frame buffer, and some way of making the Z80 wait until the frame buffer is free before access is allowed to it (I guess I could read an IO port on the FPGA which would go low when the display is in a front porch, for example.)

I guess whether I'd want to go that route, with the additional complexity of creating the 'shared RAM' interface and control logic, would depend on how much mileage I can get out of using an on-board frame buffer on the Spartan 6, or whether just sending commands and data one OUT at a time would be too much of a bottleneck.

EDIT:

Ah, answered my question with a quick Google - the Spartan 6 has up to 18 KB of RAM on board.  Subtract 2 KB for symbol/character set storage (unless ROM is a separate entity to the RAM?) and I'm struggling to do much with it on a resolution larger than 160x120.  Looks like the frame buffer will have to go into the SDRAM and I'll need a way to buffer commands/data from the Z80 whilst the controller waits for a 'porch window' to do what it needs to do with the frame buffer.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: legacy on October 17, 2019, 03:37:49 pm
Shared RAM chips, in DIP packages, are about 2 Kbyte.
The alternative is to force the Z80 to release the bus.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: CJay on October 17, 2019, 03:39:37 pm
Bugger.

Wish I'd archived it all now.

Don't panic CJay - Codex has linked to some excellent mirrors and I can personally vouch for the z80.netai.net one, it's all there.  :-+

Web Archive has a copy :)
https://web.archive.org/web/20181123194029/http://searle.hostei.com/grant/index.html

He has a mirror here.
http://zx80.netai.net/grant/index.html

twitter for updates and links to the mirror :)
https://twitter.com/zx80nut?lang=en

Yeah, found them and they're safe now :)
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: SiliconWizard on October 17, 2019, 04:24:43 pm
You could also design your MMU so that CPU and video accesses are interleaved. I dunno how fast your CPU is going to run, but if it's a few MHz (as that was back in the day) that should pose no issues.

Any Spartan 6 certainly has more than 18KBytes of embedded RAM. The LX4 (smallest) has 216Kbits (27KBytes), and the LX45, which is still reasonable price-wise, has 2088Kbits (261KBytes!)
You confused the "18Kb" (Kbits) figure for the Spartan 6, which is the individual size of the RAM blocks, certainly not the total available!
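The figures work out as follows (part numbers and block RAM totals as quoted above):

```python
lx4_kbits, lx45_kbits = 216, 2088   # Spartan-6 block RAM totals quoted above
print(lx4_kbits // 8)               # 27 KBytes total in the LX4
print(lx45_kbits // 8)              # 261 KBytes total in the LX45
print(lx4_kbits // 18)              # 12 individual 18 Kbit blocks in the LX4
```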
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: mariush on October 17, 2019, 08:42:20 pm
Any reason you can't use a microcontroller to generate the VGA signal?
There are cheap microcontrollers (under $10) which exceed 200 MHz, and fast DACs are cheap if you don't want to use resistors.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on October 17, 2019, 11:49:55 pm
You could also design your MMU so that CPU and video accesses are interleaved. I dunno how fast your CPU is going to run, but if it's a few MHz (as that was back in the day) that should pose no issues.
Yep, just like almost all the home computers of the 1980s. But that requires some close coupling between the video hardware and the rest of the system, which he said he wanted to try to avoid. With the huge difference in memory bandwidth demand between the CPU and 640*480*8bpp, I can understand that.

Personally I'd have put a Z80 soft core and an SDRAM controller on the FPGA and called it 80% done :D

Any reason you can't use a microcontroller to generate the VGA signal?
There are cheap microcontrollers (under $10) which exceed 200 MHz, and fast DACs are cheap if you don't want to use resistors.
Any reason you would want to bit bang video, other than to show off for the demoscene (https://www.linusakesson.net/scene/craft/)?
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: james_s on October 18, 2019, 01:11:58 am
I use VGA (or composite) for this retro stuff because retro machines just look weird on LCD displays generally and I have a lot more monitors with VGA and only a couple things with HDMI. It's also a lot easier to wire VGA.

I wonder what happened to Grant Searle's website? I have a lot of the code stashed away but there was a ton of cool stuff on there.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on October 18, 2019, 06:33:04 am
These days it's still common to share main system RAM between the CPU and GPU, but it's done differently.

Today caches are used on all sorts of CPUs, even small ones that run under 100MHz. This means the CPU does not always need to be poking and prodding at RAM in order to execute every single instruction. For code that loops around a lot (this is usually where speed matters the most), all of the instructions end up inside the cache, so the CPU completely stops reading code from RAM, while for any temporary calculations there are usually plenty of registers to keep things in. So once things are cached, the access to RAM is just to move the final computation results in and out, and even that happens in large blocks of cache writebacks (dynamic RAM usually loves large burst accesses due to pipelining). All this was required to push CPUs into the hundreds of MHz, since main system RAM could not keep up.

So this means you can actually take the RAM away from the CPU for many cycles at a time and it won't even notice. Also, modern DRAM can easily be run at 100MHz or more, and the wide burst-access nature of caches means you can benefit from having a really wide RAM bus such as 64 bit, 128 bit... etc. to get even more speed out of it, even though the CPU might only be 16- or 32-bit (in graphics cards, 512-bit-wide RAM buses are not uncommon because they really need lots of bandwidth).

All of this means plenty of RAM bandwidth can be left over from the CPU and used for the GPU. The bus arbitrator can split the bandwidth however it wants between them (not the usual fixed 50/50 split that interleaved access did in the old days). But it also means that the video output hardware has to have some cache of its own, as it will not be guaranteed RAM access at a moment's notice. The CPU might be in the middle of a cache writeback and taking up the bus for the next, say, 16 cycles, or a DRAM refresh cycle might be happening, so the video hardware must have enough pixels in its own internal buffer to keep outputting the image while it waits to finally get to the RAM and catch up.

If you are after a ton of graphics horsepower you could just give the GPU its own set of RAM chips with a huge wide bus to make sure it really has a firehose's worth of guaranteed memory bandwidth, and then implement a bus MUX that lets the CPU use the memory when the GPU is idle and video output is in blanking, or have the GPU listen to commands from the CPU about what to put into graphics memory. But for a retro computer project that is way overkill, since on a decent FPGA this would have the graphics horsepower surpassing early PC 3D graphics accelerator cards such as the Voodoo.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 18, 2019, 08:33:46 am
Personally I'd have put a Z80 soft core and an SDRAM controller on the FPGA and called it 80% done :D

But for a retro computer project that is way overkill since on a decent FPGA this would have the graphics horsepower surpassing early 3D graphics accelerator cards in PC such as the Voodoo

I guess my motivations may be hard to understand for some, as I could achieve all this by just putting it ALL onto the FPGA.  But my original intention was to learn how to build a computer from scratch (having been given the desire to do so by stumbling across Grant Searle's page) and learn some electronics skills on the way.  I'm now working with a system I want other people to be able to build who want to go along the same journey I did.  Although it involves a lot of SMT parts now, as my soldering skills have developed, and an FPGA graphics card certainly IS overkill, it just seems too good an opportunity to miss to be able to plug my little home-made computer into my living room TV and play Pong on it with the family.

(https://i.ibb.co/P9r24jM/20190730-184211.gif) (https://ibb.co/nnwXfmB)

You could also design your MMU so that CPU and video accesses are interleaved. I dunno how fast your CPU is going to run, but if it's a few MHz (as that was back in the day) that should pose no issues.

The CPU runs at 8 MHz (or 4 MHz, if you want to experience true 80's CP/M!).  So you think there would be no noticeable slow-down if I were to shut down memory access for ~73% of the time while the FPGA is accessing the frame buffer to draw the visible area of the screen?  To me, knowing little about this sort of stuff, that equates to slowing the computer down by almost three quarters?
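For what it's worth, the ~73% figure matches the standard timing exactly (totals are the industry-standard 640x480@60 numbers):

```python
visible = 640 * 480          # active pixels per frame
total = 800 * 525            # active + blanking, standard 640x480@60 timing
print(round(visible / total * 100, 1))  # 73.1 - the fraction of frame time
                                        # spent fetching visible pixels
```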

Part of the reason for this post was to discuss the best way to interface the FPGA with the computer - frame buffer in computer RAM, or gated via the FPGA, for example.  Both have positives and negatives, I just don't have the knowledge or experience to understand their weight and balance the options properly, if that makes sense?

Any Spartan 6 certainly has more than 18KBytes of embedded RAM. The LX4 (smallest) has 216Kbits (27KBytes), and the LX45, which is still reasonable price-wise, has 2088Kbits (261KBytes!)
You confused the "18Kb" (Kbits) figure for the Spartan 6, which is the individual size of the RAM blocks, certainly not the total available!

Ah, my mistake.  I was a little surprised it was so low actually, considering I was comparing it against an Altera Cyclone II (which I have currently) which is much older and only has a little less.

The LX45 is VERY promising - a little expensive for what I'm trying to do, but would make life VERY easy as I could have an internal dual-port frame buffer that could give me resolutions up to 640x480 with 4 bit colour depth using a LUT, and still have over 100 KB free for the LUT, buffers, character set(s) and even sprites... And the best thing?  It's available in TQFP, which I consider still reasonably easy to solder (have never tried BGA... other than using a frying pan, don't see how it could be done at home). Hmmmm.... thanks SiliconWizard, that's a great suggestion.  :-+

I use VGA (or composite) for this retro stuff because retro machines just look weird on LCD displays generally and I have a lot more monitors with VGA and only a couple things with HDMI. It's also a lot easier to wire VGA.

I wonder what happened to Grant Searle's website? I have a lot of the code stashed away but there was a ton of cool stuff on there.

True, VGA was my original intention when I started this post, but having had HDMI suggested as not being impossible, I think it'd be far more future-proof to use that instead.  I'm concerned VGA is diminishing in popularity these days - it will only get harder to get hold of TVs / monitors with VGA inputs, and whilst I've built a 'retro' computer, I'm trying to keep in mind future availability of parts (partly hence the move to SMT components as some are like gold dust in DIP form).

Grant's site seems to have been deleted, according to the site message - so it doesn't look like the domain has expired, it looks more like an intentional removal of the site.  :-//

EDIT:

I made a mistake - the LX4 is NOT available in TQFP, only BGA or wafer, which means I can't use it unfortunately (my SMT-soldering-fu isn't that strong!)  At least, not unless I find a PCB manufacturer who is willing to solder it to the PCB for me as part of the fabrication process.  I tend to use JLCPCB, so I've contacted them to find out if they will do it as part of their new SMT-assembly service.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on October 18, 2019, 09:52:40 am
With an FPGA you can go as far as you want to. You can just implement a RAM buffer that gets drawn to the screen if that's what you are after, but you can then add on things like hardware graphics acceleration if you want the kind of graphics seen on the 16-bit game consoles back in the day. Sound can also be implemented in the same FPGA, giving you emulation of the popular chiptune or FM synthesizer chips.

But if you just want to output video from a RAM framebuffer then you can get something like the Solomon Systech SSD1963 in a LQFP128 package:
https://www.buydisplay.com/download/ic/SSD1963.pdf (https://www.buydisplay.com/download/ic/SSD1963.pdf)

While it is called an "LCD Display Controller", it actually outputs an RGB video bus that can be turned into VGA via a DAC or into HDMI via an RGB-to-HDMI converter chip. The input to it is just a memory bus, so you connect it to your Z80 the same way you would connect SRAM or any other peripheral chip (though it is 3.3V, so you might need level shifting). The framebuffer is internal to the chip, and it can run at up to 110 MHz, so no retro CPU will be too fast for it. It's essentially a video card in a chip.

But the downside of using such a ready-made solution is that it can't emulate anything else. So if you have existing software that expects to talk to certain video hardware then it won't work here, but replacing this chip with an FPGA lets you customize its memory mapping and registers in a way that mimics some existing video chip from a retro computer, allowing the existing software to run on it.

In a way an FPGA is kinda cheating if you are after the genuine retro computer building experience, where you had to breadboard 1 to 10 chips and run tens to hundreds of wires just to add some peripheral to your computer's memory bus. But with an FPGA you just edit a few lines of code on your PC, load the new firmware into the FPGA, and boom, suddenly your computer has 3 extra GPIO ports and a PWM output. A few more lines of code and suddenly you also have a floating point coprocessor in your computer; some more code and you have a separate programmable DSP core in there too. Need more grunt? Okay, copy-paste 3 more of those DSP cores to get a quad-core DSP on the bus. Doing the same in the 80s would require a few suitcases' worth of boards hanging off your computer; these days it's just adding some more code to an FPGA chip sitting on the bus.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Canis Dirus Leidy on October 18, 2019, 10:07:06 am
Yes, that is exactly right. The palette is a LUT. A 256-byte LUT can pretty easily fit in an FPGA, so the lookup can be quite fast. One common technique used in old DOS VGA games is to modify the palette to do simple animations. This is called palette shifting. So while the palette can be limiting, as you can only have 256 colors at a time, it can also be handy.
A programmable palette (in combination with bit-plane video RAM) also made life somewhat easier for a programmer when your target machine (https://zx-pk.ru/content/136-Vektor-06C-sovetski-bog-tcveta-i-zvuka) didn't have hardware sprites:
[attach=1] [attach=2] [attach=3]
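The palette-shifting trick mentioned above can be sketched in a few lines: pixel values are palette indices, so rotating a slice of the palette animates every pixel referencing those entries without rewriting any framebuffer data (all names and values here are illustrative):

```python
# Palette shifting sketch: 256 RGB entries, pixels store indices into them.
palette = [(i, 0, 0) for i in range(256)]   # a toy red gradient
framebuffer = [16, 17, 18, 19] * 8          # pixels referencing entries 16..19

def shift_palette(pal, start, end):
    """Rotate palette entries [start, end) up by one position."""
    pal[start:end] = pal[start + 1:end] + [pal[start]]

shift_palette(palette, 16, 20)
# Pixel value 16 now displays what entry 17 held before the shift,
# so the whole gradient "moves" with a 4-entry palette write.
```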

The CPU could for instance access it (to write to it or read from it) via some kind of memory mapping scheme, like maybe in chunks ("windows") of 8KB, and a bank selection mechanism. More efficient than using I/O accesses IIRC.
*cough* MSX (https://www.chibiakumas.com/z80/msx.php) *cough* This standard explicitly prescribed that all access to video memory should be made only through the ports of the video controller.

Part of the reason for this post was to discuss the best way to interface the FPGA with the computer - frame buffer in computer RAM, or gated via the FPGA, for example.  Both have positives and negatives, I just don't have the knowledge or experience to understand their weight and balance the options properly, if that makes sense?
Well, if you look at PC video chips from EGA/VGA/Early SVGA era (like GD542x family (http://www.s100computers.com/My System Pages/VGA_16_Board (Cirrus)/GD542x Technical Reference Manual.pdf) or ET4000 (http://bitsavers.informatik.uni-stuttgart.de/components/tsengLabs/Tseng_Labs_ET4000_Graphics_Controller_1990.pdf)), you will see that video memory was actually isolated from CPU bus. All that the processor could see was an intermediate buffer and translation logic, creating the illusion of direct access to video memory.

P.S. And talking about resistor-based DACs. Little Chinese trick:
[attach=4]
Instead of an R-2R ladder, they used SMD resistor arrays to make binary-weighted resistor values.
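A rough model of such a binary-weighted DAC driving a 75-ohm VGA load. The resistor value and bit width are invented for illustration, and the model assumes the FPGA pins always drive rail-to-rail (never tri-state), which is what keeps the divider linear:

```python
# Binary-weighted summing DAC sketch: bit k drives VCC (or GND) through a
# resistor of value R_MSB * 2**k; currents sum into the 75-ohm load.
VCC, R_LOAD = 3.3, 75.0
R_MSB = 510.0                  # hypothetical MSB resistor, not a tested design
BITS = 4

def dac_out(code):
    """Output voltage for a 4-bit code, by nodal analysis at the load."""
    # Every bit resistor is always connected (pin drives high or low),
    # so total conductance is constant and the transfer is linear.
    g_total = 1 / R_LOAD + sum(1 / (R_MSB * 2**k) for k in range(BITS))
    i_in = sum(((code >> (BITS - 1 - k)) & 1) * VCC / (R_MSB * 2**k)
               for k in range(BITS))
    return i_in / g_total
```

The output scales linearly with the code; picking R_MSB sets the full-scale voltage against the monitor's 75-ohm termination.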
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: legacy on October 18, 2019, 10:19:44 am
I use VGA (or composite) for this retro stuff because retro machines generally just look weird on LCD displays, and I have a lot more monitors with VGA and only a couple of things with HDMI. It's also a lot easier to wire up VGA.

My Sonoko project uses three VGA LCD screens, a VGA service card, and a VGA KVM. The NEC VGA LCD screens are 17-inch and were bought new. Personally, I do not like HDMI because it consumes too much bandwidth on the wires and in the circuit, and the products are even more expensive. The HDMI version of a KVM costs double, and the cabling is much more expensive. That's the premium for being on super-modern technology, e.g. the Advoli TA6 is a PCIe GPU card with nothing but HDMI-over-LAN outputs, but ... this stuff is a no-go unless you are in business with a squad of engineers.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: SiliconWizard on October 18, 2019, 02:49:33 pm
You could also design your MMU so that CPU and video accesses are interleaved. I dunno how fast your CPU is going to run, but if it's a few MHz (as that was back in the day) that should pose no issues.

The CPU runs at 8 MHz (or 4 MHz, if you want to experience true 80's CP/M!).  So you think there would be no noticeable slow-down if I were to shut down memory access for ~73% of the time while the FPGA is accessing the frame buffer to draw the visible area of the screen?  To me, knowing little about this sort of stuff, that equates to slowing the computer down by almost three quarters?
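For what it's worth, the ~73% figure falls straight out of standard 640x480@60 VGA timing, where each frame is 800x525 total clocks at a 25.175 MHz pixel clock:

```python
# Fraction of each frame spent fetching visible pixels at 640x480@60 VGA.
H_ACTIVE, H_TOTAL = 640, 800    # active vs total pixel clocks per line
V_ACTIVE, V_TOTAL = 480, 525    # active vs total lines per frame

active_fraction = (H_ACTIVE * V_ACTIVE) / (H_TOTAL * V_TOTAL)
print(f"visible-area fraction: {active_fraction:.1%}")   # ~73.1%
```

The remaining ~27% is the horizontal and vertical blanking intervals.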

Well, nope. The interleaving suggestion implied that the RAM would actually be accessed faster, so the CPU wouldn't see a difference. To pull that off, you would of course need fast enough RAM (which should be no problem with a modern SDRAM chip or even SRAM) and a faster MMU clock. Also, to keep things simple, you'd want the CPU and video clock frequencies to be integer multiples of each other.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: james_s on October 18, 2019, 04:40:14 pm
If you want to play with (crude) graphics hardware, have a look at some of the early arcade games I've done.

https://github.com/james10952001

I've made them quite modular so you can easily mix & match pieces. The old Atari B&W games are cool because the video hardware is entirely independent of the CPU which just writes into RAM and then the hardware continuously reads these RAM locations and uses them to address object ROMs that result in objects on the screen. Originally this used standard SRAM with data selectors but in some of these I just made it dual ported RAM which makes it super easy to "wire" it up to whatever CPU you want. The object ROMs are easily customized as well to display whatever characters you want, and the schematics for the original hardware are readily available. Hack away.

I should add that these are all targeted to the same $12 FPGA board that Grant used, and in fact if you poke around there's a PCB in that repository for a daughter board that plugs right into that FPGA and has sockets for RAM, keyboard, micro SD, etc.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 18, 2019, 07:02:29 pm
With an FPGA you can go as far as you want to. You can just implement a RAM buffer that gets drawn to the screen if that's what you are after, but you can then add on things like hardware graphics acceleration if you want the kind of graphics seen on the 16-bit game consoles back in the day. Sound can also be implemented in the same FPGA, giving you emulation of the popular chiptune or FM synthesizer chips.

Hmm.. if I can get an AY-3-8910 implementation up and running on the same FPGA as the graphics, that'd be great and would save me a lot of kerfuffle with the sound card that uses a genuine chip, which I'm having timing problems with at 8 MHz at the moment.

On that note, I'd like to include a keyboard handler as well - the thought crossed my mind that I might actually be able to include a USB host (or USB On-the-Go, at least), so that I could plug a USB keyboard in without having to depend on dwindling supplies of PS/2 keyboards.  A quick search earlier showed me that an IP is available for USB On-the-Go on the Spartan, but it looks like it could be costly to buy/licence.

...But the downside of using such a ready-made solution is that it can't emulate anything else. So if you have existing software that expects to talk to certain video hardware then it won't work here, but replacing this chip with an FPGA lets you customize its memory mapping and registers in a way that mimics some existing video chip from a retro computer, allowing the existing software to run on it.

You're right, using an FPGA is cheating - but I might take a look at the video chip you mentioned as an alternative.  My system is highly modular, so I could develop an FPGA-based video/sound/keyboard card AND one based on separate and more authentic discrete chips for those features and leave the choice of which they'd prefer to the person building the system.

A programmable palette (in combination with bit-plane video RAM) also made life somewhat easier for a programmer when your target machine (https://zx-pk.ru/content/136-Vektor-06C-sovetski-bog-tcveta-i-zvuka) didn't have hardware sprites:
[attach=1] [attach=2] [attach=3]

Спасибо - I wish my Russian was better and I could read the writing, but I think I get the idea.

*cough* MSX (https://www.chibiakumas.com/z80/msx.php) *cough* This standard explicitly prescribed that all access to video memory should be made only through the ports of the video controller.

Yes, a friend of mine has been recommending I look at the MSX architecture for tips - I'm starting to see why now.  So passing data via IO calls to the FPGA is a viable proposition then.  I can see this method being quite useful for things like clearing the screen, or drawing rectangles and other basic shapes, using only a handful of IO instructions.  I'm going to have to give some serious thought to what sort of instructions I'll want to implement in the video controller.
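One possible shape for such an IO-based command protocol, purely as a sketch (the opcodes and argument counts are invented): the Z80 OUTs a command byte followed by its arguments to a single port, and the controller executes each command once the last argument arrives. Modelled here from the controller's side:

```python
# Hypothetical command port: first byte selects the command, the rest are args.
CMD_ARG_COUNT = {0x01: 0,    # CLS - no arguments
                 0x02: 5}    # FILL_RECT x, y, w, h, colour

class CommandPort:
    def __init__(self):
        self.buf = []          # bytes of the command being assembled
        self.executed = []     # completed commands (stand-in for the blitter)

    def write(self, byte):
        """Handle one OUT (port), A from the Z80."""
        self.buf.append(byte)
        need = 1 + CMD_ARG_COUNT[self.buf[0]]
        if len(self.buf) == need:          # command complete -> execute it
            self.executed.append(tuple(self.buf))
            self.buf = []

port = CommandPort()
for b in (0x02, 10, 20, 50, 30, 7):        # FILL_RECT 10,20 size 50x30 colour 7
    port.write(b)
```

A real controller would push completed commands into a FIFO for the blitter rather than a Python list, but the framing logic is the same idea.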

A gold-standard outcome for me would be to create a video controller that enables my computer to produce graphics for half-decent games.  Initially I see things like Pong and Breakout being easy enough, but I'd like to simulate the kind of 80's games that were commonplace on the 8-bit systems of the day.  The 80's computers I'm familiar with all used interleaved-memory frame buffers, but I'm assuming these sorts of games would still be possible using the MSX method, as the MSX managed it?

Well, if you look at PC video chips from EGA/VGA/Early SVGA era (like GD542x family (http://www.s100computers.com/My System Pages/VGA_16_Board (Cirrus)/GD542x Technical Reference Manual.pdf) or ET4000 (http://bitsavers.informatik.uni-stuttgart.de/components/tsengLabs/Tseng_Labs_ET4000_Graphics_Controller_1990.pdf)), you will see that video memory was actually isolated from CPU bus. All that the processor could see was an intermediate buffer and translation logic, creating the illusion of direct access to video memory.

Yes, I figured having a buffer would be helpful as the FPGA may not be able to access the frame buffer immediately on receipt of a command without causing tearing.  I'm still not 100% convinced I fully understand how it'll work with the Z80 sending commands/data to the FPGA via an IO port, but I'm sure I'll get my head around it at some point.

CP/M was designed to use VT100 and VT220 terminals, hence the FPGA needs to implement a simple VDU, or the SoC needs to implement a simple serial link with a decent FIFO to use an external VT100.

Yes, I'm intending to emulate the VT100 terminal and handle ANSI-escape codes in the video controller.  Grant's Multicomp VHDL does a good job of that already.

Well, nope. The interleaving suggestion implied that the RAM would actually be accessed faster, so the CPU wouldn't see a difference. To pull that off, you would of course need fast enough RAM (which should be no problem with a modern SDRAM chip or even SRAM) and a faster MMU clock. Also, to keep things simple, you'd want the CPU and video clock frequencies to be integer multiples of each other.

You mean the FPGA would read the 'frame buffer' in the Z80's logical memory space into an internal buffer really quickly, then pass that out as a pixel stream?  Otherwise surely the frame buffer would be sending data at the rate the screen mode requires, which means 73% of the time it would be sending data and locking the Z80 out?  Not sure I understand this fully.

If you want to play with (crude) graphics hardware, have a look at some of the early arcade games I've done.
...Hack away.

Ah thanks for that james_s - will take a good look over the weekend. :)
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on October 18, 2019, 07:39:32 pm
I guess my motivations may be hard to understand for some
I understand it and respect it, even if I don't personally feel the attraction (anymore).

To paraphrase an old proverb, maybe Daoist, "The man who knows no stories is a fool. The man who knows many stories is wise. The man who knows one story is dangerous." The Old Ones have bequeathed many useful stories to us through their artifacts. I urge you to look at what other designers were doing in the 1980s when tasked to write a display engine. Even a close reading through the programmer's manual of some of these older systems will give you some flavor of their concerns and the clever tricks they used to get (then) high-performance video out of (today) low-performance hardware while holding to an accessible price point.

But you have to know your medium before you can create a useful artifact. Digital design is not a qualitative discipline. There is no substitute for looking at timing diagrams and characteristics tables, establishing cause and effect, looking up propagation delays, and doing the sums. For example,
Quote
no noticeable slow-down if I were to shut down memory access for ~73% of the time while the FPGA is accessing the frame buffer to draw the visible area of the screen
isn't a very useful question. The better question is, how much slowdown would there be if...? To reckon that, we have to look at the Timing section of the Z80 CPU User Manual, UM0080. For example, if you look at the instruction fetch cycle you'll see that the last 2T of the M1 cycle is spent "refreshing", where the Z80 doesn't care what's on the bus but still drives some of the address pins as a convenience to users of then-new DRAM, and that the Z80's proper business is done at the end of T2. We also know that our memory system can service instruction reads in 2T, and that, if we are using SRAM or a DRAM controller that handles its own refresh, the Z80's refresh cycle is wasted time. What if we simply disconnect those pins from the data bus at T3, hold the read value in a latch for the convenience of the Z80, and let other hardware access the bus instead? You just found a £10 note in the sofa cushions, depending on what kind of deal your company was able to score on DRAM. :)

Back to cases. Having written out some ins and outs, you might then look at I/O read and write machine cycles, and see that they are 4T in length. For I/O writes, we see that all address, control, and data of interest are available in T2. If all our devices are fast enough to have completed the write by the end of T2, we can just disconnect the Z80 from the bus and let the system bus free for other business during TW* (third beat) and T3 (fourth beat). For I/O reads, we see that all the address and control are once again available in T2, but the CPU doesn't read the data until the second half of T3. So, again assuming that our devices are fast enough, we will have our read data by the end of T2 and need only hold it for the CPU through TW and T3. We can then unhook the Z80 from the bus at the end of T2 and go about our other business. Cool, another £10 in the cushions!

Now we look at loads and stores. We see that memory write cycles are 3T in length, and that our snappy little jig has become instruction-dependent math rock. But [attachimg=1]! The Z80, like most other processors of the time, allows bus cycles to be stretched to accommodate slow hardware. We can insert one TW to lengthen the cycle to 4T and keep the rhythm, taking note of the penalty in order to answer our original question. Having done that, we look again at the memory write cycle and see that data and address are valid by the end of T1, but the !WR signal isn't valid until the end of T2. We can assume by the end of T1 that, if !MREQ is asserted and !RD is not, !WR will be asserted by the end of T2, and prepare accordingly. The rest of the write cycle follows that of I/O out cycles with the exception of the wait state we inserted, and we can likewise disconnect the CPU from the bus for TW and T3 without the Z80 any the wiser. £10! As for loads, we see that they too are 3T in length, so to keep the rhythm let's add the wait state and mark the penalty. Looking at our extended read cycle, we once again see that data isn't sampled until the end of T3 (fourth beat, as TW was inserted as the third beat). If our memories are fast enough, we can sample the data at the end of T2 just as we did for the I/O and hold it for the CPU, while we unhook the CPU from the bus and use the last 2T for our own business. £10!

Finally let's look at the interrupt request/acknowledge cycle, which is at minimum 5T long, and now we've gone to playing experimental jazz. But we will know what kind of cycle it is by the end of T1, and know it is an interrupt if we see !M1 asserted and neither !MREQ nor !IORQ asserted. In our system design we have a decision to make: do we want to pass the vector cycle through to the system bus, or handle it off the system bus? If you pass it through, you can treat it much the same as any other memory read cycle but generate the !WAIT signal and hold the received vector for 5T longer than otherwise, which we will mark into the penalty column. If you prefer to handle it off the system bus, you do the same with the !WAIT signal but disconnect the CPU from the bus and supply the vector by your choice of means. Your call. In either case, to keep the rhythm we have to stretch the interrupt cycle out to 8T. A 50p coin is better than nothing. ;D The NMI cycle is just a dummy instruction fetch with a runt pulse on !MREQ in T3 which we can ignore because the CPU would be off the bus anyway. £1!

The overall effect, then, is that we have introduced some ancillary logic to the Z80 so that it can vacate the system bus for 2T out of every 4T, at the cost of memory access by the CPU taking 14.3% longer than theoretically possible, and a modest hit to interrupt latency which in practice may not mean all that much. In doing so we have saved millions of RAM chips and tens of millions of pounds. Going a little bit out of brief, while the video system is not actively fetching frame buffer data, its 2T cycles could be borrowed to service other peripherals, for example, to buffer up sprite data, service disk controllers, or output PCM audio. Going very far afield, it may be evident that, instead of display DMA, we could place a second Z80 with much the same ancillary logic, sync them up, and have them both run at nearly full speed out of the same memory and I/O space, with the usual caveats about multiprocessing. If you were feeling especially naughty, you could pull the wool over the second CPU's data lines and feed it NOP instructions while exploiting its program counter as an address generator for the video output (beware, people have been knighted (https://en.wikipedia.org/wiki/ZX81) for doing this sort of thing).
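As a sanity check on the "14.3% longer" figure above, assuming an instruction mix of one 4T M1 opcode fetch (unchanged) per 3T memory operand read/write cycle (stretched to 4T by the inserted wait state):

```python
# Reconstructing the penalty: M1 fetch stays 4T; each 3T memory read/write
# gains one TW wait state to keep the 4T rhythm.
t_before = 4 + 3      # M1 fetch + unstretched memory cycle
t_after  = 4 + 4      # M1 fetch + memory cycle with one TW inserted

penalty = t_after / t_before - 1
print(f"memory access penalty: {penalty:.1%}")   # ~14.3%
```

Note this assumed 1:1 mix is itself a modelling choice; a different fetch-to-operand ratio shifts the figure, which is presumably why the interrupt and I/O penalties were tallied separately.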

Quote
Part of the reason for this post was to discuss the best way to interface the FPGA with the computer - frame buffer in computer RAM, or gated via the FPGA, for example.  Both have positives and negatives, I just don't have the knowledge or experience to understand their weight and balance the options properly, if that makes sense?
The word salad I wrote above is a walk through the sort of thinking the Old Ones utilized in many of the 1980s home computer designs, starting with the need to compete vigorously on both BOM cost and performance. The IBM PC, coming from its mainframe background and the culture of modularity, could not engage in this level of coupling. Either approach is certainly feasible. Which one is more desirable is a systems-level decision that depends, in part, on the desired display size and depth, and in turn the desired pixel rate, but also on cost, code size, flexibility, programmer convenience, and so on. "Better than the Amstrad" is an open-ended brief that could encompass anything from the tightly-coupled home computer systems as napkin-designed here, to an ISA bus bridge to a CGA/VGA/EGA/Hercules card rescued from the scrap pile. In any case, there will be side effects which you also have to pursue down the line and decide to live with for your application.

Oh, if you want to play USB host, the SL811HS is a classic choice, which incidentally can also be configured as a device should you wish to link your system to your PC. Be advised that USB HID can be a bit of a hairball.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on October 18, 2019, 08:09:10 pm
You are worried a bit too much about the FPGA implementation of the memory, framebuffer and access methods.

A lot of the retro ways of doing things in 80s computers are optimizations to get the thing working with as few chips, containing as few transistors, as possible, in order to make a cost-effective computer that a home user could afford. But modern FPGAs are rather large, so keeping down the number of gates is not so important, and you can now cheaply buy a few megabytes of SRAM with 10 ns access time (so a max clock speed of about 100 MHz). This means such optimization tricks to reduce transistor count and memory usage are not required to make the design fit into a reasonably sized FPGA.

So if you connect all your components to the pins of an FPGA you can implement your graphics card in any way you like. The block RAM inside FPGAs is naturally dual-port anyway, allows being written to at the same time as it's being read from, and can typically run at >100 MHz. This means you can implement access to the memory in any way you wish. In fact it would be perfectly possible to have 8 of those Z80s talking to the same RAM inside the FPGA simultaneously, doing read/write access to the same memory locations, all at the full 8 MHz speed on each Z80. With modern external SRAM having 10 ns access times, the same is possible even with external memory chips. But since you have only one Z80, this means that 90% of the available memory bandwidth is left unused, and so can be utilized by the video or graphics acceleration hardware built into the FPGA without slowing down the Z80 at all.
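A back-of-envelope check on that bandwidth claim (assuming the Z80 needs at least 3 T-states per memory access):

```python
# How much of a 10 ns SRAM's bandwidth can one 8 MHz Z80 actually use?
z80_clock = 8e6
min_tstates_per_access = 3                    # fastest Z80 memory cycle
sram_accesses_per_sec = 100e6                 # 10 ns SRAM

z80_accesses_per_sec = z80_clock / min_tstates_per_access   # ~2.67M/s
used_fraction = z80_accesses_per_sec / sram_accesses_per_sec
print(f"Z80 uses {used_fraction:.1%} of the SRAM bandwidth")  # ~2.7%
```

So even the "90% unused" figure is conservative; a single Z80 hammering memory flat-out touches under 3% of what a modern SRAM can deliver.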

So just pick whatever graphics architecture you would like to have and implement it in the FPGA. You can also implement multiple ones at the same time, like having a register/command-based interface and memory mapping simultaneously, or multiple modes like PC graphics cards have, where the various text or graphics modes provide different advantages (some provide lots of resolution and colors, some very little but take less CPU horsepower to use).

That being said, the modern way of handling graphics acceleration in PCs is in the form of "draw calls". The GPU memory is available for the CPU to read/write as it wishes, but GPU memory tends to be in physically separate memory chips for speed reasons, so the GPU just forwards the read/write operations through to its memory. The way the GPU is made to do work is that the CPU places a list of things to do into GPU memory. Each item on that list is a draw call: a structure describing what you want the GPU to do, ranging from "Set pixel 23,500 to color 255,255,255" to "Draw bitmap located at 0x554600 (512x512 RGBA 32bit) into framebuffer at coordinates 84,46 with alpha blending enabled" to "Apply the matrix transformation [0.45,1,1,0,5.4,5.5 .....] to the array of 16384 points in XYZ located at 0x7755000" to "Load the shader script bytecode at 0x5477400 into 128 shader units and run in parallel on the entire framebuffer" and so on. This list can be as long as you like and is typically built these days by an API like OpenGL or DirectX with the help of the graphics drivers. Once this "cooking recipe" structure is in the GPU's memory, the GPU is told to execute it and runs along on its own without the CPU's intervention; some milliseconds later the delicious resulting image will be sitting in another location of video memory. If the video output hardware is also pointing to that location in memory as its framebuffer, then the image will also be sent out via HDMI so that your eyes can enjoy the delicious image the GPU has baked for you.
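The "cooking recipe" idea can be modelled in miniature: the CPU builds a list of draw calls in shared memory, then the GPU walks the list independently. The opcodes and layout here are invented for illustration:

```python
# Toy draw-call list: each entry is (opcode, *args); the "GPU" executes them
# asynchronously while the CPU goes off and does something else.
framebuffer = [[0] * 8 for _ in range(8)]     # tiny 8x8, one int per pixel

draw_list = [
    ("fill_rect", 1, 1, 4, 3, 9),             # x, y, w, h, colour
    ("set_pixel", 0, 0, 5),                   # x, y, colour
]

def execute(calls, fb):
    """What the GPU does with the recipe, one call at a time."""
    for op, *args in calls:
        if op == "set_pixel":
            x, y, c = args
            fb[y][x] = c
        elif op == "fill_rect":
            x, y, w, h, c = args
            for yy in range(y, y + h):
                for xx in range(x, x + w):
                    fb[yy][xx] = c

execute(draw_list, framebuffer)
```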

This draw-call approach is far from the only solution, but it's one that fits well on modern computers, because the CPU can just throw over this few-kilobyte "cooking recipe" and go do something more important while the GPU independently works hard on baking the recipe without hogging any of the CPU's resources.

Obviously implementing all the draw calls of a modern graphics card is an insanely huge task, but to get some impressive graphics going you only need a few simple ones: things like filling a rectangle with a single color, or drawing one bitmap on top of another, perhaps with transparency effects and alpha blending. Later on you might want to add support for 2D transformation matrices, as that requires little hardware but lets you implement the kind of thing the SNES can do in Mode 7. All of this can result in graphics that surpass the 16-bit console era, because you have the unfair advantage of 10 times more memory bandwidth than they had. But for 8-bit games what you will find most useful is tilemap and sprite support, since these require very little data to be manipulated by the slow CPU, so they're fast even on wimpy old chips.
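The two simple primitives mentioned, a rectangle fill and a sprite blit with transparency, might look like this in behavioural pseudocode (Python, with invented conventions such as colour 0 being the transparent colour key):

```python
# Behavioural sketch of two blitter primitives over an indexed-colour buffer.
W, H = 16, 16
fb = [[0] * W for _ in range(H)]              # framebuffer, one index per pixel

def fill_rect(x, y, w, h, colour):
    """Fill a w x h rectangle at (x, y) with a single colour."""
    for yy in range(y, y + h):
        for xx in range(x, x + w):
            fb[yy][xx] = colour

def blit_sprite(sprite, x, y, key=0):
    """Copy sprite pixels to (x, y), skipping the transparent colour key."""
    for sy, row in enumerate(sprite):
        for sx, c in enumerate(row):
            if c != key:
                fb[y + sy][x + sx] = c

fill_rect(0, 0, W, H, 1)                      # clear background to colour 1
blit_sprite([[0, 2], [2, 2]], 4, 4)           # 2x2 sprite, top-left transparent
```

In hardware each of these is just an address generator and a conditional write, which is why tilemap/sprite engines were cheap even in the 80s.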

Oh, and you want to avoid things like USB. Yes, you can get a USB host IP module for an FPGA, but that will only implement the actual USB port and its transfer protocol (a glorified UART); it still needs the higher-level protocols that initialize USB devices, and drivers to talk to them. That needs a CPU to handle it, so you will likely end up with a softcore CPU inside your FPGA that's actually more powerful than the Z80 itself, sitting there just to run drivers for USB devices. If you want a mouse and keyboard, stick to simple PS/2.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 18, 2019, 09:56:46 pm
For example, if you look at the instruction fetch cycle you'll see that the last 2T of the M1 cycle is spent "refreshing"...
The overall effect, then, is that we have introduced some ancillary logic to the Z80 so that it can vacate the system bus for 2T out of every 4T, at the cost of memory access by the CPU taking 14.3% longer than theoretically possible, and a modest hit to interrupt latency which in practice may not mean all that much...

I suspect I'm going to be reading and analysing what you've written here for months to come.  :o  But from what I can gather, there's the opportunity to steal cycles from the Z80 and open the address and data buses up to peripherals (like the video controller) whilst the Z80 is pausing for breath in the middle of its memory/IO cycles?  That's some major optimisation!!  Did you design these systems back in the 80's?

(beware, people have been knighted (https://en.wikipedia.org/wiki/ZX81) for doing this sort of thing).

Haha - yes, I'm aware there are quite a few hacks in the old Sinclair ZX80 and ZX81 (and even in the later Spectrums, I believe) to get them to display video in between reading the keyboard, or something? I remember there being a reason why the screen blanks out when you press a key...

Either approach is certainly feasible. Which one is more desirable is a systems-level decision that depends, in part, on the desired display size and depth, and in turn the desired pixel rate, but also on cost, code size, flexibility, programmer convenience, and so on. "Better than the Amstrad" is an open-ended brief that could encompass anything from the tightly-coupled home computer systems as napkin-designed here, to an ISA bus bridge to a CGA/VGA/EGA/Hercules card rescued from the scrap pile. In any case, there will be side effects which you also have to pursue down the line and decide to live with for your application.

That's really what I wanted to know - is there some big reason NOT to go for one approach over another.  I'm thinking for the sake of simplicity, I'll go with the Z80 only having a simple IO connection to the FPGA.  At least in the first instance - I can always change to an alternative method of interfacing later if the need arises, I guess.

Oh, if you want to play USB host, the SL811HS is a classic choice, which incidentally can also be configured as a device should you wish to link your system to your PC. Be advised that USB HID can be a bit of a hairball.

Well, I'm aware there's a significant software overhead due to the vast range of HID devices - I would literally just be looking for basic keyboard reading - but perhaps I'll have to stick to PS2 then.

You are worried a bit too much about the FPGA implementation of the memory, framebuffer and access methods.

I guess this stems from my lack of experience with FPGAs.  I'm learning quite quickly from this forum that FPGAs are actually blisteringly fast and flexible beyond belief - at least to my inexperienced mind.  :scared:

So if you connect all your components to the pins of an FPGA you can implement your graphics card in any way you like. The block RAM inside FPGAs is naturally dual-port anyway, allows being written to at the same time as it's being read from, and can typically run at >100 MHz. This means you can implement access to the memory in any way you wish. In fact it would be perfectly possible to have 8 of those Z80s talking to the same RAM inside the FPGA simultaneously, doing read/write access to the same memory locations, all at the full 8 MHz speed on each Z80. With modern external SRAM having 10 ns access times, the same is possible even with external memory chips. But since you have only one Z80, this means that 90% of the available memory bandwidth is left unused, and so can be utilized by the video or graphics acceleration hardware built into the FPGA without slowing down the Z80 at all.

And that's exactly the kind of answer I needed - perhaps I have been overly concerned about the speed of the design, but that's born out of a lack of knowledge.  I did admit that I'm no expert in this field and have been learning as I've gone along.  ;D  So basically, I should just crack on with my preferred design and see how it goes.  These FPGAs are so fast and flexible that most of the magic will be done in the VHDL?


This draw-call approach is by far not the only solution to doing this, but it's a solution that fits well on modern computers because the CPU can just throw this few-kilobyte "cooking recipe" over and go do something more important while the GPU independently works hard on baking the recipe without hogging any of the CPU's resources.

This was one of the possible options floating around in my head for the IO-based interface.  The Z80 could just send a load of commands and data to the FPGA, which would buffer them in a FIFO (if needed - the FPGA will likely not need to buffer much as it's so much quicker, I'm guessing) and execute those commands against the frame buffer whilst it's being read and streamed to the output by the video signal generation part.  That's the other thing I need to remember about FPGAs - they're not one computation block; all their components run in parallel with each other.
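As a sketch of what that command FIFO might look like (a C model for illustration only - the opcode values, the depth, and the function names here are all made up; the real thing would be a few lines of HDL):

```c
#include <stdint.h>

/* Hypothetical model of the command FIFO between the Z80's I/O port and the
 * drawing engine. Opcode values and FIFO depth are invented for illustration. */
enum { CMD_PLOT = 0x01, CMD_FILL_RECT = 0x02 };

#define FIFO_DEPTH 256               /* plenty: the FPGA drains far faster    */

typedef struct {
    uint8_t buf[FIFO_DEPTH];
    unsigned head, tail;             /* head = write index, tail = read index */
} fifo_t;

int fifo_push(fifo_t *f, uint8_t byte)   /* CPU side: one OUT per byte        */
{
    unsigned next = (f->head + 1) % FIFO_DEPTH;
    if (next == f->tail) return 0;       /* full: CPU must wait               */
    f->buf[f->head] = byte;
    f->head = next;
    return 1;
}

int fifo_pop(fifo_t *f, uint8_t *byte)   /* drawing engine side               */
{
    if (f->tail == f->head) return 0;    /* empty: engine idles               */
    *byte = f->buf[f->tail];
    f->tail = (f->tail + 1) % FIFO_DEPTH;
    return 1;
}
```

On the real hardware both ends run concurrently, which is exactly the "everything in parallel" point: the Z80 only ever blocks if the FIFO fills, which a fast drawing engine makes rare.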

Obviously implementing all the draw calls of a modern graphics card is an insanely huge task, but in order to get some impressive graphics going you only need a few simple ones. Things like filling a rectangle with a single colour or drawing one bitmap on top of another, perhaps also with transparency effects and alpha blending. Later on you might want to add support for 2D transformation matrices, as that requires little hardware but lets you implement the stuff that, for example, the SNES can do in Mode 7. All of this can result in graphics that surpass the 16-bit console era because you have the unfair advantage of ten times more memory bandwidth than they had. But for 8-bit games what you will find most useful is tilemap and sprite support, since this requires very little data to be manipulated by the slow CPU, so it's fast even on wimpy old chips.
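A "fill rectangle" draw call really is just two nested loops over the frame buffer plus clipping; a C model (the 160x120 chunky 8-bit pixel format is an assumption taken from the resolutions discussed earlier) might look like:

```c
#include <stdint.h>

#define FB_WIDTH  160                /* assumed low-res 256-colour mode */
#define FB_HEIGHT 120

static uint8_t framebuffer[FB_WIDTH * FB_HEIGHT];  /* one byte per pixel */

/* Fill a rectangle with a single palette index, clipped to the buffer.
 * In the FPGA this becomes a pair of counters feeding the RAM write port. */
void fill_rect(int x, int y, int w, int h, uint8_t colour)
{
    for (int row = y; row < y + h; row++) {
        if (row < 0 || row >= FB_HEIGHT) continue;      /* vertical clip   */
        for (int col = x; col < x + w; col++) {
            if (col < 0 || col >= FB_WIDTH) continue;   /* horizontal clip */
            framebuffer[row * FB_WIDTH + col] = colour;
        }
    }
}
```

Bitmap-on-bitmap blits and transparency are the same loop shape with a source read and a compare added, which is why a handful of these primitives goes such a long way.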

Absolutely - I think this will be the biggest area for development then.  The actual architecture of the graphics card sounds like it will be really simple, aside from level converters for the 5v to 3v3 and some basic IO address decoding, there will be little else other than the FPGA and HDMI output.  In fact, I suppose there's no reason why I can't integrate the IO address decoding into the FPGA as well?  This is getting easier by the minute!  ;D

Oh, and you want to avoid things like USB. Yes, you can get a USB host IP module for an FPGA, but that will only implement the actual USB port and its transfer protocol (glorified UART); it still needs the higher-level protocols that initialise USB devices, and drivers to talk to them. This will need a CPU to handle it, so you will likely end up with a softcore CPU inside your FPGA that's actually more powerful than the Z80 itself, sitting there to run drivers for USB devices. If you want mouse and keyboard, stick to the simple PS/2.

 :-+
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: asmi on October 19, 2019, 12:20:09 am
If you want to learn how to design a computer system from the ground up, start designing your own RISC-V RV32I core. It's got only 37 instructions (if I can count lol), and there is a full-blown gcc compiler for it, so you can use real-deal C/C++! Once you have a core, you can start designing peripherals and figuring out how to connect them to the CPU bus, then at some point you will realize you need a DMA controller to send data around without CPU involvement - this will force you to implement a multi-master bus so that peripherals can do bus mastering as well as the CPU core, etc. There is almost no limit to how far you can go with this - implementing a multi-core system, adding support for external expansion buses (like PCI, or PCI Express), and so on!

For video I really recommend using HDMI because it's so ridiculously easy to implement in its most basic form (RGB-24bit output, video-only), and lower resolutions aren't very taxing on FPGA performance-wise or layout.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: hamster_nz on October 19, 2019, 12:31:47 am
Just wanting to be clear... Most people are talking about implementing DVI-D, not HDMI, even though they use the same physical architecture and low-level coding scheme. True HDMI is a step up, and involves adding data islands, audio, BCH codes and so on...

Best reference documents for DVI-D are the Digital Display Working Group's specifications.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on October 19, 2019, 03:46:44 am
I suspect I'm going to be reading and analysing what you've written here for months to come.  :o  But from what I can gather, there's the opportunity to steal cycles from the Z80 and open the address and data buses up to peripherals (like the video controller) whilst the Z80 is pausing for breath in the middle of its memory/IO cycles?  That's some major optimisation!!
You do have the gist of it correct. You don't need to pay too much mind to the ramblings of an animated professor in a lecture. Try a drier summary: a system bus, designed for 2T bus cycles, can be time-multiplexed into 4T frames of two fixed time slots, each 2T long (one bus cycle). Each slot is dedicated to an independent master, which may perform up to one bus cycle every frame. You can place some glue logic between the Z80 bus (or any other master of your choice) and the system bus to connect/disconnect the master's address, data, and control signals to/from the bus at the beginning/end of its time slot, to smooth over timing differences between the 2T system bus and the Z80's 4T cycles, to hold data received from the system bus until the master is ready for it, and to keep the Z80's machine cycles synchronized with its time slot. The 1980s-era home computer, in 127 words.

If you're a visual learner, print out and clip out the timing diagrams from the Timing section of the Z80 data sheet and lay them on a table, with each start of T1 aligned vertically. Use another sheet of paper to cover from the end of T2 onward and observe that you have enough information to start a cycle. Have scissors handy to cut the memory cycles at the beginning of T3 and introduce one TW of space between. Print/cut out another set, maybe in a different color, and lay them out offset by 2T from the first set. Shuffle them around and examine the interplay.
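The slot arbitration itself is almost nothing in logic terms; a toy C model of the 4T frame with two fixed 2T slots described above (master numbering is arbitrary):

```c
/* Toy model of a 4T frame holding two fixed 2T slots.
 * Returns which master owns the bus on a given T-state:
 * 0 = CPU slot, 1 = video slot. In hardware this is one counter bit. */
int bus_owner(unsigned t_state)
{
    unsigned phase = t_state % 4;    /* position within the 4T frame        */
    return (phase < 2) ? 0 : 1;      /* first 2T: CPU; second 2T: video     */
}
```

Each master sees an uninterrupted 2T bus cycle once per frame, which is the whole trick: neither ever waits on the other.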

Quote
Did you design these systems back in the 80's?
No, I was just a boy at that time, but I gave one of the Amiga designers a ride to San Francisco once. :)  I did have a few of those 8-bit machines and a couple of 16-bit machines, and fell in passionate, enduring love with PEEK and POKE. I followed the demo scene for a bit later on, even tried to write a few screens that turned out nothing to write home about.

Quote
That's really what I wanted to know - is there some big reason NOT to go for one approach over another.  I'm thinking for the sake of simplicity, I'll go with the Z80 only having a simple IO connection to the FPGA.  At least in the first instance - I can always change to an alternative method of interfacing later if the need arises, I guess.
You would need to design the multiplexed bus into the system from the beginning. There's always the next machine, right? >:D  If you're using private, single-ported frame buffer RAM, the video system doesn't master the system bus, so time-multiplexing the system bus would be a needless complication, especially if there is already a "normal" DMA controller in the system. Time-multiplexing may still be useful on the private frame buffer side, to arbitrate between the pixel serializer, the graphics coprocessor, and the system bus interface. As powerful as FPGAs are, they're not quite quantum computers and there will still be single resources that must be shared by multiple clients. DRAM chips, even with their multifarious burst modes, still take a moment to close a row and open a new one.

I will call your attention to the timing diagrams for memory vs. I/O accesses, and point out that I/O accesses are 1T longer than memory accesses, by design. That may add up when moving a lot of data from system memory to the frame buffer without DMA. I'll also point out that there are 16-bit load/store instructions which save one or two instruction fetches for every two bytes, and may be more convenient (and faster) for programming, but there are no 16-bit I/O instructions as far as I can tell.
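To put a number on "may add up": even the fastest memory-to-memory path, a repeating LDIR at its documented 21 T-states per byte, is painfully slow for full-frame moves (the 8 MHz clock here is an assumption matching the system discussed):

```c
/* Time to move a block with LDIR, which the Z80 manual documents at
 * 21 T-states per repeating iteration. Clock speed is an assumption. */
double ldir_copy_ms(long bytes)
{
    const double clock_hz   = 8e6;    /* assumed 8 MHz Z80              */
    const double t_per_byte = 21.0;   /* LDIR while BC != 0             */
    return bytes * t_per_byte / clock_hz * 1e3;
}
```

A full 160x120 8bpp buffer (19,200 bytes) comes out around 50 ms - more than two whole 50 Hz frames before any I/O-cycle penalty is even considered - which is exactly why pushing fills and sprite copies into the FPGA, or adding DMA, pays off.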

Since you do have the full address bus at the FPGA's disposal, and you prefer not to use a memory-mapped window into frame buffer RAM, I offer the idea of mapping individual registers into a 4kB or so block of address space rather than using a single command/data register pair.

Quote
Well, I'm aware there's a significant software overhead due to the vast range of HID devices - I would literally just be looking for basic keyboard reading - but perhaps I'll have to stick to PS/2 then.
Fair enough. You can add on USB later if one day you happen to wake up really ambitious and feel like porting code from the MSX USBorne project :)  For the SL811HS you need only two addresses in I/O space, eight data lines, and the usual bus handshaking signals, which could fit very easily on an expansion board.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: asmi on October 19, 2019, 04:49:44 am
Just wanting to be clear... Most people are talking about implementing DVI-D, not HDMI, even though they use the same physical architecture and low-level coding scheme. True HDMI is a step up, and involves adding data islands, audio, BCH codes and so on...
Isn't all of that stuff optional? My memory on HDMI spec is rather rusty atm.
But whatever - the point is - it works, and fundamentally it's just a couple of counters and a few comparators to figure out the blanking areas - which incidentally you will need for VGA too; the only difference is the output SERDES. But on the other hand you get the full 24-bit colour space to work with without a need for any sort of DACs (of which you'll need three for VGA - again, if my memory serves), and you need just 8 pins plus a few auxiliary ones like HPD and the I2C channel, instead of a ton of pins to external parallel DACs. And you get a clear upgrade path if you ever decide you want it - this one is the most important for me (even more so as I'm working on a DisplayPort module so that I can go beyond FullHD).
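Those "couple of counters and few comparators" really are the whole of the timing logic. A C model of them, using the standard 640x480@60 figures (25.175 MHz pixel clock; sync polarity handling is left to the output stage):

```c
/* Standard 640x480@60 timing:
 * horizontal: 640 visible + 16 front porch + 96 sync + 48 back porch = 800
 * vertical:   480 visible + 10 front porch +  2 sync + 33 back porch = 525 */
enum { H_VISIBLE = 640, H_FP = 16, H_SYNC = 96, H_BP = 48, H_TOTAL = 800 };
enum { V_VISIBLE = 480, V_FP = 10, V_SYNC = 2,  V_BP = 33, V_TOTAL = 525 };

typedef struct { int hsync, vsync, active; } sync_t;

/* One comparator set per counter - this is all the "video timing" there is.
 * hcount wraps at H_TOTAL and increments vcount; the rest (TMDS serialising
 * for DVI, or DACs for VGA) is purely output stage. */
sync_t video_timing(int hcount, int vcount)
{
    sync_t s;
    s.active = (hcount < H_VISIBLE) && (vcount < V_VISIBLE);
    s.hsync  = (hcount >= H_VISIBLE + H_FP) &&
               (hcount <  H_VISIBLE + H_FP + H_SYNC);
    s.vsync  = (vcount >= V_VISIBLE + V_FP) &&
               (vcount <  V_VISIBLE + V_FP + V_SYNC);
    return s;
}
```

In HDL this is two free-running counters and these comparisons, clocked at the pixel rate; `active` gates the pixel fetch from the frame buffer.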
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: legacy on October 19, 2019, 07:19:37 am
I'll go with the Z80 only having a simple IO connection to the FPGA

With a physical Z80 cpu chip? at 5V? having the FPGA's IO at 3.3V?
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: hamster_nz on October 19, 2019, 07:43:22 am
Just wanting to be clear... Most people are talking about implementing DVI-D, not HDMI, even though they use the same physical architecture and low-level coding scheme. True HDMI is a step up, and involves adding data islands, audio, BCH codes and so on...
Isn't all of that stuff optional? My memory on HDMI spec is rather rusty atm.
But whatever - the point is - it works, and fundamentally it's just a couple of counters and a few comparators to figure out the blanking areas - which incidentally you will need for VGA too; the only difference is the output SERDES. But on the other hand you get the full 24-bit colour space to work with without a need for any sort of DACs (of which you'll need three for VGA - again, if my memory serves), and you need just 8 pins plus a few auxiliary ones like HPD and the I2C channel, instead of a ton of pins to external parallel DACs. And you get a clear upgrade path if you ever decide you want it - this one is the most important for me (even more so as I'm working on a DisplayPort module so that I can go beyond FullHD).

If it doesn't have video guard bands, TERC4 data islands and the data island that defines the display format it isn't HDMI, it's just DVI...

Let me know if you need a hand or advice with DisplayPort. A few years ago I knew the older spec backwards, and got 4k streams going on a few different boards... These might help:

https://github.com/hamsternz/FPGA_DisplayPort

https://github.com/hamsternz/DisplayPort_Verilog
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 19, 2019, 09:13:10 am
If you want to learn how to design a computer system from the ground up, start designing your own RISC-V RV32I core...

One thing at a time, asmi.  :o ;)

For video I really recommend using HDMI because it's so ridiculously easy to implement in its most basic form (RGB-24bit output, video-only), and lower resolutions aren't very taxing on FPGA performance-wise or layout.

Yes, I'm sold on HDMI and will be building with that end goal in mind.  Plus the Spartan 6 - my FPGA of choice, it seems - appears to have hardware support for HDMI in its IO blocks.

The 1980s-era home computer, in 127 words.

You make it sound so simple...  :o  ;D

No, I was just a boy at that time, but I gave one of the Amiga designers a ride to San Francisco once. :)  I did have a few of those 8-bit machines and a couple of 16-bit machines, and fell in passionate, enduring love with PEEK and POKE. I followed the demo scene for a bit later on, even tried to write a few screens that turned out nothing to write home about.

Oh wow - I loved my Amiga(s).  I had an A500+ and then upgraded to the A1200 with a hard drive when I went to university.  They were going to rule the world - then the PC happened.  ::)  PEEK and POKE were a couple of the first commands I created for my system's monitor program.  They were like dark magic back when I was a kid in the 80's.

I will call your attention to the timing diagrams for memory vs. I/O accesses, and point out that I/O accesses are 1T longer than memory accesses, by design. That may add up when moving a lot of data from system memory to the frame buffer without DMA. I'll also point out that there are 16-bit load/store instructions which save one or two instruction fetches for every two bytes, and may be more convenient (and faster) for programming, but there are no 16-bit I/O instructions as far as I can tell.

Well, I'll have to wait to get a prototype up and running to see the performance of the IO interface and decide if I need to look at something different.  I'm not a professional games programmer, so I'm not talking about blockbuster graphics, parallax scrolling and FMV when I talk about what I want from game graphics, but I guess it still waits to be seen what the interface will be capable of.  I'm aware that memory access is faster than IO access because of the extra commands and the additional WAIT state that the Z80 inserts into IO cycles, I'm just not sure how that will play out practically - I'm hoping the difference will not be noticeable.

Since you do have the full address bus at the FPGA's disposal, and you prefer not to use a memory-mapped window into frame buffer RAM, I offer the idea of mapping individual registers into a 4kB or so block of address space rather than using a single command/data register pair.

Let me make sure I understand this - any writes to the appropriate memory space would be picked up by the FPGA and read into internal memory which effectively shadows the system's memory 'window'?  Actually, that would be really useful to transport large amounts of data from the system into the GPU.  Rather than using a sequence of, say, (a minimum of) 64 IO writes to create a sprite in the GPU's memory, the whole thing could be copied into the RAM 'window' using the faster memory commands...

I'll go with the Z80 only having a simple IO connection to the FPGA

With a physical Z80 cpu chip? at 5V? having the FPGA's IO at 3.3V?

Yes, a physical Z80 at 5v and the FPGA at 3.3v.  I did mention earlier that I'd be level-shifting the voltages between the system and GPU, probably using 74LVC buffers or transparent latches or whatever, so don't panic.  ;D
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: legacy on October 19, 2019, 09:50:40 am
Yes, a physical Z80 at 5v and the FPGA at 3.3v.  I did mention earlier that I'd be level-shifting the voltages between the system and GPU, probably using 74LVC buffers or transparent latches or whatever, so don't panic.  ;D

No panic - just, I did this stuff two years ago and it consumed a lot of precious time, so, reconsidering it, I have some regrets about my past choices. Anyway, I am not here to motivate or demotivate people, so do as you wish. Just, if I were you, I would think twice about adding extra chips to the PCB.

There are 3.3V Z80 compatible cores.

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: legacy on October 19, 2019, 10:25:40 am
(https://jeelabs.org/img/2017/DSC_5808b.jpg)
Like this (https://jeelabs.org/article/1714b)  :D
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: james_s on October 19, 2019, 05:22:18 pm
There are some CPUs, such as the 6800, for which there are no cycle-accurate softcores (at least not open source). The Z80, however, already has more than one very well tested softcore that works just like the real thing - no need to have a real physical Z80 in the mix if you already have the FPGA. The softcore can do everything the original can, and much more if you wish.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 19, 2019, 07:49:11 pm
There are some CPUs, such as the 6800, for which there are no cycle-accurate softcores (at least not open source). The Z80, however, already has more than one very well tested softcore that works just like the real thing - no need to have a real physical Z80 in the mix if you already have the FPGA. The softcore can do everything the original can, and much more if you wish.

Well, I'm building this GPU for a hardware Z80 system.  Whilst I appreciate that the FPGA could do everything my hardware system does, that's not the point of this little project.  Perhaps an FPGA is total overkill, though.

I've been looking a little more closely at the Spartan LX45 and it's looking less and less likely that I'll be able to use it, even if I could justify the cost.  I think BGA is a step too far for my soldering skills and equipment at this stage, and the sheer number of pins on those FPGAs will stretch my DipTrace licence past breaking point.  One way around it is to just use one of the cheap development boards and plug that straight into my 'GPU card'.  Limited IO, but with the FPGA, SDRAM, clock and programming circuitry done for me...
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: ledtester on October 19, 2019, 07:49:50 pm
You might be interested in the video chip being developed for the 8-bit guy's "Dream Machine":

(starts at 9:50)

https://youtu.be/sg-6Cjzzg8s?t=9m50s
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on October 19, 2019, 08:03:42 pm
Oh wow - I loved my Amiga(s).  I had an A500+ and then upgraded to the A1200 with a hard drive when I went to university.  They were going to rule the world - then the PC happened.  ::)  PEEK and POKE were a couple of the first commands I created for my system's monitor program.  They were like dark magic back when I was a kid in the 80's.
Excellent. That gives me a touchstone to explain some things by analogy. By the way, you didn't have anything important to do this weekend, did you? https://archive.org/details/Amiga_Hardware_Reference_Manual_1985_Commodore   >:D  :-DD

Just for interest, you may be aware there are HDL implementations of the Amiga OCS that might be imported directly into your design with minimal modifications. A Z80 driving the OCS chip set could make for a pretty wild experiment, even at half the memory bandwidth.

Quote
Well, I'll have to wait to get a prototype up and running to see the performance of the IO interface and decide if I need to look at something different. I'm not a professional games programmer, so I'm not talking about blockbuster graphics, parallax scrolling and FMV when I talk about what I want from game graphics, but I guess it still waits to be seen what the interface will be capable of.  I'm aware that memory access is faster than IO access because of the extra commands and the additional WAIT state that the Z80 inserts into IO cycles, I'm just not sure how that will play out practically - I'm hoping the difference will not be noticeable.
Digital design is still not a qualitative discipline ;) , but I'll bite anyway. 75% of theoretical bandwidth won't be much noticeable for relatively low-intensity usage like ping-pong or the like. But also see below.

Quote
Let me make sure I understand this - any writes to the appropriate memory space would be picked up by the FPGA and read into internal memory which effectively shadows the system's memory 'window'?  Actually, that would be really useful to transport large amounts of data from the system into the GPU.  Rather than using a sequence of, say, (a minimum of) 64 IO writes to create a sprite in the GPU's memory, the whole thing could be copied into the RAM 'window' using the faster memory commands...
That would be the memory-mapped window into frame buffer RAM I mistakenly believed you disfavored, but yes, I do think it's a very good idea. I'd also be sure it services reads as well. In fact, once it is servicing both reads and writes, you could make the window very large, like a megabyte give or take, and perhaps scrap the bank switching entirely. Then you have something very much like Amiga chip RAM, in that whatever memory isn't being used by the display can be used by the Z80 for general purposes and/or graphics...

But I was actually proposing that you memory- (or I/O-) map the control registers, using the system bus control/address/data signals to more or less directly read/write registers inside the FPGA and control the video hardware, analogous to the common idiom of using 74377 or similar ICs with suitable decoding as byte-wide input-output ports. C pseudocode:
Code: [Select]
#include <stdint.h>

/* Register layout is illustrative - the real map lives in the FPGA design. */
struct my_video_chip {
    struct { uint8_t red, green, blue; } palette[256];
    uint8_t displayenable;
};

volatile struct my_video_chip *const video =
    (volatile struct my_video_chip *)0xDFF000; /* ;) */

void setpalettecolor_reg(uint8_t index, uint8_t red, uint8_t green, uint8_t blue)
{
  video->palette[index].red = red;
  video->palette[index].green = green;
  video->palette[index].blue = blue;
}

void enabledisplay_reg(void)
{
  video->displayenable |= 1; /* SET 0, (HL) */
}

Quote
DipTrace
You could always switch horses to KiCAD. #justsayin
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: james_s on October 19, 2019, 08:15:30 pm
Well, I'm building this GPU for a hardware Z80 system.  Whilst I appreciate that the FPGA could do everything my hardware system does, that's not the point of this little project.  Perhaps an FPGA is total overkill, though.

I've been looking a little more closely at the Spartan LX45 and it's looking less and less likely that I'll be able to use it, even if I could justify the cost.  I think BGA is a step too far for my soldering skills and equipment at this stage, and the sheer number of pins on those FPGAs will stretch my DipTrace licence past breaking point.  One way around it is to just use one of the cheap development boards and plug that straight into my 'GPU card'.  Limited IO, but with the FPGA, SDRAM, clock and programming circuitry done for me...

The LX45 is a very nice, very large (by hobby standards) FPGA; I would say that for what you are describing it is massively overkill. Just to put things in perspective, an entire 8-bit computer including the CPU (Grant's Multicomp, for example), or any of the bronze-age arcade games I've recreated, fits comfortably within the ancient and tiny (by current standards) EP2C5T144C8 FPGAs, which you can get ready to go on a little dev board for around $12. There is lots of middle ground too; if you want Xilinx, the LX9 is an inexpensive and very capable part you can get in a reasonably hobbyist-friendly TQFP package. You can also interface a development board directly to your existing Z80 project and use that for prototyping; then, once you have a design implemented that you are satisfied with, you can look at the consumed resources and select a less expensive FPGA sufficient for your design and build more tidy custom hardware. One of the really awesome things about FPGAs is that it's very easy to make large portions of the code very portable - a lot of my learning was accomplished by porting projects I found from one platform to another.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: rstofer on October 19, 2019, 09:10:21 pm
It might be worthwhile to compare the capabilities of the LX45 and any of the Artix 7 devices, perhaps even something as small as the 50T.
https://www.xilinx.com/support/documentation/selection-guides/7-series-product-selection-guide.pdf (https://www.xilinx.com/support/documentation/selection-guides/7-series-product-selection-guide.pdf)
https://www.xilinx.com/support/documentation/data_sheets/ds160.pdf (https://www.xilinx.com/support/documentation/data_sheets/ds160.pdf)

I think even the Artix 7 50T has more resources than the LX45, and certainly the 100T, which is becoming quite common, completely dwarfs the LX45.  The 35T is somewhat smaller than the LX45, if that matters.

None of that resources stuff matters much when compared to the fact that the old Spartan 6 devices are not supported by Vivado, and ISE 14.7 is the last release of ISE and is no longer supported.  Yes, I still use it for my Spartan 3 projects but, for new stuff, I'm using the Artix 7 chips and Vivado.

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: rstofer on October 19, 2019, 09:36:27 pm
Just for a checkpoint, I have the Pacman game running on a Spartan3E 1200 chip
https://www.xilinx.com/support/documentation/data_sheets/ds312.pdf (https://www.xilinx.com/support/documentation/data_sheets/ds312.pdf)
This is much smaller than the LX45
https://www.xilinx.com/support/documentation/data_sheets/ds160.pdf (https://www.xilinx.com/support/documentation/data_sheets/ds160.pdf)

Pacman rolls around like a BB in a bowling alley on that Spartan 3E in terms of logic, see attached PDF

There is a Z80 core, the graphics display and all of the IO pins, and it fits easily in the 3E, so it will darn sure fit in the LX45.
Note, however, that over 2/3 of the BlockRAM is used.  This is typical of most cores; we want a lot of RAM.  There's a lot going on with the graphics and PROMs.

This is the board I chose to use.  It's pricey for what it was but at the time it was high end.
https://store.digilentinc.com/nexys-2-spartan-3e-fpga-trainer-board-retired-see-nexys-4-ddr/ (https://store.digilentinc.com/nexys-2-spartan-3e-fpga-trainer-board-retired-see-nexys-4-ddr/)
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 19, 2019, 10:12:24 pm
Excellent. That gives me a touchstone to explain some things by analogy. By the way, you didn't have anything important to do this weekend, did you? https://archive.org/details/Amiga_Hardware_Reference_Manual_1985_Commodore   >:D  :-DD

Not now...  :popcorn:

Just for interest, you may be aware there are HDL implementations of the Amiga OCS that might be imported directly into your design, with mimimal modifications. A Z80 driving the OCS chip set could make for a pretty wild experiment, even at 1/2 the memory bandwidth.

Ooh, that sounds like the kind of Frankenstein experiment that just might bear some interesting and very warped fruit...  I wouldn't profess to have a tenth the skills required to do something like that, though.  Whilst I cut my teeth programming on the Amiga, it was with Blitz Basic more than anything else and I never got anywhere near the metal, as it were.

That would be the memory-mapped window into frame buffer RAM I mistakenly believed you disfavored, but yes, I do think it's a very good idea...

Well, I disfavoured the idea of a full frame buffer in the Z80's memory space, specifically.  It would have been prohibitively small and/or meant I would have to modify my (basic) MMU and system bus to accommodate multiple masters on the address and data buses.  That was a little further down the rabbit hole than I care to venture at the moment.  However, a simple shadowing of a small amount of memory (e.g. 4 KB as you suggested) into the FPGA would be easy enough to implement and require no modifications to the system buses at all.

I'd also be sure it services reads as well.

That would require some modification of the host system, but it wouldn't be impossible, I guess.  The easiest way for me to do that would be to keep one of the memory sockets empty, which would provide a 512 KB window that the FPGA could intercept the reads and writes to.  Half a meg would be more than big enough for a frame buffer for the resolutions I'm wanting to use, and would allow double-buffering on all the major screen modes if I was able to substitute that window for 512 KB of RAM on the FPGA via SDRAM, for example... hmmm... Tempting...

Using the SDRAM on the FPGA seems to me and my inexperienced mind to be a lot more complicated than using dual-port RAM in the FPGA itself.  Can anyone convince me otherwise?  If I can make use of the SDRAM easily, without any major drawbacks or slowdowns, then I needn't worry about using a big, expensive FPGA with lots of internal RAM...

But I was actually proposing that you memory- (or I/O-) map the control registers, using the system bus control/address/data signals to more or less directly read/write registers inside the FPGA and control the video hardware, analogous to the common idiom of using 74377 or similar ICs with suitable decoding as byte-wide input-output ports.

I thought that was what I was intending to do previously?  My plan was to use IO port calls to address registers in the FPGA?

Quote
DipTrace
You could always switch horses to KiCAD. #justsayin

Oh I've had this discussion with others before.  When I first dipped my toes in the waters of electronics a couple of years ago, to start looking at documenting the tweaks and changes I was making to Grant's simple breadboard computer designs, I tried using KiCAD.  The experience left me comparing it to older versions of Blender.  If you don't know your 3D graphics software, Blender is an open-source 3D software program that can produce professional-quality 3D models, renders, even films and games.  The power it has under the hood is amazing.  But the UI cripples it and puts up an almost vertical learning curve before you even try to do anything.

KiCAD is like that for me - I just cannot get on with it, can't commit the time to learn it.  I found myself spending more time looking up how to do simple things, and creating patterns for components that weren't in the libraries, than actually designing, so I searched around and found DipTrace.  Yes, I'm limited to 500 pins per schematic in DipTrace (I'm on a free 'hobby' licence), but it's so damn easy to use and makes wonderful PCB designs with none of the complexity of KiCAD that I've put up with the pin limit - and it's only become a problem now, when I'm looking at FPGAs that potentially use up all that pin count with one part.  I've actually started using EasyEDA now for this GPU design and whilst setting up the schematic is okay, I'm dreading designing the PCB...

The LX45 is a very nice, very large (by hobby standards) FPGA, I would say that for what you are describing it is massively overkill. Just to put things in perspective, an entire 8 bit computer including the CPU (Grant's Multicomp for example) or any of the bronze age arcade games I've recreated fit comfortably within the ancient and tiny (by current standards) EP2C5T144C8 FPGAs which you can get ready to go on a little dev board for around $12.

Well, the reason I'm looking at something as overkill as the LX45 is primarily for the RAM size.  I need something large enough to hold a frame buffer internally, so that it can be dual-ported and spit out a pixel stream whilst it's being written to (allowing for timed writes so as not to cause screen tearing, obviously).  Now, if someone can tell me that the FPGA could easily use an attached SDRAM chip for the frame buffer and be able to form a coherent pixel stream without slowing writes down so much that performance would suffer over the internal 'dual-port' design, then I'd be all for getting a smaller, older, cheaper FPGA to do the job.  For one, it'd be hand-solderable and easier to integrate into a custom card for my system; for two, it'd probably fit within my 500-pin design limit so I could keep the design within software I'm happy working with.  I can't over-emphasise how important a design 'win' an SDRAM frame buffer would be.

As it is, anything in a BGA form-factor will either require me to spend a fortune getting it assembled at point of manufacture and limit my choices to chips the PCB manufacturer offers, or restrict me to making an FPGA dev board socketable into my custom PCB for my system, which will cause headaches as it probably won't fit within my stacking PCB form-factor without modification (removal of power sockets from the dev board etc).

There is lots of middle ground too; if you want Xilinx, the LX9 is an inexpensive and very capable part you can get in a reasonably hobbyist-friendly TQFP package.

Yes, I've been looking at this as the best Spartan 6 I can get in a TQFP package.  It will mean lower res / fewer colours, but it'll still be within my design brief.

You can also interface a development board directly to your existing Z80 project and use that for prototyping and then once you have a design implemented that you are satisfied with you can look at the consumed resources and select a less expensive FPGA sufficient for your design and build more tidy custom hardware. One of the really awesome things about FPGAs is that it's very easy to make large portions of the code very portable, a lot of my learning was accomplished by porting projects I found from one platform to another.

Well, that's what I'll be doing initially - I'm waiting on an LX16 dev board to arrive from overseas.  I'll be connecting that up to my system via jumpers and a 74LVC-infested breadboard whilst I test the system and develop the VHDL for it.  Once it's done, like you say, I'm not restricted to only using it on LX16s...

I think even the Artix 7 50T has more resources than the LX45, and certainly the 100T, which is becoming quite common, completely dwarfs the LX45.  The 35T is somewhat smaller than the LX45, if that matters.

None of that resources stuff matters much when compared to the fact that the old Spartan 6 devices are not supported by Vivado and the ISE 14.7 version is the last release of ISE and it is no longer supported.  Yes, I still use it for my Spartan 3 projects but, for new stuff, I'm using the Artix 7 chips and Vivado.

I haven't looked at the Artix range - it's handy to note that Spartan 6's aren't supported in the latest development software, though - thanks rstofer.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 19, 2019, 10:19:03 pm
(http://www.downthebunker.com/chunk_of/stuff/public/boards/fpga-sp2-150-gb-adv.jpg)
SP2-150, for GameBOY ADV

This one is 5V tolerant, and it's interfaced with a GameBOY ADV.

Is the FPGA 5v tolerant, or have you done the level conversions yourself?  I can't see much in the way of voltage conversion on those boards?

This was another thing that struck me when I started drawing up the schematic with the LX9 that I was unaware of before - it actually requires 3.3v, 1.8v and 1.2v (or similar, I'm running from memory)?

So I can't just run all the address, data and control lines from the Z80 side through some 74LVC components to translate them down to 3.3v?
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: legacy on October 19, 2019, 10:42:34 pm
Is the FPGA 5v tolerant, or have you done the level conversions yourself? 

It was made this way because the FPGA is already 5V tolerant. Certain SP2 were, but starting from SP3 they are all 3.3V. The purpose was to simplify stuff, especially the PCB, which is very small.

Voltage level shifters are fine (2), but ... well, that stuff runs at 50 MHz - not so high speed, but enough to cause signal integrity issues (1), and thus it requires more care. There were a couple of bugs with my first batch of PCBs. Then I redesigned them.

(1) Ok, today I own a decent MSO, debugging that stuff is easier than years ago.
(2) you can even use a 5V tolerant CPLD as a voltage adapter - it's a good trick, and it might help with routing the PCB. In my case, I used it to adapt 32-bit data + 32-bit address + 9-bit control from a 5V CPU to a 3.3V SP3-500. It was a must, because I really *HATE* every 68020 softcore, hence I wanted an ASIC chip. Now I am using Coldfire v1, and they are already 3.3V. Easy life, fewer problems, more fun.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: james_s on October 19, 2019, 11:22:05 pm
Using a gigantic FPGA in order to get enough internal block RAM for a framebuffer is not the right way to go about it. Video cards have been using SDRAM and prior to that SRAM and even regular DRAM to implement framebuffers for many, many years. I'm not going to say that it's "easy" but one of the things SDRAM is specifically designed to be good at is pumping in/out blocks of data synchronously which is precisely what video is doing. Look at some video cards from the late 90s to mid 2000's, these are often made with discrete ICs and relatively easy to tell what's going on. Interfacing to external memory is one of the things FPGAs are optimized to do, they have lots of IO pins and some of them even have onboard dedicated SDRAM interfaces.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: asmi on October 20, 2019, 02:55:40 am
Using a gigantic FPGA in order to get enough internal block RAM for a framebuffer is not the right way to go about it. Video cards have been using SDRAM and prior to that SRAM and even regular DRAM to implement framebuffers for many, many years. I'm not going to say that it's "easy" but one of the things SDRAM is specifically designed to be good at is pumping in/out blocks of data synchronously which is precisely what video is doing. Look at some video cards from the late 90s to mid 2000's, these are often made with discrete ICs and relatively easy to tell what's going on. Interfacing to external memory is one of the things FPGAs are optimized to do, they have lots of IO pins and some of them even have onboard dedicated SDRAM interfaces.
Video cards are also known for requiring (and making good use of) very high memory bandwidth. But that just isn't possible in anything other than BGA packages, as it requires a lot of pins - even a very modest (by video card standards) 64-bit interface requires close to 100 pins to implement, and 128/256-bit interfaces are more popular, with even 512-bit ones not unheard of. Moreover, most GPUs are modular and each submodule has its own memory controller, which requires even more pins, as each controller needs its own set of address/command pins. For example, the RTX 2080 Ti has 11 memory controllers (it's actually a cut-down version of a die which has 12)!
Also since BGA packages provide signal integrity that is far superior to any leaded packages, DDR2 and up memory chips are only available in BGA packages.

Now, I personally love reasonable-pitch BGAs and always choose them over any other packages, but I know some people have some sort of anti-BGA religion, which is why I'm bringing this up.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: james_s on October 20, 2019, 03:01:09 am
Are you forgetting that this is being built to work with an 8 bit Z80? I don't think it needs to be anything super fancy, there's no point in making the video card powerful enough to deal with more data than the host CPU is capable of sending to it. We're not talking about modern high power GPUs.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on October 20, 2019, 07:40:59 am
Using internal block RAM in FPGAs has a range of advantages, so if a big enough FPGA can be had cheaply enough, it's a good idea to use it.

Firstly, the internal RAM blocks are naturally dual-port, and that often comes in useful. They are also separate blocks that slot together like Lego to build RAM of various sizes, so you can have multiple smaller memories that can be accessed in parallel rather than one big memory that everything has to be multiplexed onto. That last part gives it insanely high memory bandwidth too. For the DIYer, probably best of all, it doesn't need any external components - zero pins to hook up.

But if you want lots of resolution and the ability to blit bitmaps around in the 'GPU' (you'll need that on an 8-bit CPU, as it will be awfully slow at pushing pixels around manually), then you will need more RAM than you can get in reasonably priced FPGAs. The easiest option is SRAM, since you just throw an address on its pins and the data shows up, and you can buy it up to about 8MB for a reasonable price. When the Z80 wants to access it, you just switch the CPU's memory bus over to the SRAM pins and switch it back to the GPU after it's done. Perhaps also add a latch on the bus so you can perform the SRAM access at full speed (~100MHz) and quickly return control to the GPU, while the latch holds the current memory transaction on the Z80's bus for its casual 8MHz clock to get around to grabbing the data. You can also use SDRAM, DDR, DDR2, DDR3 and such on FPGAs just fine, but it's a lot more complex. These memories need some initialization and use pipelining (you execute a read on this cycle but get the data 8 cycles later), so they prefer to be accessed in large burst operations, and they need to be refreshed by the controller to retain data. Also, DDR needs properly length-matched traces, so you can't just wire-wrap it together. But as a reward for using any of those complex DRAMs you get gigabytes of RAM for cheap (not that useful on an 8-bit system, though).

However, graphics systems in game consoles have often used a combination of external DRAM and RAM internal to the video chip. Since these chips generate the image live as it is being sent out, they need to look at a lot of things simultaneously to decide on the colour of just one pixel. With the typical memory of the time running at a handful of MHz, that was only fast enough for 1 or 2 read transactions per pixel. To get around this, only the large data is kept in external RAM, like actual image data, while other things are kept in lots of tiny internal RAMs inside the video chip: the colour palette, sprite locations, text character maps, sometimes tilemaps... Then as the years went by and memory got fast enough to perform a handful of operations per pixel, we got more colour depth, more layers, transparency, parallax scrolling and all that lovely eye candy. These improvements don't put much extra burden on the CPU, as the GPU is doing the heavy work of throwing the extra pixels around. The CPU is still just giving it a bunch of coordinates where to draw things, so you can get some amazing graphics from a Z80 if you give it a powerful graphics chip.

But yeah, for the things you are doing I wouldn't worry about memory bandwidth, since modern memory is so much faster that it will be plenty for anything here.

EDIT: Oh, and you would most definitely not want to have your graphics memory directly on the Z80 memory bus - not only does that create multi-master woes, it also slows things down while providing no real benefit. It's much easier to just connect the FPGA pins to the Z80 bus and have it act as a "gatekeeper" to separate graphics RAM. The FPGA will look to the Z80 like just another SRAM chip, so it can read and write to it just the same, but since the FPGA is sitting in between, it can hand the RAM over to the graphics hardware whenever the CPU is not using it. A small part of this same "RAM window" memory range can also be used to hold the graphics hardware registers and any other peripherals you might want to have (sound, keyboard etc..)
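A minimal VHDL sketch of that "gatekeeper" idea, just to make the arbitration concrete - all signal names here are hypothetical, and a real design would also need synchronisers for the asynchronous Z80 bus signals:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Gatekeeper sketch: the FPGA decodes a 512 KB window on the Z80 bus and
-- forwards CPU accesses to the video RAM whenever the GPU side is idle.
entity gatekeeper is
    port (
        clk      : in  std_logic;                      -- fast FPGA clock
        cpu_cs_n : in  std_logic;                      -- decoded window select
        cpu_wr_n : in  std_logic;
        cpu_addr : in  std_logic_vector(18 downto 0);  -- 512 KB window
        gpu_busy : in  std_logic;                      -- GPU currently owns the RAM
        gpu_addr : in  std_logic_vector(18 downto 0);
        ram_addr : out std_logic_vector(18 downto 0);
        ram_we_n : out std_logic
    );
end entity;

architecture rtl of gatekeeper is
begin
    process (clk)
    begin
        if rising_edge(clk) then
            if gpu_busy = '1' then
                ram_addr <= gpu_addr;
                ram_we_n <= '1';        -- GPU only reads for display here
            elsif cpu_cs_n = '0' then
                ram_addr <= cpu_addr;   -- CPU gets the RAM between GPU bursts
                ram_we_n <= cpu_wr_n;
            else
                ram_we_n <= '1';        -- idle: no write
            end if;
        end if;
    end process;
end architecture;
```

Because the FPGA clock is an order of magnitude faster than the 8MHz Z80, the CPU never notices that it is only being served in the gaps between GPU accesses.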
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 20, 2019, 09:58:08 am
Using a gigantic FPGA in order to get enough internal block RAM for a framebuffer is not the right way to go about it. Video cards have been using SDRAM and prior to that SRAM and even regular DRAM to implement framebuffers for many, many years. I'm not going to say that it's "easy" but one of the things SDRAM is specifically designed to be good at is pumping in/out blocks of data synchronously which is precisely what video is doing. Look at some video cards from the late 90s to mid 2000's, these are often made with discrete ICs and relatively easy to tell what's going on. Interfacing to external memory is one of the things FPGAs are optimized to do, they have lots of IO pins and some of them even have onboard dedicated SDRAM interfaces.

Okay, so SDRAM is a definite possibility then, which opens the field in terms of FPGA options right up.  I can look at a TQFP 5v-tolerant FPGA (there must be a few around still, or perhaps even the odd one still being made?), strap an SDRAM to it and let it rip.

Also since BGA packages provide signal integrity that is far superior to any leaded packages, DDR2 and up memory chips are only available in BGA packages.

Now, I personally love reasonable-pitch BGAs and always choose them over any other packages, but I know some people have some sort of anti-BGA religion, which is why I'm bringing this up.

I have nothing against BGA generally, except that I've never had to solder one and (at least it looks like) it requires specialist equipment to get the job done with any level of confidence.  I don't have the funds to throw at an IR reflow oven, so the best option I'd have is to build my own - and the videos on YT don't fill me with confidence.  Plus I'd be stepping a looong way away from one of my goals, which was for my computer to be buildable by just about anyone with a little confidence with a soldering iron.

The other issue with BGAs is that I would likely have to switch from 2-layer PCBs to 4-layer if the BGA grid is dense enough, with cost implications there too.

Are you forgetting that this is being built to work with an 8 bit Z80? I don't think it needs to be anything super fancy, there's no point in making the video card powerful enough to deal with more data than the host CPU is capable of sending to it. We're not talking about modern high power GPUs.

Indeed.  I'm designing for an 8 MHz Z80 system, from an era when a single DIP-chip would handle the CRT by outputting a composite, UHF or RGB (if you're lucky) signal.  If it can handle vector graphics or a handful of sprites and a bitmap or tiled background on a 320x240 screen so that I can play Pong or Bubble Bobble, I'll be overjoyed.  Anything else will be a bonus.  ;D
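For what it's worth, the pixel-timing side of a VGA controller like this is completely standard. A sketch of a 640x480@60Hz sync generator in VHDL (the nominal pixel clock is 25.175 MHz; a plain 25 MHz clock is usually close enough for real monitors) - entity and signal names are my own invention:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Standard 640x480@60 timing: 800 clocks per line, 525 lines per frame.
entity vga_sync is
    port (
        pixel_clk : in  std_logic;            -- ~25.175 MHz
        hsync     : out std_logic;            -- active low
        vsync     : out std_logic;            -- active low
        visible   : out std_logic;            -- high in the active 640x480 area
        x         : out unsigned(9 downto 0);
        y         : out unsigned(9 downto 0)
    );
end entity;

architecture rtl of vga_sync is
    signal hcnt : unsigned(9 downto 0) := (others => '0');
    signal vcnt : unsigned(9 downto 0) := (others => '0');
begin
    process (pixel_clk)
    begin
        if rising_edge(pixel_clk) then
            if hcnt = 799 then
                hcnt <= (others => '0');
                if vcnt = 524 then
                    vcnt <= (others => '0');
                else
                    vcnt <= vcnt + 1;
                end if;
            else
                hcnt <= hcnt + 1;
            end if;
        end if;
    end process;

    -- horizontal: 640 visible, 16 front porch, 96 sync, 48 back porch
    hsync   <= '0' when (hcnt >= 656 and hcnt < 752) else '1';
    -- vertical: 480 visible, 10 front porch, 2 sync, 33 back porch
    vsync   <= '0' when (vcnt >= 490 and vcnt < 492) else '1';
    visible <= '1' when (hcnt < 640 and vcnt < 480) else '0';
    x <= hcnt;
    y <= vcnt;
end architecture;
```

A 320x240 mode can reuse exactly this timing by simply halving `x` and `y` when addressing the frame buffer, so each buffer pixel is repeated 2x2 on screen.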

The CPU is still just giving it a bunch of coordinates where to draw things, so you can get some amazing graphics from a Z80 if you give it a powerful graphics chip.

This is what I'm intending to do with the GPU - have the CPU pass instructions/coordinates/sprite and bitmap setup data to the GPU, then have the GPU throw the bytes around the frame buffer.

But yeah, for the things you are doing I wouldn't worry about memory bandwidth, since modern memory is so much faster that it will be plenty for anything here.

 :-+  Marvellous, that's good to know.

EDIT: Oh, and you would most definitely not want to have your graphics memory directly on the Z80 memory bus - not only does that create multi-master woes, it also slows things down while providing no real benefit. It's much easier to just connect the FPGA pins to the Z80 bus and have it act as a "gatekeeper" to separate graphics RAM. The FPGA will look to the Z80 like just another SRAM chip, so it can read and write to it just the same, but since the FPGA is sitting in between, it can hand the RAM over to the graphics hardware whenever the CPU is not using it. A small part of this same "RAM window" memory range can also be used to hold the graphics hardware registers and any other peripherals you might want to have (sound, keyboard etc..)

Okay, so how about this for the interface to the GPU?  My current MMU design gives me up to 8x 512 KB chip sockets. Socket 1 is SRAM, Socket 8 is ROM, 2-7 can be anything you want.  If I 'remove' Socket 7 for example, there will be no RAM/ROM in the system trying to reply to the Z80 when it addresses that particular 512KB memory range.  I can then get the FPGA to accept all RD/WRs to that 512 KB window and treat them as direct access to the GPU's frame buffer and registers.  512 KB will only use a tiny fraction of the SDRAM on the other side of the FPGA, but will give the Z80 a huge area to load/assemble bitmaps, sprites, LUTs, symbols into.

Does that sound like a workable plan?

EDIT:

I know this sounds like I'm going back on my original statement that I didn't want a frame buffer in the system memory, but this isn't quite the same thing as an 80's multiplexed frame buffer and all the bus arbitration that would come with it.  This is physically removing a chunk of the system's memory and having the FPGA replace it, if that makes sense?
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on October 20, 2019, 11:29:54 am
Okay, so how about this for the interface to the GPU?  My current MMU design gives me up to 8x 512 KB chip sockets. Socket 1 is SRAM, Socket 8 is ROM, 2-7 can be anything you want.  If I 'remove' Socket 7 for example, there will be no RAM/ROM in the system trying to reply to the Z80 when it addresses that particular 512KB memory range.  I can then get the FPGA to accept all RD/WRs to that 512 KB window and treat them as direct access to the GPU's frame buffer and registers.  512 KB will only use a tiny fraction of the SDRAM on the other side of the FPGA, but will give the Z80 a huge area to load/assemble bitmaps, sprites, LUTs, symbols into.

Does that sound like a workable plan?

EDIT:

I know this sounds like I'm going back on my original statement that I didn't want a frame buffer in the system memory, but this isn't quite the same thing as an 80's multiplexed frame buffer and all the bus arbitration that would come with it.  This is physically removing a chunk of the system's memory and having the FPGA replace it, if that makes sense?

Yes, exactly - the FPGA would act as memory across that whole 512KB area.

But the FPGA can also do some address decoding of its own to map some hardware control registers into the first 1KB of its 512KB address space, while the rest is a window into video RAM. This gives you an area to control the video hardware (like choosing what address the framebuffer is at for double or triple buffering, selecting video modes, holding sprite tables etc...). You can also have a register that offsets the 512KB RAM window, letting you roam across 64MB of video memory, for example. (Old cartridge-based consoles relied heavily on banking to fit large ROMs into a limited memory space.)
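That split - a small register block at the bottom of the window, with the rest banked into a larger video RAM - decodes down to very little logic. A rough VHDL sketch, where the 1KB register area, the 64MB video space and the bank-select register at address 0 are just the example numbers from above, not a fixed design:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Decode a 512 KB CPU window: first 1 KB -> control registers,
-- remainder -> a bank-offset window into up to 64 MB of video RAM.
entity window_decode is
    port (
        clk       : in  std_logic;
        cpu_addr  : in  std_logic_vector(18 downto 0);  -- 512 KB window
        cpu_dout  : in  std_logic_vector(7 downto 0);
        cpu_wr    : in  std_logic;
        reg_sel   : out std_logic;                      -- '1' = register access
        vram_addr : out std_logic_vector(25 downto 0)   -- 64 MB video space
    );
end entity;

architecture rtl of window_decode is
    signal bank : unsigned(6 downto 0) := (others => '0');  -- 128 x 512 KB banks
begin
    -- addresses 0x00000..0x003FF hit the control registers
    reg_sel <= '1' when unsigned(cpu_addr) < 1024 else '0';

    -- the rest of the window is offset into video RAM by the bank register
    vram_addr <= std_logic_vector(bank) & cpu_addr;

    process (clk)
    begin
        if rising_edge(clk) then
            -- hypothetical register 0 in the 1 KB block selects the bank
            if cpu_wr = '1' and unsigned(cpu_addr) = 0 then
                bank <= unsigned(cpu_dout(6 downto 0));
            end if;
        end if;
    end process;
end architecture;
```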

Having access to video memory from the CPU is quite useful since you don't need to implement GPU functionality to load images and tables into video memory, instead you just load it in yourself. Also the CPU can use this memory as part of its program. For example if you are making a game you can use the sprite table to hold the position and type of enemy characters on screen rather than keeping this data separate in CPU RAM and then having to update the sprite table in video memory on every frame.

But the main benefit of having video RAM separate and behind an FPGA is that the video RAM can run at full speed. The FPGA can only draw pixels as fast as it can write them to memory, so having 10 times faster memory means it can draw 10 times more pixels per second.

Though memory bandwidth matters more if you go the modern drawcall route, since that tends to keep everything in RAM, rather than the fully hardware-based tilemaps and sprites that generate graphics on the fly without writing to RAM at all. On the other hand, the drawcall route is more flexible, as it can draw any image onto any other image while simultaneously modifying it, whereas sprites and tilemaps tend to be limited to fixed sizes and grid patterns. But you can still package up tilemap functionality in the form of a drawcall like "Draw tilemap(at 0x5000) using tileset(at 0x7200) into framebuffer(at 0x1000 with the offset 944,785)".

As for vector graphics, home game consoles don't use them all that much in 2D. It's usually 3D where vector graphics get hardware acceleration, but at that point the GPU also often ends up having features like matrix maths for fast manipulation of 3D points, and texture mapping/shading to make those triangles look pretty and textured rather than flat, solid-coloured triangles. In the past this 3D functionality was hardwired into the GPU (just as tilemaps and sprites are hardwired to work in a certain way) as part of a fixed rendering pipeline that takes the 3D models and turns them into an image (like 2D tilemaps turn into an image). Later on this was made flexible by breaking the process up into steps, so the GPU could be asked through additional drawcalls to do extra-fancy eye candy in between the intermediate rendering steps. Later still, programmable shaders were introduced, and that made GPUs flexible enough to be used as giant maths coprocessors to mine cryptocurrencies, encode video and run physics simulations.

So yeah, you could generate 3D graphics similar to an N64 on that Z80 if the FPGA had hardware 3D graphics. But getting all of that to work would be a huge project (doable by a single person, but it would take a very long time).
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: SiliconWizard on October 20, 2019, 03:40:41 pm
Well, nope. The interleaving suggestion implied that the RAM would actually be accessed faster, so the CPU wouldn't see a difference. To pull that off, you would of course need a fast enough RAM (which should be no problem with a modern SDRAM chip or even SRAM), and to clock the MMU faster. Also, to keep things simple, the CPU and video clock frequencies would have to be integer multiples of each other.

You mean the FPGA would read the 'frame buffer' in the Z80's logical memory space into an internal buffer really quickly, then pass that out as a pixel stream?  Otherwise surely the frame buffer would be sending data at the rate the screen mode requires, which means 73% of the time it would be sending data and locking the Z80 out?  Not sure I understand this fully.

I mean the RAM would be accessed alternately from the CPU and the video controller, in "access slots". The CPU would not be "locked out", since its own access slot would always be available. Sure, the video controller would have more slots available than the CPU, but you just need to clock this fast enough that the slot allocated to the CPU is no longer than the CPU's minimum memory access time. (I don't remember the details of memory access on the Z80, but I'd guess a typical memory access would be more than 1 cycle, thus this should not be hard to achieve with even a moderately fast RAM chip.)

As I said, to do this in a simple manner, the CPU and video controller clocks should be synchronized and their frequencies should be integer multiples. For instance, if the video controller needs 3 times as much bandwidth, RAM access would have 4 slots: 1 for the CPU and 3 for the video controller. The memory access (MMU?) clock would then be 4 times the CPU clock, and the RAM would be accessed at this elevated frequency (which would still not be that fast for a modern RAM chip). Note that it would be relatively easy to do with SRAM, as long as its access time fits the requirements. With typical SDRAM, not so much: implementing such accesses in slots would cause issues because they would typically not be consecutive, introducing possibly unacceptable latencies (yes, SDRAM doesn't "like" fully random accesses).
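The 4-slot rotation described above reduces to a free-running 2-bit counter in the FPGA. A hedged VHDL sketch, with all names invented for illustration:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Time-sliced RAM arbiter: slot 0 of every 4 belongs to the CPU,
-- slots 1..3 belong to the video controller.
entity slot_arbiter is
    port (
        mem_clk  : in  std_logic;  -- 4x the CPU clock, per the example above
        cpu_req  : in  std_logic;
        cpu_addr : in  std_logic_vector(18 downto 0);
        vid_addr : in  std_logic_vector(18 downto 0);
        ram_addr : out std_logic_vector(18 downto 0);
        cpu_slot : out std_logic   -- high while the CPU owns the RAM
    );
end entity;

architecture rtl of slot_arbiter is
    signal slot : unsigned(1 downto 0) := (others => '0');
begin
    process (mem_clk)
    begin
        if rising_edge(mem_clk) then
            slot <= slot + 1;  -- wraps 0..3 forever
        end if;
    end process;

    cpu_slot <= '1' when slot = 0 else '0';

    -- the CPU slot is always reserved, so the CPU is never locked out;
    -- video soaks up the remaining three slots of every rotation
    ram_addr <= cpu_addr when (slot = 0 and cpu_req = '1') else vid_addr;
end architecture;
```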
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: james_s on October 20, 2019, 06:05:23 pm
I don't think you will find a 5V tolerant TQFP FPGA, but I don't think that is really all that big of a problem. Look at the schematic for the Terasic DE2 board I have, (manual is downloadable) the FPGA it uses is not 5V tolerant but the GPIO can be interfaced to 5V logic without problems. Each IO pin has a dual diode between IOVcc and Gnd followed by a resistor so the diodes prevent the pin from getting pulled too high and the resistor limits the current that can be dissipated by the diodes. It's not as good as proper level shifting but in practice I have used it many times without issues. At the speeds a vintage Z80 runs you can get away with quite a lot.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: langwadt on October 20, 2019, 06:18:59 pm
I don't think you will find a 5V tolerant TQFP FPGA, but I don't think that is really all that big of a problem. Look at the schematic for the Terasic DE2 board I have, (manual is downloadable) the FPGA it uses is not 5V tolerant but the GPIO can be interfaced to 5V logic without problems. Each IO pin has a dual diode between IOVcc and Gnd followed by a resistor so the diodes prevent the pin from getting pulled too high and the resistor limits the current that can be dissipated by the diodes. It's not as good as proper level shifting but in practice I have used it many times without issues. At the speeds a vintage Z80 runs you can get away with quite a lot.

and AFAICT the Z80 has TTL input levels, so it'll work with 3.3V inputs, and the usual wimpy TTL high already limits the output current. The Xilinx datasheets usually say it is OK to use the ESD diodes for voltage limiting as long as the input current is limited to something like 20mA and the total is less than the sink current on that supply.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on October 20, 2019, 07:46:49 pm
The easiest way for me to do that would be to keep one of the memory sockets empty, which would provide a 512 KB window that the FPGA could intercept the reads and writes to.  Half a meg would be more than big enough for a frame buffer for the resolutions I'm wanting to use, and would allow double-buffering on all the major screen modes if I was able to substitute that window for 512 KB of RAM on the FPGA via SDRAM, for example... hmmm... Tempting...
You got this.  :-+

Quote
Using the SDRAM on the FPGA seems to me and my inexperienced mind to be a lot more complicated than using dual-port RAM in the FPGA itself.  Can anyone convince me otherwise?  If I can make use of the SDRAM easily, without any major drawbacks or slowdowns, then I needn't worry about using a big, expensive FPGA with lots of internal RAM...
I confess that it is a bit more complicated than using a block RAM, and your system will have to be built to accommodate a response that will take several, possibly varying clock cycles to come back, but that doesn't mean you have to do it all yourself when plenty of open IP cores exist to help you out. Some FPGA design suites will even generate and parametrize DDR/DDR2/DDR3 controllers to your order. Often they will present an interface similar to a synchronous SRAM, but occasionally will be full-fledged bus slaves conforming to whatever standard. Wishbone is common in the open hardware world. It's designed to be modular so that you can simply omit complications that you don't want or need. The Simple Bus Architecture (https://en.wikipedia.org/wiki/Simple_Bus_Architecture) is a minimized version of Wishbone in exactly that vein. Also, the Z80 doesn't do everything all at once, and you don't need to either. Consider using a FIFO inside the FPGA to hold pixel data, maybe even a whole scan line ahead. Pipelining is very much your friend.
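The FIFO point is worth underlining: a dual-port block RAM holding one scan line decouples the bursty SDRAM side from the fixed-rate pixel clock. A rough sketch of an inferred line buffer in VHDL (names invented; most synthesis tools map this pattern onto a dual-clock block RAM):

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- One-scanline pixel buffer: the write port is filled in bursts from
-- SDRAM; the read port is drained steadily at the pixel clock.
entity line_buffer is
    port (
        wr_clk  : in  std_logic;
        wr_en   : in  std_logic;
        wr_addr : in  unsigned(9 downto 0);          -- up to 1024 pixels
        wr_data : in  std_logic_vector(7 downto 0);  -- 8bpp palette index
        rd_clk  : in  std_logic;
        rd_addr : in  unsigned(9 downto 0);
        rd_data : out std_logic_vector(7 downto 0)
    );
end entity;

architecture rtl of line_buffer is
    type ram_t is array (0 to 1023) of std_logic_vector(7 downto 0);
    signal ram : ram_t;
begin
    process (wr_clk)
    begin
        if rising_edge(wr_clk) then
            if wr_en = '1' then
                ram(to_integer(wr_addr)) <= wr_data;
            end if;
        end if;
    end process;

    process (rd_clk)
    begin
        if rising_edge(rd_clk) then
            -- registered read; this is what lets tools infer block RAM
            rd_data <= ram(to_integer(rd_addr));
        end if;
    end process;
end architecture;
```

While line N is being scanned out of one half, the SDRAM controller can fetch line N+1 into the other at whatever burst rhythm suits it.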

Quote
Oh I've had this discussion with others before.  When I first dipped my toes in the waters of electronics a couple of years ago
KiCAD 4 was a hot mess two years ago, absolutely. The UI wasn't very good or consistent. There were three separate layout toolsets, each with their own particular commands. The library editors were simply awful, which was a real problem in practice. Eagle was far better put together.

Fast forward a couple of years, with the help of the universe-destroyers at CERN and a full-time project manager KiCAD 5 has blossomed quite a bit. The symbol/footprint librarians have been given extra, dearly needed love. There's an interactive push-and-shove router. The three layout toolsets have been unified and rationalized. The whole thing now feels a bit less like a student project and more like a workflow for people to use. Even our gracious host Dave, who worked at Altium by the way, has been moderately impressed with it, even if some of the pro-level conveniences are lacking. If it's been a couple of years since you've had a look, I encourage you to give the latest a fresh hour or two. Be aware that there is still no whole-board autorouter built into KiCAD, and you would need to use an external application for autorouting such as FreeRouting, TopoR, etc. Personally I don't consider that a problem as I prefer more control over where my traces go and how they get there. But, if that's a deal-breaker for you, fair enough, I won't push it.

As a general rule a designer should expect to create their own symbols for any devices out of the ordinary, and treat their presence in the standard or contributed libraries as a happy accident. There are universal librarian softwares and services, who maintain a library of symbols and footprints in a master format and translate them on-demand to whatever eCAD software you have. That doesn't imply that any given universal symbol is well-laid-out or available for that part you need right now. So a decent footprint/symbol editor and librarian is a must.

Disclaimer, I've not used any of the pro-level software, but I cut my teeth on Eagle 4.x and was happy with it until my designs approached the free tier's size/layer/field-of-use restrictions. Now that Autodesk has made Eagle part of a cloud service, it's a hard pass.

With typical SDRAM, not so much: implementing such accesses in slots would cause issues because they would typically not be consecutive, introducing possibly unacceptable latencies (yes SDRAM doesn't "like" fully random accesses).
Two comments: The delays of an activate-precharge cycle might not be noticed within a 125ns Z80 clock cycle, never mind a 375ns memory cycle. There should be plenty of time even with single data rate SDRAM to complete a burst of 8 or 16 bytes into a pixel FIFO and perform any byte read/write from host or blitter that might or might not be waiting, in a fixed rhythm. Second, the Z80 was designed to make DRAM usage easy, with the row address on the bus early. With some clever SDRAM controller programming, the row address can be likewise passed along early to the SDRAM chip to open the row early, allowing usage of slower DRAM, maybe even the good old asynchronous DRAM chips which are still available, but at closer to $1 per megabyte than $1 per gigabit. Still I think it's overkill for the present application.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on October 21, 2019, 05:43:31 am
As a general rule a designer should expect to create their own symbols for any devices out of the ordinary, and treat their presence in the standard or contributed libraries as a happy accident. There are universal librarian softwares and services, who maintain a library of symbols and footprints in a master format and translate them on-demand to whatever eCAD software you have. That doesn't imply that any given universal symbol is well-laid-out or available for that part you need right now. So a decent footprint/symbol editor and librarian is a must.

Disclaimer, I've not used any of the pro-level software, but I cut my teeth on Eagle 4.x and was happy with it until my designs approached the free tier's size/layer/field-of-use restrictions. Now that Autodesk has made Eagle part of a cloud service, it's a hard pass.

I have been using Altium Designer for quite a few years, and despite it being a professional tool with a large library of parts, I still end up drawing most of the symbols and footprints myself.

There are certain style guides I want components to conform to: the way they are named, the way I want supplier links for ordering. But above all it's how the symbol is drawn. I like to arrange the pins in certain ways to make for a neat schematic; for MCUs I like the peripherals written on the pin rather than it just being called "PB3", and sometimes I draw extra things inside the IC rectangle to make it clearer what the chip is doing (like logic ICs or muxes or digital pots etc.). So it's almost as fast to just draw my own rather than modify their library component to my liking.

The important part is that the editors for symbols and footprints are pretty good. I can draw a 100-pin IC in under 10 minutes if the pin names can be copy-pasted from a pinout table in the datasheet, and this includes arranging the pins to my liking. The footprint for that same chip in TQFP or BGA form takes another 3 minutes, because Altium has a footprint-generation wizard that supports almost all sensible package types: you just enter about 3 to 10 dimensions from the drawing and it spits out a footprint that includes a 3D model.

Not saying Altium is the best PCB tool, because it still does some things badly and sometimes crashes in weird ways, but it does do some things right, like library creation (but let's not speak about maintaining large libraries; they have a long way to go there).

With typical SDRAM, not so much: implementing such accesses in slots would cause issues because they would typically not be consecutive, introducing possibly unacceptable latencies (yes SDRAM doesn't "like" fully random accesses).
Two comments: The delays of an activate-precharge cycle might not be noticed within a 125ns Z80 clock cycle, never mind a 375ns memory cycle. There should be plenty of time even with single data rate SDRAM to complete a burst of 8 or 16 bytes into a pixel FIFO and perform any byte read/write from host or blitter that might or might not be waiting, in a fixed rhythm. Second, the Z80 was designed to make DRAM usage easy, with the row address on the bus early. With some clever SDRAM controller programming, the row address can be likewise passed along early to the SDRAM chip to open the row early, allowing usage of slower DRAM, maybe even the good old asynchronous DRAM chips which are still available, but at closer to $1 per megabyte than $1 per gigabit. Still I think it's overkill for the present application.

Yeah, modern SDRAM is so fast the FPGA would be able to present it as if it were SRAM to the slow 8MHz Z80, since the SDRAM can be clocked so much faster that the FPGA has time to select the appropriate row, execute a read and clock the data through the pipeline in the time the Z80's read access happens. Or, if it got really unlucky and landed right in a refresh cycle, the FPGA can pull on the Z80's WAITn line to halt it a bit.
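To put rough numbers on that, here's a back-of-envelope sketch in Python. The SDRAM timing figures (100 MHz clock, tRCD/CAS/tRP of 2 cycles each, burst of 8) are assumed typical SDR-SDRAM values, not taken from any particular datasheet:

```python
# Can a single-data-rate SDRAM finish a worst-case random burst well
# inside one Z80 memory cycle? All timing figures are assumed
# PC100-class values, not from a specific datasheet.

SDRAM_CLK_MHZ = 100
T_CLK_NS = 1000 / SDRAM_CLK_MHZ     # 10 ns per SDRAM clock

TRCD = 2      # ACTIVATE -> READ delay, cycles
CAS = 2       # READ -> first data word, cycles
BURST = 8     # words per burst (pixel FIFO fill)
TRP = 2       # PRECHARGE, cycles

def burst_time_ns():
    """Worst case: activate + CAS latency + burst + precharge."""
    return (TRCD + CAS + BURST + TRP) * T_CLK_NS

Z80_CLK_NS = 125                    # 8 MHz Z80 clock period
Z80_MEM_CYCLE_NS = 3 * Z80_CLK_NS   # a 3-T-state memory cycle = 375 ns

burst_time_ns()   # 140 ns: plenty of room inside 375 ns
```

So even with pessimistic assumptions a whole activate-read-precharge burst fits in well under half a Z80 memory cycle, which is the point made above.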

The extra complexity comes in the GPU, where to use the full speed of SDRAM it has to be built to cope with the RAM's pipelined nature: performing reads and writes in large bursts and making sure it always has something in the pipeline, so that it doesn't end up sitting there waiting for data to come in. Yes, all modern digital computing is heavily pipelined because it's required to reach high clock speeds, but it comes at a price.

Since you probably don't need more than a few megabytes of memory, I think SRAM is a safer choice. It is truly random access and has no confusing pipelining.

I have worked with FPGAs here and there, and pipelining still hurts my brain: when writing pipelined HDL code it's hard to keep track of which register is holding the data for which cycle of the pipeline. I usually end up drawing a timing diagram of the whole thing, so I can visually see how many cycles something is behind or ahead of something else, and then turn that timing diagram into code.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 21, 2019, 10:24:11 am
But the FPGA can also do some address decoding of its own to map some hardware control registers into the first 1KB of its 512KB address space, while the rest is a window into video RAM. This gives you an area to control the video hardware (like choosing at what address the framebuffer sits for double or triple buffering, selecting video modes, holding sprite tables, etc.). You can also have a register that offsets the 512KB RAM window, letting you roam across 64MB of video memory, for example. (Old cartridge-based consoles relied heavily on banking to fit large ROMs into their limited memory space.)

Yes, I really like this idea.  It side-steps the whole issue of using the IO ports as a narrow bottleneck to speak to the GPU: transferring data a byte at a time, using multiple IO calls to write or read data (setting the register first, then writing or reading from it in the next IO call), and the compulsory WAIT state for each IO transaction.  It also opens up the Z80's 16-bit memory operations.  Not seeing many negatives - I'm sure my little Z80 can handle only having 3.5 MB of 'system' memory space rather than the full 4 MB.  It's a far cry from the 80's, when it would only have 64 KB to squeeze a frame buffer and everything else into...  ::)
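As a sketch of how that decode might look, here's a toy software model in Python. The window base address, register block size and bank-offset scheme are all hypothetical, purely to illustrate the idea:

```python
# Hypothetical decode for the 512 KB video window: the first 1 KB is
# control registers, the rest a movable window into video RAM.
# Base address, sizes and the bank-offset register are assumptions.

WINDOW_BASE = 0x380000   # assumed: top 512 KB of the MMU's 4 MB space
WINDOW_SIZE = 0x080000
REG_SIZE    = 0x000400   # first 1 KB = hardware control registers

def decode(addr, bank_offset=0):
    """('reg', n), ('vram', physical_addr), or None for system RAM."""
    if not (WINDOW_BASE <= addr < WINDOW_BASE + WINDOW_SIZE):
        return None
    offset = addr - WINDOW_BASE
    if offset < REG_SIZE:
        return ('reg', offset)
    # the rest roams across a larger video RAM via the bank offset
    return ('vram', bank_offset + (offset - REG_SIZE))

decode(0x380010)                        # a control register
decode(0x380400, bank_offset=0x100000)  # deep into video RAM
```

The FPGA would implement the same comparisons in logic, snooping the Z80's address bus; the bank offset is just one of the writable registers in the 1 KB block.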

Having access to video memory from the CPU is quite useful since you don't need to implement GPU functionality to load images and tables into video memory, instead you just load it in yourself. Also the CPU can use this memory as part of its program. For example if you are making a game you can use the sprite table to hold the position and type of enemy characters on screen rather than keeping this data separate in CPU RAM and then having to update the sprite table in video memory on every frame.

Nice. Could even copy the character set from ROM in the FPGA to a symbol table in RAM on power-up and allow the Z80 to change it for a customisable symbol set. Lots and lots and lots of possibilities...

But the main benefit to have video RAM separated and behind a FPGA is that the video RAM can run at full speed. The FPGA can only draw pixels as fast as it can write them to memory so having 10 times faster memory means it can draw 10 times more pixels per second.

Might come back to this further down as the discussion is turning more towards using SRAM instead of SDRAM... not sure there'd be much of a performance penalty for substituting SRAM as it sounds like SDRAM has its shortcomings.  As I'm working with an 8-bit processor running at a modest clock speed, I can't see it needing to make use of the SDRAM's pipelined architecture.  Sure, the FPGA will appreciate it, but with such low screen resolutions and such, I'm thinking SRAM will be a much easier choice for me to deal with.

Though the memory bandwidth is more important if you go the modern draw-call route, since that tends to keep everything in RAM, rather than the fully hardware-based tilemaps and sprites that generate graphics on the fly without writing to RAM at all. On the other hand, the draw-call route is more flexible, as it can draw any image onto any other image while simultaneously modifying it, whereas sprites and tilemaps tend to be limited to fixed sizes and grid patterns. But you can still package up tilemap functionality in the form of a draw call, like "Draw tilemap (at 0x5000) using tileset (at 0x7200) into framebuffer (at 0x1000 with the offset 944,785)".

I think the bandwidth will be important either way, as the sprites, tilemaps and symbol table will all be in RAM rather than ROM.  The symbol table holding the ASCII char set, for example, will be read from RAM so that the user can customise the look of it if they desire.  Same with sprites and tile maps - not much use unless they are customisable.
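For illustration, here's a tiny software model in Python of the "draw tilemap into framebuffer" call quoted above, with a made-up two-tile tileset and a byte-per-pixel buffer; real hardware would of course stream this from RAM a burst at a time:

```python
# Toy model of "draw tilemap(...) using tileset(...) into framebuffer":
# expand a grid of tile indices into a byte-per-pixel framebuffer.
# The 8x8 tiles and the tiny tileset below are made up for the example.

TILE_W = TILE_H = 8

def draw_tilemap(fb, fb_w, tilemap, map_w, tileset):
    for i, tile_no in enumerate(tilemap):
        tx = (i % map_w) * TILE_W        # top-left corner of this tile
        ty = (i // map_w) * TILE_H
        tile = tileset[tile_no]
        for y in range(TILE_H):
            for x in range(TILE_W):
                fb[(ty + y) * fb_w + tx + x] = tile[y * TILE_W + x]

tileset = [bytes(64), bytes([7] * 64)]   # tile 0: colour 0, tile 1: colour 7
fb = bytearray(16 * 8)                   # a 16x8 framebuffer
draw_tilemap(fb, 16, [1, 0], 2, tileset) # left half tile 1, right half tile 0
```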

As I said, to do this in a simple manner, the CPU and video controller clocks should be synchronized and frequencies should be multiple. For instance, if the video controller needs 3 times as much bandwidth, the RAM access would have 4 slots, 1 for the CPU, and 3 for the video controller. The memory access (MMU?) clock would be 4 times the CPU clock here for instance, and the RAM would be accessed at this elevated frequency (which would still not be that fast for a modern RAM chip). Note that it would be relatively easy to do with SRAM, as long as its access time fits the requirements. With typical SDRAM, not so much: implementing such accesses in slots would cause issues because they would typically not be consecutive, introducing possibly unacceptable latencies (yes SDRAM doesn't "like" fully random accesses).

I'm all for keeping this as simple as possible, hence going for the 'memory window' approach rather than interleaving access to shared memory - I've never used SDRAM and know next to nothing about using it, so perhaps the fastest SRAM I can reasonably get hold of would be the best choice for the FPGA's frame buffer?  I use 55ns SRAM for the system memory; it would need to be a fair bit faster for the FPGA to improve its pixel draw rate, I guess.
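The slot scheme quoted above is easy to model. A minimal sketch in Python, assuming an 8 MHz CPU clock and a fixed 1-CPU/3-video slot rotation (both figures are assumptions for the arithmetic):

```python
# Minimal model of fixed-slot RAM arbitration: memory clock = 4x CPU
# clock, slot 0 serves the CPU, slots 1-3 serve the video controller.

SLOTS = ('cpu', 'video', 'video', 'video')

CPU_CLK_MHZ = 8
MEM_CLK_MHZ = CPU_CLK_MHZ * len(SLOTS)   # RAM accessed at 32 MHz

def owner(mem_cycle):
    """Which master owns a given memory clock cycle (fixed rhythm)."""
    return SLOTS[mem_cycle % len(SLOTS)]

# The CPU always gets its slot on time, once per CPU clock period:
[owner(c) for c in range(8)]
```

With SRAM this is trivially deterministic; the SDRAM objection in the quote is that these slots aren't consecutive addresses, which is exactly what SDRAM penalises.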

I don't think you will find a 5V tolerant TQFP FPGA, but I don't think that is really all that big of a problem. Look at the schematic for the Terasic DE2 board I have, (manual is downloadable) the FPGA it uses is not 5V tolerant but the GPIO can be interfaced to 5V logic without problems. Each IO pin has a dual diode between IOVcc and Gnd followed by a resistor so the diodes prevent the pin from getting pulled too high and the resistor limits the current that can be dissipated by the diodes. It's not as good as proper level shifting but in practice I have used it many times without issues. At the speeds a vintage Z80 runs you can get away with quite a lot.

Okay, sounds good.  If the Z80 can talk directly to the FPGA it would make things a fair bit simpler.

and afaict the Z80 has TTL input levels, so it'll work with 3.3V inputs, and the usual wimpy TTL high is already limiting the output current. The Xilinx datasheets usually say it is OK to use the ESD diodes for voltage limiting as long as the input current is limited to something like 20mA and the total is less than the sink current on that supply

I'm using a CMOS Z80 and mostly HCT glue logic, apart from a couple of LS parts in the MMU whose outputs the FPGA would be exposed to.  That's promising info though - so it could be possible to just connect the FPGA directly to the system's address and data lines?  I have a 3.3v power rail in the system that the FPGA would be powered from.
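For what it's worth, the series-resistor-plus-clamp-diode trick discussed above is just Ohm's law. A quick Python check, assuming a 0.7 V diode drop and a hypothetical 330R series resistor (neither figure comes from a datasheet):

```python
# Ohm's-law check for a 5 V output driving a 3.3 V-rail input through
# a series resistor, with the FPGA's clamp diode conducting at an
# assumed 0.7 V above the rail. The 330R value is a guess, not a spec.

def clamp_current_ma(v_drive=5.0, v_rail=3.3, v_diode=0.7, r_series=330):
    """mA pushed into the protection diode once it conducts."""
    return max(0.0, (v_drive - (v_rail + v_diode)) / r_series * 1000)

clamp_current_ma()   # ~3 mA: comfortably under a 20 mA-style limit
```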

KiCAD 4 was a hot mess two years ago... Fast forward a couple of years, and with the help of the universe-destroyers at CERN and a full-time project manager, KiCAD 5 has blossomed quite a bit. The symbol/footprint librarians have been given extra, dearly needed love. There's an interactive push-and-shove router. The three layout toolsets have been unified and rationalized. The whole thing now feels a bit less like a student project and more like a workflow for people to use. Even our gracious host Dave, who worked at Altium by the way, has been moderately impressed with it, even if some of the pro-level conveniences are lacking. If it's been a couple of years since you've had a look, I encourage you to give the latest a fresh hour or two.

Okay, I'll give it another go.  Being limited to 500 pins and 2 layers in my schematic and PCB work is a bit of a bind for me, especially now I'm looking at these FPGAs with pin counts in the hundreds. Heh, I'm even looking at some DIY reflow oven tutorials on YT.  I'm sure I read somewhere that someone used to do BGA soldering with their frying pan?  :o

Be aware that there is still no whole-board autorouter built into KiCAD, and you would need to use an external application for autorouting such as FreeRouting, TopoR, etc. Personally I don't consider that a problem as I prefer more control over where my traces go and how they get there. But, if that's a deal-breaker for you, fair enough, I won't push it.

It's not a deal breaker, but it has been a good reason to stay with DipTrace all this time and work within the licence limits.  I need to really just sit down with an hour or two to spare and put together one of my computer's card schematics in KiCAD and route the PCB and see how it goes.

As a general rule a designer should expect to create their own symbols for any devices out of the ordinary, and treat their presence in the standard or contributed libraries as a happy accident.

Of course, and I have been doing that with DipTrace.  The big difference is that I haven't had to do it that often and, when I have, it has been for relatively obscure parts.  The process of creating a new component and its associated footprint is also pretty easy and straightforward.  The only thing that gripes me with DipTrace is that it could be slightly more automated in net naming and attaching. Having to manually click on Every. Single. Pin. on a 16-bit address bus and select the next net address line in the list to link to it is a little tedious.  And that's just a 16-bit address bus...  I noticed the other day while playing with EasyEDA that it automatically increments the net name when you click on the next pin - so linking D0-D7 to a RAM chip is as simple as clicking, entering the first net name 'D0', then 6 more clicks on the rest of the data pins and you're done.  Some thought has gone into that UX.

Since you probably don't need more than a few megabytes of memory, I think SRAM is a safer choice. It is truly random access and has no confusing pipelining.

Okay, SRAM it is.  I certainly don't need any extra confusion - I haven't written a single line of VHDL yet, so this is going to be a steep-enough learning curve for me without adding in extra complexity.  :o

I have worked with FPGAs here and there, and pipelining still hurts my brain: when writing pipelined HDL code it's hard to keep track of which register is holding the data for which cycle of the pipeline. I usually end up drawing a timing diagram of the whole thing, so I can visually see how many cycles something is behind or ahead of something else, and then turn that timing diagram into code.

Righto, thanks Berni.  Steer clear of SDRAM, got it.  ;) :-+
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: migry on October 21, 2019, 04:15:36 pm
Well, I have been playing around with hobbyist (read: low-cost) FPGA boards (both Xilinx and Altera) for the past year, with a particular interest in VGA/video generation and retro computing, so I am going to chip in my 10 cents' worth.

Firstly I would recommend visiting YouTube and watching the videos of "Land Boards, LLC" as I think their videos are a perfect fit for answering your questions. I found this channel only recently. They have several videos on implementing Grant Searle's design on a couple of cheap FPGA boards one of which is a Cyclone II board which you might have. They also sell on Tindie and in their videos show a Static RAM expansion board (no idea if this can still be bought). BTW - I have no connection with this company. My interest was the videos they show of the cheap Chinese ZR-Tech Cyclone IV FPGA board. They keep mentioning their GitHub repository, so I think you are likely to find everything you need right there!

Without knowing your level of FPGA, RTL, electronics and video experience/knowledge it is difficult to give accurate advice, but I would suggest starting with simple video generation using VGA. There are many text-only VGA displays to be found (e.g. on OpenCores) which are a great starting point. You will need to get familiar with how video displays work: horizontal and vertical sync, and video timings. What is interesting is that today's HDMI "video" is based on the old CRT analogue video standards of yesteryear. Again there are lots of tutorials on the web and YouTube; just one can be found on the FPGA4fun website. Text-only video needs little memory, so it will fit in any FPGA and use internal RAM blocks. VGA allows you to look at the R, G, B and syncs using a scope (if you have one). This can be very educational.
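For reference, the standard 640x480@60 VGA timing works out like this. (These are the usual VESA figures; the nominal pixel clock is actually 25.175 MHz at 59.94 Hz, so the round-number 60 Hz arithmetic below is a slight approximation.)

```python
# The usual 640x480@60 VGA timing, written out as plain arithmetic.

H_VISIBLE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48
V_VISIBLE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33

h_total = H_VISIBLE + H_FRONT + H_SYNC + H_BACK   # 800 pixel clocks/line
v_total = V_VISIBLE + V_FRONT + V_SYNC + V_BACK   # 525 lines/frame
pixel_clock_hz = h_total * v_total * 60           # 25,200,000

def in_visible_region(x, y):
    """Given the h/v counters, should RGB be driven (vs blanked)?"""
    return x < H_VISIBLE and y < V_VISIBLE
```

The HDL version is just two counters wrapping at `h_total` and `v_total`, with the sync pulses and blanking derived by comparing against these same constants.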

If you want a graphics video solution, then you need frame buffer memory. The problem I faced was that if you want to fit the frame buffer entirely in internal RAM blocks, this means a very expensive FPGA. Most hobbyist boards use FPGAs with the smallest amount of RAM blocks to keep the cost low. Adding a fast static RAM is the easiest way to go, but some FPGA boards don't have enough spare I/O. Some hobbyist FPGA boards have a large SDRAM, but these are not so easy to understand and operate (again, check out Hamster's website), and access speed can be an issue. For example, one of the retro computer designs (MISTer???) doesn't use the on-board DDR(?) RAM, but adds an external RAM of some kind. So even these clever coders couldn't get the required bandwidth from the on-board memory.

Just FYI, my own solution was to design my own Cyclone II FPGA board/PCB incorporating a 12ns 128 KB static RAM. Even then there were signals crossing multiple clock domains, and this caused me a lot of problems  ???  (I am no expert, but then I'm no noob).

The "Land Boards" solution added a small daughter PCB with a SRAM for video and CPU memory.

BTW Digilent do a relatively low cost Artix-7 board which has a large fast SRAM, so that's worth a consideration. I have one. FYI, powering this board needs careful attention (otherwise expect lots of USB problems).

Once you have got experience with the "simple" VGA solution, then by all means investigate HDMI. As others have mentioned, "Hamster_NZ" has done some excellent work implementing HDMI and his website is a wealth of knowledge. I have re-coded his code in Verilog and ported it to a number of different FPGAs. One hurdle is finding a way to hook up the HDMI socket. There are certainly boards which incorporate a HDMI socket, but they tend to be more expensive. Some HDMI implementations use LVDS channels from the FPGA directly, but others implement HDMI using an Analog Devices chip (which might be an easier solution - I do not have one of these boards to try out). I have a board from Numato with HDMI in and out connectors. Also from Numato I bought a HDMI PMOD board, and this has worked well.

The downside of HDMI is that if it doesn't work, it is not so easy to debug. I ported my Verilog to a Cyclone II. I appear to get sync, but the video is wrong. I have run simulations and the channel data packets look correct. I have no idea how to start to debug this! Advice welcomed  :-+

Have fun, and post from time to time with updates or further questions.

regards...
--migry


Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 21, 2019, 10:04:54 pm
     Hmmm.   OP:  Can you definitively say what resolution you want for an 8 bit computer?  Earlier you said you wanted XXXkb.  Also, will you have a text mode with addressable modifiable font?  Will the font be dedicated in memory, or use a system memory?  How many colors?  Color palette?  Different video modes on different lines on the screen?  Sprites?

     The problem with a video controller is that its RAM access is demanding. If it has to share the system memory of your 8-bit CPU core, as in a cheap Atari 800-style graphics design where the video processor's and CPU's memory are intertwined, then even a minimal VGA output with X colors will eat a lot of your bandwidth.

     If 64kb is enough, and you want an all internal no ram design, look for a PLD/FPGA with at least 1 megabit internal so it may hold everything with some extra space, otherwise, at the level you seem to be at I strongly recommend buying an existing development eval board for either Altera or Xilinx with a VGA or HDMI out and at least 1 DDR/DDR2 ram chip at minimum.  Make sure the eval board is documented well and not a Chinese only demo code which you cannot dissect to your liking.

Example all in 1, 144pin qfp PLD (no boot prom needed, 1 single supply voltage, I narrowed the selection to 64kbyte and 128kbyte)
https://www.digikey.com/products/en/integrated-circuits-ics/embedded-fpgas-field-programmable-gate-array/696?FV=ii1290240%7C1329%2Cii1677312%7C1329%2Cii691200%7C1329%2C-8%7C696%2C16%7C13077&quantity=0&ColumnSort=1000011&page=1&stock=1&nstock=1&k=max10&pageSize=25&pkeyword=max10 (https://www.digikey.com/products/en/integrated-circuits-ics/embedded-fpgas-field-programmable-gate-array/696?FV=ii1290240%7C1329%2Cii1677312%7C1329%2Cii691200%7C1329%2C-8%7C696%2C16%7C13077&quantity=0&ColumnSort=1000011&page=1&stock=1&nstock=1&k=max10&pageSize=25&pkeyword=max10)

Yes, this is not the cheapest, but it is an all-in-1 IC with a single supply and no external parts except for a video DAC (use 4x 74HC574 and resistors if you like) and a TTL 3.3V-5V translator IC for the data bus, though you might get away with schottky diode clamps and series resistors on the inputs coming from your 5V digital logic.
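A quick sketch of the latch-plus-resistors DAC arithmetic in Python: a binary-weighted ladder from 3.3 V latch outputs into the monitor's 75-ohm termination, aiming for the 0.7 V VGA full-scale level. The resistor values below are illustrative assumptions, not a tested design:

```python
# Thevenin arithmetic for a binary-weighted resistor DAC into the
# 75-ohm VGA termination. 3.3 V drive and the values below are
# illustrative assumptions.

V_FS_TARGET = 0.7    # VGA full-scale (white) level
R_LOAD = 75.0        # termination inside the monitor

def ladder_output(resistors, bits, v_high=3.3, r_load=R_LOAD):
    """Output voltage with each resistor driven high (1) or low (0)."""
    g_total = sum(1.0 / r for r in resistors) + 1.0 / r_load
    g_high = sum(1.0 / r for r, b in zip(resistors, bits) if b)
    return v_high * g_high / g_total

# 3 bits per colour channel: all-ones should land near 0.7 V
ladder_output([500, 1000, 2000], [1, 1, 1])   # ~0.69 V
```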
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 21, 2019, 11:09:28 pm
Firstly I would recommend visiting YouTube and watching the videos of "Land Boards, LLC" as I think their videos are a perfect fit for answering your questions. I found this channel only recently. They have several videos on implementing Grant Searle's design on a couple of cheap FPGA boards one of which is a Cyclone II board which you might have. They also sell on Tindie and in their videos show a Static RAM expansion board (no idea if this can still be bought). BTW - I have no connection with this company. My interest was the videos they show of the cheap Chinese ZR-Tech Cyclone IV FPGA board. They keep mentioning their GitHub repository, so I think you are likely to find everything you need right there!

Thanks migry, that's a great recommendation - I'll take a look.  :-+

Without knowing your level of FPGA, RTL, electronics and video experience/knowledge it is difficult to give accurate advice...

FPGA level - absolute beginner, never tried programming one (yet)  ;)
RTL level - what's that?  :o
Video experience - I taught my mum and dad how to use their first VHS back in the 80's?  :-\

However, I make up for all that lack of knowledge/experience with a very inquisitive mind, eagerness to learn and able to listen to good advice.  I built my computer (image earlier in this conversation) using those attributes with no prior education or experience in electronics, so I'm confident I might get halfway to achieving my goal of reasonable 8-bit graphics.  That's not to say I think this will be a walk in the park! ;D

...but I would suggest starting with simple video generation using VGA. There are many text-only VGA displays to be found (e.g. on OpenCores) which are a great starting point. You will need to get familiar with how video displays work: horizontal and vertical sync, and video timings.

Yes, I've gone through Grant Searle's VHDL code to see how he did his text display and have read up on the video display inner workings - frame rates, pixel clocks, hsync/vsync, back porch/display/front porch, have seen videos on generating a VGA signal using nothing more than 74-series counter logic etc.  I'm hoping I don't annoy everyone here too much that I can't ask questions further down the line on the software (VHDL) aspect of the design.  ;)

Text-only video needs little memory, so it will fit in any FPGA and use internal RAM blocks.

Yep, see the screen resolutions table further down this reply - I've calculated how much memory each screen mode will require (though I'm not 100% certain they're totally correct, but I'm a beginner!) All the colour modes are using a colour LUT, so I'm calculating one byte per pixel for a maximum of 256 colours on screen at any one time out of a palette of... lots.


     Hmmm.   OP:  Can you definitively say what resolution you want for an 8 bit computer?  Earlier you said you wanted XXXkb.  Also, will you have a text mode with addressable modifiable font?  Will the font be dedicated in memory, or use a system memory?  How many colors?  Color palette?  Different video modes on different lines on the screen?  Sprites?

(https://i.ibb.co/5Lm0yst/Untitled.png) (https://imgbb.com/)

Above is a table of screen modes that I calculated I could fit into a relatively small (<64 KB) memory space, if I was going to go with an FPGA of sufficient size and complexity.  Larger memory would allow more colours, but 640x480 is my intended maximum resolution.  The lower resolution modes would be achieved using pixel stretching, or whatever the correct term is for duplicating the same pixel horizontally and the same line vertically to reduce the effective resolution.  The output that the TV would see would always be 640x480 (obviously I'm using PAL timings).
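As a cross-check on the table, the framebuffer sizes and the pixel-stretching address mapping are simple arithmetic. A Python sketch (the modes shown are examples, not the full table):

```python
# Bytes of framebuffer needed per mode, plus the address mapping for
# pixel doubling (repeat each framebuffer pixel 2^shift times in each
# direction to fill the fixed 640x480 output).

def fb_bytes(w, h, bpp):
    """Framebuffer size in bytes for w x h at bpp bits per pixel."""
    return (w * h * bpp) // 8

fb_bytes(640, 480, 1)   # 38400: 640x480, 2 colours
fb_bytes(320, 240, 4)   # 38400: 320x240, 16 colours
fb_bytes(160, 120, 8)   # 19200: 160x120, 256 colours (double-bufferable)

def fb_pixel(out_x, out_y, shift):
    """Output pixel -> framebuffer pixel when stretching by 2^shift."""
    return out_x >> shift, out_y >> shift
```

In hardware the `>> shift` is free: the video address generator just drops the low bits of the output counters.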

Mode 0 will be the text mode.  This is going to be the first screen mode I try to implement, as there is plenty of VHDL out there doing this already, Grant Searle's code for example.

The text mode, and any subsequent 'non-text' modes which will still be capable of displaying text as well as graphics, will initially use a symbol table in ROM.  My intention as I develop the VHDL design is to copy that table to RAM (if there is sufficient, naturally) at power-up and read it from RAM when displaying text, giving the user the option of changing the design of those symbols (changing fonts, for example, or allowing the use of special characters as graphics tiles).
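A toy model in Python of how the character generator might read that RAM-resident symbol table (the single '-' glyph and 8x8 row-per-byte format here are assumptions for the example, standing in for a full 256-entry font):

```python
# How a character generator might read the RAM-resident symbol table:
# fetch one row of the glyph, shift out one bit per pixel.

FONT_RAM = {0x2D: bytes([0, 0, 0, 0b11111111, 0, 0, 0, 0])}  # '-' glyph

def text_pixel(char_code, glyph_x, glyph_y):
    """1 if pixel (glyph_x, glyph_y) of the 8x8 glyph is set."""
    row = FONT_RAM[char_code][glyph_y]
    return (row >> (7 - glyph_x)) & 1

text_pixel(0x2D, 0, 3)   # the horizontal bar of '-'
```

Letting the Z80 write into `FONT_RAM`'s hardware equivalent is all "customisable fonts" amounts to.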

Currently, it looks as though I'll be using a fast SRAM to store the frame buffer, character tables, colour LUT, sprites etc.  But initially, I'll be using whatever is available in the FPGA as this will be the easiest (and fastest) solution whilst I develop the FPGA design.  Once I have the text mode working, I can start looking to implement a graphics mode in a resolution that will fit in the available FPGA RAM, but will look to move on to using an external dedicated SRAM chip to open up the screen resolutions etc.

The interface will be a 512 KB 'hole' in the system memory, which the FPGA will monitor for read/writes and action accordingly, and provide a few registers at certain memory locations to control things like clearing the screen, text colour etc.

     If 64kb is enough, and you want an all internal no ram design, look for a PLD/FPGA with at least 1 megabit internal so it may hold everything with some extra space, otherwise, at the level you seem to be at I strongly recommend buying an existing development eval board for either Altera or Xilinx with a VGA or HDMI out and at least 1 DDR/DDR2 ram chip at minimum.  Make sure the eval board is documented well and not a Chinese only demo code which you cannot dissect to your liking.

There are two issues with getting an FPGA with a large enough internal memory - otherwise I'd go straight for the Spartan 6 LX45 (or a more modern equivalent) - that's price and, most importantly, the package the FPGA comes in.  I don't have the equipment or skills to reflow a 484-pin BGA onto one of my PCBs.  I'm researching building a reflow oven and learning to use less restrictive (but less user-friendly) design software, but it could be a long time before I have the equipment and confidence to try soldering a £50 BGA onto a home-designed board.

I have a Spartan 6 LX16 dev board on its way with 32 MB SDRAM and a breakout HDMI connector that I can wire to the dev board, so that base is covered.  I already have a VGA connector, as well, so I can try either option whilst experimenting with the FPGA design.

Example all in 1, 144pin qfp PLD (no boot prom needed, 1 single supply voltage, I narrowed the selection to 64kbyte and 128kbyte)
https://www.digikey.com/products/en/integrated-circuits-ics/embedded-fpgas-field-programmable-gate-array/696?FV=ii1290240%7C1329%2Cii1677312%7C1329%2Cii691200%7C1329%2C-8%7C696%2C16%7C13077&quantity=0&ColumnSort=1000011&page=1&stock=1&nstock=1&k=max10&pageSize=25&pkeyword=max10 (https://www.digikey.com/products/en/integrated-circuits-ics/embedded-fpgas-field-programmable-gate-array/696?FV=ii1290240%7C1329%2Cii1677312%7C1329%2Cii691200%7C1329%2C-8%7C696%2C16%7C13077&quantity=0&ColumnSort=1000011&page=1&stock=1&nstock=1&k=max10&pageSize=25&pkeyword=max10)

Yes, this is not the cheapest, but it is an all-in-1 IC with a single supply and no external parts except for a video DAC (use 4x 74HC574 and resistors if you like) and a TTL 3.3V-5V translator IC for the data bus, though you might get away with schottky diode clamps and series resistors on the inputs coming from your 5V digital logic.

Okay, they're interesting devices...  No, they're not cheap but there's a huge plus straight away - they're hand-solderable LQFP packages and have reasonable amounts of internal RAM... thanks for the suggestion. :)
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 21, 2019, 11:58:58 pm
For video mode, use 480p with a 27MHz clock for your video out.  This is standard for all HDMI TVs, making outputting to them a breeze.  Use 640x480 centered inside the frame for 4:3, or expand to 720x480 for 16:9.
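The centering arithmetic is trivial; a Python sketch of mapping the 720-wide frame back to the 640-wide active area:

```python
# Centering 640 active pixels in a 720-wide 27 MHz frame: offset the
# active window by half the difference and blank the side borders.

ACTIVE_W, FRAME_W = 640, 720
x_offset = (FRAME_W - ACTIVE_W) // 2   # 40-pixel border each side

def frame_x_to_fb_x(x):
    """Frame column -> framebuffer column, or None in the borders."""
    fx = x - x_offset
    return fx if 0 <= fx < ACTIVE_W else None
```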

Go for the 1.28Mbit Max10, as it is only $3 more than the 512Kbit part.  With this, you will be able to superimpose color text on top of the 320x240x256-color paletted graphics, as well as full-vertical-height colored sprites.

Remember, you can still store the palette in registers instead of RAM, allowing for 1 immediate bank, with all the colors being updated during the v-sync for clean palette animation transitions.

  Internally, run the IC at 4x 27Mhz = 108Mhz (feed it 27MHz and use the internal PLL to 4x, 6x or 8x the source clock; use the PLL output to drive your DAC or HDMI transmitter clock), well within the Max10's 200MHz+ abilities.  This gives you multiple reads per pixel with the dumbest logic, including the ability to superimpose the 640x480 (or 720x480 for 16:9) text font on top of the 1/4-res 256-color 320x240 graphics mode, plus multiple colored sprites.

Use internal 2-port RAM, with one port read-only for video output, having 4-8 reads per pixel at a 108MHz core. (Doubling all your video mode specs; otherwise, you have something like 16 read slots per pixel, making that read port up to a 16-channel read port with a multiplexed address input and latched data outputs.)  And a second read/write port for your CPU access.

In fact, there is enough in this Max 10 to make an entire Atari 130xe with 6502 emulation & graphics and sound...
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: langwadt on October 22, 2019, 12:12:10 am
How many are you going to build? When you can get FPGA, RAM, PSU and flash on a board for ~$20, I'd think you'd have to make quite a few to make it cheaper.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: asmi on October 22, 2019, 12:18:29 am
There are two issues with getting an FPGA with a large enough internal memory - otherwise I'd go straight for the Spartan 6 LX45 (or a more modern equivalent) - that's price and, most importantly, the package the FPGA comes in.  I don't have the equipment or skills to reflow a 484-pin BGA onto one of my PCBs. 
Why do you need such a large package? Spartan-7 devices up to S50 are available in a FTGB196 package, which was specifically designed to be fully broken out on a 4 layer board, which are very cheap nowadays. This package has 100 user IO pins (2 banks containing 50 pins each), which should be plenty for your needs. And you can easily solder it using just a hot air gun.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 22, 2019, 12:46:11 am
With "effort", that Max10's PLL and LVDS transmitters are fast enough to directly drive 480p DVI. (HDMI compatible, but, no sound or EDID support)  Though, you will still need proper ESD protection on the HDMI port. with load termination resistors to adapt the voltages.

OR:
http://www.ti.com/product/TFP410/samplebuy (http://www.ti.com/product/TFP410/samplebuy)
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: asmi on October 22, 2019, 03:23:25 am
With "effort", that Max10's PLL and LVDS transmitters are fast enough to directly drive 480p DVI. (HDMI compatible, but, no sound or EDID support)  Though, you will still need proper ESD protection on the HDMI port. with load termination resistors to adapt the voltages.
Spartan-7 and Artix-7 support TMDS natively, easily outputting 720p and (with some effort) even 1080p on almost any pins (48 out of 50 pins in a bank can be used as differential pairs, and each pair has a pair of ISERDES/OSERDES bound to them).
As for ESD, I love TPD12S521 - it combines ESD protection for main data lanes, and voltage level translator 5 V <-> 3.3 V for DDC, CEC and HPD lines.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on October 22, 2019, 06:15:16 am
Yep, I recommend using the DVI serialiser chip that BrianHG suggested. It makes DVI as simple as VGA, since you just put the RGB data into the chip rather than into a DAC, all happening at nice low parallel clock speeds, so it's easy to do even 1080p 60Hz.

As for memory organization, I would recommend you keep the color palette and character ROM inside the FPGA block RAM. These memories are accessed on pretty much every pixel and are small enough to fit easily, so it will save you a lot of accesses to external memory. Only keep in external memory the large stuff that won't fit inside, such as framebuffers, tilemaps, drawcall tables...
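A block-RAM palette like the one described above is tiny. Here is a hedged VHDL sketch — entity and port names are my own invention, and the 256-entry/RGB888 sizing is just illustrative:

```vhdl
-- Illustrative 256-entry colour palette in block RAM: the CPU side writes
-- entries, the video side does one lookup per pixel clock.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity palette is
  port (
    clk   : in  std_logic;
    -- CPU side: update palette entries
    we    : in  std_logic;
    waddr : in  unsigned(7 downto 0);
    wdata : in  std_logic_vector(23 downto 0);   -- RGB888 entry
    -- video side: one lookup per pixel
    pixel : in  unsigned(7 downto 0);            -- 8-bit colour index
    rgb   : out std_logic_vector(23 downto 0)
  );
end entity;

architecture rtl of palette is
  type pal_t is array (0 to 255) of std_logic_vector(23 downto 0);
  signal pal : pal_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        pal(to_integer(waddr)) <= wdata;
      end if;
      rgb <= pal(to_integer(pixel));  -- registered read infers block RAM
    end if;
  end process;
end architecture;
```

The registered read is what lets synthesis tools map this onto a block RAM rather than distributed logic.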

As for graphics modes, I'd say you could combine modes because you have plenty of bandwidth. For example, having a 640x480 text mode overlayed on a 640x480 graphics mode with 16 or even 256 colors gives you an easy, fast way to show text on top of graphics. Though for any sort of games you probably don't want an oldschool PC CGA-like graphics mode anyway, as the Z80 might be kinda slow at drawing to it. To make good-looking games you probably want tilemap modes like the SNES. The video hardware in that game system has 4 layered tilemaps, each using its own color palette, with an additional sprite layer on top. The layers are transparently blended together and can be independently moved around at pixel granularity. This is how the SNES does the parallax scrolling that gives 2D platform games their impressive visuals.

EDIT:
Here is a good example of parallax scrolling put to good use: https://www.youtube.com/watch?v=VL7jR1NN4p0&feature=youtu.be&t=837 (https://www.youtube.com/watch?v=VL7jR1NN4p0&feature=youtu.be&t=837)
It's on a Sega Genesis/Megadrive, so it has similarly powerful graphics hardware to the SNES but works a bit differently.
The impressive things on the SNES generally use the famed Mode 7: https://www.youtube.com/watch?v=H6u7Nk6_L50&feature=youtu.be (https://www.youtube.com/watch?v=H6u7Nk6_L50&feature=youtu.be)
The hardware behind that is actually not that advanced; it's just a so-called affine transformation in the video chip combined with some clever game programming. These transformations need only multiply and add operations, so an FPGA can calculate them in a single clock cycle at high speed using hardware multiplier blocks.
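As a hedged sketch of those multiply-adds: a Mode-7-style mapping takes each screen pixel (sx, sy) to a source texture coordinate via u = a·sx + b·sy + tx and v = c·sx + d·sy + ty. All names, widths and fixed-point formats below are my own illustrative choices:

```vhdl
-- Per-pixel affine address generator (Mode-7 style), one multiply-add pair
-- per axis; maps directly onto DSP multiplier blocks.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity affine_addr is
  port (
    clk        : in  std_logic;
    sx, sy     : in  signed(10 downto 0);  -- screen pixel coordinate
    a, b, c, d : in  signed(15 downto 0);  -- 8.8 fixed-point matrix
    tx, ty     : in  signed(26 downto 0);  -- 19.8 fixed-point translation
    u, v       : out signed(18 downto 0)   -- integer source coordinate
  );
end entity;

architecture rtl of affine_addr is
begin
  process (clk)
    variable uu, vv : signed(26 downto 0);
  begin
    if rising_edge(clk) then
      uu := resize(a * sx, 27) + resize(b * sy, 27) + tx;
      vv := resize(c * sx, 27) + resize(d * sy, 27) + ty;
      u <= uu(26 downto 8);  -- drop the 8 fractional bits
      v <= vv(26 downto 8);
    end if;
  end process;
end architecture;
```

Changing the matrix coefficients once per scanline is what produces the SNES's rotating/scaling floor effects.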
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 22, 2019, 12:58:33 pm
...Go for the 1.28mbit Max10 as it is only 3$ more than the 512kbit.  With this, you will be able to superimpose color text ontop of the 320x240x256 color paletted graphics as well as full vertical height colored sprites.

Thank you - I appreciate the guidance as one thing I've come to realise during this discussion is that there is so much choice in FPGAs alone, not counting the different ways of interfacing to the system and connecting additional memory, that I'm getting quite confused by it all.

The only question I have about the Max10 is: will it be able to output direct to HDMI at 720x480?  I know the Spartans I've been looking at have SERDES support built in.


In fact, there is enough in this Max 10 to make an entire Atari 130xe with 6502 emulation & graphics and sound...

Well, that's the next thing - with all that space left over, it would be remiss of me NOT to put an AY-3-8910 sound generator (or something else) in there.  That would save me an entire (troublesome) PCB in my system... but where does that stop, I guess?  ::)


how many are you going to build? when you can get FPGA,RAM,PSU and flash on a board for ~$20 I'd think you'll have to make quite a few to make it cheaper

A handful initially - one or two, maybe.  The whole thing is a one-off prototype at the moment, but I have intentions down the line to open the design up for people who want to have a go at building it and maybe try to make some of my costs back.


Why do you need such a large package? Spartan-7 devices up to S50 are available in a FTGB196 package, which was specifically designed to be fully broken out on a 4 layer board, which are very cheap nowadays. This package has 100 user IO pins (2 banks containing 50 pins each), which should be plenty for your needs. And you can easily solder it using just a hot air gun.

Purely to have enough internal RAM to hold the frame buffer.  If it has the RAM I need and can drive an HDMI output without too much hassle, it sounds like a good idea.


With "effort", that Max10's PLL and LVDS transmitters are fast enough to directly drive 480p DVI. (HDMI compatible, but, no sound or EDID support)

It would be nice to have sound too.  :(

Though, you will still need proper ESD protection on the HDMI port. with load termination resistors to adapt the voltages.

OR:
http://www.ti.com/product/TFP410/samplebuy (http://www.ti.com/product/TFP410/samplebuy)
Spartan-7 and Artix-7 support TMDS natively, easily outputting 720p and (with some effort) even 1080p on almost any pins (48 out of 50 pins in a bank can be used as differential pairs, and each pair has a pair of ISERDES/OSERDES bound to them).
As for ESD, I love TPD12S521 - it combines ESD protection for main data lanes, and voltage level translator 5 V <-> 3.3 V for DDC, CEC and HPD lines.

Okay, so the TFP410 and a MAX FPGA are starting to look like the best option (the issues soldering the Spartan packages are a big hurdle, unfortunately) - will I need something like the TPD12S521 for ESD protection if I'm using a TFP410?


As for memory organization i would recommend you keep the color pallete and character ROM inside the FPGA block RAM. These memories are accessed pretty much on every pixel and are small enough to fit easily fit into it, so it will save you a lot of accesses to external memory. Only keep the large stuff that wont fit inside in the external memory such as framebuffers, tilemaps, drawcall tables...

Yes, that makes sense.  I think I'm going to try my best to keep it ALL in the internal block RAM if I can, with a view to expanding into external RAM as I get more experienced and comfortable with the design, VHDL and FPGAs in general.

I think I mentioned several times that I intend to keep the charset in ROM and copy it to RAM when the FPGA is powered on.  Just to clarify, now that I understand a little more about the FPGA architecture: as I understand it, the charset will be in RAM anyway, just re-initialised to its default data every time the FPGA is powered on, so changing the charset/symbol table will be possible from the get-go.

To make good looking games you probably want tilemap modes like the SNES. The video hardware in that game system has 4 layered tilemaps each using its own color pallete with an additional sprite layer on top. The layers are transparently blended together and can be independently moved around in pixel granularity. This is how the SNES does parallax scrolling that gives 2D platform games the impressive visuals.

Yes, that sounds like an achievable goal - though I'm going to take baby steps getting there, building on the basic text-only mode and improving it as I learn.

Here is a good example of parallax scrolling put to good use: https://www.youtube.com/watch?v=VL7jR1NN4p0&feature=youtu.be&t=837 (https://www.youtube.com/watch?v=VL7jR1NN4p0&feature=youtu.be&t=837)

I thought you were going to link me to something like Shadow of the Beast on the Amiga. ;D  Do you think what I'm designing would be capable of that sort of output (perhaps at a lower resolution?) 

The impressive things on the SNES generally use the famed Mode 7: https://www.youtube.com/watch?v=H6u7Nk6_L50&feature=youtu.be (https://www.youtube.com/watch?v=H6u7Nk6_L50&feature=youtu.be)
The hardware behind that is actually not that advanced, its just so called Affine transfromation in the video chip combined with some clever game programming. These transformation just need multiply and addition operations so a FPGA can calculate them in a single clock cycle at high speed using hardware multiplier blocks.

That's impressive stuff - some might say almost black magic for a Z80 to be able to pull off?!  (I know it's the power of the FPGA doing the gruntwork, but still, the Z80 would get the credit)  ;)
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on October 22, 2019, 03:33:11 pm
Yeah, you will be able to fit it all into internal block RAM no problem, as long as you keep framebuffers small, since those tend to be the largest. Indeed, taking baby steps is the way to go. The VGA-using-74xx-series-logic video is an excellent demonstration of how you would also develop video output on an FPGA. First you need to build a video timing generator; then, once that is working, set up pixel coordinate counters, feed the counters directly into the RGB data lines to make sure they generate the counting color pattern you expect, then hook up some memory between the counters and the RGB lines to actually generate some images. Once you have that, you essentially have a working video card onto which you can slowly tack features such as a text mode or tilemaps. This progressive workflow is excellent, especially for starting out, since you will have enough to learn as it is and probably won't be interested in design validation through simulators and timing analysis from the start. But once you've had enough headaches debugging a non-working design on real hardware, you start to see why HDL simulators and writing test benches are a good idea. Still, getting a cheap FPGA devboard and just having at it with simple projects is the best way to start.
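The first step described above — the video timing generator with pixel coordinate counters — can be sketched for standard 640x480@60 VGA timings roughly as follows (a 25.175 MHz pixel clock and negative sync polarity are assumed; the entity/port names are my own):

```vhdl
-- 640x480@60 VGA timing generator: 800x525 total, sync pulses at the
-- standard 640/16/96/48 horizontal and 480/10/2/33 vertical positions.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity vga_timing is
  port (
    pixclk       : in  std_logic;
    hsync, vsync : out std_logic;
    active       : out std_logic;            -- '1' during the visible area
    px, py       : out unsigned(9 downto 0)  -- pixel coordinates
  );
end entity;

architecture rtl of vga_timing is
  signal h : unsigned(9 downto 0) := (others => '0');  -- 0..799
  signal v : unsigned(9 downto 0) := (others => '0');  -- 0..524
begin
  process (pixclk)
  begin
    if rising_edge(pixclk) then
      if h = 799 then
        h <= (others => '0');
        if v = 524 then v <= (others => '0'); else v <= v + 1; end if;
      else
        h <= h + 1;
      end if;
    end if;
  end process;

  hsync  <= '0' when (h >= 656 and h < 752) else '1';  -- negative polarity
  vsync  <= '0' when (v >= 490 and v < 492) else '1';
  active <= '1' when (h < 640 and v < 480)  else '0';
  px     <= h;
  py     <= v;
end architecture;
```

Wiring `px`/`py` bits (through resistors) straight onto the RGB pins during `active` should produce the counting colour test pattern described above, before any memory is attached.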

As for what is possible, all the videos I have linked are perfectly possible on a Z80 with grunty enough video hardware. This also includes "Shadow of the Beast" for the Amiga; all the pretty multi-layer scrolling is the hard work of the video hardware. Each layer is its own image that is transparently overlayed by the video chip. Additionally, the Amiga has bitblit functionality, so it essentially has a crude 2D GPU that can move and draw images in memory on its own while the CPU is free for other tasks. So the CPU here has very little work to do, as it's only running the game logic, updating the positions of all the enemies, changing the coordinates of the layers to make them move in that sweet parallax way, and commanding the blitter to fill in the edges with more of the level as needed.

This video hardware is the reason why graphics on consoles looked so impressive while PC games looked like crap in comparison, despite faster CPUs. For a long time PCs had very dumb video hardware that could only barf out a framebuffer's contents to the monitor and that's it (ignoring text mode). So the CPU had to work hard and draw every single pixel. To get 3D graphics on PCs they had to cheat in all sorts of ways using raycasting, so early 3D games like Wolfenstein 3D, Doom, Quake etc. are not actually rendering in 3D, but are 2D worlds made to look 3D through graphics trickery. By the time PCs got fast enough to render proper textured-polygon 3D graphics, the consoles already had hardware-accelerated 3D chips in the Sony PS1 and Nintendo N64 that could look way better and run smoother. It was when PCs got hardware 3D accelerator cards like the Voodoo that they could finally rival console graphics.

Even today, graphics cards are the ones driving 3D towards looking ever more impressive. Even as CPUs hit the clock-speed barrier 15 years ago, graphics cards kept getting faster at an exponential rate, mainly thanks to the inherently parallel nature of graphics tasks: 15 years ago a graphics card might have had 16 processing cores, while today they have >1000 cores being fed data from a huge array of DDR memory chips at terabits per second. For a modern PC to render a beautiful, highly detailed 3D scene, all the CPU has to do is load in the models and textures, give the GPU the draw-call instruction list and say "Hey GPU, please render this for me". A few milliseconds later, using trillions of math operations and gigabytes of data moved around, out pops the finished image while the CPU was doing absolutely nothing.

EDIT:
Oh, and if you want to learn about the various graphics tricks that are used everywhere, this is one of the best YouTube channels for it:
https://www.youtube.com/watch?v=nBzCS-Y0FcY&feature=youtu.be&t=2310 (https://www.youtube.com/watch?v=nBzCS-Y0FcY&feature=youtu.be&t=2310)
The linked video is one of his more impressive things, where he creates a polygonal 3D rendering engine in pure C++ code running on the CPU, so it guides you through all the ideas and math that are used under the hood to generate 3D graphics. It's completely impractical because it's slow, but it shows exactly how it works because it's coded from the ground up without any graphics library. In his other videos he goes into all sorts of other really common graphics algorithms that are used everywhere, like the affine transform, vector drawing, pathfinding, collision detection, game physics, edge detection, image processing... etc.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on October 22, 2019, 08:46:03 pm
Well, that's the next thing - with all that space left over, it would be remiss of me NOT to put an AY-3-8910 sound generator (or something else) in there.  That would save me an entire (troublesome) PCB in my system... but where does that stop, I guess?  ::)
Two notes as preface: 1. There are MOS 6581 (https://github.com/alvieboy/ZPUino-HDL/tree/master/zpu/hdl/zpuino/contrib/NetSID/src) IP cores available, which you can instantiate just like any other VHDL entity, hook up to address/data buses, supply with appropriate clocks, and connect to the DAC of your choice. 2. Back in my home computer days, sound programming was my specialty. Over the course of it I have developed some Opinions on this matter. :)

The Amiga sound hardware was remarkably simple, and it's worth considering as a guide. Each channel isn't much more than a period down-counter, an address up-counter, a sample length down-counter, and a few double-buffered latches. Being relatively insensitive to latency, a double-buffer for sample data allows you to take your sweet time reading the next sample(s) from memory and simplifies its integration into the system. Very likely you have hardware multipliers on the FPGA for volume control, AM effects, and so on. I2S is a standard interface to audio codecs and is pretty easy to use for just digital output: a continuously-running clock, a shift register, and a word (left/right channel) select is about it.
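The counters described above can be sketched in VHDL roughly as one channel of such hardware. This is a hedged, simplified sketch — the names and widths are mine, and volume multiplication, DMA handshaking and the output serialiser are left out:

```vhdl
-- One Amiga-style sample channel: a period down-counter paces sample
-- fetches, an address up-counter walks through the sample, and a length
-- down-counter reloads address/length from the (double-buffered) latches
-- when the sample ends.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity audio_channel is
  port (
    clk        : in  std_logic;
    period     : in  unsigned(15 downto 0);  -- clock ticks per sample
    start_addr : in  unsigned(17 downto 0);  -- latched at each reload
    length     : in  unsigned(15 downto 0);  -- samples per loop
    fetch_addr : out unsigned(17 downto 0);  -- address presented to memory
    fetch_stb  : out std_logic               -- pulse: read next sample now
  );
end entity;

architecture rtl of audio_channel is
  signal per_cnt : unsigned(15 downto 0) := (others => '0');
  signal addr    : unsigned(17 downto 0) := (others => '0');
  signal len_cnt : unsigned(15 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      fetch_stb <= '0';
      if per_cnt = 0 then
        per_cnt   <= period;
        fetch_stb <= '1';            -- time for the next sample
        if len_cnt = 0 then          -- end of sample: reload from latches
          addr    <= start_addr;
          len_cnt <= length;
        else
          addr    <= addr + 1;
          len_cnt <= len_cnt - 1;
        end if;
      else
        per_cnt <= per_cnt - 1;
      end if;
    end if;
  end process;
  fetch_addr <= addr;
end architecture;
```

Because `fetch_stb` arrives at most once every `period` clocks, the memory system has plenty of slack to service it — which is the latency-insensitivity mentioned above.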

Remember when I half-jokingly suggested putting a Z80 core on the FPGA? Does that perchance remind you of any Sega console's sound hardware? ;) You wouldn't have to worry about DMA at all, just let the software do it...

Quote
Purely to have enough internal RAM to hold the frame buffer.  If it has the RAM I need and can drive an HDMI output without too much hassle, it sounds like a good idea.
You can use those block RAMs as FIFOs very easily (example user guides from Xilinx (https://www.xilinx.com/support/documentation/ip_documentation/fifo_generator/v12_0/pg057-fifo-generator.pdf), Altera (https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/ug/ug_fifo.pdf)). You can stream the entire next raster line from the external RAM into them at the fastest pace it allows then deal with it locally according to the pixel rate, a la hierarchical memory (https://en.wikipedia.org/wiki/Memory_hierarchy). In fact, this allows you to decouple the RAM clock from the pixel clock which makes your job a lot easier, enough that you can incorporate existing DRAM/SDRAM/DDR controller cores to expand the framebuffer (or can we now call it "chip") memory without limit.
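A simpler cousin of the FIFO idea above is a ping-pong line buffer in one dual-port block RAM: the memory side fills one half at its own clock while the pixel side scans out the other. A hedged sketch, with illustrative names and an assumed 640-pixel, 8-bit-per-pixel line:

```vhdl
-- Dual-clock ping-pong line buffer: two raster lines in one block RAM.
-- 'wline'/'rline' select which half is being filled vs. displayed.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity line_buffer is
  port (
    -- write side (memory/SDRAM clock domain)
    wclk  : in  std_logic;
    we    : in  std_logic;
    waddr : in  unsigned(9 downto 0);             -- 0..639 within a line
    wline : in  std_logic;                        -- half being filled
    wdata : in  std_logic_vector(7 downto 0);
    -- read side (pixel clock domain)
    rclk  : in  std_logic;
    raddr : in  unsigned(9 downto 0);
    rline : in  std_logic;                        -- half being displayed
    rdata : out std_logic_vector(7 downto 0)
  );
end entity;

architecture rtl of line_buffer is
  type ram_t is array (0 to 2047) of std_logic_vector(7 downto 0);
  signal ram : ram_t;
begin
  process (wclk)
  begin
    if rising_edge(wclk) then
      if we = '1' then
        ram(to_integer(wline & waddr)) <= wdata;  -- wline selects the half
      end if;
    end if;
  end process;

  process (rclk)
  begin
    if rising_edge(rclk) then
      rdata <= ram(to_integer(rline & raddr));
    end if;
  end process;
end architecture;
```

Swapping `wline`/`rline` once per horizontal retrace gives exactly the decoupling described: the SDRAM bursts a whole line in at its own pace, and the pixel clock never touches the external memory directly.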

Quote
I thought you were going to link me to something like Shadow of the Beast on the Amiga. ;D  Do you think what I'm designing would be capable of that sort of output (perhaps at a lower resolution?) 
Eventually. :)  Multi-playfields are largely about memory organization and bandwidth. One key is to decouple the pixel fetch start/stop timing, memory addressing, and row stride from the display window start/stop timing, so that the former can be moved around independently of the latter, and only require that the CPU redraw the columns/rows that would be revealed in the next frame before moving the pixel start address around. Another key observation is that Amiga etc. video memory was often planar-organized (https://en.wikipedia.org/wiki/Planar_(computer_graphics)) to conserve frame buffer memory and use the memory cycles allocated to the pixel data most efficiently. With bit planes being indexed, addressed, and fetched entirely independently, the addition of a dual-playfield mode requires no modifications to the memory fetching system, only a bit of steering logic in the pixel pipeline and a priority encoder to establish Z-index relationships. Have as many playfields as you like.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: rstofer on October 22, 2019, 10:28:02 pm
For pixel graphics, there is no reason that both sides of the BlockRAM need to have the same word width.  The CPU can read/write in bytes while the graphics controller can grab up some huge number of bits at a time.

https://www.xilinx.com/support/documentation/user_guides/ug473_7Series_Memory_Resources.pdf (https://www.xilinx.com/support/documentation/user_guides/ug473_7Series_Memory_Resources.pdf)  About page 21

In theory, and I concede to never having tried it, you could organize the video side of the BlockRAM word width to be an entire scan line.


Of course, with dual port memory, you don't need to do this but it might be a technique to just grab the entire raster line into the pixel shift register during horizontal retrace.
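The mixed-width idea can be sketched generically, though in practice you would usually let the vendor's RAM IP configure the port widths. A hedged illustration with an assumed 8-bit CPU write port and a 32-bit (four-pixels-per-clock) video read port; all names and the 4:1 ratio are my own choices:

```vhdl
-- Byte-wide CPU writes into a 32-bit-wide RAM, read 32 bits at a time by
-- the video side; the low CPU address bits select the byte lane.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity asym_ram is
  port (
    -- CPU side: 8-bit writes
    cclk  : in  std_logic;
    cwe   : in  std_logic;
    caddr : in  unsigned(11 downto 0);             -- 4096 byte addresses
    cdata : in  std_logic_vector(7 downto 0);
    -- video side: 32-bit reads
    vclk  : in  std_logic;
    vaddr : in  unsigned(9 downto 0);              -- 1024 word addresses
    vdata : out std_logic_vector(31 downto 0)
  );
end entity;

architecture rtl of asym_ram is
  type ram_t is array (0 to 1023) of std_logic_vector(31 downto 0);
  signal ram : ram_t;
begin
  process (cclk)
    variable word : integer range 0 to 1023;
  begin
    if rising_edge(cclk) then
      if cwe = '1' then
        word := to_integer(caddr(11 downto 2));
        case to_integer(caddr(1 downto 0)) is      -- byte-lane select
          when 0      => ram(word)( 7 downto  0) <= cdata;
          when 1      => ram(word)(15 downto  8) <= cdata;
          when 2      => ram(word)(23 downto 16) <= cdata;
          when others => ram(word)(31 downto 24) <= cdata;
        end case;
      end if;
    end if;
  end process;

  process (vclk)
  begin
    if rising_edge(vclk) then
      vdata <= ram(to_integer(vaddr));
    end if;
  end process;
end architecture;
```

The per-lane case statement is also the same byte-write-enable pattern BrianHG mentions later for fill/copy/clear operations.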

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 23, 2019, 12:16:13 am
With "effort", that Max10's PLL and LVDS transmitters are fast enough to directly drive 480p DVI. (HDMI compatible, but, no sound or EDID support)

It would be nice to have sound too.  :(

Understand that DVI transmitter cores are freely available.  I'm not sure about HDMI (maybe copyrights, but a public-domain transmitter may exist).  However, when saying sound, this means that whatever your sound generator generates, you would need to resample any audio to the standard fixed 48 kHz audio within the HDMI standard.  Also, HDMI is YUV, not RGB like DVI, so you will also need a color-space converter.  (Easy peasy, as it is nothing more than 3x3 multiply-adds...)
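Those 3x3 multiply-adds can be sketched as below, using the common integer approximation of the BT.601 RGB-to-YCbCr matrix (coefficients pre-scaled by 256; the names, widths and omitted rounding are my own illustrative choices — and note a later reply points out HDMI can also carry RGB, so this stage may not be needed at all):

```vhdl
-- RGB -> YCbCr (BT.601, 8-bit), three multiply-add rows per pixel clock.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity rgb2ycbcr is
  port (
    clk       : in  std_logic;
    r, g, b   : in  unsigned(7 downto 0);
    y, cb, cr : out unsigned(7 downto 0)
  );
end entity;

architecture rtl of rgb2ycbcr is
begin
  process (clk)
    variable rs, gs, bs   : signed(9 downto 0);
    variable yy, cbb, crr : signed(19 downto 0);
  begin
    if rising_edge(clk) then
      rs := signed(resize(r, 10));
      gs := signed(resize(g, 10));
      bs := signed(resize(b, 10));
      -- coefficients premultiplied by 256; offsets 16*256 and 128*256
      yy  :=  66*rs + 129*gs +  25*bs + 4096;
      cbb := -38*rs -  74*gs + 112*bs + 32768;
      crr := 112*rs -  94*gs -  18*bs + 32768;
      y  <= unsigned(yy (15 downto 8));  -- divide by 256
      cb <= unsigned(cbb(15 downto 8));
      cr <= unsigned(crr(15 downto 8));
    end if;
  end process;
end architecture;
```

Each output row is one DSP-friendly dot product, which is why BrianHG calls it easy on an FPGA with hardware multipliers.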

Now, the Max10 can run up to 400 MHz internally and the LVDS serial bus up to 750 MHz.  That means 720p capability, but with only 128 KB of RAM and an 8-bit CPU, you are not striving for this...
You have enough logic and hardware multipliers to do the colorspace conversions.
You have the internal dual-port RAM speed to superimpose translucent text with animated color character fonts.
Registers galore to control the playfield, sub-pixel video offsets of font tiles for smooth-scrolling tricks, and the number of bytes per line with margins.
Also, on your video output mux, where you output layers with translucent colors, don't forget to add pixel/sprite collision detection registers.

In the Max10, you should be able to make an enhanced 16-bit version of a 6502 processor running at over 108 MHz, maybe at 8x the 27 MHz video clock, i.e. 216 MHz.  It would make a SNES look like a snail.

You would never have to worry about external DRAM access or memory-interleave timing.

With all the FPGA's RAM as dual-port blocks, one side would be a dedicated video and audio generator while the other would have interrupt-free access to an external CPU, or an internal soft-core CPU.

Yes, on my video designs, I have my read-only video port side set to 32 bit (note that I have 30-bit color video, or 24-bit color with an 8-bit alpha-blend channel) and my other side set to 256 bit (I have access to 128-bit DDR2 memory modules; yes, this speed is needed for a 4-layer 1080p video blender).  In your case, you would want 8 or 16 bit on your read-only video output port, as you only go up to 256 colors if you want to use 1 byte per pixel.  It also makes access to your font bitmap easily addressable to any byte in RAM.  You did say this is for an 8-bit computer.  On the CPU side, you can choose 8, 16 or 32 bit if you like, but realize you may need to use a byte write-enable, depending on how your CPU or graphics image processor moves individual bytes around for effects like fill, copy, clear and stencil.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: hamster_nz on October 23, 2019, 01:24:05 am
With "effort", that Max10's PLL and LVDS transmitters are fast enough to directly drive 480p DVI. (HDMI compatible, but, no sound or EDID support)

It would be nice to have sound too.  :(


If the H/W can do DVI-D video, it can also do HDMI with audio. Once you get video working, have a look at this project, which implements the HDMI data islands required for proper HDMI video+audio support:

https://github.com/charcole/NeoGeoHDMI

But consider it a stretch goal!
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: legacy on October 23, 2019, 05:36:15 am
A thing I would really appreciate is a PCI video framebuffer, made as a neat and fully documented design.
A VDU and 2D primitives to support X11 are all it needs to have.

There are PCI bridges to a local legacy bus (a kind of ISA: 16-bit, with fixed addresses and IRQ), hence even a simple VGA controller for an 8-bit computer can be interfaced.

Not too bad at all, and this would be seriously appreciated on non-x86 workstations, where the only good things you can do are:
- adding a Matrox M1/M2
- hacking an ATI XL

Both were reverse-engineered, hence support for what the hardware can really do is limited to about 20%.

A mini-PCI video card made on an FPGA would be super premium.


ah, sweet dreams ... are made of this :D
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: asmi on October 23, 2019, 02:31:02 pm
A thing I would really appreciate is a PCI Video framebuffer, made on neat and fully documented design.
PCI is quite a simple bus, with the full specification publicly available. So I don't see any major obstacles to implementing this. Essentially PCI is nothing more than a DMA channel into system memory.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: langwadt on October 23, 2019, 03:38:48 pm
With "effort", that Max10's PLL and LVDS transmitters are fast enough to directly drive 480p DVI. (HDMI compatible, but, no sound or EDID support)

It would be nice to have sound too.  :(

Understand that DVI transmitter cores are free available.  I'm not sure about HDMI (maybe copyrights, but a public domain transmitter may exist), however when saying sound, this means whatever your sound generator generates, you would need re-sample convert any audio generated to the standard fixed 48Khz audio within the HDMI standard.  Also, HDMI is YUV not RGB like DVI, so, you will also need a color space converter.  (easy peasy as it is nothing more than 3x3 multiply adds...)

AFAIK HDMI supports both YUV and RGB.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 23, 2019, 04:00:06 pm
Yeah you will be able to fit it all into internal block RAM no problem as long as you keep framebuffers small since those tend to be the largest. Indeed taking baby steps is the way to go... But still getting a cheap FPGA devboard and just having at it with simple projects is the best way to start.

I think I'm going to make a start fairly soon with my Cyclone II.  Yes, it's old, but I should be able to get a VGA text display up and running, perhaps even with some basic graphics, in the short term.  When the Spartan 6 (LX16) dev board turns up, I'll move development to that and start looking at the switch to HDMI and building out the resolutions a little further, with more work on the GPU side to flesh out the graphics capabilities.

As far as the Intel MAX 10 series (https://www.intel.co.uk/content/dam/www/programmable/us/en/pdfs/literature/pt/max-10-product-table.pdf) goes, I agree they're probably the best I can get in a package that I am (and more importantly, others are) able to solder by hand.  I'm thinking for the sake of another fiver (~$6), I'll get the 10M50, which has over 200 KB of RAM and should give me all the room I need to do what I want with the graphics and sound.

This leads me on to choice of a development board for the 10M50, though.  Sheesh - they're not cheap for a hobbyist at my level!  I've had a look at these two:

AliExpress DE10 Lite 10M50 CPLD Development Board with VGA (https://www.aliexpress.com/item/32842850135.html) - £74
AliExpress 10M50 Eval Kit with HDMI (https://www.aliexpress.com/item/32680316361.html) - £130

Now, out of the two I prefer the 10M50 Eval Kit - yes, it's nearly £50 more expensive, but it has less rubbish taking up IO pins that I won't use, and it has an HDMI port built in.

Anyone have any opinions on these or suggestions for alternatives?

...So the CPU here has very little work to do as its only running the game logic, updating positions of all enemies, changing coordinates of the layers to make them move in that sweet parallax way and commanding the bitlit to fill in the edges with more of the level as needed.

I guess there's no reason why the FPGA can't be a graphics co-processor and blitter in one device if I can build those functions into it.  What was it, jhpadjustable?  Agnus, Denise and Paula rolled into one? ;)


Two notes as preface: 1. There are MOS 6581 (https://github.com/alvieboy/ZPUino-HDL/tree/master/zpu/hdl/zpuino/contrib/NetSID/src) IP cores available, which you can instantiate just like any other VHDL entity, hook up address/data buses, supply with appropriate clocks, and connect to the DAC of your choice.

Yes, I've found AY-3-891x VHDL code as well, so it looks relatively easy to sort out.  The only remaining consideration is how I get that audio data down the line to the TV.

2. Back in my home computer days, sound programming was my specialty. Over the course I have developed some Opinions on this matter. :)

Ah, pull up a chair and have a beer.  I may need to pick your brains later on when (if!) I get to the point of sorting out audio in VHDL!! ;)

The Amiga sound hardware was remarkably simple, and it's worth considering as a guide. Each channel isn't much more than a period down-counter, an address up-counter, a sample length down-counter, and a few double-buffered latches. Being relatively insensitive to latency, a double-buffer for sample data allows you to take your sweet time reading the next sample(s) from memory and simplifies its integration into the system. Very likely you have hardware multipliers on the FPGA for volume control, AM effects, and so on. I2S is a standard interface to audio codecs and is pretty easy to use for just digital output: a continuously-running clock, a shift register, and a word (left/right channel) select is about it.

You make it sound so easy!  :o  I can just about play the theme tune to Red Dwarf on a keyboard.  There endeth my knowledge and skills on the subject of audio and music!  ;D  I2S? Audio codecs? Digital output?  So I could output in MP3 or something?

You can use those block RAMs as FIFOs very easily (example user guides from Xilinx (https://www.xilinx.com/support/documentation/ip_documentation/fifo_generator/v12_0/pg057-fifo-generator.pdf), Altera (https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/ug/ug_fifo.pdf)). You can stream the entire next raster line from the external RAM into them at the fastest pace it allows then deal with it locally according to the pixel rate, a la hierarchical memory (https://en.wikipedia.org/wiki/Memory_hierarchy). In fact, this allows you to decouple the RAM clock from the pixel clock which makes your job a lot easier, enough that you can incorporate existing DRAM/SDRAM/DDR controller cores to expand the framebuffer (or can we now call it "chip") memory without limit.

Okay, that's an interesting idea for when/if I get round to using external RAM (it'd make the FPGA choice a lot cheaper!)  Set up a couple of pixel stream FIFOs in internal RAM and use the external SDRAM to burst-read into the one not being read from at that moment, so access to the external RAM would be in short bursts to read the contents, allowing the GPU to access it for data writes etc. in-between?

Eventually. :)  Multi-playfields are largely about memory organization and bandwidth. One key is to decouple the pixel fetch start/stop timing, memory addressing, and row stride from the display window start/stop timing, so that the former can be moved around independently of the latter, and only require that the CPU redraw the columns/rows that would be revealed in the next frame before moving the pixel start address around. Another key observation is that Amiga etc. video memory was often planar-organized (https://en.wikipedia.org/wiki/Planar_(computer_graphics)) to conserve frame buffer memory and use the memory cycles allocated to the pixel data most efficiently. With bit planes being indexed, addressed, and fetched entirely independently, the addition of a dual-playfield mode requires no modifications to the memory fetching system, only a bit of steering logic in the pixel pipeline and a priority encoder to establish Z-index relationships. Have as many playfields as you like.

There's a heck of a lot of jargon in there I just don't understand - "pixel fetch start/stop timing", "row stride", "display window start/stop timing"... so much to learn.  Is there a book on this stuff anywhere, or a website, or more YT videos?  I have yet to sit down with some spare time and go through some of the linked videos in the YT links people have made in previous replies, though...

Understand that DVI transmitter cores are free available.  I'm not sure about HDMI (maybe copyrights, but a public domain transmitter may exist), however when saying sound, this means whatever your sound generator generates, you would need re-sample convert any audio generated to the standard fixed 48Khz audio within the HDMI standard.  Also, HDMI is YUV not RGB like DVI, so, you will also need a color space converter.  (easy peasy as it is nothing more than 3x3 multiply adds...)

Just to clear up a little confusion for me - DVI sort of equals HDMI?  Would I be able to use an HDMI connection to a TV with a DVI IP core?

The colour space converter sounds like a simple bit of VHDL to add to the project, but what about the audio conversion?  Are there IP cores for that sort of thing?

Also, on your video output mux output layers with translucent colors, don't forget to add pixel / sprite collision detection registers.

That'll come later, but yes I'll need to think about how to add collision detection etc.

You would never have to worry about external Dram access or memory interleave timing.

I do like the sound of that.  It's just a high price to pay and I can see that I'll want to try to overcome the issues with external RAM so that I can use a smaller/cheaper FPGA.

...but realize you may need to use a byte write enable depending on how your CPU or graphics image processor may move individual bytes around for effects like fill, copy, clear and stencil.

Huh? Sorry, I don't follow.  Could you expand on this a little, please?   :-\


Title: Re: FPGA VGA Controller for 8-bit computer
Post by: legacy on October 23, 2019, 05:44:18 pm
I mean, if someone develops a simple video card for an 8-bit computer, I know how to add a PCI interface to it: via a PLX chip. Already done - my team has the know-how and a dev board to manage it.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on October 23, 2019, 06:02:05 pm
Yes, having the video output hardware and blitter in the same FPGA makes a lot of sense, since both need access to graphics RAM. It's not all that difficult to mux the memory bus inside the FPGA around to the different parts that need it. That blitter will let you move things around in RAM easily >10 times faster than the Z80 could.

In fact you can even use the same graphics RAM to hold audio data too. Typically memory is not needed for that, since audio from the 8-bit era tends to be chiptune-like, for example the Commodore SID chip. But later on MOD music became popular, since it had the flexibility of wave files but was still small in size. This does need memory to hold the audio samples in RAM for quick access, and it would be quite a lot of work for a Z80 to turn into wave audio to send to a DAC. So if you want an audio chip that can play pretty much anything, you might want a MOD player in the FPGA that can play these files from memory without the CPU's intervention. But for a start it's best to stick to some basic analog synthesis to generate beeps when you write to its register, since that can be written and tested in HDL in a few hours if you know what you are doing.

For audio output out of the FPGA I would just have it generate a fast PWM signal that gets smoothed by an RC filter into analog audio. It only takes about 5 lines of code to write a PWM generator in HDL, and everyone has a spare resistor and capacitor lying around. It makes for surprisingly high-quality audio when the PWM is running fast enough. But if you want proper hi-fi quality audio you can always add an I2S DAC chip to the pins. The I2S bus is almost the same as SPI, so it's really easy to generate in HDL too. You still couldn't play back MP3 though, because decoding MP3 requires quite a bit of floating point math, and I'm sure your poor Z80 wouldn't enjoy that. It could be possible if you also added a floating point math coprocessor to the FPGA so that the math could run nice and fast, but this is really pushing the Z80 into things it's really not meant to do.
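To illustrate that PWM trick, here's a behavioural model in Python (the function name and 8-bit resolution are just for illustration). The duty cycle it computes is the average level the RC filter recovers:

```python
# Behavioural model of the "5 lines of HDL" PWM audio DAC described
# above: a free-running counter compared against the sample value.
# The RC filter on the pin averages the output back into analog.

def pwm_duty(sample, bits=8):
    """Fraction of time the PWM output is high for one sample --
    this average is what the RC filter recovers as audio level."""
    top = 1 << bits
    # Mirror the hardware comparator: output is high while count < sample.
    high = sum(1 for count in range(top) if count < sample)
    return high / top

print(pwm_duty(128))  # half-scale sample -> 0.5 duty cycle
print(pwm_duty(64))   # quarter-scale    -> 0.25
```

In HDL the `sum` loop collapses to a single `count < sample` comparison evaluated every clock, which is where the "about 5 lines" figure comes from.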

As for HDMI and DVI: pretty much, DVI is part of the HDMI feature set. It uses the same 3 data + 1 clock diff pairs with the same signaling levels, and uses the same TMDS encoding to DC-balance the signal. Both DVI and HDMI support the RGB color format, but HDMI can additionally also handle YUV. It's all still just glorified LVDS video with fancy encoding on top. What HDMI does differently is sneaking extra data into the video blanking periods - this is where audio and other stuff is sent - and there is support for encrypting the whole video stream via HDCP (this is just the movie industry trying to keep people from pirating stuff, but it's hilariously ineffective). All this is why you can create a DVI to HDMI adapter by simply cutting apart two cables and splicing together the right pairs; both are almost the same thing, just on a different connector.

I'd say for a start stick to raw VGA directly from the pins, so that you can learn all about video timings in a way that is easy to look at with a scope. It's easy to then turn that into DVI by pretty much running the exact same signals into a DVI serializer/encoder block rather than straight out of the pins. Just get a cheap FPGA board and play around rather than worrying about it - you can always buy a bigger board later on. Just keep in mind that porting code between two FPGAs of the same vendor is pretty easy, while porting it to an FPGA from a different vendor is usually not easy at all (the HDL code is perfectly portable, but any hardware-specific blocks like block RAM and FIFOs are not), and you have to relearn a completely different IDE.
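To make the "raw VGA first" suggestion concrete, a VGA timing generator boils down to two chained counters. Here is a software sketch using the standard 640x480@60 numbers (an illustrative model, not HDL):

```python
# Behavioural sketch of the VGA timing counters suggested above,
# using the standard 640x480@60 figures: 640+16+96+48 = 800 clocks
# per line, 480+10+2+33 = 525 lines, 25.175 MHz pixel clock.

H_VISIBLE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48
V_VISIBLE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33
H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK   # 800
V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK   # 525

def vga_signals(clock):
    """Decode one pixel-clock tick into (x, y, hsync, vsync, visible)."""
    x = clock % H_TOTAL                 # horizontal counter
    y = (clock // H_TOTAL) % V_TOTAL    # vertical counter, chained off it
    hsync = H_VISIBLE + H_FRONT <= x < H_VISIBLE + H_FRONT + H_SYNC
    vsync = V_VISIBLE + V_FRONT <= y < V_VISIBLE + V_FRONT + V_SYNC
    visible = x < H_VISIBLE and y < V_VISIBLE
    return x, y, hsync, vsync, visible

print(H_TOTAL * V_TOTAL)                    # 420000 clocks per frame
print(25_175_000 / (H_TOTAL * V_TOTAL))     # ~59.94 frames per second
```

In HDL the two counters and three comparisons translate almost line for line, and every edge is easy to verify on a scope.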
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: asmi on October 23, 2019, 06:18:16 pm
I mean, if someone develops a simple video card for 8-bit computer, I know to add a PCI interface to it: via PLX chip. Already done, my team has the know/how and a dev-board to manage it.
If the only thing you want from the card is to hold a bitmap and output it via some interface, this can be done using Vivado with zero HDL code: MIG to interface with framebuffer memory, PCIE to AXI bridge to read/write data to the framebuffer from the host, "Video Frame Buffer Read" to stream frame from memory, VIC and PHY of your choice to actually output the video to the interface of your choice.
But a video card is far from being only "framebuffer adapter". In addition to all of above it needs to provide some kind of drawing and composition functions. This is even more important if your CPU is very slow. And this is the hard part.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on October 23, 2019, 07:06:02 pm
I guess there's no reason why the FPGA won't be a graphics co-processor and blitter in one device if I can build those functions into it.  What was it, jhpadjustable?  Agnus, Denise and Paula rolled into one? ;)
Eventually ;) Paula's duties included ADCs for the game port, DACs for audio, the UART (you probably already have one) and floppy disk I/O (you probably don't have one). Those don't directly translate to the FPGA model, but she can be there in spirit.

Quote
Ah, pull up a chair and have a beer.  I may need to pick your brains later on when (if!) I get to the point of sorting out audio in VHDL!! ;)
I'm more of a Verilog guy but cheers, I'll keep an eye out and help out where I can. :)

Quote
You make it sound so easy!  :o  I can just about play the theme tune to Red Dwarf on a keyboard.  There endeth my knowledge and skills on the subject of audio and music!  ;D  I2S? Audio codecs? Digital output?  So I could output in MP3 or something?
Not even Popcorn? For shame! Anyway... codec has an archaic definition of a full-duplex digital-analog converter, an ADC and DAC in one package, usually with a common clock and common framing signals. I should have said ADC/DAC or something like that. MP3 output would be possible, but doesn't seem very interesting due to the lack of direct output devices and the latency, and the rather large number of gates required. TOSlink might be cool, if you have a use for it. The nice thing about all this extra hardware is that you can place footprints on the board for it and not stuff them, or stuff them and just not hook your HDL up to them, leaving them as exercises for the experimenter.

Quote
Okay, that's an interesting idea for when/if I get round to using external RAM (it'd make the FPGA choice a lot cheaper!)  Set up a couple of pixel stream FIFOs in internal RAM and use the external SDRAM to burst-read into the one not being read from at that moment, so access to the external RAM would be in short bursts to read the contents, allowing the GPU to access it for data writes etc. in-between?
Better than that, you can use a single FIFO and fill it whenever you feel like it and there's room. You don't need to double buffer because they're inherently semi-dual ported. But you've got the general idea right.

Quote
There's a heck of a lot of jargon in there I just don't understand - "pixel fetch start/stop timing", "row stride", "display window start/stop timing"... so much to learn.  Is there a book on this stuff anywhere, or a website, or more YT videos?  I have yet to sit down with some spare time and go through some of the linked videos in the YT links people have made in previous replies, though...
The Amiga HRM I linked earlier is an excellent, accessible resource that's fit for people with basic programming and hardware knowledge. While they won't tell you much about the gate-level implementation, Chapter 3, Playfield Hardware, describes the general concepts of raster video display in the context of the specific case of the Amiga OCS. Translating from my language to theirs, pixel fetch is called data fetch, row stride is approximately equivalent to playfield width, and display window is display window.
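In case it helps, "row stride" comes down to one line of address arithmetic - and the stride can be wider than the visible width, which is how oversized scrolling playfields work (the numbers below are made up):

```python
# "Row stride" in one line: the byte address of pixel (x, y) in a
# linear framebuffer. A stride wider than the visible width gives
# you a virtual playfield to scroll a display window around in.

def pixel_address(base, x, y, stride_bytes, bytes_per_pixel=1):
    return base + y * stride_bytes + x * bytes_per_pixel

# 320-pixel-wide visible window inside a 512-byte-wide virtual playfield:
print(pixel_address(0x0000, 10, 2, 512))  # -> 1034 (2*512 + 10)
```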

Quote
Huh? Sorry, I don't follow.  Could you expand on this a little, please?   :-\
I believe he's referring to the rather common possibility that an external RAM chip's external data bus is more than one byte wide, and therefore has a few extra byte-select signals that need to be considered when writing only part of its total width. I think you'll find this ISSI pseudo-static RAM in a hobbyist-friendly TSOP-II package (https://www.digikey.com/products/en?mpart=IS66WV51216EBLL-55TLI&v=706) and its datasheet accessible and interesting. Its byte write enables are UB# and LB#. They're much like an extra set of chip enables for each separate byte, so that only the desired byte parts of the two-byte-wide bus are driven or sampled. The truth table on datasheet page 3 and timing diagrams throughout may help.
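A tiny behavioural model of those byte lanes may help (the class and names are invented for illustration) - note how an 8-bit write lands without a read-modify-write:

```python
# Model of the byte write enables described above (UB#/LB# on the
# ISSI part): a 16-bit-wide RAM where each byte lane has its own
# enable, so a one-byte write doesn't need a read-modify-write.

class Ram16:
    def __init__(self, words):
        self.mem = [0] * words

    def write(self, addr, data, ub=True, lb=True):
        """Write a 16-bit word; ub/lb gate the high/low byte lanes."""
        word = self.mem[addr]
        if lb:
            word = (word & 0xFF00) | (data & 0x00FF)
        if ub:
            word = (word & 0x00FF) | (data & 0xFF00)
        self.mem[addr] = word

ram = Ram16(16)
ram.write(0, 0xAABB)            # full 16-bit write
ram.write(0, 0x00CC, ub=False)  # low byte only -> word becomes 0xAACC
print(hex(ram.mem[0]))
```

On a real chip the un-enabled lane's pins simply aren't driven or sampled; the read-modify-write here only exists because Python has no tri-state bus.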

So if you want an audio chip that can play pretty much anything you might want to have a MOD player in a FPGA that can play these files from memory without the CPUs intervention.
As someone who is not completely unknown to the exotica branches of the Amiga audio community... Hard-coding the MOD format in HDL? Just... no. It will only get in the way. I would strongly suggest, for simplicity, borrowing the Amiga's audio architecture whole, and letting the main CPU be interrupted at however many ticks per second to adjust the DMA/pitch/volume registers as needed. That way it's much easier to borrow channels for sound effects over/under the music (if desired), and allows for 16-bit samples, various packed tracker module formats or multichannel formats like FastTracker, and formats that are not reducible to the drum-machine tracker paradigm, such as TFMX. We've got to give the poor Z80 something to do, haven't we? ;D

Since this isn't for a proprietary product, IMHO it's worth leveraging any open IP cores that can help rather than do all that from scratch. I think MP3 needs more than a floating point coprocessor, probably closer to a complex math accelerator with its own (small) program store to do IDCTs and such.

Quote
(The HDL code is perfectly portable but any hardware specific blocks like block RAM and FIFOs are not) and you have to relearn a completely different IDE.
But most logic synthesizers will be able to infer memory from HDL, perhaps with a bit of coaxing, and FIFOs are just control logic around those memories, no? At the speeds of interest, fabric implementation of FIFO functionality would do, no?

But a video card is far from being only "framebuffer adapter". In addition to all of above it needs to provide some kind of drawing and composition functions. This is even more important if your CPU is very slow. And this is the hard part.
CGA outputs video. It's in the form of a card. What exactly are you insinuating here?   :box:
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on October 23, 2019, 07:39:37 pm
As someone who is not completely unknown to the exotica branches of the Amiga audio community... Hard-coding the MOD format in HDL? Just... no. It will only get in the way. I would strongly suggest, for simplicity, borrowing the Amiga's audio architecture whole, and letting the main CPU be interrupted at however many ticks per second to adjust the DMA/pitch/volume registers as needed. That way it's much easier to borrow channels for sound effects over/under the music (if desired), and allows for 16-bit samples, various packed tracker module formats or multichannel formats like FastTracker, and formats that are not reducible to the drum-machine tracker paradigm, such as TFMX. We've got to give the poor Z80 something to do, haven't we? ;D

Since this isn't for a proprietary product, IMHO it's worth leveraging any open IP cores that can help rather than do all that from scratch. I think MP3 needs more than a floating point coprocessor, probably closer to a complex math accelerator with its own (small) program store to do IDCTs and such.

Quote
(The HDL code is perfectly portable but any hardware specific blocks like block RAM and FIFOs are not) and you have to relearn a completely different IDE.
But most logic synthesizers will be able to infer memory from HDL, perhaps with a bit of coaxing, and FIFOs are just control logic around those memories, no? At the speeds of interest, fabric implementation of FIFO functionality would do, no?

Well, I didn't mean a fully autonomous MOD file format player that could read a .mod file byte by byte - just audio hardware that is well suited to playing one back, like the ability to play various wave samples from memory at various pitches on multiple channels. That makes it easy to have very impressive sound without a ton of HDL code or a lot of CPU effort.

Yeah, it's possible for the compiler to infer block RAM in a design, but I never really trusted it with doing that, since small details in the way you write the HDL can make the compiler refuse to do it because it doesn't look similar enough to the hardware block. And it's not only block RAM - the same goes for other hardware features like PLLs, hardware-assisted serdes, DDR pin drivers, retiming flip-flops inside IO pin logic, etc. So often the way it is done for block RAM is that an abstraction layer is put in place, where an HDL file holds the implementation module for the block RAM; upon porting, you are supposed to replace the implementation with your vendor's one and add any logic around it to provide the same signals (like flipping some lines around, adding a latch if it needs it, making the reset work the same way...).

And overall the vendors' tools often provide a sizable collection of useful auto-generated building blocks. This ranges from simple stuff like a RAM or FIFO, to interfaces like DDR2 memory controllers, to heavily chip-optimized FFT, DDS and FIR filter blocks, and goes on to even broader tools that generate entire systems, like Altera Qsys, which automatically generates memory and streaming buses to connect components such as softcores, memory and peripherals, magically taking care of stuff like bus width and timing adaptation, multi-master bus arbitration, addressing, etc. (I actually quite miss that tool when working with Lattice chips.)
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: legacy on October 23, 2019, 09:53:28 pm
If the only thing you want from the card is to hold a bitmap and output it via some interface, this can be done using Vivado with zero HDL code: MIG to interface with framebuffer memory, PCIE t

PCI, not PCIe, and I want the PCI stuff done via the PLX. I don't want anything done in the FPGA regarding PCI. I have a lot of experience with PCI, and it's not a thing I want to handle in the FPGA.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on October 23, 2019, 09:54:13 pm
Well, I didn't mean a fully autonomous MOD file format player that could read a .mod file byte by byte - just audio hardware that is well suited to playing one back, like the ability to play various wave samples from memory at various pitches on multiple channels. That makes it easy to have very impressive sound without a ton of HDL code or a lot of CPU effort.
That I can absolutely agree with. As I mentioned to nockieboy upthread, each channel isn't much more than a handful of down-counters and double-buffers attached to a DMA channel. Then multiply, sum, and ship to the DAC.
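As a rough sketch of that per-channel structure (the 0-64 volume range is borrowed from Paula; the class and everything else here is illustrative, not anyone's actual design):

```python
# Sketch of the Amiga-style audio channel outlined above: each
# channel is a period down-counter stepping through a sample buffer,
# scaled by volume, then summed into the DAC. Names are invented.

class Channel:
    def __init__(self, samples, period, volume):
        self.samples = samples    # sample data fetched from RAM (DMA)
        self.period = period      # ticks between sample steps (pitch)
        self.volume = volume      # 0..64, Amiga style
        self.count = period
        self.pos = 0

    def tick(self):
        """One clock tick: count down, advance to next sample on underflow."""
        self.count -= 1
        if self.count == 0:
            self.count = self.period
            self.pos = (self.pos + 1) % len(self.samples)
        return self.samples[self.pos] * self.volume // 64

def mix(channels):
    return sum(ch.tick() for ch in channels)   # ship this sum to the DAC

square = Channel([100, 100, -100, -100], period=2, volume=64)
print([square.tick() for _ in range(8)])
```

The CPU's only job is poking `samples`/`period`/`volume` from a timer interrupt, which is exactly the division of labour suggested above.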

Quote
Yeah, it's possible for the compiler to infer block RAM in a design, but I never really trusted it with doing that, since small details in the way you write the HDL can make the compiler refuse to do it because it doesn't look similar enough to the hardware block. And it's not only block RAM - the same goes for other hardware features like PLLs, hardware-assisted serdes, DDR pin drivers, retiming flip-flops inside IO pin logic, etc. So often the way it is done for block RAM is that an abstraction layer is put in place, where an HDL file holds the implementation module for the block RAM; upon porting, you are supposed to replace the implementation with your vendor's one and add any logic around it to provide the same signals (like flipping some lines around, adding a latch if it needs it, making the reset work the same way...).

And overall the vendors' tools often provide a sizable collection of useful auto-generated building blocks. This ranges from simple stuff like a RAM or FIFO, to interfaces like DDR2 memory controllers, to heavily chip-optimized FFT, DDS and FIR filter blocks, and goes on to even broader tools that generate entire systems, like Altera Qsys, which automatically generates memory and streaming buses to connect components such as softcores, memory and peripherals, magically taking care of stuff like bus width and timing adaptation, multi-master bus arbitration, addressing, etc. (I actually quite miss that tool when working with Lattice chips.)
All very good points.

I haven't played very deeply in the extra tools of Quartus and never even opened up Platform Designer. That looks really handy. Next time I have an FPGA project in front of me I'm going to have to play with that. This is why I come here  :-+
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: legacy on October 23, 2019, 09:56:24 pm
But a video card is far from being only "framebuffer adapter". In addition to all of above it needs to provide some kind of drawing and composition functions. This is even more important if your CPU is very slow. And this is the hard part.

I said with the minimal 2D functions to support X11.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: legacy on October 24, 2019, 02:48:54 am
For pixel graphics, there is no reason that both sides of the BlockRAM need to have the same word width.  The CPU can read/write in bytes while the graphics controller can grab up some huge number of bits at a time.

The SEGA Genesis (Mega Drive) uses a similar idea with the uPD41264 chip made by NEC :D
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 24, 2019, 04:32:41 am
...but realize you may need to use a byte write enable depending on how your CPU or graphics image processor may move individual bytes around for effects like fill, copy, clear and stencil.

Huh? Sorry, I don't follow.  Could you expand on this a little, please?   :-\

Ok, understand that the biggest cheat you have going for you when using the MAX10's internal static RAM blocks is that on one side, for your dedicated video graphics, sprites and audio output, you will use an 8-bit port with an address which points to each individual byte.  You can get 1 new byte read 216 million times a second, while only needing 1/8th of that for a true 256-color paletted image, another access cycle for a font byte image lookup, another for a sprite, and another 4 or so once every H-sync for audio data.  Sorta like the Amiga's Paula and Denise.

(This is separate from your global screen offset controls, generally used for oversized image scrolling.)  Now, if you also want a blitter like the Amiga's Agnus, this would involve reading a reference graphic and reading the destination display memory, working out the source image's transparent stencil bits within logic registers, pasting over the destination read from source memory so you are effectively singling out individual bits, then writing that logic register's result back into the destination display memory.  This is the hassle of pasting graphics with transparent bits, where you may also want to shift the bits of the source image data left and right to move a pasted graphic 1 pixel at a time onto bitplane-style graphics.  Having this hardware blitter relieves your 8-bit CPU from doing this manually when you want to paint-brush images onto a bitplane-style display.  Because of the MAX10's speed, logic density, and the dual-port RAM's nature, you have the luxury of placing such an image blitter on the 8-bit port side, or on the CPU read/write port side, or even half and half.

In 256-color, byte-per-pixel mode, just read the source rectangle's graphic data and only write the pixel bytes to the destination if the stencil isn't transparent.  Wow, so much simpler, but you need a whole lot of video memory to keep every pixel a single byte.  (This would have made an insane arcade machine 20 years ago, but you would want at least 2 full video page memories.)  (I wish the Amiga 1000 had had 2MB of chip RAM and worked with 8-bit and 16-bit pixels only, no bitplane rubbish - it would have flown, except for the slow logic of the 1984 era when it was engineered.  All they had to do was get the RAM working at 28.6MHz instead of 14.3MHz, with 2MB of it; the CPU didn't need to go any faster.  The price of memory would have been the killer.)
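That byte-per-pixel paste is simple enough to sketch in a few lines of software (a colour key stands in for the stencil here, and all the sizes are illustrative):

```python
# The byte-per-pixel blit described above: copy a source rectangle
# into the destination framebuffer, skipping pixels that match a
# transparent colour key (the stencil test).

def blit(dst, dst_w, src, src_w, src_h, dx, dy, transparent=0):
    """Copy an src_w x src_h rectangle to (dx, dy) in dst,
    honouring the transparency stencil (a simple colour key)."""
    for y in range(src_h):
        for x in range(src_w):
            pixel = src[y * src_w + x]
            if pixel != transparent:          # stencil test
                dst[(dy + y) * dst_w + (dx + x)] = pixel

screen = [0] * (8 * 8)                  # tiny 8x8 screen, 1 byte/pixel
sprite = [0, 7, 7, 0]                   # 2x2 sprite, transparent corners
blit(screen, 8, sprite, 2, 2, dx=3, dy=1)
print(screen[8 * 1 + 4], screen[8 * 2 + 3])   # the two opaque pixels
```

Compare this with the bitplane case above: here the stencil test is one byte compare per pixel, with no bit shifting or read-modify-write of the destination.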

On the other side of your dual-port memory you may also use a wider bus, like 16 bits, to interface with your CPU.  However, what I am saying is that if you do so, in Altera's/Intel's dual-port memory configuration tool you have a byte-enable feature which allows, when writing to your dual-port RAM's 16-bit side, writing to only the high or low 8 bits, or both.  This write-masking is necessary for a CPU doing an 8-bit write to a 16-bit-wide RAM; otherwise, your CPU will need to do a read-modify-write to change only 8 bits on that side.

Now, everything I'm telling you is possible with relative ease, since you do not have to wait for any external RAM cycles or cache read and write bursts from an external DRAM and its controller.  You do not need to pipeline anything at all.  However, going the other route with large, fast external memory, you can always simulate a bitplane display for your 8-bit CPU, just use 16-bit word pixels, and do everything like a software emulator, which grants you enormous flexibility.  But you will always be refresh-generating the display a page behind, with dedicated buffers to cache memory, and tons of wasted DRAM reads to address single bytes in the middle of larger bursts just to get to those key bytes whenever you need random access.  Otherwise, you will either be engineering a smart cache mechanism, or slowing everything down to even fast DDR memory's precharge and postcharge cycles.  Normal DRAM in some cases won't be any faster, but sequential bursts will be like lightning in comparison.  Other than painting a line of video, how often does that happen in an 8-bit CPU world which has no smart cache?
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 24, 2019, 07:11:53 am
Just so we are clear, there exist cheaper FPGAs with more than 1 megabit of internal RAM out there, but you would need a VCCIO and VCC core supply & boot PROM.  Example:


1.9 megabits of core RAM for $16  (Good for animated 360x240x256-color with 2 frame buffers for animation, plus another 80KB for software)
It also has double the logic elements...
https://www.digikey.com/product-detail/en/lattice-semiconductor-corporation/LFE5U-45F-6BG256C/220-2198-ND/9553911 (https://www.digikey.com/product-detail/en/lattice-semiconductor-corporation/LFE5U-45F-6BG256C/220-2198-ND/9553911)
https://www.mouser.com/ProductDetail/Lattice/LFE5U-45F-6BG256C?utm_term=LFE5U-45F&qs=w%2Fv1CP2dgqpblS%252b2xYE99A%3D%3D&utm_campaign=LFE5U-45F-6BG256C&utm_medium=aggregator&utm_source=findchips&utm_content=Lattice (https://www.mouser.com/ProductDetail/Lattice/LFE5U-45F-6BG256C?utm_term=LFE5U-45F&qs=w%2Fv1CP2dgqpblS%252b2xYE99A%3D%3D&utm_campaign=LFE5U-45F-6BG256C&utm_medium=aggregator&utm_source=findchips&utm_content=Lattice)

The eval board is expensive since it comes with a 3.8-megabit FPGA, but it gives you room to play.

Also, the serdes on this Lattice part is fast enough to directly drive DVI/HDMI 1080p.
You'll be using so few IOs that a 4 layer PCB would be fine.
And you can still place a single DDR2/3 ram chip on the PCB if you like...

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on October 24, 2019, 04:56:46 pm
Also, the serdes on this Lattice part is fast enough to directly drive DVI/HDMI 1080p.
Nope, no SERDES on that part. Anyway, Lattice requires a subscription license to use SERDES-enabled chips so it would be useless to open-hardware hobbyists if it did. The BGA package is also a bit impractical, the comparatively generous 0.80mm pitch notwithstanding.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: SiliconWizard on October 24, 2019, 05:07:29 pm
Also, the serdes on this Lattice part is fast enough to directly drive DVI/HDMI 1080p.
Nope, no SERDES on that part. Anyway, Lattice requires a subscription license to use SERDES-enabled chips so it would be useless to open-hardware hobbyists if it did.

Yes. The only ECP5 parts that have SERDES are the LFE5UMxx.
http://www.latticesemi.com/en/Products/FPGAandCPLD/ECP5 (http://www.latticesemi.com/en/Products/FPGAandCPLD/ECP5)

One way of evaluating it AND get a free license for this part is to buy a VERSA kit: http://www.latticesemi.com/en/Products/DevelopmentBoardsAndKits/ECP55GVersaDevKit (http://www.latticesemi.com/en/Products/DevelopmentBoardsAndKits/ECP55GVersaDevKit)
(I think the license is restricted to the part used on this board:  LFE5UM5G-45F, but a lot can be done with it!)

There are promotional prices on those boards on a regular basis. I bought the VERSA board for the ECP3 a few years ago for $99. The VERSA ECP5 is about $250 and occasionally can be had for $99 too.

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 24, 2019, 08:17:11 pm
Also, the serdes on this Lattice part is fast enough to directly drive DVI/HDMI 1080p.
Nope, no SERDES on that part. Anyway, Lattice requires a subscription license to use SERDES-enabled chips so it would be useless to open-hardware hobbyists if it did.

Yes. The only ECP5 parts that have SERDES are the LFE5UMxx.
http://www.latticesemi.com/en/Products/FPGAandCPLD/ECP5 (http://www.latticesemi.com/en/Products/FPGAandCPLD/ECP5)

One way of evaluating it AND get a free license for this part is to buy a VERSA kit: http://www.latticesemi.com/en/Products/DevelopmentBoardsAndKits/ECP55GVersaDevKit (http://www.latticesemi.com/en/Products/DevelopmentBoardsAndKits/ECP55GVersaDevKit)
(I think the license is restricted to the part used on this board:  LFE5UM5G-45F, but a lot can be done with it!)

There are promotional prices on those boards on a regular basis. I bought the VERSA board for the ECP3 a few years ago for $99. The VERSA ECP5 is about $250 and occasionally can be had for $99 too.
I'm sorry, yes, you are correct about getting the 3-gigabit rate to work, so 1080p is out.  But for 480p, the standard DQs are fast enough, especially if you use DDR IOs, and the Verilog DVI/HDMI output serialiser posted earlier is written by hand, so it does not require Lattice's embedded serdes function.  All it requires is 4 parallel balanced DQ ports which can serial-shift at 216MHz (108MHz DDR) from a single bank clock.  The chip I listed can handle that with ease.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: SiliconWizard on October 24, 2019, 08:50:13 pm
All it requires is 4 parallel balanced DQ ports which can serial shift at 216Mhz (108Mhz DDR) from a single bank clock.  The chip I listed can handle that with ease.

Oh, yeah. I guess maybe even lower-end Lattice FPGAs could do this.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: asmi on October 24, 2019, 09:08:08 pm
All it requires is 4 parallel balanced DQ ports which can serial shift at 216Mhz (108Mhz DDR) from a single bank clock.  The chip I listed can handle that with ease.
AFAIK HDMI requires a pixel clock to be at least 25 MHz, which corresponds to 250 Mbps per data lane (1:10 serialization). So 216 Mbps is not going to be enough.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 24, 2019, 09:21:12 pm
All it requires is 4 parallel balanced DQ ports which can serial shift at 216Mhz (108Mhz DDR) from a single bank clock.  The chip I listed can handle that with ease.
AFAIK HDMI requires a pixel clock to be at least 25 MHz, which corresponds to 250 Mbps per data lane (1:10 serialization). So 216 Mbps is not going to be enough.
Ok, 270 Mbps then, still well within the Lattice part I mentioned, which can do 500 Mb/s serial data on the DDR pins.

Generic DDRX1 Outputs with Clock and Data Aligned at Pin (GDDRX1_TX.SCLK.Aligned) using PCLK clock input - Figure 3.9: all devices rated 500 Mb/s across the -8, -7 and -6 speed grades...
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on October 25, 2019, 05:58:09 am
Yeah, Lattice are the most cost-effective FPGAs that you can still easily buy and get proper software for. The nice bang-per-buck ECP5 family is fast enough to do it.

The HDMI spec calls for a pixel clock of at least 25 Mpixels/s. For 24-bit RGB this means each of the 3 diff pairs carries 8 bits per pixel, but the encoding expands that to 10 bits, so the minimum bitrate of an HDMI link is 25M × 10 = 250 Mbit/s per data pair.
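That arithmetic in a nutshell (the pixel clocks below are the standard figures for each mode):

```python
# Lane-rate arithmetic from the discussion: TMDS sends 10 bits per
# 8-bit colour channel, so each data pair runs at 10x the pixel clock.

def tmds_lane_rate_mbps(pixel_clock_mhz):
    return pixel_clock_mhz * 10   # 8b/10b-style TMDS expansion

print(tmds_lane_rate_mbps(25.0))    # HDMI minimum: 250 Mbps per lane
print(tmds_lane_rate_mbps(25.175))  # 640x480@60 VGA timing
print(tmds_lane_rate_mbps(148.5))   # 1080p@60: why real serdes is needed
```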

In general a lot of FPGAs will have some form of "baby serdes" on some or most pins. These are sometimes just a DDR buffer, but more often a 1:4 serdes. This brings the rate seen in the FPGA fabric down to 1/8 of the line rate, which matters because fabric signals generally need to stay under about 200MHz. The ECP5 family includes this functionality on the left and right banks, and according to the datasheet it will run at between 624Mbit and 800Mbit (depending on speed grade). So it's 2 or 3 times faster than the minimum HDMI spec.

Even if you don't use the right bank to get this functionality, you still get DDR buffers on all banks in the ECP5, and those do 500Mbit (regardless of speed grade). The downside is that you only get a 1:2 reduction in signal speed, so you have to deal with two 250MHz SDR streams in the FPGA fabric, and that takes some careful design to run fast enough. It's possible that the code generator in the Lattice Diamond IDE can add a larger-ratio serdes in the logic fabric, but I think the code generator always wants the nice pins with hardware 1:4 and 1:7 reduction.

I think the Lattice Crosslink family has some even faster buffers in the IO pins.

So for low resolutions, not a problem. It's things like 1080p that would be a problem, since the lowest you can get away with is about a 130MHz pixel clock, and this means 1.3Gbit per lane. I'm pretty sure no FPGA out there can do that with regular IO functionality, so proper dedicated serdes is a must for that, and yeah, the prices of those FPGAs tend to be ridiculous.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: asmi on October 25, 2019, 06:31:57 am
So for low resolutions not a problem. Its things like 1080p that would be a problem since the lowest you can get away with is 130MHz pixel clock and this means 1.3Gbit per lane. Im pretty sure no FPGA out there can do that with regular IO functionality so proper dedicated serdes is a must for that and yeah the prices of those FPGAs tend to be ridiculous.
Xilinx 7 series at speed grade of 2 and above can officially do 1250 Mbps, and unofficially does 1080p@60 (1480 Mbps or something like that). "High performance" banks can go to 1600 Mbps, but it only supports Vccio of 1.8V and below.
I think Xilinx did the right thing integrating up to 1:8 SERDES (which can be increased up to 1:14 by cascading) into each and every IO pin tile, as users can pick pins which are more convenient from a layout point of view, instead of reading through the manual to figure out which ones they are supposed to use.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on October 25, 2019, 06:51:39 am
Oh, that's certainly a nice development from Xilinx. I haven't seen such high IO speeds from reasonably priced FPGAs before.

Anyway my point was that most modern FPGAs out there have enough IO performance to easily meet the minimum HDMI speed.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Canis Dirus Leidy on October 25, 2019, 08:54:32 am
By the way. ZX Evolution (http://nedopc.com/zxevo/zxevo_eng.php): FPGA (ACEX 1K) assisted ZX Spectrum compatible computer with enhanced graphic modes and VGA-compatible (due to internal scandoubler) video output. Copying it exactly makes no sense (unless Mr. Wells lends his time machine), but a glimpse into the Verilog code (http://svn.zxevo.ru/listing.php?repname=pentevo) may be useful.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 25, 2019, 10:06:58 am
Right, so... just to clear up any confusion (on my part!  :o) and to reach a general consensus amongst the professionals (you guys!  ;D), can we agree on an FPGA I should choose, bearing in mind my wish-list (listed below for clarity, in order of priority)?

MUST:

NICE TO HAVE:

The MAX 10 (https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/max-10/m10_overview.pdf) is looking like a good option, but with all this talk of Lattice, and knowing there are even more manufacturers out there, I'd like to get a good indication of which one you think will fit the bill for me.  :-+
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on October 25, 2019, 11:15:11 am
Yeah, Lattice is the 3rd biggest player in the FPGA market. They specialize in being lower cost than the big leading two. Tho the tools on Lattice are not quite as good, while still following the same free basic version / expensive-as-hell full version scheme.

Since the cost of the chip is not that high a priority for you, you'd better stick to Intel or Xilinx as they have better tools. Personally I enjoy the Intel/Altera tools the most, and I really like the Qsys/SOPC builder tool that wires up your digital computing systems auto-magically.

Previous Altera MAX series tended to be 5V tolerant, but the MAX10 is not anymore. Then again, if your chips are TTL it will all work on 3.3V signals just fine as long as there are resistors to limit current, or you can just use a single 8-bit logic buffer to do it properly, since the data bus is only 8 bits wide anyway. You still get around 600 to 700Mbit serial IO performance, so HDMI should be just fine, but getting 90KB of internal block RAM is going to be quite expensive.

For a start, just look around at what dev boards you can get from the vendor you like. Be prepared to pay $50 and up for a decent one (but sometimes >$100). The decently sized boards always have some form of SRAM, SDRAM or DDR2 on them. HDMI tends to be rare, but if it has VGA it's probably good enough for now, as you can likely add your own HDMI port later on. (Remember, you want to first learn how to use these FPGAs, so start simple.)
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: SiliconWizard on October 25, 2019, 02:54:18 pm
If this is your first such development, I would personally still start with VGA first, and maybe consider HDMI later on, when all the rest works and you've gotten more proficient. VGA is still easier to implement, and easier to debug. Some people seem to make it look as though HDMI is just a picnic, but they fail to tell you that they probably spent hours getting it right back when they started. And now of course it all looks very easy. Just saying, do as you wish and don't mind me. Now if the HDMI spec is mainly to select an appropriate FPGA that you know will allow you to do HDMI later on, that's a valid point.




Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 25, 2019, 04:38:07 pm
Now if the HDMI spec is mainly to select an appropriate FPGA that you know will allow you to do HDMI later on, that's a valid point.

Absolutely it is.  I don't want to mess around too much swapping FPGAs as I develop - I'd rather get the 'end game' FPGA, and learn from the ground-up on that.  I'm assuming, of course, that there's no real reason I can't develop a VGA output on a dev board with HDMI fitted?  Or offer VGA AND HDMI as outputs on the final version, for example?

Yeah, Lattice is the 3rd biggest player in the FPGA market. They specialize in being lower cost than the big leading two. Tho the tools on Lattice are not quite as good, while still following the same free basic version / expensive-as-hell full version scheme.

Purely out of interest, is there anything on the expensive-as-hell full version software that a beginner like me would want?

Since the cost of the chip is not that high a priority for you, you'd better stick to Intel or Xilinx as they have better tools. Personally I enjoy the Intel/Altera tools the most, and I really like the Qsys/SOPC builder tool that wires up your digital computing systems auto-magically.

The cost is still a priority, just not quite as high as the factors before it in the list.  ;)

The only thing holding me back from the MAX 10 series is the cost of the evaluation board, but I can stretch to it if needs be - I just want to make sure no-one's going to pipe up after I've ordered one with a comment along the lines of, "Haven't you checked out the Costmin LogicBlaster IV?  It does everything you want, packs 200KB of RAM, makes the tea and takes your dog for a walk and the dev board includes HDMI and VGA and only costs £50..."  Because that would make me cry...  ;D

Oh, I also meant to say that I have no preference for VHDL over Verilog at all - it's just that the only exposure I've had to FPGA code is via Grant Searle's Multicomp code, which is in VHDL, hence I've spoken about VHDL in this discussion but don't mean to exclude Verilog.  I'm right at the start with this project, so I am ultimately extremely flexible in which path I take.

Previous Altera MAX series tended to be 5V tolerant, but the MAX10 is not anymore. Then again, if your chips are TTL it will all work on 3.3V signals just fine as long as there are resistors to limit current, or you can just use a single 8-bit logic buffer to do it properly, since the data bus is only 8 bits wide anyway.

System Interface Considerations:
ALL the chips in my computer that would connect to the FPGA are either 74HCT buffers for data / address / most command signals, or the CMOS Zilog Z80 itself for a couple of control signals (and these will be going TO the Z80 FROM the FPGA and are active-low with 5V pull-ups on them).

The only area for potential variation is in the address bus (from the MMU), which may have HCT, HC or LS logic driving the extended address lines.

You still get around 600 to 700Mbit serial IO performance, so HDMI should be just fine, but getting 90KB of internal block RAM is going to be quite expensive.

The 10M50 (https://www.intel.co.uk/content/dam/www/programmable/us/en/pdfs/literature/pt/max-10-product-table.pdf) is the most attractive proposition, I think, with 200 KB of RAM in it.

For a start, just look around at what dev boards you can get from the vendor you like. Be prepared to pay $50 and up for a decent one (but sometimes >$100). The decently sized boards always have some form of SRAM, SDRAM or DDR2 on them. HDMI tends to be rare, but if it has VGA it's probably good enough for now, as you can likely add your own HDMI port later on. (Remember, you want to first learn how to use these FPGAs, so start simple.)

Well, the 10M50 is available on a dev board with HDMI on it.  That would provide me with a good hardware example to work from and all I'd need to worry about would be getting the FPGA set up properly to display something on it.  I could always wire up a VGA socket as well and go that path first.  It's darn expensive (https://www.intel.com/content/www/us/en/programmable/products/boards_and_kits/dev-kits/altera/kit-max-10m50-evaluation.html) though.  :palm:

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Canis Dirus Leidy on October 25, 2019, 05:34:34 pm
Well, the 10M50 is available on a dev board with HDMI on it.  That would provide me with a good hardware example to work from and all I'd need to worry about would be getting the FPGA set up properly to display something on it.  I could always wire up a VGA socket as well and go that path first.  It's darn expensive (https://www.intel.com/content/www/us/en/programmable/products/boards_and_kits/dev-kits/altera/kit-max-10m50-evaluation.html) though.  :palm:
Come to the Cyclone 10 side, we have cookies (https://www.aliexpress.com/item/33050240958.html)!  >:D (and QFP package (https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/pt/cyclone-10-lp-product-table.pdf))

P.S. But HDMI has another problem: high-speed signal PCB routing.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on October 25, 2019, 05:48:13 pm
The premium features tend to be the same with all the vendors. Things like faster compilation through the use of more cores, and partial recompilation (this is more useful than it sounds, because even simple designs can sometimes compile for minutes even on a fast PC), and it usually also unlocks a large set of IP blocks for full use. The free version also usually will not compile for the ridiculously large FPGAs that cost 1000s of dollars per chip.

The set of IP blocks mentioned before tend to be available in the free version in their crippled form with a "death timer". This is a counter buried deep inside the encrypted IP block that is set to run out in about 2 hours of running at typical clock speeds. Once it runs out it kills the IP block and renders it unusable. This is nice in that it lets you try it, but not so nice in that the FPGA has to be rebooted to get it to work again after the timer runs out. Unluckily the DDR2 controller tends to be one of these crippled premium IPs.

That dev board you linked is actually a pretty good deal. The MAX10 chip on that board is very expensive; I couldn't find the price for the exact part, but the slower speed grade version of that chip costs 135 USD if you were to buy a single one off DigiKey as a mere mortal, so the exact one should be even more expensive. It's actually cheaper to buy the board and desolder the chip rather than buying the chip new (go ahead and try to buy this chip). Though if you are a commercial customer and have good relations with Intel, they will give you way, way cheaper deals directly, especially if you ask for 1000 of them (they likely won't even respond if you ask for anything under 100). This is why I brought up that chips with this much memory are expensive, but it is also a very powerful chip. For integer math speed it could rival the performance of modern CPUs in PCs (this is why FPGAs got used for cryptocurrency mining before GPU compute and ASICs came around). And yet this is actually still a "low cost FPGA" compared to the other chips they make.

Verilog or VHDL, it does not matter. Both are supported by all vendors. I personally prefer Verilog because it's simpler and involves less fancy syntax typing. Not that one or the other is better; my taste just went with Verilog after trying one and the other. You can mix VHDL and Verilog in the same project and compile them together anyway.

As for having ready-to-go hardware, yes, that is very useful with FPGAs, as they often have a lot of support circuitry (a gazillion power rails, boot memory, config pins) and you always get a JTAG programmer included on the dev board. Often this JTAG programmer can be rewired to connect to an external FPGA, turning the dev board into a universal programmer for that vendor's chips.

It's your decision what you want to spend the money on, but that MAX10 dev board is a pretty good deal for what you get. You can always add a VGA port onto the IO headers to start off there, and PWM audio output via a single pin. Believe me, you will want to take things slowly, because getting into FPGAs is quite a learning curve (I'd say significantly more so than with MCUs), but lucky for you, VGA video generation is one of the very nice beginner projects. This is mainly because you can slowly build on it in steps, and each incremental step is simple enough to debug purely using a logic analyzer and no simulation.
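(The usual first step of that incremental approach is a pair of free-running counters generating the sync pulses. A minimal sketch for 640x480@60 with a ~25.175 MHz pixel clock; the timing constants are the standard VESA numbers, while the module and port names are illustrative:)

```verilog
// Minimal VGA sync generator: horizontal/vertical counters for 640x480@60.
// Registers are assumed to power up at 0, as is typical on FPGAs.
module vga_sync (
    input  wire clk_25m,              // ~25.175 MHz pixel clock
    output wire hsync, vsync,         // active-low sync pulses for this mode
    output wire video_on,             // high during the visible 640x480 region
    output reg  [9:0] pixel_x, pixel_y
);
    localparam H_VISIBLE = 640, H_FRONT = 16, H_SYNC = 96, H_BACK = 48, H_TOTAL = 800;
    localparam V_VISIBLE = 480, V_FRONT = 10, V_SYNC = 2,  V_BACK = 33, V_TOTAL = 525;

    always @(posedge clk_25m) begin
        if (pixel_x == H_TOTAL-1) begin
            pixel_x <= 10'd0;
            pixel_y <= (pixel_y == V_TOTAL-1) ? 10'd0 : pixel_y + 1'b1;
        end else
            pixel_x <= pixel_x + 1'b1;
    end

    assign hsync    = ~((pixel_x >= H_VISIBLE+H_FRONT) && (pixel_x < H_VISIBLE+H_FRONT+H_SYNC));
    assign vsync    = ~((pixel_y >= V_VISIBLE+V_FRONT) && (pixel_y < V_VISIBLE+V_FRONT+V_SYNC));
    assign video_on = (pixel_x < H_VISIBLE) && (pixel_y < V_VISIBLE);
endmodule
```

Each later step (colour bars, a tile map, a frame buffer) just consumes `pixel_x`/`pixel_y` and `video_on`, which is what makes the project so debuggable with only a logic analyzer.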
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Canis Dirus Leidy on October 25, 2019, 06:04:48 pm
Unluckily the DDR2 controller tends to be one of these crippled premium IPs.
Ahem. Due to these
Package type - NOT BGA where possible due to hand-soldering requirement
you can forget about this IP restriction.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 25, 2019, 09:46:43 pm
Well, the 10M50 is available on a dev board with HDMI on it.  That would provide me with a good hardware example to work from and all I'd need to worry about would be getting the FPGA set up properly to display something on it.  I could always wire up a VGA socket as well and go that path first.  It's darn expensive (https://www.intel.com/content/www/us/en/programmable/products/boards_and_kits/dev-kits/altera/kit-max-10m50-evaluation.html) though.  :palm:
Come to the Cyclone 10 side, we have cookies (https://www.aliexpress.com/item/33050240958.html)!  >:D (and QFP package (https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/pt/cyclone-10-lp-product-table.pdf))

P.S. But HDMI has another problem: high-speed signal PCB routing.
The internal RAM bits on that Cyclone10 are too few for the OP to completely contain his project.  If the OP wanted an SDRam version, then that dev board would be fine, but when developing the open-source graphics board he would need an FPGA with enough IOs for the SDRam and other stuff, which may not be possible when porting his project to the 144-pin TQFP version of the Max10.  With such a setup, the OP will need to engineer a multiport SDRam controller which caches banks for display refresh.

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 25, 2019, 10:04:54 pm
If this is your first such development, I would personally still start with VGA first, and maybe consider HDMI later on, when all the rest works and you've gotten more proficient. VGA is still easier to implement, and easier to debug. Some people seem to make it look as though HDMI is just a picnic, but they fail to tell you that they probably spent hours getting it right back when they started. And now of course it all looks very easy. Just saying, do as you wish and don't mind me. Now if the HDMI spec is mainly to select an appropriate FPGA that you know will allow you to do HDMI later on, that's a valid point.
You're right, I have too many years of experience in FPGA video and make it seem too easy.
I also picked the MAX10 144-pin with all RAM inside, since the OP's development of a display controller can operate 100% internally, without worries about DRAM interface errors and logic.

The design is simple enough that the user can make 5 of his own $2 dev board PCBs from JLCPCB, but yet again, I'm talking from experience I have with Intel chips.

One thing about FPGA analog VGA video out: remember to use a 5V TTL level shifter/line driver when sending the HS and VS!!!  Some monitors work fine being fed 3.3V TTL on these 2 signals, some just won't even turn on, and some monitors sit on the edge, turning on, then off, or the syncs go haywire.  The OP won't know what hit him if he makes the mistake of using 3.3V for the HS and VS outputs and then many third-party users start experiencing problems...  |O   A 5V-powered 74HCT04 with 3 outputs paralleled for extra drive current on each sync signal should work fine.  At the inputs of the '04, I usually add a single 470-ohm pullup to 3.3V, since the high drive on Altera's IO pins has an unusual current pull curve as it reaches VCCIO, and this just cleans things up a little when driving 3 CMOS inputs in parallel on a 5V CMOS part.

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 25, 2019, 10:18:53 pm
For the HDMI interface, the Max10 dev board used this IC when interfacing with the HDMI for ESD protection and to deal with the 5V DDC signals:
https://www.digikey.com/product-detail/en/texas-instruments/TPD12S016PWR/296-29690-1-ND/2762248 (https://www.digikey.com/product-detail/en/texas-instruments/TPD12S016PWR/296-29690-1-ND/2762248)

It does the ESD protection and level shifts/converts the 5v DDC I2C signals from the HDMI to the FPGA's 3.3v.

It is used in the Max 10 DEV board mentioned above.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: mariush on October 25, 2019, 11:03:34 pm
I don't know why you guys keep stressing about 1080p.

I'd suggest setting the maximum to 1280x720 or 1366x768 (HD Ready resolution on cheap TVs)

720p is a standard resolution that should be accepted by everything, and you'd have both 16:9 and 4:3 resolutions; you can double pixels to do 640x360, or use 4-pixel blocks to get 320x180 ... scaling by 1.5x gets 960x720 to stretch 640x480...
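(The integer-ratio scaling mentioned here is nearly free in logic: doubling just means dropping low bits of the screen counters so each source pixel is repeated. A sketch of that idea; counter widths and all names are illustrative:)

```verilog
// Pixel doubling/quadrupling on a 1280x720 raster by discarding low counter
// bits, so each source pixel is repeated in 2x2 (or 4x4) blocks on screen.
module scale_addr (
    input  wire [10:0] hcount,   // 0..1279 across the visible 720p line
    input  wire [9:0]  vcount,   // 0..719 down the frame
    output wire [9:0]  x_640,    // 1280 -> 640 source columns (2x doubling)
    output wire [8:0]  y_360,    // 720  -> 360 source rows
    output wire [8:0]  x_320,    // 1280 -> 320 source columns (4-pixel blocks)
    output wire [7:0]  y_180     // 720  -> 180 source rows
);
    assign x_640 = hcount[10:1];
    assign y_360 = vcount[9:1];
    assign x_320 = hcount[10:2];
    assign y_180 = vcount[9:2];
endmodule
```

The 1.5x case is the odd one out, since it needs a repeat-every-other-pixel pattern rather than a simple bit shift.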

Could you maybe use something like SiI8784 https://www.semiconductorstore.com/pages/asp/DownloadDirect.asp?sid=1572019820392 (https://www.semiconductorstore.com/pages/asp/DownloadDirect.asp?sid=1572019820392) - to convert an analogue signal to HDMI ? 
You could use a fast microcontroller and 3 r2r dacs to produce the rgb/ypbpr signal/ and output to the sil chip which then converts to hdmi.

or the SiI9136 ... it takes up to a 36-bit-per-pixel digital signal and outputs HDMI: datasheet (https://www.latticesemi.com/view_document?document_id=51622)

edit : this last 9136 chip is used in some cheap 20$ devices (https://www.amazon.com/Color3-Advanced-Processing-Device-010000-002/dp/B00BUUV4X6) and someone made a blog about it : https://hackaday.io/project/122480-eecolor-color3 (https://hackaday.io/project/122480-eecolor-color3)

and this board uses the 9136 and the download on the page has lots of cool documentation and code for 9136 : https://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&No=1067&PartNo=4 (https://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&No=1067&PartNo=4)
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: asmi on October 26, 2019, 03:12:30 am
For the HDMI interface, the Max10 dev board used this IC when interfacing with the HDMI for ESD protection and to deal with the 5V DDC signals:
https://www.digikey.com/product-detail/en/texas-instruments/TPD12S016PWR/296-29690-1-ND/2762248 (https://www.digikey.com/product-detail/en/texas-instruments/TPD12S016PWR/296-29690-1-ND/2762248)

It does the ESD protection and level shifts/converts the 5v DDC I2C signals from the HDMI to the FPGA's 3.3v.
I prefer the TPD12S521 (http://www.ti.com/product/TPD12S521) chip; it does the same thing, but it supports "flow-through" routing to avoid having any stubs, thus improving signal integrity (see attachment).
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 26, 2019, 03:37:14 am
For the HDMI interface, the Max10 dev board used this IC when interfacing with the HDMI for ESD protection and to deal with the 5V DDC signals:
https://www.digikey.com/product-detail/en/texas-instruments/TPD12S016PWR/296-29690-1-ND/2762248 (https://www.digikey.com/product-detail/en/texas-instruments/TPD12S016PWR/296-29690-1-ND/2762248)

It does the ESD protection and level shifts/converts the 5v DDC I2C signals from the HDMI to the FPGA's 3.3v.
I prefer the TPD12S521 (http://www.ti.com/product/TPD12S521) chip; it does the same thing, but it supports "flow-through" routing to avoid having any stubs, thus improving signal integrity (see attachment).
The TPD12S521 (http://www.ti.com/product/TPD12S521) chip does not level-shift the I2C DDC lines; at least, it is not mentioned anywhere at all in its data sheet.
The TPD12S016PWR specifically describes the logic shifter gates with their internal wiring and specs.  The data sheet of the TPD12S521 only shows a MOSFET inside it, though there is one sentence which says:
"To begin the design process the designer needs to know the 5V_SUPPLY voltage range and the logic level, LV_SUPPLY, voltage range."
But, nothing else...  Typical crummy modern Texas Instruments data sheets...

BTW, the TPD12S016PWR specifically says:
"Auto-direction Sensing I2C Level Shifter with One-shot Circuit to Drive a Long HDMI Cable (750-pF Load)"
With a specific image of this internal logic at the bottom of page 14 in the data sheet.

To be safe, without any mention from TI, I would feel better with my FPGA using the TPD12S016PWR unless I get a letter from TI guaranteeing the same function in the TPD12S521.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on October 26, 2019, 03:50:38 am
The TPD12S521 (http://www.ti.com/product/TPD12S521) chip does not level-shift the I2C DDC lines; at least, it is not mentioned anywhere at all in its data sheet.
Description, second paragraph, first sentence. "The low-speed control lines offer voltage-level shifting to eliminate the need for an external voltage level-shifter IC."

Quote
The TPD12S016PWR specifically describes the logic shifter gates with their internal wiring and specs.  The data sheet of the TPD12S521 only shows a MOSFET inside it
Yes, that's a bidirectional level shifter. See NXP app note AN10441 (https://www.nxp.com/docs/en/application-note/AN10441.pdf) for a detailed explanation.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 26, 2019, 04:03:58 am
The TPD12S521 (http://www.ti.com/product/TPD12S521) chip does not level-shift the I2C DDC lines; at least, it is not mentioned anywhere at all in its data sheet.
Description, second paragraph, first sentence. "The low-speed control lines offer voltage-level shifting to eliminate the need for an external voltage level-shifter IC."

Quote
The TPD12S016PWR specifically describes the logic shifter gates with their internal wiring and specs.  The data sheet of the TPD12S521 only shows a MOSFET inside it
Yes, that's a bidirectional level shifter. See NXP app note AN10441 (https://www.nxp.com/docs/en/application-note/AN10441.pdf) for a detailed explanation.
You're correct, sorry...
It's odd that on page 7 they only show an upward-facing FET-like symbol.

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: asmi on October 26, 2019, 04:19:35 am
To be safe, without any mention from TI, I would feel better with my FPGA using the TPD12S016PWR unless I get a letter from TI guaranteeing the same function in the TPD12S521.
I actually use it on my FPGA board:
https://i.imgur.com/RVsPWrI.jpg
I can assure you that it really works just fine :-+ Moreover, it also provides a current limiter for the HDMI port's power line, and the destination voltage (on the FPGA side) for the DDC/HPD/CEC lines doesn't have to be 3.3 V; it can be any voltage from 1 to 3.3 V.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on October 26, 2019, 05:14:09 am
It's odd that on page 7 they only show an upward-facing FET-like symbol.
But that really is all there is to it. Select an NMOS with (Vgs(on) < Vcclow - margin) and (Vds(max) > Vcchigh). Connect the gate to Vcclow, the source to the low-voltage signal, and the drain to the high-voltage signal. Add pull-ups to the respective Vcc on each signal input. You have just completed a bidirectional digital level shifter, perfectly sufficient for several hundreds of kilohertz. Feel free to work out what happens when neither side is pulling low, when the drain side is pulled low, and when the source side is pulled low. It's genius.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 26, 2019, 05:05:15 pm
I actually use it on my FPGA board:
https://i.imgur.com/RVsPWrI.jpg (https://i.imgur.com/RVsPWrI.jpg)
I can assure you that it really works just fine :-+ Moreover, it also provides a current limiter for the HDMI port's power line, and the destination voltage (on the FPGA side) for the DDC/HPD/CEC lines doesn't have to be 3.3 V; it can be any voltage from 1 to 3.3 V.

Very impressive, asmi  :-+ - how did you make that board?  I presume it was done in a reflow oven?  Is it 4-layer?  Done in KiCAD or something professional?



As for the TPD12S521, will I need this with a TFP410 (http://www.ti.com/lit/ds/slds145c/slds145c.pdf)?
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 26, 2019, 08:42:13 pm
I actually use it on my FPGA board:
https://i.imgur.com/RVsPWrI.jpg (https://i.imgur.com/RVsPWrI.jpg)
I can assure you that it really works just fine :-+ Moreover, it also provides a current limiter for the HDMI port's power line, and the destination voltage (on the FPGA side) for the DDC/HPD/CEC lines doesn't have to be 3.3 V; it can be any voltage from 1 to 3.3 V.

Very impressive, asmi  :-+ - how did you make that board?  I presume it was done in a reflow oven?  Is it 4-layer?  Done in KiCAD or something professional?



As for the TPD12S521, will I need this with a TFP410 (http://www.ti.com/lit/ds/slds145c/slds145c.pdf)?
??? Your link is for the wrong datasheet.  Get the right one from TI's website and see what it says...
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 26, 2019, 08:55:55 pm
Use this NXP IC if you want HDMI with digital audio support:
https://www.digikey.com/product-detail/en/nxp-usa-inc/TDA19988BHN-C1551/568-12058-ND/2780351 (https://www.digikey.com/product-detail/en/nxp-usa-inc/TDA19988BHN-C1551/568-12058-ND/2780351)
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 26, 2019, 09:27:20 pm
??? Your link is for the wrong datasheet.  Get the right one from TI's website and see what it says...

Darn it, pasted the wrong link, sorry.  |O
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on October 26, 2019, 10:32:24 pm
Use this NXP IC if you want HDMI with digital audio support:
https://www.digikey.com/product-detail/en/nxp-usa-inc/TDA19988BHN-C1551/568-12058-ND/2780351 (https://www.digikey.com/product-detail/en/nxp-usa-inc/TDA19988BHN-C1551/568-12058-ND/2780351)
Note, external HDMI transmitters are almost useless to the hobbyist due to NDA and licensing restrictions, unless you are fortunate enough to find a leak with enough information (https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/i2c/tda998x_drv.c) to get started.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 26, 2019, 11:40:51 pm
Use this NXP IC if you want HDMI with digital audio support:
https://www.digikey.com/product-detail/en/nxp-usa-inc/TDA19988BHN-C1551/568-12058-ND/2780351 (https://www.digikey.com/product-detail/en/nxp-usa-inc/TDA19988BHN-C1551/568-12058-ND/2780351)
Note, external HDMI transmitters are almost useless to the hobbyist due to NDA and licensing restrictions, unless you are fortunate enough to find a leak with enough information (https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/i2c/tda998x_drv.c) to get started.
The chip I listed, which is freely available without NDA (as is its data sheet), doesn't have HDCP, the true roadblock for hobbyists.  HDMI transmitters with and without audio are allowed.  The hobbyist just isn't allowed to use HDMI's logo or claim authenticated HDMI output in their documentation.

I've never had problems with getting such ICs unless HDCP was built in, however, I've been using Analog Devices HDMI transmitters.  NXP may require a license.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: asmi on October 27, 2019, 12:50:39 am
Very impressive, asmi  :-+ - how did you make that board?  I presume it was done in a reflow oven?  Is it 4-layer?  Done in KiCAD or something professional?
This specific one is a six-layer board, manufactured by WellPCB at $145 for 10 boards. There are 512 Mbytes of DDR3L memory onboard. I designed it in OrCAD Professional, but it should be possible to do it in KiCAD. I'm actually working on a board with a Spartan-7 FPGA and DDR2 memory in KiCAD; the goal is to do it in 4 layers, as this project is meant for beginners. I'm going to publish all sources once I actually make the board and verify that it works. I will also place an HDMI out connector on this board, because the Spartan-7 can also drive HDMI out directly.

As for the TPD12S521, will I need this with a TFP410 (http://www.ti.com/lit/ds/slds145c/slds145c.pdf)?
I didn't need it because Artix FPGA that I have on this board is capable of driving HDMI interface directly.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: legacy on October 27, 2019, 02:52:05 pm
DDR3L

DDR3, the most complex stuff to be designed and handled, especially for beginners.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: asmi on October 27, 2019, 05:51:48 pm
DDR3, the most complex stuff to be designed and handled, especially for beginners.
That's what I thought too in the past. But once I actually tried it, it turned out to be much easier than I ever expected it to be.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 27, 2019, 06:16:43 pm
DDR3, the most complex stuff to be designed and handled, especially for beginners.
That's what I thought too in the past. But once I actually tried it, it turned out to be much easier than I ever expected it to be.
Ignoring some of the advanced features, and if you don't care about shaving every last half-clock of latency or running them at the fastest possible speed, your controller is just about the same as a DDR2 controller.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on October 27, 2019, 06:33:06 pm
DDR3L? Isn't that the stuff that explodes if you turn it off wrong?
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: asmi on October 27, 2019, 06:49:44 pm
Ignoring some of the advanced features, and if you don't care about shaving every last half-clock of latency or running them at the fastest possible speed, your controller is just about the same as a DDR2 controller.
I meant from the HW design standpoint. At least for me DDR3 layout seemed much harder than it actually turned out to be. At least for lower speeds (400 MHz is rather low for DDR3), which is what Spartan-7 and Artix-7 support. Going higher requires using Kintex-7, which is significantly more expensive.
For controller I just use MIG (Memory Interface Generator) - it's free and easy to work with.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: asmi on October 27, 2019, 06:51:42 pm
DDR3L? Isn't that the stuff that explodes if you turn it off wrong?
Well it didn't explode on me...yet ;D
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 27, 2019, 11:51:32 pm
Ok, we are at a point where the OP, 'nockieboy', has enough info to make a decision:

A: The project must not have BGA, or it can have BGA.
If it can't have BGA, the OP must select either a low-cost older FPGA with DRAM, or an all-in-1 IC solution like the MAX10 144-pin QFP.

When using the older FPGA and DRAM, the OP must contend with a more complex PCB, multiple power supply voltages, RAM chips and a bootprom.


B: Price sensitivity.  If 1 BGA IC is allowed, the $16-per-chip Lattice IC with almost 2 megabits internally pretty much kills everything else in logic density and memory.  Lattice, being the third largest FPGA vendor, still has a relatively decent toolset.  However, you would still need a bootprom and a second regulator for the VCC core voltage.


C: Design complexity.
The TFP410 DVI transmitter solves the headache of creating and testing his own DVI output serializer core (I realize that home-made Verilog examples of such a core already exist, but this can be extra work).

Now, with the video modes mentioned, anything he chooses will work; it's a question of the development effort needed and whether there will be room for special treats.

If I wanted a tiny PCB, 1 main IC and an HDMI plug, the MAX10 eliminates a lot of complexity and may be possible to do on a $5-for-10 2-layer PCB***. (You really need to know what you are doing, and the bottom layer is basically a flood-fill GND with a few short trace jumps.)

I would make this board stand-alone with a connector for the 8-bit MCU board.  I would also offer to make the IC's onboard RAM usable as the 8-bit CPU's system memory.

If the OP went with the larger 2-megabit Lattice part, he has the density and speed to offer something like 12 new 256-color sprites, any width up to the width of the screen, with translucent colors, on every new line of video, from anywhere within the block of system memory, in real time without any overhead.  With the MAX10, he could get away with something like 8 sprites at the 180- or 160x240 mode.

For such a project, I would personally recommend using the standard SMPTE 480p base mode of 720x480, 16:9, 27 MHz reference clock, as it will work on any PC monitor as well as any TV sold today.  If the OP wants 640x480, just center-crop the image (make a black bar on the left and right of the image).  Also, this standard mode makes integrating the standard HDMI embedded 48 kHz stereo audio easy, as the exact standard specs are well defined.  Using the 25 MHz VGA 640x480 will add an obstacle to the upgrade path for embedded audio in the future.

This project is nothing more than a large number of start and stop sequence counters driving another set of counters into look-up memory: use the contents to look up more memory pointing to alternate contents of the same chunk of memory, then shift the final content through a palette memory to feed the DVI transmitter, or feed the internal DVI serializer.

This is why I said in an earlier post: run the internal display memory at 16x the speed of the output pixel feed, so you can read, and use each read address to point to another place in RAM, 15 times before creating your final resulting pixel.  You may want to make your video memory 16 bits wide so you can access 16-bit words each time, allowing a 16-bit palette to be stored within system memory instead of special dedicated registers or a memory block.
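A minimal sketch of what I mean (illustration only, not code from any attachment; clk16, phase and pix_addr are assumed to come from your sync generator):

```verilog
// The video RAM is clocked at 16x the pixel rate, so up to 16 dependent
// reads fit into one output pixel period.  Here only two of the 16 slots
// are used: fetch an 8-bit pixel index, then use it to index a palette
// stored in the SAME 16-bit-wide RAM.
module pixel_pipe (
    input  wire        clk16,      // 16x pixel clock
    input  wire [3:0]  phase,      // 0..15 within the current pixel
    input  wire [14:0] pix_addr,   // where the current pixel index lives
    input  wire [14:0] pal_base,   // base address of the palette in RAM
    output reg  [15:0] pixel_out   // final 16-bit colour
);
    reg [15:0] vram [0:32767];     // 64 KB of 16-bit-wide video memory
    reg [15:0] pix_index;

    always @(posedge clk16) begin
        case (phase)
            4'd0: pix_index <= vram[pix_addr];                  // read index
            4'd1: pixel_out <= vram[pal_base + pix_index[7:0]]; // palette read
            default: ;            // 14 spare slots for sprites, text, etc.
        endcase
    end
endmodule
```

The 14 unused slots are where sprite fetches, text overlays and so on would steal their reads.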

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on October 28, 2019, 08:03:58 am
Quote
When using the older FPGA and DRAM, the op must contend with a more complex PCB, multiple power supply voltages, ram chips and bootprom.
A x16 pseudo-static RAM is not especially layout-sensitive, any more than an SRAM, and is fine running at 3.3V at several tens of megahertz.

Quote
However, you would still need a bootprom
Sort of true. A 25 series SPI flash and a dozen pins of header should be plenty, and makes it easy to plug in a USBasp or similar SPI master.

Quote
this standard mode makes integrating the standard HDMI embedded 48Khz stereo audio easy
But the TFP410 doesn't. Where does the audio go in?

Quote
16-bit palette
There's a point at which one can and should say "screw palettes, we have enough color bits." 4k colors is about that point. More important, it's about as much data as you'd want to push around with a Z80 and a blitter.

There's also a point at which a designer learns that there are a crapload of tradeoffs to be made, and (hopefully) to avoid the second-system effect.  ;)
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 28, 2019, 05:46:55 pm
Ok, we are at a point where the OP, 'nockieboy', has enough info to make a decision:

A: The project must not have BGA, or it can have BGA.
If it can't have BGA, the OP must choose either a low-cost older FPGA with DRAM, or an all-in-one IC solution like the MAX10 in a 144-pin QFP.

When using the older FPGA and DRAM, the OP must contend with a more complex PCB, multiple power-supply voltages, RAM chips and a boot PROM.

I'm still waiting on my Spartan 6 dev board, but realise it has SDRAM, not DRAM, on board.  I'd like to be able to test out some DRAM to see how difficult it would be to use it as video RAM in place of internal RAM blocks in the FPGA... If it is viable (both from a complexity AND performance standpoint) then it would be preferable to getting an expensive FPGA with loads of internal RAM.

The MAX10 is tempting, but it's very expensive.  :-\

I have no issue with adding RAM chips and bootprom, even multiple supply voltages so long as it doesn't get TOO complicated (which is highly subjective, I know).  It looks like I'm going to have to switch to KiCAD to design the board no matter what FPGA package / GPU design I go for, as DipTrace's 500 pin/2-layer limit won't cut it with this project.  :o

BGA is definitely out, though, unless/until I build a suitable reflow oven and get the process stable enough to risk running BGA chips through it.

B: Price sensitivity.  If one BGA IC is allowed, the $16 single-chip Lattice part with almost 2 megabits of internal RAM pretty much kills everything else in logic density and memory.  Lattice, being the third-largest FPGA vendor, still has a relatively decent toolset.  However, you would still need a boot PROM and a second regulator for the VCC core voltage.

Sorry, not sure I read that right - Lattice do a $16 FPGA with 2 megabit internal RAM?!?!? :wtf:  That changes the game entirely - if that's correct, I'll get started on a reflow oven straight away and would offer the PCB with the FPGA already soldered if I ever put it into production! :o

Have you mentioned this part already in the conversation?  If so, forgive me as I must have missed the specs (and thus the implications) for it.

C: Design complexity.

If the OP went with the larger 2-megabit Lattice part, he has the density and speed to offer something like 12 new 256-color sprites, any width up to the width of the screen, with translucent colors, on every new line of video, from anywhere within the block of system memory, in real time without any overhead.  With the MAX 10, he could get away with something like 8 sprites in the 180x240 or 160x240 mode.

Yeah, okay, so I'm seeing that I should really get a grip of this fear of BGA soldering and tackle it head on.

For such a project, I would personally recommend using the standard SMPTE 480p base mode of 720x480, 16:9, 27 MHz reference clock, as it will work on any PC monitor as well as any TV sold today.  If the OP wants 640x480, just center-crop the image (make a black bar on the left and right of the image).  Also, this standard mode makes integrating the standard HDMI embedded 48 kHz stereo audio easy, as the exact standard specs are well defined.

Sounds reasonable to me.  Naive/stupid question - would getting a 27MHz clock from a 25MHz clock source be non-trivial?  All the dev boards I've seen have a 25MHz clock source.

This project is nothing more than a large number of start and stop sequence counters driving another set of counters into look-up memory: use the contents to look up more memory pointing to alternate contents of the same chunk of memory, then shift the final content through a palette memory to feed the DVI transmitter, or feed the internal DVI serializer.

You make it sound so easy...  :o
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 28, 2019, 07:24:43 pm
Quote
this standard mode makes integrating the standard HDMI embedded 48Khz stereo audio easy
But the TFP410 doesn't. Where does the audio go in?

Yes of course, this is still a consideration - there must be alternatives that will integrate an audio source into the data stream, though?

Quote
16-bit palette
There's a point at which one can and should say "screw palettes, we have enough color bits." 4k colors is about that point. More important, it's about as much data as you'd want to push around with a Z80 and a blitter.

Yes, I was starting to wonder if it wouldn't just be easier to shunt around 2-bytes-per-pixel if I have the memory space for a 4-bit colour space unrestricted by a palette size.  Of course, using internal RAM in the FPGA will still limit my memory size, and doubling the size of the required video RAM will likely either restrict my maximum resolution for 4-bit colour or restrict me to a colour LUT for the screen resolutions that don't have the memory space for 4-bit colour.  Just not sure about the practicalities of using both methods in an FPGA?

There's also a point at which a designer learns that there are a crapload of tradeoffs to be made, and (hopefully) to avoid the second-system effect.  ;)

I'm hoping to avoid the second-system effect - although, as I'll be learning as I go, there's bound to be improvements and embellishments I can make by going back over old code - but I do, however, have a pretty clear end goal in mind.  :phew:
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 28, 2019, 09:41:20 pm
B: Price sensitivity.  If one BGA IC is allowed, the $16 single-chip Lattice part with almost 2 megabits of internal RAM pretty much kills everything else in logic density and memory.  Lattice, being the third-largest FPGA vendor, still has a relatively decent toolset.  However, you would still need a boot PROM and a second regulator for the VCC core voltage.

Sorry, not sure I read that right - Lattice do a $16 FPGA with 2 megabit internal RAM?!?!? :wtf:  That changes the game entirely - if that's correct, I'll get started on a reflow oven straight away and would offer the PCB with the FPGA already soldered if I ever put it into production! :o

Have you mentioned this part already in the conversation?  If so, forgive me as I must have missed the specs (and thus the implications) for it.

$15.86 for 1, $13.94 each for 25, $13.36 each for 100.
A few pages ago: https://www.eevblog.com/forum/fpga/fpga-vga-controller-for-8-bit-computer/msg2752240/#msg2752240 (https://www.eevblog.com/forum/fpga/fpga-vga-controller-for-8-bit-computer/msg2752240/#msg2752240)

With only one small BGA chip (14 mm x 14 mm), and using the outer pads only, you can make this work on a 4-layer board with tinned pads; because the IC is so small, you can get away with just a good-quality hot-air gun and a syringe of flux, no oven needed.  Watch a few of Louis Rossmann's MacBook repairs where he re-solders small BGA chips.  As long as you have a PCB with solder mask, you can do it easily this way.  (Now, I don't mean populate your PCB with small BGA ICs all over the place.  Dealing with one small IC on each PCB will be manageable with a healthy tube of flux and a hot-air gun, unless you are building 100s of them.  Then use a solder-paste stencil and a toaster oven...)

As for easy, you really need to plan things out ahead, choose an HDL and stick with it.  I personally use SystemVerilog and keep my modules simple enough that they would still work under any compiler, even those which only support regular Verilog.

As for the crux of your system, I'll make a reply tonight and you'll laugh as 33% of your problems vanish with around 2 paragraphs of words...
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on October 28, 2019, 10:50:19 pm
Sounds reasonable to me.  Naive/stupid question - would getting a 27MHz clock from a 25MHz clock source be non-trivial?  All the dev boards I've seen have a 25MHz clock source.
Rational clock synthesis isn't that bad. Most PLLs on FPGAs can do it just by configuring and instantiating a module. Here's an example from FPGA4FUN's HDMI (really DVI-D) project, which declares an instance of the Xilinx digital clock manager (https://www.xilinx.com/support/documentation/ip_documentation/dcm_module.pdf) and configures it to multiply the pixel clock by 10, then declares a clock buffer to force the result onto the chip's low-skew clock distribution networks. You could add a .CLKFX_DIVIDE parameter likewise, replacing the default of 1 to achieve a non-integer overall ratio.
Code: [Select]
DCM_SP #(.CLKFX_MULTIPLY(10)) DCM_TMDS_inst(.CLKIN(pixclk), .CLKFX(DCM_TMDS_CLKFX), .RST(1'b0));
BUFG BUFG_TMDSp(.I(DCM_TMDS_CLKFX), .O(clk_TMDS));  // 250 MHz
Some of the lower end FPGAs that don't have onboard clock synthesizers can make use of an offboard clock synthesizer, such as the Si5351, but those can be harder to configure and you must account for delays to and from the synthesizer manually. In either case, clock domain crossing can be a bit tricky if you need other buses to get things done promptly on command. FIFOs, even in their degenerate case of double-buffering with an S-R latch, are an easy and sometimes cheap, but limited and not necessarily prompt way to get clock domain crossing done.

Yes of course, this is still a consideration - there must be alternatives that will integrate an audio source into the data stream, though?
There's a bit of confusion in the thread on DVI-D vs. HDMI. DVI-D is a digital video standard, and basically open and well-documented. HDMI is a superset of DVI-D that includes in-band packets to set color spaces, send audio data, communicate remote control data, defend Hollywood against their customers, etc. and basically proprietary and NDA-infested. HDMI and DVI-D are basically identical at the electrical level. DVI-D transmitters won't generate the packets you need for audio, and HDMI transmitter public data sheets are intentionally incomplete. To my mind, that means you would need to build (or borrow) a transmitter on board the FPGA so that you can generate those packets, glean data about the register configuration for an HDMI transmitter chip from public leaks such as Linux kernel drivers, OR give up on HDMI audio and settle for an analog or S/PDIF output.

Quote
Yes, I was starting to wonder if it wouldn't just be easier to shunt around 2-bytes-per-pixel if I have the memory space for a 4-bit colour space unrestricted by a palette size.  Of course, using internal RAM in the FPGA will still limit my memory size, and doubling the size of the required video RAM will likely either restrict my maximum resolution for 4-bit colour or restrict me to a colour LUT for the screen resolutions that don't have the memory space for 4-bit colour.  Just not sure about the practicalities of using both methods in an FPGA?
It's just an if-then statement that turns into a mux. When you're more conversant with HDL, you'll see. :)

On that note I would strongly caution that you not make any more design or feature decisions until you have hands-on experience with the medium in which you are working. Even then I would wait until I have first picture coming out of the board to decide how to get from spec to implementation. There are subtleties that you might not be considering, such as that block RAMs are still just dozens of individual separate blocks spread over the chip and you (or the synthesis tool) have to use logic and routing to aggregate them, which can get very expensive (in the sense of latency and  logic resources consumed) for a large array.

fpga4fun.com has much fine introductory reading on FPGAs and some gratifying beginner projects. As a reward you can work all the way through the interfacing material to their HDMI example (really just DVI-D, but enough that you would probably understand where to inject logic for audio and so on once you have the spec) and SDRAM. You could even get a head start, as most FPGA software has a (limited) simulator suite available, which you can use to display the output waveforms of your design without a board in hand. You'll also want to know how to build virtual test benches that provide stimuli to your design in simulation, say fake CPU bus cycles or outside clock sources.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: hamster_nz on October 28, 2019, 11:23:45 pm
Sounds reasonable to me.  Naive/stupid question - would getting a 27MHz clock from a 25MHz clock source be non-trivial?  All the dev boards I've seen have a 25MHz clock source.

This problem can be pretty trivial with a bit of planning.

If you have a 25 MHz source, you multiply it by 54 to get a VCO frequency of 1350 MHz, which is in the allowable range of 800 MHz to 1600 MHz.

- Divide the VCO by 50 to get 27 MHz for your pixel clock
- Divide the VCO by 10 to get the clock to drive the serializers for the DVI-D outputs (using DDR mode).
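As a sketch, on a Xilinx part that could look something like the following. (Illustration only: PLL_BASE is the Spartan-6 primitive name; on 7-series it would be PLLE2_BASE or MMCME2_BASE, and the legal VCO range varies by family and speed grade, so check your data sheet before using these exact numbers.)

```verilog
// 25 MHz in -> 27 MHz pixel clock + 135 MHz DDR serializer clock.
// VCO = 25 MHz * 54 = 1350 MHz; /50 = 27 MHz, /10 = 135 MHz.
PLL_BASE #(
    .CLKIN_PERIOD  (40.0),  // 25 MHz input (period in ns)
    .CLKFBOUT_MULT (54),    // VCO multiplier
    .DIVCLK_DIVIDE (1),
    .CLKOUT0_DIVIDE(50),    // 1350 / 50 = 27 MHz pixel clock
    .CLKOUT1_DIVIDE(10)     // 1350 / 10 = 135 MHz serializer clock (DDR)
) pll_inst (
    .CLKIN   (clk25),
    .CLKFBIN (clkfb),       // feedback loop closed externally
    .CLKFBOUT(clkfb),
    .RST     (1'b0),
    .CLKOUT0 (clk27),
    .CLKOUT1 (clk135),
    .LOCKED  (pll_locked)
);
```

Remember to route the outputs through BUFGs before using them as fabric clocks.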

Oh, it seems that the HDMI spec owners are sending out takedown notices: https://glenwing.github.io/docs/HDMI-1.4b.pdf

Might pay to look for "hdmi specification 1.4 filetype:pdf" in everybody's favorite web search engine while you can, just for future reference.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 29, 2019, 02:45:06 am
If the OP is making his own board, or is expecting to, he can always change the crystal oscillator from 25 MHz to 27 MHz.
This may allow for slower core PLLs on a different FPGA; however, if he uses the PLL setup wizard, he would just type the source clock frequency and the desired output clocks into the wizard, and it would handle all of this on its own.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: hamster_nz on October 29, 2019, 03:58:51 am
If the OP is making his own board, or is expecting to, he can always change the crystal oscillator from 25 MHz to 27 MHz.
This may allow for slower core PLLs on a different FPGA; however, if he uses the PLL setup wizard, he would just type the source clock frequency and the desired output clocks into the wizard, and it would handle all of this on its own.

I must be old school. I find the wizards are:

- slow to use - a simple change takes a long time

- somewhat inconsistent in their output, especially between tool versions.

- very sensitive to inputs (especially when your desired frequencies are a nice decimal number)

- The generated code needs tweaking if you use the generated clocks to do something special (e.g. drive SERDES), as you have to add or remove clock buffers, and once you do this you can't use the wizard to regenerate the code

- unless you know what output constraints you have within the clocking resources of the FPGA you just end up trying things and seeing if you get lucky. It isn't a recipe for a well-engineered design

- for me, they often seem to have different ideas of what a "close enough" solution is. Usually a better "close enough" solution exists

But my biggest gripe is that they fill your code base with code that is "(c) Vendor X" with an explicit license that isn't compatible with open projects:

Do a quick search for

    site:github.com "Copyright(C)" "by Xilinx, Inc"

Here's an example:

Code: [Select]
-- System Generator version 11.1 VHDL source file.
--
-- Copyright(C) 2009 by Xilinx, Inc.  All rights reserved.  This
-- text/file contains proprietary, confidential information of Xilinx,
-- Inc., is distributed under license from Xilinx, Inc., and may be used,
-- copied and/or disclosed only pursuant to the terms of a valid license
-- agreement with Xilinx, Inc.  Xilinx hereby grants you a license to use
-- this text/file solely for design, simulation, implementation and
-- creation of design files limited to Xilinx devices or technologies.
-- Use with non-Xilinx devices or technologies is expressly prohibited
-- and immediately terminates your license unless covered by a separate
-- agreement.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: asmi on October 29, 2019, 04:20:17 am
I must be old school. I find the wizards are:
You don't need to actually generate any code with the wizard. I use the clocking wizard to calculate all the coefficients, and then just place them into my own MMCM/PLL instantiation code. I just couldn't be bothered to calculate this stuff myself ::)
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: asmi on October 29, 2019, 04:21:35 am
If the OP is making his own board, or is expecting to, he can always change the crystal oscillator from 25 MHz to 27 MHz.
Or just add a second (third, fourth, you get the point) oscillator on a board. This is an advantage of making your own board - you can place whatever stuff you need (or you think you need :) ).
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: hamster_nz on October 29, 2019, 04:22:47 am
I must be old school. I find the wizards are:
You don't need to actually generate any code with the wizard. I use the clocking wizard to calculate all the coefficients, and then just place them into my own MMCM/PLL instantiation code. I just couldn't be bothered to calculate this stuff myself ::)

^^^^^^

This is most likely the optimal way to use the clocking wizard.  ;)
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: hamster_nz on October 29, 2019, 04:26:31 am
If the OP is making his own board, or is expecting to, he can always change the crystal oscillator from 25 MHz to 27 MHz.
Or just add a second (third, fourth, you get the point) oscillator on a board. This is an advantage of making your own board - you can place whatever stuff you need (or you think you need :) ).
Just make sure it is connected to a clock-capable pin.   |O

Planning for high speed transceiver reference clocks is the biggest pain. It pays to build a shell of a design just to prove that your desired clocking structure and channel bonding will work.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: asmi on October 29, 2019, 05:02:01 am
Just make sure it is connected to a clock-capable pin.   |O
This is the mistake I made on my very first FPGA board :-DD And never again after that 8)

Planning for high speed transceiver reference clocks is the biggest pain. It pays to build a shell of a design just to prove that your desired clocking structure and channel bonding will work.
If you mean MGTs, it's actually quite easy, as there are only so many ways to do it. Things get a bit "interesting" with Kintex because their transceivers are arranged in a column, as opposed to the top and bottom sides of a die, which have dedicated routes between each other. On Kintex you can only route a clock one quad above or below the one which has the clock source connected. This is why it's impossible to implement PCIE x16 on Kintex devices - only 12 MGTs can be reached by a single clock. At least it's possible to implement a PCIE 3.0 x8 interface (with a bit of soft logic, as the hard IP only supports PCIE 2.x speed).
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 29, 2019, 09:46:57 am
$15.86 for 1, $13.94 each for 25, $13.36 each for 100.

With only one small BGA chip (14 mm x 14 mm), and using the outer pads only, you can make this work on a 4-layer board with tinned pads; because the IC is so small, you can get away with just a good-quality hot-air gun and a syringe of flux, no oven needed.  Watch a few of Louis Rossmann's MacBook repairs where he re-solders small BGA chips.  As long as you have a PCB with solder mask, you can do it easily this way.  (Now, I don't mean populate your PCB with small BGA ICs all over the place.  Dealing with one small IC on each PCB will be manageable with a healthy tube of flux and a hot-air gun, unless you are building 100s of them.  Then use a solder-paste stencil and a toaster oven...)

Wow, okay, sounds like I need to do some research into this then! Thanks BrianHG!  :-+


As for the crux of your system, I'll make a reply tonight and you'll laugh as 33% of your problems vanish with around 2 paragraphs of words...

Sounds great!  :popcorn:

There's a bit of confusion in the thread on DVI-D vs. HDMI. DVI-D is a digital video standard, and basically open and well-documented. HDMI is a superset of DVI-D that includes in-band packets to set color spaces, send audio data, communicate remote control data, defend Hollywood against their customers, etc. and basically proprietary and NDA-infested.

DVI-D transmitters won't generate the packets you need for audio, and HDMI transmitter public data sheets are intentionally incomplete. To my mind, that means you would need to build (or borrow) a transmitter on board the FPGA so that you can generate those packets, glean data about the register configuration for an HDMI transmitter chip from public leaks such as Linux kernel drivers, OR give up on HDMI audio and settle for an analog or S/PDIF output.

Haha! Fair enough, so it's not HDMI I'm going for but DVI-D. Looks like I'll have to handle sound through an analogue or S/PDIF output then.

On that note I would strongly caution that you not make any more design or feature decisions until you have hands-on experience with the medium in which you are working. Even then I would wait until I have first picture coming out of the board to decide how to get from spec to implementation. There are subtleties that you might not be considering, such as that block RAMs are still just dozens of individual separate blocks spread over the chip and you (or the synthesis tool) have to use logic and routing to aggregate them, which can get very expensive (in the sense of latency and  logic resources consumed) for a large array.

I'm keen to get started on the practical aspects of the project, but at the moment work and home life are preventing me from having any major blocks of time where I can make a start, so in the meantime I like to glean as many pearls of wisdom from you guys here as possible.  I've got an Altera Cyclone II EP2C5T144 board, I guess I can make a start on that?  The code should transfer to a Spartan 6, or MAX 10 or Lattice chip without major issues at this stage as the code will be VERY basic, and future FPGAs that I transfer the code to will have more resources rather than less.

If you have a 25 MHz source, you multiply it by 54 to get a VCO frequency of 1350 MHz, which is in the allowable range of 800 MHz to 1600 MHz.

- Divide the VCO by 50 to get 27 MHz for your pixel clock
- Divide the VCO by 10 to get the clock to drive the serializers for the DVI-D outputs (using DDR mode).

Marvellous, thanks hamster_nz.  This is only an issue whilst I'm developing the design on an eval/dev board as the clock source is fixed (to 25MHz usually).  When I make my own custom board, I'll use a 27MHz clock source.

Oh, it seems that the HDMI spec owners are sending out takedown notices: https://glenwing.github.io/docs/HDMI-1.4b.pdf

Might pay to look for "hdmi specification 1.4 filetype:pdf" in everybody's favorite web search engine while you can, just for future reference.

Well spotted - thanks for that.  I've got a copy of the specs now.  :-+

If the OP is making his own board, or is expecting to, he can always change the crystal oscillator from 25Mhz to 27Mhz.

Of course - and that's my plan, but before I get to the stage of making my own board, I'll be using an eval/dev board which has the 25MHz clock source. Probably. :)

EDIT:

Thinking back to BrianHG's comments about using the outer pins of the Lattice chip, I thought it might be a good idea to work out how many IO pins I'm going to need..

System side:

Video side:

GPU utility:

That's a bare minimum of 80 I/O pins, not including external GPU memory connections and clock sources for the FPGA.  Is that about right?
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on October 29, 2019, 11:59:31 pm
Haha! Fair enough, so it's not HDMI I'm going for but DVI-D. Looks like I'll have to handle sound through an analogue or S/PDIF output then.
For now. If you transmit TMDS directly from the FPGA rather than use an external TMDS transmitter, you leave your audio options open.

Quote
I'm keen to get started on the practical aspects of the project, but at the moment work and home life are preventing me from having any major blocks of time where I can make a start, so in the meantime I like to glean as many pearls of wisdom from you guys here as possible.  I've got an Altera Cyclone II EP2C5T144 board, I guess I can make a start on that?  The code should transfer to a Spartan 6, or MAX 10 or Lattice chip without major issues at this stage as the code will be VERY basic, and future FPGAs that I transfer the code to will have more resources rather than less.
Ah, but what have you done with the Cyclone 2 board so far? You don't need a very great block of time to get a blinky up. :) You don't need more than a few not-very-great blocks of time to get the system bus-side interface up, maybe map a block RAM or two onto the bus. These are things you can work on without any external RAM or video connections.
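As a sketch of that first bus-side step, something like the following would do: one register on a Z80 I/O port, writable and readable back. (The port number, names and decoding are my own assumptions for illustration, not anything from your memory map.)

```verilog
// One 8-bit register mapped at an assumed Z80 I/O port (0xA0).
// In a real design, synchronize the Z80 strobes through two flip-flops
// first if the FPGA clock is unrelated to the Z80 clock.
module z80_io_reg (
    input  wire       clk,                 // FPGA system clock
    input  wire       iorq_n, rd_n, wr_n,  // Z80 control strobes, active low
    input  wire [7:0] addr,                // low byte of the Z80 address bus
    input  wire [7:0] din,                 // Z80 data bus in
    output wire [7:0] dout,                // value driven back on a read
    output wire       oe                   // enable for the external bus driver
);
    localparam PORT = 8'hA0;               // assumed port number

    reg  [7:0] r;
    wire       sel = !iorq_n && (addr == PORT);

    always @(posedge clk)
        if (sel && !wr_n) r <= din;        // latch on I/O write

    assign oe   = sel && !rd_n;            // drive the bus only during I/O read
    assign dout = r;
endmodule
```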

System side: looks alright. If you monitor the Z80's multiplexed bus directly, maybe you could narrow the address bus requirement by 8 pins or more. You could also replace a few address lines if you let the memory board do your decoding and run a lead from the unused socket's !CS to the GPU, or mirror the same decoding function on the GPU board. It might go against the spirit of maximum integration, but when you're pin-limited a touch of off-chip logic is sometimes the best approach.

It would be much better if we knew exactly how you were getting a 22-bit address space out of a 16-bit CPU address space. I'm going to wildly guess: you're using an 8-bit writable latch, with its own memory addressing, to choose a 16k block of the 4MB RAM/ROM space to map into a preset 16k of the Z80's address space, with peripherals and/or a BIOS-like ROM fixed-mapped into the rest of the Z80 address space. Would you be better off allocating another 16k or so of the flat Z80 address space to implement an independent bank switching scheme inside the GPU?

Video-audio: 3 is enough for audio output. One clock, one data, one word-select. One pin if you decide to do S/PDIF transmission on the chip.

If using an external transmitter, you also need a pixel clock output. You also might gain from cutting back on the bits per channel. Choose some photos, process them with a posterize filter, and see how few bits you can tolerate. RGB565 is the compromise I'd choose: photos still look very good, fits neatly into 16 bits, not too anachronistic. If you find the slight greening of some greys objectionable, maybe you can drop one of the green bits. The unused LSBs can be tied high or low.
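On feeding a 24-bit transmitter from RGB565 storage: besides tying the unused LSBs high or low, a common alternative is replicating the top bits into them, which maps full-scale white to full-scale output. A sketch (my illustration):

```verilog
// Expand a 16-bit RGB565 pixel to 24-bit RGB888 by MSB replication.
module rgb565_expand (
    input  wire [15:0] pix,      // {R[4:0], G[5:0], B[4:0]}
    output wire [23:0] rgb888
);
    assign rgb888 = { pix[15:11], pix[15:13],   // R: 5 bits + top 3 replicated
                      pix[10:5],  pix[10:9],    // G: 6 bits + top 2 replicated
                      pix[4:0],   pix[4:2] };   // B: 5 bits + top 3 replicated
endmodule
```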

GPU utility: JTAG and boot PROM pins are usually dedicated and not movable. In the final version you will only need one or the other. If using a four-layer board you might be able to escape-route them between power planes on the power layer since they're relatively non-critical. A serial configuration memory is usually a 25 series SPI flash so you only need four signals. Depending on the FPGA chosen, you may need pullups or straps to configuration pins to choose the configuration source. You may be able to treat some or all of the config pins as supply pins, which could keep the layout process (and the layout artist) that much saner.

DRAM: Figure 11 or 12 pins for DRAM addressing, 4, 8, or 16 pins for data, 4-6 for control, and 0-4 for clocking, depending on the particular DRAM technology you use.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 30, 2019, 03:48:48 am

Video side:
  • 26x VSYNC/HSYNC/RGB output (much less if the FPGA is going to produce the DVI-D stream itself)
  • 4?x audio output

Only use 12 or 16 IOs for RGB (15-bit color).  You will need an ENABLE out if you will be using the TI TFP410 for DVI; it accepts multiplexed data, meaning you can feed 24-bit color on 12 outputs.  To transmit HDMI directly, you need 8 outputs (4 balanced pairs) plus another 5 IOs for the DDC and hot-plug-detect features.

-----------------------------------------------------------------------------------------------------------------------------------------------

If you have a 'Cyclone II EP2C5T144', today, you should already have your master sync generator and VGA output generating a frame where you can make a dummy test pattern.

It already has enough internal RAM for you to replicate an Atari 400 video generator with all its graphics capabilities; however, for simplicity, I would at least make a text-mode generator.

Begin by making your raster generator with programmable window margins and a 2-color monochrome text mode.

Here is my first bit of help:
#1 Make a sync generator which can produce a programmable raster, programmable HS, programmable VS, plus a horizontal enable and a vertical enable, plus the two combined for your active-video-region enable.

#2 I've attached a small character generator I once made.  It uses 20 kbit.  It has an Atari 800 font with display RAM.  The output is an enable signal for superimposing the image on an existing background, plus a 3-bit color image (2 bits for the font and 1 bit for higher colors; you may shrink the design to a 1-bit font with a little effort, saving you RAM bits).  I have not included the external palette.v and video mixer.v as they are 30-bit color examples for a larger chip.  Also, I have not included my programmable sync generator as I need to re-install Quartus to verify I'm giving you the right files.
(I only work in Verilog)

Start with your 640x480 or 720x480 raster generator and look at my OSD routine; within Quartus, you should be able to generate a symbol file for my osd.....v which you can paste into your block diagram.  Until you wish to write to the text display memory, you can GND those signals.  The included memory initialization files have the OSD font and a test text dump for the display memory.
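A minimal Verilog sketch of the step-#1 sync generator (illustrative code only, with the standard 640x480@60 geometry as an example; none of this is from the attached files, and the sync outputs are left active-high for clarity):

```verilog
// Programmable raster/sync generator sketch. For 640x480@60:
// h_total=800, h_active=640, h_sync_start=656, h_sync_end=752,
// v_total=525, v_active=480, v_sync_start=490, v_sync_end=492.
module sync_gen_sketch (
    input  wire       clk,          // ~25.175 MHz pixel clock
    input  wire [9:0] h_total, h_active, h_sync_start, h_sync_end,
    input  wire [9:0] v_total, v_active, v_sync_start, v_sync_end,
    output reg        hs, vs,       // sync pulses (invert for real VGA polarity)
    output reg        h_ena, v_ena, // horizontal / vertical enables
    output wire       vid_ena       // the two combined: active video region
);
    reg [9:0] h_cnt = 0, v_cnt = 0;
    assign vid_ena = h_ena & v_ena;

    always @(posedge clk) begin
        // raster counters
        if (h_cnt == h_total - 1'b1) begin
            h_cnt <= 10'd0;
            v_cnt <= (v_cnt == v_total - 1'b1) ? 10'd0 : v_cnt + 1'b1;
        end else
            h_cnt <= h_cnt + 1'b1;

        // registered decodes of the counters
        hs    <= (h_cnt >= h_sync_start) && (h_cnt < h_sync_end);
        vs    <= (v_cnt >= v_sync_start) && (v_cnt < v_sync_end);
        h_ena <= (h_cnt < h_active);
        v_ena <= (v_cnt < v_active);
    end
endmodule
```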
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on October 30, 2019, 06:49:58 am


I'm keen to get started on the practical aspects of the project, but at the moment work and home life are preventing me from having any major blocks of time where I can make a start, so in the meantime I'd like to glean as many pearls of wisdom from you guys here as possible.  I've got an Altera Cyclone II EP2C5T144 board - I guess I can make a start on that?  The code should transfer to a Spartan 6, MAX 10, or Lattice chip without major issues at this stage, as the code will be VERY basic, and future FPGAs that I transfer the code to will have more resources rather than less.

Yes that Cyclone II board is definitely a good place to start if you already have it.

Just bodge-wire together an R2R DAC to one of those IO headers and connect it to a Sub-D 15 connector so that you can plug it into VGA. Believe me, you will learn a lot by the time you have a picture on your monitor.

Once you have that you can also bodge wire your Z80 memory bus onto more pins so that you can actually draw on the screen in software, then just build up from there. You will have a much easier time designing the rest of your system once you see what the strong and weak points of FPGAs are.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 30, 2019, 11:41:45 am
Ah, but what have you done with the Cyclone 2 board so far? You don't need a very great block of time to get a blinky up. :)

Funny you should mention that - after mentioning that I'd got a Cyclone 2 board yesterday, I thought I should actually get the thing out of its packaging and try using it!  I got a blinky up and running last night.  Was a worthwhile exercise, too - there were a couple of hurdles to overcome with silly things like installing the right USB drivers for the Blaster and working out fiddly stuff like how to use the pin manager software, but once that was done I progressed from a single blinking light to a 3-bit counter display in a minute or two (the board only has 3 LEDs).

Going to start working through the fpga4fun tutorials over the next couple of days.  :-+

You don't need more than a few not-very-great blocks of time to get the system bus-side interface up, maybe map a block RAM or two onto the bus. These are things you can work on without any external RAM or video connections.

You could also replace a few address lines if you let the memory board do your decoding and run a lead from the unused socket's !CS to the GPU, or mirror the same decoding function on the GPU board. It might go against the spirit of maximum integration, but when you're pin-limited a touch of off-chip logic is sometimes the best approach.

Absolutely.  As part of my mantra of starting small and building up from there, this is where I'll start out.  Funnily enough, I was looking at the memory board on my Microcom earlier this morning, daydreaming whilst the PC booted up (as you do), when I looked at the empty memory chip socket and the thought occurred that I could run all the connections I'd need for the initial FPGA designs straight from the empty memory socket - that'd cut the system-side IO pins down to 30 at most, as I'd only need A0-A18 for the 512KB window.

It would be much better if we knew exactly how you were getting a 22-bit address space out of a 16-bit CPU address space. I'm going to wildly guess: you're using an 8-bit writable latch, with its own memory addressing, to choose a 16k block of the 4MB RAM/ROM space to map into a preset 16k of the Z80's address space, with peripherals and/or a BIOS-like ROM fixed-mapped into the rest of the Z80 address space.

Almost. :-+ My MMU addresses up to 4 MB of memory by intercepting A14 & A15 from the Z80, replacing them with EA14 & EA15 and a further 6 address lines, EA16-EA21, using two 74HCT670 4x4 register files, which means I can map ANY 16 KB physical memory bank into any of the four 16KB areas in the Z80's logical 64 KB memory space.  The bootstrap ROM has to be in the topmost chip socket (due to pull-ups on the EA address lines as the system's initial power-on state is with the MMU off), so at power-on the topmost 16KB bank of physical memory is copied across the entire 64 KB logical memory space.  That's more than enough for the Z80 to execute some bootstrap code, which maps some RAM banks into the logical memory space for the bootstrap to make use of, then turns the MMU on.

My memory cards have four 'sockets' on them for chips that are selected by EA19 & EA20 (well, three, as the first one is an SMD SRAM chip - RAM is sort of compulsory in most computer systems).  The other three are DIP sockets.  Each socket can take up to 512 KB chips, so one memory card gives a potential of 2 MB memory space.  I can add an identical memory card to the system, which adds another 4 'sockets', and pull a jumper, which allows EA21 to select the memory card, giving 4 MB memory space.
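For anyone following along, the translation the two '670s perform can be modelled in a few lines of Verilog (a hypothetical behavioural model: the names are invented, and the real register files are written asynchronously by I/O cycles rather than on a clock):

```verilog
// Behavioural model of the 74HCT670-pair MMU: four 8-bit entries,
// indexed by Z80 A15:A14, supplying EA21:EA14 to the physical bus.
module mmu_model (
    input  wire       clk,      // modelled synchronously for simplicity
    input  wire       we,       // write strobe from the MMU I/O decode
    input  wire [1:0] wr_sel,   // which of the four 16KB windows to program
    input  wire [7:0] wr_data,  // new EA21:EA14 value for that window
    input  wire       mmu_on,   // MMU enable; off at power-on
    input  wire [1:0] a15_a14,  // Z80 A15:A14 pick the active entry
    output wire [7:0] ea        // EA21:EA14 out
);
    reg [7:0] bank [0:3];
    always @(posedge clk)
        if (we) bank[wr_sel] <= wr_data;

    // MMU off: the pull-ups force the EA lines high, selecting the
    // topmost 16KB bank (the bootstrap ROM) for every window.
    assign ea = mmu_on ? bank[a15_a14] : 8'hFF;
endmodule
```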

Would you be better off allocating another 16k or so of the flat Z80 address space to implement an independent bank switching scheme inside the GPU?

Don't think so - I can allocate anything up to 512KB to the GPU - unless I add DRAM or SRAM as external memory for the GPU, I can't see me needing all that space to access the GPU's internal video RAM...?

If you have a 'Cyclone II EP2C5T144' today, you should already have your master sync generator and VGA output generating a frame where you can make a dummy test pattern.

It already has enough internal RAM for you to replicate an Atari 400 video generator with all its graphics capabilities; however, for simplicity, I would at least make a text-mode generator.

Grant Searle has an entire Z80/6809 computer AND VGA output that fits into one of these devices.  I was toying with the idea of taking his code, stripping out everything but the VGA generation parts, and tweaking it to take external input via a serial interface initially, so that I can play with the generation of characters etc. and get a working video output for my Microcom very quickly that I can then build upon.  As Grant's code implements VT-100 emulation, it should be quite flexible and may be a good base to learn from?

Begin by making your raster generator with programmable window margins and a 2-color monochrome text mode.

Here is my first bit of help:
#1 make a sync generator which can make a programmable raster, programmable HS, programmable VS, plus horizontal enable and vertical enable, plus the two combined for your active-video-region enable.

This is key to having various screen modes and is certainly on my to-do list!  :-+

#2 I've attached a small character generator I once made.

You have? I can't find it? :-//

It uses 20 kbit and has an Atari 800 font with display RAM.  The output is an enable signal for superimposing the image on an existing background, plus a 3-bit color image (2 bits for the font and 1 bit for higher colors; you may shrink the design to a 1-bit font with a little effort, saving you RAM bits).  I have not included the external palette.v and video mixer.v as they are 30-bit color examples for a larger chip.  Also, I have not included my programmable sync generator, as I need to re-install Quartus to verify I'm giving you the right files.
(I only work in Verilog)

Thank you very much for your help.  I'm very much a visual learner and having an example I can see working helps my understanding massively.  I'm not precious over Verilog/VHDL to be honest, will likely go with whichever seems most logical to me.

Start with your 640x480 or 720x480 raster generator and look at my OSD routine; within Quartus, you should be able to generate a symbol file for my osd.....v which you can paste into your block diagram.  Until you wish to write to the text display memory, you can GND those signals.  The included memory initialization files have the OSD font and a test text dump for the display memory.

Thanks again - it's immensely valuable just seeing how these projects fit together at the moment, so anything like this is a great help.  :phew:

Just bodge-wire together an R2R DAC to one of those IO headers and connect it to a Sub-D 15 connector so that you can plug it into VGA. Believe me, you will learn a lot by the time you have a picture on your monitor.

Once you have that you can also bodge wire your Z80 memory bus onto more pins so that you can actually draw on the screen in software, then just build up from there. You will have a much easier time designing the rest of your system once you see what the strong and weak points of FPGAs are.

There are weak points to FPGAs?  :o ;)

Well, that's the plan.  I'm hoping I can make some progress over the next few days - will post some pics if I can.  ;D
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 30, 2019, 12:13:18 pm

If you have a 'Cyclone II EP2C5T144' today, you should already have your master sync generator and VGA output generating a frame where you can make a dummy test pattern.

It already has enough internal RAM for you to replicate an Atari 400 video generator with all its graphics capabilities; however, for simplicity, I would at least make a text-mode generator.

Grant Searle has an entire Z80/6809 computer AND VGA output that fits into one of these devices.  I was toying with the idea of taking his code, stripping out everything but the VGA generation parts, and tweaking it to take external input via a serial interface initially, so that I can play with the generation of characters etc. and get a working video output for my Microcom very quickly that I can then build upon.  As Grant's code implements VT-100 emulation, it should be quite flexible and may be a good base to learn from?

Begin by making your raster generator with programmable window margins and a 2-color monochrome text mode.

Here is my first bit of help:
#1 make a sync generator which can make a programmable raster, programmable HS, programmable VS, plus horizontal enable and vertical enable, plus the two combined for your active-video-region enable.

This is key to having various screen modes and is certainly on my to-do list!  :-+

#2 I've attached a small character generator I once made.

You have? I can't find it? :-//
Just click on the link at the bottom of my post, it says " * Altera_OSD_20kbit.zip (12.86 kB - downloaded 0 times.) "
Quote
It uses 20 kbit and has an Atari 800 font with display RAM.  The output is an enable signal for superimposing the image on an existing background, plus a 3-bit color image (2 bits for the font and 1 bit for higher colors; you may shrink the design to a 1-bit font with a little effort, saving you RAM bits).  I have not included the external palette.v and video mixer.v as they are 30-bit color examples for a larger chip.  Also, I have not included my programmable sync generator, as I need to re-install Quartus to verify I'm giving you the right files.
(I only work in Verilog)

Thank you very much for your help.  I'm very much a visual learner and having an example I can see working helps my understanding massively.  I'm not precious over Verilog/VHDL to be honest, will likely go with whichever seems most logical to me.

Start with your 640x480 or 720x480 raster generator and look at my OSD routine; within Quartus, you should be able to generate a symbol file for my osd.....v which you can paste into your block diagram.  Until you wish to write to the text display memory, you can GND those signals.  The included memory initialization files have the OSD font and a test text dump for the display memory.

Thanks again - it's immensely valuable just seeing how these projects fit together at the moment, so anything like this is a great help.  :phew:

Stick with 1 video mode of 720x480.  Your different video modes will just be different pixel multiples sitting on top of that master reference mode.

Also, write your own VGA raster generator in Verilog from scratch.  You will need the learning experience, and there will be special additions you will need to make for your 8-bit bitplane video modes and things like sprites in the future, which will be nothing but headaches if you are trying to dissect someone else's coding style while sticking/adding your own stuff on top.

A VGA generator is so simple that, if you have problems, you should be posting your code here on this forum, asking for help and recommendations.

The OSD generator I gave you illustrates simple use of Altera's dual-port memories, and the effort I went through to deal with the fact that, to get the top clock (FMAX) capabilities, you need to clock-latch the address going into the RAM as well as clock the data coming out, making a 2-clock delay from address to valid data.  You will see I have additional delay latches along the video pipe to synchronize the incoming control signals to a new set of output-delayed controls which are parallel with the font-generated image.
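The 2-clock latency and the matching control-signal delay can be sketched like this (illustrative code, not BrianHG's actual OSD source):

```verilog
// Sketch of a registered-address, registered-output block RAM read:
// data appears 2 clocks after the address, so any control signal
// travelling alongside needs a matching 2-stage delay to stay in step.
module ram_with_delay_match (
    input  wire       clk,
    input  wire [9:0] addr,
    input  wire       ena_in,   // control signal entering with the address
    output reg  [7:0] q,        // data, valid 2 clocks after addr
    output reg        ena_out   // ena_in delayed 2 clocks, aligned with q
);
    reg [7:0] mem [0:1023];     // inferred block RAM (contents not shown)
    reg [9:0] addr_r;
    reg       ena_d1;

    always @(posedge clk) begin
        addr_r  <= addr;        // clock 1: address register inside the RAM
        q       <= mem[addr_r]; // clock 2: output data register
        ena_d1  <= ena_in;      // matching pipeline for the control signal
        ena_out <= ena_d1;
    end
endmodule
```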

The OSD generator I made is just to give you a boost and something to play with.  You will be remaking your own eventually, using 1 huge block of memory where you can selectively choose the base memory for the font as well as where each line of the display output points to in the same huge chunk of memory.

--------------------
Note: in my osd_generator.v code, the HS it takes in needs to be only 1 pixel wide, and the generator will start spitting out pixels within 5 clocks.  This means that within your VGA sync generator, you need a dedicated horizontal pulse output just for the OSD generator.  This pulse should be in the middle of the active video display window, not at the beginning of the actual horizontal sync.  The same goes for the VS signal on my OSD generator input: it should pulse for 1 line of video prior to the beginning of the line where you want your display output.  I recommend you make these 2 horizontal and vertical pulse positions programmable, so you can move my test box around the screen with just the 2 starting coordinates.
As you add sprites, other video modes, and potentially programmable vertical interrupts, these will just be additional dedicated programmable X and Y coordinates, which are nothing more than an equality compare of your VGA sync generator's horizontal and vertical counters against a set of registers which your CPU can address and write new figures into.
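Such a programmable coordinate compare might look like this (an illustrative sketch; signal names are invented, not from the OSD code):

```verilog
// CPU-programmable screen-position trigger: a plain equality compare
// between the sync generator's counters and a pair of writable registers.
module coord_trigger (
    input  wire       clk,
    input  wire [9:0] h_cnt, v_cnt,  // from the VGA sync generator
    input  wire [9:0] x_reg, y_reg,  // CPU-writable coordinate registers
    output reg        x_hit,         // pulses each line as the beam passes x_reg
    output reg        xy_hit         // pulses once per frame at (x_reg, y_reg)
);
    always @(posedge clk) begin
        x_hit  <= (h_cnt == x_reg);
        xy_hit <= (h_cnt == x_reg) && (v_cnt == y_reg);
    end
endmodule
```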

I assume you are designing your own video generator and not trying to perfectly replicate an old EGA video card, correct?
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on October 30, 2019, 12:41:57 pm
There are weak points to FPGAs?  :o ;)

Well, that's the plan.  I'm hoping I can make some progress over the next few days - will post some pics if I can.  ;D

That they are expensive, complicated things that are a pain in the ass to program for.

But in exchange for that, they can do some digital jobs that no other kind of chip out there can do. That's why FPGAs are only used in cases where the job can't be done using an MCU or a special-purpose digital chip. And your application of a custom video card certainly is something that only an FPGA can do.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: legacy on October 30, 2019, 01:11:30 pm
O.T.
does anyone happen to have documentation regarding the timing of CGA frames?
it's for a crazy project, currently suspended, about interfacing a CGA tube to a VDU on an FPGA
VDU means "text only" display.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 30, 2019, 01:24:23 pm
O.T.
does anyone happen to have documentation regarding the timing of CGA frames?
it's for a crazy project, currently suspended, about interfacing a CGA tube to a VDU on an FPGA
VDU means "text only" display.
I believe they match 240p RGB.  Same as 240p NTSC.
https://en.wikipedia.org/wiki/Color_Graphics_Adapter (https://en.wikipedia.org/wiki/Color_Graphics_Adapter)
Use a pixel clock of 14.31818 MHz and copy the line size, HS, and VS timing of NTSC.
I think it's 910 clocks horizontally and 262 lines vertically.  You would need to find NTSC's display window region and place your picture inside that...
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 30, 2019, 02:47:49 pm
#2 I've attached a small character generator I once made.

You have? I can't find it? :-//
Just click on the link at the bottom of my post, it says " * Altera_OSD_20kbit.zip (12.86 kB - downloaded 0 times.) "

I must be going mad then, because I'm sure it wasn't there when I looked last time!  :-DD

Also, write your own VGA raster generator in Verilog from scratch.  You will need the learning experience, and there will be special additions you will need to make for your 8-bit bitplane video modes and things like sprites in the future, which will be nothing but headaches if you are trying to dissect someone else's coding style while sticking/adding your own stuff on top.

Yes, fair point.  I've got some tutorials to work through first to learn the basics of VHDL/Verilog, but doing this myself (mostly - will likely need some advice on the way!) will be more of an achievement, too.  You can't beat knowing your code inside out.

The OSD generator I made is just to give you a boost and something to play with.  You will be remaking your own eventually, using 1 huge block of memory where you can selectively choose the base memory for the font as well as where each line of the display output points to in the same huge chunk of memory.

Thanks for the boost and all the help so far in this thread. I feel like I've got the spec for the goal I want to achieve, I just need to learn to play football now.  ;D

I assume you are designing your own video generator and not trying to perfectly replicate an old EGA video card, correct?

Definitely just designing my own video generator - I have no intentions of trying to replicate any other type of video card (unless it's a good design that would be foolish to ignore).
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: legacy on October 30, 2019, 06:39:18 pm
@BrianHG
thanks!  :D
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on October 31, 2019, 10:11:56 am
Almost. :-+ My MMU addresses up to 4 MB of memory by intercepting A14 & A15 from the Z80, replacing them with EA14 & EA15 and a further 6 address lines, EA16-EA21, using two 74HCT670 4x4 register files, which means I can map ANY 16 KB physical memory bank into any of the four 16KB areas in the Z80's logical 64 KB memory space.  The bootstrap ROM has to be in the topmost chip socket (due to pull-ups on the EA address lines as the system's initial power-on state is with the MMU off), so at power-on the topmost 16KB bank of physical memory is copied across the entire 64 KB logical memory space.  That's more than enough for the Z80 to execute some bootstrap code, which maps some RAM banks into the logical memory space for the bootstrap to make use of, then turns the MMU on.

My memory cards have four 'sockets' on them for chips that are selected by EA19 & EA20 (well, three as the first one is an SMD SRAM chip as RAM is sort of compulsory in most computer systems).  The other three are DIP sockets.  Each socket can take up to 512 KB chips, so one memory card gives a potential of 2 MB memory space.  I can add an identical memory card to the system which adds another 4 'sockets' and pull a jumper, which allows EA21 to select the memory card, giving 4 MB memory space.
Well done.  :clap:

512kB of video memory should be plenty.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 31, 2019, 10:58:01 am
Okay, some code...  Not sure how after all this talk of VHDL, but I've ended up writing this in Verilog!  :-DD

I've made a stab at writing the HSYNC and VSYNC counter, attached below.  Obviously, on its own it won't do anything but I'm working on the topmost component at the moment - I just want to check I'm going down the right lines and using the right timing values.  :)

EDIT:

Also, with all the tutorials and projects floating around on the internet, it's almost impossible to make this 100% my own work - which clearly this isn't - but at least I understand what I'm doing at the moment and am picking up the Verilog syntax.   ^-^
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: legacy on October 31, 2019, 11:11:00 am
Quote
For 60Hz VGA pix_clk should be 25.175MHz, but 25MHz should work?

Usually, yes. VGA monitors are tolerant.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on October 31, 2019, 12:28:14 pm
That Verilog code looks pretty good for your first time with it.

My only complaint is how reset is done. "if(reset)" should also have an else block under it so that it stops everything else from happening during reset. With these <= assignments only the last one to a given register takes effect, so a high "pix_clk" signal would override the reset. Also, reset is generally included in the sensitivity list on the always @ () to make it an asynchronous reset (this means reset works even when the clock is not running).

Also, I would not give "pix_clk" that name, as it is not a clock. If you leave pix_clk high for more than 1 clock cycle you end up moving more than one pixel, so its behavior is more akin to an enable signal. I would call it something like pix_enable, pix_valid, or pix_increment. The module name could also be a bit more descriptive, such as "sync_gen_640x480", or just call it "sync_generator" and turn the relevant "localparam" into "parameter". That way it will be 640x480 by default, but upon instantiating the module you can add "#(320,240)" on the end and it will reconfigure for that resolution without changing any code in this file. This is formally called a parameterized module.
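A minimal skeleton of that parameterized-module idea (illustrative code, not nockieboy's attachment; the 800x525 totals are hard-coded here for brevity, though a real version would make them parameters too):

```verilog
// Parameterized sync generator skeleton: 640x480 by default,
// overridable at instantiation time.
module sync_generator #(
    parameter H_RES = 640,   // active pixels per line
    parameter V_RES = 480    // active lines per frame
) (
    input  wire clk,
    input  wire reset,       // active-high, per FPGA convention
    output wire pix_enable   // renamed from pix_clk: an enable, not a clock
);
    reg [9:0] h_count = 0, v_count = 0;
    assign pix_enable = (h_count < H_RES) && (v_count < V_RES);

    always @(posedge clk) begin
        if (reset) begin
            h_count <= 10'd0;
            v_count <= 10'd0;
        end else if (h_count == 10'd799) begin  // 800 clocks per line
            h_count <= 10'd0;
            v_count <= (v_count == 10'd524) ? 10'd0 : v_count + 1'b1;  // 525 lines
        end else
            h_count <= h_count + 1'b1;
    end
endmodule
```

Instantiating it as `sync_generator #(320, 240) sg (.clk(clk), .reset(rst), .pix_enable(pe));` then overrides the defaults without touching this file.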

In general, old CRTs will take pretty much any timing you throw at them, be it weird porch times or a frame rate of 98.777366 Hz; they will display it all, just as long as your timings are within reason for the CRT's circuitry. So no frame rates of 5 Hz or 1000 Hz, and no 10- or 10000-line frames. Modern LCDs, on the other hand, will be quite picky about what you feed them, as they have crappy upscalers that only know how to upscale certain resolutions properly, and the LCD controller can get upset about unreasonably fast or slow timings much more easily.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: asmi on October 31, 2019, 01:23:19 pm
Also, reset is generally included in the sensitivity list on the always @ () to make it an asynchronous reset (this means reset works even when the clock is not running).
It is a very bad idea to "generally" do that in FPGA designs, because many internal blocks only have synchronous resets. I have seen many designs fail timing only because of these asynchronous resets!  Only do it when you absolutely need to, and you know what you're doing and are aware of the consequences!
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 31, 2019, 01:27:00 pm
That Verilog code looks pretty good for your first time with it.

Thanks Berni, but I can't claim all the credit - I've referred to a couple of different sources to put this code together.

My only complaint is how reset is done. "if(reset)" should also have an else block under it so that it stops everything else from happening during reset. With these <= assignments only the last one to a given register takes effect, so a high "pix_clk" signal would override the reset.

Ah, okay, I understand.  I just intended reset to zero the counters, but I guess stopping everything from happening whilst reset is active will completely clear the display too, for some feedback to the user.  reset is active-low on the board, but I'm inverting it in the top-level module before it gets to the sync generator code here.

Also, reset is generally included in the sensitivity list on the always @ () to make it an asynchronous reset (this means reset works even when the clock is not running).

Thanks  :-+. It's tips like this I wouldn't otherwise pick up until something went wrong.

Also, I would not give "pix_clk" that name, as it is not a clock. If you leave pix_clk high for more than 1 clock cycle you end up moving more than one pixel, so its behavior is more akin to an enable signal. I would call it something like pix_enable, pix_valid, or pix_increment. The module name could also be a bit more descriptive, such as "sync_gen_640x480", or just call it "sync_generator" and turn the relevant "localparam" into "parameter". That way it will be 640x480 by default, but upon instantiating the module you can add "#(320,240)" on the end and it will reconfigure for that resolution without changing any code in this file. This is formally called a parameterized module.

All good points and the code is amended accordingly (updated file attached).

In general, old CRTs will take pretty much any timing you throw at them, be it weird porch times or a frame rate of 98.777366 Hz; they will display it all, just as long as your timings are within reason for the CRT's circuitry. So no frame rates of 5 Hz or 1000 Hz, and no 10- or 10000-line frames. Modern LCDs, on the other hand, will be quite picky about what you feed them, as they have crappy upscalers that only know how to upscale certain resolutions properly, and the LCD controller can get upset about unreasonably fast or slow timings much more easily.

Yeah, I'll be testing the FPGA out with an LCD screen, so it remains to be seen how picky it is with the timings...  :-BROKE

I've also included the top-level module, vga01.v, for feedback.  Am using ASIC World (http://www.asic-world.com/verilog/index.html) as my Verilog reference.

Also, reset is generally included in the sensitivity list on the always @ () to make it an asynchronous reset (this means reset works even when the clock is not running).
It is a very bad idea to "generally" do that in FPGA designs, because many internal blocks only have synchronous resets. I have seen many designs fail timing only because of these asynchronous resets!  Only do it when you absolutely need to, and you know what you're doing and are aware of the consequences!

Aww jeez, Rick! Should I / shouldn't I?  :-//
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: asmi on October 31, 2019, 02:07:14 pm
Aww jeez, Rick! Should I / shouldn't I?  :-//
Only use an async reset if you need to, at least when working with Xilinx FPGAs. Most of their hard blocks (BRAM, DSP, SERDES) only have synchronous resets, so you will have to re-synchronize it internally anyway to reset those blocks. If my memory serves, the only internal blocks that do have an async reset are the PLL/MMCM and the flip-flops.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on October 31, 2019, 03:05:32 pm
That Verilog code looks pretty good for your first time with it.

Thanks Berni, but I can't claim all the credit - I've referred to a couple of different sources to put this code together.

My only complaint is how reset is done. "if(reset)" should also have an else block under it so that it stops everything else from happening during reset. With these <= assignments only the last one to a given register takes effect, so a high "pix_clk" signal would override the reset.

Ah, okay, I understand.  I just intended reset to zero the counters, but I guess stopping everything from happening whilst reset is active will completely clear the display too, for some feedback to the user.  reset is active-low on the board, but I'm inverting it in the top-level module before it gets to the sync generator code here.

Copy-paste is how most programming is done anyway. As long as you understand what the code you copy-pasted does, you are fine. I think learning to code by example is the best way to go.

And yes, zeroing the counters looks like it was your intention, but if pix_clk is high, the reset signal will have no effect whether it's 1 or 0, because the bottom part of the code will continue counting the registers. It's only when pix_clk is 0 that the reset will actually reset the counters. This is an important thing to know in Verilog, since <= and = assignments work in different ways. Though a reset is not really needed that much here, since the counters could initialize in any random state and would eventually get back into the normal counting sequence once they reach the end, it tends to be good practice to start from a known reset state.

And yes, an active-high reset is usually used inside FPGAs, unlike the active-low convention on discrete chips. There are architectural reasons for that, and it makes more sense to just write "if(reset)".

Aww jeez, Rick! Should I / shouldn't I?  :-//
Only use an async reset if you need to, at least when working with Xilinx FPGAs. Most of their hard blocks (BRAM, DSP, SERDES) only have synchronous resets, so you will have to re-synchronize it internally anyway to reset those blocks. If my memory serves, the only internal blocks that do have an async reset are the PLL/MMCM and the flip-flops.

Okay, I admit I should have been a bit clearer about the reasons behind it.

This asynchronous vs. synchronous reset question is unfortunately quite chip-architecture specific. Yes, it is true that a lot of hardware blocks in FPGAs are synchronous-reset only, but those are usually instantiated separately, outside of always@ blocks, and tend to have their reset signal directly connected to whatever your master system reset signal is. As such they don't tend to care what kind of reset it is, but it is strongly recommended that the edge coming out of reset is synchronized to the clock (this makes sure everything comes out of reset on the same clock edge), so you will tend to have a reset synchronizer block right at the reset signal source to clean it up.

But where it does make a difference is in the logic inside the always@ statement, as its behavior is directly affected by which signals are on the sensitivity list inside the (). Different FPGAs have different LEs (Logic Elements), which means such logic might get implemented in a different way. I will admit I have only worked with Xilinx a few times, so I can't say for sure what's best there. But most of the lineup from Lattice and from Altera, including the Cyclone II, has LE blocks that include asynchronous clear signals. This means an asynchronous clear has zero cost to the timing margin, while a synchronous clear requires an extra AND gate in front. However, this only works for clearing registers to zero with an active-high reset signal; trying to set them to 1 with an active-low reset signal would add extra logic that can slow things down.

If you are not trying to push the design to high speeds, it probably does not matter, and if you ignore the architecture, I agree that synchronous resets are the safer way to go.
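For reference, the reset-synchronizer block mentioned above commonly takes this shape (a generic sketch, not taken from any of the designs discussed): assert asynchronously, deassert synchronously.

```verilog
// Generic reset synchronizer: the output reset asserts immediately
// (even with no clock), but deasserts through two flip-flops on clk,
// so all downstream logic leaves reset on the same clock edge.
module reset_sync (
    input  wire clk,
    input  wire arst_n,  // asynchronous active-low reset in (e.g. board button)
    output wire rst      // synchronized active-high reset out
);
    reg r1 = 1'b1, r2 = 1'b1;
    always @(posedge clk or negedge arst_n) begin
        if (!arst_n) begin
            r1 <= 1'b1;  // async assert
            r2 <= 1'b1;
        end else begin
            r1 <= 1'b0;  // sync deassert, 2-flop chain
            r2 <= r1;
        end
    end
    assign rst = r2;
endmodule
```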
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on October 31, 2019, 08:31:18 pm
Hmm.. I'm obviously missing something from the Quartus II manual, as I'm getting warnings when I try to compile the code I attached earlier.  I'm getting:

Code: [Select]
Warning (10230): Verilog HDL assignment warning at sync_generator.v(60): truncated value with size 32 to match size of target (10)
Warning (10230): Verilog HDL assignment warning at sync_generator.v(61): truncated value with size 32 to match size of target (10)
Warning (10230): Verilog HDL assignment warning at sync_generator.v(77): truncated value with size 32 to match size of target (10)
Warning (10230): Verilog HDL assignment warning at sync_generator.v(80): truncated value with size 32 to match size of target (10)

... which is referring to these lines:

Code: [Select]
60: assign x = (h_count < DA_STA) ? 0 : (h_count - DA_STA); // x is zero if current pixel is before display area
61: assign y = (v_count >= VA_END) ? (VA_END - 1) : (v_count); // y is VA_END-1 if v_count is outside display area

77: v_count <= v_count + 1;

80:   h_count <= h_count + 1;

It seems the warning is due to the literal values '0' and '1'?  Any way around that?

I'm getting similar warnings for unsigned ints in vga01.v as well.

Can't seem to bottom-out these warnings though:
Code: [Select]
Warning (10034): Output port "VGA_R[0]" at vga01.v(12) has no driver
Warning (10034): Output port "VGA_G[0]" at vga01.v(13) has no driver
Warning (10034): Output port "VGA_B[0]" at vga01.v(15) has no driver

I have assigned the signals to pins in the Pin Manager, but I'm clearly missing something else important as well?

Oh, and:

Code: [Select]
Warning (13024): Output pins are stuck at VCC or GND
Warning (13410): Pin "VGA_R[0]" is stuck at GND
Warning (13410): Pin "VGA_G[0]" is stuck at GND
Warning (13410): Pin "VGA_B[0]" is stuck at GND

I'm Googling for advice as well, but thought I'd pop these here in case anyone following along later comes up against the same problems.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on October 31, 2019, 09:01:58 pm
It seems the warning is due to the literal values '0' and '1'?  Any way around that?
By googling quartus warning 10230 I found Intel recommending (https://www.intel.com/content/www/us/en/programmable/support/support-resources/knowledge-base/solutions/rd06102014_970.html) explicit width specification on constants to suppress the warning.
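
For example, sizing the literals to the target width silences warning 10230 (a sketch based on the lines quoted above):

```verilog
// Unsized literals default to 32 bits and get truncated to fit:
h_count <= h_count + 1;        // warns: 32-bit value into 10-bit reg

// Explicitly sized literals match the target width:
h_count <= h_count + 10'd1;    // no warning
assign x = (h_count < DA_STA) ? 10'd0 : (h_count - DA_STA);
```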

Quote
I have assigned the signals to pins in the Pin Manager, but I'm clearly missing something else important as well?
I consider warnings in a synthesizer as polite FYIs. Unfortunately, that means I have to think when reviewing the warnings to see whether they really are a problem, i.e. whether something I really needed got simplified away.

The 10034 might be a real problem. Can we have a look at vga01.v? And your top file? Are you actually driving them? Do you need to drive them? The 13410 will probably not be a real problem of its own, if it resolves once you have provided drivers for that output port, and if you want them "stuck" at 0.

Another note, it helps readability to use the C conventions for the case of constants (all caps) vs. variables (not all caps).
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on October 31, 2019, 09:42:41 pm
Although:
Code: [Select]
// generate sync signals (active low for 640x480)
assign hsync = ~((h_count >= HS_STA) & (h_count < HS_END));
assign vsync = ~((v_count >= VS_STA) & (v_count < VS_END));
is absolutely correct, the >= and < comparisons do eat up gates when compiling, and your output driving a pin will be a combinational, unlatched logic equation based directly on your counter bits.  To get a clean output and a better Fmax, allowing Quartus's Fitter to better route the logic in your FPGA to meet timing constraints, I would:
A) Make Hsync and Vsync a register.
B) Inside your if (pix_enable), make the logic look like this

if (h_count == HS_STA) begin
    hsync <= 1;
 end else if (h_count == HS_END) hsync <= 0;
if (v_count == VS_STA) begin
    vsync <= 1;
 end else if (v_count == VS_END) vsync <= 0;

Remember, the == only uses 10 XOR gates plus a 10-input AND gate, since you are comparing two 10-bit numbers.  With your GREATER-than-or-EQUALS and LESS-than, because of the fixed constants you chose, it may compile to a small number of gates with a high maximum clock frequency (Fmax), but once you change all your 'localparam's to registers which your CPU can address and change, including default 'reset' values, the >= and < will eat up gates and lower your maximum possible Fmax.  (You may only be thinking about 25 MHz now; however, if you want your core to run at 200 MHz, with your pix_enable running once every 8 clocks (still a 25 MHz pixel), such an optimization will probably be needed further down the road, while just using the resources of your 8-bit CPU to do the math when writing these new registers.)

*Being registers, those outputs are now delayed by 1 pixel clock, but they will be cleanly timed to the rising edge of your FPGA PLL internal clock: those DFF register outputs don't have any logic gates to go through and will feed the IO pin directly.  (Though, I usually add another DFF stage/pipe to all my DAC and sync outputs, allowing the compiler/fitter to bury the logic anywhere deep in the FPGA, then use each pin's macrocell flip-flop registers to drive the pin itself, generating the cleanest possible outputs.)
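
That extra output stage might be sketched like this (my own register names; the _int registers are assumed to be fed by the comparison logic described above):

```verilog
reg hsync_int, vsync_int;   // internal syncs, deep in the fabric
reg hsync_pin, vsync_pin;   // final stage, drives the IO pin directly

// The fitter can place these last flip-flops in the IO cell itself,
// so the pin sees a clean, glitch-free edge aligned to clk.
always @(posedge clk) begin
    hsync_pin <= hsync_int;
    vsync_pin <= vsync_int;
end
```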

As for your Assign X&Y, again, doing all that realtime arithmetic is not a problem:
Code: [Select]
// keep x and y bound within the display area
assign x = (h_count < DA_STA) ? 0 : (h_count - DA_STA); // x is zero if current pixel is before display area
assign y = (v_count >= VA_END) ? (VA_END - 1) : (v_count); // y is VA_END-1 if v_count is outside display area

However, like the Hsync and Vsync, I personally make these outputs a single bit which turns on or off using the same simple == equality trick I used to generate the Hsync and Vsync.  I rely on my video graphics generator having an address generator which may increment its count every line, every second line, or every third or fourth line.

Also, beyond that one master image X&Y output enable, which is REQUIRED by all DVI transmitters and which may also be used as an image-crop mute to protect the video output borders for analog VGA DACs (outside the active video area, the RGB level must always be black), I usually add something like another 16 programmable X&Y on/off output flags which may be used to position sprites or superimposed hardware video windows.  Some of these 16 may be positioned after the active video region, allotting a time slot which may be used to signal memory access cycles to fill your audio DAC buffer, or to load a new set of X&Y video memory pointers from memory which change the beginning video read address of the next line, allowing you to dynamically have a new video address and video mode on each new video line.  (E.g. the Atari 800 display list, which allows a few lines of text mode, then a few with graphics, then more text at a different resolution; or Amiga pull-down screens with a vertically scrollable window on top of another, each with a different graphics mode pointing somewhere different in memory.)

Also, in my designs, I call the 'pix_enable' pclk_ena, or pena, even though I expect the reset to activate even if the pena is held low.

If you still want to use your h_count and v_count directly as reference pixel counters and get rid of the ensuing subtraction math to improve Fmax, I would move your hsync and vsync to the end of your counters so that 0 through 639 and 0 through 479 are passed through without any math, while the blank front & back porches with syncs come at the end, after pixel 640 and line 480.  I personally still prefer the 1-bit enables, which can actually tell your ensuing logic when to start and stop reading, or when there is free time to read memory for other purposes.  With your current code, your logic will continue reading pixel 0x0 whenever there is no video being output.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on October 31, 2019, 11:34:17 pm
Yeah, those are just warnings about a few things.  By default numbers are 32-bit, so technically you should use 1'b1 to make a 1-bit number, but functionally there is no difference because it gets truncated to 1 bit anyway.

The other warning about no driver is because you are only setting bit 1 of your RGB output ports, while bit 0 is never set to anything. This is usually a mistake, so it warns about it. Since it does not know what to do with the undriven bit, it ties it to 0 and also warns that it did that.
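
A sketch of the fix, assuming a 2-bit-per-channel port and a hypothetical pixel value and blanking flag:

```verilog
output reg [1:0] VGA_R;   // both bits must be driven somewhere

always @(posedge clk) begin
    if (blanking) VGA_R <= 2'b00;       // black outside the display area
    else          VGA_R <= pixel[1:0];  // drive bit 0 as well as bit 1
end
```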

HDL compilers love to spit out warnings about all sorts of things, so you usually won't get the zero-warning, zero-error compilation you tend to strive for in most other programming.  If nothing else it will spit out a warning that it is not fully licensed.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: langwadt on November 01, 2019, 12:08:35 am
HDL compilers love to spit out warnings about all sorts of things, so you usually won't get the zero-warning, zero-error compilation you tend to strive for in most other programming.  If nothing else it will spit out a warning that it is not fully licensed.

and it is freaking annoying, because sometimes, somewhere in the ten pages of silly warnings, something important is hidden
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 01, 2019, 03:52:45 am
HDL compilers love to spit out warnings about all sorts of things, so you usually won't get the zero-warning, zero-error compilation you tend to strive for in most other programming.  If nothing else it will spit out a warning that it is not fully licensed.

and it is freaking annoying, because sometimes, somewhere in the ten pages of silly warnings, something important is hidden

Huh, yes, with Altera I turned off the 32-bit integer warning; beyond that, when I compile, I get no other warnings.
I used to get them when I started, but once I began creating individual projects for individual .v modules and coding for zero warnings (only after around four years of using Quartus), there were no more warnings, even when my main project uses all the child .v modules in my actual design.

It's a hassle, but it locks you down to proper coding and using the tool properly.  And now, only important warnings or errors show up in the build log.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 01, 2019, 11:32:08 am
It seems the warning is due to the literal values '0' and '1'?  Any way around that?
By googling quartus warning 10230 I found Intel recommending (https://www.intel.com/content/www/us/en/programmable/support/support-resources/knowledge-base/solutions/rd06102014_970.html) explicit width specification on constants to suppress the warning.

Yes, I found the same page last night. The code doesn't read as clearly now, but by replacing 1's and 0's with 1'b1s and 1'b0s, the warnings have mostly gone away.  :-+

Quote
I have assigned the signals to pins in the Pin Manager, but I'm clearly missing something else important as well?
I consider warnings in a synthesizer as polite FYIs. Unfortunately, that means I have to think when reviewing the warnings to see whether they really are a problem, i.e. whether something I really needed got simplified away.

Chalk it up to my inexperience in this field, and my not-insignificant OCD when it comes to compilation errors.  |O

Can we have a look at vga01.v? And your top file? Are you actually driving them? Do you need to drive them? The 13410 will probably not be a real problem of its own, if it resolves once you have provided drivers for that output port, and if you want them "stuck" at 0.

Of course, I attached both files earlier in the thread.  I'll probably attach updated copies in this message as well.

Another note, it helps readability to use the C conventions for the case of constants (all caps) vs. variables (not all caps).

The dangers of cut 'n' paste code.  ;)

Although:
Code: [Select]
// generate sync signals (active low for 640x480)
assign hsync = ~((h_count >= HS_STA) & (h_count < HS_END));
assign vsync = ~((v_count >= VS_STA) & (v_count < VS_END));
is absolutely correct, the >= and < comparisons do eat up gates when compiling, and your output driving a pin will be a combinational, unlatched logic equation based directly on your counter bits.  To get a clean output and a better Fmax, allowing Quartus's Fitter to better route the logic in your FPGA to meet timing constraints, I would:
A) Make Hsync and Vsync a register.
B) Inside your if (pix_enable), make the logic look like this

if (h_count == HS_STA) begin
    hsync <= 1;
 end else if (h_count == HS_END) hsync <= 0;
if (v_count == VS_STA) begin
    vsync <= 1;
 end else if (v_count == VS_END) vsync <= 0;

Okay, I think I follow, although I've had to go off and look at Blocking and Non-Blocking, registers vs wires and all kinds of stuff.  Can't remember who said it, but they were right when they said I'd learn a load of stuff just getting a blank screen to display... I haven't even gotten near wiring this up yet!  Attached at the bottom are the two updated files with all the changes made.

Can I assume that the non-blocking <= assignment operators are okay in the context in which I'm using them?
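
For anyone following along, here is a minimal illustration of why the distinction matters inside a clocked always block (register names are made up):

```verilog
// Non-blocking (<=): both registers update from the values held at the
// clock edge, so b gets the OLD a, forming a two-stage shift register.
always @(posedge clk) begin
    a <= d;
    b <= a;
end

// Blocking (=): b sees the freshly written a within the same edge,
// so a and b collapse into a single register value.
always @(posedge clk) begin
    a = d;
    b = a;
end
```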

*Being registers, those outputs are now delayed by 1 pixel clock, but they will be cleanly timed to the rising edge of your FPGA PLL internal clock: those DFF register outputs don't have any logic gates to go through and will feed the IO pin directly.  (Though, I usually add another DFF stage/pipe to all my DAC and sync outputs, allowing the compiler/fitter to bury the logic anywhere deep in the FPGA, then use each pin's macrocell flip-flop registers to drive the pin itself, generating the cleanest possible outputs.)

Sheesh there is so much to learn...  :o

As for your Assign X&Y, again, doing all that realtime arithmetic is not a problem:
Code: [Select]
// keep x and y bound within the display area
assign x = (h_count < DA_STA) ? 0 : (h_count - DA_STA); // x is zero if current pixel is before display area
assign y = (v_count >= VA_END) ? (VA_END - 1) : (v_count); // y is VA_END-1 if v_count is outside display area

However, like the Hsync and Vsync, I personally make these outputs a single bit which turns on or off using the same simple == equality trick I used to generate the Hsync and Vsync.  I rely on my video graphics generator having an address generator which may increment its count every line, every second line, or every third or fourth line.

You lost me here.  x and y are wires with 10-bit values.  I can't make them a single on/off bit - unless you're talking about creating a couple more registers to use as single-bit flags to indicate whether we're in the display area or not?

Also, beyond that one master image X&Y output enable, which is REQUIRED by all DVI transmitters and which may also be used as an image-crop mute to protect the video output borders for analog VGA DACs (outside the active video area, the RGB level must always be black), I usually add something like another 16 programmable X&Y on/off output flags which may be used to position sprites or superimposed hardware video windows.  Some of these 16 may be positioned after the active video region, allotting a time slot which may be used to signal memory access cycles to fill your audio DAC buffer, or to load a new set of X&Y video memory pointers from memory which change the beginning video read address of the next line, allowing you to dynamically have a new video address and video mode on each new video line.  (E.g. the Atari 800 display list, which allows a few lines of text mode, then a few with graphics, then more text at a different resolution; or Amiga pull-down screens with a vertically scrollable window on top of another, each with a different graphics mode pointing somewhere different in memory.)

Ah okay, I understand.  Well, I'm trying to crawl before I walk or run, so I'm not going to add in loads of elements that I won't be using for a while, but that's certainly a consideration down the line. 

If you still want to use your h_count and v_count directly as reference pixel counters and get rid of the ensuing subtraction math to improve Fmax, I would move your hsync and vsync to the end of your counters so that 0 through 639 and 0 through 479 are passed through without any math, while the blank front & back porches with syncs come at the end, after pixel 640 and line 480.  I personally still prefer the 1-bit enables, which can actually tell your ensuing logic when to start and stop reading, or when there is free time to read memory for other purposes.  With your current code, your logic will continue reading pixel 0x0 whenever there is no video being output.

Sure thing. Okay, so I've moved the display area to the start of the v_count and h_count range (hopefully I've got the localparams right for the relevant values), and I've added a 'blanking' output to the sync_generator which goes high whenever the generator is in the vertical blanking area.  No doubt I'll need to add one for h_count as well to make sure the RGB values are black in those areas.
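
A sketch of what those blanking flags could look like using the same == set/clear trick (counter and enable names assumed from the earlier posts; like the registered syncs, the flags land one clock after the compare):

```verilog
reg h_blank, v_blank;
wire blanking = h_blank | v_blank;   // high anywhere outside 0..639 x 0..479

always @(posedge clk) begin
    if (pix_enable) begin
        if (h_count == 10'd639)      h_blank <= 1'b1;  // last visible pixel
        else if (h_count == 10'd799) h_blank <= 1'b0;  // next count wraps to 0
        if (h_count == 10'd799) begin                  // once per line
            if (v_count == 10'd479)      v_blank <= 1'b1;  // last visible line
            else if (v_count == 10'd524) v_blank <= 1'b0;  // frame wraps to 0
        end
    end
end
```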

EDIT:

Obviously there's errors in the attached files - I'm working through those right now.  ::)

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 01, 2019, 01:46:01 pm
Updated files without the errors!  :o
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: SiliconWizard on November 01, 2019, 04:38:47 pm
Just a thought regarding the CPU part itself. Of course you could consider using a Z80 core in the FPGA you're going to use. But if you still want a true CPU IC, you could consider the Z180 series. They are software compatible with the Z80, but contain many improvements and embedded peripherals, and there are 3.3V versions as well. The Z8L180 is readily available for instance (at least at Mouser)... so unless you really want to go for a true, vintage Z80, that would be something to consider.

https://en.wikipedia.org/wiki/Zilog_Z180
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 01, 2019, 10:48:21 pm
Just a thought regarding the CPU part itself. Of course you could consider using a Z80 core in the FPGA you're going to use. But if you still want a true CPU IC, you could consider the Z180 series. They are software compatible with the Z80, but contain many improvements and embedded peripherals, and there are 3.3V versions as well. The Z8L180 is readily available for instance (at least at Mouser)... so unless you really want to go for a true, vintage Z80, that would be something to consider.

https://en.wikipedia.org/wiki/Zilog_Z180

I still might create a CPU card with the Z180 on it - I've got a couple of Z180's and Z280's lying around with that purpose in mind, but for the moment I'm using a good old-fashioned DIP Z80 chip (a 10MHz one admittedly, but otherwise quite similar to my first computer).
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 03, 2019, 01:29:34 am
Ok, here is what the non-blocking assignment ( <= ) does, and a bit on making optimized, fast, compact code:  What I'm describing here is a little of what's going on inside the compiler's mind, and how it wires gates in the FPGA according to the example Verilog code.  Remember, there is no single core math unit in these FPGAs; whatever you write is literally constructed by wiring masses of gates inside the FPGA.  All these wires have lengths and propagation delays, and the gates' inputs have capacitive loads which further slow everything down.  This primer gives you an example of coding techniques to be aware of to achieve a minimal-gate-count FPGA core which is as fast as possible and the lowest-power design.  (If you wanted a central math processing unit, you would literally need to code one into existence and wire it up into your design.)

See attached image: (if you got a color printer, print this one)
[attach=1]

There are 3 simplified versions of code in the illustration.

In V1, the 'hsync' is generated just like in the OP nockieboy's first 'Sync Generator' code.
You have the 10-bit register '[9:0]' h_count, which is truly 10 D flip-flops.  All 10 are clocked in parallel by the positive edge of 'clk'; the 10 Q outputs of h_count are sent through a bunch of gates which add the fixed 1-bit value '1', with that new 10-bit result fed back into the 10 D inputs of the same 10 flip-flops which make up the register 'h_count'.

Now, to make that 'hsync' wire, those 10 Qs of h_count need to be wired to a set of gates which subtract the 10-bit parameter 'HS_STA' to check whether the result is not negative, while more wiring of the 10 Qs of 'h_count' feeds another mass of gates to SIMULTANEOUSLY subtract the same 10-bit 'h_count' from the 10-bit parameter 'HS_END' to see if the result is negative; finally, those 2 comparison checks are ANDed together and inverted, generating a final single WIRE result labeled 'hsync'.

To understand what is happening, the compiler and fitter have to wire up the FPGA with all these gates to make the 2 comparisons as well as the +1 for the counter itself.  Now, find a TTL data book with an 8-bit adder/subtractor with a negative flag.  How many gates are there?  The FPGA has to do this twice, with 10-bit numbers each.  (Ignoring compiler optimizations for now.)  With all that signalling and load, how clean will the 'hsync' wire be if it is driving an IO pin and any other set of logic gates in your design?  Remember, things like the load and delay of routing all these wires in the FPGA are taken into account even though you as the programmer aren't specifically told; only after a design is compiled do you learn that X MHz is the best your design will achieve without errors.

In V2, everything is identical to V1 except that I changed 'hsync' from a wire into a register.  The reason this is an improvement is that, throughout the FPGA, all of the D flip-flops' clock inputs have dedicated hard wiring making sure that all their outputs change in parallel.  For example, if your clock is 1 MHz: when a clock edge comes in, the 10-bit 'h_count' changes to its new value, those 10 bits go through the mass of gates which perform the 2 subtractions and the AND to determine whether the number is within 'HS_STA' and 'HS_END', and that result is presented to the D input of the 'hsync' flip-flop.  However (and this is the SYNCHRONOUS magic), the Q of the 'hsync' register will not change until the next clock.  This means that after all the number comparing, whether the result takes 1 ns, 5 ns, 6 ns, 10 ns, 15.2 ns, 33.8 ns, or any time less than 1000 ns (I said a 1 MHz clock as an example), and even if that D input is filled with glitches/noise/invalid states along the way, as long as the correct result is ready by the next rising clock, that Q output, along with all the Qs of all the registers in your design, will snap to its new value at the same time.

So with V2, if your 'hsync' register drives an IO pin, or if 'hsync' is used elsewhere in your design, it will be a fast, clean signal which is valid within a fraction of the rising edge of 'clk'.  The one negative here is that the 'hsync' result is now delayed by 1 clock, but in this case it is easy enough to subtract 1 from the 'HS_STA' and 'HS_END' parameters.  This change will allow the FPGA compiler to achieve a much higher maximum operating frequency for your design, especially if the 'hsync' signal is used elsewhere in it.

Now in V3, the goal is to make the design operate as fast as possible no matter what the values of 'HS_STA' and 'HS_END'.  The change I made was to turn the 'hsync' register into the equivalent of a clocked SR flip-flop.  The code says: if 'h_count' equals 'HS_STA', turn on the 'S' of the flip-flop.  Compare the equivalent adder/subtractor above, with its complex web of gates required for the math, against what is required to perform an '==' test: only 10 XOR gates whose outputs feed a single 10-input NOR gate.  (This is all it takes to create the function "does A equal B?")  That A==B function runs sub-nanosecond in the FPGA compared to a 10-bit A minus 10-bit B function, of which the original code required 2 in parallel.  Just the input load of all those gates in the subtract function slows things down, never mind the delay in each gate and the extra length of all the routed wiring.  The second half, if 'h_count' equals 'HS_END', turns on the R of the SR flip-flop, clearing it low.  The wiring load of this on the FPGA silicon is a fraction of the above code, with one caveat:  if 'h_count' never reaches 'HS_STA', then 'hsync' will never turn on, and if 'hsync' is on and 'h_count' never reaches 'HS_END', 'hsync' will never turn off.

For legibility, the code in the graphic image in plain text:
Code: [Select]
Many things omitted for simplicity sake but at the top:

module sync_gen ( clk, hsync, h_count );

parameter   HS_STA = 10  ;  //Turn on hsync signal position
parameter   HS_END = 210 ;  //Turn off hsync signal position

input        clk;
output       hsync;
output [9:0] h_count;

---------------------------------
V1
---------------------------------

reg [9:0]    h_count;
wire         hsync;


assign       hsync   =  ~((h_count >= HS_STA) & (h_count < HS_END));


always @(posedge clk) begin

    h_count  <=  h_count + 1;  // Without a reset limit, the h_count will run 0 to 1023.

end

---------------------------------
V2
---------------------------------

reg [9:0]    h_count;
reg          hsync;


always @(posedge clk) begin

    h_count  <=  h_count + 1;

    hsync    <=  ~((h_count >= HS_STA) & (h_count < HS_END));

end

---------------------------------
V3
---------------------------------

reg [9:0]    h_count;
reg          hsync;


always @(posedge clk) begin

    h_count  <=  h_count + 1;

if (h_count == HS_STA) begin
  hsync <= 1;
end else if (h_count == HS_END) begin
  hsync <= 0;
end

end

---------------------------------
endmodule
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 03, 2019, 06:35:14 pm
Ok, here is what the non-blocking assignment ( <= ) does, and a bit on making optimized, fast, compact code:  What I'm describing here is a little of what's going on inside the compiler's mind, and how it wires gates in the FPGA according to the example Verilog code.  Remember, there is no single core math unit in these FPGAs; whatever you write is literally constructed by wiring masses of gates inside the FPGA.  All these wires have lengths and propagation delays, and the gates' inputs have capacitive loads which further slow everything down.  This primer gives you an example of coding techniques to be aware of to achieve a minimal-gate-count FPGA core which is as fast as possible and the lowest-power design...

Wow - thanks BrianHG!  :clap:

If I'm doing it right, the v3 counter implementation in Quartus II is quoting an Fmax of 217.2 MHz for clk (that's including the vga01.v top level as well).

It still amazes me when I have to keep reminding myself that the code I'm writing has a physical consequence in the FPGA - so what appears to me to be the easiest way of programmatically achieving a task may actually be quite costly in terms of hardware.  :o

I haven't had a lot of time to look at my code or the hardware this weekend as I've been working, but I've got the FPGA connected to one of my LCD monitors via VGA with just GND, HSYNC and VSYNC connected at the moment.  I've messed with the clock timing and tried connecting it at 25 MHz, which showed a correspondingly incorrect frequency input on the LCD monitor, but if I get anywhere near the right frequencies for VSYNC and HSYNC, the screen is just blank for a few seconds and then goes into power saving, so I'm guessing the LCD is (correctly) not detecting any RGB input and is going to sleep as a result.  I was hoping to be able to get the menu up and see what screen resolution was reported, but it seems I need to get RGB signals up and running first.  :-BROKE

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on November 03, 2019, 06:58:22 pm
Sometimes it's better to write HDL code from the perspective of circuit design rather than from the perspective of programming.

For example, for the given task, try to imagine an approximate digital circuit that does the job, and then write HDL code that would act like that circuit. Instead of instruction cycles or operations, think in terms of logic gates, MUXes, D flip-flops, adders, lookup tables, etc., since this is what ends up in the FPGA.

It's a bit odd that the monitor refuses to display a blank signal. But it should be easy to generate some RGB by just outputting the X and Y counter values as pixels to get a colorful pattern. Though take care that you must output black in the blanking periods! Monitors tend to calibrate the black level during blanking, and you can get weird colors if you output other things during it (this can be pretty monitor-specific).
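
That test pattern could be sketched like this (assuming 1-bit-per-channel outputs, the x/y counters from the sync generator, and a blanking flag; all names are assumptions):

```verilog
always @(posedge clk) begin
    if (blanking) begin
        {VGA_R, VGA_G, VGA_B} <= 3'b000;   // must be black during blanking
    end else begin
        VGA_R <= x[4];          // vertical bars, 16 pixels wide
        VGA_G <= y[4];          // horizontal bars, 16 lines tall
        VGA_B <= x[5] ^ y[5];   // coarse checkerboard overlay
    end
end
```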
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 03, 2019, 09:55:12 pm
Ok, what the blocking statement ( <= ) does and a bit on making optimized fast compact code:  What I'm describing here is a little of whats going on inside the compilers mind, and how it wires gates in the FPGA according to the example verilog code.  Remember, there is no single core math unit in these FPGAs, whatever you write is literally constructed by wiring masses of gates inside the FPGA.  All these wires have lengths, propagation delays, and the gates' inputs have capacitive loads which further slows everything down.  This primer gives you an example of coding technique to be aware of to achieve a minimal gate count FPGA core which is fast as possible and lowest power consuming design...

Wow - thanks BrianHG!  :clap:

If I'm doing it right, the v3 counter implementation in Quartus II is quoting an Fmax of 217.2 MHz for clk (that's including the vga01.v top level as well).

It still amazes me when I have to keep reminding myself that the code I'm writing has a physical consequence in the FPGA - so what appears to me to be the easiest way of programmatically achieving a task may actually be quite costly in terms of hardware.  :o

Next, now that you know what the '<=' operator does (basically a bundle of final flip-flops latching the outputs of all the synthesized logic gates which perform the boolean algebra on the right of the non-blocking assignment, all of them tied to the same dedicated clock network @(posedge...)), it's time to take a look at your sync code and what your top hierarchy looks like within Quartus.  (For the top, I still use Quartus's block diagram entry for the IO pins, and my Verilog appears as one master or multiple wired block-diagram modules...)

When you're ready, post your latest syncgen.v here...
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 03, 2019, 10:44:52 pm
Next, now that you know what the '<=' operator does (basically a bundle of final flip-flops latching the outputs of all the synthesized logic gates which perform the boolean algebra on the right of the non-blocking assignment, all of them tied to the same dedicated clock network @(posedge...)), it's time to take a look at your sync code and what your top hierarchy looks like within Quartus.  (For the top, I still use Quartus's block diagram entry for the IO pins, and my Verilog appears as one master or multiple wired block-diagram modules...)

When you're ready, post your latest syncgen.v here...

Not a lot has changed as I've not had much time this weekend to do anything, but here's the latest snapshot of my code attached.

it's time to take a look at your sync code and what your top hierarchy looks like within Quartus.  (For the top, I still use Quartus's block diagram entry for the IO pins, and my Verilog appears as one master or multiple wired block-diagram modules...)

Not sure what you mean by 'top hierarchy screenshot'?  Do you mean the pin assignments? Here's a screenshot of my Quartus II workbench, if that helps:

(https://i.ibb.co/h28bfkc/Untitled.png) (https://ibb.co/NWY5SXr)

I'm not convinced that's what you're after, though.  I haven't used Quartus' block diagram tool to design the hierarchy, I just created the first file (vga01.v) and went from there, creating sync_generator.v and referring to it in vga01.v.  Sorry if I'm being vague or missing an obvious request, it's been a loooong day.  :o
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on November 03, 2019, 10:56:52 pm
I wonder if the polarity of HSYNC and/or VSYNC are wrong for the chosen resolution.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: hamster_nz on November 04, 2019, 12:45:24 am
I ran up a simulation....

You need to be sure whether LINE and SCANLINES are terminal counts or total counts. Currently your h_counter goes from 0 to 800 (801 counts), and v_counter goes from 0 to 525 (526 lines).

General suggestions (take them or leave them):

- You need to be clear about what is internal state, and what is exposed. Currently the x & y outputs feel bad.

- Your reset should reset all state (even if it is only for simulation).

- I am not sure why you have "posedge reset" in your always. I don't think it is needed.

Here's what I would do:

- Reset all outputs and state in the reset clause

- keep h_counter and v_counter for internal use only, and change the x & y outputs to be reg not wires.

- Keep LINES and SCANLINES as a total count, so compare to LINES-1 or SCANLINES-1 as the test for resetting the counters.

- Likewise, keep HS_STA & HS_END as the count where the pulse starts, so remove the "-1" from the localparam and put it in the test.

- Try to avoid ternary operators - usage tends to be very idiomatic, so they are a common source of bugs.

- Unless you are careful, there is going to be ambiguity about when the vsync pulse starts and ends. I suggest you test both h_count and v_count to decide when to assert vsync. So if you want it to be aligned with the start of the visible pixels, use "if (h_count == LINES-1 && v_count == VS_STA-1)" and "if (h_count == LINES-1 && v_count == VS_END-1)" as the tests for when to set the vsync outputs.

This might not give you the fastest design, nor the most compact code, but it will help you avoid lots of subtle bugs.
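If it helps to see the suggestions above as code, here is a minimal sketch of the suggested style - synchronous reset, totals kept as parameters with the -1 in the comparison. The module and counter names (and the 10-bit widths) are assumptions for illustration, not the OP's actual file:

```verilog
module sync_counters #(
    parameter LINES     = 800,   // total clocks per line (not a terminal count)
    parameter SCANLINES = 525    // total lines per frame
)(
    input  wire       clk,
    input  wire       reset,
    output reg  [9:0] h_count,
    output reg  [9:0] v_count
);
    // Synchronous reset clears all state, and the counters roll over
    // at LINES-1 / SCANLINES-1, so the parameters stay as totals with
    // the -1 done in the comparison rather than in the localparam.
    always @(posedge clk) begin
        if (reset) begin
            h_count <= 0;
            v_count <= 0;
        end else if (h_count == LINES - 1) begin
            h_count <= 0;
            if (v_count == SCANLINES - 1)
                v_count <= 0;
            else
                v_count <= v_count + 1'b1;
        end else begin
            h_count <= h_count + 1'b1;
        end
    end
endmodule
```

With these parameters, h_count covers 0 to 799 (800 counts) and v_count covers 0 to 524 (525 lines), which matches the standard 640x480@60 totals.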
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 04, 2019, 03:32:30 am
I ran up a simulation....

You need to be sure whether LINE and SCANLINES are terminal counts or total counts. Currently your h_counter goes from 0 to 800 (801 counts), and v_counter goes from 0 to 525 (526 lines).

General suggestions (take them or leave them):

- You need to be clear about what is internal state, and what is exposed. Currently the x & y outputs feel bad.

- Your reset should reset all state (even if it is only for simulation).

- I am not sure why you have "posedge reset" in your always. I don't think it is needed.

Here's what I would do:

- Reset all outputs and state in the reset clause

- keep h_counter and v_counter for internal use only, and change the x & y outputs to be reg not wires.

- Keep LINES and SCANLINES as a total count, so compare to LINES-1 or SCANLINES-1 as the test for resetting the counters.

- Likewise, keep HS_STA & HS_END as the count where the pulse starts, so remove the "-1" from the localparam and put it in the test.

- Try to avoid ternary operators - usage tends to be very idiomatic, so they are a common source of bugs.

- Unless you are careful, there is going to be ambiguity about when the vsync pulse starts and ends. I suggest you test both h_count and v_count to decide when to assert vsync. So if you want it to be aligned with the start of the visible pixels, use "if (h_count == LINES-1 && v_count == VS_STA-1)" and "if (h_count == LINES-1 && v_count == VS_END-1)" as the tests for when to set the vsync outputs.

This might not give you the fastest design, nor the most compact code, but it will help you avoid lots of subtle bugs.

Note that your code is just as compact and fast as the original, because LINES, VS_STA, VS_END, HS_STA, HS_END, etc. are all parameters, or constants, so the compiler works out the -1 ahead of time.  No sweat at all.  However, if the OP changes these constants to loadable registers which can be written to by the Z80 CPU, allowing software-defined new video modes, then those '-1's will come into play and add gates to the design.

Quote
- You need to be clear about what is internal state, and what is exposed. Currently the x & y outputs feel bad.

I feel the same.  I believe the sync generator should be 100% positive logic, meaning positive hsync / vsync / an active video region output which crops the active visual area (a requirement for most DVI/HDMI serializers), plus additional optional signals to trigger video-timing-specific events which may be fixed to the master h_count and v_count.  Any visual display address / graphics / memory pointers, counters and sub-alignment positioning in the 'active video region' should be in the separate graphics image generator.  Remember, a second new X/Y coordinate counter is only another 20 register flipflops max.
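As a sketch of that separation (the module and signal names here are assumptions for illustration, not anyone's actual code): the sync generator exports only positive-logic enables, and the image generator keeps its own X/Y counters:

```verilog
module image_xy (
    input  wire       clk,
    input  wire       pic_en_h,  // high during the active pixels of a line
    input  wire       pic_en_v,  // high during the active lines of a frame
    output reg  [9:0] x,
    output reg  [9:0] y
);
    reg pic_en_h_d;  // delayed copy, used to spot the end of an active line

    always @(posedge clk) begin
        pic_en_h_d <= pic_en_h;

        // x simply counts while the line is active (the exact one-clock
        // alignment depends on how the enables are registered upstream)
        if (!pic_en_h)
            x <= 0;
        else
            x <= x + 1'b1;

        // y advances on the falling edge of the horizontal enable,
        // and clears during the vertical blanking interval
        if (!pic_en_v)
            y <= 0;
        else if (pic_en_h_d && !pic_en_h)
            y <= y + 1'b1;
    end
endmodule
```

The point is that this whole block lives in the graphics generator, so the sync generator itself stays free of any display-address state.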

I'm waiting a day; however, I'll then make a few suggestions for the OP's sync generator which will solve those -1's without a -1 in the code, and solve that 1-clock delay with regards to the other variables, including the picture_enable_h and picture_enable_v signals (together = picture_enable).
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 04, 2019, 03:48:52 am
@nockieboy, some questions:

1. Do you have an oscilloscope?
2. Which cyclone II board do you have?
3. Which version of quartus are you using?
4. Are you willing to use Quartus graphic design entry?  (Not for coding; your Verilog coding will stay the same, but just to wire your Verilog modules to the IO pins and potentially wire a few xxxx.v modules together.  This will allow you to easily drop in or take out modules on the fly and see how your graphics pipe is affected, as well as the CPU interface.  It will allow you to easily invert IO pins, or enable a global PLL.  You can always make a new top hierarchy later on, or even tell Quartus to automatically generate a top 'MyProject.v' verilog file from your illustrated schematic.  Quartus has a feature to automatically generate a design block for each of your 'xxxx.v' source files which you may place within your schematic layout, including multiple instances.)
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on November 04, 2019, 06:45:25 am
Not sure what you mean by 'top hierarchy screenshot'?  Do you mean the pin assignments? Here's a screenshot of my Quartus II workbench, if that helps:

(https://i.ibb.co/h28bfkc/Untitled.png)

I'm not convinced that's what you're after, though.  I haven't used Quartus' block diagram tool to design the hierarchy, I just created the first file (vga01.v) and went from there, creating sync_generator.v and referring to it in vga01.v.  Sorry if I'm being vague or missing an obvious request, it's been a loooong day.  :o

I also like to use the graphical schematic entry for the top module, since it's a bit easier to see where things are going.  The user interface on that schematic editor is absolutely terrible (it's straight from the 90s), but that seems to be the case for all the vendors.

By the way, there is also a window called the "RTL view" that turns the synthesized HDL back into a schematic:
(https://surf-vhdl.com/wp/wp-content/uploads/2015/11/fir_filter_4_RTL-view.jpg)

This view is a very useful sanity check to see if the compiler understood your code in the way you intended.  You can see individual flip flops, adders, muxes etc. and how it wired them up.  It's the best place to see what sort of optimization shortcuts the compiler has taken (it loves to reuse blocks/signals, simplify logic, remove unnecessary logic etc.).  It's pretty messy since it's auto-generated, but clear enough to be useful with a bit of highlighting by clicking nets and blocks.

This is, however, still NOT how it is actually wired inside the FPGA, but simply the input to the mapper that crams the functionality into actual FPGA elements.  You can get a view of how the FPGA elements are wired up by opening the "Architecture view", a similar-looking autogenerated schematic but this time with real hardware blocks in it and how they are configured.  This makes it a lot more difficult to read, so this view is not as useful.  An even more bare-metal view is the "Chip planner view".  It shows you a map of the FPGA's silicon die with all of the elements inside it, allowing you to peek into each one and see how it's wired up and configured.  It gives you a nice view of what is going on physically in there, but it is near impossible to follow signals around.  It is, however, very useful for checking that certain blocks got configured as you expect - most useful for IO blocks in cases where you want special IO pin functionality like DDR, reclocking, built-in serdes, delay tuning etc.

(http://we.easyelectronics.ru/uploads/images/00/00/60/2011/04/16/045570.png)
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 04, 2019, 10:16:33 am
Brief update - I've wired up the MSB RGB outputs to the VGA connector through 270R resistors to try to get some kind of output (I haven't bothered with a resistor ladder to merge in the LSB outputs, as they're not used yet anyway).  Here's the result:

(https://i.ibb.co/gmQG0Yg/20191104-092002.gif) (https://ibb.co/VJbdF6B)

Clearly an issue with sync pulses or timing somewhere...  The image is a GIF, by the way, so you can see the issue if you click on the image.

I wonder if the polarity of HSYNC and/or VSYNC are wrong for the chosen resolution.

Fair point - I've tried the output with reversed polarity on the hsync and vsync outputs and the monitor just goes to sleep, so it would appear the polarity is correct as I do get some sort of output currently...?

You need to be sure whether LINE and SCANLINES are terminal counts or total counts. Currently your h_counter goes from 0 to 800 (801 counts), and v_counter goes from 0 to 525 (526 lines).

Good point, I hadn't noticed that.  This is where I start to make silly mistakes - basic maths.  ::)

- Reset all outputs and state in the reset clause

- keep h_counter and v_counter for internal use only, and change the x & y outputs to be reg not wires.

- Keep LINES and SCANLINES as a total count, so compare to LINES-1 or SCANLINES-1 as the test for resetting the counters.

- Likewise, keep HS_STA & HS_END as the count where the pulse starts, so remove the "-1" from the localparam and put it in the test.

- Try to avoid ternary operators - usage tends to be very idiomatic, so they are a common source of bugs.

- Unless you are careful, there is going to be ambiguity about when the vsync pulse starts and ends. I suggest you test both h_count and v_count to decide when to assert vsync. So if you want it to be aligned with the start of the visible pixels, use "if (h_count == LINES-1 && v_count == VS_STA-1)" and "if (h_count == LINES-1 && v_count == VS_END-1)" as the tests for when to set the vsync outputs.

This might not give you the fastest design, nor the most compact code, but it will help you avoid lots of subtle bugs.

Okay, I've tweaked the reset clause to reset hsync and vsync as well - is there anything else I should reset in there?  I'll leave LINE and SCANLINES as they are with the -1 as I'll likely be changing them in the future to regs so they can be changed by the host CPU.

As for the ternary operators comment - so it would be preferable to use full if-else conditionals instead?  Is that purely because of the possibility of user error in writing or interpreting the ternary operators, or because of discrepancies in how Quartus II implements them?
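For reference, the two styles in question look like this (purely illustrative signal names, not the actual project code) - both forms synthesize to the same mux, the if-else form just makes the intent and priority explicit:

```verilog
module rgb_gate (
    input  wire clk,
    input  wire inDisplay,   // high inside the visible area
    input  wire pixel_r,
    output wire vga_r,       // combinational, ternary style
    output reg  vga_r_reg    // registered, if-else style
);
    // Ternary form: compact, but nesting several of these is
    // where the subtle bugs tend to creep in.
    assign vga_r = inDisplay ? pixel_r : 1'b0;

    // Equivalent if-else form: the same mux after synthesis (plus a
    // register here), with the condition and priority spelled out.
    always @(posedge clk) begin
        if (inDisplay)
            vga_r_reg <= pixel_r;
        else
            vga_r_reg <= 1'b0;
    end
endmodule
```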


Quote
- You need to be clear about what is internal state, and what is exposed. Currently the x & y outputs feel bad.

I feel the same.  I believe the sync generator should be 100% positive logic, meaning positive hsync / vsync / an active video region output which crops the active visual area (a requirement for most DVI/HDMI serializers), plus additional optional signals to trigger video-timing-specific events which may be fixed to the master h_count and v_count.  Any visual display address / graphics / memory pointers, counters and sub-alignment positioning in the 'active video region' should be in the separate graphics image generator.  Remember, a second new X/Y coordinate counter is only another 20 register flipflops max.

I popped the X & Y counters in the sync generator because it seemed like the easiest way to get some sort of graphical output at the time.  Also, that's how the demo code I borrowed from did it. ;)
Naturally, I'm only just starting out with Quartus, Verilog and FPGAs in general, so I'm happy to accept feedback and suggestions for improvements.

I'm waiting a day; however, I'll then make a few suggestions for the OP's sync generator which will solve those -1's without a -1 in the code, and solve that 1-clock delay with regards to the other variables, including the picture_enable_h and picture_enable_v signals (together = picture_enable).

 ;D

@nockieboy, some questions:

1. Do you have an oscilloscope?

Yes.  It's barely up to anything (and my skills using it are also pretty poor - remember, I'm not an electronics engineer and this isn't my day job!) but it's a 20 MHz Hitachi V-212.  I got it cheap off eBay a year or two back when I was breadboard-prototyping my Microcom and didn't know that a logic analyser would have been far more useful.   ^-^

2. Which cyclone II board do you have?

One of those cheap ones off eBay.  Here's an example (https://www.ebay.co.uk/itm/ALTERA-FPGA-Cyslonell-EP2C5T144-Minimum-System-Learning-Development-Board/401255830236).

3. Which version of quartus are you using?

Quartus II 64-Bit Version 13.0.1 Build 232 06/12/2013 SJ Web Edition.  It was indicated to be the last version that supports the Cyclone II.

4. Are you willing to use Quartus graphics design entry?

I'm certainly happy to give it a go.  I'm fumbling around with Quartus II, trying to learn the ropes as I go, so I'm more than happy to find out about features like this that could help.  :-+

(Not for coding; your Verilog coding will stay the same, but just to wire your Verilog modules to the IO pins and potentially wire a few xxxx.v modules together.  This will allow you to easily drop in or take out modules on the fly and see how your graphics pipe is affected, as well as the CPU interface.  It will allow you to easily invert IO pins, or enable a global PLL.  You can always make a new top hierarchy later on, or even tell Quartus to automatically generate a top 'MyProject.v' verilog file from your illustrated schematic.  Quartus has a feature to automatically generate a design block for each of your 'xxxx.v' source files which you may place within your schematic layout, including multiple instances.)

Sounds very useful - I'll see what I can find out about it!  ;D

I have included my full project folder as a .zip archive for info if anyone is playing along at home.  I'm not convinced I fully understand the clock setup and timing / TimeQuest Timing Analysis part of the project very well at all.  :-\
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 04, 2019, 10:46:57 am
@nockieboy, some questions:

1. Do you have an oscilloscope?
Yes.  It's barely up to anything (and my skills using it are also pretty poor - remember, I'm not an electronics engineer and this isn't my day job!) but it's a 20 MHz Hitachi V-212.  I got it cheap off eBay a year or two back when I was breadboard-prototyping my Microcom and didn't know that a logic analyser would have been far more useful.   ^-^

Good enough.  Check the hsync and vsync being sent to your monitor.  Make sure the pulse is clean and at least 4v p-p, preferably 5v.  If you are driving your monitor with a 3.3v-3.6v hsync, it may do some funny things...  This is random, based on the monitor's internal circuitry.  If the sync is too low, i.e. you are driving the monitor HS/VS with an FPGA IO directly, there are 2 things you will need to try:
1> Buffer the signal through a 74HC04 inverter or a 74HC245 buffer powered at 5v.  I like to drive the HS and VS each into 3 gates in parallel, and tie each set of 3 outputs in parallel to get a good current output to drive the monitor cable.
2> If the IO on the FPGA is a true 5v output, but the monitor cable load is weakening it too much, a 220 Ohm to 330 Ohm pullup resistor to 5v may work.

(Option #1 is better; it also separates the FPGA IO from a direct connection to a wire on a VGA plug, which may blow the FPGA or cause internal malfunction.)


If you use 2 channels, placing 1 probe on hsync and the second on an R/G/B pin, you should see whether you get a regular pattern or not.

Also, sync polarity doesn't matter for any monitor which can take in VGA with modes as high as 1280x720 or above.  If your mode is not standard and has the wrong polarity, the monitor should at worst only show bad horizontal/vertical picture size and position, unless the sync timing is not associated with the picture.

 
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 04, 2019, 11:20:55 am
Check the hsync and vsync being sent to your monitor.  Make sure the pulse is clean and at least 4v p-p, preferably 5v.  If you are driving your monitor with a 3.3v-3.6v hsync, it may do some funny things...  This is random, based on the monitor's internal circuitry.

Aha - that'll be the problem then, I'm driving HSYNC and VSYNC directly from the FPGA pins.  Only R, G and B are going via 220R resistors...  :-BROKE

My scope is showing nice clean signals on HSYNC and VSYNC, but only 3.3v p-p, which is to be expected - so the monitor is picking up some information (i.e. the R, G and B signals) but not necessarily the proper sync signals.  It shows a 640x480x60 screen resolution in the menu, but as you can see from the gif I posted previously, the sync is out.

If the sync is too low, i.e. you are driving the monitor HS/VS with an FPGA IO directly, there are 2 things you will need to try:
1> Buffer the signal through a 74HC04 inverter or a 74HC245 buffer powered at 5v.  I like to drive the HS and VS each into 3 gates in parallel, and tie each set of 3 outputs in parallel to get a good current output to drive the monitor cable.

I don't have too many HC parts lying around, they're all HCT, but I'll see what I can do.

If you use 2 channels, placing 1 probe on hsync and the second on an R/G/B pin, you should see whether you get a regular pattern or not.

There's definitely a regular pattern - can see that on the monitor, it's just skewed and rolling because of the sync problems.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on November 04, 2019, 11:23:57 am
That picture does look like it might have some sort of HSYNC issue.

If electronics are not your main thing, then you are actually doing pretty well, since FPGAs are pretty far from regular programming.  An oscilloscope is one of the most useful tools for debugging electronics; no other tool gives the same degree of universal insight into what circuits are doing.  You do usually want a digital scope for digital circuits, so that you can one-shot capture communication buses and decode them.  But video luckily involves stable repeating patterns, so an analog scope can still be quite useful when triggered from vsync or hsync.

But there is a trick with FPGAs: you can build a logic analyzer into the same chip as the circuit.  Quartus has a tool for doing this called SignalTap:
(https://raw.githubusercontent.com/wiki/mist-devel/mist-board/signaltap_2.jpg)

You give it a list of signals you want to monitor and how much memory to use, and it generates a logic analyzer for that, taps off those signals and adds it to the design.  The next time you compile your design, this logic analyzer will be included in the final output and, once loaded into the FPGA, listens on JTAG.  Then you can just click connect in SignalTap and it will allow you to interact with this logic analyzer to start captures, configure triggers etc.  In some cases this is more useful than an actual logic analyzer, because you have essentially hundreds of logic channels available, allowing you to monitor entire counters like a bus.  But because it uses internal block RAM, you typically won't be making captures longer than a few thousand samples.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 04, 2019, 11:55:35 am
Well, that sorted the problem - thanks BrianHG!  :-+

Clearly my monitor wasn't happy with the 3.3v sync signals - I found a 74AHC541 in my box of bits and am now running the vertical and horizontal sync signals through one of these line drivers powered at 5v and am getting a nice stable output:

(https://i.ibb.co/BfFX87L/20191104-114222.jpg) (https://ibb.co/Zf7nywd)

It's surprising how pleased I am at getting four overlapping squares to display on a monitor!!! Haha!  :-DD

If electronics are not your main thing, then you are actually doing pretty well, since FPGAs are pretty far from regular programming.

Programming isn't my main thing, either, although I'm much better at it than electronics!  Never let it be said that I don't enjoy a challenge... :o

An oscilloscope is one of the most useful tools for debugging electronics; no other tool gives the same degree of universal insight into what circuits are doing.  You do usually want a digital scope for digital circuits, so that you can one-shot capture communication buses and decode them.  But video luckily involves stable repeating patterns, so an analog scope can still be quite useful when triggered from vsync or hsync.

And the frequency isn't so high either - I had been using the scope originally to debug my breadboard computer running at 4 MHz; these signals are in the kHz range, which is much more comfortable ground for my old 'scope.  ;)

But there is a trick with FPGAs: you can build a logic analyzer into the same chip as the circuit.  Quartus has a tool for doing this called SignalTap:...
You give it a list of signals you want to monitor and how much memory to use, and it generates a logic analyzer for that, taps off those signals and adds it to the design.

Wow!  That's really useful - especially if you have a decent amount of RAM in the FPGA.  :-+

Work calls again, but when I get some more time to work on this I'll set up a resistor ladder to merge the two bits for R, G and B on the breadboard so I can try out the full colour palette.  ;D
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 04, 2019, 12:27:28 pm
Well, that sorted the problem - thanks BrianHG!  :-+

Clearly my monitor wasn't happy with the 3.3v sync signals - I found a 74AHC541 in my box of bits and am now running the vertical and horizontal sync signals through one of these line drivers powered at 5v and am getting a nice stable output:

(https://i.ibb.co/BfFX87L/20191104-114222.jpg) (https://ibb.co/Zf7nywd)

It's surprising how pleased I am at getting four overlapping squares to display on a monitor!!! Haha!  :-DD
;)   :clap:  HCT is actually better for converting the FPGA's 3.3v to 5v, since its inputs have a trigger threshold at TTL levels instead of mid-5v, i.e. 2.5v.  Any will work, like 74HCT, 74ACT, and other variants.  (You would never have figured this one out without me.  This one nearly destroyed a huge demo I had 10 years ago, since my home test monitors were studio-grade monitors whose sync inputs would work perfectly with a 0.7v p-p sync...  The cheap PC screens didn't like that 3.3v.)

Take a day to learn how to make a top hierarchy block diagram in Quartus.  Make your now-functioning project's 2 xxx.v files into 2 separate blocks and wire them on a block diagram.  (Save your current project and start a new one...)

***Remember when defining the inputs and outputs: output bus pins in Quartus look like ' Red[7..0] ', not ' Red[7:0] ' like you see in Verilog.  It's the only BIG gripe I have with Quartus, as it's a holdover from the pre-Verilog days of Quartus 1...  Once you are ready, I'll show you a few tricks which will make all your outputs parallel without delays, even as you add modules in between, like my OSD character generator.  I also recommend going on YouTube and searching for a beginners' guide to Quartus block diagram entry, or similar, to get you going.  Include generating a sheet symbol for Verilog source files/code in Quartus...

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 06, 2019, 08:23:50 am
For info, Grant Searle has put his site back up at searle.wales (http://searle.wales).

It seems his ISP took the site down for breach of terms and conditions due to the high traffic it was generating.  :-DD

I'm trying to get my head around hierarchy block diagrams, but it's really slow going.  Initially I was concerned that Quartus II 13.0 sp1 didn't support that feature, but it seems it's just deeply buried somewhere.  Then I got sidetracked looking at character generators (the next big step for my design) and today, hopefully, I'll have some time to work on block diagrams again!!!  :o
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 06, 2019, 08:47:29 am
You should be thinking of making your own character generator, but first, you need to clean up your sync generator.
In Quartus, making a 'New' block diagram file is a key feature, not a hidden one.
Setting that .bdf (block diagram file) as the top hierarchy is also standard.

Double clicking in the middle of the file to add pins and generic logic, or even library megafunctions, is also standard.

What might be a little hidden is opening your .v file, then generating a sheet symbol for it (I think it's called this, it's been a decade...):
File>Create/Update>Create symbol files for current file.

The verilog must have no errors for this to work.

Once generated, double click in a blank area of your .bdf to add a new symbol/component.  Click on user-generated symbols and place the symbol on the sheet.

You should see a symbol labeled with your Verilog module name, with all the input and output pins and buses.

In the future, if you want to edit the Verilog code for a symbol in your block diagram, you can just double click on its symbol.  If you add inputs or outputs and you want them to appear on that symbol in your block diagram, you will need to re-run ' File>Create/Update>Create symbol files for current file. ' to update the symbol with the new IO pins.

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 06, 2019, 10:52:59 am
You should be thinking of making your own character generator, but first, you need to clean up your sync generator.

Yes, must try not to get ahead of myself.  :-\

In Quartus, making a 'New' block diagram file is a key feature, not a hidden one.
Setting that .bdf (block diagram file) as the top hierarchy is also standard.

It sure is.  It just wasn't obvious yesterday to a complete beginner like me who only knew about creating .v files!  :scared:

I found a really handy tutorial (https://www.flc.losrios.edu/docs/FLC-Documents/Instruction/Engineering/lab/Lab1%20-%20Quartus%20Tutorials.pdf) (actually a university/college worksheet) with step-by-step instructions on how to create the two types of project (one based on code, the other on a diagram), create blocks for them both and use them both in a top-level project as diagram symbols.  I also learned about creating simulations, too!  ;D

This is as far as I've gotten:

(https://i.ibb.co/N1xghPh/Untitled.png) (https://ibb.co/rbkKPSP)

Not great - I've got inputs and outputs on sync_generator that don't connect to anything on vga01, which is annoying.  I'll go and look at that.

Code for vga01.v and sync_generator.v attached for info, as I've changed vga01.v a bit to show colour bars instead of squares now.

EDIT:

Updated diagram:

(https://i.ibb.co/1b41qGr/Untitled.png) (https://ibb.co/ZNb58zB)

EDIT 2:

From looking at this second diagram, it seems I should move the pix_en generation from vga01 to sync_generator, as it would make more sense for it to be there - unless I'm likely to need it in vga01 at some point in the future?
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 06, 2019, 01:21:25 pm
Updated diagram:

(https://i.ibb.co/mCy8TRz/Untitled.png) (https://ibb.co/XZ58LWJ)

This is giving me errors, though, when I try to compile:

Code: [Select]
Info (12128): Elaborating entity "vga01" for hierarchy "vga01:inst"
Error (12014): Net "vga01:inst|x[8]", which fans out to "vga01:inst|vga_r", cannot be assigned more than one value
Error (12015): Net is fed by "vga01:inst|sync_generator:display|x[8]"
Error (12015): Net is fed by "vga01:inst|x[8]"
Error (12014): Net "vga01:inst|x[7]", which fans out to "vga01:inst|vga_g", cannot be assigned more than one value
Error (12015): Net is fed by "vga01:inst|sync_generator:display|x[7]"
Error (12015): Net is fed by "vga01:inst|x[7]"
Error (12014): Net "vga01:inst|x[6]", which fans out to "vga01:inst|vga_b", cannot be assigned more than one value
Error (12015): Net is fed by "vga01:inst|sync_generator:display|x[6]"
Error (12015): Net is fed by "vga01:inst|x[6]"
Error (12014): Net "vga01:inst|inDisplay", which fans out to "vga01:inst|vga_r", cannot be assigned more than one value
Error (12015): Net is fed by "vga01:inst|sync_generator:display|inDisplayArea"
Error (12015): Net is fed by "vga01:inst|inDisplay"

Trying to look these up, but it's not obvious (to me at least) what the problem is.  I'm obviously doing something very stupid...  :-//
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on November 06, 2019, 01:28:39 pm
That error looks to me like there are multiple outputs trying to drive 3 bits of net "x" and net "inDisplay".  You might have an output or input back to front.

Oh, and by the way, your inDisplay signal is probably the same as the signal that RGB buses tend to call DE (Display Enable).  This is a signal that is high whenever pixels inside the viewing area are being sent.  Some LCD panels use that instead of Hsync and Vsync.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 06, 2019, 01:41:34 pm
Your RGB 2-wire buses are driving output pins that are single wires each.  Remove the doubled pins and rename the output pins R[0], G[0], B[0] to:

R[1..0]
G[1..0]
B[1..0]

Quartus will understand that those outputs are a collective of pins grouped as a bus with 2 wires going to 2 pins.

This comes in handy when you have an output group.  Like in my design, I have a 256-bit RAM data bus; could you imagine my block diagram with 256 individual BIDIR IOs instead of one single one labeled RDQ[255..0]...

Also, you do not need to label the bus wires, as they will take their name from the output pins automatically.
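On the Verilog side, the matching declaration is just a vector port; only the bus-pin label in the Quartus schematic uses the [1..0] notation. A minimal sketch (module name and the placeholder drive values are made up for illustration):

```verilog
module colour_out (
    input  wire       clk,
    output reg  [1:0] R,   // appears as R[1..0] on the Quartus symbol
    output reg  [1:0] G,   // appears as G[1..0]
    output reg  [1:0] B    // appears as B[1..0]
);
    // Placeholder drive so the sketch compiles on its own;
    // a real design would drive these from the pixel logic.
    always @(posedge clk) begin
        R <= 2'b11;
        G <= 2'b01;
        B <= 2'b10;
    end
endmodule
```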
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 06, 2019, 02:39:23 pm
That error looks to me like there are multiple outputs trying to drive 3 bits of net "x" and net "inDisplay".  You might have an output or input back to front.

Okay, it compiles now - I just had to remove this block from vga01.v:

Code: [Select]
// create a 640x480 synch and pixel
   sync_generator #(640,480) display(
      .clk(clk),
      .reset(rst),
      .x(x),
      .y(y),
      .DE(DISP_EN)
   );

I forgot to remove that block, so the HDL was trying to set up two connections to vga01 - one in code, the other in the diagram?

Quartus will understand that those outputs are a collective of pins grouped as a bus with 2 wires going to 2 pins.

This comes in handy when you have an output group.  Like in my design, I have a 256-bit RAM data bus; could you imagine my block diagram with 256 individual BIDIR IOs instead of one single one labeled RDQ[255..0]...

 :o Yikes.

Also, you do not need to label the bus wires, as they will take their name from the output pins automatically.

Ah okay, thanks - the info I found online (from Intel, no less) stated in one of the steps to label the bus, so I did. Happy to hear about shortcuts, though. ;)

Talking of which, I'm really liking this diagram method of connecting blocks up - seems much less stressful than the code method!  ;D
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 06, 2019, 03:51:25 pm
 |O  Okay, so it compiles but I'm getting no image - the screen is going into standby. 

I've attached the files for vga01 and sync_generator with all the latest modifications I've made - they all compile fine into symbols and as part of the main project, but I'm getting nothing out at the monitor.  Below is a pic of the top-level diagram:

(https://i.ibb.co/fQBpvm6/Untitled.png) (https://ibb.co/vDgPY2y)

I've even tried swapping HSYNC/VSYNC around in case I'd got the pin assignments wrong, but to no avail - I'm not sure what's going wrong to be honest.  Have attached project files in case anyone wants to take a look.

EDIT:

For info, the pin assignments are identical to the project VGA01 - which I've uploaded to the FPGA and tested and that works fine.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on November 06, 2019, 05:17:41 pm
Instead of looking through your code for the problem, I'm just going to give you some tips on debugging it.

First course of action is an oscilloscope. Look at what the pins are doing, because this is quick and easy to do for tracking down the basic mistakes like wonky or missing clocks, swapped signals etc...

Then, if that still does not lead you to any findings, have a look at the "RTL View" and see what the compiler has compiled. Since you have basically put together the same thing in a different way, it should also compile into circuitry that is similar. Try to make sense of the connections in there; if there is something weird, that can point you to where exactly the issue might be.

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 06, 2019, 06:31:59 pm
 :palm: :palm: :palm:

Fixed it.  I'd linked the reset line to both blocks, but had only inverted the signal in one of them - forgetting to invert the signal to the other, so one was held in permanent reset.  |O

All working again now.  ;D

Previous image of diagram is still current, more or less, except for the inversions on the reset inputs for each block (I've removed the code inverting reset in sync_generator).  :-+
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 07, 2019, 09:18:51 am
Here we go, I cleaned up your sync generator.  You had a few bugs.  Now, I don't have Quartus installed anymore and I just typed it in a text editor, so please bear with possible errors.  Once you understand my changes, you may remove anything you don't like:

Code: [Select]
// (default 640x480) 60Hz VGA Driver
// but can take parameters when initialised
// to output sync signals for any screen
// resolution
//
//                          |- LINE
//        __________________________ ____|
//       |                          |    |
//       |                          |    |
//       |                          |    |
//       |                          |    |
//       |      DISPLAY AREA        |    |
//       |                          |    |
//       |                          |    |
//       |                          |    |
//       --------------------------------- V_RES
//       |                          |
//       |                          |
//       ---------------------------- SCANLINES

module sync_generator(
// inputs
input wire pclk, // base pixel clock
input wire reset, // reset: restarts frame
// outputs
output reg pc_ena,   // ***** New pixel clock enable ...  For now, you will use pc_ena, which divides the clock by 2.
output reg hde, // Horizontal Display Enable - high when in display area (valid drawing area)
output reg vde, // Vertical Display Enable - high when in display area (valid drawing area)
output reg hsync, // horizontal sync
output reg vsync, // vertical sync
output reg [9:0] h_count, // current pixel x position
output reg [9:0] v_count // current line y position
);

// default resolution if no parameters are passed
parameter H_RES = 640; // horizontal display resolution
parameter V_RES = 480; // vertical display resolution

// no-draw area definitions
// ***** switched to parameters so you can edit these on quartus' block diagram editor.
parameter H_FRONT_PORCH = 16;
parameter HSYNC_WIDTH   = 96;
parameter H_BACK_PORCH  = 48;
parameter V_FRONT_PORCH = 10;
parameter VSYNC_HEIGHT = 2;
parameter V_BACK_PORCH  = 33;

// total screen resolution
localparam LINE = H_RES + H_FRONT_PORCH + HSYNC_WIDTH + H_BACK_PORCH; // complete line (inc. horizontal blanking area)
localparam SCANLINES = V_RES + V_FRONT_PORCH + VSYNC_HEIGHT + V_BACK_PORCH; // total scan lines (inc. vertical blanking area)

// useful trigger points
localparam HS_STA = H_RES + H_FRONT_PORCH - 1; // horizontal sync ON (the minus 1 is because hsync is a REG, and thus one clock behind)
localparam HS_END = H_RES + H_FRONT_PORCH + HSYNC_WIDTH - 1;// horizontal sync OFF (the minus 1 is because hsync is a REG, and thus one clock behind)
localparam VS_STA = V_RES + V_FRONT_PORCH; // vertical sync ON
localparam VS_END = V_RES + V_FRONT_PORCH + VSYNC_HEIGHT; // vertical sync OFF

/*  moved above
reg [9:0] h_count; // line position
reg [9:0] v_count; // SCANLINES position
reg [9:0] z_count; // Frame counter for CPU animation
*/

// keep x and y bound within the display area
/* obsolete
assign x = (h_count < H_RES) ? h_count : (H_RES - 1'b1); // x = H_RES-1 if current pixel is after display area
assign y = (v_count < V_RES) ? v_count : (V_RES - 1'b1); // y = VS_END-1 if v_count is outside display area
*/
   // generate a 25 MHz pixel strobe
   //reg [15:0] cnt;
   //reg pix_en;

   always @(posedge pclk)
pc_ena <= pc_ena + 1'b1;  // This is temporary

// handle signal generation
always @(posedge pclk)                    // I removed ', posedge reset)', since it adds a gate to the CLK input of all the registers in your design, slowing down that trigger edge
begin
if (reset) // reset to start of frame
begin
h_count <= 1'b0;
v_count <= 1'b0;
hsync   <= 1'b0;
vsync   <= 1'b0;
vde     <= 1'b0;
hde     <= 1'b0;
end
else
begin
if (pc_ena) // once per pixel
begin



// check for generation of HSYNC pulse
if (h_count == HS_STA)
begin
hsync <= 1'b1; // turn on HSYNC pulse
end
else if (h_count == HS_END)
hsync <= 1'b0; // turn off HSYNC pulse

// check for generation of VSYNC pulse
if (v_count == VS_STA)
begin
vsync <= 1'b1; // turn on VSYNC pulse
end
else if (v_count == VS_END)
vsync <= 1'b0; // turn off VSYNC pulse

// reset h_count & increment v_count at end of scanline
if (h_count == LINE - 1) // end of line
begin
h_count <= 1'b0;
hde     <= 1'b1;  // Turn on horizontal video data enable

if (v_count == SCANLINES - 1) // ****** Now that it's time to increment the H count, this is when you would check if the V-count should be cleared.  End of SCANLINES
begin
v_count <= 1'b0;
vde     <= 1'b1;   // Turn on vertical video data enable
end
else
begin                                             // ****** If the v_count isn't being cleared, you go ahead and add 1 to the v_count
v_count <= v_count + 1'b1;           // increment v_count to next scanline
if (v_count == V_RES - 1)   vde <= 1'b0 ; // Turn off vertical video data enable
end
end
else
h_count <= h_count + 1'b1;           // otherwise, just increment horizontal counter
if (h_count == H_RES - 1)  hde <= 1'b0 ;  // Turn off horizontal video data enable

/*  ****** BUG *****  Moved into the correct place at the the end of a horizontal line, where you increment the v_count increment, or clear it. see 16 lines above
// reset v_count and blanking at bottom of screen
if (v_count == SCANLINES - 1) // end of SCANLINES
begin
v_count <= 1'b0;
end
*/  // ***** End of bug ******

end
end
end

/******* Error ****** this must be synchronous with your video clock; however, I recognize this was probably just a last-minute patch.
always @(posedge pclk)
begin
DE <= (h_count < H_RES) && (v_count < V_RES);
end
********/

endmodule


Now, here is a new module to add to your project.  It mutes the RGB data values when the raster is outside the active display area.  (Once again, I just typed it in notepad.)  It belongs just before your output pins.  Again, read comments inside it.

Code: [Select]
//  This module will force mute the RGB video output data outside the active video display area.
//  This module will also generate the vid_de_out used by many DVI transmitters.
//  This module, as an example, also has all the inputs and outputs used along the pixel pipe.
//  It illustrates that since there is a pixel delay in the video switch, the syncs and video enables are also delayed,
//  making the output picture window perfectly parallel with the video coming in, then being fed out.

module vid_out_stencil(
input wire pclk,
input wire reset,
input wire pc_ena,      // Pixel clock enable
input wire hde_in, // Horizontal Display Enable - high when in display area (valid drawing area)
input wire vde_in, // Vertical Display Enable - high when in display area (valid drawing area)
input wire hs_in, // horizontal sync
input wire vs_in, // vertical sync

input wire [RGB_hbit:0] r_in,
input wire [RGB_hbit:0] g_in,
input wire [RGB_hbit:0] b_in,

output reg hde_out,
output reg vde_out,
output reg hs_out,
output reg vs_out,

output reg [RGB_hbit:0] r_out,
output reg [RGB_hbit:0] g_out,
output reg [RGB_hbit:0] b_out,

output reg vid_de_out      // Actual H&V data enable required by some DVI encoders/serializers
);

parameter RGB_hbit    = 1;  // 1 will make the RGB ports go from 1 to 0, eg [1:0].  I know others prefer a '2' here for 2 bits
parameter HS_invert   = 0;  // use a 1 to invert the HS output, the invert feature is only for this video output module
parameter VS_invert   = 0;  // use a 1 to invert the VS output, the invert feature is only for this video output module


always @(posedge pclk)
begin
if (reset) // global reset
begin

// not in use for this module

end
else
begin
if (pc_ena) // once per pixel
begin

hde_out <= hde_in;             // since this video muting switch algorithm delays the output by 1 pixel clock,
vde_out <= vde_in;             // all the video timing reference signals also get the 1 pixel delay treatment to keep the output aligned perfectly.
hs_out  <= hs_in ^ HS_invert ; // the invert feature is only for this video output module
vs_out  <= vs_in ^ VS_invert ; // the invert feature is only for this video output module

if ( hde_in && vde_in )
begin
vid_de_out <= 1'b1 ; // turn on video enable for DVI transmitters
r_out  <= r_in ; // copy video input to output
g_out  <= g_in ; // copy video input to output
b_out  <= b_in ; // copy video input to output
end
else
begin
vid_de_out <= 1'b0 ; // turn off video enable for DVI transmitters
r_out  <= 0    ; // Mute video output to black
g_out  <= 0    ; // Mute video output to black
b_out  <= 0    ; // Mute video output to black
end

end
end
end
endmodule


Now, the next thing you need to do is use the IOs from this example and make a new picture pattern generator which does not use the sync generator's H&V counters, but generates its own internally, using this example IO port setup:

Code: [Select]
module vid_pattern_generator(
input wire pclk,
input wire reset,
input wire pc_ena,      // Pixel clock enable
input wire hde_in, // Horizontal Display Enable - high when in display area (valid drawing area)
input wire vde_in, // Vertical Display Enable - high when in display area (valid drawing area)
input wire hs_in, // horizontal sync
input wire vs_in, // vertical sync


output reg hde_out,
output reg vde_out,
output reg hs_out,
output reg vs_out,

output reg [RGB_hbit:0] r_out,
output reg [RGB_hbit:0] g_out,
output reg [RGB_hbit:0] b_out

);

parameter RGB_hbit    = 1;  // 1 will make the RGB ports go from 1 to 0, eg [1:0].  I know others prefer a '2' here for 2 bits


always @(posedge pclk)
begin
if (reset) // global reset
begin



end
else
begin
if (pc_ena) // once per pixel
begin

// ***************  insert generator code here
// ***************  also remember to pass through the hde,vde,hs_out,vs_out
// ***************  in the future, numerous delay sizes may be needed if you are performing functions which take multiple clocks before a true pixel becomes valid

end
end
end // always @clk
endmodule


Let me know when you are done, because after I've gone over your new pattern generator, your next step is to place either your first text or graphics.

Additional: Please place your Quartus screenshots on this forum. I'm having trouble with your picture server, and since this thread may last long on the forum, it is a good idea to have the photos here.  (If you want a compact, lossless picture file size, use the .png file format...)

***** DON'T forget to update the graphic symbol for the changes I made to your sync generator.  The order of the IO pins has been changed, and I changed the available parameters on the block diagram symbol.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on November 07, 2019, 09:30:37 am
Is there a specific reason why not to reuse the X and Y counters from the sync generator?

You will always need to know the pixel X Y coordinates, so why not use a counter that already counts this? The only thing I would add is a "pixel_increment" signal that goes high for 1 clock cycle every time it steps onto the next pixel. This can then pace the pixel generation logic at slower speeds than the clock.
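A minimal sketch of that pixel_increment idea (module and port names are my own, assuming the FPGA clock is an integer multiple of the pixel rate):

```verilog
// One-clock-wide strobe, once per pixel.  DIV=2 would turn a 50 MHz
// system clock into a 25 MHz pixel pace.  Names are illustrative.
module pixel_strobe #(
    parameter DIV = 2               // clocks per pixel (assumed >= 2)
) (
    input  wire clk,
    input  wire reset,
    output reg  pixel_increment     // high for exactly 1 clk per pixel
);
    reg [3:0] cnt;                  // small divider counter

    always @(posedge clk) begin
        if (reset) begin
            cnt             <= 0;
            pixel_increment <= 1'b0;
        end else if (cnt == DIV - 1) begin
            cnt             <= 0;
            pixel_increment <= 1'b1;  // fire the strobe
        end else begin
            cnt             <= cnt + 1'b1;
            pixel_increment <= 1'b0;
        end
    end
endmodule
```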
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 07, 2019, 10:20:49 am
Is there a specific reason why not to reuse the X and Y counters from the sync generator?

You will always need to know the pixel X Y coordinates so why not use a counter that already counts this. Only thing i would add is a "pixel_increment" signal that goes high for 1 clock cycle every time it steps onto the next pixel. This can then pace the pixel generation logic at slower speeds than the clock.
Yes.  The counters are good for generating a pattern, especially in a simple design.  However, there are a few advantages to getting used to having a separate, what we could more truly call an address counter generator, which does not rely on multiply-and-add to point to a position in RAM.

First advantage.  You notice I separated the display enables into horizontal and vertical, HDE & VDE.  To create a new synchronous H&V counter, one could approach the counters like this:

--------------------------------------------------------------------------------------------------------------
hde_out <= hde_in;
vde_out <= vde_in;

if (~vde_in) begin   // regardless of the display geometry and timing, the new
    vcount <= 0;     // vertical count clears immediately after a video frame has finished
end else begin
    if (~hde_in && hde_out) vcount <= vcount + 1;  // regardless of the display geometry and timing, the new
                                                   // vertical count increments immediately after a video line is finished,
                                                   // giving time for address generators to do their math and RAM access cycles
end

if (~hde_in) begin
    hcount <= 0;
end else begin
    hcount <= hcount + 1;
end
------------------------------------------------------------------------------------------------------------------------------------
Now, the hcount and vcount coming out of the sync generator may be re-aligned with reference to the active picture area to re-center the picture. The above code, with minimal gates, will always clear and increment your picture coordinates right after the display activity has ended, giving you maximum time to perform additional actions/accesses within the same block of system memory - things like copying over memory to fill a sprite, copying memory for an audio DAC, or reading a display list for what the video mode of the next line of video should be and where its display memory is located.

Now, this is a preference, as I know that such logic will only eat another 24 logic registers for a full 1024x1024.  This will also increase your FMAX, as these counters may now be located right at the inputs of your dual-port RAM; Quartus now only has to route HDE and VDE from your sync generator to wherever the RAM address logic may be in the FPGA.

Remember, the current sync generator only generates a vde, hde, vsync, hsync.  Wait till we add

sprite1.thru.8_hstart
sprite1.thru.8_vstart
sprite1.thru.8_vstop
vertical_interrupt1.thru.2

Stopping here already makes 26 10-bit number compares on the hcount and vcount to generate their windowed coordinates.  Adding that to using the same counters to generate additional raster coordinates would kill the FMAX in such a slow old FPGA.

I'm targeting a high FMAX so that, with a simple dual-port RAM operating at 6x the pixel clock speed on the graphics read side - say 150MHz - for each pixel drawn I can access 6 different words of system RAM and still have the second port 100% dedicated to the CPU interface.

With these 6 accesses we will, for example, read a top ASCII text window layer, loop that output into a resolver for the font memory base address and read that letter's correct pixel, with another access cycle for reading a font colour map.  We'll also access graphics memory for a graphics mode; since these modes are usually bitplane, they'll only eat 1-2 of my access time slots.  The remaining access slots 3-6 will be used for a hardware rectangle read/modify/write command which will accelerate clears, carriage-return text scrolling, and pasting a bitplane picture from one location in memory into another with an optional transparent colour 0.  With the 2-megabit Lattice part, an 8MHz Z80 would basically be able to run a simplified Doom-type game, with all the sorted character graphic images pre-rendered, in the OP's 160x120 256-colour mode.
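To sketch the six-slot idea in code (illustrative only - every name below is invented for this example and is not from any actual design): a 3-bit counter cycles 0..5 at 150MHz, six RAM cycles per 25MHz pixel, and hands each cycle's address bus to a different reader.

```verilog
// Hypothetical time-slot sequencer.  All *_addr names are made up for
// illustration; widths assume a 128K address space.
module slot_sequencer (
    input  wire        clk150,           // assumed 150 MHz core clock
    input  wire        reset,
    input  wire [16:0] text_layer_addr,  // ASCII text window layer
    input  wire [16:0] font_pixel_addr,  // resolved font pixel
    input  wire [16:0] font_color_addr,  // font colour map
    input  wire [16:0] gfx_addr,         // bitmapped graphics mode
    input  wire [16:0] blitter_addr,     // rectangle read/modify/write engine
    output reg  [16:0] ram_addr          // address presented to the RAM port
);
    reg [2:0] slot;                      // which of the 6 time slots is active

    always @(posedge clk150) begin
        if (reset) slot <= 3'd0;
        else       slot <= (slot == 3'd5) ? 3'd0 : slot + 1'b1;

        case (slot)
            3'd0:    ram_addr <= text_layer_addr;
            3'd1:    ram_addr <= font_pixel_addr;
            3'd2:    ram_addr <= font_color_addr;
            3'd3:    ram_addr <= gfx_addr;
            default: ram_addr <= blitter_addr;   // slots 4-5: blitter
        endcase
    end
endmodule
```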


Hmmm, the only ugly thing would be a triangle-filling engine.  Thankfully, there is another thread here on this FPGA forum covering that exact topic, with source code...  With a 150MHz core clock and an 8MHz CPU, the dedicated CPU memory access port is free over 95% of the time for a sophisticated triangle, rectangle and circle geometry engine.

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 07, 2019, 10:39:03 am
Here we go, I cleaned up your sync generator.  You had a few bugs.

Thanks!  :-+

Well done, by the way, writing all that in a text editor - there were only a couple of typos in variable names, otherwise it all compiled into symbol files just fine.  ;D

Now, the next thing you need to do is use the IOs from this example and make a new picture pattern generator which does not use the sync generator's H&V counters, but generates its own internally, using this example IO port setup:

Okay, working on it now...  :-/O

Additional: Please place your Quartus screenshots on this forum. I'm having trouble with your picture server, and since this thread may last long on the forum, it is a good idea to have the photos here.  (If you want a compact, lossless picture file size, use the .png file format...)

Righto - don't know why I got into the habit of externally-hosting the images, but there's no particular reason I still do it.  :-+

***** DON'T forget to update the graphic symbol for the changes I made to your sync generator.  The order of the IO pins has been changed, and I changed the available parameters on the block diagram symbol.

No worries. Will get back to you (hopefully) shortly!  :-/O
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on November 07, 2019, 11:04:44 am
Well, yes, for counting frame buffer memory addresses you would want to use a separate counter that resets on Vsync. But for other things, having X and Y tends to be more useful.

For text and tile maps I'd use grids that are 8, 16, 32 or something, and round the total area to a power of 2 so that you can get the index by simply bit shifting - zero logic is needed to calculate the index from X and Y. (Yes, this wastes memory space, but it is very useful for smooth scrolling.) Also, since these are small and often accessed, I'd stick them into separate memory blocks so that you can generate text and colour it within a single clock cycle. When deciding about sprites, it's also common that you need to compare a bunch of registers to the X and Y coordinates - all of it essentially combinational logic that can work in the background for multiple clock cycles if needed, since pixels come out at such a slow rate. You can easily give yourself a 1-pixel head start by delaying the Hsync output by one pixel, so that the pipeline is full and ready when the pixels really start moving out to the monitor.

Anyway, many ways to skin a cat, none of them necessarily more correct, just different in their own way.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 07, 2019, 11:30:12 am
Okay, vid_pattern_generator.v below for your perusal and criticism.  ;)

I haven't tried it out on the FPGA yet as I'm sure there'll be some feedback and changes I need to make!  I was undecided about how to implement the x, y counters - but the method I've gone with is this one; all the time pc_ena is HIGH (pixels are being drawn), I increment the X and Y counters - so they should range from 0-639 for x and 0-479 for y, or indeed whatever maximum value the current screen resolution goes up to.

As they're counting, the only check that is made is against H_RES to reset x at the end of each scan line.  I don't check y at all as that will stop counting when pc_ena goes LOW and get reset during VSYNC.

As soon as pc_ena goes LOW, the counters no longer increment and when there is a VSYNC pulse, the counters are reset, ready for pc_ena to go HIGH at the start of the next frame and allow them to count again.

Does that sound reasonable, or is there a better way?

Code: [Select]
module vid_pattern_generator(

input wire pclk,
input wire reset,
input wire pc_ena, // Pixel clock enable
input wire hde_in, // Horizontal Display Enable - high when in display area (valid drawing area)
input wire vde_in, // Vertical Display Enable - high when in display area (valid drawing area)
input wire hs_in, // horizontal sync
input wire vs_in, // vertical sync


output reg hde_out,
output reg vde_out,
output reg hs_out,
output reg vs_out,

output reg [RGB_hbit:0] r_out,
output reg [RGB_hbit:0] g_out,
output reg [RGB_hbit:0] b_out

);

parameter RGB_hbit = 1; // 1 will make the RGB ports go from 1 to 0, eg [1:0].  I know others prefer a '2' here for 2 bits

// default resolution if no parameters are passed
parameter H_RES = 640; // horizontal display resolution
parameter V_RES = 480; // vertical display resolution

// set up x & y position counters
reg [9:0] x, y;

always @(posedge pclk)
begin
if (reset) // global reset
begin

// reset position counters
x <= 1'b0;
y <= 1'b0;

end // global reset
else
begin
if (pc_ena) // once per pixel
begin

// Generate some colour bars based on x value
r_out[RGB_hbit] <= x[8];
g_out[RGB_hbit] <= x[7];
b_out[RGB_hbit] <= x[6];

// pass through hde, vde, hs_out, vs_out
hde_out <= hde_in;
vde_out <= vde_in;
hs_out <= hs_in;
vs_out <= vs_in;

// update the position counters
// all the time pc_ena is HIGH, we're in a valid drawing area so
// the counters can be incremented and checked against the
// supplied screen resolution
// reset X & increment Y at end of scanline
if (x == H_RES - 1) // end of line
begin
x <= 1'b0; // reset X
y <= y + 1'b1; // increment Y to next scanline
end
else // not at end of scanline, so just increment X
x <= x + 1'b1;

end // pc_ena
else
begin

// Check for VSYNC and reset X & Y if VSYNC is HIGH
if (vs_in)
begin
x <= 1'b0;
y <= 1'b0;
end
else
begin
// do nothing
end

end // !pc_ena

end // !global reset

end // always @clk

endmodule
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 07, 2019, 11:54:32 am
Here's a trick:

----------------------------
                 if (~hde_in) begin
                     x <= 0;
                 end else begin
                     x <= x + 1;
                 end
-----------------------------
OR --- your choice ---
----------------------------
                 if (hde_in) begin
                     x <= x + 1;
                 end else begin
                     x <= 0;
                 end
-----------------------------

What's happening here is while the horizontal enable is OFF, the x counter is kept at 0.
When the horizontal enable is high, the x counter counts.  This is elegant since the X position is 0 right after the display is disabled and goes to 1 right at the second active pixel while continuing to count up.

Vertical has 1 additional trick...

-------------------------------------
if (~vde_in) begin
   y <= 0;  // vertical count = 0 after the picture area is finished and stays at 0 while the vertical DE is not enabled
end else begin
   if (~hde_in && hde_out) y <= y + 1;  // the trick: when the horizontal DE line ends by going low, at that 1 pixel
                                        // while the horizontal DE output is still high (since it's delayed
                                        // by 1 pixel clock), the vertical counter will increment by 1.
                                        // ANDing an input with a 1-clock-delayed copy of that input is
                                        // commonly used to trigger a single-shot / one-shot action every time
                                        // the input transitions from high to low, or low to high, depending on your ~...
end
------------------------------------

What's going on here is that during the first line of the vertical DE, since it was previously cleared, the Y counter starts at 0.
It won't increment until the end of the first horizontal line, meaning that on the next line the Y counter will already be at 1.

Operating like this, if you have a variable HDE and VDE window anywhere in your raster, at any size, the X and Y counters will be at their next valid numerical state immediately after each line and each frame has ended.
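The "ANDed with a delayed copy" trick in the comment above generalizes to any one-shot edge detect.  A stand-alone sketch (module and signal names are mine):

```verilog
// Falling-edge one-shot: AND a signal's delayed copy with its inverse to
// get a single-clock pulse each time the signal goes 1 -> 0.
module fall_one_shot (
    input  wire clk,
    input  wire sig_in,
    output wire fell        // high for one clock when sig_in falls
);
    reg sig_d;                               // 1-clock-delayed copy

    always @(posedge clk) sig_d <= sig_in;

    assign fell = ~sig_in & sig_d;           // was high, now low
endmodule
```

Swap the inversion (`sig_in & ~sig_d`) for a rising-edge one-shot.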

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 07, 2019, 12:14:06 pm
Operating like this, if you have a variable HDE and VDE window anywhere in your raster, at any size, the X and Y counters will be at their next valid numerical state immediately after each line and each frame has ended.

Fantastic - works beautifully and is quite small and neat, a great combination.  ;D

[attach=1]  [attach=2]

One thing I've noticed - there are more colour bars... Is the x counter incrementing properly?
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 07, 2019, 01:05:25 pm
For text and tile maps id use grids that are 8 16 32 or something and round the total area to a round power of 2 number so that you can get the index by simply bit shifting and so zero logic is needed to calculate the index from X Y (Yes this wastes memory space but is very useful for smooth scrolling).

I'd be hoping that I could just ignore the least significant bits of the x and y counters to get the tile index?  Say my tiles are 8x16 (I was originally planning on 8x8, but it seems it should actually be 8x16? I might just stretch the y-axis by 2 though to keep memory use down initially).  So for example:

Code: [Select]
tile_x = x[9:3];
tile_y = y[9:4];

Surely that'd be quicker than messing around with bit shifts, or am I misunderstanding what you've written and we're talking about the same thing?
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on November 07, 2019, 01:15:21 pm
For text and tile maps I'd use grids that are 8, 16, 32 or something, and round the total area to a power of 2 so that you can get the index by simply bit shifting - zero logic is needed to calculate the index from X and Y. (Yes, this wastes memory space, but it is very useful for smooth scrolling.)

I'd be hoping that I could just ignore the least significant bits of the x and y counters to get the tile index?  Say my tiles are 8x16 (I was originally planning on 8x8, but it seems it should actually be 8x16? I might just stretch the y-axis by 2 though to keep memory use down initially).  So for example:

Code: [Select]
tile_x = x[9:3];
tile_y = y[9:4];

Surely that'd be quicker than messing around with bit shifts, or am I misunderstanding what you've written and we're talking about the same thing?

Yes, this is bit shifting - just rewiring bits with some offset. On an FPGA such an operation is free, as it's done inside the signal routing, hence why it's so useful.

The reason for also keeping the map size a power of 2 is so that you can build the address for the map memory by simply going
Code: [Select]
assign tilemap_addr = {tile_y,tile_x};
This means that on a 640x480 screen with 8-pixel tiles you would, instead of being 80 tiles wide, be 128 tiles wide by using a 7-bit-wide tile_x register. You waste memory by having more tile memory than is needed to fill the screen, but it also allows the CPU to draw and update tiles outside the visible area. This is how infinite smooth scrolling is implemented on most game consoles.
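Putting those pieces together as a sketch (all names are mine; this assumes 8x16 tiles and a 128-tile-wide map, per the discussion above):

```verilog
// Tile-map addressing by bit rewiring only - no multipliers.
// Assumes 8-wide x 16-tall tiles and a power-of-two map width.
module tile_addr #(
    parameter MAP_W_BITS = 7                 // 2^7 = 128 tiles per map row
) (
    input  wire [9:0] x,                     // pixel column
    input  wire [9:0] y,                     // pixel line
    output wire [MAP_W_BITS+5:0] tilemap_addr, // {tile_y, tile_x}
    output wire [2:0] font_col,              // pixel column inside the tile
    output wire [3:0] font_row               // pixel row inside the tile
);
    wire [MAP_W_BITS-1:0] tile_x = x[9:3];   // drop 3 LSBs: divide by 8
    wire [5:0]            tile_y = y[9:4];   // drop 4 LSBs: divide by 16

    assign tilemap_addr = {tile_y, tile_x};  // free concatenation, no logic
    assign font_col     = x[2:0];
    assign font_row     = y[3:0];
endmodule
```

Using x[3:1]/y[4:1] instead would double each font dot, as BrianHG describes for his 8x8 font.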
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 07, 2019, 01:24:30 pm
Operating like this, if you have a variable HDE and VDE window anywhere in your raster at any size, the X and Y counters will be at their next ready valid numerical state immediately after each line has ended and the frame has ended.

Fantastic - works beautifully and is quite small and neat, a great combination.  ;D

[attach=1]  [attach=2]

One thing I've noticed - there's more colour bars... Is the x counter incrementing properly?
I think you can now invert the hsync and vsync output to get the standard VGA output sync polarity.  Now with the 5v buffered outputs, your monitor should accept the signal.

As for text, you are correct - take a look at my text generator, though it might need a little touch-up to be wired in place of your video generator.  You'll also note that I am ignoring the LSB on the H&V counters in my code.  This is to make each dot in my 8x8 font 2 pixels wide and 2 pixels tall.

I'll take a look at the code tonight and maybe do a little patch for you.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 07, 2019, 02:17:27 pm
This means that on a 640x480 screen with 8-pixel tiles you would, instead of being 80 tiles wide, be 128 tiles wide by using a 7-bit-wide tile_x register. You waste memory by having more tile memory than is needed to fill the screen, but it also allows the CPU to draw and update tiles outside the visible area. This is how infinite smooth scrolling is implemented on most game consoles.

Sounds like an effective way to do it, but I might have to leave that for a while until I have an FPGA with enough RAM.   The Spartan 6 arrived today, but I'm still waiting on the cable programmer for it.  ::)

I think you can now invert the hsync and vsync output to get the standard VGA output sync polarity.  Now with the 5v buffered outputs, your monitor should accept the signal.

Okay, that's strange.  I've inverted the HSYNC and VSYNC by changing the parameters on vid_out_stencil, and I'm now getting fuzzy edges to the colour bars and screen blanking several times a second.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 07, 2019, 02:31:54 pm
This means that on a 640x480 screen with 8 pixel tiles you would, instead of having 80 tiles wide, have 128 tiles wide by using a 7-bit-wide tile_x register. So you waste memory by having more tile memory than is needed to fill the screen, but it also allows the CPU to draw and update tiles outside the visible area. This is how infinite smooth scrolling is implemented on most game consoles.

Sounds like an effective way to do it, but I might have to leave that for a while until I have an FPGA with enough RAM.   The Spartan 6 arrived today, but I'm still waiting on the cable programmer for it.  ::)

I think you can now invert the hsync and vsync output to get the standard VGA output sync polarity.  Now with the 5v buffered outputs, your monitor should accept the signal.

Okay, that's strange.  I've inverted the HSYNC and VSYNC by changing the parameters on vid_out_stencil, and I'm now getting fuzzy edges to the colour bars and screen blanking several times a second.
You're going to have to check the outputs with your scope.  It sounds like the 5v high isn't being reached or maintained.  With the shorter pulses of a positive sync, the monitor might see the signal better, meaning the signal is still operating at the edge of functionality.  For now, just use a positive sync unless you want to scope your 74xxx CMOS buffer's output.

Do you have your CMOS buffer's outputs tied 4 in parallel like I recommended, to give you a higher current drive?  If your monitor has a 75 ohm load on the sync signals instead of 1k ohm, it may also have a high level of around 4v instead of 5v.
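Rough numbers illustrate why the parallel outputs matter (assuming a ballpark ~25 ohm on-resistance per 74AC output into a 75 ohm termination; check your part's datasheet for the real figure):

Code: [Select]
One gate:                Vhigh ~ 5v x 75/(75+25)    ~ 3.75v  (marginal for a schmitt input)
Four gates in parallel:  Rout  ~ 25/4 = 6.25 ohm
                         Vhigh ~ 5v x 75/(75+6.25)  ~ 4.6v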

Is your vsync also going through the buffer the same way?
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 07, 2019, 03:30:23 pm
Do you have your CMOS buffer's outputs tied 4 in parallel like I recommended, to give you a higher current drive?  If your monitor has a 75 ohm load on the sync signals instead of 1k ohm, it may also have a high level of around 4v instead of 5v.

Is your vsync also going through the buffer the same way?

VSYNC and HSYNC are both unchanged in that they're both going through the buffer as before.

One thing is that I'm only using one output from the buffer for HSYNC and VSYNC.  Might try the multiple-parallel outputs and see if that improves the image with inverted sync signals then!  :-/O

Oh, I'm also dabbling with trying to build a character generator and am fast realising I know nothing about how to interface a block with a memory block etc...  :o

EDIT:

Yes, that sorted it.  I've only used two outputs per sync signal for the moment, but it's working fine inverted now with no misbehaviour.  :-+
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on November 07, 2019, 03:57:13 pm
Oh, I'm also dabbling with trying to build a character generator and am fast realising I know nothing about how to interface a block with a memory block etc...  :o
You need Verilog's memory syntax. (ETA: oops, it's not a keyword)
Code: [Select]
reg [7:0] mem [127:0];
always @ (posedge clk) begin
  if (we)
    mem[write_address] <= d;
  q <= mem[read_address]; // q doesn't get d in this clock cycle
end
Note well, that's a non-blocking assignment so adjust your time machine accordingly.

Recommended HDL coding styles handbook (https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/qts/qts_qii51007.pdf), p13 and on, for more examples.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 07, 2019, 03:58:14 pm
Okay, so for a character generator, this is what I'm thinking:

Replace the vid_pattern_generator with a vid_char_generator.  It will share all the inputs and outputs of the vid_pattern_generator, but with some extras:


This will allow the vid_char_generator to compute the current tile address from the internally-generated x & y coordinates, send that address to the video memory and receive a character index back.

Then it will compute a pixel address for that character in the character memory, based on the character index and current x & y coordinates, which will return a single bit of information based on whether the current pixel is background- or foreground-coloured depending on the character in that tile....  Does that make sense?  Am I over-complicating it, because there seems like a lot going on here?  :scared:
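In Verilog terms, the address arithmetic I'm describing might look something like this (a hypothetical sketch with invented names, ignoring for the moment the read latency of synchronous memories):

Code: [Select]
wire [8:0]  tile_addr  = {disp_y[7:4], disp_x[8:4]};  // which text cell the current pixel is in
wire [7:0]  char_index;                               // read back from the video (text) memory
wire [12:0] pixel_addr = {char_index[6:0], disp_y[3:1], disp_x[3:1]}; // row/column inside the glyph
wire        pixel_bit;                                // read back from the character (font) memory
assign rgb = pixel_bit ? fg_colour : bg_colour;       // foreground/background select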

I would load a font file (OSD_FONT.MIF) and LUT (osd_mem.mif) into the character memory and, initially at least, the video memory will be garbage - but populating that with useful data is a few steps away yet.

I'm a little confused looking at osd_generator.v, but then I don't have a good map in my head of what should be happening here anyway, so understanding it is proving difficult.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on November 07, 2019, 04:48:51 pm
Okay, so for a character generator, this is what I'm thinking:

Replace the vid_pattern_generator with a vid_char_generator.  It will share all the inputs and outputs of the vid_pattern_generator, but with some extras:

  • An 8-bit data input from video memory.
  • An xx-bit address output to video memory.
  • A CE (chip enable) to video memory.
  • An xx-bit address output to character memory.
  • A 1-bit data input from character memory.
  • A CE to character memory.
CE is only relevant in discrete implementations where the memory needs to be kept from driving a shared bus. Here, memory is always chip-enabled and you just have write-enables and maybe read-enables. Or, if you prefer to have the tool infer memory from generic HDL, you only have write-enables and are always reading every clock. If you want to hold read data, add your own register.

Quote
This will allow the vid_char_generator to compute the current tile address from the internally-generated x & y coordinates, send that address to the video memory and receive a character index back.

Then it will compute a pixel address for that character in the character memory, based on the character index and current x & y coordinates, which will return a single bit of information based on whether the current pixel is background- or foreground-coloured depending on the character in that tile....  Does that make sense?  Am I over-complicating it, because there seems like a lot going on here?  :scared:
No, that's about the size of it. Now you see why I was encouraging you to do a plain old framebuffer to start :)

Block RAMs are commonly used in synchronous mode, which means you'll have a one-cycle delay through each memory. You can get ahead of it, such as by reading the next character index or bitmap ahead of time and buffering the data, or you can push other business back, such as by applying a two-clock delay to the display enable going to the rest of the display system. If you want to give yourself something other than garbage or nothing to stare at, use an initial block to preload a character bitmap or two and to fill the text memory with something. Writing this from sleep-deprived memory:
Code: [Select]
initial begin
  integer i;
  for (i=0; i<256; i=i+1) begin
    charmem[0+i*8] = 8'b00010000 ^ {8{i[0]}}; // invert top half according to bit 0 of tile index
    charmem[1+i*8] = 8'b00101000 ^ {8{i[0]}}; // to give you an easy way to see if you're
    charmem[2+i*8] = 8'b01000100 ^ {8{i[0]}}; // actually reading the tile map
    charmem[3+i*8] = 8'b10000010 ^ {8{i[0]}};
    charmem[4+i*8] = 8'b11111110 ^ {8{i[1]}}; // invert bottom half according to bit 1 of tile index
    charmem[5+i*8] = 8'b10000010 ^ {8{i[1]}};
    charmem[6+i*8] = 8'b10000010 ^ {8{i[1]}};
    charmem[7+i*8] = 8'b00000000 ^ {8{i[1]}};
  end
end

initial begin
  integer i;
  integer j;
  for(i=0;i<80;i=i+1)
    for(j=0;j<25;j=j+1)
      tilemap[j*80 + i] = i + j;  // index by row and column, not just column
end

I'm using byte-wide memory here because it's easier to illustrate. You can select just one bit of the read port for output by [x[2:0]] bit-selecting the output wire or reg (but not the memory directly, iirc).
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 07, 2019, 10:33:17 pm
CE is only relevant in discrete implementations where the memory needs to be kept from driving a shared bus. Here, memory is always chip-enabled and you just have write-enables and maybe read-enables. Or, if you prefer to have the tool infer memory from generic HDL, you only have write-enables and are always reading every clock. If you want to hold read data, add your own register.

Ah yes, of course - makes sense.

Quote
This will allow the vid_char_generator to compute the current tile address from the internally-generated x & y coordinates, send that address to the video memory and receive a character index back.

Then it will compute a pixel address for that character in the character memory, based on the character index and current x & y coordinates, which will return a single bit of information based on whether the current pixel is background- or foreground-coloured depending on the character in that tile....  Does that make sense?  Am I over-complicating it, because there seems like a lot going on here?  :scared:
No, that's about the size of it. Now you see why I was encouraging you to do a plain old framebuffer to start :)

You're not kidding!  ???

I think I understand the concept now though - I just need to get my head around making the right connections in Quartus and setting the memory up properly.  There's a template for new memory objects in the New.. menu, but then on a website I found some code that I've re-purposed into this:

Code: [Select]
module ROM_font (

input [11:0] address,
output [7:0] data,
input wire read_en

);

reg [7:0] mem [0:8192];

assign data = (read_en) ? mem[address] : 8'b0;

initial begin
$readmemb("OSD_FONT.MIF", mem);
end

endmodule

This should read a file in the project (borrowed from BrianHG's altera_osd_20kbit files) which contains the character definitions into memory.  Except I don't think it's loading it into memory - I think it's loading into hardware, so it's creating a 'ROM' in the sense that it can't be written to, but I get the feeling it's an extremely wasteful way of doing it.   Please correct me if I'm wrong.  Isn't there a megafunction or wizard of some kind I need to use to create a RAM/ROM area, or does Quartus do this for me by inferring my intentions from the code?

If you want to give yourself something other than garbage or nothing to stare at, use an initial task to preload a character bitmap or two and to fill the text memory with something. Writing this from sleep-deprived memory:

Yes, I could read a file into memory like with the font file above. I thought the RAM would contain random values, so I wouldn't have to load anything initially to see something come out on the screen... is that not the case then?  Is RAM pre-initialised to zeros or something?

I'm using byte-wide memory here because it's easier to illustrate. You can select just one bit of the read port for output by [x[2:0]] bit-selecting the output wire or reg (but not the memory directly, iirc).

Would it be better to return a byte (an entire row from the selected character) and bit-select from that returned value, or configure the 'ROM' to just return the bit in question pointed to by x & y?  What's the difference in terms of the 'ROM' code?  Is it as simple as:

Code: [Select]
module ROM_font (
    output [7:0] data,
    ...

... to return a byte, or:

Code: [Select]
module ROM_font (
    output data,
    ...

... to return a bit?
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 08, 2019, 03:14:52 am
EDIT:

Yes, that sorted it.  I've only used two outputs per sync signal for the moment, but it's working fine inverted now with no misbehaviour.  :-+
Ok, if 1 x 74AC/HC CMOS buffer was on the line when using negative sync to drive the monitor cables, and 2 in parallel work with any sync polarity, can you see why I said 2 pages ago, 'USE 3 CMOS BUFFERS in parallel....'  Your monitor appears to have a 75 ohm termination resistor on its sync lines, and it might be using a schmitt trigger, meaning the drive would need to clear almost 4v by the time it reaches the monitor's internal circuitry.  If you are now using 2, go to 3; otherwise, on a warm day, when your 74AC/HC CMOS buffer warms up and its drive switch resistance drops a bit, or your 5v power supply sits at a still-valid 4.75v, you will wonder why your monitor's picture has begun to get interference.  Your other choice is to use a single line-driver buffer designed to drive a 25 ohm load, or a transistor buffer.  With one of those, the sync signal will reach your monitor with the full 0-5v swing.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 08, 2019, 03:26:03 am
Operating like this, if you have a variable HDE and VDE window anywhere in your raster at any size, the X and Y counters will be at their next valid numerical state, ready immediately after each line has ended and after the frame has ended.

Fantastic - works beautifully and is quite small and neat, a great combination.  ;D

[attach=1]  [attach=2]

One thing I've noticed - there's more colour bars... Is the x counter incrementing properly?

Show me your code.  You may have your X&Y counter outside the:
-----------------------------------------------------------------
         if (pc_ena)   // once per pixel
         begin
-----------------------------------------------------------------
Meaning it may be running at 50MHz instead of 25MHz.
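In other words, the counters should only advance inside the enable, something like this sketch (X_MAX and the counter names are placeholders, not from the actual project code):

Code: [Select]
always @(posedge clk) begin       // clk is the 50MHz system clock
  if (pc_ena) begin               // pc_ena pulses once per 25MHz pixel
    if (x == X_MAX) begin
      x <= 0;
      y <= y + 1;
    end else x <= x + 1;
  end                             // counters outside this 'if' would count at 50MHz
end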
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on November 08, 2019, 04:40:43 am
This should read a file in the project (borrowed from BrianHG's altera_osd_20kbit files) which contains the character definitions into memory.  Except I don't think it's loading it into memory - I think it's loading into hardware, so it's creating a 'ROM' in the sense that it can't be written to, but I get the feeling it's an extremely wasteful way of doing it.   Please correct me if I'm wrong. Isn't there a megafunction or wizard of some kind I need to use to create a RAM/ROM area, or does Quartus do this for me by inferring my intentions from the code?
You can use the generic HDL to be portable, or you can instantiate and configure vendor modules to get vendor features, or you can use the megawizard to configure the vendor modules to get vendor features and pointy-clicky satisfaction. Mostly, it's a style matter. Quartus will indeed infer your block RAM intentions if your code walks, talks, and quacks enough like a duck. If not, you can use a ramstyle pragma to specify the style of RAM you want to use. https://www.intel.com/content/www/us/en/programmable/quartushelp/current/index.htm#hdl/vlog/vlog_file_dir_ram.htm (https://www.intel.com/content/www/us/en/programmable/quartushelp/current/index.htm#hdl/vlog/vlog_file_dir_ram.htm)

That ROM_font module has two big problems: one, it is asynchronous. From p15 of the coding style handbook: "Because memory blocks in the newest devices from Altera are synchronous, RAM designs that are targeted towards architectures that contain these dedicated memory blocks must be synchronous to be mapped directly into the device architecture. For these devices, asynchronous memory logic is implemented in regular logic cells." Two, it is sized incorrectly. Specify memory sizes in words rather than bits, much as you would a two-dimensional array, but mind the off-by-one error. 256 tiles of 16 rows each would be sized as reg [n:0] mymemory [0:(256*16)-1] regardless of the register width n.
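Putting both fixes together, a synchronous and correctly sized version of that module might look like this (a sketch, not tested; note that $readmemb expects a plain-text binary image rather than Altera's .mif format, so the init file would need converting, and the file name here is hypothetical):

Code: [Select]
module ROM_font (
  input  wire        clk,
  input  wire        read_en,
  input  wire [12:0] address,  // 8192 words -> 13 address bits
  output reg  [7:0]  data
);

reg [7:0] mem [0:8191];        // sized in words: [0:words-1]

always @(posedge clk)
  if (read_en) data <= mem[address];  // registered read maps onto a block RAM

initial $readmemb("OSD_FONT.txt", mem); // hypothetical converted font image

endmodule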

Quote
Yes, I could read a file into memory like with the font file above. I thought the RAM would contain random values, so I wouldn't have to load anything initially to see something come out on the screen... is that not the case then?  Is RAM pre-initialised to zeros or something?

I think it's customary that block RAMs are initialized to zero upon power-up unless otherwise specified. In simulation, block RAMs are usually initialized to X so that you can see when you're reading from memory that you haven't written to (whether initially or in your logic) and trace how far the consequences reach.

Quote
Would it be better to return a byte (an entire row from the selected character) and bit-select from that returned value, or configure the 'ROM' to just return the bit in question pointed to by x & y?  What's the difference in terms of the 'ROM' code?  Is it as simple as:
I suggest byte-wide character bitmap "ROM" and external bit-selection with a look toward the future. If you decide to make the character bitmap block writable by the host, synthesis won't have to try to infer a dual-port RAM with dissimilar widths on each port, and won't have to guess/decide a possibly incorrect bit-endianness, because you will have specified it explicitly in logic. Whether you bit-select inside or outside the ROM_font module is a matter of style. Personally, I'd treat the x8 ROM as a x8 ROM all round and do the assign bit_out = rom_font_data[~xpos[2:0]]; outside of the ROM_font module, in case you might eventually reuse the bit selector for the frame buffer. That said, I've not done HDL projects too much larger than this, so FPGA old hands feel free to roast me if I'm doing it wrong.  :popcorn:
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 08, 2019, 05:02:38 am
Ok, I've cleaned up my OSD generator, commented on everything and it gives a much better starting point.
We will make changes to improve things and touch up the font into standard 8-bit bytes, but as an introduction, this shows the basics of how one of Intel's memory functions is called and configured.

Code: [Select]
module osd_generator ( clk, pc_ena, hde_in, vde_in, hs_in, vs_in, osd_ena_out, osd_image, hde_out, vde_out, hs_out, vs_out,
wren_disp, wren_font, wr_addr, wr_data );

// To write contents into the display and font memories, the wr_addr[15:0] selects the address
// and the wr_data[7:0] contains the byte which will be written.
// wren_disp is the write enable for the ascii text ram.  Only wr_addr[8:0] is used as the character display is 32x16.
// wren_font is the write enable for the font memory.  Only wr_data[1:0] and wr_addr[12:0] are used.
// Tie these ports to GND for now, disabling them.

input  clk, pc_ena, hde_in, vde_in, hs_in, vs_in ;

output osd_ena_out ;
reg    osd_ena_out ;
output [2:0] osd_image ;
output hde_out, vde_out, hs_out, vs_out;
reg    hde_out, vde_out, hs_out, vs_out;   // registered in the always block below


input wren_disp, wren_font;
input [15:0] wr_addr;
input [7:0] wr_data;

reg   [9:0] disp_x,dly1_disp_x,dly2_disp_x;
reg   [8:0] disp_y,dly1_disp_y,dly2_disp_y;

reg   dena,dly1_dena,dly2_dena,dly3_dena,dly4_dena;
reg   [7:0] dly1_letter, dly2_letter;

reg   [7:0] hde_pipe, vde_pipe, hs_pipe, vs_pipe;


wire [12:0] font_pos;
wire [8:0]  disp_pos;
wire [2:0]  osd_image;


parameter   PIPE_DELAY =  4;   // This parameter selects the number of pixel clocks which the output VDE and syncs are delayed.  Only use 2 through 9.

// ****************************************************************************************************************************
// LPM_RAM_DP Quartus dual port memory function
// ****************************************************************************************************************************
wire [7:0] sub_wire0;
wire [7:0] letter = sub_wire0[7:0];
// w access = 'b1100000a aaaaaaaa
altsyncram altsyncram_component_osd_mem ( .wren_a ( wren_disp ), .clock0 (clk), .clock1 (clk), .clocken1 (pc_ena),
.address_a (wr_addr[8:0]), .address_b (disp_pos[8:0]),
.data_a (wr_data[7:0]), .q_b (sub_wire0));


defparam
altsyncram_component_osd_mem.intended_device_family = "Cyclone",
altsyncram_component_osd_mem.operation_mode = "DUAL_PORT",
altsyncram_component_osd_mem.width_a = 8,
altsyncram_component_osd_mem.widthad_a = 9,
altsyncram_component_osd_mem.width_b = 8,
altsyncram_component_osd_mem.widthad_b = 9,
altsyncram_component_osd_mem.lpm_type = "altsyncram",
altsyncram_component_osd_mem.width_byteena_a = 1,
altsyncram_component_osd_mem.outdata_reg_b = "CLOCK1",
altsyncram_component_osd_mem.indata_aclr_a = "NONE",
altsyncram_component_osd_mem.wrcontrol_aclr_a = "NONE",
altsyncram_component_osd_mem.address_aclr_a = "NONE",
altsyncram_component_osd_mem.address_reg_b = "CLOCK1",
altsyncram_component_osd_mem.address_aclr_b = "NONE",
altsyncram_component_osd_mem.outdata_aclr_b = "NONE",
altsyncram_component_osd_mem.ram_block_type = "AUTO",
altsyncram_component_osd_mem.init_file = "osd_mem.mif";

// ****************************************************************************************************************************

wire [1:0] sub_wire1;
wire [1:0] osd_img = sub_wire1[1:0];
// w access = 'b111aaaaa aaaaaaaa
altsyncram altsyncram_component_osd_font ( .wren_a ( wren_font ), .clock0 (clk), .clock1 (clk), .clocken1 (pc_ena),
.address_a (wr_addr[12:0]), .address_b (font_pos[12:0]),
.data_a (wr_data[1:0]), .q_b (sub_wire1));
defparam
altsyncram_component_osd_font.intended_device_family = "Cyclone",
altsyncram_component_osd_font.operation_mode = "DUAL_PORT",
altsyncram_component_osd_font.width_a = 2,
altsyncram_component_osd_font.widthad_a = 13,
altsyncram_component_osd_font.width_b = 2,
altsyncram_component_osd_font.widthad_b = 13,
altsyncram_component_osd_font.lpm_type = "altsyncram",
altsyncram_component_osd_font.width_byteena_a = 1,
altsyncram_component_osd_font.outdata_reg_b = "CLOCK1",
altsyncram_component_osd_font.indata_aclr_a = "NONE",
altsyncram_component_osd_font.wrcontrol_aclr_a = "NONE",
altsyncram_component_osd_font.address_aclr_a = "NONE",
altsyncram_component_osd_font.address_reg_b = "CLOCK1",
altsyncram_component_osd_font.address_aclr_b = "NONE",
altsyncram_component_osd_font.outdata_aclr_b = "NONE",
altsyncram_component_osd_font.ram_block_type = "AUTO",
altsyncram_component_osd_font.init_file = "osd_font.mif";

// ****************************************************************************************************************************
// ****************************************************************************************************************************


//  The disp_x is the X coordinate counter.  It runs from 0 to 512 and stops there
//  The disp_y is the Y coordinate counter.  It runs from 0 to 256 and stops there

assign disp_pos[4:0]  = disp_x[8:4] ;  // The disp_pos[4:0] is the lower address for the 32 characters for the ascii text.
assign disp_pos[8:5]  = disp_y[7:4] ;  // the disp_pos[8:5] is the upper address for the 16 lines of text


//  The result from the ascii memory component 'altsyncram_component_osd_mem'  is called letter[7:0]
//  Since disp_pos[8:0] has entered the read address, it takes 2 pixel clock cycles for the resulting letter[7:0] to come out.

//  Now, font_pos[12:0] is the read address for the memory block containing the font memory

assign font_pos[12:6] = letter[6:0] ;       //  Select the upper font address with the 7 bit letter, note the atari font has only 128 characters.
assign font_pos[2:0]  = dly2_disp_x[3:1] ;  //  select the font X coordinate with a 2 pixel clock DELAYED disp_x address.  [3:1] is used so that every 2 x pixels are repeats
assign font_pos[5:3]  = dly2_disp_y[3:1] ;  //  select the font Y coordinate with a 2 pixel clock DELAYED disp_y address.  [3:1] is used so that every 2 y lines are repeats


//  The resulting font image, 2 bits since I made a 2 bit color atari font is assigned to the OSD[1:0] output
//  Also, since there is an 8th bit in the ascii text memory, I use that as a third OSD[2] output color bit.

assign osd_image[1:0] = osd_img[1:0];
assign osd_image[2]   = dly2_letter[7];  // Remember, it takes 2 pixel clocks for osd_img[1:0] data to be valid from read address letter[6:0]
                                         // so, if we want to use letter[7] as an upper color bit, it needs to be delayed 2 pixel clocks so it will be parallel with the osd_img[1:0] read data

always @ ( posedge clk ) begin

if (pc_ena) begin

// **************************************************************************************************************************
// *** Create a serial pipe where the PIPE_DELAY parameter selects the pixel count delay for the xxx_in to the xxx_out ports
// **************************************************************************************************************************

hde_pipe[0]   <= hde_in;
hde_pipe[7:1] <= hde_pipe[6:0];
hde_out       <= hde_pipe[PIPE_DELAY-2];

vde_pipe[0]   <= vde_in;
vde_pipe[7:1] <= vde_pipe[6:0];
vde_out       <= vde_pipe[PIPE_DELAY-2];

hs_pipe[0]    <= hs_in;
hs_pipe[7:1]  <= hs_pipe[6:0];
hs_out        <= hs_pipe[PIPE_DELAY-2];

vs_pipe[0]    <= vs_in;
vs_pipe[7:1]  <= vs_pipe[6:0];
vs_out        <= vs_pipe[PIPE_DELAY-2];

// **********************************************************************************************


// This OSD generator's window is only 512 pixels by 256 lines.
// Since the disp_X&Y counters are the screens X&Y coordinates,
// I'm using an extra most significant bit in the counters to determine if the
// OSD ena flag should be on or off


if (disp_x[9] || disp_y[8]) dena <= 0;  // When disp_x > 511 or disp_y > 255, then turn off the OSD's output enable flag
else dena <= 1;                         // otherwise, turn on the OSD output enable flag.


if ( ~vde_in ) disp_y[8:0] <= 9'b111111111;  // preset the disp_y counter to max while the vertical display is disabled

  else if (hde_in && ~hde_pipe[0]) begin      // isolate a single event at the beginning of the active display area

disp_x[9:0] <= 10'b0000000000; // clear the disp_x counter

if (!disp_y[8] | (disp_y[8:7] == 2'b11)) disp_y <= disp_y + 1;  //  only increment the disp_y counter if it hasn't reached its end

end else if (!disp_x[9]) disp_x <= disp_x + 1;  // keep on adding to the disp_x counter until it reaches its end.


// **********************************************************************************************
// *** These delay pipe registers are explained in the 'assign's above.
// **********************************************************************************************

dly1_disp_x <= disp_x;
dly2_disp_x <= dly1_disp_x;

dly1_disp_y <= disp_y;
dly2_disp_y <= dly1_disp_y;

dly1_letter <= letter;
dly2_letter <= dly1_letter;

dly1_dena   <= dena;
dly2_dena   <= dly1_dena;
dly3_dena   <= dly2_dena;
dly4_dena   <= dly3_dena;

// **********************************************************************************************

osd_ena_out  <= dly2_dena; // This is used to drive a graphics A/B switch which tells when the OSD graphics should be shown
                           // It needs to be delayed by the number of pixel clocks required for the above memories


end // ena
end // always@clk
endmodule


Before we make changes like addressing 80 columns, or addressing anywhere within a chunk of ram, I'll fetch out my palette and overlay generator which goes with this OSD generator.  This will allow you to simultaneously run your pattern generator and superimpose the OSD generator on top, with a semi-transparent layer in the font and a palette for the font colors.  For now, you can just wire the 'OSD[2:0]' output's bits to your R, G & B outputs.

Hint: in the block diagram editor, double-click on a blank area and type 'WIRE'.  Place these on your schematic, label the left side with the bus name and bit number of my OSD outputs, and on the right place a wire and label it with your RGB pin names.

Also, for the write memory inputs, in the block diagram editor, double-click on a blank area, type 'GND', insert 3 GND symbols and tie them to the write inputs on the OSD generator module.

The OSD text memory block .mif should contain a display font test text of all the characters.  You can edit the file in Quartus to make it say anything you like.


Note: After the palette, we will tackle a true memory address generator which will allow text windows of any X-size by any Y-size, pointing to any memory base address, with a hidden overscan or any text column width.  Plus variable X&Y font size and 16 foreground by 16 background colors, paletted and programmable for each letter on the display.  We will also be embedding the font memory into the same memory as the ascii text ram, which will also be your main 8-bit CPU ram as well.  There should be no problem implementing the 256 color graphics as well, but the resolution window will be puny and we might need to forgo its palette.

ALSO: I need you to increase your output pixel bits to 4 each for R, G and B.  That's 12 bit color, or 4096 colors.  Do not erase your pattern generator; we will still be bringing it back soon for additional tests.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 08, 2019, 07:36:10 am
I suggest byte-wide character bitmap "ROM" and external bit-selection with a look toward the future. If you decide to make the character bitmap block writable by the host, synthesis won't have to try to infer a dual-port RAM with dissimilar widths on each port, and won't have to guess/decide a possibly incorrect bit-endianness, because you will have specified it explicitly in logic. Whether you bit-select inside or outside the ROM_font module is a matter of style. Personally, I'd treat the x8 ROM as a x8 ROM all round and do the assign bit_out = rom_font_data[~xpos[2:0]]; outside of the ROM_font module, in case you might eventually reuse the bit selector for the frame buffer. That said, I've not done HDL projects too much larger than this, so FPGA old hands feel free to roast me if I'm doing it wrong.  :popcorn:

Yes, I'm going there.... I just want the OP to take a few steps in between, since we will soon have either an 8-bit or 16-bit wide dual port ram sized to the maximum the FPGA can handle (parameter configurable so larger FPGAs will also work), and the display text, font, text color, and a universal graphics mode will have a base pointer in that same ram which is loaded during the v-sync.  A string of bytes from that base pointer will be loaded into control registers containing all the graphics control metrics, updated every h-sync, making a fully programmable display pretty much on par with a Super Nintendo, except with support for 640x480 in text mode and a 4 color bitplane graphics mode.  These functions, really just 2 counters with a variable increment at the end of each line, will include text rows/video modes with a programmable width which may be wider than the entire display screen.

First we needed enough to get his project running the pixel clock at 4x (100MHz) with a 25MHz pixel out, while verifying that the characters and background graphics were scaling correctly without error and that his graphics rendering pipe was functional.

In the future, if the OP wishes to upgrade to external DRAM, that 1 memory block will need to be converted into a cache memory for the external ram and the constructed image will need pre-fetching a line of video in advance.  This one gets nasty.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 08, 2019, 10:21:35 am
Ok, if 1 x 74AC/HC CMOS buffer was on the line when using negative sync to drive the monitor cables, and 2 in parallel work with any sync polarity, can you see why I said 2 pages ago, 'USE 3 CMOS BUFFERS in parallel....'  Your monitor appears to have a 75 ohm termination resistor on its sync lines, and it might be using a schmitt trigger, meaning the drive would need to clear almost 4v by the time it reaches the monitor's internal circuitry.  If you are now using 2, go to 3; otherwise, on a warm day, when your 74AC/HC CMOS buffer warms up and its drive switch resistance drops a bit, or your 5v power supply sits at a still-valid 4.75v, you will wonder why your monitor's picture has begun to get interference.  Your other choice is to use a single line-driver buffer designed to drive a 25 ohm load, or a transistor buffer.  With one of those, the sync signal will reach your monitor with the full 0-5v swing.

I'm not deliberately ignoring advice, just trying to make do with the bits I have to hand - namely, a shortage of very short linking wires for the breadboard the sync driver is plugged into.  ::)  As usual for a beginner, I made do with two parallel outputs, found it worked and stopped there.  I've upped my game now and have three parallel outputs driving each sync signal.  :-+

I've also got a bit of a loose connection somewhere... the fuzzy column edges and blinking came back earlier, but I suspect a bad connection on the breadboard around the driver output as some finger poking around that area sorted it.  :-/O  Can't wait for the EP4C dev board to arrive - that has a built-in VGA port and will help me focus entirely on the HDL development.

One thing I've noticed - there's more colour bars... Is the x counter incrementing properly?

Show me your code.  You may have your X&Y counter outside the:
-----------------------------------------------------------------
         if (pc_ena)   // once per pixel
         begin
-----------------------------------------------------------------
Meaning it may be running at 50MHz instead of 25MHz.
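As a side note, a minimal sketch of the fix being described here - the X/Y counters advanced only inside the pixel-clock enable, so they tick at 25MHz even though clk runs at 50MHz.  Module, port and parameter names are illustrative, not taken from the project's actual code:

Code: [Select]
// Sketch: X/Y counters gated by the pixel-clock enable.
// H_TOTAL/V_TOTAL are the usual 640x480@60 totals; adjust as needed.
module pixel_counters #(
    parameter H_TOTAL = 800,
    parameter V_TOTAL = 525
)(
    input  wire       clk,     // 50MHz system clock
    input  wire       pc_ena,  // 25MHz pixel enable (high 1 clk in 2)
    output reg  [9:0] disp_x,
    output reg  [9:0] disp_y
);
always @(posedge clk) begin
    if (pc_ena) begin                    // advance once per PIXEL, not per clk
        if (disp_x == H_TOTAL-1) begin
            disp_x <= 10'd0;
            disp_y <= (disp_y == V_TOTAL-1) ? 10'd0 : disp_y + 1'b1;
        end else
            disp_x <= disp_x + 1'b1;
    end
end
endmodule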

Ah yes, the x and y counters were outside the if (pc_ena) conditional.  :palm:  I'll attach my code at the bottom of this post anyway, for the sake of having an archive of it if nothing else.

Speaking of which, I've got a string of folders (currently on number 4) for this project, at various stages.  I think I might just run with this current project now and get it onto github.  I had an accident earlier where I deleted a file I thought I didn't need anymore, and it turned out to be the current sync_generator file.  Had to go back to a previous post and use the cleaned-up code you'd posted to re-create the file again.  Repo is up on github (https://github.com/nockieboy/microcom_gpu) if you want the latest files.

You can use the generic HDL to be portable, or you can instantiate and configure vendor modules to get vendor features, or you can use the megawizard to configure the vendor modules to get vendor features and pointy-clicky satisfaction. Mostly, it's a style matter. Quartus will indeed infer your block RAM intentions if your code walks, talks, and quacks enough like a duck. If not, you can use a ramstyle pragma to specify the style of RAM you want to use.
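For reference, a hedged sketch of the duck-walking style being described - generic HDL that Quartus (and most other tools) will infer as a simple dual-port block RAM, with the optional Intel-specific ramstyle attribute.  All names here are illustrative:

Code: [Select]
// Generic inferred RAM: one synchronous write port, one synchronous
// read port.  The registered read is what lets the tool map it onto
// block RAM (M4K on Cyclone II) instead of logic cells.
module inferred_dp_ram #(
    parameter DATA_W = 8,
    parameter ADDR_W = 9
)(
    input  wire              clk,
    input  wire              we,
    input  wire [ADDR_W-1:0] waddr,
    input  wire [ADDR_W-1:0] raddr,
    input  wire [DATA_W-1:0] wdata,
    output reg  [DATA_W-1:0] rdata
);
(* ramstyle = "M4K" *)               // optional vendor hint
reg [DATA_W-1:0] mem [0:(1<<ADDR_W)-1];

always @(posedge clk) begin
    if (we) mem[waddr] <= wdata;
    rdata <= mem[raddr];             // synchronous read
end
endmodule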

No problemo, will stick to generic HDL as much as possible then - I was just getting confused between the output from a wizard and (what appears to me to be) very basic HDL code to create a memory block.

That ROM_font module has two big problems: one, it is asynchronous... Two, it is sized incorrectly. Specify memory sizes in words rather than bits, much as you would a two-dimensional array, but mind the off-by-one error. 256 tiles of 16 rows each would be sized as reg [n:0] mymemory [0:(256*16)-1] regardless of the register width n.
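To illustrate the sizing point, a minimal synchronous font ROM declared in words (the port names are made up for the example):

Code: [Select]
// 256 tiles of 16 rows each: the memory is sized in WORDS,
// [0:(256*16)-1] = 4096 entries, regardless of the 8-bit word width.
// The registered read also fixes the 'asynchronous' problem.
module font_rom_sync (
    input  wire       clk,
    input  wire [7:0] char_code,  // which of the 256 tiles
    input  wire [3:0] row,        // which of the 16 rows in the tile
    output reg  [7:0] font_row
);
reg [7:0] mem [0:(256*16)-1];
always @(posedge clk)
    font_row <= mem[{char_code, row}];  // synchronous read
endmodule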

Ah okay, I should go read those documents a little more closely.  I suspect this is a moot point anyway as BrianHG is clearly working to a plan and this will get covered soon.

Ok, I've cleaned up my OSD generator and commented everything, so it gives a much better starting point.
We will make changes to improve things and touch up the font into standard 8-bit bytes, but as an introduction, this shows the basics of how one of Intel's memory functions is called and configured.

Thanks again BrianHG - I'll spend some time looking at this today.

Before we make changes like addressing 80 columns, or addressing anywhere within a chunk of RAM, I'll fetch out my palette and overlay generator which goes with this OSD generator.  This will allow you to simultaneously run your pattern generator and superimpose the OSD generator on top, with a semi-transparent layer in the font and a palette for the font colors.  For now, you can just wire bit 0 of the 'OSD[2:0]' output to your R, G and B outputs.

Righto, no problem - so I should be able to get this working at 32x16 as it is...  :o

Note: After the palette, we will tackle a true memory address generator which will allow text windows of any X-size by any Y-size, pointing to any memory base address, with a hidden overscan or any text column width.  Plus, variable X&Y font size and 16 foreground by 16 background colors, paletted and programmable for each letter on the display.  We will also be embedding the font memory into the same memory as the ASCII text RAM, which will also be your main 8-bit CPU RAM as well.  There should be no problem implementing the 256-color graphics too, but the resolution window will be puny and we might need to forgo its palette.

Aye carumba! This is going to be MUCH better than the plain old 'white on black' text mode, maybe with a splash of colour, that I was thinking of...  ;D

Okay, off to go work on this...
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 08, 2019, 11:30:28 am
Now, for a little more pushing... Once you have integrated my latest OSD generator and got it working, the next step will be to modify the current 2-bit, 2-color-per-pixel font memory into a 1-bit color, 8-bit wide font memory.  I've attached a new Atari font which is formatted the way @jhpadjustable likes it.  This means you need to adjust the read addresses, size and location, and at the 8-bit data output you need to make an immediate 8:1 mux selector to draw the 1-bit font image.
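That 8:1 mux step can be sketched as something along these lines (a sketch only - the bit ordering depends on how the font was dumped, so the inversion may need flipping):

Code: [Select]
// 8:1 pixel selector on the 8-bit-wide font data output.
// char_line[7:0] is one row of the glyph, pixel_x is the (suitably
// delayed) x position within the character cell.
module font_pixel_mux (
    input  wire [7:0] char_line,
    input  wire [2:0] pixel_x,
    output wire       font_pixel   // 1-bit monochrome pixel
);
// Assuming bit 7 is the leftmost pixel; drop the inversion if the
// font is stored LSB-first.
assign font_pixel = char_line[3'd7 - pixel_x];
endmodule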

The new font files I've attached are in 2 formats.
#1, "osd_font_8x8bit_128char.mif", a Quartus .mif format file which is used internally for its built-in LPM_RAM_DP / AltSyncram function.
#2, "osd_font_8x8bit_128char.rom", a raw 8-bit binary dump of the Atari font.  Quartus can also use these; however, 10 years ago the feature had bugs, so I convert the binaries to .mif and use them that way in Quartus.

You will need to change the AltSyncram function I used in the OSD generator.  Read up on Quartus' library documentation for that memory's controls, as you need to change everything to 8-bit data.  This step will allow us to combine both the character and font memory into 1 large continuous RAM block for your Z80 to access.

Show me first the current character generator working, then you will need to modify the font memory into 8bit wide, monochrome color mode and verify that is also working properly.


---------------------------------------------------------------------------------
Your next step after this will be both the merge and making the new RAM block have 5 dedicated read ports running at the 25MHz pixel clock speed.  This means the RAM is running at 125MHz, with 5 muxed addresses in and 5 muxed/latched data outputs, while the second port on the RAM will now be a read/write port dedicated to the Z80 and, way in the future, perhaps some graphics acceleration functions.
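Very roughly, that 5-read-port trick can be sketched like this (illustrative only - a real version must also account for the RAM's read latency when latching the results, and all names are assumptions):

Code: [Select]
// Time-multiplex one physical read port into 5 virtual read ports by
// clocking the RAM at 5x the pixel clock (125MHz vs 25MHz): a phase
// counter rotates through the 5 addresses and each returning word is
// latched into its own output register.
module five_read_ports #(
    parameter ADDR_W = 13
)(
    input  wire              clk125,
    input  wire [ADDR_W-1:0] addr0, addr1, addr2, addr3, addr4,
    input  wire [7:0]        ram_q,      // read data from the RAM
    output reg  [ADDR_W-1:0] ram_addr,   // muxed address to the RAM
    output reg  [7:0]        q0, q1, q2, q3, q4
);
reg [2:0] phase = 3'd0;
always @(posedge clk125) begin
    phase <= (phase == 3'd4) ? 3'd0 : phase + 1'b1;
    case (phase)                         // rotate the address mux
        3'd0:    ram_addr <= addr0;
        3'd1:    ram_addr <= addr1;
        3'd2:    ram_addr <= addr2;
        3'd3:    ram_addr <= addr3;
        default: ram_addr <= addr4;
    endcase
    case (phase)                         // latch per-port read data
        3'd0:    q0 <= ram_q;
        3'd1:    q1 <= ram_q;
        3'd2:    q2 <= ram_q;
        3'd3:    q3 <= ram_q;
        default: q4 <= ram_q;
    endcase
end
endmodule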

------------------------------------------------------------------------------
VERY important: each step of the way, keep track of your FMAX!!!  If a piece of code you generate along the way kills it below approximately 150MHz, find out why and ask questions.  Because if you go too far with a 75MHz FMAX, it may be tedious to fix the problem later on.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 08, 2019, 11:43:34 am

Aye carumba! This is going to be MUCH better than the plain old 'white on black' text mode, maybe with a splash of colour, that I was thinking of...  ;D

Okay, off to go work on this...
Nope, even with your current FPGA, you will achieve a font display with 16 foreground + 16 background colors, with 4096 paletted colors and 16 translucency levels, as your text mode will be superimposed on top of the bitplane and 256-color graphics modes, which also have a programmable choice of 4096 colors for each of those 256 paletted colors.  Yes, the letters' color palette will set different levels of translucency for both the font's foreground and background.  This means 16-stage fading just by manipulating the font's palette.  All this with 80 columns x 60 rows, all within 16kb.  Wait a sec, does your FPGA have 16kb?  If not, oh well, we'll use 40x30 mode for now and you will only need 4kb.

Each line of text will have X&Y pixel sizes of 1,2,3,4 pixel/line.  Basically a separate X&Y scale size.  Same goes for the bitplane and 256 color graphics modes.  And yes, the graphics have their own separate palette too....
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 08, 2019, 12:09:39 pm
@BrianHG

Just had to make a couple of corrections to osd_generator to get it to successfully compile a symbol file:

1) hde_in, vde_in added to ports in first line
2) Declared hde_out, vde_out, hs_out and vs_out as regs, as they're assigned in an always block from line 130 onwards
3) Line 160 - if ( ~vde ) - changed to if ( ~vde_in )

For now, you can just wire bit 0 of the 'OSD[2:0]' output to your R, G and B outputs.

Hint: in the block diagram editor, double-click on a blank area and type 'WIRE'.  Place these on your schematic; label the left end with the bus name and number of my OSD outputs, and on the right place a wire and label it with your RGB pin names.

I think I've done what you've asked correctly...  ???

The OSD text memory block .mif should contain a display font test text of all the characters.  You can edit the file in Quartus to make it say anything you like.

But osd_mem.mif should show something, right?  Should show all the ASCII chars on the screen, in theory?  I'm just getting a blank screen - no text at all, all black (at least it's not going into standby)?

ALSO: I need you to increase your output pixel bits to 4 for each of R, G, B - that's 12-bit color, or 4096 colors.  Do not erase your pattern generator; we will be bringing it back soon for additional tests.

Ah, will this cause the issue above then?  I haven't done it yet - my output is restricted to 1-bit colour at the moment as I'm still using the 220R resistors in series with the R, G and B outputs.

Here's my current setup:

[attach=1]
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 08, 2019, 12:21:42 pm
You forgot to add a bus wire on the:
osd_image[2..0] output, and on that bus, write the label 'osd_image[2..0]'

Quartus will not associate the labels on your wires unless you tie them to a block diagram's IO somewhere with the same name.  It does this since you may name these bus wires anything, not necessarily the internal Verilog names.

As for the rgb_in[1], again, you need to add a bus wire on the stencil's input and label it 'rgb_in[1..0]'.
Add 3 more wire symbols after the rgb_in[1] wires and add the rgb_in[0] wires, shorting the 2 together to generate a full-brightness image.  Or, you can wire the 'osd_ena_out' into all 3 'rgb_in[0]' inputs instead.  This will generate a different luminance where the OSD character generator is active.

ALSO, use a different osd_image[ # ] for each red,green,blue.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 08, 2019, 12:50:25 pm
Oh wow - yes, got an output now:

[attach=1]

Not sure it's aligned correctly - perhaps needs to move right and down by 3 pixels in each axis?

I'm guessing the vertical bars are because the 32 vertical lines have been exceeded?

You forgot to add a bus wire on the:
osd_image[2..0] output, and on that bus, write the label 'osd_image[2..0]'

D'oh.  I thought Quartus would link the wires on the Wire symbol to the correct block IO, but I guess several blocks could all have the same IO names...

Add 3 more wire symbols after the rgb_in[1] wires and add the rgb_in[0] wires, shorting the 2 together to generate a full-brightness image.  Or, you can wire the 'osd_ena_out' into all 3 'rgb_in[0]' inputs instead.  This will generate a different luminance where the OSD character generator is active.

The outputs for r[0], g[0] and b[0] aren't connected to anything yet.  I'll need to set up a resistor ladder to make use of those outputs.  I'll put that on my to-do list.  ;)

ALSO, use a different osd_image[ # ] for each red,green,blue.

Spot on.  The first time I ran it on the FPGA I just got a white screen.   :-+
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 08, 2019, 01:01:31 pm
To fix the OSD outline, change the first 3 'WIRE' symbols you have to 2-input AND gates.  Wire 1 input of each AND gate to exactly what you have now, and wire the second input of each AND gate to the OSD generator's 'osd_ena_out' signal.

This will mute the OSD output's image when outside the active 32x16 box.
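(The HDL equivalent, if done inside the design instead of on the schematic, would just be three AND terms - signal names assumed from the OSD generator:)

Code: [Select]
// Mute the OSD image outside the active 32x16 window.
module osd_mute (
    input  wire [2:0] osd_image,
    input  wire       osd_ena_out,
    output wire       r, g, b
);
assign r = osd_image[0] & osd_ena_out;
assign g = osd_image[1] & osd_ena_out;
assign b = osd_image[2] & osd_ena_out;
endmodule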

Also, I think your monitor might need the H and V size adjusted.  I thought my OSD generator had perfect alignment.
(This wouldn't be a problem with DVI...)

This originally was a superimposed window as a debug screen for my video enhancer project.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 08, 2019, 01:34:18 pm
To fix the OSD outline, change the first 3 'WIRE' symbols you have to 2-input AND gates.  Wire 1 input of each AND gate to exactly what you have now, and wire the second input of each AND gate to the OSD generator's 'osd_ena_out' signal.

This will mute the OSD output's image when outside the active 32x16 box.

Looking better already.  ;D

[attach=1]

Also, I think your monitor might need the H and V size adjusted.  I thought my OSD generator had perfect alignment.
(This wouldn't be a problem with DVI...)

 :palm:

Of course.. Auto-adjust fixed it straight away. There I was going straight for the complex problems, thinking it was a timing problem...  :-DD

Right, I've put together a 2-bit resistor ladder, so I now have 2-bit RGB output. 
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 08, 2019, 01:52:46 pm
LOL.... Live and learn...

Anyways, the next step.  Now you will need to understand my OSD code.  You will recognize that, except for the way I made the 2-bit font, it pretty much operates as you figured all along, apart from those piped delays: once the first RAM read's contents are ready, the next read, the font image lookup, has its x&y pointers delayed so they come in parallel with the contents of the text memory.

Now, I want you to change my font RAM's weird 2-bit color into an 8-bit wide font, now only B&W.  Use the new 8-bit wide font I posted 7 messages up.  Try to get it to work properly.  It should fit in easily, though you will need an 8:1 mux to select the output at the right position.  Your output image should now be 2-bit color: 1 bit B&W for the font, and 1 bit for the MSB, bit 7, in the character memory.  Let's see if you can keep everything perfectly pixel aligned.

Also take note of your current FMAX...

Also, copy the OSD module into a new name and work with that one in the same project so you can swap modules to verify alignment.

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 08, 2019, 05:17:24 pm
Going to take this slowly, as that's how my mind works..  :o ;D

Now, I want you to change my font RAM's weird 2-bit color into an 8-bit wide font, now only B&W.  Use the new 8-bit wide font I posted 7 messages up.

So here's my first attempt:

Code: [Select]
module vid_osd_generator ( clk, pc_ena, hde_in, vde_in, hs_in, vs_in, osd_ena_out, osd_image, hde_out, vde_out, hs_out, vs_out,
wren_disp, wren_font, wr_addr, wr_data );

// To write contents into the display and font memories, the wr_addr[15:0] selects the address
// the wr_data[7:0] contains a byte which will be written
// the wren_disp is the write enable for the ascii text ram.  Only the wr_addr[8:0] are used as the character display is 32x16.
// the wren_font is the write enable for the font memory.  Only 2 bits are used of the wr_data[1:0] and wr_addr[12:0] are used.
// tie these ports to GND for now disabling them

input  clk, pc_ena, hde_in, vde_in, hs_in, vs_in;

output osd_ena_out;
reg    osd_ena_out;
output [2:0] osd_image;
output hde_out, vde_out, hs_out, vs_out;
reg hde_out, vde_out, hs_out, vs_out;

input wren_disp, wren_font;
input [15:0] wr_addr;
input [7:0] wr_data;

reg   [9:0] disp_x,dly1_disp_x,dly2_disp_x;
reg   [8:0] disp_y,dly1_disp_y,dly2_disp_y;

reg   dena,dly1_dena,dly2_dena,dly3_dena,dly4_dena;
reg   [7:0] dly1_letter, dly2_letter;

reg   [7:0] hde_pipe, vde_pipe, hs_pipe, vs_pipe;

wire [9:0] font_pos;
wire [8:0]  disp_pos;
wire [2:0]  osd_image;

parameter   PIPE_DELAY =  4;   // This parameter selects the number of pixel clocks to delay the VDE and sync outputs.  Only use 2 through 9.

// ****************************************************************************************************************************
// SCREEN TEXT MEMORY
// ****************************************************************************************************************************
wire [7:0] sub_wire0;
wire [7:0] letter = sub_wire0[7:0];
// w access = 'b1100000a aaaaaaaa
altsyncram altsyncram_component_osd_mem ( .wren_a ( wren_disp ), .clock0 (clk), .clock1 (clk), .clocken1 (pc_ena),
.address_a (wr_addr[8:0]), .address_b (disp_pos[8:0]),
.data_a (wr_data[7:0]), .q_b (sub_wire0));

defparam
altsyncram_component_osd_mem.intended_device_family = "Cyclone II",
altsyncram_component_osd_mem.operation_mode = "DUAL_PORT",
altsyncram_component_osd_mem.width_a = 8,
altsyncram_component_osd_mem.widthad_a = 9,
altsyncram_component_osd_mem.width_b = 8,
altsyncram_component_osd_mem.widthad_b = 9,
altsyncram_component_osd_mem.lpm_type = "altsyncram",
altsyncram_component_osd_mem.width_byteena_a = 1,
altsyncram_component_osd_mem.outdata_reg_b = "CLOCK1",
altsyncram_component_osd_mem.indata_aclr_a = "NONE",
altsyncram_component_osd_mem.wrcontrol_aclr_a = "NONE",
altsyncram_component_osd_mem.address_aclr_a = "NONE",
altsyncram_component_osd_mem.address_reg_b = "CLOCK1",
altsyncram_component_osd_mem.address_aclr_b = "NONE",
altsyncram_component_osd_mem.outdata_aclr_b = "NONE",
altsyncram_component_osd_mem.ram_block_type = "AUTO",
altsyncram_component_osd_mem.init_file = "osd_mem.mif";

// ****************************************************************************************************************************
// FONT MEMORY
// ****************************************************************************************************************************
wire [7:0] sub_wire1;
wire [7:0] char_line = sub_wire1;
// w access = 'b111aaaaa aaaaaaaa
altsyncram altsyncram_component_osd_font ( .wren_a ( wren_font ), .clock0 (clk), .clock1 (clk), .clocken1 (pc_ena),
.address_a (wr_addr[12:0]), .address_b (font_pos[9:0]),
.data_a (wr_data[7:0]), .q_b (sub_wire1));
defparam
altsyncram_component_osd_font.intended_device_family = "Cyclone II",
altsyncram_component_osd_font.operation_mode = "DUAL_PORT",
altsyncram_component_osd_font.width_a = 8,
altsyncram_component_osd_font.widthad_a = 13,
altsyncram_component_osd_font.width_b = 8,
altsyncram_component_osd_font.widthad_b = 10,
altsyncram_component_osd_font.lpm_type = "altsyncram",
altsyncram_component_osd_font.width_byteena_a = 1,
altsyncram_component_osd_font.outdata_reg_b = "CLOCK1",
altsyncram_component_osd_font.indata_aclr_a = "NONE",
altsyncram_component_osd_font.wrcontrol_aclr_a = "NONE",
altsyncram_component_osd_font.address_aclr_a = "NONE",
altsyncram_component_osd_font.address_reg_b = "CLOCK1",
altsyncram_component_osd_font.address_aclr_b = "NONE",
altsyncram_component_osd_font.outdata_aclr_b = "NONE",
altsyncram_component_osd_font.ram_block_type = "AUTO",
altsyncram_component_osd_font.init_file = "osd_font_8x8bit_128char.mif";

// ****************************************************************************************************************************
// ****************************************************************************************************************************


//  The disp_x is the X coordinate counter.  It runs from 0 to 512 and stops there
//  The disp_y is the Y coordinate counter.  It runs from 0 to 256 and stops there

// Get the character at the current x, y position
assign disp_pos[4:0]  = disp_x[8:4] ;  // The disp_pos[4:0] is the lower address for the 32 characters for the ascii text.
assign disp_pos[8:5]  = disp_y[7:4] ;  // the disp_pos[8:5] is the upper address for the 16 lines of text

//  The result from the ascii memory component 'altsyncram_component_osd_mem'  is called letter[7:0]
//  Since disp_pos[8:0] has entered the read address, it takes 2 pixel clock cycles for the resulting letter[7:0] to come out.

//  Now, font_pos[12:0] is the read address for the memory block containing the character specified in letter[]

assign font_pos[9:3] = letter[6:0] ;       // Select the upper font address with the 7 bit letter, note the atari font has only 128 characters.
assign font_pos[2:0] = dly2_disp_y[3:1] ;  // select the font y coordinate with a 2 pixel clock DELAYED disp_y address.  [3:1] is used so that every 2 y lines are repeats

// get the pixel from the x position within the character


//  The resulting 1-bit font image at x is assigned to the OSD[0] output
//  Also, since there is an 8th bit in the ascii text memory, I use that as a second OSD[1] output color bit
assign osd_image[0] = char_line[dly2_disp_x[0]];
assign osd_image[1] = dly2_letter[7];  // Remember, it takes 2 pixel clocks for osd_img[1:0] data to be valid from read address letter[6:0]
                                         // so, if we want to use letter[7] as an upper color bit, it needs to be delayed 2 pixel clocks so it will be parallel with the osd_img[1:0] read data

always @ ( posedge clk ) begin

if (pc_ena) begin

// **************************************************************************************************************************
// *** Create a serial pipe where the PIPE_DELAY parameter selects the pixel count delay for the xxx_in to the xxx_out ports
// **************************************************************************************************************************

hde_pipe[0]   <= hde_in;
hde_pipe[7:1] <= hde_pipe[6:0];
hde_out       <= hde_pipe[PIPE_DELAY-2];

vde_pipe[0]   <= vde_in;
vde_pipe[7:1] <= vde_pipe[6:0];
vde_out       <= vde_pipe[PIPE_DELAY-2];

hs_pipe[0]    <= hs_in;
hs_pipe[7:1]  <= hs_pipe[6:0];
hs_out        <= hs_pipe[PIPE_DELAY-2];

vs_pipe[0]    <= vs_in;
vs_pipe[7:1]  <= vs_pipe[6:0];
vs_out        <= vs_pipe[PIPE_DELAY-2];

// **********************************************************************************************
// This OSD generator's window is only 512 pixels by 256 lines.
// Since the disp_X&Y counters are the screens X&Y coordinates, I'm using an extra most
// significant bit in the counters to determine if the OSD ena flag should be on or off.

if (disp_x[9] || disp_y[8])
dena <= 0; // When disp_x > 511 or disp_y > 255, then turn off the OSD's output enable flag
else
dena <= 1; // otherwise, turn on the OSD output enable flag

if (~vde_in)
disp_y[8:0] <= 9'b111111111; // preset the disp_y counter to max while the vertical display is disabled

else if (hde_in && ~hde_pipe[0])
begin // isolate a single event at the beginning of the active display area

disp_x[9:0] <= 10'b0000000000; // clear the disp_x counter
if (!disp_y[8] | (disp_y[8:7] == 2'b11))
disp_y <= disp_y + 1; // only increment the disp_y counter if it hasn't reached its end

end
else if (!disp_x[9])
disp_x <= disp_x + 1;  // keep on adding to the disp_x counter until it reaches its end.

// **********************************************************************************************
// *** These delay pipes registers are explained in the 'assign's above
// **********************************************************************************************
dly1_disp_x <= disp_x;
dly2_disp_x <= dly1_disp_x;

dly1_disp_y <= disp_y;
dly2_disp_y <= dly1_disp_y;

dly1_letter <= letter;
dly2_letter <= dly1_letter;

dly1_dena   <= dena;
dly2_dena   <= dly1_dena;
dly3_dena   <= dly2_dena;
dly4_dena   <= dly3_dena;

// **********************************************************************************************
osd_ena_out  <= dly2_dena; // This is used to drive a graphics A/B switch which tells when the OSD graphics should be shown
// It needs to be delayed by the number of pixel clocks required for the above memories

end // ena

end // always@clk

endmodule


But it won't compile - I'm stuck on this error:

Code: [Select]
Error (12152): Can't elaborate user hierarchy "vid_osd_generator:inst2|altsyncram:altsyncram_component_osd_font"

EDIT:

Oh hang on.  I've found a few more errors that may clear this up... watch this space.

EDIT 2:

I've changed altsyncram_component_osd_font as follows:

Code: [Select]
altsyncram altsyncram_component_osd_font ( .wren_a ( wren_font ), .clock0 (clk), .clock1 (clk), .clocken1 (pc_ena),
.address_a (wr_addr[9:0]), .address_b (font_pos[9:0]),
.data_a (wr_data[7:0]), .q_b (sub_wire1));


Still getting errors:

Code: [Select]
Error (272006): In altsyncram megafunction, when OPERATION_MODE parameter is set to DUAL_PORT, total number of bits of port A and port B must be the same
Error (287078): Assertion error: The current megafunction is configured for use with the clear box feature and cannot be used when the clear box feature is disabled
Error (12152): Can't elaborate user hierarchy "vid_osd_generator:inst2|altsyncram:altsyncram_component_osd_font"
Error: Quartus II 64-Bit Analysis & Synthesis was unsuccessful. 3 errors, 15 warnings
Error: Peak virtual memory: 4601 megabytes
Error: Processing ended: Fri Nov 08 17:20:59 2019
Error: Elapsed time: 00:00:02
Error: Total CPU time (on all processors): 00:00:02
Error (293001): Quartus II Full Compilation was unsuccessful. 5 errors, 15 warnings


EDIT 3:

Fixed the first error:
Code: [Select]
wire [7:0] char_line = sub_wire1[7:0];
// w access = 'b111aaaaa aaaaaaaa
altsyncram altsyncram_component_osd_font ( .wren_a ( wren_font ), .clock0 (clk), .clock1 (clk), .clocken1 (pc_ena),
.address_a (wr_addr[9:0]), .address_b (font_pos[9:0]),
.data_a (wr_data[7:0]), .q_b (sub_wire1));


All fixed now.  Time to see if it works... ;)
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 08, 2019, 05:41:19 pm
Yeah, that's not working as intended....

EDIT: Just had a hell of a time trying to attach that image - put it at the bottom in the end instead of inline, as the 'edit message' form was just not working!

Here's the current code:

Code: [Select]
module vid_osd_generator ( clk, pc_ena, hde_in, vde_in, hs_in, vs_in, osd_ena_out, osd_image, hde_out, vde_out, hs_out, vs_out,
wren_disp, wren_font, wr_addr, wr_data );

// To write contents into the display and font memories, the wr_addr[15:0] selects the address
// the wr_data[7:0] contains a byte which will be written
// the wren_disp is the write enable for the ascii text ram.  Only the wr_addr[8:0] are used as the character display is 32x16.
// the wren_font is the write enable for the font memory.  Only 2 bits are used of the wr_data[1:0] and wr_addr[12:0] are used.
// tie these ports to GND for now disabling them

input  clk, pc_ena, hde_in, vde_in, hs_in, vs_in;

output osd_ena_out;
reg    osd_ena_out;
output [2:0] osd_image;
output hde_out, vde_out, hs_out, vs_out;
reg hde_out, vde_out, hs_out, vs_out;

input wren_disp, wren_font;
input [15:0] wr_addr;
input [7:0] wr_data;

reg   [9:0] disp_x,dly1_disp_x,dly2_disp_x;
reg   [8:0] disp_y,dly1_disp_y,dly2_disp_y;

reg   dena,dly1_dena,dly2_dena,dly3_dena,dly4_dena;
reg   [7:0] dly1_letter, dly2_letter;

reg   [7:0] hde_pipe, vde_pipe, hs_pipe, vs_pipe;

wire [9:0] font_pos;
wire [8:0]  disp_pos;
wire [2:0]  osd_image;

parameter   PIPE_DELAY =  4;   // This parameter selects the number of pixel clocks to delay the VDE and sync outputs.  Only use 2 through 9.

// ****************************************************************************************************************************
// SCREEN TEXT MEMORY
// ****************************************************************************************************************************
wire [7:0] sub_wire0;
wire [7:0] letter = sub_wire0[7:0];
// w access = 'b1100000a aaaaaaaa
altsyncram altsyncram_component_osd_mem ( .wren_a ( wren_disp ), .clock0 (clk), .clock1 (clk), .clocken1 (pc_ena),
.address_a (wr_addr[8:0]), .address_b (disp_pos[8:0]),
.data_a (wr_data[7:0]), .q_b (sub_wire0));

defparam
altsyncram_component_osd_mem.intended_device_family = "Cyclone II",
altsyncram_component_osd_mem.operation_mode = "DUAL_PORT",
altsyncram_component_osd_mem.width_a = 8,
altsyncram_component_osd_mem.widthad_a = 9,
altsyncram_component_osd_mem.width_b = 8,
altsyncram_component_osd_mem.widthad_b = 9,
altsyncram_component_osd_mem.lpm_type = "altsyncram",
altsyncram_component_osd_mem.width_byteena_a = 1,
altsyncram_component_osd_mem.outdata_reg_b = "CLOCK1",
altsyncram_component_osd_mem.indata_aclr_a = "NONE",
altsyncram_component_osd_mem.wrcontrol_aclr_a = "NONE",
altsyncram_component_osd_mem.address_aclr_a = "NONE",
altsyncram_component_osd_mem.address_reg_b = "CLOCK1",
altsyncram_component_osd_mem.address_aclr_b = "NONE",
altsyncram_component_osd_mem.outdata_aclr_b = "NONE",
altsyncram_component_osd_mem.ram_block_type = "AUTO",
altsyncram_component_osd_mem.init_file = "osd_mem.mif";

// ****************************************************************************************************************************
// FONT MEMORY
// ****************************************************************************************************************************
wire [7:0] sub_wire1;
wire [7:0] char_line = sub_wire1[7:0];
// w access = 'b111aaaaa aaaaaaaa
altsyncram altsyncram_component_osd_font ( .wren_a ( wren_font ), .clock0 (clk), .clock1 (clk), .clocken1 (pc_ena),
.address_a (wr_addr[9:0]), .address_b (font_pos[9:0]),
.data_a (wr_data[7:0]), .q_b (sub_wire1));
defparam
altsyncram_component_osd_font.intended_device_family = "Cyclone II",
altsyncram_component_osd_font.operation_mode = "DUAL_PORT",
altsyncram_component_osd_font.width_a = 8,
altsyncram_component_osd_font.widthad_a = 10,
altsyncram_component_osd_font.width_b = 8,
altsyncram_component_osd_font.widthad_b = 10,
altsyncram_component_osd_font.lpm_type = "altsyncram",
altsyncram_component_osd_font.width_byteena_a = 1,
altsyncram_component_osd_font.outdata_reg_b = "CLOCK1",
altsyncram_component_osd_font.indata_aclr_a = "NONE",
altsyncram_component_osd_font.wrcontrol_aclr_a = "NONE",
altsyncram_component_osd_font.address_aclr_a = "NONE",
altsyncram_component_osd_font.address_reg_b = "CLOCK1",
altsyncram_component_osd_font.address_aclr_b = "NONE",
altsyncram_component_osd_font.outdata_aclr_b = "NONE",
altsyncram_component_osd_font.ram_block_type = "AUTO",
altsyncram_component_osd_font.init_file = "osd_font_8x8bit_128char.mif";

// ****************************************************************************************************************************
// ****************************************************************************************************************************


//  The disp_x is the X coordinate counter.  It runs from 0 to 512 and stops there
//  The disp_y is the Y coordinate counter.  It runs from 0 to 256 and stops there

// Get the character at the current x, y position
assign disp_pos[4:0]  = disp_x[8:4] ;  // The disp_pos[4:0] is the lower address for the 32 characters for the ascii text.
assign disp_pos[8:5]  = disp_y[7:4] ;  // the disp_pos[8:5] is the upper address for the 16 lines of text

//  The result from the ascii memory component 'altsyncram_component_osd_mem'  is called letter[7:0]
//  Since disp_pos[8:0] has entered the read address, it takes 2 pixel clock cycles for the resulting letter[7:0] to come out.

//  Now, font_pos[12:0] is the read address for the memory block containing the character specified in letter[]

assign font_pos[9:3] = letter[6:0] ;       // Select the upper font address with the 7 bit letter, note the atari font has only 128 characters.
assign font_pos[2:0] = dly2_disp_y[3:1] ;  // select the font y coordinate with a 2 pixel clock DELAYED disp_y address.  [3:1] is used so that every 2 y lines are repeats

// get the pixel from the x position within the character


//  The resulting 1-bit font image at x is assigned to the OSD[0] output
//  Also, since there is an 8th bit in the ascii text memory, I use that as a second OSD[1] output color bit
assign osd_image[0] = char_line[dly2_disp_x[0]];
assign osd_image[1] = dly2_letter[7];  // Remember, it takes 2 pixel clocks for osd_img[1:0] data to be valid from read address letter[6:0],
                                       // so, if we want to use letter[7] as an upper color bit, it needs to be delayed 2 pixel clocks so it will be parallel with the osd_img[1:0] read data

always @ ( posedge clk ) begin

if (pc_ena) begin

// **************************************************************************************************************************
// *** Create a serial pipe where the PIPE_DELAY parameter selects the pixel count delay for the xxx_in to the xxx_out ports
// **************************************************************************************************************************

hde_pipe[0]   <= hde_in;
hde_pipe[7:1] <= hde_pipe[6:0];
hde_out       <= hde_pipe[PIPE_DELAY-2];

vde_pipe[0]   <= vde_in;
vde_pipe[7:1] <= vde_pipe[6:0];
vde_out       <= vde_pipe[PIPE_DELAY-2];

hs_pipe[0]    <= hs_in;
hs_pipe[7:1]  <= hs_pipe[6:0];
hs_out        <= hs_pipe[PIPE_DELAY-2];

vs_pipe[0]    <= vs_in;
vs_pipe[7:1]  <= vs_pipe[6:0];
vs_out        <= vs_pipe[PIPE_DELAY-2];

// **********************************************************************************************
// This OSD generator's window is only 512 pixels by 256 lines.
// Since the disp_X&Y counters are the screen's X&Y coordinates, I'm using an extra most
// significant bit in the counters to determine if the OSD ena flag should be on or off.

if (disp_x[9] || disp_y[8])
dena <= 0; // When disp_x > 511 or disp_y > 255, then turn off the OSD's output enable flag
else
dena <= 1; // otherwise, turn on the OSD output enable flag

if (~vde_in)
disp_y[8:0] <= 9'b111111111; // preset the disp_y counter to max while the vertical display is disabled

else if (hde_in && ~hde_pipe[0])
begin // isolate a single event at the beginning of the active display area

disp_x[9:0] <= 10'b0000000000; // clear the disp_x counter
if (!disp_y[8] | (disp_y[8:7] == 2'b11))
disp_y <= disp_y + 1; // only increment the disp_y counter if it hasn't reached its end

end
else if (!disp_x[9])
disp_x <= disp_x + 1;  // keep adding to the disp_x counter until it reaches its end.

// **********************************************************************************************
// *** These delay-pipe registers are explained in the 'assign' statements above
// **********************************************************************************************
dly1_disp_x <= disp_x;
dly2_disp_x <= dly1_disp_x;

dly1_disp_y <= disp_y;
dly2_disp_y <= dly1_disp_y;

dly1_letter <= letter;
dly2_letter <= dly1_letter;

dly1_dena   <= dena;
dly2_dena   <= dly1_dena;
dly3_dena   <= dly2_dena;
dly4_dena   <= dly3_dena;

// **********************************************************************************************
osd_ena_out  <= dly2_dena; // This is used to drive a graphics A/B switch which tells when the OSD graphics should be shown
// It needs to be delayed by the number of pixel clocks required for the above memories

end // ena

end // always@clk

endmodule

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: Berni on November 08, 2019, 06:52:41 pm
Nice work there
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 08, 2019, 07:19:58 pm
Okay, I realised I was only passing a single bit to char_line to get the x position within the character, so now it's passing the 8-bit x position value.  But the characters were reversed in the x-axis on the screen, so I made the following change:

Code: [Select]
assign osd_image[0] = char_line[(8 - dly2_disp_x[2:0])];
... and now I'm getting this:

[attach=1]

It's printing every character twice... with some errors at the right hand edge (the last column of the last character on the screen is duplicated once).

EDIT:

Fixed the right-hand edge duplication of the last character column by adding in another delay for x:

Code: [Select]
dly3_disp_x <= dly2_disp_x;
Just got to sort the duplication of each character now...
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 08, 2019, 07:45:10 pm
All sorted.  Just needed me to use bits 3:1 in x rather than 2:0 to prevent the duplication of characters - remembered you'd written something about that in the notes somewhere.  Removed the additional delay as it's no longer needed.  :-+

[attach=1]
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 12:33:13 am
Good show  :-+ .  You see, all the comments I've added to the Verilog code are a must-read.

Now:
1) please post the final OSD code so I can proof the working version.
2) Will you be working on the code this weekend? What time?
3) What is your current FMAX?  Which FPGA part# are you using?

Next steps:
1.  Set up the PLL to give you a 125 MHz clock.  This means we will change the PC_ENA coming out of your 'sync_generator' into a 4-bit number, with a parameter in the sync generator which selects the maximum count of 4, then resets to 0.  All the attached graphics modules which had a 1-wire input PC_ENA will now need PC_ENA[3:0], and all the if (PC_ENA) will change to if (PC_ENA[3:0] == 0).

What we have done here is make your system clock 5x speed, yet divide your pixel clock down to 1/5.  We chose 5 as a magic number and included feeding the full 4-bit PC_ENA phase throughout your design for 2 reasons:
     a) when you eventually incorporate DVI inside the FPGA, the serial shift-out clock is 10x the pixel clock, and having the FPGA feed a DDR IO with 125MHz gives you your 250 megabit-per-second serializer clock with any FPGA, no fancy dedicated hardware serializers.
     b) We will be dedicating memory access time slots during those 5 points, so it is useful to have those 4 bits to allot synchronous events.  Also, the $16 2-megabit Lattice RAM part is fast enough to run the entire core at 10x, 250MHz, so having the 4-bit PC_ENA, you may just change that clock-divide parameter from 4 to 9, giving you 10 access slots per pixel output.

We should plan what time we are doing this if it is on the weekend...
Next step #2:  Copy and rename the OSD generator to graphics generator.  You will remove both dual port 'altsynccrams' and replace it with a new verilog module 'multiport_gpu_ram.v'.  In your new 'multiport_gpu_ram.v' you will place a single 'altsynccrams' which is 4Kilobytes, and you will make IO registers for:
5 read addresses [20 bit] (your graphics system will handle a maximum 1 megabyte frame buffer), 5 auxiliary read commands [8 bit], 5 data outputs [8 bit], 5 passed-through read addresses and 5 passed-through auxiliary read commands, all synchronous to the PC_ENA[3:0] position 0, an active pixel cycle.  (WE WILL BEGIN by only specifying the IOs and their functions in the .v module and make sure you understand what everything is for, as from this point on, you will be coding everything yourself...)

And IO wires for the second port on the synccram READ/WRITE port: 1 R/W address [20 bit], 1 write enable [1 wire], data in [8 bit], data out [8 bit].  This port is for the Z80 and, in future, advanced GPU-accelerated processing functions (blitter).

Inside the 'multiport_gpu_ram.v', you will have another module, 'gpu_dual_port_ram_INTEL.v', where inside that one I'll show you how to configure Quartus' LPM_dualport ram/altsyncram.  This one Verilog module, 'gpu_dual_port_ram_INTEL.v', will change between XILINX & LATTICE, as their dual-port memory function blocks have their own setups and parameters written out differently, but function identically.  Other than PLL settings, everything else we have in your GPU project will cross-compile in all 3 FPGA vendors' ecosystems, as our Verilog code is 'basement' level.  If the other compilers can't cope with this project, it is their fault...
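For what it's worth, the vendor-portable idea can also be done with an inferred RAM template rather than a megafunction; a sketch (module and port names are mine, not the thread's final code) that Quartus, ISE/Vivado and Diamond will all typically map to block RAM:

```verilog
// Hypothetical sketch of a portable inferred dual-port RAM: one read/write
// port (CPU side) and one read-only port (video side), each on its own clock.
module generic_dp_ram #(
    parameter DATA_W = 8,
    parameter ADDR_W = 12   // 2^12 = 4096 bytes
) (
    input  wire              clk_a, clk_b,
    input  wire              we_a,
    input  wire [ADDR_W-1:0] addr_a, addr_b,
    input  wire [DATA_W-1:0] din_a,
    output reg  [DATA_W-1:0] dout_a, dout_b
);
    reg [DATA_W-1:0] mem [0:(1<<ADDR_W)-1];

    always @(posedge clk_a) begin       // read/write port
        if (we_a) mem[addr_a] <= din_a;
        dout_a <= mem[addr_a];          // registered read, 1 clock latency
    end

    always @(posedge clk_b)             // read-only port
        dout_b <= mem[addr_b];
endmodule
```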
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on November 09, 2019, 03:57:59 am
Okay, I realised I was only passing a single bit to char_line to get the x position within the character, so now it's passing the 8-bit x position value.  But the characters were reversed in the x-axis on the screen, so I made the following change:

Code: [Select]
assign osd_image[0] = char_line[(8 - dly2_disp_x[2:0])];
Do you see the off-by-one error in this code? You'd do well to learn the bitwise operators (https://www.nandland.com/verilog/examples/example-bitwise-operators.html) and start thinking in binary. Especially the unary ~ operator, same as in C.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 04:47:49 am
Okay, I realised I was only passing a single bit to char_line to get the x position within the character, so now it's passing the 8-bit x position value.  But the characters were reversed in the x-axis on the screen, so I made the following change:

Code: [Select]
assign osd_image[0] = char_line[(8 - dly2_disp_x[2:0])];
Do you see the off-by-one error in this code? You'd do well to learn the bitwise operators (https://www.nandland.com/verilog/examples/example-bitwise-operators.html) and start thinking in binary. Especially the unary ~ operator, same as in C.
He had 2 errors.  1 was that my routine was set up for every 2 pixels to be the same, so disp_x[2:0] should have been disp_x[3:1].  Next, the '8-' should have been '7-'.  As for the 'dly2_disp_x[]', it should actually be 'dly4_disp_x[]', as that bit selection is immediately at the final output, yet reading the font memory takes 2 clocks after its address was presented.  I still need to proof 'nockieboy's' final code to make sure he knows what's going on and the final results aren't a fluke, or, he figured out about the 2 clock read cycle delay.
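For the record, the fixes described above boil down to something like this (a sketch using the thread's signal names):

```verilog
// The font byte is stored MSB-first, so screen x position 0 must read bit 7.
// For a 3-bit index, (7 - x) and (~x) are the same value, so either form works.
wire [2:0] x = dly4_disp_x[3:1];          // [3:1]: each font pixel is 2 screen pixels wide
assign osd_image[0] = char_line[7 - x];   // arithmetic form
// assign osd_image[0] = char_line[~x];   // equivalent bitwise form (unary NOT on 3 bits)
```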

The next step after switching to 125MHz will be re-doing the RAM and making a simultaneous 5-random-port read at every new pixel from 1 huge chunk of 8-bit memory.  Then, a replacement of the X&Y counters with a software-programmable address generator with vertical and horizontal increment sizes and individual X&Y size scales of 1, 2, 3, 4 pixels/lines.  Then an 8-bit color map for the text character, 16 foreground and 16 background colors for each character, going through a 16-word, 16-color palette, 4-bit ARGB.  That's 12 bits, 4096 RGB colors + 16 transparency levels, since the text mode will sit on top of the graphics mode.  (Will look like a TV studio text genlock on top of a video source.)  The text mode will eat 3 of the 5 access cycles.  The next one will be the 256 color graphic pixel / pixels data (sits below the variable-transparency text layer) and the final memory one will be up to 16 sprites per line of video.  (These sit on top of the text layer, or below the text layer, but above the graphics layer.)
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 05:52:46 am
Okay, I realised I was only passing a single bit to char_line to get the x position within the character, so now it's passing the 8-bit x position value.  But the characters were reversed in the x-axis on the screen, so I made the following change:

Code: [Select]
assign osd_image[0] = char_line[(8 - dly2_disp_x[2:0])];
Do you see the off-by-one error in this code? You'd do well to learn the bitwise operators (https://www.nandland.com/verilog/examples/example-bitwise-operators.html) and start thinking in binary. Especially the unary ~ operator, same as in C.
Yes, I agree, the unary ~ operator instead of the '8-' or '7-' would give the compiler less headaches.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: jhpadjustable on November 09, 2019, 06:06:05 am
Yes, I agree, the unary ~ operator instead of the '8-' or '7-' would give the compiler less headaches.
I dunno, ~ just seemed more natural to me. "Be the bits you wish to see in the world."

Anyway this is turning out to be an interesting lab/demonstration in pair programming. I'm going to try to sit back as a good audience member and watch quietly. :)
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 08:07:08 am
Just personal notes:

The op is currently using a "Cyclone II EP2C5T144".
He has 4608 logic elements, i.e. 4608 register bits, or 576 x 8-bit registers.
He has 119808 bits of ram.  Safely, he can make an 8 kilobyte ram plus another 2 kilobyte ram.

Current plan:
Configure the main system memory as 8 kilobytes.  Contains text, fonts, graphic data & some spare room for the CPU in lo-res mode.

        - 1200bytes for 40x30 text mode, or, 4800 bytes for 80x60, or 2400 bytes for 80x30
        - Double the above bytes for text with 16 color foreground and 16 color background.
        - 1024 bytes for 128 character 8x8 font.  2048 bytes for 256 character font.  Double for 8x16 VGA quality font.

        - Additional 16 words by 16 bits for text mode palette.  (This one is small enough to fit in registers if needed.)
        - Additional 256 words by 16 bits for graphics palette.

I hope his next FPGA board is going to have at least 20 kilobytes of ram.  He will be able to make a full VGA-grade 16 color text mode, or 160x120 in 16 color mode; 160x120 in 256 color mode with 32 kilobytes.

The $16 Lattice FPGA can safely be configured to have 192 kilobytes (216 KB with a few tricks) of ram (note: 320x240x256color = 75 kilobytes, 640x480x16color = 150 KB, 320x240x65k truecolor = 150 KB (LOL, an 8 bit computer with a full truecolor desktop background + text based window superimposed)), plus some spare for palette and a fancy blitter with a command cache.
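As a sanity check on those numbers (my own arithmetic, expressed as Verilog compile-time constants; the names are made up for illustration):

```verilog
// Frame-buffer and text-mode size arithmetic from the notes above.
localparam BYTES_320x240x8  = 320*240;     //  76800 bytes = exactly 75 KB, 256 colour
localparam BYTES_640x480x4  = 640*480/2;   // 153600 bytes = exactly 150 KB, 16 colour
localparam BYTES_320x240x16 = 320*240*2;   // 153600 bytes = exactly 150 KB, 65k colour
localparam BYTES_TEXT_40x30 = 40*30;       //   1200 bytes, plain 40x30 text
localparam BYTES_FONT_128   = 128*8;       //   1024 bytes, 128-character 8x8 font
```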
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 09, 2019, 10:32:39 am
Good show  :-+ .  You see all the comments I've added to Verilog code are a must read.

Absolutely.  The more comments, the better.  ;)

Now:
1) please post the final OSD code so I can proof the working version.
2) Will you be working on the code this weekend? What time?
3) What is your current FMAX?  Which FPGA part# are you using?

1) See attachment in this post
2) Likely, yes, though I can't commit to specific times with any certainty; most likely in the evenings (so anything from 12pm-7pm your time?) - have to placate the missus somehow.  ;D
3) FMAX is currently 182.65 MHz, with a Restricted Fmax of 163.03 MHz.  Apparently limited due to high minimum pulse width violation (tch)?  It was around 200 MHz before the changes yesterday.  :-\

Next steps:
1.  Set up the PLL to give you a 125 MHz clock.  This means we will change the PC_ENA coming out of your 'sync_generator' into a 4-bit number, with a parameter in the sync generator which selects the maximum count of 4, then resets to 0.  All the attached graphics modules which had a 1-wire input PC_ENA will now need PC_ENA[3:0], and all the if (PC_ENA) will change to if (PC_ENA[3:0] == 0).

Is this right?

Code: [Select]
altsyncram altsyncram_component_osd_font ( .wren_a ( wren_font ), .clock0 (clk), .clock1 (clk), .clocken1 (pc_ena[3:0] == 0),
.address_a (wr_addr[9:0]), .address_b (font_pos[9:0]),
.data_a (wr_data[7:0]), .q_b (sub_wire1));

Hmm.. my monitor is saying 'out of range', quoting 19.5 KHz / 24 Hz...  either I've made a typo or there's a clock division error on PC_ENA.  Surely PC_ENA[3:0] == 0 is dividing by 8?  Or are my maths out?  That'll give a PC_ENA frequency of 15 MHz?

Using pc_ena[3:2] == 0 gives an output the monitor can understand, though, even though it's a little... wrong.  I think there's a memory timing issue here?

[attach=1]
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 09, 2019, 10:44:46 am
Yes, I agree, the unary ~ operator instead of the '8-' or '7-' would give the compiler less headaches.
I dunno, ~ just seemed more natural to me. "Be the bits you wish to see in the world."

Anyway this is turning out to be an interesting lab/demonstration in pair programming. I'm going to try to sit back as a good audience member and watch quietly. :)

Yep, I changed this earlier - I knew the 8 - ... was a complete hack, but I didn't have time to look up a better way to do it.  Using ~ didn't occur to me at the time!  ::)
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 10:47:42 am
Ok, in sync_gen.v, the 'pc_ena' should read:
---------------------------------------------------
if (pc_ena == PIX_CLK_DIVIDER) pc_ena <= 0;
else pc_ena <= pc_ena + 1;
---------------------------------------------------

Make 'PIX_CLK_DIVIDER' a parameter and make the default value 4.


In the ' if (pc_ena[3:2] == 0)   // once per pixel '
it should read:
if (pc_ena[2:0] == 0)     // for now, use [2:0] on your small FPGA...
^^^^ Change this one everywhere!!!

This means a single pixel advance will happen 1 clock after the 'PIX_CLK_DIVIDER' value makes pc_ena[3:0] equal to 0.
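Folded into the sync generator, that counter might look like this (a sketch; the parameter is the divider described above, spelled PIX_CLK_DIVIDER here, and the register width is assumed to be 4 bits):

```verilog
// Sketch of the sync generator's pixel-enable phase counter.  With the
// default of 4 the counter cycles 0,1,2,3,4,0..., i.e. one pixel event per
// 5 system clocks (125 MHz / 5 = 25 MHz pixel rate).
parameter PIX_CLK_DIVIDER = 4;

reg [3:0] pc_ena = 4'd0;

always @(posedge clk) begin
    if (pc_ena == PIX_CLK_DIVIDER) pc_ena <= 4'd0;
    else                           pc_ena <= pc_ena + 4'd1;
end
```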

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 10:54:04 am
Your PLL clock should have a ratio of X10/2 for 125MHz.


Your current FMAX slowdown is due to this line:
assign osd_image[0] = char_line[(~dly4_disp_x[3:1])];

Though correct, since osd_image[0] is not a register, this is just a mass of gates.  For now, you are clearing 125MHz, and we will be fixing this later on.

If 'osd_image[0]' were a register, this would give an additional clock cycle for the bit selector to select bits 7 through 0, then that output would be shifted off to the AND gates and video stencil.
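A registered version of that bit-select might be sketched as follows (my naming; note it adds one pixel of latency, so the dena/sync delay pipes would need one more stage to match):

```verilog
// Registered font bit-select: the 8:1 mux now has a full clock cycle to
// settle before its result is consumed downstream.
reg osd_image_0;

always @(posedge clk)
    if (pc_ena[2:0] == 0)
        osd_image_0 <= char_line[~dly4_disp_x[3:1]];
```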
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 09, 2019, 10:56:26 am
Looking good again.  :)

Fmax has dropped to 162.31 MHz though.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 09, 2019, 11:04:53 am
Your PLL clock should have a ratio of X10/2 for 125MHz.

My board's clock is 50 MHz... that would make the PLL clock 250 MHz.  :-//
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 11:06:50 am
FMAX going up and down can mean logic has been simplified out due to design change, or errors.

Now, look at my post about the 2 sub-modules you need to make:

'multiport_gpu_ram.v'
and
'gpu_dual_port_ram_INTEL.v'

Begin with the ''multiport_gpu_ram.v''. 
Begin the module with all the inputs and outputs labeled as you like.
Make a second clock and clock enable input for the Z80 cpu port side.  It's nice just to have it there...

Add at least this 1 parameter:
Maximum address bit (name it what you like, this one will configure the maximum size of memory)

As a strategy, keep the address ports all 20-bit, even if you configure the memory to 10 bits.  You will bury the address wiring limit inside the RAM module only, but you still want to pass all 20 bits through the module whether you use them or not.


Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 11:07:54 am
Your PLL clock should have a ratio of X10/2 for 125MHz.

My board's clock is 50 MHz... that would make the PLL clock 250 MHz.  :-//
My bad...
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 09, 2019, 11:22:15 am
Next step #2:  Copy and rename the OSD generator to graphics generator.  You will remove both dual port 'altsynccrams' and replace it with a new verilog module 'multiport_gpu_ram.v'.  In your new 'multiport_gpu_ram.v' you will place a single 'altsynccrams' which is 4Kilobytes, and you will make IO registers for:

Okay, with you so far (I think) - see attached file.

5 read addresses [20 bit] (your graphics system will handle a maximum 1 megabyte frame buffer), 5 auxiliary read commands [8 bit], 5 data outputs [8 bit], 5 passed-through read addresses and 5 passed-through auxiliary read commands, all synchronous to the PC_ENA[3:0] position 0, an active pixel cycle.

Need to confirm I'm reading this right - you want 5x 20-bit read address buses, 5x 8-bit read buses, 5x 8-bit data buses, and 5x pass-thru address buses and 5x pass-thru command buses???!?!

How does this fit into the dual-port paradigm?  I thought it was bad enough trying to get a dual-port memory chip, let alone a five-port one!

Any chance of some single-syllable clarification on this next step?  :o  ;D
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 11:40:26 am
You will only have a  dual port ram which will be inside the ''gpu_dual_port_ram_INTEL.v'' module.

The 'multiport_gpu_ram.v' doesn't have the 'altsyncram_component_gpu_ram' in it; it will have the ''gpu_dual_port_ram_INTEL.v'' called in it, where you will unify that dual-port memory's IO ports.

What the 'multiport_gpu_ram.v' contains is a 5:1 mux for all those address inputs, each selected during the pc_ena[2:0] 5 phases 0 through 4, which then feeds that 1 resulting address into the read address of 1 read port of the dual-port RAM hidden inside the ''gpu_dual_port_ram_INTEL.v'' module.  Then take the data coming out and latch that data into the correct 1 of five data output registers, which are all represented in parallel as 5 x 8-bit outputs.

Since our RAM will now run at 125MHz, having its ENA hard-wired to 1/ON, we can feed the 5 read addresses and latch the 5 sequential data outputs into 5 sequential registers, making that RAM appear to have 5 parallel read ports running at 25MHz.

With your 25MHz pixel port, you can now fetch the character memory, then fetch the font pixel, both fed from 1 and the same block of memory; instead, you will now need to deal with different base memory addresses.  This is the next step after the 5-port RAM works.

As for the second dedicated read/write port for the 8 bit CPU, that one will be a pass-through.
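The output side of that time-multiplexing scheme (the part that complements the address mux) might be sketched like this, assuming the RAM's read data appears a fixed number of clocks after its address; `ram_q` and `READ_LATENCY` are names I've made up for illustration:

```verilog
// De-multiplex the single 125 MHz read-data stream back into 5 parallel
// 25 MHz output registers.  READ_LATENCY is the dual-port RAM's
// address-to-data delay in clocks; the phase is offset by it so each result
// lands in the register belonging to the address that produced it.
localparam READ_LATENCY = 2;

wire [2:0] result_phase = (pc_ena[2:0] >= READ_LATENCY)
                        ?  pc_ena[2:0] - READ_LATENCY
                        :  pc_ena[2:0] + 3'd5 - READ_LATENCY;  // wrap modulo 5

always @(posedge clk) begin
    case (result_phase)
        3'd0 : dataOUT_0 <= ram_q;   // ram_q = data out of the dual-port RAM
        3'd1 : dataOUT_1 <= ram_q;
        3'd2 : dataOUT_2 <= ram_q;
        3'd3 : dataOUT_3 <= ram_q;
        3'd4 : dataOUT_4 <= ram_q;
    endcase
end
```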
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 11:48:12 am
Remember, except for the 8bit CPU ports, make all the outputs I specified into separate registers.
The core issue will be the MUX selector and how you parallel delay the piped function to maintain FMAX.
For now, keep it basic and remember to use a register at each point.

This means that the memory which once took 2 clocks from in to out will now also have additional piped stages for the address and data output to consider, and all the final results will need to land back on pc_ena[2:0] phase 0.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 09, 2019, 12:21:53 pm
Remember, except for the 8bit CPU ports, make all the outputs I specified into separate registers.
The core issue will be the MUX selector and how you parallel delay the piped function to maintain FMAX.
For now, keep it basic and remember to use a register at each point.

This means that the memory which once took 2 clocks from in to out will now also have additional piped stages for the address and data output to consider, and all the final results will need to land back on pc_ena[2:0] phase 0.

Okay, this is where I am so far with multiport_gpu_ram.v:

Code: [Select]
module multiport_gpu_ram (

input clk, // Primary clk input (125 MHz)
input [3:0] pc_ena, // Pixel clock enable
input clk_host, // Host (Z80) clock input
input hc_ena, // Host (Z80) clock enable

// address buses (input)
input [19:0] address_0,
input [19:0] address_1,
input [19:0] address_2,
input [19:0] address_3,
input [19:0] address_4,
input [19:0] addr_host,

// auxilliary read command buses (input)
input [7:0] aux_read_0,
input [7:0] aux_read_1,
input [7:0] aux_read_2,
input [7:0] aux_read_3,
input [7:0] aux_read_4,

// address buses (pass-thru outputs)
output reg [19:0] addrPT_0,
output reg [19:0] addrPT_1,
output reg [19:0] addrPT_2,
output reg [19:0] addrPT_3,
output reg [19:0] addrPT_4,

// auxilliary read command buses (pass-thru output)
output reg [7:0] auxRdPT_0,
output reg [7:0] auxRdPT_1,
output reg [7:0] auxRdPT_2,
output reg [7:0] auxRdPT_3,
output reg [7:0] auxRdPT_4,

// data buses (output)
output reg [7:0] dataOUT_0,
output reg [7:0] dataOUT_1,
output reg [7:0] dataOUT_2,
output reg [7:0] dataOUT_3,
output reg [7:0] dataOUT_4,
output [7:0] data_host

);

// dual-port GPU RAM handler

// define the maximum address bit - effectively the RAM size
parameter MAX_ADDR_BIT = 20;

// create a GPU RAM instance
gpu_dual_port_ram_INTEL gpu_RAM(
// TBC
);

always @(posedge clk) begin

if (pc_ena[2:0] == 0) begin



end // pc_ena

end // always @clk

endmodule

addr_host and data_host are host Z80 buses.

Am I going along the right lines?
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 12:35:46 pm
Yes, good work, except, for the first time, you will be working differently with the 'if (pc_ena[2:0] == 0) begin'

Next, make a reg 'address_mux[]' & 'aux_read_mux[]' (use your preferred names) and make those registers run at 125MHz, serially sequencing through the 5 pc_ena[2:0] stages.

Now, there is one little annoying problem here: a 5:1 mux of 20+8 bits in total may run too slow for 125MHz; if so, later on as we add new inputs to the ports, you may need to make that muxing algorithm take 3 clocks instead of 1 clock.  Basically, 3 parallel 2:1 muxes, those outputs feed another 2 parallel 2:1 muxes, then that last one will feed a final 2:1 mux.
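If the flat 5:1 mux does limit FMAX, the pipelined tree described above could be sketched like this (hypothetical names; each stage is one registered 125 MHz step):

```verilog
// 5:1 mux split into registered 2:1 stages to shorten each logic path.
// sel1a/sel1b/sel2/sel3 would be decoded from a suitably delayed copy of
// the pc_ena phase (decode logic not shown in this sketch).
wire sel1a, sel1b, sel2, sel3;

reg [19:0] s1_a, s1_b, s1_c;   // stage 1 results
reg [19:0] s2_a, s2_b;         // stage 2 results
reg [19:0] mux_out;            // final selected address

always @(posedge clk) begin
    // Stage 1: three parallel 2:1 muxes (address_4 passes straight through)
    s1_a    <= sel1a ? address_1 : address_0;
    s1_b    <= sel1b ? address_3 : address_2;
    s1_c    <= address_4;
    // Stage 2: two 2:1 muxes
    s2_a    <= sel2  ? s1_b : s1_a;
    s2_b    <= s1_c;
    // Stage 3: final 2:1 mux
    mux_out <= sel3  ? s2_b : s2_a;
end
```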



Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 09, 2019, 12:44:04 pm
Here's the skeleton of the 5:1 mux:

Code: [Select]
always @(posedge clk) begin

// perform 5:1 mux for all inputs to the dual-port RAM
case (pc_ena[2:0])
3'b011 : ;
3'b100 : ;
3'b101 : ;
3'b110 : ;
3'b111 : ;
endcase

end // always @clk

Just trying to work out what to put into the case statements based on your comment about making the data available on the next pc_ena[2:0] phase 0...  :o

Yes, good work, except, for the first time, you will be working differently with the 'if (pc_ena[2:0] == 0) begin'

Next, make a reg 'address_mux[]' & 'aux_read_mux[]' (use your preferred names) and make those registers run at 125MHz, serially sequencing through the 5 pc_ena[2:0] stages.

Now, there is one little annoying problem here: a 5:1 mux of 20+8 bits in total may run too slow for 125MHz; if so, later on as we add new inputs to the ports, you may need to make that muxing algorithm take 3 clocks instead of 1 clock.  Basically, 3 parallel 2:1 muxes, those outputs feed another 2 parallel 2:1 muxes, then that last one will feed a final 2:1 mux.

Ookay, I think that's kind of what I've started to do (running the case statement at 125 MHz).
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 12:50:20 pm
Remember, the case for pc_ena[2:0] goes 0,1,2,3,4,0...  Remember, I'm adding in the sync generator and resetting back to 0 after 4, the case of b111 for pc_ena[2:0] will never be reached.

This is not like the 8 bit font, where at the left-most coordinate of the screen, xpos=0, the first pixel in the font's byte is the 7th bit, then the 6th bit, then etc...
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 09, 2019, 12:55:38 pm
Remember, the case for pc_ena[2:0] goes 0,1,2,3,4,0...  Remember, I'm adding in the sync generator and resetting back to 0 after 4, the case of b111 for pc_ena[2:0] will never be reached.

This is not like the 8 bit font, where at the left-most coordinate of the screen, xpos=0, the first pixel in the font's byte is the 7th bit, then the 6th bit, then etc...

Ah yes,  I'd forgotten about the reset after the 5th count, hence the confusion over the bit count in the case statement.

Code: [Select]
reg [19:0] address_mux;
reg [7:0]  aux_read_mux;

// create a GPU RAM instance
gpu_dual_port_ram_INTEL gpu_RAM(
// TBC
);

always @(posedge clk) begin

// perform 5:1 mux for all inputs to the dual-port RAM
case (pc_ena[2:0])
3'b000 : begin
address_mux <= address_0;
aux_read_mux <= aux_read_0;
addrPT_0 <= address_0;
end
3'b001 : begin
address_mux <= address_1;
aux_read_mux <= aux_read_1;
addrPT_1 <= address_1;
end
3'b010 : begin
address_mux <= address_2;
aux_read_mux <= aux_read_2;
addrPT_2 <= address_2;
end
3'b011 : begin
address_mux <= address_3;
aux_read_mux <= aux_read_3;
addrPT_3 <= address_3;
end
3'b100 : begin
address_mux <= address_4;
aux_read_mux <= aux_read_4;
addrPT_4 <= address_4;
end
endcase

end // always @clk
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 01:11:26 pm
Ok, next, we will start our  ''gpu_dual_port_ram_INTEL.v'' module.

In that module, you should have an input-wire read port with address in, aux in, and pc_ena_in[2:0]; data out, address out, aux out, pc_ena_out[]; and host port wires and bidir.

Now, the altsyncram is obsolete, so what you should do is, within Quartus' block diagram editor, double-click on a blank area and insert a megafunction from the LPM_Ram_dp section.  Make sure 'launch wizard' is selected.

Next, configure the memory for 1 read/write port and 1 read-only port.  Clock/register the addresses going in and register the data coming out.  Before completing the function, send me a snapshot of your example block diagram.

If it looks good, when finishing the wizard, select 'generate verilog source code/source files'.  Quartus will generate an example verilog.v file, where you will copy and paste the LPM_Ram_dp/altsyncram into your ''gpu_dual_port_ram_INTEL.v'' source file and wire through the memory's ports.
In that ''gpu_dual_port_ram_INTEL.v'', you will also pipe through the read address and aux input to the address output, as well as pipe through pc_ena_in[2:0] to an output pc_ena_out[2:0].

Don't forget about also having a MAX_RAM_ADDRESS parameter in this sub module as well.
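Put together, the wrapper's pass-through plumbing might look like this skeleton (port and parameter names are my guesses at the eventual module, not the thread's final code):

```verilog
// Sketch of gpu_dual_port_ram_INTEL: wraps the vendor dual-port RAM and
// pipes the read address, aux command and pc_ena phase through alongside
// the data, so everything arrives at the outputs in step.
module gpu_dual_port_ram_INTEL #(
    parameter MAX_RAM_ADDRESS = 4095        // real RAM behind the 20-bit bus
) (
    input  wire        clk,
    input  wire [2:0]  pc_ena_in,
    input  wire [19:0] addr_in,
    input  wire [7:0]  aux_in,
    output reg  [2:0]  pc_ena_out,
    output reg  [19:0] addr_out,
    output reg  [7:0]  aux_out,
    output wire [7:0]  data_out
    // host (Z80) read/write port omitted in this sketch
);
    // ... vendor RAM megafunction instantiated here, read address = addr_in,
    //     registered data out driving data_out ...

    always @(posedge clk) begin
        // One pipe stage shown; a second stage would be needed to match a
        // 2-clock registered-read latency.
        pc_ena_out <= pc_ena_in;
        addr_out   <= addr_in;
        aux_out    <= aux_in;
    end
endmodule
```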
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 09, 2019, 01:21:41 pm
Ok, next, we will start our  ''gpu_dual_port_ram_INTEL.v'' module.

In that module, you should have an input wire read port with address in , aux in, and pc_ena_in[2:0], data out, address out, aux out, pc_ena out[]  and host port wires and bidir.

Now, the altsyncram is obsolete, so, what you should do is within quartus, block diagram editor, double click on a blank area and insert a megafunction from the LPM_Ram_dp section.  Make sure launch wizard is selected on.

I don't have an LPM_Ram_dp section?

[attachimg=1]

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 09, 2019, 01:23:39 pm
The post today was interesting, however.  Have received the Cyclone IV EasyFPGA board - has an EP4CE6 on board, with SDRAM and VGA connector (as well as PS2, which will be handy later).

Is it worth me updating to the latest Quartus software to support this chip?  I was using Quartus II 13.0sp1 as it was the last version to support the Cyclone II I was using...

Will mean a delay in proceedings whilst I get it all set up.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 01:27:49 pm
Ok, next, we will start our  ''gpu_dual_port_ram_INTEL.v'' module.

In that module, you should have an input wire read port with address in , aux in, and pc_ena_in[2:0], data out, address out, aux out, pc_ena out[]  and host port wires and bidir.

Now, the altsyncram is obsolete, so, what you should do is within quartus, block diagram editor, double click on a blank area and insert a megafunction from the LPM_Ram_dp section.  Make sure launch wizard is selected on.

I don't have an LPM_Ram_dp section?

[attachimg=1]

Strange, it should be in the memory compiler section.  Scroll down and expand/shrink the list.  Don't type LPM in the query box, it may just be called ram_dp, dual_port_ram...
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 09, 2019, 01:30:25 pm
Strange, it should be in the memory compiler section.  Scroll down and expand/shrink the list.  Don't type LPM in the query box, it may just be called ram_dp, dual_port_ram...

Could it be because it's an older Quartus II version that doesn't have the module you're looking for?

[attachimg=1]
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 01:35:59 pm
The post today was interesting, however - I've received the Cyclone IV EasyFPGA board.  It has an EP4CE6 on board, with SDRAM and a VGA connector (as well as PS/2, which will be handy later).

Is it worth me updating to the latest Quartus software to support this chip?  I was using Quartus II 13.0sp1 as it was the last version to support the Cyclone II I was using...

It will mean a delay in proceedings whilst I get it all set up.
For now, while I'm available, let's work with what you have in hand.  You will need to transfer your project on your own time.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 01:36:35 pm
Strange, it should be in the memory compiler section.  Scroll down and expand/shrink the list.  Don't type LPM in the query box, it may just be called ram_dp, dual_port_ram...

Could it be because it's an older Quartus II version that doesn't have the module you're looking for?

[attachimg=1]
Try the ram-2port.  It's not the Quartus version - I think it's that you are using a Cyclone II.  It shouldn't matter, as the newer FPGA will support all the memory features of the earlier ones.

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 09, 2019, 01:46:47 pm
Okay, this is where I am currently...

[attachimg=1]
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 01:51:31 pm
You got it.  Now, remember the specs:

2 different clocks.
1 port read-only.
Other port read & write.
8 bits wide.
13 address bits, or 8192 words.

And clock the input controls as well as the output data.  As you change your options, the illustration will update, showing you what you are creating.  Send me the final image + the verilog .v example text...
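As a quick sanity check on that last spec (illustrative names only - this is not the wizard's output): each address bit doubles the depth, so 13 address bits select 2^13 = 8192 words.

```verilog
// Illustrative only: relating address-bus width to RAM depth.
module ram_depth_example;
    localparam ADDR_BITS = 13;              // wizard setting above
    localparam WORDS     = 1 << ADDR_BITS;  // 2^13 = 8192 words
    initial $display("%0d address bits -> %0d x 8-bit words", ADDR_BITS, WORDS);
endmodule
```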
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 09, 2019, 01:55:02 pm
You got it.  Now, remember the specs:

2 different clocks.
1 port read-only.
Other port read & write.
8 bits wide.
13 address bits, or 8192 words.

And clock the input controls as well as the output data.  As you change your options, the illustration will update, showing you what you are creating.  Send me the final image + the verilog .v example text...

Okay, using dual-clock mode (separate clocks for the A and B ports, as opposed to input and output clocks).  I'll disable write on port 1 in the code when the wizard produces it (there's no option for that in the wizard); 8-bit data is configured and I've gone for 14 address bits (16384 words), as I'm now using the EP4CE6 (still with Quartus II 13.0sp1) and have ~30KB of RAM to play with.

EDIT:

Do I need rd_en signals for either port?
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 01:58:04 pm
You got it.  Now, remember the specs.

2 different clocks.
1 port read only.
other port read & write.
8 bits
13 addressees, or, 8192 words.

and clock the input controls as well as the output data.  As you change your options, the illustration will update showing you what you are creating.  send me the final image + the verilog.v example text...

Okay, using dual-clock mode (separate clocks for the A and B ports, as opposed to input and output clocks).  I'll disable write on port 1 in the code when the wizard produces it (there's no option for that in the wizard); 8-bit data is configured and I've gone for 14 address bits (16384 words), as I'm now using the EP4CE6 (still with Quartus II 13.0sp1) and have ~30KB of RAM to play with.
Unless your EP4CE6 board is ready to run now, we can still get a lot done on the Cyclone II board.
Show me the final illustration from the megawizard.
Memory size won't make a difference anyway, as your MAX_MEM_ADDR will overwrite what the wizard has filled into the example.v file, which we will just be using as a guide.
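Overriding the wizard's fixed sizes from the instantiating module might look something like this (a sketch only - the instance and parameter names here are assumptions, not the wizard's output):

```verilog
// Sketch: the parent module passes its own depth parameter down,
// overriding whatever word count the megawizard baked into example.v.
gpu_dual_port_ram_INTEL #(
    .MAX_ADDR_BITS ( 14 )   // 2^14 = 16384 words of 8 bits
) gpu_RAM (
    .clk    ( clk ),
    .addr_a ( address_0 )
    // ... remaining ports wired the same way ...
);
```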
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 09, 2019, 02:03:08 pm
Unless your EP4CE6 board is ready to run now, we can still get a lot done on the Cyclone II board.

The EP4CE6 is set up and outputting the video RAM contents as good as anything.  The only downside is that it has 1-bit colour output on the VGA connector.

Show me the final illustration from the megawizard.
Memory size won't make a difference anyway, as your MAX_MEM_ADDR will overwrite what the wizard has filled into the example.v file, which we will just be using as a guide.

[attachimg=1]
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 02:06:00 pm
Perfect, paste the example into your  ''gpu_dual_port_ram_INTEL.v'' module and wire the ports to the modules IO declarations.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 02:10:00 pm
The EP4CE6 is set up and outputting the video RAM contents as good as anything.  The only downside is that it has 1-bit colour output on the VGA connector.

Now when you are saying 'outputting the video RAM contents', you mean the OSD generator we have written here, correct?

As for the color, you should eventually be able to do something about that.  Remember to choose output pins which are in the same IO bank as the current 1 bit RGB output pins.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 09, 2019, 02:20:07 pm
The EP4CE6 is set up and outputting the video RAM contents as good as anything.  The only downside is that it has 1-bit colour output on the VGA connector.

Now when you are saying 'outputting the video RAM contents', you mean the OSD generator we have written here, correct?

Yes.  Without all these new modules we're working on at the moment.

As for the color, you should eventually be able to do something about that.  Remember to choose output pins which are in the same IO bank as the current 1 bit RGB output pins.

Yes, just means I'll have to use my breadboard resistor ladder and VGA breakout connector again.  ::)
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 09, 2019, 02:25:37 pm
Perfect, paste the example into your  ''gpu_dual_port_ram_INTEL.v'' module and wire the ports to the modules IO declarations.

Okay, I'm a little confused right now.  I've got a megafunction-generated file, gpu_ram.v, which isn't normally included in the project files as Quartus wants me to leave the source code alone and just use the diagram component.

I've included the gpu_ram.v file so I can get to the code, but do I need to change anything there?  Can't I just wire up the gpu_ram component in the diagram to the multiport_gpu_ram component?
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 02:30:19 pm
The EP4CE6 is set up and outputting the video RAM contents as good as anything.  The only downside is that it has 1-bit colour output on the VGA connector.

Now when you are saying 'outputting the video RAM contents', you mean the OSD generator we have written here, correct?

Yes.  Without all these new modules we're working on at the moment.

As for the color, you should eventually be able to do something about that.  Remember to choose output pins which are in the same IO bank as the current 1 bit RGB output pins.

Yes, just means I'll have to use my breadboard resistor ladder and VGA breakout connector again.  ::)
No - get the board's schematic, as they have a series resistor feeding the analog RGB video, and add a 2x-value resistor in series from a new IO pin for each channel.  This will give you 2-bit color.  Add a 4x-value series resistor to the analog VGA on a third IO pin to get 3-bit color...

example:
currently:
Current Red IO --------220ohm-----VGAred

Cheap DAC without the R2R ladder.
Current Red IO --------220ohm--------|--- VGAred
New Red IO2 ----------440ohm--------|
New Red IO3 ----------880ohm--------|
New Red IO4 ---------1760ohm-------|

It should be simple enough to hand wire this addition...
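Driving that ladder from the FPGA is then just a wider output bus (a sketch with made-up pin names - check your own pin assignments):

```verilog
// Sketch: 4-bit binary-weighted red output.  The MSB drives the
// smallest resistor (220R), so it contributes the largest current
// into the summing node; each further bit contributes half as much.
module red_dac_sketch (
    input  wire [3:0] red,       // red intensity, MSB first
    output wire       red_220,   // existing pin, 220 ohm
    output wire       red_440,   // new IO pin,   440 ohm
    output wire       red_880,   // new IO pin,   880 ohm
    output wire       red_1760   // new IO pin,  1760 ohm
);
    assign {red_220, red_440, red_880, red_1760} = red;
endmodule
```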
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 02:38:44 pm
I've included the gpu_ram.v file so I can get to the code, but do I need to change anything there?  Can't I just wire up the gpu_ram component in the diagram to the multiport_gpu_ram component?

Nope, just copy and paste it into your  ''gpu_dual_port_ram_INTEL.v'' module.  We will make some changes and pass some parameters.  Don't bother with the Quartus copyright text.  Take a look at how I did it with my altsyncram in my current OSD generator - it was copied and pasted from the same megawizard as well.

Though you could have read Intel's manual on the dual-port memory types and all their controls, the wizard just helps give you the basic feature setup visually.  You can always add or remove files from your project at any time.

The ''gpu_dual_port_ram_INTEL.v'' has a few more things than just the RAM, and you don't want a wizard-managed file whose structure can be edited if you place the file, or click on it, in the block diagram file.


Oh, BTW, obviously delete the memory symbol from your block diagram file.  You obviously won't be using that one.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 09, 2019, 02:42:35 pm
I've included the gpu_ram.v file so I can get to the code, but do I need to change anything there?  Can't I just wire up the gpu_ram component in the diagram to the multiport_gpu_ram component?

Nope, just copy and paste it into your  ''gpu_dual_port_ram_INTEL.v'' module.  We will make some changes and pass some parameters.  Don't bother with the Quartus copyright text.  Take a look at how I did it with my altsyncram in my current OSD generator - it was copied and pasted from the same megawizard as well.

Though you could have read Intel's manual on the dual-port memory types and all their controls, the wizard just helps give you the basic feature setup visually.  You can always add or remove files from your project at any time.

The ''gpu_dual_port_ram_INTEL.v'' has a few more things than just the RAM, and you don't want a wizard-managed file whose structure can be edited if you place the file, or click on it, in the block diagram file.

So, this is my gpu_dual_port_ram_INTEL.v file now:

Code: [Select]
module gpu_dual_port_ram_INTEL (
input clk,
input [2:0] pc_ena_in,
input wr_en_a,
input clk_host,
input [19:0] addr_a,
input [19:0] addr_b,
input [7:0] data_a,
input [7:0] data_b
);

// ****************************************************************************************************************************
// Multiport GPU RAM
// ****************************************************************************************************************************
altsyncram altsyncram_component (
.clock0 (clock_a),
.wren_a (wren_a),
.address_b (address_b),
.clock1 (clock_b),
.data_b (data_b),
.wren_b (wren_b),
.address_a (address_a),
.data_a (data_a),
.q_a (sub_wire0),
.q_b (sub_wire1),
.aclr0 (1'b0),
.aclr1 (1'b0),
.addressstall_a (1'b0),
.addressstall_b (1'b0),
.byteena_a (1'b1),
.byteena_b (1'b1),
.clocken0 (1'b1),
.clocken1 (1'b1),
.clocken2 (1'b1),
.clocken3 (1'b1),
.eccstatus (),
.rden_a (1'b1),
.rden_b (1'b1));
defparam
altsyncram_component.address_reg_b = "CLOCK1",
altsyncram_component.clock_enable_input_a = "BYPASS",
altsyncram_component.clock_enable_input_b = "BYPASS",
altsyncram_component.clock_enable_output_a = "BYPASS",
altsyncram_component.clock_enable_output_b = "BYPASS",
altsyncram_component.indata_reg_b = "CLOCK1",
altsyncram_component.init_file = "../osd_mem.mif",
altsyncram_component.intended_device_family = "Cyclone IV E",
altsyncram_component.lpm_type = "altsyncram",
altsyncram_component.numwords_a = 16384,
altsyncram_component.numwords_b = 16384,
altsyncram_component.operation_mode = "BIDIR_DUAL_PORT",
altsyncram_component.outdata_aclr_a = "NONE",
altsyncram_component.outdata_aclr_b = "NONE",
altsyncram_component.outdata_reg_a = "CLOCK0",
altsyncram_component.outdata_reg_b = "CLOCK1",
altsyncram_component.power_up_uninitialized = "FALSE",
altsyncram_component.read_during_write_mode_port_a = "OLD_DATA",
altsyncram_component.read_during_write_mode_port_b = "OLD_DATA",
altsyncram_component.widthad_a = 14,
altsyncram_component.widthad_b = 14,
altsyncram_component.width_a = 8,
altsyncram_component.width_b = 8,
altsyncram_component.width_byteena_a = 1,
altsyncram_component.width_byteena_b = 1,
altsyncram_component.wrcontrol_wraddress_reg_b = "CLOCK1";

endmodule

I've just cut 'n' pasted like you said.  Seems to be a lot of clock enables for some reason?  Obviously I'm going to have to prune/sort out the IO assignments.

[attach=1]
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 02:55:19 pm
Fill and wire the IO ports.
Add the max address parameter, and pass the parameters through to the altsyncram's params.
When wiring the RAM's address ports, limit the maximum RAM address in the wiring there.

And add an @(posedge clock) block for the graphics side: delay and pass through the read address, aux data and pc_ena[3:0] by the same number of clock cycles as the memory's read pipe delay.

Try to think about why I want you to do this.  What can these be used for?
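A minimal sketch of that delay pipe (signal names assumed from the modules above; the two register stages match the altsyncram's registered-input plus registered-output read latency):

```verilog
// Sketch only: 2-clock delay so the read address and pc_ena leave the
// module in step with the RAM's q_a data, which lags addr_a by 2 clocks.
reg [19:0] rd_addr_pipe;
reg [3:0]  pc_ena_pipe;

always @(posedge clk) begin
    rd_addr_pipe <= addr_a;        // clock 1
    addr_out_a   <= rd_addr_pipe;  // clock 2 - now aligned with q_a

    pc_ena_pipe  <= pc_ena_in;
    pc_ena_out   <= pc_ena_pipe;
end
```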
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 02:59:39 pm

Okay, this is where I am so far with multiport_gpu_ram.v:

Code: [Select]
module multiport_gpu_ram (

input clk, // Primary clk input (125 MHz)
input [3:0] pc_ena, // Pixel clock enable
input clk_host, // Host (Z80) clock input
input hc_ena, // Host (Z80) clock enable

// address buses (input)
input [19:0] address_0,
input [19:0] address_1,
input [19:0] address_2,
input [19:0] address_3,
input [19:0] address_4,
input [19:0] addr_host,

// auxilliary read command buses (input)
input [7:0] aux_read_0,
input [7:0] aux_read_1,
input [7:0] aux_read_2,
input [7:0] aux_read_3,
input [7:0] aux_read_4,

// address buses (pass-thru outputs)
output reg [19:0] addrPT_0,
output reg [19:0] addrPT_1,
output reg [19:0] addrPT_2,
output reg [19:0] addrPT_3,
output reg [19:0] addrPT_4,

// auxilliary read command buses (pass-thru output)
output reg [7:0] auxRdPT_0,
output reg [7:0] auxRdPT_1,
output reg [7:0] auxRdPT_2,
output reg [7:0] auxRdPT_3,
output reg [7:0] auxRdPT_4,

// data buses (output)
output reg [7:0] dataOUT_0,
output reg [7:0] dataOUT_1,
output reg [7:0] dataOUT_2,
output reg [7:0] dataOUT_3,
output reg [7:0] dataOUT_4,
output [7:0] data_host

);

// dual-port GPU RAM handler

// define the maximum address bit - effectively the RAM size
parameter MAX_ADDR_BIT = 20;

// create a GPU RAM instance
gpu_dual_port_ram_INTEL gpu_RAM(
// TBC
);

always @(posedge clk) begin

if (pc_ena[2:0] == 0) begin



end // pc_ena

end // always @clk

endmodule

addr_host and data_host are host Z80 buses.

Am I going along the right lines?
Change the 'hc_ena' to 'hc_write_ena'.  This is for writing data.  The host can read or write just like normal ram.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 03:03:21 pm

I've just cut 'n' pasted like you said.  Seems to be a lot of clock enables for some reason?  Obviously I'm going to have to prune/sort out the IO assignments.

[attach=1]

Those have been hardwired to stay always enabled.  Leave them like that.

As for your photo of the 'multiport_gpu_ram', LOL  :-DD  You didn't have to generate a symbol of it....
That's just absurd...
You will obviously just call an instance of it in the OSD graphics generator...
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 09, 2019, 03:19:37 pm
Time's up for me today - for a while at least.  Here's the latest where I've gotten to with gpu_dual_port_ram_INTEL.v:

Code: [Select]
module gpu_dual_port_ram_INTEL (
// inputs
input clk,
input [2:0] pc_ena_in,
input wr_en_a,
input clk_host,
input wr_en_host,
input [19:0] addr_a,
input [19:0] addr_b,
input [7:0] data_in_a,
input [7:0] data_in_b,
// outputs
output [7:0] data_out_a,
output [7:0] data_out_b
);

// define delay pipe registers
reg [MAX_ADDR_BIT:0] rd_addr_pipe;
reg [7:0] aux_dat_pipe;
reg [3:0] pc_ena_pipe;

// define the maximum address bit - effectively the RAM size
parameter MAX_ADDR_BIT = 14;

// ****************************************************************************************************************************
// Multiport GPU RAM
// ****************************************************************************************************************************
altsyncram altsyncram_component (
.clock0 (clk),
.wren_a (wr_en_a),
.address_b (addr_b[MAX_ADDR_BIT:0]),
.clock1 (clk_host),
.data_b (data_b),
.wren_b (wr_en_host),
.address_a (addr_a[MAX_ADDR_BIT:0]),
.data_a (data_a),
.q_a (data_out_a),
.q_b (data_out_b),
.aclr0 (1'b0),
.aclr1 (1'b0),
.addressstall_a (1'b0),
.addressstall_b (1'b0),
.byteena_a (1'b1),
.byteena_b (1'b1),
.clocken0 (1'b1),
.clocken1 (1'b1),
.clocken2 (1'b1),
.clocken3 (1'b1),
.eccstatus (),
.rden_a (1'b1),
.rden_b (1'b1));

defparam
altsyncram_component.address_reg_b = "CLOCK1",
altsyncram_component.clock_enable_input_a = "BYPASS",
altsyncram_component.clock_enable_input_b = "BYPASS",
altsyncram_component.clock_enable_output_a = "BYPASS",
altsyncram_component.clock_enable_output_b = "BYPASS",
altsyncram_component.indata_reg_b = "CLOCK1",
altsyncram_component.init_file = "../osd_mem.mif",
altsyncram_component.intended_device_family = "Cyclone IV E",
altsyncram_component.lpm_type = "altsyncram",
altsyncram_component.numwords_a = 16384,
altsyncram_component.numwords_b = 16384,
altsyncram_component.operation_mode = "BIDIR_DUAL_PORT",
altsyncram_component.outdata_aclr_a = "NONE",
altsyncram_component.outdata_aclr_b = "NONE",
altsyncram_component.outdata_reg_a = "CLOCK0",
altsyncram_component.outdata_reg_b = "CLOCK1",
altsyncram_component.power_up_uninitialized = "FALSE",
altsyncram_component.read_during_write_mode_port_a = "OLD_DATA",
altsyncram_component.read_during_write_mode_port_b = "OLD_DATA",
altsyncram_component.widthad_a = MAX_ADDR_BIT,
altsyncram_component.widthad_b = MAX_ADDR_BIT,
altsyncram_component.width_a = 8,
altsyncram_component.width_b = 8,
altsyncram_component.width_byteena_a = 1,
altsyncram_component.width_byteena_b = 1,
altsyncram_component.wrcontrol_wraddress_reg_b = "CLOCK1";

// ****************************************************************************************************************************

always @(posedge clk) begin

// **************************************************************************************************************************
// *** Create a serial pipe where the PIPE_DELAY parameter selects the pixel count delay for the xxx_in to the xxx_out ports
// **************************************************************************************************************************

rd_addr_pipe  <= addr_a;

hde_pipe[7:1] <= hde_pipe[6:0];
hde_out       <= hde_pipe[PIPE_DELAY-2];

end

endmodule

Thanks for your help and patience so far!!  ;D
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 09, 2019, 03:33:54 pm
Time's up for me today - for a while at least.  Here's the latest where I've gotten to with gpu_dual_port_ram_INTEL.v:

Code: [Select]
module gpu_dual_port_ram_INTEL (
// inputs
input clk,
input [2:0] pc_ena_in,
input wr_en_a,          *************** Unused
input clk_host,
input wr_en_host,
input [19:0] addr_a,
input [19:0] addr_b,
input [7:0] data_in_a,   **************** Unused
input [7:0] data_in_b,
// outputs
output [7:0] data_out_a,
output [7:0] data_out_b
);

// define delay pipe registers  ********** dont forget that these will be output ports as well
reg [19:0] rd_addr_pipe;  ************************* even though the ram may be smaller, we should still pipe through all the 20 address bits.
reg [7:0] aux_dat_pipe;
reg [3:0] pc_ena_pipe;

// define the maximum address bit - effectively the RAM size
parameter MAX_ADDR_BITS = 14;   ********************** this is 32k, however, for less confusion, I've changed it to BITS

// ****************************************************************************************************************************
// Multiport GPU RAM
// ****************************************************************************************************************************
altsyncram altsyncram_component (
.clock0 (clk),
.wren_a (wr_en_a),
.address_b (addr_b[MAX_ADDR_BITS-1:0]),           *****************   bits-1
.clock1 (clk_host),
.data_b (data_b),
.wren_b (wr_en_host),
.address_a (addr_a[MAX_ADDR_BITS-1:0]),               *********************
.data_a (data_a),
.q_a (data_out_a),
.q_b (data_out_b),
.aclr0 (1'b0),
.aclr1 (1'b0),
.addressstall_a (1'b0),
.addressstall_b (1'b0),
.byteena_a (1'b1),
.byteena_b (1'b1),
.clocken0 (1'b1),
.clocken1 (1'b1),
.clocken2 (1'b1),
.clocken3 (1'b1),
.eccstatus (),
.rden_a (1'b1),
.rden_b (1'b1));

defparam
altsyncram_component.address_reg_b = "CLOCK1",
altsyncram_component.clock_enable_input_a = "BYPASS",
altsyncram_component.clock_enable_input_b = "BYPASS",
altsyncram_component.clock_enable_output_a = "BYPASS",
altsyncram_component.clock_enable_output_b = "BYPASS",
altsyncram_component.indata_reg_b = "CLOCK1",
altsyncram_component.init_file = "../osd_mem.mif",
altsyncram_component.intended_device_family = "Cyclone IV E",
altsyncram_component.lpm_type = "altsyncram",
altsyncram_component.numwords_a = 16384,    ************** fix, needs calculation based on MAX_ADDR_BITS
altsyncram_component.numwords_b = 16384,  ***********    fix, needs calculation based on MAX_ADDR_BITS
altsyncram_component.operation_mode = "BIDIR_DUAL_PORT",
altsyncram_component.outdata_aclr_a = "NONE",
altsyncram_component.outdata_aclr_b = "NONE",
altsyncram_component.outdata_reg_a = "CLOCK0",
altsyncram_component.outdata_reg_b = "CLOCK1",
altsyncram_component.power_up_uninitialized = "FALSE",
altsyncram_component.read_during_write_mode_port_a = "OLD_DATA",
altsyncram_component.read_during_write_mode_port_b = "OLD_DATA",
altsyncram_component.widthad_a = MAX_ADDR_BITS,                           *************** bits-0
altsyncram_component.widthad_b = MAX_ADDR_BITS,
altsyncram_component.width_a = 8,
altsyncram_component.width_b = 8,
altsyncram_component.width_byteena_a = 1,
altsyncram_component.width_byteena_b = 1,
altsyncram_component.wrcontrol_wraddress_reg_b = "CLOCK1";

// ****************************************************************************************************************************

always @(posedge clk) begin

// **************************************************************************************************************************
// *** Create a serial pipe where the PIPE_DELAY parameter selects the pixel count delay for the xxx_in to the xxx_out ports
// **************************************************************************************************************************

rd_addr_pipe  <= addr_a;   *********** remember, there are 2 pipe steps before you should output the rd_addr out...

hde_pipe[7:1] <= hde_pipe[6:0];                  ******** unused
hde_out       <= hde_pipe[PIPE_DELAY-2];  ******* unused

end

endmodule


Thanks for your help and patience so far!!  ;D

Comments in code with the asterisk ****...
I'll check in later tonight...
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 09, 2019, 04:47:53 pm
And, add a @(posedge clock), for the graphics, delay and pass through the read address, aux data and  pc_ena[3:0] with the same number of clock cycles as the memory's read pipe delay.

Okay, latest code here:

Code: [Select]
module gpu_dual_port_ram_INTEL (
// inputs
input clk,
input [2:0] pc_ena_in,
input clk_host,
input wr_en_host,
input [19:0] addr_a,
input [19:0] addr_b,
input [7:0] data_in_b,
// outputs
output [19:0] addr_out_a,
output [7:0] data_out_a,
output [7:0] data_out_b,
output [2:0] pc_ena_out
);

// define the maximum address bit - effectively the RAM size
parameter MAX_ADDR_BITS = 14;

// define delay pipe registers  ********** dont forget that these will be output ports as well
reg [MAX_ADDR_BITS - 1:0] rd_addr_pipe;
reg [7:0] dat_out_a_pipe;
reg [3:0] pc_ena_pipe;

// ****************************************************************************************************************************
// Dual-port GPU RAM
//
// Port A - read only by GPU
// Port B - read/writeable by host system
// Data buses - 8 bits / 1 byte wide
// Address buses - MAX_ADDR_BITS wide (14 bits default)
// Memory word size - 2^MAX_ADDR_BITS (16384 bytes default)
// ****************************************************************************************************************************
altsyncram altsyncram_component (
.clock0 (clk),
.wren_a (1'b1),
.address_b (addr_b[MAX_ADDR_BITS - 1:0]),
.clock1 (clk_host),
.data_b (data_in_b),
.wren_b (wr_en_host),
.address_a (addr_a[MAX_ADDR_BITS - 1:0]),
.data_a (8'b00000000),
.q_a (data_out_a_pipe),
.q_b (data_out_b),
.aclr0 (1'b0),
.aclr1 (1'b0),
.addressstall_a (1'b0),
.addressstall_b (1'b0),
.byteena_a (1'b1),
.byteena_b (1'b1),
.clocken0 (1'b1),
.clocken1 (1'b1),
.clocken2 (1'b1),
.clocken3 (1'b1),
.eccstatus (),
.rden_a (1'b1),
.rden_b (1'b1));

defparam
altsyncram_component.address_reg_b = "CLOCK1",
altsyncram_component.clock_enable_input_a = "BYPASS",
altsyncram_component.clock_enable_input_b = "BYPASS",
altsyncram_component.clock_enable_output_a = "BYPASS",
altsyncram_component.clock_enable_output_b = "BYPASS",
altsyncram_component.indata_reg_b = "CLOCK1",
altsyncram_component.init_file = "../osd_mem.mif",
altsyncram_component.intended_device_family = "Cyclone IV E",
altsyncram_component.lpm_type = "altsyncram",
altsyncram_component.numwords_a = 2 ** MAX_ADDR_BITS,
altsyncram_component.numwords_b = 2 ** MAX_ADDR_BITS,
altsyncram_component.operation_mode = "BIDIR_DUAL_PORT",
altsyncram_component.outdata_aclr_a = "NONE",
altsyncram_component.outdata_aclr_b = "NONE",
altsyncram_component.outdata_reg_a = "CLOCK0",
altsyncram_component.outdata_reg_b = "CLOCK1",
altsyncram_component.power_up_uninitialized = "FALSE",
altsyncram_component.read_during_write_mode_port_a = "OLD_DATA",
altsyncram_component.read_during_write_mode_port_b = "OLD_DATA",
altsyncram_component.widthad_a = MAX_ADDR_BITS - 1,
altsyncram_component.widthad_b = MAX_ADDR_BITS - 1,
altsyncram_component.width_a = 8,
altsyncram_component.width_b = 8,
altsyncram_component.width_byteena_a = 1,
altsyncram_component.width_byteena_b = 1,
altsyncram_component.wrcontrol_wraddress_reg_b = "CLOCK1";

// ****************************************************************************************************************************

always @(posedge clk) begin

// **************************************************************************************************************************
// *** Create a serial pipe where the PIPE_DELAY parameter selects the pixel count delay for the xxx_in to the xxx_out ports
// **************************************************************************************************************************
rd_addr_pipe  <= addr_a;
addr_out_a <= rd_addr_pipe;

//dat_out_a_pipe <= data_out_a;
data_out_a <= dat_out_a_pipe;

pc_ena_pipe <= pc_ena_in;
pc_ena_out <= pc_ena_pipe;
// **************************************************************************************************************************

end

endmodule

Not sure I've got the delay pipes set up correctly.  data_out_a is coming straight out of the memory module, so only needs 1 (or actually probably 0?) delays.

Try to think why I want you to do this.  What can these be used for.

Well, the address pass-through and pc_ena signal will arrive at the next module at the same time as the data, so should all be valid at the same time and can thus be used to process the data?
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 09, 2019, 07:37:53 pm
Code: [Select]
// define delay pipe registers  ********** dont forget that these will be output ports as well
reg [MAX_ADDR_BITS - 1:0] rd_addr_pipe;
reg [7:0] aux_dat_pipe;
reg [3:0] pc_ena_pipe;

Why do they need to be output ports?  I've got them outputting their contents into outputs already...?

Code: [Select]
// **************************************************************************************************************************
// *** Create a serial pipe where the PIPE_DELAY parameter selects the pixel count delay for the xxx_in to the xxx_out ports
// **************************************************************************************************************************
rd_addr_pipe  <= addr_a;
addr_out_a <= rd_addr_pipe;

//dat_out_a_pipe <= data_out_a;
data_out_a <= dat_out_a_pipe;

pc_ena_pipe <= pc_ena_in;
pc_ena_out <= pc_ena_pipe;
// **************************************************************************************************************************

addr_out_a, data_out_a and pc_ena_out are all outputs?
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 10, 2019, 01:54:09 am
Ok, when defining a module's IO, i.e. any signals/wires going in and out, all must be labelled as an input, output, or bidir.  (There are a few others, but not all compilers support them.)

Now, I know you have:
reg [MAX_ADDR_BITS - 1:0] rd_addr_pipe;

However, if you want that reg to be exposed on your function's declared IO port, yes, it will also need to be declared as an output.  The one small exception is that when you declare 'INPUT' with nothing else written, it is automatically considered a 'WIRE' even though you did not specifically write 'input wire'...

To make things easier on yourself, you should declare your function's IO pins the way you have them in your first verilog program, 'sync_generator.v'.  Writing out each passed wire/bus as an 'input wire', 'output wire' or 'output reg' would really clean things up for coding.  I know my 15-year-old OSD code stuffed everything together like a 'C' code declaration.

From now on, declare your verilog functions and update your current one like this:
Code: [Select]
module sync_generator(
// inputs
input wire clk, // base clock
input wire pix_enable, // pixel clock
input wire reset, // reset: restarts frame
// outputs
output reg hsync, // horizontal sync
output reg vsync, // vertical sync
output wire [9:0] x, // current pixel x position
output wire [9:0] y, // current pixel y position
output reg inDisplayArea // high when in display area (valid drawing area)
);


Now, for the memory data out.  That clocking and registering of that bus is something happening inside Quartus' 'altsyncram' function.  From your point of view, that data signal is an 'OUTPUT' of the 'altsyncram' function which you are receiving, so, in your function, it is like you are receiving an input.  You do not want to pass the data out through another clocked register in your code.  It is already delayed 2 clocks behind the address input, and putting it through a reg will make that 3 clocks before the data bus exits your verilog function module.  In your function's declaration, just make a wire:

---------------------------------------------
output wire [7:0] data_out_a,
---------------------------------------------



However, if Quartus does not like this, you might need to declare an internal subwire.  This means just above your 'altsyncram' function, add this:
--------------------------------------------------
wire [7:0] sub_data_out_a;
wire [7:0] data_out_a = sub_data_out_a[7:0];
----------------------------------------------

And in the 'altsyncram' change:
         .q_a (data_out_a),
to:
         .q_a (sub_data_out_a),


In the body of your @(posedge clk), you do not need to touch the read data; it's just a wire passing straight through from the RAM block to your outer function.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 10, 2019, 04:20:22 am
OT: Thread read rankings:
In FPGA, we are #35 out of 285 since the inception of EEVblog forum.
In FPGA, in threads reads for a thread which existed for only the last 3 weeks, we are #1!

Ok, I realize that FPGA is a low read count on this forum...

Can't wait for the OP to get past this synchronous parallel-access memory.  The programmable memory-pointing address generators are the last fancy thing; everything else is nothing more than 2 lookup palette table rams and a few conditional A/B stencil switches, and the project will be finished.

Adding an internal DVI serializer would be version 1.5, as it needs a routed PCB.
Adding a sophisticated pixel/line/box copy/drawing blitter would be considered next-level version #2, though it is just added logic.  I've made sure everything else that exists is written in a way which will support the plugin.

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 10, 2019, 12:49:36 pm
Ok, when defining a module's IO, i.e. any signals/wires going in and out, all must be labeled as an input, an output, or bidir. (There are a few others, but not all compilers support them.)

Now, I know you have:
reg [MAX_ADDR_BITS - 1:0] rd_addr_pipe;

However, if you want that reg to be exposed on your function's declared IO port, yes, it will also need to be declared as an output as well. 

That's my point though, it's just an internal register? The output from the altsyncram function goes into the pipe, which then goes into the output (addr_out_a, or pc_ena_out).  Do I still need to declare the pipe as an out as well then?

Now, for the memory data out.  That clocking and registering of that bus is something happening inside Quartus' 'altsyncram' function.  From your point of view, that data signal is an 'OUTPUT' of the 'altsyncram' function which you are receiving, so, in your function, it is as if you are receiving an input.  You do not want to pass the data out through another clocked register in your code.  It is already delayed 2 clocks behind the address input, and putting it through a reg would make that 3 clocks before the data bus exits your Verilog function module.

Seems to compile with no errors as is - see latest gpu_dual_port_ram_INTEL.v code below.

Code: [Select]
module gpu_dual_port_ram_INTEL (

// inputs
input clk,
input [2:0] pc_ena_in,
input clk_host,
input wr_en_host,
input [19:0] addr_a,
input [19:0] addr_b,
input [7:0] data_in_b,

// registered outputs
output reg [19:0] addr_out_a,
output reg [2:0] pc_ena_out,

// direct outputs
output wire [7:0] data_out_a,
output wire [7:0] data_out_b

);

// define the maximum address bit - effectively the RAM size
parameter MAX_ADDR_BITS = 14;

// define delay pipe registers
reg [MAX_ADDR_BITS - 1:0] rd_addr_pipe;
reg [3:0] pc_ena_pipe;

// ****************************************************************************************************************************
// Dual-port GPU RAM
//
// Port A - read only by GPU
// Port B - read/writeable by host system
// Data buses - 8 bits / 1 byte wide
// Address buses - MAX_ADDR_BITS wide (14 bits default)
// Memory word size - 2^MAX_ADDR_BITS (16384 bytes default)
// ****************************************************************************************************************************
altsyncram altsyncram_component (
.clock0 (clk),
.wren_a (1'b1),
.address_b (addr_b[MAX_ADDR_BITS - 1:0]),
.clock1 (clk_host),
.data_b (data_in_b),
.wren_b (wr_en_host),
.address_a (addr_a[MAX_ADDR_BITS - 1:0]),
.data_a (8'b00000000),
.q_a (data_out_a),
.q_b (data_out_b),
.aclr0 (1'b0),
.aclr1 (1'b0),
.addressstall_a (1'b0),
.addressstall_b (1'b0),
.byteena_a (1'b1),
.byteena_b (1'b1),
.clocken0 (1'b1),
.clocken1 (1'b1),
.clocken2 (1'b1),
.clocken3 (1'b1),
.eccstatus (),
.rden_a (1'b1),
.rden_b (1'b1));

defparam
altsyncram_component.address_reg_b = "CLOCK1",
altsyncram_component.clock_enable_input_a = "BYPASS",
altsyncram_component.clock_enable_input_b = "BYPASS",
altsyncram_component.clock_enable_output_a = "BYPASS",
altsyncram_component.clock_enable_output_b = "BYPASS",
altsyncram_component.indata_reg_b = "CLOCK1",
altsyncram_component.init_file = "../osd_mem.mif",
altsyncram_component.intended_device_family = "Cyclone IV E",
altsyncram_component.lpm_type = "altsyncram",
altsyncram_component.numwords_a = 2 ** MAX_ADDR_BITS,
altsyncram_component.numwords_b = 2 ** MAX_ADDR_BITS,
altsyncram_component.operation_mode = "BIDIR_DUAL_PORT",
altsyncram_component.outdata_aclr_a = "NONE",
altsyncram_component.outdata_aclr_b = "NONE",
altsyncram_component.outdata_reg_a = "CLOCK0",
altsyncram_component.outdata_reg_b = "CLOCK1",
altsyncram_component.power_up_uninitialized = "FALSE",
altsyncram_component.read_during_write_mode_port_a = "OLD_DATA",
altsyncram_component.read_during_write_mode_port_b = "OLD_DATA",
altsyncram_component.widthad_a = MAX_ADDR_BITS - 1,
altsyncram_component.widthad_b = MAX_ADDR_BITS - 1,
altsyncram_component.width_a = 8,
altsyncram_component.width_b = 8,
altsyncram_component.width_byteena_a = 1,
altsyncram_component.width_byteena_b = 1,
altsyncram_component.wrcontrol_wraddress_reg_b = "CLOCK1";

// ****************************************************************************************************************************

always @(posedge clk) begin

// **************************************************************************************************************************
// *** Create a serial pipe where the PIPE_DELAY parameter selects the pixel count delay for the xxx_in to the xxx_out ports
// **************************************************************************************************************************
rd_addr_pipe  <= addr_a;
addr_out_a <= rd_addr_pipe;

pc_ena_pipe <= pc_ena_in;
pc_ena_out <= pc_ena_pipe;
// **************************************************************************************************************************

end

endmodule


In the body of your @(posedge clk), you do not need to touch the read data, it's just a wire right through from ram block to your outer function.

Okay, the code above should be updated with the changes you've pointed out.  The only area I'm not sure about is the delay pipes - they're internal to the function and don't directly output, hence my reluctance to declare them as outputs.  Let me know if this is wrong and I've misunderstood something!   :popcorn:
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 10, 2019, 01:19:27 pm
Ok, when defining a module's IO, i.e. any signals/wires going in and out, all must be labeled as an input, an output, or bidir. (There are a few others, but not all compilers support them.)

Now, I know you have:
reg [MAX_ADDR_BITS - 1:0] rd_addr_pipe;

However, if you want that reg to be exposed on your function's declared IO port, yes, it will also need to be declared as an output as well. 


That's my point though, it's just an internal register? The output from the altsyncram function goes into the pipe, which then goes into the output (addr_out_a, or pc_ena_out).  Do I still need to declare the pipe as an out as well then?

Nope, we just want the final addr_out[] reg exposed.
Note that no matter the configured ram size, we still want to pass through all 20 address bits to the output, even if the ram is configured to only 16 or 17.
Quote

Now, for the memory data out.  That clocking and registering of that bus is something happening inside Quartus' 'altsyncram' function.  From your point of view, that data signal is an 'OUTPUT' of the 'altsyncram' function which you are receiving, so, in your function, it is as if you are receiving an input.  You do not want to pass the data out through another clocked register in your code.  It is already delayed 2 clocks behind the address input, and putting it through a reg would make that 3 clocks before the data bus exits your Verilog function module.

Seems to compile with no errors as is - see latest gpu_dual_port_ram_INTEL.v code below.

Code: [Select]
module gpu_dual_port_ram_INTEL (

// inputs
input clk,
input [2:0] pc_ena_in,
input clk_host,
input wr_en_host,
input [19:0] addr_a,
input [19:0] addr_b,
input [7:0] data_in_b,

// registered outputs
output reg [19:0] addr_out_a,
output reg [2:0] pc_ena_out,

// direct outputs
output wire [7:0] data_out_a,
output wire [7:0] data_out_b

);

// define the maximum address bit - effectively the RAM size
parameter MAX_ADDR_BITS = 14;

// define delay pipe registers
reg [MAX_ADDR_BITS - 1:0] rd_addr_pipe;
reg [3:0] pc_ena_pipe;

// ****************************************************************************************************************************
// Dual-port GPU RAM
//
// Port A - read only by GPU
// Port B - read/writeable by host system
// Data buses - 8 bits / 1 byte wide
// Address buses - MAX_ADDR_BITS wide (14 bits default)
// Memory word size - 2^MAX_ADDR_BITS (16384 bytes default)
// ****************************************************************************************************************************
altsyncram altsyncram_component (
.clock0 (clk),
.wren_a (1'b1),
.address_b (addr_b[MAX_ADDR_BITS - 1:0]),
.clock1 (clk_host),
.data_b (data_in_b),
.wren_b (wr_en_host),
.address_a (addr_a[MAX_ADDR_BITS - 1:0]),
.data_a (8'b00000000),
.q_a (data_out_a),
.q_b (data_out_b),
.aclr0 (1'b0),
.aclr1 (1'b0),
.addressstall_a (1'b0),
.addressstall_b (1'b0),
.byteena_a (1'b1),
.byteena_b (1'b1),
.clocken0 (1'b1),
.clocken1 (1'b1),
.clocken2 (1'b1),
.clocken3 (1'b1),
.eccstatus (),
.rden_a (1'b1),
.rden_b (1'b1));

defparam
altsyncram_component.address_reg_b = "CLOCK1",
altsyncram_component.clock_enable_input_a = "BYPASS",
altsyncram_component.clock_enable_input_b = "BYPASS",
altsyncram_component.clock_enable_output_a = "BYPASS",
altsyncram_component.clock_enable_output_b = "BYPASS",
altsyncram_component.indata_reg_b = "CLOCK1",
altsyncram_component.init_file = "../osd_mem.mif",
altsyncram_component.intended_device_family = "Cyclone IV E",
altsyncram_component.lpm_type = "altsyncram",
altsyncram_component.numwords_a = 2 ** MAX_ADDR_BITS,
altsyncram_component.numwords_b = 2 ** MAX_ADDR_BITS,
altsyncram_component.operation_mode = "BIDIR_DUAL_PORT",
altsyncram_component.outdata_aclr_a = "NONE",
altsyncram_component.outdata_aclr_b = "NONE",
altsyncram_component.outdata_reg_a = "CLOCK0",
altsyncram_component.outdata_reg_b = "CLOCK1",
altsyncram_component.power_up_uninitialized = "FALSE",
altsyncram_component.read_during_write_mode_port_a = "OLD_DATA",
altsyncram_component.read_during_write_mode_port_b = "OLD_DATA",
altsyncram_component.widthad_a = MAX_ADDR_BITS - 1,
altsyncram_component.widthad_b = MAX_ADDR_BITS - 1,
altsyncram_component.width_a = 8,
altsyncram_component.width_b = 8,
altsyncram_component.width_byteena_a = 1,
altsyncram_component.width_byteena_b = 1,
altsyncram_component.wrcontrol_wraddress_reg_b = "CLOCK1";

// ****************************************************************************************************************************

always @(posedge clk) begin

// **************************************************************************************************************************
// *** Create a serial pipe where the PIPE_DELAY parameter selects the pixel count delay for the xxx_in to the xxx_out ports
// **************************************************************************************************************************
rd_addr_pipe  <= addr_a;
addr_out_a <= rd_addr_pipe;

pc_ena_pipe <= pc_ena_in;
pc_ena_out <= pc_ena_pipe;
// **************************************************************************************************************************

end

endmodule

Ok, first little error:
altsyncram_component.widthad_a = MAX_ADDR_BITS - 1,
altsyncram_component.widthad_b = MAX_ADDR_BITS - 1,
Get rid of the '-1' here - Intel is asking for the 'width' of the address port, not the highest address wire number.
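In other words, the two defparam lines should read:

```verilog
// widthad_a/b is the number of address bits on the port, so no '-1'
altsyncram_component.widthad_a = MAX_ADDR_BITS,
altsyncram_component.widthad_b = MAX_ADDR_BITS,
```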

Next,
---------------
   rd_addr_pipe  <= addr_a;
   addr_out_a <= rd_addr_pipe;
   
   pc_ena_pipe <= pc_ena_in;
   pc_ena_out <= pc_ena_pipe;
------------------
Correct.  You forgot - I asked for an additional auxiliary command to be piped through, just like the addr_a in & out pipe.  Call it 'cmd_in' and 'cmd_out' and make it 16 bits, i.e. [15:0]...

The command is useful, for example, to direct the destination read to a specific register, or to describe a specific pixel type, like bitplane or 256 color, or to set how many pixels that read pixel should clock for, like a horizontal size/scale.  It can also be used to further expand one or more of our read ports into another 2 through 16 parallel read channels operating below the 25 MHz pixel clock speed.  (For example, maybe a multichannel audio playback engine...)

Do not worry about all the extra wires, registers, and ports.  The compiler automatically simplifies out any un-wired logic.

Quote


In the body of your @(posedge clk), you do not need to touch the read data, it's just a wire right through from ram block to your outer function.

Okay, the code above should be updated with the changes you've pointed out.  The only area I'm not sure about is the delay pipes - they're internal to the function and don't directly output, hence my reluctance to declare them as outputs.  Let me know if this is wrong and I've misunderstood something!   :popcorn:

With the last addition of the cmd in & out pipe, it's time to wire this module into the multiport gpu ram.
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 10, 2019, 01:25:47 pm
Also, you can optionally change:
--------------------------
input clk_host,
input wr_en_host,
-------------------------
to:
input clk_b
input wr_en_b

If you like....

Note that it is the job of the 'multiport_gpu_ram' to wire the 2-port memory and wire port B into the names clk_host, wr_en_host, data_in_host, data_out_host...


remember, you labeled port 2 addr_b, data in B, data out B....

And don't forget to change the:
reg [MAX_ADDR_BITS - 1:0] rd_addr_pipe;
to:
reg [19:0] rd_addr_pipe;
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 10, 2019, 02:20:42 pm
Nope, we just want the final addr_out[] reg exposed.
Note that no matter the configured ram size, we still want to pass through all 20 address bits to the output, even if the ram is configured to only 16 or 17.

Marvellous, was thinking along the right lines then. 

Correct.  You forgot, I asked for an additional auxiliary command to be piped through just like the addr_a in&out pipe.  Call it  'cmd_in' and 'cmd_out' and make it 16 bits ie. [15:0]...

Yes, my memory isn't my strong point.  ::)  Sorted now.

With the last addition of the cmd in & out pipe, it's time to wire this module into the multiport gpu ram.

I've made a start on the multiport_gpu_ram already - lots of things going off here though, so can't focus on this at the moment - will be back to it later tonight.

Current multiport_gpu_ram.v:

Code: [Select]
module multiport_gpu_ram (

input clk, // Primary clk input (125 MHz)
input [3:0] pc_ena, // Pixel clock enable
input clk_b, // Host (Z80) clock input
input write_ena_b, // Host (Z80) clock enable

// address buses (input)
input [19:0] address_0,
input [19:0] address_1,
input [19:0] address_2,
input [19:0] address_3,
input [19:0] address_4,
input [19:0] addr_host,

// auxiliary read command buses (input)
input [7:0] aux_read_0,
input [7:0] aux_read_1,
input [7:0] aux_read_2,
input [7:0] aux_read_3,
input [7:0] aux_read_4,

// address pass-thru bus (output)
output reg [19:0] addr_passthru,

// auxiliary read command buses (pass-thru output)
output reg [7:0] auxRdPT_0,
output reg [7:0] auxRdPT_1,
output reg [7:0] auxRdPT_2,
output reg [7:0] auxRdPT_3,
output reg [7:0] auxRdPT_4,

// data buses (output)
output reg [7:0] dataOUT_0,
output reg [7:0] dataOUT_1,
output reg [7:0] dataOUT_2,
output reg [7:0] dataOUT_3,
output reg [7:0] dataOUT_4,
output [7:0] data_host

);

// dual-port GPU RAM handler

// define the maximum address bits - effectively the RAM size
parameter MAX_ADDR_BITS = 20;

reg [MAX_ADDR_BITS - 1:0] address_mux;
reg [7:0] aux_read_mux;

// create a GPU RAM instance
gpu_dual_port_ram_INTEL gpu_RAM(
.clk(clk),
.pc_ena_in(pc_ena),
.clk_b(clk_b),
.wr_en_b(wr_en_b),
.addr_a(address_mux),
.addr_b(),
.data_in_b(),
.addr_out_a(addr_passthru),
.pc_ena_out(),
.data_out_a(aux_read_mux),
.data_out_b()
);

always @(posedge clk) begin

// perform 5:1 mux for all inputs to the dual-port RAM
case (pc_ena[2:0])
3'b000 : begin
address_mux <= address_0;
aux_read_mux <= aux_read_0;
addr_passthru <= address_0;
end
3'b001 : begin
address_mux <= address_1;
aux_read_mux <= aux_read_1;
addr_passthru <= address_1;
end
3'b011 : begin
address_mux <= address_2;
aux_read_mux <= aux_read_2;
addr_passthru <= address_2;
end
3'b100 : begin
address_mux <= address_3;
aux_read_mux <= aux_read_3;
addr_passthru <= address_3;
end
3'b101 : begin
address_mux <= address_4;
aux_read_mux <= aux_read_4;
addr_passthru <= address_4;
end
endcase

end // always @clk

endmodule

Have merged all the pass-through addresses into one bus - otherwise can't see the point of having them in the mux at all?  addr_passthru is the passed-through address, muxed from one of the five input addresses depending on pc_ena value.  Hopefully that's right.   ???

Also, you can optionally change:
--------------------------
input clk_host,
input wr_en_host,
-------------------------
to:
input clk_b
input wr_en_b

If you like....

Note that it is the job of the 'multiport_gpu_ram' to wire the 2-port memory and wire port B into the names clk_host, wr_en_host, data_in_host, data_out_host...


remember, you labeled port 2 addr_b, data in B, data out B....

And don't forget to change the:
reg [MAX_ADDR_BITS - 1:0] rd_addr_pipe;
to:
reg [19:0] rd_addr_pipe;

Think I've done all that - things have gotten busy here though and I've had to cut this short, so I'm attaching the files for perusal, but there may be incomplete bits or errors.

gpu_dual_port_ram_INTEL:

Code: [Select]
module gpu_dual_port_ram_INTEL (

// inputs
input clk,
input [3:0] pc_ena_in,
input clk_b,
input wr_en_b,
input [19:0] addr_a,
input [19:0] addr_b,
input [7:0] data_in_b,
input [15:0] cmd_in,

// registered outputs
output reg [19:0] addr_out_a,
output reg [2:0] pc_ena_out,
output reg [15:0] cmd_out,

// direct outputs
output wire [7:0] data_out_a,
output wire [7:0] data_out_b

);

// define the maximum address bit - effectively the RAM size
parameter MAX_ADDR_BITS = 14;

// define delay pipe registers
reg [19:0] rd_addr_pipe_a;
reg [15:0] cmd_pipe;
reg [3:0] pc_ena_pipe;

// ****************************************************************************************************************************
// Dual-port GPU RAM
//
// Port A - read only by GPU
// Port B - read/writeable by host system
// Data buses - 8 bits / 1 byte wide
// Address buses - MAX_ADDR_BITS wide (14 bits default)
// Memory word size - 2^MAX_ADDR_BITS (16384 bytes default)
// ****************************************************************************************************************************
altsyncram altsyncram_component (
.clock0 (clk),
.wren_a (1'b1),
.address_b (addr_b[MAX_ADDR_BITS:0]),
.clock1 (clk_b),
.data_b (data_in_b),
.wren_b (wr_en_b),
.address_a (addr_a[MAX_ADDR_BITS:0]),
.data_a (8'b00000000),
.q_a (data_out_a),
.q_b (data_out_b),
.aclr0 (1'b0),
.aclr1 (1'b0),
.addressstall_a (1'b0),
.addressstall_b (1'b0),
.byteena_a (1'b1),
.byteena_b (1'b1),
.clocken0 (1'b1),
.clocken1 (1'b1),
.clocken2 (1'b1),
.clocken3 (1'b1),
.eccstatus (),
.rden_a (1'b1),
.rden_b (1'b1));

defparam
altsyncram_component.address_reg_b = "CLOCK1",
altsyncram_component.clock_enable_input_a = "BYPASS",
altsyncram_component.clock_enable_input_b = "BYPASS",
altsyncram_component.clock_enable_output_a = "BYPASS",
altsyncram_component.clock_enable_output_b = "BYPASS",
altsyncram_component.indata_reg_b = "CLOCK1",
altsyncram_component.init_file = "../osd_mem.mif",
altsyncram_component.intended_device_family = "Cyclone IV E",
altsyncram_component.lpm_type = "altsyncram",
altsyncram_component.numwords_a = 2 ** MAX_ADDR_BITS,
altsyncram_component.numwords_b = 2 ** MAX_ADDR_BITS,
altsyncram_component.operation_mode = "BIDIR_DUAL_PORT",
altsyncram_component.outdata_aclr_a = "NONE",
altsyncram_component.outdata_aclr_b = "NONE",
altsyncram_component.outdata_reg_a = "CLOCK0",
altsyncram_component.outdata_reg_b = "CLOCK1",
altsyncram_component.power_up_uninitialized = "FALSE",
altsyncram_component.read_during_write_mode_port_a = "OLD_DATA",
altsyncram_component.read_during_write_mode_port_b = "OLD_DATA",
altsyncram_component.widthad_a = MAX_ADDR_BITS - 1,
altsyncram_component.widthad_b = MAX_ADDR_BITS - 1,
altsyncram_component.width_a = 8,
altsyncram_component.width_b = 8,
altsyncram_component.width_byteena_a = 1,
altsyncram_component.width_byteena_b = 1,
altsyncram_component.wrcontrol_wraddress_reg_b = "CLOCK1";

// ****************************************************************************************************************************

always @(posedge clk) begin

// **************************************************************************************************************************
// *** Create a serial pipe where the PIPE_DELAY parameter selects the pixel count delay for the xxx_in to the xxx_out ports
// **************************************************************************************************************************
rd_addr_pipe <= addr_a;
addr_out_a <= rd_addr_pipe;

cmd_pipe <= cmd_in;
cmd_out <= cmd_pipe;

pc_ena_pipe <= pc_ena_in;
pc_ena_out <= pc_ena_pipe;
// **************************************************************************************************************************

end

endmodule
Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 10, 2019, 03:00:03 pm

I've made a start on the multiport_gpu_ram already - lots of things going off here though, so can't focus on this at the moment - will be back to it later tonight.

Current multiport_gpu_ram.v:

Code: [Select]
module multiport_gpu_ram (

input clk, // Primary clk input (125 MHz)
input [3:0] pc_ena, // Pixel clock enable
input clk_b, // Host (Z80) clock input
input write_ena_b, // Host (Z80) clock enable

// address buses (input)
input [19:0] address_0,
input [19:0] address_1,
input [19:0] address_2,
input [19:0] address_3,
input [19:0] address_4,
input [19:0] addr_host,

// auxiliary read command buses (input)
input [7:0] aux_read_0,
input [7:0] aux_read_1,
input [7:0] aux_read_2,
input [7:0] aux_read_3,
input [7:0] aux_read_4,

// address pass-thru bus (output)
output reg [19:0] addr_passthru,

// auxiliary read command buses (pass-thru output)
output reg [7:0] auxRdPT_0,
output reg [7:0] auxRdPT_1,
output reg [7:0] auxRdPT_2,
output reg [7:0] auxRdPT_3,
output reg [7:0] auxRdPT_4,

// data buses (output)
output reg [7:0] dataOUT_0,
output reg [7:0] dataOUT_1,
output reg [7:0] dataOUT_2,
output reg [7:0] dataOUT_3,
output reg [7:0] dataOUT_4,
output [7:0] data_host

);

// dual-port GPU RAM handler

// define the maximum address bits - effectively the RAM size
parameter MAX_ADDR_BITS = 20;

reg [MAX_ADDR_BITS - 1:0] address_mux;
reg [7:0] aux_read_mux;

// create a GPU RAM instance
gpu_dual_port_ram_INTEL gpu_RAM(
.clk(clk),
.pc_ena_in(pc_ena),
.clk_b(clk_b),
.wr_en_b(wr_en_b),
.addr_a(address_mux),
.addr_b(),
.data_in_b(),
.addr_out_a(addr_passthru),
.pc_ena_out(),
.data_out_a(aux_read_mux),
.data_out_b()
);

always @(posedge clk) begin

// perform 5:1 mux for all inputs to the dual-port RAM
case (pc_ena[2:0])
3'b000 : begin
address_mux <= address_0;
aux_read_mux <= aux_read_0;
addr_passthru <= address_0;
end
3'b001 : begin
address_mux <= address_1;
aux_read_mux <= aux_read_1;
addr_passthru <= address_1;
end
3'b011 : begin
address_mux <= address_2;
aux_read_mux <= aux_read_2;
addr_passthru <= address_2;
end
3'b100 : begin
address_mux <= address_3;
aux_read_mux <= aux_read_3;
addr_passthru <= address_3;
end
3'b101 : begin
address_mux <= address_4;
aux_read_mux <= aux_read_4;
addr_passthru <= address_4;
end
endcase

end // always @clk

endmodule

Have merged all the pass-through addresses into one bus - otherwise can't see the point of having them in the mux at all?  addr_passthru is the passed-through address, muxed from one of the five input addresses depending on pc_ena value.  Hopefully that's right.   ???

Ok, from what I can see so far:
After initiating the "gpu_dual_port_ram_INTEL gpu_RAM(.....);", you need the:
---------------------------
          defparam
                           gpu_RAM.MAX_ADDR_BITS = MAX_ADDR_BITS ;
---------------------------
     This will pass the module multiport_gpu_ram's MAX_ADDR_BITS parameter into gpu_dual_port_ram_INTEL's MAX_ADDR_BITS parameter.  It may also be useful to pass the 'altsyncram_component.numwords_a&b', since it may be possible to allocate 24kb in the FPGA (it has that much memory), yet not 32kb.
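As an aside, most tools also accept the ANSI-style parameter override at instantiation, which does the same job in one step (a stylistic alternative only - the defparam above is what the thread uses):

```verilog
// Equivalent parameter pass-down at instantiation time
gpu_dual_port_ram_INTEL #(
    .MAX_ADDR_BITS(MAX_ADDR_BITS)   // parent's parameter feeds the child's
) gpu_RAM (
    .clk(clk),
    .pc_ena_in(pc_ena),
    .addr_a(address_mux)
    // ... remaining port connections as before ...
);
```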

   .addr_out_a(addr_passthru), should change to (addr_passthru_mux) and don't forget to declare it as a wire.
   .data_out_a(aux_read_mux) is also a wire.
   .pc_ena_out(pc_ena_out), is also a wire and an output

   // address pass-thru bus (output)
   output reg [19:0] addr_out,   There are 5 of these, to match the read address inputs 0 through 4.

   // auxiliary read command buses (input)
   input [7:0] aux_read_0,
   input [7:0] aux_read_1,
   input [7:0] aux_read_2,
   input [7:0] aux_read_3,
   input [7:0] aux_read_4,
 change all these to cmd_in[15:0].  (global search and replace)

   // auxiliary read command buses (pass-thru output)
   output reg [7:0] auxRdPT_0,
   output reg [7:0] auxRdPT_1,
   output reg [7:0] auxRdPT_2,
   output reg [7:0] auxRdPT_3,
   output reg [7:0] auxRdPT_4,
change these to cmd_out[15:0]

reg [MAX_ADDR_BITS - 1:0] address_mux; 
change to reg [19:0] address_mux; 

reg [7:0] aux_read_mux;
change to reg [15:0] cmd_read_mux  (global search and replace)

You're missing a few of the new ports for 'gpu_dual_port_ram_INTEL gpu_RAM(...);'

Almost done.  Next you will sort the read ram contents and the piped-through address & cmds into their output registers, and sync those to your new delayed 'pc_ena_out[3:0]' coming out of the Intel ram module.

Note that we forgot to wire the 'pc_ena_out[3:0]' coming out of the Intel ram module through to the multiport_gpu_ram ( ...) ports, so that the rest of our graphics pipe heading to the output pins will incorporate the delay shift generated by the memory.  (Though we could work around this by re-syncing all the ram outputs back to the next pc_ena_in==0 cycle, this ena signal is beginning to drive so much logic - limiting our FMAX - that this is an opportune point to D-clock pipe the signals for the second half of our graphics pipe.)
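A sketch of that re-sort stage, steering each read back to its own output register with the delayed enable (here 'data_out_ram' stands in for whatever wire carries the RAM's port A read data, and the sequential case values are illustrative):

```verilog
// Demultiplex the shared RAM read port back into the 5 output channels.
// pc_ena_out is the pipe-delayed copy of pc_ena, so it identifies which
// channel's address reached the RAM the matching number of clocks ago.
always @(posedge clk) begin
    case (pc_ena_out[2:0])
        3'b000 : dataOUT_0 <= data_out_ram;
        3'b001 : dataOUT_1 <= data_out_ram;
        3'b010 : dataOUT_2 <= data_out_ram;
        3'b011 : dataOUT_3 <= data_out_ram;
        3'b100 : dataOUT_4 <= data_out_ram;
        default: ;                  // idle slots: hold previous values
    endcase
end
```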

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: nockieboy on November 11, 2019, 10:12:38 am
After initiating the "gpu_dual_port_ram_INTEL gpu_RAM(.....);", you need the:
---------------------------
          defparam
                           gpu_RAM.MAX_ADDR_BITS = MAX_ADDR_BITS ;
---------------------------
     This will pass the module multiport_gpu_ram's MAX_ADDR_BITS parameter into gpu_dual_port_ram_INTEL's MAX_ADDR_BITS parameter.  It may also be useful to pass the 'altsyncram_component.numwords_a&b', since it may be possible to allocate 24kb in the FPGA (it has that much memory), yet not 32kb.

Okay, stupid question - altsyncram specifies altsyncram_component.numwords_a (and b) - I had 2 ** MAXSIZE in there, but if they're the number of words, I'll need to divide that by word size (8), otherwise the RAM will (try to be) 8 times larger than what I think I'm specifying?

So, for example, this:

Code: [Select]
// define the memory size (number of words) - this allows RAM sizes other than multiples of 2
// but defaults to power-of-two sizing based on MAX_ADDR_BITS if not otherwise specified
parameter WORDS = 2 ** MAX_ADDR_BITS;


..needs to be this:

Code: [Select]
// define the memory size (number of words) - this allows RAM sizes other than multiples of 2
// but defaults to power-of-two sizing based on MAX_ADDR_BITS if not otherwise specified
parameter WORDS = (2 ** MAX_ADDR_BITS) / 8;


??

   // address pass-thru bus (output)
   output reg [19:0] addr_out,   There are 5 of these, to match the read address inputs 0 through 4.

   // auxiliary read command buses (input)
   input [7:0] aux_read_0,
   input [7:0] aux_read_1,
   input [7:0] aux_read_2,
   input [7:0] aux_read_3,
   input [7:0] aux_read_4,
 change all these to cmd_in[15:0].  (global search and replace)

   // auxiliary read command buses (pass-thru output)
   output reg [7:0] auxRdPT_0,
   output reg [7:0] auxRdPT_1,
   output reg [7:0] auxRdPT_2,
   output reg [7:0] auxRdPT_3,
   output reg [7:0] auxRdPT_4,
change these to cmd_out[15:0]

reg [MAX_ADDR_BITS - 1:0] address_mux; 
change to reg [19:0] address_mux; 

reg [7:0] aux_read_mux;
change to reg [15:0] cmd_read_mux  (global search and replace)

These should all be present and correct now... I think.  Got a little confused earlier with all the changes, so I'll be double-checking it all, but I think it would benefit from a close look.

You're missing a few of the new ports for 'gpu_dual_port_ram_INTEL gpu_RAM(...);'

They should all be present and correct now.  :D

Almost done.  Next you will sort the read ram contents and the piped-through address & cmds into their output registers, and sync those to your new delayed 'pc_ena_out[3:0]' coming out of the Intel ram module.

Have made a bit of a start on this - the 5:1 mux code is modified according to my present understanding.  The read address is passed through to the ram module, the pass-through address is passed out to the appropriate address bus according to the current mux step, as is the data read from memory.

I'm a little unsure about the command bus, though.  It's piped into the memory via cmd_read_mux, but that seems like an unnecessary step as I only have one cmd_in bus (and one cmd_out bus) - should these be increased to 5 as well?  It's possible I've misunderstood your instruction to 'change all these to cmd_in[15:0]'...  ???

Note that we forgot to wire the 'pc_ena_out[3:0]' coming out of the Intel ram module through to the multiport_gpu_ram ( ...) ports, so that the rest of our graphics pipe heading to the output pins will incorporate the delay shift generated by the memory.  (Though we could work around this by re-syncing all the ram outputs back to the next pc_ena_in==0 cycle, this ena signal is beginning to drive so much logic - limiting our FMAX - that this is an opportune point to D-clock pipe the signals for the second half of our graphics pipe.)

Okay, I think I understand - but pc_ena passes through the gpu_dual_port_ram_INTEL module via a register pipe, which will fulfil the need to D-clock the signal, right?

gpu_dual_port_ram_INTEL.v:

Code: [Select]
module gpu_dual_port_ram_INTEL (

// inputs
input clk,
input [3:0] pc_ena_in,
input clk_b,
input wr_en_b,
input [19:0] addr_a,
input [19:0] addr_b,
input [7:0] data_in_b,
input [15:0] cmd_in,

// registered outputs
output reg [19:0] addr_out_a,
output reg [3:0] pc_ena_out,
output reg [15:0] cmd_out,

// direct outputs
output wire [7:0] data_out_a,
output wire [7:0] data_out_b

);

// define the maximum address bit
parameter MAX_ADDR_BITS = 14;

// define the memory size (number of words) - this allows RAM sizes other than multiples of 2
// but defaults to power-of-two sizing based on MAX_ADDR_BITS if not otherwise specified
parameter WORDS = 2 ** MAX_ADDR_BITS;

// define delay pipe registers
reg [19:0] rd_addr_pipe_a;
reg [15:0] cmd_pipe;
reg [3:0] pc_ena_pipe;

// ****************************************************************************************************************************
// Dual-port GPU RAM
//
// Port A - read only by GPU
// Port B - read/writeable by host system
// Data buses - 8 bits / 1 byte wide
// Address buses - MAX_ADDR_BITS wide (14 bits default)
// Memory word size - 2^MAX_ADDR_BITS (16384 bytes default)
// ****************************************************************************************************************************
altsyncram altsyncram_component (
.clock0 (clk),
.wren_a (1'b1),
.address_b (addr_b[MAX_ADDR_BITS:0]),
.clock1 (clk_b),
.data_b (data_in_b),
.wren_b (wr_en_b),
.address_a (addr_a[MAX_ADDR_BITS:0]),
.data_a (8'b00000000),
.q_a (data_out_a),
.q_b (data_out_b),
.aclr0 (1'b0),
.aclr1 (1'b0),
.addressstall_a (1'b0),
.addressstall_b (1'b0),
.byteena_a (1'b1),
.byteena_b (1'b1),
.clocken0 (1'b1),
.clocken1 (1'b1),
.clocken2 (1'b1),
.clocken3 (1'b1),
.eccstatus (),
.rden_a (1'b1),
.rden_b (1'b1));

defparam
altsyncram_component.address_reg_b = "CLOCK1",
altsyncram_component.clock_enable_input_a = "BYPASS",
altsyncram_component.clock_enable_input_b = "BYPASS",
altsyncram_component.clock_enable_output_a = "BYPASS",
altsyncram_component.clock_enable_output_b = "BYPASS",
altsyncram_component.indata_reg_b = "CLOCK1",
altsyncram_component.init_file = "../osd_mem.mif",
altsyncram_component.intended_device_family = "Cyclone IV E",
altsyncram_component.lpm_type = "altsyncram",
altsyncram_component.numwords_a = WORDS,
altsyncram_component.numwords_b = WORDS,
altsyncram_component.operation_mode = "BIDIR_DUAL_PORT",
altsyncram_component.outdata_aclr_a = "NONE",
altsyncram_component.outdata_aclr_b = "NONE",
altsyncram_component.outdata_reg_a = "CLOCK0",
altsyncram_component.outdata_reg_b = "CLOCK1",
altsyncram_component.power_up_uninitialized = "FALSE",
altsyncram_component.read_during_write_mode_port_a = "OLD_DATA",
altsyncram_component.read_during_write_mode_port_b = "OLD_DATA",
altsyncram_component.widthad_a = MAX_ADDR_BITS - 1,
altsyncram_component.widthad_b = MAX_ADDR_BITS - 1,
altsyncram_component.width_a = 8,
altsyncram_component.width_b = 8,
altsyncram_component.width_byteena_a = 1,
altsyncram_component.width_byteena_b = 1,
altsyncram_component.wrcontrol_wraddress_reg_b = "CLOCK1";

// ****************************************************************************************************************************

always @(posedge clk) begin

// **************************************************************************************************************************
// *** Create a serial pipe where the PIPE_DELAY parameter selects the pixel count delay for the xxx_in to the xxx_out ports
// **************************************************************************************************************************
rd_addr_pipe_a <= addr_a;
addr_out_a <= rd_addr_pipe_a;

cmd_pipe <= cmd_in;
cmd_out <= cmd_pipe;

pc_ena_pipe <= pc_ena_in;
pc_ena_out <= pc_ena_pipe;
// **************************************************************************************************************************

end

endmodule


multiport_gpu_ram.v:

Code: [Select]
module multiport_gpu_ram (

input clk, // Primary clk input (125 MHz)
input [3:0] pc_ena_in, // Pixel clock enable
input clk_b, // Host (Z80) clock input
input write_ena_b, // Host (Z80) clock enable

// address buses (input)
input [19:0] address_0,
input [19:0] address_1,
input [19:0] address_2,
input [19:0] address_3,
input [19:0] address_4,
input [19:0] addr_host,

// auxiliary read command buses (input)
input [15:0] cmd_in,

// outputs
output wire [3:0] pc_ena_out,

// address pass-thru bus (output)
output reg [19:0] addr_passthru_0,
output reg [19:0] addr_passthru_1,
output reg [19:0] addr_passthru_2,
output reg [19:0] addr_passthru_3,
output reg [19:0] addr_passthru_4,
output reg [19:0] addr_host_passthru,

// auxiliary read command bus (pass-thru output)
output reg [15:0] cmd_out,

// data buses (output)
output reg [7:0] dataOUT_0,
output reg [7:0] dataOUT_1,
output reg [7:0] dataOUT_2,
output reg [7:0] dataOUT_3,
output reg [7:0] dataOUT_4,
output [7:0] data_host

);

// dual-port GPU RAM handler

// define the maximum address bits - effectively the RAM size
parameter MAX_ADDR_BITS = 20;

reg [19:0] address_mux;
reg [15:0] cmd_read_mux;
wire [19:0] addr_passthru_mux;
wire [7:0] data_mux;

// create a GPU RAM instance
gpu_dual_port_ram_INTEL gpu_RAM(
.clk(clk),
.pc_ena_in(pc_ena_in),
.clk_b(clk_b),
.wr_en_b(write_ena_b),
.addr_a(address_mux),
.addr_b(),
.data_in_b(),
.cmd_in(cmd_read_mux),
.addr_out_a(addr_passthru_mux),
.pc_ena_out(pc_ena_out),
.cmd_out(cmd_out),
.data_out_a(data_mux),
.data_out_b()
);

// pass MAX_ADDR_BITS into the gpu_RAM instance
defparam gpu_RAM.MAX_ADDR_BITS = MAX_ADDR_BITS;

// set non-default word size for the RAM (24 KB)
defparam gpu_RAM.WORDS = 24576;  // ************** should this be divided by 8?

always @(posedge clk) begin

// route non-muxed pass-throughs
cmd_read_mux <= cmd_in;

// perform 5:1 mux for all inputs to the dual-port RAM
case (pc_ena_in[2:0])
3'b000 : begin
address_mux <= address_0;
addr_passthru_0 <= addr_passthru_mux;
dataOUT_0 <= data_mux;
end
3'b001 : begin
address_mux <= address_1;
addr_passthru_1 <= addr_passthru_mux;
dataOUT_1 <= data_mux;
end
3'b011 : begin
address_mux <= address_2;
addr_passthru_2 <= addr_passthru_mux;
dataOUT_2 <= data_mux;
end
3'b100 : begin
address_mux <= address_3;
addr_passthru_3 <= addr_passthru_mux;
dataOUT_3 <= data_mux;
end
3'b101 : begin
address_mux <= address_4;
addr_passthru_4 <= addr_passthru_mux;
dataOUT_4 <= data_mux;
end
endcase

end // always @clk

endmodule

Title: Re: FPGA VGA Controller for 8-bit computer
Post by: BrianHG on November 11, 2019, 10:51:26 am
After instantiating the "gpu_dual_port_ram_INTEL gpu_RAM(.....);", you need the:
---------------------------
          defparam
                           gpu_RAM.MAX_ADDR_BITS = MAX_ADDR_BITS ;
---------------------------
     This will pass the module multiport_gpu_ram's MAX_ADDR_BITS parameter into the gpu_dual_port_ram_INTEL's MAX_ADDR_BITS parameter.  It may also be useful to pass the 'altsyncram_component.numwords_a&b' values, since it may be possible to allocate 24 KB in the FPGA (it has that much memory) when 32 KB would not fit.
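(Side note: the same parameter can also be passed inline at instantiation using Verilog-2001 named parameter association - a sketch, not the thread's actual code, with the port list abbreviated:)

```verilog
// Sketch: inline parameter override, equivalent to the separate defparam.
gpu_dual_port_ram_INTEL #(
    .MAX_ADDR_BITS ( MAX_ADDR_BITS )
) gpu_RAM (
    .clk       ( clk ),
    .pc_ena_in ( pc_ena_in ),
    .addr_a    ( address_mux )
    // ... remaining port connections as in the original instantiation
);
```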

Okay, stupid question - altsyncram specifies altsyncram_component.numwords_a (and b) - I had 2 ** MAXSIZE in there, but if they're the number of words, I'll need to divide that by word size (8), otherwise the RAM will (try to be) 8 times larger than what I think I'm specifying?

So, for example, this:

Code: [Select]
// define the memory size (number of words) - this allows RAM sizes other than powers of 2
// but defaults to power-of-two sizing based on MAX_ADDR_BITS if not otherwise specified
parameter WORDS = 2 ** MAX_ADDR_BITS;


..needs to be this:

Code: [Select]
// define the memory size (number of words) - this allows RAM sizes other than powers of 2
// but defaults to power-of-two sizing based on MAX_ADDR_BITS if not otherwise specified
parameter WORDS = (2 ** MAX_ADDR_BITS) / 8;


??
Nope, the first one - not the divide-by-8.
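(To make the sizing arithmetic explicit - a sketch, not from the thread: numwords counts words, and with width_a = 8 each word is already one byte, so no divide-by-8 is needed:)

```verilog
// numwords_a/b count words, not bits.
parameter  ADDR_SIZE  = 14;
parameter  NUM_WORDS  = 2 ** ADDR_SIZE;   // 16384 words of 8 bits each
localparam TOTAL_BITS = NUM_WORDS * 8;    // 131072 bits = 16 KB of block RAM
```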
Quote

   // address pass-thru bus (output)
   output reg [19:0] addr_out,   There are 5 of these to match the read address ins 0 through 4.

   // auxiliary read command buses (input)
   input [7:0] aux_read_0,
   input [7:0] aux_read_1,
   input [7:0] aux_read_2,
   input [7:0] aux_read_3,
   input [7:0] aux_read_4,
 change all these to cmd_in[15:0].  (global search and replace)

   // auxiliary read command buses (pass-thru output)
   output reg [7:0] auxRdPT_0,
   output reg [7:0] auxRdPT_1,
   output reg [7:0] auxRdPT_2,
   output reg [7:0] auxRdPT_3,
   output reg [7:0] auxRdPT_4,
change these to cmd_out[15:0]

reg [MAX_ADDR_BITS - 1:0] address_mux; 
change to reg [19:0] address_mux; 

reg [7:0] aux_read_mux;
change to reg [15:0] cmd_read_mux  (global search and replace)

These should all be present and correct now... I think.  Got a little confused earlier with all the changes, so I'll be double-checking it all, but I think it would benefit from a close look.

You're missing a few of the new ports for 'gpu_dual_port_ram_INTEL gpu_RAM(...);'

They should all be present and correct now.  :D

Almost done - next you will re-sort the read ram contents and the piped-through address & cmds into their output registers, and sync those to your new delayed 'pc_ena_out[3:0]' coming out of the Intel ram module.

Have made a bit of a start on this - the 5:1 mux code is modified according to my present understanding.  The read address is passed through to the ram module, the pass-through address is passed out to the appropriate address bus according to the current mux step, as is the data read from memory.

I'm a little unsure about the command bus, though.  It's piped into the memory via cmd_read_mux, but that seems like an unnecessary step as I only have one cmd_in bus (and one cmd_out bus) - should these be increased to 5 as well?  It's possible I've misunderstood your instruction to 'change all these to cmd_in[15:0]'...  ???
The 1 command bus is inside the INTEL dual port ram module.  Just like the read addresses, it should be piped through in single-file fashion.
On the multiport GPU ram, there should be 5 cmd_in groups going in, grouped with the 5 read addresses, and 5 grouped 16-bit cmd_outs coming out - just like the 5 read datas and 5 pass-through read addresses, all in parallel...
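(A sketch of how the grouped cmd buses could be muxed and demuxed in lock-step with the address mux - hypothetical names following the thread's conventions, with cmd_mux standing for the piped cmd returning from the RAM instance:)

```verilog
// Sketch only - not the thread's actual module.
module cmd_grouping_sketch (
    input  wire        clk,
    input  wire [3:0]  pc_ena_in,
    input  wire [15:0] cmd_mux,   // piped cmd returning from the RAM instance
    input  wire [15:0] cmd_in_0, cmd_in_1, cmd_in_2, cmd_in_3, cmd_in_4,
    output reg  [15:0] cmd_out_0, cmd_out_1, cmd_out_2, cmd_out_3, cmd_out_4
);
    reg [15:0] cmd_read_mux;  // single cmd bus feeding the RAM module

    always @(posedge clk) begin
        case (pc_ena_in[2:0])  // same phase values as the thread's address mux
            3'b000 : begin cmd_read_mux <= cmd_in_0; cmd_out_0 <= cmd_mux; end
            3'b001 : begin cmd_read_mux <= cmd_in_1; cmd_out_1 <= cmd_mux; end
            3'b011 : begin cmd_read_mux <= cmd_in_2; cmd_out_2 <= cmd_mux; end
            3'b100 : begin cmd_read_mux <= cmd_in_3; cmd_out_3 <= cmd_mux; end
            3'b101 : begin cmd_read_mux <= cmd_in_4; cmd_out_4 <= cmd_mux; end
        endcase
    end
endmodule
```

(In the full design the demux side would key off the delayed pc_ena_out so the returned cmd lands in the register for the channel that requested it.)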
 
Quote

Note that we forgot to wire the 'pc_ena_out[3:0]' coming out of the Intel ram module through to the multiport_gpu_ram ( ...) ports, so that the rest of our graphics pipe heading to the output pins will incorporate the delay shift generated by the memory.  (Though we could work around this by re-syncing all the ram outputs back to the next pc_ena_in==0 cycle, this ena signal in the FPGA is beginning to drive so much logic, limiting our FMAX, that this is an opportune point to D-clock pipe the signals for the second half of our graphics pipe.)

Okay, I think I understand - but pc_ena passes through the gpu_dual_port_ram_INTEL module via a register pipe, which will fulfil the need to D-clock the signal, right?
Pipe it just like a read address and the auxiliary 16-bit cmd, delayed by two 125MHz clocks.  The difference is that when it comes back through the GPU multiport ram module, it is not muxed - it's just wired through without delay.
Quote


gpu_dual_port_ram_INTEL.v:

Code: [Select]
module gpu_dual_port_ram_INTEL (

// inputs
input clk,
input [3:0] pc_ena_in,
input clk_b,
input wr_en_b,
input [19:0] addr_a,
input [19:0] addr_b,
input [7:0] data_in_b,
input [15:0] cmd_in,

// registered outputs
output reg [19:0] addr_out_a,
output reg [3:0] pc_ena_out,
output reg [15:0] cmd_out,

// direct outputs
output wire [7:0] data_out_a,
output wire [7:0] data_out_b

);

// define the maximum address bit
parameter ADDR_SIZE = 14;   **********************************************************

// define the memory size (number of words) - this allows RAM sizes other than powers of 2
// but defaults to power-of-two sizing based on ADDR_SIZE if not otherwise specified
parameter NUM_WORDS = 2 ** ADDR_SIZE;   **********************************************************

// define delay pipe registers
reg [19:0] rd_addr_pipe_a;
reg [15:0] cmd_pipe;
reg [3:0] pc_ena_pipe;

// ****************************************************************************************************************************
// Dual-port GPU RAM
//
// Port A - read only by GPU
// Port B - read/writeable by host system
// Data buses - 8 bits / 1 byte wide
// Address buses - ADDR_SIZE wide (14 bits default)
// Memory word size - 2^ADDR_SIZE (16384 bytes default)
// ****************************************************************************************************************************
altsyncram altsyncram_component (
.clock0 (clk),
.wren_a (1'b1),
.address_b (addr_b[ADDR_SIZE-1:0]),   ***************************************************************
.clock1 (clk_b),
.data_b (data_in_b),
.wren_b (wr_en_b),
.address_a (addr_a[ADDR_SIZE-1:0]),   ****************************************************************************
.data_a (8'b00000000),
.q_a (data_out_a),
.q_b (data_out_b),
.aclr0 (1'b0),
.aclr1 (1'b0),
.addressstall_a (1'b0),
.addressstall_b (1'b0),
.byteena_a (1'b1),
.byteena_b (1'b1),
.clocken0 (1'b1),
.clocken1 (1'b1),
.clocken2 (1'b1),
.clocken3 (1'b1),
.eccstatus (),
.rden_a (1'b1),
.rden_b (1'b1));

defparam
altsyncram_component.address_reg_b = "CLOCK1",
altsyncram_component.clock_enable_input_a = "BYPASS",
altsyncram_component.clock_enable_input_b = "BYPASS",
altsyncram_component.clock_enable_output_a = "BYPASS",
altsyncram_component.clock_enable_output_b = "BYPASS",
altsyncram_component.indata_reg_b = "CLOCK1",
altsyncram_component.init_file = "../osd_mem.mif",
altsyncram_component.intended_device_family = "Cyclone IV E",
altsyncram_component.lpm_type = "altsyncram",
altsyncram_component.numwords_a = NUM_WORDS,
altsyncram_component.numwords_b = NUM_WORDS,
altsyncram_component.operation_mode = "BIDIR_DUAL_PORT",
altsyncram_component.outdata_aclr_a = "NONE",
altsyncram_component.outdata_aclr_b = "NONE",
altsyncram_component.outdata_reg_a = "CLOCK0",
altsyncram_component.outdata_reg_b = "CLOCK1",
altsyncram_component.power_up_uninitialized = "FALSE",
altsyncram_component.read_during_write_mode_port_a = "OLD_DATA",
altsyncram_component.read_during_write_mode_port_b = "OLD_DATA",
altsyncram_component.widthad_a = ADDR_SIZE,  ********************************************************************
altsyncram_component.widthad_b = ADDR_SIZE,  *********************************************************************
altsyncram_component.width_a = 8,
altsyncram_component.width_b = 8,
altsyncram_component.width_byteena_a = 1,
altsyncram_component.width_byteena_b = 1,
altsyncram_component.wrcontrol_wraddress_reg_b = "CLOCK1";

// ****************************************************************************************************************************

always @(posedge clk) begin

// **************************************************************************************************************************
// *** Create a serial pipe where the PIPE_DELAY parameter selects the pixel count delay for the xxx_in to the xxx_out ports
// **************************************************************************************************************************
rd_addr_pipe_a <= addr_a;
addr_out_a <= rd_addr_pipe_a;

cmd_pipe <= cmd_in;
cmd_out <= cmd_pipe;

pc_ena_pipe <= pc_ena_in;
pc_ena_out <= pc_ena_pipe;
// **************************************************************************************************************************

end

endmodule


multiport_gpu_ram.v:

Code: [Select]
module multiport_gpu_ram (

input clk, // Primary clk input (125 MHz)
input [3:0] pc_ena_in, // Pixel clock enable
input clk_b, // Host (Z80) clock input
input write_ena_b, // Host (Z80) clock enable

// address buses (input)
input [19:0] address_0,
input [19:0] address_1,
input [19:0] address_2,
input [19:0] address_3,
input [19:0] address_4,
input [19:0] addr_host,

// auxiliary read command buses (input)
input [15:0] cmd_in,

// outputs
output wire [3:0] pc_ena_out,

// address pass-thru bus (output)
output reg [19:0] addr_passthru_0,
output reg [19:0] addr_passthru_1,
output reg [19:0] addr_passthru_2,
output reg [19:0] addr_passthru_3,
output reg [19:0] addr_passthru_4,
output reg [19:0] addr_host_passthru,

// auxiliary read command bus (pass-thru output)
output reg [15:0] cmd_out,  *************************************  NEED 5x cmd_out0/1/2/3/4 and we also need 5x cmd_in#

// data buses (output)
output reg [7:0] dataOUT_0,
output reg [7:0] dataOUT_1,
output reg [7:0] dataOUT_2,
output reg [7:0] dataOUT_3,
output reg [7:0] dataOUT_4,
output [7:0] data_host

);

// dual-port GPU RAM handler

// define the maximum address bits - effectively the RAM size
parameter ADDR_SIZE = 14;                 *******************************************
parameter NUM_WORDS = 2 ** ADDR_SIZE ;                 *******************************************

reg [19:0] address_mux;
reg [15:0] cmd_read_mux;
wire [19:0] addr_passthru_mux;
wire [7:0] data_mux;

// create a GPU RAM instance
gpu_dual_port_ram_INTEL gpu_RAM(
.clk(clk),
.pc_ena_in(pc_ena_in),
.clk_b(clk_b),
.wr_en_b(write_ena_b),
.addr_a(address_mux),
.addr_b(),
.data_in_b(),
.cmd_in(cmd_read_mux),
.addr_out_a(addr_passthru_mux),
.pc_ena_out(pc_ena_out),
.cmd_out(cmd_out),
.data_out_a(data_mux),
.da