Well, I'm building this GPU for a hardware Z80 system. Whilst I appreciate that the FPGA could do everything my hardware system does, that's not the point of this little project. Perhaps an FPGA is total overkill, though.
I've been looking a little more closely at the Spartan LX45 and it's looking less and less likely that I'll be able to use it, even if I could justify the cost. I think BGA is a step too far for my soldering skills and equipment at this stage, and the sheer number of pins on those FPGAs will stretch my DipTrace licence past breaking point. One way around it is to just use one of the cheap development boards and plug that straight into my 'GPU card'. Limited IO, but with the FPGA, SDRAM, clock and programming circuitry done for me...
Excellent. That gives me a touchstone to explain some things by analogy. By the way, you didn't have anything important to do this weekend, did you? https://archive.org/details/Amiga_Hardware_Reference_Manual_1985_Commodore
Just for interest, you may be aware there are HDL implementations of the Amiga OCS that might be imported directly into your design, with minimal modifications. A Z80 driving the OCS chip set could make for a pretty wild experiment, even at 1/2 the memory bandwidth.
That would be the memory-mapped window into frame buffer RAM I mistakenly believed you disfavored, but yes, I do think it's a very good idea...
I'd also be sure it services reads as well.
But I was actually proposing that you memory- (or I/O-) map the control registers, using the system bus control/address/data signals to more or less directly read/write registers inside the FPGA and control the video hardware, analogous to the common idiom of using 74377 or similar ICs with suitable decoding as byte-wide input-output ports.
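To make the register-port idea concrete, here is a toy Python model of I/O-mapped control registers inside the FPGA; the port addresses and register names are purely hypothetical examples, not anything from the actual design.

```python
# Toy model of I/O-mapped GPU control registers inside the FPGA.
# Port addresses and register names are hypothetical examples.

REG_MODE   = 0x00  # video mode select
REG_BANK   = 0x01  # VRAM bank/offset register
REG_STATUS = 0x02  # read-only status (e.g. a vblank flag)

class GpuRegs:
    def __init__(self):
        self.regs = {REG_MODE: 0, REG_BANK: 0}
        self.vblank = False

    def io_write(self, port, value):
        # Z80 OUT (port), A -> latch the value into the addressed register,
        # just like a 74377 whose clock is gated by the port decode.
        if port in self.regs:
            self.regs[port] = value & 0xFF

    def io_read(self, port):
        # Z80 IN A, (port) -> drive the register contents onto the bus.
        if port == REG_STATUS:
            return 0x80 if self.vblank else 0x00
        return self.regs.get(port, 0xFF)  # undecoded ports read as open bus

gpu = GpuRegs()
gpu.io_write(REG_MODE, 0x03)
print(hex(gpu.io_read(REG_MODE)))  # -> 0x3
```

In the real FPGA the same decode would be a few lines of HDL comparing the address bus against each port, with the data bus latched on the write strobe.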
Quote: "DipTrace"
You could always switch horses to KiCAD. #justsayin
The LX45 is a very nice, very large (by hobby standards) FPGA, I would say that for what you are describing it is massively overkill. Just to put things in perspective, an entire 8 bit computer including the CPU (Grant's Multicomp for example) or any of the bronze age arcade games I've recreated fit comfortably within the ancient and tiny (by current standards) EP2C5T144C8 FPGAs which you can get ready to go on a little dev board for around $12.
There is lots of middle ground too; if you want Xilinx, the LX9 is an inexpensive and very capable part you can get in a reasonably hobbyist-friendly TQFP package.
You can also interface a development board directly to your existing Z80 project and use that for prototyping and then once you have a design implemented that you are satisfied with you can look at the consumed resources and select a less expensive FPGA sufficient for your design and build more tidy custom hardware. One of the really awesome things about FPGAs is that it's very easy to make large portions of the code very portable, a lot of my learning was accomplished by porting projects I found from one platform to another.
I think even the Artix 7 50T has more resources than the LX45, and certainly the 100T, which is becoming quite common, completely dwarfs the LX45. The 35T is somewhat smaller than the LX45 if that matters.
None of that resources stuff matters much when compared to the fact that the old Spartan 6 devices are not supported by Vivado and the ISE 14.7 version is the last release of ISE and it is no longer supported. Yes, I still use it for my Spartan 3 projects but, for new stuff, I'm using the Artix 7 chips and Vivado.
SP2-150, for GameBOY ADV
This one is 5V tolerant, and it's interfaced with a GameBOY ADV.
Is the FPGA 5v tolerant, or have you done the level conversions yourself?
Using a gigantic FPGA in order to get enough internal block RAM for a framebuffer is not the right way to go about it. Video cards have been using SDRAM and prior to that SRAM and even regular DRAM to implement framebuffers for many, many years. I'm not going to say that it's "easy" but one of the things SDRAM is specifically designed to be good at is pumping in/out blocks of data synchronously which is precisely what video is doing. Look at some video cards from the late 90s to mid 2000's, these are often made with discrete ICs and relatively easy to tell what's going on. Interfacing to external memory is one of the things FPGAs are optimized to do, they have lots of IO pins and some of them even have onboard dedicated SDRAM interfaces.
Also, since BGA packages provide signal integrity far superior to any leaded package, DDR2 and later memory chips are only available in BGA packages.
Now, I personally love reasonable-pitch BGAs and always choose them over any other packages, but I know some people have some sort of anti-BGA religion, which is why I'm bringing this up.
Are you forgetting that this is being built to work with an 8 bit Z80? I don't think it needs to be anything super fancy, there's no point in making the video card powerful enough to deal with more data than the host CPU is capable of sending to it. We're not talking about modern high power GPUs.
The CPU is still just giving it a bunch of coordinates where to draw things, so you can get some amazing graphics from a Z80 if you give it a powerful graphics chip.
But yeah, for the things you are doing I wouldn't worry about memory bandwidth, since modern memory is so much faster that it will be plenty for anything.
EDIT: Oh, and you would most definitely not want to have your graphics memory directly on the Z80 memory bus; not only does that create multi-master woes, it also slows things down while providing seemingly no real benefit. It's much easier to just connect the FPGA pins to the Z80 bus and have it act as a "gatekeeper" to separate graphics RAM. The FPGA will look to the Z80 like just another SRAM chip so it can read and write to it just the same, but since the FPGA is sitting in between, it can hand over the RAM to the graphics hardware whenever the CPU is not using it. A small part of this same "RAM window" memory range can also be used to hold the graphics hardware registers and any other peripherals you might want to have (sound, keyboard etc.)
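The gatekeeper idea can be sketched as a small software model; the base address, window size and register slice below are hypothetical examples, not the actual memory map.

```python
# Toy model of the FPGA intercepting one 512 KB socket's address range:
# a small slice holds control registers, the rest behaves like plain RAM.
# Base address and register size are hypothetical examples.
WINDOW_BASE = 0x300000          # where the empty socket sits (example)
WINDOW_SIZE = 512 * 1024
REG_SIZE    = 1024              # first 1 KB = graphics registers

vram = bytearray(WINDOW_SIZE)
regs = bytearray(REG_SIZE)

def z80_write(addr, value):
    off = addr - WINDOW_BASE
    if not 0 <= off < WINDOW_SIZE:
        return                   # another socket responds, not the FPGA
    if off < REG_SIZE:
        regs[off] = value        # control register write
    else:
        vram[off] = value        # looks like an ordinary SRAM write

def z80_read(addr):
    off = addr - WINDOW_BASE
    return regs[off] if off < REG_SIZE else vram[off]

z80_write(WINDOW_BASE + 0x10, 0x42)     # lands in a register
z80_write(WINDOW_BASE + 0x2000, 0xAA)   # lands in video RAM
print(hex(z80_read(WINDOW_BASE + 0x2000)))  # -> 0xaa
```

From the Z80's side none of this is visible: reads and writes to that 512 KB range simply work, while the FPGA routes them to registers or video RAM internally.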
Okay, so how about this for the interface to the GPU? My current MMU design gives me up to 8x 512 KB chip sockets. Socket 1 is SRAM, Socket 8 is ROM, 2-7 can be anything you want. If I 'remove' Socket 7 for example, there will be no RAM/ROM in the system trying to reply to the Z80 when it addresses that particular 512KB memory range. I can then get the FPGA to accept all RD/WRs to that 512 KB window and treat them as direct access to the GPU's frame buffer and registers. 512 KB will only use a tiny fraction of the SDRAM on the other side of the FPGA, but will give the Z80 a huge area to load/assemble bitmaps, sprites, LUTs, symbols into.
Does that sound like a workable plan?
EDIT:
I know this sounds like I'm going back on my original statement that I didn't want a frame buffer in the system memory, but this isn't quite the same thing as an 80's multiplexed frame buffer and all the bus arbitration that would come with it. This is physically removing a chunk of the system's memory and having the FPGA replace it, if that makes sense?
Well, nope. The interleaving suggestion implied that the RAM would actually be accessed faster, so the CPU wouldn't see a difference. To pull that off, you would of course need fast enough RAM (which should be no problem with a modern SDRAM chip or even SRAM), and to clock the MMU faster. Also, to keep things simple, you'd have to make the CPU and video clock frequencies integer multiples of each other.
You mean the FPGA would read the 'frame buffer' in the Z80's logical memory space into an internal buffer really quickly, then pass that out as a pixel stream? Otherwise surely the frame buffer would be sending data at the rate the screen mode requires, which means 73% of the time it would be sending data and locking the Z80 out? Not sure I understand this fully.
I don't think you will find a 5V tolerant TQFP FPGA, but I don't think that is really all that big of a problem. Look at the schematic for the Terasic DE2 board I have, (manual is downloadable) the FPGA it uses is not 5V tolerant but the GPIO can be interfaced to 5V logic without problems. Each IO pin has a dual diode between IOVcc and Gnd followed by a resistor so the diodes prevent the pin from getting pulled too high and the resistor limits the current that can be dissipated by the diodes. It's not as good as proper level shifting but in practice I have used it many times without issues. At the speeds a vintage Z80 runs you can get away with quite a lot.
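For a rough idea of how that series resistor gets sized, here is the arithmetic under assumed figures (a 5 V TTL driver, 3.3 V IOVcc, ~0.7 V diode drop, and a 10 mA clamp-current limit; check the actual FPGA datasheet for the real allowed injection current).

```python
# Rough sizing of the series resistor for the diode-clamp trick.
# All figures are assumptions for illustration, not datasheet values.
v_drive = 5.0     # worst-case 5 V logic high
v_iovcc = 3.3     # FPGA I/O bank supply
v_diode = 0.7     # clamp diode forward drop
i_max   = 0.010   # allowed injection current, amps (assumed)

# Voltage left across the resistor once the clamp diode conducts:
v_resistor = v_drive - (v_iovcc + v_diode)   # 1.0 V
r_min = v_resistor / i_max                   # minimum series resistance
print(r_min)  # -> 100.0 ohms
```

In practice people often use a few hundred ohms to leave margin, at the cost of slightly slower edges, which is irrelevant at vintage Z80 speeds.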
The easiest way for me to do that would be to keep one of the memory sockets empty, which would provide a 512 KB window that the FPGA could intercept the reads and writes to.. Half a meg would be more than big enough for a frame buffer for the resolutions I'm wanting to use, and would allow double-buffering on all the major screen modes if I was able to substitute that window for 512 KB of RAM on the FPGA via SDRAM, for example... hmmm... Tempting...
Using the SDRAM on the FPGA seems to me and my inexperienced mind to be a lot more complicated than using dual-port RAM in the FPGA itself. Can anyone convince me otherwise? If I can make use of the SDRAM easily, without any major drawbacks or slowdowns, then I needn't worry about using a big, expensive FPGA with lots of internal RAM...
Oh I've had this discussion with others before. When I first dipped my toes in the waters of electronics a couple of years ago
As a general rule a designer should expect to create their own symbols for any devices out of the ordinary, and treat their presence in the standard or contributed libraries as a happy accident. There are universal librarian tools and services that maintain a library of symbols and footprints in a master format and translate them on demand to whatever eCAD software you have. That doesn't imply that any given universal symbol is well laid out or available for the part you need right now. So a decent footprint/symbol editor and librarian is a must.
Disclaimer, I've not used any of the pro-level software, but I cut my teeth on Eagle 4.x and was happy with it until my designs approached the free tier's size/layer/field-of-use restrictions. Now that Autodesk has made Eagle part of a cloud service, it's a hard pass.
Quote: "With typical SDRAM, not so much: implementing such accesses in slots would cause issues because they would typically not be consecutive, introducing possibly unacceptable latencies (yes SDRAM doesn't "like" fully random accesses)."

Two comments: First, the delays of an activate-precharge cycle might not be noticed within a 125ns Z80 clock cycle, never mind a 375ns memory cycle. There should be plenty of time even with single data rate SDRAM to complete a burst of 8 or 16 bytes into a pixel FIFO and perform any byte read/write from host or blitter that might or might not be waiting, in a fixed rhythm. Second, the Z80 was designed to make DRAM usage easy, with the row address on the bus early. With some clever SDRAM controller programming, the row address can likewise be passed along early to the SDRAM chip to open the row early, allowing usage of slower DRAM, maybe even the good old asynchronous DRAM chips which are still available, but at closer to $1 per megabyte than $1 per gigabit. Still, I think it's overkill for the present application.
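A quick back-of-envelope check of the first comment, using assumed but plausible numbers (100 MHz single-data-rate SDRAM, ~2-clock tRCD, CAS latency and precharge):

```python
# Does an 8-byte SDRAM burst plus activate/precharge overhead fit inside
# one Z80 memory cycle? All timing figures are assumptions for illustration.
sdram_clk_ns     = 10    # 100 MHz single-data-rate SDRAM
z80_mem_cycle_ns = 375   # ~3 T-states at 8 MHz, as in the post

# activate (tRCD ~2 clk) + CAS latency (~2 clk) + 8-beat burst + precharge (~2 clk)
cycles   = 2 + 2 + 8 + 2
burst_ns = cycles * sdram_clk_ns
print(burst_ns, burst_ns < z80_mem_cycle_ns)  # -> 140 True
```

Even with generous overhead the burst completes in well under half a Z80 memory cycle, which is why a fixed-rhythm controller can service both a pixel FIFO and host accesses without the CPU ever noticing.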
But the FPGA can also do some address decoding of its own to map some hardware control registers into the first 1KB of its 512KB address space while the rest is a window into video RAM. This gives you the area to control the video hardware (like choosing at what address the framebuffer is for double or triple buffering, selecting video modes, holding sprite tables etc...). You can also have a register that offsets the 512KB RAM window around, letting you roam across 64MB of video memory for example. (Old cartridge-based consoles heavily relied on banking to fit large ROMs into the limited memory space.)
Having access to video memory from the CPU is quite useful since you don't need to implement GPU functionality to load images and tables into video memory, instead you just load it in yourself. Also the CPU can use this memory as part of its program. For example if you are making a game you can use the sprite table to hold the position and type of enemy characters on screen rather than keeping this data separate in CPU RAM and then having to update the sprite table in video memory on every frame.
But the main benefit of having video RAM separate and behind an FPGA is that the video RAM can run at full speed. The FPGA can only draw pixels as fast as it can write them to memory, so having 10 times faster memory means it can draw 10 times more pixels per second.
Though the memory bandwidth is more important if you go the modern drawcall route, since that tends to keep everything in RAM, rather than the fully hardware-based tilemaps and sprites that generate graphics on the fly without writing to RAM at all. On the other hand, the drawcall route is more flexible, as it can draw any image onto any other image while simultaneously modifying it, whereas sprites and tilemaps tend to be limited to fixed sizes and grid patterns. But you can still package up tilemap functionality in the form of a draw call like "Draw tilemap(at 0x5000) using tileset(at 0x7200) into framebuffer(at 0x1000 with the offset 944,785)"
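A toy software version of that "draw tilemap" call, to show what the blitter would actually do; tile size, map layout and the byte-per-pixel framebuffer are made-up choices for illustration.

```python
# Toy "draw tilemap" call: expand a grid of tile indices through a
# tileset into a linear byte-per-pixel framebuffer. All sizes/layouts
# here are hypothetical examples.
TILE_W = TILE_H = 8

def draw_tilemap(fb, fb_w, tilemap, map_w, tileset, ox=0, oy=0):
    """Blit each referenced tile to its grid position, offset by (ox, oy)."""
    for i, tile_idx in enumerate(tilemap):
        tx = (i % map_w) * TILE_W + ox
        ty = (i // map_w) * TILE_H + oy
        tile = tileset[tile_idx]          # TILE_H rows of TILE_W bytes
        for row in range(TILE_H):
            dst = (ty + row) * fb_w + tx
            fb[dst:dst + TILE_W] = tile[row * TILE_W:(row + 1) * TILE_W]

# 16x16 framebuffer, 2x2 map built from two solid-colour tiles
fb = bytearray(16 * 16)
tiles = [bytes([c]) * (TILE_W * TILE_H) for c in (0, 1)]
draw_tilemap(fb, 16, [0, 1, 1, 0], 2, tiles)
print(fb[0], fb[8])  # -> 0 1
```

In hardware the inner loop becomes the blitter's address generators, and the per-row copy is exactly the kind of short sequential burst SDRAM is good at.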
As I said, to do this in a simple manner, the CPU and video controller clocks should be synchronized and their frequencies integer multiples of each other. For instance, if the video controller needs 3 times as much bandwidth, the RAM access would have 4 slots: 1 for the CPU, and 3 for the video controller. The memory access (MMU?) clock would be 4 times the CPU clock here, for instance, and the RAM would be accessed at this elevated frequency (which would still not be that fast for a modern RAM chip). Note that it would be relatively easy to do with SRAM, as long as its access time fits the requirements. With typical SDRAM, not so much: implementing such accesses in slots would cause issues because they would typically not be consecutive, introducing possibly unacceptable latencies (yes SDRAM doesn't "like" fully random accesses).
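The fixed slot scheme is just a modulo counter; a minimal sketch of the 1-CPU-slot-in-4 example:

```python
# Sketch of the fixed-slot interleave: with the memory clock at 4x the
# CPU clock, slot 0 of every group of 4 belongs to the CPU and slots
# 1-3 to the video controller.
def slot_owner(mem_cycle):
    return "cpu" if mem_cycle % 4 == 0 else "video"

owners = [slot_owner(n) for n in range(8)]
print(owners)
# -> ['cpu', 'video', 'video', 'video', 'cpu', 'video', 'video', 'video']
```

Because the CPU's slot comes around at exactly its own clock rate, it never sees a wait state; this is the same trick 8-bit-era machines used to share DRAM between CPU and video.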
And AFAICT the Z80 has TTL input levels, so it'll work with 3.3V inputs, and the usual wimpy TTL high already limits the output current. The Xilinx datasheets usually say it is OK to use the ESD diodes for voltage limiting as long as the input current is limited to something like 20mA and the total is less than the sink current on that supply.
KiCAD 4 was a hot mess two years ago... Fast forward a couple of years, with the help of the universe-destroyers at CERN and a full-time project manager KiCAD 5 has blossomed quite a bit. The symbol/footprint librarians have been given extra, dearly needed love. There's an interactive push-and-shove router. The three layout toolsets have been unified and rationalized. The whole thing now feels a bit less like a student project and more like a workflow for people to use. Even our gracious host Dave, who worked at Altium by the way, has been moderately impressed with it, even if some of the pro-level conveniences are lacking. If it's been a couple of years since you've had a look, I encourage you to give the latest a fresh hour or two.
Be aware that there is still no whole-board autorouter built into KiCAD, and you would need to use an external application for autorouting such as FreeRouting, TopoR, etc. Personally I don't consider that a problem as I prefer more control over where my traces go and how they get there. But, if that's a deal-breaker for you, fair enough, I won't push it.
As a general rule a designer should expect to create their own symbols for any devices out of the ordinary, and treat their presence in the standard or contributed libraries as a happy accident.
Since you probably don't need more than a few megabytes of memory, I think SRAM is a safer choice. It is truly random access and has no confusing pipelining.
I have worked with FPGAs here and there and pipelining still hurts my brain, as when writing pipelined HDL code it's hard to keep track of which register is holding the data for which cycle of the pipeline. I usually have to resort to drawing a timing diagram of the whole thing so that I can visually see how many cycles something is behind or in front of something else, and then turn that timing diagram into code.
Firstly I would recommend visiting YouTube and watching the videos of "Land Boards, LLC" as I think their videos are a perfect fit for answering your questions. I found this channel only recently. They have several videos on implementing Grant Searle's design on a couple of cheap FPGA boards one of which is a Cyclone II board which you might have. They also sell on Tindie and in their videos show a Static RAM expansion board (no idea if this can still be bought). BTW - I have no connection with this company. My interest was the videos they show of the cheap Chinese ZR-Tech Cyclone IV FPGA board. They keep mentioning their GitHub repository, so I think you are likely to find everything you need right there!
Without knowing your level of FPGA, RTL, electronics and video experience/knowledge it is difficult to give accurate advice...
...but I would suggest start with simple video generation using VGA. There are many text only VGA displays to be found (e.g. OpenCores) which are a great starting point. You will need to get familiar with how video displays work, horizontal and vertical sync and video timings.
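Those timings are worth internalising early; the standard 640x480@60 VGA figures (well-known published values) show where the familiar ~25 MHz pixel clock comes from:

```python
# Standard 640x480@60 VGA timing: the pixel clock falls straight out of
# total line length x total frame height x refresh rate.
h_total = 640 + 16 + 96 + 48   # active + front porch + hsync + back porch = 800
v_total = 480 + 10 + 2 + 33    # active + front porch + vsync + back porch = 525
refresh = 60                   # Hz, nominal

pixel_clock = h_total * v_total * refresh
print(h_total, v_total, pixel_clock)  # -> 800 525 25200000
```

(The official rate is 25.175 MHz with a 59.94 Hz refresh; 25.2 MHz or even 25 MHz from a convenient PLL works fine on virtually every monitor.)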
Text only video needs little memory so will fit in any FPGA and use internal RAM blocks.
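The memory arithmetic makes the point: a character display needs a few KB, while even modest bitmap modes run into the hundreds of KB (mode choices below are examples).

```python
# Rough memory budget: character mode vs bitmap framebuffers.
# The modes chosen are illustrative examples.
text_bytes   = 80 * 25 * 2          # char + attribute byte per cell
fb_320x240x8 = 320 * 240 * 8 // 8   # 8 bpp, 256 colours
fb_640x480x4 = 640 * 480 * 4 // 8   # 4 bpp, 16 colours

print(text_bytes, fb_320x240x8, fb_640x480x4)  # -> 4000 76800 153600
```

4 KB of character RAM plus a 2-4 KB font fits in the block RAM of nearly any FPGA; the bitmap modes are what push you toward external SRAM/SDRAM.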
Hmmm. OP: Can you definitively say what resolution you want for an 8 bit computer? Earlier you said you wanted XXXkb. Also, will you have a text mode with addressable modifiable font? Will the font be dedicated in memory, or use a system memory? How many colors? Color palette? Different video modes on different lines on the screen? Sprites?
If 64kb is enough, and you want an all internal no ram design, look for a PLD/FPGA with at least 1 megabit internal so it may hold everything with some extra space, otherwise, at the level you seem to be at I strongly recommend buying an existing development eval board for either Altera or Xilinx with a VGA or HDMI out and at least 1 DDR/DDR2 ram chip at minimum. Make sure the eval board is documented well and not a Chinese only demo code which you cannot dissect to your liking.
Example all in 1, 144pin qfp PLD (no boot prom needed, 1 single supply voltage, I narrowed the selection to 64kbyte and 128kbyte)
https://www.digikey.com/products/en/integrated-circuits-ics/embedded-fpgas-field-programmable-gate-array/696?FV=ii1290240%7C1329%2Cii1677312%7C1329%2Cii691200%7C1329%2C-8%7C696%2C16%7C13077&quantity=0&ColumnSort=1000011&page=1&stock=1&nstock=1&k=max10&pageSize=25&pkeyword=max10
Yes, this is not the cheapest, but it is an all-in-1 IC with single supply, no external parts except for a video DAC (use 4x 74HC574 and resistors if you like) and a TTL 3.3V-5V translator IC for the data bus, though you might get away with Schottky diode clamps and series resistors on the inputs coming from your 5V digital logic.
There are two issues with getting an FPGA with a large enough internal memory - otherwise I'd go straight for the Spartan 6 LX45 (or a more modern equivalent) - that's price and, most importantly, the package the FPGA comes in. I don't have the equipment or skills to reflow a 484-pin BGA onto one of my PCBs.