Well, I'm building this GPU for a hardware Z80 system. Whilst I appreciate that the FPGA could do everything my hardware system does, that's not the point of this little project. Perhaps an FPGA is total overkill, though.
I've been looking a little more closely at the Spartan LX45 and it's looking less and less likely that I'll be able to use it, even if I could justify the cost. I think BGA is a step too far for my soldering skills and equipment at this stage, and the sheer number of pins on those FPGAs will stretch my DipTrace licence past breaking point. One way around it is to just use one of the cheap development boards and plug that straight into my 'GPU card'. Limited IO, but with the FPGA, SDRAM, clock and programming circuitry done for me...
Excellent. That gives me a touchstone to explain some things by analogy. By the way, you didn't have anything important to do this weekend, did you? https://archive.org/details/Amiga_Hardware_Reference_Manual_1985_Commodore
Just for interest, you may be aware there are HDL implementations of the Amiga OCS that might be imported directly into your design, with minimal modifications. A Z80 driving the OCS chip set could make for a pretty wild experiment, even at 1/2 the memory bandwidth.
That would be the memory-mapped window into frame buffer RAM I mistakenly believed you disfavored, but yes, I do think it's a very good idea...
I'd also be sure it services reads as well.
But I was actually proposing that you memory- (or I/O-) map the control registers, using the system bus control/address/data signals to more or less directly read/write registers inside the FPGA and control the video hardware, analogous to the common idiom of using 74377 or similar ICs with suitable decoding as byte-wide input-output ports.
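To make the register-port idea concrete, here is a toy Python model of I/O-mapped control registers inside the FPGA; the port addresses and register names are purely hypothetical examples, not anything from the actual design.

```python
# Toy model of I/O-mapped GPU control registers inside the FPGA.
# Port addresses and register names are hypothetical examples.

REG_MODE   = 0x00  # video mode select
REG_BANK   = 0x01  # VRAM bank/offset register
REG_STATUS = 0x02  # read-only status (e.g. a vblank flag)

class GpuRegs:
    def __init__(self):
        self.regs = {REG_MODE: 0, REG_BANK: 0}
        self.vblank = False

    def io_write(self, port, value):
        # Z80 OUT (port), A -> latch the value into the addressed register,
        # just like a 74377 whose clock is gated by the port decode.
        if port in self.regs:
            self.regs[port] = value & 0xFF

    def io_read(self, port):
        # Z80 IN A, (port) -> drive the register contents onto the bus.
        if port == REG_STATUS:
            return 0x80 if self.vblank else 0x00
        return self.regs.get(port, 0xFF)  # undecoded ports read as open bus

gpu = GpuRegs()
gpu.io_write(REG_MODE, 0x03)
print(hex(gpu.io_read(REG_MODE)))  # -> 0x3
```

In the real FPGA the same decode would be a few lines of HDL comparing the address bus against each port, with the data bus latched on the write strobe.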
Quote: "DipTrace"
You could always switch horses to KiCAD. #justsayin
The LX45 is a very nice, very large (by hobby standards) FPGA, I would say that for what you are describing it is massively overkill. Just to put things in perspective, an entire 8 bit computer including the CPU (Grant's Multicomp for example) or any of the bronze age arcade games I've recreated fit comfortably within the ancient and tiny (by current standards) EP2C5T144C8 FPGAs which you can get ready to go on a little dev board for around $12.
There is lots of middle ground too; if you want Xilinx, the LX9 is an inexpensive and very capable part you can get in a reasonably hobbyist-friendly TQFP package.
You can also interface a development board directly to your existing Z80 project and use that for prototyping and then once you have a design implemented that you are satisfied with you can look at the consumed resources and select a less expensive FPGA sufficient for your design and build more tidy custom hardware. One of the really awesome things about FPGAs is that it's very easy to make large portions of the code very portable, a lot of my learning was accomplished by porting projects I found from one platform to another.
I think even the Artix 7 50T has more resources than the LX45, and certainly the 100T, which is becoming quite common, completely dwarfs the LX45. The 35T is somewhat smaller than the LX45 if that matters.
None of that resources stuff matters much when compared to the fact that the old Spartan 6 devices are not supported by Vivado and the ISE 14.7 version is the last release of ISE and it is no longer supported. Yes, I still use it for my Spartan 3 projects but, for new stuff, I'm using the Artix 7 chips and Vivado.
SP2-150, for GameBOY ADV
This one is 5V tolerant, and it's interfaced with a GameBOY ADV.
Is the FPGA 5v tolerant, or have you done the level conversions yourself?
Using a gigantic FPGA in order to get enough internal block RAM for a framebuffer is not the right way to go about it. Video cards have been using SDRAM and prior to that SRAM and even regular DRAM to implement framebuffers for many, many years. I'm not going to say that it's "easy" but one of the things SDRAM is specifically designed to be good at is pumping in/out blocks of data synchronously which is precisely what video is doing. Look at some video cards from the late 90s to mid 2000's, these are often made with discrete ICs and relatively easy to tell what's going on. Interfacing to external memory is one of the things FPGAs are optimized to do, they have lots of IO pins and some of them even have onboard dedicated SDRAM interfaces.
Also, since BGA packages provide signal integrity far superior to any leaded package, DDR2 and later memory chips are only available in BGA packages.
Now, I personally love reasonable-pitch BGAs and always choose them over any other packages, but I know some people have some sort of anti-BGA religion, which is why I'm bringing this up.
Are you forgetting that this is being built to work with an 8 bit Z80? I don't think it needs to be anything super fancy, there's no point in making the video card powerful enough to deal with more data than the host CPU is capable of sending to it. We're not talking about modern high power GPUs.
The CPU is still just giving it a bunch of coordinates where to draw things, so you can get some amazing graphics from a Z80 if you give it a powerful graphics chip.
But yeah, for the things you are doing I wouldn't worry about memory bandwidth, since modern memory is so much faster that it will be plenty for anything.
EDIT: Oh, and you would most definitely not want to have your graphics memory directly on the Z80 memory bus; not only does that create multi-master woes, it also slows things down while providing seemingly no real benefit. It's much easier to just connect the FPGA pins to the Z80 bus and have it act as a "gatekeeper" to separate graphics RAM. The FPGA will look to the Z80 like just another SRAM chip so it can read and write to it just the same, but since the FPGA is sitting in between, it can hand over the RAM to the graphics hardware whenever the CPU is not using it. A small part of this same "RAM window" memory range can also be used to hold the graphics hardware registers and any other peripherals you might want to have (sound, keyboard etc.)
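The gatekeeper idea can be sketched as a small software model; the base address, window size and register slice below are hypothetical examples, not the actual memory map.

```python
# Toy model of the FPGA intercepting one 512 KB socket's address range:
# a small slice holds control registers, the rest behaves like plain RAM.
# Base address and register size are hypothetical examples.
WINDOW_BASE = 0x300000          # where the empty socket sits (example)
WINDOW_SIZE = 512 * 1024
REG_SIZE    = 1024              # first 1 KB = graphics registers

vram = bytearray(WINDOW_SIZE)
regs = bytearray(REG_SIZE)

def z80_write(addr, value):
    off = addr - WINDOW_BASE
    if not 0 <= off < WINDOW_SIZE:
        return                   # another socket responds, not the FPGA
    if off < REG_SIZE:
        regs[off] = value        # control register write
    else:
        vram[off] = value        # looks like an ordinary SRAM write

def z80_read(addr):
    off = addr - WINDOW_BASE
    return regs[off] if off < REG_SIZE else vram[off]

z80_write(WINDOW_BASE + 0x10, 0x42)     # lands in a register
z80_write(WINDOW_BASE + 0x2000, 0xAA)   # lands in video RAM
print(hex(z80_read(WINDOW_BASE + 0x2000)))  # -> 0xaa
```

From the Z80's side none of this is visible: reads and writes to that 512 KB range simply work, while the FPGA routes them to registers or video RAM internally.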
Okay, so how about this for the interface to the GPU? My current MMU design gives me up to 8x 512 KB chip sockets. Socket 1 is SRAM, Socket 8 is ROM, 2-7 can be anything you want. If I 'remove' Socket 7 for example, there will be no RAM/ROM in the system trying to reply to the Z80 when it addresses that particular 512KB memory range. I can then get the FPGA to accept all RD/WRs to that 512 KB window and treat them as direct access to the GPU's frame buffer and registers. 512 KB will only use a tiny fraction of the SDRAM on the other side of the FPGA, but will give the Z80 a huge area to load/assemble bitmaps, sprites, LUTs, symbols into.
Does that sound like a workable plan?
EDIT:
I know this sounds like I'm going back on my original statement that I didn't want a frame buffer in the system memory, but this isn't quite the same thing as an 80's multiplexed frame buffer and all the bus arbitration that would come with it. This is physically removing a chunk of the system's memory and having the FPGA replace it, if that makes sense?
Well, nope. The interleaving suggestion implied that the RAM would actually be accessed faster, so the CPU wouldn't see a difference. To pull that off, you would of course need fast enough RAM (which should be no problem with a modern SDRAM chip or even SRAM), and to clock the MMU faster. Also, to keep things simple, you'd have to make the CPU and video clock frequencies integer multiples of each other.
You mean the FPGA would read the 'frame buffer' in the Z80's logical memory space into an internal buffer really quickly, then pass that out as a pixel stream? Otherwise surely the frame buffer would be sending data at the rate the screen mode requires, which means 73% of the time it would be sending data and locking the Z80 out? Not sure I understand this fully.
I don't think you will find a 5V tolerant TQFP FPGA, but I don't think that is really all that big of a problem. Look at the schematic for the Terasic DE2 board I have, (manual is downloadable) the FPGA it uses is not 5V tolerant but the GPIO can be interfaced to 5V logic without problems. Each IO pin has a dual diode between IOVcc and Gnd followed by a resistor so the diodes prevent the pin from getting pulled too high and the resistor limits the current that can be dissipated by the diodes. It's not as good as proper level shifting but in practice I have used it many times without issues. At the speeds a vintage Z80 runs you can get away with quite a lot.
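For a rough idea of how that series resistor gets sized, here is the arithmetic under assumed figures (a 5 V TTL driver, 3.3 V IOVcc, ~0.7 V diode drop, and a 10 mA clamp-current limit; check the actual FPGA datasheet for the real allowed injection current).

```python
# Rough sizing of the series resistor for the diode-clamp trick.
# All figures are assumptions for illustration, not datasheet values.
v_drive = 5.0     # worst-case 5 V logic high
v_iovcc = 3.3     # FPGA I/O bank supply
v_diode = 0.7     # clamp diode forward drop
i_max   = 0.010   # allowed injection current, amps (assumed)

# Voltage left across the resistor once the clamp diode conducts:
v_resistor = v_drive - (v_iovcc + v_diode)   # 1.0 V
r_min = v_resistor / i_max                   # minimum series resistance
print(r_min)  # -> 100.0 ohms
```

In practice people often use a few hundred ohms to leave margin, at the cost of slightly slower edges, which is irrelevant at vintage Z80 speeds.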
The easiest way for me to do that would be to keep one of the memory sockets empty, which would provide a 512 KB window that the FPGA could intercept the reads and writes to.. Half a meg would be more than big enough for a frame buffer for the resolutions I'm wanting to use, and would allow double-buffering on all the major screen modes if I was able to substitute that window for 512 KB of RAM on the FPGA via SDRAM, for example... hmmm... Tempting...
Using the SDRAM on the FPGA seems to me and my inexperienced mind to be a lot more complicated than using dual-port RAM in the FPGA itself. Can anyone convince me otherwise? If I can make use of the SDRAM easily, without any major drawbacks or slowdowns, then I needn't worry about using a big, expensive FPGA with lots of internal RAM...
Oh I've had this discussion with others before. When I first dipped my toes in the waters of electronics a couple of years ago
As a general rule a designer should expect to create their own symbols for any devices out of the ordinary, and treat their presence in the standard or contributed libraries as a happy accident. There are universal librarian tools and services that maintain a library of symbols and footprints in a master format and translate them on demand to whatever eCAD software you have. That doesn't imply that any given universal symbol is well laid out or available for the part you need right now. So a decent footprint/symbol editor and librarian is a must.
Disclaimer, I've not used any of the pro-level software, but I cut my teeth on Eagle 4.x and was happy with it until my designs approached the free tier's size/layer/field-of-use restrictions. Now that Autodesk has made Eagle part of a cloud service, it's a hard pass.
Quote: "With typical SDRAM, not so much: implementing such accesses in slots would cause issues because they would typically not be consecutive, introducing possibly unacceptable latencies (yes SDRAM doesn't "like" fully random accesses)."

Two comments: First, the delays of an activate-precharge cycle might not be noticed within a 125ns Z80 clock cycle, never mind a 375ns memory cycle. There should be plenty of time even with single data rate SDRAM to complete a burst of 8 or 16 bytes into a pixel FIFO and perform any byte read/write from host or blitter that might or might not be waiting, in a fixed rhythm. Second, the Z80 was designed to make DRAM usage easy, with the row address on the bus early. With some clever SDRAM controller programming, the row address can likewise be passed along early to the SDRAM chip to open the row early, allowing usage of slower DRAM, maybe even the good old asynchronous DRAM chips which are still available, but at closer to $1 per megabyte than $1 per gigabit. Still, I think it's overkill for the present application.
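A quick back-of-envelope check of the first comment, using assumed but plausible numbers (100 MHz single-data-rate SDRAM, ~2-clock tRCD, CAS latency and precharge):

```python
# Does an 8-byte SDRAM burst plus activate/precharge overhead fit inside
# one Z80 memory cycle? All timing figures are assumptions for illustration.
sdram_clk_ns     = 10    # 100 MHz single-data-rate SDRAM
z80_mem_cycle_ns = 375   # ~3 T-states at 8 MHz, as in the post

# activate (tRCD ~2 clk) + CAS latency (~2 clk) + 8-beat burst + precharge (~2 clk)
cycles   = 2 + 2 + 8 + 2
burst_ns = cycles * sdram_clk_ns
print(burst_ns, burst_ns < z80_mem_cycle_ns)  # -> 140 True
```

Even with generous overhead the burst completes in well under half a Z80 memory cycle, which is why a fixed-rhythm controller can service both a pixel FIFO and host accesses without the CPU ever noticing.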
But the FPGA can also do some address decoding of its own to map some hardware control registers into the first 1KB of its 512KB address space while the rest is a window into video RAM. This gives you the area to control the video hardware (like choosing at what address the framebuffer is for double or triple buffering, selecting video modes, holding sprite tables etc...). You can also have a register that offsets the 512KB RAM window around, letting you roam across 64MB of video memory for example. (Old cartridge-based consoles heavily relied on banking to fit large ROMs into the limited memory space.)
Having access to video memory from the CPU is quite useful since you don't need to implement GPU functionality to load images and tables into video memory, instead you just load it in yourself. Also the CPU can use this memory as part of its program. For example if you are making a game you can use the sprite table to hold the position and type of enemy characters on screen rather than keeping this data separate in CPU RAM and then having to update the sprite table in video memory on every frame.
But the main benefit of having video RAM separate and behind an FPGA is that the video RAM can run at full speed. The FPGA can only draw pixels as fast as it can write them to memory, so having 10 times faster memory means it can draw 10 times more pixels per second.
Though the memory bandwidth is more important if you go the modern drawcall route, since that tends to keep everything in RAM, rather than the fully hardware-based tilemaps and sprites that generate graphics on the fly without writing to RAM at all. On the other hand, the drawcall route is more flexible, as it can draw any image onto any other image while simultaneously modifying it, whereas sprites and tilemaps tend to be limited to fixed sizes and grid patterns. But you can still package up tilemap functionality in the form of a draw call like "Draw tilemap(at 0x5000) using tileset(at 0x7200) into framebuffer(at 0x1000 with the offset 944,785)"
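A toy software version of that "draw tilemap" call, to show what the blitter would actually do; tile size, map layout and the byte-per-pixel framebuffer are made-up choices for illustration.

```python
# Toy "draw tilemap" call: expand a grid of tile indices through a
# tileset into a linear byte-per-pixel framebuffer. All sizes/layouts
# here are hypothetical examples.
TILE_W = TILE_H = 8

def draw_tilemap(fb, fb_w, tilemap, map_w, tileset, ox=0, oy=0):
    """Blit each referenced tile to its grid position, offset by (ox, oy)."""
    for i, tile_idx in enumerate(tilemap):
        tx = (i % map_w) * TILE_W + ox
        ty = (i // map_w) * TILE_H + oy
        tile = tileset[tile_idx]          # TILE_H rows of TILE_W bytes
        for row in range(TILE_H):
            dst = (ty + row) * fb_w + tx
            fb[dst:dst + TILE_W] = tile[row * TILE_W:(row + 1) * TILE_W]

# 16x16 framebuffer, 2x2 map built from two solid-colour tiles
fb = bytearray(16 * 16)
tiles = [bytes([c]) * (TILE_W * TILE_H) for c in (0, 1)]
draw_tilemap(fb, 16, [0, 1, 1, 0], 2, tiles)
print(fb[0], fb[8])  # -> 0 1
```

In hardware the inner loop becomes the blitter's address generators, and the per-row copy is exactly the kind of short sequential burst SDRAM is good at.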
As I said, to do this in a simple manner, the CPU and video controller clocks should be synchronized and their frequencies integer multiples of each other. For instance, if the video controller needs 3 times as much bandwidth, the RAM access would have 4 slots: 1 for the CPU, and 3 for the video controller. The memory access (MMU?) clock would be 4 times the CPU clock here, for instance, and the RAM would be accessed at this elevated frequency (which would still not be that fast for a modern RAM chip). Note that it would be relatively easy to do with SRAM, as long as its access time fits the requirements. With typical SDRAM, not so much: implementing such accesses in slots would cause issues because they would typically not be consecutive, introducing possibly unacceptable latencies (yes SDRAM doesn't "like" fully random accesses).
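The fixed slot scheme is just a modulo counter; a minimal sketch of the 1-CPU-slot-in-4 example:

```python
# Sketch of the fixed-slot interleave: with the memory clock at 4x the
# CPU clock, slot 0 of every group of 4 belongs to the CPU and slots
# 1-3 to the video controller.
def slot_owner(mem_cycle):
    return "cpu" if mem_cycle % 4 == 0 else "video"

owners = [slot_owner(n) for n in range(8)]
print(owners)
# -> ['cpu', 'video', 'video', 'video', 'cpu', 'video', 'video', 'video']
```

Because the CPU's slot comes around at exactly its own clock rate, it never sees a wait state; this is the same trick 8-bit-era machines used to share DRAM between CPU and video.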
And AFAICT the Z80 has TTL input levels, so it'll work with 3.3V inputs, and the usual wimpy TTL high already limits the output current. The Xilinx datasheets usually say it is OK to use the ESD diodes for voltage limiting as long as the input current is limited to something like 20mA and the total is less than the sink current on that supply.
KiCAD 4 was a hot mess two years ago... Fast forward a couple of years, with the help of the universe-destroyers at CERN and a full-time project manager KiCAD 5 has blossomed quite a bit. The symbol/footprint librarians have been given extra, dearly needed love. There's an interactive push-and-shove router. The three layout toolsets have been unified and rationalized. The whole thing now feels a bit less like a student project and more like a workflow for people to use. Even our gracious host Dave, who worked at Altium by the way, has been moderately impressed with it, even if some of the pro-level conveniences are lacking. If it's been a couple of years since you've had a look, I encourage you to give the latest a fresh hour or two.
Be aware that there is still no whole-board autorouter built into KiCAD, and you would need to use an external application for autorouting such as FreeRouting, TopoR, etc. Personally I don't consider that a problem as I prefer more control over where my traces go and how they get there. But, if that's a deal-breaker for you, fair enough, I won't push it.
As a general rule a designer should expect to create their own symbols for any devices out of the ordinary, and treat their presence in the standard or contributed libraries as a happy accident.
Since you probably don't need more than a few megabytes of memory, I think SRAM is a safer choice. It is truly random access and has no confusing pipelining.
I have worked with FPGAs here and there and pipelining still hurts my brain, as when writing pipelined HDL code it's hard to keep track of which register is holding the data for which cycle of the pipeline. I usually have to resort to drawing a timing diagram of the whole thing so that I can visually see how many cycles something is behind or in front of something else, and then turn that timing diagram into code.
Firstly I would recommend visiting YouTube and watching the videos of "Land Boards, LLC" as I think their videos are a perfect fit for answering your questions. I found this channel only recently. They have several videos on implementing Grant Searle's design on a couple of cheap FPGA boards one of which is a Cyclone II board which you might have. They also sell on Tindie and in their videos show a Static RAM expansion board (no idea if this can still be bought). BTW - I have no connection with this company. My interest was the videos they show of the cheap Chinese ZR-Tech Cyclone IV FPGA board. They keep mentioning their GitHub repository, so I think you are likely to find everything you need right there!
Without knowing your level of FPGA, RTL, electronics and video experience/knowledge it is difficult to give accurate advice...
...but I would suggest start with simple video generation using VGA. There are many text only VGA displays to be found (e.g. OpenCores) which are a great starting point. You will need to get familiar with how video displays work, horizontal and vertical sync and video timings.
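Those timings are worth internalising early; the standard 640x480@60 VGA figures (well-known published values) show where the familiar ~25 MHz pixel clock comes from:

```python
# Standard 640x480@60 VGA timing: the pixel clock falls straight out of
# total line length x total frame height x refresh rate.
h_total = 640 + 16 + 96 + 48   # active + front porch + hsync + back porch = 800
v_total = 480 + 10 + 2 + 33    # active + front porch + vsync + back porch = 525
refresh = 60                   # Hz, nominal

pixel_clock = h_total * v_total * refresh
print(h_total, v_total, pixel_clock)  # -> 800 525 25200000
```

(The official rate is 25.175 MHz with a 59.94 Hz refresh; 25.2 MHz or even 25 MHz from a convenient PLL works fine on virtually every monitor.)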
Text only video needs little memory so will fit in any FPGA and use internal RAM blocks.
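The memory arithmetic makes the point: a character display needs a few KB, while even modest bitmap modes run into the hundreds of KB (mode choices below are examples).

```python
# Rough memory budget: character mode vs bitmap framebuffers.
# The modes chosen are illustrative examples.
text_bytes   = 80 * 25 * 2          # char + attribute byte per cell
fb_320x240x8 = 320 * 240 * 8 // 8   # 8 bpp, 256 colours
fb_640x480x4 = 640 * 480 * 4 // 8   # 4 bpp, 16 colours

print(text_bytes, fb_320x240x8, fb_640x480x4)  # -> 4000 76800 153600
```

4 KB of character RAM plus a 2-4 KB font fits in the block RAM of nearly any FPGA; the bitmap modes are what push you toward external SRAM/SDRAM.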
Hmmm. OP: Can you definitively say what resolution you want for an 8 bit computer? Earlier you said you wanted XXXkb. Also, will you have a text mode with addressable modifiable font? Will the font be dedicated in memory, or use a system memory? How many colors? Color palette? Different video modes on different lines on the screen? Sprites?
If 64kb is enough, and you want an all internal no ram design, look for a PLD/FPGA with at least 1 megabit internal so it may hold everything with some extra space, otherwise, at the level you seem to be at I strongly recommend buying an existing development eval board for either Altera or Xilinx with a VGA or HDMI out and at least 1 DDR/DDR2 ram chip at minimum. Make sure the eval board is documented well and not a Chinese only demo code which you cannot dissect to your liking.
Example all in 1, 144pin qfp PLD (no boot prom needed, 1 single supply voltage, I narrowed the selection to 64kbyte and 128kbyte)
https://www.digikey.com/products/en/integrated-circuits-ics/embedded-fpgas-field-programmable-gate-array/696?FV=ii1290240%7C1329%2Cii1677312%7C1329%2Cii691200%7C1329%2C-8%7C696%2C16%7C13077&quantity=0&ColumnSort=1000011&page=1&stock=1&nstock=1&k=max10&pageSize=25&pkeyword=max10
Yes, this is not the cheapest, but it is an all-in-1 IC with single supply, no external parts except for a video DAC (use 4x 74HC574 and resistors if you like) and a TTL 3.3V-5V translator IC for the data bus, though you might get away with Schottky diode clamps and series resistors on the inputs coming from your 5V digital logic.
There are two issues with getting an FPGA with a large enough internal memory - otherwise I'd go straight for the Spartan 6 LX45 (or a more modern equivalent) - that's price and, most importantly, the package the FPGA comes in. I don't have the equipment or skills to reflow a 484-pin BGA onto one of my PCBs.