Write window = +1 to -3, 5 functional positions for DDR3_CLK_WDQ.
This is only 200 ps off centre. The skew between different clocks from the same PLL might be around 100 ps.
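As a rough illustration of the arithmetic behind that remark, here is a minimal sketch that finds the centre of a window of working phase taps. The 200 ps step size and the nominal setting of tap 0 are assumptions chosen to match the figures quoted above, not measured values; the real step depends on the PLL configuration.

```python
def window_centre(passing_taps):
    """Return (centre, width) for a contiguous run of working phase taps."""
    lo, hi = min(passing_taps), max(passing_taps)
    # The taps must form one unbroken window for a single centre to make sense.
    if sorted(passing_taps) != list(range(lo, hi + 1)):
        raise ValueError("passing taps are not contiguous")
    return (lo + hi) / 2, hi - lo + 1


# Write window quoted above: every tap from +1 down to -3 works.
taps = [1, 0, -1, -2, -3]
centre, width = window_centre(taps)

STEP_PS = 200      # assumed phase step in picoseconds (hypothetical)
NOMINAL_TAP = 0    # assumed nominal setting (hypothetical)
offset_ps = abs(centre - NOMINAL_TAP) * STEP_PS

print(f"width={width} taps, centre={centre:+.1f}, offset={offset_ps:.0f} ps")
```

With these assumptions the window is 5 taps wide and centred on tap -1, one step away from tap 0.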
Even ModelSim shows it is off by 1 step when I measure the PLL clock output waveforms. It's as if Altera has forgotten that their phase tap calculation begins at 0, not 1, or there is an integer rounding error in their code. These are not the only bugs I've encountered. Coding the memory controller was the easy part; it was done in the first week and running great in ModelSim, and nothing in that part has changed since. Uncovering all these little undocumented or erroneously documented functions, and waiting for Quartus to compile and test, was the other > 90% of the work.
Just curious, how much time does it take to complete a compilation, and how long to run the tests/simulations?
This is bizarre. Maybe Vivado is not that bad after all.
Next week, expect to wire-in my DDR3 controller in a limited fashion since current GPU core will need updating as it was only designed to address 1 megabyte, but the DECA board has 512 megabytes. There is a lot of crap to update all over the place. Might as well bring everything up to 30-32 bit addressing supporting 1-4 gigabytes, though, I do not know how a Z80 can address all that...
Ok Nockieboy, it is time to wire up your DECA board to a Z80. Please think of a sturdy way to get those connections to the TTL 5v translators.
Get the current project working.
Also, make sure the basics are in place and that !WAIT is wired with an NPN transistor so we can pause the Z80 if a read isn't ready in time.
Your current code should work as is, plus you will have access to >128 kilobytes of on-chip graphics RAM and 15 MAGGIE layers.
The simplest method I can think of is this: implement six memory-mapped 8 bit registers (Z80 is 8bit right?) -
4 registers for the address (32 bit address)
1 register for stride
1 data register
The idea is that once you set up address and stride, you just keep writing data into a single data register (or reading from it), and FPGA will auto-increment the address as per stride value after every successful read from or write to the data port. And now you can address any 32 bit location of the video memory, and pump the data at max CPU speed because it doesn't need to adjust address all the time. Using stride also allows for some advanced scenarios like writing every other byte (common with text modes, which encode each symbol position with two bytes of memory - one for the symbol itself, and another one for background/foreground color).
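To make the address/stride/data idea concrete, here is a minimal behavioural sketch of such a register window. The register offsets (0-3 for the address, least significant byte first, 4 for stride, 5 for data), the unsigned 8-bit stride, and the class name `StridedPort` are all assumptions made for illustration; nothing here is taken from the actual GPU or bridge code.

```python
class StridedPort:
    """Behavioural model of a six-register auto-incrementing memory window."""

    ADDR0, STRIDE, DATA = 0, 4, 5   # hypothetical register offsets

    def __init__(self, memory):
        self.memory = memory        # bytearray standing in for video RAM
        self.address = 0            # assembled 32-bit address
        self.stride = 1             # unsigned 8-bit step (assumed)

    def write(self, reg, value):
        value &= 0xFF               # the host bus is only 8 bits wide
        if self.ADDR0 <= reg < self.ADDR0 + 4:
            # Replace one byte of the 32-bit address, LSB first.
            shift = 8 * (reg - self.ADDR0)
            cleared = self.address & ~(0xFF << shift)
            self.address = (cleared | (value << shift)) & 0xFFFFFFFF
        elif reg == self.STRIDE:
            self.stride = value
        elif reg == self.DATA:
            self.memory[self.address] = value
            self._advance()
        else:
            raise ValueError(f"no such register: {reg}")

    def read(self, reg):
        if reg != self.DATA:
            raise ValueError("only the data register is readable in this sketch")
        value = self.memory[self.address]
        self._advance()
        return value

    def _advance(self):
        # Auto-increment after every completed data access.
        self.address = (self.address + self.stride) & 0xFFFFFFFF


# Text-mode style usage: write characters to every other byte from address 4.
vram = bytearray(16)
port = StridedPort(vram)
for i, b in enumerate((4).to_bytes(4, "little")):
    port.write(StridedPort.ADDR0 + i, b)
port.write(StridedPort.STRIDE, 2)
for ch in b"HI!":
    port.write(StridedPort.DATA, ch)
```

After the loop the three characters sit at addresses 4, 6 and 8, and the attribute bytes in between are untouched, all without the host rewriting the address between writes.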
Ditto. It would be easy enough to set up an MMU or register-based access in the FPGA as you've suggested to allow the Z80 to access any part of an arbitrarily-sized RAM space - the question is how useful it would be. Even if I start thinking about using the GPU RAM as a ram drive for the Z80's operating system (CP/M), it will only need a few megabytes of space (the 64MB CF card I'm currently using is way too big for it to fill any time soon and my BIOS provides 16 drives within that space, of which I'm using four that aren't even close to being filled yet).
However, where that kind of memory space would be more useful is if the host system is 16-bit. This GPU card could be used with any homebrew computer system - if I ever get enough free time again I'll be building a 16-bit Motorola 68010 system which will use this GPU card.
Nockieboy et al.
I have been watching this topic for several months and it's great to see something like this being developed for homebrews or existing vintage systems, quite an awesome project.
It looks like GPU RAM will now move on to the DECA board DDR3 for GPU operations and DMA to the Microcom can be done for additional DECA FPGA projects.
What I am asking is: is it now possible to use the GPU and terminal emulator for other 8-bit systems that only have 64k RAM and 256 or fewer I/O locations?
nockieboy will correct me if I'm wrong, but my understanding is that the segue onto the DECA board is just a way to validate the DDR3 controller with other bits and pieces in hardware, with the eventual goal of designing a custom PCB to house everything.
On a different note, could a .ttf font be used on the VT100 terminal emulator easily?
Thanks again for this awesome project
Again, as per my understanding, right now the VRAM is mapped into the CPU memory address space (because the 6502 doesn't have a dedicated I/O bus), but technically you can use the suggestion I offered just a few posts above: map up to six 8-bit locations into I/O space (4 for the 32-bit address, 1 register for stride, 1 data register), which will allow your CPU (which I assume is some variation of the Z80) to access any 32-bit address and write bursts at the maximum I/O speed. You will probably need to map a DMA controller's control registers into I/O space as well. You will have to lean heavily on DMA and GPU hardware functions, because there is no way you can achieve any sort of acceptable performance with the CPU itself.
I've been thinking for a while about creating a system with a bunch of 8bitters in a sort of SMP system just for the hell of it, but unfortunately real life gets in the way so much that I don't have enough time to even complete my current projects.
I just finally got the multichannel/multiport 'BrianHG_DDR3_COMMANDER.sv' working. It is the front end for my DDR3 controller, providing 16 read and 16 write ports, each with an individually user-set data width, and a DMA-through function for all the read channels. Right now, I just need to clean up the 'priority' selection encoder and I will post the entire DDR3 controller on a separate thread. This should be enough for Nockieboy to add CPU data & program IO access ports to the controller, + video access, + audio access, + SD-Card read & write access, + Geometry processor access, + anything else he wants. All accessed like a huge internal multiport FPGA static RAM, with the 1 caveat that you occasionally need to wait for a busy flag to clear, or for a read-data-ready flag, on each individual port.
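For readers unfamiliar with the idea, here is a minimal sketch of generic fixed-priority port selection, where the lowest-numbered requesting port is served first and every other port simply sees "busy" until its turn. This is not the logic inside 'BrianHG_DDR3_COMMANDER.sv', which has not been posted yet; the function names, the one-command-per-cycle model and the lowest-index-wins rule are all assumptions for illustration.

```python
from collections import deque

PORTS = 16  # number of ports, matching the count mentioned above


def priority_select(requests):
    """Return the lowest-numbered requesting port, or None if all are idle."""
    for port, requesting in enumerate(requests):
        if requesting:
            return port
    return None


def serve_all(pending):
    """Serve one queued command per cycle until every port's queue is empty.

    `pending` is a list of deques, one per port. Ports that are not
    selected in a given cycle just keep waiting (their busy flag stays set).
    """
    order = []
    while any(pending):
        port = priority_select([bool(queue) for queue in pending])
        order.append((port, pending[port].popleft()))
    return order


# Hypothetical traffic: port 3 queues two reads, then port 0 queues a write.
pending = [deque() for _ in range(PORTS)]
pending[3].extend(["read A", "read B"])
pending[0].append("write C")
order = serve_all(pending)
print(order)
```

Port 0 is served before port 3 even though it queued later. A pure fixed-priority scheme like this can starve high-numbered ports under sustained load, which is one reason real arbiters often add rotation or per-port limits.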
Just thinking out loud, in a 64k system with only 56k of RAM for the operating system there is not a lot of address space left for VRAM.
Would an approach like asmi suggests, sort of like the TMS9918 where you send a start address through an I/O port and subsequent reads or writes increment the internal address counter, be possible without a complete redesign? This is just a curious question not a wish list.
I was looking through your Github and found the "Z80_Bridge.v". Is it still the same or close to the current version that you are using?
I am considering going back to the start of this thread and trying to understand how the bridge works.