Author Topic: FPGA VGA Controller for 8-bit computer  (Read 424827 times)


Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: FPGA VGA Controller for 8-bit computer
« Reply #2700 on: September 14, 2021, 10:29:53 am »
Am I right in thinking that the geometry unit won't draw objects outside the bounds of the visible screen?  This poses a slight issue for clearing the screen.  It appears that the best method for clearing the screen (a CLS subroutine, basically) is to draw a rectangle on it in the background colour, as this handles all the nuances of the screen mode and its specific bpp setting, instead of just writing zero bytes (for 1 bpp) to GPU RAM for a calculated screen size.

The issue I've got is that the rectangle will not fill the screen - there's a vertical line on the far right edge where it won't draw.  This looks like something I've missed when I tested the geometry unit whilst we developed it - I'm off to go and dive into the HDL and add 1 to wherever the bounds check is for drawing pixels, but thought I'd raise it here as it's a bug in the existing GPU code.

Also, I need to clear a horizontal line of pixels past the bottom edge of the visible area (used for vertical scrolling) - I'll need to tweak the HDL to allow for drawing past ALL edges by a pixel or two to cater for scrolling, I guess?  Let me know if I'm walking into a bear trap or something.  :-+
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7733
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #2701 on: September 14, 2021, 10:47:06 am »
Am I right in thinking that the geometry unit won't draw objects outside the bounds of the visible screen?  This poses a slight issue for clearing the screen.  It appears that the best method for clearing the screen (a CLS subroutine, basically) is to draw a rectangle on it in the background colour, as this handles all the nuances of the screen mode and its specific bpp setting, instead of just writing zero bytes (for 1 bpp) to GPU RAM for a calculated screen size.

The issue I've got is that the rectangle will not fill the screen - there's a vertical line on the far right edge where it won't draw.  This looks like something I've missed when I tested the geometry unit whilst we developed it - I'm off to go and dive into the HDL and add 1 to wherever the bounds check is for drawing pixels, but thought I'd raise it here as it's a bug in the existing GPU code.

Also, I need to clear a horizontal line of pixels past the bottom edge of the visible area (used for vertical scrolling) - I'll need to tweak the HDL to allow for drawing past ALL edges by a pixel or two to cater for scrolling, I guess?  Let me know if I'm walking into a bear trap or something.  :-+
You may blank the bottom line after a vertical scroll, or just expand the max X & Y limits beyond the bottom of the display area.
Be warned, however: you will allow pixels to be written into ram off the screen, so make sure you have spare memory there.

IE, have 8 spare lines of blank data in the Y direction so that you may use a single blit, including the blank ram, so you do not have to do an actual rectangle clear at the bottom of the screen.

You may also have a full line of text, 8 Y pixels, below the display area pre-printed if you want a TV-type vertical smooth scroll where the text begins off the bottom of the screen.  You may also have 1 spare line above the top of the screen if you want an up-down, web-page-like smooth scroll for a text reader/word processor or spreadsheet.

You may also do this horizontally, and it need not be only 1 line and 1 row of text; you may buffer 2 or 4 characters off the borders if you like, or even an entire page.
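The spare-line budget above is easy to sketch in software. A minimal Python model follows (the figures are hypothetical: one 8-pixel text row of margin above and below a 640x480 screen):

```python
# Hypothetical sketch of the off-screen scroll margin idea above.
# A framebuffer with spare blank rows below (and optionally above) the
# visible area lets a single blit scroll the screen without a separate
# rectangle-clear pass at the bottom.

FONT_H = 8                                   # text row height in pixels
WIDTH, HEIGHT = 640, 480
SPARE_TOP, SPARE_BOTTOM = FONT_H, FONT_H     # one blank text row each side

def buffer_bytes(bpp):
    """Total RAM needed for the visible area plus scroll margins."""
    total_rows = SPARE_TOP + HEIGHT + SPARE_BOTTOM
    return total_rows * WIDTH * bpp // 8

# Scrolling up by one text row is then a single copy within the buffer:
# blit(src_y=SPARE_TOP + FONT_H, dst_y=SPARE_TOP, h=HEIGHT) - the blank
# margin rows ride along and clear the vacated bottom row for free.
```

At 1 bpp the margin costs only two extra text rows of RAM, which is negligible against the megabytes of DDR3 discussed later in the thread.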

« Last Edit: September 14, 2021, 10:51:06 am by BrianHG »
 

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: FPGA VGA Controller for 8-bit computer
« Reply #2702 on: September 14, 2021, 12:35:12 pm »
Of course, there's loads of ways I could have implemented the scrolling.  Why I went with the extra line/s off the screen I don't know, probably the first idea that sprang to mind at the time - other than that I can clear those lines once, then not worry about having to clear every bottom line when I blit the screen upwards one line.  You're right though, if memory is tight then it's not the best solution, but with megabytes to play with in the DDR3, that shouldn't be a concern.

Have sorted the CLS not clearing the rightmost column without resorting to changing HDL in the end.  I'd gotten mixed up with which registers are 0-based and which aren't, and had set the max_x value to 639 instead of 640 in the setup. :palm:  It's the MAGGIE HW registers that are 0-based, not the GPU registers. Sorted now. :-+
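The off-by-one described above can be illustrated with a toy model of inclusive vs. exclusive draw limits (a hypothetical helper, not the actual GPU/MAGGIE registers):

```python
# Toy model of the rightmost-column bug discussed above: whether a fill
# reaches column 639 depends on whether max_x is an inclusive 0-based
# limit or an exclusive width-style limit.

def filled_columns(max_x, inclusive):
    """Columns a rectangle fill starting at x=0 would touch."""
    return list(range(0, max_x + 1 if inclusive else max_x))

# With an exclusive limit, max_x must be 640 to cover columns 0..639;
# passing 639 leaves the rightmost column undrawn - exactly the
# symptom of the "missing vertical line" CLS bug.
```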
 

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: FPGA VGA Controller for 8-bit computer
« Reply #2703 on: September 25, 2021, 04:51:30 pm »
Okay, nearly there with the CP/M graphics driver after a little time to work on it again today.  I can now change modes (between 640x480x1 and 640x480x2, anyway) and 4-colour mode is working.  I could do a 640x480x4 mode, but the MAX10 doesn't have enough block RAM for the power-of-2 buffer it would need, and it would involve some HDL hacks that aren't necessary if I'm going to move up to DDR3 shortly.

An issue has arisen which I'd like to address before I move on to connecting the DDR3 up - and it links directly to the previous question I had last week (or recently, anyway) about the XOR-ing of colours when blitting text characters.

I'd like to be able to blit a 1-bit text character to any number of bitplanes in the target, and change the values of 0s and 1s in the source to arbitrary values in the target.  i.e. I'd like to be able to blit the source 1bpp image so that the text could be white on black, or black on white, or green on a different shade of green, depending on which pen/paper values the user has set (and not solely by editing the palette).  The current setup doesn't appear to allow this, at least as far as my understanding of it goes?  It just seems to be able to alternate between colours immediately adjacent to each other in the palette thanks to the XOR-ing? (i.e. palette entries 0 and 1, 2 and 3, 4 and 5 etc)

Is there a setting that I haven't worked out yet that will allow me to (for example) blit blue text on a black background (palette entries 2 and 0)?  At the moment all I can do is black on black, or its inverse, or blue on green, or its inverse.  I can obviously edit the palette so that 0 is black, 1 is white, or 2 is light green, 3 is dark green etc., but that's not very flexible if bitmaps are also in play.  :-//
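For reference, the adjacent-pair behaviour described above follows directly from XORing the palette index with a 1-bit source value (a toy model, not the HDL):

```python
# Why XOR-paste only toggles between adjacent palette entries: XORing a
# destination index with a 1-bit source value can only flip the low bit.

def xor_paste(dest_index, src_bit):
    """Model of a 1bpp XOR paste onto an existing palette index."""
    return dest_index ^ src_bit

# Entry 0 pairs with 1, 2 pairs with 3, and so on - there is no single
# XOR pass that maps a glyph onto, say, palette entries 2 and 0.
```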
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7733
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #2704 on: September 25, 2021, 10:44:31 pm »
For the paste, maybe we can change XOR to ADD in the HDL.

Though, if you want to have different background colors, you would need to do 2 blits, one with a 0 transparency, and the next with a 1 transparency, while pasting both with a different ADD value.

This will work fine for 1 bit source data.
For more bits, you would need to think things through to ensure you can get the color functionality you want if you make the 'ADD' change.
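The two-blit scheme above can be modelled in software. A minimal Python sketch follows (function names are invented; the destination is assumed to start cleared to palette index 0):

```python
# Software model of the suggested two-blit ADD scheme for 1-bit text:
# pass 1 treats source 0s as transparent and adds the foreground index
# to the 1 pixels; pass 2 inverts the transparency and adds the
# background index to the 0 pixels.

def paste_add(dest, glyph, add, transparent_bit):
    """One paste pass: skip transparent source pixels, ADD elsewhere."""
    return [d if g == transparent_bit else d + add
            for d, g in zip(dest, glyph)]

def draw_text(glyph, fg, bg):
    """Render a 1bpp glyph with arbitrary fore/background indices."""
    dest = [0] * len(glyph)                               # cleared area
    dest = paste_add(dest, glyph, fg, transparent_bit=0)  # foreground pass
    dest = paste_add(dest, glyph, bg, transparent_bit=1)  # background pass
    return dest
```

With this, blue-on-black (entries 2 and 0) is simply `draw_text(glyph, fg=2, bg=0)`, which the single-pass XOR could not produce.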
 

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: FPGA VGA Controller for 8-bit computer
« Reply #2705 on: September 26, 2021, 01:04:29 pm »
For the paste, maybe we can change XOR to ADD in the HDL.

Though, if you want to have different background colors, you would need to do 2 blits, one with a 0 transparency, and the next with a 1 transparency, while pasting both with a different ADD value.

This will work fine for 1 bit source data.
For more bits, you would need to think things through to ensure you can get the color functionality you want if you make the 'ADD' change.

I'm just thinking purely for text - so 1-bit blits.  I've just been reformatting and tidying the code in geometry_xy_plotter whilst thinking about this (I think the word is procrastinating ;)).

The XOR-ing is done in the geo_pixel_writer, right?  I'm looking at these lines specifically:

Code: [Select]
colour_sel_miss  = 16'(wc_paste_pixel ? (PX_COPY_COLOUR ^ {8'd0,wc_colour}) : {8'd0,(wc_colour & LUT_mask[wc_bpp])}) ; // select between copy buffer color for paste XORed with immediate color, or immediate color
colour_sel_hit   = 16'(paste_pixel    ? (PX_COPY_COLOUR ^ {8'd0,colour   }) : {8'd0,(colour    & LUT_mask[bpp   ])}) ; // select between copy buffer color for paste XORed with immediate color, or immediate color

So colour_sel_miss and colour_sel_hit refer to cache hits or misses on the read pixel, applying the XOR operation on PX_COPY_COLOUR and colour, or just returning colour ANDed with the LUT_mask?
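Yes - the mux quoted above can be restated as a small Python model (a simplification: the real code keeps separate hit/miss register sets, and widths are 16-bit):

```python
# Software model of the colour_sel_* mux (XOR version): when pasting,
# the copy-buffer colour is XORed with the immediate colour; otherwise
# the immediate colour is masked down to the current bits-per-pixel.

LUT_MASK = {1: 0x01, 2: 0x03, 4: 0x0F, 8: 0xFF}  # assumed mask table

def colour_sel(paste_pixel, px_copy_colour, colour, bpp):
    if paste_pixel:
        return (px_copy_colour ^ colour) & 0xFFFF
    return colour & LUT_MASK[bpp]
```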


 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7733
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #2706 on: September 27, 2021, 12:59:52 am »
Yes, you found it.

Now, there is the issue that when adding, it only goes in the positive direction and you have a bit limit based on color depth.

I think you should try:

Code: [Select]
colour_sel_miss  = 16'(wc_paste_pixel ? (PX_COPY_COLOUR + {8'd0,wc_colour}) : {8'd0,(wc_colour & LUT_mask[wc_bpp])}) ; // select between copy buffer color for paste ADDed with immediate color, or immediate color
colour_sel_hit   = 16'(paste_pixel    ? (PX_COPY_COLOUR + {8'd0,colour   }) : {8'd0,(colour    & LUT_mask[bpp   ])}) ; // select between copy buffer color for paste ADDed with immediate color, or immediate color

Keep an eye on the FMAX.
Also, it may be possible to prevent the color overflow paste problem like this (IE: when the final paste color exceeds the available bit depth on the display, trim the extra bits out to prevent garbled graphics; this should also allow you to subtract a color value, since when adding, a result greater than the available colors on the screen will be clipped to the available color bits):

Code: [Select]
colour_sel_miss  = 16'(wc_paste_pixel ? (PX_COPY_COLOUR + {8'd0,wc_colour}) : {8'd0,(wc_colour)})  & LUT_mask[wc_bpp]; // copy buffer color ADDed with immediate color (or immediate color alone), masked to the current bpp
colour_sel_hit   = 16'(paste_pixel    ? (PX_COPY_COLOUR + {8'd0,colour   }) : {8'd0,(colour)})     & LUT_mask[bpp   ]; // copy buffer color ADDed with immediate color (or immediate color alone), masked to the current bpp

You will need to test and verify both.
Also check FMAX for the 125MHz clock.
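The masked-add variant can be modelled in Python to show the wrap-around subtraction it enables (a sketch of the arithmetic, not the HDL):

```python
# Model of the second variant above: applying the bpp mask AFTER the
# add clips overflow, and makes subtraction possible via modular
# arithmetic - adding (2**bpp - n) subtracts n within the bpp field.

def paste_add_masked(px_copy_colour, colour, bpp):
    """ADD paste with the result trimmed to the current colour depth."""
    return (px_copy_colour + colour) & ((1 << bpp) - 1)
```

E.g. at 2 bpp, adding 3 to a pixel is equivalent to subtracting 1 modulo 4, so one "add" value can move colours in either direction.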
« Last Edit: September 27, 2021, 01:50:48 am by BrianHG »
 

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: FPGA VGA Controller for 8-bit computer
« Reply #2707 on: September 27, 2021, 09:07:21 am »
Yes, you found it.

Now, there is the issue that when adding, it only goes in the positive direction and you have a bit limit based on color depth.

True, but I'm tending to mask input values according to the current mode (or bitplane setting) in software.

I think you should try:

Code: [Select]
colour_sel_miss  = 16'(wc_paste_pixel ? (PX_COPY_COLOUR + {8'd0,wc_colour}) : {8'd0,(wc_colour & LUT_mask[wc_bpp])}) ; // select between copy buffer color for paste ADDed with immediate color, or immediate color
colour_sel_hit   = 16'(paste_pixel    ? (PX_COPY_COLOUR + {8'd0,colour   }) : {8'd0,(colour    & LUT_mask[bpp   ])}) ; // select between copy buffer color for paste ADDed with immediate color, or immediate color

Keep an eye on the FMAX.

Okie dokie.  This is the normal project Timings result (with NO changes) for comparison:




And this is the Timings results with the changes mentioned above:



No real benefit to the text, to be honest.  I'm still getting blocks of colour for blitted characters, where the 'background' of the source character isn't being remapped to the preferred background colour, etc.

Also, it may be possible to prevent the color overflow paste problem like this (IE: when the final paste color exceeds the available bit depth on the display, trim the extra bits out to prevent garbled graphics; this should also allow you to subtract a color value, since when adding, a result greater than the available colors on the screen will be clipped to the available color bits):

Code: [Select]
colour_sel_miss  = 16'(wc_paste_pixel ? (PX_COPY_COLOUR + {8'd0,wc_colour}) : {8'd0,(wc_colour)})  & LUT_mask[wc_bpp]; // copy buffer color ADDed with immediate color (or immediate color alone), masked to the current bpp
colour_sel_hit   = 16'(paste_pixel    ? (PX_COPY_COLOUR + {8'd0,colour   }) : {8'd0,(colour)})     & LUT_mask[bpp   ]; // copy buffer color ADDed with immediate color (or immediate color alone), masked to the current bpp

Timing results for above change:



So there's an incremental negative change in Fmax for the GPU core with these changes.  Neither stops the GPU running in my current testing, but I'm only requesting 640x480 from the GPU; most likely it will introduce artefacts when pushed, or fail to produce HDMI-compatible higher resolutions.
  • With background 0, foreground 1, I'm able to get a green font on black background.
  • Setting background to 1 doesn't change anything - still green on black.
  • Setting foreground to 2 gives blue on green characters (background is still black, but blitted character backgrounds are green).
  • Setting background to 2 gives blue on green with a green background.
  • Setting foreground to 3 gives green on blue characters (background remains green from previous step, but blitted char backgrounds are blue).
  • Setting background to 3 gives green on blue with a blue background.
  • Changing foreground colour gives the same changes as previous steps when foreground is changed, but the screen background (not the blitted background) remains the same as the background setting.
Perhaps it would make sense to create an additional blitter mode specifically for text (if possible)?  One that takes a 1-bit source and pastes pixels to the target based on the current foreground and background colours, replacing 1s and 0s accordingly.  That would allow any colour text to be printed against any colour background (within bpp limits of the current mode, obviously).  This wouldn't need to be massively performant due to it being used solely for text, so two blits wouldn't be a deal breaker?  Is that possible? 

Or is it just a play on the transparency blit mode, treating 0's in the source as transparent and replacing 1's with the foreground colour?  That would be preferable, to be honest.  Perhaps I'm testing the wrong blit mode or something?
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7733
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #2708 on: September 27, 2021, 09:40:49 am »
Remember, you have the 'transparency color' setting, where placing a '1' will invert the font.

Blit 1 time without the invert, selecting an appropriate 'paste add color' will paint your foreground color.
Blit a second time with the transparency inverted, selecting a new 'paste add color' color will render the background outline color of your choice.

After you get the initial DDR3 functioning, you will want to redo the geometry to run at 200MHz instead of 125, since you will want the ram core to operate at 400MHz.  This will most likely need some changes to the 'ellipse' module and a few changes to the 'pixel_writer' module.  You will have to deal with selecting how the deep color modes are processed when reading from the DDR3, and probably leave the accelerated tile modes to any spare FPGA core memory.  This will probably mean something like having only 128kb for hardware tiles/sprites on the larger FPGA, with all of the DDR3 memory available for the blitter-accelerated style sprites.  But if I could make a faster 'multi-port' module, the blitter could theoretically run at 800 million 32bit pixels a second (32bit ram, 400m read, 400m write), if not at least half that with the blitter pixel collision disabled, but only with 8bit, 16bit, or 32bit pixels.

The next change after that would be to generate a sequence processor which will read a chunk of DDR3 and feed the 'geometry' unit's input FIFO, instead of having your Z80 always feed it from a port.  The goal is to have enough command functions in this sequencer to select, loop, copy, stop and go, and add/subtract/multiply/divide 16 and 32 bit 4x4 matrices, so that you can have pre-compiled render sequences where the Z80 only has to send a 'begin program at DDR3 address' and let the GPU hardware do all the work at 200MHz.

IE, such a program in the DDR3 might loop and read a section of ram at address X, with a width and height of 80x25 bytes, with another section for foreground and background colors, and render a screen of text driving the geo unit and blitter - once every frame.  Now the Z80 can just access those 80x25 bytes as ASCII text, and the sequence processor will render the contents as text on a graphics screen, software-emulating a text mode while the Z80 won't know the difference.
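The proposed sequencer program could be sketched as a tiny display-list interpreter. All names and the per-cell blit callback below are invented for illustration; the real unit would execute commands from DDR3, not Python:

```python
# Hypothetical sketch of the sequence-processor idea: walk an 80x25
# text buffer (plus parallel colour buffers) and issue one glyph blit
# per character cell, once per frame. An 8x16 font is assumed.

def run_display_list(text_ram, fg_ram, bg_ram, blit_char, cols=80, rows=25):
    """Render a text screen by driving the blitter once per cell."""
    for row in range(rows):
        for col in range(cols):
            i = row * cols + col
            blit_char(ch=text_ram[i], x=col * 8, y=row * 16,
                      fg=fg_ram[i], bg=bg_ram[i])
```

The Z80 would only ever touch `text_ram` (plain ASCII) and the colour buffers; the 2000 blits per frame happen entirely in GPU hardware.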
 

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: FPGA VGA Controller for 8-bit computer
« Reply #2709 on: September 27, 2021, 11:05:37 am »
Remember, you have the 'transparency color' setting, where placing a '1' will invert the font.

Blit 1 time without the invert, selecting an appropriate 'paste add color' will paint your foreground color.
Blit a second time with the transparency inverted, selecting a new 'paste add color' color will render the background outline color of your choice.

Ah yes, I've been testing with Paste Mask off.  When I turn it on, I'm able to get more variety with the fore/background combinations.  I think I'll leave it at that for the moment given the potential future need to redo stuff.

After you get the initial DDR3 functioning, you will want to redo the geometry to run at 200MHz instead of 125 since you will want the ram core to operate at 400MHz.

Yeah, best I stop ducking this next task and crack on with it.  Expect questions in the near future.



This will most likely need some changes to the 'ellipse' module and a few changes to the 'pixel_writer' module.  You will have to deal with selecting how the deep color modes are processed when reading from the DDR3, and probably leave the accelerated tile modes to any spare FPGA core memory.  This will probably mean something like having only 128kb for hardware tiles/sprites on the larger FPGA, with all of the DDR3 memory available for the blitter-accelerated style sprites.  But if I could make a faster 'multi-port' module, the blitter could theoretically run at 800 million 32bit pixels a second (32bit ram, 400m read, 400m write), if not at least half that with the blitter pixel collision disabled, but only with 8bit, 16bit, or 32bit pixels.

Well, the existing font tiles don't take up a vast amount of room, but with more screen resolution comes the need to step up to a font bigger than 8x16 pixels in the future, I guess.  Also, if you're trying to impress me with big numbers, it's working. ;D  16-bit pixels should be fine for anything I'd ever want to do.  :o

The next change after that would be to generate a sequence processor which will read a chunk of DDR3 and feed the 'geometry' unit's input FIFO, instead of having your Z80 always feed it from a port.  The goal is to have enough command functions in this sequencer to select, loop, copy, stop and go, and add/subtract/multiply/divide 16 and 32 bit 4x4 matrices, so that you can have pre-compiled render sequences where the Z80 only has to send a 'begin program at DDR3 address' and let the GPU hardware do all the work at 200MHz.

This sounds a bit like the 'copper' in the old Amiga systems, but more advanced?  So the Z80 could just set up a load of graphics commands and set the sequence processor running whilst it does something else.  That sounds awesome...  :o

IE, such a program in the DDR3 might loop and read a section of ram at address X, with a width and height of 80x25 bytes, with another section for foreground and background colors, and render a screen of text driving the geo unit and blitter - once every frame.  Now the Z80 can just access those 80x25 bytes as ASCII text, and the sequence processor will render the contents as text on a graphics screen, software-emulating a text mode while the Z80 won't know the difference.

 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2732
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #2710 on: September 27, 2021, 06:57:51 pm »
Have you considered stuffing a CPU core inside the FPGA too and doing away with the silicon? I think a softcore would work much faster than the hard silicon. :popcorn:

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: FPGA VGA Controller for 8-bit computer
« Reply #2711 on: September 27, 2021, 08:51:38 pm »
Have you considered stuffing a CPU core inside the FPGA too and doing away with the silicon? I think a softcore would work much faster than the hard silicon. :popcorn:

Oh yes, I'm fully aware that this is possible (with a T80 core?) but I started out building my own computer and I'll continue with that for the time being, even though the technology I'm using for the GPU is far superior to the rest of the 'stack'.  I do have plans to replace the hardware MMU with one based in the FPGA at some point - and where does the migration onto FPGA end? - but I've found I actually quite enjoy designing and building PCBs and pushing my soldering skills, so all the time I've got hardware (CPUs) sitting around I'll design and build for those first before I go "full FPGA" and do it all there.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2732
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #2712 on: September 28, 2021, 02:14:05 am »
Oh yes, I'm fully aware that this is possible (with a T80 core?) but I started out building my own computer and I'll continue with that for the time being, even though the technology I'm using for the GPU is far superior to the rest of the 'stack'.  I do have plans to replace the hardware MMU with one based in the FPGA at some point - and where does the migration onto FPGA end? - but I've found I actually quite enjoy designing and building PCBs and pushing my soldering skills, so all the time I've got hardware (CPUs) sitting around I'll design and build for those first before I go "full FPGA" and do it all there.
It just seems to me that the hard silicon is a serious drag for performance, also internal connections are much easier and (if done right) won't have any glitches or timing issues. As for soldering - there will be plenty of it in any case :)

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: FPGA VGA Controller for 8-bit computer
« Reply #2713 on: September 28, 2021, 07:33:32 am »
It just seems to me that the hard silicon is a serious drag for performance, also internal connections are much easier and (if done right) won't have any glitches or timing issues. As for soldering - there will be plenty of it in any case :)

Fortunately performance (other than in the GPU) isn't a real concern for me with this particular system.  I started out with the objective of building something more powerful than the first computer I had (whilst learning about electronics at the same time), which I've done in spades (or will have done once I've sorted out the audio).  Even the GPU in its current iteration (with no DDR3) is massively more powerful.

The audio is a function I will be migrating to the FPGA, though.  AY-3-8912 PSGs (the YM2149 is a Yamaha copy) - as used in my original computer - seem to be a dwindling commodity, and whilst I've got a hardware audio card I built around the AY-3-8910 (not much different, more IO), I built it a few years ago before I progressed to SMD parts etc and it's not my best work.  I don't understand analogue electronics at all.  I also don't particularly want yet another card on the stack, particularly one for a single chip like the 8910, so I'll be looking to incorporate an HDL YM2149 instead and hope to output that via the PCM5101A, though I'm not sure how I'm going to do that yet (or even if it's practical to).

The only other thing on my FPGA to-do list is replace the CompactFlash card on the stack with a slightly newer SD (preferably micro-SD) slot on the GPU card - maybe even a USB flash drive.  I've had a lot of difficulty with the last two iterations of the GPU card (for the EP4CE10) matching a micro-SD card socket with a PCB footprint for some reason.  If anyone can recommend a particular micro-SD card socket that is available on EasyEDA and has a decent supplier (Mouser preferably), then please let me know!
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2732
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #2714 on: September 28, 2021, 11:15:12 am »
Fortunately performance (other than in the GPU) isn't a real concern for me with this particular system.  I started out with the objective of building something more powerful than the first computer I had (whilst learning about electronics at the same time), which I've done in spades (or will have done once I've sorted out the audio).  Even the GPU in its current iteration (with no DDR3) is massively more powerful.
Well it's up to you of course, but as far as I'm concerned, there is no such thing as too much performance. More performance means better graphics, more advanced gameplay (for games), etc.

The audio is a function I will be migrating to the FPGA, though.  AY-3-8912 PSGs (the YM2149 is a Yamaha copy) - as used in my original computer - seem to be a dwindling commodity, and whilst I've got a hardware audio card I built around the AY-3-8910 (not much different, more IO), I built it a few years ago before I progressed to SMD parts etc and it's not my best work.  I don't understand analogue electronics at all.  I also don't particularly want yet another card on the stack, particularly one for a single chip like the 8910, so I'll be looking to incorporate an HDL YM2149 instead and hope to output that via the PCM5101A, though I'm not sure how I'm going to do that yet (or even if it's practical to).
I think I already recommended in the past to just connect any I2S DAC to the FPGA and implement a sound card inside. This way you have an easy growth path, as any DAC of the past decade or so supports a 48kHz @ 24-bit audio stream or better, so if at some point you want to implement stuff like mp3/ogg/wav playback, you can easily do so. But initially you can just bit-stuff extra bits if you want to stick to "historical" sound.
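The bit-stuffing amounts to left-justifying the narrow PSG sample in the DAC's wider I2S slot. A sketch (assumes an unsigned 8-bit source and a 24-bit slot, and ignores the signed/unsigned offset a real design would also handle):

```python
# Widening an 8-bit PSG sample to a 24-bit I2S word by left-justifying:
# shift the sample into the most-significant bits and zero-pad the rest,
# so the perceived level is preserved regardless of the DAC's depth.

def psg_to_i2s(sample_8bit):
    """Left-justify an unsigned 8-bit sample in a 24-bit I2S slot."""
    return (sample_8bit & 0xFF) << 16
```

In HDL this is just a concatenation of the sample with 16 zero bits, so it costs no logic at all.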

The only other thing on my FPGA to-do list is replace the CompactFlash card on the stack with a slightly newer SD (preferably micro-SD) slot on the GPU card - maybe even a USB flash drive.  I've had a lot of difficulty with the last two iterations of the GPU card (for the EP4CE10) matching a micro-SD card socket with a PCB footprint for some reason.  If anyone can recommend a particular micro-SD card socket that is available on EasyEDA and has a decent supplier (Mouser preferably), then please let me know!
You will have to learn how to create footprints yourself. That is a crucial skill for anyone aspiring to do any half-decent PCBs. One bit of advice from me - pick parts from manufacturers which provide 3D STEP models of their products, this way you can "virtually" verify a footprint by adding a 3D model on top of it and seeing if it fits.
As for a specific model, the one I've been using for a while is GCT's MEM2075-00-140-01-A; it's available via both Mouser and Digikey, and the manufacturer provides 3D models and footprints for many eCAD systems (but I don't think EasyEDA is one of them). The reason I like this specific part is that it's a push-push socket, meaning you push the card in for it to lock inside, and push it again to release - this guarantees that it will be properly secured and won't fall out or lose contact. It's also quite affordable at about US$1.60 in quantity 10 - I bought 100 of them back in the day as I use them quite often, so I know I will use them up eventually.

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: FPGA VGA Controller for 8-bit computer
« Reply #2715 on: September 29, 2021, 09:21:03 am »
I think I already recommended in the past to just connect any I2S DAC to FPGA and implement a sound card inside. This way you have an easy growth path as any DAC of the past decade or so supports 48k@24bps and better audio stream, so if at some point you will want to implement stuff like mp3/ogg/wav playback, you can easily do so. But initially you can just bit-stuff extra bits if you want to stick to "historical" sound.

Yes, you did. I'm going to need to give some significant thought to interfacing the YM2149 HDL to an output compatible with an I2S DAC, though.  I'm planning on using a PCM5102 or something similar.  Chip supplies are severely drying up now, but it looks like I can get these DACs as part of breakout boards and 'reclaim' the chips for use in my own project for less than actually buying (and waiting half a year for) new stock.  I wish the same were true of the FPGAs, but I can't get too annoyed about that as I don't really know exactly which FPGA I'm going to be using yet.  I nearly pulled the trigger on the last EP4CE22F17C7N in Mouser stock the other day (it's gone now), given BrianHG's recent comments that the CV actually doesn't seem to be any faster despite having more LEs and block RAM; but since block RAM is no longer a real concern if I can get this DDR3 controller working with my GPU, I have a lot more freedom (it seems) in choosing an FPGA. :-/O

The only other thing on my FPGA to-do list is replace the CompactFlash card on the stack with a slightly newer SD (preferably micro-SD) slot on the GPU card - maybe even a USB flash drive.  I've had a lot of difficulty with the last two iterations of the GPU card (for the EP4CE10) matching a micro-SD card socket with a PCB footprint for some reason.  If anyone can recommend a particular micro-SD card socket that is available on EasyEDA and has a decent supplier (Mouser preferably), then please let me know!
You will have to learn how to create footprints yourself. That is a crucial skill for anyone aspiring to do any half-decent PCBs. One bit of advice from me - pick parts from manufacturers which provide 3D STEP models of their products, this way you can "virtually" verify a footprint by adding a 3D model on top of it and seeing if it fits.

Oh I can create footprints with no issue from datasheets etc., just wondered if anyone had a recommendation to save me the trouble...

As for a specific model, the one I've been using for a while is GCT's MEM2075-00-140-01-A; it's available via both Mouser and Digikey, and the manufacturer provides 3D models and footprints for many eCAD systems (but I don't think EasyEDA is one of them). The reason I like this specific part is that it's a push-push socket, meaning you push the card in for it to lock inside, and push it again to release - this guarantees that it will be properly secured and won't fall out or lose contact. It's also quite affordable at about US$1.60 in quantity 10 - I bought 100 of them back in the day as I use them quite often, so I know I will use them up eventually.

...like that one. ;)  I can get them from Mouser and there's footprints available on EasyEDA for them too which I can verify from the datasheet, thanks very much asmi. :-+
 

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: FPGA VGA Controller for 8-bit computer
« Reply #2716 on: September 29, 2021, 11:15:58 am »
*** DDR3 Controller Setup Questions ***

@BrianHG - I've made a start on connecting the BrianHG_DDR3_Controller to the rest of the GPU this morning in some spare time.  I'm using the BrianHG_DDR3_DECA_test1 project as a guide as I figure the way the RS232 debugger is connected to the controller should emulate (in some way) how the Z80_Bridge will need to be connected in the GPU to the DDR3 controller.  Is that a good starting point?

I note there's some discrepancy between the _test1 project's top-level file parameters and the GPU's top-level file parameters, so I've copied the params from the _test1 project to the GPU top-level as that's more up-to-date with your v1.0 DDR3_Controller than the GPU_top file was.

Also, do I need to copy the two additional modules in _test1_top, DDR3_CMD_ENCODE_BYTE and DDR3_CMD_DECODE_BYTE, into GPU_top and tweak them?  It looks like they're decoding/encoding data to/from the RS232 port and the DDR3_controller, but with only a 5-bit wide address?  Is that because it's reading/writing 32 bytes at a time and the address lets the host specify which byte it's accessing from that cache, or something else...?

Finally, it looks like the data_mux_geo would mirror the rs232_debugger's connection to the DDR3 RAM, where it currently connects to internal block RAM.  data_mux_geo isn't instantiated in GPU_TOP.sv where the DDR3_Controller resides, though - it's a level down in GPU.sv, so I'm going to need to connect all the DDR3_Controller's IO through to GPU.sv and from there into the data_mux_geo - is that right?

Have attached GPU.sv (have spent some time tidying it up) and GPU_TOP.sv for info if required.  GPU_TOP has the DDR3_Controller stuff commented out currently as I've just built the project to make sure my tidying of GPU.sv hasn't broken anything.  ::)
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7733
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #2717 on: September 29, 2021, 11:51:37 am »
You are using the wrong project as an example.
You should be using 'BrianHG_DDR3_DECA_RS232_DEBUG_TEST' from V1.0.

You need to use that full project; its internal RS232 debugger will replace the one inside your GPU.

Increase its read and write port total by 1.
Raise the read and write port #2's priority to max.
Set the new port #2's read and write data widths to 8 bits each.

Wire the Z80 bridge's write -> gpu ram output to my DDR3 write channel #2.

Next, test to see if the Z80 is writing to the DDR3 by looking at the RS232 debugger while running Z80 code.

Next, for the Z80's reads from gpu ram, disconnect its read data from the data_mux_geo and wire it to my DDR3 read port #2.

Verify you can now read and write up to 1 megabyte.  Also, I have not checked if the core GPU ram has checks in it to prevent writing above your set limit.

Remember, any control signals you disconnect, or ones you are not using should be disabled to 0.

Follow these steps after merging your GPU project into your renamed
'BrianHG_DDR3_DECA_RS232_DEBUG_TEST' project.
Use its top hierarchy.

This current setup will not allow the geometry unit to access the DDR3, as that will require a change or a new 'data_mux_geo' module to deal with the partitioning of FPGA core ram VS DDR3.  But it will get you to the next step, which will require adding the Z80 wait-states.  Since the DDR3 is doing nothing else, I doubt you will get corrupt reads, but we will try to find a way to test this.
« Last Edit: September 29, 2021, 12:00:19 pm by BrianHG »
 

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: FPGA VGA Controller for 8-bit computer
« Reply #2718 on: September 29, 2021, 02:57:17 pm »
You need to use that full project; its internal RS232 debugger will replace the one inside your GPU.

Increase its read and write port total by 1.
Raise the read and write port #2's priority to max.
Set the new port #2's read and write data widths to 8 bits each.

Okay, so I'm using the BrianHG_DDR3_DECA_RS232_DEBUG_TEST project (renamed to GPU_DECA_DDR3) and I'm using the donor project's _top file.  I've changed these lines:

Code: [Select]
parameter int        PORT_R_TOTAL            = 2,                // Set the total number of DDR3 controller read ports, 1 to 16 max.
parameter int        PORT_W_TOTAL            = 2,                // Set the total number of DDR3 controller write ports, 1 to 16 max.
parameter int        PORT_VECTOR_SIZE        = 8,                // Sets the width of each port's VECTOR input and output.

Hopefully that's correct?  What you've said implies that I can set different PORT_VECTOR_SIZEs for each R/W port - I'm hoping that's just a 'lost in translation' thing. ;)

How do I change the priority of the read and write ports?  This is confusing me:

Code: [Select]
parameter bit [2:0]  PORT_R_PRIORITY      [0:15] = '{  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1},
parameter bit [2:0]  PORT_W_PRIORITY      [0:15] = '{  2,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1},

Wire the write port to the Z80 bridge write -> gpu ram output to my DDR3 write channel #2.

This is where I'm going to need to pass through some buses and controls from GPU.sv up to GPU_DECA_DDR3_top.sv, I'm guessing, so I can wire the Z80_bridge into the DDR3_Controller?

Files attached for info.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2732
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #2719 on: September 29, 2021, 04:05:21 pm »
I'm planning on using a PCM5102 or something similar. 
That chip supports I2S, and left- and right-justified formats, just like pretty much any other audio DAC I've seen. You can pick pretty much any of them you find in stock (Cirrus Logic makes quite a few different P/Ns) - they're all input-compatible for the most part, and fundamentally work the same way.

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14465
  • Country: fr
Re: FPGA VGA Controller for 8-bit computer
« Reply #2720 on: September 29, 2021, 05:30:49 pm »
I'm planning on using a PCM5102 or something similar. 
That chip supports I2S, and left- and right-justified formats, just like pretty much any other audio DAC I've seen. You can pick pretty much any of them you find in stock (Cirrus Logic makes quite a few different P/Ns) - they're all input-compatible for the most part, and fundamentally work the same way.

Yep. There won't be any difference as far as I2S is concerned - just be aware that any 24-bit or higher DAC these days will take 32-bit data samples through I2S (64 bits of data per stereo frame), with the lower unused bits ignored. Some (now less common) 16-bit DACs may only take 16-bit samples (32 bits per frame), so regarding I2S, that would be your only incompatibility here. But if you stick to 24-bit or higher DACs, then it will be basically plug and play as far as I2S is concerned.

Differences may lie in how the chips are configured (hardware config via pins, software config via I2C or SPI...) and in the master clock, but those can be handled separately. As far as I remember, the PCM5102 (probably the 5102A, as the 5102 is now obsolete if I'm not mistaken) can generate the master clock internally from the I2S bit clock, so it's very easy to interface.
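To illustrate the framing described above - each 24-bit sample padded into a 32-bit slot, 64 bits per stereo frame - here is a hedged sketch of a serializer. All names (i2s_tx_sketch, sample_l, etc.) are illustrative and not from the GPU project, and the sketch ignores the one-bit-clock data delay that strict I2S specifies (so it is effectively left-justified framing with I2S word-select polarity - check the DAC's format table before relying on it):

```systemverilog
// Hypothetical sketch only: 24-bit samples in 32-bit slots, MSB first.
module i2s_tx_sketch (
    input  logic        i2s_bclk,   // bit clock, e.g. 64 x Fs
    input  logic [23:0] sample_l,   // left channel sample
    input  logic [23:0] sample_r,   // right channel sample
    output logic        i2s_lrclk,  // word select: 0 = left, 1 = right
    output logic        i2s_dout    // serial data out, MSB first
);
    logic [5:0]  bit_cnt = '0;      // 0..63, wraps naturally each frame
    logic [63:0] shift   = '0;      // one whole stereo frame

    always_ff @(negedge i2s_bclk) begin       // data changes on the falling edge
        if (bit_cnt == 0)
            // pad each 24-bit sample with 8 zero LSBs to fill its 32-bit slot
            shift <= { sample_l, 8'b0, sample_r, 8'b0 };
        else
            shift <= { shift[62:0], 1'b0 };   // shift out MSB first
        bit_cnt <= bit_cnt + 1'b1;
    end

    assign i2s_dout  = shift[63];
    assign i2s_lrclk = bit_cnt[5];            // low for bits 0-31 (left slot)
endmodule
```

The DAC samples i2s_dout on the rising bit-clock edge, so driving it from the falling edge as above gives half a bit period of setup and hold.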
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2732
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #2721 on: September 29, 2021, 07:54:47 pm »
Yep. There won't be any difference as far as I2S is concerned - just be aware that any 24-bit or higher DAC these days will take 32-bit data samples through I2S (64 bits of data per stereo frame), with the lower unused bits ignored. Some (now less common) 16-bit DACs may only take 16-bit samples (32 bits per frame), so regarding I2S, that would be your only incompatibility here. But if you stick to 24-bit or higher DACs, then it will be basically plug and play as far as I2S is concerned.
Typically this doesn't matter, because there's always quite a bit of "nothingness" at the tail end of each channel's sample value, so a higher-resolution DAC will simply see trailing zeros.

Differences may lie in how the chips are configured (hardware config via pins, software config via I2C or SPI...) and in the master clock, but those can be handled separately. As far as I remember, the PCM5102 (probably the 5102A, as the 5102 is now obsolete if I'm not mistaken) can generate the master clock internally from the I2S bit clock, so it's very easy to interface.
That stuff usually boils down to finding a table in the datasheet which tells you what values to assign to the config pins/config interface registers to get the mode and frequency you want. So it's a one-time task (unless you need to change these on the fly, which is quite rare) for a specific P/N.
The cool part is that some newer DACs (for example the CS4344) will auto-detect the sampling frequency, which means less configuration hassle.
« Last Edit: September 29, 2021, 08:09:23 pm by asmi »
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7733
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #2722 on: September 29, 2021, 10:03:34 pm »
I did not say change the port vector size.
Code: [Select]
// ************************************************************************************************************************************
// ****************  BrianHG_DDR3_COMMANDER configuration parameter settings.
parameter int        PORT_R_TOTAL            = 2,                // Set the total number of DDR3 controller read ports, 1 to 16 max.
parameter int        PORT_W_TOTAL            = 2,                // Set the total number of DDR3 controller write ports, 1 to 16 max.
parameter int        PORT_VECTOR_SIZE        = 16,               // Sets the width of each port's VECTOR input and output.

I said change the data_width to 8 for the read and write on port 2.
Code: [Select]
// PORT_'feature' = '{array a,b,c,d,..} Sets the feature for each DDR3 ram controller interface port 0 to port 15.
parameter bit [8:0]  PORT_R_DATA_WIDTH    [0:15] = '{  8,  8,128,128,128,128,128,128,128,128,128,128,128,128,128,128},
parameter bit [8:0]  PORT_W_DATA_WIDTH    [0:15] = '{  8,  8,128,128,128,128,128,128,128,128,128,128,128,128,128,128},
                                                            // Use 8,16,32,64,128, or 256 bits, maximum = 'PORT_CACHE_BITS'
                                                            // As a precaution, this will prune/ignore unused data bits and write masks bits, however,
                                                            // all the data ports will still be 'PORT_CACHE_BITS' bits and the write masks will be 'PORT_CACHE_WMASK' bits.
                                                            // (a 'PORT_CACHE_BITS' bit wide data bus has 32 individual mask-able bytes (8 bit words))
                                                            // For ports sizes below 'PORT_CACHE_BITS', the data is stored and received in Big Endian. 


And I said raise the priority of read and write port 2 to the max.

Code: [Select]
parameter bit [2:0]  PORT_R_PRIORITY      [0:15] = '{  1,  7,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1},
parameter bit [2:0]  PORT_W_PRIORITY      [0:15] = '{  2,  7,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1},
                                                            // Use 1 through 6 for normal operation.  Use 7 for above refresh priority.  Use 0 for bottom
                                                            // priority, only during free cycles once every other operation has been completed.
                                                            // Open row policy/smart row access only works between ports with identical
                                                            // priority.  If a port with a higher priority receives a request, even if another
                                                            // port's request matches the current page, the higher priority port will take
                                                            // precedence and force the ram controller to leave the current page.
                                                            // *(Only use 7 for small occasional access bursts which must take precedence above
                                                            //   all else, yet not consume memory access beyond the extended refresh requirements.)

These changes turn read and write port #2 into an 8-bit RAM port compatible with the Z80, and make sure the Z80 on port #2 has top read priority above all else.

(*** Note that port #2 is really index [ 1 ], while port #1, where the RS232 debugger is tied, is index [ 0 ].)

Don't forget to enable the write mask 'CMD_wmask' on port 2 so the Z80 actually achieves a write.

Also remember to remove all traces of your original RS232 debugger buried inside your GPU core.

And, just in case, these are all your DDR3 controller interface IO ports:
Code: [Select]
// ****************************************
// DDR3 Controller Interface Logic.
// ****************************************
logic                         CMD_R_busy          [0:PORT_R_TOTAL-1];  // For each port, when high, the DDR3 controller will not accept an incoming command on that port.
logic                         CMD_W_busy          [0:PORT_W_TOTAL-1];  // For each port, when high, the DDR3 controller will not accept an incoming command on that port.


logic                         CMD_write_req       [0:PORT_W_TOTAL-1];  // Write request for each port.

logic [PORT_ADDR_SIZE-1:0]    CMD_waddr           [0:PORT_W_TOTAL-1];  // Address pointer for each write memory port.
logic [PORT_CACHE_BITS-1:0]   CMD_wdata           [0:PORT_W_TOTAL-1];  // During a 'CMD_write_req', this data will be written into the DDR3 at address 'CMD_addr'.
                                                                       // Each port's 'PORT_DATA_WIDTH' setting will prune the unused write data bits.
logic [PORT_CACHE_BITS/8-1:0] CMD_wmask           [0:PORT_W_TOTAL-1];  // Write mask for the individual bytes within the 256 bit data bus.
                                                                       // When low, the associated byte will not be written.
                                                                       // Each port's 'PORT_DATA_WIDTH' setting will prune the unused mask bits.


logic [PORT_ADDR_SIZE-1:0]    CMD_raddr           [0:PORT_R_TOTAL-1];  // Address pointer for each read memory port.
logic                         CMD_read_req        [0:PORT_R_TOTAL-1];  // Performs a read request for each port.
logic [PORT_VECTOR_SIZE-1:0]  CMD_read_vector_in  [0:PORT_R_TOTAL-1];  // The contents of the 'CMD_read_vector_in' during a 'CMD_read_req' will be sent to the
                                                                       // 'CMD_read_vector_out' in parallel with the 'CMD_read_data' during the 'CMD_read_ready' pulse.

logic                         CMD_read_ready      [0:PORT_R_TOTAL-1];  // Goes high for 1 clock when the read command data is valid.
logic [PORT_CACHE_BITS-1:0]   CMD_read_data       [0:PORT_R_TOTAL-1];  // Valid read data when 'CMD_read_ready' is high.
logic [PORT_VECTOR_SIZE-1:0]  CMD_read_vector_out [0:PORT_R_TOTAL-1];  // Returns the 'CMD_read_vector_in' which was sampled during the 'CMD_read_req' in parallel
                                                                       // with the 'CMD_read_data'.  This allows for multiple post reads where the output
                                                                       // has a destination pointer.
logic [PORT_ADDR_SIZE-1:0]    CMD_read_addr_out   [0:PORT_R_TOTAL-1];  // A return of the address which was sent in with the read request.


logic                        CMD_R_priority_boost [0:PORT_R_TOTAL-1];  // Boosts the port's 'PORT_R_PRIORITY' parameter by a weight of 8 when set.
logic                        CMD_W_priority_boost [0:PORT_W_TOTAL-1];  // Boosts the port's 'PORT_W_PRIORITY' parameter by a weight of 8 when set.

Remember, all control inputs need to be wired to a control, or a 0 or 1.
Yes, you need to pass some of the outputs from the Z80 bridge to the top of your GPU module to feed this new TOP, where the DDR3 controller and new RS232 debugger exist.
Also, you may need to change where I wired the new RS232 debugger's RXD/TXD to your chosen IOs.
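As a sketch of the "every control input wired to a control, a 0 or a 1" rule: the unused features on the new Z80 port [ 1 ] can simply be cleared inside the non-reset branch of the CMD_CLK clocked block (assuming they are not driven anywhere else - the net names follow the interface listing above, but where exactly they belong is the reader's call):

```systemverilog
// Hypothetical tie-offs for the unused controls on the new Z80 port [1],
// placed in the non-reset branch of the always_ff @(posedge CMD_CLK) block.
CMD_read_vector_in[1]   <= '0   ; // read vector feature unused on this port
CMD_R_priority_boost[1] <= 1'b0 ; // priority is already 7 via PORT_R_PRIORITY
CMD_W_priority_boost[1] <= 1'b0 ; // no dynamic write priority boost needed
```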
« Last Edit: September 29, 2021, 10:39:45 pm by BrianHG »
 

Offline nockieboyTopic starter

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: FPGA VGA Controller for 8-bit computer
« Reply #2723 on: September 30, 2021, 08:07:39 am »
And I said raise the priority of read and write port 2 to the max.

Code: [Select]
parameter bit [2:0]  PORT_R_PRIORITY      [0:15] = '{  1,  7,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1},
parameter bit [2:0]  PORT_W_PRIORITY      [0:15] = '{  2,  7,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1},
                                                            // Use 1 through 6 for normal operation.  Use 7 for above refresh priority.  Use 0 for bottom
                                                            // priority, only during free cycles once every other operation has been completed.
                                                            // Open row policy/smart row access only works between ports with identical
                                                            // priority.  If a port with a higher priority receives a request, even if another
                                                            // port's request matches the current page, the higher priority port will take
                                                            // precedence and force the ram controller to leave the current page.
                                                            // *(Only use 7 for small occasional access bursts which must take precedence above
                                                            //   all else, yet not consume memory access beyond the extended refresh requirements.)

These changes turn read and write port #2 into an 8-bit RAM port compatible with the Z80, and make sure the Z80 on port #2 has top read priority above all else.

Ahh okay - didn't understand how the PORT_x_PRIORITY array worked (i.e. didn't realise it was { PORT 1, PORT 2, PORT 3, etc... }).

Don't forget to enable the write mask 'CMD_wmask' on port 2 so the Z80 actually achieves a write.

So I'm going to need to edit this section:

Code: [Select]
// Latch the read data from port 0 on the CMD_CLK clock.
always_ff @(posedge CMD_CLK) begin

   if (RST_OUT) begin // RST_OUT is clocked on the CMD_CLK source.

      for (int i = 0; i < PORT_R_TOTAL; i++) begin // Clear all the read requests.
         CMD_read_req[i]         <= 0 ;
         CMD_raddr[i]            <= 0 ;
         CMD_read_vector_in[i]   <= 0 ;
         CMD_R_priority_boost[i] <= 0 ;
      end

      for (int i = 0; i < PORT_W_TOTAL; i++) begin // Clear all the write requests.
         CMD_write_req[i]        <= 0 ;
         CMD_waddr[i]            <= 0 ;
         CMD_wdata[i]            <= 0 ;
         CMD_wmask[i]            <= 0 ;
         CMD_W_priority_boost[i] <= 0 ;
      end

   end else begin

      // Wire the 8 bit write port.  We can get away with crossing a clock boundary with the write port.
      // Since there is no busy for the RS232 debugger write command, write port[0]'s priority was made 7 so it overrides everything else.

      CMD_waddr[0]     <= (PORT_ADDR_SIZE)'(DB232_addr)     ; // Set the RS232 write address.
      CMD_wdata[0]     <= (PORT_CACHE_BITS)'(DB232_wdat)    ; // Set the RS232 write data.
      CMD_wmask[0]     <= (PORT_CACHE_BITS/8)'(1)           ; // 8 bit write data has only 1 write mask bit.

      DB232_wreq_dly   <=  DB232_wreq                       ; // Delay the write request as we are crossing clock boundaries and we want the
                                                              // address and data setup 1 clock early.  We know this can work as the RS232 debugger module
                                                              // holds the data and address for at least 1 clock.
      CMD_write_req[0] <=  DB232_wreq_dly && !CMD_W_busy[0] ; // 1 clock delayed write request.

      // Wire the 8 bit read port address.  When changing clock domains, we rely on a trick where the RS232 debugger keeps the
      // DB232_rreq high until it receives a result from the CMD_read_ready.  BrianHG_DDR3_CONTROLLER_top will see this as
      // many continuous requests at the same address and provide a continuous CMD_read_ready result as the internal
      // smart cache has only a clock cycle delay once the initial DDR Ram has been read.

      DB232_rreq_dly   <=  DB232_rreq                       ; // Create a delayed read request.  Same idea as above...
      CMD_read_req[0]  <=  DB232_rreq_dly && !CMD_R_busy[0] ; // Read request.
      CMD_raddr[0]     <= (PORT_ADDR_SIZE)'(DB232_addr)     ; // Set the RS232 read address.

      if (CMD_read_ready[0]) begin           // If the read data is ready
         p0_data <= 8'(CMD_read_data[0]) ;   // Clean latch the read data.
         p0_drdy <= 1 ;                      // Set the data ready flag.
      end else begin
         p0_drdy <= 0 ;
      end

   end // !reset

end // @CMD_CLK

I've already modified the RESET function at the top to clear ALL ports, not just Port 1, as we've got two ports now.  Hopefully that's right.

I'm going to have to duplicate the section for non-reset conditions (after the 'end else begin' midway through the code snippet) to wire the second port to the Z80_bridge.  If I insert something like this?

Code: [Select]
      // Wire the 8 bit write port.  We can get away with crossing a clock boundary with the write port.

      CMD_waddr[1]     <= (PORT_ADDR_SIZE)'(Z80_addr)       ; // Set the Z80 write address.
      CMD_wdata[1]     <= (PORT_CACHE_BITS)'(Z80_wdat)      ; // Set the Z80 write data.
      CMD_wmask[1]     <= (PORT_CACHE_BITS/8)'(1)           ; // 8 bit write data has only 1 write mask bit.

      Z80_wreq_dly     <=  Z80_wreq                         ; // Delay the write request as we are crossing clock boundaries and we want the
                                                              // address and data setup 1 clock early.  We know this can work as the RS232 debugger module
                                                              // holds the data and address for at least 1 clock.
      CMD_write_req[1] <=  Z80_wreq_dly && !CMD_W_busy[1]   ; // 1 clock delayed write request.

      Z80_rreq_dly     <=  Z80_rreq                         ; // Create a delayed read request.  Same idea as above...
      CMD_read_req[1]  <=  Z80_rreq_dly && !CMD_R_busy[1]   ; // Read request.
      CMD_raddr[1]     <= (PORT_ADDR_SIZE)'(Z80_addr)       ; // Set the Z80 read address.

      if (CMD_read_ready[1]) begin           // If the read data is ready
         p1_data <= 8'(CMD_read_data[1]) ;   // Clean latch the read data.
         p1_drdy <= 1 ;                      // Set the data ready flag.
      end else begin
         p1_drdy <= 0 ;
      end

Would that be okay or have I misunderstood something?

Also remember to remove all traces of your original RS232 debugger buried inside your GPU core.

I'll be honest, I'm not finding this easy working from HDL files instead of the graphical design we used previously for the EP4CE10 version of the GPU. :-/O

And, just in case, these are all your DDR3 controller interface IO ports:
...
Remember, all control inputs need to be wired to a control, or a 0 or 1.
Yes you need to pass some of the output from the Z80 bridge to the top of your GPU module to feed this new TOP where the DDR3 controller and new RS232 debugger exists.
Also, you may need to change where I wired the new RS232 debugger's RXD/TXD to your chosen IOs.

Righto.  They need to be interfaced to the Z80_bridge, so are we bypassing the data_mux_geo as the DDR3_Controller has two ports?

I actually don't have RXD/TXD set up on the current DECA interface card.  I've ordered an updated PCB that brings out all the spare IOs to a header and also provides a TXD/RXD port for the debugger, both raw and via a CH340 if I don't have a spare RS232/TTL handy.  In the meantime, I can just hotwire the RXD/TXD IOs up to a loose header and hot-glue it to the PCB.
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7733
  • Country: ca
Re: FPGA VGA Controller for 8-bit computer
« Reply #2724 on: September 30, 2021, 08:59:11 am »
The Z80 write/output still goes to the MUX so you can write to FPGA core ram in parallel with the DDR3.

Yes, my multiport has a separate read and write port.
IE, if the Z80 bridge has a single address output, you will wire that to both the read and write addr [ 1 ].
Next you have the read and write req ports, IE read / write enable.

Oops, I forgot - you are running your code on a separate PLL.

Ok, we've got a new problem.  I was going to say you did not need the delays I have in my example code, as they were put there for clock domain crossing.  This really craps everything up.

Your current core needs 125 and 250MHz, and that is the speed coming out of your Z80 bridge.  But running the DDR3 at 400MHz means my IO ports are running at 100MHz - a slower speed.  To be 1:1 compatible, you need to run the ram at 250MHz in half mode, or 500MHz in quarter mode.

Ok, let's try the overclock method.
This now means that you need to remove your PLL in your GPU and use the following 2 clocks coming out of my design to replace them.

First change this line:
Code: [Select]
parameter int        CLK_IN_MULT             = 32,               // Multiply factor to generate the DDR MTPS speed divided by 2.
parameter int        CLK_IN_DIV              = 4,                // Divide factor.  When CLK_KHZ_IN is 25000,50000,75000,100000,125000,150000, use 2,4,6,8,10,12.
to:
Code: [Select]
parameter int        CLK_IN_MULT             = 40,               // Multiply factor to generate the DDR MTPS speed divided by 2.
parameter int        CLK_IN_DIV              = 4,                // Divide factor.  When CLK_KHZ_IN is 25000,50000,75000,100000,125000,150000, use 2,4,6,8,10,12.

This will make the DDR3 run at 500MHz.

Disable your PLL in your GPU, and these will now be your clock signals:
DDR3_CLK    = 500MHz.
DDR3_CLK_50 = 250MHz  -> This will now feed your GPU 'clk_2x'.
DDR3_CLK_25 = 125MHz  -> This will now feed your GPU 'clk'.
DDR3_CLK_25 = 125MHz  -> This will now feed your GPU 'clk_2x_phase'.  *** You may need to invert this one.
CLK_IN      = 50MHz   -> This will feed your GPU 'com_clk'.
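The mapping above could be wired up something like this (a sketch only - the gpu_* net names are placeholders for whatever the GPU module's clock inputs are actually called):

```systemverilog
// Hypothetical clock hookup in the new top level after removing the GPU's PLL.
wire gpu_clk_2x       =  DDR3_CLK_50 ; // 250 MHz -> 'clk_2x'
wire gpu_clk          =  DDR3_CLK_25 ; // 125 MHz -> 'clk'
wire gpu_clk_2x_phase = ~DDR3_CLK_25 ; // 125 MHz -> 'clk_2x_phase', inversion may be needed
wire gpu_com_clk      =  CLK_IN      ; //  50 MHz -> 'com_clk'
```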

As for your code, it should look like this:
Code: [Select]
      CMD_waddr[1]     <= (PORT_ADDR_SIZE)'(gpu_addr)   ; // Set the Z80 write address.
      CMD_wdata[1]     <= (PORT_CACHE_BITS)'(gpu_wdata) ; // Set the Z80 write data.
      CMD_wmask[1]     <= (PORT_CACHE_BITS/8)'(1)       ; // 8 bit write data has only 1 write mask bit.
      CMD_write_req[1] <=  gpu_wr_ena                   ;

      CMD_read_req[1]  <=  gpu_rd_req                   ;
      CMD_raddr[1]     <= (PORT_ADDR_SIZE)'(gpu_addr)   ; // Set the Z80 read address.

      gpu_rd_rdy       <=  CMD_read_ready[1]            ;
      gpu_rData[7:0]   <=  8'(CMD_read_data[1])         ;
*** The net labels I listed above are what you called them on the Z80 bridge.


Now, I have not touched the 'R/W_busy' for now; in combination with waiting for the 'CMD_read_ready[ 1 ]', this will drive the Z80 hold.  This will be added to the Z80 bridge code.

Now, about overclocking the DDR3 to 500MHz: the true goal is to get it back down to 400MHz and make an asynchronous VGA section with its own 25MHz video pixel clock.  This means your GPU core will slow down to 100MHz and 200MHz, but just the final VGA output section will have its own 25MHz clock from its own PLL, done the same way as in my random bouncing ellipse demo.

I'm sorry about the missing graphic view you are used to.  Beginning that way has made you unfamiliar with the module net names.
« Last Edit: September 30, 2021, 09:07:09 am by BrianHG »
 

