See the 45 degree angle hatching...
Also notice the letters are larger...
Remember my line: 0.707:1, or 1:1.41421... (See if you (or anyone else here) could figure out why this magic down-sample figure is really important...)
Remember this: A² + B² = C²...
The checker board issue, the letters being larger, my fancy numbers, that Pythagorean theorem, and the right scaling settings can all be put together to solve the 45 degree issue perfectly.
Ah, of course... If the image being blitted has dimensions AxB, when it's rotated 45 degrees its side A will be 1.41421 times longer due to good old Pythagoras and his squared hypotenuse. Okay, no worries, I'll just need to downscale the blits by 0.707 to 1 to remove the chequer-boarding and make them less overweight.
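For anyone following along, a quick back-of-the-envelope check of that figure. The N:4096 scaler steps are assumed from the scaler discussion later in this thread, so treat the exact register value as illustrative only:

```c
#include <math.h>
#include <stdio.h>

/* Sketch: the down-scale factor that cancels the sqrt(2) growth of a
   45-degree rotated blit. The 4096 division steps are assumed from the
   scaler discussion elsewhere in the thread. */
int main(void)
{
    double shrink = 1.0 / sqrt(2.0);            /* 0.7071... = 0.707:1       */
    int    steps  = 4096;                       /* assumed scaler divisions  */
    int    n      = (int)(shrink * steps + 0.5);

    printf("down-scale ratio : %.5f : 1\n", shrink);   /* 0.70711 : 1        */
    printf("scaler setting   : %d : %d\n", n, steps);  /* 2896 : 4096        */
    return 0;
}
```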
Hitting the sack now - will hopefully have more time tomorrow.
Still getting blit corruption if I draw the quad before I run the blit tests. The character blits are fine after the quad is drawn now, but the half-screen blit is still badly corrupted.
Could this be due to an error in my test code, or something still needs to be tied up with the quad function in HDL?
That corruption looks like a bad screen address & bitmap width setting.
Remember, reading a font or copying a half screen uses a different source address and bitmap width + a copy width and height. Like before when the text was corrupt, you may have missed or got one of these settings backwards.
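To make that checklist concrete, here is a rough sketch - every field name and number below is made up for illustration, not the GPU's real register set - of what has to change between the font blit and the half-screen copy:

```c
#include <stdint.h>

/* Illustration only: hypothetical settings, not the GPU's actual API.
   The point is that each blit carries its own source address, source bitmap
   width, and copy width/height - reusing the font settings for a half-screen
   copy (or vice versa) produces exactly this kind of corruption. */
struct blit_setup {
    uint32_t src_addr;       /* where the source pixels live in GPU ram  */
    uint16_t src_bmp_width;  /* width of that source bitmap, in pixels   */
    uint16_t copy_width;     /* pixels per line to copy                  */
    uint16_t copy_height;    /* lines to copy                            */
    uint32_t dst_addr;       /* destination address                      */
    uint16_t dst_bmp_width;  /* width of the destination bitmap          */
};

/* Font character: small source bitmap, small copy window.               */
const struct blit_setup font_blit = {
    .src_addr = 0x0000, .src_bmp_width = 8,   .copy_width = 8,   .copy_height = 16,
    .dst_addr = 0x8000, .dst_bmp_width = 640
};

/* Half-screen copy: the source IS the screen, so the widths must match it. */
const struct blit_setup half_screen_blit = {
    .src_addr = 0x8000, .src_bmp_width = 640, .copy_width = 640, .copy_height = 240,
    .dst_addr = 0x8000 + 640u * 240u, .dst_bmp_width = 640
};
```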
Just had another little play - looks like sending command 0x0900 isn't turning off scaling?
When I run the test initially, all is fine. The X scales, program exits, great. When I run the test again immediately after, the character blits are all scaled by 25x, despite turning off scaling before the test program exits, and when the test program starts... (command 0x0900). Any ideas?
No, 0x0900 does nothing. You need to send 0x0903 so that the xy[0/1] regs take effect: put the 0's into the xy[0/1] regs, then send a 0x0903 so both regs [0/1] are passed to the scaler controls.
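In case it helps the next person who hits this, here is the sequence sketched in C. The helper names are placeholders for whatever the test program already uses to load an xy reg and push a command word; only the 0x0903 value and the order come from the reply above:

```c
/* Hypothetical helpers standing in for the test program's own port writes. */
static void gpu_set_xy(int reg, int value)     { (void)reg; (void)value; }
static void gpu_send_command(unsigned int cmd) { (void)cmd; }

void scaling_off(void)
{
    gpu_set_xy(0, 0);            /* 0 = scaling off / 1:1, per the post      */
    gpu_set_xy(1, 0);            /* ...and the same for xy[1]                */
    gpu_send_command(0x0903);    /* not 0x0900: 0x0903 passes both xy[0/1]   */
                                 /* regs through to the scaler controls      */
}
```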
Right at the bottom of the code, you need to change the 'drawLine_arc' from the current drawline into the Bézier curve just like we did before, but make it so that it can be easily translated into Verilog. The final code should look almost identical to the existing draw line, just with the added +/- arc curve correction.
Touch up the 'Geoarc.bas' and post it back.
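While we wait for Geoarc.bas, here is the flavour of arithmetic that translates well, sketched in C rather than FreeBASIC. To be clear, this is not the +/- arc correction bolted onto the existing drawline described above - it is a plain forward-difference quadratic Bézier, and the names, step count and fixed-point width are my own choices - but it shows the add-only per-step work a Verilog linegen can absorb:

```c
#include <stdio.h>

/* plot() stub - swap in the tester's pixel write */
static void plot(int x, int y) { printf("%d,%d\n", x, y); }

/* Quadratic Bezier from (x0,y0) to (x2,y2), pulled toward control point
   (x1,y1), walked with integer forward differences: after setup it is just
   two adds per axis per step. STEPS and the fixed-point scale are arbitrary. */
#define STEPS 256                            /* 2^8 points along the curve    */
#define ONE   65536L                         /* 16 bit fixed-point fraction   */

void draw_bezier_fd(int x0, int y0, int x1, int y1, int x2, int y2)
{
    /* polynomial form B(t) = a*t^2 + b*t + c, per axis */
    long ax = (long)x0 - 2L * x1 + x2,  bx = 2L * (x1 - x0);
    long ay = (long)y0 - 2L * y1 + y2,  by = 2L * (y1 - y0);

    /* running value, first difference, constant second difference */
    long fx  = x0 * ONE,                fy  = y0 * ONE;
    long dx  = bx * (ONE / STEPS) + ax * (ONE / STEPS / STEPS);
    long dy  = by * (ONE / STEPS) + ay * (ONE / STEPS / STEPS);
    long ddx = 2L * ax * (ONE / STEPS / STEPS);
    long ddy = 2L * ay * (ONE / STEPS / STEPS);

    for (int i = 0; i <= STEPS; i++) {
        plot((int)(fx / ONE), (int)(fy / ONE));
        fx += dx;  dx += ddx;                /* add-only update per step */
        fy += dy;  dy += ddy;
    }
}

int main(void)
{
    draw_bezier_fd(0, 0, 60, 0, 60, 60);     /* rough quarter-arc */
    return 0;
}
```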
As my province is now back on emergency quarantine, I have a little time to finish the ellipse, so I'm setting up a FreeBASIC tester for the current linegen to render a diamond based on mouse coordinates, with the framework to allow you to play with a second parallel linegen which will be used to generate the arcs of the ellipse. Give me 30-45 minutes to upload it.
Again, the coding will need to easily translate to the current Verilog linegens. Once done, there won't be much left to do within the limitations of your current FPGA.
Though, you can drop the MAGGIE layers from 6 down to 5 or 4 to release a bunch of free space.
I'm thinking you should change the command structure into 2 byte, 3 byte and 4 byte commands.
Commands 0-127 would send 2 byte controls like now.
Commands 128-191 (3 bytes) would directly change a 16 bit integer setting, directly feed any control, or set the x/y[#] regs instead of piping everything through the XY regs. The XY regs, with room for up to 64 of them, will be 16 bit each instead of 12 bit. (IE the Z80 sends 3 bytes total.)
Commands 192-223 would send 24 bit integer commands. (IE the Z80 sends 4 bytes total.)
Commands 224-255 would send 32 bit integer commands. (IE the Z80 sends 5 bytes total.)
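A rough Z80-side view of that packing. The range split follows the list above; the byte order, the 8 bit payload on the 2 byte commands, and the send_byte stub are my assumptions:

```c
#include <stdint.h>
#include <stdio.h>

static void send_byte(uint8_t b) { printf("%02X ", (unsigned)b); }  /* stub port write */

/* Emit one variable-length command: command byte first, then the data bytes
   implied by the command's range. */
void gpu_command(uint8_t cmd, uint32_t value)
{
    int data_bytes;

    if      (cmd < 128) data_bytes = 1;   /* 2 byte control, like now          */
    else if (cmd < 192) data_bytes = 2;   /* 3 byte command: 16 bit integer    */
    else if (cmd < 224) data_bytes = 3;   /* 4 byte command: 24 bit integer    */
    else                data_bytes = 4;   /* 5 byte command: 32 bit integer    */

    send_byte(cmd);
    for (int i = data_bytes - 1; i >= 0; i--)        /* assume MSB first */
        send_byte((uint8_t)(value >> (8 * i)));
}

int main(void)
{
    gpu_command(0x05, 0x12);          /* 2 byte control                      */
    gpu_command(0x90, 0xFFFE);        /* 16 bit write straight into a reg    */
    gpu_command(0xE0, 0x12345678);    /* full 32 bit integer                 */
    printf("\n");
    return 0;
}
```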
You will no longer need to convert all your 16 bits into 12 bits and shift part of the remainder into the command's LSBs, or into the next Y register for the 24 bit ints.
This way, you may increase all the 12 bit regs to 16 bit, making your new XY coordinates go from -32768 to 32767, as well as a scaler with a 16 bit scale of N:65536, 16x the current 4096 division steps.
Also, you will now have access to many more x/y regs than just 4 if you need them.
With room for 64 x 16 bit integer regs, or even 32 x 32 bit integers, it is now feasible to implement a 32 bit ALU with full multiply/divide & add/sub between the regs. With so many spare commands in the first 128 to direct things like holding offset and scale factors for the line drawing engines, high quality accelerated geometry graphics becomes possible.
The 32 x 32 bit integers will allow for true floating point accelerated geometry, though we are getting into the realm of how a Z80 can feed all of this. Then again, you can feed this command pipe from the memory contents of the GPU ram itself.
Maybe just use 2 byte and 4 byte command modes since the FIFO pipe and GPU ram is organized in 16 bits.
To send a 32 bit word, you would need to transmit 2x 4 byte commands, or we can scrap the 32 bit ALU and just use 24 bit, limiting us to +/- 8 million range integers.
So which is better - the TLV62130A or the AOZ1284? I'd be using them to provide a 5V and 3.3V rail to the entire uCOM system, as well as the GPU - which would have more chips supplying the 1.2 and 2.5V rails from the 5V rail. The uCOM draws about 120mA without the GPU.
The TI has far more suppliers, though it's double the price of the AOZ1284. Use it only for the 3.3v & VCC 1.8v core. For the 2.5v PLL you may use a cheap linear 50ma regulator in a SOT23 package. For the VCC analog for the DAC you may also use a 100ma regulator. The TI part doesn't require a diode on the output and its inductor is far cheaper and smaller at 2.2uH 4 amp instead of the AOZ1284's required 10uH 5 amp inductor.
If I were you, I would look into the latest KiCad. It's free and open source and it may already have the Cyclone V component in its library, or an online library. This is the biggest hurdle for you if you are worried about mistakes.
The Cyclone V schematics I sent you will tell you how to hook up the JTAG/Active serial programming ports & configuration lines and filters for the analog PLL core voltages, unless you can find an existing online Cyclone V project which you can load and edit.
Okay, I've spent a little time researching and designing the power rails for the Cyclone V E board using the TI part you recommended. If you have five seconds spare, would really appreciate your thoughts. I'll make a start on the clock, configuration and JTAG/AS interface next.
I've never got on with KiCAD - found its UI to be a brick wall to learning more about how to use it, so I started out with DipTrace which spoiled me, really. It's free for up to 500 pin designs, so I quickly outgrew it and moved on to EasyEDA (as you know), which is my go-to design tool now (and probably not too dissimilar from KiCAD, so I'm aware of the irony!) It has Cyclone V parts on it already, fortunately.
Sounds like a big improvement in capability, especially as we're going to be able to leverage that extra capability when I get a Cyclone V board up and running.
Funny you should mention an ALU. I've been wondering about an FPU core. At the moment I have no idea how an FPU integrates with and is utilised by the old processors like the 68020 etc, but I wonder if it would be a benefit to the Z80? There were a couple of 3D (wireframe) games for the old 8-bit computers (Elite and Starglider, to name but two), and they managed that all on the Z80 processor. Having hardware that could perform the floating point maths and matrix transformations would surely be a big boost?
No issue with sending 32-bit words, but I'm probably going to have to move from the current method of communicating with the GPU via IO ports to loading the commands/data into GPU RAM and letting the GPU off the leash to work on the command list. Interacting with the GPU via IO is expensive time-wise due to the extra WAIT-state the Z80 inserts and it can only send a byte at a time. A memory interface would speed things up slightly, I guess.
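Something like this is what I have in mind, very loosely - the window address, list layout and "go" register below are all placeholders I've invented for the sketch, nothing here is the GPU's actual format:

```c
#include <stdint.h>

/* Hypothetical command-list submission: the Z80 assembles a batch of command
   words in GPU ram, then pokes one "go" register so the GPU fetches the list
   itself instead of being spoon-fed a byte at a time through the IO port
   (and its extra WAIT state). Addresses and layout are made up. */
#define GPU_RAM_BASE   ((volatile uint16_t *)0xC000)  /* assumed shared window */
#define GPU_LIST_START 0x0100                         /* word offset of list   */
#define GPU_GO_REG     0x0000                         /* word offset of "go"   */

void submit_list(const uint16_t *cmds, uint16_t count)
{
    volatile uint16_t *ram = GPU_RAM_BASE;

    for (uint16_t i = 0; i < count; i++)              /* copy the batch        */
        ram[GPU_LIST_START + 1 + i] = cmds[i];
    ram[GPU_LIST_START] = count;                      /* word count header     */
    ram[GPU_GO_REG]     = GPU_LIST_START;             /* kick the GPU          */
}
```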
So far so good. Remember, the C8 variant is the cheapest, see here:
https://lcsc.com/product-detail/CPLD-FPGA_Altera-5CEBA2F17C8N_C568996.html
Just remember, reserve the high-speed IO banks for wiring to the DDR3 ram & for directly driving the HDMI out. Everything else is almost do-as-you-please, but I would still sector off sections for the Z80 and the analog VGA (make it close to the high speed IO bank containing the HDMI outputs). Look in the data sheet: the 2 top & 2 bottom IO banks are the high speed ones and the 2 left and 2 right ones are the slower ones. DDR3 ram requires its IO bank to run at a lower supply voltage, and that bank needs a dedicated PLL differential output in it to drive the DDR3 clock.
Instead of a command port & a command structure, we can make every word and control a memory address, but then you will no longer have an input FIFO. Every sent command will need to wait for the last draw command to complete before you can touch any variables. This will slow down the Z80 when drawing multiple large screen elements and blits.
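The cost of dropping the FIFO, spelled out as a sketch - the status address and busy bit are invented for illustration:

```c
#include <stdint.h>

/* With every control mapped to a memory address and no FIFO, the Z80 must
   poll a busy/status flag before touching any drawing variable. */
#define GPU_STATUS  (*(volatile uint8_t *)0xFF00)   /* assumed status byte   */
#define GPU_BUSY    0x01                            /* assumed "drawing" bit */

void wait_gpu_idle(void)
{
    while (GPU_STATUS & GPU_BUSY)
        ;   /* spin - dead Z80 time on every large screen element or blit */
}
```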
That's not a bad price at all - I've been looking at sourcing the A4 variant from Mouser, which is over £37 and not something I want to buy for an untested board or process (would be an expensive way to learn to solder BGA if I'm going to make mistakes!)
I suppose the A2/A4 versions of the Cyclone V are interchangeable on the same board if it's designed with that in mind? (EDIT: Yes, they are.) Thinking back to previous comments about the Cyclone IV CE6/CE10, I wonder if the A2/A4 are physically identical dies, just marketing spin?
Okay, so DDR3 memory (I was looking at DDR for some reason) - could I stick a chip on the back of the PCB somewhere?
What about this ADV7513, though? It seems like it would be an excellent replacement for the TFP410 for video output, and it states it does HDMI, which incorporates audio into the data stream. That would be a big win for me if I could use one of those (I could scrub the audio DACs from the BOM and at least 10 IOs, as I could just output audio to the TV via I2S) - but the datasheet is short on information regarding using it; do I need to get a licence for HDMI, or is the licence more to do with tx-ing protected data (which I have no interest in doing)?