Author Topic: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer. (Read 36347 times)

nockieboy · « **on:** December 22, 2022, 10:35:49 am »

Hi everyone,

This thread is a branch off from another thread here, which is itself a branch from a (large!) thread here.

TLDR for the large original thread; I was almost brand-new to electronics and had built a DIY 8-bit computer on a breadboard and had transitioned to 2-layer PCB design, making the computer as a stack of cards - each card with its own function (CPU, memory, IDE, power/serial etc.), and I needed a video card for the system. That long thread culminates in the production of some serious SystemVerilog HDL to produce an HDMI-compatible graphics card for my computer, with 2D graphics functions, full SD-card read/write, a massive amount of work by BrianHG to produce an excellent DDR3 memory controller, programmable sound generators, etc., all whilst teaching me about FPGAs and HDL.

The FPGA line used were from Intel/Altera - starting with the Cyclone IV and moving to the MAX 10M50 on the Arrow DECA development board.

I'm now wanting to move across to Xilinx, mostly to test the design on a different platform, but also because I can't seem to find a reasonable upgrade path from the Intel 10M50 to something around the 75K LE mark on Mouser that won't take 4 years to be made and delivered and a mortgage to afford.

Following is part of the previous thread to help kick this one off:

Quote from: nockieboy on December 22, 2022, 09:39:41 am

Quote from: asmi on December 22, 2022, 07:04:54 am
As for your project, I would actually recommend going for a FGG484 package - it will allow implementing a 32 bit DDR3@400 MHz interface to give you additional room to grow in terms of memory bandwidth, it also has 4 multi-gigabit transceivers in case you want to implement some kind of serial interface - like SATA, PCI Express (say for NVMe storage, or some extension slot for future upgrades), DisplayPort, high-bandwidth HDMI (above 1080p@60), or something else. Speed grade 2 devices and higher can go as high as 6.6 Gbps per transceiver.
The bigger packages are more enticing for the very reasons you've mentioned, but again I've never soldered a BGA before and was a little concerned I was making too big a leap from 2-layer boards and QFP/QFNs to 6-layers with 484-BGAs. Interestingly, every step 'forward/smaller' in the SMD world (with the exception of discretes - 0402 is a little uncomfortable for me without more practice) has gotten easier. Those E-QFP144 Cyclone IVs were a lot harder than QFNs to solder, for example - I'm hoping BGA will be the same (with the liberal application of heat).

Quote from: asmi on December 22, 2022, 07:04:54 am
Finally, these Artix parts get pretty toasty, so you will need to provision for some sort of heatsink, or maybe even a small fan to help keep things from overheating. Remember, unlike CPUs, FPGAs don't have built-in overheating protection (unless you implement one yourself, that is), and will happily fry themselves if you are not careful.
Good point, hadn't considered that. That will have a direct affect on the PCB design. Instead of going for an intermediate 'Beaglebone-compatible' design which I can plug into the DECA interface (and thus sits at the bottom of the stack with no room for cooling), I might just design a specific board for my DIY computer and include the 3.3V-5V translation on it so it will just go onto the computer stack without an interface board. That way it can sit at the top of the stack and I can mount a passive heatsink, ducted fan or nitrogen-cooled reservoir as necessary.

Quote from: asmi on December 22, 2022, 07:04:54 am
But it's kind of offtopic in this thread, so if you want to discuss this further, let's find a more suitable venue.

...and here's that thread.

So I'm looking at using the XC7A100T-2FGG484C as suggested by asmi. I'm assuming Vivado will support creating bitstreams for this FPGA without the need to pay for a licence? As far as I could tell from their website, it does. Having to buy a licence is a deal-breaker for me as this is a learning journey, I won't be making anything for profit.

The PCB for this board will either be custom to fit my computer's stack headers, or potentially could be more of a generic development card design with Beaglebone headers - I am not fully set on either yet as each has its pros and cons. If I go the custom-board route, it will be powered from the computer's power supply in the stack headers - so power consumption may become a concern as the power supply was never designed with FPGAs in mind (I didn't even know they existed when I slapped a through-hole 7805 and AMS1117-3.3 on the power card), it'll look neat and I can fit a heatsink if required to the FPGA. The generic development card route will give me more room to fit peripherals on the board and its own, separate, power supply so that will negate concerns about power consumption, it'll be a board design that others may be interested in and make use of, but it won't fit the ergonomics of the computer stack (which, to be fair, is hardly a problem at all as I currently have a DECA sticking out at right-angles to the stack on the bottom at the moment anyway).

EDIT: As the design progressed, it has become a fully standalone board, with no intention to plug into the old uCOM stack I was using. Instead, it has become a board designed to run soft-core CPUs as full computer systems.

asmi · « **Reply #1 on:** December 22, 2022, 03:00:11 pm »

Maybe for long term it would be best to design a carrier board of sorts which would host both FPGA module, and your existing stack side-by-side? See a picture in attachment. This way you can upgrade/iterate over both parts separately from each other.

1. FPGA module would be the most "high-tech" part of a design, it would use high-speed connectors on the bottom side to connect to a carrier. The module itself would host the bare minimum required for FPGA to work - FPGA itself, QSPI flash for the bitstream, DDR memory devices and a power delivery system.

2. Carrier will provide power for the whole system, it would also host high-speed interface connectors (PCIE, DisplayPort, HDMI, 1G Ethernet, USB, whatever), and would also connect FPGA module to extension stack.

You can use any Artix-7 part with a free license of Vivado/Vitis.

I think this thread belongs in a FPGA section as it's more about FPGA than it is about CAD. Maybe ask moderators to move it?

nockieboy · « **Reply #2 on:** December 22, 2022, 08:45:16 pm »

Quote from: asmi on December 22, 2022, 03:00:11 pm

Maybe for long term it would be best to design a carrier board of sorts which would host both FPGA module, and your existing stack side-by-side? See a picture in attachment. This way you can upgrade/iterate over both parts separately from each other.

1. FPGA module would be the most "high-tech" part of a design, it would use high-speed connectors on the bottom side to connect to a carrier. The module itself would host the bare minimum required for FPGA to work - FPGA itself, QSPI flash for the bitstream, DDR memory devices and a power delivery system.

2. Carrier will provide power for the whole system, it would also host high-speed interface connectors (PCIE, DisplayPort, HDMI, 1G Ethernet, USB, whatever), and would also connect FPGA module to extension stack.

Ah, I like this idea. You mean something like this?

This is a Lattice FPGA available on AliExpress. The base/carrier board has what appears to be a SODIMM connector (?) which the FPGA and its DDR3 chips slot into. Looks like power generation is done on the FPGA card too. This would allow me to develop the FPGA section and carrier separately - I could do two carriers, one for a general development board to make public and another that fits my DIY computer's layout. Food for thought - thanks for that.

Quote from: asmi on December 22, 2022, 03:00:11 pm

You can use any Artix-7 part with a free license of Vivado/Vitis.

That's good to know.

Quote from: asmi on December 22, 2022, 03:00:11 pm

I think this thread belongs in a FPGA section as it's more about FPGA than it is about CAD. Maybe ask moderators to move it?

Will do.

Whilst that's being done, I have a question about power. Have you built a board for an Artix-7 before? I'm wondering about power chip selection. I've got the schematics for the Arty A7 dev board, but the older version uses power management chips that cost in excess of £15 together, or the latest Arty-A7 board (E.2 revision) uses a newer (DA9062) chip that seems to need to be programmed, is only available non-programmed to particular companies or as a bulk purchase (with signed disclaimer by the client) in excess of £12,000 in total... not really an option for me!

I'm following the Arty 7 schematic and parts as closely as I can, but would like to minimise the cost of the power supply as much as is reasonable. I'm going to need 5V, 3.3V, 1V, 1.35V and 0.675V rails. Any suggestions for an alternative to the DA9062 I could use?

asmi · « **Reply #3 on:** December 22, 2022, 10:13:16 pm »

Quote from: nockieboy on December 22, 2022, 08:45:16 pm

Ah, I like this idea. You mean something like this?

No, something more like this: https://www.myirtech.com/list.asp?id=553

Quote from: nockieboy on December 22, 2022, 08:45:16 pm

This is a Lattice FPGA available on AliExpress. The base/carrier board has what appears to be a SODIMM connector (?) which the FPGA and its DDR3 chips slot into. Looks like power generation is done on the FPGA card too. This would allow me to develop the FPGA section and carrier separately - I could do two carriers, one for a general development board to make public and another that fits my DIY computer's layout. Food for thought - thanks for that.

I don't like edge connectors because they usually command an extra charge for PCB manufacturing.

Quote from: nockieboy on December 22, 2022, 08:45:16 pm

Whilst that's being done, I have a question about power. Have you built a board for an Artix-7 before? I'm wondering about power chip selection. I've got the schematics for the Arty A7 dev board, but the older version uses power management chips that cost in excess of £15 together, or the latest Arty-A7 board (E.2 revision) uses a newer (DA9062) chip that seems to need to be programmed, is only available non-programmed to particular companies or as a bulk purchase (with signed disclaimer by the client) in excess of £12,000 in total... not really an option for me!

I'm following the Arty 7 schematic and parts as closely as I can, but would like to minimise the cost of the power supply as much as is reasonable. I'm going to need 5V, 3.3V, 1V, 1.35V and 0.675V rails. Any suggestions for an alternative to the DA9062 I could use?

You don't have to use these exact converters, any one which meets Artix DC specs (±5% voltage regulation, <50 mV noise) will work just fine. Artix-7 100T device can consume up to about 7 A of current on Vccint rail (at 1 V nominal). For example, you can use MP8772GQ - it can output up to 12 A of current, so very decent margin, it's quite affordable (just over 3 USD for qty 1), and - most importantly in our times - it's actually available in stock in decent quantities. For other rails you can use even "classic" TLV62130 (if you can find it in stock), or something like MP2384 (which is available in stock in quantities).

Alternatively you can use modules like this one: https://www.monolithicpower.com/mpm3683-7.html Their advantage is that they already have an integrated inductor, so all you need to use it is 2 resistors to set the voltage, a couple of caps for decoupling and filtering. They take less space than a "discrete" solution would, and are easier to route because one of the most critical traces (FET-to-inductor loop) is already done inside the package, so you literally just slap a cap or two nearby and you are done. The downside is the higher price - although it's not as clear-cut as it seem because you need more discrete parts for "discrete" solution, so sometimes modules actually end up cheaper. I personally prefer using modules when I need to pack things as tight as possible (typical problem for "compute modules" like that FPGA module we're talking about), but if space is not a big problem, you can use discrete converters as well.

BrianHG · « **Reply #4 on:** December 22, 2022, 11:16:36 pm »

Note that the 48$-39$ 85KLE fpgas from Lattice are finally available once again.
Though, you need to convert my DDR3 controller as Lattice's DDR3 costs money.

Note that in your current design, Parallel layers in my video controller eats a few K-LE per layer while the Sequential layers are near free in LE usage. Obvious using the 2 layer types together is what gives you the 16-64 layers you may be using.

The way your current pixel writer is designed is slow and excessive in gate count because it was engineered to run on the old FPGA dual-port ram and support backward compatibility. This is how things end up when you begin with engineering with a design for a 6KLE FPGA, then just recklessly add layers ontop.

asmi · « **Reply #5 on:** December 23, 2022, 04:01:54 pm »

@nockieboy Please check your PM.

rstofer · « **Reply #6 on:** December 23, 2022, 04:48:16 pm »

I would go to Digilent and look at the Artix 7 boards.

https://digilent.com/shop/boards-and-components/system-boards/fpga-boards/

I only consider the boards with the 100T chip...

Then I would follow the yellow brick road until I got to the schematic like:

https://digilent.com/reference/_media/programmable-logic/arty-a7/arty-a7-e2-sch.pdf

You will note a blank page in the schematic. In the past, this was the details of connecting their JTAG programmer chip into the FPGA. I don't know for sure what or even if something is missing. Still, it's a start.

I would avoid building more of a PCB than absolutely required. If a -35T device was big enough, I would plug the CMOD A7-35T device into a daughtercard. Still, pinout is limited and may not suit the project.

asmi · « **Reply #7 on:** December 23, 2022, 04:54:47 pm »

Me and nockieboy just ordered a few 100Ts and 35Ts for this project as I found them in stock for the great price. So something will definitely come out of this

CMOD and Arty series boards are no good because they lack high speed connectors.

nockieboy · « **Reply #8 on:** December 23, 2022, 06:47:44 pm »

Quote from: rstofer on December 23, 2022, 04:48:16 pm

I would go to Digilent and look at the Artix 7 boards.

https://digilent.com/shop/boards-and-components/system-boards/fpga-boards/

I only consider the boards with the 100T chip...

Then I would follow the yellow brick road until I got to the schematic like:

https://digilent.com/reference/_media/programmable-logic/arty-a7/arty-a7-e2-sch.pdf

You will note a blank page in the schematic. In the past, this was the details of connecting their JTAG programmer chip into the FPGA. I don't know for sure what or even if something is missing. Still, it's a start.

I would avoid building more of a PCB than absolutely required. If a -35T device was big enough, I would plug the CMOD A7-35T device into a daughtercard. Still, pinout is limited and may not suit the project.

Yes, the Arty-A7 board is what I'm using as a reference to start my own design off. That Arty-A7 schematic includes the JTAG connector pinout to connect the Xilinx programmer into (I have a copy of one somewhere on a shelf for an old Spartan board I got years ago but never used). Hopefully that should be sufficient to get the bitstream loaded onto the board.

Quote from: asmi on December 22, 2022, 10:13:16 pm

You don't have to use these exact converters, any one which meets Artix DC specs (±5% voltage regulation, <50 mV noise) will work just fine. Artix-7 100T device can consume up to about 7 A of current on Vccint rail (at 1 V nominal). For example, you can use MP8772GQ - it can output up to 12 A of current, so very decent margin, it's quite affordable (just over 3 USD for qty 1), and - most importantly in our times - it's actually available in stock in decent quantities. For other rails you can use even "classic" TLV62130 (if you can find it in stock), or something like MP2384 (which is available in stock in quantities).

Alternatively you can use modules like this one: https://www.monolithicpower.com/mpm3683-7.html Their advantage is that they already have an integrated inductor, so all you need to use it is 2 resistors to set the voltage, a couple of caps for decoupling and filtering. They take less space than a "discrete" solution would, and are easier to route because one of the most critical traces (FET-to-inductor loop) is already done inside the package, so you literally just slap a cap or two nearby and you are done. The downside is the higher price - although it's not as clear-cut as it seem because you need more discrete parts for "discrete" solution, so sometimes modules actually end up cheaper. I personally prefer using modules when I need to pack things as tight as possible (typical problem for "compute modules" like that FPGA module we're talking about), but if space is not a big problem, you can use discrete converters as well.

Thanks asmi, that's perfect - I'll look at those suggestions in some detail over the next week or two as I work on the power supplies. Do you use EasyEDA at all? If so, I'm happy to share the project with you so you could give me more constructive feedback if you wish?

As far as the mezzanine board idea goes, I understand what you mean now - thanks for the links. I had a quick look on Mouser last night for those mezzanine/back-to-back connectors, but they don't seem to have a good supply of them (certainly not the 144-pin versions anyway), but I guess the whole layout/form-factor of the board (and whether I'll use a mezzanine for the FPGA itself) is something I can worry about a little later. Getting a good schematic sorted is my first priority, and now I've committed to the Artix A7-100T I'm keen to get moving on it - it's just a busy time of year with family commitments as I'm sure you'll understand!

It looks like I'm going to need two DDR3 chips in the design too, so that's something I'll need to bear in mind, thanks to the conversation with BrianHG over in the GPU thread:

Quote from: BrianHG on December 23, 2022, 01:24:43 am

...We are beginning to enter the realm of a simple 3D accelerator and with an added texture reader prior to this writer, with proper design, and maybe a second DDR3 chip for a 256bit wide bus & 128x speed, we will pass the first Sony Playstation in rendering capability.

One of the elements of the PCB design that has caused me headaches with the (admittedly 0.8mm pitch) LFE5U FPGA was the decoupling capacitors underneath the board. I can solder 0402 confidently, but I don't feel confident at all going smaller than that. How did you solder the 0201 caps on your boards?

asmi · « **Reply #9 on:** December 23, 2022, 08:54:44 pm »

Quote from: nockieboy on December 23, 2022, 06:47:44 pm

Thanks asmi, that's perfect - I'll look at those suggestions in some detail over the next week or two as I work on the power supplies. Do you use EasyEDA at all? If so, I'm happy to share the project with you so you could give me more constructive feedback if you wish?

One of my commercial customers paid for my Altium Designer license a couple of years back (before the COVID), so that's what I'm using now. But I will be happy to help with your design as well, if not with doing the actual layout, but at least with feedback.

Quote from: nockieboy on December 23, 2022, 06:47:44 pm

As far as the mezzanine board idea goes, I understand what you mean now - thanks for the links. I had a quick look on Mouser last night for those mezzanine/back-to-back connectors, but they don't seem to have a good supply of them (certainly not the 144-pin versions anyway), but I guess the whole layout/form-factor of the board (and whether I'll use a mezzanine for the FPGA itself) is something I can worry about a little later. Getting a good schematic sorted is my first priority, and now I've committed to the Artix A7-100T I'm keen to get moving on it - it's just a busy time of year with family commitments as I'm sure you'll understand!

That's OK, there is no rush, especially since there is going to be a Chinese New Year coming, so enjoy your time with the family. I've lost my mother to a COVID two years ago, so now I know how important it is to spend as much time as you can with the family, because you never know how much of it you have left...

As for connectors, I know a good place where you can find all kinds of connectors always in stock. And you can sometimes even get some for free

But we will get there when we will get there, right now it's more important to plan your design thoroughly such that you won't have to do a redesign because of some small stupid thing you just forgot about, or didn't think hard enough.

Also, when you receive your FPGAs, please resist your urge to open them up. These things are moisture- and ESD-sensitive, so don't take unnecessary risks if you can help it. Or - if you can't resist - at least take out the 35T part as it's cheaper in case something bad happens

Quote from: nockieboy on December 23, 2022, 06:47:44 pm

It looks like I'm going to need two DDR3 chips in the design too, so that's something I'll need to bear in mind, thanks to the conversation with BrianHG over in the GPU thread:

Quote from: BrianHG on December 23, 2022, 01:24:43 am
...We are beginning to enter the realm of a simple 3D accelerator and with an added texture reader prior to this writer, with proper design, and maybe a second DDR3 chip for a 256bit wide bus & 128x speed, we will pass the first Sony Playstation in rendering capability.

If you wish, you can implement two 32 bit DDR3 interfaces, or even up to 4 independent 16 bit DDR3 interfaces, or implement 4 x 8 bit interface instead of 2 x 16 bit one for higher capacity. There are a lot of possibilities. You just need to figure out what you need before you embark on a board design.

Quote from: nockieboy on December 23, 2022, 06:47:44 pm

One of the elements of the PCB design that has caused me headaches with the (admittedly 0.8mm pitch) LFE5U FPGA was the decoupling capacitors underneath the board. I can solder 0402 confidently, but I don't feel confident at all going smaller than that. How did you solder the 0201 caps on your boards?

I use a stereo microscope (this one: https://amscope.com/products/se410-xyz but you can find cheaper options) and under 10x magnification 0201's are not that hard to solder unless your hands are really shaky. I typically do a reflow, so I place all parts by hand on a solder paste under microscope, and then stuff it into the oven to reflow it all at once. But if neccessary I can solder them manually one-by-one. That said, I try to avoid them and stick to 0402 whenever I can, because 0402 are super easy to solder under the microscope - infact I even taught my wife to do it and she now does it just as good (if not better) than I do - despite her knowing exactly nothing about electronics

I highly recommend you consider buying such microscope - it makes microsoldering a breeze since your depth perception still works (thanks to stereoscopic image). My wife finds using it so comfortable that she even does her nails now under it

Make a good Christmas present for yourself

Also if you have small kids, they might also find it interesting looking at various things under magnification. You will also need a pair of good tweezers (search for "titanium alloy tweezers" on Aliexpress - they are quite good, and should last for a long time, just be careful to avoid punctures as the "business end" is super-sharp).

Finally, to make soldering BGAs (or anything really) easier you might want to consider getting some sort of preheater. It doesn't need to be very powerful because realistically you will only be preheating your board to like 100°C or so, so if you have some sort of hot plate which can reach such temperature - it's good enough. Just make sure you can comfortably work with a hot air gun above PCB while it's on a heater - otherwise it's just a matter of time until you accidentally make a wrong move and end up getting a burn. Trust me, it will eventually happen in any case

, but there is no reason to hasten that eventuality

BrianHG · « **Reply #10 on:** December 24, 2022, 04:02:01 am »

Quote from: asmi on December 23, 2022, 08:54:44 pm

Quote from: nockieboy on December 23, 2022, 06:47:44 pm
Thanks asmi, that's perfect - I'll look at those suggestions in some detail over the next week or two as I work on the power supplies. Do you use EasyEDA at all? If so, I'm happy to share the project with you so you could give me more constructive feedback if you wish?
One of my commercial customers paid for my Altium Designer license a couple of years back (before the COVID), so that's what I'm using now. But I will be happy to help with your design as well, if not with doing the actual layout, but at least with feedback.

Yes, I am an Altium user as well, however, if you are making a product for public domain release, and will beginning with a new EDA tool, I would put in the effort to learn and use the latest KiCad. The latest now does support trace length and impedance matching required for the higher speed DDR3 designs and if you make your design available on GitHub, others who don't have the money for an Altium license may still either contribute of use your cad files.

BrianHG · « **Reply #11 on:** December 24, 2022, 04:07:06 am »

Quote from: asmi on December 23, 2022, 08:54:44 pm

Quote from: nockieboy on December 23, 2022, 06:47:44 pm
It looks like I'm going to need two DDR3 chips in the design too, so that's something I'll need to bear in mind, thanks to the conversation with BrianHG over in the GPU thread:

Quote from: BrianHG on December 23, 2022, 01:24:43 am
...We are beginning to enter the realm of a simple 3D accelerator and with an added texture reader prior to this writer, with proper design, and maybe a second DDR3 chip for a 256bit wide bus & 128x speed, we will pass the first Sony Playstation in rendering capability.
If you wish, you can implement two 32 bit DDR3 interfaces, or even up to 4 independent 16 bit DDR3 interfaces, or implement 4 x 8 bit interface instead of 2 x 16 bit one for higher capacity. There are a lot of possibilities. You just need to figure out what you need before you embark on a board design.

Having a separate DDR3 controller for display buffer, and texture memory buffer can accelerate 3D texture filling with avoiding a cache mechanism to recover speed losses due to cross-memory access. This is how the original 3DFX graphics card operated as back then, we did not have the huge gate densities and block-ram of today's 3D accelerators.

nockieboy · « **Reply #12 on:** December 24, 2022, 12:16:12 pm »

Quote from: asmi on December 23, 2022, 08:54:44 pm

But I will be happy to help with your design as well, if not with doing the actual layout, but at least with feedback.

Thanks asmi, feedback is all I need - I'm happy to do the layout myself (that's the biggest learning point) but I don't want to make a silly mistake if another set of eyes is able to pick up on something.

Quote from: asmi on December 23, 2022, 08:54:44 pm

That's OK, there is no rush, especially since there is going to be a Chinese New Year coming, so enjoy your time with the family. I've lost my mother to a COVID two years ago, so now I know how important it is to spend as much time as you can with the family, because you never know how much of it you have left...

I'm sorry to hear that. Yes, you never know, so make the most of your time!

Quote from: asmi on December 23, 2022, 08:54:44 pm

As for connectors, I know a good place where you can find all kinds of connectors always in stock. And you can sometimes even get some for free But we will get there when we will get there, right now it's more important to plan your design thoroughly such that you won't have to do a redesign because of some small stupid thing you just forgot about, or didn't think hard enough.

Thank you.

Quote from: asmi on December 23, 2022, 08:54:44 pm

If you wish, you can implement two 32 bit DDR3 interfaces, or even up to 4 independent 16 bit DDR3 interfaces, or implement 4 x 8 bit interface instead of 2 x 16 bit one for higher capacity. There are a lot of possibilities. You just need to figure out what you need before you embark on a board design.

I'm sure BrianHG will have an input on this. Xilinx has its own DDR3 controller IP, doesn't it? I also seem to recall a long time ago you and BrianHG having a discussion about Xilinx FPGAs and DDR support. Just wondering what problems and benefits a switch to Xilinx will create...?

Quote from: asmi on December 23, 2022, 08:54:44 pm

I use a stereo microscope (this one: https://amscope.com/products/se410-xyz but you can find cheaper options) and under 10x magnification 0201's are not that hard to solder unless your hands are really shaky. I typically do a reflow, so I place all parts by hand on a solder paste under microscope, and then stuff it into the oven to reflow it all at once. But if neccessary I can solder them manually one-by-one. That said, I try to avoid them and stick to 0402 whenever I can, because 0402 are super easy to solder under the microscope - infact I even taught my wife to do it and she now does it just as good (if not better) than I do - despite her knowing exactly nothing about electronics

I highly recommend you consider buying such microscope - it makes microsoldering a breeze since your depth perception still works (thanks to stereoscopic image). My wife finds using it so comfortable that she even does her nails now under it Make a good Christmas present for yourself Also if you have small kids, they might also find it interesting looking at various things under magnification. You will also need a pair of good tweezers (search for "titanium alloy tweezers" on Aliexpress - they are quite good, and should last for a long time, just be careful to avoid punctures as the "business end" is super-sharp).

Haha - that's good recommendation if the missus uses it too! Not sure I could convince my wife to do soldering of such small parts, with or without a microscope!

Something like that microscope is ideal for me and is definitely on my wishlist, though it'll be a few months before I can invest that much money into one.

Quote from: asmi on December 23, 2022, 08:54:44 pm

Finally, to make soldering BGAs (or anything really) easier you might want to consider getting some sort of preheater. It doesn't need to be very powerful because realistically you will only be preheating your board to like 100°C or so, so if you have some sort of hot plate which can reach such temperature - it's good enough. Just make sure you can comfortably work with a hot air gun above PCB while it's on a heater - otherwise it's just a matter of time until you accidentally make a wrong move and end up getting a burn. Trust me, it will eventually happen in any case , but there is no reason to hasten that eventuality

Yes, I have a hot plate already for when I start BGA soldering - one of these:

Quote from: BrianHG on December 24, 2022, 04:02:01 am

Yes, I am an Altium user as well, however, if you are making a product for public domain release, and will beginning with a new EDA tool, I would put in the effort to learn and use the latest KiCad. The latest now does support trace length and impedance matching required for the higher speed DDR3 designs and if you make your design available on GitHub, others who don't have the money for an Altium license may still either contribute of use your cad files.

Okay, well it looks like I'm going to have to bite the bullet and learn how to use KiCAD then, despite having serious dislike of it. Maybe once I've overcome the UI issues and odd UX choices, I'll grow to like it.

Quote from: BrianHG on December 24, 2022, 04:07:06 am

Having a separate DDR3 controller for display buffer, and texture memory buffer can accelerate 3D texture filling with avoiding a cache mechanism to recover speed losses due to cross-memory access. This is how the original 3DFX graphics card operated as back then, we did not have the huge gate densities and block-ram of today's 3D accelerators.

@BrianHG - does the Xilinx have any performance benefits or memory controller IP or hardcore functions that stand out as affecting the GPU's DDR3 controller at all?

asmi · « **Reply #13 on:** December 24, 2022, 01:20:34 pm »

Quote from: nockieboy on December 24, 2022, 12:16:12 pm

I'm sure BrianHG will have an input on this. Xilinx has its own DDR3 controller IP, doesn't it? I also seem to recall a long time ago you and BrianHG having a discussion about Xilinx FPGAs and DDR support. Just wondering what problems and benefits a switch to Xilinx will create...?

Read through the section "Chapter 1: DDR3 and DDR2 SDRAM Memory Interface Solution -> Interfacing to the Core" of this document: https://docs.xilinx.com/v/u/en-US/ug586_7Series_MIS It explains how to talk to controller. Basically there are four possibilities, ordered from higher level to lower level, with each step down giving you progressively more control over memory devices, but also leaving more of housekeeping work to you as well:
1. AXI4 protocol via AXI4 slave interface block (useful if you have some other components which also talk AXI4), this is how I usually talk to it, it's the highest abstraction level
2. user interface, this level allows sending commands like "read address 0x0010_0000 and return 8xdata bus width worth of data", or "write this 8xdata bus width worth of data to address 0x0020_0000). It's kind of similar to AXI4, but uses a custom bus interface and is architecturally below it. You can also take control over refresh and ZQ calibration timings if you wish.
3. native interface, at this level there is no longer an address, but rank, bank, row, column.
4. PHY only. This is basically only gives you a physical interface, leaving actual controller implementation to you.
Xilinx recommends using levels 1 or 2, I've been using mostly level 1 (AXI4 interface). And just to give you some idea of the amount of resources it takes, a 32 bit DDR3 controller uses 6157 LUTs and 5361 register when AXI4 interface is used (so all memory controller layers are in use). These numbers obviously fluctuate a bit, but they are in the same ballpark. An A100T device has 63400 LUTs and 126800 registers.

Quote from: nockieboy on December 24, 2022, 12:16:12 pm

Haha - that's good recommendation if the missus uses it too! Not sure I could convince my wife to do soldering of such small parts, with or without a microscope! Something like that microscope is ideal for me and is definitely on my wishlist, though it'll be a few months before I can invest that much money into one.

That's good. You can also browse through Aliexpress, there should be plenty of choice starting from about 200 USD. Whichever one you pick, make sure it's got a focal length (sometimes called "Working Distance" in the microscope specifications) of at least 10 cm so that you can comfortably manipulate under it without hitting the head all the time.

Quote from: nockieboy on December 24, 2022, 12:16:12 pm

Yes, I have a hot plate already for when I start BGA soldering - one of these:

That should be more than sufficient.

Quote from: nockieboy on December 24, 2022, 12:16:12 pm

Okay, well it looks like I'm going to have to bite the bullet and learn how to use KiCAD then, despite having serious dislike of it. Maybe once I've overcome the UI issues and odd UX choices, I'll grow to like it.

Really? From my (admittedly VERY limited) experience with EasyEDA I found it's interface to be remarkably similar to KiCAD.
But to tell you the truth, at various times I used DipTrace, Eagle CAD, Orcad/Allergo, Altium Designer and KiCAD, and they all suck in some way or the other. So you basically just need to get used to idiosyncrasies of whichever eCAD you choose, and stick to it. For you I would naturally recommend KiCAD because it's free, and it's quite good (see the project in my signature for the proof - and keep in mind that that project was done in a previous major version of KiCAD).

Quote from: nockieboy on December 24, 2022, 12:16:12 pm

@BrianHG - does the Xilinx have any performance benefits or memory controller IP or hardcore functions that stand out as affecting the GPU's DDR3 controller at all?

The most important hardware difference is Xlinx 7 series has 6-input LUTs as opposed to 4-input LUTs in Cyclone 4-5 and MAX10 series devices. So the same HDL usually consumes less LUTs on 7 series and can be faster due to lower amount of combinatorial logic levels. As for memory controller, take a look at the document I linked above, and it will hopefully answer all your questions.

BrianHG · « **Reply #14 on:** December 24, 2022, 10:26:17 pm »

Quote from: asmi on December 24, 2022, 01:20:34 pm

Quote from: nockieboy on December 24, 2022, 12:16:12 pm
I'm sure BrianHG will have an input on this. Xilinx has its own DDR3 controller IP, doesn't it? I also seem to recall a long time ago you and BrianHG having a discussion about Xilinx FPGAs and DDR support. Just wondering what problems and benefits a switch to Xilinx will create...?
Read through the section "Chapter 1: DDR3 and DDR2 SDRAM Memory Interface Solution -> Interfacing to the Core" of this document: https://docs.xilinx.com/v/u/en-US/ug586_7Series_MIS It explains how to talk to controller. Basically there are four possibilities, ordered from higher level to lower level, with each step down giving you progressively more control over memory devices, but also leaving more of housekeeping work to you as well:
1. AXI4 protocol via AXI4 slave interface block (useful if you have some other components which also talk AXI4), this is how I usually talk to it, it's the highest abstraction level
2. user interface, this level allows sending commands like "read address 0x0010_0000 and return 8xdata bus width worth of data", or "write this 8xdata bus width worth of data to address 0x0020_0000). It's kind of similar to AXI4, but uses a custom bus interface and is architecturally below it. You can also take control over refresh and ZQ calibration timings if you wish.
3. native interface, at this level there is no longer an address, but rank, bank, row, column.
4. PHY only. This is basically only gives you a physical interface, leaving actual controller implementation to you.
Xilinx recommends using levels 1 or 2, I've been using mostly level 1 (AXI4 interface). And just to give you some idea of the amount of resources it takes, a 32 bit DDR3 controller uses 6157 LUTs and 5361 register when AXI4 interface is used (so all memory controller layers are in use). These numbers obviously fluctuate a bit, but they are in the same ballpark. An A100T device has 63400 LUTs and 126800 registers.

Nockieboy quickest route would be to cut out my DDR3 PHY only layer and just tie my Multiport controller to Xilinx's level 2 interface. When my multiport sends a read command, it also sends at the same time a 4 bit 'ID' I called the 'read vector' which you may need to manually return with Xilinx's DDR3 read result in parallel. Everything on the user IO side of my multiport should then be backwards compatible with Nockieboy's existing system, including the multiwindow VGA system.

(Note that the multiwindow system will need 2 Intel inferred dualport dual clock ram blocks changed into Xilinx's equivilant as well as PLL clocks.)

nockieboy · « **Reply #15 on:** December 28, 2022, 12:22:39 pm »

Quote from: asmi on December 24, 2022, 01:20:34 pm

Quote from: nockieboy on December 24, 2022, 12:16:12 pm
Okay, well it looks like I'm going to have to bite the bullet and learn how to use KiCAD then, despite having serious dislike of it. Maybe once I've overcome the UI issues and odd UX choices, I'll grow to like it.
Really? From my (admittedly VERY limited) experience with EasyEDA I found it's interface to be remarkably similar to KiCAD.
But to tell you the truth, at various times I used DipTrace, Eagle CAD, Orcad/Allergo, Altium Designer and KiCAD, and they all suck in some way or the other. So you basically just need to get used to idiosyncrasies of whichever eCAD you choose, and stick to it. For you I would naturally recommend KiCAD because it's free, and it's quite good (see the project in my signature for the proof - and keep in mind that that project was done in a previous major version of KiCAD).

I've downloaded KiCAD 6 and see virtually nothing has changed. I just don't have time to wade through padded-out video tutorials (all people seem to make these days to attract the YouTube algorithm) on how to do basic things like set up multiple schematics etc. when I could be launching straight into parts selection and wiring them up in a schematic in EasyEDA. It helps that I've already got schematics for all the IO etc. done already. All I really need to do is design the power supplies, wire up the FPGA's IOs and configuration and I can move on to PCB design.

I'm going to stick with EasyEDA for these PCB designs as it'll take me long enough to complete them on an EDA I'm familiar with, let alone one I need to learn how to use. Once it's done, I'll see about transferring the schematics and PCBs to KiCAD, if the interest is out there for it. Or I might see if I can progress both alongside each other. Depends on a lot. Change of job role coming up in February which means the amount of time I have to do any of this is really hard to tell at the moment.

Maybe I'll be forced to switch to it if there's a killer feature I need for this project that EasyEDA doesn't do but KiCAD does - we'll have to wait and see.

Quote from: BrianHG on December 24, 2022, 10:26:17 pm

Nockieboy quickest route would be to cut out my DDR3 PHY only layer and just tie my Multiport controller to Xilinx's level 2 interface. When my multiport sends a read command, it also sends at the same time a 4 bit 'ID' I called the 'read vector' which you may need to manually return with Xilinx's DDR3 read result in parallel. Everything on the user IO side of my multiport should then be backwards compatible with Nockieboy's existing system, including the multiwindow VGA system.

(Note that the multiwindow system will need 2 Intel inferred dualport dual clock ram blocks changed into Xilinx's equivilant as well as PLL clocks.)

I suspect it should be fairly straightforward as you've designed the DDR3 controller with multiple platforms in mind anyway. Got to get the boards done first though, and this project is going to represent a significant leap forward for me in complexity.

I hope you all had a great Christmas and are resting up for the New Year's celebrations, however you choose to celebrate them (or not!)

BrianHG · « **Reply #16 on:** December 29, 2022, 01:03:15 am »

Quote from: nockieboy on December 28, 2022, 12:22:39 pm

I suspect it should be fairly straightforward as you've designed the DDR3 controller with multiple platforms in mind anyway. Got to get the boards done first though, and this project is going to represent a significant leap forward for me in complexity.

I hope you all had a great Christmas and are resting up for the New Year's celebrations, however you choose to celebrate them (or not!)

First step, simulate Xilinx's DDR3 controller replacing my PHY module.
Then run all my other sims related to your project...

If I remember correctly, you should have a Z80 bus bridge sim tied to the DDR3 controller.
Also, I have a native 'BrianHG_DDR3_CONTROLLER_v16_top_tb.sv' testbench which does verify full sequential bursts.

asmi · « **Reply #17 on:** December 29, 2022, 02:33:55 am »

Quote from: nockieboy on December 28, 2022, 12:22:39 pm

I've downloaded KiCAD 6 and see virtually nothing has changed. I just don't have time to wade through padded-out video tutorials (all people seem to make these days to attract the YouTube algorithm) on how to do basic things like set up multiple schematics etc. when I could be launching straight into parts selection and wiring them up in a schematic in EasyEDA. It helps that I've already got schematics for all the IO etc. done already. All I really need to do is design the power supplies, wire up the FPGA's IOs and configuration and I can move on to PCB design.

I'm going to stick with EasyEDA for these PCB designs as it'll take me long enough to complete them on an EDA I'm familiar with, let alone one I need to learn how to use. Once it's done, I'll see about transferring the schematics and PCBs to KiCAD, if the interest is out there for it. Or I might see if I can progress both alongside each other. Depends on a lot. Change of job role coming up in February which means the amount of time I have to do any of this is really hard to tell at the moment.

Maybe I'll be forced to switch to it if there's a killer feature I need for this project that EasyEDA doesn't do but KiCAD does - we'll have to wait and see.

It's up to you of course, just want to mention that if you do decide to implement an FPGA module as a separate board than nothing prevents you from designing it in a different eCAD from your other boards. I will try my best to help you out no matter what tool you pick (assuming I can get my hands on it of course).

Quote from: nockieboy on December 28, 2022, 12:22:39 pm

I suspect it should be fairly straightforward as you've designed the DDR3 controller with multiple platforms in mind anyway. Got to get the boards done first though, and this project is going to represent a significant leap forward for me in complexity.

Here I would have to disagree - you can do simulations before you design anything. In fact this is the recommended flow to make sure that a pinout you choose will actually work and close timings. And the higher speed your IO interfaces are running at, the more important it is to simulate simulate simulate before you even create an empty project in your eCAD of choice. I don't even think about designing anything until I have at least a skeleton design ready & working in simulation with I/O interface blocks being the most important parts - because while place & route has a lot of flexibility regarding internal parts of design, I/O components are often hard-tied to specific I/O locations through pin choices you made, and so there is much less a router can do to close the timing if it's tight and/or your choice of I/O is sub-optimal. Again, there are ways to deal with cases when for one reason or another router is not able to close timing, but it's best to avoid getting into these situations if it can be helped.
For my 7 series-based board I typically create a simple CPU design with memory controller and some basic I/O stuff like QSPI, UART and GPIO, as this is super-easy and quick in Vivado (you don't need to write a single line of HDL!), and allows to confirm that DDR controller works like it should, and pinout of memory works too.

Quote from: nockieboy on December 28, 2022, 12:22:39 pm

I hope you all had a great Christmas and are resting up for the New Year's celebrations, however you choose to celebrate them (or not!)

I very much do plan to celebrate. Already "celebrated" Christmas by sleeping through most of those days off as I could never get enough sleep during regular weeks

And I'm really looking forward for another 3 straight days off to catch some more zzz'es

BrianHG · « **Reply #18 on:** December 29, 2022, 03:04:39 am »

Quote from: asmi on December 29, 2022, 02:33:55 am

Quote from: nockieboy on December 28, 2022, 12:22:39 pm
I suspect it should be fairly straightforward as you've designed the DDR3 controller with multiple platforms in mind anyway. Got to get the boards done first though, and this project is going to represent a significant leap forward for me in complexity.
Here I would have to disagree - you can do simulations before you design anything. In fact this is the recommended flow to make sure that a pinout you choose will actually work and close timings. And the higher speed your IO interfaces are running at, the more important it is to simulate simulate simulate before you even create an empty project in your eCAD of choice. I don't even think about designing anything until I have at least a skeleton design ready & working in simulation with I/O interface blocks being the most important parts - because while place & route has a lot of flexibility regarding internal parts of design, I/O components are often hard-tied to specific I/O locations through pin choices you made, and so there is much less a router can do to close the timing if it's tight and/or your choice of I/O is sub-optimal. Again, there are ways to deal with cases when for one reason or another router is not able to close timing, but it's best to avoid getting into these situations if it can be helped.
For my 7 series-based board I typically create a simple CPU design with memory controller and some basic I/O stuff like QSPI, UART and GPIO, as this is super-easy and quick in Vivado (you don't need to write a single line of HDL!), and allows to confirm that DDR controller works like it should, and pinout of memory works too.

Yes, simulate everything.
I would have gotten nowhere without simulating everything when engineering my DDR3 controller & video processor as well as the Z80 bridge which has radically improved once we did a full simulation of it.
We also simulated your PSG to prove and get the right sound.

You have every testbench I already created. I already prepared for you every sub-module simulation as well as complete top level system sims. I'm sure what I have made can all be adapted to Vivado, or, you can bring Vivado's primitives back into Mosdelsim. You will end up building a useless board if you do not take everything I've already provided, swap in Vivado's DDR3 controller into the right point in the code you have and prove in advance that everything works before you build a PCB.

nockieboy · « **Reply #19 on:** December 29, 2022, 11:06:18 am »

Quote from: asmi on December 29, 2022, 02:33:55 am

It's up to you of course, just want to mention that if you do decide to implement an FPGA module as a separate board than nothing prevents you from designing it in a different eCAD from your other boards. I will try my best to help you out no matter what tool you pick (assuming I can get my hands on it of course).

Thanks asmi. I'll be using EasyEDA initially to make a core FPGA card - probably 6 layers - to house the FPGA itself, clock, DDR3 chips, JTAG and power supplies. Then I can make a carrier board housing the IO hopefully with no major issues, as the IO will be a lot slower than anything on the core card itself. I'm hoping I can make the core card small enough to fit within the form-factor of my computer, so I can make a dedicated carrier card with level converters and HDMI/USB I/O that fits the uCOM stack and a development carrier board for general use of the FPGA for anyone and everyone who's interested.

Quote from: asmi on December 29, 2022, 02:33:55 am

Here I would have to disagree - you can do simulations before you design anything. In fact this is the recommended flow to make sure that a pinout you choose will actually work and close timings. And the higher speed your IO interfaces are running at, the more important it is to simulate simulate simulate before you even create an empty project in your eCAD of choice. I don't even think about designing anything until I have at least a skeleton design ready & working in simulation with I/O interface blocks being the most important parts - because while place & route has a lot of flexibility regarding internal parts of design, I/O components are often hard-tied to specific I/O locations through pin choices you made, and so there is much less a router can do to close the timing if it's tight and/or your choice of I/O is sub-optimal. Again, there are ways to deal with cases when for one reason or another router is not able to close timing, but it's best to avoid getting into these situations if it can be helped.
For my 7 series-based board I typically create a simple CPU design with memory controller and some basic I/O stuff like QSPI, UART and GPIO, as this is super-easy and quick in Vivado (you don't need to write a single line of HDL!), and allows to confirm that DDR controller works like it should, and pinout of memory works too.

Fair point. Funny how I still don't go straight to simulation as the first part of the process.

If it's any consolation, I'm not designing the schematics and assigning I/O 'blind'. I'm using reference design(s) wherever possible - I've not found a lot on the XC7A100T unfortunately (no full schematics, anyway), but I've got some reference material I'm using based on the AC7100B board. It uses two DDR3 chips and includes their pin assignments, so I'm hoping that mirroring those assignments will avoid a lot of the pitfalls that you've mentioned re: pin selection. There's no detailed PCB drawings unfortunately, so routing two 16-bit datapaths will still be on me.

I'm aiming to create the same 32-bit data bus width with up to 25Gb/s bandwidth - hopefully that'll be sufficient for the project, @BrianHG?

Once the 32-bit DDR3 layout and traces are done, as far as I can see that's the hardest part done; I'll route out the remainder of the IO to the mezzanine connectors and, as these run at much lower speeds, selection and routing will be slightly less of a concern. The manual I linked to above for the AC7100B includes GTP transceivers and differential signal paths? I don't know much about these and am not looking to include them (or their power supplies) on the core FPGA card, unless anyone can convince me they're needed?

BrianHG · « **Reply #20 on:** December 29, 2022, 11:39:04 am »

My DDR3 bus isn't 32 bit. It is the size of a BL8 command (Burst Length 8 ).

So, currently, on the DECA, with a 16bit DDR3 ram chip, 8 X 16bit = 128bit bus.

This is the first thing you should simulate, setting Xilinx's DDR3 controller to 1x 16bit ram chip with a 128bit read and write bus. The write data should contain a 'Byte Enable' for each 8 bits.

(From here on in, here is an optional path)

Next, rename my DDR3 controller to "BHG_DDR3_Controller_top_using_Xilinx_core.sv' and also rename it's internal 'Altera PHY' source to 'Xilinx DDR3 PHY level 2', and copy my altera's beginning of it's module to your newly renamed 'Xilinx DDR3....'. Inside that new module, you will need to instantiate Xilinx's controller with it's PLL wired to the existing IO ports.

You will need to also create/move/adapt to a Xilinx PLL. Since I know nothing about Xilinx, the PLL may also be instantiated by it's own DDR3 controller, so that may just move inside it's PHY.

Don't forget that my multiport still needs to know all the DDR3 parameters, like addressing BANK-ROW order which you set in Xilinx's DDR3 controller so it may properly optimize port selection and burst interleave and sizing. It's probably just easier to translate my original incoming parameters to fit Xilinx's DDR3 controller parameters.

Try to simulate my controller's existing test-bench with the new Xilinx setup.

If that works, you can try the Z80 bus interface testbench simulation with the new controller.

Then worry about increasing to 2x 16bit DDR3 ram chips, IE: now a 256 bit internal bus.

Good luck.

BrianHG · « **Reply #21 on:** December 29, 2022, 12:11:16 pm »

Ok, just too sleepy to realize you were talking about the bus size on the PCB, not the internal bus size in the FPGA. However, my above comments should still be taken into account when adapting the design.

dolbeau · « **Reply #22 on:** December 29, 2022, 03:10:01 pm »

Quote from: nockieboy on December 29, 2022, 11:06:18 am

I'll be using EasyEDA initially to make a core FPGA card - probably 6 layers - to house the FPGA itself, clock, DDR3 chips, JTAG and power supplies.
(...)
If it's any consolation, I'm not designing the schematics and assigning I/O 'blind'. I'm using reference design(s) wherever possible - I've not found a lot on the XC7A100T unfortunately (no full schematics, anyway)

There's a lot of SoM boards out there with full schematics. You can also use I/O designs from non-A100T Artix-7, as they are built on the same 'building blocks'. HR I/O banks from one will work the same way as HR I/O banks from another. Ditto HP.

You can look up e.g. the Trenz TE0712 and others from Trenz, ZTex 2.13 and others from ZTex, etc. You can also look at cheaper QmTech boards with integrated peripherals like the Wukong, etc. All of those have full schematics available, the boards from Trenz and Ztex have multiple variants with a range of Artix-7.

There's not that many designs with an Artix-7 and a full PCB in addition to schematics that I know of, but there's at least one from Antmicro. The schematics include comments on what is needed for various Artix-7 (decoupling, ...).

I've looked into doing the same thing (FPGA + power + flash + DDR3) for my projects that currently use a ZTex 2.13a plugged into my boards, but most examples/reference materials suggest 8 layers (expensive PCB!), and routing the high-speed signals for the DDR3 still seem too complex for my skills at least when I look at Xilinx UG583.

asmi · « **Reply #23 on:** December 29, 2022, 03:21:35 pm »

Quote from: nockieboy on December 29, 2022, 11:06:18 am

Thanks asmi. I'll be using EasyEDA initially to make a core FPGA card - probably 6 layers - to house the FPGA itself, clock, DDR3 chips, JTAG and power supplies. Then I can make a carrier board housing the IO hopefully with no major issues, as the IO will be a lot slower than anything on the core card itself. I'm hoping I can make the core card small enough to fit within the form-factor of my computer, so I can make a dedicated carrier card with level converters and HDMI/USB I/O that fits the uCOM stack and a development carrier board for general use of the FPGA for anyone and everyone who's interested.

I kind of suspect that Deca board has more than 6 layers judging by the density of components. Perhaps 10 or 12 would be my guess. Most SoMs I came across are 10 or 12 layers.

Quote from: nockieboy on December 29, 2022, 11:06:18 am

If it's any consolation, I'm not designing the schematics and assigning I/O 'blind'. I'm using reference design(s) wherever possible - I've not found a lot on the XC7A100T unfortunately (no full schematics, anyway), but I've got some reference material I'm using based on the AC7100B board. It uses two DDR3 chips and includes their pin assignments, so I'm hoping that mirroring those assignments will avoid a lot of the pitfalls that you've mentioned re: pin selection. There's no detailed PCB drawings unfortunately, so routing two 16-bit datapaths will still be on me.

Don't do this, especially using random chinese board as a reference is not the greatest idea. Instead, get Vivado/Vitis set up and come up with your own, and odds are you are going to change it yet again when you will do routing, which will need to be verified again.
But before you do any of that, come up with an exact list of stuff you want connected to the FPGA to do some planning, and so you will know which I/O interfaces you will need to implement.

Quote from: nockieboy on December 29, 2022, 11:06:18 am

I'm aiming to create the same 32-bit data bus width with up to 25Gb/s bandwidth - hopefully that'll be sufficient for the project, @BrianHG?

That is the question I've asked a few posts back, if neccessary/required, it's possible to implement two 32 bit controllers, or some other mix of controllers if it will be beneficial to your project. For what it's worth, real video cards have a bunch of independent memory controllers as it allows video core to access several non-adjacent memory regions on the same clock, but adds it's own level of complexities regarding managing what is stored on each of connected memory devices.

Quote from: nockieboy on December 29, 2022, 11:06:18 am

Once the 32-bit DDR3 layout and traces are done, as far as I can see that's the hardest part done; I'll route out the remainder of the IO to the mezzanine connectors and, as these run at much lower speeds, selection and routing will be slightly less of a concern. The manual I linked to above for the AC7100B includes GTP transceivers and differential signal paths? I don't know much about these and am not looking to include them (or their power supplies) on the core FPGA card, unless anyone can convince me they're needed?

I understand your desire to rush to PCB design, but trust me, the time spent planning is the time well spent as it might ultimately save you a bunch of money on PCB respins and redesigns.

32 bit DDR3 layout might be the hardest one, but certainly not the fastest one. In your case HDMI will probably be the fastest one (beyond GTPs). As per official specs any regular I/O can go up to 1.25 Gbps per line, but in reality I've never seen an FPGA device that is not capable of doing 1080p@60 at 1.45 Gpbs per differential line. These are certainly faster than 400 MHz DDR3 signals, so don't get too caught up with DDR3 being the biggest obstacle. It might seem that way on a first glance, but it doesn't mean that it actually is. I know DDR3 looks intimidating, trust me I felt the same way back when I was about to embark on my very first one, but it might very well be simply a red herring with real complexity being elsewhere.

GTP are multi-gigabit transceivers and are in my opinion the most exciting feature of these devices, so I would absolutely recommend you to route them to the high-speed connectors and wire up their power supplies. The PDS for GTPs is a bit involed because it requires two ultra-clean power rails (<10 mVpp ripple is no joke!) at 1 and 1.2 V nominal, and the signal routing needs to be very careful as these lines can potentially run at over 6 Gbps, but that is precisely what makes them so exciting. That is my opinion of course and you are free to ignore it, but I think you will miss out a lot if you skip that part.

asmi · « **Reply #24 on:** December 29, 2022, 04:34:29 pm »

Quote from: dolbeau on December 29, 2022, 03:10:01 pm

I've looked into doing the same thing (FPGA + power + flash + DDR3) for my projects that currently use a ZTex 2.13a plugged into my boards, but most examples/reference materials suggest 8 layers (expensive PCB!), and routing the high-speed signals for the DDR3 still seem too complex for my skills at least when I look at Xilinx UG583.

With recent pricing update at JLCPCB it's no longer the case, as five 10x10 cm 8 layer PCBs are now under $100.

BrianHG · « **Reply #25 on:** December 29, 2022, 06:08:25 pm »

Quote from: asmi on December 29, 2022, 03:21:35 pm

Quote from: nockieboy on December 29, 2022, 11:06:18 am
I'm aiming to create the same 32-bit data bus width with up to 25Gb/s bandwidth - hopefully that'll be sufficient for the project, @BrianHG?
That is the question I've asked a few posts back, if neccessary/required, it's possible to implement two 32 bit controllers, or some other mix of controllers if it will be beneficial to your project. For what it's worth, real video cards have a bunch of independent memory controllers as it allows video core to access several non-adjacent memory regions on the same clock, but adds it's own level of complexities regarding managing what is stored on each of connected memory devices.

Quote from: nockieboy on December 29, 2022, 11:06:18 am
Once the 32-bit DDR3 layout and traces are done, as far as I can see that's the hardest part done; I'll route out the remainder of the IO to the mezzanine connectors and, as these run at much lower speeds, selection and routing will be slightly less of a concern. The manual I linked to above for the AC7100B includes GTP transceivers and differential signal paths? I don't know much about these and am not looking to include them (or their power supplies) on the core FPGA card, unless anyone can convince me they're needed?
I understand your desire to rush to PCB design, but trust me, the time spent planning is the time well spent as it might ultimately save you a bunch of money on PCB respins and redesigns.

32 bit DDR3 layout might be the hardest one, but certainly not the fastest one. In your case HDMI will probably be the fastest one (beyond GTPs). As per official specs any regular I/O can go up to 1.25 Gbps per line, but in reality I've never seen an FPGA device that is not capable of doing 1080p@60 at 1.45 Gpbs per differential line. These are certainly faster than 400 MHz DDR3 signals, so don't get too caught up with DDR3 being the biggest obstacle. It might seem that way on a first glance, but it doesn't mean that it actually is. I know DDR3 looks intimidating, trust me I felt the same way back when I was about to embark on my very first one, but it might very well be simply a red herring with real complexity being elsewhere.

Since the Artix7 can go faster than 800mbps, I'm sure Nockieboy will want to maximize DDR3 clock speed.
If Nockieboy keeps on using my multiport, and I hope Xilinx's DDR3 controller is smart enough, there shouldn't be a problem with a single controller 32bit or even 64bit wide. My multiport keeps track of which row is loaded in each bank and interleaves or sequences bursts accordingly. IE: DDR3 has 8 banks, store your 2 frame buffers in banks 0 and 1, store textures in banks 2 through 5, 1/4 of the res each quadratured for super-fast fast bi-linear filtering, and use the remaining banks for geometry, software and sound, and the access should be as efficient as if you had 8 individual controllers. (If Xilinx's controller cant's keep previously used banks open until a mandatory refresh, then my DDR3 controller is a lot smarter than theirs...)

Note, the DDR3 addressing should be set to BANK-ROW-COLUMN for this pseudo 8 channel DDR3 controller.

BrianHG · « **Reply #26 on:** December 30, 2022, 11:59:40 am »

Though if Xilinx's DDR3 controller not only leaves banks open until a necessary bank change or refresh, but if it properly supports 'Dual-Rank' mode, then it is smarter than mine and it would be a plus to use a dual-rank SODIMM module. This will grant absurd speed and if accessed properly, my multiport feeding a dual rank controller will grant near 0 penalty clock cycle continuous memory access stream as if it were a fat 512bit wide static ram running at 1/8 the DDR_CK clock frequency. (This means programming a 3D engine which works in multiple chunks of 512bits)

Though, Nockieboy's skills in HDL and software architecture will need some serious planning and foresight to take full use of this setup, but for simple stuff, you could conceivably render Quake I/II in real-time with a single laptop 1.5GHz, dual-rank 64bit SODIMM module. Or, ~ Sony PS2 territory.

However, I think you will need to go from a Z80 to a >100 MHz 68060 with FPU to handle the game mechanics.

Miti · « **Reply #27 on:** December 30, 2022, 12:00:59 pm »

Quote from: asmi on December 22, 2022, 10:13:16 pm

Alternatively you can use modules like this one: https://www.monolithicpower.com/mpm3683-7.html Their advantage is that they already have an integrated inductor, so all you need to use it is 2 resistors to set the voltage, a couple of caps for decoupling and filtering. They take less space than a "discrete" solution would, and are easier to route because one of the most critical traces (FET-to-inductor loop) is already done inside the package, so you literally just slap a cap or two nearby and you are done. The downside is the higher price - although it's not as clear-cut as it seem because you need more discrete parts for "discrete" solution, so sometimes modules actually end up cheaper. I personally prefer using modules when I need to pack things as tight as possible (typical problem for "compute modules" like that FPGA module we're talking about), but if space is not a big problem, you can use discrete converters as well.

Those modules are amazing but are a nightmare to solder if you don’t have a proper stencil, pick and place machine and an x-ray machine to see what’s underneath. Please be aware.

Quote from: nockieboy on December 28, 2022, 12:22:39 pm

I've downloaded KiCAD 6 and see virtually nothing has changed. I just don't have time to wade through padded-out video tutorials (all people seem to make these days to attract the YouTube algorithm) on how to do basic things like set up multiple schematics etc. when I could be launching straight into parts selection and wiring them up in a schematic in EasyEDA. It helps that I've already got schematics for all the IO etc. done already. All I really need to do is design the power supplies, wire up the FPGA's IOs and configuration and I can move on to PCB design.

I'm going to stick with EasyEDA for these PCB designs as it'll take me long enough to complete them on an EDA I'm familiar with, let alone one I need to learn how to use. Once it's done, I'll see about transferring the schematics and PCBs to KiCAD, if the interest is out there for it. Or I might see if I can progress both alongside each other. Depends on a lot. Change of job role coming up in February which means the amount of time I have to do any of this is really hard to tell at the moment.

Maybe I'll be forced to switch to it if there's a killer feature I need for this project that EasyEDA doesn't do but KiCAD does - we'll have to wait and see.

It’s interesting how different people go through the same set of emotions when it’s about change and learning new things. I tried several times in the past to switch from Eagle to Kicad, after being an Eagle fan boy for… ever. Once I found the patience and the willing to sit and understand what and how Kikad is doing things and considering the fact that it is free and reliable, I would not go back to Eagle even if Autodesk would stop being dickheads with the licensing model. We all go through the same emotional phases of change apparently.

https://insights.lamarsh.com/hs-fs/hubfs/Kubler-Ross-Model.png?width=1500&name=Kubler-Ross-Model.png

BrianHG · « **Reply #28 on:** December 30, 2022, 12:13:00 pm »

Quote from: Miti on December 30, 2022, 12:00:59 pm

Quote from: nockieboy on December 28, 2022, 12:22:39 pm
I've downloaded KiCAD 6 and see virtually nothing has changed. I just don't have time to wade through padded-out video tutorials (all people seem to make these days to attract the YouTube algorithm) on how to do basic things like set up multiple schematics etc. when I could be launching straight into parts selection and wiring them up in a schematic in EasyEDA. It helps that I've already got schematics for all the IO etc. done already. All I really need to do is design the power supplies, wire up the FPGA's IOs and configuration and I can move on to PCB design.

I'm going to stick with EasyEDA for these PCB designs as it'll take me long enough to complete them on an EDA I'm familiar with, let alone one I need to learn how to use. Once it's done, I'll see about transferring the schematics and PCBs to KiCAD, if the interest is out there for it. Or I might see if I can progress both alongside each other. Depends on a lot. Change of job role coming up in February which means the amount of time I have to do any of this is really hard to tell at the moment.

Maybe I'll be forced to switch to it if there's a killer feature I need for this project that EasyEDA doesn't do but KiCAD does - we'll have to wait and see.

It’s interesting how different people go through the same set of emotions when it’s about change and learning new things. I tried several times in the past to switch from Eagle to Kicad, after being an Eagle fan boy for… ever. Once I found the patience and the willing to sit and understand what and how Kikad is doing things and considering the fact that it is free and reliable, I would not go back to Eagle even if Autodesk would stop being dickheads with the licensing model. We all go through the same emotional phases of change apparently.

https://insights.lamarsh.com/hs-fs/hubfs/Kubler-Ross-Model.png?width=1500&name=Kubler-Ross-Model.png

Before I started with Protel 98/99/99se (now called Altium), I was using 'Advanced PCB' on my Amiga 3000/4000. You wouldn't believe the s...t I went through as the Advance PCB has some beautiful automated tools and the time I invested over the years learning it's ins and outs. I was lucky to find at the time a Protel User group where we all helped each other out and it took months, but I learned away and became an avid Protel user.

I watched videos from Dave and others playing with the latest Kicad on youtube. I looked over the latest Kicad features on their website and I can firmly say for a public domain PCB tool, it has really matured and worth the effort to learn if you will be creating a public domain PCB. It is recognized enough and has a large enough community of users where I feel comfortable recommending it and not worrying that you will be left out in the cold with an impossible issue if you use it even for a relatively complex 12 layer PCB. If you are making a PCB for sale or commercial applications as professional work, then I would say go to Altium (also need to buy a license), but this is not the case here.

Miti · « **Reply #29 on:** December 30, 2022, 12:48:12 pm »

Quote from: BrianHG on December 30, 2022, 12:13:00 pm

I watched videos from Dave and others playing with the latest Kicad on youtube. I looked over the latest Kicad features on their website and I can firmly say for a public domain PCB tool, it has really matured and worth the effort to learn if you will be creating a public domain PCB. It is recognized enough and has a large enough community of users where I feel comfortable recommending it and not worrying that you will be left out in the cold with an impossible issue if you use it even for a relatively complex 12 layer PCB. If you are making a PCB for sale or commercial applications as professional work, then I would say go to Altium (also need to buy a license), but this is not the case here.

I work for a contract manufacturer and we get all kind of CAD files from our customers. After learning Kicad and making few boards with it I can say this with confidence: More than 90% of the professional, commercial boards that we assemble could be made with Kicad, and I think that's a conservative number.

nockieboy · « **Reply #30 on:** December 30, 2022, 01:32:49 pm »

Quote from: asmi on December 29, 2022, 03:21:35 pm

Quote from: nockieboy on December 29, 2022, 11:06:18 am
If it's any consolation, I'm not designing the schematics and assigning I/O 'blind'. I'm using reference design(s) wherever possible - I've not found a lot on the XC7A100T unfortunately (no full schematics, anyway), but I've got some reference material I'm using based on the AC7100B board. It uses two DDR3 chips and includes their pin assignments, so I'm hoping that mirroring those assignments will avoid a lot of the pitfalls that you've mentioned re: pin selection. There's no detailed PCB drawings unfortunately, so routing two 16-bit datapaths will still be on me.
Don't do this, especially using random chinese board as a reference is not the greatest idea. Instead, get Vivado/Vitis set up and come up with your own, and odds are you are going to change it yet again when you will do routing, which will need to be verified again.
But before you do any of that, come up with an exact list of stuff you want connected to the FPGA to do some planning, and so you will know which I/O interfaces you will need to implement.

Oh? I figured following an existing working schematic, I couldn't go too far wrong.

I'm installing Vivado via the unified installer as we speak. Just hoping the simulation side of things isn't too different from what I've been using - I really need to get a grip of simulation.

Quote from: asmi on December 29, 2022, 03:21:35 pm

I understand your desire to rush to PCB design, but trust me, the time spent planning is the time well spent as it might ultimately save you a bunch of money on PCB respins and redesigns.

I know, I can be a little hot-headed and want to rush in to the fun stuff.

Thankfully you guys are here to reign me in a little!

Quote from: asmi on December 29, 2022, 03:21:35 pm

32 bit DDR3 layout might be the hardest one, but certainly not the fastest one. In your case HDMI will probably be the fastest one (beyond GTPs). As per official specs any regular I/O can go up to 1.25 Gbps per line, but in reality I've never seen an FPGA device that is not capable of doing 1080p@60 at 1.45 Gpbs per differential line. These are certainly faster than 400 MHz DDR3 signals, so don't get too caught up with DDR3 being the biggest obstacle. It might seem that way on a first glance, but it doesn't mean that it actually is. I know DDR3 looks intimidating, trust me I felt the same way back when I was about to embark on my very first one, but it might very well be simply a red herring with real complexity being elsewhere.

Good point. My inexperience showing through again, not really understanding the relative speeds of all these different protocols. To me though, DDR3 is a black box and although I tried routing to a PSRAM (I think it was) aaaages ago, I needed your help in finishing that. And that's nowhere near as complex as this project will be. I have the vaguest understanding of trace impedance and impedance-matching from the very limited work I've done with USB on these various projects, but on the scale of a 32-bit bus on a 6- (or more!) layer board...

Quote from: asmi on December 29, 2022, 03:21:35 pm

GTP are multi-gigabit transceivers and are in my opinion the most exciting feature of these devices, so I would absolutely recommend you to route them to the high-speed connectors and wire up their power supplies. The PDS for GTPs is a bit involed because it requires two ultra-clean power rails (<10 mVpp ripple is no joke!) at 1 and 1.2 V nominal, and the signal routing needs to be very careful as these lines can potentially run at over 6 Gbps, but that is precisely what makes them so exciting. That is my opinion of course and you are free to ignore it, but I think you will miss out a lot if you skip that part.

Okay, well I can't think what I would use GTPs for, so not including them on the core board would remove the need for two additional power supplies and free up the PCB real-estate they would otherwise consume. The downside is that the core board wouldn't have GTPs and there'd be no option for their use on a carrier board as a result. If I include them on the core board, I don't have to populate the power supplies or use them on the carrier board I suppose, but someone else might want to, so I really should include them I guess.

Quote from: BrianHG on December 30, 2022, 11:59:40 am

Though, Nockieboy's skills in HDL and software architecture will need some serious planning and foresight to take full use of this setup, but for simple stuff, you could conceivably render Quake I/II in real-time with a single laptop 1.5GHz, dual-rank 64bit SODIMM module. Or, ~ Sony PS2 territory.

However, I think you will need to go from a Z80 to a >100 MHz 68060 with FPU to handle the game mechanics.

Yeah, my HDL skills aren't great, though they've come on leaps and bounds since I first learned about FPGAs... The CPU is on the list for an upgrade. Even the limited work I've been able to do on software for the uCOM is highlighting some annoying limitations with the Z80, I have been outlining plans for a 68020 version, but that's a long way off yet. Maybe I should stretch the design to an 030 or 040 clocked at 50MHz or more..

Quote from: Miti on December 30, 2022, 12:00:59 pm

Those modules are amazing but are a nightmare to solder if you don’t have a proper stencil, pick and place machine and an x-ray machine to see what’s underneath. Please be aware.

I'm looking at using these for the power supplies instead: TPS62823.

Quote from: Miti on December 30, 2022, 12:00:59 pm

It’s interesting how different people go through the same set of emotions when it’s about change and learning new things. I tried several times in the past to switch from Eagle to Kicad, after being an Eagle fan boy for… ever. Once I found the patience and the willing to sit and understand what and how Kikad is doing things and considering the fact that it is free and reliable, I would not go back to Eagle even if Autodesk would stop being dickheads with the licensing model. We all go through the same emotional phases of change apparently.

https://insights.lamarsh.com/hs-fs/hubfs/Kubler-Ross-Model.png?width=1500&name=Kubler-Ross-Model.png

Yeah, I'm definitely somewhere between anger and depression with KiCAD at the moment. Wait 'til I start using Vivado!

Quote from: BrianHG on December 30, 2022, 12:13:00 pm

Before I started with Protel 98/99/99se (now called Altium), I was using 'Advanced PCB' on my Amiga 3000/4000.

Would have given my left leg for an A3000 or 4000 back in the day. Loved my A1200, and A500 Plus before it.

Quote from: BrianHG on December 30, 2022, 12:13:00 pm

I watched videos from Dave and others playing with the latest Kicad on youtube. I looked over the latest Kicad features on their website and I can firmly say for a public domain PCB tool, it has really matured and worth the effort to learn if you will be creating a public domain PCB. It is recognized enough and has a large enough community of users where I feel comfortable recommending it and not worrying that you will be left out in the cold with an impossible issue if you use it even for a relatively complex 12 layer PCB.

Quote from: Miti on December 30, 2022, 12:48:12 pm

I work for a contract manufacturer and we get all kind of CAD files from our customers. After learning Kicad and making few boards with it I can say this with confidence: More than 90% of the professional, commercial boards that we assemble could be made with Kicad, and I think that's a conservative number.

Yeah okay, I'm feeling the pro-KiCAD vibe here and will endeavour to put some more effort into understanding and using it. You're all making a very good argument for it.

You know, EasyEDA is free as well, and makes ordering the PCBs as easy as a couple of clicks of the mouse, no messing around with BOMs and all the other complexity KiCAD has.

BrianHG · « **Reply #31 on:** December 30, 2022, 02:00:33 pm »

Quote from: nockieboy on December 30, 2022, 01:32:49 pm

Quote from: BrianHG on December 30, 2022, 12:13:00 pm
Before I started with Protel 98/99/99se (now called Altium), I was using 'Advanced PCB' on my Amiga 3000/4000.

Would have given my left leg for an A3000 or 4000 back in the day. Loved my A1200, and A500 Plus before it.

I had one of the first A3000 and A4000 here in Montreal. Only the Universities got a few A3000s before me.

See the attached 8 layer Excalibur 040 accelerator PCB I made for RCS Management on an A4000 for an A4000.

Also see my 4 layer Turbo AGA which I never released. I managed to accelerate Amiga's custom chipset from 14.3/28.6 MHz to a whopping 27/54MHz. The A4000 flew absurdly fast and now had 54Khz audio plus a floppy drive which was almost twice as fast at reading. Too bad the system clock was too fast, you needed a true multiscan monitor, but, we now had authentic native 800x600, 1024x768 and 1280x1024 laced video modes. (Sadly, writing corrupted the discs... At the time, I only managed to patch the serial port which also went absurdly smooth.)

(PCB screenshots only show the top and bottom layers plus power & gnd)

nockieboy · « **Reply #32 on:** December 30, 2022, 03:25:20 pm »

Quote from: BrianHG on December 30, 2022, 02:00:33 pm

Quote from: nockieboy on December 30, 2022, 01:32:49 pm
Would have given my left leg for an A3000 or 4000 back in the day. Loved my A1200, and A500 Plus before it.

I had one of the first A3000 and A4000 here in Montreal. Only the Universities got a few A3000s before me.

Was that personally, or as part of your job?

Quote from: BrianHG on December 30, 2022, 02:00:33 pm

See the attached 8 layer Excalibur 040 accelerator PCB I made for RCS Management on an A4000 for an A4000.

Also see my 4 layer Turbo AGA which I never released. I managed to accelerate Amiga's custom chipset from 14.3/28.6 MHz to a whopping 27/54MHz. The A4000 flew absurdly fast and now had 54Khz audio plus a floppy drive which was almost twice as fast at reading. Too bad the system clock was too fast, you needed a true multiscan monitor, but, we now had authentic native 800x600, 1024x768 and 1280x1024 laced video modes. (Sadly, writing corrupted the discs... At the time, I only managed to patch the serial port which also went absurdly smooth.)

(PCB screenshots only show the top and bottom layers plus power & gnd)

Nice work.

That's the sort of thing I longed to do as a kid with my Amstrad and Amigas, enhance them with add-ons or tweak their electronics to eek out some extra performance, but I didn't have the electronics knowledge I have now, thanks to the Internet.

The floppy was still able to read existing formats? If I tried doubling the clock speed on my Amstrad, it would probably read and write at twice the density.

There's something about a PCB layout that I find almost mesmerising. I hope I'm not the only one who can enjoy a good trace layout, especially if you've designed it yourself and know the problems you've faced getting that last trace to its destination.

Miti · « **Reply #33 on:** December 30, 2022, 04:44:20 pm »

Quote from: nockieboy on December 30, 2022, 03:25:20 pm

There's something about a PCB layout that I find almost mesmerising. I hope I'm not the only one who can enjoy a good trace layout, especially if you've designed it yourself and know the problems you've faced getting that last trace to its destination.

Amen to that! I see PCB design as modern art. Some boards, you want to cover your eyes when you see them, some are on a par with a Picasso or Dali.

asmi · « **Reply #34 on:** December 30, 2022, 05:31:11 pm »

Quote from: nockieboy on December 30, 2022, 01:32:49 pm

Oh? I figured following an existing working schematic, I couldn't go too far wrong.

It's not that simple with FPGAs because pinout is partly driven by the kind of I/O logic you want to implement. Not to mention a lot of Chinese boards suffer from many issues like underpowered PDS and bad thermal design, and I don't want you accidentally "importing" those problems into your boards.

Quote from: nockieboy on December 30, 2022, 01:32:49 pm

I'm installing Vivado via the unified installer as we speak. Just hoping the simulation side of things isn't too different from what I've been using - I really need to get a grip of simulation.

If anything, it's more simple in Vivado, because it has a separation between "real" hardware HDL and simulation-only stuff, plus the simulator is integrated into IDE itself, while it still officially supports using external simulators if you choose to do so.

Quote from: nockieboy on December 30, 2022, 01:32:49 pm

I know, I can be a little hot-headed and want to rush in to the fun stuff. Thankfully you guys are here to reign me in a little!

No worries, I'm more like you, which is why I've wasted a ton of money on dud PCBs due to stupid mistakes because "I just wanna build it already!!!" mentality

You wouldn't believe how many times I found errors in my PCBs right after I sent it out to manufacturing and it was too late to change/cancel the order so I was waiting for boards knowing they are duds! That's why I've introduced some house rules for myself to slow things down, and recommend others to do the same. Delay of one day before your place an order is not going to matter to a hobbyist, but it often mean a ton because you can discover some problems with your PCB when you open it up on the next day with a fresh mind. And since we're now working with pricey stuff, taking it easy might just mean having one less revision of a PCB and saving you in the best case some of face palming once you discover the problem, or in the worst case some fireworks and smoke from expensive components. As you climb up the complexity ladder, the stuff you work with becomes progressively more expensive, so it's good to learn good habits from the get-go, or it can literally cost you quite a bit of money.

Quote from: nockieboy on December 30, 2022, 01:32:49 pm

Good point. My inexperience showing through again, not really understanding the relative speeds of all these different protocols. To me though, DDR3 is a black box and although I tried routing to a PSRAM (I think it was) aaaages ago, I needed your help in finishing that. And that's nowhere near as complex as this project will be. I have the vaguest understanding of trace impedance and impedance-matching from the very limited work I've done with USB on these various projects, but on the scale of a 32-bit bus on a 6- (or more!) layer board...

That complexity is very overblown. All of that impedance stuff comes down to setting up a specific trace widths and spacings, so once you calculate the widths (it's usually different for different signal layers) you only need to remember to set it for any trace you are routing, and that's it. Routing DDR is a lot like playing chess - you need to think ahead in order to not route yourself into a corner, so as you route a trace, you need to think about how you can route it in a such a way as to be able to also route adjacent traces. I actually like it exactly because how methodical it is.

As for 6 or more layers - you seem to think that having more layers makes routing harder, but in reality it makes it easier because you have more signal layers and so more possibilities of how to route your signals! And the more layers you have, the easier it is to route! This is why nowadays I do 4 layer boards even for simple designs - because my time is worth more than the cost differential between manufacturing a 2 layer and a 4 layer PCB.

Quote from: nockieboy on December 30, 2022, 01:32:49 pm

Okay, well I can't think what I would use GTPs for, so not including them on the core board would remove the need for two additional power supplies and free up the PCB real-estate they would otherwise consume. The downside is that the core board wouldn't have GTPs and there'd be no option for their use on a carrier board as a result. If I include them on the core board, I don't have to populate the power supplies or use them on the carrier board I suppose, but someone else might want to, so I really should include them I guess.

Oh, I thought I already gave you some examples. Here are some I can think of right off top of my head:
1. Want to have access to a hard drive? No problem - you can implement a SATA interface using GTP and access any SATA HDD or SSD.
2. Want to work with something even faster and physically smaller? No problemo, you can implement an M.2 port and connect NVMe SSD to it.
3. Maybe you want to add ability to output something more modern than 1080p@60 - like 1080p@120, or 1440p@144, or even 2160p@120 (that's 4K resolution)? Again, GTPs come to a rescue - both HDMI 2.0 and DisplayPort 1.4 can be implemented (you will need some external circuitry for HDMI 2.0, but that's not the point).
4. Eventually you are going to run out of logic inside A100T (and you only need to look at what you already went through to see that it's bound to happen as some point). So you will need to somehow split your logic among two (three, four, ...) FPGAs. But how do you connect them to give you a high-bandwidth link such that this interconnect won't hinder your design? How about having to route just four differential traces and having almost as much bandwidth as your existing 16 bit wide DDR3 interface has? Or even going all way in and implementing a PCI Express root port with a bunch of PCI Express slots where you can plug in other FPGA boards, or even some off-the-shelf cards like USB 3 interface adapter? GTPs make all of that possible.
That list can go on and on. 2.5G Ethernet? Sure! Fiber connection? Here you go! 4 1G Ethernet ports using just one pair of differential traces (QSGMII)? Again, not a problem! So, yeah, unless you are really in a pickle, not routing out GTPs is a biiig mistake in my opinion. It doesn't mean you have to use them right away - hence my suggestion for module/baseboard split. Respining or even redesigning baseboard is likely going to be much cheaper than respining the whole FPGA board.

Quote from: nockieboy on December 30, 2022, 01:32:49 pm

Yeah, my HDL skills aren't great, though they've come on leaps and bounds since I first learned about FPGAs... The CPU is on the list for an upgrade. Even the limited work I've been able to do on software for the uCOM is highlighting some annoying limitations with the Z80, I have been outlining plans for a 68020 version, but that's a long way off yet. Maybe I should stretch the design to an 030 or 040 clocked at 50MHz or more..

Vivado ships with a free Microblaze CPU (which is heavily inspured by MIPS microarchitecture), together with a software toolchain (Vitis), source code-level line-by-line debugging and everything else you expect to have in a modern software development IDE. And it can be configured so that it can run Linux. Also it's quite small (in terms of resources) so you can even put multiple of them to make a multi-core design.

Of course nothing prevents you from using any other CPU cores. As long as you have full HDL code for it, you can just plug it in.

Quote from: nockieboy on December 30, 2022, 01:32:49 pm

I'm looking at using these for the power supplies instead: TPS62823.

Be careful with TI parts, as someone who preferentially used their parts in the past I can tell you that their inventory is VERY sketchy nowadays. Soldering LGA modules is not a problem, no x-ray is required unless you plan to do a mass production, a hot air gun is all that required. And since a lot of these modules are designed to work at very high temperature (125°C is not uncommon), you need to try really hard to burn them down with a hot air gun, especially if you use leaded solder.

Quote from: nockieboy on December 30, 2022, 01:32:49 pm

Yeah, I'm definitely somewhere between anger and depression with KiCAD at the moment. Wait 'til I start using Vivado!

Yeah okay, I'm feeling the pro-KiCAD vibe here and will endeavour to put some more effort into understanding and using it. You're all making a very good argument for it. You know, EasyEDA is free as well, and makes ordering the PCBs as easy as a couple of clicks of the mouse, no messing around with BOMs and all the other complexity KiCAD has.

My problem with EasyEDA is that it's web-based, and consequently slow. I still remember how painful it was to route like 10 traces, now I shudder to think what it's gonna be like routing close to a hundred of them! KiCAD on the other hand is a desktop application, and so it can take advantage of the GPU to do the heavy lifting (ooh the irony of using a power of GPU to design a GPU!

), that's why it has no problems displaying very complex PCBs.
Also I'm not exactly sure how versioning works in EasyEDA, while KiCAD uses text-based files which can be managed by git for example.

Quote from: nockieboy on December 30, 2022, 01:32:49 pm

There's something about a PCB layout that I find almost mesmerising. I hope I'm not the only one who can enjoy a good trace layout, especially if you've designed it yourself and know the problems you've faced getting that last trace to its destination.

Don't worry - you will get it all with this design! There is going to be plenty of routing-tearing down cycles. Make sure you use a version control so that you can always go back to previous state - because you will need to go back.

nockieboy · « **Reply #35 on:** December 30, 2022, 08:37:19 pm »

Quote from: asmi on December 30, 2022, 05:31:11 pm

No worries, I'm more like you, which is why I've wasted a ton of money on dud PCBs due to stupid mistakes because "I just wanna build it already!!!" mentality You wouldn't believe how many times I found errors in my PCBs right after I sent it out to manufacturing and it was too late to change/cancel the order so I was waiting for boards knowing they are duds! That's why I've introduced some house rules for myself to slow things down, and recommend others to do the same. Delay of one day before your place an order is not going to matter to a hobbyist, but it often mean a ton because you can discover some problems with your PCB when you open it up on the next day with a fresh mind. And since we're now working with pricey stuff, taking it easy might just mean having one less revision of a PCB and saving you in the best case some of face palming once you discover the problem, or in the worst case some fireworks and smoke from expensive components. As you climb up the complexity ladder, the stuff you work with becomes progressively more expensive, so it's good to learn good habits from the get-go, or it can literally cost you quite a bit of money.

I've been lucky so far with my board designs, only having one or at most two occasions where I've spotted a mistake after making the order. Fortunately on both occasions it was rectifiable with some fine wire or breaking a trace and solder-bridging another one. That's not something I want to be doing with a board costing me ten times as much... and any errors on internal layers won't be fixable at all!

Quote from: asmi on December 30, 2022, 05:31:11 pm

As for 6 or more layers - you seem to think that having more layers makes routing harder, but in reality it makes it easier because you have more signal layers and so more possibilities of how to route your signals! And the more layers you have, the easier it is to route! This is why nowadays I do 4 layer boards even for simple designs - because my time is worth more than the cost differential between manufacturing a 2 layer and a 4 layer PCB.

Yes, you're right, I don't know why I get anxious about more layers. I guess in my mind it means more complexity, more things to go wrong.

Quote from: asmi on December 30, 2022, 05:31:11 pm

Oh, I thought I already gave you some examples. Here are some I can think of right off top of my head:
1. Want to have access to a hard drive? No problem - you can implement a SATA interface using GTP and access any SATA HDD or SSD.
2. Want to work with something even faster and physically smaller? No problemo, you can implement an M.2 port and connect NVMe SSD to it.
3. Maybe you want to add ability to output something more modern than 1080p@60 - like 1080p@120, or 1440p@144, or even 2160p@120 (that's 4K resolution)? Again, GTPs come to a rescue - both HDMI 2.0 and DisplayPort 1.4 can be implemented (you will need some external circuitry for HDMI 2.0, but that's not the point).
4. Eventually you are going to run out of logic inside A100T (and you only need to look at what you already went through to see that it's bound to happen as some point). So you will need to somehow split your logic among two (three, four, ...) FPGAs. But how do you connect them to give you a high-bandwidth link such that this interconnect won't hinder your design? How about having to route just four differential traces and having almost as much bandwidth as your existing 16 bit wide DDR3 interface has? Or even going all way in and implementing a PCI Express root port with a bunch of PCI Express slots where you can plug in other FPGA boards, or even some off-the-shelf cards like USB 3 interface adapter? GTPs make all of that possible.
That list can go on and on. 2.5G Ethernet? Sure! Fiber connection? Here you go! 4 1G Ethernet ports using just one pair of differential traces (QSGMII)? Again, not a problem! So, yeah, unless you are really in a pickle, not routing out GTPs is a biiig mistake in my opinion. It doesn't mean you have to use them right away - hence my suggestion for module/baseboard split. Respining or even redesigning baseboard is likely going to be much cheaper than respining the whole FPGA board.

Sold. I'll make sure to include GTPs then. And to read up on them.

Quote from: asmi on December 30, 2022, 05:31:11 pm

Quote from: nockieboy on December 30, 2022, 01:32:49 pm
Yeah, my HDL skills aren't great, though they've come on leaps and bounds since I first learned about FPGAs... The CPU is on the list for an upgrade. Even the limited work I've been able to do on software for the uCOM is highlighting some annoying limitations with the Z80, I have been outlining plans for a 68020 version, but that's a long way off yet. Maybe I should stretch the design to an 030 or 040 clocked at 50MHz or more..
Vivado ships with a free Microblaze CPU (which is heavily inspured by MIPS microarchitecture), together with a software toolchain (Vitis), source code-level line-by-line debugging and everything else you expect to have in a modern software development IDE. And it can be configured so that it can run Linux. Also it's quite small (in terms of resources) so you can even put multiple of them to make a multi-core design.

Of course nothing prevents you from using any other CPU cores. As long as you have full HDL code for it, you can just plug it in.

There's something about designing and building it myself using genuine discrete components that I don't want to lose in all this FPGA excitement, but you're right - I could just use a CPU IP core in the FPGA itself (especially with all the extra space) and it'd run at ludicrous speeds and save me significant costs in time and money.

Quote from: asmi on December 30, 2022, 05:31:11 pm

Quote from: nockieboy on December 30, 2022, 01:32:49 pm
I'm looking at using these for the power supplies instead: TPS62823.
Be careful with TI parts, as someone who preferentially used their parts in the past I can tell you that their inventory is VERY sketchy nowadays. Soldering LGA modules is not a problem, no x-ray is required unless you plan to do a mass production, a hot air gun is all that required. And since a lot of these modules are designed to work at very high temperature (125°C is not uncommon), you need to try really hard to burn them down with a hot air gun, especially if you use leaded solder.

The reason I'm looking at using those TI parts is purely because they're available and in stock - and a good price. I've been using low-temp solder for all my recent PCB work - I think it's about 180 degrees Celsius (without going to look). Obviously I'll have to work with whatever's on the bottom of any BGA packages I'm soldering, but if I use a solder stencil I can always add a thin layer of low-temp solder to help ease the melting point a little lower.

Quote from: asmi on December 30, 2022, 05:31:11 pm

My problem with EasyEDA is that it's web-based, and consequently slow. I still remember how painful it was to route like 10 traces, now I shudder to think what it's gonna be like routing close to a hundred of them!

I've had no issues with speed with EasyEDA? The auto-router was very hit and miss and not as good as DipTrace, but I haven't used auto-route since before the first GPU card I made, when BrianHG shamed me into routing it by hand and I discovered the joys of a good layout.

My experience with EasyEDA has left me VERY reticent to want to try anything else.

Quote from: asmi on December 30, 2022, 05:31:11 pm

KiCAD on the other hand is a desktop application, and so it can take advantage of the GPU to do the heavy lifting (ooh the irony of using a power of GPU to design a GPU! ), that's why it has no problems displaying very complex PCBs.
Also I'm not exactly sure how versioning works in EasyEDA, while KiCAD uses text-based files which can be managed by git for example.

I may not be using EasyEDA properly, but as far as versioning goes I just clone the project and carry on with the changes. Not as flexible as git versioning, admittedly, but has served me well so far.

Talking of KiCAD, I've just tried to make a start on the new PCB schematic project and two things hit me immediately. The first (a positive!) is that I can use a 'dark mode' theme, which is excellent when doing schematics and not something I can do with EasyEDA. The second (a negative), is that I'm stuck at a blank schematic. If I'm going to re-create the existing (incomplete) project from EasyEDA, I need to create 9 schematic pages - the first being the title page - but when I look up how to do a 'flat' schematic, all I get back is that hierarchical schematics are (somehow) better for everything and you should use those as KiCAD does them far better.

I can see why hierarchical schematics would be useful, if you're repeating the same sub-circuit constantly, but for what I'm doing they're completely pointless. I want FPGA config on one schematic, DDR on another, power supplies on another, etc. I don't want to create a hierarchical schematic with a box on the main page with potentially over 200 signals coming out of it. I just want nine different schematics that share global labels.

Quote from: asmi on December 30, 2022, 05:31:11 pm

Quote from: nockieboy on December 30, 2022, 01:32:49 pm
There's something about a PCB layout that I find almost mesmerising. I hope I'm not the only one who can enjoy a good trace layout, especially if you've designed it yourself and know the problems you've faced getting that last trace to its destination.
Don't worry - you will get it all with this design! There is going to be plenty of routing-tearing down cycles. Make sure you use a version control so that you can always go back to previous state - because you will need to go back.

Another reason to stick at KiCAD I suppose, but the issue mentioned above is a real problem for me.

SiliconWizard · « **Reply #36 on:** December 30, 2022, 09:03:42 pm »

Quote from: nockieboy on December 30, 2022, 08:37:19 pm

I can see why hierarchical schematics would be useful, if you're repeating the same sub-circuit constantly, but for what I'm doing they're completely pointless. I want FPGA config on one schematic, DDR on another, power supplies on another, etc. I don't want to create a hierarchical schematic with a box on the main page with potentially over 200 signals coming out of it. I just want nine different schematics that share global labels.

You can define global labels in KiCad and they'll be shared across all sheets.

asmi · « **Reply #37 on:** December 30, 2022, 09:44:21 pm »

Quote from: nockieboy on December 30, 2022, 08:37:19 pm

The reason I'm looking at using those TI parts is purely because they're available and in stock - and a good price. I've been using low-temp solder for all my recent PCB work - I think it's about 180 degrees Celsius (without going to look). Obviously I'll have to work with whatever's on the bottom of any BGA packages I'm soldering, but if I use a solder stencil I can always add a thin layer of low-temp solder to help ease the melting point a little lower.

I personally don't trust power parts which don't have a thermal pad

I've recently found this part: https://www.monolithicpower.com/en/mpm3833c.html It's quite cheap, can be used everywhere (good for BOM consolidation), and is low-ripple (important for sensitive analog circuits like GTPs, PLL, DAC/ADC, etc.). I plan to buy a bunch of them as I'm sure I'll find a good use for them. As for the main Vccint rail, so far I'm leaning towards this part: https://www.monolithicpower.com/mpm3683-7.html It's not quite as cheap, but I like how compact it is.

180°C is about the melting temperature of a regular leaded solder, so I'm not exactly sure what's so "low temperature" about it. When I hear "low temp" I usually think of something like bismuth or indium alloys which melt close to the water boiling temperature (100°C).

Quote from: nockieboy on December 30, 2022, 08:37:19 pm

I've had no issues with speed with EasyEDA? The auto-router was very hit and miss and not as good as DipTrace, but I haven't used auto-route since before the first GPU card I made, when BrianHG shamed me into routing it by hand and I discovered the joys of a good layout. My experience with EasyEDA has left me VERY reticent to want to try anything else.

Maybe because you have nothing to compare it against? Have you ever done a somewhat complex design using something other than EasyEDA? I clearly remember that as soon as I assigned colors to traces, it all ground to a halt, and even their "desktop" version (which is the same Javascript, just running over a local sandbox). And I can't work without such coloring, because it makes it much easier to see at a glance what needs to go there, especially when you work with big BGAs with hundreds of traces coming out of them.

Quote from: nockieboy on December 30, 2022, 08:37:19 pm

Talking of KiCAD, I've just tried to make a start on the new PCB schematic project and two things hit me immediately. The first (a positive!) is that I can use a 'dark mode' theme, which is excellent when doing schematics and not something I can do with EasyEDA. The second (a negative), is that I'm stuck at a blank schematic. If I'm going to re-create the existing (incomplete) project from EasyEDA, I need to create 9 schematic pages - the first being the title page - but when I look up how to do a 'flat' schematic, all I get back is that hierarchical schematics are (somehow) better for everything and you should use those as KiCAD does them far better.

I can see why hierarchical schematics would be useful, if you're repeating the same sub-circuit constantly, but for what I'm doing they're completely pointless. I want FPGA config on one schematic, DDR on another, power supplies on another, etc. I don't want to create a hierarchical schematic with a box on the main page with potentially over 200 signals coming out of it. I just want nine different schematics that share global labels.

It's absolutely possible to design a flat multipage schematics using KiCAD, here is an example from the project in my signature: https://github.com/asmi84/kicad-projects/raw/master/S7_Min/S7_Min.pdf No need for wires sticking out of anything. Most FPGA or SoC projects in KiCAD are done fundamentally the same way I've done it - others just tend to put more effort itno making the first page to not look as empty as mine is, but it's always a good idea to have some sort of title or index page for complex designs which would provide an overview of what's where - in my case I just felt page names are descriptive enough, but they obviously weren't, which became clear once others started looking at it

asmi · « **Reply #38 on:** December 30, 2022, 09:46:24 pm »

Quote from: SiliconWizard on December 30, 2022, 09:03:42 pm

You can define global labels in KiCad and they'll be shared across all sheets.

I still wish there would be a toggle switch somewhere in project settings to make all labels global, like it's done in many other eCADs...

BrianHG · « **Reply #39 on:** December 30, 2022, 10:48:50 pm »

Quote from: nockieboy on December 30, 2022, 03:25:20 pm

Nice work. That's the sort of thing I longed to do as a kid with my Amstrad and Amigas, enhance them with add-ons or tweak their electronics to eek out some extra performance, but I didn't have the electronics knowledge I have now, thanks to the Internet.

The floppy was still able to read existing formats? If I tried doubling the clock speed on my Amstrad, it would probably read and write at twice the density.

There's something about a PCB layout that I find almost mesmerising. I hope I'm not the only one who can enjoy a good trace layout, especially if you've designed it yourself and know the problems you've faced getting that last trace to its destination.

For my work, I was first contracted by RCS management to fix their overheating problem with their Fusion 40 for the Amiga 2000. The Excalibur was still designed by them, but, I was contracted to do the PCB. The prize was the A4000 itself. I was never employed by them.

My turbo AGA had a clock wire which went to the floppy drive's motor crystal. It sped up the floppy drive in tandem with the Amiga chip set. So, loading from floppy was like watching and listening to a video running at 2x speed. Though, game speed and pitch were also all off. Thanks to the Amiga mouse being a parallel interface, the 100Hz refresh rate made it's hand-to-screen connection time beyond amazing. (I still need to try the newer 1kHz sample rate gaming mouses on a PC to compare....)

Though, because of Commodore's half speed motor trick for HD 1.7mb floppies since their Paula was speed limited, with my Turbo AGA, writing to HD floppies still worked unlike 880kb mode where just reading worked fine as the disk speed was too fast for the erase and write head.

miken · « **Reply #40 on:** December 30, 2022, 11:29:57 pm »

Just another vote for KiCAD... Having used a whole bunch of different PCB EDA tools in my career, I'll say that all tools have their annoyances, some more than others. (OrCAD would crash often and sometimes corrupt the project. Xpedition was way too complex for its own good, and yet let odd things through DRC.) I'm perfectly happy using KiCAD for my home projects.

The lock-in from an existing design is real, though, and I totally understand the reluctance to re-do all that work. Maybe you could stay in EasyEDA for the motherboard, and adopt KiCAD for the daughtercard? It means you have to be especially careful about the interface though.

Philosophically speaking, designing an FPGA board or any complex PCB is a lot of bookkeeping, really. There's a lot of subfeatures and requirements that can bite you. So managing that complexity and being methodical is important.

Going back to earlier conversation, if you have the luxury of having (most) of an FPGA project to run through the tools, definitely do as much as you can. Most of the time, projects I've worked on have the HW done before the FPGA, and there's always some pin arrangement that's less than ideal but we have to live with because the HW already shipped.

nockieboy · « **Reply #41 on:** December 31, 2022, 09:49:20 am »

Quote from: asmi on December 30, 2022, 09:44:21 pm

I personally don't trust power parts which don't have a thermal pad
I've recently found this part: https://www.monolithicpower.com/en/mpm3833c.html It's quite cheap, can be used everywhere (good for BOM consolidation), and is low-ripple (important for sensitive analog circuits like GTPs, PLL, DAC/ADC, etc.). I plan to buy a bunch of them as I'm sure I'll find a good use for them. As for the main Vccint rail, so far I'm leaning towards this part: https://www.monolithicpower.com/mpm3683-7.html It's not quite as cheap, but I like how compact it is.

Those MPM3683s are around £7 per piece - a touch on the expensive side for what I want, if I'm honest. The MPM3833s, however, are less than half that at around £3/piece - I'll look a little closer at these as an option once I start worrying about space on the PCB.

Quote from: asmi on December 30, 2022, 09:44:21 pm

180°C is about the melting temperature of a regular leaded solder, so I'm not exactly sure what's so "low temperature" about it. When I hear "low temp" I usually think of something like bismuth or indium alloys which melt close to the water boiling temperature (100°C).

Yeah, maybe it's a typo or I've just got normal solder paste. I think it should say 138 degrees, not 183 - I've had no issues with it, to be honest. Just added a little extra flux to the mix for some of the finer SMD work on QFNs etc. But the fact I'm even questioning it means it's not the best quality and isn't something I'll be using for this project!

I'd be worried using a solder that melts as low as 100 degrees C, specially on power parts and industrial-grade chips. Wouldn't want them dropping off the board under high load!

Quote from: asmi on December 30, 2022, 09:44:21 pm

Maybe because you have nothing to compare it against? Have you ever done a somewhat complex design using something other than EasyEDA?

I've only ever used DipTrace and EasyEDA, so my basis for comparison is rather limited - and DipTrace was limited to 500 pin designs due to the free licence, so it was only simple boards I could design on it (plus I was learning, so it hardly impinged my skills for the first few boards). In fact, DipTrace's pin count limit was one of the reasons I went with a modular computer instead of a single-board design.

Quote from: asmi on December 30, 2022, 09:44:21 pm

I clearly remember that as soon as I assigned colors to traces, it all ground to a halt, and even their "desktop" version (which is the same Javascript, just running over a local sandbox). And I can't work without such coloring, because it makes it much easier to see at a glance what needs to go there, especially when you work with big BGAs with hundreds of traces coming out of them.

That's the one thing I've never done - assigning colours to traces. That must explain our differing experiences of EasyEDA.

Quote from: asmi on December 30, 2022, 09:44:21 pm

It's absolutely possible to design a flat multipage schematics using KiCAD, here is an example from the project in my signature: https://github.com/asmi84/kicad-projects/raw/master/S7_Min/S7_Min.pdf No need for wires sticking out of anything. Most FPGA or SoC projects in KiCAD are done fundamentally the same way I've done it - others just tend to put more effort itno making the first page to not look as empty as mine is, but it's always a good idea to have some sort of title or index page for complex designs which would provide an overview of what's where - in my case I just felt page names are descriptive enough, but they obviously weren't, which became clear once others started looking at it

I thought there must be a simple solution, but when you keep getting hierarchical pages shoved at you on every answer you find on Google...

nockieboy · « **Reply #42 on:** December 31, 2022, 10:03:35 am »

Quote from: BrianHG on December 30, 2022, 10:48:50 pm

For my work, I was first contracted by RCS management to fix their overheating problem with their Fusion 40 for the Amiga 2000. The Excalibur was still designed by them, but, I was contracted to do the PCB. The prize was the A4000 itself. I was never employed by them.

My turbo AGA had a clock wire which went to the floppy drive's motor crystal. It sped up the floppy drive in tandem with the Amiga chip set. So, loading from floppy was like watching and listening to a video running at 2x speed. Though, game speed and pitch were also all off. Thanks to the Amiga mouse being a parallel interface, the 100Hz refresh rate made it's hand-to-screen connection time beyond amazing. (I still need to try the newer 1kHz sample rate gaming mouses on a PC to compare....)

Though, because of Commodore's half speed motor trick for HD 1.7mb floppies since their Paula was speed limited, with my Turbo AGA, writing to HD floppies still worked unlike 880kb mode where just reading worked fine as the disk speed was too fast for the erase and write head.

Did you use your Amigas for recreation as well? How did you get the contract for the work? You must have been pretty experienced in the field for them to choose you?

Quote from: miken on December 30, 2022, 11:29:57 pm

Just another vote for KiCAD... Having used a whole bunch of different PCB EDA tools in my career, I'll say that all tools have their annoyances, some more than others. (OrCAD would crash often and sometimes corrupt the project. Xpedition was way too complex for its own good, and yet let odd things through DRC.) I'm perfectly happy using KiCAD for my home projects.

The lock-in from an existing design is real, though, and I totally understand the reluctance to re-do all that work. Maybe you could stay in EasyEDA for the motherboard, and adopt KiCAD for the daughtercard? It means you have to be especially careful about the interface though.

Philosophically speaking, designing an FPGA board or any complex PCB is a lot of bookkeeping, really. There's a lot of subfeatures and requirements that can bite you. So managing that complexity and being methodical is important.

Going back to earlier conversation, if you have the luxury of having (most) of an FPGA project to run through the tools, definitely do as much as you can. Most of the time, projects I've worked on have the HW done before the FPGA, and there's always some pin arrangement that's less than ideal but we have to live with because the HW already shipped.

I'm not going to gripe any further about KiCAD. Well, at least I'll try not to.

I guarantee, however, that when I get to completing the schematics and have a million symbols to link to footprints, I'll have a rant at that point.

BrianHG · « **Reply #43 on:** December 31, 2022, 10:19:39 am »

Quote from: nockieboy on December 31, 2022, 10:03:35 am

Quote from: BrianHG on December 30, 2022, 10:48:50 pm
For my work, I was first contracted by RCS management to fix their overheating problem with their Fusion 40 for the Amiga 2000. The Excalibur was still designed by them, but, I was contracted to do the PCB. The prize was the A4000 itself. I was never employed by them.

My turbo AGA had a clock wire which went to the floppy drive's motor crystal. It sped up the floppy drive in tandem with the Amiga chip set. So, loading from floppy was like watching and listening to a video running at 2x speed. Though, game speed and pitch were also all off. Thanks to the Amiga mouse being a parallel interface, the 100Hz refresh rate made it's hand-to-screen connection time beyond amazing. (I still need to try the newer 1kHz sample rate gaming mouses on a PC to compare....)

Though, because of Commodore's half speed motor trick for HD 1.7mb floppies since their Paula was speed limited, with my Turbo AGA, writing to HD floppies still worked unlike 880kb mode where just reading worked fine as the disk speed was too fast for the erase and write head.

Did you use your Amigas for recreation as well? How did you get the contract for the work? You must have been pretty experienced in the field for them to choose you?

A lot of Amiga gaming was done as well, but I was more interested in how to make the machine really compute and shine.

I started with my own high end audio digitizers which got recognized for their quality. (There were only a few stores which sold Amigas and since I demoed my digitizer at the local Amiga club, everyone knew me...)
RCS management had problems with their first revisions of their Amiga 2000 Fusion 40 overheating and crashing. The Amiga community was small enough that not many here in Montreal with hardware engineering experience for Amiga existed at all, (I think I was the only other one...) so Suresh of RCS management heard my name through the grapevine and we met. I solved the problem in a single all too simple swoop and we sort of attempted a few other projects together as I had my own private company. (Basically it boiled down to luck...)

nockieboy · « **Reply #44 on:** December 31, 2022, 11:44:53 am »

Quote from: BrianHG on December 31, 2022, 10:19:39 am

A lot of Amiga gaming was done as well, but I was more interested in how to make the machine really compute and shine.

I had Blitz Basic back in the day, wrote a few games with a friend but nothing that got further than our little Amiga club at school - I remember doing a 'Scorched Earth' clone. Wish I had kept all my old Amiga stuff, especially the source code for those old games. Life was moving very quickly at that point and I didn't have the time or resources (books weren't cheap!) to get into the hardware.

Quote from: BrianHG on December 31, 2022, 10:19:39 am

I started with my own high end audio digitizers which got recognized for their quality. (There were only a few stores which sold Amigas and since I demoed my digitizer at the local Amiga club, everyone knew me...)
RCS management had problems with their first revisions of their Amiga 2000 Fusion 40 overheating and crashing. The Amiga community was small enough that not many here in Montreal with hardware engineering experience for Amiga existed at all, (I think I was the only other one...) so Suresh of RCS management heard my name through the grapevine and we met. I solved the problem in a single all too simple swoop and we sort of attempted a few other projects together as I had my own private company. (Basically it boiled down to luck...)

I take it that 'slap a big heatsink on it' wasn't the answer?

BrianHG · « **Reply #45 on:** December 31, 2022, 12:05:07 pm »

Quote from: nockieboy on December 31, 2022, 11:44:53 am

Quote from: BrianHG on December 31, 2022, 10:19:39 am
A lot of Amiga gaming was done as well, but I was more interested in how to make the machine really compute and shine.

I had Blitz Basic back in the day, wrote a few games with a friend but nothing that got further than our little Amiga club at school - I remember doing a 'Scorched Earth' clone. Wish I had kept all my old Amiga stuff, especially the source code for those old games. Life was moving very quickly at that point and I didn't have the time or resources (books weren't cheap!) to get into the hardware.

Quote from: BrianHG on December 31, 2022, 10:19:39 am
I started with my own high end audio digitizers which got recognized for their quality. (There were only a few stores which sold Amigas and since I demoed my digitizer at the local Amiga club, everyone knew me...)
RCS management had problems with their first revisions of their Amiga 2000 Fusion 40 overheating and crashing. The Amiga community was small enough that not many here in Montreal with hardware engineering experience for Amiga existed at all, (I think I was the only other one...) so Suresh of RCS management heard my name through the grapevine and we met. I solved the problem in a single all too simple swoop and we sort of attempted a few other projects together as I had my own private company. (Basically it boiled down to luck...)

I take it that 'slap a big heatsink on it' wasn't the answer?

It was a bus timing and contingency issue.
The answer was change all the 74F245s, 74F574s & 74F04 to 74HC245, 74HC574 and 74HC04.

The higher 2.5v high logic level (vs 1.2v ttl) input delayed output enables long enough to clear everything up and the card ran cool.

nockieboy · « **Reply #46 on:** December 31, 2022, 02:51:51 pm »

Quote from: BrianHG on December 31, 2022, 12:05:07 pm

It was a bus timing and contingency issue.
The answer was change all the 74F245s, 74F574s & 74F04 to 74HC245, 74HC574 and 74HC04.

The higher 2.5v high logic level (vs 1.2v ttl) input delayed output enables long enough to clear everything up and the card ran cool.

What did they say when you came out with that peach of a solution? Might have been a few embarrassed engineers (or one, perhaps) slinking away into the background.

asmi · « **Reply #47 on:** December 31, 2022, 04:05:28 pm »

Quote from: nockieboy on December 31, 2022, 09:49:20 am

Those MPM3683s are around £7 per piece - a touch on the expensive side for what I want, if I'm honest. The MPM3833s, however, are less than half that at around £3/piece - I'll look a little closer at these as an option once I start worrying about space on the PCB.

When comparing prices, remember that parts I've mentioned are modules and so already include inductor and a bunch of other passives.

Quote from: nockieboy on December 31, 2022, 09:49:20 am

That's the one thing I've never done - assigning colours to traces. That must explain our differing experiences of EasyEDA.

I've attached two views of the same layout (this is a 64 bit DDR3 SO-DIMM interface), one without trace coloring (so trace color just indicates a layer it's on), and another one where different trace groups (address/control group and 8 byte lanes) were assigned different colors. And two more views of a single layer from that layout, again with and without trace coloring. Which one is easier to read? And also imagine that none of those traces are there yet (which is going to be your case), and you need to quickly figure out how to place components such that it will be easier to route.

Miti · « **Reply #48 on:** December 31, 2022, 06:06:47 pm »

Quote from: asmi on December 31, 2022, 04:05:28 pm

Which one is easier to read?

To me, definitely the one without the colors but we are all different.

asmi · « **Reply #49 on:** December 31, 2022, 07:10:24 pm »

Quote from: Miti on December 31, 2022, 06:06:47 pm

To me, definitely the one without the colors but we are all different.

What about these?

BrianHG · « **Reply #50 on:** December 31, 2022, 09:02:11 pm »

Quote from: asmi on December 31, 2022, 04:05:28 pm

Quote from: nockieboy on December 31, 2022, 09:49:20 am
Those MPM3683s are around £7 per piece - a touch on the expensive side for what I want, if I'm honest. The MPM3833s, however, are less than half that at around £3/piece - I'll look a little closer at these as an option once I start worrying about space on the PCB.
When comparing prices, remember that parts I've mentioned are modules and so already include inductor and a bunch of other passives.
Quote from: nockieboy on December 31, 2022, 09:49:20 am
That's the one thing I've never done - assigning colours to traces. That must explain our differing experiences of EasyEDA.
I've attached two views of the same layout (this is a 64 bit DDR3 SO-DIMM interface), one without trace coloring (so trace color just indicates a layer it's on), and another one where different trace groups (address/control group and 8 byte lanes) were assigned different colors. And two more views of a single layer from that layout, again with and without trace coloring. Which one is easier to read? And also imagine that none of those traces are there yet (which is going to be your case), and you need to quickly figure out how to place components such that it will be easier to route.

The cyan DDR3 command lines length seem needlessly long with a few too way too short and all messed up.
It's like you set a length tolerance and some lines went maximum while others went minimum.

The DQ & DQS seems ok.

Is it possible that the tighter DQ/DQS specs just made them much better and for some reason your cyan command lines went min and max instead of optimal length? Min & Max routing doesn't mean it wont work, but that will lower the tip top possible performance.

BrianHG · « **Reply #51 on:** December 31, 2022, 09:13:18 pm »

Quote from: nockieboy on December 31, 2022, 02:51:51 pm

Quote from: BrianHG on December 31, 2022, 12:05:07 pm
It was a bus timing and contingency issue.
The answer was change all the 74F245s, 74F574s & 74F04 to 74HC245, 74HC574 and 74HC04.

The higher 2.5v high logic level (vs 1.2v ttl) input delayed output enables long enough to clear everything up and the card ran cool.

What did they say when you came out with that peach of a solution? Might have been a few embarrassed engineers (or one, perhaps) slinking away into the background.

I think they themselves had the Fusion 40 card designed by a contracted third party who made it to the Amiga 68k expansion CPU slot spec. My solution was a glance look observation patch type fix which I never got paid for. The fix worked and from there on in, anything new they designed was done in-house.

It was nothing more than a relief from them as the fix was quick and cheap for their cards already in the market which needed patching. Thankfully, the ICs were on sockets.

asmi · « **Reply #52 on:** December 31, 2022, 09:39:33 pm »

Quote from: BrianHG on December 31, 2022, 09:02:11 pm

The cyan DDR3 command lines length seem needlessly long with a few too way too short and all messed up.
It's like you set a length tolerance and some lines went maximum while others went minimum.

The DQ & DQS seems ok.

Is it possible that the tighter DQ/DQS specs just made them much better and for some reason your cyan command lines went min and max instead of optimal length? Min & Max routing doesn't mean it wont work, but that will lower the tip top possible performance.

Address/control lines in DDR3 needs to be at least as long as the longest DQ group due to flyby routing, and some of the routing is on different layers so it's hard to see it on a screenshot. But that wasn't the point - I wanted to demonstrate how helpful trace coloring is and why I use it everywhere. It's especially useful in placing when you can quickly see where major groups of traces needs to go. Ratlines can help, but when you've got over 600 pin devices (676 balls in that screenshot), it can be really hard to figure out what needs to go where and how to place components to make subsequent routing easier.

BrianHG · « **Reply #53 on:** December 31, 2022, 09:40:02 pm »

Here is an old 800mbps DDR2 layout of mine. Only the crucial differential CLK, and DQS trace lengths were matched within spec.

All the others DQ were within +/- 5mm within their respected domains while the command had a +/-10mm tolerance.

Each DQ/DQS signal went through 1 via from FPGA on top to sodimm module on bottom.
Each command went through 2 vias.
(Green = top, Red = bottom, cyan/magenta = bottom layers.)

If I wanted DDR3, I would need to add some swiggles to length match some of the shorter DQ traces and super balance the DQS traces to achieve 1866mbps spec, or, just program the trace lengths into the FPGA's IO fine delay compensation.

asmi · « **Reply #54 on:** December 31, 2022, 09:55:34 pm »

Quote from: BrianHG on December 31, 2022, 09:40:02 pm

If I wanted DDR3, I would need to add some swiggles to length match some of the shorter DQ traces and super balance the DQS traces to achieve 1866mbps spec, or, just program the trace lengths into the FPGA's IO fine delay compensation.

Nope, that's not how it works. It's actually the other way around - you pull package delays data from your SoC or FPGA and then adjust your layout to take these delays into consideration. 400 MHz DDR2 and 933 MHz DDR3 are very different animals, and layouts for them are going to be very different when it comes to tolerances, impedance and crosstalk. A lot of things you can get away with in slower interface will become a critical flaw at fast one. Your layout as is will probably work for DDR3 (through I'm not sure as some SoCs require flyby routing and will not work with balanced tree of DDR2), but only for slower speeds.

nockieboy · « **Reply #55 on:** January 01, 2023, 12:59:03 am »

Quote from: asmi on December 31, 2022, 04:05:28 pm

When comparing prices, remember that parts I've mentioned are modules and so already include inductor and a bunch of other passives.

Oh I haven't forgotten that. The MPM3833 is my preferred choice at the moment, only working out slightly more expensive than the TPS option (even factoring in the inductor), but looks more easily solderable (not as small) but overall taking up less space than the TPS part because of not needing the external inductor.

Quote from: asmi on December 31, 2022, 04:05:28 pm

Quote from: nockieboy on December 31, 2022, 09:49:20 am
That's the one thing I've never done - assigning colours to traces. That must explain our differing experiences of EasyEDA.
I've attached two views of the same layout (this is a 64 bit DDR3 SO-DIMM interface), one without trace coloring (so trace color just indicates a layer it's on), and another one where different trace groups (address/control group and 8 byte lanes) were assigned different colors. And two more views of a single layer from that layout, again with and without trace coloring. Which one is easier to read? And also imagine that none of those traces are there yet (which is going to be your case), and you need to quickly figure out how to place components such that it will be easier to route.

I think they all look fantastic. The colour ones are clearer to see which signals are grouped, but to me at least that's only an informative 'nice-to-have' and something I could maybe live without. Bearing in mind I haven't tried routing DDR3 or even DDR2 yet.

Quote from: BrianHG on December 31, 2022, 09:13:18 pm

I think they themselves had the Fusion 40 card designed by a contracted third party who made it to the Amiga 68k expansion CPU slot spec. My solution was a glance look observation patch type fix which I never got paid for. The fix worked and from there on in, anything new they designed was done in-house.

It was nothing more than a relief from them as the fix was quick and cheap for their cards already in the market which needed patching. Thankfully, the ICs were on sockets.

You never got paid for it?

It obviously greased the wheels for future work with them, though?

Quote from: asmi on December 31, 2022, 09:55:34 pm

Nope, that's not how it works. It's actually the other way around - you pull package delays data from your SoC or FPGA and then adjust your layout to take these delays into consideration. 400 MHz DDR2 and 933 MHz DDR3 are very different animals, and layouts for them are going to be very different when it comes to tolerances, impedance and crosstalk. A lot of things you can get away with in slower interface will become a critical flaw at fast one. You layout as is will probably work for DDR3 (through I'm not sure as some SoCs require flyby routing and will not work with balanced tree of DDR2), but only for slower speeds.

Oookay. I've clearly got a lot to learn about this stuff before I can get anywhere near routing a DDR3 chip to a standard that you guys will be happy with.

By the way, Happy New Year everyone.

asmi · « **Reply #56 on:** January 01, 2023, 01:20:32 am »

Quote from: nockieboy on January 01, 2023, 12:59:03 am

I think they all look fantastic. The colour ones are clearer to see which signals are grouped, but to me at least that's only an informative 'nice-to-have' and something I could maybe live without. Bearing in mind I haven't tried routing DDR3 or even DDR2 yet.

Well technically everything eCAD has to offer can be done without and is a nice-to-have. But it doesn't mean it should.
Anyway, we will see how it goes when it comes to actually doing it.

Quote from: nockieboy on January 01, 2023, 12:59:03 am

Oookay. I've clearly got a lot to learn about this stuff before I can get anywhere near routing a DDR3 chip to a standard that you guys will be happy with.

Don't you worry as you won't be implementing a 933 MHz DDR3 interface any time soon. The best Artix-7 can do is 533 MHz, and that is only in speed grade 3, while devices we've ordered are speed grade 2, which can only go up to 400 MHz. If you we do out layout and pinout right, we can later perhaps try overcloaking them to speed grade 3, but that is not guaranteed to work so I would only rely on having 400 MHz interface.

Quote from: nockieboy on January 01, 2023, 12:59:03 am

By the way, Happy New Year everyone.

We're still waiting for one to come over here

nockieboy · « **Reply #57 on:** January 01, 2023, 04:45:18 pm »

Quote from: asmi on January 01, 2023, 01:20:32 am

Quote from: nockieboy on January 01, 2023, 12:59:03 am
By the way, Happy New Year everyone.
We're still waiting for one to come over here

The hangover has cleared at last, the decorations are down and I'm thinking about the power supply for the FPGA now.

The 1.0V rail (VCCINT) is going to need to supply some power - I've been doing the schematics for the other power rails and moved on to design VCCINT's supply and realised that the TPS62823 only supplies a maximum 3A. Will that be enough for the FPGA? I know it's hard to guesstimate power consumption with an FPGA as depends entirely on the gate usage, frequency etc, but the Chinese schematic/reference of unspecified origin that I was using quotes a 3A-capable supply for VCCINT. Is there any safety margin in that or should I design for something a little beefier?

nockieboy · « **Reply #58 on:** January 01, 2023, 06:21:07 pm »

Further to the above post, here's the power schematic for the board, first version. Don't ask why I'm using a TPS62823 for VCCINT and MPM3833s for the rest - for some reason I'd gotten confused and thought the TPS part was able to put out more current, so I kept it for VCCINT as I thought it would supply more current if needed. That's clearly not the case, so I can (and probably will) replace the TPS part a copy-paste of one of the MPM3833 supplies beneath it, with R15/R16 adjusted accordingly. That TPS part looks really small and may cause me some headaches trying to solder it, which is a good reason to use an MPM3833 part - that and I won't need the inductor.

As always, feedback appreciated. Remember I'm trying to balance cost against footprint size and ease of construction.

asmi · « **Reply #59 on:** January 01, 2023, 07:36:38 pm »

Quote from: nockieboy on January 01, 2023, 06:21:07 pm

Further to the above post, here's the power schematic for the board, first version. Don't ask why I'm using a TPS62823 for VCCINT and MPM3833s for the rest - for some reason I'd gotten confused and thought the TPS part was able to put out more current, so I kept it for VCCINT as I thought it would supply more current if needed. That's clearly not the case, so I can (and probably will) replace the TPS part a copy-paste of one of the MPM3833 supplies beneath it, with R15/R16 adjusted accordingly. That TPS part looks really small and may cause me some headaches trying to solder it, which is a good reason to use an MPM3833 part - that and I won't need the inductor.

A100T can consume up to 7 A of current on Vccint rail, so none of those parts will work. That's why I offered to take a look at MPM3683-7 - it can provide up to 8 A so there is some margin, and as per datasheet, it seems to have low enough ripple to power 1 V power line of GTPs. GTPs will also require a clean 1.2 V rail with a current of about 0.4 amps, so we can use a good quality LDO to step it down from 1.8 V rail. This might be a bit tricky as not all LDO can go that low for power, pre-biased LDO will probably be the best.

I would suggest to use DDR3 instead of DDR3L as it will theoretically give us ability to overclock things to 533 MHz interface. So switch 1.35 V rail to 1.5 V nominal.

You will also need a DDR3 termination regulator. Something like MP20075 - it provides both Vref (midrail) for memory and FPGA, and can sink/source up to 3 amps of current, which seems more than enough for our design.

Also do we have a full list of all peripherals which will be connected to FPGA? Some of those might require Vccio other than 3.3 V. You also might want to design-in some prexibility to change Vccio of some IO banks.

Someone · « **Reply #60 on:** January 01, 2023, 10:35:01 pm »

Quote from: nockieboy on January 01, 2023, 06:21:07 pm

As always, feedback appreciated. Remember I'm trying to balance cost against footprint size and ease of construction.

Dont split Vccint and Vccbram unless you are doing suspend/low power shenanigans.

But as with some of the comments already, STOP. Put a representative design into the FPGA tools before doing any of the hardware thinking/layout/planning. Then you will have a reasonable power estimate, some validated pin assignments, and a clocking tree.

Once the power estimates are in, LDOs will probably do half the rails if you do care about space and cost. The cost in $$ and footprints will likely end up in the capacitors as much as the power supplies themselves so there is pressure to design the distribution and tuning well. If you are trying to squeeze space and cost, eliminate that sequencing chip and either use enables or make the power rails all come on (and off) together (permissible solution that Xilinx supports).

nockieboy · « **Reply #61 on:** January 01, 2023, 10:43:31 pm »

Quote from: asmi on January 01, 2023, 07:36:38 pm

A100T can consume up to 7 A of current on Vccint rail, so none of those parts will work. That's why I offered to take a look at MPM3683-7 - it can provide up to 8 A so there is some margin, and as per datasheet, it seems to have low enough ripple to power 1 V power line of GTPs. GTPs will also require a clean 1.2 V rail with a current of about 0.4 amps, so we can use a good quality LDO to step it down from 1.8 V rail. This might be a bit tricky as not all LDO can go that low for power, pre-biased LDO will probably be the best.

Okay, cool - will do a redesign to incorporate the increased power demand, and had forgotten to include the GTP supplies. That MPM3683 is not a cheap part though, especially considering it only has one output. Would this be a suitable alternative? SIC402ACD-T1-GE3 - single 10A output.

Are the MPM3833s okay for the other rails though? If it's just VCCINT that needs more juice, then the price of the MPM3683 isn't an issue.

Quote from: asmi on January 01, 2023, 07:36:38 pm

I would suggest to use DDR3 instead of DDR3L as it will theoretically give us ability to overclock things to 533 MHz interface. So switch 1.35 V rail to 1.5 V nominal.

Okay, I've switched to DDR3 (instead of DDR3L) chips (these ones) and have increased the voltage from 1.35V to 1.5V for VCCDDR. I've dropped the DDR3 size down to 64MBx32.

Quote from: asmi on January 01, 2023, 07:36:38 pm

You will also need a DDR3 termination regulator. Something like MP20075 - it provides both Vref (midrail) for memory and FPGA, and can sink/source up to 3 amps of current, which seems more than enough for our design.

Also do we have a full list of all peripherals which will be connected to FPGA? Some of those might require Vccio other than 3.3 V. You also might want to design-in some prexibility to change Vccio of some IO banks.

I'll work on this tomorrow. For some reason I had some spare time and fell into the power schematic, so did some work on that while it seemed like a good idea.

Quote from: Someone on January 01, 2023, 10:35:01 pm

Dont split Vccint and Vccbram unless you are doing suspend/low power shenanigans.

They're only split in terms of the net names, so I can ensure sufficient decoupling caps for the BRAM connections, but I'm going to get rid of the link/separate net name and just box off the BRAM decoupling caps on the schematic. I'll still know what to do with them on the PCB design.

Quote from: Someone on January 01, 2023, 10:35:01 pm

But as with some of the comments already, STOP. Put a representative design into the FPGA tools before doing any of the hardware thinking/layout/planning. Then you will have a reasonable power estimate, some validated pin assignments, and a clocking tree.

I guess because this is the bit I find easiest. You're right though, I need to get more of an overview before rushing headlong into the design.

Quote from: Someone on January 01, 2023, 10:35:01 pm

Once the power estimates are in, LDOs will probably do half the rails if you do care about space and cost. The cost in $$ and footprints will likely end up in the capacitors as much as the power supplies themselves so there is pressure to design the distribution and tuning well. If you are trying to squeeze space and cost, eliminate that sequencing chip and either use enables or make the power rails all come on (and off) together (permissible solution that Xilinx supports).

Ah okay, I didn't know how the Xilinx FPGA would be with just letting all the horses out of the gate at once. If it's not an issue, I'll scrub the sequencer then - thanks.

Someone · « **Reply #62 on:** January 01, 2023, 11:03:58 pm »

Quote from: nockieboy on January 01, 2023, 10:43:31 pm

Quote from: Someone on January 01, 2023, 10:35:01 pm
Dont split Vccint and Vccbram unless you are doing suspend/low power shenanigans.
They're only split in terms of the net names, so I can ensure sufficient decoupling caps for the BRAM connections, but I'm going to get rid of the link/separate net name and just box off the BRAM decoupling caps on the schematic. I'll still know what to do with them on the PCB design.

Either you are going for a cost/space constrained design, and they are the same power rail with caps all sharing the work so need to be designed for the task.

or you are just copying the UG483 recommended design because there is no pressure on size/cost

Quote from: Xilinx UG483

Decoupling methods other than those presented in these tables can be used, but the decoupling network should be designed to meet or exceed the performance of the simple decoupling networks presented here. The impedance of the alternate network must be less than or equal to that of the recommended network across frequencies from 100 KHz to 10 MHz.
Because device capacitance requirements vary with CLB and I/O utilization, PCB decoupling guidelines are provided on a per-device basis based on very high utilization so as to cover a majority of use cases. Resource usage consists (in part) of:

Xilinx provide pretty good documentation, but you have to actually read it. Very few systems are going to push the power distribution up to or beyond their baseline example.

....noting the proposed MPM3833C solution is already sitting outside their worked/reference example, so you're immediately off into needing to do the calculations yourself.

asmi · « **Reply #63 on:** January 01, 2023, 11:22:52 pm »

Quote from: nockieboy on January 01, 2023, 10:43:31 pm

Okay, cool - will do a redesign to incorporate the increased power demand, and had forgotten to include the GTP supplies. That MPM3683 is not a cheap part though, especially considering it only has one output. Would this be a suitable alternative? SIC402ACD-T1-GE3 - single 10A output.

That part requires an insane number of external components to get it going. I looked at it in the past and I think I even bought some of them to try out.

Quote from: nockieboy on January 01, 2023, 10:43:31 pm

Are the MPM3833s okay for the other rails though? If it's just VCCINT that needs more juice, then the price of the MPM3683 isn't an issue.

I thought I was pretty clear that I offered MPM3683-7 only for Vccint rail. MPM3833s should be fine for other rails.

Quote from: nockieboy on January 01, 2023, 10:43:31 pm

Okay, I've switched to DDR3 (instead of DDR3L) chips (these ones) and have increased the voltage from 1.35V to 1.5V for VCCDDR. I've dropped the DDR3 size down to 64MBx32.

All DDR3L chips can work in DDR3 mode (at 1.5 V). So no need to switch devices. I would plan for 4Gb parts - or even 8Gb ones for the layout, but you can always place smaller part - these devices are fully backwards compatible.

Quote from: nockieboy on January 01, 2023, 10:43:31 pm

I'll work on this tomorrow. For some reason I had some spare time and fell into the power schematic, so did some work on that while it seemed like a good idea.

That's a good start, but we will need to make a strategic decision on the connectors used for the module, because those connectors' pin power rating will determine how many pins we will need to allocate for the main power in. High speed connectors typically have a power rating of 2 A per pin assuming no two adjacent pins are used for power. Since the module can consume over 10 W of power (+ we will need to leave a provision for a fan, which can consume up to 1 W), we will need to plan out power budget.
But before we can settle on connectors, we will need to figure out how many pins we will need and what's going to be a pinout. This is why I keep asking about listing everything that is to be connected. We will probably want to route out as many IO as we can, so maybe it will be better to just see which IOs are going to be available.

Quote from: nockieboy on January 01, 2023, 10:43:31 pm

They're only split in terms of the net names, so I can ensure sufficient decoupling caps for the BRAM connections, but I'm going to get rid of the link/separate net name and just box off the BRAM decoupling caps on the schematic. I'll still know what to do with them on the PCB design.

Your schematic has a zero Ohm resistor dividing those lines. No need for that crap. Just do a schematic like I did in mine (see a pdf I linked few posts above), and include a table from Xilinx user guide which lists what caps and of what nominal will we need. See UG483, chapter 2, section PCB Decoupling Capacitors -> Recommended PCB Capacitors per Device. I typically use 0201 caps for 0.47 uF (as these need to be places inside of via field), 0402s for 4.7 uF, larger caps size is not very important as they can be placed anywhere, so I usually use whatever cap I happen to have - like 0805 or even 1210.

Quote from: nockieboy on January 01, 2023, 10:43:31 pm

Ah okay, I didn't know how the Xilinx FPGA would be with just letting all the horses out of the gate at once. If it's not an issue, I'll scrub the sequencer then - thanks.

Yeah, I never bothered with rail sequencing unless I had a system controller MCU for some other reason which I could just reuse for sequencing, and never had any problems with it.

asmi · « **Reply #64 on:** January 01, 2023, 11:24:22 pm »

Quote from: Someone on January 01, 2023, 11:03:58 pm

....noting the proposed MPM3833C solution is already sitting outside their worked/reference example, so you're immediately off into needing to do the calculations yourself.

Not sure I understand what do you mean here?

nockieboy · « **Reply #65 on:** January 02, 2023, 10:58:36 am »

Quote from: Someone on January 01, 2023, 11:03:58 pm

Quote from: nockieboy on January 01, 2023, 10:43:31 pm
Quote from: Someone on January 01, 2023, 10:35:01 pm
Dont split Vccint and Vccbram unless you are doing suspend/low power shenanigans.
They're only split in terms of the net names, so I can ensure sufficient decoupling caps for the BRAM connections, but I'm going to get rid of the link/separate net name and just box off the BRAM decoupling caps on the schematic. I'll still know what to do with them on the PCB design.
Either you are going for a cost/space constrained design, and they are the same power rail with caps all sharing the work so need to be designed for the task.

or you are just copying the UG483 recommended design because there is no pressure on size/cost

Initially the VCCBRAM net didn't exist, I just ran VCCINT to the VCCBRAM inputs on the FPGA. However, whilst copying the UG483 recommended decoupling layout, I realised there are a few caps specifically for the VCCBRAM inputs, so I thought I'd just split the net via a 0R resistor so that I could see more clearly which caps were for VCCBRAM and which weren't by their net names instead of just by their annotation. There are other ways I could have done this that wouldn't be as clear on the PCB layout, like just 'boxing off' a few VCCINT caps on the decoupling schematic for the VCCBRAM connectors, but I went for this one knowing I'd be likely to change it in the future. Didn't think it would be that much of an issue to be honest.

I don't know if you read the TL;DR at the start of this thread, but I'm not a professional PCB designer or electronics engineer. I'm a hobbyist, at best. When I talk about pressure on size/cost, I'm talking about saving money on big things like chip choice, not worrying about saving a few thousandths of a dollar on a cap here and there. I'm literally going to be building (hopefully) one, maybe up to four (to use all the FPGAs I've bought) of these things. I'll be making the designs public, so I suppose there's that, but I'm not designing for mass production and so perhaps I'm not looking at this from the same angle you are with your career/expertise behind you.

Quote from: Someone on January 01, 2023, 11:03:58 pm

Xilinx provide pretty good documentation, but you have to actually read it. Very few systems are going to push the power distribution up to or beyond their baseline example.

I am. I have UG470, UG471, UG472, UG473, UG474, UG475, UG482, UG483 and UG586 (amongst others) open in tabs on my browser. I'm trying to read them as I go and use them as references - I've already used their decoupling design, for example.

Quote from: asmi on January 01, 2023, 11:22:52 pm

I thought I was pretty clear that I offered MPM3683-7 only for Vccint rail. MPM3833s should be fine for other rails.

Sorry; I either misunderstood, misread or outright forgot.

Quote from: asmi on January 01, 2023, 11:22:52 pm

Quote from: nockieboy on January 01, 2023, 10:43:31 pm
Okay, I've switched to DDR3 (instead of DDR3L) chips (these ones) and have increased the voltage from 1.35V to 1.5V for VCCDDR. I've dropped the DDR3 size down to 64MBx32.
All DDR3L chips can work in DDR3 mode (at 1.5 V). So no need to switch devices. I would plan for 4Gb parts - or even 8Gb ones for the layout, but you can always place smaller part - these devices are fully backwards compatible.

Okay, I'll stick with the original choice then. I've really got to stop doing this late at night. I saw 'Vdd=Vddq=1.35V (1.283-1.45V)' and missed the bit underneath that said 'backward compatible to 1.5V'.

Quote from: asmi on January 01, 2023, 11:22:52 pm

But before we can settle on connectors, we will need to figure out how many pins we will need and what's going to be a pinout. This is why I keep asking about listing everything that is to be connected. We will probably want to route out as many IO as we can, so maybe it will be better to just see which IOs are going to be available.

This is the downside to designing a core card that can fit onto any carrier board design - you don't really know what's going to be on the carrier board, so I guess we have to consider worst-case scenarios and try to route as many IOs as possible.

That aside, here's what I really need it to do - everything else is a bonus:

Core board:

2xDDR3 via 32-bit bus @ 400MHz.
JTAG programming port.
1 or 2 IO LEDs for testing the core board without a carrier board.
All power supplies except 5V, which will be provided by carrier board or external source if no carrier connected.

Carrier board (specific to my use-case) - nothing too demanding really other than HDMI:

HDMI TX capable of 1080p @ 60Hz or better.
Audio output via headphone/line out.
USB-A host PHY (CH559 or something similar - open to suggestions - serial/SPI/I2C interface to FPGA) for keyboard/mouse.
USB FPGA programmer.
5V power supply via USB or DC power connector.
80-pin interface to 5V host bus via level converters.
Various buttons/LEDs/RGB LEDs.

I'm going to need at least 60 I/Os for the carrier board for the host interface, then there's the HDMI, JTAG, audio codec, USB, generic IO lines.

Quote from: asmi on January 01, 2023, 11:22:52 pm

Your schematic has a zero Ohm resistor dividing those lines. No need for that crap. Just do a schematic like I did in mine (see a pdf I linked few posts above), and include a table from Xilinx user guide which lists what caps and of what nominal will we need. See UG483, chapter 2, section PCB Decoupling Capacitors -> Recommended PCB Capacitors per Device. I typically use 0201 caps for 0.47 uF (as these need to be places inside of via field), 0402s for 4.7 uF, larger caps size is not very important as they can be placed anywhere, so I usually use whatever cap I happen to have - like 0805 or even 1210.

As I mentioned above, I'm using UG483 already. I've never used 0201 caps and am a little concerned a butterfly will flap in a garden twenty miles away and they'll all just disappear like a cloud of nanites.

Might have to have some discussion around how you solder those things, as 0402 feels like my limit at the moment. And I need to get a microscope!

Quote from: asmi on January 01, 2023, 11:22:52 pm

Quote from: nockieboy on January 01, 2023, 10:43:31 pm
Ah okay, I didn't know how the Xilinx FPGA would be with just letting all the horses out of the gate at once. If it's not an issue, I'll scrub the sequencer then - thanks.
Yeah, I never bothered with rail sequencing unless I had a system controller MCU for some other reason which I could just reuse for sequencing, and never had any problems with it.

Excellent. The sequencer is gone, like all those 0201s as a mosquito flies by.

asmi · « **Reply #66 on:** January 02, 2023, 04:44:08 pm »

Quote from: nockieboy on January 02, 2023, 10:58:36 am

Core board:

2xDDR3 via 32-bit bus @ 400MHz.
JTAG programming port.
1 or 2 IO LEDs for testing the core board without a carrier board.
All power supplies except 5V, which will be provided by carrier board or external source if no carrier connected.

How are you going to test a module without a carrier if it is to be powered by the carrier? Also having JTAG both on a module and on a carrier is not a great idea because of stubs which may negatively affect performance of JTAG.
You also forgot to mention QSPI flash for a bitstream. For A100T you will need at least 64 Mb, preferably more to leave some space for application data and code storage.

Quote from: nockieboy on January 02, 2023, 10:58:36 am

Carrier board (specific to my use-case) - nothing too demanding really other than HDMI:

HDMI TX capable of 1080p @ 60Hz or better.

You need to be more specific if you only want 1080p@60 or "better", because the former can be implemented using regular IO pins, while for the latter we will need to use GTPs and some additional level shifting circuitry. I would recommend sticking to the former because of simplicity - unless you actually want that "better"

Quote from: nockieboy on January 02, 2023, 10:58:36 am

Audio output via headphone/line out.

Ummm, these are analog signals, therefore you will need some sort of audio codec or DAC (if only output is desired). Do you have any specific one in mind, or it doesn't matter?

Quote from: nockieboy on January 02, 2023, 10:58:36 am

USB-A host PHY (CH559 or something similar - open to suggestions - serial/SPI/I2C interface to FPGA) for keyboard/mouse.

I'm not familiar with that device, traditionally USB ULPI PHY devices are used for USB 2.0, something like this one: https://www.microchip.com/en-us/product/USB3300

Quote from: nockieboy on January 02, 2023, 10:58:36 am

USB FPGA programmer.

This of course it up to you, but I'm not really convinced that placing programming circuitry on every board is such a great idea, as opposed to just buying a single programmer and connecting it to all FPGA boards as neccessary. Also see my note above regarding having JTAGs both on a module and a carrier.

Quote from: nockieboy on January 02, 2023, 10:58:36 am

5V power supply via USB or DC power connector.
80-pin interface to 5V host bus via level converters.
Various buttons/LEDs/RGB LEDs.

I'm going to need at least 60 I/Os for the carrier board for the host interface, then there's the HDMI, JTAG, audio codec, USB, generic IO lines.

What about Ethernet? Wouldn't you want your computer to be able to talk to outside world (or maybe just to other devices on your local network)?
What about some sort of USB-UART port to talk to PC for debugging purposes?
What about some kind of storage? SD card interface, or eMMC device, or perhaps something else?

As for other stuff, typically to maximize utility of a module, all IO pins of a bank are routed length-matched such that high-speed peripherals can be connected if desired. And for example for 1G Ethernet PHY you will certainly need that kind of routing. I'm not really sure we will be able to length-match entire banks with 4 routing layers, but we need to at least attempt to do so.

Quote from: nockieboy on January 02, 2023, 10:58:36 am

As I mentioned above, I'm using UG483 already. I've never used 0201 caps and am a little concerned a butterfly will flap in a garden twenty miles away and they'll all just disappear like a cloud of nanites. Might have to have some discussion around how you solder those things, as 0402 feels like my limit at the moment. And I need to get a microscope!

These caps are so cheap that you might as well buy like a 1000 of them so that even if you only manage to solder each fifth cap with other four teleported into another dimension, you will still have plenty of them remaining after all is said and done. You will need about 30 to 40 caps for a single board. I've bought a full reel of 15k parts a year or so ago, so I don't think I will run out of them any time soon

Now, with the new via-in-a-pad tech available at JLCPCB we can try using 0402s with a plated over pads instead of 0201s, but I've never actually used that tech before so it's a terra incognita for me and I'm not sure if it's going to work or not. But if you want to, you are free to give it a try to see if it's going to work or not. I mean, it should work, but I just don't have personal experience with that.

nockieboy · « **Reply #67 on:** January 02, 2023, 07:54:31 pm »

Quote from: asmi on January 02, 2023, 04:44:08 pm

How are you going to test a module without a carrier if it is to be powered by the carrier?

That's the point of the H3 header. You can supply 5V to the core board directly, without a carrier board. Trying to be flexible.

Quote from: asmi on January 02, 2023, 04:44:08 pm

Also having JTAG both on a module and on a carrier is not a great idea because of stubs which may negatively affect performance of JTAG.

Ah okay. Well, blame the Chinese dev board schematics for that one. Seemed like a good idea to have both - the core board is going to need a JTAG interface for testing.

Quote from: asmi on January 02, 2023, 04:44:08 pm

You also forgot to mention QSPI flash for a bitstream. For A100T you will need at least 64 Mb, preferably more to leave some space for application data and code storage.

Yes, I did - I forgot a couple of things actually which you've picked up on. The QSPI chip I've chosen is a W25Q128JVPIQ, its a 128Mb part for plenty of storage, as I think I have a few already anyway.

Quote from: asmi on January 02, 2023, 04:44:08 pm

You need to be more specific if you only want 1080p@60 or "better", because the former can be implemented using regular IO pins, while for the latter we will need to use GTPs and some additional level shifting circuitry. I would recommend sticking to the former because of simplicity - unless you actually want that "better"

1080p@60Hz it is then.

Quote from: asmi on January 02, 2023, 04:44:08 pm

Quote from: nockieboy on January 02, 2023, 10:58:36 am
Audio output via headphone/line out.
Ummm, these are analog signals, therefore you will need some sort of audio codec or DAC (if only output is desired). Do you have any specific one in mind, or it doesn't matter?

I was planning on something like a PCM5101A, I guess - again, unless there's better suggestions.

Quote from: asmi on January 02, 2023, 04:44:08 pm

Quote from: nockieboy on January 02, 2023, 10:58:36 am
USB-A host PHY (CH559 or something similar - open to suggestions - serial/SPI/I2C interface to FPGA) for keyboard/mouse.
I'm not familiar with that device, traditionally USB ULPI PHY devices are used for USB 2.0, something like this one: https://www.microchip.com/en-us/product/USB3300

The CH559 is an odd device, not something I particularly like (there's very little documentation and it's all in Chinese or very poor English), but it's a means to an end. I need a way of being able to connect a USB keyboard and/or mouse to the carrier board and have keycodes or mouse coordinates sent to the FPGA (and forwarded on to the host) as bytes of data. The CH559 does this in one chip - it has an E8051 MCU in it which handles the USB stack and can be programmed in C. I've got one sending out keycode/mouse data via serial, but haven't had the chance to work on it much more.

Quote from: asmi on January 02, 2023, 04:44:08 pm

Quote from: nockieboy on January 02, 2023, 10:58:36 am
USB FPGA programmer.
This of course it up to you, but I'm not really convinced that placing programming circuitry on every board is such a great idea, as opposed to just buying a single programmer and connecting it to all FPGA boards as neccessary. Also see my note above regarding having JTAGs both on a module and a carrier.

I think it'll make my life easier in my particular case - there's a lot of reprogramming going on with the GPU still being developed. I see the core/carrier as more of a development board anyway, so having a programmer built-in would be beneficial instead of having to plug the programmer in all the time. Plus, I imagine people would want to design their own carriers anyway so it would be up to them if they're happy with on- or off-board programming.

The JTAG stub issue is something I was unaware of though, and I don't know how much of an issue that is.

Quote from: asmi on January 02, 2023, 04:44:08 pm

What about Ethernet? Wouldn't you want your computer to be able to talk to outside world (or maybe just to other devices on your local network)?

Not for my uCOM, no. That's not to say that I'm against designing for Ethernet on the carrier board, I don't have to populate those parts I guess and it could be useful if I progress to networking later with a more powerful host computer.

Quote from: asmi on January 02, 2023, 04:44:08 pm

What about some sort of USB-UART port to talk to PC for debugging purposes?
What about some kind of storage? SD card interface, or eMMC device, or perhaps something else?

Yes, USB-UART and SD will be needed on the carrier too. I forgot about them.

Quote from: asmi on January 02, 2023, 04:44:08 pm

As for other stuff, typically to maximize utility of a module, all IO pins of a bank are routed length-matched such that high-speed peripherals can be connected if desired. And for example for 1G Ethernet PHY you will certainly need that kind of routing. I'm not really sure we will be able to length-match entire banks with 4 routing layers, but we need to at least attempt to do so.

I might be coming at the project from the wrong angle then. In my mind, I'm designing the core board first, then worrying about the carrier once the core board is finalised. Perhaps that's wrong, but I figure so long as we best-effort length-match the IOs out to the mezzanine connectors, we're doing the best we can? Presumably we need to route some IOs as differential pairs for the HDMI as well.

Quote from: asmi on January 02, 2023, 04:44:08 pm

These caps are so cheap that you might as well buy like a 1000 of them so that even if you only manage to solder each fifth cap with other four teleported into another dimension, you will still have plenty of them remaining after all is said and done. You will need about 30 to 40 caps for a single board. I've bought a full reel of 15k parts a year or so ago, so I don't think I will run out of them any time soon
Now, with the new via-in-a-pad tech available at JLCPCB we can try using 0402s with a plated over pads instead of 0201s, but I've never actually used that tech before so it's a terra incognita for me and I'm not sure if it's going to work or not. But if you want to, you are free to give it a try to see if it's going to work or not. I mean, it should work, but I just don't have personal experience with that.

I'm more than willing to give the via-in-pad tech a try, especially if that means I can use 0402s instead of nanoparticles.

asmi · « **Reply #68 on:** January 02, 2023, 10:10:48 pm »

Quote from: nockieboy on January 02, 2023, 07:54:31 pm

Ah okay. Well, blame the Chinese dev board schematics for that one. Seemed like a good idea to have both - the core board is going to need a JTAG interface for testing.

It's not that it's not going to work at all, but you might be forced to lower JTAG frequency, which will hinder performance for both programming and if you are going to use any debugging cores.

Quote from: nockieboy on January 02, 2023, 07:54:31 pm

Yes, I did - I forgot a couple of things actually which you've picked up on. The QSPI chip I've chosen is a W25Q128JVPIQ, its a 128Mb part for plenty of storage, as I think I have a few already anyway.

It looks like that part is not listed as supported for programming via Vivado: https://docs.xilinx.com/r/en-US/ug908-vivado-programming-debugging/Artix-7-Configuration-Memory-Devices

Quote from: nockieboy on January 02, 2023, 07:54:31 pm

1080p@60Hz it is then.

Ok, we will need a TPD12S521 as ESD protection and level shifter on the carrier board, but that should be it.

Quote from: nockieboy on January 02, 2023, 07:54:31 pm

I was planning on something like a PCM5101A, I guess - again, unless there's better suggestions.

That's just a DAC, are you sure you don't want a full codec, with both inputs and outputs?

Quote from: nockieboy on January 02, 2023, 07:54:31 pm

The CH559 is an odd device, not something I particularly like (there's very little documentation and it's all in Chinese or very poor English), but it's a means to an end. I need a way of being able to connect a USB keyboard and/or mouse to the carrier board and have keycodes or mouse coordinates sent to the FPGA (and forwarded on to the host) as bytes of data. The CH559 does this in one chip - it has an E8051 MCU in it which handles the USB stack and can be programmed in C. I've got one sending out keycode/mouse data via serial, but haven't had the chance to work on it much more.

Well that is not really a USB interface then. I was thinking about a full-blown USB 2.0 implementation, which is where ULPI PHY are typically used. What kind of connectivity does it require on FPGA side?

Quote from: nockieboy on January 02, 2023, 07:54:31 pm

I think it'll make my life easier in my particular case - there's a lot of reprogramming going on with the GPU still being developed. I see the core/carrier as more of a development board anyway, so having a programmer built-in would be beneficial instead of having to plug the programmer in all the time. Plus, I imagine people would want to design their own carriers anyway so it would be up to them if they're happy with on- or off-board programming.

It's both, sometimes features of a carrier have implications for the module design. Which is why I want to have a list of stuff on a carrier so that we can confirm that at least what you are going to have on your carrier is going to work.

Quote from: nockieboy on January 02, 2023, 07:54:31 pm

The JTAG stub issue is something I was unaware of though, and I don't know how much of an issue that is.

See above. Xilinx recommends against messing with JTAG routing, besides if you are going to have a JTAG port on a module, I don't really see a point of adding another one on a carrier, because you can always use the one on a module even if it's connected to a carrier. Maybe I don't understand something here.

Quote from: nockieboy on January 02, 2023, 07:54:31 pm

Not for my uCOM, no. That's not to say that I'm against designing for Ethernet on the carrier board, I don't have to populate those parts I guess and it could be useful if I progress to networking later with a more powerful host computer.

Ethernet is very popular on FPGA boards because it's fairly easy to get it going for limited connectivity with outside world, and since you can run Linux on a softcore inside FPGA, you can get a full TCP/TP implementation essentially for free. And TCP/TP stack (and USB host stack) is one of the very few good reasons to run Linux inside FPGA. Of course you don't need any of that right now, but what about the future? There is a reason computer networks appeared almost at the same time as computers.

Quote from: nockieboy on January 02, 2023, 07:54:31 pm

I might be coming at the project from the wrong angle then. In my mind, I'm designing the core board first, then worrying about the carrier once the core board is finalised. Perhaps that's wrong, but I figure so long as we best-effort length-match the IOs out to the mezzanine connectors, we're doing the best we can? Presumably we need to route some IOs as differential pairs for the HDMI as well.

I'm looking at these connectors: https://www.samtec.com/products/bse for the module and https://www.samtec.com/products/bte for the carrier. They are available in a bunch pin count options, and Samtec offers samples for free so you might get them without paying a dime. I did get some connector samples from them in the past, though I did later buy some commercially for my customers' boards. These connectors have quite low crosstalk between adjacent pins, and with a low mating height they can go as high as 16 Gbps single ended/10 Gbps differential, which is plenty for both regular IO lines and GTP transceiver lines.
Speaking of GTP lines, since you are not going to use them immediately, I'd suggest to route them to a pair DisplayPort connectors (one for transmit, another one for receive) as they require very few external components (basically a bunch of AC-coupling caps and some ESD protection), and you can use them to connect to other things via that connector, to itself to test loopback scenario, or even to another copy of the same board for some high-speed board-to-board interface.

Quote from: nockieboy on January 02, 2023, 07:54:31 pm

I'm more than willing to give the via-in-pad tech a try, especially if that means I can use 0402s instead of nanoparticles.

Ok, it's your project, so decisions are yours to make. But so are the risks that something goes south

BrianHG · « **Reply #69 on:** January 02, 2023, 11:42:06 pm »

Quote from: asmi on January 02, 2023, 10:10:48 pm

Quote from: nockieboy on January 02, 2023, 07:54:31 pm
1080p@60Hz it is then.
Ok, we will need a TPD12S521 as ESD protection and level shifter on the carrier board, but that should be it.

The NXP IC, PTN3366BSMP is a superior selection. ESD and multi-level 'amplified' buffered level shifter with 3GHz EQ to help 4K30Hz support in one 90cent IC. You do not need to worry about the VCCIO on the bank you are using for the HDMI.

The IC will translate any received differential signal from 0.7v to 3.3v into a HDMI compliant open-drain current-steering differential output signals.

It also has active DDC level shifting.

asmi · « **Reply #70 on:** January 03, 2023, 12:47:37 am »

Quote from: BrianHG on January 02, 2023, 11:42:06 pm

The NXP IC, PTN3366BSMP is a superior selection. ESD and multi-level 'amplified' buffered level shifter with 3GHz EQ to help 4K30Hz support in one 90cent IC. You do not need to worry about the VCCIO on the bank you are using for the HDMI.

The IC will translate any received differential signal from 0.7v to 3.3v into a HDMI compliant open-drain current-steering differential output signals.

It also has active DDC level shifting.

It's a very poor choice for Artix because LVDS requires Vccio of 2.5 V, which makes the rest of the bank IOs pretty useless as most other standards tend to use either 3.3 V or 1.8 V. The only differential standard supported at Vccio of 3.3 V is TMDS, which is what HDMI uses. That is why most HDMI implementations just use FPGA pins directly without any level shifting or other shenannigans.

BrianHG · « **Reply #71 on:** January 03, 2023, 01:14:37 am »

Quote from: asmi on January 03, 2023, 12:47:37 am

Quote from: BrianHG on January 02, 2023, 11:42:06 pm
The NXP IC, PTN3366BSMP is a superior selection. ESD and multi-level 'amplified' buffered level shifter with 3GHz EQ to help 4K30Hz support in one 90cent IC. You do not need to worry about the VCCIO on the bank you are using for the HDMI.

The IC will translate any received differential signal from 0.7v to 3.3v into a HDMI compliant open-drain current-steering differential output signals.

It also has active DDC level shifting.
It's a very poor choice for Artix because LVDS requires Vccio of 2.5 V, which makes the rest of the bank IOs pretty useless as most other standards tend to use either 3.3 V or 1.8 V. The only differential standard supported at Vccio of 3.3 V is TMDS, which is what HDMI uses. That is why most HDMI implementations just use FPGA pins directly without any level shifting or other shenannigans.

The IC is an analog comparator amplifier. It only requires a 3.3v supply. Any signal you feed the differential inputs from 0.100v through 3.3v will be converted to HDMI levels. Any IO bank on any IO voltage will work. You can use the DDR3 IO banks, a 2.5v IO bank, a 3.3v IO bank, a 1.2v IO bank, a 0.7v IO bank and they will all work the same. It's inputs are cap (on PCB) AC internally terminated and biased.

asmi · « **Reply #72 on:** January 03, 2023, 01:34:08 am »

Quote from: BrianHG on January 03, 2023, 01:14:37 am

The IC is an analog comparator amplifier. It only requires a 3.3v supply. Any signal you feed the differential inputs from 0.100v through 3.3v will be converted to HDMI levels. Any IO bank on any IO voltage will work. You can use the DDR3 IO banks, a 2.5v IO bank, a 3.3v IO bank, a 1.2v IO bank, a 0.7v IO bank and they will all work the same. It's inputs are cap (on PCB) AC internally terminated and biased.

What's the point in doing a conversion if you can do it directly instead? Simplicity is a virtue in my book. Especially since I know for fact that direct output works, while I've never seen that device used on any 7 series board I came across.

nockieboy · « **Reply #73 on:** January 03, 2023, 06:55:55 pm »

Quote from: asmi on January 02, 2023, 10:10:48 pm

Quote from: nockieboy on January 02, 2023, 07:54:31 pm
Yes, I did - I forgot a couple of things actually which you've picked up on. The QSPI chip I've chosen is a W25Q128JVPIQ, its a 128Mb part for plenty of storage, as I think I have a few already anyway.
It looks like that part is not listed as supported for programming via Vivado: https://docs.xilinx.com/r/en-US/ug908-vivado-programming-debugging/Artix-7-Configuration-Memory-Devices

Good spot. Okay, I've changed the part to this one: S25FL128LAGNFI013. Hopefully that will be okay? What do you use normally?

Quote from: asmi on January 02, 2023, 10:10:48 pm

Quote from: nockieboy on January 02, 2023, 07:54:31 pm
I was planning on something like a PCM5101A, I guess - again, unless there's better suggestions.
That's just a DAC, are you sure you don't want a full codec, with both inputs and outputs?

Okay, maybe a TLV320AIC3100IRHBR would be a better choice? I don't know anything about these devices unfortunately, other than the DECA board uses one of these which is quite expensive for something I don't really need (I'll be running audio through the HDMI). However, all these chips seem to require is an SPI or I2S bus, so as long as we ensure there's provision for at least one in the core board connections, we should be good.

Quote from: asmi on January 02, 2023, 10:10:48 pm

Quote from: nockieboy on January 02, 2023, 07:54:31 pm
The CH559 is an odd device, not something I particularly like (there's very little documentation and it's all in Chinese or very poor English), but it's a means to an end. I need a way of being able to connect a USB keyboard and/or mouse to the carrier board and have keycodes or mouse coordinates sent to the FPGA (and forwarded on to the host) as bytes of data. The CH559 does this in one chip - it has an E8051 MCU in it which handles the USB stack and can be programmed in C. I've got one sending out keycode/mouse data via serial, but haven't had the chance to work on it much more.
Well that is not really a USB interface then. I was thinking about a full-blown USB 2.0 implementation, which is where ULPI PHY are typically used. What kind of connectivity does it require on FPGA side?

Serial definitely, or possibly an SPI connection (if I can set aside the time to set one up on the device and test it). A USB3300 PHY would be easy enough to include I guess, we just need to ensure we've got 12 I/Os we can use for a ULPI interface.

Quote from: asmi on January 02, 2023, 10:10:48 pm

See above. Xilinx recommends against messing with JTAG routing, besides if you are going to have a JTAG port on a module, I don't really see a point of adding another one on a carrier, because you can always use the one on a module even if it's connected to a carrier. Maybe I don't understand something here.

Eh. I guess it'll reduce the BOM cost as well. So we'll just stick with a JTAG port on the core card.

Quote from: asmi on January 02, 2023, 10:10:48 pm

Ethernet is very popular on FPGA boards because it's fairly easy to get it going for limited connectivity with outside world, and since you can run Linux on a softcore inside FPGA, you can get a full TCP/TP implementation essentially for free. And TCP/TP stack (and USB host stack) is one of the very few good reasons to run Linux inside FPGA. Of course you don't need any of that right now, but what about the future? There is a reason computer networks appeared almost at the same time as computers.

Yeah, that's a fair point. I know nothing about Ethernet really, so I'll be heavily guided by what I can find in existing dev board schematics. If you have any experience with a particular Ethernet PHY, let me know which device you'd prefer.

Quote from: asmi on January 02, 2023, 10:10:48 pm

I'm looking at these connectors: https://www.samtec.com/products/bse for the module and https://www.samtec.com/products/bte for the carrier. They are available in a bunch pin count options, and Samtec offers samples for free so you might get them without paying a dime. I did get some connector samples from them in the past, though I did later buy some commercially for my customers' boards. These connectors have quite low crosstalk between adjacent pins, and with a low mating height they can go as high as 16 Gbps single ended/10 Gbps differential, which is plenty for both regular IO lines and GTP transceiver lines.

Those connectors look just fine to my inexperienced eye, but they're not cheap (around £20 for a pair of 80-pin connectors). How many are we considering for a core board? I'm thinking 4 x 80-pin connections, but that's a ball-park top-of-my-head estimate.

How do you go about getting free samples? Are they okay giving out freebies to hobbyists who are unlikely to make a purchase?

Quote from: asmi on January 02, 2023, 10:10:48 pm

Speaking of GTP lines, since you are not going to use them immediately, I'd suggest to route them to a pair DisplayPort connectors (one for transmit, another one for receive) as they require very few external components (basically a bunch of AC-coupling caps and some ESD protection), and you can use them to connect to other things via that connector, to itself to test loopback scenario, or even to another copy of the same board for some high-speed board-to-board interface.

Okay, so two DisplayPort connectors for the GTP lines too.

Quote from: asmi on January 02, 2023, 10:10:48 pm

Quote from: nockieboy on January 02, 2023, 07:54:31 pm
1080p@60Hz it is then.
Ok, we will need a TPD12S521 as ESD protection and level shifter on the carrier board, but that should be it.

Quote from: asmi on January 03, 2023, 01:34:08 am

Quote from: BrianHG on January 03, 2023, 01:14:37 am
The IC is an analog comparator amplifier. It only requires a 3.3v supply. Any signal you feed the differential inputs from 0.100v through 3.3v will be converted to HDMI levels. Any IO bank on any IO voltage will work. You can use the DDR3 IO banks, a 2.5v IO bank, a 3.3v IO bank, a 1.2v IO bank, a 0.7v IO bank and they will all work the same. It's inputs are cap (on PCB) AC internally terminated and biased.
What's the point in doing a conversion if you can do it directly instead? Simplicity is a virtue in my book. Especially since I know for fact that direct output works, while I've never seen that device used on any 7 series board I came across.

I've been using the PTN3366 on the previous GPU card with the Cyclone IV and I can see why BrianHG is recommending its use, but if you're sure we can directly drive HDMI from the Artix-7 then reducing the part count and simplifying the layout has to be a positive?

asmi · « **Reply #74 on:** January 03, 2023, 08:11:39 pm »

Quote from: nockieboy on January 03, 2023, 06:55:55 pm

Good spot. Okay, I've changed the part to this one: S25FL128LAGNFI013. Hopefully that will be okay? What do you use normally?

I've bought a bunch of S25FL256LAGBHI030 and S25FS128SAGBHV200's for cheap few years back, so that's what I use. They are in 1 mm pitch BGA package (which was a reason I picked them - along with the price) because there are footprint-compatible options all the way up to 1Gb, so I figured it will be convenient for me in case I ever want to have more storage. But any part listed in the list I linked should be fine. Just watch for the voltage level - some of parts listed there are 1.8 V, which would require powering bank 14 with that voltage, with all implications for other pins in the same bank.

Quote from: nockieboy on January 03, 2023, 06:55:55 pm

Okay, maybe a TLV320AIC3100IRHBR would be a better choice? I don't know anything about these devices unfortunately, other than the DECA board uses one of these which is quite expensive for something I don't really need (I'll be running audio through the HDMI). However, all these chips seem to require is an SPI or I2S bus, so as long as we ensure there's provision for at least one in the core board connections, we should be good.

Unfortunately here you are going to lose me, as I'm a digital guy and not very good with analog stuff. That said, I2S is a fairly standard bus (I think Xilinx even provides free IP cores for I2S transmitter and I2S receiver), so any one you can get your hands on should be good to go, provided it's a got a half-decent datasheet/appnote showing how analog part is supposed to be designed which you could reproduce on your board.

Quote from: nockieboy on January 03, 2023, 06:55:55 pm

Serial definitely, or possibly an SPI connection (if I can set aside the time to set one up on the device and test it). A USB3300 PHY would be easy enough to include I guess, we just need to ensure we've got 12 I/Os we can use for a ULPI interface.

There is no reason not to include both

There is a plenty of free IO pins in this FPGA.

Quote from: nockieboy on January 03, 2023, 06:55:55 pm

Eh. I guess it'll reduce the BOM cost as well. So we'll just stick with a JTAG port on the core card.

I actually bought a tag-connect pogo-pin connector recently, and I want to give it a try along with my Digilent HS3 programming cable to save some more even on a connector (it only requires a footprint). BTW if you want to save a few bucks, you can buy a Xilinx programmer on Aliexpress for a few bucks and they seem to work fairly OK. I prefer using geniune Digilent programming cable, it's about $55 so not a big deal considering it's a one-time investment.

Quote from: nockieboy on January 03, 2023, 06:55:55 pm

Yeah, that's a fair point. I know nothing about Ethernet really, so I'll be heavily guided by what I can find in existing dev board schematics. If you have any experience with a particular Ethernet PHY, let me know which device you'd prefer.

There is a bit of a shortage of PHY devices right now, so pretty much any device that can talk RGMII and you can get your hands on should be good. There are some gotchas with older RGMII v1.x devices which required adding a clock delay loop on a PCB, but with RGMII v2.0 there is now an option to add that delay internally, so we will need to check a datasheet for whichever device you end up choosing if it requires PCB clock delay or not. I've managed to snag a few of 88E1510-A0-NNB2C000's a couple of months ago, these are fairly expensive (though still in stock at Mouser right now), so if you find something at a more reasonable price, it should still be good.

Quote from: nockieboy on January 03, 2023, 06:55:55 pm

Those connectors look just fine to my inexperienced eye, but they're not cheap (around £20 for a pair of 80-pin connectors). How many are we considering for a core board? I'm thinking 4 x 80-pin connections, but that's a ball-park top-of-my-head estimate.

I think a pair of 120 pin connectors (240 pins total) should be plenty for our needs. Each connector is 53 mm long, so something like 5 x 6 cm PCB should be good. I looked up parts, and it looks like those two are in stock and in reserve (so they usually ship quickly): https://www.samtec.com/products/bse-060-01-f-d-a-tr $7 for qty 1, $6.47 for qty 10, mating part is this one: https://www.samtec.com/products/bte-060-01-f-d-a-k-tr $7.3 for qty 1, $6.75 for qty 10

Quote from: nockieboy on January 03, 2023, 06:55:55 pm

How do you go about getting free samples? Are they okay giving out freebies to hobbyists who are unlikely to make a purchase?

It never hurts to ask, they sent me a whole bunch of samples over the years, I usually asked for 10 mating pairs, and they never denied, even though I never actually ordered anything commercially from them directly (though I had bought their parts via Digikey and Mouser after I used free samples for prototypes and confirmed their suitability, so I guess their goal of converting me into a paying customer has been achieved).
Besides, you never know - maybe you will end up selling these modules in the future? If they are generic enough (and I'm trying to "steer" design such that they would be) and are actually available, you might get quite a few customers

Quote from: nockieboy on January 03, 2023, 06:55:55 pm

I've been using the PTN3366 on the previous GPU card with the Cyclone IV and I can see why BrianHG is recommending its use, but if you're sure we can directly drive HDMI from the Artix-7 then reducing the part count and simplifying the layout has to be a positive?

I remember that this solution has been devised because Cyclone-4/5 and MAX10 don't support TMDS standard directly. And since Artix-7 do support that standard directly, it seems like an unneccessary complication to me. Also there are plenty of Artix-7 boards which do implement it directly, so it's not just my personal experience with doing so. I personally had no problems with such implementation driving 3 meters long HDMI cable to a 1080p monitor, so I figured it's robust enough for most practical cases.
-----
BTW Artix-7 devices I've ordered have been shipped this morning, DHL says they should arrive by this Friday.

We will also need a bunch of clock chips and some crystals, but we will see into it once we have better idea of the whole system. At the very least we will need a main system clock on a module, and a 135 MHz LVDS clock for GTP/DisplayPort, and some crystals for Ethernet PHY (typically 25 MHz) and a USB 2.0 ULPI PHY (typically 24 MHz) - all of those are on a carrier.

nockieboy · « **Reply #75 on:** January 04, 2023, 12:10:34 pm »

Quote from: asmi on January 03, 2023, 08:11:39 pm

I actually bought a tag-connect pogo-pin connector recently, and I want to give it a try along with my Digilent HS3 programming cable to save some more even on a connector (it only requires a footprint). BTW if you want to save a few bucks, you can buy a Xilinx programmer on Aliexpress for a few bucks and they seem to work fairly OK. I prefer using geniune Digilent programming cable, it's about $55 so not a big deal considering it's a one-time investment.

I've often thought about using a pogo-pin connector instead of having to find room for a physical connector on the board, so would be happy to give this a try. Do you have any links to suitable products?

I have a cheap Xilinx programmer - got one a couple of years ago for the Spartan6 board I'd gotten, but have never used it.

Quote from: asmi on January 03, 2023, 08:11:39 pm

There is a bit of a shortage of PHY devices right now, so pretty much any device that can talk RGMII and you can get your hands on should be good. There are some gotchas with older RGMII v1.x devices which required adding a clock delay loop on a PCB, but with RGMII v2.0 there is now an option to add that delay internally, so we will need to check a datasheet for whichever device you end up choosing if it requires PCB clock delay or not. I've managed to snag a few of 88E1510-A0-NNB2C000's a couple of months ago, these are fairly expensive (though still in stock at Mouser right now), so if you find something at a more reasonable price, it should still be good.

Okay, well don't forget I'm designing for the core board primarily so I'm not overly concerned about lack of parts availability for features that aren't on my 'must have' list for the carrier board. So long as the design for the core board doesn't exclude these 'nice to have' items later, then that's good enough for me.

Quote from: asmi on January 03, 2023, 08:11:39 pm

I think a pair of 120 pin connectors (240 pins total) should be plenty for our needs. Each connector is 53 mm long, so something like 5 x 6 cm PCB should be good. I looked up parts, and it looks like those two are in stock and in reserve (so they usually ship quickly): https://www.samtec.com/products/bse-060-01-f-d-a-tr $7 for qty 1, $6.47 for qty 10, mating part is this one: https://www.samtec.com/products/bte-060-01-f-d-a-k-tr $7.3 for qty 1, $6.75 for qty 10

That reduces the cost to something more reasonable. I'll see if I can get some samples.

Quote from: asmi on January 03, 2023, 08:11:39 pm

BTW Artix-7 devices I've ordered have been shipped this morning, DHL says they should arrive by this Friday.

I haven't heard anything yet, so will keep patiently waiting.

Quote from: asmi on January 03, 2023, 08:11:39 pm

We will also need a bunch of clock chips and some crystals, but we will see into it once we have better idea of the whole system. At the very least we will need a main system clock on a module, and a 135 MHz LVDS clock for GTP/DisplayPort, and some crystals for Ethernet PHY (typically 25 MHz) and a USB 2.0 ULPI PHY (typically 24 MHz) - all of those are on a carrier.

I've never used them before, but it looks like I'm going to need to design for differential clocks rather than single-ended ones for extra stability, especially with the DDR3 controller. How many clocks do I need for the FPGA? I've run through Vivado earlier and created a DDR3 controller using MIG (for the first time ever - details below) - it's talking about a dedicated clock for the DDR3?

So, I've run MIG to make a start on setting up a DDR3 controller simulation; these are the settings I used:

Code: [Select]

Vivado Project Options:
   Target Device                   : xc7a100t-fgg484
   Speed Grade                     : -2
   HDL                             : verilog
   Synthesis Tool                  : VIVADO

MIG Output Options:
   Module Name                     : mig_7series_0
   No of Controllers               : 1
   Selected Compatible Device(s)   : xc7a35t-fgg484, xc7a50t-fgg484, xc7a75t-fgg484, xc7a15t-fgg484

FPGA Options:
   System Clock Type               : Differential
   Reference Clock Type            : Differential
   Debug Port                      : OFF
   Internal Vref                   : enabled
   IO Power Reduction              : ON
   XADC instantiation in MIG       : Enabled

Extended FPGA Options:
   DCI for DQ,DQS/DQS#,DM          : enabled
   Internal Termination (HR Banks) : 50 Ohms

/*******************************************************/
/*                  Controller 0                       */
/*******************************************************/

Controller Options :
   Memory                        : DDR3_SDRAM
   Interface                     : NATIVE
   Design Clock Frequency        : 2500 ps (400.00 MHz)
   Phy to Controller Clock Ratio : 4:1
   Input Clock Period            : 2499 ps
   CLKFBOUT_MULT (PLL)           : 2
   DIVCLK_DIVIDE (PLL)           : 1
   VCC_AUX IO                    : 1.8V
   Memory Type                   : Components
   Memory Part                   : MT41K256M16XX-107
   Equivalent Part(s)            : --
   Data Width                    : 32
   ECC                           : Disabled
   Data Mask                     : enabled
   ORDERING                      : Strict

AXI Parameters :
   Data Width                    : 256
   Arbitration Scheme            : RD_PRI_REG
   Narrow Burst Support          : 0
   ID Width                      : 4

Memory Options:
   Burst Length (MR0[1:0])          : 8 - Fixed
   Read Burst Type (MR0[3])         : Sequential
   CAS Latency (MR0[6:4])           : 6
   Output Drive Strength (MR1[5,1]) : RZQ/7
   Controller CS option             : Disable
   Rtt_NOM - ODT (MR1[9,6,2])       : RZQ/4
   Rtt_WR - Dynamic ODT (MR2[10:9]) : Dynamic ODT off
   Memory Address Mapping           : BANK_ROW_COLUMN

Bank Selections:
	Bank: 34
		Byte Group T0:	Address/Ctrl-0
		Byte Group T1:	Address/Ctrl-1
		Byte Group T2:	Address/Ctrl-2
	Bank: 35
		Byte Group T0:	DQ[0-7]
		Byte Group T1:	DQ[8-15]
		Byte Group T2:	DQ[16-23]
		Byte Group T3:	DQ[24-31]

Reference_Clock: 
	SignalName: clk_ref_p/n
		PadLocation: T5/U5(CC_P/N)  Bank: 34

System_Clock: 
	SignalName: sys_clk_p/n
		PadLocation: R4/T4(CC_P/N)  Bank: 34

System_Control: 
	SignalName: sys_rst
		PadLocation: No connect  Bank: Select Bank
	SignalName: init_calib_complete
		PadLocation: No connect  Bank: Select Bank
	SignalName: tg_compare_error
		PadLocation: No connect  Bank: Select Bank

Is that right or have I made any mistakes? I wasn't sure about the bank choices - it was defaulting to assigning the controls/address/data to Banks 14-16, but that's no good as it's sharing with the configuration pins, so I've moved all the DDR3-related IO to Banks 34 & 35.

As mentioned above, not sure about reference_clock and system_clock. I presume system_clock is the main FPGA clock, which could be running at a different frequency to the DDR3? Is the reference clock supposed to be 400MHz or 1/4 of the system_clock?

asmi · « **Reply #76 on:** January 04, 2023, 03:20:10 pm »

Quote from: nockieboy on January 04, 2023, 12:10:34 pm

I've often thought about using a pogo-pin connector instead of having to find room for a physical connector on the board, so would be happy to give this a try. Do you have any links to suitable products?

I've bought mine here: www.tag-connect.com For you I would recommend to get a 10 pin legged cable + adapter for Xilinx programmer: https://www.tag-connect.com/debugger-cable-selection-installation-instructions/xilinx-platform-cable-usb#85_171_146:~:text=Xilinx%202mm%20to%2010%20Pin%20Plug%2Dof%2DNails%E2%84%A2%20%2D%20With%20Legs specifically, "Xilinx 2mm to 10 Pin Plug-of-Nails™ - With Legs". BTW some of these things are carried by the likes of Digikey, so you might want to check if they got some in stock locally, because shipping from that company directly from US will probably cost you more than from DK et al which tend to have local warehouses. Just make sure you double check parts numbers you are ordering - this company has got a ton of adapters for different programmers/debuggers, so it's easy to order a wrong one by mistake.

Quote from: nockieboy on January 04, 2023, 12:10:34 pm

That reduces the cost to something more reasonable. I'll see if I can get some samples.

If my experience is anything to go by, you should have no problems. At no time did they ever asked me a million of questions which other companies typically ask (like your project, volume, dates, etc.) - they just shipped what I asked for with no fuss (and even paid for express shipping from US!), which made me a loyal customer of theirs (and an easy recommendation for others) because I know I can rely on them for both samples and for actual productions parts should the likes of DK decide for some reason to not stock the part I'm after. Of course I don't abuse this service by requesting all their inventory or anything like that, but I tried requesting samples from one of their competitors, which asked me to fill out a 3 screens-long form with a metric ton of questions, to which I just said "screw it" and bought samples myself because my time and sanity are worth more to me than these samples were.

Quote from: nockieboy on January 04, 2023, 12:10:34 pm

I haven't heard anything yet, so will keep patiently waiting.

Check your order status on their website. They never sent me a shipping notification, though later in a day DHL has sent me a request to pay sales tax and their customs fee.

Quote from: nockieboy on January 04, 2023, 12:10:34 pm

I've never used them before, but it looks like I'm going to need to design for differential clocks rather than single-ended ones for extra stability, especially with the DDR3 controller. How many clocks do I need for the FPGA? I've run through Vivado earlier and created a DDR3 controller using MIG (for the first time ever - details below) - it's talking about a dedicated clock for the DDR3?

There are many ways to skin this cat. I often used a single-ended 100 MHz base clock and a PLL to generate both clocks required for MIG. But this kind of "wastes" one extra MCMM because MIG itself uses MCMM. In this case you select the frequency for that clock to something like 200 MHz and select a "no buffer" option in the MIG.
Alternative often used solution is to use 200 MHz LVDS differential clock selected right in the MIG and select a "Use system clock" option for the reference clock (I will explain below what it's for). The advantage of this approach is that you only use a single MCMM.

Quote from: nockieboy on January 04, 2023, 12:10:34 pm

So, I've run MIG to make a start on setting up a DDR3 controller simulation; these are the settings I used:

Is that right or have I made any mistakes? I wasn't sure about the bank choices - it was defaulting to assigning the controls/address/data to Banks 14-16, but that's no good as it's sharing with the configuration pins, so I've moved all the DDR3-related IO to Banks 34 & 35.

See, it wasn't that bad, was it?

Like I said above, I would select "5000 ps (200 MHz)" in the "Input Clock Period" drop down, on the next page System Clock: "Differential" (or "No Buffer" if you want more flexibility on which pin(s) your system clock will be), Reference Clock: "Use System Clock" (this option will only appear if you set your input clock period to 200 MHz).
As for pinout, I would do it like this:

Code: [Select]

	Bank: 34
		Byte Group T0:	DQ[0-7]
		Byte Group T1:	DQ[8-15]
		Byte Group T2:	DQ[16-23]
		Byte Group T3:	DQ[24-31]
	Bank: 35
		Byte Group T1:	Address/Ctrl-2
		Byte Group T2:	Address/Ctrl-1
		Byte Group T3:	Address/Ctrl-0

But it's just a preliminary pinout to give you some high-level idea, once you actually have a layout, you would go "Fixed Pin Out" route in the wizard and specify each pin assignment explicitly.

Quote from: nockieboy on January 04, 2023, 12:10:34 pm

As mentioned above, not sure about reference_clock and system_clock. I presume system_clock is the main FPGA clock, which could be running at a different frequency to the DDR3? Is the reference clock supposed to be 400MHz or 1/4 of the system_clock?

System Clock is the one which will be used to derive the actual memory interface clock, while Reference Clock is used to drive special hardware block - "Delay Controller", which is used by delay blocks as time reference, that clock has to have a fixed frequency of 200 Mhz (in some cases and for some devices it can also be 300 or 400 Mhz). MIG outputs ui_clk, which is the one your HDL has to use to interact with the controller, it is either 1/2 or 1/4 of a memory frequency.

nockieboy · « **Reply #77 on:** January 04, 2023, 06:59:40 pm »

Quote from: asmi on January 03, 2023, 08:11:39 pm

I think a pair of 120 pin connectors (240 pins total) should be plenty for our needs. Each connector is 53 mm long, so something like 5 x 6 cm PCB should be good. I looked up parts, and it looks like those two are in stock and in reserve (so they usually ship quickly): https://www.samtec.com/products/bse-060-01-f-d-a-tr $7 for qty 1, $6.47 for qty 10, mating part is this one: https://www.samtec.com/products/bte-060-01-f-d-a-k-tr $7.3 for qty 1, $6.75 for qty 10

Just popping back to this post for a sec. I've had a look on the Samtec website at the two connectors you've linked, but the first one (BSE-060-01-F-D-A-TR) looks like it's a 60-pin part? Am a little confused as the symbol imports into EasyEDA with two sub-components, each with 60-pins (the 2nd part you linked imports as a single 120-pin connector symbol), so it could well be a 120-pin connector, but I just wanted to check with you that these are definitely both 120-pin connectors and a matching pair?

EDIT: I've checked the footprints and they've both got 120 pins and are the same width, so I guess it's just an odd difference in the way the symbols are represented for both parts.

EDIT 2: So, male connectors on the core or carrier board? Does it matter?

asmi · « **Reply #78 on:** January 04, 2023, 07:54:21 pm »

Quote from: nockieboy on January 04, 2023, 06:59:40 pm

Just popping back to this post for a sec. I've had a look on the Samtec website at the two connectors you've linked, but the first one (BSE-060-01-F-D-A-TR) looks like it's a 60-pin part? Am a little confused as the symbol imports into EasyEDA with two sub-components, each with 60-pins (the 2nd part you linked imports as a single 120-pin connector symbol), so it could well be a 120-pin connector, but I just wanted to check with you that these are definitely both 120-pin connectors and a matching pair?

EDIT: I've checked the footprints and they've both got 120 pins and are the same width, so I guess it's just an odd difference in the way the symbols are represented for both parts.
EDIT 2: So, male connectors on the core or carrier board? Does it matter?

The number in the part # shows the number of pins per row. I've tripped over that many times already, so I typically just count the number of pins on their 3D model

As for symbols and footprints, I typically make my own just to be sure they are correct (and use their 3D model to make sure it's correct, and if I have any doubts I would print a footprint on a piece of paper and physically place connector on top to see if it's OK), so not really sure what theirs look like.
As for which one goes where, I don't think it matters. I noticed on this module BSE goes on a module and BTE on a carrier, so let's make it the same - presumably those guys knew what they were doing.

While you are working on connectors footprints, please be SUPER careful about which pin number mates to which pin number on the mating connector. I've screwed up this an untold number of times, forgetting that connectors mate with a mirrored pin numbers. So try to number pins on both footprints such that pin 1 of one side mates to pin 1 of the mating side, otherwise you are going to have a disaster on your hands.

nockieboy · « **Reply #79 on:** January 05, 2023, 07:05:00 pm »

Quote from: asmi on January 04, 2023, 03:20:10 pm

I've bought mine here: www.tag-connect.com For you I would recommend to get a 10 pin legged cable + adapter for Xilinx programmer: https://www.tag-connect.com/debugger-cable-selection-installation-instructions/xilinx-platform-cable-usb#85_171_146:~:text=Xilinx%202mm%20to%2010%20Pin%20Plug%2Dof%2DNails%E2%84%A2%20%2D%20With%20Legs specifically, "Xilinx 2mm to 10 Pin Plug-of-Nails™ - With Legs". BTW some of these things are carried by the likes of Digikey, so you might want to check if they got some in stock locally, because shipping from that company directly from US will probably cost you more than from DK et al which tend to have local warehouses. Just make sure you double check parts numbers you are ordering - this company has got a ton of adapters for different programmers/debuggers, so it's easy to order a wrong one by mistake.

I know it's a one-off cost, but I'm a little reticent about having to spend $50+ on what is basically a bit of ribbon cable, some pins and springs. Also, thinking about it, these are intended to be dev boards so we should make it as easy as possible to program them. I think I'm probably going to stick with an SMD or TH pin header if space allows for the JTAG port.

Quote from: asmi on January 04, 2023, 03:20:10 pm

If my experience is anything to go by, you should have no problems. At no time did they ever asked me a million of questions which other companies typically ask (like your project, volume, dates, etc.) - they just shipped what I asked for with no fuss (and even paid for express shipping from US!), which made me a loyal customer of theirs (and an easy recommendation for others) because I know I can rely on them for both samples and for actual productions parts should the likes of DK decide for some reason to not stock the part I'm after. Of course I don't abuse this service by requesting all their inventory or anything like that, but I tried requesting samples from one of their competitors, which asked me to fill out a 3 screens-long form with a metric ton of questions, to which I just said "screw it" and bought samples myself because my time and sanity are worth more to me than these samples were.

Well, I just sent them a polite e-mail this morning with a couple of sentences outlining my project and asking if they were willing to supply some test components. Two of each connector are now in the post, which will allow me to make one carrier/core board.

Quote from: asmi on January 04, 2023, 03:20:10 pm

Check your order status on their website. They never sent me a shipping notification, though later in a day DHL has sent me a request to pay sales tax and their customs fee.

It's left Hong Kong and is on its way, apparently.

Quote from: asmi on January 04, 2023, 03:20:10 pm

There are many ways to skin this cat. I often used a single-ended 100 MHz base clock and a PLL to generate both clocks required for MIG. But this kind of "wastes" one extra MCMM because MIG itself uses MCMM. In this case you select the frequency for that clock to something like 200 MHz and select a "no buffer" option in the MIG.
Alternative often used solution is to use 200 MHz LVDS differential clock selected right in the MIG and select a "Use system clock" option for the reference clock (I will explain below what it's for). The advantage of this approach is that you only use a single MCMM.

If I'm reading the datasheet correctly, the XC7A100T has 6 of them? That's a couple more PLLs than the Cyclone or MAX10. I'm not sure how significant the system clock speed will be for the GPU - perhaps @BrianHG could give me a steer on valid system clock speeds to use? I seem to recall that a 27MHz clock would fit nicely with the video circuitry, but the MAX10 and Cyclone IV GPUs both ran fine on 50MHz system clocks. I guess it boils down to using a PLL to create the 200MHz clock from a slower system clock, or a slower video clock with a PLL from a 200MHz system clock.

Quote from: asmi on January 04, 2023, 03:20:10 pm

See, it wasn't that bad, was it?

I haven't tried it yet!

Not sure if the project is set up correctly or anything tbh, but one downside to Xilinx is that the project is over 6MB in size, even after zipping, so I haven't included it with this post due to size constraints. At least Intel/Altera projects compress down to hundreds of KB without the bitstreams.

Quote from: asmi on January 04, 2023, 03:20:10 pm

Like I said above, I would select "5000 ps (200 MHz)" in the "Input Clock Period" drop down, on the next page System Clock: "Differential" (or "No Buffer" if you want more flexibility on which pin(s) your system clock will be), Reference Clock: "Use System Clock" (this option will only appear if you set your input clock period to 200 MHz).
As for pinout, I would do it like this:
Code: [Select]
Bank: 34 Byte Group T0: DQ[0-7] Byte Group T1: DQ[8-15] Byte Group T2: DQ[16-23] Byte Group T3: DQ[24-31] Bank: 35 Byte Group T1: Address/Ctrl-2 Byte Group T2: Address/Ctrl-1 Byte Group T3: Address/Ctrl-0But it's just a preliminary pinout to give you some high-level idea, once you actually have a layout, you would go "Fixed Pin Out" route in the wizard and specify each pin assignment explicitly.

I'll take another look at this tomorrow.

Quote from: asmi on January 04, 2023, 03:20:10 pm

The number in the part # shows the number of pins per row. I've tripped over that many times already, so I typically just count the number of pins on their 3D model
As for symbols and footprints, I typically make my own just to be sure they are correct (and use their 3D model to make sure it's correct, and if I have any doubts I would print a footprint on a piece of paper and physically place connector on top to see if it's OK), so not really sure what theirs look like.
As for which one goes where, I don't think it matters. I noticed on this module BSE goes on a module and BTE on a carrier, so let's make it the same - presumably those guys knew what they were doing.

While you are working on connectors footprints, please be SUPER careful about which pin number mates to which pin number on the mating connector. I've screwed up this an untold number of times, forgetting that connectors mate with a mirrored pin numbers. So try to number pins on both footprints such that pin 1 of one side mates to pin 1 of the mating side, otherwise you are going to have a disaster on your hands.

Yes, I'll be checking and double-checking the PCB design when I get that far. I really don't want to mess that up!

Okay, I've attached the latest power supply schematic. It seems like most of the board is going to be taken up with power supplies! Let me know if there's anything obviously wrong. I'm running a 1.0V MGT supply from the VCCINT output - hopefully that isn't criminally insane from an electrical engineer's point of view - it saves me adding yet another power supply chip and supporting discretes...

asmi · « **Reply #80 on:** January 05, 2023, 07:58:28 pm »

Quote from: nockieboy on January 05, 2023, 07:05:00 pm

I know it's a one-off cost, but I'm a little reticent about having to spend $50+ on what is basically a bit of ribbon cable, some pins and springs. Also, thinking about it, these are intended to be dev boards so we should make it as easy as possible to program them. I think I'm probably going to stick with an SMD or TH pin header if space allows for the JTAG port.

That's OK and it's your call. This stuff pays off in a long run, and I know that it will be worth it for me for sure - especially since I plan to use that exact cable for programming STM32 MCUs as well - so two for the price of one (plus one more adapter).

Now, as far as programming headers go, geniune Xilinx programmer (and Digilent HS3 cable) uses 2 mm pitch connector, and some of it's clones retain that too, while others use regular 0.1" one, and yet other clones have both options (this is what I have in my clone), so you might want to check up which one of those do you have so that you will fit the right header. I also preferred using header with a polarized shroud which prevents connecting cable the wrong way around (and likely destroying it in the process because the opposite pins are all grounded). The easiest way to check is to get any 0.1" open header and (with everything unpowered of course) try connecting programmer to it to see if it would fit. If so then you will need a 0.1" pitch connector, otherwise - a 2 mm pitch one.

Quote from: nockieboy on January 05, 2023, 07:05:00 pm

Well, I just sent them a polite e-mail this morning with a couple of sentences outlining my project and asking if they were willing to supply some test components. Two of each connector are now in the post, which will allow me to make one carrier/core board.

Great! See, I told you they are nice people. Though I never had to send them any emails, I always used "Get a Free Sample" buttons on their website and ordered that way. But hey - whichever way works is good in my book.

Quote from: nockieboy on January 05, 2023, 07:05:00 pm

It's left Hong Kong and is on its way, apparently.

Cool!

Quote from: nockieboy on January 05, 2023, 07:05:00 pm

If I'm reading the datasheet correctly, the XC7A100T has 6 of them? That's a couple more PLLs than the Cyclone or MAX10. I'm not sure how significant the system clock speed will be for the GPU - perhaps @BrianHG could give me a steer on valid system clock speeds to use? I seem to recall that a 27MHz clock would fit nicely with the video circuitry, but the MAX10 and Cyclone IV GPUs both ran fine on 50MHz system clocks. I guess it boils down to using a PLL to create the 200MHz clock from a slower system clock, or a slower video clock with a PLL from a 200MHz system clock.

Yea, you are unlikely to run out of MCMMs/PLLs in that device. You will have to use ui_clk output from the MIG to interact with it, but you can add more clocks if you want or need for other parts of your design. Just add a FIFO between clock domain (this device's BRAM blocks can be configured as FIFOs), and make sure you don't under/overrun it.

Quote from: nockieboy on January 05, 2023, 07:05:00 pm

I haven't tried it yet! Not sure if the project is set up correctly or anything tbh, but one downside to Xilinx is that the project is over 6MB in size, even after zipping, so I haven't included it with this post due to size constraints. At least Intel/Altera projects compress down to hundreds of KB without the bitstreams.

There is a File -> Project -> Archive command, and you can even clean up resulting archive somewhat, but MIG generates a lot of stuff (there are actually two designs generated - user design and example design, all in the output). Fortunately you can simply save MIG's *.prj file along with a constraints file and re-generate the whole thing on a receiving end ("Verify Pin Changes and Update Design" branch in the wizard flow).

Quote from: nockieboy on January 05, 2023, 07:05:00 pm

Yes, I'll be checking and double-checking the PCB design when I get that far. I really don't want to mess that up!

Yeah, that's what I thought too, yet I messed it up anyway

Quote from: nockieboy on January 05, 2023, 07:05:00 pm

Okay, I've attached the latest power supply schematic.

No you haven't

Quote from: nockieboy on January 05, 2023, 07:05:00 pm

I'm running a 1.0V MGT supply from the VCCINT output - hopefully that isn't criminally insane from an electrical engineer's point of view - it saves me adding yet another power supply chip and supporting discretes...

According to the datasheet of the MPM3683-7 module, it's supposed to have low enough ripple for powering GTPs (<10 mVpp). Maybe we should include a provision for a simple Pi filter (a bead with two caps on each end of it, for example this one, but any one with DC resistance of < 50 mOhm and a DC current rating of >1 A will do) plus some local capacitance just in case we'll need it, you can simply short out the bead by a zero ohm resistor during initial assembly.

nockieboy · « **Reply #81 on:** January 06, 2023, 09:40:05 am »

Quote from: asmi on January 05, 2023, 07:58:28 pm

Now, as far as programming headers go, geniune Xilinx programmer (and Digilent HS3 cable) uses 2 mm pitch connector, and some of it's clones retain that too, while others use regular 0.1" one, and yet other clones have both options (this is what I have in my clone), so you might want to check up which one of those do you have so that you will fit the right header. I also preferred using header with a polarized shroud which prevents connecting cable the wrong way around (and likely destroying it in the process because the opposite pins are all grounded). The easiest way to check is to get any 0.1" open header and (with everything unpowered of course) try connecting programmer to it to see if it would fit. If so then you will need a 0.1" pitch connector, otherwise - a 2 mm pitch one.

My clone has an adaptor with 3 different connectors and cables - 2x5 pins @2.54mm, 2x7 @2mm and a 2.54mm single-row pin header (for custom connections, I guess).

Quote from: asmi on January 05, 2023, 07:58:28 pm

Quote from: nockieboy on January 05, 2023, 07:05:00 pm
Well, I just sent them a polite e-mail this morning with a couple of sentences outlining my project and asking if they were willing to supply some test components. Two of each connector are now in the post, which will allow me to make one carrier/core board.
Great! See, I told you they are nice people. Though I never had to send them any emails, I always used "Get a Free Sample" buttons on their website and ordered that way. But hey - whichever way works is good in my book.

Yes, very friendly. I could have pushed the 'Get Free Sample' button, but I thought I'd drop them a line to say hello as well. Even after all these years of the internet, I still get a bit excited about talking to people in different time zones/countries.

Quote from: asmi on January 05, 2023, 07:58:28 pm

There is a File -> Project -> Archive command, and you can even clean up resulting archive somewhat, but MIG generates a lot of stuff (there are actually two designs generated - user design and example design, all in the output). Fortunately you can simply save MIG's *.prj file along with a constraints file and re-generate the whole thing on a receiving end ("Verify Pin Changes and Update Design" branch in the wizard flow).

That's handy to know - I'm going to want to share the project with you at some point to verify my results or - more likely - work out what I'm doing wrong.

All I did was create a new blank project, set the target device to XC7A100T-2-FGG484 and ran through MIG. There's no top level entity or anything like that. I guess at this stage I'm just looking for pin assignments for the DDR3, but pretty soon I'm going to need to start integrating BrianHG's DDR3 controller so that I can start simulating it.

Quote from: asmi on January 05, 2023, 07:58:28 pm

Quote from: nockieboy on January 05, 2023, 07:05:00 pm
Okay, I've attached the latest power supply schematic.
No you haven't

Darn it. I've been having trouble with this forum recently - I've noticed a drag'n'drop box has appeared in the attachments section for a new post, which seems to cause more problems for me than it solves. I definitely attached the PDF yesterday, it even said it had uploaded. I'll try again with this post, but instead of drag/drop I'll just select it in the file explorer via the 'Choose file' button.

Quote from: asmi on January 05, 2023, 07:58:28 pm

Quote from: nockieboy on January 05, 2023, 07:05:00 pm
I'm running a 1.0V MGT supply from the VCCINT output - hopefully that isn't criminally insane from an electrical engineer's point of view - it saves me adding yet another power supply chip and supporting discretes...
According to the datasheet of the MPM3683-7 module, it's supposed to have low enough ripple for powering GTPs (<10 mVpp). Maybe we should include a provision for a simple Pi filter (a bead with two caps on each end of it, for example this one, but any one with DC resistance of < 50 mOhm and a DC current rating of >1 A will do) plus some local capacitance just in case we'll need it, you can simply short out the bead by a zero ohm resistor during initial assembly.

Never heard of a Pi filter before, so that's something new to learn about!

Well, hopefully you'll get the attachment this time and can see what I've done so far.

EDIT: Question regarding the power supplies. All of the MPM power chips use a resistor divider to set their output voltage. The datasheets/MPM design software quote values for those resistors, which I have used in the attached schematic. I note that the MCM3683-7 that generates VCCINT and 1V0_MGT uses 2K and 3K resistors (R4 and R12) - is there any reason I can't swap them for 200K and 300K resistors to use the same parts as MPM3833s?

BrianHG · « **Reply #82 on:** January 06, 2023, 10:31:02 am »

In this view of my graphics sub-system: BrianHG_GFX_VGA_Window_System.pdf

Everything encircled in purple runs at the pixel clock rate. (everything else runs at the DDR3 user clock speed) For the Max10, we had a hard limit of 200MHz, so, we chose the industry standard 148.5MHz for authentic 1080p. This limited us to 8 window layers in 1080p. The Artix7 should be able to achieve 297MHz, meaning now you can do 16 window layers in 1080p, or double everything available for 720p as well. My code generates integer divided pixel outputs for running multiple window layers in series in conjunction with the parallel window layers. There is an easy patch to allow division of the series layers across multiple video output, so a 297MHz system could produce 2 1080p video outputs, with 8 video window layers on each, or 4 video outputs at 720p. (This doesn't negate the possibility of including multiple video systems multiplying more video outputs, but now supporting different video modes on each video out.)

Having a 27MHz/54MHz source crystal only means having a perfect divisor for the true ansi 148.5MHz 16:9 modes or 54Mhz/108Mhz/216Mhz for the 4:3 modes. Though, with all the available PLLs, just bridging 2 of them usually means you can generate anything you like from an integer source clock, or just live with a slight imperfect frequency error.

BrianHG · « **Reply #83 on:** January 06, 2023, 10:38:58 am »

For these 3 instantiations in my 'BrianHG_GFX_Video_Line_Buffer.sv':

BrianHG_GFX_Video_Line_Buffer.sv#L521
BrianHG_GFX_Video_Line_Buffer.sv#L827
BrianHG_GFX_Video_Line_Buffer.sv#L998

You will need to create a dummy 'altsyncram' where inside, you tie the important ports and parameters to Xilinx's equivalent dual clock block ram as the write side is on the user DDR3 control clock and the read output is on the video pixel clock. Everything else should be compatible with any vendor's FPGA compiler who supports SystemVerilog. You can ignore the init files, they just help the test-bench visualization results.

I also included ModelSim testbenches for those individual modules for verification.

asmi · « **Reply #84 on:** January 06, 2023, 12:31:18 pm »

Quote from: nockieboy on January 06, 2023, 09:40:05 am

My clone has an adaptor with 3 different connectors and cables - 2x5 pins @2.54mm, 2x7 @2mm and a 2.54mm single-row pin header (for custom connections, I guess).

In this case it's best to stick to 2x7 2mm pin out to remain compatible with official programmers.

Quote from: nockieboy on January 06, 2023, 09:40:05 am

Yes, very friendly. I could have pushed the 'Get Free Sample' button, but I thought I'd drop them a line to say hello as well. Even after all these years of the internet, I still get a bit excited about talking to people in different time zones/countries.

I worked in many large companies, so I always had a problem of receiving too much emails, so I guess I kind of got over this

Quote from: nockieboy on January 06, 2023, 09:40:05 am

That's handy to know - I'm going to want to share the project with you at some point to verify my results or - more likely - work out what I'm doing wrong.

Few versiions ago Vivado did a major change which improved compatibility with source control by moving all generated code out of the ".srcs" folder into ".gen", so that now you can technically only save that folder in the source control. MIG does store it's prj file there, but for some reason it stored constraints file (.xdc) in the ".gen" folder. That file stored pinout selected, so it's required as well. It's in .gen\sources_1\ip\<mig_instance_id>\<mig_instance_id>\user_design\constraints\<mig_instance_id>.xdc

Quote from: nockieboy on January 06, 2023, 09:40:05 am

All I did was create a new blank project, set the target device to XC7A100T-2-FGG484 and ran through MIG. There's no top level entity or anything like that. I guess at this stage I'm just looking for pin assignments for the DDR3, but pretty soon I'm going to need to start integrating BrianHG's DDR3 controller so that I can start simulating it.

Simulating MIG is rather slow (on my PC getting through calibration takes about 3-3.5 minutes to pass through calibration), so I typically use AXI Verification IP to "pretend" to be a DDR RAM for simulation purposes, which is MUCH faster. But that approach will only work if you use MIG with AXI frontend. For UI-only designs I know many people write simple BFMs to simulate MIG's behaviour without a full-on simulation of a controller and memory devices to speed things up. Maybe you should do the same once you figure out to work with it.

Quote from: nockieboy on January 06, 2023, 09:40:05 am

Darn it. I've been having trouble with this forum recently - I've noticed a drag'n'drop box has appeared in the attachments section for a new post, which seems to cause more problems for me than it solves. I definitely attached the PDF yesterday, it even said it had uploaded. I'll try again with this post, but instead of drag/drop I'll just select it in the file explorer via the 'Choose file' button.

I always open the post once posted to verify that all attachments are actually there.

Quote from: nockieboy on January 06, 2023, 09:40:05 am

Never heard of a Pi filter before, so that's something new to learn about! Well, hopefully you'll get the attachment this time and can see what I've done so far.

It's named so because it looks like a Greek "pi" letter - with a bead placed horizontally and caps placed vertically on each side. It's extensively used in RF, but also for noise filtering.

Quote from: nockieboy on January 06, 2023, 09:40:05 am

EDIT: Question regarding the power supplies. All of the MPM power chips use a resistor divider to set their output voltage. The datasheets/MPM design software quote values for those resistors, which I have used in the attached schematic. I note that the MCM3683-7 that generates VCCINT and 1V0_MGT uses 2K and 3K resistors (R4 and R12) - is there any reason I can't swap them for 200K and 300K resistors to use the same parts as MPM3833s?

Those values are always a compromise between noise immunity and efficiency - the higher the values are, the smaller current goes through them, which improves efficiency, but also increases noise sensitivity. So maybe we can meet in the middle and use 20k and 30k in both cases?

I dunnow, I tend to prefer sticking to the recommended values because resistors cost next to nothing.

One important point is to do your very best to follow layout recommendations from datasheets or copying their evaluation boards' layout. A lot in DC-DC converter performance depends on a proper layout, so try sticking to those recommendations as close as possible.

That said, there is something weird going on with that module's layout. In the pinout description, they recommend creating a copper pour for SW nodes, in their devboard's schematics they show these pins as connected, however on a layout per-layer pictures each SW node is isolated. I've sent them an enquiry regarding that, let's see what they say. I guess it's best to do what the layout shows, as opposed to what the text tells to do, but let's see if/what they respond.

----
One more question for you - do you have a code name for that project/module/PCB? I like to come up with cool code names for my projects, for example the board in my signature is "Sparta", so saying "Sparta-50" ("50" stands for FPGA density used) referring to that board is much easier than referring to it as a "beginner-friendly FPGA board with Spartan-7 FPGA and a DDR2 memory"

asmi · « **Reply #85 on:** January 06, 2023, 12:39:58 pm »

Quote from: BrianHG on January 06, 2023, 10:31:02 am

Everything encircled in purple runs at the pixel clock rate.

Oh wow - it's more complicated that I thought!
I got a question though - why do you have to run so much at video pixel clock? Wouldn't it be better to run it at a faster clock and write the resulting image into DDR, and them have a totally asyncronous process which does run at a video pixel clock which would read that image from the framebuffer (in DDR) and output it via HDMI/VGA/DisplayPort/whatever? This is how all video cards work, it allows them to not depend on a current video output mode, and it's also trivial to do double- and triple-buffering to alleviate tearing and other aftifacts which occur when frame generation is running asyncronously to output. It's also trivial to give a CPU direct access to the framebuffer if need be in this approach. But the most important advantage is that you can run your actual frame generation at faster clock to get more stuff done in the same amount of time, and in general seems like a more sensible approach - for example if nothing in the framebuffer changes, whole frame generation pipeline can just idle as opposed to churning out the same frames over and over again. Or I'm missing something here?

BrianHG · « **Reply #86 on:** January 06, 2023, 01:26:31 pm »

Quote from: asmi on January 06, 2023, 12:39:58 pm

Quote from: BrianHG on January 06, 2023, 10:31:02 am
Everything encircled in purple runs at the pixel clock rate.
Oh wow - it's more complicated that I thought!
I got a question though - why do you have to run so much at video pixel clock? Wouldn't it be better to run it at a faster clock and write the resulting image into DDR, and them have a totally asyncronous process which does run at a video pixel clock which would read that image from the framebuffer (in DDR) and output it via HDMI/VGA/DisplayPort/whatever? This is how all video cards work, it allows them to not depend on a current video output mode, and it's also trivial to do double- and triple-buffering to alleviate tearing and other aftifacts which occur when frame generation is running asyncronously to output. It's also trivial to give a CPU direct access to the framebuffer if need be in this approach. But the most important advantage is that you can run your actual frame generation at faster clock to get more stuff done in the same amount of time, and in general seems like a more sensible approach - for example if nothing in the framebuffer changes, whole frame generation pipeline can just idle as opposed to churning out the same frames over and over again. Or I'm missing something here?

It's not a renderer. It is a real-time multi-window, multi-depth, multi-tile memory to a raster display. No tearing. You can double or triple buffer each layer individually if you like, but even that isn't needed with my code for the smoothest silky scrolling.
Set the maximum layers option to 1, turn off the palette and text, and the code enabled would only be 1 line buffer going straight to the video dac output port. Then, you need to software render or simulate all the layers and window blending.

This was the easiest solution for the Z80 to handle multiple display layers without any drawing commands, or waiting for a rendering engine read multiple bitmaps and tile maps, blend them together to construct a buffer frame, when ready, do a frame swap, and redo this every V-sync.

You want text, turn on a font layer. You want backgrounds, turn on a 32bit layer. You want meters to blend in, then out, turn on and off those layers. You want additional overlays, or sprites, turn on and off those layers, set their coordinates, size and other metrics, like which ram location IE: frame for the sprite's animation.

You want 4bit layers mixed with 8bit layers with different palettes for each 8bit layer, plus a few 32bit layers, no problem. All with alpha blended shading.

This was all done in around 18.3kle for 16 layered graphics on an FPGA which can barely maintain 200MHz, yet my core was running it fine at 100MHz, only the pixel clock ran at 148.5Mhz, consuming under 2watts including the DDR3 controller and DVI transmitter.

asmi · « **Reply #87 on:** January 06, 2023, 01:53:10 pm »

Quote from: BrianHG on January 06, 2023, 01:26:31 pm

It's not a renderer. It is a real-time multi-window, multi-depth, multi-tile memory to a raster display. No tearing. You can double or triple buffer each layer individually if you like, but even that isn't needed with my code for the smoothest silky scrolling.
Set the maximum layers option to 1, turn off the palette and text, and the code enabled would only be 1 line buffer going straight to the video dac output port. Then, you need to software render or simulate all the layers and window blending.

This was the easiest solution for the Z80 to handle multiple display layers without any drawing commands, or waiting for a rendering engine read multiple bitmaps and tile maps, blend them together to construct a buffer frame, when ready, do a frame swap, and redo this every V-sync.

You want text, turn on a font layer. You want backgrounds, turn on a 32bit layer. You want meters to blend in, then out, turn on and off those layers. You want additional overlays, or sprites, turn on and off those layers, set their coordinates, size and other metrics, like which ram location IE: frame for the sprite's animation.

You want 4bit layers mixed with 8bit layers with different palettes for each 8bit layer, plus a few 32bit layers, no problem. All with alpha blended shading.

This was all done in around 18.3kle for 16 layered graphics on an FPGA which can barely maintain 200MHz, yet my core was running it fine at 100MHz, only the pixel clock ran at 148.5Mhz, consuming under 2watts including the DDR3 controller and DVI transmitter.

With all due respect, but none of that answers my question. Why can't you simply save the output of your pipeline into a memory buffer instead of streaming it to the output? That's how every single video/2D/3D rendering engine I've ever came across works, hence my surprise that this one do not. I can see lots of advantages of such decoupling, and the only disadvantage I can think of is increased memory bandwidth requirement, which is going to be rectified with this new board as it's going to essentially have double the bandwidth of existing board, and we can increase it even further if neccessary by adding a second controller with 16- or 32-bit data bus (though at the expense of IO pins available for other purposes).

BrianHG · « **Reply #88 on:** January 06, 2023, 02:16:59 pm »

You are correct that with a few extra gates, it would be possible to render the output back into memory.

Currently, with a single 16bit DDR3 running at 800mhz, we got enough to produce a dual 32bit 1080p image plus some extra spare bandwidth for CPU drawing.

With 2x DDR3, with the current setup, we can either render to ram 2 dual 32bit images while showing a back buffer 32bit image in real-time, or in real-time show 4 full sized real-time mixed 1080p 32bit layers.

This is why I was pushing for a SODIMM 64bit module instead of 2x16bit DDR3 chips. The bandwidth increase would have been so high that adapting my code to render any X number of windows in a back buffer while dumbly showing a double or triple buffered output. Also, another plus is than we can expand the system to something like 256 windows, and unlike my DDR3 DECA multi-window demo in my DDR3 thread with the 16 mixed graphics windows, we would now be able to draw all 256 windows even on a 1080p output, but, if all the windows are full screen, there would just be a frame rate drop.

Yes, another change is that when rendering back into memory with my code, you can get rid of the dual clock nature, though, that's it. Everything else remains the same. All we have done is add a programmable frame writing pointer and our final display buffer will now be stuck in 32bit mode.

Also, we now have enough registers to change my integer scaling for each window to a fractional bi-linear scaler for each window effectively making any res up and down you like in each window instead of current the simplex 1x,2x,3x,4x,5x... The up/down bilinear scaler would be great for rendering internally at 4x res, then outputting at 1/4 mode for nice soft dithered edges.

asmi · « **Reply #89 on:** January 06, 2023, 03:12:14 pm »

Quote from: BrianHG on January 06, 2023, 02:16:59 pm

Currently, with a single 16bit DDR3 running at 800mhz

Are you sure you're running it at 800 MHz? I thought it was only 400 MHz.

Quote from: BrianHG on January 06, 2023, 02:16:59 pm

This is why I was pushing for a SODIMM 64bit module instead of 2x16bit DDR3 chips.

There are a couple of problems with implementing SODIMM, with the second one being the most important:
1. It's going to make a module very large. SODIMM socket typically has a length of over 70 mm, this is going to drive up the size of the module itself and concequently the price of PCB manufacturing for both module and a carrier.
2. For a device and package we've chosen (and managed to find in stock at a reasonable price) - and really any Artix-7 device aside from the top-level A200T's in 676 or 1156 ball packages - implementing it will require some shenannigans with level shifters for QSPI flash memory, because it uses regular IO pins from bank 14 for data lines. This is something that I know theoretically possible to do, but I've never actually done this, and I don't want to lead nockieboy down that garden path having not walked it myself first so that I can be confident that it will actually work. This is why I've been offering adding a second 16-bit or 32-bit controller instead. It would eat a lot of IO pins, leaving only about 80 for everything else (plus maybe 30 or so can be recovered from banks used for DDR3 if he is willing to add a metric ton of 1.5 V <-> 3.3 V (or even 5 V) voltage level converters, not sure if that's going to be enough for all peripherals.

There is another way - use Kintex-7 part instead - like K70 or K160 - they allow running DDR3 interface at up to 933 MHz (1866 MT/s) depending on a package so giving you massive bandwidth even without increasing interface width, but these part are expensive - at least double to quadruple the price of A100T's we were able to source.

nockieboy · « **Reply #90 on:** January 06, 2023, 05:16:16 pm »

Quote from: BrianHG on January 06, 2023, 02:16:59 pm

This is why I was pushing for a SODIMM 64bit module instead of 2x16bit DDR3 chips.

I obviously missed the significance of that and didn't look into using SODIMM at all. Correct me if I'm wrong, but a 64-bit SODIMM module would just require a single SODIMM slot to be soldered to the board? Even so, they're 200-pin+ connectors - isn't that going to eat IO resources and leave little for anything else?

As we're designing from scratch, what would be your preferred memory setup? As asmi has pointed out, I don't think a SODIMM module is going to be an option (as much as I like the idea of not having to solder memory chips!). Are two 16-bit DDR3's going to enough of an upgrade, or can we go further?

Quote from: asmi on January 06, 2023, 12:31:18 pm

Those values are always a compromise between noise immunity and efficiency - the higher the values are, the smaller current goes through them, which improves efficiency, but also increases noise sensitivity. So maybe we can meet in the middle and use 20k and 30k in both cases? I dunnow, I tend to prefer sticking to the recommended values because resistors cost next to nothing.

It's no problem, I was just wondering if I could reduce the BOM and make it less likely for errors to creep in during assembly with less different resistor values.

Quote from: asmi on January 06, 2023, 12:31:18 pm

One important point is to do your very best to follow layout recommendations from datasheets or copying their evaluation boards' layout. A lot in DC-DC converter performance depends on a proper layout, so try sticking to those recommendations as close as possible.

When I get to PCB design, I'll be following the recommended layouts as closely as possible.

Quote from: asmi on January 06, 2023, 12:31:18 pm

That said, there is something weird going on with that module's layout. In the pinout description, they recommend creating a copper pour for SW nodes, in their devboard's schematics they show these pins as connected, however on a layout per-layer pictures each SW node is isolated. I've sent them an enquiry regarding that, let's see what they say. I guess it's best to do what the layout shows, as opposed to what the text tells to do, but let's see if/what they respond.

It does seem odd to me that they have all those pins and they're not supposed to be connected to anything.

Quote from: asmi on January 06, 2023, 12:31:18 pm

One more question for you - do you have a code name for that project/module/PCB? I like to come up with cool code names for my projects, for example the board in my signature is "Sparta", so saying "Sparta-50" ("50" stands for FPGA density used) referring to that board is much easier than referring to it as a "beginner-friendly FPGA board with Spartan-7 FPGA and a DDR2 memory"

Haha, yes I do. I'm calling it the XCAT-100. It's just the FPGA part name with the 7 removed and one letter moved up from the end to make it (almost) an English word. I can design a logo with a cat's head and an X later.

asmi · « **Reply #91 on:** January 06, 2023, 07:10:55 pm »

Quote from: nockieboy on January 06, 2023, 05:16:16 pm

I obviously missed the significance of that and didn't look into using SODIMM at all. Correct me if I'm wrong, but a 64-bit SODIMM module would just require a single SODIMM slot to be soldered to the board? Even so, they're 200-pin+ connectors - isn't that going to eat IO resources and leave little for anything else?

As we're designing from scratch, what would be your preferred memory setup? As asmi has pointed out, I don't think a SODIMM module is going to be an option (as much as I like the idea of not having to solder memory chips!). Are two 16-bit DDR3's going to enough of an upgrade, or can we go further?

Maybe we can settle for a sort of a hybrid approach - use 32 bit interface for layers and composition, and then have a dedicated interface for framebuffers? In that case 16-bit interface should be enough for 2 parallel streams (one to write the frame, another one to read a frame and output it to HDMI), and we can implement it in a single IO bank, leaving about 130 or so IO pins (plus whatever we can recover from 32-bit interface banks if we go that route).

Quote from: nockieboy on January 06, 2023, 05:16:16 pm

Haha, yes I do. I'm calling it the XCAT-100. It's just the FPGA part name with the 7 removed and one letter moved up from the end to make it (almost) an English word. I can design a logo with a cat's head and an X later.

Cool! I will refer to it like that from that moment on. Maybe it's time to create a repo on github or something?

-----
I've received my devices today, as was promised by DHL. They came in a sealed bags with dessicant and moisture indicator. So far I've resisted urge to open them up, but I suspect I will eventually succumb to temptation

Good thing that I have an oven where I can bake these parts before reflow, and the air at home is quite dry (~35% humidity) so it shouldn't be that big of a deal...

nockieboy · « **Reply #92 on:** January 06, 2023, 07:58:37 pm »

What values would be appropriate for the Pi filter?

asmi · « **Reply #93 on:** January 06, 2023, 08:07:35 pm »

Quote from: nockieboy on January 06, 2023, 07:58:37 pm

What values would be appropriate for the Pi filter?

0.47 uF on the input side (before the bead), and 4.7 uF on the output side. The point of this filter is to provide a shortcut for high-frequency noise from Vccint pins before it reaches GTP's power pins, and at the same time provide enough capacitance after the filter to sustain the voltage rail within spec once noise subsides. There are going to be two more 0.1 uF caps and another 4.7 uF cap on the MGTAVCC side as per recommendations in UG482, all of that should be enough.
What's the L1 on that schematic?

nockieboy · « **Reply #94 on:** January 06, 2023, 11:18:05 pm »

Quote from: asmi on January 06, 2023, 08:07:35 pm

0.47 uF on the input side (before the bead), and 4.7 uF on the output side. The point of this filter is to provide a shortcut for high-frequency noise from Vccint pins before it reaches GTP's power pins, and at the same time provide enough capacitance after the filter to sustain the voltage rail within spec once noise subsides. There are going to be two more 0.1 uF caps and another 4.7 uF cap on the MGTAVCC side as per recommendations in UG482, all of that should be enough.

These extra caps after the pi filter - these are from Table 5-7 on page 230 of UG482? I take it these are different to the 0.1uF caps mentioned in Table 5-11 on page 233?

Quote from: asmi on January 06, 2023, 08:07:35 pm

What's the L1 on that schematic?

That's there to isolate VCCINT noise from 1V0_MGT? It was there on reference schematics and I've used the same method previously on the Cyclone GPU PCB to filter noise out of the line. I guess the pi filter does a better job and it can be removed?

BrianHG · « **Reply #95 on:** January 06, 2023, 11:31:21 pm »

Quote from: asmi on January 06, 2023, 07:10:55 pm

Quote from: nockieboy on January 06, 2023, 05:16:16 pm
I obviously missed the significance of that and didn't look into using SODIMM at all. Correct me if I'm wrong, but a 64-bit SODIMM module would just require a single SODIMM slot to be soldered to the board? Even so, they're 200-pin+ connectors - isn't that going to eat IO resources and leave little for anything else?

As we're designing from scratch, what would be your preferred memory setup? As asmi has pointed out, I don't think a SODIMM module is going to be an option (as much as I like the idea of not having to solder memory chips!). Are two 16-bit DDR3's going to enough of an upgrade, or can we go further?
Maybe we can settle for a sort of a hybrid approach - use 32 bit interface for layers and composition, and then have a dedicated interface for framebuffers? In that case 16-bit interface should be enough for 2 parallel streams (one to write the frame, another one to read a frame and output it to HDMI), and we can implement it in a single IO bank, leaving about 130 or so IO pins (plus whatever we can recover from 32-bit interface banks if we go that route).

What? How does handicapping you bandwidth, or forcibly evenly dividing it between 2 different controllers save IOs, or logic elements, or improve global memory design access VS 1 controller at double speed?

asmi · « **Reply #96 on:** January 07, 2023, 12:11:28 am »

Quote from: nockieboy on January 06, 2023, 11:18:05 pm

These extra caps after the pi filter - these are from Table 5-7 on page 230 of UG482?

Yep

Quote from: nockieboy on January 06, 2023, 11:18:05 pm

I take it these are different to the 0.1uF caps mentioned in Table 5-11 on page 233?

Nope, they are the very same caps mentioned in the table 5-7.

Quote from: nockieboy on January 06, 2023, 11:18:05 pm

That's there to isolate VCCINT noise from 1V0_MGT? It was there on reference schematics and I've used the same method previously on the Cyclone GPU PCB to filter noise out of the line. I guess the pi filter does a better job and it can be removed?

That's what pi filter is for. No need for additional beads.

asmi · « **Reply #97 on:** January 07, 2023, 12:12:36 am »

Quote from: BrianHG on January 06, 2023, 11:31:21 pm

What? How does handicapping you bandwidth, or forcibly evenly dividing it between 2 different controllers save IOs, or logic elements, or improve global memory design access VS 1 controller at double speed?

I don't understand this question. What exactly is being handicapped and how?

BrianHG · « **Reply #98 on:** January 07, 2023, 12:23:51 am »

Say I decode /playback a video. On that output, I want to perform multiple Lanczos scaling steps with both X&Y convolutional filtering, plus a few Z frames de-noise filtering before output and analysis, maybe even motion adaptive de-interlacing. Your 2 separate controller cut my image processing bandwidth in half as it will sit in one memory controller before being exported to the frame buffer while a portion of that frame buffer's bandwidth may never be used. Especially if my image size is a video larger in resolution than the available memory on the frame buffer side where it may be mandatory to pre-process everything in one controller and only export the visible area to the other controller's ram.

asmi · « **Reply #99 on:** January 07, 2023, 01:20:19 am »

Quote from: BrianHG on January 07, 2023, 12:23:51 am

Say I decode /playback a video. On that output, I want to perform multiple Lanczos scaling steps with both X&Y convolutional filtering, plus a few Z frames de-noise filtering before output and analysis, maybe even motion adaptive de-interlacing. Your 2 separate controller cut my image processing bandwidth in half as it will sit in one memory controller before being exported to the frame buffer while a portion of that frame buffer's bandwidth may never be used. Especially if my image size is a video larger in resolution than the available memory on the frame buffer side where it may be mandatory to pre-process everything in one controller and only export the visible area to the other controller's ram.

Still don't understand it... Right now you don't store output frames at all, instead streaming them directly to the output. I suggest instead of doing so you save those frames into a dedicated memory, which would then be read by a separate process that would stream it to the output. This way displaying stuff from the framebuffer will be decoupled from generating new frames, and once a new frame is ready, you would switch a pointer to it so that the next screen scan would begin displaying that new frame, all while frame generator will work on creating yet another new frame. So it's essentially a triple buffering - one frame is the one being displayed, another one is the frame will be be displayed once current one is fully out, and a third one is the one being generated. Full 1080p frame at 32 bpp takes a bit over 8 MBytes of memory (or just below 9 MBytes if you align lines to a 2K boundary so making it essentially a 2048x1080 array), if we are going to use 4Gbit DDR3 memory devices, it will provide enough capacity for many-many frames, and since there will be some spare bandwidth, it's possible to use it for other non-bandwidth intensive things as well - like application storage, drive I/O, etc.
None of that is going to even touch a 32 bit controller, which will be dedicated solely to the needs of a frame generator. Which means your existing code will get 2x input memory bandwidth. That's why I don't understand what's being handicapped here and how.

That said, if I were to design a rendering engine, I would approach it from a completely different angle and made it more like modern video cards - with unified memory controllers, unified shader model, universal compute cores, and only the very minimal amount of fixed function logic - for stuff like primitive rasterization and fragment (pixel) generation, which is a fixed function even on the most modern GPUs. Companies behind modern GPUs have burned numerous billions of dollars into researching the best ways to do these things, so I would simply follow their lead as opposed to reinventing the wheel. But it's not my project, and my project would not have contained a pre-historic CPU which cripples entire system's performance, so what do I know?

So, you guys please make up your mind on how do you want to proceed, and we'll go with whatever you decide.

BrianHG · « **Reply #100 on:** January 07, 2023, 02:19:04 am »

I don't understand what you are getting at.

You keep telling us to make 2 ram controllers where I say make 1 wide double speed controller.

Now, if I want to do a 2D convolution filter on an image. Forget about displaying it. We aren't there yet as we may want to use the result as a new texture for 3D modeling, or for image analysis.

How long will it take on a 400MHz 2x16 DDR3 VS a 400MHz 4x16 DDR3.

Say I have another 3 processing steps before I'm ready to render the frame buffer.

free_electron · « **Reply #101 on:** January 07, 2023, 03:08:31 am »

BEFORE you do ANYTHING : get the exact layerstack ( core and prepreg thicknesses and material dk's) Get the dk values for the frequency you will be running at.

So many times people start designing this kind of stuff and then find out all the impedance calcualtions are ou the door because they ran off an invalid specified stack.

asmi · « **Reply #102 on:** January 07, 2023, 03:18:59 am »

Quote from: BrianHG on January 07, 2023, 02:19:04 am

I don't understand what you are getting at.

You keep telling us to make 2 ram controllers where I say make 1 wide double speed controller.

Now, if I want to do a 2D convolution filter on an image. Forget about displaying it. We aren't there yet as we may want to use the result as a new texture for 3D modeling, or for image analysis.

How long will it take on a 400MHz 2x16 DDR3 VS a 400MHz 4x16 DDR3.

Say I have another 3 processing steps before I'm ready to render the frame buffer.

I thought I already explained why we aren't going to implement a 64 bit DDR3 interface, unless nokieboy is willing to assume a risk of implementing something which I personally has never done, and in general very few people done (I couldn't find any board which would have this implemented). Two controllers solution is a compromise I'm offering because that's something I'm fairly certain will work.

asmi · « **Reply #103 on:** January 07, 2023, 03:21:34 am »

Quote from: free_electron on January 07, 2023, 03:08:31 am

BEFORE you do ANYTHING : get the exact layerstack ( core and prepreg thicknesses and material dk's) Get the dk values for the frequency you will be running at.

We are going to user JLCPCB for manufacturing, and they have published stackups they offer with all values we require.

Quote from: free_electron on January 07, 2023, 03:08:31 am

So many times people start designing this kind of stuff and then find out all the impedance calcualtions are ou the door because they ran off an invalid specified stack.

Since we're going to be running DDR3 at relatively slow 400 MHz, exact impedance match is not super-critical. It would be a completely different story if we were to implement a 933 MHz interface.

nockieboy · « **Reply #104 on:** January 09, 2023, 10:25:24 pm »

Just as an aside, my FPGAs arrived today.

As far as the discussion on memory buses goes, I'm staying well out of that one as there's nothing useful I can add to either side. I will go with whatever the final consensus is, as you both (asmi and BrianHG) know faaaaaar more than I do about the subject.

@asmi - have you had a response about the SW connections on the MPM3683-7 yet?

asmi · « **Reply #105 on:** January 09, 2023, 11:31:07 pm »

Quote from: nockieboy on January 09, 2023, 10:25:24 pm

Just as an aside, my FPGAs arrived today.

Great!

Quote from: nockieboy on January 09, 2023, 10:25:24 pm

As far as the discussion on memory buses goes, I'm staying well out of that one as there's nothing useful I can add to either side. I will go with whatever the final consensus is, as you both (asmi and BrianHG) know faaaaaar more than I do about the subject.

Well you are the one who will "bend the metal" so to speak, i.e. will actually spend money on a physical board, so decision is yours to make. I'm actually tempted to burn one of my 35Ts to throw together a devboard with SODIMM just to try it out and see if it works. Now if only days would have more than 24 hours so that I would find enough extra time to actually design a board...

Quote from: nockieboy on January 09, 2023, 10:25:24 pm

@asmi - have you had a response about the SW connections on the MPM3683-7 yet?

They asked me for a company email (I sent them a request from my personal one). Sent it out, so far - nothing. They said they are going to forward this to a local FAE which cover a province where I live in, we will see. But then again, if I am to throw together a SODIMM tester, it will be super-cheap because it's gonna contain an FPGA, PDS parts and an SODIMM connector.

BrianHG · « **Reply #106 on:** January 10, 2023, 03:58:17 am »

Well, if the Artix7 can run the DDR3 at 800Mhz or above (1600mtps), then 2x 16bit DDR3 ram chips will be as good as running a SODIMM module at 400MHz. Though, if you get the SODIMM working at 800MHz, then expect to be able to play Quake2/3 on your FPGA with some ultra serious HDL engineering and coding.

asmi · « **Reply #107 on:** January 10, 2023, 10:36:35 am »

Quote from: BrianHG on January 10, 2023, 03:58:17 am

Well, if the Artix7 can run the DDR3 at 800Mhz or above (1600mtps), then 2x 16bit DDR3 ram chips will be as good as running a SODIMM module at 400MHz.

No, it can't do that. Speed grade 2, which is the device we've purchased, can only go up to 400 MHz, speed grade 3 can go up to 533 MHz, but that's about it.

Quote from: BrianHG on January 10, 2023, 03:58:17 am

Though, if you get the SODIMM working at 800MHz, then expect to be able to play Quake2/3 on your FPGA with some ultra serious HDL engineering and coding.

Quake 2 was working just fine even with a simple SDRAM (not even DDR) at far lower bandwidth than what DDR3 can do.

miken · « **Reply #108 on:** January 10, 2023, 09:20:06 pm »

asmi is correct. The datasheet specs are in Mbps. I made the same mistake myself at first, ~~since the MIG interface is sized at double the memory burst rate~~.

Xilinx uses these "4:1" and "2:1" terms for the MIG gearing but you kind of have to work through what exactly is meant. For example a 4:1 with ~~200MHz~~ EDIT:100MHz FPGA-side is 800 Mbps/pin, 400 MHz DDR.

asmi · « **Reply #109 on:** January 10, 2023, 11:57:27 pm »

Quote from: miken on January 10, 2023, 09:20:06 pm

asmi is correct. The datasheet specs are in Mbps. I made the same mistake myself at first, since the MIG interface is sized at double the memory burst rate.

Xilinx uses these "4:1" and "2:1" terms for the MIG gearing but you kind of have to work through what exactly is meant. For example a 4:1 with 200MHz FPGA-side is 800 Mbps/pin, 400 MHz DDR.

That's not entirely correct. "4:1" and "2:1" refer to memory frequency to UI frequency gearing ratio, in your example memory runs at 400 MHz, and UI on FPGA side runs at 100 MHz. UI data bus width is 8x the interface width, so a single UI transaction covers entire 8n DDR3 burst (which happens at 800 MT/s in this example), even though UI includes a FIFO for write data allowing some advance buffering (meaning you can write data into a FIFO and later issue a write command. This way it's theoretically possible to fully saturate memory bus with back-to-back memory transactions if commanded accordingly, though of course in reality there will be some breaks due to a need to activate a row, precharge and refresh, etc.

BrianHG · « **Reply #110 on:** January 11, 2023, 01:18:18 am »

Quote from: asmi on January 10, 2023, 11:57:27 pm

Quote from: miken on January 10, 2023, 09:20:06 pm
asmi is correct. The datasheet specs are in Mbps. I made the same mistake myself at first, since the MIG interface is sized at double the memory burst rate.

Xilinx uses these "4:1" and "2:1" terms for the MIG gearing but you kind of have to work through what exactly is meant. For example a 4:1 with 200MHz FPGA-side is 800 Mbps/pin, 400 MHz DDR.
That's not entirely correct. "4:1" and "2:1" refer to memory frequency to UI frequency gearing ratio, in your example memory runs at 400 MHz, and UI on FPGA side runs at 100 MHz. UI data bus width is 8x the interface width, so a single UI transaction covers entire 8n DDR3 burst (which happens at 800 MT/s in this example), even though UI includes a FIFO for write data allowing some advance buffering (meaning you can write data into a FIFO and later issue a write command. This way it's theoretically possible to fully saturate memory bus with back-to-back memory transactions if commanded accordingly, though of course in reality there will be some breaks due to a need to activate a row, precharge and refresh, etc.

This matches my DDR3 controller. My Half mode and Quad mode changes the user command clock in relation to the DDR3 _CK clock frequency while the data buss runs at 2x that. However, as I did discover with Lattice, they have a 500mbps cap on their DDR 2:1 IO primitive and a 800mbps cap when using their 2xDDR, IE 4:1. primitive. So for lattice, I am stuck in quad mode unless you want to underclock the DDR3, or, overclock the FPGA. Strangely enough, Altera allows 840mbps on their run of the mill 2:1 DDR IO primitive for their run of the mil FPGAs, though I cannot exclude that they automatically inferred my multiple data shift registers to and from the DDR IO primitive as a larger 4:1 or 8:1 2xDDR primitive as I easily achieved a clean 400MHz DDR3_CK clock in both Half mode (200Mhz user interface clock) and quarter mode (100MHz user clock interface) while Altera's official PHY can only do 300MHz exclusively in half mode.

If Xilinx Spartan7 has a higher maximum DDR IO (or serdes) primitive data rate between their 2:1 primitive and 4:1 primitive, if I were to adapt my DDR3 controller to Xilinx Spartan7, then I can make use of their improved 4:1 serdes primitive's performance to go above the 800mbps cap. With some work, if their high speed 3gbps serdes port can be an IO, work in 8:1 and use a separate read and write clock phase, then my controller can be adapted to operate a DDR3 ram chip up to 3gtps if the chip can go that fast. However, this is the type of project for someone with too much free time on their hands.

BrianHG · « **Reply #111 on:** January 11, 2023, 01:26:32 am »

Quote from: asmi on January 10, 2023, 10:36:35 am

Quote from: BrianHG on January 10, 2023, 03:58:17 am
Well, if the Artix7 can run the DDR3 at 800Mhz or above (1600mtps), then 2x 16bit DDR3 ram chips will be as good as
Quote from: BrianHG on January 10, 2023, 03:58:17 am
Though, if you get the SODIMM working at 800MHz, then expect to be able to play Quake2/3 on your FPGA with some ultra serious HDL engineering and coding.
Quake 2 was working just fine even with a simple SDRAM (not even DDR) at far lower bandwidth than what DDR3 can do.

I know. 640x480 at 60hz, only using downs-ampled 256x256 textures (optional for higher speed or smaller texture size) 8bit palleted, with frame rates between 30fps and 60fps.
Also, my Voodoo2 at the time had 2 ram controller banks of EDO ram, 128bits each, plus, my motherboard had it's own 128bit ram for the CPU geometry and game engine. This is not what I was promising Nockieboy with the large high speed ram. I was promising 120fps, 1920x1080, full 32bit textures at their native 512x512 & 1024x1024 scale, plus enough onboard bandwidth to also run the CPU plus geometry and game engine and audio.

miken · « **Reply #112 on:** January 11, 2023, 07:53:33 am »

Quote from: asmi on January 10, 2023, 11:57:27 pm

That's not entirely correct. "4:1" and "2:1" refer to memory frequency to UI frequency gearing ratio, in your example memory runs at 400 MHz, and UI on FPGA side runs at 100 MHz. UI data bus width is 8x the interface width, so a single UI transaction covers entire 8n DDR3 burst (which happens at 800 MT/s in this example), even though UI includes a FIFO for write data allowing some advance buffering (meaning you can write data into a FIFO and later issue a write command. This way it's theoretically possible to fully saturate memory bus with back-to-back memory transactions if commanded accordingly, though of course in reality there will be some breaks due to a need to activate a row, precharge and refresh, etc.

You're right, I confused myself with different clocks. That makes more sense... Funny thing is I read in some forum post that the UI interface was twice as wide but clearly that's not the case. Have to refresh my memory with hard facts before spouting bad information.

BrianHG · « **Reply #113 on:** January 11, 2023, 12:27:10 pm »

Quote from: asmi on January 06, 2023, 12:39:58 pm

Quote from: BrianHG on January 06, 2023, 10:31:02 am
Everything encircled in purple runs at the pixel clock rate.
Oh wow - it's more complicated that I thought!
I got a question though - why do you have to run so much at video pixel clock? Wouldn't it be better to run it at a faster clock and write the resulting image into DDR, and them have a totally asyncronous process which does run at a video pixel clock which would read that image from the framebuffer (in DDR) and output it via HDMI/VGA/DisplayPort/whatever? ....

Just to make one thing absolutely clear, the geometry unit I engineered with Nockieboy already have a a software ram-to-ram blitter which can convert any multiple source bit depth graphics to and from any ram, window coordinate size and location into memory for onscreen display. Not only that, but that engine also can up-sample and down-sample the source graphics to the destination graphics, use blitter source windows as a paint-brush when running the geometry drawing commands, with the addition of rotate 90, 45, mirror and flip. Nockieboy with the GPU in it's current form have enough to render a Doom, or his current Z80 alone can draw the arcade full Outrun car racing game, but in full 32bit quality, HD quality and at a good 30-60fps depending on the Z80s ability to send out draw commands.

I was planning to help him create a display list processor to automate some control program lists in DDR3 so the Z80 wouldn't have to do anything but load a program, point to it's base address and tell it to 'GO', but Nockieboy wanted a new FPGA first.

nockieboy · « **Reply #114 on:** January 11, 2023, 06:10:53 pm »

I feel totally out of my depth with all this DDR conversation; any suggestions for a good place to start as a primer for this subject so I can start reading up on it all and at least sound like I know vaguely what you're talking about?

Quote from: BrianHG on January 11, 2023, 12:27:10 pm

I was planning to help him create a display list processor to automate some control program lists in DDR3 so the Z80 wouldn't have to do anything but load a program, point to it's base address and tell it to 'GO', but Nockieboy wanted a new FPGA first.

I think having more room to fit that DLP and, perhaps, a soft-core processor and any other peripherals someone might want to squeeze in there is an important step. No reason why we can't do both at the same time though, unless it would pay to finalise the next FPGA first before we develop the HDL further?

@asmi - further to our previous conversation about crystals/clock sources - would a 100MHz differential clock source be okay? Should I be considering a second external clock of some description?

asmi · « **Reply #115 on:** January 11, 2023, 08:27:37 pm »

Quote from: nockieboy on January 11, 2023, 06:10:53 pm

I feel totally out of my depth with all this DDR conversation; any suggestions for a good place to start as a primer for this subject so I can start reading up on it all and at least sound like I know vaguely what you're talking about?

The crux of discussion is how much memory bandwidth will you need. We need a definite answer to that question before we can proceed with design, as DDR3 interface is a major consumer of IO pins, and there are restrictions on which pins can be used for that purpose.
If you want to learn how DDR3 controller works and how to use it, download a UG586 document from Xilinx website and read through chapter 1, which has everything you need to know regarding controller.

Quote from: nockieboy on January 11, 2023, 06:10:53 pm

@asmi - further to our previous conversation about crystals/clock sources - would a 100MHz differential clock source be okay? Should I be considering a second external clock of some description?

You don't have to use differential clock at all, all of boards I designed used regular single-ended 100 MHz clock and they worked just fine. As for additional clocks, it's up to you guys, you know better what do you need for your design. MCMMs are quite flexible in 7 series, each of them has a fractional divider (with 0.125 step) on the first output so you can generate many different clocks. I used 100 MHz clock to generate 148.5 and 742.5 MHz clocks required for 1080p@60Hz HDMI output, though the latter frequency exceeds specs and causes timing violations, but still works just fine in practice.

But, as with everything else, the ultimate call on what you need is on you. You come and tell us what you need, and we'll figure out a way to make it possible.

nockieboy · « **Reply #116 on:** January 12, 2023, 12:00:33 pm »

Quote from: asmi on January 11, 2023, 08:27:37 pm

You don't have to use differential clock at all, all of boards I designed used regular single-ended 100 MHz clock and they worked just fine. As for additional clocks, it's up to you guys, you know better what do you need for your design. MCMMs are quite flexible in 7 series, each of them has a fractional divider (with 0.125 step) on the first output so you can generate many different clocks. I used 100 MHz clock to generate 148.5 and 742.5 MHz clocks required for 1080p@60Hz HDMI output, though the latter frequency exceeds specs and causes timing violations, but still works just fine in practice.

But, as with everything else, the ultimate call on what you need is on you. You come and tell us what you need, and we'll figure out a way to make it possible.

Okay, that's cool then if I don't need to use a differential clock source then I won't make it more complicated than it needs to be.

In terms of what I need, that's where this is more a collaboration than me specifying requirements - I'd rather pick a clock frequency that minimises the use of MMCMs to cater for both the DDR3 controller and memory, and the GPU itself. I'm likely to go with a 50MHz oscillator if it's my choice, as I have a few of those already and if I choose 100MHz, it's going to need to be divided or multiplied like a 50MHz clock would be for the various parts of the FPGA. Plus the lower the clock frequency, the less problems I'll have with routing considerations etc.

@BrianHG - does the GPU or your DDR3 controller make any particular system clock frequency more desirable than another? If I go with 50MHz, that's purely based on what we used previously for the Cyclone IV board - if there's a better choice for whatever reason, let me know.

BrianHG · « **Reply #117 on:** January 12, 2023, 12:32:58 pm »

Quote from: nockieboy on January 12, 2023, 12:00:33 pm

@BrianHG - does the GPU or your DDR3 controller make any particular system clock frequency more desirable than another? If I go with 50MHz, that's purely based on what we used previously for the Cyclone IV board - if there's a better choice for whatever reason, let me know.

It's compatible with any clock input the FPGA can support, however, you will be using Xilins's DDR3 controller unless you deliberately want the fun of adapting mine.

Unless the Xilinx's controller gives you 16 configurable read/write ports, you will probably will be using my controller's multiport front-end interface as it is doing a shit load of heavy lifting for you. My multiport uses the user interface command clock from Xilinx's DDR3 controller unless you want to manually configure your own PLL. In the Max10-6, it's upper limit was ~200Mhz, but your current project is running it at 100MHz. The Spartan7 should achieve 200Mhz with ease. (Actually I think the only reason we went 100MHz instead of 200Mhz was the ellipse line generator was too complex.) This will double the speed of all the rest of your GPU modules, IE:Geometry renderer. The new data buss' double width, going from 128 bit to 256 bit will once again potentially double your maximum throughput speed, though except for my display raster generator, you have yet to code anything else which will make full use of this true 4x top speed.

If I were you, you should already be working on coding and simulating this part.

I do not know about Xilinx in circuit PLL reconfiguration, but I would ask asmi if 2 different PLLs can run from the same 100Mhz clock source, IE use a second one for the HDMI display port. Fiddling with the DDR3's master pll to change res during operation may just introduce headaches.

Also, make sure that from the 100MHz, you can make (148.5Mhz or 297Mhz & 742.5Mhz) & (108Mhz or 216Mhz & 540Mhz). Make sure they are exact, no fractions or decimals, otherwise, just put add a 27Mhz, or 54Mhz, or 108Mhz oscillator to the PCB.

asmi · « **Reply #118 on:** January 12, 2023, 09:43:42 pm »

Quote from: nockieboy on January 12, 2023, 12:00:33 pm

In terms of what I need, that's where this is more a collaboration than me specifying requirements - I'd rather pick a clock frequency that minimises the use of MMCMs to cater for both the DDR3 controller and memory, and the GPU itself. I'm likely to go with a 50MHz oscillator if it's my choice, as I have a few of those already and if I choose 100MHz, it's going to need to be divided or multiplied like a 50MHz clock would be for the various parts of the FPGA. Plus the lower the clock frequency, the less problems I'll have with routing considerations etc.

I just checked and it looks like it's not possible to generate 148.5 and 742.5 MHz exactly from 50 MHz, so you will need to use 100 MHz.

Quote from: BrianHG on January 12, 2023, 12:32:58 pm

I do not know about Xilinx in circuit PLL reconfiguration, but I would ask asmi if 2 different PLLs can run from the same 100Mhz clock source, IE use a second one for the HDMI display port. Fiddling with the DDR3's master pll to change res during operation may just introduce headaches.

Yes you can, albeit with some limitations (MCMMs need to be in the same column and in the same clock region).

Quote from: BrianHG on January 12, 2023, 12:32:58 pm

Also, make sure that from the 100MHz, you can make (148.5Mhz or 297Mhz & 742.5Mhz) & (108Mhz or 216Mhz & 540Mhz). Make sure they are exact, no fractions or decimals, otherwise, just put add a 27Mhz, or 54Mhz, or 108Mhz oscillator to the PCB.

I know for sure you can generate 148.5 and 742.5 MHz exactly from 100 MHz, but what do you need 297 MHz for? Also what mode the second set of clocks is for? I just checked, and it looks like it's possible to generate them from 100 MHz as well (though not at the same time as 148.5/742.5 MHz).

You can play around with MCMM/PLL settings by invoking "Clocking wizard" in IP Catalog. Each MCMM has 7 outputs, first of which (output 0) can have fractional divider, and of course multiplier can be fractional as well (with the same 1/8=0.125 step IIRC).

Alternatively you can use 200 MHz LVDS clock generator connected to DDR3's bank pins, controller will output 100 MHz UI clock which you use to drive the interface, and place an additional 27 MHz clock just for the video out - these clock gens are cheap (about $1), so it shouldn't be that big of a deal. Or instead of 27 MHz fixed frequency you can use a programmable clock generator like SI5351A-B-GT which has 3 outputs each of which can be programmed to a wide range of frequencies via I2C interface for ultimate flexibility. That device is about $3 (+ some cents for the 25 or 27 MHz crystal), so quite a reasonable price.

BrianHG · « **Reply #119 on:** January 13, 2023, 04:18:59 am »

Quote from: asmi on January 12, 2023, 09:43:42 pm

Quote from: BrianHG on January 12, 2023, 12:32:58 pm
Also, make sure that from the 100MHz, you can make (148.5Mhz or 297Mhz & 742.5Mhz) & (108Mhz or 216Mhz & 540Mhz). Make sure they are exact, no fractions or decimals, otherwise, just put add a 27Mhz, or 54Mhz, or 108Mhz oscillator to the PCB.
I know for sure you can generate 148.5 and 742.5 MHz exactly from 100 MHz, but what do you need 297 MHz for? Also what mode the second set of clocks is for? I just checked, and it looks like it's possible to generate them from 100 MHz as well (though not at the same time as 148.5/742.5 MHz).

You can play around with MCMM/PLL settings by invoking "Clocking wizard" in IP Catalog. Each MCMM has 7 outputs, first of which (output 0) can have fractional divider, and of course multiplier can be fractional as well (with the same 1/8=0.125 step IIRC).

The 297 is optional, but offers additional options.
It depends on Spartan7's PLL core speed capabilities.
For example, of the core can do the 1.485 GHz, then for the sub-divisional outputs - /2=742.5 for the LVDS HDMI, /5=297, /10=148.5, all integer divisions. If all the core PLL can do is the 742.5, then we will skip the 297 and just divide that by 5 to give us the 148.5 pixel clock.

With a 54MHz source, /2 = 27Mhz, * 55 = 1.485 Ghz. All integer, no sub-fractional tricks.

On the Deca board, I did a 50Mhz /25, *27 = 54Mhz,
Then made the 148.5Mhz from the 54MHz.

This took 2 PLLs since the Max10 had no fractional dividers, so this is how I made purest possible reference without any jitter using integer only PLLs.

If we used Cyclone V, then we could have had access to a fractional N divider PLL for the first primary frequency output offering a direct conversion of 100Mhz to 1.485Ghz. All other sub-divided pixel clock outputs could be made from that. (Yes, the Cyclone V PLL can operate at 1.485Ghz)

1 PLL should do it on the Spartan7 if it has a fractional N divider PLL. If it can do 1.485 Ghz, then we can output 4K at 30Hz, assuming the LVDS transmitter can do 3gb.

asmi · « **Reply #120 on:** January 13, 2023, 06:03:07 am »

Quote from: BrianHG on January 13, 2023, 04:18:59 am

The 297 is optional, but offers additional options.

Like what? HDMI is output by using a SERDES (actually a pair of cascaded SERDES, but that is irrelevant here) in a 10:1 DDR mode, so you will need a pixel (symbol) clock of 148.5 MHz and a half of bit clock (742.5 MHz). No other clock are required. You will need to perform 8b/10b encoding yourself and feed SERDES encoded 10 bit symbols.

Quote from: BrianHG on January 13, 2023, 04:18:59 am

1 PLL should do it on the Spartan7 if it has a fractional N divider PLL. If it can do 1.485 Ghz, then we can output 4K at 30Hz, assuming the LVDS transmitter can do 3gb.

Where are you getting all these numbers? Maybe you should read DS181. Official limit is 1.25 gb per pin, unofficially it can do 1.485 gb, but that's about it. Where this 3gb is coming from?
If you want to go beyond what regular IO pins can do, you will need to use transceivers. We're currently planning to wire 4 GTPs to a DisplayPort connector. DP version 1.2 specification is publicly available and so it shouldn't be too hard to implement up to HBR2 (5.4 Gbps per lane), which is enough to drive up to 4k@60. But since even modern GPUs sometimes struggle to maintain reasonable framerate at such resolution, utility of actually doing so kind of escapes me, not to mention that it's going to require a MASSIVE memory bandwidth even only for double/triple buffering alone. Even 64 bit DDR3 at 400 MHz will just about be enough for double-buffering. And frankly, I think Artix-7 is not powerful enough for that resolution, we will need something like Artix Ultrascale+ with it's 1.2 GHz DDR4 support.

BrianHG · « **Reply #121 on:** January 13, 2023, 06:11:59 am »

Ok, no 3840x2160 support.

BrianHG · « **Reply #122 on:** January 13, 2023, 06:17:01 am »

Quote from: asmi on January 13, 2023, 06:03:07 am

Even 64 bit DDR3 at 400 MHz will just about be enough for double-buffering. And frankly, I think Artix-7 is not powerful enough for that resolution, we will need something like Artix Ultrascale+ with it's 1.2 GHz DDR4 support.

Yes it is.. Especially at 30hz progressive. Though, for 60hz, I would prefer a 500MHz controller.
Now do not go putting Spartan7 below a cheap crummy CycloneV-GX.

asmi · « **Reply #123 on:** January 13, 2023, 03:21:05 pm »

Quote from: BrianHG on January 13, 2023, 06:17:01 am

Yes it is.. Especially at 30hz progressive. Though, for 60hz, I would prefer a 500MHz controller.
Now do not go putting Spartan7 below a cheap crummy CycloneV-GX.

They both are inadequate. The fact that they can technically output a stream in 4k@60 doesn't change the reality of them being too slow and too small to do much in a way of actually generating that image. A single stream at 4k@60 requires ~1.85 GBytes/s of bandwidth, a DDR3 64bit@400 MHz theoretical max is a bit below 6 GBytes/s, so a simple double-buffering is going to eat almost half of DDR3's available bandwidth, not leaving much for actual rendering engine, which typically requires an order of magnitude more bandwidth than what's required for display, because those resources (textures, primitive streams) also need to be of higher resolution, and generating that many pixels (a single 4k frame contains over 8 million pixels!) requires a lot of hardware, and fast one too. To maintain 60 Hz refresh rate, renderer needs to generate almost half a billion pixels per second, which means at 100 MHz it needs to output 5 pixels on each clock cycle. In my opinion that is waaay beyond what low end 7 series devices can do, Cyclone-5 is even worse than that. Even Artix Ultrascale+'s 64bit DDR4 interface running at 1.2 GHz and providing ~17.9 GBytes/s of bandwidth, while being much better than 400 MHz DDR3, I still suspect might not be enough even for a relatively simple 3D renderer generating 4k@60. For a bit of a perspective, NVidia's 3080 Ti GPU has over 912 GBytes/s of memory bandwidth, so close to two orders of magnitude more than what even Artix US+ can provide.
So, unless you want to simply upscale 1080p to 4K for the ouput (so that you can see big and beautiful pixels in chisp details!), I'd say we'd better abandon all this talk about 4k and focus on something that we can realistically expect to achieve.

BrianHG · « **Reply #124 on:** January 14, 2023, 12:10:44 am »

Oh, so you don't believe we could operate the frame buffer and render in 422 YUV mode.

asmi · « **Reply #125 on:** January 14, 2023, 01:54:27 am »

Quote from: BrianHG on January 14, 2023, 12:10:44 am

Oh, so you don't believe we could operate the frame buffer and render in 422 YUV mode.

I said what I said. Let's move on from this to discussing the actual board.

nockieboy · « **Reply #126 on:** January 14, 2023, 07:22:40 pm »

1080p is more than good enough for me - 4K output is beyond the scope of the GPU project and certainly isn't something I need or want, but it makes for interesting 'pushing the envelope' discussions.

Quote from: BrianHG on January 12, 2023, 12:32:58 pm

Unless the Xilinx's controller gives you 16 configurable read/write ports, you will probably will be using my controller's multiport front-end interface as it is doing a shit load of heavy lifting for you. My multiport uses the user interface command clock from Xilinx's DDR3 controller unless you want to manually configure your own PLL. In the Max10-6, it's upper limit was ~200Mhz, but your current project is running it at 100MHz. The Spartan7 should achieve 200Mhz with ease. (Actually I think the only reason we went 100MHz instead of 200Mhz was the ellipse line generator was too complex.) This will double the speed of all the rest of your GPU modules, IE:Geometry renderer. The new data buss' double width, going from 128 bit to 256 bit will once again potentially double your maximum throughput speed, though except for my display raster generator, you have yet to code anything else which will make full use of this true 4x top speed.

I think it's a given that I'll be using the multiport front-end interface in conjunction with Xilinx's controller, apologies if I caused confusion referring to your DDR3 controller, I meant it as a 'catch all' term to refer to the proposed multiport front-end/Xilinx controller combo.

Quote from: BrianHG on January 12, 2023, 12:32:58 pm

If I were you, you should already be working on coding and simulating this part.

I'm working on learning Vivado and setting up a DDR3 simulation, but progress is glacially slow at the moment due to work commitments and little free time.

Quote from: asmi on January 12, 2023, 09:43:42 pm

Alternatively you can use 200 MHz LVDS clock generator connected to DDR3's bank pins, controller will output 100 MHz UI clock which you use to drive the interface, and place an additional 27 MHz clock just for the video out - these clock gens are cheap (about $1), so it shouldn't be that big of a deal. Or instead of 27 MHz fixed frequency you can use a programmable clock generator like SI5351A-B-GT which has 3 outputs each of which can be programmed to a wide range of frequencies via I2C interface for ultimate flexibility. That device is about $3 (+ some cents for the 25 or 27 MHz crystal), so quite a reasonable price.

Ooh that looks like a pretty cool gadget. Didn't even know they existed! Would that have to be used in addition to a fixed system clock, or could it replace the system clock entirely? Presumably it needs to be configured at power-on via I2C, so the FPGA would require some sort of fixed system clock source to set it up as intended. Would be handy if it passed through the 25/27MHz reference clock by default on one of the outputs to allow the FPGA to set it up with a faster/alternative clock frequency. I've looked at the datasheet (not in massive detail, admittedly) and I couldn't find anything to help with those questions.

dolbeau · « **Reply #127 on:** January 15, 2023, 07:05:50 am »

Quote from: nockieboy on January 14, 2023, 07:22:40 pm

Would be handy if it passed through the 25/27MHz reference clock by default on one of the outputs to allow the FPGA to set it up with a faster/alternative clock frequency.

Those devices can have a configuration pre-programmed in their NVM (non-volatile-memory) to output needed frequencies at boot time. You can generate custom configuration with Skyworks' tool 'ClockBuilder Pro'. I don't know a way to 'decode' the order code Bxxxxx into a configuration, though (as there's some pre-programmed device in stock at e.g. mouser).

BrianHG · « **Reply #128 on:** January 15, 2023, 10:49:51 am »

The Spartan7 has a fractional divider PLL, you do not need an external PLL generator. There may be a rare circumstance if you want to generate huge fraction low phase reference frequency to create every single 1-10hz step reference clocks, like some of those old multiple variable VGA modes, but for the basic 1080p through 480p plus a few old generic VGA 60hz / 72hz, the Spartan7 should be able to do it all internally. Though, it does mean learning to setup a reconfigurable PLL for the Spartan.

asmi · « **Reply #129 on:** January 16, 2023, 02:04:08 pm »

Quote from: BrianHG on January 15, 2023, 10:49:51 am

The Spartan7 has a fractional divider PLL, you do not need an external PLL generator. There may be a rare circumstance if you want to generate huge fraction low phase reference frequency to create every single 1-10hz step reference clocks, like some of those old multiple variable VGA modes, but for the basic 1080p through 480p plus a few old generic VGA 60hz / 72hz, the Spartan7 should be able to do it all internally. Though, it does mean learning to setup a reconfigurable PLL for the Spartan.

Why do you keep talking about Spartan-7? That's not the device we're going to use.

And we still don't seem to have a decision regarding the kind of memory we're going to implement...

nockieboy · « **Reply #130 on:** January 18, 2023, 10:21:01 pm »

Quote from: asmi on January 16, 2023, 02:04:08 pm

Quote from: BrianHG on January 15, 2023, 10:49:51 am
The Spartan7 has a fractional divider PLL, you do not need an external PLL generator. There may be a rare circumstance if you want to generate huge fraction low phase reference frequency to create every single 1-10hz step reference clocks, like some of those old multiple variable VGA modes, but for the basic 1080p through 480p plus a few old generic VGA 60hz / 72hz, the Spartan7 should be able to do it all internally. Though, it does mean learning to setup a reconfigurable PLL for the Spartan.
Why do you keep talking about Spartan-7? That's not the device we're going to use.

And we still don't seem to have a decision regarding the kind of memory we're going to implement...

We're using the Artix-7, not the Spartan - specifically, the XC7A100T-2FGG484.

As far as the memory discussion is going, unless I hear otherwise, I'm going to stick with two of these: MT41K256M16TW-107:P. If anyone has a compelling reason for me to use some other setup or parts, I'll be glad to hear it!

asmi · « **Reply #131 on:** January 19, 2023, 01:01:50 am »

Quote from: nockieboy on January 18, 2023, 10:21:01 pm

As far as the memory discussion is going, unless I hear otherwise, I'm going to stick with two of these: MT41K256M16TW-107:P. If anyone has a compelling reason for me to use some other setup or parts, I'll be glad to hear it!

That's what I assumed too, but BrianHG made it sound like 32 bit interface is not going to be enough, and he would rather have 64 bit one...

nockieboy · « **Reply #132 on:** January 19, 2023, 09:07:59 am »

Quote from: asmi on January 19, 2023, 01:01:50 am

Quote from: nockieboy on January 18, 2023, 10:21:01 pm
As far as the memory discussion is going, unless I hear otherwise, I'm going to stick with two of these: MT41K256M16TW-107:P. If anyone has a compelling reason for me to use some other setup or parts, I'll be glad to hear it!
That's what I assumed too, but BrianHG made it sound like 32 bit interface is not going to be enough, and he would rather have 64 bit one...

This is probably a silly question, so put it down to lack of electronics experience on my part; the switch from 32-bit to 64-bit would just require doubling the number of DDR3 chips and data signal traces, right? And obviously a commensurate massive increase in routing complexity on the PCB? Length-matching all those data lines is going to be no mean feat? How do they do it on graphics cards where you have banks of memory chips and they're not all the same distance from the GPU chip? They must be very proficient at hiding cm's of snaking traces to match line length?

In terms of project progress, I've spent the last week trying to get through the most evil cold I've ever caught (and I'm still in the middle of it) - it's not Covid, but it may as well be. Once I'm feeling better I need to get started on design and simulation of the memory - that's my next step - so all this discussion of memory bandwidth and interface types is timely and relevant.

BrianHG · « **Reply #133 on:** January 19, 2023, 09:30:39 am »

Quote from: nockieboy on January 19, 2023, 09:07:59 am

This is probably a silly question, so put it down to lack of electronics experience on my part; the switch from 32-bit to 64-bit would just require doubling the number of DDR3 chips and data signal traces, right?

The other reason I said to use a SODIMM module is that you do not have to worry about PCB routing the 4 DDR3 chips together. It is done on each SODIMM module. All you need to concern yourself is with the routing and length matching of the FPGA to the SODIMM's connector. Making your own 2 DDR3 ram chips on PCB isn't too bad. 4 of them, well, let the ones who made the SODIMM module worry about the chip-chip control and address lines shared between the 4 DDR3 chips.

Also, no BGA mounting for the 4 DDDR3 chips. And if you ever get a fast oscilloscope, you can probe the SODIMM connector.

nockieboy · « **Reply #134 on:** January 19, 2023, 09:48:14 am »

Quote from: BrianHG on January 19, 2023, 09:30:39 am

The other reason I said to use a SODIMM module is that you do not have to worry about PCB routing the 4 DDR3 chips together. It is done on each SODIMM module. All you need to concern yourself is with the routing and length matching of the FPGA to the SODIMM's connector. Making your own 2 DDR3 ram chips on PCB isn't too bad. 4 of them, well, let the ones who made the SODIMM module worry about the chip-chip control and address lines shared between the 4 DDR3 chips.

Also, no BGA mounting for the 4 DDDR3 chips. And if you ever get a fast oscilloscope, you can probe the SODIMM connector.

Well, you're certainly making a good case for SODIMM. Ease of construction is a big draw, plus flexibility in choosing whatever size memory stick you need for your project. The size of the connector is an issue though, and I believe the biggest problem is that it will literally eat up the vast majority of IOs on the FPGA, to the point where I've seen forum posts about how to connect SODIMMs to Artix-7s and work around the configuration bank-sharing issue. Apparently it IS possible, but the lack of remaining IOs may be a deal breaker. I'll take a closer look at SODIMM just to give it a fair chance, though.

EDIT: I've just inserted a SODIMM-204 connector into the PCB design (it's not designed yet, just a rough idea of sizing and initial component layout) and it's actually not much wider than the mezzanine connectors we're using to attach the core board to the carrier. Might need some fine tuning so that the memory stick itself sits within the outline of the core board, but that's not essential. So realistically, the only negatives for SODIMM are its IO resource demands. I'll look at that in more detail later.

BrianHG · « **Reply #135 on:** January 19, 2023, 10:28:27 am »

Quote from: nockieboy on January 19, 2023, 09:48:14 am

EDIT: I've just inserted a SODIMM-204 connector into the PCB design (it's not designed yet, just a rough idea of sizing and initial component layout) and it's actually not much wider than the mezzanine connectors we're using to attach the core board to the carrier. Might need some fine tuning so that the memory stick itself sits within the outline of the core board, but that's not essential. So realistically, the only negatives for SODIMM are its IO resource demands. I'll look at that in more detail later.

See if vertical SODIMM connectors exist. IE: The memory stick mounts vertically like in a home PC motherboard.

nockieboy · « **Reply #136 on:** January 19, 2023, 12:34:25 pm »

Okay, so my next question is probably more for asmi as it's Vivado MIG-related, but relates to SODIMM selection.

At the moment I'm just using Mouser to get an idea of compatible parts, but it seems that none of the SODIMMs are actually what I'd class as cheap. I don't need gigabytes of memory, one would be more than enough. Prices on Mouser soon hit £70-£90 and beyond for SODIMMs.

I'd (naively, perhaps) assumed that pretty much any SODIMM could be slotted-in and used - refurbished 1GB SODIMMS on eBay, for example, retail for less than the price of a coffee. The Vivado MIG is still asking for specific memory parts - how much does the part selection affect the resulting memory controller that's produced by the MIG? Is it really limited to working with just one part number of DDR3 chip? Does this specificity affect sourcing SODIMMs or am I worrying too much about it and should just select any option for the memory part?

BrianHG · « **Reply #137 on:** January 19, 2023, 12:48:52 pm »

Search on Amazon.com or similar:
'Laptop DDR3 ram'

2 modules for 19$, and they are dual-rank. IE: 16 ram chips in total for 20$.
Name brands like Samsung go for 24$ for 2 modules, but only single rank 4GB each instead of dual rank 8gb each.
You do not need Dual Rank. 12$ for 4gb of Samsung memory, well you get 2 for 24$ is good as 3$ per ram chip.

Basically go for the name brand new Samsung or Micron.
Do not buy ram modules at Digikey or Mouser. Now-a-days, they are a commodity item sold anywhere. Even at my local grocery store and pharmacy.

BrianHG · « **Reply #138 on:** January 19, 2023, 12:53:40 pm »

https://www.amazon.com/s?k=laptop+ddr3&i=electronics&rh=n%3A172500%2Cp_n_feature_four_browse-bin%3A2253866011%2Cp_n_feature_seven_browse-bin%3A24084330011%2Cp_36%3A-3500%2Cp_n_feature_browse-bin%3A23964008011&dc&crid=1GZWPTZK89GQ4&qid=1674132688&rnid=23963802011&sprefix=laptop+ddr3%2Caps%2C217&ref=sr_nr_p_n_feature_browse-bin_2&ds=v1%3AJVmM%2Fwvl3D6LVCoRXkt%2BA28d5A%2F6AxNHFEVX6okd9jo

Example good modules:
https://www.amazon.com/Samsung-memory-PC3-12800-1600MHz-Macbook/dp/B078HXSRXS/ref=sr_1_13?crid=1GZWPTZK89GQ4&keywords=laptop+ddr3&qid=1674133067&refinements=p_n_feature_four_browse-bin%3A2253866011%2Cp_n_feature_seven_browse-bin%3A24084330011%2Cp_36%3A-3500%2Cp_n_feature_browse-bin%3A23964008011&rnid=23963802011&s=pc&sprefix=laptop+ddr3%2Caps%2C217&sr=1-13

BrianHG · « **Reply #139 on:** January 19, 2023, 01:02:12 pm »

There is also Newegg...

nockieboy · « **Reply #140 on:** January 19, 2023, 01:53:57 pm »

Yes there's loads of sources to get SODIMMs from, I'm just hoping that the MIG in Vivado is able to create a memory controller that can handle the variety. The MIG UI seems to require a very specific chip type to be input as part of the controller setup.

BrianHG · « **Reply #141 on:** January 19, 2023, 02:04:17 pm »

Use my example 'Good Module' from Amazon. In the photo, they give you a Samsung part number. Unless you go with the slowest possible DDR3 memory, all other single rank 4gb modules should work. Changing the module to a smaller one should only reconfigure your controller setup and possibly make the upper address lines tied to GND.

nockieboy · « **Reply #142 on:** January 19, 2023, 02:27:08 pm »

Quote from: BrianHG on January 19, 2023, 02:04:17 pm

Use my example 'Good Module' from Amazon. In the photo, they give you a Samsung part number. Unless you go with the slowest possible DDR3 memory, all other single rank 4gb modules should work. Changing the module to a smaller one should only reconfigure your controller setup and possibly make the upper address lines tied to GND.

How do they work it in laptops and desktop computers that use SODIMM and DIMM modules with wildly varying capacities? I have four 8GB sticks in my desktop, but could just as easily have used two 16GB sticks, or four 4GB sticks or whatever. There must be some 'on-the-fly' configuration of the memory interface done by the BIOS?

EDIT: That 'Good Module' part number isn't listed in the MIG, so I'll have to create a custom part for it. My question for asmi (or anyone else in the know) really was that if I do that, and I (or someone else) uses a different SODIMM with different part-numbered chips on it, will that break the interface or require adjustment of the Verilog and a recompile?

EDIT2: The M471B5173QH0-YK0 chips used on that 'Good Module' part are DDR3L, if that makes a difference. At least with the datasheet, I can make an effort to create a custom part.

BrianHG · « **Reply #143 on:** January 19, 2023, 04:05:12 pm »

Try this part number: MT8KTF51264HZ-1G6E1

https://www.ebay.ca/p/215839569

It should match the Samsung module.
I wonder if these modules are just too fast to have an entry in your memory controller.
IE: if we just use the same module, but with a suffix for 1333Mhz, or 1066Mhz.

Also try:
MT8KTF51264HZ-1G9 1866MHz...

asmi · « **Reply #144 on:** January 19, 2023, 04:38:24 pm »

OK, so here is "everything you always wanted to know about memory but was afraid to ask":
1. DIMM modules contain a small EEPROM non-volatile memory which is called SPD (for "Serial Presense Detect", a name is a bit of a historical misnomer because there used to be a PPD - Parallel Presense Detect), which contain information about the module - supported mode, timings, voltage, etc. It uses an SMBus protocol, which is a variant of I2C bus. This is why DIMM requires 3.3 V power in addition to regular DDRx power like 1.5 V - that EEPROM device is powered by 3.3 V.
2. PCs and high-end SoCs read the SPD during early startup (when BIOS/UEFI is executing), and reconfigure memory controller to these specifications. Starting from DDR3, it includes a phase of "memory training", which adjust signal timings to ensure the best signal quality. However since that memory training can take a while (in case of DDR5 it can be minutes!), PCs typically save connected modules' "fingerprints" along with trained delay adjustments in the battery-backed memory - CMOS, this way after a power cycle they can check if connected DIMMs are still the same, and if so - they simply load and apply saved parameters, which is orders of magnitude faster than doing the training.
3. In contrast to PCs, FPGA systems typically don't presume changing DIMMs, and so they don't read SPD because it would be a waste of FPGA resources, instead all nessesary timings and delays are built into the controller HDL when it's configured for specific DIMM module.
4. As a result of (3), whenever you swap a module for another one, you will need to reconfigure your controller and generate a new bitstream, which would contain parameters of that new module.
5. The problem with modules you can buy in computer parts stores is that it's typically impossible to get your hands on datasheets for modules, and for memory devices used on that module, which makes it hard to figure out what timing parameters to use with an FPGA memory controller. Exception to this is Crucial, as it's wholly-owned by Micron and so all their modules use Micron's memory devices, datasheets for which are publicly available on a Micron's website. Micron itself also produces memory modules, but they are typically on a pricey side, though they do have a good reputation on a market.

Now onto specifics of our project:
1. Because of limitations on a MIG pinout for the part we have chosen, we can only use banks 16, 15 and 14 to implement a 64 bit memory controller.
2. However a byte group 0 of the bank 14 also contains pins used by FPGA during configuration - specifically pins D0-D3, and a chip select (CS) of a QSPI flash memory, which is where a bitstream is stored. And since flash memory which can be powered by 1.5 V does not exist, we will have to use a 1.8 V QSPI flash device and a voltage translators to convert between 1.5 V and 1.8 V. We can not use 3.3 V QSPI flash in this case.
3. As memory interface will consume pretty much the entirety of banks 14, 15 and 16, we will only have about 130 pins available from banks 34, 35 and partially-bonded out 13 for everything else. That is not a lot of pins.
4. Due to the large size of a resulting board (SODIMM is rather long, and needs to be placed far enough from FPGA to ensure sufficient clearance for a heatsink and a fan), and not many IO pins remaining available, I'm not really convinced that it's worth making it a module, as opposed to adding all peripherals with their interface connectors on that board, and only having a low-ish speed connector for other peripherals via regular 0.1" headers. That is something that we need to weigh against the costs of making a large baseboard/carrier which would accomodate such large module and some high-speed interfaces.
5. I have never personally implemented such a scheme with SODIMM and voltage translators, so there is an increased design risk that something can go wrong. I'm not saying it will, but I can't be 100% sure due to the lack of personal experience.

nockieboy · « **Reply #145 on:** January 19, 2023, 05:28:16 pm »

Quote from: BrianHG on January 19, 2023, 04:05:12 pm

Try this part number: MT8KTF51264HZ-1G6E1

https://www.ebay.ca/p/215839569

It should match the Samsung module.
I wonder if these modules are just too fast to have an entry in your memory controller.
IE: if we just use the same module, but with a suffix for 1333Mhz, or 1066Mhz.

Also try:
MT8KTF51264HZ-1G9 1866MHz...

Both those chips are in the MIG controller setup and it would appear that creating a custom entry with timings from a datasheet is also pretty easy to do.

Quote from: asmi on January 19, 2023, 04:38:24 pm

OK, so here is "everything you always wanted to know about memory but was afraid to ask":

Thanks for this asmi.

That makes a lot of sense having an I2C (or similar) chip on the SODIMM to tell the host what the memory timings are. Also explains the I2C connections on the SODIMM connector.

Quote from: asmi on January 19, 2023, 04:38:24 pm

Now onto specifics of our project:
1. Because of limitations on a MIG pinout for the part we have chosen, we can only use banks 16, 15 and 14 to implement a 64 bit memory controller.
2. However a byte group 0 of the bank 14 also contains pins used by FPGA during configuration - specifically pins D0-D3, and a chip select (CS) of a QSPI flash memory, which is where a bitstream is stored. And since flash memory which can be powered by 1.5 V does not exist, we will have to use a 1.8 V QSPI flash device and a voltage translators to convert between 1.5 V and 1.8 V. We can not use 3.3 V QSPI flash in this case.

Yes, that's a kick in the teeth. $:-\$

Quote from: asmi on January 19, 2023, 04:38:24 pm

3. As memory interface will consume pretty much the entirety of banks 14, 15 and 16, we will only have about 130 pins available from banks 34, 35 and partially-bonded out 13 for everything else. That is not a lot of pins.

Depends what you want to use on the carrier board. My primary use-case and purpose for creating this board is to use it as a graphics card for a host computer, so 130 pins available for IO will just about do the job, I think. 70 for the host itself will still leave around 60 for other peripherals, such as USB, HDMI, audio codec, etc.

I could always create a core card without SODIMM for those that want something less memory-bandwidth and more IO orientated.

Quote from: asmi on January 19, 2023, 04:38:24 pm

4. Due to the large size of a resulting board (SODIMM is rather long, and needs to be placed far enough from FPGA to ensure sufficient clearance for a heatsink and a fan), and not many IO pins remaining available, I'm not really convinced that it's worth making it a module, as opposed to adding all peripherals with their interface connectors on that board, and only having a low-ish speed connector for other peripherals via regular 0.1" headers. That is something that we need to weigh against the costs of making a large baseboard/carrier which would accomodate such large module and some high-speed interfaces.
5. I have never personally implemented such a scheme with SODIMM and voltage translators, so there is an increased design risk that something can go wrong. I'm not saying it will, but I can't be 100% sure due to the lack of personal experience.

After looking at the PCB initially, it doesn't appear that it will make a lot of difference to the size of the core card. I haven't considered mounting points or clearance for a heatsink or fan yet, but a vertical SODIMM connector (or even a right-angle one if you're not worried about the SODIMM extending past the edge of the core card) shouldn't affect the layout too badly.

For me the biggest concern right now are making sure there's enough IO to go around. SODIMM pros and cons:

Pros:
1. 64-bit interface instead of 32-bit with the alternative.
2. Ready-made cheap packages.
3. Simple connector to solder. No extra BGAs to solder and termination and decoupling is a lot simpler.

Cons:
1. Say buh-bye to lots of IO.
2. Increased complexity in config circuit with level translators required for QSPI chip.
3. SODIMM's a bit on the big side.

The primary con (#1 - loss of IO) is negated primarily because you're trading IO for that 64-bit wide interface. I don't know how #2 will bite until I look into that specific issue in more detail, and #3 is only of limited impact as the core card is already of a size to fit its essential components and the mezzanine connectors, which means it either doesn't need increasing in size at all, or only by a small margin, to fit the SODIMM socket.

I doubt I'll every be running Quake or Crysis on the thing using my Z80 system, but the further down this project I'm going, the more I'm considering using a soft-core processor to replace the 'host system' entirely, as I'm really enjoying learning about FPGAs and their flexibility; far more so than designing and putting together PCBs based on old/ancient technology from the 80's and 90's.

nockieboy · « **Reply #146 on:** January 19, 2023, 08:58:12 pm »

@asmi - what settings should I be specifying here? I presume the Memory Options remain at default settings:

What about System Clock and Reference Clock? Are these both single-ended or should I leave them set to 'Differential'?

Internal Termination for High Range Banks - guessing 50 Ohms is okay?

Here's the Bank Selection:

And finally System Signals Selection. Not sure what do with Reference Clock Pin Selection - I can't select 'Use System Clock' for it and am unsure of the best selection if I'm going to run a separate reference clock in. Presumably all the System Signals will run from the easiest-to-access pins in the same bank as the control signals?

asmi · « **Reply #147 on:** January 19, 2023, 09:18:06 pm »

Quote from: nockieboy on January 19, 2023, 05:28:16 pm

After looking at the PCB initially, it doesn't appear that it will make a lot of difference to the size of the core card. I haven't considered mounting points or clearance for a heatsink or fan yet, but a vertical SODIMM connector (or even a right-angle one if you're not worried about the SODIMM extending past the edge of the core card) shouldn't affect the layout too badly.

Take a look at banks layout in attachment. As you can see, SODIMM will need to be placed vertically along the right side of FPGA (because that's where banks 14, 15 and 16 are). Now let's figure out where board-to-board connectors are going to be. We have banks 34, 35 (on the left side), bank 13 (on the bottom), and the MGT bank 216 (on the top). So where would you place those connectors? The logical place seems to be along the top and bottom sides. But in this case right sides of those connectors might have trouble finding space because DDR3 layout typically takes up a lot of space. It also means that the board needs to be at least ~75 mm tall to accomodate SODIMM connector (it's footprint is 75x36 mm with memory module connected and secured), and at least 53 mm wide to accomodate board-to-board connectors, in reality likely significantly wider than that so as for connectors to not interfere with DDR3 layout. Leaving SODIMM hanging off the edge is a bad idea mechanically. With added margins to both sides of SODIMM connector (you will probably have some traces there as well), the board size is going to encroach onto the "magic" size of 10x10 cm, which is the point at which the cost of manufacturing will start raising quite a bit. This also increases the size of a carrier board, which will need to have at least 4 layers because of impedance requirements for high-speed lines, so manufacturing cost for it is going to be significant (even if it's going to be a one-time investment as they can be reused).

Quote from: nockieboy on January 19, 2023, 05:28:16 pm

I doubt I'll every be running Quake or Crysis on the thing using my Z80 system, but the further down this project I'm going, the more I'm considering using a soft-core processor to replace the 'host system' entirely, as I'm really enjoying learning about FPGAs and their flexibility; far more so than designing and putting together PCBs based on old/ancient technology from the 80's and 90's.

That is a long overdue in my opinion

Besides, you can design your own softcore CPU for lots of additional fun! Even if you only manage to make it run at "pedestrian" 50 MHz, that is still head and shoulders above what Z80 can do.

asmi · « **Reply #148 on:** January 19, 2023, 09:45:12 pm »

Please select the "SODIMMs" -> "MT16KTF1G64HZ-1G6" as you module of choice. This is an 8GB dual-rank module, which will be pretty much the maximum possible configuration, and we should do a layout for that. As smaller modules are backward-compatible, you can use pretty much any other 64bit non-ECC module by just updating MIG configuration for a specific one you happen to have, and layout will still work.

Quote from: nockieboy on January 19, 2023, 08:58:12 pm

@asmi - what settings should I be specifying here? I presume the Memory Options remain at default settings:

Nope - select "5000 ps (200 MHz)" option.

Quote from: nockieboy on January 19, 2023, 08:58:12 pm

What about System Clock and Reference Clock? Are these both single-ended or should I leave them set to 'Differential'?

Leave "differential" for the system clock, and select "Use System Clock" option for the reference clock - this option will appear once you set Input Clock Period to 200 MHz on a previous page.

Quote from: nockieboy on January 19, 2023, 08:58:12 pm

Internal Termination for High Range Banks - guessing 50 Ohms is okay?

Yes it is.

Quote from: nockieboy on January 19, 2023, 08:58:12 pm

Here's the Bank Selection:

That's a good initial selection. We might need to swap byte groups around later for layout reasons.

Quote from: nockieboy on January 19, 2023, 08:58:12 pm

And finally System Signals Selection. Not sure what do with Reference Clock Pin Selection - I can't select 'Use System Clock' for it and am unsure of the best selection if I'm going to run a separate reference clock in. Presumably all the System Signals will run from the easiest-to-access pins in the same bank as the control signals?

If you do everything exactly as I said above, you will have only one option for the System Clock Pin Selection in bank 15 - K18/K19, which is what you should select, and the "Reference Clock Pin Selection" controls will be greyed out, so you don't have to select anything there.

nockieboy · « **Reply #149 on:** January 20, 2023, 01:49:28 pm »

Quote from: asmi on January 19, 2023, 09:18:06 pm

Take a look at banks layout in attachment. As you can see, SODIMM will need to be placed vertically along the right side of FPGA (because that's where banks 14, 15 and 16 are). Now let's figure out where board-to-board connectors are going to be. We have banks 34, 35 (on the left side), bank 13 (on the bottom), and the MGT bank 216 (on the top). So where would you place those connectors? The logical place seems to be along the top and bottom sides. But in this case right sides of those connectors might have trouble finding space because DDR3 layout typically takes up a lot of space. It also means that the board needs to be at least ~75 mm tall to accomodate SODIMM connector (it's footprint is 75x36 mm with memory module connected and secured), and at least 53 mm wide to accomodate board-to-board connectors, in reality likely significantly wider than that so as for connectors to not interfere with DDR3 layout. Leaving SODIMM hanging off the edge is a bad idea mechanically. With added margins to both sides of SODIMM connector (you will probably have some traces there as well), the board size is going to encroach onto the "magic" size of 10x10 cm, which is the point at which the cost of manufacturing will start raising quite a bit. This also increases the size of a carrier board, which will need to have at least 4 layers because of impedance requirements for high-speed lines, so manufacturing cost for it is going to be significant (even if it's going to be a one-time investment as they can be reused).

Okay, so mezzanine connectors top and bottom, SODIMM socket to the right. I'm thinking a vertical SODIMM is best due to space constraints. Until I know the final BOM for the core card and start routing it all on the PCB, it's hard to put an exact estimate on the finished PCB's size, but there's no parts-based reason why the core card can't be less than 100x100mm - in fact, 90x70mm is looking generous at this very early stage.

Quote from: asmi on January 19, 2023, 09:18:06 pm

Quote from: nockieboy on January 19, 2023, 05:28:16 pm
I doubt I'll every be running Quake or Crysis on the thing using my Z80 system, but the further down this project I'm going, the more I'm considering using a soft-core processor to replace the 'host system' entirely, as I'm really enjoying learning about FPGAs and their flexibility; far more so than designing and putting together PCBs based on old/ancient technology from the 80's and 90's.
That is a long overdue in my opinion
Besides, you can design your own softcore CPU for lots of additional fun! Even if you only manage to make it run at "pedestrian" 50 MHz, that is still head and shoulders above what Z80 can do.

Indeed. Well, it's all this talk from BrianHG about what we could do with a 16-bit (or better) CPU at the helm. I've had a very enjoyable stroll down memory lane designing, building and even teaching myself assembly for my Z80 DIY computer, but it seems this GPU project has taken on a life of its own, with an awful lot of potential. My biggest concern at the moment is that its capabilities are already outstripping my programming skills - and certainly my free time - to do it justice. So switching to a more generic 'development board' PCB and eliminating the need for specialist hardware (my uCOM Z80 host) means anyone could create a board, download the project and have a working games machine of some description.

Quote from: asmi on January 19, 2023, 09:45:12 pm

Nope - select "5000 ps (200 MHz)" option.

Hmm - I can't. The highest value allowed is 3,300 ps. If I switch to DDR2 SDRAM in the memory selection, it will allow a value of 5,000 ps, but I thought we were looking at DDR3 SODIMMs?

asmi · « **Reply #150 on:** January 20, 2023, 03:12:40 pm »

Quote from: nockieboy on January 20, 2023, 01:49:28 pm

Hmm - I can't. The highest value allowed is 3,300 ps. If I switch to DDR2 SDRAM in the memory selection, it will allow a value of 5,000 ps, but I thought we were looking at DDR3 SODIMMs?

You mixed up two screens - the one which sets the memory frequency and selects a memory device (title is "Options for Controller 0 - DDR3 SDRAM"), with the one which sets the input clock frequency (title is "Memory Options C0 - DDR3 SDRAM"). On the former you set 2500 ps (400 MHz), on the latter - 5000 ps (200 MHz). See attached screenshots.

Also, I just realized that there are Micron's MT16KTF1G64HZ-1G6E1 8GB SODIMMs on Amazon for like $20. Tempted to order one. Or two

nockieboy · « **Reply #151 on:** January 20, 2023, 11:04:25 pm »

That seems to have worked. Vivado completed its synthesis run on the SODIMM module, with these settings:

Code: [Select]

Vivado Project Options:
   Target Device                   : xc7a100t-fgg484
   Speed Grade                     : -2
   HDL                             : verilog
   Synthesis Tool                  : VIVADO

If any of the above options are incorrect,   please click on "Cancel", change the CORE Generator Project Options, and restart MIG.

MIG Output Options:
   Module Name                     : mig_7series_SODIMM
   No of Controllers               : 1
   Selected Compatible Device(s)   : xc7a35t-fgg484, xc7a50t-fgg484, xc7a75t-fgg484, xc7a15t-fgg484

FPGA Options:
   System Clock Type               : Differential
   Reference Clock Type            : Use System Clock
   Debug Port                      : OFF
   Internal Vref                   : disabled
   IO Power Reduction              : ON
   XADC instantiation in MIG       : Enabled

Extended FPGA Options:
   DCI for DQ,DQS/DQS#,DM          : enabled
   Internal Termination (HR Banks) : 50 Ohms

/*******************************************************/
/*                  Controller 0                       */
/*******************************************************/

Controller Options :
   Memory                        : DDR3_SDRAM
   Interface                     : NATIVE
   Design Clock Frequency        : 2500 ps (400.00 MHz)
   Phy to Controller Clock Ratio : 4:1
   Input Clock Period            : 4999 ps
   CLKFBOUT_MULT (PLL)           : 4
   DIVCLK_DIVIDE (PLL)           : 1
   VCC_AUX IO                    : 1.8V
   Memory Type                   : SODIMMs
   Memory Part                   : MT16KTF1G64HZ-1G6
   Equivalent Part(s)            : --
   Data Width                    : 64
   ECC                           : Disabled
   Data Mask                     : enabled
   ORDERING                      : Strict

AXI Parameters :
   Data Width                    : 512
   Arbitration Scheme            : RD_PRI_REG
   Narrow Burst Support          : 0
   ID Width                      : 4

Memory Options:
   Burst Length (MR0[1:0])          : 8 - Fixed
   Read Burst Type (MR0[3])         : Sequential
   CAS Latency (MR0[6:4])           : 6
   Output Drive Strength (MR1[5,1]) : RZQ/7
   Rtt_NOM - ODT (MR1[9,6,2])       : RZQ/4
   Rtt_WR - Dynamic ODT (MR2[10:9]) : Dynamic ODT off
   Memory Address Mapping           : BANK_ROW_COLUMN

Bank Selections:
	Bank: 14
		Byte Group T1:	DQ[40-47]
		Byte Group T2:	DQ[48-55]
		Byte Group T3:	DQ[56-63]
	Bank: 15
		Byte Group T0:	Address/Ctrl-0
		Byte Group T1:	Address/Ctrl-1
		Byte Group T2:	Address/Ctrl-2
		Byte Group T3:	DQ[32-39]
	Bank: 16
		Byte Group T0:	DQ[0-7]
		Byte Group T1:	DQ[8-15]
		Byte Group T2:	DQ[16-23]
		Byte Group T3:	DQ[24-31]

System_Clock: 
	SignalName: sys_clk_p/n
		PadLocation: K18/K19(CC_P/N)  Bank: 15

System_Control: 
	SignalName: sys_rst
		PadLocation: No connect  Bank: Select Bank
	SignalName: init_calib_complete
		PadLocation: No connect  Bank: Select Bank
	SignalName: tg_compare_error
		PadLocation: No connect  Bank: Select Bank

Hopefully that's all good.

Time to start working on learning more Vivado and also how to start porting BrianHG's multi-port adaptor across - or is there anything else I should be thinking about before that?

asmi · « **Reply #152 on:** January 21, 2023, 07:15:40 am »

Quote from: nockieboy on January 20, 2023, 01:49:28 pm

Indeed. Well, it's all this talk from BrianHG about what we could do with a 16-bit (or better) CPU at the helm. I've had a very enjoyable stroll down memory lane designing, building and even teaching myself assembly for my Z80 DIY computer, but it seems this GPU project has taken on a life of its own, with an awful lot of potential. My biggest concern at the moment is that its capabilities are already outstripping my programming skills - and certainly my free time - to do it justice. So switching to a more generic 'development board' PCB and eliminating the need for specialist hardware (my uCOM Z80 host) means anyone could create a board, download the project and have a working games machine of some description.

Well with 8GB SODIMM you will need to design a 64bit CPU so that you can address that much RAM and still have some address space for memory mapped I/O - unless you want to go the way Intel went with PAE and similar hacks to get around 4GB address space limit on a 32 bit system.
Another advantage of using some sort of softcore is that if you pick some well-known architecture (like RISC-V), you can take advantage of existing gcc toolchain and write code in C as opposed to assembly, which will speed up development tremendously. Also this will allow others to join in because they can actually replicate a hardware setup on their own - not everyone wants to mess with all that ancient stuff like Z80. I for one prefer more-or-less modern tech (maybe because I was too young when the likes of Z80 reigned supreme), and since I happen to be a professional software developer (with hardware being a side hustle of mine which began as a hobby and passion from my university days), using C/C++ will make me that much more likely to be willing to invest what little spare time I have into contributing to the project.

Quote from: nockieboy on January 20, 2023, 11:04:25 pm

Hopefully that's all good.

Looks good to me.

Quote from: nockieboy on January 20, 2023, 11:04:25 pm

Time to start working on learning more Vivado and also how to start porting BrianHG's multi-port adaptor across - or is there anything else I should be thinking about before that?

I would create a simple testbench just to practice talking to controller first. I can throw together a quick one for you to help you get started in a day or two if you don't mind waiting a bit. Once you feel confident enough with the interface, I would implement a simple simulation-only BFM (Bus Functional Model) of controller to speed up simulation. That's how I typically do sims - replace modules not relevant to component I'm focusing on with their simplified models, and only do a full-up high fidelity simulation as the final step before moving onto the real hardware.

nockieboy · « **Reply #153 on:** January 21, 2023, 09:09:20 am »

Quote from: asmi on January 21, 2023, 07:15:40 am

Well with 8GB SODIMM you will need to design a 64bit CPU so that you can address that much RAM and still have some address space for memory mapped I/O - unless you want to go the way Intel went with PAE and similar hacks to get around 4GB address space limit on a 32 bit system.

For an 8-bit processor like the Z80, it's phenomenal cosmic power...

... and itty-bitty 64KB memory space is easily expanded to 4MB in my uCOM with a very simple MMU.

I presume you mean Physical Address Extension, and not Prostate Artery Embolisation which was my first hit on Google I when went to find out what it was?

That sounds like an MMU to me.

Quote from: asmi on January 21, 2023, 07:15:40 am

Another advantage of using some sort of softcore is that if you pick some well-known architecture (like RISC-V), you can take advantage of existing gcc toolchain and write code in C as opposed to assembly, which will speed up development tremendously. Also this will allow others to join in because they can actually replicate a hardware setup on their own - not everyone wants to mess with all that ancient stuff like Z80.

Exactly, and that's another very strong reason for me to consider moving on. Plus, I've realised I just don't have the time to design a 16-bit computer and write the bootstrap (or should I say Kickstart?

) ROM software for it.

And yes, 'ancient stuff like Z80', well that was for me and me alone. When I started the project, I wanted to go back and really understand something that, as a kid, was a bit of a black-box for me and I literally just wanted to see if I could make a working Z80 computer on breadboard. Now I'm feeling the real learning (and potential) is in the FPGA itself. Designing the board for this device is going to take me waaaaay further than anything I'd have made for a 16-bit system.

And opening the project up so that any other interested parties can get involved with a minimum of hurdles to jump is a bonus. It's also been on my mind for the last few days regarding the design. What am I actually looking to make, here?

Originally, I wanted a core board that I can plug into a carrier that will act as an interface to a host system (i.e. my uCOM, or anything else) and be sufficiently generic to allow any variety of carrier boards to be used with it. I'd design a carrier board to fit my uCOM, with HDMI and USB ports, port the GPU HDL and I'd be more than happy. However, now there's a major conceptual change brewing in the design. I have, after all, met all my stated objectives with the uCOM by plugging a DECA card into the stack and using the GPU HDL as it stands. There's nowhere really to go with that project, other than designing a dedicated card for it (which is what the core/carrier combo of this new project would be). But it's not for general release and so what's the point?

However, if I'm going to dispense with the hardware host and move that into the FPGA itself, that releases a LOT of constraints. For one, it frees up around 60 I/O on the FPGA, as I won't need full Address, Data and Control bus access via the carrier card. The majority will be internal to the core card.

It also raises the question of the core/carrier combination entirely. Is it needed? The flexibility of having a carrier was so that I could have a very niche, one-off card for my uCOM and everyone else could do what they wanted. If I don't need that niche one-off carrier anymore, why not just design a single board large enough to accommodate the SODIMM and optional FPGA-cooling comfortably, as the remainder of the peripherals for my implementation of an 8-, 16- or more bit computer will be sufficiently generic that they'll be useful to anyone else anyway (i.e. HDMI port, USB port, SD card).

Quote from: asmi on January 21, 2023, 07:15:40 am

I for one prefer more-or-less modern tech (maybe because I was too young when the likes of Z80 reigned supreme), and since I happen to be a professional software developer (with hardware being a side hustle of mine which began as a hobby and passion from my university days), using C/C++ will make me that much more likely to be willing to invest what little spare time I have into contributing to the project.

I'm certainly no professional programmer and what I do know is all self-taught. I'm a hobbyist from start to finish, hence all the silly questions and silly mistakes.

Quote from: asmi on January 21, 2023, 07:15:40 am

I would create a simple testbench just to practice talking to controller first. I can throw together a quick one for you to help you get started in a day or two if you don't mind waiting a bit. Once you feel confident enough with the interface, I would implement a simple simulation-only BFM (Bus Functional Model) of controller to speed up simulation. That's how I typically do sims - replace modules not relevant to component I'm focusing on with their simplified models, and only do a full-up high fidelity simulation as the final step before moving onto the real hardware.

Okay, that sounds great, thank you.

I'm going to think about this design in more detail. SODIMM does seem to offer more benefits than downsides, plus some huge memory capacities at cheap prices. With a SODIMM and an XC7A100T, you could just add a 32-bit soft core processor and probably install Linux on the thing...

BrianHG · « **Reply #154 on:** January 21, 2023, 09:53:23 am »

@nockieboy, it is OK to wire for 8gb as it will just add 1-2 more IOs. However, some of the 8gb sticks are dual rank (16 ram chips on a module), IE same addresses as 4gb, but using a second CKE pin so you may interleave commands to both banks to hide the RAS-CAS row setup in the first 8 ram chips while the alternate 8 ram chips are in the middle of a transfer. (Not for you to know, this is the job of the ram controller to handle behind your back, streamlines ram access if your data processing pipe is aware of the extended pipeline delay and is designed to make use of it.) Single rank 8gb sticks with 8x ddr3 ram chips also exist. Be careful with your choice of ram. Modules with 4x ddr3 chips will have the least electrical load on the command and clock lines.

nockieboy · « **Reply #155 on:** January 21, 2023, 10:14:46 am »

Quote from: BrianHG on January 21, 2023, 09:53:23 am

@nockieboy, it is OK to wire for 8gb as it will just add 1-2 more IOs. However, some of the 8gb sticks are dual rank (16 ram chips on a module), IE same addresses as 4gb, but using a second CKE pin so you may interleave commands to both banks to hide the RAS-CAS row setup in the first 8 ram chips while the alternate 8 ram chips are in the middle of a transfer. (Not for you to know, this is the job of the ram controller to handle behind your back, streamlines ram access if your data processing pipe is aware of the extended pipeline delay and is designed to make use of it.) Single rank 8gb sticks with 8x ddr3 ram chips also exist. Be careful with your choice of ram. Modules with 4x ddr3 chips will have the least electrical load on the command and clock lines.

That makes sense. So more care must be taken to select suitable sticks if I'm looking for an 8GB one.

Maybe asmi is better able to answer this, but would it be possible to read the SODIMM and adjust the MIG's memory controller to the specific SODIMM's memory chip timings? I don't know how complex or flexible the HDL generated by the MIG is (being a relative HDL simpleton in the Vivado/Xilinx environment, you understand), but if it's trivial to add that flexibility and allow someone to use a wider variety of SODIMMs before they have to edit the HDL and compile a new bitstream, that's got to be a good thing?

Even if it can't adjust for single/dual-rank sticks, but can adjust between certain families of chips on those sticks, any added flexibility like that would be a bonus.

asmi · « **Reply #156 on:** January 21, 2023, 02:57:07 pm »

Quote from: BrianHG on January 21, 2023, 09:53:23 am

@nockieboy, it is OK to wire for 8gb as it will just add 1-2 more IOs. However, some of the 8gb sticks are dual rank (16 ram chips on a module), IE same addresses as 4gb, but using a second CKE pin so you may interleave commands to both banks to hide the RAS-CAS row setup in the first 8 ram chips while the alternate 8 ram chips are in the middle of a transfer. (Not for you to know, this is the job of the ram controller to handle behind your back, streamlines ram access if your data processing pipe is aware of the extended pipeline delay and is designed to make use of it.) Single rank 8gb sticks with 8x ddr3 ram chips also exist. Be careful with your choice of ram. Modules with 4x ddr3 chips will have the least electrical load on the command and clock lines.

If you use any of modules supported by MIG, you won't have to worry about any of this. To you it's all going to look like a single contigious address space, with rank being the MSB of address, then either bank-row-column or row-bank-column (depending on the option you choose while generating a core). You also get to select a number of bank machines in a controller, with 4 being a default, but you can change it to any value between 2 and 8. This number basically means a number of rows in different banks which can be open at the same time, so depending on you access pattern, having more or less bank machines allows you to optimize controller. Same goes for transaction ordering - it's set to "Strict" by default, but you can set it to "Normal" to allow for transaction reordering for higher efficiency.

Quote from: nockieboy on January 21, 2023, 10:14:46 am

That makes sense. So more care must be taken to select suitable sticks if I'm looking for an 8GB one.

The best bet would be to pick a module from the list in the MIG. And since MT16KTF1G64HZ is only about $20, I don't see any reason to even bother with smaller modules.

Quote from: nockieboy on January 21, 2023, 10:14:46 am

Maybe asmi is better able to answer this, but would it be possible to read the SODIMM and adjust the MIG's memory controller to the specific SODIMM's memory chip timings? I don't know how complex or flexible the HDL generated by the MIG is (being a relative HDL simpleton in the Vivado/Xilinx environment, you understand), but if it's trivial to add that flexibility and allow someone to use a wider variety of SODIMMs before they have to edit the HDL and compile a new bitstream, that's got to be a good thing?

Even if it can't adjust for single/dual-rank sticks, but can adjust between certain families of chips on those sticks, any added flexibility like that would be a bonus.

No, it's absolutely NOT flexible by design, but since adapting it to a different module only involves making few changes in a GUI (as opposed to messing with HDL) and perhaps dealing with resulting trivial HDL changes (like UI address width becoming smaller if you use a module of smaller capacity), I don't really see it as a big deal. We might want to leave a possibility of reading SPD so that the software can adapt itself to different amount of RAM, that's going to involve another level shifter if we want to use few remaining pins from DDR3 banks for that.

nockieboy · « **Reply #157 on:** January 21, 2023, 10:25:34 pm »

Quote from: asmi on January 21, 2023, 02:57:07 pm

Quote from: BrianHG on January 21, 2023, 09:53:23 am
@nockieboy, it is OK to wire for 8gb as it will just add 1-2 more IOs. However, some of the 8gb sticks are dual rank (16 ram chips on a module), IE same addresses as 4gb, but using a second CKE pin so you may interleave commands to both banks to hide the RAS-CAS row setup in the first 8 ram chips while the alternate 8 ram chips are in the middle of a transfer. (Not for you to know, this is the job of the ram controller to handle behind your back, streamlines ram access if your data processing pipe is aware of the extended pipeline delay and is designed to make use of it.) Single rank 8gb sticks with 8x ddr3 ram chips also exist. Be careful with your choice of ram. Modules with 4x ddr3 chips will have the least electrical load on the command and clock lines.
If you use any of modules supported by MIG, you won't have to worry about any of this. To you it's all going to look like a single contigious address space, with rank being the MSB of address, then either bank-row-column or row-bank-column (depending on the option you choose while generating a core). You also get to select a number of bank machines in a controller, with 4 being a default, but you can change it to any value between 2 and 8. This number basically means a number of rows in different banks which can be open at the same time, so depending on you access pattern, having more or less bank machines allows you to optimize controller. Same goes for transaction ordering - it's set to "Strict" by default, but you can set it to "Normal" to allow for transaction reordering for higher efficiency.

@BrianHG - do you have thoughts/preferences on these options at all for working best with your multi-port module? I've defaulted to bank-row-column, from what asmi is saying it'll be rank-bank-row-column (don't try saying that when you're drunk!), although the rank part is dealt with by the MIG interface.

What about bank machines? More the merrier, or is 4 the sweet spot? I guess you'd want as many bank machines as you have ports (up to max 8 ), as each port could be using a different row? Depends on resource usage I guess?

What about transaction ordering? Any preference there for maximum compatibility/performance?

Quote from: asmi on January 21, 2023, 02:57:07 pm

The best bet would be to pick a module from the list in the MIG. And since MT16KTF1G64HZ is only about $20, I don't see any reason to even bother with smaller modules.

Righto.

Quote from: asmi on January 21, 2023, 02:57:07 pm

No, it's absolutely NOT flexible by design, but since adapting it to a different module only involves making few changes in a GUI (as opposed to messing with HDL) and perhaps dealing with resulting trivial HDL changes (like UI address width becoming smaller if you use a module of smaller capacity), I don't really see it as a big deal. We might want to leave a possibility of reading SPD so that the software can adapt itself to different amount of RAM, that's going to involve another level shifter if we want to use few remaining pins from DDR3 banks for that.

Yes, true, I just thought it was worth the ask in case we could make it even slightly tolerant of SODIMM variation without having to generate a new bitstream.

nockieboy · « **Reply #158 on:** January 22, 2023, 07:14:15 pm »

Whilst I'm thinking about it, what should I be looking for when I search for a commonly-used fan to cool an FPGA like the XC7A FGG484? I'm not having much luck finding FPGA coolers, or just PCB-mounting fans, in EasyEDA or Mouser, for that matter. My Google-fu is failing me on this...

I just want a PCB footprint really, so I have an idea where to include mounting holes if a fan/heatsink combo is required.

asmi · « **Reply #159 on:** January 22, 2023, 10:31:20 pm »

Quote from: nockieboy on January 22, 2023, 07:14:15 pm

Whilst I'm thinking about it, what should I be looking for when I search for a commonly-used fan to cool an FPGA like the XC7A FGG484? I'm not having much luck finding FPGA coolers, or just PCB-mounting fans, in EasyEDA or Mouser, for that matter. My Google-fu is failing me on this...

I just want a PCB footprint really, so I have an idea where to include mounting holes if a fan/heatsink combo is required.

I'm think of something like this one: https://www.mouser.ca/ProductDetail/Wakefield-Vette/960-23-33-D-AB-0?qs=PqoDHHvF64%2FNvAsuJB%2Fzyw%3D%3D
Or we can use a one size up one: https://www.mouser.ca/ProductDetail/Wakefield-Vette/960-27-33-D-AB-0?qs=PqoDHHvF64%2FM6UHqLS5KaQ%3D%3D

As for fans, since we only going to have a 5V rail, we'll need to use 5 V fans, and there isn't that many of them - which is why I'm thinking if it would make sense to use a larger heatsink (like 27 mm I listed above), as it would allow using 25 mm fans, of which there is a larger selection.

----
I've created a quick testbench for MIG, you the code is here: https://github.com/asmi84/a100-484-sodimm It just waits until controller completes initialization (signal "init_calib_complete" goes high), executes a bunch of writes to addresses 0, 0x8, 0x10 (to see back-to-back bursts), as well as 0x20000000 to test access to the second rank, and then performs reads from those addresses (in the same order as writes). Just load the project, click "Run Simulation" on the left panel, and wait - it takes a while, on my PC it takes about 10-11 minutes to complete. That's why you will need to come up with some kind of BFM as waiting that long is going to make development too slow and too annoying. 32bit controller took about half as long.

BrianHG · « **Reply #160 on:** January 23, 2023, 09:44:03 am »

Quote from: nockieboy on January 21, 2023, 10:25:34 pm

@BrianHG - do you have thoughts/preferences on these options at all for working best with your multi-port module? I've defaulted to bank-row-column, from what asmi is saying it'll be rank-bank-row-column (don't try saying that when you're drunk!), although the rank part is dealt with by the MIG interface.

I'm thinking that 'rank-bank-row-column' gives you the opportunity to create a 2 rank controller for when a 8gig stick is installed and it may still function with a 4gig stick except that everything above 4 gig will read and write as blank or error. It depends on Vivado's ram controller's flexibility when in operation.

This is what is meant by the 'order'

Say we have a 33bit address. The rank-bank-row-column means that the address being wired to the ram modules will be:

Desired_33bit_address [32:0] = address_assigned_in_ram_chip { rank, bank[2:0] , row[14:0], column[9:0], null[3:0] } ;
(For Vivado's ram controller, just ignore the bottom 4 null bits)

The 'rank' controls the 2 separate CS# (S1#/S0# chip select on module) pins for the first 8 and second 8 ram chips wired in parallel except for the CS and CKE pins. Basically it is wired as an upper address line in the 'rank-bank-row-column' order. The bank[2:0] selects 1 of 8 banks in each ram chip. The row selects the row address. The column[9:0] selects which column to read. I placed the null[3:0] there since even though the module is a 64bit DDR/ 128bit, I still point the address down to they theoretical byte even though those address lines in the command are ignored. (Assuming Vivado's ram controller places these dummy address lines in their command like mine. Also, my controller also ignores the column[2:0] and forces 0s there as well as it's read write minimum burst size 8 (Called a BL8) always aligns the read and write in a forward order.)

Now, why would we do this. Say we went with 'rank-row-bank-column', ie:
Desired_33bit_address [32:0] = address_assigned_in_ram_chip { rank, row[14:0], bank[2:0], column[9:0], null[3:0] } ;

Now, every sequential 16384 bytes of ram, we will switch to a new bank.
bytes 0-16383 is in bank 0.
bytes 16384-32767 is in bank 1.
ect... until bank 7, 131071 bytes. After bank 7, we go to row 1, bank 0,... ect...

With my selected preference rank-bank-row-column, ie:
Desired_33bit_address [32:0] = address_assigned_in_ram_chip { rank, bank[2:0] , row[14:0], column[9:0], null[3:0] } ;

Now the first sequential 536870912 bytes of ram are all in bank 0, though every 16384 bytes will switch to a new row for that bank 0.
Then we will go to bank 1, row 0 and up.

If all your code and loops usually fits into a block or two of 131071 bytes, then it should be advantageous to use 'rank-row-bank-column' or even 'row-rank-bank-column'. However, with out large display buffer and future display textures, sound and network buffers, operating in 'rank-bank-row-column' can offer other future memory access optimization if you properly design to do so. (IE: place display buffers 1,2,3,4 in banks 6 and 7, place textures in banks 5,4,3,2, line interleaved multiplexed, (IE you can parallel read alternate lines of a texture when filling so you can do bi-linear filtering when zooming to a texture without all the additional row precharge/activate/read/ /precharge/activate/read everytime every time you read a new Y coordinate on a texture to acquire the pixel shading blend.) code and audio in banks 0,1. In dual rank mode, make each bank size 1gb and potentially keep track of 16 banks)

Quote

What about bank machines? More the merrier, or is 4 the sweet spot? I guess you'd want as many bank machines as you have ports (up to max 8 ), as each port could be using a different row? Depends on resource usage I guess?

I do not know what Vivado's ram controller's bank machines are all about. Ask asmi. In my controller, since the DDR3 has 8 banks, my controller keeps track an keeps open all 8 banks. It will only close them as needed to optimize access. My guess, and it is only a guess, is that your should set Vivado's ram controller's bank machines to 8 to keep all 8 individual banks open memory access makes use of it.

Quote

What about transaction ordering? Any preference there for maximum compatibility/performance?

I do not know what this setting does.

For my multiport controller, I have:

Code: [Select]

// ************************************************************
// *** Controls are received from the BrianHG_DDR3_PHY_SEQ. ***
// ************************************************************
input                                SEQ_CAL_PASS        ,    // Goes low after a reset, goes high if the read calibration passes.
input                                DDR3_READY          ,    // Goes low after a reset, goes high when the DDR3 is ready to go.

input                                SEQ_BUSY_t          ,    // (*** WARNING: THIS IS A TOGGLE INPUT when parameter 'USE_TOGGLE_OUTPUTS' is 1 ***) Commands will only be accepted when this output is equal to the SEQ_CMD_ENA_t toggle input.
input                                SEQ_RDATA_RDY_t     ,    // (*** WARNING: THIS IS A TOGGLE INPUT when parameter 'USE_TOGGLE_OUTPUTS' is 1 ***) This output will toggle from low to high or high to low once new read data is valid.
input        [PORT_CACHE_BITS-1:0]   SEQ_RDATA           ,    // 256 bit date read from ram, valid when SEQ_RDATA_RDY_t goes high.
input        [DDR3_VECTOR_SIZE-1:0]  SEQ_RDVEC_FROM_DDR3 ,    // A copy of the 'SEQ_RDVEC_FROM_DDR3' input during the read request.  Valid when SEQ_RDATA_RDY_t goes high.

// ******************************************************
// *** Controls are sent to the BrianHG_DDR3_PHY_SEQ. ***
// ******************************************************
output logic                         SEQ_CMD_ENA_t       ,  // (*** WARNING: THIS IS A TOGGLE CONTROL! when parameter 'USE_TOGGLE_OUTPUTS' is 1 *** ) Begin a read or write once this input toggles state from high to low, or low to high.
output logic                         SEQ_WRITE_ENA       ,  // When high, a 256 bit write will be done, when low, a 256 bit read will be done.
output logic [PORT_ADDR_SIZE-1:0]    SEQ_ADDR            ,  // Address of read and write.  Note that ADDR[4:0] are supposed to be hard wired to 0 or low, otherwise the bytes in the 256 bit word will be sorted incorrectly.
output logic [PORT_CACHE_BITS-1:0]   SEQ_WDATA           ,  // write data.
output logic [PORT_CACHE_BITS/8-1:0] SEQ_WMASK           ,  // write data mask.
output logic [DDR3_VECTOR_SIZE-1:0]  SEQ_RDVEC_TO_DDR3   ,  // Read destination vector input.
output logic                         SEQ_refresh_hold       // Prevent refresh.  Warning, if held too long, the SEQ_refresh_queue will max out.
);

(I have a parameter to change the '_toggles' controls to positive logic. IE: normal High = on, low = off)
After the DDR3 is ready...

While you keep my 'SEQ_BUSY_t' input low, I will set 'SEQ_CMD_ENA_t' high when I am sending out a command. Otherwise it is low.

My SEQ_ADDR output will be 33bit for an 8gb module. (For Vivado's ram controller, just ignore the bottom 4 bits)
The bottom 7bits will always be 0s since I am expecting 512bit data.

My 'SEQ_WMASK' output will be 64bits (512/8) and for every bit which is high, the 8 associated data bits are expected to be written.
(Warning, Vivado's ram controller may have this inverted as this is how it is on the DDR3 ram chips)

When I send a read command, my 'SEQ_RDVEC_TO_DDR3' output will have a 4 bit ID number.

My multiport will accept read data word every clock while the (SEQ_RDATA_RDY_t) input is high. While it is high, it is expecting a 4 bit ID input 'SEQ_RDVEC_FROM_DDR3' from the ram controller with the read data on input 'SEQ_RDATA'.

If Vivado's ram controller doesn't support such a feature, you will need to create your own. So long as the reads come back in the same order as the read requests, it is nothing more than a FIFO tied to my read request, and Vivado's ram controller read ready. This FIFO just needs to be long enough to support the maximum Queued read commands Vivado's ram controller allows before an actual read is returned. It is preferred that Vivado's ram controller supports some sort of read target address/pointer function as this removes any possible synchronization error or bug.

As for the dual rank, settings in my multiport, we will just change my parameter 'DDR3_WIDTH_BANK' from 3 to 4, effectively treating the rank as another 8 banks. Basically we will set the ram chips to 8x 4gig, 8bit, but add an extra bit on the bank as the 'rank-bank-row-column' will just operate as a 16 bank memory even though it just comes across 2 groups of ram chips tied in parallel.

BrianHG · « **Reply #161 on:** January 23, 2023, 10:56:28 am »

Quote from: asmi on January 22, 2023, 10:31:20 pm

Quote from: nockieboy on January 22, 2023, 07:14:15 pm
Whilst I'm thinking about it, what should I be looking for when I search for a commonly-used fan to cool an FPGA like the XC7A FGG484? I'm not having much luck finding FPGA coolers, or just PCB-mounting fans, in EasyEDA or Mouser, for that matter. My Google-fu is failing me on this...

I just want a PCB footprint really, so I have an idea where to include mounting holes if a fan/heatsink combo is required.
I'm think of something like this one: https://www.mouser.ca/ProductDetail/Wakefield-Vette/960-23-33-D-AB-0?qs=PqoDHHvF64%2FNvAsuJB%2Fzyw%3D%3D

Or we can use a one size up one: https://www.mouser.ca/ProductDetail/Wakefield-Vette/960-27-33-D-AB-0?qs=PqoDHHvF64%2FM6UHqLS5KaQ%3D%3D

As for fans, since we only going to have a 5V rail, we'll need to use 5 V fans, and there isn't that many of them - which is why I'm thinking if it would make sense to use a larger heatsink (like 27 mm I listed above), as it would allow using 25 mm fans, of which there is a larger selection.

For a chip called Artix, you would think it runs cool, or at least a heat sink should be enough.
If you want a fan, you can also try looking for a faned heatsink. The max10 50kle barely gets warm with no heatsink on it. Though I am expecting that Nockieboy can move his core logic from 100Mhz (damn ellipse and line generator was holding us back) to 200MHz on the Artix, and with double logic size, the heat generated could be up to quadruple.

25mm with heatsink, 2 for 5$

30mm with heatsink, 1 for 4$

asmi · « **Reply #162 on:** January 23, 2023, 03:12:50 pm »

Quote from: BrianHG on January 23, 2023, 10:56:28 am

For a chip called Artix, you would think it runs cool, or at least a heat sink should be enough.

You can think that, or you can ask anyone who actually used these chips how cool they are. I needed a heatsink even for A35T device, though it depends on many factors, including board-level stuff like presense of and proximity to other heat-generating components, size of a board, layer composition, etc.

Quote from: BrianHG on January 23, 2023, 10:56:28 am

If you want a fan, you can also try looking for a faned heatsink. The max10 50kle barely gets warm with no heatsink on it. Though I am expecting that Nockieboy can move his core logic from 100Mhz (damn ellipse and line generator was holding us back) to 200MHz on the Artix, and with double logic size, the heat generated could be up to quadruple.

Fan might not be required, but it's good to add a provision for it just in case.

Quote from: BrianHG on January 23, 2023, 10:56:28 am

25mm with heatsink, 2 for 5$

30mm with heatsink, 1 for 4$

If you sanity is worth anything to you, you would never buy a fan on Aliexpress.

nockieboy · « **Reply #163 on:** January 23, 2023, 06:16:43 pm »

Quote from: asmi on January 22, 2023, 10:31:20 pm

Quote from: nockieboy on January 22, 2023, 07:14:15 pm
Whilst I'm thinking about it, what should I be looking for when I search for a commonly-used fan to cool an FPGA like the XC7A FGG484? I'm not having much luck finding FPGA coolers, or just PCB-mounting fans, in EasyEDA or Mouser, for that matter. My Google-fu is failing me on this...

I just want a PCB footprint really, so I have an idea where to include mounting holes if a fan/heatsink combo is required.
I'm think of something like this one: https://www.mouser.ca/ProductDetail/Wakefield-Vette/960-23-33-D-AB-0?qs=PqoDHHvF64%2FNvAsuJB%2Fzyw%3D%3D
Or we can use a one size up one: https://www.mouser.ca/ProductDetail/Wakefield-Vette/960-27-33-D-AB-0?qs=PqoDHHvF64%2FM6UHqLS5KaQ%3D%3D

As for fans, since we only going to have a 5V rail, we'll need to use 5 V fans, and there isn't that many of them - which is why I'm thinking if it would make sense to use a larger heatsink (like 27 mm I listed above), as it would allow using 25 mm fans, of which there is a larger selection.

Marvellous - thank you. Funny how I couldn't find anything, but now I've got the name/category I keep falling over them.

Quote from: asmi on January 22, 2023, 10:31:20 pm

I've created a quick testbench for MIG, you the code is here: https://github.com/asmi84/a100-484-sodimm It just waits until controller completes initialization (signal "init_calib_complete" goes high), executes a bunch of writes to addresses 0, 0x8, 0x10 (to see back-to-back bursts), as well as 0x20000000 to test access to the second rank, and then performs reads from those addresses (in the same order as writes). Just load the project, click "Run Simulation" on the left panel, and wait - it takes a while, on my PC it takes about 10-11 minutes to complete. That's why you will need to come up with some kind of BFM as waiting that long is going to make development too slow and too annoying. 32bit controller took about half as long.

Brilliant, thanks for this. I've cloned the repo and will give it a spin as soon as I have enough time to sit and start making sense of it all.

Quote from: BrianHG on January 23, 2023, 09:44:03 am

I'm thinking that 'rank-bank-row-column' gives you the opportunity to create a 2 rank controller for when a 8gig stick is installed and it may still function with a 4gig stick except that everything above 4 gig will read and write as blank or error. It depends on Vivado's ram controller's flexibility when in operation.

Thanks for the explanation BrianHG, it's making more sense to me now. It really feels like I've a hell of a lot to learn about memory and controllers before I can truly understand everything you've written about the controller settings etc.

Quote from: BrianHG on January 23, 2023, 09:44:03 am

Quote
What about bank machines? More the merrier, or is 4 the sweet spot? I guess you'd want as many bank machines as you have ports (up to max 8 ), as each port could be using a different row? Depends on resource usage I guess?
I do not know what Vivado's ram controller's bank machines are all about. Ask asmi. In my controller, since the DDR3 has 8 banks, my controller keeps track an keeps open all 8 banks. It will only close them as needed to optimize access. My guess, and it is only a guess, is that your should set Vivado's ram controller's bank machines to 8 to keep all 8 individual banks open memory access makes use of it.

@asmi - is BrianHG on the money here? Makes sense to me to have 8 bank machines, but you're the Xilinx expert here.

Quote from: asmi on January 23, 2023, 03:12:50 pm

Fan might not be required, but it's good to add a provision for it just in case.

I agree. Whilst there are some designs that absolutely won't require one, that's not to say that the GPU project or some other project someone uses won't need one. If I can include provision for one, that's a Good Thing^TM.

Quote from: asmi on January 23, 2023, 03:12:50 pm

If you sanity is worth anything to you, you would never buy a fan on Aliexpress.

I have to agree with asmi here. I don't know why, but something just seems to not sit right with a hydraulic-bearing 25x25mm cooling fan, plus heatsink (albeit not a very tall one), for £2.94 (~$4)... for two. It's going to cost more than that to post them from China to my house...

Anyhow, a cooler on its own from Mouser is over double that, but it's all about options I guess.

EDIT: Oh, nearly forgot. I'm looking more and more seriously at scrubbing the core/carrier combination and just doing one PCB with everything on it - it's looking like a better idea all the time, not least from a cost and complexity perspective, with the preferred SODIMM memory. This is even more important now I'm thinking of moving to soft-core CPUs within the FPGA and not bothering supporting a discrete host system. I need to give some thought to required peripherals on the board:

USB OTG or USB HOST is a must
Ethernet
Audio codec
HDMI output
Sufficient IO for peripherals - PMOD, Beaglebone or generic pin-strip connectors

...plus (and I know we're coming back to the stub issue), a neat way to program/update the FPGA via USB, and USB serial comms.

BrianHG · « **Reply #164 on:** January 23, 2023, 07:50:46 pm »

Quote from: nockieboy on January 23, 2023, 06:16:43 pm

Quote from: BrianHG on January 23, 2023, 09:44:03 am
I'm thinking that 'rank-bank-row-column' gives you the opportunity to create a 2 rank controller for when a 8gig stick is installed and it may still function with a 4gig stick except that everything above 4 gig will read and write as blank or error. It depends on Vivado's ram controller's flexibility when in operation.

Thanks for the explanation BrianHG, it's making more sense to me now. It really feels like I've a hell of a lot to learn about memory and controllers before I can truly understand everything you've written about the controller settings etc.

It is less about understanding Vivado's or my DDR3 controller. It is about understanding how a DDR3 ram chip works.

Download this data sheet: https://media-www.micron.com/-/media/client/global/documents/products/data-sheet/dram/ddr3/4gb_ddr3l.pdf?rev=c2e67409c8e145f7906967608a95069f

Take a look at page 16, figure 4. This is one of the ram chips on your SODIMM module.
Your module has 2 groups of 8 of them in parallel giving you a 64 bit data bus.
The 2 different groups are the 2 different 'rank'.

Looking at the block diagram, you will see banks 0 through 7, 8 of them.
The chip appears as 8 smaller DDR3 ram chips inside a single chip.

To perform a read, you need to look at a few pages:

Step 1) First, if the current/previously activated row in the bank you want to access is the wrong one, you need to 'precharge' that bank to release the selected row. There is an associated minimum amount of time you must wait before executing the 'precharge'. See page 167, figure 75 for an example final 'read' command to the earliest 'precharge' command. As you can see you have to wait all the way until clock T13 before you can activate a new row.

Step 2) Second, if the data you want to read is not in a row which hasn't already been activated, you need to activate that row, page 161, figure 7. They show you an activate on clock T3 and you can only read from 'that bank's row' by clock T11. Another great delay.

Step 3) Once your bank and row has been setup, see page 165 figure 70, you see there is a delay until T5 before the first read command comes out. You will also notice that even though the first data did not come out a second read was performed at T4 and the read stream continues unbroken seen in the DQ output. This is permitted so long as you realize the data you read/or/write is in the same row. What is not illustrated is that if any of your other 8 banks also already have the correct row activated where you wish to read/or/write data, you can skip steps 1 and 2, and just read continuously as in just doing step 3 without any break in the high speed data stream.

Remember I said place your video buffer and textures/painting bitmaps in different banks. If you are reading a straight line of graphic data coming from bank 1 and writing that straight line of data to a bank 2, once the first access of each read and write addresses have 'precharged' then 'activated' their memory location rows in their associated banks, the read and write commands can run looping in step #3 without having to continuously perform the slow steps #1 and #2. If you texture and display data were both in bank 1, but in different rows, you would need to do steps 1, 2, 3 to read a 512bit block of texture data, then do steeps 1, 2 to change the row in the same bank, then an equivalent step 3 to write the 512bits. Then for the next read, back to step 1 and 2 to change the row again, and back to step 3. The other choice to radically decrease steps 1 & 2 is to have a large read and write cache, read the whole texture in 1 straight line into a FPGA BRAM buffer, then write out in a straight line to display ram.

I still say read the above .pdf data sheet so you have an idea of what the bottlenecks in using DDR3 ram is and what the controllers are doing to read and write to them. This can give you an idea in the future when you are trying to optimize the performance of your design or code / asset placement in ram to help speed things along.

Quote

@asmi - is BrianHG on the money here? Makes sense to me to have 8 bank machines, but you're the Xilinx expert here.

Do not quote me on this. I do not know if this is what is meant by Vivado's 'bank machine' settings. I only told you that my ram controller keeps track of all 8 banks and allows for all 8 to remain open. If Vivado's 'bank machine' setting has to do with the number of banks activated in advance of a command stream, a setting of 4 may be optimal as Vivado's controller may still keep all 8 banks open until it must close one due to a new row address in that bank, or a mandatory refresh. I do not foresee this being any noticeable performance help either way until much later in the game when you really done a 3D pixel shader with some really advanced stuff.

asmi · « **Reply #165 on:** January 23, 2023, 07:57:11 pm »

Quote from: nockieboy on January 23, 2023, 06:16:43 pm

Brilliant, thanks for this. I've cloned the repo and will give it a spin as soon as I have enough time to sit and start making sense of it all.

I tried to setup the project such that you would only need to open the project and press the single button. It will simulate 150 us, of those about 120 us is taken by the initialization, memory training, etc, and then there are my commands.

Quote from: nockieboy on January 23, 2023, 06:16:43 pm

Quote from: BrianHG on January 23, 2023, 09:44:03 am
I'm thinking that 'rank-bank-row-column' gives you the opportunity to create a 2 rank controller for when a 8gig stick is installed and it may still function with a 4gig stick except that everything above 4 gig will read and write as blank or error. It depends on Vivado's ram controller's flexibility when in operation.

Thanks for the explanation BrianHG, it's making more sense to me now. It really feels like I've a hell of a lot to learn about memory and controllers before I can truly understand everything you've written about the controller settings etc.

That's actually not true for Micron modules - both MT16KTF51264HZ (4 GB) and MT16KTF1G64HZ (8 GB) are dual rank modules with 16 x8 memory devices, they use different chips - 4 GB version uses 2 Gb devices, while 8 GB version uses 4 Gb ones. I actually happen to have that exact 4 GB module (bought it for a different project which didn't come to life for various reasons). So it's not that simple, and you have to look at specs for a specific module to see if it's a single rank or dual rank. There are some DDR3 RDIMMs which are quad rank as well.

Quote from: nockieboy on January 23, 2023, 06:16:43 pm

@asmi - is BrianHG on the money here? Makes sense to me to have 8 bank machines, but you're the Xilinx expert here.

I never actually needed more than 4 open rows at the same time because I always try to optimize my access patterns to minimize open rows, so I don't really know. One thing I do know is that bank machines cost logic resources which are finite, so add more only if you need them. Thankfully, all it takes to change that number is to run the GUI again and regenerate sources, so I'd say leave it as it is for now, and you can always change it later.

Quote from: nockieboy on January 23, 2023, 06:16:43 pm

I agree. Whilst there are some designs that absolutely won't require one, that's not to say that the GPU project or some other project someone uses won't need one. If I can include provision for one, that's a Good Thing^TM.

Artix devices actually have a thermal diode on die bonded out to pins DXP and DXN, so technically it's possible to implement some kind of external fan controller (you will need to add a temp sense IC which would perform the measurement and give you a temperature reading) and do a PWM with a FET to reduce the noise when you don't need a full blast, but that will require some kind of microcontroller, and I don't think it's worth it to add an MCU just for that purpose, but I just wanted to let you know that this feature is available.
Also these FPGA have dual multichannel 12bit 1 Msps ADCs, which can also measure things like die temperature and power supply voltage, so you can implement a PWM fan controller inside FPGA too. Infact MIG uses that very ADC to measure die temperature and perform thermal compensation of internal delays to make sure DDR3 controller work in the entire range of temperatures (which can be as wide as -40 - +125 °C depending on a device grade).

Quote from: nockieboy on January 23, 2023, 06:16:43 pm

I have to agree with asmi here. I don't know why, but something just seems to not sit right with a hydraulic-bearing 25x25mm cooling fan, plus heatsink (albeit not a very tall one), for £2.94 (~$4)... for two. It's going to cost more than that to post them from China to my house...

Anyhow, a cooler on its own from Mouser is over double that, but it's all about options I guess.

There are two things which I would never buy on Aliexpress (or ebay for that matter) - these are fans and computer storage (SSD, SD Cards, HDD, etc). The former because of noise (cheap stuff is cheap for a reason!), the latter because the data I tend to put on those cost much more than device itself, so I prefer to have something reliable.

Quote from: nockieboy on January 23, 2023, 06:16:43 pm

EDIT: Oh, nearly forgot. I'm looking more and more seriously at scrubbing the core/carrier combination and just doing one PCB with everything on it - it's looking like a better idea all the time, not least from a cost and complexity perspective, with the preferred SODIMM memory. This is even more important now I'm thinking of moving to soft-core CPUs within the FPGA and not bothering supporting a discrete host system. I need to give some thought to required peripherals on the board:
USB OTG or USB HOST is a must
Ethernet
Audio codec
HDMI output
Sufficient IO for peripherals - PMOD, Beaglebone or generic pin-strip connectors
...plus (and I know we're coming back to the stub issue), a neat way to program/update the FPGA via USB, and USB serial comms.

If it's going to be a single board, then we'll need to place everything one would need for a typical SBC (single board computer). So I would add a 4 port USB 2.0 hub (implemented using Microchip's USB2514 or USB2504A device) connected to a USB UPLI device so that you can connect multiple devices at the same time (think keyboard and mouse + perhaps some thumb drive). This also has some advantages as far as implementation does because you won't have implement USB 1.1 alongside of USB 2.0, but instead use special features of USB 2.0 to talk to slower devices, but those are small details not relevant here. Also maybe implement a PCIE, which can be switched with DisplayPort - either as an M.2 connector (for NVMe SSD) or as a full-up x4 PCIE port? There are some affordable switches which are fast enough to switch PCIE 2 traffic. I'm just thinking aloud here of what can you possibly want in a SBC.

asmi · « **Reply #166 on:** January 23, 2023, 08:44:19 pm »

In addition to what @BrianHG said, it will do you good to study the waveforms of activity on a DDR3 bus, attached is a good example of complex commanding for a dual-rank module.

Notes for the image:
1: "Precharge" command to rank 1.
2: "Write" command to a column 0 of rank 0. Line A12 is high for a full 8 beats burst (not a 4 beats chop).
3. "Write" command to a column 0x8 of rank 0. Line A12 is high for a full 8 beats burst (not a 4 beats chop). Data burst 0 for command (2) is about to begin.
4. "Refresh" command to rank 1. Data burst 0 is in progress.
5. "Write" command to a column 0x10 of rank 0. Line A12 is high for a full 8 beats burst (not a 4 beats chop). Data burst 0 for command (2) is about to complete followed by data burst 1 for command (3).

As you can see, write commands are scheduled in order to fully utilize a data bus for uninterrupted stream of data into the memory - three 8 beat bursts with no breaks between them (there is a similar stream for reading later in the timeline), while at the same time using other command slots to give commands to the second rank. You can also see that this controller has 1T commanding, meaning it can issue commands to memory on every memory clock cycle, unlike some other controllers I've seen which have 2T commanding and can only issue a command on every other clock cycle and so can't pack commands as efficiently as 1T one can.

nockieboy · « **Reply #167 on:** January 24, 2023, 06:57:04 pm »

Quote from: asmi on January 23, 2023, 07:57:11 pm

That's actually not true for Micron modules - both MT16KTF51264HZ (4 GB) and MT16KTF1G64HZ (8 GB) are dual rank modules with 16 x8 memory devices, they use different chips - 4 GB version uses 2 Gb devices, while 8 GB version uses 4 Gb ones. I actually happen to have that exact 4 GB module (bought it for a different project which didn't come to life for various reasons). So it's not that simple, and you have to look at specs for a specific module to see if it's a single rank or dual rank. There are some DDR3 RDIMMs which are quad rank as well.

I got an MT16KTF1G64HZ-1G6E1 in the post thanks to an impulse-buy on eBay for £9.

Quote from: asmi on January 23, 2023, 07:57:11 pm

Quote from: nockieboy on January 23, 2023, 06:16:43 pm
@asmi - is BrianHG on the money here? Makes sense to me to have 8 bank machines, but you're the Xilinx expert here.
I never actually needed more than 4 open rows at the same time because I always try to optimize my access patterns to minimize open rows, so I don't really know. One thing I do know is that bank machines cost logic resources which are finite, so add more only if you need them. Thankfully, all it takes to change that number is to run the GUI again and regenerate sources, so I'd say leave it as it is for now, and you can always change it later.

Okay, still with default 4 for the moment.

Quote from: asmi on January 23, 2023, 07:57:11 pm

Artix devices actually have a thermal diode on die bonded out to pins DXP and DXN, so technically it's possible to implement some kind of external fan controller (you will need to add a temp sense IC which would perform the measurement and give you a temperature reading) and do a PWM with a FET to reduce the noise when you don't need a full blast, but that will require some kind of microcontroller, and I don't think it's worth it to add an MCU just for that purpose, but I just wanted to let you know that this feature is available.
Also these FPGA have dual multichannel 12bit 1 Msps ADCs, which can also measure things like die temperature and power supply voltage, so you can implement a PWM fan controller inside FPGA too. Infact MIG uses that very ADC to measure die temperature and perform thermal compensation of internal delays to make sure DDR3 controller work in the entire range of temperatures (which can be as wide as -40 - +125 °C depending on a device grade).

Interesting. Would make sense to use the internal ADC and implement an HDL PWM with pin out to a FET to control the fan, no? Only one external part needed that way (and maybe a couple of resistors), and you get thermally-controlled fan regulation for little effort.

Quote from: asmi on January 23, 2023, 07:57:11 pm

If it's going to be a single board, then we'll need to place everything one would need for a typical SBC (single board computer). So I would add a 4 port USB 2.0 hub (implemented using Microchip's USB2514 or USB2504A device) connected to a USB UPLI device so that you can connect multiple devices at the same time (think keyboard and mouse + perhaps some thumb drive). This also has some advantages as far as implementation does because you won't have implement USB 1.1 alongside of USB 2.0, but instead use special features of USB 2.0 to talk to slower devices, but those are small details not relevant here. Also maybe implement a PCIE, which can be switched with DisplayPort - either as an M.2 connector (for NVMe SSD) or as a full-up x4 PCIE port? There are some affordable switches which are fast enough to switch PCIE 2 traffic. I'm just thinking aloud here of what can you possibly want in a SBC.

4-port USB 2.0 - yeah, I can see that being useful.
x4 PCIE? Interesting. I can see the immediate potential of an M.2 connector, but what about PCIE? Forgive my lack of imagination or knowledge, but what sort of peripherals would that open the board up to?
What about DisplayPort? Is that big thing/must have?

Is there any way the 4-port USB hub could be made use of by a soft-core CPU running something that doesn't have a USB stack like Linux? I'd still like to be able to plug a USB keyboard and mouse in with, for example, a 16-bit Motorola soft-core CPU - or even the dreaded Z80 or other 8-bit processors. This is a pretty key requirement for the board - it's up there with the SODIMM, really. The only solution I could come up with myself was to use a dedicated chip like the CH559 to handle the USB peripheral and send codes to the host via serial. I don't really want to include something as specific and niche as that on a more generic board that will likely be used for more powerful applications that can handle the USB ports natively? Or will be a case of running a MicroBlaze CPU to handle the USB in additional to the actual emulated CPU, whatever that may be, and passing coordinates/keypresses etc. to it internally?

nockieboy · « **Reply #168 on:** January 24, 2023, 07:23:25 pm »

Quote from: asmi on January 23, 2023, 08:44:19 pm

As you can see, write commands are scheduled in order to fully utilize a data bus for uninterrupted stream of data into the memory - three 8 beat bursts with no breaks between them (there is a similar stream for reading later in the timeline), while at the same time using other command slots to give commands to the second rank. You can also see that this controller has 1T commanding, meaning it can issue commands to memory on every memory clock cycle, unlike some other controllers I've seen which have 2T commanding and can only issue a command on every other clock cycle and so can't pack commands as efficiently as 1T one can.

I got the simulation running in Vivado. Like you said, a single click and off it went. Have exactly the same output as your picture too, so it's looking good. I just need to get a grasp of how it all fits together now and get comfortable using it like I am with Quartus. Then I'll be more use in trying to apply BrianHG's multi-port adapter to the simulation and get the BFM set up?

asmi · « **Reply #169 on:** January 24, 2023, 09:08:28 pm »

Quote from: nockieboy on January 24, 2023, 06:57:04 pm

I got an MT16KTF1G64HZ-1G6E1 in the post thanks to an impulse-buy on eBay for £9.

I ordered one for myself as well. I've also learned that apparently there is an even larger capacity version - MT16KTF2G64HZ - 16 GBytes! But it's quite expensive even on the likes of ebay, and in general is not easy to find in stock. I'm also thinking about picking up some cheap "computer store"-sourced modules just to see if they would work - but that is for later when we'll actually have a board in our hands.

Quote from: nockieboy on January 24, 2023, 06:57:04 pm

Interesting. Would make sense to use the internal ADC and implement an HDL PWM with pin out to a FET to control the fan, no? Only one external part needed that way (and maybe a couple of resistors), and you get thermally-controlled fan regulation for little effort.

We've got to make sure it goes on full blast by default as it seems to be the safest option, also there are some complications with using XADC when you have DDR3 controller as well, because it uses it internally for the same reason. It's nothing complex really - you just need to drive current temp measurement to controller, but it can be done.

Quote from: nockieboy on January 24, 2023, 06:57:04 pm

x4 PCIE? Interesting. I can see the immediate potential of an M.2 connector, but what about PCIE? Forgive my lack of imagination or knowledge, but what sort of peripherals would that open the board up to?

Since you can get Linux running on a Microblaze, you can get a lot of commercial PCIE devices to run there using Linux PCIE device drivers - from PCIE-to-M.2 storage cards to multi-gig (2.5G, 5G) network adapters. And perhaps at some point you (or someone else) will want to design an actual video card - with dedicated FPGA and dedicated memory, connecting it to a host via PCIE makes a lot of sense. Or if you run out of FPGA resources in the main FPGA and would want to move some functionality to external FPGA - again, PCIE 2x4 will provide for 20 Gbps of bandwidth in each direction, or perhaps even more if some sort of custom protocol (like Aurora) is used instead of PCIE while still running over the same lanes, but at higher bitrate (GTP transceivers can run up to 6.25 Gbps per lane instead of 5 Gbps of PCIE 2, giving a total of 25 Gbps of bandwidth in each direction). I can see myself using that port with another FPGA board just to play around with different protocols over multi-gigabit serial lines - at the end of the day, PCIE connector is just a connector, nothing in it says that you can't use other protocols over it.

Quote from: nockieboy on January 24, 2023, 06:57:04 pm

What about DisplayPort? Is that big thing/must have?

To be honest I'm not really convinced of it's utility given that we have HDMI 1080p@60. I frankly don't think A100T will have enough resources for CPU, all peripherals AND a GPU powerful enough to generate an image of sufficient complexity with higher than FullHD resolution. The only reason I offered it for the carrier was that it was a relatively painless way to expose those transceiver lanes externally. But now I think PCIE connector would be much better because it can also provide power to a connected board, as well as having both transmitter lanes and received lanes going through the same connector.

Quote from: nockieboy on January 24, 2023, 06:57:04 pm

Is there any way the 4-port USB hub could be made use of by a soft-core CPU running something that doesn't have a USB stack like Linux? I'd still like to be able to plug a USB keyboard and mouse in with, for example, a 16-bit Motorola soft-core CPU - or even the dreaded Z80 or other 8-bit processors. This is a pretty key requirement for the board - it's up there with the SODIMM, really. The only solution I could come up with myself was to use a dedicated chip like the CH559 to handle the USB peripheral and send codes to the host via serial. I don't really want to include something as specific and niche as that on a more generic board that will likely be used for more powerful applications that can handle the USB ports natively? Or will be a case of running a MicroBlaze CPU to handle the USB in additional to the actual emulated CPU, whatever that may be, and passing coordinates/keypresses etc. to it internally?

I think using Microblaze with Linux just to handle keyboard is nothing but a massive waste of resources. But since we are going to have some sort of GPIO header anyway, you can always design a small plug-in PCB with CH559 and have your keyboard issue taken care of this way. Or to have a few PMOD-like connectors and design a PMOD-compliant CH559 module. The point is since that connection isn't going to be particularily high-speed one, we can get away with using cheap PMOD-like headers. And as a bonus, you can connect some of a myriad of PMOD modules which are already on a market if you happen to have some (or buy some from say Digilent).

Quote from: nockieboy on January 24, 2023, 07:23:25 pm

I got the simulation running in Vivado. Like you said, a single click and off it went. Have exactly the same output as your picture too, so it's looking good. I just need to get a grasp of how it all fits together now and get comfortable using it like I am with Quartus. Then I'll be more use in trying to apply BrianHG's multi-port adapter to the simulation and get the BFM set up?

Cool. I think at first you need to get comfortable using MIG UI to issue commands to it and write/read data. So, once you've read the relevant section of a UG586 which explains how to use it, try playing with it, adding some more writes and reads, and see that it behaves like you think it should. It's important that you understand very well how to interact with it and how it does things, because BFM is meant to replicate the outside behavior of it (meaning the "UI" would need to behave exactly like a real UI would in any circumstances). Once you have that familiarity, we can proceed with making a model for it.

To be honest, that is not how I use MIG, because I always use it through AXI bus, and for that there is a free AXI Verification IP which can emulate memory (among other things), which is what I use for anything memory-related, and you can also pre-fill certain addresses of that "memory" with specific data to help with your testing - like for example if component you are testing expects specific data at specific memory location(s), you can set it up as part of a testbench. And since all but the simplest of my designs use AXI, it's a natural for me to prefer using that. Incidentally if you will decide to use Microblaze at some point, you will have to use AXI because that's the bus this CPU uses to talk to any external devices - be it memory, or IO devices.

Now, as far as Vivado goes, here is how I do things. Let me preface it with saying that it's not neccessarily the best way, or the only way, but merely that's what works for me.
1. I don't use Vivado's built in editor, because frankly it sucks. Instead I use VS Code with a plugin for SystemVerilog. I open "<project_name>.srcs" folder in VS Code, which makes all sources I'd be interested in available for me to edit.
2. Depending on the project, there will be one, two, or three subfolders - the one named "sources_1" is where your synthesizable HDL goes (if you click "Add Sources" option on the left panel and select "Add or create design sources" - that's where all these files will go), another "sim_1" is where your simulation-only HDL goes (option "Add or create simulation sources" in the same dialog will land files in that folder), and yet another one "constrs_1" (option "Add or create constraints"). There might be more folders created later if you have multiple simulation sets, but let's leave it aside for now.
3. Once you've made any changes in HDL and want to re-run simulation, there is a button for that in the top toolbar, see the frist screenshot in attachment. By default, simulation will run for a length of time configured in settings dialog (see second screenshot). Other controls next to "Restart simulation" button allows advancing simulation for some more time, or restarting simulation from the beginning without recompiling sources.
4. Simulation waveform interface is broadly similar in form and function to what I remember of Modelsim (or any other HDL sim that I've ever seen for that matter). Some useful controls:

Left-right cursor keys move to the next transition of currently-selected trace, same keys while holding "Shift" allow measuring time intervals between current cursor position and the next edge, up-down keys select trace above-below current one
Mouse scroll wheel scrolles the list of traces, the wheel while holding "Control" changes the timescale, the wheel while holding "Shift" scrolls the waveforms hotizontally (along the time axis)
You can assign different colors to traces, change the radix for multibit traces and other things via context menu on a trace. You can add a separator, create a new group or a virtual bus there as well. To rearrange traces, just drag them around.
If you want to add more traces (for example for internal signals which are not displayed by default), use the "Scope" panel to the left to browse through the modules hierarchy, pick an instance you want traces to added from, and drag traces from "Objects" panel into the waveform viewer

That should get you started, but like always feel free to experiment, if you somehow screw something up, you can always go back to initial working state by simple getting rid of your changes using git. If you have any questions or problems, feel free to post them here and I will try to help.

nockieboy · « **Reply #170 on:** January 25, 2023, 01:37:26 pm »

Quote from: asmi on January 24, 2023, 09:08:28 pm

Quote from: nockieboy on January 24, 2023, 06:57:04 pm
Interesting. Would make sense to use the internal ADC and implement an HDL PWM with pin out to a FET to control the fan, no? Only one external part needed that way (and maybe a couple of resistors), and you get thermally-controlled fan regulation for little effort.
We've got to make sure it goes on full blast by default as it seems to be the safest option, also there are some complications with using XADC when you have DDR3 controller as well, because it uses it internally for the same reason. It's nothing complex really - you just need to drive current temp measurement to controller, but it can be done.

A pull-up on the gate will do that. If the FPGA isn't pulling the gate low, the fan will default to full-on.

Quote from: asmi on January 24, 2023, 09:08:28 pm

Since you can get Linux running on a Microblaze, you can get a lot of commercial PCIE devices to run there using Linux PCIE device drivers - from PCIE-to-M.2 storage cards to multi-gig (2.5G, 5G) network adapters. And perhaps at some point you (or someone else) will want to design an actual video card - with dedicated FPGA and dedicated memory, connecting it to a host via PCIE makes a lot of sense.

It'll be an SBC, like a Raspberry Pi, but with an FPGA instead of a dedicated processor and the ability to attach bona-fide PC expansion cards...

Quote from: asmi on January 24, 2023, 09:08:28 pm

Or if you run out of FPGA resources in the main FPGA and would want to move some functionality to external FPGA - again, PCIE 2x4 will provide for 20 Gbps of bandwidth in each direction, or perhaps even more if some sort of custom protocol (like Aurora) is used instead of PCIE while still running over the same lanes, but at higher bitrate (GTP transceivers can run up to 6.25 Gbps per lane instead of 5 Gbps of PCIE 2, giving a total of 25 Gbps of bandwidth in each direction). I can see myself using that port with another FPGA board just to play around with different protocols over multi-gigabit serial lines - at the end of the day, PCIE connector is just a connector, nothing in it says that you can't use other protocols over it.

Just so I'm absolutely 100% certain we're thinking about the exact same thing, what sort of PCIE connector are you thinking of? Up to this point, I have been thinking of a PCIE socket on the PCB, but would it be worth adding a PCIE edge connector to the PCB to allow the board to be a PCIE peripheral itself, like those expensive dev boards I see here and there? Would that even be possible with a socket as well? With the socket, am I going to need to provide a 12V source for it as well to be fully compliant and compatible with all peripherals?

Quote from: asmi on January 24, 2023, 09:08:28 pm

I think using Microblaze with Linux just to handle keyboard is nothing but a massive waste of resources.

I agree.

Quote from: asmi on January 24, 2023, 09:08:28 pm

But since we are going to have some sort of GPIO header anyway, you can always design a small plug-in PCB with CH559 and have your keyboard issue taken care of this way. Or to have a few PMOD-like connectors and design a PMOD-compliant CH559 module. The point is since that connection isn't going to be particularily high-speed one, we can get away with using cheap PMOD-like headers. And as a bonus, you can connect some of a myriad of PMOD modules which are already on a market if you happen to have some (or buy some from say Digilent).

Well I mentioned PMOD connectors in a previous post to expose some of the free IO, as well as maybe a Raspberry Pi-compatible header etc., depending on how many free IO there will be. I don't want to waste any. But using a CH559 - or whatever other solution I want to use - is actually a really good idea and I wish I'd had it!

Quote from: asmi on January 24, 2023, 09:08:28 pm

To be honest, that is not how I use MIG, because I always use it through AXI bus, and for that there is a free AXI Verification IP which can emulate memory (among other things), which is what I use for anything memory-related, and you can also pre-fill certain addresses of that "memory" with specific data to help with your testing - like for example if component you are testing expects specific data at specific memory location(s), you can set it up as part of a testbench. And since all but the simplest of my designs use AXI, it's a natural for me to prefer using that. Incidentally if you will decide to use Microblaze at some point, you will have to use AXI because that's the bus this CPU uses to talk to any external devices - be it memory, or IO devices.

I'm not going to be using AXI for this memory controller, am I? This is how I think it all fits together (in my mind, at least):

BrianHG · « **Reply #171 on:** January 25, 2023, 03:37:31 pm »

Ok, here you go. I've attached your MIG setup which will be backwards compatible with your existing GPU project.

There are 2 files:
BrianHG_CONTROLLER_v16_Xilinx_MIG_DDR3_top.sv :
Replaces my BrianHG_DDR3_CONTROLLER_v16_top.sv. Read inside carefully, the:

Code: [Select]

//xxxxxxxxx
//
//xxxxxxxxx

Denotes something you should pay attention to and may need changing.

You will see it now only requires 2 other files from my original DDR3 project:

Code: [Select]

//   - BrianHG_DDR3_COMMANDER_v16.sv                     -> v1.6 High FMAX speed multi-port read and write requests and cache, commands the BrianHG_DDR3_PHY_SEQ.sv sequencer.
//   - BrianHG_DDR3_FIFOs.sv                             -> Serial shifting logic FIFOs.

There is a second new file, BrianHG_Xilinx_MIG_DDR3.sv.
This file replaces my BrianHG_DDR3_PHY_SEQ_v16.sv for the new controller as seen in this diagram:

Again, in the code, read the:

Code: [Select]

//xxxxxxxxx
//
//xxxxxxxxx

Which denotes something you should pay attention to and may need changing.

You need to place the Xilinx Vivado MIG at the bottom of this source file and wire it to all of my 'SEQ_xxx' command control ports with any additional logic, wire, clocks, signals which may be required for the system to function.

I've commented out all the unused parameters. The ones remaining are either used by existing modules in your GPU system, or may just be there as useful notes or functions which may be added in the future.

The only other change is that for the wide 128bit bus setting used for my VGA generator's display read channel, you will need to switch that read port from 128bits to 512bits, or whatever the MIG bus width becomes. This will ensure unbroken full speed read bursting when the VGA generator reads data from DDR3 ram to fill the video output line buffer when displaying each line of video. (IE: with 512bit, fill an entire line of video before the Z80 executes 2-3 instructions, then wait for the next H-Sync to make the next line.)

As for the VGA generator, I'm now patching the source file 'BrianHG_GFX_Video_Line_Buffer.sv' to remove it's dependence on Quartus' altdpram megafunction. I have already made the new working code (works in modelsim), but a Quartus related bug cant infer a wide block-ram which contains 'byte-enable' without it taking hours to compile (yes, its a compiler bug). I'll send you the patched code next. Most likely Vivado doesn't have the same bug, so it should work. But if it doesn't, I left space at the bottom of the code to allow you to manually insert Vivado's BRAM function.

(Also, remember there is a parameter for the ellipse generator for using the Quartus multiply megafunction in place of a standard verilog ' C <= A * B ', you also need to turn that off.)

(opps, in BrianHG_CONTROLLER_v16_Xilinx_MIG_DDR3_top.sv, I forgot to erase lines 497-499. Those ports no longer exist.)

(I will also edit my testbench 'BrianHG_DDR3_CONTROLLER_v16_top_tb.sv' into 'BrianHG_CONTROLLER_v16_Xilinx_MIG_DDR3_top_tb.sv' so you may test that my multiport successfully generates unbroken sequential bursts with Xilinx's MIG controller. It will basically be the same as the original, but, it will just call the new 'BrianHG_CONTROLLER_v16_Xilinx_MIG_DDR3_top.sv' demonstrating backwards compatibility.)

asmi · « **Reply #172 on:** January 26, 2023, 02:10:20 am »

Quote from: nockieboy on January 25, 2023, 01:37:26 pm

Just so I'm absolutely 100% certain we're thinking about the exact same thing, what sort of PCIE connector are you thinking of? Up to this point, I have been thinking of a PCIE socket on the PCB, but would it be worth adding a PCIE edge connector to the PCB to allow the board to be a PCIE peripheral itself, like those expensive dev boards I see here and there? Would that even be possible with a socket as well? With the socket, am I going to need to provide a 12V source for it as well to be fully compliant and compatible with all peripherals?

Yes, I'm talking about a PCIE x4 connector, kinda like what you have on PC motherboards. As for 12 V - well we can power the system with that voltage (this is actually a good idea because it's going to reduce the current in the connecting cable), and have a DC-DC converter to convert 12 V to 5 V. This will also allow us to use a 12 V fan, which are much more plentiful on a market.
As for connecting two boards - there are cables like this one: https://www.samtec.com/products/pciec-064-0050-ec-ec-cp which allows connecting two PCIE connectors without an edge connector, you can also find similar male-to-male PCIE cables on the likes of Aliexpress for less money - they are not as abundant as male-to-female extension cables, but they do exist and can be found. Or you can design another FPGA board with an edge connector so that you can plug it into this board.

Quote from: nockieboy on January 25, 2023, 01:37:26 pm

I'm not going to be using AXI for this memory controller, am I? This is how I think it all fits together (in my mind, at least):

I don't know what are you going to use - it's for you and BrianHG to decide. I use AXI because everything in Xilinx world speaks AXI, so it was natural for me to also adopt it. Just to give you idea, here is a system I designed for a board from my signature:

As you can see, there are two big crossbar interconnects:
1. First (axi_smc) connects a MIG DDR2 controller and a execute-in-place (XIP) QSPI flash on the slave side with a CPU's instruction (M_AXI_IC) and data (M_AXI_DC) ports so that CPU can fetch commands and read/write data from any address in DDR2 and XIP (of course XIP is read-only, so you can't write anything there), also there are two more master ports for DMA-capable IPs - one of which (v_frmbuf_wr_0) writes video frames into memory (in this case those frames are generated by v_tpg_0 - video test pattern generator), and another one (v_frmbuf_rd_0) reads frames from the memory and eventually they are output via HDMI video output.
2. Second (microblaze_0_axi_periph) connects "peripheral" port of a CPU with all other modules via AXI4-lite bus - things like debug module (mdm_1), interrupt controller (microblaze_0_axi_intc), I2C master module (axi_iic_0), GPIO module (axi_gpio_0), and configuration ports of all other blocks.

Those crossbar interconnects handle address mapping and translation as per address map:

Here each master has it's own address space, and you can setup address windows for slave interfaces however you want or need. For example, you can see that Microblaze's data port can access everything, while instruction port can only access DDR2 memory, XIP flash and a local memory (not shown in a diagram because it uses dedicated port of Microblaze, and not routed through main interconnect), and video reader and writer can only access DDR2 memory (XIP is not mapped to their address spaces).

Oh, and BTW - the only part of this diagram that I designed myself (and wrote HDL for) is HDMI Video Out, all other components are provided by Xilinx for free!

Now you will hopefully understand why I use AXI pretty much everywhere

BrianHG · « **Reply #173 on:** January 26, 2023, 03:15:31 am »

It's up to Nockieboy what he wants to do. All I offered was a way of backwards compatibility to his existing GPU code as all it took were a few cut and paste on my side. I no longer have time to help his to adapt to an entirely new layout and architecture. That will now become your job asmi.

nockieboy · « **Reply #174 on:** January 26, 2023, 09:03:15 am »

Quote from: BrianHG on January 26, 2023, 03:15:31 am

It's up to Nockieboy what he wants to do. All I offered was a way of backwards compatibility to his existing GPU code as all it took were a few cut and paste on my side. I no longer have time to help his to adapt to an entirely new layout and architecture. That will now become your job asmi.

There I was thinking switching FPGA wouldn't be that much of an upheaval to the project... $:-\$

I'm going to have to go the path of least resistance, which for me at the moment is to retain the existing architecture as much as possible (as I understand it) and keep the interconnects within the GPU project close to the metal, rather than using something like AXI to make it all plug 'n' play. I guess this means I'll have to write an AXI interface to use a MicroBlaze with the GPU, but I did that with Wishbone so perhaps it's possible for me to do it with AXI too. It still means any other soft-core CPU (I'm looking at you, 68000 etc.) will be straightforward to interface to the GPU via a bridge like the Z80_Bridge module.

In terms of 'what's next', aside from having a play with Vivado and the simulation software to get familiar with it and send some commands to the DDR3, I need to start looking at the MIG's HDL itself and working out what signals, ports, buses are exposed by it to look into wiring it to BrianHG_CONTROLLER_v16_Xilinx_MIG_DDR3_top.sv.

BrianHG · « **Reply #175 on:** January 26, 2023, 01:02:37 pm »

Quote from: nockieboy on January 26, 2023, 09:03:15 am

Quote from: BrianHG on January 26, 2023, 03:15:31 am
It's up to Nockieboy what he wants to do. All I offered was a way of backwards compatibility to his existing GPU code as all it took were a few cut and paste on my side. I no longer have time to help his to adapt to an entirely new layout and architecture. That will now become your job asmi.

There I was thinking switching FPGA wouldn't be that much of an upheaval to the project... $:-\$

I'm going to have to go the path of least resistance, which for me at the moment is to retain the existing architecture as much as possible (as I understand it) and keep the interconnects within the GPU project close to the metal, rather than using something like AXI to make it all plug 'n' play. I guess this means I'll have to write an AXI interface to use a MicroBlaze with the GPU, but I did that with Wishbone so perhaps it's possible for me to do it with AXI too. It still means any other soft-core CPU (I'm looking at you, 68000 etc.) will be straightforward to interface to the GPU via a bridge like the Z80_Bridge module.

In terms of 'what's next', aside from having a play with Vivado and the simulation software to get familiar with it and send some commands to the DDR3, I need to start looking at the MIG's HDL itself and working out what signals, ports, buses are exposed by it to look into wiring it to BrianHG_CONTROLLER_v16_Xilinx_MIG_DDR3_top.sv.

Once again, my multiport has a 'wait/busy' input to tell it to wait and not send any commands.
It has a 'enable command' output to tell you it is sending a command.
It has a 'read/write' flag for the type of command.
It provides an address.
If it wants to write, it provides the write data + write byte enable which specifies which bytes in the 512bits should be written into DDR3.
If reading, it provides an ID code to instruct where that read command's data belongs.

It also has a 'read data ready' input with a 512bit data input port plus the expected read ID code input associated with that read ready data.

I'm sure you can wire this to Vivado's existing AXI standard if you like by adding some simple glue control logic, or, you also might be able to use Vivado MIG's lower level user interface if you like to achieve the same communication with the DDR3 MIG. It is up to you Nockieboy.

If AXI operates with separate read and write data paths or runs the data at 2x clock internally for bidir communication, then there is no hindrance to using it. Otherwise if the data path is shared as a single bidirectional read/write bus, note that my multiport will not be able to send out write data commands while read data is being returned, slowing down mixed/bidirectional DDR3 transactions. If you want this added speed, then you will need to use a lower level direct interface to Vivado's MIG to allow write data posting while read data is still being received in the data pipeline.

Asmi should have a much better understanding of AXI's capabilities. I'm also assuming if you tie my multiport to the AXI buss, you should be able to tie additional AXI compliant clients on the AXI side as well.

asmi · « **Reply #176 on:** January 26, 2023, 04:13:56 pm »

Quote from: BrianHG on January 26, 2023, 01:02:37 pm

I'm sure you can wire this to Vivado's existing AXI standard if you like by adding some simple glue control logic, or, you also might be able to use Vivado MIG's lower level user interface if you like to achieve the same communication with the DDR3 MIG. It is up to you Nockieboy.

If AXI operates with separate read and write data paths or runs the data at 2x clock internally for bidir communication, then there is no hindrance to using it. Otherwise if the data path is shared as a single bidirectional read/write bus, note that my multiport will not be able to send out write data commands while read data is being returned, slowing down mixed/bidirectional DDR3 transactions. If you want this added speed, then you will need to use a lower level direct interface to Vivado's MIG to allow write data posting while read data is still being received in the data pipeline.

Asmi should have a much better understanding of AXI's capabilities. I'm also assuming if you tie my multiport to the AXI buss, you should be able to tie additional AXI compliant clients on the AXI side as well.

AXI is not a Xilinx standard, but an ARM standard, and it's widely used in ARM SoCs - I guess that's why they decided to adopt it all those years ago. Currently AXI4 variant of a bus is used by Xilinx IPs. Full specification is publicly accessible here: https://developer.arm.com/documentation/ihi0022/e/?lang=en

There are three flavors of AXI bus - AXI4 memory-mapped Full, AXI4 memory-mapped lite (it's a simplified version of a full bus, it lacks burst capability and is used where high bandwidth is not required - like in a control registers interface), and AXI stream - which is a simple parallel interface with ready/valid handshake so that it can support throttling from both sides of the stream (source and sink), it's mostly used for non-memory-mapped stream-like data transfer, so I only mention it here for the sake of completeness.

AXI4 memory-mapped bus is a point-to-point connection (so if you want to have more devices connected you will need to use interconnect) and consists of 5 separate channels - read address, read data, write address, write data and write response. Each channel is semi-independent (in a sense that transactions on each channel can happen independently of other channels), but of course they are logically related - transfer over "read data" channel (or a series of them in case of a burst) is a response for earlier read request over "read address" channel, "write data" channel provides data to write for a write request over "write address" channel, and "write response" channel is used to communicate result of a write request back to a requestor. Here is a diagram from the specification:

It's very easy to use on a master side (because you get to initiate all transactions), it's also relatively easy to implement and AXI4-lite slave (because it doesn't support the most complex things which full one does) like a control registers interface, AXI4 full slave is the most complex because it needs to support quite wide range of transactions.

BrianHG · « **Reply #177 on:** January 26, 2023, 04:28:52 pm »

Quote from: asmi on January 26, 2023, 04:13:56 pm

AXI4 memory-mapped bus is a point-to-point connection (so if you want to have more devices connected you will need to use interconnect) and consists of 5 separate channels - read address, read data, write address, write data and write response. Each channel is semi-independent (in a sense that transactions on each channel can happen independently of other channels)

That's perfect for Nockieboy. He can familiarize himself with AXI4 and it offers the independent read and write channel allowing for any read or write transactions to happen even as the huge pipeline delayed read data stream comes in at a delayed time which my multiport relies on to maintain the top speed.

The only advantage of directly connecting to the DDR3 MIG set to a low level interface is fewer gates, less work, or maybe saving ~1 clock cycle when requesting a transaction.

Quote

write data

Since the data bus is 512bit and we might need to only write a selected number of bytes within, we also require a write byte mask.

Quote

write response

Other than if you need to certify writes, I'm guessing you may ignore this.

asmi · « **Reply #178 on:** January 27, 2023, 03:47:02 pm »

Quote from: BrianHG on January 26, 2023, 04:28:52 pm

That's perfect for Nockieboy. He can familiarize himself with AXI4 and it offers the independent read and write channel allowing for any read or write transactions to happen even as the huge pipeline delayed read data stream comes in at a delayed time which my multiport relies on to maintain the top speed.

How do you deal with variable latency (say due to ongoing refresh)? Do you have some sort of elastic FIFO to sort it out?
I will create a testbench similar to what I created for UI some time during weekend.

Quote from: BrianHG on January 26, 2023, 04:28:52 pm

The only advantage of directly connecting to the DDR3 MIG set to a low level interface is fewer gates, less work, or maybe saving ~1 clock cycle when requesting a transaction.

Advantage of using AXI is that you can connect your AXI master port to a crossbar interconnect instead of directly to MIG and this will make MIG available to other AXI masters alongside your component.
Also with AXI you won't have to use individual requests, for example, you can command up to 4KB-long burst with a single command, instead of making a whole bunch of 64 byte requests.

One thing to keep in mind is that AXI addresses are true byte addresses, unlike MIG UI, which uses addresses derived from rank/bank/row/column.

Quote from: BrianHG on January 26, 2023, 04:28:52 pm

Since the data bus is 512bit and we might need to only write a selected number of bytes within, we also require a write byte mask.

There is a write byte mask support via WSTRB signal. Its' logic is the opposite of DDR3 DM signals in that only bytes for which the corresponding bit of WSTRB is "high" are written.

Quote from: BrianHG on January 26, 2023, 04:28:52 pm

Other than if you need to certify writes, I'm guessing you may ignore this.

If you don't need these, you can simply hardwire BREADY signal to "high" and leave other signals of that channel unconnected.

BrianHG · « **Reply #179 on:** January 27, 2023, 05:10:52 pm »

Quote from: asmi on January 27, 2023, 03:47:02 pm

Quote from: BrianHG on January 26, 2023, 04:28:52 pm
That's perfect for Nockieboy. He can familiarize himself with AXI4 and it offers the independent read and write channel allowing for any read or write transactions to happen even as the huge pipeline delayed read data stream comes in at a delayed time which my multiport relies on to maintain the top speed.
How do you deal with variable latency (say due to ongoing refresh)? Do you have some sort of elastic FIFO to sort it out?
I will create a testbench similar to what I created for UI some time during weekend.

Yes, my user command requests have a variable self adjusting Queue.
When controlling the DDR3, or in the case talking to the AXI4, when I send a write, it is expected to eventually be written.
When I send a read, I know that the read data comes in way in the future, so I transmit an ID with the read request.

On my read data bus input, when a read data is ready, I expect to see that ID I transmitted with the read so I know where that read data belongs.

Quote

Quote from: BrianHG on January 26, 2023, 04:28:52 pm
The only advantage of directly connecting to the DDR3 MIG set to a low level interface is fewer gates, less work, or maybe saving ~1 clock cycle when requesting a transaction.
Advantage of using AXI is that you can connect your AXI master port to a crossbar interconnect instead of directly to MIG and this will make MIG available to other AXI masters alongside your component.
Also with AXI you won't have to use individual requests, for example, you can command up to 4KB-long burst with a single command, instead of making a whole bunch of 64 byte requests.

One thing to keep in mind is that AXI addresses are true byte addresses, unlike MIG UI, which uses addresses derived from rank/bank/row/column.

My multiport will send a 64byte request each time. It was designed like this because if 2 read/write ports which have identical priority have a small max burst size and have access in the same bank and column or different banks who have already been activated with the same row, my multiport will automatically perform an interleaved access knowing that the DDR3 bandwidth will still be maximum, but 2 simultaneous ports will at least be moving data at the same time. My multiport also shaves out any unnecessary access caches the 64bytes for each user IO port locally. (IE: if a port for the Z80 is set to 8bit, then 1 read will send a read, reading and writing the next 63 bytes, or repeats will not send out any commands to the DDR3 until the Z80 needs a new 64byte chunk. Both read and write 64byte chunks are separate caches and are automatically aware of each other if they have the same memory address.)

My multiport address output is a true address, down to the byte, but it has been sanitized for the MIG at the other end to evenly land on 512bit/64byte blocks every time. IE: the lower address bits are effectively tied the GND.

Quote

Quote from: BrianHG on January 26, 2023, 04:28:52 pm
Since the data bus is 512bit and we might need to only write a selected number of bytes within, we also require a write byte mask.
There is a write byte mask support via WSTRB signal. Its' logic is the opposite of DDR3 DM signals in that only bytes for which the corresponding bit of WSTRB is "high" are written.

So, 100% compatible with my multiport.

Quote

Quote from: BrianHG on January 26, 2023, 04:28:52 pm
Other than if you need to certify writes, I'm guessing you may ignore this.
If you don't need these, you can simply hardwire BREADY signal to "high" and leave other signals of that channel unconnected.

AOK.

BrianHG · « **Reply #180 on:** January 27, 2023, 05:15:04 pm »

Also, my multiport has parameters for the width of the DDR3 column, bank, row, and bit width sizes. It needs these set to match the DDR3 module to know when to prioritize which read/write commands go out when. It also has a parameter for your DDR3 controller's 'BANK-ROW-COLUMN' as this needs to be known to best prioritize DDR3 access.

(Though if another AXI device through access changes the DDR3 bank #7 compared to what my multiport was expecting, the DDR3 MIG controller will need to do additional gymnastics.)

asmi · « **Reply #181 on:** February 06, 2023, 03:47:51 pm »

Sorry about delays, some urgent stuff came up that I had to deal with.
I've added another project into the source control for AXI version of a MIG. There are two simulation sets - one (sim_1) is a test bench for talking to MIG via AXI, and another one (sim_2) is (almost) the same client code, but instead of MIG I have an AXI Verification IP instance set up as an AXI slave memory. Simulation with the latter takes just a few seconds, while AXI interface is the same. And as a bonus, AXI VIP also performs AXI protocol compliance checks and will report an error if something is wrong with AXI signalling.
Also, I finally received a response from MPS about MPM3683-7 layout, and they basically said to do as their eval board is doing. So that settles that question.

nockieboy · « **Reply #182 on:** February 06, 2023, 04:11:53 pm »

Quote from: asmi on February 06, 2023, 03:47:51 pm

Sorry about delays, some urgent stuff came up that I had to deal with.

No need to apologise - I haven't had a day off in nearly two weeks it's been so busy here, so zero progress made on the project at all. :'(

Change of role at work coming up in next week, so not likely to be making much headway for a while. What time I do have, I'll be trying to get my head around the Vivado simulation of the SODIMM and get more confidence using it in general. From scanning (literally) the conversation up to this point, I should be looking to create an AXI4 interface to BrianHG's multiport interface?

asmi · « **Reply #183 on:** February 06, 2023, 04:29:42 pm »

Quote from: nockieboy on February 06, 2023, 04:11:53 pm

No need to apologise - I haven't had a day off in nearly two weeks it's been so busy here, so zero progress made on the project at all. :'(

That's OK - life takes a priority.
I've been also quite busy lately, but since I work from home most of the time, I've been using whatever breaks in work I had to draw up schematics in Altium. They are still far from being completed, but I will get there eventually.

Quote from: nockieboy on February 06, 2023, 04:11:53 pm

Change of role at work coming up in next week, so not likely to be making much headway for a while.

Good luck in your new role, you will need the money to make this project a reality

Quote from: nockieboy on February 06, 2023, 04:11:53 pm

What time I do have, I'll be trying to get my head around the Vivado simulation of the SODIMM and get more confidence using it in general. From scanning (literally) the conversation up to this point, I should be looking to create an AXI4 interface to BrianHG's multiport interface?

I would think so. But in any case now you have both versions, and they aren't all too different from each other to be honest as far as interface goes.

asmi · « **Reply #184 on:** February 07, 2023, 04:42:06 am »

I've been reading a datasheet for that memory module I bought (MT16KTF1G64HZ-1G6E1, which uses revision E dies), and found that it can consume up to a bit over 2 Amps of current during refresh (this is the mode in which it consumes the most)! For comparison, that same module with revision N dies consumes ~1.5 Amps of current in that mode, and the one with revision P dies (this is the most recent revision) takes up only 1.3 Amps! What a difference between die revisions!
In addition to that, the module can also sink or source up to 0.6 Amps of current via termination rail (VTT), which will ultimately come from the same VDDR rail (via DDRx termination regulator). With that 3 Amps DC-DC converter we've chosen is just about enough to provide all of that current. I never really thought about this until today, because the most I used was a pair of 4G x16 chips, which consume like 240 mA MAX each, and so the choice of a converter for that rail has never been something I needed to think about - I typically used the same converter that is used elsewhere in a design (like in a Vccio rail) to help consolidate BOM.
In case you are curious, you can get a datasheet for the module here: https://www.micron.com/products/dram-modules/sodimm/part-catalog/mt16ktf1g64hz-1g9 There is also a printout of SPD contents in case you want to know what's stored there.

nockieboy · « **Reply #185 on:** February 07, 2023, 11:00:24 am »

Quote from: asmi on February 07, 2023, 04:42:06 am

I've been reading a datasheet for that memory module I bought (MT16KTF1G64HZ-1G6E1, which uses revision E dies), and found that it can consume up to a bit over 2 Amps of current during refresh (this is the mode in which it consumes the most)! For comparison, that same module with revision N dies consumes ~1.5 Amps of current in that mode, and the one with revision P dies (this is the most recent revision) takes up only 1.3 Amps! What a difference between die revisions!
In addition to that, the module can also sink or source up to 0.6 Amps of current via termination rail (VTT), which will ultimately come from the same VDDR rail (via DDRx termination regulator). With that 3 Amps DC-DC converter we've chosen is just about enough to provide all of that current. I never really thought about this until today, because the most I used was a pair of 4G x16 chips, which consume like 240 mA MAX each, and so the choice of a converter for that rail has never been something I needed to think about - I typically used the same converter that is used elsewhere in a design (like in a Vccio rail) to help consolidate BOM.
In case you are curious, you can get a datasheet for the module here: https://www.micron.com/products/dram-modules/sodimm/part-catalog/mt16ktf1g64hz-1g9 There is also a printout of SPD contents in case you want to know what's stored there.

I have the 1G6E1 variant; I didn't think we'd need to be running the RAM at 1866MHz, thought 1600 would be more than enough.

I've scanned the datasheet - peak current draw doesn't seem to drop off too much with lower frequencies unfortunately, but I guess a 3A supply provides a little margin - I'll have to be careful to remember this when I'm routing the power supplies.

Had 10 minutes to look at the simulation project (not the latest version with AXI, admittedly), and it looks like mig_tb.sv is the file that generates the signals to the SODIMM in the simulation? Did you write that yourself or was it generated?

asmi · « **Reply #186 on:** February 07, 2023, 02:05:35 pm »

Quote from: nockieboy on February 07, 2023, 11:00:24 am

I've scanned the datasheet - peak current draw doesn't seem to drop off too much with lower frequencies unfortunately, but I guess a 3A supply provides a little margin - I'll have to be careful to remember this when I'm routing the power supplies.

We will have to ensure good decoupling to make sure the rail voltage does not dip during such current spikes.

Quote from: nockieboy on February 07, 2023, 11:00:24 am

Had 10 minutes to look at the simulation project (not the latest version with AXI, admittedly), and it looks like mig_tb.sv is the file that generates the signals to the SODIMM in the simulation? Did you write that yourself or was it generated?

That's the testbench I've writted to demonstrate how to send commands to MIG. MIG generates an example design as well, but I think it's a bit too complex because it's primary goal is not to demonstrate how to work with MIG, but rather to provide a platform for a hardware checkout (that example design includes a traffic generator and a validation logic to check for memory errors).

nockieboy · « **Reply #187 on:** February 07, 2023, 07:52:54 pm »

Quick question regarding SODIMM power supply:

VTT, VREFCA and VREFDQ all reference VDD/2. What's the best way to create this 0.75V supply? Do I need to add another discrete supply, or can I use a voltage divider from the 1.5V rail?

asmi · « **Reply #188 on:** February 07, 2023, 08:27:46 pm »

Quote from: nockieboy on February 07, 2023, 07:52:54 pm

Quick question regarding SODIMM power supply:

VTT, VREFCA and VREFDQ all reference VDD/2. What's the best way to create this 0.75V supply? Do I need to add another discrete supply, or can I use a voltage divider from the 1.5V rail?

You will need to use a tracking regulator (such that VTT and VREF would track closely VDDR rail), often called DDRx termination regulator, something like MP20075, TPS51206 or TPS51200, you connect VREFCA and VREFDQ to VTTREF output of those regulators, and VTT to main (VTT) output of those regs - see attached screenshot from Xilinx's AC701 devboard schematics for reference. You will also need a bunch of decoupling caps, again, you can refer to my second screenshot from the same schematics. And before you ask - CPx devices on the second screenshot are four 0.1 uF caps in a single physical package.

nockieboy · « **Reply #189 on:** February 09, 2023, 05:45:18 pm »

Have been thinking about the power supplies today whilst working. I was originally going to power the board via the USB connector, with the option of a DC jack providing the juice if the design (whatever it may be) gets too hungry. I guess a diode on the USB 5V rail will prevent power going back up the programming/serial USB cable, but there's probably better solutions that don't have such a voltage drop. Ideally I want something that will automatically switch between the USB input and a 5V rail generated from the DC jack, if it's plugged in. That DC jack needs to also generate a 12V rail - if someone wants to use (full) PCIE, they'll need to supply some extra juice via the DC input. So something like this:

The blue boxes are unknowns at this time. The smaller blue box I guess will be a power switch of some kind (LTC4412?), cutting the USB 5V supply off if there's a feed from the DC jack.

I've had a quick look on Mouser for suitable power ICs, it seems switching voltage regulators offer the best performance specs to my inexperienced eye, but they aren't cheap for a decent 12V one. Dual output would allow me to get 12V and 5V rails from a single IC, but it's going to require some meaty supporting components. Any suggestions or ideas?

asmi · « **Reply #190 on:** February 09, 2023, 05:51:02 pm »

Forget about USB - it can't provide enough power. Here is how I'm thinking to implement the power (note that DDR3 termination regulator is not there yet):

Vccint rail is going to be powered by MPM3683-7, which is powered by 12 V input, other rails will be powered by 5 V created by MP8772. I will wire up enable pins and "power good" outputs such that 12 V -> 5 V converter is going to start first, and once it's output is stabilized, it's "power good" output will allow all other rails to start up. The reason I've chosen MP8772 is because I happen to have them in my stock, so I won't have to buy them.

nockieboy · « **Reply #191 on:** February 09, 2023, 10:11:28 pm »

Yeah, I've got no issue creating the <12V rails - it's the 12V rail itself I'm thinking about. Where are you getting that from? I was considering using a DC jack which would take a 'wall-wart' input, or a laptop power supply, for example. Can't guarantee it'll be 12V, probably more like 12-18V, so some form of regulator is going to be required.

NOTE: The above schematic is NOT finished and there are a couple of misnamed nets (power good signals, specifically) and it's missing the DC INPUT -> 12V output regulator.

asmi · « **Reply #192 on:** February 10, 2023, 12:54:26 am »

Quote from: nockieboy on February 09, 2023, 10:11:28 pm

Yeah, I've got no issue creating the <12V rails - it's the 12V rail itself I'm thinking about. Where are you getting that from? I was considering using a DC jack which would take a 'wall-wart' input, or a laptop power supply, for example. Can't guarantee it'll be 12V, probably more like 12-18V, so some form of regulator is going to be required.

You can get a power supply which provides more-or-less precise 12 V power. I happen to have this one, but there are plenty of other supplies on a market which will provide for a good regulation. As for PCI Express power, +12V rail allows ±8% voltage tolerance (which means anything from ~11 to ~13 V is going to be within spec), and as per spec, an x4 card is allowed to draw up to 2.1 Amps from that rail (for total power from that rail of 25 W nominal), which is enough to power the entire devboard (it will probably consume around 15 W - 2-3 W for the RAM, 5-7 W for the FPGA itself, and another 5 watts for other random bits and bobs on a board). So that if you connect your power input directly to the +12 V rail of PCIE and use one of those jumper cable, you can power the second one directly from PCIE slot (provided that a power supply you use can output enough current for both boards).

PCI Express power specification gives quite a bit of freedom for addon cards as to how they can power themselves, but this creates some headaches for the motherboard/host designers (and it kind of makes sense if you think about it - as there are much more addon cards then there are hosts, so addons are more price-sensitive). For example, PCI Express Electromechanical Specification Revision 2.0 (that's the most recent revision I have access to) requires all hosts to provide a 3.3 V ± 9% rail with 3 Amps of current, and also a 12 V ± 8% rail with 2.1 Amps of current (for x4 or x8 PCIE slot), but an x4/x8 addon card can only draw up to 25 W of power from these rails, so some cards will draw power from 3.3 V rail only, others from 12 V rail only, yet others some combination of both. Which means a compliant implementation of a PCIE x4 slot will require another converter just for 3.3 V PCIE rail, or beefing up the one we have for 3.3 V rail so that it can feed all peripherals on a board AND provide enough power for the PCIE slot. Of course there is also an option to just connect PCIE 3.3V rail to the existing converter and pray that no addon card connected to that board will ever consume a significant current off that rail, which is what some chinese boards do (for example MYD-C7Z015 which I have), but I don't think that's the right approach. But I didn't get to PCIE slot yet, so I didn't think about that part of design.

BrianHG · « **Reply #193 on:** February 10, 2023, 01:11:51 am »

Use a USB-C PD power supply with a power delivery profile setting #3 which offers 12v, 3 amp.

Now you can use any 36 watt capable USB wall wart or battery bank.

Even profile #2 may work, 12v 1.5amp, 18 watts, but you probably wont have any breathing room.

asmi · « **Reply #194 on:** February 10, 2023, 01:30:21 am »

Quote from: BrianHG on February 10, 2023, 01:11:51 am

Use a USB-C PD power supply with a power delivery profile setting #3 which offers 12v, 3 amp.

Now you can use any 36 watt capable USB wall wart or battery bank.

Even profile #2 may work, 12v 1.5amp, 18 watts, but you probably wont have any breathing room.

You will need a PD controller for that to work, not worth it for this board IMHO.

nockieboy · « **Reply #195 on:** February 10, 2023, 10:33:39 am »

Quote from: asmi on February 10, 2023, 12:54:26 am

You can get a power supply which provides more-or-less precise 12 V power.

Yeah, I was just thinking of making the board as flexible as possible. I guess a 12V supply isn't too big a hurdle for people to jump if they want to use this board.

Quote from: asmi on February 10, 2023, 12:54:26 am

I happen to have this one, but there are plenty of other supplies on a market which will provide for a good regulation.

This is why I was thinking of flexibility - there's no guarantees someone will plug a good quality supply in, or even one in spec. Maybe I'm worrying too much about that - it takes a pretty big regulator/inductor off the board by having the 12V rail supplied externally. I had a look for something a little cheaper than what you've got and found this - T5994ST on Mouser for a little over £12. Only 5A / 60W, but that should be more than sufficient if you're saying the board will need around 25W.

Quote from: asmi on February 10, 2023, 12:54:26 am

As for PCI Express power, +12V rail allows ±8% voltage tolerance (which means anything from ~11 to ~13 V is going to be within spec), and as per spec, an x4 card is allowed to draw up to 2.1 Amps from that rail (for total power from that rail of 25 W nominal), which is enough to power the entire devboard (it will probably consume around 15 W - 2-3 W for the RAM, 5-7 W for the FPGA itself, and another 5 watts for other random bits and bobs on a board). So that if you connect your power input directly to the +12 V rail of PCIE and use one of those jumper cable, you can power the second one directly from PCIE slot (provided that a power supply you use can output enough current for both boards).

Jumper cable / PCIE edge connector. I could make it so the board has a PCIE slave connector on a board edge for some real bizarre board-ception.

It's a shame those MPM3833's input voltages max out at 6V, otherwise I'd use the 12V rail to directly power all the switching regulators and could get rid of the 5V rail entirely. What if I replace the MPM3833's with these MPM3632s instead? MPM3632 datasheet. I can just run them straight from the 12V rail, get rid of the 5V regulator and they're slightly cheaper than the MPM3833s they're replacing, unless I've missed something obvious (or not so obvious) in the part selection?

Quote from: asmi on February 10, 2023, 12:54:26 am

PCI Express power specification gives quite a bit of freedom for addon cards as to how they can power themselves, but this creates some headaches for the motherboard/host designers (and it kind of makes sense if you think about it - as there are much more addon cards then there are hosts, so addons are more price-sensitive). For example, PCI Express Electromechanical Specification Revision 2.0 (that's the most recent revision I have access to) requires all hosts to provide a 3.3 V ± 9% rail with 3 Amps of current, and also a 12 V ± 8% rail with 2.1 Amps of current (for x4 or x8 PCIE slot), but an x4/x8 addon card can only draw up to 25 W of power from these rails, so some cards will draw power from 3.3 V rail only, others from 12 V rail only, yet others some combination of both. Which means a compliant implementation of a PCIE x4 slot will require another converter just for 3.3 V PCIE rail, or beefing up the one we have for 3.3 V rail so that it can feed all peripherals on a board AND provide enough power for the PCIE slot. Of course there is also an option to just connect PCIE 3.3V rail to the existing converter and pray that no addon card connected to that board will ever consume a significant current off that rail, which is what some chinese boards do (for example MYD-C7Z015 which I have), but I don't think that's the right approach. But I didn't get to PCIE slot yet, so I didn't think about that part of design.

Okay, so I had a look for 3.3V/6A+ regulators (MPM ones, anyway, as I like the non-inductor solution for saving space, if not cost). I'd be looking to use one costing nearly four-times the MPM3632 for a combined 3.3V rail that will supply all the peripherals, including PCIE. I think the logical solution is to add another MPM3632 (if they're an appropriate replacement for the MPM3833s) for a couple of GBP.

asmi · « **Reply #196 on:** February 10, 2023, 05:21:32 pm »

Quote from: nockieboy on February 10, 2023, 10:33:39 am

Yeah, I was just thinking of making the board as flexible as possible. I guess a 12V supply isn't too big a hurdle for people to jump if they want to use this board.

This is why I was thinking of flexibility - there's no guarantees someone will plug a good quality supply in, or even one in spec. Maybe I'm worrying too much about that - it takes a pretty big regulator/inductor off the board by having the 12V rail supplied externally. I had a look for something a little cheaper than what you've got and found this - T5994ST on Mouser for a little over £12. Only 5A / 60W, but that should be more than sufficient if you're saying the board will need around 25W.

If someone wants to power a $400+ board with a crappy $5 wallwart from the nearest ~~dump~~surplus store - more power to them, but I personally think it's a bad idea, which is why I presumed that a person who buys/makes such a board can afford to spend some extra dosh for a quality power supply.

Quote from: nockieboy on February 10, 2023, 10:33:39 am

Jumper cable / PCIE edge connector. I could make it so the board has a PCIE slave connector on a board edge for some real bizarre board-ception.

Having both PCIE slot and an edge connector will require using a high-speed switch for PCIE lanes and a switch for the reference clock line (because PCIE addon cards are supposed to use a clock provided by the host), and I think that's not that great of idea as PCIE addon design has some limitations on a form factor as well as on connector placement, which is why I would prefer to design a separate PCB with an edge connector. Incidentally PCIE connector has an optional JTAG lines, which - if connected properly - can allow programming both host and an addon at the same time using a single JTAG connection (this is called a JTAG chain). AC701 devboard has such connection for a FMC connector with an switch which is tripped automatically when something is connected to that connector, so we can use the same idea for a PCIE port. That will require an addon card designed specifically to support such scenario - as JTAG from FPGA will need to be wired to an edge connector, but I like that idea nonetheless.

Quote from: nockieboy on February 10, 2023, 10:33:39 am

It's a shame those MPM3833's input voltages max out at 6V, otherwise I'd use the 12V rail to directly power all the switching regulators and could get rid of the 5V rail entirely. What if I replace the MPM3833's with these MPM3632s instead? MPM3632 datasheet. I can just run them straight from the 12V rail, get rid of the 5V regulator and they're slightly cheaper than the MPM3833s they're replacing, unless I've missed something obvious (or not so obvious) in the part selection?

That's a good point and a good suggestion. I think MPM3632C would be a better idea than MPM3833, just make sure you pick a MPM3632C and not MPM3632S as these are different parts in different packages absolutely not compatible with each other. I was under impression that you will need a 5V rail anyway to power your existing Z80 sandwitch, but if it's not required, than we can eliminate that rail.

Quote from: nockieboy on February 10, 2023, 10:33:39 am

Okay, so I had a look for 3.3V/6A+ regulators (MPM ones, anyway, as I like the non-inductor solution for saving space, if not cost). I'd be looking to use one costing nearly four-times the MPM3632 for a combined 3.3V rail that will supply all the peripherals, including PCIE. I think the logical solution is to add another MPM3632 (if they're an appropriate replacement for the MPM3833s) for a couple of GBP.

That's what I'm thinking too. I would also connect 12V and 3.3V pins of a PCIE connector through fat (something like 0805) zero ohm resistors to allow disconnecting those rails in case it's required for a connected card (thinking about scenario of a PCIE jumper cable which would connect 3.3V PCIE regulator on one board with the same regulator on another one which will cause problems, or if you want to power a second board with a separate power supply - for example if the one you use isn't powerful enough to power both).

nockieboy · « **Reply #197 on:** February 10, 2023, 08:07:26 pm »

Quote from: asmi on February 10, 2023, 05:21:32 pm

Having both PCIE slot and an edge connector will require using a high-speed switch for PCIE lanes and a switch for the reference clock line (because PCIE addon cards are supposed to use a clock provided by the host), and I think that's not that great of idea as PCIE addon design has some limitations on a form factor as well as on connector placement, which is why I would prefer to design a separate PCB with an edge connector. Incidentally PCIE connector has an optional JTAG lines, which - if connected properly - can allow programming both host and an addon at the same time using a single JTAG connection (this is called a JTAG chain). AC701 devboard has such connection for a FMC connector with an switch which is tripped automatically when something is connected to that connector, so we can use the same idea for a PCIE port. That will require an addon card designed specifically to support such scenario - as JTAG from FPGA will need to be wired to an edge connector, but I like that idea nonetheless.

Yes, I've noticed the JTAG lines and wondered about the possibilities there. So they'd need to be connected to the FPGA following the rules for daisy-chained JTAG devices (TDI being the chained link, everything else is a parallel bus.)

I've been looking at the AC701 design and intend to use a 74LV541A buffer to (presumably) prevent the stubbing issue you've mentioned previously if there's going to be more than two endpoints on the JTAG bus.

I like the automatic switch idea for connecting/disconnecting the PCIE JTAG to the board's FPGA programming bus - is there a purpose-built IC I can use for that task, or is it a case of OR-ing the PRSNT#2's together into a transistor to short TDI/TDO across the PCIE connector?

Speaking of which, I'm not 100% sure how the PRSNT lines are used by the PCIE host; does the host have a weak pullup on PRSNT#1 and the remaining PRSNT#2s are connected to IOs so the FPGA can detect which one is high and determine if a card is connected and, if so, whether it's a x1 or x4 card, or does something else go on there?

Quote from: asmi on February 10, 2023, 05:21:32 pm

I was under impression that you will need a 5V rail anyway to power your existing Z80 sandwitch, but if it's not required, than we can eliminate that rail.

Originally yes, but now I've decided to ditch the old uCOM stack and go all-in for making this board a full soft-core CPU system, I don't see the need for a 5V rail any more. 100k LEs should be enough to emulate most systems people would want to run, including up to Linux-running 32-bit systems. It shouldn't take much effort for me to get a Z80 core running and emulate the ROM.

In fact, ROM is going to be something I need to think about. There needs to be some form of permanent storage on the board for ROM software/data. i.e., my uCOM boots from ROM (as do most computers, I suspect), so I'd need space on an EEPROM or other form of storage (I'm open to suggestions) to hold this data and allow the soft-core CPU to boot up without having to rely on using the FPGA's internal memory. Should be simple enough to connect a FRAM or serial EEPROM to the FPGA and map its memory into wherever it would need to go for the soft-core CPU? Would it be possible to use spare room on the FPGA's config flash chip for this?

Quote from: asmi on February 10, 2023, 05:21:32 pm

That's what I'm thinking too. I would also connect 12V and 3.3V pins of a PCIE connector through fat (something like 0805) zero ohm resistors to allow disconnecting those rails in case it's required for a connected card (thinking about scenario of a PCIE jumper cable which would connect 3.3V PCIE regulator on one board with the same regulator on another one which will cause problems, or if you want to power a second board with a separate power supply - for example if the one you use isn't powerful enough to power both).

Good point. What about instead of 0805 links, would DIP switches be okay/suitable? Would make it all much more easily configurable.

nockieboy · « **Reply #198 on:** February 10, 2023, 08:43:00 pm »

Something else just sprang to mind for a peripheral - as well as the 10/100/1000 Ethernet interface, what about wifi? What would be the best way to implement that? I guess a wifi module, like the ESP32, would work but is there a way to do it more simply, without the middle-man microcontroller?

asmi · « **Reply #199 on:** February 10, 2023, 09:15:06 pm »

Quote from: nockieboy on February 10, 2023, 08:07:26 pm

I like the automatic switch idea for connecting/disconnecting the PCIE JTAG to the board's FPGA programming bus - is there a purpose-built IC I can use for that task, or is it a case of OR-ing the PRSNT#2's together into a transistor to short TDI/TDO across the PCIE connector?

Come to think about it, it might not be the best idea to do the switch automatically because if you insert a card which doesn't have JTAG pins connected at all (which is what most consumer addon cards do), you won't be able to program FPGA at all as the JTAG chain will not be completed. So some kind of jumper or manual switch might be a better idea.

Quote from: nockieboy on February 10, 2023, 08:07:26 pm

Speaking of which, I'm not 100% sure how the PRSNT lines are used by the PCIE host; does the host have a weak pullup on PRSNT#1 and the remaining PRSNT#2s are connected to IOs so the FPGA can detect which one is high and determine if a card is connected and, if so, whether it's a x1 or x4 card, or does something else go on there?

Here is a diagram from the spec:

Basically an add-in card is required to connect PRSNT1# to the farthest PRSNT2# pin it's got (there are multiple of them depending on a number of links). That stuff is only really required if you want to implement a hotplug, but in that case you will need to add power switches to power lines and only enable power once you detect that PRSNT1# and PRSNT2# are shorted, these pins are supposed to be shorter on the edge connector so that they will be the last to mate and first to unmate. This prevents arcs, sparks and other issues which can occur as power pins are being mated/unmated.

Quote from: nockieboy on February 10, 2023, 08:07:26 pm

In fact, ROM is going to be something I need to think about. There needs to be some form of permanent storage on the board for ROM software/data. i.e., my uCOM boots from ROM (as do most computers, I suspect), so I'd need space on an EEPROM or other form of storage (I'm open to suggestions) to hold this data and allow the soft-core CPU to boot up without having to rely on using the FPGA's internal memory. Should be simple enough to connect a FRAM or serial EEPROM to the FPGA and map its memory into wherever it would need to go for the soft-core CPU? Would it be possible to use spare room on the FPGA's config flash chip for this?

You can use whatever leftover space in a QSPI flash for that purpose. That's actually how a lot of my designs worked - bitstream includes a small bootloader which resides inside BRAM which would copy the main application code from QSPI into RAM and then launches it.

Quote from: nockieboy on February 10, 2023, 08:07:26 pm

Good point. What about instead of 0805 links, would DIP switches be okay/suitable? Would make it all much more easily configurable.

You will need one hell of a switch as it will need to be rated for at least 3 Amps. Jumpers are much easier IMHO. Especially since I don't expect that you will need to flip it often.


EEVblog Main Site	EEVblog on Youtube	EEVblog on Twitter	EEVblog on Facebook	EEVblog on Odysee