I actually bought a Tag-Connect pogo-pin connector recently, and I want to give it a try along with my Digilent HS3 programming cable to save even more by eliminating the connector (it only requires a footprint). BTW if you want to save some money, you can buy a Xilinx programmer clone on AliExpress for a few bucks, and they seem to work fairly well. I prefer using a genuine Digilent programming cable; it's about $55, which isn't a big deal considering it's a one-time investment.
There is a bit of a shortage of PHY devices right now, so pretty much any device that can talk RGMII and that you can get your hands on should be good. There are some gotchas with older RGMII v1.x devices, which required adding a clock-delay trace on the PCB, but RGMII v2.0 added an option to insert that delay internally, so we will need to check the datasheet of whichever device you end up choosing to see whether it requires a PCB clock delay or not. I managed to snag a few 88E1510-A0-NNB2C000s a couple of months ago; they are fairly expensive (though still in stock at Mouser right now), so if you find something at a more reasonable price, that should be fine too.
I think a pair of 120 pin connectors (240 pins total) should be plenty for our needs. Each connector is 53 mm long, so something like 5 x 6 cm PCB should be good. I looked up parts, and it looks like those two are in stock and in reserve (so they usually ship quickly): https://www.samtec.com/products/bse-060-01-f-d-a-tr $7 for qty 1, $6.47 for qty 10, mating part is this one: https://www.samtec.com/products/bte-060-01-f-d-a-k-tr $7.3 for qty 1, $6.75 for qty 10
BTW Artix-7 devices I've ordered have been shipped this morning, DHL says they should arrive by this Friday.
We will also need a bunch of clock chips and some crystals, but we'll look into that once we have a better idea of the whole system. At the very least we will need a main system clock on the module, a 135 MHz LVDS clock for GTP/DisplayPort, and crystals for the Ethernet PHY (typically 25 MHz) and the USB 2.0 ULPI PHY (typically 24 MHz) - all of those go on the carrier.
Vivado Project Options:
Target Device : xc7a100t-fgg484
Speed Grade : -2
HDL : verilog
Synthesis Tool : VIVADO
MIG Output Options:
Module Name : mig_7series_0
No of Controllers : 1
Selected Compatible Device(s) : xc7a35t-fgg484, xc7a50t-fgg484, xc7a75t-fgg484, xc7a15t-fgg484
FPGA Options:
System Clock Type : Differential
Reference Clock Type : Differential
Debug Port : OFF
Internal Vref : enabled
IO Power Reduction : ON
XADC instantiation in MIG : Enabled
Extended FPGA Options:
DCI for DQ,DQS/DQS#,DM : enabled
Internal Termination (HR Banks) : 50 Ohms
/*******************************************************/
/* Controller 0 */
/*******************************************************/
Controller Options :
Memory : DDR3_SDRAM
Interface : NATIVE
Design Clock Frequency : 2500 ps (400.00 MHz)
Phy to Controller Clock Ratio : 4:1
Input Clock Period : 2499 ps
CLKFBOUT_MULT (PLL) : 2
DIVCLK_DIVIDE (PLL) : 1
VCC_AUX IO : 1.8V
Memory Type : Components
Memory Part : MT41K256M16XX-107
Equivalent Part(s) : --
Data Width : 32
ECC : Disabled
Data Mask : enabled
ORDERING : Strict
AXI Parameters :
Data Width : 256
Arbitration Scheme : RD_PRI_REG
Narrow Burst Support : 0
ID Width : 4
Memory Options:
Burst Length (MR0[1:0]) : 8 - Fixed
Read Burst Type (MR0[3]) : Sequential
CAS Latency (MR0[6:4]) : 6
Output Drive Strength (MR1[5,1]) : RZQ/7
Controller CS option : Disable
Rtt_NOM - ODT (MR1[9,6,2]) : RZQ/4
Rtt_WR - Dynamic ODT (MR2[10:9]) : Dynamic ODT off
Memory Address Mapping : BANK_ROW_COLUMN
Bank Selections:
Bank: 34
Byte Group T0: Address/Ctrl-0
Byte Group T1: Address/Ctrl-1
Byte Group T2: Address/Ctrl-2
Bank: 35
Byte Group T0: DQ[0-7]
Byte Group T1: DQ[8-15]
Byte Group T2: DQ[16-23]
Byte Group T3: DQ[24-31]
Reference_Clock:
SignalName: clk_ref_p/n
PadLocation: T5/U5(CC_P/N) Bank: 34
System_Clock:
SignalName: sys_clk_p/n
PadLocation: R4/T4(CC_P/N) Bank: 34
System_Control:
SignalName: sys_rst
PadLocation: No connect Bank: Select Bank
SignalName: init_calib_complete
PadLocation: No connect Bank: Select Bank
SignalName: tg_compare_error
PadLocation: No connect Bank: Select Bank
I've often thought about using a pogo-pin connector instead of having to find room for a physical connector on the board, so would be happy to give this a try. Do you have any links to suitable products?
That reduces the cost to something more reasonable. I'll see if I can get some samples.
I haven't heard anything yet, so will keep patiently waiting.
I've never used them before, but it looks like I'm going to need to design for differential clocks rather than single-ended ones for extra stability, especially with the DDR3 controller. How many clocks do I need for the FPGA? I ran through Vivado earlier and created a DDR3 controller using MIG (for the first time ever - details below) - it's talking about a dedicated clock for the DDR3?
So, I've run MIG to make a start on setting up a DDR3 controller simulation; these are the settings I used:
Is that right or have I made any mistakes? I wasn't sure about the bank choices - it was defaulting to assigning the controls/address/data to Banks 14-16, but that's no good as it's sharing with the configuration pins, so I've moved all the DDR3-related IO to Banks 34 & 35.
Bank: 34
Byte Group T0: DQ[0-7]
Byte Group T1: DQ[8-15]
Byte Group T2: DQ[16-23]
Byte Group T3: DQ[24-31]
Bank: 35
Byte Group T1: Address/Ctrl-2
Byte Group T2: Address/Ctrl-1
Byte Group T3: Address/Ctrl-0
But it's just a preliminary pinout to give you some high-level idea, once you actually have a layout, you would go "Fixed Pin Out" route in the wizard and specify each pin assignment explicitly.

As mentioned above, I'm not sure about the reference_clock and the system_clock. I presume the system_clock is the main FPGA clock, which could be running at a different frequency to the DDR3? Is the reference clock supposed to be 400 MHz, or 1/4 of the system_clock?
I think a pair of 120 pin connectors (240 pins total) should be plenty for our needs. Each connector is 53 mm long, so something like 5 x 6 cm PCB should be good. I looked up parts, and it looks like those two are in stock and in reserve (so they usually ship quickly): https://www.samtec.com/products/bse-060-01-f-d-a-tr $7 for qty 1, $6.47 for qty 10, mating part is this one: https://www.samtec.com/products/bte-060-01-f-d-a-k-tr $7.3 for qty 1, $6.75 for qty 10
Just popping back to this post for a sec. I've had a look on the Samtec website at the two connectors you linked, but the first one (BSE-060-01-F-D-A-TR) looks like it's a 60-pin part? I'm a little confused, as the symbol imports into EasyEDA as two sub-components, each with 60 pins (the second part you linked imports as a single 120-pin connector symbol), so it could well be a 120-pin connector, but I just wanted to check with you that these are definitely both 120-pin connectors and a matching pair?
EDIT: I've checked the footprints and they've both got 120 pins and are the same width, so I guess it's just an odd difference in the way the symbols are represented for both parts.
EDIT 2: So, male connectors on the core or carrier board? Does it matter?
I bought mine here: www.tag-connect.com. For you I would recommend getting a 10-pin legged cable plus an adapter for the Xilinx programmer: https://www.tag-connect.com/debugger-cable-selection-installation-instructions/xilinx-platform-cable-usb#85_171_146:~:text=Xilinx%202mm%20to%2010%20Pin%20Plug%2Dof%2DNails%E2%84%A2%20%2D%20With%20Legs - specifically, "Xilinx 2mm to 10 Pin Plug-of-Nails™ - With Legs". BTW some of these items are carried by the likes of Digikey, so you might want to check if they have them in stock locally, because shipping directly from the company in the US will probably cost you more than from DK et al., which tend to have local warehouses. Just make sure you double-check the part numbers you order - this company has a ton of adapters for different programmers/debuggers, so it's easy to order the wrong one by mistake.
If my experience is anything to go by, you should have no problems. At no time did they ever ask me the million questions other companies typically ask (your project, volume, dates, etc.) - they just shipped what I asked for with no fuss (and even paid for express shipping from the US!), which made me a loyal customer of theirs (and an easy recommendation for others), because I know I can rely on them both for samples and for actual production parts should the likes of DK decide for some reason not to stock the part I'm after. Of course I don't abuse this service by requesting their entire inventory or anything like that, but I once tried requesting samples from one of their competitors, who asked me to fill out a three-screen-long form with a metric ton of questions, to which I just said "screw it" and bought the samples myself, because my time and sanity are worth more to me than those samples were.
Check your order status on their website. They never sent me a shipping notification, though later in the day DHL sent me a request to pay sales tax and their customs fee.
There are many ways to skin this cat. I've often used a single-ended 100 MHz base clock and a PLL to generate both clocks required by MIG. But this kind of "wastes" one extra MMCM, because MIG itself uses an MMCM. In this case you set the frequency for that clock to something like 200 MHz and select the "No Buffer" option in MIG.
An alternative, often-used solution is a 200 MHz LVDS differential clock selected right in MIG, with the "Use System Clock" option for the reference clock (I will explain below what it's for). The advantage of this approach is that you only use a single MMCM.
See, it wasn't that bad, was it?
Like I said above, I would select "5000 ps (200 MHz)" in the "Input Clock Period" drop-down; then on the next page, System Clock: "Differential" (or "No Buffer" if you want more flexibility about which pin(s) your system clock uses), Reference Clock: "Use System Clock" (this option only appears if you set your input clock period to 200 MHz).
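For anyone who wants to sanity-check the numbers, here's a rough sketch of the clocking math (illustrative only - this is not Vivado's actual algorithm; the multiplier/divider values and the VCO range are my assumptions, loosely based on DS181): a 200 MHz system clock multiplied up inside MIG's PLL and divided back down gives the 400 MHz memory clock, and the 4:1 PHY ratio then gives a 100 MHz user-interface clock.

```python
# Sketch of 7-series MIG clock derivation under assumed PLL settings.
def mig_clocks(f_in, mult, div, clkout_div, phy_ratio=4):
    """Return (vco, mem_clk, ui_clk) in MHz for a MIG PLL configuration."""
    vco = f_in * mult / div        # PLL VCO frequency
    mem_clk = vco / clkout_div     # DDR3 memory clock
    ui_clk = mem_clk / phy_ratio   # user-interface clock at the 4:1 PHY ratio
    return vco, mem_clk, ui_clk

# Assumed: 200 MHz input, CLKFBOUT_MULT=4, DIVCLK_DIVIDE=1, CLKOUT divide 2.
vco, mem_clk, ui_clk = mig_clocks(200.0, 4, 1, 2)
assert 800.0 <= vco <= 1866.0  # assumed Artix-7 PLLE2 VCO range (DS181)
print(mem_clk, ui_clk)  # 400.0 100.0
```

The takeaway is just that the 200 MHz input keeps the VCO inside its legal range without burning a second MMCM on the way to the 400 MHz memory clock.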
As for the pinout, I would do it like this:

Bank: 34
But it's just a preliminary pinout to give you some high-level idea, once you actually have a layout, you would go "Fixed Pin Out" route in the wizard and specify each pin assignment explicitly.
Byte Group T0: DQ[0-7]
Byte Group T1: DQ[8-15]
Byte Group T2: DQ[16-23]
Byte Group T3: DQ[24-31]
Bank: 35
Byte Group T1: Address/Ctrl-2
Byte Group T2: Address/Ctrl-1
Byte Group T3: Address/Ctrl-0
The number in the part # shows the number of pins per row. I've tripped over that many times already, so I typically just count the number of pins on their 3D model.
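To illustrate that naming convention (a heuristic only - Samtec part numbers have more fields than this, so always verify against the drawing): the numeric field is positions per row, and the "D" field means double row.

```python
# Heuristic decode of the pin count from a Samtec-style part number,
# assuming the second field is positions PER ROW and "D" = double row.
def samtec_total_pins(part_number: str) -> int:
    fields = part_number.upper().split("-")
    per_row = int(fields[1])          # e.g. "060" -> 60 positions per row
    rows = 2 if "D" in fields else 1  # "D" field indicates a double row
    return per_row * rows

print(samtec_total_pins("BSE-060-01-F-D-A-TR"))    # 120
print(samtec_total_pins("BTE-060-01-F-D-A-K-TR"))  # 120
```

So both parts linked earlier really are 120-pin connectors despite the "060" in the part number.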
As for symbols and footprints, I typically make my own just to be sure they're correct (I use their 3D model to verify, and if I have any doubts I print the footprint on a piece of paper and physically place the connector on top to see if it's OK), so I'm not really sure what theirs look like.
As for which one goes where, I don't think it matters. I noticed on this module BSE goes on a module and BTE on a carrier, so let's make it the same - presumably those guys knew what they were doing.
While you are working on the connector footprints, please be SUPER careful about which pin number mates with which pin number on the mating connector. I've screwed this up an untold number of times, forgetting that connectors mate with mirrored pin numbering. So try to number the pins on both footprints such that pin 1 on one side mates with pin 1 on the mating side, otherwise you are going to have a disaster on your hands.
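As a toy illustration of that mirroring pitfall (the actual mapping depends entirely on the specific connector pair and its orientation - always check the datasheets): if you naively use identical pin numbering on both footprints, flipping the mating connector face-to-face reverses the columns within each row.

```python
# Toy model: pins numbered 1..N along each row of a dual-row connector.
# Mating flips the connector left-right, mirroring columns within a row.
def mated_pin(pin, pins_per_row, rows=2):
    """Which pin a given pin physically touches after the left-right flip."""
    row, col = divmod(pin - 1, pins_per_row)
    mirrored_col = pins_per_row - 1 - col
    return row * pins_per_row + mirrored_col + 1

# With identical numbering on both 2x60 footprints, pin 1 lands on pin 60:
print(mated_pin(1, 60))   # 60
print(mated_pin(60, 60))  # 1
```

Hence the advice above: number one footprint so that pin 1 mates with pin 1, rather than reusing the same numbering on both sides.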
I know it's a one-off cost, but I'm a little reluctant to spend $50+ on what is basically a bit of ribbon cable, some pins and springs. Also, thinking about it, these are intended to be dev boards, so we should make it as easy as possible to program them. I think I'm probably going to stick with an SMD or TH pin header for the JTAG port, if space allows.
Well, I just sent them a polite e-mail this morning with a couple of sentences outlining my project and asking if they were willing to supply some test components. Two of each connector are now in the post, which will allow me to make one carrier/core board.
It's left Hong Kong and is on its way, apparently.
If I'm reading the datasheet correctly, the XC7A100T has 6 of them? That's a couple more PLLs than the Cyclone or MAX10. I'm not sure how significant the system clock speed will be for the GPU - perhaps @BrianHG could give me a steer on valid system clock speeds to use? I seem to recall that a 27MHz clock would fit nicely with the video circuitry, but the MAX10 and Cyclone IV GPUs both ran fine on 50MHz system clocks. I guess it boils down to using a PLL to create the 200MHz clock from a slower system clock, or a slower video clock with a PLL from a 200MHz system clock.
I haven't tried it yet! Not sure if the project is set up correctly or anything tbh, but one downside to Xilinx is that the project is over 6MB in size, even after zipping, so I haven't included it with this post due to size constraints. At least Intel/Altera projects compress down to hundreds of KB without the bitstreams.
Yes, I'll be checking and double-checking the PCB design when I get that far. I really don't want to mess that up!
Okay, I've attached the latest power supply schematic.
I'm running a 1.0V MGT supply from the VCCINT output - hopefully that isn't criminally insane from an electrical engineer's point of view - it saves me adding yet another power supply chip and supporting discretes...
Now, as far as programming headers go, the genuine Xilinx programmer (and the Digilent HS3 cable) uses a 2 mm pitch connector, and some of its clones retain that too, while others use a regular 0.1" one, and yet other clones have both options (this is what my clone has), so you might want to check which one you have so that you fit the right header. I also prefer using a header with a polarized shroud, which prevents connecting the cable the wrong way around (and likely destroying it in the process, because the opposite pins are all grounded). The easiest way to check is to get any 0.1" open header and (with everything unpowered, of course) try connecting the programmer to it to see if it fits. If so, you will need a 0.1" pitch connector; otherwise, a 2 mm pitch one.
Well, I just sent them a polite e-mail this morning with a couple of sentences outlining my project and asking if they were willing to supply some test components. Two of each connector are now in the post, which will allow me to make one carrier/core board.

Great! See, I told you they're nice people. Though I never had to send them any emails - I always used the "Get a Free Sample" buttons on their website and ordered that way. But hey - whichever way works is good in my book.
There is a File -> Project -> Archive command, and you can even clean up the resulting archive somewhat, but MIG generates a lot of stuff (there are actually two designs generated - a user design and an example design, all in the output). Fortunately you can simply save MIG's *.prj file along with the constraints file and re-generate the whole thing on the receiving end (the "Verify Pin Changes and Update Design" branch in the wizard flow).
Okay, I've attached the latest power supply schematic.

No, you haven't.
I'm running a 1.0V MGT supply from the VCCINT output - hopefully that isn't criminally insane from an electrical engineer's point of view - it saves me adding yet another power supply chip and supporting discretes...

According to the datasheet of the MPM3683-7 module, it's supposed to have low enough ripple for powering GTPs (<10 mVpp). Maybe we should include a provision for a simple Pi filter (a bead with a cap on each end of it - for example this one, but any with a DC resistance of <50 mOhm and a DC current rating of >1 A will do) plus some local capacitance, just in case we need it; you can simply short out the bead with a zero-ohm resistor during initial assembly.
My clone has an adaptor with 3 different connectors and cables - 2x5 pins @2.54mm, 2x7 @2mm and a 2.54mm single-row pin header (for custom connections, I guess).
Yes, very friendly. I could have pushed the 'Get Free Sample' button, but I thought I'd drop them a line to say hello as well. Even after all these years of the internet, I still get a bit excited about talking to people in different time zones/countries.
That's handy to know - I'm going to want to share the project with you at some point to verify my results or - more likely - work out what I'm doing wrong.
All I did was create a new blank project, set the target device to XC7A100T-2-FGG484 and ran through MIG. There's no top level entity or anything like that. I guess at this stage I'm just looking for pin assignments for the DDR3, but pretty soon I'm going to need to start integrating BrianHG's DDR3 controller so that I can start simulating it.
Darn it. I've been having trouble with this forum recently - I've noticed a drag'n'drop box has appeared in the attachments section for a new post, which seems to cause more problems for me than it solves. I definitely attached the PDF yesterday, it even said it had uploaded. I'll try again with this post, but instead of drag/drop I'll just select it in the file explorer via the 'Choose file' button.
Never heard of a Pi filter before, so that's something new to learn about! Well, hopefully you'll get the attachment this time and can see what I've done so far.
EDIT: A question regarding the power supplies. All of the MPM power chips use a resistor divider to set their output voltage. The datasheets/MPM design software quote values for those resistors, which I have used in the attached schematic. I note that the MPM3683-7 that generates VCCINT and 1V0_MGT uses 2K and 3K resistors (R4 and R12) - is there any reason I can't swap them for 200K and 300K resistors, to use the same parts as the MPM3833s?
Everything encircled in purple runs at the pixel clock rate.
Everything encircled in purple runs at the pixel clock rate.

Oh wow - it's more complicated than I thought!
I've got a question though - why do you have to run so much at the video pixel clock? Wouldn't it be better to run it at a faster clock and write the resulting image into DDR, and then have a totally asynchronous process, running at the video pixel clock, which reads that image from the framebuffer (in DDR) and outputs it via HDMI/VGA/DisplayPort/whatever? This is how all video cards work; it means they don't depend on the current video output mode, and it's also trivial to do double- and triple-buffering to alleviate tearing and other artifacts which occur when frame generation runs asynchronously to the output. It's also trivial to give a CPU direct access to the framebuffer if need be with this approach. But the most important advantage is that you can run your actual frame generation at a faster clock to get more done in the same amount of time, and in general it seems like a more sensible approach - for example, if nothing in the framebuffer changes, the whole frame-generation pipeline can just idle, as opposed to churning out the same frames over and over again. Or am I missing something here?
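For clarity, the double/triple-buffering scheme I mean can be sketched in a few lines (buffer indices only, no real video - just the swap logic, and of course a hardware version would be RTL, not Python):

```python
# Sketch of triple buffering: the renderer always has a free buffer to draw
# into, scanout always reads a completed frame, swaps happen at vsync.
class TripleBuffer:
    def __init__(self):
        self.front = 0     # buffer being scanned out to the display
        self.ready = None  # completed frame waiting for vsync (if any)
        self.back = 1      # buffer the renderer is drawing into

    def render_done(self):
        # The old back buffer becomes the "ready" frame; reuse the stale
        # ready frame (or the untouched third buffer) as the new back buffer.
        if self.ready is not None:
            new_back = self.ready  # drop the stale frame, reuse its buffer
        else:
            new_back = ({0, 1, 2} - {self.front, self.back}).pop()
        self.ready, self.back = self.back, new_back

    def vsync(self):
        # At vsync, promote the newest completed frame to scanout.
        if self.ready is not None:
            self.front, self.ready = self.ready, None

fb = TripleBuffer()
fb.render_done()  # renderer finished a frame in buffer 1
fb.vsync()
print(fb.front)   # 1 - scanout now reads the new frame, no tearing
```

The renderer never blocks on scanout and the display never sees a half-drawn frame, which is the whole point of decoupling frame generation from the pixel clock.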
It's not a renderer. It's a real-time multi-window, multi-depth, multi-tile memory-to-raster display engine. No tearing. You can double- or triple-buffer each layer individually if you like, but even that isn't needed with my code for the smoothest, silky scrolling.
Set the maximum-layers option to 1, turn off the palette and text, and all that's left enabled is a single line buffer going straight to the video DAC output port. Then you need to software-render or simulate all the layers and window blending.
This was the easiest solution for the Z80 to handle multiple display layers without any drawing commands, or waiting for a rendering engine to read multiple bitmaps and tile maps, blend them together to construct a buffered frame, then, when ready, do a frame swap, and redo all of this every V-sync.
You want text, turn on a font layer. You want backgrounds, turn on a 32-bit layer. You want meters to blend in, then out, turn those layers on and off. You want additional overlays, or sprites, turn those layers on and off, set their coordinates, size and other metrics, like which RAM location (i.e. which frame of the sprite's animation).
You want 4bit layers mixed with 8bit layers with different palettes for each 8bit layer, plus a few 32bit layers, no problem. All with alpha blended shading.
This was all done in around 18.3k LE for 16-layer graphics on an FPGA which can barely maintain 200 MHz, yet my core was running it fine at 100 MHz; only the pixel clock ran at 148.5 MHz, consuming under 2 watts including the DDR3 controller and DVI transmitter.
Currently, with a single 16-bit DDR3 running at 800 MHz
This is why I was pushing for a SODIMM 64bit module instead of 2x16bit DDR3 chips.
This is why I was pushing for a SODIMM 64bit module instead of 2x16bit DDR3 chips.
Those values are always a compromise between noise immunity and efficiency - the higher the values, the smaller the current through them, which improves efficiency but also increases noise sensitivity. So maybe we can meet in the middle and use 20k and 30k in both cases? I dunno, I tend to prefer sticking to the recommended values, because resistors cost next to nothing.
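To make the divider math explicit (a sketch - the 0.6 V feedback reference is my assumption for these MPS parts, so check the actual datasheets; the point is that only the ratio sets the output voltage):

```python
# Feedback-divider math for a buck regulator: Vout = Vref * (1 + Rtop/Rbot).
# The 0.6 V reference is an ASSUMED value for the MPM parts, not verified.
V_REF = 0.6  # assumed feedback reference voltage, volts

def vout(r_top, r_bottom):
    """Output voltage set by the feedback divider."""
    return V_REF * (1 + r_top / r_bottom)

print(round(vout(2_000, 3_000), 6))      # 1.0 - the VCCINT rail
print(round(vout(200_000, 300_000), 6))  # 1.0 - same ratio, same rail
```

So 200K/300K would indeed give the same rail as 2K/3K; the absolute values only trade divider current (efficiency) against noise pickup, exactly the compromise described above.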
One important point is to do your very best to follow layout recommendations from datasheets or copying their evaluation boards' layout. A lot in DC-DC converter performance depends on a proper layout, so try sticking to those recommendations as close as possible.
That said, there is something weird going on with that module's layout. In the pinout description they recommend creating a copper pour for the SW nodes, and in their devboard's schematics they show these pins as connected, yet in the layout's per-layer pictures each SW node is isolated. I've sent them an enquiry about that; let's see what they say. I guess it's best to do what the layout shows, as opposed to what the text says, but let's see if/what they respond.
One more question for you - do you have a code name for this project/module/PCB? I like to come up with cool code names for my projects; for example, the board in my signature is "Sparta", so saying "Sparta-50" ("50" stands for the FPGA density used) is much easier than referring to it as "a beginner-friendly FPGA board with a Spartan-7 FPGA and DDR2 memory".
I obviously missed the significance of that and didn't look into using SODIMM at all. Correct me if I'm wrong, but a 64-bit SODIMM module would just require a single SODIMM slot to be soldered to the board? Even so, they're 200-pin+ connectors - isn't that going to eat IO resources and leave little for anything else?
As we're designing from scratch, what would be your preferred memory setup? As asmi has pointed out, I don't think a SODIMM module is going to be an option (as much as I like the idea of not having to solder memory chips!). Are two 16-bit DDR3's going to be enough of an upgrade, or can we go further?
Haha, yes I do. I'm calling it the XCAT-100. It's just the FPGA part name with the 7 removed and one letter moved up from the end to make it (almost) an English word. I can design a logo with a cat's head and an X later.
What values would be appropriate for the Pi filter?
0.47 uF on the input side (before the bead), and 4.7 uF on the output side. The point of this filter is to provide a shunt path for high-frequency noise from the VCCINT pins before it reaches the GTPs' power pins, and at the same time provide enough capacitance after the filter to keep the voltage rail within spec once the noise subsides. There will also be two more 0.1 uF caps and another 4.7 uF cap on the MGTAVCC side, as per the recommendations in UG482; all of that should be enough.
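For a rough feel of what this filter does (very much a ballpark - a ferrite bead is not an ideal inductor, its impedance is resistive around its peak, and the 1 uH low-frequency inductance used here is an assumed value, not from any specific part):

```python
# Ballpark corner frequency of the Pi filter, treating the bead as an ideal
# inductor (it isn't) with an ASSUMED 1 uH low-frequency inductance.
import math

def lc_corner_hz(l_henry, c_farad):
    """Resonant/corner frequency of a lumped LC low-pass section."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))

f_c = lc_corner_hz(1e-6, 4.7e-6)  # bead vs the 4.7 uF output cap
print(round(f_c / 1e3, 1), "kHz")  # on the order of tens of kHz
```

In other words, even under these crude assumptions the filter starts rolling off well below the switching frequency of the converter, which is what you want for keeping converter hash off the MGT rail.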
What's the L1 on that schematic?
I obviously missed the significance of that and didn't look into using SODIMM at all. Correct me if I'm wrong, but a 64-bit SODIMM module would just require a single SODIMM slot to be soldered to the board? Even so, they're 200-pin+ connectors - isn't that going to eat IO resources and leave little for anything else?
As we're designing from scratch, what would be your preferred memory setup? As asmi has pointed out, I don't think a SODIMM module is going to be an option (as much as I like the idea of not having to solder memory chips!). Are two 16-bit DDR3's going to be enough of an upgrade, or can we go further?

Maybe we can settle for a sort of hybrid approach - use a 32-bit interface for layers and composition, and then have a dedicated interface for framebuffers? In that case a 16-bit interface should be enough for 2 parallel streams (one to write the frame, another to read a frame and output it to HDMI), and we can implement it in a single IO bank, leaving about 130 or so IO pins (plus whatever we can recover from the 32-bit interface banks if we go that route).
These extra caps after the pi filter - are these from Table 5-7 on page 230 of UG482?
I take it these are different to the 0.1uF caps mentioned in Table 5-11 on page 233?
That's there to isolate VCCINT noise from 1V0_MGT. It was on the reference schematics, and I've used the same method previously on the Cyclone GPU PCB to filter noise out of the line. I guess the pi filter does a better job, so it can be removed?
What? How does handicapping your bandwidth, or forcibly dividing it evenly between 2 different controllers, save IOs or logic elements, or improve global memory-access design, vs 1 controller at double speed?
Say I decode/play back a video. On that output, I want to perform multiple Lanczos scaling steps with both X&Y convolutional filtering, plus a few Z-frames of de-noise filtering before output and analysis - maybe even motion-adaptive de-interlacing. Your 2 separate controllers cut my image-processing bandwidth in half, as it will sit in one memory controller before being exported to the frame buffer, while a portion of that frame buffer's bandwidth may never be used. Especially if my image is a video larger in resolution than the available memory on the frame-buffer side, where it may be mandatory to pre-process everything in one controller and only export the visible area to the other controller's RAM.
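To put rough numbers on this argument (peak figures only, assuming DDR3-1600, i.e. 1600 MT/s - sustained bandwidth is always lower):

```python
# Peak-bandwidth comparison: one 32-bit controller vs two 16-bit controllers,
# at an ASSUMED DDR3-1600 data rate. Real sustained bandwidth is lower.
def peak_gbytes_per_s(width_bits, mt_per_s):
    """Theoretical peak bandwidth in GB/s for a DDR interface."""
    return width_bits / 8 * mt_per_s / 1000

single_32 = peak_gbytes_per_s(32, 1600)    # one unified 32-bit controller
dual_16 = 2 * peak_gbytes_per_s(16, 1600)  # two independent 16-bit ones

# Totals are identical, but with two controllers a single processing stream
# can only ever reach half of the total:
print(single_32, dual_16, peak_gbytes_per_s(16, 1600))  # 6.4 6.4 3.2
```

This is the crux of the disagreement: the split design has the same total bandwidth on paper, but any one workload (like the filtering pipeline described above) is capped at the bandwidth of whichever controller it lives in.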