Author Topic: Planning/design/review for a 6-layer Xilinx Artix-7 board for DIY computer.  (Read 36334 times)


Offline nockieboy (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
I actually bought a Tag-Connect pogo-pin connector recently, and I want to give it a try along with my Digilent HS3 programming cable to save even more by omitting a connector (it only requires a footprint). BTW if you want to save some money, you can buy a Xilinx programmer clone on AliExpress for a few bucks and they seem to work fairly OK. I prefer using a genuine Digilent programming cable; it's about $55, which isn't a big deal considering it's a one-time investment.

I've often thought about using a pogo-pin connector instead of having to find room for a physical connector on the board, so would be happy to give this a try.  Do you have any links to suitable products?

I have a cheap Xilinx programmer - got one a couple of years ago for the Spartan-6 board I'd bought, but I've never used it.

There is a bit of a shortage of PHY devices right now, so pretty much any device that can talk RGMII and that you can get your hands on should be good. There are some gotchas with older RGMII v1.x devices, which required adding a clock delay on the PCB (extra trace length), but RGMII v2.0 added an option to insert that delay internally, so we will need to check the datasheet of whichever device you end up choosing to see whether it needs the PCB clock delay or not. I managed to snag a few 88E1510-A0-NNB2C000s a couple of months ago; these are fairly expensive (though still in stock at Mouser right now), so if you find something at a more reasonable price, it should still be fine.

Okay, well don't forget I'm designing for the core board primarily so I'm not overly concerned about lack of parts availability for features that aren't on my 'must have' list for the carrier board.  So long as the design for the core board doesn't exclude these 'nice to have' items later, then that's good enough for me.

I think a pair of 120 pin connectors (240 pins total) should be plenty for our needs. Each connector is 53 mm long, so something like 5 x 6 cm PCB should be good. I looked up parts, and it looks like those two are in stock and in reserve (so they usually ship quickly): https://www.samtec.com/products/bse-060-01-f-d-a-tr $7 for qty 1, $6.47 for qty 10, mating part is this one: https://www.samtec.com/products/bte-060-01-f-d-a-k-tr $7.3 for qty 1, $6.75 for qty 10

That reduces the cost to something more reasonable.  I'll see if I can get some samples. :)

BTW the Artix-7 devices I ordered were shipped this morning; DHL says they should arrive by this Friday.

I haven't heard anything yet, so will keep patiently waiting. :)

We will also need a bunch of clock chips and some crystals, but we'll look into that once we have a better idea of the whole system. At the very least we will need a main system clock on the module, a 135 MHz LVDS clock for GTP/DisplayPort, and some crystals for the Ethernet PHY (typically 25 MHz) and a USB 2.0 ULPI PHY (typically 24 MHz) - all of those are on the carrier.

I've never used them before, but it looks like I'm going to need to design for differential clocks rather than single-ended ones for extra stability, especially with the DDR3 controller.  How many clocks do I need for the FPGA?  I ran through Vivado earlier and created a DDR3 controller using MIG (for the first time ever - details below) - it's talking about a dedicated clock for the DDR3?

So, I've run MIG to make a start on setting up a DDR3 controller simulation; these are the settings I used:

Code: [Select]
Vivado Project Options:
   Target Device                   : xc7a100t-fgg484
   Speed Grade                     : -2
   HDL                             : verilog
   Synthesis Tool                  : VIVADO

MIG Output Options:
   Module Name                     : mig_7series_0
   No of Controllers               : 1
   Selected Compatible Device(s)   : xc7a35t-fgg484, xc7a50t-fgg484, xc7a75t-fgg484, xc7a15t-fgg484

FPGA Options:
   System Clock Type               : Differential
   Reference Clock Type            : Differential
   Debug Port                      : OFF
   Internal Vref                   : enabled
   IO Power Reduction              : ON
   XADC instantiation in MIG       : Enabled

Extended FPGA Options:
   DCI for DQ,DQS/DQS#,DM          : enabled
   Internal Termination (HR Banks) : 50 Ohms

/*******************************************************/
/*                  Controller 0                       */
/*******************************************************/

Controller Options :
   Memory                        : DDR3_SDRAM
   Interface                     : NATIVE
   Design Clock Frequency        : 2500 ps (400.00 MHz)
   Phy to Controller Clock Ratio : 4:1
   Input Clock Period            : 2499 ps
   CLKFBOUT_MULT (PLL)           : 2
   DIVCLK_DIVIDE (PLL)           : 1
   VCC_AUX IO                    : 1.8V
   Memory Type                   : Components
   Memory Part                   : MT41K256M16XX-107
   Equivalent Part(s)            : --
   Data Width                    : 32
   ECC                           : Disabled
   Data Mask                     : enabled
   ORDERING                      : Strict

AXI Parameters :
   Data Width                    : 256
   Arbitration Scheme            : RD_PRI_REG
   Narrow Burst Support          : 0
   ID Width                      : 4

Memory Options:
   Burst Length (MR0[1:0])          : 8 - Fixed
   Read Burst Type (MR0[3])         : Sequential
   CAS Latency (MR0[6:4])           : 6
   Output Drive Strength (MR1[5,1]) : RZQ/7
   Controller CS option             : Disable
   Rtt_NOM - ODT (MR1[9,6,2])       : RZQ/4
   Rtt_WR - Dynamic ODT (MR2[10:9]) : Dynamic ODT off
   Memory Address Mapping           : BANK_ROW_COLUMN

Bank Selections:
Bank: 34
Byte Group T0: Address/Ctrl-0
Byte Group T1: Address/Ctrl-1
Byte Group T2: Address/Ctrl-2
Bank: 35
Byte Group T0: DQ[0-7]
Byte Group T1: DQ[8-15]
Byte Group T2: DQ[16-23]
Byte Group T3: DQ[24-31]

Reference_Clock:
SignalName: clk_ref_p/n
PadLocation: T5/U5(CC_P/N)  Bank: 34

System_Clock:
SignalName: sys_clk_p/n
PadLocation: R4/T4(CC_P/N)  Bank: 34

System_Control:
SignalName: sys_rst
PadLocation: No connect  Bank: Select Bank
SignalName: init_calib_complete
PadLocation: No connect  Bank: Select Bank
SignalName: tg_compare_error
PadLocation: No connect  Bank: Select Bank

Is that right, or have I made any mistakes?  I wasn't sure about the bank choices - it was defaulting to assigning the control/address/data to Banks 14-16, but that's no good as they share pins with the configuration interface, so I've moved all the DDR3-related IO to Banks 34 & 35.

As mentioned above, not sure about reference_clock and system_clock.  I presume system_clock is the main FPGA clock, which could be running at a different frequency to the DDR3?  Is the reference clock supposed to be 400MHz or 1/4 of the system_clock?

 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2733
  • Country: ca
I've often thought about using a pogo-pin connector instead of having to find room for a physical connector on the board, so would be happy to give this a try.  Do you have any links to suitable products?
I bought mine here: www.tag-connect.com. For you I would recommend getting the 10-pin legged cable plus the adapter for the Xilinx programmer: https://www.tag-connect.com/debugger-cable-selection-installation-instructions/xilinx-platform-cable-usb#85_171_146:~:text=Xilinx%202mm%20to%2010%20Pin%20Plug%2Dof%2DNails%E2%84%A2%20%2D%20With%20Legs - specifically, "Xilinx 2mm to 10 Pin Plug-of-Nails™ - With Legs". BTW some of these things are carried by the likes of Digikey, so you might want to check if they have some in stock locally, because shipping from that company directly from the US will probably cost you more than from DK et al., which tend to have local warehouses. Just make sure you double-check the part numbers you are ordering - this company has a ton of adapters for different programmers/debuggers, so it's easy to order the wrong one by mistake.

That reduces the cost to something more reasonable.  I'll see if I can get some samples. :)
If my experience is anything to go by, you should have no problems. At no time did they ask me the million questions other companies typically ask (your project, volume, dates, etc.) - they just shipped what I asked for with no fuss (and even paid for express shipping from the US!), which made me a loyal customer of theirs (and an easy recommendation for others), because I know I can rely on them for both samples and actual production parts, should the likes of DK decide for some reason not to stock the part I'm after. Of course I don't abuse this service by requesting all their inventory or anything like that, but I did try requesting samples from one of their competitors, who asked me to fill out a three-screen-long form with a metric ton of questions, to which I just said "screw it" and bought the samples myself, because my time and sanity are worth more to me than those samples were.

I haven't heard anything yet, so will keep patiently waiting. :)
Check your order status on their website. They never sent me a shipping notification, though later in the day DHL sent me a request to pay sales tax and their customs fee.

I've never used them before, but it looks like I'm going to need to design for differential clocks rather than single-ended ones for extra stability, especially with the DDR3 controller.  How many clocks do I need for the FPGA?  I've run through Vivado earlier and created a DDR3 controller using MIG (for the first time ever - details below) - it's talking about a dedicated clock for the DDR3?
There are many ways to skin this cat. I often used a single-ended 100 MHz base clock and a PLL to generate both clocks required by MIG. But this kind of "wastes" one extra MMCM, because MIG itself uses an MMCM.  In this case you set that clock's frequency to something like 200 MHz and select the "No Buffer" option in the MIG.
An alternative, often-used solution is a 200 MHz LVDS differential clock selected right in the MIG, with the "Use System Clock" option for the reference clock (I will explain below what it's for). The advantage of this approach is that you only use a single MMCM.
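To make the first option concrete, here's a rough Verilog sketch (untested) of the base-clock-plus-PLL arrangement. It assumes the MIG core was generated with both System Clock and Reference Clock set to "No Buffer", so it exposes sys_clk_i / clk_ref_i inputs (UG586 naming); the multiplier/divider values are just an example for a 100 MHz input:
Code: [Select]
// Rough, untested sketch: 100 MHz single-ended base clock -> 200 MHz for MIG.
// Assumes System Clock = "No Buffer" and Reference Clock = "No Buffer" in MIG,
// so the core exposes sys_clk_i / clk_ref_i inputs.
module mig_clk_gen (
    input  wire clk100,   // 100 MHz board oscillator
    output wire clk200    // feed this to both sys_clk_i and clk_ref_i of the MIG
);
    wire clk200_pll, pll_fb;

    PLLE2_BASE #(
        .CLKIN1_PERIOD  (10.0),  // 100 MHz in
        .CLKFBOUT_MULT  (12),    // VCO = 100 MHz * 12 = 1200 MHz
        .DIVCLK_DIVIDE  (1),
        .CLKOUT0_DIVIDE (6)      // 1200 MHz / 6 = 200 MHz
    ) pll_sys (
        .CLKIN1   (clk100),
        .CLKFBIN  (pll_fb),
        .CLKFBOUT (pll_fb),
        .CLKOUT0  (clk200_pll),
        .CLKOUT1  (), .CLKOUT2 (), .CLKOUT3 (), .CLKOUT4 (), .CLKOUT5 (),
        .LOCKED   (),
        .RST      (1'b0),
        .PWRDWN   (1'b0)
    );

    // Put the PLL output on a global buffer before handing it to the MIG.
    BUFG bufg_clk200 (.I(clk200_pll), .O(clk200));
endmodule
With the second option (a differential 200 MHz clock straight into the MIG) none of this is needed - the wizard instantiates the input buffer itself and shares the same clock with the delay controller.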

So, I've run MIG to make a start on setting up a DDR3 controller simulation; these are the settings I used:

Is that right or have I made any mistakes?  I wasn't sure about the bank choices - it was defaulting to assigning the controls/address/data to Banks 14-16, but that's no good as it's sharing with the configuration pins, so I've moved all the DDR3-related IO to Banks 34 & 35.
See, it wasn't that bad, was it? ;)
Like I said above, I would select "5000 ps (200 MHz)" in the "Input Clock Period" drop-down, and on the next page System Clock: "Differential" (or "No Buffer" if you want more flexibility as to which pin(s) your system clock will be on), Reference Clock: "Use System Clock" (this option only appears if you set the input clock period to 200 MHz).
As for pinout, I would do it like this:
Code: [Select]
Bank: 34
Byte Group T0: DQ[0-7]
Byte Group T1: DQ[8-15]
Byte Group T2: DQ[16-23]
Byte Group T3: DQ[24-31]
Bank: 35
Byte Group T1: Address/Ctrl-2
Byte Group T2: Address/Ctrl-1
Byte Group T3: Address/Ctrl-0
But it's just a preliminary pinout to give you a high-level idea; once you actually have a layout, you would go the "Fixed Pin Out" route in the wizard and specify each pin assignment explicitly.

As mentioned above, not sure about reference_clock and system_clock.  I presume system_clock is the main FPGA clock, which could be running at a different frequency to the DDR3?  Is the reference clock supposed to be 400MHz or 1/4 of the system_clock?
The System Clock is the one from which the actual memory interface clock is derived, while the Reference Clock drives a special hardware block - the delay controller (IDELAYCTRL) - which the IO delay blocks use as a time reference; that clock has to have a fixed frequency of 200 MHz (for some devices and speed grades it can also be 300 or 400 MHz). MIG outputs ui_clk, which is the clock your HDL has to use to interact with the controller; it is either 1/2 or 1/4 of the memory frequency.

Offline nockieboy (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
I think a pair of 120 pin connectors (240 pins total) should be plenty for our needs. Each connector is 53 mm long, so something like 5 x 6 cm PCB should be good. I looked up parts, and it looks like those two are in stock and in reserve (so they usually ship quickly): https://www.samtec.com/products/bse-060-01-f-d-a-tr $7 for qty 1, $6.47 for qty 10, mating part is this one: https://www.samtec.com/products/bte-060-01-f-d-a-k-tr $7.3 for qty 1, $6.75 for qty 10

Just popping back to this post for a sec.  I've had a look on the Samtec website at the two connectors you've linked, but the first one (BSE-060-01-F-D-A-TR) looks like it's a 60-pin part?  I'm a little confused, as the symbol imports into EasyEDA with two sub-components, each with 60 pins (the 2nd part you linked imports as a single 120-pin connector symbol), so it could well be a 120-pin connector, but I just wanted to check with you that these are definitely both 120-pin connectors and a matching pair?

EDIT: I've checked the footprints and they've both got 120 pins and are the same width, so I guess it's just an odd difference in the way the symbols are represented for both parts. :-//
EDIT 2:  So, male connectors on the core or carrier board?  Does it matter?
« Last Edit: January 04, 2023, 07:07:43 pm by nockieboy »
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2733
  • Country: ca
Just popping back to this post for a sec.  I've had a look on the Samtec website at the two connectors you've linked, but the first one (BSE-060-01-F-D-A-TR) looks like it's a 60-pin part?  Am a little confused as the symbol imports into EasyEDA with two sub-components, each with 60-pins (the 2nd part you linked imports as a single 120-pin connector symbol), so it could well be a 120-pin connector, but I just wanted to check with you that these are definitely both 120-pin connectors and a matching pair?

EDIT: I've checked the footprints and they've both got 120 pins and are the same width, so I guess it's just an odd difference in the way the symbols are represented for both parts. :-//
EDIT 2:  So, male connectors on the core or carrier board?  Does it matter?
The number in the part # shows the number of pins per row. I've tripped over that many times already, so I typically just count the number of pins on their 3D model ;D
As for symbols and footprints, I typically make my own just to be sure they are correct (and use their 3D model to make sure it's right, and if I have any doubts I'll print the footprint on a piece of paper and physically place the connector on top to see if it's OK), so I'm not really sure what theirs look like.
As for which one goes where, I don't think it matters. I noticed on this module BSE goes on a module and BTE on a carrier, so let's make it the same - presumably those guys knew what they were doing.

While you are working on the connector footprints, please be SUPER careful about which pin number mates to which pin number on the mating connector. I've screwed this up an untold number of times, forgetting that connectors mate with mirrored pin numbers. So try to number the pins on both footprints such that pin 1 of one side mates to pin 1 of the mating side, otherwise you are going to have a disaster on your hands.
« Last Edit: January 05, 2023, 02:44:52 am by asmi »
 

Offline nockieboy (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
I bought mine here: www.tag-connect.com. For you I would recommend getting the 10-pin legged cable plus the adapter for the Xilinx programmer: https://www.tag-connect.com/debugger-cable-selection-installation-instructions/xilinx-platform-cable-usb#85_171_146:~:text=Xilinx%202mm%20to%2010%20Pin%20Plug%2Dof%2DNails%E2%84%A2%20%2D%20With%20Legs - specifically, "Xilinx 2mm to 10 Pin Plug-of-Nails™ - With Legs". BTW some of these things are carried by the likes of Digikey, so you might want to check if they have some in stock locally, because shipping from that company directly from the US will probably cost you more than from DK et al., which tend to have local warehouses. Just make sure you double-check the part numbers you are ordering - this company has a ton of adapters for different programmers/debuggers, so it's easy to order the wrong one by mistake.

I know it's a one-off cost, but I'm a little reluctant to spend $50+ on what is basically a bit of ribbon cable, some pins and springs.  Also, thinking about it, these are intended to be dev boards so we should make it as easy as possible to program them.  I think I'm probably going to stick with an SMD or TH pin header for the JTAG port if space allows.

If my experience is anything to go by, you should have no problems. At no time did they ask me the million questions other companies typically ask (your project, volume, dates, etc.) - they just shipped what I asked for with no fuss (and even paid for express shipping from the US!), which made me a loyal customer of theirs (and an easy recommendation for others), because I know I can rely on them for both samples and actual production parts, should the likes of DK decide for some reason not to stock the part I'm after. Of course I don't abuse this service by requesting all their inventory or anything like that, but I did try requesting samples from one of their competitors, who asked me to fill out a three-screen-long form with a metric ton of questions, to which I just said "screw it" and bought the samples myself, because my time and sanity are worth more to me than those samples were.

Well, I just sent them a polite e-mail this morning with a couple of sentences outlining my project and asking if they were willing to supply some test components.  Two of each connector are now in the post, which will allow me to make one carrier/core board. :-+

Check your order status on their website. They never sent me a shipping notification, though later in the day DHL sent me a request to pay sales tax and their customs fee.

It's left Hong Kong and is on its way, apparently. :popcorn:

There are many ways to skin this cat. I often used a single-ended 100 MHz base clock and a PLL to generate both clocks required by MIG. But this kind of "wastes" one extra MMCM, because MIG itself uses an MMCM.  In this case you set that clock's frequency to something like 200 MHz and select the "No Buffer" option in the MIG.
An alternative, often-used solution is a 200 MHz LVDS differential clock selected right in the MIG, with the "Use System Clock" option for the reference clock (I will explain below what it's for). The advantage of this approach is that you only use a single MMCM.

If I'm reading the datasheet correctly, the XC7A100T has 6 of them?  That's a couple more PLLs than the Cyclone or MAX10.  I'm not sure how significant the system clock speed will be for the GPU - perhaps @BrianHG could give me a steer on valid system clock speeds to use?  I seem to recall that a 27MHz clock would fit nicely with the video circuitry, but the MAX10 and Cyclone IV GPUs both ran fine on 50MHz system clocks.  I guess it boils down to using a PLL to create the 200MHz clock from a slower system clock, or a slower video clock with a PLL from a 200MHz system clock.

See, it wasn't that bad, was it? ;)

I haven't tried it yet! ;)  Not sure if the project is set up correctly or anything tbh, but one downside to Xilinx is that the project is over 6MB in size, even after zipping, so I haven't included it with this post due to size constraints.  At least Intel/Altera projects compress down to hundreds of KB without the bitstreams.

Like I said above, I would select "5000 ps (200 MHz)" in the "Input Clock Period" drop down, on the next page System Clock: "Differential" (or "No Buffer" if you want more flexibility on which pin(s) your system clock will be), Reference Clock: "Use System Clock" (this option will only appear if you set your input clock period to 200 MHz).
As for pinout, I would do it like this:
Code: [Select]
Bank: 34
Byte Group T0: DQ[0-7]
Byte Group T1: DQ[8-15]
Byte Group T2: DQ[16-23]
Byte Group T3: DQ[24-31]
Bank: 35
Byte Group T1: Address/Ctrl-2
Byte Group T2: Address/Ctrl-1
Byte Group T3: Address/Ctrl-0
But it's just a preliminary pinout to give you some high-level idea, once you actually have a layout, you would go "Fixed Pin Out" route in the wizard and specify each pin assignment explicitly.

I'll take another look at this tomorrow.

The number in the part # shows the number of pins per row. I've tripped over that many times already, so I typically just count the number of pins on their 3D model ;D
As for symbols and footprints, I typically make my own just to be sure they are correct (and use their 3D model to make sure it's correct, and if I have any doubts I would print a footprint on a piece of paper and physically place connector on top to see if it's OK), so not really sure what theirs look like.
As for which one goes where, I don't think it matters. I noticed on this module BSE goes on a module and BTE on a carrier, so let's make it the same - presumably those guys knew what they were doing.

While you are working on the connector footprints, please be SUPER careful about which pin number mates to which pin number on the mating connector. I've screwed this up an untold number of times, forgetting that connectors mate with mirrored pin numbers. So try to number the pins on both footprints such that pin 1 of one side mates to pin 1 of the mating side, otherwise you are going to have a disaster on your hands.

Yes, I'll be checking and double-checking the PCB design when I get that far.  I really don't want to mess that up!

Okay, I've attached the latest power supply schematic.  It seems like most of the board is going to be taken up with power supplies!  Let me know if there's anything obviously wrong.  I'm running a 1.0V MGT supply from the VCCINT output - hopefully that isn't criminally insane from an electrical engineer's point of view - it saves me adding yet another power supply chip and supporting discretes...
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2733
  • Country: ca
I know it's a one-off cost, but I'm a little reticent about having to spend $50+ on what is basically a bit of ribbon cable, some pins and springs.  Also, thinking about it, these are intended to be dev boards so we should make it as easy as possible to program them.  I think I'm probably going to stick with an SMD or TH pin header if space allows for the JTAG port.
That's OK and it's your call. This stuff pays off in the long run, and I know that it will be worth it for me for sure - especially since I plan to use that exact cable for programming STM32 MCUs as well - so two for the price of one (plus one more adapter).

Now, as far as programming headers go, the genuine Xilinx programmer (and the Digilent HS3 cable) uses a 2 mm pitch connector, and some of its clones retain that too, while others use a regular 0.1" one, and yet other clones have both options (this is what I have in my clone), so you might want to check which one of those you have so that you fit the right header. I also prefer using a header with a polarized shroud, which prevents connecting the cable the wrong way around (and likely destroying it in the process, because the opposite pins are all grounded). The easiest way to check is to get any 0.1" open header and (with everything unpowered, of course) try connecting the programmer to it to see if it fits. If so, you will need a 0.1" pitch connector, otherwise a 2 mm pitch one.

Well, I just sent them a polite e-mail this morning with a couple of sentences outlining my project and asking if they were willing to supply some test components.  Two of each connector are now in the post, which will allow me to make one carrier/core board. :-+
Great! See, I told you they are nice people. Though I never had to send them any emails, I always used "Get a Free Sample" buttons on their website and ordered that way. But hey - whichever way works is good in my book.

It's left Hong Kong and is on its way, apparently. :popcorn:
Cool!

If I'm reading the datasheet correctly, the XC7A100T has 6 of them?  That's a couple more PLLs than the Cyclone or MAX10.  I'm not sure how significant the system clock speed will be for the GPU - perhaps @BrianHG could give me a steer on valid system clock speeds to use?  I seem to recall that a 27MHz clock would fit nicely with the video circuitry, but the MAX10 and Cyclone IV GPUs both ran fine on 50MHz system clocks.  I guess it boils down to using a PLL to create the 200MHz clock from a slower system clock, or a slower video clock with a PLL from a 200MHz system clock.
Yea, you are unlikely to run out of MMCMs/PLLs in that device. You will have to use the ui_clk output from the MIG to interact with it, but you can add more clocks if you want or need them for other parts of your design. Just add a FIFO between the clock domains (this device's BRAM blocks can be configured as FIFOs), and make sure you don't under/overrun it (see the sketch below).
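If it helps, this is the kind of thing I mean - a minimal dual-clock FIFO sketch using the xpm_fifo_async macro (assuming a reasonably recent Vivado; the FIFO Generator IP does the same job if you prefer the wizard). The surrounding signal names (ui_clk, pixel_clk, wr_*, rd_*) are made up for the example, and it's untested:
Code: [Select]
// Untested sketch: BRAM-based clock-domain-crossing FIFO between the DDR3
// controller's ui_clk domain and the video pixel clock domain.
xpm_fifo_async #(
    .FIFO_MEMORY_TYPE  ("block"),  // map to BRAM
    .FIFO_WRITE_DEPTH  (512),
    .WRITE_DATA_WIDTH  (128),
    .READ_DATA_WIDTH   (128),
    .READ_MODE         ("fwft"),   // first-word fall-through
    .FIFO_READ_LATENCY (0),        // required for fwft mode
    .CDC_SYNC_STAGES   (2)
) pixel_cdc_fifo (
    .rst           (ui_rst),       // reset, synchronous to wr_clk
    .wr_clk        (ui_clk),       // write side: DDR3 user clock domain
    .wr_en         (wr_valid),
    .din           (wr_data),
    .full          (wr_full),      // back-pressure: don't overrun
    .rd_clk        (pixel_clk),    // read side: video pixel clock domain
    .rd_en         (rd_ready),
    .dout          (rd_data),
    .empty         (rd_empty),     // pre-fill before scanout so it never underruns
    .sleep         (1'b0),
    .injectsbiterr (1'b0),
    .injectdbiterr (1'b0)
);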

I haven't tried it yet! ;)  Not sure if the project is set up correctly or anything tbh, but one downside to Xilinx is that the project is over 6MB in size, even after zipping, so I haven't included it with this post due to size constraints.  At least Intel/Altera projects compress down to hundreds of KB without the bitstreams.
There is a File -> Project -> Archive command, and you can even clean up the resulting archive somewhat, but MIG generates a lot of stuff (there are actually two designs generated - a user design and an example design, all in the output). Fortunately you can simply save MIG's *.prj file along with the constraints file and re-generate the whole thing on the receiving end (the "Verify Pin Changes and Update Design" branch in the wizard flow).

Yes, I'll be checking and double-checking the PCB design when I get that far.  I really don't want to mess that up!
Yeah, that's what I thought too, yet I messed it up anyway ::)

Okay, I've attached the latest power supply schematic.
No you haven't ;D

I'm running a 1.0V MGT supply from the VCCINT output - hopefully that isn't criminally insane from an electrical engineer's point of view - it saves me adding yet another power supply chip and supporting discretes...
According to the datasheet of the MPM3683-7 module, it's supposed to have low enough ripple for powering GTPs (<10 mVpp). Maybe we should include a provision for a simple pi filter (a ferrite bead with a cap on each end of it - for example this one, but any one with a DC resistance of <50 mOhm and a DC current rating of >1 A will do) plus some local capacitance, just in case we need it; you can simply short out the bead with a zero-ohm resistor during initial assembly.
« Last Edit: January 06, 2023, 07:37:00 am by asmi »
 

Offline nockieboy (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Now, as far as programming headers go, the genuine Xilinx programmer (and the Digilent HS3 cable) uses a 2 mm pitch connector, and some of its clones retain that too, while others use a regular 0.1" one, and yet other clones have both options (this is what I have in my clone), so you might want to check which one of those you have so that you fit the right header. I also prefer using a header with a polarized shroud, which prevents connecting the cable the wrong way around (and likely destroying it in the process, because the opposite pins are all grounded). The easiest way to check is to get any 0.1" open header and (with everything unpowered, of course) try connecting the programmer to it to see if it fits. If so, you will need a 0.1" pitch connector, otherwise a 2 mm pitch one.

My clone has an adaptor with 3 different connectors and cables - 2x5 pins @2.54mm, 2x7 @2mm and a 2.54mm single-row pin header (for custom connections, I guess).

Well, I just sent them a polite e-mail this morning with a couple of sentences outlining my project and asking if they were willing to supply some test components.  Two of each connector are now in the post, which will allow me to make one carrier/core board. :-+
Great! See, I told you they are nice people. Though I never had to send them any emails, I always used "Get a Free Sample" buttons on their website and ordered that way. But hey - whichever way works is good in my book.

Yes, very friendly.  I could have pushed the 'Get Free Sample' button, but I thought I'd drop them a line to say hello as well.  Even after all these years of the internet, I still get a bit excited about talking to people in different time zones/countries. :)

There is a File -> Project -> Archive command, and you can even clean up resulting archive somewhat, but MIG generates a lot of stuff (there are actually two designs generated - user design and example design, all in the output). Fortunately you can simply save MIG's *.prj file along with a constraints file and re-generate the whole thing on a receiving end ("Verify Pin Changes and Update Design" branch in the wizard flow).

That's handy to know - I'm going to want to share the project with you at some point to verify my results or - more likely - work out what I'm doing wrong. ;)

All I did was create a new blank project, set the target device to XC7A100T-2-FGG484 and run through MIG.  There's no top-level entity or anything like that.  I guess at this stage I'm just looking for pin assignments for the DDR3, but pretty soon I'm going to need to start integrating BrianHG's DDR3 controller so that I can start simulating it.

Okay, I've attached the latest power supply schematic.
No you haven't ;D

Darn it.  I've been having trouble with this forum recently - I've noticed a drag'n'drop box has appeared in the attachments section for a new post, which seems to cause more problems for me than it solves.  I definitely attached the PDF yesterday, it even said it had uploaded.  I'll try again with this post, but instead of drag/drop I'll just select it in the file explorer via the 'Choose file' button.

I'm running a 1.0V MGT supply from the VCCINT output - hopefully that isn't criminally insane from an electrical engineer's point of view - it saves me adding yet another power supply chip and supporting discretes...
According to the datasheet of the MPM3683-7 module, it's supposed to have low enough ripple for powering GTPs (<10 mVpp). Maybe we should include a provision for a simple Pi filter (a bead with two caps on each end of it, for example this one, but any one with DC resistance of < 50 mOhm and a DC current rating of >1 A will do) plus some local capacitance just in case we'll need it, you can simply short out the bead by a zero ohm resistor during initial assembly.

Never heard of a Pi filter before, so that's something new to learn about! ;D  Well, hopefully you'll get the attachment this time and can see what I've done so far.

EDIT: Question regarding the power supplies.  All of the MPM power chips use a resistor divider to set their output voltage.  The datasheets/MPM design software quote values for those resistors, which I have used in the attached schematic.  I note that the MPM3683-7 that generates VCCINT and 1V0_MGT uses 2K and 3K resistors (R4 and R12) - is there any reason I can't swap them for 200K and 300K resistors to use the same parts as the MPM3833s?
« Last Edit: January 06, 2023, 09:46:56 am by nockieboy »
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7742
  • Country: ca
In this view of my graphics sub-system: BrianHG_GFX_VGA_Window_System.pdf



Everything encircled in purple runs at the pixel clock rate (everything else runs at the DDR3 user clock speed).  For the MAX10, we had a hard limit of 200MHz, so we chose the industry-standard 148.5MHz for authentic 1080p.  This limited us to 8 window layers in 1080p.  The Artix-7 should be able to achieve 297MHz, meaning you can now do 16 window layers in 1080p, or double everything available for 720p as well.  My code generates integer-divided pixel outputs for running multiple window layers in series in conjunction with the parallel window layers.  There is an easy patch to allow division of the series layers across multiple video outputs, so a 297MHz system could produce two 1080p video outputs with 8 video window layers on each, or four video outputs at 720p.  (This doesn't rule out including multiple video systems to multiply the video outputs further, now supporting a different video mode on each video out.)

Having a 27MHz/54MHz source crystal just means having a perfect divisor for the true ANSI 148.5MHz 16:9 modes, or 54MHz/108MHz/216MHz for the 4:3 modes.  Though, with all the available PLLs, just cascading two of them usually means you can generate anything you like from an integer source clock, or you can just live with a slightly imperfect frequency.
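For example (rough numbers, not checked in the clocking wizard), a single 7-series MMCM can hit 148.5MHz exactly from a 27MHz reference, since 27MHz x 44 = 1188MHz lands inside the VCO range and divides down cleanly:
Code: [Select]
// Rough, unchecked sketch: one MMCME2 making an exact 148.5 MHz pixel clock
// from a 27 MHz reference.  27 MHz x 44 = 1188 MHz VCO, 1188 / 8 = 148.5 MHz.
// (1188 / 4 = 297 MHz is available from another CLKOUT if wanted.)
MMCME2_BASE #(
    .CLKIN1_PERIOD    (37.037),  // 27 MHz reference
    .CLKFBOUT_MULT_F  (44.0),    // VCO = 1188 MHz
    .DIVCLK_DIVIDE    (1),
    .CLKOUT0_DIVIDE_F (8.0)      // 1188 MHz / 8 = 148.5 MHz
) mmcm_pixel (
    .CLKIN1   (clk27),
    .CLKFBIN  (mmcm_fb),
    .CLKFBOUT (mmcm_fb),
    .CLKOUT0  (pixel_clk_raw),   // run through a BUFG before use
    .LOCKED   (),
    .RST      (1'b0),
    .PWRDWN   (1'b0)
);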
« Last Edit: January 06, 2023, 10:45:56 am by BrianHG »
 
The following users thanked this post: nockieboy

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7742
  • Country: ca
For these 3 instantiations in my 'BrianHG_GFX_Video_Line_Buffer.sv':

BrianHG_GFX_Video_Line_Buffer.sv#L521
BrianHG_GFX_Video_Line_Buffer.sv#L827
BrianHG_GFX_Video_Line_Buffer.sv#L998

You will need to create a dummy 'altsyncram' in which you tie the important ports and parameters to Xilinx's equivalent dual-clock block RAM, as the write side is on the user DDR3 control clock and the read output is on the video pixel clock.  Everything else should be compatible with any vendor's FPGA compiler that supports SystemVerilog.  You can ignore the init files, they just help the test-bench visualization results.

I also included ModelSim testbenches for those individual modules for verification.
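Something along these lines should do as a starting point for that stand-in - a rough, untested sketch covering only the simple dual-clock case those instantiations use (write on clock0, registered read on clock1); you may need to add whichever extra ports/parameters the three instantiations actually connect (clock enables, numwords, etc.):
Code: [Select]
// Rough, untested 'altsyncram' stand-in for Xilinx parts. Only the simple
// dual-clock use case is modelled: write port on clock0 (DDR3 user clock),
// registered read on clock1 (video pixel clock). Vivado infers a dual-clock
// BRAM from this; all other altsyncram features are simply ignored here.
module altsyncram #(
    parameter widthad_a      = 8,
    parameter width_a        = 32,
    parameter widthad_b      = 8,
    parameter width_b        = 32,
    parameter operation_mode = "DUAL_PORT",  // accepted, not checked
    parameter init_file      = "UNUSED"      // init ignored in this sketch
) (
    input  wire                  clock0,     // write clock
    input  wire                  wren_a,
    input  wire [widthad_a-1:0]  address_a,
    input  wire [width_a-1:0]    data_a,
    input  wire                  clock1,     // read clock
    input  wire [widthad_b-1:0]  address_b,
    output reg  [width_b-1:0]    q_b
);
    // Assumes width_a == width_b; maps to RAMB18/RAMB36.
    (* ram_style = "block" *) reg [width_a-1:0] mem [0:(1<<widthad_a)-1];

    always @(posedge clock0)
        if (wren_a) mem[address_a] <= data_a;

    always @(posedge clock1)
        q_b <= mem[address_b];
endmodule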

« Last Edit: January 06, 2023, 10:43:05 am by BrianHG »
 
The following users thanked this post: nockieboy

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2733
  • Country: ca
My clone has an adaptor with 3 different connectors and cables - 2x5 pins @2.54mm, 2x7 @2mm and a 2.54mm single-row pin header (for custom connections, I guess).
In this case it's best to stick to the 2x7 2 mm pinout to remain compatible with official programmers.

Yes, very friendly.  I could have pushed the 'Get Free Sample' button, but I thought I'd drop them a line to say hello as well.  Even after all these years of the internet, I still get a bit excited about talking to people in different time zones/countries. :)
I've worked at many large companies, so I always had the problem of receiving too many emails - I guess I kind of got over this ;D

That's handy to know - I'm going to want to share the project with you at some point to verify my results or - more likely - work out what I'm doing wrong. ;)
A few versions ago Vivado made a major change which improved compatibility with source control by moving all generated code out of the ".srcs" folder into ".gen", so that now you technically only need to keep the ".srcs" folder in source control. MIG does store its .prj file there, but for some reason it stores the constraints file (.xdc) in the ".gen" folder. That file stores the selected pinout, so it's required as well. It's in .gen\sources_1\ip\<mig_instance_id>\<mig_instance_id>\user_design\constraints\<mig_instance_id>.xdc

All I did was create a new blank project, set the target device to XC7A100T-2-FGG484 and ran through MIG.  There's no top level entity or anything like that.  I guess at this stage I'm just looking for pin assignments for the DDR3, but pretty soon I'm going to need to start integrating BrianHG's DDR3 controller so that I can start simulating it.
Simulating MIG is rather slow (on my PC just getting through calibration takes about 3-3.5 minutes), so I typically use the AXI Verification IP to "pretend" to be a DDR RAM for simulation purposes, which is MUCH faster. But that approach only works if you use MIG with the AXI frontend. For UI-only designs I know many people write simple BFMs to simulate MIG's behaviour, without a full-on simulation of the controller and memory devices, to speed things up. Maybe you should do the same once you figure out how to work with it.
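For the UI-only case the BFM really doesn't have to be much more than something like this (an untested sketch: always-ready handshakes, a made-up fixed read latency, and none of the real controller's reordering, refresh or calibration behaviour; UI port names per UG586, widths assuming a 32-bit DDR3 interface with the 4:1 controller):
Code: [Select]
// Untested, simulation-only sketch of a MIG UI-interface BFM.
module mig_ui_bfm #(
    parameter ADDR_W = 28,
    parameter DATA_W = 256,   // 4:1 controller, 32-bit DQ -> 256-bit UI bus
    parameter RD_LAT = 8      // made-up fixed read latency, in ui_clk cycles
)(
    input  logic               ui_clk,
    input  logic               ui_clk_sync_rst,
    output logic               init_calib_complete,
    input  logic [ADDR_W-1:0]  app_addr,
    input  logic [2:0]         app_cmd,      // 3'b000 = write, 3'b001 = read
    input  logic               app_en,
    output logic               app_rdy,
    input  logic [DATA_W-1:0]  app_wdf_data,
    input  logic               app_wdf_wren,
    input  logic               app_wdf_end,  // unused in this sketch
    output logic               app_wdf_rdy,
    output logic [DATA_W-1:0]  app_rd_data,
    output logic               app_rd_data_valid
);
    logic [DATA_W-1:0] mem [logic [ADDR_W-1:0]];  // sparse backing store
    logic [DATA_W-1:0] rd_data_pipe [RD_LAT];
    logic              rd_vld_pipe  [RD_LAT];

    assign init_calib_complete = ~ui_clk_sync_rst;
    assign app_rdy             = 1'b1;            // never stalls - unrealistic!
    assign app_wdf_rdy         = 1'b1;

    // Write path: assumes write data arrives in the same cycle as the command.
    always @(posedge ui_clk)
        if (app_en && app_cmd == 3'b000 && app_wdf_wren)
            mem[app_addr] = app_wdf_data;         // blocking is fine, sim-only

    // Read path: simple fixed-latency pipeline.
    always_ff @(posedge ui_clk) begin
        rd_vld_pipe[0]  <= app_en && (app_cmd == 3'b001) && !ui_clk_sync_rst;
        rd_data_pipe[0] <= mem.exists(app_addr) ? mem[app_addr] : '0;
        for (int i = 1; i < RD_LAT; i++) begin
            rd_vld_pipe[i]  <= rd_vld_pipe[i-1];
            rd_data_pipe[i] <= rd_data_pipe[i-1];
        end
        app_rd_data       <= rd_data_pipe[RD_LAT-1];
        app_rd_data_valid <= rd_vld_pipe[RD_LAT-1];
    end
endmodule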

Darn it.  I've been having trouble with this forum recently - I've noticed a drag'n'drop box has appeared in the attachments section for a new post, which seems to cause more problems for me than it solves.  I definitely attached the PDF yesterday, it even said it had uploaded.  I'll try again with this post, but instead of drag/drop I'll just select it in the file explorer via the 'Choose file' button.
I always open the post once posted to verify that all attachments are actually there.

Never heard of a Pi filter before, so that's something new to learn about! ;D  Well, hopefully you'll get the attachment this time and can see what I've done so far.
It's called that because it looks like the Greek letter "pi" - a bead drawn horizontally with a cap drawn vertically on each side of it. It's extensively used in RF, but also for noise filtering.

EDIT: Question regarding the power supplies.  All of the MPM power chips use a resistor divider to set their output voltage.  The datasheets/MPM design software quote values for those resistors, which I have used in the attached schematic.  I note that the MCM3683-7 that generates VCCINT and 1V0_MGT uses 2K and 3K resistors (R4 and R12) - is there any reason I can't swap them for 200K and 300K resistors to use the same parts as MPM3833s?
Those values are always a compromise between noise immunity and efficiency - the higher the values, the less current flows through them, which improves efficiency but also increases noise sensitivity. So maybe we can meet in the middle and use 20k and 30k in both cases? :) I dunno, I tend to prefer sticking to the recommended values because resistors cost next to nothing.
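To put some rough numbers on it (assuming the usual feedback topology where the divider runs from VOUT through FB to GND, and assuming a 0.6 V FB reference - check the MPM3683-7 datasheet for the real figure):
Code: [Select]
Vout = Vref * (1 + Rtop/Rbottom)

2k / 3k     : Vout = 0.6 V * (1 + 2/3) = 1.0 V,  divider current = 1.0 V / 5k   = 200 uA
200k / 300k : Vout = 0.6 V * (1 + 2/3) = 1.0 V,  divider current = 1.0 V / 500k = 2 uA
Only the ratio sets the output voltage, so the swap would give the same setpoint; what changes is the divider current and the impedance at the FB node (roughly 100x higher), which is what makes the high-value divider more noise-sensitive - and if the datasheet calls for a feed-forward cap across the top resistor, its value would need rescaling too.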

One important point: do your very best to follow the layout recommendations from the datasheets, or copy their evaluation boards' layout. A lot of a DC-DC converter's performance depends on proper layout, so try to stick to those recommendations as closely as possible.

That said, there is something weird going on with that module's layout. In the pinout description they recommend creating a copper pour for the SW nodes, and in their devboard's schematics they show these pins as connected, yet in the per-layer layout pictures each SW node is isolated. I've sent them an enquiry about that; let's see what they say. I guess it's best to do what the layout shows, as opposed to what the text says, but let's see if/what they respond.

----
One more question for you - do you have a code name for this project/module/PCB? I like to come up with cool code names for my projects; for example, the board in my signature is "Sparta", so referring to that board as "Sparta-50" ("50" stands for the FPGA density used) is much easier than calling it a "beginner-friendly FPGA board with a Spartan-7 FPGA and DDR2 memory" ;)
« Last Edit: January 06, 2023, 12:45:41 pm by asmi »
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2733
  • Country: ca
Everything encircled in purple runs at the pixel clock rate. 
Oh wow - it's more complicated than I thought!
I've got a question though - why do you have to run so much at the video pixel clock? Wouldn't it be better to run it at a faster clock and write the resulting image into DDR, and then have a totally asynchronous process, running at the video pixel clock, which would read that image from the framebuffer (in DDR) and output it via HDMI/VGA/DisplayPort/whatever? This is how all video cards work; it allows them not to depend on the current video output mode, and it's also trivial to do double- and triple-buffering to alleviate tearing and other artifacts which occur when frame generation runs asynchronously to the output. It's also trivial to give a CPU direct access to the framebuffer if need be in this approach. But the most important advantage is that you can run your actual frame generation at a faster clock to get more stuff done in the same amount of time, and in general it seems like a more sensible approach - for example, if nothing in the framebuffer changes, the whole frame generation pipeline can just idle as opposed to churning out the same frames over and over again. Or am I missing something here?
« Last Edit: January 06, 2023, 12:41:49 pm by asmi »
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7742
  • Country: ca
Everything encircled in purple runs at the pixel clock rate. 
Oh wow - it's more complicated than I thought!
I've got a question though - why do you have to run so much at the video pixel clock? Wouldn't it be better to run it at a faster clock and write the resulting image into DDR, and then have a totally asynchronous process, running at the video pixel clock, which would read that image from the framebuffer (in DDR) and output it via HDMI/VGA/DisplayPort/whatever? This is how all video cards work; it allows them not to depend on the current video output mode, and it's also trivial to do double- and triple-buffering to alleviate tearing and other artifacts which occur when frame generation runs asynchronously to the output. It's also trivial to give a CPU direct access to the framebuffer if need be in this approach. But the most important advantage is that you can run your actual frame generation at a faster clock to get more stuff done in the same amount of time, and in general it seems like a more sensible approach - for example, if nothing in the framebuffer changes, the whole frame generation pipeline can just idle as opposed to churning out the same frames over and over again. Or am I missing something here?
It's not a renderer.  It is a real-time multi-window, multi-depth, multi-tile memory to a raster display.  No tearing.  You can double or triple buffer each layer individually if you like, but even that isn't needed with my code for the smoothest silky scrolling.
Set the maximum layers option to 1, turn off the palette and text, and the only code left enabled would be one line buffer going straight to the video DAC output port.  Then you would need to software-render or simulate all the layers and window blending.

This was the easiest solution for the Z80 to handle multiple display layers without any drawing commands, or without waiting for a rendering engine to read multiple bitmaps and tile maps, blend them together to construct a buffered frame, do a frame swap when ready, and redo all of this every V-sync.

You want text, turn on a font layer.  You want backgrounds, turn on a 32bit layer.  You want meters to blend in, then out, turn on and off those layers.  You want additional overlays, or sprites, turn on and off those layers, set their coordinates, size and other metrics, like which RAM location (i.e. which frame) to use for the sprite's animation.

You want 4bit layers mixed with 8bit layers with different palettes for each 8bit layer, plus a few 32bit layers, no problem.  All with alpha blended shading.

This was all done in around 18.3k LEs for 16-layer graphics on an FPGA which can barely maintain 200MHz, yet my core was running it fine at 100MHz; only the pixel clock ran at 148.5MHz, and it consumed under 2 watts including the DDR3 controller and DVI transmitter.
« Last Edit: January 06, 2023, 01:30:49 pm by BrianHG »
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2733
  • Country: ca
It's not a renderer.  It is a real-time multi-window, multi-depth, multi-tile memory to a raster display.  No tearing.  You can double or triple buffer each layer individually if you like, but even that isn't needed with my code for the smoothest silky scrolling.
Set the maximum layers option to 1, turn off the palette and text, and the code enabled would only be 1 line buffer going straight to the video dac output port.  Then, you need to software render or simulate all the layers and window blending.

This was the easiest solution for the Z80 to handle multiple display layers without any drawing commands, or waiting for a rendering engine read multiple bitmaps and tile maps, blend them together to construct a buffer frame, when ready, do a frame swap, and redo this every V-sync.

You want text, turn on a font layer.  You want backgrounds, turn on a 32bit layer.  You want meters to blend in, then out, turn on and off those layers.  You want additional overlays, or sprites, turn on and off those layers, set their coordinates, size and other metrics, like which ram location IE: frame for the sprite's animation.

You want 4bit layers mixed with 8bit layers with different palettes for each 8bit layer, plus a few 32bit layers, no problem.  All with alpha blended shading.

This was all done in around 18.3kle for 16 layered graphics on an FPGA which can barely maintain 200MHz, yet my core was running it fine at 100MHz, only the pixel clock ran at 148.5Mhz, consuming under 2watts including the DDR3 controller and DVI transmitter.
With all due respect, none of that answers my question. Why can't you simply save the output of your pipeline into a memory buffer instead of streaming it to the output? That's how every single video/2D/3D rendering engine I've ever come across works, hence my surprise that this one does not. I can see lots of advantages to such decoupling, and the only disadvantage I can think of is the increased memory bandwidth requirement, which is going to be addressed with this new board as it's going to have essentially double the bandwidth of the existing board, and we can increase it even further if necessary by adding a second controller with a 16- or 32-bit data bus (though at the expense of IO pins available for other purposes).

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7742
  • Country: ca
You are correct that with a few extra gates, it would be possible to render the output back into memory.

Currently, with a single 16-bit DDR3 running at 800MHz, we have enough to produce a dual 32-bit 1080p image plus some extra spare bandwidth for CPU drawing.
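Rough numbers (assuming the 16-bit DDR3 is doing 800 MT/s, i.e. a 400 MHz DDR clock, and ignoring refresh/turnaround overhead):
Code: [Select]
Peak bandwidth            : 800 MT/s x 2 bytes        = 1.6 GB/s
One 1080p60 32-bit stream : 1920 x 1080 x 4 B x 60    = ~0.5 GB/s
Two such streams          :                             ~1.0 GB/s
Left for CPU drawing      : (minus controller overhead) ~0.6 GB/s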

With 2x DDR3 and the current setup, we can either render two dual 32-bit images to RAM while showing a back-buffered 32-bit image in real time, or show four full-sized, real-time-mixed 1080p 32-bit layers.

This is why I was pushing for a 64-bit SODIMM module instead of 2x 16-bit DDR3 chips.  The bandwidth increase would have been so high that it would be worth adapting my code to render any number of windows into a back buffer while dumbly showing a double- or triple-buffered output.  Also, another plus is that we could expand the system to something like 256 windows, and unlike my DDR3 DECA multi-window demo in my DDR3 thread with the 16 mixed graphics windows, we would now be able to draw all 256 windows even on a 1080p output; if all the windows are full screen, there would just be a frame rate drop.

Yes, another change is that when rendering back into memory with my code, you can get rid of the dual-clock nature, but that's it.  Everything else remains the same.  All we have done is add a programmable frame-writing pointer, and our final display buffer will now be fixed in 32-bit mode.

Also, we now have enough registers to change my integer scaling for each window to a fractional bi-linear scaler, effectively allowing any up- or down-scaled resolution you like in each window instead of the current simple 1x, 2x, 3x, 4x, 5x...  The up/down bilinear scaler would be great for rendering internally at 4x resolution, then outputting at 1/4 scale for nice soft dithered edges.
« Last Edit: January 06, 2023, 02:20:00 pm by BrianHG »
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2733
  • Country: ca
Currently, with a single 16bit DDR3 running at 800mhz
Are you sure you're running it at 800 MHz? I thought it was only 400 MHz.

This is why I was pushing for a SODIMM 64bit module instead of 2x16bit DDR3 chips.
There are a couple of problems with implementing SODIMM, with the second one being the most important:
1. It's going to make the module very large. A SODIMM socket typically has a length of over 70 mm, which is going to drive up the size of the module itself and consequently the price of PCB manufacturing for both the module and the carrier.
2. For the device and package we've chosen (and managed to find in stock at a reasonable price) - and really any Artix-7 device aside from the top-level A200Ts in 676- or 1156-ball packages - implementing it will require some shenanigans with level shifters for the QSPI flash memory, because it uses regular IO pins from bank 14 for its data lines. This is something that I know is theoretically possible to do, but I've never actually done it, and I don't want to lead nockieboy down that garden path without having walked it myself first so that I can be confident it will actually work. This is why I've been suggesting adding a second 16-bit or 32-bit controller instead. It would eat a lot of IO pins, leaving only about 80 for everything else (plus maybe 30 or so can be recovered from the banks used for DDR3 if he is willing to add a metric ton of 1.5 V <-> 3.3 V (or even 5 V) level shifters); not sure if that's going to be enough for all peripherals.

There is another way - use a Kintex-7 part instead, like the K70 or K160 - they allow running the DDR3 interface at up to 933 MHz (1866 MT/s) depending on the package, giving you massive bandwidth even without increasing the interface width, but these parts are expensive - at least double to quadruple the price of the A100Ts we were able to source.
« Last Edit: January 06, 2023, 03:24:06 pm by asmi »
 

Offline nockieboy (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
This is why I was pushing for a SODIMM 64bit module instead of 2x16bit DDR3 chips.

I obviously missed the significance of that and didn't look into using SODIMM at all.  Correct me if I'm wrong, but a 64-bit SODIMM module would just require a single SODIMM slot to be soldered to the board?  Even so, they're 200-pin+ connectors - isn't that going to eat IO resources and leave little for anything else?

As we're designing from scratch, what would be your preferred memory setup?  As asmi has pointed out, I don't think a SODIMM module is going to be an option (as much as I like the idea of not having to solder memory chips!).  Are two 16-bit DDR3s going to be enough of an upgrade, or can we go further?


Those values are always a compromise between noise immunity and efficiency - the higher the values are, the smaller current goes through them, which improves efficiency, but also increases noise sensitivity. So maybe we can meet in the middle and use 20k and 30k in both cases? :) I dunnow, I tend to prefer sticking to the recommended values because resistors cost next to nothing.

It's no problem, I was just wondering if I could reduce the BOM and make it less likely for errors to creep in during assembly with fewer different resistor values.

One important point is to do your very best to follow layout recommendations from datasheets or copying their evaluation boards' layout. A lot in DC-DC converter performance depends on a proper layout, so try sticking to those recommendations as close as possible.

When I get to PCB design, I'll be following the recommended layouts as closely as possible.

That said, there is something weird going on with that module's layout. In the pinout description, they recommend creating a copper pour for SW nodes, in their devboard's schematics they show these pins as connected, however on a layout per-layer pictures each SW node is isolated. I've sent them an enquiry regarding that, let's see what they say. I guess it's best to do what the layout shows, as opposed to what the text tells to do, but let's see if/what they respond.

It does seem odd to me that they have all those pins and they're not supposed to be connected to anything. :-//

One more question for you - do you have a code name for that project/module/PCB? I like to come up with cool code names for my projects, for example the board in my signature is "Sparta", so saying "Sparta-50" ("50" stands for FPGA density used) referring to that board is much easier than referring to it as a "beginner-friendly FPGA board with Spartan-7 FPGA and a DDR2 memory" ;)

Haha, yes I do.  I'm calling it the XCAT-100.  It's just the FPGA part name with the 7 removed and one letter moved up from the end to make it (almost) an English word.  I can design a logo with a cat's head and an X later. ;)



 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2733
  • Country: ca
I obviously missed the significance of that and didn't look into using SODIMM at all.  Correct me if I'm wrong, but a 64-bit SODIMM module would just require a single SODIMM slot to be soldered to the board?  Even so, they're 200-pin+ connectors - isn't that going to eat IO resources and leave little for anything else?

As we're designing from scratch, what would be your preferred memory setup?  As asmi has pointed out, I don't think a SODIMM module is going to be an option (as much as I like the idea of not having to solder memory chips!).  Are two 16-bit DDR3s going to be enough of an upgrade, or can we go further?
Maybe we can settle for a sort of hybrid approach - use a 32-bit interface for layers and composition, and then have a dedicated interface for framebuffers? In that case a 16-bit interface should be enough for two parallel streams (one to write the frame, another to read a frame and output it to HDMI), and we can implement it in a single IO bank, leaving about 130 or so IO pins (plus whatever we can recover from the 32-bit interface banks if we go that route).

Haha, yes I do.  I'm calling it the XCAT-100.  It's just the FPGA part name with the 7 removed and one letter moved up from the end to make it (almost) an English word.  I can design a logo with a cat's head and an X later. ;)
Cool! I will refer to it like that from this moment on. Maybe it's time to create a repo on GitHub or something?

-----
I've received my devices today, as promised by DHL. They came in sealed bags with desiccant and a moisture indicator. So far I've resisted the urge to open them up, but I suspect I will eventually succumb to temptation ;D Good thing that I have an oven where I can bake these parts before reflow, and the air at home is quite dry (~35% humidity), so it shouldn't be that big of a deal...
« Last Edit: January 06, 2023, 07:21:23 pm by asmi »
 

Offline nockieboy (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
What values would be appropriate for the Pi filter?

 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2733
  • Country: ca
What values would be appropriate for the Pi filter?


0.47 uF on the input side (before the bead), and 4.7 uF on the output side. The point of this filter is to provide a shortcut for high-frequency noise from the VCCINT pins before it reaches the GTP's power pins, and at the same time provide enough capacitance after the filter to hold the voltage rail within spec. There are going to be two more 0.1 uF caps and another 4.7 uF cap on the MGTAVCC side as per the recommendations in UG482; all of that should be enough.
What's the L1 on that schematic?

Offline nockieboy (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
0.47 uF on the input side (before the bead), and 4.7 uF on the output side. The point of this filter is to provide a shortcut for high-frequency noise from Vccint pins before it reaches GTP's power pins, and at the same time provide enough capacitance after the filter to sustain the voltage rail within spec once noise subsides. There are going to be two more 0.1 uF caps and another 4.7 uF cap on the MGTAVCC side as per recommendations in UG482, all of that should be enough.

These extra caps after the pi filter - these are from Table 5-7 on page 230 of UG482?  I take it these are different to the 0.1uF caps mentioned in Table 5-11 on page 233?

What's the L1 on that schematic?

That's there to isolate VCCINT noise from 1V0_MGT?  It was there on reference schematics and I've used the same method previously on the Cyclone GPU PCB to filter noise out of the line.  I guess the pi filter does a better job and it can be removed?
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7742
  • Country: ca
I obviously missed the significance of that and didn't look into using SODIMM at all.  Correct me if I'm wrong, but a 64-bit SODIMM module would just require a single SODIMM slot to be soldered to the board?  Even so, they're 200-pin+ connectors - isn't that going to eat IO resources and leave little for anything else?

As we're designing from scratch, what would be your preferred memory setup?  As asmi has pointed out, I don't think a SODIMM module is going to be an option (as much as I like the idea of not having to solder memory chips!).  Are two 16-bit DDR3s going to be enough of an upgrade, or can we go further?
Maybe we can settle for a sort of a hybrid approach - use 32 bit interface for layers and composition, and then have a dedicated interface for framebuffers? In that case 16-bit interface should be enough for 2 parallel streams (one to write the frame, another one to read a frame and output it to HDMI), and we can implement it in a single IO bank, leaving about 130 or so IO pins (plus whatever we can recover from 32-bit interface banks if we go that route).
What?  How does handicapping your bandwidth, or forcibly dividing it evenly between two different controllers, save IOs or logic elements, or improve global memory access, vs one controller at double the speed?
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2733
  • Country: ca
These extra caps after the pi filter - these are from Table 5-7 on page 230 of UG482? 
Yep

I take it these are different to the 0.1uF caps mentioned in Table 5-11 on page 233?
Nope, they are the very same caps mentioned in the table 5-7.

That's there to isolate VCCINT noise from 1V0_MGT?  It was there on reference schematics and I've used the same method previously on the Cyclone GPU PCB to filter noise out of the line.  I guess the pi filter does a better job and it can be removed?
That's what the pi filter is for. No need for additional beads.

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2733
  • Country: ca
What?  How does handicapping your bandwidth, or forcibly dividing it evenly between two different controllers, save IOs or logic elements, or improve global memory access, vs one controller at double the speed?
I don't understand this question. What exactly is being handicapped and how?

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 7742
  • Country: ca
Say I decode/play back a video.  On that output, I want to perform multiple Lanczos scaling steps with both X & Y convolutional filtering, plus a few frames of temporal (Z) de-noise filtering before output and analysis, maybe even motion-adaptive de-interlacing.  Your two separate controllers cut my image-processing bandwidth in half, as it will all sit in one memory controller before being exported to the framebuffer, while a portion of that framebuffer's bandwidth may never be used.  Especially if the video is larger in resolution than the available memory on the framebuffer side, where it may be mandatory to pre-process everything in one controller and only export the visible area to the other controller's RAM.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2733
  • Country: ca
Say I decode /playback a video.  On that output, I want to perform multiple Lanczos scaling steps with both X&Y convolutional filtering, plus a few Z frames de-noise filtering before output and analysis, maybe even motion adaptive de-interlacing.  Your 2 separate controller cut my image processing bandwidth in half as it will sit in one memory controller before being exported to the frame buffer while a portion of that frame buffer's bandwidth may never be used.  Especially if my image size is a video larger in resolution than the available memory on the frame buffer side where it may be mandatory to pre-process everything in one controller and only export the visible area to the other controller's ram.
I still don't understand it... Right now you don't store output frames at all, instead streaming them directly to the output. I'm suggesting that instead of doing so you save those frames into a dedicated memory, which would then be read by a separate process that streams them to the output. This way, displaying stuff from the framebuffer is decoupled from generating new frames, and once a new frame is ready, you switch a pointer to it so that the next screen scan begins displaying that new frame, all while the frame generator works on creating yet another new frame. So it's essentially triple buffering - one frame is the one being displayed, another is the frame that will be displayed once the current one is fully out, and a third is the one being generated. A full 1080p frame at 32 bpp takes a bit over 8 MBytes of memory (or just below 9 MBytes if you align lines to a 2K boundary, making it essentially a 2048x1080 array); if we are going to use 4 Gbit DDR3 memory devices, that provides enough capacity for many, many frames, and since there will be some spare bandwidth, it's possible to use it for other non-bandwidth-intensive things as well - like application storage, drive I/O, etc.
None of that is going to even touch the 32-bit controller, which will be dedicated solely to the needs of the frame generator. Which means your existing code will get 2x the input memory bandwidth. That's why I don't understand what's being handicapped here and how.

That said, if I were to design a rendering engine, I would approach it from a completely different angle and make it more like modern video cards - with unified memory controllers, a unified shader model, universal compute cores, and only a very minimal amount of fixed-function logic - for stuff like primitive rasterization and fragment (pixel) generation, which is fixed-function even on the most modern GPUs. The companies behind modern GPUs have burned numerous billions of dollars into researching the best ways to do these things, so I would simply follow their lead as opposed to reinventing the wheel. But it's not my project, and my project would not have contained a pre-historic CPU which cripples the entire system's performance, so what do I know? :D

So, you guys please make up your minds on how you want to proceed, and we'll go with whatever you decide.

