Author Topic: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.  (Read 85564 times)

Online Wiljan

  • Regular Contributor
  • *
  • Posts: 230
  • Country: dk
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #75 on: October 04, 2021, 08:59:11 am »
@BrianHG

I have been trying over the weekend to make some simple tweaks, just to see whether I can actually get something into the FPGA over HDMI.

I removed the RS232 debug part.

I changed the GPIOs to input only.

The control signals from the HDMI board go to the LEDs.

To be sure frequency is not a problem, I use an 800x600@60 HDMI input (instead of 1920x1080@60).
I can measure PIXCLK, H and V on the LEDs with a scope, so I definitely have some input on the board.

(No FIFO yet.)

I then tried to just write to the DDR based on PIXCLK and expected to see some garbage on the output... none, just the internal test signals.


I tried to write fixed 128bit FFs to the DDR based on CMD_CLK: still no garbage on the output, just the test signals.

I have removed the scroll and fixed it to 0x0000, 0x0000 to be sure I see the top left (0,0) on the output.

I tried to reduce the DDR to use only 1 read port... then the test signal goes red, so I must have missed changing some signals, and I am back on 2xR and 2xW.

So to be honest, I have to say I'm a bit stuck right now and could use some inspiration. Is there any chance you could make the skeleton to feed in the external HDMI?

I will post some images of how I have connected the hardware, in case others are interested in having an external HDMI input on the DECA board.

Thank you
Wiljan


 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7835
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #76 on: October 04, 2021, 09:04:53 am »
If you do not have a scope or logic analyzer and you want to check your input signals, or any other signals in the system, just set up Quartus SignalTap.  It will give you a multichannel real-time logic analyzer through the JTAG connection, right inside Quartus.

You should be able to scope your source video's clocked data, HS, DE and some of the data bus for testing, as well as lock onto HS and VS if you like.  (This includes all DDR3 internal bus connections as well as a few other goodies...)

As for video output through HDMI, you will need to set its PLL with valid settings, plus I recommend keeping to 720p or 480p standards unless you change the HDMI transmitter from HDMI mode to DVI mode.

Keep the DDR3 at 400MHz or 300MHz.  300MHz may take less time to compile.
« Last Edit: October 04, 2021, 09:10:37 am by BrianHG »
 

Online BrianHGTopic starter

« Reply #77 on: October 04, 2021, 09:14:13 am »
Quote
So to be honest, I have to say I'm a bit stuck right now and could use some inspiration. Is there any chance you could make the skeleton to feed in the external HDMI?

Take a look at the 'BrianHG_DDR3_DECA_Show_1080p' project.  That project just displays RAM at 1920x1080.
Like I said, I made this stuff for everyone to figure out.
 

Online Wiljan

« Reply #78 on: October 04, 2021, 12:49:59 pm »
I do have a 4-channel Keysight scope, and I do have the HDMI input signals PIXCLK, HS, VS and the R, G, B data in the FPGA, so that part is fine. The HDMI output at 1920x1080@60 is fine as well.

I have never played with SignalTap, but it is surely something I will look into.

I have just tried to hook up the 'BrianHG_DDR3_DECA_Show_1080p' project and the RS232 debugger to the DECA board, and that works as well: I can change pixels and save / load images from the PC over RS232.

I have attached a few images of the hardware.

DigiKey part numbers
PART: 1528-1452-ND MFG : Adafruit Industries LLC / 2219 DESC: TFP401 HDMI/DVI DECODE 40PIN TTL
PART: 1528-2243-ND MFG : Adafruit Industries LLC / 2098 DESC: 40-PIN FPC EXTENSION BRD W/CABLE
PART: 1528-4905-ND MFG : Adafruit Industries LLC / 4905 DESC: 40-PIN FPC TO STRAIGHT 2X20 IDC

My goal is to combine 2 HDMI inputs (cut out / cropped) into 1 HDMI output signal, all at 1920x1080.

But I will be happy just to see 1 signal come through  :scared:

« Last Edit: October 04, 2021, 12:56:36 pm by Wiljan »
 
The following users thanked this post: BrianHG

Online BrianHGTopic starter

« Reply #79 on: October 04, 2021, 01:08:42 pm »
It's not too difficult.  You will make it.  Begin with just getting 1 picture onscreen.  Just take a look at my module which draws graphics.  Only if you have multiple 1080p signals simultaneously will writing 32bit pixels be too slow on the 100MHz bus.  This is why I mentioned writing 128 bits at a time, which means 32bit pixels are written at 4x speed.

Note that my ellipse drawing engine has an X/Y coordinate to address generator which is a little complex.  You do not need to go this far.  You only need to reset the X & Y position, then add a fixed amount to the Y coordinate once every HS.  The address counter increments for every pixel written to create the X axis.
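
A minimal sketch of the simple address generator described above, assuming 32bit pixels and a frame buffer with a fixed line pitch.  This is not code from the ellipse drawing engine: the module name, the port names (`vs_start`, `hs_start`, `de`) and the parameter values are all hypothetical.  The `line_used` flag is an addition of this sketch, so that HS pulses during vertical blanking do not step the line address.

```systemverilog
// Hypothetical sketch, not part of the BrianHG_DDR3 project files.
module video_wr_addr_gen #(
    parameter int                    ADDR_WIDTH  = 28,      // assumed byte address width
    parameter int                    PIXEL_BYTES = 4,       // 32bit pixels
    parameter int                    LINE_BYTES  = 4096*4,  // assumed frame buffer pitch in bytes
    parameter logic [ADDR_WIDTH-1:0] BASE_ADDR   = '0       // assumed start of the frame buffer
)(
    input  logic                  pix_clk,   // source pixel clock
    input  logic                  vs_start,  // 1-clock pulse derived from VS (new frame)
    input  logic                  hs_start,  // 1-clock pulse derived from HS (new line)
    input  logic                  de,        // high while a valid pixel is on the bus
    output logic                  wr_ena,    // write strobe for the current pixel
    output logic [ADDR_WIDTH-1:0] wr_addr    // address of the current pixel
);
    localparam logic [ADDR_WIDTH-1:0] PIXEL_STEP = PIXEL_BYTES;
    localparam logic [ADDR_WIDTH-1:0] LINE_STEP  = LINE_BYTES;

    logic [ADDR_WIDTH-1:0] line_base;  // address of the first pixel of the current line
    logic                  line_used;  // set once the current line has carried pixels

    assign wr_ena = de;

    // All registers are first given a known value by the first vs_start pulse.
    always_ff @(posedge pix_clk) begin
        if (vs_start) begin
            // New frame: reset X and Y.
            line_base <= BASE_ADDR;
            wr_addr   <= BASE_ADDR;
            line_used <= 1'b0;
        end else if (hs_start) begin
            // New line: step Y by a fixed amount and reset X,
            // but only after a line that actually carried pixels.
            if (line_used) begin
                line_base <= line_base + LINE_STEP;
                wr_addr   <= line_base + LINE_STEP;
            end
            line_used <= 1'b0;
        end else if (de) begin
            // Every pixel written steps X.
            wr_addr   <= wr_addr + PIXEL_STEP;
            line_used <= 1'b1;
        end
    end
endmodule
```

The outputs here are still in the pixel clock domain; getting them into the controller's CMD_CLK domain (for example through a small FIFO) is a separate step that this sketch leaves out.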

To begin, try a 720p or 480p source image to sample and copy my 32bit pixel write mode in the ellipse drawing engine.

It's fine to post results / examples and things you created with my DDR3 controller here.

If you are looking for in-depth help on coding techniques for sampling video, make a separate new thread as this thread should stick with my DDR3 controller issues or results/success stories.


BTW, with a Cyclone III of similar size to the DECA's MAX10 and DDR2, I did make a complete 2 video in, 1 video out scaler with PIP, each window crop-able and able to zoom in and out, with test patterns, bi-linear filtering, picture enhancement and color processing, controlled through ethernet.  Though, the DDR2 was a 128bit wide RAM module, not a single 16bit wide chip.  I guess at 500MHz with 16bit color instead of 32bit color, you could achieve the same with the DECA board.
« Last Edit: October 04, 2021, 01:24:10 pm by BrianHG »
 

Online Wiljan

« Reply #80 on: October 09, 2021, 05:34:57 pm »
Some progress, even though I'm out of time at the moment.
1 x 800x600@60 is fed in.

There are some small errors here and there, but at least there is an image on the output  :D
 
The following users thanked this post: BrianHG

Online BrianHGTopic starter

« Reply #81 on: October 09, 2021, 11:48:41 pm »
Quote
Some progress, even though I'm out of time at the moment.
1 x 800x600@60 is fed in.

There are some small errors here and there, but at least there is an image on the output  :D

Wow, a 90 degree rotate.
The nasty non-sequential access preventing clean long bursts must be a killer unless you have worked around that.  I know a number of dedicated ways to work around this and get full performance, but they are advanced techniques.  For a first timer, even with small errors, that is still a great start.

Is that real-time?
Double buffered?
 

Online BrianHGTopic starter

« Reply #82 on: October 10, 2021, 01:02:47 pm »
Feature update:

******************************************
******** Finally, V1.00 release here: ***********
******************************************
https://www.eevblog.com/forum/fpga/brianhg_ddr3_controller-open-source-ddr3-controller/

Things to do:

a)  I will be contacting Intel's technical support about Cyclone V's poor 60% speed FMAX performance for my 1 multiport section in my design as seen in the above screenshots with the red arrow.  I'll see if something can be done.

b)  As described in my v0.95 notes, I will look into designing my simpler pyramid stack-able 2:1 multiport module aimed to achieve an FMAX of at least 200MHz allowing multiport running at Half rate interface controller speed, but with a loss of a few smart advanced features.

c)  I will download and install the latest Lattice Diamond and see if I can adapt and get my controller to compile and simulate there.  The LFE5U-45F/LFE5U-85F at 45kgate & 85kgate are just such a price bargain at 16$ and 36$ each respectively and if my DDR3 controller runs fast there, it is the next route to take.

I will be working on feature 'b)' this week.  I will be targeting 400MHz, not 200MHz.  This will make the module bare-bones simple, but, for example, you should be able to run 32bit reads and writes at full 400MHz speed, saturating the DECA's 800MHz 16bit DDR3, or you should be able to run the port at 200MHz, 64 bit, and still saturate the DECA's DDR3.  The new multiport head end should also get Altera's Cyclone V running to speed, as my DDR3 PHY is already fast enough; it was just the multiport module which was the bottleneck.
 

Online Wiljan

« Reply #83 on: October 11, 2021, 06:57:11 am »
Quote
Wow, a 90 degree rotate.

Is that real-time?
Double buffered?

No double buffer.  I write directly to 2 x 32bit ports, where I have swapped the x1, y1 on the rotated one.
Real time, maybe... I have had no time to test with moving video yet, and I will not be home until next week.
 

Online BrianHGTopic starter

« Reply #84 on: October 13, 2021, 09:41:32 am »
DDR3 V1.5 engineering update:

     New high FMAX speed multiport front end 'MUX' called BrianHG_DDR3_COMMANDER_4x1.sv.  Unlike the earlier commander, each port input is a combined read/write channel.  Each channel input is identical to the SEQ_*** inputs of my core DDR3 controller, 'BrianHG_DDR3_PHY_SEQ.sv'.  This will allow you to use additional BrianHG_DDR3_COMMANDER_4x1.sv modules to drive another one further down the chain, making extremely large port counts possible if needed.  2:1 mode will offer the greatest possible FMAX, while 4:1 will still offer a good FMAX but allow large port counts with fewer modules.

     My 'USE_TOGGLE_INPUT' and 'USE_TOGGLE_OUTPUT' parameters will allow you to clock each module in a different clock domain.  For example, connecting right to the 'BrianHG_DDR3_PHY_SEQ.sv', you may use a 2:1 module running at 400MHz.  On that first layer module, on port (A) you may use another 2:1 running at 400MHz, while on port (B) you may run another MUX in 4:1 mode at 200MHz, giving you a total of 2x400MHz read/write ports and 4x200MHz read/write ports.  *Note that crossing clock domain boundaries will only compile with good FMAX results when using PLL clock frequencies in powers of 2.


Code: [Select]
// Features:
//
// - Input and output ports identical to the BrianHG_DDR3_PHY_SEQ's interface with the optional USE_TOGGLE_CONTROLS
//
// - 2 to 4 Read/Write ports in, 1 port out with user set burst length limiter with read req vector/destination pointer.
// - Designed for high FMAX speed.
// - Designed to be pyramid stacked offering maximum speed 2 R/W ports with 1 COMMANDER_4x1 module, 4 ports using 3 modules,
//   8 ports using 7 modules, 16 ports using 15 modules, or, medium speed 4:1 offering 4 ports with 1 module, or 16 ports
//   using 5 modules...  3:1 mode offers a middle ground of speed VS density VS chosen FPGA speed grade.
//
// - 2 command input FIFO on each port.
// - 16 or 32 stacked read commands for DDR3 read data delay.
// - Separate cached read and write BL8 block.
// - Adjustable write data cache dump timeout.

     Note that now, when assessing/configuring a port's priority and maximum sequential burst length, unlike with the original 16 port commander, you will need to assess each set of priorities going down through the chain when you stack multiple MUX commanders together.
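
As a rough check of the module counts quoted in the feature list, here is a small simulation-only sketch.  The package and function names are hypothetical and not part of the controller's source; it only assumes that every MUX in the stack joins up to `width` inputs into one output.

```systemverilog
// Hypothetical planning helper, not part of the BrianHG_DDR3 sources.
package mux_stack_pkg;
    // Number of N:1 MUX modules needed to funnel `ports` inputs into one output
    // when every module joins up to `width` inputs (width must be 2 or more).
    function automatic int mux_modules(input int ports, input int width);
        int remaining = ports;
        int total     = 0;
        while (remaining > 1) begin
            remaining = (remaining + width - 1) / width;  // modules in this layer
            total    += remaining;
        end
        return total;
    endfunction
endpackage

module mux_stack_tb;
    import mux_stack_pkg::*;
    initial begin
        // 2:1 mode: 2, 4, 8 and 16 ports.
        assert (mux_modules( 2, 2) ==  1);
        assert (mux_modules( 4, 2) ==  3);
        assert (mux_modules( 8, 2) ==  7);
        assert (mux_modules(16, 2) == 15);
        // 4:1 mode: 4 and 16 ports.
        assert (mux_modules( 4, 4) ==  1);
        assert (mux_modules(16, 4) ==  5);
        $display("Module counts match the feature list.");
    end
endmodule
```

Each extra layer in the stack is also one more place where the priority and burst length settings have to be assessed.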
« Last Edit: October 14, 2021, 03:08:14 pm by BrianHG »
 

Online Wiljan

« Reply #85 on: October 23, 2021, 11:00:00 am »
Hi Brian I'm back and I have made some changes.

SignalTap is very useful, thank you for mentioning it  :)

I tested with moving video, and the rotate was nowhere near real time... Right now I'm not interested in rotate, but in 2 straight inputs.
But for sure I would like to scale and rotate later.

I have removed the flat cable and added single wires (same length) so they fit the GPIOs.
I have 2 HDMI inputs running 1920x1080@60, not synchronized, in parallel from 2 BrightSign players.

I place the 2 inputs side by side in the 4K buffer and can scroll to see the left / right transition, and it is pretty OK.
I had some PSU issues and have split the supply across more PSUs to avoid interference.

I do have some noise here and there in the picture... you can see it in the black hole in the video.
Not sure why, but I suspect the "wires" and potentially wrong terminations.

I would like to write to the DDR as 128 bit instead of 32 bit to lower the traffic to the DDR

Attached is the Quartus project

Link for video
 
The following users thanked this post: BrianHG

Online BrianHGTopic starter

« Reply #86 on: October 23, 2021, 02:40:36 pm »
     Years and years ago, I also transferred 1080p parallel through flex cables.  You are already at the limit of what can be transmitted perfectly clean, not counting your hand wired jumpers.  I usually had to invert the incoming clock depending on source resolution to help avoid corrupt pixel captures.

     I'm almost finished with my new multiport.  It is virtually compatible with the old one, except each port is a read and write port and the max is 4:1 per multiport unit, but you may now have a 4:1 multiport's output feed an input of another 4:1 further down the chain, offering 16 ports with a 2 layer pyramid stack.  IE: 4 units in 4:1 mode, whose 4 outputs drive the inputs of another 4:1 at the top of the chain, while that one feeds the DDR3_PHY controller module.  The advantage here is that you can now run the multiport's CMD_CLK in half speed mode up to 250MHz, on all 16 ports, instead of the current limit of ~100MHz once you pass 4 IO ports.

   Running in half speed mode instead of quarter means that to completely fill the DDR3 bandwidth, you only need a 64bit bus at 200MHz instead of a 128bit bus at 100MHz.  With the multiport in 2:1 mode, IE: 400MHz CMD_CLK, you can achieve full DDR3 bandwidth with a 32bit bus, but the on-FPGA M9K blockram's speed limit is 330MHz, so, no matter what you do, you are stuck with 200MHz mode, or 250MHz if you overclock the FPGA to 500MHz DDR3.
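
The bus width figures above follow from simple arithmetic, sketched here as a simulation-only check.  The function names are hypothetical; the only assumptions are a 16bit DDR3 data bus, two transfers per DDR_CK, and CMD_CLK = DDR_CK divided by 1, 2 or 4 for full, half and quarter rate.

```systemverilog
// Hypothetical arithmetic check, not part of the BrianHG_DDR3 sources.
module ddr3_bus_width_tb;
    // User interface clock in MHz for a given DDR_CK and rate divider
    // (1 = full rate, 2 = half rate, 4 = quarter rate).
    function automatic int cmd_clk_mhz(input int ddr_ck_mhz, input int rate_div);
        return ddr_ck_mhz / rate_div;
    endfunction

    // User data bus width which matches the DDR3 pin bandwidth:
    // dq_bits are transferred twice per DDR_CK, and one CMD_CLK spans rate_div DDR_CKs.
    function automatic int user_bus_bits(input int dq_bits, input int rate_div);
        return 2 * dq_bits * rate_div;
    endfunction

    initial begin
        assert (cmd_clk_mhz(400, 4) == 100 && user_bus_bits(16, 4) == 128); // quarter rate
        assert (cmd_clk_mhz(400, 2) == 200 && user_bus_bits(16, 2) ==  64); // half rate
        assert (cmd_clk_mhz(400, 1) == 400 && user_bus_bits(16, 1) ==  32); // full rate
        assert (cmd_clk_mhz(500, 2) == 250);                                // overclocked half rate
        $display("128bit@100MHz, 64bit@200MHz and 32bit@400MHz all match the same DDR3 pin bandwidth.");
    end
endmodule
```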


Because of your wiring, remember to pass all the inputs from your HDMI receiver boards through at least one, if not two, D flip-flops before they feed any logic, and use the attribute (* useioff = 1 *) on those inputs, for example:
Code: [Select]
(* useioff = 1 *) input  logic         Z80_CLK,           // Z80 clock signal (8 MHz)
(* useioff = 1 *) input  logic [21:0]  Z80_ADDR,          // Z80 22-bit address bus

Also, if your CLK inputs are not going to the FPGA's dedicated CLK input pins, try to keep all the data inputs in the same bank as the CLK signal which clocks them.  I know this can be a hassle with the DECA being pre-wired.  If your HDMI decoders have a DDR mode, this may help by keeping 15 inputs all clocked inside 1 IO bank instead of 27 inputs with one clock.
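
A minimal sketch of such a double-registered input stage for one video source.  The module name, port names and the 24bit RGB width are assumptions for illustration only; the `useioff` attribute is used the same way as in the Z80 example above.

```systemverilog
// Hypothetical input stage, not part of the BrianHG_DDR3 project files.
module video_in_capture (
                      input  logic        pix_clk,  // pixel clock from the HDMI receiver
    (* useioff = 1 *) input  logic        hs_in,
    (* useioff = 1 *) input  logic        vs_in,
    (* useioff = 1 *) input  logic        de_in,
    (* useioff = 1 *) input  logic [23:0] rgb_in,
                      output logic        hs,
                      output logic        vs,
                      output logic        de,
                      output logic [23:0] rgb
);
    logic        hs_r, vs_r, de_r;
    logic [23:0] rgb_r;

    always_ff @(posedge pix_clk) begin
        // Stage 1: the first register after each pin, the one intended for the IO cell.
        hs_r  <= hs_in;
        vs_r  <= vs_in;
        de_r  <= de_in;
        rgb_r <= rgb_in;

        // Stage 2: a second register before any logic uses the signals.
        hs    <= hs_r;
        vs    <= vs_r;
        de    <= de_r;
        rgb   <= rgb_r;
    end
endmodule
```

All signals are delayed by the same 2 clocks, so HS, VS, DE and the pixel data stay aligned with each other.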
« Last Edit: October 23, 2021, 02:55:44 pm by BrianHG »
 

Offline mfro

  • Regular Contributor
  • *
  • Posts: 212
  • Country: de
« Reply #87 on: October 30, 2021, 05:46:41 pm »
Played half the day with your DDR3 controller on a DECA board and just wanted to say thank you for that absolutely brilliant piece of work! :-+
Beethoven wrote his first symphony in C.
 
The following users thanked this post: BrianHG

Online BrianHGTopic starter

« Reply #88 on: November 01, 2021, 06:51:36 pm »
Preview Demo .sof programming files of DECA BrianHG_DDR3_Controller V1.5 for Arrow DECA eval board overclocked to 500MHz in Half-rate mode.
(Actual full v1.5 project files coming in 2 days.)

  >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D
  >:D  500MHz/1GTPS! with 250MHz multiport interface.  >:D
  >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D

Just open your JTAG programmer and add one of the following 3 files:
1. 'BrianHG_DDR3_DECA_500MHz_DDR3_v1.0_QR_GFX_1080p_v3.sof'
        -> DDR3_V1.0, 500MHz DDR_CK, Quarter Rate 125MHz Multiport & Ellipse Generator.

2. 'BrianHG_DDR3_DECA_400MHz_DDR3_V1.5_HR_GFX_1080p_v3.sof'
        -> DDR3_V1.5, 400MHz DDR_CK, Half Rate 200MHz Multiport & Ellipse Generator.

3. 'BrianHG_DDR3_DECA_500MHz_DDR3_V1.5_HR_GFX_1080p_NOELLIPSE.sof'
        -> DDR3_V1.5, 500MHz DDR_CK, Half Rate 250MHz Multiport & Random noise/Binary counter.

Note that the Ellipse generator function has a <200MHz bottleneck, so with demo programming file 3, only pressing buttons 0 or 1 will illustrate the DDR3 32 bit color 250MHz fill speed with random noise or the binary counter pattern.

Check the 'Program/Configure' box and click 'Start' to program.
The DECA's HDMI should output a 1080p image.

IMPORTANT NOTE:
If the picture is still or scrolling noise, just press buttons 0 or 1, or flip 'Switch 0' to enable drawing ellipses.  You just powered up the demo in frozen picture mode and you are looking at the powered up random blank memory.


Switch 0 = Enable/Disable drawing of ellipses.
Switch 1 = Enable/Disable screen scrolling.
Button 0 = Draw data from random noise generator.
Button 1 = Draw color image data from a binary counter.

https://github.com/BrianHGinc/BrianHG-DDR3-Controller
« Last Edit: November 02, 2021, 01:56:17 pm by BrianHG »
 

Offline promach

  • Frequent Contributor
  • **
  • Posts: 875
  • Country: us
« Reply #89 on: November 10, 2021, 05:03:23 pm »
What does it mean by 500MHz DDR_CK, Half Rate 250MHz Multiport ?
 

Online BrianHGTopic starter

« Reply #90 on: November 10, 2021, 06:25:27 pm »
This means my controller runs at 500MHz, or basically the PHY driving the DDR3 command pins is running at 500MHz while the user interface, which has 16 read/write ports, is running at 250MHz.  This is actually overclocking the FPGA, as some timings come out in the red, i.e. with negative slack.  My controller can achieve a true 100% positive slack at 400MHz PHY with the user interface running at 200MHz.  The older V1.0 could only achieve a user interface of around 100MHz, configured with ~3 read + 2 write user ports, with the DDR3 PHY running at 400MHz.

My v1.5 constructs a tree / branch stacked join + fork command section allowing a user configured full 16 read/write ports running the full 200MHz with 400MHz DDR3 PHY controller with enough breathing room to compile an unofficial but functional 250MHz 16 port user interface with 500MHz PHY.

Half-rate means my controller will accept a new command once every second DDR_CK clock.  Quarter-rate means my controller will accept a new command once every 4 DDR_CK clocks.  It is the user interface clock frequency.

My DDR3 v1.5 multiport section now generates a smarter version of Xilinx's illustration shown here on page 18, figure 2.2:
https://www.xilinx.com/support/documentation/user_guides/ug388.pdf
The difference is you just set the total port parameter and my controller is programmed to render that 'branched' system, but all at 128 bit with smart caching of bursts allowing a superior FMAX to my DDR3 v1.0 which had all the ports at the first branch level where they show configuration 5.  You may also configure the width of each branch if you do not require a top FMAX, but want less clock join points between your RW port and the DDR3 phy.

EXAMPLE:
Code: [Select]
// ************************************************************************************************************************************
// ****************  BrianHG_DDR3_COMMANDER_2x1 configuration parameter settings.
parameter int        PORT_TOTAL              = 2,                // Set the total number of DDR3 controller write ports, 1 to 4 max.
parameter int        PORT_MLAYER_WIDTH [0:3] = '{2,2,2,2},       // Use 2 through 16.  This sets the width of each MUX join from the top PORT
                                                                 // inputs down to the final SEQ output.  2 offers the greatest possible FMAX while
                                                                 // making the first layer width = to PORT_TOTAL will minimize MUX layers to 1,
                                                                 // but with a large number of ports, FMAX may take a beating.
// ************************************************************************************************************************************
// PORT_MLAYER_WIDTH illustration
// ************************************************************************************************************************************
//  PORT_TOTAL = 16
//  PORT_MLAYER_WIDTH [0:3]  = {4,4,x,x}
//
// (PORT_MLAYER_WIDTH[0]=4)    (PORT_MLAYER_WIDTH[1]=4)     (PORT_MLAYER_WIDTH[2]=N/A) (not used)          (PORT_MLAYER_WIDTH[3]=N/A) (not used)
//                                                          These layers are not used since we already
//  PORT_xxxx[ 0] ----------\                               reached one single port to drive the DDR3 SEQ.
//  PORT_xxxx[ 1] -----------==== ML10_xxxx[0] --------\
//  PORT_xxxx[ 2] ----------/                           \
//  PORT_xxxx[ 3] ---------/                             \
//                                                        \
//  PORT_xxxx[ 4] ----------\                              \
//  PORT_xxxx[ 5] -----------==== ML10_xxxx[1] -------------==== SEQ_xxxx wires to DDR3_PHY controller.
//  PORT_xxxx[ 6] ----------/                              /
//  PORT_xxxx[ 7] ---------/                              /
//                                                       /
//  PORT_xxxx[ 8] ----------\                           /
//  PORT_xxxx[ 9] -----------==== ML10_xxxx[2] --------/
//  PORT_xxxx[10] ----------/                         /
//  PORT_xxxx[11] ---------/                         /
//                                                  /
//  PORT_xxxx[12] ----------\                      /
//  PORT_xxxx[13] -----------==== ML10_xxxx[3] ---/
//  PORT_xxxx[14] ----------/
//  PORT_xxxx[15] ---------/
//
//
//  PORT_TOTAL = 16
//  PORT_MLAYER_WIDTH [0:3]  = {3,3,3,x}
//  This will offer a better FMAX compared to {4,4,x,x}, but the final DDR3 SEQ command has 1 additional clock cycle pipe delay.
//
// (PORT_MLAYER_WIDTH[0]=3)    (PORT_MLAYER_WIDTH[1]=3)    (PORT_MLAYER_WIDTH[2]=3)                   (PORT_MLAYER_WIDTH[3]=N/A)
//                                                         It would make no difference if             (not used, we made it down to 1 port)
//                                                         this layer width was set to [2].
//  PORT_xxxx[ 0] ----------\
//  PORT_xxxx[ 1] -----------=== ML10_xxxx[0] -------\
//  PORT_xxxx[ 2] ----------/                         \
//                                                     \
//  PORT_xxxx[ 3] ----------\                           \
//  PORT_xxxx[ 4] -----------=== ML10_xxxx[1] -----------==== ML20_xxxx[0] ---\
//  PORT_xxxx[ 5] ----------/                           /                      \
//                                                     /                        \
//  PORT_xxxx[ 6] ----------\                         /                          \
//  PORT_xxxx[ 7] -----------=== ML10_xxxx[2] -------/                            \
//  PORT_xxxx[ 8] ----------/                                                      \
//                                                                                  \
//  PORT_xxxx[ 9] ----------\                                                        \
//  PORT_xxxx[10] -----------=== ML11_xxxx[0] -------\                                \
//  PORT_xxxx[11] ----------/                         \                                \
//                                                     \                                \
//  PORT_xxxx[12] ----------\                           \                                \
//  PORT_xxxx[13] -----------=== ML11_xxxx[1] -----------==== ML20_xxxx[1] ---------------====  SEQ_xxxx wires to DDR3_PHY controller.
//  PORT_xxxx[14] ----------/                           /                                /
//                                                     /                                /
//  PORT_xxxx[15] ----------\                         /                                /
//         0=[16] -----------=== ML11_xxxx[2] -------/                                /
//         0=[17] ----------/                                                        /
//                                                                                  /
//                                                                                 /
//                                                                                /
//                                                       0 = ML20_xxxx[2] -------/
//
// ************************************************************************************************************************************
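
The two illustrations above can be cross-checked with a small simulation-only sketch which counts how many MUX layers a given setting actually uses (each used layer adds pipe delay, which is why {3,3,3,x} has 1 more clock cycle than {4,4,x,x}).  The function name is hypothetical and not from the controller's source, and the unused 'x' entries are filled in with 2 here only because the array needs a value.

```systemverilog
// Hypothetical planning helper, not part of the BrianHG_DDR3 sources.
module mux_layer_tb;
    // Count the MUX layers needed to bring port_total inputs down to the
    // single SEQ output, given the join width chosen for each layer.
    function automatic int mux_layers(input int port_total, input int width [0:3]);
        int remaining = port_total;
        int layers    = 0;
        for (int i = 0; i < 4; i++) begin
            if (remaining > 1) begin
                remaining = (remaining + width[i] - 1) / width[i];  // joins in this layer
                layers++;
            end
        end
        return layers;  // if more than 1 port remained after 4 layers, the setting is too narrow
    endfunction

    initial begin
        assert (mux_layers(16, '{4,4,2,2}) == 2);  // 16 -> 4 -> 1
        assert (mux_layers(16, '{3,3,3,2}) == 3);  // 16 -> 6 -> 2 -> 1
        assert (mux_layers(16, '{2,2,2,2}) == 4);  // 16 -> 8 -> 4 -> 2 -> 1
        $display("Layer counts agree with the illustrations.");
    end
endmodule
```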
« Last Edit: November 10, 2021, 06:30:03 pm by BrianHG »
 

Online BrianHGTopic starter

« Reply #91 on: November 10, 2021, 07:08:57 pm »
I wonder if I could achieve a 'Full-rate' controller at 300MHz.  Having a user 300MHz reading/writing 32bits data can generate perfect ~98% DDR3 data bus saturation consecutive bursts with a 16bit ram, good for 300MHz 32 bit cpus.
 

Offline promach

« Reply #92 on: November 11, 2021, 01:23:14 am »
What exactly do you mean by "Half-rate means my controller will accept a new command once every second DDR_CK clock"?
« Last Edit: November 11, 2021, 03:24:30 am by promach »
 

Online BrianHGTopic starter

« Reply #93 on: November 11, 2021, 01:40:09 am »
Quote
What exactly do you mean by "Half-rate means my controller will accept a new command once every second DDR_CK clock"?

Yes.

Half-rate means 2 things.  When the DDR3 is being run at 400MHz, (a) the processor which accepts user commands and (b) spits out DDR3 commands is running at 200MHz.  This part of my controller has always operated at half-rate.  The controller provides a busy signal if there are mandatory command delays required by the DDR3 and its input buffer memory has exceeded its stack.  Only my user multiport interface has been running in quarter-rate mode, due to its multiplexer complexity, and I am currently enhancing the performance there.

I only have a tiny pin-driving command timer running at the DDR3's 400MHz, which receives the stream of generated commands from the above 200MHz controller called 'BrianHG_DDR3_CMD_SEQUENCER.sv', simulated by the 'BrianHG_DDR3_CMD_SEQUENCER_tb.sv'.
« Last Edit: November 11, 2021, 01:58:23 am by BrianHG »
 

Offline promach

« Reply #94 on: November 11, 2021, 02:36:17 am »
Quote
When the DDR3 is being run at 400MHz, (a) the processor which accepts user commands and (b) spits out DDR3 commands is running at 200MHz.  This part of my controller has always operated at half-rate.

However, using half-rate on the commands will result in DDR3 manufacturer timing violations.  For example, suppose your DRAM is accepting an incoming 400MHz CK signal, but the DDR3 commands are arriving at the DRAM at a rate of only 200MHz.  This will cause issues such as a tMRD violation where the DRAM is getting 2 consecutive MRS commands.

Please correct me if wrong.
« Last Edit: November 11, 2021, 03:24:02 am by promach »
 

Online BrianHGTopic starter

« Reply #95 on: November 11, 2021, 02:50:04 am »
Quote
When the DDR3 is being run at 400MHz, (a) the processor which accepts user commands and (b) spits out DDR3 commands is running at 200MHz.  This part of my controller has always operated at half-rate.

However, using half-rate on the commands will result in DDR3 manufacturer timing violations.  For example, suppose your DRAM is accepting an incoming 400MHz CK signal, but the DDR3 commands are arriving at the DRAM at a rate of only 200MHz.  This will cause issues such as a tMRD violation where the DRAM is getting 2 consecutive MRS commands.

Please correct me if wrong.

No.  I have a command output section running at the full 400MHz.  That section has a 2 word FIFO which takes in the stream of commands generated at 200MHz by the 'BrianHG_DDR3_CMD_SEQUENCER.sv' processor and outputs 1 DDR_CK wide commands at 400MHz.  Before sending out each received command in that 2 word 200MHz in, 400MHz out FIFO, it uses a look-up table to check how many clock cycles have passed since any previously sent commands, so it knows when it is permitted to insert the next new command.
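
To make the look-up table idea more concrete, here is a heavily simplified sketch of one way such a check could be built.  It is not the actual BrianHG_DDR3 code: the module name, the 4 command types, and every number in the table are placeholders (real minimum spacings come from the DDR3 datasheet), and per-bank rules are ignored.

```systemverilog
// Hypothetical sketch; command set, table values and names are placeholders.
module cmd_gap_checker #(
    parameter int TW = 5                 // timer width in bits
)(
    input  logic       ck,               // runs at the DDR_CK rate
    input  logic       rst,
    input  logic       req,              // a command is waiting at the FIFO output
    input  logic [1:0] req_cmd,          // 0=ACTIVATE, 1=READ, 2=WRITE, 3=PRECHARGE
    output logic       send              // high in the clock where the command goes out
);
    // GAP[prev][next]: minimum DDR_CK cycles between sending `prev` and sending `next`.
    // Placeholder numbers only; every entry must be 1 or more.
    localparam int GAP [0:3][0:3] = '{
        '{ 4,  6,  6, 15},   // after ACTIVATE
        '{ 1,  4,  8,  4},   // after READ
        '{ 1, 12,  4, 14},   // after WRITE
        '{ 6,  6,  6,  1}    // after PRECHARGE
    };

    logic [TW-1:0] timer [0:3];  // cycles left until each command type is allowed

    // The waiting command may go out as soon as its own timer has run down.
    assign send = req && (timer[req_cmd] == '0);

    // Next timer value: count down, and when a command is being sent,
    // reload with the new gap if that is the longer wait.
    function automatic logic [TW-1:0] next_timer(input logic [TW-1:0] left,
                                                 input int            gap,
                                                 input logic          sending);
        logic [TW-1:0] aged;
        logic [TW-1:0] fresh;
        aged  = (left != '0) ? left - 1'b1 : '0;
        fresh = gap - 1;  // counted from the next clock onward
        return (sending && fresh > aged) ? fresh : aged;
    endfunction

    always_ff @(posedge ck) begin
        for (int n = 0; n < 4; n++) begin
            if (rst) timer[n] <= '0;
            else     timer[n] <= next_timer(timer[n], GAP[req_cmd][n], send);
        end
    end
endmodule
```

With a table entry of 1, the next command of that type may go out on the very next DDR_CK, which is the back-to-back case the FIFO is there to exploit.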
 
The following users thanked this post: promach

Offline promach

« Reply #96 on: November 13, 2021, 04:15:47 am »
Quote
Before sending out each received command in that 2 word 200MHz in, 400MHz out FIFO, it uses a look-up table to see how many clock cycles since any previously sent commands to know when it may be permitted to insert the next new command.

I have pondered a bit on your sentence quoted above.
However, when exactly should a new command be "enqueued" into the mentioned FPGA FIFO?

I ask this question because, from my understanding, whether it is half-rate or quarter-rate, the FSM timing events for the initialization sequence will still need to be triggered one at a time.
This means that there is no point in having a 2-word-deep FIFO.  The newly generated command only needs to be stored in a 1-word-deep FIFO (which is basically a register) and released to the DRAM once the timing is up,
and the next generated command will be "enqueued" into the FIFO at the beginning of the next FSM event?

Please correct me if wrong.

So, why do you need half-rate when quarter-rate already does the same job well enough and is better for power optimization (due to fewer clock transitions for a given amount of time passed)?
« Last Edit: November 13, 2021, 04:41:51 am by promach »
 

Online BrianHGTopic starter

« Reply #97 on: November 13, 2021, 05:18:30 am »
I pipe enqueue multiple user request commands.  There are situations where a new bank may be activated while a previous write was just sent and a current burst is taking place.  This activate command is allowed immediately after the previous write command.  Without the 2 word FIFO, I will always have a 'NOP' between that write and activate since I can only generate 200 million commands a second.  This allows stuffing commands where permitted on either immediate or odd DDR_CK clock cycles.  Enlarging that fifo to say 4 words would allow for typically the most compact command sequences possible being sent to the DDR3.  With a simple 1 word latch, commands will typically be spaced out on at least every 2nd DDR_CK.  For my design, the type of FIFO I need, FWFT type with acknowledge, has difficulty routing the acknowledge tied to 7 individual DDR3 command timers operating at 400MHz on Altera Cyclone devices.  My 400MHz side doesn't care about the commands it receives, only that each DDR3 command has a different set amount of time for each other possible new command coming in and it is not allowed to violate those minimum delay clock cycles depending on the next command to be sent.

You could say that because of my mid FIFO, if it were a bit larger, like a 4-word enqueue, I would have designed a hybrid half-rate controller with a full-rate controller's performance.  But with 2 words, I'm sort of stuck halfway in between, where some situations are taken advantage of while others aren't.  One thing I cannot fix is the 'skew', or delay, between receiving a user command and the length of pipe time it takes to get that command out to the DDR3, as my 'command sequencer' section is a 3-5 clock pipe running at 200 MHz (15 to 25 ns).  (I have optimization parameters which can combine pipe stages at the cost of FMAX or FPGA size.)
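
The effect of the FIFO depth described above can be sketched with a toy Python model. This is not the controller's actual code: the function name `issue_times`, the single "minimum gap since the previous command" timer, and the gap values are all assumptions made up for illustration (a real DDR3 sequencer tracks several per-command and per-bank timers). The model assumes the 200 MHz side can push one command every 2nd DDR_CK when there is room, that a pushed command becomes visible to the 400 MHz side one DDR_CK later, and that a slot freed in one cycle can only be refilled on a later even cycle.

```python
from collections import deque


def issue_times(commands, depth):
    """Return the DDR_CK cycle on which each command is issued.

    commands: list of (name, min_gap); min_gap is the (hypothetical) minimum
              number of DDR_CK cycles required since the previous command.
    depth:    number of words in the FIFO between the 200 MHz generator
              and the 400 MHz issue logic.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    fifo = deque()                    # entries: (command index, visible_from)
    issued = [None] * len(commands)
    next_to_push = 0
    last_issue = None
    t = 0
    while None in issued:
        # 200 MHz side: at most one push every 2nd DDR_CK, only if a slot
        # was already free at the start of this cycle.
        if t % 2 == 0 and next_to_push < len(commands) and len(fifo) < depth:
            fifo.append((next_to_push, t + 1))
            next_to_push += 1
        # 400 MHz side: issue the head command once it is visible and its
        # minimum gap since the previous command has elapsed.
        if fifo:
            idx, visible_from = fifo[0]
            gap = commands[idx][1]
            legal = last_issue is None or t - last_issue >= gap
            if t >= visible_from and legal:
                fifo.popleft()
                issued[idx] = t
                last_issue = t
        t += 1
    return issued


# Illustrative gaps only, not DDR3 datasheet values.
cmds = [("ACTIVATE bank 0", 1), ("WRITE bank 0", 6), ("ACTIVATE bank 1", 1)]

print(issue_times(cmds, 1))  # [1, 7, 9]: one idle DDR_CK after the WRITE
print(issue_times(cmds, 2))  # [1, 7, 8]: ACTIVATE directly follows the WRITE
```

In this model the 1-word latch is still occupied by the WRITE while it waits out its gap, so the second ACTIVATE cannot be handed over until the next even DDR_CK and lands one cycle late. With 2 words, the ACTIVATE is already waiting behind the WRITE and goes out on the very next DDR_CK.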
« Last Edit: November 13, 2021, 05:35:53 am by BrianHG »
 
The following users thanked this post: promach

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7835
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #98 on: November 13, 2021, 05:23:47 am »
The stage piping and the rest of my controller's coding are so efficient that, even when overclocking the FPGA to 500 MHz and even while drawing ellipses, the FPGA barely goes above room temperature without a heatsink.  At 400 MHz, it barely consumes 200 mW, never mind what 300 MHz must consume.  Remember, it is the rate of change in logic state which consumes power, not the static state of the command speed going through.
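
As a rough textbook reference (a general CMOS approximation, not a measurement from this controller), dynamic power is usually written as:

$$P_{\text{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f$$

where $\alpha$ is the fraction of nodes that toggle per clock, $C$ is the switched capacitance, $V_{DD}$ is the supply voltage, and $f$ is the clock frequency.  Lowering $\alpha$ by avoiding needless transitions cuts power even when $f$ stays the same.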

Take a really close look at when and how I cycle my address and bank lines, and how I control the OE timing and spacing of the data IO port and the drive of the ODT line.  Everything is tuned for minimal transitions, proper central clearance, and IO bus direction changes with an extra half-cycle hold, to achieve the cleanest, quietest, best possible communication with the DDR3.  Error-free 500 MHz would otherwise have been impossible, as Altera's maximum for a software DDRIO port is only supposed to be 300 MHz.
« Last Edit: November 13, 2021, 05:30:34 am by BrianHG »
 
The following users thanked this post: promach

Offline promach

  • Frequent Contributor
  • **
  • Posts: 875
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #99 on: November 13, 2021, 06:36:46 am »
Quote
I pipe enqueue multiple user request commands.  There are situations where a new bank may be activated while a previous write was just sent and a current burst is taking place.  This activate command is allowed immediately after the previous write command.  Without the 2 word FIFO, I will always have a 'NOP' between that write and activate since I can only generate 200 million commands a second.  This allows stuffing commands where permitted on either immediate or odd DDR_CK clock cycles.

Could an ACTIVATE command for a new bank be issued to the DRAM while a write burst for another bank is still ongoing?
My check of the ACTIVATE timing does not suggest so, though.

Besides, why odd DDR_CK clock cycles when a 2-word-deep FIFO is used?


Quote
Enlarging that FIFO to say 4 words would allow for typically the most compact command sequences possible being sent to the DDR3.  With a simple 1 word latch, commands will typically be spaced out on at least every 2nd DDR_CK.

Why does a 4-word-deep FIFO not have the every-2nd-DDR_CK concern?


Quote
You could say because of my mid FIFO, if it were a bit larger like 4 words enqueue, I have designed a hybrid half-rate controller with a full-rate controller's performance.  But with 2 words, I'm sort of stuck half way in-between where some situations are taken advantage of while others aren't.

I am confused about which other situations are not taken advantage of.
« Last Edit: November 13, 2021, 07:12:52 am by promach »
 

