Topic: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
*****************************************************
*** NEW June 11, 2022.  BrianHG_DDR3_Controller V1.6 ***
*****************************************************
----------------------------------------------------------------
 :scared: A maddening over fnK lines of code....  :scared:
----------------------------------------------------------------
My Github has the full latest v1.6 release package:
https://github.com/BrianHGinc/BrianHG-DDR3-Controller
https://github.com/BrianHGinc/BrianHG-DDR3-Controller/archive/refs/tags/v1.60.zip
-----------------------------------------------------
BrianHG_DDR3_Controller V1.6 Release, June 11, 2022.
Includes new BrianHG_GFX_VGA_Window_System.
-----------------------------------------------------
(Control ports are the same as v1.5)


Folder BrianHG_DDR3 now contains the new v1.6 controller.
Main source files:

- BrianHG_DDR3_v15_and_v16_Block_Diagram.png -> Illustration of module connections.

 - Includes the following sub-modules :
   - BrianHG_DDR3_CONTROLLER_v16_top.sv     -> v1.6 TOP entry to the complete project which wires the DDR3_COMMANDER_v16 to the DDR3_PHY_SEQ giving you access to all the read/write ports + access to the DDR3 IO pins.
   - BrianHG_DDR3_COMMANDER_v16.sv          -> v1.6 High FMAX speed multi-port read and write requests and cache, commands the BrianHG_DDR3_PHY_SEQ.sv sequencer.
   - BrianHG_DDR3_CMD_SEQUENCER_v16.sv      -> v1.6 Takes in the read and write requests, generates a stream of DDR3 commands to execute the read and writes.
   - BrianHG_DDR3_PHY_SEQ_v16.sv            -> v1.6 DDR3 PHY sequencer.          (If you want just a compact DDR3 controller, skip the DDR3_CONTROLLER_top & DDR3_COMMANDER and just use this module alone.)
   - BrianHG_DDR3_PLL.sv                    -> Generates the system clocks. (*** Currently Altera/Intel only ***)
   - BrianHG_DDR3_GEN_tCK.sv                -> Generates all the tCK count clock cycles for the DDR3_PHY_SEQ so that the DDR3 clock cycle requirements are met.
   - BrianHG_DDR3_FIFOs.sv                  -> Serial shifting logic FIFOs.

 - Includes the following test-benches :
   - BrianHG_DDR3_CONTROLLER_v16_top_tb.sv  -> Test the entire 'BrianHG_DDR3_CONTROLLER_v16_top.sv' system with Micron's DDR3 Verilog model.
   - BrianHG_DDR3_COMMANDER_v16_tb.sv       -> Test just the commander_v16.  The 'DDR3_PHY_SEQ' is dummy simulated.  (*** This one will simulate on any vendor's ModelSim ***)
   - BrianHG_DDR3_CMD_SEQUENCER_v16_tb.sv   -> Test just the DDR3 command sequencer.                                 (*** This one will simulate on any vendor's ModelSim ***)
   - BrianHG_DDR3_PHY_SEQ_v16_tb.sv         -> Test just the DDR3 PHY sequencer with Micron's DDR3 Verilog model providing logged DDR3 command results with any access violations listed.
   - BrianHG_DDR3_PLL_tb.sv                 -> Test just the PLL module.

 - IO port vendor specific modules :
   - BrianHG_DDR3_IO_PORT_ALTERA.sv         -> Physical DDR IO pin driver specifically for Altera/Intel Cyclone III/IV/V and MAX10.

 - Modelsim 'do' script files.
   - All setup_xxx.do files setup their associated Modelsim simulation.
   - All run_xxx.do   files quick re-compile and run their associated Modelsim simulation.


Folder 'BrianHG_DDR3_GFX_source_v16' contains my new BrianHG_GFX_VGA_Window_System multi-window system.
Main source files:

 - BrianHG_GFX_VGA_Window_System.pdf          -> Visual block diagram for the graphics system and layer-swapping illustration.
 - BrianHG_GFX_VGA_Window_System.txt          -> Full documentation for the VGA window system.

 - Includes these top hierarchy files:
   - BrianHG_GFX_VGA_Window_System.sv           -> Full window system where you drive the CMD_win_xxx controls via input ports.
   - BrianHG_GFX_VGA_Window_System_DDR3_REGS.sv -> Full window system where you drive the CMD_win_xxx controls via writing to DDR3 memory addresses through any multiport.

 - Modelsim 'do' script files.
   - All setup_xxx.do files setup their associated Modelsim simulation.
   - All run_xxx.do   files quick re-compile and run their associated Modelsim simulation.


New Arrow DECA board demo complete projects running the v1.6 BrianHG_DDR3_Controller connected to
the BrianHG_GFX_VGA_Window_System, all at 400MHz, all 100% timing requirements met.
Source folders:

 - BrianHG_DDR3_DECA_GFX_DEMO_v16_1_LAYER     -> Replaces the original ellipse demo, but now uses my new BrianHG_GFX_VGA_Window_System.
 - BrianHG_DDR3_DECA_GFX_DEMO_v16_2_LAYERS    -> Improved ellipse demo using 2 translucent windows scrolling at different speeds.
 - BrianHG_DDR3_DECA_GFX_HWREGS_v16_16_LAYERS -> Example 16 window layer system where writes to the DDR3 controls the window's regs.
 - BrianHG_DDR3_DECA_RS232_DEBUG_TEST_v16     -> Single port DDR3 controller example connected to my RS232 debugger.
 - BrianHG_DDR3_DECA_PHY_SEQ_only_v16         -> (No multiport controller.) Bare minimum DDR3 PHY_SEQ controller connected to my RS232 debugger.

Test hypothetical builds for Cyclone III,IV,V to see if we can meet FMAX.
 - BrianHG_DDR3_CIII_GFX_TEST_v16_1_LAYER_Q13.0sp1  -> Cyclone III example using Quartus 13.0 sp1.
 - BrianHG_DDR3_CIV_GFX_TEST_v16_1_LAYER            -> Cyclone IV example using Quartus 20.1.
 - BrianHG_DDR3_CV_GFX_TEST_v16_1_LAYER_350MHz      -> Cyclone V example running only at 350MHz using Quartus 20.1.


- Get new 2 window layer ellipse demo here:
https://www.eevblog.com/forum/fpga/brianhg_ddr3_controller-open-source-ddr3-controller/msg4230856/#msg4230856

- Get new VGA video system demo configured for up to 16 window layers driven by my RS232_Debugger here:
https://www.eevblog.com/forum/fpga/brianhg_ddr3_controller-open-source-ddr3-controller/msg4233016/#msg4233016


- User 'davemuscle' created an Avalon interface wrapper comparing my DDR3 PHY_SEQ_only free controller to Altera's expensive UniPHY IP, see here:
https://www.eevblog.com/forum/fpga/brianhg_ddr3_controller-open-source-ddr3-controller/msg4108963/#msg4108963
(Note that both controllers should have achieved double the bandwidth and my 'BrianHG_DDR3_DECA_GFX_DEMO_v16_2_LAYERS' demo already runs at double the efficiency.)

- User 'Nockieboy' has been working on integrating my controller with his Z80 8-bit GPU project, see his first multi-layer-window test here:
https://www.eevblog.com/forum/fpga/fpga-vga-controller-for-8-bit-computer/msg3980567/#msg3980567
and tile mode test:
https://www.eevblog.com/forum/fpga/fpga-vga-controller-for-8-bit-computer/msg4029019/#msg4029019



Check here for older compiled FMAX & LC/LUT usage stats:
https://www.eevblog.com/forum/fpga/brianhg_ddr3_controller-open-source-ddr3-controller/msg3649318/#msg3649318
« Last Edit: June 12, 2022, 08:29:43 pm by BrianHG »
 
The following users thanked this post: Ed.Kloonk, tom66, Jope, Berni, agehall, Omega Glory, Emo, nockieboy, asmi, dmendesf, bgm370, Ted/KC9LKE

Offline dmendesf

  • Frequent Contributor
  • **
  • Posts: 316
  • Country: br
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #1 on: July 14, 2021, 02:17:12 am »
Nice work. Is there a way to use it with VHDL under Quartus?
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #2 on: July 14, 2021, 02:29:50 am »
Though I am not familiar with VHDL, I do know that in either Verilog or VHDL, you can instantiate a module written in the other language.

Just google: How do I instantiate a SystemVerilog module inside a VHDL design?

There have got to be good examples.  You just need to find out how to pass 2-dimensional arrays if you will be using my multi-port module, unless you make a simple, smaller Verilog module with only the ports and settings you want, shrinking what you call in your VHDL code.

If crossing code is vendor specific, maybe the best place to ask would be on Intel's forum.
I know Intel has instructions on how to insert VHDL code/modules into verilog source code.
« Last Edit: July 14, 2021, 02:38:19 am by BrianHG »
 

Offline ali_asadzadeh

  • Super Contributor
  • ***
  • Posts: 1896
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #3 on: July 14, 2021, 06:46:41 am »
Thanks for sharing. Please make a repo on GitHub too :-+ :-+ :-+
ASiDesigner, Stands for Application specific intelligent devices
I'm a Digital Expert from 8-bits to 64-bits
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #4 on: July 14, 2021, 10:40:06 am »
Thanks for sharing. Please make a repo on GitHub too :-+ :-+ :-+
Coming in a few days once I fix the FMAX bottleneck.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14230
  • Country: fr
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #5 on: July 14, 2021, 04:58:12 pm »
Nice work. Is there a way to use it with VHDL under Quartus?

I don't use Quartus, but otherwise, the answer is usually, yes. Brian's controller is written in SystemVerilog, so first you need to check whether this is properly supported by the Quartus version you're using. If it's recent enough, I guess it is.

Mixing HDLs is usually no problem. You'll just need to define a component interface for the controller in VHDL following the SV interface.
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #6 on: July 14, 2021, 05:09:55 pm »
SystemVerilog, so first you need to check whether this is properly supported by the Quartus version you're using. If it's recent enough, I guess it is.
My code should work all the way back to QuartusII V9.0 from 2005...
I did design it to run on Cyclone II & III which requires QII V13.x or earlier.
 
The following users thanked this post: SiliconWizard

Offline dmendesf

  • Frequent Contributor
  • **
  • Posts: 316
  • Country: br
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #7 on: July 14, 2021, 07:25:50 pm »
I just bought a Deca Max 10 (arrived today, my birthday... Perfect timing :) and plan to use this code with it, but with VHDL. I'll probably use the latest Quartus unless something requires an older version.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14230
  • Country: fr
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #8 on: July 14, 2021, 08:45:23 pm »
Question to Brian: you were talking about testing this controller on a Lattice ECP5 IIRC. Did you get to do this? If so, how did that turn out, and how many LUTs does it take?
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #9 on: July 14, 2021, 09:38:58 pm »
Question to Brian: you were talking about testing this controller on a Lattice ECP5 IIRC. Did you get to do this? If so, how did that turn out, and how many LUTs does it take?
I just need to find an ECP5 board with at least 1 DDR3 ram chip.

The code was designed to be ported to old and new FPGAs alike; however, the read-data capture method I used to stay compatible across all basic FPGAs limits the DDR3 controller to around 500MHz / 1 GTPS.  Higher than that, it would be recommended to change the read data sampling to use the DQS strobe input as a clock instead of as a latch-enable.

The DDR3 controller alone, 1 read, 1 write port, running a 16bit DDR3 512mb ram chip in Quartus uses:
3480 logic cells in the HDMI out ellipse demo.
512 LUT-Only LCs,
1806 Registers-Only LCs
1166 LUT/Registers LCs

The 3480 number may be inflated since it is connected to the Multiport module, and that one eats a crap-load of registers as it has independent caches on each port and it's a huge cross-bar matrix.
In the Ellipse demo, it eats another ~3k logic cells.
« Last Edit: July 14, 2021, 10:46:11 pm by BrianHG »
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14230
  • Country: fr
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #10 on: July 15, 2021, 01:08:49 am »
Question to Brian: you were talking about testing this controller on a Lattice ECP5 IIRC. Did you get to do this? If so, how did that turn out, and how many LUTs does it take?
I just need to find an ECP5 board with at least 1 DDR3 ram chip.

Ah yes... I don't have one either. The one I have only has SDRAM. And right now, prices have inflated quite a bit, so ECP5 boards with DDR3 are not quite cheap...

The DDR3 controller alone, 1 read, 1 write port, running a 16bit DDR3 512mb ram chip in Quartus uses:
3480 logic cells in the HDMI out ellipse demo.
512 LUT-Only LCs,
1806 Registers-Only LCs
1166 LUT/Registers LCs

The 3480 number may be inflated since it is connected to the Multiport module, and that one eats a crap-load of registers as it has independent caches on each port and it's a huge cross-bar matrix.
In the Ellipse demo, it eats another ~3k logic cells.

Ok, should give me a rough idea of what to expect. Doesn't look too bad.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2728
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #11 on: July 15, 2021, 03:56:16 am »
Ah yes... I don't have one either. The one I have only has SDRAM. And right now, prices have inflated quite a bit, so ECP5 boards with DDR3 are not quite cheap...
Design your own? ;D It won't be cheap either - at least initially - but it will surely be a lot of fun :-+ And if you team up with others, it will help to spread the NRE around, as well as speed up the process.

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #12 on: July 15, 2021, 09:42:04 pm »
Ah yes... I don't have one either. The one I have only has SDRAM. And right now, prices have inflated quite a bit, so ECP5 boards with DDR3 are not quite cheap...
Design your own? ;D It won't be cheap either - at least initially - but it will surely be a lot of fun :-+ And if you team up with others, it will help to spread the NRE around, as well as speed up the process.
I know how this may sound, but from my personal point of view, I would like to first get my controller working on Lattice, then worry about a custom PCB.  For me, having a PCB with a proven, functional, wired DDR3 setup allowing me to plug and play is a preferred first step.  With the DECA board, the hardware was one thing I did not have to second guess while chasing buggy behavior in my code right at the beginning, when just powering up the DDR3.  The initial failure of function was that I was using the DDR_IO primitive for Cyclone II/III/IV/V, not the newer primitive used by the MAX10.  Quartus did not complain.  It compiled and even simulated properly at both the logic level and gate level.  Yet, the DDR3 was doing nothing.  Using the Cyclone's DDR_IO primitive meant that nothing was being output on the data lines in the real MAX10 FPGA, but the input was still working.  This wasted over a week, and if I had encountered this problem on my home-made PCB, it might have taken me an extra month to figure out that the DDR_IO primitive, which compiled and simulated fine, was the culprit.

Lattice tools and FPGAs are new to me, so I do not know what would go wrong where.  An existing eval PCB is a preferred first step removing a piece in the debug equation.
« Last Edit: July 15, 2021, 09:45:39 pm by BrianHG »
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2728
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #13 on: July 15, 2021, 11:24:01 pm »
You can always use a trial version of their DDR3 controller to verify the hardware. I would also use it to confirm that the pinout will work.
« Last Edit: July 15, 2021, 11:27:14 pm by asmi »
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14230
  • Country: fr
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #14 on: July 16, 2021, 12:38:51 am »
I tend to agree with BrianHG here. One thing to debug at a time...
Now I suppose if you're familiar with routing DDR3 stuff, going directly for a custom board should not be a problem. But I'm not. (Now I guess possibly the design could be shared, and someone else could do the routing...)
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2728
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #15 on: July 16, 2021, 02:46:06 am »
I tend to agree with BrianHG here. One thing to debug at a time...
Like I said, using a trial version of controller solves the hardware checkout problem. I use this method all the time, albeit with Xilinx devices.

Now I suppose if you're familiar with routing DDR3 stuff, going directly for a custom board should not be a problem. But I'm not. (Now I guess possibly the design could be shared, and someone else could do the routing...)
Or you can do the routing yourself and ask someone to check it once completed, and/or perhaps ask some questions in case DDR3 layout rules are not sufficiently clear. Otherwise it's going to be a classic chicken-and-egg problem when you won't attempt a DDR3 design because you have no experience, but you can't gain experience without actually doing it.

There is nothing particularly difficult about it, especially if you go for a relatively simple design - like a single DDR3 memory device, no ADDR/CTRL termination, and low'ish (as far as DDR3 standard goes) frequency. It's just a handful of rules you've got to follow, and that's pretty much it.

Offline dmendesf

  • Frequent Contributor
  • **
  • Posts: 316
  • Country: br
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #16 on: July 17, 2021, 03:31:01 am »
Asmi, I feel exactly that about using DDR3. Do you have a list of layout rules and a description of how DDR3 works? I surely understand SRAM memories and more or less get dynamic memories, but I have no idea what other complications exist for synchronous memories.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2728
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #17 on: July 17, 2021, 05:19:23 am »
Asmi, I feel exactly that about using DDR3. Do you have a list of layout rules and a description of how DDR3 works? I surely understand SRAM memories and more or less get dynamic memories, but I have no idea what other complications exist for synchronous memories.
DDR3 memory interface consists of a bunch of traces, which are divided into several groups: a single group typically called "Address/control" (or "Command/address/control") which, as the name suggests, consists of address lines (A0-A15, depending on module's capacity), bank address lines (BA0-BA2), and command lines (CKE, RAS, CAS, WE, CS, ODT); and then one or more of "byte lanes" (also called "DQ group"), each one consisting of 8 DQ (data) lines, associated DQS and DQS# lines, and a data mask DM. Attached is a good image from iMX6 datasheet showing example rules for length matching in case of a single x16 DDR3 memory device.

0. Remember that the length matching is about signal length (sometimes also called electrical length) a.k.a. propagation delay, not necessarily physical length! This is important to keep in mind in case you route traces within a group on different layers - signals on outer layers travel faster than on internal ones. Also remember that the portion of via height that the signal travels along also needs to be included - it's typically called via z-length. Not all eCAD tools take this into account, so you have to be on the lookout for these things.
1. The clock line needs to be at least as long as address/control lines are. I typically match it as part of the group, but it can be a bit longer.
2. Traces within address/control lines need to be matched to ±10 ps.
3. All byte lanes have to be no longer than address/command traces.
4. All signals within a single byte lane need to be matched to ±10 ps.
5. There is no requirement to match traces of different byte lanes.
6. Traces within a differential pair (CK/CK#, DQSn/DQSn#) need to be matched to ±2 ps.
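
Rule 0 is the one that is easiest to get wrong, so here is a minimal Python sketch of checking a group in the time domain rather than by physical length. The per-layer delay figures, the via delay, and the helper names are assumptions made up for illustration; real numbers have to come from your own stackup or field solver.

```python
# Illustrative delays in ps per mm. These are assumed values, not taken
# from any real stackup.
PS_PER_MM = {"outer": 5.9, "inner": 7.0}
VIA_PS_PER_MM = 7.0  # assumed delay along the via barrel (z-length)


def trace_delay_ps(segments, via_z_mm=0.0):
    """Propagation delay of one trace.

    segments: list of (layer, length_mm) tuples, layer is "outer" or "inner".
    via_z_mm: total via height the signal actually travels through.
    """
    routed = sum(PS_PER_MM[layer] * length_mm for layer, length_mm in segments)
    return routed + VIA_PS_PER_MM * via_z_mm


def within_tolerance(delays_ps, reference_ps, tolerance_ps):
    """True if every delay is within +/- tolerance_ps of the reference delay."""
    return all(abs(d - reference_ps) <= tolerance_ps for d in delays_ps)


# Two data traces with the same 30 mm physical length on different layers,
# compared against a strobe routed on the outer layer.
dq0 = trace_delay_ps([("outer", 30.0)])
dq1 = trace_delay_ps([("inner", 30.0)])
dqs = trace_delay_ps([("outer", 30.0)])

print("skew between dq0 and dq1 in ps:", round(dq1 - dq0, 1))
print("byte lane within +/-10 ps:", within_tolerance([dq0, dq1], dqs, 10.0))
```

With these assumed figures, two traces of identical physical length fail the ±10 ps byte-lane rule purely because of the layer change, which is the situation rule 0 warns about.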

As far as impedance goes, depending on a frequency and a specific controller it can be 50 Ohm or as low as 40 Ohm (the latter is a Xilinx requirement for 7 series FPGAs for DDR3 frequencies above 666 MHz, 50 Ohm is good enough for speeds below that).

If you intend to use several memory devices in your interface, things get more complicated because with DDR3 there are two possible topologies for address/control lines - a balanced tree (like DDR2 and below), and a fly-by (new for DDR3). Technically all DDR3 controllers are supposed to support fly-by topology (which is easier to route), but in reality there are some which don't support it.

Finally, it might be required to implement a termination for address/control lines (DQ, DQS and DM lines don't need one because memory devices have dynamic on-die termination controlled by ODT input). Whether it's required or not can be determined by SI simulations, but typically you can get away without it for a single component which is close to the controller. For multi-chip interfaces you will more likely than not need to implement it.
« Last Edit: September 13, 2021, 09:01:03 am by asmi »
 
The following users thanked this post: dmendesf

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #18 on: July 17, 2021, 02:48:29 pm »
FMAX bottleneck update in the DDR3_PHY_SEQUENCER:  Currently, I'm running individual serial shift timers which are tested before allowing a command request from the 'BrianHG_DDR3_CMD_SEQUENCER', which runs at half the DDR_CK clock frequency, to go out.  There are 6 timers, one for each of the 6 main DDR3 commands, which are selectively reset to the new appropriate values when each new command is sent.  This design has allowed me to request any command at any time and the DDR3 would be controlled as quickly as possible.  However, these timers have created a 'nexus' lump where an acknowledgement flag dependent on which command went out needs to get back to the BrianHG_DDR3_CMD_SEQUENCER's cmd out fifo as fast as possible, which limits the FMAX as the compiler tries to work out the routing across 2 clock domains.

Currently, 1 test I am doing is to run the timers at half clock frequency on the same clock as the BrianHG_DDR3_CMD_SEQUENCER and pass a 'half-time' delay bit to the DDR3_CK clock command output stage.  This has erased the FMAX problem, and even allowed compiles where I get a legit 400MHz FMAX.  However, a good number of times, delays which only require an ODD number of DDR3_CK clocks, say 5, have been rounded up to 6.  This is not good, as when using a 300MHz controller, every clock is precious.  I'm currently trying to debug and eliminate this 1 odd clock cycle penalty.  Once done, a beta version 0.95 will be uploaded where 350MHz should be easily achievable & 400MHz with compiler effort turned up to max and careful interfacing with my controller.
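
The odd-cycle penalty is just rounding arithmetic, which a short Python sketch can show. This is only a model of the counting, not the controller's actual logic; the function names are hypothetical.

```python
def half_rate_wait(tck_cycles):
    """DDR3_CK cycles actually waited if the timer can only count
    whole half-rate clocks (each worth 2 DDR3_CK cycles)."""
    return 2 * ((tck_cycles + 1) // 2)


def split_with_half_time_bit(tck_cycles):
    """Split a delay into whole half-rate clocks plus a 1-bit flag
    meaning 'add one extra DDR3_CK cycle at the output stage'."""
    return tck_cycles // 2, tck_cycles % 2


for required in (4, 5, 6, 7):
    whole, half_bit = split_with_half_time_bit(required)
    print(required, half_rate_wait(required), 2 * whole + half_bit)
```

Without the extra bit, a required delay of 5 DDR3_CK cycles becomes 6. With the bit carried through to the full-rate output stage, 2 half-rate clocks plus the flag give back exactly 5.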

It is too bad the original code had this 1 problem as the solution was sweet without any coding hacks, though it was problematic to get Quartus to properly hit a >300MHz core with tc of 85 degrees.  Unless an idea spark comes in on how to solve the 'cmd_ack' bottleneck with the current design, I will once again have to muddy clean code for work-around solutions.
« Last Edit: July 17, 2021, 02:54:14 pm by BrianHG »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #19 on: August 05, 2021, 02:05:06 am »
New release:
**********************************
Beta Release V0.95, August 4, 2021.
**********************************

Has now been uploaded at the top of this thread:
https://www.eevblog.com/forum/fpga/brianhg_ddr3_controller-open-source-ddr3-controller/
Changes are written there and the history is in the README file.
Please make sure you are downloading the _V0.95 as I kept the earlier revisions available for download as well.


Utilization report DDR3 controller inside the HDMI out ellipse demo:

Old V0.9.
3480  logic cells in the HDMI out ellipse demo,
512    LUT-Only LCs,
1806  Registers-Only LCs,
1166  LUT/Registers LCs.

New V0.95.
3100  logic cells in the HDMI out ellipse demo,
478    LUT-Only LCs,
1826  Registers-Only LCs,
796    LUT/Registers LCs.

New V0.95 True Stand-alone DDR3_PHY_SEQ.sv DDR3 controller with 128bit read and write port.
2082  logic cells,
515    LUT-Only LCs,
984    Registers-Only LCs,
583    LUT/Registers LCs.
« Last Edit: August 05, 2021, 02:32:30 am by BrianHG »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #20 on: August 06, 2021, 11:03:19 pm »
A minor issue has been found with 'DDR3_PHY_SEQ.sv' V0.95 when it swaps ram banks.  It just occasionally delays the next read or write command by 2 DDR_CK clocks.  This is minor as there is no data corruption and you may revert to the V0.90 if you truly need to.

I'm working on it now with a major improvement in FMAX and again a shrinkage of used logic cells.  It should be out in a day or 2, so I won't release an intermediate patch.
 
The following users thanked this post: nockieboy

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #21 on: August 14, 2021, 07:40:35 am »
Update:  Just cleaned up the main BrianHG_DDR3_PHY_SEQ.sv controller and sequencer.

All FMAX limitations and cross clock domain problems have been cleaned out.  I easily achieved a proper 350MHz controller, and 400MHz with only a few cross-clock domain signals not making the cut @ TC85 degrees by less than 0.070ns, yet it runs clean.  (The signals are some of the OEs for the write data; however, my code is programmed to turn on the OE 1 clock cycle early and turn it off 1 clock cycle late, meaning an error here will not occur.  The core and all data + IO paths clear the timing analysis fine above 400MHz.)  The logic cell and LUT count has also shrunk.  The change in code has generated what appears to be a minor, occasional 1 DDR_CK clock delay when bursting after the first BL8.  What is going on is that if a read or write burst begins on an odd DDR_CK phase compared to its half-clock interface clock, or the tRCD or tCAS happens to be an odd number, the alignment to the phase of the burst size of 8 may realign to the even matched phase after the first BL8 burst.  The old full speed V0.9 controller stacked a few additional commands in advance and would retain the 'ODD' alignments, generating an unbroken consecutive burst and giving the appearance of a tighter ram controller by packing everything back-to-back.  After the release of version 1.0, I will see if there is a cheap method of packing the commands in this way once again without having to move the timers back to the full speed DDR_CK clock rate.

The one issue remaining is the multi-port handler 'BrianHG_DDR3_COMMANDER.sv'.  Its current FMAX has trouble passing 130MHz if it is configured with 2x128 bit read, 2x128bit write ports, all smart options enabled @ TC85 degrees.  I'm thinking of a way to get this one to compile robustly above 150MHz without any big compromise so you may at least use it with ram running at 300MHz and half rate.  Right now, it can work fine at 75MHz, or even at 100MHz with the ram at 400MHz.  (This means no timing violations at any temperature.  The 130MHz builds always work error free at 150MHz, but this is not what we want.)  If I cannot come up with a solution, I will leave it as it is and create a secondary FAST multi-port handler 'BrianHG_DDR3_COMMANDER_fast.sv' which will be a strict 2 read, 2 write port device targeting a 200MHz FMAX, allowing half-rate support with 400MHz ram.  This multiport will be designed to be chain-able, where you can use 3 of them to give you 4 read, 4 write ports, or 7 of them for 8 read, 8 write ports.
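
The 3-for-4 and 7-for-8 module counts match a binary tree of 2-port funnels. A minimal Python sketch of that count follows, assuming the chaining is a plain tree and the port count is a power of two; the function name is hypothetical and the real wiring may differ.

```python
def chained_modules(ports, ports_per_module=2):
    """Count the 2-port modules needed to funnel `ports` user ports down
    to the single port that faces the DDR3 controller.

    Assumes a plain tree where each stage merges groups of
    `ports_per_module` ports into one, and a power-of-two port count.
    """
    if ports < 2 or ports & (ports - 1):
        raise ValueError("this sketch only handles power-of-two port counts")
    count = 0
    while ports > 1:
        stage = ports // ports_per_module  # modules needed in this stage
        count += stage
        ports = stage                      # each module exposes one upstream port
    return count


for n in (2, 4, 8):
    print(n, "ports ->", chained_modules(n), "modules")
```

For 4 ports this gives 2 + 1 = 3 modules and for 8 ports 4 + 2 + 1 = 7, which agrees with the figures quoted above.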

Note that Altera's Cyclone & MAX FPGA fabrics are really slow and low power; my current code would probably be at least 50% faster on other vendors' FPGAs, or even Altera's Arria/Stratix FPGAs.  Achieving consistent full 400MHz core builds without the paid version of Quartus for manually placing cells is still difficult, even though that portion is truly only software serializers & an IO port controller, as it still has to be controlled by a 200MHz section and bridge the 2 clock domains in both directions.
« Last Edit: August 14, 2021, 10:12:26 pm by BrianHG »
 

Offline Omega Glory

  • Regular Contributor
  • *
  • Posts: 73
  • Country: us
    • Ezra's Robots
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #22 on: August 15, 2021, 04:27:39 am »
Have you thought about hosting the code on GitHub? I bet a lot of people would find your work very helpful, and that would be an easy and free way to distribute it more widely than on this forum. In any case, amazing work!

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #23 on: August 15, 2021, 06:23:47 am »
Have you thought about hosting the code on GitHub? I bet a lot of people would find your work very helpful, and that would be an easy and free way to distribute it more widely than on this forum. In any case, amazing work!
We'll see in a few days after I release v1.00.
I never used GitHub before and would probably need to venture around first.
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #24 on: August 16, 2021, 08:17:48 am »
Get the new BrianHG_DDR3_Controller_V1.5 demo over here: https://www.eevblog.com/forum/fpga/brianhg_ddr3_controller-open-source-ddr3-controller/msg3785711/#msg3785711

 >:D  >:D  >:D  >:D  >:D  >:D
 >:D 500MHz/1GTPS >:D
 >:D  >:D  >:D  >:D  >:D  >:D

Error free, well, on my DECA board anyways...
So much for Altera's software DDR3 300MHz limit.
Though, the reported FMAX at 0C reads only 461MHz.

If you download & program the attached ellipse-demo, scoping the SMD termination resistor tied to the DDR_CK line should show 500MHz.

Arrow DECA DEMO .sof programming file instructions:
(If the picture is still or scrolling noise, just flip 'Switch 0'.  You just powered up the demo in frozen picture mode and you are looking at the powered up random blank memory.)


Switch 0 = Enable/Disable drawing of ellipses.
Switch 1 = Enable/Disable screen scrolling.
Button 0 = Draw data from random noise generator.
Button 1 = Draw color image data from a binary counter.


Note: V3 just fixes a reset bug when using the RS232-Debugger.  Nothing else has changed.
« Last Edit: November 01, 2021, 10:08:38 pm by BrianHG »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #25 on: August 16, 2021, 06:49:02 pm »
 >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D
 >:D   400MHz/800MTPS, Zero timing violations with 85C model   >:D
 >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D



V1.00 coming...
Just cleaning up and increasing the clearance by a bit more...
« Last Edit: August 16, 2021, 09:30:45 pm by BrianHG »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #26 on: August 16, 2021, 10:09:38 pm »
Huge clearance boost, latest V1.00 build @ 400MHz...



Just compile testing a few different configurations, then some documenting of the changes, and the full V1.00 should be released tonight.

If it weren't for CLK[2] having that clock restriction to 405.19MHz (read data in clock) instead of the 450.05MHz restrictions on CLK[0] and CLK[1] (data out clocks),  I could have made a 450MHz controller with no timing violations at 85C.  However, make no mistake that this controller does run at 450MHz error free and can even be overclocked to 500MHz  (PLL maxes out at 475MHz according to the data sheet).  The minimum period restrictions are only the limitations of the DDR IO PIN buffer itself and its required data hold time.

At 450MHz & 500MHz, my tunable read data clock has 5 out of 8 error-free tuning positions.  At 400MHz, it is 6 out of 8, while at 300MHz, it is 7 out of 8.  Note that 8 = the theoretical perfect case where every position across the full 180 degrees is error free, nearly impossible unless I begin to use individual DQ PIN deskew-tuning calibration with picosecond alignment.  (Each tuning step is 22.5 degrees.)
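To put those tuning positions into time units, here is a minimal Python sketch of the arithmetic.  It assumes the error-free positions are contiguous and that the usable window is roughly (number of good positions) x (one step); both are simplifications for illustration, not something measured on the controller.  The function name is made up for this example.

```python
# Rough arithmetic only: converts the quoted "N out of 8" tuning positions
# into degrees and picoseconds. Assumes contiguous good positions (an
# assumption, not a measurement).

STEP_DEG = 22.5   # from the post: each tuning step is 22.5 degrees
TOTAL_STEPS = 8   # 8 steps * 22.5 degrees = 180 degrees

def tuning_window(freq_mhz, good_steps):
    """Return (window in degrees, window in picoseconds) for a DDR_CK frequency."""
    period_ps = 1e6 / freq_mhz                # clock period in picoseconds
    step_ps = period_ps * STEP_DEG / 360.0    # one tuning step in picoseconds
    return good_steps * STEP_DEG, good_steps * step_ps

# (frequency in MHz, error-free positions) as quoted in the post
quoted = [(300, 7), (400, 6), (450, 5), (500, 5)]
results = {f: tuning_window(f, n) for f, n in quoted}

for f, (deg, ps) in results.items():
    print(f"{f} MHz: about {deg:.1f} degrees, about {ps:.0f} ps of error-free window")
```

So 5 good positions at 500MHz works out to roughly 625ps of window, versus roughly 1458ps for 7 good positions at 300MHz.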
« Last Edit: August 16, 2021, 10:53:53 pm by BrianHG »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #27 on: August 17, 2021, 12:41:07 am »
Slowest fabric -8 build using the device 10M50DAF484C8GES set to 300MHz.



Note that the DECA eval board uses the fastest -6 Max10 fabric.
Note that Altera doesn't support DDR3 on -8 Max10/Cyclone V FPGAs as the DDR buffer transceivers max out at 550MTPS instead of the required 600MTPS.  Though, my source code has the 'DDR_TRICK_MTPS_CAP' parameter function to allow you to get around this (used to break Quartus' 600MTPS limiter on my 800MTPS builds), otherwise, you could not do a full compile.  (The fitter and timing analyzer still perform the rest of their functions properly, recognizing the requested true 300MHz core frequency.)
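For anyone mixing up MHz and MTPS, a small Python sketch of the relationship used above: DDR moves two transfers per DDR_CK cycle, so 300MHz is 600MTPS.  The constant and function names are invented for this illustration; this only shows the numbers, not how the 'DDR_TRICK_MTPS_CAP' parameter works internally.

```python
# Illustration of the MHz <-> MTPS numbers quoted in the post.
# Names here are hypothetical and not taken from the controller source.

TRANSFERS_PER_CLOCK = 2        # DDR: data on both clock edges
MINUS8_IO_CAP_MTPS = 550       # quoted limit of the -8 DDR buffers
REQUIRED_DDR3_MTPS = 600       # quoted minimum required for DDR3

def ck_to_mtps(ck_mhz):
    """Convert a DDR_CK frequency in MHz to mega-transfers per second."""
    return ck_mhz * TRANSFERS_PER_CLOCK

for ck in (300, 400, 500):
    rate = ck_to_mtps(ck)
    over = rate > MINUS8_IO_CAP_MTPS
    print(f"{ck} MHz -> {rate} MTPS, exceeds the quoted -8 cap: {over}")
```

Every supported speed, including the slowest 300MHz build, lands above the quoted 550MTPS figure, which is why a workaround is needed to get a full compile on a -8 part.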

Yes, you can now have a DDR3 controller on any slowest -8 Cyclone III / IV / V / Max10.
« Last Edit: August 17, 2021, 12:46:31 am by BrianHG »
 

Online Daixiwen

  • Frequent Contributor
  • **
  • Posts: 351
  • Country: no
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #28 on: August 17, 2021, 06:22:07 am »
That's really impressive for Cyclone devices.  :-+
 
The following users thanked this post: BrianHG

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #29 on: August 17, 2021, 07:08:55 am »
That's really impressive for Cyclone devices.  :-+
LOL, Cyclone III/IV has the fastest fabric in the series and can outperform Max10 & Cyclone V.  IE, it should compile with better FMAX than what I have accomplished here, though it's only about 10% faster.

It's been hell.
I started out around 2 months ago with something which only occasionally worked after each build, even when underclocking the DDR3 at 250MHz.

Then, the 300MHz got a little more stable, but 350MHz was a fluke and I got to see 400MHz initialize once as well as a 450MHz fluke with a horrible FMAX report and no other code in the FPGA like the graphics geometry engine and video output.  Once I began with the 1080p output, it took a few weeks to stabilize the 300MHz, but adding the ellipse geometry unit killed that.  500MHz was a fabled dream.

Finally, 300MHz was stable and 350 not too reliable while 400 was dead until last week.  Some concentration and fine tuning, now 400MHz is a breeze and 300MHz is a given.  Even 500MHz surprisingly worked first shot, though, right now, 450MHz has a weird addressing bug, but it's not with the DDR3 controller, but in the extended 16 read/write channel multiport front end.

I'm almost done with some final tests and tweaks.  I should have a rev 1.00 out in a day where its only limiting factor is the top FMAX speed of the 16 channel multi-port module.  It looks like the best solution here will be to make an alternate fast stripped-down version with 2 read and 2 write ports designed for speed, with pyramid-style stacking support so you can have as many ports as you like, though ports deep in the pyramid will have an extended sequential pipe delay.  Stack it the way you like and you can have at least 1 read & write port right next to the DDR ram controller with a single pipe stage.
« Last Edit: August 17, 2021, 11:08:20 am by BrianHG »
 

Offline ali_asadzadeh

  • Super Contributor
  • ***
  • Posts: 1896
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #30 on: August 17, 2021, 11:28:33 am »
Thumbs up BrianHG :-+ Github is waiting for your rev 1.0 >:D
ASiDesigner, Stands for Application specific intelligent devices
I'm a Digital Expert from 8-bits to 64-bits
 
The following users thanked this post: BrianHG

Offline nockieboy

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #31 on: August 17, 2021, 11:55:08 am »
Latest version of the project tested and working fine on the DECA at 500MHz!!!!!  :wtf:

https://youtu.be/a1k106CNylI
« Last Edit: August 17, 2021, 04:58:21 pm by nockieboy »
 
The following users thanked this post: BrianHG

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #32 on: August 19, 2021, 09:10:51 am »
V1.00 update.

Just cleaned up a bunch of stuff and also got rid of all multicycle paths in the .sdc as they are no longer needed.

400MHz with 100% timing in the black is easy to achieve, though sometimes you may need to change compile settings or change the compiler's starting 'SEED' number as it is still a stretch for Cyclone/MAX10 devices.  300MHz is a given...  Though overclocking to 450/500MHz functions with a -6 MAX10, it is not something I'm officially supporting.

I have some documenting to do and a compile test for a Cyclone IV to verify that we reach the same FMAX range.  Then I will upload V1.00.
« Last Edit: August 19, 2021, 09:15:51 am by BrianHG »
 

Offline dmendesf

  • Frequent Contributor
  • **
  • Posts: 316
  • Country: br
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #33 on: August 22, 2021, 01:05:10 am »
Tried the 500MHz version in my DECA10 and it worked flawlessly. Looking forward to the 1.0 release. I plan to make a VHDL wrapper for it. Congratulations!
 
The following users thanked this post: BrianHG

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #34 on: August 22, 2021, 10:54:38 am »
V1.00 update:

One last thing to do.  I just need to do a Cyclone IV E build test tonight to make absolutely sure we reach FMAX with that series as well, then .zip everything and I'll be ready to upload everything.

IE: I need to re-assign a bunch of IOs from my Ellipse DECA demo switched to a Cyclone IV FPGA following Altera's recommended connections for external DDR memory, then compile...

I will also be doing the same for Cyclone V as that FPGA seems to be slower/lower power (yet higher density and designed for DDR3) than the Cyclone IV.
« Last Edit: August 22, 2021, 11:02:18 am by BrianHG »
 
The following users thanked this post: dmendesf

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #35 on: August 23, 2021, 03:42:38 am »
Ok, Cyclone III and Cyclone IV compile and reach a true 400MHz FMAX.

But for some FN reason, Cyclone V's PLL isn't compatible with Cyclone III/IV/10 LP/Max 10/Arria II.

Yes, there is a different PLL megafunction dedicated just for the Cyclone V.  If I use the normal 'altpll', it won't support phase stepping on a Cyclone V.  So, I need to use Cyclone V's 'altera_pll' function which uses a shit load of strings to define its settings and yet still has the same identical phase step tuning function as the 'altpll', other than it has many more clock outputs.  Something which I might not be able to handle with my current parameters auto-pll generator system.

Altera drives me nuts.  Max 10 will simulate and compile correctly using the old 'altddio_bidir', but will not output anything unless you use the new 'altera_gpio_lite'.  Cyclone V needs the new PLL (at least it refuses to compile with the old one), but uses the old 'altddio_bidir', while the Max 10 uses the old PLL but the new IO scheme...

Once I get the Cyclone V PLL working and test the FMAX, I'll upload V1.00.
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #36 on: August 24, 2021, 12:50:18 am »
V1.00 update:
Arrrrgggg, this Cyclone V -6 is a piece of crap.

Ok, I can reach 400MHz & 200MHz for the controller clocks at 0C, after an 11 minute compile, and after I falsely set the read clock phase offset to 1ps so that it would not remove that tune-able clock and merge it with my DDR_CK clock 0. (Yes, that BS is a thing...) But somehow, my multiport interface module's FMAX is 41MHz?

WTF? 41MHz?

Even the Cyclone III achieved 115MHz for this clock while CIV got 118MHz and Max10 got 114MHz.
How can the next generation of Cyclone devices drop so terribly in performance, yet they improved the IO speed compared to the Max10's 450MHz and CIII/CIV's 500MHz limit.  And yet, these guys compile in 4-5 minutes.  And only 3.5 minutes for the CycloneIII only using 1 CPU core with the old Quartus 13.0sp1.

It's 11 minutes a shot for the Cyclone V compiles.  I thought I would just have to add and adapt its PLL function.  Not investigate how the hell Quartus's fitter decides to optimize my core into a snail of a design.
« Last Edit: August 24, 2021, 04:29:17 pm by BrianHG »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #37 on: August 25, 2021, 06:38:12 am »
V1.00 update.

Ok, I got a Cyclone-V-6 to run at 300MHz, just barely, with the advanced smart banking feature in the multiport module disabled.  (The stand alone DDR3_PHY ram controller can still run at 400/200MHz and its smart bank management is always enabled.  It's been designed to run at speed on the most pathetically slow FPGAs ever...)

I'll stick V1.00 here, document the updates and upload 1.00.

My DDR3 PHY stand-alone controller can do over 300MHz easy on the C-V -6, it's just the multiport hub which has a devastating FMAX of 50% speed.  I'm also going to upload it to Intel's support with the C-IV-6 versions and ask why there is such a huge speed difference.

Setting up its PLL was a nightmare with built-in system bugs, parameters as fancy non-standard string types & the compiler simplifying out some crucial clocks, for which I needed to invent multiple workarounds.

V1.00 coming tonight...
« Last Edit: August 25, 2021, 07:29:42 am by BrianHG »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #38 on: August 26, 2021, 02:28:54 am »
I don't know WTF is going on with this Cyclone V build, but I have this one build where the 85C model has an FMAX of 204.33MHz and the 0C model has an FMAX of 203.29MHz.   Yes.  You read right.  The colder model is 'SLOWER' than the hot model.  WTF?

Ok, need to do a bunch of builds...
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #39 on: August 27, 2021, 03:56:57 am »
So, reporting back somewhat later than I expected, I have the bouncing ellipses running on the screen at 1080p, and it's FREAKING AWESOME  :clap:

Regarding the blue eye-destroyers, LEDs 3 through 7 are lit :)
 
The following users thanked this post: BrianHG

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #40 on: August 27, 2021, 07:44:10 am »
V1.00 FMAX results:

Files,  Description:

300MHz, Hypothetical Cyclone III-8 DDR3 System scrolling ellipse build to verify FMAX.
(Uses Quartus 13.0sp1)


400MHz, Hypothetical Cyclone III-6 DDR3 System scrolling ellipse build to verify FMAX.
(Uses Quartus 13.0sp1)



300MHz, Hypothetical Cyclone IV-8 DDR3 System scrolling ellipse build to verify FMAX.


400MHz, Hypothetical Cyclone IV-6 DDR3 System scrolling ellipse build to verify FMAX.



300MHz, functional DDR3 System scrolling ellipse with optional RS232 debug port demo for Arrow DECA eval board, but compiled for a -8.


400MHz, functional DDR3 System scrolling ellipse with optional RS232 debug port demo for Arrow DECA eval board.




400MHz, Hypothetical Cyclone V-6 DDR3 System scrolling ellipse build to verify FMAX.
( :-- FMAX FAILED  :-- )  Take a look at the multiport clock.


300MHz, Hypothetical Cyclone V-6 DDR3 System scrolling ellipse build to verify FMAX.
(PASSED, but I had to disable some smart multiport features and this is a CV-6 :--)


300MHz, Hypothetical Cyclone V-7 DDR3 PHY Only controller with RS232 debug port build to verify FMAX.
(300MHz only, no multiport )  A CV-7  :--, not even a -8.  Compiling for a -8 leaves 4 clock domain crossing nets in the red even though the rest of the design including IO ports easily pass.


375MHz, Hypothetical Cyclone V-6 DDR3 PHY Only controller with RS232 debug port build to verify FMAX.
(375MHz only, no multiport  :-- ) Compiling for 400MHz reveals ~8 clock domain crossing nets in the red even though the rest of the design including IO ports easily pass.  In fact, this FPGA should have reached 500MHz.



I will be sending my code to Intel to see why their Cyclone V only gets 60% speed on my multiport commander module.  Maybe there is something in the compiler setting to help as the FPGA fabric of Cyclone V is radically different compared to all other Cyclone & MAX FPGAs.


Clocks [ 0 ],[ 1 ],[ 2 ] are the 400MHz DDR_CK, Write clock, read clock.
Clock  [ 3 ] is the DDR_CLK_50 200MHz half speed clock, the interface speed of the BrianHG_DDR3_PHY_SEQ.
Clock  [ 4 ] is the DDR_CLK_25 100MHz quarter speed clock, currently set for the multiport COMMANDER module.
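As a quick sanity check on the clock list above, here is a small Python sketch that derives the half and quarter rate clocks from DDR_CK and compares the quarter rate clock against the multiport FMAX figures quoted in reply #36.  It assumes that "this clock" in reply #36 is the quarter-rate clock feeding the multiport COMMANDER, and the function names are made up for this example.

```python
# Sketch: derive the half/quarter rate clocks from DDR_CK and compare the
# quarter rate against multiport FMAX numbers quoted in reply #36.
# Assumption: those FMAX figures refer to the quarter-rate (DDR_CLK_25) clock.

def derived_clocks(ck_mhz):
    """Return the clock frequencies (MHz) for a given DDR_CK."""
    return {
        "DDR_CK": ck_mhz,          # clocks [0],[1],[2]
        "DDR_CLK_50": ck_mhz / 2,  # clock [3], PHY_SEQ interface
        "DDR_CLK_25": ck_mhz / 4,  # clock [4], multiport COMMANDER
    }

# Multiport clock FMAX in MHz, as quoted in reply #36
MULTIPORT_FMAX_MHZ = {
    "Cyclone III": 115,
    "Cyclone IV": 118,
    "Max10": 114,
    "Cyclone V": 41,
}

def meets_quarter_rate(fmax_mhz, ck_mhz):
    """True if the reported FMAX covers the quarter-rate clock for this DDR_CK."""
    return fmax_mhz >= derived_clocks(ck_mhz)["DDR_CLK_25"]

for device, fmax in MULTIPORT_FMAX_MHZ.items():
    ok = meets_quarter_rate(fmax, 400)
    print(f"{device}: FMAX {fmax} MHz vs 100 MHz needed at 400 MHz DDR_CK -> {ok}")
```

With those numbers, only the Cyclone V figure falls short of the 100MHz quarter-rate clock of a 400MHz build.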
« Last Edit: September 03, 2021, 08:10:03 am by BrianHG »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #41 on: August 27, 2021, 07:55:07 am »
BrianHG_DDR3 V1.00 system FPGA utilization reports:

300MHz_PHY_only.png - DDR3 controller with 1 read & write port to an 8 bit device build.


300MHz-8_ellipse.png - DDR3 controller random ellipse project with 4 ports, 128 bit access, 300MHz Max10-8.


300MHz_ellipse.png - DDR3 controller random ellipse project with 4 ports, 128 bit access, 300MHz Max10-6.


400MHz_ellipse.png - DDR3 controller random ellipse project with 4 ports, 128 bit access, 400MHz Max10-6.


450MHz_ellipse.png - DDR3 controller random ellipse project with 4 ports, 128 bit access, 450MHz Max10-6.


500MHz_ellipse.png - DDR3 controller random ellipse project with 4 ports, 128 bit access, 500MHz Max10-6.



I've included a few builds.  You will notice that the LC/LUT increases with frequency.  This is most likely the compiler adding duplicate parallel logic cells to improve FMAX timing.


I highlighted the 'BrianHG_PHY_SEQ' module which tells you the full LC/LUT count if you were to build a stand-alone 1 read/write port DDR3 controller.

The COMMANDER module is the multiport handler configured with 2 read and 2 write ports in the ellipse demo.
« Last Edit: August 27, 2021, 07:53:50 pm by BrianHG »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #42 on: August 27, 2021, 08:35:35 am »
******************************************
******** Finally, V1.00 release here: ***********
******************************************
https://www.eevblog.com/forum/fpga/brianhg_ddr3_controller-open-source-ddr3-controller/

Things to do:

a)  I will be contacting Intel's technical support about Cyclone V's poor 60% speed FMAX performance for my 1 multiport section in my design as seen in the above screenshots with the red arrow.  I'll see if something can be done.

b)  As described in my v0.95 notes, I will look into designing my simpler pyramid stack-able 2:1 multiport module aimed to achieve an FMAX of at least 200MHz allowing multiport running at Half rate interface controller speed, but with a loss of a few smart advanced features.

c)  I will download and install the latest Lattice Diamond and see if I can adapt and get my controller to compile and simulate there.  The LFE5U-45F/LFE5U-85F at 45kgate & 85kgate are just such a price bargain at 16$ and 36$ each respectively and if my DDR3 controller runs fast there, it is the next route to take.
« Last Edit: August 28, 2021, 11:29:15 pm by BrianHG »
 
The following users thanked this post: nockieboy, SpacedCowboy

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #43 on: August 28, 2021, 09:21:17 pm »
OMG, learning how to use GitHub with its esoteric project generation and file entry is not treating me well.  I'm wondering if it is worth the hassle.
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #44 on: August 29, 2021, 01:46:00 am »
Arrrg, is GitHub a place to share some projects / source code, or is it a place where you have to learn a bunch of their own esoteric terms and learn an entire new mix of text and web-page click OS just to post some .zip or source code?  And, I don't see any official support for HDL firmware languages, though I do know people do post such code there.

Ok, can I assume I am not allowed to upload a .zip file to GitHub?
How do I get Quartus binaries uploaded to my 'Repository'?
Or, do I somehow place these files within my listed 'Projects' which appear to have no consequences or aren't even searchable?
« Last Edit: August 29, 2021, 01:58:30 am by BrianHG »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 3971
  • Country: nz
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #45 on: August 29, 2021, 02:48:14 am »
On GitHub just create a new empty repository (in the menu on the top right of the web page). Hit the green CODE button and copy the ssh URL you find there.

In your local git repo type "git remote add github <url>" then "git push github"

If you don't already have a local git repo then that's the first step. cd into your project directory and type "git init" and then "git add FOO" where FOO is a file or directory or whitespace separated list of files or directories. Repeat for everything you want sent to github i.e. source code and config files, not output files. Then type "git commit -m 'Initial commit'". And then follow the instructions above to deal with github.
« Last Edit: August 29, 2021, 02:50:17 am by brucehoult »
 
The following users thanked this post: BrianHG

Offline ali_asadzadeh

  • Super Contributor
  • ***
  • Posts: 1896
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #46 on: August 29, 2021, 07:22:00 am »
As a side note on how to add everything to the git repo, you can do as follows.
If you need to exclude some parts from GitHub, like the project outputs, you can use a .gitignore file listing the directories and files that you don't want to include.


git init
git add .
git commit -m "Init repo"
git remote add github <url>
git push github
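For the .gitignore part, here is a minimal Python sketch that writes one for a Quartus project.  The patterns listed are typical Quartus output directories and report files as far as I know; treat them as assumptions about this particular project and adjust before use (for example, drop "output_files/" if you do want the .sof files in the repo).  The function names are made up for this example.

```python
# Sketch: generate a .gitignore for a Quartus project directory.
# The pattern list is an assumption about a typical Quartus project,
# not something taken from the BrianHG_DDR3 repository.
from pathlib import Path

QUARTUS_IGNORE = [
    "db/",              # compiler database
    "incremental_db/",  # incremental compile database
    "output_files/",    # default output folder (.sof, reports) in newer Quartus
    "*.rpt",            # compile reports
    "*.qws",            # per-user workspace settings
]

def gitignore_text(patterns=QUARTUS_IGNORE):
    """Return the .gitignore contents, one pattern per line."""
    return "\n".join(patterns) + "\n"

def write_gitignore(project_dir):
    """Write .gitignore into project_dir and return its path."""
    path = Path(project_dir) / ".gitignore"
    path.write_text(gitignore_text())
    return path

print(gitignore_text(), end="")
```

Call write_gitignore on the project folder before "git add ." so the build outputs never get staged.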
ASiDesigner, Stands for Application specific intelligent devices
I'm a Digital Expert from 8-bits to 64-bits
 
The following users thanked this post: BrianHG

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #47 on: August 29, 2021, 09:22:50 pm »
If you don't already have a local git repo then that's the first step. cd into your project directory and type "git init" and then "git add FOO" where FOO is a file or directory or whitespace separated list of files or directories. Repeat for everything you want sent to github i.e. source code and config files, not output files. Then type "git commit -m 'Initial commit'". And then follow the instructions above to deal with github.
What do you mean by ', not output files.'?

For example, my Quartus projects do have some binary files and may include a hex file.

Also, when I copy and paste ASCII files/show readme files, why do all my carriage returns disappear?
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2728
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #48 on: August 30, 2021, 12:06:14 am »
You can use TortoiseGit utility if you don't feel like messing with command line. That's what I use all the time - I have a Synology NAS which has a private Git server installed. I use Git even for projects that I don't intend to ever publish, as it makes development much easier.
 
The following users thanked this post: BrianHG

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 3971
  • Country: nz
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #49 on: August 30, 2021, 02:19:13 am »
If you don't already have a local git repo then that's the first step. cd into your project directory and type "git init" and then "git add FOO" where FOO is a file or directory or whitespace separated list of files or directories. Repeat for everything you want sent to github i.e. source code and config files, not output files. Then type "git commit -m 'Initial commit'". And then follow the instructions above to deal with github.
What do you mean by ', not output files.'?

You don't include output files in a SOURCE CODE control system because they are not source code. The outputs of compilation or synthesis and routing, or whatever it is your project does (I'm talking in general terms here) change every time you run the build process. AND anyone who checks out the project will generate them themselves, from the source files.

If you want to put bitstreams or something somewhere so that people don't have to run synthesis themselves, that's a binary release, which is a different thing. GitHub has a different place to put releases (or you can put them on any web or ftp server etc).

Quote
For example, my Quartus projects do have some binary files and may include a hex file.

If those are inputs to the process then that is fine.

Quote
Also, when I copy and paste ASCII files/show readme files, why do all my carriage returns disappear?

What do you mean "copy and paste"?

Whatever you put into git comes back out absolutely byte identical to what you put in. There is no line ending translation. Git handles binary files just fine, and in fact treats text as binary e.g. diffs are by bytes not lines.
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #50 on: August 31, 2021, 01:57:51 am »
You can use TortoiseGit utility if you don't feel like messing with command line. That's what I use all the time - I have a Synology NAS which has a private Git server installed. I use Git even for projects that I don't intend to ever publish, as it makes development much easier.
Ok, nice.  I went to Google tutorials on the tool.  Fine.  But still, even just the setup for TortoiseGit is super esoteric.  It's like I'm going back to the 90's with system setups designed for insiders only.  A PuTTY setup environment and public/private key generation + copying and pasting the URL address from my web browser's display of my generated repository into TortoiseGit.  Seriously, it's 2021.  Could you imagine having to go through this trouble to securely purchase anything on Amazon?

Still, it looks like I'm going to end up using TortoiseGit.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2728
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #51 on: September 01, 2021, 03:54:38 am »
Ok, nice.  I went to Google tutorials on the tool.  Fine.  But still, even just the setup for TortoiseGit is super esoteric.  It's like I'm going back to the 90's with system setups designed for insiders only.  A PuTTY setup environment and public/private key generation + copying and pasting the URL address from my web browser's display of my generated repository into TortoiseGit.  Seriously, it's 2021.  Could you imagine having to go through this trouble to securely purchase anything on Amazon?

Still, it looks like I'm going to end up using TortoiseGit.
This is a typical example of what happens when you let developers design a user interface. Even after using it for many years professionally, I still stumble upon it every once in a while.

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #52 on: September 01, 2021, 05:44:46 am »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #53 on: September 05, 2021, 04:19:36 am »
Finally!!!!  (Damn javascript bug finally bypassed...)

My GitHub repository release:
https://github.com/BrianHGinc/BrianHG-DDR3-Controller
« Last Edit: September 05, 2021, 08:06:57 am by BrianHG »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #54 on: September 09, 2021, 08:06:24 am »
If anyone has seen some cheap or free Lattice ECP5 LFE5U-25/45/85F boards with at least one 16 bit DDR3 ram chip and preferably an HDMI output, please link them here...

I am not interested in the LFE5UM as those require a Lattice Diamond License and the goal of my DDR3 controller is to offer a free solution for the most affordable Lattice components.
« Last Edit: September 09, 2021, 08:11:50 am by BrianHG »
 

Offline ali_asadzadeh

  • Super Contributor
  • ***
  • Posts: 1896
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #55 on: September 09, 2021, 11:04:48 am »
Brian play with Gowin too, they are the cheapest,  also they promised to spin out a new chip this year with 12Gb SERDES and internal DDR4 Memory!
ASiDesigner, Stands for Application specific intelligent devices
I'm a Digital Expert from 8-bits to 64-bits
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #56 on: September 09, 2021, 05:26:01 pm »
Brian play with Gowin too, they are the cheapest,  also they promised to spin out a new chip this year with 12Gb SERDES and internal DDR4 Memory!
I thought Gowin already came with a free DDR3/4 controller IP.  Not much need for mine, unlike with Lattice and Altera who charge an arm and a leg to hook up DDR3 ram.
 
The following users thanked this post: ali_asadzadeh

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14230
  • Country: fr
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #57 on: September 09, 2021, 05:28:55 pm »
If anyone has seen some cheap or free Lattice ECP5 LFE5U-25/45/85F boards with at least one 16 bit DDR3 ram chip and preferably an HDMI output, please link them here...

Free? :-DD

There are very few boards with an ECP5. You have the Lattice dev boards, but they all come with an LFE5UM AFAIR.
Then there is the ULX3S, but no DDR3, just SDRAM. Also boards I mentioned earlier (which are repurposed boards from Colorlight) that you can find on Aliexpress. I have a couple. They are fine. HDMI connector, but no DDR3, only SDRAM...

One board that does have DDR3 is the OrangeCrab: ECP5-25F, 1Gbit DDR3, no HDMI connector though, and limited IOs: https://1bitsquared.com/products/orangecrab

I don't know of anything else on the market at the moment.


 
The following users thanked this post: BrianHG

Offline ali_asadzadeh

  • Super Contributor
  • ***
  • Posts: 1896
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #58 on: September 10, 2021, 08:19:23 am »
Quote
I thought Gowin already came with a free DDR3/4 controller IP.  Not much need for mine, unlike with Lattice and Altera who charge an arm and a leg to hook up DDR3 ram
It's free but not open source. ;)
ASiDesigner, Stands for Application specific intelligent devices
I'm a Digital Expert from 8-bits to 64-bits
 

Offline dolbeau

  • Regular Contributor
  • *
  • Posts: 86
  • Country: fr
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #59 on: September 11, 2021, 06:15:23 pm »
I am not interested in the LFE5UM as those require a Lattice Diamond License and the goal of my DDR3 controller is to offer a free solution for the most affordable Lattice components.

Zero idea about Lattice licensing, but the TrellisBoard has a LFE5UM5G-85F and is usable with the Yosys opensource toolchain, including the DDR3 SDRAM. So is the ECPIX5, which (unlike the TrellisBoard) is commercially manufactured.

The FOSS toolchain might not be able to reach the performance of the Lattice toolchain, but might be an interesting option to try nonetheless.
 
The following users thanked this post: BrianHG

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14230
  • Country: fr
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #60 on: September 11, 2021, 06:47:30 pm »
I am not interested in the LFE5UM as those require a Lattice Diamond License and the goal of my DDR3 controller is to offer a free solution for the most affordable Lattice components.

Zero idea about Lattice licensing, but the TrellisBoard has a LFE5UM5G-85F and is usable with the Yosys opensource toolchain, including the DDR3 SDRAM. So is the ECPIX5, which (unlike the TrellisBoard) is commercially manufactured.

As we mentioned, Lattice Diamond requires a subscription license for the LFE5UM. You can see this here: https://www.latticesemi.com/en/Products/DesignSoftwareAndIP/FPGAandLDS/LatticeDiamond

Those are nice boards otherwise, but the above point is certainly a problem...

The FOSS toolchain might not be able to reach the performance of the Lattice toolchain, but might be an interesting option to try nonetheless.

It's interesting. But beyond performance, there are other issues to consider. AFAIK, BrianHG uses SystemVerilog for his developments, and currently, Yosys only supports a subset of SV. I would expect a number of problems trying to compile his code with Yosys. Nonetheless, you can have a look there to check what could possibly be a problem, although they mention supported features - not the ones that aren't, and there are probably many - so I guess it'll be hard to tell before actually trying: https://github.com/YosysHQ/yosys

For those interested, there's a GHDL plugin for Yosys, potentially allowing to use VHDL with the Yosys-based toolchain. I admit I have never gotten around to trying it yet, but I'd be curious.
 
The following users thanked this post: BrianHG

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14230
  • Country: fr
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #61 on: September 12, 2021, 01:19:54 am »
Just adding a quick note about Yosys here, as this post finally convinced me to take the plunge.

So, using the latest GHDL, Yosys, Prjtrellis and Nextpnr from git (warning: Yosys/Prjtrellis/Nextpnr take a fair amount of time to build from source), I was able to generate a .bit file from VHDL source files (+.lpf constraint file). The only modification I had to do was on the .lpf file, as nextpnr doesn't support port groups yet (which kind of bites, but that's not a dealbreaker). But all in all, it was smoother than I feared.

What I can say at this point is that Fmax is higher than what I get with Lattice Diamond for the design I tried, but I didn't dig enough to figure out if nextpnr is just being more optimistic, or if it indeed yields faster logic. This was a relatively simple design too, so that may not be as good for larger designs. Another point is that it all runs much faster than the commercial tools (but again, this is probably due, at least partly, to the fact that optimization is less aggressive.)

As I said, I don't know if SV support in Yosys is enough for Brian's work, but it could be worth a shot. With recent versions of GHDL and the combo with Yosys, VHDL support has become pretty good. So now, I'm curious to try this with larger designs.
« Last Edit: September 12, 2021, 01:21:43 am by SiliconWizard »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 3971
  • Country: nz
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #62 on: September 12, 2021, 03:27:58 am »
I don't know for sure, but it would not surprise me if yosys was using sufficiently better algorithms than FPGA vendors' tools to get better results in a shorter time. It's had a significant and growing amount of work put into it and, like most hardware manufacturers, FPGA vendors probably don't want to "waste" any more money on software tools than absolutely necessary.

The same as eventually happened with gcc and llvm and the Linux kernel.
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #63 on: September 12, 2021, 04:35:24 am »
Quote
It's interesting. But beyond performance, there are other issues to consider. AFAIK, BrianHG uses SystemVerilog for his developments, and currently, Yosys only supports a subset of SV. I would expect a number of problems trying to compile his code with Yosys. Nonetheless, you can have a look there to check what could possibly be a problem, although they mention supported features - not the ones that aren't, and there are probably many - so I guess it'll be hard to tell before actually trying: https://github.com/YosysHQ/yosys


My SystemVerilog coding relies on 2 dimensional arrays for IO ports and 'genvar/generate/if' to render repetitious instances of a single module, each one pointing to an incremented dimension in one of my 2D arrays.

Also, I use the attribute ' (*preserve*) logic [ x:x ] var_name [ 0:x ] ; ' to force the compiler to not simplify out that particular register bundle, force it to use logic cells to aid in speed optimization.

I also use parameters and localparams, with string support, and have a number of 'task' blocks in a few places.

If it can handle these plus 'if (x==y) $stop', 'if (x==y) $error' and '$display ("msg")' or '$warning ("msg")' during compile time, my code should work.

If 'yosys' can handle this much, then the rest of my code operates down at a simplistic regular 'Verilog' level and it should work.  (IE: I went to SystemVerilog exclusively for the 2D interconnected IO ports capability.)  The dumber the compiler, the better FMAX you can expect from my coding style.  (IE: Cranking up Altera's Quartus 'speed' optimizations to the max actually slows down my designs, optimize for Area and my FMAX usually goes through the roof...)

The problem is how many Lattice users use 'yosys' and with a fresh Win7 install and nothing else, can I get it running, or, do I require other utils and need to build it on my system?
« Last Edit: September 12, 2021, 04:42:02 am by BrianHG »
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14230
  • Country: fr
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #64 on: September 12, 2021, 09:46:04 pm »
The easiest way of using Yosys on Windows is to use MSYS2, which has pre-built packages for Yosys.
https://www.msys2.org/
https://packages.msys2.org/group/mingw-w64-x86_64-eda

For a usable Yosys toolchain, you'll need to install from MSYS2:
mingw-w64-x86_64-nextpnr
mingw-w64-x86_64-prjtrellis
mingw-w64-x86_64-yosys
(additionally, mingw-w64-x86_64-ghdl-llvm for those willing to use VHDL.)

This is done via the MSYS2 console with:
Code: [Select]
pacman -S mingw-w64-x86_64-nextpnr mingw-w64-x86_64-prjtrellis mingw-w64-x86_64-yosys
I have tried Yosys with more complex projects, and this wasn't as pain-free as with the first, simple one. Expect any Lattice generated IP, if you want to reuse it, to require some manual modifications. Also, some Lattice primitives may be unsupported or only partially supported.

I've found some oddities regarding VHDL support too (through the yosys ghdl plugin), while the same code using GHDL for simulation is fine...

So, not tried SV yet, but I would expect a number of unsupported features that may drive you nuts. Would be interesting to get feedback on this.

And if you need any generated IP, you're also in for a rough ride.

Lastly, there is no support, currently, for timing constraints for IOs in nextpnr (the P&R tool), which, for designs such as a DDR3 controller, could be a real problem.
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #65 on: September 17, 2021, 03:54:33 am »
LOL, the ~150 stock of Arrow Deca boards for the last 6 months have just all sold out in 2 days.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14230
  • Country: fr
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #66 on: September 17, 2021, 07:09:37 pm »
Quote
LOL, the ~150 stock of Arrow Deca boards for the last 6 months have just all sold out in 2 days.

You should get a commission. :-DD
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #67 on: September 18, 2021, 12:43:45 am »
Well, I guess we'll now get to see if they were just selling off old stock to get rid, hence the low price, or if more will become available...
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14230
  • Country: fr
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #68 on: September 20, 2021, 12:13:49 am »
Just a note regarding SV support in Yosys and Brian's DDR3 controller.

I tried analyzing the SV files with Yosys, and got a bunch of syntax errors. SV support looks pretty preliminary.
I'm not good enough with SV to be able to help here. But someone who is could have a go if they're ready to issue a number of tickets in the Yosys project, and be patient.
Meanwhile, I'm afraid it's absolutely not an option. Yet.
 
The following users thanked this post: BrianHG

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #69 on: September 28, 2021, 03:36:02 am »
Ok, I thought some may want to know about DDR pin planning in Quartus.

I've attached 2 screenshots of Quartus' pin planner, 1 for Cyclone_IV and 1 for Max_10.

You will see that I have chosen x8 devices even though we are using an x16 DDR3 ram chip.  I have done this since the x16 DDR3 actually has 2 groups of DQS, basically making it 2 x8 devices.

Here is the Max_10 device:


As for the Cyclone_IV (including Cyclone III), you will notice that there exists the DQS, but not a DQS_n.  My DDR3 ram controller will still work, however, it requires that you connect the DDR3's DQS_n to the adjacent DQS IO within the same x8 bank.  Preferably use an emulated differential pair as long as it is within the same IO bank, even if it isn't highlighted as being part of the same x8 group.  (Quartus' reported polarity of this differential pair doesn't matter, so long as the DQS pin is connected to the DQS on the DDR3 and the other half of the differential pair gets connected to DQS# on the DDR3, even if Quartus' pin planner says that the x8 DQS pin is the negative part of the differential pair.)


The data mask pins also need to be placed inside the same associated x8 group.

Remember to check the data sheets as some older Cyclones have higher IO performance on the top and bottom rows compared to the left and right sides.  You want to use the higher speed performance IOs.

The CK and CK_n pins should be a differential pair close to the center of everything if you are using more than 1 DDR3 ram chip, otherwise, either end or center will do.

Note that the MAX_10 devices as well as Cyclone_V do have a dedicated CK and CK_n pin for DDR3.  You will need to use these for your DDR3 CK/CK_n if you want full compatibility with Altera's DDR3 controller.

Download Arrow DECA's schematics to get a complete example of the DDR3 wiring.
« Last Edit: September 28, 2021, 04:09:51 am by BrianHG »
 

Online Wiljan

  • Regular Contributor
  • *
  • Posts: 225
  • Country: dk
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #70 on: September 30, 2021, 02:15:48 pm »
Hi Brian

I got hold of 2 x TFP401 boards from https://learn.adafruit.com/adafruit-tfp401-hdmi-slash-dvi-decoder-to-40-pin-ttl-display
I got them to be able to measure the V-sync on 2 x HDMI signals to see if the syncs were locked together (they were).

I then realized that, way back on the Cyclone III, I had a board from Bitec using the same TFP401 chip, so I looked a bit into the datasheets and schematic.

The TFP401 decodes HDMI (up to 1920x1080@60) to 3 x 8bit R,G,B plus sync and pixel clock, and it uses 3v3 outputs, the same as the DECA GPIO pins.

So I have made a flat cable connection to the DECA board on GPIO J8.

Code: [Select]
I have GND connection on a separate wire

GPIO0_D0 = B0
GPIO0_D1 = B1
GPIO0_D2 = B2
GPIO0_D3 = B3
GPIO0_D4 = B4
GPIO0_D5 = B5
GPIO0_D6 = B6
GPIO0_D7 = B7

GPIO0_D8 = G0
GPIO0_D9 = G1
GPIO0_D10 = G2
GPIO0_D11 = G3
GPIO0_D12 = G4
GPIO0_D13 = G5
GPIO0_D14 = G6
GPIO0_D15 = G7

GPIO0_D16 = R0
GPIO0_D17 = R1
GPIO0_D18 = R2
GPIO0_D19 = R3
GPIO0_D20 = R4
GPIO0_D21 = R5
GPIO0_D22 = R6
GPIO0_D23 = R7

GPIO0_D24 = GND (No Use)
GPIO0_D25 = PIXCLK
GPIO0_D26 = ACTIVE
GPIO0_D27 = HSYNC
GPIO0_D28 = VSYNC
GPIO0_D29 = DISPEN
GPIO0_D30 = NC
GPIO0_D31 = GND (No use)


So my question is if you could recommend how best to write those signals to the RAM ... I am looking at the GFX_Demo in Q15.

My idea was to also add the second TFP401 on the leftover GPIOs and then show a part of the 2 HDMI inputs at the same time on the output.
Like a DVE (no need to scale), just crop and move some area out of inputs 1 and 2 and combine them on the output.



Hope it makes sense
 

Offline Yansi

  • Super Contributor
  • ***
  • Posts: 3891
  • Country: 00
  • STM32, STM8, AVR, 8051
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #71 on: September 30, 2021, 02:59:26 pm »
Amazing work and thanks for sharing Brian!

I hope in the near future I will have the time and courage to finally start toying more seriously with the FPGAs and DDR3 will definitely come into play then.  Now back to my flat reconstruction and electronics lab rebuild.
 
The following users thanked this post: BrianHG

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #72 on: September 30, 2021, 11:52:56 pm »
@Wiljan, good luck.

  You would end up using the same strategy as my GFX demo 'BHG_vpg', but in reverse.  Make a dual clock ram port buffer with 2 lines worth of buffer memory, 32bit aRGB in, 128bit out.  Clock the input on the capture board's CLK, and generate an active even/odd line flag plus VS / HS / V_ENA in that same clock domain while sampling active pixels into the dual port ram.  Transfer a copy of those 4 status flags to the CMD_CLK clock domain and make a reversed version of my 'BrianHG_display_rmem' which will take the line buffer's 128bit side and store it into a selected ram address.
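
For readers who want to see the 32bit in / 128bit out idea in software terms, here is a minimal Python model. It is not part of BrianHG's code: the function names are hypothetical, and placing the first pixel in the lowest 32 bits is an assumption made for illustration only, so check the byte order the controller actually uses.

```python
# Software model of one 128-bit word on the output side of the line buffer.
# Assumption: pixel 0 occupies bits [31:0], pixel 3 occupies bits [127:96].

def pack_pixels_128(pixels):
    """Pack exactly four 32-bit aRGB pixels into one 128-bit integer."""
    if len(pixels) != 4:
        raise ValueError("expected exactly 4 pixels per 128-bit word")
    word = 0
    for index, pixel in enumerate(pixels):
        word |= (pixel & 0xFFFFFFFF) << (32 * index)
    return word


def line_to_words(line_pixels):
    """Split one video line into 128-bit words, zero-padding the last word."""
    padded = list(line_pixels) + [0] * (-len(line_pixels) % 4)
    return [pack_pixels_128(padded[i:i + 4]) for i in range(0, len(padded), 4)]
```

With this model an 800 pixel line becomes 200 words, which is the number of 128bit writes the CMD_CLK side would have to issue per line.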

  Note that with work, you can optimize all the code to run on 24bit instead of 32bit graphics if you like, or use 16bit RGB, or even use 4:2:2 16bit YUV graphics if that is enough color for you.  This gives you the ability to access, process and mix 4x1080p screens/display buffers simultaneously in real time (16 bit color, ram at 400MHz) with still some free access time for a CPU core.
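
As a rough sanity check of the 4x1080p claim, the arithmetic can be sketched in Python. This is only a back-of-envelope estimate: the 60 Hz frame rate, the single x16 DDR3 chip, and the absence of any refresh or bank-switching overhead are all assumptions, and the function names are hypothetical.

```python
# Back-of-envelope DDR3 bandwidth budget.
# Assumptions: 60 Hz frames, x16 DDR3 chip, no refresh or turnaround overhead.

def ddr3_peak_bytes_per_sec(clock_hz, bus_bits):
    """Theoretical peak: two transfers per clock (double data rate)."""
    return clock_hz * 2 * bus_bits // 8


def stream_bytes_per_sec(width, height, fps, bits_per_pixel):
    """Bytes per second for the active pixels of one video stream."""
    return width * height * fps * bits_per_pixel // 8


peak = ddr3_peak_bytes_per_sec(400_000_000, 16)
one_stream = stream_bytes_per_sec(1920, 1080, 60, 16)
four_streams = 4 * one_stream

print(f"DDR3 peak        : {peak / 1e6:.0f} MB/s")
print(f"one 1080p stream : {one_stream / 1e6:.1f} MB/s")
print(f"four streams     : {four_streams / 1e6:.1f} MB/s "
      f"({100 * four_streams / peak:.1f}% of peak)")
```

Under those assumptions four 16 bit 1080p streams use roughly 62% of the theoretical peak, which is consistent with some access time being left over.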

  The 128bit DDR3 access port is the only way to ensure you saturate your read and write access to the DDR3 memory, making a non-stop sequential burst.  You may also organize the available DDR3 ram so that you place a different graphics buffer or video frame in a different DDR3 bank location to optimize simultaneous frame access.

Yes, just re-hack up my 'BHG_vpg' and 'BrianHG_display_rmem' -> 'BHG_vpg_IN' and 'BrianHG_display_Wmem' as they already have 90% of what you need.

On the input side, you may 'crop' the DVE from the source with x-on and x-off start and stop coordinates.  You would need to adjust the DDR3 write width in your 'BrianHG_display_Wmem', or, just do everything in the  'BrianHG_display_Wmem' with the knowledge that you will start and stop every 4 pixels on the x axis for 32bit color.  Y axis will be adjustable line by line.

*Note that I did not include methods of figuring out what your source resolution is or interlace support.
« Last Edit: October 01, 2021, 03:17:52 am by BrianHG »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #73 on: October 01, 2021, 05:35:16 am »
@Wiljan, one mistake:

Quote
just do everything in the  'BrianHG_display_Wmem' with the knowledge that you will start and stop every 4 pixels on the x axis for 32bit color.

My mistake, I forgot that you have the 'write mask' capability which will allow pixel precision writes down to 8 bit pixels while still using my multiport in 128bit mode to retain full sequential burst speed.  It's only the question of 128bit/4 pixel horizontal alignment.  There are ways to solve this as well at the CMD_CLK stage with a 4 position 224 bit to 128 bit shift register.
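
To illustrate how a write mask gives pixel precision inside a 128bit word, here is a small Python sketch. The function name, the convention that a set bit means "write this byte", and the half-open crop range are all assumptions for illustration; the controller's real mask polarity and port names should be taken from its source.

```python
# Byte-enable mask for one 128-bit (16-byte) write.
# Assumptions: bit n set = byte n is written; x_on inclusive, x_off exclusive.

BYTES_PER_WORD = 16


def write_mask(word_index, x_on, x_off, bytes_per_pixel=4):
    """Return the 16-bit byte mask for the given word of a cropped line."""
    pixels_per_word = BYTES_PER_WORD // bytes_per_pixel
    first_pixel = word_index * pixels_per_word
    pixel_bytes = (1 << bytes_per_pixel) - 1  # all bytes of one pixel
    mask = 0
    for p in range(pixels_per_word):
        if x_on <= first_pixel + p < x_off:
            mask |= pixel_bytes << (p * bytes_per_pixel)
    return mask


print(hex(write_mask(0, 1, 3)))  # pixels 1 and 2 of the first word
```

A crop that starts at pixel 1 and stops before pixel 3 only enables bytes 4 to 11 of the first word, so the neighbouring pixels in memory are left untouched.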

It is better to try to do any cropping and computation in the CMD_CLK domain while the sampler input to the dual port line buffer memory should have absolute minimal logic.
« Last Edit: October 01, 2021, 05:54:32 am by BrianHG »
 

Online Wiljan

  • Regular Contributor
  • *
  • Posts: 225
  • Country: dk
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #74 on: October 01, 2021, 05:53:26 pm »
Thank you Brian, I will for sure reuse most of your great code.
I will let you know when I  have some progress.
Thx 👍
 

Online Wiljan

  • Regular Contributor
  • *
  • Posts: 225
  • Country: dk
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #75 on: October 04, 2021, 08:59:11 am »
@BrianHG

I have been trying over the weekend to make some simple tweaks, just to see that I actually get something into the FPGA over HDMI.

I did remove the RS232 debug part

Changed the GPIO's to only input

Control signals from the HDMI board goes to LED

To be sure frequency is not a problem, I use an 800x600@60 HDMI input (instead of 1920x1080@60).
I can measure the PIXCLK, H and V on the LEDs with a scope, so I for sure have some input to the board.

(no FIFO yet)

I then tried to just write to the DDR based on the PIXCLK and expected to see some garbage on the output... none, just the internal test signals.


I tried to write fixed 128bit FF's to the DDR based on the CMD_Clk, still no garbage on the output, just test signals.

I have removed the scroll and fixed it to 0x0000, 0x0000 to be sure I see the top left (0,0) on the output.

I tried to reduce the DDR to only use 1 read port .... then the test signal goes red, so I must have missed changing some signals, so I am back on 2xR and 2xW.

So to be honest I have to say I'm a bit stuck right now and could use some inspiration. Is there any chance you could make the skeleton to feed in the external HDMI?

I will post some images how I have connected the hardware if other could be interested to have external HDMI input to the DECA board

Thank you
Wiljan


 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #76 on: October 04, 2021, 09:04:53 am »
If you do not have a scope or logic analyzer and you want to check your input signals, or any other signals in the system, just set up Quartus SignalTap.  It will give you a multichannel real-time logic analyzer through the JTAG connection right into Quartus.

You should be able to scope your source video clocked data, HS, DE and some of the data bus for testing as well as locking onto HS and VS if you like.  (This includes all DDR3 internal bus connections as well as a few other goodies...)

As for video output through HDMI, you will need to set its PLL with valid settings, plus I recommend keeping to 720p or 480p standards unless you change the HDMI transmitter from HDMI mode to DVI mode.

Keep the DDR3 at 400MHz or 300MHz.  300MHz may take less time to compile.
« Last Edit: October 04, 2021, 09:10:37 am by BrianHG »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #77 on: October 04, 2021, 09:14:13 am »
Quote
So to be honest I have to say I'm a bit stuck right now and could use some inspiration. Is there any chance you could make the skeleton to feed in the external HDMI?

Take a look at the 'BrianHG_DDR3_DECA_Show_1080p' project.  That project just displays ram at 1920x1080.
Like I said, I made this stuff for everyone to figure out.
 

Online Wiljan

  • Regular Contributor
  • *
  • Posts: 225
  • Country: dk
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #78 on: October 04, 2021, 12:49:59 pm »
I do have a 4ch Keysight scope and I do have the HDMI input signals PIXCLK, HS, VS and R,G,B data in the FPGA, so that part is fine.  The HDMI output at 1920x1080@60 is fine as well.

I have never played with SignalTap, but it is for sure something I will look into.

I have just tried to hook up the 'BrianHG_DDR3_DECA_Show_1080p' and the rs232 debugger to the DECA board and that works as well I can change  pixels and save / load images from PC over the rs232

Have attached few images of the hardware

DigiKey part numbers
PART: 1528-1452-ND MFG : Adafruit Industries LLC / 2219 DESC: TFP401 HDMI/DVI DECODE 40PIN TTL
PART: 1528-2243-ND MFG : Adafruit Industries LLC / 2098 DESC: 40-PIN FPC EXTENSION BRD W/CABLE
PART: 1528-4905-ND MFG : Adafruit Industries LLC / 4905 DESC: 40-PIN FPC TO STRAIGHT 2X20 IDC

My goal is to combine 2x HDMI input (cutout / cropped) to 1 HDMI output signal  all 1920x1080

But I will be happy just to see 1 signal come through  :scared:

« Last Edit: October 04, 2021, 12:56:36 pm by Wiljan »
 
The following users thanked this post: BrianHG

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #79 on: October 04, 2021, 01:08:42 pm »
It's not too difficult.  You will make it.  Begin with just getting 1 picture onscreen.  Just take a look at my module which draws graphics.  It's only if you have multiple 1080p signals simultaneously that writing 32bit pixels will be too slow on the 100MHz bus.  This is why I mentioned writing 128bits at a time, which means 32bit pixels are written at 4x speed.

Note that my ellipse drawing engine has an X/Y coordinate to address generator which is a little complex.  You do not need to go this far.  You only need to reset the X&Y position, and advance the Y coordinate by a fixed amount once every HS.  The address counter increments for every pixel written to create the X axis.
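
The simple address scheme described above can be modelled in a few lines of Python. This is only a behavioural sketch with hypothetical names: it assumes the reset happens on VS, that HS is seen once at the end of each active line, and that the fixed Y step is the line stride in bytes.

```python
# Behavioural model of a linear write-address counter for captured video.
# Assumptions: vsync() resets to the frame base, hsync() is called once per
# active line, and line_stride_bytes is the fixed amount added per line.

class LineAddressGenerator:
    def __init__(self, base_addr, line_stride_bytes, bytes_per_pixel=4):
        self.base_addr = base_addr
        self.line_stride_bytes = line_stride_bytes
        self.bytes_per_pixel = bytes_per_pixel
        self.vsync()

    def vsync(self):
        """Start of frame: reset both the Y (line) and X (pixel) position."""
        self.line_start = self.base_addr
        self.addr = self.base_addr

    def hsync(self):
        """End of line: step Y by the fixed stride and restart X."""
        self.line_start += self.line_stride_bytes
        self.addr = self.line_start

    def pixel(self):
        """Return the address for this pixel, then advance along X."""
        current = self.addr
        self.addr += self.bytes_per_pixel
        return current


gen = LineAddressGenerator(base_addr=0x1000, line_stride_bytes=800 * 4)
print(hex(gen.pixel()), hex(gen.pixel()))  # first two pixels of line 0
gen.hsync()
print(hex(gen.pixel()))                    # first pixel of line 1
```

Using a stride larger than the captured line width is what would let a smaller source sit inside a wider display buffer.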

To begin, try a 720p or 480p source image to sample and copy my 32bit pixel write mode in the ellipse drawing engine.

It's fine to post results / examples and things you created with my DDR3 controller here.

If you are looking for in-depth help on coding techniques for sampling video, make a separate new thread as this thread should stick with my DDR3 controller issues or results/success stories.


BTW, with a Cyclone III of similar size to the DECA's MAX10 and DDR2, I did make a complete 2 video in, 1 video out scaler with PIP, each window crop-able and zoom in and out with test patterns, bi-linear filtering and picture enhancement and color processing, controlled through ethernet.  Though, the DDR2 bus width was a 128bit wide ram module, not a single 16bit wide chip.  I guess at 500MHz with 16bit color instead of 32bit color, you could achieve the same with the DECA board.
« Last Edit: October 04, 2021, 01:24:10 pm by BrianHG »
 

Online Wiljan

  • Regular Contributor
  • *
  • Posts: 225
  • Country: dk
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #80 on: October 09, 2021, 05:34:57 pm »
Some progress, even though I'm out of time at the moment.
1 x 800x600@60 is fed in.

 There are some small errors here and there, but at least there is an image on the output  :D
 
The following users thanked this post: BrianHG

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #81 on: October 09, 2021, 11:48:41 pm »
Quote
Some progress, even though I'm out of time at the moment.
1 x 800x600@60 is fed in.

 There are some small errors here and there, but at least there is an image on the output  :D

Wow, a 90 degree rotate.
The nasty non-sequential access preventing clean long bursts must be a killer unless you have worked around that.  I know a number of dedicated ways to work around this and get full performance, but they are advanced techniques.  For a first timer, even with small errors, that is still a great start.

Is that real-time?
Double buffered?
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #82 on: October 10, 2021, 01:02:47 pm »
Feature update:

******************************************
******** Finally, V1.00 release here: ***********
******************************************
https://www.eevblog.com/forum/fpga/brianhg_ddr3_controller-open-source-ddr3-controller/

Things to do:

a)  I will be contacting Intel's technical support about Cyclone V's poor 60% speed FMAX performance for my 1 multiport section in my design as seen in the above screenshots with the red arrow.  I'll see if something can be done.

b)  As described in my v0.95 notes, I will look into designing my simpler pyramid stack-able 2:1 multiport module aimed to achieve an FMAX of at least 200MHz allowing multiport running at Half rate interface controller speed, but with a loss of a few smart advanced features.

c)  I will download and install the latest Lattice Diamond and see if I can adapt and get my controller to compile and simulate there.  The LFE5U-45F/LFE5U-85F at 45kgate & 85kgate are just such a price bargain at $16 and $36 each respectively, and if my DDR3 controller runs fast there, it is the next route to take.

I will be working on feature 'b)' this week.  I will be targeting 400MHz, not 200MHz.  This will make the module bare bones simple, but, for example, you should be able to run 32bit reads and writes at full 400MHz speed saturating the DECA's 800MHz 16bit DDR3, or you should be able to run the port at 200MHz, 64 bit and still saturate DECA's DDR3.  The new multiport head end should also get Altera's Cyclone V running up to speed as my DDR3 phy is already fast enough, it was just the multiport module which was the bottleneck.
 

Online Wiljan

  • Regular Contributor
  • *
  • Posts: 225
  • Country: dk
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #83 on: October 11, 2021, 06:57:11 am »
Quote
Wow, a 90 degree rotate.

Is that real-time?
Double buffered?

No double buffer, I write directly to 2 x 32bit ports where I have swapped the x1, y1 on the rotate one.
Real time, maybe... had no time to test with moving video yet, will not be home until next week
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #84 on: October 13, 2021, 09:41:32 am »
DDR3 V1.5 engineering update:

     New high FMAX speed multiport front end 'MUX' called BrianHG_DDR3_COMMANDER_4x1.sv.  Unlike the earlier commander, each port input is a read/write channel combined.  Each channel input is identical to my core DDR3 controller's 'BrianHG_DDR3_PHY_SEQ.sv' SEQ_*** inputs.  This will allow you to use additional BrianHG_DDR3_COMMANDER_4x1.sv controllers to drive another one down in the chain, making extremely large port counts possible if needed.  2:1 mode will offer the greatest possible FMAX while 4:1 will still offer a good FMAX, but allow large port counts with fewer modules.   My 'USE_TOGGLE_INPUT' and 'USE_TOGGLE_OUTPUT' parameters will allow you to clock each module in a different clock domain.  For example, connecting right to the 'BrianHG_DDR3_PHY_SEQ.sv', you may use a 2:1 module running at 400MHz.  On that first layer module, on port (A) you may use another 2:1 running at 400MHz while on port (B) you may run another MUX in 4:1 mode at 200MHz giving you a total of 2x400MHz read/write ports and 4x200MHz read/write ports.  *Note that crossing clock domain boundaries will only compile with good FMAX results when using PLL clock frequencies in powers of 2.


Code: [Select]
// Features:
//
// - Input and output ports identical to the BrianHG_DDR3_PHY_SEQ's interface with the optional USE_TOGGLE_CONTROLS
//
// - 2 to 4 Read/Write ports in, 1 port out with user set burst length limiter with read req vector/destination pointer.
// - Designed for high FMAX speed.
// - Designed to be pyramid stacked offering maximum speed 2 R/W ports with 1 COMMANDER_4x1 module, 4 ports using 3 modules,
//   8 ports using 7 modules, 16 ports using 15 modules, or, medium speed 4:1 offering 4 ports with 1 module, or 16 ports
//   using 5 modules...  3:1 mode offers a middle ground of speed VS density VS chosen FPGA speed grade.
//
// - 2 command input FIFO on each port.
// - 16 or 32 stacked read commands for DDR3 read data delay.
// - Separate cached read and write BL8 block.
// - Adjustable write data cache dump timeout.

     Note that now when assessing/configuring a port priority and maximum sequential burst length, unlike the original 16 port commander, you will now need to assess each set of priorities going down through the chain when you stack multiple MUX commanders together.
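
The module counts quoted in the feature list (3 modules for 4 ports in 2:1 mode, 5 modules for 16 ports in 4:1 mode, and so on) follow from simple tree arithmetic. The Python sketch below reproduces them; the function names are hypothetical and it only counts modules and layers, it says nothing about FMAX or priorities.

```python
# Counting COMMANDER modules in a pyramid stack.
# Each module merges up to `fan_in` ports into 1, so it removes at most
# (fan_in - 1) ports from the total that still has to be merged.

def modules_needed(ports, fan_in):
    """Minimum number of fan_in:1 modules to merge `ports` into one."""
    if fan_in not in (2, 3, 4):
        raise ValueError("the post describes 2:1, 3:1 and 4:1 modes only")
    if ports < 1:
        raise ValueError("need at least one port")
    return -(-(ports - 1) // (fan_in - 1))  # ceiling division


def layers_needed(ports, fan_in):
    """Number of stacked layers between a user port and the PHY sequencer."""
    layers, capacity = 0, 1
    while capacity < ports:
        capacity *= fan_in
        layers += 1
    return layers


for fan_in, ports in [(2, 2), (2, 4), (2, 8), (2, 16), (4, 4), (4, 16)]:
    print(f"{ports:2d} ports in {fan_in}:1 mode -> "
          f"{modules_needed(ports, fan_in)} modules, "
          f"{layers_needed(ports, fan_in)} layers")
```

The layer count matters for the priority note above, since each layer adds another set of priority and burst length settings to assess.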
« Last Edit: October 14, 2021, 03:08:14 pm by BrianHG »
 

Online Wiljan

  • Regular Contributor
  • *
  • Posts: 225
  • Country: dk
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #85 on: October 23, 2021, 11:00:00 am »
Hi Brian I'm back and I have made some changes.

SignalTap is very useful, thank you for mentioning it  :)

Tested with moving video, and the rotate was nowhere near real time ... right now I'm not interested in rotate, just 2 straight inputs.
But for sure I would like to scale and rotate later.

I have removed the flat cable and added single wires (same length) to fit the GPIOs.
I have 2 HDMI inputs running 1920x1080@60, not synced, in parallel from 2 BrightSign players.

I place the 2 inputs side by side in the 4K buffer and can scroll to see the left / right transition, and it is pretty OK.
I had some PSU issues and have split the load across more PSUs to avoid interference.

I do have some noise here and there in the picture ... you can see in the black hole on the video
Not sure why, but I suspect the "wires" and potential wrong terminations

I would like to write to the DDR as 128 bit instead of 32 bit to lower the traffic to the DDR

Attached is the Quartus project

Link for video
 
The following users thanked this post: BrianHG

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #86 on: October 23, 2021, 02:40:36 pm »
     Years and years ago, I also transferred 1080p parallel through flex cables.  You are already at the limit of what can be transmitted perfectly clean, not counting your hand wired jumpers.  I usually had to invert the incoming clock depending on source resolution to help avoid corrupt pixel captures.

     I'm almost finished my new multiport.  It is virtually compatible with the old one except each port is a read and write port, the max is 4:1 per multiport unit, but, you may have a 4:1 multiport's output now feed an input of another 4:1 down in the chain offering 16 ports with a 2 layer pyramid stack.  IE 4 units in 4:1 mode, whose 4 outputs drive another 4:1's inputs at the top of the chain while that one feeds the DDR3_PHY controller module.  The advantage here is you now can run the multiport's CMD_CLK in half speed mode up to 250MHz, all 16 ports, instead of the current limit of ~100MHz once you pass 4 IO ports.

   Running in half speed mode instead of quarter means that to completely fill the DDR3 bandwidth, you only need 64bit bus at 200MHz instead of 128bit bus at 100MHz.  With the multiport in 2:1 mode, IE: 400MHz CMD_CLK, you can achieve full DDR3 bandwidth with a 32bit bus, but, the on-FPGA M9K blockram's speed limit is 330MHz, so, no matter what you do, you are stuck with 200MHz mode, or 250MHz if you overclock the FPGA to 500MHz DDR3.
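
The width/clock trade-off in the paragraph above is plain arithmetic, sketched here in Python. The function name is hypothetical, the x16 DDR3 bus is taken from the DECA board discussed in this thread, and the 330MHz blockram figure is the one quoted in the post, not an independently checked datasheet value.

```python
# Port width needed on the CMD_CLK side to match the DDR3 peak data rate.
# Assumptions: x16 DDR3 bus, peak rate only, M9K limit as quoted in the post.

M9K_LIMIT_HZ = 330_000_000


def required_port_bits(ddr3_clock_hz, ddr3_bus_bits, cmd_clk_hz):
    """Bits per CMD_CLK cycle needed to carry 2 transfers per DDR3 clock."""
    bits_per_second = ddr3_clock_hz * 2 * ddr3_bus_bits
    return -(-bits_per_second // cmd_clk_hz)  # ceiling division


for ddr3_mhz, cmd_mhz in [(400, 100), (400, 200), (400, 400), (500, 250)]:
    bits = required_port_bits(ddr3_mhz * 1_000_000, 16, cmd_mhz * 1_000_000)
    ok = cmd_mhz * 1_000_000 <= M9K_LIMIT_HZ
    print(f"DDR3 {ddr3_mhz}MHz, CMD_CLK {cmd_mhz}MHz -> {bits}bit port, "
          f"{'within' if ok else 'above'} the quoted M9K limit")
```

This reproduces the 128bit at 100MHz, 64bit at 200MHz and 32bit at 400MHz figures, and shows why the 400MHz option runs into the blockram speed limit.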


Because of your wiring, remember to at least single, if not double, D-flipflop all the inputs from your HDMI receiver boards before you feed any logic, and use the attribute (*useioff=1*) on those inputs, example:
Code: [Select]
(* useioff = 1 *) input  logic         Z80_CLK,           // Z80 clock signal (8 MHz)
(* useioff = 1 *) input  logic [21:0]  Z80_ADDR,          // Z80 22-bit address bus

Also, if your CLK inputs are not going to the FPGA's dedicated CLK input pins, try to keep all the data inputs in the same bank as the CLK signal which feeds them.  I know this can be a hassle with the DECA being pre-wired.  If your HDMI decoders have a DDR mode, this may help by keeping 15 inputs all clocked inside 1 IO bank instead of 27 inputs with one clock.
« Last Edit: October 23, 2021, 02:55:44 pm by BrianHG »
 

Online mfro

  • Regular Contributor
  • *
  • Posts: 207
  • Country: de
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #87 on: October 30, 2021, 05:46:41 pm »
Played half the day with your DDR3 controller on a DECA board and just wanted to say thank you for that absolutely brilliant piece of work! :-+
Beethoven wrote his first symphony in C.
 
The following users thanked this post: BrianHG

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #88 on: November 01, 2021, 06:51:36 pm »
Preview Demo .sof programming files of DECA BrianHG_DDR3_Controller V1.5 for Arrow DECA eval board overclocked to 500MHz in Half-rate mode.
(Actual full v1.5 project files coming in 2 days.)

  >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D
  >:D  500MHz/1GTPS! with 250MHz multiport interface.  >:D
  >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D

Just open your JTAG programmer and add one of the following 3 files:
1. 'BrianHG_DDR3_DECA_500MHz_DDR3_v1.0_QR_GFX_1080p_v3.sof'
        -> DDR3_V1.0, 500MHz DDR_CK, Quarter Rate 125MHz Multiport & Ellipse Generator.

2. 'BrianHG_DDR3_DECA_400MHz_DDR3_V1.5_HR_GFX_1080p_v3.sof'
        -> DDR3_V1.5, 400MHz DDR_CK, Half Rate 200MHz Multiport & Ellipse Generator.

3. 'BrianHG_DDR3_DECA_500MHz_DDR3_V1.5_HR_GFX_1080p_NOELLIPSE.sof'
        -> DDR3_V1.5, 500MHz DDR_CK, Half Rate 250MHz Multiport & Random noise/Binary counter.

Note that the Ellipse generator function has a <200MHz bottleneck, so with demo programming file 3, only pressing buttons 0 or 1 will illustrate the DDR3 32 bit color 250MHz fill speed with random noise or the binary counter pattern.

Check the 'Program/Configure' box and click 'Start' to program.
The DECA's HDMI should output a 1080p image.

IMPORTANT NOTE:
If the picture is still or scrolling noise, just press buttons 0 or 1, or flip 'Switch 0' to enable drawing ellipses.  You just powered up the demo in frozen picture mode and you are looking at the powered up random blank memory.


Switch 0 = Enable/Disable drawing of ellipses.
Switch 1 = Enable/Disable screen scrolling.
Button 0 = Draw data from random noise generator.
Button 1 = Draw color image data from a binary counter.

https://github.com/BrianHGinc/BrianHG-DDR3-Controller
« Last Edit: November 02, 2021, 01:56:17 pm by BrianHG »
 

Offline promach

  • Frequent Contributor
  • **
  • Posts: 875
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #89 on: November 10, 2021, 05:03:23 pm »
What does it mean by 500MHz DDR_CK, Half Rate 250MHz Multiport ?
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #90 on: November 10, 2021, 06:25:27 pm »
This means my controller runs at 500MHz, or basically the PHY driving the DDR3 command pins is running at 500MHz while the user interface which has 16 read/write ports is running at 250MHz.  This is actually overclocking the FPGA as some timings come out in the red, ie negative slack.  My controller can achieve a true 100% positive slack at 400MHz PHY with the user interface running at 200MHz.  The older V1.0 could only achieve a user interface of around 100MHz configured to ~3 read + 2 write user ports with the DDR3 PHY running at 400MHz.

My v1.5 constructs a tree / branch stacked join + fork command section allowing a user configured full 16 read/write ports running the full 200MHz with 400MHz DDR3 PHY controller with enough breathing room to compile an unofficial but functional 250MHz 16 port user interface with 500MHz PHY.

Half-rate means my controller will accept a new command once every second DDR_CK clock.  Quarter-rate means my controller will accept a new command once every 4 DDR_CK clocks.  It is the user interface clock frequency.

My DDR3 v1.5 multiport section now generates a smarter version of Xilinx's illustration shown here on page 18 figure 2.2:
https://www.xilinx.com/support/documentation/user_guides/ug388.pdf
The difference is you just set the total port parameter and my controller is programmed to render that 'branched' system, but all at 128 bit with smart caching of bursts allowing a superior FMAX to my DDR3 v1.0 which had all the ports at the first branch level where they show configuration 5.  You may also configure the width of each branch if you do not require a top FMAX, but want less clock join points between your RW port and the DDR3 phy.

EXAMPLE:
Code: [Select]
// ************************************************************************************************************************************
// ****************  BrianHG_DDR3_COMMANDER_2x1 configuration parameter settings.
parameter int        PORT_TOTAL              = 2,                // Set the total number of DDR3 controller write ports, 1 to 4 max.
parameter int        PORT_MLAYER_WIDTH [0:3] = '{2,2,2,2},       // Use 2 through 16.  This sets the width of each MUX join from the top PORT
                                                                 // inputs down to the final SEQ output.  2 offers the greatest possible FMAX while
                                                                 // making the first layer width = to PORT_TOTAL will minimize MUX layers to 1,
                                                                 // but with a large number of ports, FMAX may take a beating.
// ************************************************************************************************************************************
// PORT_MLAYER_WIDTH illustration
// ************************************************************************************************************************************
//  PORT_TOTAL = 16
//  PORT_MLAYER_WIDTH [0:3]  = {4,4,x,x}
//
// (PORT_MLAYER_WIDTH[0]=4)    (PORT_MLAYER_WIDTH[1]=4)     (PORT_MLAYER_WIDTH[2]=N/A) (not used)          (PORT_MLAYER_WIDTH[3]=N/A) (not used)
//                                                          These layers are not used since we already
//  PORT_xxxx[ 0] ----------\                               reached one single port to drive the DDR3 SEQ.
//  PORT_xxxx[ 1] -----------==== ML10_xxxx[0] --------\
//  PORT_xxxx[ 2] ----------/                           \
//  PORT_xxxx[ 3] ---------/                             \
//                                                        \
//  PORT_xxxx[ 4] ----------\                              \
//  PORT_xxxx[ 5] -----------==== ML10_xxxx[1] -------------==== SEQ_xxxx wires to DDR3_PHY controller.
//  PORT_xxxx[ 6] ----------/                              /
//  PORT_xxxx[ 7] ---------/                              /
//                                                       /
//  PORT_xxxx[ 8] ----------\                           /
//  PORT_xxxx[ 9] -----------==== ML10_xxxx[2] --------/
//  PORT_xxxx[10] ----------/                         /
//  PORT_xxxx[11] ---------/                         /
//                                                  /
//  PORT_xxxx[12] ----------\                      /
//  PORT_xxxx[13] -----------==== ML10_xxxx[3] ---/
//  PORT_xxxx[14] ----------/
//  PORT_xxxx[15] ---------/
//
//
//  PORT_TOTAL = 16
//  PORT_MLAYER_WIDTH [0:3]  = {3,3,3,x}
//  This will offer a better FMAX compared to {4,4,x,x}, but the final DDR3 SEQ command has 1 additional clock cycle pipe delay.
//
// (PORT_MLAYER_WIDTH[0]=3)    (PORT_MLAYER_WIDTH[1]=3)    (PORT_MLAYER_WIDTH[2]=3)                   (PORT_MLAYER_WIDTH[3]=N/A)
//                                                         It would make no difference if             (not used, we made it down to 1 port)
//                                                         this layer width was set to [2].
//  PORT_xxxx[ 0] ----------\
//  PORT_xxxx[ 1] -----------=== ML10_xxxx[0] -------\
//  PORT_xxxx[ 2] ----------/                         \
//                                                     \
//  PORT_xxxx[ 3] ----------\                           \
//  PORT_xxxx[ 4] -----------=== ML10_xxxx[1] -----------==== ML20_xxxx[0] ---\
//  PORT_xxxx[ 5] ----------/                           /                      \
//                                                     /                        \
//  PORT_xxxx[ 6] ----------\                         /                          \
//  PORT_xxxx[ 7] -----------=== ML10_xxxx[2] -------/                            \
//  PORT_xxxx[ 8] ----------/                                                      \
//                                                                                  \
//  PORT_xxxx[ 9] ----------\                                                        \
//  PORT_xxxx[10] -----------=== ML11_xxxx[0] -------\                                \
//  PORT_xxxx[11] ----------/                         \                                \
//                                                     \                                \
//  PORT_xxxx[12] ----------\                           \                                \
//  PORT_xxxx[13] -----------=== ML11_xxxx[1] -----------==== ML20_xxxx[1] ---------------====  SEQ_xxxx wires to DDR3_PHY controller.
//  PORT_xxxx[14] ----------/                           /                                /
//                                                     /                                /
//  PORT_xxxx[15] ----------\                         /                                /
//         0=[16] -----------=== ML11_xxxx[2] -------/                                /
//         0=[17] ----------/                                                        /
//                                                                                  /
//                                                                                 /
//                                                                                /
//                                                       0 = ML20_xxxx[2] -------/
//
// ************************************************************************************************************************************
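
As a sanity check on the illustration above, here is a small Python sketch that counts how many MUX join nodes each layer ends up with for a given PORT_TOTAL and PORT_MLAYER_WIDTH.  It is only a model of how the diagrams read, not code from the controller, and the function name `mux_layers` is invented for this example.

```python
import math


def mux_layers(port_total, layer_widths):
    """Return the number of MUX join nodes in each layer until one output remains."""
    nodes = []
    remaining = port_total
    for width in layer_widths:
        if remaining <= 1:
            break  # already down to the single SEQ output, later layers are unused
        remaining = math.ceil(remaining / width)
        nodes.append(remaining)
    if remaining != 1:
        raise ValueError("layer widths do not reduce the ports to a single SEQ output")
    return nodes


print(mux_layers(16, [4, 4, 2, 2]))  # [4, 1]
print(mux_layers(16, [3, 3, 3, 2]))  # [6, 2, 1]
```

The length of the returned list is the number of MUX layers a command passes through, so {3,3,3,x} has one more layer than {4,4,x,x}, which matches the 1 additional clock cycle of pipe delay mentioned in the comments.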
« Last Edit: November 10, 2021, 06:30:03 pm by BrianHG »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #91 on: November 10, 2021, 07:08:57 pm »
I wonder if I could achieve a 'Full-rate' controller at 300MHz.  A user interface at 300MHz reading/writing 32 bits of data can generate perfect consecutive bursts with ~98% DDR3 data bus saturation on a 16bit ram, good for 300MHz 32 bit cpus.
 

Offline promach

  • Frequent Contributor
  • **
  • Posts: 875
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #92 on: November 11, 2021, 01:23:14 am »
What do you exactly mean by Half-rate means my controller will accept a new command once every second DDR_CK clock. ?
« Last Edit: November 11, 2021, 03:24:30 am by promach »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #93 on: November 11, 2021, 01:40:09 am »
Quote
What do you exactly mean by Half-rate means my controller will accept a new command once every second DDR_CK clock. ?
Yes.

Half-rate means 2 things.  When the DDR3 is being run at 400MHz, (a) the processor which accepts user commands and (b) spits out DDR3 commands is running at 200MHz.  This part of my controller has always operated at half-rate.  The controller provides a busy signal if there are mandatory command delays required by the DDR3 and its input buffer memory has exceeded its stack.  Only my user multiport interface has been running in quarter-rate mode due to its multiplexer complexity, and I am currently enhancing performance there.

I only have a tiny pin driving command timer running at the DDR3 400MHz which receives the stream of generated commands from the above 200MHz controller called 'BrianHG_DDR3_CMD_SEQUENCER.sv', simulated by the 'BrianHG_DDR3_CMD_SEQUENCER_tb.sv'.
« Last Edit: November 11, 2021, 01:58:23 am by BrianHG »
 

Offline promach

  • Frequent Contributor
  • **
  • Posts: 875
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #94 on: November 11, 2021, 02:36:17 am »
Quote
When the DDR3 is being run at 400MHz, (a) the processor which accepts user commands and (b) spits out DDR3 commands is running at 200MHz.  This part of my controller has always operated at half-rate.

However the problem with using half-rate on the commands will result in DDR3 manufacturer timing violations.  For example, given that your DRAM is accepting an incoming 400MHz ck signal, but the DDR3 commands is arriving to the DRAM at a rate of only 200MHz.  This will cause issue such as tMRD violation where the DRAM is getting 2 consecutive MRS commands.

Please correct me if wrong.
« Last Edit: November 11, 2021, 03:24:02 am by promach »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #95 on: November 11, 2021, 02:50:04 am »
Quote
When the DDR3 is being run at 400MHz, (a) the processor which accepts user commands and (b) spits out DDR3 commands is running at 200MHz.  This part of my controller has always operated at half-rate.

However the problem with using half-rate on the commands will result in DDR3 manufacturer timing violations.  For example, given that your DRAM is accepting an incoming 400MHz ck signal, but the DDR3 commands is arriving to the DRAM at a rate of only 200MHz.  This will cause issue such as tMRD violation where the DRAM is getting 2 consecutive MRS commands.

Please correct me if wrong.
No.  I have a command output section running at the full 400MHz.  That section has a 2 word fifo which takes in the stream of commands generated at 200MHz by the 'BrianHG_DDR3_CMD_SEQUENCER.sv' processor and outputs 1 DDR_CK wide commands at 400MHz.  Before sending out each received command in that 2 word 200MHz in, 400MHz out FIFO, it uses a look-up table to see how many clock cycles have passed since any previously sent commands to know when it may be permitted to insert the next new command.
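
To make the look-up table idea a bit more concrete, here is a simplified Python model.  It is not taken from the controller source: the gap values in `MIN_GAP` are made-up placeholders (real ones come from the DDR3 data sheet), banks are ignored, and the names are invented for this example.

```python
# Hypothetical minimum spacing, in DDR_CK cycles, from a previously sent
# command (first key) to the next command (second key).  Placeholder values.
MIN_GAP = {
    ("ACT", "WR"): 6,
    ("ACT", "RD"): 6,
    ("WR", "WR"): 4,
    ("RD", "RD"): 4,
    ("WR", "ACT"): 1,
    ("RD", "ACT"): 1,
}
DEFAULT_GAP = 1  # assumed spacing for pairs not listed above


def schedule(commands):
    """Return (cycle, command) pairs, sending each command as early as allowed."""
    last_sent = {}  # command name -> DDR_CK cycle it was last sent on
    cycle = 0
    timeline = []
    for cmd in commands:
        earliest = cycle
        for prev, sent_at in last_sent.items():
            gap = MIN_GAP.get((prev, cmd), DEFAULT_GAP)
            earliest = max(earliest, sent_at + gap)
        timeline.append((earliest, cmd))
        last_sent[cmd] = earliest
        cycle = earliest + 1  # at most one command per DDR_CK
    return timeline


print(schedule(["ACT", "WR", "ACT", "WR"]))
```

In this toy model the second ACT goes out on the DDR_CK right after the first WR, while the second WR has to wait out the ACT-to-WR gap.  The point is only that the fast side checks elapsed cycles per command type and does not need to understand the request itself.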
 
The following users thanked this post: promach

Offline promach

  • Frequent Contributor
  • **
  • Posts: 875
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #96 on: November 13, 2021, 04:15:47 am »
Quote
Before sending out each received command in that 2 word 200MHz in, 400MHz out FIFO, it uses a look-up table to see how many clock cycles since any previously sent commands to know when it may be permitted the insert the next new command.

I have pondered a bit on your sentence quoted above,
However, when exactly should a new command be "enqueued" into the mentioned FPGA FIFO ?

I asked this question because from my understanding, whether it is half-rate or quarter-rate, the FSM timing event for the initialization sequence will still need to be triggered one at a time.
This means that there is no point of having a 2 words depth FIFO.  The new generated command only needs to be stored in a 1 word depth FIFO (which is basically a register), released to the DRAM once timing is up.
and the next generated command will be "enqueued" into the FIFO at the beginning of the next FSM event ?

Please correct me if wrong.

So, why do you need to have half-rate when quarter-rate already does the same job pretty well enough for power optimization (due to lesser clock transition for a given amount of time passed) ?
« Last Edit: November 13, 2021, 04:41:51 am by promach »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #97 on: November 13, 2021, 05:18:30 am »
I pipe enqueue multiple user request commands.  There are situations where a new bank may be activated while a previous write was just sent and a current burst is taking place.  This activate command is allowed immediately after the previous write command.  Without the 2 word FIFO, I will always have a 'NOP' between that write and activate since I can only generate 200 million commands a second.  This allows stuffing commands where permitted on either immediate or odd DDR_CK clock cycles.  Enlarging that fifo to say 4 words would allow for typically the most compact command sequences possible being sent to the DDR3.  With a simple 1 word latch, commands will typically be spaced out on at least every 2nd DDR_CK.  For my design, the type of FIFO I need, FWFT type with acknowledge, has difficulty routing the acknowledge tied to 7 individual DDR3 command timers operating at 400MHz on Altera Cyclone devices.  My 400MHz side doesn't care about the commands it receives, only that each DDR3 command has a different set amount of time for each other possible new command coming in and it is not allowed to violate those minimum delay clock cycles depending on the next command to be sent.

You could say because of my mid FIFO, if it were a bit larger like 4 words enqueue, I have designed a hybrid half-rate controller with a full-rate controller's performance.  But with 2 words, I'm sort of stuck half way in-between where some situations are taken advantage of while others aren't.  One thing I cannot fix is the 'skew' or delay between receiving a user command and the length of pipe time it takes to get that command out to the DDR3, as my 'command sequencer' section is a 3-5 clock pipe running at 200 MHz.  (I have optimization parameters which can combine pipe stages at the cost of FMAX or FPGA size.)
« Last Edit: November 13, 2021, 05:35:53 am by BrianHG »
 
The following users thanked this post: promach

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #98 on: November 13, 2021, 05:23:47 am »
The stage piping and the rest of the coding of my controller is so efficient that even when overclocking the FPGA to 500MHz, even while drawing ellipses, the FPGA barely goes above room temperature without a heatsink.  At 400MHz, it barely consumes 200mW, never mind what 300 MHz must consume.  Remember, it is the rate of changes in logic state which consumes power, not the static state of the command speed going through.

Take a really close look at when and how I even cycle my address and bank lines and control the OE timing and spacing of the data IO port and drive of the ODT line.  Everything is tuned for minimal transitions and proper central clearance and IO bus direction change with extra half cycle hold to achieve the cleanest, quietest, best possible communications with the DDR3.  Error free 500 MHz would have been otherwise impossible as Altera's max for a software DDRIO port is only supposed to be 300MHz.
« Last Edit: November 13, 2021, 05:30:34 am by BrianHG »
 
The following users thanked this post: promach

Offline promach

  • Frequent Contributor
  • **
  • Posts: 875
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #99 on: November 13, 2021, 06:36:46 am »
Quote
I pipe enqueue multiple user request commands.  There are situations where a new bank may be activated while a previous write was just sent and a current burst is taking place.  This activate command is allowed immediately after the previous write command.  Without the 2 word FIFO, I will always have a 'NOP' between that write and activate since I can only generate 200 million commands a second.  This allows stuffing commands where permitted on either immediate or odd DDR_CK clock cycles.

Could ACTIVATE command for a new bank be issued to DRAM when a write burst for other bank is still ongoing ?
A check on ACTIVATE timing does not suggest so though.

Besides, why odd DDR_CK clock cycles when 2 words depth FIFO is used ?


Quote
Enlarging that fifo to say 4 words would allow for typically the most compact command sequences possible being sent to the DDR3.  With a simple 1 word latch, commands will typically be spaced out on at least every 2nd DDR_CK.

Why 4 words depth FIFO does not have the every 2nd DDR_CK concern ?


Quote
You could say because of my mid FIFO, if it were a bit larger like 4 words enqueue, I have designed a hybrid half-rate controller with a full-rate controller's performance.  But with 2 words, I'm sort of stuck half way in-between where some situations are taken advantage of while others arent.

I am confused with which other situations are not taken advantage of ?
« Last Edit: November 13, 2021, 07:12:52 am by promach »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #100 on: November 13, 2021, 07:56:17 am »


Please, just read the entire DDR3 datasheet and try things for yourself.
« Last Edit: November 13, 2021, 07:57:59 am by BrianHG »
 
The following users thanked this post: promach

Offline promach

  • Frequent Contributor
  • **
  • Posts: 875
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #101 on: November 23, 2021, 03:34:35 am »
Quote
You could say because of my mid FIFO, if it were a bit larger like 4 words enqueue, I have designed a hybrid half-rate controller with a full-rate controller's performance.  But with 2 words, I'm sort of stuck half way in-between where some situations are taken advantage of while others arent.

Which other situations are not taken advantage of ?


I understand that you are using synchronous FIFO in the above case of in-between commands. 
However, what about synchronizing asynchronous incoming multi-bits DQ signals from DRAM into FPGA ?
I suppose you would need an asynchronous FIFO ?
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #102 on: November 23, 2021, 03:49:42 am »
For example, looking at my above screenshot, if my activate of bank 1 was done just before the last write of bank 0, the continuing write would not need that little blue gap where I wrote in red 'ongoing burst', as bank 1 will already have been fully activated by the time the continuing write switches into it.  We can extend this throughout while writing in bank 7 and next switching to a new bank 0, where if needed, while write bursting in bank 7, we can send a 'precharge' command, still continuing to write to bank 7, then the new activate bank 0 while still writing into 7, then seamlessly transition your write into the newly activated bank 0 without any pause.  You can literally continuously read and write to DDR3 with strategic plan ahead precharge and activates, making the ram access as continuous as static ram with the 1 caveat that every time you switch between a read burst and write burst, there are a few dead 0 access clock cycles as the DDR3 needs time to transition from input to output.

During an unbroken burst, you send a command every 4 clocks to maintain optimum efficiency.  This means with a full rate controller, you can stuff 3 new commands in-between.  With a half-rate controller, your controller can only be fast enough to add 1 command in-between.  The other advantage of in-between commands is that you can activate and precharge unused banks in an effort to further manually refresh the DDR3, allowing the development of a controller which may almost never need to waste any bus cycle time if your application processor is designed to access the DDR3 in a sequential burst manner.
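
The slot counting above works out like this in a small Python sketch.  It assumes BL8 bursts, which occupy 4 DDR_CK cycles (8 transfers at 2 per clock), and the function name is made up for this example.

```python
BURST_CK = 4  # a BL8 burst occupies 4 DDR_CK cycles (8 transfers, 2 per clock)


def spare_command_slots(ck_per_command: int) -> int:
    """Commands that fit between two back-to-back burst commands.

    ck_per_command is how many DDR_CK cycles the controller needs per command:
    1 for full-rate, 2 for half-rate, 4 for quarter-rate.
    """
    return BURST_CK // ck_per_command - 1


for name, ck in (("full-rate", 1), ("half-rate", 2), ("quarter-rate", 4)):
    print(f"{name}: {spare_command_slots(ck)} in-between command(s)")
```

Full-rate gives 3 spare slots, half-rate gives 1 and quarter-rate gives none, so a quarter-rate command generator has no room left for plan ahead precharge and activate commands during an unbroken burst.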

When properly done, a smart controller and an application which can take advantage of bursting and knowledge of DDR3 banks can create a system with access close to the performance of high speed static ram.

This is why both my ram controller and Altera's as well have a parameter to set the address location of the 'BANK'-'ROW'-'COLUMN' order in the controller's addressing scheme.
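
Here is a hedged Python sketch of why that address order matters.  The geometry (3 bank bits, 15 row bits, 10 column bits) is assumed for illustration only, and `split_address` is an invented helper, not the controller's actual parameter or decoding logic.

```python
# Assumed geometry for illustration only.
BANK_BITS, ROW_BITS, COL_BITS = 3, 15, 10
WIDTHS = {"BANK": BANK_BITS, "ROW": ROW_BITS, "COLUMN": COL_BITS}


def split_address(addr, order):
    """Split a word address into fields; `order` lists the fields from MSB to LSB."""
    fields = {}
    for name in reversed(order):  # peel fields off starting at the LSB end
        fields[name] = addr & ((1 << WIDTHS[name]) - 1)
        addr >>= WIDTHS[name]
    return fields


next_page = 1 << COL_BITS  # first address after the end of a row (page)

# ROW-BANK-COLUMN: running off the end of a row lands in the next bank,
# which can be activated ahead of time while the current burst continues.
print(split_address(next_page, ("ROW", "BANK", "COLUMN")))

# BANK-ROW-COLUMN: running off the end of a row stays in the same bank,
# so that bank must be precharged and re-activated before continuing.
print(split_address(next_page, ("BANK", "ROW", "COLUMN")))
```

With the bank bits placed just above the column bits, a long sequential access walks through the banks one after another, which is what lets the plan ahead activates described above hide the row switching.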
« Last Edit: November 23, 2021, 03:53:56 am by BrianHG »
 
The following users thanked this post: promach

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #103 on: November 23, 2021, 06:01:10 am »
Quote
I understand that you are using synchronous FIFO in the above case of in-between commands. 
However, what about synchronizing asynchronous incoming multi-bits DQ signals from DRAM into FPGA ?
I suppose you would need an asynchronous FIFO ?

Actually, for the DDR DQ, in my 400MHz command out section, I decode the instructions being sent out to detect when a read or write command is being sent.  That r/w decode will schedule my read data and write data FIFO serializers to send or capture / receive DDR data at the right time.  It is the job of my controller to make sure the write data is ready for the write by the time the write data needs to be sent.  For the read, well, if you miss it when the acknowledge / read data ready comes in, you missed it.
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #105 on: December 04, 2021, 09:49:35 am »
Next I will make my own sync generator and replace the DECA example junk.
Fix a bug where only the current display mode of 1080p@32bit color functions properly.
And remove the 1-line dualport buffer for a minimal sized dual-port ram.
 

Offline promach

  • Frequent Contributor
  • **
  • Posts: 875
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #106 on: December 12, 2021, 10:04:22 am »
Quote
I pipe enqueue multiple user request commands.  There are situations where a new bank may be activated while a previous write was just sent and a current burst is taking place.  This activate command is allowed immediately after the previous write command. 

Could the same bank interleave mechanism happen for write operations ?
And if yes, then I suppose there is no need for such pipe enqueue stuff ?




Quote
every-time you switch between a read burst and write burst, there are a few dead 0 access clock cycles as the DDR3 needs time to transition from input to output.

Why few dead 0 access clock cycles ?

 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #107 on: December 12, 2021, 10:24:49 am »
Quote
I pipe enqueue multiple user request commands.  There are situations where a new bank may be activated while a previous write was just sent and a current burst is taking place.  This activate command is allowed immediately after the previous write command. 

Could the same bank interleave mechanism happen for write operations ?
And if yes, then I suppose there is no need for such pipe enqueue stuff ?




Quote
every-time you switch between a read burst and write burst, there are a few dead 0 access clock cycles as the DDR3 needs time to transition from input to output.

Why few dead 0 access clock cycles ?
#1, Yes.  Opening and closing banks is separate from reading and writing data in any bank's activated row.  You may mess around with all other banks while you are still reading / writing on a different bank, or, at least give enough time for an ACT to become ready.
#2, Read the god damn DDR3 data sheet!  They have example illustrations on switching between read and write called read-to-write and write-to-read operations.  There are mandatory empty cycles as the DQ buffers and DQS switch direction and the DDR3 ram chip row amplifiers change drive current into the memory cap arrays.
 

Offline promach

  • Frequent Contributor
  • **
  • Posts: 875
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #108 on: December 14, 2021, 05:07:53 pm »
Quote
During an unbroken burst, you send a command every 4 clocks to maintain optimum efficiency.  This means with a full rate controller, you can stuff 3 new commands in-between.  With a half-rate controller, your controller can only be fast enough to add 1 command in-between. 

I think the number of in-between commands shall not be limited by whether it is full-rate or half-rate controller.
Since it would only be using simple if-else clocked logic (inside fast clock domain, maybe 500MHz in your case), your controller should be able to achieve such goal without suffering from STA setup timing violation.

Please correct me if wrong.
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #109 on: December 15, 2021, 03:21:04 am »
Quote
During an unbroken burst, you send a command every 4 clocks to maintain optimum efficiency.  This means with a full rate controller, you can stuff 3 new commands in-between.  With a half-rate controller, your controller can only be fast enough to add 1 command in-between. 

I think the number of in-between commands shall not be limited by whether it is full-rate or half-rate controller.
Since it would only be using simple if-else clocked logic (inside fast clock domain, maybe 500MHz in your case), your controller should be able to achieve such goal without suffering from STA setup timing violation.

Please correct me if wrong.
:-//  Ok, I have given you plenty enough already, just read my previous posts as the answer lies within.
Please stop asking for guidelines for altering your DDR3 controller here on my thread with my finished DDR3 controller.

    This thread is for those who have issues or need help implementing 'MY' controller in their designs, and for those who wish to share their success stories & example implementations using my DDR3 controller system.
« Last Edit: December 15, 2021, 11:36:53 am by BrianHG »
 
The following users thanked this post: voltsandjolts

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #110 on: December 15, 2021, 05:15:31 am »
Quote
Hi Brian I'm back and I have made some changes.

SignalTap is very useful, thank you for mentioned it  :)

Tested with mowing video and the Rotate was now where real time ... right now I'm not interested in Rotate but 2 straight inputs
But for sure I would like to scale and rotate later

I have removed the flat cable and added in single wires (same length) so fit GPIO's
I have 2 HDMI inputs running 1920x1080@60 non sync in parallel from 2 BrighSign players

I place the 2 inputs side by side on the 4K buffer and can scroll to see the Left / Right transition and it pretty OK
I had some PSU issues and have spitted to more PSU's to avoid interference

I do have some noise here and there in the picture ... you can see in the black hole on the video
Not sure why, but I suspect the "wires" and potential wrong terminations


I would like to write to the DDR as 128 bit instead of 32 bit to lower the traffic to the DDR

Attached is the Quartus project

Link for video

Not sure why, but Nockieboy also had some occasional missing pixels during block fills in his 8-bit GPU thread using my DDR3 V1.00.  If it is actually the same problem, when we updated to V1.50 on the 8-bit GPU thread, all empty pixel fills disappeared.  (Improved multiport design...)  Note that with 1.50, if you need to be backwards compatible to the old separate read and write ports, just hard wire the write enable on 2 separate ports and you will achieve the same function.  And, don't forget to ASSIGN '0' to all the unused inputs as shown in my new simplified block diagram.
 

Online Wiljan

  • Regular Contributor
  • *
  • Posts: 225
  • Country: dk
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #111 on: December 15, 2021, 08:58:58 am »
Quote
Not sure why, but Nockieboy also had some occasional missing pixels during block fills in his 8-bit GPU thread using my DDR3 V1.00.  If it is actually the same problem, when we updated to V1.50 on the 8-bit GPU thread, all empty pixel fills disappeared.  (Improved multiport design...)  Note that with 1.50, if you need to be backwards compatible to the old separate read and write ports, just hard wire the write enable on 2 separate ports and you will achieve the same function.  And, don't forget to ASSIGN '0' to all the unused inputs as shown in my new simplified block diagram.

Thank you for letting me know that similar issues have been observed.

The project I was working on was to determine whether a video clip played across 2 HDMI outputs on the same PC card was actually in sync.

The HDMI outputs were processed through several video processors as 2 individual signals and ended up on a huge LED wall (20 m wide x 3 m high) working in 5 zones (processors).

So when I saw your DDR3 / HDMI output, and since I already had the 2 HDMI input board, where I had used a scope to measure the 2 x V-Sync (which were perfectly in sync), I got the idea to make the HDMI splitter showing half of HDMI A and half of HDMI B on the same HDMI output.

Because it was not working perfectly in time, we ended up using 2 x Bacho multi-format converters, HDMI to SDI (you can convert across all frame rates / resolutions), so each does have a full frame of memory inside. Here we also saw the sync issue on fast horizontally moving content, so we decided to record the 2 x SDI on some broadcast recorders and analyze frame by frame. At that time we also added external gen-lock for the 2 Barcho units... and the problem was gone.
This confirmed the video out of the PC was perfect in sync  :)

So right now there is no need for the setup anymore. However, I would like, for my own exercise, to try the 1.5 when I get some time over Christmas.

If there is still noise, I will also try to "loop" HDMI input to HDMI output via the FPGA to see if the noise goes away.

But I'm aware that the wiring I have is no good for a reliable solution. I did even consider whether it would be worth making a PCB shield fitting the DECA board with 2 HDMI inputs. On the other hand, you can buy cheap 4 x 2K input to 1 x 4K output split units, and some of them have quite a lot of features.


 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #112 on: December 15, 2021, 09:37:02 am »
Quote
(you can convert across all frame rates / resolutions), so each does have a full frame of memory inside. Here we also saw the sync issue on fast horizontally moving content, so we decided to record the 2 x SDI on some broadcast recorders and analyze frame by frame. At that time we also added external gen-lock for the 2 Barcho units... and the problem was gone.

Ignoring the issue with your HDMI decoder wiring to the DECA, you could do the same on the DECA going from 2 in to 1 except you would have to most likely use lower quality bi-linear or small bi-cubic scaling if you are resizing the source images.  But I do know with code I've done in the past, with the know-how, you can easily do better picture enhancement/processing routines in the DECA than whatever may be available in the mixing consoles you are currently using.  (Except for true 1080i upsampled motion-adaptive de-interlacing unless you dedicate the entire DECA completely to that 1 task.)

Quote
This confirmed the video out of the PC was perfect in sync  :)

This depends on videocard type, drivers and settings/selected video modes, it is not guaranteed.
 

Online Wiljan

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #113 on: December 15, 2021, 09:55:36 am »

Quote
This confirmed the video out of the PC was perfect in sync  :)

This depends on videocard type, drivers and settings/selected video modes, it is not guaranteed.

Absolutely... it's an AMD dual head, and this is where the whole discussion started about whether the problem was on the PC side or on the Wall side (I was in charge of the PC side, and a 3rd party of the Wall side).

I have a big interest in video processing, and have been working in broadcast for 30+ years  :o
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #114 on: December 31, 2021, 07:51:00 pm »
Ok, my BrianHG_GFX_VGA_Window_System is 90% functional.  Only the final layer alpha channel mixer is missing, as I am now just averaging windows for testing, but they are all there with all their features.

I paused here because the ravaging DDR3 memory access done by the BrianHG_GFX_VGA_Window_System with multiple windows simultaneously open, each deliberately configured with an odd number of row pixels and set to eat up over 90% of the available bandwidth, causes the one DDR3 read port in use by the graphics system to randomly freeze.  It's time to debug the multiport section of my controller before I finish the last alpha-blend window layer mixing module of my window system.

A version 1.6 will soon be coming where I fix this DDR3 frozen read bug.
(Narrowed it down to using the multiport in Quarter-rate, the higher speed Half-Rate mode works fine.  It seems to be a congestion issue.)
« Last Edit: December 31, 2021, 10:04:02 pm by BrianHG »
 

Online mfro

  • Regular Contributor
  • *
  • Posts: 207
  • Country: de
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #115 on: January 09, 2022, 02:52:14 pm »
Wanted to get serious with the BrianHG_DDR3_CONTROLLER after just playing (impressed  :-+) with it. Tried to replace a UniPHY DDR3 controller in one of my existing designs with it today, but failed miserably.

I'm a VHDL guy and it seems interfacing SystemVerilog designs with VHDL isn't really fully supported in Quartus. I implemented a VHDL component representing the BrianHG_DDR3_CONTROLLER_top module on the VHDL side that has the SystemVerilog parameters as VHDL generics but wasn't successful. It appears that it is not possible to map SystemVerilog parameters of any other types than plain integers or bit vectors.
E.g., I expected that it should be possible to map a SystemVerilog bit parameter (such as BHG_OPTIMIZE_SPEED) to a VHDL BIT generic, but all I get is:
Code: [Select]
Error (10258): Verilog HDL error at BrianHG_DDR3_CONTROLLER_top.sv(116): unsupported type for Verilog parameter BHG_OPTIMIZE_SPEED
Tried to use an integer generic in the VHDL component instead. This was accepted on the VHDL side, but then failed on the SystemVerilog side as an invalid type.

Apparently, it is not possible to pass the SystemVerilog parameters as VHDL generics, so one either needs to do the parametrization on the SystemVerilog side or modify the interface.

Anybody more successful than myself in interfacing to VHDL?
Beethoven wrote his first symphony in C.
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #116 on: January 09, 2022, 04:12:01 pm »
In SystemVerilog, when I say:

parameter bit [x:y] OPTIMIZE_SPEED = z
Note that the 'bit [x:y]' usually can be omitted and changed into an 'int' and the code should still function.
However, I have seen such parameters passed through VHDL, as some VHDL instantiations of Verilog modules do pass parameters.  Note that the 'bit [x:y]' parameter is similar to a logic/register limited to that bit range.  If the [x:y] is missing, it just means a single wire.

This is not my area of expertise.  Note that the Intel FPGA forum may have engineers who may have an answer for you.

If you are out of luck, one workaround which does not require modifying my code may be to add a dummy 'mfro' BrianHG_DDR3_CONTROLLER_top_pre-vhdl.sv box module and stuff all your parameters there, while your BrianHG_DDR3_CONTROLLER_top.vhd calls that dummy box.

Also verify Altera's 'Compiler Settings' / 'VHDL Input' and try using the 'VHDL 2008' settings instead of the default 'VHDL 1993'.

IE: When I use that 'bit ***', instead of declaring the default parameters as 'constant integers', I am declaring them as 'constant standard logic' with so many bits.  This way, I plug them directly into my code and SystemVerilog will understand that I am passing, for example, 16/10/8 bit constants or 1 bit constant logic wires, limiting the user from inputting defaults outside my allotted scope/field/range.  There has got to be a way within VHDL to pass the same logic arguments.
« Last Edit: January 09, 2022, 05:12:50 pm by BrianHG »
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #117 on: January 09, 2022, 05:44:34 pm »
@mfro, one thing you can try is to have Quartus 'generate' a Verilog instantiation template file for my project.  The new .v generated by Quartus rewrites the parameters in the older style calling my SystemVerilog module.  Maybe it will be easier to call that new .v from VHDL instead of my SystemVerilog directly.

Though, once you see how Quartus rewrote the parameters, maybe that will give you a clue on how to directly feed my module in VHDL.

Also, do not forget the reverse route.  Just wire my top design to the FPGA as I have in the examples, and within that, instantiate the rest of your VHDL project inside the FPGA top.sv with all the CMD_xxx and other FPGA IO ports wired to your VHDL instantiation in the top.sv.
« Last Edit: January 09, 2022, 05:49:27 pm by BrianHG »
 

Online mfro

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #118 on: January 09, 2022, 06:40:34 pm »
Thanks, Brian. Got it to work, eventually (I now have a compiler crash, but that's another story). At least it passes the stage where it maps the parameters.

It appears the SystemVerilog single bit parameters don't map to std_logic or bit in VHDL generics, but to a std_logic_vector(0 to 0) instead.  (I studied the relevant chapter in the Quartus manual, which doesn't really tell you much about what maps to what, but the fact that it states the parameters are internally passed as strings inspired me to try a single bit vector instead.)
 
The following users thanked this post: BrianHG

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #119 on: January 10, 2022, 03:55:28 pm »
Quote
@mfro, one thing you can try is to have Quartus 'generate' a Verilog instantiation template file for my project.  The new .v generated by Quartus rewrites the parameters in the older style calling my SystemVerilog module.  Maybe it will be easier to call that new .v from VHDL instead of my SystemVerilog directly.

Doing a generate VHDL instantiation creates this file on one of my new projects: GPU_DECA_DDR3_top.cmp

Code: [Select]
-- Copyright (C) 2020  Intel Corporation. All rights reserved.
-- Your use of Intel Corporation's design tools, logic functions
-- and other software and tools, and any partner logic
-- functions, and any output files from any of the foregoing
-- (including device programming or simulation files), and any
-- associated documentation or information are expressly subject
-- to the terms and conditions of the Intel Program License
-- Subscription Agreement, the Intel Quartus Prime License Agreement,
-- the Intel FPGA IP License Agreement, or other applicable license
-- agreement, including, without limitation, that your use is for
-- the sole purpose of programming logic devices manufactured by
-- Intel and sold by Intel or its authorized distributors.  Please
-- refer to the applicable agreement for further details, at
-- https://fpgasoftware.intel.com/eula.


-- Generated by Quartus Prime Version 20.1 (Build Build 720 11/11/2020)
-- Created on Sun Jan 09 12:32:21 2022

COMPONENT GPU_DECA_DDR3_top
GENERIC ( GPU_MEM : INTEGER := 524288; ENDIAN : STRING := "Little"; PDI_LAYERS : STD_LOGIC_VECTOR(3 DOWNTO 0) := b"0001"; SDI_LAYERS : STD_LOGIC_VECTOR(3 DOWNTO 0) := b"0100";
ENABLE_TILE_MODE : STRING := "A(1,0,0,0,0,0,0,0)"; SKIP_TILE_DELAY : std_logic := '0'; ENABLE_PALETTE : STRING := "A(1,1,1,1,1,1,1,1)"; SKIP_PALETTE_DELAY : std_logic := '0';
HWREG_BASE_ADDRESS : INTEGER := 256; HWREG_BASE_ADDR_LSWAP : INTEGER := 240; PAL_BASE_ADDR : INTEGER := 4096; TILE_BYTES : INTEGER := 65536;
TILE_BASE_ADDR : INTEGER := 16384; FPGA_VENDOR : STRING := "Altera"; FPGA_FAMILY : STRING := "MAX 10"; BHG_OPTIMIZE_SPEED : std_logic := '1';
BHG_EXTRA_SPEED : std_logic := '1'; CLK_KHZ_IN : INTEGER := 50000; CLK_IN_MULT : INTEGER := 24; CLK_IN_DIV : INTEGER := 4;
DDR_TRICK_MTPS_CAP : INTEGER := 600; INTERFACE_SPEED : STRING := "Half"; DDR3_CK_MHZ : INTEGER := 300; DDR3_SPEED_GRADE : STRING := "-15E";
DDR3_SIZE_GB : INTEGER := 4; DDR3_WIDTH_DQ : INTEGER := 16; DDR3_NUM_CHIPS : INTEGER := 1; DDR3_NUM_CK : INTEGER := 1;
DDR3_WIDTH_ADDR : INTEGER := 15; DDR3_WIDTH_BANK : INTEGER := 3; DDR3_WIDTH_CAS : INTEGER := 10; DDR3_WIDTH_DM : INTEGER := 2;
DDR3_WIDTH_DQS : INTEGER := 2; DDR3_RWDQ_BITS : INTEGER := 128; DDR3_ODT_RTT : INTEGER := 40; DDR3_RZQ : INTEGER := 40;
DDR3_TEMP : INTEGER := 85; DDR3_WDQ_PHASE : INTEGER := 270; DDR3_RDQ_PHASE : INTEGER := 0; DDR3_MAX_REF_QUEUE : STD_LOGIC_VECTOR(3 DOWNTO 0) := b"1000";
IDLE_TIME_uSx10 : STD_LOGIC_VECTOR(6 DOWNTO 0) := b"0001010"; SKIP_PUP_TIMER : std_logic := '0'; BANK_ROW_ORDER : STRING := "ROW_BANK_COL"; PORT_ADDR_SIZE : INTEGER := 30;
PORT_TOTAL : INTEGER := 5; PORT_MLAYER_WIDTH : STRING := "A(2,2,2,2)"; PORT_VECTOR_SIZE : INTEGER := 16; READ_ID_SIZE : INTEGER := 4;
DDR3_VECTOR_SIZE : INTEGER := 5; PORT_CACHE_BITS : INTEGER := 128; CACHE_ADDR_WIDTH : INTEGER := 4; BYTE_INDEX_BITS : INTEGER := 11;
PORT_TOGGLE_INPUT : STRING := "A(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)"; PORT_R_DATA_WIDTH : STRING := "A(000001000,000001000,000010000,000010000,010000000,010000000,010000000,010000000,010000000,010000000,010000000,010000000,010000000,010000000,010000000,010000000)"; PORT_W_DATA_WIDTH : STRING := "A(000001000,000001000,000010000,000010000,010000000,010000000,010000000,010000000,010000000,010000000,010000000,010000000,010000000,010000000,010000000,010000000)"; PORT_PRIORITY : STRING := "A(11,10,00,00,10,00,00,00,00,00,00,00,00,00,00,00)";
PORT_READ_STACK : STRING := "A(16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16)"; PORT_W_CACHE_TOUT : STRING := "A(100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000)"; PORT_CACHE_SMART : STRING := "A(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)"; PORT_DREG_READ : STRING := "A(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)";
PORT_MAX_BURST : STRING := "A(100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000)"; SMART_BANK : std_logic := '0' );
PORT
(
ADC_CLK_10 : IN STD_LOGIC;
MAX10_CLK1_50 : IN STD_LOGIC;
MAX10_CLK2_50 : IN STD_LOGIC;
KEY : IN STD_LOGIC_VECTOR(1 DOWNTO 0);
LED : OUT STD_LOGIC_VECTOR(7 DOWNTO 0);
CAP_SENSE_I2C_SCL : INOUT STD_LOGIC;
CAP_SENSE_I2C_SDA : INOUT STD_LOGIC;
AUDIO_BCLK : INOUT STD_LOGIC;
AUDIO_DIN_MFP1 : OUT STD_LOGIC;
AUDIO_DOUT_MFP2 : IN STD_LOGIC;
AUDIO_GPIO_MFP5 : INOUT STD_LOGIC;
AUDIO_MCLK : OUT STD_LOGIC;
AUDIO_MISO_MFP4 : IN STD_LOGIC;
AUDIO_RESET_n : INOUT STD_LOGIC;
AUDIO_SCL_SS_n : OUT STD_LOGIC;
AUDIO_SCLK_MFP3 : OUT STD_LOGIC;
AUDIO_SDA_MOSI : INOUT STD_LOGIC;
AUDIO_SPI_SELECT : OUT STD_LOGIC;
AUDIO_WCLK : INOUT STD_LOGIC;
FLASH_DATA : INOUT STD_LOGIC_VECTOR(3 DOWNTO 0);
FLASH_DCLK : OUT STD_LOGIC;
FLASH_NCSO : OUT STD_LOGIC;
FLASH_RESET_n : OUT STD_LOGIC;
G_SENSOR_CS_n : OUT STD_LOGIC;
G_SENSOR_INT1 : IN STD_LOGIC;
G_SENSOR_INT2 : IN STD_LOGIC;
G_SENSOR_SCLK : INOUT STD_LOGIC;
G_SENSOR_SDI : INOUT STD_LOGIC;
G_SENSOR_SDO : INOUT STD_LOGIC;
HDMI_I2C_SCL : INOUT STD_LOGIC;
HDMI_I2C_SDA : INOUT STD_LOGIC;
HDMI_I2S : INOUT STD_LOGIC_VECTOR(3 DOWNTO 0);
HDMI_LRCLK : INOUT STD_LOGIC;
HDMI_MCLK : INOUT STD_LOGIC;
HDMI_SCLK : INOUT STD_LOGIC;
HDMI_TX_CLK : OUT STD_LOGIC;
HDMI_TX_D : OUT STD_LOGIC_VECTOR(23 DOWNTO 0);
HDMI_TX_DE : OUT STD_LOGIC;
HDMI_TX_HS : OUT STD_LOGIC;
HDMI_TX_INT : IN STD_LOGIC;
HDMI_TX_VS : OUT STD_LOGIC;
LIGHT_I2C_SCL : OUT STD_LOGIC;
LIGHT_I2C_SDA : INOUT STD_LOGIC;
LIGHT_INT : INOUT STD_LOGIC;
MIPI_CORE_EN : OUT STD_LOGIC;
MIPI_I2C_SCL : OUT STD_LOGIC;
MIPI_I2C_SDA : INOUT STD_LOGIC;
MIPI_LP_MC_n : IN STD_LOGIC;
MIPI_LP_MC_p : IN STD_LOGIC;
MIPI_LP_MD_n : IN STD_LOGIC_VECTOR(3 DOWNTO 0);
MIPI_LP_MD_p : IN STD_LOGIC_VECTOR(3 DOWNTO 0);
MIPI_MC_p : IN STD_LOGIC;
MIPI_MCLK : OUT STD_LOGIC;
MIPI_MD_p : IN STD_LOGIC_VECTOR(3 DOWNTO 0);
MIPI_RESET_n : OUT STD_LOGIC;
MIPI_WP : OUT STD_LOGIC;
NET_COL : IN STD_LOGIC;
NET_CRS : IN STD_LOGIC;
NET_MDC : OUT STD_LOGIC;
NET_MDIO : INOUT STD_LOGIC;
NET_PCF_EN : OUT STD_LOGIC;
NET_RESET_n : OUT STD_LOGIC;
NET_RX_CLK : IN STD_LOGIC;
NET_RX_DV : IN STD_LOGIC;
NET_RX_ER : IN STD_LOGIC;
NET_RXD : IN STD_LOGIC_VECTOR(3 DOWNTO 0);
NET_TX_CLK : IN STD_LOGIC;
NET_TX_EN : OUT STD_LOGIC;
NET_TXD : OUT STD_LOGIC_VECTOR(3 DOWNTO 0);
PMONITOR_ALERT : IN STD_LOGIC;
PMONITOR_I2C_SCL : OUT STD_LOGIC;
PMONITOR_I2C_SDA : INOUT STD_LOGIC;
RH_TEMP_DRDY_n : IN STD_LOGIC;
RH_TEMP_I2C_SCL : OUT STD_LOGIC;
RH_TEMP_I2C_SDA : INOUT STD_LOGIC;
SD_CLK : OUT STD_LOGIC;
SD_CMD : INOUT STD_LOGIC;
SD_CMD_DIR : OUT STD_LOGIC;
SD_D0_DIR : OUT STD_LOGIC;
SD_D123_DIR : INOUT STD_LOGIC;
SD_DAT : INOUT STD_LOGIC_VECTOR(3 DOWNTO 0);
SD_FB_CLK : IN STD_LOGIC;
SD_SEL : OUT STD_LOGIC;
SW : IN STD_LOGIC_VECTOR(1 DOWNTO 0);
TEMP_CS_n : OUT STD_LOGIC;
TEMP_SC : OUT STD_LOGIC;
TEMP_SIO : INOUT STD_LOGIC;
USB_CLKIN : IN STD_LOGIC;
USB_CS : OUT STD_LOGIC;
USB_DATA : INOUT STD_LOGIC_VECTOR(7 DOWNTO 0);
USB_DIR : IN STD_LOGIC;
USB_FAULT_n : IN STD_LOGIC;
USB_NXT : IN STD_LOGIC;
USB_RESET_n : OUT STD_LOGIC;
USB_STP : OUT STD_LOGIC;
BBB_PWR_BUT : IN STD_LOGIC;
BBB_SYS_RESET_n : IN STD_LOGIC;
GPIO0_D : INOUT STD_LOGIC_VECTOR(43 DOWNTO 0);
GPIO1_D : INOUT STD_LOGIC_VECTOR(22 DOWNTO 0);
DDR3_RESET_n : OUT STD_LOGIC;
DDR3_CK_p : OUT STD_LOGIC_VECTOR(DDR3_NUM_CK-1 DOWNTO 0);
DDR3_CK_n : OUT STD_LOGIC_VECTOR(DDR3_NUM_CK-1 DOWNTO 0);
DDR3_CKE : OUT STD_LOGIC;
DDR3_CS_n : OUT STD_LOGIC;
DDR3_RAS_n : OUT STD_LOGIC;
DDR3_CAS_n : OUT STD_LOGIC;
DDR3_WE_n : OUT STD_LOGIC;
DDR3_ODT : OUT STD_LOGIC;
DDR3_A : OUT STD_LOGIC_VECTOR(DDR3_WIDTH_ADDR-1 DOWNTO 0);
DDR3_BA : OUT STD_LOGIC_VECTOR(DDR3_WIDTH_BANK-1 DOWNTO 0);
DDR3_DM : OUT STD_LOGIC_VECTOR(DDR3_WIDTH_DM-1 DOWNTO 0);
DDR3_DQ : INOUT STD_LOGIC_VECTOR(DDR3_WIDTH_DQ-1 DOWNTO 0);
DDR3_DQS_p : INOUT STD_LOGIC_VECTOR(DDR3_WIDTH_DQS-1 DOWNTO 0);
DDR3_DQS_n : INOUT STD_LOGIC_VECTOR(DDR3_WIDTH_DQS-1 DOWNTO 0)
);
END COMPONENT;


It looks like Quartus knows about using the 'STD_LOGIC_VECTOR(0 DOWNTO 0) := ...'
I'm assuming Quartus just made VHDL-compliant code to call my project...

In fact, they have it even simpler at:  BHG_EXTRA_SPEED : std_logic := '1';
« Last Edit: January 10, 2022, 04:47:37 pm by BrianHG »
 

Online mfro

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #120 on: January 10, 2022, 07:00:28 pm »
doesn't work with BrianHG-DDR3-Controller_top.sv:

Code: [Select]
Error (283001): Can't create Component Declaration or Verilog Instantiation File for entity "BrianHG_DDR3_CONTROLLER_top" which has two or more dimensional ports

Thanks anyway.

The std_logic mapping also appears to be wrong (at least, it doesn't compile). The only mapping that works for me is indeed to std_logic_vector(0 to 0) as posted.

I'm on Quartus 20.1, btw.
 
The following users thanked this post: BrianHG

Offline vdp

  • Newbie
  • Posts: 1
  • Country: bg
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #121 on: January 17, 2022, 08:13:51 am »
Quote
I thought Gowin already came with a free DDR3/4 controller IP.  Not much need for mine, unlike with Lattice and Altera who charge an arm and a leg to hook up DDR3 ram.

Edit: nevermind
« Last Edit: January 29, 2022, 07:22:36 am by vdp »
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #122 on: January 28, 2022, 01:00:33 pm »
 :phew: Ok, here is the new 2 page 'BrianHG_GFX_VGA_Window_System.pdf' block diagram and 'BrianHG_GFX_VGA_Window_System.txt' documentation for developers.

-Up to 64 window layers, with alpha blend transparency from layer-to-layer.
-In system real-time video mode switching support.
-Supports 32/16a/16b/8/4/2/1 bpp windows.
-Supports accelerated Fonts/Tiles stored in dedicated M9K blockram with resolutions of 4/8/16/32 X 4/8/16/32 pixels.
-Supports up to 1k addressable tiles/characters with 32/16a/16b/8/4/2/1 bpp, with mirror and flip.
-Each window has a base address, X&Y screen position & H&V sizes up to 65kx65k pixels.
-Independent bpp depth for each window.
-Optional independent or shared 256 color 32 bit RGBA palettes for each window.
-In tile mode, each tile/character's output with 8 bpp and below can be individually assigned to different portions of the palette.
-Multilayer 8 bit alpha stencil translucency between layers with programmable global override.
-Quick layer swap-able registers.
-Hardware individual integer X&Y scaling where each window output can be scaled 1x through 16x.

     My new BrianHG_DDR3_CONTROLLER_v16 which now has the multi-window VGA system demo should be uploaded in a few days.
« Last Edit: January 28, 2022, 01:42:22 pm by BrianHG »
 
The following users thanked this post: nockieboy

Offline davemuscle

  • Newbie
  • Posts: 9
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #123 on: March 20, 2022, 04:35:50 pm »
Hi. I'm attempting to write an Avalon wrapper for your controller. During my memory test sim I encountered this error with a write/readback sequence:
(all addresses are byte-addressed)
  • Write 'data1' to 0x0000, readback and check against 'data1' (PASS)
  • Write 'data2' to 0x1000, readback and check against 'data2' (PASS)
  • Write 'data3' to 0x0000, readback and check against 'data3' (PASS)
  • Write 'data4' to 0x0000, readback and check against 'data4' (FAIL, data3 received)

The fourth read doesn't issue any command to the DDR3 model, it just returns dirty data from the cache. I would expect the fourth write with fresh data to signal to the cache it needs to perform another read.

Here's the failed write/read:


Is this expected behavior?
Thanks.
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #124 on: March 20, 2022, 05:24:34 pm »
Hi. I'm attempting to write an Avalon wrapper for your controller. During my memory test sim I encountered this error with a write/readback sequence:
(all addresses are byte-addressed)
  • Write 'data1' to 0x0000, readback and check against 'data1' (PASS)
  • Write 'data2' to 0x1000, readback and check against 'data2' (PASS)
  • Write 'data3' to 0x0000, readback and check against 'data3' (PASS)
  • Write 'data4' to 0x0000, readback and check against 'data4' (FAIL, data3 received)

The fourth read doesn't issue any command to the DDR3 model, it just returns dirty data from the cache. I would expect the fourth write with fresh data to signal to the cache it needs to perform another read.

Here's the failed write/read:
If you are using the multiport module and the read and write channel are on the same CMD_xxx[ # ] bus, and smart cache is enabled, then you should receive the new data as long as there is 1 spare clock between the 2.  However, I will try to replicate the bug later tonight both with and without that 1 spare clock.  If you are using a separate CMD_xxx[ # ] as a write channel and another one for the read channel, then yes, it is possible to have stale data in the cache.

Writes to the DDR3 are held off until either a new write is sent outside the current cached address, or the write cache timer has reached 0 due to no additional writes on that port.  The current 'PORT_W_CACHE_TOUT' parameter default is set to 255 CMD_CLKS. This allows the cache module to coalesce multiple writes within the same 16 bytes before sending a write command to the DDR3.  Otherwise, if you were to write 16 consecutive bytes with an 8 bit data mode port, every single byte write would send a DDR3 command, wasting a huge setup and burst-8-cycle to the DDR3 which wouldn't be needed until the last of the 16 bytes has been received.  The smart cache feature means if you are reading from the same address as the current coalescing writes, you will read the new data even though it has yet to be sent to the DDR3.
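
To make the coalescing rule above concrete, here is a rough behavioural sketch in Python. It is only a toy model of the description in this post, not the actual RTL: the class name WriteCacheModel, its fields and the flush log are all made up for illustration, and it assumes a single 16-byte cache line per port.

```python
LINE_BYTES = 16  # one 128-bit cache line, matching the 16-byte example above


class WriteCacheModel:
    """Toy single-line coalescing write cache (illustrative, not the RTL)."""

    def __init__(self, timeout=255):
        self.timeout = timeout      # idle CMD_CLK cycles before a flush
        self.line = None            # base address of the pending line
        self.pending = {}           # byte offset -> value awaiting write-out
        self.idle = 0               # cycles since the last write
        self.ddr3_writes = []       # log of (line address, bytes) sent to DDR3

    def _flush(self):
        if self.line is not None:
            self.ddr3_writes.append((self.line, dict(self.pending)))
        self.line, self.pending, self.idle = None, {}, 0

    def write(self, addr, value):
        line = addr - addr % LINE_BYTES
        if self.line is not None and line != self.line:
            self._flush()           # write outside the cached line: flush first
        self.line = line
        self.pending[addr % LINE_BYTES] = value
        self.idle = 0               # every write restarts the timeout

    def tick(self, cycles=1):
        """Advance idle CMD_CLK cycles; flush once the timeout expires."""
        for _ in range(cycles):
            if self.line is None:
                return
            self.idle += 1
            if self.idle >= self.timeout:
                self._flush()


cache = WriteCacheModel(timeout=255)
for offset in range(16):            # 16 consecutive byte writes, one line
    cache.write(0x100 + offset, offset)
writes_before_timeout = len(cache.ddr3_writes)   # still coalescing
cache.tick(255)                     # no further writes: timer runs out
writes_after_timeout = len(cache.ddr3_writes)
print(writes_before_timeout, writes_after_timeout)
```

In this model the 16 byte writes end up as a single DDR3 write once the timer expires, and a write to a different 16-byte line would force the pending line out immediately.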

My bug may be because a write's smart caching takes 1 clock cycle to transfer its new data to the read cache's buffer side, but you appear to have plenty of time in your sim.  Remember, the read cache is there to perform the same function as the write cache.

If you are using a single port directly controlling my 'PHY' module, then there is no caching and you should see the correct data in order of read and write.
« Last Edit: March 20, 2022, 05:35:11 pm by BrianHG »
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #125 on: March 20, 2022, 05:48:36 pm »

Is this expected behavior?
Thanks.
You should not see this behavior.  Please check that the setup time for the write command is placed ahead of the CMD_CLK.  It looks as if my module didn't see your write, or, it took the write at this address: (see attached photo)



To be sure when simulating and sending commands, try offsetting the commands you send by 1/2 CMD_CLK phase so that you can see clearly what is being accepted during the 'rise' of the source clock.

Also, the way you are accessing the ram with the set 'write mask', make sure you have the port width set to 128 bits, otherwise nothing will write.  You have only bits 96 through 127 write enabled.
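
As a quick sanity check of the write mask point above, here is a small Python sketch that works out which byte lanes cover data bits 96 through 127 on a 128 bit port. It assumes one mask bit per byte, with mask bit N enabling data bits [8N+7:8N]; that mapping, and the helper names byte_mask and place_word, are assumptions for illustration, so check the port comments in the controller source for the real convention.

```python
def byte_mask(first_bit, last_bit, port_bits=128):
    """Return the byte-enable mask covering data bits first_bit..last_bit."""
    if first_bit % 8 or (last_bit + 1) % 8:
        raise ValueError("range must start and end on byte boundaries")
    if not 0 <= first_bit <= last_bit < port_bits:
        raise ValueError("range must lie inside the port width")
    mask = 0
    for lane in range(first_bit // 8, last_bit // 8 + 1):
        mask |= 1 << lane           # assumed: mask bit N enables byte lane N
    return mask


def place_word(word32, first_bit):
    """Shift a 32-bit value into position inside the 128-bit write data."""
    return (word32 & 0xFFFFFFFF) << first_bit


mask = byte_mask(96, 127)           # upper 32 bits of a 128-bit port
wdata = place_word(0x12345678, 96)
print(f"wmask = 0x{mask:04X}")      # 16 mask bits for 16 byte lanes
print(f"wdata = 0x{wdata:032X}")    # 128 bits shown as 32 hex digits
```

Under that assumption, only the top 4 of the 16 byte lanes are enabled, which is why the port must really be 128 bits wide for the write to land.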
« Last Edit: March 20, 2022, 06:28:27 pm by BrianHG »
 

Offline davemuscle

« Reply #126 on: March 20, 2022, 06:29:03 pm »

If you are using the multiport module and the read and write channel are on the same CMD_xxx[ # ] bus, and smart cache is enabled, then you should receive the new data as long as there is 1 spare clock between the 2. ...

Writes to the DDR3 are held off until either a new write is sent outside the current cached address, or, the write cache timer has reached 0 due to no additional writes on that port.  The current 'PORT_W_CACHE_TOUT' parameter default is set to 255 CMD_CLKS. ...

I'm only using one element in the CMD_* array. Relevant parameters are:
  • PORT_PRIORITY = '{default:0}
  • PORT_READ_STACK = '{default:4}
  • PORT_W_CACHE_TOUT = '{default:0}
  • PORT_CACHE_SMART = '{default:0}
  • PORT_MAX_BURST  = '{default:256}
  • SMART_BANK =  0
Everything else is the default for the DECA example at 400 MHz.
 

Offline davemuscle

« Reply #127 on: March 20, 2022, 06:43:32 pm »

You should not see this behavior.  Please check that the setup time for the write command is placed ahead of the CMD_CLK.  It looks as if my module didn't see your write, or, it took the write at this address: (see attached photo)

To be sure when simulating and sending commands, try offsetting the commands you send by 1/2 CMD_CLK phase so that you can see clearly what is being accepted during the 'rise' of the source clock.


I'm pretty certain the address is getting sampled by your block correctly. I can see from the memory model prints that when I input address 0x0000 it corresponds to Row/Bank/Col = 0. Just to be sure I inverted the clock going to my logic and got the same result. 'clk' runs my logic and 'tmp' runs your logic in the screenshot.



Also, the way you are accessing the ram with the set 'write mask', make sure you have the port width set to 128 bits, otherwise nothing will write.  You have only bits 96 through 127 write enabled.

The port is set to 128-bits. I load the 128-bit words big-endian to match your controller, so CMD_wdata = 0x12345678 00000000 ... and CMD_wmask = 0xFFFF 0000 ....

Here you can see my writes/reads going into and out of the DDR3 successfully, note the final missing read operation due to the cache:
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #128 on: March 20, 2022, 06:51:04 pm »

If you are using the multiport module and the read and write channel are on the same CMD_xxx[ # ] bus, and smart cache is enabled, then you should receive the new data as long as there is 1 spare clock between the 2. ...

Writes to the DDR3 are held off until either a new write is sent outside the current cached address, or, the write cache timer has reached 0 due to no additional writes on that port.  The current 'PORT_W_CACHE_TOUT' parameter default is set to 255 CMD_CLKS. ...

I'm only using one element in the CMD_* array. Relevant parameters are:
  • PORT_PRIORITY = '{default:0}
  • PORT_READ_STACK = '{default:4}
  • PORT_W_CACHE_TOUT = '{default:0}
  • PORT_CACHE_SMART = '{default:0}
  • PORT_MAX_BURST  = '{default:256}
  • SMART_BANK =  0
Everything else is the default for the DECA example at 400 MHz.

Warning, if 'PORT_CACHE_SMART' is not set to '{default:1}, then you will be reading old stale data since the last read.

Enabling the PORT_CACHE_SMART means if a write has been done at any time, if there is a matching read address cached, that read cache data will immediately reflect what was written to the write cache even before the write data has been sent to the DDR3.  This parameter should always be on unless you are trying to scrounge up 1 last logic cell on a full FPGA, or get that last FMAX MHz.


Even with the 'PORT_W_CACHE_TOUT = '{default:0}', meaning a write will go out to the DDR3 ASAP, the DDR3 always operates at a delay since there is a ton of setup involved.  My controller is trying to prevent unnecessary DDR3 access whenever possible.
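
The stale read discussed in this exchange can be reproduced with a very small behavioural model. The Python sketch below is only an illustration of the described behaviour, not the controller's logic: PortModel and run_sequence are hypothetical names, it assumes a one-line read cache per port, treats every write as a whole 16-byte line, and lets writes reach the DDR3 immediately (as with a write timeout of 0).

```python
LINE_BYTES = 16


class PortModel:
    """Toy model of one port with a one-line read cache (illustrative only)."""

    def __init__(self, smart):
        self.smart = smart
        self.ddr3 = {}              # line address -> data, stands in for the DDR3
        self.rc_line = None         # line address held in the read cache
        self.rc_data = None
        self.ddr3_reads = 0         # number of read commands sent to the DDR3

    def write(self, addr, data):
        line = addr - addr % LINE_BYTES
        self.ddr3[line] = data      # assumed: the write goes straight out
        if self.smart and line == self.rc_line:
            self.rc_data = data     # smart cache: update the matching read line

    def read(self, addr):
        line = addr - addr % LINE_BYTES
        if line != self.rc_line:    # miss: fetch the line from the DDR3
            self.rc_line, self.rc_data = line, self.ddr3.get(line)
            self.ddr3_reads += 1
        return self.rc_data         # hit: served from the read cache


def run_sequence(smart):
    """Replay the write/readback sequence from the bug report."""
    port = PortModel(smart)
    results = []
    for addr, data in [(0x0000, "data1"), (0x1000, "data2"),
                       (0x0000, "data3"), (0x0000, "data4")]:
        port.write(addr, data)
        results.append(port.read(addr))
    return results, port.ddr3_reads


print(run_sequence(smart=False))
print(run_sequence(smart=True))
```

With smart caching off, the fourth readback hits the read cache and returns 'data3' without issuing a DDR3 read, which matches the reported failure. With it on, the write updates the cached read line and 'data4' comes back.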
« Last Edit: March 20, 2022, 07:10:42 pm by BrianHG »
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #129 on: March 20, 2022, 07:04:31 pm »
If you are using my DDR3 V1.5, the:
 PORT_READ_STACK   [0:15]  should be  '{default:16} for maximum read speed when you stack a number of consecutive reads.  Though, with 128bit and if you do not require serious random read stacked events, 4 is perfectly fine.

« Last Edit: March 20, 2022, 07:15:42 pm by BrianHG »
 

Offline davemuscle

« Reply #130 on: March 20, 2022, 07:20:11 pm »

Warning, if 'PORT_CACHE_SMART' is not set to '{default:1}, then you will be reading old stale data since the last read.

Enabling the PORT_CACHE_SMART means if a write has been done at any time, if there is a matching read address cached, that read cache data will immediately reflect what was written to the write cache even before the write data has been sent to the DDR3.  This parameter should always be on unless you are trying to scrounge up 1 last logic cell on a full FPGA, or get that last FMAX MHz.


Even with the 'PORT_W_CACHE_TOUT = '{default:0}', meaning a write will go out to the DDR3 ASAP, the DDR3 always operates at a delay since there is a ton of setup involved.  My controller is trying to prevent unnecessary DDR3 access whenever possible.

In your comment for 'PORT_CACHE_SMART' you list disabling it for memory testing. I wanted to see each request go to the DDR3 without extra logic surrounding it. 'PORT_W_CACHE_TOUT' was disabled for a similar reason. Starting my development dumb & slow then improving it once the basics work.

I enabled the smart cache and it solved the particular test case, however the behavior was unexpected. I'll just leave it on for now.
 
The following users thanked this post: BrianHG

Online BrianHGTopic starter

« Reply #131 on: March 20, 2022, 07:26:57 pm »


Note that if you do not need or want any features of my multiport module with the CMD_xxx interface, it is a waste of space and you will get much better performance just using my PHY controller.  No cache, not smart, send a command and the DDR3 will do it ASAP, and around 1/2 the logic cells.

Example PHY only interface:  https://github.com/BrianHGinc/BrianHG-DDR3-Controller/tree/main/BrianHG_DDR3_DECA_only_PHY_SEQ

The only thing is that your 400MHz controller will have a 200MHz interface only, no option for 100MHz quarter rate unless you use the 'toggle' enable & data ready feature which allows for alternate clock domain command interface.

Each enabled command will always be sent to the DDR3 regardless of address or repeats.  But, you will no longer have the ability to add multiple read/write ports and you are stuck with 128bit.
« Last Edit: March 20, 2022, 07:53:23 pm by BrianHG »
 

Offline davemuscle

« Reply #132 on: April 02, 2022, 02:22:34 am »
Hi,

I'm trying to use the PHY_SEQ only, connected to my custom code. I'm finding that sometimes the CMD_busy signal ends up sticking at 1 and locking all my upstream logic, but not the downstream logic (your block), which ends up performing the same write/read over and over again. This is all with TOGGLE_CONTROLS = 0.

Could the behavior I'm encountering be because 'CMD_ena' and 'refresh_in_progress' assert at the same time? See the red highlight in pt1.png for that. In pt2.png you can see the busy signal get stuck, hopefully with some extra surrounding info.

Also, I just need to make sure that when TOGGLE_CONTROLS=0 the CMD_ena and CMD_busy signals are analogous to something like AXI stream tvalid and tready. It seems like TOGGLE_CONTROLS=1 is your preferred style, would it be better to use that for driving the PHY?

 

Online BrianHGTopic starter

« Reply #133 on: April 02, 2022, 03:01:07 am »
Quote
I'm trying to use the PHY_SEQ only connected to my custom code. I'm finding that sometimes the CMD_busy signals ends up sticking to 1 and locking all my upstream logic, but not the downstream logic (your block), which ends up performing the same write/read over and over again. This is all with TOGGLE_CONTROLs = 0.

Note that with toggle controls at 0, my 'CMD_BUSY' will go high either when the commands going in overflow the command stack, or while an internal refresh request has been posted, in which case it will stay high until that command has been added to the queue.  Whenever the 'CMD_BUSY' is high, all input activity on the CMD_ENA is ignored.

My Modelsim simulation of the internal behavior of this DDR3 command stack processor is in my 'BrianHG_DDR3_CMD_SEQUENCER_tb.sv', with the '.do' batch files 'setup_seq.do' and 'run_seq.do'.

Quote
Could the behavior I'm encountering be because of 'CMD_ena' and 'refresh_in_progress' assert at the same time? See the red highlight in pt1.png for that. In pt2.png you can see the busy signal get stuck with hopefully some extra surrounding info.

If they are asserted at the same time, the refresh in progress should take priority, yet I do assert the CMD_BUSY ahead by 1 clock so you know you should not be sending a command at that time.

Q:  Did you wait long enough for the refresh to run through to see if your entered command came out the other end?  A refresh on a 4Gb DDR3 is something like 350ns.  If you stacked a command or 2 in advance, the busy will stay high until those commands have finally been sent out, in the neighborhood of 400ns later, and don't forget there may still be a few commands in advance to pipe on through before the refresh begins.  (One advantage to using my multiport is that if there are repetitive commands, it runs them in the cache first before bothering with accessing the DDR3.)

Quote
Also, I just need to make sure that when TOGGLE_CONTROLS=0 the CMD_ena and CMD_busy signals are analogous to something like AXI stream tvalid and tready. It seems like TOGGLE_CONTROLS=1 is your preferred style, would it be better to use that for driving the PHY?

Sorry, I am unfamiliar with the 'AXI stream tvalid and tready'.

My toggle mode treats the CMD_ENA_t input like a command address [ 0 ].  So, each command you send, that address should increment in parallel.

The CMD_BUSY_t operates like a return address [ 0 ] telling you which command address has finished processing.

The idea is if your control device driving my DDR3 is running, for example at 100MHz instead of 200MHz, incrementing/toggling that CMD_ENA_t input with every new command is seen by my controller as 1 new command.  Without toggle mode, pulsing the CMD_ENA at 100MHz will be seen as 2 consecutive commands by my 200MHz DDR3 core.  On your device host side, you know you are clear to continue sending commands so long as 'CMD_ENA_t  == CMD_busy_t'.  You can say within your module:

wire DDR3_is_busy = !(my_out_reg_CMD_ENA_t  == input_from_DDR3_phy_CMD_busy_t);
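
The toggle handshake described above can be sketched as a small host-side module. This is a hedged illustration, not code from the controller: the module name, 'cmd_request' and 'cmd_sent' are hypothetical, and it assumes that CMD_busy_t is already safe to sample on the host clock (related clocks from the same PLL, or synchronised beforehand) and that both toggles sit at 0 after reset.

```systemverilog
// Host-side command issuer for toggle mode (sketch under the assumptions above).
module toggle_cmd_host_sketch (
    input  logic clk,          // host clock, e.g. 100 MHz
    input  logic rst,          // synchronous reset (hypothetical)
    input  logic cmd_request,  // hypothetical: host has a command ready to send
    input  logic CMD_busy_t,   // toggle-style return from the PHY
    output logic CMD_ena_t,    // toggle-style enable to the PHY
    output logic cmd_sent      // one-clock pulse, aligned with the change of CMD_ena_t
);
    // Same comparison as the one-line wire above: unequal toggles mean "still busy".
    wire DDR3_is_busy = (CMD_ena_t != CMD_busy_t);

    always_ff @(posedge clk) begin
        if (rst) begin
            CMD_ena_t <= 1'b0;
            cmd_sent  <= 1'b0;
        end else begin
            cmd_sent <= cmd_request && !DDR3_is_busy;
            if (cmd_request && !DDR3_is_busy)
                CMD_ena_t <= ~CMD_ena_t;   // exactly one toggle per command
        end
    end
endmodule
```

The CMD_xxx address and data would be registered on the same clock edge that flips CMD_ena_t, so they are stable when the faster core notices the toggle. Because the module waits for the toggles to match again before sending, it never has more than one command outstanding.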
« Last Edit: April 02, 2022, 03:09:46 am by BrianHG »
 

Online BrianHGTopic starter

« Reply #134 on: April 02, 2022, 03:28:07 am »
Ohh, 1 other thing about the refresh.  After a power-on reset, or reset pulse, the DDR3 will begin to run for around 15 milliseconds before the first initial refresh commands come in.  This is a one time thing after power-up and can be seen in some simulations.  This does not generate any lost or missing data as the CMD_BUSY flag will properly run if needed.  If no CMD_ENA commands are being sent, a small train of sequential refresh commands may run through, but, these additional ones may be interrupted by any CMD_ENA command you send as after the first one, the others are low priority.
 

Online BrianHGTopic starter

« Reply #135 on: April 02, 2022, 03:43:43 am »
I've attached my decoding of your logic waveform:
Do not worry about the internals inside my source.  Wait until the actual commands are sent to the DDR3 and every command you CMD_ENA'ed while the CMD_BUSY was low will make it to the DDR3 when it is permitted due to DDR3 timing constraints and potential row and page selection as well as refresh.

Using the toggle mode =1, you may see how the CMD_ENA_t is toggled with each sent command while the CMD_BUSY_t return appears to you more like an ACKNOWLEDGE becoming equal to the CMD_ENA once a command is accepted.  I do not know the internal working of the AXI system, but an acknowledge style interface may be easier to work with if you generate the toggle out on your side.
« Last Edit: April 02, 2022, 03:48:17 am by BrianHG »
 

Offline davemuscle

« Reply #136 on: April 02, 2022, 04:20:31 am »
Quote
Ohh, 1 other thing about the refresh.  After a power-on reset, or reset pulse, the DDR3 will begin to run for around 15 milliseconds before the first initial refresh commands come in.  This is a one time thing after power-up and can be seen in some simulations.  This does not generate any lost or missing data as the CMD_BUSY flag will properly run if needed.  If no CMD_ENA commands are being sent, a small train of sequential refresh commands may run through, but, these additional ones may be interrupted by any CMD_ENA command you send as after the first one, the others are low priority.

I see a refresh occur around 15 microseconds, if that's what you mean. I won't be getting close to 15 ms with the free version of Modelsim, lol. The first two refreshes shortly complete, but then I get locked up with one that doesn't end. See screenshot.

I tried delaying the CMD_ena signal by a single cycle, to avoid it being asserted on the same edge as 'refresh_in_progress'. The sim was able to get farther than it usually does, until the same issue happened again. Do I need to deassert CMD_ena during a refresh?

Tell me if this is wrong, quick pseudo-code for the CMD_* bus:

if(state)
  cmd_ena <= 1;
  if(cmd_ena & !cmd_busy)
     if(last_xfer)
         cmd_ena <= 0;
         state <= next_state

 

Online BrianHGTopic starter

« Reply #137 on: April 02, 2022, 04:52:26 am »
See attached image.  The ddr3 is working fine.

Note that even though you set the use-toggle =0, the refresh in progress is an internal signal and it is always a toggle style signal.  So, viewing it alone, you cannot see the true refresh request state.  If you want to know the truth about the refresh, you need to make a :

wire busy_doing_a_refresh = ( refresh_req != refresh_in_progress );

Quote
if(state)
  cmd_ena <= 1;
  if(cmd_ena & !cmd_busy)
     if(last_xfer)
         cmd_ena <= 0;
         state <= next_state

What are you trying to do?

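It's more like:
if (!cmd_busy && I_need_to_access_ddr3) begin
     CMD_xxx <= what_to_do;      // address, data, write enable, etc.
     CMD_ENA <= 1;
     state   <= next_state;
end else if (cmd_busy && I_need_to_access_ddr3) begin
     state   <= wait_state;      // hold here until the busy flag clears
end else if (!cmd_busy) begin
     CMD_ENA <= 0;               // nothing to send, release the enable
     state   <= next_state;
end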


Note that the state can be done as combinational logic saving a clock cycle.

wire state = (cmd_busy && I_need_to_access_ddr3) ? wait_state : next_state;

« Last Edit: April 02, 2022, 06:59:21 pm by BrianHG »
 

Online BrianHGTopic starter

« Reply #138 on: April 04, 2022, 11:58:23 pm »
Ooopps, I just looked back at my code.  I made a mistake in my above post.

The logic 'refresh_in_progress' is actually true logic, not toggle logic.

From what I can see, forcing the CMD_ENA indefinitely high has tied up my sequencer, preventing the refresh request from taking place.  The moment the CMD_ENA goes low, the next command entered into the command FIFO stack would be the refresh.  I need to double check that this does not accidentally constitute a potential refresh violation.  (Note that my coding counts the elapsed time of missed refreshes and will stream a continuous block of refreshes when it gets the chance to, maintaining the datasheet's recommended average refresh row count / maximum time period.)  When using my multiport & its toggle-enable set to 1, there is always room for a refresh to enter the queue.
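
One way to avoid holding CMD_ENA high on the host side is to issue it as single-clock pulses, so there is always a gap in which a refresh can enter the queue. The sketch below is an assumption-laden illustration, not part of the controller: the module name, 'need_access' and 'cmd_accepted' are hypothetical, and it assumes a command is taken on any clock where CMD_ena is high while CMD_busy is low, which is how I read the non-toggle description earlier in the thread.

```systemverilog
// Non-toggle command driver that never holds CMD_ena high (sketch).
module cmd_ena_pulse_sketch (
    input  logic clk,
    input  logic rst,           // synchronous reset (hypothetical)
    input  logic need_access,   // hypothetical: host holds this high until cmd_accepted
    input  logic CMD_busy,      // busy flag from the PHY sequencer
    output logic CMD_ena,       // command enable to the PHY sequencer
    output logic cmd_accepted   // assumed acceptance condition, see lead-in
);
    assign cmd_accepted = CMD_ena && !CMD_busy;

    always_ff @(posedge clk) begin
        if (rst)
            CMD_ena <= 1'b0;
        else if (CMD_ena)
            CMD_ena <= 1'b0;                      // one-clock pulse, taken or not
        else
            CMD_ena <= need_access && !CMD_busy;  // only start a pulse when not busy
    end
endmodule
```

If the busy flag rises during the pulse, the command is not counted as accepted and the module simply retries once busy drops. The cost is at most one command every two clocks, so a streaming design would need something smarter; the CMD_xxx payload must be held stable while CMD_ena is high.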
 

Offline davemuscle

« Reply #139 on: April 09, 2022, 03:45:19 am »
Hey,
After developing my Avalon bridge to wrap your PHY+PLL, I decided some benchmarks were in order to compare against the Altera UniPHY IP.

All results below were obtained with my synthesizable Avalon memory tester that writes the entire DDR3 with random data then verifies it. It was configured with a 64-bit data bus and max burst of 256 to match what the UniPHY IP wanted. The memory tester is controlled via a separate JTAG-Avalon IP that measures how long the transaction takes with a TCL script. Both DDR3 instances were clocked at 300 MHz with a half-rate Avalon interface.

Altera UniPHY
Code: [Select]
*** Build Summary ***
Total logic elements : 7,290 / 49,760 ( 15 % )
    Total combinational functions : 6,391 / 49,760 ( 13 % )
    Dedicated logic registers : 3,779 / 49,760 ( 8 % )
Total memory bits : 14,304 / 1,677,312 ( < 1 % )
Embedded Multiplier 9-bit elements : 0 / 288 ( 0 % )
Total PLLs : 1 / 4 ( 25 % )
Total pins : 65 / 360 ( 18 % )

*** Memory Test ***
/devices/10M50DA(.|ES)|10M50DC@1#1-2#Arrow MAX 10 DECA/(link)/JTAG/(110:132 v1 #0)/phy_0/master
Started memory test
Finished memory test
Microseconds recorded: 1011776
Number of passes   : 0x20000000
Number of failures : 0x00000000
Number of ticks    : 0x000f700d

Dave's Bridge + BHG PHY/PLL
Code: [Select]
*** Build Summary ***
Total logic elements : 6,241 / 49,760 ( 13 % )
    Total combinational functions : 3,264 / 49,760 ( 7 % )
    Dedicated logic registers : 4,974 / 49,760 ( 10 % )
Total memory bits : 5,792 / 1,677,312 ( < 1 % )
Embedded Multiplier 9-bit elements : 0 / 288 ( 0 % )
Total PLLs : 1 / 4 ( 25 % )
Total pins : 63 / 360 ( 18 % )

*** Memory Test ***
/devices/10M50DA(.|ES)|10M50DC@1#1-2#Arrow MAX 10 DECA/(link)/JTAG/(110:132 v1 #0)/phy_0/master
Started memory test
Finished memory test
Microseconds recorded: 1030280
Number of passes   : 0x20000000
Number of failures : 0x00000000
Number of ticks    : 0x000fb950

Final throughputs are 506 MB/s for UniPHY, and 497 MB/s for your core. It's entirely possible there is some loss of throughput from my bridge having to buffer commands, so I'd be interested in hearing if you've ever done a similar type of test (how much performance does the controller give over just the phy+pll?)
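
The two throughput figures can be reproduced from the logs with a little arithmetic. The simulation-only sketch below assumes that the 0x20000000 "Number of passes" corresponds to 512 MiB moved during the recorded time; that reading is an assumption, but it gives roughly 506 and 497 when rounded, matching the numbers quoted above (so the unit is effectively MiB/s).

```systemverilog
// Simulation-only arithmetic check of the quoted throughput figures (sketch).
module throughput_check_sketch;
    localparam real TOTAL_BYTES = 536870912.0;  // 0x20000000, assumed to be bytes transferred
    localparam real MIB         = 1048576.0;    // bytes per MiB

    // Convert a recorded duration in microseconds into MiB per second.
    function automatic real mib_per_second(input real microseconds);
        return (TOTAL_BYTES / MIB) / (microseconds / 1.0e6);
    endfunction

    initial begin
        $display("UniPHY      : %0.2f MiB/s", mib_per_second(1011776.0));
        $display("BHG PHY/PLL : %0.2f MiB/s", mib_per_second(1030280.0));
        $finish;
    end
endmodule
```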

I'm going to call you the winner, based on:
  • the UniPHY core often fails timing unless you massage map/fit options into your build and watch the fitter spin for 4x as long
  • your core can run faster than 300 MHz
  • easier to simulate and include in a design
:-+
 
The following users thanked this post: BrianHG

Online BrianHGTopic starter

« Reply #140 on: April 09, 2022, 04:17:39 am »

:-+ Thanks a million for the verification and comparison.

One even bigger plus of my core is it can run @300MHz on a -8.  Altera's Uniphy requires a -6 to run in software mode.  Not to mention I support Cyclone III/IV which are missing differential DQS ports necessary for DDR3.

I'm deciding whether my next move will be to bring my design to Lattice ECP5 FPGAs, or to clean up my core into version 2.0 to gain a few more percentage points of performance as well as further improve the robustness of fitting a design with a timing report all in the black.  I know with Cyclone III & IV, you can achieve 450MHz with a timing report of 100% in the black, but some of the fitter options require a number of tweaks.
« Last Edit: April 09, 2022, 05:07:41 am by BrianHG »
 

Online BrianHGTopic starter

« Reply #141 on: April 09, 2022, 04:26:34 am »
Quote
Final throughputs are 506 MB/s for UniPHY, and 497 MB/s for your core. It's entirely possible there is some loss of throughput from my bridge having to buffer commands, so I'd be interested in hearing if you've ever done a similar type of test (how much performance does the controller give over just the phy+pll?)

Note that my core essentially has a 4 command input FIFO, and the read results do come out way delayed due to the nature of DDR3 read setup, so to get read performance you need to stream those read commands to get that perfect continuous unbroken consecutive burst.

When using my Multiport, it handles a lot of this work for you behind the scenes if you leave my default CMD_XXX parameter features enabled.  If you got my PHY-only setup working, this shouldn't be a problem as the ports are compatible if you set the data bit width to the same number.

Even with the extra gates, it is still useful to generate an Avalon interface running with my full controller, as the extra ports allow sharing with my multi-window HDMI display engine, which will receive commands through the Avalon port since its display controls are addressable through all the available memory ports simultaneously.
« Last Edit: April 09, 2022, 04:30:59 am by BrianHG »
 

Online BrianHGTopic starter

« Reply #142 on: April 09, 2022, 04:47:31 am »
Quote
Final throughputs are 506 MB/s for UniPHY, and 497 MB/s for your core.

Set to 300MHz with my multiport, I'm getting a throughput of ~1100MB/s.  Note that this has been achieved running my video graphics adapter in 1080p mode with 2 translucent 32bit windows superimposed on top of each other.  My controller takes full advantage of large sequential bursts, and my VGA controller bursts 4kb at a time per window.  This performance should be matched if you were to build ALU/DSP modules, like FFT and convolution filters, which may also burst in large linear chunks.  This is at the edge of my controller's efficiency.  Running the controller at 350MHz and above leaves enough room for other parallel tasks as well.

Note that with or without my Multiport, my PHY Only achieves the same performance.  It is just the consecutive and large throughput nature of video which allows these speeds.
« Last Edit: April 09, 2022, 06:38:50 am by BrianHG »
 

Online BrianHGTopic starter

« Reply #143 on: April 09, 2022, 05:10:49 am »
Quote
Final throughputs are 506 MB/s for UniPHY, and 497 MB/s for your core. It's entirely possible there is some loss of throughput from my bridge having to buffer commands, so I'd be interested in hearing if you've ever done a similar type of test (how much performance does the controller give over just the phy+pll?)

If you want, you can try the test again swapping my parameter 'BANK_ROW_ORDER'.  Depending on how you are accessing the DDR3, it may help improve throughput.
 

Offline davemuscle

« Reply #144 on: April 09, 2022, 05:35:56 am »
It got a bit slower with BANK_ROW_ORDER = "BANK_ROW_COL", 1052606 microseconds for the whole RAM. My initial testing was with "ROW_BANK_COL". I'm not sure what should be the more appropriate setting for doing only large upward bursts.

My bridge is set up to stream commands as quickly as the Avalon port can give them and the PHY can take them. The slowest path would be when a read is requested and the FIFO for decoding the returning BL8 is nearly full. In the future I'll consider branching my memory tester to talk straight to your PHY and check for a speed difference, then naturally run it against the controller too. I'll have to think about this 1100 MB/s figure a bit more.
 

Online BrianHGTopic starter

« Reply #145 on: April 09, 2022, 05:38:46 am »
Quote
Final throughputs are 506 MB/s for UniPHY, and 497 MB/s for your core. It's entirely possible there is some loss of throughput from my bridge having to buffer commands, so I'd be interested in hearing if you've ever done a similar type of test (how much performance does the controller give over just the phy+pll?)

In half-rate mode using my full controller with the Multiport at 64bit, you should be approximately doubling your throughput.  However, you need to use my default parameters.  This means having a read stack set to 16 and write cache timeout set to 255, etc...

In my HDL comments, when I mentioned making a 'memory testing algorithm', I meant if you were trying to test the RAM chip's memory cells, not the integrity of my controller.

My multiport is designed to squeeze together 2 consecutive 64bit chunks into more efficient 128bit packets for my controller.  So long as Avalon can perform back-to-back reads or writes at 150MHz at 64bit, my multiport will do the lifting for you.
« Last Edit: April 09, 2022, 06:15:34 am by BrianHG »
 

Online BrianHGTopic starter

« Reply #146 on: April 09, 2022, 05:57:50 am »
Quote
It got a bit slower with BANK_ROW_ORDER = "BANK_ROW_COL", 1052606 microseconds for the whole RAM. My initial testing was with "ROW_BANK_COL". I'm not sure what should be the more appropriate setting for doing only large upward bursts.

With BANK_ROW_COL mode, you can divide your RAM into 2/4/8 chunks and, with my multiport, assign for example 1 CPU onto bank 0, video onto banks 1 & 2, and sound onto bank 3.  Having the bank at the top of the address space means that as each peripheral accesses its own region of memory, that bank is remembered and kept open, and as other peripherals access their own memory regions, their banks are opened and closed only as necessary.  It almost makes it as if you have 8 separate RAM controllers.

This also helps if you are copying or processing huge sequential chunks of RAM from an upper bank to a lower one, as my RAM controller knows to keep the rows of the 2 different sections open simultaneously during the transfer, eliminating all the precharge and activate commands which would normally happen after each BL8.  Now, the precharge and activate only happen when a new row is required in either or both sections of RAM you may be copying to and from.
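
The difference between the two BANK_ROW_ORDER settings can be pictured as a different slicing of the same linear address. The module below is only an illustrative sketch, not the controller's actual decoder: the module and port names are hypothetical, the address is treated as a column-granular word address, and the 15 row / 3 bank / 10 column bit widths are an assumption that fits a typical 4Gb x16 DDR3 part.

```systemverilog
// Illustrative address slicing for the two BANK_ROW_ORDER settings (sketch).
module addr_split_sketch #(
    parameter string BANK_ROW_ORDER = "ROW_BANK_COL", // or "BANK_ROW_COL"
    parameter int    ROW_BITS  = 15,                   // assumed for a 4Gb x16 device
    parameter int    BANK_BITS = 3,
    parameter int    COL_BITS  = 10
)(
    input  logic [ROW_BITS+BANK_BITS+COL_BITS-1:0] addr,  // linear word address
    output logic [BANK_BITS-1:0]                   bank,
    output logic [ROW_BITS-1:0]                    row,
    output logic [COL_BITS-1:0]                    col
);
    generate
        if (BANK_ROW_ORDER == "BANK_ROW_COL") begin : g_bank_on_top
            // Bank bits are the top address bits: each bank is one contiguous
            // eighth of the address space, handy for one-peripheral-per-bank layouts.
            assign {bank, row, col} = addr;
        end else begin : g_row_on_top
            // Bank bits sit just above the column bits: a long upward burst
            // moves to the next bank each time it runs off the end of a row.
            assign {row, bank, col} = addr;
        end
    endgenerate
endmodule
```

With the bank bits on top, a single large upward burst stays in one bank and has to change rows within that bank, which may be part of why the linear memory test above ran slightly slower in that mode.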


« Last Edit: April 09, 2022, 06:00:18 am by BrianHG »
 

Online BrianHGTopic starter

« Reply #147 on: April 09, 2022, 06:30:32 pm »
Quote
I'm going to call you the winner, based on:
  • the UniPHY core often fails timing unless you massage map/fit options into your build and watch the fitter spin for 4x as long
  • your core can run faster than 300 MHz
  • easier to simulate and include in a design
:-+

You forgot the largest point.
IT'S FREE!!! and opensource.
« Last Edit: April 09, 2022, 06:41:09 pm by BrianHG »
 

Offline davemuscle

« Reply #148 on: April 11, 2022, 02:44:54 am »
Assuming that all the TOGGLE_* parameters are kept the same, can the controller be used as a drop-in replacement for the PHY+PLL?

I've created a wrapper that allows you to switch between the two, kept all other code constant, and my memory tester locks up on the controller version but not the PHY version. I made sure to use TOGGLE_OUTPUTS = 1, and TOGGLE_INPUTS = '{default:1} for the controller parameters to match my TOGGLE_CONTROLS = 1 for the PHY setup.
 
Since the test never completes, I assume I'm encountering the 'long refresh' that made me switch from the controller to the PHY in the first place. Can you confirm the timing diagram for toggle-mode below? That's what it looks like for the PHY setup, but for the controller setup CMD_busy toggles a cycle earlier, in a combinatorial way. This makes me think there are some differences with the front-end interface.


 

Online BrianHGTopic starter

« Reply #149 on: April 11, 2022, 03:16:17 am »
Ok, one of the features of my Multiport module is that it was designed to use positive enable logic and convert its output to the toggle style which my PHY module prefers.

Looking at my basic example: BrianHG_DDR3_DECA_Show_1080p_v15_375Mhz_HR/BrianHG_DDR3_DECA_top.sv,
The instantiation of the: 'BrianHG_DDR3_CONTROLLER_v15_top'

(*** Careful, use the V15 versions here...)

The parameter array '.PORT_TOGGLE_INPUT  (PORT_TOGGLE_INPUT),' lets you select which CMD_xxx [ # ] ports operate in a toggle mode, which should behave virtually identically to my core's 'BrianHG_DDR3_PHY_SEQ.sv' in its toggle mode.  Note that my PHY module's 'USE_TOGGLE_CONTROLS' is no longer accessible.

When using the toggle mode, every toggle can happen every single clock and the command will be accepted every single clock the toggle has taken place.   It would be the same if you disabled the .PORT_TOGGLE_INPUT for that port # and left the CMD_ena high for every clock.  The difference is how the busy and return will work.  In toggle mode, you can keep sending a toggle command every clock as long as the (CMD_busy == CMD_ena).  Every time the CMD_read_ready toggles, you know a new read word and new read vector out is ready.  With toggle disabled, the CMD_read_ready will be high when new valid data is ready, otherwise it is low.
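
If downstream logic prefers the plain "high when valid" style, a toggle-style CMD_read_ready can be turned back into a one-clock strobe with a single flip-flop and a compare. This is a generic sketch rather than code from the controller: the module name, 'CMD_read_ready_t' and 'read_valid' are hypothetical, and it assumes 'clk' is the same clock the toggle is generated on, so every toggle is seen exactly once.

```systemverilog
// Convert a toggle-style ready into a one-clock valid strobe (sketch).
module toggle_to_pulse_sketch (
    input  logic clk,               // assumed: same clock domain as the toggle source
    input  logic rst,               // synchronous reset (hypothetical), held for at least 1 clock
    input  logic CMD_read_ready_t,  // toggles once for every new read word
    output logic read_valid         // high for exactly one clock per toggle
);
    logic ready_prev;

    // Remember last clock's toggle value.
    always_ff @(posedge clk)
        ready_prev <= CMD_read_ready_t;

    // Any difference from the previous value means a new word has arrived.
    assign read_valid = !rst && (CMD_read_ready_t != ready_prev);
endmodule
```

Because the comparison is made every clock, back-to-back toggles on consecutive clocks produce a continuous run of valid strobes, which matches the "every toggle can happen every single clock" behavior described above.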

It is at this point where I say that if you are using my full controller, you are better off disabling the toggle option and using the plain enable true/false logic.  My original toggle feature was to allow my core to run at, for example, 200MHz while running my multiport at 100MHz or 50MHz, or 400MHz.  The interface between the 2 with the toggle feature allows for any type of clock frequency crossing without added headaches.  I added the toggle feature to the multiport's CMD_xxx ports as an afterthought in case someone wanted to interface with slower or faster logic, but I have not extensively tested it.
« Last Edit: April 11, 2022, 03:19:48 am by BrianHG »
 

Online BrianHGTopic starter

« Reply #150 on: April 11, 2022, 03:25:40 am »
Note that the purpose of my multiport was to make my DDR3 controller's interface appear exactly like a 16 port altsyncram FPGA blockram function.  You just need to be attentive to the CMD_busy when reading or writing, and wait for the CMD_read_ready to see your read result ohhh so many clock cycles later.  So, to get the full read performance, you need to remember to post a bunch of reads ahead of time.  My CMD_read vector in/out ports do some of the lifting, offering you a means of designating a destination for each posted read command.
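
The idea of tagging each posted read with a destination can be sketched as follows. Everything here is hypothetical naming for illustration: it assumes the tag supplied on the read vector input when a read is posted comes back on the read vector output alongside the data, and that the ready signal is in the plain (non-toggle) style.

```systemverilog
// Route returning read data to a destination chosen when the read was posted (sketch).
module read_router_sketch #(
    parameter int DATA_W = 128,  // read data width (assumed)
    parameter int VEC_W  = 2     // tag width, giving 2**VEC_W destinations (assumed)
)(
    input  logic                clk,
    input  logic                read_ready,       // non-toggle style: high when data is valid
    input  logic [DATA_W-1:0]   read_data,
    input  logic [VEC_W-1:0]    read_vector_out,  // tag returned with the data
    output logic [DATA_W-1:0]   dest_data  [0:(1<<VEC_W)-1],
    output logic [(1<<VEC_W)-1:0] dest_valid      // one-clock strobe per destination
);
    always_ff @(posedge clk) begin
        dest_valid <= '0;                          // strobes default to low
        if (read_ready) begin
            dest_data [read_vector_out] <= read_data;
            dest_valid[read_vector_out] <= 1'b1;   // flag only the tagged destination
        end
    end
endmodule
```

With a scheme like this the host can post a whole run of reads without tracking their order, since each returning word carries its own destination.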
« Last Edit: April 11, 2022, 03:30:52 am by BrianHG »
 

Online BrianHGTopic starter

« Reply #151 on: May 23, 2022, 07:36:50 pm »
Version 1.6 update...

 :palm: After 3 days of work and test compiles, I finally found my Quarter Rate setting bug.  And I flew by it a dozen times...

   I forgot to clock latch in the read_data_valid_toggle signal going from the half-rate clock domain to the quarter-rate domain.  With heavy reads, that toggle signal may come in on the first or second half of the Quarter clock's period.  When operating in Half-rate, there is no Quarter rate and that latch always arrives by the next clock, so no problem.  At Quarter rate, usually, with slow single memory reads, or continuous even-length bursts, that latch signal just happens to always be aligned at the beginning of the Quarter-rate clock, maintaining proper function.  But with bursts with just the right pacing and an odd length buried inside, like when my multi-window VGA generator has 2 super-imposed windows, one with an odd number of pixels at just the right position, a read data valid may align itself to the second half of the Quarter-rate clock, as the DDR3 controller may have an odd number of half-rate clock cycles until the read_data_valid_toggle.  This causes a loss of synchronization, as that read is latched at the wrong time with the wrong vector data, which my multiport commander uses to know when a read arrives and which port the read data belongs to.

   After 3 days, I thought I had a fundamental architecture problem, but no...  :phew:  However, along the way, I added a few new features and further improved the .sdc constraints rendering a further improvement in achieving a high level FMAX.

Full proper release DDR3 v1.6 with VGA controller will come in a day after I clean up my mess.
« Last Edit: May 23, 2022, 07:41:54 pm by BrianHG »
 
The following users thanked this post: voltsandjolts

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #152 on: May 24, 2022, 12:15:12 am »
Arrrrgggg, ok, for my changes, once I removed my watch dog timer, it works fine for 30 seconds or so, but still seizes up.  Still superior to instantly seizing up, or garbage reads, but there must be another signal somewhere interrupting the system.  It also won't seize up if I use just the video channel, but start the RS232 debugger and it eventually craps out.  Yet in half-rate, everything runs AOK.
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #153 on: June 11, 2022, 02:33:29 am »
Release V1.6 Demo .sof programming files of DECA BrianHG_DDR3_Controller v1.6 and multi-window BrianHG_GFX_VGA_Window_System v1.6 for Arrow DECA eval board.

  >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D
  >:D  500MHz/1GTPS! with 2x32bit 1080p60 video layers.    >:D
  >:D  That's >1100 megabytes/sec just to show the image,  >:D
  >:D  never mind simultaneously drawing all those ellipses.  >:D
  >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D

Just open your JTAG programmer and add one of the following 2 files:
1. 'BrianHG_DDR3_DECA_GFX_DEMO_v16_1_LAYER_500MHzQR.sof'
        -> Replaces and replicates the original Ellipse Generator now using the new BrianHG_GFX_VGA_Window_System.

2. 'BrianHG_DDR3_DECA_GFX_DEMO_v16_2_LAYERS_500MHzQR.sof'
        -> Improved original Ellipse Generator demo where a second translucent superimposed video window scrolls at different coordinates and speeds generating an LSD trip visual effect.  (Note that the scroll switch needs to be turned on long enough to bounce off at least 1 window edge to view the effect.)

Check-on the 'Program/Configure' and click 'Start' to program.
The DECA's HDMI should output a 1080p image.


IMPORTANT NOTE:
If the picture is still or scrolling noise, just press buttons 0 or 1, or flip 'Switch 0' to enable drawing ellipses.  You just powered up the demo in frozen picture mode and you are looking at the powered up random blank memory.


Switch 0 = Enable/Disable drawing of ellipses.
Switch 1 = Enable/Disable screen scrolling.
Button 0 = Draw data from random noise generator.
Button 1 = Draw color image data from a binary counter.

Full Github v1.6 source code.
https://github.com/BrianHGinc/BrianHG-DDR3-Controller
« Last Edit: June 13, 2022, 08:46:23 pm by BrianHG »
 
The following users thanked this post: ale500

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #154 on: June 12, 2022, 03:23:46 am »
New VGA video system demo configured for up to 16 window layers driven by my RS232_Debugger.  :box:

Code: [Select]
//****************************************************************************************************************
//
// Demo documentation.
//
// BrianHG_DDR3_DECA_GFX_HWREGS_v16_16_LAYERS which test runs the BrianHG_DDR3_CONTROLLER_top_v16
// DDR3 controller with the BrianHG_GFX_VGA_Window_System_DDR3_REGS.
//
// Version 1.60, June 9, 2022.
//
// Written by Brian Guralnick.
// For public use.
// Leave questions in the [url]https://www.eevblog.com/forum/fpga/brianhg_ddr3_controller-open-source-ddr3-controller/[/url]
//
//****************************************************************************************************************

A pre-built DECA compatible programming .sof file : BrianHG_DDR3_DECA_GFX_HWREGS_v16_16_LAYERS.sof should be used for this demo.

This demo requires a PC with a RS232 <-> 3.3v LVTTL converter and the use of my RS232 debugger to live edit window controls.
All necessary files are found in this project's sub-folder 'RS232_debugger'.

Wiring: On DECA PCB, connector P8.
    P8-Pin 2 - GND            <-> PC GND
    P8-Pin 4 - GPIO0_D[1] out --> PC LVTTL RXD
    P8-Pin 6 - GPIO0_D[3] in  <-- PC LVTTL TXD

See Readme.txt file in the .zip for full documentation.
See 'BrianHG_GFX_VGA_Window_System.txt' for address controls.
See 'BrianHG_GFX_VGA_Window_System.pdf' for system block diagram.
« Last Edit: June 12, 2022, 03:26:54 am by BrianHG »
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #155 on: June 12, 2022, 07:32:56 pm »
As of today, full Github v1.6 source code has now been released:

https://github.com/BrianHGinc/BrianHG-DDR3-Controller

Expect 300MHz-350MHz builds to always meet timing requirements.  (Even with a slow fabric -8.)
Expect 400MHz builds to usually meet timing requirements with the occasional need to massage some compiler/fitter settings to aid in meeting timing requirements.  (Still easier than making Altera's paid Uniphy DDR3 controller achieve only 300MHz.)
Expect Cyclone III/IV -6 can meet timing requirements at 450MHz.  Even 500MHz is possible with heavy massaging of fitter settings.

Expect my BrianHG_GFX_VGA_Window_System to run up to 32 windows in 480p, 16 in 720p, 8 in 1080p.
Also expect my BrianHG_GFX_VGA_Window_System to automatically simplify down to minimal gates when lowering layers down to 1, disabling palette / font/tile modes and hard-wiring numerous video/window settings.
« Last Edit: June 13, 2022, 02:18:18 am by BrianHG »
 
The following users thanked this post: nockieboy

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #156 on: August 22, 2022, 06:36:18 pm »
I just ordered one of these (apparently new) Gowin 20k boards. I'm going to see if I can port your design across, which is going to be ... challenging ... because I know little about DDR3 and nothing at all about Gowin, but if I can get it working, having a pre-built DDR3/FPGA board with that many GPIO for that price in an easily-embeddable DIMM is going to be kinda useful :)
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #157 on: August 22, 2022, 07:11:26 pm »
Quote
I just ordered one of these (apparently new) Gowin 20k boards. I'm going to see if I can port your design across, which is going to be ... challenging ... because I know little about DDR3 and nothing at all about Gowin, but if I can get it working, having a pre-built DDR3/FPGA board with that many GPIO for that price in an easily-embeddable DIMM is going to be kinda useful :)

Begin with nothing more than implementing my simple 'BrianHG_DDR3_PHY_SEQ' controller.  It is half the size and once you have that working, implementing the multiport with everything else will be much easier as they have no special code and are only needed if you need 16 read/write ports.

So long as Gowin can deal with System Verilog, the only 2 HDL modules you will have to adapt will be:
BrianHG_DDR3_PLL.sv
BrianHG_DDR3_IO_PORT_ALTERA.sv

If Gowin uses or can use Modelsim, this will be a great help as I have already set all this up.  I still do recommend downloading Altera/Intel's free Quartus v20.1 (not v21.x) and at least installing that Modelsim, as it has Altera's PLL and DDR_IO libraries so you can see what the original is supposed to look like, as I created setup_xxx.do script files which simulate everything individually.  Don't worry, you may have multiple versions of Modelsim in your system at the same time.

This link: https://github.com/BrianHGinc/BrianHG-DDR3-Controller/tree/main/BrianHG_DDR3_DECA_PHY_SEQ_only_v16

Contains my simple stand alone 'BrianHG_DDR3_PHY_SEQ' controller wired to my RS232 debugger allowing you to view and edit the DDR3 memory contents from a PC with a LVTTL <-> RS232 com port, and it will report DDR3 tuning status and you can leave it running while updating your Gowin firmware.  Documentation is on my Github's read-me.

Step #1 would be to see if you can simulate a Gowin PLL with the phase step up and phase step down controls my DDR3 controller requires.  This means concentrating exclusively on replicating nothing more than my 'BrianHG_DDR3_PLL.sv' and its stand alone testbench with its 4 clock outputs.  If you are lucky, Gowin should provide their own PLL library functions in their own simulator.
« Last Edit: August 22, 2022, 07:39:35 pm by BrianHG »
 
The following users thanked this post: SpacedCowboy

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #158 on: August 23, 2022, 01:44:36 am »
Thanks Brian,

Quote
Step #1 would be to see if you can simulate a Gowin PLL with the phase step up and phase step down controls my DDR3 controller requires.  This means concentrating exclusively on replicating nothing more than my 'BrianHG_DDR3_PLL.sv' and its stand alone testbench with its 4 clock outputs.  If you are lucky, Gowin should provide their own PLL library functions in their own simulator.

So this just got harder..

'Gowin' and 'Simulator' seem to be words that do not exist in the same sentence, other than in sentences where "do not have one" also appears... And this is in the "licensed" (ie: please send us your email) version, not the 'educational' one.

There used to be an option in the IDE (I've seen screenshots!) where you could call into a 3rd party simulator, but that seems to have been removed (why ?!) There is an option to generate .vo "post-PnR simulation model files" in the options, but other than that there's sweet Fanny Adams to help out.

Clearly it'd be useful to have the same simulation environment as you're using, but I don't have Modelsim, and I can't find out how expensive it is... I sent off an email to the 'contact us' page at Siemens, but I have a bad feeling about software that doesn't advertise its price *anywhere*... Even my most recent eye-wateringly-expensive purchase (Altium) had advertised prices...

On the upside, it looks as though System Verilog (2017) is supported. And the boards I bought have shipped straight away, which is nice. It's been many (many!) moons since I could claim a student version of anything (and anyway it looks as though they've removed the free student edition for now), so on the down-side, I may be using icarus verilog or something similar...


 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #159 on: August 23, 2022, 02:21:30 am »
Quote
Clearly it'd be useful to have the same simulation environment as you're using, but I don't have Modelsim, and I can't find out how expensive it is... I sent off an email to the 'contact us' page at Siemens, but I have a bad feeling about software that doesn't advertise its price *anywhere*... Even my most recent eye-wateringly-expensive purchase (Altium) had advertised prices...

Modelsim is free.  It comes with Quartus 20.1 free web version and earlier.  (Includes Quartus Megafunction Libraries.)
It also comes with Lattice Diamond Design Software 3.12 and later.  (This version includes Lattice's library functions.)
I do not know about Gowin.

     If you do not include the appropriate -L xxxxx  on the compile line to include the vendor's library functions, then it's a basic Modelsim with over 90% functionality.  There are only 1 or 2 advanced post generation function views which aren't available unless you buy the full Modelsim, but these are available in Quartus and Lattice itself.

     I personally have begun to develop completely in Modelsim alone and then move my design to the FPGA tools, as Modelsim's compile/build time is usually within a second.

Try googling:
HDL modelsim gowin fpga
and
HDL Active-HDL gowin fpga

Active-HDL is somewhat close to Modelsim, but a cheaper experience.

Take a look at my https://github.com/BrianHGinc/SystemVerilog-TestBench-BPM-picture-generator as I made it work for both simulators.  Only difference is in the setup-xxx.do files.

You can look here:
https://www.intel.com/content/www/us/en/software-kit/661015/intel-quartus-prime-standard-edition-design-software-version-20-1-for-windows.html and click on 'Individual Files' where you will see Modelsim as a stand-alone download.

But the listed modelsim there only has the added Altera/Intel libraries, yet almost everything else works.
« Last Edit: August 23, 2022, 02:26:29 am by BrianHG »
 
The following users thanked this post: SpacedCowboy

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #160 on: August 23, 2022, 02:28:06 am »
A-ha! Ok, thanks :)

I knew about the bundled versions, but I’d assumed they were vendor-locked in some way. Well, that makes things simpler, at least to start with :)
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #161 on: August 23, 2022, 02:36:18 am »
If you can get Gowin to generate simulation libraries for their functions, like PLL and DDR IO buffers, you may be able to include those with Altera's Modelsim as a work around hack.
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #162 on: August 23, 2022, 02:47:31 am »
Quote
A-ha! Ok, thanks :)

I knew about the bundled versions, but I’d assumed they were vendor-locked in some way. Well, that makes things simpler, at least to start with :)

It's not that they are vendor locked, it's just that you need to learn the command line stuff or use the menus instead of relying on the FPGA tool to 'auto setup and run your simulation'.
 

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #163 on: August 23, 2022, 03:51:36 am »
Quote
It's not that they are vendor locked, it's just that you need to learn the command line stuff or use the menus instead of relying on the FPGA tool to 'auto setup and run your simulation'.

That's no problem, trust me  :-DD To me, 'vi' is a luxury, it was 'ed' when I started :) command-line tools are A-OK with me 
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 3971
  • Country: nz
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #164 on: August 23, 2022, 04:14:19 am »
Quote
It's not that they are vendor locked, it's just that you need to learn the command line stuff or use the menus instead of relying on the FPGA tool to 'auto setup and run your simulation'.

That's no problem, trust me  :-DD To me, 'vi' is a luxury, it was 'ed' when I started :) command-line tools are A-OK with me

Ed? Pffft.  I used to work on a system where the best editor was an even more obscure and limited variant of TECO called SPEED.
 

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #165 on: August 23, 2022, 05:02:36 am »
There was one time on a DECstation that had corrupted its usr partition, when I had to use only what was in /boot to get it to change how it booted. vi was in /usr/... :( head, cat and tail were in /bin... Took a while to get the boot config files how I wanted them. Worked in the end though :)

But I digress - sorry Brian, I'll keep it on-topic from now :)
 

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #166 on: August 27, 2022, 06:46:27 pm »
Brian, am I reading this correctly ? In your BrianHG_DDR3_PLL.sv code, it looks like the delay-shift for the Altera PLLs can take any number between 0 and 4000ps. Is that correct ?

Because looking at the Gowin version of what you can do with the PLL, there are 16 possible phase-tuning parameters (step of 22.5°), and another 16 possible delay parameters (step of 0.125ns):

Code: [Select]
///////////////////////////////////////////////////////////////////////////////
// Phase control values
// --------------------
// 0000 0°          0001 22.5°          0010 45°            0011 67.5°
// 0100 90°         0101 112.5°         0110 135°           0111 157.5°
// 1000 180°        1001 202.5°         1010 225°           1011 247.5°
// 1100 270°        1101 292.5°         1110 315°           1111 337.5°
//
// Duty cycle values
// -----------------
// 0010 2/16        0011 3/16           0100 4/16           0101 5/16
// 0110 6/16        0111 7/16           1000 8/16           1001 9/16
// 1010 10/16       1011 11/16          1100 12/16          1101 13/16
// 1110 14/16
//
// Delay parameters (below are in manual, looks like others work)
// ----------------
// 0111 0.875ns     1011 1.375ns        1101 1.625ns        1110 1.75ns
// 1111 1.875ns
//
///////////////////////////////////////////////////////////////////////////////

From what I can see, the duty-cycle is there to indicate where the falling edge of the signal should be in the waveform, which is dependent on the phase, so for a 50/50 duty cycle, I ought to just add 8 to the phase and take the result modulo 16.

There are also 'fine-tuning' (±50ps, ±100ps, ±150ps) options for the phased-clock output, but those are parameter-based, not dynamically tunable.

But in any event, my number of phase-discriminating steps is going to be a *lot* less than the Altera one if it can do 4000 of them, hopefully it'll still be useful enough...
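As a quick check of that reading of the table, here is a small Python sketch. The function names are made up for illustration, and the "add 8, modulo 16" rule is the interpretation given above rather than something confirmed by Gowin's documentation. Note that the table only lists duty codes 2 to 14, so whether the codes this rule produces for phases 7, 8 and 9 (15, 0 and 1) are accepted is an open question here.

```python
def phase_degrees(code):
    """Convert a 4-bit phase control value to degrees (22.5 degree steps)."""
    if not 0 <= code <= 15:
        raise ValueError("phase code must be a 4-bit value")
    return code * 22.5


def duty_code_for_50_50(phase_code):
    """Duty code placing the falling edge half a period after the rising edge.

    Assumption from the post: duty = (phase + 8) mod 16.
    """
    if not 0 <= phase_code <= 15:
        raise ValueError("phase code must be a 4-bit value")
    return (phase_code + 8) % 16


# Example: the 270 degree write clock phase is code 0b1100.
write_phase_code = 0b1100
write_phase_deg = phase_degrees(write_phase_code)          # 270.0
write_duty_code = duty_code_for_50_50(write_phase_code)    # 4
```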
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #167 on: August 27, 2022, 07:57:36 pm »
For the fixed integer '270' deg, I specify the integer parameter DDR3_WDQ_PHASE in 'degrees'.  To tell Altera how to adjust its PLL, I needed to convert it into ps.  If your rPLL accepts the integer 270, use that one.  It is 50:50.  Ignore the Cyclone V PLL as it is a mess.  Better look at the CycloneIV/MAX10 PLL as it has fewer controls.

As for the read clock user phase tunable output, it is 50:50.  When the system begins or reset is sent, it defaults to '0' degree.  The Altera PLL will accept 16 tuning steps before a full 360deg round trip has been made.  It stays at 50:50.  (looks the same as Gowin)

The parameter DDR3_RDQ_PHASE is actually never used.  I always have it set to '0'.

The 'trick pll' should never be used for Gowin.  I use it to bypass a cap in one of Altera's DDR_IO buffers.

The localparam 'DDR3_WDQ_PHASE_ps' is a translation of the user set DDR3_WDQ_PHASE  into a picosecond delay.

Ignore the Altera dummy string as it circumvents a bug in the Altera alt_pll functions where they wanted 'string' inputs.
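For reference, the degrees-to-picoseconds translation amounts to the following arithmetic, sketched here in Python. This is only the general formula, assuming a 400MHz DDR3 clock; the exact expression and rounding used for 'DDR3_WDQ_PHASE_ps' inside BrianHG_DDR3_PLL.sv may differ, and the function name is made up.

```python
def phase_deg_to_ps(phase_deg, clk_khz):
    """Translate a phase in degrees into a delay in picoseconds.

    clk_khz is the clock frequency in kHz; one period is 1e9 / clk_khz ps.
    Integer arithmetic is used, so results are truncated to whole ps.
    """
    period_ps = 1_000_000_000 // clk_khz
    return phase_deg * period_ps // 360


# Assumed example: 400 MHz clock (2500 ps period), 270 degree write phase.
wdq_phase_ps = phase_deg_to_ps(270, 400_000)
```

With those assumed numbers the 270 degree write phase works out to 1875 ps, i.e. three quarters of the 2500 ps period.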
« Last Edit: August 27, 2022, 08:07:44 pm by BrianHG »
 
The following users thanked this post: SpacedCowboy

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #168 on: August 27, 2022, 10:17:38 pm »
Ok, thanks Brian, I have both yours and my signals pretty much matching up initially. It looks as though the Gowin PLL takes longer to initialize. The internal 100MHz clock is slightly different - it's a cycle delayed (or ahead, I guess) of the Altera one but it's still in sync. Not sure if that's important because you've presumably got clock-crossing controls in place. Could alter some of the timing of those signals, though.


Presumably the phase_step signal is edge-triggered rather than level-triggered ? From the signaling, it certainly looks that way (see below). My WIP PLL module is going nuts at the moment, changing its phase on every (phase_sclk == 1'b1 and phase_step == 1'b1)...
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #169 on: August 27, 2022, 10:42:59 pm »
Edge or level triggered doesn't matter.  As long as a single step is made anytime after the step goes high.  (That is still a single step even if the step is held high.)
What is important is the direction and that there are 16 steps for a full rotation.

When my DDR3 controller applies a step, it waits for ~ 1us for the PLL output to adapt, hence I basically ignore the 'phase_done' signal by just waiting a crap load of time.

As for the clock out, the phases once set better stay where they belong, even in simulation.  Otherwise, the sim will fail over time.

I am not sure how your rPLL clock output cannot be exactly 400MHz if you set your reference clock divider and multiplier correctly.  For example, when I generate the requested:
Code: [Select]
parameter int        CLK_KHZ_IN              = 50000,          // PLL source input clock frequency in KHz.
parameter int        CLK_IN_MULT             = 32,             // Multiply factor to generate the DDR MTPS speed divided by 2.
parameter int        CLK_IN_DIV              = 4,              // Divide factor.  When CLK_KHZ_IN is 25000,50000,75000,100000,125000,150000, use 2,4,6,8,10,12.

and synth the clock:
Code: [Select]
localparam       period  = 500000000/CLK_KHZ_IN ;

always #period                  CLK_IN = !CLK_IN; // create source clock oscillator

All the factors in the equations and delays hit dead on whole numbers.

In modelsim under menu 'wave/wave preferences / Grid & Timescale', if I set the grid period to a manual 2500ps and zoom in & scroll in the waveform output, you can see the 400MHz stays locked to the 50MHz source.
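The numbers above can be checked with a few lines of Python. This is only the arithmetic implied by the parameters and the testbench 'period' localparam, assuming a 1ps simulation time unit; the function names are made up for the example.

```python
def pll_out_khz(clk_khz_in, mult, div):
    """PLL output frequency in kHz for a given multiply/divide pair."""
    return clk_khz_in * mult // div


def tb_half_period_ps(clk_khz_in):
    """Delay used by 'always #period CLK_IN = !CLK_IN', i.e. half a clock period.

    Mirrors the localparam 500000000/CLK_KHZ_IN, assuming time units of 1 ps.
    """
    return 500_000_000 // clk_khz_in


ddr3_clk_khz = pll_out_khz(50_000, 32, 4)            # 400000 kHz = 400 MHz
half_period = tb_half_period_ps(50_000)              # 10000 ps, so a 20 ns / 50 MHz source
ddr3_period_ps = 1_000_000_000 // ddr3_clk_khz       # 2500 ps, matching the wave grid

# The input/divider pairs listed in the CLK_IN_DIV comment all give 400 MHz with MULT = 32.
pairs = [(25_000, 2), (50_000, 4), (75_000, 6), (100_000, 8), (125_000, 10), (150_000, 12)]
all_400mhz = all(pll_out_khz(khz, 32, div) == 400_000 for khz, div in pairs)
```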


Do not worry about the initial PLL setup time.  I wait plenty of time for the PLL and other stuff to synchronize before running the system.  Also verify that Gowin provides a PLL locked signal out.  My DDR3 is held in reset during power-up until the locked signal is ready, then, there is a ton of other delays to accommodate the DDR3 startup sequence.
« Last Edit: August 27, 2022, 10:44:38 pm by BrianHG »
 

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #170 on: August 27, 2022, 11:08:41 pm »
Quote
Edge or level triggered doesn't matter.  As long as a single step is made anytime after the step goes high.  (That is still a single step even if the step is held high.)

Yep, I'm going to have to put some logic in there to wait for it to go low again - there is no "step" functionality in the Gowin PLLs, you just set the value for the phase directly - so I'm wrapping some logic around the step/updn signals to mimic the same interface.

Quote
What is important is the direction and that there are 16 steps for a full rotation.

I think you're actually running at 8 steps per full period. After four calls to phase_step, the DDR3_CLK_RDQ signal is 180° out of phase with the DDR3_CLK signal. I'm adding 2 to my 'out-of-16' phase value to match the Altera original (currently simulating the Cyclone V variant).

Quote
When my DDR3 controller applies a step, it waits for ~ 1us for the PLL output to adapt, hence I basically ignore the 'phase_done' signal by just waiting a crap load of time.

That's good, because I don't have a 'phase done' :) I was thinking I'd synthesize an interface to one but if you don't use it, that's easier :)

Quote
As for the clock out, the phases once set better stay where they belong, even in simulation.  Otherwise, the sim will fail over time.

I am not sure how your rPLL clock output cannot be exactly 400MHz if you set your reference clock divider and multiplier correctly.  For example, when I generate the requested:
Code: [Select]
parameter int        CLK_KHZ_IN              = 50000,          // PLL source input clock frequency in KHz.
parameter int        CLK_IN_MULT             = 32,             // Multiply factor to generate the DDR MTPS speed divided by 2.
parameter int        CLK_IN_DIV              = 4,              // Divide factor.  When CLK_KHZ_IN is 25000,50000,75000,100000,125000,150000, use 2,4,6,8,10,12.

and synth the clock:
Code: [Select]
localparam       period  = 500000000/CLK_KHZ_IN ;

always #period                  CLK_IN = !CLK_IN; // create source clock oscillator

All the factors in the equations and delays hit dead on whole numbers.

In modelsim under menu 'wave/wave preferences / Grid & Timescale', if I set the grid period to a manual 2500ps and zoom in & scroll in the waveform output, you can see the 400MHz stays locked to the 50MHz source.

Imprecise language here - the phases stay locked on, and the waves are exactly the correct periods/frequencies. What I meant was (if you look at 'initial-locked-state.png' two posts up), you can see

- the cursor is at 'gowin_clocks_locked' so we've just achieved stability.
- the red (DDR3_CLK and clk_ddrMain) signals are locked in sync
- same for the cyan DDR-write clocks (DDR3_CLK_WDQ and clk_ddrWrite)
- same for the blue DDR-read clocks (DDR3_CLK_RDQ and clk_ddrRead)
- same for the green client-interface clock (DDR3_CLK_50 and clk_ddrClient)

- the DDR3_CLK_25 and clk_ddrMgmt clocks both do have the same period/frequency, however the Gowin one (clk_ddrMgmt) is presenting a low->high (and high->low of course) transition one full DDR3_CLK period later than DDR3_CLK_25.

Quote
Do not worry about the initial PLL setup time.  I wait plenty of time for the PLL and other stuff to synchronize before running the system.  Also verify that Gowin provides a PLL locked signal out.  My DDR3 is held in reset during power-up until the locked signal is ready, then, there is a ton of other delays to accommodate the DDR3 startup sequence.

Sure, in the test bench the stop condition is now
Code: [Select]
always @(PLL_LOCKED & gowin_clocks_locked) #(endtime) $stop;            // Wait for both PLLs to start, then run the simulation until 1ms has been reached.

I figured there'd be a wait-for-the-signal to account for any variation even in the Altera PLLs.

Cheers :)
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #171 on: August 27, 2022, 11:16:39 pm »
Quote
Edge or level triggered doesn't matter.  As long as a single step is made anytime after the step goes high.  (That is still a single step even if the step is held high.)

Yep, I'm going to have to put some logic in there to wait for it to go low again - there is no "step" functionality in the Gowin PLLs, you just set the value for the phase directly - so I'm wrapping some logic around the step/updn signals to mimic the same interface.


It will then be a 4 bit up-down counter.
To decode the step:

always@(posedge clk_in) begin
    step_dly <= step;                          // remember the previous level of 'step'
    if (step && !step_dly) begin               // act once, on the rising edge of 'step'
        if (up_dn) phase_pos <= phase_pos + 4'd1;
        else       phase_pos <= phase_pos - 4'd1;
    end
end

I bet that will do what you want, though I may have the +/- backwards.

A cleaner method would be to have 2 step_dly's, and in the if()  check for a true step_dly1 and a false step_dly2.
This way, the unknown clock domain source 'step' is first transferred to the 'clk_in' domain for better metastability.
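The two-delay variant can be tried out with a small behavioural model in Python before committing it to HDL. This is a sketch of the idea only: the class is made up, the signal names follow the snippet above, and the up/down direction carries the same caveat about possibly being backwards.

```python
class StepDecoder:
    """Behavioural model: two-stage delay on 'step', then rising-edge detect."""

    def __init__(self):
        self.step_dly1 = 0
        self.step_dly2 = 0
        self.phase_pos = 0      # 4-bit up/down counter

    def clock(self, step, up_dn):
        """One posedge of clk_in. All decisions use the register values held
        before this edge, like non-blocking assignments in the HDL."""
        rising = self.step_dly1 == 1 and self.step_dly2 == 0
        if rising:
            delta = 1 if up_dn else -1
            self.phase_pos = (self.phase_pos + delta) % 16   # wrap like 4 bits
        self.step_dly2 = self.step_dly1
        self.step_dly1 = step


def pulse(decoder, high_clocks, low_clocks, up_dn):
    """Hold 'step' high then low for the given number of clk_in cycles."""
    for _ in range(high_clocks):
        decoder.clock(1, up_dn)
    for _ in range(low_clocks):
        decoder.clock(0, up_dn)
    return decoder.phase_pos
```

For example, `pulse(StepDecoder(), 5, 3, 1)` ends with `phase_pos` at 1: holding 'step' high for five clocks still produces a single step, and a one-clock pulse is caught as well.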
« Last Edit: August 27, 2022, 11:23:31 pm by BrianHG »
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #172 on: August 28, 2022, 12:14:59 am »
You should be able to insert a parameter value 'Gowin' to the 'FPGA_VENDOR' parameter and add your PLL to my 'BrianHG_DDR3_PLL.sv'.

Then temporarily edit my 'BrianHG_DDR3_PHY_SEQ_v16_tb.sv' so that just the PLL's FPGA_VENDOR is set to Gowin.  Then execute my 'setup_phy_v16.do' script to see if the Gowin PLL will initialize the Micron DDR3 model.
If you need to touch-up your code, to re-compile, just execute 'run_phy_v16.do'.  (You will need to include Gowin's primitive lib in the setup_xxx.do)

If that works, then you can call that part a success.  Next, the DDR_IO buffers.
 

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #173 on: August 28, 2022, 12:28:09 am »
Yep, my logic looked pretty similar to your counter-style, if a little less concise :) ...
Code: [Select]
    reg     [3:0]       duty;               // Duty cycle, 50/50 = 8 + phase
    reg     [3:0]       phase;              // Phase to present to PLL
    reg                 lastPhaseStep;      // Look for the level-transition

    always @ (posedge clk) 
        begin
            if (rst)
                begin
                    phase           <= 4'b0;
                    duty            <= 4'b0;
                    lastPhaseStep   <= 1'b0;
                end
            else   
                begin
                    if ((phase_step == 1'b1) && (lastPhaseStep == 1'b0))
                        if (phase_updn == 1'b0)
                            begin
                                phase   <= phase - 4'h2;
                                duty    <= phase + 4'h6;    // current phase + (8-2)
                            end
                        else
                            begin
                                phase   <= phase + 4'h2;
                                duty    <= phase + 4'hA;    // current phase + (8+2)

                            end

                    lastPhaseStep <= phase_step;
                end
        end

I did originally have 'phaseEdge' as a 3-bit detector, and I was checking for 'phaseEdge' being 3'b011 but the clk this is synced off is the slow clkIn (~27MHz in the eventual design) and three clocks took an eternity. Now that I know you wait for 1µs, I might just put that back :)

Looking at it, I can make the phase a 3-bit counter as well, since the last bit is always 0.
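The phase and duty arithmetic in that wrapper can be sanity-checked with a short Python sketch. It only models the values (step of 2 out of 16, duty 8 ahead of the phase) and not the clocked logic; the function names are made up for illustration.

```python
def next_phase(phase, up):
    """One tuning step: +2 or -2 on the 4-bit Gowin phase code (45 degrees)."""
    return (phase + (2 if up else -2)) % 16


def duty_for_phase(phase):
    """Duty code kept 8 ahead of the phase (modulo 16) for a 50/50 clock."""
    return (phase + 8) % 16


def walk(n_steps, up=True, start=0):
    """Phase codes visited over n_steps tuning steps."""
    phase, visited = start, []
    for _ in range(n_steps):
        phase = next_phase(phase, up)
        visited.append(phase)
    return visited


full_turn = walk(8)                          # eight steps return to code 0
after_four_deg = walk(4)[-1] * 22.5          # phase in degrees after four steps
lsb_always_zero = all(p % 2 == 0 for p in full_turn)
```

In this model four steps land on code 8, which is 180 degrees, eight steps complete the rotation, and the least significant bit never changes, which is consistent with dropping it and using a 3-bit counter.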

Anyway, it now matches up perfectly with the signals from your BrianHG_DDR3_PLL, apart from that phase offset between your DDR3_CLK_25 and my clk_ddrMgmt clock - and I don't *think* that'll be an issue. I'd have to burn another PLL to get the 180° phase-delay, so unless it turns out to be important, I'm going to leave it.

Quote
You should be able to insert a parameter value 'Gowin' to the 'FPGA_VENDOR' parameter and add your PLL to my 'BrianHG_DDR3_PLL.sv'.

Then temporarily edit my 'BrianHG_DDR3_PHY_SEQ_v16_tb.sv' so that just the PLL's FPGA_VENDOR is set to Gowin.  Then execute my 'setup_phy_v16.do' script to see if the Gowin PLL will initialize the Micron DDR3 model.
If you need to touch-up your code, to re-compile, just execute 'run_phy_v16.do'.  (You will need to include Gowin's primitive lib in the setup_xxx.do)

If that works, then you can call that part a success.  Next, the DDR_IO buffers.

I might try and get around to that this evening, else tomorrow. Life, I'm reliably informed by 'er indoors doesn't *entirely* revolve around playing with electronics :)
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #174 on: August 28, 2022, 12:42:33 am »
My mistake, here is a snapshot of the tuning:



As you can see from the time bars, I pulse the phase_step for 20ns.
Then I wait 100ns before analyzing the read data, meaning the PLL's read clock needs to be ready in around 90ns.

These inherent delays can be adjusted in my DDR3 initialization sequence, though, 100ns is plenty.
 

Online BrianHGTopic starter

« Reply #175 on: August 28, 2022, 12:57:52 am »
Here is the true operation tuning time when parameter 'SKIP_PUP_TIMER = 0'.  This is normal operation; however, simulating it takes a few minutes, as Modelsim needs to simulate 0.2 seconds of real time now that we go through all the required power-up delays listed in the DDR3 specifications.



As you can see here, the 'phase_step' is pulsed for 2000ns.  The time after that is 3000ns before the read check takes place.  So, if you step at the rise of 'phase_step', your PLL needs to have a new valid output phase within 5000ns of the step.
 

Offline SpacedCowboy

« Reply #176 on: August 28, 2022, 01:02:46 am »
As it stands it’s running off CLK_IN, and 90ns @ ~25MHz is approx 2 clocks. Since the PLLs are going to be well up and running before the phase change will occur, I could slave the logic off the 100MHz clock instead, then there’s plenty of time :)

And then I saw your 2nd post… 5000ns is indeed plenty :)
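
The clock budget behind those two remarks can be worked through in a few lines. This is only a sketch: the module and function names are hypothetical, the clock rates are the round figures mentioned in the posts above, and the count is truncated to whole cycles.

```systemverilog
// Hypothetical sketch: how many whole clock cycles fit in a settling window.
module phase_window_sketch;
    // cycles = window[ns] * clock[kHz] / 1e6, truncated to an integer
    function automatic int clocks_in_window(int window_ns, int clk_khz);
        return (window_ns * clk_khz) / 1_000_000;
    endfunction

    initial begin
        // Fast-sim window quoted above: about 90 ns
        $display("90 ns   @  25 MHz: %0d clocks", clocks_in_window(90,    25_000));
        $display("90 ns   @ 100 MHz: %0d clocks", clocks_in_window(90,   100_000));
        // Normal power-up window quoted above: 5000 ns
        $display("5000 ns @  25 MHz: %0d clocks", clocks_in_window(5000,  25_000));
    end
endmodule
```

With these inputs the three results are 2, 9 and 125 cycles, which is why the short window is tight on the slow clock and the 5000ns window is not.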
 

Online BrianHGTopic starter

« Reply #177 on: August 28, 2022, 01:47:04 am »
My tuning section clk is actually hard tied to the DDR3_CK / 4 output.
 

Offline SpacedCowboy

« Reply #178 on: August 28, 2022, 05:01:41 am »
So I'm not 100% sure, but I think this looks good ?

I can see the WRITE commands, eg:

Code: [Select]
WRITE @ DQS= bank = 7 row = 0004 col = 00000000 data = 8888)
... corresponding to later READ data, eg:

Code: [Select]
READ @ DQS= bank = 7 row = 0004 col = 00000000 data = 8888
 

Online BrianHGTopic starter

« Reply #179 on: August 28, 2022, 05:13:02 am »
Looks ok.  If the tuning wasn't doing anything, the sim would get stuck in an infinite loop during initialization.  No reads or writes would take place.

IE: Break the phase step signal to the PLL, make the PLL's input set to '0' and see if the sim will run.

Also, all you had to do was compare the output when running with 'FPGA_VENDOR = Altera' mode.

To be absolutely sure, I need to inspect the waveform during tuning.

If this checks out, then all you have left is the DDR_IO primitive to be inserted into my IO port file.
Your best bet is to read my code's Cyclone DDR_IO primitive and replicate in Gowin.  So long as you can specify the data input read clock and data output write clock, you should be able to get this to work.
(Oh, and don't forget proper .SDC file constraints.  The values I have may be different for Gowin on the output side as the delays have been tuned to maximize Altera's FMAX.)
« Last Edit: August 28, 2022, 05:16:59 am by BrianHG »
 

Online BrianHGTopic starter

« Reply #180 on: August 28, 2022, 05:41:13 am »
Another test would be to invert the phase up/down input and see if the startup matches the Altera startup.
 

Offline SpacedCowboy

« Reply #181 on: August 28, 2022, 05:57:37 am »
Looks ok.  If the tuning wasn't doing anything, the sim would get stuck in an infinite loop during initialization.  No reads or writes would take place.

IE: Break the phase step signal to the PLL, make the PLL's input set to '0' and see if the sim will run.

Yep, if I set ".phase_step(1'b0)" in the BrianHG_DDR3_PLL (...) instantiation, I get an endless series of read data 0, read data 1 lines. Presumably the tuning.

Also, all you had to do was compare the output when running with 'FPGA_VENDOR = Altera' mode.

That's ... totally fair.


To be absolutely sure, I need to inspect the waveform during tuning.

So tomorrow, I'll fork your repo on GitHub into my 'Spaced-cowboy' GitHub account, and upload the code that I have here. Then it's a 'git clone' away.

If this checks out, then all you have left is the DDR_IO primitive to be inserted into my IO port file.
Your best bet is to read my code's Cyclone DDR_IO primitive and replicate in Gowin.  So long as you can specify the data input read clock and data output write clock, you should be able to get this to work.

This is at line 358 of BrianHG_DDR3_IO_PORT_ALTERA.sv, right ?

The DDR facilities of the Gowin parts seem a bit primitive compared to the Altera ones, and ODDR is a bit strange - there's a TX input that I haven't figured out yet - might be a clock-enable signal perhaps. The PDF documentation isn't what I'd call 'great' (see attached), but there's probably some use of DDR outputs in their example code that I can go look at and figure out what the parameters do.

(Oh, and don't forget proper .SDC file constraints.  The values I have may be different for Gowin on the output side as the delays have been tuned to maximize Altera's FMAX.)

And here do the dragons dwell....

Another test would be to invert the phase up/down input and see if the startup matches the Altera startup.

I'll give that a go tomorrow. Off to bed :)

Thanks for all the help Brian, this is actually going a lot smoother than I thought it would...
 

Offline SpacedCowboy

« Reply #182 on: August 28, 2022, 06:03:22 am »
Actually maybe ODDR doubles as IODDR, and TX sets the direction, since Q1 can connect to an IOBUF (which is bidirectional).
 

Online BrianHGTopic starter

« Reply #183 on: August 28, 2022, 06:12:45 am »
Looks like the TX is the output enable.
Then you have the Q output and the Q for the OE.

something like :

assign pin_name = Q1 ? Q0 : 1'bz ;

Whereas the DDR input primitive would also wire to 'pin_name'.

Anyways, look at my 'BrianHG_DDR3_IO_PORT_ALTERA.sv'.

Remove lines 228 through 358 while concentrating on lines 360 through 461.
(note that lines 228 through 358 use HW DIFFERENTIAL drivers for the DDR3_CK and DDR3 DQS pins.  360 through 461 uses a software emulated dumb differential driver.)

Altera documentation:
https://www.intel.com/programmable/technical-pdfs/683148.pdf
« Last Edit: August 28, 2022, 06:19:59 am by BrianHG »
 

Offline SpacedCowboy

« Reply #184 on: August 28, 2022, 05:27:35 pm »
To be absolutely sure, I need to inspect the waveform during tuning.

Ok, so the repo is forked and the code changes uploaded, the fork is here

Another test would be to invert the phase up/down input and see if the startup matches the Altera startup.

Did that this morning, and all the phases seem to align between the Gowin and Altera PLL instances :)

This video has me selecting phase_done (rising edge) as a convenient click-to-move-to-point, and one clock subsequent to that signal going high is when the Altera PLL has shifted to the new phase.
  • The dark blue trace is the DDR3_CLK_RDQ clock, and it aligns with the dark blue trace from the Gowin PLLs below (clk_ddrRead).
  • The 'phase' counter above is actually from the Gowin module and represents what is sent to the Gowin PLL as the phase to use.
 

Online BrianHGTopic starter

« Reply #185 on: August 28, 2022, 08:03:08 pm »
To be absolutely sure, I need to inspect the waveform during tuning.

Ok, so the repo is forked and the code changes uploaded, the fork is here


That's nice and all, but why did you use Gowin's black box 'pll_ddr1 & 2' instead of instantiating 'rPLL' directly yourself?
Also, why didn't you feed into the rPLL my 'CLK_KHZ_IN/1000', 'CLK_IN_MULT-1' and 'CLK_IN_DIV-1' ?
Without those 3, you cannot change the frequency of my DDR3 controller.

(Can't see your video, it is a privileged page.)

*** Additional: Did you try a power-up sequence with the parameter 'SKIP_PUP_TIMER = 0'?  In this configuration, I pulse the phase_step really slow, and, there should be no DDR3 RST_N & CKE warnings from Micron's DDR3 model.  Note that this will take a few minutes to run and it will appear to be cycling through the configuration non-stop at the end before the read/write tests begin.
« Last Edit: August 28, 2022, 09:04:31 pm by BrianHG »
 

Online BrianHGTopic starter

« Reply #186 on: August 28, 2022, 09:46:05 pm »
Thanks for all the help Brian, this is actually going a lot smoother than I thought it would...

Well, I designed my controller to work on bottom-end 20 year old FPGAs, those without IO delay functions, relying 100% on PLL phase capabilities, plus multiple configurable, proper end-to-end simulation test benches.  It's as easy as it can get, with the negative that you cannot stack too many DDR3s in parallel without dropping the clock rate or taking care about your trace lengths.  My guess is that I would feel safe with at most a single laptop DDR3 SODIMM module with 4 ram chips, 64 bit DDR3 bus, running at 400MHz, 800mtps.  With 2 chips, I would not go beyond 600MHz, 1.2gtps.

Though, I do not know Gowin's IO port's capabilities.
« Last Edit: August 28, 2022, 09:48:13 pm by BrianHG »
 

Offline SpacedCowboy

« Reply #187 on: August 28, 2022, 09:58:12 pm »
There's a couple of reasons why I was using the drop-in IP from Gowin

  • I didn't think it made much difference - I wasn't aware of the significance of 'CLK_KHZ_IN/1000', 'CLK_IN_MULT-1' and 'CLK_IN_DIV-1' and that you were trying to make that a cross-architecture feature.

    You'd also said "Do not worry about how I managed/strong armed Altera's PLLs into doing what I like from provided parameters.  With Gowin, it is ok if you are left with no choice but to provide only a selection of source clocks and output clocks pre-generated by Gowin's IP tool" and I thought those parameters were part of that setup process.

  • It was also kind of handy, when I was re-generating the PLLs that it would just overwrite the same files and I'm done - just hit up-arrow/return in modelsim to see any change.

That said, now that I'm happier with the end-result, it's a reasonable ask. I'll see if I can figure out how to do it.

The video not being visible is weird - it's fine here on the internal network, my ISP must be doing something to filter external access because there's no protection on the page at all. I've converted it into a GIF, so perhaps it'll work when uploaded instead.

[edit] No such luck. I've loaded it onto Imgur, and if this doesn't work, sod it.

I had tried a sequence with  'SKIP_PUP_TIMER = 0', I did it last night, but maybe because it was late, I looked in BrianHG_DDR3_PHY_SEQ_v16.sv not BrianHG_DDR3_PHY_SEQ_v16_tb.sv for the parameter  :palm: ... I thought it was running faster than you'd mentioned.

So, running it with 'SKIP_PUP_TIMER = 0' for real this time, it does show a couple of warnings regarding RST_N - which presumably it oughtn't. The log is attached (as a .zip) but the offending lines are:

Code: [Select]
# ** Error: (vsim-8630) ddr3.v(542): Infinity results from division operation.
#
# ** Error: (vsim-8630) ddr3.v(543): Infinity results from division operation.
#
# ** Error: (vsim-8630) ddr3.v(544): Infinity results from division operation.
#
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.reset at time 7697500.0 ps WARNING: 200 us is required before RST_N goes inactive.
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task at time 8698750.0 ps WARNING: 500 us is required after RST_N goes inactive before CKE goes active.

So, something else to look at there.
« Last Edit: August 28, 2022, 10:13:04 pm by SpacedCowboy »
 

Online BrianHGTopic starter

« Reply #188 on: August 28, 2022, 10:27:57 pm »
There's a couple of reasons why I was using the drop-in IP from Gowin

  • I didn't think it made much difference - I wasn't aware of the significance of 'CLK_KHZ_IN/1000', 'CLK_IN_MULT-1' and 'CLK_IN_DIV-1' and that you were trying to make that a cross-architecture feature.

    You'd also said "Do not worry about how I managed/strong armed Altera's PLLs into doing what I like from provided parameters.  With Gowin, it is ok if you are left with no choice but to provide only a selection of source clocks and output clocks pre-generated by Gowin's IP tool" and I thought those parameters were part of that setup process.

  • It was also kind of handy, when I was re-generating the PLLs that it would just overwrite the same files and I'm done - just hit up-arrow/return in modelsim to see any change.

That said, now that I'm happier with the end-result, it's a reasonable ask. I'll see if I can figure out how to do it.
Ok, when I said strong-armed, I meant the stupid 'localparam Altera_Dummy_String' I had to create as this is a parameter limitation bug in Altera's 20 year old pll primitive and their HDL design team's issues expecting a number encoded as a string embedded into a 64 bit integer, and in more than one place.  (Do not ask, I do not want to go into this BS hell.)

Yes, I did say begin with the simple fixed 400MHz pll, however, you now must tune Gowin's PLL to my clocks and multipliers.  The rest of my DDR3 controller uses these figures to tune all the DDR3 delays, like the number of clocks for RAS/CAS/refresh cycle timing, based on the selected parameter 'DDR3_SPEED_GRADE'.  All my power-up timers also tune themselves based on these PLL settings.

Quote
The video not being visible is weird - it's fine here on the internal network, my ISP must be doing something to filter external access because there's no protection on the page at all. I've converted it into a GIF, so perhaps it'll work when uploaded instead.

I had tried a sequence with  'SKIP_PUP_TIMER = 0', I did it last night, but maybe because it was late, I looked in BrianHG_DDR3_PHY_SEQ_v16.sv not BrianHG_DDR3_PHY_SEQ_v16_tb.sv for the parameter  :palm: ... I thought it was running faster than you'd mentioned.

So, running it with 'SKIP_PUP_TIMER = 0' for real this time, it does show a couple of warnings regarding RST_N - which presumably it oughtn't. The log is attached (as a .zip) but the offending lines are:

Code: [Select]
# ** Error: (vsim-8630) ddr3.v(542): Infinity results from division operation.
#
# ** Error: (vsim-8630) ddr3.v(543): Infinity results from division operation.
#
# ** Error: (vsim-8630) ddr3.v(544): Infinity results from division operation.
#
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.reset at time 7697500.0 ps WARNING: 200 us is required before RST_N goes inactive.
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task at time 8698750.0 ps WARNING: 500 us is required after RST_N goes inactive before CKE goes active.


So, something else to look at there.

Looking at your attached .txt file, your sim begins at line 2548.  Reading from there:
Code: [Select]
# restart -force
# Loading sv_std.std
# Loading work.BrianHG_DDR3_PHY_SEQ_v16_tb
# Loading work.BrianHG_DDR3_PLL
# Loading work.BrianHG_DDR3_PHY_SEQ_v16
# Loading work.BrianHG_DDR3_GEN_tCK
# Loading work.BrianHG_DDR3_CMD_SEQUENCER_v16
# Loading work.gowin_ddr_clocking
# Loading work.pll_ddr1
# Loading work.rPLL
# Loading work.pll_ddr2
# Loading work.BrianHG_DDR3_IO_PORT_ALTERA
# ** Warning: (vsim-3017) BrianHG_DDR3_PLL.sv(548): [TFMPC] - Too few port connections. Expected 11, found 10.
#
#         Region: /BrianHG_DDR3_PHY_SEQ_v16_tb/DUT_DDR3_PLL/genblk6/gowin_ddr_clocks
# ** Warning: (vsim-3722) BrianHG_DDR3_PLL.sv(548): [TFMPC] - Missing connection for port 'fdly'.
#
# ** Warning: (vsim-3839) BrianHG_DDR3_PHY_SEQ_v16.sv(450): Variable '/BrianHG_DDR3_PHY_SEQ_v16_tb/DUT_PHY_SEQ/CMD_TXB', driven via a port connection, is multiply driven. See BrianHG_DDR3_PHY_SEQ_v16.sv(480).
#
#         Region: /BrianHG_DDR3_PHY_SEQ_v16_tb/DUT_PHY_SEQ
# ** Warning: (vsim-3839) BrianHG_DDR3_PHY_SEQ_v16.sv(148): Variable '/BrianHG_DDR3_PHY_SEQ_v16_tb/DUT_PHY_SEQ/SEQ_RDATA_VECT_OUT', driven via a port connection, is multiply driven. See BrianHG_DDR3_PHY_SEQ_v16.sv(480).
#
#         Region: /BrianHG_DDR3_PHY_SEQ_v16_tb/DUT_PHY_SEQ
# ** Warning: (vsim-3839) BrianHG_DDR3_PHY_SEQ_v16.sv(147): Variable '/BrianHG_DDR3_PHY_SEQ_v16_tb/DUT_PHY_SEQ/SEQ_RDATA', driven via a port connection, is multiply driven. See BrianHG_DDR3_PHY_SEQ_v16.sv(480).
#
#         Region: /BrianHG_DDR3_PHY_SEQ_v16_tb/DUT_PHY_SEQ
# ** Warning: (vsim-3839) BrianHG_DDR3_PHY_SEQ_v16.sv(146): Variable '/BrianHG_DDR3_PHY_SEQ_v16_tb/DUT_PHY_SEQ/SEQ_RDATA_RDY_t', driven via a port connection, is multiply driven. See BrianHG_DDR3_PHY_SEQ_v16.sv(480).
#
#         Region: /BrianHG_DDR3_PHY_SEQ_v16_tb/DUT_PHY_SEQ
# run -all
# ** Warning: *****************************
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 616
# ** Warning: *** BrianHG_DDR3_PLL Info ***
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 617
# ** Warning: *********************************************
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 618
# ** Warning: ***      CLK_IN           =    50 MHz.     ***
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 619
# ** Warning: ***      DDR3_RDQ/WDQ     =   800 MTPS.    ***
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 620
# ** Warning: ***      DDR3_CLK/RDQ/WDQ =   400 MHz.     ***
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 621
# ** Warning: ***      DDR3_WDQ_PHASE   =   270 degrees. ***
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 622
# ** Warning: *** True DDR3_WDQ_PHASE   =  1875 ps.     ***
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 623
# ** Warning: ***      DDR3_RDQ_PHASE   =     0 degrees. ***
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 624
# ** Warning: *** True DDR3_RDQ_PHASE   =  0000 ps.     ***
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 625
# ** Warning: ***      CMD_CLK          =   200 MHz.     ***
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 626
# ** Warning: ***      DDR3_CLK_50      =   200 MHz.     ***
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 627
# ** Warning: ***      DDR3_CLK_25      =   100 MHz.     ***
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 628
# ** Warning: *********************************************
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 629
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.file_io_open: at time                    0 WARNING: no +model_data option specified, using /tmp.
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file.0.
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file.1.
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file.2.
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file.3.
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file.4.
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file.5.
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file.6.
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file.7.
# ** Error: (vsim-8630) ddr3.v(542): Infinity results from division operation.
#
# ** Error: (vsim-8630) ddr3.v(543): Infinity results from division operation.
#
# ** Error: (vsim-8630) ddr3.v(544): Infinity results from division operation.
#
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task: at time 1205718750.0 ps INFO: Load Mode 2
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task: at time 1205718750.0 ps INFO: Load Mode 2 Partial Array Self Refresh = Bank 0-7
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task: at time 1205718750.0 ps INFO: Load Mode 2 CAS Write Latency =           5
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task: at time 1205718750.0 ps INFO: Load Mode 2 Auto Self Refresh = Disabled
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task: at time 1205718750.0 ps INFO: Load Mode 2 Self Refresh Temperature = Normal
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task: at time 1205718750.0 ps INFO: Load Mode 2 Dynamic ODT = Disabled


Ok, the first 6 warnings should not be there.
The first 2 seem to be associated with your Gowin pll.

The next 4 for some reason come from my code, and they should not.
Did you modify my test bench HDL?

The next set of warnings are just a print-out from my PLL.  I guess I should have used $display instead of $warning, though the $warning shows better in Quartus' compiler.

The divide by zero error is in Micron's DDR3 model, which I tend to ignore.

Then we reach the load MRS2, without the RST_N or CKE required delay warning, and everything runs from there correctly.


Again, read your 'Gowin/pll_ddr#/pll_ddr#.v' to see what's inside those black boxes.
The only thing you might have trouble with is configuring the adjustable 270deg in PLL #2 based on my source parameter.

Also don't forget to pass my string  .FPGA_FAMILY    to     rpll_inst.DEVICE, and in your sims set my .FPGA_FAMILY to "GW2A-18".

(it looks like you will have to strong-arm Gowin's 'rpll_inst.PSDA_SEL = "1100";' as it looks to be a string, not a binary number, though it looks easy enough.)

From my system PLL module's parameter DDR3_WDQ_PHASE, multiply by 16, divide by 360, shrink to 4 bit logic, then make a string = 0 through 15 "0000", "0001", "0010",...  Remember, this must all be done as localparams.
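
That recipe could look something like the sketch below. It is an illustration under assumptions, not code from the controller: the module name and the localparam names are hypothetical, and it assumes the rPLL string parameter can be fed a 32-bit vector holding four ASCII characters, which is how a Verilog string literal such as "1100" is stored.

```systemverilog
// Hypothetical sketch: turn a phase in degrees into a 4-character binary string.
module psda_sel_sketch #(
    parameter int DDR3_WDQ_PHASE = 270      // write clock phase in degrees
)();
    // Multiply by 16, divide by 360, keep the low 4 bits (0 through 15).
    localparam int PSDA_INT = (DDR3_WDQ_PHASE * 16 / 360) % 16;

    // Four ASCII characters, most significant bit first.
    localparam logic [31:0] PSDA_SEL_STR = {
        ((PSDA_INT & 8) ? "1" : "0"),
        ((PSDA_INT & 4) ? "1" : "0"),
        ((PSDA_INT & 2) ? "1" : "0"),
        ((PSDA_INT & 1) ? "1" : "0")
    };

    // With the default 270 degrees this prints: PSDA_SEL = "1100"
    initial $display("PSDA_SEL = \"%s\"", PSDA_SEL_STR);
endmodule
```

For 270 degrees the integer works out to 12, so the string is "1100", the same value shown for 'rpll_inst.PSDA_SEL' above. Everything is a localparam, so it is resolved at elaboration time.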
« Last Edit: August 28, 2022, 10:48:28 pm by BrianHG »
 

Offline SpacedCowboy

« Reply #189 on: August 28, 2022, 11:07:07 pm »

Ok, the first 6 warnings should not be there.
The first 2 seem to be associated with your Gowin pll.

Yep - it's the fdly [3:0] input to gowin_ddr_clocking. Apparently I forgot to take out the declaration of the input port at the top of the gowin_ddr_clocking module. We're never changing the delay on the output port, so I'm hardwiring those to 4'b0 and there was no need to pass it in. Fixed locally.

The next 4 for some reason come from my code, and they should not.
Did you modify my test bench HDL?

Not that I'm aware of. The only changes that ought to have been made in the test bench are at the top of the file wrt the PLL instance. You can see the diffs here on GitHub

running 'git diff' on the local version of what's at GitHub is just showing me the line where I set SKIP_PUP_TIMER to 0, so I don't think there's any changes. Could be a Modelsim version ? I'm running
Code: [Select]

ModelSim ALTERA STARTER EDITION 10.3c

Revision: 2014.09
Date: Sep 20 2014



The next set of warnings are just a print-out from my PLL.  I guess I should have used $display instead of $warning, though the $warning shows better in Quartus' compiler.

The divide by zero error is in Micron's DDR3 model, which I tend to ignore.

Then we reach the load MRS2, without the RST_N or CKE error, and everything runs from there correctly.
:-+

Again, read your 'Gowin/pll_ddr#/pll_ddr#.v' to see what's inside those black boxes.
The only thing you might have trouble with is configuring the adjustable 270deg in PLL #2 based on my source parameter.

Also don't forget to pass my string  .FPGA_FAMILY    to     rpll_inst.DEVICE, and in your sims set my .FPGA_FAMILY to "GW2A-18".

Yep, but if I'm varying the input clock, I'm presumably going to have to do maths on the parameters to get the various divisors set up. Shouldn't be impossible, just means I need a better understanding of what your constants mean and how the PLL parameters work to configure it than I do right now.

Speaking of which:
Code: [Select]
parameter int        CLK_KHZ_IN              = 50000,          // PLL source input clock frequency in KHz.
parameter int        CLK_IN_MULT             = 32,             // Multiply factor to generate the DDR MTPS speed divided by 2.
parameter int        CLK_IN_DIV              = 4,              // Divide factor.  When CLK_KHZ_IN is 25000,50000,75000,100000,125000,150000, use 2,4,6,8,10,12.

The first and third are fairly obvious, but the second seems to scale the wrong way. 50 x 32 = 1600. Isn't the DDR3 clock running at 400MHz, so with DDR it is 800MT/s, so 1600 is x2 not /2 ? Maybe just perspective, and probably doesn't matter as long as it's consistent :)
 
So I think what you're saying is that from the above 3 figures, I ought to be deriving any of the numerical PLL parameters such as FBDIV_SEL in
Code: [Select]
defparam rpll_inst.FCLKIN = "50";
defparam rpll_inst.DYN_IDIV_SEL = "false";
defparam rpll_inst.IDIV_SEL = 0;
defparam rpll_inst.DYN_FBDIV_SEL = "false";
defparam rpll_inst.FBDIV_SEL = 7;
defparam rpll_inst.DYN_ODIV_SEL = "false";
defparam rpll_inst.ODIV_SEL = 2;
defparam rpll_inst.PSDA_SEL = "0000";
defparam rpll_inst.DYN_DA_EN = "true";
defparam rpll_inst.DUTYDA_SEL = "1000";
defparam rpll_inst.CLKOUT_FT_DIR = 1'b1;
defparam rpll_inst.CLKOUTP_FT_DIR = 1'b1;
defparam rpll_inst.CLKOUT_DLY_STEP = 0;
defparam rpll_inst.CLKOUTP_DLY_STEP = 0;
defparam rpll_inst.CLKFB_SEL = "internal";
defparam rpll_inst.CLKOUT_BYPASS = "false";
defparam rpll_inst.CLKOUTP_BYPASS = "false";
defparam rpll_inst.CLKOUTD_BYPASS = "false";
defparam rpll_inst.DYN_SDIV_SEL = 2;
defparam rpll_inst.CLKOUTD_SRC = "CLKOUT";
defparam rpll_inst.CLKOUTD3_SRC = "CLKOUT";
defparam rpll_inst.DEVICE = "GW2A-18";


which, as I said, doesn't look impossible. And certainly updating the device to be passed through is fine too.

Is there any difference between doing it as "type (parameters) instance (signals)" rather than as "type instance (signals) defparams" ?
 

BrianHG (Topic starter)
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #190 on: August 28, 2022, 11:20:48 pm »
Yep, but if I'm varying the input clock, I'm presumably going to have to do maths on the parameters to get the various divisors set up. Shouldn't be impossible, just means I need a better understanding of what your constants mean and how the PLL parameters work to configure it than I do right now.

What math?
I gave you the math, you just had to...

defparam rpll_inst.FCLKIN = (CLK_KHZ_IN /1000);
defparam rpll_inst.IDIV_SEL = (CLK_IN_DIV-1);
defparam rpll_inst.FBDIV_SEL = (CLK_IN_MULT-1);

How hard was that?

Also:

localparam string gowin_phase[0:15] ='{"0000","0001","0010",...,"1111"};
localparam  phase_set = (DDR3_WDQ_PHASE * 16 / 360);

and
defparam rpll_inst.PSDA_SEL = gowin_phase[phase_set[3:0]];

You might need to remove the 'string' in the first localparam, as this is the same heavy-handed manipulation I had to use to make Altera's PLL work, since they have some input parameters which aren't true integers or strings.

You will need to test everything in the first simple PLL testbench.
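
The phase lookup above can be sanity-checked off-line before touching the testbench. This is a minimal Python sketch, not part of the controller source. It assumes PSDA_SEL is simply the 4-bit binary string of the phase step, which is what the elided table "0000" ... "1111" suggests; the function name psda_sel is hypothetical.

```python
def psda_sel(ddr3_wdq_phase_deg: int) -> str:
    """Return the 4-character binary string for a write-clock phase in degrees."""
    # Same integer arithmetic as: phase_set = (DDR3_WDQ_PHASE * 16 / 360)
    phase_set = (ddr3_wdq_phase_deg * 16) // 360
    # Keep only bits [3:0], mirroring the phase_set[3:0] table index.
    return format(phase_set & 0xF, "04b")


if __name__ == "__main__":
    for deg in (0, 90, 180, 270):
        print(deg, psda_sel(deg))
```

With the 270 degree write phase mentioned earlier this gives step 12 of 16. Whether the real rPLL accepts the value as a string or as a disguised integer is exactly what the PLL testbench has to confirm.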

Quote
Is there any difference between doing it as "type (parameters) instance (signals)" rather than as "type instance (signals) defparams" ?
No difference.

If you like, you may find the greatest common divisor of CLK_IN_DIV & CLK_IN_MULT and divide both by that number.  I always prefer having the CLK_IN_DIV set to at least 2 as this makes a true 50:50 source reference, since the rise and fall times of your source crystal oscillator are now ignored.  You are now relying on the rise-to-next-rise of the crystal feeding your PLL source.
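
The reduction described above can be tried out with a few lines of Python. This is only a sketch: the helper name reduce_ratio and the min_div argument are hypothetical, and it does not check the PLL's legal multiplier, divider or VCO ranges.

```python
from math import gcd


def reduce_ratio(mult: int, div: int, min_div: int = 2) -> tuple:
    """Reduce mult/div by their common factor, keeping div >= min_div."""
    g = gcd(mult, div)
    mult, div = mult // g, div // g
    # Scale both back up if the reduced divider fell below the preferred minimum.
    scale = -(-min_div // div)  # ceiling of min_div / div
    return mult * scale, div * scale


if __name__ == "__main__":
    print(reduce_ratio(32, 4))     # divider kept at 2 or more
    print(reduce_ratio(32, 4, 1))  # fully reduced ratio
```

For the 50 MHz defaults, the fully reduced ratio is 8/1, which is what the Gowin tool generated (FBDIV_SEL = 7, IDIV_SEL = 0), while keeping the divider at 2 or more gives 16/2.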
« Last Edit: August 28, 2022, 11:47:02 pm by BrianHG »
 

BrianHG (Topic starter)
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #191 on: August 28, 2022, 11:31:49 pm »
Clicking on 'Help / About Modelsim' gives me this:

 

BrianHG (Topic starter)
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #192 on: August 28, 2022, 11:36:32 pm »
Looking at the:
defparam rpll_inst.FCLKIN = "50";

Notice that the '50' is in quotes.
It may be another one of those BS fake string things.
We may need to strong-arm a phony string for that one too.

(If this is it, it isn't too bad.  Not until you take a look at LATTICE, they actually rely on some figures written in their 'COMMENTS', yes, inside their /* xxx=xxxMHz */ to make their PLL primitive design compile properly.  WTF?  How.  What kind of backwater design is this?)
« Last Edit: August 28, 2022, 11:47:47 pm by BrianHG »
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #193 on: August 29, 2022, 12:00:07 am »
What math?
I gave you the math, you just had to...

defparam rpll_inst.FCLKIN = (CLK_KHZ_IN /1000);
defparam rpll_inst.IDIV_SEL = (CLK_IN_DIV-1);
defparam rpll_inst.FBDIV_SEL = (CLK_IN_MULT-1);

How hard was that?

Ok, I think the confusion is because those figures don't match what the tool produces for a 50MHz input clock (you can see the generated ones in my last post, it's 50/8/1 not 50/32/4) so it wasn't clear that you'd done it already, I thought you were just referring to some code of your own. Looking at the formula, and at the values (50,32,4) yours will of course produce a 400MHz clock, assuming there's no parameter-out-of-range in some in-the-middle calculation.


Also:

localparam string gowin_phase[0:15] ='{"0000","0001","0010",...,"1111"};
localparam  phase_set = (DDR3_WDQ_PHASE * 16 / 360);

and
defparam rpll_inst.PSDA_SEL = gowin_phase[phase_set[3:0]];

You might need to remove the 'string' in the first localparam, as this is the same heavy-handed manipulation I had to use to make Altera's PLL work, since they have some input parameters which aren't true integers or strings.

You will need to test everything in the first simple PLL testbench.

Yep, the "joys" of stringification are heretofore unknown to me. I'm sure it'll be fun.

If you like, you may find the greatest common divisor of CLK_IN_DIV & CLK_IN_MULT and divide both by that number.  I always prefer having the CLK_IN_DIV set to at least 2 as this makes a true 50:50 source reference, since the rise and fall times of your source crystal oscillator are now ignored.  You are now relying on the rise-to-next-rise of the crystal feeding your PLL source.

That makes sense.

Looks like my modelsim is a little out of date as well - I can rectify that.

Looking at the:
defparam rpll_inst.FCLKIN = "50";

Notice that the '50' is in quotes.
It may be another one of those BS fake string things.
We may need to strong-arm a phony string for that one too.

(If this is it, it isn't too bad.  Not until you take a look at LATTICE, they actually rely on some figures written in their 'COMMENTS', yes, inside their /* xxx=xxxMHz */ to make their PLL primitive design compile properly.  WTF?  How.  What kind of backwater design is this?)

Is there really no function in Verilog to accept an integer and convert it to a string format so it can be passed to a parameter ? Seems like a great potential extension to $sformat(), or some compile-time equivalent if it doesn't do it now.

And yes, that is nuts from Lattice. They also don't let you run Modelsim on a PC, serving the display to the nice big 3-monitor Mac using Microsoft Remote Desktop. They're just fine with Linux serving the display over X11 though...
 

BrianHG (Topic starter)
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #194 on: August 29, 2022, 12:00:14 am »

Yep, but if I'm varying the input clock, I'm presumably going to have to do maths on the parameters to get the various divisors set up. Shouldn't be impossible, just means I need a better understanding of what your constants mean and how the PLL parameters work to configure it than I do right now.

Speaking of which:
Code: [Select]
parameter int        CLK_KHZ_IN              = 50000,          // PLL source input clock frequency in KHz.
parameter int        CLK_IN_MULT             = 32,             // Multiply factor to generate the DDR MTPS speed divided by 2.
parameter int        CLK_IN_DIV              = 4,              // Divide factor.  When CLK_KHZ_IN is 25000,50000,75000,100000,125000,150000, use 2,4,6,8,10,12.
The first and third are fairly obvious, but the second seems to scale the wrong way. 50 x 32 = 1600. Isn't the DDR3 clock running at 400MHz, so with DDR it is 800MT/s, so 1600 is x2 not /2 ? Maybe just perspective, and probably doesn't matter as long as it's consistent :)
 
:palm:  Is this what you were asking, FOUT = CLK_KHZ_IN * CLK_IN_MULT / CLK_IN_DIV ?
50000 kHz * 32 / 4 = 400000 kHz = 400 MHz DDR3 CK.

So, if you want a 25 MHz source, just change the CLK_IN_DIV to 2 and you will get the same result.
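
The relationship between the three controller parameters and the rPLL defparams can be checked with a short Python sketch. This is an illustration only: the function name rpll_params is hypothetical, FCLKIN is shown as a string because that is how the Gowin-generated defparam writes it, and no PLL range limits are modelled.

```python
def rpll_params(clk_khz_in: int, clk_in_mult: int, clk_in_div: int) -> dict:
    """Map CLK_KHZ_IN / CLK_IN_MULT / CLK_IN_DIV onto the rPLL fields."""
    return {
        # FCLKIN is the input frequency in MHz.
        "FCLKIN": str(clk_khz_in // 1000),
        # The rPLL dividers are stored as (value - 1).
        "IDIV_SEL": clk_in_div - 1,
        "FBDIV_SEL": clk_in_mult - 1,
        # FOUT = CLK_KHZ_IN * CLK_IN_MULT / CLK_IN_DIV, in kHz.
        "FOUT_KHZ": clk_khz_in * clk_in_mult // clk_in_div,
    }


if __name__ == "__main__":
    print(rpll_params(50000, 32, 4))
    print(rpll_params(25000, 32, 2))
```

Both calls give a 400000 kHz (400 MHz) DDR3 CK, so a 25 MHz source only needs CLK_IN_DIV changed from 4 to 2.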
« Last Edit: August 29, 2022, 12:05:21 am by BrianHG »
 

BrianHG (Topic starter)
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #195 on: August 29, 2022, 12:04:13 am »
Is there really no function in Verilog to accept an integer and convert it to a string format so it can be passed to a parameter ? Seems like a great potential extension to $sformat(), or some compile-time equivalent if it doesn't do it now.

My quote:
Quote
Ok, when I said strong-armed, I meant the stupid 'localparam Altera_Dummy_String' I had to create as this is a parameter limitation bug in Altera's 20 year old pll primitive and their HDL design team's issues expecting a number encoded as a string embedded into a 64 bit integer, and in more than one place.  (Do not ask, I do not want to go into this BS hell.)
 

SpacedCowboy
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #196 on: August 29, 2022, 12:05:11 am »
Only got as far as the first two lines. I'm going to stop talking now, it's just getting embarrassing.

Ok, when I said strong-armed, I meant the stupid 'localparam Altera_Dummy_String' I had to create as this is a parameter limitation bug in Altera's 20 year old pll primitive and their HDL design team's issues expecting a number encoded as a string embedded into a 64 bit integer, and in more than one place.  (Do not ask, I do not want to go into this BS hell.)

 :-DD
 

BrianHG (Topic starter)
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #197 on: August 30, 2022, 01:56:18 am »
This `ifdef shouldn't be necessary: Gowin/gowin_ddr_clocking.sv#L212

Did you try renaming the word 'string' to 'int' on this line: Gowin/gowin_ddr_clocking.sv#L65

This is what I had to do for Altera's PLL: BrianHG_DDR3/BrianHG_DDR3_PLL.sv#L193

And right after, I had to: BrianHG_DDR3/BrianHG_DDR3_PLL.sv#L278

Then I sent the 'DDR3_WDQ_PHASE_pss' into the PLL's parameters.
(Obviously, you would choose your own names.)

The key factor in making both ModelSim and Quartus Prime happy when they compile is that my dummy string array = '{"xxxx"} was a 'localparam int', and I made a new 'localparam param_to_be_sent' = dummy_string_array[sel#];.

If I did not go through this, even though ModelSim would accept the numbers for simulation, Quartus Prime would compile away without saying a thing, silently accepting a default "0" for that parameter.

------
Note, for generating your: Gowin/gowin_ddr_clocking.sv#L191 as a string, just copy & rename my int Altera_Dummy_String, except, trim it to 0-255 as this will probably be the frequency range input.

I have a feeling the "xxx" for the Gowin's frequency input may be because they might accept fractions, like "14.31818".  This might make things messy.  Not all FPGA compilers can accommodate 'real' floating point numbers.  Anyway, you will need to compile a simple PLL clock in Gowin to see if everything works out OK.
« Last Edit: August 30, 2022, 02:04:48 am by BrianHG »
 

SpacedCowboy
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #198 on: August 30, 2022, 05:34:21 am »
This `ifdef shouldn't be necessary: Gowin/gowin_ddr_clocking.sv#L212

Did you try renaming the word 'string' to 'int' on this line: Gowin/gowin_ddr_clocking.sv#L65

This is what I had to do for Altera's PLL: BrianHG_DDR3/BrianHG_DDR3_PLL.sv#L193

And right after, I had to: BrianHG_DDR3/BrianHG_DDR3_PLL.sv#L278

Then I sent the 'DDR3_WDQ_PHASE_pss' into the PLL's parameters.
(Obviously, you would choose your own names.)

The key factor in making both ModelSim and Quartus Prime happy when they compile is that my dummy string array = '{"xxxx"} was a 'localparam int', and I made a new 'localparam param_to_be_sent' = dummy_string_array[sel#];.

If I did not go through this, even though ModelSim would accept the numbers for simulation, Quartus Prime would compile away without saying a thing, silently accepting a default "0" for that parameter.

Worked like a charm  :-+ - no synthesis errors and modelsim is happy too.

[edit] And having said that, and posted it, I immediately noticed that the write-clock duty-cycle isn't working, it's always falling at 0 now, not at (phase + 8/16). I'll look at that...

------
Note, for generating your: Gowin/gowin_ddr_clocking.sv#L191 as a string, just copy & rename my int Altera_Dummy_String, except, trim it to 0-255 as this will probably be the frequency range input.

I have a feeling the "xxx" for the Gowin's frequency input may be because they might accept fractions, like "14.31818".  This might make things messy.  Not all FPGA compilers can accommodate 'real' floating point numbers.  Anyway, you will need to compile a simple PLL clock in Gowin to see if everything works out OK.

I'm not actually having any problems with the input frequency for the clock; both simulation and synthesis seem quite happy with the expression as-is. I did back-port the gowin_ddr_clocking changes to the PLL test project and compared the ModelSim output for the Altera PLL with the Gowin one, and they match up (see attached example).

The timing report in the Gowin synthesis seems to indicate that it understood the ask for the PLL outputs - again, see attached. I think this is a good indicator it's doing "the right thing".
« Last Edit: August 30, 2022, 05:37:39 am by SpacedCowboy »
 

BrianHG (Topic starter)
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #199 on: August 30, 2022, 05:54:41 am »
Umm, shouldn't the 133MHz be 100MHz?
Maybe a typo in the divider.
Or if it's another output, maybe set it to 100MHz and place it on the first PLL reserving the write clock to PLL2.
(This is a plus in the future as you may use the 'delay' output primitive for the write data, removing the need for the second PLL altogether.  Though, get my DDR3 working first as intended.)

Also for the duty cycle, why not just hard write it in the parameter line just as if it came from the rPLL GUI generator?

Next, take a look at IO buffer primitives, then, the DDR primitives to drive those buffers.

DDR3_CK needs a differential LVDS output.  (May also be 2 output pin buffers, but this is lower quality.)
DDR3_DQS needs a differential LVDS bidirectional.  (May also be 2 bidir pin buffers, but this is lower quality.)
« Last Edit: August 30, 2022, 05:58:35 am by BrianHG »
 

SpacedCowboy
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #200 on: August 30, 2022, 06:09:30 am »
That's better. Fixed the phase. I'll upload the changes in the morning (to the quick-test) and copy gowin_ddr_clocking.sv over to the full repo after I've tested it out there too. It ought to just drop in, but ... famous last words...
 

SpacedCowboy
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #201 on: August 30, 2022, 06:22:36 am »
Umm, shouldn't the 133MHz be 100MHz?
Maybe a typo in the divider.

Yeah, they worried me too at first, but it's just the CLKOUTD3 - fixed division of the CLKOUT by 3. Notice there's 8 clocks listed there, and I only want 5 of them - I burn a zero-phase-offset 400MHz output (because I get 2 of them) and I get two 133's "for free" as the /3 of the 400's.

Or if it's another output, maybe set it to 100MHz and place it on the first PLL reserving the write clock to PLL2.
(This is a plus in the future as you may use the 'delay' output primitive for the write data, removing the need for the second PLL altogether.  Though, get my DDR3 working first as intended.)

Sadly, there's no configuration for that output - it's /3 and that's your lot.


Also for the duty cycle, why not just hard write it in the parameter line just as if it came from the rPLL GUI generator?
There's a comment in the code about the write-clock being 90 or 270 degrees out of phase - maybe it's always 270 ?, anyway, it doesn't take much, the table is only 16 entries long, so I just created another 16-entries table and offset it by the phase.
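
The two 16-entry tables can be cross-checked with a small Python sketch. It assumes the duty-cycle setting is the phase step plus 8 sixteenths (a 50% duty cycle), wrapped to 4 bits, as implied by "(phase + 8/16)" above and by the generated pair PSDA_SEL = "0000", DUTYDA_SEL = "1000". The function name is hypothetical and the real Gowin encoding should be confirmed in simulation.

```python
def phase_and_duty(ddr3_wdq_phase_deg: int) -> tuple:
    """Return (PSDA_SEL, DUTYDA_SEL) strings for a write-clock phase in degrees."""
    # Phase step in sixteenths of a clock period.
    phase_set = ((ddr3_wdq_phase_deg * 16) // 360) & 0xF
    # Assumed: falling edge sits half a period (8 steps) after the shifted rising edge.
    duty_set = (phase_set + 8) & 0xF
    return format(phase_set, "04b"), format(duty_set, "04b")


if __name__ == "__main__":
    for deg in (0, 90, 270):
        print(deg, phase_and_duty(deg))
```

A fixed DUTYDA_SEL only gives a 50% duty cycle at one phase setting under this assumption, which is why a second table offset by the phase is used here instead of a hard-written value.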

Next, take a look at IO buffer primitives, then, the DDR primitives to drive those buffers.

DDR3_CK needs a differential LVDS output.  (May also be 2 output pin buffers, but this is lower quality.)
DDR3_DQS needs a differential LVDS bidirectional.  (May also be 2 bidir pin buffers, but this is lower quality.)

Yep, I've been looking at that a little. There's not too much to configure for DDR - I don't see any Differential DDR for example, the tool offers as below. I've done (this is all pre-testing) the command bus signals, which are relatively straightforward, and I implemented the n{CK} as 2 output buffers each with complementary drive. I'm not sure I have much in the way of options here...
 

BrianHG (Topic starter)
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #202 on: August 30, 2022, 06:28:20 am »
The DDR doesn't drive the PIN, it is the IO buffers which drive the pins.
The DDR <-> IO buffers <-> pin.
Read your Gowin documentation and you will understand the DDR primitive outputs, where they are tied to.
All the different types of IO buffers are right at the beginning of the primitives.

(I know Altera stuffed these 2 into a single primitive, Gowin does not...)
« Last Edit: August 30, 2022, 06:40:32 am by BrianHG »
 

SpacedCowboy
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #203 on: August 30, 2022, 06:39:51 am »
I see, so chain the DDR output through a TLVDS_OBUF (or ELVDS_OBUF) for outputs, and *_TBUF for bidirectional signals. Ok, that makes sense, I’ll try integrating that into what I have.

Anyway (yawn) I’m done for the day, I’m up in 6 hours…
 

SpacedCowboy
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #204 on: August 31, 2022, 02:39:47 am »
So here's a question...

As given, the DDR input/output construct from Gowin doesn't have any obvious built in way to set a different clock on the input and output routes, like the Altera one seems to have.

Code: [Select]
`timescale 100 ps/100 ps
module Gowin_DDR (
  din,
  tx,
  clk,
  io,
  q
)
;
input [1:0] din;
input [0:0] tx;
input clk;
inout [0:0] io;
output [1:0] q;
wire [0:0] iobuf_o;
wire [0:0] ddr_inst_q0;
wire [0:0] ddr_inst_q1;
wire VCC;
wire GND;
  IOBUF \ddr_gen[0].iobuf_inst  (
    .O(iobuf_o[0]),
    .IO(io[0]),
    .I(ddr_inst_q0[0]),
    .OEN(ddr_inst_q1[0])
);
  ODDR \ddrx1_gen[0].oddr_inst  (
    .Q0(ddr_inst_q0[0]),
    .Q1(ddr_inst_q1[0]),
    .D0(din[0]),
    .D1(din[1]),
    .TX(tx[0]),
    .CLK(clk)
);
  IDDR \ddrx1_gen[0].iddr_inst  (
    .Q0(q[0]),
    .Q1(q[1]),
    .D(iobuf_o[0]),
    .CLK(clk)
);
  VCC VCC_cZ (
    .V(VCC)
);
  GND GND_cZ (
    .G(GND)
);
  GSR GSR (
    .GSRI(VCC)
);
endmodule /* Gowin_DDR */


I could, of course, pass in the 2 clocks and just wire up the IDDR to the read-clock and the ODDR to the write-clock. The obvious problem is that the direction control OEN on the IOBUF, which is linked to Q1 on the ODDR, which is in turn driven by TX on the input, is always going to be in the phase of the write-clock.

The question is: does this matter ?

Looking at the signalling for DDR3, it seems there's a whole bunch of cycles before DQ is read/written or DQS* are asserted, and successive READ and WRITE ops will presumably need that preamble, so if the TX is set during that preamble, and held for the duration of the operation, I'll be fine just wiring up the data to the correct clocks, I think.

Of course, I could be wrong about that, and it's also possible that TX is controlled per-cycle, hence the question...
 

BrianHG (Topic starter)
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #205 on: August 31, 2022, 02:58:21 am »
Don't worry about the details yet.

Make a bit width sizable ddr bidir with 2 clocks, 1 in, 1 out.
The DDR output should take in the OE in parallel with its data and posedge clock out.
The read side does nothing but DDR read the t pin buffer.

We will worry about the 'half-clock' delayed OE feature I use with Altera later.
(I use it to half clock early turn on the OE and half clock late turn it off, expanding the OE's timing for severe overclocking performance.  I'm sure we can adapt Gowin, or just use the parallel OE without my expansion feature.)

All DDR buffers run exclusively on positive edge clocked logic.  Otherwise, what is the point of the DDR primitive?  Otherwise I could have done half my buffers in posedge clk and the other half in negedge clk, never used any DDR primitives on any FPGA, and the world would be perfect.


If you are not sure how the Altera ddr buffer works, you can add the port IO traces to the sim and watch, or go to Altera's data sheet I posted a few threads ago.
« Last Edit: August 31, 2022, 03:01:33 am by BrianHG »
 

BrianHG (Topic starter)
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #206 on: August 31, 2022, 03:03:11 am »
Remember, Gowin's DDR out has an input for TX enable and passes that signal to the OEn for the tristate buffer.
 

SpacedCowboy
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #207 on: September 01, 2022, 12:04:22 am »
So I figured out what I thought I was going to do (it's easier to draw for 2-bits rather than N*2-bits)...

Code: [Select]
     ┌─────────────┐      ┌───────────┐      ┌─────────────┐         
     │    Module   │      │   IDDR    │      │ TLVDS_IOBUF │         
     │             │      │           │      │             │         
     │             │      │           │      │             │         
────▶│    rdClk    │─────▶│ CLK       │      │             │         
     │             │      │           │      │             │         
     │             │      │    D [DDR]│◀─────┤ O [DDR]     │         
     │             │      │           │      │             │         
◀────┤rdData[1:0]  │ ◀────┤Q[1:0]     │      │             │         
     │             │      │(SDR)      │      │             │         
     │             │      └───────────┘      │             │  IO+   
     │             │                         │             ├────▶   
     │             │      ┌───────────┐      │             │         
     │             │      │   ODDR    │      │             │         
     │             │      │           │      │             ├────▶   
────▶│    wrClk    │─────▶│ CLK       │      │             │  IO-   
     │             │      │           │      │             │         
────▶│   txEnable  │─────▶│ TX        │      │             │         
     │             │      │           │      │             │         
     │             │      │   Q1 [DDR]├─────▶│OEN (DDR)    │         
     │             │      │           │      │             │         
     │             │      │   Q0 [DDR]├─────▶│I [DDR]      │         
     │             │      │           │      │             │         
────▶│wrData[1:0]  │─────▶│Q[1:0]     │      │             │         
     │             │      │(SDR)      │      │             │         
     └─────────────┘      └───────────┘      └─────────────┘         

... and then for N*2 bits, the last 2 columns are replicated by N with the data routed appropriately.

And then I looked closer at the DQS signaling inside the generate(..) and it looks as though you never actually read the signal from the DDR3 when it's in read-mode anyway :) 

Code: [Select]
         ...
        .inclock         (DDR_CLK_RDQ),            .outclock        (DDR_CLK),
        .dout            ({RDQS_ph[x],RDQS_pl[x]}),.din             ( 2'b10 ),   
        .pad_io         (DDR3_DQS_p[x]),          .pad_io_b        (DDR3_DQS_n[x]),              .oe         (OE_DQS[x]),
         ...

So, clock pins {_p,_n} are write-only, DQS pins {_p,_n} are really write-only (the read port outputs 2'b10). The other pins on the chip can be DDR, but they're not differential.
 

BrianHG (Topic starter)
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #208 on: September 01, 2022, 01:01:10 am »

And then I looked closer at the DQS signaling inside the generate(..) and it looks as though you never actually read the signal from the DDR3 when it's in read-mode anyway :) 


Ohhh, yes I do...

#1) Differential LVDS IO pin version, MAX IO : BrianHG_DDR3_IO_PORT_ALTERA.sv#L278

#2) Emulated LVDS IO pin version, old Cyclone IO : BrianHG_DDR3_IO_PORT_ALTERA.sv#L390

#3) Reading in the DQ data to my primary shift register : BrianHG_DDR3_IO_PORT_ALTERA.sv#L507

#4) Reading in the DQS status to my primary shift register : BrianHG_DDR3_IO_PORT_ALTERA.sv#L511

#5) AND HERE IS THE KICKER -> Qualifying the beginning of the read location by reading the DQS contents : BrianHG_DDR3_IO_PORT_ALTERA.sv#L524

If you are using the differential pad, then you need to use the MAX buffer #1), if you are using emulated differential pad (more compatible), then use the Cyclone buffer #2).

(Also, my 2 bit array 'RDQ_POS' is the only 2 bit counter in this 400MHz section.  There is no other logic or tests of bits which exceed 2 bits.  Everything else is 1bit along a fixed parameter position of shift registers.)
« Last Edit: September 01, 2022, 01:11:39 am by BrianHG »
 

SpacedCowboy
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #209 on: September 01, 2022, 03:20:17 am »
Ah, ok. I had the sense of the signals the wrong way around (equating .din with read). It makes sense that to-DDR3 should be a fixed clock and from-DDR3 is read in.

Ok, well the diagram it is, then :)
 

BrianHG (Topic starter)
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #210 on: September 01, 2022, 03:25:26 am »
Outclock samples the outdata and oe to transmit the outdata.
Inclock clocks the data coming in from the IO pin to the indata.

I'm assuming Gowin's DDR in[0] is the low data and in[1] is the high data.  I think in[1] is presented first, or is delayed after outputting the first in[0].  I guess we need to see a timing diagram example.
« Last Edit: September 01, 2022, 03:28:59 am by BrianHG »
 

SpacedCowboy
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #211 on: September 01, 2022, 03:29:29 am »
Yep, that's what I'm trying to convey above. Given the Gowin setup with TX,Q on ODDR being sampled using the same clock, it's harder *not* to do that.
 

BrianHG (Topic starter)
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #212 on: September 01, 2022, 03:35:07 am »
Outclock samples the outdata and oe to transmit the outdata.  (called din on the altera primitive)
Inclock clocks the data coming in from the IO pin to the indata. (called dout on the altera primitive)

I'm assuming Gowin's DDR in[0] is the low data and in[1] is the high data.  I think in[1] is presented first, or is delayed after outputting the first in[0].  I guess we need to see a timing diagram example.

I hate this confusing port-naming BS!
They should have called it TX data and RX data if they really wanted to not confuse things.
« Last Edit: September 01, 2022, 03:51:25 am by BrianHG »
 

BrianHG (Topic starter)
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #213 on: September 01, 2022, 11:20:49 pm »
Well?  You did the PLL fast and the DDR is a fraction of the complexity...
 

SpacedCowboy
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #214 on: September 02, 2022, 12:28:23 am »
It's funny. I found the PLL much easier to grok than the DDIO/LVDS stuff - I guess I've instantiated PLLs dozens of times (if not with Gowin) but never actually used DDIO before ...

I'll get back to it tomorrow evening, but tomorrow @ 9AM I have to (at least partly) present the 2023 feature-roadmap to a bunch of VPs/SVPs (most of whom want to go in different directions from each other, which will make the meeting ... interesting) It's been a busy week of finalizing cross-functional support/dependencies and horse-trading features, loaning out engineers if necessary, etc. etc.
 

SpacedCowboy
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #215 on: September 04, 2022, 02:47:33 am »
So, some progress - see "offset-signals" below. Altera signals in green, gowin equivalents in cyan directly underneath.

It's certainly not working yet. The Gowin primitives seem to add a couple of clocks of latency between input and output, which is expected behavior from the GPIO user guide - extract below as gowin-oddr-timing.

There are also issues with polarity, and I'm not completely sure I've tracked all those down yet. The gowin OE signal (they call it TX) is output-active on tx=1'b0, not 1'b1. The LVDS channels seem to be inverted as well, but that might be a corollary of the timing being off.

In any event, the simulation isn't returning data yet, but it's enough for today. Code is checked in.
 

Online BrianHGTopic starter

« Reply #216 on: September 04, 2022, 03:01:45 am »
Phase and delay clock offsets don't matter.
This is why I pass absolutely everything through the DDR when feeding the DDR3 ram chips.  If the command is delayed with the clocks, then the data going out will match the delay as well.  Your biggest issue should only be getting the [0:1]  input orientation correct so that the commands out look correct.  (The orientation for every DDR out should match.  The read input may have this orientation backwards.)

There is a calibration clock cycle offset for reading in the data, but let's worry about that one second.
Your first goal is to feed the first MRS commands at startup.

Then we see if the read will need a + or - 1 cycle tuning.  My code has room for - 2 clocks, + >4 clocks levels of adjustment.
What we will do here is add a localparam Gowin_read_correction = x  and add it to my set offset when the right Gowin FPGA is set for the vendor parameter.

There is one other adjustment regarding the OE cycle delay.  Once again I have tuning controls for this, but I would like to try to replicate the 'half' clock cycle I'm using with Altera, though it is not mandatory.  You will need to check Gowin's TX command and see if its output can be shifted to activate on the rise or fall of the reference CLK in's output time slot.  Altera has a parameter for their DDR to choose this optional delay.  Note that all commands going into the DDR primitive should always be active on the rising clock.
« Last Edit: September 04, 2022, 03:39:20 am by BrianHG »
 

Online BrianHGTopic starter

« Reply #217 on: September 04, 2022, 03:23:10 am »
Note that because of the read clock calibration, my code can get stuck in the tuning power-up sequence stage, but first let's make the commands output look correct.

Second, Gowin's txclk_pol parameter appears to perform the 'half-shift' operation for the OE which I was describing above.
 

Offline SpacedCowboy

« Reply #218 on: September 04, 2022, 07:35:48 pm »
Okay, so everything seems to start off ok, we get the following command sequence from startup, see 'MRS2 signals' for an example...

Code: [Select]
MRS, BA=2: Write MR2 to 16’h0000
- A[10:9] = 2’b0 Dynamic ODT off
- A[7] = 1’b0 Normal operating temperature range
- A[6] = 1’b0 Manual Self-refresh reference
- A[5:3] = 3’b0 CAS write latency of 5 (>= 2.5ns)
- A[2:0] = 3’b0 Full array self-refresh

MRS, BA=3: Write MR3 to 16’h0000
- A[2] =1’b0 MPR = normal operation
- A[1:0] = 2’b0 (ignored anyway)

MRS, BA=1: Write MR1 to 16’h0044
- A[12] = 1’b0 Output buffer enabled
- A[11] = 1’b0 TDQS disabled
- A[9,6,2] = 3’b011 Rtt_Nom = RZQ/6
- A[7] = 1’b0 Write leveling disabled
- A[5,1] = 2’b0 Output driver impedance = RZQ/6
- A[0] = 1’b0 DLL enabled

MRS, BA=0: Write MR0 to 16’h1720
- A[12] = 1’b1 DLL control for Precharge, fast-exit (DLL on)
- A[11:9] = 3’b011 Write recovery for autoprecharge = 7 cycles
- A[8] = 1’b1 DLL Reset = Yes
- A[7] = 1’b0 Test mode disabled
- A[6:4,2] = 4’b0100 CAS latency = 6
- A[3] = 1’b0 Nibble sequential read burst type
- A[1:0] = 2’b00 Burst length 8 (fixed)

ZQC
- A[10] = 1’b1 Do a long ZQC calibration

MRS, BA=3: Write MR3 to 16’h0004
- A[2] = 1’b1 Dataflow from MPR
- A[1:0] = 2’b00 Predefined system timing calibration pattern

REA, BA=0, A=16’h1000
- A[12] = 1’b1 use regular burst length of 8
But the last command above (read) hits a problem. Signals attached ('REA command signaling'). Looks like DQS/DQ just aren't doing anything in the Gowin world. Could be I've just wired it wrong, but I'll have a look later-on today / this evening.
 

Online BrianHGTopic starter

« Reply #219 on: September 04, 2022, 08:15:50 pm »
@SpacedCowboy, it is no longer of any use to have side-by-side Gowin vs Altera comparisons.
Make the system Gowin only so that it matches what my Altera only sims look like.
(Other than the 2 additional clocks of pipeline delay in Gowin's DDR buffer.  :( 2 clocks... I thought maybe 1 or 1/2, but a full 2 clock delay.)
The default regs I have on the waveform display should tell you 99% of what you need to know.

Once this is organized, we need to see why Micron's DDR3 data output during the read-cal test pattern [FFFF,0000,FFFF,0000,FFFF,0000] isn't getting through Gowin's IO buffer read input.

The best way is to run 2x Modelsim, 1 with my original Altera, the other with your build.
Align the display and zoom on both and inspect the read buffer results which are included in my default display.

Remember, a 2 clock delay means the entire read pattern is completely missed.  And, if Gowin's DDR input primitive also adds another 2 clocks of delay, then we need to add 4 clocks to my DDR3 controller's read reference window.  We can compare this with the 2 different snapshots once you have cleaned up your current setup.
« Last Edit: September 04, 2022, 08:33:05 pm by BrianHG »
 

Offline SpacedCowboy

« Reply #220 on: September 05, 2022, 12:08:29 am »
Quote
@SpacedCowboy, it is no longer of any use to have side-by-side Gowin vs Altera comparisons.
Make the system Gowin only so that it matches what my Altera only sims look like.

(Other than the 2 additional clocks of pipeline delay in Gowin's DDR buffer.  :( 2 clocks... I thought maybe 1 or 1/2, but a full 2 clock delay.)
The default regs I have on the waveform display should tell you 99% of what you need to know.

Once this is organized, we need to see why Micron's DDR3 data output during the read-cal test pattern [FFFF,0000,FFFF,0000,FFFF,0000] isn't getting through Gowin's IO buffer read input.

Yes, on further examination, at least some (maybe most) of the problem is that I'm trying to mimic the Altera signals for comparison, but those mimicked signals don't route into subsequent logic. I'm separating stuff out as you suggest. For the time being I'll make a similar module to BrianHG_DDR3_IO_PORT_ALTERA.sv (probably called DDR3_IO_PORT_GOWIN.sv) and put the Gowin code in there. There is a little bit of probably-duplicated code at the end, but maybe that's extractable as a module in its own right.

Quote
The best way is to run 2x Modelsim, 1 with my original Altera, the other with your build.
Align the display and zoom on both and inspect the read buffer results which are included in my default display.

Yep, I have that setup now. Large monitors are great :)

Quote
Remember, a 2 clock delay means the entire read pattern is completely missed.  And, if Gowin's DDR input primitive also adds another 2 clocks of delay, then we need to add 4 clocks to my DDR3 controller's read reference window.  We can compare this with the 2 different snapshots once you have cleaned up your current setup.

Noted.

I think I hadn't quite appreciated the logic for DQS read, though - my original idea was that it looked like this

Code: [Select]

 rDQS[1:0]▲        rClk│                              2'b10 │    TX│  wClk│     
          │            ▼                                    ▼      ▼      ▼     
       ┌───────────────────┐                             ┌───────────────────┐   
       │       IDDR        │                             │       ODDR        │   
       └───────────────────┘                             └───────────────────┘   
                ▲                                    OEN(DDR)│          │I(DDR) 
                │O(DDR)                                      ▼          ▼       
       ┌────────┴────────────────────────────────────────────────────────────┐   
       │                             TLVDS_IOBUF                             │   
       └─────────────────────────────────────────────────────────────────────┘   
                                      IO+│   │IO-                               
                                         │   │                                   
                                         ▼   ▼                                   

But in fact, from reading the verilog, you want to preserve 4 signals for the captured (rDQS above) DQS from the DDR3 - it wants _p and _n for both bits of the DDR signal (l and h). The code uses the _p and _n signals to determine a zero-state, so the layout really wants to look like...

Code: [Select]
                                                                                 
rDQS_n[1:0]▲        rClk│    rDQS_p[1:0]▲    rClk│     2'b10 │    TX│  wClk│     
           │            ▼               │        ▼           ▼      ▼      ▼     
        ┌───────────────────┐    ┌───────────────────┐    ┌───────────────────┐   
        │       IDDR        │    │       IDDR        │    │       ODDR        │   
        └───────────────────┘    └───────────────────┘    └───────────────────┘   
                 ▲                        ▲           OEN(DDR)│          │I(DDR) 
                 │IO-                     │IO+                ▼          ▼       
                 │                        │               ┌───────────────────┐   
                 │                        │               │    TLVDS_OBUF     │   
                 │                        │               └───────────────────┘   
                 │                        │                   IO+│   │IO-         
                 │                        └──────────────────────┤   │           
                 │                                               ▼   │           
                 └───────────────────────────────────────────────────┤           
                                                                     ▼           

I'm hoping there won't be any problems connecting an IDDR to the same pad as an LVDS_OBUF - there's only one driver of the signals, but the IDDR primitive says it needs to connect to an IBUF 'I' (not 'IO/IOB') port. I'm not sure if I'll have "used up" my IBUF for that pad by instantiating the IOBUF, next up is to see if that's the case...
 
 

Online BrianHGTopic starter

« Reply #221 on: September 05, 2022, 12:26:28 am »

Quote
Code: [Select]

 rDQS[1:0]▲        rClk│                              2'b10 │    TX│  wClk│     
          │            ▼                                    ▼      ▼      ▼     
       ┌───────────────────┐                             ┌───────────────────┐   
       │       IDDR        │                             │       ODDR        │   
       └───────────────────┘                             └───────────────────┘   
                ▲                                    OEN(DDR)│          │I(DDR) 
                │O(DDR)                                      ▼          ▼       
       ┌────────┴────────────────────────────────────────────────────────────┐   
       │                             TLVDS_IOBUF                             │   
       └─────────────────────────────────────────────────────────────────────┘   
                                      IO+│   │IO-                               
                                         │   │                                   
                                         ▼   ▼                                   

But in fact, from reading the verilog, you want to preserve 4 signals for the captured (rDQS above) DQS from the DDR3 - it wants _p and _n for both bits of the DDR signal (l and h). The code uses the _p and _n signals to determine a zero-state, so the layout really wants to look like...
Your above illustration here is the proper differential one.  My code actually ignores the _n in the read of the DQS.  However, you should still create the _n variant of the read signal with 'assign _n = !_p', so that if I did use it, that logic will be properly pruned out of my design yet still function.
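
A minimal sketch of that placeholder wiring, assuming the Gowin port module exposes the same RDQS_ph / RDQS_pl / RDQS_nh / RDQS_nl names as the Altera one. The module name and the WIDTH parameter are made up for illustration and are not part of the controller:

```systemverilog
// Hypothetical helper: derive the '_n' read strobes from the '_p' ones so that
// any downstream logic still referencing them sees consistent data.
module rdqs_n_placeholder #(
    parameter int WIDTH = 2                 // assumed: one DQS per byte lane
)(
    input  wire [WIDTH-1:0] RDQS_ph,        // differential DQS sample, high half
    input  wire [WIDTH-1:0] RDQS_pl,        // differential DQS sample, low half
    output wire [WIDTH-1:0] RDQS_nh,        // placeholder, always the inverse of _ph
    output wire [WIDTH-1:0] RDQS_nl         // placeholder, always the inverse of _pl
);
    assign RDQS_nh = ~RDQS_ph;
    assign RDQS_nl = ~RDQS_pl;
endmodule
```

Because the _n outputs are a pure function of the _p inputs, a synthesizer is free to fold them away wherever they turn out to be unused.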

Quote
Code: [Select]
                                                                                 
rDQS_n[1:0]▲        rClk│    rDQS_p[1:0]▲    rClk│     2'b10 │    TX│  wClk│     
           │            ▼               │        ▼           ▼      ▼      ▼     
        ┌───────────────────┐    ┌───────────────────┐    ┌───────────────────┐   
        │       IDDR        │    │       IDDR        │    │       ODDR        │   
        └───────────────────┘    └───────────────────┘    └───────────────────┘   
                 ▲                        ▲           OEN(DDR)│          │I(DDR) 
                 │IO-                     │IO+                ▼          ▼       
                 │                        │               ┌───────────────────┐   
                 │                        │               │    TLVDS_OBUF     │   
                 │                        │               └───────────────────┘   
                 │                        │                   IO+│   │IO-         
                 │                        └──────────────────────┤   │           
                 │                                               ▼   │           
                 └───────────────────────────────────────────────────┤           
                                                                     ▼           

I'm hoping there won't be any problems connecting an IDDR to the same pad as an LVDS_OBUF - there's only one driver of the signals, but the IDDR primitive says it needs to connect to an IBUF 'I' (not 'IO/IOB') port. I'm not sure if I'll have "used up" my IBUF for that pad by instantiating the IOBUF, next up is to see if that's the case...

For the DQS, no, do not use this illustration.  This use of the IO buffer will bypass the differential input receiver which has improved noise and level immunity.

Also, for the DQ, please try to use the IO buffer's proper read back port.
« Last Edit: September 05, 2022, 12:30:15 am by BrianHG »
 

Online BrianHGTopic starter

« Reply #222 on: September 05, 2022, 01:01:03 am »
I still think it would have been easier for you to make your:

My_Gowin_IO_DDR_buffer
&
My_Gowin_IO_DDR_buffer_differential

With a parameter of bits_width and shove it into my IO port code under another selection of FPGA type besides the MAX_10 fpga and cyclone type fpga.
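
As a rough sketch only, a width-parameterized single-ended buffer along those lines might look like the following. ODDR, IDDR and IOBUF are Gowin library primitives (so simulation needs Gowin's primitive library), the port connections follow the instantiations posted in this thread, and every other name here (the clock, data and pad ports) is an assumption for illustration:

```systemverilog
// Hypothetical wrapper: one ODDR/IDDR/IOBUF set per bit, built with generate.
// Requires Gowin's primitive simulation library for ODDR, IDDR and IOBUF.
module My_Gowin_IO_DDR_buffer #(
    parameter int bits_width = 16
)(
    input  wire                  clk_wr,    // write (output) clock
    input  wire                  clk_rd,    // read (input) clock
    input  wire [bits_width-1:0] oe,        // 1 = drive the pad
    input  wire [bits_width-1:0] wdata_h,   // write data, high half
    input  wire [bits_width-1:0] wdata_l,   // write data, low half
    output wire [bits_width-1:0] rdata_h,   // read data, high half
    output wire [bits_width-1:0] rdata_l,   // read data, low half
    inout  wire [bits_width-1:0] pad        // bidirectional IO pins
);
    genvar x;
    generate
        for (x = 0; x < bits_width; x = x + 1) begin : ddr_bit
            wire d_out, d_in, oen;

            // Gowin's TX is output-active when 1'b0, hence the inversion.
            ODDR oddr_inst (.Q0(d_out), .Q1(oen),
                            .D0(wdata_h[x]), .D1(wdata_l[x]),
                            .TX(~oe[x]), .CLK(clk_wr));

            IDDR iddr_inst (.Q0(rdata_h[x]), .Q1(rdata_l[x]),
                            .D(d_in), .CLK(clk_rd));

            IOBUF iobuf_inst (.O(d_in), .IO(pad[x]),
                              .I(d_out), .OEN(oen));
        end
    endgenerate
endmodule
```

The differential variant would presumably swap IOBUF for the TLVDS buffer while keeping the same generate structure.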
 

Online BrianHGTopic starter

« Reply #223 on: September 05, 2022, 01:17:30 am »
Take a look at my DQS DDR IO primitive here:
https://github.com/BrianHGinc/BrianHG-DDR3-Controller/blob/c901baa0c41ae46389940ae729cc772c8d40a8f1/BrianHG_DDR3/BrianHG_DDR3_IO_PORT_ALTERA.sv#L282

See, I do not use the _n which I have here:
https://github.com/BrianHGinc/BrianHG-DDR3-Controller/blob/c901baa0c41ae46389940ae729cc772c8d40a8f1/BrianHG_DDR3/BrianHG_DDR3_IO_PORT_ALTERA.sv#L393

The RDQS_nh[] and RDQS_nl[] are dummy placeholders for the old Cyclone 'software differential' style buffer since it cannot use true differential like the first MAX10 DDR primitive in this bidir case.
 

Offline SpacedCowboy

« Reply #224 on: September 05, 2022, 01:45:49 am »
Good. The original sketch seemed "neater" to me anyway.

You're probably right about the approach - but Gowin IDDR/ODDR don't take vector arguments as the Altera ones seem to do. I could probably have bunched them up and used generate though, and kept it a little more generic. I'm generally of the school of thought "get it working" before "get it working well" and since there's a lot here that's new, keeping it simple where I can is "a good idea", at least to start off with. I think your abilities in the FPGA realm are orders of magnitude better than my own, so it's not surprising that we might tackle things differently :)

 

Online BrianHGTopic starter

« Reply #225 on: September 05, 2022, 01:51:45 am »
Quote
Good. The original sketch seemed "neater" to me anyway.

You're probably right about the approach - but Gowin IDDR/ODDR don't take vector arguments as the Altera ones seem to do. I could probably have bunched them up and used generate though, and kept it a little more generic. I'm generally of the school of thought "get it working" before "get it working well" and since there's a lot here that's new, keeping it simple where I can is "a good idea", at least to start off with. I think your abilities in the FPGA realm are orders of magnitude better than my own, so it's not surprising that we might tackle things differently :)
That's the way.
Even I use the generate for the DQ to gain access to separate OEs per bit on the Cyclone DDR primitive.  The MAX10 primitives already use 1 OE per data bit.
 

Offline SpacedCowboy

« Reply #226 on: September 05, 2022, 07:12:09 pm »
Well, I'll go back and clean it up later - for the time being it actually fits reasonably well into the FPGA_VENDOR generate in BrianHG_DDR3_PHY_SEQ_v16.sv with a 'Gowin' section.

Anyways the code is in, and up if not actually running yet - we don't get past calibration (the simulator goes into an infinite loop - the first time I let it go for a while, thinking it was that macro in the testbench, before realization dawned...)

So the compared traces are below for the REA command after MRSs and ZQC. Warning: Image is a bit big... Gowin on top, Altera below, though the red traces ought to make that fairly clear :)

The DDR3_DQ offset seems pretty clear. FWIW, I tried adding the "shift-output-by-half-a-clock" defparam to both DQ and DQS...

Code: [Select]
            ...
            ODDR gowin_dq_oddr_inst 
                (
                .Q0(gowin_dq_out),                  // ODDR -> IOBUF
                .Q1(gowin_dq_tx_out),               // OE   -> IOBUF, 1'b0 => output
                .D0(PIN_WDATA_PIPE_h[0][x]),        // Input data [SDR]
                .D1(PIN_WDATA_PIPE_l[0][x]),        // Input data [SDR]
                .TX(PIN_OE_WDQ_wide[x]),            // Input 'output enable' 1=out
                .CLK(DDR_CLK_WDQ)                   // write clock
                );

            // sync to negedge instead of posedge, 1/2 clock earlier output
            defparam gowin_dq_oddr_inst.TXCLK_POL   = 1'b1;
            ...
... but I didn't see any actual different traces in the sim, the outputs didn't shift across that I could see...

Quote
Then we see if the read will need a + or - 1 cycle tuning.  My code has room for - 2 clocks, + >4 clocks levels of adjustment.
What we will do here is add a localparam Gowin_read_correction = x  and add it to my set offset when the right Gowin FPGA is set for the vendor parameter.

Is this the CMD_ADD_DLY and/or RDQ_SYNC_CHAIN parameters in BrianHG_DDR3_IO_PORT_ALTERA.sv ? Or is it somewhere else ?
 

Online BrianHGTopic starter

« Reply #227 on: September 05, 2022, 07:18:54 pm »
It appears you have a bug with your DQ output enable.
It is stuck with the output 'ON'.

This is why you see the DQ lines always green when = 16'h0000, but when the DDR3 model tries to output a 16'hFFFF, you see a red 16'hXXXX as there is a bus conflict.

This is why my Altera sim shows the blue 16'hZZZZ on the DQ before and after the DDR3 model outputs the read data pattern.

Check for the output enable logic first.
 

Offline SpacedCowboy

« Reply #228 on: September 05, 2022, 08:03:17 pm »
Yep, good call - it was a polarity issue on the gowin TX (1'b0 == enable-output for gowin)

Ok, so DDR3_DQ are now being driven pretty much the same for both Gowin and Altera :) RDQ_h seems to be 2 clocks delayed, RDQ_l and RDQS_p{h,l} are a clock delayed. I'm not yet sure why RDQ_h ought to be different to RDQ_l - it's linked to the IDDR instantiation just the same. Perhaps this is the TXCLK_POL=1'b1 effect...

Code: [Select]
    for (x=0; x<DQ_WIDTH; x = x + 1)
        begin : gowin_DQ_bus

            wire gowin_dq_out;
            wire gowin_dq_in;
            wire gowin_dq_tx_out;

            ODDR gowin_dq_oddr_inst 
                (
                .Q0(gowin_dq_out),                  // ODDR -> IOBUF
                .Q1(gowin_dq_tx_out),               // OE   -> IOBUF, 1'b0 => output
                .D0(PIN_WDATA_PIPE_h[0][x]),        // Input data [SDR]
                .D1(PIN_WDATA_PIPE_l[0][x]),        // Input data [SDR]
                .TX(~PIN_OE_WDQ_wide[x]),           // Input 'output enable' 1'b0=out
                .CLK(DDR_CLK_WDQ)                   // write clock
                );

            // sync to negedge instead of posedge, 1/2 clock earlier output
            defparam gowin_dq_oddr_inst.TXCLK_POL   = 1'b1;

            IDDR gowin_dq_iddr_inst 
                (
                .Q0(RDQ_h[x]),                      // SDR to app #0
                .Q1(RDQ_l[x]),                      // SDR to app #1
                .D(gowin_dq_in),                    // DDR input signal
                .CLK(DDR_CLK_RDQ)                   // read clock
                );

            IOBUF gowin_dq_iobuf_inst
                (
                .O(gowin_dq_in),                    // IOBUF -> IDDR
                .IO(DDR3_DQ[x]),                    // DQ pad
                .I(gowin_dq_out),                   // ODDR -> IOBUF
                .OEN(gowin_dq_tx_out)               // input when 1'b1
                );

        end

Got to run to make lunch for the kids, so I'll get back to it later.

 

Online BrianHGTopic starter

« Reply #229 on: September 05, 2022, 08:25:19 pm »
Quote
Ok, so DDR3_DQ are now being driven pretty much the same for both Gowin and Altera :) RDQ_h seems to be 2 clocks delayed, RDQ_l and RDQS_p{h,l} are a clock delayed. I'm not yet sure why RDQ_h ought to be different to RDQ_l - it's linked to the IDDR instantiation just the same. Perhaps this is the TXCLK_POL=1'b1 effect...

 TXCLK_POL=1'b1 appears to only be an adjustable delay for the output enable.  It has no effect until we write data to the DDR3.

What you have is a DDR_IN primitive configuration problem.
Take a look at the signals under the 'DDR3_PHY Data Path' divider.
These are the signals coming in directly from your DDR IN/OUT primitive, unprocessed.

Ok, for some reason, the blue b'zzzz isn't being fed through, this may be a natural consequence of Gowin's primitives, or, something wrong with your wiring.

Next, carefully analyze the RDQS_ph & RDQS_pl.  Except for the extended '0's, those seem to have correctly read the data being generated by Micron's DDR3 Model and actually match the output results of Altera's DDR in primitive.

Now, for the RDQ_h and RDQ_l output from the Gowin.  We can see that '_l' matches Altera, but, the '_h' is delayed 1 clock cycle.  Is there a set of parameters associated with Gowin's DDR_input primitive?  Maybe one to shift the read register by 180 degrees?  Doing so plus swapping the '_h' and '_l' read output should fix this problem.  If not, we will need to add a clocked DFF delay to all the '_l' reads in all your primitives, then instead of using a '2' read clock delay for my DDR3 routines, we will need to increase this read offset to a '3'.

(I'm assuming the h'xxxx errors in Gowin's primitive are their default response when reading a TBUFFER in tristate, i.e. when receiving a h'zzzz in simulation.  Or, you used a 'reg' instead of 'logic' or 'wire' in your home made DDR IO module.  In SystemVerilog, using the old style REG sometimes does not see and pass through the 'zzzz'.  The other choice is to report a bug or feature to Gowin regarding their primitive library not representing the IO pin's status.  Also, that the TLVDS IO reports '0' when the IO port is driven with b'z.  You may further inspect these signals in your module's wiring and right at Gowin's primitives to verify that this is the case, or if it was your HDL code which has just fuzzed up the results.)
« Last Edit: September 05, 2022, 08:44:01 pm by BrianHG »
 

Online BrianHGTopic starter

« Reply #230 on: September 05, 2022, 08:36:47 pm »
After you have corrected the above issues, go to this line:
BrianHG_DDR3/BrianHG_DDR3_IO_PORT_ALTERA.sv#L495

Change to this:
Code: [Select]
always_comb  RD_WINDOW = RDATA_window[RD_POS+RDQ_SYNC_CHAIN+3+CMD_ADD_DLY+(GOWIN_ENABLE*GOWIN_READ_OFFSET)]; // The extra '+3' counts for the extra latching of input to the 'RDQ_CACHE_x[0]'
The 2 new localparams, 'GOWIN_ENABLE' and 'GOWIN_READ_OFFSET' you need to place at the top after the port definitions.

GOWIN_ENABLE should = 0 unless the FPGA_VENDOR's first letter is a "G" or "g", then it should be = 1.

For now, the 'GOWIN_READ_OFFSET' should equal 2 or 3 depending on further tests.  However, make future allowance for recognizing different 'FPGA_FAMILY' options as Gowin's different FPGA silicon may have different response times for their DDR primitives.
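
The two localparams could be derived along these lines. This is a sketch under assumptions: it treats FPGA_VENDOR as a string parameter and relies on the tool accepting an index into a string parameter inside a constant expression; the wrapper module name is hypothetical and exists only to make the fragment stand alone:

```systemverilog
// Sketch only: deriving GOWIN_ENABLE and GOWIN_READ_OFFSET from the vendor string.
module gowin_read_offset_sketch #(
    parameter string FPGA_VENDOR = "Gowin"  // assumed string parameter, first letter selects vendor
) ();
    // 1 when the vendor name starts with 'G' or 'g', otherwise 0.
    localparam int GOWIN_ENABLE =
        ((FPGA_VENDOR[0] == "G") || (FPGA_VENDOR[0] == "g")) ? 1 : 0;

    // 2 or 3 pending further tests; a later revision could select this
    // per FPGA_FAMILY if different Gowin silicon behaves differently.
    localparam int GOWIN_READ_OFFSET = 2;

    // Extra term that the modified RD_WINDOW index would add.
    localparam int GOWIN_EXTRA_DELAY = GOWIN_ENABLE * GOWIN_READ_OFFSET;

    initial $display("GOWIN_ENABLE=%0d GOWIN_EXTRA_DELAY=%0d",
                     GOWIN_ENABLE, GOWIN_EXTRA_DELAY);
endmodule
```

With any non-Gowin vendor string the extra term collapses to 0, so the Altera read window is left exactly as it was.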
« Last Edit: September 05, 2022, 08:45:42 pm by BrianHG »
 

Offline SpacedCowboy

« Reply #231 on: September 05, 2022, 11:51:53 pm »
Quote
What you have is a DDR_IN primitive configuration problem.
Take a look at the signals under the 'DDR3_PHY Data Path' divider.
These are the signals coming in directly from your DDR IN/OUT primitive, unprocessed.

Ok, for some reason, the blue b'zzzz isn't being fed through, this may be a natural consequence of Gowin's primitives, or, something wrong with your wiring.

Next, carefully analyze the RDQS_ph & RDQS_pl.  Except for the extended '0's, those seem to have correctly read the data being generated by Micron's DDR3 Model and actually match the output results of Altera's DDR in primitive.

Now, for the RDQ_h and RDQ_l output from the Gowin.  We can see that '_l' matches Altera, but, the '_h' is delayed 1 clock cycle.  Is there a set of parameters associated with Gowin's DDR_input primitive?  Maybe one to shift the read register by 180 degrees?  Doing so plus swapping the '_h' and '_l' read output should fix this problem.  If not, we will need to add a clocked DFF delay to all the '_l' reads in all your primitives, then instead of using a '2' read clock delay for my DDR3 routines, we will need to increase this read offset to a '3'.

(I'm assuming the h'xxxx errors in Gowin's primitive are their default response when reading a TBUFFER in tristate, i.e. when receiving a h'zzzz in simulation.  Or, you used a 'reg' instead of 'logic' or 'wire' in your home made DDR IO module.  In SystemVerilog, using the old style REG sometimes does not see and pass through the 'zzzz'.  The other choice is to report a bug or feature to Gowin regarding their primitive library not representing the IO pin's status.  Also, that the TLVDS IO reports '0' when the IO port is driven with b'z.  You may further inspect these signals in your module's wiring and right at Gowin's primitives to verify that this is the case, or if it was your HDL code which has just fuzzed up the results.)

The entirety of the DQ wiring is in the 'code' section a post or 3 up, there aren't any regs, just wires, and it's a straight map from pad to port (well, via the IOBUF/IDDR). Looking at the IDDR primitive, there are no parameters other than 'init=1'b0/1'b1' for Q0,Q1 (both defaulting to 0).

What is shown though is the timing diagram (attached), and it certainly looks as though Q0 and Q1 are supposed to be in-sync, not one delayed from the other. I checked the DDR3_RD_CLK clock to see if it had somehow gone awry and we were missing a 'beat', but it's lock-synced to DDR3_CLK in both Altera and Gowin sims.

I pulled out the internal wire 'gowin_dq_in' that transfers the data from the IOBUF:O output to the IDDR:I input, and that looks just fine - it's mirroring the DDR3_DQ signal in the simulation, so I think it's definitely something in the IDDR rather than some io-buffer delay.

Delving a bit deeper, stripping out the reset-control and initialization, the verilog library primitive for an IDDR looks like:
Code: [Select]
module IDDR(output Q0, output Q1, input D, input CLK);
  reg Q0_oreg, Q1_oreg,Q0_reg, Q1_reg;

  assign Q0 = Q0_reg;
  assign Q1 = Q1_reg;

  always @(posedge CLK) begin
      Q0_oreg <= D;
      Q0_reg <= Q0_oreg;
      Q1_reg <= Q1_oreg;
  end

  always @(negedge CLK) begin
      Q1_oreg <= D;
  end
endmodule

I pulled out one of the 16 IDDR bits into the simulation waveform (they're all either '1' or '0' at the same time, so one bit is representative). The four signals above are in the image 'gowin-iddr-sim' just under the cyan 'gowin_dq_in' trace.  If we start looking (@neg-edge on DDR_CLK_RDQ) at the yellow cursor, then Q1_oreg is set to 0. One half clock later (@posedge) ...
  • Q0_oreg is set to D at the end of the clock-cycle
  • Q0_reg is set to Q0_oreg at the end of the clock-cycle
  • Q1_reg is set to Q1_oreg at the end of the clock-cycle
It looks as though that logic can only work if it gets a positive clock-edge as the first clock-edge into the IDDR. If it gets a negedge first (as we're seeing), Q0_reg will necessarily be delayed by a clock. Not sure if this is a bug in the simulation library, or an actual hardware limitation. I guess I can ask Gowin...
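
The effect can be reproduced in isolation with a small testbench. The sketch below reuses the stripped-down IDDR logic from above under a different module name (iddr_model, a made-up name, so it does not clash with the vendor library) and raises D shortly before a negedge, so the first sample lands in the negedge register. The 10 ns clock period and the timing of D are arbitrary choices for illustration:

```systemverilog
`timescale 1ns/1ps
// Copy of the stripped-down IDDR logic, renamed to avoid clashing with Gowin's library.
module iddr_model(output Q0, output Q1, input D, input CLK);
  reg Q0_oreg = 0, Q1_oreg = 0, Q0_reg = 0, Q1_reg = 0;
  assign Q0 = Q0_reg;
  assign Q1 = Q1_reg;
  always @(posedge CLK) begin
      Q0_oreg <= D;          // rising-edge sample
      Q0_reg  <= Q0_oreg;    // presented one posedge later
      Q1_reg  <= Q1_oreg;    // falling-edge sample presented at the next posedge
  end
  always @(negedge CLK) Q1_oreg <= D;   // falling-edge sample
endmodule

module iddr_phase_tb;
  logic clk = 0, d = 0;
  wire  q0, q1;

  iddr_model dut (.Q0(q0), .Q1(q1), .D(d), .CLK(clk));

  always #5 clk = ~clk;      // posedges at 5, 15, 25 ns; negedges at 10, 20 ns

  initial begin
    $monitor("%0t ns: clk=%b d=%b q0=%b q1=%b", $time, clk, d, q0, q1);
    #7.5 d = 1;              // D rises between a posedge and a negedge,
                             // so the negedge at 10 ns sees it first
    #40 $finish;
  end
endmodule
```

In this model q1 rises at the posedge following the capturing negedge, while q0 only rises one full clock later. Moving the `d = 1` step to just before a posedge (for example `#2.5`) should make both outputs rise together.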

 

Online BrianHGTopic starter

« Reply #232 on: September 06, 2022, 12:02:32 am »
Ok, you will need to add your own DFF delay reg for all the '_l' to align the data output.

Remember, this is for all the reads, the DQ and DQS.
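
A minimal sketch of such an alignment register, assuming Q1 of each IDDR feeds the '_l' path and arrives one clock ahead of Q0. The module and port names are hypothetical:

```systemverilog
// Hypothetical alignment stage: delay the early '_l' sample by one read clock
// so that it lines up with the '_h' sample from the same IDDR.
module gowin_iddr_align #(
    parameter int WIDTH = 16
)(
    input  logic             clk,       // read clock (same clock as the IDDR)
    input  logic [WIDTH-1:0] iddr_q0,   // IDDR Q0 outputs ('_h' path)
    input  logic [WIDTH-1:0] iddr_q1,   // IDDR Q1 outputs ('_l' path, one clock early)
    output logic [WIDTH-1:0] rd_h,
    output logic [WIDTH-1:0] rd_l
);
    logic [WIDTH-1:0] q1_dly;

    always_ff @(posedge clk) q1_dly <= iddr_q1;   // the extra DFF

    assign rd_h = iddr_q0;   // passed straight through
    assign rd_l = q1_dly;    // now aligned with rd_h
endmodule
```

The same stage would be instantiated once for the DQ bus and once for the DQS strobes, and it costs one more clock of read latency that the read window offset has to absorb.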

Also, please, please, please fix your waveform grid period.  Counting clocks from in to out is messed up and I cannot work out the proper read window offset without clear visuals.
 

Online BrianHGTopic starter

« Reply #233 on: September 06, 2022, 12:10:26 am »
You can also try resetting / powering up the read clock at 180deg and swapping the high and low to fix the problem.
This would save the extra delay reg.
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #234 on: September 06, 2022, 05:02:59 pm »
Okay, a bit stumped at the moment.

I've fixed up RDQ_{h,l} so they're being changed at the same time. I just put the DFF in place for the time being, I'll maybe play with clock-retiming later. So the trace below shows them changing correctly at the cursor - as before the Gowin one is on top, Altera below.

I experimented with the GOWIN_ENABLE/GOWIN_READ_OFFSET parameter, and matching the RD_WINDOW signal to be one clock after RDQ_{l,h} are available seems to require a GOWIN_READ_OFFSET of 3, so that's what's being shown in the screenshot, with the GOWIN_READ_OFFSET parameter being logged as the 2nd-to-top yellow trace.

It still wasn't reading correctly, so I logged RDQ_POS, and that seemed to be starting its incrementing one clock earlier, after RD_WINDOW goes high, than for the Altera design.

So I logged the RDQ_CACHE_{l,h} and you can see them below. RDQ_CACHE_l is being populated one clock before RDQ_CACHE_h, so I assumed I'd missed something with the DFF I'd just inserted. I still think it must be related to that, but I don't understand how... As far as I can tell, RDQ_CACHE_{l,h} are only updated in module BrianHG_DDR3_IO_PORT_ALTERA in the following code:

Code: [Select]
always_ff @(posedge DDR_CLK_RDQ) begin
                                            RDQ_CACHE_h[0]    <= RDQ_h;                       // Shift in the input read data
                                            RDQ_CACHE_l[0]    <= RDQ_l;                       // Shift in the input read data
    for (int i=0; i<(3+RDQ_SYNC_CHAIN);i++) RDQ_CACHE_h[i+1]  <= RDQ_CACHE_h[i];              // Shift the input across the cache
    for (int i=0; i<(3+RDQ_SYNC_CHAIN);i++) RDQ_CACHE_l[i+1]  <= RDQ_CACHE_l[i];              // Shift the input across the cache
    ...

So their only dependency ought to be RDQ_{l,h} and I can see both of those changing in sync with each other. So how can RDQ_CACHE_l be being changed one clock earlier than RDQ_CACHE_h ? Or vice-versa, RDQ_CACHE_h losing a clock over RDQ_CACHE_l, I guess ...

[This is all actually in the DDR3_IO_PORT_GOWIN module of course, but this logic is identical to the DDR3_IO_PORT_ALTERA module, the only changes I've made are to the DDR/LVDS instantiations (and the RD_WINDOW modification for GOWIN_ENABLE/GOWIN_READ_OFFSET).]

I thought it might be stale data from a previous run, but AFAICT the setup_phy_v16.do (which I run every time and let it call run_phy_v16.do) deletes everything in work/ as a precursor to doing anything at all... I'm sure it's something stupid I'm doing, but at least right now, I'm just not seeing it.

 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #235 on: September 06, 2022, 06:41:17 pm »
You did not need to bring up the RDQ_caches.
Now I can't follow what's going on with that mess.

It's the RDATA_store which is taking the buffer 1 clock early, so it has the wrong data in the buffer chain.

Try a 4 or 5 for the Gowin delay.

Just swap the read _h and _l at the instantiation of your DDR buffer and let my auto PLL tuning auto discover the 180 degree error as it will tune the Gowin PLL through the 180deg mode anyways.  Don't forget to do the same for the read DQS.

Having your read PLL reset set to 180 just means a few less cycles during simulation startup.
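
A sketch of what the swap could look like for the DQ bits. The wrapper module and its `WIDTH` parameter are hypothetical; the IDDR port names are those of the Gowin model quoted earlier in the thread, and which output originally fed `_h` versus `_l` is an assumption here.

```systemverilog
// Hypothetical wrapper: one Gowin IDDR per DQ bit with the _h/_l outputs
// swapped at the instantiation, rather than adding a delay register.
module gowin_dq_in_swapped_sketch #(
    parameter int WIDTH = 16
)(
    input  wire             DDR_CLK_RDQ,   // read clock
    input  wire [WIDTH-1:0] gowin_dq_in,   // DDR data from the IO buffers
    output wire [WIDTH-1:0] RDQ_h,
    output wire [WIDTH-1:0] RDQ_l
);
    genvar x;
    generate
        for (x = 0; x < WIDTH; x++) begin : g_dq
            IDDR gowin_dq_iddr_inst
                (
                .Q0(RDQ_l[x]),          // assumed to have driven RDQ_h before the swap
                .Q1(RDQ_h[x]),          // assumed to have driven RDQ_l before the swap
                .D(gowin_dq_in[x]),     // DDR input signal
                .CLK(DDR_CLK_RDQ)       // read clock
                );
        end
    endgenerate
endmodule
```

The read DQS bits would need the same treatment, as noted above.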

 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #236 on: September 06, 2022, 07:46:41 pm »
Ok, you want to see something awesome ? Look at the attached :)

Took out the DFFs, inverted the returned RDQ_{h,l} and RDQS_{pl,ph}, reset GOWIN_READ_OFFSET to 3, and ran setup_phy_v16.do, which .... ran to completion.

The first few times through the training, RDQ_{h,l} are offset, with RDQ_l trailing by a clock. Then they match up as you can see in the image, and we get the training pattern back. Then the rest of the test bench runs - log attached :)

It's not entirely sweetness and light - there's a lot of tDSS violations on DQS and DQS_n bits on WRITE ops, and for some reason the test bench logs for Altera and Gowin start with different values, which might be indicative of a missed write-cycle or could be to do with the different phase of the two clocks, I haven't checked yet. This is the case right at the start all the way through to the end:
Code: [Select]
Gowin:
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task: at time 20603750.0 ps INFO: Load Mode 3 MultiPurpose Register Select = Pre-defined pattern
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task: at time 20603750.0 ps INFO: Load Mode 3 MultiPurpose Register Enable = Disabled
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task: at time 20793750.0 ps INFO: Activate  bank 0 row 0040
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task: at time 20811250.0 ps INFO: Write     bank 0 col 000, auto precharge 0
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.main: at time 20818750.0 ps INFO: Sync On Die Termination Rtt_NOM =         40 Ohm
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task: at time 20821250.0 ps INFO: Write     bank 0 col 080, auto precharge 0
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.data_task: at time 20825000.0 ps INFO: WRITE @ DQS= bank = 0 row = 0040 col = 00000000 data = 4567
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.data_task: at time 20826250.0 ps ERROR: tDSS violation on DQS   bit           0
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.data_task: at time 20826250.0 ps ERROR: tDSS violation on DQS   bit           1
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.data_task: at time 20826250.0 ps ERROR: tDSS violation on DQS_N bit           0
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.data_task: at time 20826250.0 ps ERROR: tDSS violation on DQS_N bit           1
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.data_task: at time 20826250.0 ps INFO: WRITE @ DQS= bank = 0 row = 0040 col = 00000001 data = 89ab
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.data_task: at time 20827500.0 ps INFO: WRITE @ DQS= bank = 0 row = 0040 col = 00000002 data = cdef

vs Altera:
Code: [Select]
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task: at time 14978750.0 ps INFO: Load Mode 3 MultiPurpose Register Select = Pre-defined pattern
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task: at time 14978750.0 ps INFO: Load Mode 3 MultiPurpose Register Enable = Disabled
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task: at time 15168750.0 ps INFO: Activate  bank 0 row 0040
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task: at time 15186250.0 ps INFO: Write     bank 0 col 000, auto precharge 0
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.main: at time 15193750.0 ps INFO: Sync On Die Termination Rtt_NOM =         40 Ohm
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task: at time 15196250.0 ps INFO: Write     bank 0 col 080, auto precharge 0
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.data_task: at time 15200000.0 ps INFO: WRITE @ DQS= bank = 0 row = 0040 col = 00000000 data = 0123
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.data_task: at time 15201250.0 ps INFO: WRITE @ DQS= bank = 0 row = 0040 col = 00000001 data = 4567
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.data_task: at time 15202500.0 ps INFO: WRITE @ DQS= bank = 0 row = 0040 col = 00000002 data = 89ab

Most of the writes work, but some don't, and return xxxx where they return data for the Altera log. There *are* some entries in the Altera log where xxxx is returned, but given the high 'row=0284' value, I'm guessing those are supposed to fail.
Code: [Select]
Gowin:
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.data_task: at time 21083750.0 ps INFO: WRITE @ DQS= bank = 7 row = 0004 col = 0000001f data = xxxx
...
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.data_task: at time 21483750.0 ps INFO: READ @ DQS= bank = 7 row = 0004 col = 0000001f data = xxxx

vs Altera:
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.data_task: at time 15458750.0 ps INFO: WRITE @ DQS= bank = 7 row = 0004 col = 0000001f data = 8888
...
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.data_task: at time 15663750.0 ps INFO: READ @ DQS= bank = 7 row = 0004 col = 0000001f data = 8888

Anyway, this is fantastic :) Thanks for all the help, inspiration, and guidance. Hopefully clearing up the remaining things above isn't too arduous :)
« Last Edit: September 06, 2022, 07:50:45 pm by SpacedCowboy »
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #237 on: September 06, 2022, 08:11:17 pm »
And just to keep it real, here's the traces of where the write-op is failing.
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #238 on: September 06, 2022, 08:45:08 pm »
Note that if we read a memory address which has never been written to, Micron's DDR3 model does return to us an h'xxxx.  I have 1 such deliberate read in my testbench.

If any writes fail (including partial bits), then there will probably be multiple reads with h'xxxx in them.

It looks like we need to inspect the 'OE' for the DQS and the OE for the write data and compare it to the matched Altera reference version.  Once again, I have controls in my code to make +/-1 adjustments here, individually for the DQS and DQ oe's.

The DQS write is on a different phase than the write DQ data.
We need to compare this to the Altera's output of the same written byte.  You may just need to adjust Gowin's TX enable phase shift parameter so that the DQS gets asserted at the right time.

Please zoom in on the matching write position in both the Altera and Gowin simulations, with only my recommended master outputs shown in the waveform, as this is all the debug information needed.
« Last Edit: September 06, 2022, 08:47:54 pm by BrianHG »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #239 on: September 06, 2022, 09:11:44 pm »
*** PRO-TIP - Double click on the:

# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.data_task: at time 20826250.0 ps ERROR: tDSS violation on DQS   bit           0

and ModelSim's waveform view will center the waveform display at that point.

I've attached what I need to see from you.
You also may need to Zoom into the error if it appears to match my Altera example.

(Note that I already see your simple mistake, but let's see if you can figure it out...)
« Last Edit: September 06, 2022, 09:15:04 pm by BrianHG »
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #240 on: September 06, 2022, 11:26:13 pm »
So I can see a few differences (see 'compare-gowin-altera') ...

  • The Gowin DDR3_DQS_p is out of phase with the Altera one (high-low-high-... rather than low-high-low...)
  • RDQS_ph (and presumably _l but it's hard to tell) is a half-clock later in the Gowin simulation
  • RDQ_{l,h} are a half-clock later too

The last two are on the read-path, and since we're seeing problems with write-ops, I doubt it's that. The test-bench started running to completion when I inverted the phase of the signals for DQS and DQ, so I'd really like to keep that unless we *must* revert it...

Ah-ha. I forgot to change around the DQS clock-values (ie: setting D0/D1) when I inverted the polarity of the input-path DQS. See DQS-generation.

(runs simulation)

Yep. That passes with flying colours!
  • The only xxxx's are the expected ones
  • No more tDSS violations
  • We start storing data with 0123, just like the Altera simulation

Log attached, but this looks pretty much perfect to me :)

Thanks again for all the help Brian :)
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #241 on: September 06, 2022, 11:51:39 pm »
The DDR3_DQS_p output during a write must match the DDR3_CK_p phase...
See attachment 1.

In fact, the hard wired D0/D1 inputs to the DDR_out primitive should match clock DDR3_CK's inputs.
(Your above snapshot still has it backwards, though your swap in the code should have fixed it, so I assume it's an old screenshot.)

I hope you kept everything as a positive edge clock...

Now you need to inspect the read data coming out in my second snapshot to verify a match between Altera and Gowin.  Also, inspect the 2 different 'Transcript logs' and verify that all the reads and writes match perfectly.  There might only be a difference in the time index location as Gowin's power-up tuning time may be different.

Continue scrolling to the right to verify a match between Gowin and Altera.

If everything is ok, clean up your code and add your necessary comments.
Remember that we may still have a bad value for the read offset as the simulator allows for large discrepancies.  I bet the true figure will be 2, not 3 once in hardware.

Next, place into your hardware and let's see what needs to be done with the .sdc file.

« Last Edit: September 07, 2022, 12:00:40 am by BrianHG »
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #242 on: September 07, 2022, 01:05:39 am »
So the clock generators do match up - DDR3 clock is:
Code: [Select]
            ODDR gowin_ck_ddr_inst
                (
                .Q0(gowin_clock),       // Send clock via LVDS buffer
                .Q1(gowin_ck_OE),       // Not used but save a warning
                .D0(1'b0),              // clock goes low to start
                .D1(1'b1),              // clock goes high in 2nd phase
                .TX(1'b0),              // TX=0 -> Output pin
                .CLK(DDR_CLK)           // DDR input clock
                );
           
            TLVDS_OBUF gowin_ck_lvds_inst
                (
                .I(gowin_clock),
                .O(DDR3_CK_p[x]),
                .OB(DDR3_CK_n[x])   
                );

and DQS clock is
Code: [Select]
            ODDR gowin_dqs_oddr_inst 
                (
                .Q0(gowin_dqs_out),             // ODDR -> LVDS
                .Q1(gowin_dqs_tx),              // 1'b0 => output
                .D0(1'b0),                      // Input data [SDR]
                .D1(1'b1),                      // Input data [SDR]
                .TX(~OE_DQS[x]),                // Input 'output enable' 0=output
                .CLK(DDR_CLK)                   // DDR clock
                );

When you say "The DDR3_DQS_p output during a write must match the DDR3_CK_p phase..." - both signals are synced so they change hi->lo->hi at the same time, is that what you mean here by phase, or are you talking about timings of sequences relative to each other ?

As for reads, I think I'm matching your results here, see 'tb-reads'. It looks the same to me, and the read-values from the log-files seem to match up as far as I can tell. I probably ought to write a script to compare them, to account for human error...

I just ran the Gowin simulation without any extra delays(see 'gowin-without-delays') and it worked fine - the transcript logs are the same (modulo timestamps) and all the reads & writes produce the expected values. So maybe making other fixes has abrogated any need for that delay, and the signal timings actually look more like that of the Altera snapshot you made (regarding the 'phase' of DDR3_DQS_p and DDR3_CK_p for example).



 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #243 on: September 07, 2022, 02:26:42 am »
Quote
When you say "The DDR3_DQS_p output during a write must match the DDR3_CK_p phase..." - both signals are synced so they change hi->lo->hi at the same time, is that what you mean here by phase, or are you talking about timings of sequences relative to each other ?
Just look at my above screenshot ' hint1.png '.  The 2 clock output phases I highlighted are a perfect match.
They both go high and low at the same time.

Next, see if you can get your hardware working with my RS232 debugger reference design.
IE: you will need a PC with a LVTTL <-> RS232(USB) adapter and my RS232_debugger.exe program.

Quote
I just ran the Gowin simulation without any extra delays(see 'gowin-without-delays') and it worked fine - the transcript logs are the same (modulo timestamps) and all the reads & writes produce the expected values. So maybe making other fixes has abrogated any need for that delay, and the signal timings actually look more like that of the Altera snapshot you made (regarding the 'phase' of DDR3_DQS_p and DDR3_CK_p for example).
This is the nature of using Modelsim: my code resets the read reference in the middle until the true DQS coming from the DDR3 model is received.  So a value that is too early/small will still work.  On real hardware, testing the DDR3 running at 300MHz, 400MHz, 500MHz and 600MHz, you may need those 1 or 2 extra delays, otherwise power-up initialization won't always work properly.  On the other hand, if you added 2 to the Altera simulation, it would probably never power up, as the read would begin too late.
« Last Edit: September 07, 2022, 02:35:05 am by BrianHG »
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #244 on: September 07, 2022, 02:37:04 am »
Ok, so the signaling in ‘gowin-without-delays.png’ is a match for ‘hint1.png’ as far as I can see. Since the no-extra-delay code seems to work, I’m going to keep that for now.

[edit] crossed over in time there. I see, well, I’ll keep the code in, but define the delay to be zero, and when/if it needs to be changed, we can take a view then.
« Last Edit: September 07, 2022, 02:39:39 am by SpacedCowboy »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #245 on: September 08, 2022, 07:20:55 pm »
Did Gowin compile the design?
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #246 on: September 08, 2022, 08:54:13 pm »
Haven't had a chance to do anything much more. Life getting in the way - had an MRI scan yesterday... Work is busy too, so it'll be this weekend probably before I start trying to get the DDR3 to compile on the chip.

The hardware turned up, and I can see it in the dev environment - needed to download a separate chip-specific update for the programmer, but it appears to work now.  I got the "dev-board" to go with the SO-DIMM so I have access to easy pins to make an RS232 interface with.

The board itself is basically just the FPGA, some DDR3, and the power circuitry - which is perfect for me - no long list of buttons and switches with a paltry 10 GPIO!
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #247 on: September 09, 2022, 02:43:58 am »
You will need to verify all the DDR3 specification parameters first.

These are in this source file:
PHY only <-> RS232-Debugger
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #248 on: September 09, 2022, 11:20:17 pm »
Slowly making a start on writing something the Gowin IDE will like.

Quote
You will need to verify all the DDR3 specification parameters first.

These are in this source file:
PHY only <-> RS232-Debugger

Yep, it's a 1Gbit, 64Mx16 SK-hynix H5TQ1G63EFR-PBC which isn't exactly mainstream, but as long as it works...

The Gowin example code has 14 address pins mapped out, but the datasheet for the chip says it uses A0->A12. I'm assuming they mapped out the pins on the board so they could more easily swap in a x8 or a 2Gbit chip. The -PBC indicates it's CL6 at 800MHz and CL7 at 1066MHz. It claims to be DDR3-1600 11-11-11.

Anyway, in 'top.sv' I have
Code: [Select]
// Use 1066/187E, 1333/-15E, 1600/-125, 1866/-107, or 2133/093.
parameter string     DDR3_SPEED_GRADE        = "-125",

// Use 0,1,2,4 or 8.  (0=512mb) Caution: Must be correct as ram chip size
// affects the tRFC REFRESH period.
parameter int        DDR3_SIZE_GB            = 1,

// Use 8 or 16.  The width of each DDR3 ram chip.
parameter int        DDR3_WIDTH_DQ           = 16,

// 1, 2, or 4 for the number of DDR3 RAM chips.
parameter int        DDR3_NUM_CHIPS          = 1,

// Select the number of DDR3_CLK & DDR3_CLK# output pairs. 
// Add 1 for every DDR3 Ram chip.
// These are placed on a DDR DQ or DDR CK# IO output pins.
parameter int        DDR3_NUM_CK             = (DDR3_NUM_CHIPS),

// Use for the number of bits to address each row.
parameter int        DDR3_WIDTH_ADDR         = 13,  // A0-12

// Use for the number of bits to address each bank.
parameter int        DDR3_WIDTH_BANK         = 3,

// Use for the number of bits to address each column.
parameter int        DDR3_WIDTH_CAS          = 10,   // A0-9
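
As a sanity check on those numbers, 13 row bits + 3 bank bits + 10 column bits give 2^26 addressable words, and at 16 bits each that comes to 2^30 bits, i.e. 1 Gbit (64M x 16). A small stand-alone sketch of that arithmetic follows; the module name and the derived localparams are made up for illustration, and only the four widths are taken from the listing above.

```systemverilog
// Hypothetical stand-alone check; not part of the controller source.
module ddr3_geometry_check;
    localparam int DDR3_WIDTH_ADDR = 13;   // row bits, A0-12
    localparam int DDR3_WIDTH_BANK = 3;    // bank bits
    localparam int DDR3_WIDTH_CAS  = 10;   // column bits, A0-9
    localparam int DDR3_WIDTH_DQ   = 16;   // data width of the chip

    // Total address bits and total capacity in bits.
    localparam int     ADDR_BITS  = DDR3_WIDTH_ADDR + DDR3_WIDTH_BANK + DDR3_WIDTH_CAS;
    localparam longint TOTAL_BITS = (64'd1 << ADDR_BITS) * DDR3_WIDTH_DQ;
    localparam longint ONE_GBIT   = 64'd1 << 30;

    initial begin
        if (TOTAL_BITS == ONE_GBIT)
            $display("Geometry matches a 1 Gbit part: %0d address bits x %0d DQ.",
                     ADDR_BITS, DDR3_WIDTH_DQ);
        else
            $display("Geometry mismatch: %0d bits total.", TOTAL_BITS);
        $finish;
    end
endmodule
```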


The Gowin verilog compiler is complaining (see attached, just warnings) about these lines because of the initialization. Is that initialization just for simulation purposes, or do I have to worry about these ?
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #249 on: September 09, 2022, 11:32:55 pm »
Ahhh, Gowin doesn't allow default power-up values specified in the IO port list.
You need to do it the old fashioned way, define the ports, then externally declare that those nets are logic & their initial power-up value.
(Gowin's compiler may have a 'SystemVerilog default version 20xx' flag which allows such declarations.)

Anyways, try re-compiling with the ' =0, ' removed.  It should still work.
Even simulating should work with the annoying red #'hxx just after power-up until the ports get their first assignment.
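
A minimal sketch of the 'old fashioned way' described above, using a made-up module. The ports are listed first, and the type and power-up value are declared separately in the body. Whether the Gowin compiler accepts this form has not been confirmed in this thread.

```systemverilog
// Hypothetical example of the non-ANSI port style.
module powerup_default_sketch (clk, d, q);
    input  clk;
    input  d;
    output q;

    // The port was declared above; its type and power-up value are
    // declared here instead of in the port list.
    reg q = 1'b0;

    always @(posedge clk) q <= d;
endmodule
```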
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #250 on: September 10, 2022, 04:29:27 pm »
I'll comment them out for now  :)

Ran into something else in the FIFO code, which actually looks like it might be a bug in the compiler (see attached).

Looks as though you guard against assigning si[xx] if (xx!=1) in the generate loop but the compiler is still flagging it as being written to by both continuous and procedural assigns. (Had to double the x to xx because apparently a single x in brackets is a bullet point...)

Maybe an explicit x=1 stage and a generate for x>1 ? Or if the detection is only being triggered by the variable-name, and not the variable+index, then perhaps a rename where the si[1] is 'si1', same for so...
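
To illustrate the rename idea, here is a sketch on a made-up shift chain. This is not the actual FIFO code: the module, its ports and `STAGES` are hypothetical. Stage 1 lives on its own net (`si1`) driven only by a continuous assign, and stages 2 and up live in an array driven only from `always_ff`, so no single name is written both ways.

```systemverilog
// Hypothetical shift chain, assuming STAGES >= 2.
module shift_chain_sketch #(
    parameter int STAGES = 4
)(
    input  logic clk,
    input  logic d,
    output logic q
);
    logic si1;              // stage 1: continuous assignment only
    logic si [2:STAGES];    // stages 2..STAGES: procedural assignment only

    assign si1 = d;

    genvar x;
    generate
        for (x = 2; x <= STAGES; x++) begin : g_stage
            if (x == 2) begin : g_first
                // First registered stage takes the separately named net.
                always_ff @(posedge clk) si[x] <= si1;
            end else begin : g_rest
                // Remaining stages shift from the previous array element.
                always_ff @(posedge clk) si[x] <= si[x-1];
            end
        end
    endgenerate

    assign q = si[STAGES];
endmodule
```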

 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #251 on: September 10, 2022, 06:16:21 pm »
Funny, I don't remember my minimal PHY controller calling any of my FIFOs.

Anyways, that FIFO code is never used.  My complete 16 port 'BrianHG_DDR3_CONTROLLER_v16_top.sv' which uses the 'BrianHG_DDR3_COMMANDER_v16.sv' only calls the first 'BHG_FIFO_shifter_FWFT' fifo in that source HDL.  The others aren't used at all as they were extras to test FMAX optimizations.  You should not yet be touching this stuff.

These are the only source files you need for the PHY_SEQ_only controller:
BrianHG_DDR3_PLL.sv
BrianHG_DDR3_PHY_SEQ_v16.sv
BrianHG_DDR3_CMD_SEQUENCER_v16.sv
BrianHG_DDR3_IO_PORT_ALTERA.sv
BrianHG_DDR3_GEN_tCK.sv
sync_rs232_uart.v
rs232_DEBUGGER.v
« Last Edit: September 10, 2022, 06:19:50 pm by BrianHG »
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #252 on: September 10, 2022, 06:20:11 pm »
Huh, ok. I'm using the base code in the RS232 debug directory - which I guess instantiates a BrianHG_DDR3_CONTROLLER_v16_top module, which in turn wants the FIFO.

Am I jumping ahead too far, here ?

[ah, ok, saw the edit. I can change things around a bit]
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #253 on: September 10, 2022, 06:30:15 pm »
Quote
Huh, ok. I'm using the base code in the RS232 debug directory - which I guess instantiates a BrianHG_DDR3_CONTROLLER_v16_top module, which in turn wants the FIFO.

Am I jumping ahead too far, here ?

[ah, ok, saw the edit. I can change things around a bit]
Wrong folder, go to the one I pointed out here:
https://www.eevblog.com/forum/fpga/brianhg_ddr3_controller-open-source-ddr3-controller/msg4406407/#msg4406407

The one you are pointing to uses around 5-6k logic elements and has up to 16 user read/write ports.
The '_PHY_SEQ_only_v16' one I pointed to uses only around 3k logic elements and has 1 user read/write port.
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #254 on: September 11, 2022, 05:10:08 am »
So the Gowin compiler doesn't like the code here when DDR3_WIDTH_ROW = 13

The last line assumes DDR3_WIDTH_ROW > 13, I think. I've changed it to:

Code: [Select]
// *************************************************************************
// Output the CAS address on the DDR3 A bus.
// *************************************************************************
task SET_cas();
    begin
    OUT_A[9:0]                        <= S3_CAS[9:0] ; // Column address at the beginning of a sequential burst
    if (DDR3_WIDTH_CAS==10) OUT_A[11] <= 1'b0        ; // Default 0 for additional column address.
    else                    OUT_A[11] <= S3_CAS[10]  ; // Assign the additional MSB Column address used in 4 bit DDR3 devices.
    OUT_A[10]                         <= 1'b0        ; // Disable AUTO-PRECHARGE.  We keep the banks open and precharge manually only when needed.
    OUT_A[12]                         <= 1'b1        ; // Set burst length to BL8.
    if (DDR3_WIDTH_ROW > 13)
        OUT_A[DDR3_WIDTH_ROW-1:13]    <= 0           ;
    end
endtask
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #255 on: September 11, 2022, 05:14:00 am »
Quote
So the Gowin compiler doesn't like the code here when DDR3_WIDTH_ROW = 13

The last line assumes DDR3_WIDTH_ROW > 13, I think. I've changed it to:

Code: [Select]
// *************************************************************************
// Output the CAS address on the DDR3 A bus.
// *************************************************************************
task SET_cas();
    begin
    OUT_A[9:0]                        <= S3_CAS[9:0] ; // Column address at the beginning of a sequential burst
    if (DDR3_WIDTH_CAS==10) OUT_A[11] <= 1'b0        ; // Default 0 for additional column address.
    else                    OUT_A[11] <= S3_CAS[10]  ; // Assign the additional MSB Column address used in 4 bit DDR3 devices.
    OUT_A[10]                         <= 1'b0        ; // Disable AUTO-PRECHARGE.  We keep the banks open and precharge manually only when needed.
    OUT_A[12]                         <= 1'b1        ; // Set burst length to BL8.
    if (DDR3_WIDTH_ROW > 13)
        OUT_A[DDR3_WIDTH_ROW-1:13]    <= 0           ;
    end
endtask
Try:

if (DDR3_WIDTH_ROW>13)  OUT_A[DDR3_WIDTH_ROW-1:13]    <= 0           ;

Never considered testing the smallest DDR3 ram chips.
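
If a compiler still elaborates the [DDR3_WIDTH_ROW-1:13] part-select when DDR3_WIDTH_ROW is 13, another way round it is to clear the upper bits one at a time in a loop that simply runs zero times for the smallest chips. This is only a sketch and not the code in the controller: the wrapper module and its ports are made up, while OUT_A, S3_CAS and the two parameters follow the task quoted above.

```systemverilog
// Hypothetical wrapper around the same assignments as SET_cas(),
// assuming DDR3_WIDTH_ROW >= 13 and DDR3_WIDTH_CAS is 10 or 11.
module set_cas_sketch #(
    parameter int DDR3_WIDTH_ROW = 13,
    parameter int DDR3_WIDTH_CAS = 10
)(
    input  logic                      clk,
    input  logic [DDR3_WIDTH_CAS-1:0] S3_CAS,
    output logic [DDR3_WIDTH_ROW-1:0] OUT_A
);
    always_ff @(posedge clk) begin
        OUT_A[9:0] <= S3_CAS[9:0];   // Column address at the start of the burst
        OUT_A[10]  <= 1'b0;          // AUTO-PRECHARGE disabled
        // Extra column MSB only exists when DDR3_WIDTH_CAS is 11;
        // indexing with DDR3_WIDTH_CAS-1 keeps the select in range either way.
        OUT_A[11]  <= (DDR3_WIDTH_CAS == 10) ? 1'b0 : S3_CAS[DDR3_WIDTH_CAS-1];
        OUT_A[12]  <= 1'b1;          // Burst length BL8
        // Clear any remaining upper address bits; no iterations when
        // DDR3_WIDTH_ROW is 13, so no reversed part-select is ever written.
        for (int i = 13; i < DDR3_WIDTH_ROW; i++) OUT_A[i] <= 1'b0;
    end
endmodule
```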
« Last Edit: September 11, 2022, 05:18:17 am by BrianHG »
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #256 on: September 11, 2022, 05:27:38 am »
Ok, is >=14 better than >13 ?

It works (well, it compiles, still in the process of incrementally adding stuff so it doesn't work yet) with either of those two. I have bigger fish to fry right now though, because it doesn't like my DQ strobes code :(

ERROR (CK0012) : Instance 'DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/Gowin_DQ_Strobes[0].gowin_dqs_iddr_inst' has different control net from other iologic connected to the same buffer

which is referencing:
Code: [Select]
        begin : Gowin_DQ_Strobes

            wire gowin_dqs_out;                 // Internal: ODDR->IOBUF
            wire gowin_dqs_in;                  // Internal: IOBUF->IDDR
            wire gowin_dqs_tx;                  // Internal: OE on input to ODDR

            ODDR gowin_dqs_oddr_inst 
                (
                .Q0(gowin_dqs_out),             // ODDR -> LVDS
                .Q1(gowin_dqs_tx),              // 1'b0 => output
                .D0(1'b0),                      // Input data [SDR]
                .D1(1'b1),                      // Input data [SDR]
                .TX(~OE_DQS[x]),                // Input 'output enable' 0=output
                .CLK(DDR_CLK)                   // DDR clock
                );
 
            IDDR gowin_dqs_iddr_inst 
                (
                .Q0(RDQS_pl[x]),                // SDR to app #0
                .Q1(RDQS_ph[x]),                // SDR to app #1
                .D(gowin_dqs_in),               // DDR input signal
                .CLK(DDR_CLK_RDQ)               // read clock
                );

            TLVDS_IOBUF gowin_dqs_lvds_iobuf_inst
                (
                .O(gowin_dqs_in),               // LVDS -> IDDR
                .IO(DDR3_DQS_p[x]),             // +ve LVDS pad
                .IOB(DDR3_DQS_n[x]),            // -ve LVDS pad
                .I(gowin_dqs_out),              // ODDR -> LVDS
                .OEN(gowin_dqs_tx)              // input when 1'b1
                );
           
            assign RDQS_nl[x] = ~RDQS_pl[x];
            assign RDQS_nh[x] = ~RDQS_ph[x];

        end

... and I'm assuming that means it doesn't like the DDR_CLK_RDQ on the IDDR when it has DDR_CLK on the ODDR, and when both of these are connected to the TLVDS_IOBUF :(

[edit: yep, if I change the clocks to be the same, I don't get that error any more (I do get the same error for the DQ iobufs). Looks like mixing and matching clocks isn't on]

« Last Edit: September 11, 2022, 05:37:37 am by SpacedCowboy »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #257 on: September 11, 2022, 05:42:14 am »
'different control net'?  They do not specify which or what control net they are talking about?
Before we go haywire, let's see if there are different bidir IO buffers available.
Also, what options exist.

Also, if the DQ IO (which does use 2 different clocks) passes, then for the DQS you can try the manual (usually called emulated) differential approach, which I had to use for the older Cyclone devices.

Basically 2 single ended buffers in parallel, where the second IO output has the high and low 2'b01 inverted to 2'b10.
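
To make the emulated differential idea concrete, here is a minimal vendor-neutral sketch. It is simplified to SDR and is not taken from the controller source; the module and port names (pseudo_diff_out, pin_p, pin_n) are made up for illustration. The point it tries to show is that the inversion happens before the output registers, so both pins are driven by identically clocked logic.

```systemverilog
// Hypothetical illustration only: an SDR pseudo-differential bidirectional pair.
// A real DDR version would replace each register with the vendor's DDR output cell.
module pseudo_diff_out (
    input  wire clk,    // output clock
    input  wire oe,     // 1 = drive the pins, 0 = tri-state (read)
    input  wire d,      // data for the positive pin
    inout  wire pin_p,  // positive pad
    inout  wire pin_n   // negative pad
);
    reg q_p, q_n, oe_p, oe_n;

    // Both pins get their own register fed from the same clock, so the
    // clock-to-pin delay is the same kind of path for P and N.
    always @(posedge clk) begin
        q_p  <= d;
        q_n  <= ~d;     // inverted BEFORE the register, not after it
        oe_p <= oe;
        oe_n <= oe;
    end

    assign pin_p = oe_p ? q_p : 1'bz;
    assign pin_n = oe_n ? q_n : 1'bz;
endmodule
```

In the DDR case the same idea would presumably mean two DDR output cells, with the data pair on the second one swapped (2'b01 becoming 2'b10).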



Try removing the IO buffers, just wiring the DDRs to the IO pin.
Maybe requesting the IO buffer primitive prevents the use of adjacent logic cells which may be clocked on a different net.
« Last Edit: September 11, 2022, 05:45:34 am by BrianHG »
 

Offline SpacedCowboy

« Reply #258 on: September 11, 2022, 06:18:39 am »
'different control net'?  They do not specify which or what control net they are talking about?

Nope - that's the full extent of the error. There's a whole bunch of warnings in there because not everything is linked up yet (no-driver, port undriven, etc.) and a few warnings about truncation when there are calculations in a parameter (they seem to be auto-expanded to 64-bit, and then it complains when shortening them down to the size of the receiver). But nothing until now that made me think "this is serious".

Before we go haywire, let's see if there are different bidir IO buffers available.
Also, what options exist.

Without wishing to sound too gloomy, as far as I can see, the primitives aren't really that flexible - there's an IOBUF or an LVDS_IOBUF (which is further split into 'true' and 'emulated' LVDS_IOBUFs). There are no parameters to set on either of the IOBUF types, and nothing really useful on the IDDR/ODDR (init on both and clock-polarity on the ODDR).

There is an IDDR_MEM, but that's designed to work closely with the DQS (DDR3 clocking) module, which doesn't have much more documentation than the image below, as far as I can tell.

Also, if the DQ IO passes, which does use 2 different clocks, then for the DQS you can try the manual (usually called 'emulated') differential approach, which I had to use for the older Cyclone devices.

Basically 2 single ended buffers in parallel, where the second IO output has the high and low 2'b01 inverted to 2'b10.

DQ doesn't actually pass, it's just shielded by DQS failing first - the netlist generation bails as soon as it can't do something, and DQS happened to be in front of DQ in the .sv file. If I swap them around, I get:

ERROR (CK0012) : Instance 'DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/gowin_DQ_bus[0].gowin_dq_iddr_inst' has different control net from other iologic connected to the same buffer

I mean, this can clearly work, Gowin provide a DDR3 IP core, but presumably they are using the primitives for memory-access, and have the docs of how that works. Whether it can be done in a platform-neutral way without taking advantage of those special primitives is a different question ...

« Last Edit: September 11, 2022, 06:27:01 am by SpacedCowboy »
 

Online BrianHGTopic starter

« Reply #259 on: September 11, 2022, 06:30:18 am »
Try removing the IO buffers, just wiring the I&O DDRs directly to the IO pin.
Maybe requesting the IO buffer primitive prevents the use of adjacent logic cells which may be clocked on a different net.

Gowin's DDR3 controller is probably using delay for the write data and using the DQS as an input to clock the DQ.  However, doing this does mean there is some way to separately clock in the data, but this may be a hard wired on-die feature.  Such a setup does simplify the PLL, but locks you into a specific range of clock frequencies.
 

Offline SpacedCowboy

« Reply #260 on: September 11, 2022, 06:35:53 am »
Try removing the IO buffers, just wiring the DDRs to the IO pin.
Maybe requesting the IO buffer primitive prevents the use of adjacent logic cells which may be clocked on a different net.

Ok, so for DQ, removing the IOBUF does in fact make it pass compilation. The code now looks like:

Code: [Select]
    for (x=0; x<DQ_WIDTH; x = x + 1)
        begin : gowin_DQ_bus

            wire gowin_dq_tx_out;

            ODDR gowin_dq_oddr_inst 
                (
                .Q0(DDR3_DQ[x]),                    // ODDR -> pad directly (IOBUF removed)
                .Q1(gowin_dq_tx_out),               // OE output, unused now that the IOBUF is gone
                .D0(PIN_WDATA_PIPE_h[0][x]),        // Input data [SDR]
                .D1(PIN_WDATA_PIPE_l[0][x]),        // Input data [SDR]
                .TX(~PIN_OE_WDQ_wide[x]),           // Input 'output enable' 1'b0=out
                .CLK(DDR_CLK_WDQ)                   // write clock
                );

            IDDR gowin_dq_iddr_inst 
                (
                .Q0(RDQ_l[x]),                      // SDR to app #0
                .Q1(RDQ_h[x]),                      // SDR to app #1
                .D(DDR3_DQ[x]),                     // DDR input signal
                .CLK(DDR_CLK_RDQ)                   // read clock
                );
        end


I'm not sure if this will work in reality, or whether it'll change the timing significantly (though I think the timing delays were in the DDR stages rather than the IOBUF stage).

Maybe I'm being dense, but I can't off-hand see how the LVDS can be implemented for the DQS signals without using an LVDS_IOBUF, I ought to go read that other code you mentioned to see how you did it ... Perhaps you're "faking" the LVDS part ? I guess I'll see.

 

Online BrianHGTopic starter

« Reply #261 on: September 11, 2022, 06:49:56 am »
The 'LVDS' is actually still a 1.8v LVTTL signal, just differential.
What we call 'PSEUDO DIFFERENTIAL' is just when making 1 output high, the other goes low.
For the read, I only analyze the positive pin of the DQS even though I do wire them both up just for the balance.

Also remember, that whatever we choose for the DQ & DQS, we need to match the output buffer type / wiring for the DDR3_CK and the DDR3 command controls / address lines.

This is how I wired the DDR3_CK pins using standard DDR LVTTL1.8v:
https://github.com/BrianHGinc/BrianHG-DDR3-Controller/blob/c901baa0c41ae46389940ae729cc772c8d40a8f1/BrianHG_DDR3/BrianHG_DDR3_IO_PORT_ALTERA.sv#L360-L374

And this is how I wired the DQS:
https://github.com/BrianHGinc/BrianHG-DDR3-Controller/blob/c901baa0c41ae46389940ae729cc772c8d40a8f1/BrianHG_DDR3/BrianHG_DDR3_IO_PORT_ALTERA.sv#L390-L412

As you can see, it is just 2 output pins for the CK and 2 IO pins for the DQS.
Try wiring this up and then go back to modelsim to verify functionality.
Maybe Gowin's red bus error 16'hxxxx will now be replaced with the blue tristate 16'hzzzz...
 

Online BrianHGTopic starter

« Reply #262 on: September 11, 2022, 06:58:43 am »
Note that there may be a different buffer that can be used to get 2 clocks.
Maybe 1 buffer with tristate for the output.
Then, instead, wire the pin to the IDDR, or wire another dedicated input buffer to the same pin.
With this, it may be possible to get the differential bidirectional functionality.

(Note that Altera uses their IO buffer for the output and uses a logic cell deeper in the fabric for the read data.)
« Last Edit: September 11, 2022, 04:30:38 pm by BrianHG »
 

Offline SpacedCowboy

« Reply #263 on: September 11, 2022, 07:13:19 am »
Ta. I’ll think about this tomorrow - off to bed right now, it’s late :)
 

Offline SpacedCowboy

« Reply #264 on: September 11, 2022, 06:45:27 pm »
So I have this simulating again, as far as I can tell... (see 'gowin-v-altera').

Before I start checking it back in synthesis, quick question on coding style. I could generate (for example) the clock using:

Code: [Select]
    for (x=0; x<DDR3_NUM_CK; x = x + 1)
        begin : DDR_Clocks
            wire gowin_clock;
            wire gowin_ck_oe;

            ODDR gowin_ck_ddr_inst_p
                (
                .Q0(DDR3_CK_p[x]),      // Send +ve clock to this pin
                .Q1(gowin_ck_oe),       // Not used but save a warning
                .D0(1'b0),              // clock goes low to start
                .D1(1'b1),              // clock goes high in 2nd phase
                .TX(1'b0),              // TX=0 -> Output pin
                .CLK(DDR_CLK)           // DDR clock
                );

            assign DDR3_CK_n[x] = ~DDR3_CK_p[x];
        end

but you've been stressing how important it is to have the outputs perfectly aligned through the same path-types. Is the above ok, even though there might be an extra logic-delay for the negation in the DDR3_CK_n[] path ? Or would it be better to duplicate the ODDRs, something like:

Code: [Select]
    for (x=0; x<DDR3_NUM_CK; x = x + 1)
        begin : DDR_Clocks
            wire gowin_clock;
            wire gowin_ck_oe_p;
            wire gowin_ck_oe_n;

            ODDR gowin_ck_ddr_inst_p
                (
                .Q0(DDR3_CK_p[x]),       // Send +ve clock to this pin
                .Q1(gowin_ck_oe_p),      // Not used but save a warning
                .D0(1'b0),               // clock goes low to start
                .D1(1'b1),               // clock goes high in 2nd phase
                .TX(1'b0),               // TX=0 -> Output pin
                .CLK(DDR_CLK)            // DDR clock
                );

             ODDR gowin_ck_ddr_inst_n
                (
                .Q0(DDR3_CK_n[x]),        // Send -ve clock to this pin
                .Q1(gowin_ck_oe_n),       // Not used but save a warning
                .D0(1'b1),                // clock goes high to start
                .D1(1'b0),                // clock goes low in 2nd phase
                .TX(1'b0),                // TX=0 -> Output pin
                .CLK(DDR_CLK)             // DDR clock
                );
        end

Same thing applies for DQS and DQ of course.
« Last Edit: September 11, 2022, 07:35:06 pm by SpacedCowboy »
 

Online BrianHGTopic starter

« Reply #265 on: September 11, 2022, 09:28:59 pm »

Code: [Select]
            assign DDR3_CK_n[x] = ~DDR3_CK_p[x];

You are going to have 1 logic cell making the CK_P and driving that pin.  Then, an un-clocked inverter gate making the CK_N.  This will not do.  There will be a delay from the P output to the N output.

You literally need to duplicate all the logic and driver for the '_N' output so that the logic feeding the pin appears to have the same delay from the source clock.  Remember, your CK_P is the direct clocked output of a logic cell, but your CK_N isn't.

Your second solution is the better one.

Also, have you tried using the original TBUFFER, but, tying the IDDR to the pin instead of tying it to the TBUFFER's return port?  This should be easy enough for your original DQ code, just move the IDDR source pin's name.  If this compiles, it is the best solution to work with for the entire DDR design.


(I guess with the original Gowin VS Altera, the 'red' traces were trying to tell us something even though it simulated...)
« Last Edit: September 11, 2022, 09:33:57 pm by BrianHG »
 

Offline SpacedCowboy

« Reply #266 on: September 11, 2022, 09:33:55 pm »
You are going to have 1 logic cell making the CK_P and driving that pin.  Then, an un-clocked inverter gate making the CK_N.  This will not do.  There will be a delay from the P output to the N output.

Yep, that's what I was afraid of. It wouldn't be a big deal to duplicate it all, except that I am getting an error now when I try to get DQ to synthesize, where I thought it was working last night. I'm seeing:

ERROR (CK0011) : Instance 'DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/gowin_DQ_bus[0].gowin_dq_oddr_inst'(ODDR) of module 'top' cannot drive instance 'ddr_dq_0_iobuf'(IOBUF)

That must be an inferred IOBUF because I'm not instantiating one. I will try the ODDR->IOBUF->pads and IDDR<-pads idea...
 

Offline SpacedCowboy

« Reply #267 on: September 11, 2022, 09:48:11 pm »
[sigh]. No, wiring up the IDDR to the pads is a no-go as well.

Code: [Select]
             wire gowin_dq_in;
             wire gowin_dq_out;
             wire gowin_dq_oe;

            ODDR gowin_dq_oddr_inst 
                (
                .Q0(gowin_dq_out),                  // 2x SDR -> DDR
                .Q1(gowin_dq_oe),                   // in-phase output enable
                .D0(PIN_WDATA_PIPE_h[0][x]),        // Input data [SDR]
                .D1(PIN_WDATA_PIPE_l[0][x]),        // Input data [SDR]
                .TX(PIN_OE_WDQ_wide[x]),            // Input 'output enable' 
                .CLK(DDR_CLK_WDQ)                   // write clock
                );

            IOBUF gowin_dq_iobuf_inst
                (
                .O(gowin_dq_in),                    // IOBUF -> IDDR (unused)
                .IO(DDR3_DQ[x]),                    // DQ pad
                .I(gowin_dq_out),                   // ODDR -> IOBUF
                .OEN(~gowin_dq_oe)                  // input when 1'b1
                );


            IDDR gowin_dq_iddr_inst 
                (
                .Q0(RDQ_l[x]),                      // SDR to app #0
                .Q1(RDQ_h[x]),                      // SDR to app #1
                .D(DDR3_DQ[x]),                     // DDR input signal
                .CLK(DDR_CLK_RDQ)                   // read clock
                );

Gives me: ERROR (EX0339) : Port 'ddr_dq[15]' drives 1 pad loads(gowin_DQ_bus[0].gowin_dq_iobuf_inst) and 1 non-pad loads(pin:D inst:gowin_DQ_bus[0].gowin_dq_iddr_inst)

I'm beginning to think Gowin really don't want to support this sort of access...
 

Online BrianHGTopic starter

« Reply #268 on: September 11, 2022, 09:51:10 pm »
There is also asking Gowin.

I do not know about Gowin, but Altera does offer us visual diagrams in the data sheet showing us what type of wiring is available to each IO pin on the FPGA fabric.  It shows us the allowable connections.  Do you have such an illustration for Gowin?  Maybe it can offer a hint...

It would be really frustrating if they only allow it from 1 PLL, as we have a separate PLL for the write data.
 

Online BrianHGTopic starter

« Reply #269 on: September 11, 2022, 09:54:48 pm »
Try adding a second IBUF from pin net, then tie that to the IDDR.
 

Offline SpacedCowboy

« Reply #270 on: September 11, 2022, 09:59:51 pm »
There is also asking Gowin.

This is looking like the best option atm, IMHO. The last query I sent off to the FAE went unanswered, though - perhaps they realized just how 'small potatoes' I am, or are busy elsewhere with something, but I can certainly try again. A query of 'do you have an example of IO with different clocks for read and write' is something an FAE might have a stock example for...

I do not know about Gowin, but Altera does offer us visual diagrams in the data sheet showing us what type of wiring is available to each IO pin on the FPGA fabric.  It shows us the allowable connections.  Do you have such an illustration for Gowin?  Maybe it can offer a hint...

It would be really frustrating if they only allow it from 1 PLL, as we have a separate PLL for the write data.

I don't know much about the Altera side of things but I've seen similar diagrams representing the internals of an IOBUF in Xilinx docs showing the various inputs and outputs that can be attached - I can't find anything with that level of detail (or at least I haven't, yet) for Gowin - their diagrams are really just pictures of the verilog/VHDL definition of the port, example attached...

Try adding a second IBUF from pin net, then tie that to the IDDR.

Worth a go... :)
 

Offline SpacedCowboy

« Reply #271 on: September 11, 2022, 10:06:59 pm »
Same as before: ERROR (EX0339) : Port 'ddr_dq[15]' drives 1 pad loads(gowin_DQ_bus[0].gowin_dq_iobuf_inst) and 1 non-pad loads(pin:I inst:gowin_DQ_bus[0].gowin_dq_extra_ibuf)

Code: [Select]
             wire gowin_dq_in;
             wire gowin_dq_out;
             wire gowin_dq_oe;
             wire gowin_ibuf_in;


            ODDR gowin_dq_oddr_inst 
                (
                .Q0(gowin_dq_out),                  // 2x SDR -> DDR
                .Q1(gowin_dq_oe),                   // in-phase output enable
                .D0(PIN_WDATA_PIPE_h[0][x]),        // Input data [SDR]
                .D1(PIN_WDATA_PIPE_l[0][x]),        // Input data [SDR]
                .TX(PIN_OE_WDQ_wide[x]),            // Input 'output enable' 
                .CLK(DDR_CLK_WDQ)                   // write clock
                );

            IOBUF gowin_dq_iobuf_inst
                (
                .O(gowin_dq_in),                    // IOBUF -> IDDR
                .IO(DDR3_DQ[x]),                    // DQ pad
                .I(gowin_dq_out),                   // ODDR -> IOBUF
                .OEN(~gowin_dq_oe)                  // input when 1'b1
                );


             IBUF gowin_dq_extra_ibuf
                (
                .I(DDR3_DQ[x]),
                .O(gowin_ibuf_in)
                );

            IDDR gowin_dq_iddr_inst 
                (
                .Q0(RDQ_l[x]),                      // SDR to app #0
                .Q1(RDQ_h[x]),                      // SDR to app #1
                .D(gowin_ibuf_in),                  // DDR input signal from extra inserted IBUF
                .CLK(DDR_CLK_RDQ)                   // read clock
                );
 

Online BrianHGTopic starter

« Reply #272 on: September 11, 2022, 10:07:50 pm »
Example Altera datasheet  IO illustration:
 

Offline SpacedCowboy

« Reply #273 on: September 11, 2022, 10:08:36 pm »
Where's the 'jealous' emoji ?
 

Online BrianHGTopic starter

« Reply #274 on: September 11, 2022, 10:19:02 pm »
Not an IO buffer, use a tristate buffer.

Then an input buffer.
 

Offline SpacedCowboy

« Reply #275 on: September 11, 2022, 10:32:00 pm »
Not really understanding this error...

ERROR (CV0013) : Pin(ddr_dq[0]) of 'gowin_DQ_bus[0].gowin_dq_tbuf_inst'(TBUF) does not connect to port

Given the code:
Code: [Select]
             wire gowin_dq_in;
             wire gowin_dq_out;
             wire gowin_dq_oe;
             wire gowin_ibuf_in;


            ODDR gowin_dq_oddr_inst 
                (
                .Q0(gowin_dq_out),                  // 2x SDR -> DDR
                .Q1(gowin_dq_oe),                   // in-phase output enable
                .D0(PIN_WDATA_PIPE_h[0][x]),        // Input data [SDR]
                .D1(PIN_WDATA_PIPE_l[0][x]),        // Input data [SDR]
                .TX(PIN_OE_WDQ_wide[x]),            // Input 'output enable' 
                .CLK(DDR_CLK_WDQ)                   // write clock
                );

            TBUF gowin_dq_tbuf_inst
                (
                .O(DDR3_DQ[x]),                      // TBUF -> pad
                .I(gowin_dq_out),                    // ODDR -> TBUF
                .OEN(~gowin_dq_oe)                   // input when 1'b1
                );

             IBUF gowin_dq_extra_ibuf
                (
                .I(DDR3_DQ[x]),
                .O(gowin_ibuf_in)
                );

            IDDR gowin_dq_iddr_inst 
                (
                .Q0(RDQ_l[x]),                      // SDR to app #0
                .Q1(RDQ_h[x]),                      // SDR to app #1
                .D(gowin_ibuf_in),                  // DDR input signal
                .CLK(DDR_CLK_RDQ)                   // read clock
                );

- I have stared at this for the last 5 minutes, and it certainly seems to be linked up correctly.
 

Online BrianHGTopic starter

« Reply #276 on: September 11, 2022, 10:48:47 pm »
That would make me scratch my head too.
What happens if you bypass the IBUF.

I looked at the Gowin data sheet you linked to last week.
Is this all you get?
I even read the DDR memory interface modules.  With such a lack of description, never mind figuring out how they work; how are you supposed to work out the wiring between them?

They have the natural clock and a DQS clock.  Where is this supposed to be wired from/to?  What kind of buffer?  How do you generate the DQS output clock, my way or some other way?  What does the addressing do and why do you have or use them?  At least, Altera's 'mem_io_phy' explains this in a 20 page document going over every feature and how & where you can wire the IO and what the waveforms do...
 

Offline SpacedCowboy

« Reply #277 on: September 11, 2022, 10:56:38 pm »
That would make me scratch my head too.
What happens if you bypass the IBUF.

I tried that :) You get the same error, in fact commenting out the input path altogether still gives the same error. Something is wrong with the output connectivity, but I'm not sure I can see *how* let alone where.

I got briefly excited when I commented out the TBUF and made it do:

Code: [Select]
   assign DDR3_DQ[x] = (gowin_dq_oe) ? gowin_dq_out : 1'bz;

... which seemed to work, in as much as I got errors about the next section (DQS) instead of this one, but a few re-runs later it was back to complaining about the DQ path. Somehow the seed-value made it evaluate DQS before DQ for a few goes, I think. Yes, I know that messes with the timing as well, but we're at the point of "kitchen sink" time here...

I looked at the Gowin data sheet you linked to last week.
Is this all you get?
I even read the DDR memory interface modules.  With such a lack of description, never mind figuring out how they work; how are you supposed to work out the wiring between them?

They have the natural clock and a DQS clock.  Where is this supposed to be wired from/to?  What kind of buffer?  How do you generate the DQS output clock, my way or some other way?  What does the addressing do and why do you have or use them?  At least, Altera's 'mem_io_phy' explains this in a 20 page document going over every feature and how & where you can wire the IO and what the waveforms do...

Yeah. It's not ... exhaustively ... documented.

At this point I think I'll ping the FAE and see if there are any answers forthcoming. I have a feeling they're going to say "use our provided IP if you want to talk to a DDR3 chip" but we'll see.
« Last Edit: September 11, 2022, 10:58:32 pm by SpacedCowboy »
 

Online BrianHGTopic starter

« Reply #278 on: September 12, 2022, 01:34:11 am »
Here is an absolutely stupid idea:

Code: [Select]
assign dqs_clk[x] = (OE_DQS[x]) ? DDR_CLK : DDR_CLK_RDQ ;
Now, use ' dqs_clk[ x ] ' as a single clock for both the IDDR and ODDR together.

I have an extra cycle clearance in the OE as the DDR3 has a large minimum bus-turn-around cycle time and we have room for 1/2 additional clock turn-on cycle.

So long as the IO buffer's source clock selection hardware is tied to logic whether you manually hard selected it or not, this might have 0 impact on design performance.  If Gowin manually manipulates the PLL to use specific on-chip routing for the DDR, then this will not work.
« Last Edit: September 12, 2022, 01:38:38 am by BrianHG »
 

Offline SpacedCowboy

« Reply #279 on: September 12, 2022, 02:54:08 am »
So that worked for synthesis (!) I haven't checked simulation yet, but using ...

Code: [Select]
    for (x=0; x<DDR3_WIDTH_DQS; x = x + 1)
        begin : Gowin_DQ_Strobes

             wire gowin_dqs_out;                 // Internal: ODDR->IOBUF
             wire gowin_dqs_in;                  // Internal: IOBUF->IDDR
             wire gowin_dqs_tx;                  // Internal: OE on input to ODDR

             assign dqs_clk[x] = (OE_DQS[x]) ? DDR_CLK : DDR_CLK_RDQ;

             ODDR gowin_dqs_oddr_inst 
                (
                .Q0(gowin_dqs_out),             // ODDR -> LVDS
                .Q1(gowin_dqs_tx),              // 1'b0 => output
                .D0(1'b0),                      // Input data [SDR]
                .D1(1'b1),                      // Input data [SDR]
                .TX(~OE_DQS[x]),                // Input 'output enable' 0=output
                .CLK(dqs_clk[x])                 // DDR clock
                );
 
             IDDR gowin_dqs_iddr_inst 
                (
                .Q0(RDQS_pl[x]),                // SDR to app #0
                .Q1(RDQS_ph[x]),                // SDR to app #1
                .D(gowin_dqs_in),               // DDR input signal
                .CLK(dqs_clk[x])                // muxed DDR_CLK / DDR_CLK_RDQ
                );

            TLVDS_IOBUF gowin_dqs_lvds_iobuf_inst
                (
                .O(gowin_dqs_in),               // LVDS -> IDDR
                .IO(DDR3_DQS_p[x]),             // +ve LVDS pad
                .IOB(DDR3_DQS_n[x]),            // -ve LVDS pad
                .I(gowin_dqs_out),              // ODDR -> LVDS
                .OEN(gowin_dqs_tx)              // input when 1'b1
                );
           
             assign RDQS_nl[x] = ~RDQS_pl[x];
             assign RDQS_nh[x] = ~RDQS_ph[x];
        end

I get 1 odd warning I hadn't expected:

Code: [Select]
WARN  (NL0001) : Sweep user defined dangling instance "DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/Gowin_DQ_Strobes[0].gowin_dqs_iddr_inst"
but it might be an artifact of me temporarily commenting out the DQ section. Synthesis completes without errors though.

Can we do the same for DQ, in terms of the slack available in the system ?
 

Online BrianHGTopic starter

« Reply #280 on: September 12, 2022, 03:06:24 am »
Can we do the same for DQ, in terms of the slack available in the system ?

Yes, go right ahead....
DQ and DQS slack is identical.  (Note that I already turn on DQS an additional half-clock early, we may just need to do so for the DQ.)
We just might need to add 1 or 2 settings to my code, but for now, test away.
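
For reference, a rough sketch of what the same clock-switching trick might look like on the DQ bus. It assumes the Gowin ODDR, IDDR and IOBUF primitives with the port names already used in this thread; the wrapper module, its port names and the dq_clk wire are hypothetical, and nothing here has been verified on hardware. A clock mux built in fabric like this can glitch when the select changes, so it leans entirely on the bus turn-around slack mentioned above.

```systemverilog
// Sketch only. Requires the Gowin primitive library (ODDR, IDDR, IOBUF).
// Module name and port names are made up for illustration.
module gowin_dq_muxclk_sketch #(
    parameter DQ_WIDTH = 16
)(
    input  wire                DDR_CLK_WDQ,  // write clock
    input  wire                DDR_CLK_RDQ,  // read clock
    input  wire [DQ_WIDTH-1:0] oe_wdq,       // 1 = drive DQ (write), 0 = read
    input  wire [DQ_WIDTH-1:0] wdata_h,      // write data, first DDR slot
    input  wire [DQ_WIDTH-1:0] wdata_l,      // write data, second DDR slot
    output wire [DQ_WIDTH-1:0] RDQ_h,        // read data to app #1
    output wire [DQ_WIDTH-1:0] RDQ_l,        // read data to app #0
    inout  wire [DQ_WIDTH-1:0] DDR3_DQ       // DQ pads
);
    genvar x;
    generate
    for (x=0; x<DQ_WIDTH; x = x + 1)
        begin : gowin_DQ_bus

            wire gowin_dq_in;    // IOBUF -> IDDR
            wire gowin_dq_out;   // ODDR  -> IOBUF
            wire gowin_dq_tx;    // ODDR  -> IOBUF output enable, 1'b0 = output
            wire dq_clk;         // single clock shared by the ODDR and IDDR

            // Write clock while driving, read clock otherwise.
            assign dq_clk = (oe_wdq[x]) ? DDR_CLK_WDQ : DDR_CLK_RDQ;

            ODDR gowin_dq_oddr_inst
                (
                .Q0(gowin_dq_out),
                .Q1(gowin_dq_tx),
                .D0(wdata_h[x]),
                .D1(wdata_l[x]),
                .TX(~oe_wdq[x]),            // 1'b0 = output
                .CLK(dq_clk)
                );

            IOBUF gowin_dq_iobuf_inst
                (
                .O(gowin_dq_in),
                .IO(DDR3_DQ[x]),
                .I(gowin_dq_out),
                .OEN(gowin_dq_tx)           // input when 1'b1
                );

            IDDR gowin_dq_iddr_inst
                (
                .Q0(RDQ_l[x]),
                .Q1(RDQ_h[x]),
                .D(gowin_dq_in),
                .CLK(dq_clk)
                );
        end
    endgenerate
endmodule
```

Because both DDR cells on each pad now see one clock net, this layout should at least avoid the 'different control net' complaint; whether the timing holds is a separate question.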
« Last Edit: September 12, 2022, 03:15:19 am by BrianHG »
 

Offline SpacedCowboy

« Reply #281 on: September 12, 2022, 04:40:04 pm »
Ok, so doing the 'dynamically switch clock' for both DQ and DQS compiles fine - somehow I'm still getting

Code: [Select]
WARN  (NL0002) : The module "BrianHG_DDR3_GEN_tCK" instantiated to "BHG_DDR3_GEN_tCK" is swept in optimizing
.. but I haven't yet managed to place the RS232 stuff at the top-level, so there's probably at least some reset weirdness going on. With just the BrianHG_DDR3_PLL and BrianHG_DDR3_PHY_SEQ_v16 instantiated, it all compiles for synthesis, and the simulation passes. Today is a busy meetings-day (Mondays, ugh!) so this is as far as it's likely to get until this evening...
 

Online BrianHGTopic starter

« Reply #282 on: September 12, 2022, 06:57:27 pm »
Ok, so doing the 'dynamically switch clock' for both DQ and DQS compiles fine - somehow I'm still getting

Code: [Select]
WARN  (NL0002) : The module "BrianHG_DDR3_GEN_tCK" instantiated to "BHG_DDR3_GEN_tCK" is swept in optimizing

Hmmm, 'BrianHG_DDR3_GEN_tCK' takes in the DDR3 parameters and clock rate parameters.
It spits out a pile of constants which define how long every type of required delay should be, in clock cycles, and how to set the MRS control registers.

You could say the DDR3 cannot function without it.

I know that I never needed to use a good 50% of it, but the rest is absolutely essential.
 

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #283 on: September 13, 2022, 02:56:40 pm »
Well, I got "everything" into the project last night, and I'm still seeing a couple of confusing 'sweep' warnings where modules are being removed...

Code: [Select]
GowinSynthesis start
Running parser ...
Analyzing Verilog file 'C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_CMD_SEQUENCER_v16.sv'
Analyzing Verilog file 'C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_GEN_tCK.sv'
Analyzing Verilog file 'C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_IO_PORT_ALTERA.sv'
Analyzing Verilog file 'C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_PHY_SEQ_v16.sv'
Analyzing Verilog file 'C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_PLL.sv'
Analyzing Verilog file 'C:\Users\simon\Documents\verilog\ddr3-gowin\src\ddr3_io_port_gowin.sv'
Analyzing Verilog file 'C:\Users\simon\Documents\verilog\ddr3-gowin\src\gowin_ddr_clocking.sv'
Analyzing Verilog file 'C:\Users\simon\Documents\verilog\ddr3-gowin\src\rs232_DEBUGGER.v'
Analyzing Verilog file 'C:\Users\simon\Documents\verilog\ddr3-gowin\src\sync_rs232_uart.v'
Analyzing Verilog file 'C:\Users\simon\Documents\verilog\ddr3-gowin\src\top.sv'
Compiling module 'top'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\top.sv":3)
Compiling module 'BrianHG_DDR3_PLL(FPGA_VENDOR="Gowin",FPGA_FAMILY="GW2A-18",CLK_KHZ_IN=27000,CLK_IN_MULT=15,CLK_IN_DIV=1,INTERFACE_SPEED="Quarter")'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_PLL.sv":24)
Compiling module 'gowin_ddr_clocking(CLK_KHZ_IN=27000,CLK_IN_MULT=15,CLK_IN_DIV=1)'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\gowin_ddr_clocking.sv":25)
Compiling module 'BrianHG_DDR3_PHY_SEQ_v16(FPGA_VENDOR="Gowin",FPGA_FAMILY="GW2A-18",BHG_OPTIMIZE_SPEED=1'b1,BHG_EXTRA_SPEED=1'b1,CLK_KHZ_IN=27000,CLK_IN_MULT=15,CLK_IN_DIV=1,INTERFACE_SPEED="Quarter",DDR3_CK_MHZ=405,DDR3_SPEED_GRADE="-125",DDR3_SIZE_GB=1,DDR3_NUM_CK=1,DDR3_WIDTH_ADDR=13,DDR3_WIDTH_DM=2,DDR3_WIDTH_DQS=2,DDR3_MAX_REF_QUEUE=5'b01000,IDLE_TIME_uSx10=8'b00000010,SKIP_PUP_TIMER=1'b0,PORT_VECTOR_SIZE=5,PORT_ADDR_SIZE=27,USE_TOGGLE_CONTROLS=1'b1)'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_PHY_SEQ_v16.sv":53)
Compiling module 'BrianHG_DDR3_GEN_tCK(DDR3_CK_MHZ=405,DDR3_SPEED_GRADE="-125")'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_GEN_tCK.sv":54)
Compiling module 'DDR3_IO_PORT_GOWIN(FPGA_VENDOR="Gowin",BHG_EXTRA_SPEED=1'b1,CLK_KHZ_IN=27000,CLK_IN_MULT=15,CLK_IN_DIV=1,DDR3_NUM_CK=1,DDR3_WIDTH_ADDR=13,DDR3_WIDTH_DM=2,DDR3_WIDTH_DQS=2,DDR3_RWDQ_BITS=128,CMD_ADD_DLY=1'b1)'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\ddr3_io_port_gowin.sv":24)
Extracting RAM for identifier 'PIN_OE_WDQ'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\ddr3_io_port_gowin.sv":184)
Compiling module 'BrianHG_DDR3_CMD_SEQUENCER_v16(USE_TOGGLE_ENA=1'b1,USE_TOGGLE_OUT=1'b0,DDR3_WIDTH_ROW=13,DDR3_RWDQ_BITS=128,PORT_VECTOR_SIZE=5,BHG_EXTRA_SPEED=1'b1)'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_CMD_SEQUENCER_v16.sv":39)
Extracting RAM for identifier 'bank_row_mem'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_CMD_SEQUENCER_v16.sv":117)
Extracting RAM for identifier 'vector_pipe_mem'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_CMD_SEQUENCER_v16.sv":180)
Compiling module 'DDR3_CMD_ENCODE_BYTE(addr_size=5)'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\top.sv":571)
WARN  (EX3670) : Actual bit length 8 differs from formal bit length 128 for port 'data_in'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\top.sv":393)
WARN  (EX3670) : Actual bit length 1 differs from formal bit length 16 for port 'mask_in'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\top.sv":394)
Compiling module 'DDR3_CMD_DECODE_BYTE(addr_size=5)'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\top.sv":625)
WARN  (EX3670) : Actual bit length 8 differs from formal bit length 128 for port 'data_out'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\top.sv":413)
WARN  (EX3073) : Port 'rx_sample_pulse' remains unconnected for this instance("C:\Users\simon\Documents\verilog\ddr3-gowin\src\rs232_DEBUGGER.v":221)
Compiling module 'rs232_debugger(CLK_IN_HZ=101250000,ADDR_SIZE=24,READ_REQ_1CLK=1)'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\rs232_DEBUGGER.v":30)
Compiling module 'sync_rs232_uart(CLK_IN_HZ=101250000)'("C:\Users\simon\Documents\verilog\ddr3-gowin\src\sync_rs232_uart.v":15)
NOTE  (EX0101) : Current top module is "top"
WARN  (EX0211) : The output port "phase_done" of module "BrianHG_DDR3_PLL(FPGA_VENDOR="Gowin",FPGA_FAMILY="GW2A-18",CLK_KHZ_IN=27000,CLK_IN_MULT=15,CLK_IN_DIV=1,INTERFACE_SPEED="Quarter")" has no driver("C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_PLL.sv":69)
WARN  (NL0001) : Sweep user defined dangling instance "DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/Gowin_DQ_Strobes[1].gowin_dqs_iddr_inst"("C:\Users\simon\Documents\verilog\ddr3-gowin\src\ddr3_io_port_gowin.sv":452)
[5%] Running netlist conversion ...
Running device independent optimization ...
[10%] Optimizing Phase 0 completed
[15%] Optimizing Phase 1 completed
[25%] Optimizing Phase 2 completed
Running inference ...
[30%] Inferring Phase 0 completed
[40%] Inferring Phase 1 completed
[50%] Inferring Phase 2 completed
[55%] Inferring Phase 3 completed
Running technical mapping ...
[60%] Tech-Mapping Phase 0 completed
[65%] Tech-Mapping Phase 1 completed
[75%] Tech-Mapping Phase 2 completed
[80%] Tech-Mapping Phase 3 completed
[90%] Tech-Mapping Phase 4 completed
WARN  (NL0002) : The module "BrianHG_DDR3_GEN_tCK" instantiated to "BHG_DDR3_GEN_tCK" is swept in optimizing("C:\Users\simon\Documents\verilog\ddr3-gowin\src\BrianHG_DDR3_PHY_SEQ_v16.sv":336)
[95%] Generate netlist file "C:\Users\simon\Documents\verilog\ddr3-gowin\impl\gwsynthesis\ddr3-gowin.vg" completed
[100%] Generate report file "C:\Users\simon\Documents\verilog\ddr3-gowin\impl\gwsynthesis\ddr3-gowin_syn.rpt.html" completed
GowinSynthesis finish


Given that it simulates correctly, I expect there's a signal I missed somewhere that's the snowball that starts the avalanche of removal. Now begins the process of going through the changes and trying to figure out where I went wrong...

There's a *small* voice in the back of my mind wondering if the NL0002-type warning (after tech-mapping) is because it realized it could reduce the module down to constants at compile-time, but that's probably wishful thinking. I have no idea yet why the IDDR of only DQS[1] ought to be removed...
 
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #284 on: September 13, 2022, 06:19:00 pm »
Quote from: SpacedCowboy
There's a *small* voice in the back of my mind wondering if the NL0002-type warning (after tech-mapping) is because it realized it could reduce the module down to constants at compile-time, but that's probably wishful thinking. I have no idea yet why the IDDR of only DQS[1] ought to be removed...
Do not worry about the DQS[1] as I only use one of the source DQSs to verify read phase.  Everything else is tuned off of the DQ port.

Yes, the module "BrianHG_DDR3_GEN_tCK" is the only real surprise for me.


Did you get an FMAX reading?
How about a logic cell usage count?
« Last Edit: September 13, 2022, 06:21:40 pm by BrianHG »
 

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #285 on: September 13, 2022, 07:21:44 pm »
I'll spend some more time this evening trying to track down what I've screwed up - it wasn't anything too obvious (at least to my eyes) because I spent a fair amount of time last night looking as well without finding it. What's being passed into BrianHG_DDR3_PHY_SEQ_v16 seems ... reasonable.

I'm cautious about stats until I know it's all being synthesized, but the current resource usage is below. The fMax is a bit disappointing, I think - also below, but bear in mind this is just push-button synthesis.

It's complaining about a lack of timing paths for some things, and it certainly isn't properly constrained yet. Of all the black magic within the realm of FPGAs, clock constraints are the most opaque to me; in fact, the only timing constraints being applied are those inferred from the clock rates in the PLL instantiations.

I stuck with a 'multiple of the base clock' as you recommended, and the nearest to 400MHz was 405 from the 27MHz base clock, but it didn't get even close to that as you can see.
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #286 on: September 13, 2022, 08:09:25 pm »
Try these PLL settings:

Code: [Select]
// ****************  System clock generation and operation.
parameter int        CLK_KHZ_IN              = 27000,            // PLL source input clock frequency in KHz.
parameter int        CLK_IN_MULT             = 24,               // Multiply factor to generate the DDR MTPS speed divided by 2.
parameter int        CLK_IN_DIV              = 2,                // Divide factor.  When CLK_KHZ_IN is 25000,50000,75000,100000,125000,150000, use 2,4,6,8,10,12.
parameter int        DDR_TRICK_MTPS_CAP      = 0,              // 0=off, Set a false PLL DDR data rate for the compiler to allow FPGA overclocking.  ***DO NOT USE.

parameter string     INTERFACE_SPEED         = "Quarter",        // Either "Full", "Half", or "Quarter" speed for the user interface clock.
                                                                 // This will effect the controller's interface CMD_CLK output port frequency.

And then try 'CLK_IN_MULT  = 22'.
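
For reference, here is a quick Python sketch of what those settings work out to. It assumes the DDR3 clock is CLK_KHZ_IN * CLK_IN_MULT / CLK_IN_DIV (which agrees with the 27000 x 15 / 1 = 405MHz build in the log above) and that the half and quarter rate clocks are simply that value divided by 2 and 4. The function name is made up for illustration and is not part of the controller source.

```python
from fractions import Fraction


def ddr3_clocks_mhz(clk_khz_in, clk_in_mult, clk_in_div):
    """Return (DDR3_CK, half-rate, quarter-rate) clock frequencies in MHz.

    Assumption: DDR3_CK = CLK_KHZ_IN * CLK_IN_MULT / CLK_IN_DIV, and the
    half/quarter clocks are that frequency divided by 2 and 4.
    """
    ck = Fraction(clk_khz_in * clk_in_mult, clk_in_div * 1000)
    return float(ck), float(ck / 2), float(ck / 4)


if __name__ == "__main__":
    # 27 MHz board clock with the multiplier/divider pairs discussed here.
    for mult, div in [(15, 1), (24, 2), (22, 2)]:
        ck, half, quarter = ddr3_clocks_mhz(27000, mult, div)
        print(f"MULT={mult:2d} DIV={div}: CK={ck:.2f} MHz, "
              f"half={half:.2f} MHz, quarter={quarter:.2f} MHz")
```

So MULT=24 / DIV=2 targets a 324MHz DDR3 clock with a 162MHz half-rate clock, and MULT=22 / DIV=2 targets 297MHz with a 148.5MHz half-rate clock.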

Your register count seems a little low. Did you include the 'RS232 debugger'?  However, this may just be because you are using a smaller DDR3 RAM chip.

Exactly which chip are you using?
For example, an Altera Cyclone/Max with a -8 suffix can only do ~300MHz.  An Altera Cyclone/Max with a -6 suffix can do ~400MHz.  This doesn't mean I can't cheat and overclock, or use a -8 as a -6 with a good heatsink, but 300/400MHz are the official plain-vanilla figures.

Yes, compiler effort to achieve a good FMAX comes with the generation of a proper .sdc file.
If Gowin supports the Synopsys Design Constraints file syntax, then it should match mine, with the exception of the primary PLL clock names and the optimum output delay timing values I have chosen.

(Note that your design has inferred some logic registers as SSRAM16.  If this has been done in the IO port section, forcing Gowin to use logic via an appropriate attribute may remove these speed bottlenecks, as logic should be much faster than memory blocks.)
« Last Edit: September 13, 2022, 08:17:12 pm by BrianHG »
 

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #287 on: September 13, 2022, 09:54:40 pm »
So, attached are the mult=24 and mult=22 results for fMax. It did better, but in both cases it still didn't manage to hit the 50% clock target.

I did include the rs232 debugger - my current code is basically a "gowin-ified clone" of your BrianHG_DDR3_DECA_PHY_SEQ_only_v16 folder.

The chip is a GW2A-LV18PG256C8/I7 so not quite their fastest (6=slowest, 9=fastest). I'm really not sure how to interpret the figures for the internal measurements in 'speed-grades' below (the column is entitled 'speed grade' but has no distinction between grade!), but there do seem to be some distinctions for external switching...

The SDC file format is "supported" but virtually none of your directives are actually legal syntax. The only legal syntax in the .sdc file is:
  • create_clock
  • create_generated_clock
  • set_clock_latency
  • set_clock_uncertainty
  • set_clock_groups
  • set_input_delay
  • set_output_delay
  • set_max_delay/ set_min_delay
  • set_false_path
  • set_multicycle_path
  • report_timing
  • report_high_fanout_nets
  • report_route_congestion
  • report_min_pulse_width
  • report_max_frequency
  • report_exceptions

There are no variables, no derive_*... I have a clock named 'DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/dq_clk[0]' so I tried converting

Code: [Select]
set_input_delay  -clock [get_clocks {*DDR3_PLL5*clk[2]}] -max -add_delay $tSU [get_ports {DDR3_DQ*[*]}]

... to ...
Code: [Select]
set_input_delay  -clock [get_clocks {*dq_clk[0]}] -max -add_delay 0.5 [get_ports {DDR3_DQ*[*]}]

but it rejects it with:
ERROR  (TA2003) : "ddr3-gowin.sdc":13 | Can't set timing constraint to object

As for SSRAM16's, according to the synthesis log, there are 3 places where it happens:
  • Extracting RAM for identifier 'PIN_OE_WDQ'("ddr3_io_port_gowin.sv":184)
  • Extracting RAM for identifier 'bank_row_mem'("BrianHG_DDR3_CMD_SEQUENCER_v16.sv":117)
  • Extracting RAM for identifier 'vector_pipe_mem'("BrianHG_DDR3_CMD_SEQUENCER_v16.sv":180)

I don't see any obvious way in the GowinSynthesis User Guide to tell it *not* to infer a RAM - just ways to help it infer one. I tried marking the PIN_OE_WDQ logic as /*synthesis syn_keep=1 */, hoping that might be a hint that I don't want it changed, but it made no difference.

Got a meeting to attend, so can't take any more time right now, I'll have a look this evening...
« Last Edit: September 13, 2022, 09:58:16 pm by SpacedCowboy »
 

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #288 on: September 13, 2022, 09:55:45 pm »
Gah. Forgot the attachments, and you can't 'edit' them in :)
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #289 on: September 13, 2022, 10:16:33 pm »
Quote from: SpacedCowboy
Code: [Select]
set_input_delay  -clock [get_clocks {*DDR3_PLL5*clk[2]}] -max -add_delay $tSU [get_ports {DDR3_DQ*[*]}]

... to ...
Code: [Select]
set_input_delay  -clock [get_clocks {*dq_clk[0]}] -max -add_delay 0.5 [get_ports {DDR3_DQ*[*]}]

but it rejects it with:
ERROR  (TA2003) : "ddr3-gowin.sdc":13 | Can't set timing constraint to object


The '-clock [get_clocks {*dq_clk[0]}]' needs to be a source clock name, not a net name.

What you want is something along the lines of:
Code: [Select]
set_input_delay  -clock [get_clocks {*ddr3_pll1/CLKOUTP*}] -max -add_delay $tSU [get_ports {DDR3_DQ*[*]}]

If Gowin won't allow the wildcard for the source clock, then you will need to spell it all out.  Even in Quartus, this is a hassle.  Derive clocks is an Altera thing; you may need to see how Gowin labels their PLL clocks.  However, variables should be supported.  Otherwise, setting a group of IO to the same figure would be an absolute hassle when you need to change a specification globally.  Doesn't Gowin provide example DDR .sdc files to look at?

Can I see the slack report for the failed 1/2 freq ddr3_pll1/CLKOUTD clock as well?  We are close, and if you are using a -7 Gowin, 324MHz may be close to the max; 351MHz might be achievable.  You can try compiling for a -9 to see what happens.

Another help may be to set a bidirectional 'falsepath' between '*ddr3_pll1/CLKOUTP*' and '*ddr3_pll2/CLKOUTP*'.  Remember we have a clock switch between these 2 on the DQ DDRIO, however, they actually never talk to each other.  The falsepath should help timing as the compiler will no longer connect the 2 clocks when optimizing the design, but for now, the '*ddr3_pll1/CLKOUTD*' is your bottleneck.
« Last Edit: September 13, 2022, 10:21:22 pm by BrianHG »
 

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #290 on: September 14, 2022, 01:00:26 am »
Ok, so I have some progress here. I sort of muddled my way through converting over your constraints file...

Code: [Select]
#**************************************************************
# Input clock from the board
#**************************************************************
create_clock -name clk -period 37.037 -waveform {0 18.518} [get_ports {clk}]

#**************************************************************
# Create Generated Clocks
#**************************************************************
create_generated_clock -name clk_ddr3 -source [get_ports {clk}] -master_clock clk -divide_by 2 -multiply_by 22 -duty_cycle 50 [get_pins {BHG_DDR3_PLL/gowin_ddr_clocks/ddr3_pll1/CLKOUT}]
create_generated_clock -name clk_ddr3_rd -source [get_ports {clk}] -master_clock clk -divide_by 2 -multiply_by 22 -duty_cycle 50 -phase 90 [get_pins {BHG_DDR3_PLL/gowin_ddr_clocks/ddr3_pll1/CLKOUTP}]
create_generated_clock -name clk_ddr3_50 -source [get_ports {clk}] -master_clock clk -divide_by 4 -multiply_by 22 -duty_cycle 50 [get_pins {BHG_DDR3_PLL/gowin_ddr_clocks/ddr3_pll1/CLKOUTD}]

create_generated_clock -name clk_ddr3_wr -source [get_ports {clk}] -master_clock clk -divide_by 2 -multiply_by 22 -duty_cycle 50 -phase 270 [get_pins {BHG_DDR3_PLL/gowin_ddr_clocks/ddr3_pll2/CLKOUTP}]
#create_generated_clock -name clk_ddr3_25 -source [get_ports {clk}] -master_clock clk -divide_by 8 -multiply_by 22 -duty_cycle 50 [get_pins {BHG_DDR3_PLL/gowin_ddr_clocks/ddr3_pll2/CLKOUTD}]

#**************************************************************
# Set Input Delay
#**************************************************************

# tSU = 0.5
set_input_delay -clock clk_ddr3_rd -max -add_delay             0.500 [get_ports {ddr_dq*[*]}]
set_input_delay -clock clk_ddr3_rd -max -add_delay -clock_fall 0.500 [get_ports {ddr_dq*[*]}]

set_input_delay -clock clk_ddr3_50 -max 0.5  [get_ports {uart_rxd}]

# tH  = 2.0
set_input_delay -clock clk_ddr3_rd -min -add_delay             2.000 [get_ports {ddr_dq*[*]}]
set_input_delay -clock clk_ddr3_rd -min -add_delay -clock_fall 2.000 [get_ports {ddr_dq*[*]}]

set_input_delay -clock clk_ddr3_50 -min 2.000  [get_ports {uart_rxd}]


#**************************************************************
# Set Output Delay
#**************************************************************

# tCO = -7.5 (?)
set_output_delay -clock clk_ddr3 -max -add_delay             -7.5 [get_ports {ddr*}]
set_output_delay -clock clk_ddr3 -max -add_delay -clock_fall -7.5 [get_ports {ddr*}]

set_output_delay -clock clk_ddr3_50 -max -7.5 [get_ports {led[*]}]
set_output_delay -clock clk_ddr3_50 -max -7.5 [get_ports {uart_txd}]

# tCOm = -3.8 (?)
set_output_delay -clock clk_ddr3 -min -add_delay             -3.8 [get_ports {ddr*}]
set_output_delay -clock clk_ddr3 -min -add_delay -clock_fall -3.8 [get_ports {ddr*}]

set_output_delay -clock clk_ddr3_50 -min -3.8 [get_ports {led[*]}]
set_output_delay -clock clk_ddr3_50 -min -3.8 [get_ports {uart_txd}]


#**************************************************************
# Set False Path
#**************************************************************
set_false_path -from [get_clocks {clk_ddr3_rd}] -to  [get_clocks {clk_ddr3_wr}]
set_false_path -from [get_clocks {clk_ddr3_wr}] -to  [get_clocks {clk_ddr3_rd}]

#**************************************************************
# Report more timing errors (default is 25)
#**************************************************************
report_timing -setup -max_paths 100 -max_common_paths 1

Most of these numbers are simply transcribed from your .sdc. I took a guess at 90 degrees for the read-clock phase; the test bench seems to take 3 or 4 passes in the calibration phase before it locks.  The clock names seem to have to be those defined by 'create_clock' or 'create_generated_clock'. Just putting in the expression you use (abbreviated or not) to create the clock doesn't work; it seems to want an actual clock name.

With this, and after setting the "try a bit harder" option in the project configuration, I'm reliably seeing fMax as below, so it's hitting the target (just :)) and it looks like I could possibly squeeze 300 MHz out of it, but I am getting setup and hold violations still, as below. Not sure if those are because my numbers are bogus or what...

FWIW, if I claim to have a C9/I8 part, then it still won't push much higher - I tried with a clock multiplier of 26 in both top.sv and the .sdc (for a 350/175/87.5 clock setup) and the 50% clock only managed 157 MHz (not 175). That's not much over the C8/I7 part's 150-and-change.

I just figured out how to get the timing report based on the clock, so I've attached that as well :)
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #291 on: September 14, 2022, 01:16:37 am »
Read clock must be 0 degrees, otherwise massive optimization can never be done properly.  You will be stuck with many hold or setup violations.

My code had a 4 clock window between the RD clock and the main DDR clock.  It had the cycle timing designed to center the read latch in the middle +/-1 clock bypassing any metastability issues.

However, you must specify the correct write clock phase.

Also, don't forget setting the false path between the read and write clock domains.
Also setting false paths between your 27MHz clock in and the clk_25 and clk_50 domains will also help improve FMAX.

« Last Edit: September 14, 2022, 01:19:16 am by BrianHG »
 

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #292 on: September 14, 2022, 03:09:09 am »
Quote from: BrianHG
Read clock must be 0 degrees, otherwise massive optimization can never be done properly.  You will be stuck with many hold or setup violations.

My code had a 4 clock window between the RD clock and the main DDR clock.  It had the cycle timing designed to center the read latch in the middle +/-1 clock bypassing any metastability issues.

However, you must specify the correct write clock phase.

Gotcha. Changed.

Quote from: BrianHG
Also, don't forget setting the false path between the read and write clock domains.
Also setting false paths between your 27MHz clock in and the clk_25 and clk_50 domains will also help improve FMAX.

Yep, I'd done the first of those - it was towards the bottom of the settings above though, so not easily visible :) Added both directions for clk<->ddr3_clk_25 and clk<->ddr3_clk_50 as well. Together with the phase=0 on the read clock, that was enough to boost the timings to the attached.

The vast majority of the setup violations are on the read-clock, and I think what you're saying is that the clock-centering will mean that those violations aren't really going to matter. What about the write ones ?
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #293 on: September 14, 2022, 03:20:59 am »
The report you are showing me lists violations between the write clock and the IO buffer.
This is likely not due to the 270deg phase, but to all the set_output_delays in the .sdc file.

I have a specific tweak in those values which allows Altera's Quartus the best grace when fitting the design based on the IO cells of their FPGAs.

Test step 1, comment out all the set_output_delays and see what happens.

Step 2, determine the best tsu and hold values for Gowin's DDR outputs.  Since in my design, everything about the DDR3 is source generated from the FPGA, our goal is the most comfortable tsu and hold which will lie on the buffer's natural timing.  Since I have a multi-level pipe to each DDR buffer, we should be able to set a nice tight figure allowing 0 slack violations, which also allows for a better FMAX in the report as well.
 

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #294 on: September 14, 2022, 03:42:35 am »
So what happens after commenting out all the set_output_delay lines appears to be "absolutely nothing". The timings table for the clk_ddr3_wr clock is identical from top to bottom with or without the set_output_delay. I cleared out the previous-run data first, just to make sure it wasn't stale.

Changing the phase of the clock (just to test it) did make some small difference - the worst path slack went from -2.496 to -2.6...

[edit: however, all the hold violations disappeared when I took out the set_output_delay statements]
« Last Edit: September 14, 2022, 03:44:42 am by SpacedCowboy »
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #295 on: September 14, 2022, 03:49:27 am »

Quote from: SpacedCowboy
[edit: however, all the hold violations disappeared when I took out the set_output_delay statements]

Ok, are you saying the design compiled with 0 violations?

If so, then we just need to tune the tsu settings.
If so, then try lowering every tsu on every set_output_delay by 2ns and recompile and check the violations slack.
 

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #296 on: September 14, 2022, 03:55:52 am »
Not quite [grin]

I still have all the setup violations - and they're the same figures as above in the image posted. I just have no hold violations any more, now there's no "set_output_delay" in the .sdc file.

Here's the 1st-path info:

Code: [Select]
Report Command:report_timing -setup -from_clock [get_clocks {clk_ddr3_wr}]

Path1

Path Summary:

Slack -2.496
Data Arrival Time 83.077
Data Required Time 80.582
From DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/PIN_WDATA_PIPE_l[0]_6_s0
To DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/gowin_DQ_bus[14].gowin_dq_oddr_inst
Launch Clk clk_ddr3_wr:[R]
Latch Clk DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/dq_clk[0]:[R]
Data Arrival Path:

AT DELAY TYPE RF FANOUT LOC NODE
79.966 79.966 active clock edge time
79.966 0.000 clk_ddr3_wr
79.966 0.000 tCL RR 86 PLL_R[0] BHG_DDR3_PLL/gowin_ddr_clocks/ddr3_pll2/CLKOUTP
81.240 1.274 tNET RR 1 R36C30[0][A] DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/PIN_WDATA_PIPE_l[0]_6_s0/CLK
81.472 0.232 tC2Q RF 2 R36C30[0][A] DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/PIN_WDATA_PIPE_l[0]_6_s0/Q
83.077 1.605 tNET FF 1 IOB44[A] DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/gowin_DQ_bus[14].gowin_dq_oddr_inst/D1
Data Required Path:

AT DELAY TYPE RF FANOUT LOC NODE
80.000 80.000 active clock edge time
80.000 0.000 DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/dq_clk[0]
80.000 0.000 tCL RR 32 R40C27[2][A] DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/dq_clk_0_s0/F
80.782 0.782 tNET RR 1 IOB44[A] DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/gowin_DQ_bus[14].gowin_dq_oddr_inst/CLK
80.747 -0.035 tUnc DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/gowin_DQ_bus[14].gowin_dq_oddr_inst
80.582 -0.165 tSu 1 IOB44[A] DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/gowin_DQ_bus[14].gowin_dq_oddr_inst
Path Statistics:

Clock Skew -0.492
Setup Relationship 0.034
Logic Level 1
Arrival Clock Path Delay cell: 0.000, 0.000%; route: 1.274, 100.000%
Arrival Data Path Delay cell: 0.000, 0.000%; route: 1.605, 87.373%; tC2Q: 0.232, 12.627%
Required Clock Path Delay cell: 0.000, 0.000%; route: 0.782, 100.000%
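
As a sanity check on how the tool arrives at that slack, here is a small Python sketch that re-adds the figures from the Path1 report. The function names are mine, and the last check assumes clk_ddr3_wr is the 297MHz (27MHz x 22 / 2) clock at 270 degrees, as set up in my .sdc; treat it as a back-of-envelope sketch rather than a model of the timing engine.

```python
# All values are in ns and are copied from the Path1 report above.
LAUNCH_EDGE = 79.966                       # clk_ddr3_wr active clock edge
ARRIVAL_DELAYS = [1.274, 0.232, 1.605]     # tNET to FF CLK, tC2Q, tNET to ODDR D1

LATCH_EDGE = 80.000                        # dq_clk[0] active clock edge
REQUIRED_DELAYS = [0.782, -0.035, -0.165]  # tNET to ODDR CLK, tUnc, tSu


def setup_slack(launch_edge, arrival_delays, latch_edge, required_delays):
    """Return (arrival, required, slack), where slack = required - arrival."""
    arrival = launch_edge + sum(arrival_delays)
    required = latch_edge + sum(required_delays)
    return arrival, required, required - arrival


def edge_time_ns(ck_mhz, phase_deg, cycle):
    """Time of rising edge number `cycle` of a clock shifted by phase_deg."""
    period = 1000.0 / ck_mhz
    return (cycle + phase_deg / 360.0) * period


if __name__ == "__main__":
    arrival, required, slack = setup_slack(
        LAUNCH_EDGE, ARRIVAL_DELAYS, LATCH_EDGE, REQUIRED_DELAYS)
    print(f"arrival  = {arrival:.3f} ns")
    print(f"required = {required:.3f} ns")
    print(f"slack    = {slack:.3f} ns")
    print(f"setup relationship = {LATCH_EDGE - LAUNCH_EDGE:.3f} ns")
    # Assumption: clk_ddr3_wr is 297 MHz at 270 degrees (mult 22, div 2).
    print(f"edge 23 of clk_ddr3_wr = {edge_time_ns(297, 270, 23):.3f} ns")
```

The sums reproduce the reported arrival (83.077) and required (80.582) times; the slack comes out as -2.495 against the reported -2.496, which I assume is just rounding inside the tool. The launch edge lines up with edge 23 of a 297MHz clock shifted by 270 degrees, whereas the latch clock is dq_clk[0], which is not one of the clocks I declared in the .sdc. That may be why the two edges end up only 0.034ns apart, leaving no room for roughly 1.8ns of clock-to-Q plus routing.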
« Last Edit: September 14, 2022, 03:58:48 am by SpacedCowboy »
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #297 on: September 14, 2022, 04:04:01 am »
Can you test a 90degree WDQ clock?
You need to change your PLL (I forget if it is auto...), the parameter 'DDR3_WDQ_PHASE' at the top of your top hierarchy, and your .sdc file.
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #298 on: September 14, 2022, 04:17:30 am »
Also find out the output buffer's delay element range and test a write data clock of 0 deg.  If 0 deg really helps and the programmable delay element can be made large enough to shift the write data by 90 deg, then we might have to go that route.
« Last Edit: September 14, 2022, 04:19:59 am by BrianHG »
 

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #299 on: September 14, 2022, 04:33:41 am »
Yup, will do.
 

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #300 on: September 14, 2022, 05:04:18 am »
Quote from: BrianHG
Can you test a 90degree WDQ clock?
You need to change your PLL (I forget if it is auto...), the parameter 'DDR3_WDQ_PHASE' at the top of your top hierarchy, and your .sdc file.

Ok, so the 90-degree phase is actually worse - see attached. I'm seeing -3.366 as worst-case slack rather than -2.496. This is with changing the SDC file clock definition and also "DDR3_WDQ_PHASE" in top.sv, which passes it through to BrianHG_DDR3_PLL, which passes it through to gowin_ddr_clocking, which uses it to index the phase table.

Running with 0-degree phase (as above, in the SDC and top.sv), the results are identical to the 90-degree results. Both of these runs used the "clean and re-run all" option, just to make sure there's no stale data.

I'm getting the feeling that I'm missing something fundamental here - I would have expected more change...


Quote from: BrianHG
Also find out the output buffer's delay element range and test a write data clock of 0 deg.  If 0 deg really helps and the programmable delay element can be made large enough to shift the write data by 90 deg, then we might have to go that route

The IODELAY module is 18ps per step, with a max step-count of 127, ==> 2.286ns. A 400MHz clock has a period of 2.5ns, so we can do a delay of ~90% of a period. But I guess that doesn't help much if the 0-degree phase still has all the setup violations...
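
To put numbers on that, a short Python sketch of the IODELAY arithmetic. It only uses the 18ps step and 127-step maximum quoted above; the function name is hypothetical and it picks the nearest whole step, ignoring any fixed insertion delay the real delay element may have.

```python
IODELAY_STEP_PS = 18      # delay per step, as quoted above
IODELAY_MAX_STEPS = 127   # maximum step count, as quoted above
IODELAY_MAX_PS = IODELAY_STEP_PS * IODELAY_MAX_STEPS


def iodelay_for_phase(ck_mhz, phase_deg):
    """Return (needed_ps, steps, reachable) for a phase shift at ck_mhz.

    steps is the nearest whole number of IODELAY steps; reachable is True
    when that step count fits within IODELAY_MAX_STEPS.
    """
    period_ps = 1e6 / ck_mhz
    needed_ps = period_ps * phase_deg / 360.0
    steps = round(needed_ps / IODELAY_STEP_PS)
    return needed_ps, steps, steps <= IODELAY_MAX_STEPS


if __name__ == "__main__":
    print(f"max IODELAY = {IODELAY_MAX_PS} ps")
    for ck_mhz in (400, 297):
        for phase in (90, 270):
            needed, steps, ok = iodelay_for_phase(ck_mhz, phase)
            print(f"{ck_mhz} MHz, {phase:3d} deg: {needed:7.1f} ps, "
                  f"{steps:3d} steps, reachable={ok}")
```

A 90-degree shift at 400MHz is 625ps, or about 35 steps, so it is well within range. A 270-degree shift at 297MHz would need about 2525ps, which is more than the 2286ps maximum.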
 
 

Online BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #301 on: September 14, 2022, 05:10:19 am »
Ok, try this:

At the PLL, assign the write clock output to the DDR3 clock.
This will force every associated ounce of logic in the write section to be tied to the first PLL.

You might need to remove the set_output_delay in the .sdc.

If this doesn't work, I don't know what will.  We put all of the write onto the same system clock as all the other outputs.  It may be boiling down to something with the clock switch we are using to swap from read to write.
 

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #302 on: September 14, 2022, 05:24:18 am »
Ok, so with that I have no clk_ddr3_wr setup violations. There are some clk_ddr setup violations though (attached)

I also have 4 hold violations (attached)

If we can remove/ignore these, perhaps then use the DELAY to get the write-phase working...

[edit: meant to say, this is with the SDC containing no set_output_delay statements, but keeping the set_input_delay statements]
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #303 on: September 14, 2022, 05:45:33 am »
Though we did cut down from ~2.5ns to 1.5ns slack, it is still not good enough.
The funny thing is, this path should not be a problem.
The write DQ path is nothing more than a DFF -> DFF chain, several stages long.

Ok, let's try changing this parameter in the IO port source HDL:

line 72:
Code: [Select]
parameter int        WDQ_SYNC_CHAIN          = 3 + CMD_ADD_DLY - WDQ_CLK_270, // + ((CLK_KHZ_IN*CLK_IN_MULT/CLK_IN_DIV)>=450000), 
Try changing the '3' to a '2' and a '4'.

This shifts the clock domain transition position in the middle of the write pipe FIFO from the DDR_CLK to the DDR_CLK_WDQ.

You may want to try changing everything back to 270deg and also just try 0deg.
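
To picture what moving that number does, here is a simplified sketch of a DFF chain whose hand-off point between two clocks is set by a parameter. It is only an illustration of the idea, not the code in the IO port module: the module name, the single-bit data path, and the STAGES and XFER_STAGE parameters are all assumptions.

```systemverilog
// Hypothetical sketch: a DFF chain where the first XFER_STAGE flops run on
// clk_ddr and the remaining flops run on clk_wdq. Changing XFER_STAGE moves
// the clock domain transition earlier or later in the chain.
module wdq_pipe_sketch #(
    parameter int STAGES     = 5,  // total number of flops in the chain (assumed)
    parameter int XFER_STAGE = 3   // flops clocked by clk_ddr before the hand-off (assumed)
)(
    input  logic clk_ddr,          // stands in for DDR_CLK
    input  logic clk_wdq,          // stands in for DDR_CLK_WDQ
    input  logic d_in,
    output logic d_out
);

    logic [STAGES-1:0] pipe;

    genvar i;
    generate
        for (i = 0; i < STAGES; i++) begin : g_stage
            wire stage_in;

            // The first flop takes the module input, the rest take the previous flop.
            if (i == 0) begin : g_first
                assign stage_in = d_in;
            end else begin : g_rest
                assign stage_in = pipe[i-1];
            end

            // Pick the clock for this flop based on its position in the chain.
            if (i < XFER_STAGE) begin : g_ddr
                always_ff @(posedge clk_ddr) pipe[i] <= stage_in;
            end else begin : g_wdq
                always_ff @(posedge clk_wdq) pipe[i] <= stage_in;
            end
        end
    endgenerate

    assign d_out = pipe[STAGES-1];

endmodule
```

With XFER_STAGE set to 2, 3, or 4 the total latency in flops stays the same, but the one flop-to-flop path that crosses between the two clocks sits at a different place in the chain, which is what gives the placer a different timing problem to solve.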
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #304 on: September 14, 2022, 05:47:15 am »
Ok, so I tried pushing it as far as it needed to go, in order to pass muster. Setting the PLLs to x19/2, and changing the (default) calculation of 'slow, hot' to 'fast' in the analysis, I can get everything to pass - This is still with DDR3_CLK_WR being assigned to DDR3_CLK.

No setup violations, no hold violations. What I don't really understand is how it can claim a max-frequency so high in the summary view, when trying to actually push it that far will give all sorts of violations...

Something else cropped up when I launched the constraint editor to play with the environment constraints - it popped up a dialogue saying it couldn't find the DQ/DQS ports - that's the first time I've seen that warning, there's nothing in the log. So it may not be as constrained as I thought it was...

[edit: looks like
Code: [Select]
set_input_delay -clock clk_ddr3_rd -max -add_delay             0.500 [get_ports {ddr_dq[0]}]
is acceptable, but
Code: [Select]
set_input_delay -clock clk_ddr3_rd -max -add_delay             0.500 [get_ports {ddr_dq*[*]}]
is not. Even the single asterisk inside the [] isn't acceptable. Looks like I have a lot of copy/paste to do...]

« Last Edit: September 14, 2022, 05:57:42 am by SpacedCowboy »
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #305 on: September 14, 2022, 06:57:00 am »
Ok, I've tried a variety of options here now, and with the clock above ~256MHz, I'm always getting setup and/or hold violations.

The best seems to be with:
  - DDR_CLK_WR assigned to DDR_CLK
  - No set_output_delay parameters in the SDC file
  - syntax-acceptable port definitions in the SDC file for all the ddr_dq and ddr_dqs lines in the set_input_delay parameters
  - parameter int        WDQ_SYNC_CHAIN          = 2 + CMD_ADD_DLY - WDQ_CLK_270

With that, I'm seeing:
  - worst slack violation of -1.000ns
  - worst hold violation of -0.35ns

It's almost midnight, so it's checked in and I'm off to bed ...
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #306 on: September 14, 2022, 04:54:08 pm »
I'm curious, what happens if you disable the set_input_delay ?  IE: we made that spec sooooo tight that it is preventing any proper routing of the output paths.

I'm wondering if our problem stems from our clock 'switch' technique.

Also, we are currently feeding the OE through the ODDR, try feeding the OE directly to the bidir buffer.

Another trick would be to use an output enable buffer, then, tie the IDDR directly to the pin.
Maybe using a bidir buffer forces the input to be placed directly at the pin while the output logic cell is buried, or vice versa.

One thing I do know, it's got to be possible to shift data to the ODDR at a good rate unless you can find a spec somewhere that the ODDR cannot achieve 300MHz, ie 600mtps.  It appears the IDDR works, unless you need to go down the list of timing violations and it's actually there at the bottom.


Believe me when I say, my write data module is nothing more than:

command_in -> df1 -> df2 -> df3 -> df4 -> df5 -> ODDR...

And your compiler is complaining about the df5 -> ODDR point, after it has had enough DFF to strategically place them right at the IO pin's DFF.


Does Gowin have any data on the maximum throughput of the ODDR primitive?
Maybe they need us to use the 4:1 serializer to achieve 600mtps.
« Last Edit: September 14, 2022, 10:06:25 pm by BrianHG »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #307 on: September 14, 2022, 11:51:05 pm »
@SpacedCowboy, I think I know what's going on.
Perform my above tests.  Then hard-wire the DQ clock to the DDR_CLK_WDQ only, then hard-wire it to the DDR_CLK only.

If a hard-wire to exclusive DDR_CLK cleans up the timing violations, then we will need to move the tunable DDR_CLK_RD to PLL#2 and wire the DDR_CLK_WDQ to PLL#1 setting everything back to 270 deg phase.  This might fix the timing violations.

Also, DDR3 needs to run at a minimum of 300MHz though I tested proper functionality at 250 in the past.  I once had a limiter in my code set to 300, but you are lucky I lowered it to 250 for these tests.
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #308 on: September 15, 2022, 12:26:23 am »
Quote
@SpacedCowboy, I think I know what's going on.
Well, I'm glad *someone* does :)

"The tests" being to do both of:

- disable all the set_input_delays as well as set_output_delays
- also feed OE directly to the bi-dir (though this will be 3 clocks out of phase with the data because the ODDR has 3FFs in a chain, see attached)

... then (inside the DQ path) instead of

Code: [Select]
assign dq_clk[x] = (PIN_OE_WDQ_wide[x]) ? DDR_CLK_WDQ : DDR_CLK_RDQ;
we try:
Code: [Select]
assign dq_clk[x] = DDR_CLK_WDQ;
and:
Code: [Select]
assign dq_clk[x] = DDR_CLK_RDQ;
separately ? (I'll probably just wire them directly, but the assign shows the change better)

This is for CLK_WDQ being 270 degrees out of phase with CLK_DDR3, and not hardwired and with WDQ_SYNC_CHAIN=3 as originally set, right ?


Just checking :)
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #309 on: September 15, 2022, 12:33:23 am »
Quote
@SpacedCowboy, I think I know what's going on.
Well, I'm glad *someone* does :)

"The tests" being to do both of:

- disable all the set_input_delays as well as set_output_delays
- also feed OE directly to the bi-dir (though this will be 3 clocks out of phase with the data because the ODDR has 3FFs in a chain, see attached)

... then (inside the DQ path) instead of

Code: [Select]
assign dq_clk[x] = (PIN_OE_WDQ_wide[x]) ? DDR_CLK_WDQ : DDR_CLK_RDQ;
we try:
Code: [Select]
assign dq_clk[x] = DDR_CLK_WDQ;
and:
Code: [Select]
assign dq_clk[x] = DDR_CLK_RDQ;
separately ? (I'll probably just wire them directly, but the assign shows the change better)

This is for CLK_WDQ being 270 degrees out of phase with CLK_DDR3, and not hardwired and with WDQ_SYNC_CHAIN=3 as originally set, right ?


Just checking :)
Yes, 1 clock alone, just the write clock.
Test compile.
Then in the PLL module, hard-wire the write clock to the DDR_CLK out.
Test compile.
Then in the IODDR, re-assign the combo read/write clock switch.
Test compile.


We are snooping around where that slack error is coming from.
Even if you set the write clock to 0deg, because it is on a different PLL, it may not be able to meet the timing we want due to FPGA fabric routing.
For all we know, the error may be because the dumb ODDR clock is actually a negative clock input, but I doubt that one as the simulations should have been messed up.

Note that if the in and out IOBUFFER 'delays' can be real-time software controlled, then it might be possible to get everything down to 1 PLL and tie my generated tuning steps into a module to set the read delay and make my DDR3 run all off of 1 PLL with just a DDR_CLK, and the .50 & .25 outputs.

Note that the IObuffer delay also needs to be visible in modelsim to prove test the code changes.
« Last Edit: September 15, 2022, 12:47:38 am by BrianHG »
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #310 on: September 15, 2022, 12:57:46 am »
Ok, so test #1 conditions:

Code: [Select]
In BrianHG_DDR3_PLL:
===================
assign   DDR3_CLK      = PLL1_clk_out[0];      // DDR3 CK clock running at 1/2 the DQ rate.
assign   DDR3_CLK_WDQ  = PLL1_clk_out[1];      // DDR3 write data clock 90 degree out of phase running at 1/2 the DQ rate.
assign   DDR3_CLK_RDQ  = PLL1_clk_out[2];      // DDR3 phase adjustable read data input clock running at 1/2 the DQ rate.

In top.sv:
=========
DDR3_WDQ_PHASE          = 270,

In ddr_io_port_gowin:
====================
WDQ_SYNC_CHAIN          = 3  + CMD_ADD_DLY - WDQ_CLK_270

route DQ OE directly to IOBUF:
           IOBUF gowin_dq_iobuf_inst
                (
                .O(gowin_dq_in),                    // IOBUF -> IDDR
                .IO(DDR3_DQ[x]),                    // DQ pad
                .I(gowin_dq_out),                   // ODDR -> IOBUF
                .OEN(~PIN_OE_WDQ_wide[x])           // input when 1'b1

Set DQ clock to toggle:
            assign dq_clk[x] = (PIN_OE_WDQ_wide[x]) ? DDR_CLK_WDQ : DDR_CLK_RDQ;


I get an error that the top module can't directly drive the IOBUF ...

ERROR (CK0011) : Instance 'DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/gowin_DQ_bus[0].gowin_dq_oddr_inst'(ODDR) of module 'top' cannot drive instance 'DDR3_PHY/BHG_DDR3_IO_PORT_GOWIN/gowin_DQ_bus[0].gowin_dq_iobuf_inst'(IOBUF)

The code is below: I replaced the commented out part of the DQ IOBUF with the version that does not send OE through the ODDR.
Code: [Select]
    for (x=0; x<DQ_WIDTH; x = x + 1)
        begin : gowin_DQ_bus

            wire gowin_dq_out;
            wire gowin_dq_in;
            wire gowin_dq_tx_out;

            // See above comment in DQS code for this...
             assign dq_clk[x] = (PIN_OE_WDQ_wide[x]) ? DDR_CLK_WDQ : DDR_CLK_RDQ;

            ODDR gowin_dq_oddr_inst 
                (
                .Q0(gowin_dq_out),                  // ODDR -> IOBUF
                .Q1(gowin_dq_tx_out),               // OE   -> IOBUF, 1'b0 => output
                .D0(PIN_WDATA_PIPE_h[0][x]),        // Input data [SDR]
                .D1(PIN_WDATA_PIPE_l[0][x]),        // Input data [SDR]
                .TX(~PIN_OE_WDQ_wide[x]),           // Input 'output enable' 1'b0=out
                .CLK(dq_clk[x])                     // write clock
                );

            IDDR gowin_dq_iddr_inst 
                (
                .Q0(RDQ_l[x]),                      // SDR to app #0
                .Q1(RDQ_h[x]),                      // SDR to app #1
                .D(gowin_dq_in),                    // DDR input signal
                .CLK(dq_clk[x])                     // read clock
                );

//           IOBUF gowin_dq_iobuf_inst
//               (
//               .O(gowin_dq_in),                    // IOBUF -> IDDR
//               .IO(DDR3_DQ[x]),                    // DQ pad
//               .I(gowin_dq_out),                   // ODDR -> IOBUF
//               .OEN(gowin_dq_tx_out)               // input when 1'b1
//               );
           IOBUF gowin_dq_iobuf_inst
                (
                .O(gowin_dq_in),                    // IOBUF -> IDDR
                .IO(DDR3_DQ[x]),                    // DQ pad
                .I(gowin_dq_out),                   // ODDR -> IOBUF
                .OEN(~PIN_OE_WDQ_wide[x])           // input when 1'b1
                );
        end

Swapping in the commented section will make it compile again, so it's definitely that single line in the IOBUF OE. I can do the tests without the direct wiring, if that will help ?
« Last Edit: September 15, 2022, 01:00:43 am by SpacedCowboy »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #311 on: September 15, 2022, 01:11:28 am »
What happened?  This is not what I asked.  Or did I misread what you are saying?

Change:
Code: [Select]
assign dq_clk[x] = (PIN_OE_WDQ_wide[x]) ? DDR_CLK_WDQ : DDR_CLK_RDQ;
to:
Code: [Select]
assign dq_clk[x] = DDR_CLK_WDQ ;
This should work as it used to work.
The only problem should be a potential timing issue with the DDR_CLK_RDQ.

The second test is within the PLL module to assign the output DDR_CLK_WDQ to DDR_CLK.
Compile again.
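
One way to flip between the original clock switch and the write-clock-only test without editing the assign by hand is a compile-time parameter. This is just a sketch of that idea for a single DQ bit; the module name and the DQ_CLK_MODE parameter are hypothetical and do not exist in the controller source.

```systemverilog
// Hypothetical sketch: select the DQ clock source at compile time.
//   DQ_CLK_MODE = 0 : original behaviour, OE switches between write and read clocks
//   DQ_CLK_MODE = 1 : first test, write clock only
module dq_clk_select_sketch #(
    parameter int DQ_CLK_MODE = 0
)(
    input  logic PIN_OE_WDQ,   // output enable for this DQ bit
    input  logic DDR_CLK_WDQ,  // write data clock
    input  logic DDR_CLK_RDQ,  // read data clock
    output logic dq_clk        // clock fed to the DQ IO registers
);

    generate
        if (DQ_CLK_MODE == 1) begin : g_wdq_only
            assign dq_clk = DDR_CLK_WDQ;
        end else begin : g_oe_switch
            assign dq_clk = PIN_OE_WDQ ? DDR_CLK_WDQ : DDR_CLK_RDQ;
        end
    endgenerate

endmodule
```

The second test (assigning DDR_CLK_WDQ to DDR_CLK) lives in the PLL module, so it stays a separate change from this one.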
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #312 on: September 15, 2022, 01:18:16 am »
Assuming the test is still useful,

Test #1 conditions:
Code: [Select]
In BrianHG_DDR3_PLL:
===================
assign   DDR3_CLK      = PLL1_clk_out[0];      // DDR3 CK clock running at 1/2 the DQ rate.
assign   DDR3_CLK_WDQ  = PLL1_clk_out[1];      // DDR3 write data clock 90 degree out of phase running at 1/2 the DQ rate.
assign   DDR3_CLK_RDQ  = PLL1_clk_out[2];      // DDR3 phase adjustable read data input clock running at 1/2 the DQ rate.

In top.sv:
=========
DDR3_WDQ_PHASE          = 270,

In ddr_io_port_gowin:
====================
WDQ_SYNC_CHAIN          = 3  + CMD_ADD_DLY - WDQ_CLK_270

route DQ OE directly to IOBUF:
           IOBUF gowin_dq_iobuf_inst
                (
                .O(gowin_dq_in),                    // IOBUF -> IDDR
                .IO(DDR3_DQ[x]),                    // DQ pad
                .I(gowin_dq_out),                   // ODDR -> IOBUF
                .OEN(~PIN_OE_WDQ_wide[x])           // input when 1'b1

Set DQ clock to toggle:
            assign dq_clk[x] = (PIN_OE_WDQ_wide[x]) ? DDR_CLK_WDQ : DDR_CLK_RDQ;

Timing analysis results:
  - all clk_ddr3 setup & hold results were fine, minimum setup slack is 1.413ns
  - clk_ddr3_wr setup results miss by worst slack of -1.658ns


Test #2 conditions: as #1, except
Code: [Select]
In BrianHG_DDR3_PLL:
===================
assign   DDR3_CLK_WDQ  = DDR3_CLK;      // DDR3 write data clock 90 degree out of phase running at 1/2 the DQ rate.


Timing analysis results:
  - clk_ddr3 setup results miss by worst slack of -0.813ns



Test #3 conditions: as #2, except
Code: [Select]
In ddr_io_port_gowin:
====================
assign dq_clk[x] = DDR_CLK_RDQ;

Timing analysis results:

Ok, wow.

  - all clk_ddr3 setup & hold results were fine, minimum setup slack is 1.413ns
  - all clk_ddr3_wr setup & hold results were fine, minimum setup slack is 1.413ns
  - rather than > 100 'read' clock issues, I see 4, as attached:


Re:your last question, I think we crossed over in time there. The above ought to be what you were asking for, if I'm understanding you correctly.
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #313 on: September 15, 2022, 01:42:38 am »
Your test condition 3 was not to be done.  It will always read wrong because of the 'false_path' and it is not what I asked.


Go back to test condition #2 and show me the negative slack report.
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #314 on: September 15, 2022, 01:53:04 am »
Ah, mistook DDR_CLK for DDR_CLK_RD...

Ok, here's the results from test #2 (ignoring the clk_rd setup violations), attached
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #315 on: September 15, 2022, 01:58:02 am »
Ok.
1) Are the set_output_delay constraints active?

2) Are the set_input_delay constraints active?

3) What happens when in the PLL, you now also assign the DDR_CLK_RDQ to the DDR_CLK?
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #316 on: September 15, 2022, 02:01:12 am »
I was just about to say mea culpa

I just double-checked the SDC file, and the input delays are still active. I mustn't have saved the file :(

I have to run - the kid has soccer, but I'll come back to it ASAP and re-run without the input delays, then assign DDR_CLK_RDQ to DDR_CLK and re-run again.
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #317 on: September 15, 2022, 02:01:23 am »
Also, are your DQ pins assigned to the correct FPGA's dedicated IO DQ groups?  If not, this can be a problem.
« Last Edit: September 15, 2022, 02:03:52 am by BrianHG »
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #318 on: September 15, 2022, 02:02:21 am »
I took the DQ pin locations from their example DDR code, so that ought to be ok.
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #319 on: September 15, 2022, 02:04:30 am »
I ask because looking at the setup report, you see 6 paths at >=0.2ns, then there is a gigantic drop to 0.08ns.

Remember, different FPGA sizes may have different dedicated or optimized locations.
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #320 on: September 15, 2022, 02:25:31 am »
On my phone atm, it’s the example for this particular board, but I’ll double-check when I get back
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #321 on: September 15, 2022, 03:16:16 am »
I hope Gowin isn't a copy of Lattice:

Read this .pdf:
https://www.latticesemi.com/view_document?document_id=50461
Page 67, you will see 2 different types of DDR.

GDDRX1 with its data rate of 250/500, and GDDRX2 with its data rate of 400/800.

The difference is an annoying crap fest.
Does Gowin have a similar IDDR/ODDR function set?
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #322 on: September 15, 2022, 03:37:59 am »
First, I verified the location constraints against the schematic, and they all match.

Ok, test #1 as above

  - all clk_ddr3 setup & hold results were fine, minimum setup slack is 1.413ns
  - attached test1-setup-clk_ddr3_wr violations
  - no hold violations

test #2, as above: DDR3_CLK_WDQ  = DDR3_CLK
  - attached test2-setup-clk_ddr3 violations
  - nothing to report for clk_ddr3_wr (obviously enough)
  - attached test2-hold violations

test #3, also have DDR3_CLK_RDQ  = DDR3_CLK
  - no setup or hold violations at all
  - attached test3-path-slacks table


I'm not aware of any difference in the DDR's - everything seems to be uniform on a pad-by-pad basis, and there's only one "ODDR" primitive that I can see in their list.
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #323 on: September 15, 2022, 03:56:17 am »
Is your clock 300MHz or 400MHz?

If everything checks, then we need to investigate the IO delay options as test #3 seems to give us the best results.
You will need to set the output delay based on a parameter and adjust the input delay in real-time based on our 16 possible clock phase steps.

If we can make this work, then at least there is a chance of getting the design down to 1 PLL if you can make the .50 and .25 outputs, all with 0deg phase.
« Last Edit: September 15, 2022, 03:59:29 am by BrianHG »
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #324 on: September 15, 2022, 04:03:06 am »
So the clock above was at mult=22, div=2  --> Freq =297. I bumped it up a few times, and got as far as mult=30, div=2 ---> 405 MHz.
 
It very nearly managed mult=32, there are only 2 failing paths.

This is all with the 'fast, cold' model, not the 'slow, hot' model, though. I doubt it'd make it that far under the 'slow, hot' model.
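
For reference, the multiplier/divider figures above can be reproduced with a small simulation-only sketch. The 27MHz reference is an assumption inferred from the numbers in this post (27 x 22 / 2 = 297 and 27 x 30 / 2 = 405); the module and function names are hypothetical.

```systemverilog
// Simulation-only sketch (hypothetical names): PLL output frequency
// from the multiplier and divider settings.
module pll_freq_calc;

    localparam real REF_MHZ = 27.0;  // assumed reference clock, inferred from the quoted figures

    // Output frequency in MHz for a given multiplier and divider.
    function automatic real pll_mhz(input int mult, input int div);
        return REF_MHZ * mult / div;
    endfunction

    initial begin
        $display("mult=22 div=2 -> %0.1f MHz", pll_mhz(22, 2));
        $display("mult=30 div=2 -> %0.1f MHz", pll_mhz(30, 2));
        $display("mult=32 div=2 -> %0.1f MHz", pll_mhz(32, 2));
    end

endmodule
```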
« Last Edit: September 15, 2022, 04:05:23 am by SpacedCowboy »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #325 on: September 15, 2022, 04:15:33 am »
That CE path is cycled once during system reset.

Ok, this looks like the way to go, so long as we can make the 'delay' primitive work for us.

Right now, the DDR3 controller will function, but, the write data phase will be too soon, and the read data phase will be un-tunable.  You need to find out how the delay element works and what is its range to see if we can make it fit our needs.  If good, and you want to make a tiny module to software tune it controlled by my phase_step and phase_updn, then this will be the next step just before you check to see if you can shrink your PLL down to one unit.  Doing so may further improve your FMAX range eliminating those 2 minute timing errors at 432MHz.

And finally, other than the IOs and PLL tuning limitations, we can now say that my DDR3 code can achieve the necessary 400MHz rates.


If tuning the delays cannot be done, then there is the last option to decode Gowin's DQS mem-phy clock system and use it to auto generate the read clock phase.

(Wait, did you say fast cold model?  Actually, we only are interested in the Slow Hot 85degC model.)
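
A tiny tuning module of the kind described could start out as a saturating up/down counter, sketched below. Only the phase_step and phase_updn names come from this post. The polarity of phase_updn, the choice to saturate rather than wrap, the 0..127 range (taken from the step count quoted earlier in the thread), and the way the count would be handed to the delay primitive are all assumptions, and the connection to the primitive is not shown.

```systemverilog
// Hypothetical sketch: turn phase_step / phase_updn pulses into a delay
// setting in the range 0..MAX_STEPS, saturating at both ends.
module delay_step_counter_sketch #(
    parameter int MAX_STEPS = 127      // assumed delay range
)(
    input  logic       clk,
    input  logic       reset,
    input  logic       phase_step,     // one-clock pulse: move the setting by one step
    input  logic       phase_updn,     // assumed polarity: 1 = more delay, 0 = less delay
    output logic [6:0] delay_setting   // value to pass on to the delay element
);

    always_ff @(posedge clk) begin
        if (reset) begin
            delay_setting <= '0;
        end else if (phase_step) begin
            if (phase_updn && (delay_setting < MAX_STEPS))
                delay_setting <= delay_setting + 1'b1;   // step up, stop at MAX_STEPS
            else if (!phase_updn && (delay_setting > 0))
                delay_setting <= delay_setting - 1'b1;   // step down, stop at 0
        end
    end

endmodule
```

Whether saturating is the right behaviour depends on how the tuning sweep expects the phase to wrap, so treat that as something to check against the calibration logic.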
« Last Edit: September 15, 2022, 04:21:38 am by BrianHG »
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #326 on: September 15, 2022, 04:47:34 am »
I think I added the 'fast cold' model when I was trying everything to try and get something to work. I can take it out and see where we land, but it might not be pretty :)

Also remember there's no input/output delays in the SDC right now.

Ouch. Yeah...

 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #327 on: September 15, 2022, 05:01:22 am »
Set the PLL for ~300MHz and see what happens.
Show me the setup slack log.
Also, try the -8 or -9 chip.
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #328 on: September 15, 2022, 05:11:23 am »
Ok, yes, that is a lot closer.. On 'slow' I'm getting the attached. I get identical results for grades 8 and 9, oddly enough. And I re-ran it a couple of times, clearing the data in-between runs, just to make sure that was the case.

 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #329 on: September 15, 2022, 05:17:33 am »
FWIW, It's still inferring 3 RAMs, and I haven't found any way to stop it from doing so as yet.

Extracting RAM for identifier 'PIN_OE_WDQ'("ddr3_io_port_gowin.sv":184)
Extracting RAM for identifier 'bank_row_mem'("BrianHG_DDR3_CMD_SEQUENCER_v16.sv":117)
Extracting RAM for identifier 'vector_pipe_mem'("BrianHG_DDR3_CMD_SEQUENCER_v16.sv":180)

Funnily enough, I seem to recall reading that the fMax of the RAM was 320MHz. Ah, no not quite right, see attached.
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #330 on: September 15, 2022, 05:27:14 am »
The RAMs are being inferred at the 0.50clk freq, 162MHz, so you are safe.  Except for the OE path, which is running at 324MHz.  However, none of your failed paths are in the 324MHz domain to the DQ DDR port like before.

This is something we can work with as the worst 6 paths at ~ -0.2ns is part of a combinational logic block to decide when to output the next DDR3 command.

Do you have a compiler option to adjust the effort for placement and routing?
Can you enable or disable any smart state machine or mux options?
Are there any auto-infer options for inferring clock enables, or other smart timing analysis during compile, to improve FMAX?
Like auto duplicate or re-time registers to improve FMAX and slack?

Try going down to 1 pll if it can be made to just generate the DDR_CLK, _50 & _25.  This should help improve routing between the _25 and _50 domain maybe giving you a little more breathing room.

Also, remove the _RDQ <-> _WDQ false_path in your .sdc.  Maybe that is why we cleared the timing way too easily.
« Last Edit: September 15, 2022, 05:30:19 am by BrianHG »
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #331 on: September 15, 2022, 05:44:02 am »
Compared to other synthesis/P&R options, the Gowin one is very limited. Attached is every option available in settings...

  • We're already running with Timing-driven synthesis, I can't believe it gets better without.
  • The 'Place' option of '1' is basically 'Place option 0, except try a bit harder'. The only options are 0 and 1
  • The 'Route' option of '1' is 'Route according to timing', 0 is according to congestion, and 2 is 'make it take as little time to route as possible'. Those are the only options

In general, the '1' options seem to produce the best results so far. None of them are what you might call intensive: from-scratch synthesis, P&R, and bitstream generation take ~13 seconds in all.

The only thing that approaches smart state-machine/mux options is "gray" | "onehot" in

Code: [Select]
verilog object /* synthesis syn_encoding = "setting_value" */;

In the same docs, there's mention of a

Code: [Select]
Verilog object /* synthesis syn_tlvds_io = setting_value */;
.. but I'm instantiating using primitives, so that's not really relevant. There's also dspstyle, but I doubt you're multiplying much [grin].

Overall, it seems to be very much 'click-and-go synthesis' as opposed to the fine-tuning you can do with the major players.

There's a control for fan-out, and hopefully it'd do duplication if the fan-out was exceeded, but I tried that earlier (reducing the '23' default to '10') and it actually got worse timing....
 
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #332 on: September 15, 2022, 05:46:54 am »
FWIW, It's still inferring 3 RAMs, and I haven't found any way to stop it from doing so as yet.


Extracting RAM for identifier 'bank_row_mem'("BrianHG_DDR3_CMD_SEQUENCER_v16.sv":117)


Un-inferring this reg from ram will probably make the difference.
The other 2 you do not need to worry about as they have double write / read latching.
Bank row mem is being used directly with logic without such a read stage latch.  This is why I have the attribute (*preserve*), which in Quartus means to preserve the logic register and not infer anything at all like a ram block.

You will need to find Gowin's equivalent to prevent the ram inference of the logic bank_row_mem array.
« Last Edit: September 15, 2022, 05:50:09 am by BrianHG »
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #333 on: September 15, 2022, 06:13:37 am »
Ok, so it's now definitely not using block-rams. I can't be absolutely sure that it's not using what it calls 'shadow ram' (using the LUT as an SRAM) but it ought to be using registers now - the directive is /* synthesis syn_ramstyle = "registers" */

Looking at the resource usage I can see there's 10 of those SSRAMs in use, and no BSRAMs (the block-rams). Adding that directive took the register usage from 1832 to 1904, so I think it must be having the right effect.

We still don't quite make 324MHz (but we get 317) and we don't quite get 162MHz (but we get 156), so perhaps we can claim 300MHz...

There are the obligatory setup path violations because we're not making the "required" clock, attached.
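
For anyone following along, this is roughly where the directive goes. The sketch below is a stand-alone example, not the actual bank_row_mem declaration: the module name, array name, and sizes are hypothetical, and only the directive text itself is the one quoted above.

```systemverilog
// Hypothetical sketch: a small array with an asynchronous read, tagged so
// that synthesis builds it from registers instead of inferring a RAM.
module ramstyle_registers_sketch #(
    parameter int BANKS    = 8,    // assumed number of entries
    parameter int ROW_BITS = 16    // assumed entry width
)(
    input  logic                     clk,
    input  logic                     we,
    input  logic [$clog2(BANKS)-1:0] bank,
    input  logic [ROW_BITS-1:0]      row_in,
    output logic [ROW_BITS-1:0]      row_out
);

    // The directive sits between the object name and the terminating semicolon.
    logic [ROW_BITS-1:0] row_mem [0:BANKS-1] /* synthesis syn_ramstyle = "registers" */;

    // Synchronous write.
    always_ff @(posedge clk)
        if (we) row_mem[bank] <= row_in;

    // Asynchronous read feeding downstream logic directly.
    assign row_out = row_mem[bank];

endmodule
```

Checking the register and RAM counts in the resource report before and after, as done above, is the practical way to confirm the directive took effect.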
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #334 on: September 15, 2022, 06:25:07 am »
You cleared out the timing for the S3_BANK_0_s0, but now, the slow paths are in the S3_BANK_1_s0.

Take a look at which logic I used the (*preserve*) in the 'BrianHG_DDR3_CMD_SEQUENCER_v16.sv' and attach your same attribute to all of the same logic regs.
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #335 on: September 15, 2022, 06:43:57 am »
Doing that made things a lot worse (>100 setup paths failing, and lower fMax). Seems to be linked to 'preserve'ing BANK_ACT_ANY. Preserving the others didn't make much difference - still got the same 23 violations and the same slack values.

In the interests of full disclosure, I was using "/* synthesis syn_preserve = 1 */" rather than the syn_ramstyle directive because there's no RAM being inferred on any other line apart from line 180.

Anyway, another midnight, and I'm up again at 5:30 so I'm off to bed...
 

Offline e2020k

  • Newbie
  • Posts: 1
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #336 on: October 24, 2022, 02:33:51 pm »
Do you still have plans to look into Lattice & Xilinx chips?
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #337 on: October 24, 2022, 08:47:05 pm »
Xilinx shouldn't be a problem, but I will need to learn their IDE.  At least there are many dev boards with DDR3 ram on them.

I started looking into Lattice.  Their PLL is as limited as Gowin and I need to clock transition the data for their 4:1,1:4 DDR IOs to achieve their max 800mtps.  (Lattice's 2:1,1:2 mode is capped at ~500mtps) All I'm looking for is a cheap dev board with a wired DDR3 chip on it.  Right now, the cheapest is around 350 usd.

Lattice has the dirt cheap ECP5 series with 45kle and 85kle, so it is better to focus my attention there.

As for Xilinx, it does have a free DDR3 controller already.  What I may do is just tie my 'multiport' module of my DDR3 to their controller replacing my 'PHY' stage.  My multi-window VGA display processor just needs the 'dual-clock dual-port' ram assigned to Xilinx's and Lattice's counterparts to make my video system function.
« Last Edit: October 25, 2022, 03:36:41 am by BrianHG »
 

Offline iMo

  • Super Contributor
  • ***
  • Posts: 4627
  • Country: nr
  • It's important to try new things..
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #338 on: October 28, 2022, 04:40:19 pm »
I may try in turn with the ML605 board (w/ 512MB DDR3 and Virtex-6 there), provided I get something which builds in ISE 14.7 straight, without too much hassle (too complex stuff here in this thread for such an eternal rookie like me)..  :D
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14230
  • Country: fr
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #339 on: October 28, 2022, 07:20:25 pm »
For Lattice, yeah, I remember we talked about designing a dev board with DDR3, but ECP5s are currently as unobtainium as the rest, and what little stock there is here and there comes at outrageous prices; we're talking like $100 for an ECP5-85.
 

Offline mopplayer

  • Newbie
  • Posts: 5
  • Country: tw
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #340 on: December 06, 2022, 03:06:17 am »
Quote
During an unbroken burst, you send a command every 4 clocks to maintain optimum efficiency.  This means with a full rate controller, you can stuff 3 new commands in-between.  With a half-rate controller, your controller can only be fast enough to add 1 command in-between. 

I think the number of in-between commands should not be limited by whether it is a full-rate or half-rate controller.
Since it would only be using simple if-else clocked logic (inside the fast clock domain, maybe 500MHz in your case), your controller should be able to achieve that goal without suffering from STA setup timing violations.

Please correct me if wrong.

Well, nothing bad. This is just an open source project.

And in fact, the bandwidth depends on how fast the user interface issues new commands. We do not need to be concerned with whether it is exactly half-rate, full-rate or a hybrid. Random access will depend on that bandwidth.

For example, if a controller runs at 533MHz, the user interface runs at 533/2 = 266MHz, so the bandwidth is 266 x 2 bytes = 533MB/s; by the same arithmetic, a 500MHz controller (250MHz interface) gives 500MB/s. Please remember that this is not actually the same as the PC data rate.

I recall that when I implemented the Scrypt algorithm, DDR3 bandwidth was the bottleneck, and HBM would have cost more.

I also have an Anlogic EG4S20 FPGA (with embedded SDRAM) that I use for some projects. Its user interface can transfer at 200MHz on a 32-bit bus, so we get 800MB/s of bandwidth, which matches what the official datasheet describes.
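
The user-interface arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and it assumes one full bus word moves on every interface clock, so it gives a peak figure that ignores refresh, row changes and bus turnaround.

```python
def interface_bandwidth_mb_s(interface_clock_mhz: float, bus_width_bits: int) -> float:
    """Peak MB/s if one bus word is transferred on every interface clock.

    Hypothetical helper for illustration only; not part of any controller source.
    """
    bytes_per_word = bus_width_bits / 8
    return interface_clock_mhz * bytes_per_word


# 533MHz controller with a half-rate (533/2 MHz) user interface, 2 bytes per transfer.
half_rate_example = interface_bandwidth_mb_s(533 / 2, 16)

# EG4S20 embedded SDRAM example: 200MHz user interface on a 32-bit bus.
eg4s20_example = interface_bandwidth_mb_s(200, 32)

print(f"533/2 MHz x 16 bit: {half_rate_example:.0f} MB/s")
print(f"200 MHz x 32 bit:   {eg4s20_example:.0f} MB/s")
```

Both figures are upper bounds; sustained throughput depends on the access pattern.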
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #341 on: December 06, 2022, 03:25:45 am »
The controller rate is the DDR_CK command port clock rate.
IE: If run at 400MHz, the DDR rate is 800MTPS.

If my interface runs at 200MHz, I still need to command my DDR command port at the 400MHz clock and the data is still coming in and going out at 800MTPS.

This should be called a quarter rate controller.

Though, the full bandwidth of the memory is still available.  Example, if my ram is 16bit at 800MTPS, my 200MHz controller interface needs to run at least at 64bits to maintain unbroken full 800MTPS bursts.  I went 2x further, 128bit to allow unbroken full speed bursts even running my controller interface at 1/4 mode, 100MHz.
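
As a rough check of the widths quoted above, here is a small Python sketch. The function name is hypothetical; it only balances bits per second on the DRAM side against bits per interface clock, and says nothing about how the controller is actually built.

```python
import math


def min_interface_width_bits(dram_width_bits: int, dram_mtps: float,
                             interface_clock_mhz: float) -> int:
    """Smallest interface width that keeps an unbroken DRAM burst fed.

    Hypothetical helper: the DRAM moves dram_width_bits * dram_mtps Mbit/s,
    and each interface clock has to carry its share of those bits.
    """
    bits_per_interface_clock = dram_width_bits * dram_mtps / interface_clock_mhz
    return math.ceil(bits_per_interface_clock)


# 16bit DDR3 at 800MTPS (400MHz DDR_CK).
width_at_200mhz = min_interface_width_bits(16, 800, 200)
width_at_100mhz = min_interface_width_bits(16, 800, 100)

print(f"200MHz interface needs at least {width_at_200mhz} bits")
print(f"100MHz interface needs at least {width_at_100mhz} bits")
```

Halving the interface clock doubles the required width, which is why a 128bit port still sustains full-speed bursts at 100MHz.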

With this setup, the core of my controller, even when the user interface is set to 100MHz, will still stuff commands in between.  That is, the command timer part of it still runs at the 400MHz command port rate and, at the end or beginning of a burst cycle, opens/closes/precharges rows in an attempt to cut down on as much wasted dead time as possible when a new row needs to be opened/closed.  Once said rows are open or closed, you can read/write back and forth between them with no wasted clock cycles except for the mandatory bus-turnaround cycles in the changeover from a read to a write to a read in the DDR3 spec.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2728
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #342 on: December 20, 2022, 10:10:00 pm »
Hi Brian,

Since Lattice ECP5U parts are now available, would you be willing to port your controller onto that device? If you provide me with a pinout and commit to making a port, I will design and assemble the board for HW validation as well as provide an assembled board for you (and others if they contribute to development and/or are willing to buy them). We can discuss & debate specifics of what else will be on that board in a dedicated topic, as I'm open to suggestions in that regard. But we need to decide on a part fast, so that I can buy them while they are still available. Right now I can see the 85F device available in 381, 554 and 756 ball packages (285 ball parts are available too, but they have 0.5 mm pitch, which is too fine for the PCB manufacturers I have used in the past). I'm inclined to go with the largest 756 ball part because "MOAR is better" :) , but I'm open to suggestions.

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #343 on: December 20, 2022, 11:27:48 pm »
Give me a day to get back to you.  I already did some research a few months back.  It seems it is best for me to use Lattice's free DDR3 PHY in place of my 'Altera DDR IO Port' .sv module, so copying an example schematic from one of their dev boards for wiring the DDR3 itself is a good starting place for research to maintain hardware backwards compatibility with Lattice's full paid version of their DDR3 controller.

Unless you are going for a compact 1 chip DDR3 solution, I also recommend going to 2 if not 4 DDR3s, or a laptop SODIMM module.  4 DDR3s or a 64 bit laptop SODIMM module gives us enough speed for a basic real-time 3D texturing graphics engine, or ~10x 1080p60 real-time bandwidth.

Also, remember, for the high speed GBPS transceivers, you need the paid version of Lattice Diamond.  This is useful for true 1080p60/4k30p HDMI out on 8 IOs.  Not everyone has this version of Diamond.  Unless Lattice changed this policy, the best you can officially get is 800 MBPS on a DQ bus on the cheaper FPGAs.  (I think in the 480 & above pin count, you may have cross-compatible parts with and without GTPS transceivers.)

Also, be careful, there are different peak maximum MBPS on the horizontal rows VS the vertical columns IO banks.
« Last Edit: December 20, 2022, 11:54:02 pm by BrianHG »
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2728
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #344 on: December 21, 2022, 01:58:06 am »
Quote
Give me a day to get back to you.  I already did some research a few months back.  It seems it is best for me to use Lattice's free DDR3 PHY in place of my 'Altera DDR IO Port' .sv module, so copying an example schematic from one of their dev boards for wiring the DDR3 itself is a good starting place for research to maintain hardware backwards compatibility with Lattice's full paid version of their DDR3 controller.

Are you sure they provide a free PHY for DDR3? I can see it in their IP list, but so is the "full" DDR3 controller, and the description doesn't say anything about licensing.

Quote
Unless you are going for a compact 1 chip DDR3 solution, I also recommend going to 2 if not 4 DDR3s, or a laptop SODIMM module.  4 DDR3s or a 64 bit laptop SODIMM module gives us enough speed for a basic real-time 3D texturing graphics engine, or ~10x 1080p60 real-time bandwidth.

I think 64 bit is going to be too much for that FPGA; it would consume almost half of all available IO pins even in the largest package. So I'm thinking two 16bit devices for a 32 bit wide data bus would be optimal.

Quote
Also, remember, for the high speed GBPS transceivers, you need the paid version of Lattice Diamond.  This is useful for true 1080p60/4k30p HDMI out on 8 IOs.  Not everyone has this version of Diamond.  Unless Lattice changed this policy, the best you can officially get is 800 MBPS on a DQ bus on the cheaper FPGAs.  (I think in the 480 & above pin count, you may have cross-compatible parts with and without GTPS transceivers.)

Forget about transceivers. The version I'm looking at is ECP5U, which doesn't have any, and it's the only version of ECP5 which can be used with the free license.

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #345 on: December 21, 2022, 02:37:23 am »
Quote
Give me a day to get back to you.  I already did some research a few months back.  It seems it is best for me to use Lattice's free DDR3 PHY in place of my 'Altera DDR IO Port' .sv module, so copying an example schematic from one of their dev boards for wiring the DDR3 itself is a good starting place for research to maintain hardware backwards compatibility with Lattice's full paid version of their DDR3 controller.

Quote
Are you sure they provide a free PHY for DDR3? I can see it in their IP list, but so is the "full" DDR3 controller, and the description doesn't say anything about licensing.

Ok, their PHY is nothing more than a pre-configured IO buffer DDR port for the command and DQ lines.

It has no intelligence and just calls and configures the physical buffers, see here:
https://www.latticesemi.com/products/designsoftwareandip/intellectualproperty/ipcore/ipcores02/ddr3phy

This is Lattice's full DDR3 controller: https://www.latticesemi.com/products/designsoftwareandip/intellectualproperty/ipcore/ipcores01/ddr3sdramcontroller

As you can see, the full controller also uses the top 'PHY' to drive the IO buffer pins.

I know the full controller is paid, but what about that PHY?

If not, I basically need to manually instantiate the IO pin's DDR primitives like what I have done in my 'BrianHG_DDR3_IO_PORT_ALTERA.sv' source file.  Lattice seems to have similar PLL capabilities to what 'SpacedCowboy' was dealing with on Gowin, except I am forced to use a 4:1 and 1:4 serializer to achieve the full 800mbps instead of Altera's 2:1 and 1:2.  This isn't horrible, as I already have a second layer 1:2 / 2:1 after my Altera IO port's 2:1 / 1:2, but I will need to remove some code in my main PHY section.

Properly instantiating the DQ bus against the DQS lines instead of using a tuned PLL, and using the IO buffer's delay lines for the write 90deg phase, will also save a second PLL from being used.  I must use the PLL's main clock output on any IO buffer to achieve the 800mbps, unlike with Altera, where any PLL clock can run the IO buffer at full speed and where I also get 6 tunable outputs instead of 1 tunable output plus 3 additional integer-divided outputs per Lattice PLL.

2x 16bit DDR3 will still be plenty of speed for a 720p 3D accelerator, or 1080p @ 16bit color.
1x 16bit DDR3 will give us the same as the Arrow DECA board, double the necessary bandwidth for 1080p60 at 32bit.

The 'LFE5U-45' and 'LFE5U-85' should both have a variant with pin-pin compatibility.
There should be enough in the 85k chip to include a 1080p MJPEG2000 player with 2x 16bit DDR3 chips, just barely enough with the 45k chip.

I will look tonight at the documentation about the Lattice's QDR IO buffer (4:1 / 1:4) and see how they handle the write delay and DQS read latching.  I'll also read up on their PLL capabilities.  Limiting myself to 1 PLL just means my controller will also be meaningful for Lattice's 25k / 12k fpga.  (only 2 PLLs) (Note that their 12k is actually the same die as their 25k.)
« Last Edit: December 21, 2022, 02:42:00 am by BrianHG »
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #346 on: December 21, 2022, 03:39:58 am »
I haven't completely given up on the Gowin implementation, btw. I just had something come along that's taking all my time. I *really* need to retire soon...
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2728
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #347 on: December 21, 2022, 04:04:57 am »
Quote
Ok, their PHY is nothing more than a pre-configured IO buffer DDR port for the command and DQ lines.

It looks like they do provide a block called "ddr_mem" in Clarity, which is free and can be used to create a PHY layer for the memory controller. It's kind of described in chapter 7 of "ECP5 and ECP5-5G High-Speed I/O Interface" document, but only in very general terms. So I suspect one would require quite a bit of simulations to figure out how it works and how to interface with it.

Quote
2x 16bit DDR3 will still be plenty of speed for a 720p 3D accelerator, or 1080p @ 16bit color.
1x 16bit DDR3 will give us the same as the Arrow DECA board, double the necessary bandwidth for 1080p60 at 32bit.

16bit DDR3 running at 400MHz should be enough for double-buffering of 1080p@60 (meaning reading one frame and writing another one in parallel) with some margin to spare. So a 32 bit interface would have a huge margin. Or we can instead go for a 4 x 8 bit interface, which would allow higher memory capacity (4 x 4Gb = 2 GB vs 2 x 4Gb = 1 GB, or 4 GB vs 2 GB for the largest 8Gb devices). Since DDR3 uses fly-by routing, it shouldn't be much harder to route four 8 bit memory devices as opposed to two 16 bit ones.
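
The margin and capacity figures above can be sanity-checked with a short Python sketch. The helper names are made up, and the pixel format is an assumption (4 bytes per pixel, active video only); it compares against the peak DDR3 rate and ignores refresh, row changes and bus turnaround, so the real margin is smaller.

```python
def ddr3_peak_mb_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak MB/s of a DDR bus: two transfers per clock, bus_width_bits each."""
    return clock_mhz * 2 * bus_width_bits / 8


def video_stream_mb_s(width: int, height: int, fps: int, bytes_per_pixel: int) -> float:
    """MB/s for one stream of active video (blanking ignored)."""
    return width * height * fps * bytes_per_pixel / 1e6


def total_capacity_gbytes(chips: int, gbit_per_chip: int) -> float:
    """Total capacity in GB for a number of identical DDR3 devices."""
    return chips * gbit_per_chip / 8


peak = ddr3_peak_mb_s(16, 400)                        # 16bit DDR3 at 400MHz
# Assumption: 32bit pixels; one frame read plus one frame written in parallel.
double_buffered = 2 * video_stream_mb_s(1920, 1080, 60, 4)

print(f"peak DDR3 bandwidth:     {peak:.0f} MB/s")
print(f"1080p60 read + write:    {double_buffered:.1f} MB/s")
print(f"share of peak used:      {double_buffered / peak:.0%}")
print(f"4 x 4Gb = {total_capacity_gbytes(4, 4):.0f} GB, 2 x 4Gb = {total_capacity_gbytes(2, 4):.0f} GB")
```

With 16bit or 24bit pixels the video load drops further, so the 32bit pixel case is the conservative one here.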

As for HDMI out, it would make sense to add something like TFP410 to implement a 1080p@60 output without having to mess around with LVDS, or, since these FPGA seem to natively support MIPI, we can simply implement a DSI out port.

Quote
The 'LFE5U-45' and 'LFE5U-85' should both have a variant with pin-pin compatibility.

I'm not so sure about that one, as both the 381 and 554 packages have different numbers of IO pins for the 45F and 85F devices as per the datasheet, and the 756 package only exists for the largest device (85F).
« Last Edit: December 21, 2022, 04:08:17 am by asmi »
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #348 on: December 21, 2022, 04:45:05 am »
Quote
The 'LFE5U-45' and 'LFE5U-85' should both have a variant with pin-pin compatibility.

Quote
I'm not so sure about that one, as both the 381 and 554 packages have different numbers of IO pins for the 45F and 85F devices as per the datasheet, and the 756 package only exists for the largest device (85F).
In the 381, 2 IOs are missing on the 45F version, which I believe are VCC / GND or NC.  Just double check.  With Altera, I would usually just wire the appropriate power pins to power and tristate the IOs that sit on that hardwired power on the larger FPGA.

DDR rates on their data sheet V3.0 on page 66 & 67:
https://www.latticesemi.com/-/media/LatticeSemi/Documents/DataSheets/ECP5/FPGA-DS-02012-3-0-ECP5-ECP5G-Family-Data-Sheet.ashx?document_id=50461

Only the Left and Right banks support the full 800mb/s.

Isn't the TFP410 DVI only, i.e., no HDMI sound?  You might as well also add an HDMI/DVI cable driver, NXP's PTN3366BSMP, wired to an 800mb/s DQ8 group, as an option if the TFP410 is not fitted.  Though 720p, 1280x960, or 1080p30 will be the max, you have the option of HDMI audio.
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #349 on: December 22, 2022, 03:41:54 am »
@asmi, I'm sorry, but I cannot guarantee I will have enough free time over the next 3 months to do work on adapting my DDR3 controller to Lattice.  It's a shame that we didn't do this back in September.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2728
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #350 on: December 22, 2022, 05:29:08 pm »
Quote
@asmi, I'm sorry, but I cannot guarantee I will have enough free time over the next 3 months to do work on adapting my DDR3 controller to Lattice.  It's a shame that we didn't do this back in September.

No worries, that's OK, I understand, we all have a life to live.

Offline gvedaraj

  • Newbie
  • Posts: 2
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #351 on: January 21, 2023, 11:29:37 pm »
Hi Brian,

Just a quick question regarding the DDR3 controller. The README page says that it is designed for Cyclone 3/4/5 boards. But I see that Cyclone 3/4 support only up to DDR2.
Does that mean that the controller works for both DDR2/DDR and DDR3 memories?


Thanks & Regards
Ganesha
 

Online BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7638
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #352 on: January 22, 2023, 09:28:56 am »
     No, it means wire DDR3s to a Cyclone III or Cyclone IV and it will work.

     I use a trick with Altera's PLL to generate alternate read phase clock and write phase clock which are self tuned during power-up to operate the ram.  I measure and use the DQS as a clock enable instead of as a clocking port for the DQ read data.  This permits me to control 1 or 2 16bit DDR3s on the older cyclones.  Or 4x 8bit ones.

     Use the 1.5v HSTL CLASS I IOs and DDR3 IO voltage.  See the photos and instructions for my pin-planner pinouts on my GitHub, and review my hypothetical Cyclone III/IV examples, which compile and meet timing requirements, to see how to set it up.

Example Cyclone III setup using 1x 16bit DDR3: https://github.com/BrianHGinc/BrianHG-DDR3-Controller/tree/main/BrianHG_DDR3_CIII_GFX_TEST_v16_1_LAYER_Q13.0sp1

Example Cyclone IV setup using 1x 16bit DDR3: https://github.com/BrianHGinc/BrianHG-DDR3-Controller/tree/main/BrianHG_DDR3_CIV_GFX_TEST_v16_1_LAYER

Pin planner read-me:  https://github.com/BrianHGinc/BrianHG-DDR3-Controller/tree/main/Screenshots_Pin_Planner
« Last Edit: January 22, 2023, 09:37:58 am by BrianHG »
 

Offline gvedaraj

  • Newbie
  • Posts: 2
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #353 on: January 27, 2023, 03:07:56 pm »
Thanks a lot for sharing these resources!!
 

