Author Topic: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.  (Read 85814 times)

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
*****************************************************
*** NEW June 11, 2022.  BrianHG_DDR3_Controller V1.6 ***
*****************************************************
----------------------------------------------------------------
 :scared: A maddening over fnK lines of code....  :scared:
----------------------------------------------------------------
My Github has the full latest v1.6 release package:
https://github.com/BrianHGinc/BrianHG-DDR3-Controller
https://github.com/BrianHGinc/BrianHG-DDR3-Controller/archive/refs/tags/v1.60.zip
-----------------------------------------------------
BrianHG_DDR3_Controller V1.6 Release, June 11, 2022.
Includes new BrianHG_GFX_VGA_Window_System.
-----------------------------------------------------
(Control ports are the same as v1.5)


Folder BrianHG_DDR3 now contains the new v1.6 controller.
Main source files:

Code: [Select]
- BrianHG_DDR3_v15_and_v16_Block_Diagram.png -> Illustration of module connections.

 - Includes the following sub-modules :
   - BrianHG_DDR3_CONTROLLER_v16_top.sv     -> v1.6 TOP entry to the complete project which wires the DDR3_COMMANDER_v16 to the DDR3_PHY_SEQ giving you access to all the read/write ports + access to the DDR3 IO pins.
   - BrianHG_DDR3_COMMANDER_v16.sv          -> v1.6 High FMAX speed multi-port read and write requests and cache, commands the BrianHG_DDR3_PHY_SEQ.sv sequencer.
   - BrianHG_DDR3_CMD_SEQUENCER_v16.sv      -> v1.6 Takes in the read and write requests, generates a stream of DDR3 commands to execute the read and writes.
   - BrianHG_DDR3_PHY_SEQ_v16.sv            -> v1.6 DDR3 PHY sequencer.          (If you want just a compact DDR3 controller, skip the DDR3_CONTROLLER_top & DDR3_COMMANDER and just use this module alone.)
   - BrianHG_DDR3_PLL.sv                    -> Generates the system clocks. (*** Currently Altera/Intel only ***)
   - BrianHG_DDR3_GEN_tCK.sv                -> Generates all the tCK count clock cycles for the DDR3_PHY_SEQ so that the DDR3 clock cycle requirements are met.
   - BrianHG_DDR3_FIFOs.sv                  -> Serial shifting logic FIFOs.

 - Includes the following test-benches :
   - BrianHG_DDR3_CONTROLLER_v16_top_tb.sv  -> Test the entire 'BrianHG_DDR3_CONTROLLER_v16_top.sv' system with Micron's DDR3 Verilog model.
   - BrianHG_DDR3_COMMANDER_v16_tb.sv       -> Test just the commander_v16.  The 'DDR3_PHY_SEQ' is dummy simulated.  (*** This one will simulate on any vendor's ModelSim ***)
   - BrianHG_DDR3_CMD_SEQUENCER_v16_tb.sv   -> Test just the DDR3 command sequencer.                                 (*** This one will simulate on any vendor's ModelSim ***)
   - BrianHG_DDR3_PHY_SEQ_v16_tb.sv         -> Test just the DDR3 PHY sequencer with Micron's DDR3 Verilog model providing logged DDR3 command results with any access violations listed.
   - BrianHG_DDR3_PLL_tb.sv                 -> Test just the PLL module.

 - IO port vendor specific modules :
   - BrianHG_DDR3_IO_PORT_ALTERA.sv         -> Physical DDR IO pin driver specifically for Altera/Intel Cyclone III/IV/V and MAX10.

 - Modelsim 'do' script files.
   - All setup_xxx.do files setup their associated Modelsim simulation.
   - All run_xxx.do   files quick re-compile and run their associated Modelsim simulation.


Folder 'BrianHG_DDR3_GFX_source_v16' contains my new BrianHG_GFX_VGA_Window_System multi-window system.
Main source files:

 - BrianHG_GFX_VGA_Window_System.pdf          -> Visual block diagram for the graphics system and layer-swapping illustration.
 - BrianHG_GFX_VGA_Window_System.txt          -> Full documentation for the VGA window system.

 - Includes these top hierarchy files:
   - BrianHG_GFX_VGA_Window_System.sv           -> Full window system where you drive the CMD_win_xxx controls via input ports.
   - BrianHG_GFX_VGA_Window_System_DDR3_REGS.sv -> Full window system where you drive the CMD_win_xxx controls via writing to DDR3 memory addresses through any multiport.

 - Modelsim 'do' script files.
   - All setup_xxx.do files setup their associated Modelsim simulation.
   - All run_xxx.do   files quick re-compile and run their associated Modelsim simulation.


New Arrow DECA board demo complete projects running the v1.6 BrianHG_DDR3_Controller connected to
the BrianHG_GFX_VGA_Window_System, all at 400MHz, all 100% timing requirements met.
Source folders:

 - BrianHG_DDR3_DECA_GFX_DEMO_v16_1_LAYER     -> Replaces the original ellipse demo, but now uses my new BrianHG_GFX_VGA_Window_System.
 - BrianHG_DDR3_DECA_GFX_DEMO_v16_2_LAYERS    -> Improved ellipse demo using 2 translucent windows scrolling at different speeds.
 - BrianHG_DDR3_DECA_GFX_HWREGS_v16_16_LAYERS -> Example 16 window layer system where writes to the DDR3 controls the window's regs.
 - BrianHG_DDR3_DECA_RS232_DEBUG_TEST_v16     -> Single port DDR3 controller example connected to my RS232 debugger.
 - BrianHG_DDR3_DECA_PHY_SEQ_only_v16         -> (No multiport controller.) Bare minimum DDR3 PHY_SEQ controller connected to my RS232 debugger.

Test hypothetical builds for Cyclone III,IV,V to see if we can meet FMAX.
 - BrianHG_DDR3_CIII_GFX_TEST_v16_1_LAYER_Q13.0sp1  -> Cyclone III example using Quartus 13.0 sp1.
 - BrianHG_DDR3_CIV_GFX_TEST_v16_1_LAYER            -> Cyclone IV example using Quartus 20.1.
 - BrianHG_DDR3_CV_GFX_TEST_v16_1_LAYER_350MHz      -> Cyclone V example running only at 350MHz using Quartus 20.1.


- Get new 2 window layer ellipse demo here:
https://www.eevblog.com/forum/fpga/brianhg_ddr3_controller-open-source-ddr3-controller/msg4230856/#msg4230856

- Get new VGA video system demo configured for up to 16 window layers driven by my RS232_Debugger here:
https://www.eevblog.com/forum/fpga/brianhg_ddr3_controller-open-source-ddr3-controller/msg4233016/#msg4233016


- User 'davemuscle' created an Avalon interface wrapper comparing my DDR3 PHY_SEQ_only free controller to Altera's expensive UniPHY IP, see here:
https://www.eevblog.com/forum/fpga/brianhg_ddr3_controller-open-source-ddr3-controller/msg4108963/#msg4108963
(Note that both controllers should have achieved double the bandwidth and my 'BrianHG_DDR3_DECA_GFX_DEMO_v16_2_LAYERS' demo already runs at double the efficiency.)

- User 'Nockieboy' has been working on integrating my controller with his Z80 8-bit GPU project, see his first multi-layer-window test here:
https://www.eevblog.com/forum/fpga/fpga-vga-controller-for-8-bit-computer/msg3980567/#msg3980567
and tile mode test:
https://www.eevblog.com/forum/fpga/fpga-vga-controller-for-8-bit-computer/msg4029019/#msg4029019



Check here for older compiled FMAX & LC/LUT usage stats:
https://www.eevblog.com/forum/fpga/brianhg_ddr3_controller-open-source-ddr3-controller/msg3649318/#msg3649318
« Last Edit: June 12, 2022, 08:29:43 pm by BrianHG »
 
The following users thanked this post: Ed.Kloonk, tom66, Jope, Berni, agehall, Omega Glory, Emo, nockieboy, asmi, dmendesf, bgm370, Ted/KC9LKE

Offline dmendesf

  • Frequent Contributor
  • **
  • Posts: 324
  • Country: br
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #1 on: July 14, 2021, 02:17:12 am »
Nice work. Is there a way to use it with VHDL under Quartus?
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #2 on: July 14, 2021, 02:29:50 am »
Though I am not familiar with VHDL, I do know that in either Verilog or VHDL, you can instantiate a module written in the other language.

Just google: How do I instantiate a SystemVerilog module inside a VHDL design?

There have got to be good examples.  You just need to find out how to pass 2 dimensional arrays if you will be using my multi-port module, unless you make a simple smaller Verilog module with only the ports and settings you want, shrinking what you call in your VHDL code.

If crossing code is vendor specific, maybe the best place to ask would be on Intel's forum.
I know Intel has instructions on how to insert VHDL code/modules into verilog source code.
« Last Edit: July 14, 2021, 02:38:19 am by BrianHG »
 

Offline ali_asadzadeh

  • Super Contributor
  • ***
  • Posts: 1910
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #3 on: July 14, 2021, 06:46:41 am »
Thanks for sharing, Please make a repo on github too :-+ :-+ :-+
ASiDesigner, Stands for Application specific intelligent devices
I'm a Digital Expert from 8-bits to 64-bits
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #4 on: July 14, 2021, 10:40:06 am »
Thanks for sharing, Please make a repo on github too :-+ :-+ :-+
Coming in a few days once I fix the FMAX bottleneck.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14698
  • Country: fr
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #5 on: July 14, 2021, 04:58:12 pm »
Nice work. Is there a way to use it with VHDL under Quartus?

I don't use Quartus, but otherwise, the answer is usually, yes. Brian's controller is written in SystemVerilog, so first you need to check whether this is properly supported by the Quartus version you're using. If it's recent enough, I guess it is.

Mixing HDLs is usually no problem. You'll just need to define a component interface for the controller in VHDL following the SV interface.
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #6 on: July 14, 2021, 05:09:55 pm »
SystemVerilog, so first you need to check whether this is properly supported by the Quartus version you're using. If it's recent enough, I guess it is.
My code should work all the way back to QuartusII V9.0 from 2005...
I did design it to run on Cyclone II & III which requires QII V13.x or earlier.
 
The following users thanked this post: SiliconWizard

Offline dmendesf

  • Frequent Contributor
  • **
  • Posts: 324
  • Country: br
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #7 on: July 14, 2021, 07:25:50 pm »
I just bought a Deca Max 10 (arrived today, my birthday... Perfect timing :) and plan to use this code with it, but with VHDL. I'll probably use the latest Quartus unless something requires an older version.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14698
  • Country: fr
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #8 on: July 14, 2021, 08:45:23 pm »
Question to Brian: you were talking about testing this controller on a Lattice ECP5 IIRC. Did you get to do this? If so, how did that turn out, and how many LUTs does it take?
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #9 on: July 14, 2021, 09:38:58 pm »
Question to Brian: you were talking about testing this controller on a Lattice ECP5 IIRC. Did you get to do this? If so, how did that turn out, and how many LUTs does it take?
I just need to find an ECP5 board with at least 1 DDR3 ram chip.

The code was designed to be ported to old and new FPGAs alike.  However, the read data capture method I used, chosen to be compatible across all basic FPGAs, limits the DDR3 controller to around 500MHz / 1GTPS.  Higher than that and it would be recommended to change the read data sampling to use the DQS strobe input as a clock instead of as a latch-enable.

The DDR3 controller alone, 1 read, 1 write port, running a 16bit DDR3 512mb ram chip in Quartus uses:
3480 logic cells in the HDMI out ellipse demo.
512 LUT-Only LCs,
1806 Registers-Only LCs
1166 LUT/Registers LCs

The 3480 number may be inflated since it is connected to the Multiport module, and that one eats a crap-load of registers as it has independent caches on each port and it's a huge cross-bar matrix.
In the Ellipse demo, it eats another ~3k logic cells.
« Last Edit: July 14, 2021, 10:46:11 pm by BrianHG »
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14698
  • Country: fr
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #10 on: July 15, 2021, 01:08:49 am »
Question to Brian: you were talking about testing this controller on a Lattice ECP5 IIRC. Did you get to do this? If so, how did that turn out, and how many LUTs does it take?
I just need to find an ECP5 board with at least 1 DDR3 ram chip.

Ah yes... I don't have one either. The one I have only has SDRAM. And right now, prices have inflated quite a bit, so ECP5 boards with DDR3 are not quite cheap...

The DDR3 controller alone, 1 read, 1 write port, running a 16bit DDR3 512mb ram chip in Quartus uses:
3480 logic cells in the HDMI out ellipse demo.
512 LUT-Only LCs,
1806 Registers-Only LCs
1166 LUT/Registers LCs

The 3480 number may be inflated since it is connected to the Multiport module, and that one eats a crap-load of registers as it has independent caches on each port and it's a huge cross-bar matrix.
In the Ellipse demo, it eats another ~3k logic cells.

Ok, should give me a rough idea of what to expect. Doesn't look too bad.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2738
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #11 on: July 15, 2021, 03:56:16 am »
Ah yes... I don't have one either. The one I have only has SDRAM. And right now, prices have inflated quite a bit, so ECP5 boards with DDR3 are not quite cheap...
Design your own? ;D It won't be cheap either - at least initially - but it will surely be a lot of fun :-+ And if you team up with others, it will help to spread the NRE around, as well as speed up the process.

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #12 on: July 15, 2021, 09:42:04 pm »
Ah yes... I don't have one either. The one I have only has SDRAM. And right now, prices have inflated quite a bit, so ECP5 boards with DDR3 are not quite cheap...
Design your own? ;D It won't be cheap either - at least initially - but it will surely be a lot of fun :-+ And if you team up with others, it will help to spread the NRE around, as well as speed up the process.
I know how this may sound, but from my personal point of view, I would like to first get my controller working on Lattice, then worry about a custom PCB.  For me, having a PCB with a proven functional wired DDR3 setup allowing me to plug and play is a preferred first step.  With the DECA board, this was one thing I did not have to second guess any buggy behavior in my code right at the beginning just powering up the DDR3.  The initial failure of function was that I was using the DDR_IO primitive for Cyclone II/III/IV/V, not the newer primitive used by the MAX10.  Quartus did not complain.  It compiled and even simulated properly both at the logic level and gate level.  Yet, the DDR3 was doing nothing.  Using the Cyclone's DDR_IO primitive meant that nothing was outputting on the data lines in the real MAX10 FPGA, but, the input was still working.  This wasted over a week and if I encountered this problem on my home-made PCB, it might have taken me an extra month to figure out that the DDR_IO primitive which compiled and simulated fine was the culprit.

Lattice tools and FPGAs are new to me, so I do not know what would go wrong where.  An existing eval PCB is a preferred first step removing a piece in the debug equation.
« Last Edit: July 15, 2021, 09:45:39 pm by BrianHG »
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2738
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #13 on: July 15, 2021, 11:24:01 pm »
You can always use a trial version of their DDR3 controller to verify the hardware. I would also use it to confirm that the pinout will work.
« Last Edit: July 15, 2021, 11:27:14 pm by asmi »
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14698
  • Country: fr
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #14 on: July 16, 2021, 12:38:51 am »
I tend to agree with BrianHG here. One thing to debug at a time...
Now I suppose if you're familiar with routing DDR3 stuff, going directly for a custom board should not be a problem. But I'm not. (Now I guess possibly the design could be shared, and someone else could do the routing...)
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2738
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #15 on: July 16, 2021, 02:46:06 am »
I tend to agree with BrianHG here. One thing to debug at a time...
Like I said, using a trial version of controller solves the hardware checkout problem. I use this method all the time, albeit with Xilinx devices.

Now I suppose if you're familiar with routing DDR3 stuff, going directly for a custom board should not be a problem. But I'm not. (Now I guess possibly the design could be shared, and someone else could do the routing...)
Or you can do the routing yourself and ask someone to check it once completed, and/or perhaps ask some questions in case DDR3 layout rules are not sufficiently clear. Otherwise it's going to be a classic chicken-and-egg problem when you won't attempt a DDR3 design because you have no experience, but you can't gain experience without actually doing it.

There is nothing particularly difficult about it, especially if you go for a relatively simple design - like a single DDR3 memory device, no ADDR/CTRL termination, and low'ish (as far as DDR3 standard goes) frequency. It's just a handful of rules you've got to follow, and that's pretty much it.

Offline dmendesf

  • Frequent Contributor
  • **
  • Posts: 324
  • Country: br
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #16 on: July 17, 2021, 03:31:01 am »
Asmi, I feel exactly that about using DDR 3. Do you have a list of layout rules and a description of how DDR3 works? I surely understand SRAM memories and more or less get dynamic memories, but I have no idea what other complications exist for synchronous memories.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2738
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #17 on: July 17, 2021, 05:19:23 am »
Asmi, I feel exactly that about using DDR 3. Do you have a list of layout rules and a description of how DDR3 works? I surely understand SRAM memories and more or less get dynamic memories, but I have no idea what other complications exist for synchronous memories.
DDR3 memory interface consists of a bunch of traces, which are divided into several groups: a single group typically called "Address/control" (or "Command/address/control") which, as the name suggests, consists of address lines (A0-A15, depending on module's capacity), bank address lines (BA0-BA2), and command lines (CKE, RAS, CAS, WE, CS, ODT); and then one or more of "byte lanes" (also called "DQ group"), each one consisting of 8 DQ (data) lines, associated DQS and DQS# lines, and a data mask DM. Attached is a good image from iMX6 datasheet showing example rules for length matching in case of a single x16 DDR3 memory device.

0. Remember that the length matching is about signal length (sometimes also called electrical length) a.k.a. propagation delay, not necessarily physical length! This is important to keep in mind in case you route traces within a group on different layers - signals on outer layers travel faster than on internal ones. Also remember that a portion of via height that the signal is going along also needs to be included - it's typically called via z-length. Not all eCAD tools take this into account, so you have to be on a lookout for these things.
1. The clock line needs to be at least as long as address/control lines are. I typically match it as part of the group, but it can be a bit longer.
2. Traces within address/control lines need to be matched to ±10 ps.
3. All byte lanes have to be no longer than address/command traces.
4. All signals within a single byte lane need to be matched to ±10ps.
5. There is no requirement to match traces of different byte lanes.
6. Traces within a differential pair (CK/CK#, DQSn/DQSn#) need to be matched to ±2 ps.
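
The rules above lend themselves to a quick scripted check on delays exported from a layout tool. Below is a minimal Python sketch, not tied to any particular eCAD package. The trace names and picosecond values are made up for illustration, the ±10 ps and ±2 ps figures are read strictly (longest minus shortest trace must not exceed the tolerance), and rule 3 is read conservatively as "no byte-lane trace longer than the shortest address/control trace". Adjust those readings to whatever your controller's layout guide actually states.

```python
# Hypothetical propagation delays in picoseconds; not taken from a real board.
ADDR_CTRL = {"A0": 412.0, "A1": 418.5, "BA0": 411.0, "RAS#": 415.0, "CAS#": 420.0}
CLOCK = {"CK": 421.0, "CK#": 422.5}
BYTE_LANES = {
    0: {"DQ0": 300.0, "DQ1": 305.0, "DM0": 298.0, "DQS0": 303.0, "DQS0#": 304.0},
    1: {"DQ8": 350.0, "DQ9": 358.0, "DM1": 355.0, "DQS1": 356.0, "DQS1#": 357.0},
}


def spread(delays):
    """Difference between the slowest and fastest trace in a group."""
    values = list(delays)
    return max(values) - min(values)


def check_layout(addr_ctrl, clock, byte_lanes, group_tol_ps=10.0, pair_tol_ps=2.0):
    """Return a list of rule violations (an empty list means all checks passed)."""
    problems = []
    # Rule 1: clock at least as long as every address/control trace.
    if min(clock.values()) < max(addr_ctrl.values()):
        problems.append("clock is shorter than an address/control trace")
    # Rule 2: address/control traces matched within the group tolerance.
    if spread(addr_ctrl.values()) > group_tol_ps:
        problems.append("address/control spread exceeds tolerance")
    # Rule 6 (clock pair): CK and CK# matched within the pair tolerance.
    if spread(clock.values()) > pair_tol_ps:
        problems.append("CK/CK# mismatch exceeds tolerance")
    for lane, traces in byte_lanes.items():
        # Rule 3: byte lane no longer than the address/control traces.
        if max(traces.values()) > min(addr_ctrl.values()):
            problems.append(f"byte lane {lane} is longer than address/control")
        # Rule 4: all signals inside one byte lane matched.
        if spread(traces.values()) > group_tol_ps:
            problems.append(f"byte lane {lane} spread exceeds tolerance")
        # Rule 6 (strobe pair): DQSn and DQSn# matched within the pair tolerance.
        dqs = [d for name, d in traces.items() if name.startswith("DQS")]
        if len(dqs) == 2 and spread(dqs) > pair_tol_ps:
            problems.append(f"byte lane {lane} DQS pair mismatch exceeds tolerance")
    # Rule 5: nothing to check between different byte lanes.
    return problems


print(check_layout(ADDR_CTRL, CLOCK, BYTE_LANES) or "all checks passed")
```

Feed it delays that already include via z-length and per-layer velocity (rule 0), since the script only compares the numbers it is given.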

As far as impedance goes, depending on a frequency and a specific controller it can be 50 Ohm or as low as 40 Ohm (the latter is a Xilinx requirement for 7 series FPGAs for DDR3 frequencies above 666 MHz, 50 Ohm is good enough for speeds below that).

If you intend to use several memory devices in your interface, things get more complicated because with DDR3 there are two possible topologies for address/control lines - a balanced tree (like DDR2 and below), and a fly-by (new for DDR3). Technically all DDR3 controllers are supposed to support fly-by topology (which is easier to route), but in reality there are some which don't support it.

Finally, it might be required to implement a termination for address/control lines (DQ, DQS and DM lines don't need one because memory devices have dynamic on-die termination controlled by ODT input). Whether it's required or not can be determined by SI simulations, but typically you can get away without it for a single component which is close to the controller. For multi-chip interfaces you will more likely than not need to implement it.
« Last Edit: September 13, 2021, 09:01:03 am by asmi »
 
The following users thanked this post: dmendesf

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #18 on: July 17, 2021, 02:48:29 pm »
FMAX bottleneck update in the DDR3_PHY_SEQUENCER:  Currently, I'm running individual serial shift timers which are tested before allowing a command request from the 'BrianHG_DDR3_CMD_SEQUENCER', which runs at half DDR_CK clock frequency, to go out.  There are 6 timers for each of the individual 6 main DDR3 commands which are selectively reset to the new appropriate values when each new command is sent.  This design has allowed me to request any command at any time and the DDR3 would be controlled as quick as possible.  However, these timers have created a 'nexus' lump where an acknowledgement flag dependent on which command went out needs to get back to the BrianHG_DDR3_CMD_SEQUENCER's cmd out fifo as fast as possible which hits the FMAX as the compiler tries to work out the routing across 2 clock domains.

Currently, 1 test I am doing is to run the timers at half clock frequency on the same clock as the BrianHG_DDR3_CMD_SEQUENCER and pass a 'half-time' delay bit to the DDR3_CK clock command output stage.  This has erased the FMAX problem, and even allowed compiles where I get a legit 400MHz FMAX.  However, a good number of times, where we have delays which only require an ODD number of DDR3_CK clocks, say 5, have averaged up to 6.  This is not good as when using a 300MHz controller, every clock is precious.  I'm currently trying to debug and eliminate this 1 odd clock cycle penalty.  Once done, a beta version 0.95 will be uploaded where 350MHz should be easily achievable & 400MHz with compiler effort turned up to max and careful interfacing with my controller.
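
The odd-clock penalty described above is plain rounding arithmetic, which the following Python sketch illustrates. It is only a model of the idea, not the controller's SystemVerilog: both function names are made up, and the second function is just one reading of the 'half-time' delay bit, where the half-rate timer counts whole pairs of DDR_CK clocks and a single extra bit tells the full-rate output stage to wait one more DDR_CK.

```python
import math


def rounded_delay_ck(delay_ck):
    """Effective delay when the timer can only count in half-rate ticks.

    One half-rate tick spans 2 DDR_CK clocks, so odd delays round up.
    """
    return math.ceil(delay_ck / 2) * 2


def split_with_half_bit(delay_ck):
    """Split a delay into whole half-rate ticks plus a 1-DDR_CK 'half-time' bit."""
    return delay_ck // 2, delay_ck % 2


for delay in (4, 5, 6, 7):
    ticks, half_bit = split_with_half_bit(delay)
    restored = ticks * 2 + half_bit  # delay seen at the full-rate output stage
    print(f"requested {delay} CK: rounded timer gives {rounded_delay_ck(delay)} CK, "
          f"ticks={ticks} half_bit={half_bit} gives {restored} CK")
```

With the half bit carried along, a requested 5 DDR_CK delay stays at 5 instead of growing to 6.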

It is too bad the original code had this 1 problem as the solution was sweet without any coding hacks, though it was problematic to get Quartus to properly hit a >300MHz core with tc of 85 degrees.  Unless an idea spark comes in on how to solve the 'cmd_ack' bottleneck with the current design, I will once again have to muddy clean code for work-around solutions.
« Last Edit: July 17, 2021, 02:54:14 pm by BrianHG »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #19 on: August 05, 2021, 02:05:06 am »
New release:
**********************************
Beta Release V0.95, August 4, 2021.
**********************************

Has now been uploaded at the top of this thread:
https://www.eevblog.com/forum/fpga/brianhg_ddr3_controller-open-source-ddr3-controller/
Changes are written there and the history is in the README file.
Please make sure you are downloading the _V0.95 as I kept the earlier revisions available for download as well.


Utilization report DDR3 controller inside the HDMI out ellipse demo:

Old V0.9.
3480  logic cells in the HDMI out ellipse demo,
512    LUT-Only LCs,
1806  Registers-Only LCs,
1166  LUT/Registers LCs.

New V0.95.
3100  logic cells in the HDMI out ellipse demo,
478    LUT-Only LCs,
1826  Registers-Only LCs,
796    LUT/Registers LCs.

New V0.95 True Stand-alone DDR3_PHY_SEQ.sv DDR3 controller with 128bit read and write port.
2082  logic cells,
515    LUT-Only LCs,
984    Registers-Only LCs,
583    LUT/Registers LCs.
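
For anyone who wants the change between releases as percentages, here is a small Python sketch that recomputes them from the figures listed above. The dictionary keys are just labels chosen for this example.

```python
# Logic cell figures copied from the utilization reports above.
REPORTS = {
    "V0.9 in ellipse demo": {"total": 3480, "lut_only": 512, "reg_only": 1806, "lut_reg": 1166},
    "V0.95 in ellipse demo": {"total": 3100, "lut_only": 478, "reg_only": 1826, "lut_reg": 796},
    "V0.95 stand-alone PHY_SEQ": {"total": 2082, "lut_only": 515, "reg_only": 984, "lut_reg": 583},
}


def percent_change(old, new):
    """Signed change from old to new, in percent of old."""
    return (new - old) / old * 100.0


old = REPORTS["V0.9 in ellipse demo"]
new = REPORTS["V0.95 in ellipse demo"]
for key in ("total", "lut_only", "reg_only", "lut_reg"):
    print(f"{key:9s} {old[key]:5d} -> {new[key]:5d}  {percent_change(old[key], new[key]):+.1f}%")
```

The total drops by roughly 11%, with most of the saving coming from the combined LUT/Registers cells.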
« Last Edit: August 05, 2021, 02:32:30 am by BrianHG »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #20 on: August 06, 2021, 11:03:19 pm »
A minor issue has been found with 'DDR3_PHY_SEQ.sv' V0.95 when it swaps ram banks.  It just occasionally delays the next read or write command by 2 DDR_CK clocks.  This is minor as there is no data corruption and you may revert to the V0.90 if you truly need to.

I'm working on it now with a major improvement in FMAX and again a shrinkage of used logic cells.  It should be out in a day or 2, so, I won't release an intermediate patch.
 
The following users thanked this post: nockieboy

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #21 on: August 14, 2021, 07:40:35 am »
Update:  Just cleaned up the main BrianHG_DDR3_PHY_SEQ.sv controller and sequencer.

All FMAX limitations and cross clock domain problems have been cleaned out.  Easily achieved proper 350MHz controller and 400MHz with only a few cross-clock domain signals not making the cut @ TC85 degrees by less than 0.070ns, yet it runs clean.  (The signals being some of the OEs for the write data, however, my code is programmed to turn on the OE 1 clock cycle early, and turn it off 1 clock cycle late meaning an error here will not occur.  The core and all data + IO paths clear the timing analysis fine above 400MHz.)  The logic cell and LUT count has also shrunk.  The change in code has generated what appears as a minor occasional 1 DDR_CK clock delay when bursting after the first BL8, however, what is going on is if a read or write burst begins on an odd DDR_CK phase compared to its half-clock interface clock, or the tRCD, tCAS happens to be an odd number, an alignment to the phase of the burst size of 8 may realign to the even matched phase after the first BL8 burst.  The old full speed V0.9 controller stacked a few additional commands in advance and would retain the 'ODD' alignments generating an unbroken consecutive burst making the appearance of a tighter ram controller by packing everything back-to-back.  After the release of version 1.0, I will see if there is a cheap method of packing the commands in this way once again without having to move the timers back to the full speed DDR_CK clock rate.

The one issue remaining is the multi-port handler 'BrianHG_DDR3_COMMANDER.sv'.  Its current FMAX has trouble passing 130MHz if it is configured with 2x128 bit read, 2x128bit write ports, all smart options enabled @ TC85 degrees.  I'm thinking of a way to get this one to compile robustly above 150MHz without any big compromise so you may at least use it with ram running at 300MHz and half rate.  Right now, it can work fine at 75MHz, or even at 100MHz with the ram at 400MHz.  (This means no timing violations at any temperature.  The 130MHz builds always work error free at 150MHz, but this is not what we want.)  If I cannot come up with a solution, I will leave it as it is and create a secondary FAST multi-port handler 'BrianHG_DDR3_COMMANDER_fast.sv' which will be a strict 2 read, 2 write port device targeting a 200MHz FMAX allowing half-rate support with 400MHz ram.  This multiport will be designed to be chain-able where you can use 3 of them to give you 4 read, 4 write ports, or 7 of them for 8 read, 8 write ports.
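
The 3-for-4 and 7-for-8 figures follow from simple counting: every 2-port module merges two request streams into one, so each module removes exactly one stream until a single stream reaches the DDR3 controller. The Python sketch below works that out. The function names are hypothetical, and the depth figure assumes the modules are arranged as a balanced tree, which the post does not actually specify.

```python
def modules_needed(total_ports):
    """2-port merge modules needed to funnel total_ports streams into one."""
    if total_ports < 1:
        raise ValueError("need at least one port")
    # Each module turns two streams into one, i.e. removes one stream.
    return total_ports - 1


def tree_depth(total_ports):
    """Modules a request passes through, assuming a balanced tree (an assumption)."""
    if total_ports < 1:
        raise ValueError("need at least one port")
    return (total_ports - 1).bit_length()  # integer ceil(log2(total_ports))


for ports in (2, 4, 8):
    print(f"{ports} ports: {modules_needed(ports)} modules, depth {tree_depth(ports)}")
```

The depth matters because every level a request passes through is likely to add latency on top of the DDR3 access itself.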

Note that Altera's Cyclone & MAX FPGA fabrics are really slow and low power, my current code would probably be at least 50% faster on other vendor's FPGAs, or even Altera's Arria/Stratix FPGAs.  Achieving fully consistent 400MHz core builds without the paid version of Quartus for manually placing cells, even though that portion is truly only software serializers & IO port controller, is still difficult as it still has to be controlled by a 200MHz section and bridge the 2 clock domains in both directions.
« Last Edit: August 14, 2021, 10:12:26 pm by BrianHG »
 

Offline Omega Glory

  • Regular Contributor
  • *
  • Posts: 91
  • Country: us
    • Ezra's Robots
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #22 on: August 15, 2021, 04:27:39 am »
Have you thought about hosting the code on GitHub? I bet a lot of people would find your work very helpful, and that would be an easy and free way to distribute it more widely than on this forum. In any case, amazing work!

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #23 on: August 15, 2021, 06:23:47 am »
Have you thought about hosting the code on GitHub? I bet a lot of people would find your work very helpful, and that would be an easy and free way to distribute it more widely than on this forum. In any case, amazing work!
We'll see in a few days after I release v1.00.
I never used GitHub before and would probably need to venture around first.
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #24 on: August 16, 2021, 08:17:48 am »
Get the new BrianHG_DDR3_Controller_V1.5 demo over here: https://www.eevblog.com/forum/fpga/brianhg_ddr3_controller-open-source-ddr3-controller/msg3785711/#msg3785711

 >:D  >:D  >:D  >:D  >:D  >:D
 >:D 500MHz/1GTPS >:D
 >:D  >:D  >:D  >:D  >:D  >:D

Error free, well, on my DECA board anyways...
So much for Altera's software DDR3 300MHz limit.
Though, the reported FMAX at 0C reads only 461MHz.

If you download & program the attached ellipse-demo, scoping the SMD termination resistor tied to the DDR_CK line should show 500MHz.

Arrow DECA DEMO .sof programming file instructions:
(If the picture is still or scrolling noise, just flip 'Switch 0'.  You just powered up the demo in frozen picture mode and you are looking at the powered up random blank memory.)


Switch 0 = Enable/Disable drawing of ellipses.
Switch 1 = Enable/Disable screen scrolling.
Button 0 = Draw data from random noise generator.
Button 1 = Draw color image data from a binary counter.


Note: V3 just fixes a reset bug when using the RS232-Debugger.  Nothing else has changed.
« Last Edit: November 01, 2021, 10:08:38 pm by BrianHG »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #25 on: August 16, 2021, 06:49:02 pm »
 >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D
 >:D   400MHz/800MTPS, Zero timing violations with 85C model   >:D
 >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D  >:D



V1.00 coming...
Just cleaning up and increasing the clearance by a bit more...
« Last Edit: August 16, 2021, 09:30:45 pm by BrianHG »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #26 on: August 16, 2021, 10:09:38 pm »
Huge clearance boost, latest V1.00 build @ 400MHz...



Just compile-testing a few different configurations, then documenting the changes, and the full V1.00 should be released tonight.

If it weren't for CLK[2] having that clock restriction to 405.19MHz (read data in clock) instead of the 450.05MHz restrictions on CLK[0] and CLK[1] (data out clocks),  I could have made a 450MHz controller with no timing violations at 85C.  However, make no mistake that this controller does run at 450MHz error free and can even be overclocked to 500MHz  (PLL maxes out at 475MHz according to the data sheet).  The minimum period restrictions are only the limitations of the DDR IO PIN buffer itself and its required data hold time.

At 450MHz & 500MHz, my tuneable read data clock has 5 out of 8 error free tuning positions.  At 400MHz, it is 6 out of 8 while at 300MHz, it is 7 out of 8.  Note that 8 = the theoretical perfect case where every position across the full 180 degrees is error free, which is nearly impossible unless I begin to use individual DQ PIN deskew-tuning calibration with picosecond alignment.  (Each tuning step is 22.5 degrees.)
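
To put those tuning positions into time units, here is a rough Python sketch.  It is not part of the controller source, and the function name is made up for illustration.  It assumes each step is 22.5 degrees of the DDR clock period, as stated above, and it approximates the error free window as (number of good positions) x (step size), which is only an estimate of the real eye width.

```python
def tuning_window_ps(clock_mhz, good_steps, step_degrees=22.5):
    """Return (step size in ps, approximate error-free window in ps).

    Hypothetical helper: assumes one phase step is `step_degrees` of the
    DDR clock period and that each good position is worth one full step.
    """
    period_ps = 1e6 / clock_mhz                  # one clock period in picoseconds
    step_ps = period_ps * step_degrees / 360.0   # time shift of a single phase step
    return step_ps, step_ps * good_steps

# Error-free position counts quoted in the post above.
for mhz, good in [(300, 7), (400, 6), (450, 5), (500, 5)]:
    step_ps, window_ps = tuning_window_ps(mhz, good)
    print(f"{mhz} MHz: step = {step_ps:.1f} ps, window = {window_ps:.1f} ps")
```

At 400MHz a single step works out to roughly 156ps, which gives a feel for why picosecond-level per-pin deskew would be needed to reach a perfect 8 out of 8.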
« Last Edit: August 16, 2021, 10:53:53 pm by BrianHG »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #27 on: August 17, 2021, 12:41:07 am »
Slowest fabric -8 build using the device 10M50DAF484C8GES set to 300MHz.



Note that the DECA eval board uses the fastest -6 Max10 fabric.
Note that Altera doesn't support DDR3 on -8 Max10/Cyclone V FPGAs as the DDR buffer transceivers max out at 550MTPS instead of the required 600MTPS.  However, my source code has the 'DDR_TRICK_MTPS_CAP' parameter function to allow you to get around this (used to break Quartus' 600MTPS limiter on my 800MTPS builds); otherwise, you could not do a full compile.  (The fitter and timing analyzer still perform the rest of their functions properly, recognizing the requested true 300MHz core frequency.)

Yes, you can now have a DDR3 controller on any slowest -8 Cyclone III / IV / V / Max10.
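
For anyone following the MHz versus MTPS numbers, here is a small Python sketch of the arithmetic.  It is only an illustration: the function names are invented, the 550MTPS figure is the -8 buffer limit quoted above, and nothing here describes how 'DDR_TRICK_MTPS_CAP' is implemented.

```python
def ddr_mtps(clock_mhz):
    """DDR moves data on both clock edges, so MTPS is twice the clock in MHz."""
    return 2 * clock_mhz

def exceeds_cap(clock_mhz, cap_mtps):
    """True when the requested clock needs more MTPS than the given cap allows."""
    return ddr_mtps(clock_mhz) > cap_mtps

SPEED_GRADE_8_CAP = 550  # MTPS, the -8 buffer limit quoted in the post above

for mhz in (250, 300, 400):
    print(f"{mhz} MHz -> {ddr_mtps(mhz)} MTPS, "
          f"over the -8 cap: {exceeds_cap(mhz, SPEED_GRADE_8_CAP)}")
```

A 300MHz build needs 600MTPS, which is over the 550MTPS figure, so that is the case where the tool's limiter gets in the way.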
« Last Edit: August 17, 2021, 12:46:31 am by BrianHG »
 

Offline Daixiwen

  • Frequent Contributor
  • **
  • Posts: 357
  • Country: no
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #28 on: August 17, 2021, 06:22:07 am »
That's really impressive for Cyclone devices.  :-+
 
The following users thanked this post: BrianHG

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #29 on: August 17, 2021, 07:08:55 am »
That's really impressive for Cyclone devices.  :-+
LOL, Cyclone III/IV has the fastest fabric in the series and can outperform Max10 & Cyclone V.  IE, it should compile with better FMAX than what I have accomplished here, though it's only about 10% faster.

It's been hell.
I started out around 2 months ago with something which only occasionally worked after each build, even when underclocking the DDR3 at 250MHz.

Then, the 300MHz got a little more stable, but 350MHz was a fluke and I got to see 400MHz initialize once as well as a 450MHz fluke with a horrible FMAX report and no other code in the FPGA like the graphics geometry engine and video output.  Once I began with the 1080p output, it took a few weeks to stabilize the 300MHz, but adding the ellipse geometry unit killed that.  500MHz was a fabled dream.

Finally, 300MHz was stable and 350 not too reliable while 400 was dead until last week.  Some concentration and fine tuning, now 400MHz is a breeze and 300MHz is a given.  Even 500MHz surprisingly worked first shot, though, right now, 450MHz has a weird addressing bug, but it's not with the DDR3 controller, but in the extended 16 read/write channel multiport front end.

I'm almost done with some final tests and tweaks.  I should have a rev 1.00 out in a day, where its only limiting factor is the top FMAX speed of the 16 channel multi-port module.  It looks like the best solution here will be to make an alternate fast stripped-down version with 2 read and 2 write ports, designed for speed and with pyramid style stacking support so you can have as many ports as you like, though ports deep in the pyramid will have an extended sequential pipe delay.  Stack it the way you like and you can have at least 1 read & write port right next to the DDR ram controller with a single pipe stage.
« Last Edit: August 17, 2021, 11:08:20 am by BrianHG »
 

Offline ali_asadzadeh

  • Super Contributor
  • ***
  • Posts: 1910
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #30 on: August 17, 2021, 11:28:33 am »
Thumbs up BrianHG :-+ Github is waiting for your rev 1.0 >:D
ASiDesigner, Stands for Application specific intelligent devices
I'm a Digital Expert from 8-bits to 64-bits
 
The following users thanked this post: BrianHG

Offline nockieboy

  • Super Contributor
  • ***
  • Posts: 1812
  • Country: england
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #31 on: August 17, 2021, 11:55:08 am »
Latest version of the project tested and working fine on the DECA at 500MHz!!!!!  :wtf:

https://youtu.be/a1k106CNylI
« Last Edit: August 17, 2021, 04:58:21 pm by nockieboy »
 
The following users thanked this post: BrianHG

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #32 on: August 19, 2021, 09:10:51 am »
V1.00 update.

Just cleaned up a bunch of stuff and also got rid of all multicycle paths in the .sdc as they are no longer needed.

400MHz with 100% timing in the black is easy to achieve, though sometimes you may need to change compile settings or change the compiler's starting 'SEED' number as it is still a stretch for Cyclone/MAX10 devices.  300MHz is a given...  Though overclocking to 450/500MHz functions with a -6 MAX10, it is not something I'm officially supporting.

I have some documenting to do and a compile test for a Cyclone IV to verify that we reach the same FMAX range.  Then I will upload V1.00.
« Last Edit: August 19, 2021, 09:15:51 am by BrianHG »
 

Offline dmendesf

  • Frequent Contributor
  • **
  • Posts: 324
  • Country: br
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #33 on: August 22, 2021, 01:05:10 am »
Tried the 500MHz version on my DECA10 and it worked flawlessly. Looking forward to the 1.0 release. I plan to make a VHDL wrapper for it. Congratulations!
 
The following users thanked this post: BrianHG

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #34 on: August 22, 2021, 10:54:38 am »
V1.00 update:

One last thing to do.  I just need to do a Cyclone IV E build test tonight to make absolutely sure we reach FMAX with that series as well, then .zip everything and I'll be ready to upload everything.

IE: I need to re-assign a bunch of IOs from my Ellipse DECA demo switched to a Cyclone IV FPGA following Altera's recommended connections for external DDR memory, then compile...

I will also be doing the same for Cyclone V as that FPGA seems to be slower/lower power (yet higher density and designed for DDR3) than the Cyclone IV.
« Last Edit: August 22, 2021, 11:02:18 am by BrianHG »
 
The following users thanked this post: dmendesf

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #35 on: August 23, 2021, 03:42:38 am »
Ok, Cyclone III and Cyclone IV compile and reach a true 400MHz FMAX.

But for some FN reason, Cyclone V's PLL isn't compatible with Cyclone III/IV/10 LP/Max 10/Arria II.

Yes, there is a different PLL megafunction dedicated just for the Cyclone V.  If I use the normal 'altpll', it won't support phase stepping on a Cyclone V.  So, I need to use Cyclone V's 'altera_pll' function, which uses a shit load of strings to define its settings and yet still has the same identical phase step tuning function as the 'altpll', other than having many more clock outputs.  That is something which I might not be able to handle with my current parameter-driven auto-PLL generator system.

Altera drives me nuts.  Max 10 will simulate and compile correctly using the old 'altddio_bidir', but will not output anything unless you use the new 'altera_gpio_lite'.  Cyclone V needs the new PLL (it won't compile otherwise) but uses the old 'altddio_bidir', while the Max 10 uses the old PLL but the new IO scheme...

Once I get the Cyclone V pll working and test the FMAX, I'll upload V1.00.
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #36 on: August 24, 2021, 12:50:18 am »
V1.00 update:
Arrrrgggg, this Cyclone V -6 is a piece of crap.

Ok, I can reach 400MHz & 200MHz for the controller clocks at 0C, after an 11 minute compile, and after I falsely set the read clock phase offset to 1ps so that it would not remove that tune-able clock and merge it with my DDR_CK clock 0. (Yes, that BS is a thing...) But somehow, my multiport interface module's FMAX is 41MHz?

WTF? 41MHz?

Even the Cyclone III achieved 115MHz for this clock while CIV got 118MHz and Max10 got 114MHz.
How can the next generation of Cyclone devices drop so terribly in performance, even though they improved the IO speed compared to the Max10's 450MHz and CIII/CIV's 500MHz limit?  And yet, the older devices compile in 4-5 minutes, and only 3.5 minutes for the Cyclone III using just 1 CPU core with the old Quartus 13.0sp1.

It's 11 minutes a shot for the Cyclone V compiles.  I thought I would just have to add and adapt its PLL function, not investigate how the hell Quartus's fitter decides to optimize my core into a snail of a design.
« Last Edit: August 24, 2021, 04:29:17 pm by BrianHG »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #37 on: August 25, 2021, 06:38:12 am »
V1.00 update.

Ok, I got a Cyclone-V-6 to run at 300MHz, just barely, with the advanced smart banking feature in the multiport module disabled.  (The stand alone DDR3_PHY ram controller can still run at 400/200MHz and its smart bank management is always enabled.  It's been designed to run at speed on the most pathetically slow FPGAs ever...)

I'll stick V1.00 here, document the updates and upload 1.00.

My DDR3 PHY stand-alone controller can do over 300MHz easy on the C-V -6, it's just the multiport hub which has a devastating FMAX of 50% speed.  I'm also going to upload it to Intel's support with the C-IV-6 versions and ask why there is such a huge speed difference.

Setting up its PLL was a nightmare, with built-in system bugs, parameters as fancy non-standard string types & the compiler simplifying out some crucial clocks, for which I needed to invent multiple workarounds.

V1.00 coming tonight...
« Last Edit: August 25, 2021, 07:29:42 am by BrianHG »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #38 on: August 26, 2021, 02:28:54 am »
I don't know WTF is going on with this Cyclone V build, but I have this one build where the 85C model has an FMAX of 204.33MHz and the 0C model has an FMAX of 203.29MHz.   Yes.  You read right.  The colder model is 'SLOWER' than the hot model.  WTF?

Ok, need to do a bunch of builds...
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #39 on: August 27, 2021, 03:56:57 am »
So, reporting back somewhat later than I expected, I have the bouncing ellipses running on the screen at 1080p, and it's FREAKING AWESOME  :clap:

Regarding the blue eye-destroyers, LEDs 3 through 7 are lit :)
 
The following users thanked this post: BrianHG

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #40 on: August 27, 2021, 07:44:10 am »
V1.00 FMAX results:

Files,  Description:

300MHz, Hypothetical Cyclone III-8 DDR3 System scrolling ellipse build to verify FMAX.
(Uses Quartus 13.0sp1)


400MHz, Hypothetical Cyclone III-6 DDR3 System scrolling ellipse build to verify FMAX.
(Uses Quartus 13.0sp1)



300MHz, Hypothetical Cyclone IV-8 DDR3 System scrolling ellipse build to verify FMAX.


400MHz, Hypothetical Cyclone IV-6 DDR3 System scrolling ellipse build to verify FMAX.



300MHz, functional DDR3 System scrolling ellipse with optional RS232 debug port demo for Arrow DECA eval board, but compiled for a -8.


400MHz, functional DDR3 System scrolling ellipse with optional RS232 debug port demo for Arrow DECA eval board.




400MHz, Hypothetical Cyclone V-6 DDR3 System scrolling ellipse build to verify FMAX.
( :-- FMAX FAILED  :-- )  Take a look at the multiport clock.


300MHz, Hypothetical Cyclone V-6 DDR3 System scrolling ellipse build to verify FMAX.
(PASSED, but I had to disable some smart multiport features and this is a CV-6 :--)


300MHz, Hypothetical Cyclone V-7 DDR3 PHY Only controller with RS232 debug port build to verify FMAX.
(300MHz only, no multiport )  A CV-7  :--, not even a -8.  Compiling for a -8 leaves 4 clock domain crossing nets in the red even though the rest of the design including IO ports easily pass.


375MHz, Hypothetical Cyclone V-6 DDR3 PHY Only controller with RS232 debug port build to verify FMAX.
(375MHz only, no multiport  :-- ) Compiling for 400MHz reveals ~8 clock domain crossing nets in the red even though the rest of the design including IO ports easily pass.  In fact, this FPGA should have reached 500MHz.



I will be sending my code to Intel to see why their Cyclone V only gets 60% speed on my multiport commander module.  Maybe there is something in the compiler setting to help as the FPGA fabric of Cyclone V is radically different compared to all other Cyclone & MAX FPGAs.


Clocks [ 0 ],[ 1 ],[ 2 ] are the 400MHz DDR_CK, Write clock, read clock.
Clock  [ 3 ] is the DDR_CLK_50 200 MHz half speed clock, the interface speed of the Brian_DDR3_PHY_SEQ.
Clock  [ 4 ] is the DDR_CLK_25 100MHz quarter speed clock, currently set for the multiport COMMANDER module.
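
As a quick sanity check of those clock ratios, here is a hedged Python sketch.  The dictionary keys reuse the clock names above, but the helper functions are hypothetical, and the multiport FMAX figures are simply the ones I quoted in reply #36.

```python
def controller_clocks(ddr_ck_mhz):
    """Derive the half and quarter rate clocks from the DDR_CK frequency (MHz)."""
    return {
        "DDR_CK": ddr_ck_mhz,            # clocks [0], [1], [2]
        "DDR_CLK_50": ddr_ck_mhz / 2,    # clock [3], PHY_SEQ interface speed
        "DDR_CLK_25": ddr_ck_mhz / 4,    # clock [4], multiport COMMANDER
    }

def multiport_passes(ddr_ck_mhz, reported_fmax_mhz):
    """True when the reported multiport FMAX meets the quarter rate clock."""
    return reported_fmax_mhz >= controller_clocks(ddr_ck_mhz)["DDR_CLK_25"]

# Multiport FMAX figures quoted in reply #36, all for 400MHz builds.
reported = {"Cyclone III": 115, "Cyclone IV": 118, "Max10": 114, "Cyclone V": 41}
for device, fmax in reported.items():
    print(f"{device}: {fmax} MHz, passes 100 MHz: {multiport_passes(400, fmax)}")
```

With a 400MHz DDR_CK the COMMANDER needs 100MHz, so only the Cyclone V figure falls short.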
« Last Edit: September 03, 2021, 08:10:03 am by BrianHG »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #41 on: August 27, 2021, 07:55:07 am »
BrianHG_DDR3 V1.00 system FPGA utilization reports:

300MHz_PHY_only.png - DDR3 controller with 1 read & write port to an 8 bit device build.


300MHz-8_ellipse.png - DDR3 controller random ellipse project with 4 ports, 128 bit access, 300MHz Max10-8.


300MHz_ellipse.png - DDR3 controller random ellipse project with 4 ports, 128 bit access, 300MHz Max10-6.


400MHz_ellipse.png - DDR3 controller random ellipse project with 4 ports, 128 bit access, 400MHz Max10-6.


450MHz_ellipse.png - DDR3 controller random ellipse project with 4 ports, 128 bit access, 450MHz Max10-6.


500MHz_ellipse.png - DDR3 controller random ellipse project with 4 ports, 128 bit access, 500MHz Max10-6.



I've included a few builds.  You will notice that the LC/LUT increases with frequency.  This is most likely the compiler adding duplicate parallel logic cells to improve FMAX timing.


I highlighted the 'BrianHG_PHY_SEQ' module, which tells you the full LC/LUT count if you were to build a stand-alone 1 read/write port DDR3 controller.

The COMMANDER module is the multiport handler configured with 2 read and 2 write ports in the ellipse demo.
« Last Edit: August 27, 2021, 07:53:50 pm by BrianHG »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #42 on: August 27, 2021, 08:35:35 am »
******************************************
******** Finally, V1.00 release here: ***********
******************************************
https://www.eevblog.com/forum/fpga/brianhg_ddr3_controller-open-source-ddr3-controller/

Things to do:

a)  I will be contacting Intel's technical support about Cyclone V's poor 60% speed FMAX performance for my 1 multiport section in my design as seen in the above screenshots with the red arrow.  I'll see if something can be done.

b)  As described in my v0.95 notes, I will look into designing my simpler pyramid stack-able 2:1 multiport module aimed to achieve an FMAX of at least 200MHz allowing multiport running at Half rate interface controller speed, but with a loss of a few smart advanced features.

c)  I will download and install the latest Lattice Diamond and see if I can adapt and get my controller to compile and simulate there.  The LFE5U-45F/LFE5U-85F at 45kgate & 85kgate are just such a price bargain at 16$ and 36$ each respectively and if my DDR3 controller runs fast there, it is the next route to take.
« Last Edit: August 28, 2021, 11:29:15 pm by BrianHG »
 
The following users thanked this post: nockieboy, SpacedCowboy

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #43 on: August 28, 2021, 09:21:17 pm »
OMG, learning how to use GitHub with its esoteric project generation and file entry is not treating me well.  I'm wondering if it is worth the hassle.
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #44 on: August 29, 2021, 01:46:00 am »
Arrrg, is GitHub a place to share some projects / source code, or is it a place where you have to learn a bunch of their own esoteric terms and learn an entire new mix of text and web-page click OS just to post some .zip or source code?  And, I don't see any official support for HDL firmware languages, though I do know people do post such code there.

Ok, can I assume I am not allowed to upload a .zip file to GitHub?
How do I get Quartus binaries uploaded to my 'Repository'?
Or, do I somehow place these files within my listed 'Projects', which appear to have no consequences or aren't even searchable?
« Last Edit: August 29, 2021, 01:58:30 am by BrianHG »
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4112
  • Country: nz
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #45 on: August 29, 2021, 02:48:14 am »
On github just create a new empty repository (in the menu on the top right of the web page). Hit the green CODE button and copy the ssh URL you find there.

In your local git repo type "git remote add github <url>" then "git push github"

If you don't already have a local git repo then that's the first step. cd into your project directory and type "git init" and then "git add FOO" where FOO is a file or directory or whitespace separated list of files or directories. Repeat for everything you want sent to github i.e. source code and config files, not output files. Then type "git commit -m 'Initial commit'". And then follow the instructions above to deal with github.
« Last Edit: August 29, 2021, 02:50:17 am by brucehoult »
 
The following users thanked this post: BrianHG

Offline ali_asadzadeh

  • Super Contributor
  • ***
  • Posts: 1910
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #46 on: August 29, 2021, 07:22:00 am »
As a side note, to add everything to the git repo you can do as follows.
If you need to exclude some parts from GitHub, like the project outputs, you can use a .gitignore file listing the directories and files that you don't want to include.


git init
git add .
git commit -m "Init repo"
git remote add github <url>
git push github
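
To illustrate the idea of keeping generated output out of the repo, here is a small Python sketch.  It is only a model of the concept: it uses Python's fnmatch rather than git's real ignore rules, the pattern list is a hypothetical example for a Quartus project, and apart from BrianHG_DDR3_PLL.sv the file names are made up.

```python
from fnmatch import fnmatch

# Hypothetical ignore patterns for a Quartus project; adjust to your own layout.
IGNORE_PATTERNS = ["db/*", "incremental_db/*", "output_files/*", "*.rpt"]

def files_to_commit(paths, patterns=IGNORE_PATTERNS):
    """Return the paths that match none of the ignore patterns."""
    return [p for p in paths if not any(fnmatch(p, pat) for pat in patterns)]

# Example file list: source and settings files mixed with generated output.
project = [
    "BrianHG_DDR3_PLL.sv",
    "project.qsf",
    "db/project.cmp.rdb",
    "output_files/project.sof",
    "output_files/project.fit.rpt",
]
print(files_to_commit(project))
```

Only the source and settings files survive the filter; the generated databases, reports and programming files are what a .gitignore would normally keep out.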
ASiDesigner, Stands for Application specific intelligent devices
I'm a Digital Expert from 8-bits to 64-bits
 
The following users thanked this post: BrianHG

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #47 on: August 29, 2021, 09:22:50 pm »
If you don't already have a local git repo then that's the first step. cd into your project directory and type "git init" and then "git add FOO" where FOO is a file or directory or whitespace separated list of files or directories. Repeat for everything you want sent to github i.e. source code and config files, not output files. Then type "git commit -m 'Initial commit'". And then follow the instructions above to deal with github.
What do you mean by ', not output files.'?

For example, my Quartus projects do have some binary files and may include a hex file.

Also, when I copy and paste ASCII files/show readme files, why do all my carriage returns disappear?
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2738
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #48 on: August 30, 2021, 12:06:14 am »
You can use TortoiseGit utility if you don't feel like messing with command line. That's what I use all the time - I have a Synology NAS which has a private Git server installed. I use Git even for projects that I don't intend to ever publish, as it makes development much easier.
 
The following users thanked this post: BrianHG

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4112
  • Country: nz
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #49 on: August 30, 2021, 02:19:13 am »
If you don't already have a local git repo then that's the first step. cd into your project directory and type "git init" and then "git add FOO" where FOO is a file or directory or whitespace separated list of files or directories. Repeat for everything you want sent to github i.e. source code and config files, not output files. Then type "git commit -m 'Initial commit'". And then follow the instructions above to deal with github.
What do you mean by ', not output files.'?

You don't include output files in a SOURCE CODE control system because they are not source code. The outputs of compilation or synthesis and routing, or whatever it is your project does (I'm talking in general terms here) change every time you run the build process. AND anyone who checks out the project will generate them themselves, from the source files.

If you want to put bitstreams or something somewhere so that people don't have to run synthesis themselves, that's a binary release, which is a different thing. Github has a different place to put releases (or you can put them on any web or FTP server etc).

Quote
For example, my Quartus projects do have some binary files and may include a hex file.

If those are inputs to the process then that is fine.

Quote
Also, when I copy and paste ASCII files/show readme files, why do all my carriage returns disappear?

What do you mean "copy and paste"?

Whatever you put into git comes back out absolutely byte identical to what you put in. There is no line ending translation. Git handles binary files just fine, and in fact treats text as binary e.g. diffs are by bytes not lines.
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #50 on: August 31, 2021, 01:57:51 am »
You can use TortoiseGit utility if you don't feel like messing with command line. That's what I use all the time - I have a Synology NAS which has a private Git server installed. I use Git even for projects that I don't intend to ever publish, as it makes development much easier.
Ok, nice.  I went through Google tutorials on the tool.  Fine.  But still, even just the setup for TortoiseGit is super esoteric.  It's like I'm going back to the 90's with system setups designed for insiders only.  A putty setup environment and public/private key generation + copy and paste of the URL address from my web browser's display status of my generated repository into TortoiseGit.  Seriously, it's 2021.  Could you imagine having to go through this trouble to securely purchase anything on Amazon?

Still, it looks like I'm going to end up using TortoiseGit.
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2738
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #51 on: September 01, 2021, 03:54:38 am »
Ok, nice.  I went through Google tutorials on the tool.  Fine.  But still, even just the setup for TortoiseGit is super esoteric.  It's like I'm going back to the 90's with system setups designed for insiders only.  A putty setup environment and public/private key generation + copy and paste of the URL address from my web browser's display status of my generated repository into TortoiseGit.  Seriously, it's 2021.  Could you imagine having to go through this trouble to securely purchase anything on Amazon?

Still, it looks like I'm going to end up using TortoiseGit.
This is a typical example of what happens when you let developers design a user interface. Even after using it for many years professionally, I still stumble upon it every once in a while.

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #52 on: September 01, 2021, 05:44:46 am »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #53 on: September 05, 2021, 04:19:36 am »
Finally!!!!  (Damn javascript bug finally bypassed...)

My GitHub repository release:
https://github.com/BrianHGinc/BrianHG-DDR3-Controller
« Last Edit: September 05, 2021, 08:06:57 am by BrianHG »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #54 on: September 09, 2021, 08:06:24 am »
If anyone has seen some cheap or free Lattice ECP5 LFE5U-25/45/85F boards with at least one 16 bit DDR3 ram chip and preferably an HDMI output, please link them here...

I am not interested in the LFE5UM as those require a Lattice Diamond License and the goal of my DDR3 controller is to offer a free solution for the most affordable Lattice components.
« Last Edit: September 09, 2021, 08:11:50 am by BrianHG »
 

Offline ali_asadzadeh

  • Super Contributor
  • ***
  • Posts: 1910
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #55 on: September 09, 2021, 11:04:48 am »
Brian play with Gowin too, they are the cheapest,  also they promised to spin out a new chip this year with 12Gb SERDES and internal DDR4 Memory!
ASiDesigner, Stands for Application specific intelligent devices
I'm a Digital Expert from 8-bits to 64-bits
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #56 on: September 09, 2021, 05:26:01 pm »
Brian play with Gowin too, they are the cheapest,  also they promised to spin out a new chip this year with 12Gb SERDES and internal DDR4 Memory!
I thought Gowin already came with a free DDR3/4 controller IP.  Not much need for mine there, unlike with Lattice and Altera who charge an arm and a leg to hook up DDR3 ram.
 
The following users thanked this post: ali_asadzadeh

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14698
  • Country: fr
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #57 on: September 09, 2021, 05:28:55 pm »
If anyone has seen some cheap or free Lattice ECP5 LFE5U-25/45/85F boards with at least one 16 bit DDR3 ram chip and preferably an HDMI output, please link them here...

Free? :-DD

There are very few boards with an ECP5. You have the Lattice dev boards, but they all come with an LFE5UM AFAIR.
Then there is the ULX3S, but no DDR3, just SDRAM. Also boards I mentioned earlier (which are repurposed boards from Colorlight) that you can find on Aliexpress. I have a couple. They are fine. HDMI connector, but no DDR3, only SDRAM...

One board that does have DDR3 is the OrangeCrab: ECP5-25F, 1Gbit DDR3, no HDMI connector though, and limited IOs: https://1bitsquared.com/products/orangecrab

I don't know of anything else on the market at the moment.


 
The following users thanked this post: BrianHG

Offline ali_asadzadeh

  • Super Contributor
  • ***
  • Posts: 1910
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #58 on: September 10, 2021, 08:19:23 am »
Quote
I thought Gowin already came with a free DDR3/4 controller IP.  Not much need for mine there, unlike with Lattice and Altera who charge an arm and a leg to hook up DDR3 ram
It's free but not open source. ;)
ASiDesigner, Stands for Application specific intelligent devices
I'm a Digital Expert from 8-bits to 64-bits
 

Offline dolbeau

  • Regular Contributor
  • *
  • Posts: 89
  • Country: fr
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #59 on: September 11, 2021, 06:15:23 pm »
I am not interested in the LFE5UM as those require a Lattice Diamond License and the goal of my DDR3 controller is to offer a free solution for the most affordable Lattice components.

Zero idea about Lattice licensing, but the TrellisBoard has a LFE5UM5G-85F and is usable with the Yosys opensource toolchain, including the DDR3 SDRAM. So is the ECPIX5, which (unlike the TrellisBoard) is commercially manufactured.

The FOSS toolchain might not be able to reach the performance of the Lattice toolchain, but might be an interesting option to try nonetheless.
 
The following users thanked this post: BrianHG

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14698
  • Country: fr
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #60 on: September 11, 2021, 06:47:30 pm »
I am not interested in the LFE5UM as those require a Lattice Diamond License and the goal of my DDR3 controller is to offer a free solution for the most affordable Lattice components.

Zero idea about Lattice licensing, but the TrellisBoard has a LFE5UM5G-85F and is usable with the Yosys opensource toolchain, including the DDR3 SDRAM. So is the ECPIX5, which (unlike the TrellisBoard) is commercially manufactured.

As we mentioned, Lattice Diamond requires a subscription license for the LFE5UM. You can see this here: https://www.latticesemi.com/en/Products/DesignSoftwareAndIP/FPGAandLDS/LatticeDiamond

Those are nice boards otherwise, but the above point is certainly a problem...

The FOSS toolchain might not be able to reach the performance of the Lattice toolchain, but might be an interesting option to try nonetheless.

It's interesting. But beyond performance, there are other issues to consider. AFAIK, BrianHG uses SystemVerilog for his developments, and currently, Yosys only supports a subset of SV. I would expect a number of problems trying to compile his code with Yosys. Nonetheless, you can have a look there to check what could possibly be a problem, although they mention supported features - not the ones that aren't, and there are probably many - so I guess it'll be hard to tell before actually trying: https://github.com/YosysHQ/yosys

For those interested, there's a GHDL plugin for Yosys, potentially allowing to use VHDL with the Yosys-based toolchain. I admit I have never gotten around to trying it yet, but I'd be curious.
 
The following users thanked this post: BrianHG

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14698
  • Country: fr
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #61 on: September 12, 2021, 01:19:54 am »
Just adding a quick note about Yosys here, as this post finally decided me to take the plunge.

So, using the latest GHDL, Yosys, Prjtrellis and Nextpnr from git (warning: Yosys/Prjtrellis/Nextpnr take a fair amount of time to build from source), I was able to generate a .bit file from VHDL source files (+.lpf constraint file). The only modification I had to make was to the .lpf file, as nextpnr doesn't support port groups yet (which kind of bites, but that's not a dealbreaker). But all in all, it was smoother than I feared.

What I can say at this point is that Fmax is higher than what I get with Lattice Diamond for the design I tried, but I didn't dig enough to figure out if nextpnr is just being more optimistic, or if it indeed yields faster logic. This was a relatively simple design too, so that may not be as good for larger designs. Another point is that it all runs much faster than the commercial tools (but again, this is probably due, at least partly, to the fact that optimization is less aggressive.)

As I said, I don't know if SV support in Yosys is enough for Brian's work, but it could be worth a shot. With recent versions of GHDL and the combo with Yosys, VHDL support has become pretty good. So now, I'm curious to try this with larger designs.
« Last Edit: September 12, 2021, 01:21:43 am by SiliconWizard »
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4112
  • Country: nz
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #62 on: September 12, 2021, 03:27:58 am »
I don't know for sure, but it would not surprise me if yosys was using sufficiently better algorithms than FPGA vendors' tools to get better results in a shorter time. It has had a significant and growing amount of work put into it and, like most hardware manufacturers, FPGA vendors probably don't want to "waste" any more money on software tools than absolutely necessary.

The same as eventually happened with gcc and llvm and the Linux kernel.
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #63 on: September 12, 2021, 04:35:24 am »
Quote
It's interesting. But beyond performance, there are other issues to consider. AFAIK, BrianHG uses SystemVerilog for his developments, and currently, Yosys only supports a subset of SV. I would expect a number of problems trying to compile his code with Yosys. Nonetheless, you can have a look there to check what could possibly be a problem, although they mention supported features - not the ones that aren't, and there are probably many - so I guess it'll be hard to tell before actually trying: https://github.com/YosysHQ/yosys


My SystemVerilog coding relies on 2 dimensional arrays for IO ports and 'genvar/generate/if' to render repetitious instances of a single module, each one pointing to an incremented dimension in one of my 2D arrays.

Also, I use the attribute ' (*preserve*) logic [ x:x ] var_name [ 0:x ] ; ' to force the compiler to not simplify out that particular register bundle, force it to use logic cells to aid in speed optimization.

I also use parameters and localparams, with string support and have a number of 'task' in a few places.

If it can handle these plus 'if (x=y) $stop', 'if (x=y) $error' and '$display ("msg")' or '$warning ("msg")' during compile time, my code should work.

If 'yosys' can handle this much, then the rest of my code operates down at a simplistic regular 'Verilog' level and it should work.  (IE: I went to SystemVerilog exclusively for the 2D interconnected IO ports capability.)  The dumber the compiler, the better FMAX you can expect from my coding style.  (IE: Cranking up Altera's Quartus 'speed' optimizations to the max actually slows down my designs, optimize for Area and my FMAX usually goes through the roof...)

The problem is how many Lattice users use 'yosys' and with a fresh Win7 install and nothing else, can I get it running, or, do I require other utils and need to build it on my system?
« Last Edit: September 12, 2021, 04:42:02 am by BrianHG »
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14698
  • Country: fr
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #64 on: September 12, 2021, 09:46:04 pm »
Quote
It's interesting. But beyond performance, there are other issues to consider. AFAIK, BrianHG uses SystemVerilog for his developments, and currently, Yosys only supports a subset of SV. I would expect a number of problems trying to compile his code with Yosys. Nonetheless, you can have a look there to check what could possibly be a problem, although they mention supported features - not the ones that aren't, and there are probably many - so I guess it'll be hard to tell before actually trying: https://github.com/YosysHQ/yosys


My SystemVerilog coding relies on 2 dimensional arrays for IO ports and 'genvar/generate/if' to render repetitious instances of a single module, each one pointing to incremented dimension in one of my 2D arrays.

Also, I use the attribute ' (*preserve*) logic [ x:x ] var_name [ 0:x ] ; ' to force the compiler to not simplify out that particular register bundle, force it to use logic cells to aid in speed optimization.

I also use parameters and localparams, with string support and have a number of 'task' in a few places.

If it can handle these plus 'if (x=y) $stop', 'if (x=y) $error' and '$display ("msg")' or '$warning ("msg")' during compile time, my code should work.

If 'yosys' can handle this much, then the rest of my code operates down at a simplistic regular 'Verilog' level and it should work.  (IE: I went to SystemVerilog exclusively for the 2D interconnected IO ports capability.)  The dumber the compiler, the better FMAX you can expect from my coding style.  (IE: Cranking up Altera's Quartus 'speed' optimizations to the max actually slows down my designs, optimize for Area and my FMAX usually goes through the roof...)

The problem is how many Lattice users use 'yosys' and with a fresh Win7 install and nothing else, can I get it running, or, do I require other utils and need to build it on my system?

The easiest way of using Yosys on Windows is to use MSYS2, which has pre-built packages for Yosys.
https://www.msys2.org/
https://packages.msys2.org/group/mingw-w64-x86_64-eda

For a usable Yosys toolchain, you'll need to install from MSYS2:
mingw-w64-x86_64-nextpnr
mingw-w64-x86_64-prjtrellis
mingw-w64-x86_64-yosys
(additionally, mingw-w64-x86_64-ghdl-llvm for those willing to use VHDL.)

This is done via the MSYS2 console with:
Code: [Select]
pacman -S mingw-w64-x86_64-nextpnr mingw-w64-x86_64-prjtrellis mingw-w64-x86_64-yosys
I have tried Yosys with more complex projects, and this wasn't as pain-free as with the first, simple one. Expect any Lattice generated IP, if you want to reuse it, to require some manual modifications. Also, some Lattice primitives may be unsupported or only partially supported.

I've found some oddities regarding VHDL support too (through the yosys ghdl plugin), while the same code using GHDL for simulation is fine...

So, not tried SV yet, but I would expect a number of unsupported features that may drive you nuts. Would be interesting to get feedback on this.

And if you need any generated IP, you're also in for a rough ride.

Lastly, there is no support, currently, for timing constraints for IOs in nextpnr (the P&R tool), which, for designs such as a DDR3 controller, could be a real problem.
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #65 on: September 17, 2021, 03:54:33 am »
LOL, the ~150 stock of Arrow Deca boards for the last 6 months have just all sold out in 2 days.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14698
  • Country: fr
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #66 on: September 17, 2021, 07:09:37 pm »
Quote
LOL, the ~150 stock of Arrow Deca boards for the last 6 months have just all sold out in 2 days.

You should get a commission. :-DD
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #67 on: September 18, 2021, 12:43:45 am »
Well, I guess we'll now get to see if they were just selling off old stock to get rid, hence the low price, or if more will become available...
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14698
  • Country: fr
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #68 on: September 20, 2021, 12:13:49 am »
Just a note regarding SV support in Yosys and Brian's DDR3 controller.

I tried analyzing the SV files with Yosys, and got a bunch of syntax errors. SV support looks pretty preliminary.
I'm not good enough with SV to be able to help here. But someone who is could have a go if they're ready to issue a number of tickets in the Yosys project, and be patient.
Meanwhile, I'm afraid it's absolutely not an option. Yet.
 
The following users thanked this post: BrianHG

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #69 on: September 28, 2021, 03:36:02 am »
Ok, I thought some may want to know about DDR pin planning in Quartus.

I've attached 2 screenshots of Quartus' pin planner, 1 for Cyclone_IV and 1 for Max_10.

You will see that I have chosen x8 devices even though we are using an x16 DDR3 ram chip.  I have done this since the x16 DDR3 actually has 2 groups of DQS, basically making it 2 x8 devices.

Here is the Max_10 device:


As for the Cyclone_IV (including Cyclone III), you will notice that there exists the DQS, but not a DQS_n.  My DDR3 ram controller will still work, however, it requires you connect the DDR3's DQS_n to the adjacent DQS IO within the same x8 bank.  Preferably an emulated differential pair as long as it is within the same IO bank, even if it isn't highlighted as being part of the same x8 group.  (Quartus' reported polarity of this differential pair doesn't matter.  So long as the DQS pin is connected to the DQS on the DDR3 and the differential pair gets connected to DQS# on the DDR3 even if Quartus' pin planner says that the x8 DQS pin is the negative part of the differential pair.)


The data mask pins also need to be placed inside the same associated x8 group.

Remember to check the data sheets as some older Cyclones have higher IO performance on the top and bottom rows compared to the left and right sides.  You want to use the higher speed performance IOs.

The CK and CK_n pins should be a differential pair close to the center of everything if you are using more than 1 DDR3 ram chip, otherwise, either end or center will do.

Note that the MAX_10 devices as well as Cyclone_V do have a dedicated CK and CK_n pin for DDR3.  You will need to use these for your DDR3 CK/CK_n if you want full compatibility with Altera's DDR3 controller.

Download Arrow DECA's schematics to get a complete example of the DDR3 wiring.
« Last Edit: September 28, 2021, 04:09:51 am by BrianHG »
 

Offline Wiljan

  • Regular Contributor
  • *
  • Posts: 230
  • Country: dk
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #70 on: September 30, 2021, 02:15:48 pm »
Hi Brian

I got hold of 2 x TFP401 boards from https://learn.adafruit.com/adafruit-tfp401-hdmi-slash-dvi-decoder-to-40-pin-ttl-display
I got them to be able to measure the V-sync on 2 x HDMI signals to see if the syncs were locked together (they were).

I then realized that, way back on Cyclone III, I had a board from Bitec using the same TFP401 chip, so I looked a bit into the datasheets and schematic.

The TFP401 decodes HDMI (up to 1920x1080@60) to 3 x 8bit R,G,B plus sync and pixel clock, and its outputs are 3.3V, the same as the DECA GPIO pins.

So I have made some flat cable connection to the DECA board on GPIO J8

Code: [Select]
I have GND connection on a separate wire

GPIO0_D0 = B0
GPIO0_D1 = B1
GPIO0_D2 = B2
GPIO0_D3 = B3
GPIO0_D4 = B4
GPIO0_D5 = B5
GPIO0_D6 = B6
GPIO0_D7 = B7

GPIO0_D8 = G0
GPIO0_D9 = G1
GPIO0_D10 = G2
GPIO0_D11 = G3
GPIO0_D12 = G4
GPIO0_D13 = G5
GPIO0_D14 = G6
GPIO0_D15 = G7

GPIO0_D16 = R0
GPIO0_D17 = R1
GPIO0_D18 = R2
GPIO0_D19 = R3
GPIO0_D20 = R4
GPIO0_D21 = R5
GPIO0_D22 = R6
GPIO0_D23 = R7

GPIO0_D24 = GND (No Use)
GPIO0_D25 = PIXCLK
GPIO0_D26 = ACTIVE
GPIO0_D27 = HSYNC
GPIO0_D28 = VSYNC
GPIO0_D29 = DISPEN
GPIO0_D30 = NC
GPIO0_D31 = GND (No use)


So my question is if you could recommend how best to write those signals to the RAM ... I am looking at the GFX_Demo in Q15

My idea was to also add the second TFP401 on the leftover GPIOs and then show a part of the 2 HDMI inputs at the same time on the output.
Like a DVE (not need to scale)  just crop and move some area out of input 1 and 2 and combine them on the output.



Hope it makes sense
 

Offline Yansi

  • Super Contributor
  • ***
  • Posts: 3893
  • Country: 00
  • STM32, STM8, AVR, 8051
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #71 on: September 30, 2021, 02:59:26 pm »
Amazing work and thanks for sharing Brian!

I hope in the near future I will have the time and courage to finally start toying more seriously with the FPGAs and DDR3 will definitely come into play then.  Now back to my flat reconstruction and electronics lab rebuild.
 
The following users thanked this post: BrianHG

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #72 on: September 30, 2021, 11:52:56 pm »
@Wiljan, good luck.

  You would end up using the same strategy, but in reverse which my GFX demo 'BHG_vpg' uses.  Make a dual clock ram port buffer with 2 lines worth of buffer memory, 32bit aRGB in, 128 bit out.  Clock the input and make an active even/odd line & VS / HS / V_ENA all generated on the capture board's CLK input sampling active pixels into the dual port ram.  Transfer a copy of those 4 status flags to the CMD_CLK clock domain and make a reverse to my 'BrianHG_display_rmem' which will take the line buffer's 128bit side and store it into a selected ram address.

  Note that with work, you can optimize all the code to run on 24bit instead of 32bit graphics if you like, or use 16bit RGB, or even use 4:2:2 16 bit YUV graphics if that is enough color for you, giving you the ability to access, process and mix 4x1080p screens/display buffers simultaneously in real time (16 bit color, ram at 400MHz) with still some free access time for a CPU core.
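
As a rough sanity check of the 4x1080p figure above, here is a small Python sketch. It assumes 60Hz sources, counts active pixels only (no blanking), treats each buffer as being accessed once per frame, and uses the theoretical peak of a 16 bit DDR3 at 400MHz, ignoring refresh and row activation overhead. The function names are made up for illustration and are not part of the controller.

```python
def stream_bytes_per_sec(width, height, fps, bits_per_pixel):
    """Active-pixel bandwidth of one video stream in bytes per second."""
    return width * height * fps * bits_per_pixel // 8


def ddr3_peak_bytes_per_sec(ck_mhz, bus_bits):
    """Theoretical peak: two transfers per DDR3 clock, bus_bits wide."""
    return ck_mhz * 1_000_000 * 2 * bus_bits // 8


one_stream = stream_bytes_per_sec(1920, 1080, 60, 16)
four_streams = 4 * one_stream
peak = ddr3_peak_bytes_per_sec(400, 16)

print(f"one 1080p60 16bit stream : {one_stream / 1e6:.1f} MB/s")
print(f"four streams             : {four_streams / 1e6:.1f} MB/s")
print(f"DDR3 peak (400MHz, x16)  : {peak / 1e6:.1f} MB/s")
print(f"share of peak used       : {100 * four_streams / peak:.1f} %")
```

Under these assumptions four 16 bit streams need roughly 62% of the theoretical peak, while the same four streams at 32 bit would exceed it, which is consistent with the suggestion to drop to 16 bit color.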

  The 128bit DDR3 access port is the only way to ensure you saturate your read and writes access to the DDR3 memory making a non-stop sequential burst.  You may also organize the available DDR3 ram so that you place a different graphics buffer or video frame in a different DDR3 bank location to optimize simultaneous frame access.

Yes, just re-hack up my 'BHG_vpg' and 'BrianHG_display_rmem' -> 'BHG_vpg_IN' and 'BrianHG_display_Wmem' as they already have 90% of what you need.

On the input side, you may 'crop' the DVE from the source with x-on and x-off start and stop coordinates.  You would need to adjust the DDR3 write width in your 'BrianHG_display_Wmem', or, just do everything in the  'BrianHG_display_Wmem' with the knowledge that you will start and stop every 4 pixels on the x axis for 32bit color.  Y axis will be adjustable line by line.

*Note that I did not include methods of figuring out what your source resolution is or interlace support.
« Last Edit: October 01, 2021, 03:17:52 am by BrianHG »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #73 on: October 01, 2021, 05:35:16 am »
@Wiljan, one mistake:

Quote
just do everything in the  'BrianHG_display_Wmem' with the knowledge that you will start and stop every 4 pixels on the x axis for 32bit color.

My mistake, I forgot that you have the 'write mask' capability which will allow pixel precision writes down to 8 bit pixels while still using my multiport in 128bit mode to retain full sequential burst speed.  It's only the question of 128bit/4 pixel horizontal alignment.  There are ways to solve this as well at the CMD_CLK stage with a 4 position 224 bit to 128 bit shift register.

It is better to try to do any cropping and computation in the CMD_CLK domain while the sampler input to the dual port line buffer memory should have absolute minimal logic.
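
To illustrate the write mask idea, here is a minimal Python model that turns a horizontal pixel span into per-word byte masks for a 128 bit port. The byte ordering (pixel 0 in the lowest bytes) and the mask polarity (bit n = 1 enables byte n) are assumptions made for this sketch, and `masked_writes` is a hypothetical helper; check the controller's port description for the real convention.

```python
BYTES_PER_WORD = 16  # one 128 bit DDR3 port word


def masked_writes(x_start, x_stop, bytes_per_pixel=4):
    """Yield (word_index, byte_mask) pairs covering pixels x_start..x_stop-1.

    Assumed convention (not taken from the controller source): pixel 0 sits
    in the lowest bytes of the word and mask bit n = 1 enables byte n.
    """
    if x_stop <= x_start:
        return  # empty span, nothing to write
    first_byte = x_start * bytes_per_pixel
    last_byte = x_stop * bytes_per_pixel  # exclusive
    first_word = first_byte // BYTES_PER_WORD
    last_word = (last_byte - 1) // BYTES_PER_WORD
    for word in range(first_word, last_word + 1):
        base = word * BYTES_PER_WORD
        lo = max(first_byte, base) - base
        hi = min(last_byte, base + BYTES_PER_WORD) - base
        mask = ((1 << (hi - lo)) - 1) << lo
        yield word, mask


# Pixels 3..8 of a 32bit line: a partial word, a full word, a partial word.
for word, mask in masked_writes(3, 9):
    print(f"word {word}: mask {mask:016b}")
```

Only the first and last word of a span carry a partial mask; every word in between is a full 128 bit write, so the sequential burst is preserved.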
« Last Edit: October 01, 2021, 05:54:32 am by BrianHG »
 

Offline Wiljan

  • Regular Contributor
  • *
  • Posts: 230
  • Country: dk
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #74 on: October 01, 2021, 05:53:26 pm »
Thank you Brian, I will for sure reuse most of your great code.
I will let you know when I  have some progress.
Thx 👍
 

Offline Wiljan

  • Regular Contributor
  • *
  • Posts: 230
  • Country: dk
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #75 on: October 04, 2021, 08:59:11 am »
@BrianHG

I have been trying over the weekend to make some simple tweaks just to see that I actually get something into the FPGA over HDMI

I did remove the RS232 debug part

Changed the GPIO's to only input

Control signals from the HDMI board goes to LED

To be sure frequency is not a problem, I use an 800x600@60 HDMI input (instead of 1920x1080@60)
I can measure the PIXCLK, H, V on the LEDs with a scope, so I for sure have some input to the board

(no FIFO yet )

I then tried to just write to the DDR based on the PIXCLK and expected to see some garbage on the output... none, just the internal test signals.


I tried to write fixed 128bit  FF's to the DDR  based on the CMD_Clk still no garbage on the output just test signals

I have removed the Scroll and have fixed 0x0000, 0x0000 to be sure I do see top left (0,0) on the output

I tried to reduce the DDR to only use 1 read port .... then the test signal goes red, so I must have missed changing some signals, so I am back on 2xR and 2xW

So to be honest I have to say I'm a bit stuck right now and could use some inspiration. Is there any chance you could make the skeleton to feed in the external HDMI?

I will post some images how I have connected the hardware if other could be interested to have external HDMI input to the DECA board

Thank you
Wiljan


 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #76 on: October 04, 2021, 09:04:53 am »
If you do not have a scope or logic analyzer and you want to check your input signals, or any other signals in the system, just set up Quartus SignalTap.  It will give you a multichannel real-time logic analyzer through the JTAG connection right into Quartus.

You should be able to scope your source video clocked data, HS, DE and some of the data bus for testing as well as locking onto HS and VS if you like.  (This includes all DDR3 internal bus connections as well as a few other goodies...)

As for video output through HDMI, you will need to set its PLL with valid settings, plus I recommend keeping to 720p or 480p standards unless you change the HDMI transmitter from HDMI mode to DVI mode.

Keep the DDR3 at 400MHz or 300MHz.  300MHz may take less time to compile.
« Last Edit: October 04, 2021, 09:10:37 am by BrianHG »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #77 on: October 04, 2021, 09:14:13 am »
Quote
So to be honest I have to say I'm a bit stuck right now and could use some inspiration. Is there any chance you could make the skeleton to feed in the external HDMI?
Take a look at the 'BrianHG_DDR3_DECA_Show_1080p' project.  That project just displays ram at 1920x1080.
Like I said, I made this stuff for everyone to figure out.
 

Offline Wiljan

  • Regular Contributor
  • *
  • Posts: 230
  • Country: dk
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #78 on: October 04, 2021, 12:49:59 pm »
I do have a 4ch Keysight scope and I do have the HDMI input signals PIXCLK, HS, VS and R,G,B data in the FPGA, so that part is fine. The HDMI output at 1920x1080@60 is fine as well.

I have never played with SignalTap, but it is for sure something I will look into

I have just tried to hook up the 'BrianHG_DDR3_DECA_Show_1080p' and the rs232 debugger to the DECA board and that works as well I can change  pixels and save / load images from PC over the rs232

Have attached few images of the hardware

DigiKey part numbers
PART: 1528-1452-ND MFG : Adafruit Industries LLC / 2219 DESC: TFP401 HDMI/DVI DECODE 40PIN TTL
PART: 1528-2243-ND MFG : Adafruit Industries LLC / 2098 DESC: 40-PIN FPC EXTENSION BRD W/CABLE
PART: 1528-4905-ND MFG : Adafruit Industries LLC / 4905 DESC: 40-PIN FPC TO STRAIGHT 2X20 IDC

My goal is to combine 2x HDMI input (cutout / cropped) to 1 HDMI output signal  all 1920x1080

But I will be happy just to see 1 signal come through  :scared:

« Last Edit: October 04, 2021, 12:56:36 pm by Wiljan »
 
The following users thanked this post: BrianHG

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #79 on: October 04, 2021, 01:08:42 pm »
It's not too difficult.  You will make it.  Begin with just getting 1 picture onscreen.  Just take a look at my module which draws graphics.  It's only if you have multiple 1080p signals simultaneously that writing 32bit pixels will be too slow at the 100MHz bus.  This is why I mentioned writing 128bits at a time, which means 32bit pixels are written at 4x speed.

Note that my ellipse drawing engine has an X/Y coordinate to address generator which is a little complex.  You do not need to go this far.  You only need a reset X&Y position, and add the Y coordinate by a fixed amount once every HS.  The address counter adds for every pixel written to create the X axis.
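
The simple capture address counter described above can be modelled in a few lines of Python before committing it to HDL. This is only a behavioural sketch: the class name, the 32bit pixel size and the 4096 pixel line stride are illustrative assumptions, not values taken from the DDR3 controller sources.

```python
class CaptureAddressGenerator:
    """Linear write-address counter for a video capture, as a software model."""

    def __init__(self, base_addr, line_stride, bytes_per_pixel=4):
        self.base_addr = base_addr
        self.line_stride = line_stride      # bytes between line starts in ram
        self.bytes_per_pixel = bytes_per_pixel
        self.vsync()

    def vsync(self):
        """Start of frame: reset the X and Y position."""
        self.line_start = self.base_addr
        self.addr = self.base_addr

    def hsync(self):
        """End of line: step Y by a fixed amount and return X to the left."""
        self.line_start += self.line_stride
        self.addr = self.line_start

    def pixel(self):
        """Return the address for this pixel, then advance along X."""
        current = self.addr
        self.addr += self.bytes_per_pixel
        return current


# Assumed layout: 4096 pixels of 4 bytes per display line.
gen = CaptureAddressGenerator(base_addr=0x0000, line_stride=4096 * 4)
first_line = [gen.pixel() for _ in range(3)]
gen.hsync()
second_line = [gen.pixel() for _ in range(3)]
print([hex(a) for a in first_line])
print([hex(a) for a in second_line])
```

In hardware the same three events map onto VS, HS and the active pixel enable, with no multiplier needed since Y only ever advances by a constant.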

To begin, try a 720p or 480p source image to sample and copy my 32bit pixel write mode in the ellipse drawing engine.

It's fine to post results / examples and things you created with my DDR3 controller here.

If you are looking for in-depth help on coding techniques for sampling video, make a separate new thread as this thread should stick with my DDR3 controller issues or results/success stories.


BTW, with a Cyclone III of similar size to the DECA's MAX10 and DDR2, I did make a complete 2 video in, 1 video out scaler with PIP, each window crop-able and zoom in and out with test patterns, bi-linear filtering and picture enhancement and color processing, controlled through ethernet.  Though, the DDR2 bus width was a 128bit wide ram module, not a single 16bit wide chip.  I guess at 500MHz with 16bit color instead of 32bit color, you could achieve the same with the DECA board.
« Last Edit: October 04, 2021, 01:24:10 pm by BrianHG »
 

Offline Wiljan

  • Regular Contributor
  • *
  • Posts: 230
  • Country: dk
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #80 on: October 09, 2021, 05:34:57 pm »
Some progress, even though I'm out of time at the moment
1 x 800x600@60 is fed in

 There are some small errors here and there but at least there is an image on the output  :D
 
The following users thanked this post: BrianHG

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #81 on: October 09, 2021, 11:48:41 pm »
Quote
Some progress, even though I'm out of time at the moment
1 x 800x600@60 is fed in

 There are some small errors here and there but at least there is an image on the output  :D

Wow, a 90 degree rotate.
The nasty non-sequential access preventing clean long bursts must be a killer unless you have worked around that.  I know a number of dedicated ways to work around this and get full performance, but they are advanced techniques.  For a first timer, even with small errors, that is still a great start.

Is that real-time?
Double buffered?
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #82 on: October 10, 2021, 01:02:47 pm »
Feature update:

******************************************
******** Finally, V1.00 release here: ***********
******************************************
https://www.eevblog.com/forum/fpga/brianhg_ddr3_controller-open-source-ddr3-controller/

Things to do:

a)  I will be contacting Intel's technical support about Cyclone V's poor 60% speed FMAX performance for my 1 multiport section in my design as seen in the above screenshots with the red arrow.  I'll see if something can be done.

b)  As described in my v0.95 notes, I will look into designing my simpler pyramid stack-able 2:1 multiport module aimed to achieve an FMAX of at least 200MHz allowing multiport running at Half rate interface controller speed, but with a loss of a few smart advanced features.

c)  I will download and install the latest Lattice Diamond and see if I can adapt and get my controller to compile and simulate there.  The LFE5U-45F/LFE5U-85F at 45kgate & 85kgate are just such a price bargain at 16$ and 36$ each respectively and if my DDR3 controller runs fast there, it is the next route to take.

I will be working on feature 'b)' this week.  I will be targeting 400MHz, not 200MHz.  This will make the module bare bones simple, but, for example, you should be able to run 32bit read and writes at full 400MHz speed saturating the DECA's 800MHz 16bit DDR3, or you should be able to run the port at 200MHz, 64 bit and still saturate DECA's DDR3.  The new multiport head end should also get Altera's Cyclone V running to speed as my DDR3 phy is already fast enough, it was just the multiport module which was the bottleneck.
 

Offline Wiljan

  • Regular Contributor
  • *
  • Posts: 230
  • Country: dk
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #83 on: October 11, 2021, 06:57:11 am »
Quote
Wow, a 90 degree rotate.

Is that real-time?
Double buffered?

No double buffer, I write directly to 2 x 32bit ports where I have swapped the x1, y1 on the rotate one.
Real time, maybe... had no time to test with moving video yet, will not be home until next week
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #84 on: October 13, 2021, 09:41:32 am »
DDR3 V1.5 engineering update:

     New high FMAX speed multiport front end 'MUX' called BrianHG_DDR3_COMMANDER_4x1.sv.  Unlike the earlier commander, each port input is a read/write channel combined.  Each channel input is identical to my core DDR3 controller's 'BrianHG_DDR3_PHY_SEQ.sv' SEQ_*** inputs.  This will allow you to use additional BrianHG_DDR3_COMMANDER_4x1.sv controllers to drive another one down in the chain, making extremely huge port counts if needed.  2:1 mode will offer the greatest possible FMAX while 4:1 will still offer a good FMAX, but allow large port counts with fewer modules.   My 'USE_TOGGLE_INPUT' and 'USE_TOGGLE_OUTPUT' parameters will allow you to clock each module in a different clock domain.  For example, connecting right to the 'BrianHG_DDR3_PHY_SEQ.sv', you may use a 2:1 module running at 400MHz.  On that first layer module, on port (A) you may use another 2:1 running at 400MHz while on port (B) you may run another MUX in 4:1 mode at 200MHz, giving you a total of 2x400MHz read/write ports and 4x200MHz read/write ports.  *Note that crossing clock domain boundaries will only compile with good FMAX results when using PLL clock frequencies in powers of 2.


Code: [Select]
// Features:
//
// - Input and output ports identical to the BrianHG_DDR3_PHY_SEQ's interface with the optional USE_TOGGLE_CONTROLS
//
// - 2 to 4 Read/Write ports in, 1 port out with user set burst length limiter with read req vector/destination pointer.
// - Designed for high FMAX speed.
// - Designed to be pyramid stacked offering maximum speed 2 R/W ports with 1 COMMANDER_4x1 module, 4 ports using 3 modules,
//   8 ports using 7 modules, 16 ports using 15 modules, or, medium speed 4:1 offering 4 ports with 1 module, or 16 ports
//   using 5 modules...  3:1 mode offers a middle ground of speed VS density VS chosen FPGA speed grade.
//
// - 2 command input FIFO on each port.
// - 16 or 32 stacked read commands for DDR3 read data delay.
// - Separate cached read and write BL8 block.
// - Adjustable write data cache dump timeout.

     Note that now when assessing/configuring a port priority and maximum sequential burst length, unlike the original 16 port commander, you will now need to assess each set of priorities going down through the chain when you stack multiple MUX commanders together.
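
The module counts quoted in the feature list for pyramid stacking can be reproduced with a short Python sketch. The helper names are hypothetical, and the count is the minimum number of modules, assuming a partly used module is allowed and ports may enter the pyramid at different layers.

```python
def modules_needed(ports, fan_in):
    """Minimum number of fan_in:1 mux modules to merge `ports` into one.

    Each module replaces up to `fan_in` streams with one, so it removes at
    most fan_in - 1 streams; we need to remove ports - 1 in total.
    """
    if ports < 1 or fan_in < 2:
        raise ValueError("need ports >= 1 and fan_in >= 2")
    return -(-(ports - 1) // (fan_in - 1))  # ceiling division


def layers_needed(ports, fan_in):
    """Depth of the pyramid: layers a request crosses to reach the PHY."""
    layers, reach = 0, 1
    while reach < ports:
        reach *= fan_in
        layers += 1
    return layers


for fan_in in (2, 3, 4):
    for ports in (2, 4, 8, 16):
        print(f"{fan_in}:1 mode, {ports:2d} ports -> "
              f"{modules_needed(ports, fan_in):2d} modules, "
              f"{layers_needed(ports, fan_in)} layers")
```

The layer count matters as much as the module count, since every layer a request passes through is another set of priorities and burst limits to configure.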
« Last Edit: October 14, 2021, 03:08:14 pm by BrianHG »
 

Offline Wiljan

  • Regular Contributor
  • *
  • Posts: 230
  • Country: dk
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #85 on: October 23, 2021, 11:00:00 am »
Hi Brian I'm back and I have made some changes.

SignalTap is very useful, thank you for mentioning it  :)

Tested with moving video and the rotate was nowhere near real time ... right now I'm not interested in rotate but in 2 straight inputs
But for sure I would like to scale and rotate later

I have removed the flat cable and added in single wires (same length) to fit the GPIOs
I have 2 HDMI inputs running 1920x1080@60 non sync in parallel from 2 BrightSign players

I place the 2 inputs side by side in the 4K buffer and can scroll to see the Left / Right transition and it is pretty OK
I had some PSU issues and have split across more PSUs to avoid interference

I do have some noise here and there in the picture ... you can see in the black hole on the video
Not sure why, but I suspect the "wires" and potential wrong terminations

I would like to write to the DDR as 128 bit instead of 32 bit to lower the traffic to the DDR

Attached is the Quartus project

Link for video
 
The following users thanked this post: BrianHG

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #86 on: October 23, 2021, 02:40:36 pm »
     Years and years ago, I also transferred 1080p parallel through flex cables.  You are already at the limit of what can be transmitted perfectly clean, not counting your hand wired jumpers.  I usually had to invert the incoming clock depending on source resolution to avoid corrupt pixel captures.

     I'm almost finished with my new multiport.  It is virtually compatible with the old one except each port is a read and write port, the max is 4:1 per multiport unit, but you may have a multiport 4:1's output now feed an input of another 4:1 down in the chain, offering 16 ports with a 2 layer pyramid stack.  IE 4 units in 4:1 mode, whose 4 outputs drive another 4:1's inputs at the top of the chain, while that one feeds the DDR3_PHY controller module.  The advantage here is you now can run the multiport's CMD_CLK in half speed mode up to 250MHz, all 16 ports, instead of the current limit of ~100MHz once you pass 4 IO ports.

   Running in half speed mode instead of quarter means that to completely fill the DDR3 bandwidth, you only need 64bit bus at 200MHz instead of 128bit bus at 100MHz.  With the multiport in 2:1 mode, IE: 400MHz CMD_CLK, you can achieve full DDR3 bandwidth with a 32bit bus, but, the on-FPGA M9K blockram's speed limit is 330MHz, so, no matter what you do, you are stuck with 200MHz mode, or 250MHz if you overclock the FPGA to 500MHz DDR3.
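
The equivalence between the port widths and clock rates mentioned above is simple arithmetic, sketched here in Python. It assumes the DECA's 16 bit wide DDR3 and one port word transferred per CMD_CLK; the function names are only for illustration.

```python
def ddr3_mbit_per_sec(ck_mhz, dq_bits=16):
    """Peak DDR3 throughput: two transfers per clock on dq_bits data pins."""
    return ck_mhz * 2 * dq_bits


def port_mbit_per_sec(bus_bits, cmd_clk_mhz):
    """Peak throughput of a multiport interface, one word per CMD_CLK."""
    return bus_bits * cmd_clk_mhz


ddr3_400 = ddr3_mbit_per_sec(400)
for name, bus_bits, clk in (("quarter rate", 128, 100),
                            ("half rate", 64, 200),
                            ("full rate", 32, 400)):
    port = port_mbit_per_sec(bus_bits, clk)
    print(f"{name:12s}: {bus_bits:3d} bit x {clk} MHz = {port} Mbit/s "
          f"(DDR3 peak {ddr3_400} Mbit/s)")
```

The same calculation gives 16000 Mbit/s for both a 500MHz DDR3 clock and a 64 bit port at 250MHz, matching the overclocked half-rate case.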


Because of your wiring, remember to at least single if not double D-Flipflop all your inputs from your HDMI receiver boards before you feed any logic, and use the attribute (*useioff=1*) on those inputs, example:
Code: [Select]
(* useioff = 1 *) input  logic         Z80_CLK,           // Z80 clock signal (8 MHz)
(* useioff = 1 *) input  logic [21:0]  Z80_ADDR,          // Z80 22-bit address bus
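
For the double D-flipflop part, a minimal sketch might look like the following.  The module name, port names and the 27 bit width are assumptions for illustration only, and it is written as if these ports were the top-level FPGA pins; check the fitter report to confirm whether the first register really lands in the IO cell on your device.

```systemverilog
// Hypothetical helper, not part of the BrianHG_DDR3 package.
// Registers an external parallel bus twice in its own capture clock domain.
module hdmi_in_regs #(
    parameter int WIDTH = 27                    // assumed: all data + sync inputs of one receiver
)(
    input  logic                               pix_clk,   // pixel clock from the HDMI receiver
    (* useioff = 1 *) input  logic [WIDTH-1:0] pin_in,    // raw FPGA input pins
    output logic [WIDTH-1:0]                   data_out   // registered copy for downstream logic
);

    logic [WIDTH-1:0] stage1;

    always_ff @(posedge pix_clk) begin
        stage1   <= pin_in;    // 1st flipflop, the one intended for the IO cell
        data_out <= stage1;    // 2nd flipflop, gives the fitter slack to route into the fabric
    end

endmodule
```

Only 'data_out' should be used by the rest of the design, never 'pin_in' directly.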

Also, if your CLK inputs are not going to the FPGA's dedicated CLK input pins, try to keep all the data inputs in the same bank as the CLK signal which feeds them.  I know this can be a hassle with the DECA being pre-wired.  If your HDMI decoders have a DDR mode, this may help by keeping 15 inputs all clocked inside 1 IO bank instead of 27 inputs with one clock.
« Last Edit: October 23, 2021, 02:55:44 pm by BrianHG »
 

Offline mfro

  • Regular Contributor
  • *
  • Posts: 212
  • Country: de
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #87 on: October 30, 2021, 05:46:41 pm »
Played half the day with your DDR3 controller on a DECA board and just wanted to say thank you for that absolutely brilliant piece of work! :-+
Beethoven wrote his first symphony in C.
 
The following users thanked this post: BrianHG

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #88 on: November 01, 2021, 06:51:36 pm »
Preview Demo .sof programming files of DECA BrianHG_DDR3_Controller V1.5 for Arrow DECA eval board overclocked to 500MHz in Half-rate mode.
(Actual full v1.5 project files coming in 2 days.)

  >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D
  >:D  500MHz/1GTPS! with 250MHz multiport interface.  >:D
  >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D

Just open your JTAG programmer and add one of the following 3 files:
1. 'BrianHG_DDR3_DECA_500MHz_DDR3_v1.0_QR_GFX_1080p_v3.sof'
        -> DDR3_V1.0, 500MHz DDR_CK, Quarter Rate 125MHz Multiport & Ellipse Generator.

2. 'BrianHG_DDR3_DECA_400MHz_DDR3_V1.5_HR_GFX_1080p_v3.sof'
        -> DDR3_V1.5, 400MHz DDR_CK, Half Rate 200MHz Multiport & Ellipse Generator.

3. 'BrianHG_DDR3_DECA_500MHz_DDR3_V1.5_HR_GFX_1080p_NOELLIPSE.sof'
        -> DDR3_V1.5, 500MHz DDR_CK, Half Rate 250MHz Multiport & Random noise/Binary counter.

Note that the Ellipse generator function has a <200MHz bottleneck, so with demo programming file 3, only pressing buttons 0 or 1 will illustrate the DDR3 32 bit color 250MHz fill speed with random noise or the binary counter pattern.

Check the 'Program/Configure' box and click 'Start' to program.
The DECA's HDMI should output a 1080p image.

IMPORTANT NOTE:
If the picture is still or shows scrolling noise, just press buttons 0 or 1, or flip 'Switch 0' to enable drawing ellipses.  You have just powered up the demo in frozen picture mode and you are looking at the random contents of the memory at power-up.


Switch 0 = Enable/Disable drawing of ellipses.
Switch 1 = Enable/Disable screen scrolling.
Button 0 = Draw data from random noise generator.
Button 1 = Draw color image data from a binary counter.

https://github.com/BrianHGinc/BrianHG-DDR3-Controller
« Last Edit: November 02, 2021, 01:56:17 pm by BrianHG »
 

Offline promach

  • Frequent Contributor
  • **
  • Posts: 875
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #89 on: November 10, 2021, 05:03:23 pm »
What does it mean by 500MHz DDR_CK, Half Rate 250MHz Multiport ?
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #90 on: November 10, 2021, 06:25:27 pm »
This means my controller runs at 500MHz, or basically the PHY driving the DDR3 command pins is running at 500MHz, while the user interface, which has 16 read/write ports, is running at 250MHz.  This is actually overclocking the FPGA, as some timings come out in the red, i.e. negative slack.  My controller can achieve a true 100% positive slack at 400MHz PHY with the user interface running at 200MHz.  The older V1.0 could only achieve a user interface of around 100MHz, configured to ~3 read + 2 write user ports, with the DDR3 PHY running at 400MHz.

My v1.5 constructs a tree / branch stacked join + fork command section allowing a user configured full 16 read/write ports running the full 200MHz with 400MHz DDR3 PHY controller with enough breathing room to compile an unofficial but functional 250MHz 16 port user interface with 500MHz PHY.

Half-rate means my controller will accept a new command once every second DDR_CK clock.  Quarter-rate means my controller will accept a new command once every 4 DDR_CK clocks.  In other words, the rate describes the user interface clock frequency relative to DDR_CK.

My DDR3 v1.5 multiport section now generates a smarter version of Xilinx's illustration shown here on page 18, figure 2.2:
https://www.xilinx.com/support/documentation/user_guides/ug388.pdf
The difference is you just set the total port parameter and my controller is programmed to render that 'branched' system, but all at 128 bit with smart caching of bursts, allowing a superior FMAX to my DDR3 v1.0, which had all the ports at the first branch level, where they show configuration 5.  You may also configure the width of each branch if you do not require a top FMAX, but want fewer clocked join points between your RW port and the DDR3 PHY.

EXAMPLE:
Code: [Select]
// ************************************************************************************************************************************
// ****************  BrianHG_DDR3_COMMANDER_2x1 configuration parameter settings.
parameter int        PORT_TOTAL              = 2,                // Set the total number of DDR3 controller write ports, 1 to 4 max.
parameter int        PORT_MLAYER_WIDTH [0:3] = '{2,2,2,2},       // Use 2 through 16.  This sets the width of each MUX join from the top PORT
                                                                 // inputs down to the final SEQ output.  2 offers the greatest possible FMAX while
                                                                 // making the first layer width = to PORT_TOTAL will minimize MUX layers to 1,
                                                                 // but with a large number of ports, FMAX may take a beating.
// ************************************************************************************************************************************
// PORT_MLAYER_WIDTH illustration
// ************************************************************************************************************************************
//  PORT_TOTAL = 16
//  PORT_MLAYER_WIDTH [0:3]  = {4,4,x,x}
//
// (PORT_MLAYER_WIDTH[0]=4)    (PORT_MLAYER_WIDTH[1]=4)     (PORT_MLAYER_WIDTH[2]=N/A) (not used)          (PORT_MLAYER_WIDTH[3]=N/A) (not used)
//                                                          These layers are not used since we already
//  PORT_xxxx[ 0] ----------\                               reached one single port to drive the DDR3 SEQ.
//  PORT_xxxx[ 1] -----------==== ML10_xxxx[0] --------\
//  PORT_xxxx[ 2] ----------/                           \
//  PORT_xxxx[ 3] ---------/                             \
//                                                        \
//  PORT_xxxx[ 4] ----------\                              \
//  PORT_xxxx[ 5] -----------==== ML10_xxxx[1] -------------==== SEQ_xxxx wires to DDR3_PHY controller.
//  PORT_xxxx[ 6] ----------/                              /
//  PORT_xxxx[ 7] ---------/                              /
//                                                       /
//  PORT_xxxx[ 8] ----------\                           /
//  PORT_xxxx[ 9] -----------==== ML10_xxxx[2] --------/
//  PORT_xxxx[10] ----------/                         /
//  PORT_xxxx[11] ---------/                         /
//                                                  /
//  PORT_xxxx[12] ----------\                      /
//  PORT_xxxx[13] -----------==== ML10_xxxx[3] ---/
//  PORT_xxxx[14] ----------/
//  PORT_xxxx[15] ---------/
//
//
//  PORT_TOTAL = 16
//  PORT_MLAYER_WIDTH [0:3]  = {3,3,3,x}
//  This will offer a better FMAX compared to {4,4,x,x}, but the final DDR3 SEQ command has 1 additional clock cycle pipe delay.
//
// (PORT_MLAYER_WIDTH[0]=3)    (PORT_MLAYER_WIDTH[1]=3)    (PORT_MLAYER_WIDTH[2]=3)                   (PORT_MLAYER_WIDTH[3]=N/A)
//                                                         It would make no difference if             (not used, we made it down to 1 port)
//                                                         this layer width was set to [2].
//  PORT_xxxx[ 0] ----------\
//  PORT_xxxx[ 1] -----------=== ML10_xxxx[0] -------\
//  PORT_xxxx[ 2] ----------/                         \
//                                                     \
//  PORT_xxxx[ 3] ----------\                           \
//  PORT_xxxx[ 4] -----------=== ML10_xxxx[1] -----------==== ML20_xxxx[0] ---\
//  PORT_xxxx[ 5] ----------/                           /                      \
//                                                     /                        \
//  PORT_xxxx[ 6] ----------\                         /                          \
//  PORT_xxxx[ 7] -----------=== ML10_xxxx[2] -------/                            \
//  PORT_xxxx[ 8] ----------/                                                      \
//                                                                                  \
//  PORT_xxxx[ 9] ----------\                                                        \
//  PORT_xxxx[10] -----------=== ML11_xxxx[0] -------\                                \
//  PORT_xxxx[11] ----------/                         \                                \
//                                                     \                                \
//  PORT_xxxx[12] ----------\                           \                                \
//  PORT_xxxx[13] -----------=== ML11_xxxx[1] -----------==== ML20_xxxx[1] ---------------====  SEQ_xxxx wires to DDR3_PHY controller.
//  PORT_xxxx[14] ----------/                           /                                /
//                                                     /                                /
//  PORT_xxxx[15] ----------\                         /                                /
//         0=[16] -----------=== ML11_xxxx[2] -------/                                /
//         0=[17] ----------/                                                        /
//                                                                                  /
//                                                                                 /
//                                                                                /
//                                                       0 = ML20_xxxx[2] -------/
//
// ************************************************************************************************************************************
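
To see how many MUX layers a given PORT_MLAYER_WIDTH setting ends up using, here is a small simulation-only sketch.  It is not taken from the controller source; the function name is hypothetical, and the unused 'x' entries from the illustration are filled with 2 as a placeholder.

```systemverilog
// Simulation-only sketch.  Names are hypothetical, not from the controller source.
module mlayer_count_check;

    // Counts the MUX layers needed to funnel 'ports' inputs down to the single
    // SEQ output, given the join width of each layer.
    function automatic int mlayers_used(input int ports, input int width [0:3]);
        int remaining = ports;
        int layers    = 0;
        for (int i = 0; i < 4 && remaining > 1; i++) begin
            remaining = (remaining + width[i] - 1) / width[i];   // ceil(remaining / width[i])
            layers++;
        end
        return layers;
    endfunction

    localparam int W_44xx [0:3] = '{4, 4, 2, 2};   // {4,4,x,x}
    localparam int W_333x [0:3] = '{3, 3, 3, 2};   // {3,3,3,x}

    initial begin
        $display("{4,4,x,x}: %0d layers", mlayers_used(16, W_44xx));   // 2
        $display("{3,3,3,x}: %0d layers", mlayers_used(16, W_333x));   // 3
    end

endmodule
```

The extra layer in the {3,3,3,x} case corresponds to the 1 additional clock cycle of pipe delay mentioned in the illustration.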
« Last Edit: November 10, 2021, 06:30:03 pm by BrianHG »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #91 on: November 10, 2021, 07:08:57 pm »
I wonder if I could achieve a 'Full-rate' controller at 300MHz.  A 300MHz user interface reading/writing 32 bits of data could generate near-perfect (~98%) DDR3 data bus saturation on consecutive bursts with a 16bit ram, good for 300MHz 32 bit CPUs.
 

Offline promach

  • Frequent Contributor
  • **
  • Posts: 875
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #92 on: November 11, 2021, 01:23:14 am »
What do you exactly mean by Half-rate means my controller will accept a new command once every second DDR_CK clock. ?
« Last Edit: November 11, 2021, 03:24:30 am by promach »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #93 on: November 11, 2021, 01:40:09 am »
Quote
What do you exactly mean by Half-rate means my controller will accept a new command once every second DDR_CK clock. ?
Yes.

Half-rate means 2 things.  When the DDR3 is being run at 400MHz, (a) the processor which accepts user commands and (b) spits out DDR3 commands is running at 200MHz.  This part of my controller has always operated at half-rate.  The controller provides a busy signal if there are mandatory command delays required by the DDR3 and its input buffer memory has exceeded its stack.  Only my user multiport interface has been running in quarter-rate mode due to its multiplexer complexity, and I am currently enhancing its performance.

I only have a tiny pin-driving command timer running at the DDR3's 400MHz, which receives the stream of commands generated by the above 200MHz controller, called 'BrianHG_DDR3_CMD_SEQUENCER.sv' and simulated by 'BrianHG_DDR3_CMD_SEQUENCER_tb.sv'.
« Last Edit: November 11, 2021, 01:58:23 am by BrianHG »
 

Offline promach

  • Frequent Contributor
  • **
  • Posts: 875
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #94 on: November 11, 2021, 02:36:17 am »
Quote
When the DDR3 is being run at 400MHz, (a) the processor which accepts user commands and (b) spits out DDR3 commands is running at 200MHz.  This part of my controller has always operated at half-rate.

However the problem with using half-rate on the commands will result in DDR3 manufacturer timing violations.  For example, given that your DRAM is accepting an incoming 400MHz ck signal, but the DDR3 commands is arriving to the DRAM at a rate of only 200MHz.  This will cause issue such as tMRD violation where the DRAM is getting 2 consecutive MRS commands.

Please correct me if wrong.
« Last Edit: November 11, 2021, 03:24:02 am by promach »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #95 on: November 11, 2021, 02:50:04 am »
Quote
When the DDR3 is being run at 400MHz, (a) the processor which accepts user commands and (b) spits out DDR3 commands is running at 200MHz.  This part of my controller has always operated at half-rate.

However the problem with using half-rate on the commands will result in DDR3 manufacturer timing violations.  For example, given that your DRAM is accepting an incoming 400MHz ck signal, but the DDR3 commands is arriving to the DRAM at a rate of only 200MHz.  This will cause issue such as tMRD violation where the DRAM is getting 2 consecutive MRS commands.

Please correct me if wrong.
No.  I have a command output section running at the full 400MHz.  That section has a 2 word FIFO which takes in the stream of commands generated at 200MHz by the 'BrianHG_DDR3_CMD_SEQUENCER.sv' processor and outputs 1 DDR_CK wide commands at 400MHz.  Before sending out each received command in that 2 word 200MHz in, 400MHz out FIFO, it uses a look-up table to see how many clock cycles have passed since any previously sent commands, to know when it may be permitted to insert the next new command.
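
As a heavily simplified illustration of that gating idea (not the actual code in the PHY sequencer): the sketch below collapses everything to a single countdown, where the required spacing after each command is assumed to come from a look-up table outside the module.  The real controller tracks several command timers and the spacing depends on which command comes next; all module and signal names here are hypothetical.

```systemverilog
// Hypothetical sketch, not taken from the BrianHG_DDR3 source.
// Holds back the next command until a looked-up number of idle clocks has passed.
module cmd_gap_gate #(
    parameter int GAP_W = 5
)(
    input  logic             clk,        // DDR_CK-rate clock
    input  logic             rst,
    input  logic             cmd_valid,  // FIFO output holds a command
    input  logic [GAP_W-1:0] gap_after,  // looked-up idle clocks required after this command
    output logic             cmd_issue   // high for 1 clock: drive the command pins, pop the FIFO
);

    logic [GAP_W-1:0] wait_cnt;

    // A command may go out only once the previous command's spacing has elapsed.
    assign cmd_issue = cmd_valid && (wait_cnt == '0);

    always_ff @(posedge clk) begin
        if (rst)                 wait_cnt <= '0;
        else if (cmd_issue)      wait_cnt <= gap_after;        // start the spacing countdown
        else if (wait_cnt != '0) wait_cnt <= wait_cnt - 1'b1;  // count idle clocks
    end

endmodule
```

With 'gap_after' at 0, commands can go out on consecutive DDR_CK clocks; otherwise NOPs fill the gap.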
 
The following users thanked this post: promach

Offline promach

  • Frequent Contributor
  • **
  • Posts: 875
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #96 on: November 13, 2021, 04:15:47 am »
Quote
Before sending out each received command in that 2 word 200MHz in, 400MHz out FIFO, it uses a look-up table to see how many clock cycles since any previously sent commands to know when it may be permitted to insert the next new command.

I have pondered a bit on your sentence quoted above,
However, when exactly should a new command be "enqueued" into the mentioned FPGA FIFO ?

I asked this question because from my understanding, whether it is half-rate or quarter-rate, the FSM timing event for the initialization sequence will still need to be triggered one at a time.
This means that there is no point of having a 2 words depth FIFO.  The new generated command only needs to be stored in a 1 word depth FIFO (which is basically a register), released to the DRAM once timing is up.
and the next generated command will be "enqueued" into the FIFO at the beginning of the next FSM event ?

Please correct me if wrong.

So, why do you need to have half-rate when quarter-rate already does the same job pretty well enough for power optimization (due to lesser clock transition for a given amount of time passed) ?
« Last Edit: November 13, 2021, 04:41:51 am by promach »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #97 on: November 13, 2021, 05:18:30 am »
I pipe enqueue multiple user request commands.  There are situations where a new bank may be activated while a previous write was just sent and a current burst is taking place.  This activate command is allowed immediately after the previous write command.  Without the 2 word FIFO, I will always have a 'NOP' between that write and activate since I can only generate 200 million commands a second.  This allows stuffing commands where permitted on either immediate or odd DDR_CK clock cycles.  Enlarging that fifo to say 4 words would allow for typically the most compact command sequences possible being sent to the DDR3.  With a simple 1 word latch, commands will typically be spaced out on at least every 2nd DDR_CK.  For my design, the type of FIFO I need, FWFT type with acknowledge, has difficulty routing the acknowledge tied to 7 individual DDR3 command timers operating at 400MHz on Altera Cyclone devices.  My 400MHz side doesn't care about the commands it receives, only that each DDR3 command has a different set amount of time for each other possible new command coming in and it is not allowed to violate those minimum delay clock cycles depending on the next command to be sent.

You could say because of my mid FIFO, if it were a bit larger like 4 words enqueue, I have designed a hybrid half-rate controller with a full-rate controller's performance.  But with 2 words, I'm sort of stuck half way in-between where some situations are taken advantage of while others aren't.  One thing I cannot fix is the 'skew' or delay between receiving a user command and the length of pipe time it takes to get that command out to the DDR3, as my 'command sequencer' section is a 3-5 clock pipe running at 200 MHz.  (I have optimization parameters which can combine pipe stages at the cost of FMAX or FPGA size.)
« Last Edit: November 13, 2021, 05:35:53 am by BrianHG »
 
The following users thanked this post: promach

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #98 on: November 13, 2021, 05:23:47 am »
The stage piping and the rest of the coding of my controller is so efficient that, even overclocking the FPGA to 500MHz, even while drawing ellipses, the FPGA barely goes above room temperature without a heatsink.  At 400MHz, it barely consumes 200mW, never mind what 300MHz must consume.  Remember, it is the rate of change in logic state which consumes power, not the static state of the commands going through.

Take a really close look at when and how I even cycle my address and bank lines and control the OE timing and spacing of the data IO port and drive of the ODT line.  Everything is tuned for minimal transitions and proper central clearance and IO bus direction change with extra half cycle hold to achieve the cleanest, quietest, best possible communications with the DDR3.  Error free 500 MHz would have been otherwise impossible as Altera's max for a software DDRIO port is only supposed to be 300MHz.
« Last Edit: November 13, 2021, 05:30:34 am by BrianHG »
 
The following users thanked this post: promach

Offline promach

  • Frequent Contributor
  • **
  • Posts: 875
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #99 on: November 13, 2021, 06:36:46 am »
Quote
I pipe enqueue multiple user request commands.  There are situations where a new bank may be activated while a previous write was just sent and a current burst is taking place.  This activate command is allowed immediately after the previous write command.  Without the 2 word FIFO, I will always have a 'NOP' between that write and activate since I can only generate 200 million commands a second.  This allows stuffing commands where permitted on either immediate or odd DDR_CK clock cycles.

Could ACTIVATE command for a new bank be issued to DRAM when a write burst for other bank is still ongoing ?
A check on ACTIVATE timing does not suggest so though.

Besides, why odd DDR_CK clock cycles when 2 words depth FIFO is used ?


Quote
Enlarging that fifo to say 4 words would allow for typically the most compact command sequences possible being sent to the DDR3.  With a simple 1 word latch, commands will typically be spaced out on at least every 2nd DDR_CK.

Why 4 words depth FIFO does not have the every 2nd DDR_CK concern ?


Quote
You could say because of my mid FIFO, if it were a bit larger like 4 words enqueue, I have designed a hybrid half-rate controller with a full-rate controller's performance.  But with 2 words, I'm sort of stuck half way in-between where some situations are taken advantage of while others arent.

I am confused with which other situations are not taken advantage of ?
« Last Edit: November 13, 2021, 07:12:52 am by promach »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #100 on: November 13, 2021, 07:56:17 am »


Please, just read the entire DDR3 datasheet and try things for yourself.
« Last Edit: November 13, 2021, 07:57:59 am by BrianHG »
 
The following users thanked this post: promach

Offline promach

  • Frequent Contributor
  • **
  • Posts: 875
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #101 on: November 23, 2021, 03:34:35 am »
Quote
You could say because of my mid FIFO, if it were a bit larger like 4 words enqueue, I have designed a hybrid half-rate controller with a full-rate controller's performance.  But with 2 words, I'm sort of stuck half way in-between where some situations are taken advantage of while others arent.

Which other situations are not taken advantage of ?


I understand that you are using synchronous FIFO in the above case of in-between commands. 
However, what about synchronizing asynchronous incoming multi-bits DQ signals from DRAM into FPGA ?
I suppose you would need an asynchronous FIFO ?
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #102 on: November 23, 2021, 03:49:42 am »
For example, looking at my above screenshot, if my activate of bank 1 was done just before the last write of bank 0, the continuing write would not need that little blue gap where I wrote 'ongoing burst' in red, as bank 1 will already have been fully activated by the time the continuing write switches into it.  We can extend this throughout: while writing in bank 7 and next switching to a new bank 0, if needed, while still write bursting in bank 7, we can send a 'precharge' command, then the new activate for bank 0 while still writing into 7, then seamlessly transition your write into the newly activated bank 0 without any pause.  You can literally continuously read and write to DDR3 with strategic plan-ahead precharges and activates, making the ram access as continuous as static ram, with the 1 caveat that every time you switch between a read burst and a write burst, there are a few dead 0 access clock cycles as the DDR3 needs time to transition from input to output.

During an unbroken burst, you send a command every 4 clocks to maintain optimum efficiency.  This means with a full rate controller, you can stuff 3 new commands in-between.  With a half-rate controller, your controller can only be fast enough to add 1 command in-between.  The other advantage of in-between commands is that you can activate and precharge unused banks in an effort to manually refresh the DDR3, allowing the development of a controller which may almost never need to waste any bus cycle time, if your application processor is designed to access the DDR3 in a sequential burst manner.
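
The slot counting above can be checked with a tiny simulation-only sketch.  It assumes a burst occupies 4 DDR_CK clocks and ignores any command FIFO on the fast side; the quarter-rate line is an extrapolation of the same formula, and the names are hypothetical.

```systemverilog
// Simulation-only sketch.  Names are hypothetical.
module burst_slot_check;

    localparam int BURST_CK = 4;   // DDR_CK clocks between commands of an unbroken burst

    // rate_div: 1 = full rate, 2 = half rate, 4 = quarter rate.
    // Returns how many extra commands the controller can generate between two burst commands.
    function automatic int spare_slots(input int rate_div);
        return BURST_CK / rate_div - 1;
    endfunction

    initial begin
        $display("full rate    : %0d spare command slots", spare_slots(1));   // 3
        $display("half rate    : %0d spare command slots", spare_slots(2));   // 1
        $display("quarter rate : %0d spare command slots", spare_slots(4));   // 0
    end

endmodule
```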

When properly done, a smart controller and an application which can take advantage of bursting and knowledge of DDR3 banks can create a system with access close to the performance of high speed static ram.

This is why both my ram controller and Altera's as well have a parameter to set the address location of the 'BANK'-'ROW'-'COLUMN' order in the controller's addressing scheme.
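
To illustrate what such an address ordering choice does, here is a minimal sketch.  The module, parameter and port names are made up for this example, and the 15 row / 3 bank / 10 column widths are an assumption; this is not the parameter interface of either controller.

```systemverilog
// Hypothetical sketch of splitting a linear address into DDR3 bank / row / column.
module addr_order_sketch #(
    parameter int ROW_W        = 15,   // assumed row address width
    parameter int BANK_W       = 3,    // 8 banks
    parameter int COL_W        = 10,   // assumed column address width
    parameter bit ROW_BANK_COL = 1     // 1 = ROW-BANK-COLUMN, 0 = BANK-ROW-COLUMN
)(
    input  logic [ROW_W+BANK_W+COL_W-1:0] addr,
    output logic [BANK_W-1:0]             bank,
    output logic [ROW_W-1:0]              row,
    output logic [COL_W-1:0]              col
);

    always_comb begin
        if (ROW_BANK_COL) {row, bank, col} = addr;   // bank bits sit just above the column bits
        else              {bank, row, col} = addr;   // bank bits are the topmost address bits
    end

endmodule
```

With the ROW-BANK-COLUMN order, a sequential access rolls into the next bank each time it runs off the end of a row, which is what lets the next bank be activated ahead of time.  With BANK-ROW-COLUMN, a sequential access stays inside one bank and has to wait out a precharge and activate at every row change.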
« Last Edit: November 23, 2021, 03:53:56 am by BrianHG »
 
The following users thanked this post: promach

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #103 on: November 23, 2021, 06:01:10 am »
Quote
I understand that you are using synchronous FIFO in the above case of in-between commands. 
However, what about synchronizing asynchronous incoming multi-bits DQ signals from DRAM into FPGA ?
I suppose you would need an asynchronous FIFO ?

Actually, for the DDR DQ, in my 400MHz command out section, I decode the instructions being sent out to detect when a read or write command is being sent.  That r/w decode will schedule my read data and write data FIFO serializers to send or capture / receive DDR data at the right time.  It is the job of my controller to make sure the write data is ready for the write by the time the write data needs to be sent.  For the read, well, if you miss it when the acknowledge / read data ready comes in, you missed it.
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #105 on: December 04, 2021, 09:49:35 am »
Next I will make my own sync generator and replace the DECA example junk.
Fix a bug where only the current display mode of 1080p@32bit color functions properly.
And remove the 1-line dualport buffer for a minimal sized dual-port ram.
 

Offline promach

  • Frequent Contributor
  • **
  • Posts: 875
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #106 on: December 12, 2021, 10:04:22 am »
Quote
I pipe enqueue multiple user request commands.  There are situations where a new bank may be activated while a previous write was just sent and a current burst is taking place.  This activate command is allowed immediately after the previous write command. 

Could the same bank interleave mechanism happen for write operations ?
And if yes, then I suppose there is no need for such pipe enqueue stuff ?




Quote
every-time you switch between a read burst and write burst, there are a few dead 0 access clock cycles as the DDR3 needs time to transition from input to output.

Why few dead 0 access clock cycles ?

 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #107 on: December 12, 2021, 10:24:49 am »
Quote
I pipe enqueue multiple user request commands.  There are situations where a new bank may be activated while a previous write was just sent and a current burst is taking place.  This activate command is allowed immediately after the previous write command. 

Could the same bank interleave mechanism happen for write operations ?
And if yes, then I suppose there is no need for such pipe enqueue stuff ?




Quote
every-time you switch between a read burst and write burst, there are a few dead 0 access clock cycles as the DDR3 needs time to transition from input to output.

Why few dead 0 access clock cycles ?
#1, Yes.  Opening and closing banks is separate from reading and writing data in any bank's activated row.  You may mess around with all the other banks while you are still reading / writing on a different bank, or, at least give enough time for an ACT to become ready.
#2, Read the god damn DDR3 data sheet!  They have example illustrations on switching between read and write called read-to-write and write-to-read operations.  There are mandatory empty cycles as the DQ buffers and DQS switch direction and the DDR3 ram chip row amplifiers change drive current into the memory cap arrays.
 

Offline promach

  • Frequent Contributor
  • **
  • Posts: 875
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #108 on: December 14, 2021, 05:07:53 pm »
Quote
During an unbroken burst, you send a command every 4 clocks to maintain optimum efficiency.  This means with a full rate controller, you can stuff 3 new commands in-between.  With a half-rate controller, your controller can only be fast enough to add 1 command in-between. 

I think the number of in-between commands shall not be limited by whether it is full-rate or half-rate controller.
Since it would only be using simple if-else clocked logic (inside fast clock domain, maybe 500MHz in your case), your controller should be able to achieve such goal without suffering from STA setup timing violation.

Please correct me if wrong.
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #109 on: December 15, 2021, 03:21:04 am »
Quote
During an unbroken burst, you send a command every 4 clocks to maintain optimum efficiency.  This means with a full rate controller, you can stuff 3 new commands in-between.  With a half-rate controller, your controller can only be fast enough to add 1 command in-between. 

I think the number of in-between commands shall not be limited by whether it is full-rate or half-rate controller.
Since it would only be using simple if-else clocked logic (inside fast clock domain, maybe 500MHz in your case), your controller should be able to achieve such goal without suffering from STA setup timing violation.

Please correct me if wrong.
:-//  Ok, I have given you plenty enough already, just read my previous posts as the answer lies within.
Please stop asking for guidelines for altering your DDR3 controller here on my thread with my finished DDR3 controller.

    This thread is for those who have issues or need help implementing 'MY' controller in their designs, and for those who wish to share their success stories & example implementations using my DDR3 controller system.
« Last Edit: December 15, 2021, 11:36:53 am by BrianHG »
 
The following users thanked this post: voltsandjolts

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #110 on: December 15, 2021, 05:15:31 am »
Quote
Hi Brian I'm back and I have made some changes.

SignalTap is very useful, thank you for mentioning it  :)

Tested with moving video and the Rotate was nowhere near real time ... right now I'm not interested in Rotate but in 2 straight inputs
But for sure I would like to scale and rotate later

I have removed the flat cable and added single wires (same length) to fit the GPIOs
I have 2 HDMI inputs running 1920x1080@60 non sync in parallel from 2 BrightSign players

I place the 2 inputs side by side on the 4K buffer and can scroll to see the Left / Right transition and it is pretty OK
I had some PSU issues and have split to more PSUs to avoid interference

I do have some noise here and there in the picture ... you can see in the black hole on the video
Not sure why, but I suspect the "wires" and potential wrong terminations


I would like to write to the DDR as 128 bit instead of 32 bit to lower the traffic to the DDR

Attached is the Quartus project

Link for video

Not sure why, but Nockieboy also had some occasional missing pixels during block fills in his 8-bit GPU thread using my DDR3 V1.00.  If it is actually the same problem, when we updated to V1.50 on the 8-bit GPU thread, all empty pixel fills disappeared.  (Improved multiport design...)  Note that with 1.50, if you need to be backwards compatible to the old separate read and write ports, just hard wire the write enable on 2 separate ports and you will achieve the same function.  And, don't forget to ASSIGN '0' to all the unused inputs as shown in my new simplified block diagram.
 

Offline Wiljan

  • Regular Contributor
  • *
  • Posts: 230
  • Country: dk
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #111 on: December 15, 2021, 08:58:58 am »
Not sure why, but Nockieboy also had some occasional missing pixels during block fills in his 8-bit GPU thread using my DDR3 V1.00.  If it is actually the same problem, when we updated to V1.50 on the 8-bit GPU thread, all empty pixel fills disappeared.  (Improved multiport design...)  Note that with 1.50, if you need to be backwards compatible to the old separate read and write ports, just hard wire the write enable on 2 separate ports and you will achieve the same function.  And, don't forget to ASSIGN '0' to all the unused inputs as shown in my new simplified block diagram.

Thank you for letting me know that similar issues have been observed.

The project I was working on was to determine whether a video clip played across 2 HDMI outputs on the same PC card was actually in sync.

The HDMI outputs were processed through several video processors as 2 individual signals and ended up on a huge LED wall (20 m wide x 3 m high) working in 5 zones (processors).

So when I saw your DDR3 / HDMI output, and since I already had the 2 HDMI input board, where I had used a scope to measure the 2 x V-Sync (which was perfectly in sync), I got the idea to make the HDMI splitter showing half of HDMI A and half of HDMI B on the same HDMI output.

Because it was not working perfectly in time, we ended up using 2 x Bacho multi-format converters, HDMI to SDI (you can convert across all frame-rates / resolutions), so each does have a full frame of memory inside. Here too we saw the sync issue on fast horizontally moving content, so we decided to record the 2x SDI on some broadcast recorders and analyze frame by frame. At the same time we also added external gen-lock for the 2 Barcho units... and the problem was gone.
This confirmed the video out of the PC was perfect in sync  :)

So right now there is no need for the setup anymore. However, for my own exercise I would like to try the 1.5 when I get some time over Christmas.

If there is still noise, I will also try to "loop" HDMI input to HDMI output via the FPGA to see if the noise goes away.

But I'm aware that the wiring I have is no good for a reliable solution. I even wondered whether it would be worth making a PCB shield fitting the DECA board with 2 HDMI inputs. On the other hand, you can buy cheap 4x 2k input to 1x 4k output splitter units, and some of them have quite a lot of features.


 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #112 on: December 15, 2021, 09:37:02 am »
(you can convert across all frame-rates / resolutions), so each does have a full frame of memory inside. Here too we saw the sync issue on fast horizontally moving content, so we decided to record the 2x SDI on some broadcast recorders and analyze frame by frame. At the same time we also added external gen-lock for the 2 Barcho units... and the problem was gone.

Ignoring the issue with your HDMI decoder wiring to the DECA, you could do the same on the DECA going from 2 in to 1 except you would have to most likely use lower quality bi-linear or small bi-cubic scaling if you are resizing the source images.  But I do know with code I've done in the past, with the know-how, you can easily do better picture enhancement/processing routines in the DECA than whatever may be available in the mixing consoles you are currently using.  (Except for true 1080i upsampled motion-adaptive de-interlacing unless you dedicate the entire DECA completely to that 1 task.)

Quote
This confirmed the video out of the PC was perfect in sync  :)

This depends on videocard type, drivers and settings/selected video modes, it is not guaranteed.
 

Offline Wiljan

  • Regular Contributor
  • *
  • Posts: 230
  • Country: dk
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #113 on: December 15, 2021, 09:55:36 am »

Quote
This confirmed the video out of the PC was perfect in sync  :)

This depends on videocard type, drivers and settings/selected video modes, it is not guaranteed.

Absolutely .. it's an AMD dual head, and this is where all the discussion started about whether the problem was on the PC side or on the Wall side (I was in charge of the PC side, and a 3rd party of the Wall side).

I have a big interest in video processing, been working with broadcast for 30+ years  :o
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #114 on: December 31, 2021, 07:51:00 pm »
Ok, my BrianHG_GFX_VGA_Window_System is 90% functional.  Only the final layer alpha channel mixer is missing, as I am now just averaging windows for testing, but they are all there with all their features.

I paused here because the ravaging DDR3 memory access done by the BrianHG_GFX_VGA_Window_System, with multiple windows simultaneously open, each deliberately configured with an odd number of row pixels and set to eat up over 90% of the available bandwidth, causes the one DDR3 read port in use by the graphics system to randomly freeze.  It's time to debug the multiport section of my controller before I finish the last alpha-blend window layer mixing module of my window system.

A version 1.6 will soon be coming where I fix this DDR3 frozen read bug.
(Narrowed it down to using the multiport in Quarter-rate, the higher speed Half-Rate mode works fine.  It seems to be a congestion issue.)
« Last Edit: December 31, 2021, 10:04:02 pm by BrianHG »
 

Offline mfro

  • Regular Contributor
  • *
  • Posts: 212
  • Country: de
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #115 on: January 09, 2022, 02:52:14 pm »
Wanted to get serious with the BrianHG_DDR3_CONTROLLER after just playing (impressed  :-+) with it. Tried to replace a UniPHY DDR3 controller in one of my existing designs with it today, but failed miserably.

I'm a VHDL guy and it seems interfacing SystemVerilog designs with VHDL isn't really fully supported in Quartus. I implemented a VHDL component representing the BrianHG_DDR3_CONTROLLER_top module on the VHDL side that has the SystemVerilog parameters as VHDL generics but wasn't successful. It appears that it is not possible to map SystemVerilog parameters of any other types than plain integers or bit vectors.
E.g., I expected it to be possible to map a SystemVerilog bit parameter (such as BHG_OPTIMIZE_SPEED) onto a VHDL BIT generic, but all I get is
Code: [Select]
Error (10258): Verilog HDL error at BrianHG_DDR3_CONTROLLER_top.sv(116): unsupported type for Verilog parameter BHG_OPTIMIZE_SPEED
Tried to use an integer generic in the VHDL component instead. This was accepted on the VHDL side, but then failed on the SystemVerilog side as an invalid type.

Apparently, it is not possible to pass the SystemVerilog parameters as VHDL generics, so one either needs to do the parametrization on the SystemVerilog side or modify the interface.

Anybody more successful than myself in interfacing to VHDL?
Beethoven wrote his first symphony in C.
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #116 on: January 09, 2022, 04:12:01 pm »
In SystemVerilog, when I say:

parameter bit [x:y] OPTIMIZE_SPEED = z
Note that the 'bit [x:y]' usually can be omitted and changed into an 'int' and the code should still function.
However, I have seen such parameters passed through VHDL, as some mixed VHDL and Verilog projects do pass parameters.  Note that the 'bit [x:y]' parameter is similar to a logic/register with a limited bit range.  If the [x:y] is missing, it just means a single wire.

This is not my area of expertise.  Note that the Intel FPGA forum may have engineers who may have an answer for you.

If you are out of luck, one workaround that avoids modifying my code may be to add a dummy 'mfro' BrianHG_DDR3_CONTROLLER_top_pre-vhdl.sv box module, stuff all your parameters in there, and have your BrianHG_DDR3_CONTROLLER_top.vhd call that dummy box.

Also verify Altera's 'Compiler Settings' / 'VHDL Input' and try using the 'VHDL 2008' settings instead of the default 'VHDL 1993'.

IE: When I use that 'bit ***', instead of declaring the default parameters as 'constant integers', I am declaring them as 'constant standard logic' with so many bits.  This way, I plug them directly into my code and SystemVerilog will understand that I am passing, for example, 16/10/8 bit constants or 1 bit constant logic wires, limiting the user from inputting defaults outside my allotted scope/field/range.  There has got to be a way within VHDL to pass the same logic arguments.
« Last Edit: January 09, 2022, 05:12:50 pm by BrianHG »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #117 on: January 09, 2022, 05:44:34 pm »
@mfro, one thing you can try is to get Quartus to 'generate' a Verilog instantiation template file for my project.  The new .v generated by Quartus rewrites the parameters in the older style when calling my SystemVerilog module.  Maybe it will be easier to call that new .v from VHDL instead of my SystemVerilog directly.

Though, once you see how Quartus rewrote the parameters, maybe that will give you a clue on how to directly feed my module in VHDL.

Also, do not forget the reverse route.  Just wire my top design to the FPGA as I have in the examples, and within that, instantiate the rest of your VHDL project inside the FPGA top.sv with all the CMD_xxx and other FPGA IO ports wired to your VHDL instantiation in the top.sv.
« Last Edit: January 09, 2022, 05:49:27 pm by BrianHG »
 

Offline mfro

  • Regular Contributor
  • *
  • Posts: 212
  • Country: de
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #118 on: January 09, 2022, 06:40:34 pm »
Thanks, Brian. Got it to work, eventually (I now have a compiler crash, but that's another story). At least it passes the stage where it maps the parameters.

It appears the SystemVerilog single bit parameters don't map to std_logic or bit in VHDL generics, but to a std_logic_vector(0 to 0) instead (I studied the relevant chapter in the Quartus manual, which doesn't really tell you much about what maps to what, but the fact that it states the parameters are internally passed as strings inspired me to try a single bit vector instead).
Beethoven wrote his first symphony in C.
 
The following users thanked this post: BrianHG

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #119 on: January 10, 2022, 03:55:28 pm »
@mfro, one thing you can try is to get Quartus to 'generate' a Verilog instantiation template file for my project.  The new .v generated by Quartus rewrites the parameters in the older style when calling my SystemVerilog module.  Maybe it will be easier to call that new .v from VHDL instead of my SystemVerilog directly.

Doing a generate VHDL instantiation creates this file on one of my new projects: GPU_DECA_DDR3_top.cmp

Code: [Select]
-- Copyright (C) 2020  Intel Corporation. All rights reserved.
-- Your use of Intel Corporation's design tools, logic functions
-- and other software and tools, and any partner logic
-- functions, and any output files from any of the foregoing
-- (including device programming or simulation files), and any
-- associated documentation or information are expressly subject
-- to the terms and conditions of the Intel Program License
-- Subscription Agreement, the Intel Quartus Prime License Agreement,
-- the Intel FPGA IP License Agreement, or other applicable license
-- agreement, including, without limitation, that your use is for
-- the sole purpose of programming logic devices manufactured by
-- Intel and sold by Intel or its authorized distributors.  Please
-- refer to the applicable agreement for further details, at
-- https://fpgasoftware.intel.com/eula.


-- Generated by Quartus Prime Version 20.1 (Build Build 720 11/11/2020)
-- Created on Sun Jan 09 12:32:21 2022

COMPONENT GPU_DECA_DDR3_top
GENERIC ( GPU_MEM : INTEGER := 524288; ENDIAN : STRING := "Little"; PDI_LAYERS : STD_LOGIC_VECTOR(3 DOWNTO 0) := b"0001"; SDI_LAYERS : STD_LOGIC_VECTOR(3 DOWNTO 0) := b"0100";
ENABLE_TILE_MODE : STRING := "A(1,0,0,0,0,0,0,0)"; SKIP_TILE_DELAY : std_logic := '0'; ENABLE_PALETTE : STRING := "A(1,1,1,1,1,1,1,1)"; SKIP_PALETTE_DELAY : std_logic := '0';
HWREG_BASE_ADDRESS : INTEGER := 256; HWREG_BASE_ADDR_LSWAP : INTEGER := 240; PAL_BASE_ADDR : INTEGER := 4096; TILE_BYTES : INTEGER := 65536;
TILE_BASE_ADDR : INTEGER := 16384; FPGA_VENDOR : STRING := "Altera"; FPGA_FAMILY : STRING := "MAX 10"; BHG_OPTIMIZE_SPEED : std_logic := '1';
BHG_EXTRA_SPEED : std_logic := '1'; CLK_KHZ_IN : INTEGER := 50000; CLK_IN_MULT : INTEGER := 24; CLK_IN_DIV : INTEGER := 4;
DDR_TRICK_MTPS_CAP : INTEGER := 600; INTERFACE_SPEED : STRING := "Half"; DDR3_CK_MHZ : INTEGER := 300; DDR3_SPEED_GRADE : STRING := "-15E";
DDR3_SIZE_GB : INTEGER := 4; DDR3_WIDTH_DQ : INTEGER := 16; DDR3_NUM_CHIPS : INTEGER := 1; DDR3_NUM_CK : INTEGER := 1;
DDR3_WIDTH_ADDR : INTEGER := 15; DDR3_WIDTH_BANK : INTEGER := 3; DDR3_WIDTH_CAS : INTEGER := 10; DDR3_WIDTH_DM : INTEGER := 2;
DDR3_WIDTH_DQS : INTEGER := 2; DDR3_RWDQ_BITS : INTEGER := 128; DDR3_ODT_RTT : INTEGER := 40; DDR3_RZQ : INTEGER := 40;
DDR3_TEMP : INTEGER := 85; DDR3_WDQ_PHASE : INTEGER := 270; DDR3_RDQ_PHASE : INTEGER := 0; DDR3_MAX_REF_QUEUE : STD_LOGIC_VECTOR(3 DOWNTO 0) := b"1000";
IDLE_TIME_uSx10 : STD_LOGIC_VECTOR(6 DOWNTO 0) := b"0001010"; SKIP_PUP_TIMER : std_logic := '0'; BANK_ROW_ORDER : STRING := "ROW_BANK_COL"; PORT_ADDR_SIZE : INTEGER := 30;
PORT_TOTAL : INTEGER := 5; PORT_MLAYER_WIDTH : STRING := "A(2,2,2,2)"; PORT_VECTOR_SIZE : INTEGER := 16; READ_ID_SIZE : INTEGER := 4;
DDR3_VECTOR_SIZE : INTEGER := 5; PORT_CACHE_BITS : INTEGER := 128; CACHE_ADDR_WIDTH : INTEGER := 4; BYTE_INDEX_BITS : INTEGER := 11;
PORT_TOGGLE_INPUT : STRING := "A(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)"; PORT_R_DATA_WIDTH : STRING := "A(000001000,000001000,000010000,000010000,010000000,010000000,010000000,010000000,010000000,010000000,010000000,010000000,010000000,010000000,010000000,010000000)"; PORT_W_DATA_WIDTH : STRING := "A(000001000,000001000,000010000,000010000,010000000,010000000,010000000,010000000,010000000,010000000,010000000,010000000,010000000,010000000,010000000,010000000)"; PORT_PRIORITY : STRING := "A(11,10,00,00,10,00,00,00,00,00,00,00,00,00,00,00)";
PORT_READ_STACK : STRING := "A(16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16)"; PORT_W_CACHE_TOUT : STRING := "A(100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000)"; PORT_CACHE_SMART : STRING := "A(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)"; PORT_DREG_READ : STRING := "A(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)";
PORT_MAX_BURST : STRING := "A(100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000,100000000)"; SMART_BANK : std_logic := '0' );
PORT
(
ADC_CLK_10 : IN STD_LOGIC;
MAX10_CLK1_50 : IN STD_LOGIC;
MAX10_CLK2_50 : IN STD_LOGIC;
KEY : IN STD_LOGIC_VECTOR(1 DOWNTO 0);
LED : OUT STD_LOGIC_VECTOR(7 DOWNTO 0);
CAP_SENSE_I2C_SCL : INOUT STD_LOGIC;
CAP_SENSE_I2C_SDA : INOUT STD_LOGIC;
AUDIO_BCLK : INOUT STD_LOGIC;
AUDIO_DIN_MFP1 : OUT STD_LOGIC;
AUDIO_DOUT_MFP2 : IN STD_LOGIC;
AUDIO_GPIO_MFP5 : INOUT STD_LOGIC;
AUDIO_MCLK : OUT STD_LOGIC;
AUDIO_MISO_MFP4 : IN STD_LOGIC;
AUDIO_RESET_n : INOUT STD_LOGIC;
AUDIO_SCL_SS_n : OUT STD_LOGIC;
AUDIO_SCLK_MFP3 : OUT STD_LOGIC;
AUDIO_SDA_MOSI : INOUT STD_LOGIC;
AUDIO_SPI_SELECT : OUT STD_LOGIC;
AUDIO_WCLK : INOUT STD_LOGIC;
FLASH_DATA : INOUT STD_LOGIC_VECTOR(3 DOWNTO 0);
FLASH_DCLK : OUT STD_LOGIC;
FLASH_NCSO : OUT STD_LOGIC;
FLASH_RESET_n : OUT STD_LOGIC;
G_SENSOR_CS_n : OUT STD_LOGIC;
G_SENSOR_INT1 : IN STD_LOGIC;
G_SENSOR_INT2 : IN STD_LOGIC;
G_SENSOR_SCLK : INOUT STD_LOGIC;
G_SENSOR_SDI : INOUT STD_LOGIC;
G_SENSOR_SDO : INOUT STD_LOGIC;
HDMI_I2C_SCL : INOUT STD_LOGIC;
HDMI_I2C_SDA : INOUT STD_LOGIC;
HDMI_I2S : INOUT STD_LOGIC_VECTOR(3 DOWNTO 0);
HDMI_LRCLK : INOUT STD_LOGIC;
HDMI_MCLK : INOUT STD_LOGIC;
HDMI_SCLK : INOUT STD_LOGIC;
HDMI_TX_CLK : OUT STD_LOGIC;
HDMI_TX_D : OUT STD_LOGIC_VECTOR(23 DOWNTO 0);
HDMI_TX_DE : OUT STD_LOGIC;
HDMI_TX_HS : OUT STD_LOGIC;
HDMI_TX_INT : IN STD_LOGIC;
HDMI_TX_VS : OUT STD_LOGIC;
LIGHT_I2C_SCL : OUT STD_LOGIC;
LIGHT_I2C_SDA : INOUT STD_LOGIC;
LIGHT_INT : INOUT STD_LOGIC;
MIPI_CORE_EN : OUT STD_LOGIC;
MIPI_I2C_SCL : OUT STD_LOGIC;
MIPI_I2C_SDA : INOUT STD_LOGIC;
MIPI_LP_MC_n : IN STD_LOGIC;
MIPI_LP_MC_p : IN STD_LOGIC;
MIPI_LP_MD_n : IN STD_LOGIC_VECTOR(3 DOWNTO 0);
MIPI_LP_MD_p : IN STD_LOGIC_VECTOR(3 DOWNTO 0);
MIPI_MC_p : IN STD_LOGIC;
MIPI_MCLK : OUT STD_LOGIC;
MIPI_MD_p : IN STD_LOGIC_VECTOR(3 DOWNTO 0);
MIPI_RESET_n : OUT STD_LOGIC;
MIPI_WP : OUT STD_LOGIC;
NET_COL : IN STD_LOGIC;
NET_CRS : IN STD_LOGIC;
NET_MDC : OUT STD_LOGIC;
NET_MDIO : INOUT STD_LOGIC;
NET_PCF_EN : OUT STD_LOGIC;
NET_RESET_n : OUT STD_LOGIC;
NET_RX_CLK : IN STD_LOGIC;
NET_RX_DV : IN STD_LOGIC;
NET_RX_ER : IN STD_LOGIC;
NET_RXD : IN STD_LOGIC_VECTOR(3 DOWNTO 0);
NET_TX_CLK : IN STD_LOGIC;
NET_TX_EN : OUT STD_LOGIC;
NET_TXD : OUT STD_LOGIC_VECTOR(3 DOWNTO 0);
PMONITOR_ALERT : IN STD_LOGIC;
PMONITOR_I2C_SCL : OUT STD_LOGIC;
PMONITOR_I2C_SDA : INOUT STD_LOGIC;
RH_TEMP_DRDY_n : IN STD_LOGIC;
RH_TEMP_I2C_SCL : OUT STD_LOGIC;
RH_TEMP_I2C_SDA : INOUT STD_LOGIC;
SD_CLK : OUT STD_LOGIC;
SD_CMD : INOUT STD_LOGIC;
SD_CMD_DIR : OUT STD_LOGIC;
SD_D0_DIR : OUT STD_LOGIC;
SD_D123_DIR : INOUT STD_LOGIC;
SD_DAT : INOUT STD_LOGIC_VECTOR(3 DOWNTO 0);
SD_FB_CLK : IN STD_LOGIC;
SD_SEL : OUT STD_LOGIC;
SW : IN STD_LOGIC_VECTOR(1 DOWNTO 0);
TEMP_CS_n : OUT STD_LOGIC;
TEMP_SC : OUT STD_LOGIC;
TEMP_SIO : INOUT STD_LOGIC;
USB_CLKIN : IN STD_LOGIC;
USB_CS : OUT STD_LOGIC;
USB_DATA : INOUT STD_LOGIC_VECTOR(7 DOWNTO 0);
USB_DIR : IN STD_LOGIC;
USB_FAULT_n : IN STD_LOGIC;
USB_NXT : IN STD_LOGIC;
USB_RESET_n : OUT STD_LOGIC;
USB_STP : OUT STD_LOGIC;
BBB_PWR_BUT : IN STD_LOGIC;
BBB_SYS_RESET_n : IN STD_LOGIC;
GPIO0_D : INOUT STD_LOGIC_VECTOR(43 DOWNTO 0);
GPIO1_D : INOUT STD_LOGIC_VECTOR(22 DOWNTO 0);
DDR3_RESET_n : OUT STD_LOGIC;
DDR3_CK_p : OUT STD_LOGIC_VECTOR(DDR3_NUM_CK-1 DOWNTO 0);
DDR3_CK_n : OUT STD_LOGIC_VECTOR(DDR3_NUM_CK-1 DOWNTO 0);
DDR3_CKE : OUT STD_LOGIC;
DDR3_CS_n : OUT STD_LOGIC;
DDR3_RAS_n : OUT STD_LOGIC;
DDR3_CAS_n : OUT STD_LOGIC;
DDR3_WE_n : OUT STD_LOGIC;
DDR3_ODT : OUT STD_LOGIC;
DDR3_A : OUT STD_LOGIC_VECTOR(DDR3_WIDTH_ADDR-1 DOWNTO 0);
DDR3_BA : OUT STD_LOGIC_VECTOR(DDR3_WIDTH_BANK-1 DOWNTO 0);
DDR3_DM : OUT STD_LOGIC_VECTOR(DDR3_WIDTH_DM-1 DOWNTO 0);
DDR3_DQ : INOUT STD_LOGIC_VECTOR(DDR3_WIDTH_DQ-1 DOWNTO 0);
DDR3_DQS_p : INOUT STD_LOGIC_VECTOR(DDR3_WIDTH_DQS-1 DOWNTO 0);
DDR3_DQS_n : INOUT STD_LOGIC_VECTOR(DDR3_WIDTH_DQS-1 DOWNTO 0)
);
END COMPONENT;


It looks like Quartus knows about using the 'STD_LOGIC_VECTOR(0 DOWNTO 0) := ...'
In fact, I'm assuming Quartus just made VHDL compliant code to call my project...

In fact, they have it even simpler at:  BHG_EXTRA_SPEED : std_logic := '1';
« Last Edit: January 10, 2022, 04:47:37 pm by BrianHG »
 

Offline mfro

  • Regular Contributor
  • *
  • Posts: 212
  • Country: de
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #120 on: January 10, 2022, 07:00:28 pm »
Doesn't work with BrianHG_DDR3_CONTROLLER_top.sv:

Code: [Select]
Error (283001): Can't create Component Declaration or Verilog Instantiation File for entity "BrianHG_DDR3_CONTROLLER_top" which has two or more dimensional ports

Thanks anyway.

The std_logic mapping also appears to be wrong (at least, it doesn't compile). The only mapping that works for me is indeed to std_logic_vector(0 to 0) as posted.

I'm on Quartus 20.1, btw.
Beethoven wrote his first symphony in C.
 
The following users thanked this post: BrianHG

Offline vdp

  • Newbie
  • Posts: 1
  • Country: bg
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller.
« Reply #121 on: January 17, 2022, 08:13:51 am »
I thought Gowin already came with a free DDR3/4 controller IP.  Not much need for mine like with Lattice and Altera who charge an arm and a leg to hook up DDR3 ram.

Edit: nevermind
« Last Edit: January 29, 2022, 07:22:36 am by vdp »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #122 on: January 28, 2022, 01:00:33 pm »
 :phew: Ok, here is the new 2 page 'BrianHG_GFX_VGA_Window_System.pdf' block diagram and 'BrianHG_GFX_VGA_Window_System.txt' documentation for developers.

-Up to 64 window layers, with alpha blend transparency from layer-to-layer.
-In system real-time video mode switching support.
-Supports 32/16a/16b/8/4/2/1 bpp windows.
-Supports accelerated Fonts/Tiles stored in dedicated M9K blockram with resolutions of 4/8/16/32 X 4/8/16/32 pixels.
-Supports up to 1k addressable tiles/characters with 32/16a/16b/8/4/2/1 bpp, with mirror and flip.
-Each window has a base address, X&Y screen position & H&V sizes up to 65kx65k pixels.
-Independent bpp depth for each window.
-Optional independent or shared 256 color 32 bit RGBA palettes for each window.
-In tile mode, each tile/character's output with 8 bpp and below can be individually assigned to different portions of the palette.
-Multilayer 8 bit alpha stencil translucency between layers with programmable global override.
-Quick layer swap-able registers.
-Hardware individual integer X&Y scaling where each window output can be scaled 1x through 16x.

     My new BrianHG_DDR3_CONTROLLER_v16 which now has the multi-window VGA system demo should be uploaded in a few days.
« Last Edit: January 28, 2022, 01:42:22 pm by BrianHG »
 
The following users thanked this post: nockieboy

Offline davemuscle

  • Newbie
  • Posts: 9
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #123 on: March 20, 2022, 04:35:50 pm »
Hi. I'm attempting to write an Avalon wrapper for your controller. During my memory test sim I encountered this error with a write/readback sequence:
(all addresses are byte-addressed)
  • Write 'data1' to 0x0000, readback and check against 'data1' (PASS)
  • Write 'data2' to 0x1000, readback and check against 'data2' (PASS)
  • Write 'data3' to 0x0000, readback and check against 'data3' (PASS)
  • Write 'data4' to 0x0000, readback and check against 'data4' (FAIL, data3 received)

The fourth read doesn't issue any command to the DDR3 model, it just returns dirty data from the cache. I would expect the fourth write with fresh data to signal to the cache it needs to perform another read.

Here's the failed write/read:


Is this expected behavior?
Thanks.
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #124 on: March 20, 2022, 05:24:34 pm »
Hi. I'm attempting to write an Avalon wrapper for your controller. During my memory test sim I encountered this error with a write/readback sequence:
(all addresses are byte-addressed)
  • Write 'data1' to 0x0000, readback and check against 'data1' (PASS)
  • Write 'data2' to 0x1000, readback and check against 'data2' (PASS)
  • Write 'data3' to 0x0000, readback and check against 'data3' (PASS)
  • Write 'data4' to 0x0000, readback and check against 'data4' (FAIL, data3 received)

The fourth read doesn't issue any command to the DDR3 model, it just returns dirty data from the cache. I would expect the fourth write with fresh data to signal to the cache it needs to perform another read.

Here's the failed write/read:
If you are using the multiport module and the read and write channel are on the same CMD_xxx[ # ] bus, and smart cache is enabled, then you should receive the new data as long as there is 1 spare clock between the 2.  However, I will try to replicate the bug later tonight both with and without that 1 spare clock.  If you are using a separate CMD_xxx[ # ] as a write channel and another one for the read channel, then yes, it is possible to have stale data in the cache.

Writes to the DDR3 are held off until either a new write is sent outside the current cached address, or, the write cache timer has reached 0 due to no additional writes on that port.  The current 'PORT_W_CACHE_TOUT' parameter default is set to 255 CMD_CLKS. This allows the cache module to coalesce multiple writes within the same 16 bytes before sending a write command to the DDR3.  Otherwise, if you were to write, with an 8 bit data mode port, 16 consecutive bytes, every single byte write would send a DDR3 command, wasting a huge setup and burst-8 cycle on the DDR3 which wouldn't be needed until the last of the 16 bytes has been received.  The smart cache feature means if you are reading from the same address as the current coalescing writes, you will read the new data even though it has yet to be sent to the DDR3.
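
To put a rough number on the coalescing described above, here is a minimal Python sketch. It is not code from the controller, only a behavioral model under stated assumptions: one 16-byte write cache line per port, writes arriving back-to-back (so the timeout only fires once, after the last write), and a flush whenever a write lands outside the cached line. The names LINE_BYTES and count_ddr3_writes are made up for this illustration.

```python
LINE_BYTES = 16  # assumed cache line size: "within the same 16 bytes"


def count_ddr3_writes(byte_addresses, coalesce=True):
    """Count DDR3 write commands needed for a sequence of single-byte writes.

    Hypothetical model: with coalescing, a command is only issued when a
    write leaves the currently cached 16-byte line, plus one final flush
    when the write cache timeout expires after the last write.
    """
    if not coalesce:
        # Without a write cache, every byte write becomes its own command.
        return len(byte_addresses)

    commands = 0
    cached_line = None  # line index currently held in the write cache
    for addr in byte_addresses:
        line = addr // LINE_BYTES
        if cached_line is not None and line != cached_line:
            commands += 1  # write outside the cached line flushes the old line
        cached_line = line
    if cached_line is not None:
        commands += 1  # final flush once the timeout counter reaches 0
    return commands


if __name__ == "__main__":
    sixteen_bytes = list(range(16))
    print("no coalescing:", count_ddr3_writes(sixteen_bytes, coalesce=False))
    print("coalescing:   ", count_ddr3_writes(sixteen_bytes))
```

In this model, 16 consecutive byte writes cost 16 commands without the cache and a single command with it. The real saving also depends on the timeout value and on how quickly the writes arrive.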

My bug may be because a write's smart caching takes 1 clock cycle to transfer its new data to the read cache's buffer side, but you appear to have plenty of time in your sim.  Remember, the read cache is there to perform the same function as the write cache.

If you are using a single port directly controlling my 'PHY' module, then there is no caching and you should see the correct data in order of read and write.
« Last Edit: March 20, 2022, 05:35:11 pm by BrianHG »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #125 on: March 20, 2022, 05:48:36 pm »

Is this expected behavior?
Thanks.
You should not see this behavior.  Please check that the setup time for the write command is placed ahead of the CMD_CLK.  It looks as if my module didn't see your write, or, it took the write at this address: (see attached photo)



To be sure when simulating and sending commands, try offsetting the commands you send by 1/2 CMD_CLK phase so that you can see clearly what is being accepted during the 'rise' of the source clock.

Also, the way you are accessing the ram with the set 'write mask', make sure you have the port width set to 128 bits, otherwise nothing will write.  You have only bits 96 through 127 write enabled.
« Last Edit: March 20, 2022, 06:28:27 pm by BrianHG »
 

Offline davemuscle

  • Newbie
  • Posts: 9
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #126 on: March 20, 2022, 06:29:03 pm »

If you are using the multiport module and the read and write channel are on the same CMD_xxx[ # ] bus, and smart cache is enabled, then you should receive the new data as long as there is 1 spare clock between the 2. ...

Writes to the DDR3 are held off until either a new write is sent outside the current cached address, or, the write cache timer has reached 0 due to no additional writes on that port.  The current 'PORT_W_CACHE_TOUT' parameter default is set to 255 CMD_CLKS. ...

I'm only using one element in the CMD_* array. Relevant parameters are:
  • PORT_PRIORITY = '{default:0}
  • PORT_READ_STACK = '{default:4}
  • PORT_W_CACHE_TOUT = '{default:0}
  • PORT_CACHE_SMART = '{default:0}
  • PORT_MAX_BURST  = '{default:256}
  • SMART_BANK =  0
Everything else is the default for the DECA example at 400 MHz.
 

Offline davemuscle

  • Newbie
  • Posts: 9
  • Country: us
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #127 on: March 20, 2022, 06:43:32 pm »

You should not see this behavior.  Please check that the setup time for the write command is placed ahead of the CMD_CLK.  It looks as if my module didn't see your write, or, it took the write at this address: (see attached photo)

To be sure when simulating and sending commands, try offsetting the commands you send by 1/2 CMD_CLK phase so that you can see clearly what is being accepted during the 'rise' of the source clock.


I'm pretty certain the address is getting sampled by your block correctly. I can see from the memory model prints that when I input address 0x0000 it corresponds to Row/Bank/Col = 0. Just to be sure I inverted the clock going to my logic and got the same result. 'clk' runs my logic and 'tmp' runs your logic in the screenshot.



Also, the way you are accessing the ram with the set 'write mask', make sure you have the port width set to 128 bits, otherwise nothing will write.  You have only bits 96 through 127 write enabled.

The port is set to 128-bits. I load the 128-bit words big-endian to match your controller, so CMD_wdata = 0x12345678 00000000 ... and CMD_wmask = 0xFFFF 0000 ....

Here you can see my writes/reads going into and out of the DDR3 successfully, note the final missing read operation due to the cache:
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #128 on: March 20, 2022, 06:51:04 pm »

If you are using the multiport module and the read and write channel are on the same CMD_xxx[ # ] bus, and smart cache is enabled, then you should receive the new data as long as there is 1 spare clock between the 2. ...

Writes to the DDR3 are held off until either a new write is sent outside the current cached address, or, the write cache timer has reached 0 due to no additional writes on that port.  The current 'PORT_W_CACHE_TOUT' parameter default is set to 255 CMD_CLKS. ...

I'm only using one element in the CMD_* array. Relevant parameters are:
  • PORT_PRIORITY = '{default:0}
  • PORT_READ_STACK = '{default:4}
  • PORT_W_CACHE_TOUT = '{default:0}
  • PORT_CACHE_SMART = '{default:0}
  • PORT_MAX_BURST  = '{default:256}
  • SMART_BANK =  0
Everything else is the default for the DECA example at 400 MHz.

Warning, if 'PORT_CACHE_SMART' is not set to '{default:1}, then your reads will return old, stale data cached since the last read.

Enabling PORT_CACHE_SMART means that whenever a write is done and a matching read address is cached, that read cache data will immediately reflect what was written to the write cache, even before the write data has been sent to the DDR3.  This parameter should always be on unless you are trying to scrounge up 1 last logic cell on a full FPGA, or get that last FMAX MHz.


Even with the 'PORT_W_CACHE_TOUT = '{default:0}', meaning a write will go out to the DDR3 ASAP, the DDR3 always operates at a delay since there is a ton of setup involved.  My controller is trying to prevent unnecessary DDR3 access whenever possible.
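
The write/readback sequence reported earlier in the thread can be reproduced with a small Python behavioral model. This is only a sketch under assumptions, not the controller's actual logic: the port holds a single cached read line, writes go straight to memory (as with PORT_W_CACHE_TOUT = 0), and the smart cache simply copies a write into the read line when the addresses match. The names PortModel and run_sequence are made up for this illustration.

```python
class PortModel:
    """Hypothetical single-port model with one cached read line."""

    def __init__(self, smart_cache):
        self.smart_cache = smart_cache
        self.ddr3 = {}          # stands in for the DDR3 contents
        self.read_line = None   # (address, data) held in the read cache
        self.ddr3_reads = 0     # number of reads that actually reached the DDR3

    def write(self, addr, data):
        self.ddr3[addr] = data  # write goes out immediately (timeout of 0)
        if (self.smart_cache and self.read_line is not None
                and self.read_line[0] == addr):
            # Smart cache: the cached read line follows the write.
            self.read_line = (addr, data)

    def read(self, addr):
        if self.read_line is not None and self.read_line[0] == addr:
            return self.read_line[1]  # cache hit, no DDR3 command issued
        self.ddr3_reads += 1
        self.read_line = (addr, self.ddr3[addr])
        return self.read_line[1]


def run_sequence(smart_cache):
    """Replay the four write/readback steps from the reported test."""
    port = PortModel(smart_cache)
    steps = [(0x0000, "data1"), (0x1000, "data2"),
             (0x0000, "data3"), (0x0000, "data4")]
    results = []
    for addr, data in steps:
        port.write(addr, data)
        results.append(port.read(addr))
    return results, port.ddr3_reads


if __name__ == "__main__":
    print("smart cache off:", run_sequence(False))
    print("smart cache on: ", run_sequence(True))
```

In this model the third readback passes only because the read to 0x1000 evicted the cached line for 0x0000. The fourth readback hits the cached line, so with the smart cache off it returns 'data3' and no DDR3 read is issued, which matches the reported symptom.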
« Last Edit: March 20, 2022, 07:10:42 pm by BrianHG »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #129 on: March 20, 2022, 07:04:31 pm »
If you are using my DDR3 V1.5, the:
 PORT_READ_STACK   [0:15]  should be  '{default:16} for maximum read speed when you stack a number of consecutive reads.  Though, with 128bit and if you do not require serious random read stacked events, 4 is perfectly fine.

« Last Edit: March 20, 2022, 07:15:42 pm by BrianHG »
 

Offline davemuscle

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #130 on: March 20, 2022, 07:20:11 pm »

In your comment for 'PORT_CACHE_SMART' you list disabling it for memory testing. I wanted to see each request go to the DDR3 without extra logic surrounding it. 'PORT_W_CACHE_TOUT' was disabled for a similar reason. Starting my development dumb & slow then improving it once the basics work.

I enabled the smart cache and it solved the particular test case, however the behavior was unexpected. I'll just leave it on for now.
 
The following users thanked this post: BrianHG

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #131 on: March 20, 2022, 07:26:57 pm »

Note that if you do not need or want any features of my multiport module with the CMD_xxx interface, it is a waste of space and you will get much better performance just using my PHY controller.  There is no cache and no smart logic: you send a command and the DDR3 will do it ASAP, in around 1/2 the logic cells.

Example PHY only interface:  https://github.com/BrianHGinc/BrianHG-DDR3-Controller/tree/main/BrianHG_DDR3_DECA_only_PHY_SEQ

The only thing is that your 400MHz controller will have a 200MHz interface only, no option for 100MHz quarter rate unless you use the 'toggle' enable & data ready feature which allows for alternate clock domain command interface.

Each enabled command will always be sent to the DDR3 regardless of address or repeats.  But, you will no longer have the ability to add multiple read/write ports and you are stuck with 128bit.
« Last Edit: March 20, 2022, 07:53:23 pm by BrianHG »
 

Offline davemuscle

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #132 on: April 02, 2022, 02:22:34 am »
Hi,

I'm trying to use the PHY_SEQ only connected to my custom code. I'm finding that sometimes the CMD_busy signal ends up stuck at 1, locking all my upstream logic but not the downstream logic (your block), which ends up performing the same write/read over and over again. This is all with TOGGLE_CONTROLS = 0.

Could the behavior I'm encountering be because 'CMD_ena' and 'refresh_in_progress' assert at the same time? See the red highlight in pt1.png for that. In pt2.png you can see the busy signal get stuck, hopefully with some extra surrounding info.

Also, I just need to make sure that when TOGGLE_CONTROLS=0 the CMD_ena and CMD_busy signals are analogous to something like AXI stream tvalid and tready. It seems like TOGGLE_CONTROLS=1 is your preferred style, would it be better to use that for driving the PHY?

 

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #133 on: April 02, 2022, 03:01:07 am »
Quote
I'm trying to use the PHY_SEQ only connected to my custom code. I'm finding that sometimes the CMD_busy signal ends up stuck at 1, locking all my upstream logic but not the downstream logic (your block), which ends up performing the same write/read over and over again. This is all with TOGGLE_CONTROLS = 0.

Note that with toggle controls at 0, my 'CMD_BUSY' will go high either when the commands going in overflow the command stack, or while an internal refresh request has been posted, and it will stay high until the command has been added to the queue.  Whenever 'CMD_BUSY' is high, all input activity on CMD_ENA is ignored.

My Modelsim simulation of the internal behavior of this DDR3 command stack processor lives in my 'BrianHG_DDR3_CMD_SEQUENCER_tb.sv', run with the '.do' batch files 'setup_seq.do' and 'run_seq.do'.

Quote
Could the behavior I'm encountering be because of 'CMD_ena' and 'refresh_in_progress' assert at the same time? See the red highlight in pt1.png for that. In pt2.png you can see the busy signal get stuck with hopefully some extra surrounding info.

If they are asserted at the same time, the refresh in progress should take priority, yet I do assert the CMD_BUSY ahead by 1 clock so you know you should not be sending a command at that time.

Q:  Did you wait long enough for the refresh to run through to see if your entered command came out the other end?  A refresh on a 4Gb DDR3 is something like 350ns.  If you stacked a command or 2 in advance, the busy will stay high until those commands have finally been sent out, in the neighborhood of 400ns later, and don't forget there may still be a few commands queued in advance to pipe on through before the refresh begins.  (One advantage to using my multiport is that if there are repetitive commands, it runs them in the cache first before bothering with accessing the DDR3.)

Quote
Also, I just need to make sure that when TOGGLE_CONTROLS=0 the CMD_ena and CMD_busy signals are analogous to something like AXI stream tvalid and tready. It seems like TOGGLE_CONTROLS=1 is your preferred style, would it be better to use that for driving the PHY?

Sorry, I am unfamiliar with the 'AXI stream tvalid and tready'.

My toggle mode treats the CMD_ENA_t input like bit [ 0 ] of a command address.  So, with each command you send, that address should increment in parallel.

The CMD_BUSY_t operates like bit [ 0 ] of a return address, telling you which command address has finished processing.

The idea is if your control device driving my DDR3 is running, for example at 100MHz instead of 200MHz, incrementing/toggling that CMD_ENA_t input with every new command is seen by my controller as 1 new command.  Without toggle mode, pulsing the CMD_ENA at 100MHz will be seen as 2 consecutive commands by my 200MHz DDR3 core.  On your device host side, you know you are clear to continue sending commands so long as 'CMD_ENA_t  == CMD_busy_t'.  You can say within your module:

wire DDR3_is_busy = !(my_out_reg_CMD_ENA_t  == input_from_DDR3_phy_CMD_busy_t);
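
Put together, a host-side command driver for toggle mode might look like the sketch below. It is only an illustration of the rule above (send while 'CMD_ENA_t == CMD_busy_t', flip the toggle once per command). The module name and the 'req_valid' / 'req_taken' signals are hypothetical, and it assumes both toggles sit at 0 after reset and that CMD_busy_t is already synchronous to the host clock.

```systemverilog
// Sketch only: 'req_valid' and 'req_taken' are hypothetical upstream signals.
// Assumes CMD_busy_t is synchronous to 'clk' and both toggles are 0 after reset.
module toggle_cmd_host_sketch (
    input  logic clk,         // host clock, may be slower than the DDR3 core clock
    input  logic rst,
    input  logic req_valid,   // upstream has a command ready
    output logic req_taken,   // high for one host clock when that command is issued
    input  logic CMD_busy_t,  // toggle returned by the DDR3 PHY
    output logic CMD_ena_t    // toggle sent to the DDR3 PHY
);
    // Clear to send only while the returned toggle matches the last one sent.
    wire DDR3_is_busy = (CMD_ena_t != CMD_busy_t);

    assign req_taken = req_valid && !DDR3_is_busy;

    always_ff @(posedge clk) begin
        if (rst)            CMD_ena_t <= 1'b0;
        else if (req_taken) CMD_ena_t <= ~CMD_ena_t; // one flip = exactly one command
    end
endmodule
```

The other CMD_xxx fields (address, data, write enable) would be registered on the same clock edge that flips CMD_ena_t, so the PHY sees them change together.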
« Last Edit: April 02, 2022, 03:09:46 am by BrianHG »
 

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #134 on: April 02, 2022, 03:28:07 am »
Ohh, 1 other thing about the refresh.  After a power-on reset, or reset pulse, the DDR3 will begin to run for around 15 milliseconds before the first initial refresh commands come in.  This is a one time thing after power-up and can be seen in some simulations.  This does not generate any lost or missing data as the CMD_BUSY flag will properly run if needed.  If no CMD_ENA commands are being sent, a small train of sequential refresh commands may run through, but, these additional ones may be interrupted by any CMD_ENA command you send as after the first one, the others are low priority.
 

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #135 on: April 02, 2022, 03:43:43 am »
I've attached my decoding of your logic waveform:
Do not worry about the internals inside my source.  Wait until the actual commands are sent to the DDR3 and every command you CMD_ENA'ed while the CMD_BUSY was low will make it to the DDR3 when it is permitted due to DDR3 timing constraints and potential row and page selection as well as refresh.

Using the toggle mode =1, you may see how the CMD_ENA_t is toggled with each sent command while the CMD_BUSY_t return appears to you more like an ACKNOWLEDGE becoming equal to the CMD_ENA once a command is accepted.  I do not know the internal working of the AXI system, but an acknowledge style interface may be easier to work with if you generate the toggle out on your side.
« Last Edit: April 02, 2022, 03:48:17 am by BrianHG »
 

Offline davemuscle

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #136 on: April 02, 2022, 04:20:31 am »
Quote
Ohh, 1 other thing about the refresh.  After a power-on reset, or reset pulse, the DDR3 will begin to run for around 15 milliseconds before the first initial refresh commands come in.  This is a one time thing after power-up and can be seen in some simulations.  This does not generate any lost or missing data as the CMD_BUSY flag will properly run if needed.  If no CMD_ENA commands are being sent, a small train of sequential refresh commands may run through, but, these additional ones may be interrupted by any CMD_ENA command you send as after the first one, the others are low priority.

I see a refresh occur around 15 microseconds, if that's what you mean. I won't be getting close to 15 ms with the free version of Modelsim, lol. The first two refreshes shortly complete, but then I get locked up with one that doesn't end. See screenshot.

I tried delaying the CMD_ena signal by a single cycle, to avoid it being asserted on the same edge as 'refresh_in_progress'. The sim was able to get farther than it usually does, until the same issue happened again. Do I need to deassert CMD_ena during a refresh?

Tell me if this is wrong, quick pseudo-code for the CMD_* bus:

if(state)
  cmd_ena <= 1;
  if(cmd_ena & !cmd_busy)
     if(last_xfer)
         cmd_ena <= 0;
         state <= next_state

 

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #137 on: April 02, 2022, 04:52:26 am »
See attached image.  The ddr3 is working fine.

Note that even though you set the use-toggle =0, the refresh in progress is an internal signal and it is always a toggle style signal.  So, viewing it alone, you cannot see the true refresh request state.  If you want to know the truth about the refresh, you need to make a :

wire busy_doing_a_refresh = ( refresh_req != refresh_in_progress );

Quote
if(state)
  cmd_ena <= 1;
  if(cmd_ena & !cmd_busy)
     if(last_xfer)
         cmd_ena <= 0;
         state <= next_state

What are you trying to do?

It's more like:
if (!cmd_busy && I_need_to_access_ddr3) begin
     CMD_xxx <= what_to_do;   // set up the address, data, write enable, etc.
     CMD_ENA <= 1;
     state   <= next_state;
end else if (cmd_busy && I_need_to_access_ddr3) begin
     state   <= wait_state;   // hold here until the busy clears
end else if (!cmd_busy) begin
     CMD_ENA <= 0;
     state   <= next_state;
end


Note that the state can be done as combinational logic saving a clock cycle.

wire state = (cmd_busy && I_need_to_access_ddr3) ? wait_state : next_state;

« Last Edit: April 02, 2022, 06:59:21 pm by BrianHG »
 

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #138 on: April 04, 2022, 11:58:23 pm »
Ooopps, I just looked back at my code.  I made a mistake in my above post.

The logic 'refresh_in_progress' is actually true logic, not toggle logic.

From what I can see, forcing the CMD_ENA indefinitely high has tied up my sequencer, preventing the refresh request from taking place.  The moment the CMD_ENA goes low, the next command entered into the command FIFO stack would be the refresh.  I need to double check that this does not accidentally constitute a potential refresh violation.  (Note that my coding counts the elapsed time of missed refreshes and will stream a continuous block of refreshes when it gets the chance to, maintaining the datasheet's recommended average refresh row count / maximum time period.)  When using my multiport with its toggle-enable set to 1, there is always room for a refresh to enter the queue.
 

Offline davemuscle

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #139 on: April 09, 2022, 03:45:19 am »
Hey,
After developing my Avalon bridge to wrap your PHY+PLL, I decided some benchmarks were in order to compare against the Altera UniPHY IP.

All results below were obtained with my synthesizable Avalon memory tester that writes the entire DDR3 with random data then verifies it. It was configured with a 64-bit data bus and max burst of 256 to match what the UniPHY IP wanted. The memory tester is controlled via a separate JTAG-Avalon IP that measures how long the transaction takes with a TCL script. Both DDR3 instances were clocked at 300 MHz with a half-rate Avalon interface.

Altera UniPHY
Code: [Select]
*** Build Summary ***
Total logic elements : 7,290 / 49,760 ( 15 % )
    Total combinational functions : 6,391 / 49,760 ( 13 % )
    Dedicated logic registers : 3,779 / 49,760 ( 8 % )
Total memory bits : 14,304 / 1,677,312 ( < 1 % )
Embedded Multiplier 9-bit elements : 0 / 288 ( 0 % )
Total PLLs : 1 / 4 ( 25 % )
Total pins : 65 / 360 ( 18 % )

*** Memory Test ***
/devices/10M50DA(.|ES)|10M50DC@1#1-2#Arrow MAX 10 DECA/(link)/JTAG/(110:132 v1 #0)/phy_0/master
Started memory test
Finished memory test
Microseconds recorded: 1011776
Number of passes   : 0x20000000
Number of failures : 0x00000000
Number of ticks    : 0x000f700d

Dave's Bridge + BHG PHY/PLL
Code: [Select]
*** Build Summary ***
Total logic elements : 6,241 / 49,760 ( 13 % )
    Total combinational functions : 3,264 / 49,760 ( 7 % )
    Dedicated logic registers : 4,974 / 49,760 ( 10 % )
Total memory bits : 5,792 / 1,677,312 ( < 1 % )
Embedded Multiplier 9-bit elements : 0 / 288 ( 0 % )
Total PLLs : 1 / 4 ( 25 % )
Total pins : 63 / 360 ( 18 % )

*** Memory Test ***
/devices/10M50DA(.|ES)|10M50DC@1#1-2#Arrow MAX 10 DECA/(link)/JTAG/(110:132 v1 #0)/phy_0/master
Started memory test
Finished memory test
Microseconds recorded: 1030280
Number of passes   : 0x20000000
Number of failures : 0x00000000
Number of ticks    : 0x000fb950

Final throughputs are 506 MB/s for UniPHY, and 497 MB/s for your core. It's entirely possible there is some loss of throughput from my bridge having to buffer commands, so I'd be interested in hearing if you've ever done a similar type of test (how much performance does the controller give over just the phy+pll?)
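
Those two figures can be reproduced from the logs above under two assumptions that the logs do not state outright: that the timed run moves 2^29 bytes (matching the 0x20000000 pass count), and that "MB" here means 2^20 bytes.

```latex
\[
\frac{2^{29}\,\text{B}}{1.011776\,\text{s}} \approx 506\ \text{MiB/s},
\qquad
\frac{2^{29}\,\text{B}}{1.030280\,\text{s}} \approx 497\ \text{MiB/s},
\qquad
\frac{1.030280}{1.011776} \approx 1.018
\]
```

So on this test the two cores are within about 2% of each other.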

I'm going to call you the winner, based on:
  • the UniPHY core often fails timing unless you massage map/fit options into your build and watch the fitter spin for 4x as long
  • your core can run faster than 300 MHz
  • easier to simulate and include in a design
:-+
 
The following users thanked this post: BrianHG

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #140 on: April 09, 2022, 04:17:39 am »
:-+ Thanks a million for the verification and comparison.

One even bigger plus of my core is that it can run @300MHz on a -8 speed grade.  Altera's UniPHY requires a -6 to run in software mode.  Not to mention I support Cyclone III/IV, which are missing the differential DQS ports necessary for DDR3.

I'm deciding whether my next move will be to bring my design to Lattice ECP5 fpgas, or clean up my core to version 2.0 to gain a few more percentage performance points as well as further improve the robustness of fitting a design with a timing report all in the black.  I know with Cyclone III & IV, you can achieve 450MHz with a timing report of 100% in the black, but some of the fitter options require a number of tweaks.
« Last Edit: April 09, 2022, 05:07:41 am by BrianHG »
 

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #141 on: April 09, 2022, 04:26:34 am »
Quote
Final throughputs are 506 MB/s for UniPHY, and 497 MB/s for your core. It's entirely possible there is some loss of throughput from my bridge having to buffer commands, so I'd be interested in hearing if you've ever done a similar type of test (how much performance does the controller give over just the phy+pll?)

Note that my core essentially has a 4 command input FIFO, and the read results do come out way delayed due to the nature of DDR3 read setup, so to get read performance you need to stream those read commands to get that perfect continuous unbroken consecutive burst.

When using my Multiport, it handles a lot of this work for you behind the scenes if you leave my default CMD_XXX parameter features enabled.  If you got my PHY-only setup working, this shouldn't be a problem, as the ports are compatible if you set the data bit width to the same number.

Even with the extra gates, it is still useful to generate an Avalon interface running with my full controller, as the extra ports allow sharing with my multi-window HDMI display engine, which will receive commands through the Avalon port since its display controls are addressable through all the available memory ports simultaneously.
« Last Edit: April 09, 2022, 04:30:59 am by BrianHG »
 

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #142 on: April 09, 2022, 04:47:31 am »
Quote
Final throughputs are 506 MB/s for UniPHY, and 497 MB/s for your core.

Set to 300MHz with my multiport, I'm getting a throughput of ~1100MB/s.  Note that this has been achieved running my video graphics adapter in 1080p mode with 2 translucent 32bit windows superimposed on top of each other.  Note that my controller takes full advantage of large sequential bursts, where my VGA controller bursts 4kb at a time per window.  This performance should be matched if you were to generate ALU DSP modules, like FFT and convolution filters, which may also burst in large linear chunks.  This is at the edge of my controller's efficiency.  Running the controller at 350MHz and above leaves enough room for other parallel tasks as well.

Note that with or without my Multiport, my PHY Only achieves the same performance.  It is just the consecutive and large throughput nature of video which allows these speeds.
« Last Edit: April 09, 2022, 06:38:50 am by BrianHG »
 

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #143 on: April 09, 2022, 05:10:49 am »
Quote
Final throughputs are 506 MB/s for UniPHY, and 497 MB/s for your core. It's entirely possible there is some loss of throughput from my bridge having to buffer commands, so I'd be interested in hearing if you've ever done a similar type of test (how much performance does the controller give over just the phy+pll?)

If you want, you can try the test again swapping my parameter 'BANK_ROW_ORDER'.  Depending on how you are accessing the DDR3, it may help improve throughput.
 

Offline davemuscle

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #144 on: April 09, 2022, 05:35:56 am »
It got a bit slower with BANK_ROW_ORDER = "BANK_ROW_COL", 1052606 microseconds for the whole RAM. My initial testing was with "ROW_BANK_COL". I'm not sure what should be the more appropriate setting for doing only large upward bursts.

My bridge is set up to stream commands as quickly as the Avalon port can give them and the PHY can take them. The slowest path would be when a read is requested and the FIFO for decoding the returning BL8 is near full. In the future I'll consider branching my memory tester to talk straight with your PHY and check for a speed difference, then naturally run it against the controller too. I'll have to think about this 1100 MB/s figure a bit more.
 

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #145 on: April 09, 2022, 05:38:46 am »
Quote
Final throughputs are 506 MB/s for UniPHY, and 497 MB/s for your core. It's entirely possible there is some loss of throughput from my bridge having to buffer commands, so I'd be interested in hearing if you've ever done a similar type of test (how much performance does the controller give over just the phy+pll?)

In half-rate mode using my full controller with the Multiport at 64bit, you should be approximately doubling your throughput.  However, you need to use my default parameters.  This means having a read stack set to 16 and write cache timeout set to 255, etc...

In my HDL comments, when I said if you were making a 'memory testing algorithm', I meant if you were trying to test the ram chip's memory cells, not the integrity of my controller.

My multiport is designed to squeeze together 2 consecutive 64bit chunks into more efficient 128bit packets for my controller.  So long as Avalon can perform back-to-back reads or writes at 150MHz at 64bit, my multiport will do the lifting for you.
« Last Edit: April 09, 2022, 06:15:34 am by BrianHG »
 

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #146 on: April 09, 2022, 05:57:50 am »
Quote
It got a bit slower with BANK_ROW_ORDER = "BANK_ROW_COL", 1052606 microseconds for the whole RAM. My initial testing was with "ROW_BANK_COL". I'm not sure what should be the more appropriate setting for doing only large upward bursts.

With BANK_ROW_COL mode, you can divide your ram into 2/4/8 chunks and, with my multiport, assign for example 1 cpu onto bank 0, video onto banks 1&2, and sound onto bank 3.  Having the bank at the top of the address space means that as each peripheral accesses its own region of memory, that bank is remembered and kept open, and as other peripherals access their own memory regions, their banks are opened and closed only as necessary.  It almost makes it as if you have 8 separate ram controllers.

This also helps if you are copying or processing huge sequential chunks of ram from an upper bank to a lower one, as my ram controller knows to keep the 2 different sections' rows simultaneously open during the transfer, eliminating all the precharge and activate commands which would normally happen after each BL8.  Now, the precharge and activate only happen when a new row is required in either or both sections of ram you may be copying to and from.
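
To picture the two orderings, here is a rough sketch of how a linear address could be split into bank, row and column fields. It is not lifted from the controller source: the module name and the 3/15/10 bit widths are assumptions, and only the two BANK_ROW_ORDER strings come from this thread.

```systemverilog
// Illustration only, not the controller's actual address decoder.
// Field widths are assumed; check them against your DDR3 part.
module addr_split_sketch #(
    parameter string BANK_ROW_ORDER = "ROW_BANK_COL", // or "BANK_ROW_COL"
    parameter int    BANK_W = 3,   // assumed: 8 banks
    parameter int    ROW_W  = 15,  // assumed row address width
    parameter int    COL_W  = 10   // assumed column address width
)(
    input  logic [BANK_W+ROW_W+COL_W-1:0] addr, // word address, byte-lane bits already removed
    output logic [BANK_W-1:0]             bank,
    output logic [ROW_W-1:0]              row,
    output logic [COL_W-1:0]              col
);
    generate
        if (BANK_ROW_ORDER == "BANK_ROW_COL") begin : g_bank_on_top
            // Bank is the top field: each 1/8th of the address space has its own bank.
            assign {bank, row, col} = addr;
        end else begin : g_row_on_top
            // Bank sits between row and column: a linear sweep rotates through the banks.
            assign {row, bank, col} = addr;
        end
    endgenerate
endmodule
```

With the bank on top, two peripherals working in different eighths of memory never fight over the same bank's open row, which is the "8 separate ram controllers" effect described above.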


« Last Edit: April 09, 2022, 06:00:18 am by BrianHG »
 

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #147 on: April 09, 2022, 06:30:32 pm »
Quote
I'm going to call you the winner, based on:
  • the UniPHY core often fails timing unless you massage map/fit options into your build and watch the fitter spin for 4x as long
  • your core can run faster than 300 MHz
  • easier to simulate and include in a design
:-+

You forgot the largest point.
IT'S FREE!!! and open source.
« Last Edit: April 09, 2022, 06:41:09 pm by BrianHG »
 

Offline davemuscle

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #148 on: April 11, 2022, 02:44:54 am »
Assuming that all the TOGGLE_* parameters are kept the same, can the controller be used as a drop-in replacement for the PHY+PLL?

I've created a wrapper that allows you to switch between the two, kept all other code constant, and my memory tester locks up on the controller version but not the PHY version. I made sure to use TOGGLE_OUTPUTS = 1, and TOGGLE_INPUTS = '{default:1} for the controller parameters to match my TOGGLE_CONTROLS = 1 for the PHY setup.
 
Since the test never completes, I assume I'm encountering the 'long refresh' that made me switch from the controller to the PHY in the first place. Can you confirm the timing diagram for toggle-mode below? That's what it looks like for the PHY setup, but for the controller setup CMD_busy toggles a cycle earlier, in a combinatorial way. This makes me think there are some differences with the front-end interface.


 

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #149 on: April 11, 2022, 03:16:17 am »
Ok, one of the features of my Multiport module is that it was designed to use positive enable logic and convert its output to the toggle which my PHY module prefers.

Looking at my basic example: BrianHG_DDR3_DECA_Show_1080p_v15_375Mhz_HR/BrianHG_DDR3_DECA_top.sv,
The instantiation of the: 'BrianHG_DDR3_CONTROLLER_v15_top'

(*** Careful, use the V15 versions here...)

The parameter array '.PORT_TOGGLE_INPUT  (PORT_TOGGLE_INPUT),' will allow you to select which CMD_xxx [ # ] ports are placed into a toggle mode, which should operate virtually identically to my core's 'BrianHG_DDR3_PHY_SEQ.sv' in its toggle mode.  Note that my PHY module's 'USE_TOGGLE_CONTROLS' is no longer accessible.

When using the toggle mode, every toggle can happen every single clock and the command will be accepted every single clock the toggle has taken place.   It would be the same if you disabled the .PORT_TOGGLE_INPUT for that port # and left the CMD_ena high for every clock.  The difference is how the busy and return will work.  In toggle mode, you can keep sending a toggle command every clock as long as the (CMD_busy == CMD_ena).  Every time the CMD_read_ready toggles, you know a new read word and new read vector out is ready.  With toggle disabled, the CMD_read_ready will be high when new valid data is ready, otherwise it is low.
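
On the read side, turning the toggle-style CMD_read_ready back into a one-clock "valid" pulse could be sketched as below. The module and signal names other than CMD_read_ready are hypothetical, and it assumes the toggle idles at 0 after reset and changes at most once per 'clk' period, which will not hold if your logic runs slower than the controller.

```systemverilog
// Sketch only: assumes the toggle is 0 after reset and flips at most once per clk.
module read_ready_toggle_sketch (
    input  logic clk,
    input  logic rst,
    input  logic CMD_read_ready_t, // toggle-style read ready from the controller
    output logic read_valid        // high for one clock per returned read word
);
    logic ready_prev; // value of the toggle on the previous clock

    always_ff @(posedge clk) begin
        if (rst) ready_prev <= 1'b0;
        else     ready_prev <= CMD_read_ready_t;
    end

    // Any change in the toggle marks a new read word and a new read vector.
    assign read_valid = (CMD_read_ready_t != ready_prev);
endmodule
```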

It is at this point where I say if you are using my full controller, you are better off disabling the toggle option and using the plain enable true/false logic.  My original toggle feature was to allow my core to run at, for example, 200MHz while running my multiport at 100MHz or 50MHz, or 400MHz.  The interface between the 2 with the toggle feature allows for any type of clock frequency crossing without added headaches.  I added the toggle feature to the multiport's CMD_xxx ports as an afterthought in case someone wanted to interface with slower or faster logic, but I have not extensively tested it.
« Last Edit: April 11, 2022, 03:19:48 am by BrianHG »
 

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #150 on: April 11, 2022, 03:25:40 am »
Note that the purpose of my multiport was to make my DDR3 controller's interface appear exactly like a 16 port altsyncram FPGA blockram function.  You just need to be attentive to the CMD_busy when reading or writing, and wait for the CMD_read_ready to see your read request ohhh so many clock cycles later.  So, to get the full read performance, you need to remember to post a bunch of reads ahead of time.  My CMD_read vector in/out ports do some lifting, offering you a means of delineating a destination for each posted read command.
« Last Edit: April 11, 2022, 03:30:52 am by BrianHG »
 

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #151 on: May 23, 2022, 07:36:50 pm »
Version 1.6 update...

 :palm: After 3 days of work and test compiles, I finally found my Quarter Rate setting bug.  And I flew by it a dozen times...

   I forgot to clock latch in the read_data_valid_toggle signal going from the half-rate clock domain to the quarter-rate domain.  With heavy reads, that toggle signal may come in on the first or second half of the quarter-rate clock's period.  When operating in half-rate, there is no quarter-rate domain and that latch always arrives by the next clock, so no problem.  At quarter-rate, usually, with slow single memory reads, or continuous even-length bursts, that latch signal just happens to always be aligned at the beginning of the quarter-rate clock, maintaining proper function.  But with bursts with just the right pacing and an odd length buried inside, like when my multi-window VGA generator has 2 super-imposed windows, one with an odd number of pixels at just the right position, a read data valid may align itself to the second half of the quarter-rate clock, as the DDR3 controller may have an odd number of half-rate clock cycles until the read_data_valid_toggle.  This causes a loss of synchronization, as that read is latched at the wrong time with the wrong vector data used by my multiport commander to know when a read arrives and which port the read data belongs to.

   After 3 days, I thought I had a fundamental architecture problem, but no...  :phew:  However, along the way, I added a few new features and further improved the .sdc constraints rendering a further improvement in achieving a high level FMAX.

Full proper release DDR3 v1.6 with VGA controller will come in a day after I clean up my mess.
« Last Edit: May 23, 2022, 07:41:54 pm by BrianHG »
 
The following users thanked this post: voltsandjolts

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #152 on: May 24, 2022, 12:15:12 am »
Arrrrgggg, ok, for my changes, once I removed my watch dog timer, it works fine for 30 seconds or so, but still seizes up.  Still superior to instantly seizing up, or garbage reads, but there must be another signal somewhere interrupting the system.  It also won't seize up if I use just the video channel, but start the RS232 debugger and it eventually craps out.  Yet in half-rate, everything runs AOK.
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #153 on: June 11, 2022, 02:33:29 am »
Release V1.6 Demo .sof programming files of DECA BrianHG_DDR3_Controller v1.6 and multi-window BrianHG_GFX_VGA_Window_System v1.6 for Arrow DECA eval board.

  >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D
  >:D  500MHz/1GTPS! with 2x32bit 1080p60 video layers.    >:D
  >:D  That's >1100 megabytes/sec just to show the image,  >:D
  >:D  never mind simultaneously drawing all those ellipses.  >:D
  >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D >:D

Just open your JTAG programmer and add one of the following 2 files:
1. 'BrianHG_DDR3_DECA_GFX_DEMO_v16_1_LAYER_500MHzQR.sof'
        -> Replaces and replicates the original Ellipse Generator now using the new BrianHG_GFX_VGA_Window_System.

2. 'BrianHG_DDR3_DECA_GFX_DEMO_v16_2_LAYERS_500MHzQR.sof'
        -> Improved original Ellipse Generator demo where a second translucent superimposed video window scrolls at different coordinates and speeds generating an LSD trip visual effect.  (Note that the scroll switch needs to be turned on long enough to bounce off at least 1 window edge to view the effect.)

Check-on the 'Program/Configure' and click 'Start' to program.
The DECA's HDMI should output a 1080p image.


IMPORTANT NOTE:
If the picture is still or scrolling noise, just press buttons 0 or 1, or flip 'Switch 0' to enable drawing ellipses.  You just powered up the demo in frozen picture mode and you are looking at the powered up random blank memory.


Switch 0 = Enable/Disable drawing of ellipses.
Switch 1 = Enable/Disable screen scrolling.
Button 0 = Draw data from random noise generator.
Button 1 = Draw color image data from a binary counter.

Full Github v1.6 source code.
https://github.com/BrianHGinc/BrianHG-DDR3-Controller
« Last Edit: June 13, 2022, 08:46:23 pm by BrianHG »
 
The following users thanked this post: ale500

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.50.
« Reply #154 on: June 12, 2022, 03:23:46 am »
New VGA video system demo configured for up to 16 window layers driven by my RS232_Debugger.  :box:

Code: [Select]
//****************************************************************************************************************
//
// Demo documentation.
//
// BrianHG_DDR3_DECA_GFX_HWREGS_v16_16_LAYERS which test runs the BrianHG_DDR3_CONTROLLER_top_v16
// DDR3 controller with the BrianHG_GFX_VGA_Window_System_DDR3_REGS.
//
// Version 1.60, June 9, 2022.
//
// Written by Brian Guralnick.
// For public use.
// Leave questions in the https://www.eevblog.com/forum/fpga/brianhg_ddr3_controller-open-source-ddr3-controller/
//
//****************************************************************************************************************

A pre-built DECA compatible programming .sof file : BrianHG_DDR3_DECA_GFX_HWREGS_v16_16_LAYERS.sof should be used for this demo.

This demo requires a PC with a RS232 <-> 3.3v LVTTL converter and the use of my RS232 debugger to live edit window controls.
All necessary files are found in this project's sub-folder 'RS232_debugger'.

Wiring: On DECA PCB, connector P8.
    P8-Pin 2 - GND            <-> PC GND
    P8-Pin 4 - GPIO0_D[1] out --> PC LVTTL RXD
    P8-Pin 6 - GPIO0_D[3] in  <-- PC LVTTL TXD

See Readme.txt file in the .zip for full documentation.
See 'BrianHG_GFX_VGA_Window_System.txt' for address controls.
See 'BrianHG_GFX_VGA_Window_System.pdf' for system block diagram.
« Last Edit: June 12, 2022, 03:26:54 am by BrianHG »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #155 on: June 12, 2022, 07:32:56 pm »
As of today, full Github v1.6 source code has now been released:

https://github.com/BrianHGinc/BrianHG-DDR3-Controller

Expect 300MHz-350MHz builds to always meet timing requirements.  (Even with a slow fabric -8.)
Expect 400MHz builds to usually meet timing requirements with the occasional need to massage some compiler/fitter settings to aid in meeting timing requirements.  (Still easier than making Altera's paid Uniphy DDR3 controller achieve only 300MHz.)
Expect Cyclone III/IV -6 can meet timing requirements at 450MHz.  Even 500MHz is possible with heavy massaging of fitter settings.

Expect my BrianHG_GFX_VGA_Window_System to run up to 32 windows in 480p, 16 in 720p, 8 in 1080p.
Also expect my BrianHG_GFX_VGA_Window_System to automatically simplify down to minimal gates when lowering layers down to 1, disabling palette / font/tile modes and hard-wiring numerous video/window settings.
« Last Edit: June 13, 2022, 02:18:18 am by BrianHG »
 
The following users thanked this post: nockieboy

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #156 on: August 22, 2022, 06:36:18 pm »
I just ordered one of these (apparently new) Gowin 20k boards. I'm going to see if I can port your design across, which is going to be ... challenging ... because I know little about DDR3 and nothing at all about Gowin, but if I can get it working, having a pre-built DDR3/FPGA board with that many GPIO for that price in an easily-embeddable DIMM is going to be kinda useful :)
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #157 on: August 22, 2022, 07:11:26 pm »
Quote
I just ordered one of these (apparently new) Gowin 20k boards. I'm going to see if I can port your design across, which is going to be ... challenging ... because I know little about DDR3 and nothing at all about Gowin, but if I can get it working, having a pre-built DDR3/FPGA board with that many GPIO for that price in an easily-embeddable DIMM is going to be kinda useful :)

Begin with nothing more than implementing my simple 'BrianHG_DDR3_PHY_SEQ' controller.  It is half the size and once you got that working, implementing the multiport with everything else will be much easier as they have no special code and are only needed if you need 16 read/write ports.

So long as Gowin can deal with System Verilog, the only 2 HDL modules you will have to adapt will be:
BrianHG_DDR3_PLL.sv
BrianHG_DDR3_IO_PORT_ALTERA.sv

If Gowin uses or can use Modelsim, this will be a great help as I have already set all this up.  I still do recommend downloading Altera/Intel's free Quartus v20.1 (not v21.x) and at least installing that Modelsim, as it has Altera's PLL and DDR_IO libraries, so you can see what the original is supposed to look like; I created setup_xxx.do script files which simulate everything individually.  Don't worry, you may have multiple versions of Modelsim in your system at the same time.

This link: https://github.com/BrianHGinc/BrianHG-DDR3-Controller/tree/main/BrianHG_DDR3_DECA_PHY_SEQ_only_v16

Contains my simple stand alone 'BrianHG_DDR3_PHY_SEQ' controller wired to my RS232 debugger allowing you to view and edit the DDR3 memory contents from a PC with a LVTTL <-> RS232 com port, and it will report DDR3 tuning status and you can leave it running while updating your Gowin firmware.  Documentation is on my Github's read-me.

Step #1 would be to see if you can simulate a Gowin PLL with the phase step up and phase step down controls my DDR3 controller requires.  This means concentrating exclusively on replicating nothing more than my 'BrianHG_DDR3_PLL.sv' and its stand alone testbench with its 4 clock outputs.  If you are lucky, Gowin should provide their own PLL library functions in their own simulator.
« Last Edit: August 22, 2022, 07:39:35 pm by BrianHG »
 
The following users thanked this post: SpacedCowboy

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #158 on: August 23, 2022, 01:44:36 am »
Thanks Brian,

Quote
Step #1 would be to see if you can simulate a Gowin PLL with the phase step up and phase step down controls my DDR3 controller requires.  This means concentrating exclusively on replicating nothing more than my 'BrianHG_DDR3_PLL.sv' and it's stand alone testbench with it's 4 clock outputs.  If you are lucky, Gowin should provide their own PLL library functions in their own simulator.

So this just got harder..

'Gowin' and 'Simulator' seem to be words that do not exist in the same sentence, other than in sentences where "do not have one" also appears... And this is in the "licensed" (ie: please send us your email) version, not the 'educational' one.

There used to be an option in the IDE (I've seen screenshots!) where you could call into a 3rd party simulator, but that seems to have been removed (why ?!) There is an option to generate .vo "post-PnR simulation model files" in the options, but other than that there's sweet Fanny Adams to help out.

Clearly it'd be useful to have the same simulation environment as you're using, but I don't have Modelsim, and I can't find out how expensive it is... I sent off an email to the 'contact us' page at Siemens, but I have a bad feeling about software that doesn't advertise its price *anywhere*... Even my most recent eye-wateringly-expensive purchase (Altium) had advertised prices...

On the upside, it looks as though System Verilog (2017) is supported. And the boards I bought have shipped straight away, which is nice. It's been many (many!) moons since I could claim a student version of anything (and anyway it looks as though they've removed the free student edition for now), so on the down-side, I may be using icarus verilog or something similar...


 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #159 on: August 23, 2022, 02:21:30 am »
Quote
Clearly it'd be useful to have the same simulation environment as you're using, but I don't have Modelsim, and I can't find out how expensive it is... I sent off an email to the 'contact us' page at Siemens, but I have a bad feeling about software that doesn't advertise its price *anywhere*... Even my most recent eye-wateringly-expensive purchase (Altium) had advertised prices...

Modelsim is free.  It comes with Quartus 20.1 free web version and earlier.  (Includes Quartus Megafunction Libraries.)
It also comes with Lattice Diamond Design Software 3.12 and later.  (This version includes Lattice's library functions.)
I do not know about Gowin.

     If you do not include the appropriate -L xxxxx  on the compile line to include the vendor's library functions, then it's a basic Modelsim with over 90% functionality.  There are only 1 or 2 advanced post generation function views which aren't available unless you buy the full Modelsim, but these are available in Quartus and Lattice itself.

     I personally have begun to develop completely in Modelsim alone and then move my design to the FPGA tools, as Modelsim's compile/build time is usually within a second.

Try googling:
HDL modelsim gowin fpga
and
HDL Active-HDL gowin fpga

Active-HDL is a somewhat similar but cheaper experience than Modelsim.

Take a look at my https://github.com/BrianHGinc/SystemVerilog-TestBench-BPM-picture-generator as I made it work for both simulators.  Only difference is in the setup-xxx.do files.

You can look here:
https://www.intel.com/content/www/us/en/software-kit/661015/intel-quartus-prime-standard-edition-design-software-version-20-1-for-windows.html and click on 'Individual Files' where you will see Modelsim as a stand-alone download.

But the listed modelsim there only has the added Altera/Intel libraries, yet almost everything else works.
« Last Edit: August 23, 2022, 02:26:29 am by BrianHG »
 
The following users thanked this post: SpacedCowboy

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #160 on: August 23, 2022, 02:28:06 am »
A-ha! Ok, thanks :)

I knew about the bundled versions, but I’d assumed they were vendor-locked in some way. Well, that makes things simpler, at least to start with :)
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #161 on: August 23, 2022, 02:36:18 am »
If you can get Gowin to generate simulation libraries for their functions, like PLL and DDR IO buffers, you may be able to include those with Altera's Modelsim as a work around hack.
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #162 on: August 23, 2022, 02:47:31 am »
Quote
A-ha! Ok, thanks :)

I knew about the bundled versions, but I’d assumed they were vendor-locked in some way. Well, that makes things simpler, at least to start with :)

It's not that they are vendor locked; it's that you need to learn the command line stuff or use the menus instead of relying on the FPGA tool to 'auto setup and run your simulation'.
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #163 on: August 23, 2022, 03:51:36 am »
Quote
It's not that they are vendor locked, as you need to learn the command line stuff or use the menus instead of relying on the FPGA tool to 'auto setup and run your simulation'.

That's no problem, trust me  :-DD To me, 'vi' is a luxury, it was 'ed' when I started :) command-line tools are A-OK with me 
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4112
  • Country: nz
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #164 on: August 23, 2022, 04:14:19 am »
Quote
It's not that they are vendor locked, as you need to learn the command line stuff or use the menus instead of relying on the FPGA tool to 'auto setup and run your simulation'.

That's no problem, trust me  :-DD To me, 'vi' is a luxury, it was 'ed' when I started :) command-line tools are A-OK with me

Ed? Pffft.  I used to work on a system where the best editor was an even more obscure and limited variant of TECO called SPEED.
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #165 on: August 23, 2022, 05:02:36 am »
There was one time on a DECstation that had corrupted its usr partition, when I had to use only what was in /boot to get it to change how it booted. vi was in /usr/... :( head, cat and tail were in /bin... Took a while to get the boot config files how I wanted them. Worked in the end though :)

But I digress - sorry Brian, I'll keep it on-topic from now :)
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #166 on: August 27, 2022, 06:46:27 pm »
Brian, am I reading this correctly ? In your BrianHG_DDR3_PLL.sv code, it looks like the delay-shift for the Altera PLL's can take any number between 0 and 4000ps. Is that correct ?

Because looking at the Gowin version of what you can do with the PLL, there are 16 possible phase-tuning parameters (step of 22.5°), and another 16 possible delay parameters (step of 0.125ns):

Code: [Select]
///////////////////////////////////////////////////////////////////////////////
// Phase control values
// --------------------
// 0000 0°          0001 22.5°          0010 45°            0011 67.5°
// 0100 90°         0101 112.5°         0110 135°           0111 157.5°
// 1000 180°        1001 202.5°         1010 225°           1011 247.5°
// 1100 270°        1101 292.5°         1110 315°           1111 337.5°
//
// Duty cycle values
// -----------------
// 0010 2/16        0011 3/16           0100 4/16           0101 5/16
// 0110 6/16        0111 7/16           1000 8/16           1001 9/16
// 1010 10/16       1011 11/16          1100 12/16          1101 13/16
// 1110 14/16
//
// Delay parameters (below are in manual, looks like others work)
// ----------------
// 0111 0.875ns     1011 1.375ns        1101 1.625ns        1110 1.75ns
// 1111 1.875ns
//
///////////////////////////////////////////////////////////////////////////////

From what I can see, the duty-cycle is there to indicate when the falling edge of the signal should be in the waveform, which is dependent on the phase, so for a 50/50 duty cycle, I ought to just add 8 to the phase and take the modulus in base 16
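
A quick way to sanity-check the "add 8, modulo 16" idea is to tabulate it. This is only a sketch of the arithmetic described above; the helper name `duty_code_for_50_50` is made up, and the table quoted earlier only lists duty codes 0010 through 1110, so whether the wrapped values 1111, 0000 and 0001 (reached at phase codes 0111, 1000 and 1001) behave as expected on real hardware is not confirmed here.

```python
def duty_code_for_50_50(phase_code):
    """Duty-cycle code that puts the falling edge half a period (8/16) after
    the rising edge, following the 'add 8, modulo 16' rule described above."""
    if not 0 <= phase_code <= 15:
        raise ValueError("phase code must fit in 4 bits")
    return (phase_code + 8) % 16


for phase_code in range(16):
    degrees = phase_code * 22.5          # 22.5 degrees per phase step
    duty = duty_code_for_50_50(phase_code)
    print(f"phase {phase_code:04b} ({degrees:5.1f} deg) -> duty {duty:04b}")
```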

There are also 'fine-tuning' (±50ps, ±100ps, ±150ps) options for the phased-clock output, but those are parameter-based, not dynamically tunable.

But in any event, my number of phase-discriminating steps is going to be a *lot* less than the Altera one if it can do 4000 of them, hopefully it'll still be useful enough...
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #167 on: August 27, 2022, 07:57:36 pm »
For the fixed integer '270' deg, I specify the integer parameter DDR3_WDQ_PHASE in 'degrees'.  To tell Altera how to adjust its PLL, I needed to convert it into ps.  If your rPLL accepts the integer 270, use that one.  It is 50:50.  Ignore the Cyclone V PLL as it is a mess.  Better look at the CycloneIV/MAX10 PLL as it has fewer controls.

As for the read clock user phase tunable output, it is 50:50.  When the system begins or reset is sent, it defaults to '0' degree.  The Altera PLL will accept 16 tuning steps before a full 360deg round trip has been made.  It stays at 50:50.  (looks the same as Gowin)

The parameter DDR3_RDQ_PHASE is actually never used.  I always have it set to '0'.

The 'trick pll' should never be used for Gowin.  I use it to bypass a cap in one of Altera's DDR_IO buffers.

The localparam 'DDR3_WDQ_PHASE_ps' is a translation of the user set DDR3_WDQ_PHASE  into a picosecond delay.

Ignore the Altera dummy string as it circumvents a bug in the Altera alt_pll functions where they wanted 'string' inputs.
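
As a worked example of the degrees-to-picoseconds translation, here is a small Python sketch. The formula and both function names are assumptions for illustration, not copied from BrianHG_DDR3_PLL.sv, and the 400 MHz clock is just the 50 MHz x 32 / 4 case discussed in this thread. The second helper maps the same angle onto the 22.5 degree Gowin phase codes from the table quoted above.

```python
def phase_deg_to_ps(phase_deg, clk_khz):
    """Convert a phase offset in degrees into a delay in picoseconds
    for a clock of clk_khz kilohertz (integer arithmetic, rounded down)."""
    period_ps = 1_000_000_000 // clk_khz      # period in ps = 1e9 / (frequency in kHz)
    return period_ps * phase_deg // 360


def gowin_phase_code(phase_deg):
    """Nearest 4-bit phase code, assuming 22.5 degrees per code."""
    return round(phase_deg / 22.5) % 16


DDR3_WDQ_PHASE = 270                          # degrees, as specified by the user
print(phase_deg_to_ps(DDR3_WDQ_PHASE, 400000))            # delay in ps at 400 MHz
print(format(gowin_phase_code(DDR3_WDQ_PHASE), "04b"))    # matching 4-bit phase code
```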
« Last Edit: August 27, 2022, 08:07:44 pm by BrianHG »
 
The following users thanked this post: SpacedCowboy

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #168 on: August 27, 2022, 10:17:38 pm »
Ok, thanks Brian, I have both yours and my signals pretty much matching up initially. It looks as though the Gowin PLL takes longer to initialize. The internal 100MHz clock is slightly different - it's a cycle delayed (or ahead, I guess) of the Altera one but it's still in sync. Not sure if that's important because you've presumably got clock-crossing controls in place. Could alter some of the timing of those signals, though.


Presumably the phase_step signal is edge-triggered rather than level-triggered ? From the signaling, it certainly looks that way (see below). My WIP PLL module is going nuts at the moment, changing its phase on every (phase_sclk == 1'b1 and phase_step == 1'b1)...
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #169 on: August 27, 2022, 10:42:59 pm »
Edge or level triggered doesn't matter.  As long as a single step is made anytime after the step goes high.  (That is still a single step even if the step is held high.)
What is important is the direction and that there are 16 steps for a full rotation.

When my DDR3 controller applies a step, it waits for ~ 1us for the PLL output to adapt, hence I basically ignore the 'phase_done' signal by just waiting a crap load of time.

As for the clock out, the phases once set better stay where they belong, even in simulation.  Otherwise, the sim will fail over time.

I am not sure how your rPLL clock output cannot be exactly 400MHz if you set your reference clock divider and multiplier correctly.  For example, when I generate the requested:
Code: [Select]
parameter int        CLK_KHZ_IN              = 50000,          // PLL source input clock frequency in KHz.
parameter int        CLK_IN_MULT             = 32,             // Multiply factor to generate the DDR MTPS speed divided by 2.
parameter int        CLK_IN_DIV              = 4,              // Divide factor.  When CLK_KHZ_IN is 25000,50000,75000,100000,125000,150000, use 2,4,6,8,10,12.

and synth the clock:
Code: [Select]
localparam       period  = 500000000/CLK_KHZ_IN ;

always #period                  CLK_IN = !CLK_IN; // create source clock oscillator

All the factors in the equations and delays hit dead on whole numbers.

In modelsim under menu 'wave/wave preferences / Grid & Timescale', if I set the grid period to a manual 2500ps and zoom in & scroll in the waveform output, you can see the 400MHz stays locked to the 50MHz source.
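
The whole-number claim can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes the testbench timescale is 1 ps, so that the `#period` delay is in picoseconds, and the variable names other than the three parameters are made up.

```python
CLK_KHZ_IN = 50000   # PLL source clock in kHz
CLK_IN_MULT = 32     # multiply factor
CLK_IN_DIV = 4       # divide factor

ddr3_clk_khz = CLK_KHZ_IN * CLK_IN_MULT // CLK_IN_DIV   # PLL output clock in kHz
ddr3_period_ps = 1_000_000_000 // ddr3_clk_khz          # output clock period in ps
half_period_ps = 500_000_000 // CLK_KHZ_IN              # testbench toggle delay (assumes 1 ps timescale)
source_period_ps = 2 * half_period_ps                   # two toggles make one source clock period

# If a whole number of output cycles fits in each source period, the edges stay aligned.
cycles_per_source_period = source_period_ps // ddr3_period_ps
remainder_ps = source_period_ps % ddr3_period_ps
print(ddr3_clk_khz, ddr3_period_ps, half_period_ps, cycles_per_source_period, remainder_ps)
```

The output period matches the 2500ps grid setting, and the zero remainder is why the 400MHz edges line up with every 50MHz edge in the waveform.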


Do not worry about the initial PLL setup time.  I wait plenty of time for the PLL and other stuff to synchronize before running the system.  Also verify that Gowin provides a PLL locked signal out.  My DDR3 is held in reset during power-up until the locked signal is ready, then, there are a ton of other delays to accommodate the DDR3 startup sequence.
« Last Edit: August 27, 2022, 10:44:38 pm by BrianHG »
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #170 on: August 27, 2022, 11:08:41 pm »
Quote
Edge or level triggered doesn't matter.  As long as a single step is made anytime after the step goes high.  (That is still a single step even if the step is held high.)

Yep, I'm going to have to put some logic in there to wait for it to go low again - there is no "step" functionality in the Gowin PLLs, you just set the value for the phase directly - so I'm wrapping some logic around the step/updn signals to mimic the same interface.

Quote
What is important is the direction and that there are 16 steps for a full rotation.

I think you're actually running at 8 steps per full period. After four calls to phase_step, the DDR3_CLK_RDQ signal is 180° out of phase with the DDR3_CLK signal. I'm adding 2 to my 'out-of-16' phase value to match the Altera original (currently simulating the Cyclone V variant).

Quote
When my DDR3 controller applies a step, it waits for ~ 1us for the PLL output to adapt, hence I basically ignore the 'phase_done' signal by just waiting a crap load of time.

That's good, because I don't have a 'phase done' :) I was thinking I'd synthesize an interface to one but if you don't use it, that's easier :)

Quote
As for the clock out, the phases once set better stay where they belong, even in simulation.  Otherwise, the sim will fail over time.

I am not sure how your rPLL clock output cannot be exactly 400MHz if you set your reference clock divider and multiplier correctly.  For example, when I generate the requested:
Code: [Select]
parameter int        CLK_KHZ_IN              = 50000,          // PLL source input clock frequency in KHz.
parameter int        CLK_IN_MULT             = 32,             // Multiply factor to generate the DDR MTPS speed divided by 2.
parameter int        CLK_IN_DIV              = 4,              // Divide factor.  When CLK_KHZ_IN is 25000,50000,75000,100000,125000,150000, use 2,4,6,8,10,12.

and synth the clock:
Code: [Select]
localparam       period  = 500000000/CLK_KHZ_IN ;

always #period                  CLK_IN = !CLK_IN; // create source clock oscillator

All the factors in the equations and delays hit dead on whole numbers.

In modelsim under menu 'wave/wave preferences / Grid & Timescale', if I set the grid period to a manual 2500ps and zoom in & scroll in the waveform output, you can see the 400MHz stays locked to the 50MHz source.

Imprecise language here - the phases stay locked on, and the waves are exactly the correct periods/frequencies. What I meant was (if you look at 'initial-locked-state.png' two posts up), you can see

- the cursor is at 'gowin_clocks_locked' so we've just achieved stability.
- the red (DDR3_CLK and clk_ddrMain) signals are locked in sync
- same for the cyan DDR-write clocks (DDR3_CLK_WDQ and clk_ddrWrite)
- same for the blue DDR-read clocks (DDR3_CLK_RDQ and clk_ddrRead)
- same for the green client-interface clock (DDR3_CLK_50 and clk_ddrClient)

- the DDR3_CLK_25 and clk_ddrMgmt clocks both do have the same period/frequency, however the Gowin one (clk_ddrMgmt) is presenting a low->high (and high->low of course) transition one full DDR3_CLK period later than DDR3_CLK_25.

Quote
Do not worry about the initial PLL setup time.  I wait plenty of time for the PLL and other stuff to synchronize before running the system.  Also verify that Gowin provides a PLL locked signal out.  My DDR3 is held in reset during power-up until the locked signal is ready, then, there are a ton of other delays to accommodate the DDR3 startup sequence.

Sure, in the test bench the stop condition is now
Code: [Select]
always @(PLL_LOCKED & gowin_clocks_locked) #(endtime) $stop;            // Wait for both PLLs to start, then run the simulation until 1ms has been reached.

I figured there'd be a wait-for-the-signal to account any variation even in the Altera PLLs.

Cheers :)
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #171 on: August 27, 2022, 11:16:39 pm »
Quote
Edge or level triggered doesn't matter.  As long as a single step is made anytime after the step goes high.  (That is still a single step even if the step is held high.)

Yep, I'm going to have to put some logic in there to wait for it to go low again - there is no "step" functionality in the Gowin PLLs, you just set the value for the phase directly - so I'm wrapping some logic around the step/updn signals to mimic the same interface.


It will then be a 4 bit up-down counter.
To decode the step:

always @(posedge clk_in) begin
    step_dly <= step;                        // previous value of 'step'
    if (step && !step_dly) begin             // rising edge of 'step'
        if (up_dn) phase_pos <= phase_pos + 4'd1;
        else       phase_pos <= phase_pos - 4'd1;
    end
end

I bet that will do what you want, though I may have the +/- backwards.

A cleaner method would be to have 2 step_dly's, and in the if()  check for a true step_dly1 and a false step_dly2.
This way, the unknown clock domain source 'step' is first transferred to the 'clk_in' domain for better metastability.
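
The two-register variant can be tried out in software before committing it to HDL. The Python class below is only a behavioural sketch of that idea: the class name and method are invented for illustration, each call to `clock()` stands for one rising edge of clk_in, and the up/down direction may need swapping, as noted above.

```python
class PhaseStepDecoder:
    """Software model of a two-register rising-edge detector feeding a
    4-bit up/down phase counter. Call clock() once per clk_in rising edge."""

    def __init__(self):
        self.step_dly1 = 0   # first register: brings 'step' into the clk_in domain
        self.step_dly2 = 0   # second register: the previous value of step_dly1
        self.phase_pos = 0   # 4-bit phase position, 16 steps per rotation

    def clock(self, step, up_dn):
        # Mimic non-blocking assignment: decide using the values held before this edge.
        rising = self.step_dly1 == 1 and self.step_dly2 == 0
        if rising:
            delta = 1 if up_dn else -1
            self.phase_pos = (self.phase_pos + delta) % 16   # wrap like a 4-bit register
        self.step_dly2 = self.step_dly1
        self.step_dly1 = step
        return self.phase_pos


decoder = PhaseStepDecoder()
# Hold 'step' high for several clocks: only one increment should result.
for _ in range(5):
    decoder.clock(step=1, up_dn=1)
for _ in range(3):
    decoder.clock(step=0, up_dn=1)
print(decoder.phase_pos)
```

The cost of the extra register is one more clk_in cycle of delay before the step takes effect.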
« Last Edit: August 27, 2022, 11:23:31 pm by BrianHG »
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #172 on: August 28, 2022, 12:14:59 am »
You should be able to insert a parameter value 'Gowin' to the 'FPGA_VENDOR' parameter and add your PLL to my 'BrianHG_DDR3_PLL.sv'.

Then temporarily edit my 'BrianHG_DDR3_PHY_SEQ_v16_tb.sv' so that just the PLL's FPGA_VENDOR is set to Gowin.  Then execute my 'setup_phy_v16.do' script to see if the Gowin PLL will initialize the Micron DDR3 model.
If you need to touch-up your code, to re-compile, just execute 'run_phy_v16.do'.  (You will need to include Gowin's primitive lib in the setup_xxx.do)

If that works, then you can call that part a success.  Next, the DDR_IO buffers.
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #173 on: August 28, 2022, 12:28:09 am »
Yep, my logic looked pretty similar to your counter-style, if a little less concise :) ...
Code: [Select]
    reg     [3:0]       duty;               // Duty cycle, 50/50 = 8 + phase
    reg     [3:0]       phase;              // Phase to present to PLL
    reg                 lastPhaseStep;      // Look for the level-transition

    always @ (posedge clk) 
        begin
            if (rst)
                begin
                    phase           <= 4'b0;
                    duty            <= 4'h8;            // 50/50 at phase 0 (duty = 8 + phase)
                    lastPhaseStep   <= 1'b0;
                end
            else   
                begin
                    if ((phase_step == 1'b1) && (lastPhaseStep == 1'b0))
                        if (phase_updn == 1'b0)
                            begin
                                phase   <= phase - 4'h2;
                                duty    <= phase + 4'h6;    // current phase + (8-2)
                            end
                        else
                            begin
                                phase   <= phase + 4'h2;
                                duty    <= phase + 4'hA;    // current phase + (8+2)

                            end

                    lastPhaseStep <= phase_step;
                end
        end

I did originally have 'phaseEdge' as a 3-bit detector, and I was checking for 'phaseEdge' being 3'b011 but the clk this is synced off is the slow clkIn (~27MHz in the eventual design) and three clocks took an eternity. Now that I know you wait for 1µs, I might just put that back :)

Looking at it, I can make the phase a 3-bit counter as well, since the last bit is always 0.
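
The 3-bit observation is easy to confirm with a short Python model of the stepping above. The helper names are made up; the step size of 2 and the "+8" duty rule are taken from the module as posted.

```python
def step_phase(phase, up):
    """One tuning step as in the module above: the 4-bit phase moves by 2 (45 degrees)."""
    return (phase + (2 if up else -2)) % 16


def duty_for(phase):
    """50/50 duty code: falling edge 8/16 of a period after the phase."""
    return (phase + 8) % 16


phase = 0
count3 = 0                               # candidate 3-bit replacement counter
for step in range(1, 9):
    phase = step_phase(phase, up=True)
    count3 = (count3 + 1) % 8
    assert phase == count3 << 1          # low bit of phase is always 0
    print(step, f"{phase:04b}", phase * 22.5, f"{duty_for(phase):04b}")
```

After four steps the phase code is 1000, i.e. 180°, and after eight it is back at 0000, which is the 8-steps-per-period behaviour mentioned earlier in the thread.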

Anyway, it now matches up perfectly with the signals from your BrianHG_DDR3_PLL, apart from that phase offset between your DDR3_CLK_25 and my clk_ddrMgmt clock - and I don't *think* that'll be an issue. I'd have to burn another PLL to get the 180° phase-delay, so unless it turns out to be important, I'm going to leave it.

Quote
You should be able to insert a parameter value 'Gowin' to the 'FPGA_VENDOR' parameter and add your PLL to my 'BrianHG_DDR3_PLL.sv'.

Then temporarily edit my 'BrianHG_DDR3_PHY_SEQ_v16_tb.sv' so that just the PLL's FPGA_VENDOR is set to Gowin.  Then execute my 'setup_phy_v16.do' script to see if the Gowin PLL will initialize the Micron DDR3 model.
If you need to touch-up your code, to re-compile, just execute 'run_phy_v16.do'.  (You will need to include Gowin's primitive lib in the setup_xxx.do)

If that works, then you can call that part a success.  Next, the DDR_IO buffers.

I might try and get around to that this evening, else tomorrow. Life, I'm reliably informed by 'er indoors doesn't *entirely* revolve around playing with electronics :)
 

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: ca
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #174 on: August 28, 2022, 12:42:33 am »
My mistake, here is a snapshot of the tuning:



As you can see from the time bars, I pulse the phase_step for 20ns.
Then I wait 100ns before analyzing the read data, meaning the PLL's read clock needs to be ready in around 90ns.

These inherent delays can be adjusted in my DDR3 initialization sequence, though, 100ns is plenty.
 

BrianHG (Topic starter)
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #175 on: August 28, 2022, 12:57:52 am »
Here is the true operation tuning time when parameter 'SKIP_PUP_TIMER = 0'.  This is normal operation; however, it takes a few minutes to simulate, as Modelsim needs to simulate 0.2 seconds of real time now that we go through all the required power-up delays listed in the DDR3 specifications.



As you can see here, the 'phase_step' is pulsed for 2000ns.  There is then a further 3000ns before the read check takes place.  So, if you step at the rising edge of 'phase_step', your PLL needs to have a new valid output phase within 5000ns of the step.
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #176 on: August 28, 2022, 01:02:46 am »
As it stands it’s running off CLK_IN, and 90ns @ ~25MHz is approx 2 clocks. Since the PLLs are going to be well up and running before the phase change will occur, I could slave the logic off the 100MHz clock instead, then there’s plenty of time :)

And then I saw your 2nd post… 5000ns is indeed plenty :)
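
A rough sim-only sketch of that clock budget, assuming the 90ns and 5000ns windows quoted above and either a ~25MHz or a 100MHz logic clock. The module, function and constant names are made up for illustration:

```systemverilog
// Sketch only: how many whole logic clocks fit between the phase step and the read check.
module phase_window_sketch;
    // Assumed figures, taken from the posts above.
    localparam int FAST_WINDOW_NS = 90;      // fast-simulation snapshot
    localparam int SLOW_WINDOW_NS = 5000;    // SKIP_PUP_TIMER = 0
    localparam int CLK_SLOW_KHZ   = 25000;   // ~25 MHz CLK_IN
    localparam int CLK_FAST_KHZ   = 100000;  // 100 MHz alternative

    // whole clocks = window_ns * f_kHz / 1e6 (integer division rounds down)
    function automatic int clocks_in(input int window_ns, input int clk_khz);
        return (window_ns * clk_khz) / 1_000_000;
    endfunction

    initial begin
        $display("  90 ns @  25 MHz : %0d clocks", clocks_in(FAST_WINDOW_NS, CLK_SLOW_KHZ));
        $display("  90 ns @ 100 MHz : %0d clocks", clocks_in(FAST_WINDOW_NS, CLK_FAST_KHZ));
        $display("5000 ns @  25 MHz : %0d clocks", clocks_in(SLOW_WINDOW_NS, CLK_SLOW_KHZ));
        $display("5000 ns @ 100 MHz : %0d clocks", clocks_in(SLOW_WINDOW_NS, CLK_FAST_KHZ));
    end
endmodule
```

With these assumed numbers the four lines come out as 2, 9, 125 and 500 clocks, so only the 90ns case on the slow clock is tight.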
 

BrianHG (Topic starter)
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #177 on: August 28, 2022, 01:47:04 am »
My tuning section clk is actually hard tied to the DDR3_CK / 4 output.
 

SpacedCowboy
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #178 on: August 28, 2022, 05:01:41 am »
So I'm not 100% sure, but I think this looks good ?

I can see the WRITE commands, eg:

Code: [Select]
WRITE @ DQS= bank = 7 row = 0004 col = 00000000 data = 8888)
... corresponding to later READ data, eg:

Code: [Select]
READ @ DQS= bank = 7 row = 0004 col = 00000000 data = 8888
 

BrianHG (Topic starter)
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #179 on: August 28, 2022, 05:13:02 am »
Looks ok.  If the tuning wasn't doing anything, the sim would get stuck in an infinite loop during initialization.  No reads or writes would take place.

IE: Break the phase step signal to the PLL, make the PLL's input set to '0' and see if the sim will run.

Also, all you had to do was compare the output when running with 'FPGA_VENDOR = Altera' mode.

To be absolutely sure, I need to inspect the waveform during tuning.

If this checks out, then all you have left is the DDR_IO primitive to be inserted into my IO port file.
Your best bet is to read my code's Cyclone DDR_IO primitive and replicate in Gowin.  So long as you can specify the data input read clock and data output write clock, you should be able to get this to work.
(Oh, and don't forget proper .SDC file constraints.  The values I have may be different for Gowin on the output side as the delays have been tuned to maximize Altera's FMAX.)
« Last Edit: August 28, 2022, 05:16:59 am by BrianHG »
 

BrianHG (Topic starter)
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #180 on: August 28, 2022, 05:41:13 am »
Another test would be to invert the phase up/down input and see if the startup matches the Altera startup.
 

SpacedCowboy
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #181 on: August 28, 2022, 05:57:37 am »
Looks ok.  If the tuning wasn't doing anything, the sim would get stuck in an infinite loop during initialization.  No reads or writes would take place.

IE: Break the phase step signal to the PLL, make the PLL's input set to '0' and see if the sim will run.

Yep, if I set ".phase_step(1'b0)" in the BrianHG_DDR3_PLL (...) instantiation, I get an endless series of read data 0, read data 1 lines. Presumably the tuning.

Also, all you had to do was compare the output when running with 'FPGA_VENDOR = Altera' mode.

That's ... totally fair.


To be absolutely sure, I need to inspect the waveform during tuning.

So tomorrow, I'll fork your repo on GitHub into my 'Spaced-cowboy' GitHub account, and upload the code that I have here. Then it's a 'git clone' away.

If this checks out, then all you have left is the DDR_IO primitive to be inserted into my IO port file.
Your best bet is to read my code's Cyclone DDR_IO primitive and replicate in Gowin.  So long as you can specify the data input read clock and data output write clock, you should be able to get this to work.

This is at line 358 of BrianHG_DDR3_IO_PORT_ALTERA.sv, right ?

The DDR facilities of the Gowin parts seem a bit primitive compared to the Altera ones, and ODDR is a bit strange - there's a TX input that I haven't figured out yet - might be a clock-enable signal perhaps. The PDF documentation isn't what I'd call 'great' (see attached), but there's probably some use of DDR outputs in their example code that I can go look at and figure out what the parameters do.

(Oh, and don't forget proper .SDC file constraints.  The values I have may be different for Gowin on the output side as the delays have been tuned to maximize Altera's FMAX.)

And here do the dragons dwell....

Another test would be to invert the phase up/down input and see if the startup matches the Altera startup.

I'll give that a go tomorrow. Off to bed :)

Thanks for all the help Brian, this is actually going a lot smoother than I thought it would...
 

SpacedCowboy
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #182 on: August 28, 2022, 06:03:22 am »
Actually maybe ODDR doubles as IODDR, and TX sets the direction, since Q1 can connect to an IOBUF (which is bidirectional).
 

BrianHG (Topic starter)
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #183 on: August 28, 2022, 06:12:45 am »
Looks like the TX is the output enable.
Then you have the Q output and the Q for the OE.

something like :

assign pin_name = Q1 ? Q0 : 1'bz ;

Where as the DDR input primitive would also wire to 'pin_name'.

Anyways, look at my 'BrianHG_DDR3_IO_PORT_ALTERA.sv'.

Remove lines 228 through 358 while concentrating on lines 360 through 461.
(note that lines 228 through 358 use HW DIFFERENTIAL drivers for the DDR3_CK and DDR3 DQS pins.  360 through 461 uses a software emulated dumb differential driver.)

Altera documentation:
https://www.intel.com/programmable/technical-pdfs/683148.pdf
« Last Edit: August 28, 2022, 06:19:59 am by BrianHG »
 

SpacedCowboy
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #184 on: August 28, 2022, 05:27:35 pm »
To be absolutely sure, I need to inspect the waveform during tuning.

Ok, so the repo is forked and the code changes uploaded, the fork is here

Another test would be to invert the phase up/down input and see if the startup matches the Altera startup.

Did that this morning, and all the phases seem to align between the Gowin and Altera PLL instances :)

This video has me selecting phase_done (rising edge) as a convenient click-to-move-to-point, and one clock subsequent to that signal going high is when the Altera PLL has shifted to the new phase.
  • The dark blue trace is the DDR3_CLK_RDQ clock, and it aligns with the dark blue trace from the Gowin PLLs below (clk_ddrRead).
  • The 'phase' counter above is actually from the Gowin module and represents what is sent to the Gowin PLL as the phase to use.
 

BrianHG (Topic starter)
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #185 on: August 28, 2022, 08:03:08 pm »
To be absolutely sure, I need to inspect the waveform during tuning.

Ok, so the repo is forked and the code changes uploaded, the fork is here


That's nice and all, but why did you use Gowin's black box 'pll_ddr1 & 2' instead of instantiating 'rPLL' directly yourself?
Also, why didn't you feed into the rPLL my 'CLK_KHZ_IN/1000', 'CLK_IN_MULT-1' and 'CLK_IN_DIV-1' ?
Without those 3, you cannot change the frequency of my DDR3 controller.

(Can't see your video, it is a privileged page.)

*** Additional: Did you try a power-up sequence with the parameter 'SKIP_PUP_TIMER = 0'?  In this configuration, I pulse the phase_step really slowly, and there should be no DDR3 RST_N & CKE warnings from Micron's DDR3 model.  Note that this will take a few minutes to run and it will appear to be cycling through the configuration non-stop at the end before the read/write tests begin.
« Last Edit: August 28, 2022, 09:04:31 pm by BrianHG »
 

BrianHG (Topic starter)
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #186 on: August 28, 2022, 09:46:05 pm »
Thanks for all the help Brian, this is actually going a lot smoother than I thought it would...

Well, I designed my controller to work on bottom-end 20-year-old FPGAs, those without IO delay functions, relying 100% on PLL phase capabilities, plus multiple configurable, proper end-to-end simulation testbenches.  It's as easy as it can get, with the negative that you cannot stack too many DDR3s in parallel without dropping the clock rate or taking care with your trace lengths.  My guess is that I would feel safe with, at maximum, a single laptop DDR3 SODIMM module with 4 RAM chips, a 64-bit DDR3 bus running at 400MHz, 800 MTPS.  With 2 chips, I would not go beyond 600MHz, 1.2 GTPS.

Though, I do not know Gowin's IO port's capabilities.
« Last Edit: August 28, 2022, 09:48:13 pm by BrianHG »
 

SpacedCowboy
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #187 on: August 28, 2022, 09:58:12 pm »
There's a couple of reasons why I was using the drop-in IP from Gowin

  • I didn't think it made much difference - I wasn't aware of the significance of 'CLK_KHZ_IN/1000', 'CLK_IN_MULT-1' and 'CLK_IN_DIV-1' and that you were trying to make that a cross-architecture feature.

    You'd also said "Do not worry about how I managed/strong armed Altera's PLLs into doing what I like from provided parameters.  With Gowin, it is ok if you are left with no choice but to provide only a selection of source clocks and output clocks pre-generated by Gowin's IP tool" and I thought those parameters were part of that setup process.

  • It was also kind of handy, when I was re-generating the PLLs that it would just overwrite the same files and I'm done - just hit up-arrow/return in modelsim to see any change.

That said, now that I'm happier with the end-result, it's a reasonable ask. I'll see if I can figure out how to do it.

The video not being visible is weird - it's fine here on the internal network, my ISP must be doing something to filter external access because there's no protection on the page at all. I've converted it into a GIF, so perhaps it'll work when uploaded instead.

[edit] No such luck. I've loaded it onto Imgur, and if this doesn't work, sod it.

I had tried a sequence with  'SKIP_PUP_TIMER = 0', I did it last night, but maybe because it was late, I looked in BrianHG_DDR3_PHY_SEQ_v16.sv not BrianHG_DDR3_PHY_SEQ_v16_tb.sv for the parameter  :palm: ... I thought it was running faster than you'd mentioned.

So, running it with 'SKIP_PUP_TIMER = 0' for real this time, it does show a couple of warnings regarding RST_N - which presumably it oughtn't. The log is attached (as a .zip) but the offending lines are:

Code: [Select]
# ** Error: (vsim-8630) ddr3.v(542): Infinity results from division operation.
#
# ** Error: (vsim-8630) ddr3.v(543): Infinity results from division operation.
#
# ** Error: (vsim-8630) ddr3.v(544): Infinity results from division operation.
#
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.reset at time 7697500.0 ps WARNING: 200 us is required before RST_N goes inactive.
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task at time 8698750.0 ps WARNING: 500 us is required after RST_N goes inactive before CKE goes active.

So, something else to look at there.
« Last Edit: August 28, 2022, 10:13:04 pm by SpacedCowboy »
 

BrianHG (Topic starter)
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #188 on: August 28, 2022, 10:27:57 pm »
There's a couple of reasons why I was using the drop-in IP from Gowin

  • I didn't think it made much difference - I wasn't aware of the significance of 'CLK_KHZ_IN/1000', 'CLK_IN_MULT-1' and 'CLK_IN_DIV-1' and that you were trying to make that a cross-architecture feature.

    You'd also said "Do not worry about how I managed/strong armed Altera's PLLs into doing what I like from provided parameters.  With Gowin, it is ok if you are left with no choice but to provide only a selection of source clocks and output clocks pre-generated by Gowin's IP tool" and I thought those parameters were part of that setup process.

  • It was also kind of handy, when I was re-generating the PLLs that it would just overwrite the same files and I'm done - just hit up-arrow/return in modelsim to see any change.

That said, now that I'm happier with the end-result, it's a reasonable ask. I'll see if I can figure out how to do it.
Ok, when I said strong-armed, I meant the stupid 'localparam Altera_Dummy_String' I had to create as this is a parameter limitation bug in Altera's 20 year old pll primitive and their HDL design team's issues expecting a number encoded as a string embedded into a 64 bit integer, and in more than one place.  (Do not ask, I do not want to go into this BS hell.)

Yes, I did say to begin with the simple fixed 400MHz PLL; however, you now must tune Gowin's PLL to my clocks and multipliers.  The rest of my DDR3 controller uses these figures to tune all the DDR3 delays, like the number of clocks between RAS/CAS and the refresh clock cycle timing, based on the selected parameter 'DDR3_SPEED_GRADE'.  All my power-up timers also tune themselves based on these PLL settings.
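
As a rough illustration of why those figures matter, here is a sketch of how a power-up delay could be turned into a cycle count from the PLL parameters. It is not lifted from the controller source: the module name, the localparam names and the choice of a DDR3_CK / 4 timer clock are assumptions, and the 200 us / 500 us values are the ones Micron's model complains about in the warnings above.

```systemverilog
// Sketch only: deriving timer cycle counts from the PLL parameters.
module ddr3_timer_sketch #(
    parameter int CLK_KHZ_IN  = 50000,   // PLL source clock in kHz
    parameter int CLK_IN_MULT = 32,
    parameter int CLK_IN_DIV  = 4
);
    // 50000 * 32 / 4 = 400000 kHz DDR3 clock with the default parameters.
    localparam int DDR3_CK_KHZ = CLK_KHZ_IN * CLK_IN_MULT / CLK_IN_DIV;
    // Assumption: the timers are clocked from DDR3_CK / 4.
    localparam int TIMER_KHZ   = DDR3_CK_KHZ / 4;

    // cycles = microseconds * kHz / 1000, rounded up so the delay is never too short.
    localparam int RST_CYCLES  = (200 * TIMER_KHZ + 999) / 1000;  // 200 us before RST_N goes inactive
    localparam int CKE_CYCLES  = (500 * TIMER_KHZ + 999) / 1000;  // 500 us before CKE goes active

    initial begin
        $display("DDR3_CK    = %0d kHz", DDR3_CK_KHZ);
        $display("RST_CYCLES = %0d", RST_CYCLES);
        $display("CKE_CYCLES = %0d", CKE_CYCLES);
    end
endmodule
```

If the PLL is fixed at one frequency while these parameters say another, every count derived this way is off by the same ratio, which is the reason the PLL has to follow the parameters.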

Quote
The video not being visible is weird - it's fine here on the internal network, my ISP must be doing something to filter external access because there's no protection on the page at all. I've converted it into a GIF, so perhaps it'll work when uploaded instead.

I had tried a sequence with  'SKIP_PUP_TIMER = 0', I did it last night, but maybe because it was late, I looked in BrianHG_DDR3_PHY_SEQ_v16.sv not BrianHG_DDR3_PHY_SEQ_v16_tb.sv for the parameter  :palm: ... I thought it was running faster than you'd mentioned.

So, running it with 'SKIP_PUP_TIMER = 0' for real this time, it does show a couple of warnings regarding RST_N - which presumably it oughtn't. The log is attached (as a .zip) but the offending lines are:

Code: [Select]
# ** Error: (vsim-8630) ddr3.v(542): Infinity results from division operation.
#
# ** Error: (vsim-8630) ddr3.v(543): Infinity results from division operation.
#
# ** Error: (vsim-8630) ddr3.v(544): Infinity results from division operation.
#
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.reset at time 7697500.0 ps WARNING: 200 us is required before RST_N goes inactive.
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task at time 8698750.0 ps WARNING: 500 us is required after RST_N goes inactive before CKE goes active.


So, something else to look at there.

Looking at your attached .txt file, your sim begins at line 2548.  Reading from there:
Code: [Select]
# restart -force
# Loading sv_std.std
# Loading work.BrianHG_DDR3_PHY_SEQ_v16_tb
# Loading work.BrianHG_DDR3_PLL
# Loading work.BrianHG_DDR3_PHY_SEQ_v16
# Loading work.BrianHG_DDR3_GEN_tCK
# Loading work.BrianHG_DDR3_CMD_SEQUENCER_v16
# Loading work.gowin_ddr_clocking
# Loading work.pll_ddr1
# Loading work.rPLL
# Loading work.pll_ddr2
# Loading work.BrianHG_DDR3_IO_PORT_ALTERA
# ** Warning: (vsim-3017) BrianHG_DDR3_PLL.sv(548): [TFMPC] - Too few port connections. Expected 11, found 10.
#
#         Region: /BrianHG_DDR3_PHY_SEQ_v16_tb/DUT_DDR3_PLL/genblk6/gowin_ddr_clocks
# ** Warning: (vsim-3722) BrianHG_DDR3_PLL.sv(548): [TFMPC] - Missing connection for port 'fdly'.
#
# ** Warning: (vsim-3839) BrianHG_DDR3_PHY_SEQ_v16.sv(450): Variable '/BrianHG_DDR3_PHY_SEQ_v16_tb/DUT_PHY_SEQ/CMD_TXB', driven via a port connection, is multiply driven. See BrianHG_DDR3_PHY_SEQ_v16.sv(480).
#
#         Region: /BrianHG_DDR3_PHY_SEQ_v16_tb/DUT_PHY_SEQ
# ** Warning: (vsim-3839) BrianHG_DDR3_PHY_SEQ_v16.sv(148): Variable '/BrianHG_DDR3_PHY_SEQ_v16_tb/DUT_PHY_SEQ/SEQ_RDATA_VECT_OUT', driven via a port connection, is multiply driven. See BrianHG_DDR3_PHY_SEQ_v16.sv(480).
#
#         Region: /BrianHG_DDR3_PHY_SEQ_v16_tb/DUT_PHY_SEQ
# ** Warning: (vsim-3839) BrianHG_DDR3_PHY_SEQ_v16.sv(147): Variable '/BrianHG_DDR3_PHY_SEQ_v16_tb/DUT_PHY_SEQ/SEQ_RDATA', driven via a port connection, is multiply driven. See BrianHG_DDR3_PHY_SEQ_v16.sv(480).
#
#         Region: /BrianHG_DDR3_PHY_SEQ_v16_tb/DUT_PHY_SEQ
# ** Warning: (vsim-3839) BrianHG_DDR3_PHY_SEQ_v16.sv(146): Variable '/BrianHG_DDR3_PHY_SEQ_v16_tb/DUT_PHY_SEQ/SEQ_RDATA_RDY_t', driven via a port connection, is multiply driven. See BrianHG_DDR3_PHY_SEQ_v16.sv(480).
#
#         Region: /BrianHG_DDR3_PHY_SEQ_v16_tb/DUT_PHY_SEQ
# run -all
# ** Warning: *****************************
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 616
# ** Warning: *** BrianHG_DDR3_PLL Info ***
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 617
# ** Warning: *********************************************
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 618
# ** Warning: ***      CLK_IN           =    50 MHz.     ***
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 619
# ** Warning: ***      DDR3_RDQ/WDQ     =   800 MTPS.    ***
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 620
# ** Warning: ***      DDR3_CLK/RDQ/WDQ =   400 MHz.     ***
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 621
# ** Warning: ***      DDR3_WDQ_PHASE   =   270 degrees. ***
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 622
# ** Warning: *** True DDR3_WDQ_PHASE   =  1875 ps.     ***
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 623
# ** Warning: ***      DDR3_RDQ_PHASE   =     0 degrees. ***
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 624
# ** Warning: *** True DDR3_RDQ_PHASE   =  0000 ps.     ***
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 625
# ** Warning: ***      CMD_CLK          =   200 MHz.     ***
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 626
# ** Warning: ***      DDR3_CLK_50      =   200 MHz.     ***
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 627
# ** Warning: ***      DDR3_CLK_25      =   100 MHz.     ***
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 628
# ** Warning: *********************************************
#    Time: 0 ps  Scope: BrianHG_DDR3_PHY_SEQ_v16_tb.DUT_DDR3_PLL File: BrianHG_DDR3_PLL.sv Line: 629
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.file_io_open: at time                    0 WARNING: no +model_data option specified, using /tmp.
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file.0.
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file.1.
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file.2.
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file.3.
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file.4.
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file.5.
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file.6.
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file: at time 0 INFO: opening /tmp/BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.open_bank_file.7.
# ** Error: (vsim-8630) ddr3.v(542): Infinity results from division operation.
#
# ** Error: (vsim-8630) ddr3.v(543): Infinity results from division operation.
#
# ** Error: (vsim-8630) ddr3.v(544): Infinity results from division operation.
#
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task: at time 1205718750.0 ps INFO: Load Mode 2
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task: at time 1205718750.0 ps INFO: Load Mode 2 Partial Array Self Refresh = Bank 0-7
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task: at time 1205718750.0 ps INFO: Load Mode 2 CAS Write Latency =           5
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task: at time 1205718750.0 ps INFO: Load Mode 2 Auto Self Refresh = Disabled
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task: at time 1205718750.0 ps INFO: Load Mode 2 Self Refresh Temperature = Normal
# BrianHG_DDR3_PHY_SEQ_v16_tb.sdramddr3_0.cmd_task: at time 1205718750.0 ps INFO: Load Mode 2 Dynamic ODT = Disabled


Ok, the first 6 warnings should not be there.
The first 2 seem to be associated with your Gowin pll.

The next 4, for some reason, come from my code, and they should not.
Did you modify my test bench HDL?

The next set of warnings are just a print-out from my PLL.  I guess I should have used $display instead of $warning, though the $warning shows up better in Quartus' compiler.

The divide by zero error is in Micron's DDR3 model; I tend to ignore it.

Then we reach the load MRS2, without the RST_N or CKE required delay warning, and everything runs from there correctly.


Again, read your 'Gowin/pll_ddr#/pll_ddr#.v' to see what's inside those black boxes.
The only thing you might have trouble with is configuring the adjustable 270deg in PLL #2 based on my source parameter.

Also don't forget to pass my string  .FPGA_FAMILY    to     rpll_inst.DEVICE, and in your sims set my .FPGA_FAMILY to "GW2A-18".

(it looks like you will have to strong-arm Gowin's 'rpll_inst.PSDA_SEL = "1100";' as it looks to be a string, not a binary number, though it looks easy enough.)

From my system PLL module's parameter DDR3_WDQ_PHASE: multiply by 16, divide by 360, truncate to 4 bits, then turn the resulting 0 through 15 into a binary string: "0000", "0001", "0010", ...  Remember, this must all be done as localparams.
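
A minimal sketch of that conversion using localparams only. The module and localparam names are made up, and whether Gowin's rPLL simulation model accepts a packed vector in place of a quoted string for PSDA_SEL is an untested assumption:

```systemverilog
// Sketch only: DDR3_WDQ_PHASE in degrees -> 4-character binary string.
module psda_sel_sketch #(
    parameter int DDR3_WDQ_PHASE = 270            // degrees; assumed non-negative
);
    // 270 * 16 / 360 = 12; the modulo keeps 360 degrees and above inside 0..15.
    localparam int         PSDA_INT  = (DDR3_WDQ_PHASE * 16 / 360) % 16;
    localparam logic [3:0] PSDA_BITS = 4'(PSDA_INT);

    // A Verilog string is a packed vector of 8-bit characters, MSB first.
    localparam logic [8*4-1:0] PSDA_STR = {
        (PSDA_BITS[3] ? "1" : "0"),
        (PSDA_BITS[2] ? "1" : "0"),
        (PSDA_BITS[1] ? "1" : "0"),
        (PSDA_BITS[0] ? "1" : "0")
    };

    // Intended use (untested assumption): defparam rpll_inst.PSDA_SEL = PSDA_STR;
    initial $display("PSDA_SEL = \"%s\"", PSDA_STR);
endmodule
```

For the default 270 degrees this gives 12, i.e. "1100", which matches the PSDA_SEL value quoted above.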
« Last Edit: August 28, 2022, 10:48:28 pm by BrianHG »
 

SpacedCowboy
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #189 on: August 28, 2022, 11:07:07 pm »

Ok, the first 6 warnings should not be there.
The first 2 seem to be associated with your Gowin pll.

Yep - it's the fdly [3:0] input to gowin_ddr_clocking. Apparently I forgot to take out the declaration of the input port at the top of the gowin_ddr_clocking module. We're never changing the delay on the output port, so I'm hardwiring those to 4'b0 and there was no need to pass it in. Fixed locally.

The next 4, for some reason, come from my code, and they should not.
Did you modify my test bench HDL?

Not that I'm aware of. The only changes that ought to have been made in the test bench are at the top of the file wrt the PLL instance. You can see the diffs here on GitHub

Running 'git diff' on the local version of what's at GitHub just shows me the line where I set SKIP_PUP_TIMER to 0, so I don't think there are any changes. Could be a Modelsim version? I'm running
Code: [Select]

ModelSim ALTERA STARTER EDITION 10.3c

Revision: 2014.09
Date: Sep 20 2014



The next set of warnings are just a print-out from my PLL.  I guess I should have used $display instead of $warning, though the $warning shows up better in Quartus' compiler.

The divide by zero error is in Micron's DDR3 model; I tend to ignore it.

Then we reach the load MRS2, without the RST_N or CKE error, and everything runs from there correctly.
:-+

Again, read your 'Gowin/pll_ddr#/pll_ddr#.v' to see what's inside those black boxes.
The only thing you might have trouble with is configuring the adjustable 270deg in PLL #2 based on my source parameter.

Also don't forget to pass my string  .FPGA_FAMILY    to     rpll_inst.DEVICE, and in your sims set my .FPGA_FAMILY to "GW2A-18".

Yep, but if I'm varying the input clock, I'm presumably going to have to do maths on the parameters to get the various divisors set up. Shouldn't be impossible, just means I need a better understanding of what your constants mean and how the PLL parameters work to configure it than I do right now.

Speaking of which:
Code: [Select]
parameter int        CLK_KHZ_IN              = 50000,          // PLL source input clock frequency in KHz.
parameter int        CLK_IN_MULT             = 32,             // Multiply factor to generate the DDR MTPS speed divided by 2.
parameter int        CLK_IN_DIV              = 4,              // Divide factor.  When CLK_KHZ_IN is 25000,50000,75000,100000,125000,150000, use 2,4,6,8,10,12.

The first and third are fairly obvious, but the second seems to scale the wrong way. 50 x 32 = 1600. Isn't the DDR3 clock running at 400 MHz, so with DDR it is 800 MT/s, which makes 1600 x2 rather than /2? Maybe it's just perspective, and it probably doesn't matter as long as it's consistent :)
 
So I think what you're saying is that from the above 3 figures, I ought to be deriving any of the numerical PLL parameters such as FBDIV_SEL in
Code: [Select]
defparam rpll_inst.FCLKIN = "50";
defparam rpll_inst.DYN_IDIV_SEL = "false";
defparam rpll_inst.IDIV_SEL = 0;
defparam rpll_inst.DYN_FBDIV_SEL = "false";
defparam rpll_inst.FBDIV_SEL = 7;
defparam rpll_inst.DYN_ODIV_SEL = "false";
defparam rpll_inst.ODIV_SEL = 2;
defparam rpll_inst.PSDA_SEL = "0000";
defparam rpll_inst.DYN_DA_EN = "true";
defparam rpll_inst.DUTYDA_SEL = "1000";
defparam rpll_inst.CLKOUT_FT_DIR = 1'b1;
defparam rpll_inst.CLKOUTP_FT_DIR = 1'b1;
defparam rpll_inst.CLKOUT_DLY_STEP = 0;
defparam rpll_inst.CLKOUTP_DLY_STEP = 0;
defparam rpll_inst.CLKFB_SEL = "internal";
defparam rpll_inst.CLKOUT_BYPASS = "false";
defparam rpll_inst.CLKOUTP_BYPASS = "false";
defparam rpll_inst.CLKOUTD_BYPASS = "false";
defparam rpll_inst.DYN_SDIV_SEL = 2;
defparam rpll_inst.CLKOUTD_SRC = "CLKOUT";
defparam rpll_inst.CLKOUTD3_SRC = "CLKOUT";
defparam rpll_inst.DEVICE = "GW2A-18";


which, as I said, doesn't look impossible. And certainly updating the device to be passed through is fine too.

Is there any difference between doing it as "type (parameters) instance (signals)" rather than as "type instance (signals) defparams" ?
 

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #190 on: August 28, 2022, 11:20:48 pm »
Yep, but if I'm varying the input clock, I'm presumably going to have to do maths on the parameters to get the various divisors set up. Shouldn't be impossible, just means I need a better understanding of what your constants mean and how the PLL parameters work to configure it than I do right now.

What math?
I gave you the math, you just had to...

defparam rpll_inst.FCLKIN = (CLK_KHZ_IN /1000);
defparam rpll_inst.IDIV_SEL = (CLK_IN_DIV-1);
defparam rpll_inst.FBDIV_SEL = (CLK_IN_MULT-1);

How hard was that?
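
For anyone following along, here is a minimal Python sketch (a model of the arithmetic only, not HDL and not code from the repo) of the mapping above. It assumes the rPLL output follows FOUT = FCLKIN * (FBDIV_SEL + 1) / (IDIV_SEL + 1), which is consistent with the numbers quoted in this thread; VCO range limits and ODIV_SEL are ignored, and the function names are made up for illustration.

```python
def rpll_params(clk_khz_in, clk_in_mult, clk_in_div):
    """Map the controller's clock parameters onto rPLL-style fields."""
    return {
        "FCLKIN": clk_khz_in // 1000,   # input clock in MHz
        "IDIV_SEL": clk_in_div - 1,     # input divider, stored minus one
        "FBDIV_SEL": clk_in_mult - 1,   # feedback multiplier, stored minus one
    }


def rpll_fout_mhz(params):
    """Assumed relation: FOUT = FCLKIN * (FBDIV_SEL + 1) / (IDIV_SEL + 1)."""
    return params["FCLKIN"] * (params["FBDIV_SEL"] + 1) / (params["IDIV_SEL"] + 1)


# Values derived from CLK_KHZ_IN=50000, CLK_IN_MULT=32, CLK_IN_DIV=4
derived = rpll_params(50000, 32, 4)
print(derived, rpll_fout_mhz(derived))

# Values the GUI generated for the same 50 MHz input (IDIV_SEL=0, FBDIV_SEL=7)
generated = {"FCLKIN": 50, "IDIV_SEL": 0, "FBDIV_SEL": 7}
print(generated, rpll_fout_mhz(generated))
```

Both parameter sets land on the same output frequency under that assumption, which is why the GUI's 50/8/1 and the derived 50/32/4 are interchangeable here.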

Also:

localparam string gowin_phase[0:15] ='{"0000","0001","0010",...,"1111"};
localparam  phase_set = (DDR3_WDQ_PHASE * 16 / 360);

and
defparam rpll_inst.PSDA_SEL = gowin_phase[phase_set[3:0]];

Only that you might need to remove the 'string' in the first localparam, as this is the same heavy-handed manipulation I had to use to make Altera's PLL work, since they have some input parameters which aren't true integers or strings.
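
To see what the lookup should produce, here is a rough Python model (not HDL) of the phase calculation. It assumes the table entries are simply the 4-bit binary encoding of the index, as the "0000","0001","0010" pattern above suggests; the function name is hypothetical.

```python
def gowin_phase_string(phase_deg):
    """Return the 4-character binary string selected for a phase in degrees."""
    phase_set = (phase_deg * 16) // 360    # same integer maths as the localparam
    return format(phase_set & 0xF, "04b")  # keep the low 4 bits, like phase_set[3:0]


for deg in (0, 90, 180, 270):
    print(deg, gowin_phase_string(deg))
```

With 16 steps per clock period each step is 22.5 degrees, so 90 and 270 degrees fall exactly on steps 4 and 12.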

You will need to test everything in the first simple PLL testbench.

Quote
Is there any difference between doing it as "type (parameters) instance (signals)" rather than as "type instance (signals) defparams" ?
No difference.

If you like, you may find the greatest common divisor of CLK_IN_DIV & CLK_IN_MULT and divide both by that number.  I always prefer having CLK_IN_DIV set to at least 2, as this makes a true 50:50 source reference: the rise and fall times of your source crystal oscillator are now ignored, and you are relying only on the rise-to-next-rise of the crystal feeding your PLL source.
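
A small Python sketch of that reduction, for illustration only: it divides the multiplier and divider by the largest common factor that still leaves the divider at or above a minimum (2 by default, per the preference above). The function name and the min_div argument are made up for this example.

```python
from math import gcd


def reduce_ratio(mult, div, min_div=2):
    """Reduce mult/div by a common factor while keeping div >= min_div."""
    g = gcd(mult, div)
    # Step down until g divides both values and the reduced divider is large enough
    while g > 1 and (div // g < min_div or mult % g or div % g):
        g -= 1
    return mult // g, div // g


print(reduce_ratio(32, 4))             # 50 MHz input
print(reduce_ratio(32, 8))             # 100 MHz input
print(reduce_ratio(32, 4, min_div=1))  # fully reduced, no minimum on the divider
```

Dropping the minimum to 1 reproduces the 8/1 pair the Gowin GUI picked for a 50 MHz input.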
« Last Edit: August 28, 2022, 11:47:02 pm by BrianHG »
 

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #191 on: August 28, 2022, 11:31:49 pm »
Clicking on 'Help / About Modelsim' gives me this:

 

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #192 on: August 28, 2022, 11:36:32 pm »
Looking at the:
defparam rpll_inst.FCLKIN = "50";

Notice that the '50' is in quotes.
It may be another one of those BS fake string things.
We may need to strong-arm a phony string for that one too.

(If this is it, it isn't too bad.  Not until you take a look at LATTICE, they actually rely on some figures written in their 'COMMENTS', yes, inside their /* xxx=xxxMHz */ to make their PLL primitive design compile properly.  WTF?  How.  What kind of backwater design is this?)
« Last Edit: August 28, 2022, 11:47:47 pm by BrianHG »
 

Offline SpacedCowboy

  • Frequent Contributor
  • **
  • Posts: 292
  • Country: gb
  • Aging physicist
Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #193 on: August 29, 2022, 12:00:07 am »
What math?
I gave you the math, you just had to...

defparam rpll_inst.FCLKIN = (CLK_KHZ_IN /1000);
defparam rpll_inst.IDIV_SEL = (CLK_IN_DIV-1);
defparam rpll_inst.FBDIV_SEL = (CLK_IN_MULT-1);

How hard was that?

Ok, I think the confusion is because those figures don't match what the tool produces for a 50MHz input clock (you can see the generated ones in my last post, it's 50/8/1 not 50/32/4) so it wasn't clear that you'd done it already, I thought you were just referring to some code of your own. Looking at the formula, and at the values (50,32,4) yours will of course produce a 400MHz clock, assuming there's no parameter-out-of-range in some in-the-middle calculation.


Also:

localparam string gowin_phase[0:15] ='{"0000","0001","0010",...,"1111"};
localparam  phase_set = (DDR3_WDQ_PHASE * 16 / 360);

and
defparam rpll_inst.PSDA_SEL = gowin_phase[phase_set[3:0]];

Only that you might need to remove the 'string' in the first localparam, as this is the same heavy-handed manipulation I had to use to make Altera's PLL work, since they have some input parameters which aren't true integers or strings.

You will need to test everything in the first simple PLL testbench.

Yep, the "joys" of stringification are heretofore unknown to me. I'm sure it'll be fun.

If you like, you may find the greatest common divisor of CLK_IN_DIV & CLK_IN_MULT and divide both by that number.  I always prefer having CLK_IN_DIV set to at least 2, as this makes a true 50:50 source reference: the rise and fall times of your source crystal oscillator are now ignored, and you are relying only on the rise-to-next-rise of the crystal feeding your PLL source.

That makes sense.

Looks like my modelsim is a little out of date as well - I can rectify that.

Looking at the:
defparam rpll_inst.FCLKIN = "50";

Notice that the '50' is in quotes.
It may be another one of those BS fake string things.
We may need to strong-arm a phony string for that one too.

(If this is it, it isn't too bad.  Not until you take a look at LATTICE, they actually rely on some figures written in their 'COMMENTS', yes, inside their /* xxx=xxxMHz */ to make their PLL primitive design compile properly.  WTF?  How.  What kind of backwater design is this?)

Is there really no function in Verilog that accepts an integer and converts it to a string so it can be passed to a parameter? Seems like a great potential extension to $sformat(), or some compile-time equivalent if it doesn't do it now.

And yes, that is nuts from Lattice. They also don't let you run Modelsim on a PC, serving the display to the nice big 3-monitor Mac using Microsoft Remote Desktop. They're just fine with Linux serving the display over X11 though...
 

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #194 on: August 29, 2022, 12:00:14 am »

Yep, but if I'm varying the input clock, I'm presumably going to have to do maths on the parameters to get the various divisors set up. Shouldn't be impossible, just means I need a better understanding of what your constants mean and how the PLL parameters work to configure it than I do right now.

Speaking of which:
Code: [Select]
parameter int        CLK_KHZ_IN              = 50000,          // PLL source input clock frequency in KHz.
parameter int        CLK_IN_MULT             = 32,             // Multiply factor to generate the DDR MTPS speed divided by 2.
parameter int        CLK_IN_DIV              = 4,              // Divide factor.  When CLK_KHZ_IN is 25000,50000,75000,100000,125000,150000, use 2,4,6,8,10,12.
The first and third are fairly obvious, but the second seems to scale the wrong way. 50 x 32 = 1600. Isn't the DDR3 clock running at 400 MHz, so with DDR it is 800 MT/s, which makes 1600 x2 rather than /2? Maybe it's just perspective, and it probably doesn't matter as long as it's consistent :)
 
:palm:  Is this what you were asking? FOUT = CLK_KHZ_IN * CLK_IN_MULT / CLK_IN_DIV.
50 MHz * 32 / 4 = 400 MHz DDR3 CK.

So, if you want a 25 MHz source, just change CLK_IN_DIV to 2 and you will get the same result.
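
As a worked check of that formula, a few lines of Python (illustrative only; the function name is made up) run through some of the input clocks listed in the CLK_IN_DIV comment and also show the resulting transfer rate, assuming two transfers per CK cycle:

```python
def ddr3_ck_khz(clk_khz_in, clk_in_mult, clk_in_div):
    """FOUT = CLK_KHZ_IN * CLK_IN_MULT / CLK_IN_DIV, in kHz."""
    return clk_khz_in * clk_in_mult // clk_in_div


for khz_in, div in ((25000, 2), (50000, 4), (100000, 8)):
    ck = ddr3_ck_khz(khz_in, 32, div)
    # DDR moves data on both clock edges, so MT/s is twice the CK frequency in MHz
    print(khz_in, "kHz in, DIV", div, "->", ck, "kHz CK,", 2 * ck // 1000, "MT/s")
```

Every pairing gives the same 400 MHz CK, so the multiplier of 32 sets the CK frequency (half the MT/s figure) once the divider has normalised the input clock.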
« Last Edit: August 29, 2022, 12:05:21 am by BrianHG »
 

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #195 on: August 29, 2022, 12:04:13 am »
Is there really no function in Verilog that accepts an integer and converts it to a string so it can be passed to a parameter? Seems like a great potential extension to $sformat(), or some compile-time equivalent if it doesn't do it now.

My quote:
Quote
Ok, when I said strong-armed, I meant the stupid 'localparam Altera_Dummy_String' I had to create as this is a parameter limitation bug in Altera's 20 year old pll primitive and their HDL design team's issues expecting a number encoded as a string embedded into a 64 bit integer, and in more than one place.  (Do not ask, I do not want to go into this BS hell.)
 

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #196 on: August 29, 2022, 12:05:11 am »
Only got as far as the first two lines. I'm going to stop talking now, it's just getting embarrassing.

Ok, when I said strong-armed, I meant the stupid 'localparam Altera_Dummy_String' I had to create as this is a parameter limitation bug in Altera's 20 year old pll primitive and their HDL design team's issues expecting a number encoded as a string embedded into a 64 bit integer, and in more than one place.  (Do not ask, I do not want to go into this BS hell.)

 :-DD
 

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #197 on: August 30, 2022, 01:56:18 am »
This `ifdef shouldn't be necessary: Gowin/gowin_ddr_clocking.sv#L212

Did you try renaming the word 'string' to 'int' on this line: Gowin/gowin_ddr_clocking.sv#L65

This is what I had to do for Altera's PLL: BrianHG_DDR3/BrianHG_DDR3_PLL.sv#L193

And right after, I had to: BrianHG_DDR3/BrianHG_DDR3_PLL.sv#L278

Then I sent the 'DDR3_WDQ_PHASE_pss' into the PLL's parameters.
(Obviously, you would choose your own names.)

The key factor in making both ModelSim and Quartus Prime happy when they compile is that my dummy string array = '{"xxxx"} was a 'localparam int', and I made a new 'localparam param_to_be_sent' = dummy_string_array[sel#];.

If I did not go through this, then even though ModelSim would accept the numbers for simulation, Quartus Prime would compile away without saying a thing, silently accepting a default "0" for that parameter.
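
Some background on why a string can sit in an 'int' at all: a Verilog string literal is a packed vector of 8-bit ASCII codes, first character in the most significant byte. The Python below only illustrates that packing (it is not the repo's code, the helper names are invented, and whether a given tool accepts such a value for a particular primitive parameter still has to be tested).

```python
def verilog_string_as_int(text):
    """Pack ASCII characters into one integer, first character in the top byte."""
    return int.from_bytes(text.encode("ascii"), "big")


def int_as_verilog_string(value, length):
    """Unpack an integer back into its ASCII characters."""
    return value.to_bytes(length, "big").decode("ascii")


packed = verilog_string_as_int("1100")
print(hex(packed))                       # 0x31313030
print(int_as_verilog_string(packed, 4))  # 1100
```

Four characters fill exactly 32 bits, which is why a 4-character phase string fits a 32-bit int; longer strings need a wider type.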

------
Note, for generating your: Gowin/gowin_ddr_clocking.sv#L191 as a string, just copy & rename my int Altera_Dummy_String, except, trim it to 0-255 as this will probably be the frequency range input.

I have a feeling the "xxx" for Gowin's frequency input may be there because they might accept fractions, like "14.31818".  This might make things messy.  Not all FPGA compilers can accommodate 'real' floating point numbers.  Anyway, you will need to compile a simple PLL clock in Gowin to see if everything works out OK.
« Last Edit: August 30, 2022, 02:04:48 am by BrianHG »
 
The following users thanked this post: SpacedCowboy

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #198 on: August 30, 2022, 05:34:21 am »
This `ifdef shouldn't be necessary: Gowin/gowin_ddr_clocking.sv#L212

Did you try renaming the word 'string' to 'int' on this line: Gowin/gowin_ddr_clocking.sv#L65

This is what I had to do for Altera's PLL: BrianHG_DDR3/BrianHG_DDR3_PLL.sv#L193

And right after, I had to: BrianHG_DDR3/BrianHG_DDR3_PLL.sv#L278

Then I sent the 'DDR3_WDQ_PHASE_pss' into the PLL's parameters.
(Obviously, you would choose your own names.)

The key factor in making both ModelSim and Quartus Prime happy when they compile is that my dummy string array = '{"xxxx"} was a 'localparam int', and I made a new 'localparam param_to_be_sent' = dummy_string_array[sel#];.

If I did not go through this, then even though ModelSim would accept the numbers for simulation, Quartus Prime would compile away without saying a thing, silently accepting a default "0" for that parameter.

Worked like a charm  :-+ - no synthesis errors and modelsim is happy too.

[edit] And having said that, and posted it, I immediately noticed that the write-clock duty-cycle isn't working, it's always falling at 0 now, not at (phase + 8/16). I'll look at that...

------
Note, for generating your: Gowin/gowin_ddr_clocking.sv#L191 as a string, just copy & rename my int Altera_Dummy_String, except, trim it to 0-255 as this will probably be the frequency range input.

I have a feeling the "xxx" for Gowin's frequency input may be there because they might accept fractions, like "14.31818".  This might make things messy.  Not all FPGA compilers can accommodate 'real' floating point numbers.  Anyway, you will need to compile a simple PLL clock in Gowin to see if everything works out OK.

I'm not actually having any problems with the input frequency for the clock; both simulation and synthesis seem quite happy with the expression as-is. I did back-port the gowin_ddr_clocking changes to the PLL test project and compared the ModelSim output for the Altera PLL with the Gowin one, and they match up (see attached example).

The timing report in the Gowin synthesis seems to indicate that it understood the ask for the PLL outputs - again, see attached. I think this is a good indicator it's doing "the right thing".
« Last Edit: August 30, 2022, 05:37:39 am by SpacedCowboy »
 

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #199 on: August 30, 2022, 05:54:41 am »
Umm, shouldn't the 133MHz be 100MHz?
Maybe a typo in the divider.
Or if it's another output, maybe set it to 100MHz and place it on the first PLL reserving the write clock to PLL2.
(This is a plus in the future as you may use the 'delay' output primitive for the write data, removing the need for the second PLL altogether.  Though, get my DDR3 working first as intended.)

Also, for the duty cycle, why not just hard-code it in the parameter line, just as if it came from the rPLL GUI generator?

Next, take a look at IO buffer primitives, then, the DDR primitives to drive those buffers.

DDR3_CK needs a differential LVDS output.  (May also be 2 output pin buffers, but this is lower quality.)
DDR3_DQS needs a differential LVDS bidirectional.  (May also be 2 bidir pin buffers, but this is lower quality.)
« Last Edit: August 30, 2022, 05:58:35 am by BrianHG »
 

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #200 on: August 30, 2022, 06:09:30 am »
That's better. Fixed the phase. I'll upload the changes in the morning (to the quick-test) and copy gowin_ddr_clocking.sv over to the full repo after I've tested it out there too. It ought to just drop in, but ... famous last words...
 

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #201 on: August 30, 2022, 06:22:36 am »
Umm, shouldn't the 133MHz be 100MHz?
Maybe a typo in the divider.

Yeah, they worried me too at first, but it's just the CLKOUTD3, a fixed division of CLKOUT by 3. Notice there are 8 clocks listed there, and I only want 5 of them. I burn a zero-phase-offset 400 MHz output (because I get 2 of them) and I get two 133s "for free" as the /3 of the 400s.

Or if it's another output, maybe set it to 100MHz and place it on the first PLL reserving the write clock to PLL2.
(This is a plus in the future as you may use the 'delay' output primitive for the write data, removing the need for the second PLL altogether.  Though, get my DDR3 working first as intended.)

Sadly, there's no configuration for that output - it's /3 and that's your lot.


Also, for the duty cycle, why not just hard-code it in the parameter line, just as if it came from the rPLL GUI generator?
There's a comment in the code about the write-clock being 90 or 270 degrees out of phase. Maybe it's always 270? Anyway, it doesn't take much: the phase table is only 16 entries long, so I just created another 16-entry table for the duty cycle and offset it by the phase.
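
Roughly what the second table works out to, as a Python model rather than the actual HDL: it assumes the duty-cycle setting is the phase setting plus half a period (8 of 16 steps), wrapped modulo 16, which matches the PSDA_SEL = "0000" / DUTYDA_SEL = "1000" pair in the GUI-generated file. The function name is made up.

```python
def phase_and_duty(phase_deg):
    """Return (PSDA_SEL, DUTYDA_SEL)-style strings for a 50% duty clock."""
    phase_set = (phase_deg * 16 // 360) % 16  # rising-edge position in 1/16 steps
    duty_set = (phase_set + 8) % 16           # falling edge half a period later
    return format(phase_set, "04b"), format(duty_set, "04b")


for deg in (0, 90, 270):
    print(deg, phase_and_duty(deg))
```

For the 270 degree write clock the falling-edge entry wraps around to step 4, which is why a fixed "1000" only works at zero phase.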

Next, take a look at IO buffer primitives, then, the DDR primitives to drive those buffers.

DDR3_CK needs a differential LVDS output.  (May also be 2 output pin buffers, but this is lower quality.)
DDR3_DQS needs a differential LVDS bidirectional.  (May also be 2 bidir pin buffers, but this is lower quality.)

Yep, I've been looking at that a little. There's not too much to configure for DDR - I don't see any Differential DDR for example, the tool offers as below. I've done (this is all pre-testing) the command bus signals, which are relatively straightforward, and I implemented the n{CK} as 2 output buffers each with complementary drive. I'm not sure I have much in the way of options here...
 

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #202 on: August 30, 2022, 06:28:20 am »
The DDR doesn't drive the PIN, it is the IO buffers which drive the pins.
The DDR <-> IO buffers <-> pin.
Read your Gowin documentation and you will understand the DDR primitive outputs, where they are tied to.
All the different types of IO buffers are right at the beginning of the primitives.

(I know Altera stuffed these 2 into a single primitive, Gowin does not...)
« Last Edit: August 30, 2022, 06:40:32 am by BrianHG »
 

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #203 on: August 30, 2022, 06:39:51 am »
I see, so chain the DDR output through a TLVDS_OBUF (or ELVDS_OBUF) for outputs, and *_TBUF for bidirectional signals. Ok, that makes sense, I’ll try integrating that into what I have.

Anyway (yawn) I’m done for the day, I’m up in 6 hours…
 

Offline SpacedCowboy

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #204 on: August 31, 2022, 02:39:47 am »
So here's a question...

As given, the DDR input/output construct from Gowin doesn't have any obvious built in way to set a different clock on the input and output routes, like the Altera one seems to have.

Code: [Select]
`timescale 100 ps/100 ps
module Gowin_DDR (
  din,
  tx,
  clk,
  io,
  q
)
;
input [1:0] din;
input [0:0] tx;
input clk;
inout [0:0] io;
output [1:0] q;
wire [0:0] iobuf_o;
wire [0:0] ddr_inst_q0;
wire [0:0] ddr_inst_q1;
wire VCC;
wire GND;
  IOBUF \ddr_gen[0].iobuf_inst  (
    .O(iobuf_o[0]),
    .IO(io[0]),
    .I(ddr_inst_q0[0]),
    .OEN(ddr_inst_q1[0])
);
  ODDR \ddrx1_gen[0].oddr_inst  (
    .Q0(ddr_inst_q0[0]),
    .Q1(ddr_inst_q1[0]),
    .D0(din[0]),
    .D1(din[1]),
    .TX(tx[0]),
    .CLK(clk)
);
  IDDR \ddrx1_gen[0].iddr_inst  (
    .Q0(q[0]),
    .Q1(q[1]),
    .D(iobuf_o[0]),
    .CLK(clk)
);
  VCC VCC_cZ (
    .V(VCC)
);
  GND GND_cZ (
    .G(GND)
);
  GSR GSR (
    .GSRI(VCC)
);
endmodule /* Gowin_DDR */


I could, of course, pass in the 2 clocks and just wire up the IDDR to the read-clock and the ODDR to the write-clock. The obvious problem is that the direction control OEN on the IOBUF, which is driven by Q1 on the ODDR and so in turn by TX on the input, is always going to be in the phase of the write-clock.

The question is: does this matter ?

Looking at the signalling for DDR3, it seems there's a whole bunch of cycles before DQ is read/written or DQS* are asserted, and successive READ and WRITE ops will presumably need that preamble, so if the TX is set during that preamble, and held for the duration of the operation, I'll be fine just wiring up the data to the correct clocks, I think.

Of course, I could be wrong about that, and it's also possible that TX is controlled per-cycle, hence the question...
 

Offline BrianHGTopic starter

Re: BrianHG_DDR3_CONTROLLER open source DDR3 controller. NEW v1.60.
« Reply #205 on: August 31, 2022, 02:58:21 am »
Don't worry about the details yet.

Make a bit width sizable ddr bidir with 2 clocks, 1 in, 1 out.
The DDR output should take in the OE in parallel with its data and posedge clock out.
The read side does nothing but DDR read the t pin buffer.

We will worry about the 'half-clock' delayed OE feature I use with Altera later.
(I use it to half clock early turn on the OE and half clock late turn it off, expanding the OE's timing for severe overclocking performance.  I'm sure we can adapt Gowin, or just use the parallel OE without my expansion feature.)
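
To picture the OE expansion described above, here is a toy Python model at half-clock resolution. It is only an illustration of the effect (enable half a clock early, disable half a clock late), not how the Altera buffer or my code implements it, and the function name is invented.

```python
def expand_oe(oe_per_clock):
    """Widen an output-enable window by one half-clock on each side."""
    # Two half-cycles per full clock cycle
    half = [bit for bit in oe_per_clock for _ in range(2)]
    n = len(half)
    # A half-cycle is enabled if it, or either neighbouring half-cycle, is enabled
    return [
        int(half[i] or (i > 0 and half[i - 1]) or (i + 1 < n and half[i + 1]))
        for i in range(n)
    ]


print(expand_oe([0, 1, 1, 0]))
```

A two-clock enable (four half-cycles) becomes six half-cycles wide, which is the extra margin on each side of the data burst.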

All DDR buffers run exclusively on positive edge clocked logic.  Otherwise, what is the point of the DDR primitive?  Otherwise I could have done half my buffers in posedge clk and the other half in negedge clk and never used any DDR primitives on any FPGA, and the world would be perfect.


If you are not sure how the Altera ddr buffer works, you can add the port IO traces to the sim and watch, or go to Altera's data sheet I posted a few threads ago.
« Last Edit: August 31, 2022, 03:01:33 am by BrianHG »
 
The following users thanked this post: SpacedCowboy

Offline BrianHGTopic starter

  • Super Contributor
  • ***
  • Posts: 7836
  • Country: