Topic: Why does this dds code fail (Solved)


pcprogrammer (Topic starter)

  • Super Contributor
  • ***
  • Posts: 4330
  • Country: nl
Why does this dds code fail (Solved)
« on: March 17, 2022, 10:15:36 am »
I'm playing with Verilog, to which I'm new, and an Anlogic AL3-10 device. I'm using the Tang Dynasty IDE release 5.0.3 and have run into strange behavior, and I'm not sure whether the cause is my Verilog or the IDE.

I'm trying to make a DDS with two 14-bit DACs. As a first test I made a sawtooth generator, and when running on a fixed counter increment it works for both channels, as long as there is a power-of-2 relation between the two channels. When I set step values that have a non-power-of-2 relation, it either kills the first channel or the frequency is incorrect.

The code I wrote for this:
Code: [Select]
//---------------------------------------------------------------------------
//Main module for connections with the outside world

module FA201_Lichee_nano
(
  //Input signals
  input wire i_xtal,       //50 MHz clock

  //Output signals
  output wire o_dac1_clk,
  output wire o_dac1_wrt,
  output wire o_dac2_clk,
  output wire o_dac2_wrt,

  output wire [13:0] o_dac1_d,
  output wire [13:0] o_dac2_d
);

  //---------------------------------------------------------------------------
  //Internal wire 
  wire core_clock;
 
  wire [31:0] channel1_signal_step;
  wire [31:0] channel2_signal_step;

  //---------------------------------------------------------------------------
  //Connection with the sub modules
 
  pll_clock pll 
  ( 
    .refclk   (i_xtal),
    .reset    (1'b0),
    .clk0_out (core_clock)
  );
 
  awg dac1
  (
     .i_main_clock           (core_clock),
     .i_signal_step          (32'h400000),
     .o_dac_clk              (o_dac1_clk),
     .o_dac_wrt              (o_dac1_wrt),
     .o_dac_d                (o_dac1_d)
  );

  awg dac2
  (
     .i_main_clock           (core_clock),
     .i_signal_step          (32'h1723549),
     .o_dac_clk              (o_dac2_clk),
     .o_dac_wrt              (o_dac2_wrt),
     .o_dac_d                (o_dac2_d)
  );

endmodule

//---------------------------------------------------------------------------

Code: [Select]
//----------------------------------------------------------------------------------
//Module for generating the DAC signals

module awg
(
  //Input
  input i_main_clock,
   
  input [31:0] i_signal_step,

  //Output
  output [13:0] o_dac_d,
 
  output o_dac_clk,
  output o_dac_wrt
);

  //--------------------------------------------------------------------------------
  //Registers

  reg clock;
  reg write;
 
  reg [35:0] signal_phase;

  //--------------------------------------------------------------------------------
  //Logic
 
  always@(posedge i_main_clock)
    begin
      clock <= ~clock;
    end
   
  always@(posedge i_main_clock)
    begin
      write <= ~write;
    end
 
  always@(negedge i_main_clock)
    begin 
      if(write == 1'b1)   
        signal_phase <= signal_phase + i_signal_step;         
      else
        signal_phase <= signal_phase;
    end
   
  //--------------------------------------------------------------------------------
  //Connect

  assign o_dac_d = signal_phase[35:22];
  assign o_dac_clk = clock;
  assign o_dac_wrt = write;

endmodule

//----------------------------------------------------------------------------------

When I set the same value for both steps it works, with both channels outputting the same frequency. As long as there is a power-of-2 relation between the two it works correctly. I just tried it with decimal 4194304 for channel 1 and decimal 12582912 for channel 2, a factor of 3 between the two. The result is 30.4 kHz on channel 1 and 22.8 kHz on channel 2. The 22.8 kHz is correct. The frequency on channel 1 should have been 7.6 kHz.
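For reference, the expected frequencies can be worked out from the accumulator width and the update rate. This is a sketch under two assumptions: the PLL runs at 250 MHz (stated later in the thread), and the phase is updated on every second clock because of the `write` toggle, giving a 125 MHz update rate into the 36-bit accumulator.

```latex
\begin{align*}
f_{out} &= f_{update} \cdot \frac{step}{2^{36}}, \qquad f_{update} = \frac{250\ \text{MHz}}{2} = 125\ \text{MHz} \\
step = 4194304 = 2^{22}: \quad f_{out} &= \frac{125 \times 10^{6}}{2^{14}} \approx 7.63\ \text{kHz} \\
step = 12582912 = 3 \cdot 2^{22}: \quad f_{out} &= 3 \cdot \frac{125 \times 10^{6}}{2^{14}} \approx 22.9\ \text{kHz}
\end{align*}
```

Both values agree with the 7.6 kHz and 22.8 kHz that are expected above.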

So why is this happening?
« Last Edit: April 07, 2022, 09:25:03 am by pcprogrammer »
 

BrianHG

  • Super Contributor
  • ***
  • Posts: 8091
  • Country: ca
Re: Why does this dds code fail
« Reply #1 on: March 17, 2022, 11:47:50 am »
Did you check your compilation timing report to see whether your FPGA can achieve the required FMAX clock rate?  Using different adder step values may lead the compiler to drop bits that never change from your:
Code: [Select]
signal_phase <= signal_phase + i_signal_step;
logic, hence allowing a higher FMAX.  Maybe some odd values for 'i_signal_step' require all 36 bits in your adder, and your FPGA cannot achieve the required FMAX, messing up the output frequency.

Remember, always adding a fixed 65536 means the compiler will ignore the lowest 16 bits in your:
Code: [Select]
signal_phase <= signal_phase + i_signal_step;
code, which will now become only a 20-bit adder.  A 20-bit adder will have a much higher FMAX than a full 36-bit adder.  This is done because the bottom 16 bits will never change, and it is easier to drop them from the generated gates in the calculation.
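A minimal sketch of the trimming described above, not taken from the thread: the module name `acc_trim_demo` and its signals are hypothetical. With a constant step of 65536 the low 16 bits of a 36-bit accumulator never change, so it behaves like a 20-bit counter with 16 zero bits appended.

```verilog
// Hypothetical demo: a 36-bit accumulator with a constant step of 2^16
// tracks a 20-bit counter, because phase bits 15:0 stay at zero.
module acc_trim_demo
(
  input  wire clk,
  output wire match     // stays 1 while both forms agree
);

  reg [35:0] full    = 36'd0;  // what the source code describes
  reg [19:0] trimmed = 20'd0;  // what the optimizer can reduce it to

  always @(posedge clk)
    begin
      full    <= full + 36'd65536;
      trimmed <= trimmed + 20'd1;
    end

  // The full accumulator equals the short counter padded with 16 zeros.
  assign match = (full == {trimmed, 16'd0});

endmodule
```

Whether a given synthesis tool actually performs this reduction is an assumption here; the timing and resource reports are the place to confirm it.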

Extra but not necessarily relevant:
No power-up reset, or power-up default for the counter & output clock & write regs?

This usually doesn't affect FPGAs, especially with Quartus as they will power-up default to 0, but I do know it can affect simulations in Modelsim.
« Last Edit: March 17, 2022, 11:49:46 am by BrianHG »
 

pcprogrammer (Topic starter)
Re: Why does this dds code fail
« Reply #2 on: March 17, 2022, 12:27:29 pm »
Thanks for the reply. I'm new to modern FPGAs. The last time I did something with FPGAs was some 25 years ago.

So no, I did not look at timing reports. I was not aware of them. I do understand that a 20-bit adder will be faster than a 36-bit adder and that the compiler can optimize. But a test with these two values:

step 1: 32'h1723549
step 2: 32'h17235490

gave correct output frequencies for both channels, with channel 2 16x higher than channel 1.

For step 1 the full 36 bits have to be there. That makes me believe the timing would be ok.

As I'm new to verilog I thought the code might be wrong somehow :o

I do see some warnings about timing constraints. Not sure if this is a problem, because I also get them when it works :-//
Code: [Select]
TMR-5009 WARNING: No clock constraint on 2 clock net(s):
core_clock
i_xtal_pad

You are right about there not being a reset in the code. I don't think it to be a problem for this testing. In the final system I plan to put some form of reset into it.

Edit: I'm also trying to do simulations in ModelSim, but have problems in getting things working. It fails on the PLL in the design. Some library issue |O
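Since it is the PLL model that fails, one option is to simulate the awg module on its own and generate the clock in the testbench. The sketch below is an assumption-laden example, not the project's actual setup: `awg_tb` is a hypothetical testbench name, the awg copy is condensed from the first post, and the power-up values on the registers are an addition so the simulator does not start from X.

```verilog
`timescale 1ns/1ps

// Condensed copy of awg from the first post. The "= 0" power-up values
// are an addition for simulation and are not in the original code.
module awg
(
  input         i_main_clock,
  input  [31:0] i_signal_step,
  output [13:0] o_dac_d,
  output        o_dac_clk,
  output        o_dac_wrt
);
  reg        clock        = 1'b0;
  reg        write        = 1'b0;
  reg [35:0] signal_phase = 36'd0;

  always @(posedge i_main_clock) clock <= ~clock;
  always @(posedge i_main_clock) write <= ~write;

  always @(negedge i_main_clock)
    if (write == 1'b1)
      signal_phase <= signal_phase + i_signal_step;

  assign o_dac_d   = signal_phase[35:22];
  assign o_dac_clk = clock;
  assign o_dac_wrt = write;
endmodule

// Hypothetical testbench: the clock is generated here, so no PLL
// library is needed.
module awg_tb;
  reg         clk = 1'b0;
  wire [13:0] d1, d2;
  wire        clk1, wrt1, clk2, wrt2;

  always #2 clk = ~clk;   // 4 ns period, 250 MHz

  awg dac1 (.i_main_clock(clk), .i_signal_step(32'h400000),
            .o_dac_d(d1), .o_dac_clk(clk1), .o_dac_wrt(wrt1));
  awg dac2 (.i_main_clock(clk), .i_signal_step(32'hC00000),
            .o_dac_d(d2), .o_dac_clk(clk2), .o_dac_wrt(wrt2));

  initial
    begin
      // With these steps, after N phase updates d1 holds N and d2 holds
      // 3*N, both modulo 2^14.
      repeat (2000) @(posedge clk);
      #1 $display("d1=%0d d2=%0d", d1, d2);
      $finish;
    end
endmodule
```

A functional simulation like this only checks the logic. It uses ideal delays, so it would not reproduce a failure caused by missed timing in the real device.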
« Last Edit: March 17, 2022, 12:29:18 pm by pcprogrammer »
 

BrianHG
Re: Why does this dds code fail
« Reply #3 on: March 17, 2022, 12:58:23 pm »

I do see some warnings about timing constraints. Not sure if this is a problem, because I also get them when it works :-//
Code: [Select]
TMR-5009 WARNING: No clock constraint on 2 clock net(s):
core_clock
i_xtal_pad


Yes.  Without a timing constraint, each compile will generate a core in which any part may end up able to run at anything from a slow to a fast frequency.  In Altera/Intel's Quartus, we would need to create an .SDC ('Synopsys Design Constraints') file that tells the compiler the clock rates of the CLK inputs and the IO timing for all the IOs, defining the speeds at which we are attempting to run each part of the FPGA.  The most important constraints are the source clocks, so that at least the FPGA internals will function correctly with your source clock, unless your design is really slow.  The compiler's timing report will then tell you whether all the timing requirements were met.

If you are not ready to touch FPGA timing, then just set your PLL output to 1/4 speed.  Then recompile and run your design and see if the outputs are correct, apart from running at 1/4 speed.

2 minor tricks to improve FMAX with your existing code: (Level A) change-
Code: [Select]
  always@(negedge i_main_clock)
    begin
      if(write == 1'b1)                // You might want to change this to 1'b0 to shift the output result.
        signal_phase <= signal_phase + i_signal_step;         
      else
        signal_phase <= signal_phase;
    end

to:
Code: [Select]
  always@(posedge i_main_clock)  // Always use 1 clock polarity everywhere to achieve the best FMAX
    begin
      if(write == 1'b1)   
        signal_phase <= signal_phase + i_signal_step;         
     /* else
        signal_phase <= signal_phase;    This may slow down FMAX with some compilers and it is not required in your code */
    end

 (Level B) change-
Code: [Select]
  //Connect

  assign o_dac_d = signal_phase[35:22];
  assign o_dac_clk = clock;
  assign o_dac_wrt = write;
to:
Code: [Select]
module awg
(
  //Input
  input i_main_clock,
   
  input [31:0] i_signal_step,

  //Output
  output reg [13:0] o_dac_d,  // This is now a register
 
  output reg o_dac_clk,  // This is now a register
  output reg o_dac_wrt  // This is now a register
);

.....
  //Connect
always@(posedge i_main_clock)  // Always use 1 clock polarity and separate the IO pins
    begin                      // from the faster core FPGA fabric logic by adding a 1 clock D-Reg latch delay
  o_dac_d <= signal_phase[35:22];
  o_dac_clk <= clock;
  o_dac_wrt <= write;
end

These changes are not a huge improvement unless you have a lot of other code in the FPGA, but the two tactics above will really help with marginal designs or with odd pin assignments that span more than one IO bank.

There are ways to really improve FMAX, but such tactics are for achieving things like 300MHz 36bit adders with slower FPGAs.
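One common tactic of this kind is to pipeline the adder. The sketch below is only an illustration and not code from this thread: the module name `awg_pipelined` and its internal registers are hypothetical. It splits the 36-bit add into two 18-bit adds with a registered carry, so each clock period only has to cover an 18-bit carry chain, at the cost of extra output latency.

```verilog
// Hypothetical two-stage variant of the awg phase accumulator.
// The 36-bit add is split into two 18-bit adds with a registered carry.
module awg_pipelined
(
  input  wire        i_main_clock,
  input  wire [31:0] i_signal_step,
  output reg  [13:0] o_dac_d
);

  reg [17:0] phase_lo  = 18'd0;  // phase bits 17:0
  reg [17:0] phase_hi  = 18'd0;  // phase bits 35:18, one clock behind phase_lo
  reg        carry     = 1'b0;   // carry out of the low half, registered
  reg [17:0] step_hi_d = 18'd0;  // high step bits, delayed to line up with carry

  always @(posedge i_main_clock)
    begin
      // Stage 1: add the low 18 bits and register the carry out.
      {carry, phase_lo} <= {1'b0, phase_lo} + {1'b0, i_signal_step[17:0]};
      step_hi_d         <= {4'd0, i_signal_step[31:18]};

      // Stage 2: one clock later, add the high bits plus the saved carry.
      phase_hi <= phase_hi + step_hi_d + {17'd0, carry};

      // Phase bits 35:22 are phase_hi[17:4].
      o_dac_d <= phase_hi[17:4];
    end

endmodule
```

The output frequency is unchanged because the high half sees exactly the carries the full adder would have produced, just one clock later. Whether the AL3-10 needs this at all is an open question that only its timing report can answer.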



As for Modelsim, I can only help you with Altera's version and its included libraries, as I have plenty of experience and examples with them.
« Last Edit: March 17, 2022, 01:14:46 pm by BrianHG »
 

pcprogrammer (Topic starter)
Re: Why does this dds code fail
« Reply #4 on: March 17, 2022, 01:40:07 pm »
The timing constraints are a bit of a mystery to me. There is a timing wizard in the IDE but it is not clear what needs to be set :palm:

I made this with it:
Code: [Select]
create_clock -name core_clock -period 4 -waveform {0 2}
create_clock -name i_xtal -period 20 -waveform {0 10}
create_clock -name i_xtal_pad -period 20 -waveform {0 10}
set_clock_latency  -source 1 [get_clocks {i_xtal}]

But it still complains about the two clocks not having constraints. Even when I add set_clock_latency lines for them.

I see your point in using the same edge of the clock for better performance, but there is some benefit of the phase difference in the actions on the data. The DAC takes in the data on the rising edge of the write signal, which is synchronized to the rising edge of the main clock. Having both the write and the signal phase change on the rising edge of the clock might cause timing issues since the write signal is used in the decision to update the signal phase. Using the phase difference gives 2ns of room for the signals to be stable. (PLL is on 250MHz)

Your level B change is interesting. Does this translate in the FPGA to use the registers in the IO blocks, or will it use additional registers in the logic blocks? In case of the IO blocks your comment about separating the IO from the faster core logic makes sense.

I have to experiment with all of this. But that is what learning is about :)

Well, you are right that it is a timing issue. I removed the PLL and just clocked things from the 50MHz crystal clock, and it does what it is supposed to do with the factor 3 ratio between the two frequencies, as well as with the other fractional relation. The output is now at 1/5th of the frequency, but it works.
« Last Edit: March 17, 2022, 01:59:26 pm by pcprogrammer »
 

BrianHG
Re: Why does this dds code fail
« Reply #5 on: March 17, 2022, 01:51:43 pm »
I see your point in using the same edge of the clock for better performance, but there is some benefit of the phase difference in the actions on the data. The DAC takes in the data on the rising edge of the write signal, which is synchronized to the rising edge of the main clock. Having both the write and the signal phase change on the rising edge of the clock might cause timing issues since the write signal is used in the decision to update the signal phase. Using the phase difference gives 2ns of room for the signals to be stable. (PLL is on 250MHz)
There is nothing but loss in using two different clock edges the way you have done it.

I knew it.  250MHz is a tall order for your full 36-bit adder.  You need a really fast FPGA to do this, something like a $300 to $500 FPGA, especially as having your add enable on a different clock phase means your FPGA needs to operate as if it had a 500MHz 36-bit adder.

This is not the way to clock your DAC or design.  You should be running the entire design at only 125MHz and feeding just the DAC clk outputs from a 250MHz clk, inverting it along the way to get the phase you desire.  A 125MHz full 36-bit adder is much more likely to meet timing requirements on the majority of FPGAs out there.
« Last Edit: March 17, 2022, 01:56:25 pm by BrianHG »
 

BrianHG
Re: Why does this dds code fail
« Reply #6 on: March 17, 2022, 02:01:04 pm »
Your level B change is interesting. Does this translate in the FPGA to use the registers in the IO blocks, or will it use additional registers in the logic blocks? In case of the IO blocks your comment about separating the IO from the faster core logic makes sense.

All this does is place dumb logic latches at the IO pin logic cells.
In your original code, depending on the compiler or its settings, the compiler may try to place the adder itself in the IO pin logic cells.  This may save space on tiny PLDs/FPGAs, but it usually hurts FMAX, as those IO logic cells may not be able to route the mux/gates needed for the add to where you need them if your pin assignments put the IOs in non-optimal locations.
 

pcprogrammer (Topic starter)
Re: Why does this dds code fail
« Reply #7 on: March 17, 2022, 02:14:19 pm »
I see that this high speed electronics world is a whole different playing field. Sure, signal delays, clock skew and what have you were also in play 25 years ago, but at 10 or 20MHz the problems were not that big. 6502 or Z80 CPUs running at 2 or 4MHz with 100ns memory were no problem. Now connecting your scope probe can make the difference between working and not working. Lots to learn again.


BrianHG
Re: Why does this dds code fail
« Reply #8 on: March 17, 2022, 02:26:09 pm »
The timing constraints are a bit of a mystery to me. There is a timing wizard in the IDE but it is not clear what needs to be set :palm:

I made this with it:
Code: [Select]
create_clock -name core_clock -period 4 -waveform {0 2}
create_clock -name i_xtal -period 20 -waveform {0 10}
create_clock -name i_xtal_pad -period 20 -waveform {0 10}
set_clock_latency  -source 1 [get_clocks {i_xtal}]

Is there a setting/field in the compiler setup to point to your 'source' .sdc file containing the above code, so that it knows that it is to be used?  Note that some compilers may also support or require an 'include' in the source Verilog for some SDC files.  Altera Quartus has a field in the compiler settings menu where you list the .sdc files before they are recognized and used.
« Last Edit: March 17, 2022, 02:29:11 pm by BrianHG »
 

pcprogrammer (Topic starter)
Re: Why does this dds code fail
« Reply #9 on: March 17, 2022, 02:36:35 pm »
Yes the IDE has a separate constraints section. It also holds the IO constraints.

When I remove the .sdc file from the project I get this warning:
Code: [Select]
TMR-5001 WARNING: No sdc constraints found while initiating timer.

BrianHG
Re: Why does this dds code fail
« Reply #10 on: March 17, 2022, 02:48:46 pm »
A snip from one of my projects:
Code: [Select]
#**************************************************************
# Create Clock
#**************************************************************
create_clock -period "10.0 MHz" [get_ports ADC_CLK_10]
create_clock -period "50.0 MHz" [get_ports MAX10_CLK1_50]
create_clock -period "50.0 MHz" [get_ports MAX10_CLK2_50]

create_clock -period "1.0 MHz"  [get_nets {I2C_HDMI_Config:u_I2C_HDMI_Config|mI2C_CTRL_CLK}]

The difference between 'get_ports' and 'get_nets' is that 'get_ports' must point to a net with a matching name on an IO pin, while 'get_nets' is for a net somewhere inside my design, here one that carries a clock generated through logic.

As for the "50.0 MHz" after the -period, if I didn't have the MHz, then it would default to nanoseconds.
Without the -waveform, the compiler assumes 50/50 duty cycle.
 

pcprogrammer (Topic starter)
Re: Why does this dds code fail
« Reply #11 on: March 17, 2022, 03:01:35 pm »
Thanks for your input. It is much appreciated.

The timing constraints are something I have to do some reading on.

I modified my code with your suggestions and with a bit of thinking of my own it is now working on the 125MHz the DAC can take.

Code: [Select]
//---------------------------------------------------------------------------
//Main module for connections with the outside world

module FA201_Lichee_nano
(
  //Input signals
  input wire i_xtal,       //50 MHz clock

  //Output signals
  output wire o_dac1_clk,
  output wire o_dac1_wrt,
  output wire o_dac2_clk,
  output wire o_dac2_wrt,

  output wire [13:0] o_dac1_d,
  output wire [13:0] o_dac2_d
);

  //---------------------------------------------------------------------------
  //Internal wire 
  wire core_clock; 
  wire dac_clock;
 
  //---------------------------------------------------------------------------
  //Connection with the sub modules
 
  pll_clock pll 
  ( 
    .refclk   (i_xtal),
    .reset    (1'b0),
    .clk0_out (core_clock),   
    .clk1_out (dac_clock)
  );
 
  awg dac1
  (
     .i_main_clock           (core_clock),
     .i_signal_step          (32'h1723549),
     .o_dac_d                (o_dac1_d)
  );

  awg dac2
  (
     .i_main_clock           (core_clock),
     .i_signal_step          (32'h17235490),
     .o_dac_d                (o_dac2_d)
  );
 
  //--------------------------------------------------------------------------- 
  //Connections to external world 
 
  assign o_dac1_clk = dac_clock;
  assign o_dac1_wrt = dac_clock;
  assign o_dac2_clk = dac_clock;
  assign o_dac2_wrt = dac_clock;

endmodule

//---------------------------------------------------------------------------

Code: [Select]
//----------------------------------------------------------------------------------
//Module for generating the DAC signals

module awg
(
  //Input
  input i_main_clock,
   
  input [31:0] i_signal_step,

  //Output
  output reg [13:0] o_dac_d
);

  //--------------------------------------------------------------------------------
  //Registers

  reg [35:0] signal_phase;

  //--------------------------------------------------------------------------------
  //Logic
 
  always@(posedge i_main_clock)
    begin 
      signal_phase <= signal_phase + i_signal_step;         
    end
   
  always@(posedge i_main_clock)
    begin 
      o_dac_d <= signal_phase[35:22];
    end

endmodule

//----------------------------------------------------------------------------------

I modified the PLL to run at 125MHz and provide two clock outputs with a 90 degree phase shift between them. The first clock is used for the phase increment and the other for the external DAC clock signals.

There are some spikes in the DAC output I have to examine, but at least the frequencies are correct.

Now I have to see if it will work with input from a MCU.

BrianHG
Re: Why does this dds code fail
« Reply #12 on: March 17, 2022, 03:38:19 pm »
Thanks for your input. It is much appreciated.
...
There are some spikes in the DAC output I have to examine, but at least the frequencies are correct.

Now I have to see if it will work with input from a MCU.

 :-+

For the DAC clock, you should have a second PLL output at 125MHz tied directly to the IO pins, and on that second PLL output, tune its phase to 90 degrees, or 45, or 0.  Now you can directly tune the DAC clk output timing relative to the data output.  Ok, you already did that...

Also, if the compiler supports this attribute keyword, use it:

Code: [Select]
  //Output
 (* useioff = 1 *) output reg [13:0] o_dac_d
This tells the compiler that the output reg should be forced onto the IO pin's registers.

Short of defining the input and output delays in the .sdc file, this is a quick way to get a fast, clean parallel output bus.
« Last Edit: March 17, 2022, 03:44:03 pm by BrianHG »
 

pcprogrammer (Topic starter)
Re: Why does this dds code fail
« Reply #13 on: March 17, 2022, 04:13:52 pm »
Thanks again. Really helpful.

With the code I get the following warnings:
Code: [Select]
TMR-6022 WARNING: Net o_dac2_wrt_pad cannot map all the sources/sinks to wire.
PHY-5011 WARNING: x0y0_pll_clkc0: 2666: is dangling

Searched with google for the dangling thing but could not find something useful.

The code is working so the clocks are driving the logic and IO, so I wonder what the "cannot map all the sources/sinks to wire" is about?

The spikes might be caused in the hardware. 125MHz on these wires (see photo) is maybe a bit much :-DD But hey it is just experimental hobby.
Measurements with a logic analyzer without the DAC module did show a lot of spikes on the signals (second picture). Signal D15 was also on channel 1, which shows a lot of noise on the signal.

BrianHG
Re: Why does this dds code fail
« Reply #14 on: March 17, 2022, 06:59:54 pm »
Thanks again. Really helpful.

With the code I get the following warnings:
Code: [Select]
TMR-6022 WARNING: Net o_dac2_wrt_pad cannot map all the sources/sinks to wire.
PHY-5011 WARNING: x0y0_pll_clkc0: 2666: is dangling


You might not be allowed to tie the PLL output clock to too many IOs in this fashion.

Also, what is 'o_dac2_wrt' ?  Maybe this signal should stay high?

Another thing, it may be better to make your second PLL output run at 250 MHz and run this code to generate your DAC clocks:

Code: [Select]
module dac_clk_gen ( // module name is a placeholder; the original post left it out
input clk_125,  // feed in 125MHz system clock from PLL
input clk_250,  // feed in 250MHz system clock from PLL
(* useioff = 1 *)(*preserve*) output reg dac_1_clk, // force preserve a logic cell at the IO pin to feed this clock
(* useioff = 1 *)(*preserve*) output reg dac_2_clk  // without the preserve, the FPGA compiler may simplify by wiring 1 logic cell output to 2 IOs, generating 1 clock ahead of the other.
);
parameter inv_clk_phase = 1'b0; // set to 1'b1 to invert the output clock phase

reg clk_half = 0;
reg clk_half_buffer = 0;
reg clk_full_buffer1 = 0 ;
reg clk_full_buffer2 = 0 ;
reg clk_full_out = 0 ;

always @(posedge clk_125) begin
clk_half <= !clk_half ; // Generate a 62.5MHz clock in phase with the 125MHz dac data.
end

always @(posedge clk_250) begin
clk_full_buffer1 <= clk_half ;
clk_full_buffer2 <= clk_full_buffer1 ;
clk_full_out     <= clk_full_buffer1 ^ clk_full_buffer2 ; // Generate a single 250MHz pulse once every 'clk_half' toggle.

dac_1_clk <=  inv_clk_phase ^ clk_full_out ; //  Copy the 'clk_full_out' to an IO buffer with a parameter option to invert it.
dac_2_clk <=  inv_clk_phase ^ clk_full_out ;
end

endmodule

This should make the IO's logic cell flipflop drive the 125MHz output clock instead of the FPGA fabric's global clock being tied to the IO pin through a fuse.


I'm confused, are the spikes you are complaining about the data bits shown on your logic analyzer?
Or is the DAC actually showing glitches on the output?

Have you tried configuring the current drive for the FPGA IOs?
Have you tried changing the second PLL clock output's phase?

Note that with my code, the 2 clock phases should work with 0 degree, you would just change the 'inv_clk_phase' parameter.
Modifying the 250MHz clk phase should just be a last resort.

(Yes, 125 MHz data through that bundle of wires will be messy.  I recommend at least separating out of the bundle the CLK wires since D0's maximum toggle rate is 62.5MHz which isn't as bad as the 125MHz signals which may bleed into all the other traces.)
« Last Edit: March 17, 2022, 07:22:03 pm by BrianHG »
 

BrianHG
Re: Why does this dds code fail
« Reply #15 on: March 17, 2022, 07:34:04 pm »
I see that this high speed electronics world is a whole different playing field. Sure, signal delays, clock skew and what have you were also in play 25 years ago, but at 10 or 20MHz the problems were not that big. 6502 or Z80 CPUs running at 2 or 4MHz with 100ns memory were no problem. Now connecting your scope probe can make the difference between working and not working. Lots to learn again.
25 years?  Well, 21 years ago, Altera released their Apex II FPGA, the forerunner to the Cyclone I/II/III/IV FPGAs.  It has close to the same IO speeds and would have run your current 125MHz 2 channel DDS code at the same rate as today.  In fact, it could pull off a 250MHz version.

Yes, I did make a video sampling and playback card on an Apex II chip with a 108MHz video DAC.
« Last Edit: March 17, 2022, 07:42:27 pm by BrianHG »
 

pcprogrammer (Topic starter)
Re: Why does this dds code fail
« Reply #16 on: March 17, 2022, 08:25:39 pm »
Also, what is 'o_dac2_wrt' ?  Maybe this signal should stay high?

The DAC is an AD9767 and it uses the WRT signal to latch the data in a buffer and the CLK signal to latch the data from the buffer into the DAC. According to the datasheet they can have the same phase. So it needs to be actively clocked.

I will try the setup for it with the 250MHz clock you provided.

I'm confused, are the spikes you are complaining about the data bits shown on your logic analyzer?
Or is the dac actually showing glitched on the output?

The spikes I mentioned are in the actual DAC output. I'm not sure about the cause. I used my Hantek scope since it is smaller and starts faster than the Rigol. It is not as good, so to get to the bottom of this I will probably have to get my Rigol or my Yokogawa out on the desk.

The spikes I saw on the logic analyzer are caused by the noise on the signals. They change when I move the threshold on the analyzer, and by the looks of it they were not in line with the actual write edge, so most likely they cause no bit errors on the DAC. It certainly has to do with the wires. I tried the code on an Altera Cyclone IV board with the logic analyzer probe hooked directly onto the header pins, so no long wires, and I had far fewer glitches. I noticed that they appeared when most of the lines toggled level.

Have you tried configuring the current drive for the FPGA IOs?
Have you tried changing the second PLL clock output's phase?

I did play a bit with pullups/pulldowns, slew rate and drive strength, but it did not seem to make a difference on the logic analyzer, and the first test with the 125MHz clock and ~7KHz output on the DAC (free running 14 bit counter) showed a glitch-free DAC output.

Did not play with the second PLL clock phase yet.

It is very interesting to play with

pcprogrammer (Topic starter)
Re: Why does this dds code fail
« Reply #17 on: March 17, 2022, 08:34:21 pm »
I see that this high speed electronics world is a whole different playing field. Sure, signal delays, clock skew and what have you were also in play 25 years ago, but at 10 or 20MHz the problems were not that big. 6502 or Z80 CPUs running at 2 or 4MHz with 100ns memory were no problem. Now connecting your scope probe can make the difference between working and not working. Lots to learn again.
25 years?  Well, 21 years ago, Altera released their Apex II FPGA, the forerunner to the Cyclone I/II/III/IV FPGAs.  It has close to the same IO speeds and would have run your current 125MHz 2 channel DDS code at the same rate as today.  In fact, it could pull off a 250MHz version.

Yes, I did make a video sampling and playback card on an Apex II chip with a 108MHz video DAC.

At least 25 years ago. I started in 1989 or 1990 with the smaller Xilinx XC3000 devices. Later on the XC4000 series also became available but was still expensive compared to the XC3000 series. The last thing I made with a low power version of the XC3042 was a smart card reader connected to and powered by an RS232 port on a PC. This was around 1997. After that I drifted off into software development.

Not bad that the forerunner of the Cyclone series was already capable of pulling something like this off.

BrianHG
Re: Why does this dds code fail
« Reply #18 on: March 17, 2022, 08:37:47 pm »
Ohhh.  I do not like the way that digital interface works with 4 clocks.  Well, I guess I'm in the new school where they squeeze everything onto a high speed serial bus.  Or at least a single DDR bus, with 1 data bus for both DACs using 1 clock for everything.

 

BrianHG
Re: Why does this dds code fail
« Reply #19 on: March 17, 2022, 09:22:07 pm »
For DDR mode, see figure 66 in the data sheet.
Cut down on your wiring by half.
 

BrianHG
Re: Why does this dds code fail
« Reply #20 on: March 17, 2022, 10:13:23 pm »
Not bad that the forerunner of the Cyclone series was already capable of pulling something like this off.
Actually, it is sad that their entry level FPGAs have only added DDR on the IO ports, better PLLs and double the available density.  Maybe more dedicated HW multipliers.

I know that they have faster series of FPGAs, but they are price prohibitive, and it's been 21 years.

Take a look at ram and CPU speeds and densities since 2001.  Altera FPGAs haven't really advanced all that much unless you go to the higher end Arria & Stratix and embedded ARM core devices, but above 1k$ per fpga isn't what I would call an advancement.
 

pcprogrammer (Topic starter)
Re: Why does this dds code fail
« Reply #21 on: March 18, 2022, 06:11:02 am »
Ohhh.  I do not like the way that digital interface works with 4 clocks.  Well, I guess I'm in the new school where they squeeze everything onto a high speed serial bus.  Or at least a single DDR bus, with 1 data bus for both DACs using 1 clock for everything.

For DDR mode, see figure 66 in the data sheet.
Cut down on your wiring by half.

I bought the module on Aliexpress and it came with some examples for boards with Xilinx Spartan6 devices. I used them as a starting point for what I'm doing now. It was cheap when I bought it. Now it costs almost double what I paid for it |O

Sure with high speed serial buses things are simpler and even faster, but not a lot of cheap modules available with these kind of devices on them.

The DDR mode is interesting. Something to investigate.

Edit: I looked at the datasheet for this DDR mode, which they call interleaved mode, and I think it lowers the max samples per second per DAC. The write clock is still bound to the 125MHz limit, and the data is only clocked on the rising edge. So it is not a true double data rate setup.

It might be needed for my setup to introduce a phase difference between the WRT and CLK signals. For the interleaved mode they state:
Code: [Select]
At 5 V it is permissible to drive IQWRT and IQCLK together as shown in Figure 65, but at 3.3 V the interleaved data transfer is not reliable.

There is also a note that when the rising edge of CLK comes after the rising edge of WRT, it needs a minimum delay of 2ns to be reliable. I'm running the IO and DAC on 3.3V.

High speed hardware is turning out to be much harder than software ;)
« Last Edit: March 18, 2022, 07:44:02 am by pcprogrammer »
 

Offline pcprogrammerTopic starter

  • Super Contributor
  • ***
  • Posts: 4330
  • Country: nl
Re: Why does this dds code fail
« Reply #22 on: March 18, 2022, 06:20:41 am »
Actually, it is sad that their entry-level FPGAs have only added DDR on the IO ports, better PLLs and double the available density.  Maybe more dedicated HW multipliers.

I know that they have faster series of FPGAs, but they are prohibitively priced, and it's been 21 years.

Take a look at RAM and CPU speeds and densities since 2001.  Altera FPGAs haven't really advanced all that much unless you go to the higher end Arria & Stratix and embedded ARM core devices, but more than $1k per FPGA isn't what I would call an advancement.

Guess it is the same for Xilinx. And on top of the high prices for their high end devices you have to pay for the software to program them. When I first used the Xilinx FPGA's I was working for a subsidized foundation and got a good deal on XACT but it was still expensive and only suited for the low density devices. (Up to the XC3064)

At least now these companies provide the software for free for the lower range of devices, which makes it usable for a hobbyist like me to play with FPGA's

Offline pcprogrammerTopic starter

  • Super Contributor
  • ***
  • Posts: 4330
  • Country: nl
Re: Why does this dds code fail
« Reply #23 on: March 21, 2022, 01:12:19 pm »
Another thing, it may be better to make your second PLL output run at 250 MHz and run this code to generate your DAC clocks:
Code: [Select]
module (
input clk_125,  // feed in 125MHz system clock from PLL
input clk_250,  // feed in 250MHz system clock from PLL
(* useioff = 1 *)(*preserve*) output reg dac_1_clk, // force preserve a logic cell at the IO pin to feed this clock
(* useioff = 1 *)(*preserve*) output reg dac_2_clk  // without the preserve, the FPGA compiler simplify by wiring 1 logic cell output to 2 IOs generating 1 clock ahead of the other.
)
parameter bit inv_clk_phase  = 0;

reg clk_half = 0;
reg clk_half_buffer = 0;
reg clk_full_buffer1 = 0 ;
reg clk_full_buffer2 = 0 ;
reg clk_full_out ;

always @(posedge clk_125) begin
clk_half <= !clk_half ; // Generate a 62.5MHz clock in phase with the 125MHz dac data.
end

always @(posedge clk_250) begin
clk_full_buffer1 <= clk_half ;
clk_full_buffer2 <= clk_full_buffer1 ;
clk_full_out     <= clk_full_buffer1 ^ reg clk_full_buffer2 ; // Generate a single 250MHz pulse once every 'clk_half' toggle.

dac_1_clk <=  inv_clk_phase ^ clk_full_out ; //  Copy the 'clk_full_out' to an IO buffer with a parameter option to invert it.
dac_2_clk <=  inv_clk_phase ^ clk_full_out ;
end

endmodule

This should make the IO's logic cell flipflop drive the 125MHz output clock instead of the FPGA fabric's global clock being tied to the IO pin through a fuse.

I have been playing with this supplied code in ModelSim and got it working after fixing a typo after the xor operator: in "clk_full_out <= clk_full_buffer1 ^ reg clk_full_buffer2 ; " the "reg" before "clk_full_buffer2" should not be there. With that fixed, it provides the correct phase shift for the clock signal.
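
For anyone who wants to repeat the simulation, this is roughly what the module looks like with the typo removed. It is a sketch, not the exact project files. The module name `dac_clk_gen` and the test bench `dac_clk_gen_tb` are names picked for this post. The missing `;` after the port list is added, `parameter bit` (SystemVerilog) is written as a plain Verilog `parameter`, the unused `clk_half_buffer` is left out, and `clk_full_out` gets an initial value. The `useioff` and `preserve` attributes are kept as posted; whether a given toolchain honours them is not verified here.

```verilog
`timescale 1ns/1ps

// Corrected copy of the quoted module (the name dac_clk_gen is hypothetical).
module dac_clk_gen (
  input  wire clk_125,   // 125MHz system clock from the PLL
  input  wire clk_250,   // 250MHz clock from the PLL
  (* useioff = 1 *)(* preserve *) output reg dac_1_clk,
  (* useioff = 1 *)(* preserve *) output reg dac_2_clk
);
  parameter inv_clk_phase = 1'b0;   // 1 inverts the generated DAC clock

  reg clk_half         = 1'b0;
  reg clk_full_buffer1 = 1'b0;
  reg clk_full_buffer2 = 1'b0;
  reg clk_full_out     = 1'b0;

  // Toggles once per DAC sample, giving a 62.5MHz square wave.
  always @(posedge clk_125) begin
    clk_half <= !clk_half;
  end

  always @(posedge clk_250) begin
    clk_full_buffer1 <= clk_half;
    clk_full_buffer2 <= clk_full_buffer1;
    // High for one 250MHz cycle after every toggle of clk_half (stray "reg" removed).
    clk_full_out     <= clk_full_buffer1 ^ clk_full_buffer2;

    dac_1_clk <= inv_clk_phase ^ clk_full_out;
    dac_2_clk <= inv_clk_phase ^ clk_full_out;
  end
endmodule

// Minimal test bench (hypothetical) that reports the period of dac_1_clk.
module dac_clk_gen_tb;
  reg  clk_125;
  reg  clk_250;
  wire dac_1_clk;
  wire dac_2_clk;

  real last_rise;
  reg  seen_rise = 1'b0;

  dac_clk_gen dut (
    .clk_125   (clk_125),
    .clk_250   (clk_250),
    .dac_1_clk (dac_1_clk),
    .dac_2_clk (dac_2_clk)
  );

  // Ideal clocks with aligned rising edges, standing in for the two PLL outputs.
  initial begin clk_125 = 1'b0; forever #4 clk_125 = ~clk_125; end  // 8ns period
  initial begin clk_250 = 1'b1; forever #2 clk_250 = ~clk_250; end  // 4ns period

  // Print the time between successive rising edges of the generated DAC clock.
  always @(posedge dac_1_clk) begin
    if (seen_rise)
      $display("dac_1_clk period: %0.1f ns", $realtime - last_rise);
    last_rise = $realtime;
    seen_rise = 1'b1;
  end

  initial #100 $finish;
endmodule
```

With ideal clocks like these the reported period should come out at 8ns, which is the 125MHz DAC clock. Setting `inv_clk_phase` to 1 inverts that clock.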

What I wonder about is what happens if one wants a 45 degree phase shift with a method like this. Will it work with the PLL clock raised to 500MHz?

Also, is it better to have the PLL deliver the two clocks (250MHz and 125MHz), or to derive the 125MHz just like the 62.5MHz clock?

Online BrianHG

  • Super Contributor
  • ***
  • Posts: 8091
  • Country: ca
Re: Why does this dds code fail
« Reply #24 on: March 21, 2022, 05:26:58 pm »
System clock timing on the FPGA is designed around the PLL.  If you ever need performance and do not want to get deep into timing, always use the PLL clock outputs directly to clock logic on the FPGA.

The purpose of the 'clk_half <= !clk_half' is not to create a clock, but to create a data output bit which toggles for every new piece of data you want to transmit to the DAC.  If the DAC was external RAM, think of this wire as address bit 0.
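
To make the 'address bit 0' picture concrete, here is a minimal sketch that is not taken from the actual design. The sample register and the toggle bit are updated by the same 125MHz edge, so every change of the toggle marks exactly one new word on the DAC bus. The module and port names (`dac_sample_reg`, `i_sample`, `o_dac_d`, `o_toggle`) are made up for illustration.

```verilog
// Hypothetical data path: one new DAC word and one toggle per 125MHz clock.
module dac_sample_reg (
  input  wire        clk_125,   // 125MHz core clock from the PLL
  input  wire [13:0] i_sample,  // next sample from the DDS / sawtooth logic
  output reg  [13:0] o_dac_d,   // data word driven to the DAC pins
  output reg         o_toggle   // flips once per new data word (the role of clk_half)
);
  initial begin
    o_dac_d  = 14'd0;
    o_toggle = 1'b0;
  end

  always @(posedge clk_125) begin
    o_dac_d  <= i_sample;    // a new word appears on the bus
    o_toggle <= !o_toggle;   // and the toggle marks it in the same clock cycle
  end
endmodule
```

The 250MHz section only has to look for a change on `o_toggle` to know that fresh data is sitting on the bus.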

The purpose of the 'always @(posedge clk_250) begin' block is that every time there is a transition on your DAC's 'address bit 0' net, called 'clk_half', we also capture the data to be sent to the DAC, and on the next clock we pulse the DAC clk pin for half a 125MHz cycle.  Now, when you wire an FPGA's PLL clock directly to an IO pin, unless that IO pin is the 'special' dedicated FPGA PLL CLK output pin, the compiler reports a warning that output jitter and timing are not guaranteed.  The path from the internal global clock network, which the PLL outputs drive, to one or more generic IO pins has no dedicated wiring for that purpose.  The best generic IO pin performance comes when each IO pin's closest dedicated logic cell's Q data out drives that IO pin directly.  Since your DAC uses multiple clocks in parallel, having this tiny 250MHz section drive all those IO pins as normal logic data outputs to synthesize the parallel clocks will make all those IOs as clean as possible.

So, in conclusion, if you want to shift your DAC's output clock by 180 degrees, use my parameter.  For smaller increments (0 to -90 deg, then set my inv parameter and again 0 to -90 deg), change the PLL's output phase of the 250MHz section.  This way, you let the compiler work out the timing between your 125MHz logic core and that tiny 250MHz IO pin driver.

The only reason to go to 500MHz would be if you wanted to run your core at 250MHz, and your 125MHz DAC isn't fast enough for that anyway.

As for tuning during real time operation, there is a set of PLL input controls which allows you to manually step each PLL output in something like 11.25 degree increments while the system is operating, but that may be beyond your development needs at this time.  I would say for now, try combinations of the 'inv_clk_phase' parameter and adjusting the output phase of the 250MHz clock to -45 degrees and -90 degrees, as this should allow you to create a clean window for the DAC to sample the data.
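
To get a feel for what those settings mean in time, here is a small simulation-only sketch that converts a phase setting into a time offset. It assumes the PLL phase is specified in degrees of that output's own period, which should be checked against the PLL documentation for the device in use. The module name `phase_offset_calc` and the function `phase_to_ns` are made up for this example.

```verilog
// Simulation-only helper (hypothetical names): phase in degrees to offset in ns.
module phase_offset_calc;

  // offset = phase / 360 * period; a negative phase gives a negative offset.
  function real phase_to_ns;
    input real phase_deg;
    input real period_ns;
    begin
      phase_to_ns = phase_deg / 360.0 * period_ns;
    end
  endfunction

  initial begin
    // The 250MHz PLL output has a 4ns period.
    $display("250MHz, -45 deg  : %0.3f ns", phase_to_ns(-45.0, 4.0));
    $display("250MHz, -90 deg  : %0.3f ns", phase_to_ns(-90.0, 4.0));
    $display("250MHz, 11.25 deg: %0.3f ns", phase_to_ns(11.25, 4.0));
    // inv_clk_phase inverts the 125MHz DAC clock, which has an 8ns period.
    $display("125MHz, 180 deg  : %0.3f ns", phase_to_ns(180.0, 8.0));
    $finish;
  end
endmodule
```

Under that assumption, -90 degrees on the 250MHz output is 1ns, which is only 45 degrees of the 125MHz DAC clock, and the 'inv_clk_phase' parameter adds a 4ns step on top of that.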
« Last Edit: March 21, 2022, 05:33:11 pm by BrianHG »
 
The following users thanked this post: pcprogrammer

