Topic: First attempt at FIR bandpass filter on FPGA, critique/timing errors.


hamster_nz

Yeah, I've been running into that.  I don't know what the deal is here because I believe it should be:
Code:
localparam signed[17:0] coeff[0:255] = '{ 5,   0, ....};

But ISE doesn't like that and throws an error on the single quote. ModelSim seems fine with it, so I don't know what the deal is.

I think you're right and that might be what's causing my problems.

I don't know what I am doing, so I've cheated and made it 31:0. I then pull the value into an 18-bit register, so it all works out fine :D


EDIT: But it won't simulate, so I am changing to the style recommended on page 184 of https://www.xilinx.com/support/documentation/sw_manuals/xilinx11/xst.pdf, and will cross the bridge of how to parameterize it later.
« Last Edit: April 29, 2018, 02:37:08 am by hamster_nz »
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

pigtwo

Hmmm. And it meets timing like that? I get an error from ISE saying it's impossible to meet timing, but I assume that's because I'm doing something wrong.

Code:
At least one timing constraint is impossible to meet because component delays alone exceed the constraint. A timing
   constraint summary below shows the failing constraints (preceded with an Asterisk (*)). Please use the Timing Analyzer (GUI) or TRCE
   (command line) with the Mapped NCD and PCF files to identify which constraints and paths are failing because of the component delays
   alone. If the failing path(s) is mapped to Xilinx components as expected, consider relaxing the constraint. If it is not mapped to
   components as expected, re-evaluate your HDL and how synthesis is optimizing the path. To allow the tools to bypass this error, set the
   environment variable XIL_TIMING_ALLOW_IMPOSSIBLE to 1.
 

hamster_nz

So, my Verilog is very weak, but I made a VHDL design that has been verified enough that I would sort of trust it. I then bashed up some really rough Verilog, and just worked around my lack of skill (e.g. how to set up the kernel ROM correctly).

I then test-benched it against the original and debugged it until they agreed. Resource count is identical (1x DSP, 2x BRAM, 80 registers...) and timing is identical, so I am sure they are bug-for-bug compatible. Here it is.

Code:
Timing summary:
 ---------------
 
 Timing errors: 0  Score: 0  (Setup/Max: 0, Hold: 0)
 
 Constraints cover 640 paths, 0 nets, and 290 connections
 
 Design statistics:
    Minimum period:   3.570ns{1}   (Maximum frequency: 280.112MHz)
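The minimum period in the summary converts directly to the quoted maximum frequency. Because this design reuses a single DSP slice for every tap, that clock also caps the input sample rate. The Python sketch below does the arithmetic. It assumes one multiply-accumulate per clock and about 256 taps (max_count = 255 in the code that follows), and it ignores the few cycles of pipeline overhead per output sample. The function names are illustrative only.

```python
def fmax_mhz(min_period_ns: float) -> float:
    """Maximum clock frequency (MHz) implied by a minimum period (ns)."""
    return 1000.0 / min_period_ns


def max_sample_rate_hz(fclk_mhz: float, taps: int) -> float:
    """Upper bound on the input sample rate when each tap costs one clock (one MAC)."""
    return fclk_mhz * 1e6 / taps


if __name__ == "__main__":
    f = fmax_mhz(3.570)
    print(f"fmax ~ {f:.3f} MHz")                        # fmax ~ 280.112 MHz
    rate = max_sample_rate_hz(f, 256)
    print(f"256 taps -> at most {rate / 1e6:.3f} MS/s")  # 256 taps -> at most 1.094 MS/s
```

Doubling the tap count halves the achievable sample rate, which is the usual trade-off of a single-multiplier, time-multiplexed FIR.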
 

Code:
`timescale 1ns / 1ps
module bandpass(
    input   clk,
    input  [17:0] din,
    input  din_enable,
    output [47:0] dout,
    output dout_enable
    );

reg [9:0] num_taps = 255;   /* Unused: max_count below sets the filter length */
reg signed[17:0] buffer[0:1023];   /* Data in, sized for a block RAM */
/* Pipelining for the multiplier */
reg signed[17:0] a1,a2,a3;
reg signed[17:0] b1,b2,b3;
/* Result of the multiplication */
reg signed[35:0] product;   
/* Accumulate the products */
reg signed[47:0] accumulator;

reg [47:0] result;
reg result_enable;

reg [9:0] max_count = 255;
reg [9:0] data_index;
reg [9:0] coeff_index;
reg [9:0] write_index;
/* Shift registers for scheduling things */
reg [4:0] reset_accum_sr;
reg [4:0] eject_result_sr;

assign dout        = result;
assign dout_enable = result_enable;

integer i;

//////////////////////////////////////////
// Not sure how to assign initial values
//////////////////////////////////////////
initial
begin
    for (i=0; i<1024; i=i+1)
        buffer[i] = 18'd0;

    data_index   = 255;
    max_count    = 255;
    write_index  = 254;
    coeff_index  = 254;
end

always @(posedge clk) begin

    /****************************
     * The inferred DSP block   *
     ****************************/
    /* The accumulator */
    accumulator <= (reset_accum_sr[0] == 1 ? accumulator : 48'b0) + { {12{product[35]}}, product[35:0] };

    /* The multiply operation */
    product <= a3 * b3;

    /* The input pipeline */
    a3 <= a2;
    a2 <= a1;
    a1 <= buffer[data_index];

    b3 <= b2;
    b2 <= b1;
    // b1 <= coeff[coeff_index];
    /////////////////////////////////////////////////////////////////
    // Can't work out how to infer a pre-initialised ROM properly
    /////////////////////////////////////////////////////////////////
    // Filter Kernel
    case (coeff_index)
           10'b0000000000: b1 <= 18'b111111110111111110; 10'b0000000001: b1 <= 18'b111111101000001011;
           10'b0000000010: b1 <= 18'b111111011001100110; 10'b0000000011: b1 <= 18'b111111001101000111;
           10'b0000000100: b1 <= 18'b111111000011100001; 10'b0000000101: b1 <= 18'b111110111101011101;
           10'b0000000110: b1 <= 18'b111110111011010110; 10'b0000000111: b1 <= 18'b111110111101011001;
           // ... 496 more lines ...
           10'b1111111100: b1 <= 18'b000000000000000000; 10'b1111111101: b1 <= 18'b000000000000000000;
           10'b1111111110: b1 <= 18'b000000000000000000; 10'b1111111111: b1 <= 18'b000000000000000000;
    endcase

    /************************************
     * Ejecting the result of the filter *
     ************************************/
    if (eject_result_sr[0] == 1) begin
        result        <= accumulator;
        result_enable <= 1;
    end else begin
        result_enable <= 0;
    end
    /*********************************
     * When we need to trigger        *
     * ejecting the result            *
     *********************************/
    if (coeff_index == max_count-1)
        eject_result_sr <= {1'b1, eject_result_sr[4:1]};
    else
        eject_result_sr <= {1'b0, eject_result_sr[4:1]};

    /*********************************
     * Restarting the filter when new *
     * data arrives                   *
     *********************************/
    if (din_enable == 1) begin
        reset_accum_sr <= {1'b0, reset_accum_sr[4:1]};
        coeff_index    <= 0;
        data_index     <= write_index - max_count + 1;
    end else begin
        reset_accum_sr <= {1'b1, reset_accum_sr[4:1]};
        if (coeff_index != max_count) begin
            coeff_index <= coeff_index + 1;
            data_index  <= data_index + 1;
        end
    end

    /*********************************
     * Storing new data in the buffer *
     *********************************/
    if (din_enable == 1) begin
        buffer[write_index] <= din;
        write_index         <= write_index + 1;
    end
end
endmodule
 

whollender

I've used this to initialize ROMs for FFT windowing instead of an FIR kernel, but the result is the same:

Code:
   // Window function ROM
   localparam WINDOW_ROM_WIDTH = 16;
   reg [WINDOW_ROM_WIDTH-1:0] window_rom [255:0];

   initial
      $readmemh("window_samp.hex", window_rom, 0, 255);


Where "window_samp.hex" is a text file with hex numbers:

Code:
0002
0002
0003
0005
0007
0009
000D
0011
0016
001B
0022
002A
0033
003D
0049
0056
0065
0076
0089
009E
00B5
00CF
00EB
010B

I don't think that I could find a better way to do it in Verilog.
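FIR coefficients are usually signed, so a file for $readmemh has to hold each value as a fixed-width two's-complement word. For example, -4 in 18 bits is 3FFFC. The Python sketch below generates such a file. The helper names, the file name coeff.hex, and the 18-bit width are assumptions, chosen to match the 18-bit coefficient registers used elsewhere in this thread.

```python
from typing import Sequence


def to_twos_complement_hex(value: int, width: int) -> str:
    """Format a signed integer as a zero-padded, `width`-bit two's-complement hex word."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {width} signed bits")
    digits = (width + 3) // 4                  # hex digits needed to hold `width` bits
    return format(value & ((1 << width) - 1), f"0{digits}X")


def write_readmemh(path: str, coeffs: Sequence[int], width: int = 18) -> None:
    """Write one hex word per line, a plain format $readmemh can read."""
    with open(path, "w") as f:
        for c in coeffs:
            f.write(to_twos_complement_hex(c, width) + "\n")


if __name__ == "__main__":
    # The first taps of the coefficient table posted later in this thread
    kernel = [4, 3, 5, 8, 9, 10, 10, 9, 6, 3, 0, -4, -8, -11]
    print([to_twos_complement_hex(c, 18) for c in kernel[-4:]])
    # ['00000', '3FFFC', '3FFF8', '3FFF5']
```

Calling write_readmemh("coeff.hex", kernel) gives a file that an initial block like the one above can load into a `reg signed [17:0]` array. The loaded bit patterns then read back as the original negative values.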
 

pigtwo

@hamster_nz Thank you for the code. I've run it and I get the same results you do (mostly), so that should make it a lot easier to figure out what I'm doing wrong. I'm going to start tweaking my code in the direction of yours to hopefully figure out the fundamental difference.

I did figure out my issue with the localparam stuff. Apparently assigning two-dimensional arrays like that is only supported in SystemVerilog. You seem to have to do it the way you did, with an initial block assigning each value, or the way whollender mentions. I haven't tried whollender's method yet, but so far all methods have inferred BRAM.

@whollender Thank you for the info.  I knew there had to be a better way than what I was doing.  That is much nicer because it's less of a pain to generate a file like that.
 

pigtwo

So after hours of messing with my code and nearly losing my mind, I've finally figured out the main difference between my code and hamster_nz's code: the input pipeline. My pipeline was only 2 registers deep while his was 3. I'm not sure why this matters. While reading through the DSP48A1 user guide I noticed that the input path can have 2 registers in series. Since my input pipeline only had 2 registers, I guess the system couldn't completely optimize it, resulting in weird things like putting the accumulator outside the DSP slice.

If my previous assumption is correct I'm still a little confused about why it couldn't optimize it.  Assuming I need to use the two input registers to pipeline, why is the third needed?  I can see that to meet timing I should have a third to pre-load the values from BRAM so that they are closer to the DSP slice inputs.  Then I would just expect that area to not meet timing.  Instead it seems to completely mess everything up.

I guess my main takeaway from this is that if you want the system to be completely optimized you have to use it a very exact way.  And until you get it exactly right none of it will optimize, even if some of the pieces seem completely unrelated. 

My final question to hamster_nz: what made you decide to make the input pipeline three registers deep? Was it just to use the two input registers inside the DSP48A1 and then one outside the DSP slice?
 

NorthGuy

Quote
If my previous assumption is correct I'm still a little confused about why it couldn't optimize it.  Assuming I need to use the two input registers to pipeline, why is the third needed?  I can see that to meet timing I should have a third to pre-load the values from BRAM so that they are closer to the DSP slice inputs.  Then I would just expect that area to not meet timing.  Instead it seems to completely mess everything up.

BRAM has internal output registers too. They may be bypassed, but this makes BRAM very slow. If you don't have these registers, the tools may feel they cannot go with BRAM at all. Therefore you need 3 registers - one for BRAM and two for DSP.

Getting close shouldn't be a problem, not at this speed; they're already close enough. I don't think there's a need for intermediary flip-flops between BRAM and DSP.
 

pigtwo

Ah, that makes sense. So when I'm only using two input pipeline registers, it probably puts one in the BRAM and one in the DSP slice. That leaves the DSP slice not fully pipelined, which breaks whatever optimization it wants to do.
 

hamster_nz

Does your design use one or two BRAMs? If it is only one, then that is also involved, because a lot of LUTs are being used to MUX in the samples.
 

pigtwo

My design was only inferring one BRAM, which makes sense because I was using a huge shift register to implement the buffer. The path from the big shift register to the DSP was the limiting factor in my new design. I wrote a new one using a circular buffer like the one in yours, so when I get home today I'll see if that improves the design further.
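Here is why a circular buffer helps. A shift register moves every element on each new sample, which doesn't fit a block RAM's one or two write ports. A circular buffer writes a single address per sample and reads the window back through a wrapping index, an access pattern a BRAM handles well. The Python sketch below checks that the two structures give the same output. It is a behavioural model only, not cycle-accurate and not a transcription of either Verilog design, and the function names are hypothetical. Like the Verilog in this thread, the circular version pairs the oldest sample with coeffs[0]. That pairing matches the direct form for a symmetric (linear-phase) kernel and is the reversed kernel otherwise.

```python
BUF_SIZE = 1024          # power of two, like the 1024-entry buffer in the thread
MASK = BUF_SIZE - 1      # wrap-around is a bit mask, as with a 10-bit index


def fir_shift_register(samples, coeffs):
    """Direct form: y[n] = sum_k coeffs[k] * x[n - k], newest sample first."""
    history = [0] * len(coeffs)
    out = []
    for x in samples:
        history = [x] + history[:-1]      # every element moves on each sample
        out.append(sum(c * h for c, h in zip(coeffs, history)))
    return out


def fir_circular_buffer(samples, coeffs):
    """One write per sample; the window is read back oldest-to-newest."""
    taps = len(coeffs)
    buf = [0] * BUF_SIZE
    write_index = 0
    out = []
    for x in samples:
        buf[write_index] = x                          # the only write
        data_index = (write_index - taps + 1) & MASK  # oldest sample in the window
        acc = 0
        for coeff_index in range(taps):               # one MAC per clock in hardware
            acc += buf[data_index] * coeffs[coeff_index]
            data_index = (data_index + 1) & MASK
        out.append(acc)
        write_index = (write_index + 1) & MASK
    return out


if __name__ == "__main__":
    impulse = [1] + [0] * 6
    print(fir_circular_buffer(impulse, [1, -2, 3, -2, 1]))  # [1, -2, 3, -2, 1, 0, 0]
```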
 

pigtwo

Well, my new code is not creating 2 separate BRAMs. Even though I tried to make a circular buffer, it doesn't seem to be working. It's also, once again, not putting the accumulator inside the DSP slice. I just need to really go through it and figure out why. Surprisingly, it's doing much better on timing than previous attempts, despite all the issues mentioned. The input pipelining seems to help a lot.
 

pigtwo

I figured out the accumulator problem: I forgot to declare my product and accumulator registers signed. If it helps anyone else, it matters whether you declare those signed or not.
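Here is why the declaration matters. In Verilog, an expression is evaluated as unsigned if any operand is unsigned. An unsigned 36-bit product is therefore zero-extended, not sign-extended, when it is added into the 48-bit accumulator, and every negative product turns into a large positive one. hamster_nz's code sidesteps this by sign-extending explicitly with {12{product[35]}}. The Python sketch below models only this widening step, using Python integers as stand-ins for register bit patterns. The widths come from the thread; the helper names are hypothetical.

```python
PROD_W = 36   # 18 x 18 multiply
ACC_W = 48    # accumulator width used in the thread


def bits(value, width):
    """Raw bit pattern a `width`-bit register would hold for `value`."""
    return value & ((1 << width) - 1)


def as_signed(pattern, width):
    """Read a `width`-bit pattern as two's complement."""
    return pattern - (1 << width) if pattern >> (width - 1) else pattern


def widen(pattern, from_w, to_w, signed):
    """Sign-extend (signed) or zero-pad (unsigned) a bit pattern to `to_w` bits."""
    if signed and pattern >> (from_w - 1):
        return pattern | (((1 << (to_w - from_w)) - 1) << from_w)
    return pattern


def accumulate(products, signed):
    """Add 36-bit products into a 48-bit accumulator and return its signed value."""
    acc = 0
    for p in products:
        acc = bits(acc + widen(bits(p, PROD_W), PROD_W, ACC_W, signed), ACC_W)
    return as_signed(acc, ACC_W)


if __name__ == "__main__":
    products = [-400, 300, -35]                 # true sum is -135
    print(accumulate(products, signed=True))    # -135
    print(accumulate(products, signed=False))   # 2**37 - 135 = 137438953337
```

In the unsigned case, each negative product contributes 2**36 + p instead of p, so the total is off by 2**36 for every negative term.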
 

hamster_nz

Quote
I guess my main takeaway from this is that if you want the system to be completely optimized you have to use it a very exact way.  And until you get it exactly right none of it will optimize, even if some of the pieces seem completely unrelated. 

I agree, but would say it more softly: to get the best results, you have to write with a clear view of what it should look like at the technology level in mind. If your DSP blocks need pipelining to perform well, you need pipelining in your design (either explicitly or implicitly).

I know others feel differently, and say that you should focus on the behavior you want to describe, not on the implementation. Maybe they are from an ASIC world where you have greater freedom, or maybe they like having issues with timing closure  ;).

Quote
My final question to hamster_nz is what made you decide to make the input pipeline three registers deep?  Was it just to use the two input registers inside the DSP48A1 and then one outside the DSP slice?
Yes, to use the registers. You might even need one more cycle.

- The RAM block needs a cycle to retrieve the data.

- A cycle is needed to get the data from the RAM into DSP's first input register.

- There is another cycle to get into the second register, the one that feeds the multiplier.

- The result of the multiplication then goes into the product register, so the multiplier has the whole cycle.

- Another cycle is needed to add the product to the accumulator.

The 'trick' (if there is one), is to schedule/sequence everything on the first cycle, then use shift registers to delay the control signals the required number of cycles till when they are needed (e.g. resetting the accumulator, or producing the result).
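The scheduling trick can be modelled in a few lines. Decide the control flags (first tap, last tap) on the cycle the read is issued, then push them through shift registers as long as the data pipeline, so they arrive alongside the product they belong to. The Python sketch below assumes a 4-stage latency taken from the list above (RAM read, two DSP input registers, product register). The class and function names are hypothetical, and a real design's latency has to be counted from its own code.

```python
from collections import deque

LATENCY = 4  # assumed: RAM read, DSP input reg 1, input reg 2, product register


class DelayLine:
    """Shift register: a value clocked in emerges `depth` clocks later."""

    def __init__(self, depth, fill=None):
        self.depth = depth
        self.stages = deque([fill] * depth)

    def clock(self, value):
        if self.depth == 0:          # a zero-length line is just a wire
            return value
        self.stages.appendleft(value)
        return self.stages.pop()


def run_mac(samples, coeffs, ctrl_delay=LATENCY):
    """Accumulate samples[i] * coeffs[i] and return the ejected result(s)."""
    taps = len(coeffs)
    data_pipe = DelayLine(LATENCY)            # products travel through the pipeline
    first_sr = DelayLine(ctrl_delay, False)   # "reset accumulator" flag
    last_sr = DelayLine(ctrl_delay, False)    # "eject result" flag
    acc, results = 0, []
    for cycle in range(taps + LATENCY):
        if cycle < taps:
            # Everything is decided on the issue cycle. The multiply is folded in
            # here for brevity; only the pipeline latency matters for this model.
            issued = samples[cycle] * coeffs[cycle]
            first, last = cycle == 0, cycle == taps - 1
        else:
            issued, first, last = None, False, False  # pipeline draining
        product = data_pipe.clock(issued)
        is_first = first_sr.clock(first)
        is_last = last_sr.clock(last)
        if product is not None:
            acc = product if is_first else acc + product
            if is_last:
                results.append(acc)
    return results


if __name__ == "__main__":
    x, h = [1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 1, 1]
    print(run_mac(x, h))                 # flags delayed to match: [21]
    print(run_mac(x, h, ctrl_delay=0))   # flags not delayed: [3], a partial sum
```

With the flags undelayed, the "eject" flag fires while most products are still in flight, and the "reset" flag fires before any product has arrived. Delaying the control signals by the same depth as the data keeps them aligned without any extra comparison logic at the accumulator.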

« Last Edit: May 02, 2018, 04:46:06 am by hamster_nz »
 

pigtwo

Quote
I agree, but would say it more softly - to get the best results, you have to write with a clear view of what it should look like at the technology level in mind. If your DSP blocks need pipelining to perform well, you need pipelining in your design (either explicitly or implicitly).
I think I basically mean the same thing. What threw me off originally was reading that the DSP48A1 needs to be fully pipelined to be optimized. I thought this meant there should be a register for the input, product, and accumulator (as opposed to doing all this in one clock cycle). But it seems to mean that the two input registers must also be used, or other parts of the optimization might not be applied. But this may actually be an unrelated issue I want to ask about below.

Quote
The 'trick' (if there is one), is to schedule/sequence everything on the first cycle, then use shift registers to delay the control signals the required number of cycles till when they are needed (e.g. resetting the accumulator, or producing the result).
Yeah, when I was reading your code I didn't get why you did that at first.  But I then tried to write it myself and ran into the problem that doing that was solving.  It was confusing at first but now it seems pretty obvious. 

So I have my design, which is basically hamster_nz's design (once you see a way to do it, it's hard to think of a different way). But on my home computer (Windows 7) I couldn't get my buffer register to infer BRAM. I feel like I tried everything, and I copied hamster_nz's way of creating the buffer exactly, but I was getting weird results. Most of the time I would get what I think is a large register with its outputs multiplexed. Then ISE was telling me my design was empty because the only output wasn't being driven. All this was very strange because ModelSim showed the design simulating correctly. So I came into work, wanted to try something, and installed the ISE VM because my work computer runs Windows 10. With no changes, the design is created perfectly! So it seems like ISE is doing something weird.

Not only that, but it says it can run at 314MHz. From what I can tell the inferred design is correct (2 BRAMs, 1 DSP48A1). The limiting factor was the path from BRAM to the first input register of the DSP48A1.

On the virtual machine, ISE also shows hamster_nz's code reaching a max frequency of 300MHz, which is different from what I get on my personal computer. (Edit: The max frequency differences here were because I used a speed grade of -3. Switching it to -2 makes the max frequency of both go to 280MHz, as expected.)

What operating system are you running ISE on?  Maybe I should always be running the virtual machine version? 

I'll have to do some side by side tests when I get home today. 

Below is my code in case there is a useful difference between it and hamster_nz's. It was originally less similar to his, but while troubleshooting I kept making them more alike until mine would start to work (which it never did on my computer).
Code:
module Bandpass_Filter(
// Basic control signals
input wire clk,
input wire reset,

// Input data/control signals
input wire[11:0] din,
input wire din_enable,

// Output data/control signals
output reg[11:0] dout,
output reg dout_enable
);

// Define local registers
reg signed[17:0] buffer[0:1023];  // Input buffer to hold data values.  Sized to a power of 2. 
reg signed[35:0] product;       // Product result
reg signed[47:0] accumulator;    

reg signed[17:0] data_d1, data_d2, data_d3;     // Data pipeline
reg signed[17:0] coeff_d1, coeff_d2, coeff_d3; // Coefficient pipeline

reg[9:0] data_index;
reg[9:0] coeff_index;
reg[9:0] write_index;      // This points to the last item in the buffer

reg[4:0] clear_acc_sr;
reg[4:0] eject_data_sr;

integer i;

// Filter length (a reg, set in the initial block) and local parameters
reg[9:0] num_taps;
localparam data_scale = 16;


initial begin
for(i=0; i<1024; i=i+1)
buffer[i] = 18'b0;
data_index = 1000;
write_index = 999;
coeff_index = 999;
num_taps = 1000;
end

always @(posedge clk) begin
//////////////////////////////////////////
// Create data and coefficient pipeline //
//////////////////////////////////////////
data_d1 <= buffer[data_index];
data_d2 <= data_d1;
data_d3 <= data_d2;

// `include "filter_coeff_15.h";
case(coeff_index)
0: coeff_d1 <= 4;
1: coeff_d1 <= 3;
2: coeff_d1 <= 5;
3: coeff_d1 <= 8;
4: coeff_d1 <= 9;
5: coeff_d1 <= 10;
6: coeff_d1 <= 10;
7: coeff_d1 <= 9;
8: coeff_d1 <= 6;
9: coeff_d1 <= 3;
10: coeff_d1 <= 0;
11: coeff_d1 <= -4;
12: coeff_d1 <= -8;
13: coeff_d1 <= -11;
14: coeff_d1 <= -13;
15: coeff_d1 <= -14;
16: coeff_d1 <= -13;
17: coeff_d1 <= -11;
18: coeff_d1 <= -8;
19: coeff_d1 <= -4;
20: coeff_d1 <= 0;
21: coeff_d1 <= 4;
22: coeff_d1 <= 9;
23: coeff_d1 <= 12;
24: coeff_d1 <= 14;
25: coeff_d1 <= 15;
26: coeff_d1 <= 14;
27: coeff_d1 <= 12;
28: coeff_d1 <= 9;
29: coeff_d1 <= 5;
30: coeff_d1 <= 0;
31: coeff_d1 <= -4;
32: coeff_d1 <= -7;
33: coeff_d1 <= -10;
34: coeff_d1 <= -12;
35: coeff_d1 <= -12;
36: coeff_d1 <= -11;
37: coeff_d1 <= -9;
38: coeff_d1 <= -7;
39: coeff_d1 <= -4;
40: coeff_d1 <= -1;
41: coeff_d1 <= 2;
42: coeff_d1 <= 4;
43: coeff_d1 <= 6;
44: coeff_d1 <= 6;
45: coeff_d1 <= 6;
46: coeff_d1 <= 5;
47: coeff_d1 <= 4;
48: coeff_d1 <= 2;
49: coeff_d1 <= 1;
50: coeff_d1 <= 0;
51: coeff_d1 <= 0;
52: coeff_d1 <= 0;
53: coeff_d1 <= 1;
54: coeff_d1 <= 2;
55: coeff_d1 <= 3;
56: coeff_d1 <= 4;
57: coeff_d1 <= 4;
58: coeff_d1 <= 4;
59: coeff_d1 <= 3;
60: coeff_d1 <= 1;
61: coeff_d1 <= -2;
62: coeff_d1 <= -5;
63: coeff_d1 <= -8;
64: coeff_d1 <= -11;
65: coeff_d1 <= -13;
66: coeff_d1 <= -14;
67: coeff_d1 <= -14;
68: coeff_d1 <= -12;
69: coeff_d1 <= -8;
70: coeff_d1 <= -3;
71: coeff_d1 <= 3;
72: coeff_d1 <= 9;
73: coeff_d1 <= 15;
74: coeff_d1 <= 20;
75: coeff_d1 <= 23;
76: coeff_d1 <= 24;
77: coeff_d1 <= 22;
78: coeff_d1 <= 19;
79: coeff_d1 <= 13;
80: coeff_d1 <= 5;
81: coeff_d1 <= -3;
82: coeff_d1 <= -11;
83: coeff_d1 <= -19;
84: coeff_d1 <= -25;
85: coeff_d1 <= -28;
86: coeff_d1 <= -29;
87: coeff_d1 <= -27;
88: coeff_d1 <= -22;
89: coeff_d1 <= -15;
90: coeff_d1 <= -7;
91: coeff_d1 <= 2;
92: coeff_d1 <= 11;
93: coeff_d1 <= 19;
94: coeff_d1 <= 24;
95: coeff_d1 <= 27;
96: coeff_d1 <= 28;
97: coeff_d1 <= 25;
98: coeff_d1 <= 20;
99: coeff_d1 <= 14;
100: coeff_d1 <= 6;
101: coeff_d1 <= -1;
102: coeff_d1 <= -8;
103: coeff_d1 <= -13;
104: coeff_d1 <= -17;
105: coeff_d1 <= -18;
106: coeff_d1 <= -18;
107: coeff_d1 <= -15;
108: coeff_d1 <= -12;
109: coeff_d1 <= -8;
110: coeff_d1 <= -3;
111: coeff_d1 <= 0;
112: coeff_d1 <= 3;
113: coeff_d1 <= 4;
114: coeff_d1 <= 3;
115: coeff_d1 <= 2;
116: coeff_d1 <= 0;
117: coeff_d1 <= -2;
118: coeff_d1 <= -3;
119: coeff_d1 <= -4;
120: coeff_d1 <= -3;
121: coeff_d1 <= 0;
122: coeff_d1 <= 4;
123: coeff_d1 <= 9;
124: coeff_d1 <= 14;
125: coeff_d1 <= 19;
126: coeff_d1 <= 23;
127: coeff_d1 <= 24;
128: coeff_d1 <= 23;
129: coeff_d1 <= 19;
130: coeff_d1 <= 11;
131: coeff_d1 <= 2;
132: coeff_d1 <= -10;
133: coeff_d1 <= -21;
134: coeff_d1 <= -32;
135: coeff_d1 <= -40;
136: coeff_d1 <= -45;
137: coeff_d1 <= -46;
138: coeff_d1 <= -42;
139: coeff_d1 <= -33;
140: coeff_d1 <= -20;
141: coeff_d1 <= -4;
142: coeff_d1 <= 13;
143: coeff_d1 <= 30;
144: coeff_d1 <= 44;
145: coeff_d1 <= 55;
146: coeff_d1 <= 61;
147: coeff_d1 <= 61;
148: coeff_d1 <= 54;
149: coeff_d1 <= 43;
150: coeff_d1 <= 26;
151: coeff_d1 <= 7;
152: coeff_d1 <= -13;
153: coeff_d1 <= -31;
154: coeff_d1 <= -47;
155: coeff_d1 <= -58;
156: coeff_d1 <= -63;
157: coeff_d1 <= -62;
158: coeff_d1 <= -55;
159: coeff_d1 <= -43;
160: coeff_d1 <= -27;
161: coeff_d1 <= -9;
162: coeff_d1 <= 9;
163: coeff_d1 <= 25;
164: coeff_d1 <= 38;
165: coeff_d1 <= 46;
166: coeff_d1 <= 48;
167: coeff_d1 <= 46;
168: coeff_d1 <= 39;
169: coeff_d1 <= 29;
170: coeff_d1 <= 18;
171: coeff_d1 <= 6;
172: coeff_d1 <= -4;
173: coeff_d1 <= -12;
174: coeff_d1 <= -17;
175: coeff_d1 <= -18;
176: coeff_d1 <= -16;
177: coeff_d1 <= -13;
178: coeff_d1 <= -8;
179: coeff_d1 <= -3;
180: coeff_d1 <= 0;
181: coeff_d1 <= 1;
182: coeff_d1 <= -1;
183: coeff_d1 <= -5;
184: coeff_d1 <= -12;
185: coeff_d1 <= -20;
186: coeff_d1 <= -27;
187: coeff_d1 <= -33;
188: coeff_d1 <= -35;
189: coeff_d1 <= -32;
190: coeff_d1 <= -25;
191: coeff_d1 <= -13;
192: coeff_d1 <= 4;
193: coeff_d1 <= 23;
194: coeff_d1 <= 42;
195: coeff_d1 <= 59;
196: coeff_d1 <= 73;
197: coeff_d1 <= 80;
198: coeff_d1 <= 79;
199: coeff_d1 <= 69;
200: coeff_d1 <= 51;
201: coeff_d1 <= 26;
202: coeff_d1 <= -3;
203: coeff_d1 <= -35;
204: coeff_d1 <= -65;
205: coeff_d1 <= -90;
206: coeff_d1 <= -108;
207: coeff_d1 <= -115;
208: coeff_d1 <= -112;
209: coeff_d1 <= -96;
210: coeff_d1 <= -71;
211: coeff_d1 <= -38;
212: coeff_d1 <= 0;
213: coeff_d1 <= 39;
214: coeff_d1 <= 74;
215: coeff_d1 <= 102;
216: coeff_d1 <= 121;
217: coeff_d1 <= 128;
218: coeff_d1 <= 122;
219: coeff_d1 <= 104;
220: coeff_d1 <= 76;
221: coeff_d1 <= 42;
222: coeff_d1 <= 4;
223: coeff_d1 <= -33;
224: coeff_d1 <= -65;
225: coeff_d1 <= -89;
226: coeff_d1 <= -104;
227: coeff_d1 <= -107;
228: coeff_d1 <= -100;
229: coeff_d1 <= -83;
230: coeff_d1 <= -60;
231: coeff_d1 <= -33;
232: coeff_d1 <= -6;
233: coeff_d1 <= 19;
234: coeff_d1 <= 39;
235: coeff_d1 <= 51;
236: coeff_d1 <= 56;
237: coeff_d1 <= 54;
238: coeff_d1 <= 46;
239: coeff_d1 <= 34;
240: coeff_d1 <= 21;
241: coeff_d1 <= 10;
242: coeff_d1 <= 2;
243: coeff_d1 <= -2;
244: coeff_d1 <= 0;
245: coeff_d1 <= 6;
246: coeff_d1 <= 16;
247: coeff_d1 <= 26;
248: coeff_d1 <= 34;
249: coeff_d1 <= 38;
250: coeff_d1 <= 36;
251: coeff_d1 <= 26;
252: coeff_d1 <= 9;
253: coeff_d1 <= -14;
254: coeff_d1 <= -42;
255: coeff_d1 <= -70;
256: coeff_d1 <= -95;
257: coeff_d1 <= -114;
258: coeff_d1 <= -122;
259: coeff_d1 <= -117;
260: coeff_d1 <= -99;
261: coeff_d1 <= -68;
262: coeff_d1 <= -25;
263: coeff_d1 <= 24;
264: coeff_d1 <= 76;
265: coeff_d1 <= 124;
266: coeff_d1 <= 163;
267: coeff_d1 <= 188;
268: coeff_d1 <= 195;
269: coeff_d1 <= 183;
270: coeff_d1 <= 152;
271: coeff_d1 <= 103;
272: coeff_d1 <= 42;
273: coeff_d1 <= -25;
274: coeff_d1 <= -93;
275: coeff_d1 <= -152;
276: coeff_d1 <= -199;
277: coeff_d1 <= -226;
278: coeff_d1 <= -231;
279: coeff_d1 <= -214;
280: coeff_d1 <= -175;
281: coeff_d1 <= -120;
282: coeff_d1 <= -53;
283: coeff_d1 <= 19;
284: coeff_d1 <= 87;
285: coeff_d1 <= 146;
286: coeff_d1 <= 188;
287: coeff_d1 <= 211;
288: coeff_d1 <= 212;
289: coeff_d1 <= 193;
290: coeff_d1 <= 156;
291: coeff_d1 <= 106;
292: coeff_d1 <= 49;
293: coeff_d1 <= -9;
294: coeff_d1 <= -60;
295: coeff_d1 <= -101;
296: coeff_d1 <= -127;
297: coeff_d1 <= -137;
298: coeff_d1 <= -132;
299: coeff_d1 <= -114;
300: coeff_d1 <= -87;
301: coeff_d1 <= -55;
302: coeff_d1 <= -24;
303: coeff_d1 <= 1;
304: coeff_d1 <= 19;
305: coeff_d1 <= 26;
306: coeff_d1 <= 24;
307: coeff_d1 <= 14;
308: coeff_d1 <= 0;
309: coeff_d1 <= -14;
310: coeff_d1 <= -25;
311: coeff_d1 <= -28;
312: coeff_d1 <= -20;
313: coeff_d1 <= -1;
314: coeff_d1 <= 27;
315: coeff_d1 <= 63;
316: coeff_d1 <= 100;
317: coeff_d1 <= 134;
318: coeff_d1 <= 158;
319: coeff_d1 <= 167;
320: coeff_d1 <= 158;
321: coeff_d1 <= 128;
322: coeff_d1 <= 78;
323: coeff_d1 <= 12;
324: coeff_d1 <= -65;
325: coeff_d1 <= -144;
326: coeff_d1 <= -216;
327: coeff_d1 <= -272;
328: coeff_d1 <= -305;
329: coeff_d1 <= -309;
330: coeff_d1 <= -280;
331: coeff_d1 <= -221;
332: coeff_d1 <= -135;
333: coeff_d1 <= -30;
334: coeff_d1 <= 84;
335: coeff_d1 <= 196;
336: coeff_d1 <= 292;
337: coeff_d1 <= 363;
338: coeff_d1 <= 399;
339: coeff_d1 <= 397;
340: coeff_d1 <= 356;
341: coeff_d1 <= 278;
342: coeff_d1 <= 172;
343: coeff_d1 <= 48;
344: coeff_d1 <= -82;
345: coeff_d1 <= -203;
346: coeff_d1 <= -304;
347: coeff_d1 <= -373;
348: coeff_d1 <= -406;
349: coeff_d1 <= -398;
350: coeff_d1 <= -351;
351: coeff_d1 <= -272;
352: coeff_d1 <= -169;
353: coeff_d1 <= -55;
354: coeff_d1 <= 59;
355: coeff_d1 <= 160;
356: coeff_d1 <= 239;
357: coeff_d1 <= 289;
358: coeff_d1 <= 305;
359: coeff_d1 <= 290;
360: coeff_d1 <= 248;
361: coeff_d1 <= 186;
362: coeff_d1 <= 113;
363: coeff_d1 <= 39;
364: coeff_d1 <= -26;
365: coeff_d1 <= -75;
366: coeff_d1 <= -105;
367: coeff_d1 <= -113;
368: coeff_d1 <= -103;
369: coeff_d1 <= -79;
370: coeff_d1 <= -49;
371: coeff_d1 <= -20;
372: coeff_d1 <= 0;
373: coeff_d1 <= 6;
374: coeff_d1 <= -6;
375: coeff_d1 <= -34;
376: coeff_d1 <= -75;
377: coeff_d1 <= -123;
378: coeff_d1 <= -169;
379: coeff_d1 <= -203;
380: coeff_d1 <= -216;
381: coeff_d1 <= -201;
382: coeff_d1 <= -155;
383: coeff_d1 <= -79;
384: coeff_d1 <= 23;
385: coeff_d1 <= 141;
386: coeff_d1 <= 262;
387: coeff_d1 <= 371;
388: coeff_d1 <= 454;
389: coeff_d1 <= 497;
390: coeff_d1 <= 492;
391: coeff_d1 <= 432;
392: coeff_d1 <= 321;
393: coeff_d1 <= 166;
394: coeff_d1 <= -21;
395: coeff_d1 <= -219;
396: coeff_d1 <= -409;
397: coeff_d1 <= -570;
398: coeff_d1 <= -682;
399: coeff_d1 <= -731;
400: coeff_d1 <= -708;
401: coeff_d1 <= -613;
402: coeff_d1 <= -453;
403: coeff_d1 <= -241;
404: coeff_d1 <= 0;
405: coeff_d1 <= 247;
406: coeff_d1 <= 475;
407: coeff_d1 <= 659;
408: coeff_d1 <= 781;
409: coeff_d1 <= 826;
410: coeff_d1 <= 789;
411: coeff_d1 <= 676;
412: coeff_d1 <= 498;
413: coeff_d1 <= 273;
414: coeff_d1 <= 26;
415: coeff_d1 <= -217;
416: coeff_d1 <= -431;
417: coeff_d1 <= -595;
418: coeff_d1 <= -693;
419: coeff_d1 <= -719;
420: coeff_d1 <= -673;
421: coeff_d1 <= -564;
422: coeff_d1 <= -408;
423: coeff_d1 <= -225;
424: coeff_d1 <= -38;
425: coeff_d1 <= 132;
426: coeff_d1 <= 267;
427: coeff_d1 <= 355;
428: coeff_d1 <= 391;
429: coeff_d1 <= 377;
430: coeff_d1 <= 323;
431: coeff_d1 <= 242;
432: coeff_d1 <= 152;
433: coeff_d1 <= 70;
434: coeff_d1 <= 12;
435: coeff_d1 <= -13;
436: coeff_d1 <= 0;
437: coeff_d1 <= 47;
438: coeff_d1 <= 116;
439: coeff_d1 <= 193;
440: coeff_d1 <= 258;
441: coeff_d1 <= 291;
442: coeff_d1 <= 277;
443: coeff_d1 <= 205;
444: coeff_d1 <= 72;
445: coeff_d1 <= -113;
446: coeff_d1 <= -335;
447: coeff_d1 <= -568;
448: coeff_d1 <= -781;
449: coeff_d1 <= -943;
450: coeff_d1 <= -1023;
451: coeff_d1 <= -997;
452: coeff_d1 <= -852;
453: coeff_d1 <= -590;
454: coeff_d1 <= -224;
455: coeff_d1 <= 215;
456: coeff_d1 <= 686;
457: coeff_d1 <= 1140;
458: coeff_d1 <= 1523;
459: coeff_d1 <= 1785;
460: coeff_d1 <= 1886;
461: coeff_d1 <= 1799;
462: coeff_d1 <= 1516;
463: coeff_d1 <= 1051;
464: coeff_d1 <= 439;
465: coeff_d1 <= -267;
466: coeff_d1 <= -999;
467: coeff_d1 <= -1682;
468: coeff_d1 <= -2241;
469: coeff_d1 <= -2609;
470: coeff_d1 <= -2737;
471: coeff_d1 <= -2596;
472: coeff_d1 <= -2185;
473: coeff_d1 <= -1533;
474: coeff_d1 <= -694;
475: coeff_d1 <= 256;
476: coeff_d1 <= 1225;
477: coeff_d1 <= 2115;
478: coeff_d1 <= 2833;
479: coeff_d1 <= 3298;
480: coeff_d1 <= 3454;
481: coeff_d1 <= 3275;
482: coeff_d1 <= 2766;
483: coeff_d1 <= 1969;
484: coeff_d1 <= 954;
485: coeff_d1 <= -183;
486: coeff_d1 <= -1332;
487: coeff_d1 <= -2377;
488: coeff_d1 <= -3213;
489: coeff_d1 <= -3752;
490: coeff_d1 <= -3934;
491: coeff_d1 <= -3735;
492: coeff_d1 <= -3170;
493: coeff_d1 <= -2288;
494: coeff_d1 <= -1173;
495: coeff_d1 <= 66;
496: coeff_d1 <= 1310;
497: coeff_d1 <= 2435;
498: coeff_d1 <= 3329;
499: coeff_d1 <= 3904;
500: coeff_d1 <= 4102;
501: coeff_d1 <= 3904;
502: coeff_d1 <= 3329;
503: coeff_d1 <= 2435;
504: coeff_d1 <= 1310;
505: coeff_d1 <= 66;
506: coeff_d1 <= -1173;
507: coeff_d1 <= -2288;
508: coeff_d1 <= -3170;
509: coeff_d1 <= -3735;
510: coeff_d1 <= -3934;
511: coeff_d1 <= -3752;
512: coeff_d1 <= -3213;
513: coeff_d1 <= -2377;
514: coeff_d1 <= -1332;
515: coeff_d1 <= -183;
516: coeff_d1 <= 954;
517: coeff_d1 <= 1969;
518: coeff_d1 <= 2766;
519: coeff_d1 <= 3275;
520: coeff_d1 <= 3454;
521: coeff_d1 <= 3298;
522: coeff_d1 <= 2833;
523: coeff_d1 <= 2115;
524: coeff_d1 <= 1225;
525: coeff_d1 <= 256;
526: coeff_d1 <= -694;
527: coeff_d1 <= -1533;
528: coeff_d1 <= -2185;
529: coeff_d1 <= -2596;
530: coeff_d1 <= -2737;
531: coeff_d1 <= -2609;
532: coeff_d1 <= -2241;
533: coeff_d1 <= -1682;
534: coeff_d1 <= -999;
535: coeff_d1 <= -267;
536: coeff_d1 <= 439;
537: coeff_d1 <= 1051;
538: coeff_d1 <= 1516;
539: coeff_d1 <= 1799;
540: coeff_d1 <= 1886;
541: coeff_d1 <= 1785;
542: coeff_d1 <= 1523;
543: coeff_d1 <= 1140;
544: coeff_d1 <= 686;
545: coeff_d1 <= 215;
546: coeff_d1 <= -224;
547: coeff_d1 <= -590;
548: coeff_d1 <= -852;
549: coeff_d1 <= -997;
550: coeff_d1 <= -1023;
551: coeff_d1 <= -943;
552: coeff_d1 <= -781;
553: coeff_d1 <= -568;
554: coeff_d1 <= -335;
555: coeff_d1 <= -113;
556: coeff_d1 <= 72;
557: coeff_d1 <= 205;
558: coeff_d1 <= 277;
559: coeff_d1 <= 291;
560: coeff_d1 <= 258;
561: coeff_d1 <= 193;
562: coeff_d1 <= 116;
563: coeff_d1 <= 47;
564: coeff_d1 <= 0;
565: coeff_d1 <= -13;
566: coeff_d1 <= 12;
567: coeff_d1 <= 70;
568: coeff_d1 <= 152;
569: coeff_d1 <= 242;
570: coeff_d1 <= 323;
571: coeff_d1 <= 377;
572: coeff_d1 <= 391;
573: coeff_d1 <= 355;
574: coeff_d1 <= 267;
575: coeff_d1 <= 132;
576: coeff_d1 <= -38;
577: coeff_d1 <= -225;
578: coeff_d1 <= -408;
579: coeff_d1 <= -564;
580: coeff_d1 <= -673;
581: coeff_d1 <= -719;
582: coeff_d1 <= -693;
583: coeff_d1 <= -595;
584: coeff_d1 <= -431;
585: coeff_d1 <= -217;
586: coeff_d1 <= 26;
587: coeff_d1 <= 273;
588: coeff_d1 <= 498;
589: coeff_d1 <= 676;
590: coeff_d1 <= 789;
591: coeff_d1 <= 826;
592: coeff_d1 <= 781;
593: coeff_d1 <= 659;
594: coeff_d1 <= 475;
595: coeff_d1 <= 247;
596: coeff_d1 <= 0;
597: coeff_d1 <= -241;
598: coeff_d1 <= -453;
599: coeff_d1 <= -613;
600: coeff_d1 <= -708;
601: coeff_d1 <= -731;
602: coeff_d1 <= -682;
603: coeff_d1 <= -570;
604: coeff_d1 <= -409;
605: coeff_d1 <= -219;
606: coeff_d1 <= -21;
607: coeff_d1 <= 166;
608: coeff_d1 <= 321;
609: coeff_d1 <= 432;
610: coeff_d1 <= 492;
611: coeff_d1 <= 497;
612: coeff_d1 <= 454;
613: coeff_d1 <= 371;
614: coeff_d1 <= 262;
615: coeff_d1 <= 141;
616: coeff_d1 <= 23;
617: coeff_d1 <= -79;
618: coeff_d1 <= -155;
619: coeff_d1 <= -201;
620: coeff_d1 <= -216;
621: coeff_d1 <= -203;
622: coeff_d1 <= -169;
623: coeff_d1 <= -123;
624: coeff_d1 <= -75;
625: coeff_d1 <= -34;
626: coeff_d1 <= -6;
627: coeff_d1 <= 6;
628: coeff_d1 <= 0;
629: coeff_d1 <= -20;
630: coeff_d1 <= -49;
631: coeff_d1 <= -79;
632: coeff_d1 <= -103;
633: coeff_d1 <= -113;
634: coeff_d1 <= -105;
635: coeff_d1 <= -75;
636: coeff_d1 <= -26;
637: coeff_d1 <= 39;
638: coeff_d1 <= 113;
639: coeff_d1 <= 186;
640: coeff_d1 <= 248;
641: coeff_d1 <= 290;
642: coeff_d1 <= 305;
643: coeff_d1 <= 289;
644: coeff_d1 <= 239;
645: coeff_d1 <= 160;
646: coeff_d1 <= 59;
647: coeff_d1 <= -55;
648: coeff_d1 <= -169;
649: coeff_d1 <= -272;
650: coeff_d1 <= -351;
651: coeff_d1 <= -398;
652: coeff_d1 <= -406;
653: coeff_d1 <= -373;
654: coeff_d1 <= -304;
655: coeff_d1 <= -203;
656: coeff_d1 <= -82;
657: coeff_d1 <= 48;
658: coeff_d1 <= 172;
659: coeff_d1 <= 278;
660: coeff_d1 <= 356;
661: coeff_d1 <= 397;
662: coeff_d1 <= 399;
663: coeff_d1 <= 363;
664: coeff_d1 <= 292;
665: coeff_d1 <= 196;
666: coeff_d1 <= 84;
667: coeff_d1 <= -30;
668: coeff_d1 <= -135;
669: coeff_d1 <= -221;
670: coeff_d1 <= -280;
671: coeff_d1 <= -309;
672: coeff_d1 <= -305;
673: coeff_d1 <= -272;
674: coeff_d1 <= -216;
675: coeff_d1 <= -144;
676: coeff_d1 <= -65;
677: coeff_d1 <= 12;
678: coeff_d1 <= 78;
679: coeff_d1 <= 128;
680: coeff_d1 <= 158;
681: coeff_d1 <= 167;
682: coeff_d1 <= 158;
683: coeff_d1 <= 134;
684: coeff_d1 <= 100;
685: coeff_d1 <= 63;
686: coeff_d1 <= 27;
687: coeff_d1 <= -1;
688: coeff_d1 <= -20;
689: coeff_d1 <= -28;
690: coeff_d1 <= -25;
691: coeff_d1 <= -14;
692: coeff_d1 <= 0;
693: coeff_d1 <= 14;
694: coeff_d1 <= 24;
695: coeff_d1 <= 26;
696: coeff_d1 <= 19;
697: coeff_d1 <= 1;
698: coeff_d1 <= -24;
699: coeff_d1 <= -55;
700: coeff_d1 <= -87;
701: coeff_d1 <= -114;
702: coeff_d1 <= -132;
703: coeff_d1 <= -137;
704: coeff_d1 <= -127;
705: coeff_d1 <= -101;
706: coeff_d1 <= -60;
707: coeff_d1 <= -9;
708: coeff_d1 <= 49;
709: coeff_d1 <= 106;
710: coeff_d1 <= 156;
711: coeff_d1 <= 193;
712: coeff_d1 <= 212;
713: coeff_d1 <= 211;
714: coeff_d1 <= 188;
715: coeff_d1 <= 146;
716: coeff_d1 <= 87;
717: coeff_d1 <= 19;
718: coeff_d1 <= -53;
719: coeff_d1 <= -120;
720: coeff_d1 <= -175;
721: coeff_d1 <= -214;
722: coeff_d1 <= -231;
723: coeff_d1 <= -226;
724: coeff_d1 <= -199;
725: coeff_d1 <= -152;
726: coeff_d1 <= -93;
727: coeff_d1 <= -25;
728: coeff_d1 <= 42;
729: coeff_d1 <= 103;
730: coeff_d1 <= 152;
731: coeff_d1 <= 183;
732: coeff_d1 <= 195;
733: coeff_d1 <= 188;
734: coeff_d1 <= 163;
735: coeff_d1 <= 124;
736: coeff_d1 <= 76;
737: coeff_d1 <= 24;
738: coeff_d1 <= -25;
739: coeff_d1 <= -68;
740: coeff_d1 <= -99;
741: coeff_d1 <= -117;
742: coeff_d1 <= -122;
743: coeff_d1 <= -114;
744: coeff_d1 <= -95;
745: coeff_d1 <= -70;
746: coeff_d1 <= -42;
747: coeff_d1 <= -14;
748: coeff_d1 <= 9;
749: coeff_d1 <= 26;
750: coeff_d1 <= 36;
751: coeff_d1 <= 38;
752: coeff_d1 <= 34;
753: coeff_d1 <= 26;
754: coeff_d1 <= 16;
755: coeff_d1 <= 6;
756: coeff_d1 <= 0;
757: coeff_d1 <= -2;
758: coeff_d1 <= 2;
759: coeff_d1 <= 10;
760: coeff_d1 <= 21;
761: coeff_d1 <= 34;
762: coeff_d1 <= 46;
763: coeff_d1 <= 54;
764: coeff_d1 <= 56;
765: coeff_d1 <= 51;
766: coeff_d1 <= 39;
767: coeff_d1 <= 19;
768: coeff_d1 <= -6;
769: coeff_d1 <= -33;
770: coeff_d1 <= -60;
771: coeff_d1 <= -83;
772: coeff_d1 <= -100;
773: coeff_d1 <= -107;
774: coeff_d1 <= -104;
775: coeff_d1 <= -89;
776: coeff_d1 <= -65;
777: coeff_d1 <= -33;
778: coeff_d1 <= 4;
779: coeff_d1 <= 42;
780: coeff_d1 <= 76;
781: coeff_d1 <= 104;
782: coeff_d1 <= 122;
783: coeff_d1 <= 128;
784: coeff_d1 <= 121;
785: coeff_d1 <= 102;
786: coeff_d1 <= 74;
787: coeff_d1 <= 39;
788: coeff_d1 <= 0;
789: coeff_d1 <= -38;
790: coeff_d1 <= -71;
791: coeff_d1 <= -96;
792: coeff_d1 <= -112;
793: coeff_d1 <= -115;
794: coeff_d1 <= -108;
795: coeff_d1 <= -90;
796: coeff_d1 <= -65;
797: coeff_d1 <= -35;
798: coeff_d1 <= -3;
799: coeff_d1 <= 26;
800: coeff_d1 <= 51;
801: coeff_d1 <= 69;
802: coeff_d1 <= 79;
803: coeff_d1 <= 80;
804: coeff_d1 <= 73;
805: coeff_d1 <= 59;
806: coeff_d1 <= 42;
807: coeff_d1 <= 23;
808: coeff_d1 <= 4;
809: coeff_d1 <= -13;
810: coeff_d1 <= -25;
811: coeff_d1 <= -32;
812: coeff_d1 <= -35;
813: coeff_d1 <= -33;
814: coeff_d1 <= -27;
815: coeff_d1 <= -20;
816: coeff_d1 <= -12;
817: coeff_d1 <= -5;
818: coeff_d1 <= -1;
819: coeff_d1 <= 1;
820: coeff_d1 <= 0;
821: coeff_d1 <= -3;
822: coeff_d1 <= -8;
823: coeff_d1 <= -13;
824: coeff_d1 <= -16;
825: coeff_d1 <= -18;
826: coeff_d1 <= -17;
827: coeff_d1 <= -12;
828: coeff_d1 <= -4;
829: coeff_d1 <= 6;
830: coeff_d1 <= 18;
831: coeff_d1 <= 29;
832: coeff_d1 <= 39;
833: coeff_d1 <= 46;
834: coeff_d1 <= 48;
835: coeff_d1 <= 46;
836: coeff_d1 <= 38;
837: coeff_d1 <= 25;
838: coeff_d1 <= 9;
839: coeff_d1 <= -9;
840: coeff_d1 <= -27;
841: coeff_d1 <= -43;
842: coeff_d1 <= -55;
843: coeff_d1 <= -62;
844: coeff_d1 <= -63;
845: coeff_d1 <= -58;
846: coeff_d1 <= -47;
847: coeff_d1 <= -31;
848: coeff_d1 <= -13;
849: coeff_d1 <= 7;
850: coeff_d1 <= 26;
851: coeff_d1 <= 43;
852: coeff_d1 <= 54;
853: coeff_d1 <= 61;
854: coeff_d1 <= 61;
855: coeff_d1 <= 55;
856: coeff_d1 <= 44;
857: coeff_d1 <= 30;
858: coeff_d1 <= 13;
859: coeff_d1 <= -4;
860: coeff_d1 <= -20;
861: coeff_d1 <= -33;
862: coeff_d1 <= -42;
863: coeff_d1 <= -46;
864: coeff_d1 <= -45;
865: coeff_d1 <= -40;
866: coeff_d1 <= -32;
867: coeff_d1 <= -21;
868: coeff_d1 <= -10;
869: coeff_d1 <= 2;
870: coeff_d1 <= 11;
871: coeff_d1 <= 19;
872: coeff_d1 <= 23;
873: coeff_d1 <= 24;
874: coeff_d1 <= 23;
875: coeff_d1 <= 19;
876: coeff_d1 <= 14;
877: coeff_d1 <= 9;
878: coeff_d1 <= 4;
879: coeff_d1 <= 0;
880: coeff_d1 <= -3;
881: coeff_d1 <= -4;
882: coeff_d1 <= -3;
883: coeff_d1 <= -2;
884: coeff_d1 <= 0;
885: coeff_d1 <= 2;
886: coeff_d1 <= 3;
887: coeff_d1 <= 4;
888: coeff_d1 <= 3;
889: coeff_d1 <= 0;
890: coeff_d1 <= -3;
891: coeff_d1 <= -8;
892: coeff_d1 <= -12;
893: coeff_d1 <= -15;
894: coeff_d1 <= -18;
895: coeff_d1 <= -18;
896: coeff_d1 <= -17;
897: coeff_d1 <= -13;
898: coeff_d1 <= -8;
899: coeff_d1 <= -1;
900: coeff_d1 <= 6;
901: coeff_d1 <= 14;
902: coeff_d1 <= 20;
903: coeff_d1 <= 25;
904: coeff_d1 <= 28;
905: coeff_d1 <= 27;
906: coeff_d1 <= 24;
907: coeff_d1 <= 19;
908: coeff_d1 <= 11;
909: coeff_d1 <= 2;
910: coeff_d1 <= -7;
911: coeff_d1 <= -15;
912: coeff_d1 <= -22;
913: coeff_d1 <= -27;
914: coeff_d1 <= -29;
915: coeff_d1 <= -28;
916: coeff_d1 <= -25;
917: coeff_d1 <= -19;
918: coeff_d1 <= -11;
919: coeff_d1 <= -3;
920: coeff_d1 <= 5;
921: coeff_d1 <= 13;
922: coeff_d1 <= 19;
923: coeff_d1 <= 22;
924: coeff_d1 <= 24;
925: coeff_d1 <= 23;
926: coeff_d1 <= 20;
927: coeff_d1 <= 15;
928: coeff_d1 <= 9;
929: coeff_d1 <= 3;
930: coeff_d1 <= -3;
931: coeff_d1 <= -8;
932: coeff_d1 <= -12;
933: coeff_d1 <= -14;
934: coeff_d1 <= -14;
935: coeff_d1 <= -13;
936: coeff_d1 <= -11;
937: coeff_d1 <= -8;
938: coeff_d1 <= -5;
939: coeff_d1 <= -2;
940: coeff_d1 <= 1;
941: coeff_d1 <= 3;
942: coeff_d1 <= 4;
943: coeff_d1 <= 4;
944: coeff_d1 <= 4;
945: coeff_d1 <= 3;
946: coeff_d1 <= 2;
947: coeff_d1 <= 1;
948: coeff_d1 <= 0;
949: coeff_d1 <= 0;
950: coeff_d1 <= 0;
951: coeff_d1 <= 1;
952: coeff_d1 <= 2;
953: coeff_d1 <= 4;
954: coeff_d1 <= 5;
955: coeff_d1 <= 6;
956: coeff_d1 <= 6;
957: coeff_d1 <= 6;
958: coeff_d1 <= 4;
959: coeff_d1 <= 2;
960: coeff_d1 <= -1;
961: coeff_d1 <= -4;
962: coeff_d1 <= -7;
963: coeff_d1 <= -9;
964: coeff_d1 <= -11;
965: coeff_d1 <= -12;
966: coeff_d1 <= -12;
967: coeff_d1 <= -10;
968: coeff_d1 <= -7;
969: coeff_d1 <= -4;
970: coeff_d1 <= 0;
971: coeff_d1 <= 5;
972: coeff_d1 <= 9;
973: coeff_d1 <= 12;
974: coeff_d1 <= 14;
975: coeff_d1 <= 15;
976: coeff_d1 <= 14;
977: coeff_d1 <= 12;
978: coeff_d1 <= 9;
979: coeff_d1 <= 4;
980: coeff_d1 <= 0;
981: coeff_d1 <= -4;
982: coeff_d1 <= -8;
983: coeff_d1 <= -11;
984: coeff_d1 <= -13;
985: coeff_d1 <= -14;
986: coeff_d1 <= -13;
987: coeff_d1 <= -11;
988: coeff_d1 <= -8;
989: coeff_d1 <= -4;
990: coeff_d1 <= 0;
991: coeff_d1 <= 3;
992: coeff_d1 <= 6;
993: coeff_d1 <= 9;
994: coeff_d1 <= 10;
995: coeff_d1 <= 10;
996: coeff_d1 <= 9;
997: coeff_d1 <= 8;
998: coeff_d1 <= 5;
999: coeff_d1 <= 3;
default: coeff_d1 <= 0;
endcase
coeff_d2 <= coeff_d1;
coeff_d3 <= coeff_d2;

// Multiply the delayed sample by the delayed coefficient (36-bit product)
product <= data_d3*coeff_d3;

// Accumulate: sign-extend the 36-bit product to 48 bits; clear_acc_sr[0] starts a new sum
accumulator <= (clear_acc_sr[0] ? 48'b0 : accumulator) + { {12{product[35]}}, product[35:0] };

// On a new input sample, restart the MAC sequence at the oldest of the last num_taps samples
if(din_enable) begin
clear_acc_sr <= {1'b1, clear_acc_sr[4:1]};
data_index <= write_index - num_taps + 1'b1;
coeff_index <= 1'b0;
end else begin
data_index <= data_index + 1'b1;
coeff_index <= coeff_index + 1'b1;
clear_acc_sr <= {1'b0, clear_acc_sr[4:1]};
end

// Handle writing buffer data
if(din_enable) begin
buffer[write_index] <= { {6{din[11]}}, din[11:0] };
write_index <= write_index + 1'b1;
end

// Handle eject_data scheduling
if(coeff_index == num_taps-1) begin
eject_data_sr <= {1'b1, eject_data_sr[4:1]};
end else begin
eject_data_sr <= {1'b0, eject_data_sr[4:1]};
end

// Handle ejecting data
if(eject_data_sr[0]) begin
dout <= accumulator[11:0];// >>> data_scale;
dout_enable <= 1'b1;
end else begin
// dout <= dout;
dout_enable <= 1'b0;
end

end

endmodule
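The MAC loop above handles one tap per clock: each new `din` restarts `data_index` at the oldest of the last `num_taps` samples and `coeff_index` at 0, so `coeff[0]` is paired with the oldest sample. An arithmetic-only Python model of that sequence could serve as a golden reference when checking the simulation. It is a sketch under assumptions: it ignores the pipeline delays (`coeff_d1`..`coeff_d3`, `clear_acc_sr`, `eject_data_sr`), and the buffer depth (`addr_bits=10`) and the names `MacFirModel`/`push` are hypothetical, not taken from the design.

```python
from typing import List


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


class MacFirModel:
    """Arithmetic-only model of the time-multiplexed MAC FIR (no pipeline timing)."""

    def __init__(self, coeffs: List[int], addr_bits: int = 10):
        self.coeffs = coeffs
        self.depth = 1 << addr_bits  # assumed (hypothetical) buffer depth
        self.buffer = [0] * self.depth
        self.write_index = 0

    def push(self, din: int) -> int:
        """Store one 12-bit sample and return the 48-bit accumulator result."""
        n = len(self.coeffs)
        # buffer[write_index] <= {{6{din[11]}}, din[11:0]}: 12 -> 18-bit sign extension
        self.buffer[self.write_index] = to_signed(din, 12)
        # data_index <= write_index - num_taps + 1, wrapping like the index register
        data_index = (self.write_index - n + 1) % self.depth
        self.write_index = (self.write_index + 1) % self.depth
        acc = 0
        for k in range(n):  # coeff_index runs 0 .. num_taps-1
            sample = self.buffer[(data_index + k) % self.depth]
            acc = to_signed(acc + sample * self.coeffs[k], 48)  # 48-bit wraparound
        return acc
```

Because `coeff[0]` meets the oldest sample, an impulse comes out as the coefficient list in reverse order (`MacFirModel([1, 2, 3])` fed `1, 0, 0, 0` returns `3, 2, 1, 0`). That matches the textbook FIR `y[n] = sum h[k]*x[n-k]` only when the coefficients are symmetric, which is typical of linear-phase bandpass designs; whether this table is symmetric is not visible in this excerpt.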
« Last Edit: May 02, 2018, 05:55:53 pm by pigtwo »
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2296
  • Country: nz
Quote
What operating system are you running ISE on?  Maybe I should always be running the virtual machine version? 
I am running it natively on Windows 10, but I have an older system that I use when I need to program hardware.

I think you have earned yourself a DSP48A1 scout badge. You need a new project that forces you to learn some other part of the chip. BRAM? Weird PLL reprogramming? SERDES? 

Maybe a Stereo FM transmitter based on waggling a pin really fast with the SERDES block?

 

Offline pigtwo

  • Regular Contributor
  • *
  • Posts: 131
I've compared my results, and I do get different results from the two installations. I noticed that my home computer was running ISE 14.6; I upgraded it to 14.7 and it seems to be working now. I feel much better about this. I was losing my mind for quite a while. I've retried a few of my past problems and they now work much better. All that stuff about the accumulator being placed outside of the DSP slice seems to have gone away. So I take back what I said about the design needing to be written exactly right before the optimizations happen. It seems to have just been some problem with my setup.
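For reference, the accumulator in the posted code is 48 bits wide and fed by 36-bit products, matching the DSP48A1's 18x18 multiplier and 48-bit accumulator; the 12 spare bits mean roughly 4096 full-range products can be summed without overflow. What the code still leaves open is the output scaling: `dout <= accumulator[11:0]` keeps only the low 12 bits, which wraps once the sum leaves the 12-bit range, and the commented-out `>>> data_scale` suggests a shift was intended. A minimal sketch for picking that shift from the coefficients, assuming a 12-bit signed input and a 12-bit `dout`; the function names are hypothetical, and the worst case is conservative (it ignores rounding and real signal statistics).

```python
from typing import List, Tuple


def output_range(coeffs: List[int], in_bits: int = 12) -> Tuple[int, int]:
    """Exact min/max of sum(c * x) when every x is a signed in_bits sample."""
    lo_in, hi_in = -(1 << (in_bits - 1)), (1 << (in_bits - 1)) - 1
    hi = sum(max(c * lo_in, c * hi_in) for c in coeffs)
    lo = sum(min(c * lo_in, c * hi_in) for c in coeffs)
    return lo, hi


def signed_bits(lo: int, hi: int) -> int:
    """Smallest two's-complement width that holds every value in [lo, hi]."""
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


def output_shift(coeffs: List[int], in_bits: int = 12, out_bits: int = 12) -> int:
    """Arithmetic right shift that keeps the worst case inside an out_bits output."""
    lo, hi = output_range(coeffs, in_bits)
    return max(0, signed_bits(lo, hi) - out_bits)
```

For example, taps `[3, -5, 7, -5, 3]` give a range of -47094..47091, which needs 17 bits, so a shift of 5 fits it into 12. Assuming `data_scale` is a constant parameter, the scaled output in Verilog would be the part-select `accumulator[data_scale+11:data_scale]` rather than `accumulator[11:0]`.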

Quote
I think you have earned yourself a DSP48A1 scout badge. You need a new project that forces you to learn some other part of the chip. BRAM? Weird PLL reprogramming? SERDES? 
Hahah, I definitely feel like I have a much better understanding of the DSP48A1 now. I agree; this whole project was actually meant to get me familiar with fast multiplications (DSP48A1) so I'd be more prepared for my next project, which I'm sort of stealing from you. I wanted to become familiar with driving high-speed LVDS lines with a SERDES, so the project is driving a DVI display showing the Mandelbrot/Julia sets, ideally allowing the user to zoom or pan. I have no idea how practical that is since I've done zero calculations so far.
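As a starting point for those calculations: each step of the escape-time iteration z <- z^2 + c needs three multiplies (x*x, y*y, x*y), which is the part that would map onto DSP48A1 slices. A minimal fixed-point sketch, assuming values scaled by 2^`frac_bits`; the function name and parameters are hypothetical. Python's `>>` on negative ints is a floor (arithmetic) shift like Verilog `>>>` on signed values, but Python ints never overflow, so this does not model an 18-bit width limit.

```python
def mandelbrot_iterations(cx: int, cy: int, frac_bits: int, max_iter: int) -> int:
    """Escape-time count for c = (cx + i*cy) / 2**frac_bits, in fixed point."""
    four = 4 << frac_bits  # escape threshold |z|^2 > 4
    x = y = 0
    for i in range(max_iter):
        xx = (x * x) >> frac_bits  # multiply 1
        yy = (y * y) >> frac_bits  # multiply 2
        if xx + yy > four:
            return i
        xy = (x * y) >> frac_bits  # multiply 3
        x = xx - yy + cx           # Re(z^2 + c)
        y = 2 * xy + cy            # Im(z^2 + c)
    return max_iter
```

Points inside the set (such as c = 0 or c = -1) run to `max_iter`, so the worst-case multiply count per pixel is 3 * `max_iter`, which is the number to weigh against the pixel clock.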

Quote
Maybe a Stereo FM transmitter based on waggling a pin really fast with the SERDES block?

That sounds like a very good idea. It seems like a very cool project, especially since I know almost nothing about FM transmission, antennas, etc. Plus I've really wanted to buy a spectrum analyzer but haven't had a good reason to. Maybe this can be it.

Thanks again for all the help!
 
