Topic: Why does my macrocell count increase?

Offline rea5245 (Topic starter)

  • Frequent Contributor
  • **
  • Posts: 587
  • Country: us
Why does my macrocell count increase?
« on: September 28, 2022, 02:43:52 am »
Hi,

I'm trying to cram a 16 bit adder/subtracter into an ATF1508 CPLD. I'm doing something that I expected would reduce the complexity of my design, but it ends up increasing it.

I'm using Quartus II 13.0.1, set to the compatible EPM7128 device. This is the first time I've ever used Verilog or a CPLD, so there's a good chance I'm messing up something.

I have the following code:
Code:
// 16 bit adder/subtracter
// Opcode 0 produces A + B
// Opcode 1 produces A - B
// Opcode 2 produces A + 1
// Opcode 3 produces -B

module ALU
#(parameter WIDTH = 16)
(output [WIDTH - 1:0] result, output c, output v, output n,
input [3 : 0] opcode, input [WIDTH - 1:0] A, input [WIDTH - 1:0] B);

// Calculate -B
wire [WIDTH - 1 : 0] negb = ~B + 16'd1;

// Choose what we'll feed into the adder. The first argument will always be A.
// The second argument could be -B, 1, or B
wire isSub = (opcode == 1);
wire isIncr = (opcode == 2);
wire [WIDTH - 1 : 0] addend2 = isSub ? negb : (isIncr ? 16'd1 : B);

// If the opcode is 3, produce -B. Otherwise, produce A + whatever we selected above
wire carry;
wire [WIDTH - 1 : 0] iresult;
assign {carry, iresult} = (opcode == 3) ?  negb : A + addend2;

// Set the outputs
assign result = iresult;

assign v = ~((A[WIDTH - 1] ^ addend2[WIDTH - 1])) & (A[WIDTH - 1] ^ iresult[WIDTH - 1]);
assign c = carry;
assign n = result[WIDTH - 1];

endmodule

When I compile it, it uses 101 macrocells (out of 128 available).

I hoped that if I reduced the opcode from 4 bits to 2, I might save some macrocells - enough to add a little more functionality (like calculating a Z flag). So I changed "input [3 : 0] opcode" to "input [1 : 0] opcode" and recompiled. I was rewarded with 148 "Can't place node" errors.

Why would that happen?

Thank you,
   Bob
 

Offline BrianHG

  • Super Contributor
  • ***
  • Posts: 8138
  • Country: ca
    • LinkedIn
Re: Why does my macrocell count increase?
« Reply #1 on: September 28, 2022, 03:29:36 am »
Did you move around the IOs?

A design like this should use a very small amount of macrocells, though I do not know the product term size of the EPM7128, as it's been decades since I used them.

Have you tried clocking the design?  Remember to use the global clock input pin...

Yes, your code can be written in a cleaner way and clocking the design does improve output switching timing and stability.

Maybe cleaning your code might help Quartus determine the minimum LC usage design.
 

Offline ale500

  • Frequent Contributor
  • **
  • Posts: 415
Re: Why does my macrocell count increase?
« Reply #2 on: September 28, 2022, 09:10:41 am »
What if you use combined add/sub logic per bit, à la full adder/subtractor? Just to get the negated B you are using one whole 16-bit full adder...
85 macrocells using combined full adder/subtractors :).
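
A minimal sketch of the combined add/subtract idea, written here as a vector expression rather than the per-bit cells the post describes. The module name `addsub_sketch` and the `sub` control input are assumptions for illustration, not code from the thread, and the macrocell count this produces in Quartus was not checked.

```verilog
// Sketch only: one adder shared between A + B and A - B.
// Module and port names are illustrative, not taken from the thread.
module addsub_sketch
#(parameter WIDTH = 16)
(output [WIDTH-1:0] result,
 output             c,      // carry out of the shared adder
 input              sub,    // 0: A + B, 1: A - B
 input  [WIDTH-1:0] A,
 input  [WIDTH-1:0] B);

  // Two's complement: A - B = A + ~B + 1.
  // XOR with the replicated 'sub' bit inverts B only when subtracting.
  wire [WIDTH-1:0] b_eff = B ^ {WIDTH{sub}};

  // 'sub' doubles as the carry-in that supplies the "+ 1".
  // When subtracting, c = 1 means "no borrow" (A >= B as unsigned values).
  assign {c, result} = A + b_eff + sub;

endmodule
```

The point of the structure is that no separate adder is spent on computing -B; the inversion costs one XOR per bit and the increment rides on the carry-in.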
 

Offline rea5245 (Topic starter)

Re: Why does my macrocell count increase?
« Reply #3 on: September 28, 2022, 12:55:12 pm »
Quote from: BrianHG
Have you tried clocking the design?  Remember to use the global clock input pin...

I don't understand the benefit of clocking it.

The classic 74181 ALU is not clocked.

If I wanted to store the result in a latch, I'd certainly need a clock. But since I can't even fit the adder/subtractor in the CPLD, I figured I would have a separate latch chip. Of course, that would be clocked.

In what way could my design be cleaner? As I said, this is my first attempt at Verilog.

Quote from: BrianHG
A design like this should use a very small amount of macrocells

Should it? Remember, it's 16 bits wide.

 

Offline BrianHG

Re: Why does my macrocell count increase?
« Reply #4 on: September 28, 2022, 05:45:30 pm »
Quote from: rea5245
In what way could my design be cleaner? As I said, this is my first attempt at Verilog.

This is a characteristic of CPLDs and FPGAs.  It is not a characteristic of Verilog programming.
Clocking the design means you feed, for example, a 50MHz clock into it.  At the absolute most basic, this means that once you modify your inputs, all the output bits will change to their new results at the next clock cycle, usually all in a crisp parallel manner.

A non-clocked design just means the combination of logic gates which generate your function will ripple through the CPLD internal fuse wiring until a stable result is reached.  Just like a wiring of 74LS/HC gates to make your ALU.  However, since you only want to replicate a 74LS IC, the CPLD should still run circles around it.
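
To make the difference concrete, here is a small sketch of the same sum produced both ways. The name `clk_in` follows the comment in the code further down this post; the module name and the two output names are assumptions for illustration only.

```verilog
// Sketch only: the same sum, once combinational and once registered.
// Names other than clk_in are illustrative.
module clocked_sketch
#(parameter WIDTH = 16)
(output     [WIDTH:0] sum_comb,  // follows the inputs as the logic settles
 output reg [WIDTH:0] sum_reg,   // changes only on a rising clock edge
 input                clk_in,
 input    [WIDTH-1:0] A,
 input    [WIDTH-1:0] B);

  // Combinational path: output bits may settle at slightly different times.
  assign sum_comb = A + B;

  // Registered path: all output bits update together at the clock edge.
  always @(posedge clk_in)
    sum_reg <= A + B;

endmodule
```

The registered output holds its previous value until the next rising edge, which is what gives the "crisp parallel" switching described above.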

Ok, for cleaning your design, let's take a look at this:

Beginning:
Code:
module ALU
// 16 bit adder/subtracter
// Opcode 0 produces A + B
// Opcode 1 produces A - B
// Opcode 2 produces A + 1
// Opcode 3 produces -B

#(parameter WIDTH = 16)
(output reg [WIDTH - 1:0] result, output reg c, output reg v, output reg n, // I had to add the 'reg' so that the always @(*) begin will function and also allow clocking in the future if you so desire.
input [3 : 0] opcode, input [WIDTH - 1:0] A, input [WIDTH - 1:0] B);


Good enough.

Next, let's create wires containing your 4 calculated results:

Code:
wire [WIDTH:0] i_add  = A + B ;  // We are adding the extra 1 bit at the top to keep track of the carry/borrow flag.
wire [WIDTH:0] i_sub  = A - B ;
wire [WIDTH:0] i_inc  = A + 1 ;
wire [WIDTH:0] i_negb = 0 - B ;  // I wired it like this to hopefully help direct the compiler to a simplification of the inputs.

Now, let's set the outputs:

Code:
always @(*) begin  // If we ever want to clock this design, the ' @(*) ' here would change to ' @(posedge clk_in) '

  case (opcode[1:0]) // You only wanted to use the 2 bottom bits of the opcode.

    2'd0 : begin // A+B
              result  <= i_add [WIDTH - 1:0] ; // Trim the i_add integer bits down to the 'WIDTH' to match 'result's width
              c       <= i_add [WIDTH] ; // I'm assuming ' c ' is the carry / borrow flag.
              v       <= 1'b0 ; // What does 'v' equal?
              n       <= 1'b0 ; // is n a negative flag?  Are you using signed numbers?  This would change things.
            end

    2'd1 : begin // A-B
              result  <= i_sub [WIDTH - 1:0] ; // Trim the i_sub integer bits down to the 'WIDTH' to match 'result's width
              c       <= i_sub [WIDTH] ; // I'm assuming ' c ' is the carry / borrow flag.
              v       <= 1'b0    ; // What does 'v' equal?
              n       <= (B > A) ; // is n a negative flag?  Are you using signed numbers?
            end

    2'd2 : begin // A+1
              result  <= i_inc [WIDTH - 1:0] ; // Trim the i_inc integer bits down to the 'WIDTH' to match 'result's width
              c       <= i_inc [WIDTH] ; // I'm assuming ' c ' is the carry / borrow flag.
              v       <= 1'b0 ; // What does 'v' equal?
              n       <= 1'b0 ; // is n a negative flag?  Are you using signed numbers?
            end

    2'd3 : begin // 0-B
              result  <= i_negb[WIDTH - 1:0] ; // Trim the i_negb integer bits down to the 'WIDTH' to match 'result's width
              c       <= i_negb[WIDTH]       ; // I'm assuming ' c ' is the carry / borrow flag.
              v       <= 1'b0 ; // What does 'v' equal?
              n       <= 1'b1 ; // is n a negative flag?  Are you using signed numbers?
            end
  endcase

end // always @(*)
endmodule

This should make a little more sense and allow you to add opcodes if you wish.
Though, what does the 'v' stand for?  I have not computed the 'v' in my code.  It is just always set to 0.

This code took too many macrocells with a 'Speed' optimized compile.
Setting the compiler optimization to 'Area' allowed this code to fit with 104 of 128 macrocells.

I've attached a Quartus 13.0sp1 full project using an EPM7128 which seems to compile and fit this code.
« Last Edit: September 28, 2022, 05:50:29 pm by BrianHG »
 

Offline BrianHG

Re: Why does my macrocell count increase?
« Reply #5 on: September 28, 2022, 06:22:46 pm »
A little mistake:

Code:
    2'd3 : begin // 0-B
              result  <= i_negb[WIDTH - 1:0] ; // Trim the i_negb integer bits down to the 'WIDTH' to match 'result's width
              c       <= i_negb[WIDTH]       ; // I'm assuming ' c ' is the carry / borrow flag.
              v       <= 1'b0 ; // What does 'v' equal?
              n       <= (B>0) ; // ***** The negative flag should go high if input B > 0
            end

I did not properly calculate the negative flag for (0-B).
The result is still 104 macrocells.
« Last Edit: September 28, 2022, 06:30:17 pm by BrianHG »
 

Offline rea5245 (Topic starter)

Re: Why does my macrocell count increase?
« Reply #6 on: September 28, 2022, 06:45:10 pm »
Quote from: BrianHG
Though, what does the 'v' stand for?  I have not computed the 'v' in my code.  It is just always set to 0.

v is the overflow flag. I'll deal with it once I have enough free space on the CPLD.

The n flag is set to the high bit of the result. So with 2s complement numbers, it means the result is negative.

You wrote:

Code:
wire [WIDTH:0] i_negb = 0 - B ;  // I wired it like this to hopefully help direct the compiler to a simplification of the inputs.
I can appreciate that a compiler might handle 0-B differently than -B, but aren't we dabbling in black magic at this point? Do we really know what's going to be different, if anything?
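
For what it's worth, the two spellings describe the same function under the Verilog expression-width rules: with unsigned operands and a 17-bit target, both come out as the 17-bit two's complement of B. The sketch below places them side by side so this can be checked in simulation. The module and signal names are assumptions for illustration, and it says nothing about how a particular synthesis tool optimizes either form.

```verilog
// Sketch only: 0 - B and -B assigned to wires of the same width.
// Names are illustrative, not taken from the thread.
module negb_compare_sketch
#(parameter WIDTH = 16)
(output [WIDTH:0]   neg_a,  // written as 0 - B
 output [WIDTH:0]   neg_b,  // written as -B
 output             same,   // 1 when both spellings agree
 input  [WIDTH-1:0] B);

  // The unsized literal 0 makes this a 32-bit subtraction,
  // which is then truncated to WIDTH+1 bits.
  assign neg_a = 0 - B;

  // B is zero-extended to WIDTH+1 bits first, then negated.
  assign neg_b = -B;

  assign same = (neg_a == neg_b);

endmodule
```

Any difference in macrocell count between the two would therefore come from the tool's heuristics rather than from the logic being described.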
 

Offline BrianHG

Re: Why does my macrocell count increase?
« Reply #7 on: September 28, 2022, 06:48:30 pm »
Careful here, what I wrote was for you to be able to debug visually and conclusively.

I wrote this from experience.
If you want 'Black Magic', then here it is:
Code:
module ALU
// 16 bit adder/subtracter
// Opcode 0 produces A + B
// Opcode 1 produces A - B
// Opcode 2 produces A + 1
// Opcode 3 produces -B

#(parameter WIDTH = 16)
(output reg [WIDTH - 1:0] result, output reg c, output reg v, output reg n, // I had to add the 'reg' so that the always @(*) begin will function and also allow clocking in the future if you so desire.
input [3 : 0] opcode, input [WIDTH - 1:0] A, input [WIDTH - 1:0] B);


wire signed [WIDTH+1:0] sA = (opcode==3) ? 0 : A ;
wire signed [WIDTH+1:0] sB = (opcode==3) ? -B :
                             (opcode==2) ?  1 :
                             (opcode==1) ? -B : B ;

wire signed [WIDTH+1:0] i_add  = sA + sB ;  // We are adding the extra 1 bit at the top to keep track of the carry/borrow flag.

always @(*) begin  // If we ever want to clock this design, the ' @(*) ' here would change to ' @(posedge clk_in) '

              result  <= i_add [WIDTH - 1:0] ; // Trim the i_add integer bits down to the 'WIDTH' to match 'result's width
              c       <= i_add [WIDTH] ; // I'm assuming ' c ' is the carry / borrow flag.
              v       <= 1'b0 ; // What does 'v' equal?
              n       <= i_add [WIDTH+1] ; // is n a negative flag?  Are you using signed numbers?  This would change things.

end // always @(*)
endmodule

Remember, you asked.
This code should achieve the same result.
It now compiles to 62 of 128 macrocells.
You now have a bunch of free space.
« Last Edit: September 28, 2022, 06:52:01 pm by BrianHG »
 

Offline BrianHG

Re: Why does my macrocell count increase?
« Reply #8 on: September 28, 2022, 06:53:13 pm »
Yes, my last edit above got it down to 62 macrocells.
I did this by using a single 18 bit 'SIGNED' adder for everything.
That's almost half your original 101 cells.
 

Offline rea5245 (Topic starter)

Re: Why does my macrocell count increase?
« Reply #9 on: September 28, 2022, 06:56:41 pm »
Quote from: BrianHG
Yes, my last edit above got it down to 62 macrocells.
I did this by using a single 18 bit 'SIGNED' adder for everything.
That's almost half your original 101 cells.

I'm getting 130 macrocells when I compile your code - too big to fit.
 

Offline BrianHG

Re: Why does my macrocell count increase?
« Reply #10 on: September 28, 2022, 07:03:47 pm »
Go into menu:

Assignments - Settings :

Select 'Analysis and Synthesis Settings'.

Change 'Optimization Technique' from 'Speed' to 'Area'.

(Balanced should also work, however, you will get 94 macrocell usage)

Make sure you are using my last edited code where I have the ' wire signed [WIDTH+1:0] i_add '
« Last Edit: September 28, 2022, 07:05:21 pm by BrianHG »
 

Offline BrianHG

Re: Why does my macrocell count increase?
« Reply #11 on: September 28, 2022, 07:26:48 pm »
Is this what you want the overflow to do?

v  <= i_add [WIDTH+1]  &&  ~i_add [WIDTH+0]  ;

Since you do not have a carry/borrow input and your source A&B are un-signed, I cannot see any type of overflow other than a negative number with the carry flag unset.
 

Offline rea5245 (Topic starter)

Re: Why does my macrocell count increase?
« Reply #12 on: September 28, 2022, 07:31:39 pm »
Quote from: BrianHG
Change 'Optimization Technique' from 'Speed' to 'Area'.

Wow! That works wonders! Thanks!

Quote from: BrianHG
Since you do not have a carry/borrow input and your source A&B are un-signed...

A and B can be interpreted as either signed or unsigned, depending on the context. This is for a TTL computer (well... not entirely TTL, since I'll have the CPLD), so whether the numbers are signed or not depends on the software's interpretation of them.
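
A short sketch of how one adder can serve both interpretations: the result bits are identical either way, `c` reports an out-of-range result when the operands are read as unsigned, and `v` reports it when they are read as two's complement. The `v` and `n` expressions mirror the ones in the first post's code; the module name and the `addend` port are assumptions for illustration.

```verilog
// Sketch only: carry, overflow and negative flags from a single adder.
// Names are illustrative; v and n follow the first post's definitions.
module flags_sketch
#(parameter WIDTH = 16)
(output [WIDTH-1:0] result,
 output             c,   // carry out: unsigned result did not fit
 output             v,   // overflow: signed result did not fit
 output             n,   // top bit of the result
 input  [WIDTH-1:0] A,
 input  [WIDTH-1:0] addend);

  assign {c, result} = A + addend;

  // Signed overflow: both operands share a sign bit,
  // but the result's sign bit differs from it.
  assign v = ~(A[WIDTH-1] ^ addend[WIDTH-1]) & (A[WIDTH-1] ^ result[WIDTH-1]);

  assign n = result[WIDTH-1];

endmodule
```

For example, 0x7FFF + 1 sets `v` but not `c`, while 0xFFFF + 1 sets `c` but not `v`; software picks whichever flag matches how it is treating the numbers.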
 

Offline rea5245 (Topic starter)

Re: Why does my macrocell count increase?
« Reply #13 on: September 28, 2022, 07:36:44 pm »
So my immediate problem of fitting everything into a CPLD is solved (thank you Brian!). But my original question remains a puzzle: why did narrowing down the opcode field cause the number of macrocells to increase? I hoped it would reduce their usage, and at worst, I would've expected it to have no effect.

Is this something an experienced Verilog programmer could've predicted?
 

Offline BrianHG

Re: Why does my macrocell count increase?
« Reply #14 on: September 28, 2022, 07:40:10 pm »
Quote from: rea5245
A and B can be interpreted as either signed or unsigned, depending on the context. This is for a TTL computer (well... not entirely TTL, since I'll have the CPLD), so whether the numbers are signed or not depends on the software's interpretation of them.

Keep in mind that you may want to declare the inputs as signed:

   input [3 : 0] opcode, input signed [WIDTH - 1:0] A, input signed [WIDTH - 1:0] B);

The only difference in the result will be the sign bits during the addition.
However, if you want to control the signed/unsigned nature of the inputs, then you will need to make a signed wire = the unsigned inputs with the MSB tied to a source selection switch.

Since I already used an 18-bit math core, done right, the end result should still stay within 62 +/- 2 or 3 macrocells, as you are just copying the MSB of the input to the upper sign bits under control of an input pin.
(When manually driving this signed function, the 'signed' keyword should not be used in the input port definitions.)
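
One way the "source selection switch" could look, as a sketch under assumptions: the `is_signed` input and the module name are hypothetical and do not appear in the thread. The input port is left unsigned and the two extra top bits are filled by hand.

```verilog
// Sketch only: widen an input by two bits, sign-extending on request.
// 'is_signed' and the module name are hypothetical.
module extend_sketch
#(parameter WIDTH = 16)
(output signed [WIDTH+1:0] ext,
 input                     is_signed,  // 1: treat 'in' as two's complement
 input         [WIDTH-1:0] in);

  // The fill bit copies the MSB when is_signed = 1, otherwise it is 0.
  wire fill = is_signed & in[WIDTH-1];

  // Two copies of the fill bit sit above the original WIDTH bits.
  assign ext = { {2{fill}}, in };

endmodule
```

Two instances of this, one per operand, would feed the 18-bit adder in place of the plain `A` and `B` connections.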
« Last Edit: September 28, 2022, 07:46:59 pm by BrianHG »
 

Offline BrianHG

Re: Why does my macrocell count increase?
« Reply #15 on: September 28, 2022, 07:44:48 pm »
Quote from: rea5245
So my immediate problem of fitting everything into a CPLD is solved (thank you Brian!). But my original question remains a puzzle: why did narrowing down the opcode field cause the number of macrocells to increase? I hoped it would reduce their usage, and at worst, I would've expected it to have no effect.

Is this something an experienced Verilog programmer could've predicted?

It's just a fluke where having the extra input terms may have entered into a final solution, or the compiler found a means of giving the design a higher FMAX by using fewer inputs, at the cost of a few extra macrocells to compute the output.  Unless you are specifying strict boolean logic for each output cell, this happens more often than you think, as the compiler tries to get the highest output speed possible.

If you clocked your design and specified you only need 50MHz performance, then Quartus may have automatically simplified the system down to the smallest gate/macrocell count instead of striving for >200MHz performance.
 

Offline Foxxz

  • Regular Contributor
  • *
  • Posts: 126
  • Country: us
Re: Why does my macrocell count increase?
« Reply #16 on: September 29, 2022, 12:11:45 am »
Quote from: BrianHG
If you clocked your design and specified you only need 50MHz performance, then Quartus may have automatically simplified the system down to the smallest gate/macrocell count instead of striving for >200MHz performance.

Can you set a timing constraint without resorting to clocking to get the same effect?

Edited: NM I think we're thinking the same thing using different terminology
« Last Edit: September 29, 2022, 12:14:21 am by Foxxz »
 

Offline BrianHG

Re: Why does my macrocell count increase?
« Reply #17 on: September 29, 2022, 12:46:14 am »
Quote from: Foxxz
Can you set a timing constraint without resorting to clocking to get the same effect?

You are now getting into the realm of creating a proper .sdc (Synopsys Design Constraints) file where you specify input and output setup, delay, and hold times.  At least, for the larger FPGAs, this is how I would go about it.

The problem here is that if you specify a time too large or too small in the .sdc file, the FPGA compiler may add logic cells to mimic the effect of your requested input-to-output delays.  For small PLD designs of the MAX7000 series, unless you wish to mimic a peculiar timing of some old logic ICs or even old GAL/PAL CPLDs, setting the compiler's optimization technique is the quickest way to see a large effect on the macrocell usage.

My 'Black Magic' trick reached 62 macrocells by defining a single full signed adder with 2 extra bits which Quartus could easily chew on, using the upper 2 output bits to represent the carry and negative flags.  To make all the calculations which 'rea5245' requested, I only switched the input values to that adder based on the function, requiring nothing more than some XOR and AND terms modifying the adder's A & B inputs.  However, when we had 'Speed' set for the optimization technique, all 3 versions of the source code would hit the same ~135 required macrocells, yet the 'Area' optimization technique had varying final result sizes, from 101 macrocells to my last version at 62 macrocells.  I guess you can say that everything contributed to the final size.

I guess you should always set the compiler to optimize for 'Area', and define a .sdc file so that the compiler always tries for a small result but adds logic as necessary to improve output performance, if you have it set that way in the .sdc file.
« Last Edit: September 29, 2022, 12:50:23 am by BrianHG »
 

