EEVblog Electronics Community Forum

Electronics => Projects, Designs, and Technical Stuff => Topic started by: KTP on June 16, 2010, 01:12:42 am

Title: diy cpu
Post by: KTP on June 16, 2010, 01:12:42 am
Since I do not have any EE classes this summer (none available), I thought it would be a good time to learn Verilog.  My goal by the end of summer is to design and synthesize my own simple microcontroller inside an Altera Cyclone II FPGA using the dev kit provided by Altera (I have it available to me through the school).  I know this has been done many times, but I think it would be a good way to learn Verilog, and I have never built a cpu before.

I'll keep you posted :-)
Title: Re: diy cpu
Post by: wd5gnr on June 16, 2010, 10:19:27 pm
Here's one of mine:

http://www.drdobbs.com/embedded-systems/221800122

If you look around the DDJ site, you'll also see my cross assembler and Forth compiler for it. The simulator with JavaScript is as yet unpublished.

In addition, I did this one a while back:
http://opencores.org/project,blue

The cross assembler for this one is not as flexible, but the front panel is... um... novel (synonym for hack; watch the video). This one is based loosely on Caxton Foster's book Computer Architecture. If you want to build a "classic" minicomputer, this is a good book to read even though it is old. If you are doing pipelined RISC, not the right book.


73

Al W
http://www.hotsolder.com
Title: Re: diy cpu
Post by: KTP on June 17, 2010, 04:41:50 pm
Thanks!  I will look over your designs and the books when I finish reviewing the basics.

I was considering trying to build the cpu out of nand logic only and then synthesizing this into the fpga (where I realize it will get muddled and optimized into the LUT cells), or trying to keep the internals of the FPGA I plan to use in mind so that the cpu uses the smallest amount of real estate.  I could see some useful educational experience doing it either way.

It sometimes seems easy for a seemingly benign line of Verilog code to instantiate a flipflop or similar.  I don't want to be in the situation that I don't know what physical circuit the code is representing.
Title: Re: diy cpu
Post by: ngkee22 on June 17, 2010, 05:29:23 pm
I took several Verilog hardware design classes in college.  If you are familiar with C programming, Verilog resembles it a lot, so it shouldn't be hard to learn.  I enjoyed making hardware with Verilog.  I never got a chance to see VHDL though; I hear it allows for faster performance in the hardware design and it is more structured.
Title: Re: diy cpu
Post by: A-sic Enginerd on June 17, 2010, 07:08:42 pm
Quote
It sometimes seems easy for a seemingly benign line of Verilog code to instantiate a flipflop or similar.

Be careful with that one. You might actually wind up with a latch because of bad coding. So in that sense, yeah, it's easy to do.  :D

Most common rookie move to cause this is not giving a default on a block of code intended to generate purely combinatorial logic. The synth tool doesn't know what to do with it so it uses the rationale of "when in doubt, just keep the old value", and the only way to do that is: a latch.

Don't take what the compiler does as just black magic. You need to be "thinking hardware" when you write your code and you shouldn't have any big surprises. You should get flops where you expect them, and combinatorial logic where you expect it. FPGAs in general don't like latches and latches in general don't make for good, sound, robust logic design. So look through your synth report and pay attention to ALL the errors and even the warnings. After a while you will begin to recognize some of the warnings you can ignore, but when getting started pay attention to all of them.
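
A minimal sketch of the missing-default problem described above. The module and signal names (latch_demo, op, y_latch, y_comb) are made up for illustration and are not from anyone's design in this thread. The first block leaves two values of op unassigned, so a synthesis tool would normally infer latches for y_latch; the second block assigns a default first and stays purely combinatorial.

```verilog
// Hypothetical example: the same 2-bit decode written two ways.
module latch_demo (
   input  wire [1:0] op,
   input  wire [3:0] a, b,
   output reg  [3:0] y_latch,   // expected to infer latches
   output reg  [3:0] y_comb     // purely combinatorial
);
   // BAD: op == 2'b10 and op == 2'b11 never assign y_latch, so the
   // old value has to be held, and the only way to hold it is a latch.
   always @(*) begin
      case (op)
         2'b00: y_latch = a & b;
         2'b01: y_latch = a | b;
      endcase
   end

   // GOOD: the default assignment covers every path through the block.
   always @(*) begin
      y_comb = 4'h0;            // default value
      case (op)
         2'b00: y_comb = a & b;
         2'b01: y_comb = a | b;
      endcase
   end
endmodule
```

The synthesis report should flag the first block with an inferred-latch warning, which is one of the warnings worth not ignoring.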
Title: Re: diy cpu
Post by: A-sic Enginerd on June 17, 2010, 07:24:11 pm
Quote
I was considering trying to build the cpu out of nand logic only and then synthesizing this into the fpga

If your true goal is to learn Verilog and more advanced logic design, don't go this route. There are trade-offs with the size and speed of a design compared to how complex it is and supportability. To see this, take it to the hypothetical of going into production on this. Sure you could build a design that is maybe say.....25% smaller than if you'd been more liberal with your design (i.e.: don't restrict to just NAND gates), but how much is that going to cost your employer because it's taking you extra long to get the design right, not to mention what it'll take to support it over the long haul. There's no "in general" right or wrong answer on where that balance point is, it simply has to be evaluated for each project.

Also, nowadays you can actually be doing yourself a disservice by this type of restriction. I know you're targeting an FPGA, but for just a moment let's say you were going to target an actual ASIC. Well, it turns out that all chip vendors nowadays actually have custom cells in their libraries that can give optimal performance. Example: they have a single standard cell that can take 4 inputs and put it through what looks like (AB + CD). Yet it's much smaller and a whole lot faster than if you would have tried to implement that function with discrete gates....let alone restricting it to getting that same function with purely NAND gates. I know my argument here speaks specifically to ASICs, but there's a certain level of the same thing happening with an FPGA. They are designed to be able to actually implement some functions faster than if you try to spell it out discretely as you mention.
Title: Re: diy cpu
Post by: KTP on June 17, 2010, 09:27:44 pm
Ok, for example today I created a full 1 bit adder by writing this code:

module fulladder (A, B, Cin, Sum, Cout);
   input A, B, Cin;
   output Sum, Cout;
   
   assign Sum = ((~A&~B&Cin) | (~A&B&~Cin) | (A&B&Cin) | (A&~B&~Cin));
   assign Cout = (A&B) | (A&Cin) | (B&Cin);
endmodule

I then created a schematic symbol for this full 1 bit adder and placed four of them on a top level schematic with the carry outs hooked up to the carry ins.  I then had a 4 bit ripple carry adder which took up 6 logic cells in the Cyclone II FPGA and had about a 12 ns delay before the sum was available.

Next I created a new Verilog project and used this code to directly make a 4 bit adder:

module simpleadder(A, B, Cin, Cout, Sum);
   input [3:0]A;
   input [3:0]B;
   input Cin;
   output [3:0]Sum;
   output Cout;
   
   assign {Cout,Sum} = (A + B + Cin);
endmodule

This also used six logic cells and had a delay of 12 ns.  Am I to assume the Quartus software made the same logic from both examples?  Did it even create ripple carry adder logic, or did it make some form of carry lookahead adder, or something else?
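
For comparison, the schematic-level ripple carry adder can also be written structurally in Verilog. This is only a sketch: ripple4, the carry chain C and the generate label stage are illustrative names, and the fulladder equations are copied from the post above so the block stands alone.

```verilog
// Full adder with the same equations as the post above.
module fulladder (A, B, Cin, Sum, Cout);
   input A, B, Cin;
   output Sum, Cout;

   assign Sum  = (~A&~B&Cin) | (~A&B&~Cin) | (A&B&Cin) | (A&~B&~Cin);
   assign Cout = (A&B) | (A&Cin) | (B&Cin);
endmodule

// Hypothetical 4-bit ripple carry adder built from four full adders.
module ripple4 (A, B, Cin, Sum, Cout);
   input  [3:0] A, B;
   input        Cin;
   output [3:0] Sum;
   output       Cout;
   wire   [4:0] C;            // C[i] is the carry into bit i

   assign C[0] = Cin;
   assign Cout = C[4];

   genvar i;
   generate
      for (i = 0; i < 4; i = i + 1) begin : stage
         // carry out of stage i feeds the carry in of stage i+1
         fulladder fa (A[i], B[i], C[i], Sum[i], C[i+1]);
      end
   endgenerate
endmodule
```

Matching cell counts and delays do not prove the two netlists are the same. The netlist viewers in Quartus II (RTL Viewer and Technology Map Viewer) are one way to look at what was actually built.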
Title: Re: diy cpu
Post by: A-sic Enginerd on June 17, 2010, 09:55:38 pm
Quote
This also used six logic cells and had a delay of 12 ns.  Am I to assume the Quartus software made the same logic from both examples?  Did it even create ripple carry adder logic, or did it make some form of carry lookahead adder, or something else?

Would have to look into the details. Could it have made the same logic from both - possibly. However, don't assume what it built just by looking at how many cells it used and what the delay is. It's very possible to get two completely different functions that just happen to have the same cell count and delay. Especially when dealing with an FPGA, because at that point the delay has every bit as much to do with how many LUTs are used (or whatever your specific FPGA architecture is) as it does the complexity of what you're doing. I.e.: when put into an FPGA, doing a simple 2 input AND can have just as much delay as one of your adders you've built.

As an additional footnote, don't be surprised to find that building some of your own basic blocks (adders) yields no different performance than simply using things like the "+" in your Verilog code and letting the synthesis tools determine the best implementation. I recently ran across this (again) myself. I needed to count how many bits were set to one in a 32 bit vector. I did a little digging and found various algorithms and white papers from guys working on their masters and what not. In the end, believe it or not, my best results came from doing a simple 'for' loop to walk the vector. Granted the synth tools we have are commercial grade for ASICs (Synopsys and / or Cadence), but just something to watch for.

EDIT: building up your own ALU is a nice little exercise to learn digital design and things like Verilog so don't take the easy way out just because I suggested it. ;)
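
The bit-counting 'for' loop mentioned above might look something like the sketch below. This is a guess at the general shape, not the actual code from that project; popcount32, vec and count are made-up names.

```verilog
// Hypothetical sketch: count the bits set to one in a 32-bit vector.
module popcount32 (
   input  wire [31:0] vec,
   output reg  [5:0]  count   // 0..32 needs 6 bits
);
   integer i;

   always @(*) begin
      count = 6'd0;                    // default, so no latch is inferred
      for (i = 0; i < 32; i = i + 1)
         count = count + vec[i];       // loop is unrolled by synthesis
   end
endmodule
```

The loop has a fixed bound, so the synthesis tool unrolls it into a chain of adders and is then free to restructure that chain however it likes.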
Title: Re: diy cpu
Post by: KTP on June 18, 2010, 12:49:15 am
Yes, the ALU is probably the first thing I will build for my cpu.  I need to actually read up on what functionality it should have.  I figure at minimum it needs to be able to add two numbers and present the result in a holding register.  It should also have the ability to perform 2's complement on one of the two input numbers.  Maybe the ability to shift left and right?  Anyway I can read up on some wiki about that.
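
A rough sketch of the minimal ALU described above: add, subtract via 2's complement, shift left, shift right, with the result captured in a holding register. The module name, the op encoding and the carry output are all assumptions made for illustration, not a recommended instruction set.

```verilog
// Hypothetical minimal ALU with a registered result.
module alu_sketch #(parameter N = 8) (
   input  wire         clk,
   input  wire [1:0]   op,      // made-up encoding, see case below
   input  wire [N-1:0] a, b,
   output reg  [N-1:0] result,  // holding register
   output reg          carry    // carry out, or the bit shifted out
);
   reg [N:0] sum;               // one extra bit to catch the carry

   always @(*) begin
      case (op)
         2'b00:   sum = {1'b0, a} + {1'b0, b};          // a + b
         2'b01:   sum = {1'b0, a} + {1'b0, ~b} + 1'b1;  // a - b (2's complement of b)
         2'b10:   sum = {a, 1'b0};                      // shift left, MSB lands in carry
         default: sum = {a[0], 1'b0, a[N-1:1]};         // shift right, LSB lands in carry
      endcase
   end

   // capture the result on the clock edge
   always @(posedge clk) begin
      result <= sum[N-1:0];
      carry  <= sum[N];
   end
endmodule
```

For the subtract case, carry = 1 means no borrow occurred, which is the usual convention when subtraction is done by adding the 2's complement.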

As a baby step I designed a 4 bit up counter out of D-flipflops in Verilog and simulated and downloaded it to the dev board.  Works fine...I was surprised that a key press did not generate multiple counts...they must debounce the switches on the board...

For S&G, here is the code:

module counterfun(KEY,LEDR);
   input [0:0]KEY;
   output [3:0]LEDR;
   wire [3:0]D;
   wire [3:0]Q;
   wire clk;
   assign clk = KEY[0];
   
   Dflip FF0(D[0], clk, Q[0]);
   Dflip FF1(D[1], clk, Q[1]);
   Dflip FF2(D[2], clk, Q[2]);
   Dflip FF3(D[3], clk, Q[3]);
   
   assign D[0] = ~Q[0];
   assign D[1] = Q[1] ^ Q[0];
   assign D[2] = Q[2] ^ (Q[1] & Q[0]);
   assign D[3] = Q[3] ^ (Q[2] & Q[1] & Q[0]);
   
   assign LEDR[0] = Q[0];
   assign LEDR[1] = Q[1];
   assign LEDR[2] = Q[2];
   assign LEDR[3] = Q[3];
   
endmodule

module Dflip (D, Clock, Q);
   input D, Clock;
   output reg Q;
   
   always @(posedge Clock)
      Q<=D;
endmodule
Title: Re: diy cpu
Post by: A-sic Enginerd on June 18, 2010, 05:39:13 am
My $0.02....

Ok, I get declaring the logic explicitly for each bit. Nice beginner exercise. However, once you move on to bigger badder things you won't want to deal with that and you'll just do a simple add to the vector as a whole. One liner sort of thing.

assign D = Q + 4'h1;

However, I would discourage the use of declaring an explicit module for your flops and then instantiating. Not necessary and just muddles things up. Oh yeah, probably want to add a reset in there also. ;)

so to give the fragments (won't do 100% full code)

// add reset to input list
module counterfun (
input [0:0] KEY,
input RESET,
output [3:0] LEDR);

// change Q from wire to reg
reg [3:0] Q;


// rest of goo logic....minus the FFx instantiations
.
.
.
//

// build flops here. 99% of the projects I've worked on, active low resets are preferred.
always @ (posedge clk or negedge RESET)
if (~RESET)
   Q <= 4'h0;
else
   Q <= D;

endmodule

There's a few other tidbits I could talk about, but at this point it's splitting hairs and really a don't care for someone just learning. So, keep going, it's looking good!!!!!
Title: Re: diy cpu
Post by: jahonen on June 18, 2010, 07:25:13 am
Most Altera and Xilinx FPGA architectures consist basically of 4-input LUTs (each followed by a D-FF and some extension interconnects to other LEs), which can implement any logic function of up to four input variables and one output. That explains why you didn't save anything by writing the adder the "hard way".

If you want to optimize something for FPGA specifically, that structure is good to keep in mind. FPGA is not really "array of gates", it just mimics it.
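
One way to picture a 4-input LUT is as a 16-entry truth table indexed by the inputs. The sketch below models that idea in plain Verilog; lut4_sketch, INIT and truth are illustrative names, and this is a behavioral model, not a vendor primitive.

```verilog
// Hypothetical behavioral model of a 4-input LUT.
// INIT holds the truth table: bit k is the output when {d,c,b,a} == k.
// The default 16'h6996 is the table for a ^ b ^ c ^ d.
module lut4_sketch #(parameter [15:0] INIT = 16'h6996) (
   input  wire a, b, c, d,
   output wire f
);
   wire [15:0] truth = INIT;

   assign f = truth[{d, c, b, a}];   // the inputs simply select one table entry
endmodule
```

Any function of the four inputs costs the same single lookup, whatever its gate-level expression looks like. The Sum and Cout equations of a full adder are each functions of only three inputs, so each one fits in a single LUT of this kind.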

Regards,
Janne
Title: Re: diy cpu
Post by: allanw on June 18, 2010, 01:51:23 pm
Do you know anything about RTL design? I think it'd be very difficult to design a CPU without that level of abstraction.
Title: Re: diy cpu
Post by: A-sic Enginerd on June 18, 2010, 04:59:24 pm
Quote
Do you know anything about RTL design? I think it'd be very difficult to design a CPU without that level of abstraction.

Not necessarily.  In fact, good way to learn. Had a class in college where we did this very thing using a flavor of ABEL that ran on *shudder* a Mac *ooofff*. Personally, I think it's actually a good starter project. Would you wind up with a CPU that you could go mainstream with and be willing to use on any of your own future projects? Not likely. But you'll definitely learn a lot. You'll certainly find out along the way the bad, good, and better ways to do things.
Title: Re: diy cpu
Post by: KTP on June 18, 2010, 05:11:35 pm
RTL = register transfer logic?  That is about as much as I know about it :-)

Is it just designing logic using registers to synch with a clock and combinational logic to interconnect the registers?  This sounds a lot like a state machine, which was how I was going to approach the cpu design.
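
(RTL is usually expanded as "register-transfer level".) The register-plus-combinational-logic picture described above can be sketched as a small state machine. The state names and the start input below are invented for illustration and do not describe any particular cpu.

```verilog
// Hypothetical control state machine in the usual two-block RTL style:
// one combinational block for next-state logic, one clocked block for the register.
module rtl_sketch (
   input  wire       clk,
   input  wire       resetn,   // active-low reset
   input  wire       start,
   output reg  [1:0] state
);
   localparam IDLE = 2'd0, FETCH = 2'd1, EXECUTE = 2'd2;

   reg [1:0] next_state;

   // Combinational logic: next state from current state and inputs.
   always @(*) begin
      next_state = state;               // default: stay put
      case (state)
         IDLE:    if (start) next_state = FETCH;
         FETCH:   next_state = EXECUTE;
         EXECUTE: next_state = FETCH;
         default: next_state = IDLE;    // recover from the unused encoding
      endcase
   end

   // Register: the only place where state is stored.
   always @(posedge clk or negedge resetn)
      if (!resetn)
         state <= IDLE;
      else
         state <= next_state;
endmodule
```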

A-sic, I really appreciate all of the help and comments.  I was initially interested in designing a cpu out of just transistors, but decided on an FPGA for practical reasons.  I know that the FPGA is not just a collection of interconnects and primitive logic but rather that it uses LUTs and DFFs.  Probably the reason I was writing the initial Verilog code the way I did (aside from the fact that I have only started learning the basic Verilog commands yesterday) was I was still in the mode where I wanted to know at a gate level how my code was being implemented.  I kind of see now this is not practical or even possible in a Verilog synthesized FPGA design.  It may take me some time to get used to typing "assign F = A * B" though...

I think I will improve my counter a bit.  I will make it 8 bits and have a parallel load capability along with tri-state output buffering (I think Verilog has some sort of Z thing for that).  This might be a good start toward a program counter  ;D

KTP
Title: Re: diy cpu
Post by: A-sic Enginerd on June 18, 2010, 05:58:17 pm
oooooo...be careful with tristatable buses. They have lots of gotchas.  ;)
Title: Re: diy cpu
Post by: A-sic Enginerd on June 18, 2010, 06:03:39 pm
Quote
Probably the reason I was writing the initial Verilog code the way I did (aside from the fact that I have only started learning the basic Verilog commands yesterday) was I was still in the mode where I wanted to know at a gate level how my code was being implemented.  I kind of see now this is not practical or even possible in a Verilog synthesized FPGA design.  It may take me some time to get used to typing "assign F = A * B" though...
KTP

Don't let it derail you. Like I said before, the steps you're taking are exactly the way one learns. Don't try to jump ahead too quickly. There are still times when you want to explicitly plunk down the nitty gritties. Only experience teaches you when it's safe to take shortcuts. Your multiplication is actually an even better example. Multipliers can be a bit more tricky to deal with.
Title: Re: diy cpu
Post by: KTP on June 18, 2010, 06:05:50 pm
Quote
oooooo...be careful with tristatable buses. They have lots of gotchyas.  ;)

Hmmm...how else would you interconnect the systems?  Have all the registers feed into some giant set of multiplexers that selects which register output gets routed to all of the inputs?
Title: Re: diy cpu
Post by: jahonen on June 18, 2010, 07:56:28 pm
The Altera Quartus Recommended HDL Coding Styles (http://www.altera.com/literature/hb/qts/qts_qii51007.pdf) document says (on page 46) that

Quote
When you target Altera devices, you should use tri-state signals only when they are attached to top-level bidirectional or output pins. Synthesis tools implement designs with internal tri-state signals correctly in Altera devices using multiplexer logic, but Altera does not recommend this coding practice.

Regards,
Janne
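
A small sketch of what that recommendation could look like in practice: the internal bus is a plain multiplexer, and a tri-state driver appears only on the top-level bidirectional pins. All names here (bus_sketch, reg_a to reg_d, drive_pins, data_pins) are hypothetical.

```verilog
// Hypothetical example: mux-based internal bus, tri-state only at the pins.
module bus_sketch (
   input  wire [1:0] sel,          // which source owns the internal bus
   input  wire [7:0] reg_a, reg_b, reg_c, reg_d,
   input  wire       drive_pins,   // 1 = drive the external pins
   inout  wire [7:0] data_pins,    // top-level bidirectional pins
   output wire [7:0] bus,          // internal bus, always driven
   output wire [7:0] pins_in       // value read back from the pins
);
   // Internal bus: a 4:1 multiplexer instead of internal tri-state drivers.
   assign bus = (sel == 2'b00) ? reg_a :
                (sel == 2'b01) ? reg_b :
                (sel == 2'b10) ? reg_c : reg_d;

   // Tri-state driver only where the signal leaves the chip.
   assign data_pins = drive_pins ? bus : 8'bz;
   assign pins_in   = data_pins;
endmodule
```

With the mux, exactly one source drives the bus at all times, so there is no possibility of two drivers fighting or of the bus floating.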
Title: Re: diy cpu
Post by: KTP on June 18, 2010, 08:39:43 pm
Ooooh!  Thanks Janne.

So I *do* want to use an array of multiplexers to implement my bus/register system.  I think I will make a test program with a simple system consisting of a 6 bit parallel load up counter, an 8 bit "display" register connected to LEDR[7:0] on my board, an 8 bit "input" connected to SW[7:0], and an 8 bit "memory" register.  I will clock these with a clock generated by pushbutton KEY[0] and control which register output has access to the bus with eight 4:1 multiplexers whose select lines are controlled by SW[9:8].  The upper 2 bits of the input switches (SW[7:6]) will control the data flow into the three registers and enable/disable the counter.  When finished I should be able to load the counter with a 6 bit number from user input, view the current count on leds, store the current count, load the counter from memory, load memory from user input, view the contents of memory on leds, and view the input switches on the leds.

Whew, that should keep me busy the rest of this afternoon.   :)
Title: Re: diy cpu
Post by: KTP on June 18, 2010, 11:47:14 pm
Yay! it worked...after I realized the DE2 board pushbutton switches are active low (I had assigned reset = ~KEY[0] at first thinking they were active high and everything was being held in reset and looked like it wasn't working).

I dropped the counter and register down to 6 bits to give me enough switches such that I could manually control the multiplexer which lets one device have the bus and the decoder which selects one device to listen to the bus.  I am able to press the clock pushbutton and cycle the counter, load the counter from the keys, load the memory from the keys, write the counter to memory, load counter from memory, etc.

My code looks messy but I am still learning....I will paste it here (sorry it is sort of long):

module bustest(LEDR, LEDG, SW, KEY);
   input [9:0]SW; // physical switches on DE1 board
   input [1:0]KEY; // physical pushbuttons on DE1 board
   output [9:0]LEDR; // physical leds on DE1 board
   output [1:0]LEDG;
   wire clock, reset;
   reg [5:0]bus;
   wire [1:0]buscontrol;
   wire [1:0]devsel;
   wire [5:0] MuxPC, MuxMem, MuxKeys, MuxDisp;
   reg [3:0]Load;
   wire CntEna;
   assign MuxKeys = SW[5:0]; //route user input switches to multiplexer input
   assign buscontrol = SW[9:8];
   assign devsel = SW[7:6];
   assign clock = ~KEY[0];
   assign reset = KEY[1];   
   assign CntEna = 1'b1;
   assign LEDR[5:0] = bus;
   assign LEDG[0] = clock;
   assign LEDG[1] = reset;
   assign LEDR[7:6] = devsel;
   assign LEDR[9:8] = buscontrol;
   // create counter and registers
   NUpCount PC(bus, clock, reset, MuxPC, Load[0], CntEna);
   NReg Mem(bus, clock, reset, MuxMem, Load[1]);
   NReg Display(bus, clock, reset, MuxDisp, Load[2]);
   
   always @(*) // combinational block: sensitive to every signal it reads
   begin
      // create multiplexers for bus control
      if(buscontrol == 2'b00)
         bus = MuxKeys;
      else if(buscontrol == 2'b01)
         bus = MuxPC;
      else if(buscontrol == 2'b10)
         bus = MuxMem;
      else
         bus = MuxDisp;
         
      // create decoder for selecting device to load from bus
      case (devsel)
         2'b00 : Load = 4'b0001; // counter connected to bus
         2'b01 : Load = 4'b0010; // memory register connected to bus
         2'b10 : Load = 4'b0100; // display register connected to bus
         2'b11 : Load = 4'b1000; // no device inputs connected to bus
      endcase
   end
   

endmodule

module NUpCount(D, Clock, Resetn, Q, Dload, Ena);
   parameter n = 6;
   input [n-1:0]D;
   input Clock, Resetn, Dload, Ena;
   output reg [n-1:0]Q;
   
   always @(negedge Resetn, posedge Clock)
      if(!Resetn)
         Q<=0;
      else if(Dload)
         Q<=D;
      else if(Ena)
         Q<=Q+1;
endmodule

module NReg(D, Clock, Resetn, Q, Dload);
   parameter n = 6;
   input [n-1:0]D;
   input Clock, Resetn, Dload;
   output reg [n-1:0]Q;
   
   always @(negedge Resetn, posedge Clock)
      if(!Resetn)
         Q<=0;
      else if(Dload)
         Q<=D;
endmodule
Title: Re: diy cpu
Post by: andersendr on June 22, 2010, 04:09:33 pm
Here is the class website for CE2930, Computer Architecture.  I just took this class and it has a few examples.  All of these examples are specifically setup for the MIPS32, since that is what we built in class.

http://myweb.msoe.edu/ce2930/index.html

If you just want to learn about VHDL, here are the 2 classes we had to take on it.
The first being about basic building blocks in VHDL
http://myweb.msoe.edu/ce1900/index.html

The second being about sequential logic in VHDL
http://myweb.msoe.edu/ce1910/index.html
Title: Re: diy cpu
Post by: KTP on June 22, 2010, 11:55:02 pm
Thanks.  I sort of stalled out after playing around with the bus structure.  I have read a lot of material now on simple cpu designs and could probably struggle forth and write out the code for a very simple 8 bit data bus 16 bit address bus cpu with a couple of registers and maybe 16 instructions, but a lot of it would be copying code fragments here and there from books and on the web.   I am trying to figure out a way to start with something a bit simpler that will allow me to progress toward this more advanced project.

While I think on that I am just working through the problems and examples in the Digital Logic book by Stephen Brown and Zvonko Vranesic.