EEVblog Electronics Community Forum

Electronics => Projects, Designs, and Technical Stuff => Topic started by: bitman on August 05, 2017, 02:23:40 am

Title: Home build microcode design
Post by: bitman on August 05, 2017, 02:23:40 am
Hello,
I'm in the last 80-90% of designing a TTL-based CPU, which is partly teaching me electrical engineering, and once done I hope to use it to teach basic computer principles.  But I'm now at a point where finding a solution to a particular challenge requires some input.

I'm preparing to set up the microcode in the EPROM. Currently I have 16 bits for each step in the microcode, meaning potentially 16 signals I can send to different components to enable/disable them. What I'm realizing is that's way too few - my current count is 24+ depending on how much ALU I want to do, and it may be higher if I add a couple more registers. I'm currently going through the design to see where I can consolidate and lower the number of signals I need to control, but in the end I don't think 16 will cut it. So my question is simple: Are there any tricks using latches etc. or do I need to increase the number of controls?

The latch idea bugs me because it will get in the way of enabling two registers at once (for instance). And I need a latch since I now need to set the stage over multiple clock cycles, vs. enabling all modules involved in one OP at once.  The number of chips needed will also go up dramatically using a "demux" + latches covering all 4 registers.

I've "robbed" a few antique book stores and got some books from the '70s on the design of busses and I'm still in the process of reading the relevant sections. I wanted to ask here what a typical design would look like. Is it a matter of adding more "bits" for the microcode to control, or do you make use of demux/latches?
Title: Re: Home build microcode design
Post by: danadak on August 05, 2017, 12:10:49 pm
If you add more bits to microcode this improves speed over use
of muxing/latches. Classic problem of parallelism vs serialization,
more HW vs less.


Regards, Dana.
Title: Re: Home build microcode design
Post by: rstofer on August 05, 2017, 04:24:20 pm
You just need to go wider.  It wasn't unusual for designs to have 128 bits of microcode width - or even more.
I would just skip trying to use decoders (4 lines from microcode into 1 of 16 outputs).  Besides, adding one more EPROM is a lot less wiring than a bunch of random logic all over the place.  It is so much easier to change microcode than to alter a bundle of wire-wrap.

Mick and Brick was considered the definitive work and is more often referenced by the authors' names than by the title "Bit-Slice Microprocessor Design"

https://www.alibris.com/Bit-Slice-Microprocessor-Design-John-Mick/book/724588

It is fairly specific to AMD 2900 series of logic elements but the ideas are universal.

Then there is Husson, better known as Microprogramming - Principles and Practices

https://www.alibris.com/Microprogramming-principles-and-practices-Samir-S-Husson/book/4350030

This book revolves around microprogramming the IBM 360

Microcode should be in RAM and loaded on boot.  That was the reason IBM invented the 8" floppy - they wanted to load microcode.

http://geekandsundry.com/the-history-of-the-floppy-disk/

Well, you can see how that worked out!  Now we have thumb drives.

I like simple.  I think I would try to get my system to boot microcode from a Compact Flash even if I had to add a uC alongside. An SD would be easier to interface (fewer pins) but the CF has a clean programming interface - it's just an IDE drive.  In any event, I would want a way to get from the Meta Assembler running on a PC into a device that could be read at boot time.  I would skip the EPROM experience altogether.

You can't expect your microcode to work right out of the gate (pun actually intended, I worked hard for it).

Then there is the possibility of 'feature creep'.  We can keep adding microcode and features until we run out of RAM.  An interesting read about bringing up a new machine:

https://www.alibris.com/booksearch?keyword=the+soul+of+a+new+machine

Title: Re: Home build microcode design
Post by: bd139 on August 05, 2017, 04:56:24 pm
This. You can just stack the ROM/EPROMs containing the microcode as wide as you want and get as many control signals as needed. This is so lazy it's almost worth it over discrete logic. Also you get consistent timing with EPROMs, unlike a rat's nest of gates. I designed one of them at university. Never got to finish the damn project in a semester though.
Title: Re: Home build microcode design
Post by: woodchips on August 05, 2017, 05:36:41 pm
I agree, width is what you want, 64 bits or so. It is far simpler to debug the presence or absence of a signal from the microstore than a decoded signal from a bucket of TTL. Also make the microcode in RAM, unless you have a parallel output i/f board for the computer which can read the computer microcode address and then output the correct pattern with a strobe, slow, but works fine.

If you have immediate/literal or whatever you call it now data to load a register then you need 16 bits for that straightaway.
Title: Re: Home build microcode design
Post by: ale500 on August 05, 2017, 07:08:47 pm
Many discrete processors had multiple ROM chips for the microcode. I just wonder why you need that today; I mean, why not use just one big E(E)PROM and several latches, and load the latches in sequence? You're not after the utmost speed, I'd guess. Just maybe less wire-wrap :)
Title: Re: Home build microcode design
Post by: bitman on August 06, 2017, 12:06:52 am
Many discrete processors had multiple ROM chips for the microcode. I just wonder why you need that today; I mean, why not use just one big E(E)PROM and several latches, and load the latches in sequence? You're not after the utmost speed, I'd guess. Just maybe less wire-wrap :)
In my case it's simple - it's 8bit PROMs so to get more width on the data side you need to "stack" them. More memory doesn't give me a wider bus.
Title: Re: Home build microcode design
Post by: bitman on August 06, 2017, 03:05:57 pm
Thanks - I'll widen the "bus" for microcode control. Not looking forward to having to burn/write to 4 different EPROMs making sure I mount them in the right sequence when done.
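
Keeping four byte-wide EPROMs in step is the kind of thing a small script can take care of. A minimal Python sketch, assuming a 32-bit control word with chip 0 holding the least significant byte; the function name and the file names are made up for illustration:

```python
def split_into_eprom_images(words, n_chips=4, fill=0xFF, size=None):
    """Split wide microcode words into one byte-wide image per EPROM.

    words   -- list of non-negative integers, one control word per address
    n_chips -- number of 8-bit EPROMs stacked side by side (assumed 4 here)
    fill    -- value for unused locations (0xFF is an erased EPROM cell)
    size    -- image size in bytes; defaults to len(words)
    """
    size = len(words) if size is None else size
    if len(words) > size:
        raise ValueError("microcode does not fit in the chip")
    images = [bytearray([fill] * size) for _ in range(n_chips)]
    for addr, word in enumerate(words):
        if word >> (8 * n_chips):
            raise ValueError(f"word at {addr} is wider than {8 * n_chips} bits")
        for chip in range(n_chips):
            # Chip 0 gets bits 7..0, chip 1 gets bits 15..8, and so on.
            images[chip][addr] = (word >> (8 * chip)) & 0xFF
    return images


if __name__ == "__main__":
    demo = [0x00000001, 0x80402010, 0xDEADBEEF]
    for chip, image in enumerate(split_into_eprom_images(demo, size=8)):
        # Hypothetical file naming; label the chips to match.
        print(f"ucode_chip{chip}.bin:", image.hex())
```

Each returned bytearray can be written out as the binary file for one chip, so the only manual step left is putting the right chip in the right socket.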
Title: Re: Home build microcode design
Post by: rstofer on August 06, 2017, 04:30:51 pm
Be sure to build a RAM based ROM simulator such that you can develop your microcode without actually burning EPROMs.  A few 24 pin flat cables with DIP connectors will couple the boards.

I would probably plan for at least N+1 EPROMs

If you look through Mick and Brick (page 308) you will see an excellent example of microcode written for a meta-assembler.  I haven't tried it but I wonder if it would be worth a couple of days playing around with Python to create such a thing.

Or maybe just hit up Google and get something like:
https://sourceforge.net/projects/metalasm/
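
To give a feel for how little Python a first cut might need, here is a rough sketch of the idea. It is not the Mick and Brick meta-assembler syntax; the signal names and bit positions are invented for illustration:

```python
# Hypothetical control signals and their bit positions in the microcode word.
SIGNALS = {
    "PC_OUT": 0, "PC_INC": 1, "MAR_IN": 2, "MEM_OUT": 3,
    "IR_IN": 4, "A_IN": 5, "A_OUT": 6, "B_IN": 7, "B_OUT": 8,
}


def assemble(source):
    """Turn microcode source text into a list of control words.

    One microinstruction per line: signal names separated by spaces.
    Text after ';' is a comment; blank lines are skipped.
    """
    words = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        line = line.split(";", 1)[0].strip()
        if not line:
            continue
        word = 0
        for name in line.split():
            if name not in SIGNALS:
                raise ValueError(f"line {lineno}: unknown signal {name!r}")
            word |= 1 << SIGNALS[name]
        words.append(word)
    return words


FETCH = """
PC_OUT MAR_IN        ; address of next instruction onto the address bus
MEM_OUT IR_IN PC_INC ; latch the opcode, bump the PC
"""

if __name__ == "__main__":
    for step, word in enumerate(assemble(FETCH)):
        print(f"step {step}: {word:016b}")
```

Adding a control line then means adding one entry to the table rather than recomputing hex values by hand.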

Title: Re: Home build microcode design
Post by: rstofer on August 06, 2017, 05:17:57 pm
I assume you are going to register the EPROM outputs.  This will result in cleaner signal timing but it also allows for overlapping microcode fetch.  See page 14 of Mick and Brick.

I REALLY like this book.  I took a class in designing microcoded systems using the AMD 2900 series components MANY years ago ('83?) and this was the text.  We had to reinvent the 8086, lay out the hardware and write the microcode.  It was a lot of fun!
Title: Re: Home build microcode design
Post by: danadak on August 06, 2017, 11:26:29 pm
At a later time, if still of interest, the Cypress PSoC architecture has UDBs (Universal Digital Blocks)
which contain a simple structure that can be microcoded via a GUI editor. Lots of fun, and it allows
access to/from the rest of the chip, rich in analog, DSP, COMM....


https://duckduckgo.com/?q=udb+editor&atb=v58-5_a&iax=1&ia=videos


Regards, Dana.
Title: Re: Home build microcode design
Post by: bitman on August 08, 2017, 02:32:42 pm
Be sure to build a RAM based ROM simulator such that you can develop your microcode without actually burning EPROMs.  A few 24 pin flat cables with DIP connectors will couple the boards.

So I have dip-switches for both address and data bus, mostly for testing. I really did/do not want to be using those to change code on a permanent basis. I have none for micro-code at all, instead I have a small piece of code I use to generate the dump/data for the eprom. The hard work is taking the proms in and out to be written as things get completed. However, having 32dips plus logic to view/browse/enter data into 8 sub-positions per OP is well, daunting and I think that will take a lot longer than moving the EPROMs to the writer.

MY PLAN was to figure out how to develop a simple serial interface so I could load them from a USB stick or something like that, but for now that's far too advanced for my knowledge level so I'll have to do it the stupid way for now.

Note - the board has very little memory (32Kx8). If I put the microcode for all 256 potential OP codes in RAM, I would have very little space left for REAL stuff. Of course I could build separate RAM module(s) for just microcode but at that point I don't get it anymore because any change I do there would have to be persisted on the EPROM anyway. I'd much rather do that in a structured way from a PC than mess with dip-switches all over the place.

Again, I really want to thank you and everyone else for GREAT input - I've looked at the books, and once I get through a few others I bought for this purpose ('70s publication dates too) I think these will be next :)
Title: Re: Home build microcode design
Post by: Kalvin on August 08, 2017, 02:35:42 pm
You can use a logic simulator first to try out different ideas. Much easier to edit a few files instead of modifying hardware.
Title: Re: Home build microcode design
Post by: rstofer on August 08, 2017, 03:49:18 pm
I wasn't thinking about DIP switches, I was talking about a way to get from your EPROM socket to some other memory board that had RAM and, perhaps, a uC to load the RAM.

I couldn't find premade cables but here is the connector.  Add some 24 or 28 strand ribbon cable and you are good to go.

http://www.jameco.com/z/8200-24-R-Connector-IDC-DIP-Plug-24-Pin-0-1-Amp-Use-With-Part-37760_42691.html

http://www.jameco.com/z/8200-28-IDC-Connector-28-Position-2-54mm-IDT-DIP-Plug-Cable-Mount_99670.html




Title: Re: Home build microcode design
Post by: tggzzz on August 08, 2017, 04:12:24 pm
I should still have a copy of Mick and Brick, plus the 2900 databooks.

If the OP wants a challenge, and to keep the EPROM narrow, and to minimise the logic count, he could consider making a serial processor. The tradeoff is lost speed, but that might be a good tradeoff.
Title: Re: Home build microcode design
Post by: Kalvin on August 08, 2017, 04:20:40 pm
If the OP wants a challenge, and to keep the EPROM narrow, and to minimise the logic count, he could consider making a serial processor. The tradeoff is lost speed, but that might be a good tradeoff.

Indeed, googling with magic words like
- bit-serial arithmetic
- bit-serial architecture
will return good pointers.
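
As a taste of what those searches turn up, here is a rough Python model of a bit-serial adder: one full adder plus a carry flip-flop, fed one bit per clock starting from the LSB. The function name and the 8-bit default are just choices for this sketch:

```python
def serial_add(a, b, width=8):
    """Add two unsigned integers one bit per clock, LSB first.

    Models a 1-bit full adder plus a carry flip-flop.
    Returns (sum modulo 2**width, carry_out).
    """
    carry = 0
    result = 0
    for i in range(width):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        total = bit_a + bit_b + carry      # the 1-bit full adder
        result |= (total & 1) << i         # sum bit shifts into the result
        carry = total >> 1                 # carry is held for the next clock
    return result, carry
```

The datapath is one bit wide no matter the word size, which is where the saving in logic comes from; the price is `width` clocks per addition.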
Title: Re: Home build microcode design
Post by: rstofer on August 08, 2017, 09:37:17 pm
Maybe flash memory is a little easier to use

http://ww1.microchip.com/downloads/en/DeviceDoc/20005022C.pdf

It could probably be programmed in-circuit as long as the rest of the CPU didn't go nuts (hold in reset?).

A clip like this might work.  It is for an SOIC package, not a DIP.  I don't know if it will fit.

https://www.digikey.ca/product-detail/en/pomona-electronics/6107/501-1723-ND/737553

On PCBs, I often see programming headers as just pads with vias and they are slightly misaligned on every other pin to put translational force onto the pin header that is stuck into the vias.  If you are using PCBs, you can even put pin headers along side each device.  Or, you can put one header on the end of the address bus and then just data headers along side each device.

Yes, you still need some way to generate the address, data and control signals to program the device but that should be fairly straightforward with some kind of counter to drive the address and 8 data lines.

Those flash devices are only a couple of bucks each.

https://www.digikey.ca/products/en/integrated-circuits-ics/memory/774?k=sst39sf010a

Just the fact that the PDIP is 0.300" is an improvement!  If they can be programmed in-circuit, things get a lot better.
Title: Re: Home build microcode design
Post by: bd139 on August 08, 2017, 09:47:38 pm
EEPROM FTW

https://www.youtube.com/watch?v=K88pgWhEb1M

Purely by coincidence used for microcode storage.
Title: Re: Home build microcode design
Post by: BrianHG on August 08, 2017, 10:10:31 pm
128k x 32-bit EPROM chips used to be made as a standard part.  They would make breadboarding easy
if you really want to save the extra time of wiring two 256k x 16 EPROMs or four 8-bit EPROMs.
You might be able to find these in old surplus electronics shops or on eBay.
Title: Re: Home build microcode design
Post by: bitman on August 09, 2017, 01:23:24 am
EEPROM FTW

Purely by coincidence used for microcode storage.

Right, I had something similar built, but it wasn't reliable enough. Even worse, I have a couple of different EPROMs I've used in the project, and I didn't like having multiple different setups and programs created one for each chip. So I got a TL866A a while back. My software simply spits out a binary file of the size of the chip, and the minipro does the rest. It also reduces the number of times I have to insert and remove the chip from the breadboard.  Since I can only write ONE at a time, I would need additional circuits to indicate which set of data to use too. It just seemed a very fragile idea - in particular because I know I'll have to redo the microcode a lot as I learn from doing.

I probably should mention that Ben Eater (from the youtube you linked) was one of the people online that inspired me to do this project.  While he got me started with his design, I quickly diverted quite a bit from what he was/is doing.
Title: Re: Home build microcode design
Post by: bitman on August 09, 2017, 01:33:18 am
Thanks guys. I'll definitely read up on the bit-serial architecture etc. but at this point that would require a full re-design of a complete model I've already put together. So that would have to wait for another project.

I should probably mention, that this is ALL built on breadboards. Replacing chips etc. isn't an issue - making it look neat is :)  I've basically got everything designed mandatory for a run, but I need to reduce some of the IO - there are some that are exclusive and could be reduced with a few logic gates, which should reduce the complexity in the microcode. Then I need a "POR" (power on reset) and find a way to have the rest of the PC (Program Counter) set to a non-zero value on reset (MSB must be set) - but that's about the majority of the challenges I have left except for the microcode.

Speed is not really a concern. This is primarily for MY education and once done I hope to use it to teach from too. Currently the clock I've put together maxes out at around 300kHz and I doubt I'll EVER run at that speed. As a matter of fact, I have the setup so I can do manual clock-pulses instead of automatic, to illustrate how things are working. This means I have a LOT of LEDs to illustrate state of different areas visually. So from that angle, this is not meant as something high performing - but it has to work.
Title: Re: Home build microcode design
Post by: edavid on August 09, 2017, 01:54:15 am
I'm preparing to set up the microcode in the EPROM. Currently I have 16 bits for each step in the microcode, meaning potentially 16 signals I can send to different components to enable/disable them. What I'm realizing is that's way too few - my current count is 24+ depending on how much ALU I want to do, and it may be higher if I add a couple more registers. I'm currently going through the design to see where I can consolidate and lower the number of signals I need to control, but in the end I don't think 16 will cut it. So my question is simple: Are there any tricks using latches etc. or do I need to increase the number of controls?

The trick is called vertical microcode.  There are a lot of variations.  For example, you might have a 2 bit field in your microcode word that enables 4 different sets of control bit latches.  That lets you control 4 x 14 = 56 control bits with a 16 bit microcode word.   The tradeoff is that you might take up to 4 cycles to execute one microinstruction, but that doesn't seem like a problem in your case.
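
A rough Python model of that example, in case it helps to see it move. The bit layout is an assumption for the sketch (bits 15..14 pick the bank, bits 13..0 are the data for that bank), and the class name is made up:

```python
class VerticalControlStore:
    """Four 14-bit latch banks loaded from a 16-bit microcode word.

    Assumed layout (not from the thread): bits 15..14 select the bank,
    bits 13..0 are the data latched into that bank.
    """
    BANKS = 4
    FIELD_BITS = 14

    def __init__(self):
        self.latches = [0] * self.BANKS

    def clock(self, word):
        """One microcode cycle: load exactly one bank, hold the others."""
        bank = (word >> self.FIELD_BITS) & 0b11
        self.latches[bank] = word & ((1 << self.FIELD_BITS) - 1)

    def control_lines(self):
        """Return all 4 x 14 = 56 latched control bits as one integer."""
        value = 0
        for bank, bits in enumerate(self.latches):
            value |= bits << (bank * self.FIELD_BITS)
        return value
```

Only one bank changes per clock, so a microinstruction that touches all four banks costs four cycles; signals that usually change together are best kept in the same bank.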

Title: Re: Home build microcode design
Post by: C on August 09, 2017, 05:24:04 am

A PDP-11/34 was a microcoded 16-bit computer built around the 74181 & 74182.
It had no clock; it used a delay line for timing & some bits in microcode selected which delay tap to use to get to the next step in microcode. This allowed some microcode steps to take a short time while allowing more time for paths that used the 74181 or other slower logic. For a CPU memory access, a signal from memory created the next microcode step pulse. The CPU memory cycle was async, with the CPU stopped waiting for memory. The only problem this creates is that with no memory at the address, there is no signal to start the CPU again. This was handled by a simple monostable that would create the needed pulse after a time and also generate a vectored interrupt for no memory. This was not as complicated as it sounds. It was just a microcode jump to a different microcode address when no memory existed.

While the PDP-11 only had 8 registers visible, more registers were used inside the CPU to make microcode easier and logic simpler. For example one of the hidden registers was the actual address lines that appeared on the Unibus. This added two more bits to microcode, but allowed the same microcode used to do CPU instruction move r1,r2 to be used for Move PC(R7) , (hidden)address register.
In part due to hidden registers, the 8 visible registers were all the same. The microcode is what made R7 the PC register & R6 the SP.


Above I said 74181 which is hard to find these days. But if you think about it, you could create one with a rom with proper contents.
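
To illustrate the ROM idea, here is a rough Python sketch that builds the contents for a 4-bit ALU slice. It does not reproduce the 74181 function table; the four operations, their codes, and the address layout are all assumptions made for the example:

```python
OPS = {"ADD": 0, "SUB": 1, "AND": 2, "OR": 3}  # hypothetical encoding


def build_alu_rom():
    """Return a 2048-entry table that behaves like a 4-bit ALU slice.

    Address bits: [10:9] op, [8] carry in, [7:4] A, [3:0] B.
    Data bits:    [4] carry out, [3:0] result.
    """
    rom = [0] * 2048
    for op in range(4):
        for cin in range(2):
            for a in range(16):
                for b in range(16):
                    if op == OPS["ADD"]:
                        total = a + b + cin
                    elif op == OPS["SUB"]:
                        # A minus B as A + NOT(B) + carry in (carry in = 1
                        # for a plain subtract, two's complement style).
                        total = a + (~b & 0xF) + cin
                    elif op == OPS["AND"]:
                        total = a & b
                    else:
                        total = a | b
                    addr = (op << 9) | (cin << 8) | (a << 4) | b
                    rom[addr] = total & 0x1F
    return rom


def alu_lookup(rom, op, a, b, cin=0):
    """Read the ROM the way the hardware would: form the address, split the data."""
    data = rom[(OPS[op] << 9) | (cin << 8) | (a << 4) | b]
    return data & 0xF, data >> 4   # (result, carry out)
```

Two such slices, with the carry out of the low nibble wired to the carry in of the high nibble, would cover an 8-bit data bus.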

"POR" (power on reset)
I think you are thinking logic with connected microcode, big problem. Try thinking microcode with connected logic. Then POR is just starting microcode at address 0 in microcode & the microcode steps do what is needed for POR before getting to the main part of microcode.

The later PDP-11/45 added some RAM to microcode space & some instructions to load the microcode RAM from CPU address space. You could then build special instructions using a normal program.

Note that the PDP-11 used 4 74181's for 16-bits. The instruction set allows you to work with larger integers by using arrays in memory.
For the IBM 360, the cheaper 360s used fewer bits in the adder (74181 for PDP) and more microcode, such that less hardware in the CPU still looked like an IBM 360. Pay more and get more 74181's and a faster CPU.



Title: Re: Home build microcode design
Post by: tggzzz on August 09, 2017, 06:45:13 am
I should probably mention, that this is ALL built on breadboards. Replacing chips etc. isn't an issue - making it look neat is :)

A significant part of your education will be how appallingly unreliable solderless breadboards are; you will spend more time debugging the breadboard than your design. Problems are excess inductance and intermittent contacts.

Soldered bread boards are much better, but practical knowledge is needed  to avoid pain points.

Have fun, and keep learning :)
Title: Re: Home build microcode design
Post by: bitman on August 10, 2017, 02:41:10 am

A PDP-11/34 was a microcoded 16-bit computer built around the 74181 & 74182.
I actually have two 74181's (not the 182) as the ALU. I wanted more than just add/subtract and I found a few old/used ones on Ebay that I'm using. Just for 8 bit data bus though (each is 4 bit). I don't really need anything bigger on that bus for this project.

As for the microcode, that was my initial question here on how wide to make it, as 16 bits was too few for the design I had, not even counting the 5 bits needed for the 74181. Adding demux and buffers will add a lot of complexity that I'm trying to avoid (the teaching part). I really would like to end up with a simple line from the microcode bit to the "feature" like enabling a register to load. It will be a lot easier to explain that way.  When I initially began, 16 bit seemed like way too much but well, it's not turning out so well. My PC register currently has 4 lines, which I think I can reduce to 3. Most of my registers have 3 states (enabled/no connection, read, write). I originally wanted 5 registers, plus PC, memory, instruction register ... things accumulated way too fast to be under 16. At least 25 right now, and that's without all 5 registers that I was planning on making.

So right now I'm trying to reduce the lines, like to a register - I think I can have just 2, one enable and one for READ (low), WRITE (high) but I'll need some more logic gates to make that happen and well breadboards are HARD to get everything to fit on :)  So I'm struggling to come up with ways to reduce the number of lines I need to go to the microcode but at the same time I realized that there's NO way I can get away with just 16, unless I do as you explained with line selection etc.  The problem with that is I cannot have TWO states open at the same time. For instance, if I use 2 bits to binary represent 4 registers' enable lines, I can ONLY enable one at a time. Then I need to latch it, I need more steps in the microcode to make it work and it won't be as clear what's going on.  Right now I can simply enable both registers and transfer values between them. Or enable output from the ALU and enable a register to read in one pulse. So I really like the idea of a parallel bus for the microcode as I think it shows better. The fact it's fewer steps isn't important. I've made it so I can have up to 8 (3 bits) steps per OP which from what I can see is plenty for my need. Funny side bar on that - I have really screwed up making an "end instruction" set - and last night as I was testing I had a big DUH moment when I realized I had mistaken the instruction register output for the micro-code instruction output. *sigh* that will be a rather large re-wiring job, but I'm happy I found it now and not later.

And that's why I use bread-boards. It's not my first "duh" moment when I have had to totally scratch or take many steps backwards to make the step forward work.  And that's really my purpose with this - realize how things like pull up/down have an impact, how to read the data-sheets correctly so things work and how to diagnose shorts through ICs. Even though I'm not done, I've picked up a lot of things that I'm now first reading about - and I've learned the hard way what is clearly documented in old TTL books.
Title: Re: Home build microcode design
Post by: tggzzz on August 10, 2017, 07:01:14 am
And that's why I use bread-boards. It's not my first "duh" moment when I have had to totally scratch or take many steps backwards to make the step forward work.  And that's really my purpose with this - realize how things like pull up/down have an impact, how to read the data-sheets correctly so things work and how to diagnose shorts through ICs. Even though I'm not done, I've picked up a lot of things that I'm now first reading about - and I've learned the hard way what is clearly documented in old TTL books.

Ah, been there, done that :)

You'll learn a lot, including how to work out what you don't know and how to think of quick tests to see if your guesses are/aren't right.

You may know this, but don't forget ceramic decoupling capacitors on short leads for each IC, and the "ground bounce" when many outputs switch simultaneously.

Happy hacking!
Title: Re: Home build microcode design
Post by: C on August 10, 2017, 07:59:52 am
First how you look at things can make a difference.

Z80
  Most times you see the Z80 instruction byte value listed in HEX. Take a look at it in octal.  For 8-bits an FFH becomes 377. For most part the instruction set is very logical. Note that Z80 uses some byte values as pre instruction selectors to get a combination of 8-bit instructions and 16-bit instructions.

PDP-11
  To see the logic of this instruction set you also need to use octal. DEC created a PDP-11 programming card that has the full instruction set on it.
http://www.montagar.com/~patj/dec/pocket/pdp11_programmingcard_1975.pdf
Note how logical the instruction set is if you read the bits from left(MSB) to right. This logical design also helps in micro code and in hardware.
As I said all visible registers( R0-R7) are the same, it's microcode that puts special names on them.

74182
  Not a big deal you have none. You can do it in logic if needed.

What I read in your last post sounds like you are making it harder than needed.

If you do not care about speed, you can get simple.
Start with a normal fast ram chip as store of your registers. You can get rid of a lot of chips doing this at a cost of more microcode steps.
A register to register move would become Move ram register to temp register, Move temp register to ram register. Simple but more time and microcode steps.

Now add your 181's
To start, add two 8-bit latches for 181 data inputs and one 8-bit latch for 181 output. Later you should see that you can remove one or more of these latches, but this is a simple start.
Now you need microcode to move a register to 181-A latch & microcode to move a register to 181-B latch.
Microcode to setup 181's 5 control bits.
Microcode to latch 181 output in 181-output latch.
Microcode to move 181-output latch to register.
This can be used for a lot of instructions.
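
The sequence above can be modelled in a few lines of Python. This is only a sketch of the idea: the step names, the latch names and the use of a Python function in place of the 181's five control bits are all made up for illustration:

```python
def run_microprogram(regs, program):
    """Execute micro-steps against a tiny datapath model.

    regs    -- list standing in for the register RAM
    program -- list of (operation, argument) micro-steps
    The latch names and step names are illustrative, not from any real design.
    """
    latch = {"A": 0, "B": 0, "OUT": 0}
    alu_fn = None
    for op, arg in program:
        if op == "LOAD_A":        # register RAM -> ALU input latch A
            latch["A"] = regs[arg]
        elif op == "LOAD_B":      # register RAM -> ALU input latch B
            latch["B"] = regs[arg]
        elif op == "SET_FN":      # drive the ALU function-select lines
            alu_fn = arg
        elif op == "LATCH_OUT":   # capture the ALU result (8 bits wide)
            latch["OUT"] = alu_fn(latch["A"], latch["B"]) & 0xFF
        elif op == "STORE":       # output latch -> register RAM
            regs[arg] = latch["OUT"]
        else:
            raise ValueError(f"unknown micro-step {op!r}")
    return regs


ADD_R1_R2 = [           # R1 <- R1 + R2 in five micro-steps
    ("LOAD_A", 1),
    ("LOAD_B", 2),
    ("SET_FN", lambda a, b: a + b),
    ("LATCH_OUT", None),
    ("STORE", 1),
]
```

Swapping the function in the SET_FN step gives subtract, AND, OR and so on with the same five steps, which is why one sequence covers a lot of instructions.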

The microcode is acting like a C function.
You can have a C IF or CASE statement as microcode.
All this says that you have variable length microcode sections

Your microcode storage is cheap these days, bigger is better.
You need a microcode PC to know what microcode step you're at.
Don't use a counter! Use chips like the 273. Power up clears the 273.
With the output of 273 going to microcode store you select a row of microcode.
By having two fields in microcode for next microcode address, you could use a 1 of 2 selector for what appears at 273's inputs for next clock. You now have an IF in microcode.
By using a larger 1 of x you can get a case statement or more choices. One of the inputs could be a field in the CPU's instruction.
Look at PDP-11's double operand instructions, bits 15-12.
Bit 15 is word or byte, an IF at some point in microcode.
You have 3 bits you can use to jump to a different spot in microcode in the process of instruction decode.
Microcode can work from left to right processing each field in the instruction.
All the different PDP-11 register modes are just a step through the 181 or other microcode steps.
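
Here is a rough Python model of that kind of sequencer: an address register that starts at 0, a control store where each row carries two next-address fields, and a 1-of-2 selector driven by a condition input. The store layout, the "byte_mode" condition name and the use of None as an end marker are assumptions for the sketch:

```python
def run_sequencer(store, condition_inputs, max_steps=20):
    """Step a microcode address register through a control store.

    store -- dict: address -> (cond, next_if_false, next_if_true)
             cond is None for an unconditional step (next_if_false is used)
    condition_inputs -- dict: condition name -> bool
    Returns the list of addresses visited, starting at 0 (the cleared 273).
    """
    addr = 0
    trace = [addr]
    for _ in range(max_steps):
        cond, next_false, next_true = store[addr]
        # The 1-of-2 selector: a condition bit picks which field feeds the 273.
        take_true = cond is not None and condition_inputs[cond]
        addr = next_true if take_true else next_false
        if addr is None:  # hypothetical end marker for this sketch
            break
        trace.append(addr)
    return trace


STORE = {
    0: (None, 1, 1),
    1: ("byte_mode", 2, 5),   # IF on instruction bit 15 (word vs byte)
    2: (None, None, None),    # word path ends
    5: (None, None, None),    # byte path ends
}
```

Because the next address comes from the microcode row itself, rows do not have to be consecutive, and widening the selector turns the IF into a CASE.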

With how cheap microcode store is these days, there is no good reason to have every row of microcode filled.
In place of added logic, you can store logic results in microcode.
Nothing says that all inputs to the 273's need to always come from a field in microcode.
Nothing says that all addresses to microcode store chips have to be the same.

So far you have ram chip for registers and 3  temp latches.
For a 16-bit CPU address bus you add 2 more temp 8-bit latches.
For 8-bit data bus add 2 more 8-bit latches, in & out.
One or more 8-bit latches for Instruction decode.
One for flags register.
Nothing says that you could not use more space in the RAM chip to store temp results. You might need this for Multiply, Divide, floating point, etc.

With a larger microcode store you can do a lot more with little added logic.
Start with your store being 64-bits wide. Go even wider if it can remove added logic.
As for rows, with 2 273's you could have 65,536.
You want a bunch of cheap & fast chips for your microcode store.
You could use rom or ram.
Some or all RAM would let you have an instruction that loads microcode store from CPU memory or some other place. All RAM is just using a little ROM to do the initial load.
Remember that with huge possible microcode store you can strap some address inputs to make it smaller.
The extra address lines can be used to select different microcode rows as next row address.

You said two 181's, You can build any size CPU you want, you just need to microcode process 8-bits at a time.
Where the PDP-11 uses 1 bit in instruction to select byte or word(16-bit), one more bit lets you have four sizes. Or you can do like Z80 where only some instructions can be a different size and put size field in microcode.

For teaching, think very wide but simple. From memory the PDP-11 microcode made things harder by using some bits in microcode for many purposes. Simple logic is easier to understand.

I should also mention that for the PDP-11 there was a board that you could plug in that would allow testing/checking the logic with just a volt meter. It was more than just single step of microcode.
Also note on programming card "Processor Register Addresses"

Hope this might help some.
Title: Re: Home build microcode design
Post by: bitman on August 10, 2017, 05:14:03 pm
Thanks - this is a lot - I'll pick a few things out right now and address those. I'm woefully behind on creating diagrams of my current design, but since it changes every time I look at it, it feels like "why bother" every time. Diagramming takes me a long time since I suck at it (it wasn't until very recently I realized I could use labels to make it look much nicer). Anyway, I've asked quite a few questions about this project on the Beginners forum, and I explained there that I'm very hesitant about posting exactly what I'm doing because I want to bump into issues as it's how I learn.

That said ...

First how you look at things can make a difference.

Z80
  Most times you see the Z80 instruction byte value listed in HEX. Take a look at it in octal.  For 8-bits an FFH becomes 377. For most part the instruction set is very logical. Note that Z80 uses some byte values as pre instruction selectors to get a combination of 8-bit instructions and 16-bit instructions.
I did a LOT of Z80 in the '80s - it's very rusty. If you think it will be helpful I'll find some datasheets and study it.

Quote
PDP-11
  To see the logic of this instruction set you also need to use octal. DEC created a PDP-11 programming card that has the full instruction set on it.
http://www.montagar.com/~patj/dec/pocket/pdp11_programmingcard_1975.pdf
I never had a chance to use an actual PDP-11 so I'm not really aware of its basic CPU design etc. Von Neumann was pretty much the type of CPU I studied and knew how to design on paper - and it's pretty much the design I'm going with for my own home build.

I have two busses - 16 bit address, 8 bit data. I am aiming for 5 registers, right now I just have 2 using 273's and a 245 as buffer. The buffer is REALLY important, as it allows me to put a LED array right on the input to the buffer to show the value in the register. Nice and easy :)  I use 555s for the clock and a few logic ICs for control. I have a PC which is basic counters, the 181 for ALU, a data<->address bus transfer buffer, an address register, an instruction register and YES I do use a counter for the microcode offset - more about that later.

My plan was not to use a flag register. But instead simply hook the Z and C flags (the only flags I have) directly to two bits on the microcode address line. So a JMPZ would be at one address, and JMP would be a binary exponential offset from that - no fancy logic needed. Now, I know that reduces the number of OPs I can create but I don't think 64 OP codes is a limit for me (or it's one I can live with).  If that's a bad idea I need to understand why - and right now I find out when I apply it to reality and it sometimes makes me go "hmmmm - that was a dumb idea" :)
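
To make the address arithmetic concrete, here is a minimal Python sketch of that scheme. The bit layout (6 opcode bits, the Z and C flags, a 3-bit step counter, 11 bits in total to match a 2Kx8 EPROM) and all the names are assumptions for illustration only, not a finished design.

```python
# Hypothetical layout of an 11-bit microcode address (2K rows):
#   bits 10..5  opcode (64 possible OPs)
#   bit  4      Z flag
#   bit  3      C flag
#   bits 2..0   step counter (8 steps per OP)
OPCODE_BITS = 6
FLAG_BITS = 2
STEP_BITS = 3


def microcode_address(opcode, z, c, step):
    """Combine opcode, flags and step counter into one EPROM address."""
    assert 0 <= opcode < (1 << OPCODE_BITS)
    assert z in (0, 1) and c in (0, 1)
    assert 0 <= step < (1 << STEP_BITS)
    flags = (z << 1) | c
    return (opcode << (FLAG_BITS + STEP_BITS)) | (flags << STEP_BITS) | step


def rows_for_opcode(opcode):
    """Every row that has to be programmed for one OP:
    all four flag combinations times all eight steps."""
    return [microcode_address(opcode, z, c, step)
            for z in (0, 1)
            for c in (0, 1)
            for step in range(1 << STEP_BITS)]
```

With a layout like this, an OP that ignores the flags needs the same control words written into all four flag variants, while a JMPZ would only assert its (hypothetical) "load PC" signal in the rows where Z is 1. The cost is visible in rows_for_opcode: each OP occupies 32 rows instead of 8.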

I use a simple 2Kx8 EPROM for microcode, I have a RAM chip + an EPROM on the address line. I use MSB on the address line to select between the EPROM and the RAM (the RAM only has 15 address bits).

Quote
74182
  Not a big deal you have none. You can do it in logic if needed.

I understood that the 182 is just for carrybit optimization - something I'm not really aiming at. Are there other functions I need to be aware of?

Quote
What I read in your last post sounds like you are making it harder than needed.

If you do not care about speed, you can get simple.
Start with a normal fast ram chip as store of your registers. You can get rid of a lot of chips doing this at a cost of more microcode steps.
A register to register move would become Move ram register to temp register, Move temp register to ram register. Simple but more time and microcode steps.
To be honest, reading this seems to be more complex than what I have?  Every register is connected to the data bus through the buffer. I READ from the data line without using a buffer, but I control the read clock through an AND gate with the clock and my "enable" signal. I share the AND gate between the registers. Now that I want to reduce the number of lines, I am rethinking that but it's a very basic design for now. So transferring data is a matter of enabling output on one, and WRITE on another.  Only a few registers are connected to the ALU directly.

Quote
Now add your 181's
To start add two 8 bit latches for 181 data inputs and one 8-bit latch for 181 output. Later you should see that you can remove one or more of these latches, but this is simple start. 
Now you need microcode to move a register to 181-A latch & microcode to move a register to 181-B latch.
Microcode to setup 181's 5 control bits.
Microcode to latch 181 output in 181-output latch.
Microcode to move 181-output latch to register.
This can be a used for a lot of instructions.
Thanks! This I need to revisit to fully grasp. I was NOT planning on using registers - maybe a set of logic gates to reduce the number of bits to 2 or 3 (I only need shift, add/sub, compare for now). I'm not sure if I need to enable AND, OR, NOT etc. but I may. Again, I may need to compromise as I reduce the number of signals.

Quote
The microcode is acting like a C function.
You can have a C IF or CASE statement as microcode.
All this says that you have variable length microcode sections
So how would you implement this? I already explained how I was going to implement the conditional jump statement. Nothing in the microcode will understand what "JUMP" means, let alone a condition? I'm still reading about this, so if there's a good source to understand how you would wire microcode to do these kinds of conditions I would very much like to learn.

Quote
Your microcode storage is cheap these days, bigger is better.
You need a microcode PC to know what microcode step you're at.
Don't use a counter! Use a chip like a 273. Power up clears the 273.

Why?? That's pretty much what I have. A 4 bit counter where I reset when the 4th bit is set (so it's a 3 bit counter). Seems to work just as I want it to? My plan is to add the RESET of the counter to a microcode bit so I can terminate a set of microcode steps if I don't use all 8 positions.  That seemed to be an easy design. What I'm struggling more with is how to advance the PC every time (seems like every OP will have one or two steps that are EXACTLY the same, so there should be a way to implement this without using the EPROM for every OP).  Note, the counter I use has a built in register too - but it's only used to LOAD a particular number into the counter, so I don't use it for anything.

Quote
So far you have ram chip for registers and 3  temp latches.
For a 16-bit CPU address bus you add 2 more temp 8-bit latches.
For 8-bit data bus add 2 more 8-bit latches, in & out.
One or more 8-bit latches for Instruction decode.
One for flags register.
Nothing says that you could not use more space in ram chip to store temp results. You might need this for Multiply, Divide, floating point, etc.
If I have to add a bunch of other stuff to those latches, all I really did was build two busses instead of 1? What's the advantage here - since I'll have to latch/buffer between each register for transfer/load regardless?

Quote
With a larger microcode store you can do a lot more with little added logic.
Start with your store being 64-bits wide. Go even wider if it can remove added logic.
I'll stick to 4 chips for now (32bit) :) All the chips I've found with 16 bit data out are all serial instead of parallel and I cannot see how I can use that without a microcontroller.
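
For what it's worth, burning four 8-bit chips from one 32-bit control word is just a byte split. A minimal Python sketch, assuming a hypothetical list of 32-bit control words indexed by microcode address, and assuming chip 0 carries the least significant byte (the chip order is arbitrary and only has to match the wiring):

```python
def split_control_words(words, chips=4):
    """Turn a list of wide control words into one byte image per EPROM.

    Chip 0 holds bits 7..0 of every word, chip 1 holds bits 15..8, and so on.
    All chips share the same address lines, so image[n][addr] is the slice of
    words[addr] that chip n drives.
    """
    images = [bytearray() for _ in range(chips)]
    for word in words:
        for chip in range(chips):
            images[chip].append((word >> (8 * chip)) & 0xFF)
    return [bytes(image) for image in images]
```

Each returned image can then be written to a file and burned into its own chip; widening the microcode later only means raising chips.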

Quote
As for rows, with 2 273's you could have 65.536.

This is where I goofed up earlier this week. You're talking about the number of OP codes not microcode?  The OP code is a simple 8 bit value which I use as an address line into the microcode storage. I add a few more bits for offset but that's about it. I have four 8 bit EPROMs giving me 32 outputs for the same address.  The OP code is stored in the instruction register which is a 273 (it gets its value from the bus, where the "read memory" buffer puts the value of the address on the address line, which comes from the PC or the address register).

Given that, I am a bit confused about the above statement. Hopefully you can see how my terminology works here, and what I'm attempting to do, and with that hopefully you can phrase it in a way I understand the difference. Right now it's only got me more confused.

Quote
You want a bunch of cheap & fast chips for your microcode store.
You could use rom or ram.
Some or all ram would let you have an instruction that loads microcode store from CPU memory or some other place. All ram is just using a little rom to do initial load.
Remember that with huge possible microcode store you can strap some address inputs to make it smaller.
The extra address lines can be used to select different microcode rows as next row address.
So speed is again not important. Strangely enough though, the RAM I have is SLOWER reading than my EPROM is :) The ram is about 55-70ns - the EPROM is around 30 (from memory). That's plenty fast for my purpose here. But you're right, if I used DRAM and have 5ns I would copy to RAM on boot. Absolutely.

Quote
You said two 181's, You can build any size CPU you want, you just need to microcode process 8-bits at a time.
Where the PDP-11 uses 1 bit in instruction to select byte or word(16-bit), one more bit lets you have four sizes. Or you can do like Z80 where only some instructions can be a different size and put size field in microcode.
So I'm struggling with that. I've been wondering how to create OP codes to do 16 bit operations given I only have 8 bits on the BUS. Not sure how I would feed 16 bit data to the 8 bit ALU and find a way to move the carry over to the second operation. Probably need another set of latches :(

Quote
For teaching, think very wide but simple. From memory the PDP-11 microcode made things harder by using some bits in microcode for many purposes. Simple logic is easier to understand.

I should also mention that for the PDP-11 there was a board that you could plug in that would allow testing/checking the logic with just a volt meter. Was more than just single step of microcode.
Also note on programming card "Processor Register Addresses"

Hope this might help some.

So I would have to pick up PDP11 myself first :) I once illustrated how a CPU worked using paper notes that people carried from person to person. Each person was a function, and the people that moved the papers around were the bus :)  But it was again a basic Von Neumann + a bit of Z80 that I used back then (80ies - a looong time ago).

Thanks a lot. While I may not have fully grasped everything you tried to say, I'll read it again and see how I can apply it.
Title: Re: Home build microcode design
Post by: edavid on August 10, 2017, 06:23:33 pm
So speed is again not important. Strangely enough though, the RAM I have is SLOWER reading than my EPROM is :) The ram is about 55-70ns - the EPROM is around 30 (from memory).

Are you sure?  What is the EPROM part number?
Title: Re: Home build microcode design
Post by: rstofer on August 10, 2017, 09:08:18 pm
You need to register (I'm not going to use the term 'latch') the status bits for subsequent jump on condition codes.  These need not be the next sequential instruction and won't be when intermediate results are being saved to registers.

You will also use the carry flag when doing multibyte arithmetic which is how it was always done on the 8 bit microcontrollers.

You might consider registering bit 3 of the ALU output to assist with the DAA (Decimal Adjust Accumulator) instruction if you decide to implement it.  It's handy if you want to do decimal arithmetic.

I don't know if registering the accumulator parity is worth worrying about.  The 8080 types had JPE and JPO instructions.  I have only seen them used once.  Ever...

There are 74181s all over eBay but there is no good reason to implement a 16 bit datapath unless you really want to.  A whole lot of computing was done with 8 bit processors.

Title: Re: Home build microcode design
Post by: C on August 10, 2017, 09:09:47 pm
Quote
I did a LOT of Z80 in the 80ies - it's very rusty. If you think it will be helpful I'll find some datasheets and study it.

Your rusty knowledge is most likely OK. You would just need to look at the instruction set byte values in octal.
Four 8 x 8 tables will show the logic.

Think of a picture of a (real)Z80 acting like a Z80 by running a program.
The (real)Z80 is changing IO bits just like a real Z80 changes what is on its pins.
You have IO for each pin of the real Z80.
You would have two 8-bit ports for address bus
One two direction port for data bus or one in and one out port.
A port for control bus.
and more.
So you have fewer than 40 pins total.

The program on the (real)Z80 is acting like microcode. Due to the ports used,
The (real)Z80 memory is separate from the other.
the microcode does not have to stop to maintain proper values on any pin.
A change could be made so that the (real)Z80 would be acting like a PDP-11 or other CPU.

64 bit add on a Z80: 64 is just a number, it could be larger or smaller; 8-bit steps make it easier.
64 bits is 8 bytes, so think of three arrays of 8 bytes.
The Z80 does add as A = A + __
To add larger values like 64 bits you need X = A + B
Note that the Z80 has two add instructions:
one is ADD while the other is ADC {add with carry}.
You do an ADD on the least significant byte followed by 7 ADCs. Each ADC adds in the carry from the lower byte.
so
ADD X(0) = A(0) , B(0) ; Lsb
ADC X(1) = A(1) , B(1)
ADC X(2) = A(2) , B(2)
ADC X(3) = A(3) , B(3)
ADC X(4) = A(4) , B(4)
ADC X(5) = A(5) , B(5)
ADC X(6) = A(6) , B(6)
ADC X(7) = A(7) , B(7)

If you look at the 181, you have some pins that connect to next 181.
So one 181 can act like many if you save a copy of value that goes between the chips and use this value as input for next step with 181.
ADC is just using stored output of connecting lines of 181 as input to connecting lines.
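
The ADD/ADC chain above can be sketched in a few lines of Python. This is only an illustration of the idea: alu_add8 is a stand-in for one 8-bit pass through the ALU, and the function names and least-significant-byte-first ordering are assumptions for the example.

```python
def alu_add8(a, b, carry_in):
    """One 8-bit ALU pass: returns (8-bit sum, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add_multibyte(a_bytes, b_bytes):
    """Add two equal-length byte arrays, least significant byte first.

    The first step has carry-in 0 (the ADD); every later step feeds the
    saved carry back in (the ADCs). The saved carry plays the role of the
    latch between 181 passes.
    """
    carry = 0
    result = []
    for a, b in zip(a_bytes, b_bytes):
        byte_sum, carry = alu_add8(a, b, carry)
        result.append(byte_sum)
    return result, carry
```

For example, add_multibyte([0xFF, 0x00], [0x01, 0x00]) gives ([0x00, 0x01], 0): the carry out of the low byte becomes the carry in of the high byte.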

Quote
My plan was not to use a flag register.
It is common to have one instruction modifying flags and then do more instructions before checking if a flag was changed.

Quote
I use a simple 2Kx8 EPROM for microcode, I have a RAM chip + a EPROM on the address line. I use MSB on the address line to select between the EPROM and the RAM (the RAM only has 15 address bits).
Quick look at this sounds like a mess. It looks like you are trying to put microcode into CPU memory.
You said you wanted to start with a 32-bit wide microcode and are building an 8-bit CPU.
Microcode store = 4x  2Kx8 EPROM 
You could add some ram to Microcode store 4x ___ ram

Instructions and data for CPU should be separate.
A PDP-11 is von Neumann architecture to the programmer & users.
The logic and microcode of CPU is more Harvard architecture.
Normal Harvard architecture  is instructions (microcode) separate from data.
Data inside CPU is all CPU registers, the 181's and other registers that make logic simpler.

Quote
I understood that the 182 is just for carrybit optimization - something I'm not really aiming at. Are there other functions I need to be aware of?
No, just a way of speeding up 181's

Quote
To be honest, reading this seems to be more complex than what I have?

A bus is easier and quicker to wire and can do more at the cost of time.
Last post I used a normal ram to contain all the CPU registers.
This limits you to a read or a write.

With a lot more chips you could have a 374 as a register.
You would have an input bus and an output bus.
OE controls what goes on output bus.
Ck controls when input bus is latched.
You now have Read-modify-write, faster but many more chips needed.

You could enable OE all the time and use two or more chips like 244 to get data to the two inputs of 181. Again many more chips needed but you can gain some speed.
You have two output buses and one input bus.

So a Move using ram needs one latch to hold data between read cycle and write cycle of ram.
Using the ALU(the 181's) needs two read cycles and one write cycle with ram. And two or three latches.

Simple is using more microcode steps with wider microcode.

I used three temp registers for 181 in last post. Each would have a bit in microcode for when to clock the input in to the latch.
The one on output of 181 would have a additional bit in microcode for when to enable output.

Quote
Only a few registers are connected to the ALU directly.
This greatly limits what can be done.

With 8 bits you normally think of a max of 256 instructions.
The Z80 shows you can get more than 256 instructions with a little change.

With more instructions you lose some by needing to read more instruction bytes but can have a huge gain by not needing to use even more instructions to do the same job.
Look at the difference between using the Z80's DJNZ and doing the same thing with normal instructions.
A lot of instructions are used to work around only having one accumulator on Z80
Much easier and quicker on PDP-11 where every register is an accumulator.
With the register modes on PDP-11 any register can be used as a stack pointer.
As I said the modes are created by microcode and some use of 181's for some modes.
Quote
Quote
The microcode is acting like a C function.
You can have a C IF or CASE statement as microcode.
All this says that you have variable length microcode sections
So how would you implement this? I already explained about how I were going to implement the conditional jump statement. Nothing in the microcode will understand what "JUMP" means let alone a condition? I'm still reading about this, so if there's a good source to understand how you would wire microcode to do these kind of conditions I would very much like to learn.

What is a conditional jump in assembly or IF in C
Do next PC address or other PC address based on condition.
Next address is a field in microcode.
Other address is a field in microcode.
Condition just selects which is used.
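
A minimal Python sketch of that idea follows. The row layout, the field names and the tiny example microprogram are all hypothetical; the only point is that each row names its own successor, and a condition picks between two candidates.

```python
from collections import namedtuple

# One microcode row: the control signals plus two candidate successor
# addresses. cond is the name of the flag to test, or None for "always
# take next_addr".
Row = namedtuple("Row", "controls cond next_addr other_addr")

FETCH = 0  # assumed address of the shared instruction-fetch microcode

# Hypothetical microcode for a JMPZ instruction starting at row 0x10.
MICROCODE = {
    0x10: Row("test Z", "Z", FETCH, 0x11),   # Z clear: done, Z set: go on
    0x11: Row("load PC from operand", None, FETCH, FETCH),
}


def next_micro_address(row, flags):
    """The condition only selects which of the two address fields is used."""
    if row.cond is not None and flags[row.cond]:
        return row.other_addr
    return row.next_addr


def run(store, start, flags, max_steps=32):
    """Follow the rows from start until one hands control back to FETCH."""
    addr, trace = start, []
    for _ in range(max_steps):
        row = store[addr]
        trace.append(row.controls)
        addr = next_micro_address(row, flags)
        if addr == FETCH:
            break
    return trace
```

Here run(MICROCODE, 0x10, {"Z": 1, "C": 0}) visits both rows, while the same call with Z cleared stops after the first one. In hardware the selection would typically be a 2-to-1 multiplexer on the microcode address register, steered by the chosen flag.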

Quote
Why?? That's pretty much what I have. A 4 bit counter where I reset when the 4th bit is set (so it's a 3 bit counter). Seems to work just as I want it to?
At first it looks easy. But it is not very powerful and very limiting.

Next address field in microcode lets you have any amount of microcode for an instruction. Lets you have common sections of microcode.
The Z80 checks for interrupt and other things before next instruction, Common microcode.
Some Z80 instructions do (HL) and access memory. Again common microcode.

Quote
Quote
As for rows, with 2 273's you could have 65.536.
This is where I goofed up earlier this week. You're talking about the number of OP codes not microcode?
No this is microcode bit rows or microcode steps.

Draw yourself a box. The pins of a Z80 are the connections from inside box to outside of box.
Outside of box the Z80 looks like a Von Neumann processor.
The Z80 chip is a black box.
You are building what is inside the black box.

If you look around there are other 8-bit microprocessors in a 40 pin chip.

If you keep the insides of black box general in logic, then the microcode makes it a Z80, 8080, 8085, 6502, 6800, 6809
By using two bytes for an instruction you could have the PDP-11 instruction set with a 8-bit outside data bus. The hardware is there, this is just a microcode change.
Add more bits to address bus, this is a small change in hardware and you can add later gen Z80's

General here is simple logic that is very flexible.
My ram chip for registers  gives you simple for a huge number of registers.

From a teaching point of view, simple can be easy to understand.
Simple also would let you show small changes needed to expand to a 16-bit data bus and then act like a PDP-11
Or change to different instruction set.

Look at what you are doing now.
Remove Added logic to simplify  and get same result in microcode steps.

Quote
So I would have to pick up PDP11 myself first :) I once illustrated how a CPU worked using paper notes that people carried from person to person. Each person was a function, and the people that moved the papers around were the bus :)  But it was again a basic Von Neuman + a bit of Z80 that I used back then (80ies - a looong time ago).

For simple what is on programming card is all you need to know about PDP-11 instructions.
To get closer to an actual PDP-11 then you would have to add more knowledge about PDP-11
For example the PDP-11 34 used the unibus to connect CPU to rest of computer.
An LSI-11 23 used the PDP-11 instruction set with the Qbus and CPU bus to rest of computer.
The PDP-11 45 used unibus for IO and a different bus for memory. Basic instruction set is the same, 45 added some instructions.

Now with little hardware change, DEC created a VAX. Most VAXes could still run PDP-11 programs.

If you do not understand something, please ask.
Title: Re: Home build microcode design
Post by: rstofer on August 10, 2017, 11:59:05 pm
Unless there is some software use issue, it would seem to me that any of the 8080,8085 or Z80 chips would be a real project to do with discretes.  The project just seems too big!

That's why I like BLUE.  I have never built it but I have been thinking about it as an educational computer for decades (like '75 and onward).  BLUE is a 16 bit minicomputer with a very limited instruction set and a very sparse set of registers.  There is no stack pointer, all addressing is direct and there are no internal user registers.  A very basic, easy to understand machine.  And, yes, it could be microcoded although the project itself is simple enough to do with a couple of dozen TTL chips.

https://www.alibris.com/Computer-Architecture-Caxton-C-Foster/book/1255656 (https://www.alibris.com/Computer-Architecture-Caxton-C-Foster/book/1255656)

Here's a simulator for BLUE

http://brainwagon.org/2011/07/07/a-basic-simulator-for-caxton-fosters-blue-architecture/ (http://brainwagon.org/2011/07/07/a-basic-simulator-for-caxton-fosters-blue-architecture/)

A very supercharged FPGA implementation - expanded instruction set, assembler, all kinds of stuff:

https://opencores.org/project,blue (https://opencores.org/project,blue)

If BLUE is too small, the INDIGO computer in the next chapter is much more complete.

BLUE and INDIGO are simply the colors of the boxes...
Title: Re: Home build microcode design
Post by: rstofer on August 12, 2017, 12:58:42 am
Here's a neat device to make registers.  It will do 16 registers of 4 bits so 4 of them make a 16 register array of 16 bits.  Lots of registers!

http://ltodi.est.ips.pt/lab-dee-et/datasheets/TTL/74189.pdf (http://ltodi.est.ips.pt/lab-dee-et/datasheets/TTL/74189.pdf)
Title: Re: Home build microcode design
Post by: bitman on August 12, 2017, 02:42:53 pm
So speed is again not important. Strangely enough though, the RAM I have is SLOWER reading than my EPROM is :) The ram is about 55-70ns - the EPROM is around 30 (from memory).

Are you sure?  What is the EPROM part number?

Nope :(  It was from memory, and unfortunately that has been skewed by reading almost a hundred data sheets over the weeks. I double checked and it's 200ns - not 20.  The chip I settled on was xls28c16bp - I gave up on the EPROMs that need different voltages for read and write. That is beyond my capabilities to deal with right now, so this one was much easier for me to use.
Title: Re: Home build microcode design
Post by: bitman on August 12, 2017, 03:35:30 pm
@C - wow, thanks for a very lengthy post. I've been away from my office for a few days, so I have a lot to catch up on here.
Lots of great ideas (never thought of the input+output bus - that would greatly simplify the settings and IO lines I need). I'm still not sure how you do logic with microcode. I understand how the result is acted on once you have an answer: you execute either the next OP code or the one following it. What I don't get is how you TEST for things in microcode. The only way I know how to do that is to use the flag input as part of the address offset in microcode, so offset X gives you "false" and offset Y gives you "true". One would go ONE microcode instruction down, the other would go two.

And while I understand the appeal of not having a fixed number of steps for each instruction, I don't get how you can find instructions unless you use an index based on the OP code. I really would like to share parts of the microcode between all instructions just as you explained, but I'm struggling with how to design that. And the simplicity of the counter and a PROM seems straightforward and easy to explain too - make it too "convoluted" and part of the whole idea falls apart.

What I am realizing is that to make a full explanation I should include the Z80 and perhaps the PDP11 or similar systems to illustrate how things can be optimized and made better. So again, THANK YOU for giving me lots of materials/reasons for reading :)

I'm keeping this short as I catch up on a lot of things. If I have any questions I'll definitely post again.
Title: Re: Home build microcode design
Post by: bitman on August 12, 2017, 03:37:25 pm
Here's a neat device to make registers.  It will do 16 registers of 4 bits so 4 of them make a 16 register array of 16 bits.  Lots of registers!

http://ltodi.est.ips.pt/lab-dee-et/datasheets/TTL/74189.pdf (http://ltodi.est.ips.pt/lab-dee-et/datasheets/TTL/74189.pdf)

I have to ask - HOW? How is it easier if I need to shift values and read two addresses for each byte?
I get the impression that I'm missing something very fundamental here. To me, it's a lot more complex with fewer bits?
Title: Re: Home build microcode design
Post by: rstofer on August 12, 2017, 03:52:18 pm
Here's a neat device to make registers.  It will do 16 registers of 4 bits so 4 of them make a 16 register array of 16 bits.  Lots of registers!

http://ltodi.est.ips.pt/lab-dee-et/datasheets/TTL/74189.pdf (http://ltodi.est.ips.pt/lab-dee-et/datasheets/TTL/74189.pdf)

I have to ask - HOW? How is it easier if I need to shift values and read two addresses for each byte?
I get the impression that I'm missing something very fundamental here. To me, it's a lot more complex with fewer bits?

Or perhaps I am missing something...

With two of these tied together (address field is common to both), you have 16 registers of 8 bit width (4 bits from each device).  With four tied together (again, the address fields are common), you have 16 registers of 16 bit width.  But the cool thing might be the fact that you could control just 8 of the 16 bit width so you could easily have something like register pairs (B-C, D-E, H-L) as well as true 16 bit registers like PC and SP.  You also have the ability to use the pairs as 16 bit registers.

The fact that the output is open collector means the register bank will be easy to gate on and off a bus without a MUX.

They are not true dual port RAM (pity) so you can't read the outputs and load the inputs simultaneously.

I wasn't aware that the devices are getting scarce.  They are in stock at Jameco but unless I could get a handful, I'm not sure I would design them into anything.  They do make the registers a little more compact.  Four chips gives 16 each 16 bit registers.  That's a lot in a small amount of real estate.
Title: Re: Home build microcode design
Post by: rstofer on August 12, 2017, 04:10:32 pm
Your microcode has a program counter (for lack of a better name) that is simply incremented as you traverse the linear instructions.  Your microcode also has a next address field which is jammed into the PC instead of incrementing.  Somewhere in your microcode, you look at a condition (carry bit, for example) and decide whether to increment the PC or jam a branch address.  More often than not, the next address field is redundant because you are executing sequential instructions.  There may be some conditions under which the field can be used for other things.  It only needs to be an address when the code is going to branch.  But be careful!  The downstream stuff needs to know when to ignore the bits from the address field.

You could implement a stack mechanism for microcode and this would allow you to call subroutines.  You might limit the call depth to just one level and this might be quite adequate.  It is highly unlikely that microcode will involve recursion.  I have had to do this very thing in my FPGA project because the main FSA has a lot of calls to memory read/write and that happens to take two cycles.  So, I save the return state in a register before branching to the memory code and when the memory operation is complete, the memory code branches to the return state.  It's simple but it eliminates a lot of extra states in the FSA.

If you do the microcode properly, the PC is already pointing to the next instruction.  So, if you want to call something, you use the next address field to load the PC while simultaneously saving the current PC in the return address register.  In this scheme, the subroutine always returns to the instruction following the CALL and this is usually what you want.
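
As a rough Python sketch of that one-level call mechanism (the operation names NEXT, JUMP, CALL and RETURN and the class itself are assumptions for illustration, not taken from any particular design):

```python
class MicroSequencer:
    """Microcode PC with a single return-address register (call depth 1)."""

    def __init__(self, start=0):
        self.pc = start
        self.ret = 0

    def step(self, op="NEXT", target=0):
        """Advance one microcode step and return the new PC."""
        following = self.pc + 1      # the row right after the current one
        if op == "CALL":
            self.ret = following     # save where to come back to...
            self.pc = target         # ...while jamming the branch address
        elif op == "JUMP":
            self.pc = target
        elif op == "RETURN":
            self.pc = self.ret
        else:                        # "NEXT": plain sequential execution
            self.pc = following
        return self.pc
```

Starting at row 5, a CALL to row 40 saves 6 in the return register; after the shared routine runs and issues RETURN, execution resumes at row 6. A second CALL inside the routine would overwrite the saved address, which is exactly the one-level limit described above.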
Title: Re: Home build microcode design
Post by: rstofer on August 12, 2017, 04:28:16 pm
I don't know if I have already recommended this book but here goes:

"Microprocessor Design Using Verilog HDL" available from Circuit Cellar:

http://www.cc-webshop.com/Microprocessor-Design-Using-Verilog-HDL-CC-BK-9780963013354.htm (http://www.cc-webshop.com/Microprocessor-Design-Using-Verilog-HDL-CC-BK-9780963013354.htm)

This book is all about the Z80 and although your project won't involve Verilog, the important parts of the book are the early chapters where the author describes the hardware layout and uses spreadsheets to decompose all of the instructions into process steps.  I could see where this could be quite handy as an approach to writing microcode.

Title: Re: Home build microcode design
Post by: C on August 12, 2017, 05:00:06 pm

In rstofer's post, he used more chips to get wider. That is one way.

Look at it a different way
That is a 64 bit register but you work with blocks of 4-bits.
You have to do 16 steps doing this where a 64-bit wide one would be one step.

You asked "how to do big math".
How about 64-bit to match above.
You take 16 181's and connect them for 64-bit.
If you capture the data between chips in the center and use two steps, you could use 8 chips.  For the second step you are using the captured data as input.
If  you keep splitting the 181 ALU you get to
Use ONE 181, a latch and 16 steps.

A lot fewer chips but also > 16 times slower.

What is the difference between a 74F189 and normal ram memory?
The 74F189 has separate data inputs & outputs. This lets you use one step.
The outputs give your READ, what is connected gives you MODIFY & inputs gives you WRITE.
To use normal ram that is two steps.
Title: Re: Home build microcode design
Post by: rstofer on August 12, 2017, 06:15:09 pm

In rstofer post, He used more chips to get wider. That is one way.

Look at it a different way
That is a 64 bit register but you work with blocks of 4-bits.
You have to do 16 steps doing this where a 64-bit wide one would be one step.

You asked " how to do big math"
How about 64-bit to match above.
You take 16 181's and connect them for 64-bit.
If you capture the data between chips in the center and use two steps, you could use 8 chips.  For the second step you are using the captured data as input.
If  you keep splitting the 181 ALU you get to
Use ONE 181, a latch and 16 steps.

A lot less chips but also > 16 times slower.

The Intel 4004 took this approach - 4 bits.  We could get down to 1 bit in the degenerate case.  It's been done!  In fact, early numerical control did serial arithmetic from a magnetostrictive delay line memory.  The 1960s Bendix Dynapath is an example.

There is always a speed/complexity tradeoff.

Quote
What is the difference between a 74F189 and normal ram memory?
The 74F189 has data inputs & outputs. This lets you use one step.
The outputs give your READ, what is connected gives you MODIFY & inputs gives you WRITE.
To use normal ram that is two steps.

Separate input and output lines are interesting but timing is everything.  I haven't designed with them so I don't know if it is a help or not.  In any event, it isn't true dual port RAM so it is still limited.  It isn't truly clocked either.  It is more like latched.  I haven't checked the timing to see if it comes close to being clocked.  Setup and hold times will matter, I suspect.

One thing that is annoying and may disqualify the part is the fact that the outputs are inverted.  That's a real PITA unless there is somewhere that inversion is being done anyway.  The 74F219 does not invert the outputs.  Maybe that's a better idea.  They appear to be more readily available.

Don't overlook having two parallel banks of registers.  This way you can add a register to another register by addressing the banks separately and we get away from having a pair of MUXes to select the operands on both sides of the ALU.  Whenever a register is written, both banks are updated simultaneously.
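
A minimal Python model of that two-bank arrangement, just to show the read/write behaviour; the class name, the 16-entry depth and the 8-bit width are assumptions for the example:

```python
class DualBankRegisters:
    """Two copies of the same register file.

    Writes go to both banks at once, so the banks always hold identical
    data. Reads address each bank separately, which yields two different
    registers in the same step, one per ALU input, without operand MUXes.
    """

    def __init__(self, count=16, width=8):
        self.mask = (1 << width) - 1
        self.bank_a = [0] * count
        self.bank_b = [0] * count

    def write(self, index, value):
        value &= self.mask
        self.bank_a[index] = value
        self.bank_b[index] = value

    def read_pair(self, index_a, index_b):
        """Return (ALU A operand, ALU B operand)."""
        return self.bank_a[index_a], self.bank_b[index_b]
```

The price is twice the register chips and one extra register address field in the microcode, in exchange for reading both operands at the same time.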

I haven't really thought through how the register chips would work.  The idea would be to lay out a datapath and then start writing pseudo microcode and see how it works out.  Maybe it doesn't work out at all.  Do we register the ALU output?  Do we write inverted data? It matters...

Doing nibble width operations on bytes or byte width operations on words requires more microcode and, worse, it gets in the way of the discussion.

In my view, the Z80 itself gets in the way of the discussion.  There are too many instructions, too many addressing modes and the byte/word stuff is a stumbling block.  That's why, for an educational tool, I would look at a far simpler CPU.  Complexity can come later.

Here's another thought:  If the CPU doesn't need 16 registers, load constants like +1 -1 and 0 into other registers so that something like INC and DEC can be implemented in the ALU without having to MUX in some other values.

You get these values in the registers by having the microcode force the ALU to create them.  Or maybe they come from the next address field.  There will no doubt be a MUX on the inputs to the register bank to select between the ALU and other possible sources.  No reason one of the MUX inputs can't be the next address field.
Title: Re: Home build microcode design
Post by: C on August 12, 2017, 08:12:58 pm
rstofer
I have done 1 bit & magnetostrictive delay line memory

First there were registers
then chips like 189
Z80 calls it a "Register Array"
Others call it a "Register File"

DM74LS670 3-STATE 4-by-4 Register File
http://eeshop.unl.edu/pdf/DM74LS670.pdf (http://eeshop.unl.edu/pdf/DM74LS670.pdf)

I think all of these are getting hard to find. Today, if a little slower is acceptable, you can use RAM with read-modify-write cycles.
For faster, pay the dollars for memory with 2 or more ports.

If you want less logic you use more microcode and get smarter how it is used.
More serial processing in place of more logic parallel.

Think about the ALU, here an 181 was mentioned.
If all control bits are in microcode, a section of microcode can do anything the ALU is capable of doing.
On the PDP-11 all 8 visible registers could be used as input or output for ALU
If my memory is correct, 16 to 1 muxes were used, not 8 to 1
The 8 extra inputs to mux allowed a simpler CPU logic and more capabilities for microcode.
Flags register
Address bus register
Data bus register
Control bus register
Instruction register
Microcode input register
Temp register
I think these were the other 8 inputs to mux

The microcode could do more than the PDP-11 instruction set allowed.
Options to add instructions by adding more microcode.
There was an option in later generation PDP-11's to add a writable microcode so you could create special instructions.

Each microcode step was timed by a TTL delay line.
Fast logic paths in CPU were faster.
Memory cards closer to CPU were faster due to less bus delays. Nanosecond faster memory meant a faster computer.
Title: Re: Home build microcode design
Post by: C on August 13, 2017, 12:47:12 am
Lecture 6. Microarchitecture II - Carnegie Mellon - Computer Architecture 2015 - Onur Mutlu
https://www.youtube.com/watch?v=b4Dl85FVSok (https://www.youtube.com/watch?v=b4Dl85FVSok)

This might help
Title: Re: Home build microcode design
Post by: bitman on August 14, 2017, 10:08:03 pm
Quote
Somewhere in your microcode, you look at a condition (carry bit, for example) and decide whether to increment the PC or jam a branch address.
Right - that's the issue. HOW? I understand that OP codes are basically conditional based on the state of a flag which is set electronically by ALUs or similar circuits to indicate the state of one or more values in registers.  Since microcode is "no more" than simply turning on/off signals, the question is how does microcode READ and ACT on the values it reads. This is one of the topics that once I understood what was being done (not the "how") it helped tremendously as a programmer to make better and more optimized code.  C made sense, expressions etc. - so I get that part, now I'm at the part where I'm trying to deal with the "how" side of things.

The way I'm planning (in the last few weeks I've had very few hours to think about or look at this) is to let the flags control the offset of the microcode being executed. That way, when executing a JMZ instruction, there are two locations in the microcode: one that is executed when the Z flag is clear, and another when the Z flag is set.  Each section is statically defined, but the choice of which section to execute is made by the flag input. It means I give up one bit of my 8-bit OP codes per flag, which is a LOT. I have two flags, so that leaves 6 bits, reducing the number of OP codes to 64 in my case. I can make do with that, even though I may have to drop features I thought about.
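
To make the address layout concrete, here is a rough Python sketch of the scheme described above. The 3-bit step counter, the opcode number for JMZ and the control names `LOAD_PC` / `INC_PC` are assumptions made up for illustration only:

```python
STEP_BITS = 3    # assumed: up to 8 microcode steps per instruction
FLAG_BITS = 2    # the two flags (called Z and C here)
OPCODE_BITS = 6  # 8 bits minus one bit per flag -> 64 OP codes


def rom_address(opcode, z, c, step):
    """Concatenate opcode, flags and step counter into one EPROM address."""
    assert 0 <= opcode < (1 << OPCODE_BITS)
    assert 0 <= step < (1 << STEP_BITS)
    flags = (z << 1) | c
    return (opcode << (FLAG_BITS + STEP_BITS)) | (flags << STEP_BITS) | step


JMZ = 0x05  # hypothetical opcode number
rom = {}
for z in (0, 1):
    for c in (0, 1):
        # Same instruction, different control word depending on the Z flag.
        rom[rom_address(JMZ, z, c, step=0)] = "LOAD_PC" if z else "INC_PC"
```

Note the cost: an instruction that ignores the flags still needs the same microcode programmed into all four flag variants of its address range.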

Quote
More often than not, the next address field is redundant because you are executing sequential instructions.  There may be some conditions under which the field can be used for other things.  It only needs to be an address when the code is going to branch.  But be careful!  The downstream stuff needs to know when to ignore the bits from the address field.
Correct - the preamble to each OP is the same, so that repeats regardless of the OP being executed. And there definitely are parts of the microcode that are shared, but it would be one or two cycles tops - an internal jump would not be optimal. So I could of course use an indirect reference to a table of instructions, but I'm not sure I see the exact advantage here. If values are given as parameters to some OP codes, it's going to be very hard to use indirect pointers based on a base offset instead of just the current OP code. This is however something I've thought about but, well, I think it would destroy the whole idea of what I'm doing to go that deep.

What I was thinking about doing was the "set PC to count", "load PC address content onto bus", "read OP from bus into instruction register". This needs to happen for EVERY instruction. If I can find a way to have that done independent of the actual content I have, it would make the design much cleaner.

Quote
You could implement a stack mechanism for microcode and this would allow you to call subroutines.  You might limit the call depth to just one level and this might be quite adequate.  It is highly unlikely that microcode will involve recursion.
I'm fascinated if this is how Microcode is done today. That to me is too complex for microcode but something you implement a bit higher in the stack. 

Quote
  I have had to do this very thing in my FPGA project because the main FSA has a lot of calls to memory read/write and that happens to take two cycles.  So, I save the return state in a register before branching to the memory code and when the memory operation is complete, the memory code branches to the return state.  It's simple but it eliminates a lot of extra states in the FSA.
You would need to load every register into memory - how do you ensure that ALL states are saved? The moment you change your memory register, you lost state?

Quote
If you do the microcode properly, the PC is already pointing to the next instruction.  So, if you want to call something, you use the next address field to load the PC while simultaneously saving the current PC in the return address register.  In this scheme, the subroutine always returns to the instruction following the CALL and this is usually what you want.
Correct - it would be GREAT if I can make it so the PC is always "ahead one" so the LAST thing every instruction needs to do is advance the PC by one. THAT I'm definitely struggling with right now.
Title: Re: Home build microcode design
Post by: C on August 14, 2017, 10:57:15 pm
My understanding is you want something you can teach.
For this simple is better. A very slow system would be fine and allow you to show how you can add to the basics to make it faster.

To do simple you need to use the ALU a lot for many things.
Think you are using 2 74181's
The 181 gives you four outputs that normally go to the next 74181. You need to latch these outputs.
You will use them as inputs to Control logic to change microcode path.
You will later feed these latched outputs in to LSB 181 so that you can do bigger number math. A 1 of 2 data selector would let you do this.

When you look at a Z80 chip, every pin on that chip is a gated input or the output of a Latch.

So the microcode to fetch an instruction goes like this:
1. Copy PC to Address register.
2. Set Control bus register to read from memory and turn on M1.
3. Check for wait input. If true go to 3, else go to 4.
4. Copy memory data in register to IR register.
5. End memory cycle by setting control bus register.
You now have the instruction in the IR register.
6. Enable some bits of IR register to modify part of next microcode address.
For Z80 this is D7 & D6, so microcode does a four-way jump.
You might jump to
100 in microcode for D7 & D6 = 00
200 in microcode for D7 & D6 = 01
300 in microcode for D7 & D6 = 10
400 in microcode for D7 & D6 = 11

If you look in Z80 instruction set 01 is LD R,R

Here is where using a RAM chip for registers reduces logic a lot.
200 Enable IR D2-D0 to register ram, read mode
201 latch output of register ram
202 enable IR D5-D3 to register ram. enable register ram output latch, Write mode
203 You have now completed the move between registers.
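
The dispatch and the LD R,R steps above can be sketched in Python roughly as follows. The function names are made up for illustration, the register RAM is modelled as a plain list, and special cases in the real Z80 encoding (register code 110, HALT) are ignored:

```python
def dispatch_address(ir):
    """Four-way jump on IR bits D7 and D6, using the example addresses above."""
    return {0b00: 100, 0b01: 200, 0b10: 300, 0b11: 400}[(ir >> 6) & 0b11]


def ld_r_r(ir, reg_ram):
    """Steps 200-203: copy the source register (D2-D0) to the destination (D5-D3)."""
    src = ir & 0b111         # step 200: IR D2-D0 addresses the register RAM, read mode
    latch = reg_ram[src]     # step 201: latch the RAM output
    dst = (ir >> 3) & 0b111  # step 202: IR D5-D3 addresses the register RAM, write mode
    reg_ram[dst] = latch
    return reg_ram           # step 203: move complete


regs = [0] * 8
regs[1] = 0x55        # register code 001
ld_r_r(0x41, regs)    # 01 000 001: destination 000, source 001
```

The same three microcode steps serve every register pair because the register addresses come straight from the IR bits, not from the microcode.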

Watch the Video.
Title: Re: Home build microcode design
Post by: C on August 14, 2017, 11:15:44 pm
Look at Z80 instruction set
D7 & D6 = 10 is 8 bit arith & logic
300 Enable IR D2-D0 to register ram, read mode
read from register ram, latch in second register ram output latch
Enable register A address to register ram, read mode
latch in first register ram output latch
Enable IR D5-D3 to modify next microcode address

You now have inputs to ALU set and have different areas in microcode to set control input to ALU. As you will need ALU in many places in microcode, ALU Control is a field in microcode.

Jump to common area and write output of ALU to address of register A in register ram

In keeping it simple, use microcode to have ALU to add 1 to PC for next PC address.


 
Title: Re: Home build microcode design
Post by: rstofer on August 15, 2017, 06:14:42 am
Quote
Quote
Somewhere in your microcode, you look at a condition (carry bit, for example) and decide whether to increment the PC or jam a branch address.
Right - that's the issue. HOW? I understand that OP codes are basically conditional based on the state of a flag which is set electronically by ALUs or similar circuits to indicate the state of one or more values in registers.  Since microcode is "no more" than simply turning on/off signals, the question is how does microcode READ and ACT on the values it reads. This is one of the topics that once I understood what was being done (not the "how") it helped tremendously as a programmer to make better and more optimized code.  C made sense, expressions etc. - so I get that part, now I'm at the part where I'm trying to deal with the "how" side of things.

First of all, consider building the PC with a string of binary counters such that a simple increment doesn't take an adder.

Suppose there is a MUX with two inputs, the PC or the Next Address Field.  Suppose there is a bit in the microcode that combines with a particular flag (again, say carry) that tells the MUX which address to use for the next microcode instructions.  If the MUX is told to use the next address field, parallel load this into the PC. 
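
As a rough Python sketch of that two-input MUX (the field names `test_carry` and `next_addr` are invented for illustration, and the counter here is the microcode address counter):

```python
def next_micro_address(upc, word, carry):
    """Pick the next microcode address: sequential, or the next-address field."""
    take_branch = bool(word["test_carry"] and carry)  # microcode bit ANDed with the flag
    if take_branch:
        return word["next_addr"]  # MUX selects the field, which is also loaded into the counter
    return upc + 1                # otherwise the counter just increments


word = {"test_carry": 1, "next_addr": 40}
```

The microcode never "reads" the flag in a software sense. The flag is just one more wire into the logic that forms the next ROM address.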

It is up to you whether you increment the PC at the end of an instruction sequence or just after the fetch.  I tend to do it after the fetch such that the PC points to the next instruction.  Then if I have to copy the value into a return address register for a CALL instruction, the return address is already correct.

Something like that...
Quote
Quote
You could implement a stack mechanism for microcode and this would allow you to call subroutines.  You might limit the call depth to just one level and this might be quite adequate.  It is highly unlikely that microcode will involve recursion.
I'm fascinated if this is how Microcode is done today. That to me is too complex for microcode but something you implement a bit higher in the stack. 

Quote
  I have had to do this very thing in my FPGA project because the main FSA has a lot of calls to memory read/write and that happens to take two cycles.  So, I save the return state in a register before branching to the memory code and when the memory operation is complete, the memory code branches to the return state.  It's simple but it eliminates a lot of extra states in the FSA.
You would need to load every register into memory - how do you ensure that ALL states are saved? The moment you change your memory register, you lost state?

I must have messed up the explanation, it just isn't that complex.  As you process certain instructions, you will find subsequences of microcode repeated throughout.  To reduce redundancy, or just guarantee that a subsequence is coded properly once instead of a dozen times, you create a subroutine.  It takes no parameters and probably doesn't return a value but is simply a sequence of microcode.  If you implement a CALL instruction in microcode, all you need to do is save the return address and branch to the called routine.  When it's time to return, jam the return address into the PC.

In my case, there were several places in the FSM where I needed to read or write memory as part of processing an instruction.  I decided to make these sequences subroutines because the RAM operations were two states and, when using external SRAM, they were one state.

Just a one level stack so subroutines can't be nested.
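
A small Python sketch of such a one-level call mechanism, with a hypothetical `MicroSequencer` class standing in for the microcode address counter plus a single return address register:

```python
class MicroSequencer:
    """Microcode address counter with a single return-address register."""

    def __init__(self):
        self.upc = 0
        self.return_addr = None  # only one register, so calls cannot nest

    def step(self):
        self.upc += 1

    def call(self, target):
        # Save the address of the following microinstruction, then branch.
        self.return_addr = self.upc + 1
        self.upc = target

    def ret(self):
        # Jam the saved address back into the counter.
        self.upc = self.return_addr


seq = MicroSequencer()
seq.upc = 10
seq.call(50)   # e.g. a shared memory-read sequence at address 50
seq.step()     # the subroutine runs for two microinstructions
seq.ret()      # back at the microinstruction after the call
```

No other state is saved. The datapath registers are simply left as the subroutine set them, which is the whole point of calling it.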

As you start to code the various instructions, you may find the idea useful.  Or not...
Title: Re: Home build microcode design
Post by: rstofer on August 15, 2017, 08:03:55 am
On the IBM 1130, instructions come in two formats:  Long and short.  A short instruction does not use an address in the following word so I'll skip those.  The Long Format instruction has a word following the instruction word (like LXI <addr> except the instruction is also 16 bits).  Then there are index registers to add to the address and these are not really registers, they are located at addresses 1..3 of RAM.  There is also indirect addressing, I'll skip that...

An instruction/operand fetch might go like this:

1) Fetch the word at the PC and increment the PC. Memory Read required
2) Decode a long instruction so fetch the address field at the PC and increment the PC.  Memory Read required
3) Notice that the address is to be indexed so fetch the index register from memory.  Memory Read required.
4) Add the index register to the previously fetched address field
5) Use this effective address to fetch the operand .  Another Memory Read required.

By the time you add in indirect addressing and short format addressing, operand fetch is quite a project!

To just fetch the operand (as shown above) takes 4 memory reads.  In the case where each read takes two cycles, I wind up with 4 extra states (4 are required as a minimum if I inline the code) or I can CALL a subroutine in appropriate places and use far fewer states.  Further, as the Memory Read changes over the life of the project (from BlockRAM to SRAM to PSDRAM), I only had to change the code in one place rather than at every subsequence requiring memory access.

In an FPGA, all states will need some kind of name.  Extra states means I need to invent more names.

In the case of an FSM, all I had to do was save the next state in a register (the return address register), and set the next state to the CALLed routine.  I had to leave the present state and go somewhere, it might as well be to the code that handles memory access.  It's the same transition cycle in any event.  When the called routine is complete, it just branches to the return address.  This is all part of the last state of the memory access code.  No extraneous states are used.


It costs nothing extra to call and return while saving a LOT of states.

These are all FSM operations and have little relationship to anything higher in the code stack.  The user doesn't know and doesn't need to know how instructions are decoded and executed.

For microcode, the CALL is just going to be a bit in the word storing the microcode PC (not the PC register) into a temporary register and jamming the address of the called function (the next address field) into the microcode PC.

At some point, the microcode looks an awful lot like a higher level of coding and the architecture looks a lot like a higher level view of a CPU because even the microcode has a program counter and a (somewhat limited) stack.

Earlier I mentioned testing a bit (carry) and causing the MUX to select the next address field or the PC as the address of the next microcode instruction.  I was talking about just 1 condition.  There can be many conditions that are tested (carry, zero, minus, plus, parity, etc as well as inverted tests like not carry, not zero, not minus, etc.) and the effect is the same.  You take a branch if the condition is met or you execute the next microcode instruction if the condition is not met (or vice versa).  So, your microcode word has to have bits to select the conditions (AND the condition with the test bit) and some external logic combines these conditions (big OR gate) to direct the MUX.  You need to consider the possibility that you will test multiple conditions in a single instruction.

Until you lay out the instructions and start writing pseudo microcode, it's hard to see what the architecture should look like.  Do you need a MUX for the microcode address?  Do you need a MUX in front of the parallel load feature of the microcode PC?  How many inputs will these MUXen require?  (I just invented MUXen.  I derived it from VAXen which are multiple VAX machines).  You can't know these things until you see exactly what is required to execute instructions in the shortest number of cycles.
Title: Re: Home build microcode design
Post by: rstofer on August 15, 2017, 01:26:41 pm
One other input to the microcode address MUX will be some permutation of the op code.  Let's say you have 256 opcodes and, on average, they take 8 states.  Build an address with leading zeros, the op code shifted left 3 bits and 3 trailing zeros.  Right after instruction fetch, you just use this value as the microcode next address.  You branch and simultaneously load the value into the microcode PC.  At some point, you need to increment the PC before the next microcode fetch.

If executing an instruction takes more than 8 states then there will be a branch to some overflow area where code is just tacked to the end.  OTOH, if you have a lot of space, maybe you allow 16 states per instruction.

There will be variations on this idea as you see how the instructions are encoded.  It's usually not random and similar instructions tend to have adjacent op codes.

Now your MUX for the microcode address has at least 4 inputs:  The PC, the next address field, the return address and this computed GOTO address.

Notice that the address is being used to cause a branch while simultaneously being loaded into the PC.  This saves a state.  Otherwise, you would have the MUX load the PC and then in the next cycle perform the actual branch.
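
Here is a rough Python sketch of the computed GOTO address and the four-input address MUX described above. The select codes and function names are arbitrary choices for illustration:

```python
STATE_BITS = 3  # 8 microcode states reserved per op code, as in the example above


def opcode_entry(opcode):
    """Op code shifted left 3 bits, i.e. with 3 trailing zeros."""
    return (opcode & 0xFF) << STATE_BITS


SEL_PC, SEL_NEXT, SEL_RETURN, SEL_OPCODE = range(4)


def address_mux(select, upc, next_addr, return_addr, opcode):
    """Four-input MUX feeding the microcode ROM address (and the counter load)."""
    sources = (upc, next_addr, return_addr, opcode_entry(opcode))
    return sources[select]
```

With 256 op codes and 8 states each, the microcode store is 2048 words, so the address is 11 bits wide.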

You'll see how this works when you start laying out some of your instructions.

Over at opencores.org, they have the T80 project which is an FPGA implementation of the Z80.  There might be something interesting in the code.  I have used that core to run CP/M and Pacman.  It definitely works!

I could see some advantage to coding this thing at a very low level in C.  There would be a huge switch statement that emulated the microcode.  This might be my very first approach to the problem.  Get the CPU working in software.  Things like MUXen would be functions, their control would come from the pseudo microcode in the switch statement.  C is a lot faster than wire-wrap.

There are simh simulations for the Z80 but I don't know anything about them.

http://www.classiccmp.org/cpmarchives/cpm/mirrors/www.schorn.ch/cpm/intro.php (http://www.classiccmp.org/cpmarchives/cpm/mirrors/www.schorn.ch/cpm/intro.php)

Title: Re: Home build microcode design
Post by: bitman on August 15, 2017, 07:00:36 pm
Quote
Until you lay out the instructions and start writing pseudo microcode, it's hard to see what the architecture should look like.

I think the crux is right here. My approach for my "design" is as simple as "have the microcode do exactly what you do when manually setting the switches". Which means, I haven't really felt I needed to plan out the microcode - I was going to assign each input to a bit, make a piece of code on a PC where those bits were represented by symbols, and simply setup the microcode as I wanted it there (ie. I would set the enable flags based on what I would do manually).  What I thought was my only challenge was to figure out how to do a simple conditional jump based on the two flags I have, and the "simple" solution I came up with seems to work on paper, however timing wise I'll have to try it out to be sure.  My "only" issue has been I have way more "enable" bits than I expected to have, so I am trying to reduce them with gates first, and THEN I'll assign them to microcode bits.

Thanks to you and the others here, I will do a bit (a lot!) more reading and reconsider things. I am still aiming for simple so what I would definitely want to avoid is very advanced microcode that is hard to explain the first time.  I had my 71 year old father here the other day and sat with him showing how it works (with no microcode). All the LEDs showing state and what was "where" really helped him understand basic concepts of stuff he wasn't aware of, and that's part of what I'm aiming at here. Secondly, I want to prove to myself I can do this basic stuff with electronics, and not pretend I would design any level of modern processors with pipelining and optimization - that's definitely NOT the level I'm at knowledge wise but I love to learn how it's done.
Title: Re: Home build microcode design
Post by: rstofer on August 15, 2017, 07:51:44 pm
Quote
My "only" issue has been I have way more "enable" bits than I expected to have, so I am trying to reduce them with gates first, and THEN I'll assign them to microcode bits.

My IBM1130 emulation has about 80 enable bits and more than half are encoded: use 3 bits to represent 8 alternatives, etc.  That decoder is fairly cheap in an FPGA, not so cheap in discretes.  Actually, under the covers, the FPGA will assign one bit per alternative (known as one-hot encoding) so the overall word length is probably around 150 bits - just guessing, too many to count.
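
A small Python sketch of what an encoded field looks like. The source and destination names are hypothetical, and the layout (3 bits for who drives the bus, 3 bits for who loads from it) is just one possible assumption:

```python
SOURCES = ["NONE", "PC", "A", "B", "ALU", "RAM", "IR", "SP"]  # hypothetical bus drivers
DESTS = ["NONE", "PC", "A", "B", "MAR", "RAM", "IR", "SP"]    # hypothetical bus loads


def decode(word):
    """Expand two 3-bit fields into 8 output enables and 8 load enables."""
    src = word & 0b111
    dst = (word >> 3) & 0b111
    out_enables = [int(i == src) for i in range(8)]   # exactly one driver on the bus
    load_enables = [int(i == dst) for i in range(8)]  # exactly one register loads
    return out_enables, load_enables


# "A -> MAR": source field = 2 (A), destination field = 4 (MAR)
word = (DESTS.index("MAR") << 3) | SOURCES.index("A")
outs, loads = decode(word)
```

Six microcode bits stand in for sixteen enable lines here. This only works for signals that are never active at the same time, which is naturally true for bus drivers. Signals that must be asserted together need separate fields or their own bits.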

Adding external logic to reduce the width of the microcode word adds more chips to the design (decoders, AND, OR, etc) and really complicates the discussion.  In every case, speed, simplicity, whatever, wide is better.  Except that the usage will necessarily be sparse.

The state machine has 116 states and it is one-hot encoded.  So when I save the return state to a register, that register is 116 bits wide.  Ouch!

So, 116 words of 150 bits.  Not that bad really!  20 chips of 256x8 will do nicely.

Bigger CPUs will require more microcode.  The 1130 has a fairly limited instruction set and is essentially register-less except for those 3 index registers in RAM and the usual Accumulator, Extension, Program Counter, Memory Address Register and Memory Buffer Register.  No array of general purpose registers.

Let me show you a little about testing conditions in microcode - ok, in VHDL but the idea is the same (file attached):

When I get to the FETCH state, I first look at the DisplaySwitch and if it is set, I simply fetch the contents of memory and display it.  The address comes from the console entry switches.  You can see the A_BusCtrl field is set to select the console entry switches with the A_BUS_CES selection of the A bus MUX (29 inputs used).  I then load the Storage Address Register and head off to a MemRd.  I return to state s0c which just puts the contents just read in the Storage Buffer Register (code not shown, not important).

The point is, my microcode word would need a bit called TestDisplaySwitch which would be gated with the actual switch signal to determine whether to do those things above or continue to the next bit of code when another bit tests for a PendingInterrupt.  I can do this and still have the ability to do the actual fetch in the same FSM cycle because I can have multiple next address fields (s0c, s1a, s2).  You probably can't do it that way in microcode so you will need more cycles.  Test the DisplaySwitch and move to Test for Pending Interrupt and move to the actual Fetch.   Three cycles unless you get very creative.  But notice that each of the two tests will have a separate next address field and the logic that determines whether to get the next microcode word from the PC or next address field will have 2 more inputs.  That is going to be a very wide OR gate with LOTS of 2-input AND gates feeding in test results.  There are a lot of decisions to make and every one of them will involve selecting which address to use next.

In each of these blocks, you can see where various outputs are being manipulated.  This is just exactly what your microcode has to do.

Hope this gives you something of value!


Title: Re: Home build microcode design
Post by: rstofer on August 15, 2017, 08:28:44 pm
As I said above, the FPGA code will result in a very wide next address field because of one-hot encoding.  The synthesizer does this so that exactly one bit is high at each state.  In the case of microcode, the next address field will be a binary number so maybe only 8 bits wide for 256 lines of microcode.  The need for one-hot disappears because the wide microcode word already deals with selecting individual control signals.  So, maybe the memory is 256x96 or a dozen ROMs.  Maybe even less.  Don't let the numbers I posted above concern you too much, your code won't look much like the VHDL implementation.  But the ideas are the same.

You can use a MUX on the test signals if you are only testing one signal at a time.  So, if you had 16 discrete tests, you could use a 16:1 MUX and control it with 4 bits.  This replaces the large OR gate and the quantity of AND gates.  If you needed 32 tests, you could use 2 MUXen and just OR the output to the address selection MUX.  This might be a little more difficult if there are 4 inputs to the address MUX.  You might use 2 microcode bits to select one of 4 address choices and the output of the test MUX to enable the selection.  Just make sure it defaults to the program counter.
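
The two ways of selecting a test can be compared with a short Python sketch (function names are made up; conditions are modelled as a list of 0/1 values):

```python
def branch_and_or(test_bits, conditions):
    """One microcode bit per condition: AND each pair, then OR all the results."""
    return int(any(t and c for t, c in zip(test_bits, conditions)))


def branch_mux(select, conditions):
    """16:1 MUX: a 4-bit microcode field picks the single condition to test."""
    return conditions[select & 0xF]


conditions = [0] * 16
conditions[3] = 1  # say input 3 is the carry flag and it is currently set


def one_hot(n):
    return [int(i == n) for i in range(16)]
```

For a single test the two give the same answer, but the MUX needs 4 microcode bits where the AND/OR version needs 16. The AND/OR version can test several conditions in one microinstruction, which the MUX cannot.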
Title: Re: Home build microcode design
Post by: bitman on August 17, 2017, 12:27:10 am
Quote
That decoder is fairly cheap in an FPGA, not so cheap in discretes.  Actually, under the covers, the FPGA will assign one bit per alternative (known as one-hot encoding) so the overall word length is probably around 150 bits - just guessing, too many to count.
I'll go back to reading here, but I do want to point out that I'm just playing with simple 7400 series ICs here. No FPGA, no tricks up the sleeve - just plain old 1970ies technology to show the basics of a processor (granted, I have RAM in my setup but let's not be that precise). FPGA is way beyond my "little" project and its capabilities will not help here. I had the same thoughts about using the 181 because a lot of the complexity is hidden inside the chips and hard to demo - just having a simple adder and XOR for subtraction would have allowed me to show easily how things are done electronically. But I wanted more features than just those two operations so I eventually opted for the 181. Going to FPGA is overkill. Do they even make them with "thru-hole" setups?

But again I have to thank you for the tremendous amount of information here. It's very informative and I've picked up a lot of tricks I wasn't aware of before.  Some of the books mentioned on this thread are on order, but won't be delivered for weeks - so in the mean time I'll fool around with my 7400 series chips.
Title: Re: Home build microcode design
Post by: rstofer on August 17, 2017, 01:00:10 am
My FPGA example wasn't intended to sell FPGAs, just illustrate that there will be a good deal of condition testing, even in microcode.  I was thinking that every place I have an IF or ELSIF statement, I am testing something external to the current state and making a decision on where to branch next.  The exact same thing happens in microcode.

You will most certainly have conditional instructions (JZ, JNZ, JC, JNC, JP, JNP) and potentially DJNZ.  Or maybe there are repetitive MOV statements.  All of these will require testing and branching within microcode.

It's easy to think through the simple LDA, STA, arithmetic and logical instructions, no branching is required in the microcode.  This test/branch issue will start to show up when you begin decomposing the instruction set into steps.
Title: Re: Home build microcode design
Post by: C on August 17, 2017, 01:15:21 am

I think you would gain a lot watching Video I posted.

Video talks about micro sequencing so that you can have many steps for each instruction.
This one simple idea, lets you use much simpler logic. 

Think of your microcode
If you latch the output of your EPROM microcode, then a clock can start the EPROM changing to a new address while the output of the latches controls your logic for a clock cycle.
This time from start address change to latch data output sets min time for a clock cycle.

Video shows this and answers other questions you have asked.

Need to keep in mind Video is about newer CPU where the PDP-11 was built from simple TTL with the fancy chip being the 181



Title: Re: Home build microcode design
Post by: rstofer on August 17, 2017, 04:22:35 pm
The Video linked above mentions Patterson & Hennessy and the appendix where a microcoded MIPS is discussed.  My hardcopy of P&H doesn't have it nor does this 900+ page PDF

http://nsec.sjtu.edu.cn/data/MK.Computer.Organization.and.Design.4th.Edition.Oct.2011.pdf (http://nsec.sjtu.edu.cn/data/MK.Computer.Organization.and.Design.4th.Edition.Oct.2011.pdf)

This may take a while to download but it's free!

I found the appendix here:

http://www.cs.tufts.edu/comp/140/files/Appendix-D.pdf (http://www.cs.tufts.edu/comp/140/files/Appendix-D.pdf)

Glossing over the material, it appears that there are only 10 states in the MIPS control unit and only a couple of dozen (less, actually) control outputs.  This is possible because the MIPS is a RISC machine.

There are other documents on the Internet that, along with the book above, work to describe the MIPS architecture.  It is notable that Microchip uses the architecture in their PIC32 processors.

Figure 2.4.6 shows a separate adder used to form microprogram_counter + 1.  I don't know (yet) if the counter could be implemented as a counter/register.

It would take some time to really understand the Appendix.

Dr. Hennessy was one of the founders of MIPS Computer Systems Inc.
Title: Re: Home build microcode design
Post by: rstofer on August 17, 2017, 06:36:13 pm
That video linked above is well worth the time. 

For a simple machine like the LC3b, the microcode is short (31 instructions) and directly to the point.  Note that there is no microcode PC, the next address field is used in every instruction.  This is a nice way to go when the microcode is short.  But what happens in these very compact code sequences is that the programmer has to assign specific addresses early on (like 0,1,2,...15 for the op code branches) and, necessarily, the initial microcode instruction address for the LC3b coming out of reset isn't 0, it starts at 18.  What isn't shown is how that comes to be.  It can't start at 0 because that is the branch address for instruction code 0.

Here is a description of the microarchitecture of the LC3b there is much more available on the Internet:

http://users.ece.utexas.edu/~patt/05f.360N/handouts/360n.appC.pdf (http://users.ece.utexas.edu/~patt/05f.360N/handouts/360n.appC.pdf)

Note that the ISA for the machine is very basic but it does have 8 registers in a bank along with a distinct PC (page 7).

This is a very approachable CPU.  I would certainly be more interested in this design as a learning tool than in some truly hairy CPU like the Z80.

Other than as a notational tool, I don't care for "don't cares".  I assume that the "X"s will be replaced by "0"s when the ROM is programmed.  "Don't cares" make sense in Karnaugh Maps where I can use them to reduce the logic.


There is a C compiler available for LC3b.
Title: Re: Home build microcode design
Post by: rstofer on August 17, 2017, 06:45:51 pm
I have been looking for a project.  Given that there is a C compiler for LC3b, maybe I should implement it in an FPGA using the structure of a microcoded design.  That will be FUN!
Title: Re: Home build microcode design
Post by: C on August 17, 2017, 06:58:16 pm
 rstofer

Some people think that a RISC machine is simpler than a CISC machine.

I look at that statement as a loaded question.

For example, Today I see most designs being based on a fixed clock.
This changes what is possible in a design.

The PDP-11 is not a Fixed Clock machine.
The microcode has a field that selects what delay is needed for each microcode step! In place of one fixed critical path time, you have many different times.
The PDP-11 uses a delay line as its clock source. The selected output of this delay line becomes the next input to the delay line.
With this, you can have a microcode step where only the microcode address change sets the critical path time, and another step that adds the delay of the 181 for a second, longer critical path time.
The CPU logic is still clocked logic, but the clock is not one fixed length of time; it is one of many fixed time delays.
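
A rough sketch in Python of what a per-step delay field buys. The tap names, the nanosecond values and the step list are invented for illustration; they are not PDP-11 timings.

```python
# Hypothetical delay-line taps in nanoseconds; real timings differ.
TAP_NS = {"short": 150, "medium": 200, "long": 300}

# Each microstep names the tap it needs (assumed steps, for illustration).
STEPS = [
    ("next microaddress only", "short"),
    ("register to register move", "medium"),
    ("ALU (181) operation", "long"),
    ("next microaddress only", "short"),
]


def variable_clock_ns(steps):
    """Total time when each step selects its own delay."""
    return sum(TAP_NS[tap] for _, tap in steps)


def fixed_clock_ns(steps):
    """Total time when every step runs at the slowest step's period."""
    worst = max(TAP_NS[tap] for _, tap in steps)
    return worst * len(steps)
```

For these made-up numbers the four steps take 800 ns with selectable delays and 1200 ns with a single fixed clock sized for the ALU step.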

So a small change can change the design rules.
This change also lets microcode do more with less logic by using more steps.

Adding rows & bits to microcode can be a lot cheaper in chip count than adding a bunch of logic.

To restate what I have said:
The PDP-11 makes 8 registers visible to the ISA.
The PC, one of the 8, is made into the PC register by different microcode.
Any one of the 8 registers can be a stack register. It's the microcode that connects the PUSH/POP instructions to the stack register.

To make the logic in the CPU simpler, 16 registers are used internal to the CPU.
The address bus, data bus, control bus & IR registers are some of the hidden 8 registers.

So should you look at MIPS if you are building with TTL chips and not a CHIP where you have many transistors?

With microcode and micro sequencing, you can build a CPU with a lot fewer chips and use the 181 ALU many places for its math & logic, reducing chip count more.

From a teaching point of view, starting from very simple that works, with options to add more parallel processing later, would be easier to teach.

Figure 2.4.6 shows a separate adder used to form microprogram_counter + 1.  I don't know (yet) if the counter could be implemented as a counter/register.

With micro sequencing, you can use the one  ALU to add to the PC while instruction fetch is happening.

As I see it, if you are building from TTL chips, fewer chips is easier to build and teach.
Using a register file where possible takes far fewer chips than having many registers with many inputs & outputs.

If the CPU is built around an ALU based on the 181, it is smart to use the 181 as much as possible.

Need to keep in mind that LC3b is a 16-bit RISC CPU.
Quote
That video linked above is well worth the time.

Quote
I would certainly be more interested in this design as a learning tool than in some truly hairy CPU like the Z80.
Hairy Z80
Are you comparing apples to oranges?
An 8-bit base instruction set size has a lot of limits due to being 8-bits wide.
Think you would find that Z80 in microcode with micro sequencing is very simple.
The base instruction set is the 8080 where
D7, D6 is an instruction field used to select 4 groups of instructions.
D5,D4,D3 and D2,D1,D0 are two octal based fields.
The instruction set is simple when you use octal to look at instructions.
What Z80 added to 8080 instruction set is also simple & logical.
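
The octal view can be sketched in a few lines of Python. The function names are hypothetical; the register order in `REGS` is the standard 8080 encoding for the two 3-bit fields, and only the MOV group is decoded here.

```python
def split_fields(opcode):
    """Split an 8-bit 8080 op code into (group, field_hi, field_lo)."""
    group = (opcode >> 6) & 0b11       # D7, D6
    field_hi = (opcode >> 3) & 0b111   # D5, D4, D3
    field_lo = opcode & 0b111          # D2, D1, D0
    return group, field_hi, field_lo


# Register encoding used by the 8080 for the two 3-bit fields.
REGS = ["B", "C", "D", "E", "H", "L", "M", "A"]


def describe_mov(opcode):
    """Name a group-1 (MOV) op code, or return None for anything else."""
    group, dst, src = split_fields(opcode)
    if group != 1 or opcode == 0o166:  # 0o166 (0x76) is HLT, not MOV M,M
        return None
    return f"MOV {REGS[dst]},{REGS[src]}"
```

Written in octal, 0x78 is 170: group 1, destination 7 (A), source 0 (B), so `describe_mov(0x78)` gives `MOV A,B`.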


Title: Re: Home build microcode design
Post by: rstofer on August 17, 2017, 08:48:56 pm
First up, the COMPLETE LC3b takes 52 states, not just the first 31.  This is caused by interrupt and trap handling.  It is described in Appendix C.  http://highered.mheducation.com/sites/0072467509/student_view0/appendices_a__b__c__d____e.html

Second, by CISC, I mean CPUs that have complex instructions like DJNZ or REP MOV, instructions that can take a variable number of cycles (states) and require more condition testing.  Do you interrupt a complex instruction?  How do you recover?  Can a REP MOV instruction destroy interrupt latency?  That kind of thing...

The RISC term, as I intend it, is used to describe CPUs where all computing is done between registers.  Those registers are loaded and stored (LOAD and STORE architecture) but no computation is done using memory as an operand.  Operands are in registers.  This allows overlapped fetch and ultimately simplifies pipelining.  The registers and ALU can be doing whatever while the fetch and decode units are doing their thing.  Not that this is the only way of doing things.

I haven't had this much fun since grad school!  It's been more than 40 years but we spent a great deal of time discussing the PDP-11.  This was primarily due to it being understandable with a reasonably orthogonal ISA.  I never used a PDP-11 (sadly)...  My low end experience is with the IBM 1130 and, on the upper end, the CDC 6400.  That 6400 seemed so fast, at the time ('71), but it was only about 1 MIPS.  Lunar Lander on the console was a lot of fun!

This LC3b is a very complete CPU.  It has just about all the features of any modern CPU (ARM, etc) except for virtual memory.

Title: Re: Home build microcode design
Post by: rstofer on August 17, 2017, 09:14:10 pm
One of the reasons I consider the Z80 to be 'hairy' is the fact that a partial instruction decode is necessary just to determine how long the instruction is.  It can be 1,2,3 or 4 bytes.  Just the instruction fetch cycle requires a lot of thought.  And the first byte could just be an escape code (0xED) and the real op code is in the second byte.

Instructions like LDIR require the microcode to count and loop.

The register to register instructions are straightforward.  It's all the instructions around the outside that take a lot of thought.  In any event, I linked to a book earlier in the thread that breaks it all down using spreadsheets.  It's a very nice approach!

Title: Re: Home build microcode design
Post by: C on August 17, 2017, 10:52:39 pm

Does the Z80 have one instruction set?

If you look at problem a different way Z80 has 4 or 5 instruction sets.

0xED could be thought of as a one-instruction switch to a different instruction set.

ARM does an instruction to switch and stay in thumb instruction set.
This costs a thumb instruction to switch back.

Quote
Instructions like LDIR require the microcode to count and loop.

Why does it have to count?
You have two memory cycles. While these are happening, you need to do two adds and one subtract with the idle ALU and check the Zero output of the ALU one time.
Nothing new, just not using an instruction to cause a conditional jump on Zero.
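
For reference, a behavioural sketch of LDIR in Python, showing the two memory cycles, the two adds, the one subtract and the single zero test per byte. It models what the instruction does, not how any real microcode schedules it; the function name and the `bytearray` standing in for memory are assumptions.

```python
def ldir(mem, hl, de, bc):
    """Copy bytes as LDIR does; returns the final (hl, de, bc).

    mem is a bytearray standing in for the 64K address space.
    """
    while True:
        mem[de] = mem[hl]          # memory read, then memory write
        hl = (hl + 1) & 0xFFFF     # add 1 to the source pointer
        de = (de + 1) & 0xFFFF     # add 1 to the destination pointer
        bc = (bc - 1) & 0xFFFF     # subtract 1, then test for zero
        if bc == 0:
            return hl, de, bc
```

Copying 3 bytes from address 0 to address 8 leaves HL = 3, DE = 11 and BC = 0.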

Title: Re: Home build microcode design
Post by: bitman on August 18, 2017, 02:55:03 am
Quote
I think you would gain a lot watching Video I posted.
I'm about 30-40 minutes in. So far nothing but bullet points about stuff he's going to talk about and attempts at humor that seem to undercut his obvious knowledge.  I'm not done - and I'm just responding to the few issues I think need to be addressed, in order to set expectations.
I'm by no means ignoring the advice here. I've ordered a few books, and given videos inspired me to do this, I've watched A LOT of them already and more to come.
Title: Re: Home build microcode design
Post by: C on August 18, 2017, 04:27:55 am

There is a short spot in the video talking about micro sequencing.

With this idea the microcode can act like a computer with parallel tasks.

If done correctly, your ALU, the 181s, can be used many places in the microcode with little to no cost.

For example:
During instruction fetch while waiting on memory, the ALU can be updating the PC.
You already have the ALU just sitting there, it's just some microcode to add one to PC.

Note that by using the 181 you can select one of two inputs.

You need to keep asking, "can the 181 ALU do this and save some added logic?"

Instead of starting small and adding as needs arise, try thinking huge and then look at what it costs to reduce it.

With an 8-bit wide ALU, which makes more sense to save chips: adding even more hardware to get 16 bits where needed, or doing two steps with 8 bits to get the 16 bits?
Proper design of an 8-bit ALU will let you handle any size in chunks of 8 bits.
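
A small Python sketch of the two-step idea: one 8-bit add per microstep, with the carry out of one step fed into the next. The function names are hypothetical, and `alu8_add` only models the add function of an 8-bit ALU slice.

```python
def alu8_add(a, b, carry_in):
    """8-bit add with carry; returns (result, carry_out)."""
    total = (a & 0xFF) + (b & 0xFF) + carry_in
    return total & 0xFF, total >> 8


def add_wide(a, b, n_bytes=2):
    """Add two n_bytes-wide values, one 8-bit ALU pass per microstep."""
    result, carry = 0, 0
    for i in range(n_bytes):
        byte, carry = alu8_add(a >> (8 * i), b >> (8 * i), carry)
        result |= byte << (8 * i)
    return result, carry
```

For example, `add_wide(0x12FF, 0x0001)` returns `(0x1300, 0)`: the low-byte pass produces 0x00 with a carry, and the high-byte pass adds that carry to 0x12.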

How you look at a problem can really affect what you see as a cure to a problem.
 
I think for your project, a low chip count is important. Being able to use the ALU many places for its math and logic can remove the need for other logic.

Can your existing ALU design be used to operate on any size taking 8-bit steps?
Can your microcode set all 5-bits of control inputs?
Can your microcode direct anything to the inputs of the ALU?
Can your microcode direct the output of ALU to anyplace?

Can your microcode be changing to the next state while the current state is being processed?

Need to keep in mind that you are more interested in a low chip count, where a lot of videos are about how best to use the many transistors on a chip.

I think you have spent a lot of time to get where you are at.
If you set it to the side, can you see a better way that uses fewer chips?

What often looks complicated to do in logic is at times very easy in microcode with less logic.

You have talked about reducing width of microcode.
Each 8-bits in width costs one chip. How many chips will be needed to reduce the width?

 
Title: Re: Home build microcode design
Post by: rstofer on August 18, 2017, 02:59:15 pm
The more places you use the 181s, the wider the ALU input MUXen will need to be.  At some point, you will have a MUX on both the A & B inputs but the question is how wide they need to be.

Again, one common operation is to increment the PC.  This can be done with the main 181s, a separate adder (only) or, even easier, by using a parallel loadable counter.  I would go with the counter.  Adding one is nearly free and you'll need a register for the PC in any event.

If, OTOH, you use a register file chip, it would seem reasonable to have the PC in the file.  This might be a mistake because it won't allow the PC value to be updated while the ALU is using other registers.  So, I would leave the PC outside of the file.  Or, you take more cycles with more microcode.

There is no reason why the 181s can't add 1 to the PC.  Just don't update the condition codes.  You need to update those only when specific instructions require the update.

One cool thing about the register file chip is that, in essence, it already has a MUX on the output.  You can select which of 16 registers are on the output pins.  You may still want to have parallel banks of registers but that depends on how you implement the microcode.

Patt & Patel Appendix C, pages 568 and 570 (linked yesterday), show a very diagrammatic way of approaching the FSM.  The Verilog book I linked earlier shows a very nice spreadsheet approach.  But the absolute first thing that needs to be done is to decompose each instruction into little steps - one per clock cycle.  Once that is done, the hardware requirements will be obvious.  You will know how data has to flow through the datapath and all the microcode has to do is manage the traffic.

If you try to use the approach of page 568, you don't have to leave the decode state in a star arrangement as shown.  You can take each of the sequences and draw them as parallel vertical stripes.  P&P doesn't need to do that because there are so few instructions.

The video gets better....  Still, it's only an overview of the sequencer but it does show how the various bits in the microcode control which register is gated on the bus, which register is loaded from the bus, how the next address is worked out and so on.  Remember, certain states are predeclared (0..15) so the code needs to branch all over the place.  Absolutely nothing will be sequential (look at 15,28,30).  It is up to the programmer to figure out the addresses.  Note in the video how the condition codes modify the next address.  Again, these result in fixed addresses so assigning addresses is going to be a large part of a design like this.  But the microcode is compact!  Fifty two states and 49 control signals for a full RISC CPU that handles traps, system calls, priority interrupts, IO devices, several registers and looks like every other RISC machine (ARM, MIPS, etc).

The alternative is to branch to somewhat wider spaced addresses (say 8 x op code) so that each instruction starts with 8 dedicated sequential addresses.  Then if you need more, you can branch to some pool of addresses way out yonder.  Instead of bouncing all over the place, you can organize your code in a nice linear sequence.  Since the code store is cheap and you're not looking for speed, why not see how many states the longest instruction takes, round up to a power of two and space all the instructions that far apart?  The only downside is that the next address field gets wider and requires a wider ROM bank and address register.
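
The address arithmetic for that layout, sketched in Python. The function names are hypothetical, and `address_bits` covers only the dispatch area, not any overflow pool placed beyond it.

```python
def slot_size(longest_instruction_states):
    """Round the longest instruction up to the next power of two."""
    size = 1
    while size < longest_instruction_states:
        size *= 2
    return size


def entry_address(opcode, size):
    """First microcode address for an op code when slots are `size` apart."""
    return opcode * size


def address_bits(n_opcodes, size):
    """Width of the microcode address needed for the dispatch area alone."""
    return (n_opcodes * size - 1).bit_length()
```

If the longest instruction takes 6 states, the slot size is 8, op code 3 starts at address 24, and 16 op codes need a 7-bit address. With a power-of-two slot the entry address is just the op code with zeros appended, so no adder or multiplier is involved.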

One advantage of the longer linear arrangement is that you can work on one instruction at a time.  One of the reasons my 1130 is so large is that I actually did work one instruction at a time.  I made no attempt to share common subsequences, I just took the most straightforward approach to coding the FSM.

One issue with tightly compact microcode is that you are truly screwed if you overlooked a state.  You can't simply add another inline statement, you need to branch to some empty location.  Your thought process is strung out all over the place.  Since the next address is programmer supplied, there is no need for a microcode program counter.

For my 1130 project, all of the thinking about each instruction is all in one place.  Certainly not a minimal design and clearly not a workable approach for minimal hardware.

Reading around the Internet, I see claims that the Z80 was random logic and the Z80A was microcoded.  I don't know if it is true.


Title: Re: Home build microcode design
Post by: westfw on August 19, 2017, 12:50:47 am
Quote
I wanted to ask here what a typical design would look like
I guess that this may be stale by now, but you're essentially looking at the difference between "vertical" and "horizontal" microcode.
Vertical uses "many" shorter microoperations to execute an instruction, while "horizontal" microcode has wider microoperations that do a lot of things in parallel.   Assuming a horizontal scheme, it was common to have "very wide" microcode.  Here's a photo of one of cisco's microcoded interface cards from the late 1980s (one of quite a few similar designs):

I'm counting ... 104 bit wide microcode?  13x 8bit-wide highspeed EPROMS.
Now, probably 16bits of that was a RAM address, and 14bits or so was the "next address" field, but...
(Given the history, this was probably similar to the microcode architecture used by DEC in their 36bit mainframes...)
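
A sketch in Python of the bit-count trade-off behind the original question. Encoding a group of signals into a field that feeds a 1-of-n decoder is one common way to get narrower microwords; it is shown here as an illustration, not as how the card above was built. The function names are hypothetical.

```python
def horizontal_enables(word, n_signals=8):
    """One bit per signal: any combination can be active in one step."""
    return [bool((word >> i) & 1) for i in range(n_signals)]


def encoded_enables(field, n_signals=8):
    """Encoded field through a 1-of-n decoder: exactly one signal active."""
    return [i == field for i in range(n_signals)]


def bits_needed(n_signals, encoded):
    """Microword bits spent on a group of n_signals control lines."""
    return (n_signals - 1).bit_length() if encoded else n_signals
```

Twenty-four signals cost 24 bits one-hot but only 5 bits if they can all go through one decoder. The price is that only one signal in an encoded group can be active per step, so signals that must be asserted together (two register enables, say) have to sit in different fields.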

Title: Re: Home build microcode design
Post by: rstofer on August 19, 2017, 05:00:48 pm
I just finished coding up LC3 in Vivado, probably for the Nexys4_DDR (it has switches, LEDs and 8 digits of 7 segment displays).  I have only completed the basic instructions, I figure I should test those before I start on interrupts and privilege levels.  A couple of thoughts:

First, I wanted to think about the CPU in terms of microcoding but I didn't really want to put microcode in BlockRAM.  But as I coded the conventional state machine approach, I was struck by how similar the coding is to microcode written with a meta-assembler.  In true microcode, we define a wide word and break it up into named fields.  We assign a default value to either the entire word or to specific fields.  If a microcode step doesn't deliberately define a value for a field, it assumes the default value.  This is EXACTLY the case with VHDL - in the combinatorial part of the FSM, we MUST define default values to the fields to prevent the inference of latches.  Then within the Case statement each 'when' statement is a single line of microcode.  We set values to the various fields that need to be considered in the state and we leave the others as default.  The only physical difference is that I don't usually define one great big word, rather I name just a bunch of fields and I don't care how they are organized because it doesn't matter.
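
The "wide word with named fields and defaults" idea can be sketched in Python as a toy meta-assembler. The field names, offsets and widths below are invented for illustration and do not come from the LC3 or any real design.

```python
# Hypothetical field layout: name -> (bit offset, width, default value).
FIELDS = {
    "next_addr": (0, 6, 0),
    "alu_op":    (6, 5, 0),
    "ld_reg":    (11, 1, 0),
    "gate_alu":  (12, 1, 0),
}


def microword(**overrides):
    """Pack one microcode line; fields not named keep their defaults."""
    word = 0
    for name, (offset, width, default) in FIELDS.items():
        value = overrides.pop(name, default)
        if value >= 1 << width:
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << offset
    if overrides:
        raise KeyError(f"unknown fields: {sorted(overrides)}")
    return word
```

A line such as `microword(next_addr=18, ld_reg=1)` plays the role of one 'when' branch: it sets the two fields that matter in that state and leaves the rest at their defaults, giving 2066 (bit 11 plus 18) for this layout.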

If, OTOH, I did define a great big word, I could break it up into fields using the 'alias' construct.  Then I could deal with individual fields exactly as I do now.

Bottom line, there is no difference between microcoding and writing FPGA code.  The very same things happen in the same cycle either way.

Second, Appendices A & C from Patt & Patel (linked earlier) are an EXCELLENT introduction to microcoding.  Appendix A has the Op Codes and what they are supposed to do and Appendix C describes the hardware layout and the microcode transition diagram.  There is a worksheet for filling in the '1's and '0's of the microcode.  Just completing the primary instructions will be a terrific introduction to microcode.  There isn't always a direct path from one part of the system to another.  Sometimes you need to route the data through MUXen and adders (where, in a couple of cases you add 0 and in one case the main ALU just does a passthrough, HINT!).

This all ties together to become an excellent introduction to hardware design (a simple single bus multi-cycle architecture), FSM layout and microcoding.  It is well worth the time even if it isn't the appropriate target.  The video gives an overview of the process but the professor only codes about 4 states - and he did the easy ones!

Now I need to come up with a way to test it...

To those that start a similar project with Vivado, I will give a huge HINT!  There are syntax errors that cause the file to be removed from the project hierarchy and placed in a non-module status.  One example might be a missing ')'.  The JIT syntax analyzer doesn't always catch it and as soon as you save the file, it is removed from the hierarchy!  No power on earth will get it back in the source tree!  You need to go through your most recently added code and look for syntax errors.  Once corrected, the file will be automatically restored to the hierarchy.  I'll bet I spent an hour looking through Google for a solution and, if it's there, I missed it.  I eventually discovered how it came to be and I started commenting out blocks of code until it was restored and then removing the commenting until it failed.  Sure, it sounds easy now but it took me a while to figure out what was happening.  No such error ever occurred with ISE!