I am thinking about writing some emulation for an 8080 type of project. I've always loved the idea of one piece of equipment emulating another, but I've never thought about what type of approach to take to write code to do it.
Here are my thoughts on it. Please suggest refinements, or point out anything I've got wrong or can improve.
First, I plan on implementing each IC as what I will call a device. I plan on writing it in C, but I'm going to take a class-style approach: all of a device's internal data goes into a structure, and each function is called with a pointer to that structure, like "void i8080_clk(i8080statetype *i8080state);". This would allow me to reuse the function if I wanted multiple devices of the same type. The structure would contain all the internal data for the device and also what I am going to call pins.
A pin is an interface to the outside of the device. A pin will be a struct itself containing 4 variables: direction, output, invertoutput, and netid. Direction and output are set by the device itself during the course of operation. Invertoutput and netid are configured by the initial process that creates what I am calling a net to link devices together.
A net has the properties of type, invertoutput, state, laststate, and a list of pointers to all the device pins that are connected to it. Type is complicated in that it can be a net (pullup or pulldown) or a gate (and, or, xor). A pullup net, for example, starts with a state of 1 and looks for any attached pin set to output low, in which case the net goes low. If it finds two outputs set differently (a short!), it would cause a debug trap. A pulldown net starts low and looks for output-high pins to set its state to high. Typically a net should only have one device with its pin set as output at any given time. The gate types do what you would expect: evaluate all the attached device pins and determine an output.
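A minimal sketch of the pin and net structures described above, covering only the pullup and pulldown types. The names `pin_dir`, `net_type`, `MAX_NET_PINS`, `net_attach`, `net_resolve`, and the `shorted` flag (standing in for the debug trap) are assumptions made for illustration; the gate types and the net's own invertoutput are left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>

enum pin_dir  { PIN_INPUT, PIN_OUTPUT };
enum net_type { NET_PULLUP, NET_PULLDOWN };

struct pin {
    enum pin_dir direction;  /* set by the owning device while it runs */
    int output;              /* level driven when direction == PIN_OUTPUT */
    int invertoutput;        /* configured when the nets are wired up */
    int netid;               /* configured when the nets are wired up */
};

#define MAX_NET_PINS 8       /* hypothetical fixed limit for this sketch */

struct net {
    enum net_type type;
    int state;
    int laststate;
    int shorted;             /* hypothetical: 1 if outputs disagree */
    size_t npins;
    struct pin *pins[MAX_NET_PINS];
};

/* Connect a pin to a net and record the net id in the pin. */
static void net_attach(struct net *n, struct pin *p, int netid)
{
    assert(n->npins < MAX_NET_PINS);
    p->netid = netid;
    n->pins[n->npins++] = p;
}

/* Recompute the net state from every attached pin that is an output. */
static void net_resolve(struct net *n)
{
    int seen_high = 0, seen_low = 0;

    for (size_t i = 0; i < n->npins; i++) {
        const struct pin *p = n->pins[i];
        if (p->direction != PIN_OUTPUT)
            continue;                      /* inputs do not drive the net */
        int level = (p->output != 0) ^ (p->invertoutput != 0);
        if (level)
            seen_high = 1;
        else
            seen_low = 1;
    }

    n->laststate = n->state;
    n->shorted = seen_high && seen_low;    /* two outputs set differently */
    if (n->shorted)
        return;                            /* keep old state; caller can trap */
    if (seen_high)
        n->state = 1;
    else if (seen_low)
        n->state = 0;
    else                                   /* nobody driving: resistor wins */
        n->state = (n->type == NET_PULLUP) ? 1 : 0;
}
```

An update_nets() pass could then simply call `net_resolve` on every net and check `shorted` afterwards.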
One issue I was concerned with is emulating the parallel nature of hardware with sequential processing. If you have two devices that are clocked by the same clock, you can't run them at the same time; you have to call them one after the other:
if (sharedclock)
{
i8080_clk(&i8080state);
device2_clk(&devicestate);
}
The issue here is that each device may need to make decisions based on pin inputs, and those inputs should not change while either function executes. I originally thought I'd have to keep something like a "state" and a "nextstate", but I think the nets that I am using to link pins together can accomplish the same thing. If I change the code to:
if (sharedclock)
{
i8080_clk(&i8080state);
device2_clk(&devicestate);
update_nets();
}
If each of the clk functions reads the net states for its input pins, it won't matter that i8080_clk changes its output pins, because the changes will not take effect until update_nets() is finally called. So any devices that are clocked or latched together must use this type of mechanism, where each clk function is called and an update_nets function is called at the end.
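To check that the order of the clk calls really stops mattering, here is a minimal sketch using two cross-coupled flip-flops instead of real ICs. The `wire` and `dff` types, the `pending` field, and this simplified `update_nets` signature are assumptions for illustration (one driver per wire, no pullups or short detection).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified net: devices read 'state' and write 'pending'. */
struct wire {
    int state;    /* committed value, stable while the clk functions run */
    int pending;  /* value staged by the driving device */
};

/* Hypothetical device: a D flip-flop that copies input d to output q. */
struct dff {
    struct wire *d;
    struct wire *q;
};

/* Clock one device: sample the committed input, stage the new output. */
static void dff_clk(struct dff *f)
{
    assert(f->d && f->q);
    f->q->pending = f->d->state;
}

/* Commit every staged value so it becomes visible for the next clock. */
static void update_nets(struct wire *w, size_t n)
{
    for (size_t i = 0; i < n; i++)
        w[i].state = w[i].pending;
}
```

With wire x driven by device A (which reads y) and wire y driven by device B (which reads x), one clock swaps the two values whether A or B is called first. Without the staging step, whichever device ran second would see the first one's new output.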
Then I was thinking about memory and how an i8080 communicates with it. It uses a RD signal which is basically attached to the memory IC which enables the memory output, right? This signal is a latch which is just like a clock in that it is in its own latch/clock domain, right? So a new section would need to be added below the above section:
if (net(RD)==high) //comparison, not assignment
{
memory_update(&memorystate);
update_nets();
}
In this case, if the i8080 sets its RD signal high and update_nets updates the net state for it, the next section checks whether the RD state is high, which is essentially what clocks/latches/enables the memory device. I called the function _update because it isn't really being clocked per se; it is just responding to RD going high. In this case, it would read the address pins, grab the data at that address, put it on its data pins, and set its data pins to output. The next update_nets() would adjust the data net states to reflect the data being presented by the memory. Obviously the i8080_clk call that enabled the RD signal also needs to switch the CPU's data pins to input. On the next execution of i8080_clk, it can sample the data nets and acquire the data.
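A minimal sketch of the memory side of that read sequence. To stay short it collapses the address, data, and RD nets into a plain `bus` struct, and the names `bus`, `data_is_output`, and `data_out` are assumptions. It also assumes memory_update is called on every pass rather than only while RD is high, so the memory can release its data pins when RD drops.

```c
#include <assert.h>
#include <stdint.h>

/* Committed net states that devices read from (simplified to fields). */
struct bus {
    uint16_t addr;
    uint8_t  data;
    int      rd;
};

struct memory {
    uint8_t cells[65536];
    int     data_is_output;  /* direction of the memory's data pins */
    uint8_t data_out;        /* value staged on the data pins */
};

/* Respond to the current RD level; nothing is committed to the bus yet. */
static void memory_update(struct memory *m, const struct bus *b)
{
    assert(m && b);
    if (b->rd) {
        m->data_out = m->cells[b->addr];  /* fetch the addressed byte */
        m->data_is_output = 1;            /* start driving the data pins */
    } else {
        m->data_is_output = 0;            /* release the data pins */
    }
}

/* Commit the memory's staged output onto the data nets. */
static void update_nets(struct bus *b, const struct memory *m)
{
    if (m->data_is_output)
        b->data = m->data_out;
}
```

The byte only shows up in `bus.data` after update_nets(), which matches the idea that the CPU samples it on its next clock.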
In the case of a disk controller that has its own clock, it may have a _clk() function which operates its own internal logic and also a _latch() function which latches data into or out of the registers used by the disk controller's internal logic.
Some of the IF blocks above run in response to conditions, such as RD being enabled or disabled. These should run on every pass through the loop. Others need to run round robin depending on whether they are due to be "clocked". Given that OS timers often have coarser granularity than you would like, I thought of a plan to round robin the clocks appropriately. Let's say you have a CPU running at 2 MHz and a disk controller running on its own oscillator at 1 MHz. You keep a variable for each clock that represents where it is in relation to the system timer. Let's say you grab the system time and see that it has jumped 15 ms since the last time you checked it in the loop (which is probably what Windows would do to you!). You would convert that 15 ms into ticks at the granularity you need. 2 MHz for 15 ms is 30000 cycles, so if 15 ms goes by, you advance the system timer by 30000. This means that your 2 MHz clock needs to fall behind this timer by 1 to be executed and the 1 MHz clock needs to fall behind this timer by 2 to be executed. To round robin them properly you need to evaluate which is more behind and run that one first. You wouldn't need the "which one is more behind" logic if the timer had the granularity to move forward one tick at a time, but if it moves forward 30000 at a time, you want to let the faster clock run twice as often as the slower one in this example so they both reach 30000 close together.
runclk1=clk1+1<=systime;
runclk2=clk2+2<=systime;
if (runclk1 && runclk2)
{
if (clk1<clk2)
runclk2=0; //clk1 is more behind, do not run clk2 this time
else runclk1=0; //clk2 is more behind, do not run clk1 this time
}
if (runclk1)
{
icA_clk();
icB_clk();
update_nets();
clk1+=1;
}
if (runclk2)
{
icC_clk(); //devices in the second (1 MHz) clock domain
icD_clk();
update_nets();
clk2+=2; //advance clk2, not clk1
}
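The same idea can be generalized to any number of clocks. The sketch below is one way to do it under some assumptions: the timer ticks at the fastest clock rate (2 MHz), every other clock divides that rate evenly, and the names `TICK_HZ`, `clock_domain`, `ticks_from_ms`, and `step_most_behind` are made up for illustration. Ties go to the first domain in the array, which differs slightly from the two-clock code above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TICK_HZ 2000000u  /* assumed timer granularity: the fastest clock */

struct clock_domain {
    uint64_t pos;     /* ticks consumed so far (clk1, clk2 in the text) */
    uint64_t period;  /* ticks per cycle: 1 for 2 MHz, 2 for 1 MHz */
    uint64_t cycles;  /* cycles executed, kept only for inspection */
};

/* Convert elapsed wall-clock milliseconds into timer ticks. */
static uint64_t ticks_from_ms(uint64_t elapsed_ms)
{
    return elapsed_ms * (TICK_HZ / 1000u);
}

/* Run one cycle of whichever due domain is furthest behind.
 * Returns its index, or -1 if every domain has caught up to systime. */
static int step_most_behind(struct clock_domain *d, size_t n, uint64_t systime)
{
    int pick = -1;

    for (size_t i = 0; i < n; i++) {
        assert(d[i].period > 0);
        if (d[i].pos + d[i].period > systime)
            continue;                       /* not due yet */
        if (pick < 0 || d[i].pos < d[pick].pos)
            pick = (int)i;                  /* further behind than the best so far */
    }
    if (pick >= 0) {
        /* A real loop would call this domain's _clk() functions and
         * then update_nets() at this point. */
        d[pick].pos += d[pick].period;
        d[pick].cycles++;
    }
    return pick;
}
```

Calling `step_most_behind` in a loop until it returns -1, with systime advanced by `ticks_from_ms(15)`, runs the 2 MHz domain 30000 times and the 1 MHz domain 15000 times, interleaved roughly two to one.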
To remove all throttling and still run things together in round robin order, you could always push the system clock ahead so it is far enough ahead that all functions can never catch up no matter how fast the main loop is. Then they will still run in order of who is the most behind, but as fast as possible.
Any thoughts, ideas, improvements, or flaws?
I do realize there may be a point of diminishing returns for how far you go in emulating something. My thought here, though, is that this could be cycle accurate, which is one thing I am looking to accomplish, so that it could be stopped and single stepped.