You aren't making any accommodation for Carry In.
Neither did the original poster. ...
[quote]
More worrying to me is the 'always_ff @(posedge clk)' controlling the flags and the result: on every single clock, even during instruction fetch, the flags are getting clocked. Any perturbation of either bus or op_sel will change the results. Maybe there is some other logic that prevents this.
[/quote]
That is up to the higher-level modules. My goal was a single-cycle ALU. If the machine is doing a move operation, the ALU is not being used, so its output doesn't really matter.
My code is partitioned so that the combinatorial block of the ALU is always active. By the time the posedge arrives, the output of the combinatorial block is stable, so the register captures a valid result.
You could add an additional 'enable_alu' signal that prevents other cycles from modifying the output.
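A minimal sketch of that 'enable_alu' idea, not the actual code from this thread. The module name, the port widths, and the 1-bit op_sel with only two operations are assumptions made to keep it short:

```systemverilog
// Sketch only: names, widths, and the two-operation decode are assumptions.
module alu_reg_sketch (
    input  logic       clk,
    input  logic       enable_alu,  // hypothetical: high only on cycles that use the ALU
    input  logic       op_sel,      // hypothetical 1-bit select: 0 = add, 1 = and
    input  logic [7:0] bus_a,
    input  logic [7:0] bus_b,
    output logic [8:0] alu_out
);
    logic [8:0] result;

    // Combinational block: always active, simply follows the inputs.
    always_comb begin
        case (op_sel)
            1'b0:    result = {1'b0, bus_a} + {1'b0, bus_b};
            default: result = {1'b0, bus_a & bus_b};
        endcase
    end

    // Registered output: captures the result only when the ALU is in use,
    // so bus activity during fetch or move cycles leaves alu_out alone.
    always_ff @(posedge clk) begin
        if (enable_alu)
            alu_out <= result;
    end
endmodule
```

The higher-level control logic would drive enable_alu; with it held low, the registered result and anything derived from it stay frozen.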
Overflow isn't detected.
Mine does. The ALU has a 9-bit output; the MSB is the overflow bit.
This processor does not have an instruction for continuous adding. The only add instruction is x = a + b. You would need an additional x = x + a instruction for that.
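Roughly how the ninth bit comes about, as a sketch that assumes 8-bit buses (the widths and the module name are my assumptions). For unsigned operands the extra bit is the carry out of bit 7; signed two's complement overflow would need a different check, which is not shown here:

```systemverilog
// Sketch only: assumes 8-bit operands and an unsigned add.
module add9_sketch (
    input  logic [7:0] bus_a,
    input  logic [7:0] bus_b,
    output logic [8:0] alu_out   // bit 8 is the extra "overflow" bit
);
    // Zero-extend both operands so the carry out of bit 7 lands in bit 8.
    assign alu_out = {1'b0, bus_a} + {1'b0, bus_b};
endmodule
```

For example, 200 + 100 = 300 does not fit in 8 bits: the low byte comes out as 44 and bit 8 is set.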
// op_sel has to be 4 bits wide for these opcodes
`define OP_ADD_A 4'b1000
`define OP_ADD_B 4'b1001
`OP_ADD_A: alu_out = (result + bus_a);
... and so on.
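Fleshed out a bit, the accumulate opcodes could look like the sketch below. The opcode encodings, the 4-bit op_sel, and the module name are assumptions; the point is only that the registered result is fed back into the combinational block so that x = x + a becomes possible:

```systemverilog
// Sketch only: the encodings and widths are assumptions.
`define OP_ADD   4'b0000   // assumed code for the existing x = a + b
`define OP_ADD_A 4'b1000   // x = x + a
`define OP_ADD_B 4'b1001   // x = x + b

module alu_accum_sketch (
    input  logic       clk,
    input  logic [3:0] op_sel,
    input  logic [7:0] bus_a,
    input  logic [7:0] bus_b,
    output logic [8:0] result    // registered result, fed back as the accumulator
);
    logic [8:0] alu_out;

    always_comb begin
        case (op_sel)
            `OP_ADD:   alu_out = {1'b0, bus_a} + {1'b0, bus_b};
            `OP_ADD_A: alu_out = result + bus_a;   // previous result plus bus A
            `OP_ADD_B: alu_out = result + bus_b;   // previous result plus bus B
            default:   alu_out = result;           // anything else holds the value
        endcase
    end

    always_ff @(posedge clk)
        result <= alu_out;
endmodule
```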
But architecturally speaking, this is garbage.
I would prefer to see additional registers that control source and target. Then you can do anything to anything. The memory map would be a block of RAM, for example 64K. The processor registers themselves would live above that:
reg_source1 @0x10000
reg_source2 @0x10001
reg_target
Instructions:
LD_SRC_A ' load data from source A into the accumulator
LD_SRC_B ' load data from source B into the accumulator
ADD_A ' add data from source A to the accumulator
ADD_A_INC ' add data from source A to the accumulator and increment the source A address
ADD_B ' add data from source B to the accumulator
STOR ' write the accumulator contents to the target
STORINC ' same as STOR, but increment the target address; this allows for stream processing
Since the data pointers (A, B and TARGET) reside inside the memory map, you can even manipulate those programmatically.
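Here is a rough sketch of what such a core could look like. Lots of assumptions: 16-bit data words, a NOP code, a separate bus write port standing in for the program writing into the memory map, and an address for reg_target (no address was given for it above). It also reads the RAM asynchronously, which is fine for simulation but not how you would build a real 64K memory:

```systemverilog
// Sketch only. Instruction names follow the list above; NOP is a hypothetical idle code.
typedef enum logic [2:0] {
    NOP, LD_SRC_A, LD_SRC_B, ADD_A, ADD_A_INC, ADD_B, STOR, STORINC
} op_t;

module ptr_core_sketch (
    input  logic        clk,
    input  op_t         op,
    input  logic        bus_we,      // hypothetical write port into the memory map
    input  logic [16:0] bus_addr,    // 17 bits: RAM below 0x10000, registers above
    input  logic [15:0] bus_wdata,   // assumed 16-bit data words
    output logic [15:0] acc
);
    localparam logic [16:0] REG_SOURCE1 = 17'h10000;
    localparam logic [16:0] REG_SOURCE2 = 17'h10001;
    localparam logic [16:0] REG_TARGET  = 17'h10002;  // assumed address, not from the post

    logic [15:0] ram [0:65535];
    logic [15:0] src_a, src_b, target;

    always_ff @(posedge clk) begin
        if (bus_we) begin
            // The pointers live in the memory map, so a plain write can change them.
            case (bus_addr)
                REG_SOURCE1: src_a  <= bus_wdata;
                REG_SOURCE2: src_b  <= bus_wdata;
                REG_TARGET:  target <= bus_wdata;
                default:     if (!bus_addr[16]) ram[bus_addr[15:0]] <= bus_wdata;
            endcase
        end else begin
            case (op)
                LD_SRC_A:  acc <= ram[src_a];
                LD_SRC_B:  acc <= ram[src_b];
                ADD_A:     acc <= acc + ram[src_a];
                ADD_A_INC: begin acc <= acc + ram[src_a]; src_a <= src_a + 16'd1; end
                ADD_B:     acc <= acc + ram[src_b];
                STOR:      ram[target] <= acc;
                STORINC:   begin ram[target] <= acc; target <= target + 16'd1; end
                default:   ;  // NOP: nothing changes
            endcase
        end
    end
endmodule
```

In this sketch a bus write takes priority over the instruction on the same cycle. That is just the simplest way to avoid two writers; a real design would pick its own arbitration.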
I made a core once that could unroll loops. There were special 'stream' instructions.
You set the source and target begin addresses and loaded a count value. The adding went on (with post-increment of the address pointers) until the count reached zero.
Crunch a table of 100 numbers? Like adding a constant to 100 memory locations: 100 clock ticks. The hardware did it.
Vector processing.
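This is not that core, just a small sketch of the idea under my own assumptions: a 256-word RAM inside the module, 8-bit pointers and count, a 'start' pulse to latch the setup values, and an 'addend' input for the constant. Once started, it handles one memory location per clock until the count runs out:

```systemverilog
// Sketch only: sizes, port names, and the start/busy handshake are assumptions.
module stream_add_sketch (
    input  logic        clk,
    input  logic        start,       // hypothetical: pulse to latch the setup values
    input  logic [7:0]  src_begin,   // source begin address
    input  logic [7:0]  dst_begin,   // target begin address
    input  logic [7:0]  count_in,    // number of locations to process
    input  logic [15:0] addend,      // constant added to every location
    output logic        busy
);
    logic [15:0] ram [0:255];
    logic [7:0]  src, dst;
    logic [7:0]  count = '0;         // idle until the first start pulse

    assign busy = (count != 8'd0);

    always_ff @(posedge clk) begin
        if (start) begin
            src   <= src_begin;
            dst   <= dst_begin;
            count <= count_in;
        end else if (count != 8'd0) begin
            // One element per clock, with post-increment of both pointers.
            ram[dst] <= ram[src] + addend;
            src      <= src + 8'd1;
            dst      <= dst + 8'd1;
            count    <= count - 8'd1;
        end
    end
endmodule
```

With count_in set to 100, the loop body runs for 100 clocks after the setup cycle, with no instruction fetches in between.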