You aren't making any accommodation for Carry In.
Neither did the original poster. ...
More worrying to me is the "always_ff @(posedge clk)" controlling the flags and the result: on every single clock, even during instruction fetch, the flags are getting clocked. Any perturbation of either bus or op_sel will change the results. Maybe there is some other logic that prevents this.
That is up to the higher-level modules. My goal was a single-cycle ALU. If the machine is doing a move operation, the ALU is not being used, so its output doesn't really matter.
My code is partitioned in such a way that the combinatorial block of the ALU is always active. By the time the posedge comes, the output of the combinatorial block is stable, so the flip-flops capture the settled result.
You could add an additional 'enable_alu' signal that prevents other cycles from modifying the output.
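A minimal sketch of what that 'enable_alu' gating could look like. The port names, widths and opcode values here are assumptions for illustration, not the code from earlier in the thread. The combinatorial block still follows the inputs all the time, but the registered result and flag only change when enable_alu is high.

```systemverilog
// Hypothetical sketch: names, widths and opcodes are assumed.
module alu_with_enable (
    input  logic       clk,
    input  logic       enable_alu,  // assumed: driven high only in the execute cycle
    input  logic [2:0] op_sel,
    input  logic [7:0] bus_a,
    input  logic [7:0] bus_b,
    output logic [8:0] alu_out,     // bit 8 is the carry/borrow out of the 8-bit operation
    output logic       zero_flag
);
    logic [8:0] result;

    // Combinatorial block: always active, follows bus_a, bus_b and op_sel.
    always_comb begin
        case (op_sel)
            3'b000:  result = {1'b0, bus_a} + {1'b0, bus_b};
            3'b001:  result = {1'b0, bus_a} - {1'b0, bus_b};
            3'b010:  result = {1'b0, bus_a & bus_b};
            3'b011:  result = {1'b0, bus_a | bus_b};
            default: result = {1'b0, bus_a};
        endcase
    end

    // Registered outputs: held unchanged on every clock where enable_alu is low,
    // so bus activity during fetch or move cycles cannot disturb them.
    always_ff @(posedge clk) begin
        if (enable_alu) begin
            alu_out   <= result;
            zero_flag <= (result[7:0] == 8'h00);
        end
    end
endmodule
```

The control logic that generates enable_alu would live in a higher-level module, in line with the partitioning described above.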
Overflow isn't detected.
Mine does. The ALU has a 9-bit output. The MSB is the overflow (the carry out of the 8-bit add).
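For reference, a small sketch of how a 9-bit sum exposes the carry out, and how a carry in could be accommodated in the same expression. The module and signal names are made up for this example. It also derives a separate signed (two's complement) overflow flag, which is not the same thing as the ninth bit; whether you need it depends on whether the operands are treated as signed.

```systemverilog
// Hypothetical sketch: module and signal names are assumed.
module add9 (
    input  logic [7:0] a,
    input  logic [7:0] b,
    input  logic       carry_in,    // assumed extra input, tie to 0 if unused
    output logic [8:0] sum,         // sum[8] is the carry out (unsigned overflow)
    output logic       signed_ovf   // two's complement overflow
);
    always_comb begin
        // Zero-extend every operand to 9 bits so the carry lands in bit 8.
        sum = {1'b0, a} + {1'b0, b} + {8'b0, carry_in};
        // Signed overflow: both operands have the same sign,
        // but the 8-bit result has the opposite sign.
        signed_ovf = (a[7] == b[7]) && (sum[7] != a[7]);
    end
endmodule
```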
This processor does not have an instruction for continuous adding (accumulating). The only add instruction is x = a + b. You would need an additional x = x + a instruction for that.
// assumes op_sel is widened to 4 bits; the two opcodes must differ
`define OP_ADD_A 4'b1000
`define OP_ADD_B 4'b1001
`OP_ADD_A: alu_out = (result + bus_a);
... and so on.
But architecturally speaking this is garbage.
I would prefer to see additional registers that control source and target. Then you can do anything to anything. The memory map would be a block of RAM, for example 64K, and the processor registers themselves would live above that:
reg_source1 @0x10000
reg_source2 @0x10001
LD_SRC_A ' load data from source A into the accumulator
LD_SRC_B ' load data from source B into the accumulator
ADD_A ' add data from source A to the accumulator
ADD_A_INC ' add data from source A to the accumulator and increment the source A address
ADD_B ' add data from source B to the accumulator
STOR ' write accumulator contents to target
STORINC ' same as STOR but increment the target address; this allows for stream processing
Since the data pointers (A, B and TARGET) reside inside the memory map, you can even manipulate those programmatically.
I made a core once that could unroll loops. There were special 'stream' instructions.
You set the source and target begin addresses and load a count value. The adding went on (with post-increment of the address pointers) until the count reached zero.
Crunch a table with 100 numbers, like adding a constant to 100 memory locations? 100 clock ticks. The hardware did it.
Vector processing.
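A rough sketch of that kind of stream instruction, to show the idea rather than the actual core. Everything here is assumed for illustration: the names, the single internal memory with asynchronous read and synchronous write, and the reset. Once started, it processes one element per clock with post-increment of both pointers until the count reaches zero.

```systemverilog
// Hypothetical sketch of a "stream add": names and memory model are assumed.
module stream_add #(
    parameter int AW = 16               // 16 address bits = 64K locations
) (
    input  logic          clk,
    input  logic          rst,          // synchronous reset, clears the counter
    input  logic          start,        // pulse: load pointers and count
    input  logic [AW-1:0] src_begin,    // source begin address
    input  logic [AW-1:0] dst_begin,    // target begin address
    input  logic [AW-1:0] count_init,   // number of elements to process
    input  logic [7:0]    addend,       // constant added to every element
    output logic          busy
);
    logic [7:0]    mem [0:(1<<AW)-1];   // assumed: async read, sync write
    logic [AW-1:0] src, dst, count;

    assign busy = (count != '0);

    always_ff @(posedge clk) begin
        if (rst) begin
            count <= '0;
        end else if (start && !busy) begin
            // Set up the stream: pointers and element count.
            src   <= src_begin;
            dst   <= dst_begin;
            count <= count_init;
        end else if (busy) begin
            // One element per clock, then post-increment both pointers.
            mem[dst] <= mem[src] + addend;
            src      <= src + 1'b1;
            dst      <= dst + 1'b1;
            count    <= count - 1'b1;
        end
    end
endmodule
```

With count_init set to 100 and src_begin equal to dst_begin, this adds the constant in place to 100 memory locations in 100 clock ticks after the start pulse. A real memory with registered reads would need an extra pipeline stage.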