Wish I could just design my own micro, there's a dead-zone between 8-bit and higher-end... CPLDs somehow end up fitting in some niches.
Go for it! It ain't that hard. I've done 2 so far in an FPGA. You define an instruction set, write a huge case statement in Verilog and cram in all the crap you want to do there. One of my machines uses a data word that holds an instruction and 3 register addresses. I use multiport memory for RAM.
The ADD instruction takes 3 addresses: two are sources, the last is the target.
ADD 01,02,03 adds the data at location 01 to the data at location 02 and stores the result in location 03
ADD 01,02,01 adds the data at location 01 to the data at location 02 and stores the result in location 01
ADD 01,01,01 adds the data at location 01 to itself and stores the result back in location 01
In other words: this machine has no registers, accumulator or anything. The whole memory is a register bank. Instead of shuffling data around and having to use intermediate registers, I simply connect the ALU wherever it needs to be.
The entire machine is pointer operated; only 'load' operations can put data in a pointer.
The pointers have their own little ALU so I can give relative offsets.
The instructions are optimized towards a high-level language (similar in keywords to Basic; I wanted it to be as close to 'human readable English' as possible).
There are special instructions to repeat things a number of times.
The repeat instruction loads a count into a down-counter and the loop vectors back to the begin address until that memory location hits zero. There is no stack or anything needed.
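A rough Python sketch of how such a repeat could behave. The mnemonics `REPEAT`, `INC` and `LOOP` are made up for this example (the post does not name them); the only thing taken from the post is that the down-counter lives in an ordinary memory word and the loop jumps back until it hits zero.

```python
def run(program, mem):
    """Tiny interpreter with a memory-resident down-counter and no stack."""
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "REPEAT":    # REPEAT counter_addr, n: load the down-counter
            counter, n = args
            mem[counter] = n
            pc += 1
        elif op == "INC":     # stand-in for the loop body
            mem[args[0]] += 1
            pc += 1
        elif op == "LOOP":    # LOOP counter_addr, begin: decrement, jump back while non-zero
            counter, begin = args
            mem[counter] -= 1
            pc = begin if mem[counter] != 0 else pc + 1
        else:
            raise ValueError(f"unknown op {op}")
    return mem

# mem[0] is the down-counter, mem[1] is incremented once per pass.
mem = run([("REPEAT", 0, 4), ("INC", 1), ("LOOP", 0, 1)], [0, 0])
```

Because the counter and the begin address are just operands of the instruction, nothing has to be pushed or popped to keep track of the loop.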
For x = 10 to 20 step 2
blablabla ...
next
Data is 32 bit in my machine. I 'seed' the data word with the begin value. The next word in memory contains the step interval, the third word the end value, the fourth the return address (the begin of the loop) and the fifth the continue address (the loop exit).
In machine language, the For line above allocates 5 words of RAM:
x = 10
x+1 = 2
x+2 = 20
x+3 = loop begin address
x+4 = loop exit address
The ALU then gets these machine instructions:
ADD x,x+1,x ' add the step to the count value and store it in the count value
blablabla ...
IFEQUAL x,x+2,x+4 ' compare memory locations x (the counter) and x+2 (the end value); if they match, load the program counter with the contents of x+4
PC = x+3 ' load the program counter with the contents of x+3 (begin of the loop)
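To see the loop run, here is a small Python simulation that follows the listing literally. The base address `X = 100`, the program addresses 0 and 4, and the `BODY`/`JUMP` names are assumptions for the sketch; the 5-word layout and the ADD / IFEQUAL / PC sequence come from the post.

```python
def run_for_loop():
    """Simulate 'For x = 10 to 20 step 2' using the 5-word layout from the post."""
    X = 100                       # hypothetical base address of the 5-word block
    LOOP_BEGIN, LOOP_EXIT = 0, 4  # hypothetical program addresses
    mem = {X: 10, X + 1: 2, X + 2: 20, X + 3: LOOP_BEGIN, X + 4: LOOP_EXIT}

    program = [
        ("ADD", X, X + 1, X),          # 0: count = count + step
        ("BODY", X),                   # 1: stand-in for "blablabla"
        ("IFEQUAL", X, X + 2, X + 4),  # 2: if count == end, PC = mem[x+4]
        ("JUMP", X + 3),               # 3: PC = mem[x+3]
    ]                                  # 4: first instruction after the loop
    seen = []                          # counter values observed by the body
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "ADD":
            a, b, dst = args
            mem[dst] = (mem[a] + mem[b]) & 0xFFFFFFFF
            pc += 1
        elif op == "BODY":
            seen.append(mem[args[0]])
            pc += 1
        elif op == "IFEQUAL":
            a, b, target = args
            pc = mem[target] if mem[a] == mem[b] else pc + 1
        elif op == "JUMP":
            pc = mem[args[0]]
        else:
            raise ValueError(f"unknown op {op}")
    return seen

values = run_for_loop()  # [12, 14, 16, 18, 20]
```

Traced this way, the body first sees 12 rather than 10, because the ADD comes before the body in the listing. Whether the real compiler seeds the counter differently or places the ADD after the body is not stated in the post.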
The compiler is very simple to make. I don't care about optimisation of area or instruction packing or anything else. I'm after speed. One clock tick should do as much as possible. If I need to add two numbers I do not want to waste 2 clock ticks moving data to accumulators, one executing the add and another moving the data back out. It has to happen in 1 tick.
The triple-port memory is a dream for that. I can simply preload the source and target addresses on the bus on the falling edge of the previous instruction. The rising edge triggers my ALU and the result of the ALU is latched in memory on the next falling edge. It's a kind of peristaltic movement.
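A toy Python model of that falling-edge / rising-edge rhythm, assuming two read ports and one write port and only ADD instructions. The class and method names are invented for the sketch and the real design is Verilog, so treat this as an illustration of the timing idea rather than the actual implementation.

```python
class TriplePortMachine:
    """Toy model: two read ports and one write port on the same memory."""

    def __init__(self, mem, program):
        self.mem = mem
        self.program = program  # list of (src_a, src_b, dst) ADD instructions
        self.pc = 0
        self.addr = None        # addresses currently on the bus
        self.result = None      # ALU output waiting to be latched

    def falling_edge(self):
        # Latch the previous ALU result into memory through the write port...
        if self.result is not None:
            dst, value = self.result
            self.mem[dst] = value
            self.result = None
        # ...and put the next instruction's addresses on the bus.
        if self.pc < len(self.program):
            self.addr = self.program[self.pc]
            self.pc += 1
        else:
            self.addr = None

    def rising_edge(self):
        # The ALU fires on the operands selected by the two read ports.
        if self.addr is not None:
            a, b, dst = self.addr
            self.result = (dst, (self.mem[a] + self.mem[b]) & 0xFFFFFFFF)

m = TriplePortMachine([0, 5, 7, 0], [(1, 2, 3), (3, 3, 3)])
m.falling_edge()         # preload the addresses of the first ADD
for _ in range(2):       # two clock cycles, one instruction retired per cycle
    m.rising_edge()
    m.falling_edge()
```

The second ADD reads location 3 right after the first one wrote it, which works here because the write-back happens on the falling edge before the next rising edge reads the operands.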
My instruction set is expandable. I can define as many opcodes as I want.