Currently, we are testing Arise-v2 on the HDL simulator (and sometimes on the real hardware) by writing short assembly programs. Basically, they are loops operating on predefined values so we can check if things go as expected, step by step. The HL-compiler is already able to identify and parse mathematical expressions(1) and to transform them into RPN. However, it still needs to recognize a matrix's item so it can use SLAC for the EA.
I've always used binary trees, which is a different representation of the same information as RPN, or you can look at them as a representation of RPN. The leaves of the tree are either variables (blocks of information of where the value is stored at run time) or constants. Starting from leaves, I can reduce the tree by replacing the operation nodes by leaves, and simultaneously emitting commands. For example, an expression:
a*b + c*d
is represented as
+(*(a,b),*(c,d))
I find a node containing only leaves, and I remove it:
temp1 := *(a,b) // this will be replaced by a single instruction
+(temp1,*(c,d))
And another one:
temp1 := *(a,b) // this will be replaced by a single instruction
temp2 := *(c,d) // this will be replaced by a single instruction
temp3 := +(temp1,temp2) // this will be replaced by a single instruction
Now I can assign storage to the temporary variables (temp1,temp2). Usually they're stored in registers, or in stack if there's not enough registers. However, you need really really complex expressions to run out of registers.
Except for the operations on values, there are operations which operate on addressing. Namely, a subscript operation:
subscript(array,index) // expanded from array[index]
and dot operation:
dot(structure,member) // expanded from structure.member
These cannot be dealt with as easily as arithmetics. You cannot simply remove nodes. Especially, if such operations are nested. For example:
dot(subscript(array,index),member) // expanded from array[index].member
So, they get aggregated. After aggregation, the resulting expression always has one base variable (such as "array" in the above example), and linear indexes, which are either fixed offsets (from structure elements or arrays with fixed indices), or indexed with fixed coefficients. It is always possible to represent the addressing calculation in the form:
EA = base + k0 + index1*k1 + index2*k2 + ...
Where index1, index2 are derived from the sub-expressions and usually stored in registers (the same as temp1,temp2 above). k0, k1, k2 are numbers known at compile time, and "base" is the register on which the original variable is based upon (such as stack pointer for local variables).
At this point, you need some sort of heuristics to process the equation and map it to the most efficient combination of your SLAC instruction, other instructions, or hardware addressing. In most cases this is fairy simple. Since, at this point in compilation, you haven't yet assigned the storage for temporaries (index1, index2 etc.), you can mandate that things which must be in the registers for your addressing are indeed placed into the registers, not on stack.
Then you emit the corresponding instructions, including the final instruction which fetches (or stored for L-values) the data from the variable. This fetched value is sitting in a register and looks just a regular variable to the upstream parts of the expression.