Now you have the primitive for pre/post increment or push/pop n/POPCNT!
I don't get it as I was talking about 1 instruction which is able to perform pre/post increment within the load/store class of instructions
not in one instruction. For example, the pre-decrement addressing mode (same idea as pre-increment, with sub instead of add):
xor Ru01,Ru01 // clear Ru01 (int sum = 0)
mov Ru15,0x64EF01AD // tail pointer of an integer array
nloop 17 // for (i = 20 ; i > 0 ; i--) sum += *(--pointer);
{
sub Ru15,4 // decrement address ; pointer -= sizeof(int)
load Ru00,Ru15 // value = *pointer
add Ru01,Ru00 // sum += value
}
I am not assuming that the array is passed as the last parameter on the stack (otherwise you would replace 0x64EF01AD with SP) ... ;-)
something like:
push r1 ---> store r1, (sp)+4 ---> store r1, (sp), sp=sp+4; 1 instruction
pop r1 ---> load r1, -4(sp) ---> sp=sp-4, load r1, (sp); 1 instruction
well, it is neither a single instruction nor implementable with SP as a special register; I was just thinking of something like this for push n:
// push in stack the int array[20]
mov Ru15,0x64EF01AD // pointer = array
mov Ru31,SP // next store space in stack ; instack = SP
nloop 17 // duplicate values of int array in stack : size(int array) - 3 = 17
{
load Ru00,Ru15 // value = *pointer
add Ru15,4 // next value in array ; pointer++
add SP,4 // reserve space for value
sto Ru31,Ru00 // *(SP-4) = value
add Ru31,4 // next store address in stack ; instack++
} // nloop 17,4 ; 20 * 5 instructions executed
oh, I forgot an optimization:
// push in stack the int array[20]
mov Ru15,0x64EF01AD // pointer = array
mov Ru31,SP // next store space in stack ; instack = SP
add SP,80 // reserve space in stack for the whole array
nloop 17 // duplicate values of int array in stack : size(int array) - 3 = 17
{
load Ru00,Ru15 // value = *pointer
add Ru15,4 // next value in array ; pointer++
sto Ru31,Ru00 // *instack = value
add Ru31,4 // next store address in stack ; instack++
} // nloop 17,3 ; 20 * 4 instructions executed if instructions are not parallelized between ALU & LOAD/STORE unit
this is much simpler than any complex addressing mode.
nloop can be resolved in an early stage of your pipeline (using 0 real CPU cycles), whereas complex addressing modes will not cover all the use cases; plus, the compiler will rarely use them, and they complicate the code generation stage... See how the addressing modes were reduced between the
68k family and its derivatives.
Even better, if you have an instruction prefetch unit, it can prefetch
the next instructions at @exit_loop at the end of the first loop!
EDIT: correction about the 68000 addressing mode features