EEVblog Electronics Community Forum

Topic started by: DiTBho on June 13, 2023, 02:38:43 pm

Title: x86/32bit, a nice trick
Post by: DiTBho on June 13, 2023, 02:38:43 pm
Code:
say:
            # EIP is the 32bit Instruction Pointer, aka program counter "PC"
            # The EIP register contains the address of the next instruction
            # to be executed if no branching is done.
            # EIP can only be read through the stack after a call instruction.

            movl 0(%esp), %edx  # load PC_next from the return_addr on the stack
            pushl %ebx

            call print16_hex32
            call print16_nl

            movl %edx, %ebx     # ebx = string pointer (the string starts at the return address)
say_loop:
            movb (%ebx), %dl    # next character, passed to ch_put in %dl
            cmpb $EOS, %dl
            je say_done
            call ch_put
            incl %ebx           # increment the string pointer
            jmp say_loop

say_done:
            movl %ebx, %edx     # edx = address of the string terminator
            popl %ebx
            incl %edx           # PC_next = address just past the terminator

            call print16_hex32
            call print16_nl

            movl %edx, 0(%esp)  # save PC_next as return_addr on the stack
            ret

Code:
            call say
            .string "################## hAllo ########################\n"

Code:
my-mon > load_elf 0x00100000
00100000: b2 40 e8 a6 00 00 00 e8 7a 01 00 00 e8 ce 00 00 | .@......z.......
00100010: 00 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 | .###############
00100020: 23 23 23 20 68 41 6c 6c 6f 20 23 23 23 23 23 23 | ### hAllo ######
00100030: 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 | ################
00100040: 23 23 0a 00 ba fe ca ad de e8 8e 01 00 00 e8 33 | ##.............3
00100050: 01 00 00 ba 00 10 10 00 e8 7f 01 00 00 e8 24 01 | ..............$.
00100060: 00 00 e8 0b 01 00 00 e8 1a 01 00 00 ba 0c 10 10 | ................
00100070: 00 e8 66 01 00 00 e8 0b 01 00 00 e8 f2 00 00 00 | ..f.............
entry position: 0x00100000

my-mon > exec
executing from 0x00100000 ...

0010 0011
################## hAllo ########################
0010 0044

... returning to my-mon

my-mon >

nice, ain't it?  :D


edit:
GNU AS, G-AS, syntax.
Title: Re: x86/32bit, a nice trick
Post by: DiTBho on June 20, 2023, 08:58:24 pm
this trick is the cleaned-up 32-bit version of the one I wrote here (https://www.eevblog.com/forum/programming/x86-assembly-uart-16450-simple-putch/msg4865165/#msg4865165) in the other topic. It's nice because it's very simple, and it can be used on other architectures as long as they allow data and code in the same address space.

So, you just have to understand how and where the CPU saves the PC_NEXT value (which is the return address in a function call) and use it  :D
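
To make the idea concrete, here is a small Python sketch (not part of the original post) that replays what `say` does to the return address, using the bytes from the monitor dump above. It assumes that EOS is 0x00, which the thread never states but the dump suggests, and that the `e8` at 0x0010000c is the `call say`. The helper names are made up for illustration.

```python
# Bytes 0x00100000..0x0010004f, copied from the hex dump in the first post.
DUMP = """
b2 40 e8 a6 00 00 00 e8 7a 01 00 00 e8 ce 00 00
00 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23
23 23 23 20 68 41 6c 6c 6f 20 23 23 23 23 23 23
23 23 23 23 23 23 23 23 23 23 23 23 23 23 23 23
23 23 0a 00 ba fe ca ad de e8 8e 01 00 00 e8 33
"""
LOAD_ADDR = 0x00100000
EOS = 0x00  # assumption: the value of EOS is not shown in the thread

image = bytes.fromhex("".join(DUMP.split()))


def return_address(call_site):
    """Address pushed by a 5-byte `call rel32` (opcode 0xE8) at call_site."""
    assert image[call_site - LOAD_ADDR] == 0xE8
    return call_site + 5


def resume_address(ret_addr):
    """Mimic `say`: walk from the return address to the terminator, then step past it."""
    p = ret_addr
    while image[p - LOAD_ADDR] != EOS:
        p += 1
    return p + 1


ret = return_address(0x0010000C)   # where the string starts
res = resume_address(ret)          # what `say` writes back to 0(%esp)
print(f"{ret:08x} {res:08x}")      # prints: 00100011 00100044
```

These are the same two addresses the monitor printed (0010 0011 and 0010 0044). The byte at the resume address is 0xba, the opcode of a `movl $imm32, %edx`, so execution continues on a real instruction right after the string.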
Title: Re: x86/32bit, a nice trick
Post by: gf on June 27, 2023, 08:09:01 am
I have seen such coding schemes for "immediate function arguments" in the past. On modern x86 processors (and likely some other superscalar architectures, too), however, I recommend that you always keep call/return paired, i.e. don't push the return address manually and then jump to the subroutine instead of calling it. I also recommend that you don't fiddle with the return address on the stack. Both defeat the processor's return address prediction and slow down the return significantly.

Edit: See https://en.wikipedia.org/wiki/Branch_predictor#Prediction_of_function_returns
Title: Re: x86/32bit, a nice trick
Post by: DiTBho on June 28, 2023, 07:56:29 am
Quote from: gf on June 27, 2023, 08:09:01 am
I recommend that you keep call/return always paired

on x86/32, you cannot access EIP directly, the best you can do is read the PC_NEXT from the stack after a function call  :-//

On MIPS the hardware saves PC_NEXT in a dedicated CPU register and, to support nested function calls, C compilers usually also copy it to the stack.

The branch predictor only looks at and cares about that CPU return-address register, shown as "RA" in debuggers.

it could become the new hype: write your own cheats for different architectures  ;D

Title: Re: x86/32bit, a nice trick
Post by: gf on June 30, 2023, 06:05:27 pm
Quote from: gf on June 27, 2023, 08:09:01 am
I recommend that you keep call/return always paired
Quote from: DiTBho on June 28, 2023, 07:56:29 am
on x86/32, you cannot access EIP directly, the best you can do is read the PC_NEXT from the stack after a function call  :-//

The "return" instruction is basically an indirect jump whose destination address is found on the top of the stack. In the end it does not matter how this address was stored in the stack slot. Nevertheless, it still makes a subtle difference.

Note that the "return" instruction immediately starts speculative execution at the predicted address, before the actual destination address has been fetched from the top of the stack. The aim is, of course, to prevent a pipeline stall. Once the actual destination address has been fetched from the stack, the processor can verify whether the prediction was correct. If the destination address was mispredicted, the speculative execution must be rolled back and execution restarts at the correct address. For a "return" instruction, the predicted destination address is basically the return address of the most recently executed "call" instruction.

If the "return" instruction was preceded by a corresponding "call" instruction, then "return" will most likely predict the address correctly.

However, nothing prevents you from pushing an arbitrary destination address onto the stack and invoking "return" in order to jump indirectly to this address. But then "return" is no longer paired with a matching "call", and it will likely mispredict the address, with the consequences mentioned above. In the end it will still do the right thing, but with a performance penalty.
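
The effect described above can be illustrated with a toy model in Python. This is only a sketch: it assumes the predictor behaves like a simple stack that "call" pushes and "return" pops, it ignores capacity limits and everything else a real CPU does, and the class and method names are hypothetical.

```python
class ReturnPredictorModel:
    """Toy return-address predictor: calls push, returns pop and compare."""

    def __init__(self):
        self.shadow = []       # predictor's private copy of return addresses
        self.mispredicts = 0

    def call(self, return_addr):
        # A call records where the matching return is expected to go.
        self.shadow.append(return_addr)

    def ret(self, actual_addr):
        # The prediction is the most recent call's return address, if any.
        predicted = self.shadow.pop() if self.shadow else None
        if predicted != actual_addr:
            self.mispredicts += 1   # speculative work would be thrown away
        return actual_addr          # execution still ends up at the real address


# 1) Paired call/return with an untouched return address.
paired = ReturnPredictorModel()
paired.call(0x00100011)
paired.ret(0x00100011)

# 2) The trick from this thread: the routine rewrites its own return address.
patched = ReturnPredictorModel()
patched.call(0x00100011)
patched.ret(0x00100044)

# 3) Push an address manually and "return" to it, with no call at all.
unpaired = ReturnPredictorModel()
unpaired.ret(0x00100044)

print(paired.mispredicts, patched.mispredicts, unpaired.mispredicts)  # prints: 0 1 1
```

In all three cases the model ends up at the address that was actually on the stack; only the paired case avoids the misprediction.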

Title: Re: x86/32bit, a nice trick
Post by: T3sl4co1l on June 30, 2023, 07:04:07 pm
Ah, so you can [would most likely?] Spectre yourself by using the wrong data source for prediction / in the pipeline and it resolves true in the end (checked numerically not semantically)?  That makes sense.

Tim
Title: Re: x86/32bit, a nice trick
Post by: SiliconWizard on June 30, 2023, 07:46:35 pm
For someone (like me) who hates the idea of return addresses pushed onto data stacks (as we've unfortunately been doing for decades and which has led to more bugs and exploits than we can keep track of), the idea of purposefully meddling with these is even more horrific.

Tricks are fun I guess though.
Title: Re: x86/32bit, a nice trick
Post by: DiTBho on July 01, 2023, 12:12:01 pm
Got your points, thanks!

I didn't even think of it as a problem. I saw that hack implemented in a very old version of LILO (the Linux loader). It was real-mode code, probably written for the i386, so from before the first branch prediction unit was implemented on x86. I just cleaned it up a bit to make it work as a 32-bit hack.