Topic: How to tell if some binary is valid 8051 code?

technix (topic starter)
How to tell if some binary is valid 8051 code?
« on: September 06, 2017, 06:56:52 pm »
This is about an 8051 single-board computer idea. The external bus interface is wired as a Von Neumann bus (that is, external code and xdata share the same bus). The system can either execute binary code stored in the external EEPROM, or boot into BASIC stored in the internal Flash and interpret BASIC code held in the external EEPROM. How do I tell whether the EEPROM contains a valid machine language program?

Here is the proposed boot sequence, with the USB-to-serial chip (an ATmega16U2) also acting as the supervisor. The 8051 always boots into BASIC at power-up. BASIC first checks whether P1.4 is low; if so, it enters interactive mode. Otherwise it checks whether the first bytes of the EEPROM correspond to a valid 8051 machine language program. If they do, it asserts P1.4 low and the supervisor resets the 8051 to boot from the EEPROM. Failing that, it checks whether a valid BASIC program exists in the EEPROM (its header is always invalid as 8051 machine code) and executes it. If every boot strategy fails, it enters interactive BASIC.
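
For illustration only, the order of checks in this boot sequence could be sketched in C roughly as follows. The enum, the function name and the three boolean inputs are hypothetical; how "valid machine code" and "valid BASIC program" are actually detected is the open question of this thread and is left to the caller here.

```c
#include <stdbool.h>

/* What the BASIC firmware does next (all names are illustrative). */
enum boot_action {
    BOOT_INTERACTIVE,        /* drop to the interactive BASIC prompt */
    BOOT_RESET_INTO_EEPROM,  /* pull P1.4 low so the supervisor resets into EEPROM code */
    BOOT_RUN_BASIC_PROGRAM   /* interpret the BASIC program stored in EEPROM */
};

/* Apply the checks in the order given in the post. */
enum boot_action choose_boot_action(bool p1_4_low,
                                    bool eeprom_has_machine_code,
                                    bool eeprom_has_basic_program)
{
    if (p1_4_low)
        return BOOT_INTERACTIVE;
    if (eeprom_has_machine_code)
        return BOOT_RESET_INTO_EEPROM;
    if (eeprom_has_basic_program)
        return BOOT_RUN_BASIC_PROGRAM;
    return BOOT_INTERACTIVE;  /* every strategy failed */
}
```

Keeping the decision in one pure function makes the priority order easy to test on a PC before it goes anywhere near the board.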
 

grumpydoc
Re: How to tell if some binary is valid 8051 code?
« Reply #1 on: September 06, 2017, 07:03:30 pm »
First solve the Halting Problem

More pragmatically, embed a signature maybe.

Will the BASIC have line numbers and have expanded keywords or will it be in some compressed/tokenised form?
 

Bruce Abbott
Re: How to tell if some binary is valid 8051 code?
« Reply #2 on: September 06, 2017, 09:17:49 pm »
If it's not valid BASIC or machine code, what is it? Random garbage could be anything, so it's impossible to guarantee that it won't look like valid code.

The technique I used was to make a rule that the machine code must start with an opcode sequence that is not likely to occur 'accidentally'. On the 8051 the first 3 bytes are normally a jump to the initialization code. If you ensure that the init code is always at the same address then you can check for that, and then you have 3 bytes that must match. If the init address may vary then you will have to follow it and examine more code. Often the next 2 instructions set the stack pointer, so you could check for that. You could also 'sanity check' the target addresses to eliminate obvious errors.   
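
A minimal sketch of that rule in C, under stated assumptions: the agreed init address `INIT_ADDR` (0x0100) is hypothetical, and the sketch assumes the very first init instruction is `MOV SP,#imm`. The encodings themselves are standard 8051: LJMP is opcode 0x02 followed by a big-endian 16-bit address, `MOV direct,#imm` is opcode 0x75, and SP lives at direct address 0x81.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OP_LJMP        0x02u   /* LJMP addr16 */
#define OP_MOV_DIR_IMM 0x75u   /* MOV direct,#imm */
#define SFR_SP         0x81u   /* direct address of the stack pointer */
#define INIT_ADDR      0x0100u /* hypothetical: init-code address agreed with users */

/* True if the image starts with LJMP INIT_ADDR and the code at
 * INIT_ADDR begins with MOV SP,#imm. */
bool looks_like_firmware(const uint8_t *image, size_t size)
{
    if (size < (size_t)INIT_ADDR + 3u)   /* must hold the whole MOV instruction */
        return false;
    if (image[0] != OP_LJMP)
        return false;

    /* LJMP stores the high address byte first. */
    uint16_t target = (uint16_t)((image[1] << 8) | image[2]);
    if (target != INIT_ADDR)
        return false;

    return image[target] == OP_MOV_DIR_IMM && image[target + 1] == SFR_SP;
}
```

With a fixed init address this matches five bytes in total, so random data passes with a probability of roughly 1 in 2^40.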

Of course if your EEPROM has random garbage in it then there is still a small chance of misidentifying it as valid code, in which case the machine will crash. If this is unacceptable then embed a signature at a known location somewhere in the EEPROM. For added security include a checksum or CRC that covers the signature or even the entire EEPROM.
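
As a rough sketch of the signature-plus-CRC idea: everything about the layout below is an assumption made for illustration, namely the 4-byte signature value, its offset `SIG_OFFSET`, the choice of CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF), and storing the CRC big-endian in the last two bytes of the image.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIG_OFFSET 0x0030u  /* hypothetical: just past the interrupt vectors */
static const uint8_t SIGNATURE[4] = { 'F', 'W', '5', '1' };  /* hypothetical value */

/* Bitwise CRC-16/CCITT-FALSE: polynomial 0x1021, init 0xFFFF, no reflection. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

/* True if the signature is present and the trailing CRC matches
 * everything that precedes it. */
bool image_is_signed(const uint8_t *image, size_t size)
{
    if (size < SIG_OFFSET + sizeof SIGNATURE + 2u)
        return false;
    if (memcmp(image + SIG_OFFSET, SIGNATURE, sizeof SIGNATURE) != 0)
        return false;

    uint16_t stored = (uint16_t)((image[size - 2] << 8) | image[size - 1]);
    return crc16_ccitt(image, size - 2) == stored;
}
```

The CRC covers the whole image here, so a partially written EEPROM is rejected as well as random garbage.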

If you are creating the code that goes in the EEPROM then this is the obvious solution. Things get trickier if you allow other users to load it with random data. My solution was to simply reject anything that doesn't follow my rules. To make it easier to comply I provided a macro that is added to the user's source code. 
       

 

legacy
Re: How to tell if some binary is valid 8051 code?
« Reply #3 on: September 06, 2017, 09:56:56 pm »
like E.L.F.  :D
 

amyk
Re: How to tell if some binary is valid 8051 code?
« Reply #4 on: September 07, 2017, 09:05:50 am »
The 8051 has only one undefined instruction, A5, which is unlikely to occur in BASIC source code too.

You could probably make, with some effort, BASIC source code which also happens to be a valid 8051 binary. ;)
 

mark03
Re: How to tell if some binary is valid 8051 code?
« Reply #5 on: September 08, 2017, 04:55:08 pm »
Markov model?  Drawback is you'd have to train it on a large corpus of valid 8051 machine code to get good statistics.
 

Neganur
Re: How to tell if some binary is valid 8051 code?
« Reply #6 on: September 08, 2017, 08:40:07 pm »
Nvm, I misread the post.
 

rstofer
Re: How to tell if some binary is valid 8051 code?
« Reply #7 on: September 08, 2017, 09:54:43 pm »
Add a 4 byte header.

The first byte of the header would define the file type and the following 2 bytes would declare the length.  The 4th byte is the checksum of the first 3 bytes.  The code itself would start immediately after the 4 byte header.  You could further define the format to allow another header at the end of the first file which would define a second file, and so on.

In this way, you could pick the code file to run with dipswitches or something.
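
One possible reading of this 4-byte header, sketched in C. The post does not specify the byte order of the length or the checksum algorithm, so the big-endian length and the simple 8-bit sum used below are assumptions, and `find_file` is a hypothetical helper for walking the chain of files (for example with an index read from DIP switches).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct file_header {
    uint8_t  type;    /* file type, values are up to the designer */
    uint16_t length;  /* payload length in bytes */
};

/* Parse the 4-byte header at p; fails on short input or bad checksum. */
bool parse_header(const uint8_t *p, size_t avail, struct file_header *out)
{
    if (avail < 4u)
        return false;
    uint8_t sum = (uint8_t)(p[0] + p[1] + p[2]);  /* assumed: sum modulo 256 */
    if (sum != p[3])
        return false;
    out->type = p[0];
    out->length = (uint16_t)((p[1] << 8) | p[2]); /* assumed: big-endian */
    return true;
}

/* Return the payload offset of the index-th file, or -1 if not found. */
long find_file(const uint8_t *eeprom, size_t size, unsigned index)
{
    size_t off = 0;
    struct file_header h;

    while (off <= size && parse_header(eeprom + off, size - off, &h)) {
        size_t payload = off + 4u;
        if (h.length > size - payload)
            return -1;                 /* header claims more than the EEPROM holds */
        if (index == 0)
            return (long)payload;
        index--;
        off = payload + h.length;      /* next header follows this file */
    }
    return -1;
}
```

An erased EEPROM (all 0xFF) fails the checksum, but an all-zero region would parse as an empty file of type 0, so type 0 is best treated as invalid if a plain sum is used.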
 

technix (topic starter)
Re: How to tell if some binary is valid 8051 code?
« Reply #8 on: September 09, 2017, 03:59:22 am »
The BASIC program has a header that starts with the byte 0xA5.
 

technix (topic starter)
Re: How to tell if some binary is valid 8051 code?
« Reply #9 on: September 09, 2017, 04:18:29 am »
Similar story:

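Code:
#include <stdint.h>

typedef uint8_t  u8;
typedef uint16_t u16;

/* One descriptor per stored segment. */
struct headers
{
    u8  type;
    u8  version;        /* 0x0 for now */
    u16 load_addr;
    u16 base_addr;
    u16 length;
};

/* Fixed part at the very start of the EEPROM.
 * Bit-field order within the byte is compiler-defined. */
struct basic_header
{
    u8 header;          /* always 0xa5 */
    u8 version:4;       /* 0x0 for now */
    u8 header_count:4;  /* number of descriptors that follow */
    /* then: struct headers headers[header_count]; */
    /* then: u16 checksum; */
};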
 

technix (topic starter)
Re: How to tell if some binary is valid 8051 code?
« Reply #10 on: September 13, 2017, 01:17:24 am »
Is it sane to assume that all valid 8051 code starts with a jump instruction that skips over the interrupt vector table (IVT)? If so, can I just check whether the first three bytes contain a valid jump instruction to decide whether the binary is valid 8051 code?
 

Bruce Abbott
Re: How to tell if some binary is valid 8051 code?
« Reply #11 on: September 13, 2017, 04:17:11 am »
The chance of encountering system firmware that doesn't jump is very small. In the unlikely event that it just piles straight through the interrupt vectors it may have other incompatibilities as well. So if nonstandard code is not detected and won't run, too bad! Just tell your users that their code must include a jump if they want it to be detected.

However, there is one slight complication: the instruction could be an LJMP, an AJMP, or perhaps even an SJMP, so you may have 10 different opcodes to check for (one LJMP, one SJMP and eight AJMP encodings). Unfortunately this greatly increases the chances of misidentifying data as valid code. Therefore a better option might be to insist on having a signature somewhere in the EEPROM that identifies it as system firmware. If that is not possible then provide some way to select it manually (e.g. a jumper that enables/disables the external EEPROM, or changes the range of addresses it is mapped to).
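
To make the ten opcodes concrete: LJMP is 0x02, SJMP is 0x80, and AJMP is any opcode whose low five bits are 00001 (0x01, 0x21, ... 0xE1). A uniformly random first byte therefore passes an opcode-only test 10 times out of 256, about 4% of the time. The sketch below decodes the target as well, so it can be sanity checked; the function names and the plausibility rule (target inside the image and past the 3-byte reset vector) are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decode an LJMP, AJMP or SJMP located at address pc.
 * Returns false if the bytes are not one of those jumps. */
bool decode_jump(const uint8_t *code, size_t size, uint16_t pc, uint16_t *target)
{
    if ((size_t)pc + 1u >= size)
        return false;                     /* need opcode plus one operand byte */
    uint8_t op = code[pc];
    uint8_t b1 = code[pc + 1];

    if (op == 0x02u) {                    /* LJMP addr16: high byte first */
        if ((size_t)pc + 2u >= size)
            return false;
        *target = (uint16_t)((b1 << 8) | code[pc + 2]);
        return true;
    }
    if ((op & 0x1Fu) == 0x01u) {          /* AJMP addr11: stays in the 2 KB page */
        uint16_t next = (uint16_t)(pc + 2u);
        *target = (uint16_t)((next & 0xF800u) | ((unsigned)(op >> 5) << 8) | b1);
        return true;
    }
    if (op == 0x80u) {                    /* SJMP rel: signed 8-bit offset */
        int rel = (b1 < 0x80u) ? (int)b1 : (int)b1 - 0x100;
        *target = (uint16_t)((unsigned)pc + 2u + (unsigned)rel);
        return true;
    }
    return false;
}

/* Assumed rule: the reset jump must land inside the image and past itself. */
bool reset_vector_plausible(const uint8_t *code, size_t size)
{
    uint16_t target;
    return decode_jump(code, size, 0, &target) && target >= 3u && target < size;
}
```

Checking the target rejects some garbage that an opcode-only test lets through, but it still does not replace a signature.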


 

technix (topic starter)
Re: How to tell if some binary is valid 8051 code?
« Reply #12 on: September 13, 2017, 03:13:39 pm »
I believe most 8051 startup code includes a jump, whether it is emitted by Keil, IAR or SDCC.

So: some variant of JMP = valid machine code; A5 = BASIC in ROM (it is always the first byte of the BASIC header); anything else = invalid code, go to interactive BASIC.
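
That three-way rule fits in a few lines of C. This is only a sketch: the enum and function names are made up for illustration, and it looks at the first byte alone, exactly as stated above.

```c
#include <stdint.h>

enum image_kind {
    IMAGE_MACHINE_CODE,  /* first byte is LJMP, SJMP or one of the AJMP opcodes */
    IMAGE_BASIC,         /* first byte is the 0xA5 BASIC header marker */
    IMAGE_INVALID        /* anything else: fall back to interactive BASIC */
};

enum image_kind classify_first_byte(uint8_t b)
{
    if (b == 0x02u || b == 0x80u || (b & 0x1Fu) == 0x01u)
        return IMAGE_MACHINE_CODE;
    if (b == 0xA5u)
        return IMAGE_BASIC;
    return IMAGE_INVALID;
}
```

An erased EEPROM reads 0xFF, which falls into the invalid case, so a blank board ends up at the interactive prompt.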
 

technix (topic starter)
Re: How to tell if some binary is valid 8051 code?
« Reply #13 on: September 15, 2017, 01:12:42 am »
I am using the invalid instruction 0xA5 as the signature byte for BASIC. I need some kind of signature for valid machine language code too; maybe the jump instruction that skips over the IVT?

When stored in EEPROM, the BASIC program can be either expanded keywords or precompiled byte code. The header distinguishes the two.
 

