Null bytes in shellcode? Why does mov eax,1 machine code have bytes that are 00?

While going through the shellcode article on Wikipedia, I found the following example:

B8 01000000    MOV EAX,1          // Set the register EAX to 0x00000001

To make the above instruction null free, they've re-written it as follows:

33C0           XOR EAX,EAX        // Set the register EAX to 0x00000000
40             INC EAX            // Increase EAX to 0x00000001

Where is the null byte in the first instruction? How do the converted instructions not have a null byte?



Solution 1:[1]

The null bytes are the three 00 bytes right after B8 01 in the first instruction. The replacement sequence uses the XOR operation to zero out EAX (x XOR x = 0 for any x) and then increments it by one, which gives the same result without any 00 byte.
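To make this concrete, here is a minimal Python sketch that scans both byte sequences from the question for 00 bytes. The helper name `null_offsets` is made up for this illustration.

```python
# Machine code bytes copied from the question.
mov_eax_1 = bytes.fromhex("B8 01 00 00 00")  # MOV EAX,1
xor_inc = bytes.fromhex("33 C0 40")          # XOR EAX,EAX ; INC EAX

def null_offsets(code):
    """Return the offset of every 0x00 byte in code (hypothetical helper)."""
    return [i for i, b in enumerate(code) if b == 0]

print(null_offsets(mov_eax_1))  # [2, 3, 4]
print(null_offsets(xor_inc))    # []
```

The first sequence has null bytes at offsets 2, 3 and 4; the second has none.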

Solution 2:[2]

Where is the null byte in the first instruction?

B8 01000000    MOV EAX,1          // Set the register EAX to 0x00000001
     ^^^^^^
     1 2 3

There are actually 3 null bytes.

How do the converted instructions not have a null byte? Because

  1. The opcodes for XOR and INC do not contain null bytes (http://ref.x86asm.net/coder32-abc.html)
  2. They are used here with registers as their only operands, so no immediate value is encoded

For the MOV EAX, 1 instruction, the assembler has to write the opcode (B8) and 2 operands, one of which is a 32-bit integer. Since 1 is a very small number, the remaining bits of that integer are padded with zeros, resulting in three null bytes.

The XOR and INC instructions do not take integer operands in your code, so no zeros have to be inserted.
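The padding can be sketched in Python with `struct.pack`, which here writes the immediate as 4 little-endian bytes after the B8 opcode. The function `encode_mov_eax` is a hypothetical helper for illustration only, not an assembler; it assumes Python 3.8+ for `bytes.hex(' ')`.

```python
import struct

def encode_mov_eax(imm):
    """Encode MOV EAX, imm32: opcode B8 plus a 4-byte little-endian immediate."""
    return b"\xB8" + struct.pack("<I", imm)

for value in (1, 0x100, 0x01010101):
    code = encode_mov_eax(value)
    # Show the encoded bytes and how many of them are 0x00.
    print(f"{value:#010x} -> {code.hex(' ')}  null bytes: {code.count(0)}")
```

Only an immediate whose four bytes are all nonzero, such as 0x01010101, encodes without a null byte.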

Update

I didn't notice the +r in the opcode for MOV r32, imm32.

Registers are encoded using 3 bits in x86, and eax is 000. B8 is 0b10111000 in binary; its lowest 3 bits hold the register number.

0b10111000 + 0b000 = 0b10111000 = 0xB8

So B8 encodes to MOV EAX, imm32.
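As a rough sketch of the +r encoding, the opcode byte for MOV r32, imm32 can be computed by adding the 3-bit register number to B8. The list and function names below are made up for this example.

```python
# 3-bit register numbers for the 32-bit general-purpose registers.
REGISTERS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def mov_r32_imm32_opcode(reg):
    """Return the opcode byte for MOV reg, imm32 (B8 + register number)."""
    return 0xB8 + REGISTERS.index(reg)

for reg in REGISTERS:
    print(f"MOV {reg}, imm32 -> {mov_r32_imm32_opcode(reg):02X}")
```

This gives B8 for eax through BF for edi, so the register never needs a separate byte in this instruction form.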

What is left is 01 00 00 00, which is the value 1 stored as a 32-bit little-endian integer.
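A quick way to check the byte order, sketched with Python's `int.from_bytes`:

```python
imm = bytes.fromhex("01 00 00 00")  # the four bytes following B8

# Least significant byte first, as x86 stores immediates.
print(int.from_bytes(imm, "little"))  # 1

# Read the other way round, the same bytes would mean 0x01000000.
print(int.from_bytes(imm, "big"))     # 16777216
```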

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Squeezy
Solution 2