Abstract
In this series of articles, I am analysing the pieces of shellcode written by Odzhan on the page Shellcode: Mac OSX amd64.
This is a wonderful way to learn some assembly on MacOS, and introduce some secure software development practices.
Plus, it is fun!
Spawning a shell
This is the shellcode where it all begins. It will help us introduce syscalls and the syscall we'll use the most, execve.
Let's start with some definitions. We'll borrow Wikipedia's:
a system call (commonly abbreviated to syscall) is the programmatic way in which a computer program requests a service from the kernel of the operating system on which it is executed. This may include hardware-related services (for example, accessing a hard disk drive or accessing the device's camera), creation and execution of new processes, and communication with integral kernel services such as process scheduling. System calls provide an essential interface between a process and the operating system.
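Syscalls are used throughout programs, so the concept is nothing exotic: whenever a program reads a file, writes to the terminal or starts a process, it ends up asking the kernel through a syscall.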
a shellcode is a small piece of code used as the payload in the exploitation of a software vulnerability. It is called "shellcode" because it typically starts a command shell from which the attacker can control the compromised machine, but any piece of code that performs a similar task can be called shellcode.
So the "shellcode" is the payload: the hack, the call-it-whatever that runs once a vulnerability has let us subvert the program.
As we are talking about Apple MacOS, we'll largely refer to XNU's source code - especially the syscalls.master file.
Let's see the original code.
; 26 bytes execute /bin/sh
;
bits 64
global _main
_main:
xor esi, esi ; esi = 0
mul esi ; eax = 0, edx = 0
bts eax, 25 ; eax = 0x02000000
mov al, 59 ; rax = sys_execve
mov rbx, '/bin//sh'
push rdx ; 0
push rbx ; "/bin//sh"
push rsp
pop rdi ; rdi="/bin//sh", 0
syscall
At first sight, the code seems to mix register sizes without much care: e* registers are 32 bits wide, al is 8 bits wide (it is the lowest byte of rax), and r* registers are 64 bits wide. However, this code assembles, links and runs:
gbiondo@tripleX Odzhan % nasm -f macho64 shellspawn.asm
gbiondo@tripleX Odzhan % ld -L /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib -lSystem -o shellspawn shellspawn.o
gbiondo@tripleX Odzhan % ./shellspawn
The default interactive shell is now zsh.
To update your account to use zsh, please run `chsh -s /bin/zsh`.
For more details, please visit https://support.apple.com/kb/HT208050.
bash-3.2$ ps -ef |grep -i bash
0 695 621 0 Sun12PM ttys000 0:00.03 login -pfl gbiondo /bin/bash -c exec -la zsh /bin/zsh
503 46117 703 0 2:30PM ttys001 0:00.01 (bash)
503 46123 46117 0 2:30PM ttys001 0:00.00 grep -i bash
0 23384 621 0 4:13PM ttys002 0:00.02 login -pfl gbiondo /bin/bash -c exec -la zsh /bin/zsh
bash-3.2$ exit
gbiondo@tripleX Odzhan %
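In the output of ps, the new shell is process 46117, listed as (bash): the grep we just ran is its child (its parent PID is 46117). The bash-3.2$ prompt shows that, on this machine, the shell answering as /bin/sh is bash 3.2. The two login -pfl gbiondo /bin/bash -c exec -la zsh /bin/zsh lines, instead, are the login processes of other terminal sessions (ttys000 and ttys002). In short, the code above spawns a shell.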
To understand what is going on in the program, we need to know how parameters are passed to invoked routines and which parameters the syscall requires.
Param. no. | Register | Register name |
---|---|---|
1 | RDI | Destination index register |
2 | RSI | Source index register |
3 | RDX | Data register |
In detail:
- The destination index register is used for string and memory array copying and setting, and for far pointer addressing with ES.
- The source index register is used for string and memory array copying.
- The data register is used for I/O port access, arithmetic and some interrupt calls, and it is used implicitly by multiplication and division.
We invoke a syscall by loading the syscall number plus 0x2000000 into RAX (the accumulator register, which is used for I/O port access, arithmetic, interrupt calls, etc.), and then executing the syscall instruction.
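As a worked example of this rule, the Python sketch below computes what each register must hold before the syscall instruction. It is only a model. It assumes XNU's encoding, in which the syscall class occupies the bits from 24 upwards and BSD/UNIX syscalls are class 2 (so 2 << 24 = 0x2000000). It also assumes the usual x86-64 order R10, R8, R9 for arguments beyond the three in the table. The function names are hypothetical.

```python
# Model only: which value ends up in which register for a BSD syscall
# on macOS x86-64. Nothing here actually performs a syscall.
SYSCALL_CLASS_SHIFT = 24   # assumption: the class lives in bits 24 and up
SYSCALL_CLASS_UNIX = 2     # assumption: BSD/UNIX syscalls are class 2

# The first three match the table above; the last three follow the usual
# x86-64 syscall convention (assumption, not covered by the article).
ARG_REGISTERS = ["rdi", "rsi", "rdx", "r10", "r8", "r9"]


def bsd_syscall_registers(number, *args):
    """Return a dict mapping register names to the values they must hold."""
    if len(args) > len(ARG_REGISTERS):
        raise ValueError("too many arguments for a register-only syscall")
    regs = {"rax": (SYSCALL_CLASS_UNIX << SYSCALL_CLASS_SHIFT) | number}
    regs.update(zip(ARG_REGISTERS, args))
    return regs


def decode_rax(rax):
    """Split an RAX value back into (class, syscall number)."""
    return rax >> SYSCALL_CLASS_SHIFT, rax & ((1 << SYSCALL_CLASS_SHIFT) - 1)


# execve(path, NULL, NULL) with an example stack address for the path
regs = bsd_syscall_registers(59, 0x7FF7BFEFF878, 0, 0)
print({name: hex(value) for name, value in regs.items()})
# {'rax': '0x200003b', 'rdi': '0x7ff7bfeff878', 'rsi': '0x0', 'rdx': '0x0'}
```

Decoding works the other way round: decode_rax(0x200003b) gives (2, 59), the class and the execve number.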
Having said this, we need to know how to find syscall numbers. This is where syscalls.master comes into play: it is the file where all the syscalls are defined:
0 AUE_NULL ALL { int nosys(void); } { indirect syscall }
1 AUE_EXIT ALL { void exit(int rval) NO_SYSCALL_STUB; }
2 AUE_FORK ALL { int fork(void) NO_SYSCALL_STUB; }
3 AUE_NULL ALL { user_ssize_t read(int fd, user_addr_t cbuf, user_size_t nbyte); }
4 AUE_NULL ALL { user_ssize_t write(int fd, user_addr_t cbuf, user_size_t nbyte); }
[... SNIP ...]
57 AUE_SYMLINK ALL { int symlink(char *path, char *link); }
58 AUE_READLINK ALL { int readlink(char *path, char *buf, int count); }
59 AUE_EXECVE ALL { int execve(char *fname, char **argp, char **envp); }
60 AUE_UMASK ALL { int umask(int newmask); }
61 AUE_CHROOT ALL { int chroot(user_addr_t path); }
[... SNIP ...]
So the code moves the value 59 into RAX; this corresponds to the execve syscall. This API requires:
- fname, which is a pointer to a char array (in plain English: a string);
- argp, which is a pointer to a pointer to a char array (in plain English: an array of strings);
- envp, which is a pointer to a pointer to a char array (in plain English: an array of strings).
A quick look at the man page (man 2 execve) gives a clearer explanation. For starters, the signature of this API is:
int execve(const char *path, char *const argv[], char *const envp[]);
and the description states:
execve() transforms the calling process into a new process. The new process is constructed from an ordinary file, whose name is pointed to by path, called the new process file. This file is either an executable object file, or a file of data for an interpreter. An executable object file consists of an identifying header, followed by pages of data representing the initial program (text) and initialized data pages.[...]
If any optional args are specified, they become the first (second, ...) argument to the interpreter
[...]
The zeroth argument, normally the name of the execve()'d file, is left unchanged.
[...]
The argument argv is a pointer to a null-terminated array of character pointers to null-terminated character strings. These strings construct the argument list to be made available to the new process. At least one argument must be present in the array; by custom, the first element should be the name of the executed program (for example, the last component of path).[...]
The argument envp is also a pointer to a null-terminated array of character pointers to null-terminated strings. A pointer to this array is normally stored in the global variable environ. These strings pass information to the new process that is not directly an argument to the command (see environ(7)).[...]
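Python's os.execve(path, args, env) wraps the same system call, which makes the three parameters easy to experiment with without writing assembly. The sketch below is a minimal example and assumes a POSIX system with /bin/sh. It forks first, because a successful execve replaces the calling process. The helper name run_execve is hypothetical. Python's wrapper also rejects an empty args list with a ValueError, in line with the man page's "at least one argument" rule.

```python
import os


def run_execve(path, argv, envp):
    """Fork, call execve(path, argv, envp) in the child, return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: if execve succeeds it never returns, because the process
        # image is replaced by the new program.
        try:
            os.execve(path, argv, envp)
        finally:
            os._exit(127)  # only reached if execve failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


# argv[0] is, by custom, the program name; envp carries one variable.
print(run_execve("/bin/sh", ["sh", "-c", 'exit "$CODE"'], {"CODE": "7"}))  # 7
```

The environment variable reaches the new shell only through envp, and the exit code shows that the child really became /bin/sh.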
Nuff said. Let's go through the code. The best way to do this is to debug it. We have already seen how to do this with lldb in a previous article, but first it is worth looking at the sections of this binary:
objdump --arch=x86_64 -m --section-headers shellspawn
Sections:
Idx Name Size VMA Type
0 __text 0000001a 0000000100003f9e TEXT
1 __unwind_info 00000048 0000000100003fb8 DATA
Time to disassemble. As I usually do, I start with a breakpoint in the main routine and disassemble it. The result won't be much different from what we wrote:
(lldb) breakpoint set -name main
Breakpoint 1: where = shellspawn`main, address = 0x0000000100003f9e
(lldb) breakpoint list
Current breakpoints:
1: name = 'main', locations = 1
1.1: where = shellspawn`main, address = shellspawn[0x0000000100003f9e], unresolved, hit count = 0
(lldb) disassemble -n main
shellspawn`main:
shellspawn[0x100003f9e] <+0>: xor esi, esi
shellspawn[0x100003fa0] <+2>: mul esi
shellspawn[0x100003fa2] <+4>: bts eax, 0x19
shellspawn[0x100003fa6] <+8>: mov al, 0x3b
shellspawn[0x100003fa8] <+10>: movabs rbx, 0x68732f2f6e69622f
shellspawn[0x100003fb2] <+20>: push rdx
shellspawn[0x100003fb3] <+21>: push rbx
shellspawn[0x100003fb4] <+22>: push rsp
shellspawn[0x100003fb5] <+23>: pop rdi
shellspawn[0x100003fb6] <+24>: syscall
The first instruction is an xor that zeroes ESI (any value XORed with itself gives zero). On x86-64, writing a 32-bit register also clears the upper 32 bits of the corresponding 64-bit register, so the whole of RSI is zeroed.
Then there is the mul instruction, the mnemonic for multiply. Its syntax is peculiar, because some of its operands are implicit. It performs an unsigned multiplication of the accumulator (EAX, or RAX for 64-bit operands) by the source operand, and stores the double-width product in the register pair EDX:EAX (RDX:RAX).

So, mul esi multiplies EAX and ESI as unsigned integers and puts the low 32 bits of the product in EAX and the high 32 bits in EDX. With a 64-bit source, mul src would instead put the low 64 bits in RAX and the high 64 bits in RDX.

Since ESI is zero, the product is zero. The instruction at 0x100003fa0 therefore zeroes both RAX and RDX with a single 2-byte instruction.
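The semantics of the 32-bit mul can be modelled in a few lines of Python. This is a simplified sketch: it ignores the flags, and the function name mul32 is hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def mul32(rax, src32):
    """Model of `mul r/m32`: EDX:EAX = EAX * src (unsigned); flags ignored.

    Returns the new (rax, rdx). Both are written as 32-bit registers, so their
    upper 32 bits end up cleared, whatever they held before.
    """
    product = (rax & MASK32) * (src32 & MASK32)  # up to 64 bits wide
    eax = product & MASK32                        # low half of the product
    edx = product >> 32                           # high half of the product
    return eax, edx                               # zero-extended to 64 bits


print(mul32(0x1234_5678_9ABC_DEF0, 0))          # (0, 0): what the shellcode relies on
print([hex(r) for r in mul32(0xFFFF_FFFF, 2)])  # ['0xfffffffe', '0x1']
```

Whatever garbage RAX held before, multiplying by a zero ESI leaves both RAX and RDX at zero.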
After the instruction is executed, we have:
(lldb) s
Process 61834 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
frame #0: 0x0000000100003fa2 shellspawn`main + 4
shellspawn`main:
-> 0x100003fa2 <+4>: bts eax, 0x19
0x100003fa6 <+8>: mov al, 0x3b
0x100003fa8 <+10>: movabs rbx, 0x68732f2f6e69622f
0x100003fb2 <+20>: push rdx
Target 0: (shellspawn) stopped.
(lldb) register read RAX RDX RSI
rax = 0x0000000000000000
rdx = 0x0000000000000000
rsi = 0x0000000000000000
The next instruction, bts, is bit test and set. It selects the bit in a bit string (specified with the first operand, called the bit base) at the bit position designated by the bit offset operand (second operand), stores the value of the bit in the CF flag, and sets the selected bit in the bit string to 1.
EAX is a zeroed 32-bit register, and 0x19 = 25. The result of the operation is therefore a string of all zeros, apart from bit 25 (counting from 0), which is set to 1: 0000 0010 0000 0000 0000 0000 0000 0000, or 0x02000000. And in fact we have:
(lldb) s
Process 61834 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
frame #0: 0x0000000100003fa6 shellspawn`main + 8
shellspawn`main:
-> 0x100003fa6 <+8>: mov al, 0x3b
0x100003fa8 <+10>: movabs rbx, 0x68732f2f6e69622f
0x100003fb2 <+20>: push rdx
0x100003fb3 <+21>: push rbx
Target 0: (shellspawn) stopped.
(lldb) register read RAX
rax = 0x0000000002000000
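The bit test and set step can be sketched in Python as follows. This is a simplified model of bts with a register bit base and an immediate offset, which is taken modulo the operand width. The function name is hypothetical.

```python
def bts(value, offset, width=32):
    """Model of `bts reg, imm8`: return (new value, CF).

    CF receives the old value of the selected bit, then the bit is set to 1.
    For a register operand the offset is taken modulo the operand width.
    """
    offset %= width
    cf = (value >> offset) & 1
    return value | (1 << offset), cf


new_eax, cf = bts(0, 25)
print(hex(new_eax), cf)  # 0x2000000 0
```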
The next instruction moves 0x3b into AL, the lowest byte of RAX/EAX. In decimal, this is 59, the number of our syscall. Unlike a 32-bit write, an 8-bit write leaves the rest of the register untouched. After this instruction, therefore, we expect RAX to contain 0x000000000200003b; and indeed we have:
(lldb) s
Process 61834 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
frame #0: 0x0000000100003fa8 shellspawn`main + 10
shellspawn`main:
-> 0x100003fa8 <+10>: movabs rbx, 0x68732f2f6e69622f
0x100003fb2 <+20>: push rdx
0x100003fb3 <+21>: push rbx
0x100003fb4 <+22>: push rsp
Target 0: (shellspawn) stopped.
(lldb) register read EAX
eax = 0x0200003b
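The partial write can be modelled in one line. This is a sketch with a hypothetical function name: clear the lowest byte of RAX, then drop the 8-bit immediate into it.

```python
def write_al(rax, imm8):
    """Model of `mov al, imm8`: only the lowest byte of RAX changes."""
    return (rax & ~0xFF) | (imm8 & 0xFF)  # keep bits 8..63, replace bits 0..7


print(hex(write_al(0x0200_0000, 59)))  # 0x200003b
```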
So far, the disassembled instructions did not differ from the original ones, but here we have quite a discrepancy. How can mov rbx, '/bin//sh' translate into movabs rbx, 0x68732f2f6e69622f? (movabs is simply the name the disassembler gives to a mov with a 64-bit immediate.) Well, 68732f2f6e69622f is 16 hexadecimal digits, that is 8 bytes: exactly the length of the string '/bin//sh'. If we translate it byte by byte into ASCII we have:
68 | 73 | 2f | 2f | 6e | 69 | 62 | 2f |
---|---|---|---|---|---|---|---|
h | s | / | / | n | i | b | / |
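which is /bin//sh written backwards. This has to do with the endianness of the x86-64 processor, which is little-endian: the least significant byte of a value is stored at the lowest address. Once rbx is written to memory, its bytes therefore read 2f 62 69 6e 2f 2f 73 68, that is, /bin//sh. In short, this instruction loads the 8 bytes of the string into rbx: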
(lldb) s
Process 61834 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
frame #0: 0x0000000100003fb2 shellspawn`main + 20
shellspawn`main:
-> 0x100003fb2 <+20>: push rdx
0x100003fb3 <+21>: push rbx
0x100003fb4 <+22>: push rsp
0x100003fb5 <+23>: pop rdi
Target 0: (shellspawn) stopped.
(lldb) register read rbx
rbx = 0x68732f2f6e69622f
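Python's struct module makes the byte order explicit. This is a small sketch: '<Q' packs an unsigned 64-bit integer little-endian, the order in which RBX lands in memory on x86-64, while '>Q' shows the "backwards" big-endian reading.

```python
import struct

IMMEDIATE = 0x68732F2F6E69622F  # the movabs operand shown by lldb

# '<Q' = little-endian unsigned 64-bit: least significant byte first,
# i.e. the order in which the bytes of RBX land in memory on x86-64.
as_memory = struct.pack("<Q", IMMEDIATE)
print(as_memory)  # b'/bin//sh'

# The same conversion in the other direction, and the big-endian view.
assert int.from_bytes(b"/bin//sh", "little") == IMMEDIATE
print(struct.pack(">Q", IMMEDIATE))  # b'hs//nib/'
```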
So far, RSI and RDX have been set to zero, and RAX holds the syscall number. The only thing left is to set RDI. The next instruction pushes the current value of rdx (0x0) onto the stack.
For starters, before executing the instruction we read the value of the stack pointer register:
(lldb) register read rsp
rsp = 0x00007ff7bfeff888
and we do the same after the instruction is run:
(lldb) register read rsp
rsp = 0x00007ff7bfeff880
This also shows that the stack grows towards lower memory addresses: push first decrements RSP by 8, then stores the value at the new address.
If we were to examine the memory area around the address pointed to by RSP before the instruction is executed, we'd see:
(lldb) memory read 0x00007ff7bfeff888-64 0x00007ff7bfeff888+64
0x7ff7bfeff848: a0 83 08 00 01 00 00 00 9e 3f 00 00 01 00 00 00 .........?......
0x7ff7bfeff858: 10 40 07 00 01 00 00 00 80 f8 ef bf f7 7f 00 00 .@..............
0x7ff7bfeff868: 83 28 01 00 01 00 00 00 25 00 00 00 00 00 00 00 .(......%.......
0x7ff7bfeff878: 60 00 0c 00 01 00 00 00 90 f9 ef bf f7 7f 00 00 `...............
0x7ff7bfeff888: 1e d5 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................
0x7ff7bfeff898: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x7ff7bfeff8a8: 00 00 00 00 00 00 00 00 a0 83 08 00 01 00 00 00 ................
0x7ff7bfeff8b8: 00 00 00 42 00 00 00 00 83 d5 00 00 01 00 00 00 ...B............
After the instruction, we'd have:
(lldb) memory read 0x00007ff7bfeff888-64 0x00007ff7bfeff888+64
0x7ff7bfeff848: a0 83 08 00 01 00 00 00 9e 3f 00 00 01 00 00 00 .........?......
0x7ff7bfeff858: 10 40 07 00 01 00 00 00 80 f8 ef bf f7 7f 00 00 .@..............
0x7ff7bfeff868: 83 28 01 00 01 00 00 00 25 00 00 00 00 00 00 00 .(......%.......
0x7ff7bfeff878: 60 00 0c 00 01 00 00 00 00 00 00 00 00 00 00 00 `...............
0x7ff7bfeff888: 1e d5 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................
0x7ff7bfeff898: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x7ff7bfeff8a8: 00 00 00 00 00 00 00 00 a0 83 08 00 01 00 00 00 ................
0x7ff7bfeff8b8: 00 00 00 42 00 00 00 00 83 d5 00 00 01 00 00 00 ...B............
The 8 bytes at 0x7ff7bfeff880, the new top of the stack, are now zero. In exactly the same manner, the contents of rbx are pushed onto the stack:
(lldb) register read rbx
rbx = 0x68732f2f6e69622f
(lldb) s
Process 71654 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
frame #0: 0x0000000100003fb4 shellspawn`main + 22
shellspawn`main:
-> 0x100003fb4 <+22>: push rsp
0x100003fb5 <+23>: pop rdi
0x100003fb6 <+24>: syscall
0x100003fb8: add dword ptr [rax], eax
Target 0: (shellspawn) stopped.
(lldb) memory read 0x00007ff7bfeff888-64 0x00007ff7bfeff888+64
0x7ff7bfeff848: a0 83 08 00 01 00 00 00 9e 3f 00 00 01 00 00 00 .........?......
0x7ff7bfeff858: 10 40 07 00 01 00 00 00 80 f8 ef bf f7 7f 00 00 .@..............
0x7ff7bfeff868: 83 28 01 00 01 00 00 00 25 00 00 00 00 00 00 00 .(......%.......
0x7ff7bfeff878: 2f 62 69 6e 2f 2f 73 68 00 00 00 00 00 00 00 00 /bin//sh........
0x7ff7bfeff888: 1e d5 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................
0x7ff7bfeff898: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x7ff7bfeff8a8: 00 00 00 00 00 00 00 00 a0 83 08 00 01 00 00 00 ................
0x7ff7bfeff8b8: 00 00 00 42 00 00 00 00 83 d5 00 00 01 00 00 00 ...B............
and so is rsp:
(lldb) register read rsp
rsp = 0x00007ff7bfeff878
(lldb) s
Process 71654 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
frame #0: 0x0000000100003fb5 shellspawn`main + 23
shellspawn`main:
-> 0x100003fb5 <+23>: pop rdi
0x100003fb6 <+24>: syscall
0x100003fb8: add dword ptr [rax], eax
0x100003fba: add byte ptr [rax], al
Target 0: (shellspawn) stopped.
(lldb) memory read 0x00007ff7bfeff888-64 0x00007ff7bfeff888+64
0x7ff7bfeff848: a0 83 08 00 01 00 00 00 9e 3f 00 00 01 00 00 00 .........?......
0x7ff7bfeff858: 10 40 07 00 01 00 00 00 80 f8 ef bf f7 7f 00 00 .@..............
0x7ff7bfeff868: 83 28 01 00 01 00 00 00 78 f8 ef bf f7 7f 00 00 .(......x.......
0x7ff7bfeff878: 2f 62 69 6e 2f 2f 73 68 00 00 00 00 00 00 00 00 /bin//sh........
0x7ff7bfeff888: 1e d5 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................
0x7ff7bfeff898: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x7ff7bfeff8a8: 00 00 00 00 00 00 00 00 a0 83 08 00 01 00 00 00 ................
0x7ff7bfeff8b8: 00 00 00 42 00 00 00 00 83 d5 00 00 01 00 00 00 ...B............
Finally, pop rdi loads into rdi the top of the stack, which holds the value of rsp pushed by the previous instruction, 0x00007ff7bfeff878:
(lldb) register read rdi rsp
rdi = 0x0000000000000001
rsp = 0x00007ff7bfeff870
(lldb) s
Process 71654 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
frame #0: 0x0000000100003fb6 shellspawn`main + 24
shellspawn`main:
-> 0x100003fb6 <+24>: syscall
0x100003fb8: add dword ptr [rax], eax
0x100003fba: add byte ptr [rax], al
0x100003fbc: sbb al, 0x0
Target 0: (shellspawn) stopped.
(lldb) register read rdi rsp
rdi = 0x00007ff7bfeff878
rsp = 0x00007ff7bfeff878
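At this point RDI points to the null-terminated string /bin//sh, RSI and RDX are zero, and RAX holds 0x200003b. The syscall is invoked and a new shell is spawned. Since execve replaces the image of the calling process, the shellspawn process itself becomes the shell.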
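To tie the last four instructions together, here is a toy model of the stack in Python. It uses a dictionary of 8-byte slots in place of real memory and is seeded with the RSP value from the lldb session. It is a simplified sketch: a real stack is byte-addressed memory, and the class name ToyStack is hypothetical. It reproduces the same sequence: after push rdx; push rbx; push rsp; pop rdi, RDI points to /bin//sh followed by a zero quadword.

```python
import struct


class ToyStack:
    """8-byte slots keyed by address; push/pop follow x86-64 semantics."""

    def __init__(self, rsp):
        self.rsp = rsp
        self.mem = {}

    def push(self, value):
        self.rsp -= 8                  # the stack grows towards lower addresses
        self.mem[self.rsp] = value     # then the value is stored at [rsp]

    def pop(self):
        value = self.mem[self.rsp]     # read the value at [rsp]
        self.rsp += 8
        return value

    def read_bytes(self, addr, count):
        """Read `count` bytes starting at `addr` (must be slot-aligned)."""
        out = b""
        while len(out) < count:
            out += struct.pack("<Q", self.mem[addr])  # little-endian slot
            addr += 8
        return out[:count]


stack = ToyStack(rsp=0x7FF7BFEFF888)  # RSP before `push rdx` in the session
rdx, rbx = 0, 0x68732F2F6E69622F
stack.push(rdx)          # terminator: 8 zero bytes at 0x...880
stack.push(rbx)          # "/bin//sh" at 0x...878
stack.push(stack.rsp)    # push rsp: pushes the value RSP had before the push
rdi = stack.pop()        # pop rdi

print(hex(rdi), hex(stack.rsp))   # 0x7ff7bfeff878 0x7ff7bfeff878
print(stack.read_bytes(rdi, 16))  # b'/bin//sh\x00\x00\x00\x00\x00\x00\x00\x00'
```

As in the real dump, the slot at 0x...870 still contains the old RSP value after the pop, because popping only moves RSP and does not wipe memory.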
Strategy
A quick recap of how the shellcode works.
1 - find the syscall number and specification
First of all, syscalls. A quick way to find the syscall numbers is to look at syscall.h in the SDK:
gbiondo@tripleX sys % pwd
/Library/Developer/CommandLineTools/SDKs/MacOSX12.3.sdk/System/Library/Frameworks/Kernel.framework/Versions/A/Headers/sys
gbiondo@tripleX sys % cat syscall.h
As we have seen before, once we have the syscall number, we add it to 0x02000000. Writing the number in hexadecimal makes the sum easy to read (59 = 0x3b, hence 0x0200003b). The resulting value is stored in the EAX/RAX register.
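Looking the number up can also be scripted. This is a minimal sketch that assumes the header lists syscalls as #define SYS_<name> <number> lines. The two sample lines below are written in that format, using the numbers from the syscalls.master excerpt above; real use would read syscall.h from the SDK path shown. The function name is hypothetical.

```python
import re

SYSCALL_CLASS_UNIX = 0x0200_0000  # value added to BSD syscall numbers

# Matches lines like "#define SYS_execve 59" (tabs or spaces).
DEFINE_RE = re.compile(r"^#define\s+SYS_(\w+)\s+(\d+)\b", re.MULTILINE)


def parse_syscalls(header_text):
    """Return {name: number} for every SYS_* define in the header text."""
    return {name: int(num) for name, num in DEFINE_RE.findall(header_text)}


sample = "#define\tSYS_write          4\n#define\tSYS_execve         59\n"
table = parse_syscalls(sample)
print(table)                                      # {'write': 4, 'execve': 59}
print(hex(SYSCALL_CLASS_UNIX + table["execve"]))  # 0x200003b
```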
The next stage is understanding the parameters accepted by the syscall itself. For instance, we see that syscall 4 is SYS_write, or, seen differently (syscalls.master file):
4 AUE_NULL ALL { user_ssize_t write(int fd, user_addr_t cbuf, user_size_t nbyte); }
We conjecture that this syscall has 3 parameters: a file descriptor, the address of the buffer, and the number of bytes to write. Confirmation comes from man 2 <syscall name>. In this case, man 2 write would return:
ssize_t write(int fildes, const void *buf, size_t nbyte);
and the subsequent description of the parameters.
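The three parameters of write map directly onto Python's os.write(fd, data), a thin wrapper over the same call. There, nbyte is implied by the length of data. The small sketch below uses a pipe so the bytes can be read back; the helper name is hypothetical.

```python
import os


def write_and_read_back(data):
    """Write `data` into a pipe with write(2) and read it back with read(2)."""
    read_fd, write_fd = os.pipe()
    try:
        written = os.write(write_fd, data)   # returns the number of bytes written
        echoed = os.read(read_fd, written)   # read(fd, buf, nbyte)
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written, echoed


print(write_and_read_back(b"hello"))  # (5, b'hello')
```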
For shellcode, the most important syscall to know is execve, so let's stick to it. Its description is above. In this example, we simply invoke /bin/sh, so there is no need to use other parameters. In the next article we'll take care of how to run commands with parameters.
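Some of you may have noticed that the author of the blog post where we found the code used the string /bin//sh, with two slashes between the directory and the executable. We'll come back to this later, but the reason is to avoid null bytes. /bin/sh is only 7 characters long, so the 8-byte immediate loaded into rbx would be padded with a zero byte. The extra slash fills the eighth byte instead, and it is harmless, because consecutive slashes in a path are treated as one.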
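The effect of the extra slash is easy to check in Python. This minimal sketch pads each candidate string to the 8 bytes of a 64-bit immediate, assuming the assembler fills a short string constant with zero bytes. It then uses posixpath.normpath to show that the two spellings name the same path.

```python
import posixpath


def as_imm64(path):
    """Bytes of `path` as a zero-padded 8-byte immediate (little-endian)."""
    raw = path.encode("ascii")
    if len(raw) > 8:
        raise ValueError("does not fit in a 64-bit register")
    return raw.ljust(8, b"\x00")


for candidate in ("/bin/sh", "/bin//sh"):
    imm = as_imm64(candidate)
    print(candidate, imm.hex(" "), "null byte!" if 0 in imm else "clean")
# /bin/sh 2f 62 69 6e 2f 73 68 00 null byte!
# /bin//sh 2f 62 69 6e 2f 2f 73 68 clean

print(posixpath.normpath("/bin//sh"))  # /bin/sh
```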
2 - Zeroing RSI
This is accomplished by XORing the register with itself. The reason the task hasn't been accomplished with:
mov rsi, 0
is to avoid null bytes: the immediate 0 would be encoded as zero bytes inside the instruction. This topic comes up quite frequently, and there's a good reason. For the time being, we religiously accept the dogma "null bytes are the root of all evil" :).
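The "no null bytes" rule can be checked mechanically. The byte string below is the shellcode hand-assembled from the standard x86-64 encodings of the instructions in the listing: 26 bytes, matching the size of __text reported by objdump. Treat it as an assumption to be verified against objdump -d on your own build. For comparison, the sketch also encodes mov eax, 0x02000000, which would take the bytes b8 00 00 00 02.

```python
# Hand-assembled from the listing (assumption: verify with `objdump -d`).
SHELLCODE = bytes.fromhex(
    "31f6"                    # xor  esi, esi
    "f7e6"                    # mul  esi
    "0fbae819"                # bts  eax, 25
    "b03b"                    # mov  al, 59
    "48bb2f62696e2f2f7368"    # mov  rbx, '/bin//sh'
    "52"                      # push rdx
    "53"                      # push rbx
    "54"                      # push rsp
    "5f"                      # pop  rdi
    "0f05"                    # syscall
)

MOV_EAX_CONSTANT = bytes.fromhex("b800000002")  # mov eax, 0x02000000


def null_offsets(code):
    """Return the offsets of all zero bytes in `code`."""
    return [i for i, b in enumerate(code) if b == 0]


print(len(SHELLCODE), null_offsets(SHELLCODE))  # 26 []
print(null_offsets(MOV_EAX_CONSTANT))           # [1, 2, 3]
```

This is also why the constant is built with bts eax, 25 rather than loaded directly.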
3 - Zeroing RAX and RDX
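This is more than good housekeeping :) RAX must be clean, because the next steps set only some of its bits. RDX becomes both the terminator pushed after the string and the third parameter of execve. mul esi takes care of both registers in two bytes.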
4 - Setting up RAX
RAX shall contain the number of the syscall.
The value 0x02000000 fits in 32 bits, so the author uses EAX instead. Creating that constant means setting bit 25 (counting from 0) of a zeroed register to 1.
59 fits in 8 bits, so the author uses AL to store the value, simply moving the number into the register.
Actually, the actions taken in points 3 and 4 show an intrinsic elegance...
5 - The first parameter to execve
The first parameter to execve is stored in the register RDI. It must be a pointer to a null-terminated string containing the path of the executable. The string is built on the stack, and its address is then popped into RDI. This is accomplished in four steps:
- load the string into a temporary register (rbx);
- push a null terminator (the zeroed rdx) onto the stack;
- push the string (remember that the stack is LIFO: the terminator ends up right after the string in memory!);
- copy the address of the string, the current top of the stack, into RDI with push rsp followed by pop rdi. These are two 1-byte instructions, one byte shorter than mov rdi, rsp.
6 - The second parameter to execve
The second parameter to execve is stored in RSI. It is supposed to point to the list of arguments passed to the new program. In our case, RSI was set to 0 in point 2, so argv is a NULL pointer and no further change is required. Strictly speaking, the man page asks for at least one element in argv, but, as the run above shows, macOS accepts a NULL argv here.
7 - The third and last parameter to execve
The last parameter to the syscall is stored in RDX, which was set to 0 in point 3. This is coherent: the third parameter is supposed to point to a null-terminated array of environment variables, and a NULL pointer here simply means that no environment is passed.
Conclusions
This post focused on analysing the "malware" - or better, the shellcode, assuming we know no assembly. Every step has been explained like we had no assembly talent whatsoever - which is good, because this is how we learn to reverse-engineer something.
We will see some more examples, and then derive a more generic strategy to use when designing shellcode.
See you next time. Have fun!