Abstract
In my previous article, Building a binary, I went through the compilation process in a rather abstract fashion.
Here I repeat the exercise on a pseudo-real-world example: a simple Objective-C program.
We will see how the compilation model works, and we will conjecture how the Mach-O utilities work by looking at the disassembled code of the supplied program.
Compiling some code
We will work with the following code:
//
// main.m
// DebugMe
//
// Created by Gabriel Biondo on 24/03/2022.
//
#import <Foundation/Foundation.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define STRING_FORMAT @"%@"
#define START_MSG @"INITIALISING RANDOMNESS"
#define GENERATE @"GENERATING RANDOM NUMBER"
#define MAXIMUM 123
@interface myNumber: NSObject
@property int value;
- (Boolean) isPerfectSquare;
- (int) nearestPerfectSquare;
- (Boolean) isPrime;
- (void) randomInit;
@end
@implementation myNumber
- (void) randomInit {
    NSLog(STRING_FORMAT, START_MSG);
    srand(time(0));
    NSLog(STRING_FORMAT, GENERATE);
    int num = rand() % MAXIMUM;
    NSLog(@"Generated number: %i", num);
    self.value = num;
}
- (Boolean) isPerfectSquare{
    double num = (double)self.value;
    double sqr = sqrt(num);
    int squareRoot = (int) sqr;
    return (squareRoot*squareRoot == self.value);
}
- (int) nearestPerfectSquare {
    int nearest = 0;
    if ([self isPerfectSquare]) {
        nearest = self.value;
    } else {
        double num = (double)self.value;
        double sqr = sqrt(num);
        int low = (int) sqr;
        int hi = low + 1;
        int lowq = low * low;
        int hiq = hi * hi;
        int deltaLow = self.value - lowq;
        int deltaHi = hiq - self.value;
        if (deltaLow < deltaHi) {
            nearest = lowq;
        }
        if (deltaHi < deltaLow) {
            nearest = hiq;
        }
        if (deltaHi == deltaLow){
            NSLog(@"The given number is exactly in the middle of two perfect squares: %i and %i. Returning the lowest", lowq,hiq);
            nearest = lowq;
        }
    }
    return nearest;
}
- (Boolean) isPrime{
    Boolean result = TRUE;
    if (self.value > 2){
        double num = (double)self.value;
        double sqr = sqrt(num);
        int threshold = 1 + (int) sqr;
        for (int i=2; i<=threshold; i++){
            if ((self.value % i) == 0) {
                result = FALSE;
            }
        }
    } else {
        result = FALSE;
    }
    return result;
}
@end
int main(int argc, const char * argv[]) {
    @autoreleasepool {
        // insert code here...
        myNumber * m = [myNumber new];
        myNumber * n = [myNumber new];
        myNumber * o = [[myNumber alloc] init];
        myNumber * p = [[myNumber alloc] init];
        myNumber * q = [[myNumber alloc] init];
        n.value = 144;
        m.value = 155;
        o.value = 20;
        p.value = 73;
        if ([n isPerfectSquare]) {
            NSLog(@"%i is a perfect square", n.value);
        } else {
            NSLog(@"%i is not a perfect square", n.value);
        }
        if ([m isPerfectSquare]) {
            NSLog(@"%i is a perfect square", m.value);
        } else {
            NSLog(@"%i is not a perfect square", m.value);
        }
        int k = [m nearestPerfectSquare];
        NSLog(@"The nearest square to %i is %i", m.value, k);
        int h = [n nearestPerfectSquare];
        NSLog(@"The nearest square to %i is %i", n.value, h);
        int j = [o nearestPerfectSquare];
        NSLog(@"The nearest square to %i is %i", o.value, j);
        int i = [p nearestPerfectSquare];
        NSLog(@"The nearest square to %i is %i", p.value, i);
        if ([p isPrime]) {
            NSLog(@"%i is prime", p.value);
        } else {
            NSLog(@"%i is not prime", p.value);
        }
        if ([m isPrime]) {
            NSLog(@"%i is prime", m.value);
        } else {
            NSLog(@"%i is not prime", m.value);
        }
        [q randomInit];
    }
    return 0;
}
This is a very simple Objective-C program. We define a class and use it, without interacting with AppKit or other proprietary frameworks (well, except for NSLog, which comes from Foundation).
The typical output for this program is something like:
2022-03-24 15:12:57.357 myNumber[27549:2847719] 144 is a perfect square
2022-03-24 15:12:57.358 myNumber[27549:2847719] 155 is not a perfect square
2022-03-24 15:12:57.358 myNumber[27549:2847719] The nearest square to 155 is 144
2022-03-24 15:12:57.358 myNumber[27549:2847719] The nearest square to 144 is 144
2022-03-24 15:12:57.358 myNumber[27549:2847719] The nearest square to 20 is 16
2022-03-24 15:12:57.358 myNumber[27549:2847719] The nearest square to 73 is 81
2022-03-24 15:12:57.358 myNumber[27549:2847719] 73 is prime
2022-03-24 15:12:57.358 myNumber[27549:2847719] 155 is not prime
2022-03-24 15:12:57.358 myNumber[27549:2847719] INITIALISING RANDOMNESS
2022-03-24 15:12:57.358 myNumber[27549:2847719] GENERATING RANDOM NUMBER
2022-03-24 15:12:57.358 myNumber[27549:2847719] Generated number: 89
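Apart from the Objective-C syntax, the arithmetic in isPerfectSquare, nearestPerfectSquare and isPrime is plain C, so the logged results can be checked by hand. The sketch below is my own transcription into free functions. The names is_perfect_square, nearest_perfect_square, is_prime and the helper isqrt are hypothetical and are not part of the program. An integer square root replaces sqrt() so that no math library is needed; for the small non-negative values used here it gives the same truncated result as (int) sqrt(x).

Writing the logic out this way exposes two details of the original:
- deltaLow + deltaHi always equals 2 * low + 1, which is odd, so the "exactly in the middle" branch can never run.
- isPrime returns FALSE for every value up to 2, so the prime 2 would be reported as not prime.

```c
#include <stdbool.h>

/* Truncated integer square root: the largest r with r * r <= value.
   Stands in for (int) sqrt((double) value); assumes a non-negative value
   small enough that (r + 1) * (r + 1) cannot overflow. */
static int isqrt(int value) {
    int r = 0;
    while ((r + 1) * (r + 1) <= value) {
        r++;
    }
    return r;
}

/* Mirrors -[myNumber isPerfectSquare]. */
bool is_perfect_square(int value) {
    int root = isqrt(value);
    return root * root == value;
}

/* Mirrors -[myNumber nearestPerfectSquare]. */
int nearest_perfect_square(int value) {
    if (is_perfect_square(value)) {
        return value;
    }
    int low = isqrt(value);
    int hi = low + 1;
    int delta_low = value - low * low;
    int delta_hi = hi * hi - value;
    /* delta_low + delta_hi == 2 * low + 1 is odd, so the deltas are never
       equal: the tie branch of the original method is dead code. */
    return (delta_hi < delta_low) ? hi * hi : low * low;
}

/* Mirrors -[myNumber isPrime], including its handling of values <= 2. */
bool is_prime(int value) {
    if (value <= 2) {
        return false;   /* as in the original: 2 is reported as not prime */
    }
    int threshold = 1 + isqrt(value);
    for (int i = 2; i <= threshold; i++) {
        if (value % i == 0) {
            return false;
        }
    }
    return true;
}
```

With these functions, nearest_perfect_square returns 144, 144, 16 and 81 for 155, 144, 20 and 73, which matches the log above.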
So far, so good. Now let's review the compilation process.
Compilation process
As shown in the aforementioned article, this code undergoes all stages of compilation before being transformed into an executable program.
Preprocessing
To see the result of preprocessing, we invoke:
clang -E -framework Foundation main.m
and obtain a very long text. In short, every imported or included header has been pasted in, and every macro has been replaced with its definition. For instance, the randomInit method becomes:
- (void) randomInit {
    NSLog(@"%@", @"INITIALISING RANDOMNESS");
    srand(time(0));
    NSLog(@"%@", @"GENERATING RANDOM NUMBER");
    int num = rand() % 123;
    NSLog(@"Generated number: %i", num);
    self.value = num;
}
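The preprocessor can be made to show its own work. The following plain-C sketch captures, as string literals, the text of an expression before and after macro replacement. C string literals stand in for the @"..." literals, and STR and XSTR are the usual two-level stringification helpers, not part of main.m.
- expanded_expression holds the randomInit expression with MAXIMUM already replaced by 123.
- A single-level STR(MAXIMUM) yields the macro name itself, because the # operator does not expand its operand.

```c
#include <stdlib.h>

/* Plain-C stand-ins for the macros of main.m (C string literals replace
   the @"..." Objective-C literals). */
#define MAXIMUM 123
#define START_MSG "INITIALISING RANDOMNESS"

/* Two-level stringification: XSTR macro-expands its argument first,
   then STR turns the resulting tokens into a string literal. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* The randomInit expression as the compiler proper sees it, after replacement. */
static const char expanded_expression[] = XSTR(rand() % MAXIMUM);

/* One level only: # suppresses expansion, so this is literally "MAXIMUM". */
static const char unexpanded_name[] = STR(MAXIMUM);

/* An object-like macro is simply replaced by its string literal. */
static const char start_message[] = START_MSG;

/* The same computation as randomInit after preprocessing: rand() % 123. */
int random_below_maximum(void) {
    return rand() % MAXIMUM;
}
```

Running clang -E on this file should show the same replacements, performed before the compiler proper ever sees the code.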
Compilation
To obtain the results of the compilation phase, we invoke clang with the -S switch. This stops after generating assembly code and writes it to main.s:
clang -S -framework Foundation main.m
We will not analyse the results now, because we are going to debug and disassemble the executable in greater detail later.
Assembling
The assembling phase can be observed by stopping clang with the -c switch, which produces the object file main.o:
clang -c -framework Foundation main.m
Linking
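Finally, linking takes place with the command below. This is the step where -framework Foundation actually matters, since it tells the linker to link the executable against the Foundation framework: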
clang -framework Foundation main.m -o myNumber
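This is an unusual program, because everything is in a single file. I suggest the reader build a properly structured program (for instance, with the class interface in a header and its implementation in a separate .m file) and work out how the compilation process would work in that case.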
Some binary analysis
Meet objdump. According to its man page:
The llvm-objdump utility prints the contents of object files and final linked images named on the command line.
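We can start with the disassemble-all option (the -D or --disassemble-all switch). The name says it all: it disassembles the whole binary, data sections included. This is why sections such as __TEXT,__cstring also show up below as "disassembly". To disassemble only the sections that contain executable code, use -d (--disassemble) instead.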
We issue the command
objdump -D myNumber > main.disass
to save the result of the disassembly in the file main.disass.
Let’s take a look at the result (abridged):
Disassembly of section __TEXT,__text:
00000001000036f0 <-[myNumber randomInit]>:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
0000000100003780 <-[myNumber isPerfectSquare]>:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
0000000100003800 <-[myNumber nearestPerfectSquare]>:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
0000000100003930 <-[myNumber isPrime]>:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
00000001000039f0 <-[myNumber value]>:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
0000000100003a10 <-[myNumber setValue:]>:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
0000000100003a30 <_main>:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
Disassembly of section __TEXT,__stubs:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
Disassembly of section __TEXT,__stub_helper:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
Disassembly of section __TEXT,__cstring:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
Disassembly of section __TEXT,__objc_methname:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
Disassembly of section __TEXT,__objc_classname:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
Disassembly of section __TEXT,__objc_methtype:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
Disassembly of section __TEXT,__unwind_info:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
Disassembly of section __DATA_CONST,__got:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
Disassembly of section __DATA_CONST,__cfstring:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
Disassembly of section __DATA_CONST,__objc_classlist:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
Disassembly of section __DATA_CONST,__objc_imageinfo:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
Disassembly of section __DATA,__la_symbol_ptr:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
Disassembly of section __DATA,__objc_const:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
Disassembly of section __DATA,__objc_selrefs:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
Disassembly of section __DATA,__objc_classrefs:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
Disassembly of section __DATA,__objc_ivar:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
Disassembly of section __DATA,__objc_data:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
Disassembly of section __DATA,__data:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
Sections
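In the above listing we find several sections. Each one is named as segment,section. __TEXT, __DATA_CONST and __DATA are segments, and the name after the comma is a section inside that segment. They are: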
- __TEXT,__text
- __TEXT,__stubs
- __TEXT,__stub_helper
- __TEXT,__cstring
- __TEXT,__objc_methname
- __TEXT,__objc_classname
- __TEXT,__objc_methtype
- __TEXT,__unwind_info
- __DATA_CONST,__got
- __DATA_CONST,__cfstring
- __DATA_CONST,__objc_classlist
- __DATA_CONST,__objc_imageinfo
- __DATA,__la_symbol_ptr
- __DATA,__objc_const
- __DATA,__objc_selrefs
- __DATA,__objc_classrefs
- __DATA,__objc_ivar
- __DATA,__objc_data
- __DATA,__data
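and, yes, the order is important: it is the order in which the sections are laid out in the file and in memory, at increasing addresses.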
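We observe that the first section, __TEXT,__text, contains the code of the methods
[myNumber randomInit]
[myNumber isPerfectSquare]
[myNumber nearestPerfectSquare]
[myNumber isPrime]
[myNumber value]
[myNumber setValue:]
_main
In other words, it contains the actual program. Note that value and setValue: do not appear in the source code: they are the accessors the compiler synthesized for the value property.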
Scrolling down the disassembled code, we find the section __DATA,__objc_const. Here we can see some interesting labels:
__OBJC_METACLASS_RO_$_myNumber
__OBJC_$_INSTANCE_METHODS_myNumber
__OBJC_$_INSTANCE_VARIABLES_myNumber
__OBJC_$_PROP_LIST_myNumber
__OBJC_CLASS_RO_$_myNumber
These are clearly related to the class we have defined, and the names are self-explanatory. They hold the read-only (RO) data of the metaclass and of the class, plus the lists of the class's instance methods, instance variables and properties.
The entries of the instance method list point back to the method implementations we found in the __TEXT,__text section.
Strictly speaking, we abused objdump to obtain the list above. A summary of the section headers can be obtained directly with any of the switches -h, --headers or --section-headers. See the example below.
gbiondo@tripleX Debugging % objdump --headers myNumber
myNumber: file format mach-o 64-bit x86-64
Sections:
Idx Name Size VMA Type
0 __text 00000689 00000001000036f0 TEXT
1 __stubs 00000030 0000000100003d7a TEXT
2 __stub_helper 00000060 0000000100003dac TEXT
3 __cstring 00000118 0000000100003e0c DATA
4 __objc_methname 0000005a 0000000100003f24 DATA
5 __objc_classname 00000009 0000000100003f7e DATA
6 __objc_methtype 00000025 0000000100003f87 DATA
7 __unwind_info 00000048 0000000100003fac DATA
8 __got 00000010 0000000100004000 DATA
9 __cfstring 00000140 0000000100004010 DATA
10 __objc_classlist 00000008 0000000100004150 DATA
11 __objc_imageinfo 00000008 0000000100004158 DATA
12 __la_symbol_ptr 00000040 0000000100008000 DATA
13 __objc_const 00000168 0000000100008040 DATA
14 __objc_selrefs 00000030 00000001000081a8 DATA
15 __objc_classrefs 00000008 00000001000081d8 DATA
16 __objc_ivar 00000008 00000001000081e0 DATA
17 __objc_data 00000050 00000001000081e8 DATA
18 __data 00000008 0000000100008238 DATA
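The claim that the order matters can be checked against the Size and VMA columns, both of which are hexadecimal. Within a segment, each section starts where the previous one ends, apart from a few bytes of what is most likely alignment padding. The sketch below copies the first five __TEXT rows of this listing into a table and computes the gaps between consecutive sections. The names section_header, gap_after and print_layout are illustrative, not part of any tool. For example, __text ends at 0x100003d79, one byte before __stubs begins, while __stub_helper ends exactly where __cstring starts.

```c
#include <stdint.h>
#include <stdio.h>

/* One row of `objdump --headers`; Size and VMA are printed in hexadecimal. */
struct section_header {
    const char *name;
    uint64_t size;
    uint64_t vma;
};

/* The first __TEXT sections, copied from the listing above. */
static const struct section_header text_sections[] = {
    { "__text",          0x689, 0x1000036f0 },
    { "__stubs",         0x030, 0x100003d7a },
    { "__stub_helper",   0x060, 0x100003dac },
    { "__cstring",       0x118, 0x100003e0c },
    { "__objc_methname", 0x05a, 0x100003f24 },
};

/* Bytes between the end of section i and the start of section i + 1
   (the caller guarantees i + 1 is a valid index); -1 if they overlap. */
long long gap_after(const struct section_header *s, size_t i) {
    uint64_t end = s[i].vma + s[i].size;
    if (s[i + 1].vma < end) {
        return -1;
    }
    return (long long)(s[i + 1].vma - end);
}

/* Prints each section's [start, end) range and the gap that follows it. */
void print_layout(const struct section_header *s, size_t n) {
    for (size_t i = 0; i < n; i++) {
        printf("%-16s 0x%llx-0x%llx", s[i].name,
               (unsigned long long)s[i].vma,
               (unsigned long long)(s[i].vma + s[i].size));
        if (i + 1 < n) {
            printf("  gap %lld", gap_after(s, i));
        }
        putchar('\n');
    }
}
```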
Running objdump with -s (--full-contents) instead yields a hex dump of the file, organised by section. The VMA column deserves a word: it is the virtual memory address at which each section is mapped in the process's address space. We will come back to this concept in subsequent articles. For the moment, the usual Wikipedia entry will do: en.wikipedia.org/wiki/Virtual_memory.
It is instructive to compare these section headers with the output of size -ml. This command prints the Mach-O segments and sections (-m) together with their addresses and file offsets (-l):
gbiondo@tripleX Debugging % size -ml myNumber
Segment __PAGEZERO: 4294967296 (zero fill) (vmaddr 0x0 fileoff 0)
Segment __TEXT: 16384 (vmaddr 0x100000000 fileoff 0)
Section __text: 1673 (addr 0x1000036f0 offset 14064)
Section __stubs: 48 (addr 0x100003d7a offset 15738)
Section __stub_helper: 96 (addr 0x100003dac offset 15788)
Section __cstring: 280 (addr 0x100003e0c offset 15884)
Section __objc_methname: 90 (addr 0x100003f24 offset 16164)
Section __objc_classname: 9 (addr 0x100003f7e offset 16254)
Section __objc_methtype: 37 (addr 0x100003f87 offset 16263)
Section __unwind_info: 72 (addr 0x100003fac offset 16300)
total 2305
Segment __DATA_CONST: 16384 (vmaddr 0x100004000 fileoff 16384)
Section __got: 16 (addr 0x100004000 offset 16384)
Section __cfstring: 320 (addr 0x100004010 offset 16400)
Section __objc_classlist: 8 (addr 0x100004150 offset 16720)
Section __objc_imageinfo: 8 (addr 0x100004158 offset 16728)
total 352
Segment __DATA: 16384 (vmaddr 0x100008000 fileoff 32768)
Section __la_symbol_ptr: 64 (addr 0x100008000 offset 32768)
Section __objc_const: 360 (addr 0x100008040 offset 32832)
Section __objc_selrefs: 48 (addr 0x1000081a8 offset 33192)
Section __objc_classrefs: 8 (addr 0x1000081d8 offset 33240)
Section __objc_ivar: 8 (addr 0x1000081e0 offset 33248)
Section __objc_data: 80 (addr 0x1000081e8 offset 33256)
Section __data: 8 (addr 0x100008238 offset 33336)
total 576
Segment __LINKEDIT: 16384 (vmaddr 0x10000c000 fileoff 49152)
total 4295032832
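The two listings agree once two conventions are taken into account. First, objdump prints sizes in hexadecimal while size prints them in decimal: 0x689 is 1673 bytes for __text. Second, file offsets follow from addresses. In this binary, a section's VMA minus its segment's vmaddr, plus the segment's fileoff, gives the section's offset in the file.

__PAGEZERO is a 4 GiB (0x100000000 bytes) zero-fill segment at address 0 with no file content, which is why __TEXT starts at 0x100000000. The grand total of 4295032832 is simply __PAGEZERO plus four 16384-byte segments.

Below is a minimal sketch of that arithmetic. The segment struct and vma_to_file_offset are illustrative names, filled with values from the output above.

```c
#include <stdint.h>

/* A segment as reported by `size -ml`: where it is mapped in memory (vmaddr)
   and where its bytes start in the file (fileoff). */
struct segment {
    uint64_t vmaddr;
    uint64_t fileoff;
};

/* Values copied from the size -ml output above. */
static const struct segment text_segment       = { 0x100000000, 0 };
static const struct segment data_const_segment = { 0x100004000, 16384 };
static const struct segment data_segment       = { 0x100008000, 32768 };

/* __PAGEZERO reserves the first 4 GiB of the address space (zero fill,
   nothing in the file), so __TEXT begins right after it. */
static const uint64_t pagezero_size = 4294967296ULL;

/* For an address inside `seg`, the file offset lies as far from the start
   of the segment in the file as the address lies from seg.vmaddr. */
uint64_t vma_to_file_offset(struct segment seg, uint64_t vma) {
    return vma - seg.vmaddr + seg.fileoff;
}
```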
We run
objdump --full-contents myNumber > myNumber.richHexDump
to save the full hex dump in the file myNumber.richHexDump. Reading it helps us make sense of some of the other sections. In particular, we see:
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
Contents of section __TEXT,__cstring:
100003e0c 25400049 4e495449 414c4953 494e4720 %@.INITIALISING
100003e1c 52414e44 4f4d4e45 53530047 454e4552 RANDOMNESS.GENER
100003e2c 4154494e 47205241 4e444f4d 204e554d ATING RANDOM NUM
100003e3c 42455200 47656e65 72617465 64206e75 BER.Generated nu
100003e4c 6d626572 3a202569 00546865 20676976 mber: %i.The giv
100003e5c 656e206e 756d6265 72206973 20657861 en number is exa
100003e6c 63746c79 20696e20 74686520 6d696464 ctly in the midd
100003e7c 6c65206f 66207477 6f207065 72666563 le of two perfec
100003e8c 74207371 75617265 733a2025 6920616e t squares: %i an
100003e9c 64202569 2e205265 7475726e 696e6720 d %i. Returning
100003eac 74686520 6c6f7765 73740025 69206973 the lowest.%i is
100003ebc 20612070 65726665 63742073 71756172 a perfect squar
100003ecc 65002569 20697320 6e6f7420 61207065 e.%i is not a pe
100003edc 72666563 74207371 75617265 00546865 rfect square.The
100003eec 206e6561 72657374 20737175 61726520 nearest square
100003efc 746f2025 69206973 20256900 25692069 to %i is %i.%i i
100003f0c 73207072 696d6500 25692069 73206e6f s prime.%i is no
100003f1c 74207072 696d6500 t prime.
Contents of section __TEXT,__objc_methname:
100003f24 73657456 616c7565 3a007661 6c756500 setValue:.value.
100003f34 69735065 72666563 74537175 61726500 isPerfectSquare.
100003f44 72616e64 6f6d496e 6974006e 65617265 randomInit.neare
100003f54 73745065 72666563 74537175 61726500 stPerfectSquare.
100003f64 69735072 696d6500 5f76616c 75650054 isPrime._value.T
100003f74 692c565f 76616c75 6500 i,V_value.
Contents of section __TEXT,__objc_classname:
100003f7e 6d794e75 6d626572 00 myNumber.
---8<---8<------8<---8<---SNIP---8<---8<------8<---8<---
Thus the section __TEXT,__cstring contains, unsurprisingly, the strings used in the program.

The section __TEXT,__objc_methname contains the names of the methods of the class myNumber. Alongside them are _value, the instance variable behind the property, and Ti,V_value, the property's attribute string (type int, backing variable _value).

The names of the classes defined in the program are in the section __TEXT,__objc_classname. This is not an Objective-C tutorial, so we won't discuss methods such as setValue: here.
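These sections are string tables: NUL-terminated strings packed back to back, just as a C compiler lays out string literals. The sketch below copies the 90 bytes of __TEXT,__objc_methname (0x5a, matching the Size column of the section headers) into a C array and walks it. The helpers string_at and count_strings are hypothetical.

The same walk over the 280 bytes of __TEXT,__cstring finds ten strings, although the source uses string literals eighteen times. Identical literals such as @"%@" and @"The nearest square to %i is %i" are stored only once. This is also consistent with the 320-byte __DATA_CONST,__cfstring section holding ten 32-byte constant-string records, one per distinct literal.

```c
#include <stddef.h>
#include <string.h>

/* The 90 bytes of __TEXT,__objc_methname from the hex dump above: strings
   stored back to back, each NUL-terminated (the last NUL is the literal's own). */
static const char objc_methname[] =
    "setValue:\0value\0isPerfectSquare\0randomInit\0"
    "nearestPerfectSquare\0isPrime\0_value\0Ti,V_value";

/* Returns the index-th string of a table of NUL-terminated strings,
   or NULL if there are fewer strings than that. */
const char *string_at(const char *table, size_t size, size_t index) {
    const char *p = table;
    const char *end = table + size;
    while (p < end) {
        const char *nul = memchr(p, '\0', (size_t)(end - p));
        if (nul == NULL) {
            return NULL;            /* unterminated tail: stop here */
        }
        if (index-- == 0) {
            return p;
        }
        p = nul + 1;                /* the next string starts after the NUL */
    }
    return NULL;
}

/* Counts the strings in the table. */
size_t count_strings(const char *table, size_t size) {
    size_t n = 0;
    while (string_at(table, size, n) != NULL) {
        n++;
    }
    return n;
}
```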
It is time to look at the symbol table of this program. A symbol table is a data structure, used by compilers and linkers, that records information about each symbol of the program (functions, variables, classes and so on), such as its address, its section and its visibility. More information can be found on the Wikipedia page en.wikipedia.org/wiki/Symbol_table. We'll come back to this in another article.
To obtain the symbol table, we simply use the switch -t (--syms). In this case, we have:
gbiondo@tripleX Debugging % objdump --syms myNumber
myNumber: file format mach-o 64-bit x86-64
SYMBOL TABLE:
00000001000036f0 l F __TEXT,__text -[myNumber randomInit]
0000000100003780 l F __TEXT,__text -[myNumber isPerfectSquare]
0000000100003800 l F __TEXT,__text -[myNumber nearestPerfectSquare]
0000000100003930 l F __TEXT,__text -[myNumber isPrime]
00000001000039f0 l F __TEXT,__text -[myNumber value]
0000000100003a10 l F __TEXT,__text -[myNumber setValue:]
0000000100008040 l O __DATA,__objc_const __OBJC_METACLASS_RO_$_myNumber
0000000100008088 l O __DATA,__objc_const __OBJC_$_INSTANCE_METHODS_myNumber
0000000100008120 l O __DATA,__objc_const __OBJC_$_INSTANCE_VARIABLES_myNumber
0000000100008148 l O __DATA,__objc_const __OBJC_$_PROP_LIST_myNumber
0000000100008160 l O __DATA,__objc_const __OBJC_CLASS_RO_$_myNumber
00000001000081e0 l O __DATA,__objc_ivar _OBJC_IVAR_$_myNumber._value
0000000100008238 l O __DATA,__data __dyld_private
0000000100008210 g O __DATA,__objc_data _OBJC_CLASS_$_myNumber
00000001000081e8 g O __DATA,__objc_data _OBJC_METACLASS_$_myNumber
0000000100000000 g F __TEXT,__text __mh_execute_header
0000000100003a30 g F __TEXT,__text _main
0000000000000000 *UND* _NSLog
0000000000000000 *UND* _OBJC_CLASS_$_NSObject
0000000000000000 *UND* _OBJC_METACLASS_$_NSObject
0000000000000000 *UND* ___CFConstantStringClassReference
0000000000000000 *UND* __objc_empty_cache
0000000000000000 *UND* _objc_alloc_init
0000000000000000 *UND* _objc_autoreleasePoolPop
0000000000000000 *UND* _objc_autoreleasePoolPush
0000000000000000 *UND* _objc_msgSend
0000000000000000 *UND* _objc_opt_new
0000000000000000 *UND* _rand
0000000000000000 *UND* _srand
0000000000000000 *UND* _time
0000000000000000 *UND* dyld_stub_binder
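The addresses of the local objects in __DATA,__objc_const also tell us how big they are, since each one runs up to the next symbol. The last one runs up to the end of the section at 0x1000081a8, that is, 0x100008040 + 0x168.

Assume the 64-bit layouts used by Apple's open-source objc4 runtime (objc-runtime-new.h), sketched below with simplified struct and field names of my own. Under that assumption, the sizes match exactly:
- Both class_ro records take 72 bytes.
- The method list for the six methods (the four we wrote plus the synthesized value and setValue:) takes 8 + 6 × 24 = 152 bytes, which is exactly the distance between __OBJC_$_INSTANCE_METHODS_myNumber and __OBJC_$_INSTANCE_VARIABLES_myNumber.
- The one-entry ivar list takes 40 bytes.
- The one-entry property list takes 24 bytes.

The checks assume an LP64 target (8-byte pointers).

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified 64-bit layouts modelled on objc4's objc-runtime-new.h;
   the names are mine, only field sizes and order matter here. */

struct list_header {              /* shared by method, ivar and property lists */
    uint32_t entsize_and_flags;   /* size of one entry, plus flag bits */
    uint32_t count;               /* number of entries that follow */
};

struct method_entry {             /* one method */
    const char *name;             /* selector name, e.g. "isPrime" */
    const char *types;            /* type encoding string */
    void *imp;                    /* implementation in __TEXT,__text */
};

struct ivar_entry {               /* one instance variable */
    int32_t *offset;              /* points into __DATA,__objc_ivar */
    const char *name;             /* "_value" */
    const char *type;             /* "i" for int */
    uint32_t alignment;
    uint32_t size;
};

struct property_entry {           /* one property */
    const char *name;             /* "value" */
    const char *attributes;       /* "Ti,V_value" */
};

struct class_ro {                 /* read-only data of a class or metaclass */
    uint32_t flags;
    uint32_t instance_start;
    uint32_t instance_size;
    uint32_t reserved;            /* present on 64-bit targets only */
    const uint8_t *ivar_layout;
    const char *name;             /* "myNumber" */
    const void *base_methods;
    const void *base_protocols;
    const void *ivars;
    const uint8_t *weak_ivar_layout;
    const void *base_properties;
};

/* Bytes taken by a list holding `count` entries of `entry_size` bytes. */
size_t list_size(size_t entry_size, size_t count) {
    return sizeof(struct list_header) + count * entry_size;
}
```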
Apart from the methods discussed previously, the table lists a number of undefined (*UND*) symbols: functions and data that the program uses but does not define itself. At some point, this program will call _objc_alloc_init and the other functions prefixed with _objc, as well as _rand, _srand and _time.
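The l, g and *UND* markers in the symbol table (local, global and undefined symbols) map onto three kinds of C declarations. Here is a hedged sketch, with function names made up for illustration. Compiling it with clang -c and inspecting the object file with objdump --syms should show:
- _remainder_mod as a local (l) symbol;
- _draw_number as a global (g) symbol;
- _rand as undefined (*UND*).

On Mach-O, every C name gets a leading underscore, which is why the table above lists _main, _rand and _time. With optimisations enabled, the static helper may be inlined and disappear altogether.

```c
#include <stdlib.h>

/* Internal linkage: expected to appear as the local (l) symbol _remainder_mod. */
static int remainder_mod(int value, int modulus) {
    return value % modulus;
}

/* External definition: expected to appear as the global (g) symbol _draw_number. */
int draw_number(int maximum) {
    /* rand() is only declared by <stdlib.h>, not defined here, so _rand is
       left undefined (*UND*) for the linker and, ultimately, dyld to resolve. */
    return remainder_mod(rand(), maximum);
}
```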
We have found many references to routines that we took for granted, as if they were native instructions. _NSLog is one example, but it is actually a function defined somewhere else, and it is now time to understand where we can find it. The switch --dylibs-used lists the shared libraries used by a linked file. In this case, we also need to specify that we are interested in a Mach-O file, using the switch -m. Dylibs are explained at the address: developer.apple.com/library/archive/documen... We have:
gbiondo@tripleX Debugging % objdump --dylibs-used -m myNumber
myNumber:
/System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (compatibility version 300.0.0, current version 1856.105.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)
/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1856.105.0)
/usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)
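The version numbers in this listing are not free-form strings. The linker records each one as a single 32-bit number of the form X.Y.Z: X in the upper 16 bits, and Y and Z in one byte each. This is why ld's documentation for -current_version limits X to 65535 and Y and Z to 255. Below is a minimal sketch of that encoding; pack_version, parse_version and format_version are illustrative names.

```c
#include <stdint.h>
#include <stdio.h>

/* Packs a dylib version X.Y.Z into 32 bits: X in bits 16-31,
   Y in bits 8-15, Z in bits 0-7. */
uint32_t pack_version(unsigned x, unsigned y, unsigned z) {
    return (uint32_t)(((x & 0xffffu) << 16) | ((y & 0xffu) << 8) | (z & 0xffu));
}

/* Parses "X.Y.Z" as printed by objdump; returns 0 if malformed or out of range. */
uint32_t parse_version(const char *text) {
    unsigned x, y, z;
    if (sscanf(text, "%u.%u.%u", &x, &y, &z) != 3 ||
        x > 0xffffu || y > 0xffu || z > 0xffu) {
        return 0;
    }
    return pack_version(x, y, z);
}

/* Writes a packed version back as "X.Y.Z"; returns what snprintf returns. */
int format_version(uint32_t version, char *buf, size_t len) {
    return snprintf(buf, len, "%u.%u.%u",
                    (unsigned)(version >> 16),
                    (unsigned)((version >> 8) & 0xffu),
                    (unsigned)(version & 0xffu));
}
```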
The first library listed is Apple's Foundation framework. Information can be found at the URL developer.apple.com/documentation/foundation.
libSystem is the system library. It bundles, by re-exporting them, other libraries such as libc, libm (the math library), libpthread (POSIX threads), and so on.
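A practical way to see which image really provides a function is to ask the dynamic loader at run time. The sketch below uses a hypothetical helper, image_defining. It looks a symbol up with dlsym and then asks dladdr which loaded image contains that address. Symbol names are passed without the Mach-O leading underscore.

On macOS, I would expect rand to be reported in one of the libraries under /usr/lib/system/ that libSystem re-exports, rather than in libSystem.B.dylib itself. On Linux with glibc it reports the C library. There, _GNU_SOURCE must be defined before any include to get dladdr and RTLD_DEFAULT, and older glibc versions also need -ldl.

```c
#define _GNU_SOURCE   /* glibc: exposes dladdr() and RTLD_DEFAULT; ignored elsewhere */
#include <dlfcn.h>
#include <stddef.h>

/* Returns the path of the loaded image that defines `symbol`, or NULL if
   the dynamic loader cannot find it. Pass "rand", not "_rand". */
const char *image_defining(const char *symbol) {
    void *address = dlsym(RTLD_DEFAULT, symbol);
    if (address == NULL) {
        return NULL;
    }
    Dl_info info;
    if (dladdr(address, &info) == 0) {
        return NULL;
    }
    return info.dli_fname;   /* path of the image containing the address */
}
```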
Seeing what this library contains may be a little tricky. On older macOS versions (up until Big Sur), one could run the following command against a copy of the library kept in a local shared_cache directory:
gbiondo@vecho ~ % objdump -m --dylibs-used shared_cache/usr/lib/libSystem.B.dylib
shared_cache/usr/lib/libSystem.B.dylib:
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.50.1)
/usr/lib/system/libcache.dylib (compatibility version 1.0.0, current version 83.0.0, reexport)
/usr/lib/system/libcommonCrypto.dylib (compatibility version 1.0.0, current version 60178.40.2, reexport)
... ... ... ... ... ... ... ...
/usr/lib/system/libxpc.dylib (compatibility version 1.0.0, current version 2038.40.38, reexport)
However, in newer versions (Monterey) this doesn’t work anymore:
gbiondo@tripleX Debugging % objdump -m --dylibs-used shared_cache/usr/lib/libSystem.B.dylib
/Library/Developer/CommandLineTools/usr/bin/objdump: error: 'shared_cache/usr/lib/libSystem.B.dylib': No such file or directory
If you are wondering: the list of all libraries obtained before can be found in the file libSystem.B.dylib.included in the documentation.
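The reason behind this discontinuity is that, on recent macOS versions, Apple no longer ships the system dynamic libraries as individual files on disk. They live only inside the dyld shared cache, so a standalone copy such as the one used above must first be extracted from the cache. A discussion on the developer forums explains the effects of this choice in more detail: developer.apple.com/forums/thread/655588.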
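We also linked against Foundation, which is built on top of CoreFoundation, so the CoreFoundation framework is in as well. The program also references a CoreFoundation symbol directly: ___CFConstantStringClassReference, which the @"..." string literals rely on. Finally, since the program contains Objective-C code, the Objective-C runtime is linked too (libobjc.A.dylib). It provides _objc_msgSend and the other _objc functions we met in the symbol table.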
Conclusions
Here I had multiple objectives. First, I wanted to illustrate in practical terms what I explained in my previous article. I also wanted to show how the objdump command helps us with binary analysis. Finally, I have introduced the hell of dynamic libraries.