Working with Fat Files: extracting a binary

MacOS Binary Analysis 101

Abstract

In this article, we will show how to obtain an architecture-specific executable from its fat (universal) version.

To follow everything in this article, you need a solid grasp of the Mach-O binary file format. Should you need to fill some gaps, I'd strongly recommend reading Parsing Mach-O files and the pages linked from it.

Extracting the executable for an architecture

In the following examples, we have copied a well-known fat binary, namely mv, into a working directory:

gbiondo@tripleX temp % ls -al
total 272
drwxr-xr-x   3 gbiondo  staff      96 17 Mar 15:56 .
drwxr-xr-x  20 gbiondo  staff     640 17 Mar 15:55 ..
-rwxr-xr-x   1 gbiondo  staff  135520 17 Mar 15:56 mv

We can quickly see that it's a fat binary: it contains both the executable for the 'old' x86_64 architecture and the one for the newer arm64e architecture:

gbiondo@tripleX temp % file mv 
mv: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]
mv (for architecture x86_64):    Mach-O 64-bit executable x86_64
mv (for architecture arm64e):    Mach-O 64-bit executable arm64e

We want to extract the x86_64 Mach-O file from it; this would result in a purely x86_64 executable.

The most elegant method I have found to date is the lipo utility. From its man page:

The lipo tool creates or operates on "universal" (multi-architecture) files. Generally, lipo reads a single input file and writes to a single output file, although some commands and options accept multiple input files. lipo will only ever write to a single output file, and input files are never modified in place.

The lipo tool supports several commands for creating universal files from single-architecture files, extracting single-architecture files from universal files, and displaying architecture information.

Furthermore, lipo can only perform one such command at a time, although some command flags may appear more than once. Some commands support additional options that can be used with that command. In addition, there are global options that are supported by multiple commands.

The arch_type arguments may be any of the supported architecture names listed in the man page arch(3).

So, to extract the x86_64 Mach-O part, we can remove the arm64e slice as follows:

gbiondo@tripleX temp % lipo mv -remove arm64e -output mv2 
gbiondo@tripleX temp % ./mv2 mv2 mv_X64
gbiondo@tripleX temp % ls -al
total 416
drwxr-xr-x   4 gbiondo  staff     128 17 Mar 16:11 .
drwxr-xr-x  20 gbiondo  staff     640 17 Mar 15:55 ..
-rwxr-xr-x   1 gbiondo  staff  135520 17 Mar 15:56 mv
-rwxr-xr-x   1 gbiondo  staff   70320 17 Mar 16:10 mv_X64

This shows two things:

  1. we have created a smaller file, originally named mv2;
  2. that file works exactly as mv would have: it renamed itself from mv2 to mv_X64.

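Because we used -remove, the output is still a universal file, now with a single architecture; lipo's -thin x86_64 option would instead write a plain (thin) Mach-O file. The file utility confirms that only x86_64 code is left: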

gbiondo@tripleX temp % file mv_X64 
mv_X64: Mach-O universal binary with 1 architecture: [x86_64:Mach-O 64-bit executable x86_64]
mv_X64 (for architecture x86_64):    Mach-O 64-bit executable x86_64

lipo can do much more than this. For instance, we could have issued:

gbiondo@tripleX temp % lipo -info mv*
Architectures in the fat file: mv are: x86_64 arm64e 
Architectures in the fat file: mv_X64 are: x86_64

or

gbiondo@tripleX temp % lipo  -detailed_info mv*   
Fat header in: mv
fat_magic 0xcafebabe
nfat_arch 2
architecture x86_64
    cputype CPU_TYPE_X86_64
    cpusubtype CPU_SUBTYPE_X86_64_ALL
    capabilities 0x0
    offset 16384
    size 53936
    align 2^14 (16384)
architecture arm64e
    cputype CPU_TYPE_ARM64
    cpusubtype CPU_SUBTYPE_ARM64E
    capabilities PTR_AUTH_VERSION USERSPACE 0
    offset 81920
    size 53600
    align 2^14 (16384)
Fat header in: mv_X64
fat_magic 0xcafebabe
nfat_arch 1
architecture x86_64
    cputype CPU_TYPE_X86_64
    cpusubtype CPU_SUBTYPE_X86_64_ALL
    capabilities 0x0
    offset 16384
    size 53936
    align 2^14 (16384)

This last listing shows the same information we'd obtain with otool -f -v mv*; we strongly encourage the reader to experiment with both tools.
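
The fields printed above live in the fat header at the very start of the file. As a minimal sketch, assuming the 32-bit fat layout (a big-endian magic and architecture count, followed by one 20-byte entry per architecture), they could be read with a few lines of Python. The function name parse_fat_header and the sample header built below are illustrative only; the numbers are copied from the listing, and the arm64e subtype value is an assumption.

```python
import struct

FAT_MAGIC = 0xCAFEBABE          # magic of a 32-bit fat header (big-endian on disk)
CPU_SUBTYPE_MASK = 0xFF000000   # high byte of cpusubtype holds the capability bits
CPU_NAMES = {0x01000007: "CPU_TYPE_X86_64", 0x0100000C: "CPU_TYPE_ARM64"}  # subset only


def parse_fat_header(data):
    """Return one dict per fat_arch entry found at the start of `data`."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a 32-bit fat (universal) file")
    archs = []
    for i in range(nfat_arch):
        # Each entry: cputype, cpusubtype, offset, size, align (5 x 32 bits)
        cputype, cpusubtype, offset, size, align = struct.unpack_from(
            ">IIIII", data, 8 + 20 * i
        )
        archs.append({
            "cputype": CPU_NAMES.get(cputype, hex(cputype)),
            "cpusubtype": cpusubtype & 0x00FFFFFF,
            "capabilities": (cpusubtype & CPU_SUBTYPE_MASK) >> 24,
            "offset": offset,
            "size": size,
            "align": align,  # stored as an exponent: 14 means 2^14
        })
    return archs


# Illustrative header rebuilt from the values in the listing (not read from disk).
header = struct.pack(">II", FAT_MAGIC, 2)
header += struct.pack(">IIIII", 0x01000007, 3, 16384, 53936, 14)
header += struct.pack(">IIIII", 0x0100000C, 0x80000002, 81920, 53600, 14)

archs = parse_fat_header(header)
for a in archs:
    print(f"{a['cputype']}: offset {a['offset']}, size {a['size']}, "
          f"align 2^{a['align']} ({2 ** a['align']})")
```

On a real system, the same function could be fed the first bytes of the file, for example parse_fat_header(open("mv", "rb").read()).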

Are there any alternatives? The best alternative I found is the old dd UNIX utility.

I ran two experiments: copying only the first 16384 + 53936 = 70320 bytes (the x86_64 offset plus the x86_64 size), and copying everything up to the arm64e offset, that is, the first 81920 bytes. The values 16384, 53936 and 81920 come from the offset and size lines in the listing above.

The results are as follows:

gbiondo@tripleX temp % dd if=mv of=mv_dd1 bs=70320 count=1
1+0 records in
1+0 records out
70320 bytes transferred in 0.000095 secs (739206660 bytes/sec)
gbiondo@tripleX temp % dd if=mv of=mv_dd2 bs=81920 count=1
1+0 records in
1+0 records out
81920 bytes transferred in 0.000895 secs (91528339 bytes/sec)
gbiondo@tripleX temp % ls -al mv*
-rwxr-xr-x  1 gbiondo  staff  135520 17 Mar 15:56 mv
-rwxr-xr-x  1 gbiondo  staff   70320 17 Mar 16:10 mv_X64
-rw-r--r--  1 gbiondo  staff   70320 18 Mar 09:30 mv_dd1
-rw-r--r--  1 gbiondo  staff   81920 18 Mar 09:30 mv_dd2

First of all, we note that the two new files are not executable. That's understandable, since dd doesn't preserve permissions. Secondly, we observe that only the first file has the same size as the one extracted with lipo; this supports the conjecture that lipo trims the output to the minimum possible size, dropping the padding that precedes the next slice. We still don't know:

  1. if these files can actually run

  2. if so, if the result is the intended one

  3. and finally, if they are as stable as the original file.

We are not testing the third condition here. Focusing on the first two, we make the files executable from the shell and observe the results. (Note: this would work better with a utility that produces more explicit output, such as cc; you may want to try the same method with it.) First, we make the files executable:

gbiondo@tripleX temp % chmod +x mv_dd[12]
gbiondo@tripleX temp % ls -al mv_dd[12]
-rwxr-xr-x  1 gbiondo  staff  70320 18 Mar 09:30 mv_dd1
-rwxr-xr-x  1 gbiondo  staff  81920 18 Mar 09:30 mv_dd2

and then we test their functionality:

gbiondo@tripleX temp % ./mv_dd1 mv_dd2 mv_dd_obese
gbiondo@tripleX temp % ./mv_dd_obese mv_dd1 mv_dd_stripped
gbiondo@tripleX temp %  ls -al 
total 720
drwxr-xr-x   6 gbiondo  staff     192 18 Mar 09:40 .
drwxr-xr-x  20 gbiondo  staff     640 18 Mar 09:01 ..
-rwxr-xr-x   1 gbiondo  staff  135520 17 Mar 15:56 mv
-rwxr-xr-x   1 gbiondo  staff   70320 17 Mar 16:10 mv_X64
-rwxr-xr-x   1 gbiondo  staff   81920 18 Mar 09:30 mv_dd_obese
-rwxr-xr-x   1 gbiondo  staff   70320 18 Mar 09:30 mv_dd_stripped

Finally, we want to check how the OS recognizes the new executables. We have:

gbiondo@tripleX temp % file mv_dd_obese 
mv_dd_obese: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e]
mv_dd_obese (for architecture x86_64):    Mach-O 64-bit executable x86_64
mv_dd_obese (for architecture arm64e):    
gbiondo@tripleX temp % file mv_dd_stripped 
mv_dd_stripped: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e]
mv_dd_stripped (for architecture x86_64):    Mach-O 64-bit executable x86_64
mv_dd_stripped (for architecture arm64e):
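
The empty arm64e descriptions above suggest that the header still declares a slice the file no longer contains. A minimal sketch of that check, assuming the 32-bit fat header layout: compare each declared offset + size against the actual length of the data. The helper name truncated_slices is hypothetical, and the input is a synthetic, zero-filled stand-in with the same header values as mv, not the real binary.

```python
import struct


def truncated_slices(data):
    """Return (index, offset, size) for each slice that extends past the end of `data`."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != 0xCAFEBABE:
        raise ValueError("not a 32-bit fat (universal) file")
    bad = []
    for i in range(nfat_arch):
        _cputype, _cpusubtype, offset, size, _align = struct.unpack_from(
            ">IIIII", data, 8 + 20 * i
        )
        if offset + size > len(data):
            bad.append((i, offset, size))
    return bad


# Synthetic stand-in for mv: same header numbers as in the article, zero-filled body.
header = struct.pack(">II", 0xCAFEBABE, 2)
header += struct.pack(">IIIII", 0x01000007, 3, 16384, 53936, 14)
header += struct.pack(">IIIII", 0x0100000C, 0x80000002, 81920, 53600, 14)
full = header.ljust(81920 + 53600, b"\0")   # 135520 bytes, like the original mv

print("full file:      ", truncated_slices(full))
print("first 70320 B:  ", truncated_slices(full[:70320]))
print("first 81920 B:  ", truncated_slices(full[:81920]))
```

Both truncated copies keep a complete x86_64 slice, which is consistent with them still running, while the second entry points past the end of the file.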

Takeaways and other considerations

We have shown empirically that the lipo and dd approaches give comparable results (a working 'stripped' executable), but the fat headers of the files obtained with dd are inconsistent: they still declare an arm64e slice that is no longer in the file.

We have also shown empirically that, apart from the header, lipo and dd with a block size of (offset + size) return a file of the same size. This leads us to conjecture that lipo does something similar and then fixes up the header.
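
To make that conjecture concrete, here is a sketch of what "truncate, then fix the header" could look like. It is not lipo's actual implementation, and we have not verified that the result is byte-for-byte identical to lipo's output; the function name keep_first_slice and the synthetic input are assumptions made for illustration.

```python
import struct


def keep_first_slice(data):
    """Truncate a fat file after its first slice and make the header agree with that."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != 0xCAFEBABE or nfat_arch < 1:
        raise ValueError("not a 32-bit fat (universal) file")
    _cputype, _cpusubtype, offset, size, _align = struct.unpack_from(">IIIII", data, 8)

    out = bytearray(data[:offset + size])   # same bytes dd copied with bs=offset+size
    struct.pack_into(">I", out, 4, 1)       # nfat_arch: only one architecture left
    start, end = 8 + 20, 8 + 20 * nfat_arch
    out[start:end] = bytes(end - start)     # zero the now-stale fat_arch entries
    return bytes(out)


# Synthetic stand-in for mv: same header numbers as in the article, zero-filled body.
header = struct.pack(">II", 0xCAFEBABE, 2)
header += struct.pack(">IIIII", 0x01000007, 3, 16384, 53936, 14)
header += struct.pack(">IIIII", 0x0100000C, 0x80000002, 81920, 53600, 14)
full = header.ljust(81920 + 53600, b"\0")

fixed = keep_first_slice(full)
print(len(fixed), struct.unpack_from(">II", fixed, 0))
```

A natural follow-up experiment would be to run this on the real mv and compare the result with mv_X64, for instance with cmp.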

Pros and cons

The dd utility is present on all POSIX-compliant systems, which would allow us to process the executable on another system if needed.

On the other hand, if we were to extract the executable for the second architecture with dd, we'd need to do some extra work, since that slice starts at offset 81920 rather than at the beginning of the file.
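
As a rough idea of that extra work, the sketch below copies exactly size bytes starting at the slice's offset, which yields a thin file with no fat header at all. It assumes the 32-bit fat header layout; the helper name extract_slice is hypothetical, and the input is a synthetic stand-in in which only the 64-bit Mach-O magic is written at the start of each slice.

```python
import struct

MH_MAGIC_64_LE = b"\xcf\xfa\xed\xfe"  # 0xfeedfacf as stored by a little-endian Mach-O


def extract_slice(data, index):
    """Return the raw bytes of slice `index`, i.e. data[offset:offset + size]."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != 0xCAFEBABE:
        raise ValueError("not a 32-bit fat (universal) file")
    if not 0 <= index < nfat_arch:
        raise IndexError("no such architecture slice")
    _cputype, _cpusubtype, offset, size, _align = struct.unpack_from(
        ">IIIII", data, 8 + 20 * index
    )
    if offset + size > len(data):
        raise ValueError("slice extends past the end of the file")
    return data[offset:offset + size]


# Synthetic stand-in for mv: real header numbers, fake slice contents.
header = struct.pack(">II", 0xCAFEBABE, 2)
header += struct.pack(">IIIII", 0x01000007, 3, 16384, 53936, 14)
header += struct.pack(">IIIII", 0x0100000C, 0x80000002, 81920, 53600, 14)
buf = bytearray(header.ljust(81920 + 53600, b"\0"))
buf[16384:16388] = MH_MAGIC_64_LE
buf[81920:81924] = MH_MAGIC_64_LE
full = bytes(buf)

second = extract_slice(full, 1)
print(len(second), second[:4].hex())
```

With dd, the same idea would mean skipping the first 81920 bytes of input and copying 53600 bytes, instead of copying a single block from the start of the file.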

Conclusions

Here the conclusion is really situational: I would use lipo unless I am forced not to do so, at least for consistency.
