Abstract
In this article, we will show how to obtain a system-specific executable from its fat version.
A prerequisite to understanding this article is a solid grasp of the Mach-O binary file format. Should you need to fill some gaps, I'd strongly recommend reading Parsing Mach-O files and the pages linked from it.
Extracting the executable for an architecture
In the following examples, we have copied a well-known fat binary, namely mv, into a working directory:
gbiondo@tripleX temp % ls -al
total 272
drwxr-xr-x 3 gbiondo staff 96 17 Mar 15:56 .
drwxr-xr-x 20 gbiondo staff 640 17 Mar 15:55 ..
-rwxr-xr-x 1 gbiondo staff 135520 17 Mar 15:56 mv
We can quickly see it's a fat binary; in fact, it contains both the executable for the 'old' x86_64 architecture and the one for the newer arm64e:
gbiondo@tripleX temp % file mv
mv: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]
mv (for architecture x86_64): Mach-O 64-bit executable x86_64
mv (for architecture arm64e): Mach-O 64-bit executable arm64e
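What file reports can also be cross-checked by hand: a universal file starts with a big-endian fat header made of a magic number (0xcafebabe, the same value lipo prints as fat_magic) followed by the number of architectures. The following is only a minimal sketch in Python, not a description of how file works internally; the helper name read_fat_header is hypothetical, and the demo runs on synthetic bytes rather than on the real mv.

```python
import struct

FAT_MAGIC = 0xCAFEBABE      # 32-bit fat header
FAT_MAGIC_64 = 0xCAFEBABF   # 64-bit fat header variant

def read_fat_header(data):
    """Return (magic, nfat_arch) if data starts with a fat header, else None."""
    if len(data) < 8:
        return None
    # Both fields are stored big-endian, whatever the host CPU is.
    magic, nfat_arch = struct.unpack(">II", data[:8])
    if magic not in (FAT_MAGIC, FAT_MAGIC_64):
        return None
    return magic, nfat_arch

# Synthetic demo data (assumption: stand-ins for the first 8 bytes of real files).
fat_prefix = struct.pack(">II", FAT_MAGIC, 2)
# A thin 64-bit Mach-O starts with 0xfeedfacf, stored little-endian on disk.
thin_prefix = bytes.fromhex("cffaedfe") + b"\x00" * 4

print(read_fat_header(fat_prefix))
print(read_fat_header(thin_prefix))
```

On a real file, the same check would be read_fat_header(open("mv", "rb").read(8)).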
We want to extract the x86_64 Mach-O file from it, so that the result contains x86_64 code only.
The most elegant method I have found to date is the lipo utility. From its man page:
The lipo tool creates or operates on "universal" (multi-architecture) files. Generally, lipo reads a single input file and writes to a single output file, although some commands and options accept multiple input files. lipo will only ever write to a single output file, and input files are never modified in place.
The lipo tool supports several commands for creating universal files from single-architecture files, extracting single-architecture files from universal files, and displaying architecture information.
Furthermore, lipo can only perform one such command at a time, although some command flags may appear more than once. Some commands support additional options that can be used with that command. In addition, there are global options that are supported by multiple commands.
The arch_type arguments may be any of the supported architecture names listed in the man page arch(3).
So, to extract the x86_64 Mach-O part, we can proceed as follows:
gbiondo@tripleX temp % lipo mv -remove arm64e -output mv2
gbiondo@tripleX temp % ./mv2 mv2 mv_X64
gbiondo@tripleX temp % ls -al
total 416
drwxr-xr-x 4 gbiondo staff 128 17 Mar 16:11 .
drwxr-xr-x 20 gbiondo staff 640 17 Mar 15:55 ..
-rwxr-xr-x 1 gbiondo staff 135520 17 Mar 15:56 mv
-rwxr-xr-x 1 gbiondo staff 70320 17 Mar 16:10 mv_X64
This actually shows that:
- we have created a smaller file, originally named mv2, that also
- worked perfectly as mv would have done; in fact, it renamed mv2 ...
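To check that the file now contains only the x86_64 Mach-O (file still reports a universal binary, but with a single architecture, because -remove keeps the fat header; lipo's -thin option would write a thin file instead):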
gbiondo@tripleX temp % file mv_X64
mv_X64: Mach-O universal binary with 1 architecture: [x86_64:Mach-O 64-bit executable x86_64]
mv_X64 (for architecture x86_64): Mach-O 64-bit executable x86_64
lipo can do much more; for instance, we could have issued:
gbiondo@tripleX temp % lipo -info mv*
Architectures in the fat file: mv are: x86_64 arm64e
Architectures in the fat file: mv_X64 are: x86_64
or
gbiondo@tripleX temp % lipo -detailed_info mv*
Fat header in: mv
fat_magic 0xcafebabe
nfat_arch 2
architecture x86_64
cputype CPU_TYPE_X86_64
cpusubtype CPU_SUBTYPE_X86_64_ALL
capabilities 0x0
offset 16384
size 53936
align 2^14 (16384)
architecture arm64e
cputype CPU_TYPE_ARM64
cpusubtype CPU_SUBTYPE_ARM64E
capabilities PTR_AUTH_VERSION USERSPACE 0
offset 81920
size 53600
align 2^14 (16384)
Fat header in: mv_X64
fat_magic 0xcafebabe
nfat_arch 1
architecture x86_64
cputype CPU_TYPE_X86_64
cpusubtype CPU_SUBTYPE_X86_64_ALL
capabilities 0x0
offset 16384
size 53936
align 2^14 (16384)
This last example is no different from what we'd obtain with otool -f -v mv*, so we strongly encourage the reader to play around with the tool.
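The fields in this listing map directly onto the on-disk layout: after the 8-byte fat header comes one 20-byte fat_arch record per architecture (cputype, cpusubtype, offset, size, align), all big-endian. The sketch below decodes such records in Python. It is an illustration under assumptions, not a reimplementation of lipo or otool: parse_fat_archs is a hypothetical helper, the numeric constants are the ones I know from Apple's mach-o/fat.h and mach/machine.h, the header is rebuilt synthetically from the values printed above, and the capabilities byte 0x80 for arm64e is my guess at what "PTR_AUTH_VERSION USERSPACE 0" encodes.

```python
import struct

FAT_MAGIC = 0xCAFEBABE
CPU_NAMES = {0x01000007: "CPU_TYPE_X86_64", 0x0100000C: "CPU_TYPE_ARM64"}

def parse_fat_archs(data):
    """Return one dict per fat_arch record of a 32-bit fat header."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a 32-bit fat header")
    archs = []
    for i in range(nfat_arch):
        # Each record is five big-endian 32-bit fields, 20 bytes in total.
        cputype, cpusubtype, offset, size, align = struct.unpack_from(
            ">IIIII", data, 8 + 20 * i)
        archs.append({
            "cputype": CPU_NAMES.get(cputype, hex(cputype)),
            "cpusubtype": cpusubtype & 0x00FFFFFF,   # low 24 bits
            "capabilities": cpusubtype >> 24,        # high byte
            "offset": offset,
            "size": size,
            "align": 1 << align,                     # stored as a power of two
        })
    return archs

# Synthetic header rebuilt from the listing (assumption: not read from mv itself).
header = struct.pack(">II", FAT_MAGIC, 2)
header += struct.pack(">IIIII", 0x01000007, 3, 16384, 53936, 14)
header += struct.pack(">IIIII", 0x0100000C, 0x80000002, 81920, 53600, 14)

archs = parse_fat_archs(header)
for arch in archs:
    print(arch)
```

To try it on the real binary, parse_fat_archs(open("mv", "rb").read()) should be enough, since only the first bytes of the file are inspected.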
Are there any alternatives? The best alternative I found is the old dd UNIX utility.
I ran two different experiments: dd'ing only the first 16384 + 53936 = 70320 bytes, and dd'ing everything up to the arm64e offset, that is, 81920 bytes. If you're wondering, the values 16384 and 53936 come from the offset and size lines of the x86_64 entry in the listing above.
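The two byte counts are related through the align field of the listing. The short Python sketch below redoes the arithmetic with the numbers printed by lipo -detailed_info; reading 81920 as "the end of the x86_64 slice rounded up to the next 2^14 boundary" is my interpretation of those numbers, not something lipo states.

```python
X86_OFFSET, X86_SIZE = 16384, 53936   # x86_64 entry in the listing
ARM_OFFSET, ARM_SIZE = 81920, 53600   # arm64e entry in the listing
ALIGN = 2 ** 14                       # "align 2^14 (16384)"

def align_up(n, alignment):
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment

end_of_x86 = X86_OFFSET + X86_SIZE           # bytes copied in the first experiment
next_boundary = align_up(end_of_x86, ALIGN)  # where the next slice may start
padding = next_boundary - end_of_x86         # filler between the two slices
total = ARM_OFFSET + ARM_SIZE                # end of the last slice

print(end_of_x86, next_boundary, padding, total)
```

The last value equals the size of mv reported by ls, so the original file ends exactly where the arm64e slice does.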
The results are as follows:
gbiondo@tripleX temp % dd if=mv of=mv_dd1 bs=70320 count=1
1+0 records in
1+0 records out
70320 bytes transferred in 0.000095 secs (739206660 bytes/sec)
gbiondo@tripleX temp % dd if=mv of=mv_dd2 bs=81920 count=1
1+0 records in
1+0 records out
81920 bytes transferred in 0.000895 secs (91528339 bytes/sec)
gbiondo@tripleX temp % ls -al mv*
-rwxr-xr-x 1 gbiondo staff 135520 17 Mar 15:56 mv
-rwxr-xr-x 1 gbiondo staff 70320 17 Mar 16:10 mv_X64
-rw-r--r-- 1 gbiondo staff 70320 18 Mar 09:30 mv_dd1
-rw-r--r-- 1 gbiondo staff 81920 18 Mar 09:30 mv_dd2
First of all, we note that the two newborn files are not executable. That's understandable, for dd doesn't care about permissions. Secondly, we observe that only the first file has the same size as the one produced by lipo; this supports the conjecture that lipo actually trims the output to the minimum possible size. We still don't know:
- if these files can actually run
- if so, if the result is the intended one
- and finally, if they are as stable as the original file.
Now, we aren't testing for the third condition. Focusing on the first two, we make the files executable from the shell and observe the results. (Note: this would have been far better with a utility that gives more explicit results, such as cc; you may want to try the same method with it.) First, the permissions:
gbiondo@tripleX temp % chmod +x mv_dd[12]
gbiondo@tripleX temp % ls -al mv_dd[12]
-rwxr-xr-x 1 gbiondo staff 70320 18 Mar 09:30 mv_dd1
-rwxr-xr-x 1 gbiondo staff 81920 18 Mar 09:30 mv_dd2
and then we test their functionality:
gbiondo@tripleX temp % ./mv_dd1 mv_dd2 mv_dd_obese
gbiondo@tripleX temp % ./mv_dd_obese mv_dd1 mv_dd_stripped
gbiondo@tripleX temp % ls -al
total 720
drwxr-xr-x 6 gbiondo staff 192 18 Mar 09:40 .
drwxr-xr-x 20 gbiondo staff 640 18 Mar 09:01 ..
-rwxr-xr-x 1 gbiondo staff 135520 17 Mar 15:56 mv
-rwxr-xr-x 1 gbiondo staff 70320 17 Mar 16:10 mv_X64
-rwxr-xr-x 1 gbiondo staff 81920 18 Mar 09:30 mv_dd_obese
-rwxr-xr-x 1 gbiondo staff 70320 18 Mar 09:30 mv_dd_stripped
Finally, we want to check how the OS recognizes the new executables. We have:
gbiondo@tripleX temp % file mv_dd_obese
mv_dd_obese: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e]
mv_dd_obese (for architecture x86_64): Mach-O 64-bit executable x86_64
mv_dd_obese (for architecture arm64e):
gbiondo@tripleX temp % file mv_dd_stripped
mv_dd_stripped: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e]
mv_dd_stripped (for architecture x86_64): Mach-O 64-bit executable x86_64
mv_dd_stripped (for architecture arm64e):
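The empty arm64e entries are what one would expect from the numbers: the fat header copied by dd still announces a slice at offset 81920 with size 53600, but neither file is long enough to contain it. A minimal sketch of that consistency check in Python follows; missing_slices is a hypothetical helper, and the slice table is typed in from the lipo -detailed_info listing rather than parsed from the files.

```python
# (name, offset, size) as printed by lipo -detailed_info for mv
SLICES = [("x86_64", 16384, 53936), ("arm64e", 81920, 53600)]

def missing_slices(slices, file_size):
    """Names of slices whose byte range extends past the end of the file."""
    return [name for name, offset, size in slices if offset + size > file_size]

# File sizes taken from the ls output above.
for label, size in [("mv", 135520), ("mv_dd_stripped", 70320), ("mv_dd_obese", 81920)]:
    print(label, missing_slices(SLICES, size))
```

Both dd outputs still contain the whole x86_64 slice, which is consistent with the fact that they run, while the arm64e slice they advertise is simply not there.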
Takeaways and other considerations
We have shown empirically that the lipo and dd approaches give a comparable result (the 'stripped' executable), but the headers of the files obtained with the second method are inconsistent: they still announce an arm64e slice that is no longer in the file.
We have also shown empirically that, apart from the header, lipo and dd with a block size of (offset+size) return files of the same size, which leads us to conjecture that lipo does something similar and modifies the header afterward.
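To make the conjecture concrete, here is what "modifying the header afterward" could look like for the dd output: overwrite the nfat_arch field (bytes 4 to 7 of the file, big-endian) so that it says 1 instead of 2. This Python sketch is purely illustrative and assumes nothing about lipo's actual implementation; set_nfat_arch is a hypothetical helper, the demo uses a synthetic header, and I have not verified that the result would be byte-identical to lipo's output (the leftover arm64e record, for instance, stays in place here).

```python
import struct

FAT_MAGIC = 0xCAFEBABE

def set_nfat_arch(data, count):
    """Return a copy of data whose fat header announces `count` architectures."""
    (magic,) = struct.unpack_from(">I", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a 32-bit fat header")
    patched = bytearray(data)
    struct.pack_into(">I", patched, 4, count)  # nfat_arch lives at offset 4
    return bytes(patched)

# Synthetic header with the two records from the listing (assumption: demo data).
header = struct.pack(">II", FAT_MAGIC, 2)
header += struct.pack(">IIIII", 0x01000007, 3, 16384, 53936, 14)
header += struct.pack(">IIIII", 0x0100000C, 0x80000002, 81920, 53600, 14)

patched = set_nfat_arch(header, 1)
print(struct.unpack_from(">II", patched, 0))
```

Only four bytes change, so the file size stays the same, which is at least compatible with the identical sizes observed for mv_X64 and mv_dd_stripped.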
Pros and cons
The dd utility is present on all POSIX-compliant systems, which would allow us to process the executable on another system, if needed.
On the other hand, if we were to extract the executable for the second architecture with dd, we'd need to do some extra work.
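The extra work amounts to skipping to the slice's offset and copying exactly size bytes; with dd this should correspond to something like dd if=mv of=mv_arm64e bs=1 skip=81920 count=53600, which I have not run for this article. The same idea is sketched below in Python. extract_slice is a hypothetical helper, and the demo builds a tiny synthetic "fat" file with made-up offsets and fake slice bodies instead of touching the real mv.

```python
import os
import struct
import tempfile

def extract_slice(src_path, dst_path, offset, size):
    """Copy `size` bytes starting at `offset` from src_path into dst_path."""
    with open(src_path, "rb") as src:
        src.seek(offset)
        payload = src.read(size)
    if len(payload) != size:
        raise ValueError("slice extends past the end of the file")
    with open(dst_path, "wb") as dst:
        dst.write(payload)
    return payload

# Synthetic stand-in for a fat file (assumption: small offsets, fake bodies).
MH_MAGIC_64_ON_DISK = bytes.fromhex("cffaedfe")  # 0xfeedfacf, little-endian
header = struct.pack(">II", 0xCAFEBABE, 2)
header += struct.pack(">IIIII", 0x01000007, 3, 64, 16, 6)
header += struct.pack(">IIIII", 0x0100000C, 2, 128, 24, 6)
first = MH_MAGIC_64_ON_DISK + b"A" * 12    # 16 bytes at offset 64
second = MH_MAGIC_64_ON_DISK + b"B" * 20   # 24 bytes at offset 128
blob = header.ljust(64, b"\x00") + first.ljust(64, b"\x00") + second

with tempfile.TemporaryDirectory() as tmp:
    fat_path = os.path.join(tmp, "fake_fat")
    thin_path = os.path.join(tmp, "fake_thin")
    with open(fat_path, "wb") as f:
        f.write(blob)
    extracted = extract_slice(fat_path, thin_path, offset=128, size=24)
    thin_size = os.path.getsize(thin_path)

print(extracted[:4].hex(), thin_size)
```

Unlike the first-architecture experiments, a file obtained this way has no fat header at all: it starts directly with the thin Mach-O magic. As with dd, the output is not marked executable.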
Conclusions
Here the conclusion is really situational: I would use lipo unless I am forced not to, at least for consistency.