Disassemble raw AArch64 binary using LLVM tools

I can disassemble raw binary file using the following command:
> aarch64-linux-gnu-objdump -m aarch64 -b binary -D file.bin
Can I achieve the same effect with llvm-objdump and how? Maybe any other tool from LLVM toolchain?

The easiest way I've found to do this using only LLVM tools is to first objcopy the binary into an ELF and then objdump the ELF.
Convert the file
llvm-objcopy -I binary -B aarch64 --rename-section=.data=.text,code file.bin file.elf
Let's go through this option-by-option:
-I binary: specifies that the input is in raw binary, rather than ELF, form.
-B aarch64 (LLVM 9 and below, see the note below): specifies that the binary is to be interpreted as AArch64 machine code.
--rename-section=.data=.text,code: specifies that the section named .data that automatically gets created when copying from a binary file should instead be named .text and marked as executable code. This allows disassembly with -d to work later.
Disassemble the file
llvm-objdump -d file.elf
This one's pretty self-explanatory (and the same as you'd write with GNU objdump). -d says to disassemble all code sections, and the only code section is the one that we marked using --rename-section in the previous step.
Note: the objcopy command above is for LLVM 9 and below. LLVM 10 has removed the binary-specific -B option in favor of specifying your output target with the -O option, so you'd instead write -O elf64-littleaarch64.
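If you need one script that works on both sides of the LLVM 9/10 change, something like the following sketch might do. The function names and the idea of passing the LLVM major version as an argument are my own assumptions for illustration; the flags themselves are the ones described above.

```shell
#!/bin/sh
# Pick the llvm-objcopy architecture flags for a given LLVM major version.
# LLVM 9 and below take -B; LLVM 10 and above take -O (see the note above).
objcopy_arch_flags() {
    if [ "$1" -ge 10 ]; then
        printf '%s\n' '-O elf64-littleaarch64'
    else
        printf '%s\n' '-B aarch64'
    fi
}

# Hypothetical wrapper: convert raw AArch64 code to ELF, then disassemble it.
# $1 = LLVM major version, $2 = raw input file, $3 = ELF output file.
disassemble_raw_aarch64() {
    # The command substitution is unquoted on purpose so that the flag and
    # its value are split into two separate arguments.
    llvm-objcopy -I binary $(objcopy_arch_flags "$1") \
        --rename-section=.data=.text,code "$2" "$3" &&
    llvm-objdump -d "$3"
}
```

With LLVM 10 installed you would call it as: disassemble_raw_aarch64 10 file.bin file.elf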

Related

Object code generation for new RISCV instruction emitted by LLVM backend

From https://github.com/riscv/riscv-llvm,
Using the llvm-riscv is fairly simple to build a full executable
however you need riscv64-unknown-*-gcc to do the assembling and
linking. An example of compiling hello world:
$ clang -target riscv64 -mriscv=RV64IAMFD -S hello.c -o hello.S
$ riscv64-unknown-elf-gcc -o hello.riscv hello.S
My question is: if I change the LLVM backend and get it to emit a new instruction in the hello.S file, how will riscv64-unknown-elf-gcc know how to convert it into object code? Do I also need to make changes in riscv64-unknown-elf-gcc so that it knows the format of the new instruction?
riscv64-unknown-elf-gcc calls as (usually GNU as from binutils) to assemble the assembly code (hello.S in your snippet) into object code. So yes: if you want a new instruction to be assembled, you would have to modify binutils so that its assembler knows the instruction's encoding.

C++ use a linux library on a mac (elf64-x86-64 on x86_64-apple-darwin)

I'm currently trying to compile a program on Mac OS X (10.9) using a library initially compiled for Linux.
Is there a way to use this library? Here is the output of objdump -f libmylib.a:
Hour.o: file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000011:
HAS_RELOC, HAS_SYMS
start address 0x0000000000000000
Menu.o: file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000011:
HAS_RELOC, HAS_SYMS
start address 0x0000000000000000
Tools.o: file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000011:
HAS_RELOC, HAS_SYMS
start address 0x0000000000000000
I know my current architecture is x86_64-apple-darwin13.0.0, and I wonder if, with the appropriate compiler flags, there is a way to make this compile.
One more thing, here is the error when trying to compile:
g++ -L /Users/gustavemonod/Desktop/ -o Parking Mother.o Keyboard.o -lncurses -ltcl -lmylib
ld: warning: ignoring file /Users/gustavemonod/lib/libmylib.a, file was built for archive which is not the architecture being linked (x86_64): /Users/gustavemonod/lib/libmylib.a
Linux uses a library format called ELF. Mac does not use ELF (instead, Mac uses a format called Mach-O), so I suspect that's going to be very difficult (if not actually impossible). You might be able to make the Linux Binary Compatibility from FreeBSD work.
You cannot link ELF objects (or archives or shared libraries) with Mach-O. You can try using Agner Fog's objconv utility to convert x86-64 ELF to x86-64 Mach-O; this can work because both platforms use the same (System V AMD64) calling convention. I wouldn't recommend this approach if you can compile from source.
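If you don't have objdump handy, you can tell the formats apart by looking at the first four bytes of a file. This is only a rough sketch: object_format is a made-up helper name, and it only knows the ELF magic, the 64-bit little-endian Mach-O magic, and the start of the ar archive header.

```shell
#!/bin/sh
# Classify a file by its first four bytes (its "magic number").
object_format() {
    # od prints the bytes as hex; tr strips the padding and the newline.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    case $magic in
        7f454c46) echo "ELF" ;;              # 0x7f 'E' 'L' 'F'
        cffaedfe) echo "Mach-O (64-bit)" ;;  # 0xfeedfacf stored little-endian
        213c6172) echo "ar archive" ;;       # "!<ar", start of "!<arch>"
        *)        echo "unknown" ;;
    esac
}
```

A .a file is itself an ar archive on both platforms, so to check what is inside you would first extract a member (for example ar x libmylib.a Hour.o) and then run object_format Hour.o.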

Get the compiler options from a compiled executable?

Is there a way to see what compiler and flags were used to create an executable file in *nix? I have an old version of my code compiled and I would like to see whether it was compiled with or without optimization. Google was not too helpful, but I'm not sure I am using the correct keywords.
gcc has a -frecord-gcc-switches option for that:
-frecord-gcc-switches
This switch causes the command line that was used to invoke the compiler to
be recorded into the object file that is being created. This switch is only
implemented on some targets and the exact format of the recording is target
and binary file format dependent, but it usually takes the form of a section
containing ASCII text.
Afterwards, the ELF executables will contain a .GCC.command.line section with that information.
$ gcc -O2 -frecord-gcc-switches a.c
$ readelf -p .GCC.command.line a.out
String dump of section '.GCC.command.line':
[ 0] a.c
[ 4] -mtune=generic
[ 13] -march=x86-64
[ 21] -O2
[ 25] -frecord-gcc-switches
Of course, it won't work for executables compiled without that option.
For the simple case of optimizations, you could try using a debugger if the file was compiled with debug info. If you step through it a little, you may notice that some variables were 'optimized out'. That suggests that optimization took place.
If you compile with the -frecord-gcc-switches flag, then the command-line compiler options will be written into a dedicated section of the binary (.GCC.command.line on ELF targets). See also the GCC docs.
Another option is -grecord-gcc-switches (note, not -f but -g). According to the gcc docs it puts the flags into the DWARF debug info, and it looks like it has been enabled by default since gcc 4.8.
I've found the dwarfdump program useful for extracting those flags. Note that the strings program does not see them; it looks like the DWARF info is compressed.
As long as the executable was compiled by gcc with -g option, the following should do the trick:
readelf --debug-dump=info /path/to/executable | grep "DW_AT_producer"
For example:
% cat test.c
int main() {
return 42;
}
% gcc -g test.c -o test
% readelf --debug-dump=info ./test | grep "DW_AT_producer"
<c> DW_AT_producer : (indirect string, offset: 0x2a): GNU C17 10.2.0 -mtune=generic -march=x86-64 -g
Sadly, clang doesn't seem to record options in a similar way by default, at least in version 10.
Of course, strings would turn this up too, but one has to have at least some idea of what to look for as inspecting all the strings in real-world binary with naked eyes is usually impractical. E.g. with the binary from above example:
% strings ./test | grep march
GNU C17 10.2.0 -mtune=generic -march=x86-64 -g
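Since the original question was specifically about optimization, here is a small sketch that pulls the effective -O flag out of a producer string like the one above. last_opt_flag is a hypothetical helper name. It relies on GCC's documented behaviour that the last -O option on the command line wins, and it prints "none" when no -O flag was recorded (which for gcc means the default, unoptimized build).

```shell
#!/bin/sh
# Print the last -O flag found in a DW_AT_producer string, or "none".
last_opt_flag() {
    result=none
    # $1 is left unquoted so the shell splits it into individual words.
    for word in $1; do
        case $word in
            -O*) result=$word ;;   # matches -O, -O0..-O3, -Os, -Og, -Ofast
        esac
    done
    printf '%s\n' "$result"
}
```

You would feed it the readelf output, e.g. last_opt_flag "$(readelf --debug-dump=info ./test | grep DW_AT_producer)". It can only report what the compiler actually recorded, so it is no help for binaries built without debug info.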
This is something that would require compiler support. You don't mention what compiler you are using, but since you tagged your question linux I will assume you are using gcc, which does not enable the feature you're asking about by default (but -frecord-gcc-switches is an option to perform this).
If you want to inspect your binary, the strings command will show you everything that appears to be a readable character string within the file.
If you still have the compiler (same version) you used, and it is only one flag you're unsure about, you can try compiling your code again, once with and once without the flag. Then you can compare the executables. Your old one should be identical, or very similar, to one of the new ones.
I highly doubt it is possible:
int main()
{
}
When compiled with:
gcc -O3 -ffast-math -g main.c -o main
None of the parameters can be found in the generated object:
strings main | grep -e -O3
(no output)

Disassemble the executable created by g++ in mac osx

How can I see the disassembled version of the executable (e.g. a.out) of a C++ program on Mac OS X?
It's not exactly what you're asking for, but g++ -S produces assembly from source code and can be expected to be more readable than a disassembled version.
If you can't recompile with -S (e.g. no source code), then gdb lets you disassemble, as does objdump --disassemble. Depends what you've installed.
See also: https://superuser.com/questions/206547/how-can-i-install-objdump-on-mac-os-x
Look at otool. i.e., otool -tv a.out
Edit: To add to Tony's answer, objdump also has name demangling for C++, i.e.,
objdump -tC a.out (IIRC)
I gave a previous answer on how to build and install the binutils for darwin.

difference between -h <name> and -o <outputfile> options in cc (C++)

I am building a .so library and was wondering: what is the difference between the -h and -o cc compiler options (using Sun Studio C++)?
Aren't they referring to the same thing, the name of the output file?
-o is the name of the file that will be written to disk by the compiler
-h is the name that will be recorded in ELF binaries that link against this file.
One common use is to provide library minor version numbers. For instance, if
you're creating the shared library libfoo, you might do:
cc -o libfoo.so.1.0 -h libfoo.so.1 *.o
ln -s libfoo.so.1.0 libfoo.so.1
ln -s libfoo.so.1 libfoo.so
Then if you compile your hello world app and link against it with
cc -o hello -lfoo
the elf binary for hello will record a NEEDED entry for libfoo.so.1 (which you can
see by running elfdump -d hello ).
Then when you need to add new functions later, you could change the -o value to
libfoo.so.1.1 but leave the -h at libfoo.so.1 - all the programs you already built
with 1.0 still try to load libfoo.so.1 at runtime, so continue to work without being
rebuilt, but you'll see via ls that it's 1.1.
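To get a feel for the naming scheme without a compiler, you can lay out the same chain of names using an empty placeholder file. This is only an illustrative sketch: make_lib_links is a made-up helper, and it assumes the real file name ends in .major.minor as in the libfoo.so.1.0 example.

```shell
#!/bin/sh
# Build the symlink chain for a versioned shared library in directory $1,
# using an empty placeholder for the real file $2 (e.g. libfoo.so.1.0).
make_lib_links() {
    dir=$1
    real=$2
    soname=${real%.*}       # drop the minor version: libfoo.so.1
    devname=${soname%.*}    # drop the major version: libfoo.so
    : > "$dir/$real"                  # placeholder for the real library
    ln -s "$real" "$dir/$soname"      # name recorded via -h, opened at runtime
    ln -s "$soname" "$dir/$devname"   # name that -lfoo finds at link time
}
```

After make_lib_links /tmp/demo libfoo.so.1.0 (with /tmp/demo already created), ls -l /tmp/demo shows libfoo.so pointing to libfoo.so.1, which points to libfoo.so.1.0. Moving to 1.1 then only means repointing the libfoo.so.1 link.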
This is also sometimes used when building libraries in the same directory they're
used at runtime, if you don't have a separate installation directory or install
via a packaging system. To avoid crashing programs that are running when you
overwrite the library binary, and to avoid programs not being able to start when
you're in the middle of building, some Makefiles will do:
cc -o libfoo.so.1.new -h libfoo.so.1 *.o
rm libfoo.so.1 ; mv libfoo.so.1.new libfoo.so.1
(Makefiles built by the old Imake makefile generator from X commonly do this.)
They are referring to different names. Specifically, the -o option is the file's actual name - the one on the filesystem. The -h option sets the internal DT_SONAME in the final object file. This is the name by which the shared object is referenced internally by other modules. I believe it's the name that you also see when you run ldd on objects that link to it.
The -o option names the output file, while the -h option sets an intrinsic name inside the library. This intrinsic name takes precedence over the file name when used by the dynamic loader and allows it to use predefined rules to pick the right library.
You can see what intrinsic name was recorded into a given library with this command:
elfdump -d xxx.so | grep SONAME
Have a look here for details:
http://docs.oracle.com/cd/E23824_01/html/819-0690/chapter4-97194.html
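On systems that ship GNU binutils rather than Solaris elfdump, readelf -d shows the same dynamic entry. The sketch below extracts just the name from that output. It assumes the usual GNU readelf line format, with the name in square brackets after "(SONAME)", and soname_from_readelf is a made-up helper name.

```shell
#!/bin/sh
# Read `readelf -d` output on stdin and print only the recorded SONAME.
soname_from_readelf() {
    # Keep the text between [ and ] on the line tagged (SONAME).
    sed -n 's/.*(SONAME).*\[\(.*\)].*/\1/p'
}
```

Usage would be: readelf -d xxx.so | soname_from_readelf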