Disassemble x86 code in a program at a specific address - c++

Is there any x86 disassembler framework that can be used to analyze code from a specific address in a program, as in:
info = disassemble(startAddress, stopAddress)
It should show every instruction, its operands, and any other information useful for analysis, but it should also have a fast mode where that much detail is only gathered for instructions that are specifically requested.

Is GNU binutils not good enough? Here's how to do that with the objdump utility:
# Disassemble from virtual addresses 0x80000000 to 0x80000100
objdump -d program --start-address=0x80000000 --stop-address=0x80000100
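If you want to drive this from C++ in the disassemble(startAddress, stopAddress) style of the question, here is a minimal sketch (assuming objdump is installed; the wrapper name and the raw text lines it returns are purely illustrative) that just shells out to objdump through a pipe:
#include <cstdio>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical wrapper matching the pseudocode in the question: it runs
// objdump over the requested virtual address range and returns the raw
// disassembly lines (one instruction, label or header per element).
std::vector<std::string> disassemble(const std::string& program,
                                     unsigned long start, unsigned long stop)
{
    std::ostringstream cmd;
    cmd << "objdump -d " << program
        << " --start-address=0x" << std::hex << start
        << " --stop-address=0x" << stop;

    FILE* pipe = popen(cmd.str().c_str(), "r");   // POSIX popen
    if (!pipe)
        throw std::runtime_error("failed to run objdump");

    std::vector<std::string> lines;
    char buf[512];
    while (std::fgets(buf, sizeof buf, pipe))
        lines.push_back(buf);
    pclose(pipe);
    return lines;
}
This only gives you text to parse; for structured per-instruction information (operands, lengths, flags) you would still want one of the libraries mentioned below.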

Google's protobuf uses libdisasm for this purpose. The sad thing is that (judging from the source code) it only supports x86 (IA-32), and the homepage states that "it is x86 specific and will not be expanded to include other CPU architectures". But since you didn't mention other archs, this library may be sufficient.

The cross-platform InstructionAPI suits this purpose, it will analyze the code and can print out the disassembly or provide a machine-independent view of the instructions for you to query. InstructionAPI is a shared library that you would link your code against.
http://www.paradyn.org/html/manuals.html

I would debug to the start point and look at the disassembly output of the debugger. A more brute-force method is to disassemble the whole binary and search for the function name in the disassembly file. objconv can do this, but it is slow on very big files.

Try BeaEngine

Related

Meaning of a gdb backtrace when there is not source code

I have a gdb backtrace of a crashed process, but I can't see the specific line in which the crash occurred because the source code was not available at that moment. I don't understand some of the information given by the mentioned backtrace.
The backtrace is made of lines like the following one:
<path_to_binary_file>(_Z12someFunction+0x18)[0x804a378]
Notice that _Z12someFunction is the mangled name of int someFunction(double).
My questions are:
Does the +0x18 indicate the offset, starting at _Z12someFunction's address, of the assembly instruction that produced the crash?
If the previous question is affirmative, and taking into account that I am working with a 32-bit architecture, does the +0x18 indicate 0x18 * 4 bytes?
If the above is affirmative, I assume that the address 0x804a378 is the address of _Z12someFunction plus 0x18, am I right?
EDIT:
The error occurred on a production machine (no cores enabled), and it seems to be a timing-dependent bug, so it is not easy to reproduce. That is why the information I am asking for is important to me on this occasion.
Most of your assumptions are correct. The +0x18 indeed means the offset (in bytes, regardless of architecture) from the start of _Z12someFunction.
0x804a378 is the actual address in which the error occurred.
With that said, it is important to understand what you can do about it.
First of all, compiling with -g will produce debug symbols. You, rightfully, strip those for your production build, but all is not lost. If you take your original executable (i.e. before you stripped it), you can run:
addr2line -e executable
You can then feed into stdin the addresses gdb is giving you (0x804a378), and addr2line will give you the precise file and line to which this address refers.
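For example (keeping the placeholder name executable from above), you can resolve the address from the backtrace directly on the command line; -f also prints the enclosing function name and -C demangles it:
addr2line -e executable -f -C 0x804a378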
If you have a core file, you can also load it together with the unstripped executable and get full debug info. Things would still be somewhat garbled, as you're probably building with optimizations, but some variables should still be accessible.
Building with debug symbols and stripping before shipping is the best option. Even if you did not, however, if you build the same sources again with the same build tools in the same environment and using the same build options, you should get the same binary with the same symbol locations. If the bug is really difficult to reproduce, it might be worthwhile to try.
EDITED to add
Two more tools are worth mentioning. The first is c++filt: you feed it a mangled symbol, and it produces the C++ path to the actual source symbol. It works as a filter, so you can just copy the backtrace and paste it into c++filt, and it will give you the same backtrace, only more readable.
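For example, if you have saved the raw backtrace to a file (backtrace.txt is just a placeholder name), you can demangle the whole thing in one pass:
c++filt < backtrace.txt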
The second tool is gdb remote debugging. This allows you to run gdb on a machine that has the executable with debug symbols, but run the actual code on the production machine. This allows live debugging in production (including attaching to already running processes).
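A minimal sketch of that setup (host name, port and file names are placeholders): run gdbserver next to the stripped binary on the production machine, and point a local gdb, loaded with the unstripped executable, at it:
# on the production machine
gdbserver :2345 ./program            # or attach to a live process: gdbserver :2345 --attach <pid>
# on the development machine, using the unstripped build
gdb ./program-with-symbols
(gdb) target remote prodhost:2345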
You are confused. What you are seeing is backtrace output from glibc's backtrace function, not gdb's backtrace.
but I can't see the specific line in which the crash occurred because the source code was not in that moment
Now you can load the executable in gdb and examine the address 0x804a378 to get line numbers. You can use list *0x804a378 or info symbol 0x804a378. See Convert a libc backtrace to a source line number and How to use addr2line command in linux.
Run man gcc; there you should see the -g option, which lets you add debug information to the binary object file. Then, when a crash happens and the core is dumped, gdb can show the exact lines where and why the crash happened. Alternatively, you can run the process under gdb, or attach to it, and see the trace directly without searching for the core file.

Measure static memory usage for C++ ported to embedded platform

I have created a small program as a proof of concept for a system which is to be implemented on an embedded platform. The program is written in C++11 using the standard library and compiled to run on a laptop. The final program will later be implemented on an embedded system. We do not have access to the compiler of the embedded platform.
I would like to know if there is a way to determine a program's static memory usage (the size of the compiled binary) in a sensible and comparable way when it is to be ported to an embedded platform.
The requirement is that the size of the binary is less than 10 KB.
Our binary has a size of 700 KB when compiled and stripped with the following flags:
g++ options: -Os -s -ffunction-sections -fdata-sections
linker options: -s -Wl,--gc-sections
strip libmodel.a -s -R .comment -R .gnu.version --strip-unneeded -R .note
It took up 4MB before we used strip and optimization options.
I am still way off, and it is not really that big a program. How can I make any meaningful comparison with an equivalent program on an embedded platform?
Note that the size of the binary can be a little deceptive, in the sense that uninitialised variables (the .bss section) will not necessarily take up physical space in the binary; they are generally just noted as present without actually having any space given to them, and that space is normally allocated by the OS loader when it runs your program.
objdump (http://www.gnu.org/software/binutils/) or perhaps elfdump or the ELF tool chain (http://sourceforge.net/apps/trac/elftoolchain/) will help you determine the size of your various segments, data and text, as well as the size of individual functions and globals etc. All these programs "look" into your compiled binary and extract a lot of information, such as the sizes of the .text and .data sections; they can list the various symbols, their locations and sizes, and can even disassemble the .text section...
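For a quick first look with plain GNU binutils, for example (the binary name is a placeholder):
size -A program      # per-section sizes (.text, .data, .bss, ...)
objdump -h program   # section headers with sizes and flags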
An example of using elfdump on an ELF image test.elf might be elfdump -z test.elf > output.txt. This will dump everything, including a disassembly of the text section. For example, from an elfdump on my system I saw
Section #6: .text, type=NOBITS, addr=0x500, off=0x5f168
size=149404(0x2479c), link=0, info=0, align=16, entsize=1
flags=<WRITE,ALLOC,EXECINSTR>
Section #7: .text, type=NOBITS, addr=0x24c9c, off=0x5f168
size=362822(0x58946), link=0, info=0, align=4, entsize=1
flags=<WRITE,ALLOC,EXECINSTR,INCLUDE>
....
Section #9: .rodata, type=NOBITS, addr=0x7d5e4, off=0x5f168
size=7670(0x1df6), link=0, info=0, align=4, entsize=1
flags=<WRITE,ALLOC>
So I can see how much my code is taking up (the .text sections) and my read only data. Later in the file I then see...
Symbol table ".symtab"
Index  Value    Size  Bind  Type  Section  Name
-----  -----    ----  ----  ----  -------  ----
218 0x7c090 130 LOC FUNC .text IRemovedThisName
So I can see that my function IRemovedThisName takes 130 bytes. A quick script would allow you to list functions sorted by size and variables sorted by size. This could point you at places to optimize...
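A rough sketch of that kind of script, using nm from binutils (the binary name is a placeholder):
# symbols with their sizes, sorted by size (largest last), C++ names demangled
nm --size-sort --print-size --demangle program | tail -n 20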
For a good example of objdump try http://www.thegeekstuff.com/2012/09/objdump-examples/, specifically the section 3, which shows you how to get the contents of the section headers using the -h option.
As to how the program will compare on two different platforms I think you will just have to compile on both platforms and compare the results you get from your obj/elfdump on each system - the results will depend on the system instruction set, how well each compiler can optimize, general hardware architecture differences etc.
If you don't have access to the embedded system, you might try using a cross-compiler, configured for your eventual target, on your laptop. This would give you a binary suited to the embedded platform and the tools to analyze the file (i.e. the cross version of objdump). This would give you some ball-park figures for how the program would look on the eventual embedded system.
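For example, with an ARM cross toolchain installed (the arm-none-eabi- prefix is only an assumption; use whatever matches your actual target):
arm-none-eabi-g++ -Os -ffunction-sections -fdata-sections -c model.cpp -o model.o
arm-none-eabi-size -A model.o
arm-none-eabi-objdump -h model.o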
Hope this helps.
EDIT: This will also help How to get the size of a C function from inside a C program or with inline assembly?
It appeared that the included libraries took up an enormous amount of space (as was pointed out in the comments), and by removing these it was possible to reduce the size to nearly nothing in combination with the following flags:
set(CMAKE_CXX_FLAGS "-Os -s -ffunction-sections -fdata-sections -DNO_STD -fno-rtti -fno-exceptions")
set(CMAKE_EXE_LINKER_FLAGS "-s -Wl,--gc-sections")
And stripping away any unnecessary code using:
strip libmodel.a -s -R .comment -R .gnu.version --strip-unneeded -R .note
The 4 MB could be reduced to 9.4 KB, which is below our limit.
In summary, the standard library takes up a tremendous amount of space.

How to map PC (ARMv5) address to source code?

I'm developing on an ARM9E processor running Linux. Sometimes my application crashes with the following message:
[ 142.410000] Alignment trap: rtspserverd (996) PC=0x4034f61c
Instr=0xe591300c Address=0x0000000d FSR 0x001
How can I translate the PC address to actual source code? In other words, how can I make sense out of this message?
With objdump. Dump your executable, then search for 4034f61c:.
The -x, --disassemble, and -l options are particularly useful.
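For example (using the objdump and addr2line from whatever toolchain targets your ARM board, and assuming rtspserverd still has its symbols), something along these lines:
objdump -d -l rtspserverd | grep -B2 -A2 '4034f61c:'
addr2line -e rtspserverd 0x4034f61c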
You can turn on listings in the compiler and tell the linker to produce a map file. The map file will give you the meaning of the absolute addresses up to the function where the problem occurs, while the listing will help you pinpoint the exact location of the exception within the function.
For example in gcc you can do
gcc -Wa,-a,-ad -c foo.c > foo.lst
to produce a listing in the file foo.lst.
-Wa, sends the following options to the assembler (gas).
-a tells gas to produce a listing on standard output.
-ad tells gas to omit debug directives, which would otherwise add a lot of clutter.
The option for the GNU linker to produce a map file is -M or --print-map. If you link with gcc you need to pass the option to the linker with an option starting with -Wl,, for example -Wl,-M.
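For example, to get both a listing and a map file out of a single gcc-driven build (file names are placeholders; the linker also accepts -Map=<file> to write the map straight to a file instead of standard output):
gcc -Wa,-a,-ad -c foo.c > foo.lst
gcc foo.o -o foo -Wl,-Map=foo.map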
Alternatively you could also run your application in the debugger (e.g. gdb) and look at the stack dump after the crash with the bt command.

How can I output a C + Assembly program trace using GDB?

I'm debugging a nasty problem where #includeing a file (not anything I wrote, for the record) causes a crash in my program. This means I have a working build and a broken build, with only one C(++) include statement changed. Some of the libraries I'm using don't have debugging information.
What I would like to do is get GDB to output every line of C++ executed during the program run, and x86 instructions where source is not available, to a text file in such a format that I can diff the two outputs and hopefully figure out what went wrong.
Is this easily possible in GDB?
You can check the difference between the pre-processed output in each version. For example:
gcc -dD -E a.cc -o a.pre
gcc -dD -E b.cc -o b.pre
diff -u a.pre b.pre
You can experiment with different "-d" settings to make that more verbose/concise. Maybe something in the difference of listings will be obvious. It's usually something like a struct which changes size depending on include files.
Failing that, if you really want to mess with per-instruction or per-line traces, you could probably use valgrind and see where the paths diverge, but I think you may be in for a world of pain. In fact you'll probably find that valgrind finds your bug and then 100 more you didn't know about :) I expect the problem is just a struct or other data size difference, and you won't need to bother.
You could get gdb to automate line tracing. It would be quite painful. Basically you'd need to script it to run "n" (next line) repeatedly until a crash, then check the logs. If you can script "b main", then "run", then an endless series of "n", that would do it. There's probably a built-in command to do it but I'm not aware of it.
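A minimal sketch of such a script (trace.gdb and trace.log are placeholder names; gdb's command language does support while loops, so no external driver is strictly needed):
# trace.gdb -- log every source line executed until the program stops
set pagination off
set logging file trace.log
set logging on
break main
run
while 1
  next
end
Run it with gdb -x trace.gdb ./program; when the program crashes, the next command fails, the loop ends, and the trace is left in trace.log.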
I don't think GDB can do this; maybe a profile will help, though? Are you compiling with gcc? Look at the -p and -pg options, I think those might be useful.
The disassemble command at the gdb prompt will disassemble the current function you are stopped in, but I don't think outputting the entire execution path is feasible.
What library are you including? If it is open source, you can recompile it with debugging symbols enabled. Also, if you're using Linux, most distributions have -dbg versions of packages for common libraries.

Analysis of core file

I'm using Red Hat Linux 3; can someone explain how it is possible that I am able to analyze, with gdb, a core dump generated on Red Hat Linux 5?
Not that I'm complaining :) but I need to be sure this will always work... ?
EDIT: the shared libraries are the same version, so no worries about that; they are placed on shared storage so they can be accessed from both Red Hat 5 and Red Hat 3.
Thanks.
You can try the following GDB commands to open a core file:
gdb
(gdb) exec-file <path to executable>
(gdb) set solib-absolute-prefix <path to shared library>
(gdb) core-file <path to core file>
The reason you can't rely on it is that every process uses libc or other system shared libraries, which will certainly have changed from Red Hat 3 to Red Hat 5. So the instruction addresses and the number of instructions in native functions will differ, and that is where the debugger gets confused and can show you wrong data to analyze. So it is always best to analyze the core on the same platform, or, if you can, to copy all the required shared libraries to the other machine and set the path through set solib-absolute-prefix.
In my experience, analysing a core file generated on another system does not work, because the standard library (and other libraries your program probably uses) will typically be different, so the addresses of the functions are different and you cannot even get a sensible backtrace.
Don't do it, because even if it works sometimes, you cannot rely on it.
You can always run gdb -c /path/to/corefile /path/to/program_that_crashed. However, if program_that_crashed has no debug info (i.e. was not compiled and linked with the -g gcc/ld flag) the coredump is not that useful unless you're a hard-core debugging expert ;-)
Note that the generation of corefiles can be disabled (and it's very likely that it is disabled by default on most distros). See man ulimit. Call ulimit -c to see the limit of core files, "0" means disabled. Try ulimit -c unlimited in this case. If a size limit is imposed the coredump will not exceed the limit size, thus maybe cutting off valuable information.
Also, the path where a coredump is generated depends on /proc/sys/kernel/core_pattern. Use cat /proc/sys/kernel/core_pattern to query the current pattern. It's actually a path, and if it doesn't start with / then the file will be generated in the current working directory of the process. And if cat /proc/sys/kernel/core_uses_pid returns "1" then the coredump will have the file PID of the crashed process as file extension. You can also set both value, e.g. echo -n /tmp/core > /proc/sys/kernel/core_pattern will force all coredumps to be generated in /tmp.
I understand the question as:
how is it possible that I am able to analyse a core that was produced under one version of an OS under another version of that OS?
Just because you are lucky (even that is questionable). There are a lot of things that can go wrong by trying to do so:
the tool chains (gcc, gdb, etc.) will be of different versions
the shared libraries will be of different versions
So no, you shouldn't rely on that.
You have asked a similar question and accepted an answer (of course, by yourself) here: Analyzing core file of shared object
Once you load the core file you can get the stack trace and get the last function call and check the code for the reason of crash.
There is a small tutorial here to get started with.
EDIT:
Assuming you want to know how to analyse a core file using gdb on Linux, as your question is a little unclear.