How can I make GDB understand new instructions?

I implemented some additional x86 instructions in QEMU for research purposes.
To provide debugging facilities for these newly added instructions,
I want GDB to understand them when it debugs a binary.
(Right now they show up as bad instructions...)
Is there any way to do this without modifying the GDB source code,
such as loading a module or something similar? Thanks :)

GDB relies on the opcodes library to disassemble instructions. So, to see your new instructions, you have to modify this library (and rebuild GDB with it). opcodes lives in the GDB source tree.
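Your instructions show up as bad ones because the disassembler only knows the encodings listed in its opcode tables. The hypothetical C++ toy below is a rough mental model of a table-driven decoder. It is not the libopcodes API: the real x86 tables also handle prefixes, ModRM bytes and operands.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical model only: an instruction is identified by its exact opcode
// bytes and mapped to a mnemonic. Real decoders also handle prefixes,
// operands and variable-length encodings.
using Opcode = std::vector<std::uint8_t>;
using OpcodeTable = std::map<Opcode, std::string>;

// Looks the bytes up in the table; anything unknown is reported the way
// GDB's disassembly shows it, as "(bad)".
std::string decode(const OpcodeTable& table, const Opcode& bytes) {
    auto it = table.find(bytes);
    return it == table.end() ? "(bad)" : it->second;
}
```

Take a table containing only nop (0x90) and ud2 (0x0f 0x0b). A made-up encoding such as 0x0f 0x3a 0xff decodes as "(bad)" until an entry is added for it. In much more elaborate form, that is what modifying opcodes amounts to.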

Related

Given a program counter, find the source line in a shared library

I'm trying to debug a segfault in Android's surfaceflinger daemon on a custom-made ARM board. The process crashes before dumping the call stack and register contents, including the program counter.
Normally I would've used objdump and searched for the program counter. The problem is that part of the call stack is in a shared library. Without using gdb, how can I correlate the program counter with a line in the source file? That is, can the addresses of shared library instructions be determined without running the program?
The simplest solution is to load the core dump into gdb and use info symbol <program counter address>; see https://stackoverflow.com/a/7648883/72178.
You can also use addr2line, but you will have to take the library's start (load) address into account when computing the address you pass to addr2line; see How to map function address to function in *.so files.
You need your program (and all the relevant shared libraries) to be compiled with debug information (in DWARF format), e.g. by passing a -g (or -g2 or -g3) flag to the GCC compiler when they are built. Notice that with GCC such a debugging option can be combined with optimization options like -O2.
Then you might use utilities like addr2line, or perhaps libraries like libbacktrace. FWIW, the GCC compiler itself (actually its cc1plus) uses that libbacktrace library to print a useful backtrace on SIGSEGV and other terminating signals (on compiler crashes).
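On glibc-based systems there is also a lighter alternative to libbacktrace: backtrace(3) and backtrace_symbols_fd(3) from <execinfo.h>. This is not what GCC itself uses. Below is a minimal sketch, assuming glibc, of a SIGSEGV handler that prints the raw return addresses. You can later map those addresses to source lines with addr2line. Symbol names only appear for symbols in the dynamic symbol table, so linking with -rdynamic helps. The function names are illustrative.

```cpp
#include <csignal>
#include <execinfo.h>
#include <unistd.h>

// Writes the current call stack to stderr: one line per frame with the
// return address and, when available, the symbol name. Returns the number
// of frames captured. backtrace_symbols_fd() writes straight to the fd
// instead of allocating strings like backtrace_symbols() does.
int dump_backtrace() {
    void* frames[64];
    int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
}

// On SIGSEGV: print the stack, then restore the default action and re-raise
// so the process still terminates (and can still write a core dump).
// Note: backtrace() is not formally async-signal-safe; this is a common
// best-effort debugging aid, not a guaranteed-safe handler.
void segv_handler(int sig) {
    dump_backtrace();
    std::signal(sig, SIG_DFL);
    std::raise(sig);
}

void install_segv_handler() {
    // Call backtrace() once up front: its first call may load libgcc
    // dynamically (which can allocate), better done outside the handler.
    void* warmup[1];
    backtrace(warmup, 1);
    std::signal(SIGSEGV, segv_handler);
}
```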
BTW, you could (and probably should) enable core(5) dumping and do a post-mortem analysis of the dump with gdb.
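From the shell this is usually done with ulimit -c unlimited. If you would rather have the program enable core dumps itself, here is a minimal sketch using the POSIX getrlimit/setrlimit calls. It raises the soft RLIMIT_CORE limit to the hard limit, which is the highest value an unprivileged process may set. On Linux, where the core file is written is controlled by /proc/sys/kernel/core_pattern, which this sketch does not touch. The function name is illustrative.

```cpp
#include <sys/resource.h>

// Raises the soft core-file size limit to the hard limit, which an
// unprivileged process is allowed to do. Returns true on success.
// If the hard limit itself is 0, core dumps stay disabled.
bool enable_core_dumps() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return false;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl) == 0;
}
```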
Notice that due to ASLR, a shared library is loaded (actually mmap(2)-ed) at some "random" page.
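Because of that randomization, the PC has to be turned into an offset from the library's load address before addr2line can use it. The sketch below assumes two things. First, you have the process's /proc/<pid>/maps contents, either from a live process or saved alongside the crash. Second, the library's first loadable segment has virtual address 0, which is the usual case for shared objects. The steps are: find the mapping that contains the PC, take the start of the same file's offset-0 mapping as the load base, and subtract. The function names and the sample path in the comment are illustrative.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Mapping {
    std::uint64_t start = 0, end = 0, offset = 0;
    std::string path;  // empty for anonymous mappings
};

// Parses one /proc/<pid>/maps line, e.g.
// "7f3a1c001000-7f3a1c005000 r-xp 00001000 08:01 42 /system/lib/libfoo.so"
std::optional<Mapping> parse_maps_line(const std::string& line) {
    std::istringstream in(line);
    std::string range, perms, offset, dev, inode, path;
    if (!(in >> range >> perms >> offset >> dev >> inode))
        return std::nullopt;
    std::getline(in >> std::ws, path);  // stays empty if there is no path
    auto dash = range.find('-');
    if (dash == std::string::npos)
        return std::nullopt;
    Mapping m;
    m.start = std::stoull(range.substr(0, dash), nullptr, 16);
    m.end = std::stoull(range.substr(dash + 1), nullptr, 16);
    m.offset = std::stoull(offset, nullptr, 16);
    m.path = path;
    return m;
}

// Turns an absolute pc into (library path, address relative to its load base).
// ASSUMPTION: the library's first PT_LOAD segment has p_vaddr 0 (usual for
// shared objects), so the file's offset-0 mapping marks the load base.
std::optional<std::pair<std::string, std::uint64_t>>
pc_to_library_offset(const std::vector<std::string>& maps_lines, std::uint64_t pc) {
    std::vector<Mapping> maps;
    for (const auto& line : maps_lines)
        if (auto m = parse_maps_line(line))
            maps.push_back(*m);
    for (const auto& hit : maps) {
        // Only file-backed mappings ("/...") can be looked up with addr2line.
        if (pc < hit.start || pc >= hit.end || hit.path.empty() || hit.path[0] != '/')
            continue;
        for (const auto& base : maps)
            if (base.path == hit.path && base.offset == 0)
                return std::make_pair(hit.path, pc - base.start);
    }
    return std::nullopt;
}
```

The resulting address is what you would pass to addr2line -f -C -e <library> <address>, using a copy of the library built with -g.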
Read Drepper's How to Write Shared Libraries paper.

I have a release binary built with GCC without debug information, and I have the source code

When I build the source in debug mode, the stack shown is totally different, and in the release build only a few methods are shown in the gdb backtrace.
Why does this happen? Is it because debug mode has extra methods? How can a method have the same address in both the debug and release builds?
Also, in that case, how can I build to get accurate address information with a complete stack trace? Any help would be appreciated since I am new to debugging on Linux; on Windows it seemed much easier with PDB files.
As discussed in the comments to rockoder's answer, besides lacking debug symbols (which -g would include), an optimized build may no longer contain whole function calls because of inlining.
"When I try to build the source with debug mode the stack shown is totally different and in case of release there are only a few methods shown in the backtrace with gdb. Why does this happen? Is this because in debug mode there are extra methods?"
It is probably just due to compiler optimizations. What you call the release build is probably built with compiler speed optimizations enabled and debug symbols disabled. Speed optimizations include inlining, which copies a function's code into the caller instead of calling it, so the function no longer appears in the call stack.
There could also be some extra or different methods, if the code was written with the appropriate preprocessor checks (for example #ifdef NDEBUG).
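GCC and Clang accept the noinline function attribute, a compiler extension rather than standard C++. Use it to see the inlining effect for yourself, or to keep one particular function visible in backtraces of an optimized build. GCC's -fno-inline flag does something similar for a whole translation unit. In the sketch below, whether add_one is actually inlined at -O2 is up to the compiler, but it is a typical candidate. Both function names are illustrative.

```cpp
// A tiny helper like this is a typical inlining candidate at -O2: the call
// is replaced by the body, so without debug info no add_one frame can show
// up in a backtrace (with -g, GDB can often still show inlined frames).
static int add_one(int x) { return x + 1; }

// noinline (GCC/Clang extension) forces a real call, so add_two keeps its
// own stack frame, and its own backtrace entry, even when optimizing.
__attribute__((noinline)) int add_two(int x) {
    return add_one(add_one(x));
}
```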
"How can two methods have the same address in debug and release mode?"
It depends on what your debug and release modes are. If they use the same compiler optimizations and differ only in debug information, methods will have the same addresses. If your debug build is not optimized (-O0 on GCC), methods will be larger because much unnecessary work is done; for example, variables are read from memory before every manipulation and written back after it. Since functions are generally packed one after another and each one's size differs between the two builds, their addresses will differ as well.
"Also in that case, how can I build to have accurate address information with a complete stack trace?"
Enable debug information. On GCC that would be -g3 (or -g or similar). This adds enough information for code address <-> source line queries, whether from a debugger or from a crash stack dump.
"Any help would be appreciated since I am new to debugging on Linux; on Windows it seemed much easier with PDB files."
Are there any significant differences from debugging Windows binaries?
g++/gcc has many options for debugging a program, but the most common one is -g; it is the first option discussed in the linked documentation.
Some additional information here.
Example:
Compile code without -g:
g++ broken.cpp -o broken_release
Compile code with -g:
g++ -g broken.cpp -o broken_debug
Now run ls -l and note the size difference between broken_release and broken_debug. broken_debug should be larger than broken_release because it contains debug information that debuggers like gdb can use.
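File size is only a rough indicator. A more direct check is whether the binary has a .debug_info section, the main section holding DWARF debug information for gdb. readelf -S broken_debug lists the sections. As a self-contained sketch, the same check can be done in C++ with the structures from Linux's <elf.h>. It assumes a 64-bit ELF file and does only basic bounds checking. The function names are illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <elf.h>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Returns the section names of a 64-bit ELF file, or an empty vector if the
// file cannot be read or does not look like 64-bit ELF.
std::vector<std::string> elf_section_names(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buf((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());
    std::vector<std::string> names;
    if (buf.size() < sizeof(Elf64_Ehdr) ||
        std::memcmp(buf.data(), ELFMAG, SELFMAG) != 0 ||
        buf[EI_CLASS] != ELFCLASS64)
        return names;

    Elf64_Ehdr eh;
    std::memcpy(&eh, buf.data(), sizeof eh);
    std::size_t table_end = eh.e_shoff + std::size_t(eh.e_shnum) * sizeof(Elf64_Shdr);
    if (eh.e_shoff == 0 || eh.e_shstrndx >= eh.e_shnum || table_end > buf.size())
        return names;

    // Copies section header i out of the buffer (avoids unaligned access).
    auto section = [&](std::size_t i) {
        Elf64_Shdr sh;
        std::memcpy(&sh, buf.data() + eh.e_shoff + i * sizeof sh, sizeof sh);
        return sh;
    };

    // Section names are offsets into the section-header string table.
    const Elf64_Shdr strtab = section(eh.e_shstrndx);
    const char* file_end = buf.data() + buf.size();
    for (std::size_t i = 0; i < eh.e_shnum; ++i) {
        std::size_t off = strtab.sh_offset + section(i).sh_name;
        if (off >= buf.size())
            continue;
        const char* name = buf.data() + off;
        names.emplace_back(name, std::find(name, file_end, '\0'));
    }
    return names;
}

// True if the file contains a .debug_info section (DWARF debug information).
bool has_debug_info(const std::string& path) {
    auto names = elf_section_names(path);
    return std::find(names.begin(), names.end(), ".debug_info") != names.end();
}
```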

When debugging a C++ program with GDB the "next" command seems to skip source lines

When I debug my C++ program, I set a breakpoint on the main function. When the program starts running, it seems to have skipped several lines of source before the line at which it stops. What's the problem?
Your program is probably compiled with optimisation enabled, which means that the lines of source are not necessarily sequentially translated into machine code. Under optimisation, the execution of different parts of the source code can be re-ordered and interleaved - this is likely what you're seeing.
If you want to step through your source code in a simple, sequential line-by-line manner you will need to compile with no optimisation (-O0).
Alternatively, if you understand machine code you can use:
set disassemble-next-line on
which will show you the disassembly of the code that the debugger is stopped on alongside the source code line it belongs to.
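If rebuilding everything at -O0 is impractical, GCC can turn optimization off for individual functions with its optimize attribute. With -g, next and step then follow those functions' source lines in order, while the rest of the file stays optimized. GCC's manual says this attribute is meant for debugging rather than production code. Clang ignores it and offers optnone instead. The macro name DEBUG_UNOPTIMIZED and the function sum_to are hypothetical.

```cpp
// Hypothetical helper macro: disable optimization for a single function.
#if defined(__clang__)
#define DEBUG_UNOPTIMIZED __attribute__((optnone))
#else
#define DEBUG_UNOPTIMIZED __attribute__((optimize("O0")))
#endif

// Compiled without optimization even in an -O2 build, so with -g gdb can
// step through the loop line by line and print total and i at any point.
DEBUG_UNOPTIMIZED int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i)
        total += i;
    return total;
}
```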
You seem to have symbols for your program, as GDB happily reads them. However, do you have the source in the original place or are you perhaps debugging on a different machine?
What does:
info source
give you when you enter it on the command prompt? It should give you something along the lines of:
(gdb) info source
Current source file is hello.c
Compilation directory is /home/username/source
Located in /home/username/source/hello.c
Contains 7 lines.
Source language is c.
Compiled with DWARF 2 debugging format.
Includes preprocessor macro info.
if GDB has debug symbols and source available.
From the output, however, it looks like this part should be fine, so caf is likely right that this is about the optimization level of your compiler.
Keep in mind that this is the very reason for debug versus release settings. During development you'll perhaps want -O0 or -O1 combined with -ggdb -g3 if you're using GCC to compile. For other compilers the settings may be different. For a release you'll probably want to use the highest safe optimization value (see this link), -O2 for gcc or -O3 if you are using one of the widely used architectures and aren't afraid of nasty surprises.
Either way if you are serious about software development and consequently debugging, you should learn the very basics of the assembly language for your target CPUs. Why? Because sometimes the optimizer, especially in GCC, goes haywire and does stupid things even when you tell it to not trust your code, such as with -fno-strict-aliasing. I've encountered cases where it would happily use instructions on a SPARC which are supposed to be used only on aligned data, but there was no guarantee that the data we gave it was aligned. Anyway, it's the very reason Gentoo recommends -O2 instead of any higher value for optimization. If you don't know why an assembly instruction does what it does or why your program does something silly and you can't take the magnifying glass and step down to the assembly level, you'll be lost.
How to see the assembly code in GDB
As pointed out by caf you can use set disassemble-next-line on to see the disassembly at the current program counter if you are using GDB 7.0 or newer. On older GDB versions you may resort to the trusty old display command:
disp/i $pc
which sets an automatic display for the program counter ($pc). Perhaps a better and visually more appealing alternative, especially if you have a lot of screen real estate, is to combine layout asm and layout regs in GDB's TUI.

Binary generation from LLVM

How does one generate executable binaries from the C++ side of LLVM?
I'm currently writing a toy compiler, and I'm not quite sure how to do the final step of creating an executable from the IR.
The only solution I currently see is to write out the bitcode and then call llc using system or the like. Is there a way to do this from the C++ interface instead?
This seems like it would be a common question, but I can't find anything on it.
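At the time of this answer, LLVM did not ship the linker necessary to perform this task: it can write out assembly (or object files) and then has to invoke the system linker to produce the executable. You can see the source code of llvm-ld to see how it was done. (LLVM has since gained its own linker, LLD, and llvm-ld has been removed.)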
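The C++ side of that last step can be a few lines of POSIX code. The sketch below assumes you have already written an object file, for example with llc -filetype=obj or from your own code through LLVM's TargetMachine. It then runs the system compiler driver to link it and waits for the exit status. It uses posix_spawnp() rather than system(), so no shell is involved. The function name and the file names in the usage line are placeholders.

```cpp
#include <spawn.h>
#include <string>
#include <sys/wait.h>
#include <vector>

extern char** environ;  // the current environment, passed on to the child

// Runs args[0] (looked up in PATH) with the given arguments and waits for it.
// Returns the tool's exit status, or -1 if it could not be started or did
// not exit normally (e.g. it was killed by a signal). Some older C libraries
// report a failed exec as exit status 127 instead of a spawn error.
int run_tool(const std::vector<std::string>& args) {
    if (args.empty())
        return -1;
    std::vector<char*> argv;
    for (const auto& a : args)
        argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    pid_t pid;
    if (posix_spawnp(&pid, argv[0], nullptr, nullptr, argv.data(), environ) != 0)
        return -1;
    int status = 0;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, run_tool({"cc", "toy.o", "-o", "toy"}) would link a hypothetical toy.o into an executable named toy and return 0 on success.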

Is there any way to decompile Linux .so?

Is there any way to decompile Linux .so?
There are decompilers, but a decompiler might not emit code in the same language that the original program was written in.
There are also disassemblers, which translate the machine code back into assembly.
The Decompilation Wiki may be a good source of additional information.
For example, you can disassemble the code with objdump(1).