gdb backtrace mechanism - gdb

The mechanism that allows gdb to perform backtrace 1 is well explained.
Starting from the current frame, look at the return address
Look for a function whose code section contains that address.
Theoretically, there might be hundreds of thousands of functions to consider.
I was wondering if there are any inherent limitations that prevent gdb
from creating a lookup table with return address -> function name.

What makes you think GDB does a straight search through all functions? This isn't what happens. GDB organises symbols into a couple of different data structures that allow for more efficient mapping between addresses and the enclosing function.
A good place to start might be here: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=gdb/blockframe.c;h=d9c28e0a0176a1d91fec1df089fdc4aa382e8672;hb=HEAD#l118

The mechanism that allows gdb to perform backtrace 1 is well explained.
This isn't at all how GDB performs a backtrace.
The address stored in the rip register points to the current instruction, and has nothing to do with return address.
The return address is stored on the stack, or possibly in another register. To find where it is stored (on x86_64, and assuming e.g. Linux/ELF/DWARF file format), GDB looks up unwind descriptor that covers the current value of RIP. The unwind descriptor also tells GDB how to restore other registers to the state they were just before the current function was called.
You can see unwind descriptors with e.g. readelf -wf a.out command.
Once GDB knows how to find return address and restore registers, it can effectively perform an up command, stepping from current (called) frame into previous (caller) frame.
Now this process repeats, until either GDB finds a special unwind descriptor which says "I am the last, don't try to unwind past me", or some error occurs (e.g. restored RIP is 0).
Notably, nowhere in this process does GDB have to consider thousands of functions.

Related

Find out the source file (line number) where a global variable was initialized?

I have pretty large C++ code base of a shared library which is messed up with complicated conditional macro spaghetti so IDE has troubles with that. I examined it with GDB to find the initial value of a global variable as follows:
$ gdb libcomplex.so
(gdb) p some_global_var
$1 = 1024
So I figured out the value the variable was initialized with.
QUESTION: Is it possible to find out which source file (and maybe line number) it was initialized at with GDB?
I tried list some_global_var, but it simply prints nothing:
(gdb) list some_global_var
(gdb)
So on x86 you can put a limited number of hardware watchpoints on that variable being changed:
If you are lucky, on a global you can get away with
watch some_global_var
But the debugger may still decide that is not a fixed address, and do a software watchpoint.
So you need to get the address, and watch exactly that:
p &some_global_var
(int*)0x000123456789ABC
watch (int*)0x000123456789ABC
Now, when you restart, the debugger should pop out when the value is first initialised, perhaps to zero, and/or when it is initialised to the unexpected value. If you are lucky, listing the associated source code will tell you how it came to be initialised. As others have stated you may then need to deduce why that line of code generated that value, which can be a pain with complex macros.
If that doesn't help you, or it stops many times unexpectedly during startup, then you should initially disable the watchpoint, then starti to restart you program and stop as soon as possible. Then p your global, and if it does not yet have the magic value, enable the watchpoint and continue. Hopefully this will skip the irrelevant startup and zoom in on the problem value.
You could use rr (https://rr-project.org/) to record a trace of the program, then you could reverse-execute to find the location. E.g.:
rr replay
(gdb) continue
...
(gdb) watch -l some_global_var
(gdb) reverse-continue

Opening a C++ crash dump does not show the right line in the call stack

I see that when I open a C++ crash dump in Visual Studio, I find that the call stack points to - either the line from which it jumped to the next frame in that function, or sometimes the next line after the line from which it jumped to the next frame in that function. Why is that? What is the logic behind that?
TIA!
Basically the location of call is not recorded; the location of return is recorded. So the return location is displayed.
The call stack is extracted from the stack. When you call a functiom, the return location in your code where the instruction pointer is going to go when the function finishes is placed on the stack.
The debugger/call stack display software reverse engineers the data on the stack to work out where this return will be. Then pdb files are used to map the location of return to lines of code.
Two branches of one if clause could have different spots where you call a function, but both return at the exact same instruction. Determining which of the two where used to call the function is impractical, while knowing where the function returns to is easy and reliable. And that line is usually enough information to debug the problem.
On top of that, optimizations by the compiler break down the idea that you are runnimg C++ code line by line; you are actually writing code generated by C++ code. An instruction in the generated machine code may correspond to parts of multiple different C++ code lines.
Between the two, having the call stack frames pointing a line off is not rare. Sometimes it is estremely far off; and with identical comdat folding sometimes it is the wrong function entirely.

How to change returned value of function

There is a function in this program, that currently returns a 1. I would prefer for it to return a 0.
uregs[R_PC] is the program counter.
arg0 is the program counter offset from where we left the function (assembly, "ret").
From this I deduce: we can add the offset to the program counter, uregs[R_PC]+arg0, to find the address of the return value.
I have allocated a 32-bit "0", and I try to write 2 bytes of it into the address where the return value lives (our function expects to return a BOOL16, so we only need 2 bytes of 0):
sudo dtrace -p "$(getpid)" -w -n '
int *zero;
BEGIN { zero=alloca(4); *zero=0; }
pid$target::TextOutA:return {
copyout(zero, uregs[R_PC]+arg0, 2);
}'
Of course I get:
dtrace: error on enabled probe ID 2 (ID 320426: pid60498:gdi32.dll.so:TextOutA:return): invalid address (0x41f21c) in action #1 at DIF offset 60
uregs[R_PC] is presumably a userspace address. Probably copyout() wants a kernel address.
How do I translate the userspace address uregs[R_PC] to kernel-space? I know that with copyin() we can read data stored at user-space address, into kernel-space. But that doesn't give us the kernel address of that memory.
Alternatively: is there some other way to change the return value using DTrace?
DTrace is not the right tool for this. You should instead use a debugger like dbx, mdb or gdb.
In the meantime, I'll try to clarify some of the concepts that you've mentioned.
To begin, you may well see in the source code for a simple function that there is a single return. It is quite possible that the compiled result, i.e. the function's machine-specific implementation, also contains only a single point of exit. Typically, however, the implementation is likely to contain more than one exit point and it may be useful for a developer to know from which specific one a function returned. It is this information, described as an offset from the start of the function, that is given by a return probe's arg0. Your D script, then, is attempting to update part of the program or library itself; although the addition of arg0 makes the destination address somewhat random, the result is most likely still within the text section, which is read-only.
Secondly, in the common case, a function's implementation returns a value by storing it in a specific register; e.g. %rax on amd64. Thus overriding a return value would neccessitate overriding a register value. This is impossible because DTrace's access to the user-land registers is read-only.
It is possible that a function is implemented in such a way that, as it returns, it recovers the return value from a specific memory location before writing it into the appropriate register. If this were the case then one could, indeed, modify the value in memory (given its location) just before it is accessed. However, this is going to work for only a subset of cases: the return value might equally be contained in another register or else simply expressed as a constant in the program text itself. In any case, it would be far more trouble than it's worth given the existence of more appropriate debugging tools.

gdb : findind every jumps to an address

I'm trying to understand a small binary using gdb but there is something I can't find a way to achieve : how can I find the list of jumps that point to a specified address?
I have a small set of instructions in the disassembled code and I want to know where it is called.
I first thought about searching the corresponding instruction in .text, but since there are many kind of jumps, and address can be relative, this can't work.
Is there a way to do that?
Alternatively, if I put a breakpoint on this address, is there a way to know the address of the previous instruction (in this case, the jump)?
If this is some subroutine being called from other places, then it must respect some ABI while it's called.
Depending on a CPU used, the return address (and therefore a place from where it was called) will be stored somewhere (on stack or in some registers). If you replace original code with the one that examines this, you can create a list of return addresses. Or simpler, as you suggested, if you use gdb and put a breakpoint at that routine, you can see from where it was called by using a bt command.
If it was actual jump (versus a "jump to subroutine") that led you there (which I doubt, if it's called from many places, unless it's a kind of longjmp/setjmp), then you will probably not be able to determine where this was called from, unless the CPU you are using allows you to trace the execution in some way.

trap invalid opcode rip rsp

We see a couple of below mentioned messages in /var/log/messages for one of our application:
Sep 18 03:24:23 <machine_name> kernel: application_name[14682] trap invalid opcode rip:f6c6e3ce rsp:ffc366bc error:0
...
Sep 18 03:19:35 <machine_name> kernel: application_name[4434] general protection rip:f6cd43a2 rsp:ffdfab0c error:7b2
I am not able to make what’s these output means and how we can track the function / code that is causing the issue. Further what is 'trap invalid opcode' and 'general protection' means?
Usually that means that your program's instruction pointer points to data or garbage. That's commonly caused by writing to stray pointers and such.
One scenario would be that your code writes (through a stray pointer) over some class' virtual table, replacing the member function addresses with nonsense. The next time you call one of the class' virtual functions, your program will interpret the garbage as an address and jump to that address. If whatever data lies at this address happens to not to be a valid machine code instruction for your processor, you would see this error.
There is another possibility that can cause 'invalid' op codes, that would be hardware not supporting newer opcode/instruction sets(SSE 4/5) or it not being from the right manufacturer(both AMD and Intel have some specific opcodes that work only on their processors) or just not having permission to exectute certain ops(though this would probably show up as something else).
From the above I would take RIP to be 'register(?) instruction pointer' and RSP to be 'register stack pointer', in which case you could use a debugger and set an execution hardware breakpoint on the specified address(RIP) and trace back what is calling it.(it seems your using linux or unix, so this is quite vague). if you are on windows, try using a custom exception filter to capture the EXCEPTION_ILLEGAL_INSTRUCTION event to get a little more information