I have a function that calls a big chunk of code that isn't mine. I'm trying to debug it, but just stepping until it returns an error has been slow and painful.
Is there a way to break when the function is "about to" return a negative number? I think I can set a breakpoint on the return register, but I only want it for the current thread. Ideally, it would break at "return -1;" so the stack and function variables are still readable.
(I tried reverse debugging, which would be perfect, but gdb 8.1 crashes when I enable it on my Ubuntu 16 system.)
I have a pretty large C++ code base for a shared library. It is a mess of complicated conditional macro spaghetti, so the IDE has trouble with it. I examined it with GDB to find the initial value of a global variable as follows:
$ gdb libcomplex.so
(gdb) p some_global_var
$1 = 1024
So I figured out the value the variable was initialized with.
QUESTION: Is it possible to find out which source file (and maybe line number) it was initialized at with GDB?
I tried list some_global_var, but it simply prints nothing:
(gdb) list some_global_var
(gdb)
So on x86 you can put a limited number of hardware watchpoints on that variable being changed:
If you are lucky, on a global you can get away with
watch some_global_var
But the debugger may still decide that is not a fixed address, and do a software watchpoint.
So you need to get the address, and watch exactly that:
p &some_global_var
(int*)0x000123456789ABC
watch *(int*)0x000123456789ABC
Now, when you restart, the debugger should pop out when the value is first initialised, perhaps to zero, and/or when it is initialised to the unexpected value. If you are lucky, listing the associated source code will tell you how it came to be initialised. As others have stated you may then need to deduce why that line of code generated that value, which can be a pain with complex macros.
If that doesn't help you, or it stops many times unexpectedly during startup, then you should initially disable the watchpoint, then use starti to restart your program and stop as soon as possible. Then p your global, and if it does not yet have the magic value, enable the watchpoint and continue. Hopefully this will skip the irrelevant startup and zoom in on the problem value.
You could use rr (https://rr-project.org/) to record a trace of the program, then you could reverse-execute to find the location. E.g.:
rr replay
(gdb) continue
...
(gdb) watch -l some_global_var
(gdb) reverse-continue
When I open a C++ crash dump in Visual Studio, the call stack points either to the line from which execution jumped to the next frame in that function, or sometimes to the line after it. Why is that? What is the logic behind that?
TIA!
Basically the location of call is not recorded; the location of return is recorded. So the return location is displayed.
The call stack is extracted from the stack. When you call a function, the return location in your code, where the instruction pointer will go when the function finishes, is placed on the stack.
The debugger/call stack display software reverse engineers the data on the stack to work out where this return will be. Then pdb files are used to map the location of return to lines of code.
Two branches of one if clause could have different spots where you call a function, but both return to the exact same instruction. Determining which of the two was used to call the function is impractical, while knowing where the function returns to is easy and reliable. And that line is usually enough information to debug the problem.
On top of that, compiler optimizations break down the idea that you are running C++ code line by line; you are actually running machine code generated from your C++ code. An instruction in the generated machine code may correspond to parts of multiple different C++ code lines.
Between the two, having the call stack frames point a line off is not rare. Sometimes it is extremely far off, and with identical COMDAT folding it is sometimes the wrong function entirely.
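A minimal sketch of the point about return addresses, assuming GCC or Clang: __builtin_return_address and the noinline attribute are compiler extensions, and the function names are hypothetical. Both branches call the same function, and all the callee can see is the address it will return to, not the address of the call instruction.

```cpp
#include <cassert>
#include <cstdio>

// Assumption: GCC or Clang. __builtin_return_address and
// __attribute__((noinline)) are compiler extensions, not standard C++.
__attribute__((noinline)) void *return_target() {
    // Level 0 is the address this function will return to, i.e. the
    // instruction after the call in the caller. This is the value a
    // debugger reads off the stack when it builds a call stack.
    return __builtin_return_address(0);
}

// Hypothetical caller with one call site in each branch of an if.
__attribute__((noinline)) void report(bool flag) {
    void *target = nullptr;
    if (flag) {
        target = return_target();
    } else {
        target = return_target();
    }
    assert(target != nullptr);
    std::printf("flag=%d returns to %p\n", flag ? 1 : 0, target);
}
```

Calling report(true) and report(false) in an unoptimized build will typically print two different addresses, each just past its own call instruction. An optimizing compiler is free to merge the two call sites, in which case the addresses match and the branch taken can no longer be recovered from the stack alone.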
I'm debugging a large, ancient, piece of code that we just upgraded the OS/driver for, the entire thing is running 32 bit. The original developers of the code are long gone and much of it is still a black box to me.
I'm running it in the debugger. I narrowed the problem down to a particular if statement within a larger loop. I need the 'else' part of that if statement to run to update some variables, but it never ran, implying that the value being checked in the 'if' statement is always true.
Eventually I stepped into the method call (a simple getter on a private boolean) and printed the content of the variable. When I print the variable it is false, and the 'else' method will be entered when I return.
To experiment, I've tried allowing the loop to run for 10 minutes; the 'else' method is never entered (as indicated by a breakpoint not being hit). Then when I print the variable being checked, it's false and the 'else' method is entered. It doesn't matter how long I let it run, or how many times I break and continue before printing the variable: the same pattern holds. I enter the 'else' method IFF I first print the content of the variable that is being checked.
To rule out some sort of data race, I've tried sitting at the breakpoint in question for as long as a print statement takes; a delay without a print doesn't result in entering the 'else' method.
What could cause such odd behavior? We have had issues with different architectures: this is a 32-bit program running on a 64-bit OS and, more importantly, the driver it uses had not been tested for 32 bit for years until it was recompiled for me under a 32-bit architecture. That would make me suspect the driver, except that the misbehaving line of code doesn't touch the driver in any way. Still, I suspect some sort of overflow or underflow may be happening due to confusion caused by forcing an old 32-bit program to run.
However, even assuming this could cause such odd behavior, I don't know how to confirm whether that is happening, or otherwise debug a program where the act of looking at it changes its behavior. I'd love any tip on what could cause such a problem or how I could move forward with debugging it.
Dammit Jim I'm a programmer, not a Quantum Mechanic!
It works when, in the loop, I set every element to 0 or to entry_count-1.
It works when I set it up so that entry_count is small, and I write it by hand instead of by loop (sorted_order[0] = 0; sorted_order[1] = 1; ... etc).
Please do not tell me what to do to fix my code. I will not be using smart pointers or vectors for very specific reasons. Instead focus on the question:
What sort of conditions can cause this segfault?
Thank you.
---- OLD -----
I am trying to debug code that isn't working on a unix machine. The gist of the code is:
int *sorted_array = (int*)memory;
// I know that this block is large enough
// It is allocated by malloc earlier
for (int i = 0; i < entry_count; ++i) {
    sorted_array[i] = i;
}
There appears to be a segfault somewhere in the loop. Switching to debug mode, unfortunately, makes the segfault stop. Using cout debugging I found that it must be in the loop.
Next I wanted to know how far into the loop the segfault happened, so I added:
std::cout << i << '\n';
It showed the entire range it was supposed to be looping over and there was no segfault.
With a little more experimentation I eventually created a string stream before the loop and wrote an empty string into it on each iteration of the loop, and there was no segfault.
I tried some other assorted operations trying to figure out what is going on. I tried setting a variable j = i; and stuff like that, but I haven't found anything that works.
Running valgrind the only information I got on the segfault was that it was a "General Protection Fault" and something about default response to 11. It also mentions that there's a Conditional jump or move depends on uninitialized value(s), but looking at the code I can't figure out how that's possible.
What can this be? I am out of ideas to explore.
This is clearly a symptom of invalid memory use within your program. It would be difficult to find by looking at your code snippet, as it is most likely the side effect of something else bad that has already happened.
However, you mentioned in your question that the problem is reproducible and that you are able to run your program under Valgrind. So you may want to have Valgrind attach a debugger to your program (a.out):
$ valgrind --tool=memcheck --db-attach=yes ./a.out
This way Valgrind attaches a debugger (GDB) to your program when the first memory error is detected, so that you can do live debugging. This should be the best way to understand and resolve your problem. (Note that --db-attach was removed in Valgrind 3.11; newer versions offer the same workflow through the built-in gdbserver and the --vgdb-error option.)
Once you have figured out your first error, fix it, rerun, and see what other errors you get. Repeat these steps until Valgrind reports no more errors.
However, you should avoid raw pointers in modern C++ programs and start using std::vector and std::unique_ptr, as others have suggested.
Valgrind and GDB are very useful.
The one I used most recently was GDB. I like it because it showed me the exact line number that the segmentation fault was on.
Here are some resources that can guide you on using GDB:
GDB Tutorial 1
GDB Tutorial 2
If you still cannot figure out how to use GDB with these tutorials, there are tons on Google! Just search debugging Segmentation Faults with GDB!
Good luck :)
That is hard. I have used the Valgrind tools to debug segfaults, and they usually pointed to the violations.
Your problem is likely freed memory that you are writing to, i.e. the block behind sorted_array goes out of scope or gets freed.
Adding more code hides this problem because the data allocation shifts around.
After a few days of experimentation, I figured out what was really going on.
For some reason the machine segfaults on unaligned access. That is, the integers I was writing were not being written to memory boundaries that were multiples of four bytes. Before the loop I computed the offset and shifted the array up that much:
int offset = (4 - (uintptr_t)(memory) % 4) % 4;
memory += offset;
After doing this everything behaved as expected again.
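The offset arithmetic above can be sketched as a small helper. This is only an illustration under stated assumptions: the helper names are hypothetical, the buffer is assumed to be addressed through a byte pointer, and alignof(int) stands in for the hard-coded 4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of bytes to skip so that `address` becomes a
// multiple of `alignment`. Same arithmetic as the offset computed above,
// with the hard-coded 4 replaced by a parameter.
inline std::size_t alignment_padding(std::uintptr_t address, std::size_t alignment) {
    assert(alignment != 0);
    return (alignment - address % alignment) % alignment;
}

// Hypothetical helper: advance a byte pointer to the next address suitable
// for an int. Assumes the buffer has at least alignof(int) - 1 spare bytes.
inline int *align_for_int(unsigned char *memory) {
    std::uintptr_t address = reinterpret_cast<std::uintptr_t>(memory);
    return reinterpret_cast<int *>(memory + alignment_padding(address, alignof(int)));
}
```

An address that is already aligned gets a padding of 0, and an address one byte past a boundary gets alignment - 1. The standard library's std::align in <memory> does a similar adjustment and also checks the remaining space, which may be preferable where it is available.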
I've been looking around but was unable to figure out how one could print out in GDB the result of an evaluation. For example, in the code below:
if (strcmp(current_node->word,min_node->word) > 0)
    min_node = current_node;
(above I was trying out a possible method for checking alphabetical order for strings, and wasn't absolutely certain it works correctly.)
Now I could watch min_node and see if the value changes but in more involved code this is sometimes more complicated. I am wondering if there is a simple way to watch the evaluation of a test on the line where GDB / program flow currently is.
There is no expression-level single stepping in gdb, if that's what you are asking for.
Your options are (from most commonly to most infrequently used):
evaluate the expression in gdb, doing print strcmp(current_node->word,min_node->word). Surprisingly, this works: gdb can evaluate function calls, by injecting code into the running program and having it execute the code. Of course, this is fairly dangerous if the functions have side effects or may crash; in this case, it is so harmless that people typically won't think about potential problems.
perform instruction-level (assembly) single-stepping (ni/si). When the call instruction is done, you find the result in a register, according to the processor conventions (%eax on x86, %rax on x86-64).
edit the code to assign intermediate values to variables, and split that into separate lines/statements; then use regular single-stepping and inspect the variables.
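The last option might look like the sketch below. The Node type and the function name are hypothetical; only the word field and the strcmp call come from the question. Note that strcmp returns a negative value when its first argument sorts first, so this sketch tests < 0 to keep the alphabetically smaller word, whereas the > 0 test in the question would select the later one.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical node type; only the `word` field comes from the question.
struct Node {
    const char *word;
};

// Returns whichever node has the alphabetically smaller word.
inline const Node *smaller_word(const Node *min_node, const Node *current_node) {
    assert(min_node != nullptr && current_node != nullptr);
    // The comparison result gets its own line and its own variable, so a
    // plain `next` stops after strcmp and `print order` shows its value.
    int order = std::strcmp(current_node->word, min_node->word);
    // Negative `order` means current_node->word sorts first.
    bool current_is_smaller = order < 0;
    if (current_is_smaller) {
        min_node = current_node;
    }
    return min_node;
}
```

With the code split this way, stepping with next and then inspecting order and current_is_smaller shows the outcome of the test before the assignment happens.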
You can simply try typing:
call my_function()
As far as I remember, though, it won't work when the function has been inlined.