Understanding gdb output - c++

I am trying to use the push_back function for a vector in c++.
I am getting a seg fault and when I ran the gdb to find the exact reason.
I get the following.
$1={px = 0xbfffe9c4, pn = { pi_ = 0x8049c0b}}
I do not have much experience with gdb and cannot find anything related to this specific issue online.

My magic ball tells me you crashed while dereferencing a shared_ptr. Follow the px member, since that is the actual pointer value of interest to you. For example, you can try:
print $1.px
and if the pointer points to a valid memory area:
print *$1.px
The gdb debugger will provide you with a lot of information, but some of the more useful things: backtrace, up, down, info locals, and if you are multi-threaded, thread apply all backtrace. If you are debugging live, then of course you will need breakpoint, next, step and continue. You should be able to use gdb's help for more information, and the gdb manual is readily available online.

Related

Segmentation fault on one Linux machine but not another with C++ code

I have been having a peculiar problem. I have developed a C++ program on a Linux cluster at work. I have tried to use it home on an Ubuntu 14.04 machine, but the program, which is composed of 6 files: main.hpp,main.cpp (dependent on) sarsa.hpp,sarsa.cpp (class Sarsa) (dependent on) wec.hpp,wec.cpp, does compile, but when I run it it either returns segmenation fault or does not enter one fundamental function of the class Sarsa.
The main code calls the constructor and setter functions without problems:
Sarsa run;
run.setVectorSize(memory,3,tilings,1000);
etc.
However, it cannot run the public function episode , since learningRate, which should contain a large integer, returns 0 for all episodes (iterations).
learningRate[episode]=run.episode(numSteps,graph);}
I tried to debug the code with gdb, which has returned:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000408f4a in main () at main.cpp:152
152 learningRate[episode]=run.episode(numSteps,graph);}
I also tried valgrind, which returned:
==10321== Uninitialised value was created by a stack allocation
==10321== at 0x408CAD: main (main.cpp:112)
But no memory leakage issues.
I was wondering if there was a setting to try to debug the external file sarsa.cpp, since I think that class is likely to be the culpript
In the file, I use C++v11 language (I would be expecting errors at compile-time,though), so I even compiled with g++ -std=c++0x, but there were no improvements.
Unluckily, because of the size of the code, I cannot post it here. I would really appreciate any help with this problem. Am I missing anything obvious? Could you help me at least with the debugging?
Thank you in advance for the help.
Correction:
main.cpp:
Definition of the global array:
`#define numEpisodes 10
int learningRate[numEpisodes];`
Towards the end of the main function:
for (int episode; episode<numEpisodes; episode++) {
if (episode==(numEpisodes-1)) { // Save the simulation data only at the
graph=true;} // last episode
learningRate[episode]=run.episode(numSteps,graph);}
As the code you just added to the question reveals, the problem arises because you did not initialize the episode variable. The behavior of any code that uses its value before you assign one is undefined, so it is entirely reasonable that the program behaves differently in one environment than in another.
A segmentation fault indicates an invalid memory access. Usually this means that somewhere, you're reading or writing past the end of an array, or through an invalid pointer, or through an object that has already been freed. You don't necessarily get the segmentation fault at the point where the bug occurs; for instance, you could write past the end of an array onto heap metadata, which causes a crash later on when you try to allocate or release an unrelated object. So it's perfectly reasonable for a program to appear to work on one system but crash on another.
In this case, I'd start by looking at learningRate[episode]. What is the value of episode? Is it within the bounds of learningRate?
I was wondering if there was a setting to try to debug the external file sarsa.cpp, since I think that class is likely to be the culpript
It's possible to set breakpoints in functions other than main.cpp.
break location
Set a breakpoint at the given location, which can specify a function name, a line number, or an address of an instruction.
At least, I think that's your question. You'll also need to know how to step into functions.
More importantly, you need to learn what your tools are trying to tell you. A segfault is the operating system's reaction to an attempt to dereference memory that doesn't belong to you. One common reason for that is trying to dereference NULL. Another would be trying to dereference a pointer that was never initialized. The Valgrind error message suggests that you may have an unitialized pointer.
Without the code, I can't tell you why the pointer isn't initialized when you run the program on your home system, but is (apparently) initialized when you run it at work. I suspect that you don't have the necessary data on your home system, but you'll need to investigate and figure that out. The fundamental question to keep asking yourself is "what is different between my home computer an dmy work computer?"

Debugging lambda memory corruption || Automatically watch object pointer in GDB

TL;DR: How do I automatically add a watch in gdb when a function is called so I can debug some memory corruption?
I am currently dealing with some memory corruption in C++
I am mostly seeing 4-5 types of reaccuring crashes - all of which make little to no sense, so I'm guessing it has to be related to memory corruption.
These crashes only happen on the production server, round about every 2-5hours.
Most of them consist of accessing or passing a null pointer where it cant possibly have existed in the first place.
One of these places is a lambda capturing this. (see below)
Obviously looked at core dumps and even had gdb attached while it crashed
valgrind: I've spent hours staring at multiple instances of valgrind with no success.
Enabled gccs stack protection (-fstack-protector-all)
I have tried looking over the code & the changes, but it has been impossible for me to find anything (100k lines of code total, "On master, 10,437 files have changed and there have been 3,352,600 additions and 85,495 deletions." since the last release on the production server). I might have just plain missed something, or not looked in the right spots - I cant tell.
Used cppcheck to see if there was something plain obvious wrong with the code
If there is an easier/more straight forward method to finding where the corruption occurs feel free to suggest that too.
Lets look at some simplified code.
I have a class, Socket, which manages a client connection.
It is constructed something like this
Listener::OnAccept(fd){
Socket* s = new Socket();
if (s->Setup(fd)){
// push into a vector and do some other things
}
}
Socket::Setup calls (virtual) OnConnect of the Socket class, which then creates a ping event, using a lambda:
Socket::OnConnect(){
m_pingEvent = new Event([this](Event* e){
if (!this->GotPong()){
// close connection
}else{
this->Ping();
}
}, 30 /*seconds*/, true /* loop */);
}
Event accepts an std::function as the callback
m_pingEvent is deleted in the destructor (if set) which will cancel the event if it is running.
What happens (rarely) is that the lambda calls Ping on a nullptr, which calls m_pingPacket->Send() on this=0x1f8, which leads to a segfault.
My question - or rather my proposed solution - would be watching the captured this pointer for writing, which definitely shouldnt happen.
There is only one small issue with that..
How would I even watch such a high ammount of pointers without manually adding each one? (about 400 concurrent connections with a lot (dis)connects)
As for the captured data I found this is in the __closure object:
(gdb) frame 2
#2 0x081b9d63 in operator() (e=0x9b2a748, __closure=0xb5a8318)
at net/socket/Client.cpp:151
151 net/socket/Client.cpp: No such file or directory.
(gdb) ptype __closure
type = const struct {
net::socket::Client * const __this;
} * const
Which I can get when creating the lambda easily by just moving the lambda to "auto callback = " which will be of type:
(gdb) info locals
callback = {__this = 0xb4dd0948}
(gdb) ptype callback
type = struct {
net::socket::Client * const __this;
}
(gdb) print callback
$1 = {__this = 0xb4dd0948}
(This is gcc version 4.7.2 (Debian 4.7.2-5) for reference, might be different with other compilers/versions)
Shortly before posting I realized the struct would probably change address once moved into the std::function (is this correct?)
I've been digging through the gnu "functional" header, but I havent really been able to find anything yet, I'll keep looking (and updating this)
Another note: I am posting this full describtion with all of the details included in case anyone has an easier solution for me. (XY Problem)
Edit:
(gdb) print *(void**)m_pingEvent->m_callback._M_functor._M_unused._M_object
$8 = (void *) 0xb4dd56d8
(gdb) print this
$4 = (net::socket::Client * const) 0xb4dd56d8
Found it :)
Edit2:
break net/socket/Client.cpp:158
commands
silent
watch -l m_pingEvent->m_callback._M_functor._M_unused._M_object
continue
end
This has two flaws: you can only watch 4 addresses at a time & there is no way to delete the watch once the object will be freed.
Soo it's unusable.
Edit 3:
I've figured out how to do the watching using this python script I wrote (linking this one externally since it's quite long): https://gist.github.com/imermcmaps/4a6d8a1577118645acf3
Next issue is making sense of the output..
Added watch 7 -> 0x10eb2200
Hardware watchpoint 7: -location m_pingEvent->m_callback._M_functor._M_unused._M_obj
Old value = (void *) 0x10eba4b0
New value = (void *) 0x10eba400
net::Packet::Packet (this=0x10eb1088) at ../shared/net/Packet.cpp:13
Like it's saying it changed from an old value, which shouldn't even be the original value, since I'm checking if the this pointer and the pointer value match, which they do.
Edit 4 (yay):
Turns out watch -l doesnt work like i want it to.
Manually grabbing the address and then watching that address seems to work
How do I automatically add a watch in gdb when a function is called so
I can debug some memory corruption?
Memory corruption is often detected after the real corruption has already occurred by some modules loaded within your process. So manual debugging may not be very useful for real complex projects.Because any third party modules/library which is loaded within your process may also lead to this problem. From your post it looks like this problem is not reproducible always which indicates that this might be related to threading/synchronization problem which leads to some sort of memory corruption. So based on my experience i strongly suggest you to concentrate on reproducing the problem under dynamic tools(Valgrind/Helgrind).
However as you have mentioned in your question that you are able to attach your program using Valgrind. So you may want to attach your program(a.out) in case you have not done in this way.
$ valgrind --tool=memcheck --db-attach=yes ./a.out
This way Valgrind would automatically attach your program in the debugger when your first memory error is detected so that you can do live debugging(GDB). This seems to be the best possible way to find out the root cause of your problem.
However I think that there may be some data racing scenario which is leading to memory corruption.So you may want to use Helgrind to check/find data racing/threading problem which might be leading to this problem.
For more information on these, you may refer the following post:
https://stackoverflow.com/a/22658693/2724703
https://stackoverflow.com/a/22617989/2724703

How to watch fixed memory free using gdb because of corruption

I have an array of pointers which holds the interface details.
For example
tIfInfoStruct *gapIfTable[16];
memory has been allocated for the pointer while interface creation.
For example
gapIfTable[14] = 0x39cc345.
After some sequence of operations , the value of gapIfTable[14] becomes NULL(0x0). I want to watch, which part of the program was releasing the memory.
Whether I could be able to track gapIfTable[14] using
gdb> watch *0x39cc345
I want my program to be stopped on gdb when the above memory address becomes NULL, so that I can get the back trace in Gdb to find the culprit. I am running a multi threaded program.
Please correct If my understanding is wrong.
If am wrong , please help me out with some solutions.
gdb> watch *0x39cc345
This watches memory at location 0x39cc345, and not the memory at location &gapIfTable[14] that becomes NULL.
So you probably want to use watch *(gapIfTable+14) instead.

Tracing write access to class instance/memory range in gdb

I am trying to debug a small operating system I have written in an university course in C++. At runtime somewhere one of my objects is getting corrupted. It seems like this happens due to accidentally writing to the wrong memory address. As I am unable to find the place where this happens from pure looking at the code, I need another way.
As this is an operating system, I cannot attach tools like valgrind to it, but I can run it in emulators (bochs/qemu) with gdb attached.
Is there a way in gdb to trace write access to a class instance or more general a specific memory range? I would like to break as soon as write access happens, so I can verify if this is valid or not.
You can put a watchpoint:
watch x
This will break when x is modified. x can be any type of variable. If you have:
class A;
A x;
Then gdb will break whenever x is modified.
You can actually put a watchpoint on any expression, and gdb will break when the expression changes. Be careful with this, though, because if the expression isn't something that the underlying hardware supports, gdb will have to evaluate this after every instruction, which leads to awful performance. For example, if A above is a class with many members then gdb can watch the entire instance x, but the way it'll work is:
execute an instruction
jump to the debug breakpoint
check if x has changed
return to the program
Naturally, this is very slow. If x is an int then gdb can use a hardware breakpoint.
If you have a specific memory address you can watch it too:
watch *0x1234
This will break when the contents of [0x1234] changes.
You can also set a read breakpoint using rwatch, or awatch to set a read/write breakpoint.
If you know at least approximately where it happens you can also just use "display" instead of watch and manually step line by line until you see when the change happens. Watching an address using "watch" is just too painfully slow.

Debugging a memory error with GDB and C++

I'm running my C++ program in gdb. I'm not real experienced with gdb, but I'm getting messages like:
warning: HEAP[test.exe]:
warning: Heap block at 064EA560 modified at 064EA569 past requested size of 1
How can I track down where this is happening at? Viewing the memory doesn't give me any clues.
Thanks!
So you're busting your heap. Here's a nice GDB tutorial to keep in mind.
My normal practice is to set a break in known good part of the code. Once it gets there step through until you error out. Normally you can determine the problem that way.
Because you're getting a heap error I'd assume it has to do with something you're putting on the heap so pay special attention to variables (I think you can use print in GDB to determine it's memory address and that may be able to sync you with where your erroring out). You should also remember that entering functions and returning from functions play with the heap so they may be where your problem lies (especially if you messed your heap before returning from a function).
You can probably use a feature called a "watch point". This is like a break point but the debugger stops when the memory is modified.
I gave a rough idea on how to use this in an answer to a different question.
If you can use other tools, I highly recommend trying out Valgrind. It is an instrumentation framework, that can run your code in a manner that allows it to, typically, stop at the exact instruction that causes the error. Heap errors are usually easy to find, this way.
One thing you can try, as this is the same sort of thing as the standard libc, with the MALLOC_CHECK_ envronment variable configured (man libc).
If you keep from exiting gdb (if your application quit's, just use "r" to re-run it), you can setup a memory breakpoint at that address, "hbreak 0x64EA569", also use "help hbreak" to configure condition's or other breakpoitn enable/disable options to prevent excessively entering that breakpoint....
You can just configure a log file, set log ... setup a stack trace on every break, "display/bt -4", then hit r, and just hold down the enter key and let it scroll by
"or use c ## to continue x times... etc..", eventually you will see that same assertion, then you will now have (due to the display/bt) a stacktrace which you can corolate to what code was modifying on that address...
I had similar problem when I was trying to realloc array of pointers to my structures, but instead I was reallocating as array of ints (because I got the code from tutorial and forgot to change it). The compiler wasnt correcting me because it cannot be checked whats in size argument.
My variable was:
itemsetList_t ** iteration_isets;
So in realloc instead of having:
iteration_isets = realloc(iteration_isets, sizeof(itemsetList_t *) * max_elem);
I had:
iteration_isets = realloc(iteration_isets, sizeof(int) * max_elem);
And this caused my heap problem.