I am investigating a crash, based on the available core dump. The application crashing is a C++ program, built with gcc and running on RH5. The backtrace seems valid till the #1 frame. There trying to print an object I get
<invalid address>, <error reading variable>
Since I have the address of the object from the #2 frame is it a valid presumption that I can somehow 'dump' the memory in which the object is allocated and still collect some info. Furthermore, instead of trying to guess how the object is aligned, can I force gdb to print the address as if it is an object, even though it detects some error. My idea is that maybe the object has already been deleted, but just maybe the memory is still there and I can print some member variable.
Please comment on is that possible, and if so, how it should be done in gdb terms. 10x.
Well, if you have an address you can always do:
print *(class MyClass*)pointer_var
Related
Hello I use MacBook M1 (OSX) and I have a dangling pointer which it seems I can't catch.
I am using Clion and LLDB as a debuger.
When I run my code I get:
Exception: EXC_BAD_ACCESS (code=1, address=0x18)
However this does not really shows me or I can't understand where exactly is the bad pointer.
I am attaching also screenshot of my editor and the debugger window:
I have read something about zombie objects which when enabled allows you to catch dangling pointers. How can I do that?
EXC_BAD_ACCESS (defined in /usr/include/mach/exception_types.h) has a code (which is a kern_return_t) and a subcode. kern_return_t is defined in /usr/include/mach/kern_return.h and 1 means KERN_INVALID_ADDRESS, so this was not a protection problem but an actual invalid address access. The subcode (0x18) is the address accessed.
A small number like 0x18 usually means that your code accessed a field that was 0x18 bytes into an object, but the object pointer was null. So the first thing to do is look at all the accesses in that line of code (or around it if you are debugging optimized code) and make sure none of them are null. This might also be a null vtable, and the 0x18 the vtable offset, i.e. one of the methods of the object, so look for calls as well. However, I didn't see any suspicious looking pointer values in your locals, so maybe it's some subobject?
If it isn't obvious from there which pointer is bad, you could run your code under ASAN (address sanitizer) - if the bad pointer access is because of a use after free ASAN will often find those quickly. Note, Zombie objects is an ObjC only thing, that doesn't look relevant in your code.
If that doesn't get it, the most straightforward way to diagnose this sort of error is to look at the disassembly, for instance just run:
(lldb) disassemble
The current PC will be marked in the output. That instruction will be some kind of memory access, often dereferencing a register with an offset or something like that. For instance:
ldr w9, [x9, #0x18]
is loading memory 0x18 bytes off from the value in register x9. If this were the instruction, the next question is what program entity is currently occupying x9? lldb might know, you can ask it by doing:
(lldb) image lookup -va $pc
That will tell you everything lldb knows about that pc, among other things the last set of entries will be where all the known variables are currently located. Look for one that is in x9. If there isn't one listed in x9, then maybe one of the currently visible variables was temporarily copied into x9, in which case you have to look up in the instruction stream to see what was the last value that got copied into x9.
I'm debugging a non-trivial software project where I have a bunch of objects located on the heap. At some point in time (at least) one of these objects gets corrupted.
I added a const member to my class to serve as a canary and indeed, it gets corrupted during executing. Typically I'd add a watchpoint to this variable to figure out when the memory is written to. However, I don't know which instance gets overwritten, as any information stored in the class gets corrupted as well.
I have too many objects to set a watchpoint on each of them and I haven't been able to reproduce with a smaller input set. Running valgrind I see "Invalid read of size 4", which is my canary int of 4 bytes being read but at this point it's already too late.
Any suggestions on how to proceed from here?
Probably this won't be specific enough, but when I had a similar problem, here is what I ended up doing. I'm assuming you can reproduce your problem in a deterministic fashion.
My strategy was to find which instance caused the problem first. This I did with a counter on a specific line that exposes the symptom. For example, on Visual Studio, I would setup a breakpoint that triggers on the 100000th hit, so that it never does; but Visual Studio still tells you how many times the breakpoint is encountered during execution. By trial and error, I would find that the problem occurs on the say, 20th time the breakpoint is encountered, and so I would set the breakpoint to trigger on the 19th iteration, to be able to discriminate the appropriate instance before corruption occurred.
Starting from there, I could have the address of the variable that was corrupted before it was, and play with the debugger to find out what is going on: gather enough information about the faulty instance.
Then, I did setup breakpoints at strategic places, which were triggered by conditions : eg. trigger only for an instance with the appropriate address, or with specific values in members.
You'll probably get to when the symptom occurs precisely, but not to the problem, but that's still something.
Hope this helps!
Running valgrind I see "Invalid read of size 4", which is my canary int of 4 bytes being read but at this point it's already too late.
You are confused: if valgrind told you that you are doing invalid read (presumably because the object has been freed), then you are reading danging (already freed) object, and that is exactly your problem.
You shouldn't try to access such objects, and the fact that your canary has been changed / corrupted after you freed / deleted the object is irrelevant.
I managed to find out what was causing my issue. Turns out the object I was looking at never existed in the first place. Like #employed-russian, I wondered whether my object might have been deleted somewhere I wasn't aware of. Putting a breakpoint on the destructor yielded nothing so the only reasonable explanation is the pointer itself being invalid, pointing to memory that wasn't a valid instance of my class.
Lo and behold; the pointer I was dereferencing was left uninitialized by some constructor of another class. I figured it out when I added an explicit check for null and Valgrind's error became Conditional jump or move depends on uninitialised value(s). By using --track-origins=yes, I quickly figured out the source of the uninitialized data, i.e. the pointer missing from the initialization list.
(I know uninitialized values can be detected by the compiler with -Wuninitialized but apparently my version of clang (apple) didn't feel like mentioning it with -Wall enabled.)
I have been having a peculiar problem. I have developed a C++ program on a Linux cluster at work. I have tried to use it home on an Ubuntu 14.04 machine, but the program, which is composed of 6 files: main.hpp,main.cpp (dependent on) sarsa.hpp,sarsa.cpp (class Sarsa) (dependent on) wec.hpp,wec.cpp, does compile, but when I run it it either returns segmenation fault or does not enter one fundamental function of the class Sarsa.
The main code calls the constructor and setter functions without problems:
Sarsa run;
run.setVectorSize(memory,3,tilings,1000);
etc.
However, it cannot run the public function episode , since learningRate, which should contain a large integer, returns 0 for all episodes (iterations).
learningRate[episode]=run.episode(numSteps,graph);}
I tried to debug the code with gdb, which has returned:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000408f4a in main () at main.cpp:152
152 learningRate[episode]=run.episode(numSteps,graph);}
I also tried valgrind, which returned:
==10321== Uninitialised value was created by a stack allocation
==10321== at 0x408CAD: main (main.cpp:112)
But no memory leakage issues.
I was wondering if there was a setting to try to debug the external file sarsa.cpp, since I think that class is likely to be the culpript
In the file, I use C++v11 language (I would be expecting errors at compile-time,though), so I even compiled with g++ -std=c++0x, but there were no improvements.
Unluckily, because of the size of the code, I cannot post it here. I would really appreciate any help with this problem. Am I missing anything obvious? Could you help me at least with the debugging?
Thank you in advance for the help.
Correction:
main.cpp:
Definition of the global array:
`#define numEpisodes 10
int learningRate[numEpisodes];`
Towards the end of the main function:
for (int episode; episode<numEpisodes; episode++) {
if (episode==(numEpisodes-1)) { // Save the simulation data only at the
graph=true;} // last episode
learningRate[episode]=run.episode(numSteps,graph);}
As the code you just added to the question reveals, the problem arises because you did not initialize the episode variable. The behavior of any code that uses its value before you assign one is undefined, so it is entirely reasonable that the program behaves differently in one environment than in another.
A segmentation fault indicates an invalid memory access. Usually this means that somewhere, you're reading or writing past the end of an array, or through an invalid pointer, or through an object that has already been freed. You don't necessarily get the segmentation fault at the point where the bug occurs; for instance, you could write past the end of an array onto heap metadata, which causes a crash later on when you try to allocate or release an unrelated object. So it's perfectly reasonable for a program to appear to work on one system but crash on another.
In this case, I'd start by looking at learningRate[episode]. What is the value of episode? Is it within the bounds of learningRate?
I was wondering if there was a setting to try to debug the external file sarsa.cpp, since I think that class is likely to be the culpript
It's possible to set breakpoints in functions other than main.cpp.
break location
Set a breakpoint at the given location, which can specify a function name, a line number, or an address of an instruction.
At least, I think that's your question. You'll also need to know how to step into functions.
More importantly, you need to learn what your tools are trying to tell you. A segfault is the operating system's reaction to an attempt to dereference memory that doesn't belong to you. One common reason for that is trying to dereference NULL. Another would be trying to dereference a pointer that was never initialized. The Valgrind error message suggests that you may have an unitialized pointer.
Without the code, I can't tell you why the pointer isn't initialized when you run the program on your home system, but is (apparently) initialized when you run it at work. I suspect that you don't have the necessary data on your home system, but you'll need to investigate and figure that out. The fundamental question to keep asking yourself is "what is different between my home computer an dmy work computer?"
I have an array of pointers which holds the interface details.
For example
tIfInfoStruct *gapIfTable[16];
memory has been allocated for the pointer while interface creation.
For example
gapIfTable[14] = 0x39cc345.
After some sequence of operations , the value of gapIfTable[14] becomes NULL(0x0). I want to watch, which part of the program was releasing the memory.
Whether I could be able to track gapIfTable[14] using
gdb> watch *0x39cc345
I want my program to be stopped on gdb when the above memory address becomes NULL, so that I can get the back trace in Gdb to find the culprit. I am running a multi threaded program.
Please correct If my understanding is wrong.
If am wrong , please help me out with some solutions.
gdb> watch *0x39cc345
This watches memory at location 0x39cc345, and not the memory at location &gapIfTable[14] that becomes NULL.
So you probably want to use watch *(gapIfTable+14) instead.
I'm on amd64 architecture. The following code gets rejected by the g++ compiler at the "if" statement:
void * newmem=malloc(n);
if(newmem==0xefbeaddeefbeadde){
with the error message:
error: ISO C++ forbids comparison between pointer and integer [-fpermissive]
I can't seem to find the magic incantation needed to get it going (and I don't want to use -fpermissive). Any help appreciated.
Background:
I'm hunting an ugly bug which crashes my program while requesting memory in some STL new operation (at least gdb told me that). Thinking it could be some memory overrun of one of the allocated memory chunks being, by bad luck, adjacent to memory used by the OS to manage memory lists of my program, I quickly overrode new(), new plus their delete counterparts with own routines that added memory fencing; and while the application still crashes (all fences intact (sigh)), gdb now told me this:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000604f5e in construct (__val=..., __p=0xefbeaddeefbeadde, this=<optimized out>)
at /usr/include/c++/4.6/ext/new_allocator.h:108
108 { ::new((void *)__p) _Tp(__val); }
I noted that the pointer __p has as value the pointer representation of the value I used for one of my memory fences (0xdeadbeef), hence my wish to catch this earlier in my new() to try and dump out some more complex values in my program.
Additional note: the function where it crashes runs flawlessly a couple of million times before it crashes (intermixed with dozens of other routines which all also run a couple of thousand to million times), using valgrind does not seem like an option atm because it takes 6hrs and 11 Gb before my program crashes.
One of them is an integer, and the other a pointer. Perhaps an ugly cast would do
if(newmem==(void*)0xefbeaddeefbeadde){
For your question: you have 2 options - cast the pointer to 64bit number or cast the number to void*
For the background: as this takes too long, it could be a memory leak. So, try valgrind - no need to wait for crash, just run your program through valgrind and use the appropriate options. valgrind could help you to catch undefined behavior, too.
Another thing - compile your program with -O0 optimization level and with max level of debug symbols - -ggdb3. Also, if your executable does not generate a core dump, when this error occurs, make it generate. Then leave it working, while you're trying different approaches to catch the error.
In case you don't succeed, you'll at least have a (hopefully) nice core dump, that you can examine with gdb your_exe the_core_dump. Then, watch the core frame by frame, watch the variables, etc.
If it's not a memory leak, it could be a memory corruption somewhere, undefined behavior or something like this. If a good core dump is generated, you'll probably catch the error. Or, at least, it could give you useful information about the case.
Another thing, that could be useful, if you generate a core dump (no matter if it's a good or a bad one) - the size of the dump. If it's too large (the definition of "large" this depends on your program), then you most probably have a memory leak (it could be some kind of cross-reference issue, if you use smart pointers and if so - valgrind will not catch it)
That probably won't work that way. Next time when your program runs, malloc may never return that value. There is something else you must do, rather than just checking on the return value of new!