How to tell if a dump file represents a crash

How to tell if a dump file represents a crash - c++

I have a dump file collected by an user experiencing a problem while running my C++ application, and I'm trying to figure out whether the application has crashed. From what I can see in the dump file, there is no exception or error indicating a crash, nor are there any deadlocked threads competing for a mutex or other synchronisation object.
The program location for the main thread at which the dump was created is a call to a virtual function, but the vtable for the object is correct and points to the right function, e.g. the code is
FindIt = m_EntityList[m_ListPos]->GetChainageOffset(X,Y,Chainage,Offset);
and looking at the variables I see
Am I right in thinking that this doesn't seem to be a page fault / illegal dereference or am I missing something here? I've checked all the other threads in my thread pool and they are inactive. Would a dump file always have an exception code and error information in the case of a crash?

I would suspect dereferencing of already freed object.
This could be either your class object or some object inside problematic method. Do you set every pointer to NULL after using delete/free?
//Edit:
This was supposed to be a comment instead of answer, but I clicked incorrect button (phone app)

Related

How to map the information in a Segmentation fault to the line in the source code?

I get a Segmentation fault (Address not found). The outcome I receive looks something like this:
#3 Object "/some/dir/bin/myProject", at 0x415526, in
#2 Object "/some/dir/bin/myProject", at 0x40d3f9, in
#1 Object "/some/dir/bin/myProject", at 0x409b1a, in
#0 Object "/lib64/libpthread.so.0", at 0x7f9a82a78cd0, in pthread_mutex_lock
I would like to know to which command the positions translate in myProject.cpp. How can I do that?
The reason why I do not examine this with gdb is that this error is caused by the way multiple threads communicate and human intervention introduces so much wait time that this error does not happen. The error is also sporadic...
Additional question:
My understanding of the error Address not found is that this is probably caused by dereferencing a nullprt or when trying to access a container at the wrong place. Would you agree?
EDIT
Thank`s for suggesting to debug the core dump files. The problem with this is: When there is this error, the program does not crash! The error (or a similar one) is printed to console and the program gets stuck (multithreading again...). It does not terminate, so I guess that there are no dump files (correct me if I am wrong!). I could not find any...

Debugging a Corrupted Object on the Heap

I'm debugging a non-trivial software project where I have a bunch of objects located on the heap. At some point in time (at least) one of these objects gets corrupted.
I added a const member to my class to serve as a canary and indeed, it gets corrupted during executing. Typically I'd add a watchpoint to this variable to figure out when the memory is written to. However, I don't know which instance gets overwritten, as any information stored in the class gets corrupted as well.
I have too many objects to set a watchpoint on each of them and I haven't been able to reproduce with a smaller input set. Running valgrind I see "Invalid read of size 4", which is my canary int of 4 bytes being read but at this point it's already too late.
Any suggestions on how to proceed from here?

Probably this won't be specific enough, but when I had a similar problem, here is what I ended up doing. I'm assuming you can reproduce your problem in a deterministic fashion.
My strategy was to find which instance caused the problem first. This I did with a counter on a specific line that exposes the symptom. For example, on Visual Studio, I would setup a breakpoint that triggers on the 100000th hit, so that it never does; but Visual Studio still tells you how many times the breakpoint is encountered during execution. By trial and error, I would find that the problem occurs on the say, 20th time the breakpoint is encountered, and so I would set the breakpoint to trigger on the 19th iteration, to be able to discriminate the appropriate instance before corruption occurred.
Starting from there, I could have the address of the variable that was corrupted before it was, and play with the debugger to find out what is going on: gather enough information about the faulty instance.
Then, I did setup breakpoints at strategic places, which were triggered by conditions : eg. trigger only for an instance with the appropriate address, or with specific values in members.
You'll probably get to when the symptom occurs precisely, but not to the problem, but that's still something.
Hope this helps!

Running valgrind I see "Invalid read of size 4", which is my canary int of 4 bytes being read but at this point it's already too late.
You are confused: if valgrind told you that you are doing invalid read (presumably because the object has been freed), then you are reading danging (already freed) object, and that is exactly your problem.
You shouldn't try to access such objects, and the fact that your canary has been changed / corrupted after you freed / deleted the object is irrelevant.

I managed to find out what was causing my issue. Turns out the object I was looking at never existed in the first place. Like #employed-russian, I wondered whether my object might have been deleted somewhere I wasn't aware of. Putting a breakpoint on the destructor yielded nothing so the only reasonable explanation is the pointer itself being invalid, pointing to memory that wasn't a valid instance of my class.
Lo and behold; the pointer I was dereferencing was left uninitialized by some constructor of another class. I figured it out when I added an explicit check for null and Valgrind's error became Conditional jump or move depends on uninitialised value(s). By using --track-origins=yes, I quickly figured out the source of the uninitialized data, i.e. the pointer missing from the initialization list.
(I know uninitialized values can be detected by the compiler with -Wuninitialized but apparently my version of clang (apple) didn't feel like mentioning it with -Wall enabled.)

Access violation when using boost::signals2 and unloading DLLs

I'm trying to figure out this problem.
Suppose, you have a code that uses boost::signals2 for communicating between objects. Lets call them "colorscales". Code for these colorscales is usually situated in the same DLL as the code that uses them. Let's call it main.dll
But sometimes code from other DLLs needs to use these objects and this is where the problems begin.
Basically, the application is pretty big and most of the DLLs are loaded to do some work and then they are unloaded. This is not the case with DLL that contain colorscales code, it's neved unloaded during application normal runtime.
So, when one of the DLLs is loaded (lets call it tools.dll) and some code runs, it may want to use these colorscale objects and communicate with them, so I connect to the signals these objects provide.
The problem is that boost is pretty lazy and all clever, and when you disconnect() slots, it doesn't actually erase connection and stuff that is associated with it (like boost::bind object and such). It just sets a flag that this connection is now disconnected and cleans it up on later (actually, it clean up 2 of these objects when you connect new slots and 1 of them when you invoke signal as of version 1.57). You probably already see where this is coming to.
So, you when you don't need more tools, you disconnect these signals and then application unloads tools.dll.
Then at a later stage, some code executes from the main.dll which causes one of colorscale signals invoked. boost::signals2 goes to invoke it, but before it tries to clean up one disconnected slot. This is where access violation happens, because internally connection had a shared_state object or something like this, that tries to clean itself up in a thread-safe way. But it faces the problem, that the code that it tries to call is already not there, because DLL is unloaded, so the Access Violation exception is thrown.
I've tried to fix this by invoking signal with some dummy parameters before DLL is unloaded and also by connecting and then disconnecting more slots (this one was a stupid idea, because it doesn't solve problem, but just multiplies it) some predefined amount of times (2 or 3 times more than there are slots at all).
It worked, or I thought so, because now it doesn't crash instantly, but rather crashes the next time you load the same tools.dll. I still need to figure out where and why does it crash, but it's somewhere else inside boost.
So, I wanted to ask, what are my options of fixing it?
My thoughts were
Implementing my own connection that works in a more simple way
Providing a more simple way to communicate, like, callbacks, for instance
Finding a workaround for boost being so lazy and smart.

Well, it seems that I've found the cause of the crash after the fix.
So, basically, what happens, when you use the workaround described above (calling signal with dummy parameters multiple times), what it does is that it replaces _shared_state object that was created from boost code from main.dll by another _shared_state object that is created from boost code from tools.dll. This object maintains pointer to reference counter (of type derived from boost::detail::sp_counter_base) inside.
Then the tools.dll unloads and the object remains, but its virtual table is pointing to the code that is no longer there. Let's look at the virtual table of the reference counter to understand what's going on.
[0] 0x000007fed8a42fe5 tools.dll!boost::detail::sp_counted_impl_p<...>::`vector deleting destructor'(unsigned int)
[1] 0x000007fed8a4181b tools.dll!boost::detail::sp_counted_impl_p<...>::dispose(void)
[2] 0x000007fed8a4458e tools.dll!boost::detail::sp_counted_base::destroy(void)
[3] 0x000007fed8a43c42 SegyTools.dll!boost::detail::sp_counted_impl_p<...>::get_deleter(class type_info const &)
[4] 0x000007fed8a42da6 tools.dll!boost::detail::sp_counted_impl_p<...>::get_untyped_deleter(void)
As you can see, all these method are connected to the disposal of reference counter, so the problem doesn't arise before you try to do the same trick second time. So, the trick with disconnecting all signals to try to get rid of all the code from tools.dll doesn't work as expected and the next time you try to do the trick, Access Violation occurs.

Calling shared libraries without releasing the memory

In Ubuntu 14.04, I have a C++ API as a shared library which I am opening using dlopen, and then creating pointers to functions using dlsym. One of these functions CloseAPI releases the API from memory. Here is the syntax:
void* APIhandle = dlopen("Kinova.API.USBCommandLayerUbuntu.so", RTLD_NOW|RTLD_GLOBAL);
int (*CloseAPI) = (int (*)()) dlsym(APIhandle,"CloseAPI");
If I ensure that during my code, the CloseAPI function is always called before the main function returns, then everything seems fine, and I can run the program again the next time. However, if I Ctrl-C and interrupt the program before it has had time to call CloseAPI, then on the next time I run the program, I get a return error whenever I call any of the API functions. I have no documentation saying what this error is, but my intuition is that there is some sort of lock on the library from the previous run of the program. The only thing that allows me to run the program again, is to restart my machine. Logging in and out does not work.
So, my questions are:
1) If my library is a shared library, why am I getting this error when I would have thought a shared library can be loaded by more than one program simultaneously?
2) How can I resolve this issue if I am going to be expecting Ctrl-C to be happening often, without being able to call CloseAPI?

So, if you do use this api correctly then it requires you to do proper clean up after use (which is not really user friendly).
First of all, if you really need to use Ctrl-C, allow program to end properly on this signal: Is destructor called if SIGINT or SIGSTP issued?
Then use a technique with a stack object containing a resource pointer (to a CloseAPI function in this case). Then make sure this object will call CloseAPI in his destructor (you may want to check if CloseAPI wasn't called before). See more in "Effective C++, Chapter 3: Resource Management".
That it, even if you don't call CloseAPI, pointer container will do it for you.
p.s. you should considering doing it even if you're not going to use Ctrl-C. Imagine exception occurred and your program has to be stopped: then you should be sure you don't leave OS in an undefined state.

I get an exception if I leave the program running for a while

Platform : Win32
Language : C++
I get an error if I leave the program running for a while (~10 min).
Unhandled exception at 0x10003fe2 in ImportTest.exe: 0xC0000005: Access violation reading location 0x003b1000.
I think it could be a memory leak but I don't know how to find that out.
Im also unable to 'free()' memory because it always causes (maybe i shouldn't be using free() on variables) :
Unhandled exception at 0x76e81f70 in ImportTest.exe: 0xC0000005: Access violation reading location 0x0fffffff.
at that stage the program isn't doing anything and it is just waiting for user input
dllHandle = LoadLibrary(L"miniFMOD.dll");
playSongPtr = (playSongT)GetProcAddress(dllHandle,"SongPlay");
loadSongPtr = (loadSongT)GetProcAddress(dllHandle,"SongLoadFromFile");
int songHandle = loadSongPtr("FILE_PATH");
// ... {just output , couldn't cause errors}
playSongPtr(songHandle);
getch(); // that is where it causes an error if i leave it running for a while
Edit 2:
playSongPtr(); causes the problem. but i don't know how to fix it

I think it's pretty clear that your program has a bug. If you don't know where to start looking, a useful technique is "divide and conquer".
Start with your program in a state where you can cause the exception to happen. Eliminate half your code, and try again. If the exception still happens, then you've got half as much code to look through. If the exception doesn't happen, then it might have been related to the code you just removed.
Repeat the above until you isolate the problem.
Update: You say "at that stage the program isn't doing anything" but clearly it is doing something (otherwise it wouldn't crash). Is your program a console mode program? If so, what function are you using to wait for user input? If not, then is it a GUI mode program? Have you opened a dialog box and are waiting for something to happen? Have you got any Windows timers running? Any threads?
Update 2: In light of the small snippet of code you posted, I'm pretty sure that if you try to remove the call to the playSongPtr(songHandle) function, then your problem is likely to go away. You will have to investigate what the requirements are for "miniFMOD.dll". For example, that DLL might assume that it's running in a GUI environment instead of a console program, and may do things that don't necessarily work in console mode. Also, in order to do anything in the background (including playing a song), that DLL probably needs to create a thread to periodically load the next bit of the song and queue it in the play buffer. You can check the number of threads being created by your program in Task Manager (or better, Process Explorer). If it's more than one, then there are other things going on that you aren't directly controlling.

The error tells you that memory is accessed which you have not allocated at the moment. It could be a pointer error like dereferencing NULL. Another possibility is that you use memory after you freed it.
The first step would be to check your code for NULL reference checks, i.e. make sure you have a valid pointer before you use it, and to check the lifecycle of all allocated and freed resources. Writing NULL's over references you just freed might help find the problem spot.

I doubt this particular problem is a memory leak; the problem is dereferencing a pointer that does not point to something useful. To check for a memory leak, watch your process in your operating system's process list tool (task manager, ps, whatever) and see if the "used memory" value keeps growing.
On calling free: You should call free() once and only once on the non-null values returned from malloc(), calloc() or strdup(). Calling free() less than once will lead to a memory leak. Calling free() more than once will lead to memory corruption.
You should get a stack trace to see what is going on when the process crashes. Based on my reading of the addresses involved you probably have a stack overflow or have an incorrect pointer calculation using a stack address (in C/C++ terms: an "auto" variable.) A stack trace will tell you how you got to the point where it crashed.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js