Rare segmentation fault during object creation with new - c++

In a Java application, I use JNI to call several C++ methods. One of the methods creates an object that has to persist after the method finished and that is used in other method calls. To this end, I create a pointer of the object, which I return to Java as a reference for later access (note: the Java class implements Closable and in the close method, I call a method to delete the object).
However, in rare cases, approximately after 50.000 calls, the C++ code throws a segmentation fault. Based on the content of the log file, only a few lines of code are suspicious to be the source of error (they between the last printed log message and the next one):
MyObject* handle = new MyObject(some_vector, shared_ptr1, shared_ptr2);
handles.insert(handle); // handles is a std::set
jlong handleId = (jlong) handle;
I'd like to know whether there is a possible issue here apart from the fact that I'm using old-style C pointers. Could multi-threading be a problem? Or could the pointer ID be truncated when converted to jlong?
I also want to note that from my previous experience, I'm aware that the log is only a rough indicator of where a segmentation fault occurred. It may as well have been occurred later in the code and the next log message was simply not printed yet. However, reproducing this error may take 1-2 days, so I'd like to check out whether these lines have a problem.

After removing my std::set from the code, the error did not occur anymore. Conclusion: std::set in multithreading must be protected to avoid unrecoverable crashes.

Related

Is there a better way to see which function caused a exception other than using catch

I'm having problems with locating the address from which a error occurred, my whole code is running inside of a "try" statement and sadly whenever something is wrong I need to find the error using the old try and fail method by deleting parts of my code. Is there a better way to do it?
My current code:
try
{
do
{
if (somefunction)
if (somefunction2)
if (somefunction3)
if (somefunction4)
}
while (false);
}
catch (...)
{
// todo: somehow get the address where the error occurred
Logger::Log("Exception\n");
}
A simple solution to find out where an exception comes from is to use a unique message within each function. Catch the exception object and print the message. Or perhaps use even a different type of exception which will allow you to efficiently handle each case differently if that's what you want to do.
As for getting an "address", the trace of function calls that lead to the current point of execution is called a stacktrace (or backtrace). The stacktrace would contain information such as addresses. Theres no standard way to get a stacktrace yet, although it has been proposed for C++23.
However, once you've caught the exception, the stack will have been "unwound" such that you can't know where the exception came from. What you could do, is get the stack trace in the code that may be throwing (each of them since you don't know which one is the thrower) and store the trace in the exception. A central place to do that would be within the constructor of a custom exception type. This pattern is common in standard exception handling of modern languages.
Lastly, you don't necessarily need to make any changes to the program, if you instead run the program in a debugger and break on a throw, you can get all the information you can possibly get.

Access violation when using boost::signals2 and unloading DLLs

I'm trying to figure out this problem.
Suppose, you have a code that uses boost::signals2 for communicating between objects. Lets call them "colorscales". Code for these colorscales is usually situated in the same DLL as the code that uses them. Let's call it main.dll
But sometimes code from other DLLs needs to use these objects and this is where the problems begin.
Basically, the application is pretty big and most of the DLLs are loaded to do some work and then they are unloaded. This is not the case with DLL that contain colorscales code, it's neved unloaded during application normal runtime.
So, when one of the DLLs is loaded (lets call it tools.dll) and some code runs, it may want to use these colorscale objects and communicate with them, so I connect to the signals these objects provide.
The problem is that boost is pretty lazy and all clever, and when you disconnect() slots, it doesn't actually erase connection and stuff that is associated with it (like boost::bind object and such). It just sets a flag that this connection is now disconnected and cleans it up on later (actually, it clean up 2 of these objects when you connect new slots and 1 of them when you invoke signal as of version 1.57). You probably already see where this is coming to.
So, you when you don't need more tools, you disconnect these signals and then application unloads tools.dll.
Then at a later stage, some code executes from the main.dll which causes one of colorscale signals invoked. boost::signals2 goes to invoke it, but before it tries to clean up one disconnected slot. This is where access violation happens, because internally connection had a shared_state object or something like this, that tries to clean itself up in a thread-safe way. But it faces the problem, that the code that it tries to call is already not there, because DLL is unloaded, so the Access Violation exception is thrown.
I've tried to fix this by invoking signal with some dummy parameters before DLL is unloaded and also by connecting and then disconnecting more slots (this one was a stupid idea, because it doesn't solve problem, but just multiplies it) some predefined amount of times (2 or 3 times more than there are slots at all).
It worked, or I thought so, because now it doesn't crash instantly, but rather crashes the next time you load the same tools.dll. I still need to figure out where and why does it crash, but it's somewhere else inside boost.
So, I wanted to ask, what are my options of fixing it?
My thoughts were
Implementing my own connection that works in a more simple way
Providing a more simple way to communicate, like, callbacks, for instance
Finding a workaround for boost being so lazy and smart.
Well, it seems that I've found the cause of the crash after the fix.
So, basically, what happens, when you use the workaround described above (calling signal with dummy parameters multiple times), what it does is that it replaces _shared_state object that was created from boost code from main.dll by another _shared_state object that is created from boost code from tools.dll. This object maintains pointer to reference counter (of type derived from boost::detail::sp_counter_base) inside.
Then the tools.dll unloads and the object remains, but its virtual table is pointing to the code that is no longer there. Let's look at the virtual table of the reference counter to understand what's going on.
[0] 0x000007fed8a42fe5 tools.dll!boost::detail::sp_counted_impl_p<...>::`vector deleting destructor'(unsigned int)
[1] 0x000007fed8a4181b tools.dll!boost::detail::sp_counted_impl_p<...>::dispose(void)
[2] 0x000007fed8a4458e tools.dll!boost::detail::sp_counted_base::destroy(void)
[3] 0x000007fed8a43c42 SegyTools.dll!boost::detail::sp_counted_impl_p<...>::get_deleter(class type_info const &)
[4] 0x000007fed8a42da6 tools.dll!boost::detail::sp_counted_impl_p<...>::get_untyped_deleter(void)
As you can see, all these method are connected to the disposal of reference counter, so the problem doesn't arise before you try to do the same trick second time. So, the trick with disconnecting all signals to try to get rid of all the code from tools.dll doesn't work as expected and the next time you try to do the trick, Access Violation occurs.

Alternative to exceptions for methods that return objects

I have a class that can perform many types of transformations to a given object.
All data is supplied by the users (via hand crafted command files) but for the most part there is no way to tell if the file commands are valid until we start transforming it (if we checked everything first we'd end up doing the same amount of work as the transformations themselves)
These transformation methods are called on the object and it returns a newly transformed object however if there is a problem we throw an exception since returning an object (even a blank object) could be confused with success whereas an exception sends a clear signal there was a problem and an object can not be returned. It also means we don't need to call a "get last error" type function to find out exactly what went wrong (error code, message, statement, etc.)
However from reading numerous answers on SO it seems this is incorrect since invalid user input is not an exceptional circumstance and due to the complexity of this thing its not uncommon for the user to submit incorrect command files.
Performance is not really an issue here because if we hit an error we have to stop and go back to the user.
If I use return codes I'd have to take the output object as a parameter and then to check the return codes I'd need long nested if blocks (ie. the way you check HRESULT from COM)
How would I go about avoiding exceptions in this case while still keeping the code reasonably tidy?
The design of your problem really lends itself for exceptions. The fact that program execution will halt or be invalid once the user supplies a bad input file, is a sign of "exceptional circumstance". Sure, a lot of program runs will end in an exception being thrown (and caught), but this is in one execution of the program.
The things you are reading about exceptions being slow when used for every other circumstance, is only valid if the program can recover, and has to recover often (e.g. a compiler that fails to find a header in a search directory, and has to look at the next directory in the list of search directories, which really happens a lot).
Use exceptions. When in doubt, measure if this is killing performance. Cleaner code >> following what people say on SO. But then again, that would be a reason to ignore what I just said.
Well, I wouldn't (avoid exceptions). If you really need an "error reporting" mechanism (and you do), there is not much besides return values and exceptions. I think the whole argument about what is exceptional enough (and therefor deserves exceptions) and what is not, is not that important to keep you from solving your problem. Especially if performace isn't an issue. But if you REALLY want avoid exceptions, you could use some kind of global queues to queue your error iformation. But that is utterly ugly, if you ask me.

My code crashes on delete this

I get a segmentation fault when attempting to delete this.
I know what you think about delete this, but it has been left over by my predecessor. I am aware of some precautions I should take, which have been validated and taken care of.
I don't get what kind of conditions might lead to this crash, only once in a while. About 95% of the time the code runs perfectly fine but sometimes this seems to be corrupted somehow and crash.
The destructor of the class doesn't do anything btw.
Should I assume that something is corrupting my heap somewhere else and that the this pointer is messed up somehow?
Edit : As requested, the crashing code:
long CImageBuffer::Release()
{
long nRefCount = InterlockedDecrement(&m_nRefCount);
if(nRefCount == 0)
{
delete this;
}
return nRefCount;
}
The object has been created with a new, it is not in any kind of array.
The most obvious answer is : don't delete this.
If you insists on doing that, then use common ways of finding bugs :
1. use valgrind (or similar tool) to find memory access problems
2. write unit tests
3. use debugger (prepare for loooong staring at the screen - depends on how big your project is)
It seems like you've mismatched new and delete. Note that delete this; can only be used on an object which was allocated using new (and in case of overridden operator new, or multiple copies of the C++ runtime, the particular new that matches delete found in the current scope)
Crashes upon deallocation can be a pain: It is not supposed to happen, and when it happens, the code is too complicated to easily find a solution.
Note: The use of InterlockedDecrement have me assume you are working on Windows.
Log everything
My own solution was to massively log the construction/destruction, as the crash could well never happen while debugging:
Log the construction, including the this pointer value, and other relevant data
Log the destruction, including the this pointer value, and other relevant data
This way, you'll be able to see if the this was deallocated twice, or even allocated at all.
... everything, including the stack
My problem happened in Managed C++/.NET code, meaning that I had easy access to the stack, which was a blessing. You seem to work on plain C++, so retrieving the stack could be a chore, but still, it remains very very useful.
You should try to load code from internet to print out the current stack for each log. I remember playing with http://www.codeproject.com/KB/threads/StackWalker.aspx for that.
Note that you'll need to either be in debug build, or have the PDB file along the executable file, to make sure the stack will be fully printed.
... everything, including multiple crashes
I believe you are on Windows: You could try to catch the SEH exception. This way, if multiple crashes are happening, you'll see them all, instead of seeing only the first, and each time you'll be able to mark "OK" or "CRASHED" in your logs. I went even as far as using maps to remember addresses of allocations/deallocations, thus organizing the logs to show them together (instead of sequentially).
I'm at home, so I can't provide you with the exact code, but here, Google is your friend, but the thing to remember is that you can't have a __try/__except handdler everywhere (C++ unwinding and C++ exception handlers are not compatible with SEH), so you'll have to write an intermediary function to catch the SEH exception.
Is your crash thread-related?
Last, but not least, the "I happens only 5% of the time" symptom could be caused by different code path executions, or the fact you have multiple threads playing together with the same data.
The InterlockedDecrement part bothers me: Is your object living in multiple threads? And is m_nRefCount correctly aligned and volatile LONG?
The correctly aligned and LONG part are important, here.
If your variable is not a LONG (for example, it could be a size_t, which is not a LONG on a 64-bit Windows), then the function could well work the wrong way.
The same can be said for a variable not aligned on 32-byte boundaries. Is there #pragma pack() instructions in your code? Does your projet file change the default alignment (I assume you're working on Visual Studio)?
For the volatile part, InterlockedDecrement seem to generate a Read/Write memory barrier, so the volatile part should not be mandatory (see http://msdn.microsoft.com/en-us/library/f20w0x5e.aspx).

How to change my error handling method

I can't seem to get my head around why people say C++ exceptions are better. For example, I have an application which loads function objects from shared objects to be used in the application. What goes on is something like this:
bool LoadFunctions()
{
//Get Function factory.
FunctionFactory& oFactory = GetFunctionFactory();
//Create functions from the factory and use.
}
FunctionFactory& GetFunctionFactory()
{
//Get shared object handle.
void* pHandle = dlopen("someso.so");
//Get function ptr for Factory getter.
typedef FunctionFactory* (*tpfFacGet)();
tpfFacGet pF = static_cast<tpfFacGet>(dlsym(pHandle, "GetFactory"));
//Call function and return object.
return *((*pF)());
}
Now, it's easy to see that loads of stuff can go wrong. If I did it like I always do, I'd return pointers instead of references, and I'd check if they were NULL and print an error message and get out if they weren't. That way, I know where things went wrong and I can even try to recover from that (i.e. If I successfully load the factory and fail to load just a single function, I may still continue). What I don't understand is how to use exceptions in such a scenario and how to recover the program rather than printing an error message and qutting. Can someone tell me how I am to do this in C++ish way?
We don't even need return codes. If a problem occurs it should be in the exception.
int main()
{
try
{
LoadFunctions();
// if we're here, everything succeeded!
}
catch(std::exception _e)
{
// output exception message, quit gracefully
}
// IRRESPECTIVE OF SUCCESS/FAILURE WE END UP HERE
return 0;
} // eo main
EDIT:
Okay, so lets say that you have an alternative method of loading functions should LoadFunctions() fail. You might be tempted to call that in the catch handler, but this way you'll quickly end up with a huge amount of nested exception handlers which just complicates things.
So now we get down to the question of design. LoadFunctions should succeed if functions are loaded and throw out an exception if it does not. In this hypothetical example of an alternative method of loading functions, that call should be within the LoadFunctions method. This alternative method does not need to be visible to the caller.
At the top level we either end up with functions, or we do not. Writing good exception handling, in my opinion is about getting rid of grey areas. The function did what it was told to do, or it didn't.
There is, as you say, a lot that can go wrong. You won't catch a bad cast there by the way. If the symbol exists but is not the type you are casting it to, you will just get a nasty shock later.
If you were to avoid exceptions you will need somewhere to report the error. As your LoadFunctions and GetFunctionFactory() do not know how you wish to handle the error (log it? print it to stderr? Put up a message box?) The only thing it can do is generate the error.
A common way do to that in C is to pass in a parameter into which it can put the error if one occurs, and for each function to "check" success before continuing. This can make the flow rather tricky.
The C++ concept of "throwing" the exception means that you do not need to keep passing a pointer (or reference) through each function. Where the error occurs you generate it and "throw" it - a bit like "shouting" it. This causes all code (other than cleanup in destructors) to halt until it finds a catcher that handles the error the way that is required.
Note that exceptions should only generally be used to handle errors, not a normal occurrence like encountering "end of file" when this is the way you know a read has completed.
Using exceptions instead of return-values (or any other method) is not supposed to change the behaviour of the code, only how it is written and organized. That means basically that first you decide what your recovery of a certain error is, be it more graceful or less, then you write the code to perform that.
Most experienced programmers (all practically) agree that exceptions are a much better method than return values. You can't see the big difference in short examples of a few functions, but in real systems of thousands of functions and types you would see it clearly. I will not get into more details of how it is better.
I suggest anyway you should just get yourself used to using exceptions by default. However note that using exceptions has some somewhat delicate issues (e.g. RAII http://en.wikipedia.org/wiki/RAII), that ultimately make your code better, but you should read about them in a book (I won't be able to describe here and feel that I do justice to the subject).
I think the book "Effective c++ / Scott Meyer" deals with that, certainly "Exceptional C++ / Herb Sutter". These books are a good jump start for any c++ developer if you havent read them anyway.