Hello,
I have quite a strange problem in one of my C++ projects:
I wrote a C++ Socket wrapper that tries to connect to a given host and port (via IPv4/TCP) and throws a SocketException (derived from std::runtime_error) if an error occurs (e.g. 'Connection refused'). The exception is caught properly and an error message is written to the console as expected, but apparently the destructor of my Socket class is not called. It should also output a message to std::cerr, but that message only appears if the connection succeeds and the Socket is destroyed later, e.g. when it goes out of scope at the end of the function that uses it. The destructor should close the encapsulated socket, but when the exception is thrown the socket remains open (you can see it with lsof as a socket of unknown type), so no code in the destructor seems to be executed at all.
As I couldn't reproduce this problem with a simple test case, my guess is that it somehow has to do with the quite complex structure of my project: I have a core application containing the code for the Socket class and providing a Singleton class whose methods implement the communication protocol and return the results of a request. Each call to one of these methods creates its own Socket instance and provides it with the host and port to use. To simplify socket creation and management, a std::auto_ptr is used, which should delete the Socket when the method finishes and the stack is cleaned up. This works properly according to the console output, and I assumed it would work the same way when an exception is thrown, at least until now.
The core is able to load plugins in shared object format via dlopen and gets a pointer to the plugin's class instance through an extern "C" declared function in the shared object.
This instance then uses the Singleton provided by the core to communicate with the server and display the retrieved data.
My question is: are there limitations to stack unwinding when using shared objects, or where should I look for whatever I missed to make this work properly?
If your exception is thrown from the constructor, the destructor will not be called.
Ok, forget that one. Another look at the code showed that an exception could already have been thrown in the constructor, in which case the destructor is not called, as described in the C++ standard. Not throwing in the constructor solved the problem. That's what programming in Java does to your C++ skills ^^
Excuse the noise, please.
If you are programming on Linux, you might be triggering a problem where an exception thrown from a shared library is not caught properly (a problem with determining the exception's type across library boundaries). This problem is explained here and here, and I am sure you could google up more pages explaining the same problem.
If that is a problem, I am still looking for a solution :(
Related
In a Java application, I use JNI to call several C++ methods. One of the methods creates an object that has to persist after the method has finished and that is used in other method calls. To this end, I return the object's pointer to Java as a handle for later access (note: the Java class implements Closeable, and in its close method I call a native method that deletes the object).
However, in rare cases, approximately after 50,000 calls, the C++ code throws a segmentation fault. Based on the content of the log file, only a few lines of code are suspicious as the source of the error (they lie between the last printed log message and the next one):
MyObject* handle = new MyObject(some_vector, shared_ptr1, shared_ptr2);
handles.insert(handle); // handles is a std::set
jlong handleId = (jlong) handle;
I'd like to know whether there is a possible issue here, apart from the fact that I'm using raw C-style pointers. Could multi-threading be a problem? Or could the pointer value be truncated when converted to jlong?
I also want to note that, from previous experience, I'm aware the log is only a rough indicator of where a segmentation fault occurred. It may well have occurred later in the code, with the next log message simply not printed yet. However, reproducing this error takes 1-2 days, so I'd like to check whether these lines have a problem.
After removing the std::set from the code, the error did not occur anymore. Conclusion: a std::set shared between threads must be protected, or unrecoverable crashes can result.
I'm trying to figure out this problem.
Suppose you have code that uses boost::signals2 for communication between objects. Let's call them "colorscales". The code for these colorscales usually lives in the same DLL as the code that uses them; let's call it main.dll.
But sometimes code from other DLLs needs to use these objects and this is where the problems begin.
Basically, the application is pretty big and most of the DLLs are loaded to do some work and then unloaded. This is not the case for the DLL containing the colorscale code; it is never unloaded during normal application runtime.
So, when one of the DLLs is loaded (let's call it tools.dll) and some of its code runs, it may want to use these colorscale objects and communicate with them, so I connect to the signals these objects provide.
The problem is that boost is pretty lazy and all clever: when you disconnect() slots, it doesn't actually erase the connection and the state associated with it (like the boost::bind object and such). It just flags the connection as disconnected and cleans it up later (as of version 1.57, it actually cleans up two of these objects when you connect new slots and one when you invoke the signal). You probably already see where this is going.
So, when you no longer need the tools, you disconnect these signals and the application unloads tools.dll.
Then at a later stage, some code in main.dll executes and causes one of the colorscale signals to be invoked. boost::signals2 goes to invoke it, but first it tries to clean up one disconnected slot. This is where the access violation happens: internally, the connection held a shared_state object or something like it that tries to clean itself up in a thread-safe way, but the code it needs to call is no longer there because the DLL has been unloaded, so an Access Violation exception is thrown.
I've tried to fix this by invoking the signal with some dummy parameters before the DLL is unloaded, and also by connecting and then disconnecting additional slots a predefined number of times (two or three times the total number of slots; this second idea was a stupid one, because it doesn't solve the problem, it just multiplies it).
It worked, or so I thought, because now it doesn't crash instantly, but rather crashes the next time you load the same tools.dll. I still need to figure out where and why it crashes, but it's somewhere else inside boost.
So, I wanted to ask, what are my options of fixing it?
My thoughts were:
Implementing my own connection class that works in a simpler way
Providing a simpler way to communicate, like callbacks, for instance
Finding a workaround for boost being so lazy and smart.
Well, it seems that I've found the cause of the crash after the fix.
So, basically, what happens when you use the workaround described above (calling the signal with dummy parameters multiple times) is that the _shared_state object created by boost code in main.dll gets replaced by another _shared_state object created by boost code in tools.dll. This object holds a pointer to a reference counter (of a type derived from boost::detail::sp_counted_base).
Then tools.dll unloads and the object remains, but its virtual table points to code that is no longer there. Let's look at the virtual table of the reference counter to understand what's going on.
[0] 0x000007fed8a42fe5 tools.dll!boost::detail::sp_counted_impl_p<...>::`vector deleting destructor'(unsigned int)
[1] 0x000007fed8a4181b tools.dll!boost::detail::sp_counted_impl_p<...>::dispose(void)
[2] 0x000007fed8a4458e tools.dll!boost::detail::sp_counted_base::destroy(void)
[3] 0x000007fed8a43c42 tools.dll!boost::detail::sp_counted_impl_p<...>::get_deleter(class type_info const &)
[4] 0x000007fed8a42da6 tools.dll!boost::detail::sp_counted_impl_p<...>::get_untyped_deleter(void)
As you can see, all these methods are connected to the disposal of the reference counter, so the problem doesn't arise until you try the same trick a second time. So the trick of disconnecting all signals to get rid of all the code from tools.dll doesn't work as expected, and the next time you try it, an Access Violation occurs.
This is the singleton:
#pragma once

class ContextManager {
public:
    static ContextManager& Instance() {
        static ContextManager instance;
        return instance;
    }
    zmq::context_t& GetContext() { return ctx_; }
private:
    zmq::context_t ctx_;
    ~ContextManager() {}
};
I have a DLL with some useful network utilities, built on ZeroMQ and using this singleton so that the context doesn't have to be passed around.
I link this DLL to an EXE which runs a test suite. The test suite works, sending and receiving some messages. When the program exits, the ContextManager destructor crashes, saying "Assertion failed: Successful WSASTARTUP not yet performed (......\src\signaler.cpp:137)".
More details:
The application is single-threaded.
If I just call ContextManager::Instance().GetContext() from the .EXE and return (no tests running, no more calls to the DLL interface), it fails too.
If I define this singleton before main() (thus inside the EXE, without using the object from the DLL), then it works.
WSAStartup is called just once, and it succeeds.
I do not want to expose any implementation details to the DLL clients, so I would like to have this singleton inside the DLL. How could I achieve this?
The problem is that WinSock, which ZMQ uses, requires a call to WSAStartup() before use. If you then call WSACleanup() and use ZMQ afterwards, it looks as if WSAStartup() had never been called, hence the failed assertion. On a more abstract level, the timespan between WSAStartup() and WSACleanup() must completely contain the lifetime of the ZMQ context.
Function-level statics in C++ are created on demand but destroyed (I believe in unspecified order) after main() returns. You don't show the call to WSAStartup(), but I guess it is somewhere inside main(). Similarly, the call to WSACleanup() comes before the end of main(), which still puts it before the destruction of function-level static objects; hence the problems you are seeing.
Two possible fixes:
Allocate the context using new and never delete it. The only time you would delete it is at program shutdown, shortly before the OS reclaims all the resources used by the program anyway. This is a simple and pragmatic fix.
A little more complicated would be to bind the calls to WSAStartup()/WSACleanup() to the constructor/destructor of the singleton: in the constructor, start WinSock and then create the context; in the destructor, destroy the context and then release WinSock.
You could also expose two functions similar to WSAStartup() and WSACleanup() from your DLL, but that's inconvenient and ugly. Also, I would at least consider not using singletons unless absolutely necessary; forcing a certain use of your code on the user is a nuisance, but that's just my personal opinion.
Hope you're having a good day.
I'm working on a class to wrap the Berkeley C networking API; so far I've only gotten a TCP server/client going.
The issue I'm having, ironically, is not with the networking but with the stack and heap. Perhaps I simply don't understand it fully, but when I use something like:
ClientSocket *mysock = new ClientSocket();
And just call functions using the -> operator, it works perfectly fine - my SocketException class gets caught no problem, if an error occurs.
But, when I use:
ClientSocket mysock;
And any exceptions get thrown while calling a function using the . operator, it shows:
terminate called after throwing an instance of 'SocketException'
Aborted
And just throws me back to a terminal prompt.
Forgot to add, I am wrapping the calls in try/catch blocks.
I'm aware that the first example uses the 'new' keyword to return a pointer to a new ClientSocket instance on the heap, while the second creates it on the stack, but I don't see what the problem is.
I'm thinking that I'm missing something about pointers/references/stack/heap, but I have no idea what is happening. The code often runs just fine, but not if any exceptions are thrown... >:(
EDIT: On the links page, Client.cxx and Server.cxx are the example files! Thanks for pointing that out, Eric.
Help with this would be greatly appreciated. The sources for this project are at:
links to all the files: http://furryhead.co.cc/problem.html
(I couldn't paste more than 2 links, and I have 4 files so this will have to do until someone can merge the links into my post)
Beware: Socket.cxx is rather large, as it contains ServerSocket, ClientSocket, and SocketException definitions.
The commands to compile all the above files are:
g++ -c Socket.cxx -o Socket.o
g++ -c Server.cxx -o Server.o
g++ -c Client.cxx -o Client.o
g++ Server.o Socket.o -o server
g++ Client.o Socket.o -o client
Thanks!
Little update, as per Jon's recommendation, I looked up the docs for the socket functions and it now has better error reporting - I check the 'errno' variable and throw an exception based on that. (That, and I don't set it to nonblocking... ;) ) - Just wanted to update and say thanks! :D
To me this sounds like an exception being thrown for a legitimate reason, and during stack unwinding some object's (possibly the ClientSocket's?) destructor throws. Since the runtime cannot resolve this situation in any meaningful way (two exceptions in flight at the same time), the terminate function is called and the program is shut down.
The unanswered question is why some destructor would throw if the object it belongs to is allocated on the stack. But to answer this question requires more data. Perhaps you can dig a little deeper and test my hypothesis?
By the way, if this is indeed the case, and since no destructor should ever throw anything, we can conclude that the offending class is fatally flawed.
Update: seems I was on the money.
Your class Socket's destructor calls close, and close can throw. This is a serious programming error. You should wrap the close call in the destructor in
try {
    close();
} catch (...) {
    /* this space intentionally left blank */
}
and the crash will go away.
Second update (not crashing anymore)
If recv returns -1 with errno set to EAGAIN or EWOULDBLOCK, it means the socket is in non-blocking mode and there is simply no data currently available to be received. That's not an error, but a feature; you shouldn't be throwing an exception there. What exactly you should do instead depends on whether you want to use the socket in blocking or non-blocking mode.
Jon has already solved the terminate problem, but there is another problem in the original code.
The reason the dynamically allocated object seemed to work is that when an exception is thrown elsewhere, the object is leaked and its destructor is never called. While this avoids the double-exception problem, it causes another set of problems later...
The C++ standard provides the std::set_terminate function, which lets you specify what function std::terminate should actually call. std::terminate should only get called in dire circumstances, and sure enough the situations the standard describes for when it's called are dire (e.g. an uncaught exception). When std::terminate does get called, the situation seems analogous to being out of memory: there's not really much you can sensibly do.
I've read that it can be used to make sure resources are freed -- but for the majority of resources this should be handled automatically by the OS when the process exits (e.g. file handles). Theoretically I can see a case for if say, you needed to send a server a specific message when exiting due to a crash. But the majority of the time the OS handling should be sufficient.
When is using a terminate handler the Right Thing(TM)?
Update: People interested in what can be done with custom terminate handlers might find this non-portable trick useful.
This is just optimistic:
but for the majority of resources this should be handled automatically by the OS when the process exits
About the only resources that the OS reclaims automatically are file handles and memory (and even this may vary across OSes). Practically all other resources (and if somebody has a list of resources that are automatically reclaimed by OSes, I would love to see it) need to be released manually.
Your best bet is to avoid exiting via terminate() and attempt a controlled shutdown by forcing the stack to unwind correctly. This will make sure that all destructors are called and your resources are released (via those destructors).
About the only thing I would do in a terminate handler is log the problem, so that when it does happen I can go back and fix the code so it does not happen again. I like my code to unwind the stack nicely for resource deallocation, but this is an opinion; some people prefer an abrupt halt when things go badly.
My list of when terminate is called:
In general, it is called when the exception handling mechanism cannot find a handler for a thrown exception. Some specific examples are:
- An exception escapes main().
  Note: it is implementation-defined whether the stack is unwound here, so I always catch in main() and then rethrow (if I do not explicitly handle the exception). That way I guarantee unwinding of the stack across all platforms and still get the benefit of the OS exception handling mechanism.
- Two exceptions propagating simultaneously:
  - An exception escapes a destructor while another exception is propagating.
  - The expression being thrown itself throws an exception.
- An exception before or after main():
  - An exception escapes the constructor/destructor of a global object.
  - An exception escapes the destructor of a function-level static variable (i.e. be careful with constructors/destructors of non-local static objects).
  - An exception escapes a function registered with atexit().
- A rethrow when no exception is currently propagating.
- An exception not listed in a function's exception specification escapes that function (via unexpected()).
Similar to the statement in Martin York's answer, about the only thing I do in a custom terminate handler is log the problem so I can identify and correct the offending code. This is the only instance where I find a custom terminate handler to be the Right Thing.
Since it is implementation-defined whether or not the stack is unwound before std::terminate() is called, I sometimes add code to generate a backtrace in order to locate an uncaught exception1.
1) This seems to work for me when using GCC on Linux platforms.
I think the right question would be how to avoid the calls to the terminate handler in the first place, rather than when to use one.