Unable to catch exceptions from a stack object - c++

Hope you're having a good day.
I'm working on a class to wrap the Berkeley C Networking API, so far I've only gotten a TCP server/client going.
The issue I'm having ironically is not with the networking, but with the stack and heap. Perhaps I simply don't understand it fully, but when I use something like:
ClientSocket *mysock = new ClientSocket();
And just call functions using the -> operator, it works perfectly fine - my SocketException class gets caught no problem, if an error occurs.
But, when I use:
ClientSocket mysock;
And any exceptions get thrown while calling a function using the . operator, it shows:
terminate called after throwing an instance of 'SocketException'
Aborted
And just throws me back to a terminal prompt.
Forgot to add, I am wrapping the calls in try/catch blocks.
I'm aware that the first example is using the 'new' keyword to return a pointer to the new ClientSocket instance on the heap, and the second is for the stack, but I don't know the problem.
I'm thinking that I'm missing something about pointers/references/stack/heap, but I have no idea what is happening. The code often runs just fine, but if any exceptions are thrown.... >:(
EDIT: On the links page, Client.cxx and Server.cxx are the example files! Thanks for pointing that out, Eric.
Help with this would be greatly appreciated. The sources for this project are at:
links to all the files: http://furryhead.co.cc/problem.html
(I couldn't paste more than 2 links, and I have 4 files so this will have to do until someone can merge the links into my post)
Beware: Socket.cxx is rather large, as it contains ServerSocket, ClientSocket, and SocketException definitions.
The commands to compile all the above files are:
g++ -c Socket.cxx -o Socket.o
g++ -c Server.cxx -o Server.o
g++ -c Client.cxx -o Client.o
g++ Server.o Socket.o -o server
g++ Client.o Socket.o -o client
Thanks!
Little update, as per Jon's recommendation, I looked up the docs for the socket functions and it now has better error reporting - I check the 'errno' variable and throw an exception based on that. (That, and I don't set it to nonblocking... ;) ) - Just wanted to update and say thanks! :D

To me this sounds like an exception being thrown for a legitimate reason, and during stack unwinding some object's (possibly the ClientSocket's?) destructor throws. Since the runtime cannot resolve this situation in any meaningful way (two exceptions "thrown" at the same time), the terminate function is called and the program is shut down.
The unanswered question is why some destructor would throw if the object it belongs to is allocated on the stack. But to answer this question requires more data. Perhaps you can dig a little deeper and test my hypothesis?
By the way, if this is indeed the case, and since no destructor should ever throw anything, we can conclude that the offending class is fatally flawed.
Update: seems I was on the money.
Your class Socket's destructor calls close, and close can throw. This is a serious programming error. You should wrap the close call in the destructor in
try {
    close();
}
catch (...) {
    /* this space intentionally left blank */
}
and the crash will go away.
Second update (not crashing anymore)
If recv is returning -1, this means that the socket is in non-blocking mode and there is no data currently available to be received. That's not an error, but a feature. You shouldn't be throwing an exception there. What you should be doing exactly depends on whether you want to use the socket in blocking or non-blocking mode.

Jon has already solved the terminate problem, but there is another problem in the original code.
The reason that a dynamically allocated object seemed to work is that when there is an exception elsewhere, it is leaked and the destructor is never called. While this avoids the double-exception problem, it causes another set later...
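An added example (not the asker's code, and not Jon's) that reproduces what both answers describe with stand-in Socket and SocketException classes: the destructor wraps close() in try/catch so unwinding never sees a second exception, and the raw-new variant "works" only because its destructor is never reached. All names here are hypothetical; failures are simulated rather than made with real sockets.

```cpp
#include <stdexcept>
#include <string>

// Stand-ins for the page's SocketException and Socket (hypothetical, not the asker's code).
struct SocketException : std::runtime_error {
    explicit SocketException(const std::string& m) : std::runtime_error(m) {}
};

static int g_destructor_runs = 0;   // how many Socket destructors completed
static int g_close_failures = 0;    // close() failures swallowed by the destructor

class Socket {
public:
    explicit Socket(bool close_fails) : close_fails_(close_fails) {}
    // Simulated failing operation, like a connect() that gets ECONNREFUSED.
    void connect() { throw SocketException("connect: connection refused"); }
    // Simulated close() that may itself fail and throw, as on the asker's Socket.
    void close() {
        if (close_fails_) throw SocketException("close: bad file descriptor");
    }
    ~Socket() {
        ++g_destructor_runs;
        // Never let an exception escape a destructor: during stack unwinding a
        // second in-flight exception makes the runtime call std::terminate().
        try {
            close();
        } catch (...) {
            ++g_close_failures;   // record it; nothing else sensible can be done here
        }
    }
private:
    bool close_fails_;
};

// Stack object: the destructor runs while the SocketException is propagating.
// Returns true if the exception reached the handler.
bool run_on_stack(bool close_fails) {
    try {
        Socket s(close_fails);
        s.connect();
    } catch (const SocketException&) {
        return true;
    }
    return false;
}

// Raw `new` without RAII: the exception skips the delete, so the destructor is
// never entered. No double exception, but the Socket (and its fd) leaks.
bool run_on_heap_leaky(bool close_fails) {
    Socket* s = nullptr;
    try {
        s = new Socket(close_fails);
        s->connect();
        delete s;
    } catch (const SocketException&) {
        return true;   // s is leaked here
    }
    return false;
}
```

Without the try/catch in ~Socket, run_on_stack(true) would abort with the "terminate called after throwing an instance of 'SocketException'" message from the question.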

Debugging lambda memory corruption || Automatically watch object pointer in GDB

TL;DR: How do I automatically add a watch in gdb when a function is called so I can debug some memory corruption?
I am currently dealing with some memory corruption in C++
I am mostly seeing 4-5 types of recurring crashes - all of which make little to no sense, so I'm guessing it has to be related to memory corruption.
These crashes only happen on the production server, round about every 2-5hours.
Most of them consist of accessing or passing a null pointer where it can't possibly have existed in the first place.
One of these places is a lambda capturing this. (see below)
Obviously looked at core dumps and even had gdb attached while it crashed
valgrind: I've spent hours staring at multiple instances of valgrind with no success.
Enabled gccs stack protection (-fstack-protector-all)
I have tried looking over the code & the changes, but it has been impossible for me to find anything (100k lines of code total, "On master, 10,437 files have changed and there have been 3,352,600 additions and 85,495 deletions." since the last release on the production server). I might have just plain missed something, or not looked in the right spots - I can't tell.
Used cppcheck to see if there was something plain obvious wrong with the code
If there is an easier/more straight forward method to finding where the corruption occurs feel free to suggest that too.
Lets look at some simplified code.
I have a class, Socket, which manages a client connection.
It is constructed something like this
void Listener::OnAccept(int fd) {
    Socket* s = new Socket();
    if (s->Setup(fd)) {
        // push into a vector and do some other things
    }
}
Socket::Setup calls (virtual) OnConnect of the Socket class, which then creates a ping event, using a lambda:
void Socket::OnConnect() {
    m_pingEvent = new Event([this](Event* e) {
        if (!this->GotPong()) {
            // close connection
        } else {
            this->Ping();
        }
    }, 30 /* seconds */, true /* loop */);
}
Event accepts an std::function as the callback
m_pingEvent is deleted in the destructor (if set) which will cancel the event if it is running.
What happens (rarely) is that the lambda calls Ping on a nullptr, which calls m_pingPacket->Send() on this=0x1f8, which leads to a segfault.
My question - or rather my proposed solution - would be watching the captured this pointer for writing, which definitely shouldn't happen.
There is only one small issue with that..
How would I even watch such a high amount of pointers without manually adding each one? (about 400 concurrent connections with a lot of (dis)connects)
As for the captured data I found this is in the __closure object:
(gdb) frame 2
#2 0x081b9d63 in operator() (e=0x9b2a748, __closure=0xb5a8318)
at net/socket/Client.cpp:151
151 net/socket/Client.cpp: No such file or directory.
(gdb) ptype __closure
type = const struct {
net::socket::Client * const __this;
} * const
Which I can get when creating the lambda easily by just moving the lambda to "auto callback = " which will be of type:
(gdb) info locals
callback = {__this = 0xb4dd0948}
(gdb) ptype callback
type = struct {
net::socket::Client * const __this;
}
(gdb) print callback
$1 = {__this = 0xb4dd0948}
(This is gcc version 4.7.2 (Debian 4.7.2-5) for reference, might be different with other compilers/versions)
Shortly before posting I realized the struct would probably change address once moved into the std::function (is this correct?)
I've been digging through the gnu "functional" header, but I haven't really been able to find anything yet, I'll keep looking (and updating this)
Another note: I am posting this full description with all of the details included in case anyone has an easier solution for me. (XY Problem)
Edit:
(gdb) print *(void**)m_pingEvent->m_callback._M_functor._M_unused._M_object
$8 = (void *) 0xb4dd56d8
(gdb) print this
$4 = (net::socket::Client * const) 0xb4dd56d8
Found it :)
Edit2:
break net/socket/Client.cpp:158
commands
silent
watch -l m_pingEvent->m_callback._M_functor._M_unused._M_object
continue
end
This has two flaws: you can only watch 4 addresses at a time & there is no way to delete the watch once the object will be freed.
So it's unusable.
Edit 3:
I've figured out how to do the watching using this python script I wrote (linking this one externally since it's quite long): https://gist.github.com/imermcmaps/4a6d8a1577118645acf3
Next issue is making sense of the output..
Added watch 7 -> 0x10eb2200
Hardware watchpoint 7: -location m_pingEvent->m_callback._M_functor._M_unused._M_obj
Old value = (void *) 0x10eba4b0
New value = (void *) 0x10eba400
net::Packet::Packet (this=0x10eb1088) at ../shared/net/Packet.cpp:13
Like it's saying it changed from an old value, which shouldn't even be the original value, since I'm checking if the this pointer and the pointer value match, which they do.
Edit 4 (yay):
Turns out watch -l doesn't work like I want it to.
Manually grabbing the address and then watching that address seems to work
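An added example (not the asker's code) that obtains the same address the asker dug out of libstdc++ internals (m_callback._M_functor._M_unused._M_object), but through the public std::function::target<>() API: the closure copied into the std::function is a one-pointer struct whose only member is the captured this, so that word is the one to put a hardware watchpoint on. Event, Client and the member names are stand-ins for the asker's classes; reading the capture through a void* const* relies on the GCC/Clang closure layout shown in the ptype output above.

```cpp
#include <functional>

// Hypothetical stand-ins for the page's Event and Client classes (not the asker's code).
struct Event {
    std::function<void(Event*)> m_callback;
    int fired = 0;
};

class Client {
public:
    Client() = default;
    ~Client() { delete m_pingEvent; }
    bool GotPong() const { return true; }
    void Ping() { ++pings; }

    void OnConnect() {
        // Keep the lambda in a named variable so its closure type is available
        // for std::function::target<>() below.
        auto callback = [this](Event* e) {
            if (!this->GotPong()) { /* close connection */ } else { this->Ping(); }
            ++e->fired;
        };
        static_assert(sizeof(callback) == sizeof(void*),
                      "closure holds exactly one captured pointer (this)");
        m_pingEvent = new Event;
        m_pingEvent->m_callback = callback;   // copies the closure into the std::function
        // target<>() returns the address of that stored copy (libstdc++ keeps a
        // pointer-sized closure in place, the _M_unused._M_object from the gdb session).
        // Its single member is the captured `this`: that word is what to watch.
        auto* stored = m_pingEvent->m_callback.target<decltype(callback)>();
        m_watchAddr = reinterpret_cast<void* const*>(stored);
    }

    // Address of the captured-this word inside the std::function, or nullptr.
    void* const* captured_this_address() const { return m_watchAddr; }
    // Same check the asker did in gdb: does the stored capture still point at this Client?
    bool capture_intact() const {
        return m_watchAddr != nullptr && *m_watchAddr == static_cast<const void*>(this);
    }
    // Fire the ping callback once, as the event loop would; returns the ping count.
    int fire_ping() { m_pingEvent->m_callback(m_pingEvent); return pings; }

    Event* m_pingEvent = nullptr;
    int pings = 0;
private:
    void* const* m_watchAddr = nullptr;
};
```

The value captured_this_address() returns is the raw address the asker ended up watching by hand in Edit 4, so it can be logged at connect time and fed to watch without walking _M_functor in the debugger.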
How do I automatically add a watch in gdb when a function is called so I can debug some memory corruption?
Memory corruption is often detected after the real corruption has already occurred by some modules loaded within your process. So manual debugging may not be very useful for real complex projects, because any third-party modules/library which is loaded within your process may also lead to this problem. From your post it looks like this problem is not reproducible always, which indicates that this might be related to a threading/synchronization problem which leads to some sort of memory corruption. So based on my experience I strongly suggest that you concentrate on reproducing the problem under dynamic tools (Valgrind/Helgrind).
However, as you have mentioned in your question that you are able to attach your program using Valgrind, you may want to attach your program (a.out) in case you have not done it this way.
$ valgrind --tool=memcheck --db-attach=yes ./a.out
This way Valgrind would automatically attach your program in the debugger when your first memory error is detected so that you can do live debugging(GDB). This seems to be the best possible way to find out the root cause of your problem.
However, I think that there may be some data racing scenario which is leading to memory corruption. So you may want to use Helgrind to check/find the data racing/threading problem which might be leading to this problem.
For more information on these, you may refer to the following posts:
https://stackoverflow.com/a/22658693/2724703
https://stackoverflow.com/a/22617989/2724703

I get thread sleep error in C++11 threading

I created a thread using C++11 thread class and I want the thread to sleep in a loop.
When the this_thread::sleep_for() function is called, I get exception saying:
Run-Time Check Failure #2 - Stack around the variable '_Now' was corrupted.
My code is below:
std::chrono::milliseconds duration( 5000 );
while (m_connected)
{
this->CheckConnection();
std::this_thread::sleep_for(duration);
}
I presume _Now is a local variable somewhere deep in implementation of sleep_for. If it gets corrupt, either there is bug in that function (unlikely) or some other part of your application is writing to dangling pointers (much more likely).
The most likely cause is that you, some time before calling the sleep_for, give out pointer to local variable that stays around and is written to by other thread while this thread sleeps.
If you were on Linux, I'd recommend you to try valgrind (though I am not certain it can catch invalid access to stack), but on Windows I don't know about any tool for debugging this kind of problems. You can do careful review and you can try disabling various parts of functionality to see when the problem goes away to narrow down where it might be.
I also used to use duma library with some success, but it can only catch invalid access to heap, not stack.
Note: Both clang and gcc are further in implementing C++11 than MSVC++, so if you don't use much Windows-specific stuff, it might be easy to port and try valgrind on it. Gcc and especially clang are also known for giving much better static diagnostics than MSVC++, so if you compile it with gcc or clang, you may get some warning that will point you to the problem.

g++ misunderstands c++ semantics

There are two possible solutions to the problem: I don't understand the c++ semantics or g++ does.
I am programming a simple network game now. I have been building a library the game uses to communicate over the network. There is a class designated to handle the connection between the apps. Another class implements server functionality, so it possesses a method accept(). The method is to return a Connection class.
There are a few ways to return the class. I have tried these three:
Connection accept() {
    ...
    return Connection(...);
}
Connection* accept() {
    ...
    return new Connection(...);
}
Connection& accept() {
    ...
    Connection *temp = new Connection(...);
    return *temp;
}
All three were accepted by g++. The problem is that the third is somewhat faulty. When you use internal information of the object of type Connection, you will fail. I don't know what is wrong because all fields within the object look like they are initialized. My problem is that when I use any function from the protocol buffers library my program is terminated by Segmentation fault. The function below fails every time it calls the protobuf library.
Annoucement Connection::receive() throw(EmptySocket) {
    if (raw_input->GetErrno() != 0) throw EmptySocket();
    CodedInputStream coded_input(raw_input);
    google::protobuf::uint32 n;
    coded_input.ReadVarint32(&n);
    char *b;
    int m;
    coded_input.GetDirectBufferPointer((const void**)&b, &m);
    Annoucement ann;
    ann.ParseFromArray(b, n);
    coded_input.Skip(n);
    return ann;
}
I get this every time:
Program received signal SIGSEGV, Segmentation fault.
0x08062106 in google::protobuf::io::FileInputStream::CopyingFileInputStream::GetErrno (this=0x20) at /usr/include/google/protobuf/io/zero_copy_stream_impl.h:104
When I changed the accept() to the second version, it finally worked (the first is good too but I modified the conception in the meanwhile).
Have you come across any problem that is similar to this one? Why is the third version of accept() wrong? How should I debug the program to find such a horrible bug (I thought protobuf needed some fix whereas the problem was not there)?
First, returning by reference something allocated on the heap is a sure recipe for a memory leak so I would never suggest actually doing that.
The second case can still result in a leak unless the ownership semantics are very well specified. Have you considered using a smart pointer instead of a raw pointer?
As for why it doesn't work, it probably has to do with ownership semantics and not because you're returning by reference, but I can't see a problem in the posted code.
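An added example (not the asker's or the answerer's code) that makes the ownership point of this answer observable: a stand-in Connection counts live objects, and the three return styles from the question are run side by side with the smart-pointer form the answer suggests. The Connection here is hypothetical and holds only a descriptor number.

```cpp
#include <memory>

// Hypothetical stand-in for the page's Connection class; it only counts lifetimes.
struct Connection {
    static int live;                      // objects currently alive
    explicit Connection(int fd) : fd(fd) { ++live; }
    Connection(const Connection& o) : fd(o.fd) { ++live; }
    ~Connection() { --live; }
    int fd;
};
int Connection::live = 0;

// Variant 1 from the question: return by value. The caller owns the object;
// it is destroyed when the caller's variable goes out of scope.
Connection accept_by_value(int fd) { return Connection(fd); }

// Variant 3 from the question: a reference to a heap object. Nothing ever
// deletes it, and the signature gives the caller no hint that it should.
Connection& accept_by_ref(int fd) {
    Connection* temp = new Connection(fd);
    return *temp;
}

// The smart-pointer form the answer suggests: ownership is explicit and the
// object is freed when the unique_ptr goes out of scope.
std::unique_ptr<Connection> accept_unique(int fd) {
    return std::unique_ptr<Connection>(new Connection(fd));
}

// Each helper runs one variant inside a scope and reports how many Connection
// objects are still alive afterwards (0 means nothing leaked).
int live_after_by_value() {
    { Connection c = accept_by_value(3); (void)c; }
    return Connection::live;
}
int live_after_by_ref() {
    { Connection& c = accept_by_ref(4); (void)c; }   // *temp is leaked here
    return Connection::live;
}
int live_after_unique() {
    { std::unique_ptr<Connection> c = accept_unique(5); (void)c; }
    return Connection::live;
}
```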
"How should I debug the program to find such a horrible bug?"
If you are on Linux try running under valgrind - that should pick up any memory scribbling going on.
You overlooked raw_input=0x20 which is obviously an invalid pointer. This is in the helpful message you got in the debugger after the segfault.
For general problems of this type, learn to use Valgrind’s memcheck, which gives you messages about where your program abused memory.
Meanwhile I suggest you make sure you understand pass by value vs pass by reference (both pointer and C++ reference) and know when constructors, copy constructors and destructors are called.

How to deal with failure to destroy mutex in destructor

Say you have the following destructor in a mutex class wrapping up pthread mutex calls:
~mutex()
{
pthread_mutex_destroy(&m_mutex);
}
If this fails (returns non-zero) we can't throw an exception obviously. How best do we deal with this?
Write an error message and call abort(). Hard, visible failure is often preferable to continuing blithely on when the impossible appears to have happened.
I don't think there's a lot you can do other than ignore it (possibly logging a message, especially if you get EBUSY since this could indicate a serious logic error in your program).
You may take a look at boost::threads: if you are building release, the return code will not be checked, and if you are building a debug version, abort() will be called with an error message printed; BOOST_VERIFY is used for this.
In my opinion, the only sane recourse in such a case is assert(3) - something went horribly wrong, so somebody has to investigate ...
I suggest a run-time assertion. If it fails, you are in posix's land of undefined behavior.
The fact that it's inside a destructor is irrelevant. The problem is that you cannot recover if destruction fails (except ignoring it). It's always the case, no matter what language you use.
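An added example (not code from any of the answers) of a pthread mutex wrapper whose destructor applies the policy the answers converge on: never throw, log anything unexpected, and fail hard on EBUSY or EINVAL, since those mean the program's locking logic is broken. The enum and helper names are my own; the abort path is the one an answer above recommends over continuing blithely.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <pthread.h>

// What the wrapper does when pthread_mutex_destroy() fails.
enum DestroyOutcome { DESTROY_OK, DESTROY_LOGGED, DESTROY_FATAL };

// Policy from the answers: EBUSY (still locked or waited on) and EINVAL (never
// initialised or already destroyed) are logic errors, so fail hard and visibly;
// anything else is logged and otherwise ignored, because a destructor cannot throw.
DestroyOutcome classify_destroy_result(int rc) {
    if (rc == 0) return DESTROY_OK;
    if (rc == EBUSY || rc == EINVAL) return DESTROY_FATAL;
    return DESTROY_LOGGED;
}

class mutex {
public:
    mutex() { pthread_mutex_init(&m_mutex, nullptr); }
    mutex(const mutex&) = delete;             // copying a pthread mutex is undefined
    mutex& operator=(const mutex&) = delete;
    void lock() { pthread_mutex_lock(&m_mutex); }
    void unlock() { pthread_mutex_unlock(&m_mutex); }
    ~mutex() {
        int rc = pthread_mutex_destroy(&m_mutex);
        switch (classify_destroy_result(rc)) {
        case DESTROY_OK:
            break;
        case DESTROY_LOGGED:
            std::fprintf(stderr, "mutex: destroy failed: %s\n", std::strerror(rc));
            break;
        case DESTROY_FATAL:
            std::fprintf(stderr, "mutex: destroyed while in use: %s\n", std::strerror(rc));
            std::abort();   // hard, visible failure instead of silently continuing
        }
    }
private:
    pthread_mutex_t m_mutex;
};

// Correct lifecycle on a raw pthread mutex: init, lock/unlock, destroy. Returns
// the classified result of the destroy call so the OK path can be checked.
DestroyOutcome lifecycle_outcome() {
    pthread_mutex_t raw;
    pthread_mutex_init(&raw, nullptr);
    pthread_mutex_lock(&raw);
    pthread_mutex_unlock(&raw);
    return classify_destroy_result(pthread_mutex_destroy(&raw));
}
```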

C++: Stack is not unwound on exception thrown

Hello,
I have a quite strange problem in one of my C++ projects:
I wrote a C++ Socket wrapper, that tries to connect to a given host and port (via IPv4/TCP) and throws a SocketException (derived from std::runtime_error), if an error occurs (e.g. 'Connection refused'). The exception is caught properly and an error message is written to console as expected, but apparently the destructor of my Socket class is not called (it should output a message to std::cerr, too, but the message only appears if connection works and Socket is destroyed later on if it goes out of stack, e.g. on end of the function that tries to utilize the socket). The destructor should close the encapsulated socket, but on exception thrown the socket remains open (you can see it with lsof as socket of unknown type), so no code in the destructor seems to be executed at all.
As I couldn't reproduce this problem with a simple testcase, my guess is that it somehow has to do with the quite complex structure of my project: I have a core application containing the code for the Socket class and providing a Singleton class which offers methods that implement the protocol used for communication and return the results of a request, each call to one of these methods generates its own instance of a Socket and provides it with the necessary information about host and port to use. To simplify socket generation and management, a std::auto_ptr is used, which should delete the Socket if method has finished and stack is cleaned up, which works properly according to console output, but it should work the same way on an exception thrown, at least that is what was my opinion until now.
The core is able to load plugins in shared object format by dlopen and gets a pointer to the plugin's class instance via an extern C declared function in the shared object.
This instance now uses the Singleton provided by the core to communicate with the server and show retrieved data.
My question is: are there limitations to stack unwinding when using shared objects, or where should I look for the thing I missed out to make this work properly?
If your exception is thrown from the constructor, the destructor will not be called.
Ok, forget that one. Another look in the code showed that there was the possibility that an exception could have been thrown already in constructor so that the destructor would not have been called, as it's described in C++ standard. Not throwing in the constructor solved the problem. That's what programming in Java is doing to your C++ skills ^^
Excuse the noise, please.
If you are programming on linux, you might be triggering a problem where the exception thrown from a shared library is not caught properly (problem with exception type determining). This problem is explained here and here, and I am sure you could google up more pages explaining the same problem.
If that is a problem, I am still looking for a solution :(
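An added example (not the asker's code) of the cause the asker found: when a constructor throws, the object never existed, so its destructor is not run and anything the constructor already acquired leaks. The fix shown keeps the descriptor in a small RAII member, because fully constructed members are destroyed even when the enclosing constructor's body throws. Socket calls are simulated with counters rather than real sockets; all names are hypothetical.

```cpp
#include <stdexcept>
#include <string>

struct SocketException : std::runtime_error {
    explicit SocketException(const std::string& m) : std::runtime_error(m) {}
};

// Simulated descriptor table: how many "sockets" are currently open.
static int g_open_fds = 0;
static int g_dtor_runs = 0;   // how many Socket destructors ran

static int  fake_socket()      { ++g_open_fds; return 3; }   // always succeeds
static void fake_close(int)    { --g_open_fds; }
static void fake_connect(int)  { throw SocketException("connect: Connection refused"); }

// The situation from the question: socket() succeeds, then connect() throws
// inside the constructor. No LeakySocket object ever exists, so ~LeakySocket
// never runs and the descriptor stays open (the "socket of unknown type" in lsof).
class LeakySocket {
public:
    LeakySocket() : fd_(fake_socket()) { fake_connect(fd_); }
    ~LeakySocket() { ++g_dtor_runs; fake_close(fd_); }
private:
    int fd_;
};

// Fix: own the descriptor with a member whose own destructor closes it. Members
// constructed before the body throws are destroyed during unwinding, even though
// the enclosing SafeSocket destructor still does not run.
class SafeSocket {
    struct Fd {
        int value;
        Fd() : value(fake_socket()) {}
        ~Fd() { fake_close(value); }
    };
public:
    SafeSocket() { fake_connect(fd_.value); }   // fd_ is already fully constructed
    ~SafeSocket() { ++g_dtor_runs; }
private:
    Fd fd_;
};

// Each helper lets the constructor throw, catches it, and reports the number of
// descriptors still open afterwards (0 means the fd was released).
int open_fds_after_leaky() {
    try { LeakySocket s; } catch (const SocketException&) {}
    return g_open_fds;
}
int open_fds_after_safe() {
    try { SafeSocket s; } catch (const SocketException&) {}
    return g_open_fds;
}
```

The alternative without a helper member is a try/catch inside the constructor body that closes the descriptor and rethrows; the RAII member does the same without a second place to forget.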