Socket send() hangs when in CLOSE_WAIT state - c++

I have a C++ server application, written using the POCO framework. The server application is acting as an HTTP server in this case. There is a client application, which I don't control and cannot debug, that is causing a problem in the server. The client requests a large file, which is returned as the HTTP response. While the file is being returned, the client closes the connection. I see the socket move to the CLOSE_WAIT state, indicating that the client has sent a FIN. The trouble is that in my application the send() function then hangs, so one of my HTTP threads is effectively lost, and once all the threads enter this state the server becomes unresponsive.
The send code is inside the POCO framework, but looks like this:
do
{
if (_sockfd == POCO_INVALID_SOCKET) throw InvalidSocketException();
rc = ::send(_sockfd, reinterpret_cast<const char*>(buffer), length, flags);
}
while (_blocking && rc < 0 && lastError() == POCO_EINTR);
if (rc < 0) error();
return rc;
(flags are 0 in calls to this function). I tried to detect this state by adding the following code:
char c;
int r;
int rc;
do
{
// Check if FIN received
while ((r = recv(_sockfd, &c, 1, MSG_DONTWAIT)) == 1) {}
if (r == 0) { ::close(_sockfd); _sockfd = POCO_INVALID_SOCKET; } // FIN received
if (_sockfd == POCO_INVALID_SOCKET) throw InvalidSocketException();
rc = ::send(_sockfd, reinterpret_cast<const char*>(buffer), length, flags);
}
while (_blocking && rc < 0 && lastError() == POCO_EINTR);
if (rc < 0) error();
return rc;
This appears to make things better, but it does not solve the problem. The server no longer hangs as quickly, but I end up with many more CLOSE_WAIT sockets, so I think I have partially solved the thread hanging issue but have still not tidied up correctly after the broken socket. With this change in place the problem happens less often, but it still happens, so I think the key is understanding why send() hangs.
I'm testing this code on Linux.

To cleanly close a socket:
Call shutdown with SD_SEND (on Linux and other POSIX systems the equivalent constant is SHUT_WR).
Keep reading from the socket until read returns zero or a fatal error.
Close the socket.
Do not attempt to access the socket after you've closed it.
Your code has two major issues. It doesn't ensure that close is always called on the socket no matter what happens, and it can access the socket after it has closed it. The former is causing your CLOSE_WAIT problem. The latter is a huge security hole.
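The close sequence listed above can be sketched as follows for Linux. This is a minimal illustration, not POCO code: the helper name graceful_close is hypothetical, and it assumes a blocking socket whose peer will eventually close its side (production code would probably also set a receive timeout so the drain loop cannot wait forever).

```cpp
#include <cassert>
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not part of POCO): half-close, drain, then close.
// Returns the result of close(): 0 on success, -1 on error.
int graceful_close(int fd)
{
    assert(fd >= 0);

    // Step 1: announce that we will send nothing more (this sends our FIN).
    // A failure here (e.g. ENOTCONN) is ignored; we still drain and close.
    ::shutdown(fd, SHUT_WR);

    // Step 2: read and discard until EOF (0) or a fatal error (-1).
    char discard[4096];
    for (;;) {
        ssize_t n = ::recv(fd, discard, sizeof discard, 0);
        if (n > 0) continue;                     // throw away pending data
        if (n < 0 && errno == EINTR) continue;   // interrupted, try again
        break;                                   // EOF or fatal error
    }

    // Step 3: close exactly once. The caller must not use fd afterwards.
    return ::close(fd);
}
```

Calling a helper like this from every exit path (including error paths) is one way to make sure close is always reached, which is what stops sockets from piling up in CLOSE_WAIT.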

Related

C++ - AF_UNIX socket hangs

I'm currently trying to get a client/daemon communication via an AF_UNIX socket up and running.
At the moment the client successfully sends a message, the daemon receives and processes it and then should send the message back.
Well, that's where the problem is. As soon as the daemon tries to send the message back nothing happens, the client hangs, trying to read a message, and if I kill the client the daemon dies with it.
Following is the daemon code:
//successful call to accept, I have a file descriptor now...
int c = 0;
while((c = recv(fd, (char*)&buf[0], bufferSize, 0)))
{
if(c == -1 || c == 0)
break;
tmp.append(buf.begin(), buf.begin()+c);
}
writeLog(tmp);
tmp = evaluateMsg(tmp);
writeLog(tmp);
//I assume this send call is hanging
if(send(fd, tmp.c_str(), tmp.size(), 0) < 0)
writeLog("Could not write message back!");
close(fd);
And this is the client code:
//connect(); is successful
//send(); as well - the recv(); call is hanging forever
while((c = recv(sockfd, (char*)&buf[0], 1024, 0)))
{
if(c == -1)
{
cout<<"Error";
break;
}
else if(c == 0)
break;
tmp.append(buf.begin(), buf.begin()+c);
}
Please note that the code is heavily cut down for the sake of simplicity and readability (especially the code to daemonize and create the actual AF_UNIX socket (which are both successful)).
UPDATE:
I could verify that the client-side recv() call is never returning, which means that the daemon-side send() call is hanging. Why?
I don't see any reason why the daemon-side recv() loop would end. Why would recv() return 0 or -1 if the socket is still open?
You need a way to tell, at the application level, when the client has finished sending its data (the content itself should make this clear). At that point, leave the recv() loop and continue to the send() part of the server.
Alright, the solution was pretty simple.
@selalerer was right about the return value of recv(), which leads to this working code snippet:
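while((c = recv(fd, (char*)&buf[0], bufferSize, 0)))
{
if(c == -1)
{
/* handle error */
break;
}
tmp.append(buf.begin(), buf.begin()+c);
// heuristic: a short read is treated as the end of the message;
// a stream socket does not guarantee this
if(c < bufferSize)
//no more to read, therefore stop reading
break;
}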
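A more robust way to follow the advice about application-level message boundaries is to frame each message explicitly. The sketch below assumes a newline terminates every message; that convention, and the helper name recv_line, are illustrative assumptions rather than anything from the original code.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: reads bytes until '\n' (assumed end-of-message marker).
// On success returns true and stores the message, without the '\n', in out.
// Returns false on a fatal error or if the peer closes before the marker.
bool recv_line(int fd, std::string& out)
{
    assert(fd >= 0);
    out.clear();
    char ch;
    for (;;) {
        ssize_t n = ::recv(fd, &ch, 1, 0);
        if (n == 1) {
            if (ch == '\n') return true;   // complete message received
            out.push_back(ch);
        } else if (n == 0) {
            return false;                  // peer closed the connection
        } else if (errno != EINTR) {
            return false;                  // fatal error
        }
        // EINTR: fall through and retry
    }
}
```

Reading one byte per call is slow but never consumes bytes that belong to the next message; a buffered reader or a length prefix would be the usual refinements.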

How to catch a "connection reset by peer" error in C socket?

I have a C++ and Qt application, part of which implements a C socket client. Some time ago my app crashed because something happened with the server; the only thing I got from that crash was a message in Qt Creator's Application Output stating
recv_from_client: Connection reset by peer
I did some research on the web about this "connection reset by peer" error, and while some threads here on SO and elsewhere did manage to explain what is going on, none of them explains how to handle it; that is, how I can "catch" the error and continue my application without a crash (in particular, the method where I read from the server is inside a while loop, so I'd like to stop the while loop and jump to another part of my code that will try to re-establish the connection).
So how can I catch this error to handle it appropriately? Don't forget that my code is actually C++ with Qt - the C part is a library which calls the socket methods.
EDIT
Btw, the probable method from which the crash originated (given the "recv_from_client" part of the error message above) was:
int hal_socket_read_from_client(socket_t *obj, u_int8_t *buffer, int size)
{
struct s_socket_private * const socket_obj = (struct s_socket_private *)obj;
int retval = recv(socket_obj->client_fd, buffer, size, MSG_DONTWAIT); //last = 0
if (retval < 0)
perror("recv_from_client");
return retval;
}
Note: I'm not sure if by the time this error occurred, the recv configuration was with MSG_DONTWAIT or with 0.
Just examine errno when read() returns a negative result.
There is normally no crash involved.
while (...) {
ssize_t amt = read(sock, buf, size);
if (amt > 0) {
// success
} else if (amt == 0) {
// remote shutdown (EOF)
} else {
// error
// Interrupted by signal, try again
if (errno == EINTR)
continue;
// This is fatal... you have to close the socket and reconnect
// handle errno == ECONNRESET here
// If you use non-blocking sockets, you also have to handle
// EWOULDBLOCK / EAGAIN here
return;
}
}
It isn't an exception or a signal. You can't catch it. Instead, you get an error which tells you that the connection has been reset when you try to work on that socket.
int rc = recv(fd, ..., ..., ...);
if (rc == -1)
{
if (errno == ECONNRESET)
{
/* handle it; there isn't much to do, though.*/
}
else
perror("Error while reading");
}
As I've written, there isn't much you can do. If you're using some I/O multiplexer, you may want to remove that file descriptor from further monitoring.

standard C++ TCP socket, connect fails with EINTR when using std::async

I am having trouble using the std::async to have tasks execute in parallel when the task involves a socket.
My program is a simple TCP socket server written in standard C++ for Linux. When a client connects, a dedicated port is opened and separate thread is started, so each client is serviced in their own thread.
The client objects are contained in a map.
I have a function to broadcast a message to all clients. I originally wrote it like below:
// ConnectedClient is an object representing a single client
// ConnectedClient::SendMessageToClient opens a socket, connects, writes, reads response and then closes socket
// broadcastMessage is the std::string to go out to all clients
// iterate through the map of clients
map<string, ConnectedClient*>::iterator nextClient;
for ( nextClient = mConnectedClients.begin(); nextClient != mConnectedClients.end(); ++nextClient )
{
printf("%s\n", nextClient->second->SendMessageToClient(broadcastMessage).c_str());
}
I have tested this and it works with 3 clients at a time. The message gets to all three clients (one at a time), and the response string is printed out three times in this loop. However, it is slow, because the message only goes out to one client at a time.
In order to make it more efficient, I was hoping to take advantage of std::async to call the SendMessageToClient function for every client asynchronously. I rewrote the code above like this:
vector<future<string>> futures;
// iterate through the map of clients
map<string, ConnectedClient*>::iterator nextClient;
for ( nextClient = mConnectedClients.begin(); nextClient != mConnectedClients.end(); ++nextClient )
{
printf("start send\n");
futures.push_back(async(launch::async, &ConnectedClient::SendMessageToClient, nextClient->second, broadcastMessage, wait));
printf("end send\n");
}
vector<future<string>>::iterator nextFuture;
for( nextFuture = futures.begin(); nextFuture != futures.end(); ++nextFuture )
{
printf("start wait\n");
nextFuture->wait();
printf("end wait\n");
printf("%s\n", nextFuture->get().c_str());
}
The code above functions as expected when there is only one client in the map. That is, you see "start send" quickly followed by "end send", quickly followed by "start wait"; then, 3 seconds later (I have a three-second sleep on the client response side to test this), you see the trace from the socket read function showing that the response has come in, and then you see "end wait".
The problem occurs when there is more than one client in the map. The part of the SendMessageToClient function that opens and connects the socket fails at the point identified below:
// connected client object has a pipe open back to the client for sending messages
int clientSocketFileDescriptor;
clientSocketFileDescriptor = socket(AF_INET, SOCK_STREAM, 0);
// set the socket timeouts
// this part using setsockopt is omitted for brevity
// host name
struct hostent *server;
server = gethostbyname(mIpAddressOfClient.c_str());
if (server == 0)
{
close(clientSocketFileDescriptor);
return "";
}
//
struct sockaddr_in clientsListeningServerAddress;
memset(&clientsListeningServerAddress, 0, sizeof(struct sockaddr_in));
clientsListeningServerAddress.sin_family = AF_INET;
bcopy((char*)server->h_addr, (char*)&clientsListeningServerAddress.sin_addr.s_addr, server->h_length);
clientsListeningServerAddress.sin_port = htons(mPortNumberClientIsListeningOn);
// The connect function fails !!!
if ( connect(clientSocketFileDescriptor, (struct sockaddr *)&clientsListeningServerAddress, sizeof(clientsListeningServerAddress)) < 0 )
{
// print out error code
printf("Connected client thread: fail to connect %d \n", errno);
close(clientSocketFileDescriptor);
return response;
}
The output reads: "Connected client thread: fail to connect 4".
I looked this error code up, it is explained thus:
#define EINTR 4 /* Interrupted system call */
I searched around on the internet, all I found were some references to system calls being interrupted by signals.
Does anyone know why this works when I call my send message function one at a time, but it fails when the send message function is called using async? Does anyone have a different suggestion how I should send a message to multiple clients?
First, I would try to deal with the EINTR issue. connect() has been interrupted (this is the meaning of EINTR) and does not try again because you are using an async descriptor.
What I usually do in such a circumstance is retry: I wrap the function (connect in this case) in a while loop. If connect succeeds, I break out of the loop. If it fails, I check the value of errno, and if it is EINTR, I try again.
Mind that there are other values of errno that deserve a retry (EWOULDBLOCK is one of them).
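A sketch of EINTR handling for connect() on Linux follows. It deliberately differs from a plain retry loop: POSIX specifies that an interrupted connect() keeps establishing the connection in the background, and calling connect() again may then fail with EALREADY or EISCONN. The sketch therefore waits for the socket to become writable and reads the outcome from SO_ERROR. The helper name connect_retry_eintr is hypothetical, and a blocking socket is assumed.

```cpp
#include <cassert>
#include <cerrno>
#include <poll.h>
#include <sys/socket.h>

// Hypothetical helper: connect() that tolerates interruption by a signal.
// Returns 0 on success, -1 on failure with errno set.
int connect_retry_eintr(int fd, const sockaddr* addr, socklen_t len)
{
    assert(fd >= 0);
    if (::connect(fd, addr, len) == 0) return 0;
    if (errno != EINTR) return -1;

    // Interrupted: the attempt continues asynchronously. Wait until the
    // socket is writable, restarting poll() if it is interrupted as well.
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLOUT;
    int rc;
    do {
        rc = ::poll(&pfd, 1, -1);
    } while (rc < 0 && errno == EINTR);
    if (rc < 0) return -1;

    // SO_ERROR holds the final result of the connection attempt.
    int err = 0;
    socklen_t errlen = sizeof err;
    if (::getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0) return -1;
    if (err != 0) { errno = err; return -1; }
    return 0;
}
```

In the question's code this would replace the direct connect() call inside SendMessageToClient; it does not explain which signal interrupts the call, only how to survive it.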

Socket can't accept connections when non-blocking?

EDIT: Messed up my pseudo-coding of the accept call, it now reflects what I'm actually doing.
I've got two sockets going. I'm trying to use send/recv between the two. When the listening socket is blocking, it can see the connection and receive it. When it's nonblocking, I put a busy wait in (just to debug this) and it times out, always with the error EWOULDBLOCK. Why would the listening socket not be able to see a connection that it could see when blocking?
The code is mostly separated in functions, but here's some pseudo-code of what I'm doing.
int listener = -2;
int connector = -2;
int acceptedSocket = -2;
getaddrinfo(port 27015, AI_PASSIVE) results loop for listener socket
{
if (listener socket() == 0)
{
if (listener bind() == 0)
if (listener listen() == 0)
break;
listener close(); //if unsuccessful
}
}
SetBlocking(listener, false);
getaddrinfo("localhost", port 27015) results loop for connector socket
{
if (connector socket() == 0)
{
if (connector connect() == 0)
break; //if connect successful
connector close(); //if unsuccessful
}
}
loop for 1 second
{
acceptedSocket = listener accept();
if (acceptedSocket > 0)
break; //if successful
}
This just outputs a huge list of EWOULDBLOCK errno values before the timeout loop finally ends. If I print the accepted socket's file descriptor on each loop iteration, it is never assigned a valid descriptor.
The code for SetBlocking is as follows:
int SetBlocking(int sockfd, bool blocking)
{
int nonblock = !blocking;
return ioctl(sockfd,
FIONBIO,
reinterpret_cast<int>(&nonblock));
}
If I use a blocking socket, either by calling SetBlocking(listener, true) or removing the SetBlocking() call altogether, the connection works no problem.
Also, note that this connection with the same implementation works in Windows, Linux, and Solaris.
Because of the tight loop you are not letting the OS complete your request. That's the difference between VxWorks and others - you basically preempt your kernel.
Use select(2) or poll(2) to wait for the connection instead.
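As a rough illustration of that suggestion, the sketch below waits for the listening socket to become readable before calling accept(). It assumes a POSIX-like system where poll() is available (whether that holds on a given VxWorks version is not confirmed here; select() would work the same way). The helper name accept_with_timeout is hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <poll.h>
#include <sys/socket.h>

// Hypothetical helper: waits up to timeout_ms for a pending connection on a
// listening socket, then accepts it. Returns the accepted descriptor, or -1
// on error or timeout (errno is set to ETIMEDOUT in the timeout case).
int accept_with_timeout(int listener, int timeout_ms)
{
    assert(listener >= 0);
    pollfd pfd{};
    pfd.fd = listener;
    pfd.events = POLLIN;   // a listener is "readable" once a connection is queued

    int rc;
    do {
        rc = ::poll(&pfd, 1, timeout_ms);   // sleeps instead of spinning
    } while (rc < 0 && errno == EINTR);     // note: restarts the full timeout

    if (rc == 0) { errno = ETIMEDOUT; return -1; }
    if (rc < 0) return -1;
    return ::accept(listener, nullptr, nullptr);
}
```

Because the calling thread sleeps inside poll(), the kernel gets time to complete the handshake, which the tight accept() loop never allowed.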

Properly writing to a nonblocking socket in C++

I'm having a strange problem while attempting to transform a blocking socket server into a nonblocking one. Though the message was only received once when being sent with blocking sockets, using nonblocking sockets the message seems to be received an infinite number of times.
Here is the code that was changed:
return ::write(client, message, size);
to
// Nonblocking socket code
int total_sent = 0, result = -1;
while( total_sent < size ) {
// Create a temporary set of flags for use with the select function
fd_set working_set;
memcpy(&working_set, &master_set, sizeof(master_set));
// Check if data is available for the socket - wait 1 second for timeout
timeout.tv_sec = 1;
timeout.tv_usec = 0;
result = select(client + 1, NULL, &working_set, NULL, &timeout);
// We are able to write - do so
result = ::write(client, &message[total_sent], (size - total_sent));
if (result == -1) {
std::cerr << "An error has occurred while writing to the server."
<< std::endl;
return result;
}
total_sent += result;
}
return 0;
EDIT: The initialization of the master set looks like this:
// Private member variables in header file
fd_set master_set;
int sock;
...
// Creation of socket in class constructor
sock = ::socket(PF_INET, socket_type, 0);
// Makes the socket nonblocking
fcntl(sock,F_GETFL,0);
FD_ZERO(&master_set);
FD_SET(sock, &master_set);
...
// And then when accept is called on the socket
result = ::accept(sock, NULL, NULL);
if (result > 0) {
// A connection was made with a client - change the master file
// descriptor to note that
FD_SET(result, &master_set);
}
I have confirmed that in both cases, the code is only being called once for the offending message. Also, the client side code hasn't changed at all - does anyone have any recommendations?
fcntl(sock,F_GETFL,0);
How does that make the socket non-blocking?
fcntl(sock, F_SETFL, O_NONBLOCK);
Also, you are not checking if you can actually write to the socket non-blocking style with
FD_ISSET(client, &working_set);
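One detail worth noting: F_SETFL replaces the whole set of file status flags, so the usual idiom is to read the current flags with F_GETFL first and OR in O_NONBLOCK. A minimal sketch, with the hypothetical helper name set_nonblocking:

```cpp
#include <cassert>
#include <fcntl.h>

// Hypothetical helper: turns on O_NONBLOCK while preserving the other
// file status flags. Returns 0 on success, -1 on error (errno set by fcntl).
int set_nonblocking(int fd)
{
    assert(fd >= 0);
    int flags = ::fcntl(fd, F_GETFL, 0);   // read the current flags
    if (flags == -1) return -1;
    // Write them back with O_NONBLOCK added.
    return ::fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}
```

It would be called once on the listening socket and once on each descriptor returned by accept(), since on Linux accepted sockets do not inherit the flag.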
I do not believe that this code is really called only once in the "non-blocking" version (in quotes because, as Maister pointed out, it is not really non-blocking yet); check again. If the blocking and non-blocking versions are to be consistent, the non-blocking version should return total_sent (or size). With return 0, the caller is likely to believe that nothing was sent, which would cause infinite resending. Isn't that what's happening?
Also, your "non-blocking" code is quite strange. You seem to use select to make it blocking anyway. Granted, the timeout is 1 s, but why not make it truly non-blocking? That is, remove all the select code and check for the error case in write() where errno is EWOULDBLOCK. select and poll are for multiplexing.
You should also check select for errors and use FD_ISSET to check whether the socket is really ready. What if the 1 s timeout actually expires, or select is interrupted by a signal? And if an error occurs in write, you should report which error it was; that is much more useful than your generic message. But I guess this part of the code is still far from finished.
As far as I understand your code, it should probably look something like this (the details would change depending on whether the code runs in a single thread, is multithreaded, or forks when accepting a connection):
// Creation of socket in class constructor
sock = ::socket(PF_INET, socket_type, 0);
fcntl(sock, F_SETFL, O_NONBLOCK);
// And then when accept is called on the socket
result = ::accept(sock, NULL, NULL);
if (result > 0) {
// A connection was made with a client
client = result;
fcntl(client, F_SETFL, O_NONBLOCK);
}
// Nonblocking socket code
result = ::write(client, &message[total_sent], (size - total_sent));
if (result == -1) {
if (errno == EWOULDBLOCK){
// Socket buffer is full: nothing was written, try again later
return 0;
}
std::cerr << "An error has occurred while writing to the server."
<< std::endl;
return result;
}
// Number of bytes actually written; may be less than requested
return result;
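If the caller needs the whole message delivered before returning, a single write() is not enough on a non-blocking descriptor, because it may accept only part of the buffer. The sketch below loops until everything is written and waits with poll() only when the kernel buffer is full. The helper name write_all is hypothetical, and blocking the calling thread in poll() is an assumption that suits a simple server, not an event loop.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: writes all `size` bytes to a (possibly non-blocking)
// descriptor. Returns the number of bytes written (== size) on success,
// or -1 on a fatal error.
ssize_t write_all(int fd, const char* data, std::size_t size)
{
    assert(fd >= 0);
    std::size_t sent = 0;
    while (sent < size) {
        ssize_t n = ::write(fd, data + sent, size - sent);
        if (n > 0) {                       // partial or full write
            sent += static_cast<std::size_t>(n);
            continue;
        }
        if (n < 0 && errno == EINTR) continue;   // interrupted, retry
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            // Kernel buffer is full: sleep until the descriptor is writable.
            pollfd pfd{};
            pfd.fd = fd;
            pfd.events = POLLOUT;
            if (::poll(&pfd, 1, -1) < 0 && errno != EINTR) return -1;
            continue;
        }
        return -1;                         // fatal error (or unexpected 0)
    }
    return static_cast<ssize_t>(sent);
}
```

Returning the byte count rather than 0 keeps the result consistent with what a blocking ::write() call reports to the caller.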