Recv() call hangs after remote host terminates - C++

My problem is that I have a thread sitting in a recv() call. The remote host suddenly terminates (without a close() on the socket) and the recv() call continues to block. This is obviously not good, because when I am joining the threads to shut down the process (locally), this thread will never exit, since it is waiting on a recv that will never come.
So my question is what method do people generally consider to be the best way to deal with this issue? There are some additional things of note that should be known before answering:
There is no way for me to ensure that the remote host closes the socket prior to exit.
This solution cannot use external libraries (such as boost). It must use standard libraries/features of C++/C (preferably not C++0x specific).
I know this has likely been asked in the past, but I'd like to get someone's take on how to correct this issue properly (without doing something super hacky, which I would have done in the past).
Thanks!

Assuming you want to continue to use blocking sockets, you can use the SO_RCVTIMEO socket option:
SO_RCVTIMEO and SO_SNDTIMEO
Specify the receiving or sending timeouts until reporting an error. The parameter is a struct timeval. If an input or output function blocks for this period of time, and data has been sent or received, the return value of that function will be the amount of data transferred; if no data has been transferred and the timeout has been reached, then -1 is returned with errno set to EAGAIN or EWOULDBLOCK just as if the socket was specified to be nonblocking. If the timeout is set to zero (the default) then the operation will never time out.
So, before you begin receiving:
struct timeval timeout = { timo_sec, timo_usec };
int r = setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof(timeout));
assert(r == 0); /* or something more user friendly */
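With the option set, the receive loop might look like this (a sketch; shutting_down is a hypothetical flag your shutdown code sets before joining the thread):
for (;;) {
    char buf[4096];
    ssize_t n = recv(s, buf, sizeof(buf), 0);
    if (n > 0) { /* ... process n bytes ... */ continue; }
    if (n == 0) break;                        /* remote host closed cleanly */
    if (errno == EAGAIN || errno == EWOULDBLOCK) {
        if (shutting_down) break;             /* timed out: re-check the flag */
        continue;                             /* otherwise keep waiting */
    }
    break;                                    /* any other error is fatal */
}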
If you are willing to use non-blocking I/O, then you can use poll(), select(), epoll(), kqueue(), or whatever the appropriate event-dispatching mechanism is for your system. The reason you need non-blocking I/O is that you need the call to recv() to return and notify you when there is no data in the socket's input queue. The example is a little more involved:
for (;;) {
    ssize_t bytes = recv(s, buf, sizeof(buf), MSG_DONTWAIT);
    if (bytes > 0) { /* ... */ continue; }
    if (bytes < 0) {
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            struct pollfd p = { s, POLLIN, 0 };
            int r = poll(&p, 1, timo_msec);
            if (r == 1) continue;
            if (r == 0) {
                /* ...handle timeout */
                /* either continue or break, depending on policy */
            }
        }
        /* ...handle errors */
        break;
    }
    /* connection is closed */
    break;
}

You can use TCP keep-alive probes to detect if the remote host is still reachable. When keep-alive is enabled, the OS will send probes if the connection has been idle for too long; if the remote host doesn't respond to the probes, then the connection is closed.
On Linux, you can enable keep-alive probes by setting the SO_KEEPALIVE socket option, and you can configure the parameters of the keep-alive with the TCP_KEEPCNT, TCP_KEEPIDLE, and TCP_KEEPINTVL socket options. See tcp(7) and socket(7) for more info on those.
Windows also uses the SO_KEEPALIVE socket option for enabling keep-alive probes, but for configuring the keep-alive parameters, use the SIO_KEEPALIVE_VALS ioctl.
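For example, on Linux the setup might look like this (a sketch; the values are illustrative and error checking is omitted):
int on = 1;
setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));     /* enable probes */

int idle  = 60;   /* seconds of idle time before the first probe */
int intvl = 10;   /* seconds between unanswered probes */
int cnt   = 5;    /* probes without a reply before the connection is dropped */
setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE,  &idle,  sizeof(idle));
setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl));
setsockopt(s, IPPROTO_TCP, TCP_KEEPCNT,   &cnt,   sizeof(cnt));
/* A recv() blocked on this socket will then fail once the probes go unanswered. */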

You could use select()
From http://linux.die.net/man/2/select
int select(int nfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);
select() blocks until the first event (read ready, write ready, or exception) on one or more file descriptors or a timeout occurs.
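A sketch of guarding recv() with select() and a timeout (error handling trimmed; s is the connected socket):
fd_set readfds;
struct timeval tv;

FD_ZERO(&readfds);
FD_SET(s, &readfds);
tv.tv_sec = 5;                        /* illustrative 5-second timeout */
tv.tv_usec = 0;

int r = select(s + 1, &readfds, NULL, NULL, &tv);
if (r == 0) {
    /* timeout: decide whether to retry or shut the thread down */
} else if (r < 0) {
    /* select() error */
} else {
    ssize_t n = recv(s, buf, sizeof(buf), 0);   /* will not block now */
    /* n == 0 means the remote host closed the connection */
}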

Socket options and select() are probably the ideal choices. An additional option you should consider as a backup is to send your process a signal (for example using the alarm() call). This forces any syscall in progress to return with an error and errno set to EINTR, provided the signal handler is installed without SA_RESTART.
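A sketch of the signal approach; note the handler must be installed without SA_RESTART, or the kernel will transparently restart the recv():
#include <signal.h>
#include <unistd.h>
#include <errno.h>

static void on_alarm(int signo) { (void)signo; /* exists only to interrupt */ }

struct sigaction sa;
sa.sa_handler = on_alarm;
sigemptyset(&sa.sa_mask);
sa.sa_flags = 0;                     /* deliberately no SA_RESTART */
sigaction(SIGALRM, &sa, NULL);

alarm(5);                            /* illustrative 5-second deadline */
ssize_t n = recv(s, buf, sizeof(buf), 0);
if (n < 0 && errno == EINTR) {
    /* recv() was interrupted: check your shutdown flag here */
}
alarm(0);                            /* cancel the alarm if recv() returned early */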

Related

Can I write to a closed socket and forcefully correct the broken pipe error?

I have an application that runs on a large number of processors. On processor 0, I have a function that writes data to a socket if it is open. This function runs in a loop in a separate thread on processor 0, i.e. processor 0 is responsible for its own workload and has an extra thread running the communication on the socket.
//This function runs in a loop, called every 1.5 seconds
void T_main_loop(const int& client_socket_id, bool* exit_flag)
{
    //Check that the socket is still connected.
    int error_code;
    socklen_t error_code_size = sizeof(error_code);
    getsockopt(client_socket_id, SOL_SOCKET, SO_ERROR, &error_code, &error_code_size);
    if (error_code == 0)
    {
        //send some data
        int valsend = send(client_socket_id, data, size_of_data, 0);
    }
    else
    {
        *(exit_flag) = false; //This is used for some external logic.
        //Can I fix the broken pipe here somehow?
    }
}
When the client socket is closed, the program should just ignore the error, and this is standard behavior as far as I am aware.
However, I am using an external library (PETSc) that is somehow detecting the broken pipe error and closing the entire parallel (MPI) environment:
[0]PETSC ERROR: Caught signal number 13 Broken Pipe: Likely while reading or writing to a socket
I would like to leave the configuration of this library completely untouched if at all possible. Open to any robust workarounds that are possible.
By default, the OS sends the thread SIGPIPE if it tries to write into a (half) closed pipe or socket.
One option to disable the signal is to do signal(SIGPIPE, SIG_IGN);.
Another option is to use the MSG_NOSIGNAL flag with send, e.g. send(..., MSG_NOSIGNAL);.
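A sketch of both options (POSIX; note MSG_NOSIGNAL is not available everywhere, and on macOS the SO_NOSIGPIPE socket option is the analogue):
#include <signal.h>

/* Option 1: ignore SIGPIPE process-wide, once at startup */
signal(SIGPIPE, SIG_IGN);

/* Option 2: suppress the signal per call; send() then fails with EPIPE */
int valsend = send(client_socket_id, data, size_of_data, MSG_NOSIGNAL);
if (valsend < 0 && errno == EPIPE) {
    /* peer is gone: set the exit flag instead of dying on SIGPIPE */
}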

UnrealEngine4: Recv function would keep blocking when TCP server shutdown

I use a blocking FSocket on the client side, connected to a TCP server. If there's no message from the server, the socket thread blocks in FSocket::Recv(); if the TCP server shuts down, the socket thread is still blocked in that function. But with a blocking socket from the BSD socket API, the thread would return from recv() with an errno when the TCP server shuts down. So is this a defect in FSocket?
uint32 HRecvThread::Run()
{
    uint8* recv_buf = new uint8[RECV_BUF_SIZE];
    uint8* const recv_buf_head = recv_buf;
    int readLenSeq = 0;
    while (Started)
    {
        //if (TcpClient->Connected() && ClientSocket->GetConnectionState() != SCS_Connected)
        //{
        //    // server disconnected
        //    TcpClient->SetConnected(false);
        //    break;
        //}
        int32 bytesRead = 0;
        //because this is a blocking socket, the thread blocks in Recv if there is no message
        ClientSocket->Recv(recv_buf, readLenSeq, bytesRead);
        .....
        //some logic of resolution for tcp msg bytes
        .....
    }
    delete[] recv_buf;
    return 0;
}
As I suspected, you are ignoring the return code, which presumably indicates success or failure, so you are looping indefinitely (not blocking) on an error or end-of-stream condition.
NB You should allocate the recv_buf on the stack, not dynamically. Don't use the heap when you don't have to.
There is a similar question on the forums in the UE4 C++ Programming section. Here is the discussion:
https://forums.unrealengine.com/showthread.php?111552-Recv-function-would-keep-blocking-when-TCP-server-shutdown
Long story short, in the UE4 Source, they ignore EWOULDBLOCK as an error. The code comments state that they do not view it as an error.
Also, there are several helper functions you should be using when opening the port and when polling the port (I assume you are polling, since you are using blocking calls):
FSocket::Connect returns a bool, so make sure to check that return value.
FSocket::GetLastError returns the UE4-translated error code if an error occurred on the socket.
FSocket::HasPendingData will return a value that informs you whether it is safe to read from the socket.
FSocket::HasPendingConnection can check to see your connection state.
FSocket::GetConnectionState will tell you your active connection state.
Using these helper functions for error checking before making a call to FSocket::Recv will help you make sure you are in a good state before trying to read data. Also, it was noted in the forum posts that using the non-blocking code worked as expected. So, if you do not have a specific reason to use blocking code, just use the non-blocking implementation.
Also, as a final hint, using FSocket::Wait will block until your socket is in a desirable state of your choosing with a timeout, i.e. is readable or has data.
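For instance, a sketch of a Wait-guarded receive loop (assuming the UE4 FSocket API of the time; check the signatures in your engine version):
while (Started)
{
    // Block for at most one second, then loop to re-check the Started flag
    if (ClientSocket->Wait(ESocketWaitConditions::WaitForRead, FTimespan::FromSeconds(1)))
    {
        int32 bytesRead = 0;
        if (!ClientSocket->Recv(recv_buf, RECV_BUF_SIZE, bytesRead) || bytesRead <= 0)
        {
            break;   // Recv failed or the server closed the connection
        }
        // ... resolve the received bytes ...
    }
}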

How to Break C++ Accept Function?

When doing socket programming with multi-threading, if a thread is blocked on the accept function, and the main thread is trying to shut down the process, how do you break the accept function so that pthread_join can complete safely?
I have a vague memory of doing this by having the process connect to its own listening port in order to break the accept function.
Any solution will be appreciated.
Cheers
Some choices:
a) Use non-blocking
b) Use AcceptEx() to wait on an extra signal (Windows)
c) Close the listening socket from another thread to make Accept() return with an error/exception.
d) Open a temporary local connection from another thread to make Accept() return with the temp connection
The typical approach to this is not to use accept() unless there is something to accept! The way to do this is to poll() the corresponding socket with a suitable time-out in a loop. The loop checks if it is meant to exit because a suitably synchronized flag was set.
An alternative is to send the blocked thread a signal, e.g., using pthread_kill(). This gets out of the blocked accept() with a suitable error indication. Again, the next step is to check some flag to see if the thread is meant to exit. My preference is the first approach, though.
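A sketch of that first approach, assuming exit_requested is a flag your shutdown code sets under suitable synchronization:
#include <poll.h>
#include <errno.h>

while (!exit_requested) {
    struct pollfd p = { listen_fd, POLLIN, 0 };
    int r = poll(&p, 1, 500);                 /* wake every 500 ms to re-check */
    if (r < 0 && errno != EINTR)
        break;                                /* real error on the listener */
    if (r == 1 && (p.revents & POLLIN)) {
        int client_fd = accept(listen_fd, NULL, NULL);  /* will not block now */
        if (client_fd >= 0) {
            /* ... hand the connection off ... */
        }
    }
}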
Depending on your system, if it is available, I would use a select function to wait for the server socket to become readable, which indicates a socket is trying to connect. The amount of time to wait for a connection can be set to whatever you want (infinity, some number of seconds, or 0, which will just check and return). The return status needs to be checked to see whether the time limit was reached (no socket is trying to connect) or whether there is something waiting to be serviced (your server socket indicating there is a client which would like to connect). You can then execute the accept knowing there is a socket to connect, based on the returned status.
If available, I would use a select function with a timeout in a loop to achieve this functionality, as Glenn suggested.
The select function with a timeout value will wait for a socket to connect for a set period of time. If a socket attempts to connect, it can be accepted during that period. By looping this select with a timeout, it is possible to check for new connections until the break condition is met.
Here is an example:
std::atomic<bool> stopThread;

void theThread(std::atomic<bool>& quit)
{
    struct timeval tv;
    int activity;
    ...
    while (!quit)
    {
        // reset the time value for the select timeout (select may modify it)
        tv.tv_sec = 1;
        tv.tv_usec = 0;
        // rebuild the descriptor set each pass, since select modifies it too
        FD_ZERO(&readfds);
        FD_SET(master_socket, &readfds);
        ...
        // wait for activity on one of the sockets
        activity = select(max_sd + 1, &readfds, NULL, NULL, &tv);
        if ((activity < 0) && (errno != EINTR))
        {
            printf("select error");
        }
        if (FD_ISSET(master_socket, &readfds))
        {
            if ((new_socket = accept(master_socket, (struct sockaddr *)&address, (socklen_t*)&addrlen)) < 0)
            {
                perror("accept");
                exit(EXIT_FAILURE);
            }
            ...
        }
    }
}

int main(int argc, char** argv)
{
    ...
    stopThread = false;
    std::thread foo(theThread, std::ref(stopThread));
    ...
    stopThread = true;
    foo.join();
    return 0;
}
A more complete example of select: http://www.binarytides.com
I am pretty new to C++ so I am sure my code and answer can be improved.
Sounds like what you are looking for is this: you set a special flag variable known to the accepting thread, and then let the main thread open a connection to the listening socket. The accepting thread has to check the flag every time accept() returns a connection in order to know when to shut down.
Typically if you want to do multi-threaded networking, you would spawn a thread once a connection is made (or ready to be made). If you want to lower the overhead, a thread pool isn't too hard to implement.
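A sketch of the self-connect wake-up described above, assuming the listener is bound to listen_port on the loopback interface and should_exit is your shared flag:
/* main thread, at shutdown time: */
should_exit = true;                      /* the flag the accept loop checks */

int s = socket(AF_INET, SOCK_STREAM, 0);
struct sockaddr_in addr;
memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_port = htons(listen_port);
addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
connect(s, (struct sockaddr *)&addr, sizeof(addr));   /* unblocks accept() */
close(s);

/* accepting thread: after accept() returns, check should_exit and exit the loop */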

How to avoid DOS attack in this code?

I have code written in C/C++ that looks like this:
while(1)
{
    //Accept
    struct sockaddr_in client_addr;
    int client_fd = this->w_accept(&client_addr);
    char client_ip[64];
    int client_port = ntohs(client_addr.sin_port);
    inet_ntop(AF_INET, &client_addr.sin_addr, client_ip, sizeof(client_ip));

    //Listen first string
    char firststring[512];
    memset(firststring,0,512);
    if(this->recvtimeout(client_fd,firststring,sizeof(firststring),u->timeoutlogin) < 0){
        close(client_fd);
    }
    if(strcmp(firststring,"firststr")!=0)
    {
        cout << "Disconnected!" << endl;
        close(client_fd);
        continue;
    }

    //Send OK first string
    send(client_fd, "OK", 2, 0);

    //Listen second string
    char secondstring[512];
    memset(secondstring,0,512);
    if(this->recvtimeout(client_fd,secondstring,sizeof(secondstring),u->timeoutlogin) < 0){
        close(client_fd);
    }
    if(strcmp(secondstring,"secondstr")!=0)
    {
        cout << "Disconnected!!!" << endl;
        close(client_fd);
        continue;
    }

    //Send OK second string
    send(client_fd, "OK", 2, 0);
}
So, it's DoS-able.
I've written a very simple DoS script in Perl that takes down the server:
#Evildos.pl
use strict;
use Socket;
use IO::Handle;

sub dosfunction
{
    my $host = shift || '192.168.4.21';
    my $port = 1234;
    my $firststr = 'firststr';
    my $secondstr = 'secondstr';
    my $protocol = getprotobyname('tcp');
    $host = inet_aton($host) or die "$host: unknown host";
    socket(SOCK, AF_INET, SOCK_STREAM, $protocol) or die "socket() failed: $!";
    my $dest_addr = sockaddr_in($port,$host);
    connect(SOCK,$dest_addr) or die "connect() failed: $!";
    SOCK->autoflush(1);
    print SOCK $firststr;
    #sleep(1);
    print SOCK $secondstr;
    #sleep(1);
    close SOCK;
}

my $i;
for($i=0; $i<30; $i++)
{
    &dosfunction;
}
With a loop of just 30 iterations, the server goes down.
The question is: is there a method, a system, a solution that can avoid this type of attack?
EDIT: recvtimeout
int recvtimeout(int s, char *buf, int len, int timeout)
{
    fd_set fds;
    int n;
    struct timeval tv;

    // set up the file descriptor set
    FD_ZERO(&fds);
    FD_SET(s, &fds);

    // set up the struct timeval for the timeout
    tv.tv_sec = timeout;
    tv.tv_usec = 0;

    // wait until timeout or data received
    n = select(s+1, &fds, NULL, NULL, &tv);
    if (n == 0){
        return -2; // timeout!
    }
    if (n == -1){
        return -1; // error
    }

    // data must be here, so do a normal recv()
    return recv(s, buf, len, 0);
}
I don't think there is any 100% effective software solution to DOS attacks in general; no matter what you do, someone could always throw more packets at your network interface than it can handle.
In this particular case, though, it looks like your program can only handle one connection at a time -- that is, incoming connection #2 won't be processed until connection #1 has completed its transaction (or timed out). So that's an obvious choke point -- all an attacker has to do is connect to your server and then do nothing, and your server is effectively disabled for (however long your timeout period is).
To avoid that you would need to rewrite the server code to handle multiple TCP connections at once. You could either do that by switching to non-blocking I/O (by passing O_NONBLOCK flag to fcntl()), and using select() or poll() or etc to wait for I/O on multiple sockets at once, or by spawning multiple threads or sub-processes to handle incoming connections in parallel, or by using async I/O. (I personally prefer the first solution, but all can work to varying degrees). In the first approach it is also practical to do things like forcibly closing any existing sockets from a given IP address before accepting a new socket from that IP address, which means that any given attacking computer could only tie up a maximum of one socket on your server at a time, which would make it harder for that person to DOS your machine unless he had access to a number of client machines.
You might read this article for more discussion about handling many TCP connections at the same time.
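As a sketch of the multi-threaded variant mentioned above (one thread per accepted connection; error handling trimmed, and serve_client is a name made up for this example):
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>

static void *serve_client(void *arg)
{
    int client_fd = (int)(intptr_t)arg;
    /* ... run the two-string protocol with recvtimeout() as before ... */
    close(client_fd);   /* always release the descriptor */
    return NULL;
}

/* in the accept loop, instead of serving the client inline: */
pthread_t tid;
if (pthread_create(&tid, NULL, serve_client, (void *)(intptr_t)client_fd) == 0)
    pthread_detach(tid);   /* no join needed; clients are independent */
else
    close(client_fd);      /* could not spawn a worker: refuse */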
The main issue with DOS and DDOS attacks is that they play on your weakness: namely, the fact that you have limited memory, ports, and processing resources with which to provide the service. Even if you have near-infinite scalability using something like the Amazon farms, you'll probably want to limit it to avoid the bill going through the roof.
At the server level, your main worry should be to avoid a crash, by imposing self-preservation limits. You can for example set a maximum number of connections that you know you can handle and simply refuse any other.
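A sketch of such a limit in the accept loop (max_clients and active_clients are illustrative names, maintained by your own code):
int client_fd = accept(listen_fd, NULL, NULL);
if (client_fd >= 0) {
    if (active_clients >= max_clients) {
        close(client_fd);    /* over the self-preservation limit: refuse */
    } else {
        active_clients++;    /* remember to decrement when the client is done */
        /* ... hand the connection off ... */
    }
}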
Full strategies will include specialized equipment, like firewalls, but there is always a way to play them, and you will have to live with that.
For an example of a nasty attack, read about Slowloris on Wikipedia.
Slowloris tries to keep many connections to the target web server open and hold them open as long as possible. It accomplishes this by opening connections to the target web server and sending a partial request. Periodically, it will send subsequent HTTP headers, adding to—but never completing—the request. Affected servers will keep these connections open, filling their maximum concurrent connection pool, eventually denying additional connection attempts from clients.
There are many variants of DOS attacks, so a specific answer is quite difficult.
Your code leaks a file handle each time it succeeds; this will eventually make you run out of fds to allocate, and then accept() will fail.
close() the socket when you're done with it.
Also, to directly answer your question, there is no solution for DOS caused by faulty code other than correcting it.
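In the code above that means closing client_fd on every path through the loop; a sketch of the fixes:
//Listen first string (the same pattern applies to the second string)
if(this->recvtimeout(client_fd,firststring,sizeof(firststring),u->timeoutlogin) < 0){
    close(client_fd);
    continue;   // don't fall through and keep using a closed descriptor
}
...
//Send OK second string -- the success path must release the fd too
send(client_fd, "OK", 2, 0);
close(client_fd);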
This isn't a cure-all for DOS attacks, but using non-blocking sockets will definitely help with scalability. And if you can scale up, you can mitigate many DOS attacks. This design change includes setting both the listen socket used in accept calls and the client connection sockets to non-blocking.
Then, instead of blocking on a recv(), send(), or accept() call, you block on a poll, epoll, or select call, and then handle that event for that connection as much as you are able to. Use a reasonable timeout (e.g. 30 seconds) so that you can wake up from the polling call to sweep and close any connections that don't seem to be progressing through your protocol chain.
This basically requires every socket to have its own "connection" struct that keeps track of the state of that connection with respect to the protocol you implement. It likely also means keeping a (hash) table of all sockets so they can be mapped to their connection-struct instance. It also means "sends" are non-blocking as well; send and recv can return partial data amounts anyway.
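For instance, a minimal sketch of such a per-connection state record (the field names are illustrative, not from any particular library):
struct Connection {
    int fd;                  // the non-blocking socket
    int state;               // where this connection is in your protocol chain
    char buf[512];           // partially received data
    size_t buf_used;         // bytes accumulated so far
    time_t last_activity;    // timestamp for sweeping stalled connections
};
// e.g. keep a std::map<int, Connection*> keyed by fd to find the struct
// for each descriptor returned by poll()/select().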
You can look at an example of a non-blocking socket server on my project code here. (Look around line 360 for the start of the main loop in Run method).
An example of setting a socket into non-blocking state:
int SetNonBlocking(int sock)
{
    int result = -1;
    int flags = 0;

    flags = ::fcntl(sock, F_GETFL, 0);
    if (flags != -1)
    {
        flags |= O_NONBLOCK;
        result = ::fcntl(sock, F_SETFL, flags);
    }
    return result;
}
I would use boost::asio::async_connector from the boost::asio functionality to create multiple connection handlers (this works in both single- and multi-threaded environments). In the single-threaded case, you just need to run boost::asio::io_service::run from time to time in order to make sure communications have time to be processed.
The reason you want to use asio is that it is very good at handling asynchronous communication logic, so it won't block (as in your case) if a connection gets blocked. You can even arrange how much processing you want to devote to opening new connections while continuing to serve existing ones.

How to set a time out for receiving a message from a client on a server in non-blocking mode?

I have a server with 2 connection SOCKETs connected to clients, and I set this server to non-blocking mode so it doesn't block when sending or receiving messages. I want to set a timeout for the SOCKET of each connection, but if I use the following code:
string getMessage(SOCKET connectedSocket, int time){
    string error = R_ERROR;
    // Using select in winsock
    fd_set set;
    timeval tm;
    FD_ZERO(&set);
    FD_SET(connectedSocket, &set);
    tm.tv_sec = time;   // time
    tm.tv_usec = 0;     // 0 millis
    switch (select(connectedSocket, &set, 0, 0, &tm))
    {
    case 0:
        // timeout
        this->disconnect();
        break;
    case 1:
        // Can receive some data here
        return this->recvMessage();
        break;
    default:
        // error - handle appropriately.
        break;
    }
    return error;
}
My server is not non-blocking any more! I have to wait until the end of the 1st connection's timeout to get a message from the 2nd connection! That's not what I expect! So, is there any way to set a timeout in non-blocking mode? Or do I have to handle it myself?
select is a demultiplexing mechanism. While you are using it to determine when data is ready on a single socket or timeout, it was actually designed to return data ready status on many sockets (hence the fd_set). Conceptually, it is the same with poll, epoll and kqueue. Combined with non-blocking I/O, these mechanisms provide an application writer with the tools to implement a single threaded concurrent server.
In my opinion, your application does not need that kind of power. Your application will only be handling two connections, and you are already using one thread per connection. I believe leaving the socket in blocking I/O mode is more appropriate.
If you insist on non-blocking mode, my suggestion is to replace the select call with something else. Since what you want from select is an indication of read readiness or timeout for a single socket, you can achieve a similar effect with recv passed with appropriate parameters and with the appropriate timeout set on the socket.
tm.tv_sec = time;
tm.tv_usec = 0;
setsockopt(connectedSocket, SOL_SOCKET, SO_RCVTIMEO, (char *)&tm, sizeof(tm));

char c;
switch (recv(connectedSocket, &c, 1, MSG_PEEK|MSG_WAITALL)) {
case -1:
    if (errno == EAGAIN) {
        // handle timeout ...
    } else {
        // handle other error ...
    }
    break;
case 0: // FALLTHROUGH
default:
    // handle read ready ...
    break;
}
From man recv:
MSG_PEEK -- This flag causes the receive operation to return data from the beginning of the receive queue without removing that data from the queue. Thus, a subsequent receive call will return the same data.
MSG_WAITALL (since Linux 2.2) -- This flag requests that the operation block until the full request is satisfied. However, the call may still return less data than requested if a signal is caught, an error or disconnect occurs, or the next data to be received is of a different type than that returned.
As to why select behaves in the way you observed: while the select call is thread-safe, it is likely guarded against reentrancy, so one thread's call to select will only proceed after another thread's call completes (the calls to select are serialized). This is in line with its function as a demultiplexer: its purpose is to serve as a single arbiter for which connections are ready, and as such it wants to be controlled by a single thread.