Set connect timeout using setsockopt in Linux - C++

I am writing a Linux Qt5/C++ app that tries to connect to a peer using a QTcpSocket. I call
tcpsocket->connectToHost(address,port,options)
When the peer is available it works great and connects immediately. However, when the peer is not available: The first time I call the above, the connect waits 1 minute before I receive a SocketTimeoutError (5). Then, every subsequent call to connect might wait a second before I receive a ConnectionRefusedError (0), or might wait a full minute (depending on the system tested).
Is there a setsockopt I can use to reduce the time waiting for initial connect?
I should point out that I already set some socket options in order to be quickly notified of a lost connection (see below). Hopefully these aren't causing the one-minute initial connection delay:
int enableKeepAlive = 1;
setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &enableKeepAlive, sizeof(enableKeepAlive));
int maxIdle = 5; /* seconds */
setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &maxIdle, sizeof(maxIdle));
int count = 3; // send up to 3 keepalive packets out, then disconnect if no response
setsockopt(fd, SOL_TCP, TCP_KEEPCNT, &count, sizeof(count));
int interval = 2; // send a keepalive packet out every 2 seconds (after the 5 second idle period)
setsockopt(fd, SOL_TCP, TCP_KEEPINTVL, &interval, sizeof(interval));

Rather than rely on setsockopt(), why don't you instead set your socket to non-blocking mode and perform an asynchronous connect()? You'd then block on select(), poll() or whatever event demultiplexing mechanism you are using, setting the timeout to whatever you desire. Once the socket becomes writable, the connect attempt has completed; check SO_ERROR to see whether it actually succeeded.
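For illustration, a minimal sketch of that approach with plain POSIX sockets (the helper name connect_with_timeout() is made up for this example; in Qt itself you could alternatively call waitForConnected() with a millisecond timeout rather than dropping down to the raw descriptor):

#include <sys/socket.h>
#include <sys/select.h>
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>

// Returns 0 on success, -1 on error or timeout. The socket is left non-blocking.
int connect_with_timeout(int fd, const sockaddr* addr, socklen_t addrlen, int timeout_sec)
{
    // Switch to non-blocking mode so connect() returns immediately.
    int flags = fcntl(fd, F_GETFL, 0);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);

    if (connect(fd, addr, addrlen) == 0)
        return 0;                        // connected straight away (e.g. localhost)
    if (errno != EINPROGRESS)
        return -1;                       // immediate failure

    // Wait until the socket becomes writable or the timeout expires.
    fd_set wset;
    FD_ZERO(&wset);
    FD_SET(fd, &wset);
    timeval tv = { timeout_sec, 0 };
    int rc = select(fd + 1, nullptr, &wset, nullptr, &tv);
    if (rc <= 0)
        return -1;                       // 0 = timeout, -1 = select error

    // Writable only means the attempt finished; SO_ERROR holds the real result.
    int err = 0;
    socklen_t len = sizeof(err);
    getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);
    return err == 0 ? 0 : -1;
}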

Related

C++ decrease modbus_connect timeout

I'd like to try 10 immediate modbus connections. However, every time I fail to connect, I have to wait 2 minutes for the next connection because the previous modbus_connect call is still actively listening. So, if I fail to connect 10 times, I have to wait 20 minutes.
int max_retries = 10;
int retries = 0;
while ((modbus_connect(ctx) == -1) && retries < max_retries) {
    retries++;
    // wait 2 mins
    // I need to remove this waiting time
}
Can someone help me to reduce the time for timeout? I'm using Libmodbus v3.1.6
If you are talking about TCP connections, the behavior of your program may be correct.
There are several things involved here, because you say both that you establish the connection and that it "is actively listening". It can't be both.
If you are establishing the connection, the only thing I can think of is that the low-level connect (not modbus) will normally keep retransmitting SYN packets for more or less two minutes and, if there is no response, give up on the connection.
That may be one problem.
If you are listening, you have to set the SO_REUSEADDR socket option.
In any case, you should check errno and get the error description to know what is happening to your connection.
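For illustration, a minimal sketch of that last point, assuming only the standard libmodbus helpers modbus_connect() and modbus_strerror() (the retry wrapper and its name are made up for this example):

#include <cerrno>
#include <cstdio>
#include <modbus.h>

// Try the connection a bounded number of times, logging why each attempt failed.
int modbus_connect_with_retries(modbus_t* ctx, int max_retries)
{
    for (int attempt = 1; attempt <= max_retries; ++attempt) {
        if (modbus_connect(ctx) == 0)
            return 0;                                   // connected
        std::fprintf(stderr, "attempt %d failed: %s\n",
                     attempt, modbus_strerror(errno));
    }
    return -1;                                          // every attempt failed
}

Seeing whether errno is, say, ETIMEDOUT, ECONNREFUSED or EHOSTUNREACH tells you whether you are hitting the long SYN-retransmission timeout described above or an immediate refusal.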

C++ tcp socket connection retry method

After developing a sample client-server application that can exchange some data, I'm trying to implement a retry mechanism in it. Currently my application follows the protocol below:
Client connects to the server (non-blocking mode) with a 3-second timeout and 2 retries.
Start sending fixed-length data from the client. The send has some error checking for whether it sent the complete data or not.
Receive the response (timeout: 3 secs) from the server and verify it. If an incorrect response is received, re-send the data and wait for the response. Repeat this up to two times on failure.
For the above implementation, the code sections look something like below:
connect() and select() for opening connection
select() and send() for data send
select() and recv() for data receiving
Now I'm making the retries based on the return values of the socket functions, and if send() or recv() fails I'm retrying the same methods, but not re-calling connect().
I tested this by restarting the server in the middle of a data transfer; as a result the client fails to communicate with the server and quits after several retries. I believe this is happening because there is no connect() call in the retry methods.
Any suggestions?
Example code for receiving socket data
bool CTCPCommunication::ReceiveSocketData(char* pchBuff, int iBuffLen)
{
    bool bReturn = true;
    //check whether the socket is ready to receive
    fd_set stRead;
    FD_ZERO(&stRead);
    FD_SET(m_hSocket, &stRead);
    int iRet = select(0, &stRead, NULL, NULL, &m_stTimeout);
    //if socket is not ready this line will be hit after 3 sec timeout and go to the end
    //if it is ready control will go inside the read loop and reads data until data ends or
    //socket error is getting triggered continuously for more than 3 secs.
    if ((iRet > 0) && (FD_ISSET(m_hSocket, &stRead)))
    {
        DWORD dwStartTime = GetTickCount();
        DWORD dwCurrentTime = 0;
        while ((iBuffLen-1) > 0)
        {
            int iRcvLen = recv(m_hSocket, pchBuff, iBuffLen-1, 0);
            dwCurrentTime = GetTickCount();
            //receive failed due to socket error
            if (iRcvLen == SOCKET_ERROR)
            {
                if ((dwCurrentTime - dwStartTime) >= SOCK_TIMEOUT_SECONDS * 1000)
                {
                    WRITELOG("Call to socket API 'recv' failed after 3 secs continuous retries, error: %d", WSAGetLastError());
                    bReturn = false;
                    break;
                }
                //not timed out yet: retry the recv without adjusting the buffer pointer
                continue;
            }
            //connection closed by remote host
            else if (iRcvLen == 0)
            {
                WRITELOG("recv() returned zero - time to do something: %d", WSAGetLastError());
                break;
            }
            pchBuff += iRcvLen;
            iBuffLen -= iRcvLen;
        }
    }
    else
    {
        WRITELOG("Call to API 'select' failed inside 'ReceiveSocketData', error: %d", WSAGetLastError());
        bReturn = false;
    }
    return bReturn;
}
Currently my application follows the protocol below:
Client connects to the server (non-blocking mode) with a 3-second timeout and 2 retries.
You can't retry a connection. You have to close the socket whose connect attempt failed, create a new socket, and call connect() again.
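A minimal sketch of that close-and-recreate loop with plain blocking sockets (the helper name is illustrative, filling in the address is assumed to happen elsewhere, and on Winsock the shape is the same but with closesocket() instead of close()):

#include <sys/socket.h>
#include <unistd.h>

// Try to connect up to max_tries times, using a brand-new socket for each attempt.
// Returns the connected descriptor, or -1 if every attempt failed.
int retry_connect(const sockaddr* addr, socklen_t addrlen, int max_tries)
{
    for (int attempt = 0; attempt < max_tries; ++attempt) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
        if (connect(fd, addr, addrlen) == 0)
            return fd;                   // success: hand back the connected socket
        close(fd);                       // a failed connect leaves the socket unusable
    }
    return -1;
}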
Start sending fixed-length data from the client. The send has some error checking for whether it sent the complete data or not.
This isn't necessary in blocking mode: the POSIX standard guarantees that a blocking-mode send() will send all the data, or fail with an error.
Receive the response (timeout: 3 secs) from the server and verify it. If an incorrect response is received, re-send the data and wait for the response. Repeat this up to two times on failure.
This is a bad idea. Most probably all the data will arrive, including all the retries, or none of it. You need to make sure that your transactions are idempotent if you use this technique. You also need to pay close attention to the actual timeout period. 3 seconds is not adequate in general; a starting point is double the expected service time.
For the above implementation, the code sections look something like below:
connect() and select() for opening connection
select() and send() for data send
select() and recv() for data receiving
You don't need the select() in blocking mode. You can just set a read timeout with SO_RCVTIMEO.
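Since your code is Winsock-based, note that on Windows the SO_RCVTIMEO value is a DWORD of milliseconds rather than the struct timeval used on POSIX systems. A rough sketch, reusing m_hSocket and WRITELOG from your code above:

// Give blocking recv() calls a 3-second timeout instead of using select().
DWORD timeoutMs = 3000;
if (setsockopt(m_hSocket, SOL_SOCKET, SO_RCVTIMEO,
               reinterpret_cast<const char*>(&timeoutMs), sizeof(timeoutMs)) != 0)
{
    WRITELOG("setsockopt(SO_RCVTIMEO) failed, error: %d", WSAGetLastError());
}
// A recv() that waits longer than 3 seconds now fails with WSAETIMEDOUT
// instead of blocking indefinitely.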
Now I'm making the retries based on the return values of the socket functions, and if send() or recv() fails I'm retrying the same methods, but not re-calling connect().
I tested this by restarting the server in the middle of a data transfer; as a result the client fails to communicate with the server and quits after several retries. I believe this is happening because there is no connect() call in the retry methods.
If that were true you would get an error that said so.

Recv() call hangs after remote host terminates

My problem is that I have a thread that is in a recv() call. The remote host suddenly terminates (without a close() socket call) and the recv() call continues to block. This is obviously not good because when I am joining the threads to close the process (locally) this thread will never exit because it is waiting on a recv that will never come.
So my question is what method do people generally consider to be the best way to deal with this issue? There are some additional things of note that should be known before answering:
There is no way for me to ensure that the remote host closes the socket prior to exit.
This solution cannot use external libraries (such as boost). It must use standard libraries/features of C++/C (preferably not C++0x specific).
I know this has likely been asked in the past, but I'd like to get someone's take on how to correct this issue properly (without doing something super hacky, which I would have done in the past).
Thanks!
Assuming you want to continue to use blocking sockets, you can use the SO_RCVTIMEO socket option:
SO_RCVTIMEO and SO_SNDTIMEO
Specify the receiving or sending timeouts until reporting an error. The parameter is a struct timeval. If an input or output function blocks for this period of time, and data has been sent or received, the return value of that function will be the amount of data transferred; if no data has been transferred and the timeout has been reached then -1 is returned with errno set to EAGAIN or EWOULDBLOCK just as if the socket was specified to be nonblocking. If the timeout is set to zero (the default) then the operation will never timeout.
So, before you begin receiving:
struct timeval timeout = { timo_sec, timo_usec };
int r = setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &timeout, sizeof(timeout));
assert(r == 0); /* or something more user friendly */
If you are willing to use non-blocking I/O, then you can use poll(), select(), epoll(), kqueue(), or whatever the appropriate event dispatching mechanism is for your system. The reason you need to use non-blocking I/O is that you need to allow the system call to recv() to return to notify you that there is no data in the socket's input queue. The example to use is a little bit more involved:
for (;;) {
    ssize_t bytes = recv(s, buf, sizeof(buf), MSG_DONTWAIT);
    if (bytes > 0) { /* ... */ continue; }
    if (bytes < 0) {
        if (errno == EWOULDBLOCK) {
            struct pollfd p = { s, POLLIN, 0 };
            int r = poll(&p, 1, timo_msec);
            if (r == 1) continue;
            if (r == 0) {
                /* ...handle timeout */
                /* either continue or break, depending on policy */
            }
        }
        /* ...handle errors */
        break;
    }
    /* connection is closed */
    break;
}
You can use TCP keep-alive probes to detect if the remote host is still reachable. When keep-alive is enabled, the OS will send probes if the connection has been idle for too long; if the remote host doesn't respond to the probes, then the connection is closed.
On Linux, you can enable keep-alive probes by setting the SO_KEEPALIVE socket option, and you can configure the parameters of the keep-alive with the TCP_KEEPCNT, TCP_KEEPIDLE, and TCP_KEEPINTVL socket options. See tcp(7) and socket(7) for more info on those.
Windows also uses the SO_KEEPALIVE socket option for enabling keep-alive probes, but for configuring the keep-alive parameters, use the SIO_KEEPALIVE_VALS ioctl.
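For reference, a rough (untested) sketch of the Windows side; tcp_keepalive and SIO_KEEPALIVE_VALS come from <mstcpip.h>, and the timing values here are just example numbers:

#include <winsock2.h>
#include <mstcpip.h>

bool enableKeepAlive(SOCKET s)
{
    tcp_keepalive ka;
    ka.onoff = 1;                  // turn keep-alive probing on
    ka.keepalivetime = 5000;       // idle time before the first probe, in ms
    ka.keepaliveinterval = 2000;   // interval between unanswered probes, in ms

    DWORD bytesReturned = 0;
    return WSAIoctl(s, SIO_KEEPALIVE_VALS, &ka, sizeof(ka),
                    NULL, 0, &bytesReturned, NULL, NULL) == 0;
}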
You could use select()
From http://linux.die.net/man/2/select
int select(int nfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);
select() blocks until the first event (read ready, write ready, or exception) on one or more file descriptors or a timeout occurs.
sockopts and select are probably the ideal choices. An additional option that you should consider as a backup is to send your process a signal (for example using the alarm() call). This should force any syscall in progress to exit and set errno to EINTR.
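A hedged sketch of that alarm() fallback, assuming POSIX signals; note the handler must be installed without SA_RESTART, otherwise the kernel will simply restart the recv() instead of letting it fail with EINTR:

#include <signal.h>
#include <cerrno>
#include <unistd.h>
#include <sys/socket.h>

static void on_alarm(int) { /* nothing to do; the point is to interrupt recv() */ }

ssize_t recv_with_alarm(int fd, void* buf, size_t len, unsigned timeout_sec)
{
    struct sigaction sa;
    sa.sa_handler = on_alarm;        // deliberately no SA_RESTART
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, nullptr);

    alarm(timeout_sec);              // schedule the interruption
    ssize_t n = recv(fd, buf, len, 0);
    alarm(0);                        // cancel any pending alarm

    if (n < 0 && errno == EINTR)
    {
        // The alarm fired before any data arrived: treat it as a timeout.
    }
    return n;
}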

TCP connection accepted, but writing data causes it to use a stale connection

The server (192.168.1.5:3001), is running Linux 3.2, and is designed to only accept one connection at a time.
The client (192.168.1.18), is running Windows 7. The connection is a wireless connection. Both programs are written in C++.
It works great in 9 out of 10 connect/disconnect cycles. On roughly the tenth (it happens randomly), the server accepts the connection, but when it later actually writes to it (typically 30+ seconds later), according to Wireshark (see screenshot) it looks like it's writing to an old stale connection, on a port number that the client FINed a while ago but the server hasn't yet FINed. So the client and server connections seem to get out of sync - the client makes new connections, and the server tries writing to the previous one. Every subsequent connection attempt fails once it gets into this broken state. The broken state can be triggered by going beyond the maximum wireless range for half a minute (as before, 9 out of 10 times this works, but it sometimes causes the broken state).
Wireshark screenshot behind link
The red arrows in the screenshot indicate when the server started sending data (Len != 0), which is the point when the client rejects it and sends a RST to the server. The coloured dots down the right edge indicate a single colour for each of the client port numbers used. Note how one or two dots appear well after the rest of the dots of that colour were (and note the time column).
The problem looks like it's on the server's end, since if you kill the server process and restart, it resolves itself (until next time it occurs).
The code is hopefully not too out-of-the-ordinary. I set the queue size parameter in listen() to 0, which I think means it only allows one current connection and no pending connections (I tried 1 instead, but the problem was still there). None of the errors appear as trace prints where "// error" is shown in the code.
// Server code
mySocket = ::socket(AF_INET, SOCK_STREAM, 0);
if (mySocket == -1)
{
    // error
}
// Set non-blocking
const int saveFlags = ::fcntl(mySocket, F_GETFL, 0);
::fcntl(mySocket, F_SETFL, saveFlags | O_NONBLOCK);
// Bind to port
// Union to work around pointer aliasing issues.
union SocketAddress
{
    sockaddr myBase;
    sockaddr_in myIn4;
};
SocketAddress address;
::memset(reinterpret_cast<Tbyte*>(&address), 0, sizeof(address));
address.myIn4.sin_family = AF_INET;
address.myIn4.sin_port = htons(Port);
address.myIn4.sin_addr.s_addr = INADDR_ANY;
if (::bind(mySocket, &address.myBase, sizeof(address)) != 0)
{
    // error
}
if (::listen(mySocket, 0) != 0)
{
    // error
}
// main loop
{
    ...
    // Wait for a connection.
    fd_set readSet;
    FD_ZERO(&readSet);
    FD_SET(mySocket, &readSet);
    const int aResult = ::select(getdtablesize(), &readSet, NULL, NULL, NULL);
    if (aResult != 1)
    {
        continue;
    }
    // A connection is definitely waiting.
    const int fileDescriptor = ::accept(mySocket, NULL, NULL);
    if (fileDescriptor == -1)
    {
        // error
    }
    // Set non-blocking
    const int saveFlags = ::fcntl(fileDescriptor, F_GETFL, 0);
    ::fcntl(fileDescriptor, F_SETFL, saveFlags | O_NONBLOCK);
    ...
    // Do other things for 30+ seconds.
    ...
    const int bytesWritten = ::write(fileDescriptor, buffer, bufferSize);
    if (bytesWritten < 0)
    {
        // THIS FAILS!! (but succeeds the first ~9 times)
    }
    // Finished with the connection.
    ::shutdown(fileDescriptor, SHUT_RDWR);
    while (::close(fileDescriptor) == -1)
    {
        switch (errno)
        {
        case EINTR:
            // Break from the switch statement. Continue in the loop.
            break;
        case EIO:
        case EBADF:
        default:
            // error
            return;
        }
    }
}
So somewhere between the accept() call (assuming that is exactly the point when the SYN packet is sent), and the write() call, the client's port gets changed to the previously-used client port.
So the question is: how can it be that the server accepts a connection (and thus opens a file descriptor), and then sends data through a previous (now stale and dead) connection/file descriptor? Does it need some sort of option in a system call that's missing?
I'm submitting an answer to summarize what we've figured out in the comments, even though it's not a finished answer yet. It does cover the important points, I think.
You have a server that handles clients one at a time. It accepts a connection, prepares some data for the client, writes the data, and closes the connection. The trouble is that the preparing-the-data step sometimes takes longer than the client is willing to wait. While the server is busy preparing the data, the client gives up.
On the client side, when the socket is closed, a FIN is sent notifying the server that the client has no more data to send. The client's socket now goes into FIN_WAIT1 state.
The server receives the FIN and replies with an ACK. (ACKs are done by the kernel without any help from the userspace process.) The server socket goes into the CLOSE_WAIT state. The socket is now readable, but the server process doesn't notice because it's busy with its data-preparation phase.
The client receives the ACK of the FIN and goes into FIN_WAIT2 state. I don't know what's happening in userspace on the client since you haven't shown the client code, but I don't think it matters.
The server process is still preparing data for a client that has hung up. It's oblivious to everything else. Meanwhile, another client connects. The kernel completes the handshake. This new client will not be getting any attention from the server process for a while, but at the kernel level the second connection is now ESTABLISHED on both ends.
Eventually, the server's data preparation (for the first client) is complete. It attempts to write(). The server's kernel doesn't know that the first client is no longer willing to receive data because TCP doesn't communicate that information! So the write succeeds and the data is sent out (packet 10711 in your wireshark listing).
The client gets this packet and its kernel replies with RST because it knows what the server didn't know: the client socket has already been shut down for both reading and writing, probably closed, and maybe forgotten already.
In the wireshark trace it appears that the server only wanted to send 15 bytes of data to the client, so it probably completed the write() successfully. But the RST arrived quickly, before the server got a chance to do its shutdown() and close() which would have sent a FIN. Once the RST is received, the server won't send any more packets on that socket. The shutdown() and close() are now executed, but don't have any on-the-wire effect.
Now the server is finally ready to accept() the next client. It begins another slow preparation step, and it's falling further behind schedule because the second client has been waiting a while already. The problem will keep getting worse until the rate of client connections slows down to something the server can handle.
The fix will have to be for you to make the server process notice when a client hangs up during the preparation step, and immediately close the socket and move on to the next client. How you will do it depends on what the data preparation code actually looks like. If it's just a big CPU-bound loop, you have to find some place to insert a periodic check of the socket. Or create a child process to do the data preparation and writing, while the parent process just watches the socket - and if the client hangs up before the child exits, kill the child process. Other solutions are possible (like F_SETOWN to have a signal sent to the process when something happens on the socket).
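For the "periodic check" variant, something along these lines might work (a sketch only; POLLRDHUP is Linux-specific and may require _GNU_SOURCE, and the helper name is made up):

#include <poll.h>

// Returns true if the client has hung up (or errored) on this connection.
// Call this every so often from inside the data-preparation loop.
bool clientHungUp(int fileDescriptor)
{
    struct pollfd p;
    p.fd = fileDescriptor;
    p.events = POLLRDHUP;            // peer closed its end of the connection
    p.revents = 0;
    // Zero timeout: just peek at the current socket state without blocking.
    return poll(&p, 1, 0) > 0 && (p.revents & (POLLRDHUP | POLLERR | POLLHUP));
}

If it returns true, close the descriptor and go back to accept() instead of finishing the slow preparation step.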
Aha, success! It turns out the server was receiving the client's SYN, and the server's kernel was automatically completing the connection (replying with its own SYN-ACK) before accept() had been called. So there definitely is a listening queue, and having two connections waiting on the queue was half of the cause.
The other half of the cause was to do with information which was omitted from the question (I thought it was irrelevant because of the false assumption above). There was a primary connection port (call it A), and the secondary, troublesome connection port which this question is all about (call it B). The proper connection order is A establishes a connection (A1), then B attempts to establish a connection (which would become B1)... within a time frame of 200ms (I already doubled the timeout from 100ms which was written ages ago, so I thought I was being generous!). If it doesn't get a B connection within 200ms, then it drops A1. So then B1 establishes a connection with the server's kernel, waiting to be accepted. It only gets accepted on the next connection cycle when A2 establishes a connection, and the client also sends a B2 connection. The server accepts the A2 connection, then gets the first connection on the B queue, which is B1 (hasn't been accepted yet - the queue looked like B1, B2). That is why the server didn't send a FIN for B1 when the client had disconnected B1. So the two connections the server has are A2 and B1, which are obviously out of sync. It tries writing to B1, which is a dead connection, so it drops A2 and B1. Then the next pair are A3 and B2, which are also invalid pairs. They never recover from being out of sync until the server process is killed and the TCP connections are all reset.
So the solution was to just change a timeout for waiting on the B socket from 200ms to 5s. Such a simple fix that had me scratching my head for days (and fixed it within 24 hours of putting it on stackoverflow)! I also made it recover from stray B connections by adding socket B to the main select() call, and then accept()ing it and close()ing it immediately (which would only happen if the B connection took longer than 5s to establish). Thanks #AlanCurry for the suggestion of adding it to the select() and adding the puzzle piece about the listen() backlog parameter being a hint.

ioctlsocket or recv takes more time to execute in windows socket programming?

In socket programming, some data is sent to the server, and as soon as the server receives it, it sends an acknowledgement response message. The response is more than 1 byte, so I check for more than one byte while receiving, and here I am losing around 120-200 ms, which is a very big issue, as the client needs to send an ack back for this acknowledgement. I have sniffed the traffic and the data arrives at my IP at the same time the server sends it, but recv or ioctlsocket (used to check that more than 1 byte is ready to be read) takes time to read more than one byte. How can I resolve this? The code is as follows.
DWORD RecvCount = 0;
char szBuff1[2048];
bool stop = false;
while (!stop)
{
    ioctlsocket(*socket, FIONREAD, &RecvCount);
    if (RecvCount > 1)
        stop = true;
}
int Res = recv(*socket, szBuff1, RecvCount, 0);
You should disable the Nagle algorithm on Windows, as otherwise the socket will sit on your data until the buffer is full (or at least wait a couple of hundred milliseconds before sending it anyway).
You do this by setting the TCP_NODELAY socket option:
int flag = 1;
int result = setsockopt(m_Socket,IPPROTO_TCP,TCP_NODELAY,(char *) &flag,sizeof(int));