Should I implement my own TCP/IP socket timeouts?

The software I'm working on needs to be able to connect to many servers in a short period of time, using TCP/IP. The software runs under Win32. If a server does not respond, I want to be able to quickly continue with the next server in the list.
Sometimes when a remote server does not respond, I get a connection timeout error after roughly 20 seconds. Often the timeout comes sooner.
My problem is that these 20-second waits hurt the performance of my software, and I would like it to give up sooner (after, say, 5 seconds). I assume that the TCP/IP stack (?) in Windows automatically adjusts the timeout based on some parameters?
Is it sane to override this timeout in my application, and close the socket if I'm unable to connect within X seconds?
(It's probably irrelevant, but the app is built using C++ and uses I/O completion ports for asynchronous network communication)

If you use I/O completion ports and async operations, why do you need to wait for a connect to complete before continuing with the next server on the list? Use ConnectEx and pass in an OVERLAPPED structure. This way the individual server connect times do not add up: the total connect time is the maximum of the individual connect times, not their sum.

On Linux you can
int syncnt = 1;
int syncnt_sz = sizeof(syncnt);
setsockopt(sockfd, IPPROTO_TCP, TCP_SYNCNT, &syncnt, syncnt_sz);
to reduce (or increase) the number of SYN retries per connect per socket. Unfortunately, it's not portable to Windows.
As for your proposed solution: closing a socket while it is still in connecting state should be fine, and it's probably the easiest way. But since it sounds like you're already using asynchronous completions, can you simply try to open four connections at a time? If all four time out, at least it will only take 20 seconds instead of 80.
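To make the "give up after X seconds and close the socket" idea concrete, here is a minimal sketch. It is written against POSIX calls (fcntl, poll), which is an assumption of this example and not what the asker runs; on Win32 the same pattern would need the Winsock equivalents or ConnectEx, which are not shown. The name connect_with_timeout is made up for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: try to connect to `addr`, giving up after `timeout_ms`.
// Returns a connected (still non-blocking) descriptor, or -1 on failure/timeout.
int connect_with_timeout(const sockaddr_in& addr, int timeout_ms) {
    assert(timeout_ms >= 0);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    // Switch to non-blocking mode so connect() returns immediately.
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }

    int rc = connect(fd, reinterpret_cast<const sockaddr*>(&addr), sizeof(addr));
    if (rc < 0 && errno != EINPROGRESS) {
        close(fd);  // immediate failure, e.g. network unreachable
        return -1;
    }
    if (rc < 0) {
        // Handshake in progress: the socket turns writable when it finishes.
        pollfd pfd{};
        pfd.fd = fd;
        pfd.events = POLLOUT;
        int ready = poll(&pfd, 1, timeout_ms);
        int err = 0;
        socklen_t len = sizeof(err);
        if (ready <= 0 ||
            getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0) {
            close(fd);  // timed out, interrupted, or refused: abandon the attempt
            return -1;
        }
    }
    return fd;
}
```

Closing the descriptor while the handshake is still pending is what abandons the attempt, which matches the suggestion above. A production version would probably also retry poll() after EINTR.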

All configurable TCP/IP parameters for Windows are here
See TcpMaxConnectRetransmissions

You might consider trying to open many connections at once (each with its own socket), and then work with the one that responds first. The others can be closed.
You could do this with non-blocking open calls, or with blocking calls and threads. Then the lag waiting for a connection to open shouldn't be any more than is minimally necessary.
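A rough sketch of the non-blocking variant, assuming Linux (SOCK_NONBLOCK) and IPv4 addresses: start every connect at once, wait on all of them with poll(), keep the first socket that finishes its handshake and close the rest. The function name connect_first and the overall shape are illustrative assumptions, not something from the answer.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper: race connects to all `addrs`; return the first socket
// that connects within `timeout_ms`, or -1 if none does.
int connect_first(const std::vector<sockaddr_in>& addrs, int timeout_ms) {
    assert(timeout_ms >= 0);
    std::vector<pollfd> pending;
    for (const sockaddr_in& addr : addrs) {
        int fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
        if (fd < 0) continue;
        int rc = connect(fd, reinterpret_cast<const sockaddr*>(&addr), sizeof(addr));
        if (rc == 0 || errno == EINPROGRESS) {
            pending.push_back(pollfd{fd, POLLOUT, 0});
        } else {
            close(fd);  // failed immediately
        }
    }

    int winner = -1;
    const auto deadline =
        std::chrono::steady_clock::now() + std::chrono::milliseconds(timeout_ms);
    while (winner < 0 && !pending.empty()) {
        auto left = std::chrono::duration_cast<std::chrono::milliseconds>(
                        deadline - std::chrono::steady_clock::now()).count();
        if (left < 0) left = 0;
        int ready = poll(pending.data(), pending.size(), static_cast<int>(left));
        if (ready <= 0) break;  // deadline reached (or poll failed)

        std::vector<pollfd> still_pending;
        for (const pollfd& p : pending) {
            if (p.revents == 0) { still_pending.push_back(p); continue; }
            int err = 0;
            socklen_t len = sizeof(err);
            bool ok = getsockopt(p.fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 && err == 0;
            if (ok && winner < 0) winner = p.fd;  // first successful handshake wins
            else close(p.fd);                     // failed, or a later finisher
        }
        pending.swap(still_pending);
    }
    for (const pollfd& p : pending) close(p.fd);  // abandon the slow ones
    return winner;
}
```

With this shape the total wait is bounded by the single timeout rather than by the sum of the per-server timeouts.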

You have to be careful when you override the socket timeout. If you are too aggressive and attempt to connect to many servers very quickly, the Windows TCP/IP stack will assume your application is an internet worm and throttle it. If this happens, the performance of your application will become even worse.
The details of exactly when the throttling kicks in are not advertised, but the timeout you propose (5 seconds) should be OK, in my experience.
The details that are available about this can be found here

Related

How to send and receive data up to SO_SNDTIMEO and SO_RCVTIMEO without corrupting connection?

I am planning a man-in-the-middle network application for a TCP server that would relay data between the server and a client. It would behave as a regular client towards the server and as a server towards the remote client, without modifying any data. Optionally, it would be used to detect and measure how long the server or client is unable to receive data that is ready to be received while the connection is inactive.
I am planning to use the blocking send and recv functions. Before any data transfer I would call setsockopt to set SO_SNDTIMEO and SO_RCVTIMEO to about 10 - 20 milliseconds, assuming this forces blocking send and recv calls to return early so that data on another active connection can be routed. Running a thread per connection looks too expensive. I would not use async sockets here because I cannot find a guarantee that they will complete within a fraction of a second, especially when a large amount of data is being sent or received. Long data delays do not look good. I would use very small buffers here, but calling a function for each received byte looks like overkill.
My next assumption is that it is safe to call send or recv again after a previous call ended with a timeout and transferred less data than requested.
But I am confused by contradictory information available on MSDN.
send function
https://msdn.microsoft.com/en-us/library/windows/desktop/ms740149%28v=vs.85%29.aspx
If no error occurs, send returns the total number of bytes sent, which
can be less than the number requested to be sent in the len parameter.
SOL_SOCKET Socket Options
https://msdn.microsoft.com/en-us/library/windows/desktop/ms740532%28v=vs.85%29.aspx
SO_SNDTIMEO - The timeout, in milliseconds, for blocking send calls.
The default for this option is zero, which indicates that a send
operation will not time out. If a blocking send call times out, the
connection is in an indeterminate state and should be closed.
Are my assumptions correct that I can use these functions like this? Maybe there is a more effective way to do this?
Thanks for answers

While you MIGHT implement something along the lines you describe in your question, there are preferable alternatives on all major systems.
Namely:
kqueue on FreeBSD and family, and on Mac OS X.
epoll on Linux and related operating systems.
I/O completion ports on Windows.
These technologies let you process traffic on multiple sockets efficiently and reactively, without timeout logic or polling. They can all be considered successors of the ancient select() function in the socket API.
As for the quoted documentation for send() in your question, it is not really confusing or contradictory. Useful network protocols implement a mechanism to create "backpressure" for situations where a sender tries to send more data than the receiver (and/or the transport channel) can accommodate. So an application can only hand more data to send() if the network stack has buffer space ready for it.
If, for example, an application tries to send 3 KB of data and the TCP/IP stack only has room for 800 bytes, send() might succeed and report that it consumed 800 of the 3 KB offered.
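The usual way to cope with such partial sends is a loop that keeps offering the unsent remainder. The sketch below assumes a POSIX system (MSG_NOSIGNAL is Linux-specific) and a blocking socket; send_all is a name made up for this example.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: keep calling send() until the stack has accepted all
// `len` bytes. Returns true on success, false on any error other than EINTR.
bool send_all(int fd, const char* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    std::size_t sent = 0;
    while (sent < len) {
        // send() may accept fewer bytes than offered; n tells us how many.
        ssize_t n = send(fd, data + sent, len - sent, MSG_NOSIGNAL);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal, retry
            return false;
        }
        sent += static_cast<std::size_t>(n);
    }
    return true;
}
```

Note that with SO_SNDTIMEO set, a timed-out call would surface here as an error return, and on Windows the documentation quoted in the question says the connection should then be closed rather than retried.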
The basic approach to forwarding the data on a connection is: Do not read from the incoming socket until you know you can send that data to the outgoing socket. If you read greedily (and buffer on application layer), you deprive the communication channel of its backpressure mechanism.
So basically, the "send capability" should drive the receive actions.
As for using timeouts for this "middle man", there are 2 major scenarios:
You know the sending behavior of the sender application, i.e. whether it is guaranteed to send some data within your chosen receive timeout at any time. Some applications only send sporadically, and any chosen value for a receive timeout could be wrong. Even if the sender is supposed to send at a specific interval, your timeouts will cause trouble once someone debugs the sending application.
You want the "middle man" to work for unknown applications (which must not use encryption for the middle man to have a chance, of course). There, you cannot pick any "adequate" timeout value because you know nothing about the sending behavior of the applications involved.

As a previous poster has suggested, I strongly urge you to reconsider the design of your server so that it employs an asynchronous I/O strategy. This may very well require that you spend significant time learning about each operating system's preferred approach. It will be time well spent.
For anything other than a toy application, using blocking I/O in the manner that you suggest will not perform well. Even with short timeouts, it sounds to me as though you won't be able to service new connections until you have completed the work for the current connection. You may also find (with short timeouts) that you're burning more CPU time spinning waiting for work to do than actually doing work.
A previous poster wisely suggested taking a look at Windows I/O completion ports. Take a look at this article I wrote in 2007 for Dr. Dobb's. It's not perfect, but I try to do a decent job of explaining how you can design a simple server that uses a small thread pool to handle potentially large numbers of connections:
Windows I/O Completion Ports
http://www.drdobbs.com/cpp/multithreaded-asynchronous-io-io-comple/201202921
If you're on Linux/FreeBSD/MacOSX, take a look at libevent:
Libevent
http://libevent.org/
Finally, a good, practical book on writing TCP/IP servers and clients is "TCP/IP Sockets in C: Practical Guide for Programmers" by Michael J. Donahoo and Kenneth L. Calvert. You could also check out the W. Richard Stevens texts (which cover the topic completely for UNIX).
In summary, I think you should take some time to learn more about asynchronous socket I/O and the established, best-of-breed approaches for developing servers.
Feel free to private message me if you have questions down the road.

Intermittent Delays in C++ TCP Communication in Linux

I have a device which sends data every 20 milliseconds over TCP. My application connects to this device and starts the socket communication. It listens on a separate thread, reads the data as soon as it is ready and puts it aside, and another thread processes it. The device is directly connected to the computer via an Ethernet cable.
I see a strange problem and I am trying to understand the reason for it: almost once every minute, it takes approximately 50 milliseconds to receive a packet from the device. I do a blocking read with a one-second timeout that returns as soon as data is ready. Normally it takes approximately 20 ms, as I would expect, but as I said there are times when it takes 50 ms, even though this is very rare (1 in 3000). What I noticed is that the packets after a late packet arrive immediately, which makes me think that there is some delay in the network layer. I also examined the timestamps of the packets (which are assigned by the device), and they consistently increase by 20 ms.
Is it normal to see delays like that when the device is directly connected to the computer? Since it is TCP, there might be a lot of work under the hood (CRC checks, out-of-order packets, retransmissions, etc.). I would still like to find a way to prevent this delay rather than accept that it might happen.
Any insights will be greatly appreciated.

It's probably a result of Nagle's algorithm, which is turned on by default for TCP sockets.
Use setsockopt() to set the TCP_NODELAY flag on the socket that sends the data to turn it off.
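For reference, the setsockopt() call looks roughly like this on Linux. The wrapper name disable_nagle is made up for the example, and the option only helps when it is set on the sending side, which in the question is the device rather than the application.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Hypothetical wrapper: turn off Nagle's algorithm on a TCP socket so that
// small writes are sent immediately instead of being coalesced.
bool disable_nagle(int fd) {
    assert(fd >= 0);
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) == 0;
}
```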

Most efficient way to handle a client connection (socket programming)

In every tutorial and example I have seen on the internet for Linux/Unix socket programming, the server-side code always involves an infinite loop that checks for a client connection on every iteration.
Example:
http://www.thegeekstuff.com/2011/12/c-socket-programming/
http://tldp.org/LDP/LG/issue74/tougher.html#3.2
Is there a more efficient way to structure the server side code so that it does not involve an infinite loop, or code the infinite loop in a way that it will take up less system resource?

The infinite loop in those examples is already efficient. The call to accept() is a blocking call: the function does not return until a client connects to the server. Execution of the thread which called accept() is halted, and it does not take any processing power.
Think of accept() as a call to join(), or as a wait on a mutex/lock/semaphore.
Of course, there are many other ways to handle incoming connections, but those other ways deal with the blocking nature of accept(). This function is difficult to cancel, so there exist non-blocking alternatives which allow the server to perform other actions while waiting for an incoming connection. One such alternative is select(). Other alternatives are less portable, as they involve low-level operating system calls to signal the connection through a callback function, an event or any other asynchronous mechanism handled by the operating system...
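As a sketch of the select() alternative: wait on the listening socket with a timeout and only call accept() once a connection is pending, so the server can do other work in between. This assumes a POSIX system and an IPv4 listener; accept_with_timeout and its optional peer parameter are invented for the example.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>

// Hypothetical helper: wait up to `timeout_ms` for a pending connection on
// `listen_fd`. Returns the accepted descriptor, or -1 if none arrived in time.
int accept_with_timeout(int listen_fd, int timeout_ms, sockaddr_in* peer = nullptr) {
    assert(listen_fd >= 0 && listen_fd < FD_SETSIZE && timeout_ms >= 0);
    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(listen_fd, &readable);  // a listener is "readable" when accept() won't block
    timeval tv{timeout_ms / 1000, (timeout_ms % 1000) * 1000};

    int ready = select(listen_fd + 1, &readable, nullptr, nullptr, &tv);
    if (ready <= 0) return -1;  // timeout or error: go do other work

    socklen_t len = sizeof(sockaddr_in);
    return accept(listen_fd, reinterpret_cast<sockaddr*>(peer), peer ? &len : nullptr);
}
```

A server loop would call this repeatedly and do its housekeeping whenever it returns -1.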

For C++ you could look into boost.asio. You could also look into e.g. asynchronous I/O functions. There is also SIGIO.
Of course, even when using these asynchronous methods, your main program still needs to sit in a loop, or the program will exit.

The infinite loop is there to maintain the server's running state, so when a client connection is accepted, the server won't quit immediately afterwards, instead it'll go back to listening for another client connection.
The accept() call is a blocking one - that is to say, it waits until a client connects. It does this in an extremely efficient way, using essentially no CPU time (until a connection is made, of course), by relying on the operating system's network drivers, which trigger an event (or hardware interrupt) that wakes the waiting thread up.

Here's a good overview of what techniques are available - The C10K problem.

When you are implementing a server that listens for a possibly unlimited number of connections, there is imo no way around some sort of infinite loop. Usually this is not a problem at all, because when your socket is not marked as non-blocking, the call to accept() will block until a new connection arrives. Due to this blocking, no system resources are wasted.
Other libraries that provide an event-based system are ultimately implemented in the way described above.

In addition to what has already been posted, it's fairly easy to see what is going on with a debugger. You will be able to single-step through until you execute the accept() line, upon which the 'single-step' highlight will disappear and the app will run on - the next line is not reached. If you put a breakpoint on the next line, it will not fire until a client connects.

We need to follow best practice when writing client-server programs. The best guide I can recommend at this time is The C10K Problem. There are specific things we need to follow in this case. We can use select, poll or epoll. Each has its own advantages and disadvantages.
If you are running your code on a recent kernel version, then I would recommend going for epoll. See a sample program to understand epoll.
If you are using select, poll or epoll, you will be blocked until you get an event / trigger, so your server will not spin in the infinite loop consuming CPU time.
In my personal experience, epoll is the best way to go: the load on my server machine with 80k ACTIVE connections was much lower than with select and poll. The load average of my server machine was just 3.2 with 80k active connections :)
When testing with poll, my server's load average went up to 7.8 on reaching 30k active client connections :(.
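A minimal sketch of the epoll pattern described above, assuming Linux: register the descriptors once, then block in epoll_wait() until one of them has data. The helper names make_epoll_for and wait_readable are made up for this example, and a real server would handle many events per wakeup rather than one.

```cpp
#include <cassert>
#include <sys/epoll.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper: create an epoll instance watching `fds` for input.
// Returns the epoll descriptor, or -1 on failure.
int make_epoll_for(const std::vector<int>& fds) {
    int epfd = epoll_create1(0);
    if (epfd < 0) return -1;
    for (int fd : fds) {
        epoll_event ev{};
        ev.events = EPOLLIN;  // wake up when fd has data to read
        ev.data.fd = fd;      // remember which fd the event belongs to
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
            close(epfd);
            return -1;
        }
    }
    return epfd;
}

// Hypothetical helper: block until one watched fd is readable or `timeout_ms`
// passes. Returns that fd, or -1 on timeout or error.
int wait_readable(int epfd, int timeout_ms) {
    assert(epfd >= 0);
    epoll_event ev{};
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    return n == 1 ? ev.data.fd : -1;
}
```

Unlike select and poll, the set of watched descriptors lives in the kernel, so it is not passed in again on every wait.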

Best approach for writing a Linux Server in C (pthreads, select or fork?)

I have a very specific question about server programming in UNIX (Debian, kernel 2.6.32). My goal is to learn how to write a server which can handle a huge number of clients. My target is more than 30 000 concurrent clients (even though my colleague mentions that 500 000 are possible, which seems QUIIITEEE a huge amount :-)), but I really don't know what is possible, and that is why I ask here. So, my first question: how many simultaneous clients are possible? Clients can connect whenever they want, get in contact with other clients and form a group (1 group contains a maximum of 12 clients). They can chat with each other, so the TCP/IP packet size varies depending on the message sent.
Clients can also send mathematical formulas to the server. The server will solve them and broadcast the answer back to the group. This is quite a heavy operation.
My current approach is to start up the server, then use fork to create a daemon process. The daemon process binds the socket fd_listen and starts listening. It is a while (1) loop. I use accept() to get incoming calls.
Once a client connects, I create a pthread for that client which will run the communication. Clients get added to a group and share some memory (needed to keep the group running), but every client still runs on a different thread. Getting the access to the memory right was quite a hassle, but it works fine now.
At the beginning of the program I read the /proc/sys/kernel/threads-max file and create my threads according to that. The number of possible threads according to that file is around 5000, far from the number of clients I want to be able to serve.
Another approach I am considering is to use select() and create sets. But the access time to find a socket within a set is O(N). This can be quite long if I have more than a couple of thousand clients connected. Please correct me if I am wrong.
Well, i guess i need some ideas :-)
Groetjes
Markus
P.S. i tag it for C++ and C because it applies to both languages.

The best approach as of today is an event loop like libev or libevent.
In most cases you will find that one thread is more than enough, but even if it isn't, you can always have multiple threads with separate loops (at least with libev).
Libev[ent] uses the most efficient polling solution for each OS (and anything is more efficient than select or a thread per socket).

You'll run into a couple of limits:
fd_set size: This is changeable at compile time, but has quite a low limit by default. This affects select solutions.
Thread-per-socket will run out of steam far earlier - I suggest putting the long calculations in separate threads (with pooling if required), but otherwise a single-thread approach will probably scale.
To reach 500,000 you'll need a set of machines, and round-robin DNS, I suspect.
TCP ports shouldn't be a problem, as long as the server doesn't connect back to the clients. I always seem to forget this, and have to be reminded.
File descriptors themselves shouldn't be too much of a problem, I think, but getting them into your polling solution may be more difficult - certainly you don't want to be passing them in each time.

I think you can use the event model (epoll + worker thread pool) to solve this problem.
First, listen and accept in the main thread. When a client connects to the server, the main thread hands the client_fd to one worker thread, which adds it to its epoll list and then handles the requests from that client.
The number of worker threads can be configured to suit the problem, and it must be no more than 5000.

Ensuring data is being read with async_read

I am currently testing my network application in very low bandwidth environments. I currently have code that attempts to ensure that the connection is good by making sure I am still receiving information.
Traditionally I have done this by recording the timestamp in my ReadHandler function so that each time it gets called I know I have received data on the socket. With very low bandwidths this isn't sufficient because my ReadHandler is not getting called frequently enough.
I was toying around with the idea of writing my own completion condition function (right now I am using transfer_at_least(1)), thinking it would get called more frequently and I could record my timestamp there, but I was wondering whether there is some other, more standard way to go about this.

We had a similar issue in production: some of our connections may be idle for days, but we must detect if the remote is dead ASAP.
We solved it by enabling the SO_KEEPALIVE socket option (TCP keep-alive):
boost::asio::socket_base::keep_alive option(true);
mSocketTCP.set_option(option);
This had to be accompanied by a new startup script that writes sensible values to /proc/sys/net/ipv4/tcp_keepalive_*, which have very long timeouts by default (on Linux).
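As an alternative to changing the system-wide defaults, Linux also lets you tune keep-alive per socket. The sketch below is an assumption-laden example and not what the poster did: enable_keepalive is a made-up name, and with Boost.Asio the descriptor would presumably come from the socket's native_handle().

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Hypothetical helper (Linux-specific options): enable keep-alive on one socket.
// idle_s:     seconds of silence before the first probe is sent
// interval_s: seconds between unanswered probes
// probes:     unanswered probes before the connection is declared dead
bool enable_keepalive(int fd, int idle_s, int interval_s, int probes) {
    assert(fd >= 0);
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == 0 &&
           setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) == 0 &&
           setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof(interval_s)) == 0 &&
           setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) == 0;
}
```

With values such as 30, 5 and 3, a dead peer would be noticed after roughly 30 + 5 * 3 seconds of silence.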

You can use the read_some method to get partial reads, and deal with the book keeping. This is more efficient than transfer_at_least(1), but you still have to keep track of what is going on.
However, a cleaner approach is just to use a concurrent deadline_timer. If the timer goes off before you are finished, then the operation is taking too long, and you cancel whatever is going on. If not, just stop the timer and continue. Something like:
// io_service is assumed to be the boost::asio::io_service your sockets already use;
// deadline_timer has no default constructor.
boost::asio::deadline_timer t(io_service);
t.expires_from_now(boost::posix_time::seconds(20));
t.async_wait(bind(&Class::timed_out, this, _1));
// Do stuff.
if (!t.cancel()) {
    // cancel() returned 0: no wait was pending, so the timer already went off, abort
}
// And the timeout method
void Class::timed_out(error_code const& error)
{
    // operation_aborted means the timer was cancelled, not that it expired
    if (error == boost::asio::error::operation_aborted) return;
    // Deal with the timeout, close the socket, etc.
}

I don't know how to handle high network latency from within an application. Can you be sure whether it is network latency, or whether the peer server or peer application is busy and reacting slowly? Does it matter whether the network, the server or the application is at fault?
Even if you can detect network latency and find that it is high, what are you going to do?
You cannot improve the situation.
Consider another critical case which is a subset of what you're trying to handle: the network is down (e.g. you disconnect the cable from your machine). Since it is a subset of your problem, you want to handle it too.
Let's examine the effect of the network going down on an active TCP connection. How can you discover whether your active TCP connection is still alive? Calling send() will succeed, but that merely says that the message was queued in the kernel's outgoing TCP queue. The TCP stack will try to send it, but since no TCP ACK comes back, the TCP stack on your side will resend it again and again. You can see your message in the netstat output (Send-Q column).
I'm aware of the following ways to deal with it:
One standard way is TCP keep-alive, as proposed by @Cubby.
Another way is to implement your own keep-alive mechanism: send a Keep Alive req message, and the peer is obligated to send back a Keep Alive ack message.
If you don't receive the ack message within a predefined timeout, try to send the Keep Alive req N more times (e.g. N=2). If there is still no success, close the socket and open it again. If the peer server is not available you will not be able to open the connection, since the TCP 3-way handshake requires the peer to respond.
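The request/ack scheme with retries could be sketched like this on a POSIX system. Everything specific here is an assumption for illustration: the name peer_alive, the "PING" payload, and the simplification that any bytes received count as the ack. A real protocol would need to frame keep-alive messages so they can be told apart from ordinary data.

```cpp
#include <cassert>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: send a keep-alive request and wait `timeout_ms` for a
// reply, retrying up to `retries` more times. Returns true if the peer answered.
bool peer_alive(int fd, int timeout_ms, int retries) {
    assert(fd >= 0 && timeout_ms >= 0 && retries >= 0);
    static const char kRequest[] = "PING";  // hypothetical message format
    for (int attempt = 0; attempt <= retries; ++attempt) {
        if (send(fd, kRequest, sizeof(kRequest) - 1, MSG_NOSIGNAL) < 0) return false;

        pollfd pfd{};
        pfd.fd = fd;
        pfd.events = POLLIN;
        if (poll(&pfd, 1, timeout_ms) > 0) {
            char reply[16];
            // Any received bytes count as proof of life in this sketch;
            // recv() returning 0 means the peer closed the connection.
            return recv(fd, reply, sizeof(reply), 0) > 0;
        }
        // No reply within the timeout: fall through and try again.
    }
    return false;  // caller should close the socket and reconnect
}
```

With retries set to 2 this matches the "N more times (e.g. N=2)" scheme above: one initial request plus two repeats before giving up.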
