In the context of web services, I've seen the term "TCP connection churn" used. Specifically, Twitter's Finagle has ways to avoid it happening. How does it happen? What does it mean?
There might be multiple uses for this term, but I've always seen it used in cases where many TCP connections are being made in a very short space of time, causing performance issues on the client and potentially the server as well.
This often occurs when client code is written to automatically reconnect on a TCP failure of any sort. If this failure happens to be a connection failure before the connection is even made (or very early in the protocol exchange), the client can go into a near-busy loop, constantly making new connections. This can cause performance issues on the client side: firstly, there is a process in a very busy loop sucking up CPU cycles; secondly, each connection attempt consumes a client-side port number, and if this happens fast enough the port numbers can wrap around when they hit the maximum (as a port is only a 16-bit number, this certainly isn't impossible).
While writing robust code is a worthy aim, this simple "automatic retry" approach is a little too naive. You can see similar problems in other contexts - e.g. a parent process continually restarting a child process which immediately crashes. One common mechanism to avoid it is some sort of increasing back-off. So, when the first connection fails you immediately reconnect. If it fails again within a short time (e.g. 30 seconds) then you wait, say, 2 seconds before reconnecting. If it fails again within 30 seconds, you wait 4 seconds, and so on. Read the Wikipedia article on exponential backoff (or this blog post might be more appropriate for this application) for more background on this technique.
This approach has the advantage that it doesn't overwhelm the client or server, but it also means the client can still recover without manual intervention (which is especially crucial for software on an unattended server, for example, or in large clusters).
In cases where recovery time is critical, simple rate-limiting of TCP connection creation is also quite possible - perhaps no more than 1 per second or so. If there are many clients per server, however, this more simplistic approach can still leave the servers swamped by the load of accepting and then closing connections at a high rate.
One thing to note if you plan to employ exponential backoff - I suggest imposing a maximum wait time, or you might find that prolonged failures leave a client taking too long to recover once the server end does start accepting connections again. I would suggest something like 5 minutes as a reasonable maximum in most circumstances, but of course it depends on the application.
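As a rough illustration, here is a minimal C++ sketch of a reconnect loop that combines exponential backoff with a capped maximum wait. try_connect() is a hypothetical placeholder for whatever actually opens the TCP connection, the specific delays are only examples, and the refinement of resetting the delay once the connection has stayed up for a while is omitted for brevity:

    #include <algorithm>
    #include <chrono>
    #include <thread>

    bool try_connect();   // placeholder: returns true once a connection succeeds

    void connect_with_backoff() {
        using namespace std::chrono;
        seconds delay(0);                      // retry immediately after the first failure
        const seconds max_delay = minutes(5);  // cap, so recovery isn't delayed too long

        while (!try_connect()) {
            std::this_thread::sleep_for(delay);
            // wait 2s, 4s, 8s, ... on subsequent failures, doubling up to the cap
            delay = std::min(std::max(delay * 2, seconds(2)), max_delay);
        }
    }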
I am following this code for a C++ HTTP server. One of the requirements is concurrency. That seems to be taken care of by the following chunk of code:
    if (true) {
        // Concurrent path: hand the connection off to a new thread.
        if (pthread_create(&thread, 0, handle_request, pcliefd) < 0) {
            perror("pthread_create()");
        }
    } else {
        // Sequential path: handle the request on the current thread.
        handle_request(pcliefd);
    }
I then came across simpler code in this article. pthread is not used here; the response is handled by a write nested inside while(1). I suppose this simpler code does not meet the concurrency requirement? Anyways, what is the point of using a thread to handle concurrency if the response is so simple? Is there something bigger behind this requirement?
The goal of your first linked question was to demonstrate a minimum of concurrency. Not a useful amount, just the bare minimum. Your second link doesn't even have that, as you correctly assumed.
Real webservers will be more complex. For starters, you don't want too much concurrency; there are only a limited number of CPU cores in your computer. See std::thread::hardware_concurrency.
Anyways, what is the point of using a thread to handle concurrency if the response is so simple?
This is actually a good question. The problem you face when you want to handle a large number of clients is that the read() and write() system calls are usually blocking. That means they block your current thread for as long as it takes to complete the requested operation.
Say you have two clients that send a request to your single-threaded, non-concurrent server. Client A belongs to some lonely guy in a mountain hut with a really slow internet connection. Your accept() call returns and your program calls the handler routine for client A. Now, while the bits slowly trickle through the mountain cable and your handler routine waits for the request to be transmitted, a second client B connects to your server. This one belongs to a businessman with high-speed office internet access.
The problem here is that even if your response is simple, the high-speed client still has to wait until your handler routine returns before its request can be processed. One slow client can slow down all the other clients, which is obviously not what you want.
You can solve that problem using two approaches:
1. You create a new thread for each client (that is the approach in your code). That way, if a slow client blocks the handling routine for a long time, the other clients are still able to proceed with their requests. The problem here is that a large number of clients creates a large number of threads, and context switching between thousands of threads can be a massive performance issue. So for a small number of concurrent clients this is fine, but for large-scale, high-performance servers we need something better.
2. You use a non-blocking API of the operating system. How exactly that works differs between operating systems, and even on a single OS there might exist several such APIs. Usually you want to use a platform-independent library if you need this type of concurrency support. An excellent library here is Boost.Asio.
The two approaches can be mixed. For the best performance you would want to have as many threads as you have processor cores. Each thread handles requests concurrently using an asynchronous (non-blocking) API. This is usually done with a worker pool and a task queue.
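A rough sketch of that mixed approach with Boost.Asio (assuming Boost 1.66 or later for io_context; the timer is only a stand-in for real asynchronous work such as async_read/async_write on sockets):

    #include <boost/asio.hpp>
    #include <algorithm>
    #include <chrono>
    #include <iostream>
    #include <thread>
    #include <vector>

    int main() {
        boost::asio::io_context io;

        // Keep io.run() from returning while no asynchronous work is queued yet.
        auto guard = boost::asio::make_work_guard(io);

        // One worker per hardware core; each thread pulls completed asynchronous
        // operations off the shared queue and runs their handlers.
        std::vector<std::thread> workers;
        const unsigned n = std::max(1u, std::thread::hardware_concurrency());
        for (unsigned i = 0; i < n; ++i)
            workers.emplace_back([&io] { io.run(); });

        // Example "task": an asynchronous timer whose completion handler runs on
        // whichever worker thread happens to be free.
        boost::asio::steady_timer timer(io, std::chrono::seconds(1));
        timer.async_wait([](const boost::system::error_code& ec) {
            if (!ec) std::cout << "handler ran on a worker thread\n";
        });

        // Shut down: release the work guard and wait for the workers to finish.
        std::this_thread::sleep_for(std::chrono::seconds(2));
        guard.reset();
        for (auto& t : workers) t.join();
    }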
I am building a system that sends and receives UDP packets to multiple pieces of remote hardware.
A function mySend passes new information to send to a third-party API that I must use to construct the actual UDP datagram. The API locks a mutex during its work constructing and sending the datagram.
A function myRecv runs in a worker thread, repeatedly asking the third-party API to poll for new data. The API invokes a UDP-receive function which runs select and recvfrom to grab any responses from the remote hardware.
The thread that listens and handles incoming packets is problematic at the moment due to the design of the API I'm using to decode those packets, which locks its own mutex around the call to the UDP-receive function. But this function performs a blocking select.
The consequence is that the mutex is almost always locked by the receive thread and, in fact, the contention is so bad that mySend is practically never able to obtain the lock. The result is that the base thread is effectively deadlocked.
To fix this, I'm trying to justify making the listen socket non-blocking and performing a usleep between select calls where no data was available.
Now, if my blocking select had a 3-second timeout, that's not the same as performing a non-blocking select every 3 seconds (in the worst case) because of the introduction of latency in looking for and consequently handling incoming packets. So the usleep period has to be a lot lower, say 300-500ms.
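Roughly, the loop I have in mind looks like the sketch below, where api_mutex stands in for the lock the third-party API takes internally; the buffer size and the 300 ms sleep are only illustrative:

    #include <sys/select.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <mutex>

    void recv_poll_loop(int sock, std::mutex& api_mutex) {
        for (;;) {
            bool got_data = false;
            {
                std::lock_guard<std::mutex> lock(api_mutex);   // now held only briefly
                fd_set readfds;
                FD_ZERO(&readfds);
                FD_SET(sock, &readfds);
                timeval tv{0, 0};                              // zero timeout: poll, don't block
                if (select(sock + 1, &readfds, nullptr, nullptr, &tv) > 0) {
                    char buf[2048];
                    recvfrom(sock, buf, sizeof buf, 0, nullptr, nullptr);
                    got_data = true;
                    // ... hand buf to the decoding API ...
                }
            }                                                  // mutex released here
            if (!got_data)
                usleep(300 * 1000);                            // ~300 ms between polls
        }
    }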
My concern is mostly in the additional system calls — this is a lot more calls to select, and new calls to usleep. At times I will expect next to no incoming data for tens of seconds or even minutes, but there will also likely be periods during which I might expect to receive perhaps 40KB over a few seconds.
My first instinct, if this were all my own software, would be to tighten up the use of mutexes such that no locking was in place around select at all, and then there'd be no problem. But I'd like to avoid hacking about in the 3rd-party API if I don't have to.
Simple time-based profiling is not really enough at this stage because this mechanism needs to scale really well, and I don't have the means to test at scale right now. Consequently I'm trying to gather some anecdotal evidence in order to steer my decision-making.
Is moving to a non-blocking socket the right approach?
Or would I be better off hacking up the third-party API (which I'd rather not do) to tighten their mutex usage?
My team, the developers of the 3rd-party library, and I have all come to the conclusion that the hack is suitable enough for deployment, and that its benefits outweigh the concerns and disadvantages associated with my potential alternative workarounds.
The real solution is, of course, to push a proper design fix into the 3rd party library; this is a way off as it would be fairly extensive and nobody really cares enough, but it does give us the answer to this question.
I'm designing a C++ client application that listens to multiple ports for streams of short messages. After reading up on ACE, POCO, boost::asio and all the Proactor-like design patterns, I am about to start with boost::asio.
One thing I notice is a constant theme of using asynchronous socket IO, yet I have not read a good description of the problem that async IO solves. Are all these design patterns based on the assumption of HTTP web server design?
Since web servers are the most common application for complex, latency-sensitive, concurrent socket programming, I'm starting to wonder if most of these patterns/idioms are geared for this one application.
My application will listen to a handful of sockets for short and frequent messages. A separate thread will need to combine all messages for processing. One thing that I am looking at design patterns for is separating the connection management from the data processing. I wish to have the connections try to reconnect after a disconnect and have the processing thread continue as if nothing happened. What design pattern is recommended here?
I don't see how async IO will improve performance in my case.
You're on the right track. It's smart to ask "why" especially with all the hype around asynchronous and event driven. There are applications other than the web. Consider message queues and financial transactions like high frequency trading. Basically any time that waiting costs money or loses an opportunity to serve a customer is a candidate for async. The web is a big example because the network is so much faster than the database. As always, ask "does this make sense" for your app. Async adds a lot of complexity if you're not benefiting from it.
Your short, rapid messages may actually benefit a lot from async if the mean time between messages is comparable to the time required to process each message, especially if that processing includes persistence. But you don't have to rush into async. Instrument your blocking code and see whether you really have a bottleneck.
I hope this helps.
Using the blocking-calls pattern would entail the following (a minimal code sketch follows the list):
1. Listening on a socket vector of size N
2. When a message arrives, you wake up with some start-up latency K, then find and start processing the message, taking time T (it does not matter if the processing is offloaded to another thread: in that case T becomes your offloading time)
3. You finish examining the vector and GOTO 1
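A minimal sketch of that loop over a vector of N sockets, using poll(); process_message() stands in for the handling step that costs time T:

    #include <poll.h>
    #include <unistd.h>
    #include <vector>

    void process_message(int fd) {
        char buf[2048];
        read(fd, buf, sizeof buf);   // consume one message; real handling costs T
    }

    void dispatch_loop(const std::vector<int>& sockets) {
        std::vector<pollfd> fds;
        for (int s : sockets)
            fds.push_back({s, POLLIN, 0});

        for (;;) {                                                  // step 3: GOTO 1
            if (poll(fds.data(), static_cast<nfds_t>(fds.size()), -1) <= 0)
                continue;                                           // step 1: wait on the N sockets
            for (const pollfd& p : fds)                             // step 2: find the ready sockets and
                if (p.revents & POLLIN)                             //         pay K + T per message
                    process_message(p.fd);
        }
    }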
So you might say that if M messages arrive, and another message arrives during the K+M*T dispatching time, the (M+1)-th message will find itself waiting up to K+M*T. Is this acceptable for your expected values of K (constant), M (a function of traffic) and T (a function of resources and system load)?
Asynchronous processing, well, actually does not exist. There will always be a "synchronous" IO loop somewhere, only it will be so well integrated in the kernel (or even hardware) that it will run 10-100x faster than your own, and therefore will be likely to scale better with large values of M. The delay is still in the form K1+M*T1, but this time K1 and T1 are much lower. Or maybe K1 is a bit higher and T1 is significantly lower: the architecture "scales better" for large values of M.
If your values of M are usually low, then the advantages of asynchronicity are proportionally smaller. In the absurd case when you only have one message in the application lifetime, synchronous or asynchronous makes next to no difference.
Take into account another factor: if the number of messages becomes really large, asynchronicity has its advantages; but if the messages themselves are independent (the changes caused by message A do not influence processing of message B), then you can remain synchronous and scale horizontally, preparing a number Z of "message concentrators" each receiving a fraction M/Z of the total traffic.
If the processing requires performing other calls to other services (cache, persistence, information retrieval, authentication...), increasing the T factor, then you'll be better off turning to multithreading or even asynchronous I/O. With multithreading you shave T down to a fraction of its value (the dispatching time only). Asynchronous I/O does much the same, but shaves off even more and takes care of more of the programming boilerplate for you.
I am currently writing a bittorrent client. I am getting to the stage in my program where I need to start thinking about whether multiple threads would improve my program and how many I would need.
I assume that I would assign one thread to deal with the trackers because the program may be in contact with several (1-5 roughly) of them at once, but will only need to contact them in an interval assigned by the tracker (around 20 minutes), so won't be very intensive on the program.
The program will be in regular contact with numerous peers to download pieces of files from them. The following is taken from the Bittorrent Specification Wiki:
Implementer's Note: Even 30 peers is plenty, the official client version 3 in fact only actively forms new connections if it has less than 30 peers and will refuse connections if it has 55. This value is important to performance. When a new piece has completed download, HAVE messages (see below) will need to be sent to most active peers. As a result the cost of broadcast traffic grows in direct proportion to the number of peers. Above 25, new peers are highly unlikely to increase download speed. UI designers are strongly advised to make this obscure and hard to change as it is very rare to be useful to do so.
It suggests that I should be in contact with roughly 30 peers. What would be a good thread model to use for my Bittorrent Client? Obviously I don't want to assign a thread to each peer and each tracker, but I will probably need more than just the main thread. What do you suggest?
I don't see a lot of need for multithreading here. Having many threads also means having a lot of communication between them to make sure everyone is doing the right thing at the right time.
For the networking, keep everything on one thread and just multiplex using nonblocking I/O. On Unix systems this would be a setup with select/poll (or platform-specific extensions such as epoll); on Windows this would be completion ports.
You can even add the disk I/O into this, which would make the communication between the threads trivial since there isn't any :-)
If you want to consider threads to be containers for separate components, the disk I/O could go into another thread. You could use blocking I/O in this case, since there isn't a lot of multiplexing anyway.
Likewise, in such a scenario, tracker handling could go into a different thread as well since it's a different component from peer handling. Same for DHT.
You might want to offload the checksum-checking to a separate thread. Not quite sure how complex this gets, but if there's significant CPU use involved then putting it away from the I/O stuff doesn't sound that bad.
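If you do go that route, a rough sketch of the hand-off might look like the following; Piece and verify_piece() are placeholders for your own piece buffer and SHA-1 check:

    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    struct Piece { int index; std::vector<char> data; };

    bool verify_piece(const Piece&);   // the real SHA-1 check goes here (placeholder)

    class HashWorker {
    public:
        HashWorker() : worker_([this] { run(); }) {}
        ~HashWorker() {
            { std::lock_guard<std::mutex> lk(m_); done_ = true; }
            cv_.notify_one();
            worker_.join();
        }
        // Called from the I/O thread when a piece finishes downloading.
        void submit(Piece p) {
            { std::lock_guard<std::mutex> lk(m_); queue_.push(std::move(p)); }
            cv_.notify_one();
        }
    private:
        void run() {
            for (;;) {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !queue_.empty(); });
                if (queue_.empty()) return;          // shutting down, nothing left to hash
                Piece p = std::move(queue_.front());
                queue_.pop();
                lk.unlock();
                verify_piece(p);                     // CPU-heavy work stays off the I/O thread
            }
        }
        std::mutex m_;
        std::condition_variable cv_;
        std::queue<Piece> queue_;
        bool done_ = false;
        std::thread worker_;
    };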
As you tagged your question [c++], I suggest std::thread from C++11. You can find a nice tutorial (among lots of others) here.
Concerning the number of threads: you can use 30 threads without any problem, have them check whether there is something for them to do, and put them to sleep for a reasonable time between checks. The operating system will take care of the rest.
I have a program in C++, using the standard socket API, running on Ubuntu 7.04, that holds open a socket to a server. My system lives behind a router. I want to figure out how long it could take to get a socket error once my program starts sending AFTER the router is cut off from the net.
That is, my program may go idle (waiting for the user). The router is disconnected from the internet, and then my program tries to communicate over that socket.
Obviously it's not going to know quickly, because TCP is quite adept at keeping a socket alive under adverse network conditions. This causes TCP to retry a lot of times, a lot of ways, before it finally gives up.
I need to establish some kind of 'worst case' time that I can give to the QA group (and the customer), so that they can test that my code goes into a proper offline state.
(for reference, my program is part of a pay at pump system for gas stations, and the server is the system that authorizes payment transactions. It's entirely possible for the station to be cut off from the net for a variety of reasons, and the customer just wants to know what to expect).
EDIT: I wasn't clear. There's no human being waiting on this thing, this is just for a back office notation of system offline. When the auth doesn't come back in 30 seconds, the transaction is over and the people are going off to do other things.
EDIT: I've come to the conclusion that the question isn't really answerable in the general case. The number of factors involved in determining how long a TCP connection takes to error out due to a downstream failure is too dependent on the exact equipment and failure for there to be a simple answer.
You should be able to use getsockopt() (http://linux.die.net/man/2/getsockopt) with SO_RCVTIMEO and SO_SNDTIMEO to determine the timeouts involved.
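For example, a small sketch of reading the current values (note that a returned value of 0 means no timeout is set, i.e. calls block indefinitely):

    #include <sys/socket.h>
    #include <sys/time.h>
    #include <cstdio>

    void print_socket_timeouts(int sock) {
        timeval tv{};
        socklen_t len = sizeof tv;
        if (getsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, &len) == 0)
            std::printf("SO_RCVTIMEO: %lds %ldus\n", (long)tv.tv_sec, (long)tv.tv_usec);
        len = sizeof tv;
        if (getsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, &tv, &len) == 0)
            std::printf("SO_SNDTIMEO: %lds %ldus\n", (long)tv.tv_sec, (long)tv.tv_usec);
    }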
This page talks about more options that may be of interest to you: http://linux.die.net/man/7/socket
In my experience, just picking a time is usually a bad idea. Even when it sounds reasonable, arbitrary timeouts usually misbehave in practice. The general result is that the application becomes unusable when the environment falls outside of the norm.
Especially for financial transactions, this should be avoided. Perhaps providing a cancel button and some indication that the transaction is taking longer than expected would be a better solution.
I would twist the question around the other way: how long is a till operator prepared to stand there looking stupid in front of the customer before they say, "oh, it must not be working, let's do this the manual way"?
So pick some time like 1 minute (assuming your network is not an auto-disconnect type that reconnects when traffic occurs).
Then use that time for how long your program waits before giving up: closing the socket, displaying an error message, and so on. Maybe even show a countdown timer while waiting, so the till operator has an idea of how much longer the system is going to wait...
Then they know the transaction failed, and that it's manual time.
Otherwise, depending on your IP stack, the worst-case timeout could be "never times out".
I think the best approach is not to try and determine the timeout being used, but to actually specify the timeout yourself.
Depending on your OS you can either (the first option is sketched in code after the list):
use setsockopt() with option SO_SNDTIMEO,
use non-blocking send() and then use select() with a timeout
use non-blocking send(), and have a timeout on receiving the expected data.
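A minimal sketch of the first option; the 30-second value is only an example, chosen to match the transaction window mentioned above:

    #include <sys/socket.h>
    #include <sys/time.h>

    // Give the socket a send timeout, so a send() that blocks (for example because
    // the send buffer has filled up while the peer is unreachable) fails with
    // EAGAIN/EWOULDBLOCK after the timeout instead of blocking indefinitely.
    bool set_send_timeout(int sock, long seconds) {
        timeval tv{};
        tv.tv_sec = seconds;
        tv.tv_usec = 0;
        return setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv) == 0;
    }

Note that SO_SNDTIMEO only bounds how long send() itself may block; confirming that the server actually received and processed the data still needs an application-level acknowledgement with its own timeout, which is what the second and third options address.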