Non-blocking sockets multi-threaded receive model

Non-blocking sockets multi-threaded receive model - c++

I'm working on an multi-threaded server application for learning puposes. My problem right now is receiving the data.The first time I wrote the application I used WSAAsyncSelect, but I didn't like how it was working ( the window dependency was stupid in my opinion, even if you hid the window.). So I re-wrote it and now I have a thread that goes through the connected clients and checks if there is any data to be be received and pass it to the worker threads. This works for a small ammount of clients, but I think that for a larger amount it might delay other clients too much. A solution I've read about is having a thread for each client, but there are thread limitations. Another solution would be IOCP ( Windows ), but I need to find a good documentation, since the examples I found were a bit too ambigous ( I might be the problem here )
The language I'm using C/C++ under Microsoft Visual Studio 2013 ( WinSock, but I would like to write it in a multiplatform way )

If one thread is enough to handle all your clients, consider having N threads and distribute the clients (e.g. through hashing or in order of reception) among the available threads.
The key is that the overall number of threads (first in your process: polling+worker... but also in the whole system) must remain constant and not exceeding the number of processing resources (CPUs/cores)
This distinction between polling thread(s) and worker threads is the right way to go. Decoupling using queue structures, freeing the polling threads to do their job (i.e. polling) unhindered from the (arbitrary) logic running on the worker threads.

On Windows, IOCP is the standard model for scalable async IO. It solves all the problems you mention. It has a sane programming model (with a few API design errors). I believe there are self-contained samples. To learn IOCP I'd create a very simple chat server using this technology and try to get it absolutely right.
The select has the disadvantage that it does not scale (as you said).
If you want portability, look into async IO libraries like Boost asio. They use an IOCP-like model on all modern platforms. They are callback-driven.

Related

Designing a multi-client tcp server to process data

I am attempting to rewrite my current project to include more features and stability, and need some help designing it. Here is the jist of it (for linux):
TCP_SERVER receives connection (auth packet)
TCP_SERVER starts a new (thread/fork) to handle the new client
TCP_SERVER will be receiving many packets from client > which will be added to a circular buffer
A separate thread will be created for that client to process those packets and build a list of objects
Another thread should be created to send parts of the list of objects to another client
The reason to separate all the processing into threads is because server will be getting many packets and the processing wont be able to keep up (which needs to be quick, as its time sensitive) (im not sure if tcp will drop packets if the internal buffer gets too large?), and another thread to send to another client to keep the processing fast as possible.
So for each new connection, 3 threads should be created. 1 to receive packets, 1 to process them, and 1 to send the processed data to another client (which is technically the same person/ip just on a different device)
And i need help designing this, as how to structure this, what to use (forks/threads), what libraries to use.

Trying to do this yourself is going to cause you a world of pain. Focus on your actual application, and leverage an existing socket handling framework. For example, you said:
for each new connection, 3 threads should be created
That statement says the following:
1. You haven't done this before, at scale, and haven't realized the impact all these threads will have.
2. You've never benchmarked thread creation or synchronous operations.
3. The number of things that can go wrong with this approach is pretty overwhelming.
Give some serious thought to using an existing library that does most of this for you. Getting the scaffolding right around this can literally take years, and you're better off focusing on your code rather than all the random plumbing.
The Boost C++ libraries seem to have a nice Async C++ socket handling infrastructure. Combine this with some of the existing C++ thread pools and you could likely have a highly performant solution up fairly quickly.
I would also question your use of C++ for this. Java and C# both do highly scalable socket servers pretty well, and some of the higher level language tooling (Spring, Guarva, etc) can be very, very valuable. If you ever want to secure this, via TLS or another mechanism, you'll also probably find this much easier in Java or C# than in C++.
Some of the major things you'll care about:
1. True Async I/O will be a huge perf and scalability win. Try really hard to do this. The boost asio library looks pretty nice.
2. Focus on your features and stability, rather than building a new socket handling platform.
3. Threads are expensive, avoid creating them. Thread pools are your friend.

You plan to create one-or-more threads for every connection your server handles. Threads are not free, they come with a memory and CPU overhead, and when you have many active threads you also begin to have resource contention.
What usage pattern do you anticipate? Do you expect that when you have 8 connections, all 8 network threads will be consuming 100% of a cpu core pushing/pulling packets? Or do you expect them to have a relatively low turn-around?
As you add more threads, you will begin to have to spend more time competing for resources in things like mutexes etc.
A better pattern is to have one or more thread for network io - most os'es have mechanisms for saying "tell me when one or more of these network connections has io" which is an efficiency saving over having lots of individual threads all doing the same thing for just one connection.
Then for actual processing, spin up a pool of worker threads to do actual work, allowing you to minimize the competition for resources. You can monitor work load to determine if you need to spin up more to meet delivery requirements.
You might also want to look into something to implement the network IO infrastructure for you; I've had really good performance results with libevent but then I've only had to deal with very high performance/reliability networking systems.

Thread per connection vs Reactor pattern (with a thread pool)?

I want to write a simple multiplayer game as part of my C++ learning project.
So I thought, since I am at it, I would like to do it properly, as opposed to just getting-it-done.
If I understood correctly: Apache uses a Thread-per-connection architecture, while nginx uses an event-loop and then dedicates a worker [x] for the incoming connection. I guess nginx is wiser, since it supports a higher concurrency level. Right?
I have also come across this clever analogy, but I am not sure if it could be applied to my situation. The analogy also seems to be very idealist. I have rarely seen my computer run at 100% CPU (even with a umptillion Chrome tabs open, Photoshop and what-not running simultaneously)
Also, I have come across a SO post (somehow it vanished from my history) where a user asked how many threads they should use, and one of the answers was that it's perfectly acceptable to have around 700, even up to 10,000 threads. This question was related to JVM, though.
So, let's estimate a fictional user-base of around 5,000 users. Which approach should would be the "most concurrent" one?
A reactor pattern running everything in a single thread.
A reactor pattern with a thread-pool (approximately, how big do you suggest the thread pool should be?
Creating a thread per connection and then destroying the thread the connection closes.
I admit option 2 sounds like the best solution to me, but I am very green in all of this, so I might be a bit naive and missing some obvious flaw. Also, it sounds like it could be fairly difficult to implement.
PS: I am considering using POCO C++ Libraries. Suggesting any alternative libraries (like boost) is fine with me. However, many say POCO's library is very clean and easy to understand. So, I would preferably use that one, so I can learn about the hows of what I'm using.

Reactive Applications certainly scale better, when they are written correctly. This means
Never blocking in a reactive thread:
Any blocking will seriously degrade the performance of you server, you typically use a small number of reactive threads, so blocking can also quickly cause deadlock.
No mutexs since these can block, so no shared mutable state. If you require shared state you will have to wrap it with an actor or similar so only one thread has access to the state.
All work in the reactive threads should be cpu bound
All IO has to be asynchronous or be performed in a different thread pool and the results feed back into the reactor.
This means using either futures or callbacks to process replies, this style of code can quickly become unmaintainable if you are not used to it and disciplined.
All work in the reactive threads should be small
To maintain responsiveness of the server all tasks in the reactor must be small (bounded by time)
On an 8 core machine you cannot cannot allow 8 long tasks arrive at the same time because no other work will start until they are complete
If a tasks could take a long time it must be broken up (cooperative multitasking)
Tasks in reactive applications are scheduled by the application not the operating system, that is why they can be faster and use less memory. When you write a Reactive application you are saying that you know the problem domain so well that you can organise and schedule this type of work better than the operating system can schedule threads doing the same work in a blocking fashion.
I am a big fan of reactive architectures but they come with costs. I am not sure I would write my first c++ application as reactive, I normally try to learn one thing at a time.
If you decide to use a reactive architecture use a good framework that will help you design and structure your code or you will end up with spaghetti. Things to look for are:
What is the unit of work?
How easy is it to add new work? can it only come in from an external event (eg network request)
How easy is it to break work up into smaller chunks?
How easy is it to process the results of this work?
How easy is it to move blocking code to another thread pool and still process the results?
I cannot recommend a C++ library for this, I now do my server development in Scala and Akka which provide all of this with an excellent composable futures library to keep the code clean.
Best of luck learning C++ and with which ever choice you make.

Option 2 will most efficiently occupy your hardware. Here is the classic article, ten years old but still good.
http://www.kegel.com/c10k.html
The best library combination these days for structuring an application with concurrency and asynchronous waiting is Boost Thread plus Boost ASIO. You could also try a C++11 std thread library, and std mutex (but Boost ASIO is better than mutexes in a lot of cases, just always callback to the same thread and you don't need protected regions). Stay away from std future, cause it's broken:
http://bartoszmilewski.com/2009/03/03/broken-promises-c0x-futures/
The optimal number of threads in the thread pool is one thread per CPU core. 8 cores -> 8 threads. Plus maybe a few extra, if you think it's possible that your threadpool threads might call blocking operations sometimes.

FWIW, Poco supports option 2 (ParallelReactor) since version 1.5.1

I think that option 2 is the best one. As for tuning of the pool size, I think the pool should be adaptive. It should be able to spawn more threads (with some high hard limit) and remove excessive threads in times of low activity.

as the analogy you linked to (and it's comments) suggest. this is somewhat application dependent. now what you are building here is a game server. let's analyze that.
game servers (generally) do a lot of I/O and relatively few calculations, so they are far from 100% CPU applications.
on the other hand they also usually change values in some database (a "game world" model). all players create reads and writes to this database. which is exactly the intersection problem in the analogy.
so while you may gain some from handling the I/O in separate threads, you will also lose from having separate threads accessing the same database and waiting for its locks.
so either option 1 or 2 are acceptable in your situation. for scalability reasons I would not recommend option 3.

C++ Server - To Thread or not to Thread?

I'm working on a game server, written in C++, and I'm trying to decide how many threads to use and what tasks to thread. The basic server skeleton consists of keyboard I/O and output to a console, accepting incoming connects, sending outgoing connects, and doing the game "stuff".
What I'd like to know is which things should be given a separate thread. Should each connect have its own thread? I know this is variable, it depends on the project or so, but I would like it to support a pretty decent number of players (somewhere in the hundreds if possible).

The standard answer should always be: Try it the simplest way first, and only look for ways to improve performance if the simple way isn't good enough. However, re-architecting a large C++ program can be a painful experience, so some guesses about performance in advance may be appropriate.
Theoretically, hundreds of threads are probably OK on modern machines. The NPTL implementation for Linux was tested with tens of thousands of threads, as I recall. If that's the easiest way for you to implement, it may be the right answer.
However, high-performance web servers and similar typically use event-driven models instead. Consider a library like libevent. I'm sure there are C++ libraries for the same purpose.
I personally believe that languages without first-class continuations, or at least coroutines, are poor choices for this kind of work, but the C language family is how we get work done today, so off we go. :-)

A good solution could be to use a Thread pool.
Idea is to let the main thread dispatch equitably all connexions in a fixed number of threads.
With a good design, you can easily set the number of thread on runtime.
You can find more informations here.
Create more threads than you have CPU cores is not productive, and adding too threads decrease performances due to time taken for switching between threads.
By example, for compiling a large project (it's not exactly the same thing, but it's valid for both case), it's often recommended to use no more thread than number of CPU cores + 1.

A very common technique is to have the game server run on one thread to monitor several connections (i.e. sockets) by using a select on each socket. When data is available, grab the data and enqueue it in a producer/consumer type model for the game engine to pick up.
This is by no means the be-all-end-all implementation, but it should be enough to get you started. Sounds like a cool project. Good luck!

If you setup the connections and utilize them in a manner that cause the thread to block waiting on IO then you should be able to service all of the connections and the keyboard on one thread. You may not want to put the console output on that same thread, as I've seen cases (on windows at least), where the speed of writing to the console is actually a bottleneck (i.e. if the console window is minimized the process runs considerably faster).
If the work of your game engine parallelizes well then you probably want to set use as many threads as there are CPUs less one (for the OS and the other two threads). If you expect the client to run on the same machine the server will want to detect that and scale back the number of threads it uses.

How to use non-blocking sockets with multiple threads?

I have read that working with more than 64 sockets in a thread is dangerous(?). But -at least for me- Non-blocking sockets are used for avoiding complicated thread things. Since there is only one listener socket, how am i supposed to split sockets into threads and use them with select() ? Should i create fd_sets for each thread or what ? And how am i supposed to assign a client to a thread, since I can only pass values in the beginning with CreateThread() ?

No no no, you got a few things wrong there.
First, the ideal way to handle many sockets is to have a thread pool which will do the work in front of the sockets (clients).
Another thread, or two (actually in the amount of CPUs as far as I know), do the connection accepting.
Now, when a an event occurs, such as a new connection, it is being dispatched to the thread pool to be processed.
Second, it depends on the actual implementation and environment.
For example, in Windows there's something called IOCP.
If you ask me - do not bother with the lower implementation but instead use a framework such as BOOST::ASIO or ACE.
I personally like ASIO. The best thing about those frameworks is that they are usually cross-platform (nix, Windows etc').
So, my answer is a bit broad but I think it's to the best that you take these facts into consideration before diving into code/manuals/implementation.
Good luck!

Well, what you have read is wrong. Many powerful single-threaded applications have been written with non-blocking sockets and high-performance I/O demultiplexers like epoll(4) and kqueue(2). Their advantage is that you setup your wait events upfront, so the kernel does not have to copy ton of file descriptors and [re-]setup lots of stuff on each poll.
Then there are advantages to threading if your primary goal is throughput, and not latency.
Check out this great overview of available techniques: The C10K problem.

The "ideal way to handle many sockets" is not always - as Poni seems to believe - to "have a thread pool."
What does "ideal" pertain to? Is it ease of programming? Best performance?
Since he recommends not bothering "with the lower implementation" and "use a framework such as BOOST::ASIO or ACE" I guess he means ease of programming.
Had he had a performance angle on Windows he would have recommended "something called IOCPs." IOCPs are "IO Control Ports" which will allow implementation of super-fast IO-applications using just a handful of threads (one per available core is recommended). IOCP applications run circles around any thread-pool equivalent which he would have known if he'd ever written code using them. IOCPs are not used alongside thread pools but instead of them.
There is no IOCP equivalent in Linux.
Using a framework on Windows may result in a faster "time to market" product but the performance will be far from what it might have been had a pure IOCP implementation been chosen.
The performance difference is such that OS-specific code implementations should be considered. If a generic solution is chosen anyway, at least performance would "not have been given away accidentally."

Most suitable asynchronous socket model for an instant messenger client?

I'm working on an instant messenger client in C++ (Win32) and I'm experimenting with different asynchronous socket models. So far I've been using WSAAsyncSelect for receiving notifications via my main window. However, I've been experiencing some unexpected results with Winsock spawning additionally 5-6 threads (in addition to the initial thread created when calling WSAAsyncSelect) for one single socket.
I have plans to revamp the client to support additional protocols via DLL:s, and I'm afraid that my current solution won't be suitable based on my experiences with WSAAsyncSelect in addition to me being negative towards mixing network with UI code (in the message loop).
I'm looking for advice on what a suitable asynchronous socket model could be for a multi-protocol IM client which needs to be able to handle roughly 10-20+ connections (depending on amount of protocols and protocol design etc.), while not using an excessive amount of threads -- I am very interested in performance and keeping the resource usage down.
I've been looking on IO Completion Ports, but from what I've gathered, it seems overkill. I'd very much appreciate some input on what a suitable socket solution could be!
Thanks in advance! :-)

There are four basic ways to handle multiple concurrent sockets.
Multiplexing, that is using select() to poll the sockets.
AsyncSelect which is basically what you're doing with WSAAsyncSelect.
Worker Threads, creating a single thread for each connection.
IO Completion Ports, or IOCP. dp mentions them above, but basically they are an OS specific way to handle asynchronous I/O, which has very good performance, but it is a little more confusing.
Which you choose often depends on where you plan to go. If you plan to port the application to other platforms, you may want to choose #1 or #3, since select is not terribly different from other models used on other OS's, and most other OS's also have the concept of threads (though they may operate differently). IOCP is typically windows specific (although Linux now has some async I/O functions as well).
If your app is Windows only, then you basically want to choose the best model for what you're doing. This would likely be either #3 or #4. #4 is the most efficient, as it calls back into your application (similar, but with better peformance and fewer issues to WSAsyncSelect).
The big thing you have to deal with when using threads (either IOCP or WorkerThreads) is marshaling the data back to a thread that can update the UI, since you can't call UI functions on worker threads. Ultimately, this will involve some messaging back and forth in most cases.
If you were developing this in Managed code, i'd tell you to look at Jeffrey Richter's AysncEnumerator, but you've chose C++ which has it's pros and cons. Lots of people have written various network libraries for C++, maybe you should spend some time researching some of them.

consider to use the ASIO library you can find in boost (www.boost.org).

Just use synchronous models. Modern operating systems handle multiple threads quite well. Async IO is really needed in rare situations, mostly on servers.

In some ways IO Completion Ports (IOCP) are overkill but to be honest I find the model for asynchronous sockets easier to use than the alternatives (select, non-blocking sockets, Overlapped IO, etc.).
The IOCP API could be clearer but once you get past it it's actually easier to use I think. Back when, the biggest obstacle was platform support (it needed an NT based OS -- i.e., Windows 9x did not support IOCP). With that restriction long gone, I'd consider it.

If you do decide to use IOCP (which, IMHO, is the best option if you're writing for Windows) then I've got some free code available which takes away a lot of the work that you need to do.
Latest version of the code and links to the original articles are available from here.
And my views on how my framework compares to Boost::ASIO can be found here: http://www.lenholgate.com/blog/2008/09/how-does-the-socket-server-framework-compare-to-boostasio.html.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js