Handling more than 1024 sockets? - c++

I'm working on an MMO game server project and I have a problem: select()'s limit. I want to handle more than 1024 socket I/O operations with a single thread. I want to do this with a single thread because I've already tried to build a multi-threaded handling system. That system creates 3 threads (for example, on a 4-core processor: 1 is the main thread, 3 are select() handlers) that each run select(), but then there is another problem: the limit has only gone up to 3072 (1024 * 3), and that isn't a solution! After that idea, I wanted to build a non-blocking socket system; with this system I call 2 different select() methods in 1 single thread, like "select() select()". They return in order and I can handle them in order. But I think there is yet another problem. If I implement a thread like "while(true){ select() select() }" and the (non-blocking) select() calls return immediately, I'll overload the CPU just like an empty "while(true)" block. If I give select() a timeout, I can't handle the bottom select() in real time. I can't come up with an algorithm for this. Can anybody help me with it?
NOTE: I don't want to use poll/epoll/WSAPoll etc. (poll cannot handle microseconds; it isn't as fast as select!) or third-party libraries like libevent (I want to make my own!)
FINAL SOLUTION (I think): I don't need to handle nanoseconds for an I/O operation, because there is no sense in doing so. poll is a good way to handle more than 1024 socket I/O operations. I'll do some research to understand MMO systems better. And lastly, I'll run some tests and try a few things before asking a question :) Thanks!
EDIT: I'm new to this Q&A platform. Can you tell me what's wrong with my question when you downvote it? :)

Using select is fundamentally wrong with this many (thousands of) connections. While select is usually faster when you have only a very small number of sockets (maybe tens), it scales horribly to several thousand and more. Everywhere that I know of, select slows down linearly with the number of connections (it's even worse than that, but I won't go into the details).
Even poll doesn't do much better than select at scaling to thousands of connections. It doesn't have select's (low) limit on the number of file descriptors you can poll, but it still scales linearly with the number of connections.
What you really should use are platform-specific facilities like epoll and kqueue. They scale far better (usually O(1)), but obviously they aren't portable.
I seriously suggest that you consider something like libev, which is a portable, highly-tested, thin wrapper around platform-specific facilities and services.
This is because platform-specific methods (e.g. select, poll, epoll, kqueue, I/O completion ports, event ports, etc.) differ from each other, none of them is available on more than one or two platforms, and their limits and behavioral details vary slightly. These facilities might even change from one version of an OS to the next (e.g. epoll on Linux 2.6.9, IIRC).
Even if you are not concerned with portability or future-proofing your code, such a library can provide you with more functionality and a nicer interface.
Two more libraries you can try are libevent (a little larger and slower, but with more features) and libuv (if you need Windows portability).
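To give a feel for what such a wrapper looks like in practice, here is a minimal libev sketch (assuming libev 4.x is installed; watching stdin stands in for watching a socket, and the callback name is mine):

#include <ev.h>
#include <unistd.h>

// Called whenever the watched descriptor becomes readable.
static void read_cb(EV_P_ ev_io* w, int revents) {
    (void)revents;
    char buf[4096];
    ssize_t n = read(w->fd, buf, sizeof buf);  // fd is ready; won't block
    if (n <= 0) ev_io_stop(EV_A_ w);           // EOF or error: stop watching
    // process the n bytes here
}

int main() {
    struct ev_loop* loop = EV_DEFAULT;  // picks epoll/kqueue/... automatically
    ev_io watcher;
    ev_io_init(&watcher, read_cb, /*fd=*/0, EV_READ);  // 0 = stdin, as a demo
    ev_io_start(loop, &watcher);
    ev_run(loop, 0);  // dispatch events until no watchers remain
}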

Given the requirements you have set, your problem has no solution.
The normal way to overcome select()'s limit of FD_SETSIZE (1024) file descriptors is to use poll() (or the even better alternatives epoll and kqueue), but you've rejected that option.
Otherwise, you could always overcome the problem by calling select() multiple times in parallel in different threads with different sets of file descriptors... but you've rejected that option too.
I don't believe there can really be any other solution!
Perhaps you should explain why both the poll() et al option and the thread option are not suitable. Your requirements seem like artificial limitations without justification.

Related

Multiuser chat server c++

I am building a chat server (which allows private messages between users) in C++... just as a challenge for me, and I've hit a dead end where I don't know what may be better.
By the way, I am fairly new to C++; that's why I want a challenge... so if there are other, more optimal ways (multithreading, etc.), please let me know.
Option A
I have a C++ application running that has an array of sockets, reads all the input (looping through all the sockets) on every loop iteration (a 1-second loop, I guess) and stores it in a DB (a log is required), and after that loops over all the sockets again, sending whatever is needed on each socket.
Pros: One single process, contained. Easy to develop.
Cons: I see it as hardly scalable, and a single point of failure... I mean, what about performance with 20k sockets?
Option B
I have a C++ application listening for connections.
When a connection is received, it forks a subprocess that handles that socket... reading all the user's input and saving it to a DB, and on every loop checking the DB for any required output to write to the socket.
Pros: If the daemon is small enough, having a process per socket is likely more scalable. And at the same time, if one process fails, all the others stay online.
Cons: Harder to develop. Maybe it consumes too many resources to maintain a process for each connection.
What option do you think is the best? Any other idea or suggestion is welcome :)
As mentioned in the comments, there is an additional alternative which is to use select() or poll() (or, if you don't mind making your application platform-specific, something like epoll()). Personally I would suggest poll() because I find it more convenient, but I think only select() is available on at least some versions of Windows - I don't know whether running on Windows is important to you.
The basic approach here is that you first add all your sockets (including a listen socket, if you're listening for connections) to a structure and then call select() or poll() as appropriate. This call will block your application until at least one of the sockets has some data to read; then you get woken up, go through the socket(s) that are ready for reading, process the data and jump back into blocking again. You generally do this in a loop, something like:
while (running) {
    int rc = poll(...);
    // Handle active file descriptors here.
}
This is a great way to write an application which is primarily IO-bound - i.e. it spends much more time handling network (or disk) traffic than it does actually processing the data with the CPU.
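To make that loop concrete, here is a minimal sketch of a poll()-based echo server (POSIX sockets; the port is an arbitrary example, and error handling and partial-write buffering are omitted for brevity):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <vector>

int main() {
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(5000);              // arbitrary example port
    bind(listener, (sockaddr*)&addr, sizeof addr);
    listen(listener, SOMAXCONN);

    std::vector<pollfd> fds{{listener, POLLIN, 0}};
    while (true) {
        poll(fds.data(), fds.size(), -1);     // block until something is ready
        if (fds[0].revents & POLLIN)          // new connection on the listener
            fds.push_back({accept(listener, nullptr, nullptr), POLLIN, 0});
        for (size_t i = 1; i < fds.size(); ++i) {
            if (!(fds[i].revents & POLLIN)) continue;
            char buf[4096];
            ssize_t n = read(fds[i].fd, buf, sizeof buf);
            if (n <= 0) {                     // disconnect: close and remove
                close(fds[i].fd);
                fds.erase(fds.begin() + i--);
            } else {
                write(fds[i].fd, buf, n);     // echo the data straight back
            }
        }
    }
}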
As also mentioned in the comments, another approach is to fork a thread per connection. This is quite effective, and you can use simple blocking IO in each thread to read and write to that connection. Personally I would advise against this approach for several reasons, most of which are largely personal preference.
Firstly, it's fiddly to handle connections where you need to write large amounts of data at a time. A socket can't guarantee to write all pending data at once (i.e. the amount that it sent may not be the full amount you requested). In this case you have to buffer up the pending data locally and wait until there's room in the socket to send it. This means at any given time, you might be waiting for two conditions - either the socket is ready to send, or the socket is ready to read. You could, of course, avoid reading from the socket until all the pending data is sent, but this introduces latency into handling the data. Or, you could use select() or poll() on just that connection - but if so, why bother using threads at all, just handle all the connections that way. You could also use two threads per connection, one for reading and one for writing, which is probably the best approach if you're not confident whether you can always send all messages in a single call, although this doubles the number of threads you need which could make your code more complicated and slightly increase resource usage.
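To illustrate the buffering just described, here is a hypothetical sketch (the Connection type and function names are mine): unsent bytes go into a per-connection buffer, and the connection polls for POLLOUT only while that buffer is non-empty.

#include <poll.h>
#include <string>
#include <sys/socket.h>

struct Connection {
    int fd;
    std::string outbuf;  // bytes accepted from the app but not yet sent
};

void queue_write(Connection& c, pollfd& p, const char* data, size_t len) {
    c.outbuf.append(data, len);
    p.events |= POLLOUT;              // wake us when the socket can take more
}

void on_writable(Connection& c, pollfd& p) {
    ssize_t n = send(c.fd, c.outbuf.data(), c.outbuf.size(), 0);
    if (n > 0) c.outbuf.erase(0, n);  // drop what the kernel accepted
    if (c.outbuf.empty())
        p.events &= ~POLLOUT;         // stop polling for writability
}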
Secondly, if you plan to handle many connections, or a high connection turnover, threads are somewhat more of a load on the system than using select() or friends. This isn't a particularly big deal in most cases, but it's a factor for larger applications. This probably isn't a practical issue unless you were writing something like a webserver that was handling hundreds of requests a second, but I thought it was relevant to mention for reference. If you're writing something of this scale you'd likely end up using a hybrid approach anyway, where you multiplexed some combination of processes, threads and non-blocking IO on top of each other.
Thirdly, some programmers find threads complicated to deal with. You need to be very careful to make all your shared data structures thread-safe, either with exclusive locking (mutexes) or using someone else's library code which does this for you. There are a lot of examples and libraries out there to help you with this, but I'm just pointing out that care is needed - whether multithreaded coding suits you is a matter of taste. It's relatively easy to forget to lock something and have your code work fine in testing because the threads don't happen to contend that data structure, and then find hard-to-diagnose issues when this happens under higher load in the real world. With care and discipline, it's not too hard to write robust multithreaded code and I have no objection to it (though opinions vary), but you should be aware of the care required. To some extent this applies to writing any software, of course, it's just a matter of degree.
Those issues aside, threads are quite a reasonable approach for many applications and some people seem to find them easier to deal with than non-blocking IO with select().
As to your approaches, Option A will work, but it is wasteful of CPU because you have to wake up every second regardless of whether there's actual useful work to do. Also, you introduce up to a second's delay in handling messages, which could be irritating for a chat server. In general I would suggest that something like select() is a much better approach than this.
Option B could work although when you want to send messages between connections you're going to have to use something like pipes to communicate between processes and that's a bit of a pain. You'll end up having to wait on both your incoming pipe (for data to send) as well as the socket (for data to receive) and thus you end up effectively with the same problem, having to wait on two filehandles with something like select() or threads. Really, as others have said, threads are the right way to process each connection separately. Separate processes are also a little more expensive of resources than threads (although on platforms such as Linux the copy-on-write approach to fork() means it's not actually too bad).
For small applications with only, say, tens of connections there's not an awful lot technically to choose between threads and processes, it largely depends on which style appeals to you more. I would personally use non-blocking IO (some people call this asynchronous IO, but that's not how I would use the term) and I've written quite a lot of code that does that as well as lots of multithreaded code, but it's still only my personal opinion really.
Finally, if you want to write portable non-blocking IO loops I strongly suggest investigating libev (or possibly libevent, but personally I find the former easier to use and more performant). These libraries use different primitives such as select() and poll() on different platforms so your code can remain the same, and they also tend to offer slightly more convenient interfaces.
If you have any more questions on any of that, feel free to ask.

C++ Server - To Thread or not to Thread?

I'm working on a game server, written in C++, and I'm trying to decide how many threads to use and which tasks to thread. The basic server skeleton consists of keyboard I/O and output to a console, accepting incoming connections, sending outgoing data, and doing the game "stuff".
What I'd like to know is which things should be given a separate thread. Should each connection have its own thread? I know this varies and depends on the project, but I would like it to support a pretty decent number of players (somewhere in the hundreds, if possible).
The standard answer should always be: Try it the simplest way first, and only look for ways to improve performance if the simple way isn't good enough. However, re-architecting a large C++ program can be a painful experience, so some guesses about performance in advance may be appropriate.
Theoretically, hundreds of threads are probably OK on modern machines. The NPTL implementation for Linux was tested with tens of thousands of threads, as I recall. If that's the easiest way for you to implement, it may be the right answer.
However, high-performance web servers and similar typically use event-driven models instead. Consider a library like libevent. I'm sure there are C++ libraries for the same purpose.
I personally believe that languages without first-class continuations, or at least coroutines, are poor choices for this kind of work, but the C language family is how we get work done today, so off we go. :-)
A good solution could be to use a thread pool.
The idea is to let the main thread dispatch all connections equitably across a fixed number of threads.
With a good design, you can easily set the number of threads at runtime.
You can find more information here.
Creating more threads than you have CPU cores is not productive, and adding too many threads decreases performance due to the time taken to switch between threads.
For example, for compiling a large project (it's not exactly the same thing, but the point is valid in both cases), it's often recommended to use no more threads than the number of CPU cores + 1.
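As a sketch of that idea, here is a minimal C++11 thread pool (a generic illustration, not tied to any particular library; the class and member names are mine):

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~ThreadPool() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }
    // Called by the main (dispatcher) thread, e.g. once per ready connection.
    void submit(std::function<void()> job) {
        { std::lock_guard<std::mutex> lk(m_); jobs_.push(std::move(job)); }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !jobs_.empty(); });
                if (done_ && jobs_.empty()) return;
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // handle one connection's pending work
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> workers_;
    bool done_ = false;
};

The pool size (the constructor's n) can then be chosen at runtime, e.g. from std::thread::hardware_concurrency().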
A very common technique is to have the game server run one thread that monitors all the connections (i.e. sockets) by using select. When data is available, grab the data and enqueue it in a producer/consumer type model for the game engine to pick up.
This is by no means the be-all-end-all implementation, but it should be enough to get you started. Sounds like a cool project. Good luck!
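For that producer/consumer hand-off, a minimal sketch could look like this (a generic blocking queue, names mine): the network thread pushes received packets and the game-engine thread pops them.

#include <condition_variable>
#include <mutex>
#include <queue>

template <typename T>
class BlockingQueue {
public:
    // Producer side: the select() thread pushes received data.
    void push(T item) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(item)); }
        cv_.notify_one();
    }
    // Consumer side: the game engine blocks until an item is available.
    T pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        T item = std::move(q_.front());
        q_.pop();
        return item;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
};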
If you setup the connections and utilize them in a manner that cause the thread to block waiting on IO then you should be able to service all of the connections and the keyboard on one thread. You may not want to put the console output on that same thread, as I've seen cases (on windows at least), where the speed of writing to the console is actually a bottleneck (i.e. if the console window is minimized the process runs considerably faster).
If the work of your game engine parallelizes well, then you probably want to use as many threads as there are CPUs, less one (for the OS and the other two threads). If you expect the client to run on the same machine, the server will want to detect that and scale back the number of threads it uses.

Regarding handling more than 1024 socket descriptors

I have written a chat server using C on Linux. I have tested it and it performs fine. The only thing holding it back is that I am using the select system call to handle socket descriptors. Since select has a limit of 1024, my chat server can handle at most 1024 users concurrently.
I know that the other option I could use is poll, but I'm not so sure about it and its performance compared to select.
Please suggest the most effective way to resolve this situation.
poll() can be used as an almost drop-in replacement for select(), and will allow you to exceed 1024 file descriptors (you can make the array passed to poll() as large as you want).
It will have similar performance characteristics to select(), since both require the kernel and the userspace application to scan the entire array - but if select() is working OK for you, then poll() should too. (There is actually a slight performance improvement with poll(): the .events field, specifying the events you are interested in for each file descriptor, is not changed by poll(), so you don't have to rebuild the array before every call, as you do with the file descriptor sets passed to select().)
If you later find yourself having performance problems caused by scanning the poll file descriptor array, you can consider switching to the epoll interface, which is more complicated but also scales better with very large numbers of file descriptors.
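For reference, a minimal epoll loop might look like this (Linux-specific; the function name and the 64-event batch size are my own choices): each descriptor is registered once, and the kernel keeps the interest set between calls.

#include <sys/epoll.h>

void event_loop(int listener_fd) {
    int epfd = epoll_create1(0);

    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = listener_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listener_fd, &ev);  // register once

    epoll_event ready[64];
    for (;;) {
        // The cost here is proportional to the number of *ready*
        // descriptors, not the number of registered ones.
        int n = epoll_wait(epfd, ready, 64, -1);
        for (int i = 0; i < n; ++i) {
            // ready[i].data.fd has pending input: accept or read here,
            // and epoll_ctl(ADD) any newly accepted socket.
        }
    }
}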
Your question is known as the C10K problem (how to deal with more than 10 thousand simultaneous connections). You'll find lots of resources on the web, e.g. this one.
And you should consider select an obsolete system call. Even with only dozens of file descriptors, you should at least prefer poll.
Notice that Qt and Gtk provide you with an event loop machinery, often using poll (and QtCore or Glib can be used outside of graphical interfaces). There is also libev and libevent. I suggest using one of them.
Linux has no 1024 limit on select(). But:
select() performance is very poor
FreeBSD does have the limit :)
You can use poll(), but its performance suffers as the number of active connections increases.
Using epoll() is preferable on Linux; however, I would suggest using libevent.
libevent is a fast, clean and portable way to implement heavily loaded servers, and on Linux it has epoll under the hood.
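As a taste of the API, a minimal libevent 2.x sketch might look like this (watching stdin stands in for watching a socket; the callback name is mine):

#include <event2/event.h>
#include <unistd.h>

static void read_cb(evutil_socket_t fd, short events, void* arg) {
    (void)events; (void)arg;
    char buf[4096];
    read(fd, buf, sizeof buf);  // fd is ready; process the data here
}

int main() {
    struct event_base* base = event_base_new();  // uses epoll on Linux
    struct event* ev = event_new(base, /*fd=*/0, EV_READ | EV_PERSIST,
                                 read_cb, nullptr);
    event_add(ev, nullptr);       // no timeout
    event_base_dispatch(base);    // run the event loop
}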

How to use non-blocking sockets with multiple threads?

I have read that working with more than 64 sockets in a thread is dangerous(?). But, at least for me, non-blocking sockets are meant to avoid complicated threading issues. Since there is only one listener socket, how am I supposed to split the sockets across threads and use them with select()? Should I create fd_sets for each thread, or what? And how am I supposed to assign a client to a thread, since I can only pass values at the start with CreateThread()?
No no no, you got a few things wrong there.
First, the ideal way to handle many sockets is to have a thread pool which will do the work in front of the sockets (clients).
Another thread, or two (actually, as many as there are CPUs, as far as I know), does the connection accepting.
Now, when an event occurs, such as a new connection, it is dispatched to the thread pool to be processed.
Second, it depends on the actual implementation and environment.
For example, in Windows there's something called IOCP.
If you ask me: do not bother with the lower-level implementation, but instead use a framework such as BOOST::ASIO or ACE.
I personally like ASIO. The best thing about these frameworks is that they are usually cross-platform (*nix, Windows, etc.).
So, my answer is a bit broad, but I think it's best that you take these facts into consideration before diving into code/manuals/implementation.
Good luck!
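For illustration, a minimal ASIO accept loop might look like the following (a sketch assuming Boost 1.66 or newer; the port and function name are mine, and the per-connection read/write chain is left as a comment):

#include <boost/asio.hpp>
#include <memory>

using boost::asio::ip::tcp;

void start_accept(boost::asio::io_context& io, tcp::acceptor& acc) {
    auto sock = std::make_shared<tcp::socket>(io);
    acc.async_accept(*sock, [&io, &acc, sock](boost::system::error_code ec) {
        if (!ec) {
            // start an async_read_some / async_write chain on *sock here;
            // the shared_ptr keeps the socket alive across callbacks
        }
        start_accept(io, acc);  // keep accepting further clients
    });
}

int main() {
    boost::asio::io_context io;
    tcp::acceptor acc(io, tcp::endpoint(tcp::v4(), 5000));  // example port
    start_accept(io, acc);
    io.run();  // a single thread multiplexes all connections
}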
Well, what you have read is wrong. Many powerful single-threaded applications have been written with non-blocking sockets and high-performance I/O demultiplexers like epoll(4) and kqueue(2). Their advantage is that you set up your wait events upfront, so the kernel does not have to copy a ton of file descriptors and re-set up lots of state on each poll.
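As an example of that upfront registration, a minimal kqueue sketch might look like this (BSD/macOS; the function name and 64-event batch size are mine):

#include <sys/event.h>

void kqueue_loop(int listener_fd) {
    int kq = kqueue();

    // Register interest once; the kernel keeps the set between calls.
    struct kevent change;
    EV_SET(&change, listener_fd, EVFILT_READ, EV_ADD, 0, 0, nullptr);
    kevent(kq, &change, 1, nullptr, 0, nullptr);

    struct kevent ready[64];
    for (;;) {
        int n = kevent(kq, nullptr, 0, ready, 64, nullptr);  // blocks
        for (int i = 0; i < n; ++i) {
            // ready[i].ident is a descriptor with pending data: accept
            // or read here, and EV_SET/kevent any newly accepted socket.
        }
    }
}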
Then there are advantages to threading if your primary goal is throughput, and not latency.
Check out this great overview of available techniques: The C10K problem.
The "ideal way to handle many sockets" is not always - as Poni seems to believe - to "have a thread pool."
What does "ideal" pertain to? Is it ease of programming? Best performance?
Since he recommends not bothering "with the lower implementation" and "use a framework such as BOOST::ASIO or ACE" I guess he means ease of programming.
Had he had a performance angle on Windows he would have recommended "something called IOCPs." IOCPs are "I/O Completion Ports", which allow the implementation of super-fast I/O applications using just a handful of threads (one per available core is recommended). IOCP applications run circles around any thread-pool equivalent, which he would have known if he'd ever written code using them. IOCPs are not used alongside thread pools but instead of them.
There is no IOCP equivalent in Linux.
Using a framework on Windows may result in a faster "time to market" product but the performance will be far from what it might have been had a pure IOCP implementation been chosen.
The performance difference is such that OS-specific code implementations should be considered. If a generic solution is chosen anyway, at least performance would "not have been given away accidentally."
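For reference, the skeleton of a pure IOCP design might look like the following (Windows; a hedged sketch with my own function names, with error handling and the actual overlapped WSARecv/WSASend calls omitted):

#include <winsock2.h>
#include <windows.h>

// Bind an accepted socket to the port; completions for overlapped I/O on
// this socket will from then on be delivered to the worker threads.
void associate(HANDLE iocp, SOCKET s, ULONG_PTR per_connection_key) {
    CreateIoCompletionPort(reinterpret_cast<HANDLE>(s), iocp,
                           per_connection_key, 0);
}

// Each worker thread (roughly one per core) blocks here; the kernel wakes
// one thread per completed overlapped operation.
void worker(HANDLE iocp) {
    DWORD bytes = 0;
    ULONG_PTR key = 0;
    OVERLAPPED* ov = nullptr;
    while (GetQueuedCompletionStatus(iocp, &bytes, &key, &ov, INFINITE)) {
        // 'key' and 'ov' identify which connection and which WSARecv/WSASend
        // just completed; process it and post the next overlapped operation.
    }
}

int main() {
    HANDLE iocp = CreateIoCompletionPort(INVALID_HANDLE_VALUE, nullptr, 0, 0);
    // spawn a few worker threads running worker(iocp), then accept sockets
    // and pass each one to associate(iocp, ...)
}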

Shall we use poll() or select()?

I'm fully aware of the major differences between poll() and select():
select() only supports a fixed number of file descriptors
select() is supposedly supported on more systems
poll() allows slightly more fine-grained control of event types
poll() implementations may differ in certain details
However, they both accomplish the same task in roughly the same way. So:
Shall we use poll() or select()?
EDIT: I might add that I'm not interested in epoll() since portability is of concern to me. Furthermore, libev(ent) is not an option either, since I'm asking this question because I'm writing my own replacement library for libev(ent).
All remotely modern systems have poll, and it's a greatly superior interface to select/pselect in almost all ways:
poll allows more fine-grained detection of status than select.
poll does not have limits on the max file descriptor you can use (and more importantly, does not have critical vulnerabilities when you fail to check for file descriptors past the FD_SETSIZE limit).
The only disadvantages I can think of to using poll are that:
unlike pselect, poll cannot atomically unmask/mask signals, so you can't use it to wait for a set of events that includes both file descriptor activity and signals unless you resort to the self-pipe trick (see the sketch after this list).
poll only has millisecond resolution for the wait timeout, rather than microsecond (select) or nanosecond (pselect).
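For the self-pipe trick mentioned in the list above, a minimal sketch might be (names mine; error handling omitted): the signal handler writes a byte into a pipe whose read end sits in the poll set, turning signal delivery into ordinary descriptor readability.

#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <unistd.h>

static int sig_pipe[2];  // [0] = read end, [1] = write end

static void on_signal(int) {
    char b = 1;
    (void)write(sig_pipe[1], &b, 1);  // write() is async-signal-safe
}

int main() {
    pipe(sig_pipe);
    fcntl(sig_pipe[1], F_SETFL, O_NONBLOCK);  // never block in the handler

    struct sigaction sa{};
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGTERM, &sa, nullptr);

    pollfd fds[2] = {
        {sig_pipe[0], POLLIN, 0},    // signal arrivals show up here
        {STDIN_FILENO, POLLIN, 0},   // stand-in for a real socket
    };
    for (;;) {
        poll(fds, 2, -1);
        if (fds[0].revents & POLLIN) {
            char buf[64];
            read(sig_pipe[0], buf, sizeof buf);  // drain, then handle signal
            break;
        }
        // handle the other descriptor(s) here
    }
}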
Certainly portability of poll is not a consideration anymore. Any system old enough to lack poll is full of so many vulnerabilities it should not be connected to a network.
In summary, unless you have very special needs (tiny timeout intervals, nasty signal interactions, scaling to millions of persistent connections, etc.) I would simply use poll and be done with it. As others have mentioned, libevent is also an option, but it's not clean/safe code (its use of select actually invokes dangerous UB in trying to work around select's limitations!) and I find code that uses libevent is generally a lot more unnecessarily complicated than code that simply uses poll directly.
If you are writing for GNU/Linux, you should look at epoll(7).
But for broader cross-platform support, you could look into using libevent.
http://libevent.org/
Actually, it is hard to recommend a single poll/select implementation without knowing the specifics of what you are trying to do.
I would actually recommend boost::asio, then you can try both implementations and test to see what suits your setup best.
I would use libev or libevent. These libraries are cross-platform and abstract away the details of the underlying implementation (e.g. poll, select.)
Depending on your exact needs, I would recommend either poll or ::boost::asio. I find libevent to be kind of cumbersome, and it has all kinds of stuff in it that's oriented towards C and/or towards higher-level protocol handling.
I would not recommend select. I have seen implementations of select invisibly fail in weird and bizarre ways because the descriptor limit was exceeded. And the best you can do is to make it fail in an obvious way. Maybe this is very unlikely with your application, but I wouldn't chance it.
And nowadays poll is available almost everywhere select is. About the only place it isn't is Windows. But, IMHO, if you want cross-platform portability to that platform you are better off using a nice wrapper like ::boost::asio that nicely wraps the most efficient OS technique.
Apple's poll() has trouble with TTYs, IME. Where portability is a concern select() might therefore be a better choice.