Multiuser chat server in C++

I am building a chat server (which allows private messages between users) in C++, just as a challenge for myself, and I've hit a dead end: I don't know which design would be better.
By the way, I am quite new to C++; that's why I want a challenge. So if there are other, better approaches (multithreading, etc.), please let me know.
Option A
I have a C++ application running that holds an array of sockets. On every iteration of a loop (a 1-second loop, I guess) it reads all the input by looping through all the sockets and stores it in a DB (a log is required); after that, it loops over all the sockets again, sending whatever each socket needs.
Pros: One single process, contained. Easy to develop.
Cons: It seems hard to scale, and it's a single point of failure. I mean, what about performance with 20k sockets?
Option B
I have a c++ application listening to connections.
When a connection is received, it forks a subprocess that handles that socket: it reads all of the user's input and saves it to a DB, and on every loop iteration it checks the DB for any output that needs to be written to the socket.
Pros: If the daemon is small enough, having a process per socket is likely more scalable. Also, if one process fails, all the others stay online.
Cons: Harder to develop. It may consume too many resources to maintain a process for each connection.
What option do you think is the best? Any other idea or suggestion is welcome :)

As mentioned in the comments, there is an additional alternative, which is to use select() or poll() (or, if you don't mind making your application platform-specific, something like epoll()). Personally I would suggest poll() because I find it more convenient, but older versions of Windows only provide select() (WSAPoll() was added in Windows Vista). I don't know whether running on Windows is important to you.
The basic approach here is that you first add all your sockets (including a listen socket, if you're listening for connections) to a structure and then call select() or poll() as appropriate. This call blocks your application until at least one of the sockets has some data to read; then you get woken up, go through the socket(s) that are ready for reading, process the data, and jump back into blocking again. You generally do this in a loop, something like:
while (running) {
    int rc = poll(...);  // blocks until at least one descriptor is ready
    // Handle active file descriptors here.
}
This is a great way to write an application which is primarily IO-bound - i.e. it spends much more time handling network (or disk) traffic than it does actually processing the data with the CPU.
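As a rough illustration, here is a minimal sketch of the body of such a loop, assuming a POSIX system. The helper name wait_readable is hypothetical, and pipes stand in for sockets in the test so it runs without a network. With a timeout of -1, poll() sleeps until at least one descriptor is ready, so idle connections cost no CPU.

```cpp
#include <poll.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <vector>

// Hypothetical helper: block until at least one fd is readable (or timeout_ms
// elapses; -1 means wait forever) and return the fds that are ready.
std::vector<int> wait_readable(const std::vector<int>& fds, int timeout_ms) {
    std::vector<pollfd> pfds;
    pfds.reserve(fds.size());
    for (int fd : fds) {
        pfds.push_back(pollfd{fd, POLLIN, 0});
    }

    int rc;
    do {
        rc = poll(pfds.data(), pfds.size(), timeout_ms);
    } while (rc < 0 && errno == EINTR);  // retry if interrupted by a signal

    std::vector<int> ready;
    if (rc <= 0) return ready;  // 0 = timeout, -1 = error
    for (const pollfd& p : pfds) {
        // POLLHUP/POLLERR also mean "call read() now": it will return 0 or an error.
        if (p.revents & (POLLIN | POLLHUP | POLLERR)) ready.push_back(p.fd);
    }
    return ready;
}
```

In the server loop you would call something like `for (int fd : wait_readable(fds, -1))` and then accept() on the listen socket or recv() on a client socket, depending on which one came back.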
As also mentioned in the comments, another approach is to spawn a thread per connection. This is quite effective, and you can use simple blocking IO in each thread to read and write to that connection. Personally I would advise against this approach for several reasons, most of which are largely personal preference.
Firstly, it's fiddly to handle connections where you need to write large amounts of data at a time. A single send() on a socket isn't guaranteed to write all pending data at once (i.e. the amount it sent may be less than the amount you requested). In this case you have to buffer up the pending data locally and wait until there's room in the socket to send it. This means that at any given time you might be waiting for two conditions: either the socket is ready to send, or the socket is ready to read. You could, of course, avoid reading from the socket until all the pending data is sent, but this introduces latency into handling the data. Or you could use select() or poll() on just that connection, but if so, why bother using threads at all? Just handle all the connections that way. You could also use two threads per connection, one for reading and one for writing, which is probably the best approach if you're not confident that you can always send every message in a single call, although this doubles the number of threads you need, which could make your code more complicated and slightly increase resource usage.
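A minimal sketch of the "buffer it up locally" step, assuming a non-blocking descriptor and POSIX write() (on a socket, send() behaves the same way). The names flush_pending and set_nonblocking are hypothetical. Whatever is left in `pending` would be retried once poll() or select() reports the descriptor as writable (POLLOUT).

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <string>

// Put an fd into non-blocking mode so write() returns EAGAIN instead of blocking.
bool set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    return flags >= 0 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}

// Write as much of `pending` as the kernel accepts right now and erase what was
// sent. Leftover bytes stay in `pending` for a later retry.
// Returns false on a real error (the connection should be closed).
bool flush_pending(int fd, std::string& pending) {
    while (!pending.empty()) {
        ssize_t n = write(fd, pending.data(), pending.size());
        if (n > 0) {
            pending.erase(0, static_cast<size_t>(n));  // partial write: keep the rest
        } else if (n < 0 && errno == EINTR) {
            continue;                                   // interrupted, just retry
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return true;                                // kernel buffer full, wait for POLLOUT
        } else {
            return false;
        }
    }
    return true;
}
```

The caller checks `pending.empty()` afterwards: if data remains, it adds POLLOUT to that descriptor's events and calls flush_pending() again when poll() says there is room.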
Secondly, if you plan to handle many connections, or a high connection turnover, threads are somewhat more of a load on the system than using select() or friends. This isn't a particularly big deal in most cases, but it's a factor for larger applications. This probably isn't a practical issue unless you were writing something like a webserver that was handling hundreds of requests a second, but I thought it was relevant to mention for reference. If you're writing something of this scale you'd likely end up using a hybrid approach anyway, where you multiplexed some combination of processes, threads and non-blocking IO on top of each other.
Thirdly, some programmers find threads complicated to deal with. You need to be very careful to make all your shared data structures thread-safe, either with exclusive locking (mutexes) or by using someone else's library code that does this for you. There are a lot of examples and libraries out there to help you with this; I'm just pointing out that care is needed, and whether multithreaded coding suits you is a matter of taste. It's relatively easy to forget to lock something and have your code work fine in testing because the threads don't happen to contend for that data structure, and then hit hard-to-diagnose issues when this happens under higher load in the real world. With care and discipline it's not too hard to write robust multithreaded code, and I have no objection to it (though opinions vary), but you should be aware of the care required. To some extent this applies to writing any software, of course; it's just a matter of degree.
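To make the locking concrete, here is a minimal sketch of one shared structure done carefully: a hypothetical MessageQueue that per-connection threads could push into and a dispatcher thread could pop from. Every access to the underlying deque happens with the mutex held; dropping the lock_guard in push() is exactly the kind of bug that can pass light testing and fail under load.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical thread-safe queue: every access to items_ happens with mu_ held.
class MessageQueue {
public:
    void push(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            items_.push_back(std::move(msg));
        }
        cv_.notify_one();  // wake one waiting consumer
    }

    // Blocks until a message is available, then removes and returns it.
    std::string pop() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return !items_.empty(); });
        std::string msg = std::move(items_.front());
        items_.pop_front();
        return msg;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<std::string> items_;
};
```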
Those issues aside, threads are quite a reasonable approach for many applications and some people seem to find them easier to deal with than non-blocking IO with select().
As to your approaches, A will work but is wasteful of CPU because you have to wake up every second regardless of whether there's actual useful work to do. Also, you introduce up to a second's delay in handling messages, which could be irritating for a chat server. In general I would suggest that something like select() is a much better approach than this.
Option B could work, although when you want to send messages between connections you're going to have to use something like pipes to communicate between processes, and that's a bit of a pain. You'll end up having to wait on both your incoming pipe (for data to send) and the socket (for data to receive), so you effectively end up with the same problem: having to wait on two file handles with something like select() or threads. Really, as others have said, threads are the right way to process each connection separately. Separate processes are also a little more expensive in resources than threads (although on platforms such as Linux the copy-on-write approach to fork() means it's not actually too bad).
For small applications with only, say, tens of connections there's not an awful lot technically to choose between threads and processes, it largely depends on which style appeals to you more. I would personally use non-blocking IO (some people call this asynchronous IO, but that's not how I would use the term) and I've written quite a lot of code that does that as well as lots of multithreaded code, but it's still only my personal opinion really.
Finally, if you want to write portable non-blocking IO loops, I strongly suggest investigating libev (or possibly libevent, but personally I find the former easier to use and more performant). These libraries use different primitives such as select() and poll() on different platforms, so your code can remain the same, and they also tend to offer slightly more convenient interfaces.
If you have any more questions on any of that, feel free to ask.

Related

Use case for async IO when dealing with one socket and multiple threads

I'm struggling to see why some programmers recommend asynchronous IO when there is only one socket, as in the common case of UDP. This is directed mostly at ASIO, which is the basis of the Networking TS proposed for standard C++, but it applies generally to any async library.
Is there a use case where it makes sense? I can't see how the performance would ever be better than two threads: one blocking on read (then queueing packets for a thread pool) and one blocking on write, with a condition variable waiting for packets to send, preferably using the multi-packet functions there to avoid operating system overhead.
Is there anything in the pipeline to help the efficiency of UDP or single-socket TCP in ASIO? Pretty much all the ASIO examples read and write sequentially, i.e. you only start another read or write in the handler for the previous one. So there is very little benefit gained per socket, and certainly nothing better than dedicated receive and send threads for those examples when dealing with only one socket.
Am I missing something here?
Generally speaking, ASIO might yield worse performance than multithreading. With true multithreading and multiple cores (standard nowadays) you have the chance of serving two clients at exactly the same time, which will never happen with a single-threaded ASIO model. However, if, for instance, your tasks are IO-bound, use a common resource with synchronized access (a single-threaded DB, for instance), or are subject to other locks, much of the benefit of multithreading will vanish.
On the other hand, the single-threaded ASIO model is much simpler, doesn't require any synchronization, and allows you to compile the program in single-threaded mode (for example, increasing the performance of memory allocation, eliminating the need for atomic access, etc.). In many scenarios those benefits outweigh the drawbacks.

Is there a better way to use asynchronous TCP sockets in C++ rather than poll or select?

I recently started writing some C++ code that uses sockets, which I'd like to be asynchronous. I've read many posts about how poll and select can be used to make my sockets asynchronous (using poll or select to wait for a send or recv buffer), but on my server side I have an array of struct pollfd, where every time the listening socket accepts a connection, it adds it to the array of struct pollfd so that it can monitor that socket's recv (POLLIN).
My problem is that if I have 5000 sockets connected to my listening socket on my server, the array of struct pollfd will have 5000 entries, since it monitors all the connected sockets. But the only way I know to check whether a socket is ready for recv is to loop through all the items in the array to find the ones whose revents has POLLIN set. This seems inefficient when the number of connected sockets becomes very large. Is there a better way to do this?
How does the boost::asio library handle async_accept, async_send, etc...? How should I handle it?
What the heck, I will go ahead and write up an answer.
I am going to ignore the "asynchronous" vs "non-blocking" terminology because I believe it is irrelevant to your question.
You are worried about performance when handling thousands of network clients, and you are right to be worried. You have rediscovered the C10K problem. Back when the Web was young, people saw a need for a small number of fast servers to handle a large number of (relatively) slow clients. The existing select/poll type interfaces require linear scans -- in both kernel and user space -- across all sockets to determine which are ready. If many sockets are often idle, your server can wind up spending more time figuring out what work to do than doing actual work.
Fast-forward to today, where we have basically two approaches for dealing with this problem:
1) Use one thread per socket and just issue blocking reads and writes. This is usually the simplest to code, in my opinion, and modern operating systems are quite good at letting idle threads sleep peacefully out of the way without imposing any significant performance overhead. In my experience, this approach works very well for hundreds of clients; I cannot personally say how it will work for thousands.
2) Use one of the platform-specific interfaces that were introduced to tackle the C10K problem. That means epoll (Linux), kqueue (BSD/Mac), or completion ports (Windows). (If you think epoll is the same as poll, look again.) All of these will only notify your application about sockets that are actually ready, avoiding the wasteful linear scan across idle connections. There are several libraries that make these platform-specific interfaces easier to use, including libevent, libev, and Boost.Asio. You will find that all of them ultimately invoke epoll on Linux, kqueue on BSD, and so on, whenever such interfaces are available.
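For comparison with the poll() scan in the question, here is a minimal Linux-only sketch of the epoll approach. ReadyWatcher is a hypothetical wrapper name; epoll_create1(), epoll_ctl() and epoll_wait() are the real Linux calls. Descriptors are registered once, and each wait returns only the ones that are ready, so the application never loops over idle connections. Pipes stand in for sockets in the test.

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <cassert>
#include <vector>

// Hypothetical RAII wrapper around an epoll instance (Linux only).
class ReadyWatcher {
public:
    ReadyWatcher() : epfd_(epoll_create1(0)) {}
    ~ReadyWatcher() { if (epfd_ >= 0) close(epfd_); }
    ReadyWatcher(const ReadyWatcher&) = delete;
    ReadyWatcher& operator=(const ReadyWatcher&) = delete;

    bool ok() const { return epfd_ >= 0; }

    // Register fd once; the kernel remembers it between waits.
    bool add(int fd) {
        epoll_event ev{};
        ev.events = EPOLLIN;
        ev.data.fd = fd;  // handed back to us by epoll_wait()
        return epoll_ctl(epfd_, EPOLL_CTL_ADD, fd, &ev) == 0;
    }

    // Wait up to timeout_ms (-1 = forever); returns at most max_events ready fds.
    std::vector<int> wait(int timeout_ms, int max_events = 64) {
        std::vector<epoll_event> events(max_events);
        int n = epoll_wait(epfd_, events.data(), max_events, timeout_ms);
        std::vector<int> ready;
        for (int i = 0; i < n; ++i) ready.push_back(events[i].data.fd);  // n < 0 on error: empty
        return ready;
    }

private:
    int epfd_;
};
```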

Should I use multiple threads for a multi socket client?

I understand that in most cases using threads in Qt networking is overkill and unnecessary, especially if you do it the proper way and use the readyRead() signal. However, my "client" application will have multiple sockets open (about 5) at one time. It is possible for data to come in on all sockets at the same time. I am really not going to be doing any intense processing with the incoming data: simply reading it in and then emitting a signal to update the GUI with the newly received data. Do you think a single-threaded application should be able to handle all of the incoming data?
I understand that I haven't shown you any code and that my description is pretty vague and it could very well depend on how it performs once implemented, but from a general design perspective and your guys' expertise, what is your opinion?
Unless you are receiving really high-bandwidth streams (e.g. megabytes per second rather than kilobytes per second), a single-threaded design should be sufficient. Keep in mind that the OS's networking stack is running "in the background" at all times, receiving TCP packets and storing the received data inside fixed-size in-kernel memory buffers. This happens in parallel with your program's execution, so in most cases the fact that your program is single-threaded and busy dealing with a GUI update (or another socket) won't hamper your computer's reception of TCP packets.
The case where a single-threaded design would cause a slowdown of TCP traffic is if your program (via Qt) didn't call recv() quickly enough, so that the kernel's TCP receive buffer for a socket became entirely full. At that point TCP flow control kicks in: the kernel advertises a zero receive window, the server has to stop sending until your program drains the buffer, and the socket's receive rate drops, at least temporarily. However, that problem can be avoided by making sure the buffers never (or at least rarely) get full.
The obvious way to do that is to ensure that your program reads all of the incoming data as quickly as possible, which QTcpSocket does by default. The only thing you need to do is make sure that your GUI updates don't take an inordinate amount of time, and Qt's widget-update routines are fairly efficient, so they shouldn't, unless you have a really elaborate GUI or an inefficient custom paintEvent() routine or the like.
If that's not sufficient, the next thing you could do (if necessary) is tell the OS's TCP stack to increase the size of its in-kernel TCP receive buffer, e.g. by doing:
#include <sys/socket.h> // setsockopt(), SOL_SOCKET, SO_RCVBUF
#include <cstdio>       // perror()
int fd = static_cast<int>(myQTCPSocketObject.socketDescriptor()); // QTcpSocket::socketDescriptor()
int newBufSizeBytes = 128*1024; // request 128kB kernel recv-buffer for this socket
if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &newBufSizeBytes, sizeof(newBufSizeBytes)) != 0) perror("setsockopt");
Doing that would give your (single) thread more time to react before the receive buffer fills up and TCP flow control starts throttling the sender.
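The same setsockopt() call works on any plain POSIX socket. Here is a minimal sketch, with a hypothetical helper set_rcvbuf, that also reads the value back with getsockopt(), since the kernel may not grant exactly what you asked for.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cstdio>

// Request a receive buffer of `bytes` on `fd` and return the size the kernel
// actually granted, or -1 on error. The two can differ: Linux, for example,
// caps the request at net.core.rmem_max and then doubles it for bookkeeping.
int set_rcvbuf(int fd, int bytes) {
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) != 0) {
        perror("setsockopt");
        return -1;
    }
    int actual = 0;
    socklen_t len = sizeof(actual);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) != 0) {
        perror("getsockopt");
        return -1;
    }
    return actual;
}
```

Logging the returned value is a cheap way to confirm the change took effect before concluding that more threads are needed.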
If, after trying all that, you still aren't getting the network performance you need, then you can try going multithreaded. I doubt it will come to that, but if it does, it needn't affect your program's design too much: you'd just write a wrapper class (called SocketThread or something) that holds your QTcpSocket object, runs an internal thread that handles the reading from the socket, and emits a bytesReceived(QByteArray) signal whenever the thread reads data from the socket. The rest of your code would remain approximately the same; just modify it to hold the SocketThread object instead of a QTcpSocket, connect the SocketThread's bytesReceived(QByteArray) signal to a corresponding slot (via a Qt::QueuedConnection, of course, for thread safety), and use that instead of responding directly to readyRead().
Implement it without threads, using a thread-considerate design(*), measure the delay your data experiences, decide if it is within acceptable bounds. Then decide if you need to use threads to capture it more rapidly.
From your description, the key bottleneck is going to be the GUI receiving the "data ready" signal and rendering the data. If you send lots of these signals, your GUI is going to be doing more re-renders.
If you use a single-thread approach, you can marshal the network reads and get all the updates and then refresh the GUI directly. As you've described it, this sounds like it will have the least degree of contention.
(* try to avoid constructs which will require an entire rewrite if you go threaded, but don't put so much effort into making it thread-proof that it will actually need threads to make it efficient, e.g. don't wrap everything with mutex calls)
I do not know much about Qt, but this could be a typical scenario where you use select() to multiplex multiple socket accesses with a single thread.
If the thread doing the select() is used mainly for handling the data from/to the sockets, it will be very fast (as there will be fewer context switches). So unless you are transferring really huge amounts of data, it is quite likely that a single-threaded solution will be faster.
That being said, I would go with the solution that best fits your needs, something you can implement in a reasonable amount of time. Implementing select() (async) can be quite a hassle, an overkill that might not be needed.
It's a C-like approach, but I hope it helps anyway.
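A minimal sketch of what the select() version looks like, assuming POSIX select(); select_readable is a hypothetical name. The fd_set has to be rebuilt before every call, because select() overwrites it with the result, and descriptors numbered FD_SETSIZE or above cannot be placed in an fd_set at all.

```cpp
#include <sys/select.h>
#include <unistd.h>
#include <cassert>
#include <vector>

// Wait up to timeout_ms (>= 0) for any of `fds` to become readable.
// Returns the readable fds; returns an empty vector on timeout, on error, or if
// any fd is outside [0, FD_SETSIZE), which an fd_set cannot represent.
std::vector<int> select_readable(const std::vector<int>& fds, int timeout_ms) {
    fd_set readset;
    FD_ZERO(&readset);  // rebuilt on every call: select() overwrites it
    int maxfd = -1;
    for (int fd : fds) {
        if (fd < 0 || fd >= FD_SETSIZE) return {};  // FD_SET here would be undefined behavior
        FD_SET(fd, &readset);
        if (fd > maxfd) maxfd = fd;
    }

    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    std::vector<int> ready;
    if (select(maxfd + 1, &readset, nullptr, nullptr, &tv) <= 0) return ready;
    for (int fd : fds) {
        if (FD_ISSET(fd, &readset)) ready.push_back(fd);
    }
    return ready;
}
```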

Designing a multi-client tcp server to process data

I am attempting to rewrite my current project to include more features and stability, and need some help designing it. Here is the gist of it (for Linux):
TCP_SERVER receives connection (auth packet)
TCP_SERVER starts a new (thread/fork) to handle the new client
TCP_SERVER will be receiving many packets from the client, which will be added to a circular buffer
A separate thread will be created for that client to process those packets and build a list of objects
Another thread should be created to send parts of the list of objects to another client
The reason to separate all the processing into threads is that the server will be getting many packets and the processing won't be able to keep up (and it needs to be quick, as it's time-sensitive; I'm also not sure whether TCP will drop packets if the internal buffer gets too large). A further thread sends to the other client, to keep the processing as fast as possible.
So for each new connection, 3 threads should be created. 1 to receive packets, 1 to process them, and 1 to send the processed data to another client (which is technically the same person/ip just on a different device)
I need help designing this: how to structure it, what to use (forks/threads), and what libraries to use.
Trying to do this yourself is going to cause you a world of pain. Focus on your actual application, and leverage an existing socket handling framework. For example, you said:
for each new connection, 3 threads should be created
That statement says the following:
1. You haven't done this before, at scale, and haven't realized the impact all these threads will have.
2. You've never benchmarked thread creation or synchronous operations.
3. The number of things that can go wrong with this approach is pretty overwhelming.
Give some serious thought to using an existing library that does most of this for you. Getting the scaffolding right around this can literally take years, and you're better off focusing on your code rather than all the random plumbing.
The Boost C++ libraries seem to have a nice Async C++ socket handling infrastructure. Combine this with some of the existing C++ thread pools and you could likely have a highly performant solution up fairly quickly.
I would also question your use of C++ for this. Java and C# both do highly scalable socket servers pretty well, and some of the higher-level language tooling (Spring, Guava, etc.) can be very, very valuable. If you ever want to secure this via TLS or another mechanism, you'll also probably find that much easier in Java or C# than in C++.
Some of the major things you'll care about:
1. True Async I/O will be a huge perf and scalability win. Try really hard to do this. The boost asio library looks pretty nice.
2. Focus on your features and stability, rather than building a new socket handling platform.
3. Threads are expensive, avoid creating them. Thread pools are your friend.
You plan to create one-or-more threads for every connection your server handles. Threads are not free, they come with a memory and CPU overhead, and when you have many active threads you also begin to have resource contention.
What usage pattern do you anticipate? Do you expect that when you have 8 connections, all 8 network threads will be consuming 100% of a cpu core pushing/pulling packets? Or do you expect them to have a relatively low turn-around?
As you add more threads, you will begin to have to spend more time competing for resources in things like mutexes etc.
A better pattern is to have one or more threads for network IO: most OSes have mechanisms for saying "tell me when one or more of these network connections has IO", which is an efficiency saving over having lots of individual threads all doing the same thing for just one connection each.
Then for actual processing, spin up a pool of worker threads to do actual work, allowing you to minimize the competition for resources. You can monitor work load to determine if you need to spin up more to meet delivery requirements.
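A minimal sketch of such a worker pool in standard C++ (WorkerPool is a hypothetical name, not a library class). A fixed number of threads is created up front and reused for every task, so the cost of thread creation is paid once rather than per connection; the network thread(s) would call submit() with the processing work for each received message.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: workers are created once and reused for all tasks.
class WorkerPool {
public:
    explicit WorkerPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i) workers_.emplace_back([this] { run(); });
    }

    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (std::thread& t : workers_) t.join();  // queued tasks are drained first
    }

    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can dequeue
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```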
You might also want to look into something to implement the network IO infrastructure for you; I've had really good performance results with libevent but then I've only had to deal with very high performance/reliability networking systems.

Handling more than 1024 sockets?

I'm working on an MMO game server project and I have a problem: the limit of the select() method. I want to handle I/O on more than 1024 sockets with a single thread. I want to do this with a single thread because I've already tried a multi-threaded handling system. That system creates 3 threads that call select() (for example, on a 4-core processor: 1 main thread and 3 select() handlers), but there is another problem: now the limit has only gone up to 3072 (1024 * 3), and that isn't a solution! After that, I wanted to make a non-blocking socket system where I call 2 different select() methods in a single thread, like "select() select()". They return in order and I can handle them in order. But I think there is another problem. If I implement a thread like "while(true){ select() select() }" and the (non-blocking) select() calls return immediately, I'll overload the CPU just like an empty "while(true)" block. If I give select() a timeout, I can't handle the second select() in real time. Now I can't come up with an algorithm for that. Can anybody help me with this?
NOTE: I don't want to use poll/epoll/WSAPoll etc. (poll cannot handle microseconds, and it isn't as fast as select!) or third-party libraries like libevent (I want to make my own!)
FINAL SOLUTION (I think): I don't need to handle nanoseconds for an I/O operation, because there is no point in doing so. Poll is a good way to handle I/O on more than 1024 sockets. I'll do some research to understand MMO systems. And lastly, I'll run some tests and try some things before I ask a question :) Thanks!
EDIT: I'm new to this Q&A platform. Can you tell me what's wrong with my question when you downvote it? :)
Using select is fundamentally wrong with this many (thousands of) connections. While select is usually faster when you have only a very small number of sockets (maybe tens), it scales horribly to several thousand and more. Everywhere I know of, select slows down linearly with the number of connections (it's even worse than that, but I won't go into the details).
Even poll doesn't do much better than select at scaling to thousands of connections. It doesn't have select's (low) limit on the number of file descriptors you can poll, but it still scales linearly with the number of connections.
What you really should use are platform-specific facilities like epoll and kqueue. They scale much better (the cost of a wait is usually independent of the number of idle connections), but obviously they aren't portable.
I seriously suggest that you consider something like libev, which is a portable, well-tested, thin wrapper around platform-specific facilities and services.
This is because the underlying mechanisms (e.g. select, poll, epoll, kqueue, I/O completion ports, event ports, etc.) are different from each other: the scalable ones are each available on only one or two platforms, and even where a mechanism is available, its limits and the details of its behavior differ slightly. These facilities might even change from one version of an OS to the next (e.g. epoll on Linux 2.6.9, IIRC).
Even if you are not concerned with portability or future-proofing your code, such a library can provide you with more functionality and a nicer interface.
Two more libraries you can try are libevent (a little larger and slower, but more features) and libuv (if you need Windows portability.)
Given the requirements you have set, your problem has no solution.
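The normal way to overcome select()'s limit of FD_SETSIZE (usually 1024) file descriptors is to use poll() (or the even better alternatives epoll and kqueue), but you've rejected that option. (Strictly speaking, select() can't handle any descriptor whose numeric value is FD_SETSIZE or higher, regardless of how many you pass.)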
Otherwise, you could always overcome the problem by calling select() multiple times in parallel in different threads with different sets of file descriptors... but you've rejected that option too.
I don't believe there can really be any other solution!
Perhaps you should explain why both the poll() et al option and the thread option are not suitable. Your requirements seem like artificial limitations without justification.