Reading data from multiple TCP connections - C++

Consider a situation where you have 200 detectors connected to your program through TCP sockets. They send data quite frequently, and I would like to handle it as efficiently as possible.
I can think of two approaches to this problem, but I'm quite new to Qt, so I don't know which one is better, if either.
1. Create a thread pool that runs 200 objects derived from QRunnable. Each object consists of a socket and slots connected to that socket's signals, so that all data concerning one detector is handled in that one object. (Its run() method will contain a QEventLoop.)
2. Create 200 objects, each consisting of a socket, and connect the signals of those 200 sockets to one slot in the main thread, so that one slot handles the data from all 200 detectors.
Which approach would be better, considering that the first one creates 200 QEventLoops (one per object)?

There is no need to get down to epoll directly. You could use something dedicated like uvw, for example.

I think any solution can work, although I definitely recommend avoiding a thread-per-connection solution, as 200 threads is about 198 threads too many and would not be very efficient.
The way I would do it is to create one thread and run a select() (or poll() or epoll() or whatever) event loop inside that thread to handle the 200 TCP connections there, using non-blocking I/O. When data arrives in that thread, that thread can parse the data into the appropriate chunks and then send the parsed/assembled data on to the main thread (if necessary) via a Queued signal/slot connection (or qApp->postEvent() if you prefer doing it that way). (Doing the networking in a separate thread would help prevent GUI actions from interfering with network performance, and vice-versa)
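A minimal sketch of one pass of such an event loop, assuming POSIX poll() and descriptors switched to non-blocking mode. The names set_non_blocking and poll_once and the per-connection std::string buffers are illustrative choices, not something from the answer above; parsing and the hand-off to the main thread are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

// Put a descriptor into non-blocking mode so read() never stalls the loop.
inline bool set_non_blocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// One pass of the loop: wait up to timeout_ms for any descriptor to become
// readable, then drain every ready descriptor into its own buffer.
// buffers[i] accumulates the bytes received on fds[i].
// Returns how many descriptors delivered data, 0 on timeout, -1 on poll() error.
inline int poll_once(const std::vector<int>& fds,
                     std::vector<std::string>& buffers,
                     int timeout_ms) {
    assert(fds.size() == buffers.size());
    std::vector<pollfd> pfds;
    pfds.reserve(fds.size());
    for (int fd : fds) {
        pollfd p{};
        p.fd = fd;
        p.events = POLLIN;
        pfds.push_back(p);
    }

    int ready = poll(pfds.data(), static_cast<nfds_t>(pfds.size()), timeout_ms);
    if (ready <= 0) return ready;

    int delivered = 0;
    for (std::size_t i = 0; i < pfds.size(); ++i) {
        if (!(pfds[i].revents & POLLIN)) continue;
        char chunk[4096];
        bool got_data = false;
        for (;;) {
            ssize_t n = read(pfds[i].fd, chunk, sizeof chunk);
            if (n <= 0) break;  // 0: peer closed; -1: drained (EAGAIN) or a real error
            buffers[i].append(chunk, static_cast<std::size_t>(n));
            got_data = true;
        }
        if (got_data) ++delivered;
    }
    return delivered;
}
```

A real network thread would call poll_once in a loop, cut complete messages out of each buffer, and forward them to the main thread; with 200 connections the pollfd array would normally be built once and reused rather than rebuilt on every pass.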
Creating ~200 QTcpSocket objects in the network thread (and having the network thread run the Qt-standard QEventLoop to handle them) would also probably work well; the last time I tried that I ran into some performance issues with Qt's implementation of networking on some platforms, but that was back in the Qt4 days so I'm optimistic that Qt has improved their implementation's efficiency since then.

In all cases, you don't want more threads than there are logical processor cores. Distribute the objects across the threads. Using a QRunnable that spins an eventloop is fairly pointless, even if I admit to demonstrating it in a SO answer on someone's request. Eventloops aren't cheap either - each takes a few kilobytes of stack, at least on my platform. Thus it's better to just use QThread that has a single eventloop per thread, and then distribute the network objects across the threads in a round-robin fashion.
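The round-robin distribution can be sketched in plain standard C++, independent of Qt. The function names worker_count and assign_round_robin are made up for this illustration; the sketch only computes which worker owns which object index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Number of worker threads: one per logical core, at least one, and never
// more than there are objects to serve.
inline std::size_t worker_count(std::size_t object_count) {
    std::size_t cores = std::thread::hardware_concurrency();  // may report 0
    if (cores == 0) cores = 1;
    return std::max<std::size_t>(1, std::min(cores, object_count));
}

// assignment[t] lists the indices of the objects owned by worker t.
inline std::vector<std::vector<std::size_t>>
assign_round_robin(std::size_t object_count, std::size_t workers) {
    assert(workers > 0);
    std::vector<std::vector<std::size_t>> assignment(workers);
    for (std::size_t i = 0; i < object_count; ++i)
        assignment[i % workers].push_back(i);
    return assignment;
}
```

In a Qt program, each socket-owning object would then presumably be handed to the QThread at its assigned index with QObject::moveToThread(), so that its slots run in that thread's single event loop.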

Related

Boost Asio: sharing the same io_service among disposable objects

I'm developing a program that consists of a bunch of Active Objects that send messages to each other. I'm using a single io_service to initialize all of these objects, and they keep working until the program exits.
I'm using the Active Objects for, let's say, file operations, serial I/O, a local database connection, and one more to coordinate communication between all of these.
However, I'm not sure about objects with short lives. I use short-lived objects to open a TCP socket, send a quick message to a remote endpoint, and then dispose of the socket immediately. I'm thinking of making these asynchronous as well.
The question is: should I use the same io_service for these short-lived objects, or should I create a new io_service for each socket?
"I'm developing a program that consists of a bunch of Active Objects that send messages to each other. I'm using a single io_service to initialize all of these objects, and they keep working until the program exits."
Sounds like a good fit. I would recommend using Chris Kohlhoff's recipe if you need operations to be more efficient on machines with multiple processors.
"However, I'm not sure about objects with short lives. I use short-lived objects to open a TCP socket, send a quick message to a remote endpoint, and then dispose of the socket immediately. I'm thinking of making these asynchronous as well."
There's nothing wrong with having few(er) long-lived asio io_service objects (e.g. you could create as many io_services as there are processors on the machine) and short-lived objects that use them. I would say this is more efficient as well, since you don't have to fire up a thread to call io_service::run on each (short-lived?) io_service, and you avoid unnecessary context switching.
Making the sockets asynchronous is also needed if you want/need to avoid blocking in your thread(s), especially if there are network issues, etc.

Boost.Asio and message passing between threads

I am working on designing a websocket server which receives a message and saves it to an embedded database. For reading the messages I am using boost asio. To save the messages to the embedded database I see a few options in front of me:
Save the messages synchronously as soon as I receive them over the same thread.
Save the messages asynchronously on a separate thread.
I am pretty sure the second answer is what I want. However, I am not sure how to pass messages from the socket thread to the IO thread. I see the following options:
Use one io service per thread and use the post function to communicate between threads. Here I have to worry about lock contention. Should I?
Use Unix domain sockets to pass messages between threads. No lock contention as far as I understand. Here I can probably use the BOOST_ASIO_DISABLE_THREADS macro to get some performance boost.
Also, I believe it would help to have multiple IO threads which would receive messages in a round robin fashion to save to the embedded database.
Which architecture would be the most performant? Are there any other alternatives from the ones I mentioned?
A few things to note:
The messages are exactly 8 bytes in length.
Cannot use an external database. The database must be embedded in the running process.
I am thinking about using RocksDB as the embedded database.
I don't think you want to use a unix socket, which is always going to require a system call and pass data through the kernel. That is generally more suitable as an inter-process mechanism than an inter-thread mechanism.
Unless your database API requires that all calls be made from the same thread (which I doubt) you don't have to use a separate boost::asio::io_service for it. I would instead create an io_service::strand on your existing io_service instance and use the strand::dispatch() member function (instead of io_service::post()) for any blocking database tasks. Using a strand in this manner guarantees that at most one thread may be blocked accessing the database, leaving all the other threads in your io_service instance available to service non-database tasks.
Why might this be better than using a separate io_service instance? One advantage is that having a single instance with one set of threads is slightly simpler to code and maintain. Another minor advantage is that using strand::dispatch() will execute in the current thread if it can (i.e. if no task is already running in the strand), which may avoid a context switch.
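To make the strand guarantee concrete, here is a toy stand-in written with the standard library only. It is not Boost.Asio's implementation: SerialExecutor and run_demo are hypothetical names, and unlike a real strand the dispatching thread may also end up running tasks queued by other threads. It only illustrates the two properties described above: tasks never overlap, and dispatch() runs the task on the calling thread when nothing else is running.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Toy strand: tasks passed to dispatch() never run concurrently.
class SerialExecutor {
public:
    void dispatch(std::function<void()> task) {
        assert(static_cast<bool>(task));
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(task));
            if (draining_) return;  // the thread already inside will run it
            draining_ = true;       // nobody inside: run it on this thread
        }
        for (;;) {
            std::function<void()> next;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (queue_.empty()) { draining_ = false; return; }
                next = std::move(queue_.front());
                queue_.pop_front();
            }
            next();  // outside the lock, but never alongside another task
        }
    }

private:
    std::mutex mutex_;
    std::deque<std::function<void()>> queue_;
    bool draining_ = false;
};

// Hammer one executor from several threads.
// Returns {tasks executed, maximum number of tasks observed running at once}.
inline std::pair<int, int> run_demo(int thread_count, int tasks_per_thread) {
    SerialExecutor strand;
    int done = 0, inside = 0, max_inside = 0;  // only touched inside tasks
    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t)
        threads.emplace_back([&] {
            for (int i = 0; i < tasks_per_thread; ++i)
                strand.dispatch([&] {
                    ++inside;
                    if (inside > max_inside) max_inside = inside;
                    ++done;  // stands in for a blocking database write
                    --inside;
                });
        });
    for (auto& th : threads) th.join();
    return {done, max_inside};
}
```

Because the counters are plain ints that are only modified inside tasks, the demo would be a data race if two tasks ever overlapped; the serialization is what makes it safe.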
For the ultimate optimization I would agree that using a specialized queue whose enqueue operation cannot make a system call could be fastest. But given that you have network i/o by producers and disk i/o by consumers, I don't see how the implementation of the queue is going to be your bottleneck.
After benchmarking/profiling I found the Facebook Folly implementation of MPMC Queue to be the fastest by at least a 50% margin. If I use the non-blocking write method, then the socket thread has almost no overhead and the IO threads remain busy. The number of system calls is also much lower than with other queue implementations.
The SPSC queue in Boost combined with a condition variable is slower. I am not sure why that is. It might have something to do with the adaptive spin that the Folly queue uses.
Also, message passing (UDP domain sockets in this case) turned out to be orders of magnitude slower, especially for larger messages. This might have something to do with the data being copied twice.
You probably only need one io_service -- you can create additional threads which will process events occurring within the io_service by providing boost::asio::io_service::run as the thread function. This should scale well for receiving 8-byte messages from clients over the network socket.
For storing the messages in the database, it depends on the database & interface. If it's multi-threaded, then you might as well just send each message to the DB from the thread that received it. Otherwise, I'd probably set up a boost::lockfree::queue where a single reader thread pulls items off and sends them to the database, and the io_service threads append new messages to the queue when they arrive.
Is that the most efficient approach? I dunno. It's definitely simple, and gives you a baseline that you can profile if it's not fast enough for your situation. But I would recommend against designing something more complicated at first: you don't know whether you'll need it at all, and unless you know a lot about your system, it's practically impossible to say whether a complicated approach would perform any better than the simple one.
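#include <boost/lockfree/queue.hpp>
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

std::atomic<bool> Finished{false};
std::mutex cond_mutex;
std::condition_variable cond_var;

void Consumer( boost::lockfree::queue<uint64_t> &message_queue ) {
    // Connect to database...
    while (!Finished) {
        message_queue.consume_all( add_to_database ); // add_to_database is a functor that takes a message
        // Use a timed wait to avoid missing a signal (the 10 ms period is an arbitrary choice).
        // It's OK to consume_all() even if there's nothing in the queue.
        std::unique_lock<std::mutex> lock( cond_mutex );
        cond_var.wait_for( lock, std::chrono::milliseconds( 10 ) );
    }
}

void Producer( boost::lockfree::queue<uint64_t> &message_queue ) {
    while (!Finished) {
        uint64_t m = receive_from_network( ); // placeholder for the socket read
        message_queue.push( m ); // returns false if the push fails, e.g. a fixed-size queue is full
        cond_var.notify_all( );
    }
}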
Assuming that requiring C++11 is not too hard a constraint in your situation, I would try using std::async to make an asynchronous call to the embedded DB.
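A minimal sketch of that idea, assuming the database write can be wrapped in an ordinary function call. FakeStore is a hypothetical stand-in for the embedded database (a mutex-protected vector), and save_async and wait_all are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the embedded database.
class FakeStore {
public:
    void put(std::uint64_t message) {
        std::lock_guard<std::mutex> lock(mutex_);
        rows_.push_back(message);
    }
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return rows_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::uint64_t> rows_;
};

// Start the write on another thread and hand back a future for it.
// std::launch::async forces a separate thread instead of deferred execution.
inline std::future<void> save_async(FakeStore& store, std::uint64_t message) {
    return std::async(std::launch::async,
                      [&store, message] { store.put(message); });
}

// Block until every pending write has finished.
inline void wait_all(std::vector<std::future<void>>& pending) {
    for (auto& f : pending) {
        assert(f.valid());
        f.get();  // rethrows any exception thrown by the write
    }
}
```

The caller has to keep the returned future: a future obtained from std::async blocks in its destructor until the task finishes, so discarding it immediately turns the call back into a synchronous one. Launching one task per 8-byte message is also likely to be heavier than a queue with a dedicated writer thread, so this is best treated as a baseline to measure.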

C++ select()/threads or an alternative to handle multiple clients in a server

I am trying to learn C++ (with prior programming knowledge) by creating a server application with multiple clients. The server application will run on a Raspberry Pi with Debian (Raspbian). I thought this would also be a good opportunity to learn about low-level concurrent programming with threads (e.g. POSIX). Then I came across the select() function, which basically allows the use of blocking functions in a single thread to handle multiple clients, which is interesting. Some people here on StackOverflow mentioned that threads cause a lot of overhead, and select() seems to be a nice alternative.
In reality, I will have 1-3 clients connected, but I would like to keep my application flexible. As a structural design, I was thinking about a main thread starting a data thread (processing stuff non-stop) and a server thread (listening for incoming connections). Since the accept() call is blocking, the latter needs to be a separate thread. If a client connects, then I may need a separate thread for each client as well.
In the end, the worker thread will write to shared memory, and the client threads will read from there and communicate with the clients. Some people were opposed to the use of threads, but in my understanding threads are good if they are created rarely (and are long-lived) and if there are blocking function calls. For the latter there seems to be the select() function, which, used in a loop, allows multiple sockets to be handled in a single thread.
I think at least for the data processing and server accept() call, I will need 2 separate threads initiated at the beginning. I may handle all clients with select() in a single thread or separate threads. What would be the correct approach and are there smarter alternatives?
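As an illustration of the single-threaded option, the sketch below assumes POSIX sockets and passes the listening socket to select() together with the client sockets: a listening socket is reported as readable when a connection is waiting, so accept() can be called at that point without a dedicated thread. The helper names (loopback_address, listen_on_loopback, connect_to_loopback, serve_once) are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

inline sockaddr_in loopback_address(unsigned short port) {
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    return addr;
}

// Listen on 127.0.0.1; port 0 lets the kernel pick a free port, which is
// written to *port_out. Returns the listening descriptor or -1.
inline int listen_on_loopback(unsigned short* port_out) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    sockaddr_in addr = loopback_address(0);
    socklen_t len = sizeof addr;
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) != 0 ||
        listen(fd, 8) != 0 ||
        getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}

// Test helper: connect a client socket to the listener above.
inline int connect_to_loopback(unsigned short port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    sockaddr_in addr = loopback_address(port);
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// One select() pass: read from ready clients (appending to received, dropping
// closed ones), then accept a pending connection if the listener is ready.
inline void serve_once(int listener, std::vector<int>& clients,
                       std::string& received, int timeout_ms) {
    assert(listener >= 0 && listener < FD_SETSIZE);
    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(listener, &readable);
    int max_fd = listener;
    for (int c : clients) {
        FD_SET(c, &readable);
        if (c > max_fd) max_fd = c;
    }
    timeval tv{};
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (select(max_fd + 1, &readable, nullptr, nullptr, &tv) <= 0) return;

    for (std::size_t i = 0; i < clients.size();) {
        if (FD_ISSET(clients[i], &readable)) {
            char chunk[1024];
            ssize_t n = recv(clients[i], chunk, sizeof chunk, 0);
            if (n <= 0) {  // peer closed the connection, or an error occurred
                close(clients[i]);
                clients.erase(clients.begin() + static_cast<std::ptrdiff_t>(i));
                continue;
            }
            received.append(chunk, static_cast<std::size_t>(n));
        }
        ++i;
    }
    if (FD_ISSET(listener, &readable)) {
        int c = accept(listener, nullptr, nullptr);
        if (c >= 0) clients.push_back(c);
    }
}
```

With 1-3 clients, a server thread could simply call serve_once in a loop, leaving the non-stop data processing as the only other thread.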

Should I use multiple threads for a multi socket client?

I understand that for most cases using threads in Qt networking is overkill and unnecessary, especially if you do it the proper way and use the readyRead() signal. However, my "client" application will have multiple sockets open (about 5) at one time. It is possible for there to be data coming in on all sockets at the same time. I am really not going to be doing any intense processing with the incoming data. Simply reading it in and then sending out a signal to update the GUI with the newly received data. Do you think a single thread application should be able to handle all of the data coming in?
I understand that I haven't shown you any code and that my description is pretty vague and it could very well depend on how it performs once implemented, but from a general design perspective and your guys' expertise, what is your opinion?
Unless you are receiving really high-bandwidth streams (e.g. megabytes per second rather than kilobytes per second), a single-threaded design should be sufficient. Keep in mind that the OS's networking stack is running "in the background" at all times, receiving TCP packets and storing the received data inside fixed-size in-kernel memory buffers. This happens in parallel with your program's execution, so in most cases the fact that your program is single-threaded and busy dealing with a GUI update (or another socket) won't hamper your computer's reception of TCP packets.
The case where a single-threaded design would cause a slowdown of TCP traffic is if your program (via Qt) didn't call recv() quickly enough, such that the kernel's TCP-receive buffer for a socket became entirely filled with data. At that point the kernel would have no choice but to start dropping incoming TCP packets for that socket, which would cause the server to have to re-send those TCP packets, and that would cause the socket's TCP receive rate to slow down, at least temporarily. However, that problem can be avoided by making sure the buffers never (or at least rarely) get full.
The obvious way to do that is to ensure that your program reads all of the incoming data as quickly as possible -- something that QTcpSocket does by default. The only thing you need to do is make sure that your GUI updates don't take an inordinate amount of time -- and Qt's widget-update routines are fairly efficient, so they shouldn't, unless you have a really elaborate GUI, an inefficient custom paintEvent() routine, or something similar.
If that's not sufficient, the next thing you could do (if necessary) is tell the OS's TCP stack to increase the size of its in-kernel TCP receive buffer, e.g. by doing:
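#include <sys/socket.h>  // setsockopt(), SOL_SOCKET, SO_RCVBUF
#include <cstdio>        // perror()
int fd = static_cast<int>(myQTCPSocketObject.socketDescriptor()); // native descriptor; -1 if not available
int newBufSizeBytes = 128*1024; // request 128kB kernel recv-buffer for this socket
if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &newBufSizeBytes, sizeof(newBufSizeBytes)) != 0)
    perror("setsockopt");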
Doing that would give your (single) thread more time to react before incoming packets start getting dropped for lack of in-kernel buffer space.
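The size passed to setsockopt() is only a request, so it can be worth reading the value back. On Linux, for instance, the kernel doubles the requested size to leave room for bookkeeping and caps it at the net.core.rmem_max limit. A small sketch, assuming POSIX sockets; request_recv_buffer is an illustrative helper name:

```cpp
#include <cassert>
#include <sys/socket.h>
#include <unistd.h>

// Ask for a kernel receive buffer of `requested` bytes and return the size
// the kernel reports afterwards, or -1 if either call fails.
inline int request_recv_buffer(int fd, int requested) {
    assert(requested > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof requested) != 0)
        return -1;
    int actual = 0;
    socklen_t len = sizeof actual;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) != 0)
        return -1;
    return actual;  // may differ from `requested`
}
```

If the reported size is far below the request, the system-wide limit is probably the constraint rather than the program.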
If, after trying all that, you still aren't getting the network performance you need, then you can try going multithreaded. I doubt it will come to that, but if it does, it needn't affect your program's design too much; you'd just write a wrapper class (called SocketThread or something) that holds your QTcpSocket object and runs an internal thread that handles the reading from the socket, and emits a bytesReceived(QByteArray) signal whenever the thread reads data from the socket. The rest of your code would remain approximately the same; just modify it to hold the SocketThread object instead of a QTcpSocket, and connect the SocketThread's bytesReceived(QByteArray) signal to a corresponding slot (via a QueuedConnection, of course, for thread-safety) and use that instead of responding directly to readyRead().
Implement it without threads, using a thread-considerate design(*), measure the delay your data experiences, decide if it is within acceptable bounds. Then decide if you need to use threads to capture it more rapidly.
From your description, the key bottleneck is going to be the GUI receiving the "data ready" signal and rendering the result. If you use the approach of sending lots of these signals, your GUI is going to be doing more re-renders.
If you use a single-thread approach, you can marshal the network reads and get all the updates and then refresh the GUI directly. As you've described it, this sounds like it will have the least degree of contention.
(* try to avoid constructs which will require an entire rewrite if you go threaded, but don't put so much effort into making it thread-proof that it will actually need threads to make it efficient, e.g. don't wrap everything with mutex calls)
I do not know much about Qt, but this could be a typical scenario where you use select() to multiplex multiple socket accesses with a single thread.
If the selecting thread is used mainly for moving data to and from the sockets, it will be very fast (as there will be fewer context switches). So if you are not transferring really huge amounts of data, it is quite possible that a single-threaded solution will be faster.
That being said, I would go with the solution that best fits your needs and that you can implement in a reasonable amount of time. Implementing select() (async) can be quite a hassle, and overkill that might not be needed.
It's a C-like approach, but I hope I could help anyway.

Calling boost::asio::read() in a thread blocks calling thread or process?

I'm quite new to network programming and I'm writing a program that should accept many TCP connections and receive data from them. To make things go parallel, the agent should read data from each socket in a new thread. I decided to use boost::asio instead of raw *nix sockets to make things simpler. Though this seems to be a wrong decision...
I wonder whether calling boost::asio::read or boost::asio::read_some blocks only the calling thread or the whole process. Yes, I should write my own small test and see the results myself, but I have no access to my Linux box right now. I'm just thinking about the code that I should write tomorrow at university.
So if it blocks the process, what's the correct way of implementing a server/client architecture that accepts many clients at the same time?
Notes:
I'm having difficulties with design decisions. Any suggestion is appreciated.
The read and read_some calls are both blocking, and will only block the current thread on Linux and Win32 (and probably most other platforms; I just don't have direct experience with them).
You might want to look into using async_read instead, though, if you have a large number of incoming connections, as you might actually get better performance using fewer threads than connections. Boost does provide examples of using a thread pool to handle client connections.
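The "blocks only the calling thread" behaviour can be checked with a small experiment. The sketch below swaps in a plain POSIX read() on a pipe for boost::asio::read, on the assumption that the synchronous Asio call ends up blocking in the same way; blocking_read_demo is a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <unistd.h>

// Starts a reader thread that blocks in read(), lets the calling thread do
// some work in the meantime, then sends data to unblock the reader.
// Returns what the reader received; *ticks counts the caller's loop passes.
inline std::string blocking_read_demo(int* ticks) {
    assert(ticks != nullptr);
    int fds[2];
    if (pipe(fds) != 0) return std::string();

    std::string received;
    std::thread reader([&] {
        char buf[16];
        ssize_t n = read(fds[0], buf, sizeof buf);  // blocks this thread only
        if (n > 0) received.assign(buf, static_cast<std::size_t>(n));
    });

    *ticks = 0;
    for (int i = 0; i < 1000; ++i) ++*ticks;  // the caller keeps running

    ssize_t written = write(fds[1], "data", 4);  // wakes the reader
    reader.join();                               // after join, received is safe to read
    close(fds[0]);
    close(fds[1]);
    return written == 4 ? received : std::string();
}
```

The calling thread finishes its loop and performs the write even though the reader is parked in read(), which is only possible if the block is per thread rather than per process.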