Multiple Threads Writing to a Socket - C++

I created a TCP client application and decided to handle incoming data on a new thread using the pthread library in C.
However, I read somewhere that unexpected things can happen when multiple threads try to write to the same file descriptor for a socket connection.
What is the best approach to ensure these 'unexpected things' don't happen?
Is there even a need to use threads in the first place?
NB: My decision to use threads was to avoid any blocking operations.

To avoid blocking, you should research asynchronous operations. You can either learn how your particular platform handles them, or use a library such as Asio (https://think-async.com/), which will handle it for you.

Might I recommend using libuv? It's highly maintained (it's the core of node.js) and cross-platform.
Also, you should not use select(); that's old school. If you roll it yourself, use epoll() on Linux. It scales much better.
You should only ever have one writer thread. That thread should write when the socket is not busy.
Check out libuv - it handles all this messiness for you but still keeps you close to the metal. https://nikhilm.github.io/uvbook/networking.html
My approach on stuff like this is typically to have one writer thread and a fast reader callback, usually just allocating memory for the incoming data, which then delegates to one or more processing threads. If you want it fast, avoid memcpy at all costs, and allocate a big buffer to begin with.
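To make the single-writer idea concrete, here is a minimal generic sketch (no libuv, just an already-connected descriptor sock_fd, which is an assumption): other threads enqueue outgoing messages and only the writer thread ever touches the socket.

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <sys/socket.h>

std::queue<std::string> out_queue;      // pending outgoing messages
std::mutex out_mutex;
std::condition_variable out_cv;
bool stopping = false;                  // set under out_mutex to shut the writer down

void writer_thread(int sock_fd)         // the only thread that ever writes to the socket
{
    for (;;) {
        std::unique_lock<std::mutex> lock(out_mutex);
        out_cv.wait(lock, [] { return stopping || !out_queue.empty(); });
        if (stopping && out_queue.empty()) return;
        std::string msg = std::move(out_queue.front());
        out_queue.pop();
        lock.unlock();                  // don't hold the lock while blocking in send()
        send(sock_fd, msg.data(), msg.size(), 0);  // a real version would loop on partial sends
    }
}

void enqueue_message(std::string msg)   // callable from any thread
{
    { std::lock_guard<std::mutex> lock(out_mutex); out_queue.push(std::move(msg)); }
    out_cv.notify_one();
}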

Yes, you need to create threads if you want to do anything else while waiting for incoming TCP data.
Yes, you need to take care of the unexpected things that may happen in a multi-threaded program.
You should use mutexes to prevent the so-called unexpected things. The pthread library you're using to create threads also contains synchronization primitives.
A sample program may look like this:

#include <pthread.h>

// One mutex guarding every write to the shared socket descriptor.
pthread_mutex_t tcp_lock = PTHREAD_MUTEX_INITIALIZER;

void *ThreadFunction(void *arg)
{
    pthread_mutex_lock(&tcp_lock);
    // Do your stuff (e.g. write() to the socket)
    pthread_mutex_unlock(&tcp_lock);
    return NULL;
}

int MainThread()
{
    pthread_mutex_lock(&tcp_lock);
    // Do your stuff
    pthread_mutex_unlock(&tcp_lock);
    return 0;
}

Related

Boost ASIO and message passing between threads

I am working on designing a websocket server which receives a message and saves it to an embedded database. For reading the messages I am using boost asio. To save the messages to the embedded database I see a few options in front of me:
Save the messages synchronously as soon as I receive them over the same thread.
Save the messages asynchronously on a separate thread.
I am pretty sure the second option is what I want. However, I am not sure how to pass messages from the socket thread to the IO thread. I see the following options:
Use one io service per thread and use the post function to communicate between threads. Here I have to worry about lock contention. Should I?
Use Unix domain sockets to pass messages between threads. No lock contention as far as I understand. Here I can probably use the BOOST_ASIO_DISABLE_THREADS macro to get some performance boost.
Also, I believe it would help to have multiple IO threads which would receive messages in a round robin fashion to save to the embedded database.
Which architecture would be the most performant? Are there any other alternatives from the ones I mentioned?
A few things to note:
The messages are exactly 8 bytes in length.
Cannot use an external database. The database must be embedded in the running process.
I am thinking about using RocksDB as the embedded database.
I don't think you want to use a unix socket, which is always going to require a system call and pass data through the kernel. That is generally more suitable as an inter-process mechanism than an inter-thread mechanism.
Unless your database API requires that all calls be made from the same thread (which I doubt) you don't have to use a separate boost::asio::io_service for it. I would instead create an io_service::strand on your existing io_service instance and use the strand::dispatch() member function (instead of io_service::post()) for any blocking database tasks. Using a strand in this manner guarantees that at most one thread may be blocked accessing the database, leaving all the other threads in your io_service instance available to service non-database tasks.
Why might this be better than using a separate io_service instance? One advantage is that having a single instance with one set of threads is slightly simpler to code and maintain. Another minor advantage is that using strand::dispatch() will execute in the current thread if it can (i.e. if no task is already running in the strand), which may avoid a context switch.
For the ultimate optimization I would agree that using a specialized queue whose enqueue operation cannot make a system call could be fastest. But given that you have network i/o by producers and disk i/o by consumers, I don't see how the implementation of the queue is going to be your bottleneck.
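For illustration, a minimal sketch of the strand approach (write_to_database is a hypothetical placeholder for your embedded-DB call; io_service stands for the instance your socket handlers already run on):

#include <boost/asio.hpp>
#include <cstdint>

boost::asio::io_service io_service;
boost::asio::io_service::strand db_strand(io_service);

void write_to_database(uint64_t msg);   // hypothetical blocking call into the embedded DB

void on_message(uint64_t msg)           // called from whichever io_service thread read the socket
{
    // Handlers dispatched through db_strand never run concurrently, so the
    // database is touched by at most one thread at a time while the other
    // io_service threads keep servicing network events.
    db_strand.dispatch([msg] { write_to_database(msg); });
}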
After benchmarking/profiling I found the Facebook Folly implementation of an MPMC queue to be the fastest, by at least a 50% margin. If I use the non-blocking write method, the socket thread has almost no overhead and the IO threads remain busy. The number of system calls is also much lower than with other queue implementations.
The SPSC queue with a condition variable in Boost is slower. I am not sure why that is. It might have something to do with the adaptive spin that the Folly queue uses.
Also, message passing (Unix domain datagram sockets in this case) turned out to be orders of magnitude slower, especially for larger messages. This might have something to do with the data being copied twice.
You probably only need one io_service -- you can create additional threads which will process events occurring within the io_service by providing boost::asio::io_service::run as the thread function. This should scale well for receiving 8-byte messages from clients over the network socket.
For storing the messages in the database, it depends on the database & interface. If it's multi-threaded, then you might as well just send each message to the DB from the thread that received it. Otherwise, I'd probably set up a boost::lockfree::queue where a single reader thread pulls items off and sends them to the database, and the io_service threads append new messages to the queue when they arrive.
Is that the most efficient approach? I dunno. It's definitely simple, and gives you a baseline that you can profile if it's not fast enough for your situation. But I would recommend against designing something more complicated at first: you don't know whether you'll need it at all, and unless you know a lot about your system, it's practically impossible to say whether a complicated approach would perform any better than the simple one.
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <boost/lockfree/queue.hpp>

std::atomic<bool> Finished{false};      // set to true elsewhere to stop both threads
std::condition_variable cond_var;
std::mutex cond_mutex;                  // used only for the condition-variable wait

void Consumer( boost::lockfree::queue<uint64_t> &message_queue ) {
    // Connect to database...
    while (!Finished) {
        message_queue.consume_all( add_to_database ); // add_to_database is a Functor that takes a message
        std::unique_lock<std::mutex> lock(cond_mutex);
        cond_var.wait_for( lock, std::chrono::milliseconds(100) ); // Timed wait avoids missing a signal; consume_all() on an empty queue is harmless.
    }
}

void Producer( boost::lockfree::queue<uint64_t> &message_queue ) {
    while (!Finished) {
        uint64_t m = receive_from_network( );
        message_queue.push( m );
        cond_var.notify_all( );
    }
}
Assuming that the constraint of using C++11 is not too hard in your situation, I would try using std::async to make an asynchronous call to the embedded DB.
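For illustration, a minimal C++11 sketch of that suggestion (save_to_db is a hypothetical placeholder for the embedded-DB call); note that the future returned by std::async blocks in its destructor, so it has to be kept alive for the call to actually run in the background:

#include <cstdint>
#include <future>
#include <vector>

void save_to_db(uint64_t msg);          // hypothetical blocking call into the embedded DB

std::vector<std::future<void>> pending; // keep futures alive; ~future() would otherwise block
                                        // (not thread-safe: fine if on_message runs on one thread)

void on_message(uint64_t msg)
{
    // Launch the DB write elsewhere so the socket thread isn't blocked.
    pending.push_back(std::async(std::launch::async, [msg] { save_to_db(msg); }));
}

Keep in mind that std::launch::async typically spawns a thread per call, which is heavyweight for 8-byte messages; a persistent worker thread draining a queue, as in the other answers, usually scales better.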

Calling boost::asio::read() in a thread blocks calling thread or process?

I'm quite new to network programming and I'm writing a program that should accept many TCP connections and receive data from them. To make things go parallel, the agent should read data from each socket in a new thread. I decided to use boost::asio instead of raw *nix sockets to make things simpler. Though this seems to be a wrong decision...
I wonder whether calling boost::asio::read or boost::asio::read_some blocks only its calling thread or blocks the whole process? Yes, I should write my own small test and see the results myself, but I have no access to my Linux box right now. I'm just thinking about the code that I should write tomorrow at university.
So if it blocks the process, what's the correct way of implementing a server/client architecture that accepts many clients at the same time?
Notes:
I'm having difficulties with design decisions. Any suggestion is appreciated.
The read and read_some calls are both blocking, and will only block the current thread on Linux and Win32 (and probably most other platforms, I just don't have direct experience with them).
You might want to look into using async_read instead, though, if you have a large number of incoming connections, as you may actually do better performance-wise using a smaller number of threads than connections. Boost does provide examples of using a thread pool to handle client connections.
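For illustration, a rough async_read_some sketch (the socket and buffer are assumed to outlive the operation; a real server would tie them to a per-connection object):

#include <boost/asio.hpp>
#include <array>

// Keep one outstanding asynchronous read per connection.
void start_read(boost::asio::ip::tcp::socket* sock, std::array<char, 4096>* buf)
{
    sock->async_read_some(boost::asio::buffer(*buf),
        [sock, buf](const boost::system::error_code& ec, std::size_t n) {
            if (!ec) {
                // ... handle the n bytes now sitting in (*buf) ...
                start_read(sock, buf);          // queue the next read
            }
        });
}

// Elsewhere: a handful of threads all call io_service.run(); completion handlers
// are dispatched to whichever thread is free, so a small pool serves many sockets.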

Multiuser chat server C++

I am building a Chat Server (which allows private messages between users) in C++ ... just as a challenge for myself, and I've hit a dead point where I don't know what may be better.
By the way: I am fairly new to C++; that's why I want a challenge... so if there are other optimal ways, multithreading, etc... let me know please.
Option A
I have a C++ application running that has an array of sockets, reads all the input (looping through all the sockets) in every loop (a 1-second loop, I guess) and stores it to the DB (a log is required), and after that loops again over all the sockets, sending what's needed on every socket.
Pros: One single process, contained. Easy to develop.
Cons: I see it as hardly scalable, and a single point of failure... I mean, what about performance with 20k sockets?
Option B
I have a C++ application listening for connections.
When a connection is received, it forks a subprocess that handles that socket... reading all the input of the user and saving it to the DB, and on every loop checking the DB for any required output to write to the socket.
Pros: If the daemon is small enough, having a process per socket is likely more scalable. And at the same time, if one process fails, all the others are kept online.
Cons: Harder to develop. Maybe it consumes too many resources to maintain a process for each connection.
What option do you think is the best? Any other idea or suggestion is welcome :)
As mentioned in the comments, there is an additional alternative which is to use select() or poll() (or, if you don't mind making your application platform-specific, something like epoll()). Personally I would suggest poll() because I find it more convenient, but I think only select() is available on at least some versions of Windows - I don't know whether running on Windows is important to you.
The basic approach here is that you first add all your sockets (including a listen socket, if you're listening for connections) to a structure and then call select() or poll() as appropriate. This call will block your application until at least one of the sockets has some data to read, and then you get woken up, go through the socket(s) that are ready for reading, process the data and then jump back into blocking again. You generally do this in a loop, something like:
while (running) {
    int rc = poll(...);
    // Handle active file descriptors here.
}
This is a great way to write an application which is primarily IO-bound - i.e. it spends much more time handling network (or disk) traffic than it does actually processing the data with the CPU.
As also mentioned in the comments, another approach is to fork a thread per connection. This is quite effective, and you can use simple blocking IO in each thread to read and write to that connection. Personally I would advise against this approach for several reasons, most of which are largely personal preference.
Firstly, it's fiddly to handle connections where you need to write large amounts of data at a time. A socket can't guarantee to write all pending data at once (i.e. the amount that it sent may not be the full amount you requested). In this case you have to buffer up the pending data locally and wait until there's room in the socket to send it. This means at any given time, you might be waiting for two conditions - either the socket is ready to send, or the socket is ready to read. You could, of course, avoid reading from the socket until all the pending data is sent, but this introduces latency into handling the data. Or, you could use select() or poll() on just that connection - but if so, why bother using threads at all, just handle all the connections that way. You could also use two threads per connection, one for reading and one for writing, which is probably the best approach if you're not confident whether you can always send all messages in a single call, although this doubles the number of threads you need which could make your code more complicated and slightly increase resource usage.
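To make the buffering point in the previous paragraph concrete, here is a rough sketch of a per-connection output buffer (the names are illustrative, and error/EAGAIN handling is omitted):

#include <sys/socket.h>
#include <string>

struct Connection {
    int fd;
    std::string outbuf;   // bytes queued but not yet accepted by the socket
};

// Call this when poll()/select() reports the socket writable (POLLOUT).
void flush_pending(Connection& c)
{
    ssize_t n = send(c.fd, c.outbuf.data(), c.outbuf.size(), 0);
    if (n > 0)
        c.outbuf.erase(0, static_cast<size_t>(n));   // keep only the unsent tail
    // While outbuf is non-empty, watch the fd for POLLOUT as well as POLLIN;
    // once it drains, stop asking for POLLOUT to avoid pointless wake-ups.
}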
Secondly, if you plan to handle many connections, or a high connection turnover, threads are somewhat more of a load on the system than using select() or friends. This isn't a particularly big deal in most cases, but it's a factor for larger applications. This probably isn't a practical issue unless you were writing something like a webserver that was handling hundreds of requests a second, but I thought it was relevant to mention for reference. If you're writing something of this scale you'd likely end up using a hybrid approach anyway, where you multiplexed some combination of processes, threads and non-blocking IO on top of each other.
Thirdly, some programmers find threads complicated to deal with. You need to be very careful to make all your shared data structures thread-safe, either with exclusive locking (mutexes) or using someone else's library code which does this for you. There are a lot of examples and libraries out there to help you with this, but I'm just pointing out that care is needed - whether multithreaded coding suits you is a matter of taste. It's relatively easy to forget to lock something and have your code work fine in testing because the threads don't happen to contend for that data structure, and then find hard-to-diagnose issues when this happens under higher load in the real world. With care and discipline, it's not too hard to write robust multithreaded code and I have no objection to it (though opinions vary), but you should be aware of the care required. To some extent this applies to writing any software, of course, it's just a matter of degree.
Those issues aside, threads are quite a reasonable approach for many applications and some people seem to find them easier to deal with than non-blocking IO with select().
As to your approaches, A will work but is wasteful of CPU because you have to wake up every second regardless of whether there's actual useful work to do. Also, you introduce up to a second's delay in handling messages, which could be irritating for a chat server. In general I would suggest that something like select() is a much better approach than this.
Option B could work, although when you want to send messages between connections you're going to have to use something like pipes to communicate between processes, and that's a bit of a pain. You'll end up having to wait on both your incoming pipe (for data to send) as well as the socket (for data to receive) and thus you end up effectively with the same problem, having to wait on two filehandles with something like select() or threads. Really, as others have said, threads are the right way to process each connection separately. Separate processes are also a little more expensive in resources than threads (although on platforms such as Linux the copy-on-write approach to fork() means it's not actually too bad).
For small applications with only, say, tens of connections there's not an awful lot technically to choose between threads and processes, it largely depends on which style appeals to you more. I would personally use non-blocking IO (some people call this asynchronous IO, but that's not how I would use the term) and I've written quite a lot of code that does that as well as lots of multithreaded code, but it's still only my personal opinion really.
Finally, if you want to write portable non-blocking IO loops I strongly suggest investigating libev (or possibly libevent, but personally I find the former easier to use and more performant). These libraries use different primitives such as select() and poll() on different platforms so your code can remain the same, and they also tend to offer slightly more convenient interfaces.
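For reference, a minimal libev read watcher might look roughly like this (a sketch only; accept handling and error paths are omitted):

#include <ev.h>
#include <unistd.h>

// Called by libev whenever the watched fd becomes readable.
static void read_cb(struct ev_loop *loop, ev_io *w, int revents)
{
    char buf[4096];
    ssize_t n = read(w->fd, buf, sizeof buf);
    if (n <= 0)
        ev_io_stop(loop, w);        // EOF or error: stop watching this fd
    // ... otherwise handle the n bytes in buf ...
}

void watch_socket(struct ev_loop *loop, ev_io *watcher, int fd)
{
    ev_io_init(watcher, read_cb, fd, EV_READ);
    ev_io_start(loop, watcher);
}

// In main(): struct ev_loop *loop = ev_default_loop(0);
//            /* call watch_socket() for the listen socket and each connection */
//            ev_run(loop, 0);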
If you have any more questions on any of that, feel free to ask.

Multiple threads with locks vs single threads?

I am designing a client and server socket program.
I have a file to be transferred to the server from the client using UDP. I repeat, I am using UDP...
Since I am sending through UDP, the sending rate is faster than the receiver can keep up with, so I have created 3 threads listening on the same socket, so that when one thread is doing some work with the received data (I mean writing to a file using fwrite), another thread can recv from the client.
My 1st question is: when I am using fwrite with multiple threads, I have to use locks, as the file pointer is shared between the threads. Am I right in thinking that?
My 2nd question is: "Will there be any improvement in performance if I use multiple threads to fwrite using locks, over using a single thread to do the fwrite work with no locks?" Please guide me...
I would use one thread. Saves the complications. You can buffer the data and use asynchronous writes
http://www.gnu.org/s/hello/manual/libc/Asynchronous-Reads_002fWrites.html
Cache the data before writing it.
Let the writing happen in another thread.
Doing it the way you do will require locking the socket.
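A rough sketch of that buffered/asynchronous-write idea using POSIX AIO (the interface documented at the link above); the aiocb and buffer are assumed to stay alive until the request completes, and error handling is omitted:

#include <aio.h>
#include <cstring>

struct aiocb cb;    // one outstanding request; cb and buf must remain valid until it completes

void async_write_chunk(int fd, const char *buf, size_t len, off_t offset)
{
    std::memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = const_cast<char *>(buf);
    cb.aio_nbytes = len;
    cb.aio_offset = offset;
    aio_write(&cb);             // returns immediately; the I/O happens in the background
}

// Later, aio_error(&cb) == 0 means the request finished and aio_return(&cb)
// gives the byte count; check both before reusing cb or the buffer.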
Q1: Yes, you do need to lock it (very slow!). Why not use a separate file descriptor in each thread? The problem comes mostly from the current file position managed by that descriptor.
Q2: Neither. If the data needs ordering (yes, UDP!) you should still buffer it. RAM is much faster than disk IO. Feed a stream to buffer it and handle the data in that stream in a separate thread.
Similar to Ed's answer, I'd suggest using asynchronous I/O and a single thread for your server, though I find using Boost.Asio easier than POSIX AIO.
My 1st question is when I am using a fwrite with multiple threads I have to use locks as the file pointer is shared between the threads
Yes, you always have to use locks when multiple threads are writing to a single object (file, memory, etc).
My 2nd question is "Will there be any improvement in the performance if I use multiple threads to fwrite using locks over using a single thread to do the fwrite work with no locks...??? "
I would use two threads. The first thread does nothing but read from the socket and store the data in memory. The second thread reads data from memory and writes it to the file. Treat the memory buffer as a FIFO queue and use a mutex to protect the queue pointers. You'll gain nothing from a third thread. In fact, it would probably harm performance and it definitely makes the problem far more complicated.
First, try to avoid using UDP for bulk transfers. If you use UDP you have to reinvent your own flow control protocol, as well as logic for retransmission and reordering. From the sounds of it, your problems boil down to missing flow control - so why not just use TCP?
Anyway, don't put your file writing in another thread. Modern OSes will internally buffer disk writes in any case - you'll only start blocking if you're writing data much faster than the disk can keep up, in which case buffering inside your process will only buy you another few seconds at most. Switch to TCP, or implement a proper flow control mechanism.

Pollable signalling between threads

I'm working on a project where a primary server thread needs to dispatch events to a series of worker threads. The work that goes on in the worker threads relies on polling (i.e. epoll or kqueue depending on the UNIX system in question), with timeouts on these operations needing to be handled. This means that a normal condition variable or semaphore structure is not viable for this dispatch, as it would make one or the other block, resulting in unwanted latency between handling either the events coming from polling or the events originating from the server thread.
So, I'm wondering what the most optimal construct for dispatching such events between threads in a pollable fashion is? Essentially, all that needs to be delivered is a pollable "signal" that tells the worker thread, that it has more events to fetch. I've looked at using UNIX pipes (unnamed ones, as it's internal to the process) which seems like a decent solution given that a single byte can be written to the pipe and read back out when the queue is cleared -- but, I'm wondering if this is the best approach available? Or the fastest?
Alternatively, there is the possibility to use signalfd(2) on Linux, but as this is not available on BSD systems, I'd rather like to avoid this construct. I'm also wondering how great the overhead in using system signals actually is?
Jan Hudec's answer is correct, although I wouldn't recommend using signals for a few reasons:
Older versions of glibc emulated pselect and ppoll in a non-atomic fashion, making them basically worthless. Even when you used the mask correctly, signals could get "lost" between the pthread_sigprocmask and select calls, meaning they don't cause EINTR.
I'm not sure signalfd is any more efficient than the pipe. (Haven't tested it, but I don't have any particular reason to believe it is.)
signals are generally a pain to get right. I've spent a lot of effort on them (see my sigsafe library) and I'd recommend avoiding them if you can.
Since you're trying to have asynchronous handling portable to several systems, I'd recommend looking at libevent. It will abstract epoll or kqueue for you, and it will even wake up workers on your behalf when you add a new event. See event.c
static inline int
event_add_internal(struct event *ev, const struct timeval *tv,
    int tv_is_absolute)
{
    ...
    /* if we are not in the right thread, we need to wake up the loop */
    if (res != -1 && notify && EVBASE_NEED_NOTIFY(base))
        evthread_notify_base(base);
    ...
}
Also,
The worker thread deals with both socket I/O and asynchronous disk I/O, which means that it is optimally always waiting for the event queuing mechanism (epoll/kqueue).
You're likely to be disappointed here. These event queueing mechanisms don't really support asynchronous disk I/O. See this recent thread for more details.
As far as performance goes, the cost of a system call is huge compared to other operations, so it's the number of system calls that matters. There are two options:
Use the pipes as you wrote (a minimal sketch follows at the end of this answer). If you have any useful payload for the message, you get one system call to send, one system call to wait and one system call to receive. Try to pass any relevant data down the pipe instead of reading it from a shared structure, to avoid additional overhead from locking.
select and poll have variants that also wait for signals (pselect, ppoll). Linux epoll can do the same using signalfd, so the remaining question is whether kqueue can wait for signals, which I don't know. If it can, then you could use them (you are using a different mechanism on Linux and *BSD anyway). It would save you the syscall for reading if you have no good use for the passed data.
I would expect passing the data over the pipe/socket to be more efficient if it allows you to do away with any other locking.
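To make the pipe option concrete, a minimal sketch of the wake-up mechanism, shown with poll() for brevity (the same read end can just as well be registered with epoll or kqueue):

#include <poll.h>
#include <unistd.h>

int wake_pipe[2];                       // [0] = read end (worker), [1] = write end (server)
// pipe(wake_pipe) once at startup; setting O_NONBLOCK on both ends is a good idea.

void notify_worker(void)                // server thread: one syscall wakes the worker
{
    char b = 1;
    write(wake_pipe[1], &b, 1);
}

void worker_loop(void)
{
    struct pollfd fds[2] = {
        { wake_pipe[0], POLLIN, 0 },                    // dispatch notifications
        { /* worker's own socket fd */ -1, POLLIN, 0 }, // negative fd entries are ignored
    };
    while (poll(fds, 2, /* timeout ms */ 100) >= 0) {
        if (fds[0].revents & POLLIN) {
            char drain[64];
            read(wake_pipe[0], drain, sizeof drain);    // clear the wake-up signal
            // ... fetch queued events from the shared structure (or the pipe payload) ...
        }
        // ... handle fds[1] and timeouts here ...
    }
}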