Multiple threads with locks vs single threads? - c++

I am designing a client and server socket program.
I have a file to be transferred to the server from the client using UDP, I repeat I am using UDP.....
I am sending through UDP so, the sending rate is too fast then the receiver, so I have created 3 threads listening on the same socket, so that when one thread is doing some work(I mean writing to a file using fwrite) with the received data the other thread can recv from the client.
My 1st question is when I am using a fwrite with multiple threads I have to use locks as the file pointer is shared between the threads. I am right in thinking???
My 2nd question is "Will there be any improvement in the performance if I use multiple threads to fwrite using locks over using a single thread to do the fwrite work with no locks...??? " ... Please guide me...

I would use one thread. Saves the complications. You can buffer the data and use asynchronous writes
http://www.gnu.org/s/hello/manual/libc/Asynchronous-Reads_002fWrites.html

Cache the data before writing it.
Let the writing happen in another thread.
Doing it the way you do will require locking the socket.
Q1: yes you do need to lock it (very slow!). Why not use a separate file descriptor in each thread? the problem comes mostly with the current file position managed by that descriptor.
Q2: Neither. If data needs ordering (yes, UDP!) you should still buffer it. RAM is much faster then disk IO. Feed a stream to buffer it and handle the data in that stream in a separate thread.

Similar to Ed's answer, I'd suggest using asynchronous I/O and a single thread for your server. Though I find using Boost.Asio easier than posix AIO.

My 1st question is when I am using a fwrite with multiple threads I have to use locks as the file pointer is shared between the threads
Yes, you always have to use locks when multiple threads are writing to a single object (file, memory, etc).
My 2nd question is "Will there be any improvement in the performance if I use multiple threads to fwrite using locks over using a single thread to do the fwrite work with no locks...??? "
I would use two threads. The first thread does nothing but read from the socket and store the data in memory. The second thread reads data from memory and writes it to the file. Treat the memory buffer as a FIFO queue and use a mutex to protect the queue pointers. You'll gain nothing from a third thread. In fact, it would probably harm performance and it definitely makes the problem far more complicated.

First, try to avoid using UDP for bulk transfers. If you use UDP you have to reinvent your own flow control protocol, as well as logic for retransmission and reordering. From the sounds of it, your problems boil down to missing flow control - so why not just use TCP?
Anyway, don't put your file writing in another thread. Modern OSes will internally buffer disk writes in any case - you'll only start blocking if you're writing data much faster than the disk can keep up, in which case buffering inside your process will only buy you another few seconds at most. Switch to TCP, or implement a proper flow control mechanism.

Related

fopen and fwrite to the same file from multiple threads

This is similar but a bit different to existing questions. Say I have many threads that open the same file but they all do their own fopen and maintain their own FILE pointer.
a) is it necessary to lock fwrite calls if they have their own FILE ptrs?
b) if it is necessary, is locking around fwrite enough or will they potentially flush at different times and end up intermingling when they flush? If yes, would locking on fwrite and then fflush cover it?
This question can not be answered in the context of programming languages. As far as programming language is concerned, those file handles are completely independent objects, and whatever you do with one has no effect whatsoever on another.
The question is on the operating system - can it handle multiple write operation to the same underlying file at the same time. In other words, are those writes atomic. I can't say for all of them, but in Linux, for example, writes for less than PIPE_BUF size are atomic.
For the quick measure, yeah, you can put a lock around the I/O part. That'd work, I guarantee it. As for flusing I/O cache, I'd recommend not doing that. It's always best to let OS to handle I/O timing because kernel knows what's going on the best. You are not gonna have it in effect immediately after calling flush anyway because it's that complicated. Just like the other flush operations(java GC, glFlush and so on). If you choose to stick to this option, please be mindful of a start and an end point of the concurrent I/O op. You wouldn't want a case where the main thread closes the file and another worker thread tries to do I/O on that.
The general solution to this problem is creating a thread that handles the file exclusively. If other thread should read/write from/to the file, they must ask the thread to do that for them. This is tricky, I know. You'd need to compose a simple protocol, sync mechanism, but in a nutshell, it goes like this:
prep a queue, a cv(condition variable), a lock. create a thread and open the file. Doesn't matter who opens the file
The thread spawns and waits for the queue to be filled in
Other threads send a request I/O op to the thread. The request includes the data for the file and an op code.
The thread handles the requests from the queue. This is where the real I/O happens.
You could use anonymous FIFO instead of a queue. Or skip the opcode part if the file is write-only.
Unlike network I/O, modern OSes can't do file I/Os in a non-blocking manner. So expect a significant blocking time(io wait). Also, there's this problem where the queue fills up too quick and eats a lot of memory when I/O is relatively slow. There will be a case where the whole program should wait for the I/O to complete before terminating itself. Not much you can do about it. You could close the file from another thread while I/O is in progress on Linux(close() is MT-safe ), I don't know how that's gonna work on other OS.
There are alternatives like async file I/O or overlapped I/O which involves signal handling or callbacks. Using these doesn't require a creating of a thread but each has pros and cons, mostly regarding portability.

Multiple Threads Writing to a Socket

I created a TCP client application and decided to handle incoming data with a new thread using the pthread library in c.
However, I read somewhere that unexpected things could happen when multiple threads try to write to the same file descriptor for a socket connection.
what is the best approach to ensure these 'unexpected things' don't happen.
Is there even a need to using threads in the first place?
NB: My decision to use threads was to prevent any blocking operations.
To avoid blocking, you should research asynchronous operations. You can either learn how your particularly platform handles them, or use a library, such as ASIO (https://think-async.com/) which will handle it for you.
Might I recommend using libuv? It's highly maintained (core of node.js) and cross-platform.
Also, you should not use select() that's old school. If you did it yourself you should use epoll() on Linux. It scales much better.
You should only have one writer thread ever. That thread should write when the socket is not busy.
Checkout libuv - it handles all this messiness for you but still keeps you close to the metal. https://nikhilm.github.io/uvbook/networking.html
My approach on stuff like this is to typically have one writer thread, and a fast reader callback, usually just allocating memory for the incoming data, which then just delegates to 1 or more processing threads. If you want it fast avoid memcpy at all cost, and allocate a big buffer to began with.
Yes, you need to create threads if you want to do anything else while waiting for incoming TCP data
Yes, you need to take care of unexpected things that may happen in a multi threaded programs
You should use Mutexes to prevent the so called unexpected things. the pthread library you're using to create threads also contains synchronization primitives.
A sample program may look like this
pthread_mutex_t tcp_lock;
void ThreadFunction()
{
pthread_mutex_lock(&tcp_lock);
// Do your stuffs
pthread_mutex_unlock(&tcp_lock);
}
int MainThread()
{
pthread_mutex_lock(&tcp_lock);
// Do your stuffs
pthread_mutex_unlock(&tcp_lock);
}

send the full contents of a ring buffer on subscription and then send new data

I'm a beginner in boost::asio.
I need to code a module which reads from a pipe and puts the data into a ring buffer (I've no problem in how to implement this part).
Another part of the module waits for a consumer to open a new TCP connection or unix domain socket and when the connection is made it sends the full ring buffer contents and then it will send any new data as soon as it is pushed into the ring buffer. Multiple consumers are allowed and one consumer can open a new connection at any time.
The first naive implementation I thought of is to keep a separate asio::streambuf for every connection and push the entire ring buffer into it on connection and then every new data, but it seems a very sub-optimal method to do it both in memory and cpu cycles as data has to be copyed for every connection, maybe multiple times as I don't know if boost::asio::send (or the linux tcp/ip stack) does a copy of the data.
As my idea is to use no multi threading at all, I'm thinking of using some form of custom asio::streambuf derived class which shares the actual buffer with the ring buffer, but keeps a separate state of the read pointer without the need of any lock.
It seems mine it is a pretty unusual need, because I'm unable to find any related documentation/question which deals with a similar subject and the boost documentation seems pretty brief and scarce to me (see e.g.: http://www.boost.org/doc/libs/1_57_0/doc/html/boost_asio/reference/basic_streambuf.html).
It would be nice if someone could point me to some ideas that I could take as starting point to implement my design or point me to an alternative design if he/she considers mine bad, un-implementable and/or improvable.
You should just do what you intend to.
You absolutely don't need a streambuf to use with Boost Asio: http://www.boost.org/doc/libs/release/doc/html/boost_asio/reference/buffer.html
If the problem is how to avoid having the producer "wait" until all consumers (read: connections) are done transmitting the data, you can always use ye olde trick of alternating output buffers.
Many ring buffer implementations allow direct splicing of a complete sequence of elements at once, (e.g. boost lockfree spsc_queue cache memory access). You could use such an operation to your advantage.
Also relevant:
TCP Zero copy using boost
It appears, that performance is a topic here.
Independent of whether boost::asio is used or some hand knitted solution, performance (throughput) might be down the the drain already by the fact (as stated in the comment section of the OP), that single bytes are being traded (read from the pipe).
After the initial "burst phase" when a consumer connects, single bytes trickle from the pipe to the connected consumer sockets with read() and write() operations per byte (or a few bytes, if the application is not constantly polling).
Given that (the fact that the price for system calls read() and write() is paid for small amounts of data), I dare theorize that anything about multiple queues or single queue etc. is already in the shadow of that basic "design flaw". I put "design flaw" in quotes as it cannot always be avoided to have to handle exactly such a situation.
So, if throughput cannot be optimized anyway, I would recommend the most simple and straightforward solution which can be conceived.
The "no threads" statement in the OP implies non-blocking file descriptors for both the accept socket, the consumer data sockets and the pipe. Will this be another 100% CPU/core eating polling application? If this is not some kind of special ops hyper-optimized problem, I would rather not advice to use non-blocking file descriptors. Also, I would not worry about zero-copy or not.
One easy approach with threads would be to have the consumer sockets non-blocking, while pipe is in blocking mode. The thread which reads the pipe then pumps the data into a queue and calls the function which services all currently connected consumers. The listen socket (the one calling accept()) is in signaled state, when new client connections are pending. With mechanisms like kqueue (bsd) or epoll (linux etc.) or WaitForMultipleObjects (windows), the pipe reader thread can react to that situation as well.
In the times when nothing is to be done, your application is sleeping/blocking and friendly to our environment :)

multilevel threads takes time

I have created a module to transfer the data using multiple sockets using TCP client server communication. This transfers the file of 20MB in 10 secs.
Multiple sockets sends/receives the data in each of their a separate thread.
When I launch the module from another worker thread the time taken to send the same file increases to 40 secs.
Please let me know any solutions to avoid the time lagging.
Are you synchronizing the threads to read the content from the file at Client side and write back to file at server side? This adds time.
Along with this, by default you will having context switching time between multiple threads at both client and server.
One problem may be disk caching and seeking. If you are not already doing this, try interleaving blocks transferred by different threads more finely (like, say, to 4KB, so bytes 0...4095 transferred by 1st thread, 4096...8191 by 2nd thread, etc).
Also avoid mutexes, for example by having each thread know what it's supposed to read and send, or write and receive, when thread starts, so no inter-thread communication is needed. Aborting the whole transfer can be done by an atomic flag variable (checked by each thread after transferring a block) instead of mutexes.
Also on receiving end, make sure to do buffering in memory, so that you write to destination file sequentially. That is, if one thread transfers blocks faster than some other, those "early" blocks are just kept in memory until all the preceding blocks have been received and written.
If buffer size becomes an issue here, you may need to implement some inter-thread synchronization at one end (doesn't matter much, wether you slow down receiving or sending), to prevent the fastest thread getting too far ahead of the slowest thread, but for file sizes in the order of tens of megabytes, on PCs, this should not become an issue.

Waiting on a condition (pthread_cond_wait) and a socket change (select) simultaneously

I'm writing a POSIX compatible multi-threaded server in c/c++ that must be able to accept, read from, and write to a large number of connections asynchronously. The server has several worker threads which perform tasks and occasionally (and unpredictably) queue data to be written to the sockets. Data is also occasionally (and unpredictably) written to the sockets by the clients, so the server must also read asynchronously. One obvious way of doing this is to give each connection a thread which reads and writes from/to its socket; this is ugly, though, since each connection may persist for a long time and the server thus may have to hold hundred or thousand threads just to keep track of connections.
A better approach would be to have a single thread that handled all communications using the select()/pselect() functions. I.e., a single thread waits on any socket to be readable, then spawns a job to process the input that will be handled by a pool of other threads whenever input is available. Whenever the other worker threads produce output for a connection, it gets queued, and the communication thread waits for that socket to be writable before writing it.
The problem with this is that the communication thread may be waiting in the select() or pselect() function when output is queued by the worker threads of the server. It's possible that, if no input arrives for several seconds or minutes, a queued chunk of output will just wait for the communication thread to be done select()ing. This shouldn't happen, however--data should be written as soon as possible.
Right now I see a couple solutions to this that are thread-safe. One is to have the communication thread busy-wait on input and update the list of sockets it waits on for writing every tenth of a second or so. This isn't optimal since it involves busy-waiting, but it will work. Another option is to use pselect() and send the USR1 signal (or something equivalent) whenever new output has been queued, allowing the communication thread to update the list of sockets it is waiting on for writable status immediately. I prefer the latter here, but still dislike using a signal for something that should be a condition (pthread_cond_t). Yet another option would be to include, in the list of file descriptors on which select() is waiting, a dummy file that we write a single byte to whenever a socket needs to be added to the writable fd_set for select(); this would wake up the communications server because that particular dummy file would then be readable, thus allowing the communications thread to immediately update it's writable fd_set.
I feel intuitively, that the second approach (with the signal) is the 'most correct' way to program the server, but I'm curious if anyone knows either which of the above is the most efficient, generally speaking, whether either of the above will cause race conditions that I'm not aware of, or if anyone knows of a more general solution to this problem. What I really want is a pthread_cond_wait_and_select() function that allows the comm thread to wait on both a change in sockets or a signal from a condition.
Thanks in advance.
This is a fairly common problem.
One often used solution is to have pipes as a communication mechanism from worker threads back to the I/O thread. Having completed its task a worker thread writes the pointer to the result into the pipe. The I/O thread waits on the read end of the pipe along with other sockets and file descriptors and once the pipe is ready for read it wakes up, retrieves the pointer to the result and proceeds with pushing the result into the client connection in non-blocking mode.
Note, that since pipe reads and writes of less then or equal to PIPE_BUF are atomic, the pointers get written and read in one shot. One can even have multiple worker threads writing pointers into the same pipe because of the atomicity guarantee.
Unfortunately, the best way to do this is different for each platform. The canonical, portable way to do it is to have your I/O thread block in poll. If you need to get the I/O thread to leave poll, you send a single byte on a pipe that the thread is polling. That will cause the thread to exit from poll immediately.
On Linux, epoll is the best way. On BSD-derived operating systems (including OSX, I think), kqueue. On Solaris, it used to be /dev/poll and there's something else now whose name I forget.
You may just want to consider using a library like libevent or Boost.Asio. They give you the best I/O model on each platform they support.
Your second approach is the cleaner way to go. It's totally normal to have things like select or epoll include custom events in your list. This is what we do on my current project to handle such events. We also use timers (on Linux timerfd_create) for periodic events.
On Linux the eventfd lets you create such arbitrary user events for this purpose -- thus I'd say it is quite accepted practice. For POSIX only functions, well, hmm, perhaps one of the pipe commands or socketpair I've also seen.
Busy-polling is not a good option. First you'll be scanning memory which will be used by other threads, thus causing CPU memory contention. Secondly you'll always have to return to your select call which will create a huge number of system calls and context switches which will hurt overall system performance.