This is similar but a bit different to existing questions. Say I have many threads that open the same file but they all do their own fopen and maintain their own FILE pointer.
a) is it necessary to lock fwrite calls if they have their own FILE ptrs?
b) if it is necessary, is locking around fwrite enough or will they potentially flush at different times and end up intermingling when they flush? If yes, would locking on fwrite and then fflush cover it?
This question can not be answered in the context of programming languages. As far as programming language is concerned, those file handles are completely independent objects, and whatever you do with one has no effect whatsoever on another.
The question is on the operating system - can it handle multiple write operation to the same underlying file at the same time. In other words, are those writes atomic. I can't say for all of them, but in Linux, for example, writes for less than PIPE_BUF size are atomic.
For the quick measure, yeah, you can put a lock around the I/O part. That'd work, I guarantee it. As for flusing I/O cache, I'd recommend not doing that. It's always best to let OS to handle I/O timing because kernel knows what's going on the best. You are not gonna have it in effect immediately after calling flush anyway because it's that complicated. Just like the other flush operations(java GC, glFlush and so on). If you choose to stick to this option, please be mindful of a start and an end point of the concurrent I/O op. You wouldn't want a case where the main thread closes the file and another worker thread tries to do I/O on that.
The general solution to this problem is creating a thread that handles the file exclusively. If other thread should read/write from/to the file, they must ask the thread to do that for them. This is tricky, I know. You'd need to compose a simple protocol, sync mechanism, but in a nutshell, it goes like this:
prep a queue, a cv(condition variable), a lock. create a thread and open the file. Doesn't matter who opens the file
The thread spawns and waits for the queue to be filled in
Other threads send a request I/O op to the thread. The request includes the data for the file and an op code.
The thread handles the requests from the queue. This is where the real I/O happens.
You could use anonymous FIFO instead of a queue. Or skip the opcode part if the file is write-only.
Unlike network I/O, modern OSes can't do file I/Os in a non-blocking manner. So expect a significant blocking time(io wait). Also, there's this problem where the queue fills up too quick and eats a lot of memory when I/O is relatively slow. There will be a case where the whole program should wait for the I/O to complete before terminating itself. Not much you can do about it. You could close the file from another thread while I/O is in progress on Linux(close() is MT-safe ), I don't know how that's gonna work on other OS.
There are alternatives like async file I/O or overlapped I/O which involves signal handling or callbacks. Using these doesn't require a creating of a thread but each has pros and cons, mostly regarding portability.
Related
I would like to write multithreading-safe logger using lock-free queue. Logging threads will push messages to queue and logger will be popping them and send to output. I consider how to solve that issue- sending to output.
I would like to avoid using mutex/locks as long as it is possible.
So, let's assume that I am going to use C++ streams to write to the file/console. We can assume that target system is Linux.
Ok, writing to stream must be just a wrapper ( perhaps a advanced wrapper) for system call offered by Unix write. From what I know syscalls are atomic ( only one process can execute syscall at the same time). So, it is tempting not to use locks to make safe writing to file.
But write is a system call but it doesn't guarantees writing "whole output". It returns number of bytes which are succesfully written to the file.
Basically, my question is:
How to solve it? Is it possible to avoid mutex? ( I think it is not possible). And please mark my considerations, am I wrong?
Igor is right: just have one thread do all the log writes. Keep in mind that the kernel has to do locking to synchronize access to the open file descriptor (which keeps track of the file position), so by doing writes from multiple cores you're causing contention inside the kernel. Even worse, you're making system calls from multiple cores, which means the kernel's code / data accesses will dirty your caches on multiple cores.
See this paper for more about the impact of making system calls on the performance of user-space code after the syscall completes. (And about data / instruction cache misses inside the kernel for infrequent syscalls). It definitely makes sense to have one thread doing all the system calls, at least all the write system calls, to keep that part of your process's footprint isolated to one core. As well as the locking contention inside the kernel.
That FlexSC paper is about an idea for batching system calls to reduce user->kernel->user transitions, but they also measure overhead for the normal synchronous system-call method. More important is the discussion of cache-pollution from making system calls.
Alternatively, if you can let multiple threads write to your log file, you could just do that and not use the queue at all.
It's not guaranteed that a large write will finish uninterrupted, but a small to medium sized write should (almost?) always copy its whole buffer on most OSes. Especially if you're writing to a file, not a pipe. IDK how Linux write() behaves when it's preempted, but I expect it usually resumes to finish the write instead of returning without having written all the requested bytes. Partial writes might be more likely when interrupted by a signal.
It is guaranteed that bytes from two write() system calls won't be mixed together; all the bytes from one will be before or after the bytes from the other. You're correct that partial writes are a potential problem, though. I forget if the glibc syscall wrapper will resume the call for you on EINTR. Although in that case, it means no bytes actually got written, or it would have returned success with a byte count.
You should test this, for partial writes and for performance. kernel-space locking might be cheaper than the overhead of your lock-free queue, but making system calls from every thread that generates log messages might be worse for performance. (And when you test this, make sure you do it with some real work happening in your user-space process, not just a loop that only calls write.)
What would happen if you call read (or write, or both) in two different thread, on the same file descriptor (lets says we are interested about a local file, and a it's a socket file descriptor), without using explicitly a synchronization mechanism?
Read and Write are syscall, so, on a single core CPU, it's probably unlucky that two read would be executed "at the same time". But with multiple cores...
What the linux kernel will do?
And let's be a bit more general : is the behavior always the same for other kernels (like BSDs) ?
Edit : According to the close documentation, we should be sure that the file descriptor isn't used by a syscall in an other thread. So it seams that explicit synchronization would be required before closing a file descriptor (and so, also around read/write if thread that may call it are still running).
Any system level (syscall) file descriptor access is thread safe in all mainstream UNIX-like OSes.
Though depending on the age they are not necessarily signal safe.
If you call read, write, accept or similar on a file descriptor from two different tasks then the kernel's internal locking mechanism will resolve contention.
For reads each byte may be only read once though and writes will go in any undefined order.
The stdio library functions fread, fwrite and co. also have by default internal locking on the control structures, though by using flags it is possible to disable that.
The comment about close is because it doesn't make a lot of sense to close a file descriptor in any situation in which some other thread might be trying to use it. So while it is 'safe' as far as the kernel is concerned, it can lead to odd, hard to diagnose corner cases.
If a thread closes a file descriptor while a second thread is trying to read from it, the second thread may get an unexpected EBADF error. Worse, if a third thread is simultaneously opening a new file, that might reallocate the same fd, and the second thread might accidentally read from the new file rather than the one it was expecting...
Have a care for those who follow in your footsteps
It's perfectly normal to protect the file descriptor with a mutex semaphore. It removes any dependence on kernel behaviour so your message boundaries are now certain. You then don't have to cite the last paragraph at the bottom of a 15,489 line manpage which explains why the mutex isn't necessary (I exaggerated, but you get my meaning)
It also makes it clear to anyone reading your code that the file descriptor is being used by more than one thread.
Fringe Benefit
There is a fringe benefit to using a mutex that way. Suppose you've got different messages coming from the different threads and some of those messages are more important than others. All you need to do is set the thread priorities to reflect their messages' importance. That way the OS will ensure that your messages will be sent in order of importance for minimal effort on your part.
The result would depend on how the threads are scheduled to run at that particular instant in time.
One way to potentially avoid undefined behavior with multi-threading is to assume that you are doing memory operations. E.g. updating a linked list or changing a variable, etc.
If you use mutex/semaphores/lock or some other synchronization mechanism, it should work as intended.
The normal implementations of a work queue I have seen involve mutexes and condition variables.
Consumer:
A) Acquires Lock
B) While Queue empty
Wait on Condition Variable (thus suspending thread and releasing lock)
C) Work object retrieved from queue
D) Lock is released
E) Do Work
F) GOTO A
Producer:
A) Acquires Lock
B) Work is added to queue
C) condition variable is signaled (potentially releasing worker)
D) Lock is released
I have been browsing some code and I saw an implementation using POSIX pipes (I have not seen this technique before).
Consumer:
A) Do select on pipe (thus suspending thread while no work)
B) Get Job from pipe
C) Do Work
D) GOTO A
Producer:
A) Write Job to pipe.
Since the producer and consumer are threads inside the same application (thus they share the same address space and thus pointers between them are valid); the jobs are written to the pipe as the address of the work object (A C++ object). So all that has to be written/read from the pipe is an 8-byte address.
My question is:
Is this a common technique (have I been sheltered from this) and what are the advantages/disadvantages?
My curiosity was piqued because the pipe technique does not involve any visible lock or signals (it may be hidden in the select). So I was wondering if this would be more efficient?
Edit:
Based on comments in #Maxim Yegorushkin answer.
Actually the "Producer" in this scenario is involved in a lot of high volume IO from lots of source in parallel. So I suspect that the original author though it very desirable that this thread did not block under any circumstances, but also did not want to high cost work in the "Producer" thread.
As it's been mentioned here already, people use pipes as queues to avoid blocking on a condition variable in a non-blocking I/O thread (i.e. the thread that handles multiple sockets and blocks on select/epoll). If an I/O thread blocks on a condition variable or a mutex it can't do non-blocking I/O any more.
Some say that writing into a pipe involves a system call and may increase latency when the volume of inter-thread events is high. That is only true for naive pipe-based queue implementations.
Advanced implementations use lock-free linked lists of jobs/events and only when the first job is added to the list the pipe is written to to wake the target I/O thread from the blocking epoll call (essentially using pipe as an edge-triggered notification mechanism but not for passing pointers to jobs/events). Because it takes a few micro-seconds to wake up a thread there may be more jobs/events posted to that thread's event queue during this time but every subsequent event doesn't require writing to the pipe, until later time when the I/O thread wakes up and consumes all events in the queue. Also, in newer Linux kernel a faster eventfd can be used instead of pipe to wake up an I/O thread.
I have done this. It's old-school but it works.
The reason I did it this way was I needed to wake up the same thread on either a job for it to do or read input from another source, so select() was involved.
It is because of select and how it is structured. As you can see in the man page
select() and pselect() allow a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become "ready" for some class of I/O operation (e.g., input possible). A file descriptor is considered ready if it is possible to perform the corresponding I/O operation (e.g., read(2)) without blocking.
The key in the above is the 'waiting until one or more of the FDs become ready'. That is the synchronization point between the two threads.
I think the answer is that the pipe technique does not give as good performance as it involves system calls, which are relatively expensive. But it does mean that all that tricky locking and sleeping and waking gets taken care of for you.
I've used both myself, but pipes only for occasional non performance critical applications.
EDIT: I suppose I might as well make the standard recommendation since nobody has come along with any clearly authoritative comments.
Standard recommendation being: Try both and benchmark them. It's the one true way to find out which performs better...
I am designing a client and server socket program.
I have a file to be transferred to the server from the client using UDP, I repeat I am using UDP.....
I am sending through UDP so, the sending rate is too fast then the receiver, so I have created 3 threads listening on the same socket, so that when one thread is doing some work(I mean writing to a file using fwrite) with the received data the other thread can recv from the client.
My 1st question is when I am using a fwrite with multiple threads I have to use locks as the file pointer is shared between the threads. I am right in thinking???
My 2nd question is "Will there be any improvement in the performance if I use multiple threads to fwrite using locks over using a single thread to do the fwrite work with no locks...??? " ... Please guide me...
I would use one thread. Saves the complications. You can buffer the data and use asynchronous writes
http://www.gnu.org/s/hello/manual/libc/Asynchronous-Reads_002fWrites.html
Cache the data before writing it.
Let the writing happen in another thread.
Doing it the way you do will require locking the socket.
Q1: yes you do need to lock it (very slow!). Why not use a separate file descriptor in each thread? the problem comes mostly with the current file position managed by that descriptor.
Q2: Neither. If data needs ordering (yes, UDP!) you should still buffer it. RAM is much faster then disk IO. Feed a stream to buffer it and handle the data in that stream in a separate thread.
Similar to Ed's answer, I'd suggest using asynchronous I/O and a single thread for your server. Though I find using Boost.Asio easier than posix AIO.
My 1st question is when I am using a fwrite with multiple threads I have to use locks as the file pointer is shared between the threads
Yes, you always have to use locks when multiple threads are writing to a single object (file, memory, etc).
My 2nd question is "Will there be any improvement in the performance if I use multiple threads to fwrite using locks over using a single thread to do the fwrite work with no locks...??? "
I would use two threads. The first thread does nothing but read from the socket and store the data in memory. The second thread reads data from memory and writes it to the file. Treat the memory buffer as a FIFO queue and use a mutex to protect the queue pointers. You'll gain nothing from a third thread. In fact, it would probably harm performance and it definitely makes the problem far more complicated.
First, try to avoid using UDP for bulk transfers. If you use UDP you have to reinvent your own flow control protocol, as well as logic for retransmission and reordering. From the sounds of it, your problems boil down to missing flow control - so why not just use TCP?
Anyway, don't put your file writing in another thread. Modern OSes will internally buffer disk writes in any case - you'll only start blocking if you're writing data much faster than the disk can keep up, in which case buffering inside your process will only buy you another few seconds at most. Switch to TCP, or implement a proper flow control mechanism.
Is there any advantage in using file writing with overlapped IO in Windows, vs just doing the file writing in a separate thread that I create?
[Edit - please note that I'm doing the file writes without system caching, ie I use the FILE_FLAG_NO_BUFFERING flag in CreateFile)
Since all writes are cached in the system cache by default, there is little advantage to doing overlapped I/O or creating a separate thread for writes at all. Most WriteFile calls are just memcpys at their core, which are lazily written to disk by the OS in an optimal fashion with other writes.
You can, of course, turn off buffered I/O via flags to CreateFile and then there are advantages to doing some sort of async I/O - but you probably didn't/shouldn't do that.
Edit
The OP has clarified they are in fact using unbuffered I/O. In that case the two suggested solutions are nearly identical; internally Windows uses a thread pool to service async I/O requests. But hypothetically Windows can be more efficient because their half is implemented in the kernel, has less context switches, etc.
One advantage with overlapped I/O is that it lets a single thread (or more usually a pool of threads) to handle an arbitrary number of I/O requests concurrently. This might not be an advantage for an single-user, desktop application, but for a server application that can get requests for I/O from many different clients it can be a major win.
Possibly because overlapped I/O in windows will tell Windows to write out the file on it's own time in background, as opposed to spawning a whole new thread and engaging in a blocking operation?