Share a file descriptor between threads - c++

I have many POSIX threads: two readers that read from a serial port, and others that write to the same port, all using one file descriptor. How can I share the same descriptor between them? I have synchronized read/write and write/write actions between all threads with semaphores.
Note: I'm supposing a file descriptor can be shared between threads of the same process, but my code fails with an EBUSY error when the second reader tries to read from the port. (I asked a question about this before.)
Update
This is a slightly weird situation: even if only one thread is present at runtime, any call to read() after write() returns -1 with an EBUSY error. Maybe I'm asking the wrong question. Should there be some kind of flush after each write() to make sure the device is free? Or can I somehow force write() to block?

Clearly, the EBUSY return code signals that the port is in use and should be queried again later. Your threads should just wait a little bit and try again, until the command passes.
You sort of mention in one of your comments that the system behind the port is a mechanical one, which would explain why it could take a little while for a command to get processed.
I think the "one thread to handle IO" is the best approach. Each read/write would block the thread and avoid the EBUSY problem you are witnessing. All you would have left to do is implement a command queue (very easy with std::queue or similar and just just one mutex to sync all accesses).
UPDATE: reading your update, I guess the EBUSY errors are just a sign that commands are slow to execute and finish a little while after the system call returns, to the point that even a single thread doing IO may see them. As I said at the beginning of my answer, have the thread wait a bit before reissuing its command, and that should do it.
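A minimal sketch of that retry loop, assuming the port is already open on fd and that a 10 ms back-off suits the hardware (both are placeholders):

    #include <cerrno>
    #include <unistd.h>

    // Retry a read that may fail with EBUSY while the device is still
    // processing the previous command.
    ssize_t read_retry(int fd, void* buf, size_t len)
    {
        for (;;) {
            ssize_t n = read(fd, buf, len);
            if (n >= 0 || errno != EBUSY)
                return n;          // success, or a real error
            usleep(10 * 1000);     // port busy: back off 10 ms and try again
        }
    }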

Open the file with the O_NONBLOCK flag.
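For example (a sketch; "/dev/ttyS0" is only a placeholder path):

    #include <fcntl.h>

    // Non-blocking open: reads now return -1/EAGAIN instead of blocking.
    int fd = open("/dev/ttyS0", O_RDWR | O_NOCTTY | O_NONBLOCK);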

Related

fopen and fwrite to the same file from multiple threads

This is similar to, but a bit different from, existing questions. Say I have many threads that open the same file, but they all do their own fopen and maintain their own FILE pointer.
a) is it necessary to lock fwrite calls if they have their own FILE ptrs?
b) if it is necessary, is locking around fwrite enough or will they potentially flush at different times and end up intermingling when they flush? If yes, would locking on fwrite and then fflush cover it?
This question cannot be answered at the programming-language level. As far as the language is concerned, those file handles are completely independent objects, and whatever you do with one has no effect whatsoever on another.
The question is about the operating system: can it handle multiple write operations to the same underlying file at the same time? In other words, are those writes atomic? I can't say for all of them, but in Linux, for example, writes to a pipe of no more than PIPE_BUF bytes are atomic.
For a quick fix, yes, you can put a lock around the I/O part. That'd work, I guarantee it. As for flushing the I/O cache, I'd recommend not doing that. It's always best to let the OS handle I/O timing, because the kernel knows best what's going on. You won't see the effect immediately after calling flush anyway, just like with other flush-style operations (Java GC, glFlush and so on). If you choose to stick with this option, be mindful of the start and end points of the concurrent I/O: you wouldn't want a case where the main thread closes the file while another worker thread is still trying to do I/O on it.
The general solution to this problem is to create a thread that handles the file exclusively. If other threads need to read from or write to the file, they must ask that thread to do it for them. This is tricky, I know; you'd need to compose a simple protocol and a sync mechanism, but in a nutshell it goes like this (a minimal sketch follows the steps below):
Prep a queue, a cv (condition variable), and a lock. Create a thread and open the file; it doesn't matter who opens it.
The thread spawns and waits for the queue to be filled.
Other threads send I/O requests to the thread. A request includes the data for the file and an op code.
The thread handles the requests from the queue. This is where the real I/O happens.
You could use an anonymous FIFO instead of a queue, or skip the op code if the file is write-only.
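Here is that sketch in C++11; Request, Op, io_loop and submit are invented names, and error handling and replies to the requesting threads are omitted:

    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <string>
    #include <unistd.h>

    enum class Op { Read, Write, Quit };          // the "op code"

    struct Request { Op op; std::string data; };  // data for the file

    std::queue<Request> q;
    std::mutex m;
    std::condition_variable cv;

    // The dedicated I/O thread: the only place real I/O happens.
    // Spawn it with e.g.: std::thread io(io_loop, fd);
    void io_loop(int fd)
    {
        for (;;) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [] { return !q.empty(); });   // wait for requests
            Request r = q.front();
            q.pop();
            lk.unlock();                              // do I/O outside the lock
            if (r.op == Op::Quit) return;
            if (r.op == Op::Write)
                write(fd, r.data.data(), r.data.size());
            // Op::Read handling would reply to the requester here
        }
    }

    // Called by any thread that wants I/O done on its behalf.
    void submit(Request r)
    {
        { std::lock_guard<std::mutex> lk(m); q.push(std::move(r)); }
        cv.notify_one();
    }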
Unlike network I/O, modern OSes can't do regular-file I/O in a non-blocking manner (O_NONBLOCK has no effect on regular files), so expect a significant blocking time (I/O wait). There's also the problem that the queue fills up too quickly and eats a lot of memory when I/O is relatively slow, and there will be cases where the whole program has to wait for the I/O to complete before terminating itself. Not much you can do about that. You could close the file from another thread while I/O is in progress on Linux (close() is MT-safe); I don't know how that would work on other OSes.
There are alternatives like async file I/O or overlapped I/O, which involve signal handling or callbacks. Using these doesn't require creating a thread, but each has pros and cons, mostly regarding portability.

Are file descriptors thread safe [duplicate]

What would happen if you call read (or write, or both) from two different threads on the same file descriptor (let's say we are interested in both a local file and a socket file descriptor), without explicitly using any synchronization mechanism?
read and write are syscalls, so on a single-core CPU it's probably unlikely that two reads would be executed "at the same time". But with multiple cores...
What will the Linux kernel do?
And to be a bit more general: is the behavior always the same on other kernels (like the BSDs)?
Edit: according to the close documentation, we should be sure that the file descriptor isn't being used by a syscall in another thread. So it seems that explicit synchronization would be required before closing a file descriptor (and therefore also around read/write, if threads that may call them are still running).
Any system level (syscall) file descriptor access is thread safe in all mainstream UNIX-like OSes.
Though, depending on their age, they are not necessarily signal safe.
If you call read, write, accept or similar on a file descriptor from two different tasks, the kernel's internal locking mechanism will resolve contention.
Each byte will only be read once, though, and writes may go out in an undefined order.
The stdio library functions fread, fwrite and co. also have by default internal locking on the control structures, though by using flags it is possible to disable that.
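For example, POSIX's flockfile/funlockfile take that same internal lock explicitly, which matters when several threads do share one FILE* and a write plus its flush must land as one unit (a sketch):

    #include <stdio.h>

    // Hold stdio's per-stream lock across both calls so the record and
    // its flush cannot interleave with another thread's output.
    void write_record(FILE* f, const char* buf, size_t len)
    {
        flockfile(f);
        fwrite(buf, 1, len, f);
        fflush(f);
        funlockfile(f);
    }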
The comment about close is because it doesn't make a lot of sense to close a file descriptor in any situation in which some other thread might be trying to use it. So while it is 'safe' as far as the kernel is concerned, it can lead to odd, hard to diagnose corner cases.
If a thread closes a file descriptor while a second thread is trying to read from it, the second thread may get an unexpected EBADF error. Worse, if a third thread is simultaneously opening a new file, that might reallocate the same fd, and the second thread might accidentally read from the new file rather than the one it was expecting...
Have a care for those who follow in your footsteps
It's perfectly normal to protect the file descriptor with a mutex. It removes any dependence on kernel behaviour, so your message boundaries are now certain. You then don't have to cite the last paragraph at the bottom of a 15,489-line manpage which explains why the mutex isn't necessary (I exaggerated, but you get my meaning).
It also makes it clear to anyone reading your code that the file descriptor is being used by more than one thread.
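A minimal sketch of that arrangement (the names are illustrative):

    #include <mutex>
    #include <unistd.h>

    std::mutex fd_mutex;   // guards all use of the shared descriptor

    // Every thread sends through this, so message boundaries are certain.
    void send_message(int fd, const void* buf, size_t len)
    {
        std::lock_guard<std::mutex> lock(fd_mutex);
        write(fd, buf, len);   // error/short-write handling omitted
    }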
Fringe Benefit
There is a fringe benefit to using a mutex that way. Suppose you've got different messages coming from the different threads and some of those messages are more important than others. All you need to do is set the thread priorities to reflect their messages' importance. That way the OS will ensure that your messages will be sent in order of importance for minimal effort on your part.
The result would depend on how the threads are scheduled to run at that particular instant in time.
One way to avoid potential undefined behavior with multi-threading is to treat the descriptor operations as if they were ordinary memory operations, e.g. updating a linked list or changing a variable.
If you use mutex/semaphores/lock or some other synchronization mechanism, it should work as intended.

Waiting on a condition (pthread_cond_wait) and a socket change (select) simultaneously

I'm writing a POSIX-compatible multi-threaded server in C/C++ that must be able to accept, read from, and write to a large number of connections asynchronously. The server has several worker threads which perform tasks and occasionally (and unpredictably) queue data to be written to the sockets. Data is also occasionally (and unpredictably) written to the sockets by the clients, so the server must also read asynchronously. One obvious way of doing this is to give each connection a thread which reads and writes from/to its socket; this is ugly, though, since each connection may persist for a long time and the server would thus have to hold hundreds or thousands of threads just to keep track of connections.
A better approach would be to have a single thread that handled all communications using the select()/pselect() functions. I.e., a single thread waits on any socket to be readable, then spawns a job to process the input that will be handled by a pool of other threads whenever input is available. Whenever the other worker threads produce output for a connection, it gets queued, and the communication thread waits for that socket to be writable before writing it.
The problem with this is that the communication thread may be waiting in the select() or pselect() function when output is queued by the worker threads of the server. It's possible that, if no input arrives for several seconds or minutes, a queued chunk of output will just wait for the communication thread to be done select()ing. This shouldn't happen, however; data should be written as soon as possible.
Right now I see a couple of solutions to this that are thread-safe. One is to have the communication thread busy-wait on input and update the list of sockets it waits on for writing every tenth of a second or so. This isn't optimal since it involves busy-waiting, but it will work. Another option is to use pselect() and send the USR1 signal (or something equivalent) whenever new output has been queued, allowing the communication thread to update the list of sockets it is waiting on for writable status immediately. I prefer the latter here, but still dislike using a signal for something that should be a condition (pthread_cond_t). Yet another option would be to include, in the list of file descriptors on which select() is waiting, a dummy file to which we write a single byte whenever a socket needs to be added to the writable fd_set; this would wake up the communication thread, because that particular dummy file would then be readable, allowing it to immediately update its writable fd_set.
I feel, intuitively, that the second approach (with the signal) is the 'most correct' way to program the server, but I'm curious whether anyone knows which of the above is the most efficient generally speaking, whether either of the above will cause race conditions that I'm not aware of, or whether anyone knows of a more general solution to this problem. What I really want is a pthread_cond_wait_and_select() function that lets the comm thread wait on both a change in sockets and a signal from a condition.
Thanks in advance.
This is a fairly common problem.
One often-used solution is to have pipes as a communication mechanism from worker threads back to the I/O thread. Having completed its task, a worker thread writes the pointer to the result into the pipe. The I/O thread waits on the read end of the pipe along with the other sockets and file descriptors, and once the pipe is ready for reading it wakes up, retrieves the pointer to the result and proceeds with pushing the result into the client connection in non-blocking mode.
Note that since pipe reads and writes of no more than PIPE_BUF bytes are atomic, the pointers get written and read in one shot. One can even have multiple worker threads writing pointers into the same pipe because of the atomicity guarantee.
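A sketch of that mechanism, assuming a Result type and a pipe created once with pipe(notify_pipe):

    #include <unistd.h>

    struct Result { /* whatever the worker produced */ };

    int notify_pipe[2];   // initialize once with pipe(notify_pipe)

    // Worker thread: hand a result to the I/O thread. A pointer is far
    // smaller than PIPE_BUF, so the write is atomic even with many writers.
    void post_result(Result* r)
    {
        write(notify_pipe[1], &r, sizeof r);
    }

    // I/O thread, after select() reports notify_pipe[0] readable:
    Result* take_result()
    {
        Result* r = nullptr;
        read(notify_pipe[0], &r, sizeof r);
        return r;
    }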
Unfortunately, the best way to do this is different for each platform. The canonical, portable way to do it is to have your I/O thread block in poll. If you need to get the I/O thread to leave poll, you send a single byte on a pipe that the thread is polling. That will cause the thread to exit from poll immediately.
On Linux, epoll is the best way. On BSD-derived operating systems (including OSX, I think), kqueue. On Solaris, it used to be /dev/poll and there's something else now whose name I forget.
You may just want to consider using a library like libevent or Boost.Asio. They give you the best I/O model on each platform they support.
Your second approach is the cleaner way to go. It's totally normal to have things like select or epoll include custom events in your list; this is what we do on my current project to handle such events. We also use timers (timerfd_create on Linux) for periodic events.
On Linux, eventfd lets you create such arbitrary user events for this purpose, so I'd say it is quite accepted practice. For POSIX-only functions, well, hmm, perhaps a pipe or socketpair, which I've also seen used.
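For illustration, a minimal eventfd wake-up might look like this (Linux-specific; efd would be added to the I/O thread's select()/epoll set):

    #include <stdint.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    int efd = eventfd(0, 0);   // create once, then watch efd for readability

    // Worker thread: wake the I/O thread.
    void wake() { uint64_t one = 1; write(efd, &one, sizeof one); }

    // I/O thread, when efd is reported readable: drain the counter,
    // then rescan the output queues.
    void drain() { uint64_t n; read(efd, &n, sizeof n); }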
Busy-polling is not a good option. First, you'll be scanning memory that is being used by other threads, causing CPU memory contention. Second, you'll constantly be returning to your select call, which will create a huge number of system calls and context switches and hurt overall system performance.

Select() system call in threads?

I am reading data from multiple serial ports. At present I am using a custom signal handler (by setting sa_handler) to compare and wake threads based on file descriptor information. I was searching for a way to give individual threads unique signal handlers, and in this regard I found that the select system call is to be used.
Now I have following questions:
If I am using a thread (Qt) then where do I put the select system call to monitor the serial port?
Is the select system call thread safe?
Is it CPU-intensive, given that there are many things happening in my app, including GUI updates?
Please do not mind if you find these questions ridiculous. I have never used such a mechanism for serial communication.
The POSIX specification for select is the place to look for its definition. I personally recommend poll instead: it has a better interface and can handle any number of descriptors, rather than a system-defined limit.
If I understand correctly, you're waking threads based on the state of certain descriptors. A better way would be for each thread to have its own descriptor and call select itself. You see, select does not modify the system state, and as long as you use thread-local variables it'll be safe. However, you will definitely want to ensure you do not close a descriptor that a thread depends on.
Using select/poll with a timeout leaves the "waiting" up to the kernel side, which means the thread is usually put to sleep. While the thread is sleeping it is not using any CPU time. A while/for loop around a select call with a zero timeout, on the other hand, will give you higher CPU usage, as you're constantly spinning in the loop.
Hope this helps.
EDIT: Also, select/poll can have unpredictable results when working with the same descriptor in multiple threads. The simple reason for this is that the first thread might be woken up because the descriptor is ready for reading, but the second thread has to wait for the next "available for reading" wakeup.
As long as you're not selecting on the same descriptor in multiple threads you should not have a problem.
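As an illustration, a per-thread poll loop with a timeout might look like this (reader_loop and the one-second timeout are arbitrary choices):

    #include <poll.h>
    #include <unistd.h>

    // Each thread polls only its own descriptor. The kernel sleeps the
    // thread until data arrives or 1000 ms elapse; no busy loop.
    void reader_loop(int fd, volatile bool& stop)
    {
        struct pollfd p = { fd, POLLIN, 0 };
        char buf[256];
        while (!stop) {
            int rc = poll(&p, 1, 1000);      // 1 s timeout to recheck 'stop'
            if (rc > 0 && (p.revents & POLLIN))
                read(fd, buf, sizeof buf);   // error handling omitted
        }
    }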
It is a system call -- it should be thread safe, I think.
I have not done this before, but I would be rather surprised if it were not. How CPU-intensive select() is depends, in my opinion, largely on the number of file handles you are waiting for. select() is mostly used to wait for a number (>1) of file handles to become ready.
It should also be mentioned that select() should not be used to busy-poll the file handles, for performance reasons. Normal usage is: you have your work done and some time may elapse before the next thing happens, so you suspend your thread in select() and let the kernel run something else. select() normally suspends the caller. How this works together with threads depends on the threading model: with system (kernel-level) threads only the calling thread is suspended, while with user threads the kernel does not know about them and would suspend the whole process.

Kill a blocked Boost::Thread

I am writing an application which blocks on input from two istreams.
Reading from either istream is a synchronous (blocking) call, so, I decided to create two Boost::threads to do the reading.
Either one of these threads can get to the "end" (based on some input received), and once the "end" is reached, both input streams stop receiving. Unfortunately, I cannot know which will do so.
Thus, I cannot join() on both threads, because only one thread (cannot be predetermined which one) will actually return (unblock).
I must somehow force the other to exit, but it is blocked waiting for input, so it cannot itself decide it is time to return (condition variables or what not).
Is there a way to either:
Send a signal to a boost::thread, or
Force an istream to "fail", or
Kill a Boost::thread?
Note:
One of the istreams is cin
I am trying to restart the process, so I cannot close the input streams in a way that prohibits resetting them.
Edit:
I do know when the "end" is reached, and I do know which thread has successfully finished and which needs to be killed. It's the killing I need to figure out (or a different strategy for reading from an istream).
I need both threads to exit and clean up properly :(
Thanks!
I don't think there is a way to do it cross-platform, but pthread_cancel should be what you are looking for: with a boost thread you can get the native_handle and call pthread_cancel on it.
In addition, a better way might be to use the Boost.Asio equivalent of a select call on multiple files. That way one thread will be blocked waiting for input, but the input could come from either stream. I don't know how easy it is to do something like this with iostreams, though.
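If you do go the pthread_cancel route, a sketch might look like this (read_loop is a placeholder for your blocking reader; note that cancellation is POSIX-only, and clean stack unwinding on cancel is not guaranteed on every platform):

    #include <boost/thread.hpp>
    #include <pthread.h>

    void stop_reader(boost::thread& reader)   // reader runs read_loop
    {
        // read() is a cancellation point, so the thread dies there.
        pthread_cancel(reader.native_handle());
        reader.join();
    }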
Yes there is!
boost::thread::interrupt() will do the job to your specifications.
It will cause the targeted thread to throw an exception. Assuming it's uncaught, the stack will unwind properly, destroying all resources and terminating thread execution.
The interruption isn't instant. (The wrong thread is running at that moment, anyway.)
It happens only at predefined interruption points; the most convenient for you would probably be boost::this_thread::sleep(), which you could have that thread call periodically.
If a boost thread is blocking on an I/O operation (e.g. cin >> whatever), interrupt() will not kill the thread: blocking I/O is not an interruption point. Catch-22.
Well, on Linux I use pthread_kill(thread, SIGUSR1), as a signal interrupts blocking I/O. There is no such call on Windows, as I discovered when porting my code, only a deprecated way to interrupt a blocking socket read. On Windows you have to explicitly define an event that will interrupt your blocking call. So there is (AFAIK) no generic way to interrupt blocking I/O.
The Boost.Thread design handles this by managing well-identified interruption points. I don't know Boost.Asio well, and it seems that you don't want to rely on it anyway. If you don't want to refactor to the non-blocking paradigm, what you can do is use something between non-blocking (polling) and blocking I/O. That is, do something like this (pseudo-code):
    // io.blockingCall(timeout) stands for any blocking call that takes a
    // timeout, e.g. a read guarded by poll()/select() with a timeout.
    while (!stopped && !interrupted)
    {
        io.blockingCall(timeout);        // wakes up at least every 'timeout'
        if (!stopped && !interrupted)    // flags may have changed while blocked
        {
            doSomething();
        }
    }
Then you interrupt your two threads and join them ...
Perhaps it is simpler in your case? If you have a master thread that knows one thread has ended, you just have to close the I/O of the other thread.
Edit:
By the way I'm interested in the final solution you have ...
I had a similar issue myself and have reached this solution, which some other readers of this question might find useful:
Assuming that you are using a condition variable with a wait() call, it is important for you to know that in Boost, the wait() statement is a natural interruption point. So just put a try/catch block around the code with the wait statement and allow the function to terminate normally in your catch block.
Now, assuming you have a container with your thread pointers, iterate over your thread pointers and call interrupt() on each thread, followed by join().
Now all of your threads will terminate gracefully and any Boost-related memory cleanup should work cleanly.
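A condensed sketch of that pattern (the work queue and worker function are illustrative):

    #include <boost/thread.hpp>
    #include <queue>

    std::queue<int> work;
    boost::mutex m;
    boost::condition_variable cv;

    void worker()
    {
        try {
            boost::unique_lock<boost::mutex> lk(m);
            for (;;) {
                cv.wait(lk, [] { return !work.empty(); }); // interruption point
                work.pop();                                // ...process item...
            }
        } catch (boost::thread_interrupted&) {
            // interrupt() arrived while we waited: return normally so the
            // stack unwinds and Boost cleans up.
        }
    }

    // Elsewhere, for each thread pointer t in your container:
    //     t->interrupt();
    //     t->join();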
Rather than trying to kill your thread, you can always do a timed join on one thread instead, and if that fails, join the other one instead. (Assuming you will always be able to join at least one of your two threads.)
In boost::thread you're looking for the timed_join function.
The correct answer, however, would be to use non-blocking I/O with timed waits, giving you the flow structure of synchronous I/O with the non-blocking behaviour of asynchronous I/O.
You talk about reading from an istream, but an istream is only an interface. For stdin, you can just fclose(stdin) to interrupt the read. As for the other stream, it depends on where you're reading from...
It seems that threads are not helping you do what you want in a simple way. If Boost.Asio is not to your liking, consider using select().
The idea is to get two file descriptors and use select() to tell you which of them has input available. The file descriptor for cin is typically STDIN_FILENO; how to get the other one depends on your specifics (if it's a file, just open() it instead of using ifstream).
Call select() in a loop to find out which input to read, and when you want to stop, just break out of the loop.
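A sketch of that loop, assuming the second stream's descriptor is already in other_fd:

    #include <sys/select.h>
    #include <unistd.h>

    // One thread multiplexing cin's descriptor and another fd.
    void input_loop(int other_fd, volatile bool& done)
    {
        char buf[256];
        while (!done) {
            fd_set rd;
            FD_ZERO(&rd);
            FD_SET(STDIN_FILENO, &rd);
            FD_SET(other_fd, &rd);
            int maxfd = other_fd > STDIN_FILENO ? other_fd : STDIN_FILENO;
            if (select(maxfd + 1, &rd, nullptr, nullptr, nullptr) <= 0)
                continue;                 // interrupted or error: retry
            if (FD_ISSET(STDIN_FILENO, &rd)) read(STDIN_FILENO, buf, sizeof buf);
            if (FD_ISSET(other_fd, &rd))     read(other_fd, buf, sizeof buf);
            // check here whether the "end" was reached, then set done
        }
    }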
Under Windows, use QueueUserAPC to queue a proc which throws an exception. That approach works fine for me.
HOWEVER: I've just found that boost mutexes etc. are not "alertable" on Win32, so QueueUserAPC cannot interrupt them.
Very late, but on Windows (and its precursors like VMS or RSX, for those who remember such things) I'd use something like ReadFileEx with a completion routine that signals when finished, and CancelIo if the read needs to be cancelled early.
Linux/BSD has an entirely different underlying API, which isn't as flexible. Using pthread_kill to send a signal works for me; it will interrupt the blocking read/open operation.
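A sketch of that signal trick, assuming SIGUSR1 is otherwise unused; the handler must be installed without SA_RESTART so the blocked call fails with EINTR instead of restarting:

    #include <pthread.h>
    #include <signal.h>

    void on_usr1(int) {}                 // deliberately does nothing

    void setup()
    {
        struct sigaction sa = {};
        sa.sa_handler = on_usr1;         // no SA_RESTART: read() returns EINTR
        sigaction(SIGUSR1, &sa, nullptr);
    }

    void interrupt_thread(pthread_t t)
    {
        pthread_kill(t, SIGUSR1);        // kicks the target out of its blocking call
    }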
It's worth implementing different code in this area for each platform, IMHO.