Full standard stream redirection without potential deadlock

If I want to redirect stdin, stdout, and stderr without risking deadlock (for example, the child process may need more data on stdin before it flushes stdout), do I have to spawn multiple threads, or is there another solution? Current implementation:
std::thread stderr_proc{read, io_redirector.handle(), stderr, io_redirector.stderr()};
std::thread stdout_proc{read, io_redirector.handle(), stdout, io_redirector.stdout()};
write(io_redirector.handle(), stdin, io_redirector.stdin());
int status;
if(::waitpid(pid, &status, 0) == -1) { abort(); }
stdout_proc.join();
stderr_proc.join();
Including the main thread, this implementation uses one thread per stream to avoid deadlock, but starting two new threads seems quite heavyweight, especially since this is called from one of many worker threads. It would be nice to have a single-threaded solution.
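For reference, one single-threaded approach often used for this kind of problem is to multiplex all three pipes with poll(). The following is only a minimal sketch under stated assumptions: run_and_pump and ChildOutput are hypothetical names (not part of the io_redirector shown above), the child is started with fork()/execvp(), and SIGPIPE is assumed to be ignored or blocked in case the child exits before consuming its input.

```cpp
#include <fcntl.h>
#include <poll.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>
#include <initializer_list>
#include <string>

struct ChildOutput { std::string out, err; int status = -1; };

// Hypothetical helper: runs argv, feeds `input` to its stdin and collects
// stdout/stderr, multiplexing all three pipes from the calling thread.
// Assumes SIGPIPE is ignored or blocked if the child may exit early.
ChildOutput run_and_pump(const char* const argv[], const std::string& input) {
    ChildOutput r;
    int in[2], out[2], err[2];
    if (pipe(in) == -1 || pipe(out) == -1 || pipe(err) == -1) return r;
    pid_t pid = fork();
    if (pid == -1) return r;
    if (pid == 0) {                          // child: wire the pipes to fds 0, 1, 2
        dup2(in[0], STDIN_FILENO);
        dup2(out[1], STDOUT_FILENO);
        dup2(err[1], STDERR_FILENO);
        for (int fd : {in[0], in[1], out[0], out[1], err[0], err[1]}) close(fd);
        execvp(argv[0], const_cast<char* const*>(argv));
        _exit(127);                          // exec failed
    }
    close(in[0]); close(out[1]); close(err[1]);
    fcntl(in[1], F_SETFL, O_NONBLOCK);       // a full stdin pipe must not block us

    pollfd fds[3] = {{in[1], POLLOUT, 0}, {out[0], POLLIN, 0}, {err[0], POLLIN, 0}};
    std::size_t sent = 0;
    char buf[4096];
    while (fds[0].fd != -1 || fds[1].fd != -1 || fds[2].fd != -1) {
        if (poll(fds, 3, -1) == -1) {
            if (errno == EINTR) continue;
            break;
        }
        if (fds[0].fd != -1 && fds[0].revents != 0) {
            ssize_t n = write(fds[0].fd, input.data() + sent, input.size() - sent);
            if (n > 0) sent += static_cast<std::size_t>(n);
            bool failed = n == -1 && errno != EAGAIN && errno != EINTR;
            if (failed || sent == input.size()) {   // done: signal EOF to the child
                close(fds[0].fd);
                fds[0].fd = -1;
            }
        }
        for (int i = 1; i < 3; ++i) {
            if (fds[i].fd == -1 || fds[i].revents == 0) continue;
            ssize_t n = read(fds[i].fd, buf, sizeof buf);
            if (n > 0) {
                (i == 1 ? r.out : r.err).append(buf, static_cast<std::size_t>(n));
            } else if (n == 0 || errno != EINTR) {  // EOF or hard error
                close(fds[i].fd);
                fds[i].fd = -1;
            }
        }
    }
    for (pollfd& p : fds) if (p.fd != -1) close(p.fd);
    waitpid(pid, &r.status, 0);
    return r;
}
```

Because poll() negates the need to block on any single stream, the child can stall on stdin or stdout without stalling the parent. When this is called from many worker threads at once, the pipe descriptors should probably also be created close-on-exec so they do not leak into children forked by other threads.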

Related

Capture stdout in multithreaded program

I've got a function I need to call from a third-party library which I can't control. That function evaluates a command I pass in and prints its results to stdout. In my use case, I need to capture the results into a std::string variable (not write to a file), which I can do just fine in a single-threaded example:
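int fd[2];
pid_t pid;
char *args[] = {};
pipe( fd ); // create the pipe that will carry the child's stdout
pid = fork(); // the child (pid == 0) runs the command below
if ( pid == 0 )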
{
dup2( fd[1], STDOUT_FILENO );
close( fd[0] );
close( fd[1] );
char *args[] = {};
// This func will print the results I want to stdout, but I have no control over its code.
festival_eval_command("(print utt2)");
execv( args[0], args );
}
close( fd[1] );
char buffer[1000000];
ssize_t length = read( fd[0], buffer, sizeof(buffer) - 1 );
std::string RESULT( buffer, length > 0 ? length : 0 ); // read() does not null-terminate, so use the byte count
memset(buffer, 0, sizeof buffer); // clear the buffer
// RESULT now holds the contents that would have been printed out in festival_eval_command().
Some constraints/detail:
My program is multi-threaded, so other threads may be using stdout simultaneously (my understanding is that C++ ties the output from multiple threads into stdout)
The third-party library is Festival, an open-source speech synthesis library written in LISP (which I have no experience in). I'm using its C++ API by calling: festival_eval_command("(print utt2)");
festival_eval_command appears to use stdout, not std::cout (I've tested by redirecting both in a single-threaded program and only the stdout redirection captures the output from utt2)
As far as I can tell from the source, festival_eval_command doesn't allow for an alternate file descriptor.
This function is only being run in one of the threads of my multithreaded program, so I'm only concerned about isolating the festival_eval_command output from the other threads' stdout.
My question: Is there a way I can safely retrieve just the results of festival_eval_command() from stdout in a multi-threaded program? It sounds like my options are:
Launch this function in a separate process, which has its own stdout. Do the IO redirection in that separate process, get the output I need and return it back to my main program process. Is this correct? How would I go about doing this?
Use a mutex around the festival_eval_command. I don't quite understand how mutexes interact with other threads though. If I have this example:
void do_stuff_simultaneously() {
std::cout << "Printing output to terminal..." << std::endl;
}
// main thread
void do_stuff() {
// launch a separate thread that may print to stdout
std::thread t(do_stuff_simultaneously);
// lock stdout somehow
// redirect stdout to string variable
festival_eval_command("(print utt2)");
// unlock stdout
}
Does the locking of stdout prevent do_stuff_simultaneously from accessing it? Is there a way to make stdout thread-safe like this?

However, my program is multi-threaded, so other threads may be using stdout simultaneously
The outputs of threads are going to be interleaved in a fashion you cannot control, unless each thread writes its entire output using one std::cout.write (see below for why).
Is there a way I can safely retrieve just the results of festival_eval_command() from stdout in a multi-threaded program?
Each thread must run that function in a separate process, from which you capture its stdout into a std::string s (a different one for each process).
Then, in the parent process, write that std::string to stdout with
std::cout.write(s.data(), s.size()). In common implementations a single std::cout.write call is serialized against other writers (the standard itself only guarantees that concurrent use of the synchronized standard streams is free of data races, including through operator<<), so the output of one process is typically not interleaved with anything else.

Note up front: This shows why globals are often a bad idea! Even more, library code (i.e. code intended for reuse in different contexts) should never use globals. This is also something to tell the supplier of that code: they should fix their library to provide a version that at least takes an output file descriptor instead of writing to stdout.
Here's what I would consider doing: Move the whole function execution to a separate process. That way, if multiple threads need to run it, they will start separate processes with separate outputs that they can process independently.
An alternative way is to wrap this single function. This wrapper does all the IO redirection and it (being a critical section) is guarded by a mutex, so that two threads invoking the wrapper will be serialized. However, this has downsides, because in the meantime, that code still messes with your process' standard streams (so a stray call to output something would be mixed into the function output).
A second alternative is to put the function into a wrapper process whose only goal is to serialize the use of the function. You'd start that process on demand or at the start of your application and use some form of IPC to communicate with it.
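The separate-process idea can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: capture_stdout_in_child is a hypothetical helper name, and it assumes a POSIX system. Note the caveat that fork() in a multi-threaded program copies only the calling thread, so if another thread holds a lock (for example inside malloc or stdio) at that moment, the child may hang; any state the function changes also stays in the child.

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <string>

// Hypothetical helper: runs `fn` in a forked child and returns everything
// the child wrote to its stdout.
std::string capture_stdout_in_child(const std::function<void()>& fn) {
    int fd[2];
    if (pipe(fd) == -1) return {};
    std::fflush(stdout);                  // don't let the child inherit buffered output
    pid_t pid = fork();
    if (pid == -1) { close(fd[0]); close(fd[1]); return {}; }
    if (pid == 0) {                       // child: stdout now points at the pipe
        dup2(fd[1], STDOUT_FILENO);
        close(fd[0]);
        close(fd[1]);
        fn();
        std::fflush(stdout);
        _exit(0);                         // skip atexit handlers of the parent image
    }
    close(fd[1]);                         // parent keeps only the read end
    std::string result;
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd[0], buf, sizeof buf);
        if (n > 0) result.append(buf, static_cast<std::size_t>(n));
        else if (n == 0 || errno != EINTR) break;   // EOF or hard error
    }
    close(fd[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    return result;
}
```

With this in place, a call such as capture_stdout_in_child([]{ festival_eval_command("(print utt2)"); }) would return the printed text without touching the parent's stdout, assuming the library tolerates running in a forked child.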

recvmsg in blocking mode still works after fd is invalid [duplicate]

Let's say I start a thread to receive on a port. The socket call will block on recvfrom.
Then, somehow in another thread, I close the socket.
On Windows, this will unblock recvfrom and my thread execution will terminate.
On Linux, this does not unblock recvfrom, and as a result, my thread is sitting doing nothing forever, and the thread execution does not terminate.
Can anyone help me with what's happening on Linux? When the socket is closed, I want recvfrom to unblock.
I keep reading about using select(), but I don't know how to use it for my specific case.

Call shutdown(sock, SHUT_RDWR) on the socket, then wait for the thread to exit. (i.e. pthread_join).
You would think that close() would unblock the recvfrom(), but it doesn't on Linux.
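A minimal sketch of that sequence, assuming a connected AF_UNIX stream socket pair stands in for the real socket (the function name shutdown_unblocks_recv is hypothetical, and std::thread is used in place of raw pthreads for brevity):

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <thread>

// Hypothetical demo: returns what recv() reported in the reader thread
// after the socket was shut down from the calling thread.
ssize_t shutdown_unblocks_recv() {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) return -1;
    ssize_t result = -2;
    std::thread reader([&] {
        char buf[64];
        result = recv(sv[0], buf, sizeof buf, 0);   // nothing is ever sent
    });
    shutdown(sv[0], SHUT_RDWR);   // wakes the reader: recv() reports end-of-stream
    reader.join();                // only after the join is it safe to close
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

The ordering is the point: shut down first, join the thread, and close the descriptor only once no thread can still be using it.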

Here's a sketch of a simple way to use select() to deal with this problem:
// Note: untested code, may contain typos or bugs
static volatile bool _threadGoAway = false;
void *MyThread(void *)
{
    int fd = (your socket fd);
    while(1)
    {
        struct timeval timeout = {1, 0}; // make select() return once per second
        fd_set readSet;
        FD_ZERO(&readSet);
        FD_SET(fd, &readSet);
        if (select(fd+1, &readSet, NULL, NULL, &timeout) >= 0)
        {
            if (_threadGoAway)
            {
                printf("MyThread: main thread wants me to scram, bye bye!\n");
                return NULL;
            }
            else if (FD_ISSET(fd, &readSet))
            {
                char buf[1024];
                // recvfrom() takes six arguments; pass NULL if the sender's address isn't needed
                int numBytes = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
                [...handle the received bytes here...]
            }
        }
        else perror("select");
    }
}
// To be called by the main thread at shutdown time
void MakeTheReadThreadGoAway()
{
    _threadGoAway = true;
    (void) pthread_join(_thread, NULL); // _thread is the I/O thread's pthread_t; may block for up to one second
}
A more elegant method would be to avoid using the timeout feature of select, and instead create a socket pair (using socketpair()) and have the main thread send a byte on its end of the socket pair when it wants the I/O thread to go away, and have the I/O thread exit when it receives a byte on its socket at the other end of the socketpair. I'll leave that as an exercise for the reader though. :)
It's also often a good idea to set the socket to non-blocking mode, to avoid the (small but non-zero) chance that the recvfrom() call might block even after select() indicated the socket is ready-to-read, a possibility noted in the BUGS section of the Linux select(2) man page. But blocking mode might be "good enough" for your purpose.
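The socketpair() variant mentioned above could look roughly like this. It is a sketch only: Wake and wait_for_data_or_stop are hypothetical names, and the I/O thread is assumed to call the helper in a loop in place of the timed select().

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>
#include <algorithm>

enum class Wake { Data, Stop, Error };

// Hypothetical helper: blocks (no timeout) until data_fd is readable or a
// byte arrives on stop_fd, the I/O thread's end of the socket pair.
Wake wait_for_data_or_stop(int data_fd, int stop_fd) {
    fd_set readSet;
    FD_ZERO(&readSet);
    FD_SET(data_fd, &readSet);
    FD_SET(stop_fd, &readSet);
    int maxFd = std::max(data_fd, stop_fd);
    if (select(maxFd + 1, &readSet, nullptr, nullptr, nullptr) < 0)
        return Wake::Error;
    if (FD_ISSET(stop_fd, &readSet)) return Wake::Stop;   // a stop request wins
    return Wake::Data;
}
```

At shutdown the main thread would write a single byte to its end of the pair and then call pthread_join; the I/O thread returns as soon as the helper reports Wake::Stop, so there is no one-second delay.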

Not an answer, but the Linux close man page contains the interesting quote:
It is probably unwise to close file descriptors while they may be in
use by system calls in other threads in the same process. Since a file
descriptor may be reused, there are some obscure race conditions that
may cause unintended side effects.

You are asking for the impossible. There is simply no possible way for the thread that calls close to know that the other thread is blocked in recvfrom. Try to write code that guarantees that this happens, and you will find that it is impossible.
No matter what you do, it will always be possible for the call to close to race with the call to recvfrom. The call to close changes what the socket descriptor refers to, so it can change the semantic meaning of the call to recvfrom.
There is no way for the thread that enters recvfrom to somehow signal to the thread that calls close that it is blocked (as opposed to being about to block or just entering the system call). So there is literally no possible way to ensure the behavior of close and recvfrom are predictable.
Consider the following:
A thread is about to call recvfrom, but it gets pre-empted by other things the system needs to do.
Later, another thread calls close.
A thread started by the system's I/O library calls socket and gets the same descriptor as the one you closed.
Finally, the first thread calls recvfrom, and now it's receiving from the socket the library opened.
Oops.
Don't ever do anything even remotely like this. A resource must not be released while another thread is, or might be, using it. Period.

C++ pthreads multi-tasking [windows]

For a solution to an earlier problem, I was kindly pointed to multi-threading (via pthreads).
The original problem is thus:
I have two functions, one of which is the main body, which is real-time; the other is a continually running function that blocks. The real-time function, when attempting to run the blocking function, obviously blocks, making it unresponsive to the user, which is unacceptable for a real-time process.
The original aim was to make the blocking function independent of the real-time solution (or at least, pseudo-independent), which I attempted with pthreads.
Here's a simplified version of the code:
void * RenderImages(void * Data)
{
while(1); //Simulating a permanently blocking process
return NULL;
}
int main(int ArgC, char *ArgVar[])
{
pthread_t threads[PTHREAD_NUMBER];
void *Ptr = NULL;
int I = 0;
I = pthread_create(&threads[0], NULL, RenderImages, Ptr);
if(I != 0)
{
printf("pthread_create Error!\n");
return -1;
}
I = pthread_join(threads[0],NULL);
//Doesn't reach here as pthread_join is blocking
printf("Testing!\n");
return 0;
}
The code above, however, blocks on calling pthread_join (which makes pthread nothing more than an unnecessarily complicated way of calling the function directly - which defeats the point).
My question is thus:
What functions would I have to use so that I can run a pthread for a few milliseconds, suspend it, then run another function, then go back and run the thread for a few more milliseconds, and so on?
OR
If the above isn't possible, what solution is there to the original problem?

Assuming that the "main" thread only cares when the "blocking" thread has completed its work, I think you want condition variables. Look into pthread_cond_wait and pthread_cond_signal.
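A minimal sketch of that idea, assuming the worker does a finite amount of work and then reports completion (RenderState, render_images and wait_for_render are hypothetical names, and the loop is only a stand-in for the real rendering):

```cpp
#include <pthread.h>

struct RenderState {
    pthread_mutex_t lock;
    pthread_cond_t  done_cond;
    bool done;
    int  frames;
};

// Worker: does its (stand-in) work, then publishes the result and signals.
void* render_images(void* arg) {
    RenderState* s = static_cast<RenderState*>(arg);
    int frames = 0;
    for (int i = 0; i < 1000; ++i) ++frames;
    pthread_mutex_lock(&s->lock);
    s->frames = frames;
    s->done = true;
    pthread_cond_signal(&s->done_cond);
    pthread_mutex_unlock(&s->lock);
    return nullptr;
}

// Main thread: starts the worker, is free to do other work, then waits.
int wait_for_render() {
    RenderState s;
    pthread_mutex_init(&s.lock, nullptr);
    pthread_cond_init(&s.done_cond, nullptr);
    s.done = false;
    s.frames = 0;
    pthread_t t;
    if (pthread_create(&t, nullptr, render_images, &s) != 0) return -1;
    // ... real-time work could run here without being blocked ...
    pthread_mutex_lock(&s.lock);
    while (!s.done)                       // loop guards against spurious wakeups
        pthread_cond_wait(&s.done_cond, &s.lock);
    int frames = s.frames;
    pthread_mutex_unlock(&s.lock);
    pthread_join(t, nullptr);
    pthread_cond_destroy(&s.done_cond);
    pthread_mutex_destroy(&s.lock);
    return frames;
}
```

If the main thread must stay responsive, it can check the done flag under the mutex (or use pthread_cond_timedwait) between its own iterations instead of blocking in pthread_cond_wait.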

pthread_join is the function you use to wait for a thread to end.
http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html
Use pthread_sigmask to manage suspend states:
http://man.yolinux.com/cgi-bin/man2html?cgi_command=pthread_sigmask

You can always use 3 threads, one for each function plus the main thread.

What you need is a queuing mechanism. Your main thread creates 'Jobs' and places them onto a backlog queue, where your worker thread picks them up and processes them. When a job is done, the worker thread places the completed 'Job' onto the completed queue. Your main thread can intermittently check the completed queue and, if there is a completed job, pick it up and do whatever it needs to with it. The worker thread then goes into a wait state until the next job comes along.
There are numerous ways to implement the queues. A queue can be a Unix pipe or a Windows I/O completion port, or you can roll your own with linked lists or arrays, condition variables, and mutexes.
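The roll-your-own variant might be sketched like this, using the C++11 standard library instead of raw pthreads (JobQueue and square_all are hypothetical names; the "job" here is just an int to be squared, and a negative value is an assumed stop marker):

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal queue built from a mutex and a condition variable.
template <typename T>
class JobQueue {
public:
    void push(T job) {
        {
            std::lock_guard<std::mutex> guard(m_);
            q_.push(std::move(job));
        }
        cv_.notify_one();
    }
    T pop() {                        // blocks until a job is available
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        T job = std::move(q_.front());
        q_.pop();
        return job;
    }
    bool try_pop(T& out) {           // non-blocking check for the main thread
        std::lock_guard<std::mutex> guard(m_);
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
};

// Runs one worker that squares each job; a negative job tells it to stop.
std::vector<int> square_all(const std::vector<int>& jobs) {
    JobQueue<int> backlog, completed;
    std::thread worker([&] {
        for (int j; (j = backlog.pop()) >= 0; ) completed.push(j * j);
    });
    for (int j : jobs) backlog.push(j);
    backlog.push(-1);                // stop marker
    worker.join();
    std::vector<int> results;
    for (int r; completed.try_pop(r); ) results.push_back(r);
    return results;
}
```

In a real-time main loop the main thread would call try_pop on the completed queue once per iteration rather than joining the worker.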

Correct way of checking if threads are done?

I'm using multithreading in my application with _beginthread and right now to wait until all threads are done I have global bools that get set to true as each thread completes so I'm in a while loop until then. There must be a cleaner way of doing this?
Thanks

You can use WaitForMultipleObjects to wait for the threads to finish in the primary thread.

What you want to look at is thread synchronization techniques. Luckily there is quite a bit of information on MSDN that can probably help you out. You'll likely want to use Events and WaitHandles; here's the main material on MSDN, which includes a number of examples: http://msdn.microsoft.com/en-us/library/ms681924%28v=VS.85%29.aspx
There's also some info on synchronization in MFC (which may or may not prove helpful, added for reference purposes): http://msdn.microsoft.com/en-us/library/975t8ks0%28VS.71%29.aspx
I've done a bit of searching, but I've had a hard time trying to track down some helpful info for you which doesn't use the MFC implementation. There's a good tutorial here ( http://www.informit.com/library/content.aspx?b=Visual_C_PlusPlus&seqNum=149 ) but, again, using MFC. You could take a look at the MFC implementation of mutexes though as a start.
So, you'd need to get familiar with synchronization functions and structures - all covered here on MSDN: http://msdn.microsoft.com/en-us/library/ms686679%28v=VS.85%29.aspx

Use _beginthreadex instead. Both _beginthread and _beginthreadex return a thread handle, but the thread started with _beginthread automatically closes its handle when it finishes, so using it for synchronization is not reliable.
The thread handle can be used with one of the Win32 synchronization functions, such as WaitForSingleObject or WaitForMultipleObjects.
When done, handles returned by _beginthreadex must be closed with CloseHandle().

The usual method is to keep all of the thread handles and then wait on each handle. When the handle is signaled, the thread has finished so it is removed from the set of threads. I use std::set<HANDLE> to keep track of the thread handles. There are two different methods for waiting on multiple objects in Windows:
Iterate over the set and call WaitForSingleObject with a timeout on each one
Convert the set into an array or vector and call WaitForMultipleObjects
The first sounds inefficient, but it is actually the most direct and least error prone of the two. If you need to wait for all of the threads, then use the following loop:
std::set<HANDLE> thread_handles; // contains the handle of each worker thread
while (!thread_handles.empty()) {
std::set<HANDLE> threads_left;
for (std::set<HANDLE>::iterator cur_thread=thread_handles.begin(),
last=thread_handles.end();
cur_thread != last; ++cur_thread)
{
DWORD rc = ::WaitForSingleObject(*cur_thread, some_timeout);
if (rc == WAIT_OBJECT_0) {
::CloseHandle(*cur_thread); // necessary with _beginthreadex
} else if (rc == WAIT_TIMEOUT) {
threads_left.insert(*cur_thread); // still running: wait again on the next pass
} else {
// this shouldn't happen... try to close the handle and hope
// for the best!
::CloseHandle(*cur_thread); // necessary with _beginthreadex
}
}
std::swap(threads_left, thread_handles);
}
Using WaitForMultipleObjects to wait for the threads to finish is a bit more difficult than it sounds. The following will wait for all of the threads; however, it only waits on the first MAXIMUM_WAIT_OBJECTS threads at a time. Another option is to loop over each page of threads. I'll leave that exercise to the reader ;)
DWORD large_timeout = (5 * 60 * 1000); // five minutes
std::set<HANDLE> thread_handles; // contains the handle of each worker thread
std::vector<HANDLE> ary; // WaitForMultipleObjects wants an array...
while (!thread_handles.empty()) {
ary.assign(thread_handles.begin(), thread_handles.end());
DWORD rc = ::WaitForMultipleObjects(
std::min<DWORD>(static_cast<DWORD>(ary.size()), MAXIMUM_WAIT_OBJECTS),
&ary[0], FALSE, large_timeout);
if (rc == WAIT_FAILED) {
// handle a failure case... this is usually something pretty bad
break;
} else if (rc == WAIT_TIMEOUT) {
// no thread exited in five minutes... this can be tricky since one of
// the threads beyond the first MAXIMUM_WAIT_OBJECTS may have terminated
} else {
long idx = (rc - WAIT_OBJECT_0);
if (idx >= 0 && idx < static_cast<long>(ary.size())) { // index 0 is a valid result
// the object at `idx` was signaled, this means that the
// thread has terminated.
thread_handles.erase(ary[idx]);
::CloseHandle(ary[idx]); // necessary with _beginthreadex
}
}
}
This isn't exactly pretty, but it should work. If you trust that all of your threads will exit and don't mind waiting for them, then you can use WaitForMultipleObjects(ary.size(), &ary[0], TRUE, INFINITE). This usually isn't very safe, though, since a runaway thread will cause your application to block indefinitely, and it only works if ary.size() does not exceed MAXIMUM_WAIT_OBJECTS.
Of course the other option is to find a thread pool implementation and use it instead. Writing threading code is not really a lot of fun especially once you have to support it in the wild. Consider using something like boost::thread_group instead.

You can use boost::thread objects. Call join on the object and it will wait for the thread to finish.
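As an illustration of the join pattern, here is a small sketch that uses std::thread (the standard-library counterpart of boost::thread, swapped in here so the example has no external dependency). The function name run_workers_and_wait is hypothetical.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: starts `n` workers and waits for all of them,
// replacing the "global bool per thread plus while loop" approach.
int run_workers_and_wait(int n) {
    std::atomic<int> finished{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i)
        workers.emplace_back([&finished] { ++finished; });   // stand-in for real work
    for (std::thread& t : workers)
        t.join();                    // blocks until that thread has finished
    return finished.load();
}
```

Once every join() has returned, all workers are guaranteed to be done, so no polling of flags is needed.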

Windows provides events for one thread to notify another. Out of the box Visual C++ provides support for events only inside MFC. For a portable, non-MFC version, check the thread management classes of the Boost library. They make launching and waiting for threads a lot easier, although they don't provide direct access to all of Windows API's functionality.

What is an analog for win32 file locking in boost::interprocess?

What sync mechanism should I use to give exclusive access to the text file in boost?
The file will likely be accessed by threads from only one process.

The file locking APIs are generally for inter-process locking. If you are in a single process, anything in the Boost.Thread package that suits your needs will do. Across processes, Boost.Interprocess should be used. You might want to read the following warning from Boost.Interprocess:
Caution: Synchronization limitations
If you plan to use file locks just like named mutexes, be careful, because portable file locks have synchronization limitations, mainly because different implementations (POSIX, Windows) offer different guarantees. Interprocess file locks have the following limitations:
It's unspecified if a file_lock synchronizes two threads from the same process.
It's unspecified if a process can use two file_lock objects pointing to the same file.
The first limitation comes mainly from POSIX, since a file handle is a per-process attribute and not a per-thread attribute. This means that if a thread uses a file_lock object to lock a file, other threads will see the file as locked. Windows file locking mechanism, on the other hand, offer thread-synchronization guarantees so a thread trying to lock the already locked file, would block.
The second limitation comes from the fact that file locking synchronization state is tied with a single file descriptor in Windows. This means that if two file_lock objects are created pointing to the same file, no synchronization is guaranteed. In POSIX, when two file descriptors are used to lock a file if a descriptor is closed, all file locks set by the calling process are cleared.
To sum up, if you plan to use file locking in your processes, use the following restrictions:
For each file, use a single file_lock object per process.
Use the same thread to lock and unlock a file.
If you are using a std::fstream/native file handle to write to the file while using file locks on that file, don't close the file before releasing all the locks of the file.

I suppose it is acquire_file_lock
inline bool acquire_file_lock(file_handle_t hnd)
{
struct ::flock lock;
lock.l_type = F_WRLCK;
lock.l_whence = SEEK_SET;
lock.l_start = 0;
lock.l_len = 0;
return -1 != ::fcntl(hnd, F_SETLKW, &lock);
}
It is consistent with a non-boost implementation of a lock.
struct flock fl = {F_WRLCK, SEEK_SET, 0, 0, 0 };
int fd;
fl.l_pid = getpid();
if (argc > 1)
fl.l_type = F_RDLCK;
if ((fd = open("lockdemo.c", O_RDWR)) == -1) {
perror("open");
exit(1);
}
printf("Press <RETURN> to try to get lock: ");
getchar();
printf("Trying to get lock...");
if (fcntl(fd, F_SETLKW, &fl) == -1) {
perror("fcntl");
exit(1);
}
printf("got lock\n");
printf("Press <RETURN> to

If you are sure it will only be accessed from one process, a read-write lock with file handles in thread local storage could be a solution. That would simulate the above with only one writer but several readers.
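A rough sketch of the single-process read-write lock idea, using std::shared_mutex from C++17 rather than a Boost type. SharedTextFile is a hypothetical class, and as a simplification each call opens its own stream instead of keeping handles in thread-local storage.

```cpp
#include <fstream>
#include <mutex>
#include <shared_mutex>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical wrapper: one writer at a time, any number of concurrent readers.
class SharedTextFile {
public:
    explicit SharedTextFile(std::string path) : path_(std::move(path)) {}

    void append(const std::string& line) {
        std::unique_lock<std::shared_mutex> lock(mutex_);   // exclusive access
        std::ofstream out(path_, std::ios::app);
        out << line << '\n';
    }

    std::string read_all() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);   // shared access
        std::ifstream in(path_);
        std::ostringstream text;
        text << in.rdbuf();
        return text.str();
    }

private:
    std::string path_;
    mutable std::shared_mutex mutex_;
};
```

This only protects threads that go through the same SharedTextFile object; it offers no protection against other processes, which is where file_lock would come in.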
