Why is it prohibited to use fork without exec in mac? - c++

My question is quite simple.
On Linux it is quite popular to use fork without exec.
However, I have found that on macOS this is discouraged (see the fork manual):
https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/fork.2.html
There are limits to what you can do in the child process. To be totally safe you should restrict
yourself to only executing async-signal safe operations until such time as one of the exec functions is
called. All APIs, including global data symbols, in any framework or library should be assumed to be
unsafe after a fork() unless explicitly documented to be safe or async-signal safe. If you need to use
these frameworks in the child process, you must exec. In this situation it is reasonable to exec yourself.
This seems strange to me. What is the reason? Is it possible to work around it?

It's okay to use fork on OS X, under the same restrictions that apply to fork on Linux. Linux has similar caveats.
If you are building an application that is single-threaded and relies on core UNIX APIs and design philosophy, you should be fine. If you are linking to additional libraries, you should be intimately familiar with their behavior. Imagine linking to a library that started a background thread – after forking you'd be in a potentially undefined state, given only the thread that called fork is cloned.
OS X offers some awesome features for taking advantage of multiple cores, such as Grand Central Dispatch that may be worth considering.
I'd recommend you read this article by Mike Ash on fork safety under OS X.
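To illustrate the single-threaded case, here is a minimal sketch of fork without exec where the child sticks to arithmetic and async-signal-safe calls (close, write, _exit) and hands its result back through a pipe. The function name sum_in_child is made up for this example, and it assumes the process has no other threads at the time of the fork.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: computes 1 + 2 + ... + n in a forked child (no exec)
// and sends the result back to the parent through a pipe.
// Returns -1 on any failure.
long long sum_in_child(int n) {
    assert(n >= 0);

    int fds[2];
    if (pipe(fds) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        // Child: plain arithmetic plus close/write/_exit only.
        // No malloc, no iostream, no framework calls.
        close(fds[0]);
        long long sum = 0;
        for (int i = 1; i <= n; ++i) sum += i;
        ssize_t written = write(fds[1], &sum, sizeof sum);
        _exit(written == static_cast<ssize_t>(sizeof sum) ? 0 : 1);
    }

    // Parent: read the result, then reap the child.
    close(fds[1]);
    long long result = 0;
    ssize_t got = read(fds[0], &result, sizeof result);
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    if (got != static_cast<ssize_t>(sizeof result)) return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return -1;
    return result;
}
```

Calling sum_in_child(100) from main should give 5050. The child uses _exit rather than exit so that it does not run atexit handlers or flush stdio buffers it inherited from the parent.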

Related

Is it possible to use fork in modern C++?

Traditional C++ was very straightforward and only a library intended to create threads (like pthread) gave rise to other threads.
Modern C++ is much closer to Java with many functions being thread based, with thread pools ready to run asynchronous jobs, etc. It's much more likely that some library, including the standard library, uses threads to compute asynchronously some function, or sets up the infrastructure to do so even if it isn't used.
In that context, is it ever safe to use functions with global impact like fork?
The answer to this question, like almost everything else in C++, is "it depends".
If we assume there are other threads in the program, and those threads are synchronizing with each other, calling fork is dangerous. This is because fork does not wait for all threads to reach a synchronization point (e.g. a mutex release) before forking the process. In the forked process, only the thread that called fork will be present; the others simply do not exist in the child, and may have been cut off in the middle of a critical section. This means any memory shared with other threads, that wasn't a std::atomic<int> or similar, is in an undefined state.
If your forked process reads from this memory, or indeed expects the other threads to be running, it is likely not going to work reliably. However, most uses of fork actually have effectively no preconditions on program state. That is because the most common thing to do is to immediately call execv or similar to spawn a subprocess. In this case your entire process is kinda "replaced" by some new process, and all memory from your old process is discarded.
tl;dr - Calling fork may not be safe in multithreaded programs. Sometimes it is safe, like if no threads have been spawned yet, or execv is called immediately. If you are using fork for something else, consider using a thread instead.
See the fork man page and this helpful blog post for the nitty-gritty.
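A small sketch of the hazard described above, assuming a typical pthread-based implementation of std::mutex: a worker thread holds a mutex while the main thread forks, and the child then finds the mutex still locked even though the thread that owns it does not exist there. The function name child_sees_orphaned_lock is made up for the demo. Calling try_lock in the child is itself outside the async-signal-safe set; it is done here only to observe the copied state, not as something to imitate.

```cpp
#include <cassert>
#include <atomic>
#include <mutex>
#include <thread>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical demo: returns true if the forked child found `m` still
// locked, although the thread that locked it was not cloned into the child.
bool child_sees_orphaned_lock() {
    std::mutex m;
    std::atomic<bool> locked{false};
    std::atomic<bool> release{false};

    // Worker thread takes the lock and holds it across the fork.
    std::thread worker([&] {
        std::lock_guard<std::mutex> guard(m);
        locked = true;
        while (!release) std::this_thread::yield();
    });
    while (!locked) std::this_thread::yield();

    pid_t pid = fork();
    if (pid == 0) {
        // Child: only the forking thread exists here. The mutex memory was
        // copied in its locked state and nobody is left to unlock it, so
        // m.lock() would block forever. try_lock just reports that state.
        bool acquired = m.try_lock();
        _exit(acquired ? 0 : 1);
    }

    // Parent: let the worker finish, whether or not fork succeeded.
    release = true;
    worker.join();

    if (pid < 0) return false;
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 1;
}
```

If the child had called exec right after fork instead, the copied mutex would have been discarded along with the rest of the old address space, which is why that pattern has no such precondition.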
To add to peteigel's answer, my advice is - if you want to fork, do it very early, before any other threads than the main thread are started.
In general, anything you can do in C, you can do in C++, since C++, especially on Linux with clang or gcc extensions, is pretty darn close to a perfect superset of C. Of course, when there are good portable APIs in std C++, use them. The canonical example is preferring std::thread over pthreads C API.
One caveat is pthread_cancel, which must be avoided on C++ due to exceptions. See e.g. pthread cancel harmful on C++.
Here is another link that explains the problem:
pthread_cancel while in destructor
In general, C++ cleanup handling is easier and more elegant than in C, since RAII is part and parcel of C++ culture, and C does not have destructors.

What is a good and optimized way to run shell command in a pthread?

Basically, I want to compress a file in a pthread thread using gzip.
The first solution that pops up in mind and on Google is to call system().
What does the stackoverflow community suggest?
Shall I use system() in a pthread?
Or shall I myself just fork and exec in pthread? But since pthread is a thread, is it advisable to do a fork() and exec() in pthread thread?
Or what is the better approach than the above?
You shouldn't use system for this, but not because it's expensive. (For any file that is worth bothering to compress, the overhead of any technique for invoking a background gzip compression is negligible relative to the cost of doing the compression itself.) The reason you shouldn't use system is, system invokes a shell, and that means you have to worry about quoting the arguments. If you use fork and execvp instead, you don't have to worry about quoting anything.
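As a rough sketch of the fork and execvp approach: the helper below (run_command is a made-up name) passes each argument to the program as-is, so spaces or shell metacharacters in a file name need no quoting. It assumes a POSIX system and leaves out the extra care needed around signals and other threads.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs args[0] with the given arguments, without a
// shell, and returns its exit status (-1 on failure or abnormal exit).
int run_command(const std::vector<std::string>& args) {
    assert(!args.empty());

    // Build the argv array before forking so the child does not allocate.
    std::vector<char*> argv;
    for (const std::string& a : args)
        argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execvp(argv[0], argv.data());
        _exit(127);  // only reached if exec failed
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the gzip case this would be called as something like run_command({"gzip", path}), where path is whatever file name you have, with no escaping applied to it.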
The problems associated with mixing fork and wait with threads are real, but they are tractable. If your OS has posix_spawn, it will take care of some of those problems for you. I don't normally recommend posix_spawn because code that uses it is, in general, harder to maintain than code that uses fork, but for this application it should be OK. I cannot fit a comprehensive guide to mixing fork and wait with threads into this answer box.
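For comparison, a minimal posix_spawn sketch, assuming your OS provides it. The name spawn_and_wait is made up; the file-actions and attributes arguments are left null, which means the child inherits descriptors and signal settings unchanged.

```cpp
#include <cassert>
#include <spawn.h>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>

extern char** environ;  // the current process environment

// Hypothetical helper: spawns args[0] (looked up via PATH), waits for it,
// and returns its exit status (-1 on failure or abnormal exit).
int spawn_and_wait(const std::vector<std::string>& args) {
    assert(!args.empty());

    std::vector<char*> argv;
    for (const std::string& a : args)
        argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    pid_t pid = 0;
    // Null file actions and attributes: inherit everything as-is.
    int rc = posix_spawnp(&pid, argv[0], nullptr, nullptr,
                          argv.data(), environ);
    if (rc != 0) return -1;  // rc is an errno value, not -1

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that posix_spawnp reports errors through its return value rather than through errno, which is easy to get wrong when converting fork-based code.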
An alternative you should consider is compressing the data in the thread that would be waiting for the gzip process, using zlib. This avoids the problems of mixing fork and wait with threads, but it adds a library dependency to your program, which may not be as convenient as relying on an external gzip executable.
Start with system call in another thread and only add complexity when needed.
The extra complexity of doing fork/exec or using a zip library is only worth the effort if system is not sufficient for some reason (e.g. you want to redirect both stdin and stdout of the child process into your parent process, or you want to compress a file in memory for sending it over the network without writing new files).

GNU pth vs. pthread

I want to build a portable and efficient server in C++; it will have lots of clients trying to connect at the same time, so it must be capable of handling requests in parallel.
I have been trying to find documentation, guides... etc. for multithreading. I have found a lot about POSIX Pthread, but almost nothing for GNU Pth (apart from the official manual in gnu.org).
So, can anyone explain to me the difference between POSIX Pthread and GNU Pth? Please, I want the response not to be a copy of Wikipedia's contents (keep in mind that I'm an absolute newbie to multithreading). I want my server to be portable and efficient across all *nix-based systems, staying away from heavy fork()s.
Thanks for your help.
PS: I think it's better to ask this here: what about Windows? Are Pthreads or Pth an option there? If not, what is the API for that operating system?
Use Pthreads, it's much more widely used, so there is far more information and support available for it. I've never met anyone who actually uses GNU Pth. Or better yet if you are using C++11 use std::thread and if not then use boost::thread.
So, can anyone explain to me the difference between POSIX Pthread and GNU Pth?
Pthreads is a cross-platform standard for pre-emptible multithreading, meaning (usually) the OS kernel manages the threads and the OS scheduler decides when each thread gets to run (if you have a single core only one thread can run at a time, if you have multiple cores multiple threads can run at a time). The OS scheduler could pause any thread at (almost) any time and let another thread run, so each thread gets a limited "time slice" and then other threads get to run.
GNU Pth is a non-preemptible user-space threading library, meaning the threads and which ones run at which time are decided in user-space not by the kernel. Some people say programs using non-preemptible threading libraries are easier to understand, because your thread won't get paused at arbitrary times for another thread to run.
I want my server to be portable and efficient across all *nix-based systems, staying away from heavy fork()s.
fork is not heavy on UNIX.
what about Windows? Are Pthreads or Pth an option there? If not, what is the API for that operating system?
There are pthreads APIs for Windows, but they're not native to the Windows OS. I don't know if GNU Pth works on Windows - I doubt it, unless you use Cygwin. Windows has its own Win32 thread model.
Using std::thread or boost::thread is portable to POSIX platforms and Windows, and makes certain parts of the API easier to use (specifically, locking and unlocking mutexes can be easily done in an exception safe way and condition variables are easier to use.)
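A short sketch of what that looks like with the C++11 facilities: a producer thread pushes numbers into a queue while the calling thread consumes them. std::lock_guard releases the mutex on scope exit (including when an exception is thrown), and std::condition_variable::wait takes a predicate so spurious wakeups need no hand-written loop. The function name sum_via_queue and the toy workload are made up for the example.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical example: a producer thread pushes 1..n into a queue and the
// calling thread sums whatever arrives.
int sum_via_queue(int n) {
    assert(n >= 0);

    std::mutex m;
    std::condition_variable cv;
    std::queue<int> q;
    bool done = false;

    std::thread producer([&] {
        for (int i = 1; i <= n; ++i) {
            {
                std::lock_guard<std::mutex> lock(m);  // released at scope exit
                q.push(i);
            }
            cv.notify_one();
        }
        {
            std::lock_guard<std::mutex> lock(m);
            done = true;
        }
        cv.notify_one();
    });

    int sum = 0;
    std::unique_lock<std::mutex> lock(m);
    for (;;) {
        // Sleeps until there is data or the producer has finished.
        cv.wait(lock, [&] { return !q.empty() || done; });
        while (!q.empty()) {
            sum += q.front();
            q.pop();
        }
        if (done) break;
    }
    lock.unlock();
    producer.join();
    return sum;
}
```

The same source compiles against pthreads on POSIX systems and the native thread API on Windows, because the standard library hides that choice.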
Gnu PTH is for a very limited use case: you want to use a multi-threaded implementation paradigm but you don't want to use multiple CPUs or cores and you don't want to rely on any OS or kernel-level support. Since almost all general-purpose CPUs now have multiple cores, this use case is increasingly irrelevant.
Windows has a separate threading model from POSIX; if you want your application to be cross-platform it is best to use a cross-platform threading library such as boost::thread.
I think GNU Pth is meant for C in the first place. You can use it from C++ too, but C++ has its own threading facilities anyway.
There are quite a few applications using Pth, such as low-level burning tools (so GUI tools like K3B and Brasero depend on Pth); GnuPG also uses Pth, as do the package management of Arch Linux and some multimedia software.
On Windows it's always a bit complicated. Microsoft never got over the fact that C is the programming language from/for UNIX systems, and so suffers from NIH syndrome (Not Invented Here).
So they do a lot of things without any advantage, just to be different.
If you are writing an application which should run everywhere and it's not low-level, use Qt with its QThread and QThreadPool:
It's 100% the same on all operating systems
You need much less code
If you are writing a "low-level" application, I recommend splitting your application into backends and frontends, writing a separate backend for each OS, and using the library which causes the fewest problems.

Multiplatform multiprocessing?

I was wondering why in the new C++11 they added threads and not processes.
Couldn't they have written a wrapper around the platform-specific functions?
Any suggestion about the most portable way to do multiprocessing? fork()? OpenMP?
If you could use Qt, QProcess class could be an elegant platform independent solution.
If you want to do this portably I'd suggest you avoid calling fork() directly and instead write your own library function that can be mapped on to a combination of fork() and exec() on systems where that's available. If you're careful you can make your function have the same or similar semantics as CreateProcess() on Win32.
UNIX systems tend to have a quite different approach to processes and process management compared to Windows based systems so it's non-trivial to make all but the simplest wrappers portable.
Of course if you have C++11 or Boost available I'd just stick with that. If you don't have any globals (which is a good thing generally anyway) and don't set up any shared data in any other way, then the practical differences between threads and processes on modern systems are slim. All the threads you create can make progress independently of each other in the same way that processes can.
Failing that, you could look at an MPI implementation if message passing suits your task, or a batch scheduler system.
I am using Boost Interprocess.
It does not provide the possibility to create new processes, but once they are there, it allows them to communicate.
In this particular case I can create the processes I need from a shell script.

C++ master/worker

I am looking for a cross-platform C++ master/worker library or work queue library. The general idea is that my application would create some sort of Task or Work objects, pass them to the work master or work queue, which would in turn execute the work in separate threads or processes. To provide a bit of context, the application is a CD ripper, and the tasks that I want to parallelize are things like "rip track", "encode WAV to MP3", etc.
My basic requirements are:
Must support a configurable number of concurrent tasks.
Must support dependencies between tasks, such that tasks are not executed until all tasks that they depend on have completed.
Must allow for cancellation of tasks (or at least not prevent me from coding cancellation into my own tasks).
Must allow for reporting of status and progress information back to the main application thread.
Must work on Windows, Mac OS X, and Linux
Must be open source.
It would be especially nice if this library also:
Integrated with Qt's signal/slot mechanism.
Supported the use of threads or processes for executing tasks.
By way of analogy, I'm looking for something similar to Java's ExecutorService or some other similar thread pooling library, but in cross-platform C++. Does anyone know of such a beast?
Thanks!
I haven't used it recently enough to be positive that it exactly meets your needs, but check out the Adaptive Communications Environment (ACE). This library allows you to construct "active objects" which have work queues and execute their main body in their own threads, as well as thread pools that can be shared among objects. Then you can queue work objects on active objects for them to process. Objects can be chained in various ways. The library is fairly heavy and there is a lot to learn, but there have been a couple of books written about it and there's a fair amount of tutorial information available online as well. It should be able to do everything you want plus more; my only concern is whether it possesses the interfaces you are looking for 'out of the box' or if you'd need to build on top of it to get exactly what you are looking for.
I think this calls for Intel's Threading Building Blocks, which pretty much does what you want.
Check out Intel's Threading Building Blocks library.
Sounds like you require some kind of "Time Sharing System".
There are some good open source ones out there, but I don't know if they have built-in Qt slot support.
This is probably a huge overkill for what you need but still worth mentioning -
BOINC is a distributed framework for such tasks. There's a main server that gives out tasks to perform and a cloud of workers that do its bidding. It is the framework behind projects like SETI@home and many others.
See this post for creating threads using the boost library in C++:
Simple example of threading in C++
(it is a C++ question even though the title says C)
Basically, create your own "master" object that takes a "runnable" object and starts it running in a new thread.
Then you can create new classes that implement "runnable" and throw them over to your master runner any old time you want.
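A bare-bones sketch of that idea, using std::thread and std::function in place of the Boost classes from the linked post (a swap made here to keep the example dependency-free). The names Master and run_tasks are made up; there is no thread pool, no dependency tracking and no cancellation, just "hand over a runnable, get a thread".

```cpp
#include <cassert>
#include <atomic>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical "master": starts each runnable in its own thread and joins
// them all in wait_all() or on destruction.
class Master {
public:
    using Runnable = std::function<void()>;

    void start(Runnable r) { threads_.emplace_back(std::move(r)); }

    void wait_all() {
        for (std::thread& t : threads_)
            if (t.joinable()) t.join();
        threads_.clear();
    }

    ~Master() { wait_all(); }

private:
    std::vector<std::thread> threads_;
};

// Example driver: starts `tasks` runnables; task i adds i to a shared total.
int run_tasks(int tasks) {
    assert(tasks >= 0);
    std::atomic<int> total{0};
    Master master;
    for (int i = 1; i <= tasks; ++i)
        master.start([&total, i] { total += i; });
    master.wait_all();
    return total;
}
```

Anything callable with no arguments works as a runnable here, so a class with operator() plays the role of the "runnable" classes mentioned above. Limiting the number of concurrent tasks would need a fixed set of worker threads pulling from a queue instead of one thread per task.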