Using system() in a thread in C++

I want to use the system() function in a (non-main) thread (pthread) in C++. For example,
system("/path/to/some/script.sh");
Is this permitted? If so, is it safe and are there any precautions I should take?
The reason I'm asking is that I've had the following comment from a code reviewer:
"The rule is system() can only be called from a single-threaded process. I think you need to move your new code to a separate application."
Is the first sentence of the comment valid?

As for the GNU/Linux implementation of system(), it modifies the process's signal handling (signal mask and dispositions) during command execution. In a multithreaded program, that invites nasty surprises, e.g. if another thread forks at the same time.

I wouldn't do it for a wide variety of different reasons, the problem with signal masks just being one.
In general, fork and threads are a tricky mix and need to be handled with care. The existing library functions were likely not written with a multi-threaded program in mind.
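To make the hazard concrete, here is a rough sketch of what a traditional, single-threaded system() implementation does (simplified; this is not glibc's actual code). Note that the sigaction() calls change signal dispositions for the entire process, so every other thread is affected for as long as the command runs:

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int naive_system(const char *cmd)
{
    struct sigaction ign, old_int, old_quit;
    sigset_t chld, old_mask;

    ign.sa_handler = SIG_IGN;                  // SIG_IGN disposition is process-wide:
    sigemptyset(&ign.sa_mask);                 // every thread stops seeing SIGINT/SIGQUIT
    ign.sa_flags = 0;
    sigaction(SIGINT, &ign, &old_int);
    sigaction(SIGQUIT, &ign, &old_quit);

    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    sigprocmask(SIG_BLOCK, &chld, &old_mask);  // block SIGCHLD while waiting

    pid_t pid = fork();
    if (pid == 0) {
        sigaction(SIGINT, &old_int, nullptr);  // child: undo, then exec the shell
        sigaction(SIGQUIT, &old_quit, nullptr);
        sigprocmask(SIG_SETMASK, &old_mask, nullptr);
        execl("/bin/sh", "sh", "-c", cmd, (char *)nullptr);
        _exit(127);
    }

    int status = -1;
    if (pid > 0)
        waitpid(pid, &status, 0);

    sigaction(SIGINT, &old_int, nullptr);      // restore; other threads saw the
    sigaction(SIGQUIT, &old_quit, nullptr);    // modified state the whole time
    sigprocmask(SIG_SETMASK, &old_mask, nullptr);
    return status;
}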

Related

Is it possible to use fork in modern C++?

Traditional C++ was very straightforward in this respect: threads only existed if a library intended to create them (like pthreads) was used to start them.
Modern C++ is much closer to Java, with many facilities being thread-based, with thread pools ready to run asynchronous jobs, and so on. It is much more likely that some library, including the standard library, uses threads to compute some function asynchronously, or at least sets up the infrastructure to do so even if it isn't used.
In that context, is it ever safe to use functions with global impact like fork?
The answer to this question, like almost everything else in C++, is "it depends".
If we assume there are other threads in the program, and those threads are synchronizing with each other, calling fork is dangerous. This is because fork does not wait for all threads to reach a synchronization point (e.g. a mutex release) before forking the process. In the forked process, only the thread that called fork will be present, and the others will have vanished, possibly in the middle of a critical section. This means any memory shared with other threads, unless it was a std::atomic<int> or similar, is in an undefined state.
If your forked process reads from this memory, or indeed expects the other threads to be running, it is likely not going to work reliably. However, most uses of fork actually have effectively no preconditions on program state. That is because the most common thing to do is to immediately call execv or similar to spawn a subprocess. In this case your entire process is kinda "replaced" by some new process, and all memory from your old process is discarded.
tl;dr - Calling fork may not be safe in multithreaded programs. Sometimes it is safe: for example, if no threads have been spawned yet, or if execv is called immediately. If you are using fork for something else, consider using a thread instead.
See the fork man page and this helpful blog post for the nitty-gritty.
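As a minimal sketch of the "fork then immediately exec" pattern described above (the path is a placeholder, error handling kept short):

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

// Runs the program at 'path' in a child process and waits for it.
int run_program(const char *path)
{
    pid_t pid = fork();
    if (pid < 0) {
        std::perror("fork");
        return -1;
    }
    if (pid == 0) {
        // Child: touch no shared state, call only async-signal-safe
        // functions, then replace the process image immediately.
        char *const argv[] = { const_cast<char *>(path), nullptr };
        execv(path, argv);
        _exit(127);                 // exec failed; skip atexit handlers and stdio flushing
    }
    int status = 0;
    waitpid(pid, &status, 0);       // parent reaps the child
    return status;
}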
To add to peteigel's answer, my advice is: if you want to fork, do it very early, before any threads other than the main thread are started.
In general, anything you can do in C, you can do in C++, since C++, especially on Linux with clang or gcc extensions, is pretty darn close to a perfect superset of C. Of course, when there are good portable APIs in standard C++, use them. The canonical example is preferring std::thread over the pthreads C API.
One caveat is pthread_cancel, which must be avoided in C++ due to exceptions. See e.g. pthread cancel harmful in C++.
Here is another link that explains the problem:
pthread_cancel while in destructor
In general, cleanup handling is easier and more elegant in C++ than in C, since RAII is part and parcel of C++ culture, and C does not have destructors.

What is a good and optimized way to run a shell command in a pthread?

Basically, I want to compress a file in a pthread thread using gzip.
The first solution that comes to mind, and that shows up on Google, is to call system().
What does the stackoverflow community suggest?
Shall I use system() in a pthread?
Or shall I just fork and exec myself from the pthread? But since a pthread is a thread, is it advisable to call fork() and exec() from a pthread?
Or is there a better approach than the above?
You shouldn't use system for this, but not because it's expensive. (For any file that is worth bothering to compress, the overhead of any technique for invoking a background gzip compression is negligible relative to the cost of doing the compression itself.) The reason you shouldn't use system is that system invokes a shell, which means you have to worry about quoting the arguments. If you use fork and execvp instead, you don't have to worry about quoting anything.
The problems associated with mixing fork and wait with threads are real, but they are tractable. If your OS has posix_spawn, it will take care of some of those problems for you. I don't normally recommend posix_spawn because code that uses it is, in general, harder to maintain than code that uses fork, but for this application it should be OK. I cannot fit a comprehensive guide to mixing fork and wait with threads into this answer box.
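If posix_spawn is available, a minimal sketch of using it to run gzip without going through a shell might look like the following (the file name is a placeholder, error handling kept short):

#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <cstdio>

extern char **environ;

int gzip_file(const char *path)
{
    char *const argv[] = { const_cast<char *>("gzip"),
                           const_cast<char *>(path), nullptr };
    pid_t pid;
    int err = posix_spawnp(&pid, "gzip", nullptr, nullptr, argv, environ);
    if (err != 0) {
        std::fprintf(stderr, "posix_spawnp: error %d\n", err);
        return -1;
    }
    int status = 0;
    waitpid(pid, &status, 0);   // reap the child once the compression finishes
    return status;
}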
An alternative you should consider is compressing the data in the thread that would be waiting for the gzip process, using zlib. This avoids the problems of mixing fork and wait with threads, but it adds a library dependency to your program, which may not be as convenient as relying on an external gzip executable.
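A minimal in-process sketch using zlib's gzopen/gzwrite convenience API (this assumes you link against zlib with -lz; the paths are placeholders):

#include <zlib.h>
#include <cstdio>

// Compresses src into dst (a .gz file) in the calling thread.
bool gzip_in_process(const char *src, const char *dst)
{
    std::FILE *in = std::fopen(src, "rb");
    if (!in)
        return false;
    gzFile out = gzopen(dst, "wb");
    if (!out) {
        std::fclose(in);
        return false;
    }

    char buf[1 << 16];
    size_t n;
    bool ok = true;
    while ((n = std::fread(buf, 1, sizeof buf, in)) > 0) {
        if (gzwrite(out, buf, static_cast<unsigned>(n)) == 0) {  // 0 means error
            ok = false;
            break;
        }
    }
    std::fclose(in);
    gzclose(out);       // flushes and writes the gzip trailer
    return ok;
}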
Start with a system() call in another thread and only add complexity when needed.
The extra complexity of doing fork/exec or using a zip library is only worth the effort if system is not sufficient for some reason (e.g. you want to redirect both stdin and stdout of the child process into your parent process, or you want to compress a file in memory for sending it over the network without writing new files).
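For reference, the simplest version looks roughly like this sketch (the path is a placeholder and is assumed to need no shell quoting):

#include <cstdlib>
#include <thread>

int main()
{
    // Simplest approach: let the shell run gzip in a worker thread.
    std::thread worker([] {
        std::system("gzip /tmp/large-file.log");   // placeholder path
    });
    // ... do other work in the meantime ...
    worker.join();
    return 0;
}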

exit fails to set error code

I have a C++ Windows program that fails to set the exit code. The program is very complex and I'm currently unable to reproduce this with a simple test case. I do know that the program calls exit(1) because I have a breakpoint on that line. Immediately after I step over it, the debugger (VS2010) prints The program program.exe has exited with code 0 (0x0). When I run it from the shell, %ERRORLEVEL% is also set to 0.
I use subsystem:console and plain old main (no WinMain).
This only happens on Windows Server 2008 R2, not on my Windows 8.1 laptop. I'm running the same executable on both.
I have tried to use exit, _exit, ExitProcess, and return (the offending call is in main), but none of those seem to have any effect. I also have tried to return other codes, also with no result.
There's a similar question but I cannot reproduce the results described in it. My program does use threads.
How can I approach debugging this issue? I'm rather baffled.
I have tried to use exit, _exit, ExitProcess, and return
You've eliminated all reasonable explanations, particularly with ExitProcess(). There is only one possibility left: you need to try TerminateProcess(). If that still doesn't set the exit code then you need to shove that machine out of a 4th story window.
But let's go with the expectation that it now works. The difference between ExitProcess() and TerminateProcess() is that the former ensures that all DLLs are notified of the termination. Their DllMain() function gets called with fdwReason = DLL_PROCESS_DETACH, which gives a DLL the opportunity to do something icky like calling Exit/TerminateProcess() itself, thus screwing up the exit code.
Finding such a DLL can be difficult if you don't have all the source code. It could be an injected one as well; there are entirely too many of those around these days. The best thing to do is to set a breakpoint on the underlying system call so you can catch it in the act; you probably want to do this regardless.
Once you step into main(), use Debug > New Breakpoint > Break at Function and enter {,,ntdll.dll}_NtTerminateProcess@8. Press F5 and the debugger now stops just before the program terminates. Look at the Call Stack window to find the evil-doer.
Strange symptoms involving exit(), _exit(), ExitProcess(), and others in a multithreaded program - particularly if the symptoms vary between hosts - have a smell of a variable being modified or accessed by different threads, without synchronisation.
Looking at the other thread you linked to, it appears you are using a volatile variable to communicate between threads, but not using any form of synchronisation (for example, code which accesses the value of that variable and code that modifies that value need to cooperate by means of a critical section, mutex, or comparable construct).
That little bit of indirect evidence makes the smell even stronger.
The basic problem I suspect is that declaring a variable as volatile is neither necessary nor sufficient to ensure that variable always has values that will make sense to your program. In particular, it is not sufficient to prevent a thread which is modifying a variable from being preempted when the modification is only partly complete, and for another thread to attempt accessing or modifying the affected variable.
If you look up some articles by Herb Sutter (particularly those concerned with thread synchronisation in his "Guru of the Week" series) you will find detailed explanations of why that is so. Other authors also describe such things, but Sutter's articles are ones that I recall offhand.
The solution is to introduce some means of synchronisation, and for EVERY thread in your program to religiously use it before accessing or modifying variables shared between them. This avoids the various problems (race conditions, operations being preempted partway through) that would cause symptoms like you describe.
Such problems are rarely caught by stepping through with a debugger. The reason for that is that the symptoms are an emergent property. Several unlikely and often independent occurrences, in disparate threads of execution, must occur together. Debuggers do typically change the timing of events in programs, and timing is a critical consideration in the symptoms emerging.
Options include making key variables atomic (so particular operations cannot be preempted), critical sections (where the threads explicitly cooperate within a program), or mutexes (which, depending on the definition, allow threads in different programs to explicitly cooperate before accessing shared memory).
Yes, this introduces a bottleneck in your program - a point where every thread must rendezvous and potentially wait for the others. That can affect the throughput of your program. Some people advocate using volatile variables to avoid such concerns. More often than not, the result is intermittent symptoms in long-running programs, like the ones you have described in this question and the "similar question" you linked to.
It doesn't matter whether you use standard means of synchronisation (e.g. those introduced in C++11) or Windows-specific means (Win32 API functions). The important thing is that you use a deliberate synchronisation method, rather than just making variables volatile. Different options for synchronisation have different trade-offs, so you will need to make a decision relevant to the needs of your program.
Another consideration is to signal all threads so they close cleanly, wait until they have all finished, capture their exit codes, and THEN exit the program. It is often less error prone to do this in the thread running main() - which is where the process started, so it is more likely to have access to the information it needs to clean up correctly. If another thread decides the program needs to exit, it is better for it to communicate that need back to main() and let main() do the exiting.
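As an illustrative sketch of that last pattern (all names are made up): worker threads never call exit() themselves; they publish a requested exit code through std::atomic variables, and main() joins every thread and returns the code. std::atomic is used instead of volatile precisely because it provides the synchronisation guarantees discussed above.

#include <atomic>
#include <thread>
#include <vector>

std::atomic<bool> stop_requested{false};
std::atomic<int>  requested_exit_code{0};

void worker()
{
    while (!stop_requested.load()) {
        // ... do work; on a fatal error, request shutdown instead of calling exit():
        //     requested_exit_code.store(1);
        //     stop_requested.store(true);
    }
}

int main()
{
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i)
        threads.emplace_back(worker);

    // ... main's own work; eventually something requests shutdown ...
    stop_requested.store(true);

    for (auto &t : threads)
        t.join();                       // wait until every thread has closed cleanly

    return requested_exit_code.load();  // the exit code is decided in exactly one place
}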

Forcing the operating system to perform cleanup after a "subroutine"

I'm writing an image-processing program in C++. For this purpose, I have modified a third-party program (an edge detector) into a static library that I use in my program.
It seems the original edge detector relied on the OS to clean up its memory after the main function had finished. Unfortunately, after my modifications, that original main function became an ordinary, repeatedly called function, so no automatic cleanup is performed any more. The result is a huge memory leak every time the function is called.
I'm not able to exhaustively review the whole code of the detector to fix this. I would therefore like to ask: in general, is there a way to separate a "subroutine" of the whole program (in my case the detector) from the rest and to force the OS to clean up after the subroutine as if it were a stand-alone program? Could there be a solution using threads, for example?
Thank you for your replies.
If you are using a *nix platform, perhaps you could fork the library call.
You could run it in a separate process that would be kicked in from your program.
There are ways to pipe the child process's stdin and stdout, so you can control it.
What you could also try is to use valgrind to detect the leaks and fix them.
If you are on Linux, you can search for how to use the fork() or system() functions to create a child process.
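On a *nix platform, a minimal sketch of that approach could look like the following; the callable you pass in (e.g. the detector's former main, wrapped in a lambda) runs in a child process, and when the child exits the OS reclaims every byte it leaked. Any results would have to come back through a file or a pipe.

#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run 'work' in a child process and return its exit status (or -1 on failure).
int run_in_child(const std::function<int()> &work)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      // fork failed
    if (pid == 0)
        _exit(work());                  // child: run the leaky code, then exit
    int status = 0;
    waitpid(pid, &status, 0);           // parent: wait for the child to finish
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

A call site might look like run_in_child([&]{ return detector_main(argc, argv); }), where detector_main is whatever the modified entry point of the library is called in your code.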

Catching Signals c++

I have a boost threadpool which I use to do certain tasks. I also have a Sensor class that has the pure virtual function doWork(int total) = 0;. Whenever it is requested, my main process gets the necessary Sensor pointer and tells the threadpool to run Sensor::doWork(int total).
threadpool->schedule(boost::bind(&Sensor::doWork,this,123456));
I am dynamically loading libraries that implement Sensor, so it is out of my control whether someone else's faulty code causes segfaults and the like. So is there a way for me (in my main process) to handle any errors raised by Sensor::doWork(int total), clean up the thread, delete that sensor object, and report to the console what went wrong and where?
Really the only way to handle a segmentation fault here is to run Sensor::doWork in a completely separate process.
In UNIX, this involves using fork (or some other similar means), running Sensor::doWork in the child process, and then somehow shuttling the results back to the parent process.
I assume similar means are available in Windows.
EDIT: I thought I'd flesh out a bit some of the things you can do.
Solution #1: you can work with processes in much the same fashion as you would with threads. For example, you could create a process pool whose workers sit in a loop:
Wait for a task to be passed in over a pipe or queue or some similar object
Perform the task
Return the results over a pipe or queue or some similar object
And since you're executing the tasks in other processes, you're protected against them crashing. The main difficulty with this solution is actually communicating between processes; maybe Boost's Interprocess library will help with that. I've mainly done this sort of thing in Python, which has a standard multiprocessing module that handles this stuff for you.
Solution #2: You could divide your application into "safe" and "risky" portions that run in different processes. The "risky" portion executes the Sensor::doWork methods and anything else you might want to do in that process -- but only work that is acceptable to be spontaneously lost if it crashes. The "safe" portion deals with any precious information that you cannot afford to lose, and monitors the "risky" portion, performing some recovery operations when the child crashes. And, of course, whatever other work you decide you want to do in the safe part.
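A rough sketch of the UNIX side of this follows. It assumes Sensor::doWork returns nothing and that the "result" is a single int fetched afterwards through a hypothetical result() accessor; a real program would serialize whatever the sensor actually computes.

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>

struct Sensor {                          // stand-in for the real plugin interface
    virtual void doWork(int total) = 0;
    virtual int  result() const = 0;     // hypothetical accessor for the outcome
    virtual ~Sensor() {}
};

// Run sensor->doWork in a child process; a crash only takes down the child.
bool run_isolated(Sensor *sensor, int total, int *out)
{
    int fds[2];
    if (pipe(fds) != 0)
        return false;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0) {                      // child
        close(fds[0]);
        sensor->doWork(total);           // a SIGSEGV here kills only this process
        int value = sensor->result();
        write(fds[1], &value, sizeof value);
        _exit(0);
    }

    close(fds[1]);                       // parent
    int status = 0;
    waitpid(pid, &status, 0);
    if (WIFSIGNALED(status)) {           // e.g. the plugin segfaulted
        std::fprintf(stderr, "sensor crashed with signal %d\n", WTERMSIG(status));
        close(fds[0]);
        return false;
    }
    bool ok = read(fds[0], out, sizeof *out) == static_cast<ssize_t>(sizeof *out);
    close(fds[0]);
    return ok;
}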
If you get a SIGSEGV, even if you catch it, you have no guarantees about your program state, so there's pretty much no way to recover.
If you're working with 3rd party libraries, and they're buggy, and the library maintainer won't fix it (and you don't have the source) then your only recourse is to run the third party library from within a totally separate binary that talks to the main binary by some means. See for example firefox and plugin-container.
You might want to register a callback function to catch SIGSEGV. In C this can be done using signal. Be aware, however, that there is not much you can do when the OS sends you a SIGSEGV (note that it isn't required to). You don't really know what state your program is in. If, for example, the heap got corrupted, new and delete operations may fail, so even a plain simple
std::cout << std::string("hello world") << std::endl;
statement might not work, since memory from the heap needs to be allocated.
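For completeness, here is a minimal sketch of such a handler, using sigaction rather than signal and restricting itself to async-signal-safe calls (write and _exit):

#include <csignal>
#include <cstring>
#include <unistd.h>

extern "C" void on_segv(int)
{
    static const char msg[] = "caught SIGSEGV, aborting\n";
    write(STDERR_FILENO, msg, sizeof msg - 1);   // async-signal-safe, unlike std::cout
    _exit(1);                                    // don't try to continue; program state is unknown
}

int install_segv_handler()
{
    struct sigaction sa;
    std::memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, nullptr);     // 0 on success, -1 on error
}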
Best, Christoph