Terminate one thread (that is stuck) from another - c++

Using only standard C++ (no platform-specific API), I would like to launch an external application that may either complete immediately or hang. If the application hangs, my app should wait for a timeout, then simply terminate it and relaunch it.
Now, down to the nitty gritty, I tried launching two threads:
first thread launches the app and waits for it to terminate
second thread waits for a few seconds and checks whether the first thread has terminated. If it has not, it considers the application stalled.
Question is, how do I terminate the first thread from the second? The way I'm launching the app is using the system() function. It's synchronous, so there is no way for that thread to check whether it should terminate. It has to be forced somehow, externally, through an exception.
How is this done properly?
P.S.: if this is not possible, and I suspect it isn't, then I simply do not wish to wait for that thread anymore. It can simply remain stalled in the background. How do I achieve that? (currently, I'm waiting for that thread with a join())

You cannot forcefully terminate another thread. You can only politely ask it to exit. This holds in both the C++ and POSIX thread models. Windows has TerminateThread, but it's so dangerous it's practically unusable. POSIX has pthread_cancel. That's cooperative termination, which could fit the bill, but there's no standard C++ equivalent.
Even if you terminate a thread somehow, it does nothing to any program it might have launched via system.
To let a thread go free with no obligation to join, use thread::detach().
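A minimal sketch of the detach approach, assuming a hypothetical helper `run_with_timeout` and assuming it is acceptable to abandon a hung `system()` call. The worker reports completion through a `std::promise`; if the wait times out, the caller detaches the thread and moves on. A future from `std::async` would not help here, because its destructor blocks until the task finishes. Note that this only stops *waiting*: the external program keeps running.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <future>
#include <memory>
#include <optional>
#include <string>
#include <thread>

// Hypothetical helper: runs `command` via std::system() on a worker thread and
// waits at most `timeout`. Returns the raw system() status, or std::nullopt if
// the command is still running when the timeout expires.
std::optional<int> run_with_timeout(const std::string& command,
                                    std::chrono::milliseconds timeout) {
    auto done = std::make_shared<std::promise<int>>();
    std::future<int> result = done->get_future();

    // The lambda owns copies of everything it touches, so it stays valid
    // even if this function returns while the thread is still blocked.
    std::thread worker([done, command] {
        done->set_value(std::system(command.c_str()));
    });

    if (result.wait_for(timeout) == std::future_status::ready) {
        worker.join();       // finished in time: join normally
        return result.get();
    }
    worker.detach();         // hung: stop waiting and leave it in the background
    return std::nullopt;
}
```

A caller would relaunch on `std::nullopt`, for example `if (!run_with_timeout("some_app", std::chrono::seconds(5))) { /* relaunch */ }`, where `some_app` is a placeholder.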

To answer your question about killing a thread, POSIX offers two functions:
pthread_cancel();
This will stop the thread at a cancellation point.
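A minimal sketch of cooperative cancellation, assuming Linux/glibc; the names `stuck_worker` and `cancel_and_join` are hypothetical. `sleep()` is a cancellation point, so the pending request takes effect there and `pthread_join()` reports `PTHREAD_CANCELED`. On glibc, cancellation in C++ code runs as a forced stack unwind, so a `catch (...)` that swallows it without rethrowing aborts the program.

```cpp
#include <cassert>
#include <pthread.h>
#include <unistd.h>

// Worker that never finishes on its own. sleep() is a cancellation point,
// so a pending pthread_cancel() request takes effect inside it.
void* stuck_worker(void*) {
    for (;;) {
        sleep(1);
    }
    return nullptr;  // never reached
}

// Requests cancellation of `tid` and joins it.
// Returns true if the thread ended because it was cancelled.
bool cancel_and_join(pthread_t tid) {
    if (pthread_cancel(tid) != 0) return false;
    void* status = nullptr;
    if (pthread_join(tid, &status) != 0) return false;
    return status == PTHREAD_CANCELED;
}
```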
The other is:
pthread_kill();
This function sends a signal to the thread. Unlike cancellation, which only takes effect at a cancellation point, the signal can arrive at any point in the thread. In other words, if the thread has a mutex locked at that time and never returns from the handler, that mutex stays locked for good... (unless you cleanly handle the signal in that thread).
However, what you are describing is a system() call which you make in a separate thread so that you are not blocked. I don't think that either of these functions is going to help you, because there is another process running, not just a simple thread. What you need is to stop that other process.
In your case, what you need to do is find out the pid of the child (or children) and send a signal to that child process (or all children and grandchildren, etc). In that case, you use the kill() function like so:
kill(child_pid, SIGINT);
Once the child died and cleaned up, the system() call will return and your thread is ready to be joined. So in order, you do:
...
child_pid = find_child_pid(); // Note: there is no such function, you have to write it or use a library which offers such
kill(child_pid, SIGINT);
pthread_join(thread_id, nullptr);
If that child process can create children and you want them out of the picture too (like a shell does when you hit Ctrl-C), then you need to find all the children of your child, their children, etc. You do so by checking whether the PPID (parent PID) of each process matches one of the children. That info is available in /proc/<pid>/stat or /proc/<pid>/status. The first file is probably best because it's just one line; however, it is tricky to get past the process name, since the name is enclosed in parentheses and can itself contain parentheses. So you have to search for the ')' from the end of the line (otherwise you could find a ')' from the program name). Once you've got that, skip the state and there is the PPID (so: `) S <ppid>`).
Repeat the search until all the parents/children are found, and then start sending a SIGINT, SIGTERM or SIGKILL to each one of them.
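The /proc search described above can be sketched as follows. This is Linux-only, and `parse_ppid`, `read_ppid`, `find_descendants` and `signal_descendants` are hypothetical names, one possible stand-in for the nonexistent `find_child_pid()`. The process table can change during the scan, so the result is only a snapshot.

```cpp
#include <cassert>
#include <cstdlib>
#include <dirent.h>
#include <fstream>
#include <signal.h>
#include <sstream>
#include <string>
#include <sys/types.h>
#include <unistd.h>
#include <utility>
#include <vector>

// Extracts the PPID from one line of /proc/<pid>/stat ("pid (comm) state ppid ...").
// comm may contain ')' or spaces, so search for the LAST ')'. Returns -1 if malformed.
pid_t parse_ppid(const std::string& stat_line) {
    std::string::size_type close = stat_line.rfind(')');
    if (close == std::string::npos) return -1;
    std::istringstream rest(stat_line.substr(close + 1));
    char state = 0;
    long ppid = -1;
    if (!(rest >> state >> ppid)) return -1;
    return static_cast<pid_t>(ppid);
}

// Reads the PPID of `pid` from /proc; returns -1 if the process is gone.
pid_t read_ppid(pid_t pid) {
    std::ifstream in("/proc/" + std::to_string(pid) + "/stat");
    std::string line;
    if (!std::getline(in, line)) return -1;
    return parse_ppid(line);
}

// Collects children, grandchildren, ... of `root` from one snapshot of /proc.
std::vector<pid_t> find_descendants(pid_t root) {
    std::vector<std::pair<pid_t, pid_t>> table;  // (pid, ppid)
    if (DIR* dir = opendir("/proc")) {
        while (dirent* entry = readdir(dir)) {
            char* end = nullptr;
            long value = std::strtol(entry->d_name, &end, 10);
            if (*end != '\0' || value <= 0) continue;  // skip non-numeric entries
            pid_t pid = static_cast<pid_t>(value);
            pid_t ppid = read_ppid(pid);
            if (ppid >= 0) table.emplace_back(pid, ppid);
        }
        closedir(dir);
    }
    std::vector<pid_t> found;
    std::vector<pid_t> frontier{root};
    while (!frontier.empty()) {
        pid_t parent = frontier.back();
        frontier.pop_back();
        for (const auto& entry : table) {
            if (entry.second == parent) {  // a child of `parent`
                found.push_back(entry.first);
                frontier.push_back(entry.first);
            }
        }
    }
    return found;
}

// Sends `sig` (e.g. SIGINT, SIGTERM or SIGKILL) to every descendant found.
void signal_descendants(pid_t root, int sig) {
    for (pid_t pid : find_descendants(root)) kill(pid, sig);
}
```

Because all pids are collected before any signal is sent, grandchildren that get reparented after their parent dies are still signalled.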
As mentioned in the other answer, you can use pthread_detach() to quit your software and leave that other thread behind. This is probably much less desirable if you want that other process to end before your main process ends. It very much depends on what you are trying to accomplish, too.
Another, probably much more complicated, way is to use fork() + execve(). That means you have to re-implement your own system() call, but the advantage is that you do not need a thread and you get the pid of the child for free (so you can kill it without searching for the child pid). If the program you need to run always takes the same fixed command-line arguments, it's not too complicated. If you need to redirect stdin, stdout and stderr, and the arguments depend on all sorts of things, it becomes much more involved...
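A minimal sketch of the fork() + exec approach with a timeout, assuming POSIX and a single-threaded caller. It uses `execvp()` (the PATH-searching member of the exec family) rather than `execve()`, and `run_with_deadline`/`RunResult` are hypothetical names. The parent polls `waitpid(..., WNOHANG)` until the deadline, then kills the child by the pid that `fork()` returned.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <signal.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <thread>
#include <unistd.h>
#include <vector>

// Outcome of one run: the waitpid() status and whether we had to kill it.
struct RunResult {
    int status = 0;
    bool timed_out = false;
};

// Stand-in for system() with a timeout. args[0] is looked up in PATH.
// Returns status -1 if fork() or waitpid() fails.
RunResult run_with_deadline(const std::vector<std::string>& args,
                            std::chrono::milliseconds timeout) {
    std::vector<char*> argv;  // built before fork(), nothing allocated in the child
    for (const std::string& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    pid_t child = fork();
    if (child < 0) return {-1, false};
    if (child == 0) {
        execvp(argv[0], argv.data());
        _exit(127);  // exec failed; same code the shell uses for "not found"
    }

    RunResult result;
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        pid_t r = waitpid(child, &result.status, WNOHANG);
        if (r == child) return result;                  // exited on its own
        if (r < 0 && errno != EINTR) return {-1, false};
        if (std::chrono::steady_clock::now() >= deadline) break;
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    kill(child, SIGKILL);               // we already know the pid: no search needed
    waitpid(child, &result.status, 0);  // reap it so no zombie is left behind
    result.timed_out = true;
    return result;
}
```

Polling keeps the sketch short; a real server might instead react to SIGCHLD.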

Related

How to safely terminate a multithreaded process

I am working on a project where we have used pthread_create to create several child threads.
The thread creation logic is not in my control, as it's implemented by some other part of the project.
Each thread performs some operation which takes more than 30 seconds to complete.
Under normal condition the program works perfectly fine.
But the problem occurs at the time of termination of the program.
I need to exit from main as quickly as possible when I receive the SIGINT signal.
When I call exit() or return from main, the exit handlers and global objects' destructors are called. I believe these operations race with the still-running threads, and there seem to be so many such races that fixing all of them is hard.
The way I see it there are two solutions.
call _exit() and forget all de-allocation of resources
When SIGINT arrives, close/kill all threads and then call exit() from the main thread, which will release resources.
I think 1st option will work, but I do not want to abruptly terminate the process.
So I want to know if it is possible to terminate all child threads as quickly as possible so that exit handler & destructor can perform required clean-up task and terminate the program.
I have gone through this post, let me know if you know other ways: POSIX API call to list all the pthreads running in a process
Also, let me know if there is any other solution to this problem
What is it that you need to do before the program quits? If the answer is 'deallocate resources', then you don't need to worry. If you call _exit then the program will exit immediately and the OS will clean up everything for you.
Be aware also that what you can safely do in a signal handler is extremely limited, so attempting to perform any cleanup yourself is not recommended. POSIX restricts signal handlers to a short list of async-signal-safe functions. You can't flush a buffered stdio file, for example (which is about the only thing I can think of that you might legitimately want to do here). That's off limits.
I need to exit from main as quickly as possible when I receive the SIGINT signal.
How is that defined? Because there's no way to "exit as quickly as possible" when you receive one signal like that.
You can either set flag(s), post to semaphore(s), or similar to set a state that tells other threads it's time to shut down, or you can kill the entire process.
If you elect to set flag(s) or similar to tell the other threads to shut down, you set those flags and return from your signal handler and hope the threads behave and the process shuts down cleanly.
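A minimal sketch of the flag approach, assuming you can change the worker loops (the question says thread creation is not under your control, so `worker` here is hypothetical). The handler only stores to a lock-free atomic, which is safe from a signal handler; the workers check it between slices of work and return normally, so `main` can join them and then exit cleanly.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <signal.h>
#include <thread>
#include <vector>

// Set from the signal handler, polled by the workers.
std::atomic<bool> g_shutdown{false};
static_assert(ATOMIC_BOOL_LOCK_FREE == 2, "need a lock-free atomic<bool>");

extern "C" void on_sigint(int) {
    g_shutdown.store(true);  // the only thing the handler does
}

// Hypothetical worker: does its long job in small slices and checks the flag
// between slices, so it stops within roughly one slice of the request.
void worker() {
    while (!g_shutdown.load()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(20));  // one slice of work
    }
    // ...release anything this thread owns here, then return normally...
}

void install_sigint_handler() {
    struct sigaction sa {};
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, nullptr);
}
```

After the workers are joined, returning from `main` runs the exit handlers and destructors without racing against them.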
If you elect to kill threads, there's effectively no difference in killing a thread, killing the process, or calling _exit(). You might as well just keep it simple and call _exit().
That's all you can choose between when you have to make your decision in a single signal handler call. Pick one.
A better solution is to use escalating signals. For example, when you get SIGQUIT or SIGINT, you set flag(s) or otherwise tell threads it's time to clean up and exit the process - or else. Then, say five seconds later whatever is shutting down your process sends SIGTERM and the "or else" happens. When you get SIGTERM, your signal handler simply calls _exit() - those threads had their chance and they messed it up and that's their fault. Or you can call abort() to generate a core file and maybe provide enough evidence to fix the miscreant threads that won't shut down.
And finally, five seconds later the managing process will nuke the process from orbit with SIGKILL just to be sure.
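Both sides of the escalation above can be sketched as follows, assuming POSIX; the function names and the exit code 3 are hypothetical choices. The process installs a polite handler for SIGINT/SIGQUIT and an `_exit()` handler for SIGTERM (`_exit()` is async-signal-safe, `exit()` is not). The managing side sends SIGINT, SIGTERM and SIGKILL in turn, giving the child a grace period at each stage.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <thread>
#include <unistd.h>

std::atomic<bool> g_stop_requested{false};
static_assert(ATOMIC_BOOL_LOCK_FREE == 2, "need a lock-free atomic<bool>");

// Stage 1 (SIGINT/SIGQUIT): ask the threads to finish.
extern "C" void on_polite_signal(int) { g_stop_requested.store(true); }

// Stage 2 (SIGTERM): they had their chance; leave immediately.
extern "C" void on_sigterm(int) { _exit(3); }

void install_escalating_handlers() {
    struct sigaction polite {};
    polite.sa_handler = on_polite_signal;
    sigemptyset(&polite.sa_mask);
    sigaction(SIGINT, &polite, nullptr);
    sigaction(SIGQUIT, &polite, nullptr);

    struct sigaction hard {};
    hard.sa_handler = on_sigterm;
    sigemptyset(&hard.sa_mask);
    sigaction(SIGTERM, &hard, nullptr);
    // SIGKILL cannot be caught, which is what makes it a reliable last stage.
}

// Managing side: escalate against a child we started, waiting about
// `grace_seconds` after each signal. Returns the child's wait status.
int escalate_shutdown(pid_t child, unsigned grace_seconds) {
    const int stages[] = {SIGINT, SIGTERM, SIGKILL};
    int status = 0;
    for (int sig : stages) {
        kill(child, sig);
        for (unsigned polls = 0; polls < grace_seconds * 10; ++polls) {
            if (waitpid(child, &status, WNOHANG) == child) return status;
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
        }
    }
    waitpid(child, &status, 0);  // SIGKILL cannot be ignored, so this returns
    return status;
}
```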

non-blocking system call in c++ program using fork

Based on this SO post and this example, I expected that using fork() would let me run system/execvp in a non-blocking manner. However, when I launch a long-running child process inside the fork block of the example code mentioned above, control does not return to the parent block until the child has finished.
Can you suggest how I should design the code to allow non-blocking calls to system in C/C++? Also, I plan to write a program where more than one child is forked from the same parent. How can I get the pids of the children?
Thanks for your kind help.
fork will immediately return to both the child and parent. However, that example (test3.c) calls wait4, which like it sounds, waits for another process to do something (in this case exit).
Mentioned sample code waits for child to return after spawning - that's why it blocks.
To get the pid of child process, use return value of fork().
fork() is a single system call that returns twice, with two different values: the child's pid in the parent process and 0 in the child process. This is how you distinguish the code blocks in your program that should be executed by the parent and by the children.
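A minimal sketch of starting several children without blocking, assuming POSIX; `spawn_children` and `reap_children` are hypothetical names, and the children just sleep and exit with their index where a real program would call `execvp()`. The parent records each pid from `fork()`'s return value and only blocks later, when it chooses to collect the exit statuses.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Starts `count` children and returns their pids immediately.
std::vector<pid_t> spawn_children(int count) {
    std::vector<pid_t> pids;
    for (int i = 0; i < count; ++i) {
        pid_t pid = fork();
        if (pid == 0) {                    // child branch: fork() returned 0
            sleep(1);                      // placeholder for real work / execvp()
            _exit(i);
        }
        if (pid > 0) pids.push_back(pid);  // parent branch: remember the child's pid
    }
    return pids;  // children are still running; the parent was never blocked
}

// Collects each child's exit code (-1 if it did not exit normally).
// Blocking here is a deliberate choice, not something fork() forces.
std::vector<int> reap_children(const std::vector<pid_t>& pids) {
    std::vector<int> codes;
    for (pid_t pid : pids) {
        int status = 0;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
            codes.push_back(WEXITSTATUS(status));
        else
            codes.push_back(-1);
    }
    return codes;
}
```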
Refer to man fork(2).
Another thing you should pay attention to concerning fork() and wait() is that after a child process exits, the kernel still holds some information about it (e.g. its exit status), which must be consumed somehow. Otherwise the process becomes a 'zombie' (Z in ps output). Consuming that information is the job of the wait*() calls. Besides, when a child exits, the kernel notifies its parent with SIGCHLD. If you don't want to process the children's return values, you can tell the system that you're going to ignore SIGCHLD with signal(), sigaction(), etc. In that case the leftover data is reaped automatically. Such behavior may be the default on your system, but it is still advisable to state it explicitly to improve the portability of your program.
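A minimal sketch of stating that behavior explicitly, assuming POSIX; `auto_reap_children` is a hypothetical name. With SIGCHLD set to `SIG_IGN`, terminated children do not become zombies, and a later `waitpid()` for them fails with `ECHILD` instead of returning a status.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Tells the kernel this process will never collect its children's exit
// statuses, so they are discarded instead of being kept as zombies.
void auto_reap_children() {
    struct sigaction sa {};
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD, &sa, nullptr);
}
```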

How to make a new thread and terminate it after some time has elapsed?

The deal is:
I want to create a thread that works similarly to executing a new .exe in Windows, so if that program (new thread) crashes or goes into infinite loop: it will be killed gracefully (after the time limit exceeded or when it crashed) and all resources freed properly.
And when that thread has succeeded, I would like to be able to modify some global variable which could have some data in it, such as a list of files. That is why I can't just execute an external executable from Windows: I can't access the variables inside the function that got executed in the new thread.
Edit: Clarified the problem a lot more.
The thread will already run after calling CreateThread.
WaitForSingleObject is not necessary (unless you really want to wait for the thread to finish); but it will not "force-quit" the thread; in fact, force-quitting - even if it might be possible - is never such a good idea; you might e.g. leave resources opened or otherwise leave your application in a state which is no good.
A thread is not some sort of magical object that can be made to do things. It is a separate path of execution through your code. Your code cannot be made to jump arbitrarily around its codebase unless you specifically program it to do so. And even then, it can only be done within the rules of C++ (i.e., by calling functions).
You cannot kill a thread because killing a thread would utterly wreck some of the most fundamental assumptions a programmer makes. You would now have to take into account the possibility that the next line doesn't execute for reasons that you can neither predict nor prevent.
This isn't like exception handling, where C++ specifically requires destructors to be called, and you have the ability to catch exceptions and do special cleanup. You're talking about executing one piece of code, then suddenly ending the execution of that entire call-stack. That's not going to work.
The reason that web browsers moved from a "thread-per-tab" to "process-per-tab" model is exactly this: because processes can be terminated without leaving the other processes in an unknown state. What you need is to use processes instead of threads.
When the process finishes and sets its data, you need to use some inter-process communication system to read that data (I like Boost.Interprocess myself). It won't look like a regular C++ global variable, but you shouldn't have a problem reading it. This way, you can effectively kill the process if it's taking too long, and your program will remain in a reasonable state.
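A minimal sketch of the process-per-task idea, assuming POSIX and a parent that is single-threaded at `fork()` time (after `fork()` in a multithreaded program, only async-signal-safe calls are allowed until exec). A plain pipe stands in for a richer IPC library such as Boost.Interprocess, and `run_isolated` is a hypothetical name. A crash, a non-zero exit, or a missed deadline all yield `std::nullopt`, and a late child is killed.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <optional>
#include <poll.h>
#include <signal.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `job` (returning std::string) in a child process and returns its output.
template <typename Job>
std::optional<std::string> run_isolated(Job job, int timeout_ms) {
    int fds[2];
    if (pipe(fds) != 0) return std::nullopt;
    pid_t child = fork();
    if (child < 0) { close(fds[0]); close(fds[1]); return std::nullopt; }
    if (child == 0) {                   // child: do the work, send the result, leave
        close(fds[0]);
        std::string out = job();
        ssize_t written = write(fds[1], out.data(), out.size());
        _exit(written == static_cast<ssize_t>(out.size()) ? 0 : 1);
    }
    close(fds[1]);                      // parent only reads
    std::string data;
    bool timed_out = false;
    const auto deadline = std::chrono::steady_clock::now() + std::chrono::milliseconds(timeout_ms);
    char buf[4096];
    for (;;) {
        auto left = std::chrono::duration_cast<std::chrono::milliseconds>(
                        deadline - std::chrono::steady_clock::now()).count();
        if (left <= 0) { timed_out = true; break; }
        pollfd p{fds[0], POLLIN, 0};
        int ready = poll(&p, 1, static_cast<int>(left));
        if (ready == 0) { timed_out = true; break; }
        if (ready < 0) { if (errno == EINTR) continue; timed_out = true; break; }
        ssize_t n = read(fds[0], buf, sizeof buf);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) break;              // EOF: the child closed its end (exited or crashed)
        data.append(buf, static_cast<size_t>(n));
    }
    close(fds[0]);
    if (timed_out) kill(child, SIGKILL);
    int status = 0;
    waitpid(child, &status, 0);
    if (timed_out || !WIFEXITED(status) || WEXITSTATUS(status) != 0) return std::nullopt;
    return data;
}
```

The parent's state is never touched by the job, so killing the child cannot leave the parent's mutexes or heap in a broken state.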
Well, that's what WaitForSingleObject does. It blocks until the object does something (in case of a thread it waits until the thread exits or the timeout elapses). What you need is
HANDLE thread = CreateThread(0, 0, do_stuff, NULL, 0, 0);
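// rest of code that will run in parallel with your new thread
DWORD result = WaitForSingleObject(thread, 4000); // wait up to 4 seconds for the other thread to exit
// result is WAIT_OBJECT_0 if the thread exited, WAIT_TIMEOUT if the 4 seconds elapsed first
CloseHandle(thread); // only releases the handle; a still-running thread keeps running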
If you want your worker thread to shut down after a period of time has elapsed, the best way to do that is to have the thread itself monitor the elapsed time in some way and then exit when the time is up.
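A minimal portable sketch of a worker that enforces its own time limit, using `std::chrono::steady_clock` instead of a Win32 API; `bounded_worker` is a hypothetical name and the sleep stands in for one unit of real work. Because the thread returns normally, its destructors run and its locks are released.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Checks a deadline between units of work and stops once it has passed.
// Returns the number of work units completed.
int bounded_worker(std::chrono::milliseconds limit) {
    const auto deadline = std::chrono::steady_clock::now() + limit;
    int units = 0;
    while (std::chrono::steady_clock::now() < deadline) {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));  // one unit of work
        ++units;
    }
    return units;  // a normal return, not a forced kill
}
```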
Another way to do this is to monitor the elapsed time in the main thread or even in a third, monitor-type thread. When the time has elapsed, set an event. Your worker thread could wait for this event in its main loop, and then exit when it has been raised. These kinds of events, which are used to signal the thread to kill itself, are sometimes called "death events." (Or at least, I call them that.)
Yet another way to do this is to queue a user APC (asynchronous procedure call) to the worker thread, which needs to be in an alertable wait state. The APC can then set some internal state variable which will trigger the death sequence in the thread when it resumes.
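A minimal sketch of the "death event" pattern, assuming portable C++: the hypothetical `StopEvent` class (a `std::mutex`, `std::condition_variable` and flag) stands in for a Win32 manual-reset event. The monitor sets the event when time is up; the worker waits on it briefly between slices of work and leaves its loop once it is set.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Portable stand-in for a manual-reset event used as a "death event".
class StopEvent {
public:
    void set() {
        { std::lock_guard<std::mutex> lock(m_); set_ = true; }
        cv_.notify_all();
    }
    // Waits up to `timeout`; returns true if the event has been set.
    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        return cv_.wait_for(lock, timeout, [this] { return set_; });
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool set_ = false;
};

// Worker: one slice of work per iteration, checking the event in between.
void event_worker(StopEvent& stop, int& iterations) {
    while (!stop.wait_for(std::chrono::milliseconds(10))) {
        ++iterations;  // a slice of real work would go here
    }
}

// Monitor: raises the event once the time limit has elapsed.
void monitor(StopEvent& stop, std::chrono::milliseconds limit) {
    std::this_thread::sleep_for(limit);
    stop.set();
}
```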
There is another method which I hesitate even mentioning, because it should only be used in extremely dire circumstances. You can kill the thread. This is a very dangerous method akin to turning off your sink by detonating an atomic bomb. You get the sink turned off, but there could be other unintended consequences as well. Please don't do this unless you know exactly what you're doing and why.
Remove the call to WaitForSingleObject. That causes your parent thread to wait.
Remove the WaitForSingleObject call?

Why would I want to start a thread "suspended"?

The Windows and Solaris thread APIs both allow a thread to be created in a "suspended" state. The thread only actually starts when it is later "resumed". I'm used to POSIX threads which don't have this concept, and I'm struggling to understand the motivation for it. Can anyone suggest why it would be useful to create a "suspended" thread?
Here's a simple illustrative example. WinAPI allows me to do this:
t = CreateThread(NULL,0,func,NULL,CREATE_SUSPENDED,NULL);
// A. Thread not running, so do... something here?
ResumeThread(t);
// B. Thread running, so do something else.
The (simpler) POSIX equivalent appears to be:
// A. Thread not running, so do... something here?
pthread_create(&t,NULL,func,NULL);
// B. Thread running, so do something else.
Does anyone have any real-world examples where they've been able to do something at point A (between CreateThread & ResumeThread) which would have been difficult on POSIX?
To preallocate resources and later start the thread almost immediately.
You have a mechanism that reuses a thread (resumes it), but you don't actually have a thread to reuse yet, so you must create one.
It can be useful to create a thread in a suspended state in many instances (I find): you may wish to get the handle to the thread and set some of its properties before allowing it to start using the resources you're setting up for it.
Starting it suspended is much safer than starting it and then suspending it: you have no idea how far it's got or what it's doing.
Another example might be for when you want to use a thread pool - you create the necessary threads up front, suspended, and then when a request comes in, pick one of the threads, set the thread information for the task, and then set it as schedulable.
I dare say there are ways around not having CREATE_SUSPENDED, but it certainly has its uses.
There are some example of uses in 'Windows via C/C++' (Richter/Nasarre) if you want lots of detail!
There is an implicit race condition in CreateThread: you cannot obtain the thread ID until after the thread started running. It is entirely unpredictable when the call returns, for all you know the thread might have already completed. If the thread causes any interaction in the rest of that process that requires the TID then you've got a problem.
It is not an unsolvable problem if the API doesn't support starting the thread suspended, simply have the thread block on a mutex right away and release that mutex after the CreateThread call returns.
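The mutex workaround described above can be sketched in portable C++; `g_start_gate`, `g_published_id`, `gated_body` and `start_gated_thread` are hypothetical names. The creator holds the mutex before creating the thread, publishes the thread id, then unlocks; the new thread's first action is to acquire the mutex, so it cannot start work before the id is available.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

std::mutex g_start_gate;         // held by the creator while the thread is "suspended"
std::thread::id g_published_id;  // setup data the new thread depends on

// First thing the thread does is pass through the gate.
bool gated_body() {
    { std::lock_guard<std::mutex> wait_here(g_start_gate); }  // blocks until released
    return g_published_id == std::this_thread::get_id();
}

// Returns whether the thread saw its own id already published when it began.
bool start_gated_thread() {
    g_start_gate.lock();                // 1. hold the gate
    bool saw_id = false;
    std::thread t([&saw_id] { saw_id = gated_body(); });
    g_published_id = t.get_id();        // 2. setup that needs the thread id
    g_start_gate.unlock();              // 3. "resume" the thread
    t.join();
    return saw_id;
}
```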
However, there's another use for CREATE_SUSPENDED in the Windows API that is very difficult to deal with if API support is lacking. The CreateProcess() call also accepts this flag; it suspends the startup thread of the process. The mechanism is identical: the process gets loaded and you get a PID, but no code runs until you release the startup thread. That's very useful. I've used this feature to set up a process guard that detects process failure and creates a minidump. The CREATE_SUSPENDED flag allowed me to detect and deal with initialization failures, which are normally very hard to troubleshoot.
You might want to start a thread with some other (usually lower) priority or with a specific affinity mask. If you spawn it as usual it can run with undesired priority/affinity for some time. So you start it suspended, change the parameters you want, then resume the thread.
The threads we use are able to exchange messages, and we have arbitrarily configurable priority-inherited message queues (described in the config file) that connect those threads. Until every queue has been constructed and connected to every thread, we cannot allow the threads to execute, since they will start sending messages off to nowhere and expect responses. Until every thread was constructed, we cannot construct the queues since they need to attach to something. So, no thread can be allowed to do work until the very last one was configured. We use boost.threads, and the first thing they do is wait on a boost::barrier.
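A minimal sketch of the barrier idea using a POSIX `pthread_barrier_t` in place of `boost::barrier`, assuming Linux (macOS lacks POSIX barriers). The `Startup` struct and the queue counter are hypothetical placeholders for the real queue wiring. The barrier count is workers + 1, so no worker passes until the configuring thread has finished and arrived as well.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <thread>
#include <vector>

struct Startup {
    pthread_barrier_t ready;
    std::atomic<int> queues_connected{0};  // stands in for real queue wiring
    std::atomic<int> saw_full_config{0};
    int expected = 0;
};

void barrier_worker(Startup* s) {
    pthread_barrier_wait(&s->ready);  // no work until everything is configured
    if (s->queues_connected.load() == s->expected) s->saw_full_config.fetch_add(1);
    // ...start exchanging messages here...
}

// Returns how many workers found the full configuration in place at start.
int start_workers(int workers) {
    Startup s;
    s.expected = workers;
    pthread_barrier_init(&s.ready, nullptr, static_cast<unsigned>(workers) + 1);
    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i) threads.emplace_back(barrier_worker, &s);
    for (int i = 0; i < workers; ++i) s.queues_connected.fetch_add(1);  // "connect" queues
    pthread_barrier_wait(&s.ready);   // configuring thread arrives last: release everyone
    for (auto& t : threads) t.join();
    pthread_barrier_destroy(&s.ready);
    return s.saw_full_config.load();
}
```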
I ran into a similar problem once upon a time. The reasons for a suspended initial state are covered in the other answers.
My solution with pthreads was to use a mutex and pthread_cond_wait, but I don't know whether it is a good solution or whether it covers all possible needs. Moreover, I don't know whether the thread can be considered suspended (at the time, I took "blocked" in the manual as a synonym, but it likely is not).

double fork using vfork

HI
I am writing a part of a server which should dispatch some other child processes.
Because I want to wait on some of the processes, and dispatch others without waiting for completion, I use double fork for the second kind of processes (thus avoiding the zombie processes).
Problem is, my server holds a lot of memory, so forking takes a long time (even with the copy-on-write fork used in Linux, which copies only the page tables).
I want to replace the fork() with vfork(), and it's easy for the second fork (as it only calls execve() in the child), but I couldn't find any way I can replace the first one.
Does anyone know how I can do that?
Thanks!
The server runs on Linux (RH5U4) and is written in C++.
Why not simply have the newly exec'd process do another fork itself? That way only a small simple process will have its page tables copied?
EDIT:
Of course the parent would have to do a short-duration wait() to clean up the zombie from that one, but the grandchild process could then run for as long as it wanted.
vfork() can only be used to fork and then call exec or exit. Also, vfork() will block the parent process until the child calls _exit or exec, which is almost certainly not the behavior that you want.
The reason for this is that vfork() doesn't make any copies of any of the data, including the stack, for the new process. So everything is shared, and it is very easy to accidentally change something that the parent process cannot handle. Since the data is shared without copies, the parent process cannot continue running at the same time as the child, so it must wait for the child to _exit or call exec so it is no longer using the data when the parent starts to modify it.
I think that what you really want to do is to make use of SIGCHLD and maintain a list of child processes. You can then do away with the double fork by having your main process be notified when children change state (mostly, when they die) and perform some action on them based on that. You can also keep track of whether any of your child processes take longer than expected to complete (because you stored their creation time in your list) and take action if they go crazy and never complete.
Don't double fork. Handle SIGCHLD instead: in the handler, save errno, call wait, then restore errno.
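That recipe can be sketched as follows, assuming POSIX; `g_reaped`, `on_sigchld` and `install_sigchld_handler` are hypothetical names. The handler saves and restores `errno` because `waitpid()` can overwrite the value the interrupted code was about to inspect. It loops with `WNOHANG` because several children can exit while only one SIGCHLD is pending (standard signals do not queue).

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

std::atomic<int> g_reaped{0};  // lock-free counter, safe to update from the handler
static_assert(ATOMIC_INT_LOCK_FREE == 2, "need a lock-free atomic<int>");

extern "C" void on_sigchld(int) {
    int saved_errno = errno;                   // 1. save errno
    while (waitpid(-1, nullptr, WNOHANG) > 0)  // 2. reap every child that has exited
        g_reaped.fetch_add(1);
    errno = saved_errno;                       // 3. restore errno
}

void install_sigchld_handler() {
    struct sigaction sa {};
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;  // restart slow syscalls; ignore stop/continue
    sigaction(SIGCHLD, &sa, nullptr);
}
```

A real server would record which pids were reaped (for example, to update the list of children with their start times) rather than only counting them.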
I believe you can use the answer to another question I asked, for a similar reason. You can vfork() + exec() to an executable which forks again. See setuid() before calling execv() in vfork() / clone()