How to monitor unexpectedly exited threads? - c++

In multithreaded programming, what if one of the worker threads exits unexpectedly and the main thread needs to know whether that thread is still alive?
Is there any way to check this?
I was wondering if there is a typical signal that is raised when a worker thread exits.
(Linux)
Thank you

If threads are unexpectedly dying in your program, it is toast. If you want fault isolation with recovery, use multiple processes (with shared memory) instead of, or in addition to, threads. On POSIX (and also on Win32) you can detect whether the owner of a process-shared mutex died while holding that mutex, and implement some "fsck-like" check and repair of the shared data to try to restore its invariants. (Obviously it helps if the data structure is designed with recoverable transactions in mind.)
On Win32 you can use Windows structured exception handling (SEH) to catch any kind of exception in a thread (for instance access violation, division by zero, ...). Using the Tool Help API you can obtain a list of the attached modules, and there are interfaces for reading the machine registers, faulting address, etc.
In POSIX you can do that with signal handling. Events like access violations and such deliver signals to the thread to which they pertain.
It doesn't seem realistic to code up these pieces into a recovery strategy that tries to keep a buggy program running.
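As a rough illustration of the "owner died while holding the mutex" detection mentioned above, here is a minimal sketch using a POSIX robust mutex. The helper names (init_robust_mutex, lock_and_check_owner, worker_dies_holding_lock) are made up for this example. To keep it short it uses threads in one process; for the multi-process case you would additionally place the mutex in shared memory and set PTHREAD_PROCESS_SHARED on the attribute. The actual repair of the shared data is application-specific and only marked by a comment.

```cpp
#include <cerrno>
#include <pthread.h>

// Hypothetical helper: initialise `m` as a robust mutex.
// Returns 0 on success, or a pthread error number.
int init_robust_mutex(pthread_mutex_t* m) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (rc == 0) rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

// Hypothetical helper: lock `m` and report whether the previous owner
// terminated while holding it. Returns 0 on success.
int lock_and_check_owner(pthread_mutex_t* m, bool* owner_died) {
    *owner_died = false;
    int rc = pthread_mutex_lock(m);
    if (rc == EOWNERDEAD) {
        *owner_died = true;
        // We now hold the lock. A real program would check and repair
        // the protected data here before continuing.
        rc = pthread_mutex_consistent(m);  // mark the mutex usable again
    }
    return rc;
}

// Simulated faulty worker: takes the lock and exits without unlocking.
void* worker_dies_holding_lock(void* arg) {
    pthread_mutex_lock(static_cast<pthread_mutex_t*>(arg));
    return nullptr;
}
```

A caller would create the worker with pthread_create, and the next lock_and_check_owner call after the worker has gone reports owner_died as true instead of blocking forever.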

Related

Abandoning std::future if the underlying thread was killed

I'm building plugins for a host application using C++11/14, for now targeting Windows and MacOS. The plugins start up async worker threads when the host app starts us up, and if they're still running when the host shuts the plugins down they get signaled to stop. Some of these worker threads are started with std::async so I can use an std::future to get the thread result back, while other less involved threads are just std::threads which I ultimately just join to see when they're done. It all works nicely this way.
Unless the host decides not to call our shutdown procedure when it shuts down itself... Yeah, I know, but it really is that bad sometimes -- it often enough just crashes during shutdown. And they even plan to make that into a 'feature' and call it "Fast Exit" to please their users; just pull the plug and we're done extra fast :(
For that case I have registered an std::atexit handler. It last-minute signals any still running threads to exit NOW (atomic bools and/or signals to wake them up), then it waits a second to give the threads some time to respond, and finally it detaches the regular std::thread threads and hopes for the best. This way at least the threads get a heads up to quickly write intermediate state to disk for a next round (if needed), and quit writing to probably already deceased data structures, thus avoiding crashes which would make any crash dump point the finger at my plugins.
However, atexit handlers run at OS DLL unload time, so I'm not even allowed to use thread synchronization (right?). And under the debugger I just saw all of the worker threads were presumably already killed by the OS, since the atexit handler's thread was the only thread left under the debugger. Needless to say, all remaining std::futures went into full blocking mode, hanging up the remaining corpse of the dead host app...
Is there a way to abandon an std::future? In MS Visual C++ I saw futures have an _Abandon method, but that's too platform specific (and undocumented) for my taste. Or is my only recourse to not use std::future, do all thread communication via my own data structures and synchronization, and work with simple std::threads which can just be detached?
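One possible direction, sketched under assumptions: a future obtained from std::packaged_task (unlike one returned by std::async) does not block in its destructor, and wait_for lets the caller give up after a timeout instead of hanging. The helper names start_detached and try_get are invented for this sketch, and it does not settle whether any synchronization is safe during DLL unload.

```cpp
#include <chrono>
#include <future>
#include <thread>
#include <utility>

// Hypothetical helper: run `fn` on a detached thread and return a future
// for its int result. Because the future comes from a packaged_task,
// destroying it does not wait for the thread.
template <typename Fn>
std::future<int> start_detached(Fn fn) {
    std::packaged_task<int()> task(std::move(fn));
    std::future<int> result = task.get_future();
    std::thread(std::move(task)).detach();
    return result;
}

// Hypothetical helper: wait at most `limit` for the result.
// Returns true and stores the value in `out` if it arrived in time;
// returns false (leaving `out` untouched) otherwise.
bool try_get(std::future<int>& f, std::chrono::milliseconds limit, int& out) {
    if (f.wait_for(limit) != std::future_status::ready) return false;
    out = f.get();
    return true;
}
```

If try_get returns false at shutdown, the future can simply be dropped; the trade-off is that the detached thread's lifetime is no longer tied to anything you can join.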

c++ child threads terminating on main() parent thread exit?

VS2013, C++
I just released a DLL application. One of the DLL's functions starts a thread with _beginthread.
In the normal software flow I use a mutex and control the threads. Before unregistering the DLL from the main application, I wait for the thread to terminate and close the handles.
However, there is one case in which the main application could close without releasing resources correctly, that is, without waiting for the child thread to terminate and without closing the handles.
Is there any risk if the main application force-exits? Is there any risk if I run the application and threads again after the exit?
Is there any risk for the OS? Do all threads terminate after main exits?
I know that it is a "dirty" solution, but for some reason I can't change that.
Thank you in advance for any advice.
According to Raymond Chen, on Windows systems, if the main thread terminates, your application hangs until all your threads end. This means that no, your solution will not work: your thread will freeze your application in the closing state. Also, even if your thread were forcefully terminated on exit, it would not be uninitialized and, since we are talking about MFC threads here, it would cause your application to leak resources, so pretty please don't do that!
Is there any risk if the main application force-exits?
Yes! A thread may have started an operation that must run to completion for the data to stay consistent.
Is there any risk if I run the application and threads again after the exit?
Yes! Maybe the previous shutdown corrupted a data structure and now you cannot even load the data correctly.
Is there any risk for the OS?
It depends on what your software does. Maybe you are writing a disk-optimization tool and it is moving clusters during the emergency shutdown?
Do all threads terminate after main exits?
Yes! You need to provide explicit "join" code that waits for the threads to finish.
I would say, the behavior is undefined. Too many things may happen, when the application is terminated without having the chance to clean up.
This SO question may give some ideas.
This MS article describes the TerminateThread function and also lists some implications of unexpectedly terminating threads (which is probably what happens on calling exit):
If the target thread owns a critical section, the critical section will not be released.
If the target thread is allocating memory from the heap, the heap lock will not be released.
If the target thread is executing certain kernel32 calls when it is terminated, the kernel32 state for the thread's process could be inconsistent.
If the target thread is manipulating the global state of a shared DLL, the state of the DLL could be destroyed, affecting other users of the DLL.
So it looks like there is a risk even for the OS:
kernel32 state for the thread's process could be inconsistent
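For the normal (non-forced) shutdown path, the "join" code the answers ask for can be sketched in portable C++ as below. This is only an illustration: the Worker class and its member names are hypothetical, and the loop body is a stand-in for real work. It does nothing for the forced-exit case, where the thread never gets the chance to observe the flag.

```cpp
#include <atomic>
#include <thread>

// Hypothetical worker that runs until asked to stop and is always
// joined before the object goes away.
class Worker {
public:
    Worker() : thread_([this] { run(); }) {}
    ~Worker() { stop(); }

    // Ask the thread to finish and wait until it has done so.
    void stop() {
        stop_requested_.store(true);
        if (thread_.joinable()) thread_.join();
    }

    bool finished_cleanly() const { return finished_.load(); }

private:
    void run() {
        while (!stop_requested_.load()) {
            std::this_thread::yield();  // stand-in for real work
        }
        // Release resources / flush state here, then report completion.
        finished_.store(true);
    }

    std::atomic<bool> stop_requested_{false};
    std::atomic<bool> finished_{false};
    std::thread thread_;  // declared last so the flags exist before the thread starts
};
```

Calling stop() from the DLL's own shutdown function, before the host unloads it, keeps the thread from touching state that is already gone.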

Catch process kills in c++ under Unix

Is it possible for a C++ program to monitor which processes get killed (either by the user or by the OS), or whether a process terminates for some other reason that is not a segmentation fault or an illegal operation, and perform some actions afterwards?
Short answer, yes it's possible.
Long answer:
You will need to implement signal handlers for the different signals that may kill a process. You can't necessarily catch EVERY type of signal (in particular, SIGKILL is not possible to catch since that would potentially make a process unkillable).
Use the sigaction function call to set up your signal handlers.
There is a decent list of which signals do what here (about 1/3 down from the top):
http://pubs.opengroup.org/onlinepubs/7908799/xsh/signal.h.html
Edit: Sorry, thought you meant within the process, not from outside of the process. If you "own" the process, you can use ptrace and its PTRACE_GETSIGINFO request to get what the signal was.
To generally "find processes killed" would be quite difficult - or at least to tell the difference between processes just exiting on their own, as opposed to those that exit because they are killed for some other reason.
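For the in-process case, setting up handlers with sigaction might look roughly like this. It is a minimal sketch: the names g_last_signal, record_signal and install_termination_handlers are invented here, and the choice of SIGTERM, SIGINT and SIGHUP is just an example set. The handler only records the signal number in a sig_atomic_t, which is about all that is safe to do inside a handler; the main loop is expected to poll the flag.

```cpp
#include <csignal>
#include <signal.h>

// Set by the handler, polled by the rest of the program (hypothetical name).
volatile std::sig_atomic_t g_last_signal = 0;

// Handler: just remember which signal arrived.
extern "C" void record_signal(int signo) {
    g_last_signal = signo;
}

// Install the handler for a few common termination signals.
// SIGKILL and SIGSTOP cannot be caught, so they are not listed.
bool install_termination_handlers() {
    struct sigaction sa {};
    sa.sa_handler = record_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    const int signals[] = {SIGTERM, SIGINT, SIGHUP};
    for (int s : signals) {
        if (sigaction(s, &sa, nullptr) != 0) return false;
    }
    return true;
}
```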

Catching signals such as SIGSEGV and SIGFPE in multithreaded program

I am trying to write a multithreaded logging system for a program running on linux.
Calls to the logging system in the main program threads push a data structure containing the data to be logged into a FIFO queue. A dedicated thread picks the data off the queue and outputs it, while the program's main thread continues with its task.
If the main program causes SIGSEGV or other signals to be raised I need to make sure that the queue is empty before terminating.
My plan is to block the signals using pthread_sigmask http://man7.org/linux/man-pages/man3/pthread_sigmask.3.html for all but one thread, but reading the list of signals on http://man7.org/linux/man-pages/man7/signal.7.html I noticed:
A signal may be generated (and thus pending) for a process as a whole (e.g., when sent using kill(2)) or for a specific thread (e.g., certain signals, such as SIGSEGV and SIGFPE, generated as a consequence of executing a specific machine-language instruction are thread directed, as are signals targeted at a specific thread using pthread_kill(3)).
If I block SIGSEGV on all threads but a thread dedicated to catching signals, will it then catch a SIGSEGV raised by a different thread?
I found the question Signal handling with multiple threads in Linux, but I am clueless as to which signals are thread specific and how to catch them.
I agree with the comments: in practice catching and handling SIGSEGV is often a bad thing.
And SIGSEGV is delivered to a specific thread (see this), the one running the machine instruction which accessed an illegal address.
So you cannot run a thread dedicated to catching SIGSEGV in other threads. And you probably could not easily use signalfd(2) for SIGSEGV...
Catching (and returning normally from its signal handler) SIGSEGV is a complex and processor specific thing (it cannot be "portable C code"). You need to inspect and alter the machine state in the handler, that is either modify the address space (by calling mmap(2) etc...) or modify the register state of the current thread. So use sigaction(2) with SA_SIGINFO and change the machine specific state pointed to by the third argument (of type ucontext_t*) of the signal handler. Then dive into the processor specific uc_mcontext field of it. Have fun changing individual registers, etc... If you don't alter the machine state of the faulty thread, execution is resumed (after returning from your SIGSEGV handler) in the same situation as before, and another SIGSEGV signal is immediately sent.... Or simply, don't return normally from a SIGSEGV handler (e.g. use siglongjmp(3) or abort(3) or _exit(2) ...).
Even if you happen to do all this, it is rumored that Linux kernels are not extremely efficient on such executions. So it is rumored that trying to mimic Hurd/Mach external pagers this way on Linux is not very efficient. See this answer...
Of course signal handlers should call only async-signal-safe functions (see signal(7) for more). In particular, you cannot in principle call fprintf from them (and you might not be able to reliably use your logging system, though it could work in most but not all cases).
What I said about SIGSEGV also holds for SIGBUS and SIGFPE (and other thread-specific synchronous signals, if they exist).
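The "don't return normally" option can be sketched as follows, assuming Linux/POSIX. The handler uses only write(2) and _exit(2), both async-signal-safe, and runs in whichever thread faulted. All names (on_fatal_signal, install_fatal_handler, run_faulting_child) and the exit code 42 are made up for illustration; the fork-based helper exists only to demonstrate the handler without killing the calling process, and it raises SIGSEGV explicitly rather than performing a real invalid access.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Handler for fatal synchronous signals: emit a short message with
// write(2), then leave with _exit(2) instead of returning.
extern "C" void on_fatal_signal(int, siginfo_t*, void*) {
    static const char msg[] = "fatal signal, exiting\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;
    _exit(42);  // arbitrary exit code chosen for this sketch
}

// Install the handler for SIGSEGV, SIGFPE and SIGBUS.
bool install_fatal_handler() {
    struct sigaction sa {};
    sa.sa_sigaction = on_fatal_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    return sigaction(SIGSEGV, &sa, nullptr) == 0 &&
           sigaction(SIGFPE, &sa, nullptr) == 0 &&
           sigaction(SIGBUS, &sa, nullptr) == 0;
}

// Demo only: run the handler in a child process and return the child's
// exit code, or -1 if the child did not exit normally.
int run_faulting_child() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        if (!install_fatal_handler()) _exit(1);
        raise(SIGSEGV);  // delivered to the calling thread
        _exit(2);        // only reached if the handler did not run
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that this does not drain a log queue: doing so from the handler would require the queue code itself to be async-signal-safe, which ordinary mutex-based queues are not.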

Design and Technical issue in Multi Threaded Application

I wanted to discuss the design and technical issues/challenges related to multithreaded applications.
Issues I faced:
1. I came across a situation where multiple threads using a shared function/variable crashed the application, so a proper guard is required in that case.
2. State machines and multithreading.
There are several points one should remember before delving into a multithreaded application.
There can be issues related to 1. memory 2. handles 3. sockets etc.
Please share your experience on the following points:
What are the common mistakes one makes in a multithreaded application?
Are there any specific issues related to multithreading?
Should we pass data by value or by reference in the thread function?
Well, there are so many...
1) Shared functions/procedures - they are just code and, unless the code modifies itself, there can be no problem. Local variables are no problem because each thread calls on a separate stack, (almost by definition:). Any other data can be an issue and may need protection. 99.99% of all household API calls on a multitasking OS are thread-safe, again, almost by definition. Another poster has already warned about thread-local storage...
2) State machines. Can be a little awkward. You can easily lock all the events firing into the SM, so ensuring the integrity of the state, but you must not make blocking calls from inside the SM while it is locked, (might seem obvious, but I have done this.. once :).
I occasionally run state-machines from one thread only, queueing event objects to it. This moves the locking to the input queue and means that the SM is somewhat easier to debug. It also means that the thread running the SM can implement timeouts on an internal delta queue and so itself fire timeout calls to the objects on the delta queue, (classic example: TCP server sockets with connection timeouts - thousands of socket objects that each need an independent timeout).
3) 'Should we pass data by value or by reference in the thread function.'. Not sure what you mean, here. Most OS allow one pointer to be passed on thread creation - do with it what you will. You could pass it an event it should signal on work completion or a queue object upon which it is to wait for work requests. After creation, you need some form of inter-thread comms to send requests and get results, (unless you are going to use the direct 'read/write/waitForExit' mechanism - AV/deadlock/noClose generator).
I usually use a simple semaphore/CS producer-consumer queue to send/receive comms objects between worker threads, and the PostMessage API to send them to a UI thread. Apart from the locking in the queue, I don't often need any more locking. You have to try quite hard to deadlock a threaded system based on message-passing and things like thread pools become trivial - just make [no. of CPU] threads and pass each one the same queue to wait on.
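A producer-consumer queue of the kind described can be sketched in standard C++ like this. It uses a mutex and condition variable rather than a semaphore/critical section, and the names BlockingQueue and sum_with_pool, as well as the negative-number stop sentinel, are choices made just for this example.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking FIFO queue: the only locking in the whole design.
template <typename T>
class BlockingQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(item));
        }
        ready_.notify_one();
    }

    // Blocks until an item is available.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
};

// Tiny "thread pool": every worker waits on the same queue and adds the
// numbers it receives to a shared total. A negative value tells one
// worker to stop (illustrative convention only).
long sum_with_pool(int count, unsigned workers) {
    BlockingQueue<int> queue;
    std::atomic<long> total{0};
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < workers; ++i) {
        pool.emplace_back([&queue, &total] {
            for (;;) {
                int value = queue.pop();
                if (value < 0) break;
                total += value;
            }
        });
    }
    for (int v = 1; v <= count; ++v) queue.push(v);
    for (unsigned i = 0; i < workers; ++i) queue.push(-1);  // one stop per worker
    for (auto& t : pool) t.join();
    return total.load();
}
```

Because the stop sentinels are queued after all the work items, every item is processed before the workers exit and are joined.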
Common mistakes. See the other posters for many, to which I would add:
a) Reading/writing directly to thread fields to pass parameters and return results, (esp. between UI threads and 'worker' threads), ie 'Create thread suspended, load parameters into thread fields, resume thread, wait on thread handle for exit, read results from thread fields, free thread object'. This causes a performance hit from continually creating/terminating/destroying threads and often forces the developer to ensure that threads are terminated when exiting an app to prevent AV/216/217 exceptions on close. This can be very tricky, in some cases impossible because a few APIs block with no way of unblocking them. If developers would stop this nasty practice, there would be far fewer app close problems.
b) Trying to build multithreaded apps in a procedural fashion, eg. trying to wait for results from a work thread in a UI event handler. Much safer to build a thread request object, load it with parameters, queue it to a work thread and exit the event handler. The thread can get the object, do work, put results back into the object and, (on Windows, anyway), PostMessage the object back. A UI message-handler can deal with the results and dispose of the object, (or recycle, reuse:). This approach means that, since the UI and worker are always operating on different data that can outlive them both, no locking and, (usually), no need to ensure that the work thread is freed when closing the app, (problems with this are legendary).
Rgds,
Martin
The biggest issues people face in multithreaded applications are race conditions, deadlocks, and not using semaphores of some sort to protect globally accessible variables.
You face these problems when using thread locks:
Deadlock
Priority Inversion
Convoying
“Async-signal-safety”
Kill-tolerant availability
Preemption tolerance
Overall performance
If you want to look at more advanced threading techniques, you can look at lock-free threading, where many threads keep working on the same problem instead of blocking while they wait for each other.
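To give a flavour of the lock-free style, here is a small sketch that tracks a shared maximum with a compare-and-swap retry loop instead of a mutex. The function names update_max and parallel_max are hypothetical, and the strided split of the input across threads is an arbitrary choice for the example.

```cpp
#include <atomic>
#include <cstddef>
#include <limits>
#include <thread>
#include <vector>

// Raise `current_max` to `candidate` if it is larger, without locking.
// On a failed compare_exchange_weak, `seen` is refreshed with the value
// another thread stored, and the loop re-checks whether we still win.
void update_max(std::atomic<int>& current_max, int candidate) {
    int seen = current_max.load();
    while (candidate > seen &&
           !current_max.compare_exchange_weak(seen, candidate)) {
        // retry with the refreshed `seen`
    }
}

// Find the maximum of `values` using `threads` workers that all update
// the same atomic. Returns the smallest int for an empty input.
int parallel_max(const std::vector<int>& values, unsigned threads) {
    if (threads == 0) threads = 1;
    std::atomic<int> best{std::numeric_limits<int>::min()};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&values, &best, t, threads] {
            for (std::size_t i = t; i < values.size(); i += threads) {
                update_max(best, values[i]);
            }
        });
    }
    for (auto& th : pool) th.join();
    return best.load();
}
```

No thread ever blocks holding a lock here, so a stalled or killed thread cannot prevent the others from making progress on the shared value.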
Deadlocks, memory corruption (of shared resources) due to lack of proper synchronization, buffer overflows (which can themselves be caused by memory corruption), and improper usage of thread-local storage are the most common problems.
It also depends on the platform and technology you use to implement the threads. For example, in Microsoft Windows, if you use MFC, several MFC objects are not really shareable across threads because they rely heavily on thread-local storage (e.g. the CSocket and CWnd classes).