thread 'disappears' when blocking on read() how do i debug it?


I have a multithreaded application in C++ running under Linux (Fedora 27). One of the threads keeps reading data from a file on the local disk using low-level I/O (open, read, etc.) and supplies that data to a buffer that is rotated between other threads.
I suddenly ran into a strange problem where read() would start blocking indefinitely, for no apparent reason, at an arbitrary offset into the file. I added a monitor thread that detects this block (by setting a timestamp before entering read()) and attempts to shut down the program when it occurs.
The weird thing is that when the main thread reaches its end and calls pthread_join on that read thread, the call returns 0 (success).
I tried again, but replaced the call to read() with a while(1); and now pthread_join never returns, which is what I would expect.
I then examined the program in gdb, and to my surprise, when I reach the pthread_join, the read thread is GONE!
Looking at info thread when the monitor thread detects a blocking read(), the thread is still there, but at some point it disappears, and I can't catch it!
I'm trying to catch this thread exiting and I'm looking for ideas on how to do so. I am using pthread_cleanup_push/pop, but my cleanup function is not being invoked by the read thread (it is for all other threads).
Any ideas? I'm at my wits' end!
edit ----------------------------------------
it appears to have something to do with syslog being called from a completely unrelated thread.

read is a cancellation point, so if your application calls pthread_cancel to terminate the thread at some point, the thread will cease to exist (after executing the cleanup actions). Joining a canceled thread succeeds and yields the special value PTHREAD_CANCELED for the void * value optionally filled out by pthread_join.
If you replace read with an endless loop, then there is no cancellation point, the cancellation request is not acted upon, and pthread_join will also wait indefinitely.
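The behaviour described above can be reproduced in a few lines. This is a minimal sketch, assuming Linux with POSIX threads; the pipe, the function names blocked_reader and join_reports_canceled, and the buffer size are made up for illustration and are not taken from the question's code.

```cpp
#include <pthread.h>
#include <unistd.h>
#include <cassert>

// Hypothetical reader thread: blocks in read() on a pipe that never gets data.
static void* blocked_reader(void* arg) {
    int fd = *static_cast<int*>(arg);
    assert(fd >= 0);
    char buf[16];
    // read() is a cancellation point: a pending pthread_cancel() is acted on here.
    ssize_t n = read(fd, buf, sizeof buf);
    (void)n;
    return nullptr;  // not reached when the thread is canceled inside read()
}

// Starts the reader, cancels it, and reports whether pthread_join succeeded
// and handed back PTHREAD_CANCELED as the thread's result.
bool join_reports_canceled() {
    int fds[2];
    if (pipe(fds) != 0) return false;

    pthread_t tid;
    if (pthread_create(&tid, nullptr, blocked_reader, &fds[0]) != 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }

    pthread_cancel(tid);                  // request cancellation
    void* result = nullptr;
    int rc = pthread_join(tid, &result);  // returns 0 even though the thread was canceled

    close(fds[0]);
    close(fds[1]);
    return rc == 0 && result == PTHREAD_CANCELED;
}
```

Checking the value returned through the second argument of pthread_join, rather than only its return code, is one way to tell a canceled thread apart from one that returned normally.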

Related

How to safely terminate a multithreaded process

I am working on a project where we have used pthread_create to create several child threads.
The thread creation logic is not in my control, as it's implemented by some other part of the project.
Each thread performs some operation which takes more than 30 seconds to complete.
Under normal condition the program works perfectly fine.
But the problem occurs at the time of termination of the program.
I need to exit from main as quickly as possible when I receive the SIGINT signal.
When I call exit() or return from main, the exit handlers and global objects' destructors are called. I believe these operations race with the running threads, and I believe there are many such race conditions, which makes it hard to solve all of them.
The way I see it there are two solutions:
1. Call _exit() and forget all de-allocation of resources.
2. When SIGINT arrives, close/kill all threads and then call exit() from the main thread, which will release resources.
I think 1st option will work, but I do not want to abruptly terminate the process.
So I want to know if it is possible to terminate all child threads as quickly as possible so that exit handler & destructor can perform required clean-up task and terminate the program.
I have gone through this post, let me know if you know other ways: POSIX API call to list all the pthreads running in a process
Also, let me know if there is any other solution to this problem

What is it that you need to do before the program quits? If the answer is 'deallocate resources', then you don't need to worry. If you call _exit then the program will exit immediately and the OS will clean up everything for you.
Be aware also that what you can safely do in a signal handler is extremely limited, so attempting to perform any cleanup yourself is not recommended. If you're interested, there's a list of what you can do here. But you can't flush a file to disk, for example (which is about the only thing I can think of that you might legitimately want to do here). That's off limits.

I need to exit from main as quickly as possible when I receive the SIGINT signal.
How is that defined? Because there's no way to "exit quickly as possible" when you receive one signal like that.
You can either set flag(s), post to semaphore(s), or similar to set a state that tells other threads it's time to shut down, or you can kill the entire process.
If you elect to set flag(s) or similar to tell the other threads to shut down, you set those flags and return from your signal handler and hope the threads behave and the process shuts down cleanly.
If you elect to kill threads, there's effectively no difference in killing a thread, killing the process, or calling _exit(). You might as well just keep it simple and call _exit().
That's all you can choose between when you have to make your decision in a single signal handler call. Pick one.
A better solution is to use escalating signals. For example, when you get SIGQUIT or SIGINT, you set flag(s) or otherwise tell threads it's time to clean up and exit the process - or else. Then, say five seconds later whatever is shutting down your process sends SIGTERM and the "or else" happens. When you get SIGTERM, your signal handler simply calls _exit() - those threads had their chance and they messed it up and that's their fault. Or you can call abort() to generate a core file and maybe provide enough evidence to fix the miscreant threads that won't shut down.
And finally, five seconds later the managing process will nuke the process from orbit with SIGKILL just to be sure.
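The escalation described above could be wired up roughly as follows. This is a sketch under assumptions: the flag name g_shutdown_requested and the function names are hypothetical, and the handlers do nothing beyond setting a lock-free atomic or calling _exit(), both of which are async-signal-safe. SIGKILL cannot be caught, so the final step needs no code.

```cpp
#include <atomic>
#include <cassert>
#include <signal.h>
#include <unistd.h>

// A lock-free atomic may be written from a signal handler and read by threads.
static_assert(ATOMIC_INT_LOCK_FREE == 2, "sketch assumes lock-free atomic<int>");

// Hypothetical flag that worker threads poll to learn that shutdown was requested.
static std::atomic<int> g_shutdown_requested{0};

// First stage (SIGINT/SIGQUIT): only record the request.
extern "C" void on_polite_signal(int) {
    g_shutdown_requested.store(1);
}

// Second stage (SIGTERM): the threads had their chance, so leave immediately.
extern "C" void on_final_signal(int) {
    _exit(1);  // unlike exit(), _exit() is async-signal-safe
}

// Installs both stages; returns true if every sigaction() call succeeded.
bool install_shutdown_handlers() {
    struct sigaction polite = {};
    polite.sa_handler = on_polite_signal;
    sigemptyset(&polite.sa_mask);

    struct sigaction final_stage = {};
    final_stage.sa_handler = on_final_signal;
    sigemptyset(&final_stage.sa_mask);

    return sigaction(SIGINT, &polite, nullptr) == 0 &&
           sigaction(SIGQUIT, &polite, nullptr) == 0 &&
           sigaction(SIGTERM, &final_stage, nullptr) == 0;
}

// What a worker thread would check between units of work.
bool shutdown_requested() {
    return g_shutdown_requested.load() != 0;
}
```

Worker threads would call shutdown_requested() at points where it is safe to stop; replacing _exit(1) with abort() in the second stage gives the core-file variant mentioned above.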

Do I need to join every thread in my application ?

I'm new to multi-threading and I need to understand the whole idea of "join". Do I need to join every thread in my application, and how does that work with multi-threading?

No, you can detach a thread if you want to leave it alone.
If you start a thread, you must either detach it or join it before the program ends; otherwise this is undefined behaviour.
To know whether a thread needs to be detached, ask yourself this question: "do I want the thread to run after the program's main function is finished?". Here are some examples:
- When you do File/New you create a new thread and you detach it: the thread will be closed when the user closes the document. Here you don't need to join the threads.
- When you do a Monte Carlo simulation, some distributed computing, or any Divide And Conquer type algorithm, you launch all the threads and you need to wait for all the results so that you can combine them. Here you explicitly need to join the threads before combining the results.

Not joining a thread is like not deleting all the memory you new. It can be harmless, or it could be a bad habit.
A thread you have not synchronized with is in an unknown state of execution. If it is a file writing thread, it could be half way through writing a file and then the app finishes. If it is a network communications thread, it could be half way through a handshake.
The downside to joining every thread is if one of them has gotten into a bad state and has blocked, your app can hang.
In general you should try to send a message to your outstanding threads to tell them to exit and clean up. Then you should wait a modest amount of time for them to finish or otherwise respond that they are good to die, and then shut down the app. Now prior to this you should signify your program is no longer open for business -- shut down GUI windows, respond to requests from other processes that you are shutting down, etc -- so if this takes longer than anticipated the user is not bothered. Finally if things go imperfectly -- if threads refuse to respond to your request that they shut down and you give up on them -- then you should log errors as well, so you can fix what may be a symptom of a bigger problem.
The last time a worker thread unexpectedly hung, I initially thought it was a problem with a network outage and a bug in the timeout code. Upon deeper inspection it was because one of the objects in use was deleted prior to the shutdown synchronization: the undefined behaviour that resulted just looked like a hang in my reproduction cases. Had we not carefully joined, that bug would have been harder to track down (now, the right thing to do would have been to use a shared resource that we could not delete: but mistakes happen).
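The "ask, wait a modest time, log if it fails" sequence can be sketched with standard C++ threads. Everything here is illustrative: stop_worker_with_grace, the stubborn parameter (which simulates a worker that ignores the request) and the release flag (which exists only so the sketch can still join and exit cleanly) are assumptions, not part of the answer above.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <iostream>
#include <thread>

// Asks a worker to stop and waits up to `grace` for it to acknowledge.
// Returns true if the worker finished within the grace period.
bool stop_worker_with_grace(bool stubborn, std::chrono::milliseconds grace) {
    assert(grace.count() >= 0);
    std::atomic<bool> stop{false};     // the "please exit" message
    std::atomic<bool> release{false};  // sketch-only: frees a stubborn worker afterwards
    std::promise<void> done;
    std::future<void> finished = done.get_future();

    std::thread worker([&] {
        // A cooperative worker watches `stop`; the stubborn one ignores it.
        while (stubborn ? !release.load() : !stop.load())
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        done.set_value();  // "I am good to die"
    });

    stop.store(true);
    bool in_time = finished.wait_for(grace) == std::future_status::ready;
    if (!in_time)
        std::cerr << "worker did not stop within the grace period\n";  // log it

    // A real application would now give up and proceed with shutdown;
    // the sketch releases the worker so that it can be joined.
    release.store(true);
    worker.join();
    return in_time;
}
```

For example, stop_worker_with_grace(false, std::chrono::milliseconds(2000)) models a well-behaved worker, while passing true with a short grace period exercises the logging path.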

The pthread_join() function suspends execution of the calling thread
until the target thread terminates, unless the target thread has
already terminated. On return from a successful pthread_join() call
with a non-NULL value_ptr argument, the value passed to pthread_exit()
by the terminating thread is made available in the location referenced
by value_ptr. When a pthread_join() returns successfully, the target
thread has been terminated. The results of multiple simultaneous calls
to pthread_join() specifying the same target thread are undefined. If
the thread calling pthread_join() is canceled, then the target thread
will not be detached.
So pthread_join does two things:
1. Wait for the thread to finish.
2. Clean up any resources associated with the thread.
This means that if you exit the process without calling pthread_join, then (2) will be done for you by the OS (although it won't do thread cancellation cleanup), and (1) will not be done.
So whether you need to call pthread_join depends on whether you need (1) to happen.
Detached thread
If you don't need to wait for the thread to finish, then you may as well pthread_detach it. A detached thread cannot be joined (so you can't wait on its completion), but its resources are freed automatically when it completes.

do I need to join every thread in my application ?
Not necessarily - it depends on your design and OS. Join() is actively hazardous in GUI apps. If you don't need to know, or don't care, whether one thread has terminated from another thread, you don't need to join it.
I try very hard not to join/WaitFor any threads at all. Pool threads, app-lifetime threads and the like often do not require any explicit termination - it depends on the OS and on whether the threads own, or are explicitly bound to, any resources that need explicit termination/close/whatever.

Threads can be either joinable or detached. Detached threads should not be joined. On the other hand, if you don't join a joinable thread, your app will leak some memory and some thread structures. C++11 std::thread calls std::terminate if the thread object goes out of scope while still joinable, that is, without .join() or .detach() having been called. See pthread_detach and pthread_create. This is much like processes: when a child exits, it stays a zombie until its creator calls waitpid. The reason for this behavior is that the creator of a thread or process might want to know its exit code.
Update: if pthread_create is called with attribute argument equal to NULL (default attributes are used), joinable thread will be created. To create a detached thread, you can use attributes:
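pthread_t thread;
pthread_attr_t attrs;
pthread_attr_init(&attrs);
pthread_attr_setdetachstate(&attrs, PTHREAD_CREATE_DETACHED);
// pthread_create takes pointers to the thread ID and to the attributes
pthread_create(&thread, &attrs, callback, arg);
pthread_attr_destroy(&attrs);  // the attribute object is no longer needed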
Alternatively, you can detach an already created thread by calling pthread_detach on it. If you try to join a detached thread, pthread_join will return the EINVAL error code. glibc has a non-portable extension, pthread_getattr_np, that retrieves the attributes of a running thread, so you can check whether a thread is detached with pthread_attr_getdetachstate.
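A small sketch of that check, assuming Linux with glibc (pthread_getattr_np is not portable). The helper names are hypothetical, and the thread inspects itself through pthread_self() so that the thread ID is guaranteed to be valid at the time of the query; a std::promise owned by the new thread carries the answer back.

```cpp
#include <pthread.h>
#include <cassert>
#include <future>

// Reports whether the calling thread is detached (glibc extension).
static bool calling_thread_is_detached() {
    pthread_attr_t attr;
    if (pthread_getattr_np(pthread_self(), &attr) != 0) return false;
    int state = PTHREAD_CREATE_JOINABLE;
    pthread_attr_getdetachstate(&attr, &state);
    pthread_attr_destroy(&attr);
    return state == PTHREAD_CREATE_DETACHED;
}

// Thread entry point: publishes its own detach state, then frees the promise it owns.
static void* report_state(void* arg) {
    std::promise<bool>* p = static_cast<std::promise<bool>*>(arg);
    assert(p != nullptr);
    p->set_value(calling_thread_is_detached());
    delete p;
    return nullptr;
}

// Creates a thread with the requested detach state and returns what it reports.
bool new_thread_reports_detached(bool detached) {
    std::promise<bool>* p = new std::promise<bool>();
    std::future<bool> result = p->get_future();

    pthread_attr_t attrs;
    pthread_attr_init(&attrs);
    pthread_attr_setdetachstate(
        &attrs, detached ? PTHREAD_CREATE_DETACHED : PTHREAD_CREATE_JOINABLE);

    pthread_t tid;
    int rc = pthread_create(&tid, &attrs, report_state, p);
    pthread_attr_destroy(&attrs);
    if (rc != 0) {
        delete p;  // the thread never started, so ownership stays here
        return false;
    }

    bool value = result.get();
    if (!detached) pthread_join(tid, nullptr);  // only joinable threads are joined
    return value;
}
```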

Threads creating process in infinite loop

In my application a thread runs a while(1){} loop, so the thread terminates only when my app is terminated by the user.
Is it safe to do it like this? I am using while(1){} because my app continuously monitors devices on the system.
After some time I am getting "(R6016) not enough space for thread data" on ffmpeg.
I read this but did not get solution of my problem:
http://support.microsoft.com/kb/126709
Thread description:
The thread uses ffmpeg and the handle utility (http://technet.microsoft.com/en-us/sysinternals/bb896655.aspx) within the while(1){} loop.
ffmpeg and handle are run through QProcess, which I delete after the process ends.
The while(1){} loop waits for 5 seconds using msleep(5000).

This is not safe.
Change while (1) to while (!stopCondition) and have stopCondition change to TRUE when exiting. The main thread should wait for all other threads to finish before exiting.
Note: stopCondition is defined as volatile int stopCondition.
When the main thread exits, a cleanup process starts:
- global destructors are called (C++).
- C runtime library starts to shut down, releasing all memory allocated with malloc, unloading dynamic libraries and other resources.
A thread that depends on the C runtime being functional, or that runs code from a shared/dynamic library, will crash. If that thread was doing something important like writing to a file, the file will be corrupted. Maybe in your case things are not so bad, but seeing an application crash doesn't look good, to say the least.
This is not the full story, but I think it makes my point.
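As a sketch of the stop-condition loop in portable C++: the DeviceMonitor class and its poll counter are hypothetical stand-ins for the device-monitoring thread, and the flag is a std::atomic<bool> rather than the volatile int mentioned above, because volatile alone gives no inter-thread synchronization guarantees in standard C++. The 10 ms sleep stands in for the 5-second msleep from the question.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical monitor that polls until it is told to stop.
class DeviceMonitor {
public:
    ~DeviceMonitor() { stop(); }  // never let main() finish with the thread running

    void start() {
        assert(!worker_.joinable());  // start() must not be called twice
        worker_ = std::thread([this] { run(); });
    }

    // Sets the stop condition and waits for the thread to finish.
    void stop() {
        stop_.store(true);
        if (worker_.joinable()) worker_.join();
    }

    int polls() const { return polls_.load(); }

private:
    void run() {
        while (!stop_.load()) {  // replaces while(1)
            ++polls_;            // stand-in for querying the devices
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }
    }

    std::atomic<bool> stop_{false};
    std::atomic<int> polls_{0};
    std::thread worker_;
};
```

With a long sleep such as 5 seconds, stop() may block for up to that long; waiting on a condition variable with a timeout instead of sleeping is one way to make the loop wake up promptly.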

How to make a new thread and terminate it after some time has elapsed?

The deal is:
I want to create a thread that works similarly to executing a new .exe in Windows, so that if that program (new thread) crashes or goes into an infinite loop, it will be killed gracefully (after the time limit is exceeded or when it crashes) and all resources freed properly.
And when that thread has succeeded, I would like to be able to modify some global variable which could have some data in it, such as a list of files for example. That is why I can't just execute an external executable from Windows, since I can't access the variables inside the function that got executed in the new thread.
Edit: Clarified the problem a lot more.

The thread will already run after calling CreateThread.
WaitForSingleObject is not necessary (unless you really want to wait for the thread to finish); but it will not "force-quit" the thread; in fact, force-quitting - even if it might be possible - is never such a good idea; you might e.g. leave resources opened or otherwise leave your application in a state which is no good.

A thread is not some sort of magical object that can be made to do things. It is a separate path of execution through your code. Your code cannot be made to jump arbitrarily around its codebase unless you specifically program it to do so. And even then, it can only be done within the rules of C++ (ie: calling functions).
You cannot kill a thread because killing a thread would utterly wreck some of the most fundamental assumptions a programmer makes. You would now have to take into account the possibility that the next line doesn't execute for reasons that you can neither predict nor prevent.
This isn't like exception handling, where C++ specifically requires destructors to be called, and you have the ability to catch exceptions and do special cleanup. You're talking about executing one piece of code, then suddenly ending the execution of that entire call-stack. That's not going to work.
The reason that web browsers moved from a "thread-per-tab" to "process-per-tab" model is exactly this: because processes can be terminated without leaving the other processes in an unknown state. What you need is to use processes instead of threads.
When the process finishes and sets its data, you need to use some inter-process communication system to read that data (I like Boost.Interprocess myself). It won't look like a regular C++ global variable, but you shouldn't have a problem with reading it. This way, you can effectively kill the process if it's taking too long, and your program will remain in a reasonable state.

Well, that's what WaitForSingleObject does. It blocks until the object does something (in case of a thread it waits until the thread exits or the timeout elapses). What you need is
HANDLE thread = CreateThread(0, 0, do_stuff, NULL, 0, 0);
// rest of the code, which runs in parallel with your new thread
WaitForSingleObject(thread, 4000); // wait 4 seconds or for the other thread to exit

If you want your worker thread to shut down after a period of time has elapsed, the best way to do that is to have the thread itself monitor the elapsed time in some way and then exit when the time is up.
Another way to do this is to monitor the elapsed time in the main thread or even a third, monitor type thread. When the time has elapsed, set an event. Your worker thread could wait for this event in its main loop, and then exit when it has been raised. These kinds of events, which are used to signal the thread to kill itself, are sometimes called "death events." (Or at least, I call them that.)
Yet another way to do this is to queue a user APC to the worker thread, which needs to be in an alertable wait state. The APC can then set some internal state variable which will trigger the death sequence in the thread when it resumes.
There is another method which I hesitate even mentioning, because it should only be used in extremely dire circumstances. You can kill the thread. This is a very dangerous method akin to turning off your sink by detonating an atomic bomb. You get the sink turned off, but there could be other unintended consequences as well. Please don't do this unless you know exactly what you're doing and why.
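The first option, where the worker watches the clock itself, can be sketched in portable C++ without any Windows API. The names run_with_deadline and WorkResult are hypothetical, and the work is assumed to be divisible into steps between which it is safe to stop.

```cpp
#include <cassert>
#include <chrono>

// Outcome of a time-limited run (hypothetical type).
struct WorkResult {
    int steps_done;
    bool timed_out;
};

// Runs up to total_steps units of work, checking the deadline before each one.
// The worker ends itself cleanly instead of being killed from outside.
template <class Step>
WorkResult run_with_deadline(int total_steps,
                             std::chrono::milliseconds budget,
                             Step step) {
    assert(total_steps >= 0);
    const auto deadline = std::chrono::steady_clock::now() + budget;
    for (int i = 0; i < total_steps; ++i) {
        if (std::chrono::steady_clock::now() >= deadline)
            return {i, true};  // time is up: stop at a safe point
        step(i);
    }
    return {total_steps, false};
}
```

A thread function would call run_with_deadline with its real work as the step callback and return normally when it reports timed_out, so destructors and cleanup code still run. A single step that never returns is not covered by this approach.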

Remove the call to WaitForSingleObject. That causes your parent thread to wait.

Remove the WaitForSingleObject call?

Stopping an MFC thread

I understand the problem with just killing the thread directly (via AfxEndThread or other means), and I've seen the examples using CEvent objects to signal the thread and then having the thread clean itself up. The problem I have is that using CEvent to signal the thread seems to require a loop where you check to see if the thread is signaled at the end of the loop. The problem is, my thread doesn't loop. It just runs, and the processing could take a while (which is why I'd like to be able to stop it).
Also, if I were to just kill the thread, I realize that anything I've allocated will not have a chance to clean itself up. It seems to me like any locals I've been using that happen to have put stuff on the heap will also not be able to clean themselves up. Is this the case?

There is no secret magic knowledge here.
Just check the event object periodically throughout the function code, where you deem it is safe to exit.

Does your thread ever exit? If so, you could set an event in the thread at exit and have the main process wait for that event via WaitForSingleObject. This is best done with a timeout so the main process doesn't appear to lock up when it's closing. At the timeout event, kill the thread via AfxKillThread. You'll have to determine what a reasonable timeout is, though.
Since you don't loop in the thread, this seems to me to be the only way to do this. Of course, you could do something like set a boolean flag in the main process and have the thread periodically check this flag, but then your thread code will be littered with "if(!canRun) return;" type code.
If the thread never exits, then AfxKillThread/AfxTerminateThread is the only way to stop the thread.

Locals would be placed on the stack and, hence, WOULD be freed on forcing the thread shut (I think). Destructors won't get called though and any critical sections the thread holds will not get released.
If the thread is ONLY doing things with simple data types on the stack, however, it IS a safe thing to be doing.
