Detect blocking call in multithreaded C++ application

In a C++ application I have 2 threads (thread1 and thread2) from the same class. They run a very fast loop, and nothing should make any blocking calls inside the loop:
while (!end) {
//dostuff, no blocking!
}
cout << "ended" << endl;
There is a bug in the application: when I run only one thread at a time and set its end property, it exits the loop successfully.
However, if I run both threads, sometimes one of the threads is not able to break out of the loop (in spite of having its end property set).
The loop itself is quite big (a few hundred lines). I can put a (conditional) breakpoint into it, but when I'm stepping, I lose the functionality (as the thread should run fast), so even if I found a line that blocks, it might not be the one that blocks in a normal run.
So, my question: is there any option in gdb for having a breakpoint which behaves like a watchdog?
I.e., it should break the thread if it is not hit within a certain time, so I can check which line causes the trouble.

I.e., it should break the thread if it is not hit within a certain time, so I can check which line causes the trouble.
You don't need this functionality to check which line causes trouble (and the functionality doesn't exist).
Wait for the "not exiting" condition to happen, and hit Control-C. GDB will stop all threads of the process and give you the (gdb) prompt. At that point, issue the thread apply all where command, and you will see where the "not exiting" thread is stuck.
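If waiting at the keyboard is impractical, the watchdog idea from the question can be approximated inside the program itself. The following is only a sketch under assumptions, not a GDB feature: Heartbeat and watchdog are hypothetical names, the loop is assumed to call beat() once per iteration, and SIGTRAP is assumed to be available (POSIX). When the program runs under GDB, the raised SIGTRAP stops it much as Control-C would, and thread apply all where then shows the stuck thread.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <csignal>
#include <thread>

// Hypothetical helper: the monitored loop calls beat() once per iteration.
class Heartbeat {
public:
    void beat() { count_.fetch_add(1, std::memory_order_relaxed); }

    // Sleeps for `window` and returns true if no beat arrived meanwhile.
    bool stalled_for(std::chrono::milliseconds window) const {
        assert(window.count() > 0);
        const unsigned long long before = count_.load(std::memory_order_relaxed);
        std::this_thread::sleep_for(window);
        return count_.load(std::memory_order_relaxed) == before;
    }

private:
    std::atomic<unsigned long long> count_{0};
};

// Runs in its own thread until `done` is set. Raises SIGTRAP when the
// monitored loop has been silent for a whole window.
// Assumption: without a debugger attached, SIGTRAP terminates the process.
void watchdog(const Heartbeat& hb, const std::atomic<bool>& done,
              std::chrono::milliseconds window) {
    while (!done.load()) {
        if (hb.stalled_for(window) && !done.load()) {
            std::raise(SIGTRAP);
        }
    }
}
```

In the loop from the question, hb.beat() would sit next to the //dostuff line, and the watchdog could be started with something like std::thread(watchdog, std::cref(hb), std::cref(done), std::chrono::milliseconds(500)). The 500 ms window is a guess; it has to be much longer than one healthy iteration, and done must be set before the loop ends legitimately.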

Related

Multithreaded app has problems re-opening files - Windows taking too long to close?

I have a multithreaded app that opens a few files (read-only) and does a bunch of calculations based on data in those files. Each thread then generates some output files.
The code runs fine so long as I create the threads, delete them, and then the app exits. If, however, I put the thread creation/deletion into a subroutine and call it several times, then the threads have problems when they try to re-open the input files. I have an if(inFile==NULL) check within each thread, and sometimes that gets triggered, but sometimes it just crashes. Regardless, each thread has an fclose() for each file and the threads are properly terminated, so the files should always be closed before the threads are recreated.
I can create multiple threads that can open the same input files and that works fine. But if I close those threads and re-create new ones (e.g. by repeatedly calling a subroutine to create the threads) then I get errors when the threads try to re-open the input files.
The crashes are not predictable. Sometimes I can loop through the thread creation/deletion process several times, sometimes it crashes on the second time, sometimes the fourth, etc.
The only thing I can think of is that the OS (Windows 7) takes too long to close the file sometimes, so the next thread is spawned before the file is closed and then there's some kind of error due to the fact that the OS is trying to close the file while the thread is trying to open it. It seems to me that that could trigger the if(inFile==NULL) condition.
But sometimes, when the if(inFile==NULL) condition is not triggered, I still get gibberish read in from the input file. So it thinks it has a good file pointer, but it clearly does not.
I realize this is probably a tough question to answer but I'm stumped. So maybe someone has an idea.
Thanks in advance,
rgames

How to debug a rare deadlock?

I'm trying to debug a custom thread pool implementation that rarely deadlocks. So I cannot simply use a debugger like gdb, because I would have to launch the debugger something like 100 times before hitting a deadlock.
Currently, I'm running the thread pool test in an infinite loop in a shell script, but that means I cannot see variables and so on. I'm trying to std::cout data, but that slows down the threads and reduces the likelihood of deadlocks, meaning that I can wait for about an hour with my infinite loop before getting messages. Then I don't find the error, and I need more messages, which means waiting one more hour...
How can I efficiently debug the program so that it restarts over and over until it deadlocks? (Or maybe I should open another question with all the code for some help?)
Thank you in advance!
Bonus question: how can I check that everything goes fine with a std::condition_variable? You cannot really tell which threads are asleep or whether a race condition occurs on the wait condition.

There are 2 basic ways:
Automate running the program under the debugger. Using gdb program -ex 'run <args>' -ex 'quit' should run the program under the debugger and then quit. If the program is still alive in one form or another (it segfaulted, or you interrupted it manually), you will be asked for confirmation.
Attach the debugger after reproducing the deadlock. For example, gdb can be run as gdb <program> <pid> to attach to a running program: just wait for the deadlock and attach then. This is especially useful when an attached debugger changes the timing and you can no longer reproduce the bug.
This way you can just run it in a loop and wait for the result while you drink coffee. BTW, I find the second option easier.

If this is some kind of homework, restarting again and again with more debug output is a reasonable approach.
If somebody pays money for every hour you wait, they might prefer to invest in software that supports replay-based debugging, that is, software that records everything a program does, every instruction, and allows you to replay it again and again, debugging back and forth. Thus, instead of adding more debug output, you record a session during which a deadlock happens, and then start debugging just before the deadlock happened. You can step back and forth as often as you want, until you finally find the culprit.
The software mentioned in the link actually supports Linux and multithreading.

Mozilla rr open source replay based debugging
https://github.com/mozilla/rr
Hans mentioned replay based debugging, but there is a specific open source implementation that is worth mentioning: Mozilla rr.
First you do a record run, and then you can replay the exact same run as many times as you want, and observe it in GDB, and it preserves everything, including input / output and thread ordering.
The official website mentions:
rr's original motivation was to make debugging of intermittent failures easier
Furthermore, rr enables GDB reverse debugging commands such as reverse-next to go to the previous line, which makes it much easier to find the root cause of the problem.
Here is a minimal example of rr in action: How to go to the previous line in GDB?

You can run your test case under GDB in a loop using the command shown in https://stackoverflow.com/a/8657833/341065: gdb --eval-command=run --eval-command=quit --args ./a.out.
I have used this myself: (while gdb --eval-command=run --eval-command=quit --args ./thread_testU ; do echo . ; done).
Once it deadlocks and does not exit, you can just interrupt it by CTRL+C to enter into the debugger.

A quick and easy way to debug deadlocks is to keep some global variables that you update at the points you want to trace, and print them in a signal handler. You can use SIGINT (sent when you interrupt with Ctrl+C) or SIGTERM (sent when you kill the program):
#include <csignal>
#include <cstdlib>
#include <iostream>
int dbg; // last debug value recorded
void dbg_sighandler(int); // defined below
int multithreaded_function()
{
    std::signal(SIGINT, dbg_sighandler);
    ...
    dbg = someVar;
    ...
}
// Quick-and-dirty: std::cout and std::exit are not async-signal-safe.
void dbg_sighandler(int)
{
    std::cout << dbg << std::endl;
    std::exit(EXIT_FAILURE);
}
That way you see the state of all your debug variables when you interrupt the program with Ctrl+C.
In addition you can run it in a shell while loop:
$> while [ $? -eq 0 ]
do
./my_program
done
which will run your program forever until it fails ($? is the exit status of your program and you exit with EXIT_FAILURE in your signal handler).
It worked well for me, especially for finding out how many threads had passed before and after which locks.
It is quite rustic, but you do not need any extra tool and it is fast to implement.
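A variant of the handler above that sticks to async-signal-safe calls is sketched below, assuming a POSIX system (write and _exit come from <unistd.h>). The names dbg_stage, format_decimal and install_dbg_handler are hypothetical, and the debug value is assumed to be non-negative.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>
#include <cstddef>
#include <cstdlib>
#include <unistd.h>  // write, _exit (POSIX assumption)

// Lock-free atomics may be read from a signal handler.
static_assert(ATOMIC_INT_LOCK_FREE == 2, "atomic<int> must be lock-free");
std::atomic<int> dbg_stage{0};  // hypothetical debug variable set by the threads

// Writes `value` in decimal followed by '\n' into buf and returns the length.
// Uses no heap, locale or stdio, so it can run inside a signal handler.
std::size_t format_decimal(unsigned value, char* buf, std::size_t cap) {
    assert(buf != nullptr && cap > 0);
    char tmp[16];
    std::size_t n = 0;
    do {
        tmp[n++] = static_cast<char>('0' + value % 10);
        value /= 10;
    } while (value != 0);
    std::size_t len = 0;
    while (n > 0 && len + 1 < cap) {
        buf[len++] = tmp[--n];  // digits were produced in reverse order
    }
    if (len < cap) {
        buf[len++] = '\n';
    }
    return len;
}

extern "C" void dbg_sighandler(int) {
    char buf[32];
    const std::size_t len =
        format_decimal(static_cast<unsigned>(dbg_stage.load()), buf, sizeof buf);
    if (write(STDERR_FILENO, buf, len) < 0) {
        // Nothing useful can be done about a failed write here.
    }
    _exit(EXIT_FAILURE);  // non-zero status still ends the shell loop
}

void install_dbg_handler() { std::signal(SIGINT, dbg_sighandler); }
```

The worker threads would then record progress with dbg_stage.store(n) at the points of interest, after install_dbg_handler() has been called once at startup.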

Couldn't terminate thread (error 6)

We have a huge, complex wxWidgets application written in C++. I added an extra background thread. When the user clicks "go", the thread starts. When they click "stop", the thread stops. For reasons beyond my comprehension, clicking "stop" also causes the following message to be displayed:
Can not wait for thread termination (error 6: the handle is invalid.)
Couldn't terminate thread (error 6: the handle is invalid.)
Why the hell is this happening?? And more importantly, how do I make this go away immediately?
The thread is started here:
_worker = new WorkerThread();
_worker->Create();
_worker->Run();
I know for a fact that the thread is running, because I can see the disk files it's writing.
The thread is stopped here:
if (_worker)
{
_worker->Delete();
_worker = NULL;
}
The WorkerThread class only overrides Entry(). It is definitely a detachable thread.
The documentation is full of dire warnings about how a detachable thread can delete itself at any moment, and everything must always be wrapped in a critical section. But my worker thread runs forever, until I tell it to stop. I can't see why I would need a critical section for anything.
Is the thread taking too long to stop? Is that the problem? (It only checks TestDestroy() once per second. Is that too slow?)
I really can't figure out how the hell to solve this.

You may "make it go away" by using wxLogNull, as with any other messages generated by wxWidgets. You should not do this, however, as you seem to have a real bug somewhere in your code: the thread handle obviously should not be invalid, and if it is, something clearly isn't going the way you think it is. By sweeping the error under the carpet you all but guarantee that it will reappear in a different guise at the worst possible moment, typically on a client's machine where you will be unable to debug it. Better to fix it now.

command to suspend a thread with GDB

I'm a little new to GDB. I'm hoping someone can help me with something that should be quite simple, I've used Google/docs but I'm just missing something.
What is the 'normal' way folks debug threaded apps with GDB? I'm using pthreads. I'm wanting to watch only one thread - the two options I see are
a) tell the debugger somehow to attach to a particular thread, such that stepping won't result in jumping threads on each context switch
b) tell the debugger to suspend/free any 'uninteresting' threads
I'd prefer to go route b), but reading the help for GDB I don't see a command for this. Any tips?

See documentation for set scheduler-locking on.
Beware: if you suspend other threads, and if one of them holds a lock, and if your interesting thread needs that lock at some point while stepping, you'll deadlock.
What is the 'normal' way folks debug threaded apps
You can never debug thread correctness, you can only design it in. In my experience, most debugging of threaded apps consists of putting in assertions and examining the state of the world when one of the assertions is violated.
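As an illustration of the assertion approach, here is a minimal sketch of one common kind of threading assertion: checking that the caller really holds the lock that protects some data. CheckedMutex and Counter are hypothetical names invented for this example, and the checks disappear when NDEBUG is defined.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical wrapper: a mutex that remembers which thread holds it.
class CheckedMutex {
public:
    void lock() {
        mutex_.lock();
        owner_.store(std::this_thread::get_id());
    }

    void unlock() {
        assert(held_by_caller() && "unlock() by a thread that does not hold the lock");
        owner_.store(std::thread::id{});
        mutex_.unlock();
    }

    // True only while the calling thread is between lock() and unlock().
    bool held_by_caller() const {
        return owner_.load() == std::this_thread::get_id();
    }

private:
    std::mutex mutex_;
    std::atomic<std::thread::id> owner_{std::thread::id{}};
};

// Hypothetical shared state protected by `guard`.
struct Counter {
    CheckedMutex guard;
    int value = 0;

    // Precondition, enforced by the assertion: the caller holds `guard`.
    void increment_locked() {
        assert(guard.held_by_caller() && "caller must hold guard");
        ++value;
    }
};
```

Because CheckedMutex provides lock() and unlock(), it works with std::lock_guard<CheckedMutex>. When an assertion fires under GDB, the process stops in abort(), and the state of every thread can be examined at that point.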

First, you need to enable debugger behavior that is comfortable for multithreaded debugging, using the following commands. No idea why it's disabled by default.
set target-async 1
set non-stop on
I personally put those commands into my .gdbinit file. They make every command apply only to the currently focused thread. Note: the thread might be running, so you have to pause it.
To see the focused thread, run the thread command.
To switch to another thread, append the number of the thread, e.g. thread 2.
To see all threads with their numbers, issue info thread.
To apply a command to a particular thread, issue something like thread apply threadnum command. E.g. thread apply 4 bt will apply the backtrace command to thread number 4. thread apply all continue continues all paused threads.
There is a small problem though: many commands need the thread to be paused. I know a few ways of doing that:
interrupt command: in non-stop mode it interrupts execution of the focused thread only; interrupt -a stops all threads.
Setting a breakpoint somewhere. Note that you may set a breakpoint for a particular thread, so that other threads will ignore it, like break linenum thread threadnum. E.g. break 25 thread 4.
You may also find it very useful that you can set a list of commands to be executed when a breakpoint is hit, via the commands command, so that you can, for example, quickly print interesting values and then continue execution.

Strange issue running infinite while loop in EXE

I am facing a strange issue on Windows CE:
Running 3 EXEs
1) The first exe does some work every 8 minutes unless the exit event is signaled.
2) The second exe does some work every 5 minutes unless the exit event is signaled.
3) The third exe runs a while loop, and in the while loop it does some work at random times.
This while loop continues until the exit event is signaled.
This exit event is a global event and can be signaled by any process.
The problem is:
When I run the first exe, it works fine.
When I run the second exe, it works fine.
When I run the third exe, it works fine.
When I run all the exes, only the third exe runs and no instructions get executed in the first and second.
As soon as the third exe is terminated, the first and second start processing.
Can it be the case that the while loop in the third exe is taking all the CPU cycles?
I haven't tried putting in a Sleep, but I think that might do the trick.
But the OS should give CPU time to all processes...
Any thoughts?

Make the while loop in the third EXE Sleep each time through the loop and see what happens. Even if it doesn't fix this particular problem, it is never good practice to poll with a while loop, and even using Sleep inside a loop is a poor substitute for a proper timer.

On the MSDN, I also read that CE allows for (less than) 32 processes simultaneously. (However, the context switches are lightning fast...). Some are already taken by system services.

(From memory) Processes in Windows CE run until completion if there are no higher-priority processes running, or they run for their time slice (100 ms) if there are other processes of equal priority running. I'm not sure whether Windows CE gives the process with the active/foreground window a small priority boost (as desktop Windows does).
In your situation the first two processes are starved of processor time so they never run until the third process exits. Some ways to solve this are:
Make the third process wait/block on some multi-process primitive (mutex, semaphore, etc.) with a short timeout, using WaitForMultipleObjects, WaitForSingleObject, etc.
Make the third process wait using a call to Sleep every time around the processing loop.
Boost the priority of the other processes so that when they need to run they will interrupt the third process and actually run. I would probably give the least frequently run process the highest priority of the three.
The other thing to check is that the third process does actually complete its tasks in time, and does not peg the CPU trying to do its thing normally.
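The first suggestion (block on the exit event with a timeout instead of spinning) can be sketched in portable standard C++. This is an in-process analogue only, under stated assumptions: ExitEvent and run_until_exit are hypothetical names, and the cross-process case on Windows CE would use a named event with WaitForSingleObject as described above.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical in-process stand-in for a manual-reset exit event.
class ExitEvent {
public:
    void signal() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        cv_.notify_all();
    }

    // Blocks for at most `timeout` without burning CPU.
    // Returns true if the event has been signaled, false on timeout.
    bool wait_for(std::chrono::milliseconds timeout) {
        assert(timeout.count() >= 0);
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this] { return signaled_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Calls work() once per `period` until the exit event is signaled.
// Returns how many times work() ran.
template <typename Work>
int run_until_exit(ExitEvent& exit_event, std::chrono::milliseconds period,
                   Work work) {
    int iterations = 0;
    while (!exit_event.wait_for(period)) {
        work();
        ++iterations;
    }
    return iterations;
}
```

The loop sleeps inside wait_for between iterations, so other threads and processes get CPU time, and it still reacts to the exit signal without waiting for the full period.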

Yeah, I think that is not a good solution. I may try to use a timer and see the results.
