A very odd one: I have a Qt 4 embedded app that runs on the framebuffer; it normally runs from inittab as the only UI on the box. There is an option to put the machine to sleep: I do the normal thing and open /sys/power/state, write "mem" and close it (using QFile). Very straightforward, and it works fine EXCEPT the first time the app runs after booting. If it sleeps then, it receives SIGUSR2 and just hangs forever with a blank screen. The hang occurs after resume.
But, if I manually kill it and run it again .. sleep works fine again. Note that it must do the failed sleep attempt and be killed - whereafter all is peachy every time it runs, SIGUSR2 never shows up again.
I have already tried to trap the signal - doesn't trap. No idea why - except that I know that pthreads uses SIGUSR2 ..
Stumped. Ideas? Clues?
[edit] I found that if I fork() and write to /sys/power/state in the child and exit it sort of solves the problem, but it doesn't solve the mystery..
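The fork() workaround described in this edit could look roughly like the sketch below. It is not the original code (which used QFile): write_in_child is a hypothetical helper, and having the parent wait for the child is an assumption. The child uses only raw system calls and _exit(), which is the usual precaution after fork() in a process that has threads.

```cpp
#include <cstdio>
#include <string>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: writes `text` to the existing file `path` from a
// forked child, then waits for the child to finish.
// Returns the child's exit status (0 on success) or -1 if fork/wait failed.
// For suspend, path would be "/sys/power/state" and text would be "mem".
int write_in_child(const std::string& path, const std::string& text)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      // fork failed

    if (pid == 0) {
        // Child: raw syscalls only, and _exit() so no atexit handlers run.
        int fd = open(path.c_str(), O_WRONLY);
        if (fd < 0)
            _exit(1);                   // could not open the file
        ssize_t n = write(fd, text.data(), text.size());
        close(fd);
        _exit(n == static_cast<ssize_t>(text.size()) ? 0 : 2);
    }

    // Parent: block until the child has exited (assumption: we want to wait).
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling write_in_child("/sys/power/state", "mem") would need root privileges. The sketch only isolates the write in a separate process; it does not explain where SIGUSR2 comes from.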
[edit 2] I subsequently found that the child is in fact still hanging when the machine is shut down (causing it to hang forever without shutting down), although the ugly hack just mentioned did fix the hang coming out of suspend. I have not figured out what is happening, but I finally solved it with a script/daemon: in a while loop it checks a file in /tmp for an action, either halts or suspends, and restarts the app afterwards. Not pretty, but it works.
And still the mystery of the SIGUSR2 hang remains ..
Related
I have been having this obscure problem for 2 days: I created a launch-at-boot application in C++ on a Debian system, which worked flawlessly until I integrated some multithreading elements.
There are only 2 threads (1 main and 1 child)
I included -lpthread and -pthread in the makefile
I tried both the /.config/autostart and the .desktop file methods (same result)
The program is launched with sudo
There is no error/crash anywhere, the main thread works OK, but the child thread runs 1 iteration only, then stops for some reason
I even tried to add some sleep in the lxsession boot sequence
If I launch the same command line as in the autostart file in a terminal (sudo or not), it works perfectly.
It's been 2 days and I just have NO CLUE!
If someone has experienced this before or can find some logic in it, I'll be ever grateful.
It appears to me that you simply have ... a bug in your new logic. You have made an error in the design of your multi-threading logic, such that the child thread only runs one iteration. (Or, much more likely, stalls in an infinite wait. It waits for an event that is never signaled, a semaphore that is never raised, a queue that runs dry and is never filled, and so on.)
We can help you further if you post excerpts of the code in question ... only illustrating how the child thread is launched and how it interacts with the parent. (Condition-variables, semaphores, and so-forth, which is probably where the crux of your error lies.)
I would suggest that "all the other stuff is irrelevant." You don't need "a sleep in the boot-sequence" (if the sequence waits for your program to complete, and if it needs to). I suggest that it seems to me that you simply have ... a bug in your new code which introduces multi-threading.
And you might wish to contemplate whether multi-threading is advantageous, given that you had a non-threaded version of the same thing that worked properly. If the processing that is to be done used to be done (successfully) by a single thread, such processing might or might not be more-advantageously processed by "n threads." Should you find-and-fix this bug, or is it just as well to abandon the change and revert back to what worked? Only you can decide that ...
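To make the "waits for an event that is never signaled" failure mode concrete, here is a minimal sketch of a parent/child hand-off that avoids it. It is not the asker's code, which was never posted: WorkQueue and sum_in_child are hypothetical names. The wait uses a predicate, so a notification sent before the child starts waiting is not lost, and close() gives the child a way out when no more work will arrive.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical queue shared between a main thread and one child thread.
class WorkQueue {
public:
    void push(int item) {
        {
            std::lock_guard<std::mutex> lock(m_);
            items_.push(item);
        }
        cv_.notify_one();
    }

    // Tells the consumer that no more items will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until an item is available or the queue is closed.
    // Returns false only when the queue is closed and fully drained.
    bool pop(int& out) {
        std::unique_lock<std::mutex> lock(m_);
        // The predicate is re-checked under the lock, so spurious wakeups
        // and notifications sent before the wait are both handled.
        cv_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty())
            return false;
        out = items_.front();
        items_.pop();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<int> items_;
    bool closed_ = false;
};

// The child thread sums everything the main thread pushes (1..n).
int sum_in_child(int n) {
    WorkQueue q;
    int sum = 0;
    std::thread child([&] {
        int v;
        while (q.pop(v))
            sum += v;
    });
    for (int i = 1; i <= n; ++i)
        q.push(i);
    q.close();
    child.join();   // after join, reading sum from this thread is safe
    return sum;
}
```

If push() set no shared state and pop() called wait() without a predicate, a notify that arrived before the wait would be missed and the child would stall after its first iteration, which matches the symptom described in the question.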
Thank you all for your suggestions.
I found a "fix": running the startup program in a terminal ('#lxterminal -e url/to/program &' in the autostart of lxsession) instead of in the background seems to fix it SOMEHOW. There is no GUI though ... it is a service.
The multithreaded logic isn't at fault here, this is not my first shot at it, and I really want to keep this feature (@Mike Robinson).
I will reconsider the use of sudo as suggested as well, which seems sketchy all things considered. It might get it running in the background. Thanks @datenwolf.
I'm trying to debug a custom thread pool implementation that deadlocks only rarely. So I cannot realistically use a debugger like gdb, because I would have to launch the debugger something like 100 times before hitting a deadlock.
Currently, I'm running the thread pool test in an infinite loop in a shell script, but that means I cannot see variables and so on. I'm trying to print data with std::cout, but that slows down the threads and reduces the chance of a deadlock, meaning that I can wait an hour with my infinite loop before getting messages. Then the messages don't show the error, and I need more messages, which means waiting one more hour...
How can I efficiently debug the program so that it restarts over and over until it deadlocks? (Or should I open another question with all the code for some help?)
Thank you in advance!
Bonus question: how do I check that everything goes fine with a std::condition_variable? You cannot really tell which threads are asleep or whether a race condition occurs on the wait condition.
There are 2 basic ways:
Automate the running of program under debugger. Using gdb program -ex 'run <args>' -ex 'quit' should run the program under debugger and then quit. If the program is still alive in one form or another (segfault, or you broke it manually) you will be asked for confirmation.
Attach the debugger after reproducing the deadlock. For example gdb can be run as gdb <program> <pid> to attach to running program - just wait for deadlock and attach then. This is especially useful when attached debugger causes timing to be changed and you can no longer repro the bug.
In this way you can just run it in loop and wait for result while you drink coffee. BTW - I find the second option easier.
If this is some kind of homework - restarting again and again with more debug will be a reasonable approach.
If somebody pays money for every hour you wait, they might prefer to invest in a software that supports replay-based debugging, that is, a software that records everything a program does, every instruction, and allows you to replay it again and again, debugging back and forth. Thus instead of adding more debug, you record a session during which a deadlock happens, and then start debugging just before the deadlock happened. You can step back and forth as often as you want, until you finally found the culprit.
The software mentioned in the link actually supports Linux and multithreading.
Mozilla rr open source replay based debugging
https://github.com/mozilla/rr
Hans mentioned replay based debugging, but there is a specific open source implementation that is worth mentioning: Mozilla rr.
First you do a record run, and then you can replay the exact same run as many times as you want, and observe it in GDB, and it preserves everything, including input / output and thread ordering.
The official website mentions:
rr's original motivation was to make debugging of intermittent failures easier
Furthermore, rr enables GDB reverse debugging commands such as reverse-next to go to the previous line, which makes it much easier to find the root cause of the problem.
Here is a minimal example of rr in action: How to go to the previous line in GDB?
You can run your test case under GDB in a loop using the command shown in https://stackoverflow.com/a/8657833/341065: gdb --eval-command=run --eval-command=quit --args ./a.out.
I have used this myself: (while gdb --eval-command=run --eval-command=quit --args ./thread_testU ; do echo . ; done).
Once it deadlocks and does not exit, you can just interrupt it by CTRL+C to enter into the debugger.
An easy quick debug to find deadlocks is to have some global variables that you modify where you want to debug, and then print it in a signal handler. You can use SIGINT (sent when you interrupt with ctrl+c) or SIGTERM (sent when you kill the program):
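#include <csignal>
#include <cstdlib>
#include <iostream>

int dbg;  // debug value, updated at the points you want to inspect

// Prints the debug state when the program is interrupted (Ctrl+C).
// Note: std::cout is not async-signal-safe; acceptable only as a quick hack.
void dbg_sighandler(int)
{
    std::cout << dbg << std::endl;
    std::exit(EXIT_FAILURE);
}

int multithreaded_function()
{
    std::signal(SIGINT, dbg_sighandler);  // install the handler before the work starts
    ...
    dbg = someVar;
    ...
}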
Like that you just see the state of all your debug variables when you interrupt the program with ctrl+c.
In addition you can run it in a shell while loop:
$> while [ $? -eq 0 ]
do
./my_program
done
which will run your program forever until it fails ($? is the exit status of your program and you exit with EXIT_FAILURE in your signal handler).
It worked well for me, especially for finding out how many threads had passed before and after which locks.
It is quite rustic, but you do not need any extra tool and it is fast to implement.
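One caveat with the handler above is that std::cout and std::exit are not async-signal-safe, so the handler itself can misbehave if the signal arrives at an unlucky moment. Below is a hedged variant of the same idea that sticks to write() and _exit(). The names dbg_stage, format_int and install_dbg_handler are made up for this sketch, and it assumes std::atomic<int> is lock-free on the target platform.

```cpp
#include <atomic>
#include <csignal>
#include <cstddef>
#include <cstdlib>
#include <unistd.h>

// Hypothetical debug variable: store to it wherever you want a checkpoint.
std::atomic<int> dbg_stage{0};

// Writes `value` (negative values are printed as 0) as decimal digits
// followed by '\n'. `buf` must hold at least 12 bytes.
// Returns the number of bytes written. No allocation, no stdio.
std::size_t format_int(int value, char* buf)
{
    unsigned v = value < 0 ? 0u : static_cast<unsigned>(value);
    char tmp[10];                       // an int has at most 10 digits
    std::size_t n = 0;
    do {
        tmp[n++] = static_cast<char>('0' + v % 10);
        v /= 10;
    } while (v != 0);

    std::size_t len = 0;
    while (n > 0)
        buf[len++] = tmp[--n];          // digits were produced in reverse
    buf[len++] = '\n';
    return len;
}

// Signal handler: dump the checkpoint to stderr and exit with failure.
extern "C" void dbg_sighandler(int)
{
    char buf[12];
    std::size_t len = format_int(dbg_stage.load(), buf);
    ssize_t ignored = write(STDERR_FILENO, buf, len);
    (void)ignored;
    _exit(EXIT_FAILURE);
}

// Call once at startup, before the threads are launched.
void install_dbg_handler()
{
    std::signal(SIGINT, dbg_sighandler);
}
```

In the threaded code you would then write something like dbg_stage.store(3) just before taking a lock and dbg_stage.store(4) just after, and read the last value printed when you press ctrl+c on a hung run.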
I have written a C++ program and I am executing in the gnome terminal (I am on Ubuntu). I press Ctrl + Z, which suspends the process. Later on, I execute % on the same terminal, which resumes execution.
From what I've read, Ctrl+Z sends a TSTP signal to the process, which tells it to stop execution. But TSTP is polite, in the sense that the process is allowed to continue until it decides it can stop. In my C++ program code, I didn't do anything to explicitly deal with TSTP signals. So, my question is, what things inside my C++ code will continue running in spite of the TSTP signal? For example, if I have a file stream open, will it wait until it is closed? I expect an overall answer, not too deep or covering all the details. I just want an idea of how this happens.
Your program continues running while the SIGTSTP handler executes. Since you haven't set one up, you get the default signal handling behavior, which is for the process to be stopped.
While your process is stopped, it simply isn't scheduled for execution. Files don't get closed, nor is stopping delayed until files get closed (unless done in the signal handler).
This website looks like it has a helpful explanation of how a handler can be installed to perform some tasks and then have the default stopping behavior:
http://man7.org/tlpi/code/online/dist/pgsjc/handling_SIGTSTP.c.html
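The pattern described there (do some work in the handler, then fall back to the default stop) can be sketched as follows. This is a simplified illustration written for this thread, not a copy of the linked file: tstp_handler and install_tstp_handler are names chosen here, and the sketch assumes a single-threaded process, since sigprocmask() is only specified for that case.

```cpp
#include <cerrno>
#include <signal.h>
#include <unistd.h>

// Runs on Ctrl+Z: does a small task, then stops the process for real.
extern "C" void tstp_handler(int)
{
    int saved_errno = errno;            // handlers should preserve errno

    const char msg[] = "Caught SIGTSTP\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;

    // Restore the default action and re-raise, so the process is stopped.
    signal(SIGTSTP, SIG_DFL);
    raise(SIGTSTP);

    // SIGTSTP is blocked while its handler runs; unblocking it delivers
    // the pending signal, and the process stops at this point.
    sigset_t tstp_mask, prev_mask;
    sigemptyset(&tstp_mask);
    sigaddset(&tstp_mask, SIGTSTP);
    sigprocmask(SIG_UNBLOCK, &tstp_mask, &prev_mask);

    // Execution continues here after SIGCONT (for example via fg or %).
    sigprocmask(SIG_SETMASK, &prev_mask, nullptr);

    // Re-install this handler for the next Ctrl+Z.
    struct sigaction sa;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sa.sa_handler = tstp_handler;
    sigaction(SIGTSTP, &sa, nullptr);

    errno = saved_errno;
}

// Installs the handler; returns true on success.
bool install_tstp_handler()
{
    struct sigaction sa;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sa.sa_handler = tstp_handler;
    return sigaction(SIGTSTP, &sa, nullptr) == 0;
}
```

After calling install_tstp_handler() at startup, pressing Ctrl+Z prints the message and then suspends the process as usual; nothing else in the program runs while it is stopped.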
A C++ program of around 5k lines hangs randomly on Linux. My code deals with transmitting and receiving packets through a RAW socket. The code just stops at a random point without any response; not even [ctrl+c] helps, and every time it hangs I have to kill the process.
I tried GDB and the result was the same: it hung, and ctrl+c produced a SIGTERM error message.
Under valgrind the code hung similarly.
How to debug this issue? Is it any kind of system error?
Using the strace command, it was clear that the hang was due to a futex_wait_private issue. The socket read was pushed into a deadlock scenario. Increasing the select timeout value resolved the issue.
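For readers unfamiliar with the select timeout mentioned here, the sketch below shows one way to bound a wait on a descriptor before reading from it. It is not the poster's code: wait_readable is a hypothetical helper and the timeout value is up to the caller.

```cpp
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical helper: waits until `fd` is readable or `timeout_ms` elapses.
// Returns 1 if readable, 0 on timeout, -1 on error (including EINTR).
int wait_readable(int fd, long timeout_ms)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    // select() may modify the timeval, so build a fresh one on every call.
    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int rc = select(fd + 1, &readfds, nullptr, nullptr, &tv);
    if (rc < 0)
        return -1;
    return (rc > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

A receive loop would call wait_readable(sock, timeout_ms) and only call read() or recv() when it returns 1, so that a missing packet results in a timeout the code can handle rather than an indefinite block.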
I have an iPhone app that is recording audio and then broadcasting it across the network. In response to a "stop" from the far end it queues a notification to stop the recording. Alas when it gets to the AudioQueueStop call the application just hangs (ie Stop never exits). Thanks to the notification all AudioQueue manipulation is happening on the same thread.
Has anyone got any idea what's going on here?
Edit: I have set up a listener in the UI thread that handles the recorder.
Then from my network thread I use "postNotificationName", believing that it would post a message to the UI thread and everything would run from that thread. This does not appear to be the case. When I set a breakpoint in the function that is called by the postNotificationName, it appears that the call is being made on the networking thread and NOT on the UI thread.
I assume this is my error. Does anyone know how to make it work by notifying the UI thread to handle it?
Edit2: OK, I've rewritten it to use performSelectorOnMainThread. And it still crashes.
On the plus side, I just learnt how to get a lot more info out of Xcode, so I can see that the call stack goes:
semaphore_timedwait_signal_trap
semaphore_timedwait_signal
_pthread_cond_wait
pthread_cond_timedwait_relative_np
CAGuard::WaitFor
ClientAudioQueue::ServicePendingCallbacks
AudioQueueStop
[etc]
Anyone got any ideas why it hangs?
How do you call AudioQueueStop ? The function supports two modes: synchronous and asynchronous.
The preferred way is to use the asynchronous stopping as the function will return immediately, while the remaining buffers will be played/recorded.
If you want to go synchronous and you get a hang, then maybe there is a dead-lock or a race condition somewhere. Have you tried to pause the application under the debugger and check the threads' stack-frames to see where the problem is ?