gdb nostop SIGSEGV on a specific thread - c++

I have a program that deliberately segfaults on one thread, but another thread is also segfaulting because of a bug, and I'd like to catch that with GDB. I saw that I can use:
handle SIGSEGV nostop noprint
but I'd like to apply that only to the thread that segfaults on purpose. Is that possible?
I'll explain:
I have 2 threads. One thread segfaults and recovers (mprotect read-only, then releasing the memory), and that works fine. The other thread does something else, but sadly it has a bug and is segfaulting too. I want to catch that segfault, and not the ones that occur in the first thread.

As far as I know this depends on the OS. I assume Linux for my answer, and there the answer is 'no'.
POSIX signals can have a signal mask per thread, but only one handler per process. So it is not possible to set different handling for each thread: sigaction applies to the complete process. So I see no way for gdb to change this.
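The mask-per-thread, handler-per-process split can be seen in a small sketch. It uses SIGUSR1 as a stand-in, because blocking a synchronously generated SIGSEGV has undefined results; the names on_usr1 and count_handler_runs are made up for this illustration.

```cpp
#include <atomic>
#include <pthread.h>
#include <signal.h>
#include <thread>

// Lock-free atomics are safe to modify from a signal handler.
std::atomic<int> g_handled{0};

extern "C" void on_usr1(int) { g_handled.fetch_add(1, std::memory_order_relaxed); }

// Hypothetical demo: two threads signal themselves, only one has SIGUSR1 blocked.
int count_handler_runs() {
    g_handled.store(0);

    struct sigaction sa{};
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, nullptr);  // one disposition for the whole process

    std::thread masked([] {
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, SIGUSR1);
        pthread_sigmask(SIG_BLOCK, &set, nullptr);  // affects this thread only
        pthread_kill(pthread_self(), SIGUSR1);      // stays pending, handler does not run
    });
    std::thread unmasked([] {
        pthread_kill(pthread_self(), SIGUSR1);      // handler runs on this thread
    });

    masked.join();
    unmasked.join();
    return g_handled.load();
}
```

Both threads share the same handler, yet it only runs for the thread that left the signal unblocked, so count_handler_runs() returns 1.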

I'll explain: I have 2 threads, one thread is segfaulting(and recovers(mprotect read only and then releasing memory)), that works fine, the other thread does something else, but sadly, there is a bug and it is segfaulting, I want to catch that segfault, and not the other ones that occur in the other thread
You have to tell gdb to ignore the first SIGSEGV signal. So after the first segfault, use the signal 0 command in this thread. Your program will resume execution under gdb, which is what you want. Then it will stop at the second segfault in your second thread, which is the one you want to inspect.
(gdb) help signal
Continue program with the specified signal.
Usage: signal SIGNAL
The SIGNAL argument is processed the same as the handle command.
An argument of "0" means continue the program without sending it a signal.
This is useful in cases where the program stopped because of a signal,
and you want to resume the program while discarding the signal.
So:
Do not use handle SIGSEGV nostop noprint. Run your program under gdb.
When it segfaults in the first thread, do signal 0. Your program resumes execution.
Then it segfaults in another thread. Now use backtrace to see the problem.
Or, if your two threads are not dependent on each other, you can wait in the thread that segfaulted first until the other segfault happens. Just do call sleep(60) in the first thread as soon as it causes a segfault, and wait for the segfault in the other thread. Your first thread will wait:
Program received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0x7ffff7fde700 (LWP 25744)]
0x000000000040075d in my_thread_func1 (arg=0x0) at my_test_2.cpp:17
17 ptr1 = ptr1 / 0;
(gdb) call sleep(60)
Thread 140737343510272:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff75dd700 (LWP 25745)]
0x00000000004007a3 in my_thread_func2 (arg=0x0) at my_test_2.cpp:27
27 *ptr2 = *ptr2 + 2;
The program received a signal in another thread while
making a function call from GDB.
Evaluation of the expression containing the function
(sleep) will be abandoned.
When the function is done executing, GDB will silently stop.
(gdb)

Related

Does QThread::quit() immediately end the thread or does it wait until returning to the event loop?

There are a lot of Qt multi-threading tutorials out there that state that a QThread can be stopped safely using the following two lines.
qthread.quit(); // Cause the thread to cease.
qthread.wait(); // Wait until the thread actually stops to synchronize.
I have a lot of code doing this, and in most cases where I stop a thread I also set my own cancel flag and check it often during execution (as is the norm). Until now, I was thinking that calling quit would cause the thread to simply no longer process any waiting signals (e.g. signals that are queued will no longer have their slots called) but still wait for the currently executing slot to finish.
But I'm wondering whether I was right, or whether quit() actually stops the thread wherever it happens to be executing. If something is left unfinished, for instance a file descriptor that hasn't been closed yet, it definitely should be closed. In most cases my worker objects will clean up those resources, but I'd feel better if I knew exactly how quit works.
I'm asking this because QThread::quit() documentation says that it's "equivalent to calling QThread::exit(0)". I believe this means that the thread would immediately stop where it's at. But what would happen to the stackframe that quit was called in?
QThread::quit does nothing if the thread does not have an event loop or some code in the thread is blocking the event loop. So it will not necessarily stop the thread.
So QThread::quit tells the thread's event loop to exit. After calling it the thread will get finished as soon as the control returns to the event loop of the thread.
You will have to add some kind of abort flag if you are blocking the event loop, for example by working in a loop. This can be a boolean member variable that is public or at least has a public setter method. Then you can tell the thread to exit ASAP from outside (e.g. from your main thread) by setting the abort flag. Of course this requires your thread code to check the abort flag at regular intervals.
You may also force a thread to terminate right now via QThread::terminate(), but this is a very bad practice, because it may terminate the thread at an undefined position in its code, which means you may end up with resources never getting freed up and other nasty stuff. So use this only if you really can't get around it. From its documentation:
Warning: This function is dangerous and its use is discouraged. The thread can be terminated at any point in its code path. Threads can be terminated while modifying data. There is no chance for the thread to clean up after itself, unlock any held mutexes, etc. In short, use this function only if absolutely necessary.
I think this is a good way to finish a thread when you are using loops in a thread:
myThread->m_abort = true; //Tell the thread to abort
if(!myThread->wait(5000)) //Wait until it actually has terminated (max. 5 sec)
{
myThread->terminate(); //Thread didn't exit in time, probably deadlocked, terminate it!
myThread->wait(); //We have to wait again here!
}
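The worker side of such an abort flag might look like the sketch below. It uses std::thread and std::atomic<bool> as stand-ins so that it builds without Qt; in a QThread subclass the same loop would sit in run(). Worker, requestAbort and run_worker_briefly are hypothetical names.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical worker that polls an abort flag between units of work.
class Worker {
public:
    void start() { thread_ = std::thread([this] { run(); }); }
    void requestAbort() { abort_.store(true); }   // called from another thread
    void wait() { if (thread_.joinable()) thread_.join(); }
    long iterations() const { return iterations_.load(); }

private:
    void run() {
        while (!abort_.load()) {                  // check the flag at regular intervals
            std::this_thread::sleep_for(std::chrono::milliseconds(1));  // one unit of "work"
            iterations_.fetch_add(1);
        }
        // Falling out of the loop lets normal cleanup (destructors, closing files) happen.
    }

    std::atomic<bool> abort_{false};  // atomic, so the write is visible without a data race
    std::atomic<long> iterations_{0};
    std::thread thread_;
};

// Starts the worker, lets it do some work, then asks it to stop and waits for it.
long run_worker_briefly() {
    Worker w;
    w.start();
    while (w.iterations() == 0) std::this_thread::yield();
    w.requestAbort();
    w.wait();
    return w.iterations();
}
```

Making the flag atomic matters: a plain bool written by one thread and read by another is a data race, and the compiler may hoist the read out of the loop.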
If you want to use Qt's built-in facility instead, try QThread::requestInterruption().
Main thread
struct X {
QThread m_Thread;
void Quit ()
{
m_Thread.quit();
m_Thread.requestInterruption();
}
};
Some Thread referred by X::m_Thread
while(<condition>) {
if(QThread::currentThread()->isInterruptionRequested())
return;
...
}
As per the documentation:
void QThread::requestInterruption()
Request the interruption of the thread. That request is advisory and it is up to code running on the thread to decide if and how it should act upon such request. This function does not stop any event loop running on the thread and does not terminate it in any way.

Does executing an int 3 interrupt stop the entire process on Linux or just the current thread?

Suppose the architecture is x86. And the OS is Linux based. Given a multithreaded process in which a single thread executes an int 3 instruction, does the interrupt handler stop from executing the entire process or just the thread that executed the int 3 instruction?
Since the question is Linux specific, let's dive into kernel sources! We know int 3 will generate a SIGTRAP, as we can see in do_int3. The default behaviour of SIGTRAP is to terminate the process and dump core.
do_int3 calls do_trap which, after a lot of indirection, calls complete_signal, where most of the magic happens. Following the comments, it's quite clear to see what is happening without much need for explanation:
A thread is found to deliver the signal to. The main thread is given first crack, but any thread can get it unless explicitly stated it doesn't want to.
SIGTRAP is fatal (and we've assumed we want to establish what the default behaviour is) and must dump core, so it is fatal to the whole group
The loop at line 1003 wakes up all threads and delivers the signal.
EDIT: To answer the comment:
When the process is being ptraced, the behaviour is pretty well documented in the manual page (see "Signal-delivery-stop"). Basically, after the kernel selects an arbitrary thread which handles the signal, if the selected thread is traced, it enters signal-delivery-stop -- this means the signal is not yet delivered to the process, and can be suppressed by the tracer process. This is the case with a debugger: a dead process is of no use to us when debugging (that's not entirely true, but let's consider the live-debugging scenario, which is the only one which makes sense in this context), so by default we block SIGTRAP unless the user specifies otherwise. In this case it is irrelevant how the traced process handles SIGTRAP (SIG_IGN or SIG_DFL or a custom handler) because it will never know it occurred.
Note that in the case of SIGTRAP, the tracer process must account for various scenarios other than the process being stopped, as also detailed in the man page under each ptrace action.
Easy enough to test:
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>
void f(int v) {
std::this_thread::sleep_for(std::chrono::seconds(2));
if (v == 2) asm("int $3");
std::this_thread::sleep_for(std::chrono::seconds(1));
printf("%d\n", v); // no sync here to keep it simple
}
int main() {
std::vector<std::thread> threads;
for (int i = 0; i < 4; i++) threads.emplace_back(f, i);
for (auto& thread : threads) thread.join();
return 0;
}
If only the one thread were stopped, the program would still print the messages from the threads other than 2. That is not the case: the entire process stops before printing anything (or triggers a breakpoint when debugging). On Ubuntu the message you get is:
Trace/breakpoint trap (core dumped)
int 3 is not a privileged instruction: userspace code is allowed to execute it, and doing so traps into the kernel.
The kernel will then send a SIGTRAP signal to your process, and the default action for a SIGTRAP signal is to terminate the entire process.
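Termination is only the default action, so a process that installs its own SIGTRAP handler survives the trap. The sketch below assumes no debugger is attached and uses raise(SIGTRAP) in place of the int 3 instruction so that it also builds on non-x86 targets; on_trap and trap_in_worker are made-up names.

```cpp
#include <atomic>
#include <signal.h>
#include <thread>

std::atomic<int> g_traps{0};

extern "C" void on_trap(int) { g_traps.fetch_add(1, std::memory_order_relaxed); }

// Raises SIGTRAP on a worker thread and reports how often the handler ran.
int trap_in_worker() {
    g_traps.store(0);

    struct sigaction sa{};
    sa.sa_handler = on_trap;           // replaces the default "terminate and dump core"
    sigemptyset(&sa.sa_mask);
    sigaction(SIGTRAP, &sa, nullptr);

    std::thread worker([] {
        raise(SIGTRAP);                // delivered to the calling thread
    });
    worker.join();                     // the process is still alive at this point
    return g_traps.load();
}
```

With the handler installed, the worker returns normally and the process keeps running instead of dying with "Trace/breakpoint trap".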
The answer is really neither. Int 3 is used to trigger a breakpoint. The interrupt handler is tiny, and neither the interrupt nor its handler stop any threads.
If there is no debugger loaded the handler will either ignore it or call the OS to take some kind of error action like raising a signal (perhaps SIGTRAP). No threads are harmed.
If there is an in-process debugger, the breakpoint ISR transfers control to it. The breakpoint does not stop any threads, except the one that breaks. The debugger may try to suspend others.
If there is an out-of-process debugger, the handler will invoke it, but this has to be mediated through the OS in order to do a suitable context switch. As part of that switch the OS will suspend the debuggee, which means all its threads will stop.

Why does pthread_exit() in rare cases cause a SEGV when called after pthread_detach()?

I am getting a SEGV in C++ that I cannot easily reproduce (it occurs in about one in 100,000 test runs) in my call to pthread_join() as my application is shutting down. I checked the value of errno and it is zero. This is running on CentOS 4.
Under what conditions would pthread_join() get a SEGV? This might be some kind of race condition since it is extremely rare. One person suggests I should not be calling pthread_detach() and pthread_exit(), but I am not clear on why.
My first working hypothesis was that pthread_join() is being called while pthread_exit() is still running in the other thread and that this somehow leads to a SEGV, but many have stated this is not an issue.
The failing code getting SEGV in the main thread during application exit looks roughly like this (with error return code checking omitted for brevity):
// During application startup, this function is called to create the child thread:
return_val = pthread_create(&_threadId, &attr,
(void *(*)(void *))initialize,
(void *)this);
// Apparently this next line is the issue:
return_val = pthread_detach(_threadId);
// Later during exit the following code is executed in the main thread:
// This main thread waits for the child thread exit request to finish:
// Release condition so child thread will exit:
releaseCond(mtx(), startCond(), &startCount);
// Wait until the child thread is done exiting so we don't delete memory it is
// using while it is shutting down.
waitOnCond(mtx(), endCond(), &endCount, 0);
// The above wait completes at the point that the child thread is about
// to call pthread_exit().
// It is unspecified whether a thread that has exited but remains unjoined
// counts against {PTHREAD_THREADS_MAX}, hence we must do pthread_join() to
// avoid possibly leaking the threads we destroy.
pthread_join(_threadId, NULL); // SEGV in here!!!
The child thread which is being joined on exit runs the following code which begins at the point above where releaseCond() is called in the main thread:
// Wait for main thread to tell us to exit:
waitOnCond(mtx(), startCond(), &startCount);
// Tell the main thread we are done so it will do pthread_join():
releaseCond(mtx(), endCond(), &endCount);
// At this point the main thread could call pthread_join() while we
// call pthread_exit().
pthread_exit(NULL);
The thread appeared to come up properly and no error codes were produced during its creation during application startup and the thread performed its task correctly which took around five seconds before the application exited.
What might cause this rare SEGV to occur, and how might I program defensively against it? One claim is that my call to pthread_detach() is the issue. If so, how should my code be corrected?
Assuming:
pthread_create returns zero (you are checking it, right?)
attr is a valid pthread_attr_t object (How are you creating it? Why not just pass NULL instead?)
attr does not specify that the thread is to be created detached
You did not call pthread_detach or pthread_join on the thread somewhere else
...then it is "impossible" for pthread_join to fail, and you either have some other memory corruption or a bug in your runtime.
[update]
The RATIONALE section for pthread_detach says:
The pthread_join() or pthread_detach() functions should eventually be
called for every thread that is created so that storage associated
with the thread may be reclaimed.
Although it does not say these are mutually exclusive, the pthread_join documentation specifies:
The behavior is undefined if the value specified by the thread
argument to pthread_join() does not refer to a joinable thread.
I am having trouble finding the exact wording that says a detached thread is not joinable, but I am pretty sure it is true.
So, either call pthread_join or pthread_detach, but not both.
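A minimal sketch of the join-only variant, assuming default attributes (so the thread is created joinable) and a worker that returns from its start routine instead of calling pthread_exit() explicitly. The names worker and run_joinable_thread are illustrative only.

```cpp
#include <pthread.h>

// Start routine: returning from it acts as an implicit pthread_exit(return value).
static void* worker(void* arg) {
    int* value = static_cast<int*>(arg);
    *value += 1;
    return arg;
}

// Creates one joinable thread, joins it exactly once, and never detaches it.
int run_joinable_thread() {
    int value = 41;
    pthread_t tid;

    // nullptr attributes: the thread is joinable by default.
    if (pthread_create(&tid, nullptr, worker, &value) != 0) return -1;

    void* result = nullptr;
    // The join also releases the thread's resources, so no pthread_detach() is needed.
    if (pthread_join(tid, &result) != 0) return -1;

    return *static_cast<int*>(result);
}
```

Because pthread_join() only returns after the worker has terminated, value can safely live on the caller's stack here.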
If you read the standards documentation for pthread_join and pthread_exit and related pages, the join suspends execution "until the target thread terminates", and the thread calling pthread_exit doesn't terminate until it's done calling pthread_exit, so what you're worried about can't be the problem.
You may have corrupted memory somewhere (as Nemo suggests), or called pthread_exit from a cleanup handler (as user315052 suggests), or something else. But it's not "a race condition between pthread_join() and pthread_exit()", unless you're on a buggy or non-compliant implementation.
There is insufficient information to fully diagnose your problem. I concur with the other posted answers that the problem is more likely undefined behavior in your code than a race condition between pthread_join and pthread_exit. But I would also agree the existence of such a race would constitute a bug in the pthread library implementation.
Regarding pthread_join:
return_val = pthread_create(&_threadId, &attr,
(void *(*)(void *))initialize,
(void *)this);
//...
pthread_join(_threadId, NULL); // SEGV in here!!!
It looks like the join is in a class. This opens up the possibility that the object could be deleted while main is trying to do the join. If pthread_join is accessing freed memory, the result is undefined behavior. I am leaning towards this possibility, since accessing freed memory is very often undetected.
Regarding pthread_exit: The man page on Linux, and the POSIX spec state:
An implicit call to pthread_exit() is made when a thread other than the thread in which main() was first invoked returns from the start routine that was used to create it. The function's return value shall serve as the thread's exit status.
The behavior of pthread_exit() is undefined if called from a cancellation cleanup handler or destructor function that was invoked as a result of either an implicit or explicit call to pthread_exit().
If the pthread_exit call is made in a cleanup handler, you will have undefined behavior.

What signal might pthread_join() cause?

I hit an error condition in C++ that I cannot easily reproduce: during my call to pthread_join() some signal was generated. I do not know which one; my signal handler was called, but for some reason it did not print the normal debug information about the signal. I did get a stack trace that showed:
# 2 /lib/tls/libpthread.so.0: pthread_join(...) +0x1c [0xce439c]
I reviewed the man page for pthread_join() and did not see any mention of signals.
What might have been the signal generated and what might have been the cause? This might be some kind of race condition.
http://linux.die.net/man/7/pthreads :
Signals are used in implementation internally
http://osr600doc.sco.com/man/html.PTHREAD/pthread_join.PTHREAD.html :
The wait in pthread_join is not broken by a signal.
If a thread waiting in pthread_join receives a signal that is not masked,
it will execute the signal handler, and then return to waiting in pthread_join.
Note that this behavior differs from that of pthread_cond_wait.
http://publib.boulder.ibm.com/infocenter/iseries/v5r4/index.jsp?topic=%2Fapis%2Fusers_25.htm :
Threadsafe: Yes
Signal Safe: No
Basically, what you might be seeing is the thread waiting in pthread_join() receiving some other signal due to an error or an external event.
We cannot guess what exactly causes the signal.
P.S. The exact environment is not specified, which makes it even harder to tell.

SIGINT while doing std::thread::join()

I have a main(), which spawns a thread, and then joins to it.
I want to be able to CTRL-C the program, so I would install a SIGINT handler in main (the spawned thread will ignore this signal). In the signal handler I will cancel the spawned thread with cancel(), but what happens to the join() that was in progress when the signal arrived?
My guess is that I will get EAGAIN or EINTR, and I would have to call join() in a loop. Am I right? Thank you.
The question is: is this legal with multithreading? I don't mind just setting a flag within the SIGINT handler, but what happens with the join() call?
Signals and threads? Here be dragons! You have to fully-specify the masks, or else any thread may receive the signal.
The signal handler should generally not assume it is running in the "main" thread. Rather, it should post a message and return, analogously to thread interruption. The main thread can pick this up later in an event loop or whatever and then join.
std::thread::join() has a void return type, so cannot return EINTR. On POSIX platforms it is likely a wrapper around pthread_join, which does not return EINTR. Joining a thread should not return or throw until the thread has been successfully joined-with, provided it is called on a joinable thread.
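A small sketch of that behaviour on a POSIX system: the handler only sets a flag, the worker polls it, and join() simply returns once the worker has finished. Here pthread_kill() from the worker stands in for the user pressing Ctrl-C, and on_sigint and join_survives_signal are made-up names.

```cpp
#include <atomic>
#include <chrono>
#include <pthread.h>
#include <signal.h>
#include <thread>

// A lock-free atomic flag is one of the few things a handler may safely touch.
std::atomic<bool> g_stop{false};

extern "C" void on_sigint(int) { g_stop.store(true); }

// Returns true once the worker has seen the flag and join() has returned normally.
bool join_survives_signal() {
    g_stop.store(false);

    struct sigaction sa{};
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, nullptr);

    const pthread_t caller = pthread_self();
    std::thread worker([caller] {
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        pthread_kill(caller, SIGINT);  // stand-in for Ctrl-C, aimed at the joining thread
        while (!g_stop.load()) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    });

    // The handler runs on this thread while it is (most likely) blocked in join();
    // join() then keeps waiting until the worker exits, with no EINTR to handle.
    worker.join();
    return g_stop.load();
}
```

No retry loop around join() is needed; the only cooperation required is that the worker checks the flag.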
As an aside, it may not be safe to cancel the thread from a signal handler. std::thread does not have a cancel() member function, so I presume you have written your own. You therefore need to check that it is safe for use in a signal handler --- pthread_cancel() is not listed as a function that is safe to call from a signal handler, for example.