I'm developing a multi-threaded C++ application using GCC 4.4.5 and GDB 7.2.
At the moment, I have four threads. Each one interacts with a CAN bus in one form or another, either reading, writing, polling or handling messages.
In order to determine which thread is doing what, I have decided to add the thread IDs to log messages.
In my logging functions, I have the following code:
// This is for outputting debug messages
void logDebug(string msg, thread::id threadId[ = NULL]) {
#ifdebug _DEBUG
threadState.outputLock->lock();
if (threadId != NULL)
cout << "[Thread #" << threadId << "] ";
// The rest of the output
threadState.outputLock->unlock();
#endif
}
This is the (debug) output from the application:
[Thread #3085296768] [DEBUG] [Mon Jun 17 10:18:45 2019] CAN frame was empty or no message on bus...
----------
And this is the what GDB is telling me:
Thread #3 7575 [core: 0] (Suspended: Breakpoint)
----
Why is the debugger giving me different information from the application (the thread IDs/numbers) and is there a way to output the same information in the application, as the debugger is telling me?
The expected behaviour is that the thread IDs are identical.
EDIT:
I forgot to add some possibly important information.
I'm cross-compiling to an embedded device powered by a POWERPC chip, running a derivative of Debian Wheezy.
You can get the thread id from your application with the following system call : syscall(SYS_gettid)
From there you can set the thread name by either :
writing directly the name in /proc/PID/task/TID/comm
using the pthread function int pthread_setname_np(pthread_t thread, const char *name)
Then in GDB you can easily match the given thread name, its Linux TID and the GDB thread ID with info threads command.
Hope this helps.
Related
Hello my fellow Developers,
I am trying to analyse stack-dump from a C++ program, in this particular program we are creating signal handlers and and inside signal handler we have code something like below
res = backtrace(arr, size);
if (res > 0)
{
btrace = backtrace_symbols(arr, size);
}
for ( unsigned int i =0; i < size ; ++i )
{
log_error(" trace: %s", btrace[i] );
}
I am able to get traces however unable to find address which i am expecting, now I am wondering if there is any possibility that other threads might have filled the stacktrace buffer(as it is an embedded device and we have limited buffer size), if so how i can identify/filter stacktraces thread wise or is there any option to restrict stacktraces to only thread which caught segmentation fault.
Note: I am able to identify 2 addresses from my executable which are present in stacktrace, very recent one is the address of signal handler itself and 2nd one is from another thread.
I tried to googling resources on how backtrace works in multi threaded env. on but not able to find much info, so I am here.
Thanks
I have the following code in my program.
Thread* t = arg->thread;
//at this point, the new thread is being executed.
t->myId = TGetId();
void* (*functor)(void*) = t->functor;
void* fArg = arg->arg;
nfree(arg);
_INFO_PRINTF(1, "Launching thread with ID: %d", t->myId);
sigset_t mask;
sigfillset(&mask); //fill mask with all signals
sigdelset(&mask, SIGUSR1); // allow SIGUSR1 to get to the thread.
sigdelset(&mask, SIGUSR2); // allow SIGUSR2 to get to the thread.
pthread_sigmask(SIG_SETMASK, &mask, NULL); //block some sigs
struct sigaction act;
memset(&act, 0, sizeof(act));
act.sa_handler = TSignalHandler;
act.sa_mask = mask;
if(sigaction(SIGUSR1, &act, NULL))
{
_ERROR_PRINT(1, "Could not set signal action.");
return NULL;
}
if(sigaction(SIGUSR2, &act, NULL))
{
_ERROR_PRINT(1, "Could not set signal action.");
return NULL;
}
void* ret = functor(fArg);
t->hasReturned = true;
return ret;
The thread that executes this code will properly call the signal handler when on native linux. The problem is that on Windows Subsystem for Linux, the program hangs with the SIGUSR1 or SIGUSR2 is sent via pthread_kill which sends signals to a thread. Why does this work on native ubuntu (via VMWARE WORKSTATION 14) and debian and fedora, but NOT WSL?
When you have a hanging bug that you cannot reproduce when running within the debugger, you can attach the debugger to the running process after you reproduce the hang. This won't let you observe the variables changing as you lead to the hang, but at least you get the stack trace of exactly where the hang is occurring.
Once you know the process id of the hung process (assume it's 12345), you can use:
$ gdb -p 12345
Or, you can kill the process with a signal that will cause a core to be generated. I like to use SIGTRAP, since it is easy to distinguish from a SIGSEGV.
$ kill -SIGTRAP 12345
And then you can use gdb to discover what the process was hanging on.
The advantage of attaching to the running process is that the process is still live. This allows you to call functions from the debugger, which may provide easier access to diagnostics built into your program. The core file preserves the error, which is beneficial if the hanging bug is difficult to reproduce.
To debug a locked file problem, we're calling SysInternal's Handle64.exe 4.11 from a .NET process (via Process.Start with asynchronous output redirection). The calling process hangs on Process.WaitForExit because the Handle64 process doesn't exit (for more than two hours).
We took a dump of the corresponding Handle64 process and checked it in the Visual Studio 2017 debugger. It shows two threads ("Main Thread" and "ntdll.dll!TppWorkerThread").
Main thread's call stack:
ntdll.dll!NtWaitForSingleObject () Unknown
ntdll.dll!LdrpDrainWorkQueue() Unknown
ntdll.dll!RtlExitUserProcess() Unknown
kernel32.dll!ExitProcessImplementation () Unknown
handle64.exe!000000014000664c() Unknown
handle64.exe!00000001400082a5() Unknown
kernel32.dll!BaseThreadInitThunk () Unknown
ntdll.dll!RtlUserThreadStart () Unknown
Worker thread's call stack:
ntdll.dll!NtWaitForSingleObject() Unknown
ntdll.dll!LdrpDrainWorkQueue() Unknown
ntdll.dll!LdrpInitializeThread() Unknown
ntdll.dll!_LdrpInitialize() Unknown
ntdll.dll!LdrInitializeThunk() Unknown
My question is: Why would a process hang in LdrpDrainWorkQueue? From https://stackoverflow.com/a/42789684/62838, I gather that this is the Windows 10 parallel loader at work, but why would it get stuck while exiting the process? Can this be caused by how we invoke Handle64 from another process? I.e., are we doing something wrong or is this rather a bug in Handle64?
How long did you wait?
According to this analysis,
The worker thread idle timeout is set to 30 seconds. Programs which
execute in less than 30 seconds will appear to hang due to
ntdll!TppWorkerThread waiting for the idle timeout before the process
terminates.
I would recommend trying to set the registry key specified in that article to disable the parallel loader and see if this resolved the issue.
Parent Key: HKLM\Software\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\handle64.exe
Value Name: MaxLoaderThreads
Type: DWORD
Value: 1 to disable
I'm currently writing a multi-threaded, high efficient and scalable algorithm. Because I have to guess a parameter for the code and I'm not sure how the calculation performs on a specific data set, I would like to watch a variable. The test only works with a real world, huge data set. It is possible to analyze the collected data after profiling. Imagine the following, simple code example (real code can contain multiple watch points:
// function get's called by loops of multiple threads
void payload(data_t* data, double threshold) {
double value = calc(data);
// here I want to watch the value
if (value < threshold) {
doSomething(data);
} else {
doSomethingElse(data);
}
}
I thought about the following approaches:
Using cout or other system outputs
Use a binary output (file, network)
Set a breakpoint via gdb/lldb
Use variable watching + logging via gdb/lldb
I'm not happy with the results because: To use 1. and 2. I have to change the code, but this is a debugging/evaluating task. Furthermore 1. requires locking and 1.+2. requires I/O operations, which heavily slows down the entire code and makes testing with real data nearly impossible. 3. is also too slow. To use 4., I have to know the variable address because it's not a global variable, but because threads get created by a dynamic scheduler, this would require breaking + stepping for each thread.
So my conclusion is, that I need a profiler/debugger that works at machine code level and dumps/logs/watches the variable without double->string conversion and is highly efficient, or to sum up with other words: I would like to profile the internal state of my algorithm without heavy slow-down and without doing deep modification. Does anybody know a tool that is able to this?
OK, this took some time but now I'm able to present a solution for my problem. It's called tracepoints. Instead of breaking the program every time, it's more lightweight and (ideally) doesn't change performance/timing too much. It does not require code changes. Here is an explanation how to use them using gdb:
Make sure you compiled your program with debugging symbols (using the -g flag). Now, start the gdb server and provide a network port (e.g. 10000) and the program arguments:
gdbserver :10000 ./program --parameters you --want --to use
Now, switch to a second console and start gdb (program parameters are not required here):
gdb ./program
All following commands are entered in the gdb command line interface. So let's connect to the server:
target remote :10000
After you got the connection confirmation, use trace or ftrace to set a tracepoint to a specific source location (try ftrace first, it should be faster but doesn't work on all platforms):
trace source.c:127
This should create tracepoint #1. Now you can setup an action for this tracepoint. Here I want to collect the data from myVariable
action 1
collect myVariable
end
If expect much data or want to use the data later (after restart), you can set a binary trace file:
tsave trace.bin
Now, start tracing and run the program:
tstart
continue
You can wait for program exit or interrupt your program using CTRL-C (still on gdb console, not on server side). Continue by telling gdb that you want to stop tracing:
tstop
Now we come the tricky part and I'm not really happy with the following code because it's really slow:
set pagination off
set logging file trace.txt
tfind start
while ($trace_frame != -1)
set logging on
printf "%f\n", myVariable
set logging off
tfind
end
This dumps all variable data to a text file. You can add some filter or preparation here. Now you're done and you can exit gdb. This will also shutdown the server:
quit
For detailed documentation especially for explanation of filtering and more advanced tracepoint positions, you can visit the following document: http://sourceware.org/gdb/onlinedocs/gdb/Tracepoints.html
To isolate trace file writing from your program execution, you can use cgroups or another network connected computer. When using another computer, you have to add the host to the port information (e.g. 192.168.1.37:10000). To load a binary trace file later, just start gdb as shown above (forget the server) and change the target:
gdb ./program
target tfile trace.bin
you can set hardware watchpoint using gdb debugger, for example if you have
bool b;
variable and you want to be notified every time the value of it has chenged (by any thread)
you would declare a watchpoint like this:
(gdb) watch *(bool*)0x7fffffffe344
example:
root#comp:~# gdb prog
GNU gdb (GDB) 7.5-ubuntu
Copyright ...
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /dist/Debug/GNU-Linux-x86/cppapp_socket5_ipaddresses...done.
(gdb) watch *(bool*)0x7fffffffe344
Hardware watchpoint 1: *(bool*)0x7fffffffe344
(gdb) start
Temporary breakpoint 2 at 0x40079f: file main.cpp, line 26.
Starting program: /dist/Debug/GNU-Linux-x86/cppapp_socket5_ipaddresses
Hardware watchpoint 1: *(bool*)0x7fffffffe344
Old value = true
New value = false
main () at main.cpp:50
50 if (strcmp(mask, "255.0.0.0") != 0) {
(gdb) c
Continuing.
Hardware watchpoint 1: *(bool*)0x7fffffffe344
Old value = false
New value = true
main () at main.cpp:41
41 if (ifa ->ifa_addr->sa_family == AF_INET) { // check it is IP4
(gdb) c
Continuing.
mask:255.255.255.0
eth0 IP Address 192.168.1.5
[Inferior 1 (process 18146) exited normally]
(gdb) q
I am having problem in getting my stack trace output to stderr or dumping to a log file. I am running the code in Kubuntu10.04 with gcc compiler (4.4.3). The issue is that in the normal running mode (without gdb), the program does not output anything except 'Segmentation Fault'. I wish to output the backtrace output as in the print statements below. When I run gdb with my application, it comes to the printf/fprintf/(function call) statement, and then crashes with the following statement:
669 {
(gdb)
670 printf("Testing for stability.\n");
(gdb)
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffff68b1f45 in puts () from /lib/libc.so.6
The strange things is that it works if I call a function within the same file that crashes, it works fine and spews the output properly. But if the program crashes in a function outside this file, it does not print any output.
So no printf or file dumping statement or function call gets processed. I am using the following sample code:
void bt_sighandler(int sig, siginfo_t *info,
void *secret) {
void *trace[16];
char **messages = (char **)NULL;
int i, trace_size = 0;
ucontext_t *uc = (ucontext_t *)secret;
/* Do something useful with siginfo_t */
if (sig == SIGSEGV)
printf("Got signal %d, faulty address is %p, "
"from %p\n", sig, info->si_addr,
uc->uc_mcontext.gregs[0]);
else
printf("Got signal %d#92; \n", sig);
trace_size = backtrace(trace, 16);
/* overwrite sigaction with caller's address */
trace[1] = (void *) uc->uc_mcontext.gregs[0];
messages = backtrace_symbols(trace, trace_size);
/* skip first stack frame (points here) */
printf("[bt] Execution path:#92; \n");
for (i=1; i<trace_size; ++i)
printf("[bt] %s#92; \n", messages[i]);
exit(0);
}
int main() {
/* Install our signal handler */
struct sigaction sa;
sa.sa_sigaction = (void *)bt_sighandler;
sigemptyset (&sa.sa_mask);
sa.sa_flags = SA_RESTART | SA_SIGINFO;
sigaction(SIGSEGV, &sa, NULL);
sigaction(SIGUSR1, &sa, NULL);
/* Do something */
printf("%d#92; \n", func_b());
}
Thanks in advance for any help.
Unfortunately you just can't reliably do much of anything in a SIGSEGV handler. Think about it this way: Your program has a serious error and its state (including system level state such as the heap) is in an inconsistent state.
In such a case, you can't expect the OS to magically fix up the heap and other internals it needs in order to be able to execute arbitrary code within your signal handler.
If the SEGV happens in your own code, the good solution is to use the core and fix the root problem. If the core happens in other code via say a shared library, I'd suggest isolating that code in an entirely separate binary and communicate between the two binaries. Then if the library crashes your main program does not.
You are supposed to do very little in a signal handler, in principle only access variables of type sig_atomic_t and volatile data.
Doing I/O is definitely out of the question. See this page for gcc:
http://www.gnu.org/s/libc/manual/html_node/Nonreentrancy.html#Nonreentrancy
Try using simpler functions, such as strcat() and write().
Is there a reason you can't use valgrind?
When the application crashes Linux creates a core dump with the state of the application when it crashed. The core file can be examined using gdb.
If no core file is created try changing core file size with
ulimit -c unlimited
in the same shell and before the program is started.
The name of the core file is usually core.PID where PID is the pid of the program.
The core file is usually placed somewhere in /tmp or the directory where the program was started.
A lot more info on core files is available on the man page for core. Use
man core
to read the man page.
I managed to get it partially working. Actually I was running the application in 'sudo' mode. Running it in user mode gives me the callstack. However running in user mode disables hardware acceleration (nvidia graphics drivers). To resolve that, I added myself to the 'video' group, so that I have access to /dev/nvidia0 & /dev/nvidiactl. However when I get the access the stack does not get generated anymore. Its only when I am in user mode and hardware acceleration is disabled, the stack is coming. But I can't run my application without hardware acceleration (mean some important functionality would get disabled). Please let me know if anyone has any idea.
Thanks.