I'm debugging a daemon, and after it forks and the child calls setsid(), execution can no longer be interrupted by pressing Ctrl-C.
Here is a simple example:
// test.c
#include <unistd.h>
#include <stdio.h>
int main()
{
    int i = 0;
    if(fork())
    {
        printf("Parent\n");
        return 1;
    }
    printf("Child\n");
    setsid();
    for(;;)
        i++;
    return 0;
}
Steps to reproduce:
gcc -g -o test test.c
gdb ./test
In gdb shell:
set follow-fork-mode child (because by default gdb follows the parent process)
run
press Ctrl-C
Nothing happens. Even if I then send SIGINT to the process, gdb stays hung, and I have to kill gdb to stop it.
However, I can interrupt the loop if I send SIGINT before pressing Ctrl-C. If I set a breakpoint inside the loop, the process stops on it fine.
If I comment out either fork() or setsid(), everything works as expected.
The workarounds I know of are:
Run the process in the foreground for debugging purposes
Add a sleep in the child process after it daemonizes, and attach gdb to it
But maybe someone knows a better solution or can explain why this is happening.
Thanks in advance
Linux 4.12.10-1 x86_64
gcc 7.1.1
GNU gdb (GDB) 8.0
UPD 2017-09-11:
Some more research (here stat <process> stands for grep State /proc/<pid of process>/status)
Scenario 1 (send SIGINT before pressing Ctrl-C; in this case everything works as expected)
$ stat gdb
State: S (sleeping)
$ stat test
State: R (running)
$ killall -SIGINT test
$ stat gdb
State: S (sleeping)
$ stat test
State: t (tracing stop)
Scenario 2 (send SIGINT after pressing Ctrl-C; in this case gdb seems to hang)
$ stat gdb
State: R (running)
$ stat test
State: R (running)
$ killall -SIGINT test
$ stat gdb
State: R (running)
$ stat test
State: t (tracing stop)
$ grep TracerPid /proc/`pidof test`/status
TracerPid: 26533
Backtrace of gdb:
#0 0x00007fbf7bb316c0 in __write_nocancel () from /usr/lib/libpthread.so.0
#1 0x00005625e2cc090c in serial_event_set(serial_event*) ()
#2 <signal handler called>
#3 0x00007fbf7aae3ba7 in kill () from /usr/lib/libc.so.6
#4 0x00005625e2cf908b in default_target_pass_ctrlc(target_ops*) ()
#5 0x00005625e2d0b3a4 in maybe_quit() ()
#6 0x00005625e2c24e92 in invoke_async_signal_handlers() ()
#7 0x00005625e2c259e6 in start_event_loop() ()
#8 0x00005625e2c7d33b in captured_command_loop(void*) ()
#9 0x00005625e2c27c55 in catch_errors(int (*)(void*), void*, char const*, return_mask) ()
#10 0x00005625e2c7e63f in gdb_main(captured_main_args*) ()
#11 0x00005625e2a62bec in main ()
When I use the perf tool to analyze my business process, perf aborts.
This only happens when capturing data from my business process; other processes work normally.
I don't know the reason.
Is there a bug in my program? How can I debug this?
How can I get perf record to work normally?
Thanks for the help.
env:
# uname -a
Linux rk3326_64 4.4.194 #6 SMP Tue Jun 15 19:28:51 CST 2021 aarch64 GNU/Linux
# perf --version
perf version 4.4.194
perf top command and response
# perf top -p <pid>
double free or corruption (!prev)
Aborted
perf record command
# This version of perf doesn't support --sleep (to specify the acquisition time), so I stop perf with Ctrl-C. When I pressed Ctrl-C, it aborted.
# perf record -F 100 -p <pid>
double free or corruption (!prev)
Aborted
I enabled core dumps on the device.
coredump filename: core-perf-14092-6-1624288344
# gdb /usr/bin/perf ./core-perf-14092-6-1624288344
...
Core was generated by `perf top -p 12789'.
Program terminated with signal SIGABRT, Aborted.
#0 0x0000007f7f2692b8 in raise () from /lib/libc.so.6
[Current thread is 1 (LWP 14092)]
(gdb) bt
#0 0x0000007f7f2692b8 in raise () from /lib/libc.so.6
#1 0x0000007f7f2579d4 in abort () from /lib/libc.so.6
#2 0x0000007f7f2a2040 in ?? () from /lib/libc.so.6
#3 0x0000007f7f2a862c in ?? () from /lib/libc.so.6
#4 0x0000007f7f2aa094 in ?? () from /lib/libc.so.6
#5 0x00000000004c46a4 in ?? ()
#6 0x000000000047a1b0 in ?? ()
#7 0x000000000048ab5c in ?? ()
#8 0x0000000000456520 in ?? ()
#9 0x000000000041ba68 in ?? ()
#10 0x000000000041d99c in ?? ()
#11 0x000000000044c5fc in ?? ()
#12 0x00000000004061c0 in ?? ()
#13 0x0000007f7f257e34 in __libc_start_main () from /lib/libc.so.6
#14 0x00000000004062ec in ?? ()
Backtrace stopped: not enough registers or memory available to unwind further
Contents of hello.cpp
#include <cstdio>
#include <functional>
#include <gtkmm.h>
// Queued from the worker thread; runs later in the main loop.
void RunInMain()
{
    printf("RunInMain\n");
}
// Worker thread body: schedules RunInMain on the main loop's idle handler.
void ThreadFunc()
{
    printf("ThreadFunc\n");
    Glib::signal_idle().connect_once(std::bind(&RunInMain));
}
int main()
{
    Gtk::Main kit(0, NULL);
    Gtk::Window window;
    window.set_title("hello world");
    Glib::Thread* pThread = Glib::Thread::create(&ThreadFunc);
    kit.run(window);
    pThread->join();
    return(0);
}
Compile with:
g++ `pkg-config gtkmm-2.4 --cflags --libs` hello.cpp -Wno-deprecated-declarations -fsanitize=thread
This is the error from TSAN when executing the resulting a.out file:
WARNING: ThreadSanitizer: data race (pid=153699)
Write of size 8 at 0x7b5000006f90 by thread T1:
#0 memset <null> (libtsan.so.0+0x37abf)
#1 g_slice_alloc0 <null> (libglib-2.0.so.0+0x71412)
#2 sigc::pointer_functor0<void>::operator()() const <null> (a.out+0x402835)
#3 sigc::adaptor_functor<sigc::pointer_functor0<void> >::operator()() const <null> (a.out+0x402606)
#4 sigc::internal::slot_call0<void (*)(), void>::call_it(sigc::internal::slot_rep*) <null> (a.out+0x4021d0)
#5 call_thread_entry_slot /usr/include/sigc++-2.0/sigc++/functors/slot.h:535 (libglibmm-2.4.so.1+0x5d889)
Previous write of size 8 at 0x7b5000006f90 by main thread:
#0 posix_memalign <null> (libtsan.so.0+0x3061d)
#1 allocator_memalign ../glib/gslice.c:1411 (libglib-2.0.so.0+0x706b8)
#2 allocator_add_slab ../glib/gslice.c:1283 (libglib-2.0.so.0+0x706b8)
#3 slab_allocator_alloc_chunk ../glib/gslice.c:1329 (libglib-2.0.so.0+0x706b8)
#4 __libc_start_main ../csu/libc-start.c:308 (libc.so.6+0x27041)
Location is heap block of size 496 at 0x7b5000006e00 allocated by main thread:
#0 posix_memalign <null> (libtsan.so.0+0x3061d)
#1 allocator_memalign ../glib/gslice.c:1411 (libglib-2.0.so.0+0x706b8)
#2 allocator_add_slab ../glib/gslice.c:1283 (libglib-2.0.so.0+0x706b8)
#3 slab_allocator_alloc_chunk ../glib/gslice.c:1329 (libglib-2.0.so.0+0x706b8)
#4 __libc_start_main ../csu/libc-start.c:308 (libc.so.6+0x27041)
Thread T1 (tid=153701, running) created by main thread at:
#0 pthread_create <null> (libtsan.so.0+0x5ec29)
#1 g_system_thread_new ../glib/gthread-posix.c:1308 (libglib-2.0.so.0+0xa0ea0)
#2 __libc_start_main ../csu/libc-start.c:308 (libc.so.6+0x27041)
SUMMARY: ThreadSanitizer: data race (/lib64/libtsan.so.0+0x37abf) in memset
The code runs as expected (I get all of the prints) but I don't understand why I'm getting the TSAN data race warning. If I comment out the Glib::signal_idle().connect_once line, there is no TSAN error. From what I've read, that function is supposed to be safe to call from any thread. Is TSAN reporting a false positive here or is there a real data race?
Fedora 31 linux
g++ 10.0.1
glibmm24-2.64.2-1
gtkmm24-2.24.5-9
libtsan-10.2.1-9
From TSAN wiki:
TSAN generally requires all code to be compiled with -fsanitize=thread. If some code (e.g. dynamic libraries) is not compiled with the flag, it can lead to false positive race reports, false negative race reports and/or missed stack frames in reports, depending on the nature of the non-instrumented code.
If you are using glib from the distribution repository (e.g. sudo apt-get install libglib2.0-dev), the number of false positive reports depends on how the library was built, so the number of warnings will vary from distro to distro. To get a proper TSAN report, compile all the shared libraries you use by hand with -fsanitize=thread. glib in particular should be compiled by hand, because it contains various thread-related APIs.
Compile glib with TSAN (for Debian 11.5 "bullseye"):
# clone TAG 2.66.8 (TAG should match glib version on the host)
git clone --depth=1 --branch=2.66.8 https://github.com/GNOME/glib.git
cd glib
CFLAGS="-O2 -g -fsanitize=thread" meson build
ninja -C build
# add TSAN-enabled glib libraries to lib search path
export LD_LIBRARY_PATH=$PWD/build/gio:$PWD/build/glib:$PWD/build/gmodule:$PWD/build/gobject:$PWD/build/gthread
Before running your project, use ldd a.out to make sure that it links against the freshly compiled glib libraries (all of the ones it uses: libglib, libgio, libgmodule, libgobject, libgthread).
I have a very simple Hello World C++ program. I am running it on macOS Mojave 10.14.5 with Apple LLVM version 10.0.1 and GNU gdb (GDB) 8.3.
#include <iostream>
#include <cstdio>
using namespace std;
int main() {
    printf("Hello, World!");
    return 0;
}
I compile it with command g++ -g a.cpp
I run gdb as sudo gdb ./a.out.
In (gdb) prompt I type start
I get the following message, but the (gdb) prompt never returns:
Temporary breakpoint 1 at 0x100000f3f: file a.cpp, line 6.
Starting program: a.out
[New Thread 0x1203 of process 5444]
[New Thread 0xf03 of process 5444]
I cannot even close the process with control + z. I have to force a termination of the terminal to close it.
Code (m1.cpp):
#include <iostream>
using namespace std;
int main (int argc, char *argv[])
{
    cout << "running m1" << endl;
    return 0;
}
GDB Version: GNU gdb (GDB) 7.6.2
Built using: g++ -g m1.cpp
Command line history:
(gdb) b main
Breakpoint 1 at 0x40087b: file m1.cpp, line 6.
(gdb) r
Starting program: .../a.out
Program received signal SIGSEGV, Segmentation fault.
0x00002aaaaaac16a0 in strcmp () from /lib64/ld-linux-x86-64.so.2
(gdb) c
Continuing.
Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb)
When I run without setting any breakpoints, it runs without errors.
As requested:
(gdb) bt
#0 strcmp () from /lib64/ld-linux-x86-64.so.2
#1 in check_match.12104 () from /lib64/ld-linux-x86-64.so.2
#2 in do_lookup_x () from /lib64/ld-linux-x86-64.so.2
#3 in _dl_lookup_symbol_x () from /lib64/ld-linux-x86-64.so.2
#4 in _dl_relocate_object () from /lib64/ld-linux-x86-64.so.2
#5 in dl_main () from /lib64/ld-linux-x86-64.so.2
#6 in _dl_sysdep_start () from /lib64/ld-linux-x86-64.so.2
#7 in _dl_start () from /lib64/ld-linux-x86-64.so.2
#8 in _start () from /lib64/ld-linux-x86-64.so.2
#9 in ?? ()
I was able to replicate the OP's observed behavior (using the same compile command and getting the same backtrace). The behavior was persistent across a range of GDB and GCC versions. I noticed that the symptom goes away when I unset SHELL. In my normal environment I use tcsh (version 1.15.00). If SHELL is set, then (I believe) gdb launches the program using tcsh. If I unset SHELL, gdb launches it using sh. This is enough for me to make forward progress. I don't have a crisp explanation for what differs under tcsh to trigger the issue, but if others see the same behavior, it may shed more light on it.
I checked this with my GNU gdb version 7.11.1, and it worked fine.
I first compiled the same program using:
g++ -g m1.cpp
Then I ran the executable in gdb as follows:
gdb -q ./a.out
I then did the same things you mentioned, and it worked fine.
Update your gdb, check again, and let me know.
I stumbled into a weird bug involving C++11, pthreads, and the -pg flag. It seems that my threads get stuck in a library routine in mcount.c when they invoke a static function in any of my classes.
Sleeping
Awakened
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7bc6148 in pthread_join (threadid=140737333020416, thread_return=0x7fffffffe4f8)
at pthread_join.c:89
89 pthread_join.c: No such file or directory.
(gdb) info threads
Id Target Id Frame
17 Thread 0x7fffef3cd700 (LWP 6152) "test.o" __mcount_internal (frompc=4198422, selfpc=4206354)
at mcount.c:72
16 Thread 0x7fffefbce700 (LWP 6151) "test.o" __mcount_internal (frompc=4211225, selfpc=4212043)
at mcount.c:72
15 Thread 0x7ffff03cf700 (LWP 6150) "test.o" __mcount_internal (frompc=4211225, selfpc=4212043)
at mcount.c:72
......
at mcount.c:72
3 Thread 0x7ffff63db700 (LWP 6138) "test.o" __mcount_internal (frompc=4206451, selfpc=4211201)
at mcount.c:72
2 Thread 0x7ffff6bdc700 (LWP 6136) "test.o" __mcount_internal (frompc=4206732, selfpc=4211201)
at mcount.c:72
* 1 Thread 0x7ffff7fd6740 (LWP 6135) "test.o" 0x00007ffff7bc6148 in pthread_join (
threadid=140737333020416, thread_return=0x7fffffffe4f8) at pthread_join.c:89
(gdb) thread 17
[Switching to thread 17 (Thread 0x7fffef3cd700 (LWP 6152))]
#0 __mcount_internal (frompc=4198422, selfpc=4206354) at mcount.c:72
72 mcount.c: No such file or directory.
(gdb) bt
#0 __mcount_internal (frompc=4198422, selfpc=4206354) at mcount.c:72
#1 0x00007ffff71d0b94 in mcount () at ../sysdeps/x86_64/_mcount.S:48
#2 0x00007ffff7ff7030 in ?? ()
#3 0x000000000000001a in ?? ()
#4 0x0000000008800191 in ?? ()
#5 0x000000000000001a in ?? ()
#6 0x00007ffff7ff7030 in ?? ()
#7 0x0000000000000005 in ?? ()
#8 0x0000000000000040 in ?? ()
#9 0x0000000000402f12 in Helper::remove (vec=0x8800191, pos=0, p=0x5) at Helpers.hpp:100
The threads should all exit after the main thread prints "Awakened", but they don't, and when I interrupt the program they are all in mcount.c, which seems to be called between my call to Helper::remove and the initialization of the function's variables in Helper::remove.
Indicated by
#9 0x0000000000402f12 in Helper::remove (vec=0x8800191, pos=0, p=0x5) at Helpers.hpp:100
which should hold the values (vec=0x7ffff7ff7030, pos=26, p=0x8800191). The last variable makes me wonder if I am somehow overwriting the stack (these values were retrieved from stack frame #10).
Line 100 in Helpers.hpp is simply the function declaration:
static bool remove(WFVector *vec, int pos, void *p){
Can anyone explain why the inclusion of the -pg flag causes threads to get stuck in the static function?
Code compiled with: g++-4.7 -DDEBUG=1 -g -pg -std=c++0x -mcx16 -m64 tester.cpp -o test.o -I /usr/include/boost -lpthread
and testing with GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
gprof is known not to support multi-threaded applications. See if this workaround solves the problem. Most people simply use another profiling tool anyway.
I personally prefer Linux's built-in perf. Searching for "gprof threads" will give plenty of results from SO, with various suggestions for profiling tools.
Removing the -pg flag fixes the error. It took me a long time to figure it out and it was only dumb luck that I figured it out. So I am posting this bug in case anyone else runs into this issue.