My project crashed after 5 days of testing. When I analyzed the dump file, it showed a bus error (SIGBUS).
Here is the relevant part of the backtrace:
Program terminated with signal SIGBUS, Bus error.
#0 0x0000000000000531 in ?? ()
[Current thread is 1 (LWP 902)]
(gdb) bt
#0 0x0000000000000531 in ?? ()
#1 0x000000000041a294 in CUtilsTimer::forgetTimer() ()
#2 0x0000000000415160 in CEMPLinkMonitor::monitor_ethernet_link_status() ()
#3 0x0000000000413fc8 in CEMPTransport::recvEMPData(Emp_Packet*) ()
#4 0x000000000041313c in CEMPRxTransport::run() ()
#5 0x00000000004190a8 in CUtilsThread::runLoop(void*) ()
#6 0x0000007fac289fb8 in ?? () from /lib/libpthread.so.0
#7 0x0000007fa74bdc98 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
I tried to find the root cause but could not find any clue. Please help.
When I start my program in cuda-gdb, I get an output like:
[New Thread 0x7fffef8ea700 (LWP 8003)]
[New Thread 0x7fffe35b2700 (LWP 8010)]
[New Thread 0x7fffe2db1700 (LWP 8011)]
[New Thread 0x7fffe25b0700 (LWP 8012)]
I do not understand why these multiple threads are launched in the beginning. I have not launched my program in multi-threaded mode. I am using MPI, but I start one process. So, where are these threads coming from?
This does not affect my debugging process in any way. It's just that I don't understand what this means.
These threads you see are created by the CUDA runtime library, and aren't directly related to cuda-gdb itself. If you run the same code with gdb, you will also see the same messages.
If you want to see what these threads are doing or where they come from, compile your code with the -g flag, set a breakpoint in your code (e.g., immediately before a CUDA kernel starts), run it, and then run the following command in the gdb console:
thread apply all backtrace
This command has the same effect as gdb's backtrace, except that it shows the backtrace for every thread in your process.
In my case, I get the following messages after starting my program:
[New Thread 0x7fffeffb3700 (LWP 7141)]
[New Thread 0x7fffef731700 (LWP 7142)]
[New Thread 0x7fffeef30700 (LWP 7143)]
When I run the command mentioned above in my gdb console, I see the following output:
(gdb) thread apply all backtrace
Thread 4 (Thread 0x7fffeef30700 (LWP 7143)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x00007ffff63c19b7 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff6386bb7 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff63c0f48 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff79bf064 in start_thread (arg=0x7fffeef30700) at pthread_create.c:309
#5 0x00007ffff6cce62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 3 (Thread 0x7fffef731700 (LWP 7142)):
#0 0x00007ffff6cc5aed in poll () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007ffff63bf6a3 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff642261e in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff63c0f48 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff79bf064 in start_thread (arg=0x7fffef731700) at pthread_create.c:309
#5 0x00007ffff6cce62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 2 (Thread 0x7fffeffb3700 (LWP 7141)):
#0 0x00007ffff6ccfa9f in accept4 (fd=13, addr=..., addr_len=0x7fffeffb2e18, flags=-1) at ../sysdeps/unix/sysv/linux/accept4.c:45
#1 0x00007ffff63c0556 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff63b404d in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff63c0f48 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff79bf064 in start_thread (arg=0x7fffeffb3700) at pthread_create.c:309
#5 0x00007ffff6cce62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Thread 1 (Thread 0x7ffff7fc0740 (LWP 7136)):
#0 main () at cuda_heap.cu:66
As you can verify, the thread addresses and LWP (Light Weight Process) IDs in the backtrace match those in the [New Thread] messages printed at startup. You can also see that all of these threads come from libcuda.so.1.
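As a rough illustration of why such messages appear even though the application code never creates a thread itself, here is a minimal C++ sketch. It is not how the CUDA runtime is implemented: the hypothetical BackgroundHelpers class merely stands in for a library that starts helper threads during initialization and parks them on a condition variable. The class name, the thread count, and the 100 ms timeout are all assumptions made for this example.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a library that spawns helper threads on startup.
class BackgroundHelpers {
public:
    // Start `count` helper threads; each one just waits until shutdown.
    explicit BackgroundHelpers(int count) {
        for (int i = 0; i < count; ++i) {
            workers_.emplace_back([this] { waitUntilStopped(); });
        }
    }

    // Ask every helper to stop, wake them up, and join them.
    ~BackgroundHelpers() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        wakeup_.notify_all();
        for (auto& worker : workers_) {
            worker.join();
        }
    }

    std::size_t size() const { return workers_.size(); }

private:
    // Timed wait in a loop, re-checking the stop flag after each wakeup.
    void waitUntilStopped() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stop_) {
            wakeup_.wait_for(lock, std::chrono::milliseconds(100));
        }
    }

    std::mutex mutex_;
    std::condition_variable wakeup_;
    bool stop_ = false;
    std::vector<std::thread> workers_;
};
```

If a program constructs BackgroundHelpers helpers(3); in main and is run under gdb, it should print three [New Thread ...] lines, and thread apply all backtrace at a breakpoint should show those threads blocked in the condition-variable wait, much like the libcuda.so.1 threads above.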
In cuda-gdb, you're able to see some more detailed information:
(cuda-gdb) thread apply all bt
Thread 4 (Thread 0x7fffeef30700 (LWP 10019)):
#0 0x00007ffff79c33f8 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007ffff63c19b7 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff6386bb7 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff63c0f48 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff79bf064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5 0x00007ffff6cce62d in clone () from /lib/x86_64-linux-gnu/libc.so.6
Thread 3 (Thread 0x7fffef731700 (LWP 10018)):
#0 0x00007ffff6cc5aed in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff63bf6a3 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff642261e in cuVDPAUCtxCreate () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff63c0f48 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff79bf064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5 0x00007ffff6cce62d in clone () from /lib/x86_64-linux-gnu/libc.so.6
Thread 2 (Thread 0x7fffeffb3700 (LWP 10017)):
#0 0x00007ffff6ccfa9f in accept4 () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff63c0556 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff63b404d in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff63c0f48 in cudbgApiDetach () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff79bf064 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#5 0x00007ffff6cce62d in clone () from /lib/x86_64-linux-gnu/libc.so.6
Thread 1 (Thread 0x7ffff7fc0740 (LWP 10007)):
#0 main () at cuda_heap.cu:66
I don't know exactly what they are for, but I think cuda-gdb needs to create multiple threads to catch errors and exceptions such as memory violations or bank conflicts.
My app crashes sometimes and I can't find the cause. The app is multithreaded (QThread) and uses several QUdpSockets. I think the crash is caused by simultaneous access to a socket, but I don't know when or where it happens.
Here are the results of bt on the core file:
#0 0x414596e1 in ?? ()
#1 0x412d731b in pthread_kill (thread=1649, signo=6) at signals.c:69
#2 0x412d76a0 in __pthread_raise (sig=6) at signals.c:200
#3 0x41459395 in ?? ()
#4 0x00000006 in ?? ()
#5 0x41546ff4 in ?? ()
#6 0xbd5fd8bc in ?? ()
#7 0x4145a87d in ?? ()
#8 0x00000006 in ?? ()
#9 0x00000020 in ?? ()
#10 0x00000000 in ?? ()
What is sig=6 and when is it emitted?
How can I determine the reason for this behavior?
How do I find out which -dev libraries are missing (the ?? positions in the stack)?
Signal number 6 on Linux is SIGABRT. The fact that it is being raised through __pthread_raise() suggests that the application either called abort() directly or hit a failed assert().
It's likely that the missing parts of your backtrace are in the Qt libraries, so try installing the debugging symbols for all of those.
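To make the signal number concrete, here is a small C++ sketch, not taken from the question's application. It temporarily installs a handler, raises SIGABRT, and returns the number the handler received. The function and variable names are hypothetical. std::raise is used rather than abort() so that the process survives the demonstration: abort() terminates the process even if a handler returns.

```cpp
#include <csignal>

// Last signal number seen by the handler; sig_atomic_t is safe to write from a handler.
static volatile std::sig_atomic_t g_lastSignal = 0;

// Handler that only records which signal was delivered.
extern "C" void recordSignal(int signo) {
    g_lastSignal = signo;
}

// Raise SIGABRT with a temporary handler installed and return the number received.
// Returns -1 if the handler could not be installed.
int raiseAndCaptureSigabrt() {
    g_lastSignal = 0;
    void (*previous)(int) = std::signal(SIGABRT, recordSignal);
    if (previous == SIG_ERR) {
        return -1;
    }
    std::raise(SIGABRT);             // delivered to the calling thread; handler runs, then we continue
    std::signal(SIGABRT, previous);  // restore the earlier disposition
    return g_lastSignal;
}
```

On Linux the returned value is 6, which matches the signo=6 and sig=6 arguments visible in the pthread_kill and __pthread_raise frames of the backtrace.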
I am working with the OpenSSL library. When I run my project, it crashes at this line of the source code:
m_pSslFd = SSL_new(m_pCtx);
The declaration and initialization are correct. Execution works fine the first time this library function is called, but it crashes the second time it is called.
Here is the gdb backtrace for the crash:
(gdb) bt
#0 0x0000003dee876285 in malloc_consolidate () from /lib64/libc.so.6
#1 0x0000003dee879415 in _int_malloc () from /lib64/libc.so.6
#2 0x0000003dee87a9a1 in malloc () from /lib64/libc.so.6
#3 0x00000032c1c6abee in CRYPTO_malloc () from /usr/lib64/libcrypto.so.10
#4 0x00000032c202986a in ssl3_new () from /usr/lib64/libssl.so.10
#5 0x00000032c203bfae in dtls1_new () from /usr/lib64/libssl.so.10
#6 0x00000032c204534c in SSL_new () from /usr/lib64/libssl.so.10
#7 0x00007ffff7882bf7 in DTLSCore::DoDTLSClientNegotiation (this=0x858940, iFd=@0x7fff635fd3bc, speer=...) at src/afg/DTLSCore.cpp:236
Any suggestions would be helpful. Thank you.
I have a multithreaded app that uses QThreadPool. It crashes after a random amount of time (sometimes minutes, sometimes hours...) with a segfault. I recompiled with debugging symbols and ran through GDB. Here's the backtrace:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7231170 in QSocketNotifier::type() const ()
from /opt/Qt/5.2.0/gcc_64/lib/libQt5Core.so.5
(gdb) where
#0 0x00007ffff7231170 in QSocketNotifier::type() const ()
from /opt/Qt/5.2.0/gcc_64/lib/libQt5Core.so.5
#1 0x00007ffff724b732 in ?? () from /opt/Qt/5.2.0/gcc_64/lib/libQt5Core.so.5
#2 0x00007ffff51d713b in g_main_context_check ()
from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3 0x00007ffff51d75c2 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#4 0x00007ffff51d7744 in g_main_context_iteration ()
from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5 0x00007ffff724c023 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /opt/Qt/5.2.0/gcc_64/lib/libQt5Core.so.5
#6 0x00007ffff71fa2cb in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () from /opt/Qt/5.2.0/gcc_64/lib/libQt5Core.so.5
#7 0x00007ffff71fe33e in QCoreApplication::exec() ()
from /opt/Qt/5.2.0/gcc_64/lib/libQt5Core.so.5
#8 0x0000000000409bb9 in main (argc=1, argv=<optimized out>) at main.cpp:166
That's the complete backtrace. It references/mentions basically no code within the app itself; it all appears to be Qt library code causing the fault. Not sure what source from the app itself to include in this post since GDB does not reference anything within the app itself. Any ideas?
I have a server application written in C++ that crashes every few hours, seemingly at random.
The worst part is that when I try to debug any of the core files with gdb, I see this result:
gdb --core=core.668 --symbols=selectserver
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
Core was generated by `./selectserver'.
Program terminated with signal 11, Segmentation fault.
[New process 672]
[New process 671]
[New process 670]
[New process 669]
[New process 668]
#0 0xb7866896 in ?? ()
(gdb) info threads
5 process 668 0xffffe410 in __kernel_vsyscall ()
4 process 669 0xffffe410 in __kernel_vsyscall ()
3 process 670 0xffffe410 in __kernel_vsyscall ()
2 process 671 0xffffe410 in __kernel_vsyscall ()
* 1 process 672 0xb7866896 in ?? ()
(gdb) bt
#0 0xb7866896 in ?? ()
#1 0x082da4b0 in ?? ()
#2 0xb79e4252 in ?? ()
#3 0xa2ba9014 in ?? ()
#4 0x0825e14c in ?? ()
#5 0x082da4b0 in ?? ()
#6 0xb56175e8 in ?? ()
#7 0x00000080 in ?? ()
#8 0xb5fe723f in ?? ()
#9 0xa2ba9014 in ?? ()
#10 0xa2ba9008 in ?? ()
#11 0xb7a32ff4 in ?? ()
#12 0x00000000 in ?? ()
(gdb) thread 2
[Switching to thread 2 (process 671)]#0 0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb7889486 in ?? ()
#2 0x00000000 in ?? ()
(gdb) thread 3
[Switching to thread 3 (process 670)]#0 0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb7889486 in ?? ()
#2 0x00000000 in ?? ()
(gdb) thread 4
[Switching to thread 4 (process 669)]#0 0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb7889486 in ?? ()
#2 0x00000000 in ?? ()
(gdb) thread 5
[Switching to thread 5 (process 668)]#0 0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb78b7de1 in ?? ()
#2 0x00000032 in ?? ()
#3 0xbf849ae8 in ?? ()
#4 0xbf8499e8 in ?? ()
#5 0x00000000 in ?? ()
(gdb) quit
I don't know what is going on, or why the addresses on the stack (other than __kernel_vsyscall) look so weird and do not map to symbols.
What do I need to do to find the problem and debug the memory dump?
Thanks for the help!
You need to compile the program with debugging symbols or get a separate file with debugging symbols. Pass the -g flag to gcc to enable these.
If you want to see what all of the functions are, even the ones inside library functions (for instance, standard library functions) you also need to get a version of the library with debugging symbols.
Starting gdb with the executable as the program argument, i.e. gdb --core=core.668 selectserver, fixed the problem.