TDengine daemon core dumped - gdb

Environment
OS: CentOS 7.9_x64
Memory, CPU, current Disk Space: Memory 96G, Disk 1T
TDengine Version: TDengine-server-2.0.20.13-Linux-x64
The TDengine taosd daemon dumped core.
gdb output:
[New LWP 5461]
[New LWP 5499]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/bin/taosd'.
Program terminated with signal 11, Segmentation fault.
#0 0x000056308db735cf in gcBuildQueryJson (pContext=0x7fdfdc0008c0, cmd=0x7fdfe00014a0, result=0x7fdfcc048ab0, numOfRows=682) at /home/ubuntu/workroom/jenkins/TDinternal/community/src/plugins/http/src/httpGcJson.c:154
154 /home/ubuntu/workroom/jenkins/TDinternal/community/src/plugins/http/src/httpGcJson.c: No such file or directory.
Missing separate debuginfos, use: debuginfo-install glibc-2.17-324.el7_9.x86_64
How can I resolve it?

How can I resolve it?
It's a bug in TDengine-server. You don't "resolve" bugs.
You can try to figure out what the bug is (via debugging), or you can try a newer version of TDengine-server (the current one appears to be 2.2.0.2) and hope that the particular bug you've hit has been fixed.
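A reasonable first debugging step is to collect backtraces from the core file non-interactively. The sketch below only builds the gdb command line; the helper name `gdb_backtrace_argv` and the core file name are hypothetical, and the gdb options used (`--batch`, `-ex`) are standard ones.

```python
import shlex


def gdb_backtrace_argv(executable, core_file):
    """Build an argv list for a non-interactive gdb run over a core file.

    Both paths are supplied by the caller; nothing here is specific
    to TDengine.
    """
    return [
        "gdb",
        "--batch",                     # exit after running the commands below
        "-ex", "bt full",              # crashing thread's backtrace, with locals
        "-ex", "thread apply all bt",  # backtraces of every thread
        executable,
        core_file,
    ]


if __name__ == "__main__":
    # Hypothetical core file name; substitute the real one.
    argv = gdb_backtrace_argv("/usr/bin/taosd", "core.5461")
    print(shlex.join(argv))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run` on a machine that has gdb and the core file. Without matching debug symbols the output will still be limited to function names and addresses.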

Related

How to get back console of a running gdb process?

I had attached gdb to a long-running process (>25 hours). To manage the session, I used screen on my Ubuntu machine. I could get the session back, and I got the gdb console back. But on continuing, I saw my process receive SIGABRT and exit, followed by other exit messages.
[New LWP 122]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007fe8ef29ea15 in futex_abstimed_wait_cancelable (private=0, abstime=0x7ffc8c628420, expected=0, futex_word=0x7fe8e6378640) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
205 ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
(gdb) c
Continuing.
(gdb) [Thread 0x7be8d8bfd700 (LWP 48) exited]
Thread 32 "my-process" received signal SIGABRT, Aborted.
[Switching to Thread 0x7be8d2bbd700 (LWP 60)]
0x00007fe8eece4428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) c
Continuing.
Couldn't get registers: No such process.
Couldn't get registers: No such process.
Couldn't get registers: No such process.
(gdb) [Thread 0x7be8b67ff700 (LWP 119) exited]
[Thread 0x7be8b49fe700 (LWP 122) exited]
...
After that I cannot get the gdb console back, though I still see the gdb process when I run ps -ef:
root 133 0 1 Jan14 ? 00:26:09 gdb --pid=23
How do I get back the console for this gdb process? I wanted to see the backtrace.
Or is there a better way to attach gdb to a long-running process?

GDB with corefile on remote embedded device - How to get more information about backtrace?

I have a core dump from a C++ application running on an embedded imx6 board (yocto linux). I can put gdb on the box and run it in a terminal to examine the core file like so just fine:
gdb myApplication core.udpsrc256:src.1520419431.5526
I get extremely limited information, and really need to know more about what caused the core dump. All I have is a printout from the application:
(myApplication:5526): GLib-ERROR **: ../../glib-2.46.2/glib/gmem.c:100: failed to allocate 65611 bytes
./run-app.sh: line 8: 5526 Trace/breakpoint trap (core dumped) XDG_RUNTIME_DIR=/run/user/root ./myApplication
Also the core dump backtrace gives some useless stuff. I need to know more stuff up the stack that led to this frame:
#0 0x75ff1910 in raise () from /lib/libc.so.6
[Current thread is 1 (LWP 5533)]
(gdb)
(gdb)
(gdb) bt
#0 0x75ff1910 in raise () from /lib/libc.so.6
#1 0x6b169558 in g_logv () from /usr/lib/libglib-2.0.so.0
#2 0x6b169610 in g_log () from /usr/lib/libglib-2.0.so.0
#3 0x6b1681c4 in g_malloc () from /usr/lib/libglib-2.0.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Side note: there are some warnings when I start up gdb:
GNU gdb (GDB) 7.10.1
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-poky-linux-gnueabi".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from qt5qmlvideo...done.
warning: exec file is newer than core file.
[New LWP 5533]
[New LWP 5526]
[New LWP 5531]
[New LWP 5528]
[New LWP 5534]
[New LWP 21064]
[New LWP 5536]
[New LWP 21065]
[New LWP 5532]
[New LWP 5527]
[New LWP 5530]
[New LWP 5537]
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Core was generated by `./qt5qmlvideo -platform wayland'.
Program terminated with signal SIGTRAP, Trace/breakpoint trap.
#0 0x75ff1910 in raise () from /lib/libc.so.6
[Current thread is 1 (LWP 5533)]
(gdb)
Can anyone help? Do I need some of the stuff gdb warns about, or can I rebuild my application and its dependencies in some other configuration that would give more output? Thank you!
Some more notes that may matter:
This is a multithreaded application running a gstreamer pipeline. Many gstreamer plugins create their own threads, and one of them in this pipeline is 'udpsrc'. I'm wondering whether I can't get details because the failure happens in one of those threads, but I want to know how to get gdb to show the details if possible!
(1) The
Do you need "set solib-search-path" or "set sysroot"?
is a problem. Check the path (on your device) where linux-vdso.so.1 resides, and include that in the solib-search-path. Similarly for the other shared-object libraries that your program uses. E.g. if some shared-object libraries are in /lib, some are in /usr/adowdy/lib and some are in /usr/adowdy/arm/lib, you can say:
(gdb) set solib-search-path /lib:/usr/adowdy/lib:/usr/adowdy/arm/lib
(2) The
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
is also a problem. See the answer to this question
(3) The
failed to allocate 65611 bytes
is a clue. Are you, by any chance, trying to allocate a negative number of bytes (maybe 65536 - 65611 = -75 bytes)?
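The negative-size hypothesis in (3) can be checked with a little arithmetic: compute what a small negative count looks like once it is converted to an unsigned size. This is only a sketch, assuming two's-complement conversion; `as_unsigned` is a hypothetical helper and the widths are illustrative.

```python
def as_unsigned(value, bits):
    """Reinterpret a signed integer as an unsigned integer of the given width."""
    return value % (1 << bits)


if __name__ == "__main__":
    requested = 65611  # size reported in the GLib error message
    for bits in (16, 32, 64):
        # What a request of -75 bytes would look like as an unsigned size.
        print(bits, as_unsigned(-75, bits))
    # How far the reported size is above 64 KiB.
    print(requested - 65536)
```

None of the wrapped values equals 65611, which is 65536 + 75. Under these assumptions the failing request looks like a genuine allocation of just over 64 KiB rather than a wrapped negative number.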
Also the core dump backtrace gives some useless stuff.
It's not entirely useless. The stack trace, and the message from the application say the same thing: your application ran out of memory (malloc failed to allocate 65611 bytes).
While a more complete stack would tell you which particular call to g_malloc failed, it's very likely to not matter in practice -- if this g_malloc didn't fail, the next one would.
You likely have a memory leak, or are simply allocating too much memory for what your system allows.
You should look into the many debugging tools built for exactly this problem, such as Valgrind or other leak detectors and heap profilers.

Eigen SIGSEGV on Solaris gcc 4.9.0 with debug flags on

I am compiling the unit tests (via GoogleTest) for my program, and whenever I compile in DEBUG mode on Solaris 11.3 with Eigen 3.2.x, I get this SIGSEGV followed by a core dump when running the program in gdb:
(gdb) r
...
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1 (LWP 1)]
0x0830fc30 in Eigen::internal::ploadu (from=0xfeffe5a0) at ./eigen/Eigen/src/Core/arch/SSE/Complex.h:307
307 { EIGEN_DEBUG_UNALIGNED_LOAD return Packet1cd(ploadu((const double*)from)); }
(gdb)
When I print from in gdb, this is what I get:
gdb p from: (const std::complex< double > *) 0xfeffe5a0
This SIGSEGV occurs only on Solaris, and only when compiling with -Og. I've compiled and tested the program on multiple other OSes with no issues whatsoever. Is this a known issue? It looks like it has to do with some SSE optimizations and alignment, but I cannot pinpoint what exactly is going on.

Show errors in my gdb log report

Here is a sample session taken directly from the gdb console:
Starting program:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, 0x00000000025654f0 in ~F()
(gdb) bt
#0 0x00000000025654f0 in ~F()
at hello.cpp:123
(gdb) c
Continuing.
foo.cpp:122:12: runtime error: member call on null pointer of type 'Object'
Here is my .gdbinit file
set pagination off
set language c++
set print pretty on
set logging file gdb.txt
set logging on
break ~F()
info breakpoints
r
bt
c
set logging off
quit
and the gdb.txt produced looks something like this:
Breakpoint 1 at 0x25654f0
Num Type Disp Enb Address What
1 breakpoint keep y 0x00000000025654f0 <~F()>
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, 0x00000000025654f0 in ~F()
#0 0x00000000025654f0 in ~F()
....
Breakpoint 1, 0x00000000025654f0 in ~F()
I don't see foo.cpp:122:12: runtime error: member call on null pointer of type 'Object' coming out in the log. How do I get that into my log?
Thanks
That message comes from your program, not from gdb.
One way to make it work is to have your program and gdb write to the same log file. The only trick is to make sure both write in "append" mode. gdb controls this with the "set logging overwrite" subcommand (it appends by default); for your program, you can run it like this:
(gdb) run >> log
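Why append mode matters when two writers share one log file can be shown with a small experiment. This is only a sketch: `write_both` is a hypothetical helper, and the two file handles stand in for gdb and the debugged program.

```python
import os
import tempfile


def write_both(path, mode):
    """Open the same file twice with `mode`, write one line through each
    handle, and return the final file contents."""
    with open(path, mode) as gdb_log, open(path, mode) as program_log:
        gdb_log.write("gdb line\n")
        gdb_log.flush()
        program_log.write("program line\n")
        program_log.flush()
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Truncating mode: the second writer starts at offset 0 and
        # overwrites what the first writer produced.
        print(repr(write_both(os.path.join(d, "w.log"), "w")))
        # Append mode: every write lands at the current end of the file.
        print(repr(write_both(os.path.join(d, "a.log"), "a")))
```

With "w" the gdb line is lost, while with "a" both lines survive in the order they were written. The same reasoning is why the shell redirection above uses >> rather than >.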

How to view segmentation fault (core dumped)

I am stuck trying to inspect a core dump.
This is what I got when I typed:
gdb normal_estimation core
Reading symbols from /home/sai/Documents/pcl_learning/normal_estimation/build/normal_estimation...(no debugging symbols found)...done.
warning: core file may not match specified executable file.
[New LWP 11816]
warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
Core was generated by `./normal_estimation'.
Program terminated with signal 11, Segmentation fault.
#0 0xb53101d6 in free () from /lib/i386-linux-gnu/libc.so.6
(gdb)
Please let me know what I should do.
Program terminated with signal 11, Segmentation fault.
#0 0xb53101d6 in free () from /lib/i386-linux-gnu/libc.so.6
The first command you need to learn is backtrace (or its synonym: where).
This will tell you which code invoked the free, which crashed.
However, it is possible that that code has nothing to do with the actual problem: any crash in free is always caused by heap corruption of some sort (freeing un-allocated memory, freeing the same memory twice, writing to memory that has already been freed, or overflowing an allocated buffer).
The most useful tools to diagnose heap corruption on Linux are Valgrind and AddressSanitizer. Chances are either of these tools will tell you exactly what you are doing wrong.
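The kinds of heap corruption listed above can be illustrated with a toy model. This is not how Valgrind or AddressSanitizer are implemented, and `ToyHeap` is a hypothetical class. It only shows the sort of bookkeeping that lets a checker name the offending operation at the moment it happens, instead of crashing later inside free.

```python
class ToyHeap:
    """Toy allocator bookkeeping: hands out integer 'addresses' and records
    which ones are currently live. Purely illustrative."""

    def __init__(self):
        self._next = 0x1000
        self._live = {}       # address -> size of the live block
        self._freed = set()   # addresses that have been freed

    def malloc(self, size):
        addr = self._next
        self._next += size + 16  # addresses are never reused in this model
        self._live[addr] = size
        return addr

    def free(self, addr):
        """Return a diagnosis string instead of corrupting anything."""
        if addr in self._live:
            del self._live[addr]
            self._freed.add(addr)
            return "ok"
        if addr in self._freed:
            return "double free"
        return "free of unallocated memory"

    def write(self, addr, offset, length):
        """Check a write of `length` bytes at `offset` into the block at `addr`."""
        if addr in self._freed:
            return "write to freed memory"
        size = self._live.get(addr)
        if size is None:
            return "write to unallocated memory"
        if offset < 0 or offset + length > size:
            return "buffer overflow"
        return "ok"


if __name__ == "__main__":
    heap = ToyHeap()
    p = heap.malloc(32)
    print(heap.write(p, 0, 40))  # overflows the 32-byte block
    print(heap.free(p))
    print(heap.free(p))          # same block freed a second time
```

In a real program none of these mistakes is reported when it is made; the damage tends to surface later, often inside an unrelated call to free, which is why the backtrace alone can be misleading.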