How to debug loading shared library - c++

I have a multi-process program running on a MIPS CPU with uClibc, compiled with gcc 4.5.3.
One of the processes (named "tv") needs to link against a shared library (libtest.so) that I also wrote. The "tv" process is written in C++ and libtest.so is written in C.
I have also dumped the ELF header of libtest.so; it has both the PIC and CPIC flags set, so I think the library was built correctly.
When I run the program, all the processes start fine except "tv". There is no error message. When I check its status with ps, it has become a zombie process.
I have tried the following:
If I remove libtest.so from the link step and remove every reference to it, the "tv" process runs without any issue.
If I remove every reference to libtest.so BUT keep it in the link step, the "tv" process still does not run.
I have tried LD_DEBUG=all, but it does not work on my board: it produces no useful output.
So my guess is that something goes wrong when the dynamic loader tries to load libtest.so as "tv" starts. But I don't know how to debug this. How can I find out which part of libtest.so is causing the problem?
Any suggestions are welcome. Thanks in advance.

Make sure this isn't caused by a missing extern "C" declaration on the API functions that are meant to be called as C functions.
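As a rough sketch of what such a declaration looks like, assuming a hypothetical header libtest.h that exports a single function libtest_init() (both names are invented for illustration and do not come from the question):

```cpp
// libtest.h -- hypothetical header for libtest.so; all names are illustrative.
#ifndef LIBTEST_H
#define LIBTEST_H

#ifdef __cplusplus
extern "C" {   // C++ callers must refer to the unmangled C symbol names
#endif

int libtest_init(void);   // hypothetical entry point implemented in C

#ifdef __cplusplus
}
#endif

#endif /* LIBTEST_H */

// Stand-in definition so that this sketch links on its own; in a real
// project the body would live in the C sources of libtest.so.
extern "C" int libtest_init(void) { return 0; }
```

Note that a missing extern "C" normally shows up as an undefined-reference error at link time, so it is worth ruling out but may not explain a process that links fine and then dies silently at startup.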

You have an error in the load process. Write the simplest possible application that loads your library and immediately unloads it, and debug that.
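A minimal sketch of such a test loader, assuming the target's C library provides the standard dlopen()/dlerror()/dlclose() interface. The helper name try_load is made up for this example, and the code avoids C++11 features so that an older compiler such as gcc 4.5 can build it:

```cpp
#include <dlfcn.h>   // dlopen, dlerror, dlclose
#include <string>

// Try to load and immediately unload one shared library.
// Returns an empty string on success, or the loader's error text on failure.
std::string try_load(const char* path) {
    dlerror();  // clear any stale error state
    // RTLD_NOW resolves every symbol up front, so unresolved symbols are
    // reported here instead of at the first call into the library.
    void* handle = dlopen(path, RTLD_NOW);
    if (!handle) {
        const char* msg = dlerror();
        return msg ? std::string(msg)
                   : std::string("dlopen failed without an error message");
    }
    dlclose(handle);
    return std::string();
}
```

Call try_load("./libtest.so") from a small main() and print the returned string. On toolchains where dlopen() lives in libdl, the program typically needs -ldl at link time. If the library crashes in its own initialization code, this test program will crash inside dlopen() as well, which is still a much smaller target to debug than the full "tv" process.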

Related

gdb: how to learn which shared library loaded a shared library in question

I need to get the list of shared libraries used by an app at runtime. Most of them can be listed by ldd, but some can be seen only by attaching with gdb -p <pid> and running the gdb command info sharedlib.
It would really help if, for a chosen library in the list printed by info sharedlib, I could learn which library (from the same list) caused it to be loaded. Is there any way to learn it in gdb or in some other way? Sometimes I see a loaded library in the list and cannot work out why it is there or which previously loaded library pulled it in.
UPDATE:
I attach a screenshot of gdb showing the info that I need. I used a breakpoint on dlopen, as suggested in the comments and in the answer. The command x/s $rdi prints the first argument of dlopen, per the Linux System V ABI, i.e. it prints the name of the library whose loader I want to identify (libdebuginfod.so.1). I put it here just for those who are curious. In my case, it can be seen that libdebuginfod.so.1 was loaded by libdw.so.1 (as shown by the bt command).
Is there any way to learn it in gdb or in some other way?
There are a few ways.
You can run the program with env LD_DEBUG=files /path/to/exe.
This will produce output similar to:
LD_DEBUG=files /bin/date
76042:
76042: file=libc.so.6 [0]; needed by /bin/date [0]
It is the "needed by" part that you care about most.
You could also use GDB with set stop-on-solib-events 1. This will produce output similar to:
Stopped due to shared library event:
Inferior loaded /lib/x86_64-linux-gnu/libc.so.6
At that point, you could run the where command and see which dlopen() call caused the new library to be loaded.
You could also set a breakpoint on dlopen() and do the same.
The stop-on-solib-events may be better if your executable repeatedly dlopen()s the same library -- the library set will not change when you dlopen() the same library again, and you'll stop when you don't care to stop. Setting stop-on-solib-events avoids such unnecessary stops.

Is there a solution to "Someone allocated physical memory at VA 0x400000...0 without creating a VMA"?

I'm trying to use a cross-compiler to compile a C file into a RISC-V executable that simply prints the thread ID.
The program uses pthread.h and prints the thread ID in a for loop. It only calls pthread_create() and pthread_detach().
When I compile the file with gcc for x86, the program runs well.
However, when I compile it with riscv-linux-gnu-gcc into a RISC-V program and run it in gem5, gem5 reports the error as follows
So I tried to use --debug-flags= to find out where it goes wrong, but I got a message that I can't understand
For this, I have checked that there is an ld.so.cache file on my Linux system.
Though I know what a VMA is, there is no document or hint telling me how to handle this problem, for example how to create a VMA in gem5.
I would appreciate any help.
Thank you!

Why is gdb refusing to load my shared objects and what is the validation operation

Main question:
On Ubuntu, while trying to debug an embedded application that runs on QNX, I get the following error message from gdb:
warning: Shared object "$SOLIB_PATH/libc.so.4" could not be validated and will be ignored.
Q: What is the "validation" operation going on?
After some research I found that the information reported by readelf -n libfoo.so contains a build-id, that this is compared against something, and that a mismatch could cause gdb to refuse to load the library. If that's the case, which ELF file's build-id is the shared object's build-id compared against? Can I find this information by parsing the executable file?
More context:
I have a .core file for this executable. I am using a version of gdb provided by QNX and making sure I use set sysroot and set solib-search-path to where I installed the QNX toolchain.
My full command to launch gdb on Ubuntu is:
$QNX_TOOLCHAIN_PATH/ntox86_64-gdb --init-eval-command 'set sysroot $SYSROOT_PATH' --init-eval-command 'set solib-search-path $SOLIB_PATH' --init-eval-command 'python sys.path.append("/usr/share/gcc-8/python");' -c path-to-exe.core path-to-executable-bin
gdb complains that it cannot load shared objects:
warning: Shared object "$SOLIB_PATH/libc.so.4" could not be validated and will be ignored.
The big thing here is to make sure you're using the exact same binary that is on the target (the one the program actually runs against). This is often quite difficult with libc, especially because libc/ldqnx are sometimes "the same thing" and that confuses gdb.
The easiest way to do this is to log your mkifs output (on the Linux host):
make 2>&1 | tee build-out.txt
Then read through that, search for libc.so.4, and copy the binary that's being pulled onto the target into . (wherever you're running gdb) so you don't need to mess with SOLIB paths (the lazy solution).
Alternatively, scp/ftp a new libc (one that you want to use, and ideally one that you have associated symbols for) into /tmp and use LD_LIBRARY_PATH to pull that one in (and DL_DEBUG=libs to confirm, if you need to). Use that same libc to debug.
Source: I work at QNX, and even we struggle with gdb + libc sometimes.

Ocamlopt doesn’t produce any output, only an error code

I’m trying to call into a bulky C++ library from OCaml, and I’m having trouble with ocamlopt, which silently fails with error code 2.
I’m doing the whole dance of putting up a C interface, and I can get it to work in general, but as soon as I reference this library, the build breaks.
Is there some way to know what exactly is failing? I tried the -verbose flag, but it just prints the commandline arguments (which are quite long).
Would you have any tips as to how to investigate a silent failure like this?
TL;DR: check that you have enough memory and/or disk space.
Something like this can happen when ocamlopt is either killed by a signal or runs out of memory (or both). Check the dmesg output, look for OOM messages from the kernel, and use htop to get an idea of the memory footprint.
Also, since this happens when you're trying to link with the C++ library, it is most likely the ld process that is failing (again, most likely with OOM), as ocamlopt uses the system linker.
In case anyone else runs into this again: the problem was that there were too many -ccopt and -cclib arguments getting passed in by the build driver. When I started including a C++ library with lots of other dependencies, we seem to have reached the breaking point.
The solution was to change the build driver's OCaml compiler and linker rules to write all the compiler and linker args into files so they could all be passed in as a single -ccopt @<compiler.args> or -cclib @<linker.args> argument. Both gcc and ld support the @file command-line option.
GitHub issue: ocamlopt lets the compiler/linker silently fail if too many -ccopt or -cclib arguments are passed in

Why are there no debug symbols in my vmlinux when using gdb with /proc/kcore?

I've set all the CONFIG_DEBUG_ related options to y, but when I try to debug the kernel, gdb says no debugging symbols were found:
gdb /usr/src/linux-2.6.32.9/vmlinux /proc/kcore
Reading symbols from /usr/src/linux-2.6.32.9/vmlinux...(no debugging symbols found)...done.
Why?
Here is my best guess so far: I don't know, and it doesn't matter.
I don't know why GDB is printing the message "(no debugging symbols found)". I've actually seen this when building my own kernels. I configure a kernel to use debug symbols, but GDB still prints this message when it looks at the kernel image. I never bothered to look into it, because my image can still be debugged fine. Despite the message, GDB can still disassemble functions, add breakpoints, look up symbols, and single-step through functions. I never noticed a lack of debugging functionality. I'm guessing that the same thing is happening to you.
Edit: Based on your comments on the question, it looks like you were searching for the wrong symbol with your debugger. System call handlers start with a prefix of sys_, but you can't tell that from looking at the code. The macro SYSCALL_DEFINE4(ptrace, ...) just ends up declaring the function as asmlinkage long sys_ptrace(...), although it does some other crazy stuff if you have ftrace enabled.
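A much-simplified sketch of the token pasting involved. DEMO_SYSCALL_DEFINE0 and sys_demo are invented names for illustration; this is not the kernel's actual macro, which also handles argument lists and tracing metadata:

```cpp
// Simplified stand-in for the kernel's SYSCALL_DEFINEx family.
// "##" pastes the given name onto the sys_ prefix at preprocessing time.
#define DEMO_SYSCALL_DEFINE0(name) extern "C" long sys_##name(void)

// Written the way a handler appears in the source; after preprocessing
// this defines a function called sys_demo, and that is the only name
// the symbol table (and therefore the debugger) knows about.
DEMO_SYSCALL_DEFINE0(demo)
{
    return 42;
}
```

In the same way, a breakpoint or symbol lookup in gdb has to use sys_ptrace rather than ptrace.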
In make menuconfig, enable Kernel hacking -> Kernel debugging -> Compile the kernel with debug info (CONFIG_DEBUG_INFO).
It's also possible that the debug symbols were stripped when you packaged your vmlinuz image (for example, when using make-kpkg to build a deb package of the Linux kernel). In that case you have to use the vmlinux file built under your Linux source tree to get those debug symbols.
Add -g to the CFLAGS variable in the kernel Makefile
I might be wrong, but I thought you would have to install the debuginfo package for your kernel to get symbols.