Need GLIBC debug information from rpmbuild of updated source - gdb

I'm working on RHEL WS 4.5.
I've obtained the glibc source rpm matching this system, opened it to get its contents using rpm2cpio.
Working in that tree, I've created a patch to mtrace.c (i want to add more stack backtrace levels) and incorporated it in the spec file and created a new set of RPMs including the debuginfo rpms.
I installed all of these on a test vm (created from the same RH base image) and can confirm that my changes are included.
But with more complex executions, I crash in mtrace.c ... but gdb can't find the debug information so I don't get line number info and I can't actually debug the failure.
Based on dates, I think I can confirm that the debug information is installed on the test system in /usr/src/debug/glibc-2.3.6/
I tried
sharedlibrary libc*
in gdb and it tells me the symbols are already loaded.
My test includes a locally built python and full symbols are found for python.
My sense is that perhaps glibc isn't being built under rpmbuild with debug enabled. I've reviewed the glibc.spec file and even built with
_enable_debug_packages
defined as 1 which looked like it might influence the result. My review of the configure scripts invoked during the rpmbuild build step didn't give me any hints.
Hmmmm .. just found /usr/lib/debug/lib/libc-2.3.4.so.debug
and /usr/lib/debug/lib/tls/i486/libc-2.3.4.so.debug
but both of these are reported as stripped by the file command.

It appears that you are installing non-matching RPMs:
/usr/src/debug/glibc-2.3.6
just found /usr/lib/debug/lib/libc-2.3.4.so.debug
There are not for the same version; there is no way they came from the same -debuginfo RPM.
both of these are reported as stripped by the file command.
These should not show as stripped. Either they were not built correctly, or your strip is busted.
Also note that you don't actually have to get all of this working to debug your problem. In the RPMBUILD directory, you should be able to find the glibc build directory, with full-debug libc.so.6. Just copy that library into your VM, and you wouldn't have to worry about the debuginfo RPM.

Try verifying that debug info for mtrace.c is indeed present. First see if the separate debug info for GLIBC knows about a compilation unit called mtrace.c:
$ eu-readelf -w /usr/lib/debug/lib64/libc-2.15.so.debug > t
$ grep mtrace t
name (strp) "mtrace.c"
name (strp) "mtrace"
1 0 0 0 mtrace.c
[10480] "mtrace.c"
[104bb] "mtrace"
[5052] symbol: mtrace, CUs: 446
Then see if GDB actually finds the source file from the glibc-debuginfo RPM:
(gdb) set pagination off
(gdb) start # pause your test program right after main()
(gdb) set logging on
Copying output to gdb.txt.
(gdb) info sources
Quit GDB then grep for mtrace in gdb.txt and you should find something like /usr/src/debug/glibc-2.15-a316c1f/malloc/mtrace.c
This works with GDB 7.4. I'm not sure the GDB version shipped with RHEL 4.5 supports all the command used above. Building upstream GDB from source is in fact easier than Python though.
When trying to add strack traces to mtrace, make sure you don't call malloc() directly or indirectly in the GLIBC malloc hooks.

Related

No source file for Netaccel_link error on running program

I have an OCaml program that worked fine on Ubuntu 16 but when recompiled and run on Ubuntu 20 I get the following error:-
$ ocamldebug ./linearizer
OCaml Debugger version 4.08.1
(ocd) r
Loading program... done.
Time: 89534
Program end.
Uncaught exception: Sys_error "Illegal seek"
(ocd) b
Time: 89533 - pc: 624888 - module Netaccel_link
No source file for Netaccel_link.
I thought this was due to missing dev libraries but:-
$ sudo apt install libocamlnet-ocaml-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
libocamlnet-ocaml-dev is already the newest version (4.1.6-1build6).
0 upgraded, 0 newly installed, 0 to remove and 20 not upgraded.
What setup step am I missing on Ubuntu 20?
This looks like a regression bug in libocamlnet and you should report an issue there or, I am a bit pessimistic that you will get any response, you can try to debug the issue yourself.
The problem that you are facing has nothing to do with missing libraries (they will be reported during installation or, if the package is broken, end up in linker errors). It may result, however, from some misconfiguration of the system. If that is true, then you're lucky as you can fix it yourself.
I will give you some advice that might help you in debugging this issue. For more, please try using discuss.ocaml.org as a more suitable media (SO doesn't favor this kind of a discussion and we might get deleted by admins).
The illegal seek exception is thrown when the seek operation is applied on a non-regular file, aka ESPIPE Unix error. So check your inputs. It could be that what was previously regarded as a file in Ubuntu is now a pipe or a socket.
Try to use ltrace or strace to pinpoint the culprit e.g.,
ltrace ./linearizer
or, if it overwhelms you, try strace
strace ./linearizer
Instead of using ocamldebug you can use plain gdb. You can use gdb's interfaces to provide the path to the source code (though most likely it won't work since ocamlnet is not compiled with debug information). I believe that it will give you a more meaningful backtrace.
Instead of using the system installation try using opam. Install your dependencies with opam and try older versions as well as newer versions of the OCaml compiler. Also, try different versions of ocamlnet. Ideally, try to reproduce the environment that used to work for you.
When nothing else works, you can use objdump -d and look at the disassembly of your binary. OCaml is using a pretty readable and intuitive name mangling scheme (<module_name>__<function_name>_<uid>), so you can easily find the source code (search for <module_name>.ml file and look for the <function_name> there)
Finally, just use docker or any other container to run your application. Consider switching from ocamlnet to something more modern and supported.

loading libc's symbols into gdb

I'm debugging a binary with an older libc version than my system's one (I have libc-2.31, I'm running 2.24). I execute gdb with the LD_LIBRARY_PATH and it works like a charm, but I cannot load any symbols.
I downloaded the closest symbols file from http://archive.ubuntu.com/ubuntu/pool/main/g/glibc/libc6-dbg_2.23-0ubuntu11.2_amd64.deb, extracted it and after loading the binary into gdb, I execute:
add-symbol-file <path_to_libc-2.27.so from the deb package>
the file was loaded successfuly, but the addresses are incorrect. For example, trying to stop on a symbol such as 'main_arena' (x/40gx &main_arena) produces the following error:
0x3ebc40 <main_arena>: Cannot access memory at address 0x3ebc40
obviously this address is too low, thus I guess it's only the offset. What is my problem? maybe I need to find the exact debug file that suits my version (2.24)? because I there is no one.
Thanks!
I execute gdb with the LD_LIBRARY_PATH and it works like a charm,
It is not supposed to work, and if it happens to work today, it will likely break tomorrow.
The easiest solution is to debug inside a VM or a docker container with the desired version of GLIBC installed.
If you don't want to do that, see this answer on how to properly set things up for multiple GLIBCs on a single host.

Why is gdb refusing to load my shared objects and what is the validation operation

Main question:
In Ubuntu trying to debug an embedded application running in QNX, I am getting the following error message from gdb:
warning: Shared object "$SOLIB_PATH/libc.so.4" could not be validated and will be ignored.,
Q: What is the "validation" operation going on ?
After some research I found that the information reported by readelf -n libfoo.so contains a build-id and that this is compared against something and there could be a mismatch causing gdb to refuse to load the library. If that's the case what ELF file's build-id is the shared object's build-id compared against ? Can I find this information parsing the executable file ?
More context:
I have a .core file for this executable. I am using a version of gdb provided by QNX and making sure I use set sysroot and set solib-search-path to where I installed the QNX toolchain.
My full command to launch gdb in Ubuntu is :
$QNX_TOOLCHAIN_PATH/ntox86_64-gdb --init-eval-command 'set sysroot $SYSROOT_PATH' --init-eval-command 'set solib-search-path $SOLIB_PATH --init-eval-command 'python sys.path.append("/usr/share/gcc-8/python");' -c path-to-exe.core path-to-executable-bin
Gdb is complaining that it cannot load shared objects :
warning: Shared object "$SOLIB_PATH/libc.so.4" could not be validated and will be ignored.
The big thing here is to make sure you're using the exact same binary that is on the target (that the program runs over). This is often quite difficult with libc, especially because libc/ldqnx are sometimes "the same thing" and it confuses gdb.
The easiest way to do this is to log your mkifs output (on the linux host):
make 2>&1 | tee build-out.txt
and read through that, search for libc.so.4, and copy the binary that's being pulled onto the target into . (wherever you're running gdb) so you don't need to mess with SOLIB paths (the lazy solution).
Alternatively, scp/ftp a new libc (one that you want to use, and ideally one that you have associated symbols for) into /tmp and use LD_LIBRARY_PATH to pull that one (and DL_DEBUG=libs to confirm, if you need). Use that same libc to debug
source: I work at QNX and even we struggle with gdb + libc sometimes

compile gdb source rpm with symbols using rpmbuild

I want to make gdb rpm from gdb.spec file using rpmbuld which I can do without any problem but now in addition to that i want GDB to be complied with symbols so that when gdb is being attached to itself I should know the exact call flow and where exactly its failing.
Reason for doing this exercise is I am creating the application which will internally invoke gdb by calling gdb_init and going down failing with segmentation fault in gdb source code.
The easiest way to prevent stripping debug symbols
in rpm build is to add exit 0 at the end of %install.
The symbols are stripped by commands that are appended
to the %install scriptlet. Adding "exit 0" prevents the
commands from being run.
I don't know how you would to this with rpmbuild, but building gdb is really easy. Just get official source package, unpack it, then configure this way:
CFLAGS="-g3 -O0" path/to/gdb/source/configure --prefix path/to/your/installation/directory
make
make install
O0 is not strictly necessary, but if you want to debug a gdb crash, it will help.

How to load extra libraries for GDB?

I'm trying to debug a CUDA program, but when I'm launching gdb like so:
$ gdb -i=mi <program name>
$ r <program arguments>
I'm getting:
/home/wvxvw/Projects/cuda/exercise-1-udacity/cs344/HW2/hw:
error while loading shared libraries: libcudart.so.5.0:
cannot open shared object file: No such file or directory
Process gdb-inferior killed
(formatted for readability)
(I'm running gdb using M-xgdb) If that matters, then CUDA libraries are in the .bashrc
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
error while loading shared libraries: libcudart.so.5.0
This error has nothing to do with GDB: your executable, when run from inside GDB, can't find the library it needs.
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
GDB runs your program in a new $SHELL, so that should have worked. I wonder if there is some interaction with emacs.
In any case, this:
(gdb) set env LD_LIBRARY_PATH /usr/local/cuda/lib64
(gdb) run
should fix this problem.
Update:
as I've mentioned it before, ld path is set properly
No, it isn't. If it was, you wouldn't have the problem.
Now, I don't know why it isn't set properly. If you really want to find out, start by running GDB outside emacs (to exclude possible emacs interactions).
If the problem is still present, gdb show env, shell env, adding echo "Here" to your ~/.basrc, etc. should help you find where things are not working as you expect them.
I've had this problem as well. One way to look at it is that even if the LD_LIBRARY_PATH variable is correct when you enter show env into gdb, it may not be correct when you actually execute the program because gdb executes $SHELL -c <program> to run the program. Try this as a test, run $SHELL from the command line and then echo $LD_LIBRARY_PATH. Is it correct? If not, then you probably need to add it to your rc (.tcshrc in my case).
I had a similar problem when trying to run gdb on windows 7. I use MobaXterm to access a Linux toolbox. I installed gdb separately from http://www.gnu.org/software/gdb/ . I got it to work by making sure gdb could find the correct .dll files as mentioned by Employed Russian. If you have MobaXterm installed the .dll files should appear in your home directory in MobaXterm/slash/bin.
gdb however did not recognize the LD_LIBRARY_PATH variable. For me, it worked when I used the PATH variable instead:
(gdb) set env PATH C:\Users\Joshua\Documents\MobaXterm\slash\bin
(gdb) run
I would think using PATH instead of LD_LIBRARY_PATH might work for you provided you put the correct path to your library.
gdb is looking for a library, so why are you concerned with the include path? You may want to try to set the gdb option "solib-search-path" to point to the location of the libcudart.so.5.0 library.