I have an OCaml program that worked fine on Ubuntu 16 but when recompiled and run on Ubuntu 20 I get the following error:-
$ ocamldebug ./linearizer
OCaml Debugger version 4.08.1
(ocd) r
Loading program... done.
Time: 89534
Program end.
Uncaught exception: Sys_error "Illegal seek"
(ocd) b
Time: 89533 - pc: 624888 - module Netaccel_link
No source file for Netaccel_link.
I thought this was due to missing dev libraries but:-
$ sudo apt install libocamlnet-ocaml-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
libocamlnet-ocaml-dev is already the newest version (4.1.6-1build6).
0 upgraded, 0 newly installed, 0 to remove and 20 not upgraded.
What setup step am I missing on Ubuntu 20?
This looks like a regression bug in libocamlnet and you should report an issue there or, I am a bit pessimistic that you will get any response, you can try to debug the issue yourself.
The problem that you are facing has nothing to do with missing libraries (they will be reported during installation or, if the package is broken, end up in linker errors). It may result, however, from some misconfiguration of the system. If that is true, then you're lucky as you can fix it yourself.
I will give you some advice that might help you in debugging this issue. For more, please try using discuss.ocaml.org as a more suitable media (SO doesn't favor this kind of a discussion and we might get deleted by admins).
The illegal seek exception is thrown when the seek operation is applied on a non-regular file, aka ESPIPE Unix error. So check your inputs. It could be that what was previously regarded as a file in Ubuntu is now a pipe or a socket.
Try to use ltrace or strace to pinpoint the culprit e.g.,
ltrace ./linearizer
or, if it overwhelms you, try strace
strace ./linearizer
Instead of using ocamldebug you can use plain gdb. You can use gdb's interfaces to provide the path to the source code (though most likely it won't work since ocamlnet is not compiled with debug information). I believe that it will give you a more meaningful backtrace.
Instead of using the system installation try using opam. Install your dependencies with opam and try older versions as well as newer versions of the OCaml compiler. Also, try different versions of ocamlnet. Ideally, try to reproduce the environment that used to work for you.
When nothing else works, you can use objdump -d and look at the disassembly of your binary. OCaml is using a pretty readable and intuitive name mangling scheme (<module_name>__<function_name>_<uid>), so you can easily find the source code (search for <module_name>.ml file and look for the <function_name> there)
Finally, just use docker or any other container to run your application. Consider switching from ocamlnet to something more modern and supported.
Related
I recently went to try to debug a program with GDB and got the following error:
gdb: error while loading shared libraries: libncursesw.so.6: cannot open shared object file: No such file or directory
So I went investigating and tried the obvious things, ie sudo apt-get install libncursesw5 (and dev variants) and apt reports that I've already got the latest version...so next I tried reinstalling GDB, problem persists. The output of ldd with GDB confirms to me that it still doesn't know where this mythical libncursesw.so.6 file is, so I go digging around in the usr/lib/x86_64-linux-gnu folder and run ls libncu* which returns six results: libncurses.a, libncurses++.a, libncurses.so, libncurses++w.a, libncursesw.a, and libncursesw.so...but no libncursesw.so.6. I then naively attempted to just make a copy of libncursesw.so named libncursesw.so.6, to which gdb reports that this file is "too short".
In googling I can't seem to find a good explanation on how to get this file in place? Every other answer I see just suggests running sudo apt-get install libncursesw5 (or something similar) but I've already tried pretty much every variant of that I can think of. I was going to remove it and then reinstall it but when I went to do that it gave me a scary warning that I could be doing something potentially harmful to my system so I aborted that idea.
Some context that also might(?) help:
I'm running a pretty recent install of Linux Mint 19.3 Cinnamon, and this was my first time trying to run GDB on my new computer. I basically set this new computer up as a new install, just porting over my home directory and a couple of the more useful hidden . files from my old laptop...I figure this shouldn't be the reason GDB is failing/these files don't exist on the new machine but just in case I'm mentioning it.
obvious things, ie sudo apt-get install libncursesw5
You want libncursesw6, not libncursesw5.
I am trying to use ocamldebug with my project, to understand why a 3rd party lib I'm using is not behaving the way I expected.
https://ocaml.org/manual/debugger.html
The OCaml debugger is invoked by running the program ocamldebug with the name of the bytecode executable file as first argument
I have added (modes byte exe) to my dune file.
When I run dune build I can see the bytecode file output, alongside the exe, as _build/default/bin/cli.bc
When I pass this to ocamldebug I get the following error:
ocamldebug _build/default/bin/cli.bc
OCaml Debugger version 4.12.0
(ocd) r
Loading program... done.
Fatal error: debugger does not support channel locks
Lost connection with process 33035 (active process)
between time 170000 and time 180000
Restart from time 170000 and try to get closer of the problem ? (y or n)
If I choose y the console seems to hang indefinitely.
I found the source of the error here:
https://github.com/ocaml/ocaml/blob/f68acd1a618ac54790a8347fad466084f15a9a9e/runtime/debugger.c#L144
/* The code in this file does not bracket channel I/O operations with
Lock and Unlock, so fail if those are not no-ops. */
if (caml_channel_mutex_lock != NULL ||
caml_channel_mutex_unlock != NULL ||
caml_channel_mutex_unlock_exn != NULL)
caml_fatal_error("debugger does not support channel locks");
...but I don't know what might be triggering it.
My project is using cmdliner and lwt ...I think at this early point of execution it hasn't hit any lwt code though.
Is ocamldebug incompatible with cmdliner?
If that's the case then I will need to make a new entrypoint just for debugging I guess. (currently the bin/cli is the only executable artefact in my project, the code I need to debug is all under lib/s)
It looks like that the OCaml debugger is broken for your version of macOS. Please, report the issue to the OCaml issue tracker including the detailed information on your system. I can't reproduce it on my machine, but I am using a pretty old version of macOS (10.11.6) and I have the 4.12 debugger working flawlessly.
As a workaround, try using an older version of OCaml, as this channel lock test was introduced very recently you can install any version prior to 4.12,
opam switch create 4.11.0
eval $(opam env)
Then, do not forget to rebuild your project (previously installing the required dependencies),
opam install lwt cmdliner
dune build
and then you can use the debugger to your taste.
I am currently installing the HDF5 library, more precisely the hdf5-1.10.0-patch1, on Cygwin, as I want to use it with Fortran. Following the instructions from the hdfgroup website
(here is the link), I did the following:
./configure --enable-fortran
make > "out1_check.txt" 2> "warn1_check.txt" &
make check > "out2_check.txt" 2> "warn2_check.txt" &
The execution of the last command (make check) proceeds as it should, until it gets stuck. The process does not stop and something is happening (8-12% CPU are in use by sh.exe, already 39 hours of CPU time) but "out2_check.txt" looks like
Making check in src
...
[many successful checks]
...
============================
No need to test testlinks_env.sh again.
============================
============================
Testing testswmr.sh
Unfortunately, I do not have the output file from the first run of make check, but it did not contain more information on Testing testswmr.sh. There was never any error message.
So, what is this testswmr.sh, why does it get stuck and how can I finalize the installation process? Maybe I can skip the remaining checks and just proceed to make install?
Important note: an older version of HDF5 is already installed from the Cygwin repo. It does not seem to support Fortran however, so I decided to install the current version myself.
Available (and used) compilers are gcc and gfortran.
As far as I can tell, only Intel Fortran is supported on Windows. There is no Cygwin download here https://support.hdfgroup.org/HDF5/release/obtain518.html and I have never come across a report of experience for Cygwin/Fortran/HDF5.
Your options:
Use Intel Fortran
Use Linux or Mac
Sorry
I'm working on RHEL WS 4.5.
I've obtained the glibc source rpm matching this system, opened it to get its contents using rpm2cpio.
Working in that tree, I've created a patch to mtrace.c (i want to add more stack backtrace levels) and incorporated it in the spec file and created a new set of RPMs including the debuginfo rpms.
I installed all of these on a test vm (created from the same RH base image) and can confirm that my changes are included.
But with more complex executions, I crash in mtrace.c ... but gdb can't find the debug information so I don't get line number info and I can't actually debug the failure.
Based on dates, I think I can confirm that the debug information is installed on the test system in /usr/src/debug/glibc-2.3.6/
I tried
sharedlibrary libc*
in gdb and it tells me the symbols are already loaded.
My test includes a locally built python and full symbols are found for python.
My sense is that perhaps glibc isn't being built under rpmbuild with debug enabled. I've reviewed the glibc.spec file and even built with
_enable_debug_packages
defined as 1 which looked like it might influence the result. My review of the configure scripts invoked during the rpmbuild build step didn't give me any hints.
Hmmmm .. just found /usr/lib/debug/lib/libc-2.3.4.so.debug
and /usr/lib/debug/lib/tls/i486/libc-2.3.4.so.debug
but both of these are reported as stripped by the file command.
It appears that you are installing non-matching RPMs:
/usr/src/debug/glibc-2.3.6
just found /usr/lib/debug/lib/libc-2.3.4.so.debug
There are not for the same version; there is no way they came from the same -debuginfo RPM.
both of these are reported as stripped by the file command.
These should not show as stripped. Either they were not built correctly, or your strip is busted.
Also note that you don't actually have to get all of this working to debug your problem. In the RPMBUILD directory, you should be able to find the glibc build directory, with full-debug libc.so.6. Just copy that library into your VM, and you wouldn't have to worry about the debuginfo RPM.
Try verifying that debug info for mtrace.c is indeed present. First see if the separate debug info for GLIBC knows about a compilation unit called mtrace.c:
$ eu-readelf -w /usr/lib/debug/lib64/libc-2.15.so.debug > t
$ grep mtrace t
name (strp) "mtrace.c"
name (strp) "mtrace"
1 0 0 0 mtrace.c
[10480] "mtrace.c"
[104bb] "mtrace"
[5052] symbol: mtrace, CUs: 446
Then see if GDB actually finds the source file from the glibc-debuginfo RPM:
(gdb) set pagination off
(gdb) start # pause your test program right after main()
(gdb) set logging on
Copying output to gdb.txt.
(gdb) info sources
Quit GDB then grep for mtrace in gdb.txt and you should find something like /usr/src/debug/glibc-2.15-a316c1f/malloc/mtrace.c
This works with GDB 7.4. I'm not sure the GDB version shipped with RHEL 4.5 supports all the command used above. Building upstream GDB from source is in fact easier than Python though.
When trying to add strack traces to mtrace, make sure you don't call malloc() directly or indirectly in the GLIBC malloc hooks.
I'm trying to profile a C++ application, that I did not write, to get a sense for where the major computation points are. I'm not a C++ expert and even less so C++ debugging/profiling expert. I believe I am running into a (common?) problem with dynamic libraries.
I compile link to Google CPU Profiler using (OS X, G++):
env LIBS=-lprofiler ./configure
make
make install
I then run profile the installed application (jags) with:
env CPUPROFILE=./jags.prof /usr/local/bin/jags regression.cmd
pprof /usr/local/bin/jags jags.prof
Unfortunately, I get the error:
pprof /usr/local/bin/jags jags.prof Can't exec "objdump":
No such file or directory at /usr/local/bin/pprof line 2833.
objdump /System/Library/Frameworks/Accelerate.framework/Versions/A/
Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib: No such file or directory
The program dynamically links to libLAPACK.dylib. So prof does not seem to understand it (?). I thought about trying to statically link, but the documents associated with the program say that it is impossible to statically link in LAPACK or BLAS (two required libraries).
Is there a way to have the profiler ignore libLAPACK? I'm okay if it doesn't sample within libLAPACK. Or how might I get profiling to work?
This error was caused by jags being a shell script, that subsequently called profilable code.
pprof /usr/local/bin/REAL_EXEC jags.prof
fixes the problem.
I don't see a clean way to do it, but maybe there's a hacky workaround -- what happens if you hack the pprof perl script (or better a copy thereof;-), line 2834, so that instead of calling error it emits the message and then does return undef;?
If you're profiling on OSX, the Shark tool is really great as well. It's very simple to use, and has worked out of the box for me when I've tried it.