OCaml statically detect dependency on non-pervasives library in standard distribution - ocaml

Certain modules that ship with OCaml like Unix and Bigarray have their own .cmx and .cmxa files in ocamlopt -where (which is ~/.opam/4.03.0/lib/ocaml on my system in my current opam switch).
Is there a way to determine without compiling which source files depend on which of these "special" libraries in the standard distribution? I'm intending to consume this output later in a Makefile.
The following program example.ml
open Unix;;
Unix.system "echo hi";;
Can be compiled using ocamlfind ocamlopt -package unix -linkpkg example.ml. I'm not sure how to compile it without going through the ocamlfind wrapper.
I'm wondering if there's a way to statically detect that the unbound-in-this-file module Unix corresponds to "something" in the standard distribution and report unix.cmxa as a dependency. ocamldep does not seem to report it as a dependency by default.
ocamldep -all example.ml just reports that the various object and interfaces files that can be produced using example.ml depend on example.ml. I was hoping for either an error message complaining that ocamldep doesn't understand the Unix module or some indication that it's required to build the objects.
$ ocamldep -all example.ml
example.cmo example.cmi : example.ml
example.cmx example.o example.cmi : example.ml

I understand that your question is:
For a given module name, say Unix, how can we find the library which provides it?
Unfortunately there is no such a tool (yet).
If we limit the search space to the libraries come with the OCaml compiler itself, I would do:
$ ocamlobjinfo $HOME/.opam/4.03.0/lib/ocaml/*.cma | grep '^\(File\|Unit name\)'
This will list all the modules defined in each archive. You may or may not find the module name in the result.
It is impossible in general, since the library you seek may not be standard or may not be installed locally. You can use API search engines like ocamloscope but they never cover all the OCaml libraries ever written of course.

Though modules might be packed into libraries with arbitrary names, the module interfaces still preserve a one to one mapping between top-level module names and compiled module interface file names. So if you have an error 'Unbound module Xxx`, the you can do
find ~/.opam -iname Xxx.cmi
If you didn't find any, then it means, that such library is not installed. And currently, there is no well-established way to find out which package provides this module, you can use Google, ask people on mailing lists or discussion forums, or try to use apt-file hoping that the library is in a standard distribution.
If the search returned exactly one folder, then you're lucky, you got the package. The package may contain object files of different genera (.cmx - for native code, .cmo - for the bytecode), as well as libraries (.cma - is a collection of .cmo, .cmxa - is a collection of .cmx, and .cmxs is a dynamic version of .cmxs). The flexibility of OCaml, that is both a boon and a bane, allows any of these files to be missing. Well-mannered libraries usually provide all these files, as well as have a naming convention that package name matches with the library name. But if you're using ocamlfind and the folder has the META file, then the name of the folder is the name of the package that you need to pass to ocamlfind in order to link the libraries from this package.
If you have more than one results, then you need to use common sense to determine which of the two libraries you need to use. Alternatively, you may try to use one and another and see which one compiles.

Related

Loading OCaml modules not in the current directory

I'm writing a large OCaml project. I wrote a file foo.ml, which works perfectly. In a subdirectory of foo.ml's directory, there is a file bar.ml.
bar.ml references code in foo.ml, so its opening line is:
open Foo
This gives me an error at compile time:
Unbound module Foo.
What can I do to fix this without changing the location of foo.ml?
The easy path is to use one of OCaml build system like ocamlbuild or oasis. Another option would be jbuilder but jbuilder is quite opiniated about file organization and does not allow for the kind of subdirectory structure that you are asking for.
The more explicit path comes with a warning: OCaml build process is complicated with many moving parts that can be hard to deal with.
After this customary warning, when looking for modules, OCaml compiler first looks for module in the current compilation environment, then looks for compiled interface ".cmi" files in the directories specified by the "-I" option flags (plus the current directory and the standard library directory).
Thus in order to compile your bar.ml file, you will need to add the parent directory in the list of included directories with the -I .. option.
After all this, you will discover that during the linking phase, all object files (i.e. .cmo or .cmx) need to be listed in a topological order compatible with the dependency graph of your project.
Consequently, let me repeat my advice: use a proper build system.

Don't Link all Standard Library Modules when Compiling OCaml

I'm putting together an intro OCaml project for a CS class and part of it involves implementing list operations. I want them to be able to use Pervasives, but not List or any other standard library modules. Is there a way to set up ocamlbuild so it only links against Pervasives in the standard library?
You can use the -nostdlib option of the compilers but that will hide both Pervasives and List.
What you want is difficult to achieve since both compilation units are part of the same library archive namely stdlib.cma.
You could maybe try to compile your own copy of Pervasives and use the above flag.
I see two opportunities: either remove module directly from the OCaml standard library or hide them by overloading with a module with different (possibly empty) signature.
The first variant requires editing OCaml distribution Makefiles. With opam and is not that scary, actually, as you can patch OCaml quite easily and distribute each patched OCaml as a separate compiler. To remove module from the stdlib archive you will need to edit stdlib/Makefile.shared, stdlib/StdlibModules, and stdlib.mllib. After you've removed the unnecessary modules, you can do:
./configure
make world.opt
make install
Another option is to (ab)use the -open command line argument of ocamlc. When this option is specified with a name of a module, this module will be automatically opened in the compiled program. For example, you can write your own overlay over a standard library, that has the following interface (minimal.mli):
module List = sig end (* or whatever you want to expose *)
and then you can compile either with ocamlc -open minimal ..., or, with ocamlbuild: ocamlbuild -cflags -open,minimal ... (you can also use _tags file to pass the open flag, or write an ocamlbuild plugin).

How to determine in which .SO library is given C function?

I have this problem all the time in Linux programming. As long as all the manuals and almost all the source code for Linux are C-centric, all references to some function needs only some include <something.h> line and the function is accessible from the C/C++ code.
But I am programming in assembly language and know almost nothing about C/C++.
In order to be able to call some function, I have to import it from the corresponding .so library.
How to determine the file name of the library? It often differs from the name of the library itself and is not specified in the manuals.
For example, the name of the XLib is actually libX11.so.6. The name of the XShm extension library seems to be libXext.so.6.
Is there easy way to determine the secret real name of the library, using provided C manuals and references?
This is another not-100%-accurate method that may give you some ideas as to how you can narrow things down a bit. It doesn't exactly fit the question because it uses common linux utilities instead of man files, but it may still be helpful.
Use your distribution's package management software.
For example, on Arch Linux, if you were interested in a function in GLFW/glfw3.h, you could find out who owns that file:
$ pacman -Qo /usr/include/GLFW/glfw3.h
/usr/include/GLFW/glfw3.h is owned by glfw 3.1-1
Find out which .so files are in that package:
$ pacman -Ql glfw | grep 'so$'
glfw /usr/lib/libglfw.so
And, if needed, find the actual file that link points to:
$ readlink -f /usr/lib/libglfw.so
/usr/lib/libglfw.so.3.1
This will depend on your distribution. I believe on Ubuntu/Debian you'd use dpkg-query instead.
Edit: DevSolar points out in a comment that you can use apt-file search <header> and apt-file list <package> instead of dpkg-query -S <header> and dpkg-query -L <package>. apt-file appears to work even for packages that aren't installed (though it seems slower?).
I also noticed that (on my Ubuntu VM at least) that, e.g., libglfw-dev contains the libglfw.so symlink, while libglfw2 contains the actual libglfw.so.2 object.
Once you have a set of .so files, you can check them for whatever function you are interested in:
$ nm -D /usr/lib/libglfw.so | grep "glfwCreateWindow"
0000000000007cd0 T glfwCreateWindow
Note that I pulled this last step from a comment on the previous question and don't fully understand it. Maybe you could even skip the earlier steps and rely on nm and grep alone?
This is not a sure fire way, but it can help in many cases.
Basically, you can usually find the library name at the bottom of the man page.
Eg, man XCreateWindow says libX11 on the last line. Then you look for libX11.so and use nm or readelf to see all exported functions.
Another example, man XShm says libXext at the bottom. And so on.
UPDATE
If the function is in section (2) of the man pages, it's a system call (see man man) and is provided by glibc, which would be libc-2.??.so.
Lastly (thanks Basile), if the function does not mention the library, it is also most likely provided by glibc.
DISCLAIMER: Again this is not a 100% accurate method -- but it should help in most cases.
You can ask gcc to tell you which file it would use for linking like so:
gcc --print-file-name=libX11.so
Sample output:
/usr/lib/gcc/x86_64-linux-gnu/4.9/../../../x86_64-linux-gnu/libX11.so
This file will usually be a symlink, so you'll have to pipe it through readlink or realpath to get the actual file. For example:
readlink -f $(gcc --print-file-name=libXext.so)
Sample output:
/usr/lib/x86_64-linux-gnu/libXext.so.6.4.0
As I commented, you could use gcc to link your program, and then it should be able to accept -lX11 ; by using gcc -v instead of gcc you'll find out what is actually linked and how.
However, you have a much more significant issue than finding the lib*.so.*; most C or C++ APIs are described in header files, and these C or C++ header files also contain symbolic constants (like O_RDONLY for open(2)...) or macros (like WIFEXITED in POSIX wait ...) whose value or expansion you should manually find in header files or documentations. (Quite often, such constants are either preprocessor #define-d constants or enum values). Also, some headers -in particular in C++- contains a lot of inline-d functions (or macros)!
A possible way might be to generate some C files to find all these constants, enums, macros, inlined functions..., and/or to customize the GCC compiler (e.g. with MELT ...) to find them.
So my message is that for better or worse, the C language is deeply tied to Linux & POSIX.
You might restrict yourself to use only syscalls(2) from your assembler code. Then you won't use libX11 and you don't need any header or constant (except the ones for syscalls, starting from <asm/unistd.h>).
BTW, in 2015, coding entirely in assembler for performance reasons is a mistake. The compiler is generating better code than you reasonably can (as soon as you have more than a few hundred machine instructions). In practice, you can code in assembler with GCC by using extended asm instructions in your C functions.
Or are you building your own compiler ? Then you should have told so in your question!
Read also the Program Library HowTo & the Linux Assembly HowTo

Creating an OCaml library

I am trying to create a library that I can use in other OCaml projects, and I'm totally lost.
I'm currently using ocamlbuild which is great for spitting out executables, but I don't know how to get a library out of it.
I've discovered the -a option in ocamlopt and ocamlc but I'm not really sure how to use it. The documentation I've found (for example, here), seems to assume some preexisting knowledge. I don't even know what a .a file is. After I run that, which of the outputted files do I need to build a project that depends on this library? Do I need the mli files so that the application knows the signatures of the library code, or is that included in the output somehow? Also, it would be nice to be able to package all the files together, something similar to a .jar file for Java.
In any case, I would love for ocamlbuild to do all of this for me, since if I have to invoke ocamlopt -a I will have to either manually specify dependencies or hack a script around ocamldep -- something that ocamlbuild was supposed to fix. However, I don't know how to tell it to build a library.
I'm willing to use oasis or OPAM or something if it's necessary, but I would like to learn how to do this using just the basic tools first.
OCamlbuild has some built-in functionality for building libraries, so you can get started with just ocamlbuild foo.cma foo.cmxa (assuming foo.ml is your entry point). This will invoke ocamlopt -a and ocamlc -a for you, handling all the dependency plumbing and leaving the generated files inside _build.
That should be enough to let you compile a library and link it from another program. Since this is just a test you can simply point at the aforementioned _build with -I when compiling the program that uses the library. For real use a library should be packaged - when you get to that point you'll want to look into ocamlfind, oasis, etc.
Have a look at the ocaml.org tutorial on compiling OCaml projects. Additionally the official manual for the bytecode and native code compilers contains useful detail on producing and using the various types of files.
The documentation for ocamlbuild archives seems to cover this pretty well.
In any case, here's one way to do ocaml libraries. Let's say you have a directory called foo containing your .ml, .mli, and .mllib files. Let's say it contained bar.ml, bar.mli, baz.ml, and baz.mli. To distribute all this as one library, you'd also have a foo.mllib in that directory, whose contents are
Bar
Baz
Then to compile, do
$ ocamlbuild -use-ocamlfind foo.cma foo.cmxa
Here is an example.
Then to use your library foo, let's say you had a sibling directory called main, and main contains main.ml, _tags, myocamlbuild.ml.
myocamlbuild.ml should have the following contents:
open Ocamlbuild_plugin
open Command
let () =
dispatch (
function
| After_rules ->
ocaml_lib
~extern:true
~dir:"/path/to/foo/_build"
"foo"
| _ -> ()
)
_tags should have the following contents:
<main.{ml,native,byte}>: use_foo
Compile main.ml with
$ ocamlbuild -use-ocamlfind main.byte main.native
run with
$ ./main.byte
$ ./main.native
More information here as well: https://ocaml.org/learn/tutorials/ocamlbuild/Using_an_external_library.html

using OCaml Batteries Included as a vanilla cma

I am a bit frustrated and confused by the OCaml Batteries Included concept and the way most tutorials I could find proceed. Before I get to use "productivity" tools like GODI or replace invocations of ocamlc with ocamlfind batteries/ocamlc (which is, at this point, too magical for me) I was hoping to be able to simply use OCaml Batteries Included core set of libraries like any other library. To that end I downloaded the latest source from git (head hash: 9f94ecb) and did a make all. I noticed that I got three .cma libraries at ./_build/src/ together with 102 .cmi files in the same directory. So I assumed that compiling with the -I switch pointing to that directory and linking with one of the three .cma libraries found there would be enough without needing to "install" the Batteries or use the platform tools. To test that, I set out to produce an executable for the following simple program I found somewhere:
(* file euler001.ml *)
open BatEnum
open BatPervasives
let main () =
(1--999)
|> BatEnum.filter (fun i -> i mod 3 = 0 || i mod 5 == 0)
|> BatEnum.reduce (+)
|> BatInt.print stdout
let _ = main ()
I was able to compile it with:
ocamlc -c -I ../batteries-included/_build/src/ euler001.ml
but when I tried to link with:
ocamlc -o euler001 unix.cma nums.cma ../batteries-included/_build/src/batteries.cma euler001.cmo
I got:
File "_none_", line 1, characters 0-1:
Error: Error while linking ../batteries-included/_build/src/batteries.cma(BatBigarray):
The external function `caml_ba_reshape' is not available
The nums.cma and unix.cma I added at the command line because the linker complained about missing references to undefined global Big_int and (when that was added) to Unix. But after these two modules were added on the linker invocation I received the last message (on the missing external function 'caml_ba_reshape') which proved blocking for me. So I would like to ask:
how does one proceed in this particular case?
how does one proceed in the general case (i.e. when the linker complains about a missing external function)
is it viable to use Batteries Included in this fashion? Before I rely on platform tools I want to have the assurance that I can use the underlying artifacts (cma and cmi/mli files) with the standard OCaml compiler and linker if I run into problems.
caml_ba_reshape is, as you could guess from the name but I agree it's not obvious, a primitive of the Bigarray module. You should add bigarray.cma in your compilation command, before batteries.cma which depends on it.
There is a reason why it is advised to use ocamlfind, which is precisely used to abstract over those dependencies. I don't think you are supposed to use ocamlfind batteries/ocamlc, but rather ocamlfind ocamlc -package batteries. If you insist on using the compiler without such support, then indeed you have to compile manually -- I understand your frustration, but I hope you also understand that it is intrisic to any sufficiently sophisticated OCaml library, and that it comes only from your self-imposed constraints.
how does one proceed in the general case (i.e. when the linker complains about a missing external function
You have to know or guess where the primitive comes from. Looking at the META file provided by the library, which is used to inform ocamlfind of the dependencies, may help you. You can use the tool ocamlobjinfo to know which primitive a .cma provides, if you want to check your assumption. (Or better, use ocamlfind to spit the correct compile command, see below.)
is it viable to use Batteries Included in this fashion?
Compiling "by hand" is reasonable if you insist. Working only in the source repository, without installing the library, is not. It's easy to keep doing what you do after an install, just replace your -I ... by the chosen install path.
Before I rely on platform tools I want to have the assurance that I can use the underlying artifacts (cma and cmi/mli files) with the standard OCaml compiler and linker if I run into problems.
ocamlfind is not (only) a platform tool. It is the way to use third-party ocaml libraries, period. It should be a standard on any ocaml-using platform. That it does not come with INRIA's distribution is an historical detail.
You can ask ocamlfind to show you its invocation of the bare compilers:
% ocamlfind ocamlc -linkpkg -package batteries t.ml -o test -verbose
Effective set of compiler predicates:
pkg_unix,pkg_num.core,pkg_num,pkg_bigarray,pkg_str,pkg_batteries,autolink,byte
+ ocamlc.opt -o test -verbose -I /usr/local/lib/ocaml/3.12.1/batteries /usr/lib/ocaml/unix.cma /usr/lib/ocaml/nums.cma /usr/lib/ocaml/bigarray.cma /usr/lib/ocaml/str.cma /usr/local/lib/ocaml/3.12.1/batteries/batteries.cma t.ml
I don't want to throw stones at you. The landscape of OCaml tools, beside the minimal nutshell of what's provided by the source distribution, is quite sparse and lack a coherent point of entry. With time I've grown used to those tools and it's quite natural to use them, but I understand there is some cost of entry that we should try to lower.
PS: any advice on how to improve batteries documentation is warmly welcome. Patches to add things to the documentation or fix it are even better. batteries-devel#lists.forge.ocamlcore.org is the place to go.