Compiling modules in different directories - ocaml

I'm trying to follow these instructions to compile a module that depends on another module which I've created: https://ocaml.org/learn/tutorials/modules.html
In my case, I have a module ~/courseFiles/chapter5/moduleA.ml and another module in ~/OCamlCommons/listMethods.ml. I compiled listMethods.ml using ocamlopt -c listMethods.ml, and this seemed to work: it produced a file listMethods.cmx.
The file moduleA.ml contains open ListMethods;;. Now with my terminal located at ~/courseFiles/chapter5 I ran ocamlopt -c moduleA.ml but the terminal returns
Error: Unbound module ListMethods
Now I can understand why it would do this, but the instructions at that site seem to indicate that what I've done is how you're supposed to do this. Presumably I need to pass in the location of either the source or compiled files when compiling moduleA.ml, but I'm not sure what the syntax should be. I've tried a few guesses at the syntax, and also guessed at how I might do this with ocamlfind, but I haven't succeeded. I tried looking for instructions on compiling modules located in different directories but didn't find anything (or anything I can make sense of, anyway).

First of all, the toolkit that is shipped with the OCaml System Distribution (aka the compiler) is very versatile but quite low-level, and should be seen as a foundation layer for building more high-level build systems. Therefore, learning it is quite hard and usually makes sense only if you're going to build such systems. It is much easier to learn how to use dune or oasis or ocamlbuild instead. Moreover, the low-level toolkit will divert your attention from what actually matters - learning the language.
With all that said, let me answer your question in full detail. OCaml implements a separate compilation scheme, where each compilation unit can be built independently and then linked into a single binary. This scheme is common in C/C++, and in fact the OCaml compiler toolchain is very similar to the C compiler toolchain.
When you run ocamlopt -c x.ml you're creating a compilation unit, and as a result a few files are produced, namely:
x.o - contains the actual compiled machine code
x.cmx - contains optimization data and other compiler-specific information
x.cmi - contains the compiled interface of the module X.
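For instance, compiling a single file typically leaves all three next to the source (the file name x.ml here is just illustrative):

$ ocamlopt -c x.ml
$ ls x.*
x.cmi  x.cmx  x.ml  x.o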
In order to compile a module, the compiler doesn't need the code of the other modules used in that module. What it does need is the typing information, i.e., it needs to know the type of the List.find function, or the type of any other function provided by a module external to yours. This information is stored in cmi files, whose closest counterpart in C/C++ is (compiled) header files. As in C/C++, the compiler searches for them in the include search path, which by default includes the current directory and the location of the standard library, and which can be extended using the -I option (the same as in C/C++). Therefore, if your module uses another module defined in a folder A, you need to tell the compiler where to search for it, e.g.,
ocamlopt -I A -c x.ml
The produced object file will not contain any code from external modules. Therefore, once you reach the final stage of compilation - the linking phase - you have to provide the implementations. E.g., if your module X was using a module implemented in a file with relative path A/y.ml, and you have compiled it in that folder, then you need to specify the location of the compiled implementation again, e.g.,
ocamlopt -I A y.cmx x.cmx -o exe
The order is important: all modules used by a module must be specified before that module; otherwise, you will get a "No implementations provided" error.
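Putting this together for the layout in the question, the whole build might look like this (a sketch, assuming moduleA.ml holds the program's entry point; program is just a hypothetical name for the output):

$ cd ~/OCamlCommons
$ ocamlopt -c listMethods.ml
$ cd ~/courseFiles/chapter5
$ ocamlopt -I ~/OCamlCommons -c moduleA.ml
$ ocamlopt -I ~/OCamlCommons listMethods.cmx moduleA.cmx -o program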
As you can see, it is a pretty convoluted process, and it is really not worthwhile to invest your time in learning it. So, if you have an option, then use a higher-level tool to build your programs. If not sure, then choose Dune :)

Related

OCaml Compilation #use error

I am trying to compile an OCaml file with the debugger flag -g with the following line within the file -- #use "file2.ml". Why does the file not compile as long as I have the use keyword in it? What exactly does the "#use" keyword do? Is there an alternative?
The directives starting with # are supported only in the toplevel, the OCaml interpreter, also known as a read-eval-print loop.
In the toplevel, #use causes the interpreter to read OCaml code from a text file. Afterward it continues to take commands interactively.
For compiled code, you should compile separately and then link your modules together. If the code in file2.ml doesn't form a full module, you'll probably want to cut/paste it directly into the main file. OCaml doesn't have compile-time source-file inclusion like the C family has.
Update
Here's how to compile two OCaml files the old school way. You'll find there are those who say these ways are obsolete and that you should learn to use ocamlbuild. This may be correct, but I am old school at least for now.
$ ocamlc -o program file2.ml file1.ml
You have to list the .ml files in dependency order, i.e., a file has to be listed after any files it uses. This is already one reason to use a more powerful build tool.
Note that this does not behave the same as with #use, which loads all the top-level names of file2.ml into the global namespace. With separate compilation, names from file2.ml will be contained in a module named File2. If you want to call a function f defined in file2.ml, you should call it as File2.f.
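As a minimal sketch (the file contents here are hypothetical), suppose file2.ml defines a function and file1.ml calls it through the File2 module:

(* file2.ml *)
let f x = x + 1

(* file1.ml *)
let () = print_int (File2.f 41)

Compiled with the command above, ./program prints 42.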
TL;DR
Use ocamlbuild to drive your compilation process.
ocamlbuild -cflags -g file1.native
Deeper into the woods
#use is a special directive in the OCaml toplevel, in other words the OCaml interpreter. These directives are special commands that are executed by the interpreter for their effects. They are not part of the language. Moreover, different interpreters have different directives. If you want to write an OCaml program, then you need to compile it, since OCaml is a compiled language.
OCaml comes with a very decent infrastructure; here is a short guideline. First of all, there are two compilers, bytecode and native. Bytecode is slow (still much faster than Python or Ruby) but portable. The native compiler produces native machine code and is thus very fast. If you have access to both compilers, use the latter. One might object that native compilation itself is slower; I would say that on modern machines the difference is negligible.
The set of tools is pretty large: we have ocaml for the interpreter, ocamlc for the bytecode compiler, ocamlopt for native compilation, ocamldep for finding dependencies between modules, ocamldoc for compiling documentation, ocamlmktop for making your own interpreters, ocamlmklib for bundling libraries, and ocamlrun for running bytecode. We also have ocamlfind for finding libraries on your system. Fortunately, we also have One Tool to rule them all, One Tool to find them, One Tool to bring them all and in the darkness bind them.
Here comes ocamlbuild. This tool is part of the OCaml language distribution, and it knows everything about all nine minor tools, so you don't really need to learn them in order to start programming in OCaml. ocamlbuild will find all dependencies, link libraries, and create anything you want for you, including shared libraries or your own interpreters. For example, to link with the core library, just pass the -pkg core option. Moreover, ocamlbuild has a _tags file that lets you store your arguments and save some space on your command line. So, another way to always pass the -g option to the compiler (not a bad idea, btw) is to add the following to your _tags file:
true: debug
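With that line in a _tags file next to your sources, a plain invocation (file1.native as in the TL;DR above) picks the flag up automatically:

ocamlbuild file1.native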
And since you're adding this -g option, I suspect that you're interested in backtraces. If so, don't forget to enable backtrace recording, either by calling Printexc.record_backtrace true or by setting the environment variable OCAMLRUNPARAM=b.
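A minimal sketch of the programmatic route, placed at program startup (and compiled with -g so the backtrace carries line information):

(* enable backtrace recording before anything can raise *)
let () = Printexc.record_backtrace true

let () =
  try failwith "boom" with
  | exn ->
      prerr_endline (Printexc.to_string exn);
      prerr_endline (Printexc.get_backtrace ())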
I ran into exactly the same problem, but the solution I found is to use the #include directive from the cppo preprocessor. You can hook the preprocessor into the OCaml compiler via its -pp switch.
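A sketch of that setup, assuming cppo is installed and on your PATH, and that main.ml contains a #include "file2.ml" line:

ocamlopt -pp cppo -c main.ml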

Finding all libraries and header files forming a C++ executable

If I have a C++ source file, gcc can give all its dependencies, in a tree structure, using the -H option. But given only the C++ executable, is it possible to find all libraries and header files that went into its compilation and linking?
If you've compiled the executable with debugging symbols, then yes, you can use the symbols to get the files.
If you have .pdb files (Visual Studio creates them to store debugging information separately), you can use all kinds of programs to open them and see the source files and methods.
You can even open one with a text editor and you'll see, among the gibberish, a list of the functions and source files.
If you're using Linux (or GNU compilers in general), you can use gdb (again, only if you had debug symbols enabled at compile time).
Run gdb on your executable, then run the command: info sources
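A typical session looks like this (myprog is a hypothetical executable built with -g):

$ gdb ./myprog
(gdb) info sources
Source files for which symbols have been read in: ...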
That's an important reason why you should always remove that flag when going into production. You don't want clients to mess around with your sources, functions, and code.
You cannot do that in general, because the executable might have been built on a machine on which the header files (or the C++ code, or the libraries) are private or even generated. Also, if a static library is linked in, you have no reliable way to find out.
In practice, however, on Linux, using nm or objdump or ldd on the executable will often (but not always) give you a good clue about the needed libraries.
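For instance (myprog is again a hypothetical binary):

$ ldd ./myprog                       # shared libraries loaded at startup
$ objdump -p ./myprog | grep NEEDED  # DT_NEEDED entries recorded by the linker
$ nm -D ./myprog                     # dynamic symbols it imports and exports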
Also, some executables dynamically load plugins, e.g. using dlopen, so your question might not even make sense for them (since the plugin is known only at runtime).
Notice also that you might not know whether an executable was obtained by compiling C++ code at all (you might not be able to tell whether it was obtained from C, C++, D, or OCaml, ... source code, or a mixture of them).
On Linux, if you build an executable with static linking and stripping, people won't be able to easily guess the source programming language that you have used.
BTW, on Linux distributions, it is the role of the package management system to deal with such dependencies.
As answered by Yochai Timmer, if the executable contains debug information (e.g. in DWARF format) you should be able to get a lot more information.

What are all these *.cm[a-z] files and when do we need them

OCaml has various extensions for compiled files: *.a, *.cma, *.cmi, *.cmx, *.cmxa, *.cmxs (and perhaps this is not an exhaustive list). What are they, and in which cases do I need them?
If I have a library, which files do I need to ship with it? I noticed some people blindly install all *.cm[a-z] files into the archive, but is that really required?
First, I suggest you read the overview sections of the byte code and native code compilers, as this will greatly improve your understanding of what these files are.
Now, more specifically, if your library is a set of modules characterized by a set of .mli/.ml files:
A cmi file holds the compiled interface of a module (the result of compiling an .mli file). For each module of your library that you want other people to be able to use, you need to install it (i.e., the cmi files define your public interface). It's also good practice to install the mli files so that people can have a peek at them. These days you should also install the cmti files (generated using the -bin-annot option), which are annotated compiled interfaces and can be used by tools like ocp-index, odoc and odig.
cma files hold an archive of the results of byte code compilation (cmo files) of your library. You should install them if you want people to be able to compile against your library to byte code.
cmxa and .a files hold an archive of the results of native code compilation (cmx/o files) of your library. They are the counterpart of cma files for native code. You need to install them if you want people to be able to compile against your library to native code.
cmxs files are the counterpart of cmxa files for native dynlinking. You need to install them if you want users of your library to be able to dynamically load it into their programs as a plugin, using the Dynlink module.
cmx files are included in the cmxa; however, there is one reason why you may want to install them as well. If they can be seen at separate compilation time along with the cmi files, they allow the compiler to perform cross-module inlining. Files separately compiled that way do, however, become dependent on that implementation, which means they will need a recompile if the cmx (i.e., the implementation) changes, even if the cmi (i.e., the interface) did not.
Note that in general it's good if you are able to compile and install all of these files. (Though sometimes you may deliberately not install the cmx files, so that client code compiles only against the cmi and you can switch implementations without forcing a recompile; see the -opaque compilation flag if you need this.) The sketch below shows which invocation produces which of these files.
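For a hypothetical single-module library whose only module is Mylib_file (all file and archive names here are assumptions for illustration):

ocamlc   -c -bin-annot mylib_file.mli          # mylib_file.cmi, mylib_file.cmti
ocamlc   -c mylib_file.ml                      # mylib_file.cmo
ocamlc   -a mylib_file.cmo -o mylib.cma        # mylib.cma
ocamlopt -c mylib_file.ml                      # mylib_file.cmx, mylib_file.o
ocamlopt -a mylib_file.cmx -o mylib.cmxa       # mylib.cmxa, mylib.a
ocamlopt -shared mylib_file.cmx -o mylib.cmxs  # mylib.cmxs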
One final thing to note is that in OCaml there is no proper namespacing: every toplevel module is defined in a global namespace. This means that you need to be very careful about the toplevel module names you put in a library, even if you don't export their cmi. In particular, avoid generic names that could be used by other libraries; use a short prefix for your library, e.g. MyLib_file rather than File. (Again, even if File turns out to be an internal module that you ship in the cma but whose cmi you don't export, it could clash with other private or public File modules defined in other libraries.)
https://realworldocaml.org/v1/en/html/the-compiler-backend-byte-code-and-native-code.html is a good resource you can read for your question.
For a summary:
.cmi - the compiled interface of a module, produced from an .mli file
.cmo - a compiled bytecode object, produced from an .ml file by ocamlc
.cma - an archive of bytecode objects, i.e. a bytecode library
.cmx - a compiled native-code object with cross-module optimization data, produced by ocamlopt alongside a .o file
.cmxa and .a - an archive of native-code objects, i.e. a native library
.cmxs - a native plugin, loadable at runtime with the Dynlink module

make SCons compile everything in one gcc line?

I have a rather complex SCons script that compiles a big C++ project.
This gcc manual page says:
The compiler performs optimization based on the knowledge it has of the program. Compiling multiple files at once to a single output file mode allows the compiler to use information gained from all of the files when compiling each of them.
So it's better to give all my files to a single g++ invocation and let it drive the compilation however it pleases.
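In other words, instead of one g++ call per translation unit, something like this single invocation (hypothetical file names):

g++ -O2 a.cpp b.cpp c.cpp -o program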
But SCons does not do this: it calls g++ separately for every single C++ file in the project and then links them using ld.
Is there a way to make SCons do this?
The main reason to have a build system with the ability to express dependencies is to support some kind of conditional/incremental build. Otherwise you might as well just use a script with the one command you need.
That being said, the effect of having gcc/g++ optimize as the manual describes can be substantial, in particular if you have C++ templates you use often. Good for run-time performance, bad for recompile performance.
I suggest you try and make your own builder doing what you need. Here is another question with an inspirational answer: SCons custom builder - build with multiple files and output one file
Currently the answer is no.
Logic similar to this was developed for MSVC only.
You can see this in the man page (http://scons.org/doc/production/HTML/scons-man.html) as follows:
MSVC_BATCH When set to any true value, specifies that SCons should
batch compilation of object files when calling the Microsoft Visual
C/C++ compiler. All compilations of source files from the same source
directory that generate target files in a same output directory and
were configured in SCons using the same construction environment will
be built in a single call to the compiler. Only source files that have
changed since their object files were built will be passed to each
compiler invocation (via the $CHANGED_SOURCES construction variable).
Any compilations where the object (target) file base name (minus the
.obj) does not match the source file base name will be compiled
separately.
As always patches are welcome to add this in a more general fashion.
In general this should be left up to the program developer. Trying to compile everything together as an amalgamation may introduce unintended behaviour into the program, if it even compiles in the first place. Your best bet, if you want this kind of optimisation without editing the source yourself, is to use a compiler with interprocedural optimisation, like icc -ipo.
An example where an amalgamation of two .c files would not compile: both files define a static symbol with the same name but different functionality.
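A sketch of the interprocedural-optimisation route mentioned above (hypothetical file names; requires the Intel compiler):

icc -ipo file1.c file2.c -o program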

C++ Compile on different platforms

I am currently developing a C++ command line utility to be distributed as an open-source utility on Github. However, I want people who download the program to be able to easily compile and run the program on any platform (specifically Mac, Linux, and Windows) in as few steps as possible. Assuming only small changes have to be made to the code to make it compatible with the various platform-independent C++ compilers (g++ and win32), how can I do this? Are makefiles relevant?
My advice is: do not use makefiles. Maintaining the files for big enough projects is tedious, and errors sometimes creep in that you don't catch immediately (because the stale *.o file is still there).
See this question here
Makefiles are indeed highly relevant. You may find that you need (at least) two different makefiles to compensate for the fact that you have different compilers.
It's hard to be specific about how you solve this, since it depends on how complex the project is. It may be easiest to write a script/batch file and just document it ("Use the command build.sh on Linux/Unix, and build.bat on Windows") - and then let the respective files deal with, for example, setting up the name of the compiler, the flags, etc.
Or you can have an include in the makefile that is determined by the architecture, or different makefiles altogether.
If the project is REALLY simple, it may be enough to provide a basic makefile - but that's unlikely, since compiling x.cpp on Linux/MacOS makes an object file called x.o, while on Windows the object file is called x.obj. Libraries have different names, DLLs have different names, and on Linux/MacOS the final executable (typically) has no extension, so it's called "myprog", whereas the executable under Windows is called "myprog.exe".
These sorts of differences mean that the makefile needs to be different.