Resolve library conflict in SML/NJ Compilation Manager

I'm using SML/NJ 110.79, which includes support for new structures defined by the Successor ML project, among them the Fn structure.
As it happens, I already had an identically named structure in one of my personal utility projects, and it worked fine before 110.79.
With 110.79, for this .cm file:
group is
I get the following error, though: Error: structure Fn imported from
$SMLNJ-BASIS/( and also from
Does anyone know how to resolve this conflict through the Compilation Manager? Ideally, my Fn structure would be able to "extend" the standard Fn by just open-ing it, while projects using the sml-extras library would not see the standard Fn structure, only my extended version.
Is this possible? Do I need to wrap/re-export the whole library in my project?

I managed to solve this by using what I believe is called an administrative library in the CM manual, §2.9.
What that means precisely is to create an auxiliary .cm file that wraps the basis library and re-exports only the symbols we're interested in.
This is the main project file.
structure Main
(* Let's say this library redefines the Fn and Ref structures *)
(* and the REF signature. *)
(* This excludes Fn, Ref and REF from the Basis library, but *)
(* imports anything else. *)
This file imports $/ and then re-exports all of it except Fn, Ref and REF.
library($/ - (
structure Fn
structure Ref
signature REF
structure Main =
open Fn (* sml-extras's Fn *)
The solution is based on the set calculus described in the CM manual, §4 and on the EBNF grammar from Appendix A.
Another solution would have been to change sml-extras to re-export the whole $/, while shadowing the conflicting symbols. However, in the interest of modularity I decided to go with the solution detailed above.


How can I register C symbols in the R load table?

This does not work
There are various manuals on how to include C and C++ code in R without Rcpp. Following the first example in this manual and writing a C++ function
void double_me(int* x) {
  // Doubles the value at the memory location pointed to by x
  *x = *x + *x;
}
into a file doubler.cpp and compiling
R CMD SHLIB doubler.cpp
works fine. The process results in additional files doubler.o and The shared object can be loaded from R, but does not register the symbol double_me it is supposed to expose:
> dyn.load("")
> getLoadedDLLs()
          Filename                              Dynamic.Lookup
base      base                                  FALSE
methods   /usr/lib/R/library/methods/libs/      FALSE
utils     /usr/lib/R/library/utils/libs/        FALSE
grDevices /usr/lib/R/library/grDevices/libs/    FALSE
graphics  /usr/lib/R/library/graphics/libs/     FALSE
stats     /usr/lib/R/library/stats/libs/        FALSE
doubler   /home/<user>/<path>/src/              TRUE
tools     /usr/lib/R/library/tools/libs/        FALSE
> getDLLRegisteredRoutines("doubler")
data frame with 0 columns and 0 rows
> .C("double_me", as.integer(1))
Error in .C("double_me") : C symbol name "double_me" not in load table
And evidently, there is something missing - presumably the registration of the symbol (the name of the function) in the load table.
This works, but it is hard to understand how
According to this manual, we are supposed to apply
extern "C" {
to third functions in another file. In the manual, this is a function in X_main.o, compiled alongside the loaded library ( that apparently gets registered in the load table without ever being explicitly loaded. And it can then be called with .C("X_main") nevertheless.
I am not sure how exactly this works or how it could be applied to the first example with the trivial double_me function.
Related questions
There are many questions that are likely related to this on Stackoverflow. Unfortunately, none of them has a well-explained answer. Examples are this one, this one, this one, and this one.
C++ allows overloading, where several functions share a name but take different arguments. To achieve this, they must have different names in the compiled file, so when you declare void double_me(int* x) as a C++ function, it is actually compiled to a symbol whose name is decorated with extra characters (name mangling). If you specify C linkage, name mangling doesn't occur, and the symbol is emitted with its plain name.
Back to your question, the function you want to call from R should probably have C linkage. That's it. You're getting confused about needing third functions, etc. Just mark double_me as extern "C" and you'll be all good.
You probably need to register the function with R, even so.

how to search for OCaml functions by name and type

In Haskell, there are two main ways to look up information on functions.
1. Sites like Hoogle and Stackage. These sites provide two main types of searching:
a) Searching for the name of a function. For example, here is a search on Hoogle for a function called catMaybes.
This search returns the type of the catMaybes function, as well as the package and module it is defined in.
The major use-case for this type of search is when you see a function used somewhere and you want to know its type and what package it is defined in.
b) Searching for the type of a function. For example, here is a search on Hoogle for a function of type [Maybe a] -> [a].
This search returns multiple functions that have a similar type, the first of which is catMaybes. It also returns the package and module catMaybes is defined in.
The major use-case for this type of search occurs when you are writing code. You know the type of the function you need, and you're wondering if it is already defined somewhere. For example, you have a list of Maybes, and you want to return a list with all the Nothings removed. You know the function is going to have the type [Maybe a] -> [a].
2. Directly from ghci. In ghci, it is easy to get information about a function with the :info command, as long as the function is already in your environment.
For example, here is a ghci session showing how to get info about the catMaybes function. Note that you must import the Data.Maybe module first:
> import Data.Maybe
> :info catMaybes
catMaybes :: [Maybe a] -> [a] -- Defined in ‘Data.Maybe’
:info shows both the type of the catMaybes and where it is defined.
In OCaml, what sites/tools can be used to search for functions by name or type?
For example, I'm reading through Real World OCaml. I came across some code using the |> function. I wondered if there was a function <| for composing the opposite way. However, I don't know of any way of searching for a function called <|. Also, I don't know of any way of figuring out where |> is defined.
Based on the linked code above, I guess the |> would either have to be in Pervasives or somewhere in Jane Street's Core, but it would be nice to have a tool that gave the exact location.
awesome-ocaml has a section on dev tools that should be helpful.
ocamloscope (github) is sort of a Hoogle for OCaml. Search by name works well. Search by type is less good.
For local search by name, ocp-browser provides a convenient TUI.
In your editor, merlin and ocp-index can do lookup-to-definition and lookup-documentation.
There is a WIP public instance of odig here with a lot (but not all) packages. You can use odig locally too, as stated in another answer.
P.S. The function you are looking for is @@, and it's in the standard library.
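As a quick illustration of the P.S., |> and @@ apply functions in opposite directions. This sketch uses only the standard library; the list and the arithmetic are arbitrary values chosen for the demo.

```ocaml
(* Both operators are defined in the standard library (Stdlib, formerly
   Pervasives):
     val ( |> ) : 'a -> ('a -> 'b) -> 'b      x |> f  is  f x
     val ( @@ ) : ('a -> 'b) -> 'a -> 'b      f @@ x  is  f x *)

(* Left to right: start from the value and pipe it through the functions. *)
let piped = [1; 2; 3] |> List.map (fun x -> x * 2) |> List.fold_left ( + ) 0

(* Right to left: @@ is right-associative, so this parses as
   List.fold_left ( + ) 0 (List.map (fun x -> x * 2) [1; 2; 3]). *)
let applied = List.fold_left ( + ) 0 @@ List.map (fun x -> x * 2) @@ [1; 2; 3]

let () = Printf.printf "%d %d\n" piped applied
```

Both expressions compute the same sum; @@ mostly saves parentheses when the argument is itself a long application.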
The ocp-index package provides a basic facility for searching API functions, e.g.,
$ ocp-index locate '|>'
The ocp-browser is a beautiful interface to this utility.
They are all integrated with Emacs (and other popular text editors). Speaking of text editors and IDEs, Merlin is a killer feature without which I can't imagine OCaml coding anymore. It can jump directly to a definition, extract documentation and typecheck incrementally.
Speaking of web-based search, there was the argot documentation generator, which has an API search engine featuring type search, full-text search, and regular expressions. The project is somewhat abandoned and doesn't work with the latest OCaml.
We forked it, updated it to the latest OCaml, fixed a few bugs, and enhanced the unification procedure to get a better type search. The result can be found here.
One of the main features is search by type manifest, which ignores such irrelevant things as parameter ordering in functions, field names, differences between records and tuples (e.g., string * int is the same as {name : string; age : int}) and aliasing. For example, in our project there are quite a few aliases, e.g., type bil = stmt list = Stmt.t list = Stmt.t Core_kernel.Std.list = .... You can choose any name when you search (using type manifest), as the algorithm will correctly unify all aliases.
odig may be helpful. Once installed, you can browse by package and by function.

Functors in OCaml: triple code duplication necessary?

I'd like to clarify one point: currently it seems to me that triple signature duplication is necessary while declaring a functor, provided we export it in the .mli file. Here is an example:
Suppose we have a functor Make, which produces a module A parametrized by SigA (simplest example I could think of). Consequently, in the .mli file we have:
module type A = sig
  type a
  val identity : a -> a
end
module type SigA = sig
  type a
end
module Make (MA:SigA) :
  A with type a := MA.a
Now I understand that we have to write an implementation in the .ml file:
module Make (MA:SigA) = struct
  type a = MA.a
  let identity obj = obj
end
So far so good, right? No! Turns out we have to copy the declaration of A and SigA verbatim into the .ml file:
module type A = sig
  type a
  val identity : a -> a
end
module type SigA = sig
  type a
end
module Make (MA:SigA) = struct
  type a = MA.a
  let identity obj = obj
end
While I (vaguely) understand the rationale behind copying SigA (after all, it is mentioned in the source code), copying A definition seems like a completely pointless exercise to me.
I've had a brief look through the Core codebase, and they seem to either duplicate the signature (for small modules) or, for larger ones, move it to a separate file that is used from both the .ml and the .mli.
So is this just the state of affairs? Is everyone fine with copying the module signature THREE times (once in the .mli file, twice in the .ml file: declaration and definition)?
Currently I'm considering just ditching .mli files altogether and restricting the modules export using signatures in the .ml files.
EDIT: Yes, I know that I can avoid this problem by declaring the interface for A inline inside Make in the .mli file. However, this doesn't help me if I want to use that interface from outside of that module.
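For comparison, the alternative considered in the question (dropping the .mli and restricting exports with signatures inside the .ml) might look roughly like the sketch below. It reuses the question's A, SigA and Make, but writes `with type a = MA.a` instead of `:=` so the type equation stays visible; IntA is a hypothetical instantiation added only for the demo.

```ocaml
(* Everything in one .ml file, no .mli: each module type is written once,
   and the ascription on Make controls what the functor exposes. *)
module type SigA = sig
  type a
end

module type A = sig
  type a
  val identity : a -> a
end

module Make (MA : SigA) : A with type a = MA.a = struct
  type a = MA.a
  (* Any extra definitions placed here would be hidden by the A ascription. *)
  let identity obj = obj
end

(* Hypothetical instantiation, only to show the functor in use. *)
module IntA = Make (struct type a = int end)

let () = Printf.printf "%d\n" (IntA.identity 5)
```

The trade-off is that, without an .mli, every top-level definition in the file (including the module types) is exported.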
That's because a pair of ML and MLI files acts like a structure and the corresponding signature it is matched against.
The usual way to avoid writing out the module type twice is to define it in a separate ML file. For example,
(* *)
module type A = sig
  type a
end
module type B = sig
  type b
  val identity : b -> b
end
(* make.mli *)
module Make (A : Sig.A) : Sig.B with type b = A.a
(* *)
module Make (A : Sig.A) = struct
  type b = A.a
  let identity x = x
end
It is fine to leave out an MLI file in the case where it does not hide anything, like for the Sig module above.
In other cases, writing out the signature separately from the implementation is a feature, and not really duplication -- it defines the export of a module, and usually, that is a small subset of what's in the implementation.
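To try this pattern without creating three files, one can mimic each file with a nested module. In this sketch, Sig stands for the signature-only file, and the outer Make (with its signature and structure) stands for make.mli plus its implementation; the nesting and the instantiation M are assumptions made just for a single-file demo.

```ocaml
(* Sig plays the role of the shared signature-only file:
   each module type is written exactly once. *)
module Sig = struct
  module type A = sig
    type a
  end

  module type B = sig
    type b
    val identity : b -> b
  end
end

(* The signature after the colon plays the role of make.mli and the
   structure plays the role of the implementation; both refer to Sig
   instead of copying it. *)
module Make : sig
  module Make (A : Sig.A) : Sig.B with type b = A.a
end = struct
  module Make (A : Sig.A) = struct
    type b = A.a
    let identity x = x
  end
end

(* Hypothetical instantiation for the demo. *)
module M = Make.Make (struct type a = string end)

let () = print_endline (M.identity "unchanged")
```

Only the functor's result signature mentions Sig.B; the implementation never restates it.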

Is there a way to print user-defined datatypes in ocaml?

I can't use print_endline because it requires a string, and I don't (think) I have any way to convert my very simple user-defined datatypes to strings. How can I check the values of variables of these datatypes?
In many cases, it's not hard to write your own string_of_ conversion routine. That's a simple alternative that doesn't require any extra libraries or non-standard OCaml extensions. For the courses I teach that use OCaml, this is often the simplest mechanism for students.
(It would be nice if there were support for a generic conversion to strings though; perhaps the OCaml deriving stuff will catch on.)
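A minimal sketch of such hand-written converters follows; the types color and point are made up for illustration, and the larger converter is built from the smaller one.

```ocaml
(* Example user-defined types; the names are illustrative. *)
type color = Red | Green | Blue

type point = { x : int; y : int; tint : color option }

let string_of_color = function
  | Red -> "Red"
  | Green -> "Green"
  | Blue -> "Blue"

(* Reuse the smaller converter to build the larger one. *)
let string_of_point p =
  let tint =
    match p.tint with
    | None -> "None"
    | Some c -> "Some " ^ string_of_color c
  in
  Printf.sprintf "{ x = %d; y = %d; tint = %s }" p.x p.y tint

let () = print_endline (string_of_point { x = 1; y = 2; tint = Some Blue })
```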
There's nothing in the base language that does this for you. There is a project named OCaml Deriving (named after a feature of Haskell) that can automatically derive print functions from type declarations. I haven't used it, but it sounds excellent.
Once you have a function for printing your type (derived or not), you can install it in the OCaml toplevel. This can be handy, as the built-in toplevel printing sometimes doesn't do quite what you want. To do this, use the #install_printer directive, described in Chapter 9 of the OCaml Manual.
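A printer suitable for #install_printer takes a Format.formatter and a value. The sketch below uses an illustrative color type; the lowercase spelling it prints is an arbitrary choice to make the custom printer visible.

```ocaml
type color = Red | Green | Blue

(* Printer with the shape expected by #install_printer:
   Format.formatter -> color -> unit. *)
let pp_color (fmt : Format.formatter) (c : color) : unit =
  Format.pp_print_string fmt
    (match c with Red -> "red" | Green -> "green" | Blue -> "blue")

(* The same function also works in compiled programs through %a. *)
let () = Format.printf "favourite colour: %a@." pp_color Green
```

In the toplevel, after loading these definitions, entering `#install_printer pp_color;;` should make values of type color display through pp_color instead of the default constructor names.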
There are third-party library functions like dump in OCaml Batteries Included or OCaml Extlib, that will generically convert any value to a string using all the runtime information it can get. But this won't be able to recover all information; for example, constructor names are lost and become just integers, so it will not look exactly the way you want. You will basically have to write your own conversion functions, or use some tool that will write them for you.
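Why constructor names turn into integers can be seen with a small type-agnostic dump built on the unsafe Obj module. This is a from-scratch sketch in the spirit of those library functions, not their actual code; it assumes an acyclic value (a cyclic one would make it recurse forever) and prints anything unusual, such as closures, as <opaque>.

```ocaml
(* An illustrative variant type; constant constructors are stored as
   the integers 0, 1, 2 in declaration order. *)
type color = Red | Green | Blue

(* Crude, type-agnostic dump written from scratch for illustration.
   It only sees the runtime representation, so constructor and field names
   are unavailable. Assumes an acyclic value: a cycle would recurse forever. *)
let rec dump (r : Obj.t) : string =
  if Obj.is_int r then string_of_int (Obj.obj r : int)
  else
    let tag = Obj.tag r in
    if tag = Obj.string_tag then Printf.sprintf "%S" (Obj.obj r : string)
    else if tag = Obj.double_tag then string_of_float (Obj.obj r : float)
    else if tag < Obj.lazy_tag then
      (* Ordinary block: tuple, record, list cell or non-constant constructor. *)
      let fields = List.init (Obj.size r) (fun i -> dump (Obj.field r i)) in
      "(" ^ String.concat " " fields ^ ")"
    else "<opaque>" (* closures, objects, float arrays, custom blocks, etc. *)

let () = print_endline (dump (Obj.repr (Green, Some "hi", [1; 2])))
```

Green comes out as 1, Some "hi" as a one-field block, and the list as nested cons cells ending in 0, which is exactly the information loss described above.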
Along the lines of previous answers, ppx_sexp is a PPX for generating printers from type definitions. Here's an example of how to use it while using jbuilder as your build system, and using Base and Stdio as your stdlib.
First, the jbuild file which specifies how to do the build:
(jbuild_version 1)

(executables
 ((names (w))
  (libraries (base stdio))
  (preprocess (pps (ppx_jane ppx_driver.runner)))))
And here's the code.
open Base
open Stdio
type t = { a: int; b: float * float }
[@@deriving sexp]
let () =
let x = { a = 3; b = (4.5,5.6) } in
[%sexp (x : t)] |> Sexp.to_string_hum |> print_endline
And when you run it you get this output:
((a 3) (b (4.5 5.6)))
S-expression converters are present throughout Base and all the related libraries (Stdio, Core_kernel, Core, Async, Incremental, etc.), and so you can pretty much count on being able to serialize any data structure you encounter there, as well as anything you define on your own.

How can OCaml values be printed outside the toplevel?

The OCaml repl ("toplevel") has rich printing for any types, user-defined or otherwise. Is it possible to access this functionality outside the toplevel, rather than having to write an entirely custom set of value printers for one's own entire set of types?
The pretty-printing facility is part of the toplevel library. You'll find the source in toplevel/ It's understandable, considering that it needs type information: you can't just throw any value at it, the choice of pretty-printer is based on the type.
If you want to use this code in your program, you'll need to link with the toplevel library (toplevellib.cma) or compile in genprintval (which means bringing in enough bits of the type checker to analyse the type, it can get pretty big).
There is a similar facility (but not sharing the code, I think) in the debugger (debugger/ and debugger/).
There are third-party libraries that you can directly link against and that provide pretty-printing facilities. Extlib's Std.dump provides a very crude facility (not based on the type). Deriving by Jeremy Yallop and Jake Donham is another approach. This Caml Weekly News item offers more suggestions.
The OCaml Batteries Included library contains the dump function in its BatPervasives module. It converts any value to a string and returns it. You can see its source code here. The output will not be identical to the toplevel, because some information is lost at runtime, e.g. abstract data type constructors will become integers.
No. As of OCaml 4.06, the compiler doesn't make type information available at runtime. It is therefore not possible to have standalone programs that nicely print any OCaml data without some compromises. The two main avenues are:
Some form of preprocessing which derives printers from type definitions. Today, the best approach might be the show plugin of ppx_deriving. This requires annotating each type definition.
Relying only on the runtime representation of values. This requires no effort from the programmer and works out-of-the-box on data produced by external libraries. However it doesn't show things like record field names or any other information that was lost during compilation. An instance of this approach is detailed below.
The function Dum.to_stdout from the dum package will take any OCaml value, including cyclic ones, and print their physical representation in a human-readable form given the data available at runtime only.
Simple things give more or less what one would expect:
# Dum.to_stdout ("Hello", 42, Some `Thing, [1;2;3]);;
("Hello" 42 (582416334) [ 1 2 3 ])
Cyclic (and, in general, shared) values are shown using labels and references. This is a circular list:
# let rec cyc = 1 :: 2 :: cyc;;
# Dum.to_stdout cyc;;
#0: (1 (2 #0))
We can also look into the runtime representation of functions, modules and other things. For example, the Filename module can be inspected as follows:
# module type Filename = module type of Filename;;
# Dum.to_stdout (module Filename : Filename);;
#0: "."
#1: "/"
#2: closure (#1 #3: closure ())
#4: closure ()
closure (#4)
closure ()
closure ()
closure (#5: closure (#3))
closure (#5)
closure (#5)
closure (closure () #3 #0)
closure (closure () #3 #0)
closure (#6: closure (#2 <lazy>) #7: (#8))
closure (#6 #7)
closure (#7)
closure (#7)
#8: "/tmp"
closure (closure () "'\\''")
I know you want it outside of the toplevel, but I think it's worth mentioning how to do it in the toplevel for people looking for any way of printing (since doing it outside the toplevel seems non-trivial):
Load your file in the toplevel:
#use "";;
Then "call" the variable inside the toplevel:
utop # let nada = Nothing;;
utop # nada;;
- : foo = Nothing