How can OCaml values be printed outside the toplevel? - ocaml

The OCaml repl ("toplevel") has rich printing for any types, user-defined or otherwise. Is it possible to access this functionality outside the toplevel, rather than having to write an entirely custom set of value printers for one's own entire set of types?

The pretty-printing facility is part of the toplevel library. You'll find the source in toplevel/genprintval.ml. This makes sense, considering that it needs type information: you can't just throw any value at it, because the choice of pretty-printer is based on the type.
If you want to use this code in your program, you'll need to link with the toplevel library (toplevellib.cma) or compile in genprintval, which means bringing in enough bits of the type checker to analyse the type; this can get pretty big.
There is a similar facility (but not sharing the code, I think) in the debugger (debugger/printval.ml and debugger/loadprinter.ml).
There are third-party libraries that you can directly link against and that provide pretty-printing facilities. Extlib's Std.dump provides a very crude facility (not based on the type). Deriving by Jeremy Yallop and Jake Donham is another approach. This Caml Weekly News item offers more suggestions.
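To make the point about types concrete, here is a minimal sketch of type-directed printing done by hand: one printer per type constructor, composed by following the structure of the type. The combinator names (int, string, list, option, pair) are made up for illustration. This is not how Genprintval is written; it only shows why the printer has to be chosen from the type.

```ocaml
(* A printer is just a function from a value to its textual form. *)
type 'a printer = 'a -> string

(* Printers for base types (hypothetical names). *)
let int : int printer = string_of_int
let string : string printer = Printf.sprintf "%S"

(* Printers for type constructors take the printers of their parameters. *)
let list (p : 'a printer) : 'a list printer =
 fun xs -> "[" ^ String.concat "; " (List.map p xs) ^ "]"

let option (p : 'a printer) : 'a option printer = function
  | None -> "None"
  | Some x -> "Some " ^ p x

let pair (p : 'a printer) (q : 'b printer) : ('a * 'b) printer =
 fun (a, b) -> "(" ^ p a ^ ", " ^ q b ^ ")"

(* The printer for (string * int list) option mirrors the type itself. *)
let print_entry : (string * int list) option printer =
  option (pair string (list int))

let () = print_endline (print_entry (Some ("xs", [ 1; 2; 3 ])))
```

The toplevel does the equivalent automatically because it has the type at hand; a compiled program has to build (or derive) such printers itself.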

The OCaml Batteries Included library contains the dump function in its BatPervasives module. It converts any value to a string and returns it. You can see its source code here. The output will not be identical to the toplevel's, because some information is lost at runtime, e.g. variant constructors will become integers.
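A rough sketch of how such a runtime-only dump can work, assuming only the standard (and unsafe) Obj module. This is not the Batteries or Extlib source; the names dump and dump_repr are hypothetical. It shows exactly what survives at runtime: integers, strings, floats and block tags, but no constructor or field names. Unlike the dum package mentioned below, it does no cycle detection.

```ocaml
(* Type-agnostic dumper that inspects only the runtime representation.
   No cycle detection: a cyclic value makes it recurse forever. *)
let rec dump_repr (r : Obj.t) : string =
  if Obj.is_int r then
    (* Immediate values: int, char, bool, unit, constant constructors. *)
    string_of_int (Obj.obj r : int)
  else
    let tag = Obj.tag r in
    if tag = Obj.string_tag then Printf.sprintf "%S" (Obj.obj r : string)
    else if tag = Obj.double_tag then string_of_float (Obj.obj r : float)
    else if tag = Obj.double_array_tag then
      (* Float arrays and all-float records are stored as flat float blocks. *)
      "[|"
      ^ String.concat "; "
          (List.map string_of_float (Array.to_list (Obj.obj r : float array)))
      ^ "|]"
    else if tag = Obj.closure_tag || tag = Obj.infix_tag then "<closure>"
    else if tag = Obj.custom_tag then "<custom>" (* Int64, Bigarray, ... *)
    else if tag >= Obj.no_scan_tag then "<abstract>"
    else
      (* Ordinary block: tuples, records, non-constant constructors, ... *)
      let fields = List.init (Obj.size r) (fun i -> dump_repr (Obj.field r i)) in
      Printf.sprintf "%d:(%s)" tag (String.concat " " fields)

let dump (x : 'a) : string = dump_repr (Obj.repr x)

let () = print_endline (dump (Some [ 1; 2 ], "hi", 3.5))
```

For example, with a hypothetical type t = A | B of int, both A and the integer 0 dump as 0, and B 5 dumps as 0:(5), which is why this kind of output cannot match the toplevel's.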

No. As of OCaml 4.06, the compiler doesn't make type information available at runtime. It is therefore not possible to have standalone programs that nicely print any OCaml data without some compromises. The two main avenues are:
Some form of preprocessing which derives printers from type definitions. Today, the best approach might be the show plugin of ppx_deriving. This requires annotating each type definition with [@@deriving show].
Relying only on the runtime representation of values. This requires no effort from the programmer and works out of the box on data produced by external libraries. However, it can't show record field names or any other information that was lost during compilation. An instance of this approach is detailed below.
The function Dum.to_stdout from the dum package will take any OCaml value, including cyclic ones, and print its physical representation in a human-readable form, using only the data available at runtime.
Simple things give more or less what one would expect:
# Dum.to_stdout ("Hello", 42, Some `Thing, [1;2;3]);;
("Hello" 42 (582416334) [ 1 2 3 ])
Cyclic (and, in general, shared) values are shown using labels and references. This is a circular list:
# let rec cyc = 1 :: 2 :: cyc;;
# Dum.to_stdout cyc;;
#0: (1 (2 #0))
We can also look into the runtime representation of functions, modules and other things. For example, the Filename module can be inspected as follows:
# module type Filename = module type of Filename;;
# Dum.to_stdout (module Filename : Filename);;
(
#0: "."
".."
#1: "/"
#2: closure (#1 #3: closure ())
#4: closure ()
closure (#4)
closure ()
closure ()
closure (#5: closure (#3))
closure (#5)
closure (#5)
closure (closure () #3 #0)
closure (closure () #3 #0)
closure (#6: closure (#2 <lazy>) #7: (#8))
closure (#6 #7)
closure (#7)
closure (#7)
#8: "/tmp"
closure (closure () "'\\''")
)

I know you want it outside the toplevel, but I think it's worth mentioning how to do it in the toplevel for people looking for any way to print values (doing it outside the toplevel is not trivial):
Load your file in the toplevel:
utop
#use "datatypes.ml";;
Then "call" the variable inside the toplevel:
utop # let nada = Nothing;;
utop # nada;;
- : foo = Nothing
ref: https://discuss.ocaml.org/t/how-does-one-print-any-type/4362/16?u=brando90
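For reference, a hypothetical datatypes.ml that would produce the session above could be as small as this; only foo and Nothing come from the example, and Something is made up for illustration.

```ocaml
(* datatypes.ml (hypothetical): any variant type works the same way. *)
type foo =
  | Nothing           (* the constructor used in the session above *)
  | Something of int  (* made-up extra constructor *)
```

After #use "datatypes.ml";; the toplevel prints constructor names for any value of type foo, e.g. evaluating Something 3;; displays - : foo = Something 3.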

Related

how to search for OCaml functions by name and type

In Haskell, there are two main ways to look up information on functions.
Sites like Hoogle and Stackage. These sites provide two main types of searching:
Searching for the name of a function. For example, here is a search on Hoogle for a function called catMaybes.
This search returns the type of the catMaybes function, as well as the package and module it is defined in.
The major use-case for this type of search is when you see a function used somewhere and you want to know its type and what package it is defined in.
Searching for the type of a function. For example, here is a search on Hoogle for a function of type [Maybe a] -> [a].
This search returns multiple functions that have a similar type, the first of which is catMaybes. It also returns the package and module catMaybes is defined in.
The major use-case for this type of search occurs when you are writing code. You know the type of the function you need, and you're wondering if it is already defined somewhere. For example, you have a list of Maybes, and you want to return a list with all the Nothings removed. You know the function is going to have the type [Maybe a] -> [a].
Directly from ghci. In ghci, it is easy to get information about a function with the :info command, as long as the function is already in your environment.
For example, here is a ghci session showing how to get info about the catMaybes function. Note how you must import the Data.Maybe module first:
> import Data.Maybe
> :info catMaybes
catMaybes :: [Maybe a] -> [a] -- Defined in ‘Data.Maybe’
>
:info shows both the type of catMaybes and where it is defined.
In OCaml, what sites/tools can be used to search for functions by name or type?
For example, I'm reading through Real World OCaml. I came across some code using the |> function. I wondered if there was a function <| for composing the opposite way. However, I don't know of any way of searching for a function called <|. Also, I don't know of any way of figuring out where |> is defined.
Based on the linked code above, I guess the |> would either have to be in Pervasives or somewhere in Jane Street's Core, but it would be nice to have a tool that gave the exact location.
awesome-ocaml has a section on dev tools that should be helpful.
ocamloscope (github) is sort of a Hoogle for OCaml. Search by name works well; search by type is less good.
For local search by name, ocp-browser provides a convenient TUI.
In your editor, merlin and ocp-index can do lookup-to-definition and lookup-documentation.
There is a WIP public instance of odig here with a lot of (but not all) packages. You can use odig locally too, as stated in another answer.
P.S. The function you are looking for is @@, and it's in the standard library.
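A quick illustration of the two operators, assuming a standard library recent enough to define both (they live in Stdlib, formerly Pervasives): |> feeds a value forward into a function, while @@ is ordinary application with low precedence, so a chain of it reads right to left.

```ocaml
let words = [ "ocaml"; "is"; "fun" ]

(* |> pipes a value into a function: read left to right. *)
let total_length_pipe = words |> List.map String.length |> List.fold_left ( + ) 0

(* @@ applies the function on its left to everything on its right.
   It is right-associative, so f @@ g @@ x means f (g x). *)
let total_length_at = List.fold_left ( + ) 0 @@ List.map String.length @@ words

let () = Printf.printf "%d %d\n" total_length_pipe total_length_at
```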
The ocp-index package provides a basic facility for searching API functions, e.g.,
$ ocp-index locate '|>'
/home/ivg/.opam/devel/build/ocaml/stdlib/pervasives.ml:39:0
The ocp-browser is a beautiful interface to this utility.
They are all integrated with Emacs (and other popular text editors). Speaking of text editors and IDE, Merlin is a killer feature without which I can't imagine OCaml coding anymore. It is capable of jumping directly to the definition, extracting documentation and incremental typechecking.
Speaking of web-based search, there was an argot document generator that has an API search engine featuring type search, full-text search, and regular expressions. The project is somewhat abandoned and doesn't work with the latest OCaml.
We forked it, updated it to the latest OCaml, fixed a few bugs, and enhanced the unification procedure to get a better type search. The result can be found here.
One of the main features is a search by type manifest, that ignores such irrelevant things as parameter ordering in functions, field names, differences between record names and tuples (e.g., string * int is the same as {name : string; age : int}) and aliasing. For example, in our project there are quite a few aliases, e.g., type bil = stmt list = Stmt.t list = Stmt.t Core_kernel.Std.list = .... You can choose any name when you search (using type manifest), as the algorithm will correctly unify all aliases.
odig may be helpful. Once installed, you can browse by package and by function.

C++ function slash operator lambda expression

In fact, I don't know how to phrase this very precisely.
Today, I browsed the following page:
http://siliconframework.org/docs/hello_world.html
I found the following syntax:
GET / _hello = [] () { return D(_message = "Hello world."); }
I found that "GET" can be a function via a lambda expression, but I cannot figure out what "/" and "_hello" mean here, or how they connect to something meaningful.
Also, what is that "_message = "?
BTW, my primary C++ knowledge is from before C++11.
I googled quite a bit.
Could anyone kindly give an explanation?
This library uses what is known as an embedded Domain Specific Language, where it warps C++ and preprocessor syntax in ways that allow a seemingly different language to be just another part of a C++ program.
In short, magic.
The first bit of magic lies in:
iod_define_symbol(hello)
which is a macro that generates the identifier _hello of type _hello_t.
It also creates a _hello_t type which inherits from a CRTP helper called iod::symbol<_hello_t>.
_hello_t overloads various operators (including operator= and operator/) so that they don't behave the way you'd normally expect C++ objects to behave.
GET / _hello = [] () { return D(_message = "Hello world."); }
so this calls
operator=(
operator/( GET, _hello ),
/* lambda_goes_here */
);
Similarly, in the lambda:
D(_message = "Hello world.");
is
D( operator=(_message, "Hello world.") );
operator/ and operator= can do nearly anything.
In the D case, = doesn't do any assigning; instead, it builds a structure that basically says: the field called "message" is assigned the value "Hello world.".
_message knows it is called "message" because it was generated by a macro iod_define_symbol(message) where they took the string message and stored it with the type _message_t, and created the variable _message which is an instance of that type.
D takes any number of such key/value pairs and bundles them together.
The lambda returns this bundle.
So [] () { return D(_message = "Hello world."); } is a lambda that returns a bundle of key-value pair attachments, written in a strange way.
We then invoke operator= with GET/_hello on the left hand side.
GET is another global object with operator/ overloaded on it. I haven't tracked it down. Suppose it is of type iod::get_t (I made up that name: again, I haven't looked up what type it is, and it doesn't really matter).
Then iod::get_t::operator/(iod::symbol<T> const&) is overloaded to generate yet another helper type. This type gets T's name (in this case "hello") and waits for a lambda to be assigned to it.
When assigned to, it doesn't do what you expect. Instead, it goes off and builds an association between "hello" and invoking that lambda, where that lambda is expected to return a set of key-value pairs generated by D.
We then pass one or more such associations to http_api, which gathers up those bundles and builds the data required to run a web server with those queries and those responses, possibly including flags saying "I am going to be an http server".
sl::mhd_json_serve then takes that data, and a port number, and actually runs a web server.
All of this is a bunch of layers of abstraction to make some reflection easier. The generated structures have both C++ identifiers and matching strings. The strings are exposed in the structures, and when the JSON serialization (or deserialization) code is generated, those strings are used to read/write the JSON values.
The macros merely exist to make writing the boilerplate easier.
If you want to learn more about what is going on here, techniques worth reading up on include "expression templates", "reflection", "CRTP", and embedded "Domain Specific Languages".
Some of the above contains minor "lies told to children" -- in particular, the operator syntax doesn't work quite like I implied. (a/b is not equivalent to operator/(a,b), in that the second won't call member operator /. Understanding that they are just functions is what I intend, not that the syntax is the same.)
@mattheiuG (the author of this framework) has shared, in a comment below this post, slides that further explain D, the _message tokens and the framework.
It's not standard C++ syntax, it's framework specific instead. The elements prefixed with an underscore (_hello, _message etc) are used with a symbol definition generator that runs and creates the necessary definitions prior to compilation.
There's some more information on it at the end of this page: http://siliconframework.org/docs/symbols.html. Qt does a similar thing with its moc tool.

How can I count number of times an overloaded operator was used in a code base with particular type of operands

I have a templated class SafeInt<T> (By Microsoft).
This class in theory can be used in place of a POD integer type and can detect any integer overflows during arithmetic operations.
For this class I wrote some custom templated overloaded arithmetic operator (+, -, *, /) functions, both of whose arguments are SafeInt<T> objects.
I typedef'd all my integer types to SafeInt class type.
I want to search my codebase for instances of the said binary operators where both operands are of type SafeInt.
Some of the ways I could think of:
String search using regex, then weeding through the code to find operator uses where both operands are SafeInt objects.
Write a clang tool and process the AST to do this search (I have yet to learn how to write such a tool).
Somehow add a counter to count the number of times the custom overloaded operator is instantiated. I spent a lot of time trying this, but it doesn't seem to work.
Can anyone suggest a better way?
Please let me know if I need to clarify anything.
Thanks.
Short answer
You can do this using the clang-query command:
$ clang-query \
-c='m cxxOperatorCallExpr(callee(functionDecl(hasName("operator+"))), hasArgument(0, expr(hasType(cxxRecordDecl(hasName("SafeInt"))))), hasArgument(1, expr(hasType(cxxRecordDecl(hasName("SafeInt"))))))' \
use-si.cc --
Match #1:
/home/scott/wrk/learn/clang/clang-query1/use-si.cc:10:3: note: "root" binds here
x + y; // reported
^~~~~
1 match.
What is clang-query?
clang-query is a utility intended to facilitate writing clang-tidy checks. In particular it understands the language of AST Matchers and can be used to interactively explore what is matched by a given match expression. However, as shown here, it can also be used non-interactively to look for arbitrary AST tree patterns.
The blog post Exploring Clang Tooling Part 2: Examining the Clang AST with clang-query by Stephen Kelly provides a nice introduction to using clang-query.
The clang-query program is included in the pre-built LLVM binaries, or it can be built from source as described in the AST Matchers Tutorial.
How does the above command work?
The -c argument provides a command to run non-interactively. With whitespace added, the command is:
m // Match (and report) every
cxxOperatorCallExpr( // operator function call
callee(functionDecl( // where the callee
hasName("operator+"))), // is "operator+", and
hasArgument(0, // where the first argument
expr(hasType(cxxRecordDecl( // is a class type
hasName("SafeInt"))))), // called "SafeInt",
hasArgument(1, // and the second argument
expr(hasType(cxxRecordDecl( // is also a class type
hasName("SafeInt")))))) // called "SafeInt".
The command line ends with use-si.cc --, meaning: analyze use-si.cc, with no extra compiler flags needed by clang to interpret it.
The clang-query command line has the same basic structure as that of clang-tidy, including the ability to pass -p compile_commands.json to scan many files at once, possibly with different compiler options per file.
Example input
For completeness, the input I used to test my matcher is use-si.cc:
// use-si.cc
#include "SafeInt.hpp" // SafeInt
void f1()
{
SafeInt<int> x(2);
SafeInt<int> y(3);
x + y; // reported
x + 2; // not reported
2 + x; // not reported
}
where SafeInt.hpp comes from https://github.com/dcleblanc/SafeInt , the repo named on the Microsoft SafeInt page.
To do this right, you clearly have to be able to identify individual uses of the operator which overload to a specific operator definition. Fundamentally, you need what the front end of a C++ compiler does: parsing and name resolution (including the overloads).
Obviously GCC and Clang have this basic capability. But you want to track/display all uses of the specific operator. You can probably bend Clang (or GCC, harder) to give you this information on a file-by-file basis.
Our DMS Software Reengineering Toolkit with its C++ Front End can be used for this, too.
DMS provides the generic parsing and symbol table support machinery; the C++ front end specializes DMS to handle C++ with full, accurate name resolution including overloads, for both GCC5 and MSVS2015. Its symbol table actually collects, for each declaration in a scope, the point of the declaration, and the list of uses of that declaration in terms of accurate source positions. The symbol scopes include an entry for each (overloaded) operator valid in the scope. You could just go to the desired symbol table entry and enumerate/count the list of references to get a raw count. There are standard APIs for this available via DMS.
The same kind of symbol scope/definition/uses information is used by our Java Source Browser to build an HTML-based JavaDoc-like display with full HTML linkages between symbol declarations and uses. So for any symbol declaration, you can easily see the uses.
The C++ front end has a similar HTMLizer that operates on C++ source code. It isn't as mature/pretty, but it is robust. It presently doesn't show all the uses of a declared symbol, but that would be a pretty straightforward change to make to it. (I don't have a publicly visible instance of it. Contact me through my bio and I can send you a sample).

Resolve library conflict in SML/NJ Compilation Manager

I'm using SML/NJ 110.79, which includes support for new structures defined by the Successor ML project. Among others, the Fn structure.
As it happens, I already had an identically named structure in one of my personal utility projects, which worked fine before 110.79.
With 110.79, for this .cm file:
group is
$/basis.cm
$SMACKAGE/sml-extras/v0.1.0/sources.sml.cm
I get the following error, though:
sources.cm:3.3-3.45 Error: structure Fn imported from
$SMLNJ-BASIS/(basis.cm):basis-common.cm#155252(fn.sml) and also from
$SMACKAGE/sml-extras/v0.1.0/(sources.sml.cm):src/fn.sml
Does anyone know how to resolve this conflict through the Compilation Manager? Ideally, my Fn structure would be able to "extend" the standard Fn by just open-ing it, but projects using the sml-extras library would not see the standard Fn structure, only my extended version.
Is this possible? Do I need to wrap/re-export the whole basis.cm library in my sml-extras.cm project?
I managed to solve this by using what I believe is called an administrative library in the CM manual, §2.9.
What that means precisely is to create an auxiliary .cm file that wraps the basis library and re-exports only the symbols we're interested in.
sources.cm
This is the main project file.
library
structure Main
is
(* Let's say this library redefines the Fn and Ref structures *)
(* and the REF signature. *)
$SMACKAGE/sml-extras/v0.1.0/sources.sml.cm
(* This excludes Fn, Ref and REF from the Basis library, but *)
(* imports anything else. *)
amended-basis.cm
main.sml
amended-basis.cm
This file imports $/basis.cm and then re-exports all of it except Fn, Ref and REF.
group
library($/basis.cm) - (
structure Fn
structure Ref
signature REF
)
is
$/basis.cm
main.sml
structure Main =
struct
open Fn (* sml-extras's Fn *)
end
The solution is based on the set calculus described in the CM manual, §4 and on the EBNF grammar from Appendix A.
Another solution would have been to change sml-extras to re-export the whole $/basis.cm, while shadowing the conflicting symbols. However, in the interest of modularity I decided to go with the solution detailed above.

Is there a way to print user-defined datatypes in ocaml?

I can't use print_endline because it requires a string, and I don't think I have any way to convert my very simple user-defined datatypes to strings. How can I check the values of variables of these datatypes?
In many cases, it's not hard to write your own string_of_ conversion routine. That's a simple alternative that doesn't require any extra libraries or non-standard OCaml extensions. For the courses I teach that use OCaml, this is often the simplest mechanism for students.
(It would be nice if there were support for a generic conversion to strings though; perhaps the OCaml deriving stuff will catch on.)
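A minimal sketch of such a hand-written conversion, for a hypothetical shape type: one match arm per constructor, Printf.sprintf for the formatting, and print_endline then works as usual.

```ocaml
(* Hypothetical user-defined type. *)
type shape =
  | Circle of float
  | Rect of float * float
  | Empty

(* One case per constructor; %g drops trailing zeros from floats. *)
let string_of_shape = function
  | Circle r -> Printf.sprintf "Circle %g" r
  | Rect (w, h) -> Printf.sprintf "Rect (%g, %g)" w h
  | Empty -> "Empty"

let () =
  List.iter (fun s -> print_endline (string_of_shape s))
    [ Circle 1.5; Rect (2., 3.); Empty ]
```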
There's nothing in the base language that does this for you. There is a project named OCaml Deriving (named after a feature of Haskell) that can automatically derive print functions from type declarations. I haven't used it, but it sounds excellent.
http://code.google.com/p/deriving/
Once you have a function for printing your type (derived or not), you can install it in the OCaml toplevel. This can be handy, as the built-in toplevel printing sometimes doesn't do quite what you want. To do this, use the #install_printer directive, described in Chapter 9 of the OCaml Manual.
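A sketch of a printer that #install_printer accepts, assuming a hypothetical color type: the function must have type Format.formatter -> t -> unit. The same function also works outside the toplevel through Format's %a conversion.

```ocaml
(* Hypothetical type and printer; the signature is what matters. *)
type color = Red | Green | Blue

let pp_color (fmt : Format.formatter) (c : color) : unit =
  Format.fprintf fmt "<color %s>"
    (match c with Red -> "red" | Green -> "green" | Blue -> "blue")

(* In the toplevel, after loading this code:
     #install_printer pp_color;;
   values of type color are then displayed using pp_color. *)

(* Outside the toplevel, the same printer plugs into Format.printf. *)
let () = Format.printf "%a@." pp_color Green
```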
There are third-party library functions like dump in OCaml Batteries Included or OCaml Extlib, that will generically convert any value to a string using all the runtime information it can get. But this won't be able to recover all information; for example, constructor names are lost and become just integers, so it will not look exactly the way you want. You will basically have to write your own conversion functions, or use some tool that will write them for you.
Along the lines of previous answers, ppx_sexp_conv is a PPX for generating S-expression converters (and hence printers) from type definitions. Here's an example of how to use it while using jbuilder as your build system, and using Base and Stdio as your stdlib.
First, the jbuild file which specifies how to do the build:
(jbuild_version 1)
(executables
((names (w))
(libraries (base stdio))
(preprocess (pps (ppx_jane ppx_driver.runner)))
))
And here's the code.
open Base
open Stdio
type t = { a: int; b: float * float }
[@@deriving sexp]
let () =
let x = { a = 3; b = (4.5,5.6) } in
[%sexp (x : t)] |> Sexp.to_string_hum |> print_endline
And when you run it you get this output:
((a 3) (b (4.5 5.6)))
S-expression converters are present throughout Base and all the related libraries (Stdio, Core_kernel, Core, Async, Incremental, etc.), and so you can pretty much count on being able to serialize any data structure you encounter there, as well as anything you define on your own.