Implement and represent polyadic operations - ocaml

I'm not sure I even know how to ask this question .. In implementing a compiler I would like to allow the client to specify, say, folds on tuples. I have provided a way to curry and uncurry a function, but only because I wrote a binary operator in Ocaml and folded it over the term and type representations. The user could not write this function.
In the macro processor, the user can write this function because tuples are lists.
For curried functions, the user can easily write transformers, because the term is binary in both the target language and the Ocaml representation of the term and the typing.
But they can't do this for tuples. Here's another example: the user easily defines serial functional composition operator. But the user cannot define parallel composition: the binary version:
f1: D1 -> C1, f2: D2-> C2 --> f1 * f2: D1 * D2 -> C1 * C2
is easily written, but cannot be extended to 3 terms: here a fold would compute
f1 * (f2 * f3)
instead of
f1 * f2 * f3
[isomorphic but not equal]
The generalisation of this question is "How do I implement a polyadic programming language" which is a bit too much to ask here. What I tried to do was provide a builtin transformer:
curry: T1 * T2 * T3 ... --> T1 -> T2 -> ...
uncurry: T1 -> T2 -> .. T1 * T2 * T3
so then the user could just do folds with a binary operator:
uncurry (fold user_op (uncurry term))
but this is neither general enough nor works so well.. :)
I guess an equivalent question for Haskell would be: since Haskell has no n-ary products, the n-ary tuple constructors is simulated in the library with n functions, where each one has to be written out by hand. This clearly sucks. How would this be fixed?
[I mean, it is trivial to write a Python script to generate those n functions up to some limit n, so why is it so hard to do this in a well typed way inside the language?]

There are two components that collaborate to cause this issue:
Tuples are not auto-flattened - the parentheses in type expressions do more than group, they create distinct types that are joined by further tuples. This leads to your observation that a * (b * c) is isomorphic but not equivalent to a * b * c.
The type system does not provide a means to express algebra on tuple types. By this I mean that the type system does not have a cons operator or any equivalent for tuples; there is no way to express that a type has more or fewer tupled elements than another type.
The result is that there is no way to express the type of a function that operates on tuples of arbitrary length.
So, the short summary is that the OCaml type system lacks mechanisms to express the type of the function you are trying to write, and therefore you cannot write the function (setting aside nasty games with Obj; you could write the function, but you could not express its type to use it in a type-safe fashion).

There are basically two options.
You can make your language untyped or weakly typed. In C for example, tuples (of integers, say) can be represented as int*. Something would need to keep track of the tuples' lengths but the type system won't. I assume you do not want to go that way.
For a type-safe language, you need a very expressive type system. Namely, you need types that depend on values. The members of which are functions that return types. For example the tuple type constructor (not to be confused with the tuple constructor) could have type tuple : int -> Type -> Type.
An operation that extends a tuple could have type forall n:int. forall T:Type. tuple n T -> T -> tuple (n+1) T. Such type systems come at a cost: Typically, type inference is not recursive and type checking is only decidable if you are willing to annotate all kinds of subexpressions with their type (The annotations in the forall parts above are only a hint of what that would entail.).
This options seems to be overkill if all you want to achieve is polyadicity, though.
Disclaimer: My knowledge on type theory is a bit dated.

Related

Is there a function that can make a string representation of any type?

I was desperately looking for the last hour for a method in the OCaml Library which converts an 'a to a string:
'a -> string
Is there something in the library which I just haven't found? Or do I have to do it different (writing everything by my own)?
It is not possible to write a printing function show of type 'a -> string in OCaml.
Indeed, types are erased after compilation in OCaml. (They are in fact erased after the typechecking which is one of the early phase of the compilation pipeline).
Consequently, a function of type 'a -> _ can either:
ignore its argument:
let f _ = "<something>"
peek at the memory representation of a value
let f x = if Obj.is_block x then "<block>" else "<immediate>"
Even peeking at the memory representation of a value has limited utility since many different types will share the same memory representation.
If you want to print a type, you need to create a printer for this type. You can either do this by hand using the Fmt library (or the Format module in the standard library)
type tree = Leaf of int | Node of { left:tree; right: tree }
let pp ppf tree = match tree with
| Leaf d -> Fmt.fp ppf "Leaf %d" d
| Node n -> Fmt.fp ppf "Node { left:%a; right:%a}" pp n.left pp n.right
or by using a ppx (a small preprocessing extension for OCaml) like https://github.com/ocaml-ppx/ppx_deriving.
type tree = Leaf of int | Node of { left:tree; right: tree } [##deriving show]
If you just want a quick hacky solution, you can use dump from theBatteries library. It doesn't work for all cases, but it does work for primitives, lists, etc. It accesses the underlying raw memory representation, hence is able to overcome (to some extent) the difficulties mentioned in the other answers.
You can use it like this (after installing it via opam install batteries):
# #require "batteries";;
# Batteries.dump 1;;
- : string = "1"
# Batteries.dump 1.2;;
- : string = "1.2"
# Batteries.dump [1;2;3];;
- : string = "[1; 2; 3]"
If you want a more "proper" solution, use ppx_deriving as recommended by #octachron. It is much more reliable/maintainable/customizable.
What you are looking for is a meaningful function of type 'a. 'a -> string, with parametric polymorphism (i.e. a single function that can operate the same for all possible types 'a, even those that didn’t exist when the function was created). This is not possible in OCaml. Here are explications depending on your programming background.
Coming from Haskell
If you were expecting such a function because you are familiar with the Haskell function show, then notice that its type is actually show :: Show a => a -> String. It uses an instance of the typeclass Show a, which is implicitly inserted by the compiler at call sites. This is not parametric polymorphism, this is ad-hoc polymorphism (show is overloaded, if you want). There is no such feature in OCaml (yet? there are projects for the future of the language, look for “modular implicits” or “modular explicits”).
Coming from OOP
If you were expecting such a function because you are familiar with OO languages in which every value is an object with a method toString, then this is not the case of OCaml. OCaml does not use the object model pervasively, and run-time representation of OCaml values retains no (or very few) notion of type. I refer you to #octachron’s answer.
Again, toString in OOP is not parametric polymorphism but overloading: there is not a single method toString which is defined for all possible types. Instead there are multiple — possibly very different — implementations of a method of the same name. In some OO languages, programmers try to follow the discipline of implementing a method by that name for every class they define, but it is only a coding practice. One could very well create objects that do not have such a method.
[ Actually, the notions involved in both worlds are pretty similar: Haskell requires an instance of a typeclass Show a providing a function show; OOP requires an object of a class Stringifiable (for instance) providing a method toString. Or, of course, an instance/object of a descendent typeclass/class. ]
Another possibility is to use https://github.com/ocaml-ppx/ppx_deriving with will create the function of Path.To.My.Super.Type.t -> string you can then use with your value. However you still need to track the path of the type by hand but it is better than nothing.
Another project provide feature similar to Batterie https://github.com/reasonml/reason-native/blob/master/src/console/README.md (I haven't tested Batterie so can't give opinion) They have the same limitation: they introspect the runtime encoding so can't get something really useable. I think it was done with windows/browser in mind so if cross plat is required I will test this one before (unless batterie is already pulled). and even if the code source is in reason you can use with same API in OCaml.

Haskell - Tuples and Lists as built-in types: how are they actually declared?

In chapter 2 of "A gentle introduction to Haskell", user-defined types are explained and then the notion that built-in types are, apart from a special syntax, no more different than user-defined ones:
Earlier we introduced several "built-in" types such as lists, tuples, integers, and characters. We have also shown how new user-defined types can be defined. Aside from special syntax, are the built-in types in any way more special than the user-defined ones? The answer is no. (The special syntax is for convenience and for consistency with historical convention, but has no semantic consequences.)
So you could define a tuple like to the following:
data (a,b) = (a,b)
data (a,b,c) = (a,b,c)
data (a,b,c,d) = (a,b,c,d)
Which sure you cannot because that would require an infinite number of declarations. So how are these types actually implemented? Especially regarding the fact that only against a type declaration you can pattern-match?
Since GHC is open source, we can just look at it:
The tuples are a lot less magical than you think:
From https://github.com/ghc/ghc/blob/master/libraries/ghc-prim/GHC/Tuple.hs
data (a,b) = (a,b)
data (a,b,c) = (a,b,c)
data (a,b,c,d) = (a,b,c,d)
data (a,b,c,d,e) = (a,b,c,d,e)
data (a,b,c,d,e,f) = (a,b,c,d,e,f)
data (a,b,c,d,e,f,g) = (a,b,c,d,e,f,g)
-- and so on...
So, tuples with different arities are just different data types, and tuples with very big number of arities is not supported.
Lists are also around there:
From https://github.com/ghc/ghc/blob/master/libraries/ghc-prim/GHC/Types.hs#L101
data [] a = [] | a : [a]
But there is a little bit of magic (special syntax) for lists.
Note: I know that GitHub is not where GHC is developed, but searching "ghc source code" on Google did not yield the correct page, and GitHub was the easiest.
You defined three tuple types there, not one hence your argument with the infinite number of declarations doesn't cut. A standard confoming Haskell needs to support only a finite number of tuple types. Hence finitely many declarations.
In fact, you can define:
data Pair a b = Pair a b
and this is isomorphic to an ordinary 2-tuple.

What's the difference between records and tuples in OCaml

Is there any difference between records and tuples that is not just a syntactic difference ?
Is there a performance difference ?
Is the implementation the same for tuples and records ?
Do you have examples of things that can be done using tuples but not with records (and
conversely) ?
Modulo syntax they are almost the same. The main semantic difference is that tuples are structural types, while records are nominal types. That implies e.g. that records can be recursive while tuples cannot (at least not without the -rectypes option):
type t = {a : int, b : unit -> t} (* fine *)
type u = int * (unit -> u) (* error *)
Moreover, records can have mutable fields, tuples can't.
FWIW, in OCaml's sister language SML, tuples are records. That is, in SML (a,b,c) is just syntactic sugar for {1=a,2=b,3=c}, and records are structural types as well.
Floats fields in float-only records or arrays are stored unboxed, while no such optimization applies to tuples. If you are storing a lot of floats and only floats, it is important to use records -- and you can gain by splitting a mixed float/other datastructure to have an internal float-only record.
The other differences are at the type level, and were already described by Andreas -- records are generative while tuples pre-exist and have a structural semantics. If you want structural records with polymorphic accessors, you can use object types.

Why prefer currying to tuple arguments in OCaml?

"Introduction to Caml" says
Note, in Caml it is better to use Curried function definitions for multiple-argument functions, not tuples.
when comparing 'a -> 'b -> 'c calling conventions to 'a * 'b -> 'c.
When working with SML/NJ I got used to using tuple types for both input and output : ('a * 'b) -> ('c * 'd) so using tuples to express multiple inputs seems symmetric with the way I express multiple outputs.
Why is currying recommended for OCaml function declarations over tuple arguments? Is it just the greater flexibility that comes with allowing currying/partial evaluation, or is there some other benefit that derives from implementation details of the OCaml compiler?
I think a lot of it is convention -- standard library functions in OCaml are curried, whereas in Standard ML they are generally not except for some higher-order functions. However, there is one difference baked into the language: the operators (e.g. (*)) are curried in OCaml (e.g. int -> int -> int); whereas they are uncurried in Standard ML (e.g. op* can be (int * int) -> int). Because of that, built-in higher-order functions (e.g. fold) also take a function that is curried in OCaml and uncurried in Standard ML; that means for your function to work with that, you need to follow the respective convention, and it follows from there.
Yes, it is mainly the notational convenience and the flexibility to do partial application. Curried functions are idiomatic in OCaml, and the compiler is likely to optimise them somewhat better than tupled functions (whereas SML compilers typically optimise for tuples).
The pros of tupling are the argument/result symmetry you mention (which is especially useful when composing functions) and perhaps the notational familiarity (at least for people coming from the non-functional world).
Some comment about the optimisation in OCaml.
In OCaml, i noticed that tuples are always allocated when passing them as argument. Even if allocating in the primary heap is fast in ocaml, it is of course longer than doing nothing. Thus each time you pass a tuple as argument, there is some time spent to the allocation and filling of the tuple.
I expected that the ocaml compiler would be optimizing cases where it isn't needed to build the tuple. For instance, when you inline the called function, you may only use the tuple components and not the tuple itself. Thus the tuple could just be ignored. Unfortunately in this case OCaml doesn't remove the useless tuple and still perform the allocation. For this reason, it may not be a good idea to use tuples in critical sections of code.

Keeping type generic without η-expansion

What I'm doing: I'm writing a small interpreter system that can parse a file, turn it into a sequence of operations, and then feed thousands of data sets into that sequence to extract some final value from each. A compiled interpreter consists of a list of pure functions that take two arguments: a data set, and an execution context. Each function returns the modified execution context:
type ('data, 'context) interpreter = ('data -> 'context -> 'context) list
The compiler is essentially a tokenizer with a final token-to-instruction mapping step that uses a map description defined as follows:
type ('data, 'context) map = (string * ('data -> 'context -> 'context)) list
Typical interpreter usage looks like this:
let pocket_calc =
let map = [ "add", (fun d c -> c # add d) ;
"sub", (fun d c -> c # sub d) ;
"mul", (fun d c -> c # mul d) ]
in
Interpreter.parse map "path/to/file.txt"
let new_context = Interpreter.run pocket_calc data old_context
The problem: I'd like my pocket_calc interpreter to work with any class that supports add, sub and mul methods, and the corresponding data type (could be integers for one context class and floating-point numbers for another).
However, pocket_calc is defined as a value and not a function, so the type system does not make its type generic: the first time it's used, the 'data and 'context types are bound to the types of whatever data and context I first provide, and the interpreter becomes forever incompatible with any other data and context types.
A viable solution is to eta-expand the definition of the interpreter to allow its type parameters to be generic:
let pocket_calc data context =
let map = [ "add", (fun d c -> c # add d) ;
"sub", (fun d c -> c # sub d) ;
"mul", (fun d c -> c # mul d) ]
in
let interpreter = Interpreter.parse map "path/to/file.txt" in
Interpreter.run interpreter data context
However, this solution is unacceptable for several reasons:
It re-compiles the interpreter every time it's called, which significantly degrades performance. Even the mapping step (turning a token list into a interpreter using the map list) causes a noticeable slowdown.
My design relies on all interpreters being loaded at initialization time, because the compiler issues warnings whenever a token in the loaded file does not match a line in the map list, and I want to see all those warnings when the software launches (not when individual interpreters are eventually run).
I sometimes want to reuse a given map list in several interpreters, whether on its own or by prepending additional instructions (for instance, "div").
The questions: is there any way to make the type parametric other than eta-expansion? Maybe some clever trick involving module signatures or inheritance? If that's impossible, is there any way to alleviate the three issues I have mentioned above in order to make eta-expansion an acceptable solution? Thank you!
A viable solution is to eta-expand the
definition of the interpreter to allow
its type parameters to be generic:
let pocket_calc data context =
let map = [ "add", (fun d c -> c # add d) ;
"sub", (fun d c -> c # sub d) ;
"mul", (fun d c -> c # mul d) ]
in
let interpreter = Interpreter.parse map "path/to/file.txt" in
Interpreter.run interpreter data context
However, this solution is unacceptable
for several reasons:
It re-compiles the interpreter every time it's called, which
significantly degrades performance.
Even the mapping step (turning a token
list into a interpreter using the map
list) causes a noticeable slowdown.
It recompiles the interpreter every time because you are doing it wrong. The proper form is more something like this (and technically, if the partial interpretation of Interpreter.run to interpreter can do some computations, you should move it out of the fun too).
let pocket_calc =
let map = [ "add", (fun d c -> c # add d) ;
"sub", (fun d c -> c # sub d) ;
"mul", (fun d c -> c # mul d) ]
in
let interpreter = Interpreter.parse map "path/to/file.txt" in
fun data context -> Interpreter.run interpreter data context
I think your problem lies in a lack of polymorphism in your operations, which you would like to have a closed parametric type (works for all data supporting the following arithmetic primitives) instead of having a type parameter representing a fixed data type.
However, it's a bit difficult to ensure it's exactly this, because your code is not self-contained enough to test it.
Assuming the given type for primitives :
type 'a primitives = <
add : 'a -> 'a;
mul : 'a -> 'a;
sub : 'a -> 'a;
>
You can use the first-order polymorphism provided by structures and objects :
type op = { op : 'a . 'a -> 'a primitives -> 'a }
let map = [ "add", { op = fun d c -> c # add d } ;
"sub", { op = fun d c -> c # sub d } ;
"mul", { op = fun d c -> c # mul d } ];;
You get back the following data-agnostic type :
val map : (string * op) list
Edit: regarding your comment about different operation types, I'm not sure which level of flexibility you want. I don't think you could mix operations over different primitives in the same list, and still benefit from the specifities of each : at best, you could only transform an "operation over add/sub/mul" into an "operation over add/sub/mul/div" (as we're contravariant in the primitives type), but certainly not much.
On a more pragmatic level, it's true that, with that design, you need a different "operation" type for each primitives type. You could easily, however, build a functor parametrized by the primitives type and returning the operation type.
I don't know how one would expose a direct subtyping relation between different primitive types. The problem is that this would need a subtyping relation at the functor level, which I don't think we have in Caml. You could, however, using a simpler form of explicit subtyping (instead of casting a :> b, use a function a -> b), build second functor, contravariant, that, given a map from a primitive type to the other, would build a map from one operation type to the other.
It's entirely possible that, with a different and clever representation of the type evolved, a much simpler solution is possible. First-class modules of 3.12 might also come in play, but they tend to be helpful for first-class existential types, whereas here we rhater use universal types.
Interpretive overhead and operation reifications
Besides your local typing problem, I'm not sure you're heading the right way. You're trying to eliminate interpretive overhead by building, "ahead of time" (before using the operations), a closure corresponding to a in-language representation of your operation.
In my experience, this approach doesn't generally get rid of interpretive overhead, it rather moves it to another layer. If you create your closures naïvely, you will have the parsing flow of control reproduced at the closure layer : the closure will call other closures, etc., as your parsing code "interpreted" the input when creating the closure. You eliminated the cost of parsing, but the possibly suboptimal flow of control is still the same. Additionnaly, closures tend to be a pain to manipulate directly : you have to be very careful about comparison operations for example, serialization, etc.
I think you may be interested in the long term in an intermediate "reified" language representing your operations : a simple algebraic data type for arithmetic operations, that you would build from your textual representation. You can still try to build closures "ahead of time" from it, though I'm not sure the performances are much better than directly interpreting it, if the in-memory representation is decent. Moreover, it will be much easier to plug in intermediary analyzers/transformers to optimize your operations, for example going from an "associative binary operations" model to a "n-ary operations" model, which could be more efficiently evaluated.