Walking OCaml tuples of arbitrary depth - ocaml

I'm trying to understand better the OCaml type inference. I created this example:
let rec f t = match t with
| (l,r) -> (f l)+(f r)
| _ -> 1
and I want to apply it to any binary tuple (pair) with nested pairs, to obtain the total number of leaves. Example: f ((1,2),3)
The function f refuses to compile because of a type contradiction at (f l): "This expression has type 'a but an expression was expected of type 'a * 'b".
Question: since 'a can be any type, couldn't it also be a pair, or else be handled by the _ case? Is there any method to walk tuples of arbitrary depth without converting them to other data structures, such as variants?
PS: In C++ I would solve this kind of problem by creating two template functions "f", one to handle tuples and one other types.

There is a way to do this, although I wouldn't recommend it to a new user due to the resulting complexities. You should get used to writing regular OCaml first.
That said, you can walk arbitrary types in a generic way by capturing the necessary structure as a GADT. For this simple problem it is quite easy:
type 'a ty =
| Pair : 'a ty * 'b ty -> ('a * 'b) ty
| Other : 'a ty
let rec count_leaves : type a . a -> a ty -> int =
fun a ty ->
match ty with
| Pair (ta, tb) -> count_leaves (fst a) ta + count_leaves (snd a) tb
| Other -> 1
Notice how the pattern matching on the a ty here corresponds to the pattern matching on values in your (poorly typed) example function.
More useful functions could be written with a more complete type representation, although the machinery becomes heavy and complicated once arbitrary tuples, records, sum types, etc have to be supported.
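As a usage sketch (the witness values shape, n and m below are written by hand for illustration and are not part of the original answer), the caller passes a ty value that mirrors the shape of the tuple:

```ocaml
(* Type representation from the answer: Pair describes a pair,
   Other stands for anything we treat as a leaf. *)
type 'a ty =
  | Pair : 'a ty * 'b ty -> ('a * 'b) ty
  | Other : 'a ty

let rec count_leaves : type a. a -> a ty -> int =
 fun a ty ->
  match ty with
  | Pair (ta, tb) -> count_leaves (fst a) ta + count_leaves (snd a) tb
  | Other -> 1

(* Hand-written witness for the shape ((_, _), _). *)
let shape = Pair (Pair (Other, Other), Other)

(* Walks both levels of nesting and counts three leaves. *)
let n = count_leaves ((1, 2), 3) shape

(* With Other as the witness, the whole tuple is a single leaf. *)
let m = count_leaves ((1, 2), 3) Other
```

The result depends on the witness, not on the value alone: the same tuple counts as three leaves or as one, depending on how much structure the ty value describes.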

Any combination of tuples will have a value shape completely described by its type (because there is no "choice" in the type structure) - hence the "number of leaves" question can be answered completely statically at compile-time. Once you have a function operating on such a type, this function is fixed to operate on that specific type (and shape) only.
If you want to build a tree that can have different shapes (but same type - hence can be handled by same function) - you need to add variants to the mix, i.e. classic type 'a tree = Leaf of 'a | Node of 'a tree * 'a tree, or any other type that describes value with some dynamic "choice" of shape.
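A minimal sketch of that variant-based approach, using the tree type quoted above (the function name leaves and the sample value are only illustrative):

```ocaml
type 'a tree = Leaf of 'a | Node of 'a tree * 'a tree

(* One function handles every shape, because all shapes share one type. *)
let rec leaves = function
  | Leaf _ -> 1
  | Node (l, r) -> leaves l + leaves r

(* The tree counterpart of the tuple ((1,2),3). *)
let example = Node (Node (Leaf 1, Leaf 2), Leaf 3)
let count = leaves example
```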

Related

Difference between iterators, enumerations and sequences

I want to understand what is the difference between iterators, enumerations and sequences in ocaml
enumeration:
type 'a t = {
mutable count : unit -> int; (** Return the number of remaining elements in the enumeration. *)
mutable next : unit -> 'a; (** Return the next element of the enumeration or raise [No_more_elements].*)
mutable clone : unit -> 'a t;(** Return a copy of the enumeration. *)
mutable fast : bool; (** [true] if [count] can be done without reading all elements, [false] otherwise.*)
}
sequence:
type 'a node =
| Nil
| Cons of 'a * 'a t
and 'a t = unit -> 'a node
I don't have any idea about iterators
Enumerations/Generators
BatEnum (what you call "enumeration", but let's use module names instead) is more or less isomorphic to a generator, which is often called pull-based:
generator : unit -> 'a option
This means "Each time you call generator (), I will give you a new element from the collection, until there are no more elements and it returns None". Note that this means previous elements are not accessible. This behavior is called "destructive".
This is similar to the gen library. Such iterators are fundamentally very imperative (they work by maintaining a current state).
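To make the destructive behaviour concrete, here is a small sketch of a generator over a list. The helper gen_of_list is a hypothetical name chosen for this example; it is not taken from BatEnum or gen.

```ocaml
(* Builds a generator of type unit -> 'a option from a list.
   The remaining elements live in a mutable reference. *)
let gen_of_list l =
  let rest = ref l in
  fun () ->
    match !rest with
    | [] -> None
    | x :: tl ->
        rest := tl;
        Some x

let g = gen_of_list [ 1; 2 ]

(* Each call consumes one element; earlier ones are gone afterwards. *)
let first = g ()
let second = g ()
let third = g ()
```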
Sequences
Pull-based approaches are not necessarily destructive; this is where the Seq type fits. It's a list-like structure, except each node is hidden behind a closure. It's similar to lazy lists, but without the guarantee of persistence. You can manipulate these sequences pretty much like lists, by pattern matching on them.
type 'a node =
| Nil
| Cons of 'a * 'a seq
and 'a seq = unit -> 'a node
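As an illustration of pattern matching on such a sequence, the sketch below uses the standard library's Seq module (OCaml 4.07 or later). The helpers nats and take are names made up for this example.

```ocaml
(* An infinite sequence n, n+1, n+2, ... Each node is built on demand. *)
let rec nats n : int Seq.t = fun () -> Seq.Cons (n, nats (n + 1))

(* Forces at most n nodes and collects their elements into a list. *)
let rec take n (s : 'a Seq.t) =
  if n <= 0 then []
  else
    match s () with
    | Seq.Nil -> []
    | Seq.Cons (x, rest) -> x :: take (n - 1) rest

let from_zero = nats 0

(* Traversing twice gives the same result: nothing is consumed. *)
let a = take 3 from_zero
let b = take 3 from_zero
```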
Iterators
Iterators such as sequence, also called "push-based", have a type that is similar to the iter function that you find on many data-structures:
iterator : ('a -> unit) -> unit
which means "iterator f will apply the f function to all the elements in the collection".
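A small sketch of a push-based iterator, assuming nothing beyond the type above. The names iter_of_list and sum are hypothetical helpers written for this example.

```ocaml
(* Wraps a list as a push-based iterator of type ('a -> unit) -> unit. *)
let iter_of_list l : ('a -> unit) -> unit = fun f -> List.iter f l

(* The consumer only supplies a callback; the iterator drives the loop. *)
let sum (it : (int -> unit) -> unit) =
  let acc = ref 0 in
  it (fun x -> acc := !acc + x);
  !acc

let total = sum (iter_of_list [ 1; 2; 3 ])
```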
What's the difference?
One key difference between pull-based and push-based approaches is their expressivity. Consider that you have two generators, gen1 and gen2, it's easy to add them:
let add gen1 gen2 =
let gen () =
match gen1(), gen2() with
| Some v1, Some v2 -> Some (v1+v2)
| _ -> None
in
gen
However, you can't really write such a function with most push-based approaches such as sequence, since you don't completely control the iteration.
On the flip side, push-based iterators are usually easier to define and are faster.
Recommendation
Starting in OCaml 4.07, Seq is available in the standard library. There is a seq compatibility package that you can use right now, and a large library of combinators in the associated oseq library.
Seq is fast, expressive and fairly easy to use, so I recommend using it.
An enumeration is not what you wrote, you just defined a record here. An enumeration is a type that contains multiple constructors but a variable can only pick one value at a time in it (you can see it as the union type in C) :
type enum = One | Two | Three
let e = One
A sequence is, as you write it, simply a recursive enumeration type (in your case you defined what is usually called a list).
To simplify, let's call the special structures that contain some elements of the same type a container (some known containers are arrays, lists, sets, maps etc)
An iterator is a function that applies the same function to each element of a container. So you would have the map iterator, which applies a function to each element but keeps the structure as it is (for example, adding 1 to each element of a list l: List.map (fun e -> e + 1) l), and the fold operator, which applies a function to each element and an accumulator and returns the accumulator (for example, adding up the elements of a list l and returning the result: List.fold_left (fun acc e -> acc + e) 0 l).
So,
enumeration and sequence : structures
iterators : function over each element of the structures
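For instance, with a concrete list (the values are arbitrary), the two standard library functions mentioned in this answer behave as follows. Note that List.fold_left takes the initial accumulator before the list.

```ocaml
let l = [ 1; 2; 3 ]

(* map keeps the list structure and transforms each element. *)
let incremented = List.map (fun e -> e + 1) l

(* fold_left threads an accumulator (starting at 0) through the list. *)
let total = List.fold_left (fun acc e -> acc + e) 0 l
```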

SML [circularity] error when doing recursion on lists

I'm trying to build a function that zips the 2 given lists, ignoring the extra elements of the longer list.
fun zipTail L1 L2 =
let
fun helper buf L1 L2 = buf
| helper buf [x::rest1] [y::rest2] = helper ((x,y)::buf) rest1 rest2
in
reverse (helper [] L1 L2)
end
When I did this I got the error message:
Error: right-hand-side of clause doesn't agree with function result type [circularity]
I'm curious as of what a circularity error is and how should I fix this.
There are a number of problems here
1) In helper buf L1 L2 = buf, the pattern buf L1 L2 would match all possible inputs, rendering your next clause (once debugged) redundant. In context, I think that you meant helper buf [] [] = buf, but then you would run into problems of non-exhaustive matching in the case of lists of unequal sizes. The simplest fix would be to move the second clause (the one with x::rest1) into the top line and then have a second pattern to catch the cases in which at least one of the lists is empty.
2) [xs::rest] is a pattern which matches a list of 1 item where the item is a nonempty list. That isn't your intention. You need to use parentheses ( ) rather than brackets [ ].
3) reverse should be rev.
Making these changes, your definition becomes:
fun zipTail L1 L2 =
let
fun helper buf (x::rest1) (y::rest2) = helper ((x,y)::buf) rest1 rest2
| helper buf rest1 rest2 = buf
in
rev (helper [] L1 L2)
end;
Which works as intended.
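For comparison with the OCaml material elsewhere on this page, here is a rough OCaml transcription of the corrected function. It is only a sketch: the name zip_tail and the use of List.rev are choices made for this example, not part of the SML answer.

```ocaml
let zip_tail l1 l2 =
  (* buf accumulates the pairs in reverse order. *)
  let rec helper buf l1 l2 =
    match (l1, l2) with
    | x :: rest1, y :: rest2 -> helper ((x, y) :: buf) rest1 rest2
    | _, _ -> buf (* at least one list is exhausted *)
  in
  List.rev (helper [] l1 l2)

let zipped = zip_tail [ 1; 2; 3 ] [ "a"; "b" ]
```

As in the SML version, the cons pattern is written x :: rest1 without list brackets around it.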
The error message itself is a bit hard to understand, but you can think of it like this. In
helper buf [x::rest1] [y::rest2] = helper ((x,y)::buf) rest1 rest2
the things in the brackets on the left hand side are lists of lists. So their type would be 'a list list where 'a is the type of x. In x::rest1 the type of rest1 would have to be 'a list. Since rest1 also appears on the other side of the equals sign in the same position as [x::rest1] then the type of rest1 would have to be the same as the type of [x::rest1], which is 'a list list. Thus rest1 must be both 'a list and 'a list list, which is impossible.
The circularity comes from the fact that if you attempt to make sense of 'a list list = 'a list, you would need a type 'a with 'a = 'a list. This would be a type whose values consist of a list of values of the same type, and the values of the items in that list would have to themselves be lists of elements of the same type ... It is a vicious circle which never ends.
The problem with circularity shows up many other places.
You want (x::rest1) and not [x::rest1].
The problem is a syntactic misconception.
The pattern [foo] will match against a list with exactly one element in it, foo.
The pattern x::rest1 will match against a list with at least one element in it, x, and its (possibly empty) tail, rest1. This is the pattern you want. But the pattern contains an infix operator, so you need to add parentheses around it.
The combined pattern [x::rest1] will match against a list with exactly one element that is itself a list with at least one element. This pattern is valid, although overly specific, and does not provoke a type error in itself.
The reason you get a circularity error is that the compiler can't infer what the type of rest1 is. As it occurs on the right-hand side of the :: pattern constructor, it must be 'a list, and as it occurs all by itself, it must be 'a. Trying to unify 'a = 'a list is like finding solutions to the equation x = x + 1.
You might say "well, as long as 'a = 'a list list list list list ... infinitely, like ∞ = ∞ + 1, that's a solution." But the Damas-Hindley-Milner type system doesn't treat this infinite construction as a well-defined type. And creating the singleton list [[[...x...]]] would require an infinite amount of brackets, so it isn't entirely practical anyways.
Some simpler examples of circularity:
fun derp [x] = derp x: This is a simplification of your case where the pattern in the first argument of derp indicates a list, and the x indicates that the type of element in this list must be the same as the type of the list itself.
fun wat x = wat [x]: This is a very similar case where wat takes an argument of type 'a and calls itself with an argument of type 'a list. Naturally, 'a could be an 'a list, but then so must 'a list be an 'a list list, etc.
As I said, you're getting circularity because of a syntactic misconception wrt. list patterns. But circularity is not restricted to lists. It is a product of composed types and self-reference. Here's an example without lists taken from Function which applies its argument to itself?:
fun erg x = x x: Here, x can be thought of as having type 'a to begin with, but seeing it applied as a function to itself, it must also have type 'a -> 'b. But if 'a = 'a -> 'b, then 'a -> 'b = ('a -> 'b) -> 'b, and ('a -> 'b) -> 'b = (('a -> 'b) -> 'b) -> 'b, and so on. SML compilers are quick to determine that there are no solutions here.
This is not to say that functions with circular types are always useless. As newacct points out, turning purely anonymous functions into recursive ones actually requires this, like in the Y-combinator.
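One common way to give self-application a type without infinite type expressions is to hide the circularity behind a datatype constructor. The sketch below does this in OCaml (not SML); the names self, Fold, unfold, fix and fact are all invented for this illustration, and it is one possible encoding rather than the construction the answer refers to.

```ocaml
(* 'a self wraps a function that accepts a wrapped copy of itself.
   The recursion lives in the datatype, so no infinite type is needed. *)
type 'a self = Fold of ('a self -> 'a)

let unfold (Fold f) = f

(* A fixed-point combinator: "x x" becomes "unfold x x".
   The eta-expansion (fun v -> ...) delays evaluation under strictness. *)
let fix f =
  let g x = f (fun v -> unfold x x v) in
  g (Fold g)

(* Factorial written without let rec. *)
let fact = fix (fun self n -> if n = 0 then 1 else n * self (n - 1))
let result = fact 5
```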
The built-in ListPair.zip is usually tail-recursive, by the way.

How to declare a hasEq constraint?

I'm just starting out with F*, by which I mean I've written a few lines along with the tutorial. So far it's really interesting and I'd like to keep learning.
The first thing I tried to do on my own was to write a type that represents a non-empty list. This was my attempt:
type nonEmptyList 'a = l : (list 'a) { l <> [] }
But I get the error
Failed to verify implicit argument: Subtyping check failed; expected
type (a#6468:Type{(hasEq a#0)}); got type Type
I know I'm on the right track though because, if I constrain my list type to containing strings, this does work:
type nonEmptyList = l : (list string) { l <> [] }
I'm assuming this means that l <> [] in the original example isn't valid because I haven't specified that 'a should support equality. The problem is that I cannot for the life of me figure out how to do that. I guess it has something to do with a higher kind called hasEq, but trying things such as:
type nonEmptyList 'a = l : (list 'a) { hasEq 'a /\ l <> [] }
hasn't gotten me anywhere. The tutorial doesn't cover hasEq and I can't find anything helpful in the examples in the GitHub repo so now I'm stuck.
You correctly identified the problem here. The type 'a that you used in the definition of nonEmptyList is left unspecified and therefore might not support equality. Your intuition is correct: you need to tell F* that 'a is a type that has equality, by adding a refinement on it. To do that, you can write the following:
type nonEmptyList (a:Type{hasEq a}) = l : (list a) { l <> [] }
Note that the binder I used for the type is a and not 'a. Using 'a there would cause a syntax error, and a plain name makes more sense because it isn't "any" type anymore.
Also, note that you can be even more precise and specify the universe of the type a as Type0 if need be.
Your analysis is indeed correct, and the accepted answer gives the right solution in general.
For your concrete example, though, you don't need decidable equality on list elements: you can just use l:(list 'a){ ~ (List.isEmpty l) }.
For reference, here's the definition of isEmpty:
(** [isEmpty l] returns [true] if and only if [l] is empty *)
val isEmpty: list 'a -> Tot bool
let isEmpty l = match l with
| [] -> true
| _ -> false

OCaml variance (+'a, -'a) and invariance

After writing this piece of code
module type TS = sig
type +'a t
end
module T : TS = struct
type 'a t = {info : 'a list}
end
I realised I needed info to be mutable.
I wrote, then :
module type TS = sig
type +'a t
end
module T : TS = struct
type 'a t = {mutable info : 'a list}
end
But, surprise,
Type declarations do not match:
type 'a t = { mutable info : 'a list; }
is not included in
type +'a t
Their variances do not agree.
Oh, I remember hearing about variance. It was something about covariance and contravariance. I'm a brave person, I'll find about my problem alone!
I found these two interesting articles (here and here) and I understood!
I can write
module type TS = sig
type (-'a, +'b) t
end
module T : TS = struct
type ('a, 'b) t = 'a -> 'b
end
But then I wondered. How come that mutable datatypes are invariant and not just covariant?
I mean, I understand that an 'A list can be considered as a subtype of an ('A | 'B) list because my list can't change. Same thing for a function, if I have a function of type 'A | 'B -> 'C it can be considered as a subtype of a function of type 'A -> 'C | 'D because if my function can handle 'A and 'B's it can handle only 'A's and if I only return 'C's I can for sure expect 'C or 'D's (but I'll only get 'C's).
But for an array? If I have an 'A array I can't consider it as an ('A | 'B) array because if I modify an element in the array putting a 'B then my array type is wrong because it truly is an ('A | 'B) array and not an 'A array anymore. But what about a ('A | 'B) array as an 'A array. Yes, it would be strange because my array can contain 'B but strangely I thought it was the same as a function. Maybe, in the end, I didn't understand everything but I wanted to put my thoughts on it here because it took me long to understand it.
TL;DR :
persistent : +'a
functions : -'a
mutable : invariant ('a) ? Why can't I force it to be -'a ?
I think that the easiest explanation is that a mutable value has two intrinsic operations: getter and setter, that are expressed using field access and field set syntaxes:
type 'a t = {mutable data : 'a}
let x = {data = 42}
(* getter *)
x.data
(* setter *)
x.data <- 56
The getter has type 'a t -> 'a, where the type variable 'a occurs on the right-hand side of the arrow (so it imposes a covariance constraint), and the setter has type 'a t -> 'a -> unit, where the type variable occurs to the left of the arrow, which imposes a contravariance constraint. So we have a type that would have to be both covariant and contravariant, which means that the type variable 'a is invariant.
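Written out as ordinary functions, the two operations look like this. The names get and set are introduced only for this sketch; the annotations spell out the types discussed above.

```ocaml
type 'a t = { mutable data : 'a }

(* 'a appears in the result: a covariant use. *)
let get (x : 'a t) : 'a = x.data

(* 'a appears as an argument: a contravariant use. *)
let set (x : 'a t) (v : 'a) : unit = x.data <- v

let x = { data = 42 }
let () = set x 56
let after = get x
```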
Your type t basically allows two operations: getting and setting. Informally, getting has type 'a t -> 'a list and setting has type 'a t -> 'a list -> unit. Combined, 'a occurs both in a positive and in a negative position.
[EDIT: The following is a (hopefully) clearer version of what I wrote in the first place. I consider it superior, so I deleted the previous version.]
I will try to make it more explicit. Suppose sub is a proper subtype of super and witness is some value of type super which is not a value of type sub. Now let f : sub -> unit be some function which fails on the value witness. Type safety is there to ensure that witness is never passed to f. I will show by example that type safety fails if one is allowed to either treat sub t as a subtype of super t, or the other way around.
let v_super = ({ info = [witness]; } : super t) in
let v_sub = ( v_super : sub t ) in (* Suppose this was allowed. *)
List.map f v_sub.info (* Equivalent to f witness. Woops. *)
So treating super t as a subtype of sub t cannot be allowed. This shows covariance, which you already knew. Now for contravariance.
let v_sub = ({ info = []; } : sub t) in
let v_super = ( v_sub : super t ) in (* Suppose this was allowed. *)
v_super.info <- [witness];
(* As v_sub and v_super are the same thing,
we have v_sub.info=[witness] once more. *)
List.map f v_sub.info (* Woops again. *)
So, treating sub t as a subtype of super t cannot be allowed either, showing contravariance. Together, 'a t is invariant.
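A concrete way to experiment with this is to use polymorphic variants, where OCaml has real subtyping and explicit coercions. In the sketch below, the types sub, super and cell and all values are assumptions made for the example; they mirror the names used in this answer, not any code from the question.

```ocaml
type sub = [ `A ]
type super = [ `A | `B ]

(* Immutable lists are covariant: sub list coerces to super list. *)
let xs : sub list = [ `A ]
let ys = (xs : sub list :> super list)

(* Functions are contravariant in their argument:
   a function on super can be used as a function on sub. *)
let f : super -> int = function `A -> 1 | `B -> 2
let g = (f : super -> int :> sub -> int)

(* A record with a mutable field is invariant. *)
type 'a cell = { mutable info : 'a list }

let c : sub cell = { info = [ `A ] }

(* The type checker rejects both directions:
   (c : sub cell :> super cell)
   ({ info = [] } : super cell :> sub cell) *)
```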

OCaml: List mapping a function with 2 inputs

I have a function sqrt which takes 2 floating point values, tolerance and number and gives out square root of the number within the specified tolerance. I use approximation method to do it.
let rec sqrt_rec approx tol number =
..................;;
let sqrt tol x = sqrt_rec (x/.2.0) tol x;;
I've another function map which takes a function and a list and applies the function to all elements of the list.
let rec map f l =
match l with
[] -> []
| h::t -> f h::map f t;;
Now I'm trying to create another function all_sqrt which basically takes 1 floating point value, 1 floating point list and maps function sqrt to all the elements.
let all_sqrt tol_value ip_list = List.map sqrt tol_value ip_list;;
It is obviously giving me error. I tried making tol_value also a list but it still throws up error.
Error: This function is applied to too many arguments;
maybe you forgot a `;'
I believe i'm doing mapping wrong.
The List module contains
val map2 : ('a -> 'b -> 'c) -> 'a list -> 'b list -> 'c list
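which is used like this (note that both arguments after the function must be lists of the same length, so tol_value would have to be a float list here):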
let all_sqrt tol_value ip_list = List.map2 sqrt tol_value ip_list
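To show what List.map2 expects, here is a small example with an arbitrary two-argument function in place of sqrt. It combines the two lists element by element and raises Invalid_argument if their lengths differ.

```ocaml
(* Both list arguments must have the same length. *)
let sums = List.map2 (fun a b -> a +. b) [ 1.0; 2.0 ] [ 10.0; 20.0 ]

(* Mismatched lengths raise Invalid_argument at run time. *)
let mismatch_detected =
  try
    ignore (List.map2 (fun a b -> a +. b) [ 1.0 ] [ 10.0; 20.0 ]);
    false
  with Invalid_argument _ -> true
```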
This sounds like homework, since you say you are limited to certain functions in your solution. So I'll try to give just some suggestions, not an answer.
You want to use the same tolerance for all the values in your list. Imagine if there was a way to combine the tolerance with your sqrt function to produce a new function that takes just one parameter. You have something of the type float -> float -> float, and you somehow want to supply just the first float. This would give you back a function of type float -> float.
(As Wes pointed out, this works because your sqrt function is defined in Curried form.)
All I can say is that FP languages like OCaml (and Haskell) are exceptionally good at doing exactly this. In fact, it's kind of hard not to do it as long as you mind the precedences of various things. (I.e., think about the parentheses.)
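The idea described above is partial application. The sketch below shows it with a hypothetical stand-in for the square root function: approx_sqrt and its Newton iteration are assumptions made for this example, since the body of sqrt_rec is not shown in the question.

```ocaml
(* Hypothetical stand-in for the question's sqrt: Newton's method,
   stopping once approx * approx is within tol of number. *)
let approx_sqrt tol number =
  let rec go approx =
    if abs_float ((approx *. approx) -. number) < tol then approx
    else go ((approx +. (number /. approx)) /. 2.0)
  in
  go (number /. 2.0)

(* (approx_sqrt tol_value) has type float -> float,
   which is exactly what List.map expects. *)
let all_sqrt tol_value ip_list = List.map (approx_sqrt tol_value) ip_list

let roots = all_sqrt 1e-9 [ 4.0; 9.0 ]
```

The parentheses around approx_sqrt tol_value matter: without them, List.map would receive three separate arguments, which is the error reported in the question.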
I don't know O'Caml, but I do know Haskell, and it looks to me like you are applying map to 3 arguments: "sqrt tol_value ip_list". map only takes two arguments and is of the type ('a -> 'b) -> 'a list -> 'b list, which means it accepts a function (functions only take one input and return one output) and a list, and returns a new list.
http://en.wikipedia.org/wiki/Currying