Difference between iterators, enumerations and sequences - ocaml

I want to understand what is the difference between iterators, enumerations and sequences in ocaml
enumeration:
type 'a t = {
mutable count : unit -> int; (** Return the number of remaining elements in the enumeration. *)
mutable next : unit -> 'a; (** Return the next element of the enumeration or raise [No_more_elements].*)
mutable clone : unit -> 'a t;(** Return a copy of the enumeration. *)
mutable fast : bool; (** [true] if [count] can be done without reading all elements, [false] otherwise.*)
}
sequence:
type 'a node =
| Nil
| Cons of 'a * 'a t
and 'a t = unit -> 'a node
I don't have any idea about iterators

Enumerations/Generators
BatEnum (what you call "enumeration", but let's use module names instead) is more or less isomorphic to a generator, which is often said pull-based:
generator : unit -> 'a option
This means "Each time you call generator (), I will give you a new element from the collection, until there are no more elements and it returns None". Note that this means previous elements are not accessible. This behavior is called "destructive".
This is similar to the gen library. Such iterators are fundamentally very imperative (they work by maintaining a current state).
Sequences
Pull-based approaches are not necessarily destructive, this is where the Seq type fits. It's a list-like structure, except each node is hidden behind a closure. It's similar to lazy lists, but without the guaranty of persistency. You can manipulate these sequences pretty much like lists, by pattern matching on them.
type 'a node =
| Nil
| Cons of 'a * 'a seq
and 'a seq = unit -> 'a node
Iterators
Iterators such as sequence, also said "push-based", have a type that is similar to the iter function that you find on many data-structures:
iterator : ('a -> unit) -> unit
which means "iterator f will apply the f function to all the elements in the collection`.
What's the difference?
One key difference between pull-based and push-based approaches is their expressivity. Consider that you have two generators, gen1 and gen2, it's easy to add them:
let add gen1 gen2 =
let gen () =
match gen1(), gen2() with
| Some v1, Some v2 -> Some (v1+v2)
| _ -> None
in
gen
However, you can't really write such a function with most push-based approaches such as sequence, since you don't completely control the iteration.
On the flip side, push-based iterators are usually easier to define and are faster.
Recommendation
Starting in OCaml 4.07, Seq is available in the standard library. There is a seq compatibiliy package that you can use right now, and a large library of combinators in the associated oseq library.
Seq is fast, expressive and fairly easy to use, so I recommend using it.

An enumeration is not what you wrote, you just defined a record here. An enumeration is a type that contains multiple constructors but a variable can only pick one value at a time in it (you can see it as the union type in C) :
type enum = One | Two | Three
let e = One
A sequence is, as you write it, simply a recursive enumeration type (in your case you defined what is usually called a list).
To simplify, let's call the special structures that contains some elements of the same type a container (some known containers are arrays, lists, sets, maps etc)
An iterator is a function that applies the same function to each elements of a container. So you would have the map iterator which applies a function to each element but keep the structure as it is (for example, adding 1 to each element of a list l : List.map (fun e -> e + 1) l). The fold operator which applies a function to each element and an accumulator and returns the accumulator (for example, adding each element of a list l and returning the result : List.fold_left (fun acc e -> acc + e) l).
So,
enumeration and sequence : structures
iterators : function over each element of the structures

Related

SML [circularity] error when doing recursion on lists

I'm trying to built a function that zips the 2 given function, ignoring the longer list's length.
fun zipTail L1 L2 =
let
fun helper buf L1 L2 = buf
| helper buf [x::rest1] [y::rest2] = helper ((x,y)::buf) rest1 rest2
in
reverse (helper [] L1 L2)
end
When I did this I got the error message:
Error: right-hand-side of clause doesn't agree with function result type [circularity]
I'm curious as of what a circularity error is and how should I fix this.
There are a number of problems here
1) In helper buf L1 L2 = buf, the pattern buf L1 L2 would match all possible inputs, rendering your next clause (once debugged) redundant. In context, I think that you meant helper buf [] [] = buf, but then you would run into problems of non-exhaustive matching in the case of lists of unequal sizes. The simplest fix would be to move the second clause (the one with x::rest1) into the top line and then have a second pattern to catch the cases in which at least one of the lists are empty.
2) [xs::rest] is a pattern which matches a list of 1 item where the item is a nonempty list. That isn't your attention. You need to use (,) rather than [,].
3) reverse should be rev.
Making these changes, your definition becomes:
fun zipTail L1 L2 =
let
fun helper buf (x::rest1) (y::rest2) = helper ((x,y)::buf) rest1 rest2
| helper buf rest1 rest2 = buf
in
rev (helper [] L1 L2)
end;
Which works as intended.
The error message itself is a bit hard to understand, but you can think of it like this. In
helper buf [x::rest1] [y::rest2] = helper ((x,y)::buf) rest1 rest2
the things in the brackets on the left hand side are lists of lists. So their type would be 'a list list where 'a is the type of x. In x::rest1 the type of rest1 would have to be 'a list Since rest1 also appears on the other side of the equals sign in the same position as [x::rest1] then the type of rest1 would have to be the same as the type of [x::rest1], which is 'a list list. Thus rest1 must be both 'a list and 'a list list, which is impossible.
The circularity comes from if you attempt to make sense of 'a list list = 'a list, you would need a type 'a with 'a = 'a list. This would be a type whose values consists of a list of values of the same type, and the values of the items in that list would have to themselves be lists of elements of the same type ... It is a viscous circle which never ends.
The problem with circularity shows up many other places.
You want (x::rest1) and not [x::rest1].
The problem is a syntactic misconception.
The pattern [foo] will match against a list with exactly one element in it, foo.
The pattern x::rest1 will match against a list with at least one element in it, x, and its (possibly empty) tail, rest1. This is the pattern you want. But the pattern contains an infix operator, so you need to add a parenthesis around it.
The combined pattern [x::rest1] will match against a list with exactly one element that is itself a list with at least one element. This pattern is valid, although overly specific, and does not provoke a type error in itself.
The reason you get a circularity error is that the compiler can't infer what the type of rest1 is. As it occurs on the right-hand side of the :: pattern constructor, it must be 'a list, and as it occurs all by itself, it must be 'a. Trying to unify 'a = 'a list is like finding solutions to the equation x = x + 1.
You might say "well, as long as 'a = 'a list list list list list ... infinitely, like ∞ = ∞ + 1, that's a solution." But the Damas-Hindley-Milner type system doesn't treat this infinite construction as a well-defined type. And creating the singleton list [[[...x...]]] would require an infinite amount of brackets, so it isn't entirely practical anyways.
Some simpler examples of circularity:
fun derp [x] = derp x: This is a simplification of your case where the pattern in the first argument of derp indicates a list, and the x indicates that the type of element in this list must be the same as the type of the list itself.
fun wat x = wat [x]: This is a very similar case where wat takes an argument of type 'a and calls itself with an argument of type 'a list. Naturally, 'a could be an 'a list, but then so must 'a list be an 'a list list, etc.
As I said, you're getting circularity because of a syntactic misconception wrt. list patterns. But circularity is not restricted to lists. They're a product of composed types and self-reference. Here's an example without lists taken from Function which applies its argument to itself?:
fun erg x = x x: Here, x can be thought of as having type 'a to begin with, but seeing it applied as a function to itself, it must also have type 'a -> 'b. But if 'a = 'a -> 'b, then 'a -> b = ('a -> 'b) -> 'b, and ('a -> 'b) -> b = (('a -> 'b) -> b) -> b, and so on. SML compilers are quick to determine that there are no solutions here.
This is not to say that functions with circular types are always useless. As newacct points out, turning purely anonymous functions into recursive ones actually requires this, like in the Y-combinator.
The built-in ListPair.zip
is usually tail-recursive, by the way.

OCaml - Expression was expected of type 'b list

I'm trying to write a function that checks whether a set (denoted by a list) is a subset of another.
I already wrote a helper function that gives me the intersection:
let rec intersect_helper a b =
match a, b with
| [], _ -> []
| _, [] -> []
| ah :: at, bh :: bt ->
if ah > bh then
intersect_helper a bt
else if ah < bh then
intersect_helper at b
else
ah :: intersect_helper at bt
I'm trying to use this inside of the subset function (if A is a subset of B, then A = A intersect B):
let subset a_ b_ =
let a = List.sort_uniq a_
and b = List.sort_uniq b_
in intersect_helper a b;;
Error: This expression has type 'a list -> 'a list but an expression was expected of type 'b list
What exactly is wrong here? I can use intersect_helper perfectly fine by itself, but calling it with lists here does not work. From what I know about 'a, it's just a placeholder for the first argument type. Shouldn't the lists also be of type 'a list?
I'm glad you could solve your own problem, but your code seems exceedingly intricate to me.
If I understood correctly, you want a function that tells whether a list is a subset of another list. Put another way, you want to know whether all elements of list a are present in list b.
Thus, the signature of your function should be
val subset : 'a list -> 'a list -> bool
The standard library comes with a variety of functions to manipulate lists.
let subset l1 l2 =
List.for_all (fun x -> List.mem x l2) l1
List.for_all checks that all elements in a list satisfy a given condition. List.mem checks whether a value is present in a list.
And there you have it. Let's check the results:
# subset [1;2;3] [4;2;3;5;1];;
- : bool = true
# subset [1;2;6] [4;2;3;5;1];;
- : bool = false
# subset [1;1;1] [1;1];; (* Doesn't work with duplicates, though. *)
- : bool = true
Remark: A tiny perk of using List.for_all is that it is a short-circuit operator. That means that it will stop whenever an item doesn't match, which results in better performance overall.
Also, since you specifically asked about sets, the standard library has a module for them. However, sets are a bit more complicated to use because they need you to create new modules using a functor.
module Int = struct
type t = int
let compare = Pervasives.compare
end
module IntSet = Set.Make(Int)
The extra overhead is worth it though, because now IntSet can use the whole Set interface, which includes the IntSet.subset function.
# IntSet.subset (IntSet.of_list [1;2;3]) (IntSet.subset [4;2;3;5;1]);;
- : bool = true
Instead of:
let a = List.sort_uniq a_
Should instead call:
let a = List.sort_uniq compare a_

Walking OCaml tuples of arbitrary depth

I'm trying to understand better the OCaml type inference. I created this example:
let rec f t = match t with
| (l,r) -> (f l)+(f r)
| _ -> 1
and I want to apply it on any binary tuple (pair) with nested pairs, to obtain the total number of leafs. Example: f ((1,2),3)
The function f refuses to compile, because a contradiction in types at (f l): "This expression has type 'a but an expression was expected of type 'a * 'b".
Question: 'a being any type, could not also be a pair, or else be handled by the _ case? Is any method to walk tuples of arbitrary depth without converting them to other data structures, such as variants?
PS: In C++ I would solve this kind of problem by creating two template functions "f", one to handle tuples and one other types.
There is a way to do this, although I wouldn't recommend it to a new user due to the resulting complexities. You should get used to writing regular OCaml first.
That said, you can walk arbitrary types in a generic way by capturing the necessary structure as a GADT. For this simple problem it is quite easy:
type 'a ty =
| Pair : 'a ty * 'b ty -> ('a * 'b) ty
| Other : 'a ty
let rec count_leaves : type a . a -> a ty -> int =
fun a ty ->
match ty with
| Pair (ta, tb) -> count_leaves (fst a) ta + count_leaves (snd a) tb
| Other -> 1
Notice how the pattern matching on the a ty here corresponds to the pattern matching on values in your (poorly typed) example function.
More useful functions could be written with a more complete type representation, although the machinery becomes heavy and complicated once arbitrary tuples, records, sum types, etc have to be supported.
Any combination of tuples will have a value shape completely described by it's type (because there is no "choice" in the type structure) - hence the "number of leaves" question can be answered completely statically at compile-time. Once you have a function operating on such type - this function is fixed to operate on that specific type (and shape) only.
If you want to build a tree that can have different shapes (but same type - hence can be handled by same function) - you need to add variants to the mix, i.e. classic type 'a tree = Leaf of 'a | Node of 'a tree * 'a tree, or any other type that describes value with some dynamic "choice" of shape.

SML: How can I pass a function a list and return the list with all negative reals removed?

Here's what I've got so far...
fun positive l1 = positive(l1,[],[])
| positive (l1, p, n) =
if hd(l1) < 0
then positive(tl(l1), p, n # [hd(l1])
else if hd(l1) >= 0
then positive(tl(l1), p # [hd(l1)], n)
else if null (h1(l1))
then p
Yes, this is for my educational purposes. I'm taking an ML class in college and we had to write a program that would return the biggest integer in a list and I want to go above and beyond that to see if I can remove the positives from it as well.
Also, if possible, can anyone point me to a decent ML book or primer? Our class text doesn't explain things well at all.
You fail to mention that your code doesn't type.
Your first function clause just has the variable l1, which is used in the recursive. However here it is used as the first element of the triple, which is given as the argument. This doesn't really go hand in hand with the Hindley–Milner type system that SML uses. This is perhaps better seen by the following informal thoughts:
Lets start by assuming that l1 has the type 'a, and thus the function must take arguments of that type and return something unknown 'a -> .... However on the right hand side you create an argument (l1, [], []) which must have the type 'a * 'b list * 'c list. But since it is passed as an argument to the function, that must also mean that 'a is equal to 'a * 'b list * 'c list, which clearly is not the case.
Clearly this was not your original intent. It seems that your intent was to have a function that takes an list as argument, and then at the same time have a recursive helper function, which takes two extra accumulation arguments, namely a list of positive and negative numbers in the original list.
To do this, you at least need to give your helper function another name, such that its definition won't rebind the definition of the original function.
Then you have some options, as to which scope this helper function should be in. In general if it doesn't make any sense to be calling this helper function other than from the "main" function, then it should not be places in a scope outside the "main" function. This can be done using a let binding like this:
fun positive xs =
let
fun positive' ys p n = ...
in
positive' xs [] []
end
This way the helper function positives' can't be called outside of the positive function.
With this take care of there are some more issues with your original code.
Since you are only returning the list of positive integers, there is no need to keep track of the
negative ones.
You should be using pattern matching to decompose the list elements. This way you eliminate the
use of taking the head and tail of the list, and also the need to verify whether there actually is
a head and tail in the list.
fun foo [] = ... (* input list is empty *)
| foo (x::xs) = ... (* x is now the head, and xs is the tail *)
You should not use the append operator (#), whenever you can avoid it (which you always can).
The problem is that it has a terrible running time when you have a huge list on the left hand
side and a small list on the right hand side (which is often the case for the right hand side, as
it is mostly used to append a single element). Thus it should in general be considered bad
practice to use it.
However there exists a very simple solution to this, which is to always concatenate the element
in front of the list (constructing the list in reverse order), and then just reversing the list
when returning it as the last thing (making it in expected order):
fun foo [] acc = rev acc
| foo (x::xs) acc = foo xs (x::acc)
Given these small notes, we end up with a function that looks something like this
fun positive xs =
let
fun positive' [] p = rev p
| positive' (y::ys) p =
if y < 0 then
positive' ys p
else
positive' ys (y :: p)
in
positive' xs []
end
Have you learned about List.filter? It might be appropriate here - it takes a function (which is a predicate) of type 'a -> bool and a list of type 'a list, and returns a list consisting of only the elements for which the predicate evaluates to true. For example:
List.filter (fn x => Real.>= (x, 0.0)) [1.0, 4.5, ~3.4, 42.0, ~9.0]
Your existing code won't work because you're comparing to integers using the intversion of <. The code hd(l1) < 0 will work over a list of int, not a list of real. Numeric literals are not automatically coerced by Standard ML. One must explicitly write 0.0, and use Real.< (hd(l1), 0.0) for your test.
If you don't want to use filter from the standard library, you could consider how one might implement filter yourself. Here's one way:
fun filter f [] = []
| filter f (h::t) =
if f h
then h :: filter f t
else filter f t

Difficulty thinking of properties for FsCheck

I've managed to get xUnit working on my little sample assembly. Now I want to see if I can grok FsCheck too. My problem is that I'm stumped when it comes to defining test properties for my functions.
Maybe I've just not got a good sample set of functions, but what would be good test properties for these functions, for example?
//transforms [1;2;3;4] into [(1,2);(3,4)]
pairs : 'a list -> ('a * 'a) list //'
//splits list into list of lists when predicate returns
// true for adjacent elements
splitOn : ('a -> 'a -> bool) -> 'a list -> 'a list list
//returns true if snd is bigger
sndBigger : ('a * 'a) -> bool (requires comparison)
There are already plenty of specific answers, so I'll try to give some general answers which might give you some ideas.
Inductive properties for recursive functions. For simple functions, this amounts probably to re-implementing the recursion. However, keep it simple: while the actual implementation more often than not evolves (e.g. it becomes tail-recursive, you add memoization,...) keep the property straightforward. The ==> property combinator usually comes in handy here. Your pairs function might make a good example.
Properties that hold over several functions in a module or type. This is usually the case when checking abstract data types. For example: adding an element to an array means that the array contains that element. This checks the consistency of Array.add and Array.contains.
Round trips: this is good for conversions (e.g. parsing, serialization) - generate an arbitrary representation, serialize it, deserialize it, check that it equals the original.
You may be able to do this with splitOn and concat.
General properties as sanity checks. Look for generally known properties that may hold - things like commutativity, associativity, idempotence (applying something twice does not change the result), reflexivity, etc. The idea here is more to exercise the function a bit - see if it does anything really weird.
As a general piece of advice, try not to make too big a deal out of it. For sndBigger, a good property would be:
let ``should return true if and only if snd is bigger`` (a:int) (b:int) =
sndBigger (a,b) = b > a
And that is probably exactly the implementation. Don't worry about it - sometimes a simple, old fashioned unit test is just what you need. No guilt necessary! :)
Maybe this link (by the Pex team) also gives some ideas.
I'll start with sndBigger - it is a very simple function, but you can write some properties that should hold about it. For example, what happens when you reverse the values in the tuple:
// Reversing values of the tuple negates the result
let swap (a, b) = (b, a)
let prop_sndBiggerSwap x =
sndBigger x = not (sndBigger (swap x))
// If two elements of the tuple are same, it should give 'false'
let prop_sndBiggerEq a =
sndBigger (a, a) = false
EDIT: This rule prop_sndBiggerSwap doesn't always hold (see comment by kvb). However the following should be correct:
// Reversing values of the tuple negates the result
let prop_sndBiggerSwap a b =
if a <> b then
let x = (a, b)
sndBigger x = not (sndBigger (swap x))
Regarding the pairs function, kvb already posted some good ideas. In addition, you could check that turning the transformed list back into a list of elements returns the original list (you'll need to handle the case when the input list is odd - depending on what the pairs function should do in this case):
let prop_pairsEq (x:_ list) =
if (x.Length%2 = 0) then
x |> pairs |> List.collect (fun (a, b) -> [a; b]) = x
else true
For splitOn, we can test similar thing - if you concatenate all the returned lists, it should give the original list (this doesn't verify the splitting behavior, but it is a good thing to start with - it at least guarantees that no elements will be lost).
let prop_splitOnEq f x =
x |> splitOn f |> List.concat = x
I'm not sure if FsCheck can handle this though (!) because the property takes a function as an argument (so it would need to generate "random functions"). If this doesn't work, you'll need to provide a couple of more specific properties with some handwritten function f. Next, implementing the check that f returns true for all adjacent pairs in the splitted lists (as kvb suggests) isn't actually that difficult:
let prop_splitOnAdjacentTrue f x =
x |> splitOn f
|> List.forall (fun l ->
l |> Seq.pairwise
|> Seq.forall (fun (a, b) -> f a b))
Probably the only last thing that you could check is that f returns false when you give it the last element from one list and the first element from the next list. The following isn't fully complete, but it shows the way to go:
let prop_splitOnOtherFalse f x =
x |> splitOn f
|> Seq.pairwise
|> Seq.forall (fun (a, b) -> lastElement a = firstElement b)
The last sample also shows that you should check whether the splitOn function can return an empty list as part of the returned list of results (because in that case, you couldn't find first/last element).
For some code (e.g. sndBigger), the implementation is so simple that any property will be at least as complex as the original code, so testing via FsCheck may not make sense. However, for the other two functions here are some things that you could check:
pairs
What's expected when the original length is not divisible by two? You could check for throwing an exception if that's the correct behavior.
List.map fst (pairs x) = evenEntries x and List.map snd (pairs x) = oddEntries x for simple functions evenEntries and oddEntries which you can write.
splitOn
If I understand your description of how the function is supposed to work, then you could check conditions like "For every list in the result of splitOn f l, no two consecutive entries satisfy f" and "Taking lists (l1,l2) from splitOn f l pairwise, f (last l1) (first l2) holds". Unfortunately, the logic here will probably be comparable in complexity to the implementation itself.