OCaml optimization of List.length for list literals? - ocaml

Does the OCaml compiler at any point make any effort to optimize calls to List.length for list literals (or other lists for that matter)?
I have written a trivial program (shown below) and compiled with ocamlopt with varying levels of optimization. Reviewing the generated assembly suggests that each time List.length is called, it's an O(n) operation. One would think, given that lists are immutable, the compiler could readily determine the length of a list literal as that is known at compile time, and store that such that List.length becomes O(1).
let a = [1; 2; 3; 4]
let () =
let p i = Printf.printf "%d\n" i in
p (List.length a);
p (List.length a)
A) Am I really bad at picking apart assembly or specifying optimization levels and this is available?
B) If it's not an optimization the compiler does, why not?

No the compiler is not doing any special effort to optimize List.length.
There is simply no reason to do so: list literals are a syntactic sugar that only exists in the parser, they are immediately desugared into the constructor form
let l = (::)(1, (::)(2,(::)(3,(::)(4,[]))))
Moreover, constant propagation and inlining are enough to optimize the call to List.length a to the constant 4 as you can see this with
(* test.ml *)
let test =
let a = [1;2;3;4] in
List.length a
ocamlopt -O3 test.ml -dflambda -inline-max-unroll 2
End of middle end:
let_symbol (camlR (Block (tag 0, 4)))
End camlR

Related

Function that converts a sequence to a list in OCaml

I want to convert a sequence to a list using List.init. I want at each step to retrieve the i th value of s.
let to_list s =
let n = length s in
List.init n
(fun _i ->
match s () with
| Nil -> assert false
| Cons (a, sr) -> a)
This is giving me a list initialized with the first element of s only. Is it possible in OCaml to initialize the list with all the values of s?
It may help to study the definition of List.init.
There are two variations depending on the size of the list: a tail recursive one, init_tailrec_aux, whose result is in reverse order, and a basic one, init_aux. They have identical results, so we need only look at init_aux:
let rec init_aux i n f =
if i >= n then []
else
let r = f i in
r :: init_aux (i+1) n f
This function recursively increments a counter i until it reaches a limit n. For each value of the counter that is strictly less than the limit, it adds the value given by f i to the head of the list being produced.
The question now is, what does your anonymous function do when called with different values of i?:
let f_anon =
(fun _i -> match s () with
|Nil -> assert false
|Cons(a, sr) -> a)
Regardless of _i, it always gives the head of the list produced by s (), and if s () always returns the same list, then f_anon 0 = f_anon 1 = f_anon 2 = f_anon 3 = hd (s ()).
Jeffrey Scofield's answer describes a technique for giving a different value at each _i, and I agree with his suggestion that List.init is not the best solution for this problem.
The essence of the problem is that you're not saving sr, which would let you retrieve the next element of the sequence.
However, the slightly larger problem is that List.init passes only an int as an argument to the initialization function. So even if you did keep track of sr, there's no way it can be passed to your initialization function.
You can do what you want using the impure parts of OCaml. E.g., you could save sr in a global reference variable at each step and retrieve it in the next call to the initialization function. However, this is really quite a cumbersome way to produce your list.
I would suggest not using List.init. You can write a straightforward recursive function to do what you want. (If you care about tail recursion, you can write a slightly less straightforward function.)
using a recursive function will increase the complexity so i think that initializing directly the list (or array) at the corresponding length will be better but i don't really know how to get a different value at each _i like Jeffrey Scofield said i am not really familiar with ocaml especially sequences so i have some difficulties doing that:(

How do I write a function to create a circular version of a list in OCaml?

Its possible to create infinite, circular lists using let rec, without needing to resort to mutable references:
let rec xs = 1 :: 0 :: xs ;;
But can I use this same technique to write a function that receives a finite list and returns an infinite, circular version of it? I tried writing
let rec cycle xs =
let rec result = go xs and
go = function
| [] -> result
| (y::ys) -> y :: go ys in
result
;;
But got the following error
Error: This kind of expression is not allowed as right-hand side of `let rec'
Your code has two problems:
result = go xs is in illegal form for let rec
The function tries to create a loop by some computation, which falls into an infinite loop causing stack overflow.
The above code is rejected by the compiler because you cannot write an expression which may cause recursive computation in the right-hand side of let rec (see Limitations of let rec in OCaml).
Even if you fix the issue you still have a problem: cycle does not finish the job:
let rec cycle xs =
let rec go = function
| [] -> go xs
| y::ys -> y :: g ys
in
go xs;;
cycle [1;2];;
cycle [1;2] fails due to stack overflow.
In OCaml, let rec can define a looped structure only when its definition is "static" and does not perform any computation. let rec xs = 1 :: 0 :: xs is such an example: (::) is not a function but a constructor, which purely constructs the data structure. On the other hand, cycle performs some code execution to dynamically create a structure and it is infinite. I am afraid that you cannot write a function like cycle in OCaml.
If you want to introduce some loops in data like cycle in OCaml, what you can do is using lazy structure to prevent immediate infinite loops like Haskell's lazy list, or use mutation to make a loop by a substitution. OCaml's list is not lazy nor mutable, therefore you cannot write a function dynamically constructs looped lists.
If you do not mind using black magic, you could try this code:
let cycle l =
if l = [] then invalid_arg "cycle" else
let l' = List.map (fun x -> x) l in (* copy the list *)
let rec aux = function
| [] -> assert false
| [_] as lst -> (* find the last cons cell *)
(* and set the last pointer to the beginning of the list *)
Obj.set_field (Obj.repr lst) 1 (Obj.repr l')
| _::t -> aux t
in aux l'; l'
Please be aware that using the Obj module is highly discouraged. On the other hand, there are industrial-strength programs and libraries (Coq, Jane Street's Core, Batteries included) that are known to use this sort of forbidden art.
camlspotter's answer is good enough already. I just want to add several more points here.
First of all, for the problem of write a function that receives a finite list and returns an infinite, circular version of it, it can be done in code / implementation level, just if you really use the function, it will have stackoverflow problem and will never return.
A simple version of what you were trying to do is like this:
let rec circle1 xs = List.rev_append (List.rev xs) (circle1 xs)
val circle: 'a list -> 'a list = <fun>
It can be compiled and theoretically it is correct. On [1;2;3], it is supposed to generate [1;2;3;1;2;3;1;2;3;1;2;3;...].
However, of course, it will fail because its run will be endless and eventually stackoverflow.
So why let rec circle2 = 1::2::3::circle2 will work?
Let's see what will happen if you do it.
First, circle2 is a value and it is a list. After OCaml get this info, it can create a static address for circle2 with memory representation of list.
The memory's real value is 1::2::3::circle2, which actually is Node (1, Node (2, Node (3, circle2))), i.e., A Node with int 1 and address of a Node with int 2 and address of a Node with int 3 and address of circle2. But we already know circle2's address, right? So OCaml just put circle2's address there.
Everything will work.
Also, through this example, we can also know a fact that for a infinite circled list defined like this actually doesn't cost limited memory. It is not generating a real infinite list to consume all memory, instead, when a circle finishes, it just jumps "back" to the head of the list.
Let's then go back to example of circle1. Circle1 is a function, yes, it has an address, but we do not need or want it. What we want is the address of the function application circle1 xs. It is not like circle2, it is a function application which means we need to compute something to get the address. So,
OCaml will do List.rev xs, then try to get address circle1 xs, then repeat, repeat.
Ok, then why we sometimes get Error: This kind of expression is not allowed as right-hand side of 'let rec'?
From http://caml.inria.fr/pub/docs/manual-ocaml/extn.html#s%3aletrecvalues
the let rec binding construct, in addition to the definition of
recursive functions, also supports a certain class of recursive
definitions of non-functional values, such as
let rec name1 = 1 :: name2 and name2 = 2 :: name1 in expr which
binds name1 to the cyclic list 1::2::1::2::…, and name2 to the cyclic
list 2::1::2::1::…Informally, the class of accepted definitions
consists of those definitions where the defined names occur only
inside function bodies or as argument to a data constructor.
If you use let rec to define a binding, say let rec name. This name can be only in either a function body or a data constructor.
In previous two examples, circle1 is in a function body (let rec circle1 = fun xs -> ...) and circle2 is in a data constructor.
If you do let rec circle = circle, it will give error as circle is not in the two allowed cases. let rec x = let y = x in y won't do either, because again, x not in constructor or function.
Here is also a clear explanation:
https://realworldocaml.org/v1/en/html/imperative-programming-1.html
Section Limitations of let rec

A better way than counting the length of a list of units

I sometimes find myself writing code like this:
someFunc :: Foo -> Int
someFunc foo = length $ do
x <- someList
guard someGuard
return ()
Or equivalently:
someFunc foo = length [() | x <- someList, someGuard]
Is there a better way to perform this sort of computation? More efficient? More readable? More idiomatic?
Primo
guard someGuard
return ()
is redundant, guard already returns () if the condition is true. Then I suppose someGuard actually depends on x, otherwise it would be if someGuard then length someList else 0. The usual way to write it is
someFunc foo = filter (\x -> someGuard) someList
if the situation is really as simple as your example looks. For more complicated situations, using one of your example styles is the most direct way. I find the do-notation preferable if things get really complicated.
If you find yourself repeatedly programming to a pattern, the thing to do is write a higher-order function to encapsulate that pattern. You could use the body you have, but in order to be utterly confident that your code is not allocating, I would recommend to use foldl and strict application of an increment operator:
numberSatisfying :: Integral n => (a -> Bool) -> [a] -> n
numberSatisfying p = foldl (\n x -> if p x then (+1) $! n else n) 0
I have used QuickCheck to confirm this code equivalent to your original code. (And yes, it is pretty cool that QuickCheck will test with random predicates.)

List comprehension vs high-order functions in F#

I come from SML background and feel quite comfortable with high-order functions. But I don't really get the idea of list comprehension. Is there any situation where list comprehension is more suitable than high-order functions on List and vice versa?
I heard somewhere that list comprehension is slower than high-order functions, should I avoid to use it when writing performance-critical functions?
For the example' sake, take a look at Projecting a list of lists efficiently in F# where #cfern's answer contains two versions using list comprehension and high-order functions respectively:
let rec cartesian = function
| [] -> [[]]
| L::Ls -> [for C in cartesian Ls do yield! [for x in L do yield x::C]]
and:
let rec cartesian2 = function
| [] -> [[]]
| L::Ls -> cartesian2 Ls |> List.collect (fun C -> L |> List.map (fun x->x::C))
Choosing between comprehensions and higher-order functions is mostly a matter of style. I think that comprehensions are sometimes more readable, but that's just a personal preference. Note that the cartesian function could be written more elegantly like this:
let rec cartesian = function
| [] -> [[]]
| L::Ls ->
[ for C in cartesian Ls do for x in L do yield x::C ]
The interesting case is when writing recursive functions. If you use sequences (and sequence comprehensions), they remove some unnecessary allocation of temporary lists and if you use yield! in a tail-call position, you can also avoid stack overflow exceptions:
let rec nums n =
if n = 100000 then []
else n::(nums (n+1))
// throws StackOverflowException
nums 0
let rec nums n = seq {
if n < 100000 then
yield n
yield! nums (n+1) }
// works just fine
nums 0 |> List.ofSeq
This is quite an interesting pattern, because it cannot be written in the same way using lists. When using lists, you cannot return some element and then make a recursive call, because it corresponds to n::(nums ...), which is not tail-recursive.
Looking at the generated code in ILSpy, you can see that list comprehensions are compiled to state machines (like methods using yield return in C#), then passed to something like List.ofSeq. Higher-order functions, on the other hand, are hand-coded, and frequently use mutable state or other imperative constructs to be as efficient as possible. As is often the case, the general-purpose mechanism is more expensive.
So, to answer your question, if performance is critical there is usually a higher-order function specific to your problem that should be preferred.
Adding to Tomas Petricek's answer. You can make the list version tail recursive.
let nums3 n =
let rec nums3internal acc n =
if n = 100000 then
acc
else
nums3internal (n::acc) (n+1) //Tail Call Optimization possible
nums3internal [] n |> List.rev
nums3 0
With the added benefit of a considerable speedup. At least when I measured with the stopwatch tool I get. (nums2 being the algorithm using Seq).
Nums2 takes 81.225500ms
Nums3 takes 4.948700ms
For higher numbers this advantage shrinks, because List.rev is inefficient. E.g. for 10000000 I get:
Nums2 takes 11054.023900ms
Nums3 takes 8256.693100ms

Difficulty thinking of properties for FsCheck

I've managed to get xUnit working on my little sample assembly. Now I want to see if I can grok FsCheck too. My problem is that I'm stumped when it comes to defining test properties for my functions.
Maybe I've just not got a good sample set of functions, but what would be good test properties for these functions, for example?
//transforms [1;2;3;4] into [(1,2);(3,4)]
pairs : 'a list -> ('a * 'a) list //'
//splits list into list of lists when predicate returns
// true for adjacent elements
splitOn : ('a -> 'a -> bool) -> 'a list -> 'a list list
//returns true if snd is bigger
sndBigger : ('a * 'a) -> bool (requires comparison)
There are already plenty of specific answers, so I'll try to give some general answers which might give you some ideas.
Inductive properties for recursive functions. For simple functions, this amounts probably to re-implementing the recursion. However, keep it simple: while the actual implementation more often than not evolves (e.g. it becomes tail-recursive, you add memoization,...) keep the property straightforward. The ==> property combinator usually comes in handy here. Your pairs function might make a good example.
Properties that hold over several functions in a module or type. This is usually the case when checking abstract data types. For example: adding an element to an array means that the array contains that element. This checks the consistency of Array.add and Array.contains.
Round trips: this is good for conversions (e.g. parsing, serialization) - generate an arbitrary representation, serialize it, deserialize it, check that it equals the original.
You may be able to do this with splitOn and concat.
General properties as sanity checks. Look for generally known properties that may hold - things like commutativity, associativity, idempotence (applying something twice does not change the result), reflexivity, etc. The idea here is more to exercise the function a bit - see if it does anything really weird.
As a general piece of advice, try not to make too big a deal out of it. For sndBigger, a good property would be:
let ``should return true if and only if snd is bigger`` (a:int) (b:int) =
sndBigger (a,b) = b > a
And that is probably exactly the implementation. Don't worry about it - sometimes a simple, old fashioned unit test is just what you need. No guilt necessary! :)
Maybe this link (by the Pex team) also gives some ideas.
I'll start with sndBigger - it is a very simple function, but you can write some properties that should hold about it. For example, what happens when you reverse the values in the tuple:
// Reversing values of the tuple negates the result
let swap (a, b) = (b, a)
let prop_sndBiggerSwap x =
sndBigger x = not (sndBigger (swap x))
// If two elements of the tuple are same, it should give 'false'
let prop_sndBiggerEq a =
sndBigger (a, a) = false
EDIT: This rule prop_sndBiggerSwap doesn't always hold (see comment by kvb). However the following should be correct:
// Reversing values of the tuple negates the result
let prop_sndBiggerSwap a b =
if a <> b then
let x = (a, b)
sndBigger x = not (sndBigger (swap x))
Regarding the pairs function, kvb already posted some good ideas. In addition, you could check that turning the transformed list back into a list of elements returns the original list (you'll need to handle the case when the input list is odd - depending on what the pairs function should do in this case):
let prop_pairsEq (x:_ list) =
if (x.Length%2 = 0) then
x |> pairs |> List.collect (fun (a, b) -> [a; b]) = x
else true
For splitOn, we can test similar thing - if you concatenate all the returned lists, it should give the original list (this doesn't verify the splitting behavior, but it is a good thing to start with - it at least guarantees that no elements will be lost).
let prop_splitOnEq f x =
x |> splitOn f |> List.concat = x
I'm not sure if FsCheck can handle this though (!) because the property takes a function as an argument (so it would need to generate "random functions"). If this doesn't work, you'll need to provide a couple of more specific properties with some handwritten function f. Next, implementing the check that f returns true for all adjacent pairs in the splitted lists (as kvb suggests) isn't actually that difficult:
let prop_splitOnAdjacentTrue f x =
x |> splitOn f
|> List.forall (fun l ->
l |> Seq.pairwise
|> Seq.forall (fun (a, b) -> f a b))
Probably the only last thing that you could check is that f returns false when you give it the last element from one list and the first element from the next list. The following isn't fully complete, but it shows the way to go:
let prop_splitOnOtherFalse f x =
x |> splitOn f
|> Seq.pairwise
|> Seq.forall (fun (a, b) -> lastElement a = firstElement b)
The last sample also shows that you should check whether the splitOn function can return an empty list as part of the returned list of results (because in that case, you couldn't find first/last element).
For some code (e.g. sndBigger), the implementation is so simple that any property will be at least as complex as the original code, so testing via FsCheck may not make sense. However, for the other two functions here are some things that you could check:
pairs
What's expected when the original length is not divisible by two? You could check for throwing an exception if that's the correct behavior.
List.map fst (pairs x) = evenEntries x and List.map snd (pairs x) = oddEntries x for simple functions evenEntries and oddEntries which you can write.
splitOn
If I understand your description of how the function is supposed to work, then you could check conditions like "For every list in the result of splitOn f l, no two consecutive entries satisfy f" and "Taking lists (l1,l2) from splitOn f l pairwise, f (last l1) (first l2) holds". Unfortunately, the logic here will probably be comparable in complexity to the implementation itself.