It seems that OCaml Batteries has a comprehension syntax:
http://en.wikipedia.org/wiki/List_comprehension#OCaml
However, which module should I include to use this syntax? I already open Batteries, but it doesn't work. Or is there a more idiomatic way to do list comprehension? I can use List.map and BatList.remove_if to achieve similar results, but that is much less elegant.
Currently there are two libraries in OCaml that provide list comprehension: one was formerly a part of OCaml Batteries, the other is shipped with camlp4. Neither is widely used, and I personally do not recommend using either.
For list comprehension to work, you need to change the syntax of the language. This can be done by preprocessing your program, written in an extended syntax, with a camlp4 preprocessor. Also, list comprehension is not a first-class citizen in the OCaml community, and it is not well supported by modern toolkits. Still, you can easily play with it in a toplevel; for that, you need to install the list comprehension package:
opam install pa_comprehension
and load it into a toplevel using the following directives:
# #use "topfind";;
# #camlp4o;;
# #require "pa_comprehension";;
# open Batteries;;
# [? 2 * x | x <- 0 -- max_int ; x * x > 3 ?];;
But again, my personal opinion is that list comprehension is not the best way to structure your code.
Life without comprehension
The example you provided can be expressed using the core_kernel Sequence module (an analog of the Batteries Enum):
let f n =
  Sequence.(range 0 n |>
            filter ~f:(fun x -> x * x > 3) |>
            map ~f:(fun x -> x * 2))
Since filter |> map is such a common idiom, there is a dedicated filter_map function:
let f n =
  Sequence.(range 0 n |>
            filter_map ~f:(fun x ->
                if x * x > 3 then Some (x * 2) else None))
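For readers without core_kernel, roughly the same lazy pipeline can be sketched with the standard library's Seq module (assuming OCaml 4.07 or later). The range and take helpers below are hypothetical, written by hand for this sketch; they are not Stdlib functions. Because the sequence is lazy, asking for the first few elements of a range reaching up to max_int, as in the comprehension example, stays cheap:

```ocaml
(* Lazily enumerate lo, lo + 1, ..., hi - 1 (hypothetical helper, not in Stdlib). *)
let rec range lo hi () =
  if lo >= hi then Seq.Nil else Seq.Cons (lo, range (lo + 1) hi)

(* filter, then map, mirroring the first Core version *)
let f n =
  range 0 n
  |> Seq.filter (fun x -> x * x > 3)
  |> Seq.map (fun x -> x * 2)

(* the same pipeline fused into a single filter_map *)
let g n =
  range 0 n
  |> Seq.filter_map (fun x -> if x * x > 3 then Some (x * 2) else None)

(* Collect at most k elements of a sequence into a list (hypothetical helper). *)
let rec take k s =
  if k <= 0 then []
  else
    match s () with
    | Seq.Nil -> []
    | Seq.Cons (x, rest) -> x :: take (k - 1) rest

let () =
  (* Only the handful of elements actually demanded are ever computed. *)
  take 5 (g max_int) |> List.iter (Printf.printf "%d ");
  print_newline ()
```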
You may notice that these examples take more code than a list comprehension. But as soon as your programs start to mature from simple hello-world applications with integers to something more sophisticated, you will agree that using explicit iterators is more readable and comprehensible.
Also, since the libraries in Core are so consistent, you can use a plain List instead of Sequence just by replacing one module name with the other. But of course, List is eager, unlike Sequence, so playing with max_int using lists is not a good idea.
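A sketch of the eager counterpart with Stdlib lists (assuming OCaml 4.08 or later for List.filter_map; f_list is a hypothetical name). List.init builds the whole list before anything is filtered, which is why the max_int experiment does not carry over to lists:

```ocaml
(* Eager version: List.init materialises all n integers up front, so
   f_list max_int would try to allocate a list of max_int elements. *)
let f_list n =
  List.init n (fun x -> x)
  |> List.filter_map (fun x -> if x * x > 3 then Some (x * 2) else None)

let () =
  f_list 10 |> List.iter (Printf.printf "%d ");
  print_newline ()
```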
Moreover, since Core's containers such as List are monads, you can use monadic operators for mapping, like:
let odds n = List.(range 0 n >>| fun x -> x * 2 + 1)
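Without Core, comparable operators can be defined by hand. The definitions below are hypothetical stand-ins built on Stdlib (List.concat_map needs OCaml 4.10, binding operators need 4.08), not Core's own operators. Chaining binds with a guard gives something close to a comprehension:

```ocaml
(* Hypothetical list-monad operators on top of Stdlib. *)
let ( >>| ) l f = List.map f l          (* map *)
let ( >>= ) l f = List.concat_map f l   (* bind *)
let ( let* ) = ( >>= )

(* Core's List.range 0 n is 0 .. n-1; List.init produces the same list here. *)
let odds n = List.init n (fun x -> x) >>| fun x -> x * 2 + 1

(* Comprehension-like: all pairs (x, y) drawn from 0 .. n-1 with x < y. *)
let pairs n =
  let* x = List.init n (fun i -> i) in
  let* y = List.init n (fun i -> i) in
  if x < y then [ (x, y) ] else []
```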
List comprehension is already available with camlp4, which used to ship as part of the standard OCaml distribution:
#require "camlp4.listcomprehension";;
[ x * x | x <- [ 1;2;3;4;5] ];;
- : int list = [1; 4; 9; 16; 25]
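For comparison, the same result without any preprocessor is a plain List.map, and a comprehension that also filters corresponds to List.filter_map (OCaml 4.08 or later). A minimal sketch:

```ocaml
(* No syntax extension needed: map over the source list. *)
let squares = List.map (fun x -> x * x) [1; 2; 3; 4; 5]

(* Mapping plus a guard (keep only odd x) in a single pass. *)
let odd_squares =
  List.filter_map (fun x -> if x mod 2 = 1 then Some (x * x) else None)
    [1; 2; 3; 4; 5]
```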
I have this code snippet in OCaml, which is taken from here. I know it fills a data structure for a demand (traffic matrix) with the specified value, and when the two hosts are the same it just fills the value with 0. In Python or any imperative language, we would use two for loops, one inside the other, to do the task. I assume this is the reason there are two fold_left calls in this code, each one equivalent to one for loop (I might be mistaken!). My question is: how does this code work? And what are ~f: and ~init:? Are these labels? If they are labels, why does the compiler complain when I remove them or change them, even when I put these arguments in the right order?!
I have finished one book and have watched a lot of YouTube videos, but I still find it difficult to understand most OCaml code.
let create_3cycle_input () =
  let topo = Net.Parse.from_dotfile "./data/topologies/3cycle.dot" in
  let hosts = get_hosts topo in
  let demands =
    List.fold_left
      hosts
      ~init:SrcDstMap.empty
      ~f:(fun acc u ->
          List.fold_left
            hosts
            ~init:acc
            ~f:(fun acc v ->
                let r = if u = v then 0.0 else 53. in
                SrcDstMap.set acc ~key:(u,v) ~data:r)) in
  (hosts,topo,demands);;
Please read my other SO answer, which explains how fold_left works. Once you understand how a single fold works, we can move on to the nested case (as well as to the labels).
When you have a collection of collections, i.e., when an element of a collection is itself another collection, and you want to iterate over each element of those inner collections, then you need to nest your folds. A good example is matrices, which can be seen as collections of vectors, where the vectors are themselves collections.
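Applied to the question's snippet: both folds walk the same hosts list, so the nesting visits every (u, v) pair, and the accumulator returned by the inner fold becomes the new state of the outer one. A Stdlib-only sketch of that structure, assuming hosts are plain strings; PairMap, build_demands and the host names are hypothetical stand-ins for the question's SrcDstMap and topology, and Stdlib's Map.add plays the role of Core's SrcDstMap.set:

```ocaml
(* Hypothetical stand-in for SrcDstMap: a map keyed by (src, dst) pairs. *)
module PairMap = Map.Make (struct
  type t = string * string
  let compare = compare
end)

(* Outer fold picks u, inner fold picks v; the same accumulator is threaded
   through both, like two nested for loops updating one variable. *)
let build_demands hosts =
  List.fold_left
    (fun acc u ->
      List.fold_left
        (fun acc v ->
          let r = if u = v then 0.0 else 53. in
          PairMap.add (u, v) r acc)
        acc hosts)
    PairMap.empty hosts

let () =
  let demands = build_demands [ "h1"; "h2"; "h3" ] in
  PairMap.iter (fun (u, v) r -> Printf.printf "%s -> %s : %g\n" u v r) demands
```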
The iteration algorithm is simple:
state := init
for each inner-collection in outer-collection do
  for each element in inner-collection do
    state := user-function(state, element)
  done
done
Or, the same in OCaml (using the Core version of the fold):
let fold_list_of_lists outer ~init ~f =
  List.fold outer ~init ~f:(fun state inner ->
      List.fold inner ~init:state ~f:(fun state elt ->
          f state elt))
This function will have type 'a list list -> init:'b -> f:('b -> 'a -> 'b) -> 'b
and will be applicable to any list of lists.
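The same function can be sketched with the standard library's List.fold_left, which takes its arguments positionally (function, initial state, list) rather than as ~init and ~f. The matrix below is a made-up example:

```ocaml
(* Stdlib-only fold_list_of_lists: the inner fold starts from the current
   outer state, so every element of every inner list is visited in order. *)
let fold_list_of_lists outer ~init ~f =
  List.fold_left
    (fun state inner -> List.fold_left f state inner)
    init outer

(* Example: summing all entries of a matrix stored as a list of rows. *)
let matrix = [ [ 1; 2; 3 ]; [ 4; 5; 6 ] ]
let total = fold_list_of_lists matrix ~init:0 ~f:( + )
```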
Concerning the labels and their removal: labels are keyword arguments, and they let you pass arguments to a function in an arbitrary order, which is very useful when a function has many arguments. Removing labels is sometimes possible, but this can be disabled with a compiler option (for instance, by turning warning 6, "label omitted in function application", into an error). The Core library (which is used by the project you referenced) disables removing the labels, probably for good reason.
In general, labels can be omitted only if the application is total, i.e., when the returned value is not itself a function. Since fold_left returns a type variable, its result could always be a function, so we always need to use labels with Core's List.fold (and List.fold_left) functions.
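A small sketch of what labels buy you, using a hypothetical labelled wrapper named fold over Stdlib's List.fold_left. Every label is supplied in each call, so the arguments can be commuted freely; omitting the labels here would instead run into the total-application rule above, because the result type is a type variable:

```ocaml
(* Hypothetical labelled wrapper: fold : init:'a -> f:('a -> 'b -> 'a) -> 'b list -> 'a *)
let fold ~init ~f l = List.fold_left f init l

(* With all labels given, the argument order does not matter... *)
let a = fold ~init:0 ~f:( + ) [ 1; 2; 3 ]
let b = fold [ 1; 2; 3 ] ~f:( + ) ~init:0

(* ...and labelled arguments can be partially applied by name. *)
let sum = fold ~init:0 ~f:( + )
let c = sum [ 1; 2; 3 ]
```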
In F# if one tries to zip two lists of different lengths one gets an error:
List.zip [1..4] [1..3]
// System.ArgumentException: The lists had different lengths.
However, it is very easy to define an alternative definition of zip that accepts two argument lists of different lengths:
let rec zip' (xs: 'T list) (ys: 'T list) =
    match (xs, ys) with
    | ([], _) -> []
    | (_, []) -> []
    | ((z::zs), (w::ws)) -> (z, w) :: zip' zs ws
zip' [1..4] [1..3]
// val it : (int * int) list = [(1, 1); (2, 2); (3, 3)]
Is there a good reason not to use this alternative definition? Why wasn't it adopted in the first place?
This is, indeed, a bit confusing because there is a mismatch between List.zip (which does not allow this) and Seq.zip (which truncates the longer list).
I think that a zip which works only on lists of equal length is a reasonable default behaviour: if it automatically truncated data, there is a realistic chance that you would accidentally lose some useful data when using zip, which might cause subtle bugs.
The fact that Seq.zip truncates the longer list is only sensible because sequences are lazy and so, by design, when I define a sequence I expect that the consumer might not read it all.
In summary, I think the behaviour difference is based on "what is the most sensible thing to do for a given data structure", but I do think having two names for the operations would make a lot more sense than calling both zip (alas, that's pretty much impossible to change now).
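OCaml's standard library makes the same choice for lists: List.combine raises Invalid_argument when the lengths differ. Following the suggestion of giving the truncating operation its own name, a sketch (zip_shortest is a hypothetical name, not a Stdlib function):

```ocaml
(* Strict by default: List.combine refuses lists of different lengths. *)
let strict =
  try Some (List.combine [1; 2; 3; 4] [1; 2; 3])
  with Invalid_argument _ -> None

(* Truncating variant under a distinct name, so callers opt in explicitly. *)
let rec zip_shortest xs ys =
  match xs, ys with
  | x :: xs', y :: ys' -> (x, y) :: zip_shortest xs' ys'
  | _ -> []
```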
I remember that when I showed some code that I wrote to my professor he remarked, offhand, that
It rarely matters, but it's worth noting that fold* is a little bit more efficient than fold*' in SML/NJ, so you should prefer it over fold*' when possible.
I forget whether fold* was foldr or foldl. I know that this is one of those micro-optimization things that probably doesn't make a big difference in practice, but I'd like to be in the habit of using the more efficient one when I have the choice.
Which is which? My guess is that this is SML/NJ specific and that MLton will be smart enough to optimize both down to the same machine code, but answers for other compilers are good to know.
foldl is tail-recursive, while foldr is not, although you can do a foldr in a tail-recursive way by first reversing the list (which is tail-recursive) and then doing a foldl.
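In OCaml terms, the reverse-then-foldl trick can be sketched as follows. fold_right_tr is a hypothetical name, and it takes its arguments in the same order as Stdlib's List.fold_right. Both List.rev and List.fold_left are tail-recursive, so the stack does not grow with the list, at the cost of allocating the reversed copy:

```ocaml
(* foldr as "reverse, then foldl"; f is applied right-to-left as in fold_right. *)
let fold_right_tr f l acc =
  List.fold_left (fun acc x -> f x acc) acc (List.rev l)

(* A list long enough that stack depth would matter for a naive foldr. *)
let big = List.init 1_000_000 (fun i -> i)
let len = fold_right_tr (fun _ n -> n + 1) big 0
```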
This is only going to matter if you are folding over huge lists.
Prefer the one that converts the given input into the intended output.
If both produce the same output, such as with a sum, and if you are dealing with a list, folding from the left will be more efficient because the fold can begin with the head element, while folding from the right first requires walking the list to find the last element before calculating the first intermediate result.
With arrays and similar random access data structures, there's probably not going to be much difference.
A compiler optimization that always chose the better of left and right would require the compiler to determine that left and right were equivalent over all possible inputs. Since foldl and foldr take functions as arguments, this is a bit of a tall order.
I'm going to keep the accepted answer here, but I had the chance to speak to my professor, and his reply was actually the opposite, because I forgot a part of my question. The code in question was building up a list, and he said:
Prefer foldr over foldl when possible, because it saves you a reverse at the end in cases where you're building up a list by appending elements during the fold.
For instance, in a trivial example:
- val ls = [1, 2, 3];
val ls = [1,2,3] : int list
- val acc = (fn (x, xs) => x::xs);
val acc = fn : 'a * 'a list -> 'a list
- foldl acc [] ls;
val it = [3,2,1] : int list
- foldr acc [] ls;
val it = [1,2,3] : int list
The O(n) saving of a reverse is probably more important than the other differences between foldl and foldr mentioned in the answers to this question.
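The same experiment in OCaml, as a sketch using Stdlib. Note that OCaml's List.fold_left passes the accumulator first, whereas SML's foldl passes it second, so the cons function is flipped for the left fold:

```ocaml
let ls = [1; 2; 3]
let cons x xs = x :: xs

(* Left fold: consing onto the accumulator reverses the order. *)
let left = List.fold_left (fun acc x -> cons x acc) [] ls

(* Right fold: the original order is preserved, no List.rev needed. *)
let right = List.fold_right cons ls []
```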
I tried to add two lists with different lengths using this:
let sumList(a,b) = match a,b with
|[],_ -> []
|(x::xs,y::ys)-> (x + y)::diffList(xs,ys)
It reports "Unbound value sumList". Is it possible to do this as in Haskell, with zipWith (+) a b?
Possibly the actual error is "Unbound value diffList", since you don't define diffList in your code.
If this is a transcription error, then the next problem is that you need to declare sumList as a recursive function: let rec sumList (a, b) = ....
Your pattern match is not exhaustive. It fails if the first list is longer.
The Haskell zipWith is friendlier than the OCaml List.map2, which requires the lists to be the same length. I don't think there's anything so friendly in the OCaml standard library.
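A sketch of the corrected function: it is declared with rec, calls itself instead of diffList, and gets a catch-all case so the match is exhaustive (the extra elements of the longer list are dropped). zip_with is a hypothetical helper that truncates like Haskell's zipWith; for equal-length lists, List.map2 already does the job:

```ocaml
(* sumList with the fixes above: rec, self-call, exhaustive match. *)
let rec sumList (a, b) =
  match a, b with
  | x :: xs, y :: ys -> (x + y) :: sumList (xs, ys)
  | _ -> []

(* Hypothetical generic zip_with that stops at the shorter list. *)
let rec zip_with f xs ys =
  match xs, ys with
  | x :: xs', y :: ys' ->
      let z = f x y in (* evaluate f before recursing *)
      z :: zip_with f xs' ys'
  | _ -> []

(* Equal lengths: Stdlib's List.map2 (raises Invalid_argument otherwise). *)
let same_length = List.map2 ( + ) [1; 2] [3; 4]
```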
OCaml's standard library contains various modules: List, Map, Nativeint, etc. I know that interfaces for these modules are provided (e.g. for the List module), but I am interested in the algorithms and their implementations used in the modules' functions.
Where can I find that?
On your system: /usr/lib/ocaml/list.ml and other .ml files
On the web: https://github.com/ocaml/ocaml/blob/trunk/stdlib/list.ml and other .ml files in https://github.com/ocaml/ocaml/tree/trunk/stdlib
The List implementation is interesting to study. For example, the map function could be implemented like this:
let rec map f = function
| [] -> []
| a::l -> f a :: map f l
but is instead implemented like this:
let rec map f = function
| [] -> []
| a::l -> let r = f a in r :: map f l
What's the difference? Execute this:
List.map print_int [1;2;3] ;;
map print_int [1;2;3] ;;
The first one prints 123, but the second one, with map bound to the naive first definition above, prints 321! Since evaluating f a could produce side effects, it's important to force the correct order, which is what the official map implementation does by binding f a with let before the recursive call. Indeed, the evaluation order of arguments is unspecified in OCaml, even if all implementations follow the same order.
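To observe the ordering without relying on printed output, the two definitions can be renamed (naive_map, ordered_map, record and order_of are hypothetical names) and made to record each call in a list. Only the let-bound version has a guaranteed order; the naive one's order is left unchecked because it is unspecified:

```ocaml
let rec naive_map f = function
  | [] -> []
  | a :: l -> f a :: naive_map f l

let rec ordered_map f = function
  | [] -> []
  | a :: l -> let r = f a in r :: ordered_map f l

(* Record each argument f is called with, most recent first. *)
let calls : int list ref = ref []
let record x = calls := x :: !calls; x

(* Run a map over [1; 2; 3] and return the order in which f was called. *)
let order_of map =
  calls := [];
  ignore (map record [1; 2; 3]);
  List.rev !calls

let () =
  List.iter (Printf.printf "%d ") (order_of ordered_map);
  print_newline ()
```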
See also the Optimizing List.map post on the Jane Street blog for considerations on performance (List.map is efficient on small lists).
You can find the definitions in the OCaml source code. For example, implementation of the Map functions is in stdlib/map.ml in the OCaml source distribution.
They should already be installed on your system. Most likely (assuming a Unix system) they are located in /usr/lib/ocaml or /usr/local/lib/ocaml. Just open any of the .ml files.