OCaml precedence - ocaml

I'm not familiar with OCaml, but have been involved in analysing some OCaml code.
This piece of code puzzles me. What is the correct grouping, based on operator precedence?
let new_fmt () =
let b = new_buf () in
let fmt = Format.formatter_of_buffer b in
(fmt,
fun () ->
Format.pp_print_flush fmt ();
let s = Buffer.contents b in
Buffer.reset b;
s
)
There are three operators here: ";", "," and "fun". Based on the reference manual the precedence
order is comma > semicolon > fun, which I believe leads to the following groupings below.
Which one is picked by the OCaml compiler? Or is there another grouping that is the correct one?
grouping 1:
let new_fmt () =
let b = new_buf () in
let fmt = Format.formatter_of_buffer b in
((fmt,
fun () ->
Format.pp_print_flush fmt ());
(let s = Buffer.contents b in
Buffer.reset b;
s)
)
grouping 2:
let new_fmt () =
let b = new_buf () in
let fmt = Format.formatter_of_buffer b in
(fmt,
(fun () ->
Format.pp_print_flush fmt ();
let s = Buffer.contents b in
(Buffer.reset b;
s))
)

For what it's worth, there is another operator used in the code. It's represented by no symbols: the operation of applying a function to a value in OCaml is represented by juxtaposition. This operator has higher precedence than the others.
This code
fun () -> a ; b
parses as
fun () -> (a; b)
not as
(fun () -> a) ; b
It follows because as you say ; has higher precedence than fun (though this terminology is a little suspect).
Similarly
let c = d in e; f
parses as
let c = d in (e; f)
not as
(let c = d in e); f
So, the final expression parses like this:
(fmt,
fun () -> (Format.pp_print_flush fmt ();
let s = Buffer.contents b in
(Buffer.reset b; s))
)
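A quick way to convince yourself of this grouping without reading a parse tree is to observe when the side effects happen. The following is only a minimal sketch; the names log and thunk are made up for illustration and are not part of the code in the question.

```ocaml
(* Hypothetical names; only Buffer from the standard library is used. *)
let log = Buffer.create 16

(* If this parsed as (fun () -> add "a"); add "b", then "b" would be
   written right here, at definition time. *)
let thunk = fun () -> Buffer.add_string log "a"; Buffer.add_string log "b"

let () =
  (* Nothing has run yet: both writes belong to the function body. *)
  assert (Buffer.contents log = "");
  thunk ();
  assert (Buffer.contents log = "ab")
```

Both writes happen only when thunk is called, which matches fun () -> (a; b).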

Grouping 2 is the correct one.
If you are unsure about how things are parsed, editor helpers may help you (sometimes): ocaml-mode or tuareg-mode (and probably other editor helpers) should give you auto-indentations corresponding with how the code is parsed:
let new_fmt () =
let b = new_buf () in
let fmt = Format.formatter_of_buffer b in
( fmt,
fun () ->
Format.pp_print_flush fmt ();
let s = Buffer.contents b in
Buffer.reset b;
s
)
The indentation of let s = ... is deeper than that of fun () ->, which means that this part is inside fun () -> .... If it were outside fun () ->, it would be indented differently, at the same level as fun () ->.
Another, very precise but probably overly complex way is to examine directly how the code is parsed, with ocamlc -dparsetree source.ml.

Related

Evaluation order of let-in expressions with tuples

My old notes on ML say that
let (v₁, … , vₙ) = (t₁, … , tₙ) in t′
is a syntactic sugar for
(λ vₙ. … (λ v₁. t′)t₁ … )tₙ
and that
let (v₁, v₂) = t t′ in t″
is equivalent to
let v = t t′ in
let v₂ = snd v in
let v₁ = fst v in
t″
where
each v (with or without a subscript) stands for a variable,
each t (with or without a sub- or a superscript) stands for a term, and
fst and snd deliver the first and second component of a pair, respectively.
I'm wondering whether I got the evaluation order right because I didn't note the original reference. Could anyone ((confirm or reject) and (supply a reference))?
It shouldn't matter whether it's:
let v = t t′ in
let v₂ = snd v in
let v₁ = fst v in
t″
Or:
let v = t t′ in
let v₁ = fst v in
let v₂ = snd v in
t″
Since neither fst nor snd have any side-effects. Side-effects may exist in the evaluation of t t′ but that's done before the let binding takes place.
Additionally, as in:
let (v₁, v₂) = t t′ in t″
Neither v₁ nor v₂ is reliant on the value bound to the other to determine its value, so the order in which they're bound is again seemingly irrelevant.
All of that said, there may be an authoritative answer from those with deeper knowledge of the SML standard or the inner workings of OCaml's implementation. I simply am uncertain of how knowing it will provide any practical benefit.
Practical test
As a practical test, we can bind a tuple of several expressions with side effects and observe the order of evaluation. In OCaml (5.0.0) the order of evaluation is observed to be right-to-left. We observe the same when evaluating the elements of a list whose expressions have side effects.
# let f () = print_endline "f"; 1 in
let g () = print_endline "g"; 2 in
let h () = print_endline "h"; 3 in
let (a, b, c) = (f (), g (), h ()) in a + b + c;;
h
g
f
- : int = 6
# let f () = print_endline "f"; 1 in
let g () = print_endline "g"; 2 in
let h () = print_endline "h"; 3 in
let (c, b, a) = (h (), g(), f ()) in a + b + c;;
f
g
h
- : int = 6
# let f _ = print_endline "f"; 1 in
let g () = print_endline "g"; 2 in
let h () = print_endline "h"; 3 in
let a () = print_endline "a" in
let b () = print_endline "b" in
let (c, d, e) = (f [a (); b ()], g (), h ()) in
c + d + e;;
h
g
b
a
f
- : int = 6
In SML (SML/NJ v110.99.3) we observe the opposite: left-to-right evaluation of expressions.
- let
= fun f() = (print "f\n"; 1)
= fun g() = (print "g\n"; 2)
= fun h() = (print "h\n"; 3)
= val (a, b, c) = (f(), g(), h())
= in
= a + b + c
= end;
f
g
h
val it = 6 : int
- let
= fun f() = (print "f\n"; 1)
= fun g() = (print "g\n"; 2)
= fun h() = (print "h\n"; 3)
= val (c, b, a) = (h(), g(), f())
= in
= a + b + c
= end;
h
g
f
val it = 6 : int
- let
= fun f _ = (print "f\n"; 1)
= fun g() = (print "g\n"; 2)
= fun h() = (print "h\n"; 3)
= fun a() = print "a\n"
= fun b() = print "b\n"
= val (c, d, e) = (f [a(), b()], g(), h())
= in
= c + d + e
= end;
a
b
f
g
h
val it = 6 : int
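The OCaml manual leaves the evaluation order of tuple components unspecified, so the right-to-left behaviour above should not be relied upon. When the order matters, nested let bindings make it explicit. A minimal sketch, where trace and note are illustrative helpers and not part of the transcripts above:

```ocaml
(* Hypothetical helpers: record the order in which values are computed. *)
let trace = ref []
let note name v = trace := name :: !trace; v

let () =
  (* Each let ... in is fully evaluated before the next one starts. *)
  let a = note "f" 1 in
  let b = note "g" 2 in
  let c = note "h" 3 in
  assert (a + b + c = 6);
  assert (List.rev !trace = ["f"; "g"; "h"])
```

Here the effects run in source order regardless of how the compiler chooses to evaluate tuples.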
Be aware that, in OCaml, because of let-polymorphism (a let-bound value is generalized, subject to the relaxed value restriction, whereas a lambda-bound variable is not), let a = b in c is not equivalent to (fun a -> c) b. A counterexample is
# let id = fun x -> x in id 5, id 'a';;
- : int * char = (5, 'a')
# (fun id -> id 5, id 'a')(fun x -> x);;
Error: This expression has type char but an expression was expected of type int
#
This means that they are semantically not the same construction (let ... = ... in ... is strictly more general than the other).
This happens because, in general, the type system of OCaml doesn't allow types of the form (∀α.α→α) → int * char (because allowing them would make type inference undecidable, which is not very practical), which would be the type of fun id -> id 5, id 'a'. Instead, it resorts to the less general type ∀α.(α→α) → α * α, which doesn't typecheck here, because α can't be unified with both char and int.
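If a function really does need a polymorphic argument, one common workaround is to wrap that argument in a record with an explicitly polymorphic field. A small sketch; the type poly_id and the function apply_both are hypothetical names chosen for this example:

```ocaml
(* The field annotation 'a. 'a -> 'a says the stored function must work
   for every type, which is what the lambda-bound version could not express. *)
type poly_id = { id : 'a. 'a -> 'a }

(* p.id can now be used at int and at char within the same body. *)
let apply_both p = (p.id 5, p.id 'a')

let () = assert (apply_both { id = (fun x -> x) } = (5, 'a'))
```

The record makes the quantifier explicit, so the type checker does not have to infer it.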

How to use ocaml-re

I am currently trying to use ocaml-re. Documentation is sparse. I was wondering how I would, for instance, do the equivalent:
Str.regexp "example \\([A-Za-z]+\\)" using Re.Perl? I think it would help me to naturally get the rest of the documentation on my own. Thank you!
Bonus points if you convert this code from Str to Re.Perl:
let read_filename = "example.ts"
let filename = "example2.ts"
let () =
CCIO.(
let modify_file ~chunks =
let r = Str.regexp "example \\([A-Za-z]+\\)" in
match chunks () with
None -> chunks (* is the same as (fun () -> None) *)
| Some chunks ->
let test_chunks = Str.replace_first r "\\1" chunks in (* compute once *)
(fun () -> Some test_chunks) in
with_in read_filename
(fun ic ->
let chunks = read_chunks ic in
let new_chunks = modify_file ~chunks in
with_out ~flags:[Open_binary] ~mode:0o644 filename
(fun oc ->
write_gen oc new_chunks
)
)
)
Don't use Re.Perl; Re's own API is much simpler. You can construct your regex with:
let re =
let open Re in
alt [rg 'A' 'Z'; rg 'a' 'z'] (* [A-Za-z] *)
|> rep1 (* [A-Za-z]+ *)
|> group (* ([A-Za-z]+) *)
|> compile

Prefix_action, suffix_action with sequences

I want to write a prefix_action function (and likewise suffix_action) for Seq. Here is the code in BatEnum:
let prefix_action f t =
let full_action e =
e.count <- (fun () -> t.count());
e.next <- (fun () -> t.next ());
e.clone <- (fun () -> t.clone());
f ()
in
let rec t' =
{
count = (fun () -> full_action t'; t.count() );
next = (fun () -> full_action t'; t.next() );
clone = (fun () -> full_action t'; t.clone() );
fast = t.fast
} in t'
Since sequences have no clone, I want to know how I should treat clone in this case (does cloning count as a use of the sequence?) and, if so, how we can track the number of times the sequence is used.
Prefix_action Documentation
A sequence, as it is defined, doesn't have a clone function simply because cloning is "defined by default".
type 'a node =
| Nil
| Cons of 'a * 'a t
and 'a t = unit -> 'a node
As you can see, it's just a function returning a value of a sum type, a simple value if you wish; there are no side effects (in fact they can be hidden in the body of the function, but for now let me trick you). Thus the clone function in this case is just the identity:
let clone s = s
Now if you look at the definition of an enumeration you will notice the little mutable keyword:
type 'a t = {
mutable count : unit -> int;
mutable next : unit -> 'a;
mutable clone : unit -> 'a t;
mutable fast : bool;
}
If we try to use the same clone as for sequences, we will notice that changes to one copy affect the other:
# let e1 = { fast = true; (* ... *) };;
val e1 : 'a t = {fast = true; (* ... *)}
# let e2 = clone e1;;
val e2 : 'a t = {fast = true; (* ... *)}
# e1.fast <- false;;
- : unit = ()
# e2;;
'a t = {fast = false; (* ... *)}
That's why enumerations need a clone function.
So now you can implement your functions, for example prefix_action. Its documentation says:
prefix_action f e will behave as e but guarantees that f () will be
invoked exactly once before the current first element of e is read.
The problem is in this "exactly once". I'm not sure what it means, but let's say it means that if you pass a sequence to prefix_action f and then pass the result to hd twice, f will be executed only once (if it means something different, it's not interesting). And now we can return to the "side effects" story. Clearly, we can't implement prefix_action without them. The type of a sequence doesn't contain any mutable keyword, but it contains functions! Hence, we can wrap our side effect in the function.
let prefix_action : (unit -> unit) -> 'a t -> 'a t = fun f s ->
let b = ref true in
fun () -> (if !b then f (); b := false); s ()
But now, as we have side effects, we need to redefine clone. From the specification of prefix_action:
If prefix_action f e is cloned, f is invoked only once, during the
cloning.
Hence our clone:
let clone s = let _ = s () in s
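Putting the pieces together, here is a self-contained sketch for the standard library's Seq.t, including a possible suffix_action. This is one reading of "exactly once" (the action fires the first time the sequence is forced, or the first time its end is reached), not the BatEnum implementation; the pending flags are an assumption of this sketch.

```ocaml
(* Run [f] once, the first time the sequence is forced. *)
let prefix_action (f : unit -> unit) (s : 'a Seq.t) : 'a Seq.t =
  let pending = ref true in
  fun () ->
    if !pending then begin
      pending := false;
      f ()
    end;
    s ()

(* Run [f] once, the first time the end of the sequence is reached. *)
let suffix_action (f : unit -> unit) (s : 'a Seq.t) : 'a Seq.t =
  let pending = ref true in
  let rec go s () =
    match s () with
    | Seq.Nil ->
      if !pending then (pending := false; f ());
      Seq.Nil
    | Seq.Cons (x, rest) -> Seq.Cons (x, go rest)
  in
  go s

let () =
  let pre = ref 0 and post = ref 0 in
  let s =
    List.to_seq [1; 2; 3]
    |> prefix_action (fun () -> incr pre)
    |> suffix_action (fun () -> incr post)
  in
  (* Building the sequence runs neither action. *)
  assert (!pre = 0 && !post = 0);
  (* Traversing twice still runs each action only once. *)
  assert (List.of_seq s = [1; 2; 3]);
  assert (List.of_seq s = [1; 2; 3]);
  assert (!pre = 1 && !post = 1)
```

Because the flag lives in the closure, every traversal of the same wrapped sequence shares it, which is why a second traversal does not trigger the action again.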

Write pretty multilevel nested if-then-else code in OCaml?

In OCaml, if I have to write a function using many if-then-else, below is my stupid and ugly solution.
let foo () =
let a1 = ... in
if (a1) then
result1
else
let a2 = ... in
if (a2) then
result2
else
let a3 = ... in
if (a3) then
result3
else
let a4 = ... in
if (a4) then
result4
else
result5.
How to beautify the code above? I like C/C++ & Java style which use "return" to save indentation of next if-statement.
Can I do the same thing with OCaml?
int foo() {
bool a1 = ...;
if (a1)
return result1;
bool a2 = ...;
if (a2)
return result2;
bool a3 = ...;
if (a3)
return result3;
bool a4 = ...;
if (a4)
return result4;
return result5;
}
There is no return statement in OCaml, though you can emulate one with the help of exceptions:
exception Ret of t
let my_ret x = raise (Ret x)
let foo () =
try
let a1 = ... in
if a1 then my_ret result1;
let a2 = ... in
if a2 then my_ret result2;
...
with Ret x -> x
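For a concrete, runnable version of the exception trick, here is a small sketch. The function classify and its string results are invented for illustration; only the shape (raise to leave early, catch at the function boundary) comes from the snippet above.

```ocaml
(* Hypothetical example: the exception carries the early-return value. *)
exception Ret of string

let classify n =
  try
    (* raise has type 'a, so it fits in a then-branch of type unit. *)
    if n < 0 then raise (Ret "negative");
    if n = 0 then raise (Ret "zero");
    if n < 10 then raise (Ret "small");
    "large"
  with Ret x -> x

let () =
  assert (classify (-1) = "negative");
  assert (classify 0 = "zero");
  assert (classify 5 = "small");
  assert (classify 10 = "large")
```

The flat layout mirrors the C version, at the cost of tying the function to one result type through the exception declaration.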
Another helpful solution would be to use lazy evaluation:
let foo () =
let a1 = lazy ...
and a2 = lazy ...
and a3 = lazy ...
in
match a1, a2, a3 with
| lazy true, _, _ -> result1
| _, lazy true, _ -> result2
| _, _, lazy true -> result3
| _, _, _ -> result4
This is just one way of using lazy; there are probably more concise ways of expressing your calculation.
The Core library provides a with_return function that allows you to do a non-local exit from a function:
open Core_kernel.Std
let foo () = with_return (fun goto ->
if a1 then goto.return 1;
if a2 then goto.return 2;
if a3 then goto.return 3;
if a4 then goto.return 4;
if a5 then goto.return 5;
6)
But generally it is better to use pattern-matching or to rethink your code. For example, if you have a list of predicates, and depending on what predicate is true you want to return a value, that means that you can encode this as a search in some mapping structure:
let foo () = [
clause1, expr1;
clause2, expr2;
clause3, expr3;
] |> List.Assoc.find true
|> Option.value ~default:expr4
Of course, in this case you do not have short-circuit evaluation. You can fix this with lazy evaluation or with thunks. But unless your computations are really heavy or produce side effects, it's not worth it.
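The thunk variant can be sketched with the standard library alone. The helper first_match and the bookkeeping in the usage part are assumptions made for this example, not part of Core:

```ocaml
(* Hypothetical helper: conditions and results are thunks, so nothing
   after the first true condition is evaluated. *)
let first_match clauses ~default =
  match List.find_opt (fun (cond, _) -> cond ()) clauses with
  | Some (_, result) -> result ()
  | None -> default ()

let () =
  (* Record which conditions were actually evaluated. *)
  let evaluated = ref [] in
  let cond name b () = evaluated := name :: !evaluated; b in
  let r =
    first_match ~default:(fun () -> "result4")
      [ cond "a1" false, (fun () -> "result1");
        cond "a2" true,  (fun () -> "result2");
        cond "a3" true,  (fun () -> "result3") ]
  in
  assert (r = "result2");
  (* a3 was never evaluated: the search stopped at a2. *)
  assert (List.rev !evaluated = ["a1"; "a2"])
```

List.find_opt stops at the first pair whose condition returns true, which restores the short-circuit behaviour of the nested if version.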
The if syntactic constructs indeed don't work the same way in C and OCaml. In C, if syntax forms are statements, in OCaml they are expressions. The closest you get in C to OCaml if is the ?: ternary operator. If you try to rewrite your C code using this operator instead of if, you will face the same challenge. That doesn't mean it's impossible however, as other answers give you solutions.
The simplest one, which works in both languages, is to cut your function body in several sub functions (*), and use continuations:
let rec foo () =
let a1 = … (* computation *) in
if a1
then result1
else foo2 ()
and foo2 () =
let a2 = … in
if a2
then result2
else foo3 ()
and foo3 () = … (* etc *)
It may still be a little cumbersome when writing object methods, but you can always use inner functions to regain "indentation balance" within the method scope.
Also note that the rec keyword is there for the sole purpose of allowing each continuation to follow its caller in the source layout, there's no real recursion here.
(*): #gsg also mentioned it in the comments.
Unlike if expressions, match clauses extend to the end of the function even if they contain multiple statements, without needing brackets. So you can do:
let foo () =
match ... with
| true -> result1
| false ->
match ... with
| true -> result2
| false ->
match ... with
| true -> result3
| false ->
match ... with
| true -> result4
| false -> result5
You didn't show where result1 comes from in your example, so I can't be sure, but you might find it's better to have the ... return an option with the result rather than a bool, e.g.
let foo () =
match ... with
| Some result1 -> result1
| None ->
...

How to implement the Russian doll pattern in Ocaml?

In Javascript there is a pattern called the Russian doll pattern (this may also be called a 'one-shot'). Basically, it's a function that replaces itself with another at some point.
Simple example:
var func = function(){
func = function(){ console.log("subsequent calls call this...");};
console.log("first call");
}
So the first time you call func it'll output "first call", and on the next (and every subsequent) call it'll print "subsequent calls call this...". (This would be easy to do in Scheme as well, for example.)
I've been puzzling over how to do this in OCaml.
Edit: one solution I've come up with:
let rec func = ref( fun () -> func := ( fun () -> Printf.printf("subsequent..\n"));Printf.printf("First..\n"));;
Called as:
!func () ;;
Interestingly, if I do not include the 'rec' in the definition, it never calls the subsequent function... It always prints 'First...'.
yzzlr answer is very good, but two remarks:
It forces the input of the functions to be of type unit. You can use a polymorphic version:
let doll f1 f2 =
let rec f = ref (fun x -> f := f2; f1 x) in
(fun x -> !f x);;
You can do without the hairy recursion:
let doll f1 f2 =
let f = ref f1 in
f := (fun x -> f := f2; f1 x);
(fun x -> !f x);;
(Replacing recursion with mutation is a common trick; it can actually be used to define fixpoints without using "rec")
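To illustrate the remark about fixpoints, here is a minimal sketch of a recursive function defined through a reference instead of rec (often called Landin's knot). The names fact and self are chosen for this example only:

```ocaml
(* Factorial without "let rec": the reference starts with a dummy
   function and is then overwritten with one that calls through it. *)
let fact =
  let self = ref (fun (_ : int) -> 0) in
  self := (fun n -> if n <= 1 then 1 else n * !self (n - 1));
  !self

let () =
  assert (fact 1 = 1);
  assert (fact 5 = 120)
```

The recursive call goes through !self, which by the time of the call already holds the real function, so no rec keyword is needed.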
It's pretty straightforward, but you need to use side-effects. Here's a function that takes two thunks as arguments, and returns a new thunk that calls the first thunk the first time, and the second thunk every other time.
let doll f1 f2 =
let f = ref f1 in
(fun () ->
let g = !f in
f := f2;
g ())
This isn't quite optimal, because we'll keep on overwriting the ref with the same value over and over.
Here's a slightly better version, which uses a recursive definition.
let doll f1 f2 =
let rec f = ref (fun () -> f := f2;f1 ()) in
(fun () -> !f ())
So, now, you'll get this:
# let f = doll (fun () -> 1) (fun () -> 2);;
val f : unit -> int = <fun>
# f ();;
- : int = 1
# f ();;
- : int = 2