__ in Ocaml extracted from Coq - ocaml

Ocaml code extracted from Coq includes (in some cases) a type __ and a function __ defined as follows:
type __ = Obj.t
let __ = let rec f _ = Obj.repr f in Obj.repr f
The documentation says that in the past, such type was defined as unit (and thus __ could be taken as ()) but there exist (rare) cases where a value of type __ is applied to a value of type __.
__ uses undocumented functions of the Obj module from OCaml, but it seems that what is defined is essentially a totally polymorphic function that eats all its arguments (whatever their number).
Is there some documentation regarding the cases where __ cannot be eliminated and values of this type are applied to values of the same type, both from a theoretical (construct Coq terms where elimination is impossible) and from a practical (show a realistic case where this occurs) point of view?

The references cited in the README give a nice overview of the erasure problem. Specifically, both this report and this article exaplain in detail how the type schemes and logical parts of CIC terms are erased, and why one must have __ x = __. The problem is not exactly that __ may be applied to itself, but that it may be applied to anything at all.
Unfortunately, it is not at all clear if having this behavior is important in any non-pathological case. The motivation given there is to be able to extract any Coq term, and the documents do not mention any cases that are really interesting from a practical standpoint. The example given on 3 is this one:
Definition foo (X : Type) (f : nat -> X) (g : X -> nat) := g (f 0).
Definition bar := foo True (fun _ => I).
Executing Recursive Extraction bar. gives the following result:
type __ = Obj.t
let __ = let rec f _ = Obj.repr f in Obj.repr f
type nat =
| O
| S of nat
(** val foo : (nat -> 'a1) -> ('a1 -> nat) -> nat **)
let foo f g =
g (f O)
(** val bar : (__ -> nat) -> nat **)
let bar =
foo (Obj.magic __)
Since foo is polymorphic on Type, there is no way of simplifying the f O application on its body, because it could have computational content. However, since Prop is a subtype of Type, foo can also be applied to True, which is what happens in bar. When we try to reduce bar, thus, we will have __ being applied to O.
This particular case is not very interesting, because it would be possible to fully inline foo:
let bar g =
g __
Since True can't be applied to anything, if g corresponds to any legal Coq term, its __ argument also wouldn't be applied to anything, and therefore it would be safe to have __ = () (I believe). However, there are cases where it is not possible to know in advance whether an erased term can be further applied or not, which makes the general definition for __ necessary. Check out for instance the Fun example here, near the end of the file.

Related

Subtyping for Yojson element in a yojson list

I meet an error about subtyping.
For this code, List.map (fun ((String goal_feat):> Basic.t) -> goal_feat) (goal_feats_json:> Basic.t list).
I meet the following error in vscode:
This expression cannot be coerced to type
Yojson.Basic.t =
[ Assoc of (string * Yojson.Basic.t) list
| Bool of bool
| Float of float
| Int of int
| List of Yojson.Basic.t list
| Null
| String of string ];
it has type [< String of 'a ] -> 'b but is here used with type
[< Yojson.Basic.t ].
While compiling, I meet the following error.
Error: Syntax error: ')' expected.
If I change the code to List.map (fun ((String goal_feat): Basic.t) -> goal_feat) (goal_feats_json:> Basic.t list), which useq explicit type cast instead of subtyping, then the error disappeared. I can not understand what is the problem with my code when i use subtyping. Much appreciation to anyone who could give me some help.
First of all, most likely the answer that you're looking for is
let to_strings xs =
List.map (function `String x -> x | _ -> assert false) (xs :> t list)
The compiler is telling you that your function is handling only one case and you're passing it a list that may contain many other things, so there is a possibility for runtime error. So it is better to indicate to the compiler that you know that only the variants tagged with String are expected. This is what we did in the example above. Now our function has type [> Yojson.Basic.t].
Now back to your direct question. The syntax for coercion is (expr : typeexpr), however in the fun ((String goal_feat):> Basic.t) -> goal_feat snippet, String goal_feat is a pattern, and you cannot coerce a pattern, so we shall use parenthesized pattern here it to give it the right, more general, type1, e.g.,
let exp xs =
List.map (fun (`String x : t) -> x ) (xs :> t list)
This will tell the compiler that the parameter of your function shall belong to a wider type and immediately turn the error into warning 8,
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
(`Bool _|`Null|`Assoc _|`List _|`Float _|`Int _)
which says what I was saying in the first part of the post. It is usually a bad idea to leave warning 8 unattended, so I would suggest you to use the first solution, or, otherwise, find a way to prove to the compiler that your list doesn't have any other variants, e.g., you can use List.filter_map for that:
let collect_strings : t list -> [`String of string] list = fun xs ->
List.filter_map (function
| `String s -> Some (`String s)
| _ -> None) xs
And a more natural solution would be to return untagged strings (unless you really need the to be tagged, e.g., when you need to pass this list to a function that is polymorphic over [> t] (Besides, I am using t for Yojson.Basic.t to make the post shorter, but you should use the right name in your code). So here is the solution that will extract strings and make everyone happy (it will throw away values with other tags),
let collect_strings : t list -> string list = fun xs ->
List.filter_map (function
| `String s -> Some s
| _ -> None) xs
Note, that there is no need for type annotations here, and we can easily remove them to get the most general polymoprhic type:
let collect_strings xs =
List.filter_map (function
| `String s -> Some s
| _ -> None) xs
It will get the type
[> `String a] list -> 'a list
which means, a list of polymorphic variants with any tags, returning a list of objects that were tagged with the String tag.
1)It is not a limitation that coercion doesn't work on patterns, moreover it wouldn't make any sense to coerce a pattern. The coercion takes an expression with an existing type and upcasts (weakens) it to a supertype. A function parameter is not an expression, so there is nothing here to coerce. You can just annotate it with the type, e.g., fun (x : #t) -> x will say that our function expects values of type [< t] which is less general than the unannotated type 'a. To summarize, coercion is needed when you have a function that accepts an value that have a object or polymorphic variant type, and in you would like at some expressions to use it with a weakened (upcasted type) for example
type a = [`A]
type b = [`B]
type t = [a | b]
let f : t -> unit = fun _ -> ()
let example : a -> unit = fun x -> f (x :> t)
Here we have type t with two subtypes a and b. Our function f is accepting the base type t, but example is specific to a. In order to be able to use f on an object of type a we need an explicit type coercion to weaken (we lose the type information here) its type to t. Notice that, we do not change the type of x per se, so the following example still type checks:
let rec example : a -> unit = fun x -> f (x :> t); example x
I.e., we weakened the type of the argument to f but the variable x is still having the stronger type a, so we can still use it as a value of type a.

Declaring type of function in SML

I'm new to ML, but in other languages that use type inference, I have learned the habit of omitting the type of a thing whenever the inference on the right hand side is obvious to a human reader, and explicitly declaring the type of a thing whenever the inference is not obvious to a human. I like this convention, and would like to continue with it in my ML code.
I have the following example function declarations, that are equivalent:
fun hasFour [] = false
| hasFour (x::xs) = (x = 4) orelse hasFour xs
is equivalent to
val rec hasFour: int list -> bool =
fn [] => false
| (x::xs) => (x = 4) orelse hasFour xs
I like the latter form not only because it's easier for me to figure out what type the function is when I read it, but also because it explicitly declares the type of the function, and hence, if I screw something up in my implementation, there is no chance of accidentally declaring something that's syntactically valid but the wrong type, which is harder to debug later.
My question is: I want to use fun instead of val rec, because anonymous fn's can only take one parameter. So the latter technique is not syntactically valid for a function like int -> int -> bool. Is there a way to explicitly declare the type in fun? Or have I named all the preferred alternatives in this post, and should simply follow one of these patterns? The only way I could find to use fun with explicit type declaration was to add a type declaration to each parameter in the pattern, which is quite ugly and horrible, like so:
fun hasFour ([]:int list):bool = false
| hasFour (x::xs) = (x = 4) orelse hasFour xs
A colleague showed me some code following a pattern like this:
fun hasFour [] = false
| hasFour (x::xs) = (x = 4) orelse hasFour xs
val _ = op hasFour: int list -> bool
By declaring an unnamed variable and setting it to an instance of the function with a forced type, we effectively achieve the desired result, but the val _ must appear below the fully defined function, where it's less obvious to a human reader, unless you simply get used to this pattern and learn to expect it.
I asked a very similar question, Can I annotate the complete type of a fun declaration?, recently.
Your current solution would have been a nice answer to that.
You can have multiple curried arguments with multiple fn, e.g. like:
val rec member : ''a -> ''a list -> bool =
fn x => fn [] => false
| y::ys => x = y orelse member x ys
Or you can do as you currently do, or as matt suggests:
local
fun member _ [] = false
| member x (y::ys) = x = y orelse member x ys
in
val member = member : ''a -> ''a list -> bool
end
But the combination of using fun and having the complete type signature listed first is yet elusive.
For production-like code, the norm is to collect type signatures in a module signature. See ML for the Working Programmer, ch. 7: Signatures and abstraction, pp. 267-268. Although I imagine you'd want to use Ocaml then.

When should extensible variant types be used in OCaml?

I took a course on OCaml before extensible variant types were introduced, and I don't know much about them. I have several questions:
(This question was deleted because it attracted a "not answerable objectively" close vote.)
What are the low-level consequences of using EVTs, such as performance, memory representation, and (un-)marshaling?
Note that my question is about extensible variant type specifically, unlike the question suggested as identical to this one (that question was asked prior to the introduction of EVTs!).
Extensible variants are quite different from standard variants in term of
runtime behavior.
In particular, extension constructors are runtime values that lives inside
the module where they were defined. For instance, in
type t = ..
module M = struct
type t +=A
end
open M
the second line define a new extension constructor value A and add it to the
existing extension constructors of M at runtime.
Contrarily, classical variants do not really exist at runtime.
It is possible to observe this difference by noticing that I can use
a mli-only compilation unit for classical variants:
(* classical.mli *)
type t = A
(* main.ml *)
let x = Classical.A
and then compile main.ml with
ocamlopt classical.mli main.ml
without troubles because there are no value involved in the Classical module.
Contrarily with extensible variants, this is not possible. If I have
(* ext.mli *)
type t = ..
type t+=A
(* main.ml *)
let x = Ext.A
then the command
ocamlopt ext.mli main.ml
fails with
Error: Required module `Ext' is unavailable
because the runtime value for the extension constructor Ext.A is missing.
You can also peek at both the name and the id of the extension constructor
using the Obj module to see those values
let a = [%extension_constructor A]
Obj.extension_name a;;
: string = "M.A"
Obj.extension_id a;;
: int = 144
(This id is quite brittle and its value it not particurlarly meaningful.)
An important point is that extension constructor are distinguished using their
memory location. Consequently, constructors with n arguments are implemented
as block with n+1 arguments where the first hidden argument is the extension
constructor:
type t += B of int
let x = B 0;;
Here, x contains two fields, and not one:
Obj.size (Obj.repr x);;
: int = 2
And the first field is the extension constructor B:
Obj.field (Obj.repr x) 0 == Obj.repr [%extension_constructor B];;
: bool = true
The previous statement also works for n=0: extensible variants are never
represented as a tagged integer, contrarily to classical variants.
Since marshalling does not preserve physical equality, it means that extensible
sum type cannot be marshalled without losing their identity. For instance, doing
a round trip with
let round_trip (x:'a):'a = Marshall.from_string (Marshall.to_string x []) 0
then testing the result with
type t += C
let is_c = function
| C -> true
| _ -> false
leads to a failure:
is_c (round_trip C)
: bool = false
because the round-trip allocated a new block when reading the marshalled value
This is the same problem which already existed with exceptions, since exceptions
are extensible variants.
This also means that pattern-matching on extensible type is quite different
at runtime. For instance, if I define a simple variant
type s = A of int | B of int
and define a function f as
let f = function
| A n | B n -> n
the compiler is smart enough to optimize this function to simply accessing the
the first field of the argument.
You can check with ocamlc -dlambda that the function above is represented in
the Lambda intermediary representation as:
(function param/1008 (field 0 param/1008)))
However, with extensible variants, not only we need a default pattern
type e = ..
type e += A of n | B of n
let g = function
| A n | B n -> n
| _ -> 0
but we also need to compare the argument with each extension constructor in the
match leading to a more complex lambda IR for the match
(function param/1009
(catch
(if (== (field 0 param/1009) A/1003) (exit 1 (field 1 param/1009))
(if (== (field 0 param/1009) B/1004) (exit 1 (field 1 param/1009))
0))
with (1 n/1007) n/1007)))
Finally, to conclude with an actual example of extensible variants,
in OCaml 4.08, the Format module replaced its string-based user-defined tags
with extensible variants.
This means that defining new tags looks like this:
First, we start with the actual definition of the new tags
type t = Format.stag = ..
type Format.stag += Warning | Error
Then the translation functions for those new tags are
let mark_open_stag tag =
match tag with
| Error -> "\x1b[31m" (* aka print the content of the tag in red *)
| Warning -> "\x1b[35m" (* ... in purple *)
| _ -> ""
let mark_close_stag _tag =
"\x1b[0m" (*reset *)
Installing the new tag is then done with
let enable ppf =
Format.pp_set_tags ppf true;
Format.pp_set_mark_tags ppf true;
Format.pp_set_formatter_stag_functions ppf
{ (Format.pp_get_formatter_stag_functions ppf ()) with
mark_open_stag; mark_close_stag }
With some helper function, printing with those new tags can be done with
Format.printf "This message is %a.#." error "important"
Format.printf "This one %a.#." warning "not so much"
Compared with string tags, there are few advantages:
less room for a spelling mistake
no need to serialize/deserialize potentially complex data
no mix up between different extension constructor with the same name.
chaining multiple user-defined mark_open_stag function is thus safe:
each function can only recognise their own extension constructors.

What is <cycle> in data?

(I use OCaml version 4.02.3)
I defined a type self
# type self = Self of self;;
type self = Self of self
and its instance s
# let rec s = Self s;;
val s : self = Self <cycle>
Since OCaml is a strict language, I expected defining s will fall into infinite recursion. But the interpreter said s has a value and it is Self <cycle>.
I also applied a function to s.
# let f (s: self) = 1;;
val f : self -> int = <fun>
# f s;;
- : int = 1
It seems s is not evaluated before the function application (like in non-strict language).
How OCaml deal with cyclic data like s? Self <cycle> is a normal form?
OCaml is indeed an eager language, however s is a perfectly valid and fully evaluated term that happens to contain a cycle. For instance, this code yields the expected result:
let f (Self Self x) = x
f s == s;;
More precisely, the memory representation of constructors with at n arguments are boxed and read like this:
⋅—————————————————————————————————————————————⋅
| header | field[0] | field[1] | ⋯ | fiekd[n] |
⋅—————————————————————————————————————————————⋅
The header contains metadata whereas field[k] is an OCaml value, i.e. either an integer or a pointer. In the case of s, Self has only one argument, and thus only one field field[0]. The value of field[0] is then simply a pointer towards the start of the block. The term s is thus perfectly representable in OCaml.
Moreover, the toplevel printer is able to detect this kind of cycles and print an <cycle> to avoid falling into an infinite recursion when printing the value of s. Here, <cycle>, like <abstr> or <fun>, represents just a kind of value that the toplevel printer cannot print.
Note, however, that cyclical value will trigger infinite recursion in many situations, for instance f s = s where (=) is the structural equality
and not the physical one (i.e. (==)) triggers such recursion, another example would be
let rec ones = 1 :: ones;; (* prints [1;<cycle>] *)
let twos = List.map ((+) 1) ones;; (* falls in an infinite recursion *)

Look for a trick to iter on ocaml type constructors

I have an ocaml type :
type t = A | B | ...
and a function to print things about that type :
let pp_t fmt x = match x with
| A -> Format.fprintf fmt "some nice explanations about A"
| B -> Format.fprintf fmt "some nice explanations about B"
| ...
How could I write a function to print all the explanations ? Something equivalent to :
let pp_all_t fmt =
Format.fprintf fmt A;
Format.fprintf fmt B;
...
but that would warn me if I forget to add a new constructor.
It would be even better to have something that automatically build that function,
because my problem is that t is quiet big and changes a lot.
I can't imagine how I can "iterate" on the type constructors, but maybe there is a trick...
EDIT: What I finally did is :
type t = A | B | ... | Z
let first_t = A
let next_t = function A -> B | B -> C | ... | Z -> raise Not_found
let pp_all_t fmt =
let rec pp x = pp_t fmt x ; try let x = next_t x in pp x with Not_found -> ()
in pp first_t
so when I update t, the compiler warns me that I have to update pp_t and next_t, and pp_all_t doesn't have to change.
Thanks to you all for the advices.
To solve your problem for a complicated and evolving type, in practice I would probably write an OCaml program that generates the code from a file containing a list of the values and the associated information.
However, if you had a function incr_t : t -> t that incremented a value of type t, and if you let the first and last values of t stay fixed, you could write the following:
let pp_all_t fmt =
let rec loop v =
pp_t fmt v;
if v < Last_t then loop (incr_t v)
in
loop First_t
You can't have a general polymorphic incr_t in OCaml, because it only makes sense for types whose constructors are nullary (take no values). But you can write your own incr_t for any given type.
This kind of thing is handled quite nicely in Haskell. Basically, the compiler will write some number of functions for you when the definitions are pretty obvious. There is a similar project for OCaml called deriving. I've never used it, but it does seem to handle the problem of enumerating values.
Since you say you want a "trick", if you don't mind using the unsafe part of OCaml (which I personally do mind), you can write incr_t as follows:
let incr_t (v: t) : t =
(* Please don't use this trick in real code :-) ! See discussion below.
*)
if t < Last_t then
Obj.magic (Obj.magic v + 1)
else
failwith "incr_t: argument out of range"
I try to avoid this kind of code if at all possible, it's too dangerous. For example, it will produce nonsense values if the type t gets constructors that take values. Really it's "an accident waiting to happen".
One needs some form of metaprogramming for such tasks. E.g. you could explore deriving to generate incr_t from the Jeffrey's answer.
Here is a sample code for the similar task : https://stackoverflow.com/a/1781918/118799
The simplest thing you can do is to define a list of all the constructors:
let constructors_t = [A; B; ...]
let pp_all_t = List.iter pp_t constructors_t
This is a one-liner, simple to do. Granted, it's slightly redundant (which gray or dark magic would avoid), but it's still probably the best way to go in term of "does what I want" / "has painful side effects" ratio.