Adding parametrically a `Null constructor to polymorphic variants - ocaml

The following type declarations do not work:
type 'a or_null = [ 'a | `Null ]
and
type 'a or_null = [ 'a | `Null ] constraint 'a = [> `A | `B ]
With the message:
Error: The type 'a does not expand to a polymorphic variant type
Hint: Did you mean `a
I would like to achieve this without using another layer in the memory representation (and in the syntax). In particular, I want to avoid using an option type such as
type 'a or_null = | A of 'a | Null
Is there a way to have such a type using only polymorphic variants? The final goal would be to write e.g. monads on 'a or_null types. (And this is actually the tricky part.)

Polymorphic variants cannot track the absence of a specific constructor. This implies that we cannot really write the usual bind. If we try
let bind x f =
match x with
| `Null -> `Null
| x -> f x
we get
val bind: ([> `Null] as 'a) -> ('a -> ([>`Null] as 'b)) -> 'b
If, for readability's sake, we add the following type abbreviation
type 'a m = [> `Null] as 'a
(which is an alternative definition of or_null), the previous type reads as
val bind: 'a m -> ('a m -> 'b m) -> 'b m
In other words, the function argument f of bind must already handle the `Null case in its argument by itself because the type system cannot express the constraint x <> `Null in the second branch of the match.
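A minimal sketch of what this means in practice: the callback passed to bind has to include a `Null case, even though bind never forwards `Null to it. The `Int tag and the incr function are hypothetical names introduced only for illustration.

```ocaml
let bind x f =
  match x with
  | `Null -> `Null
  | x -> f x

(* Hypothetical callback: increments an `Int payload.
   The `Null branch is never reached through bind, but without it the
   calls to bind below are rejected, because f's argument type must
   include `Null. *)
let incr = function
  | `Int n -> `Int (n + 1)
  | `Null -> `Null

(* Closed annotations keep the result types fully determined. *)
let r1 : [ `Int of int | `Null ] = bind (`Int 1) incr
let r2 : [ `Int of int | `Null ] = bind `Null incr
```

Here r1 is `Int 2 and r2 is `Null; the short-circuit on `Null happens in bind, not in incr.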

Related

Combining parametric polymorphism and polymorphic variants (backtick types)

Suppose I have a type consisting of multiple polymorphic variants (covariantly) such as the following:
[> `Ok of int | `Error of string]
Let's further suppose that I want to factor this definition into some kind of type constructor and a concrete type int. My first attempt was the following:
type 'a error = [> `Ok of 'a | `Error of string]
However, using a definition like this produces a really strange type error mentioning a type variable 'b that doesn't appear anywhere in the definition.
$ ocaml
OCaml version 4.07.0
# type 'a error = [> `Ok of 'a | `Error of string ];;
Error: A type variable is unbound in this type declaration.
In type [> `Error of string | `Ok of 'a ] as 'b the variable 'b is unbound
This 'b is an autogenerated name, adding an explicit 'b shifts the variable to 'c.
$ ocaml
OCaml version 4.07.0
# type ('a, 'b) error = [> `Ok of 'a | `Error of 'b ];;
Error: A type variable is unbound in this type declaration.
In type [> `Error of 'b | `Ok of 'a ] as 'c the variable 'c is unbound
Using the invariant construction [ `Thing1 of type1 | `Thing2 of type2 ] appears to work fine in this context.
$ ocaml
OCaml version 4.07.0
# type 'a error = [ `Ok of 'a | `Error of string ] ;;
type 'a error = [ `Error of string | `Ok of 'a ]
#
However, explicitly marking the type parameter as covariant does not salvage the original example.
$ ocaml
OCaml version 4.07.0
# type +'a error = [> `Ok of 'a | `Error of string];;
Error: A type variable is unbound in this type declaration.
In type [> `Error of string | `Ok of 'a ] as 'b the variable 'b is unbound
And, just for good measure, adding a contravariance annotation also does not work.
$ ocaml
OCaml version 4.07.0
# type -'a error = [> `Ok of 'a | `Error of string];;
Error: A type variable is unbound in this type declaration.
In type [> `Error of string | `Ok of 'a ] as 'b the variable 'b is unbound
Attempting to guess the name that the compiler will use for the unbound type variable and adding it as a parameter on the left also does not work and produces a very bizarre error message.
$ ocaml
OCaml version 4.07.0
# type ('a, 'b) string = [> `Ok of 'a | `Error of string] ;;
Error: The type constructor string expects 2 argument(s),
but is here applied to 0 argument(s)
Is there a way of making a type constructor that can effectively "substitute different types" for int in [> `Ok of int | `Error of string]?
This isn't an issue of variance, or parametric polymorphism, but of row polymorphism. When you add > or < it also adds an implicit type variable, the row variable, that will hold the "full" type. You can see this type variable made explicit in the error:
[> `Error of string | `Ok of 'a ] as 'b
Note the as 'b part at the end.
In order to alias the type you have to make the type variable explicit, so you can reference it as a type parameter on the alias:
type ('a, 'r) error = [> `Ok of 'a | `Error of string ] as 'r
Note also, in case you run into objects, that the same applies there. An object type with .. has an implicit type variable that you need to make explicit in order to alias it:
type 'r obj = < foo: int; .. > as 'r
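As a rough sketch of how these two aliases might be used once the row variable is a parameter: the functions is_ok and get_foo, and the `Timeout tag, are hypothetical names that do not appear in the question.

```ocaml
(* The row variable 'r is an explicit parameter of the alias. *)
type ('a, 'r) error = [> `Ok of 'a | `Error of string ] as 'r

(* Accepts any variant type that includes at least `Ok and `Error. *)
let is_ok (x : (_, _) error) =
  match x with
  | `Ok _ -> true
  | _ -> false

(* Same idea for open object types. *)
type 'r obj = < foo : int; .. > as 'r

let get_foo (o : _ obj) = o#foo

let a = is_ok (`Ok 3)
let b = is_ok (`Error "boom")
let c = is_ok `Timeout (* hypothetical extra tag, allowed by the open row *)
let d = get_foo (object method foo = 42 method bar = "x" end)
```

The wildcard arguments in (_, _) error and _ obj let the compiler infer the row variable at each use site.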

How to define a higher-order function which applies a polymorphic function to a specific type

If I define
fun id x = x
Then naturally id has type 'a -> 'a
Of course, id 0 evaluates to 0, which makes perfect sense.
Since this makes perfect sense, I should be able to encapsulate it by a function:
fun applyToZero (f: 'a -> 'a) = f 0
With the hope that applyToZero will have type ('a -> 'a) -> int and applyToZero id will evaluate to 0
But when I try to define applyToZero as above, SML/NJ gives an odd error message which begins:
unexpected exception (bug?) in SML/NJ: Match [nonexhaustive match failure]
raised at: ../compiler/Elaborator/types/unify.sml:84.37
This almost looks like a bug in the compiler itself. Weird, but possible.
But PolyML doesn't like it either (though its error message is less odd):
> fun applyToZero (f: 'a -> 'a) = f 0;
poly: : error: Type error in function application.
Function: f : 'a -> 'a
Argument: 0 : int
Reason: Can't unify int to 'a (Cannot unify with explicit type variable)
Found near f 0
The following does work:
fun ignoreF (f: 'a -> 'a) = 1
with the inferred type ('a -> 'a) -> int. This shows that it isn't impossible to create a higher order function of this type.
Why doesn't SML accept my definition of applyToZero? Is there any workaround that will allow me to define it so that its type is ('a -> 'a) -> int?
Motivation: in my attempt to solve the puzzle in this question, I was able to define a function tofun of type int -> 'a -> 'a and another function fromfun with the desired property that fromfun (tofun n) = n for all integers n. However, the inferred type of my working fromfun is (int -> int) -> int. All of my attempts to add type annotations so that SML will accept it as ('a -> 'a) -> int have failed. I don't want to show my definition of fromfun since the person that asked that question might still be working on that puzzle, but the definition of applyToZero triggers exactly the same error messages.
It can't be done in plain Hindley-Milner, as used by SML, because it does not support so-called higher-ranked or first-class polymorphism. The type annotation 'a -> 'a and the type ('a -> 'a) -> int do not mean what you think they do.
That becomes clearer if we make the binder for the type variable explicit.
fun ignoreF (f: 'a -> 'a) = 1
actually means
fun 'a ignoreF (f: 'a -> 'a) = 1
that is, 'a is a parameter to the whole function ignoreF, not to its argument f. Consequently, the type of the function is
ignoreF : ∀ 'a. (('a -> 'a) -> int)
Here, I make the binder for 'a explicit in the type as a universal quantifier. That's how you write such types in type theory, while SML keeps all quantifiers implicit in its syntax. Now the type you thought this had would be written
ignoreF : (∀ 'a. ('a -> 'a)) -> int
Note the difference: in the first version, the caller of ignoreF gets to choose how 'a is instantiated, hence it could be anything, and the function cannot assume it is int (which is why applyToZero does not type-check). In the second type, the caller of the argument gets to choose, i.e., ignoreF.
But such a type is not supported by Hindley-Milner. It only supports so-called prenex polymorphism (or rank-1 polymorphism), where all the ∀ are on the outermost level, which is why it can keep them implicit: there is no ambiguity under this restriction. The problem with higher-ranked polymorphism is that type inference for it is undecidable in general.
So your applyToZero cannot have the type you want in SML. The only way to achieve something like it is by using the module system and its functors:
functor ApplyToZero (val f : 'a -> 'a) = struct val it = f 0 end
Btw, the error message you quote from SML/NJ cannot possibly be caused by the code you showed. You must have done something else.
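For comparison only, and not as an SML solution: OCaml permits a limited form of higher-ranked polymorphism through record fields with explicitly quantified types. The sketch below shows the type (∀ 'a. 'a -> 'a) -> int expressed that way; the names poly_id and apply_to_zero are hypothetical.

```ocaml
(* The field f is itself polymorphic: forall 'a. 'a -> 'a *)
type poly_id = { f : 'a. 'a -> 'a }

(* The callee, not the caller, chooses the instantiation of 'a (here int). *)
let apply_to_zero (p : poly_id) : int = p.f 0

let id_result = apply_to_zero { f = (fun x -> x) }
```

Because the quantifier is written down in the type definition, nothing has to be inferred at rank 2; the record constructor only checks that the supplied function really is polymorphic.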
If we run the Hindley-Milner type inference algorithm on fun applyToZero f = f 0, we get f : int -> 'a because of the term f 0.
Obviously, f is a function f : 'b -> 'a. We apply this function to 0, thus 'b = int. Hence, the explicit type annotation f : 'a -> 'a produces the error you observe.
By the way, SML/NJ v110.80 works fine on my machine and prints the following error message:
stdIn:2.39-2.42 Error: operator and operand don't agree [overload - user bound tyvar]
operator domain: 'a
operand: [int ty]
in expression:
f 0

Function signature as type in OCaml

Is there a way to declare something like
type do = ('a -> 'b)
in OCaml? Specifically, to declare a function signature as a type
For free types 'a and 'b, 'a -> 'b is not the type of any well-behaved OCaml function, because it requires the function to produce a value of an arbitrary type.
So, you can't give a name to a type with unbound parameters:
# type uabfun = 'a -> 'b;;
Error: Unbound type parameter 'a
If you use specific types, there's no problem giving it a name:
# type iifun = int -> int;;
type iifun = int -> int
If the types 'a and 'b are parameters (rather than being free), there is also no problem:
# type ('a, 'b) abfun = 'a -> 'b;;
type ('a, 'b) abfun = 'a -> 'b
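A small sketch of how such abbreviations might then be used in annotations. The value names double, length and apply are hypothetical.

```ocaml
type iifun = int -> int
type ('a, 'b) abfun = 'a -> 'b

(* The abbreviations can be used in annotations like any other type. *)
let double : iifun = fun n -> 2 * n
let length : (string, int) abfun = String.length

(* Both parameters stay generic here. *)
let apply (f : ('a, 'b) abfun) (x : 'a) : 'b = f x
```

Since these are plain abbreviations, a value of type iifun can be passed wherever an (int, int) abfun is expected.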

What can be done with the "constraint" keyword in OCaml

The OCaml manual describes the "constraint" keyword, which can be used in a type definition. However, I cannot figure out any use for this keyword. When is this keyword useful? Can it be used to remove polymorphic type variables (so that a type 'a t in a module becomes just t and the module can be used as a functor argument that requires t with no variables)?
So, the constraint keyword, used in type or class definitions, lets one "reduce the scope" of the types applicable to a type parameter, so to speak. The documentation clearly states that the type expressions on both sides of the constraint equation will be unified to "refine" the types the constraint relates to. Because they are type expressions, you may use all the usual type-level operators.
Examples:
# type 'a t = int * 'a constraint 'a * int = float * int;;
type 'a t = int * 'a constraint 'a = float
# type ('a,'b) t = 'c r constraint 'c = 'a * 'b
and 'a r = {v1 : 'a; v2 : int };;
type ('a,'b) t = ('a * 'b) r
and 'a r = { v1 : 'a; v2 : int; }
Observe how type unification simplifies the equations, in the first example by getting rid of the extraneous type product (* int), and in the second case eliminating it altogether. Note also that I used a type variable 'c which only appears in the right hand side of the type definition.
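To see the effect of these constraints on values, here is a minimal sketch. The two definitions are renamed t1 and t2 (hypothetical names) so that they can live in the same file.

```ocaml
(* Unifying 'a * int with float * int forces 'a = float. *)
type 'a t1 = int * 'a constraint 'a * int = float * int

(* 'c appears only on the right-hand side and is tied to 'a * 'b. *)
type ('a, 'b) t2 = 'c r constraint 'c = 'a * 'b
and 'a r = { v1 : 'a; v2 : int }

(* The wildcard is inferred as float, the only instance allowed. *)
let x : _ t1 = (1, 2.5)

let y : (string, bool) t2 = { v1 = ("k", true); v2 = 7 }
```

An annotation such as string t1 would be rejected, since string does not satisfy the constraint.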
Two interesting uses are with polymorphic variants and class types, both based on row polymorphism. Constraints allow one to express certain subtyping relations. For variants, a type is a subtype of another when every one of its constructors is also present in the other. Some of these relations can already be expressed monomorphically:
# type sum_op = [ `add | `subtract ];;
type sum_op = [ `add | `subtract ]
# type prod_op = [ `mul | `div ];;
type prod_op = [ `mul | `div ]
# type op = [ sum_op | prod_op ];;
type op = [ `add | `div | `mul | `subtract ]
There, both sum_op and prod_op are subtypes of op.
But in some cases you have to introduce polymorphism, and this is where constraints come in handy:
# type 'a t = 'a constraint [> op ] = 'a;;
type 'a t = 'a constraint 'a = [> op ]
The above lets you denote the family of types that contain at least the constructors of op: for a given instance of 'a t, the type is 'a itself.
If we try to define the same type without a parameter, the type unification algorithm will complain:
# type t' = [> op];;
Error: A type variable is unbound in this type declaration.
In type [> op ] as 'a the variable 'a is unbound
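As an illustration of the parameterized version, the sketch below annotates a function argument with _ t, so that it accepts any variant type containing at least the constructors of op. The function describe and the extra tag `pow are hypothetical.

```ocaml
type sum_op = [ `add | `subtract ]
type prod_op = [ `mul | `div ]
type op = [ sum_op | prod_op ]
type 'a t = 'a constraint 'a = [> op ]

(* The wildcard in _ t is inferred as an open row that includes op. *)
let describe (x : _ t) =
  match x with
  | `add -> "add"
  | `subtract -> "subtract"
  | `mul -> "mul"
  | `div -> "div"
  | _ -> "other"
```

Calling describe `add returns "add", while a tag outside op, such as `pow, falls through to "other".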
The same sort of constraints may be expressed with class types, and the same problem may arise if the type definition is implicitly polymorphic by subtyping.
# class type ct = object method v : int end;;
class type ct = object method v : int end
# type i = #ct;;
Error: A type variable is unbound in this type declaration.
In type #ct as 'a the variable 'a is unbound
# type 'a i = 'a constraint 'a = #ct;;
type 'a i = 'a constraint 'a = #ct
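A short sketch of the class-type version in use, assuming a hypothetical accessor get_v: any object with at least a method v : int is accepted.

```ocaml
class type ct = object method v : int end
type 'a i = 'a constraint 'a = #ct

(* The wildcard is inferred as an open object type with method v. *)
let get_v (o : _ i) = o#v

(* The extra method is permitted because #ct is an open object type. *)
let n = get_v (object method v = 1 method extra = "unused" end)
```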

OCaml function over polymorphic variants not sufficiently polymorphic?

OCaml gives function `A -> 1 | _ -> 0 the type [> `A] -> int, but why isn't that [> ] -> int?
This is my reasoning:
function `B -> 0 has type [<`B] -> int. Adding a `A -> 1 branch to make it function `A -> 1 | `B -> 0 loosens that to [<`A|`B] -> int. The function becomes more permissive in the type of argument it can accept. This makes sense.
function _ -> 0 has type 'a -> int. This type is unifiable with [> ] -> int, and [> ] is an already open type (very permissive). Adding the `A -> 1 branch to make it function `A -> 1 | _ -> 0 restricts the type to [>`A] -> int. That doesn't make sense to me. Indeed, adding yet another branch `C -> 1 will make it [>`A|`C] -> int, further restricting the type. Why?
Note: I am not looking for workarounds, I'd just like to know the logic behind this behavior.
On a related note, function `A -> `A | x -> x has type ([>`A] as 'a) -> 'a, and while that is also a restrictive open type for the parameter, I can understand the reason. The type should unify with 'a -> 'a, [>`A] -> 'b, 'c -> [>`A]; the only way to do it seems to be ([>`A] as 'a) -> 'a.
Is there a similar reason for my first example?
A possible answer is that the type [> ] -> int would allow an argument (`A 3) but this isn't allowed for function `A -> 1 | _ -> 0. In other words, the type needs to record the fact that `A takes no parameters.
The reason is a very practical one:
In older versions of OCaml the inferred type was
[`A | .. ] -> int
which meant that `A takes no argument but may be absent.
However this type is unifiable with
[`B |`C ] -> int
which results in `A being discarded without any kind of check.
This makes it easy to introduce errors through misspellings.
For this reason variant constructors must either appear in an upper or a lower bound.
The typing of (function `A -> 1 | _ -> 0) is reasonable, as explained by Jeffrey. The reason why
(function `A -> 1 | _ -> 0) ((fun x -> (match x with `B -> ()); x) `B)
fails to type-check should be explained, in my opinion, by the latter part of the expression. Indeed the function (fun x -> (match x with `B -> ()); x) has input type [< `B] while its parameter `B has type [> `B ]. The unification of both types gives the closed type [ `B ] which is not polymorphic. It cannot be unified with the input type [> `A ] that you get from (function `A -> 1 | _ -> 0).
Fortunately, polymorphic variants do not rely only on (row) polymorphism. You can also use subtyping in situations such as this one, where you want to enlarge a closed type: [ `B ] is a subtype of, for example, [`A | `B], which is an instance of [> `A ]. Subtyping casts are explicit in OCaml, using the syntax (expr :> ty) (casting to some type), or (expr : ty :> ty) in case the domain type cannot be inferred correctly.
You can therefore write:
let b = (fun x -> (match x with `B -> ()); x) `B in
(function `A -> 1 | _ -> 0) (b :> [ `A | `B ])
which is well-typed.
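The same coercion can be sketched with named helpers, which also shows the fully annotated (expr : ty :> ty) form. The names classify, only_b, widened and widened' are hypothetical.

```ocaml
(* [> `A ] -> int *)
let classify = function `A -> 1 | _ -> 0

(* ([< `B ] as 'a) -> 'a *)
let only_b x = (match x with `B -> ()); x

(* b has the closed type [ `B ]; the coercion widens it to [ `A | `B ]. *)
let widened =
  let b = only_b `B in
  classify (b :> [ `A | `B ])

(* Fully annotated form of the same coercion. *)
let widened' =
  let b = only_b `B in
  classify (b : [ `B ] :> [ `A | `B ])
```

Both bindings evaluate to 0, since the value is still `B after the coercion; only its static type has been enlarged.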