OCaml has several different syntaxes for a polymorphic type annotation:
let f : 'a -> 'a = … (* Isn’t this one already polymorphic? (answer: NO) *)
let f : 'a. 'a -> 'a = …
let f : type a. a -> a = …
We often see them when using fancy algebraic datatypes (typically, GADTs), where they seem to be necessary.
What is the difference between these syntaxes? When and why must each one be used?
Below are alternative explanations with a varying amount of detail, depending on how much of a hurry you’re in. ;-)
I will use the following code (drawn from that other question) as a running example. Here, the type annotation on the definition of reduce is actually required to make it typecheck.
(* The type [('a, 'c) fun_chain] represents a chainable list of functions, i.e.
* such that the output type of one function is the input type of the next one;
* ['a] and ['c] are the input and output types of the whole chain.
* This is implemented as a recursive GADT (generalized algebraic data type). *)
type (_, _) fun_chain =
  | Nil : ('a, 'a) fun_chain
  | Cons : ('a -> 'b) * ('b, 'c) fun_chain -> ('a, 'c) fun_chain
(* [reduce] reduces a chain to just one function by composing all
 * functions of the chain. *)
let rec reduce : type a c. (a, c) fun_chain -> a -> c =
  fun chain x ->
    begin match chain with
    | Nil -> x
    | Cons (f, chain') -> reduce chain' (f x)
    end
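To see the running example in action, here is a small self-contained sketch. The chain int_to_len and the value digits are names invented for illustration; they are not part of the original question.

```ocaml
type (_, _) fun_chain =
  | Nil : ('a, 'a) fun_chain
  | Cons : ('a -> 'b) * ('b, 'c) fun_chain -> ('a, 'c) fun_chain

let rec reduce : type a c. (a, c) fun_chain -> a -> c =
  fun chain x ->
    match chain with
    | Nil -> x
    | Cons (f, chain') -> reduce chain' (f x)

(* Hypothetical chain: int -> string -> int. The intermediate type
   (string) is hidden inside the chain. *)
let int_to_len : (int, int) fun_chain =
  Cons (string_of_int, Cons (String.length, Nil))

(* Applies string_of_int, then String.length. *)
let digits = reduce int_to_len 12345
```

The intermediate type string does not appear in the type of int_to_len, which is why the recursive call inside reduce has to be usable at a type the caller never sees.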
The short story
On let-definitions, an annotation like : 'a -> 'a does not force polymorphism: the type-checker may refine the unification variable 'a to something more specific. This bit of syntax is indeed misleading, because the same annotation on a val-declaration, i.e. in a module signature, does enforce polymorphism.
: type a. … is a type annotation with explicit (forced) polymorphism. You can think of this as the universal quantifier (∀ a, “for all a“). For instance,
let some : type a. a -> a option =
fun x -> Some x
means that “for all” type a, you can give an a to some and then it will return an a option.
The code at the beginning of this answer makes use of advanced features of the type system, namely, polymorphic recursion and branches with different types, and that leaves type inference at a loss. In order to have a program typecheck in such a situation, we need to force polymorphism like this. Beware that in this syntax, a is a type name (no leading quote) rather than a type unification variable.
: 'a. … is another syntax that forces polymorphism, but it is practically subsumed by : type a. … so you will hardly need it at all.
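The contrast between a let-annotation and a val-declaration can be sketched as follows. The names plus_one, Id and Bad are made up for this illustration.

```ocaml
(* On a let-definition, 'a is only a unification variable: this is
   accepted, and the inferred type is int -> int. *)
let plus_one : 'a -> 'a = fun x -> x + 1

(* In a signature, the same text is a promise of polymorphism. *)
module Id : sig
  val id : 'a -> 'a
end = struct
  let id x = x
end

(* Sealing plus_one with that signature would be rejected:
   module Bad : sig val f : 'a -> 'a end = struct let f = plus_one end *)

(* The explicitly polymorphic definition of some discussed above. *)
let some : type a. a -> a option = fun x -> Some x
```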
The pragmatic story
: type a. … is a short-hand syntax that combines two features:
an explicitly polymorphic annotation : 'a. …
useful for ensuring a definition is as general as intended
required when recursion is done with type parameters different from those of the initial call (“polymorphic recursion” i.e. recursion on “non-regular” ADTs)
a locally abstract type (type a) …
required when different branches have different types (i.e. when pattern-matching on “generalized” ADTs)
allows you to refer to type a from inside the definition, typically when building a first-class module (I won’t say more about this)
Here we use the combined syntax because our definition of reduce falls under both of the “required” situations listed above.
We have polymorphic recursion because Cons builds a (a, c) fun_chain from a (b, c) fun_chain: the first type parameter differs (we say that fun_chain is a “non-regular” ADT).
We have branches with different types because Nil builds a (a, a) fun_chain whereas Cons builds a (a, c) fun_chain (we say that fun_chain is a “generalized” ADT, or GADT for short).
Just to be clear: : 'a. … and : type a. … produce the same signature for the definition. Choosing one syntax or the other only influences how its body is typechecked. For most intents and purposes, you can forget about : 'a. … and just remember the combined form : type a. …. Alas, the latter does not completely subsume the former: there are rare situations where writing : type a. … wouldn’t work and you would need : 'a. … (see @octachron’s answer) but, hopefully, you won’t stumble upon them often.
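To isolate the first of the two features, here is a sketch of polymorphic recursion on a non-regular type that is not a GADT, so : 'a. … alone is enough. The type nested and the function length are illustrative names, not taken from the question.

```ocaml
(* A non-regular (nested) type: the parameter changes in the recursive
   occurrence. No GADT is involved. *)
type 'a nested =
  | Done
  | More of 'a * ('a * 'a) nested

(* The recursive call is at type ('a * 'a) nested, not 'a nested, so 'a
   has to be explicitly polymorphic. *)
let rec length : 'a. 'a nested -> int = function
  | Done -> 0
  | More (_, rest) -> 1 + length rest

(* One int, then one pair of ints. *)
let two = length (More (1, More ((2, 3), Done)))
```

Without the 'a. prefix, the recursive call would force 'a to be equal to 'a * 'a and the definition would be rejected.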
The long story
Explicit polymorphism
OCaml type annotations have a dirty little secret: writing let f : 'a -> 'a = … doesn’t force f to be polymorphic in 'a. The compiler unifies the provided annotation with the inferred type and is free to instantiate the type variable 'a while doing so, leading to a less general type than intended. For instance let f : 'a -> 'a = fun x -> x+1 is an accepted program and leads to val f : int -> int. To ensure the function is indeed polymorphic (i.e. to have the compiler reject the definition if it is not general enough), you have to make the polymorphism explicit, with the following syntax:
let f : 'a. 'a -> 'a = …
For a non-recursive definition, this is merely the human programmer adding a constraint which makes more programs be rejected.
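A minimal sketch of that extra check on a non-recursive definition (the name not_id in the comment is hypothetical):

```ocaml
(* Accepted: the body really is polymorphic in 'a. *)
let id : 'a. 'a -> 'a = fun x -> x

(* The same definition can then be used at unrelated types. *)
let n = id 1
let s = id "one"

(* Rejected, because the body only has type int -> int, which is less
   general than the annotation:
   let not_id : 'a. 'a -> 'a = fun x -> x + 1 *)
```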
In the case of a recursive definition however, this has another implication. When typechecking the body, the compiler will unify the provided type with the types of all occurrences of the function being defined. Type variables which are not marked as polymorphic will be made equal in all recursive calls. But polymorphic recursion is precisely when we recurse with differing type parameters; without explicit polymorphism, that would either fail or infer a less general type than intended. To make it work, we explicitly mark which type variables should be polymorphic.
Note that there is a good reason why OCaml cannot typecheck polymorphic recursion on its own: there is undecidability around the corner (see Wikipedia for references).
As an example, let’s do the job of the typechecker on this faulty definition, where polymorphism is not made explicit:
(* does not typecheck! *)
let rec reduce : ('a, 'c) fun_chain -> 'a -> 'c =
  fun chain x ->
    begin match chain with
    | Nil -> x
    | Cons (f, chain') -> reduce chain' (f x)
    end
We start with reduce : ('a, 'c) fun_chain -> 'a -> 'c and chain : ('a, 'c) fun_chain for some type variables 'a and 'c.
In the first branch, chain = Nil, so we learn that in fact chain : ('c, 'c) fun_chain and 'a == 'c. We unify both type variables. (That doesn’t matter right now, though.)
In the second branch, chain = Cons (f, chain') so there exists an arbitrary type b such that f : 'a -> b and chain' : (b, 'c) fun_chain. Then we must typecheck the recursive call reduce chain', so the expected argument type ('a, 'c) fun_chain must unify with the provided argument type (b, 'c) fun_chain; but nothing tells us that b == 'a. So we reject this definition, preferably (as is tradition) with a cryptic error message:
Error: This expression has type ($Cons_'b, 'c) fun_chain
but an expression was expected of type ('c, 'c) fun_chain
The type constructor $Cons_'b would escape its scope
If now we make polymorphism explicit:
(* still does not typecheck! *)
let rec reduce : 'a 'c. ('a, 'c) fun_chain -> 'a -> 'c =
…
Then typechecking the recursive call is not a problem anymore, because we now know that reduce is polymorphic with two “type parameters” (non-standard terminology), and these type parameters are instantiated independently at each occurrence of reduce; the recursive call uses b and 'c even though the enclosing call uses 'a and 'c.
Locally abstract types
But we have a second problem: the other branch, for constructor Nil, has made 'a be unified with 'c. Hence we end up inferring a less general type than what the annotation mandated, and we report an error:
Error: This definition has type 'c. ('c, 'c) fun_chain -> 'c -> 'c
which is less general than 'a 'c. ('a, 'c) fun_chain -> 'a -> 'c
The solution is to turn the type variables into locally abstract types, which cannot be unified (but we can still have type equations about them). That way, type equations are derived locally to each branch and they do not transpire outside of the match with construct.
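The effect of locally abstract types can be seen in isolation on a smaller GADT with no recursion at all. This is only a sketch; the type ty and the function default are hypothetical names.

```ocaml
(* A GADT whose constructors fix the type parameter differently. *)
type _ ty =
  | Int : int ty
  | Bool : bool ty

(* Each branch learns a different equation about a (a = int, then
   a = bool), so the branches may return values of different types.
   The equations stay local to their branch. *)
let default : type a. a ty -> a = function
  | Int -> 0
  | Bool -> false
```

With unification variables instead of the locally abstract type a, the first branch would pin the result type to int and the second branch would no longer typecheck.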
The practical answer when hesitating between 'a . ... and type a. ... is to always use the latter form:
type a. ... works with:
polymorphic recursion
GADTs
raising type errors early
whereas:
'a. ... works with
polymorphic recursion
polymorphic quantification over row type variables
Thus type a. ... is generally a strictly superior version of 'a . ... .
Except for the last strange point. For the sake of exhaustiveness, let me give an example of quantification over a row type variable:
let f: 'a. ([> `X ] as 'a) -> unit = function
| `X -> ()
| _ -> ()
Here the universal quantification allows us to control precisely the row variable type. For instance,
let f: 'a. ([> `X ] as 'a) -> unit = function
| `X | `Y -> ()
| _ -> ()
yields the following error
Error: This pattern matches values of type [? `Y ]
but a pattern was expected which matches values of type [> `X ]
The second variant type is bound to the universal type variable 'a,
it may not allow the tag(s) `Y
This use case is not supported by the form type a. ... mostly because the interaction of locally abstract type, GADTs type refinement and type constraints has not been formalized. Thus this second exotic use case is not supported.
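For completeness, here is a sketch of how such a row-polymorphic function is used. It is a variant of the first example that returns a bool so that the result can be observed; the name is_x is made up.

```ocaml
(* The row variable is universally quantified, so the function accepts
   any polymorphic variant type that contains at least `X. *)
let is_x : 'a. ([> `X ] as 'a) -> bool = function
  | `X -> true
  | _ -> false

let a = is_x `X
(* `Z is not mentioned in the annotation, yet it is accepted, because
   the argument type is open. *)
let b = is_x `Z
```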
TL;DR: In your question, only the last two forms are polymorphic type annotations. The latter of these two forms, in addition to annotating a type as polymorphic, introduces a locally abstract type [1]. This is the only difference.
The longer story
Now let's speak a little bit about the terminology. The following is not a type annotation (or, more properly, doesn't contain any type annotations),
let f : 'a -> 'a = …
It is called a type constraint. A type constraint requires the type of the defined value to be compatible with the specified type schemata.
In this definition,
let f : 'a. 'a -> 'a = …
we have a type constraint that includes a type annotation. The phrase "type annotation" in OCaml parlance means: annotating a type with some information, i.e., attaching some attribute or a property to a type. In this case, we annotate type 'a as polymorphic. We're not annotating the value f as polymorphic, nor are we annotating the value f with type 'a -> 'a or 'a. 'a -> 'a. We are constraining the value of f to be compatible with type 'a -> 'a and annotating 'a as a polymorphic type variable.
For a long time, the syntax 'a. was the only way to annotate a type as polymorphic, but later OCaml introduced locally abstract types. They have the following syntax, which you could also add to your collection.
let f (type t) : t -> t = ...
This creates a fresh abstract type constructor that you can use in the scope of the definition. It doesn't annotate t as polymorphic though, so if you want it to be explicitly annotated as polymorphic you could write,
let f : 'a. 'a -> 'a = fun (type t) (x : t) : t -> ...
which includes both an explicit type annotation of 'a as polymorphic and the introduction of a locally abstract type. Needless to say, it is cumbersome to write such constructions, so a little bit later (OCaml 4.00) they introduced syntactic sugar for that, so that the above expression could be written as simply as,
let f : type t. t -> t = ...
Therefore, this syntax is just an amalgamation of two rather orthogonal features: locally abstract types and explicitly polymorphic types.
This does not mean, however, that the result of this amalgamation is stronger than its parts. It is more like an intersection. Whilst the generated type is both locally abstract and polymorphic, it is constrained to be a ground type. In other words, it constrains the kind of the type, but this is a completely different problem of higher-kinded polymorphism.
And to conclude the story, despite the syntax similarities, the following is not a type annotation,
val f : 'a -> 'a
It is called a value specification, which is a part of a signature, and it denotes that the value f has type 'a -> 'a.
[1] Locally abstract types have two main use cases. First, you can use them inside your expression in places where type variables are not permitted, e.g., in module and exception definitions. Second, the scope of the locally abstract type exceeds the scope of the function, which you can employ by unifying types that are local to your expression with the abstract type to extend their scopes. The underlying idea is that the expression cannot outlive its type, and since in OCaml types can be created at runtime we have to be careful with the extent of the type as well. Unifying a locally created type with a locally abstract type via a function parameter guarantees that this type will be unified with some existing type in the place of the function application. Intuitively, it is like passing a reference for a type, so that the type could be returned from the function.
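The first use case from the footnote can be sketched with a local module definition, a place where a type variable such as 'a could not be written. The function name sort_uniq is chosen for illustration only.

```ocaml
(* (type a) introduces a locally abstract type that can be named inside
   the local module passed to Set.Make. *)
let sort_uniq (type a) (cmp : a -> a -> int) (xs : a list) : a list =
  let module S = Set.Make (struct
    type t = a
    let compare = cmp
  end) in
  (* Build a set to drop duplicates, then list it in increasing order. *)
  S.elements (S.of_list xs)
```

Despite the local abstract type, the inferred type of sort_uniq is the fully polymorphic ('a -> 'a -> int) -> 'a list -> 'a list.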
I'm new to OCaml, but have worked with Rust, Haskell, etc. I was trying to implement bind on Either and was very surprised to find that none of the general-purpose libraries appear to provide it.
JaneStreet's Base is missing it
What I assume is the standard library is missing it
bind was the first function I reached for... even before match, and the implementation seems quite easy:
let bind_either (m: ('e, 'a) Either.t) (f: 'a -> ('e, 'b) Either.t): ('e, 'b) Either.t =
  match m with
  | Right r -> f r
  | Left l -> Left l
Am I missing something?
It is because we prefer the more specific Result.t, which has clear names for the ok state and for the exceptional state. In general, Either.t is not very popular amongst OCaml programmers because a more specialized type can usually be used, with variant names that better communicate the domain-specific purpose of each branch. It is also worth mentioning that Either was introduced to the OCaml standard library only recently, in 4.12, so it might become more popular.
As mentioned by @ivg, Either is relatively new to the standard library and generally one would prefer to use types that make more sense. For example, Result for error handling.
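As a sketch of what this looks like in practice, the standard library already provides Result.bind (OCaml 4.08 or later is assumed here, as are binding operators). The helper functions parse_int, safe_div and div_strings are invented for this example.

```ocaml
(* Hypothetical helpers that report failure through Result. *)
let parse_int (s : string) : (int, string) result =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error ("not an integer: " ^ s)

let safe_div (x : int) (y : int) : (int, string) result =
  if y = 0 then Error "division by zero" else Ok (x / y)

(* Result.bind is the bind the question was looking for, on Result. *)
let ( let* ) = Result.bind

(* Stops at the first Error and returns it unchanged. *)
let div_strings a b =
  let* x = parse_int a in
  let* y = parse_int b in
  safe_div x y
```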
There is also another point of view, which also applies to Result. Monads act on types parameterised by one type.
In Haskell, this is much less obvious because type constructors can be partially applied. Hence the monad is Either e, with the left parameter fixed, and its bind, (>>=) :: Either e a -> (a -> Either e b) -> Either e b, takes you from Either e a to Either e b.
In trying to generalise the behaviour of a monad via parameterised modules (functors in the ML sense of the term), one would have to "trick" oneself into standardising, for example, the treatment of option (a type of arity 1) and either (or result) which are of arity 2.
There are several approaches. For example, expressing multiple interfaces to describe a monad. For example describing Monad2 and describing Monad in terms of Monad2 as is done in the Base library (https://ocaml.janestreet.com/ocaml-core/latest/doc/base/Base/Monad/index.html)
In Preface we used a rather different (and perhaps less generic) approach. We leave it to the user to set the left parameter of Either (via a functor) (and the right parameter for Result): https://github.com/xvw/preface/blob/master/lib/preface_stdlib/either.mli
However, we do not lose the ability to change the left-hand type of the calculation because Either also has a Bifunctor module that allows us to change the type of both parameters. The conversation is broadly described in this thread: https://discuss.ocaml.org/t/instance-modules-for-more-parametrized-types/5356/2
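The functor-based idea can be sketched as follows. This is not Preface's (or Base's) actual interface: the signature MONAD and the functor Either_monad are simplified, hypothetical names, and OCaml 4.12 or later is assumed for the Either module.

```ocaml
(* A monad interface over a type constructor of arity 1. *)
module type MONAD = sig
  type 'a t
  val return : 'a -> 'a t
  val bind : 'a t -> ('a -> 'b t) -> 'b t
end

(* Fixing the left parameter through a functor turns the arity-2 type
   Either.t into an arity-1 type that fits MONAD. *)
module Either_monad (L : sig type t end) :
  MONAD with type 'a t = (L.t, 'a) Either.t = struct
  type 'a t = (L.t, 'a) Either.t
  let return x = Either.Right x
  let bind m f =
    match m with
    | Either.Left l -> Either.Left l
    | Either.Right r -> f r
end

(* An instance whose left-hand side carries strings. *)
module E = Either_monad (struct type t = string end)
```

The price of this design is that one instance has to be created per left-hand type, which is the trade-off mentioned above.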
I have read in this post that ML dialects do not allow type variables of non-ground kind. E.g. the last statement is not representable:
-- Haskell code
type Ground = Int
type FirstOrder a = Maybe a
type SecondOrder c = c Int -- ML do not allow :c
OCaml supports higher-kinded types only at the level of modules. There are some explanations (here and author's comment here) about which features of OCaml clash with the possibility of higher-kinded types.
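For reference, a module-level counterpart of SecondOrder could look like the sketch below. The names TYCON, Second_order, Option_con and Int_option are all made up for this illustration.

```ocaml
(* A signature describing "some type constructor of arity 1". *)
module type TYCON = sig
  type 'a t
end

(* The functor plays the role of SecondOrder c = c Int: it takes a type
   constructor and applies it to int. *)
module Second_order (C : TYCON) = struct
  type t = int C.t
end

module Option_con = struct
  type 'a t = 'a option
end

module Int_option = Second_order (Option_con)

(* Int_option.t is known to be int option. *)
let x : Int_option.t = Some 3
```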
If I understood it correctly, the main problem is in the following facts:
OCaml does not follow a "freshness" restriction for type definitions: the type construct can define both an alias (and the type will remain the same) and a new fresh type
type alias definition can be hidden
AFAIK, Standard ML has different constructs for type definition and aliases: type for aliases and datatype for new fresh types introduction.
Unfortunately, I do not know SML well enough: is it possible to export a type alias with its definition hidden? And can someone please show me whether there are any other SML features that still do not go well with the possibility of higher-kinded types?
Probably there will be some problems with functors. Could someone be so kind as to show a code example of it? I've heard about such cases several times but still have not found a complete example.
Yes, SML can express the equivalent of higher-kinded types through functors, and can also make them abstract. Useless example:
functor F (type 'a t) :> sig type 'a u end =
struct
type 'a u = ('a t) t
end
However, unlike OCaml, SML does not (officially) have higher-order functors, so per the standard, you can only express second-order type constructors this way.
FWIW, OCaml may use the same keyword for type aliases and generative types (type vs datatype in SML), but they are still distinguished syntactically, by their right-hand side. So that's no real difference to SML. In both languages, an abstract type occurring in a signature can be implemented as either a type alias or a generative type. So the problem for type inference that Leo is alluding to exists equally in both. Haskell can get away without that problem because it does not have the same expressiveness regarding type abstraction (i.e., no "sealing" operator for modules).
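In OCaml, the point about abstract types can be sketched like this; the signature S and the modules Alias and Fresh are hypothetical names.

```ocaml
(* One signature with an abstract type t. *)
module type S = sig
  type t
  val make : int -> t
  val get : t -> int
end

(* Implemented as a type alias: t is really int. *)
module Alias : S = struct
  type t = int
  let make x = x
  let get x = x
end

(* Implemented as a generative type: t is a fresh variant type. *)
module Fresh : S = struct
  type t = Wrap of int
  let make x = Wrap x
  let get (Wrap x) = x
end
```

From the outside, Alias.t and Fresh.t are indistinguishable in kind: a client cannot tell which one is an alias.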
I can't for the life of me figure out why the following SML function is throwing a Warning in my homework problem:
(* assumed to be declared elsewhere in the original homework *)
exception MyException
fun my_func f ls =
  case ls of
    [] => raise MyException
  | head :: rest => (case f head of
                       SOME v => v
                     | NONE => my_func f rest)
fun f a = if isSome a then a else NONE;
Whenever I call my_func with the following test functions:
my_func f [NONE, NONE];
my_func f [];
I always get the warning:
Warning: type vars not generalized because of
value restriction are instantiated to dummy types (X1,X2,...)
Whenever I pass in an options list containing at least one SOME value, this Warning is not thrown. I know it must be something to do with the fact that I am using polymorphism in my function currying, but I've been completely stuck as to how to get rid of these warnings.
Please help if you have any ideas - thank you in advance!
The value restriction referenced in the warning is one of the trickier things to understand in SML, however I will do my best to explain why it comes up in this case and try to point you towards a few resources to learn more.
As you know, SML will use type inference to deduce most of the types in your programs. In this program, the type of my_func will be inferred to be ('a -> 'b option) -> 'a list -> 'b. As you noted, it's a polymorphic type. When you call my_func like this
my_func f [NONE, SOME 1, NONE];
... the type variables 'a and 'b are instantiated to int option and int.
However when you call it without a value such as SOME 1 above
my_func f [NONE, NONE];
What do you think the type variables should be instantiated to? The types should be polymorphic: something like 't option and 't for all types 't. However, there is a limitation which prevents values like this from taking on polymorphic types.
SML defines some expressions as non-expansive values and only these values may take on polymorphic types. They are:
literals (constants)
variables
function expressions
constructors (except for ref) applied to non-expansive values
a non-expansive value with a type annotation
tuples where each field is a non-expansive value
records where each field is a non-expansive value
lists where each element is a non-expansive value
All other expressions, notably function calls (which is what the call to my_func is) cannot be polymorphic. Neither can references. You might be curious to see that the following does not raise a warning:
fn () => my_func f [NONE, NONE];
Instead, the type inferred is unit -> 'a. If you were to call this function however, you would get the warning again.
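Here is a rough OCaml analogue of the situation, offered only as a sketch: OCaml reports the issue differently from SML/NJ (it uses weak type variables rather than dummy types), and the names My_exception, found and not_found are invented. The idea it illustrates, which I would expect to carry over to SML, is that pinning the result type with an annotation leaves nothing to generalize.

```ocaml
exception My_exception

(* Returns the first value for which f answers Some. *)
let rec my_func f = function
  | [] -> raise My_exception
  | head :: rest ->
    (match f head with
     | Some v -> v
     | None -> my_func f rest)

let f a = if Option.is_some a then a else None

(* The list contains Some 1, so the element type is determined. *)
let found = my_func f [None; Some 1; None]

(* Nothing in the list fixes the element type; the annotation does.
   The call raises My_exception, which is caught here. *)
let not_found : int option =
  try Some (my_func f [None; None]) with My_exception -> None
```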
My understanding of the reason for this restriction is a little weak, but I believe that the underlying root issue is mutable references. Here's an example I've taken from the MLton site linked below:
val r: 'a option ref = ref NONE
val r1: string option ref = r
val r2: int option ref = r
val () = r1 := SOME "foo"
val v: int = valOf (!r2)
This program does not typecheck under SML, due to the value restriction. Were it not for the value restriction, this program would have a type error at runtime.
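For comparison, here is a sketch of how OCaml treats the same reference. OCaml's variant of the value restriction gives r a weak (not generalized) type variable, which gets fixed by its first use. Option.get assumes OCaml 4.08 or later.

```ocaml
(* r is not polymorphic: its type is '_weak1 option ref, where the weak
   variable stands for one single, not yet known, type. *)
let r = ref None

(* The first use fixes the type to string option ref. *)
let () = r := Some "foo"

let v : string = Option.get !r

(* Reading it back as an int is now a compile-time error:
   let n : int = Option.get !r *)
```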
As I said, my understanding is shaky. However I hope I've shed a little light on the issue you've run into, although I believe that in your case you could safely ignore the warning. Here are some references should you decide you'd like to dig deeper:
http://users.cis.fiu.edu/~smithg/cop4555/valrestr.html
http://mlton.org/ValueRestriction
(BTW the MLton site is solid gold. There's so much hidden away here, so if you're trying to understand something weird about SML, I highly recommend searching here because you'll likely turn up a lot more than you initially wanted)
Since it seems like you're actually using SML/NJ, this is a pretty handy guide to the error messages and warnings that it will give you at compile time:
http://flint.cs.yale.edu/cs421/smlnj/doc/errors.html
I was reading today Jason Hickey's online book "Introduction to Objective Caml" and in the chapter on Functors (page 140) I ran into the following line inside the Set functor's definition:
let add = (::)
Running the code resulted in a not very illuminating 'Syntax error' message. After plugging the line into the OCaml toplevel, I figured out that :: is in fact not an operator, but rather a constructor.
However, from what little I know of Haskell the equivalent : constructor can be treated as an operator (function) as well.
Prelude> :t (:)
(:) :: a -> [a] -> [a]
My question is: have OCaml constructors never been first class values (implying that the code from the book was wrong from the start) and why is this the case?
In Caml Light, OCaml's predecessor, it used to be the case that constructors were promoted to functions when partially applied. I'm not exactly sure why this feature was removed when moving to OCaml, and I lament this as well, but the explanation I heard was "nobody used that". So no List.map Some foo anymore...
:: is slightly special as an algebraic datatype constructor as it is the only infix constructor (hardcoded in the parser), but otherwise behaves like any other datatype constructor.
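The usual workaround is to wrap the constructor in a function, or to use the wrappers the standard library provides. The sketch below assumes OCaml 4.08 or later for Option.some (List.cons is older); the names add, wrapped, same and consed are illustrative.

```ocaml
(* A constructor must be fully applied, so wrap it in a function. *)
let add x xs = x :: xs

(* The replacement for List.map Some foo. *)
let wrapped = List.map (fun x -> Some x) [1; 2]

(* The standard library exposes some constructors as functions. *)
let same = List.map Option.some [1; 2]
let consed = List.cons 0 [1; 2]
```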