The first and the second flatMap work well. Why doesn't the third one work?
fun flatMap f xs = List.concat(List.map f xs)
fun flatMap f = List.concat o List.map f
val flatMap = (fn mmp => List.concat o mmp) o List.map;
This is due to a rule called "value polymorphism" or the "value restriction". According to this rule, a value declaration can't create a polymorphic binding if the expression might be "expansive"; that is, a value declaration can only create a polymorphic binding if it conforms to a highly restricted grammar that ensures it can't create ref cells or exception names.
In your example, since (fn mmp => List.concat o mmp) o List.map calls the function o, it's not non-expansive; you know that o doesn't create ref cells or exception names, but the grammar can't distinguish that.
So the declaration val flatMap = (fn mmp => List.concat o mmp) o List.map is still allowed, but it can't create a polymorphic binding: it has to give flatMap a monomorphic type, such as (int -> real list) -> int list -> real list. (Note: not all implementations of Standard ML can infer the desired type in all contexts, so you may need to add an explicit type hint.)
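OCaml has the same rule in a relaxed form, and the usual fix is the same as your first two versions: eta-expand so that the right-hand side is a syntactic function. Here is a minimal OCaml sketch. `compose` is a hypothetical stand-in for SML's `o`, since OCaml's standard library does not provide one.

```ocaml
(* Hypothetical helper mirroring SML's infix composition operator o *)
let compose f g x = f (g x)

(* Eta-expanded: a syntactic function, so flat_map is fully polymorphic *)
let flat_map f xs = List.concat (List.map f xs)

(* Point-free: the right-hand side is a (partial) application of compose,
   so it is expansive and only gets weakly polymorphic type variables *)
let flat_map_pointfree = compose (fun mmp -> compose List.concat mmp) List.map

(* The first use fixes the weak variables, here to int *)
let doubled = flat_map_pointfree (fun x -> [ x; x ]) [ 1; 2 ]

(* The eta-expanded version can still be used at several types *)
let strings = flat_map (fun s -> [ s; s ]) [ "a"; "b" ]
let tens = flat_map (fun n -> [ n * 10 ]) [ 1; 2 ]
```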
This restriction exists to ensure that we don't implicitly cast from one type to another by writing to a ref cell using one type and reading from it using a different type, or by wrapping one type in a polymorphic exception constructor and unwrapping it using a different type. For example, the below programs are forbidden by the value restriction, but if they were allowed, each would create a variable named garbage of type string that is initialized from the integer 17:
val refCell : 'a option ref =
ref NONE
val () = refCell := SOME 17
val garbage : string =
valOf (! refCell)
val (wrap : 'a -> exn, unwrap : exn -> 'a) =
let
exception EXN of 'a
in
(fn x => EXN x, fn EXN x => x)
end
val garbage : string =
unwrap (wrap 17)
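For comparison, here is a rough OCaml sketch of how the exception case can stay sound when each wrapper pair comes from a function call. A local exception is created afresh on every call, so each pair is tied to exactly one type. `make_wrapper` and the other names are hypothetical.

```ocaml
(* Each call declares a fresh local exception, so a wrapper pair is
   tied to exactly one type a *)
let make_wrapper (type a) () : (a -> exn) * (exn -> a option) =
  let exception Wrap of a in
  ((fun x -> Wrap x), function Wrap x -> Some x | _ -> None)

(* make_wrapper () is an application, so each pair is only weakly
   polymorphic until its first use fixes a (to int and string here) *)
let wrap_int, unwrap_int = make_wrapper ()
let wrap_string, unwrap_string = make_wrapper ()

let seventeen = unwrap_int (wrap_int 17)
let hello = unwrap_string (wrap_string "hello")

(* The string unwrapper does not recognise the int wrapper's exception,
   so the "garbage" conversion from the SML example cannot happen *)
let no_garbage = unwrap_string (wrap_int 17)
```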
For more information:
"ValueRestriction" in the MLton documentation
"Value restriction" on the English Wikipedia
"Types and Type Checking" in SML/NJ's guide to converting programs from Standard ML '90 to Standard ML '97. (Standard ML '90 had a different version of this rule, that was more permissive — it would have allowed your program — but considered "somewhat subtle" and in some cases "unpleasant", hence its replacement in Standard ML '97.)
the following sections of The Definition of Standard ML (Revised) (PDF):
§4.7 "Non-expansive Expressions", page 21, which defines which expressions are considered "non-expansive" (and can therefore be used in polymorphic value declarations).
§4.8 "Closure", pages 21–22, which defines the operation that makes a binding polymorphic; this operation enforces the value restriction by preventing the binding from becoming polymorphic if the expression might be expansive.
inference rule (15), page 26, which uses the aforementioned operation; see also the comment on page 27.
the comment on inference rule (20), page 27, which explains why the aforementioned operation is not applied to exception declarations. (Technically this is somewhat separate from the value restriction; but the value restriction would be useless without this.)
§G.4 "Value Polymorphism", pages 105–106, which discusses this change from Standard ML '90.
OCaml arrays are mutable. For most mutable types, even an "empty" value does not have polymorphic type.
For example,
# ref None;;
- : '_a option ref = {contents = None}
# Hashtbl.create 0;;
- : ('_a, '_b) Hashtbl.t = <abstr>
However, an empty array does have a polymorphic type
# [||];;
- : 'a array = [||]
This seems like it should be impossible since arrays are mutable.
It happens to work out in this case because the length of an array can't change and thus there's no opportunity to break soundness.
Are arrays special-cased in the type system to allow this?
The answer is simple: an empty array has a polymorphic type because it is a constant. Is it special-cased? Well, sort of, mostly because an array is a built-in type that is not represented as an ADT. So yes, in typecore.ml the is_nonexpansive function has a case for the empty array:
| Texp_array [] -> true
However, this is not a special case, it is just a matter of inferring which syntactic expressions form constants.
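Here is a small OCaml sketch of the difference (the names are illustrative). The literal `[||]` is generalized. An equally empty array produced by a function call is an application, and arrays are invariant, so it would only be weakly polymorphic. It is therefore annotated here.

```ocaml
(* The literal [||] is non-expansive, so it gets the general type 'a array *)
let literal = [||]
let as_ints : int array = literal
let as_strings : string array = literal

(* Array.make 0 None is just as empty, but it is an application and array is
   invariant, so it would only be weakly polymorphic; annotate it instead *)
let made : int option array = Array.make 0 None
```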
Note that, in general, the relaxed value restriction allows generalization of expressions that are non-expansive (not just syntactic constants, as in the classical value restriction). A non-expansive expression is either an expression in normal form (i.e., a constant) or an expression whose evaluation can't have any observable side effects. In our case, [||] is a perfect constant.
The OCaml value restriction is even more relaxed than that: it also allows the generalization of some expansive expressions, provided the type variables have positive variance (that is, they occur only in covariant positions). But that is a completely different story.
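Here is a minimal sketch of that relaxed rule. `List.map` applied to `[]` is an application and hence expansive. Because `'a` appears only covariantly in `'a list`, the result is still generalized. A `ref` gets no such treatment because `ref` is invariant in its parameter.

```ocaml
(* Expansive (an application), but 'a occurs only covariantly in 'a list,
   so the relaxed value restriction still generalizes it *)
let empty_list = List.map (fun x -> x) []
let as_ints : int list = empty_list
let as_strings : string list = empty_list

(* ref is invariant, so this stays weakly polymorphic; the first use fixes it *)
let empty_ref = ref []
let () = empty_ref := [ 1; 2 ]
```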
Also, ref None is not an empty value. A ref value by itself is just a record with one mutable field, type 'a ref = {mutable contents : 'a}, so it can never be empty. The fact that it contains an immutable value (or references the immutable value, if you like) doesn't make it either empty or polymorphic. The same goes for [|None|], which is also non-empty: it is a singleton. Besides, the latter has a weakly polymorphic type.
I don't believe so. Similar situations arise with user-defined data types, and the behaviour is the same.
As an example, consider:
type 'a t = Empty | One of { mutable contents : 'a }
As with an array, an 'a t is mutable. However, the Empty constructor can be used in a polymorphic way just like an empty array:
# let n = Empty in n, n;;
- : 'a t * 'b t = (Empty, Empty)
# let o = One {contents = None};;
val o : '_weak1 option t = One {contents = None}
This works even when there is a value of type 'a present, so long as it is not in a nonvariant position:
type 'a t = NonMut of 'a | Mut of { mutable contents : 'a }
# let n = NonMut None in n, n;;
- : 'a option t * 'b option t = (NonMut None, NonMut None)
Note that the argument of 'a t is still nonvariant and you will lose polymorphism when hiding the constructor inside a function or module (roughly because variance will be inferred from arguments of the type constructor).
# (fun () -> Empty) ();;
- : '_weak1 t = Empty
Compare with the empty list:
# (fun () -> []) ();;
- : 'a list = []
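The variance point can be sketched with two hypothetical types, one immutable and one with a mutable field. Both empty values are produced by the same kind of application, but only the covariant one is generalized.

```ocaml
(* Immutable constructors only: OCaml infers 'a to be covariant *)
type 'a imm = INil | ICons of 'a * 'a imm

(* A mutable field makes 'a invariant *)
type 'a mut = MNil | MOne of { mutable contents : 'a }

(* Both right-hand sides are applications, hence expansive *)
let imm_empty = (fun () -> INil) ()  (* generalized: 'a imm *)
let mut_empty = (fun () -> MNil) ()  (* weakly polymorphic *)

let (_ : int imm) = imm_empty
let (_ : string imm) = imm_empty
let (_ : int mut) = mut_empty        (* fixes the weak variable to int *)
```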
Perhaps neither of these statements is categorically precise, but a monad is often defined as "a monoid in the category of endofunctors"; a Haskell Alternative is defined as "a monoid on applicative functors", where an applicative functor is a "strong lax monoidal functor". Now these two definitions sound pretty similar to the ignorant (me), but they work out significantly differently. The neutral element for alternative has type f a and is thus "empty", and for monad has type a -> m a and thus has the sense "non-empty"; the operation for alternative has type f a -> f a -> f a, and the operation for monad has type (a -> f b) -> (b -> f c) -> (a -> f c). It seems to me that the really important detail is in the category of endofunctors versus over endofunctors, though perhaps the "strong lax" detail in alternative is important; but that's where I get confused, because within Haskell at least, monads end up being alternatives, and I see that I do not yet have a precise categorical understanding of all the details here.
How can it be precisely expressed what the difference is between alternative and monad, such that they are both monoids relating to endofunctors, and yet the one has an "empty" neutral element and the other has a "non-empty" neutral element?
In general, a monoid is defined in a monoidal category, which is a category that defines some kind of (tensor) product of objects and a unit object.
Most importantly, the category of types is monoidal: the product of types a and b is just a type of pairs (a, b), and the unit type is ().
A monoid is then defined as an object m with two morphisms:
eta :: () -> m
mu :: (m, m) -> m
Notice that eta just picks an element of m, so it's equivalent to mempty, and curried mu becomes mappend of the usual Haskell Monoid class.
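In OCaml terms, a monoid object in the category of types can be sketched as a module signature with exactly those two morphisms. `MONOID_OBJECT` and `String_monoid` are hypothetical names.

```ocaml
(* A monoid object in the category of types, with unit () and pair product *)
module type MONOID_OBJECT = sig
  type m
  val eta : unit -> m   (* picks out the unit element, i.e. mempty *)
  val mu : m * m -> m   (* multiplication on pairs, i.e. uncurried mappend *)
end

(* Strings under concatenation form such a monoid *)
module String_monoid : MONOID_OBJECT with type m = string = struct
  type m = string
  let eta () = ""
  let mu (a, b) = a ^ b
end

let greeting = String_monoid.mu (String_monoid.eta (), "hello")
```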
So that's a category of types and functions, but there is also a separate category of endofunctors and natural transformations. It's also a monoidal category. A tensor product of two functors is defined as their composition Compose f g, and the unit is the identity functor Id. A monoid in that category is a monad. As before, we pick an object m, but now it's an endofunctor, and two morphisms, which are now natural transformations:
eta :: Id ~> m
mu :: Compose m m ~> m
In components, these two natural transformations become:
return :: a -> m a
join :: m (m a) -> m a
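The same shape for the endofunctor case can be sketched in OCaml with a hypothetical `MONAD_JOIN` signature. `return` and `join` are the components, and `bind` (for lists, the flatMap from the first question) falls out of `map` and `join`.

```ocaml
module type MONAD_JOIN = sig
  type 'a t
  val map : ('a -> 'b) -> 'a t -> 'b t  (* the functor part *)
  val return : 'a -> 'a t               (* component of eta : Id ~> m *)
  val join : 'a t t -> 'a t             (* component of mu : Compose m m ~> m *)
end

module List_monad : MONAD_JOIN with type 'a t = 'a list = struct
  type 'a t = 'a list
  let map = List.map
  let return x = [ x ]
  let join = List.concat
end

(* bind is recovered from map and join; for lists this is flatMap *)
let bind xs f = List_monad.join (List_monad.map f xs)
```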
An applicative functor may also be defined as a monoid in the functor category, but with a more sophisticated tensor product called Day convolution. Or, equivalently, it can be defined as a functor that (laxly) preserves monoidal structure.
Alternative is a family of monoids in the category of types (not endofunctors). This family is generated by an applicative functor f. For every type a we have a monoid whose mempty is an element of f a and whose mappend maps pairs of f a to elements of f a. These polymorphic functions are called empty and <|>.
In particular, empty must be a polymorphic value, meaning one value for every type a. This is, for instance, possible for the list functor, where an empty list is polymorphic in a, or for Maybe with the polymorphic value Nothing. Notice that these are all polymorphic data types that have a constructor that doesn't depend on the type parameter. The intuition is that, if you think of a functor as a container, this constructor creates an empty container. An empty container is automatically polymorphic.
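Here is a sketch of the option case in OCaml. `empty` and `<|>` are hypothetical definitions mirroring Haskell's Alternative instance for Maybe, not a standard library API. Because `None` mentions no `'a`, `empty` is a single value usable at every type.

```ocaml
(* One polymorphic value: None needs no element of type 'a *)
let empty : 'a option = None

(* Left-biased choice, analogous to <|> for Maybe *)
let ( <|> ) a b = match a with Some _ -> a | None -> b

let first = empty <|> Some 3
let kept = Some "x" <|> empty
```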
Both concepts are tied to the idea of a "monoidal category", which is a category you can define the concept of a monoid in (and certain other kinds of algebraic structures). You can think of monoidal categories as: a category defines an abstract notion of functions of one argument; a monoidal category defines an abstract notion of functions of zero arguments or multiple arguments.
A monad is a monoid in the category of endofunctors; in other words, it's a monoid where the product (a function of 2 arguments) and the identity (a function of 0 arguments) use the concept of multi-argument function defined by a particular (bizarre) monoidal category (the monoidal category of endofunctors and composition).
An applicative functor is a monoidal functor. In other words, it's a functor that preserves all the structure of a monoidal category, not just the part that makes it a category. It should be obvious that that means it has mapN functions for functions with any number of arguments, not just functions of one argument (like a normal functor has).
So a monad exists within a particular monoidal category (which happens to be a category of endofunctors), while an applicative functor maps between two monoidal categories (which happen to be the same category, hence it's a kind of endofunctor).
To supplement the other answers with some Haskell code, here is how we might represent the Day convolution monoidal structure that Bartosz Milewski refers to:
data Day f g a = forall x y. Day (x -> y -> a) (f x) (g y)
With the unit object being the functor Identity.
Then we can reformulate the applicative class as a monoid object with respect to this monoidal structure:
type f ~> g = forall x. f x -> g x
class Functor f => Applicative' f
where
dappend :: Day f f ~> f
dempty :: Identity ~> f
You might notice how this rhymes with other familiar monoid objects, such as:
class Functor f => Monad f
where
join :: Compose f f ~> f
return :: Identity ~> f
or:
class Monoid m
where
mappend :: (,) m m -> m
mempty :: () -> m
With some squinting, you might also be able to see how dappend is just a wrapped version of liftA2, and likewise dempty of pure.
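To make the liftA2 connection concrete, here is a rough OCaml transcription specialized to option. It uses a GADT to hide x and y the way `forall x y` does in the Haskell `Day` type. `day_option`, `dappend`, `dempty` and `lift_a2` are illustrative names.

```ocaml
(* Day convolution of option with itself; 'x and 'y are existential *)
type 'a day_option =
  | Day : ('x -> 'y -> 'a) * 'x option * 'y option -> 'a day_option

(* dappend : Day option option ~> option *)
let dappend : type a. a day_option -> a option = function
  | Day (f, Some x, Some y) -> Some (f x y)
  | Day (_, _, _) -> None

(* dempty : Identity ~> option, i.e. pure *)
let dempty x = Some x

(* liftA2 is dappend applied to a packed Day value *)
let lift_a2 f x y = dappend (Day (f, x, y))
```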
I'm reading Expert F# 4.0 and at some point (p.93) the following syntax is introduced for list:
type 'T list =
| ([])
| (::) of 'T * 'T list
Although I understand conceptually what's going on here, I do not understand the syntax. Apparently you can put [] or :: between parentheses and they mean something special.
Other symbols aren't allowed, for example (++) or (||). So what's going on here?
And another thing is the 'operator' nature of (::). Suppose I have the following (weird) type:
type 'T X =
| None
| Some of 'T * 'T X
| (::) of 'T * 'T X
Now I can say:
let x: X<string> = Some ("", None)
but these aren't allowed:
let x: X<string> = :: ("", None)
let x: X<string> = (::) ("", None)
So (::) is actually something completely different than Some, although both are cases in a discriminated union.
Strictly speaking, the F# spec (see section 8.5) says that union case identifiers must be alphanumeric sequences starting with an upper-case letter.
However, this way of defining list cons is an ML idiomatic thing. There would be riots in the streets if we were forced to write Cons (x, Cons(y, Cons (z, Empty))) instead of x :: y :: z :: [].
So an exception was made for just these two identifiers - ([]) and (::). You can use these, but only these two. Besides these two, only capitalized alphanumeric names are allowed.
However, you can define free-standing functions with these funny names:
let (++) a b = a * b
These functions are usually called "operators" and can be called via infix notation:
let x = 5 ++ 6 // x = 30
As opposed to regular functions that only support prefix notation - i.e. f 5 6.
There is a separate, quite intricate set of rules about which characters are allowed in operators, which operators can be only unary, which can be only binary, which can be both, and how the characters determine the resulting operator precedence. See section 4.1 of the spec for the full reference.
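For comparison, here is a rough analogue in OCaml, the ML dialect this idiom comes from, rather than F#. `My_list`, `++` and `*^` are hypothetical names. OCaml's grammar also accepts `[]` and `(::)` as constructor names in a type definition. Symbolic operators are ordinary functions whose leading character fixes their precedence.

```ocaml
(* [] and (::) as user-defined constructors; the module keeps them
   from shadowing the built-in list constructors elsewhere *)
module My_list = struct
  type 'a t = [] | (::) of 'a * 'a t

  let rec length = function
    | [] -> 0
    | _ :: rest -> 1 + length rest
end

(* Inside a local open, list syntax builds My_list.t values *)
let three = My_list.(1 :: 2 :: 3 :: [])

(* The leading character fixes precedence: ++ binds like +, *^ like * *)
let ( ++ ) a b = a * b
let ( *^ ) a b = a - b

let x = 5 ++ 6        (* infix use *)
let y = ( ++ ) 5 6    (* prefix use *)
let z = 2 ++ 10 *^ 3  (* parsed as 2 ++ (10 *^ 3) *)
```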
I was reading a little bit about the value restriction in Standard ML and tried translating the example to OCaml to see what it would do. It seems like OCaml produces these types in contexts where SML would reject a program due to the value restriction. I've also seen them in other contexts like empty hash tables that haven't been "specialized" to a particular type yet.
http://mlton.org/ValueRestriction
Here's an example of a rejected program in SML:
val r: 'a option ref = ref NONE
val r1: string option ref = r
val r2: int option ref = r
val () = r1 := SOME "foo"
val v: int = valOf (!r2)
If you enter the first line verbatim into the SML of New Jersey REPL, you get the following error:
- val r: 'a option ref = ref NONE;
stdIn:1.6-1.33 Error: explicit type variable cannot be generalized at its binding declaration: 'a
If you leave off the explicit type annotation you get
- val r = ref NONE
stdIn:1.6-1.18 Warning: type vars not generalized because of
value restriction are instantiated to dummy types (X1,X2,...)
val r = ref NONE : ?.X1 option ref
What exactly is this dummy type? It seems like it's completely inaccessible and fails to unify with anything:
- r := SOME 5;
stdIn:1.2-1.13 Error: operator and operand don't agree [overload conflict]
operator domain: ?.X1 option ref * ?.X1 option
operand: ?.X1 option ref * [int ty] option
in expression:
r := SOME 5
In OCaml, by contrast, the dummy type variable is accessible and unifies with the first thing it can.
# let r : 'a option ref = ref None;;
val r : '_a option ref = {contents = None}
# r := Some 5;;
- : unit = ()
# r ;;
- : int option ref = {contents = Some 5}
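Here is a compilable sketch of the OCaml side. The weak type variable in `r` is fixed by its first use. A later use at another type would be a type error, so that line is left commented out to keep the block compiling.

```ocaml
(* r starts out weakly polymorphic: '_weak1 option ref *)
let r = ref None

(* The first assignment fixes the weak variable to int *)
let () = r := Some 5

(* let () = r := Some "foo"   would now be rejected: string is not int *)

let current = match !r with Some n -> n | None -> 0
```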
This is sort of confusing and raises a few questions.
1) Could a conforming SML implementation choose to make the "dummy" type above accessible?
2) How does OCaml preserve soundness without the value restriction? Does it make weaker guarantees than SML does?
3) The type '_a option ref seems less polymorphic than 'a option ref. Why isn't let r : 'a option ref = ref None;; (with an explicit annotation) rejected in OCaml?
Weakly polymorphic types (the '_-style types) are a programming convenience rather than a true extension of the type system.
2) How does OCaml preserve soundness without the value restriction? Does it make weaker guarantees than SML does?
OCaml does not sacrifice the value restriction; rather, it implements a heuristic that saves you from systematically annotating the type of values like ref None whose type is only “weakly” polymorphic. This heuristic works by looking at the current “compilation unit”: if it can determine the actual type for a weakly polymorphic type, then everything works as if the initial declaration had the appropriate type annotation; otherwise the compilation unit is rejected with the message:
Error: The type of this expression, '_a option ref,
contains type variables that cannot be generalized
3) The type '_a option ref seems less polymorphic than 'a option ref. Why isn't let r : 'a option ref = ref None;; (with an explicit annotation) rejected in OCaml?
This is because '_a is not a “real” type, for instance it is forbidden to write a signature explicitly defining values of this “type”:
# module A : sig val table : '_a option ref end = struct let option = ref None end;;
Characters 27-30:
module A : sig val table : '_a option ref end = struct let option = ref None end;;
^^^
Error: The type variable name '_a is not allowed in programs
It is possible to avoid using these weakly polymorphic types by using recursive declarations to pack together the weakly polymorphic variable declaration and the later function usage that completes the type definition, e.g.:
# let rec r = ref None and set x = r := Some(x + 1);;
val r : int option ref = {contents = None}
val set : int -> unit = <fun>
1) Could a conforming SML implementation choose to make the "dummy" type above accessible?
The revised Definition (SML97) doesn't specify that there be a "dummy" type; all it formally specifies is that the val can't introduce a polymorphic type variable, since the right-hand-side expression isn't a non-expansive expression. (There are also some comments about type variables not leaking into the top level, but as Andreas Rossberg points out in his Defects in the Revised Definition of Standard ML, those comments are really about undetermined types rather than the type variables that appear in the definition's formalism, so they can't really be taken as part of the requirements.)
In practice, I think there are four approaches that implementations take:
some implementations reject the affected declarations during type-checking, and force the programmer to specify a monomorphic type.
some implementations, such as MLton, prevent generalization, but defer unification, so that the appropriate monomorphic type can become clear later in the program.
SML/NJ, as you've seen, issues a warning and instantiates a dummy type that cannot subsequently be unified with any other type.
I think I've heard that some implementation defaults to int? I'm not sure.
All of these options are presumably permitted and apparently sound, though the "defer unification" approach does require care to ensure that the type doesn't unify with an as-yet-ungenerated type name (especially a type name from inside a functor, since then the monomorphic type may correspond to different types in different applications of the functor, which would of course have the same sorts of problems as a regular polymorphic type).
2) How does OCaml preserve soundness without the value restriction? Does it make weaker guarantees than SML does?
I'm not very familiar with OCaml, but from what you write, it sounds like it uses the same approach as MLton; so, it should not have to sacrifice soundness.
(By the way, despite what you imply, OCaml does have the value restriction. There are some differences between the value restriction in OCaml and the one in SML, but none of your code-snippets relates to those differences. Your code snippets just demonstrate some differences in how the restriction is enforced in OCaml vs. one implementation of SML.)
3) The type '_a option ref seems less polymorphic than 'a option ref. Why isn't let r : 'a option ref = ref None;; (with an explicit annotation) rejected in OCaml?
Again, I'm not very familiar with OCaml, but — yeah, that seems like a mistake to me!
To answer the second part of your last question,
3) [...] Why isn't let r : 'a option ref = ref None;; (with an explicit annotation) rejected in OCaml?
That is because OCaml has a different interpretation of type variables occurring in type annotations: it interprets them as existentially quantified, not universally quantified. That is, a type annotation only has to be right for some possible instantiation of its variables, not for all. For example, even
let n : 'a = 5
is totally valid in OCaml. Arguably, this is rather misleading and not the best design choice.
To enforce polymorphism in OCaml, you have to write something like
let n : 'a. 'a = 5
which would indeed cause an error. However, this introduces a local quantifier, so it is still somewhat different from SML, and it doesn't work for examples where 'a needs to be bound elsewhere, e.g. the following:
fun pair (x : 'a) (y : 'a) = (x, y)
In OCaml, you have to rewrite this to
let pair : 'a. 'a -> 'a -> 'a * 'a = fun x y -> (x, y)
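Here is a small OCaml sketch of both points. A plain `'a` annotation is just a unification variable. A locally abstract type `(type a)` is rigid inside the body, roughly like the SML `'a` bound at `fun pair`, and the function is still generalized.

```ocaml
(* 'a here is only a unification variable, so n simply gets type int *)
let n : 'a = 5

(* a is rigid inside the body: both arguments must share it, and pair is
   still generalized to 'a -> 'a -> 'a * 'a *)
let pair (type a) (x : a) (y : a) = (x, y)

let ints = pair 1 2
let strings = pair "a" "b"
```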
I have:
module Functor(M : sig end) = struct
module NestedFunctor(M : sig end) = struct
end
end
This code is valid:
module V = Functor(struct end)
module W = V.NestedFunctor(struct end)
But this is invalid:
module M = Functor(struct end).NestedFunctor(struct end)
(* ^ Error: Syntax error *)
As I understand it, a functor is a relation between a set of input modules and a set of possible output modules. But this example confuses my understanding. Why is it necessary to bind the functor's result to a new module name before calling the functor nested inside it?
My compiler version = 4.01.0
I'm new to OCaml. When I found functors, I imagined something like
Engine.MakeRunnerFor(ObservationStation
.Observe(Matrix)
.With(Printer))
I thought it would be a good tool for human-friendly architecture notation.
Then I was disappointed. Of course, this is a syntax error and I understand that. But I think this restriction inflates the grammar and makes it less intuitive. And my "Why?" in the main question is about the design of the language itself.
While I don't believe that this restriction is strictly necessary, it is probably motivated by certain limitations in OCaml's module type system. Without going into too much technical detail, OCaml requires all intermediate module types to be expressible as syntactic signatures. But with functors, that sometimes isn't possible. For example, consider:
module Functor(X : sig end) = struct
type t = T of int
module Nested(Y : sig end) = struct let x = T 5 end
end
Given this definition, the type of the functor Functor(struct end).Nested can't be expressed in OCaml syntax. It would need to be something like
functor(Y : sig end) -> sig val x : Functor(struct end).t end (* not legal OCaml! *)
but Functor(struct end).t isn't a valid type expression in OCaml, for reasons that are rather technical (in short, allowing a type like that would make deciding what types are equal -- as necessary during type checking -- much more involved).
Naming intermediate modules often avoids this dilemma. Given
module A = Functor(struct end)
the functor A.Nested has type
functor(Y : sig end) -> sig val x : A.t end
by referring to the named intermediate result A.
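Here is the named-intermediate approach as a complete OCaml file. It uses the same definitions as above, plus a use of the result.

```ocaml
module Functor (X : sig end) = struct
  type t = T of int
  module Nested (Y : sig end) = struct let x = T 5 end
end

(* Naming the intermediate result lets Nested's signature mention A.t *)
module A = Functor (struct end)
module B = A.Nested (struct end)

let five = match B.x with A.T n -> n
```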
Using the terminology in the manual, types and the like (module types, class types, etc.) can be qualified by an extended-module-path where a qualifier can be a functor call, whereas non-types (core expressions, module expressions, classes, etc.) can only be qualified by a module-path where qualifiers have to be plain module names.
For example, you can write a type Functor(struct end).NestedFunctor(struct end).t but not an expression Functor(struct end).NestedFunctor(struct end).x or a module expression Functor(struct end).NestedFunctor(struct end).
Syntax-wise, allowing extended-module-path in expressions would be ambiguous: the expression F(M).x is parsed as the constructor F applied to the expression (M).x, where M is a constructor and the . operator is the record field access operator. This won't ever typecheck since M is obviously a variant to which the . operator can't be applied, but eliminating this at the parser would be complicated. There may be other ambiguities I'm not thinking of right now (with first-class modules?).
As far as the type checker is concerned, functor applications in type paths aren't a problem: they're allowed. However, the argument itself has to be a path: you can write Set.Make(String).t but not Set.Make(struct type t = string let compare = … end).t. Allowing structures and first-class modules in type expressions would make the type checker more complex, because of the way OCaml manages abstract types. Every time you write Set.Make(String).t, it designates the same abstract type; but if you write
module M1 = Set.Make(struct type t = string let compare = String.compare end)
module M2 = Set.Make(struct type t = string let compare = String.compare end)
then M1.t and M2.t are distinct abstract types. The technical way to formulate this is that in OCaml, functor application is applicative: applying the same functor to the same argument always returns the same abstract type. But structures are generative: writing struct … end twice produces distinct abstract types, so Set.Make(struct type t = string let compare = String.compare end).t ≠ Set.Make(struct type t = string let compare = String.compare end).t. Generative types lead to a non-reflexive equality between type expressions if you aren't careful what you allow in type expressions.
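Here is a sketch of the applicative case that the type checker does accept, assuming the standard library's `Set.Make` and `String`. Because the argument is a path, `Set.Make(String).t` can be written in a type, and it denotes the same type as the named application.

```ocaml
module String_set = Set.Make (String)

(* A type path may contain a functor application whose argument is a path;
   applicativity makes this the same type as String_set.t *)
type via_path = Set.Make(String).t

let s : via_path = String_set.of_list [ "b"; "a"; "b" ]
```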
Code generation wouldn't be impacted much, because it could desugar Functor(struct … end).field as let module TMP = struct … end in Functor(TMP).field.
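That desugaring can already be written by hand, as in this sketch. `let module` binds each anonymous structure to a local name, so an expression can reach into the nested functor without a top-level binding.

```ocaml
module Functor (X : sig end) = struct
  type t = T of int
  module Nested (Y : sig end) = struct let x = T 5 end
end

(* Hand-written form of the desugaring: name each intermediate module locally *)
let five =
  let module Tmp = Functor (struct end) in
  let module N = Tmp.Nested (struct end) in
  match N.x with Tmp.T n -> n
```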
As far as I can see, there's no deep answer. The reported error is a syntax error. I.e., the grammar of OCaml just doesn't support this notation.
One way to summarize it is that in the grammar for a module expression, the dot always appears as part of a "long module identifier", i.e., between two capitalized identifiers. I checked this just now, and that's what I saw.