Which style is better to declare a type in OCaml?

I often need to declare a type which contains a map or a list, for instance:
type my_type_1 = my_type_0 IntMap.t
type my_type_2 = my_type_0 list
Also I have seen another style of declaration which encapsulates map or list in a record, for instance:
type my_type_1 =
| Bot_1
| Nb_1 of my_type_0 IntMap.t
type my_type_2 =
| Bot_2
| Nb_2 of my_type_0 list
My question is whether there are cases where the second style is necessary, or better than the first style.
Thank you very much!

The two types you give are not equivalent, because of the Bot constructor added in the second case. This means that the two my_type_1 do not have the same semantics. Incidentally, the construction Bot | Foo of 'a is already provided by the standard type 'a option, with constructors Some and None, so the type my_type_1 of your second sample is equivalent to a my_type_1 option in the first one.
Whether to use an option type or your own constructor names is up to you. In general, I would advise you to use an option type if the semantics of your type coincides with the option idea of failure, being absent, or being undefined. Given your name Bot, I assume this is probably what you're doing, but defining your own constructor names is also ok and can be clearer in some circumstances. The matter has been discussed in depth in this blog post from ezyang.
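To make the equivalence with option concrete, here is a minimal sketch. It assumes IntMap is Map.Make (Int) and uses string as a stand-in for my_type_0; to_option and of_option are hypothetical helper names, not something from the question.

```ocaml
(* Assumption: IntMap is an integer-keyed map from the standard library. *)
module IntMap = Map.Make (Int)

(* Placeholder for the asker's my_type_0. *)
type my_type_0 = string

type my_type_1 =
  | Bot_1
  | Nb_1 of my_type_0 IntMap.t

(* Bot_1 plays the role of None, Nb_1 the role of Some. *)
let to_option (v : my_type_1) : my_type_0 IntMap.t option =
  match v with
  | Bot_1 -> None
  | Nb_1 m -> Some m

let of_option (o : my_type_0 IntMap.t option) : my_type_1 =
  match o with
  | None -> Bot_1
  | Some m -> Nb_1 m
```

Since the two conversions are inverse to each other, the custom type carries exactly the same information as my_type_0 IntMap.t option; only the constructor names differ.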
Now, assuming your two type definitions were equivalent (that is, in the absence of the Bot constructor), what's the purpose of adding an algebraic datatype layer, a new constructor, instead of using a simple type alias? Well, it has the effect of making your type distinct from the representation type. For example, if you define type 'a stack = Stack of 'a list, 'a stack and 'a list cannot be confused for each other, and the compiler will raise an error if you do. So that can be used to enforce a (light) type separation, with the constructor acting as a type annotation:
let empty = Stack []
let length (Stack li) = List.length li
I'd say it's mostly a matter of taste, but I would recommend using an algebraic datatype instead of an alias when you want to be sure that there can be no mistake with the original type. The downside is that you have to wrap the operations of the original datatype, as I did in my length function above.
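A slightly fuller version of the stack example, as a sketch: the type definition is the one given in the text, while push, s and n are names added here for illustration.

```ocaml
(* A one-constructor wrapper: distinct from 'a list for the type checker. *)
type 'a stack = Stack of 'a list

let empty = Stack []

(* Every list operation has to be re-exposed by unwrapping and rewrapping. *)
let push x (Stack li) = Stack (x :: li)
let length (Stack li) = List.length li

let s = push 1 (push 2 empty)
let n = length s

(* List.length s would be rejected by the compiler:
   an int stack is not an int list. *)
```

The wrapping in push and length is the cost mentioned above; in exchange, a plain list can never be passed where a stack is expected.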

Those are not different styles, but different types: the first declarations are abbreviations for instances of the polymorphic list and IntMap.t types, specialized to my_type_0.
The second set of definitions presents a "constructed" type, for which Bot_1 (and Bot_2) provide values. Those "alternatives" can be used, for example, to write functions of type T -> my_type_1 which return Bot_1 in a special case where the computation cannot produce a list, much as an option type permits. This is impossible with the first set of definitions (which must always provide the required list payload).

The second one isn't a "record" (which is a different thing). It creates an algebraic data type. I'm not sure how to explain it but if you've used Haskell or Standard ML you'll know. It's basically a tagged union. A my_type_1 is either a Bot_1 (which carries no data) or a Nb_1 (which carries a my_type_0 IntMap.t as data).
The first one is simply a type synonym (like a typedef in C).

Related

OCaml - Why is Either not a Monad

I'm new to OCaml, but have worked with Rust, Haskell, etc, and was very surprised when I was trying to implement bind on Either, and it doesn't appear that any of the general implementations have bind implemented.
JaneStreet's Base is missing it
What I assume is the standard library is missing it
bind was the first function I reached for... even before match, and the implementation seems quite easy:
let bind_either (m: ('e, 'a) Either.t) (f: 'a -> ('e, 'b) Either.t): ('e, 'b) Either.t =
  match m with
  | Right r -> f r
  | Left l -> Left l
Am I missing something?
It is because we prefer the more specific Result.t, which has clear names for the ok state and for the exceptional state. In general, Either.t is not very popular among OCaml programmers, since a more specialized type can usually be used, with variant names that better communicate the domain-specific purpose of each branch. It is also worth mentioning that Either was added to the OCaml standard library only recently, in version 4.12, so it might become more popular.
As mentioned by @ivg, Either is relatively new to the standard library, and generally one would prefer to use types that make more sense, for example Result for error handling.
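For comparison, this is roughly what the Result-based style looks like. It is only a sketch: parse_int, safe_div and div_strings are hypothetical functions made up for the example, while Result.bind and binding operators are from the standard language and library (OCaml 4.08 or later).

```ocaml
(* Hypothetical helpers returning a result instead of raising. *)
let parse_int (s : string) : (int, string) result =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error ("not an integer: " ^ s)

let safe_div (a : int) (b : int) : (int, string) result =
  if b = 0 then Error "division by zero" else Ok (a / b)

(* A binding operator defined locally on top of Result.bind. *)
let ( let* ) = Result.bind

(* Each step short-circuits on the first Error. *)
let div_strings (x : string) (y : string) : (int, string) result =
  let* a = parse_int x in
  let* b = parse_int y in
  safe_div a b
```

Here Ok and Error say directly which branch is the failure, which is the naming advantage over Left and Right.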
There is also another point of view, which also applies to Result. Monads act on types parameterised by one type.
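In Haskell, this is much less obvious because it is possible to partially apply type constructors. The monad there is the partially applied Either e, with (>>=) :: Either e a -> (a -> Either e b) -> Either e b, so bind takes you from Either e a to Either e b while the left-hand type e stays fixed.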
In trying to generalise the behaviour of a monad via parameterised modules (functors in the ML sense of the term), one would have to "trick" oneself into standardising, for example, the treatment of option (a type of arity 1) and either (or result) which are of arity 2.
There are several approaches. One is to express multiple interfaces to describe a monad: for example, describing Monad2 and then describing Monad in terms of Monad2, as is done in the Base library (https://ocaml.janestreet.com/ocaml-core/latest/doc/base/Base/Monad/index.html)
In Preface we used a rather different (and perhaps less generic) approach. We leave it to the user to set the left parameter of Either (via a functor) (and the right parameter for Result): https://github.com/xvw/preface/blob/master/lib/preface_stdlib/either.mli
However, we do not lose the ability to change the left-hand type of the calculation because Either also has a Bifunctor module that allows us to change the type of both parameters. The conversation is broadly described in this thread: https://discuss.ocaml.org/t/instance-modules-for-more-parametrized-types/5356/2
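The idea of fixing the left parameter through a functor can be sketched as follows. This is not Preface's or Base's actual API: TYPE, MONAD, Either_monad and String_either are hypothetical names chosen for the illustration, and Stdlib's Either requires OCaml 4.12 or later.

```ocaml
(* The type that will be fixed on the left. *)
module type TYPE = sig
  type t
end

(* A monad interface for type constructors of arity 1. *)
module type MONAD = sig
  type 'a t
  val return : 'a -> 'a t
  val bind : 'a t -> ('a -> 'b t) -> 'b t
end

(* Fixing the left parameter turns the arity-2 Either.t into an arity-1 type. *)
module Either_monad (L : TYPE) : MONAD with type 'a t = (L.t, 'a) Either.t =
struct
  type 'a t = (L.t, 'a) Either.t

  let return x = Either.Right x

  let bind m f =
    match m with
    | Either.Left l -> Either.Left l
    | Either.Right r -> f r
end

module String_either = Either_monad (struct type t = string end)

let example =
  String_either.bind (String_either.return 20) (fun x -> Either.Right (x + 1))
```

Within String_either the left type is always string; changing it would need a separate operation such as a map on the left component, which is the role the Bifunctor module plays in the answer above.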

Is it possible to support higher-kinded types in Standard ML?

I have read in this post that ML dialects do not allow type variables of non-ground kind. E.g. the last statement is not representable:
-- Haskell code
type Ground = Int
type FirstOrder a = Maybe a
type SecondOrder c = c Int -- ML do not allow :c
OCaml supports higher-kinded types only at the level of modules. There are some explanations (here and author's comment here) about which features of OCaml clash with higher-kinded types.
If I understood it correctly, the main problem is in the following facts:
OCaml does not follow a "freshness" restriction for type definitions: the type construct can define both an alias (and the type will remain the same) and a new fresh type
type alias definition can be hidden
AFAIK, Standard ML has different constructs for type definition and aliases: type for aliases and datatype for new fresh types introduction.
Unfortunately, I do not know SML well enough -- is it possible to export a type alias with its definition hidden? And can someone please show me whether there are any other SML features that still do not go well with higher-kinded types?
Probably there will be some problems with functors -- Could one be so kind to show a code example of it? I've heard several times about such cases but still have not found a complete example of it.
Yes, SML can express the equivalent of higher-kinded types through functors, and can also make them abstract. Useless example:
functor F (type 'a t) :> sig type 'a u end =
struct
  type 'a u = ('a t) t
end
However, unlike OCaml, SML does not (officially) have higher-order functors, so per the standard, you can only express second-order type constructors this way.
FWIW, OCaml may use the same keyword for type aliases and generative types (where SML has type vs datatype), but the two are still distinguished syntactically by their right-hand side, so there is no real difference from SML. In both languages, an abstract type occurring in a signature can be implemented as either a type alias or a generative type. So the problem for type inference that Leo is alluding to exists equally in both. Haskell can get away without that problem because it does not have the same expressiveness regarding type abstraction (i.e., no "sealing" operator for modules).
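On the OCaml side, the module-level encoding of the Haskell SecondOrder example from the question might look like the sketch below. CONTAINER, Second_order and the two instance modules are hypothetical names introduced for the illustration.

```ocaml
(* A signature abstracting over a type constructor of arity 1. *)
module type CONTAINER = sig
  type 'a t
  val map : ('a -> 'b) -> 'a t -> 'b t
end

(* The functor parameter C stands in for the higher-kinded variable c
   in Haskell's `type SecondOrder c = c Int`. *)
module Second_order (C : CONTAINER) = struct
  type t = int C.t

  let incr_all (xs : t) : t = C.map (fun x -> x + 1) xs
end

module List_container = struct
  type 'a t = 'a list
  let map = List.map
end

module Option_container = struct
  type 'a t = 'a option
  let map = Option.map
end

module On_list = Second_order (List_container)
module On_option = Second_order (Option_container)
```

The abstraction over the type constructor happens at functor application rather than through a type variable, which is what "only at the level of modules" amounts to in practice.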

Prevent SML type from becoming eqtype without hiding constructors

In datatype declarations, Standard ML will produce an equality type if all of the type arguments to all of the variants are themselves eqtypes.
I've seen comments in a few places lamenting the inability of users to provide their own definition of equality and construct their own eqtypes and unexpected consequences of the SML rules (e.g. bare refs and arrays are eqtypes, but datatype Foo = Foo of (real ref) is not an eqtype).
Source: http://mlton.org/PolymorphicEquality
one might expect to be able to compare two values of type real t, because pointer comparison on a ref cell would suffice. Unfortunately, the type system can only express that a user-defined datatype admits equality or not.
I'm wondering whether it is possible to block eqtyping. Say, for instance, I am implementing a set as a binary tree (with an unnecessary variant) and I want to pledge away the ability to structurally compare sets with each other.
datatype 'a set = EmptySet | SetLeaf of 'a | SetNode of 'a * 'a set * 'a set;
Say I don't want people to be able to distinguish SetLeaf(5) and SetNode(5, EmptySet, EmptySet) with = since it's an abstraction-breaking operation.
I tried a simple example with datatype on = On | Off just to see if I could demote the type to a non-eqtype using signatures.
(* attempt to hide the "eq"-ness of eqtype *)
signature S = sig
  type on
  val foo : on
end
(* opaque ascription to kill eqtypeness *)
structure X :> S = struct
  datatype on = On | Off
  val foo = On
end
It seems that transparent ascription fails to prevent X.on from becoming an eqtype, but opaque ascription does prevent it. However, these solutions are not ideal because they introduce a new module and hide the data constructors. Is there a way to prevent a custom type or type constructor from becoming an eqtype or admitting equality without hiding its data constructors or introducing new modules?
Short answer is no. When a type's definition is visible, its eq-ness is whatever the definition implies. The only way to prevent it from being an eqtype is then to tweak the definition so that it isn't one, for example by adding a dummy constructor with a real parameter.
Btw, small correction: your type Foo should be an equality type. If your SML implementation disagrees then it has a bug. A different case is real bar where datatype 'a bar = Bar of 'a ref (which is what the MLton manual discusses). The reason that the first one works but the second doesn't is that ref is magic in SML: it has a form of polymorphic eq-ness that user types cannot have.

Cyclic type definition in OCaml

Obviously the following type definition is cyclic:
type node = int * node;;
Error: The type abbreviation node is cyclic
My question is: how come the following one is not cyclic?
type tree = Node of int * tree;;
The second definition also refers to itself.
One way to look at it is that node is an abbreviation for a type, not a new type itself. So the compiler (or anybody who's interested) has to look inside to see what it's an abbreviation for. Once you look inside you start noticing things that make it difficult to analyze (e.g., that it's a recursive type and hence can require many unfoldings).
On the other hand, tree is a new type that's characterized by its constructors. (In this case, just the one constructor Node). So the compiler (or other interested party) doesn't need to look inside at all to determine what the type is. Once you see Node the type is determined. Even if you do look inside, you only need to look down one level. This allows recursion without causing any difficulties in analysis.
As a practical matter, recursive types of the first sort are often unintentional, and they lead to strange typings. The second sort are virtually impossible to create by mistake because of the little signposts (constructors) all along the way; in fact they're kind of like the lifeblood of the type system.
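A small sketch of how the constructor-based definition behaves in practice. The tree type is the one from the question; ones, take, int_list and sum are names added here for illustration.

```ocaml
type tree = Node of int * tree

(* With a single constructor and no base case, a value of this type
   has to be cyclic, which `let rec` allows. *)
let rec ones = Node (1, ones)

(* Unfold the first n labels, looking one Node deep at each step. *)
let rec take n (Node (x, rest)) =
  if n <= 0 then [] else x :: take (n - 1) rest

(* A more typical recursive type adds a base case. *)
type int_list = Nil | Cons of int * int_list

let rec sum = function
  | Nil -> 0
  | Cons (x, rest) -> x + sum rest
```

Each pattern match peels off exactly one constructor, which is the "look down one level" property described above.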

What does `B mean?

In the toplevel, I get the following output:
# `B;;
- : [> `B ] = `B
Then what does `B mean? Why do we need it?
Sincerely!
An identifier prefixed with a backquote like `B is a constructor of a polymorphic variant type. It's similar to the constructor of an algebraic type:
type abc = A | B | C
However, you can use polymorphic variant values without declaring them, and in general they're much more flexible than the usual algebraic types. The tradeoff is that they're also quite a bit trickier to use.
One thing people use them for is as simple named values, like enum values in C. Or, more precisely, like atoms in Lisp. You can use ordinary algebraic types for this, but you need to carefully maintain your definitions of them and guard against duplication. With polymorphic variants, you don't need to do either of these. You can use them without declaring them, and the constructors aren't required to be unique (two different types can have the same constructor).
Polymorphic variant constructors can also take parameters, as algebraic constructors can. So you can also write (`B 77), a constructor with a single int parameter.
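A short sketch of these points; the tags and function names below are made up for the example.

```ocaml
(* No type declaration is needed before using the tags. *)
let to_string = function
  | `Red -> "red"
  | `Green -> "green"
  | `B n -> "B " ^ string_of_int n  (* a tag carrying an int *)

(* The tag `Red is reused here in an unrelated set of tags. *)
let is_red = function
  | `Red -> true
  | `Blue -> false
```

The compiler infers a separate variant type for each function from the tags it handles, so `Red can be passed to both even though no shared type was ever declared.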
This is a pretty big topic--see the above linked section of the OCaml manual for more details.
It's a polymorphic variant. From the documentation:
Variants as presented in section 1.4 are a powerful tool to build data structures and algorithms. However they sometimes lack flexibility when used in modular programming. This is due to the fact every constructor reserves a name to be used with a unique type. One cannot use the same name in another type, or consider a value of some type to belong to some other type with more constructors.
With polymorphic variants, this original assumption is removed. That is, a variant tag does not belong to any type in particular, the type system will just check that it is an admissible value according to its use. You need not define a type before using a variant tag. A variant type will be inferred independently for each of its uses.