In toplevel, i get the following output:
#`B
- : [> `B ] = `B
then what does `B mean ? Why do we need it ?
Sincerely!
An identifier prefixed with a backquote like `B is a constructor of a polymorphic variant type. It's similar to the constructor of an algebraic type:
type abc = A | B | C
However, you can use polymorphic variant values without declaring them, and in general they're much more flexible than the usual algebraic types. The tradeoff is that they're also quite a bit trickier to use.
One thing people use them for is as simple named values, like enum values in C. Or, more precisely, like atoms in Lisp. You can use ordinary algebraic types for this, but you need to carefully maintain your definitions of them and guard against duplication. With polymorphic variants, you don't need to do either of these. You can use them without declaring them, and the constructors aren't required to be unique (two different types can have the same constructor).
Polymorphic variant constructors can also take parameters, as algebraic constructors can. So you can also write (`B 77), a constructor with a single int parameter.
This is a pretty big topic--see the above linked section of the OCaml manual for more details.
It's a polymorphic variant. From the documentation:
Variants as presented in section 1.4 are a powerful tool to build data structures and algorithms. However they sometimes lack flexibility when used in modular programming. This is due to the fact every constructor reserves a name to be used with a unique type. One cannot use the same name in another type, or consider a value of some type to belong to some other type with more constructors.
With polymorphic variants, this original assumption is removed. That is, a variant tag does not belong to any type in particular, the type system will just check that it is an admissible value according to its use. You need not define a type before using a variant tag. A variant type will be inferred independently for each of its uses.
Related
I have read in this post that ML dialects do not allow type variables of non-ground kind. E.g. the last statement is not representable:
-- Haskell code
type Ground = Int
type FirstOrder a = Maybe a
type SecondOrder c = c Int -- ML do not allow :c
OCaml has support of higher-kinded only at the level of modules. There are some explanations (here and author's comment here) about which features of OCaml clash with higher-kinded types opportunity.
If I understood it correctly, the main problem is in the following facts:
OCaml does not follow a "freshness" restriction for type definitions: construct type can define both an alias (an the type will remain the same) and a new fresh type
type alias definition can be hidden
AFAIK, Standard ML has different constructs for type definition and aliases: type for aliases and datatype for new fresh types introduction.
Unfortunatelly, I do not know SML well enough -- is it possible to export type aliases with its definition hidden? And can someone please show me if there are any other SML features that still do not go well with an opportunity of higher-kinded types?
Probably there will be some problems with functors -- Could one be so kind to show a code example of it? I've heard several times about such cases but still have not found a complete example of it.
Yes, SML can express the equivalent of higher-kinded types through functors, and can also make them abstract. Useless example:
functor F (type 'a t) :> sig type 'a u end =
struct
type 'a u = ('a t) t
end
However, unlike OCaml, SML does not (officially) have higher-order functors, so per the standard, you can only express second-order type constructors this way.
FWIW, OCaml may use the same keyword for type aliases and generative types (type vs datatype in SML), but they are still distinguished syntactically, by their right-hand side. So that's no real difference to SML.In both languages, an abstract occurring in a signature can be implemented as either a type alias or a generative type. So the problem for type inference that Leo is alluding to exists equally in both. Haskell can get away without that problem because it does not have the same expressiveness regarding type abstraction (i.e., no "sealing" operator for modules).
C++ sometimes uses the suffix _type on type definitions (e.g. std::vector<T>::value_type),
also sometimes _t (e.g. std::size_t), or no suffix (normal classes, and also typedefs like std::string which is really std::basic_string<...>)
Are there any good conventions on when to use which name?
As #MarcoA.'s answer correctly points out, the suffix _t is largely inherited from C (and in the global namespace - reserved for POSIX).
This leaves us with "no suffix" and _type.
Notice that there is no namespace-scope name in std ending in _type*; all such names are members of classes and class templates (or, in the case of regex-related types, of a nested namespace which largely plays a role of a class). I think that's the distinction: types themselves don't use the _type suffix.
The suffix _type is only used on members which denote types, and moreover, usually when they denote a type somewhat "external" to the containing class. Compare std::vector<T>::value_type and std::vector<T>::size_type, which come from the vector's template parameters T and Allocator, respectively, against std::vector<T>::iterator, which is "intrinsic" to the vector class template.
* Not entirely true, there are a few such names (also pointed out in a comment by #jrok): common_type, underlying_type, is_literal_type, true_type, false_type. In the first three, _type is not really a suffix, it's an actual part of the name (e.g. a metafunction to give the common type or the underlying type). With true_type and false_type, it is indeed a suffix (since true and false are reserved words). I would say it's a type which represents a true/false value in the type-based metaprogramming sense.
As a C heritage the _t (that used to mean "defined via typedef") syntax has been inherited (they're also SUS/POSIX-reserved in the global namespace).
Types added in C++ and not present in the original C language (e.g. size_type) don't need to be shortened.
Keep in mind that to the best of my knowledge this is more of an observation on an established convention rather than a general rule.
Member types are called type or something_type in the C++ standard library. This is readable and descriptive, and the added verbosity is not usually a problem because users don't normally spell out those type names: most of them are used in function signatures, then auto takes care of member function return types, and in C++14 the _t type aliases take care of type trait static type members.
That leads to the second point: Free-standing, non-member types are usually called something_t: size_t, int64_t, decay_t, etc. There is certainly an element of heritage from C in there, but the convention is maintained in the continuing evolution of C++. Presumably, succinctness is still a useful quality here, since those types are expected to be spelled out in general.
Finally, all the above only applies to what I might call "generic type derivation": Given X, give me some related type X::value_type, or given an integer, give me the 64-bit variant. The convention is thus restricted to common, vocabulary-type names. The class names of your actual business logic (including std::string) presumably do not warrant such a naming pattern, and I don't think many people would like to have to mangle every type name.
If you will, the _t and _type naming conventions apply primarily to the standard library and to certain aspects of the standard library style, but you do not need to take them as some kind of general mandate.
My answer is only relevant for type names within namespaces (that aren't std).
Use no suffix usually, and _type for enums
So, here's the thing: the identifier foo_type can be interpreted as
"the identifier of the type for things which are foo's" (e.g. size_type overall_size = v1.size() + v2.size();)
"the identifier of the type for things which are kinds, or types, of foo" (e.g. employment_type my_employment_type = FIXED_TERM;)
When you have typedef'ed enums in play, I think you would tend towards the second interpretation - otherwise, what would you call your enum types?
The common aversion to using no suffix is that seeing the identifier foo is confusing: Is it a variable, a specific foo? Or is it the type for foos? ... luckily, that's not an issue when you're in a namespace: my_ns::foo is obviously a type - you can't get it wrong (assuming you don't use global variables...); so no need for a prefix there.
PS - I employ the practice of suffixing my typedef's within classes with _type (pointer_type, value_type, reference_type etc.) I know that contradicts my advice above, but I somehow feel bad breaking with tradition on this point.
Now, you could ask - what happens if you have enums within classes? Well, I try to avoid those, and place my enum inside the surrounding namespace.
I have just read about row polymorphism and how it can be used for extensible records and polymorphic variants.
However, Ocaml uses subtyping for polymorphic variants. Why? Is it more powerful than row polymorphism?
OCaml uses both row polymoprhism and subtyping for polymorphic variants (and objects, for that matter). Row polymorphism is involved for "open" object types < m1 : t1; m2 : t2; .. > (the .. being literally part of the type), or "open" variant types [> `K1 of t1 | `K2 of t2 ]. Subtyping is used to be able to cast between closed, non-polymorphic types <m1:t1; m2:t2> :> <m1:t1> or [ `K1 of t1 ] :> [ `K1 of t1 | `K2 of t2 ].
Row polymorphism allows to avoid the need for bounded quantification to express types such as "take an object that has at least the method m, and return an object of the same type": subtyping is therefore rather simple, explicit, and cannot be abstracted over. On the contrary, row polymorphism is easier to infer and will play better with the rest of the type system. It should be rarely necessary to use closed types and explicit subtyping, but that is occasionally convenient -- and in particular, keeping type closed can produce error messages that are easier to understand.
To complement Gabriel's answer, one way to think about this is that subtyping provides a weak form of both universal and existential polymorphism. When both directions of parametric polymorphism are available, then the expressiveness of subtyping is mostly subsumed (especially when there is no depth subtyping). But that's not the case in Ocaml.
Ocaml replaces the universal aspect by actual universal polymorphism, but keeps subtyping to give you a form of existential quantification that it otherwise doesn't have. That is needed to form e.g. heterogeneous collections, such as a <a: int> list in which you want to be able to store arbitrary objects that at least have an a method of the right type.
I would go even further and say that, while this is usually explained as subtyping in the Ocaml world, you could actually interpret closed rows as existentially quantified over an (unknown) tail. Coercion via :> would then be existential introduction, thereby staying more faithful to the world of parametric polymorphism that rows are built upon. (Of course, under this interpretation, # would do implicit existential elimination.) If I were to design an Ocaml-like system from scratch, I'd probably try to model it that way.
I often need to declare a type which contains a map or a list, for instance:
type my_type_1 = my_type_0 IntMap.t
type my_type_2 = my_type_0 List
Also I have seen another style of declaration which encapsulates map or list in a record, for instance:
type my_type_1 =
| Bot_1
| Nb_1 of my_type_0 IntMap.t
type my_type_2 =
| Bot_2
| Nb_2 of my_type_0 List
My question is, whether there are some cases where the second style is necessary and better than the first style?
Thank you very much!
The two types you give are not equivalent, because of the Bot constructor added in the second case. This means that the two my_type_1 do not have the same semantics. Incidentally, the construction Bot | Foo of 'a is already provided by the standard type 'a option, with constructors Some and None, so the type my_type_1 of your second sample is equivalent to a my_type_1 option in the first one.
Whether to use an option type or your own constructors names is up to you. In general, I would advise to you an option type if the semantics of your type coincides with the option idea of failure, being absent, or being undefined. Given your name Bot, I assume this is probably what you're doing, but defining your own constructor names is also ok and can be clearer in some circumstances. The matter has been discussed in depth in this blog post from ezyang.
Now, assuming your two types definition were equivalent (that is, in absence of the Bot) constructor, what's the purpose of adding an algebraic datatype layer, a new constructor, instead of using a simple type alias ? Well, it has the effect of making your type distinct from the representation type. For example, if you define type 'a stack = Stack of 'a list, 'a stack and 'a list cannot be confused for each other, and the compiler will raise an error if you do. So that can be used to enforce a (light) type separation, with the constructor acting as a type annotation:
let empty = Stack []
let length (Stack li) = List.length li
I'd say it's mostly a matter of taste, but I would recommend using an algebraic datatype instead of an alias when you want to be sure that there can be no mistake with the original type. The downside is that you have to wrap the operations of the original datatype, as I did in my length function above.
Those are not different styles, but different types: the first type declarations are an abbreviation for a specialized instance (for mytype_0) of the polymorphic List, or IntMap.
The second set of definitions present a "constructed" type, for which Bot_1 (and Bot_2) provide values. Those "alternatives" can be used, for example, to create functions of type T -> my_type_1 which return Bot_1 in a special case where the computation doesn't allow to return a list, in a similar way of what an option type permits. This is impossible with the first set of definitions (who must always provide the required list payload).
The second one isn't a "record" (which is a different thing). It creates an algebraic data type. I'm not sure how to explain it but if you've used Haskell or Standard ML you'll know. It's basically a tagged union. A my_type_1 is either a Bot_1 (which carries no data) or a Nb_1 (which carries a my_type_0 IntMap.t as data).
The first one is simply a type synonym (like a typedef in C).
What all constructs(class,struct,union) classify as types in C++? Can anyone explain the rationale behind calling and qualifying certain C++ constructs as a 'type'.
$3.9/1 - "There are two kinds of
types: fundamental types and compound
types. Types describe objects (1.8),
references (8.3.2), or functions
(8.3.5). ]"
Fundamental types are char, int, bool and so on.
Compound types are arrays, enums, classes, references, unions etc
A variable contains a value.
A type is a specification of the value. (eg, number, text, date, person, truck)
All variables must have a type, because they must hold strictly defined values.
Types can be built-in primitives (such as int), custom types (such as enums and classes), or some other things.
Other answers address the kinds of types C++ makes available, so I'll address the motivation part. Note that C++ didn't invent the notion of a data type. Quoting from the Wikipedia entry on Type system.
a type system may be defined as "a
tractable syntactic framework for
classifying phrases according to the
kinds of values they compute"
Another interesting definition from the Data Type page:
a data type (or datatype) is a
classification identifying one of
various types of data, such as
floating-point, integer, or Boolean,
stating the possible values for that
type, the operations that can be done
on that type, and the way the values
of that type are stored
Note that this last one is very close to what C++ means by "type". Perhaps obvious for the built-in (fundamental) types like bool:
possible values are true and false
operations - per definition of arithmetic operators that can accept bool as argument
the way it's stored - actually not mandated by the C++ standard, but one can guess that on some systems a type requiring only a single bit can be stored efficiently (although I think most C++ systems don't do this optimization).
For more complex, user created, types, the situation is more difficult. Consider enum types: you know exactly the range of values a variable of an enum type can get. What about struct and class? There, also, your type declaration tells the compiler what possible values the struct can have, what operations you can do on it (operator overloading and functions accepting objects of this type), and it will even infer how to store it.
Re range of values, although huge, remember it's finite. Even a struct with N 32-bit integers has a finite range of possible values, which is 2^(32N).
Cite from the book "Bjarne Stroustrup - Programming Principles and Practice Using C++", page 77, chapter 3.8:
A type defines a set of possible values and a set of operations (for an object).
An object is some memory that holds a value of a given type.
A value is a set of bits in memory interpreted according to a type.
A variable is a named object.
A declaration is a statement that gives a name to an object.
A definition is a declaration that sets aside memory for an object.
Sounds like a matter of semantics to me...A type refers to something with a construct that can be used to describe it in a away that conforms to traditional Object Oriented concepts(properties and methods). Anything that isn't called a type is probably created with a less robust construct.