Maybe and Either monads, short-circuiting, and performance - c++

Functional Programming in C++, on page 214, with reference to an expected<T,E> monad (essentially the same as Haskell's Either), reads
[...] as soon as any of the functions you're binding to returns an error, the execution will stop and return that error to the caller.
Then, in a caption just below, it reads
If you call mbind [equivalent to Haskell's >>=] on an expected that contains an error, mbind won't even invoke the transformation function; it will just forward that error to the result.
which seems to "adjust" what was written earlier. (I'm pretty sure that either LYAH or RWH underlines somewhere that there's no short-circuiting; if you remember where, please, remind me about it.)
Indeed, my understanding from Haskell is that in a chain of monadic bindings, all of the bindings happen for real; what each bind then does with the function passed to it as its second argument is up to the specific monad.
In the case of Maybe and Either, when a bind is passed a Nothing or a Left x argument, the second argument is ignored.
Still, in these specific two cases, I wonder if there's a performance penalty in doing something like this
justPlus1 = Just . (+1)
turnToNothing = const Nothing
Just 3 >>= turnToNothing >>= justPlus1
       >>= justPlus1
       >>= justPlus1
       >>= justPlus1
       >>= justPlus1
as in these cases the chain can't really do anything other than what it does, given that
Nothing >>= _ = Nothing
Left l >>= _ = Left l

Consider the following expression:
result :: Maybe Int
result = x >>= f >>= g >>= h
In that expression, of course, x :: Maybe a for some a, and each of f, g, and h is a function, with h returning Maybe Int; the intermediate types of the pipeline could be anything wrapped in a Maybe. Perhaps f :: String -> Maybe String, g :: String -> Maybe Char, h :: Char -> Maybe Int.
Let's make the associativity explicit as well:
result :: Maybe Int
result = ((x >>= f) >>= g) >>= h
To evaluate the expression, each bind (>>=) must in fact be called, but not necessarily the functions f, g, or h. Ultimately the bind to h needs to inspect its left-hand argument to decide whether it is Nothing or Just something; in order to determine that we need to call the bind to g, and to decide that we need to call the bind to f, which must at least look at x. But once any one of these binds produces Nothing, we pay only for checking Nothing at each step (very cheap), and not for calling the (possibly expensive) downstream functions.
Suppose that x = Nothing. Then the bind to f inspects that, sees Nothing, and doesn't bother to call f at all. But we still need to bind the result of that in order to know if it's Nothing or not. And this continues down the chain until finally we get result = Nothing, having called >>= three times but none of the functions f, g, or h.
Either behaves similarly with Left values, and other monads may have different behaviors. A list may call each function one time, many times, or no times; the tuple monad calls each function exactly once, having no short-circuiting or other multiplicity features.

You seem to have a misunderstanding of how the Monad instances for these types work in Haskell. You say:
Indeed, my understanding, from Haskell, is that in a chain of monadic functions, all of the functions are called,
But this demonstrably isn't the case. Indeed any time you compute
Nothing >>= f
where f is any function of type a -> Maybe b, the result is computed according to the implementation of >>= for the Maybe monad, which is:
Just x >>= f = f x
Nothing >>= f = Nothing
So f will indeed be called in the Just case, but not in the Nothing case. So we see that there is indeed "short-circuiting". In fact, since Haskell is lazy, every function "short-circuits" by default: nothing is ever computed unless it's needed to produce a result.
It's an interesting question to ask about performance, and not a question I personally know how to answer. Certainly, as I just explained, none of the remaining functions in the chain will be evaluated once a Nothing is encountered, but performing the pattern matching to see this is unlikely to be free. Perhaps the compiler is able to optimise this away, by reasoning that it can abandon the entire calculation once it hits a Nothing. But I'm not sure.

Related

SML Operator and operand don't agree in foldr

I'm working on an assignment where I have to write a function to get the length of a list. This is a trivial task, but I've come across something that I don't understand.
My simple code
val len = foldr (fn(_, y) => y + 1) 0
produces this warning
Warning: type vars not generalized because of
value restriction are instantiated to dummy types (X1,X2,...)
and when I try to run it in the REPL, I get this:
len [1, 2, 3, 4];
stdIn:18.1-18.17 Error: operator and operand don't agree [overload conflict]
operator domain: ?.X1 list
operand:         [int ty] list
in expression:
  len (1 :: 2 :: 3 :: <exp> :: <exp>)
I don't understand why this doesn't work. I do know some functional programming principles, and this should work, since it's very simple partial application.
Of course I can make it work without partial application, like this
fun len xs = foldr (fn(_, y) => y + 1) 0 xs
but I would like to understand why the first version doesn't work.
This is an instance of the value restriction rule in action:
In short, the value restriction says that generalization can only occur if the right-hand side of an expression is syntactically a value.
Syntactically,
foldr (fn(_, y) => y + 1) 0
is not a value; it's a function application, which is why it hasn't been assigned a polymorphic type. Instead, it has been instantiated with a dummy type, which has very limited use; e.g., this works:
len []
but in most cases, a len defined as a val like this is useless.
This restriction exists to guarantee type safety in the presence of variable assignment (via references). More details can be found in the linked page.
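For comparison, the same phenomenon (and the same eta-expansion fix) shows up in OCaml. Here is a rough sketch, not part of the original SML answer; the names len_weak and len are made up for illustration:
(* The right-hand side is a partial application, not a syntactic value,
   so the element type is only weakly polymorphic: '_weak1 list -> int. *)
let len_weak = List.fold_left (fun n _ -> n + 1) 0

(* The first use pins the weak variable to int; afterwards len_weak can no
   longer be applied to, say, a string list. *)
let four = len_weak [1; 2; 3; 4]

(* Eta-expansion makes the right-hand side a syntactic value again,
   so this version gets the fully polymorphic type 'a list -> int. *)
let len xs = List.fold_left (fun n _ -> n + 1) 0 xs

let () = Printf.printf "%d %d\n" four (len ["a"; "b"; "c"])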

Do OCaml 'underscore types' (e.g. '_a) introduce the possibility of runtime type errors / soundness violations?

I was reading a little bit about the value restriction in Standard ML and tried translating the example to OCaml to see what it would do. It seems like OCaml produces these types in contexts where SML would reject a program due to the value restriction. I've also seen them in other contexts like empty hash tables that haven't been "specialized" to a particular type yet.
http://mlton.org/ValueRestriction
Here's an example of a rejected program in SML:
val r: 'a option ref = ref NONE
val r1: string option ref = r
val r2: int option ref = r
val () = r1 := SOME "foo"
val v: int = valOf (!r2)
If you enter the first line verbatim into the SML of New Jersey REPL, you get the following error:
- val r: 'a option ref = ref NONE;
stdIn:1.6-1.33 Error: explicit type variable cannot be generalized at its binding declaration: 'a
If you leave off the explicit type annotation you get
- val r = ref NONE
stdIn:1.6-1.18 Warning: type vars not generalized because of
value restriction are instantiated to dummy types (X1,X2,...)
val r = ref NONE : ?.X1 option ref
What exactly is this dummy type? It seems like it's completely inaccessible and fails to unify with anything
- r := SOME 5;
stdIn:1.2-1.13 Error: operator and operand don't agree [overload conflict]
operator domain: ?.X1 option ref * ?.X1 option
operand: ?.X1 option ref * [int ty] option
in expression:
r := SOME 5
In OCaml, by contrast, the dummy type variable is accessible and unifies with the first thing it can.
# let r : 'a option ref = ref None;;
val r : '_a option ref = {contents = None}
# r := Some 5;;
- : unit = ()
# r ;;
- : int option ref = {contents = Some 5}
This is sort of confusing and raises a few questions.
1) Could a conforming SML implementation choose to make the "dummy" type above accessible?
2) How does OCaml preserve soundness without the value restriction? Does it make weaker guarantees than SML does?
3) The type '_a option ref seems less polymorphic than 'a option ref. Why isn't let r : 'a option ref = ref None;; (with an explicit annotation) rejected in OCaml?
Weakly polymorphic types (the '_-style types) are a programming convenience rather than a true extension of the type system.
2) How does OCaml preserve soundness without the value restriction? Does it make weaker guarantees than SML does?
OCaml does not sacrifice the value restriction; rather, it implements a heuristic that saves you from systematically annotating the type of values like ref None whose type is only “weakly” polymorphic. This heuristic works by looking at the current “compilation unit”: if it can determine the actual type for a weakly polymorphic type, then everything works as if the initial declaration had the appropriate type annotation; otherwise the compilation unit is rejected with the message:
Error: The type of this expression, '_a option ref,
contains type variables that cannot be generalized
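For instance, here is a minimal sketch of such a compilation unit (the file name weak_example.ml is made up; the exact wording of the weak type variable differs between OCaml versions):
(* weak_example.ml, compiled as a unit.
   With only the first binding, compiling the unit fails with the
   "cannot be generalized" error quoted above; the second binding lets the
   compiler resolve the weak variable to int, so the unit is accepted. *)
let r = ref None

let () = r := Some 5

let () =
  match !r with
  | Some n -> Printf.printf "%d\n" n
  | None -> print_endline "empty"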
3) The type '_a option ref seems less polymorphic than 'a option ref. Why isn't let r : 'a option ref = ref None;; (with an explicit annotation) rejected in OCaml?
This is because '_a is not a “real” type; for instance, it is forbidden to write a signature explicitly defining values of this “type”:
# module A : sig val table : '_a option ref end = struct let option = ref None end;;
Characters 27-30:
module A : sig val table : '_a option ref end = struct let option = ref None end;;
^^^
Error: The type variable name '_a is not allowed in programs
It is possible to avoid using these weakly polymorphic types by using recursive declarations to pack together the weakly polymorphic variable declaration and the later function usage that completes the type definition, e.g.:
# let rec r = ref None and set x = r := Some(x + 1);;
val r : int option ref = {contents = None}
val set : int -> unit = <fun>
1) Could a conforming SML implementation choose to make the "dummy" type above accessible?
The revised Definition (SML97) doesn't specify that there be a "dummy" type; all it formally specifies is that the val can't introduce a polymorphic type variable, since the right-hand-side expression isn't a non-expansive expression. (There are also some comments about type variables not leaking into the top level, but as Andreas Rossberg points out in his Defects in the Revised Definition of Standard ML, those comments are really about undetermined types rather than the type variables that appear in the definition's formalism, so they can't really be taken as part of the requirements.)
In practice, I think there are four approaches that implementations take:
some implementations reject the affected declarations during type-checking, and force the programmer to specify a monomorphic type.
some implementations, such as MLton, prevent generalization, but defer unification, so that the appropriate monomorphic type can become clear later in the program.
SML/NJ, as you've seen, issues a warning and instantiates a dummy type that cannot subsequently be unified with any other type.
I think I've heard that some implementation defaults to int? I'm not sure.
All of these options are presumably permitted and apparently sound, though the "defer unification" approach does require care to ensure that the type doesn't unify with an as-yet-ungenerated type name (especially a type name from inside a functor, since then the monomorphic type may correspond to different types in different applications of the functor, which would of course have the same sorts of problems as a regular polymorphic type).
2) How does OCaml preserve soundness without the value restriction? Does it make weaker guarantees than SML does?
I'm not very familiar with OCaml, but from what you write, it sounds like it uses the same approach as MLton; so, it should not have to sacrifice soundness.
(By the way, despite what you imply, OCaml does have the value restriction. There are some differences between the value restriction in OCaml and the one in SML, but none of your code-snippets relates to those differences. Your code snippets just demonstrate some differences in how the restriction is enforced in OCaml vs. one implementation of SML.)
3) The type '_a option ref seems less polymorphic than 'a option ref. Why isn't let r : 'a option ref = ref None;; (with an explicit annotation) rejected in OCaml?
Again, I'm not very familiar with OCaml, but — yeah, that seems like a mistake to me!
To answer the second part of your last question,
3) [...] Why isn't let r : 'a option ref = ref None;; (with an explicit annotation) rejected in OCaml?
That is because OCaml has a different interpretation of type variables occurring in type annotations: it interprets them as existentially quantified, not universally quantified. That is, a type annotation only has to be right for some possible instantiation of its variables, not for all. For example, even
let n : 'a = 5
is totally valid in OCaml. Arguably, this is rather misleading and not the best design choice.
To enforce polymorphism in OCaml, you have to write something like
let n : 'a. 'a = 5
which would indeed cause an error. However, this introduces a local quantifier, so it is still somewhat different from SML, and it doesn't work for examples where 'a needs to be bound elsewhere, e.g. the following:
fun pair (x : 'a) (y : 'a) = (x, y)
In OCaml, you have to rewrite this to
let pair : 'a. 'a -> 'a -> 'a * 'a = fun x y -> (x, y)

Does compare work for all types?

Let's consider a type t and two variables x,y of type t.
Will the call compare x y be valid for any type t? I couldn't find any counterexample.
The polymorphic compare function works by recursively exploring the structure of values, providing an ad-hoc total ordering on OCaml values; the same structural traversal underlies the structural equality tested by the polymorphic = operator.
It is, by design, not defined on functions and closures, as observed by antron. The recursive nature of the definition implies that structural equality is not defined on values containing a function or a closure. This recursive nature also implies that the compare function is not defined on recursive values, as antron mentions as well.
Structural equality, and therefore the compare function and the comparison operators, is not aware of structure invariants and cannot be used to compare (mildly) advanced data structures such as Sets, Maps, Hashtbls and so on. If comparison of these structures is desired, a specialised function has to be written; this is why Set and Map define such a function.
When defining your own structures, a good rule of thumb is to distinguish between
concrete types, which are defined only in terms of primitive types and other concrete types. Concrete types should not be used for structures whose processing expects some invariants, because it is easy to create arbitrary values of this type breaking these invariants. For these types, the polymorphic comparison function and operators are appropriate.
abstract types, whose concrete definition is hidden. For these types, it is best to provide a specialised comparison function (a sketch follows below). The mixture library defines a compare mixin that can be used to derive comparison operators from the implementation of a specialised compare function. Its use is illustrated in the README.
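As a small illustration of that rule of thumb (a sketch with made-up names, not taken from the mixture library): an abstract type whose raw representation does not uniquely determine its meaning needs its own compare, because the polymorphic one only sees the structure.
(* A tiny "set of ints" kept as an unsorted, possibly duplicated list.
   Two different lists can represent the same set, so polymorphic
   comparison of the raw representation gives the wrong answer; the
   module therefore exposes its own compare based on a canonical form. *)
module IntSet : sig
  type t
  val of_list : int list -> t
  val compare : t -> t -> int
end = struct
  type t = int list
  let of_list l = l
  let canon l = List.sort_uniq Stdlib.compare l
  let compare a b = Stdlib.compare (canon a) (canon b)
end

let () =
  let a = IntSet.of_list [1; 2; 3]
  and b = IntSet.of_list [3; 2; 1; 1] in
  assert (IntSet.compare a b = 0);   (* same set *)
  assert (Stdlib.compare a b <> 0)   (* but different raw structure *)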
It doesn't work for function types:
# compare (fun x -> x) (fun x -> x);;
Exception: Invalid_argument "equal: functional value".
Likewise, it won't (generally) work for other types whose values can contain functions:
# type t = A | B of (int -> int);;
type t = A | B of (int -> int)
# compare A A;;
- : int = 0
# compare (B (fun x -> x)) A;;
- : int = 1
# compare (B (fun x -> x)) (B (fun x -> x));;
Exception: Invalid_argument "equal: functional value".
It also doesn't (generally) work for recursive values:
# type t = {self : t};;
type t = { self : t; }
# let rec v = {self = v};;
val v : t = {self = <cycle>}
# let rec v' = {self = v'};;
val v' : t = {self = <cycle>}
# compare v v;;
- : int = 0
# compare v v';;
(* Does not terminate. *)
These cases are also listed in the documentation for compare in Pervasives.

Multiplying int64 types in OCaml

I am attempting to multiply two int64 values in OCaml and I am not sure what I might be doing wrong
let prime = Int64.of_string("0x100000002b2") in
let temp = ref prime in
hash := Int64.mul(!temp,prime);
I get the error
Error: This expression has type 'a * 'b
but an expression was expected of type int64
Any suggestions on how I can fix this ?
Update:
I got the reference to this method from here
I am curious what this means
val mul : int64 -> int64 -> int64
Multiplication.
How do we know how many parameters this method takes?
Function parameters in OCaml are (in the usual idiom) placed after the name of the function, with no parentheses and no comma.
# Int64.mul 8L 9L;;
- : int64 = 72L
Commas are used to create tuples, but Int64.mul doesn't accept a tuple. It accepts two separate arguments as above. (In FP parlance, it's a curried function.)
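For example (a small illustration, not from the question), supplying only the first argument of a curried function yields another function:
(* Partial application: Int64.mul applied to one argument is itself a
   function of type int64 -> int64. *)
let double : int64 -> int64 = Int64.mul 2L

let () = Printf.printf "%Ld\n" (double 21L)   (* prints 42 *)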
(It might be worth working through a short tutorial on OCaml. You seem to be assuming it's like traditional C family languages, but it's rather different.)
Update
The type x -> y is the type of a function that accepts a parameter of type x and returns a value of type y. The type x -> y -> z is the type of a (curried) function that takes two parameters of types x and y and returns a value of type z. (This is a somewhat simplified way of looking at things, but is close enough to get started with.)
So the function mul that you cite takes two parameters of type int64 and returns a value of type int64.
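Applied to the snippet from the question, the call is written with two separate arguments. Here is a minimal sketch; since the real definition of hash isn't shown, it is simply defined locally as an int64 ref for illustration:
let hash = ref 0L   (* stand-in for the hash reference from the question *)

let () =
  let prime = Int64.of_string "0x100000002b2" in
  let temp = ref prime in
  hash := Int64.mul !temp prime;    (* curried call: no parentheses, no comma *)
  Printf.printf "%Ld\n" !hash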
(I repeat my advice about an OCaml tutorial. It's really worth learning about the OCaml type system before getting too deep into coding.)

OCaml type of the plus operator

Why is the type of a plus ( + ) considered to be int -> int -> int as opposed to (int * int) -> int? To me, the second makes sense because it "accepts" a 2-tuple (the addends) and returns a single int (their sum).
Thank you!
You can make a language where (+) has the type (int * int) -> int. In fact, SML works exactly this way. It just affects the meaning of infix operators. However OCaml conventions strongly favor the use of curried functions (of the type a -> b -> c) rather than uncurried ones. One nice result is that you can partially apply them. For example ((+) 7) is a meaningful expression of type int -> int. I find this notation useful quite often.
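For instance (a small sketch; add7 is a made-up name):
(* ((+) 7) is an ordinary value of type int -> int, so it can be named,
   passed to higher-order functions, and so on. *)
let add7 = (+) 7

let () =
  Printf.printf "%d\n" (add7 35);                               (* 42 *)
  List.iter (Printf.printf "%d ") (List.map ((+) 7) [1; 2; 3]); (* 8 9 10 *)
  print_newline ()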
This might seem a little unhelpful, but it's because the function takes two arguments.
When a function takes a tuple, it is in effect taking a single argument.
Because (+) is an infix operator, taking a single tuple argument would not be as convenient: the application would look like (+) (1, 2) as opposed to 1 + 2.
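To make the contrast concrete, here is a sketch (not standard library code) of the two shapes side by side:
(* A tupled addition takes one argument (a pair) and cannot be partially
   applied to a single addend; the curried form can. *)
let add_pair (x, y) = x + y      (* (int * int) -> int *)
let add x y = x + y              (*  int -> int -> int *)

let () =
  Printf.printf "%d %d %d\n"
    (add_pair (1, 2))            (* applied to a single tuple *)
    (add 1 2)                    (* applied to two arguments *)
    ((add 40) 2)                 (* partial application, then the rest *)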