Wrapping side-effects in pure programming languages - monads

I'm researching possible ways to have computational effects in a pure programming language.
Monads are commonly presented as a way to wrap side effects in pure languages. I don't see how they help though. The problem I see is that monads can be copied or discarded.
In a pure language, the result of an operation should depend only on its arguments. But in a hypothetical language with
putStrLn :: String -> IO ()
the code
let a = putStrLn "1"
let b = putStrLn "2"
in ()
will get type Unit and fail to capture the presence of side effects.
Later, I can also write
let a1 = do
_ <- a
_ <- putStrLn "2.1"
let a2 = do
_ <- a
_ <- putStrLn "2.2"
in ()
This presumably forks the (not necessarily latest) state a into two paths, a1 and a2. Or are all effects delayed until the one selected IO () term is returned from the main function? If so, I wonder how deliberately copyable monads that hold mutable state are compiled: when a monad has indeed diverged, at which point and how was the state copied?
In contrast, a unique type seems able to naturally capture that only one context exists:
putStrLn :: 1 IO () -> String -> 1 IO ()
main :: 1 IO () -> 1 IO ()
main io =
let a = putStrLn io "1"
let b = putStrLn a "2"
// compile error, a has type 0 IO ()
// let a1 = putStrLn a "2.1"
in b
Which seems to capture that the state is irreversibly modified and cannot be accessed again. Thus all results of function calls indeed depend only on their arguments.
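For comparison, here is a minimal sketch of how this plays out in Haskell, where an IO value is only a description of an effect. The helper logLine and the use of an IORef as a log are assumptions made for illustration, standing in for putStrLn so that the order of effects can be inspected. Binding a name to an action runs nothing, and using the same action in two places re-runs the description rather than copying any state.

```haskell
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)

-- Hypothetical stand-in for putStrLn: appends a line to a log.
logLine :: IORef [String] -> String -> IO ()
logLine ref s = modifyIORef' ref (++ [s])

demo :: IO [String]
demo = do
  ref <- newIORef []
  let a  = logLine ref "1"            -- only a description; nothing runs yet
      _b = logLine ref "2"            -- never sequenced, so it never runs
      a1 = a >> logLine ref "2.1"     -- reuses the description a
      a2 = a >> logLine ref "2.2"     -- reuses it again
  a1                                  -- effects happen here, in this order
  a2
  readIORef ref

main :: IO ()
main = demo >>= mapM_ putStrLn
-- prints 1, 2.1, 1, 2.2 (one per line)
```

Only the actions that end up sequenced into main are executed, which is one way to read "all effects are delayed until the selected IO () term is returned".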

Are ref's safe to use without a mutex in a parallel environment

I am writing a fairly asynchronous program using the Thread library. I'll have a lot of different threads that need to access a string list ref. Is this safe without using a Mutex? I learned earlier today that Hashtbl requires a Mutex, but I couldn't find any information on ref.
In brief, if you have concurrent writes to a mutable shared resource, you should protect them with a mutex (or atomics).
In more detail, there are three important points to keep in mind.
First, OCaml threads are not parallel. In OCaml 4 and before, the OCaml runtime uses a runtime lock to ensure that only one OCaml thread executes at any point in time. In OCaml 5, the unit of parallelism is the domain, and to preserve backward compatibility, each domain uses a domain lock to ensure that only one OCaml thread executes per domain.
Second, OCaml is always memory safe. Even in the presence of race conditions, memory accesses are guaranteed to be well-typed. For references, this means that all values that you read from the reference will be values that were written to the reference, and not some strange amalgam of your program states.
However, without synchronization, concurrent programs are not guaranteed to behave the same way as an equivalent sequential program.
For instance, the following program can reach the assert false clause
let global = ref []
let sum = ref 0
let incr () =
  let state = !global in
  let a = Array.init 1_000 Fun.id in
  let updated = a :: state in
  global := updated
let decr () =
  match !global with
  | [] -> assert false
  | _ :: q ->
    global := q
let balanced () =
  let incrs = List.init 100 (fun _ -> Thread.create incr ()) in
  let () = List.iter Thread.join incrs in
  let decrs = List.init 100 (fun _ -> Thread.create decr ()) in
  List.iter Thread.join decrs
let () =
  while true do balanced () done
even if all calls to incr and decr are well balanced. The reason is that
the read and the write to the shared global reference in incr and decr are not guaranteed to happen atomically. Thus it is possible that two
calls to incr are interleaved in this way:
(* first call to incr *)            | (* second call to incr *)
let state = !global in              |
let a = Array.init 1_000 Fun.id in  |
                                    | let state = !global in
let updated = a :: state in         |
global := updated                   | let a = Array.init 1_000 Fun.id in
                                    | let updated = a :: state in
                                    | global := updated
which means that the second call to incr erases the update done by the first one, and after two calls to incr we end up with only one new element in the global list.
Consequently, synchronization primitives are a necessity as soon as you may have concurrent writes to the same mutable shared resource.
Third, in OCaml 5 (that is, with parallelism), references cannot be used as synchronization primitives. This is the difference between references and atomics. In particular, if you have
module M: sig
  val update: unit -> unit
  val read: unit -> int option
end = struct
  let written = ref false
  let answer = ref 0
  let update () =
    answer := 1;
    written := true
  let read () =
    if !written then Some !answer else None
end
then on some CPU architectures it might happen that read () returns Some 0, because there is no guarantee that the write to answer is seen before the write to written.
If your threads are accessing and modifying mutable state (Hashtbl values count as mutable state as do string list ref values) then yes, you should use Mutex.

How to make a pattern of ticktack game in Haskell?

Implement the function ticktack, which takes 2 arguments. The first argument is a tuple of natural numbers that defines the number of rows and columns of the play field. The second argument is a list recording a match of a tic-tac-toe game, given as the coordinates on which player 'x' and player 'o' played in turns. Print the current state of the game so that the play field is bordered by the characters '-' and '|', empty squares are ' ', and the characters 'x' and 'o' are on the squares where the players have played.
ticktack::(Int,Int) -> [(Int,Int)] -> Result
I already tried something like this:
type Result = [String]
pp :: Result -> IO ()
pp x = putStr (concat (map (++"\n") x))
ticktack::(Int,Int) -> [(Int,Int)] -> Result
ticktack (0,0) (x:xs) = []
ticktack (a,b) [] = []
ticktack (a,b) (x:xs) =[ [if a == fst x && b == snd x then 'x' else ' ' | b <- [1..b]]| a <- [1..a]] ++ ticktack (a,b) xs
But it only gives me N boards, each with a single move on it, so I need to merge these results into one [String].
As @luqui says in a comment on the question:
You could either merge the outputs ... or you could search the history once for each space in
the board. ...
The former solution is described in a nearby question. The "chess" problem having been
solved there is only superficially distinct from your "noughts & crosses" problem, so it should
not be too hard to adapt the solution. However:
In that case, the board size is fixed and small, so we were not worried about the inefficiency
of merging the boards pairwise.
In this case, the board size is variable, so a solution by the latter method may be worth a try.
To make the algorithm even more efficient, instead of scrolling across the board and searching for
matching moves at every cell, we will scroll across the moves and assign values to a board
represented as a mutable array. Mutable arrays may be considered an "advanced technique" in
functional programming, so it could also be a good exercise for an intermediate Haskeller. I only
used them once or twice before, so let us see if I can figure this out!
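Before moving on to arrays, the simpler "search the history once for each space" idea from the comment can be sketched with plain lists. This is only an illustration under stated assumptions: the name ticktackLookup is hypothetical, coordinates are taken to be 1-based (row, column) pairs, and player 'x' is assumed to move first.

```haskell
import Data.Maybe (fromMaybe)

-- For every cell of the board, look up whether some move landed on it.
ticktackLookup :: (Int, Int) -> [(Int, Int)] -> [String]
ticktackLookup (rows, cols) moves =
  [ [ cellAt (r, c) | c <- [1 .. cols] ] | r <- [1 .. rows] ]
  where
    -- Pair each position with the player who made that move: x, o, x, o, ...
    marked = zip moves (cycle "xo")
    -- An unplayed cell stays blank.
    cellAt p = fromMaybe ' ' (lookup p marked)

main :: IO ()
main = print (ticktackLookup (2, 3) [(1, 1), (2, 2), (1, 3)])
-- prints ["x x"," o "]
```

Each cell costs a linear scan of the move list, which is the inefficiency the array-based approach below is meant to avoid.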
How is this going to work?
At the heart of the program will be a rectangular array of bytes. An array comes in two flavours:
mutable and "frozen". A frozen array cannot be changed, while a mutable array may only exist in a
monadic context, so we can only freely pass around an array when it is frozen.
If this seems to be overcomplicated, I can only ask the reader to believe that the additional
safety guarantees are worth this complication.
Anyway, here are the types:
type Position = (Int, Int)
type Field s = STUArray s Position Char
type FrozenField = UArray Position Char
We will create a function that "applies" a list of moves to an array, thawing and freezing it as
needed.
type Move = (Char, Position)
applyMoves :: FrozenField -> [Move] -> FrozenField
(The idea of Move is that it is sufficient to put a mark on the board, without needing to know
whose turn it is.)
Applied to an empty field of the appropriate size, this function will solve our problem — we shall
only need to adjust the format of the input and the output.
empty :: Position -> FrozenField
positionsToMoves :: [Position] -> [Move]
arrayToLists :: FrozenField -> [[Char]]
Our final program will then look like this:
tictac :: Position -> [Position] -> IO ()
tictac corner = pp . arrayToLists . applyMoves (empty corner) . positionsToMoves
I hope it looks sensible? Even though we have not yet written any tangible code.
Can we write the code?
Yes.
First, we will need some imports. No one likes imports, but, for some reason, it is not yet
automated. So, here:
import Data.Foldable (traverse_)
import Data.Array.Unboxed
import Data.Array.ST
import GHC.ST (ST)
The simplest thing one can do with arrays is to create an empty one. Let us give it a try:
empty :: Position -> FrozenField
empty corner = runSTUArray (newArray ((1, 1), corner) ' ')
The idea is that newArray claims a region in memory and fills it with spaces, and runSTUArray
freezes it so that it can be safely transported to another part of a program. We could instead
"inline" the creation of the array and win some speed, but we only need to do it once, and I
wanted to keep it composable — I think the program will be simpler this way.
Another easy thing to do is to write the "glue" code that adjusts the input and output format:
positionsToMoves :: [Position] -> [Move]
positionsToMoves = zip (cycle ['x', 'o'])
arrayToLists :: FrozenField -> [[Char]]
arrayToLists u =
    let ((minHeight, minWidth), (maxHeight, maxWidth)) = bounds u
    in  [ [ u ! (row, column) | column <- [minWidth.. maxWidth] ] | row <- [minHeight.. maxHeight] ]
Nothing unusual here, run-of-the-mill list processing.
Finally, the hard part — the code that applies any number of moves to a given frozen array:
applyMoves :: FrozenField -> [Move] -> FrozenField
applyMoves start xs = runSTUArray (foldST applyMove (thaw start) xs)
  where
    foldST :: (a -> b -> ST s ()) -> ST s a -> [b] -> ST s a
    foldST f start' moves = do
        u <- start'
        traverse_ (f u) moves
        return u
    applyMove :: Field s -> Move -> ST s ()
    applyMove u (x, i) = writeArray u i x
The pattern is the same as in the function empty: modify an array, then freeze it — and all the
modifications have to happen in an ST monad, for safety. foldST contains all the
"imperative" "inner loop" of our program.
(P.S.) How does this actually work?
Let us unwrap the UArray and STUArray types first. What are they and what is the difference?
UArray means "unboxed array", which is to say an array of values, as opposed to an array of
pointers. The value in our case is actually a Unicode character, not a C "byte" char, so it is not a byte but a
full Unicode code point. When it is stored in unboxed form, it is converted to an Int32 and back invisibly
to us. An Int32 is of course way too much for our humble purpose of storing 3 different values,
so there is space for improvement here. To find out more about unboxed values, I invite you to
check the article that introduced them back in 1991, "Unboxed Values as First Class Citizens in
a Non-Strict Functional Language".
That the values are unboxed does not mean that you can change them though. A pure value in Haskell
is always immutable. So, were you to change a single value in an array, the whole array would be
copied — expensive! This is where STUArray comes in. ST stands for State Thread, and what
STUArray is is an "unfrozen" array, where you can overwrite individual values without copying
the whole thing. To ensure safety, it can only live in a monad, in this case the ST monad.
(Notice how an STUArray value never appears outside of an ST s wrap.) You can imagine an
ST computation as a small imperative process with its own memory, separate from the outside
world. The story goes that they invented ST first, and then figured out they can get IO from
it, so IO is actually ST in disguise. For more details on how ST works, check out the
original article from 1994: "Lazy Functional State Threads".
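As a small illustration of "an imperative process with its own memory", here is a sketch that uses an STRef instead of an array, so it needs only the base library. The function countMarks is a hypothetical example and not part of the solution above: it mutates a counter inside ST, and runST turns the whole thing into an ordinary pure function.

```haskell
import Control.Monad (when)
import Control.Monad.ST (runST)
import Data.Foldable (traverse_)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- Count how often a mark occurs, using a mutable counter local to runST.
countMarks :: Char -> String -> Int
countMarks mark cells = runST $ do
  counter <- newSTRef 0                      -- private mutable cell
  traverse_ (\c -> when (c == mark) (modifySTRef' counter (+ 1))) cells
  readSTRef counter                          -- only the final value escapes

main :: IO ()
main = print (countMarks 'x' "xo x ox")
-- prints 3
```

The counter never appears in the type of countMarks, just as the STUArray never appears outside an ST s wrap.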
Let us now take a more careful look at foldST. What we see is that functionally, it does not
make sense. First we bind the value of start' to u, and then we return u — the same
variable. From the functional point of view, this is the same as writing:
u <- start'
return u
This would be equivalent to just start' by the monad laws. The trick is in what happens in between:
traverse_ (f u) moves
Let us check the type.
λ :type traverse_
traverse_ :: (Foldable t, Applicative f) => (a -> f b) -> t a -> f ()
So, some function is being called, with u as argument, but the result is the useless () type.
In a functional setting, this line would mean nothing. But in a monad, bound values may appear
to change. f is a function that can change the state of a monad, and so can change the value of
the bound names when they are returned. The analogous code in C would go somewhat like this:
char* foldST(void f(char*, Move), int n_start, char start[], int n_moves, Move moves[])
{
    // u <- start
    char* u = malloc(sizeof(char) * n_start);
    memcpy(u, start, sizeof(char) * n_start);
    // traverse_ (f u) moves
    for (int i = 0; i < n_moves; i++)
    {
        f(u, moves[i]);
    }
    // return u
    return u;
}
In Haskell, the pointer arithmetic is abstracted away, but essentially traverse_ in ST works
like this. I am not really familiar with C nor with the inner workings of the ST abstraction, so
this is merely an analogy, not an attempt at a precise rendition. Nevertheless I hope it helps the reader to observe the similarity between ST and ordinary imperative C code.
Mission accomplished!
It runs reasonably fast. Takes only a moment to draw a million-step match on a million-sized
board. I hope it is also explained clearly enough. Do not hesitate to comment if something is amiss or unclear.
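One thing the program above does not draw is the border of '-' and '|' that the task asks for. A minimal sketch of such a step follows. The name withBorder is hypothetical, and it is assumed to be applied to the rows produced by arrayToLists before they are handed to pp.

```haskell
-- Frame a list of equally long rows with '-' on top and bottom and '|' on the sides.
withBorder :: [String] -> [String]
withBorder rows = [edge] ++ map frame rows ++ [edge]
  where
    -- Width of the board, taken from the first row (0 for an empty board).
    width = case rows of
      []      -> 0
      (r : _) -> length r
    edge    = replicate (width + 2) '-'
    frame r = "|" ++ r ++ "|"

main :: IO ()
main = mapM_ putStrLn (withBorder ["x x", " o "])
-- prints:
-- -----
-- |x x|
-- | o |
-- -----
```

With this helper, the pipeline could plausibly become pp . withBorder . arrayToLists . applyMoves (empty corner) . positionsToMoves.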

How to print lists with different types in Ocaml?

I'm working with lists in Ocaml, so I wrote a function that prints the content of the list.
Here's my code:
let leastB = [false; true; true; true; true; false; false; false; true]
let leastI = [-16; 4; 7; 3444; -100]
let prListInt l =
Printf.printf "[";
let rec prListIntrec l =
match l with
[] -> Printf.printf "]\n"
| g :: []-> Printf.printf "%d]\n" g
| g :: t -> Printf.printf "%d; " g; prListIntrec t
in
prListIntrec l
let prListB l =
Printf.printf "[";
let rec prListBrec l =
match l with
[] -> Printf.printf "]\n"
| g :: []-> Printf.printf "%B]\n" g
| g :: t -> Printf.printf "%B; " g; prListBrec t
in
prListBrec l
let () =
prListB leastB;
prListInt leastI
It's working fine, but do I need to have a new function for every type of data, or is there a way to unify those functions?
Any hints for improvements in the code (is it idiomatic OCaml)?
The idiomatic way to print a list of things in OCaml is to use the standard Format module, which, by the way, provides the pp_print_list function, that takes an item printer, a separator printer, and the list and prints a list of said items, e.g.,
let pp_comma ppf () = Format.fprintf ppf ",@ "
let pp_int_list ppf ints =
  Format.fprintf ppf "[%a]"
    Format.(pp_print_list ~pp_sep:pp_comma pp_print_int) ints
# Format.printf "@[hello = %a@]@\n" pp_int_list [1;2;3];;
hello = [1, 2, 3]
The pp_print_list function has type
?pp_sep:(formatter -> unit -> unit) ->
(formatter -> 'a -> unit) ->
(formatter -> 'a list -> unit)
I.e., it is a function that takes two functions (one of which is optional) and returns a function that prints a list. If we omit the pp_sep parameter for brevity, then we can say that pp_print_list takes an 'a printer and returns an 'a list printer, where a printer is a function of type formatter -> 'a -> unit. The pp_print_list function is a so-called higher-order function since it takes another function as a parameter. In functional programming in general and in OCaml in particular, higher-order functions are very common, so nobody actually makes a big deal of it.
Since OCaml doesn't have any introspection facilities and the type information is erased during compilation, there are no generic pretty-printing facilities in the language beyond the Format module. Therefore, for each newly defined type we have to provide a printer, i.e., a function of type formatter -> t -> unit, where t is our new type. This is an idiom common to many statically compiled languages, cf. Haskell's Show class or implementing the << operator in C++. There are no type classes or overloading in OCaml, therefore there is no official or canonical name for the printer function, but there is a convention to name such a function pp, cf. Int.pp, Float.pp, etc. in Jane Street's Core library.
If we go back to your solution, it is easy to see that prListBrec and prListIntrec are basically the same; they differ only in how the items are printed. Therefore, we can parametrize such a function with a function that takes a value of the list item type and, for example, translates it to a string, e.g.,
let rec prGen prItem l =
  match l with
    [] -> Printf.printf "]\n"
  | g :: [] -> Printf.printf "%s]\n" (prItem g)
  | g :: t -> Printf.printf "%s; " (prItem g); prGen prItem t
This function is already more generic, though a little non-idiomatic. Since translating to a string and then printing is not very efficient (why create an intermediate object if we can print it directly?), we use printers and %a specifiers (see the corresponding Printf and Format modules for the description of the printers). Finally, we do not use camelCase in OCaml (despite the fact that OCaml is OCaml).
If you compare the two functions, you might see that they are nearly identical except for the %d vs. %B.
Now one way to unify them would be to pass the format string as argument. But with the black box magic that is behind format strings that isn't the easiest. It's also not a general solution as not all values can be printed with a simple format string.
So to make this even more general you can pass a function that prints out one element of the list and unify all the remaining logic about adding "[", "; " and "]" and iterating over the list.
In your case, constructing the functions to print one element is trivial because OCaml uses currying. You simply use (Printf.printf "%d") and (Printf.printf "%B"). In other cases, more complex functions can be passed to print more complex list elements.
PS: Not giving any source because this could be homework. The info here should get you started in the right direction.

Passing a randomly generated list as a parameter in Haskell

I am new to Haskell and really having trouble with the whole IO thing.
I am trying to find out how long it takes to traverse a list in Haskell. I wanted to generate a list of random numbers and pass it as a parameter to a function so that I can print each element of the list. I am using the criterion package for the benchmark. Here is the code:
{-# LANGUAGE OverloadedStrings #-}
import System.Random
import Control.Exception
import Criterion.Main
printElements [] = return ()
printElements (x:xs) = do print(x)
                          printElements xs
randomList 0 = return []
randomList n = do
    x <- randomRIO (1,100)
    xs <- randomList (n-1)
    return (x:xs)
main = defaultMain [
    bgroup "printElements" [ bench "[1,2,3]" $ whnf printElements (randomList 10)
                           , bench "[4,5,6]" $ whnf printElements [4,5,6,4,2,5]
                           , bench "[7,8,9]" $ whnf printElements [7,8,9,2,3,4]
                           , bench "[10,11,12]" $ whnf printElements [10,11, 12,4,5]
                           ]
    ]
Error when I run the code:
listtraversal.hs:18:67:
Couldn't match expected type ‘[a0]’ with actual type ‘IO [t0]’
In the second argument of ‘whnf’, namely ‘(randomList 10)’
In the second argument of ‘($)’, namely
‘whnf printElements (randomList 10)’
Put briefly, you need to bind your function to the IO value, instead of trying to apply it to the value wrapped inside the IO value.
-- instead of whnf printElements (randomList 10)
randomList 10 >>= whnf printElements
randomList does not return a list of values; it returns an IO action that, when executed, can produce a list of values. Ignoring the various constraints induced by the implementation, the type is
randomList :: (...) => t1 -> IO [t] -- not t1 -> [t]
As such, you can't directly work with the list of values that the IO action can produce; you need to use the monad instance to bind the value to an appropriate function. whnf printElements is one such function; it takes a list and returns an IO action.
whnf printElements :: Show a => [a] -> IO ()
Instead of pulling the list out and passing it to whnf printElements, we "push" the function into an IO value using >>=. That operator's type, specialized to the IO monad, is
(>>=) :: IO a -> (a -> IO b) -> IO b
In this case, the first IO a value is the IO [t] value returned by randomList. whnf printElements is the a -> IO b function we bind to.
The result is a new IO value that takes the first IO value, pulls out the wrapped value, applies the given function, and returns the result.
In other words, the IO monad itself takes care of pulling apart the result from randomList and applying your function to it, rather than you doing it explicitly.
(You might have noticed that I've said that >>= binds a value to a function and vice versa. It is perhaps more accurate to say that >>= binds them together into a single IO action.)
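To see the binding step in isolation, here is a minimal sketch that leaves criterion and the random package aside. The action fixedList is a hypothetical stand-in for randomList: it has the same shape (an IO action that yields a list) but produces fixed numbers so the example needs only the base library.

```haskell
-- Hypothetical stand-in for randomList: an IO action that yields a list.
fixedList :: Int -> IO [Int]
fixedList n = pure (take n (cycle [3, 1, 4, 1, 5]))

-- Takes a plain list and returns an IO action that prints each element.
printElements :: Show a => [a] -> IO ()
printElements = mapM_ print

main :: IO ()
main = fixedList 3 >>= printElements
-- The same thing in do-notation:
--   main = do
--     xs <- fixedList 3
--     printElements xs
-- prints 3, 1, 4 (one per line)
```

Writing printElements (fixedList 3) instead would be rejected for the same reason as in the question: the function expects a list, not an IO action that produces one.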

A better way than counting the length of a list of units

I sometimes find myself writing code like this:
someFunc :: Foo -> Int
someFunc foo = length $ do
    x <- someList
    guard someGuard
    return ()
Or equivalently:
someFunc foo = length [() | x <- someList, someGuard]
Is there a better way to perform this sort of computation? More efficient? More readable? More idiomatic?
First,
guard someGuard
return ()
is redundant, guard already returns () if the condition is true. Then I suppose someGuard actually depends on x, otherwise it would be if someGuard then length someList else 0. The usual way to write it is
someFunc foo = length (filter (\x -> someGuard) someList)
if the situation is really as simple as your example looks. For more complicated situations, using one of your example styles is the most direct way. I find the do-notation preferable if things get really complicated.
If you find yourself repeatedly programming to a pattern, the thing to do is write a higher-order function to encapsulate that pattern. You could use the body you have, but in order to be utterly confident that your code is not allocating, I would recommend using foldl and strict application of an increment operator:
numberSatisfying :: Integral n => (a -> Bool) -> [a] -> n
numberSatisfying p = foldl (\n x -> if p x then (+1) $! n else n) 0
I have used QuickCheck to confirm that this code is equivalent to your original code. (And yes, it is pretty cool that QuickCheck will test with random predicates.)
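A variant of the same idea, sketched with foldl' from Data.List, which forces the accumulator at every step and so does not rely on the optimizer to avoid building up thunks. The name countSatisfying and the fixed Int result type are choices made for this illustration only.

```haskell
import Data.List (foldl')

-- Count the elements that satisfy the predicate, with a strict accumulator.
countSatisfying :: (a -> Bool) -> [a] -> Int
countSatisfying p = foldl' (\n x -> if p x then n + 1 else n) 0

main :: IO ()
main = print ( countSatisfying even [1 .. 10 :: Int]
             , length (filter even [1 .. 10 :: Int]) )
-- prints (5,5)
```

The second component of the printed pair is the length-of-filter formulation, shown only to compare the two results on one input.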