What is the category of bugs that monads prevent?

My understanding is that with do-notation, each step has a continuation and a closure.
This author writes:
We've seen that purity, strong types, and monads can:
...
Prevent bugs that might arise from confusion between different phases of execution.
My question is: What is the category of bugs that monads prevent?

Let's say you've written an algorithm that must receive a callback. You don't know what the callback will want to do, or what it's capable of. The most generic way to accept such a callback is to receive a function like:
Monad m => a -> m b
This gives complete freedom to your caller (who can choose any m that is a Monad), while denying any such freedom to your library. This prevents the introduction of side-effects in an otherwise pure library, while allowing side-effects to occur if the caller desires them.
I've used this pattern before in a pure register allocator. In that library I never needed effects myself, but wanted to allow the user to make use of their own effects (such as State) for creating new blocks and move instructions.
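As a rough sketch of that pattern (the function and type names here are illustrative, not the actual API of that register allocator): the library only sequences the caller's callback, so it stays pure itself while the caller decides which monad, and therefore which effects, to use.
-- Hypothetical sketch: the library threads the caller's chosen monad through
-- the callback without performing any effects of its own.
allocateAll :: Monad m => (vreg -> m reg) -> [vreg] -> m [reg]
allocateAll assign vregs = mapM assign vregs

-- A pure caller can instantiate m with Identity; an effectful caller can pick
-- State, IO, or any other monad; the library code is the same either way.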

Effect separation
Just as ordinary types give you a way to distinguish different kinds of data, monads give you a way to distinguish different kinds of effects.
Elixir
Here is an example in Elixir, which is a nearly-pure functional language on top of Erlang. This example is derived from a real-world situation that occurs very often at my work.
def handle_call(:get_config_foo, _, state) do
  {:reply, state.foo, state}
end
def handle_call(:get_bar, _, state) do
  {:reply, state.bar, state}
end
def handle_call({:set_bar, bar}, _, state) do
  {:reply, :ok, %{state | bar: bar}}
end
This defines the API of a GenServer, which is a little Erlang process that holds some state and allows you to query it as well as change it. The first call, :get_config_foo, reads an immutable config setting. The second pair of calls, :get_bar and {:set_bar, bar}, gets and sets a mutable state variable.
How I wish I had monads here, to prevent the following bug:
def handle_call({:set_bar, bar}, _, state) do
  {:reply, :ok, %{state | foo: bar}}
end
Can you spot the mistake? Well, I just wrote to a read-only value. Nothing in Elixir prevents this. You can't mark some parts of your GenServer state as read-only and others as read-write.
Haskell: Reader and State
In Haskell you can use different monads to specify different kinds of effects. Here are read-only state (Reader) and read-write state (State):
data Reader r a = Reader (r -> a)
data State s a = State (s -> (a, s))
Reader allows you to access the config state r to return some value a. State allows you to read the state, return some value and to modify the state. Both are monads, which essentially means that you can chain those state accesses sequentially in an imperative fashion. In Reader you can first read one config setting, and then (based on that first setting), read another setting. In State, you can read the state, and then (based on what you read) modify it in a further way. But you can never modify the state in Reader while you execute it.
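To make that concrete, here is a minimal sketch using the Reader and State definitions above (Config, AppState, and the field names are invented for illustration):
-- Read-only access: the Reader type gives you no way to hand back a modified Config.
newtype Config = Config { configFoo :: Int }

getFoo :: Reader Config Int
getFoo = Reader configFoo

-- Read-write access: the State type lets you return an updated state.
data AppState = AppState { stateBar :: Int }

setBar :: Int -> State AppState ()
setBar new = State (\st -> ((), st { stateBar = new }))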
Deterministic effects
Let me repeat this. Binding several calls to Reader assures you that you can never modify the reader state in between. If you have getConfigFoo1 :: Reader Config Foo1 and getConfigFoo2 :: Foo1 -> Reader Config Foo2 and you do getAllConfig = getConfigFoo1 >>= getConfigFoo2, then you have certainty that both queries will run on the same Config. Elixir does not have this feature and allows the above-mentioned bug to go unnoticed.
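That guarantee is visible in how the Monad instance works. The following is a sketch of the standard instances for the Reader type defined above (not quoted from the article), showing that every chained step receives the exact same r:
instance Functor (Reader r) where
  fmap f (Reader g) = Reader (f . g)

instance Applicative (Reader r) where
  pure a = Reader (const a)
  Reader f <*> Reader g = Reader (\r -> f r (g r))

instance Monad (Reader r) where
  -- k never gets a chance to modify r; it can only read it.
  Reader g >>= k = Reader (\r -> let Reader h = k (g r) in h r)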
Other effects where this is useful are Writer (write-only state, e.g. logging) and Either (exception handling). When you have Writer instead of Reader or State, you can be sure that your state is only ever appended to. When you have Either, you know exactly the type of exceptions that can occur. This is all much better than using IO for logging and exception handling.
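For example, a small sketch with Either (ParseError and parsePort are invented names): the possible failures are spelled out in the type signature, rather than hidden as exceptions inside IO.
data ParseError = EmptyInput | NotANumber String

parsePort :: String -> Either ParseError Int
parsePort "" = Left EmptyInput
parsePort s  = case reads s of
  [(n, "")] -> Right n
  _         -> Left (NotANumber s)

-- Callers see from the type exactly which errors can occur, and the Monad
-- instance for Either lets them chain parsers that stop at the first failure.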


Is there a proposal for ? operator / null-conditional operator for C++?

Rust has a ? operator for error propagation without exceptions. When a function returns a Result or Option type, one can write:
let a = get_a()?;
Which effectively means:
let a = match get_a() {
    Ok(value) => value,
    Err(e) => return Err(e),
};
Or, in other words, if the returned value contains an error, the function will return and propagate the error. All of this happens in just one char, so the code is compact without too many if (err) return err branches.
The same concept exists in C#: Null-conditional operators
In C++, the closest things I was able to find are tl::expected and Boost.Outcome. But those two require the use of e.g. lambdas to achieve the same thing, and it's not as concise. From what I understand, affecting control flow like that would require some kind of language feature or extension.
I tried to find a proposal that would implement it or be related at least and couldn't. Is there a C++ proposal for implementing it?
There is no such proposal. Nor is there likely to be one in the immediate future. The C++ standard library does not even have value-or-error types at the moment, so having an operator whose sole purpose is to automatically unpack such types seems very cart-before-the-horse.
At present, the C++ committee seems more interested in finding a way to make exception throwing cheaper, and thus moving towards a Python-like environment where you just use exceptions for this sort of thing.
For the sake of completeness (and for no other reason), I will mention that co_await exists. I bring this up because you can (ab)use co_await to do something behaviorally equivalent to what you want.
When you co_await on an object, the coroutine machinery transforms the given expression into its final form. And the coroutine machinery can suspend execution of the function, returning control to the caller. And this mechanism has the power to affect the return value of the function. Which looks kind of like a normal function return, if you squint hard enough.
With all this in mind, you can therefore (ab)use coroutine machinery for functions that return value-or-error types like expected. You can co_await on some value-or-error type. If the expression is a value, then the co_await will not suspend the function, and it will unpack the value from the expression. If the expression is an error, then co_await will "suspend" the function and propagate the error value into the return value for that function. But in "suspending" it, the coroutine machinery will never actually schedule execution of the coroutine to resume, so once control is given back to the caller, the coroutine is terminated.
That having been said, you should never do this. A non-comprehensive list of reasons why:
It doesn't make sense, based on what co_await is intended to do. A coroutine is supposed to be a function that suspends execution in order to resume it when some asynchronous process is complete. That is, the function "await"s on something. But you're not waiting on anything; you're (ab)using co_await to create the effect of a transform-or-return. As such, a person reading the code won't necessarily know what's going on, since nothing is actually being "awaited" on.
Your function becomes a coroutine, and being a coroutine creates issues. That is, because coroutines are expected to be resumed, they have a certain amount of baggage. For example, co_return does not allow for guaranteed elision (and probably cannot be changed to permit it in the future). Every coroutine has a promise object that acts as an intermediary between the internal function and its return value. Coroutines are dynamically allocated, and not dynamically allocating them is considered an optimization. So compilers may not optimize this case out. And so on.
You can't have a "normal" coroutine that also propagates errors in this way. A function either uses co_await for errors or it uses co_await for actual awaiting; it can't do both. So if you want a coroutine that can fail, you have to do things manually.
You can achieve the same effect by combining the pattern matching proposal, the proposal for std::expected, and writing a macro:
#define TRY(e) inspect (e) { \
    <typename remove_cvref_t<decltype(e)>::error_type> err => return err; \
    <auto> val => val; \
}
The difficulty here is that, while in Rust every Result instantiation still has Ok and Err types, we don't have as convenient a mechanism to pull out what the error type would be from std::expected.
And the real implementation of this would probably want to generalize a bit to handle other Result-like error handling types that might expose the error variant slightly differently.

Does Erlang support functions inside an if clause? [duplicate]

Why does the Erlang if statement support only specific functions in its guard?
i.e.:
ok(A) ->
    if
        whereis(abc) =:= undefined ->
            register(abc, A);
        true ->
            exit(already_registered)
    end.
In this case we get an "illegal guard" error.
What would be the best practice for using a function's return value as a condition?
Coming from other programming languages, Erlang's if seems strangely restrictive, and in fact, isn't used very much, with most people opting to use case instead. The distinction between the two is that while case can test any expression, if can only use valid Guard Expressions.
As explained in the above link, Guard Expressions are limited to known functions that are guaranteed to be free of side-effects. There are a number of reasons for this, most of which boil down to code predictability and inspectability. For instance, since matching is done top-down, guard expressions that don't match will be executed until one is found that does. If those expressions had side-effects, it could easily lead to unpredictable and confusing outcomes during debugging. While you can still accomplish that with case expressions, if you see an if you can know there are no side effects being introduced in the test without needing to check.
One last, but important thing, is that guards have to terminate. If they did not, the reduction of a function call could go on forever, and as the scheduler is based around reductions, that would be very bad indeed, with little to go on when things went badly.
As a counter-example, you can starve the scheduler in Go for exactly this reason. Because Go's scheduler (like all micro-process schedulers) is co-operatively multitasked, it has to wait for a goroutine to yield before it can schedule another one. Much like in Erlang, it waits for a function to finish what it's currently doing before it can move on. The difference is that Erlang has no loop construct. To accomplish looping, you recurse, which necessitates a function call/reduction and gives the scheduler a point at which to intervene. In Go, you have C-style loops, which do not require a function call in their body, so code akin to for { i = i+1 } will starve the scheduler. Not that such loops without function calls in their body are super-common, but the issue does exist.
In Erlang, by contrast, it's extremely difficult to do something like this without setting out to do so explicitly. But if guards could contain code that didn't terminate, it would become trivial.
Check this question: About the usage of "if" in Erlang language
In short:
Only a limited number of functions are allowed in guard sequences, and whereis is not one of them
Use case instead.

What is Clojure volatile?

There has been an addition in the recent Clojure 1.7 release: volatile!
volatile is already used in many languages, including Java, but what are the semantics in Clojure?
What does it do? When is it useful?
The new volatile is as close to a real "variable" (as known from many other programming languages) as it gets in Clojure.
From the announcement:
there are a new set of functions (volatile!, vswap!, vreset!, volatile?) to create and use volatile "boxes" to hold state in stateful transducers. Volatiles are faster than atoms but give up atomicity guarantees so should only be used with thread isolation.
For instance, you can set/get and update them just like you would do with a variable in C.
The only addition (and hence the name) is the volatile keyword on the underlying Java object.
This prevents certain JVM optimizations and makes sure that the memory location is read every time it is accessed.
From the JIRA ticket:
Clojure needs a faster variant of Atom for managing state inside transducers. That is, Atoms do the job, but they provide a little too much capability for the purposes of transducers. Specifically the compare and swap semantics of Atoms add too much overhead. Therefore, it was determined that a simple volatile ref type would work to ensure basic propagation of its value to other threads and reads of the latest write from any other thread. While updates are subject to race conditions, access is controlled by JVM guarantees.
Solution overview: Create a concrete type in Java, akin to clojure.lang.Box, but volatile inside; supports IDeref, but not watches, etc.
This means a volatile! can still be accessed by multiple threads (which is necessary for transducers), but it cannot safely be changed by these threads at the same time, since it gives you no atomic updates.
The semantics of what volatile does are very well explained in a Java answer:
there are two aspects to thread safety: (1) execution control, and (2) memory visibility. The first has to do with controlling when code executes (including the order in which instructions are executed) and whether it can execute concurrently, and the second to do with when the effects in memory of what has been done are visible to other threads. Because each CPU has several levels of cache between it and main memory, threads running on different CPUs or cores can see "memory" differently at any given moment in time because threads are permitted to obtain and work on private copies of main memory.
Now let's see why not to use var-set or transients:
Volatile vs var-set
Rich Hickey didn't want to give truly mutable variables:
Without mutable locals, people are forced to use recur, a functional looping construct. While this may seem strange at first, it is just as succinct as loops with mutation, and the resulting patterns can be reused elsewhere in Clojure, i.e. recur, reduce, alter, commute etc. are all (logically) very similar. [...] In any case, Vars are available for use when appropriate.
And thus with-local-vars, var-set, etc. were created.
The problem with these is that they're true vars and the doc string of var-set tells you:
The var must be thread-locally bound.
This is, of course, not an option for core.async, which potentially executes on different threads. They're also much slower because they do all those checks.
Why not use transients
Transients are similar in that they don't allow concurrent access and optimize mutating a data structure.
The problem is that transients only work with collections that implement IEditableCollection. That is, they simply exist to avoid expensive intermediate representations of the collection data structures. Also remember that transients are not bashed into place, and you still need some memory location to store the actual transient.
Volatiles are often used to simply hold a flag or the value of the last element (see partition-by for instance).
Summary:
Volatiles are nothing but a wrapper around Java's volatile and thus have exactly the same semantics.
Don't ever share them. Use them only very carefully.
Volatiles are a "faster atom" with no atomicity guarantees. They were introduced as atoms were considered too slow to hold state in transducers.

Is using Clojure's STM as a global state considered a good practice?

In most of my Clojure programs... and a lot of other Clojure programs I see, there is some sort of global variable in an atom:
(def *program-state*
  (atom {:name "Program"
         :var1 1
         :var2 "Another value"}))
And this state would be referred to occasionally in the code.
(defn program-name []
  (:name @*program-state*))
Reading this article http://misko.hevery.com/2008/07/24/how-to-write-3v1l-untestable-code/ made me rethink global state, but somehow, even though I completely agree with the article, I think it's okay to use hash-maps in atoms because it provides a common interface for manipulating global state data (analogous to using different databases to store your data).
I would like some other thoughts on this matter.
This kind of thing can be OK, but it is also often a design smell so I would approach with caution.
Things to think about:
Consistency - can one part of the code change the program name? If so, then the program-name function will behave inconsistently from the perspective of other threads. Not good!
Testability - is this easy to test? Can one part of the test suite that is changing the program name safely run concurrently with another test that is reading the name?
Multiple instances - will you ever have two different parts of the application expecting to use a different program-name at the same time? If so, this is a strong hint that your mutable state should not be global.
Alternatives to consider:
Using a ref instead of an atom, you can at least ensure consistency of mutable state within transactions
Using binding you can limit mutability to a per-thread basis. This solves most of the concurrency issues and can be helpful when your global variables are being used like a set of thread-local configuration parameters.
Using immutable global state wherever you can. Does it really need to be mutable?
I think having a single global state that is occasionally updated in commutative ways is fine. When you start having two global states that need to be updated and threads start using them for communication, then I start to worry.
Maintaining a count of current global users is fine:
Any thread can inc or dec this at any time without hurting another
If it changes out from under your thread nothing explodes.
Maintaining the log directory is questionable:
When it changes, will all threads stop writing to the old one?
If two threads change it, will they converge?
Using this as a message queue is even more dubious.
I think it is fine to have such a global state (and in many cases it is required), but I would be careful to ensure that the core logic of my application has functions that take the state as a parameter and return the updated state, rather than directly accessing the global state. Basically, I would prefer to have controlled access to the global state from a small set of functions, and everything else in my program should use that set of functions to access the state, as that would allow me to abstract away the state implementation, i.e. initially I could start with an in-memory atom, then maybe move to some persistent storage.

How do Clojure futures and promises differ?

Both futures and promises block until they have calculated their values, so what is the difference between them?
Answering in Clojure terms, here are some examples from Sean Devlin's screencast:
(def a-promise (promise))
(deliver a-promise :fred)
(def f (future (some-sexp)))
(deref f)
Note that in the promise you are explicitly delivering a value that you select in a later computation (:fred in this case). The future, on the other hand, is being consumed in the same place that it was created. The some-sexp is presumably launched behind the scenes and calculated in tandem (eventually), but if it remains unevaluated by the time it is accessed, the thread blocks until it is available.
edited to add
To help further distinguish between a promise and a future, note the following:
promise
You create a promise. That promise object can now be passed to any thread.
You continue with calculations. These can be very complicated calculations involving side-effects, downloading data, user input, database access, other promises -- whatever you like. The code will look very much like your mainline code in any program.
When you're finished, you can deliver the results to that promise object.
Any item that tries to deref your promise before you're finished with your calculation will block until you're done. Once you're done and you've delivered the promise, the promise won't block any longer.
future
You create your future. Part of your future is an expression for calculation.
The future may or may not execute concurrently. It could be assigned a thread, possibly from a pool. It could just wait and do nothing. From your perspective you cannot tell.
At some point you (or another thread) derefs the future. If the calculation has already completed, you get the results of it. If it has not already completed, you block until it has. (Presumably if it hasn't started yet, derefing it means that it starts to execute, but this, too, is not guaranteed.)
While you could make the expression in the future as complicated as the code that follows the creation of a promise, it's doubtful that's desirable. This means that futures are really more suited to quick, background-able calculations while promises are really more suited to large, complicated execution paths. Too, promises seem, in terms of calculations available, a little more flexible and oriented toward the promise creator doing the work and another thread reaping the harvest. Futures are more oriented toward automatically starting a thread (without the ugly and error-prone overhead) and going on with other things until you -- the originating thread -- need the results.
Both Future and Promise are mechanisms to communicate the result of an asynchronous computation from a Producer to Consumer(s).
In the case of a Future, the computation is defined at the time of Future creation and async execution begins "ASAP". The Future also "knows" how to spawn an asynchronous computation.
In the case of a Promise, the computation, its start time, and [possible] asynchronous invocation are decoupled from the delivery mechanism. When the computation result is available, the Producer must call deliver explicitly, which also means that the Producer controls when the result becomes available.
For Promises Clojure makes a design mistake by using the same object (result of promise call) to both produce (deliver) and consume (deref) the result of computation. These are two very distinct capabilities and should be treated as such.
There are already excellent answers, so I'm only adding the "how to use" summary:
Both
Creating a promise or future returns a reference immediately. This reference blocks on @/deref until the result of the computation is provided by another thread.
Future
When creating a future you provide a synchronous job to be done. It's executed in a thread from the dedicated unbounded pool.
Promise
You give no arguments when creating a promise. The reference should be passed to another 'user' thread that will deliver the result.
In Clojure, promise, future, and delay are promise-like objects. They all represent a computation that clients can await by using deref (or @). Clients reuse the result, so the computation is not run several times.
They differ in the way the computation is performed:
future will start the computation in a different worker thread. deref will block until the result is ready.
delay will perform the computation lazily, when the first client uses deref, or force.
promise offers the most flexibility, as its result is delivered in any custom way by using deliver. You use it when neither future nor delay matches your use case.
I think chapter 9 of Clojure for the Brave and True has the best explanation of the difference between delay, future, and promise.
The idea which unifies these three concepts is this: task lifecycle. A task can be thought of as going through three stages: a task is defined, a task is executed, a task's result is used.
Some programming languages (like JavaScript) have similarly named constructs (like JS's Promise) which couple together several (or all) of the stages in the task lifecycle. In JS, for instance, it is impossible to construct a Promise object without providing it either with the function (task) which will compute its value, or resolving it immediately with a constant value.
Clojure, however, eschews such coupling, and for this reason it has three separate constructs, each corresponding to a single stage in the task lifecycle.
delay: task definition
future: task execution
promise: task result
Each construct is concerned with its own stage of the task lifecycle and nothing else, thus disentangling higher order constructs like JS's Promise and separating them into their proper parts.
We see now that in JavaScript, a Promise is the combination of all three Clojure constructs listed above. Example:
const promise = new Promise((resolve) => resolve(6))
Let's break it down:
task definition: resolve(6) is the task.
task execution: there is an implied execution context here, namely that this task will be run on a future cycle of the event loop. You don't get a say in this; you can't, for instance, require that this task be resolved synchronously, because asynchronicity is baked into Promise itself. Notice how in constructing a Promise you've already scheduled your task to run (at some unspecified time). You can't say "let me pass this around to a different component of my system and let it decide when it wants to run this task".
task result: the result of the task is baked into the Promise object and can be obtained by calling then or by using await. There's no way to create an "empty" promised result to be filled out later by some yet unknown part of your system; you have to both define the task and simultaneously schedule it for execution.
PS: The separation which Clojure imposes allows these constructs to assume roles for which they would have been unsuited had they been tightly coupled. For instance, a Clojure promise, having been separated from task definition and execution, can now be used as a unit of transfer between threads.
Firstly, a Promise is a Future. I think you want to know the difference between a Promise and a FutureTask.
A Future represents a value that is not currently known but will be known in the future.
A FutureTask represents the result of a computation that will happen in the future (maybe in some thread pool). When you try to access the result, if the computation has not happened yet, it blocks. Otherwise the result is returned immediately. There is no other party involved in computing the result, as the computation is specified by you in advance.
A Promise represents a result that will be delivered by the promiser to the promisee in the future. In this case you are the promisee, and the promiser is the one who gave you the Promise object. Similar to the FutureTask, if you try to access the result before the Promise has been fulfilled, it blocks until the promiser fulfills the Promise. Once the Promise is fulfilled, you get the same value always and immediately. Unlike a FutureTask, there is another party involved here, the one which made the Promise. That other party is responsible for doing the computation and fulfilling the Promise.
In that sense, a FutureTask is a Promise you made to yourself.