What is the difference between locking and atom/reset!/swap! in Clojure

I was reading some source code and came across a use of locking in Clojure. It made me think about the atom version. So what is the difference between the two code snippets below? I think they do the same thing.
(def lock (Object.))
(locking lock
  ...some operation)

(def state (atom true))
(when @state
  (reset! state false)
  ...some operation
  (reset! state true))

Locking (aka synchronization) is only ever needed when multiple threads are changing a piece of mutable state.
The locking macro is a low-level feature that one almost never needs to use in Clojure. It is similar in some ways to a synchronized block in Java.
In Clojure, one normally just uses an atom for this purpose. On rare occasions an agent or a ref is called for. In even rarer situations, you can use a dynamic Var to get thread-local mutable state.
Internally, a Clojure atom delegates all concurrency operations to the class java.util.concurrent.atomic.AtomicReference.
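For the normal case, swap! is the whole story. A minimal sketch (the names are illustrative):

;; swap! applies the update function atomically, retrying on contention,
;; so concurrent callers never lose an update and no explicit lock is needed
(def counter (atom 0))

(defn hit! []
  (swap! counter inc))   ; safe to call from any number of threads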
Your code snippet shows a misunderstanding of an atom's purpose and operation. Two concurrent threads could execute your atom code snippet at the same time, so the attempt does not provide thread safety and would result in bugs and corrupted data.
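To see why, note that the @state check and the reset! are two separate steps, so both threads can pass the check before either one flips the flag. A sketch of the race, and of compare-and-set!, which folds the check and the write into one atomic step (the ...some operation placeholder is from the question):

;; broken: check-then-act on an atom is not atomic
(when @state             ; threads A and B can both read true here...
  (reset! state false)   ; ...before either one has written false
  ;; ...some operation
  (reset! state true))

;; compare-and-set! succeeds for exactly one thread
(when (compare-and-set! state true false)
  (try
    ;; ...some operation
    (finally
      (reset! state true))))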
If you want to explore the really primitive (i.e. Java 1.2) synchronization primitives, see:
The Oracle docs
The book Effective Java (1st edition shows most details, 3rd Ed. defers to higher-level classes)
The book Java Concurrency in Practice. This book is so scary it'll turn your hair white; after reading it, you'll be so afraid of roll-your-own concurrency that you'll never even think about doing it again.
Double-Checked Locking is Broken: an excellent example of how hard it is to get locking right.

Related

How do functional languages handle shared state data?

I've been learning about functional programming and see that it can certainly make parallelism easier to handle, but I do not see how it makes handling shared resources easier. I've seen people talk about variable immutability being a key factor, but how does that help two threads accessing the same resource? Say two threads are adding a request to a queue. They both get a copy of the queue, make a new copy with their request added (since the queue is immutable), and then return the new queue. The first request to return will be overridden by the second, as the copies of the queue each thread got did not have the other thread's request present. So I assume there is a locking mechanism a la mutex available in functional languages? How then does that differ from an imperative approach to the problem? Or do practical applications of functional programming still require some imperative operations to handle shared resources?
As soon as your global data can be updated, you're breaking the pure functional paradigm. In that sense, you need some sort of imperative structure. However, this is important enough that most functional languages offer a way to do it, and you need it to be able to communicate with the rest of the world anyway. (The most complicated formal one is the IO monad of Haskell.) Apart from simple bindings to some other synchronization library, they would probably try to implement a lock-free, wait-free data structure if possible.
Some approaches include:
Data that is written only once and never altered can be accessed safely with no locks or waiting on most CPUs. (There is typically a memory fence instruction to ensure that the memory updates in the right order for both the producer and the consumer.)
Some data structures, such as a difference list, have the property that you can tack on updates without invalidating any existing data. Let's say you have the association list [(1,'a'), (2,'b'), (3,'c')] and you want to update it by changing the third entry to 'g'. If you express this as (3,'g'):originalList, then you can update the current list with the new version and keep originalList valid and unaltered. Any thread that saw it can still safely use it. (A sketch of this idea follows the list below.)
Even if you have to work around the garbage collector, each thread can make its own thread-local copy of the shared state so long as the original does not get deleted while it is being copied. The underlying low-level implementation would be a producer/consumer model that atomically updates a pointer to the state data and inserts memory-fence instructions before the update and the copy operations.
If the program has a way to compare-and-swap atomically, and the garbage collector is aware of it, each thread can use the read-copy-update (RCU) pattern. A thread-aware garbage collector will keep the older data around as long as any thread is using it, and recycle it when the last thread is done with it. This should not require locking in software (for example, on modern ISAs, incrementing or decrementing a word-sized counter is an atomic operation, and atomic compare-and-swap is wait-free).
The functional language can add an extension to call an IPC library written in some other language, and update data in place. In Haskell, this would be defined with the IO monad to ensure sequential memory consistency, but nearly every functional language has some way to exchange data with the system libraries.
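As promised above, a minimal Clojure sketch of the tack-on-updates idea: updating a persistent structure yields a new version and leaves the old one intact for any thread still reading it.

;; assoc returns a new map that shares structure with the original;
;; the original is never mutated, so concurrent readers stay safe
(def original-list {1 \a, 2 \b, 3 \c})

(def updated (assoc original-list 3 \g))

original-list   ;=> {1 \a, 2 \b, 3 \c} (unchanged)
updated         ;=> {1 \a, 2 \b, 3 \g}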
So, a functional language does offer some guarantees that are useful for efficient concurrent programming. For example, most current ISAs impose no extra overhead on multiple reader threads when there is at most a single writer, certain consistency bugs cannot occur, and functional languages are well-suited to express this pattern.

Is shared_future<void> a legitimate replacement for a condition_variable?

Josuttis states ["Standard Library", 2nd ed, pg 1003]:
Futures allow you to block until data by another thread is provided or another thread is done. However, a future can pass data from one thread to another only once. In fact, a future's major purpose is to deal with return values or exceptions of threads.
On the other hand, a shared_future<void> can be used by multiple threads, to identify when another thread has done its job.
Also, in general, high-level concurrency features (such as futures) should be preferred to low-level ones (such as condition_variables).
Therefore, I'd like to ask: Is there any situation (requiring synchronization of multiple threads) in which a shared_future<void> won't suffice and a condition_variable is essential?
As already pointed out in the comments by @T.C. and @hlt, the use of futures/shared_futures is mostly limited in the sense that they can only be used once. So for every communication task you have to have a new future. The pros and cons are nicely explained by Scott Meyers in:
Item 39: Consider void futures for one-shot event communication.
Scott Meyers: Effective Modern C++ (emphasis mine)
His conclusion is that using promise/future pairs dodges many of the problems with the use of condition_variables, providing a nicer way of communicating one-shot events. The price to pay is that you are using dynamically allocated memory for the shared states and, more importantly, that you have to have one promise/future pair for every event that you want to communicate.
While the notion of preferring high-level abstractions to low-level ones is laudable, there is a misconception here. std::future is not a high-level replacement for std::condition_variable. Instead, it is a specific high-level construct built for one specific use case of std::condition_variable - namely, a one-time return of a value.
Obviously, not every use of a condition variable fits this scenario. For example, a message queue cannot be implemented with std::future, no matter how hard you try. Such a queue is another high-level construct built on low-level building blocks. So yes, shoot for high-level constructs, but do not expect a one-to-one mapping between high and low level.

When to use core.async in Clojure?

When should I use Clojure's core.async library, and what kinds of applications need that kind of async facility?
Clojure provides four basic mutable reference types: refs, agents, atoms, and thread-locals/vars. Can't these mutable references provide, in some way, what core.async provides with ease?
Could you provide real world use cases for async programming?
How can I gain an understanding of it so that when I see a problem, it clicks and I say "This is the place I should apply core.async"?
Also, we can use core.async in ClojureScript, which is a single-threaded environment; what are the advantages there (besides avoiding callback hell)?
You may wish to read this:
Clojure core.async Channels - introductory blog post by Rich Hickey
Mastering Concurrent Processes with core.async - Brave Clojure entry
The best use case for core.async is ClojureScript, since it allows you to simulate multi-threaded programming and avoid Callback Hell.
In JVM Clojure, core.async can also be handy where you want a (lightweight) producer-consumer architecture. Of course, you could always use native Java queues for that as well.
It's important to point out that there are 2 common meanings associated with the word 'async' in programming circles:
asynchronous messaging: Systems in which components send messages without expecting a response from their consumers, and often without even knowing who the consumers are (via queues)
non-blocking (a.k.a. event-driven) I/O: programs structured in such a way that they don't block expensive computational resources (threads, cores, ...) while awaiting a response. There are several approaches, with varying levels of abstraction, for dealing with such systems: callback-based APIs (low-level, difficult to manage because they are based on side effects), Promises/Futures/Deferreds (representations of values that we don't yet have, more manageable as they are value-based), and green threads ('logical' threads which emulate ordinary control flow but are inexpensive)
core.async is very opinionated towards (1), asynchronous messaging via queues, and it also provides a macro for implementing green threading (the go macro).
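For illustration, here is a minimal sketch of both ideas at once: a channel as the queue and go blocks as the green threads (it assumes core.async is on the classpath; the messages are made up):

(require '[clojure.core.async :as async :refer [chan go >! <! close!]])

(let [ch (chan 10)]                 ; buffered channel acting as a queue
  ;; producer go block: puts three messages, then closes the channel
  (go (doseq [msg [:a :b :c]]
        (>! ch msg))
      (close! ch))
  ;; consumer go block: takes until the channel closes (<! returns nil)
  (go (loop []
        (when-let [msg (<! ch)]
          (println "got" msg)
          (recur)))))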
From my experience, if non-blocking is all you need, I would personally recommend starting with Manifold, which makes fewer assumptions about your use case, then use core.async for the more advanced use cases where it falls short; note that both libraries interoperate well.
I find it useful for fine-grained, configurable control of side-effect parallelism on the JVM.
e.g. If the following executes a read from Cassandra, returning an async/chan:
(arche.async/execute connection :key {:values {:id "1"}})
Then the following performs a series of executes in parallel, where that parallelism is configurable, and the results are returned in-order.
(async/pipeline-async
  n-parallelism
  out
  #(arche/execute connection :key {:values %1 :channel %2})
  (async/to-chan [{:id "1"} {:id "2"} {:id "3"} ...]))
Probably quite particular to my niche, but you get the idea.
https://github.com/troy-west/arche
Core.async provides building blocks for socket-like programming, which is useful for coordinating producer/consumer interaction in Clojure. Core.async's lightweight threads (go blocks) let you write imperative-style code that reads from channels instead of using callbacks in the browser. On the JVM, the lightweight threads let you utilize all of your CPU threads.
You can see an example of a full-stack CLJ/CLJS producer/consumer Core.async chat room here: https://github.com/briangorman/hablamos
Another killer feature of core.async is the pipeline feature. Often in data processing the initial processing stage takes up most of the CPU time, while later stages, e.g. reducing, take up significantly less. With the async pipeline feature, you can split the processing up over channels to add parallelization to your pipeline (see the sketch below). Core.async channels work with transducers, so channels play nicely with the rest of the language now.
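A hedged sketch of that idea; expensive-parse is a made-up stand-in for a CPU-heavy first stage:

(require '[clojure.core.async :as async :refer [chan <!!]])

(defn expensive-parse [line]        ; hypothetical CPU-bound stage
  (Thread/sleep 10)
  (count line))

(let [in  (async/to-chan ["one" "two" "three"])
      out (chan)]
  ;; run expensive-parse on up to 4 inputs in parallel, preserving order
  (async/pipeline 4 out (map expensive-parse) in)
  (println (<!! (async/into [] out))))   ; prints [3 3 5]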

How to guard a mutable object against concurrent read/write access in Clojure?

I'm using JGraphT, a graph library for Java, as the backbone of my graph operations. It mutates its state on every change, like adding or removing an edge or a vertex.
I'm accessing this "object" from multiple threads / go-loops.
I started naively by wrapping the graph object in an atom, but as far as I understand, that doesn't protect (and can't protect) against directly changing the state of its content. It's a safeguard only if you are able to use the reset! or swap! functions.
I've changed to refs and started doing my mutations in dosync blocks, but I still notice some weird behaviour from time to time. The problems are hard to pinpoint since they appear at runtime, as you would expect.
I'm not experienced in the Clojure ecosystem, so I'd appreciate it if you could point me to a couple of alternative strategies for dealing with stateful Java objects in Clojure.
PS: I'm aware of loom and would love to use it, but it lacks my most important requirement: finding all simple cycles in a weighted directed graph with negative weights.
Clojure's STM features (e.g. dosync, ref) only interact, as far as I understand them, with other features of Clojure. They aren't going to interact with mutable Java objects in ways you might hope. The atom type isn't going to help here either, unless you felt like copying the entire graph each time you wanted to perform an operation on it. Unfortunately, in this case you have a mutable object with uncertain thread-safety characteristics. You're going to have to use some kind of lock. There's a built-in macro to acquire the monitor lock: (locking obj body). You could just use the graph itself as the monitor object. Or you can create other kinds of locks, like a read-write lock, as needed (a sketch follows). Each time you access part of the graph or mutate/update it, you're going to need to acquire/release the lock.
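A hedged sketch of the read-write variant via Java interop; the JGraphT calls in the usage comment (.containsVertex, .addEdge) are real methods, but treat the helpers as illustrative rather than a vetted implementation:

(import '[java.util.concurrent.locks ReentrantReadWriteLock])

(def ^ReentrantReadWriteLock rw-lock (ReentrantReadWriteLock.))

(defn with-read-lock* [f]
  (let [l (.readLock rw-lock)]
    (.lock l)
    (try (f) (finally (.unlock l)))))

(defn with-write-lock* [f]
  (let [l (.writeLock rw-lock)]
    (.lock l)
    (try (f) (finally (.unlock l)))))

;; usage: many readers may run concurrently; a writer is exclusive
;; (with-read-lock*  #(.containsVertex graph v))
;; (with-write-lock* #(.addEdge graph v1 v2))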
Just a note that we recently received a contribution of AsSynchronizedGraph in JGraphT:
https://github.com/jgrapht/jgrapht/blob/master/jgrapht-core/src/main/java/org/jgrapht/graph/concurrent/AsSynchronizedGraph.java
It allows you to protect a graph with a wrapper which uses a ReadWriteLock. There are a bunch of usage caveats in the Javadoc.
It's not released yet, but it's available in the latest SNAPSHOT build.
As far as I understand, you are talking about a Java class, not a Clojure data structure. If I'm right, there is no sense in wrapping that instance in an atom or any other reference type, because the rest of your code may still modify the instance directly.
Clojure has a special locking macro that holds a monitor on any Java object while executing a set of actions over it.
Its typical usage might be something like this:
(def graph (some.java.Graph. 1 2 3 4))

(locking graph
  (.modify graph)
  (.update graph))
See the documentation page for more info.

Readers-writers using STM in Clojure

There is the following version of the readers-writers problem: multiple readers and writers; 2 or more readers can read simultaneously; if a writer is writing, no one else can read or write; and it is preferred that all writers get an equal chance to write (for example, in 100 rounds, 5 writers should write about 20 times each). What is the proper way to implement this in Clojure using STM? I'm not looking for complete code, just some general directions.
Clojure's built-in STM can't really include all the constraints you are looking for because readers never wait for writers and your requirements require readers to wait.
If you can forgive not blocking readers, then you can go ahead and
(.start (java.lang.Thread. #(dosync (write stuff))))
(.start (java.lang.Thread. #(dosync (read stuff))))
If you need readers to block, then you will need a different STM; the world has lots of them.
Clojure's STM gives you much nicer guarantees than that. Writers wait for each other, but readers can still read while a writer is writing; it just sees the most-recent consistent state. If a writer isn't done writing yet, the reader doesn't see its changes at all.
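A minimal sketch of those semantics (the counter stands in for whatever state you share):

(def counter (ref 0))

;; writers coordinate through the STM: conflicting transactions retry,
;; so no increment is ever lost
(defn write! []
  (dosync (alter counter inc)))

;; a plain deref never blocks; it sees the most recent committed value
(defn read-counter []
  @counter)

(dorun (pmap (fn [_] (write!)) (range 100)))
(read-counter)   ;=> 100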
As mentioned in the other answers, readers don't block while reading. If you want readers to block, then you could implement them as "writers" that write back the same value they read in the update function. I know this is a weird solution, but maybe it can help you out or give you some further direction.