I am looking into concurrent programming in Clojure.
http://clojure.org/concurrent_programming
I learned that the atom, ref, and agent forms are used to maintain program state.
Only refs are used for coordinated updates, so the dosync macro is used when performing changes.
So it is obvious that the STM engine is involved at that point.
I just want to be clear about the following doubt I have:
Does Clojure's STM have a relationship with the atom and agent forms too, or do
they just use the java.util.concurrent.atomic capabilities?
The STM is related to Agents in that send, send-off and send-via, when called inside a dosync block, only take effect once (and if) the transaction successfully commits.
There is no relationship between the STM and Atoms.
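For illustration, here is a minimal sketch (the ref and agent names are made up) of the agent behaviour described above: the send issued inside the transaction is held by the STM and dispatched only if the transaction commits.
(def counter (ref 0))
(def audit-log (agent []))

(dosync
  (alter counter inc)
  ;; This send is queued by the STM and dispatched only if/when the
  ;; surrounding transaction commits successfully.
  (send audit-log conj @counter))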
I was reading some source code and came across a use of locking in Clojure. It made me think about an atom-based version. What are the differences between the two code snippets below? I think they do the same thing.
(def lock (Object.))

(locking lock
  ...some operation)
(def state (atom true))

(when @state
  (reset! state false)
  ...some operation
  (reset! state true))
Locking (aka synchronization) is only ever needed when multiple threads are changing a piece of mutable state.
The locking macro is a low-level feature that one almost never needs to use in Clojure. It is similar in some ways to a synchronized block in Java.
In Clojure, one normally just uses an atom for this purpose. On rare occasions an agent or ref is called for. In even rarer situations, you can use a dynamic Var to get thread-local mutable state.
Internally, a Clojure atom delegates all concurrency operations to the class java.util.concurrent.atomic.AtomicReference.
Your code snippet shows a misunderstanding of an atom's purpose and operation. Two concurrent threads could run your atom snippet at the same time: both could see true when they deref state before either has reset it to false. So the attempt does not provide thread safety and would result in bugs and corrupted data.
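If the goal is a mutually exclusive critical section guarded by an atom, a minimal sketch might use compare-and-set!, which makes the check-and-claim step atomic (unlike the separate deref and reset! above); the names here are made up:
(def busy? (atom false))

(defn try-operation! []
  (when (compare-and-set! busy? false true)   ; atomically claim the "lock"
    (try
      ;; ...some operation
      (finally
        (reset! busy? false)))))              ; always release the claim
Even so, prefer putting the state itself into the atom and updating it with swap!; the busy-flag pattern is rarely what you actually want.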
If you want to explore the really primitive (i.e. Java 1.2) synchronization primitives, see:
The Oracle docs
The book Effective Java (1st edition shows most details, 3rd Ed. defers to higher-level classes)
The book Java Concurrency in Practice This book is so scary it'll turn your hair white. After reading this one, you'll be so afraid of roll-your-own concurrency that you'll never even think about doing it again.
Double-Checked Locking is Broken: an excellent example of how hard it is to get locking right.
When should I use Clojure's core.async library, and what kinds of applications need that sort of async facility?
Clojure provides four basic mutable reference types: refs, agents, atoms, and thread-local dynamic vars. Can't these mutable references already provide what core.async provides with ease?
Could you provide real world use cases for async programming?
How can I gain an understanding of it so that when I see a problem, it clicks and I say "This is the place I should apply core.async"?
Also, we can use core.async in ClojureScript, which is a single-threaded environment; what are the advantages there (besides avoiding callback hell)?
You may wish to read these:
Clojure core.async Channels, an introductory blog post by Rich Hickey
Mastering Concurrent Processes with core.async, the Brave Clojure chapter
The best use case for core.async is ClojureScript, since it allows you to simulate multi-threaded programming and avoid Callback Hell.
In JVM Clojure, core.async can also be handy where you want a (lightweight) producer-consumer architecture. Of course, you could always use native Java queues for that as well.
It's important to point out that there are two common meanings associated with the word 'async' in programming circles:
asynchronous messaging: systems in which components send messages without expecting a response from their consumers, and often without even knowing who the consumers are (typically via queues)
non-blocking (a.k.a. event-driven) I/O: programs structured so that they don't block expensive computational resources (threads, cores, ...) while awaiting a response. There are several approaches, at varying levels of abstraction, for dealing with such systems: callback-based APIs (low-level, and difficult to manage because they are based on side effects); Promises/Futures/Deferreds (representations of values we don't yet have, which are more manageable because they are value-based); and green threads ('logical' threads which emulate ordinary control flow but are inexpensive)
core.async is very opinionated towards the first of these (asynchronous messaging via queues), and provides a macro, go, for implementing green threading.
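A minimal sketch of that style, with a producer and a consumer exchanging messages over a channel from lightweight go blocks (the channel and values are made up):
(require '[clojure.core.async :as async :refer [chan go go-loop <! >! close!]])

(def messages (chan 10))                ; buffered channel acting as a queue

;; Producer: a lightweight "green thread" putting values on the channel.
(go
  (doseq [n (range 5)]
    (>! messages n))
  (close! messages))

;; Consumer: reads until the channel is closed, without blocking a real thread.
(go-loop []
  (when-some [v (<! messages)]
    (println "got" v)
    (recur)))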
From my experience, if non-blocking is all you need, I would personally recommend starting with Manifold, which makes fewer assumptions about your use case, then use core.async for the more advanced use cases where it falls short; note that both libraries interoperate well.
I find it useful for fine-grained, configurable control of side-effect parallelism on the JVM.
e.g. If the following executes a read from Cassandra, returning an async/chan:
(arche.async/execute connection :key {:values {:id "1"}})
Then the following performs a series of executions in parallel, where the degree of parallelism is configurable, and the results are returned in order.
(async/pipeline-async
  n-parallelism
  out
  #(arche/execute connection :key {:values %1 :channel %2})
  (async/to-chan [{:id "1"} {:id "2"} {:id "3"} ... ]))
Probably quite particular to my niche, but you get the idea.
https://github.com/troy-west/arche
Core.async provides building blocks for socket-like programming, which is useful for coordinating producer/consumer interaction in Clojure. Core.async's lightweight threads (go blocks) let you write imperative-style code that reads from channels instead of using callbacks in the browser. On the JVM, the lightweight threads let you make full use of your CPU cores.
You can see an example of a full-stack CLJ/CLJS producer/consumer Core.async chat room here: https://github.com/briangorman/hablamos
Another killer feature of core.async is pipelines. Often in data processing the initial processing stage takes up most of the CPU time, while later stages (e.g. reducing) take significantly less. With the async pipeline feature, you can split the processing across channels to add parallelization to your pipeline. Core.async channels also work with transducers, so channels play nicely with the rest of the language.
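A minimal sketch of that idea, assuming a CPU-heavy parse stage (the function here is a stand-in): pipeline runs it on up to 8 threads while keeping results in order on the output channel.
(require '[clojure.core.async :as async]
         '[clojure.string :as str])

(defn expensive-parse [line]            ; placeholder for the CPU-heavy stage
  (str/upper-case line))

(let [in  (async/to-chan ["a" "b" "c"])
      out (async/chan 16)]
  (async/pipeline 8 out (map expensive-parse) in)   ; 8 = degree of parallelism
  (async/<!! (async/into [] out)))                  ; => ["A" "B" "C"]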
I'm using jgrapht, a graph library for Java, as the backbone of my graph operations. It mutates its state on every change, like adding or removing an edge or a vertex.
I'm accessing this "object" from multiple threads / go-loops.
I started naively by wrapping the graph object in an atom, but as far as I understand, an atom doesn't (and can't) protect against directly mutating the state of its contents. It is a safeguard only if all changes go through reset! or swap!.
I've switched to refs and started doing my mutations in dosync blocks, but I still notice some weird behaviour from time to time. The problems are hard to pinpoint since they only appear at runtime, as you would expect.
I'm not experienced in Clojure ecosystem, so I'd appreciate if you can point me into a couple of alternative strategies for dealing with stateful objects from Java in Clojure.
PS: I'm aware of loom and would love to use it, but it lacks for my most important requirement: finding all simple cycles in a weighted directed graph with negative weights.
Clojure's STM features (e.g. dosync and ref) only interact, as far as I understand them, with other Clojure features. They aren't going to interact with mutable Java objects in the ways you might hope. The atom type isn't going to help here either, unless you felt like copying the entire graph each time you wanted to perform an operation on it. Unfortunately, in this case you have a mutable object with uncertain thread-safety characteristics, so you're going to have to use some kind of lock. There's a built-in macro to acquire the monitor lock: (locking obj body). You could just use the graph itself as the monitor object, or you can create other kinds of locks, such as a read-write lock, as needed. Each time you access part of the graph or mutate/update it, you're going to need to acquire and release the lock.
Just a note that we recently received a contribution of AsSynchronizedGraph in JGraphT:
https://github.com/jgrapht/jgrapht/blob/master/jgrapht-core/src/main/java/org/jgrapht/graph/concurrent/AsSynchronizedGraph.java
It allows you to protect a graph with a wrapper which uses a ReadWriteLock. There are a bunch of usage caveats in the Javadoc.
It's not released yet, but it's available in the latest SNAPSHOT build.
As far as I understand, you are talking about a Java class, not a Clojure data structure. If I'm right, there is no sense in wrapping that instance in an atom or any other reference type, because the rest of your code may still modify the instance directly.
Clojure has a special locking macro that holds a monitor on any Java object while executing a set of actions on it.
Its typical usage might look something like this:
(def graph (some.java.Graph. 1 2 3 4))

(locking graph
  (.modify graph)
  (.update graph))
See the documentation page for more info.
Given: a complex structure of various nested collections, with refs scattered at different levels.
Need: A way to take a snapshot of such a structure, while allowing writes to continue to happen in other threads.
So the "reader" thread needs to read the whole complex state in a single long transaction. The "writer" thread meanwhile makes modifications in multiple short transactions. As far as I understand, in such a case the STM engine relies on the refs' history.
Here we get some interesting results. For example, the reader reaches some ref 10 seconds after the beginning of its transaction, while the writer modifies that ref every second. That results in 10 values in the ref's history. If this exceeds the ref's :max-history limit, the reader transaction will be rerun forever. If it exceeds :min-history, the transaction may be rerun several times.
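For reference, a minimal sketch of the history settings being described (the numbers are illustrative only):
(def r (ref 0 :min-history 5 :max-history 30))

(ref-history-count r)   ; how many historical values are currently retained
(ref-min-history r)     ; => 5
(ref-max-history r)     ; => 30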
But really the reader needs just a single value of each ref (the first one) and the writer needs just the most recent one. All the intermediate values in the history list are useless. Is there a way to avoid such history overuse?
Thanks.
To me it's a bit of a "design smell" to have a large structure with lots of nested refs. You are effectively emulating a mutable object graph, which is a bad idea if you believe Rich Hickey's take on concurrency.
Some various thoughts to try out:
The idiomatic way to solve this problem in Clojure would be to put the state in a single top-level ref, with everything inside it being immutable (see the sketch after this list). Then the reader can take a snapshot of the entire concurrent state for free (without even needing a transaction). It might be difficult to refactor to this from where you currently are, but I'd say it is best practice.
If you only want the reader to get a snapshot of the top-level ref, you can just deref it directly outside of a transaction. Just be aware that the refs inside may continue to get mutated, so whether this is useful or not depends on the consistency requirements you have for the reader.
You can do everything within a (dosync...) transaction as normal for both readers and writer. You may get contention and transaction retries, but it may not be an issue.
You can create a "snapshot" function that quickly traverses the graph and dereferences all the refs within a transaction, returning the result with the refs stripped out (or replaced by new cloned refs). The reader calls snapshot once, then continues to do the rest of its work after the snapshot is completed.
You could take a snapshot immediately each time after the writer finishes, and store it separately in an atom. Readers can use this directly (i.e. only the writer thread accesses the live data graph directly)
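A minimal sketch of the single-top-level-ref approach from the first point (the shape of the state map is made up): because everything inside the ref is immutable, a reader's snapshot is just a deref, with no transaction needed.
(def world (ref {:nodes {} :edges {}}))

;; Writer: short transactions that swap in a new immutable value.
(dosync
  (alter world assoc-in [:nodes :a] {:weight 1}))

;; Reader: a consistent point-in-time snapshot of the entire state, for free.
(let [snapshot @world]
  (count (:nodes snapshot)))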
The general answer to your question is that you need two things:
A flag to indicate that the system is in "snapshot write" mode
A queue to hold all transactions that occur while the system is in snapshot mode
As far as what to do if the queue overflows because the snapshot process isn't fast enough, well, there isn't much you can do about that except either optimize that process or increase the size of your queue; it's a balance you'll have to strike depending on the needs of your app. It's a delicate balance, and it is going to take some pretty extensive testing, depending on how complex your system is.
But you're on the right track. If you basically put the system in "snapshot write mode", then your reader/writer methods should automatically change where they are reading/writing from, so that the thread that is making changes gets all the "current values" and the thread reading the snapshot state is reading all the "snapshot values". You can split these up into separate methods - the snapshot reader will use the "snapshot value" methods, and all other threads will read the "current value" methods.
When the snapshot reader is done with its work, it needs to clear the snapshot state.
If a thread tries to read the "snapshot values" when no "snapshot state" is currently set, they should simply respond with the "current values" instead. No biggie.
Systems that allow snapshots of file systems to be taken for backup purposes, while not preventing new data from being written, follow a similar scheme.
Finally, unless you need to keep a record of all changes to the system (i.e. for an audit trail), then the queue of transactions actually doesn't need to be a queue of changes to be applied - it just needs to store the latest value of whatever thing you're changing in the system. When the "snapshot state" is cleared, you simply write all those non-committed values to the system, and call it done. The thing you might want to consider is making a log of those changes yet to be made, in case you need to recover from a crash, and have those changes still applied. The log file will give you a record of what happened, and can let you do this recovery. That's an oversimplification of the recovery process, but that's not really what your question is about, so I'll stop there.
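A minimal sketch of that idea, assuming a single writer, ignoring the edge cases around flipping the flag, and using atoms for the flag, the live state, and the pending values (all names here are made up):
(def live-state (atom {}))
(def snapshot-mode? (atom false))
(def pending (atom {}))   ; latest value per key written during snapshot mode

(defn write! [k v]
  (if @snapshot-mode?
    (swap! pending assoc k v)      ; defer: keep only the latest value
    (swap! live-state assoc k v))) ; normal path: write to the live state

(defn end-snapshot! []
  ;; fold the deferred writes back into the live state and clear the flag
  (swap! live-state merge @pending)
  (reset! pending {})
  (reset! snapshot-mode? false))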
What you are after is the state-of-the-art in high-performance concurrency. You should look at the work of Nathan Bronson, and his lab's collaborations with Aleksandar Prokopec, Phil Bagwell and the Scala team.
Binary Tree:
http://ppl.stanford.edu/papers/ppopp207-bronson.pdf
https://github.com/nbronson/snaptree/
Tree-of-arrays-based Hash Map
http://lampwww.epfl.ch/~prokopec/ctries-snapshot.pdf
However, a quick look at the implementations above should convince you this is not "roll-your-own" territory. I'd try to adapt an off-the-shelf concurrent data structure to your needs if possible. Everything I've linked to is freely available on the JVM, but it's not native Clojure as such.
I want to use a lot of reactive (dataflow) programming techniques in my Clojure program. Is using add-watch on Clojure refs going to be good enough to do this? A simple case would be updating the GUI when the underlying data changes.
Yes, that is indeed a good idea. I have used it in my own code to update UI elements when streaming data changes. The only thing you need to be careful of is that the watchers are called synchronously: on the agent's thread for an agent, or on the updating thread for an atom, ref, or var. So, to avoid blocking that thread, don't do too much processing in the watchers; if you need to, create a future.
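A minimal sketch of that pattern (update-gui! is a hypothetical stand-in for your rendering function):
(def model (atom {:value 0}))

(defn update-gui! [new-state]
  (println "redraw with" new-state))   ; placeholder for real UI work

(add-watch model :gui
  (fn [_key _ref old-state new-state]
    (when (not= old-state new-state)
      ;; Watches run synchronously on the updating thread, so hand heavy
      ;; work off to a future instead of blocking here.
      (future (update-gui! new-state)))))

(swap! model update :value inc)   ; triggers the watch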