I am doing something involving threads in Clojure.
I am currently working in the REPL, writing and updating code there.
My problem is that sometimes futures are left running, and I have lost the references to them through some mistake. My only way to stop them is to restart the REPL.
I want to know: is there any way to stop running futures (threads) if I no longer have a reference to them?
The Clojure public API provides only one way to stop running futures:
(shutdown-agents)
But that will not interrupt your future jobs. Instead, you can interrupt them with:
(import 'clojure.lang.Agent)
(.shutdownNow Agent/soloExecutor)
But please keep in mind that after the operations described above, Agent/soloExecutor will no longer accept new tasks. One way to deal with this is to reassign the soloExecutor field in the Agent class. Fortunately, it is public and not final.
(import 'java.util.concurrent.Executors)
(set! Agent/soloExecutor (Executors/newCachedThreadPool))
;; In the original Clojure source the thread pool is created with a thread
;; factory, but that createThreadFactory method is private. For this purpose
;; the code above should work just fine without a thread factory.
(future (Thread/sleep 1000) (println "done") 100) ;; now works fine
But in my opinion this is not the recommended way to do things in the REPL. It's much better not to lose your future references in the first place.
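For example, here is a minimal sketch (the registry and helper names are my own, not part of any library) of keeping every REPL-started future in an atom so it can still be cancelled after you lose the direct reference:
(defonce running-futures (atom {}))   ;; hypothetical registry: keyword -> future

(defn tracked-future* [k f]
  (let [fut (future (f))]
    (swap! running-futures assoc k fut)
    fut))

(defmacro tracked-future [k & body]
  `(tracked-future* ~k (fn [] ~@body)))

;; start work under a name, then cancel it later even if you never bound it
(tracked-future :slow-job (Thread/sleep 60000) (println "done"))
(future-cancel (@running-futures :slow-job))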
Related
I was reading some source code and came across a use of locking in Clojure. It made me think about an atom-based version. So what are the differences between the two code snippets below? I think they do the same thing.
(def lock (Object.))
(locking lock
  ...some operation)
(def state (atom true))
(when @state
  (reset! state false)
  ...some operation
  (reset! state true))
Locking (aka synchronization) is only ever needed when multiple threads are changing a piece of mutable state.
The locking macro is a low-level feature that one almost never needs to use in Clojure. It is similar in some ways to a synchronized block in Java.
In Clojure, one normally just uses an atom for this purpose. On rare occasions an agent or ref is called for. In even rarer situations, you can use a dynamic Var to get thread-local mutable state.
Internally, a Clojure atom delegates all concurrency operations to the class java.util.concurrent.atomic.AtomicReference.
Your code snippet shows a misunderstanding of an atom's purpose and operation. Two concurrent threads could process your atom code snippet at the same time, so the attempt does not provide thread safety and would result in bugs and corrupted data.
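To make the race concrete, here is a minimal sketch (my own illustration, not from the question) of how two threads can both pass the (when @state ...) check before either reset! runs, followed by the idiomatic swap!-based alternative:
(def state (atom true))

(defn not-a-lock []
  (when @state             ; two threads can both read true here...
    (reset! state false)   ; ...before either of them runs this reset!
    (Thread/sleep 50)      ; simulated critical section
    (reset! state true)))

;; both futures may be inside the "critical section" at the same time
(future (not-a-lock))
(future (not-a-lock))

;; the idiomatic approach: keep the shared data in the atom and update it
;; with swap!, which retries the pure update function on contention
(def counter (atom 0))
(swap! counter inc)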
If you want to explore the really primitive (i.e. Java 1.2) synchronization primitives, see:
The Oracle docs
The book Effective Java (1st edition shows most details, 3rd Ed. defers to higher-level classes)
The book Java Concurrency in Practice This book is so scary it'll turn your hair white. After reading this one, you'll be so afraid of roll-your-own concurrency that you'll never even think about doing it again.
Double-Checked Locking is Broken: An excellent example of how hard it is to get locking right (also here).
I'm using jgrapht, a graph library for Java, as the backbone of my graph operations. It mutates its state on every change, such as adding or removing an edge or a vertex.
I'm accessing this "object" from multiple threads / go-loops.
I started out naively wrapping the graph object in an atom, but as far as I understand, an atom doesn't (and can't) protect against directly changing the state of its contents. It's a safeguard only if you are able to use the reset! or swap! functions.
I switched to refs and started doing my mutations in dosync blocks, but I still notice some weird behaviour from time to time. The problems are hard to pinpoint since they appear at runtime, as you would expect.
I'm not experienced in the Clojure ecosystem, so I'd appreciate it if you could point me to a couple of alternative strategies for dealing with stateful Java objects in Clojure.
PS: I'm aware of loom and would love to use it, but it lacks my most important requirement: finding all simple cycles in a weighted directed graph with negative weights.
Clojure's STM features (e.g. dosync, ref) only interact, as far as I understand them, with other Clojure features. They aren't going to interact with mutable Java objects in the ways you might hope. The atom type isn't going to help here either, unless you felt like copying the entire graph each time you wanted to perform an operation on it.
Unfortunately, in this case you have a mutable object with uncertain thread-safety characteristics, so you're going to have to use some kind of lock. There's a built-in macro that acquires an object's monitor lock: (locking obj body). You could just use the graph itself as the monitor object, or you can create other kinds of locks, such as a read-write lock, as needed. Each time you access part of the graph or mutate/update it, you're going to need to acquire and release the lock.
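As an illustration of the read-write-lock option (the lock helpers are my own, and graph plus the method calls at the end are hypothetical, for shape only), a sketch using java.util.concurrent.locks.ReentrantReadWriteLock might look like:
(import '(java.util.concurrent.locks ReentrantReadWriteLock))

(def ^ReentrantReadWriteLock graph-lock (ReentrantReadWriteLock.))

(defmacro with-read-lock [& body]
  `(let [l# (.readLock graph-lock)]
     (.lock l#)
     (try ~@body (finally (.unlock l#)))))

(defmacro with-write-lock [& body]
  `(let [l# (.writeLock graph-lock)]
     (.lock l#)
     (try ~@body (finally (.unlock l#)))))

;; hypothetical usage with a graph: many readers may run at once,
;; but a writer waits for exclusive access
(with-read-lock  (.containsVertex graph "a"))
(with-write-lock (.addEdge graph "a" "b"))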
Just a note that we recently received a contribution of AsSynchronizedGraph in JGraphT:
https://github.com/jgrapht/jgrapht/blob/master/jgrapht-core/src/main/java/org/jgrapht/graph/concurrent/AsSynchronizedGraph.java
It allows you to protect a graph with a wrapper which uses a ReadWriteLock. There are a bunch of usage caveats in the Javadoc.
It's not released yet, but it's available in the latest SNAPSHOT build.
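Usage from Clojure would presumably look something like the sketch below; this assumes the wrapper's single-argument constructor taking the underlying graph, as shown in the linked Javadoc, so double-check the class and its caveats once it's released:
(import '(org.jgrapht.graph DefaultDirectedWeightedGraph DefaultWeightedEdge)
        '(org.jgrapht.graph.concurrent AsSynchronizedGraph))

;; wrap a plain graph in the synchronized view; the usual Graph methods
;; then go through the wrapper's ReadWriteLock (see the Javadoc caveats)
(def g (AsSynchronizedGraph.
         (DefaultDirectedWeightedGraph. DefaultWeightedEdge)))

(.addVertex g "a")
(.addVertex g "b")
(.addEdge g "a" "b")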
As far as I understand, you are talking about a Java class, not a Clojure data structure. If I'm right, there is no point in wrapping that instance in an atom or any other reference type, because the rest of your code may still modify the instance directly.
Clojure has a special locking macro that holds a monitor on any Java object while executing a set of actions on it.
Its typical usage might look like this:
(def graph (some.java.Graph. 1 2 3 4))
(locking graph
  (.modify graph)
  (.update graph))
See the documentation page for more info.
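One possible refinement (my own suggestion, reusing the hypothetical .modify/.update methods above) is to centralize the locking in a helper function so callers can't forget to take the monitor:
(defn update-graph! [graph f & args]
  (locking graph
    (apply f graph args)))

;; usage
(update-graph! graph (fn [g] (.modify g) (.update g)))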
Assuming that the ref in the following code is modified in other transactions as well as the one below,
my concern is that this transaction will run until it's time to commit, fail on commit, then re-run the transaction.
(defn modify-ref [my-ref]
  (dosync (if (some-prop-of-ref-true @my-ref)
            (alter my-ref long-running-calculation))))
Here's my fear in full:
modify-ref is called, a transaction is started (call it A), and long-running-calculation starts
another transaction (call it B) starts, modifies my-ref, and returns (commits successfully)
long-running-calculation continues until it is finished
transaction A tries to commit but fails because my-ref has been modified
the transaction is restarted (call it A') with the new value of my-ref and exits because some-prop is not true
Here's what I would like to happen, and perhaps this is what happens (I just don't know, so I'm asking the question :-)
When the transaction B commits my-ref, I'd like transaction A to immediately stop (because the value of my-ref has changed) and restart with the new value. Is that what happens?
The reason I want this behavior is so that long-running-calculation doesn't waste all that CPU time on a calculation that is now obsolete.
I thought about using ensure, but I'm not sure how to use it in this context or if it is necessary.
It works as you fear.
Stopping a thread in the JVM doing whatever it is doing requires a collaborative effort so there is no generic way for Clojure (or any other JVM language) to stop a running computation. The computation must periodically check a signal to see if it should stop itself. See How do you kill a thread in Java?.
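Just to illustrate what that collaborative check can look like on its own (stop?, the loop bound, and expensive-step are my own hypothetical names, not from the question):
(def stop? (atom false))   ; the signal the computation polls

(defn long-running-calculation [x]
  (loop [i 0 acc x]
    (cond
      @stop?        acc     ; asked to stop: give up early
      (= i 1000000) acc     ; normal completion
      :else         (recur (inc i) (expensive-step acc)))))   ; expensive-step is hypothetical

;; elsewhere, once the result is known to be obsolete:
(reset! stop? true)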
About how to implement it: I would say that it is just too hard, so I would first measure whether it is really an issue. If it is, I would see if a traditional pessimistic lock is a better solution. If pessimistic locks are still not the solution, I would try to build something that runs the computation outside the transactions, uses watchers on the refs, and sets the refs conditionally after the computation if they still have the same value. Of course this runs outside the transaction boundaries and is probably a lot trickier than it sounds.
About ensure: only refs that are being modified participate in the transaction, so you can suffer from write skew. See Clojure STM ambiguity factor for a longer explanation.
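For completeness, here is a minimal sketch (the ref names are hypothetical) of using ensure so that a ref you only read still participates in the transaction, forcing a retry if another transaction writes to it:
(dosync
  (when (some-prop-of-ref-true (ensure config-ref))  ; read-only ref now participates
    (alter result-ref long-running-calculation)))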
This doesn't happen, because... well, how could it? Your function long-running-calculation doesn't have any code in it to handle stopping prematurely, and that's the code that's being run at the time you want to cancel the transaction. So the only way to stop it would be to preemptively stop the thread from executing and forcibly restart it at some other location. This is terribly dangerous, as java.lang.Thread/stop discovered back in Java 1.1; the side effects could be a lot worse than some wasted CPU cycles.
refs do attempt to solve this problem, sorta: if there's one long-running transaction that has to restart itself many times because shorter transactions keep sneaking in, it will take a stronger lock and run to completion. But this is a pretty rare occurrence (heck, even needing to use refs is rare, and this is a rare way for refs to behave).
As I write more core.async code, a very common pattern that emerges is a go-loop that alts over a sequence of channels and does some work in response to a message, e.g.:
(go-loop [state {}]
  (let [[value task] (alts! tasks)]
    ...work...
    (recur state)))
I don't feel like I understand the tradeoffs of the various ways I can actually do the work though, so I thought I'd try to explore them here.
Inline or by calling a function: this blocks the loop from continuing until the work is complete. Since it's in a go block, one wouldn't want to do I/O or locking operations.
>! a message to a channel monitored by a worker: if the channel is full, this blocks the loop by parking until the channel has capacity. This allows the thread to do other work and allows back pressure.
>!! a message: if the channel is full, this blocks by sleeping the thread running the go loop. This is probably undesirable because go threads are a strictly finite resource.
>! a message within another go block: this will succeed nearly immediately unless there are no go threads available. Conversely, if the channel is full and is being consumed slowly, this could starve the system of go threads in short order.
>!! a message with a thread block: similar to the go block, but consuming system threads instead of go threads, so the upper bound is probably higher
put! a message: it's unclear what the tradeoffs are
call the work function in a future: gives the work to a thread from the Clojure agent pool, and allows the go loop to continue. If the input rate exceeds the output rate, this grows the amount of in-flight work without bound.
Is this summary correct and comprehensive?
If the work to be done is entirely CPU-bound, then I would probably do it inline in the go block, unless it's an operation that may take a long time and I want the go block to continue responding to other messages.
In general, any work which doesn't block, sleep, or do I/O can be safely put in a go block without having a major impact on the throughput of the system.
You can use >! to submit work to a worker or pool of workers. I would almost never use >!! in a go block because it can block one of the finite number of threads allocated to running go blocks.
When you need to do I/O or a potentially long-running computation, use a thread instead of a go. This is very similar to future — it creates a real thread — but it returns a channel like go.
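To make the >!/thread combination concrete, here is a minimal sketch (the channel names and do-blocking-work are my own, hypothetical) of a bounded work channel consumed by a fixed set of thread workers, with the dispatching go-loop parking on >! when the workers fall behind:
(require '[clojure.core.async :as async :refer [go-loop thread chan <! >! <!!]])

(def work-chan (chan 16))            ; bounded buffer gives back pressure

;; a fixed pool of real threads for blocking or long-running work
(dotimes [_ 4]
  (thread
    (loop []
      (when-some [job (<!! work-chan)]
        (do-blocking-work job)       ; hypothetical blocking function
        (recur)))))

;; the dispatching go-loop parks on >! whenever the workers fall behind
(go-loop []
  (when-some [job (<! incoming-chan)]  ; incoming-chan is hypothetical
    (>! work-chan job)
    (recur)))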
put! is a lower-level operation generally used at the "boundaries" of core.async to connect it to conventional callback-based interfaces. There's rarely any reason to use put! inside a go.
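For example, put! is what you'd reach for when a callback-based Java API needs to hand events to core.async (the listener function below is hypothetical):
(require '[clojure.core.async :as async])

(def events (async/chan 100))

;; hypothetical callback registered with a non-core.async library;
;; put! hands the event to core.async without needing a go block
(defn on-message [msg]
  (async/put! events msg))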
core.async can support fine-grained control over how threads are created. I demonstrated a few possibilities in a blog post, Parallel Processing with core.async.
I have lightly CPU-intensive functions that I want to run in parallel. Which concurrency primitive should I use?
Using agents and futures does not seem worthwhile, as the cost of creating a new thread for these processes is not justified.
I basically want to run a few light functions concurrently, without creating threads. Can I do that?
Thanks,
Murtaza
Have you benchmarked?
Agents might well be a good solution anyway, since they use a fixed-size thread pool that gets re-used (so you aren't creating new threads constantly).
I've benchmarked quickly on my machine and can do over a million agent calls in 3 seconds:
(def ag (agent 0))
(time (dotimes [i 1000000] (send ag inc)))
=> "Elapsed time: 2882.170586 msecs"
If agents are still too heavyweight (unlikely?), then you should probably be looking for a way to batch up a group of functions into a single block of work. If you do this, then the overhead of the concurrency primitives will be minimal.
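As a sketch of the batching idea (the names here are my own, not from the answer), you can partition the light functions into chunks and run one chunk per parallel task, e.g. with pmap:
(defn run-batched
  "Run a collection of zero-argument functions, one batch per parallel task."
  [light-fns batch-size]
  (->> light-fns
       (partition-all batch-size)
       (pmap (fn [batch] (mapv #(%) batch)))
       (apply concat)
       doall))

;; e.g. 10,000 cheap functions, 100 per task
(run-batched (repeat 10000 #(reduce + (range 100))) 100)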