I have some core.async code with a pipeline of two chans and three nodes:
a producer - a function that puts values onto chan1 with >!! (it's not in a go-block, but the function is called from inside a go-loop)
a filter - another function that's not in a go-block but is called within a go-loop; it pulls items from chan1 (with <!!), applies a test, and if the test passes pushes them onto chan2 (with >!!)
a consumer - an ordinary loop that pulls n values off chan2 with <!!
This code works as expected when I run it as a simple program. But when I copy and paste it to work within a unit-test, it freezes up.
My test code is roughly
(deftest a-test
  (testing "blah"
    (is (= (let [c1 (chan)
                 c2 (chan)
                 gen (make-generator c1)
                 filt (make-filter c1 c2)
                 result (collector c2 10)]
             result)
           [0 2 4 6 8 10 12 14 16 18 20]))))
where the generator creates a sequence of integers counting up from zero and the filter tests for evenness.
As far as I can tell, the filter is able to pull the first value from c1, but is blocked waiting for a second value. Meanwhile, the generator is blocking while waiting to push its next value into c1.
But this doesn't happen when I run the code in a simple stand-alone program.
So, is there any reason that the unit-test framework might be interfering or causing problems with the threading management that core.async is providing? Is it possible to do unit-testing on async code like this?
I'm concerned that I'm not running the collector in any kind of go-block or go-loop so presumably it might be blocking the main thread. But equally, I presume I have to pull all the data back into the main thread eventually. And if not through that mechanism, how?
While using blocking I/O within go-blocks/go-loops isn't the best solution, the thread macro may be a better fit here. It executes the passed body on a separate thread, so you may freely use blocking operations there.
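A minimal sketch of that restructuring, reusing the asker's function names (the generator's range and the exact bodies are assumptions):

```clojure
(require '[clojure.core.async :refer [thread <!! >!! close!]])

;; Producer: runs on a real thread, so a blocking >!! is safe.
(defn make-generator [out]
  (thread
    (doseq [i (range 100)]     ; assumed: integers counting up from zero
      (>!! out i))
    (close! out)))

;; Filter: also a real thread; blocking takes/puts are fine here.
(defn make-filter [in out]
  (thread
    (loop []
      (when-some [v (<!! in)]
        (when (even? v)
          (>!! out v))
        (recur)))
    (close! out)))

;; Collector: a plain blocking loop on the main (test) thread.
(defn collector [in n]
  (vec (repeatedly n #(<!! in))))
```

Note also that the expected vector in the test has 11 elements while the collector takes only 10 values, so the assertion would fail even once the deadlock is gone.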
I haven't done any Clojure for a couple of years, so I decided to go back and not ignore core.async this time around. Pretty cool stuff, that, but it surprised me almost immediately. Now, I understand that there's inherent indeterminism when multiple threads are involved, but there's something bigger than that at play here.
The source code for my oh-so-simple example, where I am trying to copy lines from STDIN to a file:
(defn append-to-file
  "Write a string to the end of a file"
  ([filename s]
   (spit filename (str s "\n") :append true))
  ([s]
   (append-to-file "/tmp/journal.txt" s)))
(defn -main
  "I don't do a whole lot ... yet."
  [& args]
  (println "Initializing..")
  (let [out-chan (a/chan)]
    (loop [line (read-line)]
      (if (empty? line) :ok
        (do
          (go (>! out-chan line))
          (go (append-to-file (<! out-chan)))
          (recur (read-line)))))))
except, of course, this turned out to be not so simple. I think I've narrowed it down to something that's not properly cleaned up. Basically, running the main function produces inconsistent results. Sometimes I run it 4 times and see 12 lines in the output. But sometimes 4 runs will produce just 10 lines. Or, like below, 3 runs produce 6 lines:
akamac.home ➜ coras git:(master) ✗ make clean
cat /dev/null > /tmp/journal.txt
lein clean
akamac.home ➜ coras git:(master) ✗ make compile
lein uberjar
Compiling coras.core
Created /Users/akarpov/repos/coras/target/uberjar/coras-0.1.0-SNAPSHOT.jar
Created /Users/akarpov/repos/coras/target/uberjar/coras-0.1.0-SNAPSHOT-standalone.jar
akamac.home ➜ coras git:(master) ✗ make run
java -jar target/uberjar/coras-0.1.0-SNAPSHOT-standalone.jar < resources/input.txt
Initializing..
akamac.home ➜ coras git:(master) ✗ make run
java -jar target/uberjar/coras-0.1.0-SNAPSHOT-standalone.jar < resources/input.txt
Initializing..
akamac.home ➜ coras git:(master) ✗ make run
java -jar target/uberjar/coras-0.1.0-SNAPSHOT-standalone.jar < resources/input.txt
Initializing..
akamac.home ➜ coras git:(master) ✗ make check
cat /tmp/journal.txt
line a
line z
line b
line a
line b
line z
(Basically, sometimes a run produced 3 lines, sometimes 0, sometimes 1 or 2).
The fact that lines appear in random order doesn't bother me - go blocks do things in a concurrent/threaded manner, and all bets are off. But why they don't do all of work all the time? (Because I am misusing them somehow, but where?)
Thanks!
There are several problems with this code; let me walk through them real quick:
1) Every time you call (go ...) you're spinning off a new "thread" that will be executed in a thread pool. It is undefined when this thread will run.
2) You aren't waiting for the completion of these threads, so it's possible (and very likely) that you will end up reading several lines from the file, and writing several lines to the channel, before a take even occurs.
3) You are firing off multiple calls to append-to-file at the same time (see #2); these functions are not synchronized, so it's possible that multiple threads will append at once. Since access to files in most OSes is uncoordinated, it's possible for two threads to write to your file at the same time, overwriting each other's results.
4) Since you are creating a new go block for every line read, it's possible they will execute in a different order than you expect; this means the lines in the output file may be out of order.
I think all of this can be fixed by avoiding a rather common anti-pattern with core.async: don't create go blocks (or threads) inside unbounded or large loops. Often this does something you don't expect. Instead, create one core.async/thread with a loop that reads from the input (since it's doing IO, never do IO inside a go block) and writes to the channel, and one that reads from the channel and writes to the output file.
View this as an assembly line build out of workers (go blocks) and conveyor belts (channels). If you built a factory you wouldn't have a pile of people and pair them up saying "you take one item, when you're done hand it to him". Instead you'd organize all the people once, with conveyors between them and "flow" the work (or data) between the workers. Your workers should be static, and your data should be moving.
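Under that advice, the example might be restructured roughly like this (append-to-file is the asker's function; the rest is a sketch, not the answerer's exact code):

```clojure
(require '[clojure.core.async :refer [chan thread <!! >!! close!]])

(defn -main [& args]
  (println "Initializing..")
  (let [out-chan (chan)
        ;; One static worker feeds the conveyor from stdin...
        reader (thread
                 (loop [line (read-line)]
                   (if (empty? line)
                     (close! out-chan)          ; signal end of input
                     (do (>!! out-chan line)
                         (recur (read-line))))))
        ;; ...and one static worker drains it into the file.
        writer (thread
                 (loop []
                   (when-some [line (<!! out-chan)]
                     (append-to-file line)
                     (recur))))]
    ;; Block until the writer has processed everything.
    (<!! writer)))
```

Two fixed workers and one channel between them: lines are written in order, exactly once, and the program only exits after the writer's thread channel yields its result.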
.. and of course, this was a misuse of core.async on my part:
If I care about seeing all the data in the output, I must use a blocking 'take' on the channel, when I want to pass the value to my I/O code -- and, as it was pointed out, that blocking call should not be inside a go block. A single line change was all I needed:
from:
(go (append-to-file (<! out-chan)))
to:
(append-to-file (<!! out-chan))
I am trying to send requests to a server. Each request is referenced by an integer. The server will only respond to requests that come in ascending order - that is, if I send request 7 and then request 6, it will disregard request 6. I am working in a multi-threaded environment where several threads can send requests concurrently. In Java, I solved the problem this way:
synchronized(this) {
    r = requestId.incrementAndGet(); // requestId is an AtomicInteger
    socket.sendRequest(r, other_parameters);
}
In Clojure, I thought about defining request-id as an Atom and doing the following:
(send-request socket (swap! request-id inc) other-parameters)
Does that work or is it possible thread 1 increments the atom, but by the time the send-request function sends the request, thread 2 increments the atom again and somehow contacts the server first? What is the best way to avoid such a scenario?
Thank you,
Clojure's equivalent construct to synchronized is locking, which can be used in basically the same way:
(locking some-lock
(let [r (.incrementAndGet requestId)]
(.sendRequest socket r other_parameters)))
Where some-lock is the object you're locking on. I'm not sure what you'd want that to be in the absence of this.
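For instance, a dedicated lock object works; send-in-order! and send-lock are hypothetical names, and requestId and socket are assumed to be the Java objects from the question:

```clojure
;; A dedicated lock object to serialize increment-and-send.
(def send-lock (Object.))

(defn send-in-order! [requestId socket other-params]
  (locking send-lock
    ;; No other thread can interleave between the increment
    ;; and the send while send-lock is held.
    (let [r (.incrementAndGet requestId)]
      (.sendRequest socket r other-params))))
```

This is needed precisely because (swap! request-id inc) alone only makes the increment atomic; another thread can still send its (higher-numbered) request before yours reaches the socket.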
I have a function which processes a stream of values from a shared channel and looks for a value which satisfies a specific predicate:
(defn find-my-object [my-channel]
  (go-loop []
    (when-some [value (<! my-channel)]
      (if (some-predicate? value)
        value
        (recur)))))
Now I want this function to return some 'failure' value after waiting for a timeout, which I have implemented like this:
(alts! [(find-my-object my-channel) (timeout 1000)])
The problem with this is that the go-loop above continues to execute after the timeout. I want find-my-object to drain values from my-channel while the search is ongoing, but I don't want to close my-channel on timeout, since it's used in some other places. What is the idiomatic way to implement this functionality?
Read from the timeout channel inside of find-my-object, not outside of it. You can hardcode that if you like, or make the function take a second channel argument, which is the "please stop reading things" channel. Then you can pass it a timeout channel to get the behavior you want now, or pass it some other kind of channel to control it in a different way.
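A sketch of that second-channel version (::not-found as the failure value and the exact shape are assumptions; some-predicate? is the asker's):

```clojure
(require '[clojure.core.async :refer [go-loop alt! timeout <!!]])

(defn find-my-object [my-channel stop-chan]
  (go-loop []
    (alt!
      ;; Anything arriving on stop-chan (including its close
      ;; or a timeout firing) ends the search.
      stop-chan ::not-found
      my-channel ([value]
                  (cond
                    (nil? value) ::not-found        ; my-channel closed
                    (some-predicate? value) value
                    :else (recur))))))
```

The caller now chooses the policy, e.g. `(<!! (find-my-object my-channel (timeout 1000)))` gives up after a second, and because the loop itself exits, it stops draining my-channel without closing it.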
I have followed here and am getting things called segments arriving at the aggregator. These segments all arrive and I can print them out as they arrive. But what I want to do is make an immutable data structure (a vector) out of them as they arrive. Or even wait for them all to arrive and then make the vector. I'll be able to know when the last one has arrived and sort them. I need to conj the arriving segment to the existing so-far-built-up vector. I'm used to creating vectors like this using returns from function calls, but I can't see how this facility is available to me in a thread or go block.
Another option is to use (clojure.core.async/into [] source-chan). Like other "reducer-chans" in the async toolkit, it's built on the assumption that you will close! the source channel when you're ready to receive the result.
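A small usage sketch (the :a/:b/:c segments are placeholder data):

```clojure
(require '[clojure.core.async :as async :refer [chan go >! close! <!!]])

(def source-chan (chan))

;; result-chan will deliver a single vector of everything
;; put on source-chan, once source-chan is closed.
(def result-chan (async/into [] source-chan))

(go
  (doseq [segment [:a :b :c]]
    (>! source-chan segment))
  (close! source-chan))   ; closing triggers delivery of the result

(<!! result-chan)
;; => [:a :b :c]
```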
Presumably, since you are able to receive multiple items, your async code is already in a loop.
In order to build up a vector from the items you get, you should use a loop binding.
;; assumes (require '[clojure.core.async :as async])
(def acc-chan
  (async/go-loop [accumulator []]
    (let [item (async/<! source-chan)]
      (if (nil? item)
        accumulator
        (recur (conj accumulator item))))))
The go-loop call will immediately return acc-chan, which will receive the loop's return value (the accumulator) when the loop exits. The accumulator is re-bound on each iteration of the loop, adding another item to the end. When source-chan is closed, the accumulator is returned from the loop and placed onto acc-chan, where you can read it and use the value.
I'm looking at Clojure core.async for the first time, and was going through this excellent presentation by Rich Hickey: http://www.infoq.com/presentations/clojure-core-async
I had a question about the example he shows at the end of his presentation:
According to Rich, this example basically tries to get a web, video, and image result for a specific query. It tries two different sources in parallel for each of those results, and just pulls out the fastest result for each. And the entire operation can take no more than 80ms, so if we can't get e.g. an image result in 80ms, we'll just give up. The 'fastest' function creates and returns a new channel, and starts two go processes racing to retrieve a result and put it on the channel. Then we just take the first result off of the 'fastest' channel and slap it onto the c channel.
My question: what happens to these three temporary, unnamed 'fastest' channels after we take their first result? Presumably there is still a go process which is parked trying to put the second result onto the channel, but no one is listening so it never actually completes. And since the channel is never bound to anything, it doesn't seem like we have any way of doing anything with it ever again. Will the go process & channel "realize" that no one cares about their results any more and clean themselves up? Or did we essentially just "leak" three channels / go processes in this code?
There is no leak.
Parked gos are attached to channels on which they attempted to perform an operation and have no independent existence beyond that. If other code loses interest in the channels a certain go is parked on (NB. a go can simultaneously become a putter/taker on many channels if it parks on alt! / alts!), then eventually it'll be GC'd along with those channels.
The only caveat is that in order to be GC'd, gos actually have to park first. So any go that keeps doing stuff in a loop without ever parking (<! / >! / alt! / alts!) will in fact live forever. It's hard to write this sort of code by accident, though.
Caveats and exceptions aside, you can test garbage collection on the JVM at the REPL.
eg:
(require '[clojure.core.async :as async])
=> nil
(def c (async/chan))
=> #'user/c
(def d (async/go-loop []
(when-let [v (async/<! c)]
(println v)
(recur))))
=> #'user/d
(async/>!! c :hi)
=> true
:hi ; core.async go block is working
(import java.lang.ref.WeakReference)
=> java.lang.ref.WeakReference ; hold a reference without preventing garbage collection
(def e (WeakReference. c))
=> #'user/e
(def f (WeakReference. d))
=> #'user/f
(.get e)
=> #object[...]
(.get f)
=> #object[...]
(def c nil)
=> #'user/c
(def d nil)
=> #'user/d
(println "We need to clear *1, *2 and *3 in the REPL.")
We need to clear *1, *2 and *3 in the REPL.
=> nil
(println *1 *2 *3)
nil #'user/d #'user/c
=> nil
(System/gc)
=> nil
(.get e)
=> nil
(.get f)
=> nil
What just happened? I set up a go block and checked it was working. Then I used a WeakReference to observe the communication channel (c) and the go block's return channel (d). Then I removed all references to c and d (including *1, *2 and *3 created by my REPL), requested garbage collection (and got lucky; the System.gc Javadoc does not make strong guarantees), and then observed that my weak references had been cleared.
In this case at least, once references to the channels involved had been removed, the channels were garbage collected (regardless of my failure to close them!)
Presumably a channel produced by fastest only returns the result of the fastest query method and then closes.
If a second result was produced, your assumption would hold: the fastest processes would be leaked. Their results are never consumed, and if they relied on all their results being consumed in order to terminate, they wouldn't terminate.
Notice that this could also happen if the channel t is selected in the alt! clause.
The usual way to fix this would be to close the channel c in the last go block with close!. Puts made to a closed channel are then dropped, and the producers can terminate.
The problem could also be solved in the implementation of fastest. The process created in fastest could itself make the put via alts! and timeout and terminate if the produced values are not consumed within a certain amount of time.
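A sketch of that second fix; the fastest signature, query-fn argument, and the 100 ms figure are assumptions based on the talk's description, not Rich's actual slide code:

```clojure
(require '[clojure.core.async :refer [chan go alts! timeout]])

(defn fastest [query & query-fns]
  (let [c (chan)]
    (doseq [query-fn query-fns]
      (go
        (let [result (query-fn query)]
          ;; alts! returns as soon as either the put succeeds or the
          ;; timeout fires; either way this go block terminates instead
          ;; of parking forever on an unwanted put.
          (alts! [[c result] (timeout 100)]))))
    c))
```

With this version the losing racer gives up its put after 100 ms, so nothing stays parked even if the caller only ever takes one value from c.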
I guess Rich did not address the problem in the slide in order to keep the example short.