Synchronization of the standard output in Clojure - concurrency

I have a multithreaded application written in Clojure. I'm having trouble getting text to display correctly in the console when multiple threads write to STDOUT. How can I do this correctly in Clojure, so the lines won't come out interleaved? I think this would involve some kind of separate IO agent, but I'm not really sure how to do that.

I think this would involve some kind of separate IO agent
Yes, that should work. Create an agent with (def printer (agent nil)) and send it the appropriate print action, e.g., (send printer (fn [_] (println msg))). Note that the function passed to send receives the agent's current state as its first argument, so a zero-argument function like #(println msg) would throw an arity error. The messages are put in a queue and executed (asynchronously) one at a time.
For logging purposes you could also look at tools.logging which uses agents under the hood.
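For illustration, here is a minimal sketch of that agent-based printer; the log! name and its arguments are placeholders, not part of the original answer:

(def printer (agent nil))

(defn log! [& msgs]
  ;; the agent's state (nil here, ignored) is passed as the first argument
  (send printer (fn [_] (apply println msgs))))

;; safe to call from any thread; lines are printed one at a time
(log! "worker" 1 "finished")
(log! "worker" 2 "finished")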

Related

How to best shut down a clojure core.async pipeline of processes

I have a Clojure processing app that is a pipeline of channels. Each processing step does its computation asynchronously (i.e., makes an HTTP request using http-kit or something) and puts its result on the output channel. This way the next step can read from that channel and do its computation.
My main function looks like this
(defn -main [args]
  (-> file/tmp-dir
      (schedule/scheduler)
      (search/searcher)
      (process/resultprocessor)
      (buy/buyer)
      (report/reporter)))
Currently, the scheduler step drives the pipeline (it has no input channel) and provides the chain with work.
When I run this in the REPL:
(-main "some args")
It basically runs forever because the scheduler never stops. What is the best way to change this architecture so that I can shut down the whole system from the REPL? Does closing each channel mean the system terminates?
Would some broadcast channel help?
You could have your scheduler alts! / alts!! on a kill channel and the input channel of your pipeline:
(def kill-channel (async/chan))

(defn scheduler [input output-ch kill-ch]
  (loop []
    (let [[v p] (async/alts!! [kill-ch [output-ch (preprocess input)]]
                              :priority true)]
      (if-not (= p kill-ch)
        (recur)))))
Putting a value on kill-channel will then terminate the loop.
Technically you could also use output-ch to control the process (puts to closed channels return false), but I normally find explicit kill channels cleaner, at least for top-level pipelines.
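For example, to stop the pipeline from the REPL (using the kill-channel defined above):

(require '[clojure.core.async :as async])

;; either put a value on the kill channel...
(async/put! kill-channel :kill)
;; ...or close it: a take from a closed channel completes immediately
;; with nil, which also wins the alts!! and ends the loop
(async/close! kill-channel)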
To make things simultaneously more elegant and more convenient to use (both at the REPL and in production), you could use Stuart Sierra's component, start the scheduler loop (on a separate thread) and assoc the kill channel on to your component in the component's start method and then close! the kill channel (and thereby terminate the loop) in the component's stop method.
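A minimal sketch of such a component, assuming the scheduler function above; the record fields and constructor name are hypothetical:

(require '[clojure.core.async :as async]
         '[com.stuartsierra.component :as component])

(defrecord Scheduler [input output-ch kill-ch]
  component/Lifecycle
  (start [this]
    (let [kill-ch (async/chan)]
      ;; run the scheduler loop on its own thread
      (async/thread (scheduler input output-ch kill-ch))
      (assoc this :kill-ch kill-ch)))
  (stop [this]
    (when kill-ch
      (async/close! kill-ch)) ; closing the kill channel terminates the loop
    (assoc this :kill-ch nil)))

(defn new-scheduler [input output-ch]
  (map->Scheduler {:input input :output-ch output-ch}))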
I would suggest using something like https://github.com/stuartsierra/component to handle system setup. It ensures that you can easily start and stop your system in the REPL. Using that library, you would make each processing step a component, and each component would handle setup and teardown of its channels in its Lifecycle start and stop methods. You could also create an IStream protocol for each component to implement and have each component depend on components implementing that protocol. That buys you some very easy modularity.
You'd end up with a system that looks like the following:
(component/system-map
  :scheduler (schedule/new-scheduler file/tmp-dir)
  :searcher  (component/using (search/searcher)
                              {:in :scheduler})
  :processor (component/using (process/resultprocessor)
                              {:in :searcher})
  :buyer     (component/using (buy/buyer)
                              {:in :processor})
  :report    (component/using (report/reporter)
                              {:in :buyer}))
One nice thing with this sort of approach is that you can easily add components that rely on a channel as well. For example, if each component creates its out channel as a tap on an internal mult, you could add logging for the processor just by adding a logging component that takes the processor as a dependency (see the sketch after the snippet below).
:processor (component/using (process/resultprocessor)
                            {:in :searcher})
:processor-logger (component/using (log/logger)
                                   {:in :processor})
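A minimal sketch of the mult/tap idea (all names here are hypothetical):

(require '[clojure.core.async :as async])

(def processor-out (async/chan))          ; the processor writes results here
(def out-mult (async/mult processor-out)) ; fans each result out to all taps

(def buyer-in (async/tap out-mult (async/chan)))  ; buyer's input channel
(def logger-in (async/tap out-mult (async/chan))) ; logger's input channel

(async/put! processor-out :result)
(async/<!! buyer-in)  ;=> :result
(async/<!! logger-in) ;=> :result (each tap receives its own copy)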
I'd recommend watching Stuart Sierra's talk on components as well to get an idea of how it works.
You should consider using Stuart Sierra's reloaded workflow, which depends on modelling your 'pipeline' elements as components; that way you can treat your logical singletons as 'classes', meaning you can control the construction and destruction (start/stop) logic for each one of them.

Is Elixir's System.cmd blocking

I'd like to make a system that pulls GitHub repositories automatically using
System.cmd("git",["pull", link])
Is this command blocking? If I start it concurrently in many actors, will I always be able to get as many pulls as actors (or at least up to the system's socket limit)?
If not, is there any way to achieve it?
Erlang and thus Elixir IO is non-blocking, so the IO of one process does not generally affect other processes in any way. Joe Armstrong describes this in a blog post:
So our code “looks like” we’re doing a synchronous blocking read. “Looks like” was in quotes, because it’s not actually a blocking read; it’s really an asynchronous read which does not block any other Erlang processes.

Clojure agent's send function is blocking

(def queue-agent (agent clojure.lang.PersistentQueue/EMPTY))
(send queue-agent conj "some data for the queue")
(println "test output")
If I run this code, after a couple (!) of seconds the console will output test output and then nothing happens (the program does not terminate). I've just checked against a couple of sources that all said the send function is asynchronous and should return immediately to the calling thread. So what's wrong with this? Why is it not returning? Is there something wrong with my code, or with my environment?
So you have two issues: long startup time, and the program does not exit.
Startup: Clojure does not do any tree shaking. When you run a Clojure program, you load and bootstrap the compiler and initialize namespaces on every run. A couple of seconds sounds about right for a bare-bones Clojure program.
Hanging: If you use the agent thread pool, you must call shutdown-agents if you want the program to exit. The JVM simply doesn't know it is safe to shut the pool's threads down.
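Applied to the snippet above, something like this lets the program exit cleanly (shutdown-agents still allows actions already sent to complete):

(def queue-agent (agent clojure.lang.PersistentQueue/EMPTY))

(send queue-agent conj "some data for the queue")
(println "test output")

(shutdown-agents) ; shut down the agent thread pools so the JVM can exit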

Why does the CPU profile in VisualVM show the process spends all its time in a promise deref when using Clojure's core.async to read a Kafka stream

I am running a Clojure app that reads from a Kafka stream. I am using the shovel GitHub project https://github.com/l1x/shovel to read from the stream. When I profiled my application with VisualVM looking for hotspots, I noticed that most of the CPU time, about 70%, is spent in the function clojure.core$promise$reify__6310.deref.
The shovel consumer API is a thin wrapper over the Kafka ConsumerGroup API. It reads from a Kafka topic and publishes out to a core.async channel. Should I be concerned that my application latencies would be affected if I continued using this API? Is there any explanation for why the reify on the promise is taking this much CPU time?
In Clojure, $ is used in the printed representation of a class to represent an inner class. clojure.core$promise$reify__6310.deref means calling the method deref on a class that is created via reify as an inner class of clojure.core/promise. As it turns out, if you look at the class of a promise, it will show up as an inner reified class inside clojure.core$promise.
A promise in Clojure represents data that may not yet be available. You can see its behavior in a repl:
user> (def p (promise))
#'user/p
user> (class p)
clojure.core$promise$reify__6363
user> (deref p)
This will hang, giving no result and no new REPL prompt, until you deliver a value to the promise from another REPL connection or interrupt the deref call. The fact that time is being spent in the deref of a promise simply means that the program logic is waiting on values that are not yet computed (or have not yet come in via the network, etc.).
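For example, a sketch of deref unblocking once a value arrives:

(def p (promise))

;; deliver from another thread after one second
(future (Thread/sleep 1000) (deliver p :done))

(deref p) ;=> :done (blocks for about a second, then returns)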

Idiomatic Clojure for synchronizing between threads

The scenario I'm trying to resolve is as follows: I have a testing program that makes a web request to a web endpoint on a system.
This test program runs a Jetty web server on which it expects a callback from the external system to complete a successful test cycle. If the callback is not received within a specific time window (timeout), the test fails.
To achieve this, I want the test runner to wait on an "event" that the Jetty handler will set upon callback.
I thought about using Java's CyclicBarrier, but I wonder if there is an idiomatic way to solve this in Clojure.
Thanks
You can use the promise you asked about recently :) Something like this:
(def completion (promise))

; In the test runner:
; wait 5 seconds, then fail.
(let [result (deref completion 5000 :fail)]
  (if (= result :success)
    (println "Great!")
    (println "Failed :(")))

; In Jetty, on callback:
(deliver completion :success)
In straight Clojure, using an agent that tracks outstanding callbacks would make sense, though in practice I would recommend using Aleph, a library for asynchronous web programming that makes event-driven handlers rather easy. It produces Ring handlers, which sounds like it would fit nicely with your existing code.
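For completeness, a tiny sketch of the agent idea (all names hypothetical); for a single callback, though, the promise above is simpler:

(def outstanding (agent #{})) ; ids of callbacks we are still waiting for

(defn expect! [id] (send outstanding conj id))
(defn complete! [id] (send outstanding disj id))

(expect! :test-1)
;; ... later, in the Jetty callback handler:
(complete! :test-1)
(await outstanding) ; blocks until the actions sent above have run
@outstanding ;=> #{}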