Idiomatic Clojure for synchronizing between threads

The scenario I am trying to resolve is as follows: I have a testing program that makes a web request to a web endpoint on an external system.
This test program runs a Jetty web server on which it expects a callback from the external system; receiving the callback completes a successful test cycle. If the callback is not received within a specific time range (a timeout), the test fails.
To achieve this, I want the test runner to wait on an "event" that the Jetty handler will set upon callback.
I thought about using Java's CyclicBarrier, but I wonder if there is an idiomatic way to solve this in Clojure.
Thanks

You can use the promise you asked about recently :) Something like this:

(def completion (promise))

; In the test runner:
; wait up to 5 seconds, then fail.
(let [result (deref completion 5000 :fail)]
  (if (= result :success)
    (println "Great!")
    (println "Failed :(")))

; In the Jetty handler, on callback:
(deliver completion :success)
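
For completeness, a minimal sketch of what the Jetty/Ring callback side could look like (the handler name, route wiring, and response body are hypothetical):

(defn callback-handler [request]
  ; The external system calling back is the "event": deliver the promise.
  (deliver completion :success)
  {:status 200, :headers {"Content-Type" "text/plain"}, :body "ok"})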

In straight Clojure, using an agent that tracks outstanding callbacks would make sense, though in practice I would recommend Aleph, a library for asynchronous web programming that makes event-driven handlers rather easy. It produces Ring handlers, which sounds like it would fit nicely with your existing code.

Related

Does Clojure Ring create a thread for each request?

I am making a Messenger bot and I am using Ring as my http framework.
Sometimes I want to apply delays between messages sent by the bot. My expectation would be that it is safe to use Thread/sleep, because this will make only the active thread sleep and not the entire server. Is that so, or should I resort to clojure.core.async?
This is the code I would be writing without async:
(match [reply]
  ; The bot wants to send a message (text, images, videos etc.)
  ; after a delay of n milliseconds.
  [{:message message :delay delay}]
  (do
    (Thread/sleep delay)
    (facebook/send-message sender-id message))
  ; More code would follow...
  )
A link to the Ring code where this behaviour is visible would be appreciated, as would any other explanation on the matter.
Ring is the wrong thing to ask this question about: Ring is not an HTTP server, but rather an abstraction over HTTP servers. Ring itself does not have a fixed threading model; all it really cares about is that you have a function from request to response.
What really makes this decision is which Ring adapter you use. By far the most common is ring-jetty-adapter, a Jetty HTTP handler that delegates to your function through Ring. Jetty does indeed dedicate a thread to each request, so you can sleep in one thread without impacting others (but as noted in another answer, threads are not free, so you don't want to do a ton of this regularly).
But there are other Ring adapters with different threading models. For example, Aleph includes a Ring adapter based on Netty, which uses java.nio for non-blocking IO on a small, limited thread pool; in that case, sleeping on a "request thread" is very disruptive.
Assuming you're talking about code in a handler, Thread/sleep in Ring does make the thread for that request sleep, and if you have multiple requests you are tying up expensive server threads.
The reason Ring blocks is that the (non-async) model is based on function composition, where the result of one function is the input to another, so each step has to wait for the previous one. Exactly where in the code this happens I can't pinpoint.
Putting the sleep in a go block is better, because then you are not blocking server threads: the handler can return the response while the message is sent in the background. Do note that the response cannot use results from the go block.
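For example, a minimal sketch of that fire-and-forget approach, assuming the facebook/send-message call and the delay, sender-id and message bindings from the question. Note that inside a go block you should park on a timeout channel rather than call Thread/sleep, which would still pin a thread from core.async's small pool:

(require '[clojure.core.async :as async])

(async/go
  ; Park this go block (no thread is blocked) for `delay` ms.
  (async/<! (async/timeout delay))
  (facebook/send-message sender-id message))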
If you also want to produce the response asynchronously (without blocking a server thread), you can, for example, use Pedestal.
For most servers synchronous handlers are sufficient, but if you are using Thread/sleep AND want a response, I would recommend asynchronous Ring handlers, Pedestal, or another framework.
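A minimal sketch of what an asynchronous Ring handler looks like, using the 3-arity form (supported by ring-jetty-adapter when started with :async? true); the delay and response body here are hypothetical:

(defn async-handler [request respond raise]
  ; respond is invoked later, so no server thread is parked in between.
  (async/go
    (async/<! (async/timeout 1000))
    (respond {:status 200 :body "done"})))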

How best to shut down a Clojure core.async pipeline of processes

I have a Clojure processing app that is a pipeline of channels. Each processing step does its computation asynchronously (i.e. makes an HTTP request using http-kit or something) and puts its result on an output channel, so the next step can read from that channel and do its own computation.
My main function looks like this
(defn -main [args]
  (-> file/tmp-dir
      (schedule/scheduler)
      (search/searcher)
      (process/resultprocessor)
      (buy/buyer)
      (report/reporter)))
Currently, the scheduler step drives the pipeline (it hasn't got an input channel), and provides the chain with workload.
When I run this in the REPL:
(-main "some args")
It basically runs forever because the scheduler never stops. What is the best way to change this architecture so that I can shut the whole system down from the REPL? Does closing each channel mean the system terminates?
Would some broadcast channel help?
You could have your scheduler alts! / alts!! over a kill channel and the put to your pipeline's input channel:

(def kill-channel (async/chan))

(defn scheduler [input output-ch kill-ch]
  (loop []
    (let [[v p] (async/alts!! [kill-ch [output-ch (preprocess input)]]
                              :priority true)]
      (if-not (= p kill-ch)
        (recur)))))
Putting a value on kill-channel will then terminate the loop.
Technically you could also use output-ch to control the process (puts to closed channels return false), but I normally find explicit kill channels cleaner, at least for top-level pipelines.
To make things simultaneously more elegant and more convenient to use (both at the REPL and in production), you could use Stuart Sierra's component library: start the scheduler loop (on a separate thread) and assoc the kill channel onto your component in the component's start method, then close! the kill channel (and thereby terminate the loop) in the component's stop method.
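A minimal sketch of that idea, assuming the scheduler function above and com.stuartsierra.component on the classpath (the field names are illustrative):

(require '[clojure.core.async :as async]
         '[com.stuartsierra.component :as component])

(defrecord Scheduler [input output-ch kill-ch]
  component/Lifecycle
  (start [this]
    (let [kill-ch (async/chan)]
      ; Run the scheduler loop on its own thread so start doesn't block.
      (async/thread (scheduler input output-ch kill-ch))
      (assoc this :kill-ch kill-ch)))
  (stop [this]
    ; Closing the kill channel makes alts!! return, ending the loop.
    (when kill-ch (async/close! kill-ch))
    (assoc this :kill-ch nil)))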
I would suggest using something like https://github.com/stuartsierra/component to handle system setup. It ensures that you can easily start and stop your system in the REPL. Using that library, you would make each processing step a component, and each component would handle setup and teardown of its channels in its start and stop methods. You could also create an IStream protocol for each component to implement and have each component depend on components implementing that protocol; that buys you some very easy modularity.
You'd end up with a system that looks like the following:
(component/system-map
  :scheduler (schedule/new-scheduler file/tmp-dir)
  :searcher  (component/using (search/searcher)
                              {:in :scheduler})
  :processor (component/using (process/resultprocessor)
                              {:in :searcher})
  :buyer     (component/using (buy/buyer)
                              {:in :processor})
  :report    (component/using (report/reporter)
                              {:in :buyer}))
One nice thing with this approach is that you can easily add components that rely on a channel as well. For example, if each component creates its out channel using a tap on an internal mult, you could add a logger for the processor just by adding a logging component that takes the processor as a dependency (a sketch of the mult/tap wiring follows the snippet below):
:processor (component/using (process/resultprocessor)
                            {:in :searcher})
:processor-logger (component/using (log/logger)
                                   {:in :processor})
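
A minimal sketch of the mult/tap idea (the channel names are hypothetical): the processor writes to a single output channel, and each downstream consumer taps a mult of it:

(require '[clojure.core.async :as async])

(def processor-out  (async/chan))
(def processor-mult (async/mult processor-out))

; tap returns the channel passed in, already connected to the mult,
; so the buyer and the logger each see every value the processor emits.
(def buyer-in  (async/tap processor-mult (async/chan)))
(def logger-in (async/tap processor-mult (async/chan)))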
I'd recommend watching his talk as well to get an idea of how it works.
You should consider using Stuart Sierra's reloaded workflow, which depends on modelling your 'pipeline' elements as components. That way you can treat your logical singletons as 'classes', meaning you can control the construction and destruction (start/stop) logic for each of them.

Clojure agent's send function is blocking

(def queue-agent (agent clojure.lang.PersistentQueue/EMPTY))
(send queue-agent conj "some data for the queue")
(println "test output")
If I run this code, after a couple (!) of seconds the console will output test output and then nothing happens (the program does not terminate). I've just checked a couple of sources, which all say that send is asynchronous and should return immediately to the calling thread. So what's wrong with this? Why is it not returning? Is there something wrong with me? Or with my environment?
So you have two issues: long startup time, and the program does not exit.
Startup: Clojure does not do any tree shaking. When you run a Clojure program, you load and bootstrap the compiler and initialize namespaces on every run. A couple of seconds sounds about right for a bare-bones Clojure program.
Hanging: if you use the agent thread pool, you must call shutdown-agents if you want the program to exit. The agent pools use non-daemon threads, and the JVM simply doesn't know it is safe to shut them down.
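
A version of the original snippet that exits cleanly (the await call is optional and only there to make sure the queued action has been applied before shutdown):

(def queue-agent (agent clojure.lang.PersistentQueue/EMPTY))
(send queue-agent conj "some data for the queue")
(println "test output")
(await queue-agent)  ; block until actions dispatched so far have run
(shutdown-agents)    ; lets the JVM exit once the agent pools are drained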

Why does the CPU profile in VisualVM show the process spending all its time in a promise deref when using Clojure's core.async to read a Kafka stream?

I am running a Clojure app reading from a Kafka stream. I am using the shovel GitHub project https://github.com/l1x/shovel to read from the stream. When I profiled my application using VisualVM looking for hotspots, I noticed that most of the CPU time, about 70%, is being spent in the function clojure.core$promise$reify__6310.deref.
The shovel API consumer is a thin wrapper on the Kafka consumer-group API. It reads from a Kafka topic and publishes out to a core.async channel. Should I be concerned that my application latencies would be affected if I continued using this API? Is there any explanation for why the reify on the promise is taking this much CPU time?
In the printed representation of a JVM class, $ indicates an inner class. clojure.core$promise$reify__6310.deref therefore means calling the method deref on a class that was created via reify as an inner class of clojure.core/promise. Indeed, if you look at the class of a promise, it will show up as a reified inner class inside clojure.core$promise.
A promise in Clojure represents data that may not yet be available. You can see its behavior in a repl:
user> (def p (promise))
#'user/p
user> (class p)
clojure.core$promise$reify__6363
user> (deref p)
This will hang, producing no result and no new REPL prompt, until you deliver to the promise from another REPL connection or interrupt the deref call. The fact that time is being spent in deref of a promise simply means that the program logic is waiting on values that have not yet been computed (or have not yet come in via the network, etc.).
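Continuing the session, delivering from a second REPL connection unblocks the waiting deref:

user> (deliver p 42)  ; from a second connection
; the blocked (deref p) in the first connection now returns 42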

Synchronization of the standard output in Clojure

I have a multithreaded application written in Clojure. The problem is getting text to display correctly in the console when multiple threads write to STDOUT. How can I do this correctly in Clojure, so the lines won't come out interleaved? I think this would involve some kind of separate IO agent, but I'm not really sure how to do that.
I think this would involve some kind of separate IO agent
Yes, that should work. Create an agent (def printer (agent nil)) and send it the appropriate print action, e.g. (send printer (fn [_] (println msg))). Note that the function you send receives the agent's current state as its first argument, which is why it can't simply be #(println msg). The messages are put in a queue and executed (asynchronously) one at a time.
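A minimal sketch of such a printer agent (the safe-println name is illustrative):

(def printer (agent nil))

(defn safe-println
  "Print args as one atomic line via the printer agent."
  [& args]
  (send printer (fn [_] (apply println args))))

; Usage from any thread; lines are serialized by the agent's queue:
(safe-println "worker" 1 "finished")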
For logging purposes you could also look at tools.logging, which uses agents under the hood.