Say I have a function, (get-events "feed"), that returns a vector of events in chronological order, taken from an external source.
Now, at any given moment, that function returns a list of events up to that point in time. Called a few seconds later, it will return a few more events, etc, as the feed continually grows.
If I want to create a lazy-seq that forever pulls new events from the feed, making sure it doesn't repeat those that have already been seen, how would I write this? I'm running into a stack overflow error when I don't use recur, but I can't use recur, because it doesn't appear in a tail position.
(defn continually-list-events
([feed] (continually-list-events feed (hash-set)))
([feed seen]
(let [events-now (get-events feed)]
(into (remove seen events-now)
(lazy-seq
(continually-list-events feed
(into seen events-now))))))
You can see I'm trying to use an accumulator to track events already seen (in a set), and I'm making sure to always filter out the ones I've seen.
If each step keeps track of how many events have been received so far, then each iteration can return just the new events by dropping the ones already seen from the front of the feed.
user> (->> (iterate (fn [[events-so-far _]]
                      (let [events (get-events "feed")
                            new-events (drop events-so-far events)]
                        [(count events) new-events]))
                    [0 []])
           (mapcat second))
The (mapcat second) at the end drops the counts from the sequence and flattens the chunks of new events into a single sequence of individual events.
In your example the stack overflow happens because nothing is ever consed onto the front of the sequence before the recursive call: into is eager, so it realizes the whole lazy-seq, and producing the first item requires computing the entire (infinite) list. Compare:
user> (defn example [x] (lazy-seq (cons x (example (inc x)))))
#'user/example
user> (take 5 (example 4))
(4 5 6 7 8)
user> (defn example [x] (lazy-seq (example (inc x))))
#'user/example
user> (take 5 (example 4))
... long pause then out of memory ...
PS: using lazy-seq directly is somewhat uncommon, though it's important to know how it works.
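For comparison, here is one way the function from the question could be restructured so that each step yields its new events before recursing. This is just a sketch: it assumes get-events behaves as described, and a real version would probably pause briefly when nothing new has arrived instead of polling in a tight loop.
(defn continually-list-events
  ([feed] (continually-list-events feed #{}))
  ([feed seen]
   (lazy-seq
     (let [events-now (get-events feed)
           new-events (remove seen events-now)]
       ;; concat is lazy, so the recursive call is not realized
       ;; until a consumer has walked past the new events
       (concat new-events
               (continually-list-events feed (into seen events-now)))))))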
Hello and thank you for any help. I am just starting to learn Clojure, and think it's amazing. Below is my code for the sleeping barber problem. I thought that a dropping-buffer from core.async would be perfect for this problem, and while it seems to work, it never stops.
The haircuts and dropping buffer seem to work right.
---Edited
It does stop now. But I get an error trying to check customer-num for nil (I've noted the line in the code below). It seems like it can't do an if on a nil because it's nil!
(if (not (nil? customer-num)) ;; throws error => Cannot invoke "clojure.lang.IFn.invoke()" because the return value of "clojure.lang.IFn.invoke(Object)" is null
---End of edit
Also, how do I get the final number of haircuts back to the calling code in operate-shop?
The sleeping barber problem, as written up in Seven Languages in Seven Weeks. It was created by Edsger Dijkstra in 1965.
A barber shop takes customers.
Customers arrive at random intervals, from ten to thirty milliseconds.
The barber shop has three chairs in the waiting room.
The barber shop has one barber and one barber chair.
When the barber’s chair is empty, a customer sits in the chair, wakes up the barber, and gets a haircut.
If the chairs are occupied, all new customers will turn away.
Haircuts take twenty milliseconds.
After a customer receives a haircut, he gets up and leaves.
Determine how many haircuts a barber can give in ten seconds.
(ns sleepbarber
(:require [clojure.core.async
:as a
:refer [>! <! >!! <!! go go-loop chan dropping-buffer close! thread
alt! alts! alts!! timeout]]))
(def barber-shop (chan (dropping-buffer 3))) ;; no more than 3 customers waiting
(defn cut-hair []
(go-loop [haircuts 0]
(let [customer-num (<! barber-shop)]
(if (not (nil? customer-num)) ;; throws error => Cannot invoke "clojure.lang.IFn.invoke()" because the return value of "clojure.lang.IFn.invoke(Object)" is null
(do (<! (timeout 20)) ;; wait for haircut to finish
(println haircuts "haircuts!" (- customer-num haircuts) "customers turned away!!")
(recur (inc haircuts)))
haircuts))))
(defn operate-shop [open-time]
((let [[_ opening] (alts!! [(timeout open-time)
(go-loop [customer 0]
(<! (timeout (+ 10 (rand-int 20)))) ;; wait for random arrival of customers
(>! barber-shop customer)
(recur (+ customer 1)))])]
(close! barber-shop)
(close! opening)
)))
(cut-hair)
(operate-shop 2000)
Without running your code to confirm my suspicions, I see two problems with your implementation.
The first is that the body of operate-shop starts with ((, which you appear to intend as a grouping mechanism. But of course, in Clojure, (f x y) is how you call the function f with arguments x and y. So your implementation evaluates the let (calling alts!!, then close! twice, all intended so far) and then calls the let's result as if it were a function. That result is the nil returned by the last close!, and invoking nil is precisely the "Cannot invoke clojure.lang.IFn.invoke() ... is null" error you noted: it isn't the nil? check on customer-num that fails, it's the stray outer pair of parens blowing up once your shop closes. The fix is simply to remove the outer parens (or, if you want to use the parking alts!, wrap the body in a go block; with the blocking alts!! you have now, that isn't necessary).
The second is the termination condition of the go-loop in cut-hair. You treat customer-num as if it were a number, but once the channel is closed and drained, <! returns nil: that is how you can tell a channel is closed, and using nil in the subtraction would throw. Checking whether the result is nil and exiting the loop when it is, as your edited version now does, is the right fix; at that point the shop is closed.
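Putting the two fixes together: a go-loop returns a channel that delivers the value of its last expression, so the final haircut count can simply be taken from the channel that cut-hair returns, which also answers your last question. Below is a condensed sketch with the printing trimmed; barber-shop is the dropping-buffer channel from your code.
(defn cut-hair []
  (go-loop [haircuts 0]
    (if-let [customer-num (<! barber-shop)]
      (do (<! (timeout 20))             ;; the haircut itself
          (recur (inc haircuts)))
      haircuts)))                       ;; nil means the shop has closed: report the total

(defn operate-shop [open-time]
  (go-loop [customer 0]
    (<! (timeout (+ 10 (rand-int 20)))) ;; customers arrive at random intervals
    (when (>! barber-shop customer)     ;; >! returns false once the shop is closed
      (recur (inc customer))))
  (<!! (timeout open-time))             ;; keep the shop open for open-time ms
  (close! barber-shop))

(let [haircuts (cut-hair)]
  (operate-shop 2000)
  (println (<!! haircuts) "haircuts given"))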
For my Mandelbrot explorer project, I need to run several expensive jobs, ideally in parallel. I decided to try chunking the jobs and running each chunk in its own thread, and ended up with something like
(defn point-calculator [chunk-size points]
(let [out-chan (chan (count points))
chunked (partition chunk-size points)]
(doseq [chunk chunked]
(thread
(let [processed-chunk (expensive-calculation chunk)]
(>!! out-chan processed-chunk))))
out-chan))
Where points is a list of [real, imaginary] coordinates to be tested, and expensive-calculation is a function that takes the chunk, and tests each point in the chunk. Each chunk can take a long time to finish (potentially a minute or more depending on the chunk size and the number of jobs).
On my consumer end, I'm using
(loop []
(when-let [proc-chunk (<!! result-chan)]
; Do stuff with chunk
(recur)))
To consume each processed chunk. Right now, this blocks when the last chunk is consumed since the channel is still open.
I need a way of closing the channel when the jobs are done. This is proving difficult because of the asynchronicity of the producer loop. I can't simply put a close! after the doseq, since the loop doesn't block, and I can't close when the last-indexed job is done, since the completion order is indeterminate.
The best idea I could come up with was maintaining an (atom #{}) of pending jobs and disj-ing each job as it finishes. Then I could either check the set's size in the loop and close! when it reaches 0, or attach a watch to the atom and check there.
This seems very hackish though. Is there a more idiomatic way of dealing with this? Does this scenario suggest I'm using async incorrectly?
I would take a look at the take function from core.async. Here is what its documentation says:
"Returns a channel that will return, at most, n items from ch. After n items have been returned, or ch has been closed, the return channel will close."
So it suggests a simple fix: instead of returning out-chan, just wrap it in take:
(clojure.core.async/take (count chunked) out-chan)
That should work.
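For concreteness, here is the question's point-calculator with that single change applied (a sketch; it assumes clojure.core.async is required as async and keeps the thread />!! producer from the question):
(defn point-calculator [chunk-size points]
  (let [out-chan (chan (count points))
        chunked  (partition chunk-size points)]
    (doseq [chunk chunked]
      (thread
        (let [processed-chunk (expensive-calculation chunk)]
          (>!! out-chan processed-chunk))))
    ;; the returned channel closes itself once (count chunked) results
    ;; have been taken, so the consumer's when-let loop ends on its own
    (async/take (count chunked) out-chan)))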
Also, I would recommend rewriting your example from the blocking put/take operations (>!!, <!!) to the parking ones (>!, <!), and from thread to go / go-loop, which is the more idiomatic way to use core.async.
You may want to use async/pipeline(-blocking) to control the parallelism, and async/onto-chan to close the input channel automatically after all the chunks have been put onto it.
E.g. the example below shows a 16x improvement in elapsed time when the parallelism is set to 16.
(require '[clojure.core.async :refer [chan go-loop <! <!! pipeline-blocking onto-chan]])
(defn expensive-calculation [pts]
(Thread/sleep 100)
(reduce + pts))
(time
(let [points (take 10000 (repeatedly #(rand 100)))
chunk-size 500
inp-chan (chan)
out-chan (chan)]
(go-loop [] (when-let [res (<! out-chan)]
;; do stuff with chunk
(recur)))
(pipeline-blocking 16 out-chan (map expensive-calculation) inp-chan)
(<!! (onto-chan inp-chan (partition-all chunk-size points)))))
In the same way alt! waits for one of n channels to get a value, I'm looking for the idiomatic way to wait for all n channels to get a value.
I need this because I "spawn" n go blocks to work on async tasks, and I want to know when they are all done. I'm sure there is a very beautiful way to achieve this.
Use the core.async map function:
(<!! (a/map vector [ch1 ch2 ch3]))
;; [val-from-ch1 val-from-ch2 val-from-ch3]
You can say (mapv #(async/<!! %) channels).
If you wanted to handle individual values as they arrive, and then do something special after the final channel produces a value, you can exploit the fact that alts! / alts!! take a vector of channels, and that they are functions, not macros, so you can easily pass in dynamically constructed vectors.
So, you can use alts!! to wait on your initial collection of n channels, then use it again on the remaining channels etc.
(def c1 (async/chan))
(def c2 (async/chan))
(def out
(async/thread
(loop [cs [c1 c2] vs []]
(let [[v p] (async/alts!! cs)
cs (filterv #(not= p %) cs)
vs (conj vs v)]
(if (seq cs)
(recur cs vs)
vs)))))
(async/>!! c1 :foo)
(async/>!! c2 :bar)
(async/<!! out)
;= [:foo :bar]
If instead you wanted to take all values from all the input channels and then do something else when they all close, you'd want to use async/merge:
clojure.core.async/merge
([chs] [chs buf-or-n])
Takes a collection of source channels and returns a channel which
contains all values taken from them. The returned channel will be
unbuffered by default, or a buf-or-n can be supplied. The channel
will close after all the source channels have closed.
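For example, a minimal sketch, assuming chs is the collection of channels returned by your go blocks:
;; merge the channels, then collect everything into one vector;
;; the take completes only after every source channel has closed
(async/<!! (async/into [] (async/merge chs)))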
I am using clojure.contrib.sql to fetch some records from an SQLite database.
(defn read-all-foo []
(with-connection *db*
(with-query-results res ["select * from foo"]
(into [] res))))
Now, I don't really want to realize the whole sequence before returning from the function (i.e. I want to keep it lazy), but if I return res directly or wrap it in some kind of lazy wrapper (for example, I want to apply a certain map transformation to the result sequence), the SQL-related bindings will be reset and the connection will be closed after I return, so realizing the sequence will throw an exception.
How can I enclose the whole function in a closure and return a kind of iterator block (like yield in C# or Python)?
Or is there another way to return a lazy sequence from this function?
The resultset-seq that with-query-results returns is probably already as lazy as you're going to get. Laziness only works as long as the handle is open, as you said. There's no way around this. You can't read from a database if the database handle is closed.
If you need to do I/O and keep the data after the handle is closed, then open the handle, slurp it in fast (defeating laziness), close the handle, and work with the results afterward. If you want to iterate over some data without keeping it all in memory at once, then open the handle, get a lazy seq on the data, doseq over it, then close the handle.
So if you want to do something with each row (for side-effects) and discard the results without eating the whole resultset into memory, then you could do this:
(defn do-something-with-all-foo [f]
(let [sql "select * from foo"]
(with-connection *db*
(with-query-results res [sql]
(doseq [row res]
(f row))))))
user> (do-something-with-all-foo println)
{:id 1}
{:id 2}
{:id 3}
nil
;; transforming the data as you go
user> (do-something-with-all-foo #(println (assoc % :bar :baz)))
{:id 1, :bar :baz}
{:id 2, :bar :baz}
{:id 3, :bar :baz}
If you want your data to hang around long-term, then you may as well slurp it all in using your read-all-foo function above (thus defeating laziness). If you want to transform the data, then map over the results after you've fetched it all. Your data will all be in memory at that point, but the map call itself and your post-fetch data transformations will be lazy.
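For example, using read-all-foo from the question: the rows are all in memory once it returns, but the transformation itself stays lazy.
(def foos (read-all-foo))                        ;; eager: every row is fetched here
(def decorated (map #(assoc % :bar :baz) foos))  ;; lazy: computed as it is consumed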
It is in fact possible to add a "terminating side-effect" to a lazy sequence, to be executed once, when the entire sequence is consumed for the first time:
(def s (lazy-cat (range 10) (do (println :foo) nil)))
(first s)
; => returns 0, prints out nothing
(doall (take 10 s))
; => returns (0 1 2 3 4 5 6 7 8 9), prints nothing
(last s)
; => returns 9, prints :foo
(doall s)
; => returns (0 1 2 3 4 5 6 7 8 9), prints :foo
; or rather, prints :foo if it's the first time s has been
; consumed in full; you'll have to redefine it if you called
; (last s) earlier
I'm not sure I'd use this to close a DB connection, though. It's considered best practice not to hold on to a DB connection longer than necessary, and putting the connection-closing call at the end of your lazy sequence of results would not only hold the connection longer than strictly needed, but also open up the possibility that your program fails for an unrelated reason without ever closing it. So for this scenario I would normally just slurp in all the data. As Brian says, you can store it all somewhere unprocessed, then perform any transformations lazily, so you should be fine as long as you're not trying to pull in a really huge dataset in one chunk.
But then I don't know your exact circumstances, so if it makes sense from your point of view, you can definitely call a connection-closing function at the tail end of your result sequence. As Michiel Borkent points out, you wouldn't be able to use with-connection if you wanted to do this.
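To sketch what that might look like when you manage the connection by hand (open-connection, run-query and close-connection are hypothetical stand-ins here, not part of clojure.contrib.sql):
(defn lazy-read-all-foo []
  (let [conn (open-connection *db*)                 ;; hypothetical helper
        rows (run-query conn "select * from foo")]  ;; hypothetical: returns a lazy resultset-seq
    (lazy-cat rows
              ;; runs once, when a consumer walks past the last row
              (do (close-connection conn) nil))))   ;; hypothetical helper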
I have never used SQLite with Clojure before, but my guess is that with-connection closes the connection when its body has been evaluated. So you need to manage the connection yourself if you want to keep it open, and close it when you finish reading the elements you're interested in.
There is no way to create a function or macro "on top" of with-connection and with-query-results to add laziness. Both close their Connection and ResultSet, respectively, when control flow leaves the lexical scope.
As Michal said, it would be no problem to create a lazy seq, closing its ResultSet and Connection lazily. As he also said, it wouldn't be a good idea, unless you can guarantee that the sequences are eventually finished.
A feasible solution might be:
(def ^:dynamic *deferred-resultsets* nil)

(defmacro with-deferred-close [& body]
  `(binding [*deferred-resultsets* (atom #{})]
     (let [ret# (do ~@body)]
       ;;; close the collected resultsets here
       ret#)))

(defmacro with-deferred-results [bind-form sql & body]
  `(let [resultset# (execute-query ...)]
     (swap! *deferred-resultsets* conj resultset#)
     ;;; execute body, similar to with-query-results,
     ;;; but leave the resultset open
     ))
This would allow for e.g. keeping the resultsets open until the current request is finished.
I'm experimenting with filtering through elements in parallel. For each element, I need to perform a distance calculation to see if it is close enough to a target point. Never mind that data structures already exist for doing this, I'm just doing initial experiments for now.
Anyway, I wanted to run some very basic experiments where I generate random vectors and filter them. Here's my implementation that does all of this
(defn pfilter [pred coll]
(map second
(filter first
(pmap (fn [item] [(pred item) item]) coll))))
(defn random-n-vector [n]
(take n (repeatedly rand)))
(defn distance [u v]
(Math/sqrt (reduce + (map #(Math/pow (- %1 %2) 2) u v))))
(defn -main [& args]
(let [[n-str vectors-str threshold-str] args
n (Integer/parseInt n-str)
vectors (Integer/parseInt vectors-str)
threshold (Double/parseDouble threshold-str)
random-vector (partial random-n-vector n)
u (random-vector)]
(time (println n vectors
(count
(pfilter
(fn [v] (< (distance u v) threshold))
(take vectors (repeatedly random-vector))))))))
The code executes and returns what I expect, that is, the parameter n (the length of the vectors), vectors (the number of vectors), and the number of vectors that are closer than the threshold to the target vector. What I don't understand is why the program hangs for an additional minute before terminating.
Here is the output of a run which demonstrates the problem:
$ time lein run 10 100000 1.0
[null] 10 100000 12283
[null] "Elapsed time: 3300.856 msecs"
real 1m6.336s
user 0m7.204s
sys 0m1.495s
Any comments on how to filter in parallel in general are also more than welcome, as I haven't yet confirmed that pfilter actually works.
You need to call shutdown-agents to kill the threads backing the thread pool used by pmap. Those worker threads are not daemon threads, and they sit idle for roughly a minute before timing out, which is the extra minute you see between the printed elapsed time and the process actually exiting.
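A minimal, self-contained illustration (a toy workload just to show the effect; the only point is the final shutdown-agents call):
(defn -main [& _args]
  (time (println (count (pmap inc (range 1000)))))
  ;; without this, pmap's worker pool keeps non-daemon threads alive
  ;; for roughly a minute after the work finishes, delaying JVM exit
  (shutdown-agents))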
About pfilter: it should work, but it will run slower than plain filter here, since your predicate is cheap. Parallelization isn't free, so you have to give each thread moderately intensive tasks to offset the multithreading overhead. Batch your items before filtering them, as in the sketch below.
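A sketch of one way to batch: each parallel task filters an entire batch, so the per-task work is large enough to pay for the coordination overhead (batch-size is something you would tune).
(defn batched-pfilter [pred batch-size coll]
  (apply concat
         (pmap (fn [batch]
                 ;; doall forces the filtering inside the worker thread
                 (doall (filter pred batch)))
               (partition-all batch-size coll))))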