Getting "java.io.EOFException: JSON error" using the clojure twitter-api - clojure

I've written some simple clojure code that accesses the twitter streaming api. My code is essentially the same as the example code described in the twitter-api docs:
(def ^:dynamic *custom-streaming-callback*
  (AsyncStreamingCallback. (comp println #(:text %) json/read-json #(str %2))
                           (comp println response-return-everything)
                           exception-print))

(defn start-filtering []
  (statuses-filter :params {:follow 12345}
                   :oauth-creds *creds*
                   :callbacks *custom-streaming-callback*))
I'm following tweets about a specific user and using OAuth for authentication (not shown). When I run the start-filtering function and a connection to Twitter is opened, everything works well for a while. But if the stream is inactive for a bit (around 30 seconds), i.e. no tweets about this particular user are coming down the pike, the following error occurs:
#<EOFException java.io.EOFException: JSON error (end-of-file)>
I assumed from the Twitter docs that when using a streaming connection, Twitter keeps the stream open indefinitely. I must be making some incorrect assumptions. I'm currently diving into the clojure twitter-api code to see what's going on, but I thought more eyes would help me figure this out more quickly.

I had the same issue that you have. As you found, the stream emits an empty message if no data has been sent in the last thirty seconds or so.
Trying to read this as JSON then causes the EOF exception that you see.
I don't know of any way to prevent these keep-alives. In my case I worked around the issue with a simple conditional that falls back to an empty map when there is no JSON to read:
(if-not (clojure.string/blank? %)
  (json/read-str % :key-fn keyword)
  {})
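For context, here is a minimal sketch of that guard wired into the callback from the question. It assumes the same setup (AsyncStreamingCallback, response-return-everything, exception-print) plus clojure.data.json; treat it as an illustration rather than tested code:
(require '[clojure.data.json :as json])

(def ^:dynamic *custom-streaming-callback*
  (AsyncStreamingCallback.
   ;; on-bodypart: skip the blank keep-alive lines before parsing
   (fn [response baos]
     (let [payload (str baos)]
       (when-not (clojure.string/blank? payload)
         (println (:text (json/read-str payload :key-fn keyword))))))
   (comp println response-return-everything) ;; handler names from the question
   exception-print))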

Related

clojure Riemann project collectd

I am trying to do an apparently simple custom configuration using Riemann and collectd. Basically I'd like to calculate the ratio between two streams. To do that I tried something like the following (as in the Riemann API "project" suggestion here):
(project [(service "cahe-miss")
(service "cache-all")]
(smap folds/quotient
(with :service "ratio"
index)))
Which apparently works, but after a while I noticed some of the results were miscalculated. After some debug logging I ended up with the following configuration, in order to see what's happening and print the values:
(project [(service "cache-miss")
(service "cache-all")]
(fn [[miss all]]
(if (or (nil? miss) (nil? all))
(do nil)
(do (where (= (:time miss) (:time all))
;to print time marks
(println (:time all))
(println (:time miss))
; to distinguish easily each event
(println "NEW LINE")
))
)
)
)
My surprise is that each time I get new data from collectd (every 10 seconds) the function I created is executed twice, as if it were reusing previously unused data, and moreover it looks like it doesn't care at all about my time-equality constraint in the (where (= :time ...)) clause. The problem is that I am dividing metrics with different timestamps. Below is some output of the previous code:
1445606294
1445606294
NEW LINE -- First time I get data
1445606304
1445606294
NEW LINE
1445606304
1445606304
NEW LINE -- Second time I get data
1445606314
1445606304
NEW LINE
1445606314
1445606314
NEW LINE -- Third time I get data
Is there anyone who can give a hint on how to get the data formatted as I expected? I assume there is something I am not understanding about the "project" function, or something related to how incoming data is processed in Riemann.
Thanks in advance!
Updated
I managed to solve my problem, although I still don't have a clear idea of how it works. Right now I am receiving two different streams from the collectd tail plugin (from nginx logs), and I managed to compute the quotient between them as follows:
(where (or (service "nginx/counter-cacheHit")
           (service "nginx/counter-cacheAll"))
  (coalesce
    (smap folds/quotient
          (with :service "cacheHit"
            (scale (* 1 100) index)))))
I have tested it widely and up to now it produces the right results. However, I still don't understand several things. First, how is it that coalesce only returns data after both events are processed? collectd sends the events of both streams every two seconds with the same time mark. Using "project" instead of "coalesce" resulted in two executions of smap every two seconds (one for each event), whereas "coalesce" results in only one execution of smap with the two events carrying the same time mark, which is exactly what I wanted.
Finally, I don't know what the criterion is for choosing which stream is the numerator and which the denominator. Is it because of the order of the "or" clauses in the "where" clause?
Anyway, there's some black magic behind it, but I managed to solve my problem ;^)
Thank you all!
Taking ratios between streams that were moving at different rates didn't work out for me. I have since settled on calculating ratios and rates within a fixed or moving time interval. This way you are capturing a consistent snapshot of events in a time block and calculating over that. Here is some elided code from comparing the rate at which a service receives events to the rate at which it forwards them:
(moving-time-window 30 ;; seconds
  (smap (fn [events]
          (let [in        (or (->> events
                                   (filter #(= (:service %) "event-received"))
                                   count)
                              0)
                out       (or (->> events
                                   (filter #(= (:service %) "event-sent"))
                                   count)
                              0)
                flow-rate (float (if (> in 0) (/ out in) 0))]
            {:service "flow rate"
             :metric  flow-rate
             :host    "All"
             :state   (if (< flow-rate 0.99) "WARNING" "OK")
             :time    (:time (last events))
             :ttl     default-interval}))
        (tag ["some" "tags" "here"] index)
        (where (and (< (:metric event) 0.9)
                    (= (:environment event) "production"))
          (throttle 1 3600 send-to-slack))))
This takes in a fixed window of events, calculates the ratio for that block, and emits an event containing that ratio as its metric. Then, if the metric is bad, it pings me on Slack.

Clojure - core.async merge unidirectional channel

I have two unidirectional core.async channels:
channel out can only put!
channel in can only take!
And since this is ClojureScript, the blocking operations are not available. I would like to make one bidirectional (in-out) channel out of those two (in and out).
(def in (async/chan))
(def out (async/chan))
(def in-out (io-chan in out)) ;; io-chan or whatever the solution is

(async/put! in "test")
(async/take! in-out (fn [x] (println x))) ;; should print "test"
(async/put! in-out "xyz") ;; putting into in-out should be equivalent to putting into `out`
I tried something like the following (not working):
(defn io-chan [in-ch out-ch]
  (let [io (chan)]
    (go-loop []
      (>! out-ch (<! io))
      (>! io (<! in-ch))
      (recur))
    io))
A schema might help:
out                  in-out
--------------->     (unused)
<---------------     <---------------

in
---------------->    ---------------->
<----------------    (unused)
Also, closing the bidirectional channel should close both underlying channels.
Is it possible?
If I understand your use case right, I believe what you're trying to do is just a one-channel job.
On the other hand, if what you're trying to do is to present a channel-like interface for a composite of several channels (e.g some process takes data from in, processes it, and outputs the result to out), then you could always implement the right protocols (in the case of ClojureScript, cljs.core.async.impl.protocols/ReadPort and cljs.core.async.impl.protocols/WritePort).
I would personally not recommend it. Leaving aside the fact that you'd be relying on implementation details, I don't believe core.async channels are intended as an encapsulation for processes, only as communication points between them. So in this use case, just pass the input channel to producers and the output channel to consumers.
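For completeness, here is a minimal sketch of that protocol-based approach, with the caveat above that it leans on implementation namespaces rather than a public API (method names as in cljs.core.async.impl.protocols):
(ns my-app.io-chan ;; hypothetical namespace
  (:require [cljs.core.async.impl.protocols :as impl]))

(defn io-chan
  "Takes are served by `in`, puts go to `out`; close! closes both."
  [in out]
  (reify
    impl/ReadPort
    (take! [_ handler] (impl/take! in handler))
    impl/WritePort
    (put! [_ val handler] (impl/put! out val handler))
    impl/Channel
    (close! [_] (impl/close! in) (impl/close! out))
    (closed? [_] (impl/closed? in))))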
Your example shows a flow basically like this:
io ---> out-ch ---> worker ---> in-ch ---> io
^-------------------------------------------*
If we assume that the worker reads from in-ch and writes to out-ch, then perhaps these two channels are reversed in the example; if the worker does the opposite, then it's correct. In order to prevent loops it's important that you use unbuffered channels, so you don't hear your own messages echoed back to yourself.
As a side note, there is no such thing as unidirectional and bidirectional channels; instead there are buffered and unbuffered channels. If we are talking over an unbuffered channel, then when I have something to say to you, I park until you happen to be listening to the channel; once you are ready to hear it, I put my message into the channel and you receive it. Then to get a response I park until you are ready to send it, and once you are, you put it on the channel and I get it from the channel (all at once). This feels like a bidirectional channel, though it's really just that unbuffered channels happen to coordinate this way.
If the channel is buffered, then I might get my own message back from the channel, because I would finish putting it on the channel and then be ready to receive the response before you were even ready to receive the original message. If you need to use buffered channels like this, then use two of them, one for each direction, and they will "feel" like unidirectional channels.
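A minimal sketch of that two-channel pattern (JVM Clojure syntax for brevity; for ClojureScript use cljs.core.async and callbacks instead of <!!):
(require '[clojure.core.async :refer [chan go <! >! <!!]])

(def a->b (chan)) ;; unbuffered: requests travel one way...
(def b->a (chan)) ;; ...replies travel the other

;; the worker: echo each request back on the reply channel
(go (loop []
      (when-let [msg (<! a->b)]
        (>! b->a (str "echo: " msg))
        (recur))))

(go (>! a->b "hello")) ;; the caller sends a request...
(println (<!! b->a))   ;; ...and reads back "echo: hello"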

How do clojure core.async channels get cleaned up?

I'm looking at Clojure core.async for the first time, and was going through this excellent presentation by Rich Hickey: http://www.infoq.com/presentations/clojure-core-async
I had a question about the example he shows at the end of his presentation:
According to Rich, this example basically tries to get a web, video, and image result for a specific query. It tries two different sources in parallel for each of those results, and just pulls out the fastest result for each. And the entire operation can take no more than 80ms, so if we can't get e.g. an image result in 80ms, we'll just give up. The 'fastest' function creates and returns a new channel, and starts two go processes racing to retrieve a result and put it on the channel. Then we just take the first result off of the 'fastest' channel and slap it onto the c channel.
My question: what happens to these three temporary, unnamed 'fastest' channels after we take their first result? Presumably there is still a go process which is parked trying to put the second result onto the channel, but no one is listening so it never actually completes. And since the channel is never bound to anything, it doesn't seem like we have any way of doing anything with it ever again. Will the go process & channel "realize" that no one cares about their results any more and clean themselves up? Or did we essentially just "leak" three channels / go processes in this code?
There is no leak.
Parked gos are attached to channels on which they attempted to perform an operation and have no independent existence beyond that. If other code loses interest in the channels a certain go is parked on (NB. a go can simultaneously become a putter/taker on many channels if it parks on alt! / alts!), then eventually it'll be GC'd along with those channels.
The only caveat is that in order to be GC'd, gos actually have to park first. So any go that keeps doing stuff in a loop without ever parking (<! / >! / alt! / alts!) will in fact live forever. It's hard to write this sort of code by accident, though.
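To illustrate the caveat, a sketch of the pathological case (not from the presentation):
(require '[clojure.core.async :refer [go]])

;; This go block never parks (no <! / >! / alt!), so it is never
;; eligible for GC and pins one of the go dispatch threads forever.
(go (loop [n 0]
      (recur (inc n))))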
Caveats and exceptions aside, you can test garbage collection on the JVM at the REPL, e.g.:
(require '[clojure.core.async :as async])
=> nil
(def c (async/chan))
=> #'user/c
(def d (async/go-loop []
         (when-let [v (async/<! c)]
           (println v)
           (recur))))
=> #'user/d
(async/>!! c :hi)
=> true
:hi ; core.async go block is working
(import java.lang.ref.WeakReference)
=> java.lang.ref.WeakReference ; hold a reference without preventing garbage collection
(def e (WeakReference. c))
=> #'user/e
(def f (WeakReference. d))
=> #'user/f
(.get e)
=> #object[...]
(.get f)
=> #object[...]
(def c nil)
=> #'user/c
(def d nil)
=> #'user/d
(println "We need to clear *1, *2 and *3 in the REPL.")
We need to clear *1, *2 and *3 in the REPL.
=> nil
(println *1 *2 *3)
nil #'user/d #'user/c
=> nil
(System/gc)
=> nil
(.get e)
=> nil
(.get f)
=> nil
What just happened? I set up a go block and checked it was working. Then I used a WeakReference to observe the communication channel (c) and the go block's return channel (d). Then I removed all references to c and d (including *1, *2 and *3 created by my REPL), requested garbage collection (and got lucky; the System.gc Javadoc does not make strong guarantees), and then observed that my weak references had been cleared.
In this case at least, once references to the channels involved had been removed, the channels were garbage collected (regardless of my failure to close them!)
Presumably a channel produced by fastest only returns the result of the fastest query method and then closes.
If a second result was produced, your assumption could hold that the fastest processes are leaked: their results are never consumed, and if they relied on all their results being consumed in order to terminate, they wouldn't terminate.
Notice that this could also happen if the channel t is selected in the alt! clause.
The usual way to fix this would be to close the channel c in the last go block with close!. Puts made to a closed channel will then be dropped and the producers can terminate.
The problem could also be solved in the implementation of fastest: the process created in fastest could itself make the put via alts! with a timeout, and terminate if the produced value is not consumed within a certain amount of time.
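A minimal sketch of that second approach; the query-fn argument and the 100 ms figure are illustrative, not from the talk:
(require '[clojure.core.async :refer [chan go alts! timeout]])

(defn fastest [query & query-fns]
  (let [c (chan)]
    (doseq [query-fn query-fns]
      (go
        (let [result (query-fn query)]
          ;; offer the result, but give up if nobody takes it in time,
          ;; so this go block always terminates
          (alts! [[c result] (timeout 100)]))))
    c))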
I guess Rich did not address the problem in the slide in favor of a less lengthy example.

Jetty threads getting blocked and dead locked

I am using jetty "7.6.8.v20121106" as a part of https://github.com/ring-clojure/ring/tree/master/ring-jetty-adapter with my server.
I am making calls using http://http-kit.org/ with the following code. Essentially I am making server calls but ignoring the response. What I am finding is that all the server threads become blocked/deadlocked after that. This seems like a really easy way to bring the server down, and I wanted to understand what is going on here.
Code from client is:
(require '[org.httpkit.client :as hk-client])

(defn hget [id]
  (hk-client/get (str "http://localhost:5000/v1/pubapis/auth/ping?ping=" id)))

(doall (map hget (take 100 (range))))             ; gives problem
(doall (map deref (map hget (take 100 (range))))) ; doesn't give problem
Threads blocked at
sun.nio.cs.StreamEncoder.write(StreamEncoder.java:118)
and deadlocked at
java.io.PrintStream.write(PrintStream.java:479)
Would really appreciate it if someone could help me understand what is going on here.
Finally found what the problem was. It took a lot of digging, and starting over with a sample project, to find this.
When I started learning Clojure, I copied the following from somewhere for logging:
(defn log [msg & vals]
  (let [line (apply format msg vals)]
    (locking System/out
      (println line))))
The locking there was causing a deadlock in some situations. I don't know enough about concurrency to solve this; I will create a separate question for that.
Removing this line fixes the problem.
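For reference, the same logger with the locking removed (the poster's fix restated; note that without the lock, concurrent log lines may interleave):
(defn log [msg & vals]
  (println (apply format msg vals)))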

Intermittent error serving a binary file with Clojure/Ring

I am building an event collector in Clojure for Snowplow (using Ring/Compojure) and am having some trouble serving a transparent pixel with Ring. This is my code for sending the pixel:
(ns snowplow.clojure-collector.responses
  (:import (org.apache.commons.codec.binary Base64)
           (java.io ByteArrayInputStream)))

(def pixel-bytes
  (Base64/decodeBase64 (.getBytes "R0lGODlhAQABAPAAAAAAAAAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==")))

(def pixel (ByteArrayInputStream. pixel-bytes))

(defn send-pixel []
  {:status  200
   :headers {"Content-Type" "image/gif"}
   :body    pixel})
When I start up my server, the first time I hit the path for send-pixel, the pixel is successfully delivered to my browser. But the second time - and every time afterwards - Ring sends no body (and content-length 0). Restart the server and it's the same pattern.
A few things it's not:
I have replicated this using wget, to confirm the intermittent-ness isn't a browser caching issue
I generated the "R0lGOD..." base64 string at the command line (cat original.gif | base64), so I know there is no issue there
When the pixel is successfully sent, I have verified its contents are correct (diff original.gif received-pixel.gif)
I'm new to Clojure - my guess is there's some embarrassing dynamic gremlin in my code, but I need help spotting it!
I figured out the problem in the REPL shortly after posting:
user=> (import (org.apache.commons.codec.binary Base64) (java.io ByteArrayInputStream))
java.io.ByteArrayInputStream
user=> (def pixel-bytes (Base64/decodeBase64 (.getBytes "R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==")))
#'user/pixel-bytes
user=> (def pixel (ByteArrayInputStream. pixel-bytes))
#'user/pixel
user=> (slurp pixel-bytes)
"GIF89a!�\n,L;"
user=> (slurp pixel-bytes)
"GIF89a!�\n,L;"
user=> (slurp pixel)
"GIF89a!�\n,L;"
user=> (slurp pixel)
""
So basically the problem was that the ByteArrayInputStream was getting emptied after the first call. Mutable data structures!
I fixed the bug by generating a new ByteArrayInputStream for each response, with:
:body (ByteArrayInputStream. pixel-bytes)
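Put together, the fixed handler looks like this (a sketch assuming the defs from the question):
(defn send-pixel []
  {:status  200
   :headers {"Content-Type" "image/gif"}
   ;; build a fresh stream per response, so every request reads the
   ;; pixel bytes from position zero
   :body    (ByteArrayInputStream. pixel-bytes)})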
The problem is that your pixel var holds a stream. Once it has been read, there is no way to read it again.
Moreover, you do not need to deal with encoding issues. Ring serves static files as well. Just return:
(file-response "/path/to/pixel.gif")
It handles non-existent files as well. See the docs for details.
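file-response lives in ring.util.response; a minimal handler using it might look like this (the path is a placeholder):
(require '[ring.util.response :refer [file-response]])

(defn send-pixel [_request]
  ;; serves the file with appropriate headers, returning nil if it
  ;; does not exist
  (file-response "/path/to/pixel.gif"))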