Intermittent error serving a binary file with Clojure/Ring - clojure

I am building an event collector in Clojure for Snowplow (using Ring/Compojure) and am having some trouble serving a transparent pixel with Ring. This is my code for sending the pixel:
(ns snowplow.clojure-collector.responses
  (:import (org.apache.commons.codec.binary Base64)
           (java.io ByteArrayInputStream)))

(def pixel-bytes (Base64/decodeBase64 (.getBytes "R0lGODlhAQABAPAAAAAAAAAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==")))
(def pixel (ByteArrayInputStream. pixel-bytes))

(defn send-pixel
  []
  {:status 200
   :headers {"Content-Type" "image/gif"}
   :body pixel})
When I start up my server, the first time I hit the path for send-pixel, the pixel is successfully delivered to my browser. But the second time - and every time afterwards - Ring sends no body (and content-length 0). Restart the server and it's the same pattern.
A few things it's not:
I have replicated this using wget, to confirm the intermittency isn't a browser caching issue
I generated the "R0lGOD..." base64 string at the command line (cat original.gif | base64), so I know there is no issue there
When the pixel is successfully sent, I have verified its contents are correct (diff original.gif received-pixel.gif)
I'm new to Clojure - my guess is there's some embarrassing dynamic gremlin in my code, but I need help spotting it!

I figured out the problem in the REPL shortly after posting:
user=> (import (org.apache.commons.codec.binary Base64) (java.io ByteArrayInputStream))
java.io.ByteArrayInputStream
user=> (def pixel-bytes (Base64/decodeBase64 (.getBytes "R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==")))
#'user/pixel-bytes
user=> (def pixel (ByteArrayInputStream. pixel-bytes))
#'user/pixel
user=> (slurp pixel-bytes)
"GIF89a!�\n,L;"
user=> (slurp pixel-bytes)
"GIF89a!�\n,L;"
user=> (slurp pixel)
"GIF89a!�\n,L;"
user=> (slurp pixel)
""
So basically the problem was that the ByteArrayInputStream was getting emptied after the first call. Mutable data structures!
I fixed the bug by generating a new ByteArrayInputStream for each response, with:
:body (ByteArrayInputStream. pixel-bytes)})

The problem is that your pixel var holds a stream. Once it has been read, it cannot be read again.
Moreover, you do not need to deal with the base64 encoding at all: Ring can serve static files for you. Just return:
(file-response "/path/to/pixel.gif")
It handles non-existent files gracefully as well. See the file-response docs for details.
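For completeness, a minimal sketch of that approach (the require line and the explicit Content-Type header are my additions; ring.middleware.content-type/wrap-content-type is another way to set the header):

(ns snowplow.clojure-collector.responses
  (:require [ring.util.response :refer [file-response]]))

;; file-response builds a response whose :body is a java.io.File, so a fresh
;; stream is opened for every request; it returns nil when the file is missing.
(defn send-pixel []
  (some-> (file-response "/path/to/pixel.gif")
          (assoc-in [:headers "Content-Type"] "image/gif")))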

Related

easiest way to use an I/O callback within concurrent http-kit/get instances

I am launching a few hundred concurrent http-kit.client/get requests, each provided with a callback that writes its result to a single file.
What would be a good way to deal with thread-safety? Using chan and <!! from core.async?
Here's the code I would consider:
(defn launch-async [channel url]
  (http/get url {:timeout 5000
                 :user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:10.0) Gecko/20100101 Firefox/10.0"}
            (fn [{:keys [status headers body error]}]
              (if error
                (put! channel (json/generate-string {:url url :headers headers :status status}))
                (put! channel (json/generate-string body))))))

(defn process-async [channel func]
  (when-let [response (<!! channel)]
    (func response)))

(defn http-gets-async [func urls]
  (let [channel (chan)]
    (doall (map #(launch-async channel %) urls))
    (process-async channel func)))
Thanks for your insights.
Since you are already using core.async in your example, I thought I'd point out a few issues and how you can address them. The other answer suggests a more basic approach, and I agree wholeheartedly that a simpler approach is just fine. However, channels give you a simple way of consuming the data that does not involve mapping over a vector, which would also grow large over time if you have many responses. Consider the following issues and how we can fix them:
(1) Your current version will crash if your url list has more than 1024 elements. There's an internal buffer for puts and takes that are asynchronous (i.e., put! and take! don't block but always return immediately), and the limit is 1024. This is in place to prevent unbounded asynchronous usage of the channel. To see for yourself, call (http-gets-async println (repeat 1025 "http://blah-blah-asdf-fakedomain.com")).
What you want to do is to only put something on the channel when there's room to do so. This is called back-pressure. Taking a page from the excellent wiki on go block best practices, one clever way to do this from your http-kit callback is to use the put! callback option to launch your next http get; this will only happen when the put! immediately succeeds, so you will never have a situation where you can go beyond the channel's buffer:
(defn launch-async
  [channel [url & urls]]
  (when url
    (http/get url {:timeout 5000
                   :user-agent "Mozilla"}
              (fn [{:keys [status headers body error]}]
                (let [put-on-chan (if error
                                    (json/generate-string {:url url :headers headers :status status})
                                    (json/generate-string body))]
                  (put! channel put-on-chan (fn [_] (launch-async channel urls))))))))
(2) Next, you seem to be only processing one response. Instead, use a go-loop:
(defn process-async
  [channel func]
  (go-loop []
    (when-let [response (<! channel)]
      (func response)
      (recur))))
(3) Here's your http-gets-async function. I see no harm in adding a buffer here, as it should help you fire off a nice burst of requests at the beginning:
(defn http-gets-async
  [func urls]
  (let [channel (chan 1000)]
    (launch-async channel urls)
    (process-async channel func)))
Now you have the ability to process an infinite number of urls, with back-pressure. To test this, define a counter, then make your processing function increment it so you can watch your progress. Use a localhost URL that is easy to bang on (I wouldn't recommend firing off hundreds of thousands of requests at, say, Google):
(def responses (atom 0))

(http-gets-async (fn [_] (swap! responses inc))
                 (repeat 1000000 "http://localhost:8000"))
As this is all asynchronous, your function will return immediately and you can watch @responses grow.
One other interesting thing you can do: instead of running your processing function in process-async, you can apply it as a transducer on the channel itself.
(defn process-async
  [channel]
  (go-loop []
    (when-let [_ (<! channel)]
      (recur))))

(defn http-gets-async
  [func urls]
  (let [channel (chan 10000 (map func))] ;; <-- transducer on channel
    (launch-async channel urls)
    (process-async channel)))
There are many ways to build this, including constructing it so that the channel closes (note that above, it stays open); a sketch of that variant follows. You also have java.util.concurrent primitives to help in this regard if you like, and they are quite easy to use. The possibilities are numerous.
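For instance, here is a sketch of the closing variant (my variation, not part of the answer above): close the channel once the url list is exhausted, so the when-let in process-async falls through and the go-loop terminates.

(require '[clojure.core.async :refer [close! put!]])

(defn launch-async
  [channel [url & urls]]
  (if url
    (http/get url {:timeout 5000}
              (fn [{:keys [status headers body error]}]
                (let [put-on-chan (if error
                                    (json/generate-string {:url url :headers headers :status status})
                                    (json/generate-string body))]
                  ;; chain the next request from put!'s callback, as before
                  (put! channel put-on-chan (fn [_] (launch-async channel urls))))))
    (close! channel)))  ; no urls left: close so consumers can finish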
This is simple enough that I wouldn't use core.async for it. You can do this with an atom storing a vector of the responses, then have a separate thread read the contents of the atom until it has seen all of the responses. Then, in your http-kit callback, you can just swap! the response into the atom directly.
If you do want to use core.async, I'd recommend a buffered channel to keep from blocking your http-kit thread pool.
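A minimal sketch of that atom-based approach (the names and the polling helper are mine, for illustration):

(require '[org.httpkit.client :as http])

(def results (atom []))  ; every response body accumulates here

(defn hget [url]
  (http/get url {:timeout 5000}
            (fn [{:keys [body]}]
              (swap! results conj body))))  ; swap! makes the append thread-safe

(defn await-results
  "Block until n responses have arrived, then return them."
  [n]
  (while (< (count @results) n)
    (Thread/sleep 100))
  @results)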

write large data structures as EDN to disk in clojure

What is the most idiomatic way to write a data structure to disk in Clojure, so I can read it back with edn/read? I tried the following, as recommended in the Clojure cookbook:
(with-open [w (clojure.java.io/writer "data.clj")]
  (binding [*out* w]
    (pr large-data-structure)))
However, this will only write the first 100 items, followed by "...". I tried (prn (doall large-data-structure)) as well, yielding the same result.
I've managed to do it by writing line by line with (doseq [i large-data-structure] (pr i)), but then I have to manually add the parens at the beginning and end of the sequence to get the desired result.
You can control the number of items in a collection that are printed via *print-length*
Consider using spit instead of manually opening the writer and pr-str instead of manually binding to *out*.
(binding [*print-length* false]
  (spit "data.clj" (pr-str large-data-structure)))
Edit from comment:
(with-open [w (clojure.java.io/writer "data.clj")]
  (binding [*print-length* false
            *out* w]
    (pr large-data-structure)))
Note: *print-length* has a root binding of nil so you should not need to bind it in the example above. I would check the current binding at the time of your original pr call.
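Since the goal is to read the data back with edn/read, here is a minimal round-trip sketch (the read-data helper name is mine; clojure.edn/read requires a java.io.PushbackReader):

(require '[clojure.edn :as edn]
         '[clojure.java.io :as io])

(defn read-data [path]
  (with-open [r (java.io.PushbackReader. (io/reader path))]
    (edn/read r)))

(read-data "data.clj")  ; => the original data structure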

Jetty threads getting blocked and dead locked

I am using jetty "7.6.8.v20121106" as a part of https://github.com/ring-clojure/ring/tree/master/ring-jetty-adapter with my server.
I am making calls using http://http-kit.org/ with the following code. Essentially I am making server calls but ignoring the response. What I am finding is that all the server threads become blocked/deadlocked after that. This seems like a really easy way to bring the server down, and I wanted to understand what is going on here.
Code from client is:
(require '[org.httpkit.client :as hk-client])

(defn hget [id]
  (hk-client/get (str "http://localhost:5000/v1/pubapis/auth/ping?ping=" id)))

(doall (map hget (take 100 (range))))             ; Gives problem
(doall (map deref (map hget (take 100 (range))))) ; Doesn't give problem
Threads blocked at
sun.nio.cs.StreamEncoder.write(StreamEncoder.java:118)
and deadlocked at
java.io.PrintStream.write(PrintStream.java:479)
Would really appreciate if someone can help with what is going on over here.
Finally found what the problem was. It took a lot of digging, and starting over with a sample project, to track it down.
When I started learning Clojure I copied the following from somewhere for logging:
(defn log [msg & vals]
  (let [line (apply format msg vals)]
    (locking System/out (println line))))
The locking call there was causing a deadlock in some situations. I do not know enough about concurrency to explain it, and will create a separate question for that.
Removing the locking fixes the problem.
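For reference, one lock-free way to serialize log output (my suggestion, not the fix the author used) is to funnel printing through an agent, so lines are written one at a time without ever taking a monitor on System/out:

(def logger (agent nil))

(defn log [msg & vals]
  (let [line (apply format msg vals)]
    ;; send-off queues the println on the agent; callers never block, and the
    ;; agent runs actions one at a time, so output lines cannot interleave
    (send-off logger (fn [_] (println line)))))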

Parse a little-endian binary file, stuffing into a matrix

I have a binary file that contains an X by X matrix. The file itself is a sequence of single-precision floats (little-endian). What I would like to do is parse it, and stuff it into some reasonable clojure matrix data type.
Thanks to this question, I see I can parse a binary file with gloss. I now have code that looks like this:
(ns foo.core
  (:require gloss.core)
  (:require gloss.io)
  (:use [clojure.java.io])
  (:use [clojure.math.numeric-tower]))

(gloss.core/defcodec mycodec
  (gloss.core/repeated :float32 :prefix :none))

(def buffer (byte-array (* 1200 1200)))
(.read (input-stream "/path/to/binaryfile") buffer)
(gloss.io/decode mycodec buffer)
This takes a while to run, but eventually dumps out a big list of numbers. Unfortunately, the numbers are all wrong. Upon further investigation, the numbers were read as big-endian.
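(If staying with gloss, its primitive types also come in explicit-endianness variants; assuming the little-endian float primitive is spelled :float32-le, which is worth verifying against the gloss docs, the codec would be:)

(gloss.core/defcodec mycodec-le
  (gloss.core/repeated :float32-le :prefix :none))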
Assuming there is some way to read these binary files as little-endian, I'd like to stuff the results into a matrix. This question seems to have settled on using Incanter with its Parallel Colt representation, however, that question was from '09, and I'm hoping to stick to clojure 1.4 and lein 2. Somewhere in my frenzy of googling, I saw other recommendations to use jblas or mahout. Is there a "best" matrix library for clojure these days?
EDIT: Reading a binary file is tantalizingly close. Thanks to this handy nio wrapper, I am able to get a memory mapped byte buffer as a short one-liner, and even reorder it:
(ns foo.core
  (:require [clojure.java.io :as io])
  (:require [nio.core :as nio])
  (:import [java.nio ByteOrder]))

(def buffer (nio/mmap "/path/to/binaryfile"))
(class buffer) ;; java.nio.DirectByteBuffer

(.order buffer java.nio.ByteOrder/LITTLE_ENDIAN)
;; #<DirectByteBuffer java.nio.DirectByteBuffer[pos=0 lim=5760000 cap=5760000]>
However, reordering without doing the intermediate (def) step, fails:
(.order (nio/mmap f) java.nio.ByteOrder/LITTLE_ENDIAN)
;; clojure.lang.Compiler$CompilerException: java.lang.IllegalArgumentException: Unable to resolve classname: MappedByteBuffer, compiling:(/Users/peter/Developer/foo/src/foo/core.clj:12)
;; at clojure.lang.Compiler.analyzeSeq (Compiler.java:6462)
;; clojure.lang.Compiler.analyze (Compiler.java:6262)
;; etc...
I'd like to be able to create the reordered byte buffer inside a function without defining a global variable, but right now it seems to not like that.
Also, once I've got it reordered, I'm not entirely sure what to do with my DirectByteBuffer, as it doesn't seem to be iterable. Perhaps for the remaining step of reading this buffer object (into a JBLAS matrix), I will create a second question.
EDIT 2: I am marking the answer below as accepted, because I think my original question combined too many things. Once I figure out the remainder of this I will try to update this question with complete code that starts with this ByteBuffer and that reads into a JBLAS matrix (which appears to be the right data structure).
In case anyone was interested, I was able to create a function that returns a properly ordered bytebuffer as follows:
;; This works!
(defn readf [^String file]
(.order
(.map
(.getChannel
(java.io.RandomAccessFile. file "r"))
java.nio.channels.FileChannel$MapMode/READ_ONLY 0 (* 1200 1200))
java.nio.ByteOrder/LITTLE_ENDIAN))
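For the remaining step, reading the buffer into a JBLAS matrix, here is a hedged sketch; the helper name is mine, and it assumes jblas is on the classpath and that the file layout matches jblas's column-major convention:

(import '[org.jblas FloatMatrix])

(defn floats->matrix [^java.nio.FloatBuffer fb n]
  (let [arr (float-array (* n n))]
    (.get fb arr)              ; bulk-copy n*n floats out of the buffer
    (FloatMatrix. n n arr)))   ; wrap the array as an n-by-n matrix

;; usage, assuming the mapped region actually covers n*n floats:
;; (floats->matrix (.asFloatBuffer (readf "/path/to/binaryfile")) 1200)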
The nio wrapper I found looks to simplify / prettify this quite a lot, but it would appear I'm either not using it correctly, or there is something wrong. To recap my findings with the nio wrapper:
;; this works
(def buffer (nio/mmap "/bin/file"))
(def buffer (.order buffer java.nio.ByteOrder/LITTLE_ENDIAN))
(def buffer (.asFloatBuffer buffer))

;; this fails
(def buffer
  (.asFloatBuffer
    (.order
      (nio/mmap "/bin/file")
      java.nio.ByteOrder/LITTLE_ENDIAN)))
Sadly, this is a clojure mystery for another day, or perhaps another StackOverflow question.
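(A likely explanation, offered as an assumption rather than a verified fix: nio.core/mmap appears to carry a ^MappedByteBuffer return-type hint, and Clojure resolves type hints in the calling namespace, so the caller has to import the class before chaining interop calls on the return value:)

(ns foo.core
  (:require [nio.core :as nio])
  (:import [java.nio ByteOrder MappedByteBuffer]))

;; with MappedByteBuffer imported, the hint on nio/mmap's return value
;; resolves and the chained interop calls compile inside a defn:
(defn readf-le [file]
  (.asFloatBuffer
    (.order (nio/mmap file) ByteOrder/LITTLE_ENDIAN)))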
Open a FileChannel(), then get a memory mapped buffer. There are lots of tutorials on the web for this step.
Switch the order of the buffer to little endian by calling order(endian-ness) (not the no-arg version of order). Finally, the easiest way to extract floats would be to call asFloatBuffer() on it and use the resulting buffer to read the floats.
After that you can put the data into whatever structure you need.
edit: Here's an example of how to use the API.
;; first, I created a 96 byte file, then I started the repl
;; put some little endian floats in the file and close it
user=> (def file (java.io.RandomAccessFile. "foo.floats" "rw"))
#'user/file
user=> (def channel (.getChannel file))
#'user/channel
user=> (def buffer (.map channel java.nio.channels.FileChannel$MapMode/READ_WRITE 0 96))
#'user/buffer
user=> (.order buffer java.nio.ByteOrder/LITTLE_ENDIAN)
#<DirectByteBuffer java.nio.DirectByteBuffer[pos=0 lim=96 cap=96]>
user=> (def fbuffer (.asFloatBuffer buffer))
#'user/fbuffer
user=> (.put fbuffer 0 0.0)
#<DirectFloatBufferU java.nio.DirectFloatBufferU[pos=0 lim=24 cap=24]>
user=> (.put fbuffer 1 1.0)
#<DirectFloatBufferU java.nio.DirectFloatBufferU[pos=0 lim=24 cap=24]>
user=> (.put fbuffer 2 2.3)
#<DirectFloatBufferU java.nio.DirectFloatBufferU[pos=0 lim=24 cap=24]>
user=> (.close channel)
nil
;; memory map the file, try reading the floats w/o changing the endianness of the buffer
user=> (def file2 (java.io.RandomAccessFile. "foo.floats" "r"))
#'user/file2
user=> (def channel2 (.getChannel file2))
#'user/channel2
user=> (def buffer2 (.map channel2 java.nio.channels.FileChannel$MapMode/READ_ONLY 0 96))
#'user/buffer2
user=> (def fbuffer2 (.asFloatBuffer buffer2))
#'user/fbuffer2
user=> (.get fbuffer2 0)
0.0
user=> (.get fbuffer2 1)
4.6006E-41
user=> (.get fbuffer2 2)
4.1694193E-8
;; change the order of the buffer and read the floats
user=> (.order buffer2 java.nio.ByteOrder/LITTLE_ENDIAN)
#<DirectByteBufferR java.nio.DirectByteBufferR[pos=0 lim=96 cap=96]>
user=> (def fbuffer2 (.asFloatBuffer buffer2))
#'user/fbuffer2
user=> (.get fbuffer2 0)
0.0
user=> (.get fbuffer2 1)
1.0
user=> (.get fbuffer2 2)
2.3
user=> (.close channel2)
nil
user=>

Getting "java.io.EOFException: JSON error" using the clojure twitter-api

I've written some simple clojure code that accesses the twitter streaming api. My code is essentially the same as the example code described in the twitter-api docs:
(def ^:dynamic *custom-streaming-callback*
  (AsyncStreamingCallback. (comp println #(:text %) json/read-json #(str %2))
                           (comp println response-return-everything)
                           exception-print))

(defn start-filtering []
  (statuses-filter :params {:follow 12345}
                   :oauth-creds *creds*
                   :callbacks *custom-streaming-callback*))
I'm following tweets about a specific user and using oauth for authentication (not shown). When I run the start-filtering method and a connection is opened with twitter everything works well for a spell, but if the stream is inactive for a bit (around 30 seconds), i.e. no tweets about this particular user are coming down the pike, the following error occurs:
#<EOFException java.io.EOFException: JSON error (end-of-file)>
I assumed from the twitter docs that when using a streaming connection, twitter keeps the stream open indefinitely. I must be making some incorrect assumptions. I'm currently diving into the clojure twitter-api code to see what's going on, but I thought more eyes would help me figure this out more quickly.
I had the same issue that you have. As you found, the streaming function emits an empty message if no data has been received in the last thirty seconds or so.
Trying to read this as json then causes the EOF exception that you see.
I don't know of any way to prevent these calls. In my case I worked around the issue with a simple conditional that falls back to an empty map when there is no JSON to read.
(if-not (clojure.string/blank? %)
  (json/read-str % :key-fn keyword)
  {})
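Slotted into the question's callback, it might look like this (a sketch: the fn replaces the original comp chain, keeping json/read-json from the question's code):

(def ^:dynamic *custom-streaming-callback*
  (AsyncStreamingCallback.
   (fn [response baos]
     (let [payload (str baos)]
       ;; skip the keep-alive blanks the stream emits every ~30 seconds
       (when-not (clojure.string/blank? payload)
         (println (:text (json/read-json payload))))))
   (comp println response-return-everything)
   exception-print))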