line-seq freezes on java.io.BufferedReader in clojure - clojure

I'm trying to process an HTTP stream using clojure.
I am able to write the stream to a file, but I'm trying to process the messages using core.async.
I followed this answer here:
Processing a stream of messages from a http server in clojure
However, when I call line-seq on the java.io.BufferedReader, it freezes.
(defn trades-stream
[]
(let [session (new-session)
{:keys [url sessionid]} (:stream session)
dump-url (str url "?sessionid=" sessionid "&symbols=mu" )
lines (-> dump-url
(client/get {:as :stream})
:body
io/reader)]
(line-seq lines )))
Any idea how I would remedy this? Thanks!

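Note that line-seq returns a lazy sequence: lines are read from the reader only as the sequence is consumed, so nothing useful happens until you force it, for example by printing an element. Perhaps try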
(println (first (line-seq lines)))
or
(reduce conj [] (line-seq lines)) ; then print something
You can also use (slurp <input-stream>) to get the contents as a string.
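As a minimal sketch of consuming only part of a lazy line sequence: the helper below realizes the first n lines and then stops. The name take-lines is hypothetical, and a StringReader stands in for the HTTP response body. Keep in mind that slurp reads until end of stream, so on a connection that never closes it would not return.

```clojure
(import '(java.io BufferedReader StringReader))

(defn take-lines
  "Eagerly realizes the first n lines from rdr and stops consuming after that.
  take-lines is a hypothetical helper, not part of clj-http or clojure.core."
  [^BufferedReader rdr n]
  (doall (take n (line-seq rdr))))

;; A StringReader stands in for the streaming HTTP body (assumption for the demo).
(def demo-reader
  (BufferedReader. (StringReader. "trade-1\ntrade-2\ntrade-3\n")))

(println (take-lines demo-reader 2))
```

With a real stream, the same call would block only until n complete lines have arrived.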

Related

Generate and stream a zip-file in a Ring web app in Clojure

I have a Ring handler that needs to:
Zip a few files
Stream the Zip to the client.
Now I have it sort of working, but only the first zipped entry gets streamed, and after that it stalls/stops. I feel it has something to do with flushing/streaming that is wrong.
Here is my (compojure) handler:
(GET "/zip" {:as request}
:query-params [order-id :- s/Any]
(stream-lessons-zip (read-string order-id) (:db request) (:auth-user request)))
Here is the stream-lessons-zip function:
(defn stream-lessons-zip
[]
(let [lessons ...];... not shown
{:status 200
:headers {"Content-Type" "application/zip, application/octet-stream"
"Content-Disposition" (str "attachment; filename=\"files.zip\"")}
:body (futil/zip-lessons lessons)}))
And I use a piped-input-stream to do the streaming, like so:
(defn zip-lessons
"Returns an inputstream (piped-input-stream) to be used directly in Ring HTTP responses"
[lessons]
(let [paths (map #(select-keys % [:file_path :file_name]) lessons)]
(ring-io/piped-input-stream
(fn [output-stream]
; build a zip-output-stream from a normal output-stream
(with-open [zip-output-stream (ZipOutputStream. output-stream)]
(doseq [{:keys [file_path file_name] :as p} paths]
(let [f (cio/file file_path)]
(.putNextEntry zip-output-stream (ZipEntry. file_name))
(cio/copy f zip-output-stream)
(.closeEntry zip-output-stream))))))))
So I have confirmed that the 'lessons' vector contains about 4 entries, but the zip file contains only 1 entry. Furthermore, Chrome doesn't seem to 'finalize' the download, i.e. it thinks it is still downloading.
How can I fix this?
It sounds like producing a stateful stream using blocking IO is not supported by http-kit. Non-stateful streams can be done this way:
http://www.http-kit.org/server.html#async
A PR to introduce stateful streams using blocking IO was not accepted:
https://github.com/http-kit/http-kit/pull/181
The option to explore seems to be using a ByteArrayOutputStream to render the zip file fully in memory and then returning the resulting buffer. If this endpoint isn't highly trafficked and the zip file it produces is not large (< 1 GB), this might work.
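A minimal sketch of that in-memory approach, using only java.util.zip. The function name zip-to-bytes and the shape of its input (pairs of entry name and byte array) are assumptions made for illustration; they are not taken from the handler above.

```clojure
(import '(java.io ByteArrayOutputStream)
        '(java.util.zip ZipEntry ZipOutputStream))

(defn zip-to-bytes
  "Builds a complete zip archive in memory and returns it as a byte array.
  entries is a seq of [entry-name byte-array] pairs (hypothetical input shape)."
  [entries]
  (let [baos (ByteArrayOutputStream.)]
    ;; Closing the ZipOutputStream finishes the archive (writes the central directory).
    (with-open [zos (ZipOutputStream. baos)]
      (doseq [[entry-name content] entries]
        (let [^bytes bs content]
          (.putNextEntry zos (ZipEntry. (str entry-name)))
          (.write zos bs 0 (alength bs))
          (.closeEntry zos))))
    (.toByteArray baos)))
```

The resulting bytes could then be wrapped in a java.io.ByteArrayInputStream and used as the :body of the Ring response, at the cost of holding the whole archive in memory.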
So, it's been a few years, but that code still runs in production (i.e. it works). I got it working back then but forgot to mention it here (and, to be honest, forgot WHY it works; it was very much trial and error).
This is the code now:
(defn zip-lessons
"Returns an inputstream (piped-input-stream) to be used directly in Ring HTTP responses"
[lessons {:keys [firstname surname order_favorite_name company_name] :as annotation
:or {order_favorite_name ""
company_name ""
firstname ""
surname ""}}]
(debug "zipping lessons" (count lessons))
(let [paths (map #(select-keys % [:file_path :file_name :folder_number]) lessons)]
(ring-io/piped-input-stream
(fn [output-stream]
; build a zip-output-stream from a normal output-stream
(with-open [zip-output-stream (ZipOutputStream. output-stream)]
(doseq [{:keys [file_path file_name folder_number] :as p} paths]
(let [f (cio/as-file file_path)
baos (ByteArrayOutputStream.)]
(if (.exists f)
(do
(debug "Adding entry to zip:" file_name "at" file_path)
(let [zip-entry (ZipEntry. (str (if folder_number (str folder_number "/") "") file_name))]
(.putNextEntry zip-output-stream zip-entry)
(cio/copy f baos) ; fill the buffer from the file; without this step the entry would be empty
(.close baos)
(.writeTo baos zip-output-stream)
(.closeEntry zip-output-stream)
(.flush zip-output-stream)
(debug "flushed")))
(warn "File '" file_name "' at '" file_path "' does not exist, not adding to zip file!"))))
(.flush zip-output-stream)
(.flush output-stream)
(.finish zip-output-stream)
(.close zip-output-stream))))))

Consuming file contents with Clojure's core.async

I'm trying to use Clojure's core.async library to consume/process lines from a file. When my code executes, an IOException: Stream closed is thrown. Below is a REPL session that reproduces the same problem as in my code:
(require '[clojure.core.async :as async])
(require '[clojure.java.io :as io])
; my real code is a bit more involved with calls to drop, map, filter
; following line-seq
(def lines
(with-open [reader (io/reader "my-file.txt")]
(line-seq reader)))
(def ch
(let [c (async/chan)]
(async/go
(doseq [ln lines]
(async/>! c ln))
(async/close! c))
c))
; line that causes the error
; java.io.IOException: Stream closed
(async/<!! ch)
Since this is my first time doing something like this (async + file), maybe I have some misconceptions about how it should work. Can someone clarify the correct approach for sending file lines into a channel pipeline?
Thanks!
As @Alan pointed out, your definition of lines closes the file without reading all of its lines, because line-seq returns a lazy sequence. If you expand your use of the with-open macro...
(macroexpand-1
'(with-open [reader (io/reader "my-file.txt")]
(line-seq reader)))
... you get this:
(clojure.core/let [reader (io/reader "my-file.txt")]
(try
(clojure.core/with-open []
(line-seq reader))
(finally
(. reader clojure.core/close))))
You can fix this problem by closing the file after you finish reading from it, rather than immediately:
(def ch
(let [c (async/chan)]
(async/go
(with-open [reader (io/reader "my-file.txt")]
(doseq [ln (line-seq reader)]
(async/>! c ln)))
(async/close! c))
c))
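The same pitfall can be reproduced without core.async. The sketch below is illustrative only: the function names are hypothetical, and a StringReader stands in for the file. The first function lets the lazy sequence escape with-open, while the second realizes every line before the reader is closed.

```clojure
(import '(java.io BufferedReader StringReader))

(defn lazy-lines
  "Returns a lazy seq that outlives its reader; forcing it later fails."
  [text]
  (with-open [rdr (BufferedReader. (StringReader. text))]
    (line-seq rdr)))

(defn eager-lines
  "Realizes all lines into a vector while the reader is still open."
  [text]
  (with-open [rdr (BufferedReader. (StringReader. text))]
    (into [] (line-seq rdr))))

(println (eager-lines "a\nb\nc\n"))

;; Forcing the escaped lazy seq touches the closed reader and throws.
(println (try
           (doall (lazy-lines "a\nb\nc\n"))
           (catch Exception e
             (str "failed: " (.getMessage e)))))
```

Realizing everything up front only makes sense when the input fits in memory; for large files, keeping the consumer inside with-open, as in the answer above, avoids that cost.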
Your problem is the with-open statement. The file is closed as soon as its scope is exited, so you create the line-seq and then close the file before the lines have been read.
For most files you will be better off using the slurp function, which reads the whole file into a string:
(require '[clojure.string :as str])
(def file-as-str (slurp "my-file.txt"))
(def lines (str/split-lines file-as-str))
See:
http://clojuredocs.org/clojure.core/slurp
http://clojuredocs.org/clojure.string/split-lines

get agent content and convert it to JSON

I'm still a newbie in Clojure, and I'm trying to build an application that reads two files and writes the difference to a JSON file.
(defn read-csv
"reads data."
[]
(with-open [rdr (
io/reader "resources/staples_data.csv")]
(doseq [line (rest(line-seq rdr))]
(println(vec(re-seq #"[^,]+" line))))))
(defn read-psv
"reads data."
[]
(with-open [rdr (
io/reader "resources/external_data.psv")]
(doseq [line (rest(line-seq rdr))]
; (print(vec(re-seq #"[^|]+" line))))))
(doall(vec(re-seq #"[^|]+" line))))))
(defn process-content []
(let [csv-records (agent read-csv)
psv-records (agent read-psv)]
(json/write-str {"my-data" @csv-records "other-data" @psv-records})))
I'm getting an exception: Exception Don't know how to write JSON of class $read_csv clojure.data.json/write-generic (json.clj:385)
Could someone help with an explanation? Thanks in advance!
You are giving the agent a function as its initial value. Perhaps you meant to make an asynchronous call to that function instead? In that case, a future is a better match for your scenario, as shown below. Creating an agent is synchronous; it is send and send-off that are asynchronous, and they assume you are propagating some state across calls, which doesn't match your usage here.
(defn process-content []
(let [csv-records (future-call read-csv)
psv-records (future-call read-psv)]
(json/write-str {"my-data" @csv-records "other-data" @psv-records})))
The problem after that is that doseq is only for side effects, and always returns nil. If you want the results read from the csv files (evaluating eagerly so you are still in the scope of the with-open call), use (doall (for ...)) as a replacement for (doseq ...). Also, the println in read-csv will need to be removed, or replaced with (doto (vec (re-seq #"[^,]+" line)) println) because println always returns nil, and I assume you want the actual data from the file, not a list of nils.
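A minimal sketch of the (doall (for ...)) replacement described above. The function name read-records and its parameters are hypothetical, and a StringReader stands in for the resource files so that the example is self-contained.

```clojure
(require '[clojure.java.io :as io])

(defn read-records
  "Skips the header line and returns each remaining line as a vector of fields.
  doall forces the lazy for while the reader is still open."
  [source field-pattern]
  (with-open [rdr (io/reader source)]
    (doall
      (for [line (rest (line-seq rdr))]
        (vec (re-seq field-pattern line))))))

;; Stand-in data (assumption); a real call would pass a file path instead.
(println (read-records (java.io.StringReader. "id,name\n1,pen\n2,ink\n")
                       #"[^,]+"))
```

Because the function now returns data rather than nil, it can be wrapped in a future and dereferenced when building the JSON map.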

Errors when trying to pass args in clojure using clj-http

Hopefully this is something simple for the more experienced out there. I am using clj-http and trying to pass a command-line arg into it (to take a URL). I am an absolute Clojure beginner, but I have managed to pass the args through to a println, which works.
(ns foo.core
(:require [clj-http.client :as client]))
(defn -main
[& args]
(def url (str args))
(println url)
(def resp (client/get url))
(def headers (:headers resp))
(def server (headers "server"))
(println server))
Error message
Ants-MacBook-Pro:target ant$ lein run "http://www.bbc.com"
("http://www.bbc.com")
Exception in thread "main" java.net.MalformedURLException: no protocol: ("http://www.bbc.com")
This works
(def resp (client/get "http://www.bbc.com"))
thanks in advance.
args is a list, which means that calling str on it returns the representation of the list, complete with parentheses and inner quotes, as you can see in your error trace:
(println (str '("http://www.bbc.com")))
;; prints ("http://www.bbc.com")
Of course, URLs don't start with parentheses and quotes, which is why the JVM tells you your URL is malformed.
What you really want to pass to get is not the string representation of your argument list, but your first argument:
(let [url (first args)]
(client/get url)) ;; Should work!
In addition, you should never use def calls within functions: they create or rebind vars at the top level of your namespace, which you don't want.
What you should be using instead is let forms, which create local variables (like url in my example). For more information on let, look at http://clojure.org/special_forms.
I'd probably structure your code like so:
(defn -main
[& args]
(let [url (first args)
resp (client/get url)
server (get-in resp [:headers "server"])]
(println url)
(println server)))

URL Checker in Clojure?

I have a URL checker that I use in Perl. I was wondering how something like this would be done in Clojure. I have a file with thousands of URLs and I'd like the output file to contain the URL (minus http://, https://) and a simple :1 for valid and :0 for false. Ideally, I could check each site concurrently, considering that this is one of Clojure's strengths.
Input
http://www.google.com
http://www.cnn.com
http://www.msnbc.com
http://www.abadurlisnotgood.com
Output
www.google.com:1
www.cnn.com:1
www.msnbc.com:1
www.abadurlisnotgood.com:0
I assume by "valid URL" you mean HTTP response 200. This might work. It requires clojure-contrib. Change map to pmap to attempt to make it parallel, like Arthur Ulfeldt mentioned.
(use '(clojure.contrib duck-streams
java-utils
str-utils))
(import '(java.net URL
URLConnection
HttpURLConnection
UnknownHostException))
(defn check-url [url]
(str (re-sub #"^(?i)http:/+" "" url)
":"
(try
(let [c (cast HttpURLConnection
(.openConnection (URL. url)))]
(if (= 200 (.getResponseCode c))
1
0))
(catch UnknownHostException _
0))))
(defn check-urls-from-file [filename]
(doseq [line (map check-url
(read-lines (as-file filename)))]
(println line)))
Given your example as input:
user> (check-urls-from-file "urls.txt")
www.google.com:1
www.cnn.com:1
www.msnbc.com:1
www.abadurlisnotgood.com:0
Write a small function that checks a single URL, then use pmap to apply it to all the URLs in parallel, appending ":1" or ":0" to each:
(defn check-a-url [url] .... )
(pmap #(if (check-a-url %) (str % ":1") (str % ":0")) urls) ; urls: the collection of URL strings
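To make the pmap idea concrete, here is a hedged sketch in which the actual check is passed in as a predicate, so no network access is needed to try it. The names label-url and label-urls are hypothetical, and the predicate used in the demo is only a stand-in for a real HTTP check.

```clojure
(require '[clojure.string :as str])

(defn label-url
  "Strips the http:// or https:// prefix and appends :1 or :0
  depending on what the supplied predicate returns for the URL."
  [reachable? url]
  (str (str/replace-first url #"(?i)^https?://" "")
       ":"
       (if (reachable? url) 1 0)))

(defn label-urls
  "Applies label-url to every URL in parallel; results keep the input order."
  [reachable? urls]
  (doall (pmap #(label-url reachable? %) urls)))

;; Stand-in predicate (assumption): a real one would perform the request.
(println (label-urls #(str/includes? % "google")
                     ["http://www.google.com" "https://www.abadurlisnotgood.com"]))
(shutdown-agents)
```

pmap runs on the same thread pool as futures, so a script that uses it may need shutdown-agents at the end to exit promptly.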
Clojure now has an as-url function in clojure.java.io:
(as-url "http://google.com") ;;=> #object[java.net.URL 0x5dedf9bd "http://google.com"]
(str (as-url "http://google.com")) ;;=> "http://google.com"
(as-url "notanurl") ;; java.net.MalformedURLException
Based on that we could write a function like so:
(defn check-url
"checks if the url is well formed"
[url]
(str (clojure.string/replace-first url #"(http://|https://)" "")
":"
(try (as-url url) ;; built-in, does not perform an actual request, and does very little validation
1
(catch Exception e 0))))
(defn check-urls-from-file
"from Brian Carper answer"
[filename]
(doseq [line (map check-url (read-lines (as-file filename)))]
(println line)))
Instead of pmap, I used agents with send-off in conjunction with the above solution. I think this is better when there is blocking I/O. I believe pmap has limited concurrency too. Here's what I have so far. I wonder how this will scale with thousands of URLs.
(use '(clojure.contrib duck-streams
java-utils
str-utils))
(import '(java.net URL
URLConnection
HttpURLConnection
UnknownHostException))
(defn check-url [url]
(str (re-sub #"^(?i)http:/+" "" url)
":"
(try
(let [c (cast HttpURLConnection
(.openConnection (URL. url)))]
(if (= 200 (.getResponseCode c))
1
0))
(catch UnknownHostException _
0))))
(def urls (read-lines "urls.txt"))
(def agents (for [url urls] (agent url)))
(doseq [agent agents]
(send-off agent check-url))
(apply await agents)
(def x '())
(doseq [url (filter deref agents)]
(def x (cons @url x)))
(prn x)
(shutdown-agents)
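As a possible tidier variant of the agent approach, the results can be collected by dereferencing the agents directly instead of rebinding a var inside doseq. This is a sketch under assumptions: check-all is a hypothetical name, and the check function is passed in so the example runs without network access.

```clojure
(defn check-all
  "Starts one agent per URL, runs check on each via send-off,
  waits for all of them, and returns the results in input order."
  [check urls]
  (let [agents (mapv agent urls)]
    (doseq [a agents]
      (send-off a check))
    ;; await blocks until every action sent so far has completed.
    (apply await agents)
    (mapv deref agents)))

;; Stand-in check (assumption): a real one would be check-url from above.
(println (check-all #(str % ":1") ["www.google.com" "www.cnn.com"]))
(shutdown-agents)
```

send-off uses an unbounded thread pool, so with thousands of URLs this could open thousands of connections at once; whether that is acceptable depends on the environment.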