Clojure: slurping structs from file fails with string attributes containing whitespaces - clojure

I have just started playing with Clojure and the first thing I thought I'd try is storing and retrieving a list of structs, like in Suart Halloway's example here.
My spit/slurp of a hash of structs works fine with, if I use struct instances without spaces in the attribute strings like the following:
(struct customer "Apple" "InfiniteLoop")
But if I use this:
(struct customer "Apple" "Infinite Loop 1")
I get an error:
Exception in thread "main" clojure.lang.LispReader$ReaderException: java.lang.ArrayIndexOutOfBoundsException: 7 (test-storing.clj:19)
at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:2719)
at clojure.lang.Compiler$DefExpr.eval(Compiler.java:298)
at clojure.lang.Compiler.eval(Compiler.java:4537)
at clojure.lang.Compiler.load(Compiler.java:4857)
at clojure.lang.Compiler.loadFile(Compiler.java:4824)
at clojure.main$load_script__5833.invoke(main.clj:206)
at clojure.main$init_opt__5836.invoke(main.clj:211)
at clojure.main$initialize__5846.invoke(main.clj:239)
at clojure.main$null_opt__5868.invoke(main.clj:264)
at clojure.main$legacy_script__5883.invoke(main.clj:295)
at clojure.lang.Var.invoke(Var.java:346)
at clojure.main.legacy_script(main.java:34)
at clojure.lang.Script.main(Script.java:20)
Caused by: clojure.lang.LispReader$ReaderException: java.lang.ArrayIndexOutOfBoundsException: 7
at clojure.lang.LispReader.read(LispReader.java:180)
at clojure.core$read__4168.invoke(core.clj:2083)
at clojure.core$read__4168.invoke(core.clj:2081)
at clojure.core$read__4168.invoke(core.clj:2079)
at clojure.core$read__4168.invoke(core.clj:2077)
at chap_03$load_db__54.invoke(chap_03.clj:71)
at clojure.lang.AFn.applyToHelper(AFn.java:173)
at clojure.lang.AFn.applyTo(AFn.java:164)
at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:2714)
... 12 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
at clojure.lang.PersistentArrayMap$Seq.first(PersistentArrayMap.java:216)
at clojure.lang.APersistentMap.hashCode(APersistentMap.java:101)
at clojure.lang.Util.hash(Util.java:55)
at clojure.lang.PersistentHashMap.entryAt(PersistentHashMap.java:134)
at clojure.lang.PersistentHashMap.containsKey(PersistentHashMap.java:130)
at clojure.lang.APersistentSet.contains(APersistentSet.java:33)
at clojure.lang.PersistentHashSet.cons(PersistentHashSet.java:59)
at clojure.lang.PersistentHashSet.create(PersistentHashSet.java:34)
at clojure.lang.LispReader$SetReader.invoke(LispReader.java:974)
at clojure.lang.LispReader$DispatchReader.invoke(LispReader.java:540)
at clojure.lang.LispReader.read(LispReader.java:145)
... 20 more
Depending on the amount of the fields in the struct, I might also just get a part of the string as an attribute name instead of the error. For example :Loop 1
I use a store-function like this:
(defn store-customer-db [customer-db filename]
(spit filename (with-out-str (print customer-db))))
And a read-function like this:
(defn load-db [filename]
(with-in-str (slurp filename)(read)))
From the output file of spit I can see that the print doesn't give double quotes to the strings which seems to be a problem for slurp. What would be the correct solution for this?
My Clojure version is 1.0, and the contrib is a few weeks old snapshot.

print and println are meant for human-readable output. If you want to print something that's meant to be read in again later, use pr or prn.
user> (read-string (with-out-str (prn {"Apple" "Infinite Loop"})))
{"Apple" "Infinite Loop"}
Whereas:
user> (read-string (with-out-str (print {"Apple" "Infinite Loop"})))
java.lang.ArrayIndexOutOfBoundsException: 3 (NO_SOURCE_FILE:0)
It's trying to run this code:
(read-string "{Apple Infinite Loop}")
which has an odd number of keys/values. Note the lack of quotation marks around the individual hash keys/values. Even if this read works (i.e. if you coincidentally supply an even number of parameters), what it reads won't be a hash-map full of Strings, but rather Symbols. So you'll be getting back something other than what you output.
user> (map class (keys (read-string (with-out-str (print {"foo bar" "baz quux"})))))
(clojure.lang.Symbol clojure.lang.Symbol)

For, say:
(def hashed-hobbits {:bilbo "Takes after his Mother's family" :frodo "ring bearer"})
You only need:
(spit "hobbitses.txt" hashed-hobbits)
and to read it back:
(def there-and-back-again (read-string (slurp "hobbitses.txt")))
spit/slurp wraps it all in a string but using read-string on the slurp interprets the string back to clojure code/data. Works on trollish data structures too!

Related

Why doesn't this keyword function lookup work in a hashmap?

I guess I need some eyeballs on this to make some sense of this
(println record)
(println (keys record) " - " (class record) " : " (:TradeId record) (:Stock record))
(doall (map #(println "Key " % "Value " (% record)) (keys record)))
Output:
{:Stock ATT, :AccountId 1, :TradeId 37, :Qty 100, :Price 117, :Date 2011-02-24T18:30:00.000Z, :Notes SPLIT 1:10, :Type B, :Brokerage 81.12}
(:Stock :AccountId :TradeId :Qty :Price :Date :Notes :Type :Brokerage) - clojure.lang.PersistentHashMap : nil ATT
Key :Stock Value ATT
Key :AccountId Value 1
Key :TradeId Value 37
...
The issue is (:TradeId record) doesn't work even though it exists as a key. Iterating through all keys and values - Line 3 - yields the right value.
Tried renaming the column in the csv but no change in behavior. I see no difference from the other columns (which work) except that this is the first column in the csv.
The hashmap is created from code like this - reading records from a CSV. Standard code from the clojure.data.csv package.
(->> (csv/read-csv reader)
csv-data->maps
(map #(my-function some-args %))
doall))
(defn csv-data->maps
"Return the csv records as a vector of maps"
[csv-data]
(map zipmap
(->> (first csv-data) ;; First row is the header
(map keyword) ;; Drop if you want string keys instead
repeat)
(rest csv-data)))
The "first column" thing is definitely suspicous and points to some invisible characters such as a BOM quietly attaching itself to your first keyword.
To debug try printing out the hex of the names of the keywords. And/or maybe you'll see something if you do a hex dump, e.g., with head -n 2 file.csv | od -x, of the first few lines of the input file.
I would try two things. First, print the type of each key. They should all be clojure.lang.Keyword, if the creation code you included is accurate and my-function preserves their type; but if you created it in some other way and misremembered, you might discover that the key is a symbol, or a string or something like that. In general, don't use println on anything but strings, because it's pretty low-fidelity. prn is better at conveying an accurate picture of your data - it's not perfect, but at least you can tell a string from a keyword with it.
Second, look at the printed values more carefully, e.g. with od -t x1 - or you could do it in process with something like:
(let [k (key (first m)), s (name k)]
(clojure.string/join " "
(for [c s]
(format "%02x" (int c)))))
If the result isn't "53 74 6f 63 6b", then you have some weird characters in your file - maybe nonprinting characters, maybe something that looks like a capital S but isn't, whatever.
Once I reached the point of trying anything, I copied the keyword from the REPL and pasted it into VSCode and sure enough - there was this weird looking character :?Id within the keyword. Using the weird keyword, the lookup worked.
Workaround: Added a dummy column as the first column.
Then things started to click into place, I remembered reading something on BOM in the csv reader project docs. https://github.com/clojure/data.csv#byte-order-mark
Downloaded a hexdump file viewer which confirmed the problem bytes at the start of the file.
o;?Id,AccountId,...
Final solution: Before passing the reader to the data.csv read function, skip over the unwanted bytes.
(.skip reader 1)
The world makes sense again.

Using script file like a function

In lua, I can do something like this:
a.lua:
return 1+2
b.lua:
print(dofile('a.lua'))
My question is: Does this kind of thing (using files like functions) has a standarized name in computer science? What languages besides lua support it? Particularly, can you do something like this in Clojure? (or Lisp)
The clojure equivalent would be
a.clj:
(+ 1 2)
b.clj:
(load-file "a.clj")
but please note that clojure namespaces would be the right approach.
In b.clj you can :use or :require/:refer select functions from any namespace.
To your second question you may want to lookup linkers.
(eval (read-string "(+ 1 2)"))
; 3
Use read instead of read-string to read from a file. But beware! This executes whatever code is in the file.
Note
read and read-string evaluate the first form they find. load-file (as in #KobbyPemson's answer) evaluates the lot, returning the result of the last, as though they were surrounded by an invisible do.
Given
(defn read-file [filename]
(with-open [r (java.io.PushbackReader. (clojure.java.io/reader filename))]
(eval (read r))))
(spit "testfile.txt" "(+ 1 2) 3 4")
then
(eval (read-string "(+ 1 2) 3 4"))
; 3
(read-file "testfile.txt")
; 3
(load-file "testfile.txt")
; 4
(adapted from ClojureDocs on read)

clojure: how to get values from lazy seq?

Iam new to clojure and need some help to get a value out of a lazy sequence.
You can have a look at my full data structure here: http://pastebin.com/ynLJaLaP
What I need is the content of the title:
{: _content AlbumTitel2}
I managed to get a list of all _content values:
(def albumtitle (map #(str (get % :title)) photosets))
(println albumtitle)
and the result is:
({:_content AlbumTitel2} {:_content test} {:_content AlbumTitel} {:_content album123} {:_content speciale} {:_content neues B5 Album} {:_content Album Nr 2})
But how can I get the value of every :_content?
Any help would be appreciated!
Thanks!
You could simply do this
(map (comp :_content :title) photosets)
Keywords work as functions, so the composition with comp will first retrieve the :title value of each photoset and then further retrieve the :_content value of that value.
Alternatively this could be written as
(map #(get-in % [:title :_content]) photosets)
A semi alternative solution is to do
(->> data
(map :title)
(map :_content))
This take advances of the fact that keywords are functions and the so called thread last macro. What it does is injecting the result of the first expression in as the last argument of the second etc..
The above code gets converted to
(map :_content (map :title data))
Clearly not as readable, and not easy to expand later either.
PS I asume something went wrong when the data was pasted to the web, because:
{: _content AlbumTitel2}
Is not Clojure syntax, this however is:
{:_content "AlbumTitel2"}
No the whitespace after :, and "" around text. Just in case you might want to paste some Clojure some other time.

One argument, many functions

I have an incoming lazy stream lines from a file I'm reading with tail-seq (to contrib - now!) and I want to process those lines one after one with several "listener-functions" that takes action depending on re-seq-hits (or other things) in the lines.
I tried the following:
(defn info-listener [logstr]
(if (re-seq #"INFO" logstr) (println "Got an INFO-statement")))
(defn debug-listener [logstr]
(if (re-seq #"DEBUG" logstr) (println "Got a DEBUG-statement")))
(doseq [line (tail-seq "/var/log/any/java.log")]
(do (info-listener logstr)
(debug-listener logstr)))
and it works as expected. However, there is a LOT of code-duplication and other sins in the code, and it's boring to update the code.
One important step seems to be to apply many functions to one argument, ie
(listen-line line '(info-listener debug-listener))
and use that instead of the boring and error prone do-statement.
I've tried the following seemingly clever approach:
(defn listen-line [logstr listener-collection]
(map #(% logstr) listener-collection))
but this only renders
(nil) (nil)
there is lazyiness or first class functions biting me for sure, but where do I put the apply?
I'm also open to a radically different approach to the problem, but this seems to be a quite sane way to start with. Macros/multi methods seems to be overkill/wrong for now.
Making a single function out of a group of functions to be called with the same argument can be done with the core function juxt:
=>(def juxted-fn (juxt identity str (partial / 100)))
=>(juxted-fn 50)
[50 "50" 2]
Combining juxt with partial can be very useful:
(defn listener [re message logstr]
(if (re-seq re logstr) (println message)))
(def juxted-listener
(apply juxt (map (fn [[re message]] (partial listner re message))
[[#"INFO","Got INFO"],
[#"DEBUG", "Got DEBUG"]]))
(doseq [logstr ["INFO statement", "OTHER statement", "DEBUG statement"]]
(juxted-listener logstr))
You need to change
(listen-line line '(info-listener debug-listener))
to
(listen-line line [info-listener debug-listener])
In the first version, listen-line ends up using the symbols info-listener and debug-listener themselves as functions because of the quoting. Symbols implement clojure.lang.IFn (the interface behind Clojure function invocation) like keywords do, i.e. they look themselves up in a map-like argument (actually a clojure.lang.ILookup) and return nil if applied to something which is not a map.
Also note that you need to wrap the body of listen-line in dorun to ensure it actually gets executed (as map returns a lazy sequence). Better yet, switch to doseq:
(defn listen-line [logstr listener-collection]
(doseq [listener listener-collection]
(listener logstr)))

Printing and reading lists from a file in Clojure

I have a Clojure data structure of the form:
{:foo '("bar" "blat")}
and have tried writing them to a file using the various pr/prn/print. However, each time the structure is written as
{:foo ("bar" "blat")}
then when I try to read in it using load-file, I get an error such as:
java.lang.ClassCastException: java.lang.String cannot be cast to clojure.lang.IF
n (build-state.clj:79)
presumably as the list is being evaluated as a function call when it is read. Is there any way to write the structure out with the lists in their quoted form?
thanks,
Nick
The inverse of printing is usually reading, not loading.
user> (read-string "{:foo (\"bar\" \"blat\")}")
{:foo ("bar" "blat")}
If you really need to print loadable code, you need to quote it twice.
user> (pr-str '{:foo '("bar" "blat")})
"{:foo (quote (\"bar\" \"blat\"))}"
user> (load-string (pr-str '{:foo '("bar" "blat")}))
{:foo ("bar" "blat")}