I would like to rewrite in scheme a script I made in clojure but I'm not sure how.
I wrote this simple script in clojure. It reads some csv files, processes them a bit and writes a new csv files.
It's based on a bunch of functions each accepting a map as an argument and returning a new map
The main loop uses the transducers. A transducer made of composing such functions in the right order. Like this
(def step1 (mapcat (comp
op/line-numbers
op/station
op/added-file-order
op/splitted-file
op/ingested-file)
))
The transducer is then made into a a lazy sequence. Like this
(defn thread [path]
(sequence
(comp step1 step2 step3 step4)
(op/files-collection path)))
The sequence is then lazily written to a file.
I'm mumbling about implementing the same functionality (and in the future maybe more) in guile scheme
I know scheme has streams (as lazy sequences) but I'm not sure the semantics is the same as in clojure.
How would such a thing be made in scheme ? What's the idiomatic scheme version of such a thing ?
Related
I have been using Clojure, ClojureScript, lein, shadow-cljs, re-frame,
reagent, Emacs, and CIDER to work on a Clojure/ClojureScript dynamic
web app project. I am new to Clojure.
At some point in the codebase there is a big use of doall command followed by the use of reduce in order to generate hiccup (HTML renderer):
(doall
(reduce
(fn ...) ...)
[] ; hiccup-output
project-variable)
I am new to Clojure. But this felt weird to me considering documentation:
When lazy sequences are produced via functions that have side
effects, any effects other than those needed to produce the first
element in the seq do not occur until the seq is consumed. doall can
be used to force any effects. Walks through the successive nexts of
the seq, retains the head and returns it, thus causing the entire
seq to reside in memory at one time.
1 - Isn't doall supposed to be used with lazy sequences?
2 - I believe reduce is not one. Am I wrong?
3 - If doall should not be used with reduce in this case, what would be the recommendation for a refactoring here?
Yes, doall only makes sense with lazy collections
reduce does not by itself create a lazy collection. But the reducing function (the first argument to reduce) might end up returning a lazy collection - in that case, doall might make sense
Impossible to say because the code that you've provided is incomplete and the parentheses are unbalanced
How in Clojure process collections like in Java streams - one by one thru all the functions instead of evaluating all the elements in all the stack frame. Also I would describe it as Unix pipes (next program pulls chunk by chunk from previous one).
As far as I understand your question, you may want to look into two things.
First, understand the sequence abstraction. This is a way of looking at collections which consumes them one by one and lazily. It is an important Clojure idiom and you'll meet well known functions like map, filter, reduce, and many more. Also the macro ->>, which was already mentioned in a comment, will be important.
After that, when you want to dig deeper, you probably want to look into transducers and reducers. In a grossly oversimplifying summary, they allow you combine several lazy functions into one function and then process a collection with less laziness, less memory consumption, more performance, and possibly on several threads. I consider these to be advanced topics, though. Maybe the sequences are already what you were looking for.
Here is a simple example from ClojureDocs.org
;; Use of `->` (the "thread-first" macro) can help make code
;; more readable by removing nesting. It can be especially
;; useful when using host methods:
;; Arguably a bit cumbersome to read:
user=> (first (.split (.replace (.toUpperCase "a b c d") "A" "X") " "))
"X"
;; Perhaps easier to read:
user=> (-> "a b c d"
.toUpperCase
(.replace "A" "X")
(.split " ")
first)
"X"
As always, don't forget the Clojure CheatSheet or Clojure for the Brave and True.
(def evil-code (str "(" (slurp "/mnt/src/git/clj/clojure/src/clj/clojure/core.clj") ")" ))
(def r (read-string evil-code ))
Works, but unsafe
(def r (clojure.edn/read-string evil-code))
RuntimeException Map literal must contain an even number of forms clojure.lang.Util.runtimeException (Util.java:219)
Does not work...
How to read Clojure code (presering all '#'s as themselves is desirable) into a tree safely? Imagine a Clojure antivirus that want to scan the code for threats and wants to work with data structure, not with plain text.
First of all you should never read clojure code directly from untrusted data sources. You should use EDN or another serialization format instead.
That being said since Clojure 1.5 there is a kind of safe way to read strings without evaling them. You should bind the read-eval var to false before using read-string. In Clojure 1.4 and earlier this potentially resulted in side effects caused by java constructors being invoked. Those problems have since been fixed.
Here is some example code:
(defn read-string-safely [s]
(binding [*read-eval* false]
(read-string s)))
(read-string-safely "#=(eval (def x 3))")
=> RuntimeException EvalReader not allowed when *read-eval* is false. clojure.lang.Util.runtimeException (Util.java:219)
(read-string-safely "(def x 3)")
=> (def x 3)
(read-string-safely "#java.io.FileWriter[\"precious-file.txt\"]")
=> RuntimeException Record construction syntax can only be used when *read-eval* == true clojure.lang.Util.runtimeException (Util.java:219)
Regarding reader macro's
The dispatch macro (#) and tagged literals are invoked at read time. There is no representation for them in Clojure data since by that time these constructs all have been processed. As far as I know there is no build in way to generate a syntax tree of Clojure code.
You will have to use an external parser to retain that information. Either you roll your own custom parser or you can use a parser generator like Instaparse and ANTLR. A complete Clojure grammar for either of those libraries might be hard to find but you could extend one of the EDN grammars to include the additional Clojure forms. A quick google revealed an ANTLR grammar for Clojure syntax, you could alter it to support the constructs that are missing if needed.
There is also Sjacket a library made for Clojure tools that need to retain information about the source code itself. It seems like a good fit for what you are trying to do but I don't have any experience with it personally. Judging from the tests it does have support for reader macro's in its parser.
According to the current documentation you should never use read nor read-string to read from untrusted data sources.
WARNING: You SHOULD NOT use clojure.core/read or
clojure.core/read-string to read data from untrusted sources. They
were designed only for reading Clojure code and data from trusted
sources (e.g. files that you know you wrote yourself, and no one
else has permission to modify them).
You should use read-edn or clojure.edn/read which were designed with that purpose in mind.
There was a long discussion in the mailing list regarding the use of read and read-eval and best practices regarding those.
I wanted to point out an old library (used in LightTable) that uses read-stringwith a techniques to propose a client/server communication
Fetch : A ClojureScript library for Client/Server interaction.
You can see in particular the safe-read method :
(defn safe-read [s]
(binding [*read-eval* false]
(read-string s)))
You can see the use of binding *read-eval* to false.
I think the rest of the code is worth watching at for the kind of abstractions it proposes.
In a PR, it is suggested that there is a security problem that can be fixed by using edn instead (...aaand back to your question) :
(require '[clojure.edn :as edn])
(defn safe-read [s]
(edn/read-string s))
I'm trying to use create a Clojure seq from some iterative Java library code that I inherited. Basically what the Java code does is read records from a file using a parser, sends those records to a processor and returns an ArrayList of result. In Java this is done by calling parser.readData(), then parser.getRecord() to get a record then passing that record into processor.processRecord(). Each call to parser.readData() returns a single record or null if there are no more records. Pretty common pattern in Java.
So I created this next-record function in Clojure that will get the next record from a parser.
(defn next-record
"Get the next record from the parser and process it."
[parser processor]
(let [datamap (.readData parser)
row (.getRecord parser datamap)]
(if (nil? row)
nil
(.processRecord processor row 100))))
The idea then is to call this function and accumulate the records into a Clojure seq (preferably a lazy seq). So here is my first attempt which works great as long as there aren't too many records:
(defn datamap-seq
"Returns a lazy seq of the records using the given parser and processor"
[parser processor]
(lazy-seq
(when-let [records (next-record parser processor)]
(cons records (datamap-seq parser processor)))))
I can create a parser and processor, and do something like (take 5 (datamap-seq parser processor)) which gives me a lazy seq. And as expected getting the (first) of that seq only realizes one element, doing count realizes all of them, etc. Just the behavior I would expect from a lazy seq.
Of course when there are a lot of records I end up with a StackOverflowException. So my next attempt was to use loop-recur to do the same thing.
(defn datamap-seq
"Returns a lazy seq of the records using the given parser and processor"
[parser processor]
(lazy-seq
(loop [records (seq '())]
(if-let [record (next-record parser processor)]
(recur (cons record records))
records))))
Now using this the same way and defing it using (def results (datamap-seq parser processor)) gives me a lazy seq and doesn't realize any elements. However, as soon as I do anything else like (first results) it forces the realization of the entire seq.
Can anyone help me understand where I'm going wrong in the second function using loop-recur that causes it to realize the entire thing?
UPDATE:
I've looked a little closer at the stack trace from the exception and the stack overflow exception is being thrown from one of the Java classes. BUT it only happens when I have the datamap-seq function like this (the one I posted above actually does work):
(defn datamap-seq
"Returns a lazy seq of the records using the given parser and processor"
[parser processor]
(lazy-seq
(when-let [records (next-record parser processor)]
(cons records (remove empty? (datamap-seq parser processor))))))
I don't really understand why that remove causes problems, but when I take it out of this funciton it all works right (I'm doing the removal of empty lists somewhere else now).
loop/recur loops within the loop expression until the recursion runs out. adding a lazy-seq around it won't prevent that.
Your first attempt with lazy-seq / cons should already work as you want, without stack overflows. I can't spot right now what the problem with it is, though it might be in the java part of the code.
I'll post here addition to Joost's answer. This code:
(defn integers [start]
(lazy-seq
(cons
start
(integers (inc start)))))
will not throw StackOverflowExceptoin if I do something like this:
(take 5 (drop 1000000 (integers)))
EDIT:
Of course better way to do it would be to (iterate inc 0). :)
EDIT2:
I'll try to explain a little how lazy-seq works. lazy-seq is a macro that returns seq-like object. Combined with cons that doesn't realize its second argument until it is requested you get laziness.
Now take a look at how LazySeq class is implemented. LazySeq.sval triggers computation of the next value which returns another instance of "frozen" lazy sequence. Method LazySeq.seq even better shows mechanics behind the concept. Notice that to fully realize sequence it uses while loop. It in itself means that stack trace use is limited to short function calls that return another instances of LazySeq.
I hope this makes any sense. I described what I could deduce from the source code. Please let me know if I made any mistakes.
I have a defrecord called a bag. It behaves like a list of item to count. This is sometimes called a frequency or a census. I want to be able to do the following
(def b (bag/create [:k 1 :k2 3])
(keys bag)
=> (:k :k1)
I tried the following:
(defrecord MapBag [state]
Bag
(put-n [self item n]
(let [new-n (+ n (count self item))]
(MapBag. (assoc state item new-n))))
;... some stuff
java.util.Map
(getKeys [self] (keys state)) ;TODO TEST
Object
(toString [self]
(str ("Bag: " (:state self)))))
When I try to require it in a repl I get:
java.lang.ClassFormatError: Duplicate interface name in class file compile__stub/techne/bag/MapBag (bag.clj:12)
What is going on? How do I get a keys function on my bag? Also am I going about this the correct way by assuming clojure's keys function eventually calls getKeys on the map that is its argument?
Defrecord automatically makes sure that any record it defines participates in the ipersistentmap interface. So you can call keys on it without doing anything.
So you can define a record, and instantiate and call keys like this:
user> (defrecord rec [k1 k2])
user.rec
user> (def a-rec (rec. 1 2))
#'user/a-rec
user> (keys a-rec)
(:k1 :k2)
Your error message indicates that one of your declarations is duplicating an interface that defrecord gives you for free. I think it might actually be both.
Is there some reason why you cant just use a plain vanilla map for your purposes? With clojure, you often want to use plain vanilla data structures when you can.
Edit: if for whatever reason you don't want the ipersistentmap included, look into deftype.
Rob's answer is of course correct; I'm posting this one in response to the OP's comment on it -- perhaps it might be helpful in implementing the required functionality with deftype.
I have once written an implementation of a "default map" for Clojure, which acts just like a regular map except it returns a fixed default value when asked about a key not present inside it. The code is in this Gist.
I'm not sure if it will suit your use case directly, although you can use it to do things like
user> (:earth (assoc (DefaultMap. 0 {}) :earth 8000000000))
8000000000
user> (:mars (assoc (DefaultMap. 0 {}) :earth 8000000000))
0
More importantly, it should give you an idea of what's involved in writing this sort of thing with deftype.
Then again, it's based on clojure.core/emit-defrecord, so you might look at that part of Clojure's sources instead... It's doing a lot of things which you won't have to (because it's a function for preparing macro expansions -- there's lots of syntax-quoting and the like inside it which you have to strip away from it to use the code directly), but it is certainly the highest quality source of information possible. Here's a direct link to that point in the source for the 1.2.0 release of Clojure.
Update:
One more thing I realised might be important. If you rely on a special map-like type for implementing this sort of thing, the client might merge it into a regular map and lose the "defaulting" functionality (and indeed any other special functionality) in the process. As long as the "map-likeness" illusion maintained by your type is complete enough for it to be used as a regular map, passed to Clojure's standard function etc., I think there might not be a way around that.
So, at some level the client will probably have to know that there's some "magic" involved; if they get correct answers to queries like (:mars {...}) (with no :mars in the {...}), they'll have to remember not to merge this into a regular map (merge-ing the other way around would work fine).