How to safely read untrusted Clojure code (not just some serialized data)? - clojure

(def evil-code (str "(" (slurp "/mnt/src/git/clj/clojure/src/clj/clojure/core.clj") ")" ))
(def r (read-string evil-code ))
Works, but unsafe
(def r (clojure.edn/read-string evil-code))
RuntimeException Map literal must contain an even number of forms clojure.lang.Util.runtimeException (Util.java:219)
Does not work...
How to read Clojure code (presering all '#'s as themselves is desirable) into a tree safely? Imagine a Clojure antivirus that want to scan the code for threats and wants to work with data structure, not with plain text.

First of all you should never read clojure code directly from untrusted data sources. You should use EDN or another serialization format instead.
That being said since Clojure 1.5 there is a kind of safe way to read strings without evaling them. You should bind the read-eval var to false before using read-string. In Clojure 1.4 and earlier this potentially resulted in side effects caused by java constructors being invoked. Those problems have since been fixed.
Here is some example code:
(defn read-string-safely [s]
(binding [*read-eval* false]
(read-string s)))
(read-string-safely "#=(eval (def x 3))")
=> RuntimeException EvalReader not allowed when *read-eval* is false. clojure.lang.Util.runtimeException (Util.java:219)
(read-string-safely "(def x 3)")
=> (def x 3)
(read-string-safely "#java.io.FileWriter[\"precious-file.txt\"]")
=> RuntimeException Record construction syntax can only be used when *read-eval* == true clojure.lang.Util.runtimeException (Util.java:219)
Regarding reader macro's
The dispatch macro (#) and tagged literals are invoked at read time. There is no representation for them in Clojure data since by that time these constructs all have been processed. As far as I know there is no build in way to generate a syntax tree of Clojure code.
You will have to use an external parser to retain that information. Either you roll your own custom parser or you can use a parser generator like Instaparse and ANTLR. A complete Clojure grammar for either of those libraries might be hard to find but you could extend one of the EDN grammars to include the additional Clojure forms. A quick google revealed an ANTLR grammar for Clojure syntax, you could alter it to support the constructs that are missing if needed.
There is also Sjacket a library made for Clojure tools that need to retain information about the source code itself. It seems like a good fit for what you are trying to do but I don't have any experience with it personally. Judging from the tests it does have support for reader macro's in its parser.

According to the current documentation you should never use read nor read-string to read from untrusted data sources.
WARNING: You SHOULD NOT use clojure.core/read or
clojure.core/read-string to read data from untrusted sources. They
were designed only for reading Clojure code and data from trusted
sources (e.g. files that you know you wrote yourself, and no one
else has permission to modify them).
You should use read-edn or clojure.edn/read which were designed with that purpose in mind.
There was a long discussion in the mailing list regarding the use of read and read-eval and best practices regarding those.

I wanted to point out an old library (used in LightTable) that uses read-stringwith a techniques to propose a client/server communication
Fetch : A ClojureScript library for Client/Server interaction.
You can see in particular the safe-read method :
(defn safe-read [s]
(binding [*read-eval* false]
(read-string s)))
You can see the use of binding *read-eval* to false.
I think the rest of the code is worth watching at for the kind of abstractions it proposes.
In a PR, it is suggested that there is a security problem that can be fixed by using edn instead (...aaand back to your question) :
(require '[clojure.edn :as edn])
(defn safe-read [s]
(edn/read-string s))

Related

is clojure open to whatever works regarding naming conventions (of fns), especially regarding generic-like fns as partials

I'm quite new to functional programming and clojure. I like it.
I'd like to know what the community thinks about following fn-naming approach:
Is it a feasible way to go with such naming, or is it for some reason something to avoid?
Example:
(defn coll<spec>?
"Checks wether the given thing is a collection of elements of ::spec-type"
[spec-type thing]
(and (coll? thing)
(every? #(spec/valid? spec-type %)
thing)))
(defn coll<spec>?nil
"Checks if the given thing is a collection of elements of ::spec-type or nil"
[spec-type thing]
(or (nil? thing)
(coll<spec>? spec-type thing)))
now ... somewhere else I'm using partial applications in clojure defrecrod/specs ...
similar to this:
; somewhere inside a defrecord
^{:spec (partial p?/coll<spec>?nil ::widgets)} widgets component-widgets
now, I created this for colls, sets, hash-maps with a generic-like form for specs (internal application of spec/valid? ...) and with a direct appication of predicates and respectively a <p> instead of the <spec>
currently I'm discussing with my colleagues wether this is a decent and meaningful and useful approach - since it is at least valid to name function like this in clojure - or wether the community thinks this is rather a no-go.
I'd very much like your educated opinions on that.
It's a weird naming convention, but a lot of projects use weird naming conventions because their authors believe they help. I imagine if I read your codebase, variable naming conventions would probably not be the thing that surprises me most. This is not an insult: every project has some stuff that surprises me.
But I don't see why you need these specific functions at all. As Sean Corfield says in a comment, there are good spec combinator functions to build fancier specs out of simpler ones. Instead of writing coll<spec>?, you could just combine s/valid? with s/coll-of:
(s/valid? (s/coll-of spec-type) thing)
Likewise you can use s/or and nil? to come up with coll<spec>?nil.
(s/valid? (s/or nil? (s/coll-of spec-type) thing))
This is obviously more to write at each call site than a mere coll<spec>?nil, so you may dismiss my advice. But the point is you can extract functions for defining these specs, and use those specs.
(defn nil-or-coll-of [spec]
(s/or nil? (s/coll-of spec)))
(s/valid? (nil-or-coll-of spec-type) thing)
Importantly, this means you could pass the result of nil-or-coll-of to some other function that expects a spec, perhaps to build a larger spec out of it, or to try to conform it. You can't do that if all your functions have s/valid? baked into them.
Think in terms of composing general functions, not defining a new function from scratch for each thing you want to do.

Is it good to avoid macro in this example?

I read that data > functions > macros
Say you want to evaluate code in a postfix fashion.
Which approach would be better?
;; Macro
(defmacro reverse-fn [expression]
(conj (butlast expression) (last expression)))
(reverse-fn ("hello world" println))
; => "hello world"
;; Function and data
(def data ["hello world" println])
(defn reverse-fn [data]
(apply (eval (last data)) (butlast data)))
(reverse-fn ["hello world" println])
; => "hello world"
Thanks!
If you require different evaluation behavior for any kind of data in your code macros are your best choice because they can transform unevaluated data at compile time to the code you'd like to be evaluated instead.
Clojure has a programmatic macro system which allows the compiler to be extended by user code. Macros can be used to define syntactic constructs which would require primitives or built-in support in other languages. (http://clojure.org/macros)
The example you provide has specifically that requirement ("evaluate code in a postfix fashion") which is why a macro is the correct choice.
Your macro is better than your function: a macro is better than a function employing eval.
However, the function need not employ eval. You could write it
(defn reverse-fn [data]
(apply (last data) (butlast data)))
Then, for example, as before,
(reverse-fn [3 inc])
=> 4
On a related topic, you might find this explanation of transducers interesting.
Edit:
Notice that functions are literals, in the sense that a function evaluates to itself:
((eval +) 1 1)
=> 2
In general:
Macros have their use however; macros expand at the point they are encountered so you will have one or more code blocks being inlined in the resulting byte code. They are ideal for encapsulating High Order DSL terminologies.
Functions, on the other hand, are wonderful for reuse where you localize the purity of the function and it can be called from multiple other functions without increasing the code footprint. They are ideal for localized or global usage.
REPL behavior consideration: A single function is easier to rework without worrying about fully evaluating the entire source file(s) to ensure all macro reference expansions get updated.
Hope this helps
Simple rules are the best: Use a function if you can, a macro if you must.

Clojure style, anything like --> with doto semantics?

I'm starting to use clojure to test a java library. So my question is more "what is the clojure way of doing this".
I have a lot of code that looks like the following
...
(with-open [fs (create-filesystem)]
(let [root (.getFile fs path)]
(is (.isDirectory root))
(is (.isComplete root)))
(let [suc (.getFile fs (str path "/_SUCCESS"))]
(is (.isFile suc))
(is (.isComplete suc))))
Since this is a java object, I need to verify a set of properties are true on the object. I know about doto that will let me do things like
(doto (.getFile fs path)
(.setPath path2)
(.setName name2))
and --> will let me have a list of partial functions and have each result pass through each function. Been thinking that something like --> but keeps passing the same object like doto would help with these tests. Would something like this be a good way to do this, or am I not really doing this the clojure way?
Thanks for your time!
You can use doto:
(doto (.getFile fs path) (.setPath path2) (.setPath name2))
You may learn about other threading macros like cond->, as->, some-> etc... for "the Clojure way" purpose.
There's no macro that does exactly what you wanted. Only .. and doto are object-specific macros.
Because your question is more "the Clojure way" thing, I recommend this:
https://github.com/bbatsov/clojure-style-guide
Yes, you can use the .. macro for java interop with threading semantics.
Example:
(.. (SomeObject/newBuilder)
(withId 1)
(withFooEnabled true)
(build))
There is nothing wrong with using a let as you have; it is a very direct and clear implementation of what you want to do. I think it is the best approach :)
But it is interesting to examine how you could achieve this with Clojure built in macros too:
(doto (.getFile fs path)
(-> (.isDirectory) (is))
(-> (.isComplete) (is)))
doto is special in that it propagates the initial value. It takes subsequent forms and inserts the initial value into the second position of the forms.
-> allows you to arrange forms inside out... which means you can combine it with doto such that the initial value is threaded through whatever layers of forms you would like to execute on it.
Now doto and -> are macros, so the syntax can be a little confusing... if you prefer to use functions then juxt is handy for calling multiple functions:
((juxt #(is (.isDirectory %)) #(is (.isComplete))) (.getFile fs path))
But for this situation it's quite ugly :)
Coming back to your original question, you can of course also create your own syntax with a macro... I describe one approach here: http://timothypratley.blogspot.pt/2016/05/composing-test-assertions-in-pipeline.html which chains multiple assertions in a convenient way.
I think for the example you describe sticking with a let form is the best way as it is most obvious... the other forms are shown here to discuss the options available, some of which might be appropriate in other circumstances.

Use cases for metadata in Clojure [duplicate]

How have you used metadata in your Clojure program?
I saw one example from Programming Clojure:
(defn shout [#^{:tag String} message] (.toUpperCase message))
;; Clojure casts message to String and then calls the method.
What are some uses? This form of programming is completely new to me.
Docstrings are stored as metadata under the :doc key. This is probably the number 1 most apparent use of metadata.
Return and parameter types can be optionally tagged with metadata to improve performance by avoiding the overhead of reflecting on the types at runtime. These are also known as "type hints." #^String is a type hint.
Storing things "under the hood" for use by the compiler, such as the arglist of a function, the line number where a var has been defined, or whether a var holds a reference to a macro. These are usually automatically added by the compiler and normally don't need to be manipulated directly by the user.
Creating simple testcases as part of a function definition:
(defn #^{:test (fn [] (assert true))} something [] nil)
(test #'something)
If you are reading Programming Clojure, then Chapter 2 provides a good intro to metadata. Figure 2.3 provides a good summary of common metadata.
For diversity some answer, which does not concentrate on interaction with the language itself:
You can also eg. track the source of some data. Unchecked input is marked as :tainted. A validator might check things and then set the status to :clean. Code doing security relevant things might then barf on :tainted and only accept :cleaned input.
Meta Data was extremely useful for me for purposes of typing. I'm talking not just about type hints, but about complete custom type system. Simplest example - overloading of print-method for structs (or any other var):
(defstruct my-struct :foo :bar :baz)
(defn make-my-struct [foo bar baz]
(with-meta (struct-map my-struct :foo foo :bar baz :baz baz)
{:type ::my-struct}))
(defmethod print-method
[my-struct writer]
(print-method ...))
In general, together with Clojure validation capabilities it may increase safety and, at the same time, flexibility of your code very very much (though it will take some more time to do actual coding).
For more ideas on typing see types-api.
metadata is used by the compiler extensively for things like storing the type of an object.
you use this when you give type hints
(defn foo [ #^String stringy] ....
I have used it for things like storing the amount of padding that was added to a number. Its intended for information that is 'orthogonal' to the data and should not be considered when deciding if you values are the same.

How do I get core clojure functions to work with my defrecords

I have a defrecord called a bag. It behaves like a list of item to count. This is sometimes called a frequency or a census. I want to be able to do the following
(def b (bag/create [:k 1 :k2 3])
(keys bag)
=> (:k :k1)
I tried the following:
(defrecord MapBag [state]
Bag
(put-n [self item n]
(let [new-n (+ n (count self item))]
(MapBag. (assoc state item new-n))))
;... some stuff
java.util.Map
(getKeys [self] (keys state)) ;TODO TEST
Object
(toString [self]
(str ("Bag: " (:state self)))))
When I try to require it in a repl I get:
java.lang.ClassFormatError: Duplicate interface name in class file compile__stub/techne/bag/MapBag (bag.clj:12)
What is going on? How do I get a keys function on my bag? Also am I going about this the correct way by assuming clojure's keys function eventually calls getKeys on the map that is its argument?
Defrecord automatically makes sure that any record it defines participates in the ipersistentmap interface. So you can call keys on it without doing anything.
So you can define a record, and instantiate and call keys like this:
user> (defrecord rec [k1 k2])
user.rec
user> (def a-rec (rec. 1 2))
#'user/a-rec
user> (keys a-rec)
(:k1 :k2)
Your error message indicates that one of your declarations is duplicating an interface that defrecord gives you for free. I think it might actually be both.
Is there some reason why you cant just use a plain vanilla map for your purposes? With clojure, you often want to use plain vanilla data structures when you can.
Edit: if for whatever reason you don't want the ipersistentmap included, look into deftype.
Rob's answer is of course correct; I'm posting this one in response to the OP's comment on it -- perhaps it might be helpful in implementing the required functionality with deftype.
I have once written an implementation of a "default map" for Clojure, which acts just like a regular map except it returns a fixed default value when asked about a key not present inside it. The code is in this Gist.
I'm not sure if it will suit your use case directly, although you can use it to do things like
user> (:earth (assoc (DefaultMap. 0 {}) :earth 8000000000))
8000000000
user> (:mars (assoc (DefaultMap. 0 {}) :earth 8000000000))
0
More importantly, it should give you an idea of what's involved in writing this sort of thing with deftype.
Then again, it's based on clojure.core/emit-defrecord, so you might look at that part of Clojure's sources instead... It's doing a lot of things which you won't have to (because it's a function for preparing macro expansions -- there's lots of syntax-quoting and the like inside it which you have to strip away from it to use the code directly), but it is certainly the highest quality source of information possible. Here's a direct link to that point in the source for the 1.2.0 release of Clojure.
Update:
One more thing I realised might be important. If you rely on a special map-like type for implementing this sort of thing, the client might merge it into a regular map and lose the "defaulting" functionality (and indeed any other special functionality) in the process. As long as the "map-likeness" illusion maintained by your type is complete enough for it to be used as a regular map, passed to Clojure's standard function etc., I think there might not be a way around that.
So, at some level the client will probably have to know that there's some "magic" involved; if they get correct answers to queries like (:mars {...}) (with no :mars in the {...}), they'll have to remember not to merge this into a regular map (merge-ing the other way around would work fine).