Clojure basics: counting frequencies

I am learning Clojure, and I saw this bit of code online:
(count (filter #{42} coll))
And it does, as stated, count occurrences of the number 42 in coll. Is #{42} a function? The Clojure documentation on filter says that it should be, since the snippet works as advertised. I just have no idea how it works. If someone could clarify this for me, that would be great. My own solution to this same thing would have been:
(count (filter #(= %1 42) coll))
How come my filtering function has parentheses and the snippet I found online has curly braces around the filtering function (#(...) vs. #{...})?

=> #{42}
#{42}
Defines a set...
=> (type #{42})
clojure.lang.PersistentHashSet
=> (supers (type #{42}))
#{clojure.lang.IHashEq java.lang.Object clojure.lang.IFn ...}
Interestingly the set implements IFn so you can treat it like a function. The behaviour of the function is "if this item exists in the set, return it".
=> (#{2 3} 3)
3
=> (#{2 3} 4)
nil
Other collections such as map and vector stand in as functions in a similar fashion, retrieving by key or index as appropriate.
=> ({:x 23 :y 26} :y)
26
=> ([5 7 9] 1)
7
Sweet, no? :-)

Yes, #{42} is a function, because it's a set, and sets, amongst other capabilities, are functions: they implement the clojure.lang.IFn interface.
Applied to any value in the set, they return it; applied to anything else, they return nil.
So #{42} tests whether its argument is 42 (only nil and false are falsey, remember).
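One caveat follows from the "only nil and false are falsey" remark. Here is a small sketch of it (coll-with-falsey is a made-up name and made-up data, not from the question): a set returns the matched element itself, so a set containing nil or false can't serve as a predicate for those values. contains? returns a real boolean and sidesteps that.

```clojure
;; Hypothetical sample data, not from the question.
(def coll-with-falsey [1 42 nil false 42 3 42])

;; The set returns the matching element (truthy) or nil (falsey).
(count (filter #{42} coll-with-falsey))                        ;=> 3

;; Caveat: looking up nil or false returns nil or false, which filter drops.
(count (filter #{nil false} coll-with-falsey))                 ;=> 0

;; contains? returns true/false, so it works for nil and false too.
(count (filter #(contains? #{nil false} %) coll-with-falsey))  ;=> 2

;; If you want counts for every element at once, frequencies builds a map.
(get (frequencies coll-with-falsey) 42 0)                      ;=> 3
```

For ordinary values like 42 the set form is fine; the contains? form only matters when nil or false can be members.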
The Clojure way is to make everything a function that might usefully be one:
Sets work as a test for membership.
Maps work as key lookup.
Vectors work as index lookup.
Keywords work as lookup in the map argument.
This often saves you a get; allows you, as in the question, to pass naked data structures to higher-order functions such as filter and map; and, in the case of keywords, allows you to move transparently between maps and records for holding your data.
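To make the list above concrete, here is a short sketch of each kind of data structure passed straight to a higher-order function. The people data is invented for illustration.

```clojure
;; Hypothetical data for illustration.
(def people [{:name "Ann" :id 1} {:name "Bob" :id 2}])

(map :id people)                 ;=> (1 2)      keyword as lookup in each map
(map {:a 1 :b 2} [:a :b :c])     ;=> (1 2 nil)  map as key lookup
(map [10 20 30] [0 2])           ;=> (10 30)    vector as index lookup
(remove #{:b} [:a :b :c])        ;=> (:a :c)    set as membership test

;; One difference from get: a vector called with an out-of-range index
;; throws IndexOutOfBoundsException, whereas get returns nil.
(get [10 20 30] 5)               ;=> nil
```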

Related

How to return value for empty collection passed to Clojure function

I have some function that returns greatest value for some key from passed maps. I want to return 0 when passed collection is empty. Of course I can make it using some conditional, but wondering if there is some more advanced technique for it?
(defn max-id
  "Usage: (max-id [{:id 1}, {:id 2}])"
  [c]
  (if (empty? c)
    0
    (apply max (map :id c))))
;; Examples:
;; (max-id []) ;=> 0
;; (max-id [map-one map-two]) ;=> 1024
Ironically, "advanced" is something that most advanced programmers tend to avoid when it is possible to do so, usually opting for simplicity when they can. The bottom line here is that your function works great, because of one reason: your intent is clear. Everyone can understand what happens if an empty collection is passed in, even if they are not a programmer. This is a huge benefit, and makes your solution already an ideal one, at least as far as the empty collection case goes.
Now, if you can guarantee a constraint that at least one of the :ids in a given collection will be zero or greater (i.e., we're dealing with non-negative ids), then you can do:
(defn max-id [c]
  (reduce max 0 (map :id c)))
However, this does not work if you have a list of maps which all have a negative id, i.e.: [{:id -1} {:id -2}], as in this case the max-id should be -1, and so the above does not yield the correct answer.
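A quick sketch of that limitation, with the function renamed to max-id-floor-0 (a made-up name) so it doesn't clash with the other versions:

```clojure
;; Same body as the reduce-based version; 0 is the initial accumulator.
(defn max-id-floor-0 [c]
  (reduce max 0 (map :id c)))

(max-id-floor-0 [])                    ;=> 0
(max-id-floor-0 [{:id 3} {:id 7}])     ;=> 7
(max-id-floor-0 [{:id -1} {:id -2}])   ;=> 0, although the largest id is -1
```

The 0 acts as a floor for every input, not just as the result for an empty one.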
There are other solutions, but they involve a conditional check, even if it's a little more implicit. For example:
(defn max-id [c]
  (apply max (or (not-empty (map :id c)) [0])))
One option that I wouldn't recommend involves replacing max with a function which takes 0 or more args, checks to see whether there are any args, calls max if there are, and returns 0 if there are not. Unless you have some other use case that would require the max of nothing to be zero, I would avoid doing this, so I'm not even going to post it. This becomes overly complicated, and requires readers to go digging, so just don't do this.
As an aside, there is a core function that is not very well known that is arguably more idiomatic for finding the element whose value is largest for a given key, in a list. It is called max-key:
(defn max-id [c]
  (:id (apply max-key :id c)))
However, I'd say that your function is easier to understand, if for no other reason than max-key is not as well known as max and map. So, just giving it here as an additional option for you.
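One thing to keep in mind if you go the max-key route: on its own it doesn't cover the empty-collection case from the question, because max-key needs at least one item to compare. A minimal sketch with an explicit guard (max-id-by-key is a name I made up):

```clojure
;; max-key needs at least one item, so guard the empty case explicitly.
(defn max-id-by-key [c]
  (if (seq c)
    (:id (apply max-key :id c))
    0))

(max-id-by-key [])                              ;=> 0
(max-id-by-key [{:id 1} {:id 1024} {:id 5}])    ;=> 1024
(max-id-by-key [{:id -1} {:id -2}])             ;=> -1

;; Without the guard, (apply max-key :id []) throws an ArityException.
```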

Clojure - random access where the key itself is complex

Vectors are good for random access, but the key is just its position in the sequence, just a number. What about when you want the key itself to be something more interesting, and you want fast random access? For this the obvious candidate would seem to be a Map. In most Map examples the keys used are keywords (with a colon at the front). Can I for example use a Vector as a key to a Map? Or not so much 'can', but would this be an idiomatic thing to do? And are there examples of this sort of thing out there? In a way I am thinking in relational database terms, except the structure being kept in memory.
I've done this--used other things as keys. Idiomatic? Why not? You can pretty much use anything as a key. (Maybe others will have a different opinion.)
Lookup will follow Clojure's equality semantics. The place where that gets interesting is if you want to use a defrecord or a deftype as a key. These function similarly in some respects, but deftype equality is normally by identity, i.e. = is equivalent to identical? for deftypes (but see amalloy's comment below). Functions also have identity semantics, I believe.
(defrecord BarRec [x y])
(deftype BarTyp [x y])
(def foo {125 1,
          "this" 2,
          {:a 10 :b 20} 3,
          [1 2 3] 4,
          (->BarRec 10 20) 5,
          (BarTyp. 10 20) 6})
Notice that I create new instances of each key below:
(foo 125) ;=> 1
(foo "this") ;=> 2
(foo {:b 20 :a 10}) ;=> 3
(foo [1 2 3]) ;=> 4
(foo (->BarRec 10 20)) ;=> 5
(foo (->BarTyp 10 20)) ;=> nil
The new deftype instance doesn't find the map entry that uses the old deftype instance as a key, even though they have the same contents. Here's a clue to the reason why:
(= (->BarRec 10 20) (->BarRec 10 20)) ;=> true
(= (->BarTyp 10 20) (->BarTyp 10 20)) ;=> false
(def bar-typ (->BarTyp 10 20))
(= bar-typ bar-typ) ;=> true
This means that there are situations where using deftypes as keys is much more efficient than using defrecords: Comparing two defrecords requires comparing their contents, while comparing deftypes requires only deciding whether something is the same object--probably by pointer equality.
However, defrecords include lots of conveniences that deftypes don't have. And defrecords are Clojurely. Equality by strict identity is not Clojurely. Equality by identity is useful if you want to track a data structure whose contents change over time, but that kind of beast is not supposed to be running around wild in the Clojure forest. You might almost say that deftypes were created with a deprecated status from the beginning (but they will never go away afaik).
(Note: The point about the difference between hashing efficiency for defrecords and deftypes carries over to Java interop. Both data structures can be treated as Java classes, and when a Java hash map compares two defrecords or two deftypes, it calls hashCode() methods that follow the appropriate Clojure semantics. Using deftypes as hash keys in Java can be a lot faster.)
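Identity equality is only the default for a deftype. As far as I know, a deftype can supply its own equals and hashCode by implementing the Object methods, and then it behaves as a value-like map key. This is a rough sketch under that assumption; PointTyp is a made-up type, not part of the example above.

```clojure
;; Hypothetical deftype that opts in to value equality.
(deftype PointTyp [x y]
  Object
  (equals [_ other]
    (and (instance? PointTyp other)
         (= x (.-x ^PointTyp other))
         (= y (.-y ^PointTyp other))))
  (hashCode [_]
    ;; Must agree with equals: equal fields give an equal hash.
    (hash [x y])))

(= (->PointTyp 10 20) (->PointTyp 10 20))              ;=> true
(get {(->PointTyp 10 20) :found} (->PointTyp 10 20))   ;=> :found
(get {(->PointTyp 10 20) :found} (->PointTyp 1 2))     ;=> nil
```

Of course, once you do this you give up the cheap identity comparison, so it only makes sense when you actually want value semantics without the rest of what defrecord brings.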
I think the reason there are not many examples of Clojure 'random access' is that doing so is not a functional way of programming. If 'everything is a stream' (not so much in the technical sense but in the 'flowing past' sense), then what is needed should already be there.
To fix, an existing stream-producing function in the program could have each of the items it produces 'made greater'. Or two accumulators could be returned from a reduce rather than one:
https://gist.github.com/stathissideris/9500581
In my case I was already creating a stream of positions, and need random access to something else (a Blob). So I made an existing reduce function return elements that had both position and Blob, already next to one another - so no need for the random access to Blob that prompted this question.
You may use objects of all types as keys in Clojure maps. They were designed to support this kind of use. The reason you see maps with keywords so frequently is because they are also the equivalent to data objects in OO languages.
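For the vector-as-key case asked about, here is a minimal sketch of a map keyed by composite [x y] vectors, a bit like a composite primary key in a relational table. The grid data and names are invented for illustration.

```clojure
;; Hypothetical in-memory "table" keyed by a composite [x y] key.
(def grid {[0 0] :start
           [1 2] :wall
           [3 3] :goal})

(grid [1 2])                  ;=> :wall   a fresh, equal vector finds the entry
(get grid [9 9] :empty)       ;=> :empty  default for a missing key
(contains? grid [3 3])        ;=> true

;; "Updating" returns a new map; the original is unchanged.
(def grid-2 (assoc grid [2 2] :door))
(grid-2 [2 2])                ;=> :door
(grid [2 2])                  ;=> nil
```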

Trying to get a lazy-contains? function in Clojure to return results consistent with contains? when dealing with a map

Here's a use of the standard 'contains?' function in Clojure-
(contains? {:state "active", :course_n "law", :course_i "C0"} :state)
and it returns the expected
true
I used the question "Clojure: Idiomatic way to call contains? on a lazy sequence" as a guide for building a lazy-contains?, as this is what I need for my present use case.
The problem I'm facing is that for a map these alternatives are not returning the same answer, giving either a false or a nil response. I've tried looking at the source for contains? and it's slow going trying to understand what's happening so I can correct the lazy-contains? appropriately (for the record Clojure is essentially my first programming language, and my exposure to Java is very limited).
Any thoughts or ideas on how I might approach this? I tried every variant on the linked question I could.
Thanks in advance.
Edited to remove the error pointed out by @amalloy.
I think your problem is with the way that maps present themselves as sequences.
Given
(def data {:state "active", :course_n "law", :course_i "C0"})
then
(seq data)
;([:state "active"] [:course_i "C0"] [:course_n "law"])
... a sequence of key-value pairs.
So if we define (following @chouser)
(defn lazy-contains? [coll x]
(some #(= x %) coll))
... then
(lazy-contains? data :state)
;nil
... a false result, whereas ...
(lazy-contains? data [:state "active"])
;true
This is what @Ankur was getting at, showing you a function treating a map as a sequence consistent with contains? on the map itself.
The standard contains? works with keyed/indexed collections (maps, sets, or vectors), where it tests for the presence of a key.
Our lazy-contains? works with anything seqable, including all the standard collections, testing for the presence of a value.
Given the way that keyed/indexed collections present as sequences, these are bound to be inconsistent.
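Side by side, the key-versus-value difference looks like this (same data map as above, plus a small vector added for illustration):

```clojure
(def data {:state "active", :course_n "law", :course_i "C0"})

(contains? data :state)             ;=> true   key lookup
(some #(= :state %) data)           ;=> nil    the elements are [k v] entries
(some #(= :state %) (keys data))    ;=> true   search the keys instead

(contains? [:a :b] 1)               ;=> true   index 1 exists
(contains? [:a :b] :a)              ;=> false  :a is a value, not an index
(some #(= :a %) [:a :b])            ;=> true   linear search over values
```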
You can try the below implementation (for maps only):
(defn lazy-contains? [col key]
  (some (fn [[k v]] (= k key)) col))
Remember, contains? checks for the existence of a key in a collection. In maps the key is obvious; in other supported collections (like vectors) the key is the index.
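Here is how a key-based version behaves, as a sketch under a different name (lazy-contains-key? is made up, to avoid redefining anything above). It also works on a lazy sequence of [k v] pairs, which is presumably where a "lazy" contains? would earn its keep.

```clojure
;; Looks only at the first element of each [k v] pair.
(defn lazy-contains-key? [coll k]
  (some (fn [[entry-key _]] (= entry-key k)) coll))

(lazy-contains-key? {:state "active" :course_n "law"} :state)   ;=> true
(lazy-contains-key? {:state "active"} :missing)                 ;=> nil

;; An infinite lazy sequence of [i i*i] pairs: some stops once it finds 5.
(lazy-contains-key? (map (fn [i] [i (* i i)]) (range)) 5)       ;=> true
```

Note that a miss gives nil rather than false; wrap the call in boolean if you need exactly the same return values as contains?. On an infinite sequence a miss never returns at all.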
A "lazy" implementation of contains? is undesirable when checking for the presence of a key in a hash-map or of a value in a set:
(contains? #{:foo} :foo) ;=> true
(contains? {:foo :bar} :foo) ;=> true
or of an index in a vector, array, or string:
(contains? [:foo] 0) ;=> true
(contains? (int-array 7) 6) ;=> true
(contains? "foo" 2) ;=> true
Quoting from the contains? docstring:
'contains?' operates constant or logarithmic time; it will not
perform a linear search for a value.
some is a tool for linear searching. When searching for an element in a set or vector, its running time grows with the length of the input, so in the worst case it is slower than contains? by a factor of that length or more, and it is slower than contains? in almost every case.
contains? can't be implemented "lazy" as it does not produce a sequence. However, some stops consuming a lazy sequence as soon as it has determined a return value.
(some zero? (range))
;; true
Notice that maps and sets are never sequential or lazy.

Clojure get nested map value

So I'm used to having a nested array or map of settings in my applications. I tried setting one up in Clojure like this:
(def gridSettings
  {:width 50
   :height 50
   :ground {:variations 25}
   :water {:variations 25}})
And I wondered if you know of a good way of retrieving a nested value? I tried writing
(:variations (:ground gridSettings))
Which works, but it's backwards and rather cumbersome, especially if I add a few levels.
That's what get-in does:
(get-in gridSettings [:ground :variations])
From the docstring:
clojure.core/get-in
([m ks] [m ks not-found])
Returns the value in a nested associative structure,
where ks is a sequence of keys. Returns nil if the key
is not present, or the not-found value if supplied.
You can use the thread-first macro:
(-> gridSettings :ground :variations)
I prefer -> over get-in except for two special cases:
When the keys are an arbitrary sequence determined at runtime.
When supplying a not-found value is useful.
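A short sketch of those two special cases, reusing the gridSettings map from the question (the :lava key and the path var are invented for illustration):

```clojure
(def gridSettings
  {:width 50
   :height 50
   :ground {:variations 25}
   :water {:variations 25}})

;; Fixed, known path: both forms read well.
(get-in gridSettings [:ground :variations])      ;=> 25
(-> gridSettings :ground :variations)            ;=> 25

;; Case 1: the path is data, built or chosen at runtime.
(def path [:water :variations])
(get-in gridSettings path)                       ;=> 25

;; Case 2: a not-found value for a missing branch.
(get-in gridSettings [:lava :variations] 0)      ;=> 0
(-> gridSettings :lava :variations)              ;=> nil
```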
Apart from what the other answers have mentioned (get-in and the -> macro), sometimes you want to fetch multiple values from a map (nested or not). In those cases destructuring can be really helpful:
(let [{{gv :variations} :ground
       {wv :variations} :water} gridSettings]
  [gv wv])
Maps are partial functions (as in not total). Thus, one can simply apply them as functions. Based on the map from the question:
(gridSettings :ground)
;=> {:variations 25}
The result is a map. So, it can be applied again, which results in a very similar (but not backwards) "syntax" as proposed in the question:
((gridSettings :ground) :variations)
;=> 25
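One difference worth knowing with the map-as-function form: if an intermediate key is missing, the inner call returns nil, and calling nil as a function throws. The keyword-first and get-in forms just return nil. A small sketch (the :lava key is made up):

```clojure
(def gridSettings
  {:width 50
   :height 50
   :ground {:variations 25}
   :water {:variations 25}})

((gridSettings :ground) :variations)          ;=> 25

;; Missing intermediate key: these two quietly return nil.
(:variations (:lava gridSettings))            ;=> nil
(get-in gridSettings [:lava :variations])     ;=> nil

;; ((gridSettings :lava) :variations) throws NullPointerException,
;; because (gridSettings :lava) is nil and nil cannot be called.
```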

Why does Clojure idiom prefer to return nil instead of an empty list like Scheme?

From a comment on another question, someone is saying that Clojure idiom prefers to return nil rather than an empty list like in Scheme. Why is that?
Like,
(when (seq lat) ...)
instead of
(if (empty? lat)
'() ...)
I can think of a few reasons:
Logical distinction. In Clojure nil means nothing / absence of value, whereas '(), the empty list, is a value; it just happens to be a value that is an empty list. It's quite often conceptually and logically useful to distinguish between the two.
Fit with JVM - the JVM object model supports null references. And quite a lot of Java APIs return null to mean "nothing" or "value not found". So to ensure easy JVM interoperability, it makes sense for Clojure to use nil in a similar way.
Laziness - the logic here is quite complicated, but my understanding is that using nil for "no list" works better with Clojure's lazy sequences. As most of Clojure's sequence functions are lazy by default, it makes sense for this usage to be standard. See http://clojure.org/lazy for some extra explanation.
"Falsiness" - It's convenient to use nil to mean "nothing" and also to mean "false" when writing conditional code that examines collections - so you can write code like (if (some-map :some-key) ....) to test if a hashmap contains a value for a given key.
Performance - It's more efficient to test for nil than to examine a list to see if it is empty, hence adopting this idiom as standard can lead to higher-performance idiomatic code.
Note that there are still some functions in Clojure that do return an empty list. An example is rest:
(rest [1])
=> ()
The question on rest vs. next goes into some detail about why this is.
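For quick reference, the observable difference between the two, as a minimal sketch:

```clojure
(rest [1])                          ;=> ()
(next [1])                          ;=> nil
(seq (rest [1]))                    ;=> nil  next behaves like (seq (rest coll))

;; The empty list is truthy, nil is not, so only next works directly as a test.
(if (rest [1]) :truthy :falsey)     ;=> :truthy
(if (next [1]) :truthy :falsey)     ;=> :falsey
```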
Also note that the union of collection types and nil form a monoid, with concatenation the monoid plus and nil the monoid zero. So nil keeps the empty list semantics under concatenation while also representing a false or "missing" value.
Python is another language where common monoid identities represent false values: 0, empty list, empty tuple.
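A few REPL-style lines showing nil standing in for "the empty collection" in the core sequence functions while still being falsey. This is just a sketch of the behaviour, not a formal statement of the monoid laws.

```clojure
;; nil acts as the identity for concatenation.
(concat nil [1 2])                  ;=> (1 2)
(concat [1 2] nil)                  ;=> (1 2)

;; Collection functions treat nil like an empty collection.
(count nil)                         ;=> 0
(first nil)                         ;=> nil
(conj nil 1)                        ;=> (1)
(into [] nil)                       ;=> []

;; Unlike an actual empty list, nil is falsey.
(if nil :truthy :falsey)            ;=> :falsey
(if '() :truthy :falsey)            ;=> :truthy
```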
From The Joy of Clojure
Because empty collections act like true in Boolean contexts, you need an idiom for testing whether there's anything in a collection to process. Thankfully, Clojure provides such a technique:
(seq [1 2 3])
;=> (1 2 3)
(seq [])
;=> nil
In other Lisps, like Common Lisp, the empty list is used to mean nil. This is known as nil punning and is only viable when the empty list is falsey. Returning nil here is Clojure's way of reintroducing nil punning.
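In practice the idiom tends to show up as a loop that calls seq once up front and then walks the collection with next, so the nil at the end doubles as the termination test. A minimal sketch (sum-all is a made-up example function):

```clojure
;; s is either a non-empty seq or nil, so it can be used directly as the test.
(defn sum-all [coll]
  (loop [s (seq coll), acc 0]
    (if s
      (recur (next s) (+ acc (first s)))
      acc)))

(sum-all [1 2 3])    ;=> 6
(sum-all [])         ;=> 0
(sum-all nil)        ;=> 0
```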
Since I wrote the comment, I will write an answer. (skuro's answer provides all the information, but maybe a bit too much.)
First of all, I think the more important things should come first.
seq is just what everybody uses most of the time, but empty? is fine too; it's just (not (seq lat)).
In Clojure '() is truthy, so normally you want to return something that's falsey when the sequence is finished.
If only one branch of your if is important and the other returns false, '(), or something like that, why should you write that branch down? when has only one branch, which is especially good if you want side effects: you don't have to use do.
See this example:
(if false
  '()
  (do (println 1)
      (println 2)
      (println 3)))
you can write
(when true
  (println 1)
  (println 2)
  (println 3))
Not that different, but I think it's easier to read.
P.S.
Note that there are also macros called if-not and when-not; they are often better than (if (not true) ...).
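For completeness, a tiny sketch of if-not and when-not applied to the emptiness test discussed here (describe is a made-up function name):

```clojure
;; if-not swaps the branches, so the "empty" case can come first.
(defn describe [lat]
  (if-not (seq lat)
    :empty
    :has-items))

(describe [])                            ;=> :empty
(describe [1])                           ;=> :has-items

;; when-not has a single branch and returns nil otherwise.
(when-not (seq []) :nothing-to-do)       ;=> :nothing-to-do
(when-not (seq [1]) :nothing-to-do)      ;=> nil
```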