Clojure - random access where the key itself is complex

Vectors are good for random access, but the key is just the position in the sequence, a number. What about when you want the key itself to be something more interesting, and you still want fast random access? For this the obvious candidate would seem to be a map. In most map examples the keys are keywords (with a colon at the front). Can I, for example, use a vector as a key to a map? Or not so much 'can', but would this be an idiomatic thing to do? And are there examples of this sort of thing out there? In a way I am thinking in relational database terms, except with the structure kept in memory.

I've done this--used other things as keys. Idiomatic? Why not? You can pretty much use anything as a key. (Maybe others will have a different opinion.)
Lookup will follow Clojure's equality semantics. The place where that gets interesting is if you want to use a defrecord or a deftype as a key. These function similarly in some respects, but deftype equality is normally by identity, i.e. = is equivalent to identical? for deftypes (unless the deftype itself overrides equals and hashCode). Functions also have identity semantics, I believe.
(defrecord BarRec [x y])
(deftype BarTyp [x y])
(def foo {125 1,
          "this" 2,
          {:a 10 :b 20} 3,
          [1 2 3] 4,
          (->BarRec 10 20) 5,
          (BarTyp. 10 20) 6})
Notice that I create new instances of each key below:
(foo 125) ;=> 1
(foo "this") ;=> 2
(foo {:b 20 :a 10}) ;=> 3
(foo [1 2 3]) ;=> 4
(foo (->BarRec 10 20)) ;=> 5
(foo (->BarTyp 10 20)) ;=> nil
The new deftype instance doesn't find the map entry that uses the old deftype instance as a key, even though they have the same contents. Here's a clue to the reason why:
(= (->BarRec 10 20) (->BarRec 10 20)) ;=> true
(= (->BarTyp 10 20) (->BarTyp 10 20)) ;=> false
(def bar-typ (->BarTyp 10 20))
(= bar-typ bar-typ) ;=> true
This means that there are situations where using deftypes as keys is much more efficient than using defrecords: Comparing two defrecords requires comparing their contents, while comparing deftypes requires only deciding whether something is the same object--probably by pointer equality.
However, defrecords include lots of conveniences that deftypes don't have. And defrecords are Clojurely. Equality by strict identity is not Clojurely. Equality by identity is useful if you want to track a data structure whose contents change over time, but that kind of beast is not supposed to be running around wild in the Clojure forest. You might almost say that deftypes were created with a deprecated status from the beginning (but they will never go away afaik).
(Note: The point about the difference in hashing efficiency between defrecords and deftypes carries over to Java interop. Both are compiled to Java classes, and when a Java hash map looks up a defrecord or deftype key, it calls the hashCode() and equals() methods, which follow the appropriate Clojure semantics. Using deftypes as hash keys in Java can be a lot faster.)
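If you want a deftype that still behaves like a value when used as a map key, one option is to override equals and hashCode yourself. The following is only a sketch: Point is a hypothetical type invented for illustration, not something from the answer above.

```clojure
;; Hypothetical type for illustration only.
(deftype Point [x y]
  Object
  ;; Value-based equality: same class and equal fields.
  ;; (.-x other) is resolved reflectively here; a type hint would avoid that.
  (equals [_ other]
    (and (instance? Point other)
         (= x (.-x other))
         (= y (.-y other))))
  ;; Objects that are equal must produce the same hash code.
  (hashCode [_]
    (hash [x y])))

(= (->Point 10 20) (->Point 10 20))            ;=> true
(get {(->Point 10 20) :found} (->Point 10 20)) ;=> :found
```

The trade-off is that you give up the cheap identity comparison described above; at that point a defrecord may simply be the more convenient choice.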

I think the reason there are not many examples of Clojure 'random access' is that doing so is not a functional way of programming. If 'everything is a stream' (not so much in the technical sense but in the 'flowing past' sense), then what is needed should already be there.
To fix, an existing stream-producing function in the program could have each of the items it produces 'made greater'. Or two accumulators could be returned from a reduce rather than one:
https://gist.github.com/stathissideris/9500581
In my case I was already creating a stream of positions, and needed random access to something else (a Blob). So I made an existing reduce function return elements that had both position and Blob, already next to one another, so there was no need for the random access to Blob that prompted this question.
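To illustrate the "two accumulators from one reduce" idea, here is a minimal sketch. The function name max-and-positions and the input data are made up for this example; the accumulator is a two-element vector that is destructured on every step.

```clojure
;; Hypothetical example: in a single pass, track both the maximum value
;; and the positions at which it occurs.
(defn max-and-positions [xs]
  (reduce (fn [[best positions] [i x]]
            (cond
              (or (nil? best) (> x best)) [x [i]]                   ; new maximum
              (= x best)                  [best (conj positions i)] ; tie
              :else                       [best positions]))
          [nil []]                        ; two accumulators in one vector
          (map-indexed vector xs)))

(max-and-positions [3 7 2 7 5]) ;=> [7 [1 3]]
```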

You may use objects of all types as keys in Clojure maps; maps were designed to support this kind of use. The reason you see maps with keyword keys so frequently is that they also serve as the equivalent of data objects in OO languages.
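For the relational-style case in the question, a vector can act as a composite key. This is a small sketch with invented data; the names orders, rows and index and the :customer / :order fields are assumptions for illustration only.

```clojure
;; Hypothetical data: rows keyed by a composite [customer-id order-id] vector.
(def orders
  {[1 100] {:item "widget" :qty 2}
   [1 101] {:item "gadget" :qty 1}
   [2 100] {:item "gizmo"  :qty 5}})

(get orders [1 101])          ;=> {:item "gadget", :qty 1}
(get orders [9 999] :missing) ;=> :missing

;; Building such an index from a sequence of row maps:
(def rows
  [{:customer 1 :order 100 :item "widget"}
   {:customer 2 :order 100 :item "gizmo"}])

(def index
  (into {} (map (juxt (juxt :customer :order) identity)) rows))

(:item (index [2 100])) ;=> "gizmo"
```

Lookup works because two vectors with equal contents are equal and hash alike, so a freshly built key finds the entry.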

Related

In Clojure, how can I add support for common functions like empty? and count to my new type?

As I understand, Clojure makes it "easy" to solve the "expression problem".
But I can't find details how to do this. How can I create a new type (like defrecord) that handles things like empty? and count ?
The two example functions, empty? and count, are part of Clojure's core, and their implementations are driven by performance considerations, so they may not be the best illustrations of a solution to the expression problem. Anyway:
You can make empty? work by making seq work on your type, for example by implementing the Seqable interface.
You can make count work by implementing the Counted interface.
Example code:
(deftype Tuple [a b]
  clojure.lang.Counted
  (count [_] 2)
  clojure.lang.Seqable
  (seq [_] (list a b)))
(count (->Tuple 1 2)) ;=> 2
(empty? (->Tuple 1 2)) ;=> false
A more general solution for a new function would be either:
Creating a multimethod for your function. Now you need to write custom methods (via defmethod) for the supported types.
Creating a protocol that contains your function and making the types satisfy the protocol via extend-protocol or extend-type.
In either case you have the ability to create a default implementation and new implementations for new or existing types any time. Even during runtime!
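A rough sketch of both approaches follows. All names here (Describable, describe, area, the :shape key) are invented for illustration and are not taken from any library.

```clojure
;; Protocol approach: extend a new function to existing types,
;; with Object acting as the default implementation.
(defprotocol Describable
  (describe [x]))

(extend-protocol Describable
  String
  (describe [s] (str "string of length " (count s)))
  Object
  (describe [_] "something else"))

(describe "abc") ;=> "string of length 3"
(describe 42)    ;=> "something else"

;; Multimethod approach: dispatch on an arbitrary function of the arguments,
;; here the value under the hypothetical :shape key.
(defmulti area :shape)
(defmethod area :rect   [{:keys [w h]}] (* w h))
(defmethod area :square [{:keys [side]}] (* side side))
(defmethod area :default [_] nil)

(area {:shape :rect :w 2 :h 3}) ;=> 6
(area {:shape :blob})           ;=> nil
```

New defmethod or extend-protocol forms can be added later, for new or existing types, without touching the code above.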

Clojure basics: counting frequencies

I am learning Clojure, and I saw this bit of code online:
(count (filter #{42} coll))
And it does, as stated, count occurrences of the number 42 in coll. Is #{42} a function? The Clojure documentation on filter says its first argument should be one, and the snippet works as advertised, so it must be. I just have no idea how it works. If someone could clarify this for me, that would be great. My own solution to this same thing would have been:
(count (filter #(= %1 42) coll))
How come my filtering function has parentheses and the snippet I found online has curly braces around the filtering function (#(...) vs. #{...})?
=> #{42}
#{42}
Defines a set...
=> (type #{42})
clojure.lang.PersistentHashSet
=> (supers (type #{42}))
#{clojure.lang.IHashEq java.lang.Object clojure.lang.IFn ...}
Interestingly the set implements IFn so you can treat it like a function. The behaviour of the function is "if this item exists in the set, return it".
=> (#{2 3} 3)
3
=> (#{2 3} 4)
nil
Other collections such as map and vector stand in as functions in a similar fashion, retrieving by key or index as appropriate.
=> ({:x 23 :y 26} :y)
26
=> ([5 7 9] 1)
7
Sweet, no? :-)
Yes, #{42} is a function, because it's a set, and sets, amongst other capabilities, are functions: they implement the clojure.lang.IFn interface. Applied to any value in the set, they return it; applied to anything else, they return nil.
So #{42} tests whether its argument is 42 (only nil and false are false, remember).
The Clojure way is to make everything a function that might usefully be one:
Sets work as a test for membership.
Maps work as key lookup.
Vectors work as index lookup.
Keywords work as lookup in the map argument.
This
often saves you a get,
allows you, as in the question, to pass naked data structures to higher order functions such as filter and map, and
in the case of keywords, allows you to move transparently between maps and records for holding your data.
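A short sketch of these points, using made-up data (the Person record and the people vector are hypothetical). It also shows the one catch with sets as predicates: a set containing false or nil returns a falsey value for those members.

```clojure
;; Hypothetical record and data for illustration.
(defrecord Person [name age])
(def people [{:name "Ann" :age 30} (->Person "Bob" 25)])

;; A keyword works the same on a plain map and on a record.
(map :name people)              ;=> ("Ann" "Bob")

;; Sets and maps passed directly to higher order functions.
(filter #{:a :b} [:a :c :b :a]) ;=> (:a :b :a)
(map {:x 1 :y 2} [:x :y :z])    ;=> (1 2 nil)

;; Caveat: (#{false} false) returns false, which filter treats as "no".
(filter #{false} [false true])                ;=> ()
(filter #(contains? #{false} %) [false true]) ;=> (false)
```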

Clojure get nested map value

So I'm used to having a nested array or map of settings in my applications. I tried setting one up in Clojure like this:
(def gridSettings
  {:width 50
   :height 50
   :ground {:variations 25}
   :water {:variations 25}})
And I wondered if you know of a good way of retrieving a nested value? I tried writing
(:variations (:ground gridSettings))
Which works, but it's backwards and rather cumbersome, especially if I add a few levels.
That's what get-in does:
(get-in gridSettings [:ground :variations])
From the docstring:
clojure.core/get-in
([m ks] [m ks not-found])
Returns the value in a nested associative structure,
where ks is a sequence of keys. Returns nil if the key
is not present, or the not-found value if supplied.
You can use the thread-first macro:
(-> gridSettings :ground :variations)
I prefer -> over get-in except for two special cases:
When the keys are an arbitrary sequence determined at runtime.
When supplying a not-found value is useful.
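Those two cases might look like the sketch below. The settings var is a copy of the map from the question under a different name, and the :lava key is a made-up missing key.

```clojure
;; Copy of the question's map, renamed so this snippet is self-contained.
(def settings
  {:width 50
   :height 50
   :ground {:variations 25}
   :water {:variations 25}})

;; Keys computed at runtime: the path is just a vector.
(defn variations-for [terrain]
  (get-in settings [terrain :variations]))

(variations-for :water)                   ;=> 25

;; A not-found value for a missing path (:lava is not in the map).
(get-in settings [:lava :variations])     ;=> nil
(get-in settings [:lava :variations] 0)   ;=> 0

;; With fixed keyword keys, -> reads about the same and is also nil-safe.
(-> settings :ground :variations)         ;=> 25
(-> settings :lava :variations)           ;=> nil
```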
Apart from what the other answers have mentioned (get-in and the -> macro), sometimes you want to fetch multiple values from a map (nested or not); in those cases destructuring can be really helpful:
(let [{{gv :variations} :ground
       {wv :variations} :water} gridSettings]
  [gv wv])
Maps are partial functions (as in not total). Thus, one can simply apply them as functions. Based on the map from the question:
(gridSettings :ground)
;=> {:variations 25}
The result is a map. So, it can be applied again, which results in a very similar (but not backwards) "syntax" as proposed in the question:
((gridSettings :ground) :variations)
;=> 25
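One thing to keep in mind with this style: if an intermediate key is missing, the inner call returns nil, and nil cannot be called as a function. A small sketch, using a cut-down copy of the map (the name grid and the :lava key are assumptions for illustration):

```clojure
;; Cut-down, hypothetical copy of the settings map.
(def grid {:ground {:variations 25}})

((grid :ground) :variations) ;=> 25

;; (grid :lava) is nil, so calling it throws a NullPointerException.
(try
  ((grid :lava) :variations)
  (catch NullPointerException _ :npe)) ;=> :npe

;; Keyword-first lookup and get-in return nil instead.
(:variations (:lava grid))       ;=> nil
(get-in grid [:lava :variations]) ;=> nil
```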

In clojure, why the type of an empty list is different from that of non-empty lists?

I want to judge if two values are of same type, but I found that the type of an empty list is clojure.lang.PersistentList$EmptyList rather than clojure.lang.PersistentList.
user=> (def la '())
#'user/la
user=> (def lb '(1 2))
#'user/lb
user=> (def t (map type [la lb]))
#'user/t
user=> t
(clojure.lang.PersistentList$EmptyList clojure.lang.PersistentList)
user=> (apply = t)
false
user=>
So, I'm wondering why is the type of an empty list different from that of non-empty lists and what's the correct way to tell if two things are of same type?
Don't rely on the concrete types of Clojure data structures. They are undocumented implementation details, and you have no guarantee that they won't change in future versions of Clojure.
It is much safer to rely on the abstractions (e.g. as defined by the IPersistentList or ISeq interfaces). These are much less likely to change in ways that might break your code. (My understanding is that Rich Hickey is very big on backwards compatibility when it comes to abstractions. If you depend on a concrete implementation, I believe he would say it's your own fault if things break.)
But even better, you should use functions in clojure.core such as seq? or list?, depending on exactly what it is you want to detect. Not only are these likely to maintain backwards compatibility for a long time, they also have a chance of working correctly on non-JVM versions of Clojure (e.g. ClojureScript).
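For the lists in the question, such predicate-based checks could look like this sketch:

```clojure
;; The concrete classes differ...
(= (type '()) (type '(1 2)))    ;=> false

;; ...but the predicates agree that both are lists and seqs.
(every? list? ['() '(1 2)])     ;=> true
(every? seq?  ['() '(1 2)])     ;=> true

;; Pick the predicate that matches what you actually care about:
(list? [1 2])                   ;=> false  (a vector is not a list)
(sequential? [1 2])             ;=> true   (but it is sequential)
(list? (map inc [1 2]))         ;=> false  (a lazy seq is not a list)
(seq? (map inc [1 2]))          ;=> true
```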

How do I get core clojure functions to work with my defrecords

I have a defrecord called a bag. It behaves like a list of item to count. This is sometimes called a frequency or a census. I want to be able to do the following
(def b (bag/create [:k 1 :k2 3]))
(keys b)
=> (:k :k2)
I tried the following:
(defrecord MapBag [state]
Bag
(put-n [self item n]
(let [new-n (+ n (count self item))]
(MapBag. (assoc state item new-n))))
;... some stuff
java.util.Map
(getKeys [self] (keys state)) ;TODO TEST
Object
(toString [self]
(str ("Bag: " (:state self)))))
When I try to require it in a repl I get:
java.lang.ClassFormatError: Duplicate interface name in class file compile__stub/techne/bag/MapBag (bag.clj:12)
What is going on? How do I get a keys function on my bag? Also am I going about this the correct way by assuming clojure's keys function eventually calls getKeys on the map that is its argument?
defrecord automatically makes every record it defines implement the IPersistentMap interface, so you can call keys on it without doing anything.
So you can define a record, and instantiate and call keys like this:
user> (defrecord rec [k1 k2])
user.rec
user> (def a-rec (rec. 1 2))
#'user/a-rec
user> (keys a-rec)
(:k1 :k2)
Your error message indicates that one of your declarations is duplicating an interface that defrecord gives you for free. I think it might actually be both.
Is there some reason why you can't just use a plain vanilla map for your purposes? With Clojure, you often want to use plain vanilla data structures when you can.
Edit: if for whatever reason you don't want IPersistentMap included, look into deftype.
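As a rough idea of what the plain-map route could look like for a bag, here is a sketch. The helper names put-n and bag-count are made up (put-n borrows its name from the question's protocol); only update, fnil, get, keys and frequencies are core functions.

```clojure
;; A bag as a plain map from item to count.
(defn put-n
  "Adds n occurrences of item to bag (hypothetical helper)."
  [bag item n]
  (update bag item (fnil + 0) n))

(defn bag-count
  "Number of occurrences of item, 0 if absent (hypothetical helper)."
  [bag item]
  (get bag item 0))

(def b (-> {} (put-n :k 1) (put-n :k2 3) (put-n :k 4)))

(bag-count b :k)    ;=> 5
(bag-count b :zzz)  ;=> 0
(set (keys b))      ;=> #{:k :k2}

;; clojure.core already builds this shape from a sequence:
(frequencies [:a :b :a]) ;=> {:a 2, :b 1}
```

Because the bag is an ordinary map, keys, vals, merge-with and friends work on it with no extra interfaces to implement.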
Rob's answer is of course correct; I'm posting this one in response to the OP's comment on it -- perhaps it might be helpful in implementing the required functionality with deftype.
I have once written an implementation of a "default map" for Clojure, which acts just like a regular map except it returns a fixed default value when asked about a key not present inside it. The code is in this Gist.
I'm not sure if it will suit your use case directly, although you can use it to do things like
user> (:earth (assoc (DefaultMap. 0 {}) :earth 8000000000))
8000000000
user> (:mars (assoc (DefaultMap. 0 {}) :earth 8000000000))
0
More importantly, it should give you an idea of what's involved in writing this sort of thing with deftype.
Then again, it's based on clojure.core/emit-defrecord, so you might look at that part of Clojure's sources instead... It's doing a lot of things which you won't have to (because it's a function for preparing macro expansions -- there's lots of syntax-quoting and the like inside it which you have to strip away from it to use the code directly), but it is certainly the highest quality source of information possible. Here's a direct link to that point in the source for the 1.2.0 release of Clojure.
Update:
One more thing I realised might be important. If you rely on a special map-like type for implementing this sort of thing, the client might merge it into a regular map and lose the "defaulting" functionality (and indeed any other special functionality) in the process. As long as the "map-likeness" illusion maintained by your type is complete enough for it to be used as a regular map, passed to Clojure's standard functions etc., I think there might not be a way around that.
So, at some level the client will probably have to know that there's some "magic" involved; if they get correct answers to queries like (:mars {...}) (with no :mars in the {...}), they'll have to remember not to merge this into a regular map (merge-ing the other way around would work fine).