Clojure - map or set with fixed value->key function?

I have quite a few records in my program that I end up putting in a map using one of their fields as key. For example
(defrecord Foo [id afield anotherfield])
And then I'd add that to a map with the id as key. This is all perfectly doable, but a bit tedious, e.g. when adding a new instance of Foo to a map I need to extract the key first. I'm wondering if a data structure to do this already exists somewhere in clojure.core?
Basically I'd like to construct a set of Foo's by giving the set a value to key mapping function (i.e. :id) at construction time of the set, and then have that used when I want to add/find/remove/... a value.
So instead of:
(assoc my-map (:id a-foo) a-foo)
I could do, say:
(conj my-set a-foo)
And more interestingly, merge and merge-with support.

Sounds like a simple case where you would want to use a function to eliminate the "tedious" part.
e.g.
(defn my-assoc [some-map some-record]
  (assoc some-map (:id some-record) some-record))
If you are doing this a lot and need different key functions, you might want to try a higher order function:
(defn my-assoc-builder [id-function]
  (fn [some-map some-record]
    (assoc some-map (id-function some-record) some-record)))
(def my-assoc-by-id (my-assoc-builder :id))
Finally, note that you could do the same with a macro. However, a useful general rule with macros is not to use them unless you really need them. Thus in this case, since it can be done easily with a function, I'd recommend sticking to functions.
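If the goal is to build the whole map from a collection of records in one go, a small helper along these lines also works. This is just a sketch; index-by is a made-up name here, not something in clojure.core:
(defn index-by
  "Returns a map from (key-fn item) to item for every item in coll."
  [key-fn coll]
  (into {} (map (juxt key-fn identity) coll)))
(index-by :id [(->Foo 1 "a" "b") (->Foo 2 "c" "d")])
;; => a map keyed by :id, e.g. {1 #user.Foo{...}, 2 #user.Foo{...}}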

Well, as far as I know there is no such data structure (and even if there were, it would probably do the same tedious stuff in the background), so you can build your own fns on top of your records for the desired operations (which will, in the background, do the same tedious stuff that needs to be done).
Basically I'd like to construct a set of Foo's by giving the set a value to key mapping function (i.e. :id) at construction time of the set, and then have that used when I want to add/find/remove/...
I didn't get this part. If you are holding your records in a set and then want to, e.g., find one by id, you would have to do even more tedious work, looking at every record until you find the right one. That's O(n), whereas with a map you get O(1).
Did I use tedious too much? My suggestion is: use a map and do some tedious stuff. It's all 1s and 0s after all :)

Related

How to solve "stateful problems" in Clojure?

I'm new to Clojure and I'm having trouble understanding some concepts, especially pure functions and immutability.
One thing that I still can't comprehend is how you solve a problem like this in clojure:
A simple console application with a login method, where the user can't try to login more than 3 times in a 1 minute interval.
In C#, for example, I could add the UserId and a timestamp to a collection every time the user tries to log in, then check whether there have been more than 3 attempts in the last minute.
How would I do that in Clojure, considering that I can't alter my collection?
This is not a practical question (although some code examples would be welcome); I want to understand how you approach a problem like this.
You don't alter objects in most cases, you create new versions of the old objects:
(loop [attempt-dates []]
  (if (login-is-correct)
    (login)
    (recur (conj attempt-dates (current-date-stamp)))))
In this case I'm using loop. Whatever I give to recur will be passed to the next iteration of loop. I'm creating a new list that contains the new stamp when I write (conj attempt-dates (current-date-stamp)), then that new list is passed to the next iteration of loop.
This is how it's done in most cases. Instead of thinking about altering the object, think about creating a transformed copy of the object and passing the copy along.
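Tying this back to the original question, a minimal sketch of the "no more than 3 attempts per minute" check could look like the following; current-millis and the millisecond arithmetic are illustrative assumptions, and attempt-dates is the vector of timestamps built up by the loop above:
(defn current-millis []
  (System/currentTimeMillis))
(defn too-many-attempts?
  "True if more than 3 attempts fall within the last minute."
  [attempt-dates]
  (let [one-minute-ago (- (current-millis) 60000)]
    (> (count (filter #(>= % one-minute-ago) attempt-dates)) 3)))
Inside the loop you would call too-many-attempts? before allowing another try, and recur with the new timestamp conjed onto attempt-dates.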
If you really truly do need mutable state though, you can use a mutable atom to hold an immutable state:
(def mut-state (atom []))
(swap! mut-state conj 1)
(println @mut-state) ; Prints [1]
[] is still immutable here, the new version is just replacing the old version in the mutable atom container.
Unless you're needing to communicate with UI callbacks or something similar though, you usually don't actually need mutability. Practice using loop/recur and reduce instead.

Save state on clojure

I'm starting on functional programming now and I'm going a bit crazy working without variables.
Every tutorial that I read says that it isn't cool to redefine a variable, but I don't know how to solve my actual problem without saving state in a variable.
For example: I'm working on an API and I want to keep values across requests. Let's say I have an end-point that adds a person, and I have a list of persons; I would like to redefine or alter the value of my persons list by adding the new person. How can I do this?
Is it ok to use var-set, alter-var-root or conj!?
(For the API I'm using compojure-api, and each person would be a hash map.)
Clojure differentiates values from identity. You can use atoms to manage the state in your compojure application.
(def persons (atom [])) ;; init persons as empty vector
(swap! persons #(conj % {:name "John Doe"})) ;; append new value
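For instance, your route handlers could call small helpers built around that atom. The names add-person! and all-persons below are illustrative, not part of compojure-api:
(defn add-person!
  "Appends a person map to the persons list and returns the new list."
  [person]
  (swap! persons conj person))
(defn all-persons
  "Returns the current, immutable snapshot of the persons list."
  []
  @persons)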
You can find more in docs:
https://clojure.org/reference/atoms
https://clojure.org/reference/data_structures
https://clojuredocs.org/clojure.core/atom
You'll likely need mutable state somewhere in a large application, but it isn't necessary in all cases.
I'm not familiar with compojure, but here's a small example using immutability that might be able to give you a better idea:
(loop [requests []
       people   []]
  (let [request (receive-request)]
    ; Use requests/people
    ; Then loop again with updated lists
    (recur (conj requests request)
           (conj people (make-person request)))))
I'm using hypothetical receive-request and make-person functions here.
The loop creates a couple bindings, and updates them at each recur. This is an easy way to "redefine a variable". This is comparable to pure recursion, where you don't mutate the end result at any point, you just change what value gets passed onto the next iteration.
Of course, this is super simple, and impractical since you're just receiving one request at a time. If you're receiving requests from multiple threads at the same time, this would be a justifiable case for an atom:
(defn listen [result-atom]
  (.start ; start the listener thread
    (Thread.
      (fn []
        (while true ; Infinite listener for simplicity
          (let [request (receive-request)]
            (swap! result-atom #(conj % (make-person request)))))))))
(defn listen-all []
  (let [result-atom (atom [])]
    (listen result-atom)
    (listen result-atom)
    result-atom))
; result-atom now holds an updating list of people that you can do stuff with
swap! mutates the atom by conjoining onto the list it holds. The list inside the atom isn't mutated, it was just replaced by a modified version of itself. Anyone holding onto a reference to the old list of people would be unaffected by the call to swap!.
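A quick REPL sketch of that last point, with made-up data, just to show the value semantics:
(def people (atom [{:name "Ann"}]))
(def old-people @people)         ; take a snapshot of the current vector
(swap! people conj {:name "Bo"})
old-people ;=> [{:name "Ann"}]                ; the snapshot is untouched
@people    ;=> [{:name "Ann"} {:name "Bo"}]   ; the atom now points at a new vector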
A better approach would be to use a library like core.async, but that's getting away from the question.
The point is, you may need to use a mutable variable somewhere, but the need for them is a lot less than you're used to. In most cases, almost everything can be done using immutability like in the first example.

Non-map collection predicate?

Is there a Clojure predicate that means "collection, but not a map"?
Such a predicate is/would be valuable because there are many operations that can be performed on all collections except maps. For example (apply + ...) or (reduce + ...) can be used with vectors, lists, lazy sequences, and sets, but not maps, since the elements of a map in such a context end up as clojure.lang.MapEntrys. Sets and maps are the cases that trip up the predicates I know of:
sequential? is true for vectors, lists, and lazy sequences, but it's false for both maps and sets. (seq? is similar but it's false for vectors.)
coll? and seqable? are true for both sets and maps, as well as for every other kind of collection I can think of.
Of course I can define such a predicate, e.g. like this:
(defn coll-but-not-map?
  [xs]
  (and (coll? xs)
       (not (map? xs))))
or like this:
(defn sequential-or-set?
  [xs]
  (or (sequential? xs)
      (set? xs)))
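For reference, a quick REPL check of how those behave on a few common collection types:
(coll-but-not-map? [1 2 3])   ;=> true
(coll-but-not-map? #{1 2 3})  ;=> true
(coll-but-not-map? {:a 1})    ;=> false
(sequential-or-set? '(1 2 3)) ;=> true
(sequential-or-set? {:a 1})   ;=> false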
I'm wondering whether there's a built-in clojure.core (or contributed library) predicate that does the same thing.
This question is related to this one and this one but isn't answered by their answers. (If my question is a duplicate of one I haven't found, I'm happy to have it marked as such.)
For example (apply + ...) or (reduce + ...) can be used with vectors, lists, lazy sequences, and sets, but not maps
I don't think this is really about collections. In your case, the problem is not with general apply or reduce application, but with the particular + function: (apply + [:a :b :c]) won't work either, even though we are using a vector.
My point is that you are trying to solve a very domain-specific problem; that's why there is no generic solution in Clojure itself. So use any proper predicate you can think of.
There's nothing that I've found or used that fits this description. I think your own predicate function is clear, simple, and easy to include in your code if you find it useful.
Maybe you are writing code that has to be very generic, but it's usually the case that a function both accepts and returns a consistent type of data. There are cases where this is not true, but it's usually the case that if a function can be the jack of all trades, it's doing too much.
Using your example -- it makes sense to add a vector of numbers, a list of numbers, or a set of numbers. But a map of numbers? It doesn't make sense, unless maybe it's the values contained in the map, and in this case, it's not reasonable for a single piece of code to be expected to handle adding both sequential data and associative data. The function should be handed something it expects, and it should return something consistent. This kind of reminds me of Stuart Sierra's blog post discussing consistency in this regard. Without more information I'm only guessing as to your use case, but it's something to consider.

Why clojure collections don't implement ISeq interface directly?

Every collection in Clojure is said to be "seqable", but only lists and conses are actually seqs:
user> (seq? {:a 1 :b 2})
false
user> (seq? [1 2 3])
false
All other seq functions first convert a collection to a sequence and only then operate on it.
user> (class (rest {:a 1 :b 2}))
clojure.lang.PersistentArrayMap$Seq
I cannot do things like:
user> (:b (rest {:a 1 :b 2}))
nil
user> (:b (filter #(-> % val (= 1)) {:a 1 :b 1 :c 2}))
nil
and have to coerce back to concrete data type. This looks like bad design to me, but most likely I just don't get it as yet.
So, why don't Clojure collections implement the ISeq interface directly, and why don't the seq functions return an object of the same class as their input?
This has been discussed on the Clojure google group; see for example the thread map semantics from February of this year. I'll take the liberty of reusing some of the points I made in my message to that thread below while adding several new ones.
Before I go on to explain why I think the "separate seq" design is the correct one, I would like to point out that a natural solution for the situations where you'd really want to have an output similar to the input without being explicit about it exists in the form of the function fmap from the contrib library algo.generic. (I don't think it's a good idea to use it by default, however, for the same reasons for which the core library design is a good one.)
Overview
The key observation, I believe, is that the sequence operations like map, filter etc. conceptually divide into three separate concerns:
some way of iterating over their input;
applying a function to each element of the input;
producing an output.
Clearly 2. is unproblematic if we can deal with 1. and 3. So let's have a look at those.
Iteration
For 1., consider that the simplest and most performant way to iterate over a collection typically does not involve allocating intermediate results of the same abstract type as the collection. Mapping a function over a chunked seq over a vector is likely to be much more performant than mapping a function over a seq producing "view vectors" (using subvec) for each call to next; the latter, however, is the best we can do performance-wise for next on Clojure-style vectors (even in the presence of RRB trees, which are great when we need a proper subvector / vector slice operation to implement an interesting algorithm, but make traversals terrifyingly slow if we used them to implement next).
In Clojure, specialized seq types maintain traversal state and extra functionality such as (1) a node stack for sorted maps and sets (apart from better performance, this has better big-O complexity than traversals using dissoc / disj!), (2) current index + logic for wrapping leaf arrays in chunks for vectors, (3) a traversal "continuation" for hash maps. Traversing a collection through an object like this is simply faster than any attempt at traversing through subvec / dissoc / disj could be.
Suppose, however, that we're willing to accept the performance hit when mapping a function over a vector. Well, let's try filtering now:
(->> some-vector (map f) (filter p?))
There's a problem here -- there's no good way to remove elements from a vector. (Again, RRB trees could help in theory, but in practice all the RRB slicing and concatenating involved in producing "real vector" for filtering operations would absolutely destroy performance.)
Here's a similar problem. Consider this pipeline:
(->> some-sorted-set (filter p?) (map f) (take n))
Here we benefit from laziness (or rather, from the ability to stop filtering and mapping early; there's a point involving reducers to be made here, see below). Clearly take could be reordered with map, but not with filter.
The point is that if it's ok for filter to convert to seq implicitly, then it is also ok for map; and similar arguments can be made for other sequence functions. Once we've made the argument for all -- or nearly all -- of them, it becomes clear that it also makes sense for seq to return specialized seq objects.
Incidentally, filtering or mapping a function over a collection without producing a similar collection as a result is very useful. For example, often we care only about the result of reducing the sequence produced by a pipeline of transformations to some value or about calling a function for side effect at each element. For these scenarios, there is nothing whatsoever to be gained by maintaining the input type and quite a lot to be lost in performance.
Producing an output
As noted above, we do not always want to produce an output of the same type as the input. When we do, however, often the best way to do so is to do the equivalent of pouring a seq over the input into an empty output collection.
In fact, there is absolutely no way to do better for maps and sets. The fundamental reason is that for sets of cardinality greater than 1 there is no way to predict the cardinality of the output of mapping a function over a set, since the function can "glue together" (produce the same outputs for) arbitrary inputs.
Additionally, for sorted maps and sets there is no guarantee that the input set's comparator will be able to deal with outputs from an arbitrary function.
So, if in many cases there is no way to, say, map significantly better than by doing a seq and an into separately, and considering how both seq and into make useful primitives in their own right, Clojure makes the choice of exposing the useful primitives and letting users compose them. This lets us use map and into to produce a set from a set, while leaving us the freedom to not go on to the into stage when there is no value to be gained by producing a set (or another collection type, as the case may be).
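Concretely, that composition might look like this, with f and some-set standing in for an arbitrary function and input set:
;; seq over the set, map over the seq, pour the results into an empty set:
(into #{} (map f some-set))
;; ...or stop before the into stage when a set isn't actually needed:
(reduce + (map f some-set))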
Not all is seq; or, consider reducers
Some of the problems with using the collection types themselves when mapping, filtering etc. don't apply when using reducers.
The key difference between reducers and seqs is that clojure.core.reducers/map and friends do not build intermediate collections at all; they only produce "descriptor" objects that record what computations need to be performed in the event that the reducer is actually reduced. Thus, individual stages of the computation can be merged.
This allows us to do things like
(require '[clojure.core.reducers :as r])
(->> some-set (r/map f) (r/filter p?) (into #{}))
Of course we still need to be explicit about our (into #{}), but this is just a way of saying "the reducers pipeline ends here; please produce the result in the form of a set". We could also ask for a different collection type (a vector of results perhaps; note that mapping f over a set may well produce duplicate results and we may in some situations wish to preserve them) or a scalar value ((reduce + 0)).
Summary
The main points are these:
the fastest way to iterate over a collection typically doesn't involve producing intermediate results similar to the input;
seq uses the fastest way to iterate;
the best approach to transforming a set by mapping or filtering involves using a seq-style operation, because we want to iterate very fast while accumulating an output;
thus seq makes a great primitive;
map and filter, by choosing to deal with seqs, avoid performance penalties that carry no upside, benefit from laziness etc., depending on the scenario, yet can still be used to produce a collection result with into;
thus they too make great primitives.
Some of these points may not apply to a statically typed language, but of course Clojure is dynamic. Additionally, when we do want a return value that matches the input type, we're simply forced to be explicit about it, and that, in itself, may be viewed as a good thing.
Sequences are a logical list abstraction. They provide access to a (stable) ordered sequence of values. They are implemented as views over collections (except for lists where the concrete interface matches the logical interface). The sequence (view) is a separate data structure that refers into the collection to provide the logical abstraction.
Sequence functions (map, filter, etc) take a "seqable" thing (something which can produce a sequence), call seq on it to produce the sequence, and then operate on that sequence, returning a new sequence. It is up to you whether you need to or how to re-collect that sequence back into a concrete collection. While vectors and lists are ordered, sets and maps are not and thus sequences over these data structures must compute and retain the order outside the collection.
Specialized functions like mapv, filterv, reduce-kv allow you to stay "in the collection" when you know you want the operation to return a collection at the end instead of sequence.
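For example, with the results shown as comments:
(mapv inc [1 2 3])        ;=> [2 3 4]   (a vector, not a lazy seq)
(filterv even? [1 2 3 4]) ;=> [2 4]
(reduce-kv (fn [m k v] (assoc m k (inc v))) {} {:a 1 :b 2}) ;=> {:a 2, :b 3}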
Seqs are ordered structures, whereas maps and sets are unordered. Two maps that are equal in value may have a different internal ordering. For example:
user=> (seq (array-map :a 1 :b 2))
([:a 1] [:b 2])
user=> (seq (array-map :b 2 :a 1))
([:b 2] [:a 1])
It makes no sense to ask for the rest of a map, because it's not a sequential structure. The same goes for a set.
So what about vectors? They're sequentially ordered, so we could potentially map across a vector, and indeed there is such a function: mapv.
You may well ask: why is this not implicit? If I pass a vector to map, why doesn't it return a vector?
Well, first that would mean making an exception for ordered structures like vectors, and Clojure isn't big on making exceptions.
But more importantly you'd lose one of the most useful properties of seqs: laziness. Chaining together seq functions, such as map and filter is a very common operation, and without laziness this would be much less performant and far more memory-intensive.
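A tiny illustration of why that laziness matters: the pipeline below walks an infinite sequence, yet only realizes as many elements as take asks for:
(->> (range)        ; infinite lazy sequence 0 1 2 3 ...
     (map inc)
     (filter even?)
     (take 3))
;;=> (2 4 6)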
The collection classes follow a factory pattern: instead of implementing ISeq they implement Seqable, i.e. you can create an ISeq from the collection, but the collection itself is not an ISeq.
Even if these collections implemented ISeq directly, I am not sure how that would solve your problem of having general-purpose sequence functions return the original type. That would not make sense, because these general-purpose functions are supposed to work on ISeq; they have no idea which object gave them that ISeq.
Example in Java:
interface ISeq {
    ....
}
class A implements ISeq {
}
class B implements ISeq {
}
class Helpers {
    /*
      Filter can only work with ISeq, that's what makes it general purpose.
      There is no way it could return A or B objects.
    */
    public static ISeq filter(ISeq coll, ...) { }
    ...
}

How can I watch a subset of a clojure tree (Hash Map) for changes?

I would like to watch for changes to different parts of a Clojure Hash map (accessed via a STM ref), which forms quite a large tree, and on changes to those parts I would like to invoke some registered listeners. How can I do this in clojure as I understand that "add-watch" only works on an entire reference?
Since Clojure maps are immutable, there isn't really such a thing as a change to a single part of a tree from a conceptual perspective.
I can see a couple of good options however:
Add a watch to the entire tree, but test whether the particular part you are interested in has changed. This should be quite quick and easy to test (use "get-in" to look up the right part of the tree)
Marshall all changes to the tree through a library of helper functions, which can intercept the kind of changes you are interested in.
I, too, would watch the whole tree and check subsets with get-in. You can quickly test whether the subtree has been changed by using an identical? test against the previous state. Something like
(defn change-tester [tree path]
  (let [orig (get-in tree path)]
    (fn [tree]
      (not (identical? (get-in tree path) orig)))))
I don't use watchers very often so I don't know the syntax, but you could attach the above function somehow, I'm sure.
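For completeness, wiring that idea up with add-watch could look something like the sketch below; tree-ref, the [:a :b] path and the println are illustrative assumptions:
(def tree-ref (ref {:a {:b 1} :c 2}))
(add-watch tree-ref :subtree-watch
  (fn [_key _ref old-state new-state]
    (when-not (identical? (get-in old-state [:a :b])
                          (get-in new-state [:a :b]))
      (println "Subtree [:a :b] changed to" (get-in new-state [:a :b])))))
(dosync (alter tree-ref assoc-in [:a :b] 42)) ; triggers the listener
(dosync (alter tree-ref assoc :c 3))          ; leaves [:a :b] alone, no callback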
Clojure maps are immutable which means they are also thread safe which is a good thing. When you modify one with 'assoc' or similar you are creating a new copy in which your changed values are present. (Note that a full copy isn't made, but rather an efficient technique is employed to create a copy.)
I think perhaps the best way to do what you want is to create your own data structure, because essentially what you are asking for is a mutable HashMap as in Java, but not a Clojure Map.
You could create a wrapper around an existing Java HashMap which overrides the 'put' and 'putAll' methods so that you can detect what's being changed. If you have a HashMap within a HashMap you will want the sub HashMap to be of your new type as well so that you can detect changes at any level.
You might call it something like 'WatchfulHashMap'. Then, you will want to create an instance of this new HashMap like:
(def m (ref (WatchfulHashMap.)))
Thus making a single instance of it modifiable from anywhere in your app.