Aggregating transducers with intermediate values - clojure

I am still trying to understand better how to work with transducers in clojure. Here, I am interested in applying aggregating transducers, such as the ones in https://github.com/cgrand/xforms, but reporting at each step the intermediate values of the computation.
For instance, the following expression
(sequence (x/into #{}) [1 2 3])
yields (#{1 2 3}), which is only the final value of the reduction. Now, I would be interested in a transducer xf-incremental that, given something like
(sequence (comp xf-incremental (x/into #{})) [1 2 3])
yields (#{1} #{1 2} #{1 2 3}).
The reason why I am interested in this is that I want to report intermediate values of a metric that aggregates over the history of processed values.
Any idea how I can do something of this sort in a generic way?
EDIT: Think of (x/into #{}) as an arbitrary transducer that aggregates results. Better examples could be x/avg or (x/reduce +) where I would expect
(sequence (comp xf-incremental x/avg) [1 2 3])
(sequence (comp xf-incremental (x/reduce +)) [1 2 3])
to return (1 3/2 2) and (1 3 6) respectively.
EDIT 2: another way of phrasing this is that I want a transducer that performs a reducing function and returns the accumulator at each step, which also can reuse all the available transducers so I do not need to rewrite basic functionalities.

Solution using clojure.core/reductions
You don't need a transducer to perform the computation that you are asking for. The function you are looking for to see all the intermediate results of reduce is called reductions and you provide it with conj and an empty set as arguments:
(rest (reductions conj #{} [1 2 3]))
;; => (#{1} #{1 2} #{1 3 2})
rest removes the first empty set, because that was the output you requested in the original question.
The function that builds up the result here is conj; let's refer to it as a step function. A transducer is a function that takes a step function as input and returns a new step function as output. So if we want to combine reductions with a transducer, we can just apply the transducer to conj:
(def my-transducer (comp (filter odd?)
(take 4)))
(dedupe (reductions (my-transducer conj) #{} (range)))
;; => (#{} #{1} #{1 3} #{1 3 5} #{7 1 3 5})
dedupe is there just to remove elements that are equal to preceding elements. You can remove it if you don't want to do that. In that case you get the following, because that is how the filtering transducer works:
(reductions (my-transducer conj) #{} (range))
;; => (#{} #{} #{1} #{1} #{1 3} #{1 3} #{1 3 5} #{1 3 5} #{7 1 3 5})
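The same idea covers the averaging case from the question, as long as the accumulator carries enough state. The following is a minimal sketch using only clojure.core; the name running-avg is made up for this illustration and is not part of any library. It keeps a [sum count] pair as the accumulator and derives the average from each intermediate pair.

```clojure
;; Hypothetical helper: running average built on clojure.core/reductions.
(defn running-avg [coll]
  (->> coll
       ;; accumulator is a [sum count] pair, starting at [0 0]
       (reductions (fn [[sum n] x] [(+ sum x) (inc n)]) [0 0])
       ;; drop the initial [0 0], which has no average
       rest
       (map (fn [[sum n]] (/ sum n)))))

(running-avg [1 2 3])
;; => (1 3/2 2)
```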
Transducer-based solution using net.cgrand.xforms/reductions
Apparently, there is also a transducer version of reductions in the xforms library, which is closer to your initial code:
(require '[net.cgrand.xforms :as xforms])
(rest (sequence (xforms/reductions conj #{}) [1 2 3]))
;; => (#{1} #{1 2} #{1 3 2})
This xforms/reductions transducer can be composed with other transducers using comp, for example to keep only the odd numbers and take the first four of them:
(sequence (comp (filter odd?)
(take 4)
(xforms/reductions conj #{}))
(range))
;; => (#{} #{1} #{1 3} #{1 3 5} #{7 1 3 5})
In this case, you don't need dedupe. It is also possible to use other step functions with xforms/reductions, e.g. +:
(sequence (comp (filter odd?)
(take 10)
(xforms/reductions + 0)
(filter #(< 7 %)))
(range))
;; => (9 16 25 36 49 64 81 100)
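If pulling in a library is not an option, a transducer of this kind can be sketched by hand. The version below is only an illustration and not how xforms implements it: the name incremental is invented here, the initial value is not emitted (which matches the output asked for in the question), and early termination via reduced inside f is not handled. It assumes Clojure 1.7 or later for volatile!.

```clojure
;; Hypothetical stateful transducer: emits the accumulator after every input.
(defn incremental [f init]
  (fn [rf]
    (let [acc (volatile! init)]          ; per-reduction accumulator state
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result input]
         ;; update the accumulator and pass the new value downstream
         (rf result (vswap! acc f input)))))))

(sequence (incremental conj #{}) [1 2 3])
;; => (#{1} #{1 2} #{1 2 3})

(sequence (incremental + 0) [1 2 3])
;; => (1 3 6)

;; A running average needs a [sum count] accumulator plus a final map step.
(sequence (comp (incremental (fn [[sum n] x] [(+ sum x) (inc n)]) [0 0])
                (map (fn [[sum n]] (/ sum n))))
          [1 2 3])
;; => (1 3/2 2)
```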

Related

Clojure map a set to another set directly?

user=> (map inc #{1 2 3})
(2 4 3)
user=> (into #{} (map inc #{1 2 3}))
#{4 3 2}
Is there a way to apply a function to a set and return a set directly?
A slightly more generic way to do this is to use empty:
(defn my-map [f c]
(into (empty c)
(map f c)))
This yields the following results:
(my-map inc #{1 2 3}) ;; => #{2 3 4}
(my-map inc [1 2 3]) ;; => [2 3 4]
(my-map inc '(1 2 3)) ;; => (4 3 2)
It would work for other persistent collections as well.
As Alex said, fmap from algo.generic provides this function, although if you look at the source it's doing exactly the same as your code. I'd recommend just putting your function in a util namespace in your code; it's probably not worth pulling in a whole library for one function.
With Clojure 1.7.0 (still in beta) you can do this using a transducer:
(into #{} (map inc) #{1 2 3})
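The empty-based helper and the transducer arity of into can be combined. This is a small sketch assuming Clojure 1.7 or later; the name my-map-xf is made up for this example.

```clojure
;; Hypothetical variant of my-map that uses the transducer arity of into,
;; so no intermediate lazy sequence is built.
(defn my-map-xf [f c]
  (into (empty c) (map f) c))

(my-map-xf inc #{1 2 3})   ;; => a set containing 2, 3 and 4
(my-map-xf inc [1 2 3])    ;; => [2 3 4]
;; For maps, f receives a [key value] pair and must return one.
(my-map-xf (fn [[k v]] [k (inc v)]) {:a 1 :b 2})
;; => {:a 2, :b 3}
```

As with the non-transducer version, a list comes back reversed because conj adds to the front of a list.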

Clojure: Why is flatten "the wrong thing to use"

I've read this kind of thing a couple of times since I've started Clojure.
For instance, here: How to convert map to a sequence?
And in some tweet I don't remember exactly that was more or less saying "if you're using flatten you're probably doing it wrong".
I would like to know, what is wrong with flatten?
I think this is what they were talking about in the answer you linked:
so> ((comp flatten seq) {:a [1 2] :b [3 4]})
(:b 3 4 :a 1 2)
so> (apply concat {:a [1 2] :b [3 4]})
(:b [3 4] :a [1 2])
Flatten will remove the structure from the keys and values, which is probably not what you want. There are use cases where you do want to remove the structure of nested sequences, and flatten was written for such cases. But for destructuring a map, you usually do want to keep the internal sequences as is.
flatten ought to return intact anything it can't flatten. At the top level, it doesn't:
(flatten 8)
()
(flatten {1 2, 3 4})
()
If you think you've supplied a sequence, but you haven't, you'll get the effect of supplying an empty sequence. This is the sort of leg-breaker that most core functions take care to preclude. For example, (str nil) => "".
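The behaviour follows from what flatten treats as nested structure: only values for which sequential? is true are descended into. The sketch below shows that predicate on a few values, plus one possible guard; the name checked-flatten is invented here and is just one way to fail loudly instead of silently returning an empty sequence.

```clojure
(sequential? [1 2])    ;; => true
(sequential? '(1 2))   ;; => true
(sequential? {1 2})    ;; => false
(sequential? #{1 2})   ;; => false
(sequential? 8)        ;; => false

;; Nested maps and sets survive as elements; only the top level is lost.
(flatten [1 [2] {4 5}])
;; => (1 2 {4 5})

;; Hypothetical guard: reject non-sequential input via a precondition.
(defn checked-flatten [x]
  {:pre [(sequential? x)]}
  (flatten x))

(checked-flatten [1 [2 [3]]])
;; => (1 2 3)
;; (checked-flatten {1 2}) throws an AssertionError while assertions are enabled
```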
flatten ought to work like this:
(defn flatten [x]
(if (sequential? x)
((fn flat [y] (if (sequential? y) (mapcat flat y) [y])) x)
x))
(flatten 8)
;8
(flatten [{1 2, 3 4}])
;({1 2, 3 4})
(flatten [1 [2 [[3]] 4]])
;(1 2 3 4)
You can find Steve Miner's faster lazy version of this here.
Probability of "probably"
Listen to people who say "you're probably doing it wrong", but also do not forget they say "probably", because it all depends on the problem.
For example, if your task is to flatten a map where you couldn't care less which was the key and which was the value, and you just need an unstructured sequence of everything, then by all means use flatten (or apply concat).
The reason it causes a "suspicion" is the fact that you had / were given a map to begin with, hence whoever gave it to you meant a "key value" paired structure, and if you flatten it, you lose that intention, as well as flexibility and clarity.
Keep in mind
In case you are still not sure what to do with a map for your particular problem, keep a for comprehension in mind, since it gives you full control over what to do with the map as you iterate over it:
create a vector?
;; can also be (apply vector {:a 34 :b 42}), but just to use "for" for all consistently
user=> (into [] (for [[k v] {:a 34 :b 42}] [k v]))
[[:a 34] [:b 42]]
create another map?
user=> (into {} (for [[k v] {:a 34 :b 42}] [k (inc v)]))
{:a 35, :b 43}
create a set?
user=> (into #{} (for [[k v] {:a 34 :b 42}] [k v]))
#{[:a 34] [:b 42]}
reverse keys and values?
user=> (into {} (for [[k v] {:a 34 :b 42}] [v k]))
{34 :a, 42 :b}
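For comparison, the same four results can also be produced without for, using functions from clojure.core and clojure.set. This is only a sketch of alternatives, not a recommendation over the for versions above; the var name m is chosen for this example.

```clojure
(require '[clojure.set :as set])

(def m {:a 34 :b 42})

;; vector of entries
(vec m)
;; => [[:a 34] [:b 42]]

;; another map with updated values
(reduce-kv (fn [acc k v] (assoc acc k (inc v))) {} m)
;; => {:a 35, :b 43}

;; set of entries
(set m)
;; => #{[:a 34] [:b 42]}

;; keys and values swapped
(set/map-invert m)
;; => {34 :a, 42 :b}
```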

How to iterate over ArrayMap in clojure?

I am totally new to clojure (started learning yesterday) and functional programming so please excuse my ignorance. I've been trying to read a lot of the clojure documentation, but much of it is totally over my head.
I'm trying to iterate over an ArrayMap of this set up:
{city1 ([[0 0] [0 1] [1 1] [1 0]]), city2 ([[3 3] [3 4] [4 4] [4 3]]), city3 ([[10 10] [10 11] [11 11] [11 10]])}
(^hopefully that syntax is correct, that is what it looks like my terminal is printing)
where the city name is mapped to a vector of vectors that define the points that make up that city's borders. I need to compare all of these points with an outside point in order to determine if the outside point is in one of these cities and if so which city it is in.
I'm using the Ray Casting Algorithm detailed here to determine if an outside point is within a vector of vectors.
Maps are seqable (they implement clojure.lang.Seqable rather than clojure.lang.ISeq), which means that you can use all the higher-level sequence operations on them. The single elements are pairs of the form [key value], so, to find the first element that matches a predicate in-city?, you could e.g. use some:
(some
(fn [[city-name city-points]] ;; the current entry of the map
(when (in-city? the-other-point city-points) ;; check the borders
city-name)) ;; return the name of a matching city
cities)
You might also use keep to find all elements that match the predicate but I guess there is no overlap between cities in your example.
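Update: Let's back off a little bit, since working with sequences is fun. I'm not gonna dive into all the sequence types and just use vectors ([1 2 3 ...]) for examples. Results are written in vector notation as well, although most of the functions below actually return sequences, which the REPL prints with parentheses, e.g. (2 3).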
Okay, for a start, let's access our vector:
(first [1 2 3]) ;; => 1
(rest [1 2 3]) ;; => [2 3]
(last [1 2 3]) ;; => 3
(nth [1 2 3] 1) ;; => 2
The great thing about functional programming is that functions are just values which you can pass to other functions. For example, you might want to apply a function (let's say "add 2 to a number") to each element in a sequence. This can be done via map:
(map
(fn [x]
(+ x 2))
[1 2 3])
;; => [3 4 5]
If you haven't seen it yet, there is a shorthand for function values where % is the first parameter, %2 is the second, and so on:
(map #(+ % 2) [1 2 3]) ;; => [3 4 5]
This is concise and useful and you'll probably see it a lot in the wild. Of course, if your function has a name or is stored in a var (e.g. by using defn) you can use it directly:
(map pos? [-1 0 1]) ;; => [false false true]
Using the predicate like this does not make a lot of sense since you lose the actual values that produce the boolean result. How about the following?
(filter pos? [-1 0 1]) ;; => [1]
(remove pos? [-1 0 1]) ;; => [-1 0]
This selects or discards the values matching your predicate. Here, you should be able to see the connection to your city-border example: You want to find all the cities in a map that include a given point p. But maps are not sequences, are they? Not quite, but seq turns them into one, and the sequence functions call it for you:
(seq {:a 0 :b 1}) ;; => [[:a 0] [:b 1]]
Oh my, the possibilities!
(map first {:a 0 :b 1}) ;; => [:a :b]
(filter #(pos? (second %)) {:a 0 :b 1}) ;; => [[:b 1]]
filter retrieves all the matching cities (and their coordinates) but since you are only interested in the names - which are stored as the first element of every pair - you have to extract it from every element, similarly to the following (simpler) example:
(map first (filter #(pos? (second %)) {:a 0 :b 1}))
;; => [:b]
There actually is a function that combines map and filter. It's called keep and returns every non-nil value its function produces. You can thus check the second element of every pair and then return the first:
(keep
(fn [pair]
(when (pos? (second pair))
(first pair)))
{:a 0 :b 1})
;; => [:b]
Every time you see yourself using a lot of firsts and seconds, maybe a few rests in between, you should think of destructuring. It helps you access parts of values in an easy way. I'll not go into detail here, but it can be used with sequences quite intuitively:
(keep
(fn [[a b]] ;; instead of the name 'pair' we give the value's shape!
(when (pos? b)
a))
{:a 0 :b 1})
;; => [:b]
If you're only interested in the first result you can, of course, directly access it and write something like (first (keep ...)). But, since this is a pretty common use case, Clojure offers you some. It's like keep but will not look beyond the first match. Let's dive into your city example, whose solution should begin to make sense by now:
(some
(fn [[city-name city-points]]
(when (in-city? p city-points)
city-name))
all-cities)
So, I hope this can be useful to you.
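To make the some-based lookup runnable end to end, here is a minimal sketch of an in-city? predicate using ray casting with the even-odd rule. Several things are assumptions made for this illustration rather than details taken from the question: the map uses keywords as city names, each city maps directly to a vector of [x y] points, the argument order is (in-city? point points), and find-city is an invented name. Points lying exactly on a border are not treated specially.

```clojure
(defn in-city?
  "Ray casting: count the polygon edges crossed by a horizontal ray that
  starts at [px py] and runs towards +x; an odd count means inside."
  [[px py] points]
  (let [;; pair every vertex with the next one, wrapping around at the end
        edges (map vector points (rest (cycle points)))]
    (odd? (count
           (filter (fn [[[x1 y1] [x2 y2]]]
                     (and
                      ;; the edge must span the ray's y coordinate
                      ;; (this also rules out horizontal edges)
                      (not= (> y1 py) (> y2 py))
                      ;; the crossing must lie to the right of the point
                      (< px (+ x1 (/ (* (- x2 x1) (- py y1))
                                     (- y2 y1))))))
                   edges)))))

;; Assumed data shape: city name -> vector of border points.
(def cities
  {:city1 [[0 0] [0 1] [1 1] [1 0]]
   :city2 [[3 3] [3 4] [4 4] [4 3]]})

(defn find-city [point cities]
  (some (fn [[city-name city-points]]
          (when (in-city? point city-points)
            city-name))
        cities))

(find-city [0.5 0.5] cities) ;; => :city1
(find-city [3.5 3.5] cities) ;; => :city2
(find-city [2 2] cities)     ;; => nil
```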

clojure: create a lazy-seq containing another lazy-seq

I would like to create a lazy-seq containing another lazy-seq using clojure.
The data structure that I already have is a lazy-seq of maps and it looks like this:
({:a 1 :b 1})
Now I would like to put that lazy-seq into another one so that the result would be a lazy-seq of a lazy-seq of maps:
(({:a 1 :b 1}))
Does anyone know how to do this? Any help would be appreciated.
Here is an example of creating a list containing a list of maps:
=> (list (list {:a 1 :b 1}))
(({:a 1, :b 1}))
It's not lazy, but you can make both lists lazy with the lazy-seq macro:
=> (lazy-seq (list (lazy-seq (list {:a 1 :b 1}))))
or the same code with the -> macro:
=> (-> {:a 1 :b 1} list lazy-seq list lazy-seq)
Actually, if you replace the lists here with vectors, you get the same result:
=> (lazy-seq [(lazy-seq [{:a 1 :b 1}])])
(({:a 1, :b 1}))
I'm not sure what you're trying to do and why you want both lists to be lazy, so please provide a better explanation if you want further help.
Generally, there's nothing special about having a lazy-seq containing many lazy-seqs, so I don't understand exactly what it is you are really after.
You could always do
(map list '({:a 1 :b 1})) ;; gives (({:a 1, :b 1}))
We can even verify that it maintains laziness:
(def a
(concat
(take 5 (repeat {:a 1 :b 2}))
(lazy-seq
(throw (Exception. "too eager")))))
(println (take 5 (map list a))) ;; works fine
(println (take 6 (map list a))) ;; throws an exception
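Laziness of a nested structure can also be inspected directly with realized?, which reports whether a lazy-seq has been forced yet. The sketch below uses the invented names inner and outer. Note that printing a lazy-seq at the REPL forces it, so the example compares references with identical? instead of printing the inner sequence.

```clojure
(def inner (lazy-seq (list {:a 1 :b 1})))
(def outer (lazy-seq (list inner)))

(realized? outer) ;; => false, nothing has been forced yet

;; Taking the first element forces outer but leaves inner untouched.
(identical? inner (first outer)) ;; => true
(realized? outer) ;; => true
(realized? inner) ;; => false

;; Reaching into the inner sequence finally forces it.
(first (first outer)) ;; => {:a 1, :b 1}
(realized? inner) ;; => true
```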

Filter a set in Clojure clojure.set/select vs. clojure.core/filter

I would like to filter a set, something like:
(filter-set even? #{1 2 3 4 5})
; => #{2 4}
If I use clojure.core/filter I get a seq which is not a set:
(filter even? #{1 2 3 4 5})
; => (2 4)
So the best I came up with is:
(set (filter even? #{1 2 3 4 5}))
But I don't like it; it doesn't look optimal to go from a set to a list and back to a set. What would be the Clojurian way to do this?
UPDATE
I did the following to compare @A.Webb's and @Beyamor's approaches. Interestingly, both have almost identical performance, but clojure.set/select is slightly better.
(defn set-bench []
(let [big-set (set (take 1000000 (iterate (fn [x] (int (rand 1000000000))) 1)))]
(time (set (filter even? big-set))) ; "Elapsed time: 422.989 msecs"
(time (clojure.set/select even? big-set))) ; "Elapsed time: 345.287 msecs"
nil) ; don't break my REPL !
clojure.set is a handy API for common set operations.
In this case, clojure.set/select is a set-specific filter. It works by removing (with disj) the elements that don't meet the predicate from the given set.
(require 'clojure.set)
(clojure.set/select even? #{1 2 3 4 5})
; => #{2 4}
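Two further ways to write the filter-set function from the question are sketched below. Both names are made up for this illustration: filter-set uses the transducer arity of into (Clojure 1.7 or later) and so builds no intermediate sequence, while select-sketch mimics the remove-what-fails approach described above without claiming to be the library's exact source.

```clojure
;; Hypothetical helper: filter straight into an empty set of the same type.
(defn filter-set [pred s]
  (into (empty s) (filter pred) s))

(filter-set even? #{1 2 3 4 5})
; => #{2 4}

;; Hypothetical sketch of the disj-based approach: start from the full set
;; and remove every element that fails the predicate.
(defn select-sketch [pred s]
  (reduce (fn [acc x] (if (pred x) acc (disj acc x))) s s))

(select-sketch even? #{1 2 3 4 5})
; => #{2 4}
```

Because filter-set starts from (empty s), a sorted set stays a sorted set.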