Clojure getting highest value from zipmap - clojure

So I've got my proposed zip map here and it works perfectly. As you can see I've got the data loading.
That is what it looks like in the repl which is perfect.
And right here is the map
:Year 2020, :Day 27, :January 59, :February 38
:Year 2020, :Day 28, :January 41, :February 57
:Year 2020, :Day 29, :January 56, :February 51
:Year 2020, :Day 31, :January 94, :February -999
:Year 2020, :Day 30, :January 76, :February -999
(map [:Day :Month
Bear in mind this is just a snippet of the code I've done. How would you propose I find the highest value day in January? And by highest I mean by the number next to the months
(into(sorted-map-by >(fn [:January]))Ha)
I tried this with no success. The "Ha" at the end is just the name of the function where I am initialising the zipmap and using io/reader to read the file.

I would use max-key and reduce:
(def data [{:Year 2020, :Day 27, :January 59, :February 38}
{:Year 2020, :Day 28, :January 41, :February 57}
{:Year 2020, :Day 29, :January 56, :February 51}
{:Year 2020, :Day 31, :January 94, :February -999}
{:Year 2020, :Day 30, :January 76, :February -999}])
(reduce (partial max-key :January) data)
;; => {:Year 2020, :Day 31, :January 94, :February -999}
(:Day (reduce (partial max-key :January) data))
;; => 31
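Since max-key accepts any number of items after the key function, the same lookup can also be written with apply instead of reduce. A minimal sketch, assuming the data is a non-empty vector of maps as above (the name hottest is only illustrative):

```clojure
(def data [{:Year 2020, :Day 27, :January 59, :February 38}
           {:Year 2020, :Day 28, :January 41, :February 57}
           {:Year 2020, :Day 29, :January 56, :February 51}
           {:Year 2020, :Day 31, :January 94, :February -999}
           {:Year 2020, :Day 30, :January 76, :February -999}])

;; apply spreads the vector into max-key's variadic arguments
(def hottest (apply max-key :January data))

hottest        ;; => {:Year 2020, :Day 31, :January 94, :February -999}
(:Day hottest) ;; => 31
```

Both the reduce and the apply form throw on an empty collection, so guard with seq if the file might contain no rows.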

Not sure what your exact data structure looks like, but assuming it's a vector of maps you can do something like this:
(def data
[{:Year 2020, :Day 27, :January 59, :February 38}
{:Year 2020, :Day 28, :January 41, :February 57}
{:Year 2020, :Day 29, :January 56, :February 51}
{:Year 2020, :Day 31, :January 94, :February -999}
{:Year 2020, :Day 30, :January 76, :February -999}])
(->> data
(sort-by :January)
last
:January)
;; => 94
This sorts the vector by using the keyword as a function to look up its value in each map, takes the map with the highest value for January, and then gets the value belonging to the key :January from that map. Let me know if your data structure looks a bit different.
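If only the number is needed, sorting can be skipped entirely by extracting the :January values first and taking their maximum. A small sketch under the same assumption that data is a non-empty vector of maps:

```clojure
(def data
  [{:Year 2020, :Day 27, :January 59, :February 38}
   {:Year 2020, :Day 28, :January 41, :February 57}
   {:Year 2020, :Day 29, :January 56, :February 51}
   {:Year 2020, :Day 31, :January 94, :February -999}
   {:Year 2020, :Day 30, :January 76, :February -999}])

;; map pulls out the January readings, max picks the largest
(apply max (map :January data))
;; => 94

;; dropping the final :January step of the sort-by version keeps the whole map
(last (sort-by :January data))
;; => {:Year 2020, :Day 31, :January 94, :February -999}
```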

@Rulle's answer is very good.
Without max-key it would be:
(def data [{:Year 2020, :Day 27, :January 59, :February 38}
{:Year 2020, :Day 28, :January 41, :February 57}
{:Year 2020, :Day 29, :January 56, :February 51}
{:Year 2020, :Day 31, :January 94, :February -999}
{:Year 2020, :Day 30, :January 76, :February -999}])
(defn seq-max [seq greater key]
(reduce (fn [a b] (if (greater (key a) (key b)) a b)) seq))
;; if no key is wanted, choose the function `identity` as key!
;; e.g.
;; (seq-max (map :January data) > identity)
;; => 94
(:Day (seq-max data > :January)) ;; => 31

Related

clojure:why (some #(and ... doesn't return first false item

Sorry for such a basic question, but I can't figure out why this function works. I'm doing the "Clojure for the brave and true" guide, and happened upon this collection:
(def food-journal
[{:month 1 :day 1 :human 5.3 :critter 2.3}
{:month 1 :day 2 :human 5.1 :critter 2.0}
{:month 2 :day 1 :human 4.9 :critter 2.1}
{:month 2 :day 2 :human 5.0 :critter 2.5}
{:month 3 :day 1 :human 4.2 :critter 3.3}
{:month 3 :day 2 :human 4.0 :critter 3.8}
{:month 4 :day 1 :human 3.7 :critter 3.9}
{:month 4 :day 2 :human 3.7 :critter 3.6}])
and the use of this function to get the first map to have key :critter with a value above 3.
(some #(and (> (:critter %) 3) %) food-journal)
What I can't understand is the use of (and ), which in my opinion should return the first false value returned from the inner expression. That is, it should return the first map since that map's :critter isn't greater than 3.
some evaluates the predicate function on each element in food-journal until the predicate function returns a logical true value.
Each item (such as {:month 1 :day 1 :human 5.3 :critter 2.3}) in the collection is logically true on its own, since every map is truthy.
So, the predicate function needs to yield a logical false value for all items in the collection where :critter is not > 3.
and returns a logical true value (here, the element itself) only if both (> (:critter %) 3) and the element are logically true; otherwise it returns false.
The confusing part here is that there is a logical AND between the element itself, and the boolean value from the greater than comparison.
So, an or instead would always be true, thus always returning the first element, thereby ignoring the greater than test.
The key here is how some is using the predicate function.
Returns the first logical true value of (pred x) for any x in coll,
else nil.
So when and returns non-true, some just keeps looking. The reason for the % as the final and form is so that some returns the actual item that matched, rather than the value of (> (:critter %) 3).
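The behaviour described above can be checked at the REPL. This sketch first shows what and returns on its own, then the original some call, and finally an equivalent written with filter (the filter version is just an alternative for comparison, not from the book):

```clojure
(def food-journal
  [{:month 1 :day 1 :human 5.3 :critter 2.3}
   {:month 1 :day 2 :human 5.1 :critter 2.0}
   {:month 2 :day 1 :human 4.9 :critter 2.1}
   {:month 2 :day 2 :human 5.0 :critter 2.5}
   {:month 3 :day 1 :human 4.2 :critter 3.3}
   {:month 3 :day 2 :human 4.0 :critter 3.8}
   {:month 4 :day 1 :human 3.7 :critter 3.9}
   {:month 4 :day 2 :human 3.7 :critter 3.6}])

;; and returns the first logical false value, or else the last value
(and false {:a 1}) ;; => false
(and true {:a 1})  ;; => {:a 1}

;; some skips every false result and returns the first truthy one
(some #(and (> (:critter %) 3) %) food-journal)
;; => {:month 3 :day 1 :human 4.2 :critter 3.3}

;; same result, expressed as "first element that passes the test"
(first (filter #(> (:critter %) 3) food-journal))
;; => {:month 3 :day 1 :human 4.2 :critter 3.3}
```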

Complex data manipulation in Clojure

I'm working on a personal market analysis project. I've got a data structure representing all the recent turning points in the market, that looks like this:
[{:high 1.121455, :time "2016-08-03T05:15:00.000000Z"}
{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}
{:high 1.12173, :time "2016-08-03T04:30:00.000000Z"}
{:high 1.121925, :time "2016-08-03T00:00:00.000000Z"}
{:high 1.12215, :time "2016-08-02T23:00:00.000000Z"}
{:high 1.12273, :time "2016-08-02T21:15:00.000000Z"}
{:high 1.12338, :time "2016-08-02T18:15:00.000000Z"}
{:low 1.119215, :time "2016-08-02T12:30:00.000000Z"}
{:low 1.118755, :time "2016-08-02T12:00:00.000000Z"}
{:low 1.117575, :time "2016-08-02T06:00:00.000000Z"}
{:low 1.117135, :time "2016-08-02T04:30:00.000000Z"}
{:low 1.11624, :time "2016-08-02T02:00:00.000000Z"}
{:low 1.115895, :time "2016-08-01T21:30:00.000000Z"}
{:low 1.11552, :time "2016-08-01T11:45:00.000000Z"}
{:low 1.11049, :time "2016-07-29T12:15:00.000000Z"}
{:low 1.108825, :time "2016-07-29T08:30:00.000000Z"}
{:low 1.10839, :time "2016-07-29T08:00:00.000000Z"}
{:low 1.10744, :time "2016-07-29T05:45:00.000000Z"}
{:low 1.10716, :time "2016-07-28T19:30:00.000000Z"}
{:low 1.10705, :time "2016-07-28T18:45:00.000000Z"}
{:low 1.106875, :time "2016-07-28T18:00:00.000000Z"}
{:low 1.10641, :time "2016-07-28T05:45:00.000000Z"}
{:low 1.10591, :time "2016-07-28T01:45:00.000000Z"}
{:low 1.10579, :time "2016-07-27T23:15:00.000000Z"}
{:low 1.105275, :time "2016-07-27T22:00:00.000000Z"}
{:low 1.096135, :time "2016-07-27T18:00:00.000000Z"}]
Conceptually, I want to match up :high/:low pairs, work out the price range (high-low) and midpoint (average of high & low), but I don't want every possible pair to be generated.
What I want to do is start from the 1st item in the collection {:high 1.121455, :time "2016-08-03T05:15:00.000000Z"} and walk "down" through the remainder of the collection, creating a pair with every :low item UNTIL I hit the next :high item. Once I hit that next :high item, I'm not interested in any further pairs. In this case, there's only a single pair created, which is the :high and the 1st :low - I stop there because the next (3rd) item is a :high. The 1 generated record should look like {:price-range 0.000365, :midpoint 1.121272, :extremes [{:high 1.121455, :time "2016-08-03T05:15:00.000000Z"}{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}]}
Next, I'd move onto the 2nd item in the collection {:low 1.12109, :time "2016-08-03T05:15:00.000000Z"} and walk "down" through the remainder of the collection, creating a pair with every :high item UNTIL I hit the next :low item. In this case, I get 5 new records generated, being the :low and the next 5 :high items which are all consecutive; the first of these 5 records would look like
{:price-range 0.00064, :midpoint 1.12141, :extremes [{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}{:high 1.12173, :time "2016-08-03T04:30:00.000000Z"}]}
the second of these 5 records would look like
{:price-range 0.000835, :midpoint 1.1215075, :extremes [{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}{:high 1.121925, :time "2016-08-03T00:00:00.000000Z"}]}
and so on.
After that, I get a :low so I stop there.
Then I'd move onto the 3rd item {:high 1.12173, :time "2016-08-03T04:30:00.000000Z"} and walk "down" creating pairs with every :low UNTIL I hit the next :high. In this case, I get 0 pairs generated, because the :high is followed immediately by another :high. Same for the next 3 :high items, which are all followed immediately by another :high
Next I get to the 7th item {:high 1.12338, :time "2016-08-02T18:15:00.000000Z"} and that should generate a pair with each of the following 19 :low items.
My generated result would be a list of all the pairs created:
[{:price-range 0.000365, :midpoint 1.121272, :extremes [{:high 1.121455, :time "2016-08-03T05:15:00.000000Z"}{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}]}
{:price-range 0.00064, :midpoint 1.12141, :extremes [{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}{:high 1.12173, :time "2016-08-03T04:30:00.000000Z"}]}
...
If I was implementing this using something like Python, I'd probably use a couple of nested loops, use a break to exit the inner loop when I stopped seeing :highs to pair with my :low and vice-versa, and accumulate all the generated records into an array as I traversed the 2 loops. I just can't work out a good way to attack it using Clojure...
Any ideas?
first of all you can rephrase this the following way:
you have to find all the boundary points, where :high is followed by :low, or vice versa
you need to take the item just before the boundary and combine it with every item after the boundary, up to the next boundary.
for simplicity let's use the following data model:
(def data0 [{:a 1} {:b 2} {:b 3} {:b 4} {:a 5} {:a 6} {:a 7}])
the first part can be achieved by using the partition-by function, which splits the input collection every time the function changes its value for the processed item:
user> (def step1 (partition-by (comp boolean :a) data0))
#'user/step1
user> step1
(({:a 1}) ({:b 2} {:b 3} {:b 4}) ({:a 5} {:a 6} {:a 7}))
now you need to take every two of these groups and manipulate them. the groups should be like this:
[({:a 1}) ({:b 2} {:b 3} {:b 4})]
[({:b 2} {:b 3} {:b 4}) ({:a 5} {:a 6} {:a 7})]
this is achieved by the partition function:
user> (def step2 (partition 2 1 step1))
#'user/step2
user> step2
((({:a 1}) ({:b 2} {:b 3} {:b 4}))
(({:b 2} {:b 3} {:b 4}) ({:a 5} {:a 6} {:a 7})))
you have to do something for every pair of groups. You could do it with map:
user> (def step3 (map (fn [[lbounds rbounds]]
(map #(vector (last lbounds) %)
rbounds))
step2))
#'user/step3
user> step3
(([{:a 1} {:b 2}] [{:a 1} {:b 3}] [{:a 1} {:b 4}])
([{:b 4} {:a 5}] [{:b 4} {:a 6}] [{:b 4} {:a 7}]))
but since you need the concatenated list, rather than the grouped one, you would want to use mapcat instead of map:
user> (def step3 (mapcat (fn [[lbounds rbounds]]
(map #(vector (last lbounds) %)
rbounds))
step2))
#'user/step3
user> step3
([{:a 1} {:b 2}]
[{:a 1} {:b 3}]
[{:a 1} {:b 4}]
[{:b 4} {:a 5}]
[{:b 4} {:a 6}]
[{:b 4} {:a 7}])
that's the result we want (it almost is, since we just generate vectors, instead of maps).
now you could prettify it with the threading macro:
(->> data0
(partition-by (comp boolean :a))
(partition 2 1)
(mapcat (fn [[lbounds rbounds]]
(map #(vector (last lbounds) %)
rbounds))))
which gives you exactly the same result.
applied to your data it would look almost the same (with another result generating fn)
user> (defn hi-or-lo [item]
(item :high (item :low)))
#'user/hi-or-lo
user>
(->> data
(partition-by (comp boolean :high))
(partition 2 1)
(mapcat (fn [[lbounds rbounds]]
(let [left-bound (last lbounds)
left-val (hi-or-lo left-bound)]
(map #(let [right-val (hi-or-lo %)
diff (Math/abs (- right-val left-val))]
{:extremes [left-bound %]
:price-range diff
:midpoint (+ (min right-val left-val)
(/ diff 2))})
rbounds))))
(clojure.pprint/pprint))
it prints the following:
({:extremes
[{:high 1.121455, :time "2016-08-03T05:15:00.000000Z"}
{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}],
:price-range 3.6500000000017074E-4,
:midpoint 1.1212725}
{:extremes
[{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}
{:high 1.12173, :time "2016-08-03T04:30:00.000000Z"}],
:price-range 6.399999999999739E-4,
:midpoint 1.12141}
{:extremes
[{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}
{:high 1.121925, :time "2016-08-03T00:00:00.000000Z"}],
:price-range 8.350000000001412E-4,
:midpoint 1.1215074999999999}
{:extremes
[{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}
{:high 1.12215, :time "2016-08-02T23:00:00.000000Z"}],
:price-range 0.001060000000000061,
:midpoint 1.12162}
{:extremes
[{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}
{:high 1.12273, :time "2016-08-02T21:15:00.000000Z"}],
:price-range 0.0016400000000000858,
:midpoint 1.12191}
{:extremes
[{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}
{:high 1.12338, :time "2016-08-02T18:15:00.000000Z"}],
:price-range 0.0022900000000001253,
:midpoint 1.1222349999999999}
{:extremes
[{:high 1.12338, :time "2016-08-02T18:15:00.000000Z"}
{:low 1.119215, :time "2016-08-02T12:30:00.000000Z"}],
:price-range 0.004164999999999974,
:midpoint 1.1212975}
{:extremes
[{:high 1.12338, :time "2016-08-02T18:15:00.000000Z"}
{:low 1.118755, :time "2016-08-02T12:00:00.000000Z"}],
:price-range 0.004625000000000101,
:midpoint 1.1210675}
...
As an answer to the question about "complex data manipulation", I would advise you to look through all the collection-manipulating functions in clojure.core, and then try to decompose any task into applications of those. There are not many cases where you need something beyond them.
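For reuse, the pipeline above can be wrapped in a function and tried on a tiny input. This is only a sketch: pair-extremes and sample are hypothetical names, hi-or-lo is rewritten with or, and the midpoint is computed as a plain average instead of min plus half the range.

```clojure
(defn hi-or-lo
  "Returns the :high value of item, or its :low value if there is no :high."
  [item]
  (or (:high item) (:low item)))

(defn pair-extremes
  "Pairs the last item of each run of highs (or lows) with every item
  of the run that follows it."
  [points]
  (->> points
       (partition-by (comp boolean :high)) ; runs of highs / runs of lows
       (partition 2 1)                     ; adjacent runs, overlapping by one
       (mapcat (fn [[lgroup rgroup]]
                 (let [left (last lgroup)
                       lv   (hi-or-lo left)]
                   (for [right rgroup
                         :let [rv (hi-or-lo right)]]
                     {:extremes    [left right]
                      :price-range (Math/abs (- rv lv))
                      :midpoint    (/ (+ rv lv) 2)}))))))

;; made-up sample data, chosen so the arithmetic is exact
(def sample [{:high 10.0} {:low 8.0} {:high 9.0} {:high 11.0} {:low 7.0}])

(map :price-range (pair-extremes sample)) ;; => (2.0 1.0 3.0 4.0)
(map :midpoint (pair-extremes sample))    ;; => (9.0 8.5 9.5 9.0)
```

Note that {:high 9.0} produces no pair of its own, because it is followed directly by another high, which matches the walk described in the question.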

How do I remove certain vector objects from a list in Clojure?

I have variable test in Clojure like this:
( def test '([:circle {:cx 428, :cy 245, :r 32.2490309931942, :fill red}] [circle] [:line {:x1 461, :y1 222, :x2 365, :y2 163}] [:line {:x1 407, :y1 102, :x2 377, :y2 211}] [line]))
I want to remove the [line] and [circle] objects from it and for it to look like this:
([:circle {:cx 428, :cy 245, :r 32.2490309931942, :fill red}] [:line {:x1 461, :y1 222, :x2 365, :y2 163}] [:line {:x1 407, :y1 102, :x2 377, :y2 211}] )
Is there an easy way to do this in Clojure?
I've looked at this thread How can I remove an item from a sequence in Clojure?
and remove() but I still don't have it. that thread shows:
(remove #{:foo} #{:foo :bar}) ; => (:bar)
(remove #{:foo} [:foo :bar]) ; => (:bar)
(remove #{:foo} (list :foo :bar)) ; => (:bar)
but for me I have something more like:
(remove #????? ([:foo :bar] [foo] [bar]))
and I want to end up with ([:foo :bar]) only.
From the documentation of remove:
(remove pred) (remove pred coll)
Returns a lazy sequence of the items in coll for which (pred item) returns false.
Thus you need to provide a predicate that does so, for example to remove [circle]:
#(= '[circle] %)
This is an anonymous function that tests whether its argument is (value) equal to the vector [circle].
You can, of course, also generalize this to remove all one element vectors:
#(and (vector? %) (= 1 (count %)))
Or remove every vector that does not contain at least one keyword:
#(and (vector? %) (not-any? keyword? %))
I hope you get the picture :)
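To see the three predicates side by side, here they are applied to a shortened version of the list. The name shapes and the trimmed attribute maps are made up for this sketch; a different name is used partly because def-ing test in the user namespace replaces the reference to clojure.core/test.

```clojure
(def shapes '([:circle {:r 32}] [circle] [:line {:x1 1}] [line]))

;; remove only the exact vector [circle]
(remove #(= '[circle] %) shapes)
;; => ([:circle {:r 32}] [:line {:x1 1}] [line])

;; remove every one-element vector
(remove #(and (vector? %) (= 1 (count %))) shapes)
;; => ([:circle {:r 32}] [:line {:x1 1}])

;; remove every vector that contains no keyword
(remove #(and (vector? %) (not-any? keyword? %)) shapes)
;; => ([:circle {:r 32}] [:line {:x1 1}])
```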
in this exact case you would probably need something like this:
(remove (comp symbol? first) test)
output:
([:circle {:cx 428, :cy 245, :r 32.2490309931942, :fill red}]
[:line {:x1 461, :y1 222, :x2 365, :y2 163}]
[:line {:x1 407, :y1 102, :x2 377, :y2 211}])
as you want to remove all vectors whose first value is a symbol.
Of course, if you want to remove only the vectors whose single value is a symbol, you should be more specific:
(remove #(and (vector? %)
(== 1 (count %))
(symbol? (first %)))
test)
You could also invert your logic: instead of removing the unneeded data, keep the needed data:
(filter (comp keyword? first) test)
output:
([:circle {:cx 428, :cy 245, :r 32.2490309931942, :fill red}]
[:line {:x1 461, :y1 222, :x2 365, :y2 163}]
[:line {:x1 407, :y1 102, :x2 377, :y2 211}])
If you want to write your code like the examples from that thread, using a set of elements as the predicate for remove (or a similar function), you only need to put the elements you want to get rid of inside the set (as you have almost done). Be aware, though, that symbols need to be quoted.
So the first possible cause of an error in your last example is not quoting the list of vectors:
(remove #????? ([:foo :bar] [foo] [bar])) ;; this list cannot be evaluated and
;; will cause an error
(remove #????? '([:foo :bar] [foo] [bar])) ;; make it a valid list by quoting it
Now you also need to quote the symbols inside the #{} used as the predicate:
(remove #{['foo] ['bar]} '([:foo :bar] [foo] [bar])) ;; => ([:foo :bar])
Same rules apply also for your first example:
(remove #{['line] ['circle]} test)
;;=> ([:circle {:cx 428, :cy 245, :r 32.2490309931942, :fill red}]
;; [:line {:x1 461, :y1 222, :x2 365, :y2 163}]
;; [:line {:x1 407, :y1 102, :x2 377, :y2 211}])
will clean up your list of vectors.

How does key as function of a map and vice versa both work the same way in Clojure?

({:x 10, :y 20, :z 50} :y)
gives 20
and also
(:y {:x 10, :y 20, :z 50})
gives 20
How does it work internally in both cases?
For maps as functions, I can understand that the form can be recognised because the first value is a map.
But how does a key become a function? At runtime, a key could be any type of value, so how does the runtime know that it has to treat this value as a function?
Maps are functions, from the docs:
Maps implement IFn, for invoke() of one argument (a key) with an optional second argument (a default value), i.e. maps are functions of their keys. nil keys and values are ok.
So this:
({:x 10, :y 20, :z 50} :y)
applies function {:x 10, :y 20, :z 50} to :y.
Keywords are functions too, quoting the docs:
Keywords implement IFn for invoke() of one argument (a map) with an optional second argument (a default value). For example (:mykey my-hash-map :none) means the same as (get my-hash-map :mykey :none)
So when you do:
(:y {:x 10, :y 20, :z 50})
you actually invoke :y with {:x 10, :y 20, :z 50} as argument.
Basically anything that implements IFn and is on the classpath can be treated as a function.
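A short REPL sketch of the points quoted from the docs: both call forms, the optional default value, and ifn? as a way to ask whether a value can be invoked. The map name m is only for illustration.

```clojure
(def m {:x 10, :y 20, :z 50})

(m :y)       ;; => 20, the map is invoked with a key
(:y m)       ;; => 20, the keyword is invoked with a map

;; both forms take an optional default for missing keys
(m :w :none)  ;; => :none
(:w m :none)  ;; => :none

;; ifn? reports whether a value implements IFn
(ifn? m)   ;; => true
(ifn? :y)  ;; => true
(ifn? 42)  ;; => false

;; because keywords are functions, they can be passed to map directly
(map :y [{:y 1} {:y 2}]) ;; => (1 2)

;; a keyword tolerates nil in place of the map
(:y nil) ;; => nil
```

The last line is one practical reason the keyword-first form is often preferred when the map might be nil, since calling nil as a function throws instead.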

Appending into nested associative structures

I have a structure that I created at the REPL,
{1 {10 {:id 101, :name "Paul"},
20 {}},
2 {30 {}, 40 {}},
3 {50 {}, 60 {}}}
and I want to add a new k v to the key 1, such that the resulting structure looks like this,
{1 {10 {:id 101, :name "1x2"}, 20 {}, 11 {:id 102, :name "Ringo"}},
2 {30 {}, 40 {}}, 3 {50 {}, 60 {}}}.
I just discovered get-in update-in and assoc-in for working with nested structures like these, but cannot figure out how to add new elements within elements. In my app this is all wrapped in a ref and updated with dosync/alter, but for now, I just want to be able to do this at the REPL.
Maybe I've just been looking at this too long, but any attempt to use assoc or assoc-in just changes what is already there, and does not append new elements.
Given your input
(def input
{1 {10 {:id 101 :name "Paul"}
20 {}}
2 {30 {} 40 {}}
3 {50 {} 60 {}}})
You can use assoc-in to add an element to the nested map with key 1 like this:
(assoc-in input [1 11] {:id 102 :name "Ringo"})
which yields
{1 {10 {:id 101 :name "Paul"}
11 {:id 102 :name "Ringo"}
20 {}}
2 {30 {} 40 {}}
3 {50 {} 60 {}}}
Assoc-in doesn't need to point all the way to the deepest level of a structure.
If you use two calls to assoc-in you can use the second one to change the name "Paul" to "1x2" as per your example:
(assoc-in
(assoc-in input [1 11] {:id 102 :name "Ringo"})
[1 10 :name] "1x2")
Which returns
{1 {10 {:id 101 :name "1x2"}
11 {:id 102 :name "Ringo"}
20 {}}
2 {30 {} 40 {}}
3 {50 {} 60 {}}}
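When several updates are chained like this, the thread-first macro keeps them readable. A sketch of the same two steps (updated is a hypothetical name):

```clojure
(def input
  {1 {10 {:id 101 :name "Paul"}
      20 {}}
   2 {30 {} 40 {}}
   3 {50 {} 60 {}}})

;; -> passes the result of each step as the first argument of the next
(def updated
  (-> input
      (assoc-in [1 11] {:id 102 :name "Ringo"})
      (assoc-in [1 10 :name] "1x2")))

(get-in updated [1 11])       ;; => {:id 102, :name "Ringo"}
(get-in updated [1 10 :name]) ;; => "1x2"
(get-in input [1 10 :name])   ;; => "Paul", the original map is unchanged
```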
For what it's worth you could still do this if you had to point to an existing node:
(update-in input [1] assoc 11
{:id 102 :name "Ringo"})
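Since the question mentions that the structure lives in a ref in the real app, here is how the same assoc-in call might look inside a transaction. This is a sketch only; state is a hypothetical name for the ref.

```clojure
(def input
  {1 {10 {:id 101 :name "Paul"}
      20 {}}
   2 {30 {} 40 {}}
   3 {50 {} 60 {}}})

(def state (ref input))

;; alter calls (assoc-in current-value [1 11] {...}) inside the transaction
(dosync
  (alter state assoc-in [1 11] {:id 102 :name "Ringo"}))

(get-in @state [1 11]) ;; => {:id 102, :name "Ringo"}
(keys (get @state 1))  ;; contains 10, 20 and 11
```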