Enumerate over a sequence in Clojure? - clojure

In Python I can do this:
animals = ['dog', 'cat', 'bird']
for i, animal in enumerate(animals):
print i, animal
Which outputs:
0 dog
1 cat
2 bird
How would I accomplish the same thing in Clojure? I considered using a list comprehension like this:
(println
(let [animals ["dog" "cat" "bird"]]
(for [i (range (count animals))
animal animals]
(format "%d %d\n" i animal))))
But this prints out every combination of number and animal. I'm guessing there is a simple and elegant way to do this but I'm not seeing it.

There is map-indexed in core as of 1.2.
Your example would be:
(doseq [[i animal] (map-indexed vector ["dog" "cat" "bird"])]
(println i animal))

Quick solution:
(let [animals ["dog", "cat", "bird"]]
(map vector (range) animals))
Or, if you want to wrap it in a function:
(defn enum [s]
(map vector (range) s))
(doseq [[i animal] (enum ["dog", "cat", "bird"])]
(println i animal))
What happens here is the function vector is applied to each element in both sequences, and the result is collected in a lazy collection.
Go ahead, try it in your repl.

Use indexed from clojure.contrib.seq:
Usage: (indexed s)
Returns a lazy sequence of [index, item] pairs, where items come
from 's' and indexes count up from zero.
(indexed '(a b c d)) => ([0 a] [1 b] [2 c] [3 d]
For your example this is
(require 'clojure.contrib.seq)
(doseq [[i animal] (clojure.contrib.seq/indexed ["dog", "cat", "bird"])]
(println i animal))

map-indexed looks right but: Do we really need all of the doseq and deconstructing args stuff in the other answers?
(map-indexed println ["dog", "cat", "bird"])
EDIT:
as noted by #gits this works in the REPL but doesn't respect that clojure is lazy by default. dorun seems to be the closest among doseq, doall and doseq for this. doseq, howver, seems to be the idiomatic favorite here.
(dorun (map-indexed println ["dog", "cat", "bird"]))

Yet another option is to use reduce-kv, which pairs the elements of a vector with their indexes.
Thus,
(reduce-kv #(println %2 %3) nil ["dog" "cat" "bird"])
or perhaps the slightly more explicit
(reduce-kv (fn [_ i animal] (println i animal)) nil ["dog" "cat" "bird"])
I wouldn’t pick this solution over the one with doseq here, but it is good to be aware of this specialisation for vectors in reduce-kv.

Related

Is it bad practice to try and keep track of iterations while using reduce/map in Clojure?

So being new to Clojure and functional programming in general, I sometimes (to quote a book) "feel like your favourite tool has been taken from you". Trying to get a better grasp on this stuff I'm doing string manipulation problems.
So knowing the functional paradigm is all about recursion (and other things) I've been using tail recursive functions to do things I'd normally do with loops, then trying to implement using map or reduce. For those more experienced, does this sound like a sane thing to do?
I'm starting to get frustrated because I'm running into problems where I need to keep track of the index of each character when iterating over strings but that's proving difficult because reduce and map feel "isolated". I can't increment a value while a string is being reduced...
Is there something I'm missing; a function for exactly this.. Or can this specific case just not be implemented using these core functions? Or is the way I'm going about it just wrong and un-functional-like which is why I'm stuck?
Here's an example I'm having:
This function takes five separate strings then using reduce, builds a vector containing all the characters at position char-at in each string. How could you change this code so that char-at (in the anonymous function) gets incremented after each string gets passed? This is what I mean by it feels "isolated" and I don't know how to get around this.
(defn new-string-from-five
"This function takes a character at position char-at from each of the strings to make a new vector"
[five-strings char-at]
(reduce (fn [result string]
(conj result (get-char-at string char-at)))
[]
five-strings))
Old :
"abc" "def" "ghi" "jkl" "mno" -> [a d g j m] (always taken from index 0)
Modified :
"abc" "def" "ghi" "jkl" "mno" ->[a e i j n] (index gets incremented and loops back around)
I don't think there's anything insane about writing string manip functions to get your head around things, though it's certainly not the only way. I personally found clojure for the brave and true, 4clojure, and the clojurians slack channel most helpful when learning clojure.
On your question, probably the most common thing to do would be to add an index to your initial collection (in this case a string) using map-indexed
(user=> (map-indexed vector [9 9 9])
([0 9] [1 9] [2 9])
So for your example
(defn new-string-from-five
"This function takes a character at position char-at from each of the strings to make a new vector"
[five-strings char-at]
(reduce (fn [result [string-idx string]]
(conj result (get-char-at string (+ string-idx char-at))))
[]
(map-indexed vector five-strings)))
But how would I build map-indexed? Well
Non-lazily:
(defn map-indexed' [f coll]
(loop [idx 0
res []
rest-coll coll]
(if (empty? rest-coll)
res
(recur (inc idx) (conj res (f idx (first rest-coll))) (rest rest-coll)))))
Lazily (recommend not trying to understand this yet):
(defn map-indexed' [f coll]
(letfn [(map-indexed'' [idx f coll]
(if (empty? coll)
'()
(lazy-seq (conj (map-indexed'' (inc idx) f (rest coll)) (f idx (first coll))))))]
(map-indexed'' 0 f coll)))
You can use reductions:
(defn new-string-from-five
[five-strings]
(->> five-strings
(reductions
(fn [[res i] string]
[(get-char-at string i) (inc i)])
[nil 0])
rest
(mapv first)))
But in this case, I think map, mapv or map-indexed is cleaner. E.g.
(map-indexed
(fn [i s] (get-char-at s i))
["abc" "def" "ghi" "jkl" "mno"])

Alternative to mutable data structure in clojure [duplicate]

I developed a function in clojure to fill in an empty column from the last non-empty value, I'm assuming this works, given
(:require [flambo.api :as f])
(defn replicate-val
[ rdd input ]
(let [{:keys [ col ]} input
result (reductions (fn [a b]
(if (empty? (nth b col))
(assoc b col (nth a col))
b)) rdd )]
(println "Result type is: "(type result))))
Got this:
;=> "Result type is: clojure.lang.LazySeq"
The question is how do I convert this back to type JavaRDD, using flambo (spark wrapper)
I tried (f/map result #(.toJavaRDD %)) in the let form to attempt to convert to JavaRDD type
I got this error
"No matching method found: map for class clojure.lang.LazySeq"
which is expected because result is of type clojure.lang.LazySeq
Question is how to I make this conversion, or how can I refactor the code to accomodate this.
Here is a sample input rdd:
(type rdd) ;=> "org.apache.spark.api.java.JavaRDD"
But looks like:
[["04" "2" "3"] ["04" "" "5"] ["5" "16" ""] ["07" "" "36"] ["07" "" "34"] ["07" "25" "34"]]
Required output is:
[["04" "2" "3"] ["04" "2" "5"] ["5" "16" ""] ["07" "16" "36"] ["07" "16" "34"] ["07" "25" "34"]]
Thanks.
First of all RDDs are not iterable (don't implement ISeq) so you cannot use reductions. Ignoring that a whole idea of accessing previous record is rather tricky. First of all you cannot directly access values from an another partition. Moreover only transformations which don't require shuffling preserve order.
The simplest approach here would be to use Data Frames and Window functions with explicit order but as far as I know Flambo doesn't implement required methods. It is always possible to use raw SQL or access Java/Scala API but if you want to avoid this you can try following pipeline.
First lets create a broadcast variable with last values per partition:
(require '[flambo.broadcast :as bd])
(import org.apache.spark.TaskContext)
(def last-per-part (f/fn [it]
(let [context (TaskContext/get) xs (iterator-seq it)]
[[(.partitionId context) (last xs)]])))
(def last-vals-bd
(bd/broadcast sc
(into {} (-> rdd (f/map-partitions last-per-part) (f/collect)))))
Next some helper for the actual job:
(defn fill-pair [col]
(fn [x] (let [[a b] x] (if (empty? (nth b col)) (assoc b col (nth a col)) b))))
(def fill-pairs
(f/fn [it] (let [part-id (.partitionId (TaskContext/get)) ;; Get partion ID
xs (iterator-seq it) ;; Convert input to seq
prev (if (zero? part-id) ;; Find previous element
(first xs) ((bd/value last-vals-bd) part-id))
;; Create seq of pairs (prev, current)
pairs (partition 2 1 (cons prev xs))
;; Same as before
{:keys [ col ]} input
;; Prepare mapping function
mapper (fill-pair col)]
(map mapper pairs))))
Finally you can use fill-pairs to map-partitions:
(-> rdd (f/map-partitions fill-pairs) (f/collect))
A hidden assumption here is that order of the partitions follows order of the values. It may or may not be in general case but without explicit ordering it is probably the best you can get.
Alternative approach is to zipWithIndex, swap order of values and perform join with offset.
(require '[flambo.tuple :as tp])
(def rdd-idx (f/map-to-pair (.zipWithIndex rdd) #(.swap %)))
(def rdd-idx-offset
(f/map-to-pair rdd-idx
(fn [t] (let [p (f/untuple t)] (tp/tuple (dec' (first p)) (second p))))))
(f/map (f/values (.rightOuterJoin rdd-idx-offset rdd-idx)) f/untuple)
Next you can map using similar approach as before.
Edit
Quick note on using atoms. What is the problem there is lack of referential transparency and that you're leveraging incidental properties of a given implementation not a contract. There is nothing in the map semantics that requires elements to be processed in a given order. If internal implementation changes it may be no longer valid. Using Clojure
(defn foo [x] (let [aa #a] (swap! a (fn [&args] x)) aa))
(def a (atom 0))
(map foo (range 1 20))
compared to:
(def a (atom 0))
(pmap foo (range 1 20))

Best way to update several values in hashmap?

I have a hash map like this:
{:key1 "aaa bbb ccc" :key2 "ddd eee" :key3 "fff ggg" :do-not-split "abcdefg hijk"}
And I'd like to split some of the strings to get vectors:
; expected result
{:key1 ["aaa" "bbb" "ccc"] :key2 ["ddd" "eee"] :key3 ["fff" "ggg"] :do-not-split "abcdefg hijk"}
I use update-in three times now like the following but it seems ugly.
(-> my-hash (update-in [:key1] #(split % #"\s"))
(update-in [:key2] #(split % #"\s"))
(update-in [:key3] #(split % #"\s")))
I hope there's sth like (update-all my-hash [:key1 :key2 :key3] fn)
You can use reduce:
user=> (def my-hash {:key1 "aaa bbb ccc" :key2 "ddd eee" :key3 "fff ggg"})
#'user/my-hash
user=> (defn split-it [s] (clojure.string/split s #"\s"))
#'user/split-it
user=> (reduce #(update-in %1 [%2] split-it) my-hash [:key1 :key2 :key3])
{:key3 ["fff" "ggg"], :key2 ["ddd" "eee"], :key1 ["aaa" "bbb" "ccc"]}
Just map the values based on a function that makes the decision about whether to split or not.
user=> (def x {:key1 "aaa bbb ccc"
:key2 "ddd eee"
:key3 "fff ggg"
:do-not-split "abcdefg hijk"})
#'user/x
user=> (defn split-some [predicate [key value]]
(if (predicate key)
[key (str/split value #" ")]
[key value]))
#'user/split-some
user=> (into {} (map #(split-some #{:key1 :key2 :key3} %) x))
{:do-not-split "abcdefg hijk", :key3 ["fff" "ggg"], :key2 ["ddd" "eee"], :key1 ["aaa" "bbb" "ccc"]}
This is a different way of approaching the problem.
Think about it for a second: if your string were in a list, how would you approach it?
The answer is that you would use map to get a list of vectors:
(map #(split % #"\s") list-of-strings)
If you think harder you would arrive at the conclusion that what you really want is to map a function over the values of a map. Obviously map doesn't work here as it works for sequences only.
However, is there a generic version of map? It turns out there is! It's called fmap and comes from the concept of functors which you can ignore for now. This is how you would use it:
(fmap my-hash #(split % #"\s"))
See how the intent is a lot clearer now?
The only drawback is that fmap isn't a core function but it is available through the algo.generic library.
Of course if including a new library feels like too much at this stage, you can always steel the source code - and attribute to its author - from the library itself in this link:
(into (empty my-hash) (for [[k v] my-hash] [k (your-function-here v)]))

Howto find a listitem which contains a specific substring

I have a list of strings, fx '("abc" "def" "gih") and i would like to be able to search the list for any items containing fx "ef" and get the item or index returned.
How is this done?
Combining filter and re-find can do this nicely.
user> (def fx '("abc" "def" "gih"))
#'user/fx
user> (filter (partial re-find #"ef") fx)
("def")
user> (filter (partial re-find #"a") fx)
("abc")
In this case I like to combine them with partial though defining an anonymous function works fine in that case as well. It is also useful to use re-pattern if you don't know the search string in advance:
user> (filter (partial re-find (re-pattern "a")) fx)
("abc")
If you want to retrieve all the indexes of the matching positions along with the element you can try this:
(filter #(re-find #"ef" (second %)) (map-indexed vector '("abc" "def" "gih")))
=>([1 "def"])
map-indexed vector generates an index/value lazy sequence
user> (map-indexed vector '("abc" "def" "gih"))
([0 "abc"] [1 "def"] [2 "gih"])
Which you can then filter using a regular expression against the second element of each list member.
#(re-find #"ef" (second %))
Just indices:
Lazily:
(keep-indexed #(if (re-find #"ef" %2)
%1) '("abc" "def" "gih"))
=> (1)
Using loop/recur
(loop [[str & strs] '("abc" "def" "gih")
idx 0
acc []]
(if str
(recur strs
(inc idx)
(cond-> acc
(re-find #"ef" str) (conj idx)))
acc))
For just the element, refer to Arthur Ulfeldts answer.
Here is a traditional recursive definition that returns the index. It's easy to modify to return the corresponding string as well.
(defn strs-index [re lis]
(let [f (fn [ls n]
(cond
(empty? ls) nil
(re-find re (first ls)) n
:else (recur (rest ls) (inc n))))]
(f lis 0)))
user=> (strs-index #"de" ["abc" "def" "gih"])
1
user=> (strs-index #"ih" ["abc" "def" "gih"])
2
user=> (strs-index #"xy" ["abc" "def" "gih"])
nil
(Explanation: The helper function f is defined as a binding in let, and then is called at the end. If the sequence of strings passed to it is not empty, it searches for the regular expression in the first element of the sequence and returns the index if it finds the string. This uses the fact that re-find's result counts as true unless it fails, in which case it returns nil. If the previous steps don't succeed, the function starts over with the rest of the sequence and an incremented index. If it gets to the end of the sequence, it returns nil.)

Clojure compress vector

I am trying to find a Clojure-idiomatic way to "compress" a vector:
(shift-nils-left [:a :b :c :a nil :d nil])
;=> (true [nil nil :a :b :c :a :d])
(shift-nils-left [nil :a])
;=> (false [nil :a])
(shift-nils-left [:a nil])
;=> (true [nil :a])
(shift-nils-left [:a :b])
;=> (false [:a :b])
In other words, I want to move all of the nil values to the left end of the vector, without changing the length. The boolean indicates whether any shifting occurred. The "outside" structure can be any seq, but the inside result should be a vector.
I suspect that the function will involve filter (on the nil values) and into to add to a vector of nils of the same length as the original, but I'm not sure how to reduce the result back to the original length. I know how to this "long-hand", but I suspect that Clojure will be able to do it in a single line.
I am toying with the idea of writing a Bejeweled player as an exercise to learn Clojure.
Thanks.
I would write it like this:
(ns ...
(:require [clojure.contrib.seq-utils :as seq-utils]))
(defn compress-vec
"Returns a list containing a boolean value indicating whether the
vector was changed, and a vector with all the nils in the given
vector shifted to the beginning."
([v]
(let [shifted (vec (apply concat (seq-utils/separate nil? v)))]
(list (not= v shifted)
shifted))))
Edit: so, the same as what Thomas beat me to posting, but I wouldn't use flatten just in case you end up using some sort of seqable object to represent the jewels.
Maybe this way:
(defn shift-nils-left
"separate nil values"
[s]
(let [s1 (vec (flatten (clojure.contrib.seq/separate nil? s)))]
(list (not (= s s1)) s1)))
A little more low-level approach. It traverses the input seq just once as well as the vector of non-nils once. The two more highlevel approaches traverse the input sequence two times (for nil? and (complenent nil?)). The not= traverses the input a third time in the worst-case of no shift.
(defn compress-vec
[v]
(let [[shift? nils non-nils]
(reduce (fn [[shift? nils non-nils] x]
(if (nil? x)
[(pos? (count non-nils)) (conj nils nil) non-nils]
[shift? nils (conj non-nils x)]))
[false [] []] v)]
[shift? (into nils non-nils)]))
(def v [1 2 nil 4 5 nil 7 8] )
(apply vector (take 8 (concat (filter identity v) (repeat nil))))
This creates a sequence of the non- nil values in the vector using filter and then appends nils to the end of the sequence. This gives the values you want as a sequence and then converts them into a vector. The take 8 ensures that the vector is right size.